
# Chapter 1: Relevant Mathematical Background

Theorem: If the function $f$ is differentiable at $x_0$, then $f$ is continuous at $x_0$.

Notation: $C[a,b]$ stands for the set of continuous functions defined on $[a,b]$.

Theorem (Generalized Rolle's Theorem): Suppose $f\in C[a,b]$ is $n$ times differentiable on $(a,b)$. If $f(x)$ is zero at the $n+1$ distinct numbers $x_0,x_1,...,x_n$ in $[a,b]$, then a number $c$ in $(a,b)$ exists with $f^{(n)}(c)=0$.

Theorem (Taylor's Theorem): $f(x)=P_n(x)+R_n(x)$, where $P_n(x)=\sum_{k=0}^n\frac{f^{(k)}(x_0)}{k!}(x-x_0)^{k}$ and $R_n(x)=\frac{f^{(n+1)}(\xi(x))}{(n+1)!}(x-x_0)^{n+1}$, with $\xi(x)$ between $x$ and $x_0$.

Definition: If $p^*$ is an approximation to $p$, the absolute error is $|p-p^*|$, and the relative error is $\frac{|p-p^*|}{|p|}$, provided that $p\neq 0$.

Definition: The number $p^*$ is said to approximate $p$ to $t$ significant digits if $t$ is the largest nonnegative integer for which $\frac{|p-p^*|}{|p|}<5\times 10^{-t}$.

Definition: An algorithm is said to be stable when small changes in the initial data produce correspondingly small changes in the final results. Some algorithms are stable only for certain choices of initial data; these are called conditionally stable.

Definition: Suppose that $E_0$ denotes an initial error and $E_n$ represents the magnitude of the error after $n$ subsequent operations.

  • If $E_n\approx nCE_0$, where $C$ is a constant independent of $n$, then the growth of error is said to be linear.

  • If $E_n\approx C^nE_0$, for some $C>1$, then the growth of error is said to be exponential.

Definition: Suppose $\{\beta_n\}^{\infty}_{n=1}$ is a sequence known to converge to zero, and $\{\alpha_n\}^{\infty}_{n=1}$ converges to a number $\alpha$.

  • If a positive constant $K$ exists with $|\alpha_n-\alpha|\leq K|\beta_n|$ for large $n$, then we say that $\{\alpha_n\}^\infty_{n=1}$ converges to $\alpha$ with rate of convergence $O(\beta_n)$, writing $\alpha_n=\alpha+O(\beta_n)$.

Property (operations on order terms): if $\alpha_n=\alpha+O(\varepsilon_\alpha)$ and $\beta_n=\beta+O(\varepsilon_\beta)$, then

  • $\alpha_n+\beta_n=\alpha+\beta+O(\varepsilon_\alpha+\varepsilon_\beta)$.
  • $\alpha_n\beta_n=\alpha\beta+\alpha O(\varepsilon_\beta)+\beta O(\varepsilon_\alpha)+O(\varepsilon_\alpha\varepsilon_\beta)$.

Example: Let $\alpha_n=\frac{n+1}{n^2}$ and $\beta_n=\frac{1}{n}$. Then $|\alpha_n-0|<\frac{n+n}{n^2}=2\cdot\frac{1}{n}$, so $\alpha_n=0+O(\frac{1}{n})$.

Definition: Suppose

$$\lim_{x\rightarrow 0}G(x)=0,\qquad \lim_{x\rightarrow 0}F(x)=L$$

If a positive constant $K$ exists with $|F(x)-L|\leq K|G(x)|$, then we write $F(x)=L+O(G(x))$.

Example: $\cos(x)+\frac{1}{2}x^2=1+O(x^4)$.


Definition: A mathematical problem is said to be well-posed if a solution:

  • exists,
  • is unique,
  • depends continuously on input data

Otherwise, the problem is ill-posed.

# Chapter 2: Approximate Solutions of Equations in One Variable

Definition: A system of $m$ nonlinear equations in $n$ unknowns can alternatively be represented by defining a function $\textbf{f}$, mapping $\mathbb{R}^n$ into $\mathbb{R}^m$, by

$$\textbf{f}(\textbf{x})=(f_1(\textbf{x}),...,f_m(\textbf{x}))^T$$

  • The functions $f_1,f_2,..,f_m$ are the coordinate functions of $\textbf{f}$.

  • The function $\textbf{f}$ is continuous at $\textbf{x}_0\in D$ provided $\lim_{\textbf{x}\rightarrow \textbf{x}_0}\textbf{f}(\textbf{x})=\textbf{f}(\textbf{x}_0)$.

Theorem: Let $f$ be a function from $D\subset \mathbb{R}^n$ into $\mathbb{R}$ and $\textbf{x}_0\in D$. If

$$\left|\frac{\partial f(\textbf{x})}{\partial x_j}\right|\leq K,\qquad j=1,2,...,n$$

whenever $|x_j-x_{0j}|\leq\delta$, then $f$ is continuous at $\textbf{x}_0$.

Theorem: Suppose that $f\in C[a,b]$ and $f(a)f(b)<0$. The Bisection method generates a sequence $\{p_n\}^\infty_1$ approximating a zero $p$ of $f$ (i.e. $f(p)=0$) with

$$|p_n-p|\leq\frac{b-a}{2^n}$$

So $p_n=p+O(\frac{1}{2^n})$.
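
As an illustration (not from the original notes), here is a minimal Python sketch of the bisection method; the function name `bisection` and the tolerance/iteration parameters are my own choices:

```python
def bisection(f, a, b, tol=1e-8, max_iter=100):
    """Approximate a root of f in [a, b], assuming f(a)*f(b) < 0."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    p = a
    for _ in range(max_iter):
        p = a + (b - a) / 2              # midpoint
        if f(p) == 0 or (b - a) / 2 < tol:
            return p
        if f(a) * f(p) < 0:              # root lies in [a, p]
            b = p
        else:                            # root lies in [p, b]
            a = p
    return p

# Example: root of x^3 + 4x^2 - 10 on [1, 2]
print(bisection(lambda x: x**3 + 4*x**2 - 10, 1, 2))
```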

# Fixed Point Method

Definition: A fixed point of a given function $g:\mathbb{R}\rightarrow\mathbb{R}$ is a value $x^*$ such that $x^*=g(x^*)$.


For a given equation $f(x)=0$, there may be many equivalent fixed point problems $x=g(x)$ with different choices for $g$.

Example: $f(x)=0\Leftrightarrow g(x)=x$; we can choose $g(x)=x-f(x)$ or $g(x)=x+3f(x)$.

To approximate the fixed point of a function $g(x)$, we choose an initial approximation $p_0$ and generate the sequence by:

$$p_n=g(p_{n-1}),\qquad n=1,2,3,...$$

If the sequence converges to $p$ and $g(x)$ is continuous, then we have

$$p=\lim_{n\rightarrow\infty}p_n=\lim_{n\rightarrow\infty}g(p_{n-1})=g(\lim_{n\rightarrow \infty}p_{n-1})=g(p)$$

This technique is called fixed point iteration (or functional iteration).
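
A minimal Python sketch of fixed point iteration (not from the original notes; the stopping criterion on successive iterates is an illustrative choice):

```python
import math

def fixed_point(g, p0, tol=1e-10, max_iter=100):
    """Iterate p_n = g(p_{n-1}) until successive values are within tol."""
    p_prev = p0
    for _ in range(max_iter):
        p = g(p_prev)
        if abs(p - p_prev) < tol:
            return p
        p_prev = p
    return p_prev

# Example: solve x = cos(x) starting from p0 = 1
print(fixed_point(math.cos, 1.0))   # ~0.7390851
```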


Theorem: If $g\in C[a,b]$ and $g(x)\in[a,b]$ for all $x\in[a,b]$, then $g$ has a fixed point in $[a,b]$. What's more, if $g'(x)$ exists on $(a,b)$ and a positive constant $k<1$ exists with $|g'(x)|\leq k$ for all $x\in(a,b)$, then the fixed point is unique.

  • The proof is easily obtained by contradiction, considering the geometric picture.

Corollary: Obviously we have $|p_n-p|=|g(p_{n-1})-g(p)|=|g'(\xi)||p_{n-1}-p|$ where $\xi\in(a,b)$. So $|p_n-p|\leq k^n|p_0-p|$. If $k<1$, the fixed point is unique and we have $p_n=p+O(k^n)$.

What's more, $|p_n-p|\leq k^n|p_0-p|\leq k^n\max(p_0-a,\,b-p_0)$, and

$$|p_n-p|\leq\frac{k^n}{1-k}|p_1-p_0|,$$

which follows from $|p_{n+m}-p_n|\leq |p_{n+m}-p_{n+m-1}|+\dots+|p_{n+1}-p_n|\leq k^n(1+k+\dots+k^{m-1})|p_1-p_0|$ by letting $m\rightarrow\infty$.

# Newton’s Method

Suppose that $f\in C^2[a,b]$ and $x^*$ is a solution of $f(x)=0$. Let $\bar{x}\in[a,b]$ be an approximation to $x^*$ such that $f'(\bar{x})\neq 0$ and $|\bar{x}-x^*|$ is very small. By the Taylor polynomial:

$$f(x)=f(\bar{x})+(x-\bar{x})f'(\bar{x})+\frac{(x-\bar{x})^2}{2}f''(\xi)$$

where $\xi$ lies between $x$ and $\bar{x}$. So

$$0=f(x^*)\approx f(\bar{x})+(x^*-\bar{x})f'(\bar{x})\quad\Rightarrow\quad x^*\approx\bar{x}-\frac{f(\bar{x})}{f'(\bar{x})}$$

Theorem: Let $f\in C^2[a,b]$. If $p\in[a,b]$ is such that $f(p)=0$ and $f'(p)\neq 0$, then there exists a $\delta>0$ such that for any $p_0\in[p-\delta,p+\delta]$, the iteration $p_n=p_{n-1}-\frac{f(p_{n-1})}{f'(p_{n-1})}$ converges to $p$.

  • The proof is based on the fixed point iteration $g(x)=x-\frac{f(x)}{f'(x)}$.
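
A minimal Python sketch of Newton's method (not from the original notes; the derivative is supplied explicitly, and the tolerance is illustrative):

```python
def newton(f, fprime, p0, tol=1e-12, max_iter=50):
    """Newton's method: p_n = p_{n-1} - f(p_{n-1}) / f'(p_{n-1})."""
    p = p0
    for _ in range(max_iter):
        p_new = p - f(p) / fprime(p)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

# Example: sqrt(2) as the positive root of x^2 - 2
print(newton(lambda x: x*x - 2, lambda x: 2*x, 1.0))   # ~1.41421356
```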

# Secant Method

Similar to Newton's method:

$$p_{n}=p_{n-1}-\frac{f(p_{n-1})}{\dfrac{f(p_{n-1})-f(p_{n-2})}{p_{n-1}-p_{n-2}}}$$

# Method of False Position

An improved secant method. If $f(p_0)f(p_1)<0$, we choose:

$$p_n=\begin{cases} p_{n-1}-\dfrac{f(p_{n-1})}{\dfrac{f(p_{n-1})-f(p_{n-2})}{p_{n-1}-p_{n-2}}}\\[3ex] p_{n-1}-\dfrac{f(p_{n-1})}{\dfrac{f(p_{n-1})-f(p_{n-3})}{p_{n-1}-p_{n-3}}} \end{cases}$$

depending on whether $f(p_{n-1})f(p_{n-2})<0$ or $f(p_{n-1})f(p_{n-3})<0$.

  • Line 1 passes through $(p_{n-1},f(p_{n-1}))$ and $(p_{n-2},f(p_{n-2}))$.

  • Line 2 passes through $(p_{n-1},f(p_{n-1}))$ and $(p_{n-3},f(p_{n-3}))$.

We choose which line to use according to where the zero lies: in $[p_{n-2},p_{n-1}]$ or in $[p_{n-1},p_{n-3}]$.

# Error Analysis for Iteration

Definition: If positive constants $\lambda$ and $\alpha$ exist with

$$\lim_{n\rightarrow\infty}\frac{|p_{n+1}-p|}{|p_n-p|^{\alpha}}=\lambda$$

then $\{p_n\}^\infty_{n=0}$ converges to $p$ of order $\alpha$, with asymptotic error constant $\lambda$.

The higher the order, the more rapidly the sequence converges.

# Fixed point method

Theorem: If $|g'(x)|\leq k<1$ for any $x\in(a,b)$, then the fixed point iteration converges linearly to the unique fixed point.

  • Proof:

$$\lim_{n\rightarrow\infty}\frac{|p_{n+1}-p|}{|p_n-p|^1}=\lim_{n\rightarrow\infty}\frac{|g(p_n)-g(p)|}{|p_n-p|}=\lim_{n\rightarrow\infty}|g'(\xi_n)|=|g'(p)|=\text{constant},\qquad \xi_n\in(p_n,p)$$

So $\alpha=1$. If $g'(p)=0$, then the order might be higher.

What's more, if $g'(p)=0$ and $|g''(x)|\leq M$ near $p$, then we have:

$$\lim_{n\rightarrow\infty}\frac{|p_{n+1}-p|}{|p_n-p|^2}=\lim_{n\rightarrow\infty}\frac{|g(p_n)-g(p)|}{|p_n-p|^2}=\lim_{n\rightarrow\infty}\frac{|g(p)+g'(p)(p_{n}-p)+\frac{g''(\xi_n)}{2}(p_n-p)^2-g(p)|}{|p_n-p|^2}=\lim_{n\rightarrow\infty}\frac{|g''(\xi_n)|}{2}\leq\frac{M}{2}$$

So $\alpha=2$ if $g''(x)$ is bounded (and $g'(p)=0$).


If $|g'(p)|>1$, then the iteration diverges for any starting point other than $p$.

# From Root to Fixed Point

For the root-finding problem $f(x)=0$, we let $g(x)$ take the form

$$g(x)=x-\phi(x)f(x),\qquad\phi(x)\neq 0$$

so $g(x)=x\Rightarrow f(x)=0$.

Since $g'(x)=1-\phi'(x)f(x)-\phi(x)f'(x)$, letting $x=p$ gives $g'(p)=1-\phi(p)f'(p)$, and $g'(p)=0$ if and only if $\phi(p)=\frac{1}{f'(p)}$.

Definition: A solution $p$ of $f(x)=0$ is a zero of multiplicity $m$ of $f(x)$ if for $x\neq p$ we can write

$$f(x)=(x-p)^mq(x),\qquad \lim_{x\rightarrow p}q(x)\neq 0$$

When it comes to Newton's method, $g(x)=x-f(x)/f'(x)$, and we have

$$g(x)=x-\frac{(x-p)^mq(x)}{m(x-p)^{m-1}q(x)+(x-p)^mq'(x)}=x-(x-p)\frac{q(x)}{mq(x)+(x-p)q'(x)},\qquad \lim_{x\rightarrow p}g'(x)=1-\frac{1}{m}\neq 0\ \ (m\geq 2)$$

Note: I have doubts here, because all of this is in the sense of limits; at $x=p$, $g(x)$ is not even defined.

But this is numerical computation, and in practice $x$ and $p$ always differ a little.

# Aitken’s Δ2 Method

Suppose $\{p_n\}^\infty_{n=0}$ is a linearly convergent sequence ($\alpha=1$) with limit $p$:

$$\lim_{n\rightarrow\infty}\frac{p_{n+1}-p}{p_n-p}=\lambda\neq 0$$

So when $n$ is large enough:

$$\frac{p_{n+1}-p}{p_n-p}\approx\frac{p_{n+2}-p}{p_{n+1}-p}\quad\Rightarrow\quad p\approx p_n-\frac{(p_{n+1}-p_n)^2}{p_{n+2}-2p_{n+1}+p_n}$$

Aitken's $\Delta^2$ method defines a new sequence:

$$q_n=p_n-\frac{(p_{n+1}-p_n)^2}{p_{n+2}-2p_{n+1}+p_n}$$

$\{q_n\}^\infty_{n}$ converges to $p$ more rapidly than $\{p_n\}$.

Definition: The forward difference $\Delta p_n$ of a sequence $\{p_n\}^\infty_n$ is defined by:

$$\Delta p_n=p_{n+1}-p_n$$

So $q_n=p_n-\frac{(\Delta p_n)^2}{\Delta^2p_n}$.

Theorem: $\lim_{n\rightarrow\infty}\frac{q_n-p}{p_n-p}=0$, so $q_n$ converges faster than $p_n$.
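
A minimal Python sketch of Aitken's Δ² acceleration applied to a stored sequence (not from the original notes; the cos iteration example is illustrative):

```python
import math

def aitken(p):
    """Given a list p of at least three iterates, return q_n = p_n - (Δp_n)^2 / Δ²p_n."""
    q = []
    for n in range(len(p) - 2):
        d1 = p[n + 1] - p[n]                   # Δp_n
        d2 = p[n + 2] - 2 * p[n + 1] + p[n]    # Δ²p_n
        q.append(p[n] - d1 * d1 / d2)
    return q

# Example: p_n from the linearly convergent iteration p_{n+1} = cos(p_n)
p = [0.5]
for _ in range(6):
    p.append(math.cos(p[-1]))
print(aitken(p))   # noticeably closer to 0.7390851... than p itself
```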

# Steffensen’s Method (Aitken’s Method for fixed point iteration)

Repeat:

$$\begin{cases}p_0\\p_1=g(p_0)\\p_2=g(p_1)\\p_3=p_0-\frac{(\Delta p_0)^2}{\Delta^2p_0}\end{cases}\quad \begin{cases}p_3\\p_4=g(p_3)\\p_5=g(p_4)\\p_6=p_3-\frac{(\Delta p_3)^2}{\Delta^2p_3}\end{cases}\quad \begin{cases}p_6\\p_7=g(p_6)\\p_8=g(p_7)\\p_9=p_6-\frac{(\Delta p_6)^2}{\Delta^2p_6}\end{cases}\quad\cdots$$

Theorem: If $g'(p)\neq 1$ and there exists a $\delta>0$ such that $g\in C^3[p-\delta,p+\delta]$, then Steffensen's method gives quadratic ($\alpha=2$) convergence for any $p_0\in[p-\delta,p+\delta]$.
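
A minimal Python sketch of Steffensen's method (not from the original notes; the guard on a zero denominator is an implementation detail I added):

```python
import math

def steffensen(g, p0, tol=1e-12, max_iter=50):
    """One Aitken Δ² step per group of three fixed point iterates."""
    p = p0
    for _ in range(max_iter):
        p1 = g(p)
        p2 = g(p1)
        denom = p2 - 2 * p1 + p
        if denom == 0:                 # Δ² step undefined; p is (nearly) the fixed point
            return p2
        p_new = p - (p1 - p) ** 2 / denom
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

print(steffensen(math.cos, 0.5))       # ~0.7390851332
```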

# Zeros of Polynomials and Muller’s Method

Find a root of a given polynomial:

$$f(x)=x^n+a_1x^{n-1}+...+a_{n-1}x+a_n$$

Initially we have three points $(x_0,f(x_0)),(x_1,f(x_1)),(x_2,f(x_2))$; we then draw a parabola through these three points.


The parabola meets the $x$-axis at two points; we choose the one closer to $x_2$ as $x_3$, and repeat the iteration, drawing a parabola through $(x_1,f(x_1)),(x_2,f(x_2)),(x_3,f(x_3))$.

When $|x_{n+1}-x_n|<\varepsilon$, the iteration terminates and we have found the root.

# Chapter 3: Interpolation and Polynomial Fitting

Theorem (Weierstrass Approximation Theorem): Suppose $f$ is defined and continuous on $[a,b]$. For each $\varepsilon>0$, there exists a polynomial $P(x)$, defined on $[a,b]$, with the property that:

$$\forall x\in[a,b],\;|f(x)-P(x)|<\varepsilon$$

# The n-th Lagrange Interpolating Polynomial

$$L_{n,k}(x)=\prod_{i=0,i\neq k}^n\frac{x-x_i}{x_k-x_i},\qquad L_{n,k}(x_i)=\begin{cases}0&i\neq k\\1&i=k\end{cases},\qquad P(x)=\sum_{k=0}^nf(x_k)L_{n,k}(x)$$

Theorem: Suppose $x_0,x_1,..,x_n$ are distinct numbers in the interval $[a,b]$ and $f\in C^{n+1}[a,b]$. Then for each $x\in[a,b]$, a number $\xi(x)\in(a,b)$ exists with

$$f(x)=P_n(x)+\frac{f^{(n+1)}(\xi(x))}{(n+1)!}(x-x_0)(x-x_1)\cdots(x-x_n)$$

where $P_n(x)$ is the Lagrange interpolating polynomial.
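
A minimal Python sketch of evaluating the Lagrange interpolating polynomial (not from the original notes; the `1/x` data set is an illustrative example):

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange polynomial P_n at x, given nodes xs and values ys = f(xs)."""
    total = 0.0
    n = len(xs)
    for k in range(n):
        # L_{n,k}(x) = prod_{i != k} (x - x_i) / (x_k - x_i)
        L = 1.0
        for i in range(n):
            if i != k:
                L *= (x - xs[i]) / (xs[k] - xs[i])
        total += ys[k] * L
    return total

# Example: interpolate f(x) = 1/x at x = 2, 2.75, 4 and evaluate at x = 3
xs = [2.0, 2.75, 4.0]
ys = [1 / x for x in xs]
print(lagrange_eval(xs, ys, 3.0))   # ~0.3295 vs. exact 1/3
```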

The error is related to $h=|b-a|$. If $|f^{(n+1)}(x)|\leq M_{n+1}$ on $[a,b]$, then:

$$|f(x)-P_1(x)|=\frac{|f''(\xi)|}{2}|(x-x_0)(x-x_1)|\leq\frac{M_2h^2}{8},\qquad |f(x)-P_n(x)|\leq \frac{M_{n+1}}{(n+1)!}h^{n+1}$$

Runge phenomenon: with equally spaced nodes, increasing $n$ does not guarantee a smaller error; for some functions the difference between the function and the interpolating polynomial grows near the endpoints as $n$ increases.

# Data Approximation and Neville’s Method

Theorem: Consider $n+1$ points $x_0,x_1,...,x_n$. Let $P_1(x)$ be the Lagrange polynomial interpolating $f$ at all nodes except $x_j$, and $P_2(x)$ the one interpolating at all nodes except $x_i$:

$$P_1(x)=\sum_{k=0,k\neq j}^nL_{n,k}(x)f(x_k),\qquad P_2(x)=\sum_{k=0,k\neq i}^nL_{n,k}(x)f(x_k)$$

Then $P(x)=\frac{(x-x_j)P_1(x)-(x-x_i)P_2(x)}{x_i-x_j}$ interpolates $f$ at all $n+1$ nodes.

So the Lagrange polynomial can be generated recursively:

$$(x_1,x_2,x_3,...,x_{n-1},x_n)\Rightarrow(x_1,x_2,..,x_{n-1})+(x_2,x_3,...,x_n)\Rightarrow (x_1,x_2,...,x_{n-2})+2(x_2,x_3,...,x_{n-1})+(x_3,x_4,...,x_n)\Rightarrow \cdots$$

# Divided Difference

Definition: The zeroth divided difference of the function $f$ with respect to $x_i$, denoted $f[x_i]$, is simply the value of $f$ at $x_i$:

$$f[x_i]=f_i=f(x_i)$$

while the first divided difference of $f$ is defined as

$$f[x_i,x_{i+1}]=\frac{f[x_{i+1}]-f[x_i]}{x_{i+1}-x_i}$$

and the second divided difference is

$$f[x_i,x_{i+1},x_{i+2}]=\frac{f[x_{i+1},x_{i+2}]-f[x_{i},x_{i+1}]}{x_{i+2}-x_i}$$

Theorem: $P_n(x)$ can be rewritten as

$$P_n(x)=f[x_0]+f[x_0,x_1](x-x_0)+f[x_0,x_1,x_2](x-x_0)(x-x_1)+\cdots+f[x_0,x_1,...,x_n](x-x_0)\cdots(x-x_{n-1})$$

which is known as Newton's interpolatory divided-difference formula.
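
A minimal Python sketch of building the divided-difference coefficients and evaluating Newton's form by nested multiplication (not from the original notes; the sample data repeats the `1/x` example above):

```python
def divided_differences(xs, ys):
    """Return the coefficients f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n]."""
    n = len(xs)
    table = list(ys)                         # zeroth divided differences
    coeffs = [table[0]]
    for k in range(1, n):
        # k-th column of the divided difference table
        table = [(table[i + 1] - table[i]) / (xs[i + k] - xs[i])
                 for i in range(n - k)]
        coeffs.append(table[0])
    return coeffs

def newton_eval(xs, coeffs, x):
    """Evaluate Newton's divided-difference form using nested multiplication."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[k]) + coeffs[k]
    return result

xs = [2.0, 2.75, 4.0]
ys = [1 / x for x in xs]
print(newton_eval(xs, divided_differences(xs, ys), 3.0))  # same value as the Lagrange form
```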

Theorem: A number $\xi$ exists in $(a,b)$ with

$$f[x_0,x_1,...,x_n]=\frac{f^{(n)}(\xi)}{n!}$$


Definition: The forward difference is defined by:

$$\Delta f(x_i)=f(x_{i+1})-f(x_i)$$

If $x_{i+1}-x_i\equiv h$ for all $i$ and $x=x_0+sh$, then

$$P_n(x)=f[x_0]+f[x_0,x_1]sh+f[x_0,x_1,x_2]s(s-1)h^2+\cdots=\sum_{k=0}^nf[x_0,x_1,...,x_k]C_s^kk!h^k$$

$$\because f[x_0,x_1,..,x_k]=\frac{1}{k!h^k}\Delta^kf(x_0)\qquad\therefore P_n(x)=\sum_{k=0}^nC_s^k\Delta^kf(x_0)$$

This is known as the Newton forward-difference formula.

Definition: The backward difference is defined by:

$$\nabla f(x_i)=f(x_i)-f(x_{i-1})$$

So, with $x=x_n+sh$, we have

$$f[x_n,x_{n-1}]=\frac{1}{h}\nabla f(x_n),\qquad f[x_n,x_{n-1},x_{n-2},...,x_{n-k}]=\frac{1}{k!h^k}\nabla^kf(x_n)$$

$$P_n(x)=\sum_{k=0}^n(-1)^kC_{-s}^k\nabla^k f(x_n),\qquad\text{where}\;C_{-s}^k=\frac{-s(-s-1)\cdots(-s-k+1)}{k!}$$

This is known as the Newton backward-difference formula.

# Hermite Interpolation

Problem: Given $n+1$ distinct numbers $x_0,x_1,...,x_n$ and nonnegative integers $k_0,k_1,...,k_n$, construct the osculating polynomial approximating a function $f\in C^m[a,b]$, where $m=\max(k_0,k_1,...,k_n)$.

The osculating polynomial $H(x)$ must satisfy:

$$f(x_0)=H(x_0),\ f'(x_0)=H'(x_0),\ ...,\ f^{(k_0)}(x_0)=H^{(k_0)}(x_0)\\ f(x_1)=H(x_1),\ f'(x_1)=H'(x_1),\ ...,\ f^{(k_1)}(x_1)=H^{(k_1)}(x_1)\\ \cdots\\ f(x_n)=H(x_n),\ f'(x_n)=H'(x_n),\ ...,\ f^{(k_n)}(x_n)=H^{(k_n)}(x_n)$$

Obviously the degree of this osculating polynomial is at most $K=\sum_{i=0}^nk_i+n$,

since the number of equations to be satisfied is $K+1$.


If $k_0=k_1=...=k_n=1$, then the (Hermite) polynomial is:

$$H_{2n+1}(x)=\sum_{j=0}^n f(x_j)H_{n,j}(x)+\sum_{j=0}^nf'(x_j)\hat{H}_{n,j}(x)$$

$$\text{where}\;H_{n,j}(x)=[1-2(x-x_j)L'_{n,j}(x_j)]L^2_{n,j}(x),\qquad \hat{H}_{n,j}(x)=(x-x_j)L^2_{n,j}(x)$$

The error is

$$f(x)=H_{2n+1}(x)+\frac{\prod_{k=0}^n(x-x_k)^2}{(2n+2)!}f^{(2n+2)}(\xi)$$

# Piecewise Interpolation Linear Polynomial

Instead of constructing a single high-degree polynomial to approximate the function, we can construct several low-degree segments.


First construct the basis functions:

$$l_0(x)=\frac{x-x_1}{x_0-x_1},\ x_0\leq x\leq x_1\qquad l_i(x)=\begin{cases}\frac{x-x_{i-1}}{x_i-x_{i-1}}&x_{i-1}\leq x\leq x_i\\\frac{x-x_{i+1}}{x_i-x_{i+1}}&x_{i}\leq x\leq x_{i+1} \end{cases}\qquad l_n(x)=\frac{x-x_{n-1}}{x_n-x_{n-1}},\ x_{n-1}\leq x\leq x_n$$

Then the piecewise linear interpolation polynomial is:

$$P(x)=\sum_{i=0}^nl_i(x)f(x_i)$$

The error bound is:

$$|f(x)-P(x)|\leq\frac{h^2}{8}M$$

where $h=\max_i(x_{i+1}-x_i)$ and $M=\max|f''(x)|$.


When considering the osculating polynomial with $k_0=k_1=...=k_n=1$, we can construct:

$$P(x)=\sum_{i=0}^n\left[H_i(x)f(x_i)+\hat{H}_i(x)f'(x_i)\right]$$

where

$$H_i(x)=\begin{cases}(1+2\frac{x-x_i}{x_{i-1}-x_i})(\frac{x-x_{i-1}}{x_i-x_{i-1}})^2&x_{i-1}\leq x\leq x_i\\ (1+2\frac{x-x_i}{x_{i+1}-x_i})(\frac{x-x_{i+1}}{x_i-x_{i+1}})^2&x_{i}\leq x\leq x_{i+1} \end{cases}\qquad \hat{H}_i(x)=\begin{cases}(x-x_i)(\frac{x-x_{i-1}}{x_i-x_{i-1}})^2&x_{i-1}\leq x\leq x_i\\ (x-x_i)(\frac{x-x_{i+1}}{x_i-x_{i+1}})^2&x_i\leq x\leq x_{i+1} \end{cases}$$

# Chapter 4: Numerical Differentiation and Integration

Formula:

$$f'(x_0)\approx \frac{f(x_0+h)-f(x_0)}{h},\qquad f'(x_0)\approx \frac{f(x_0)-f(x_0-h)}{h},\qquad f'(x_0)\approx \frac{f(x_0+h)-f(x_0-h)}{2h}$$

# (n+1)-Point Formula

$$\because f(x)=\sum_{k=0}^nf(x_k)L_{n,k}(x)+\frac{(x-x_0)\cdots(x-x_n)}{(n+1)!}f^{(n+1)}(\xi)\qquad\therefore f'(x_j)=\sum_{k=0}^nf(x_k)L'_{n,k}(x_j)+\frac{f^{(n+1)}(\xi)}{(n+1)!}\prod_{k=0,k\neq j}^n(x_j-x_k)$$

Note that this expression for $f'$ is only used at the nodes $x=x_0,x_1,...,x_n$.

When $n=2$ and $x_1=x_0+h,\ x_2=x_0+2h$:

$$f'(x_0)=\frac{1}{2h}[-3f(x_0)+4f(x_0+h)-f(x_0+2h)]+\frac{h^2}{3}f^{(3)}(\xi)$$


By the Taylor polynomial, we have

$$f(x_0+h)=f(x_0)+f'(x_0)h+\frac{1}{2}f''(x_0)h^2+\frac{1}{6}f^{(3)}(x_0)h^3+\frac{1}{24}f^{(4)}(\xi_1)h^4$$

$$f(x_0-h)=f(x_0)-f'(x_0)h+\frac{1}{2}f''(x_0)h^2-\frac{1}{6}f^{(3)}(x_0)h^3+\frac{1}{24}f^{(4)}(\xi_{-1})h^4$$

Therefore:

$$f(x_0+h)+f(x_0-h)=2f(x_0)+f''(x_0)h^2+\frac{h^4}{24}[f^{(4)}(\xi_1)+f^{(4)}(\xi_{-1})]$$

Suppose $f^{(4)}$ is continuous; then there exists $\xi\in[x_0-h,x_0+h]$ satisfying

$$f^{(4)}(\xi)=\frac{1}{2}[f^{(4)}(\xi_1)+f^{(4)}(\xi_{-1})]$$

So

$$f''(x_0)=\frac{1}{h^2}[f(x_0-h)-2f(x_0)+f(x_0+h)]-\frac{h^2}{12}f^{(4)}(\xi)$$

# Richardson’s Extrapolation

Suppose that for each number $h\neq 0$ we have a formula $N(h)$ that approximates an unknown value $M$, and that the truncation error involved with the approximation has the form:

$$M-N(h)=K_1h+K_2h^2+K_3h^3+\cdots$$

Then we have

$$M=N(\tfrac{h}{2})+K_1\tfrac{h}{2}+K_2\tfrac{h^2}{4}+\cdots$$

Therefore

$$M=\left[N(\tfrac{h}{2})+N(\tfrac{h}{2})-N(h)\right]+K_2(\tfrac{h^2}{2}-h^2)+\cdots=2N(\tfrac{h}{2})-N(h)+O(h^2)$$

Let $N_2(h)=2N(\tfrac{h}{2})-N(h)$; then

$$\because M=N_2(h)-\frac{K_2}{2}h^2-\frac{3K_3}{4}h^3-\cdots\qquad\therefore 3M=4N_2(\tfrac{h}{2})-N_2(h)+O(h^3)$$


In general, if $M$ can be written in the form:

$$M=N_1(h)+\sum_{j=1}^{m-1}K_jh^j+O(h^m)$$

then

$$N_j(h)=N_{j-1}(\tfrac{h}{2})+\frac{N_{j-1}(h/2)-N_{j-1}(h)}{2^{j-1}-1},\qquad M=N_j(h)+O(h^j)$$
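
A minimal Python sketch of building this extrapolation table (not from the original notes; it assumes, as in the general formula above, that the error expansion of `N1` contains all powers of $h$, and the forward-difference example is illustrative):

```python
import math

def richardson(N1, h, m):
    """Return the most extrapolated value N_m(h) built from N1(h), N1(h/2), ..."""
    table = [[N1(h / 2**i) for i in range(m)]]          # column N_1
    for j in range(1, m):
        prev = table[-1]
        # N_{j+1}(h) = N_j(h/2) + (N_j(h/2) - N_j(h)) / (2^j - 1)
        table.append([prev[i + 1] + (prev[i + 1] - prev[i]) / (2**j - 1)
                      for i in range(len(prev) - 1)])
    return table[-1][-1]

# Example: forward-difference approximation of d/dx sin(x) at x = 1
approx = richardson(lambda h: (math.sin(1 + h) - math.sin(1)) / h, 0.1, 4)
print(approx, math.cos(1))
```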

# Numerical Integration

$$\because f(x)=\sum_{i=0}^nf(x_i)L_{n,i}(x)+\frac{f^{(n+1)}(\xi)}{(n+1)!}\prod_{i=0}^n(x-x_i)\qquad\therefore \int_a^bf(x)dx=\sum_{i=0}^na_if(x_i)+\frac{1}{(n+1)!}\int_a^bf^{(n+1)}(\xi)\prod_{i=0}^n(x-x_i)dx,\qquad\text{where}\;a_i=\int_a^bL_{n,i}(x)dx$$

Let $I_n(f)=\sum_{i=0}^na_if(x_i)$ and $E_n(f)=\frac{1}{(n+1)!}\int_a^bf^{(n+1)}(\xi)\prod_{i=0}^n(x-x_i)dx$.

  • When $n=1$, we have the trapezoidal rule:

$$\int_a^bf(x)dx=\frac{b-a}{2}[f(a)+f(b)]-\frac{h^3}{12}f''(\xi),\qquad h=b-a$$

  • When $n=2$, we have Simpson's rule:

$$\int_a^bf(x)dx=\frac{h}{3}\left[f(a)+4f(\tfrac{a+b}{2})+f(b)\right]-\frac{h^5}{90}f^{(4)}(\xi),\qquad h=\frac{b-a}{2}$$

# Degree of Accuracy

Definition: The degree of accuracy (precision) of a quadrature formula is the largest positive integer $n$ such that the formula is exact for $P(x)=x^k$ for every $k=0,1,...,n$.

For example, Simpson's rule is exact for $\int x^3dx$, since $P^{(4)}(x)\equiv 0$ for a cubic and therefore the error term is zero; its degree of accuracy is 3.

# Newton-Cotes formula

Closed Newton-Cotes formula:

$$h=\frac{b-a}{n},\qquad x_0=a,\ x_1=a+h,\ ...,\ x_n=b$$

$$\int_a^bf(x)dx=\begin{cases} \sum_{i=0}^na_if(x_i)+\frac{h^{n+3}f^{(n+2)}(\xi)}{(n+2)!}\int_{0}^{n}t^2(t-1)\cdots(t-n)dt&n\;\text{is even}\\ \sum_{i=0}^na_if(x_i)+\frac{h^{n+2}f^{(n+1)}(\xi)}{(n+1)!}\int_{0}^{n}t(t-1)\cdots(t-n)dt&n\;\text{is odd} \end{cases}\qquad\text{where}\;a_i=\int_a^bL_{n,i}(x)dx$$

  • when $n=1$ we have the trapezoidal rule.

  • when $n=2$ we have Simpson's rule.


Open Newton-Cotes formula:

$$h=\frac{b-a}{n+2},\qquad x_0=a+h,\ x_1=a+2h,\ ...,\ x_n=b-h$$

$$\int_a^bf(x)dx=\begin{cases} \sum_{i=0}^na_if(x_i)+\frac{h^{n+3}f^{(n+2)}(\xi)}{(n+2)!}\int_{-1}^{n+1}t^2(t-1)\cdots(t-n)dt&n\;\text{is even}\\ \sum_{i=0}^na_if(x_i)+\frac{h^{n+2}f^{(n+1)}(\xi)}{(n+1)!}\int_{-1}^{n+1}t(t-1)\cdots(t-n)dt&n\;\text{is odd} \end{cases}\qquad\text{where}\;a_i=\int_a^bL_{n,i}(x)dx$$

  • when $n=0$, we have the Midpoint rule:

$$\int_a^bf(x)dx=2hf(\tfrac{a+b}{2})+\frac{h^3}{3}f''(\xi),\qquad h=(b-a)/2$$

  • when $n=1$:

$$\int_a^bf(x)dx=\frac{3h}{2}[f(a+h)+f(a+2h)]+\frac{3h^3}{4}f''(\xi),\qquad h=(b-a)/3$$

The difference between the open and closed Newton-Cotes formulas is the choice of $x_0,...,x_n$. In the closed case, $x_0=a$, $x_n=b$, $h=\frac{b-a}{n}$ and $x_i=x_0+ih$. In the open case, $x_0=a+h$, $x_n=b-h$, $h=\frac{b-a}{n+2}$ and $x_i=x_0+ih$. So the difference is whether the endpoints $a,b$ are used.

# Composite Rule

The idea is to divide the interval $[a,b]$ into $n$ subintervals:

$$\int_a^bf(x)dx=\sum_{j=0}^{n-1}\int_{x_j}^{x_{j+1}}f(x)dx\approx\sum_{j=0}^{n-1}\frac{x_{j+1}-x_j}{6}\left[f(x_j)+4f(\tfrac{x_j+x_{j+1}}{2})+f(x_{j+1})\right]$$

The formula above is the composite Simpson's rule. The composite trapezoidal rule is obtained in the same way from the trapezoidal rule, and likewise there is a composite midpoint rule, and so on.
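
A minimal Python sketch of the composite Simpson's rule as written above, i.e. Simpson applied on each of the $n$ subintervals using its midpoint (not from the original notes; the sin example is illustrative):

```python
import math

def composite_simpson(f, a, b, n):
    """Composite Simpson's rule with n subintervals."""
    h = (b - a) / n
    total = 0.0
    for j in range(n):
        x0 = a + j * h
        x1 = x0 + h
        total += (h / 6) * (f(x0) + 4 * f((x0 + x1) / 2) + f(x1))
    return total

print(composite_simpson(math.sin, 0, math.pi, 10))   # ~2.0000000
```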

# Adaptive Quadrature Method

The basic idea of the adaptive quadrature method is to estimate the error of Simpson's rule; if the error exceeds the tolerance, we divide the interval into two subintervals and apply Simpson's rule to each.

If we view the interval as a whole ($h=\frac{b-a}{2}$):

$$\int_a^bf(x)dx=\frac{h}{3}\left[f(a)+4f(\tfrac{a+b}{2})+f(b)\right]-\frac{h^5}{90}f^{(4)}(\xi),\qquad \xi\in(a,b)$$

If we divide the interval evenly and apply Simpson's rule on each half (with step $h/2$):

$$\int_a^bf(x)dx=\int_a^{\frac{a+b}{2}}f(x)dx+\int_{\frac{a+b}{2}}^bf(x)dx=\frac{h}{6}\left[f(a)+4f(\tfrac{3a+b}{4})+f(\tfrac{a+b}{2})\right]+\frac{h}{6}\left[f(\tfrac{a+b}{2})+4f(\tfrac{a+3b}{4})+f(b)\right]-\frac{h^5}{90\cdot2^5}\left[f^{(4)}(\xi_1)+f^{(4)}(\xi_2)\right]$$

with $\xi_1\in(a,\frac{a+b}{2})$, $\xi_2\in(\frac{a+b}{2},b)$.


$$\because \frac{f^{(4)}(\xi_1)+f^{(4)}(\xi_2)}{2}=f^{(4)}(\hat{\xi}),\quad\hat\xi\in(a,b)\qquad\therefore \int_a^bf(x)dx=\frac{h}{6}\left[f(a)+4f(\tfrac{3a+b}{4})+2f(\tfrac{a+b}{2})+4f(\tfrac{a+3b}{4})+f(b)\right]-\frac{h^5}{90\cdot2^4}f^{(4)}(\hat{\xi})$$

The error estimate is derived by assuming that $\xi\approx\hat{\xi}$, i.e. $f^{(4)}(\xi)\approx f^{(4)}(\hat{\xi})$.

The success of the technique depends on the accuracy of this assumption! (If $b-a$ is small enough, then $\xi\approx\hat{\xi}$.)

So, writing $S(a,b)=\frac{h}{3}[f(a)+4f(\frac{a+b}{2})+f(b)]$ and $S_2=S(a,\frac{a+b}{2})+S(\frac{a+b}{2},b)$ for the two Simpson approximations, equating them gives the error estimate:

$$S(a,b)-\frac{h^5}{90}f^{(4)}(\xi)\approx S_2-\frac{h^5}{90\cdot2^4}f^{(4)}(\hat{\xi})\quad\Rightarrow\quad \frac{h^5}{90}f^{(4)}(\hat{\xi})\approx\frac{16}{15}\left[S(a,b)-S_2\right]$$

so the error of the halved approximation, $\frac{h^5}{90\cdot2^4}|f^{(4)}(\hat{\xi})|$, is about $\frac{1}{15}|S(a,b)-S_2|$.

The method is to subdivide the interval whenever this error estimate exceeds the tolerance.

# Gaussian Quadrature

Definition: The Legendre polynomials are a collection $\{P_0(x),P_1(x),P_2(x),\dots\}$ such that:

  • for each $n$, $P_n(x)$ is a polynomial of degree $n$.
  • $\int_{-1}^1P(x)P_n(x)dx=0$ if $P(x)$ is a polynomial of degree $<n$.

The first few Legendre polynomials are

$$P_0(x)=1,\quad P_1(x)=x,\quad P_2(x)=x^2-\frac{1}{3},\quad P_3(x)=x^3-\frac{3}{5}x,\quad\dots$$

How to derive a Legendre polynomial is conceptually simple:

$$P_n(x)=x^n+a_1x^{n-1}+...+a_{n-1}x+a_n,\qquad\begin{cases}\int_{-1}^1P_n(x)dx=0\\ \int_{-1}^1xP_n(x)dx=0\\ \int_{-1}^1x^2P_n(x)dx=0\\ \cdots\\ \int_{-1}^1x^{n-1}P_n(x)dx=0\end{cases}$$

Fine, that is not quite so simple after all: we still need to solve this system of equations.

Theorem: Suppose that $x_1,x_2,..,x_n$ are the roots of the $n$th Legendre polynomial $P_n(x)$ and that for each $i=1,2,...,n$ the numbers $c_i$ are defined by

$$c_i=\int_{-1}^1\prod_{j=1,j\neq i}^n\frac{x-x_j}{x_i-x_j}dx$$

If $P(x)$ is any polynomial of degree $<2n$, then

$$\int_{-1}^1P(x)dx=\sum_{i=1}^nc_iP(x_i)$$
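
A minimal Python sketch of Gauss-Legendre quadrature on $[-1,1]$ (not from the original notes; it uses NumPy's `leggauss` to supply the Legendre roots $x_i$ and weights $c_i$ rather than deriving them by hand):

```python
import numpy as np

def gauss_legendre(f, n):
    """n-point Gauss-Legendre quadrature: sum of c_i * f(x_i) over the roots of P_n."""
    xs, cs = np.polynomial.legendre.leggauss(n)   # nodes and weights on [-1, 1]
    return float(np.sum(cs * f(xs)))

# A 3-point rule integrates any polynomial of degree < 6 exactly
print(gauss_legendre(lambda x: x**4, 3), 2 / 5)   # both 0.4
```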

# Chapter 5: Initial-Value Problems for Ordinary Differential Equations

Problem: $\frac{dy}{dt}=f(t,y)$, $a\leq t\leq b$, subject to $y(a)=\alpha$.

Definition: A function $f(t,y)$ is said to satisfy a Lipschitz condition in the variable $y$ on a set $D\subset\mathbb{R}^2$ if a constant $L>0$ exists with the property that

$$|f(t,y_1)-f(t,y_2)|\leq L|y_1-y_2|$$

whenever $(t,y_1),(t,y_2)\in D$. The constant $L$ is called a Lipschitz constant for $f$.

Definition: A set $D\subset\mathbb{R}^2$ is said to be convex if whenever

$$(t_1,y_1),(t_2,y_2)\in D$$

then for any $\lambda\in[0,1]$, $((1-\lambda)t_1+\lambda t_2,\ (1-\lambda)y_1+\lambda y_2)\in D$.

Theorem: Suppose $f(t,y)$ is defined on a convex set $D\subset\mathbb{R}^2$. If a constant $L$ exists with

$$\left|\frac{\partial f}{\partial y}(t,y)\right|\leq L$$

for all $(t,y)\in D$, then $f$ satisfies a Lipschitz condition on $D$ in the variable $y$ with Lipschitz constant $L$.

Definition: The initial-value problem

$$\begin{cases}y'(t)=f(t,y)\\y(a)=\alpha\end{cases}$$

is a well-posed problem if

  • a unique solution $y(t)$ exists,

  • for any $\varepsilon>0$, there exists a positive constant $k(\varepsilon)$ with the property that, whenever $|\varepsilon_0|<\varepsilon$ and $\delta(t)$ is continuous with $|\delta(t)|<\varepsilon$ on $[a,b]$, a unique solution $z(t)$ to the problem

    $$\begin{cases}z'(t)=f(t,z)+\delta(t)\\z(a)=\alpha+\varepsilon_0\end{cases}$$

    exists with $|z(t)-y(t)|<k(\varepsilon)\varepsilon$.

Theorem: Suppose that $D=\{(t,y)\mid a\leq t\leq b,\ -\infty<y<\infty\}$ and $f(t,y)$ is continuous on $D$. If $f$ satisfies a Lipschitz condition on $D$ in the variable $y$, then the problem:

$$\begin{cases}y'(t)=f(t,y)\\y(a)=\alpha\end{cases}$$

has a unique solution $y(t)$ and is well-posed.

# Euler’s Method

To solve a well-posed initial-value problem:

$$\begin{cases}y'(t)=f(t,y)\\y(a)=\alpha\end{cases}$$

first we construct the mesh points in the interval $[a,b]$ with equal spacing:

$$a=t_0<t_1<...<t_N=b$$

with step size $h=(b-a)/N$.

By the Taylor formula, we have:

$$y(t_{j+1})=y(t_j)+hy'(t_j)+\frac{h^2}{2}y''(\xi_j)=y(t_j)+hf(t_j,y(t_j))+\frac{h^2}{2}y''(\xi_j)$$

So, dropping the remainder term, we can generate approximations of $y(t_1),y(t_2),...,y(t_N)$ from $y(t_0)=\alpha$.

Then we can construct a Lagrange polynomial through $(t_0,y(t_0)),...,(t_N,y(t_N))$.
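
A minimal Python sketch of Euler's method (not from the original notes; the test problem $y'=y-t^2+1$, $y(0)=0.5$ is an illustrative example whose exact solution is $y(t)=(t+1)^2-0.5e^t$):

```python
def euler(f, a, b, alpha, N):
    """Euler's method for y' = f(t, y), y(a) = alpha, on N equal steps."""
    h = (b - a) / N
    t, w = a, alpha
    ts, ws = [t], [w]
    for _ in range(N):
        w = w + h * f(t, w)          # w_{j+1} = w_j + h f(t_j, w_j)
        t = t + h
        ts.append(t)
        ws.append(w)
    return ts, ws

ts, ws = euler(lambda t, y: y - t*t + 1, 0.0, 2.0, 10)
print(ws[-1])                        # approximation of y(2)
```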

Lemma: If $s$ and $t$ are positive real numbers and $\{a_j\}_{j=0}^k$ is a sequence satisfying $a_0\geq-\frac{t}{s}$ and

$$a_{j+1}\leq (1+s)a_j+t,\qquad j=0,1,...,k$$

then

$$a_{j+1}\leq e^{(j+1)s}\left(a_0+\frac{t}{s}\right)-\frac{t}{s}$$

Theorem: Suppose $f$ is continuous and satisfies a Lipschitz condition with constant $L$ on $D$, and that $|y''(t)|\leq M$. Let $y^*(t)$ be the exact solution of the problem and $y(t_j)$ the values generated by Euler's method; then:

$$|y^*(t_j)-y(t_j)|\leq\frac{hM}{2L}\left[e^{L(t_j-a)}-1\right]$$


# Improved Euler Method (Predict-Correct)

We can generate $y(t_0),...,y(t_n)$ using Euler's method, and correct them using the trapezoidal rule:

$$\frac{y(t_{i+1})-y(t_i)}{h}\approx\frac{y'(t_{i+1})+y'(t_i)}{2}\quad\Rightarrow\quad \hat{y}(t_{i+1})=\frac{h}{2}\left[f(t_{i+1},y(t_{i+1}))+f(t_i,\hat{y}(t_i))\right]+\hat{y}(t_i)$$

where $y(t_{i+1})$ is generated by Euler's method and $\hat{y}(t_{i+1})$ is the corrected value.

# Higher-Order Taylor Methods

Definition: The local truncation error in Euler's method is defined by

$$\tau_{i+1}(h)=\frac{y(t_{i+1})-(y(t_i)+hf(t_i,y_i))}{h}=\frac{h}{2}y''(\xi_i)$$

Taylor method of order $n$:

$$y(t_0)=\alpha,\qquad y(t_{i+1})=y(t_{i})+hT^{(n)}(t_i,y(t_i))$$

$$\text{where}\;T^{(n)}(t_i,y(t_i))=f(t_i,y(t_i))+\frac{h}{2}f'(t_i,y(t_i))+\cdots+\frac{h^{n-1}}{n!}f^{(n-1)}(t_i,y(t_i))$$

Euler’s Method is the Taylor method of order 1.

# Runge-Kutta Method

$$\begin{cases}y'(t)=f(t,y)\\y(a)=\alpha\end{cases}\qquad a=t_0<t_1<...<t_N=b$$

The Runge-Kutta method computes:

$$y(t_{n+1})=y(t_n)+\sum_{i=1}^Nc_iK_i$$

where

$$K_1=hf(t_n,y_n),\qquad K_i=hf\Big(t_n+\alpha_ih,\ y_n+\sum_{j=1}^{i-1}b_{ij}K_j\Big)$$

The coefficients are determined by Taylor's theorem and error analysis.

When $N=4$ (the classical fourth-order Runge-Kutta method):

$$y_0=y(a)=\alpha\\ K_1=hf(t_i,y_i)\\ K_2=hf(t_i+\tfrac{h}{2},y_i+\tfrac{1}{2}K_1)\\ K_3=hf(t_i+\tfrac{h}{2},y_i+\tfrac{1}{2}K_2)\\ K_4=hf(t_{i+1},y_i+K_3)\\ y(t_{i+1})=y(t_i)+\frac{1}{6}(K_1+2K_2+2K_3+K_4)$$
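
A minimal Python sketch of the classical RK4 scheme above (not from the original notes; it reuses the same illustrative test problem as the Euler sketch):

```python
def rk4(f, a, b, alpha, N):
    """Classical fourth-order Runge-Kutta method for y' = f(t, y), y(a) = alpha."""
    h = (b - a) / N
    t, y = a, alpha
    for _ in range(N):
        K1 = h * f(t, y)
        K2 = h * f(t + h / 2, y + K1 / 2)
        K3 = h * f(t + h / 2, y + K2 / 2)
        K4 = h * f(t + h, y + K3)
        y = y + (K1 + 2 * K2 + 2 * K3 + K4) / 6
        t = t + h
    return y

print(rk4(lambda t, y: y - t*t + 1, 0.0, 2.0, 10))   # much closer to the exact y(2)
```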

# Adams Fourth-Order Predictor-Corrector

First we need to compute $y(t_0),y(t_1),y(t_2),y(t_3)$ by the Runge-Kutta method.

Then we can combine the implicit and the explicit methods:

  • Use the four-step Adams-Bashforth method as predictor:

    $$\hat{y}(t_{n+1})=y(t_n)+\frac{h}{24}\left[55f(t_n,y(t_{n}))-59f(t_{n-1},y(t_{n-1}))+37f(t_{n-2},y(t_{n-2}))-9f(t_{n-3},y(t_{n-3}))\right]$$

    This is the predictor for $y(t_{n+1})$, an explicit method.

  • Then use the three-step Adams-Moulton method as corrector:

    $$y(t_{n+1})=y(t_n)+\frac{h}{24}\left[9f(t_{n+1},\hat{y}(t_{n+1}))+19f(t_n,y(t_n))-5f(t_{n-1},y(t_{n-1}))+f(t_{n-2},y(t_{n-2}))\right]$$

    This is the corrector for $y(t_{n+1})$, an implicit method in which $y(t_{n+1})$ on the right-hand side is replaced by the predictor $\hat{y}(t_{n+1})$.

# Chapter 6: Systems of Linear Equations

The Linear System of Equations (LSEs)

# Gaussian Elimination

  • Maximal Row Pivoting Technique:

    choose $\max|a_{jk}|,\ k\leq j\leq n$ as the pivot element,

    i.e. the element of largest absolute value in the $k$-th column (rows $k$ through $n$).

  • Partial Pivoting Technique:

    choose $\max|a_{ij}|,\ k\leq i,j\leq n$ as the pivot element, which means that the pivot element is not necessarily in the $k$-th column.

  • Scaled Partial Pivoting Technique:

    compute the scale factor for each row:

    $$s_i=\max_{1\leq j\leq n}|a_{ij}|$$

    Then choose the index $p$ satisfying:

    $$\frac{|a_{p1}|}{s_p}=\max_{1\leq i\leq n}\frac{|a_{i1}|}{s_i}$$

    and take $a_{p1}$ as the pivot element.

# Matrix Factorization

$$\textbf{Ax}=\textbf{b},\qquad \textbf{A}=\textbf{LU}$$

where

$$\textbf{L}=\begin{pmatrix}1&0&0&\cdots&0\\ l_{21}&1&0&\cdots&0\\ l_{31}&l_{32}&1&\cdots&0\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ l_{n1}&l_{n2}&l_{n3}&\cdots&1 \end{pmatrix},\qquad \textbf{U}=\begin{pmatrix} u_{11}&u_{12}&u_{13}&\cdots&u_{1n}\\ 0&u_{22}&u_{23}&\cdots&u_{2n}\\ 0&0&u_{33}&\cdots&u_{3n}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&0&\cdots&u_{nn} \end{pmatrix}$$

Then the LSE can be written as $\textbf{LUx}=\textbf{b}$, and we can solve:

$$\begin{cases}\textbf{Ly}=\textbf{b}\\ \textbf{Ux}=\textbf{y}\end{cases}$$

The process of factorization is essentially the same as Gaussian elimination.

Consider each step of Gaussian elimination:

  • swap row $i$ and row $j$;
  • multiply row $i$ by $x$ and add it to row $j$.

Both operations can be described by matrices, e.g.:

$$M=\begin{pmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{pmatrix}$$

then $B=MA$ is the result of swapping the first two rows of $A$.

$$M=\begin{pmatrix} 1&0&0&0\\ 2&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{pmatrix}$$

then $B=MA$ is the result of adding $2\times$ row 1 to row 2.

So we have:

$$M_{m}M_{m-1}\cdots M_1A=U$$

which is exactly Gaussian elimination.

So $A=LU=M_1^{-1}\cdots M_m^{-1}U$, i.e. $L=M_1^{-1}\cdots M_m^{-1}$.

# Doolittle Factorization

The method is to apply the LU factorization directly.

First select

$$\begin{cases} u_{1j}=a_{1j}&1\leq j\leq n\\ l_{i1}=a_{i1}/u_{11}&1\leq i\leq n \end{cases}$$

If $a_{11}=0$, then the factorization in this form is impossible.

Then compute

$$\begin{cases} u_{ij}=a_{ij}-\sum_{k=1}^{i-1}l_{ik}u_{kj}&j\geq i\\ l_{ij}=\Big(a_{ij}-\sum_{k=1}^{j-1}l_{ik}u_{kj}\Big)/u_{jj}&i\geq j+1 \end{cases}$$

Doolittle factorization works only when no row swaps are needed in Gaussian elimination.
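
A minimal Python sketch of the Doolittle formulas above, without pivoting (not from the original notes; the 2×2 example matrix is illustrative):

```python
def doolittle(A):
    """Doolittle LU factorization: returns L with unit diagonal and upper triangular U."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 1.0
        for j in range(i, n):          # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):      # column i of L
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

A = [[4.0, 3.0], [6.0, 3.0]]
L, U = doolittle(A)
print(L, U)   # L = [[1, 0], [1.5, 1]], U = [[4, 3], [0, -1.5]]
```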

# Special types of matrices

Definition: The $n\times n$ matrix $A$ is said to be strictly diagonally dominant when

$$|a_{ii}|>\sum_{j=1,j\neq i}^n|a_{ij}|\qquad\text{for every}\ i=1,\dots,n$$

Theorem: A strictly diagonally dominant matrix $A$ is nonsingular.

In this case, Gaussian elimination can be performed on any linear system of the form $\textbf{Ax}=\textbf{b}$ to obtain its unique solution without row or column interchanges.

Definition: A matrix $A$ is positive definite if it is symmetric and $x^TAx>0$ for every nonzero column vector $x$.

Theorem: If $A$ is an $n\times n$ positive definite matrix, then

  • $A$ is nonsingular;
  • $a_{ii}>0$ for all $1\leq i\leq n$;
  • $a_{ij}^2<a_{ii}a_{jj}$ for each $i\neq j$.

Theorem: A symmetric matrix $A$ is positive definite if and only if each of its leading principal submatrices has a positive determinant.

Corollary: The matrix $A$ is positive definite if and only if $A$ can be factored in the form $LDL^T$, where $L$ is lower triangular with 1s on its diagonal and $D$ is a diagonal matrix with positive diagonal entries.

Corollary: The matrix $A$ is positive definite if and only if $A$ can be factored in the form $LL^T$, where $L$ is lower triangular with nonzero diagonal entries.

# Choleski’s Method

The method is to factor a positive definite matrix as $A=LL^T$.

First set $l_{11}=\sqrt{a_{11}}$.

For $j=2,...,n$, set $l_{j1}=a_{1j}/l_{11}$.

Then for $i=2,...,n-1$ and $j\geq i+1$:

$$l_{ii}=\sqrt{a_{ii}-\sum_{k=1}^{i-1}l_{ik}^2},\qquad l_{ji}=\Big(a_{ij}-\sum_{k=1}^{i-1}l_{ik}l_{jk}\Big)/l_{ii}$$

Finally set $l_{nn}=\sqrt{a_{nn}-\sum_{k=1}^{n-1}l_{nk}^2}$.


If we need to factor $A=LDL^T$ instead:

First set $d_{11}=a_{11}$.

For $j=2,...,n$, set $l_{j1}=a_{1j}/d_{11}$.

For $i=2,...,n$ and $j\geq i+1$:

$$d_{ii}=a_{ii}-\sum_{k=1}^{i-1}l_{ik}^2d_{kk},\qquad l_{ji}=\Big(a_{ij}-\sum_{k=1}^{i-1}l_{ik}l_{jk}d_{kk}\Big)/d_{ii}$$
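
A minimal Python sketch of the $LL^T$ (Choleski) factorization above (not from the original notes; the 2×2 positive definite example matrix is illustrative):

```python
import math

def cholesky(A):
    """Choleski factorization A = L L^T for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)    # diagonal entry
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]   # below-diagonal entry
    return L

A = [[4.0, 2.0], [2.0, 3.0]]
print(cholesky(A))   # [[2.0, 0.0], [1.0, 1.4142...]]
```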

# Chapter 7: Iterative Techniques for Solving Linear Systems

# Norms of Vectors and Matrices

Definition: A vector norm on $\mathbb{R}^n$ is a function $||\cdot||$ from $\mathbb{R}^n$ into $\mathbb{R}$ with the following properties:

  • $||\textbf{x}||\geq 0$ for all $\textbf{x}\in\mathbb{R}^n$.
  • $||\textbf{x}||=0$ if and only if $\textbf{x}=(0,0,...,0)^T$.
  • $||\alpha\textbf{x}||=|\alpha|\cdot||\textbf{x}||$ for all $\alpha\in\mathbb{R}$ and $\textbf{x}\in\mathbb{R}^n$.
  • $||\textbf{x}+\textbf{y}||\leq ||\textbf{x}||+||\textbf{y}||$ for all $\textbf{x},\textbf{y}\in\mathbb{R}^n$.

Definition: The $l_2$ and $l_{\infty}$ norms for vectors are defined by

$$||\textbf{x}||_2=\sqrt{\sum_{i=1}^n|x_i|^2},\qquad ||\textbf{x}||_{\infty}=\max_{1\leq i\leq n}|x_i|$$

The $l_2$ norm is called the Euclidean norm.

Definition: The $l_2$ and $l_\infty$ distances between $\textbf{x}$ and $\textbf{y}$ are defined by

$$||\textbf{x}-\textbf{y}||_2=\sqrt{\sum_{i=1}^n|x_i-y_i|^2},\qquad ||\textbf{x}-\textbf{y}||_\infty=\max_{1\leq i\leq n}|x_i-y_i|$$

Definition: A sequence $\{\textbf{x}_n\}_{0}^\infty$ of vectors is said to converge to $\textbf{x}$ with respect to the norm $||\cdot||$ if:

$$\lim_{n\rightarrow \infty}||\textbf{x}_n-\textbf{x}||=0$$

Theorem: $||\textbf{x}||_\infty\leq ||\textbf{x}||_2\leq\sqrt{n}||\textbf{x}||_\infty$


Definition: A matrix norm on the set of all $n\times n$ matrices is a real-valued function satisfying:

  • $||\textbf{A}||\geq 0$
  • $||\textbf{A}||=0$ if and only if $\textbf{A}$ is the zero matrix
  • $||\alpha\textbf{A}||=|\alpha|\cdot||\textbf{A}||$
  • $||\textbf{A}+\textbf{B}||\leq||\textbf{A}||+||\textbf{B}||$
  • $||\textbf{AB}||\leq||\textbf{A}||\cdot||\textbf{B}||$

An example of a matrix norm is:

$$||\textbf{A}||=\max_{||\textbf{x}||=1}||\textbf{Ax}||$$

This is called the natural matrix norm.

Corollary: $||\textbf{Ax}||\leq||\textbf{A}||\cdot||\textbf{x}||$ when the norm is a natural matrix norm.

Definition: The $l_2$ and $l_{\infty}$ norms for matrices are defined by

$$||\textbf{A}||_\infty=\max_{||\textbf{x}||_\infty=1}||\textbf{Ax}||_\infty,\qquad ||\textbf{A}||_2=\max_{||\textbf{x}||_2=1}||\textbf{Ax}||_2$$

Theorem: $||\textbf{A}||_\infty=\max_{1\leq i\leq n}\sum_{j=1}^n|a_{ij}|$.

# Eigenvalues and Eigenvectors

Definition: If $\textbf{A}$ is a square matrix, the characteristic polynomial of $\textbf{A}$ is defined by:

$$p(\lambda)=\det(\textbf{A}-\lambda\textbf{I})$$

Definition: The spectral radius $\rho(\textbf{A})$ of a matrix $\textbf{A}$ is defined by

$$\rho(\textbf{A})=\max|\lambda|$$

where $\lambda$ ranges over the eigenvalues of $\textbf{A}$.

Theorem: $||\textbf{A}||_2=\sqrt{\rho(\textbf{A}^T\textbf{A})}$, and $\rho(\textbf{A})\leq||\textbf{A}||$ for any natural norm.

Definition: We say a matrix $A$ is convergent if

$$\lim_{n\rightarrow\infty}A^n=0$$

Theorem: The following statements are equivalent:

  • $\textbf{A}$ is a convergent matrix
  • $\lim_{n\rightarrow\infty}||\textbf{A}^n||=0$ for some natural norm
  • $\lim_{n\rightarrow\infty}||\textbf{A}^n||=0$ for all natural norms
  • $\rho(\textbf{A})<1$
  • $\lim_{n\rightarrow\infty}\textbf{A}^n\textbf{x}=\textbf{0}$ for every $\textbf{x}$

# Iterative techniques for solving linear system

First we split the matrix $A$ into three parts:

  • the diagonal matrix $\textbf{D}$;
  • the strictly lower-triangular part $-\textbf{L}$;
  • the strictly upper-triangular part $-\textbf{U}$;

so that $\textbf{A}=\textbf{D}-\textbf{L}-\textbf{U}$. Then

$$\textbf{Ax}=\textbf{b}\Leftrightarrow \textbf{Dx}=(\textbf{L}+\textbf{U})\textbf{x}+\textbf{b}\qquad\therefore\ \textbf{x}_{n+1}=\textbf{D}^{-1}(\textbf{L}+\textbf{U})\textbf{x}_n+\textbf{D}^{-1}\textbf{b}$$

This is the matrix form of the Jacobi iterative technique; we repeat this iteration.


Gauss-Seidel method:

$$\textbf{x}_{n+1}=(\textbf{D}-\textbf{L})^{-1}\textbf{U}\textbf{x}_n+(\textbf{D}-\textbf{L})^{-1}\textbf{b}$$
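
A minimal NumPy sketch of both iterations in the matrix form above (not from the original notes; the strictly diagonally dominant 3×3 system is an illustrative example, and a fixed iteration count stands in for a convergence test):

```python
import numpy as np

def jacobi(A, b, x0, iters=50):
    """Jacobi iteration x_{n+1} = D^{-1}((L+U) x_n + b), with A = D - L - U."""
    D = np.diag(np.diag(A))
    LU = D - A                        # L + U
    x = x0.astype(float)
    for _ in range(iters):
        x = np.linalg.solve(D, LU @ x + b)
    return x

def gauss_seidel(A, b, x0, iters=50):
    """Gauss-Seidel iteration x_{n+1} = (D - L)^{-1}(U x_n + b)."""
    DL = np.tril(A)                   # D - L
    U = DL - A                        # U
    x = x0.astype(float)
    for _ in range(iters):
        x = np.linalg.solve(DL, U @ x + b)
    return x

A = np.array([[10.0, -1.0, 2.0], [-1.0, 11.0, -1.0], [2.0, -1.0, 10.0]])
b = np.array([6.0, 25.0, -11.0])
print(jacobi(A, b, np.zeros(3)))
print(gauss_seidel(A, b, np.zeros(3)))
```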


Lemma: If the spectral radius $\rho(\textbf{T})$ satisfies $\rho(\textbf{T})<1$, then

$$(\textbf{I}-\textbf{T})^{-1}=\textbf{I}+\textbf{T}+\textbf{T}^2+\cdots$$

Theorem: The iteration $\textbf{x}_{n+1}=\textbf{Tx}_n+\textbf{c}$ converges to the unique solution of $\textbf{x}=\textbf{Tx}+\textbf{c}$ if and only if $\rho(\textbf{T})<1$.

Corollary: If $||\textbf{T}||<1$ for any one natural matrix norm and $\textbf{c}$ is a given vector, then the iteration $\textbf{x}_{n+1}=\textbf{Tx}_n+\textbf{c}$ converges, and the following error bound holds:

  • $||\textbf{x}_n-\textbf{x}||\leq ||\textbf{T}||^n||\textbf{x}_0-\textbf{x}||$

# SOR Technique

$$\omega\textbf{Ax}=\omega\textbf{b}\ \Rightarrow\ (\textbf{D}-\omega\textbf{L})\textbf{x}_{n+1}=[(1-\omega)\textbf{D}+\omega\textbf{U}]\textbf{x}_n+\omega\textbf{b}$$

$$\textbf{T}_\omega=(\textbf{D}-\omega\textbf{L})^{-1}[(1-\omega)\textbf{D}+\omega\textbf{U}],\qquad \textbf{c}_\omega=\omega(\textbf{D}-\omega\textbf{L})^{-1}\textbf{b}\qquad\therefore\ \textbf{x}_{n+1}=\textbf{T}_\omega\textbf{x}_n+\textbf{c}_\omega$$

Theorem (Kahan): $\rho(\textbf{T}_\omega)\geq|\omega-1|$.

So the SOR method can converge only if $0<\omega<2$.

For positive definite tridiagonal matrices, the optimal choice of $\omega$ for the SOR method is

$$\omega=\frac{2}{1+\sqrt{1-[\rho(\textbf{D}^{-1}(\textbf{L}+\textbf{U}))]^2}}$$

and then $\rho(\textbf{T}_\omega)=\omega-1$.

# Chapter 8: Approximation Theory

# Discrete Least Squares Approximation

  • Minimax Rule:

    $$E_1=\min_{a_0,a_1}\max_{1\leq i\leq 10}|y_i-(a_1x_i+a_0)|$$

  • Absolute Deviation Rule:

    $$E_2=\min_{a_0,a_1}\sum_{i=1}^{10}|y_i-(a_1x_i+a_0)|$$

  • Least Squares Rule:

    $$E = \min_{a_0,a_1}\sum_{i=1}^{10}[y_i-(a_1x_i+a_0)]^2$$

    for which we have

    $$\begin{cases}\frac{\partial E}{\partial a_0}=\sum_{i=1}^m2(y_i-a_1x_i-a_0)(-1)=0\\ \frac{\partial E}{\partial a_1}=\sum_{i=1}^m2(y_i-a_1x_i-a_0)(-x_i)=0\end{cases}$$


The general form of the discrete least squares rule:

$$\min_{a_0,a_1,...,a_n}E=\sum_{i=1}^m\Big(y_i-\sum_{k=0}^na_kx_i^k\Big)^2,\qquad \frac{\partial E}{\partial a_j}=0\ \Rightarrow\ \sum_{k=0}^na_k\sum_{i=1}^m x_i^{j+k}=\sum_{i=1}^my_ix_i^j,\quad j=0,1,...,n$$

Let

$$\textbf{R}=\begin{pmatrix}x_1^0&x_1^1&\cdots&x_1^{n-1}&x_1^n\\ x_2^0&x_2^{1}&\cdots&x_2^{n-1}&x_2^n\\ \vdots&\vdots& &\vdots&\vdots\\ x_m^0&x_m^{1}&\cdots&x_m^{n-1}&x_m^n \end{pmatrix},\qquad \textbf{a}=\begin{pmatrix}a_0\\a_1\\\vdots\\a_n\end{pmatrix},\qquad \textbf{y}=\begin{pmatrix}y_1\\y_2\\\vdots\\y_m\end{pmatrix}$$

Then the normal equations can be written as

$$\textbf{R}^T\textbf{R}\textbf{a}=\textbf{R}^T\textbf{y}$$


If we can evaluate the function itself, then the continuous least squares rule is:

$$\min_{a_0,...,a_n}E=\int_a^b[f(x)-P_n(x)]^2dx=\int_{a}^b\Big[f(x)-\sum_{k=0}^na_kx^k\Big]^2dx$$

Definition: The set of functions $\{\phi_0,\phi_1,...,\phi_n\}$ is said to be linearly independent on $[a,b]$ if

$$c_0\phi_0(x)+c_1\phi_1(x)+...+c_n\phi_n(x)=0\ \ \text{for all}\ x\in[a,b]\quad\Rightarrow\quad c_0=c_1=...=c_n=0$$

Theorem: If $\phi_j(x)$ is a polynomial of degree $j$ for each $j$, then $\{\phi_0,...,\phi_n\}$ is linearly independent on any interval.

# Weight function and Orthogonal Set of functions

Generalizing the least squares rule, the error can be weighted:

$$E=\int_a^bw(x)\Big[f(x)-\sum_{k=0}^na_k\phi_k(x)\Big]^2dx$$

where $w(x)$ is the weight function. The normal equations can be written as

$$0=\frac{\partial E}{\partial a_j}\ \Rightarrow\ \sum_{k=0}^na_k\int_a^bw(x)\phi_k(x)\phi_j(x)dx=\int_a^bw(x)f(x)\phi_j(x)dx,\qquad j=0,1,...,n$$

We can choose the function set $\{\phi_0,...,\phi_n\}$ to satisfy:

$$\int_a^bw(x)\phi_k(x)\phi_j(x)dx=\begin{cases}0&j\neq k\\ \alpha_j>0&j=k\end{cases}$$

so the coefficients can be solved easily:

$$a_j=\frac{1}{\alpha_j}\int_a^bw(x)f(x)\phi_j(x)dx$$

Definition: $\{\phi_0,..,\phi_n\}$ is said to be an orthogonal set of functions with respect to the weight function $w(x)$ if

$$\int_a^bw(x)\phi_j(x)\phi_k(x)dx=\begin{cases}0&j\neq k \\ \alpha_k>0&j=k\end{cases}$$

Theorem (Gram-Schmidt orthogonalization):

$$B_k=\frac{\int_a^bxw(x)[\phi_{k-1}(x)]^2dx}{\int_a^bw(x)[\phi_{k-1}(x)]^2dx},\qquad C_k=\frac{\int_a^bxw(x)\phi_{k-1}(x)\phi_{k-2}(x)dx}{\int_a^bw(x)[\phi_{k-2}(x)]^2dx}$$

$$\begin{cases}\phi_0(x)=1\\ \phi_1(x)=x-B_1 \\ \cdots\\ \phi_k(x)=(x-B_k)\phi_{k-1}(x)-C_k\phi_{k-2}(x) \end{cases}$$

This generates a set of orthogonal polynomial functions.

When $w(x)=1$ on $[-1,1]$, this generates the Legendre polynomials.

# Chebyshev Polynomials

$$T_n(x)=\cos[n\arccos (x)],\quad x\in[-1,1],\qquad T_0(x)=1$$

Writing $\theta=\arccos(x)$, we have $T_n(\theta(x))=\cos(n\theta)$ and

$$T_{n+1}(\theta)=\cos[(n+1)\theta]=2\cos(n\theta)\cos\theta-\cos[(n-1)\theta]$$

so

$$T_0(x)=1,\quad T_1(x)=x,\qquad T_{n+1}(x)=2xT_n(x)-T_{n-1}(x)$$

Theorem: The Chebyshev polynomials are orthogonal with respect to the weight function

$$w(x)=\frac{1}{\sqrt{1-x^2}}$$

Theorem: Zeros:

$$\bar{x}_k=\cos\Big(\frac{2k-1}{2n}\pi\Big),\quad k=1,2,...,n,\qquad T_n(\bar{x}_k)=0$$

Extreme points:

$$\bar{x}_k'=\cos\Big(\frac{k\pi}{n}\Big),\quad k=0,1,...,n,\qquad T_n(\bar{x}_k')=(-1)^k,\quad T_n'(\bar{x}_k')=0\ \ (k=1,...,n-1)$$


What's more, we can normalize the Chebyshev polynomials to ones with leading coefficient 1:

$$\tilde{T}_n(x)=\frac{1}{2^{n-1}}T_n(x)$$

Let $\tilde{\prod}_n$ denote the set of all polynomials of degree $n$ with leading coefficient 1; then:

Theorem:

$$\forall P_n(x)\in\tilde{\prod}_n,\qquad\frac{1}{2^{n-1}}=\max_{x\in[-1,1]}|\tilde{T}_n(x)|\leq\max_{x\in[-1,1]}|P_n(x)|$$

Equality can occur only if $P_n\equiv\tilde{T}_n$.

Corollary: If $P(x)$ is the interpolating polynomial of degree at most $n$ with nodes at the roots of $T_{n+1}(x)$, then

$$\max_{x\in[-1,1]}|f(x)-P(x)|\leq\frac{1}{2^n(n+1)!}\max_{x\in[-1,1]}|f^{(n+1)}(x)|$$

# Chapter 9: Eigenvalues and Eigenvectors

Definition: A left eigenvector is defined by

$$\textbf{y}^T\textbf{A}=\lambda\textbf{y}^T$$

Definition: The spectrum of $\textbf{A}$ is the set of eigenvalues of $\textbf{A}$, denoted by $\lambda(\textbf{A})$.

Definition: The spectral radius of $\textbf{A}$ is:

$$\rho(\textbf{A})=\max\{|\lambda| : \lambda\in\lambda(\textbf{A})\}$$

Theorem: If $\lambda_1,...,\lambda_k$ are distinct eigenvalues with associated eigenvectors

$$\textbf{x}^{(1)},...,\textbf{x}^{(k)}$$

then the set of eigenvectors is linearly independent.

Definition: Orthogonality requires $\textbf{v}_i\cdot\textbf{v}_j=c\,\delta_{ij}$; when $c=1$, the set is orthonormal.


Theorem (Gerschgorin Circle Theorem): Let $\textbf{A}$ be an $n\times n$ matrix and let $R_i$ denote the disc in the complex plane with center $a_{ii}$:

$$R_i=\Big\{z\in\mathbb{C}\ \Big|\ |z-a_{ii}|\leq\sum_{j=1,j\neq i}^n|a_{ij}|\Big\}$$

Then the eigenvalues of $\textbf{A}$ are contained in $R=\cup_{i=1}^nR_i$.

Moreover, the union of any $k$ of these circles that do not intersect the remaining $(n-k)$ circles contains precisely $k$ (counting multiplicities) of the eigenvalues.

# Computing Eigenvalues and Eigenvectors

Iterative Power Method

Assume that the matrix $\textbf{A}$ has $n$ eigenvalues

$$|\lambda_1|>|\lambda_2|\geq|\lambda_3|\geq...\geq|\lambda_n|$$

with an associated collection of linearly independent eigenvectors

$$\{\textbf{v}^{(1)},...,\textbf{v}^{(n)}\}$$

If $\textbf{x}$ is any vector in $\mathbb{R}^n$, then constants $\beta_1,...,\beta_n$ exist with

$$\textbf{x}=\sum_{j=1}^n\beta_j\textbf{v}^{(j)}$$

So we have

$$\textbf{A}^k\textbf{x}=\lambda_1^k\sum_{j=1}^n\beta_j\Big(\frac{\lambda_j}{\lambda_1}\Big)^k\textbf{v}^{(j)}=\lambda_1^k\Big(\beta_1\textbf{v}^{(1)}+\sum_{j=2}^n\beta_j\Big(\frac{\lambda_j}{\lambda_1}\Big)^k\textbf{v}^{(j)}\Big)$$

$$\because \lim_{k\rightarrow\infty}\Big(\frac{\lambda_j}{\lambda_1}\Big)^k=0\ (j\geq2)\qquad\therefore\lim_{k\rightarrow\infty}\textbf{A}^k\textbf{x}=\lim_{k\rightarrow\infty}\lambda_1^k\beta_1\textbf{v}^{(1)}$$

However, this limit may converge to zero or diverge, so in practice the vectors are rescaled at every step, as follows.

# The power method

  • Choose an arbitrary unit vector $\textbf{x}^{(0)}$ relative to $||\cdot||_{\infty}$.

  • Suppose a component $x_{p_0}^{(0)}$ of $\textbf{x}^{(0)}$ satisfies

    $$x_{p_0}^{(0)}=1=||\textbf{x}^{(0)}||_\infty$$

  • Let $\textbf{y}^{(1)}=\textbf{Ax}^{(0)}$, and define

    $$\mu^{(1)}=y_{p_0}^{(1)}=\frac{y_{p_0}^{(1)}}{x_{p_0}^{(0)}} =\frac{\beta_1\lambda_1v_{p_0}^{(1)}+\sum_{j=2}^n\beta_j\lambda_jv_{p_0}^{(j)}}{\beta_1v_{p_0}^{(1)}+\sum_{j=2}^n\beta_jv_{p_0}^{(j)}}=\lambda_1\left[\frac{\beta_1v_{p_0}^{(1)}+\sum_{j=2}^n\beta_j(\lambda_j/\lambda_1)v_{p_0}^{(j)}}{\beta_1v_{p_0}^{(1)}+\sum_{j=2}^n\beta_jv_{p_0}^{(j)}}\right]$$

    where the $\textbf{v}^{(j)}$ are the eigenvectors and the $\beta_j$ are coefficients satisfying:

    $$\textbf{x}^{(0)}=\sum_{j=1}^n\beta_j\textbf{v}^{(j)}$$

  • Let $p_1$ be the least integer such that $|y_{p_1}^{(1)}|=||\textbf{y}^{(1)}||_\infty$.

    Define $\textbf{x}^{(1)}$ by

    $$\textbf{x}^{(1)}=\frac{\textbf{Ax}^{(0)}}{y_{p_1}^{(1)}}$$

  • Similarly, define

    $$\textbf{y}^{(m)}=\textbf{Ax}^{(m-1)},\qquad \mu^{(m)}=y_{p_{m-1}}^{(m)}=\lambda_1\left[\frac{\beta_1v_{p_{m-1}}^{(1)}+\sum_{j=2}^n\beta_j(\lambda_j/\lambda_1)^mv_{p_{m-1}}^{(j)}}{\beta_1v_{p_{m-1}}^{(1)}+\sum_{j=2}^n\beta_j(\lambda_j/\lambda_1)^{m-1}v_{p_{m-1}}^{(j)}}\right]$$

    Let $p_m$ be the smallest integer with $|y_{p_m}^{(m)}|=||\textbf{y}^{(m)}||_\infty$, and set

    $$\textbf{x}^{(m)}=\frac{\textbf{y}^{(m)}}{y_{p_m}^{(m)}}=\frac{\textbf{A}^m\textbf{x}^{(0)}}{\prod_{k=1}^my_{p_k}^{(k)}}$$

  • $\lim_{m\rightarrow\infty}\mu^{(m)}=\lambda_1$

So each step of the iteration computes

$$\textbf{y}^{(m)}\rightarrow p_m\rightarrow \mu^{(m)}\rightarrow \textbf{x}^{(m)}\rightarrow \textbf{y}^{(m+1)}\rightarrow\cdots$$

If $\mu^{(m)}=0$, then output $\textbf{x}^{(m-1)}$ and the eigenvalue $0$ immediately.
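
A minimal NumPy sketch of the power method with infinity-norm scaling as described above (not from the original notes; the fixed iteration count and the 2×2 example matrix, whose dominant eigenpair is $\lambda_1=5$, $\textbf{v}\propto(1,1)^T$, are illustrative):

```python
import numpy as np

def power_method(A, x0, iters=100):
    """Return (mu, x) approximating the dominant eigenvalue and eigenvector of A."""
    x = x0 / np.max(np.abs(x0))
    p = int(np.argmax(np.abs(x)))       # index with |x_p| = ||x||_inf
    mu = 0.0
    for _ in range(iters):
        y = A @ x
        mu = y[p] / x[p]                # eigenvalue estimate mu^(m)
        p = int(np.argmax(np.abs(y)))   # new index p_m
        if y[p] == 0:
            return 0.0, x               # A has the eigenvalue 0
        x = y / y[p]                    # rescale so ||x||_inf = 1
    return mu, x

A = np.array([[4.0, 1.0], [2.0, 3.0]])
print(power_method(A, np.array([1.0, 1.0])))
```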

# Symmetric Power Method

If $\textbf{A}$ is symmetric:

  • Choose an arbitrary unit vector $\textbf{x}^{(0)}$ with respect to $||\cdot||_2$.
  • Let $\textbf{y}^{(m)}=\textbf{Ax}^{(m-1)}$.
  • Set $\mu^{(m)}=(\textbf{x}^{(m-1)})^T\textbf{y}^{(m)}$.
  • Let $\textbf{x}^{(m)}=\textbf{y}^{(m)}/||\textbf{y}^{(m)}||_2$.
  • Repeat, recomputing $\mu$.

# Inverse Power method

Notice that the two methods above can only compute the eigenvalue of largest magnitude.

We can compute the eigenvalue closest to $q$ by applying the method above to $(\textbf{A}-q\textbf{I})^{-1}$, since its eigenvalues are:

$$\frac{1}{\lambda_1-q},\frac{1}{\lambda_2-q},...,\frac{1}{\lambda_n-q}$$

# Rayleigh Quotient Iteration

$$\because \textbf{x}^T\textbf{Ax}=\lambda\textbf{x}^T\textbf{x}\qquad\therefore \lambda=\frac{\textbf{x}^T\textbf{Ax}}{||\textbf{x}||_2^2}$$

This is useful for accelerating the convergence of a method.

By computing:

$$q^{(m)}=\frac{(\textbf{x}^{(m-1)})^T\textbf{Ax}^{(m-1)}}{||\textbf{x}^{(m-1)}||_2^2}$$

and solving the linear system $(\textbf{A}-q^{(m)}\textbf{I})\textbf{y}^{(m)}=\textbf{x}^{(m-1)}$ to get $\textbf{y}^{(m)}$,

we converge to $\frac{1}{\lambda-q}$ (and hence to $\lambda$) more quickly.

# QR Factorization

# Householder transformation

Definition: A Householder transformation has the form

$$\textbf{H}=\textbf{I}-2\frac{\textbf{vv}^T}{\textbf{v}^T\textbf{v}}$$

for a nonzero vector $\textbf{v}$.

Property: $\textbf{H}=\textbf{H}^T=\textbf{H}^{-1}$.

Given a vector $\textbf{a}$, we want to find a vector $\textbf{v}$ so that

$$\textbf{Ha}=\begin{pmatrix}\alpha\\0\\0\\\vdots\\0\end{pmatrix}=\alpha\textbf{e}_1$$

The vector is

$$\textbf{v}=\textbf{a}-\alpha\textbf{e}_1$$

(any nonzero scalar multiple of $\textbf{v}$ gives the same $\textbf{H}$), and $\alpha=-\mathrm{sign}(a_1)||\textbf{a}||_2$, where $a_1$ is the first element of $\textbf{a}$.


The process of QR factorization:

Consider the matrix $\textbf{A}=\begin{pmatrix}2&2\\1&4\\2&6\end{pmatrix}$.

  • The first column of it is $\textbf{u}_1$; then choose $\textbf{v}=\textbf{u}_1-(-||\textbf{u}_1||_2\textbf{e}_1)=\begin{pmatrix}5\\1\\2\end{pmatrix}$.

  • Then $\textbf{H}=\textbf{I}-2\frac{\textbf{vv}^T}{\textbf{v}^T\textbf{v}}$ and

    $$\textbf{HA}=\begin{pmatrix}-3&-6.66\\0&2.26\\0&2.53\end{pmatrix}$$

  • Choose the second column of $\textbf{HA}$ (below the first row) and repeat the process.
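
A minimal NumPy sketch of QR factorization by Householder reflections (not from the original notes; it reproduces the worked 3×2 example above, and the sign convention for $\alpha$ follows the formula given earlier):

```python
import numpy as np

def householder_qr(A):
    """QR factorization via Householder reflections: returns Q, R with A = Q R."""
    A = A.astype(float)
    m, n = A.shape
    Q = np.eye(m)
    R = A.copy()
    for k in range(min(m - 1, n)):
        a = R[k:, k]
        alpha = -np.sign(a[0]) * np.linalg.norm(a) if a[0] != 0 else -np.linalg.norm(a)
        v = a.copy()
        v[0] -= alpha                         # v = a - alpha * e1
        if np.linalg.norm(v) == 0:
            continue                          # column already in the desired form
        H = np.eye(m)
        H[k:, k:] -= 2.0 * np.outer(v, v) / (v @ v)
        R = H @ R
        Q = Q @ H                             # each H is symmetric and orthogonal
    return Q, R

A = np.array([[2.0, 2.0], [1.0, 4.0], [2.0, 6.0]])
Q, R = householder_qr(A)
print(np.round(R, 2))                         # first column becomes (-3, 0, 0)
print(np.allclose(Q @ R, A))                  # True
```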

# Givens Rotations

Consider a matrix $\textbf{A}$ that we want to reduce to an upper triangular matrix,

so we need to change all $a_{ij}$ with $i>j$ to zero.

Consider the rotation operator:

$$\textbf{R}=\begin{pmatrix}1&0&0&0&0\\ 0&c&0&s&0\\ 0&0&1&0&0\\ 0&-s&0&c&0\\ 0&0&0&0&1 \end{pmatrix}$$

It only acts on the second and the fourth rows of the matrix:

$$\textbf{RA}=\begin{pmatrix}a_{11}&a_{12}&a_{13}&a_{14}&a_{15}\\ ca_{21}+sa_{41}&ca_{22}+sa_{42}&ca_{23}+sa_{43}&ca_{24}+sa_{44}&ca_{25}+sa_{45}\\ a_{31}&a_{32}&a_{33}&a_{34}&a_{35}\\ ca_{41}-sa_{21}&ca_{42}-sa_{22}&ca_{43}-sa_{23}&ca_{44}-sa_{24}&ca_{45}-sa_{25}\\ a_{51}&a_{52}&a_{53}&a_{54}&a_{55} \end{pmatrix}$$

So the basic idea is that, by applying rotation operators (with $c^2+s^2=1$, chosen to zero one entry at a time), we can change $\textbf{A}$'s first column into $\alpha\textbf{e}_1$, then move on to the second column, and so on.

Since the first column is already all zero below the first entry, processing the second column will not affect the first column.
