Math 449 Numerical Applied Mathematics (Fall 2020)

Textbook:

An Introduction to Numerical Analysis, by Endre Süli and David Mayers, published by Cambridge University Press.

Numerical Optimization, by Jorge Nocedal and Stephen J. Wright, Springer Series in Operations Research and Financial Engineering. Free e-copy at https://link.springer.com/book/10.1007/978-0-387-40065-5

Iterative Methods for Linear and Nonlinear Equations, by C. T. Kelley, Frontiers in Applied Mathematics, SIAM. Free e-copy at https://epubs.siam.org/doi/book/10.1137/1.9781611970944

Note: to download the electronic copies of the books from the publishers, you need either the WashU library proxy or a campus network connection.

Github repository

https://github.com/scaomath/wustl-math449

Schedule

This schedule is tentative and, due to COVID-19, quite likely to change. Note also that the lecture numbering may not correspond exactly to the lectures actually given.

Module 1

Duration: Week 1-2 (Sept 14-Sept 25)

Problem: solve $f(x)=0$ approximately for a possibly highly nonlinear $f(x)$.

Examples: how to find a sufficiently accurate solution to $e^x - x - 2 = 0$, whose root has no closed analytical form (such as $\ln(1+\sqrt{2})$).

Material: (Süli-Mayers Section 1.1-1.4)

  • Lecture 1: Brouwer’s fixed point theorem (Theorem 1.2), contraction mapping theorem (Theorem 1.3)
  • Lecture 2: simple iterative methods, rate of convergence
  • Lecture 3: relaxation (Def 1.5)
  • Lecture 4: Newton’s method (Def 1.6), and the proof of the rate of convergence (Def 1.4, Def 1.7, Theorem 1.8).
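
As a quick illustration of Lecture 4’s topic, here is a minimal Python sketch (not part of the course materials) of Newton’s method applied to the module’s example equation $e^x - x - 2 = 0$; the starting guess, tolerance, and iteration cap are arbitrary choices for illustration.

```python
import math

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, k
        x = x - fx / fprime(x)
    return x, max_iter

# Module 1 example: e^x - x - 2 = 0 (no closed-form root).
root, iters = newton(lambda x: math.exp(x) - x - 2,
                     lambda x: math.exp(x) - 1,
                     x0=1.0)
print(root, iters)   # root is approximately 1.1462
```

Swapping the update for $x_{k+1} = \ln(x_k + 2)$ gives a simple fixed-point iteration in the spirit of Lecture 2: since $|g'(x)| = 1/(x+2) < 1$ near the root, the contraction mapping theorem guarantees (only linear) convergence.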

Module 2

Duration: Week 3-6 (Sept 25-Oct 19)

Problem: solve a (big) linear system $A\boldsymbol{x} = \boldsymbol{b}$ using matrix decompositions/iterative methods for some specially structured matrices $A$.

Examples: solve the finite difference discretization of $-(a(x) u')'+ b(x)u' + c(x)u = f(x)$ for $u: [0,1]\to \mathbb{R}, x\mapsto u(x)$, a toy model that serves as the gateway to many problems arising in structural mechanics, fluid dynamics, and chemistry.

Material: (selected material from Süli-Mayers Chapter 2-3, Nocedal-Wright Chapter 8 and Appendix)

  • Lecture 5: overview of linear solvers (Süli-Mayers Section 2.1)
  • Lecture 6: Jacobi iteration (https://mathworld.wolfram.com/JacobiMethod.html).
  • Lecture 7: norms, spectral radius of a matrix (Nocedal-Wright Appendix A.1)
  • Lecture 8: convergence of the fixed point iteration in the form of $\boldsymbol{x}^{(k+1)} = T\boldsymbol{x}^{(k)} + \boldsymbol{c}$.
  • Lecture 9: finite difference method for a two-point boundary value problem, generation of a tri-diagonal matrix (Süli-Mayers Section 13.2)
  • Lecture 10: error analysis of finite difference method (Süli-Mayers Section 13.3)
  • Lecture 11: (cont.) error analysis of finite difference method.
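
A minimal Python sketch tying Lectures 6 and 9 together: Jacobi iteration applied to the tridiagonal matrix produced by central differences for the simplified model $-u'' = f$ on $(0,1)$ with homogeneous boundary conditions (the grid size, right-hand side, and tolerance below are illustrative choices, not from the course).

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-8, max_iter=20_000):
    """Jacobi iteration: x^(k+1) = D^{-1} (b - (A - D) x^(k)),
    where D is the diagonal part of A."""
    D = np.diag(A)                     # diagonal entries of A
    R = A - np.diagflat(D)             # off-diagonal part of A
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for k in range(max_iter):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new, k
        x = x_new
    return x, max_iter

# Central-difference matrix for -u'' = f on (0,1) with u(0) = u(1) = 0.
n = 20                                  # number of interior grid points (illustrative)
h = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
xgrid = np.linspace(h, 1 - h, n)
f = np.pi**2 * np.sin(np.pi * xgrid)    # exact solution is u(x) = sin(pi x)
u, iters = jacobi(A, f)
print(iters, np.max(np.abs(u - np.sin(np.pi * xgrid))))
```

The stopping test on consecutive iterates mirrors the fixed-point form $\boldsymbol{x}^{(k+1)} = T\boldsymbol{x}^{(k)} + \boldsymbol{c}$ from Lecture 8; convergence is slow here because the spectral radius of the Jacobi iteration matrix for this problem, $\cos(\pi h)$, approaches $1$ as $h\to 0$.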

Module 3

Duration: Week 6-7 (Oct 21-Nov 2)

Problem: solve for the eigenvalue(s) $\lambda$ and eigenvector(s) $\boldsymbol{x}$ of $A \boldsymbol{x} = \lambda \boldsymbol{x}$.

Examples: solve $\displaystyle\frac{\mathrm{d} \boldsymbol{x}}{\mathrm{d} t}=A \boldsymbol{x}$, which can arise as the semi-discrete formulation of an evolving physical model.

Material: (selected material from Süli-Mayers Chapter 5)

  • Lecture 12: Introduction of eigenvalue problems (Süli-Mayers Section 5.1)
  • Lecture 13: Motivation for approximating eigenvalue problems; convergence of Jacobi iteration for finite difference methods.
  • Lecture 14: (cont.) convergence of Jacobi iteration for finite difference methods.
  • Lecture 15: Gershgorin circle theorem, power method.
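
A minimal Python sketch of the power method from Lecture 15, run on an illustrative tridiagonal matrix (the matrix, seed, and tolerance are placeholders, not course data).

```python
import numpy as np

def power_method(A, num_iter=1000, tol=1e-10, seed=0):
    """Power method: repeatedly apply A and normalize; the Rayleigh quotient
    converges to the eigenvalue of largest magnitude (if it is simple)."""
    x = np.random.default_rng(seed).standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    lam = 0.0
    for _ in range(num_iter):
        y = A @ x
        lam_new = x @ y                # Rayleigh quotient x^T A x (since ||x|| = 1)
        x = y / np.linalg.norm(y)
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam, x

# Illustrative example: the tridiagonal matrix tridiag(-1, 2, -1).
n = 10
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam, v = power_method(A)
print(lam)                             # largest eigenvalue 2 + 2*cos(pi/(n+1)) ≈ 3.92
```

By the Gershgorin circle theorem, every eigenvalue of this matrix lies in $[0, 4]$, which is consistent with the computed dominant eigenvalue.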

Module 4

Duration: Week 9-11 (Nov 9-Dec 18)

Problem: Unconstrained optimization of finding $\displaystyle\min_{\mathbf{x}\in \mathbb{R}^N} f(\mathbf{x})$.

Examples: In solving a linear regression or logistic regression problem, least squares problems like $\displaystyle \min_{\mathbf{w}\in \mathbb{R}^N} \Vert A\mathbf{w} - \mathbf{b}\Vert^2 + \epsilon\Vert \mathbf{w} \Vert^2$ need to be solved. Certain linear systems $A\boldsymbol{x} = \boldsymbol{b}$ can be transformed into $\displaystyle\min_{\boldsymbol{x}} \frac{1}{2} \boldsymbol{x}^{\top} A\boldsymbol{x} - \boldsymbol{b}^{\top}\boldsymbol{x}$.

Material: (Nocedal-Wright Chapters 2, 3, 5, 6) Gradient descent, line search, convergence analysis, Newton’s method, quasi-Newton methods, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, the conjugate gradient method.

  • Lecture 16: Introduction to optimization (Nocedal-Wright Chapter 1, Appendix A.2)
  • Lecture 17: How to find a local minimum for a function defined in $\mathbb{R}^2$, quadratic forms (Nocedal-Wright Chapter 2.1).
  • Lecture 18: First order condition for a local minimum, directional derivatives, steepest descent (Nocedal-Wright Chapter 2.1, 2.2).
  • Lecture 19: Second order condition for a local minimum (Nocedal-Wright Chapter 2.1, 2.2)
  • Lecture 20: How to choose the step size for gradient descent, steepest descent through line search. (Nocedal-Wright Chapter 2.2, 3.1)
  • Lecture 21: The rate of convergence in the $\ell^2$-norm when applying gradient descent with $\alpha_k = 2/(\lambda_{\max}+\lambda_{\min})$ to $\displaystyle\min_{\boldsymbol{x}\in \mathbb{R}^n} \frac{1}{2} \boldsymbol{x}^{\top} Q\boldsymbol{x} - \boldsymbol{b}^{\top}\boldsymbol{x}$ is $(\kappa-1)/(\kappa+1)$, where $\kappa$ is the condition number of $Q$. (Nocedal-Wright Chapter 3.3)
  • Lecture 22: Finding $\min_{\boldsymbol{y}\in S}|\boldsymbol{y} - \boldsymbol{x}^*|$ is equivalent to finding the projection of $\boldsymbol{x}^*$ onto $S$, i.e., solving $(\boldsymbol{x}^* - \operatorname{Proj}_S \boldsymbol{x}^*)\cdot \boldsymbol{v}=0$ for every $\boldsymbol{v}\in S$. (Nocedal-Wright assumes this is already known.)
  • Lecture 23: Solving $\displaystyle\min_{\boldsymbol{x}\in \mathbb{R}^n} \frac{1}{2} \boldsymbol{x}^{\top} Q\boldsymbol{x} - \boldsymbol{b}^{\top}\boldsymbol{x}$ is equivalent to solving $\displaystyle \min_{\boldsymbol{x}\in \mathbb{R}^n} |\boldsymbol{x} - \boldsymbol{x}^*|_Q$, where $\boldsymbol{x}^* = Q^{-1}\boldsymbol{b}$. (Nocedal-Wright assumes this is already known.)
  • Lecture 24: Conjugate gradient (Nocedal-Wright Chapter 5.1).
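
A minimal Python sketch of steepest descent with exact line search for the module’s quadratic model problem (Lectures 18-21); the $2\times 2$ test matrix and tolerance are illustrative choices.

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, tol=1e-10, max_iter=10_000):
    """Steepest descent with exact line search for
    f(x) = 0.5 * x^T Q x - b^T x  (Q symmetric positive definite).
    The gradient is Q x - b; along the negative gradient r = b - Q x,
    the exact minimizing step is alpha = (r^T r) / (r^T Q r)."""
    x = x0.astype(float).copy()
    for k in range(max_iter):
        r = b - Q @ x                  # negative gradient (residual)
        if np.linalg.norm(r) < tol:
            return x, k
        alpha = (r @ r) / (r @ (Q @ r))
        x += alpha * r
    return x, max_iter

# Illustrative 2x2 example with condition number kappa = 10.
Q = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x, iters = steepest_descent_quadratic(Q, b, x0=np.zeros(2))
print(x, iters)                        # minimizer is Q^{-1} b = [1, 0.1]
```

With exact line search the error in the $Q$-norm contracts by at most $(\kappa-1)/(\kappa+1)$ per step, the same factor quoted in Lecture 21 for the fixed step size $2/(\lambda_{\max}+\lambda_{\min})$; the conjugate gradient method of Lecture 24 replaces the negative gradient with $Q$-conjugate search directions and terminates in at most $n$ steps in exact arithmetic.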

Module 5 (skipped)

Duration: Week 7 (Nov 6-Nov 9)

Problem: If we only know the values $(x_i, f(x_i))$ and/or the derivative values $(x_i, f'(x_i))$ of a function $f(x)$ at certain data points, while the expression of the function itself remains unknown, how do we construct a (piecewise) polynomial $p_n(x)$ that “represents” this function reasonably well?

Examples: For a toy function $e^x$, sample its values at $x=-1,0,1$, then construct a quadratic polynomial that agrees with $e^x$ at these three points.

Material: (Süli-Mayers Chapter 6 and 11) Lagrange interpolation, interpolation error, Runge phenomenon, Hermite interpolation, spline, piecewise polynomial approximation.
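
Although this module was skipped, the toy example above is easy to reproduce; here is a minimal Python sketch of Lagrange interpolation of $e^x$ at $x=-1,0,1$ (the evaluation point $0.5$ is an arbitrary choice for illustration).

```python
import numpy as np

def lagrange_interpolant(xs, ys):
    """Return a callable p(x) = sum_k y_k * L_k(x), where L_k are the
    Lagrange basis polynomials for the nodes xs."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)

    def p(x):
        total = 0.0
        for k, (xk, yk) in enumerate(zip(xs, ys)):
            others = np.delete(xs, k)
            Lk = np.prod((x - others) / (xk - others))   # k-th basis polynomial at x
            total += yk * Lk
        return total

    return p

# Module 5 toy example: quadratic interpolant of e^x at x = -1, 0, 1.
nodes = [-1.0, 0.0, 1.0]
values = np.exp(nodes)
p2 = lagrange_interpolant(nodes, values)
print(p2(0.5), np.exp(0.5))   # interpolant vs. true value at x = 0.5
```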

Module 6 (skipped)

Duration: Week 8 (Nov 2-Nov 6)

Problem: Approximating $\displaystyle\int^b_a f(x) \,\mathrm{d} x$ on a computer without knowing an antiderivative of $f(x)$ explicitly.

Examples: For a toy function $e^x$, sample its values at $x=-1,0,1$, construct the quadratic polynomial that agrees with $e^x$ at these three points, and integrate that polynomial over $[-1,1]$ to approximate $\displaystyle\int^1_{-1} e^x \,\mathrm{d} x$.

Material: (Süli-Mayers Chapter 7 and 10) Newton–Cotes formulae, error analysis, Richardson extrapolation, Gauss quadrature.
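
A minimal Python sketch for this (skipped) module: integrating the quadratic interpolant at $x=-1,0,1$ yields Simpson’s rule, a Newton–Cotes formula; the composite version below is compared against the exact value $e - e^{-1}$ (the number of subintervals is an illustrative choice).

```python
import numpy as np

def composite_simpson(f, a, b, n):
    """Composite Simpson's rule with n subintervals (n must be even).
    Simpson's rule integrates the quadratic interpolant of f over each
    pair of adjacent subintervals (a closed Newton-Cotes formula)."""
    if n % 2:
        raise ValueError("n must be even")
    x = np.linspace(a, b, n + 1)
    h = (b - a) / n
    w = np.ones(n + 1)
    w[1:-1:2] = 4.0      # odd-indexed interior nodes
    w[2:-1:2] = 2.0      # even-indexed interior nodes
    return h / 3.0 * np.dot(w, f(x))

# Module 6 toy example: integrate e^x over [-1, 1]; exact value is e - 1/e.
approx = composite_simpson(np.exp, -1.0, 1.0, n=2)   # a single Simpson panel
exact = np.exp(1) - np.exp(-1)
print(approx, exact, abs(approx - exact))
```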

Module 7 (Moved to Math 450 in Spring 2021)

Duration: Week 12-14 (Nov 30-Dec 18)

Problem: Modern variants of optimization methods for $\displaystyle\min_{\mathbf{w}\in \mathbb{R}^n} \frac{1}{N}\sum_{i=1}^N f_i(\mathbf{w})$ when $N$ is large.

Examples: $h(X; \mathbf{w})$ stands for the output of an underlying model (e.g., a feedforward neural network), where $X$ is the input data of the model and $\mathbf{w}$ represents the parameters of the model. The corresponding least-squares loss to be minimized is $\displaystyle \min_{\mathbf{w}\in \mathbb{R}^n} \Vert h(X; \mathbf{w}) - \mathbf{y}\Vert^2 + \epsilon\Vert \mathbf{w} \Vert^2$.

Material: Stochastic gradient descent, convergence analysis, step size choice, acceleration and variance reduction techniques (momentum, batch, SVRG, SAGA), convexification.
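
A minimal Python sketch of vanilla minibatch stochastic gradient descent on a ridge-regularized least-squares loss, in the spirit of the example above; the synthetic data, learning rate, batch size, and epoch count are all placeholder choices, not course material.

```python
import numpy as np

def sgd_least_squares(A, y, epsilon=1e-3, lr=0.01, epochs=50, batch=32, seed=0):
    """Vanilla minibatch SGD for the ridge-regularized least-squares loss
    (1/N) * sum_i (a_i^T w - y_i)^2 + epsilon * ||w||^2."""
    rng = np.random.default_rng(seed)
    N, d = A.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # One pass over the data in shuffled minibatches.
        for idx in np.array_split(rng.permutation(N), max(N // batch, 1)):
            Ab, yb = A[idx], y[idx]
            grad = 2 * Ab.T @ (Ab @ w - yb) / len(idx) + 2 * epsilon * w
            w -= lr * grad
    return w

# Synthetic illustration (not course data): recover a planted weight vector.
rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 5))
w_true = rng.standard_normal(5)
y = A @ w_true + 0.01 * rng.standard_normal(1000)
w_hat = sgd_least_squares(A, y)
print(np.linalg.norm(w_hat - w_true))
```

A constant step size leaves a noise floor proportional to the gradient variance, which is one motivation for the diminishing step sizes and variance reduction techniques (SVRG, SAGA) listed above.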

Reference: Bottou, L., Curtis, F.E. and Nocedal, J., 2018. “Optimization methods for large-scale machine learning”. SIAM Review, 60(2), pp.223-311. arXiv link