# Math 449 Numerical Applied Mathematics (Fall 2020)

## Textbooks

**An Introduction to Numerical Analysis**, by Endre Süli and David Mayers, published by Cambridge University Press.

**Numerical Optimization**, by Jorge Nocedal and Stephen J. Wright, Springer Series in Operations Research and Financial Engineering. Free e-copy at https://link.springer.com/book/10.1007/978-0-387-40065-5

**Iterative Methods for Linear and Nonlinear Equations**, by C. T. Kelley, Frontiers in Applied Mathematics, SIAM. Free e-copy at https://epubs.siam.org/doi/book/10.1137/1.9781611970944

Note: to download the electronic copies of the books from the publishers, you need either the WashU library proxy or a campus network connection.

## Github repository

https://github.com/scaomath/wustl-math449

## Schedule

This schedule is only tentative and, with a fairly good chance, is subject to change due to COVID-19. Note also that the lecture numbering may not correspond exactly to the actual lectures given.

### Module 1

**Duration**: Week 1-2 (Sept 14-Sept 25)

**Problem**: solve $f(x)=0$ approximately for a possibly highly nonlinear $f(x)$.

**Examples**: how to find an accurate enough solution to $e^x - x - 2 = 0$, whose solution does not have a closed analytical form (such as $\ln(1+\sqrt{2})$).

**Material**: (Süli-Mayers Section 1.1-1.4)

- Lecture 1: Brouwer’s fixed point theorem (Theorem 1.2), contraction mapping theorem (Theorem 1.3)
- Lecture 2: simple iterative methods, rate of convergence
- Lecture 3: relaxation (Def 1.5)
- Lecture 4: Newton’s method (Def 1.6), and the proof of the rate of convergence (Def 1.4, Def 1.7, Theorem 1.8).
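
A minimal Python sketch of Newton's method (Lecture 4) applied to the example $e^x - x - 2 = 0$ may help make the iteration concrete; the function names, starting guess, and tolerance below are illustrative, not part of the course materials.

```python
import math

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:          # stop once the residual is small enough
            return x, k
        x = x - fx / fprime(x)     # Newton update
    return x, max_iter

# Root of e^x - x - 2 = 0 (no closed analytical form); quadratic convergence
# typically reaches full double precision in a handful of iterations.
root, iters = newton(lambda x: math.exp(x) - x - 2,
                     lambda x: math.exp(x) - 1,
                     x0=1.0)
print(root, iters)
```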

### Module 2

**Duration**: Week 3-6 (Sept 25-Oct 19)

**Problem**: solve a (big) linear system $A\boldsymbol{x} = \boldsymbol{b}$ using matrix decompositions or iterative methods for matrices $A$ with special structure.

**Examples**: solve the finite difference discretization of $-(a(x) u')' + b(x)u' + c(x)u = f(x)$ for $u: [0,1]\to \mathbb{R}, x\mapsto u(x)$, a toy model that serves as the gateway to many problems arising in structural mechanics, fluid dynamics, and chemistry.

**Material**: (selected material from Süli-Mayers Chapters 2-3, Nocedal-Wright Chapter 8 and Appendix)

- Lecture 5: overview of linear solvers (Süli-Mayers Section 2.1)
- Lecture 6: Jacobi iteration (https://mathworld.wolfram.com/JacobiMethod.html).
- Lecture 7: norms, spectral radius of a matrix (Nocedal-Wright Appendix A.1)
- Lecture 8: convergence of the fixed point iteration in the form of $\boldsymbol{x}^{(k+1)} = T\boldsymbol{x}^{(k)} + \boldsymbol{c}$.
- Lecture 9: finite difference method for a two-point boundary value problem, generation of a tri-diagonal matrix (Süli-Mayers Section 13.2)
- Lecture 10: error analysis of finite difference method (Süli-Mayers Section 13.3)
- Lecture 11: (cont.) error analysis of finite difference method.
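
As a concrete illustration combining Lectures 6 and 9, here is a minimal Python sketch (assuming NumPy) that builds the tridiagonal finite difference matrix for $-u'' = f$ on $(0,1)$ with homogeneous Dirichlet boundary conditions and solves it with Jacobi iteration; the grid size, right-hand side, and stopping criterion are illustrative.

```python
import numpy as np

# Finite differences for -u'' = f on (0,1), u(0) = u(1) = 0:
# (1/h^2) * tridiag(-1, 2, -1) u = f at the interior grid points.
n = 20                                 # number of interior points (illustrative)
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
A = (np.diag(2 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1)) / h**2
b = np.pi**2 * np.sin(np.pi * x)       # chosen so the exact solution is sin(pi x)

# Jacobi iteration: x^(k+1) = D^{-1} (b - (A - D) x^(k)), with D = diag(A)
d = np.diag(A)
u = np.zeros(n)
for k in range(10000):
    u_new = (b - (A @ u - d * u)) / d
    if np.linalg.norm(u_new - u, np.inf) < 1e-8:
        u = u_new
        break
    u = u_new

# The remaining error is dominated by the O(h^2) discretization error.
print(np.max(np.abs(u - np.sin(np.pi * x))))
```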

### Module 3

**Duration**: Week 6-7 (Oct 21-Nov 2)

**Problem**: solve for eigenvalue(s) and eigenvector(s) for $A \boldsymbol{x} = \lambda \boldsymbol{x}$.

**Examples**: solve $\displaystyle\frac{\mathrm{d} \boldsymbol{x}}{\mathrm{d} t}=A \boldsymbol{x}$, which can arise from a semi-discrete formulation of an evolving physical model.

**Material**: (selected material from Süli-Mayers Chapter 5)

- Lecture 12: Introduction of eigenvalue problems (Süli-Mayers Section 5.1)
- Lecture 13: Motivation for approximating eigenvalue problems.
- Lecture 14: (cont.) convergence of Jacobi iteration for finite difference methods.
- Lecture 15: Gershgorin circle theorem, power method.
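
A short Python sketch of the power method (Lecture 15), assuming NumPy and an illustrative symmetric test matrix, is given below; the Rayleigh quotient is used to read off the dominant eigenvalue.

```python
import numpy as np

def power_method(A, num_iters=200, seed=0):
    """Power method: repeatedly apply A and normalize to approximate the
    eigenvector of the dominant (largest in magnitude) eigenvalue."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    for _ in range(num_iters):
        y = A @ x
        x = y / np.linalg.norm(y)      # normalize to avoid overflow/underflow
    lam = x @ (A @ x)                  # Rayleigh quotient estimate of the eigenvalue
    return lam, x

# Illustrative test: tridiag(-1, 2, -1) has eigenvalues 2 - 2 cos(k*pi/(n+1)).
n = 10
A = (np.diag(2 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
lam, _ = power_method(A)
print(lam, 2 - 2 * np.cos(n * np.pi / (n + 1)))   # the two should agree closely
```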

### Module 4

**Duration**: Week 9-11 (Nov 9-Dec 18)

**Problem**: Unconstrained optimization of finding $\displaystyle\min_{\mathbf{x}\in \mathbb{R}^N} f(\mathbf{x})$.

**Examples**: Solving a linear regression or logistic regression problem, in which least-squares problems such as $\displaystyle \min_{\mathbf{w}\in \mathbb{R}^N} \Vert A\mathbf{w} - \mathbf{b}\Vert^2 + \epsilon\Vert \mathbf{w} \Vert^2$ need to be solved; transforming certain $A\boldsymbol{x} = \boldsymbol{b}$ into
$\displaystyle\min_{\boldsymbol{x}} \frac{1}{2} \boldsymbol{x}^{\top} A\boldsymbol{x} - \boldsymbol{b}^{\top}\boldsymbol{x}$.
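
The equivalence between $A\boldsymbol{x} = \boldsymbol{b}$ (for a symmetric positive definite matrix) and the quadratic minimization above can be checked numerically; the following minimal sketch, assuming NumPy and an illustrative $2\times 2$ matrix, runs gradient descent with the fixed step size discussed in Lecture 21 and compares against a direct solve.

```python
import numpy as np

# Illustrative SPD matrix Q and right-hand side b.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

# Gradient of f(x) = 0.5 x^T Q x - b^T x is Q x - b, so the minimizer solves Q x = b.
lam = np.linalg.eigvalsh(Q)
alpha = 2.0 / (lam.max() + lam.min())   # fixed step size from Lecture 21

x = np.zeros(2)
for _ in range(100):
    x = x - alpha * (Q @ x - b)         # gradient descent step

print(x, np.linalg.solve(Q, b))         # the two answers should agree
```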

**Material**: (Nocedal-Wright Chapters 2, 3, 5, 6) Gradient descent, line search, convergence analysis, Newton's method, quasi-Newton methods, Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, conjugate gradient method.

- Lecture 16: Introduction to optimization (Nocedal-Wright Chapter 1, Appendix A.2)
- Lecture 17: How to find a local minimum for a function defined in $\mathbb{R}^2$, quadratic forms (Nocedal-Wright Chapter 2.1).
- Lecture 18: First order condition for a local minimum, directional derivatives, steepest descent (Nocedal-Wright Chapter 2.1, 2.2).
- Lecture 19: Second order condition for a local minimum (Nocedal-Wright Chapter 2.1, 2.2)
- Lecture 20: How to choose the step size for gradient descent, steepest descent through line search. (Nocedal-Wright Chapter 2.2, 3.1)
- Lecture 21: The rate of convergence in the $\ell^2$-norm of gradient descent with the fixed step size $\alpha_k = 2/(\lambda_{\max}+\lambda_{\min})$ applied to $\displaystyle\min_{\boldsymbol{x}\in \mathbb{R}^n} \frac{1}{2} \boldsymbol{x}^{\top} Q\boldsymbol{x} - \boldsymbol{b}^{\top}\boldsymbol{x}$ is $(\kappa-1)/(\kappa+1)$, where $\kappa$ is the condition number of $Q$. (Nocedal-Wright Chapter 3.3)
- Lecture 22: Looking for $\min_{\boldsymbol{y}\in S}|\boldsymbol{y} - \boldsymbol{x}^*|$ is equivalent to looking for the projection $\operatorname{Proj}_S \boldsymbol{x}^*$, obtained by solving $(\boldsymbol{x}^* - \operatorname{Proj}_S \boldsymbol{x}^*)\cdot \boldsymbol{v}=0$ for any $\boldsymbol{v}\in S$. (Nocedal-Wright assumes this is already known.)
- Lecture 23: Solving $\displaystyle\min_{\boldsymbol{x}\in \mathbb{R}^n} \frac{1}{2} \boldsymbol{x}^{\top} Q\boldsymbol{x} - \boldsymbol{b}^{\top}\boldsymbol{x}$ is equivalent to solving $\displaystyle \min_{\boldsymbol{x}\in \mathbb{R}^n} |\boldsymbol{x} - \boldsymbol{x}^*|_Q$. (Nocedal-Wright assumes this is already known.)
- Lecture 24: Conjugate gradient (Nocedal-Wright Chapter 5.1).
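
To complement Lecture 24, here is a minimal Python sketch of the (unpreconditioned) conjugate gradient method for a symmetric positive definite system, assuming NumPy; the test matrix is the same illustrative tridiagonal matrix used in Module 2.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """Plain conjugate gradient for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x                  # initial residual
    p = r.copy()                   # initial search direction
    rs_old = r @ r
    for _ in range(len(b)):        # at most n steps in exact arithmetic
        Ap = A @ p
        alpha = rs_old / (p @ Ap)  # exact line search along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p   # A-conjugate update of the direction
        rs_old = rs_new
    return x

n = 20
A = (np.diag(2 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
b = np.ones(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))   # residual should be near machine precision
```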

### Module 5 (skipped)

**Duration**: Week 7 (Nov 6-Nov 9)

**Problem**: If we only know the values $(x_i, f(x_i))$ and/or the derivative values $(x_i, f'(x_i))$ of a function $f(x)$ at certain data points, while the expression of this function remains unknown, how can we construct a (piecewise) polynomial $p_n(x)$ that “represents” this function reasonably well?

**Examples**: For a toy function $e^x$, sample its values at $x=-1,0,1$, then construct a quadratic polynomial that agrees with $e^x$ at these three points.

**Material**: (Süli-Mayers Chapter 6 and 11) Lagrange interpolation, interpolation error, Runge phenomenon, Hermite interpolation, spline, piecewise polynomial approximation.
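
For the toy example above, a quick sketch (assuming NumPy; `np.polyfit` through three nodes returns the unique quadratic interpolant) shows how large the interpolation error is on $[-1,1]$:

```python
import numpy as np

# Quadratic Lagrange interpolant of exp(x) at the nodes x = -1, 0, 1.
nodes = np.array([-1.0, 0.0, 1.0])
values = np.exp(nodes)
coeffs = np.polyfit(nodes, values, deg=2)   # degree-2 fit through 3 points = interpolation

xs = np.linspace(-1, 1, 201)
p = np.polyval(coeffs, xs)
print(np.max(np.abs(p - np.exp(xs))))       # maximum interpolation error on [-1, 1]
```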

### Module 6 (skipped)

**Duration**: Week 8 (Nov 2-Nov 6)

**Problem**: Approximating $\displaystyle\int^b_a f(x) \,\mathrm{d} x$ on computer without knowing the antiderivatives of $f(x)$ explicitly.

**Examples**: For a toy function $e^x$, approximate $\displaystyle\int^{1}_{-1} e^x \,\mathrm{d} x$ by integrating the quadratic polynomial that agrees with $e^x$ at $x=-1,0,1$ (Simpson's rule).

**Material**: (Süli-Mayers Chapter 7 and 10) Newton–Cotes formulae, error analysis, Richardson extrapolation, Gauss quadrature.
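
The example above is exactly Simpson's rule on $[-1,1]$: integrate the quadratic interpolant of $f$ at $x=-1,0,1$. A minimal sketch, assuming NumPy and the toy integrand $e^x$:

```python
import numpy as np

# Simpson's rule on [a, b] = [-1, 1]: (b - a)/6 * (f(a) + 4 f((a+b)/2) + f(b)).
f = np.exp
approx = (2.0 / 6.0) * (f(-1.0) + 4.0 * f(0.0) + f(1.0))
exact = np.exp(1.0) - np.exp(-1.0)
print(approx, exact, abs(approx - exact))   # quadrature error of the single Simpson panel
```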

### Module 7 (Moved to Math 450 in Spring 2021)

**Duration**: Week 12-14 (Nov 30-Dec 18)

**Problem**: Modern variants of optimization methods for $\displaystyle\min_{\mathbf{w}\in \mathbb{R}^n} \frac{1}{N}\sum_{i=1}^N f_i(\mathbf{w})$ when $N$ is large.

**Examples**: $h(X; \mathbf{w})$ stands for the output of an underlying model (e.g., a feedforward neural network), where $X$ is the input data of the model and $\mathbf{w}$ represents the parameters of the model. The corresponding regularized least-squares problem to be minimized is $\displaystyle \min_{\mathbf{w}\in \mathbb{R}^n} \Vert h(X; \mathbf{w}) - \mathbf{y}\Vert^2 + \epsilon\Vert \mathbf{w} \Vert^2$.

**Material**: Stochastic gradient descent, convergence analysis, step size choice, acceleration and variance reduction techniques (momentum, batch, SVRG, SAGA), convexification.
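
A minimal stochastic gradient descent sketch, assuming NumPy and a synthetic linear least-squares instance of the finite-sum problem above (illustrative data, regularization, and a fixed step size, not the course's code), is given below; each step uses the gradient of one randomly sampled term of the sum.

```python
import numpy as np

# Synthetic problem: minimize (1/N) sum_i ( (a_i . w - y_i)^2 + eps * ||w||^2 ).
rng = np.random.default_rng(0)
N, n = 1000, 5
A = rng.standard_normal((N, n))
w_true = rng.standard_normal(n)
y = A @ w_true + 0.01 * rng.standard_normal(N)
eps = 1e-3

w = np.zeros(n)
lr = 0.01                                    # fixed step size, for simplicity
for k in range(20000):
    i = rng.integers(N)                      # sample one data point uniformly at random
    grad_i = 2 * (A[i] @ w - y[i]) * A[i] + 2 * eps * w
    w = w - lr * grad_i                      # stochastic gradient step

print(np.linalg.norm(w - w_true))            # small, up to noise and regularization bias
```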

**Reference**: Bottou, L., Curtis, F. E. and Nocedal, J., 2018. “Optimization methods for large-scale machine learning”. *SIAM Review*, 60(2), pp. 223-311. arXiv: https://arxiv.org/abs/1606.04838