Section 11.1 Extrema
Subsection 11.1.1 Extreme Values
Let's first define what we mean by extreme values.
Definition 11.1.1.
Let \(f: \RR^n \rightarrow \RR\) be a differentiable scalar function. Then a point \(p \in \RR^n\) is a local maximum of \(f\) if there exists \(\epsilon > 0\) such that \(f(p) \geq f(q)\) for all \(q \in B(p, \epsilon)\text{.}\) Similarly, a point \(p \in \RR^n\) is a local minimum of \(f\) if there exists \(\epsilon > 0\) such that \(f(p) \leq f(q)\) for all \(q \in B(p, \epsilon)\text{.}\)
This definition clarifies that a minimum or maximum is a peak or valley in all directions: the value at the point must be above or below all nearby function values, in any direction in the domain. In two variables, we also want to classify a new kind of extreme value.
Definition 11.1.2.
Let \(f: \RR^2 \rightarrow \RR\) be a differentiable function. Then a point \(p \in \RR^2\) is a saddle point of \(f\) if there are two unit directions \(u\) and \(v\) in \(\RR^2\) and \(\epsilon > 0\) such that \(f(p \pm \delta u) \geq f(p)\) and \(f(p \pm \delta v) \leq f(p)\) for all \(\delta \lt \epsilon\text{.}\)
A saddle point is both a minimum and a maximum: it is a minimum in the direction \(u\) and a maximum in the direction \(v\text{.}\) It is called a saddle point for the saddle-like shape that results from this situation for graphs of two-variable functions. The canonical example is \(f(x,y) = x^2 - y^2\text{,}\) which has a saddle point at the origin: a minimum along the \(x\) axis and a maximum along the \(y\) axis. For higher dimensions, a saddle point is any point which is a maximum in some number of directions and a minimum in all other directions (for some linearly independent set of directions in the domain).
Subsection 11.1.2 Finding Extrema
A key observation from Calculus I is that maxima and minima were found where \(f^\prime(x) = 0\text{.}\) (Though \(f^\prime = 0\) didn't guarantee an extreme value, as in the example of \(f(x) = x^3\) at \(x=0\text{.}\))
Proposition 11.1.3.
Let \(f: \RR^n \rightarrow \RR\) be a differentiable function. Assume \(f\) has a minimum, maximum, saddle point, or other unclassified extremum at \(p \in \RR^n\text{.}\) Then \(\nabla f (p) = 0\text{.}\)
The gradient measures the direction of greatest change. At a minimum, maximum, or saddle point, there is no such direction, so the gradient is zero. As in single-variable calculus, the implication only goes one direction: there may be points where the gradient is zero but the point is neither a minimum, maximum, nor saddle point.
The gradient is the vector of partial derivatives, so it is important to note that all the partial derivatives must vanish. If only some of them vanish, we get interesting behaviour, but not maxima or minima.
Subsection 11.1.3 Examples
Example 11.1.5.
For example, consider the function \(f(x,y) = x-y^2+3\text{,}\) shown in Figure 11.1.4. We have \(f_y(x,y) = -2y\text{,}\) which vanishes everywhere along the \(x\) axis (when \(y=0\)). However, the other partial is \(f_x = 1\text{,}\) which never vanishes. This means that all points \((x,0)\) are potentially critical in \(y\) but not in \(x\text{.}\) What you get with this function is an ascending/descending ridge (with slope \(1\)) above the \(x\) axis.
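Written as a gradient, this calculation is
\begin{equation*}
\nabla f(x,y) = (1, -2y) \neq (0,0)\text{,}
\end{equation*}
so the function has no critical points at all, even though one of its partial derivatives vanishes along the whole \(x\) axis.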
Example 11.1.7.
Since we work in several dimensions, we can have very complicated sets of maxima/minima. The function \(f(x,y) = \cos (x+y)\text{,}\) shown in Figure 11.1.6, has a maximum whenever \(x+y\) is an even multiple of \(\pi\text{.}\) Each of those sets is a whole line: \(x+y=0\text{,}\) \(x+y = 2\pi\) and so on. For functions of two variables, it's easy to have lines or curves of maximum or minimum points. In higher dimensions, we can have surfaces or hypersurfaces of maximum or minimum points.
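We can verify this with the gradient:
\begin{equation*}
\nabla f(x,y) = (-\sin(x+y), -\sin(x+y))\text{,}
\end{equation*}
which vanishes exactly when \(\sin(x+y) = 0\text{,}\) that is, when \(x+y = k\pi\) for some integer \(k\text{.}\) The even multiples of \(\pi\) give \(\cos(x+y) = 1\text{,}\) lines of maxima; the odd multiples give \(\cos(x+y) = -1\text{,}\) lines of minima.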
We want to classify critical points: points where \(\nabla f = 0\text{.}\) We can do this informally by looking at each variable individually. If the point is a maximum in all of \(x_1, x_2, \ldots, x_n\text{,}\) then it is a maximum according to our definition above. Likewise, if it is a minimum in all the variables, it is a minimum. If it is a maximum in some of the variables and a minimum in others, it is a saddle point. However, if even one of the variables has no max/min behaviour (like \(f(x) = x^3\) at \(x=0\)), then the point is neither a maximum, minimum nor saddle point.
Subsection 11.1.4 Hessian Matrices
This informal approach is reasonable, but it would be nice to have a more formal method for determining the behaviour. In single variable calculus, we have the second-derivative test. If \(a\) is a critical point, then it is a maximum if \(f^{\prime \prime}(a)\) is negative, a minimum if \(f^{\prime \prime}(a)\) is positive, and the test is inconclusive if \(f^{\prime \prime}(a) = 0\text{.}\)
We'd like to generalize this, but we have many second derivatives: all of the possible mixed and non-mixed second partials. One way we can organize all these second partials is in a matrix.
Definition 11.1.8.
The matrix of all the second partial derivatives of a scalar function \(f\) is called the Hessian Matrix.
Here is the Hessian matrix for \(f(x,y)\) in two variables.
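\begin{equation*}
H = \begin{pmatrix}
\frac{\del^2 f}{\del x^2} & \frac{\del^2 f}{\del x \del y} \\
\frac{\del^2 f}{\del y \del x} & \frac{\del^2 f}{\del y^2}
\end{pmatrix}
\end{equation*}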
Here is the Hessian matrix for \(f(x,y,z)\) in three variables.
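\begin{equation*}
H = \begin{pmatrix}
\frac{\del^2 f}{\del x^2} & \frac{\del^2 f}{\del x \del y} & \frac{\del^2 f}{\del x \del z} \\
\frac{\del^2 f}{\del y \del x} & \frac{\del^2 f}{\del y^2} & \frac{\del^2 f}{\del y \del z} \\
\frac{\del^2 f}{\del z \del x} & \frac{\del^2 f}{\del z \del y} & \frac{\del^2 f}{\del z^2}
\end{pmatrix}
\end{equation*}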
Note that the Hessian is not the Jacobian matrix from before; that matrix had only first derivatives. The Hessian matrix applies only to scalar-valued functions (outputs in \(\RR\)), is always square, and lists all the possible second partials.
The Hessian matrix captures all of the information about the second derivative of this function, but it is often too unwieldy to be used to determine the behaviour of critical points. However, we have a useful tool from linear algebra to get specific information out of a matrix: the determinant. Let \(D\) be the determinant of the Hessian matrix. For \(f(x,y)\text{,}\) \(D\) has the following form (using Clairaut's theorem to simplify the mixed partials).
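\begin{equation*}
D = \det H = \frac{\del^2 f}{\del x^2} \frac{\del^2 f}{\del y^2} - \left( \frac{\del^2 f}{\del x \del y} \right)^2
\end{equation*}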
For functions of two variables, the determinant of the Hessian tells us most of what we need to know.
Proposition 11.1.9.
Let \(f: \RR^2 \rightarrow \RR\) be a \(C^2\) function and let \((a,b)\) be a critical point. Then we have four cases.
If \(D(a,b) > 0\) and \(\frac{\del^2 f}{\del x^2} (a,b) > 0\) then the critical point is a minimum.
If \(D(a,b) > 0\) and \(\frac{\del^2 f}{\del x^2} (a,b) \lt 0\) then the critical point is a maximum.
If \(D(a,b) \lt 0\) then the critical point is a saddle point.
If \(D(a,b) = 0\) then the test is inconclusive.
This proposition can be generalized to higher dimensions, but it requires more machinery from linear algebra, namely eigenvalues. Clairaut's theorem means that the Hessian is always a symmetric matrix, so it always has a full set of real eigenvalues. The general proposition classifies the extrema using those eigenvalues.
Proposition 11.1.10.
Let \(f: \RR^n \rightarrow \RR\) be a \(C^2\) function with \(H\) its Hessian matrix. Let \(u \in \RR^n\) be a critical point and let \(H(u)\) be the Hessian evaluated at the point \(u\text{.}\) Then we have four cases.
If \(H(u)\) is not invertible (equivalently, has determinant \(0\text{,}\) or has \(0\) as an eigenvalue), then the test is inconclusive.
If all the eigenvalues of \(H(u)\) are positive, then the critical point is a local minimum.
If all the eigenvalues of \(H(u)\) are negative, then the critical point is a local maximum.
If the eigenvalues of \(H(u)\) are a mix of positive and negative, then the critical point is a saddle point or a higher dimensional analogue mixing minima in some directions and maxima in others.
Subsection 11.1.5 Examples
Example 11.1.12.
\(f(x,y) = x^2 y^2 + y^2\) is shown in Figure 11.1.11.
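First we find the critical points. The partial derivatives are
\begin{equation*}
f_x = 2xy^2 \text{ and } f_y = 2x^2y + 2y = 2y(x^2+1)\text{.}
\end{equation*}
Since \(x^2+1\) is never zero, \(f_y\) vanishes only when \(y=0\text{,}\) and then \(f_x\) vanishes as well, so the critical points are all the points \((a,0)\) along the \(x\) axis. The second partials are \(f_{xx} = 2y^2\text{,}\) \(f_{yy} = 2x^2+2\) and \(f_{xy} = 4xy\text{;}\) at \((a,0)\text{,}\) both \(f_{xx}\) and \(f_{xy}\) are zero.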
At all the critical points \((a,0)\text{,}\) the second derivative test has \(D=0\text{,}\) which is inconclusive. We have to investigate directly. We can see that \(f(a,0) = 0\text{.}\) But \(f(a,b)\) for \(b\) any small non-zero number takes the value \(a^2 b^2 + b^2\text{,}\) which is always positive. Therefore, we can conclude that all the critical points \((a,0)\) are local minima. In Figure 11.1.11, we can see that all along the \(x\) axis the values stay at \(0\text{,}\) which is the lowest output of the function.
Example 11.1.14.
\(f(x,y) = x^2 + 2y^2 - 4x + 4y + 6\) is shown in Figure 11.1.13.
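The partial derivatives are \(f_x = 2x - 4\) and \(f_y = 4y + 4\text{,}\) which vanish simultaneously only at \((2,-1)\text{.}\) The second partials are constants: \(f_{xx} = 2\text{,}\) \(f_{yy} = 4\) and \(f_{xy} = 0\text{,}\) so
\begin{equation*}
D = (2)(4) - 0^2 = 8 > 0 \text{ and } f_{xx} = 2 > 0\text{.}
\end{equation*}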
The point \((2,-1)\) is a local minimum.
Example 11.1.16.
\(f(x,y) = \frac{8}{3} x^3 + 4y^3 - x^4 - y^4\) is shown in Figure 11.1.15.
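The partial derivatives factor nicely:
\begin{equation*}
f_x = 8x^2 - 4x^3 = 4x^2(2-x) \text{ and } f_y = 12y^2 - 4y^3 = 4y^2(3-y)\text{.}
\end{equation*}
They vanish when \(x \in \{0,2\}\) and \(y \in \{0,3\}\text{,}\) giving four critical points: \((0,0)\text{,}\) \((2,0)\text{,}\) \((0,3)\) and \((2,3)\text{.}\) The second partials are \(f_{xx} = 16x - 12x^2\text{,}\) \(f_{yy} = 24y - 12y^2\) and \(f_{xy} = 0\text{,}\) so \(D = f_{xx} f_{yy}\text{.}\) At \((2,3)\text{,}\) \(f_{xx} = -16\) and \(f_{yy} = -36\text{,}\) so \(D = 576 > 0\) with \(f_{xx} \lt 0\text{;}\) at each of the other three points, at least one of \(f_{xx}\) or \(f_{yy}\) is zero, so \(D = 0\text{.}\)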
The point \((2,3)\) is a local maximum, but the test is inconclusive at all other points. As we can see in Figure 11.1.15, there is a maximum above \((2,3)\text{.}\) At the other three critical points, we can observe a momentary flattening of the graph, but we have neither maxima, minima, nor saddle points. The zero Hessian for these points makes sense; they are like the critical point at \(x=0\) of the cubic \(f(x) = x^3\text{,}\) which has a zero derivative, but no extremum.
Example 11.1.17.
Let's try a more geometric optimization problem: what is the largest rectangular prism you can fit in a sphere of radius \(r\text{?}\) We'll assume that the prism is centrally located in the sphere, which means that its shape is entirely determined by one of its vertices on the surface of the sphere. If that vertex is \((x,y,z)\text{,}\) then the volume of the prism is \(2x \times 2y \times 2z\text{.}\)
I'd like to work in spherical coordinates instead of \(x\text{,}\) \(y\) and \(z\text{.}\) The radius \(r\) is fixed, but \(\theta\) (longitude) and \(\phi\) (colatitude) will vary.
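With \(x = r \sin \phi \cos \theta\text{,}\) \(y = r \sin \phi \sin \theta\) and \(z = r \cos \phi\text{,}\) the volume becomes
\begin{equation*}
V(\theta, \phi) = (2x)(2y)(2z) = 8r^3 \sin^2 \phi \cos \phi \sin \theta \cos \theta\text{.}
\end{equation*}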
Then we can optimize the function \(V(\theta, \phi)\text{.}\)
We didn't do the calculation here, but it is reasonable to check that the critical point represents a maximum. The volume we get is the volume of a cube of side length \(\frac{2r}{\sqrt{3}}\text{,}\) which seems, intuitively, like the right kind of length.
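For the curious, here is a sketch of the omitted computation, restricted to the first octant where \(0 \lt \theta \lt \frac{\pi}{2}\) and \(0 \lt \phi \lt \frac{\pi}{2}\text{.}\) Setting \(\frac{\del V}{\del \theta} = 0\) requires \(\cos^2 \theta - \sin^2 \theta = 0\text{,}\) so \(\theta = \frac{\pi}{4}\text{.}\) Setting \(\frac{\del V}{\del \phi} = 0\) requires \(2 \sin \phi \cos^2 \phi - \sin^3 \phi = 0\text{,}\) so \(\tan^2 \phi = 2\) and \(\cos \phi = \frac{1}{\sqrt{3}}\text{.}\) At this critical point, \(x = y = z = \frac{r}{\sqrt{3}}\text{,}\) so each side of the prism is \(\frac{2r}{\sqrt{3}}\) and the optimal prism is a cube.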
Subsection 11.1.6 Global Extrema
The above analysis was all about local maxima and minima. We can also ask for global maxima and minima. As in the single-variable case, we find these by looking at all the local extrema as well as the boundary; the global maximum or minimum might be a boundary point which is not a critical point. We do have an existence proposition, which relies on the topology of the domain.
Proposition 11.1.18.
Let \(C\) be a closed and bounded set in \(\RR^n\) and \(f: C \rightarrow \RR\) a continuous function. Then \(f\) has at least one global maximum and at least one global minimum, either at a local maximum/minimum or on the boundary of \(C\text{.}\)
In general, finding the maximum and minimum on the boundary can be quite difficult. In the next section, we'll give another technique for finding such maxima and minima.