I start with the definition of extreme values. This extends the definition from Calculus I for single variable functions.
Definition5.1.1.
Let \(f: \RR^n \rightarrow \RR\) be a differentiable scalar function. Then a point \(p \in \RR^n\) is a local maximum of \(f\) if there exists \(\epsilon > 0\) such that \(f(p) \geq f(q)\) for all \(q \in B(p,
\epsilon)\text{.}\) Similarly, a point \(p \in \RR^n\) is a local minimum of \(f\) if there exists \(\epsilon >
0\) such that \(f(p) \leq f(q)\) for all \(q \in B(p,
\epsilon)\text{.}\)
This definition clarifies that if a point is a minimum or maximum, it is a peak or valley in all directions. It needs to be above or below nearby function values in every direction in the domain. In two variables, I also want to classify a new kind of extreme value.
Definition5.1.2.
Let \(f: \RR^2 \rightarrow \RR\) be a differentiable function. Then a point \(p \in \RR^2\) is a saddle point of \(f\) if there are two unit directions \(u\) and \(v\) in \(\RR^2\) and \(\epsilon > 0\) such that \(f(p \pm \delta u) \geq f(p)\) and \(f(p \pm
\delta v) \leq f(p)\) for all \(0 \lt \delta \lt \epsilon\text{.}\)
A saddle point is both a minimum and a maximum: it is a maximum in some direction \(u\) and a minimum in some other direction \(v\text{.}\) It is called a saddle point for the saddle-like shape that results from this situation for graphs of two-variable functions. In higher dimensions, a saddle point is any point which is a maximum in some number of directions and a minimum in all other directions (for some linearly independent set of directions in the domain).
Subsection5.1.2Finding Extrema
A key observation from Calculus I is that maxima and minima are found where \(f^\prime(x) = 0\text{.}\) (Though \(f^\prime =0\) didn’t guarantee an extreme value, as in the example of \(f(x)
= x^3\) at \(x=0\).)
Proposition5.1.3.
Let \(f: \RR^n \rightarrow \RR\) be a differentiable function. Assume \(f\) has a minimum, maximum, saddle point, or other unclassified extremum at \(p \in \RR^n\text{.}\) Then \(\nabla f (p) = 0\text{.}\)
The gradient measures the direction of greatest change. At a minimum, maximum or saddle point, there is no such direction, so the gradient is zero. As with single-variable calculus, the implication only goes one direction: there may be points where the gradient is zero but the point is neither a minimum, maximum nor saddle point.
The gradient is the vector of partial derivatives, so it is important to note that all the partial derivatives must vanish. If only some of them vanish, the graph may have interesting behaviour, but the point is not a maximum or minimum.
Subsection5.1.3Examples
Figure5.1.4.The function \(f(x,y) = x-y^2+3\text{.}\)
For example, consider the function \(f(x,y) = x-y^2+3\text{,}\) shown in Figure 5.1.4. The \(y\) partial is \(f_y(x,y) = -2y\text{,}\) which vanishes everywhere along the \(x\) axis (when \(y=0\)). However, the other partial is \(f_x = 1\text{,}\) which never vanishes. This means that all points \((x,0)\) are potentially critical in \(y\) but not in \(x\text{.}\) This function is an ascending/descending ridge (with slope \(1\)) above the \(x\) axis.
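As a quick check (my own sketch using the SymPy library, not a tool the text uses), the partial derivatives of this ridge function can be computed symbolically, confirming there are no critical points.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x - y**2 + 3

fx = sp.diff(f, x)  # 1: never vanishes
fy = sp.diff(f, y)  # -2*y: vanishes exactly when y = 0
print(fx, fy)

# Both partials must vanish for a critical point; fx = 1 rules this out.
print(sp.solve([fx, fy], [x, y]))  # [] -- no critical points
```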
Since scalar fields are defined in several dimensions, the collection of maxima/minima can be complicated sets. The function \(f(x,y) = \cos (x+y)\text{,}\) shown in Figure 5.1.6, has a maximum whenever \(x+y\) is an even multiple of \(\pi\text{.}\) Each of those sets is a whole line: \(x+y=0\text{,}\) \(x+y = 2\pi\) and so on. For functions of two variables, it is easy to have lines or curves of maximum or minimum points. In higher dimensions, there can be surfaces or hypersurfaces of maximum or minimum points.
I want to classify critical points: points where \(\nabla f =
0\text{.}\) We can do this informally by looking at each variable individually. If the point is a maximum in all of \(x_1, x_2,
\ldots, x_n\text{,}\) then it is a maximum according to our definition above. Likewise, if it is a minimum in all the variables, it is a minimum. If it is a maximum in some of the variables and a minimum in others, it is a saddle point. However, if even one of the variables has no max/min behaviour (like \(f(x) = x^3\) at \(x=0\)), then the point is neither a maximum, minimum nor saddle point.
Subsection5.1.4Hessian Matrices
This informal approach is reasonable, but it would be nice to have a more formal method for determining the behaviour. In single variable calculus, a formal method is the second-derivative test. If \(a\) is a critical point, then it is a maximum if \(f^{\prime \prime}(a)\) is negative, a minimum if \(f^{\prime \prime}(a)\) is positive, and the test is inconclusive if \(f^{\prime \prime}(a) = 0\text{.}\)
I’d like to generalize this, but we have many second derivatives: all of the possible mixed and non-mixed second partials. One way I can organize all these second partials is in a matrix.
Definition5.1.8.
The matrix of all the second partial derivatives of a scalar function \(f\) is called the Hessian Matrix.
Here is the Hessian matrix for \(f(x,y)\) in two variables.
\begin{equation*}
\begin{pmatrix}
\dfrac{\del^2 f}{\del x^2} \amp \dfrac{\del^2 f}{\del y \del x} \\[1em]
\dfrac{\del^2 f}{\del x \del y} \amp \dfrac{\del^2 f}{\del y^2}
\end{pmatrix}
\end{equation*}
Here is the Hessian matrix for \(f(x,y,z)\) in three variables.
\begin{equation*}
\begin{pmatrix}
\dfrac{\del^2 f}{\del x^2} \amp \dfrac{\del^2 f}{\del y \del
x} \amp \dfrac{\del^2 f}{\del z \del x} \\[1em]
\dfrac{\del^2 f}{\del x \del y} \amp \dfrac{\del^2 f}{\del
y^2} \amp \dfrac{\del^2 f}{\del z \del y} \\[1em]
\dfrac{\del^2 f}{\del x \del z} \amp \dfrac{\del^2 f}{\del y
\del z} \amp \dfrac{\del^2 f}{\del z^2}
\end{pmatrix}
\end{equation*}
Note that the Hessian is not the Jacobian matrix from before; that matrix had only first derivatives. The Hessian matrix only applies to scalar-valued functions (outputs in \(\RR\)), is always square, and lists all the possible second partials.
The Hessian matrix captures all of the information about the second derivative of this function, but it is often too unwieldy to be used to determine the behaviour of critical points. However, I can use a useful tool from linear algebra to get specific information out of a matrix: the determinant. Let \(D\) be the determinant of the Hessian matrix. For \(f(x,y)\text{,}\) \(D\) has the following form (using Clairaut’s theorem to simplify the mixed partials).
\begin{equation*}
D = \frac{\del^2 f}{\del x^2} \frac{\del^2 f}{\del y^2} -
\left( \frac{\del^2 f}{\del x \del y} \right)^2
\end{equation*}
For functions of two variables, the determinant of the Hessian determines the behaviour.
Proposition5.1.9.
Let \(f: \RR^2 \rightarrow \RR\) be a \(C^2\) scalar field and let \((a,b)\) be a critical point. Then there are four cases.
If \(D(a,b) > 0\) and \(\frac{\del^2 f}{\del x^2}
(a,b) > 0\) then the critical point is a minimum.
If \(D(a,b) > 0\) and \(\frac{\del^2 f}{\del x^2}
(a,b) \lt 0\) then the critical point is a maximum.
If \(D(a,b) \lt 0\) then the critical point is a saddle point.
If \(D(a,b) = 0\) then the test is inconclusive.
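The four cases of this test can be implemented directly. The following sketch (my own illustration using the SymPy library, not part of the text) classifies a critical point of a two-variable function by computing \(D\) from the second partials.

```python
import sympy as sp

x, y = sp.symbols('x y')

def classify(f, point):
    """Second-derivative test for f(x, y) at a critical point (a, b)."""
    a, b = point
    fxx = sp.diff(f, x, 2).subs({x: a, y: b})
    fyy = sp.diff(f, y, 2).subs({x: a, y: b})
    fxy = sp.diff(f, x, y).subs({x: a, y: b})
    D = fxx * fyy - fxy**2  # determinant of the Hessian at (a, b)
    if D > 0:
        return 'minimum' if fxx > 0 else 'maximum'
    if D < 0:
        return 'saddle point'
    return 'inconclusive'

print(classify(x**2 + y**2, (0, 0)))  # minimum
print(classify(x**2 - y**2, (0, 0)))  # saddle point
```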
This proposition can be generalized to higher dimensions, but it requires more machinery from linear algebra, namely eigenvalues. Clairaut’s theorem means that the Hessian is always a symmetric matrix, so it always has a full set of real eigenvalues. The general proposition classifies the extrema using those eigenvalues.
Proposition5.1.10.
Let \(f: \RR^n \rightarrow \RR\) be a \(C^2\) scalar field with \(H\) its Hessian matrix. Let \(u \in
\RR^n\) be a critical point and let \(H(u)\) be the Hessian evaluated at the point \(u\text{.}\) Then there are four cases.
If \(H(u)\) is not invertible (has determinant 0, has 0 as an eigenvalue), then the test is inconclusive.
If all the eigenvalues of \(H(u)\) are positive, then the critical point is a local minimum.
If all the eigenvalues of \(H(u)\) are negative, then the critical point is a local maximum.
If the eigenvalues of \(H(u)\) are a mix of positive and negative, then the critical point is a saddle point or a higher dimensional analogue mixing minima in some directions and maxima in others.
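The eigenvalue version of the test can also be checked computationally. Here is a short sketch (my own, using the SymPy library) for a three-variable function whose Hessian eigenvalues are all positive, confirming a local minimum at the origin.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + y**2 + z**2          # has a minimum at the origin

H = sp.hessian(f, (x, y, z))    # constant matrix diag(2, 2, 2)
eigs = H.subs({x: 0, y: 0, z: 0}).eigenvals()
print(eigs)  # {2: 3} -- eigenvalue 2 with multiplicity 3, all positive
```

All eigenvalues positive means the critical point is a local minimum; a mix of signs would indicate a saddle point.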
\(f(x,y) = x^2 y^2 + y^2\) is shown in Figure 5.1.11. I calculate the gradient to find the potential extrema. Then I calculate the second partials and the determinant of the Hessian matrix to classify the extrema.
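These gradient and Hessian computations can be sketched symbolically (using the SymPy library, my own choice of tool, not the text's).

```python
import sympy as sp

x, y, a = sp.symbols('x y a')
f = x**2 * y**2 + y**2

grad = [sp.diff(f, v) for v in (x, y)]
print(grad)  # [2*x*y**2, 2*x**2*y + 2*y]
# The y partial factors as 2*y*(x**2 + 1), which forces y = 0,
# so the critical points are all points of the form (a, 0).

H = sp.hessian(f, (x, y))
D = H.det().subs({x: a, y: 0})
print(sp.simplify(D))  # 0 -- the second derivative test is inconclusive
```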
At all the critical points \((a,0)\text{,}\) the second derivative test has \(D=0\text{,}\) which is inconclusive. I have to investigate directly. I can see that \(f(a,0) =
0\text{.}\) But \(f(a,b)\) for \(b\) any small non-zero number takes the value \(a^2 b^2 + b^2\text{,}\) which is always positive. Therefore, I can conclude that all the critical points \((a,0)\) are local minima. In Figure 5.1.11, I can see that all along the \(x\) axis the values stay at \(0\text{,}\) which is the lowest output of the function.
\(f(x,y) = x^2 + 2y^2 - 4x + 4y + 6\) is shown in Figure 5.1.13. I calculate the gradient to find the potential extrema. Then I calculate the second partials and the determinant of the Hessian matrix to classify the extrema.
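As a computational check (a SymPy sketch of my own, not the text's worked solution), the gradient gives a single critical point and the Hessian determinant classifies it.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + 2*y**2 - 4*x + 4*y + 6

# Gradient (2x - 4, 4y + 4) vanishes at the single critical point (2, -1).
crit = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y])
print(crit)  # {x: 2, y: -1}

H = sp.hessian(f, (x, y))
print(H.det(), H[0, 0])  # 8 2 -- D > 0 and f_xx > 0: a local minimum
```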
\(f(x,y) = \frac{8}{3} x^3 + 4y^3 - x^4 - y^4\) is shown in Figure 5.1.15. I calculate the gradient to find the potential extrema. Then I calculate the second partials and the determinant of the Hessian matrix to classify the extrema.
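Here is a SymPy sketch of the computation (my own check, not the text's worked solution): the partials factor as \(4x^2(2-x)\) and \(4y^2(3-y)\text{,}\) giving four critical points.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Rational(8, 3)*x**3 + 4*y**3 - x**4 - y**4

# fx = 4x^2(2 - x) and fy = 4y^2(3 - y) give x in {0, 2}, y in {0, 3}.
crit = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
print(len(crit))  # 4 critical points: (0,0), (0,3), (2,0), (2,3)

H = sp.hessian(f, (x, y))
for pt in crit:
    print(pt, H.det().subs(pt), H[0, 0].subs(pt))
# Only (2, 3) gives D > 0 with f_xx < 0; at the other points D = 0.
```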
The point \((2,3)\) is a local maximum, but the test is inconclusive at all other points. As can be seen in Figure 5.1.15, there is a maximum above \((2,3)\text{.}\) At the other three critical points, I can observe a momentary flattening of the graph, but there are neither maxima, minima, nor saddle points. The zero Hessian for these points makes sense; they are like the critical point at \(x=0\) of the cubic \(f(x) = x^3\text{,}\) which has a zero derivative, but no extremum.
I’ll try a more geometric optimization problem: what is the largest rectangular prism that can fit in a sphere of radius \(r\text{?}\) I’ll assume that the prism is centrally located in the sphere, which means that its shape is entirely determined by one of its vertices on the edge of the sphere. If that vertex is \((x,y,z)\text{,}\) then the volume of the prism is \(2x \times 2y \times 2z\text{.}\)
I’d like to work in spherical coordinates instead of \(x\text{,}\)\(y\) and \(z\text{.}\) The radius \(r\) is fixed, but \(\theta\) (longitude) and \(\phi\) (colatitude) will vary.
\begin{align*}
h \amp = 2r \cos \phi\\
w \amp = 2r \sin \phi \cos \theta\\
l \amp = 2 r \sin \phi \sin \theta\\
V \amp = hwl = 8r^3 \cos \phi \sin^2 \phi \cos \theta
\sin \theta = 4 r^3 \sin (2\theta) (\cos \phi - \cos^3
\phi)
\end{align*}
Then I can optimize the function \(V(\theta, \phi)\text{.}\)
I didn’t do the calculation, but it is reasonable to check that the critical point represents a maximum. The resulting volume is the volume of a cube of side length \(\frac{2r}{\sqrt{3}}\text{,}\) which seems, intuitively, like the right kind of length.
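The omitted calculation can be verified symbolically. This SymPy sketch (my own check, not part of the text) confirms that \(\theta = \pi/4\) and \(\cos \phi = 1/\sqrt{3}\) is a critical point of \(V\) and that the resulting volume equals that of a cube of side \(2r/\sqrt{3}\text{.}\)

```python
import sympy as sp

theta, phi, r = sp.symbols('theta phi r', positive=True)
V = 4*r**3 * sp.sin(2*theta) * (sp.cos(phi) - sp.cos(phi)**3)

# Candidate critical point: theta = pi/4, cos(phi) = 1/sqrt(3).
pt = {theta: sp.pi/4, phi: sp.acos(1/sp.sqrt(3))}
assert sp.simplify(sp.diff(V, theta).subs(pt)) == 0
assert sp.simplify(sp.diff(V, phi).subs(pt)) == 0

# The maximal volume equals the volume of a cube of side 2r/sqrt(3).
side = 2*r/sp.sqrt(3)
print(sp.simplify(V.subs(pt) - side**3))  # 0
```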
Subsection5.1.6Global Extrema
The above analysis was all about local maxima and minima. I can also ask for global maxima and minima. As in the single-variable case, I do this by checking all the local extrema as well as the boundary. The maximum or minimum might be a boundary point which is not a critical point. There is an existence proposition, which relies on the topology of the domain.
Proposition5.1.18.
Let \(C\) be a closed and bounded set in \(\RR^n\) and \(f: C
\rightarrow \RR\) a continuous function. Then \(f\) has at least one global maximum and at least one global minimum, either at a local maximum/minimum or on its boundary.
In general, finding the maximum and minimum on the boundary can be quite difficult.