Skip to main content

Course Notes for Calculus I

Section 7.1 The Chain Rule

Subsection 7.1.1 Derivatives of Compositions

The rules so far give the tools to calculate many derivatives. However, they don’t address composition. How do I differentiate \(\sin x^2\text{?}\) The rule that addresses derivatives of composed functions is called the chain rule. The chain rule has two notations, which I’ll present in a proposition.

Proof.

The proof of the chain rule is tricky, but the following steps give the idea. The argument is a little imprecise in the fourth step, where I replace \(g(x)\) with the temporary variable \(u\text{,}\) since I haven’t established that such an operation is valid. In the proof, let \(F = f \circ g\text{.}\)
\begin{align*} \frac{dF}{dx}(a) \amp = \lim_{x \rightarrow a} \frac{f \circ g (x) - f \circ g(a)}{x-a} = \lim_{x \rightarrow a} \frac{f(g(x)) - f(g(a))}{x-a}\\ \amp = \lim_{x \rightarrow a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \frac{g(x) - g(a)}{x-a}\\ \amp = \lim_{x \rightarrow a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \lim_{x \rightarrow a} \frac{g(x) - g(a)}{x-a}\\ \amp = \lim_{u \rightarrow g(a)} \frac{f(u) - f(g(a))}{u - g(a)} \lim_{x \rightarrow a} \frac{g(x) - g(a)}{x-a}\\ \amp = \left. \frac{df}{du} \right|_{u=g(a)} \left. \frac{dg}{dx} \right|_{x=a} = f^\prime(g(a)) g^\prime(a) \end{align*}
The variety of notations is somewhat a result of the difficult nature of the rule—it’s hard to clearly describe. The key idea is to think of composition in terms of inside and outside functions. The derivative of the composition is the derivative of the outside (\(f^\prime\)) evaluted at the inside (\(f^\prime(g(x))\)), then multiplied by the derivative of the inside (\(g^\prime\)). There is one other way of expressing the chain rule, which looks the most elegant.
\begin{equation*} \frac{df}{dx} = \frac{df}{dg} \frac{dg}{dx} \end{equation*}
In this last version of the chain rule, I treat \(df\text{,}\) \(dg\) and \(dx\) as something like pieces of fractions in Leibniz notation (\(\frac{df}{dx}\)). These terms are not fractions, but they act a bit like fractions. (These notations come from a different historical treatment of calculus, where the rules were constructed with an arithmetic of ‘infinitesimals’ — infinitely small but non-zero numbers. \(df\text{,}\) \(dg\) and \(dx\) were all infinitesimals in that alternative construction. This way of studying calculus fell out of favour in the 20th century, but some of the notation lingers.)

Example 7.1.2.

Here are some chain rule examples. In these examples, I’ll already use the rule \(\frac{d}{dx} \ln x = \frac{1}{x}\text{,}\) even though the rule will be proved later in this section. In the first example, the outside function is \(e^u\) and the inside function is \(ax\text{.}\)
\begin{equation*} \frac{d}{dx} e^{ax} = \frac{d}{du} e^u \Bigg|_{u = ax} \frac{d}{dx} (ax) = ae^{ax} \end{equation*}
In the next, the outside function is \(\sin u\) and the inside functon is \(3x\text{.}\) After the derivatives, I’ll move the constant to the front of the function. This isn’t necessary, but fits the general conventions of writing constants and polynomials to the left of trigonometric or exponential functions.
\begin{equation*} \frac{d}{dx} \sin 3x = \frac{d}{du} \sin u \Bigg|_{u = 3x} \frac{d}{dx} (3x) = 3 \cos 3x \end{equation*}
In the next, the outside function is \(e^u\) and the inside function is \(x^2 + 2\text{.}\)
\begin{equation*} \frac{d}{dx} e^{x^2+2} = \frac{d}{du} e^u \Bigg|_{u = x^2 + 2} \frac{d}{dx} (x^2 + 2) = (2x) e^{x^2+2} \end{equation*}
In the next, the outside function is \(\ln u\) and the inside function is \(5x^3 + 7x + 1\text{.}\)
\begin{equation*} \frac{d}{dx} \ln (5x^3 + 7x + 1) = \frac{d}{du} \ln u \Bigg|_{u = 5x^3 + 7x + 1} \frac{d}{dx} (5x^3 + 7x + 1) = \frac{15x^2 + 7}{5x^3 + 7x + 1} \end{equation*}
In the last example, this is a triple compsition. I use the chain rule twice, making the outside function as easy as possible in each step. The first outside function is \(\cos u\) and the second outside function is \(e^v\text{.}\)
\begin{align*} \frac{d}{dx} \cos ( e^{12 x}) \amp = \frac{d}{du} \cos y \Bigg|_{u=e^{12 x}} \frac{d}{dx} e^{12 x} = -\sin u \Bigg|_{u=e^{12x}} \\ \amp = -\sin (e^{12 x}) \frac{d}{dv} e^v \Bigg|_{v = 12 x} \frac{d}{dv} 12 x = -\sin (e^{12 x}) e^v \Bigg|_{v = 12x} (12) \\ \amp = -12 \sin (e^{12 x}) e^{12 x} \end{align*}

Example 7.1.3.

I can use the chain rule to differentiate arbitrary exponential functions. I gave an argument for this rule in Section 6.3 using the limit definition, but the argument using the chain rule is more direct (one the derivative of \(e^x\) is known, at least). This derivative uses a common exponential trick, where I write \(a^x\) as \(e^{\ln a^x} = e^{a \ln x}\text{.}\) Since I applied the exponential and the logarithm, this doesn’t change the function, but this forms allows the use of the chain rule and then the product rule.
\begin{equation*} \frac{d}{dx} a^x = \frac{d}{dx} e^{x\ln a} = \frac{d}{du} e^u \Bigg|_{u = x \ln a} \frac{d}{dx} (x \ln a) = e^{x \ln a} \ln a = a^x \ln a \end{equation*}

Example 7.1.4.

This derivative also makes use of the same exponential trick as the previous examples. In the chain rule step, the derivative of the inside function (which is \(x \ln x\)) requires the product rule.
\begin{equation*} \frac{d}{dx} x^x = \frac{d}{dx} e^{x\ln x} = e^{x\ln x} \frac{d}{dx} x\ln x = x^x (\ln x \frac{d}{dx} x + x \frac{d}{dx} \ln x) = x^x(\ln x + 1) \end{equation*}

Subsection 7.1.2 Derivatives of Inverse Functions

In the previous examples, I used the rule \(\frac{d}{dx} \ln x = \frac{1}{x}\) without justification. What is this justification? The logarithm is the inverse of the exponential, which leads to thinking about the derivatives of inverse functions. There is a rule for such derivatives. I will write \(x = f \circ f^{-1}(x)\text{,}\) differentiate both sides using the chain rule, and rearrange to get an expression for \(\frac{d}{dx} f^{-1}(x)\text{.}\)
\begin{align*} 1 \amp = \frac{d}{dx} x = \frac{d}{dx} f \circ f^{-1} = f^\prime ( f^{-1}(x)) \frac{d}{dx} f^{-1}(x)\\ \frac{d}{dx} f^{-1}(x) \amp = \frac{1}{f^\prime(f^{-1}(x))} \end{align*}
This is now a reliable method for differentiating inverse functions as long as I know the derivative of the original. I will now use this to differentiate \(\ln x\text{.}\)
\begin{equation*} \frac{d}{dx} \ln x = \frac{1}{e^{\ln x}} = \frac{1}{x} \end{equation*}
I can also prove derivative rules for the inverse trig functions using this inverse derivative rule.
\begin{align*} \frac{d}{dx} \arccos x \amp = \frac{1}{-\sin(\arccos x)} = \frac{-1}{\sqrt{1-\cos^2(\arccos x)}} = \frac{-1}{\sqrt{1-x^2}}\\ \frac{d}{dx} \arcsin x \amp = \frac{1}{\cos(\arcsin x)} = \frac{1}{\sqrt{1-\sin^2(\arcsin x)}} = \frac{1}{\sqrt{1-x^2}}\\ \frac{d}{dx} \arctan x \amp = \frac{1}{\sec^2(\arctan x)} = \frac{1}{1+\tan^2(\arctan x)} = \frac{1}{1+x^2} \end{align*}

Example 7.1.5.

Here are two calculations of chain rule derivatives that make use of some of the inverse derivatives I just calculated. In the first inside function, I need linearity as well. In the second inside function, I need both linearity and the product rule to differentiate \(x + 3x \arcsin x\text{.}\)
\begin{align*} \frac{d}{dx} \ln(x^2+1) \amp = \frac{1}{x^2+1} \frac{d}{dx} x^2+1 = \frac{2x}{x^2+1}\\ \frac{d}{dx} \ln (x + 3x \arcsin x) \amp = \frac{1}{x+3\arcsin x} \left( 1 + 3\arcsin x + \frac{3x}{\sqrt{1-x^2}} \right) \end{align*}

Example 7.1.6.

Usually, when I differentiate, I get complicated expressions which I can’t simplify into nice forms. Every now and then, there is an example that allows a very nice simplifcation. Consider this difficult derivative. I haven’t annotated all the algebra here, but what I am doing is the first step is differentiating using linearity on the two pieces, then the product rule and the chainr rule of the first piece of the sum, and finally the chain rule followed by linearity followed by the chain rule again on the second piece of the sum. I’ve gone just to the derivative result and then done a bunch of algebraic manipulations to simplify, showing that the complicated result in this special case will condense down to something relatively elegant.
\begin{align*} \amp \frac{d}{dx} \left( \frac{x}{2} \sqrt{x^2-9} - \frac{9}{2} \ln( x + \sqrt{x^2-9} ) \right) \\ \amp = \frac{\sqrt{x^2-9}}{2} + \frac{x 2x}{4\sqrt{x^2-9}} - \frac{9}{2} \frac{1}{x+ \sqrt{x^2-9}} \left( 1 + \frac{2x}{2\sqrt{x^2-9}} \right)\\ \amp = \frac{\sqrt{x^2-9}}{2} + \frac{x^2}{2\sqrt{x^2-9}} - \frac{9}{2} \frac{2 \sqrt{x^2-9} + 2x}{2\sqrt{x^2-9}(x+ \sqrt{x^2-9})}\\ \amp = \frac{\sqrt{x^2-9}}{2} + \frac{x^2}{2\sqrt{x^2-9}} - \frac{9}{2} \frac{\sqrt{x^2-9} + x}{\sqrt{x^2-9}(x+ \sqrt{x^2-9})}\\ \amp = \frac{\sqrt{x^2-9}}{2} + \frac{x^2-9}{2\sqrt{x^2-9}}\\ \amp = \frac{\sqrt{x^2-9}}{2} + \frac{\sqrt{x^2-9}}{2} = \sqrt{x^2-9} \end{align*}