Why does it work?
This page is for those curious about how the
chain rule works. If you are interested, read slowly and
carefully with pencil in hand.
Now that we know how to use the chain rule, let's see
why it works.
First, recall the definition of the derivative:
$$f'(x) = \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} = \lim_{\Delta x
\to 0} \frac{\Delta f}{\Delta x},$$
where $\Delta f = f(x+h)-f(x)$ is the change in $f(x)$ (the rise)
and $\Delta x=h$ is the change in $x$ (the run).
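As a quick illustration (the choice $f(x)=x^2$ is ours, picked only because the algebra is short), the definition gives
$$\frac{\Delta f}{\Delta x} = \frac{(x+h)^2 - x^2}{h} = \frac{2xh + h^2}{h} = 2x + h,$$
which approaches $2x$ as $h \to 0$, so $f'(x) = 2x$.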
From change in $x$ to change in $y$
In other words, whenever $\Delta x$ is small, $\frac{\Delta
f}{\Delta x}$ is close to $f'(x)$.
Another way of saying this is: $$
\text{if } \Delta x \approx 0, \text{ then }
\Delta f \approx f'(x)\, \Delta x.$$
Multiplying by $f'(x)$ converts changes in $x$ into (approximate)
changes in $f(x)$. We say that $f'(x)$ is a conversion factor
from changes in $x$ to changes in $f(x)$.
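If you would like to see the conversion factor at work with actual numbers, here is a minimal Python sketch. The function $f(x)=x^2$, the point $x=3$, and the helper name `compare` are our own illustrative choices. For this particular $f$, the gap between the true change and the estimate is exactly $(\Delta x)^2$, so it shrinks much faster than $\Delta x$ itself.

```python
def f(x):
    """Illustrative function (an assumption for this sketch): f(x) = x^2."""
    return x ** 2


def f_prime(x):
    """Derivative of f, used as the conversion factor."""
    return 2 * x


def compare(x, dx):
    """Return (actual change in f, estimate f'(x) * dx)."""
    actual = f(x + dx) - f(x)      # the true rise
    estimate = f_prime(x) * dx     # conversion factor times the run
    return actual, estimate


if __name__ == "__main__":
    x = 3.0
    for dx in (0.1, 0.01, 0.001):
        actual, estimate = compare(x, dx)
        print(f"dx={dx}: actual={actual:.6f}  "
              f"estimate={estimate:.6f}  gap={actual - estimate:.6f}")
```

Try replacing `f` and `f_prime` with another function and its derivative; the gap should still become negligible compared with `dx` as `dx` shrinks.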
From change in $x$ to change in $y$, passing through $u$
Now suppose that $y=f(u)$ and that $u=g(x)$. Then
$\frac{dy}{du}=f'(u)$ is the conversion factor between changes in
$u$ and changes in $y$. That is, $$\Delta y \approx f'(u) \Delta
u.$$
Meanwhile, $\frac{du}{dx}=g'(x)$ is the conversion factor between
changes in $x$ and changes in $u$. That is, $$\Delta u \approx
g'(x) \Delta x.$$
Put those two results together to get $$\Delta y \approx f'(u)
\Delta u \approx f'(u)g'(x) \Delta x,$$ which means that the
conversion factor from $\Delta x$ to $\Delta y$ is $f'(u)g'(x)$,
where $u=g(x)$, or equivalently that $$\frac{dy}{dx}=
\frac{dy}{du}\frac{du}{dx} = f'(g(x))\,g'(x).$$
That's the chain rule!
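To check the result numerically, the following Python sketch compares the rise-over-run quotient $\Delta y/\Delta x$ with the product $f'(u)g'(x)$. The specific functions ($y=\sin u$, $u=x^2$), the point $x=1$, and the helper names are assumptions chosen for illustration; any differentiable pair would do.

```python
import math


def g(x):
    """Inner function u = g(x); x^2 is an illustrative choice."""
    return x ** 2


def g_prime(x):
    """Conversion factor from changes in x to changes in u."""
    return 2 * x


def f(u):
    """Outer function y = f(u); sin(u) is an illustrative choice."""
    return math.sin(u)


def f_prime(u):
    """Conversion factor from changes in u to changes in y."""
    return math.cos(u)


def chain_rule_slope(x):
    """dy/dx as the product of the two conversion factors, with u = g(x)."""
    u = g(x)
    return f_prime(u) * g_prime(x)


def difference_quotient(x, dx):
    """Rise over run for the composite y = f(g(x))."""
    return (f(g(x + dx)) - f(g(x))) / dx


if __name__ == "__main__":
    x = 1.0
    for dx in (1e-2, 1e-4, 1e-6):
        print(f"dx={dx}: quotient={difference_quotient(x, dx):.8f}  "
              f"chain rule={chain_rule_slope(x):.8f}")
```

As `dx` shrinks, the quotient should settle toward the chain-rule value, which is just what the conversion-factor argument predicts.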