Why does it work?
Now that we know how to use the chain rule, let's see why it works.
First recall the definition of the derivative:
$$f'(x) = \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} =
\lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x},$$
where $\Delta f = f(x+h)-f(x)$ is the change in $f(x)$ (the rise) and
$\Delta x=h$ is the change in $x$ (the run).
From change in $x$ to change in $y$
In other words, whenever $\Delta x$ is small,
$\frac{\Delta f}{\Delta x}$ is close to $f'(x)$.
Another way of saying that is:
$$
\text{if } \Delta x \approx 0, \text{ then } \Delta f \approx f'(x)\, \Delta x.$$
Multiplying by $f'(x)$ converts changes in $x$ into
changes in $f(x)$. We say that $f'(x)$ is a conversion factor from
changes in $x$ to changes in $f(x)$.
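To see the conversion factor at work, here is a minimal numerical sketch. The function $f(x) = x^2$, the point $x = 3$, and the helper names are illustrative choices, not part of the discussion above. The sketch compares the actual change $\Delta f$ with the estimate $f'(x)\Delta x$ for a few small values of $\Delta x$.

```python
# Illustrative example: f(x) = x^2, so f'(x) = 2x.
# These names and values are assumptions chosen for the demonstration.

def f(x):
    return x ** 2


def f_prime(x):
    return 2 * x


def change_in_f(x, dx):
    """Actual change in f: f(x + dx) - f(x) (the rise)."""
    return f(x + dx) - f(x)


def linear_estimate(x, dx):
    """Estimated change in f: the conversion factor f'(x) times dx."""
    return f_prime(x) * dx


x = 3.0
for dx in (0.1, 0.01, 0.001):
    actual = change_in_f(x, dx)
    estimate = linear_estimate(x, dx)
    # The gap between the two shrinks much faster than dx itself.
    print(f"dx={dx}: actual={actual:.6f}, estimate={estimate:.6f}, "
          f"gap={actual - estimate:.2e}")
```

For this particular $f$, the gap between the two numbers is exactly $(\Delta x)^2$, which is why the estimate improves so quickly as $\Delta x$ shrinks.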
From change in $x$ to change in $y$, passing through $u$
Now suppose that $y=f(u)$ and that $u=g(x)$.
Then $\frac{dy}{du}=f'(u)$ is the conversion factor
between changes in $u$ and changes in $y$. That is,
$$\Delta y \approx f'(u) \Delta u.$$
Meanwhile, $\frac{du}{dx}=g'(x)$ is the conversion
factor between changes in $x$ and changes in $u$.
That is, $$\Delta u \approx g'(x) \Delta x.$$
Put those two results together to get
$$\Delta y \approx f'(u) \Delta u \approx f'(u)g'(x) \Delta x,$$
which means that the conversion factor from $\Delta x$ to $\Delta y$
is $f'(u)g'(x) = f'(g(x))\,g'(x)$, or equivalently that
$$\frac{dy}{dx}= \frac{dy}{du}\frac{du}{dx}.$$
That's the chain rule!
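The argument above can be checked numerically. The sketch below assumes the illustrative pair $y = f(u) = \sin u$ and $u = g(x) = x^2$ (these functions and the helper names are not from the text). It compares the product of the two conversion factors, $f'(u)g'(x)$, with the difference quotient $\Delta y / \Delta x$ for a small $\Delta x$.

```python
import math

# Illustrative choice: y = f(u) = sin(u) and u = g(x) = x^2.
# These functions are assumptions made for the demonstration.

def g(x):
    return x ** 2


def g_prime(x):
    return 2 * x


def f(u):
    return math.sin(u)


def f_prime(u):
    return math.cos(u)


def chain_rule_slope(x):
    """Product of the two conversion factors: f'(u) * g'(x) with u = g(x)."""
    u = g(x)
    return f_prime(u) * g_prime(x)


def difference_quotient(x, dx):
    """Change in y divided by change in x, for the composite y = f(g(x))."""
    return (f(g(x + dx)) - f(g(x))) / dx


x = 1.0
dx = 1e-6
print("chain rule slope:   ", chain_rule_slope(x))
print("difference quotient:", difference_quotient(x, dx))
```

The two printed values should agree to several decimal places, and the agreement should improve as `dx` is made smaller (until floating-point rounding takes over).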