Why does it work?
Now that we know how to use the chain rule, let's see why it works.
First recall the definition of derivative:
f'(x)=\lim_{h \to 0} \frac{f(x+h)-f(x)}{h} = \lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x},
where \Delta f = f(x+h)-f(x) is the change in f(x) (the rise) and
\Delta x=h is the change in x (the run).
From change in x to change in y
In other words, whenever \Delta x is small,
\frac{\Delta f}{\Delta x} is close to f'(x).
Another way of saying that is:
\text{if } \Delta x \approx 0, \,\, \text{ then }\,
\Delta f \approx f'(x) \Delta x.
Multiplying by f'(x) converts small changes in x into the
corresponding changes in f(x). We say that f'(x) is a conversion factor from
changes in x to changes in f(x).
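To see the conversion factor at work, here is a small numerical sketch. The function f(x) = x^2, the point x = 3 and the step size are illustrative choices, not part of the argument above.

```python
# Illustrative choice (an assumption for this sketch): f(x) = x^2, so f'(x) = 2x.
def f(x):
    return x ** 2

def f_prime(x):
    return 2 * x

x = 3.0
dx = 1e-3  # a small change in x (the run)

delta_f = f(x + dx) - f(x)    # the actual change in f(x) (the rise)
estimate = f_prime(x) * dx    # the change predicted by the conversion factor
error = abs(delta_f - estimate)

print("actual change in f:   ", delta_f)
print("f'(x) times change in x:", estimate)
print("difference:           ", error)
```

For this particular f the difference is exactly (\Delta x)^2 up to rounding, which is far smaller than \Delta x itself. That is what the \approx sign is hiding.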
From change in x to change in y, passing through u
Now suppose that y=f(u) and that u=g(x).
Then \frac{dy}{du}=f'(u) is the conversion factor
between changes in u and changes in y. That is,
\Delta y \approx f'(u) \Delta u.
Meanwhile, \frac{du}{dx}=g'(x) is the conversion
factor between changes in x and changes in u.
That is, \Delta u \approx g'(x) \Delta x.
Put those two results together to get
\Delta y \approx f'(u) \Delta u \approx f'(u)g'(x) \Delta x,
which means that the conversion factor from \Delta x to \Delta y
is f'(u)g'(x), or equivalently that
\frac{dy}{dx}= \frac{dy}{du}\frac{du}{dx}.
That's the chain rule!
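The same two-step conversion can be checked numerically. This is only a sketch: the functions f(u) = \sin u and g(x) = x^2, the point x = 1 and the step size are assumptions chosen for illustration.

```python
import math

# Illustrative choices (assumptions for this sketch): u = g(x) = x^2, y = f(u) = sin(u).
def g(x):
    return x ** 2

def g_prime(x):
    return 2 * x

def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

x = 1.0
dx = 1e-4  # a small change in x

u = g(x)
delta_u = g(x + dx) - g(x)         # actual change in u
delta_y = f(g(x + dx)) - f(g(x))   # actual change in y

via_u = f_prime(u) * delta_u             # convert the change in u into a change in y
via_x = f_prime(u) * g_prime(x) * dx     # convert the change in x straight into a change in y

print("actual change in y:", delta_y)
print("f'(u) * delta_u:   ", via_u)
print("f'(u) * g'(x) * dx:", via_x)
```

All three numbers agree to many decimal places, and the agreement improves as dx shrinks. The product f'(u)g'(x) is doing the job of \frac{dy}{dx}.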