M408M Learning Module Pages
Linear regression

There are many other applications of optimization. For example, "fitting" a curve to data is often important for modelling and prediction. For some data sets a linear fit is appropriate, while for others a quadratic fit describes the data better.
Suppose an experiment is conducted at times $x_1, x_2, \dots, x_n$, yielding observed values $y_1, y_2, \dots, y_n$ at these respective times. If the points $(x_j, y_j)$, $1 \le j \le n$, are plotted and appear to lie near a line, one might conclude that the experiment can be described mathematically by a linear model. Using $y = mx + b$ for the equation of the model line, we get predicted values $p_1, p_2, \dots, p_n$ at $x_1, x_2, \dots, x_n$ by setting $p_j = mx_j + b$. The difference $p_j - y_j = mx_j + b - y_j$ between the predicted and observed values is a measure of the error at the $j$th observation: it measures how far above or below the model line the observed value $y_j$ lies. We want to minimize the total error over all observations.
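As a concrete illustration, here is a small sketch (with made-up data and a trial slope and intercept, not values from this module) of the predicted values, residuals, and total squared error:

```python
# Sketch with hypothetical data and trial values of m and b.
xs = [1.0, 2.0, 3.0, 4.0]   # observation times x_j (made up)
ys = [2.1, 3.9, 6.2, 8.1]   # observed values y_j (made up)
m, b = 2.0, 0.0             # trial model line y = m*x + b

# Predicted values p_j = m*x_j + b and residuals p_j - y_j.
ps = [m * x + b for x in xs]
residuals = [p - y for p, y in zip(ps, ys)]

# The total squared error that least squares minimizes over m and b.
E = sum(r * r for r in residuals)
print(ps, residuals, E)
```

Each residual tells how far the model line lies above or below one observation; squaring and summing them gives the quantity minimized below.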
Taking squares $(p_j - y_j)^2$ avoids positive and negative errors canceling each other out, so we minimize the sum of squared errors
$$E(m, b) = \sum_{j=1}^n (mx_j + b - y_j)^2.$$
Other choices like $|p_j - y_j|$ could be used, but since we'll want to differentiate to determine $m$ and $b$, the calculus will be a lot simpler if we don't use absolute values! In the following video, we derive the equations for a least squares line and work an example. For your convenience, we repeat the calculation in text form. To determine the critical point of $E(m,b)$ we compute the components of the gradient and set them equal to zero:
$$\frac{\partial E}{\partial m} = 2\bigl(x_1(mx_1+b-y_1) + x_2(mx_2+b-y_2) + \dots + x_n(mx_n+b-y_n)\bigr) = 0,$$
$$\frac{\partial E}{\partial b} = 2\bigl((mx_1+b-y_1) + (mx_2+b-y_2) + \dots + (mx_n+b-y_n)\bigr) = 0.$$
After rearranging terms, we get
$$0 = \frac{\partial E}{\partial m} = 2\bigl((x_1^2+x_2^2+\dots+x_n^2)\,m + (x_1+x_2+\dots+x_n)\,b - (x_1y_1+x_2y_2+\dots+x_ny_n)\bigr),$$
$$0 = \frac{\partial E}{\partial b} = 2\bigl((x_1+x_2+\dots+x_n)\,m + (1+1+\dots+1)\,b - (y_1+y_2+\dots+y_n)\bigr).$$
That's two linear equations in $m$ and $b$, and solving two linear
equations in two unknowns isn't hard. In fact, most spreadsheet
programs or computer algebra systems have a built-in algorithm for
calculating $m$ and $b$ to determine the regression line for a given
data set. Here's how they do it.
Dividing the two equations by 2 and rearranging terms gives the system of equations
$$\left(\sum_{i=1}^n x_i^2\right)m + \left(\sum_{i=1}^n x_i\right)b = \sum_{i=1}^n x_i y_i, \qquad
\left(\sum_{i=1}^n x_i\right)m + nb = \sum_{i=1}^n y_i.$$
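This is a $2\times 2$ linear system in $m$ and $b$, so it can be solved directly. A minimal sketch with NumPy, using made-up data that happens to lie exactly on the line $y = 2x + 1$:

```python
import numpy as np

# Hypothetical data points (x_i, y_i); here exactly y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
n = len(x)

# Coefficient matrix and right-hand side of the 2x2 system above.
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    n        ]])
rhs = np.array([np.sum(x * y), np.sum(y)])

# Solve for the slope m and intercept b.
m, b = np.linalg.solve(A, rhs)
print(m, b)
```

Since this data is exactly linear, the solver recovers the slope 2 and intercept 1; with noisy data the same system yields the least-squares line.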
Instead of looking at sums, it's convenient to look at averages,
which we denote with angle brackets.
Let
$$\langle x \rangle = \frac{1}{n}\sum_{i=1}^n x_i, \qquad
\langle x^2 \rangle = \frac{1}{n}\sum_{i=1}^n x_i^2, \qquad
\langle y \rangle = \frac{1}{n}\sum_{i=1}^n y_i, \qquad
\langle xy \rangle = \frac{1}{n}\sum_{i=1}^n x_i y_i.$$
After dividing by $n$, our equations become
$$\langle x^2 \rangle m + \langle x \rangle b = \langle xy \rangle, \qquad
\langle x \rangle m + b = \langle y \rangle.$$
The first equation minus $\langle x \rangle$ times the second equation gives
$$\bigl(\langle x^2 \rangle - \langle x \rangle^2\bigr)m = \langle xy \rangle - \langle x \rangle\langle y \rangle,$$
which we can solve for $m$. Plugging that back into the second equation gives $b$. The results are
$$m = \frac{\langle xy \rangle - \langle x \rangle\langle y \rangle}{\langle x^2 \rangle - \langle x \rangle^2}, \qquad
b = \langle y \rangle - m\,\langle x \rangle.$$
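The closed-form averages formulas translate directly into code. Here is a short sketch (with made-up noisy data) that computes $m$ and $b$ from $\langle x \rangle$, $\langle x^2 \rangle$, $\langle y \rangle$, $\langle xy \rangle$ and cross-checks the result against NumPy's built-in least-squares fit:

```python
import numpy as np

# Hypothetical noisy data roughly following y = 1.5x + 0.5.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.6, 1.9, 3.6, 5.1, 6.4])

# Averages <x>, <x^2>, <y>, <xy> from the derivation.
ax, ax2 = x.mean(), (x**2).mean()
ay, axy = y.mean(), (x * y).mean()

# Regression coefficients from the closed-form solution.
m = (axy - ax * ay) / (ax2 - ax**2)
b = ay - m * ax

# np.polyfit(x, y, 1) solves the same least-squares problem.
m_ref, b_ref = np.polyfit(x, y, 1)
print(m, b, m_ref, b_ref)
```

Both approaches give the same regression line; the averages formula is exactly what spreadsheet programs compute under the hood, while `polyfit` generalizes to higher-degree fits.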