The Least-squares Solution
Ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model. OLS chooses the parameters of a linear function of a set of explanatory variables by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable (values of the variable being observed) in the given dataset and those predicted by the linear function of the independent variable.
Geometrically, this is seen as the sum of the squared distances, parallel to the axis of the dependent variable, between each data point in the set and the corresponding point on the regression surface—the smaller the differences, the better the model fits the data. The resulting estimator can be expressed by a simple formula, especially in the case of a simple linear regression, in which there is a single regressor on the right side of the regression equation.
The OLS estimator is consistent when the regressors are exogenous, and, by the Gauss–Markov theorem, optimal in the class of linear unbiased estimators when the errors are homoscedastic and serially uncorrelated. Under these conditions, the method of OLS provides minimum-variance mean-unbiased estimation when the errors have finite variances. Under the additional assumption that the errors are normally distributed, OLS is the maximum likelihood estimator.
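As a concrete illustration of the least-squares criterion, the following minimal Python sketch (assuming NumPy and a small made-up dataset, not taken from this article) compares the sum of squared residuals for two candidate lines; OLS prefers whichever candidate makes this sum smallest.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical toy data: five (x, y) observations, assumed purely for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

def sum_of_squared_residuals(intercept, slope):
    """Sum of squared differences between observed y and the line's predictions."""
    predictions = intercept + slope * x
    residuals = y - predictions
    return np.sum(residuals ** 2)

# Two candidate parameter choices; the one with the smaller criterion fits better.
print(sum_of_squared_residuals(1.0, 1.0))  # near the least-squares fit, small sum
print(sum_of_squared_residuals(0.0, 2.0))  # a worse fit, noticeably larger sum
</syntaxhighlight>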
Linear model
Suppose the data consists of <math>n</math> observations <math>\{\mathbf{x}_i, y_i\}_{i=1}^{n}</math>. Each observation <math>i</math> includes a scalar response <math>y_i</math> and a column vector <math>\mathbf{x}_i</math> of <math>p</math> parameters (regressors), i.e., <math>\mathbf{x}_i = [x_{i1}, x_{i2}, \dots, x_{ip}]^\top</math>. In a linear regression model, the response variable, <math>y_i</math>, is a linear function of the regressors:

:<math>y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i,</math>
or in vector form,

:<math>y_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \varepsilon_i,</math>
where <math>\mathbf{x}_i</math>, as introduced previously, is a column vector of the <math>i</math>-th observation of all the explanatory variables; <math>\boldsymbol{\beta}</math> is a <math>p \times 1</math> vector of unknown parameters; and the scalar <math>\varepsilon_i</math> represents unobserved random variables (errors) of the <math>i</math>-th observation. <math>\varepsilon_i</math> accounts for the influences upon the responses <math>y_i</math> from sources other than the explanators <math>\mathbf{x}_i</math>. This model can also be written in matrix notation as

:<math>\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon},</math>
where <math>\mathbf{y}</math> and <math>\boldsymbol{\varepsilon}</math> are <math>n \times 1</math> vectors of the response variables and the errors of the <math>n</math> observations, and <math>\mathbf{X}</math> is an <math>n \times p</math> matrix of regressors, also sometimes called the design matrix, whose row <math>i</math> is <math>\mathbf{x}_i^\top</math> and contains the <math>i</math>-th observations on all the explanatory variables.
As a rule, the constant term is always included in the set of regressors <math>\mathbf{X}</math>, say, by taking <math>x_{i1} = 1</math> for all <math>i = 1, \dots, n</math>. The coefficient <math>\beta_1</math> corresponding to this regressor is called the intercept.
Regressors do not have to be independent: there can be any desired relationship between the regressors (so long as it is not a linear relationship). For instance, we might suspect the response depends linearly both on a value and on its square, in which case we would include one regressor whose value is just the square of another regressor. In that case, the model would be quadratic in the second regressor, but it is nonetheless still considered a linear model because the model is still linear in the parameters (<math>\boldsymbol{\beta}</math>).
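As a sketch of how such a design matrix is assembled (a minimal illustration assuming NumPy and a made-up regressor, not something prescribed by the article), the code below builds <math>\mathbf{X}</math> with a column of ones for the intercept, the regressor itself, and its square; the fitted model is quadratic in the regressor but still linear in the parameters.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical regressor values (assumed for illustration).
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Design matrix X: intercept column of ones, the regressor, and its square.
# The model y = b1 + b2*t + b3*t^2 is quadratic in t but linear in (b1, b2, b3).
X = np.column_stack([np.ones_like(t), t, t ** 2])
print(X.shape)  # (5, 3): n = 5 observations, p = 3 parameters
</syntaxhighlight>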
Matrix/vector formulation
Consider an overdetermined system

:<math>\sum_{j=1}^{p} X_{ij} \beta_j = y_i, \qquad (i = 1, 2, \dots, n),</math>

of <math>n</math> linear equations in <math>p</math> unknown coefficients, <math>\beta_1, \beta_2, \dots, \beta_p</math>, with <math>n > p</math>. (Note: for a linear model as above, not all elements in <math>\mathbf{X}</math> contain information on the data points. The first column is populated with ones, <math>X_{i1} = 1</math>. Only the other columns contain actual data. So here <math>p</math> is equal to the number of regressors plus one.) This can be written in matrix form as

:<math>\mathbf{X} \boldsymbol{\beta} = \mathbf{y},</math>
where

:<math>\mathbf{X} = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1p} \\ X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{np} \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.</math>
Such a system usually has no exact solution, so the goal is instead to find the coefficients <math>\boldsymbol{\beta}</math> which fit the equations "best", in the sense of solving the quadratic minimization problem

:<math>\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\operatorname{arg\,min}}\, S(\boldsymbol{\beta}),</math>
where the objective function <math>S</math> is given by

:<math>S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left| y_i - \sum_{j=1}^{p} X_{ij} \beta_j \right|^2 = \left\| \mathbf{y} - \mathbf{X} \boldsymbol{\beta} \right\|^2.</math>
A justification for choosing this criterion is given in Properties below. This minimization problem has a unique solution, provided that the <math>p</math> columns of the matrix <math>\mathbf{X}</math> are linearly independent, given by solving the normal equation

:<math>(\mathbf{X}^\top \mathbf{X}) \hat{\boldsymbol{\beta}} = \mathbf{X}^\top \mathbf{y}.</math>
The matrix <math>\mathbf{X}^\top \mathbf{X}</math> is known as the Gram matrix and the matrix <math>\mathbf{X}^\top \mathbf{y}</math> is known as the moment matrix of regressand by regressors. Finally, <math>\hat{\boldsymbol{\beta}}</math> is the coefficient vector of the least-squares hyperplane, expressed as

:<math>\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}</math>
or

:<math>\hat{\boldsymbol{\beta}} = \boldsymbol{\beta} + (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\varepsilon}.</math>
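To make the closed-form solution concrete, here is a minimal Python sketch (assuming NumPy and simulated data, not part of the original article) that forms the Gram matrix <math>\mathbf{X}^\top \mathbf{X}</math> and the moment vector <math>\mathbf{X}^\top \mathbf{y}</math>, solves the normal equation, and cross-checks the result against NumPy's built-in least-squares routine.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (an assumption for illustration): n = 50 observations, p = 3 parameters,
# with the first column of X set to ones so beta_true[0] plays the role of the intercept.
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Solve the normal equation (X^T X) beta_hat = X^T y.
gram = X.T @ X        # Gram matrix
moment = X.T @ y      # moment vector
beta_hat = np.linalg.solve(gram, moment)

# Cross-check with NumPy's least-squares solver; the two should agree closely.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))
</syntaxhighlight>

In numerical practice one would usually prefer a QR- or SVD-based solver such as the least-squares routine above to explicitly forming and inverting <math>\mathbf{X}^\top \mathbf{X}</math>, since that is better conditioned; the normal-equation form is shown here because it mirrors the algebra in this section.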
Licensing
Content obtained and/or adapted from:
- Ordinary least squares, Wikipedia (https://en.wikipedia.org/wiki/Ordinary_least_squares) under a CC BY-SA license