How are overdetermined linear systems important to statistics?

1 Answer
Mar 12, 2018

They arise in regression, most commonly least-squares regression.

Explanation:

In an overdetermined system of equations, there are more simultaneous equations than there are unknown variables.

The chance that a single set of values for the unknowns will satisfy every equation in the system at once is very remote, particularly if the coefficients of the unknown variables in the equations are found by measurement.
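As a small illustration of the point above, here is a sketch in Python with NumPy. The data are made up for the example: three equations of the form a + b*x = y in the two unknowns a and b, with slightly noisy values of y. The helper name `is_consistent` is my own; it compares the rank of the coefficient matrix with the rank of the augmented matrix, which is one standard way to test whether an exact solution exists.

```python
import numpy as np

# Hypothetical data: three equations a + b*x_i = y_i in two unknowns (a, b).
# The y values are imagined measurements, so they carry a little noise.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y_measured = np.array([2.1, 3.9, 6.2])
y_exact = np.array([2.0, 4.0, 6.0])   # noise-free values lying on y = 2x


def is_consistent(A, y):
    """Return True if A @ beta = y has an exact solution.

    Rouché-Capelli test: the system is solvable exactly only when
    appending y as an extra column does not raise the rank of A.
    """
    augmented = np.column_stack([A, y])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented)


print("measured data consistent?", is_consistent(A, y_measured))
print("exact data consistent?   ", is_consistent(A, y_exact))
```

With the noisy values no pair (a, b) satisfies all three equations, whereas the noise-free values do admit an exact solution. Measurement error is what usually rules the exact solution out.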

In fact, much statistical theory rests on modelling measurements of the dependent variable as subject to error, such that repeated measurements of the same quantity produce a set of data that is normally distributed. (The independent variable is treated as though it were measured exactly.)

So, each equation in the overdetermined system of equations might be thought of as "voting" for some particular values for the unknown variables, and there are more votes than variables.

The most common way to deal with this is to undertake linear regression. A set of estimates for the unknowns is found that minimises the sum of squared residuals, where each residual is the difference between an observed value of the dependent variable and the value that the fitted equation predicts for the corresponding value(s) of the independent variable(s).
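A minimal sketch of such a fit, assuming Python with NumPy and some made-up data (four observations, two unknowns: an intercept and a slope). `np.linalg.lstsq` is NumPy's least-squares solver; the helper `sum_sq_residuals` is my own name, included only to show what is being minimised.

```python
import numpy as np

# Hypothetical observations: x is the independent variable, y the dependent one.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Design matrix: a column of ones (intercept) and a column of x (slope).
# Four equations, two unknowns, so the system is overdetermined.
A = np.column_stack([np.ones_like(x), x])


def sum_sq_residuals(beta):
    """Sum of squared differences between observed y and fitted A @ beta."""
    residuals = y - A @ beta
    return float(residuals @ residuals)


# Least-squares estimate of (intercept, slope).
beta_hat, _, rank, _ = np.linalg.lstsq(A, y, rcond=None)

print("estimates (intercept, slope):", beta_hat)
print("sum of squared residuals:", sum_sq_residuals(beta_hat))

# Nudging the estimates in any direction should only make the fit worse.
nudged = beta_hat + np.array([0.05, -0.02])
print("after nudging:", sum_sq_residuals(nudged))
```

The last two lines are only a sanity check: any other choice of intercept and slope gives a larger sum of squared residuals than the least-squares estimates.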

The validity of using the minimisation of the sum of squared residuals is based on the assumption that the errors in measurement of the dependent variable might plausibly be modelled by a normal distribution that has constant variance across the range taken by the independent variable(s) (a property known as homoscedasticity).

Minimisation of the sum of squared residuals is equivalent to maximisation of the likelihood under the assumption that the errors are independent and normally distributed with constant variance.
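The equivalence can be sketched in a few lines. The notation is my own rather than anything from the question: $y_i$ is the $i$-th observation, $\mathbf{x}_i$ the corresponding values of the independent variable(s), $\boldsymbol\beta$ the unknowns, and $\sigma^2$ the constant error variance.

```latex
% Model: independent normal errors with constant variance
y_i = \mathbf{x}_i^{\top}\boldsymbol\beta + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \quad i = 1, \dots, n

% Likelihood of the n observations
L(\boldsymbol\beta, \sigma^2)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left( -\frac{(y_i - \mathbf{x}_i^{\top}\boldsymbol\beta)^2}{2\sigma^2} \right)

% Log-likelihood
\log L(\boldsymbol\beta, \sigma^2)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mathbf{x}_i^{\top}\boldsymbol\beta)^2

% For any fixed sigma^2 the first term does not involve beta, so
\arg\max_{\boldsymbol\beta} \log L(\boldsymbol\beta, \sigma^2)
  = \arg\min_{\boldsymbol\beta} \sum_{i=1}^{n} (y_i - \mathbf{x}_i^{\top}\boldsymbol\beta)^2
```

The only term in the log-likelihood that depends on $\boldsymbol\beta$ is the sum of squared residuals, and it enters with a negative sign, so the values that maximise the likelihood are the ones that minimise that sum.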