Ordinary Least Squares

One of my favorite authors, the statistician and historian of statistics Dr. Stephen Stigler, published a wonderful historical review in 1981 titled "Gauss and the Invention of Least Squares." He argued that the prolific Carl Friedrich Gauss discovered Ordinary Least Squares (OLS) in 1809 and fundamentally shaped the future of science, business, and society as we know it.

So, what is OLS and why is it so important?

OLS shows up under many names and guises across different disciplines, including:

Linear Regression
Multivariate Regression
The Normal Equations
Maximum Likelihood
Method of Moments
Singular Value Decomposition of $X$
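These names really do describe the same computation seen from different angles. As a minimal sketch, assuming Python with NumPy and a small synthetic dataset (the data, `true_beta`, and the variable names are hypothetical, chosen only for illustration), the snippet solves the normal equations directly, solves the same problem through the singular value decomposition of $X$, and checks the method-of-moments condition that the residuals are orthogonal to every column of $X$. Under independent, normally distributed errors, maximum likelihood yields this same $\hat{\beta}$ as well; that case is not shown here.

```python
import numpy as np

# Hypothetical synthetic data: an intercept column plus two regressors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, 2.0, -0.5])  # assumed values, for illustration only
y = X @ true_beta + rng.normal(scale=0.3, size=n)

# 1. The Normal Equations: solve (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 2. Singular Value Decomposition of X: X = U diag(s) V^T,
#    so the least-squares solution is beta = V diag(1/s) U^T y.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
beta_svd = Vt.T @ ((U.T @ y) / s)

# 3. Method of Moments: the sample moment conditions X^T (y - X beta) = 0
#    are exactly the normal equations, so residuals are orthogonal to X.
residuals = y - X @ beta_normal
moment_conditions = X.T @ residuals

print(beta_normal)
print(beta_svd)
print(np.allclose(beta_normal, beta_svd))                # True
print(np.allclose(moment_conditions, 0.0, atol=1e-8))    # True
```

In practice, `np.linalg.lstsq` wraps an SVD-based LAPACK solver, which is usually preferable to forming $X^\top X$ explicitly when the columns of $X$ are nearly collinear.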

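But all of them ultimately reflect the same mathematical expression, the sum of squared errors that OLS minimizes (in scalar notation, for a single regressor $x$):

$$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2$$

Minimizing it yields the famous estimators (i.e., equations) for $\beta_0$ and $\beta_1$:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

Or, in matrix notation, where $X$ stacks a column of ones next to the $x_i$ (and can hold as many regressors as you like) and $y$ stacks the $y_i$:

$$\hat{\beta} = (X^\top X)^{-1} X^\top y$$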
I find this simple equation to be so extraordinary.

Why? Because of what can be learned from it: the equation basically says, "Look at data about $X$ and estimate its linear relationship to $y$."
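To see that the scalar and matrix formulations agree, here is a minimal sketch in Python with NumPy. It computes the single-regressor estimates from the textbook formulas (the slope is the sum of cross-deviations of $x$ and $y$ divided by the sum of squared deviations of $x$, and the intercept is $\bar{y} - \hat{\beta}_1 \bar{x}$) and compares them with the matrix form $(X^\top X)^{-1} X^\top y$. The function names and the five data points are hypothetical, made up only to exercise the formulas.

```python
import numpy as np

def ols_slope_intercept(x, y):
    """Scalar-notation OLS for one regressor; returns (beta0_hat, beta1_hat)."""
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

def ols_matrix(X, y):
    """Matrix-notation OLS: (X^T X)^{-1} X^T y, computed with a linear solve."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Small hypothetical dataset, made up purely to exercise the formulas.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

beta0, beta1 = ols_slope_intercept(x, y)
X = np.column_stack([np.ones_like(x), x])  # column of ones gives the intercept
beta_hat = ols_matrix(X, y)

print(beta0, beta1)
print(beta_hat)
print(np.allclose([beta0, beta1], beta_hat))  # True
```

The matrix version uses `np.linalg.solve` rather than an explicit matrix inverse; it computes the same $\hat{\beta}$ but is generally faster and more numerically stable. For this toy data the slope works out to 1.99 and the intercept to about 0.05.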

As a concrete example, imagine you wanted to know the relationship between age and income (a simplification of the well-studied Mincer equation). How would you figure this out? A simple linear regression of income on age could estimate that relationship, and the coefficient $\hat{\beta}_1$ (sometimes called the marginal effect or coefficient estimate) would represent the expected change in income associated with one additional year of age. It is exactly the slope of the line below.

A Scatter Plot of Age and Income

Isn't that just amazing??

This single expression is used to estimate models for movie recommendations, business decisions, pharmaceutical research, and even public health policy. I am constantly amazed at how one little equation can accomplish so much.

To think that Gauss discovered OLS as a method for calculating the orbits of celestial bodies, and that today, over 200 years later, we use it for so much of what we do, is astounding.

Over the years, statisticians, economists, computer scientists, engineers, and psychometricians have extended OLS in profound and creative ways. Some extensions model data generated from non-normal distributions (e.g., a Weibull distribution), others incorporate prior information in a structured way (e.g., through Bayesian inference), and still others generalize these equations to learn high-dimensional non-linear relationships (e.g., via artificial neural networks). Again, all of these build on the extraordinary work of Gauss.

There's so much that could be written about the advances made in all of these fields, and a short blog post simply won't do them justice, but I thought I'd at least share some thoughts.

Somewhere along the way today I came across something related to important equations, and it led me to write this. I hope you enjoyed it.

I'm such a fan of the history of statistics and mathematics that this piece, while not as structured as I'd like, was very enjoyable to write.

Happy computing!

-Francisco