Understanding when Simple and Multiple Linear Regression give Different Results

Ryan Gotesman
May 3

Simple and multiple linear regression are often the first models used to investigate relationships in data.

If you play around with them for long enough, you'll eventually realize they can give different results.

Relationships that are significant in simple linear regression may no longer be significant in multiple linear regression and, vice versa, relationships that are insignificant in simple linear regression may become significant in multiple linear regression.

Knowing why this happens will go a long way toward improving your understanding of what's going on under the hood of linear regression.

To quickly review, simple linear regression attempts to model the data in the form:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

If the slope term beta_1 is significant, then for every one-unit increase in x, y changes on average by beta_1, an effect that is unlikely to be due to chance alone.
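The slope estimate and its significance test can be sketched in Python with NumPy. This is a minimal illustration, not the author's code: the function name `simple_ols` and the simulated data are hypothetical, and the p-value uses a normal approximation to the t distribution, which is only accurate for fairly large samples.

```python
import math
import numpy as np

def simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares and test H0: b1 = 0.

    Returns (b0, b1, t_stat, p_value). The p-value uses a normal
    approximation to the t distribution (assumption: large n).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_mean, y_mean = x.mean(), y.mean()
    sxx = np.sum((x - x_mean) ** 2)
    # Slope = co-variation of x and y divided by the variation of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / sxx
    b0 = y_mean - b1 * x_mean
    residuals = y - (b0 + b1 * x)
    # Residual variance with n - 2 degrees of freedom (two fitted coefficients)
    sigma2 = np.sum(residuals ** 2) / (n - 2)
    se_b1 = math.sqrt(sigma2 / sxx)
    t_stat = b1 / se_b1
    # Two-sided p-value: P(|Z| > |t|) = erfc(|t| / sqrt(2))
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return b0, b1, t_stat, p_value

# Hypothetical data: y depends on x with true intercept 3.0 and slope 1.5
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 3.0 + 1.5 * x + rng.normal(size=1000)
b0, b1, t_stat, p_value = simple_ols(x, y)
print(f"b0={b0:.2f}, b1={b1:.2f}, t={t_stat:.1f}, p={p_value:.3g}")
```

A large |t| (and correspondingly tiny p-value) is what "significant" means here: a slope this far from zero would be very unlikely if x and y were unrelated.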

Imagine we are an ice cream business trying to figure out what drives sales and we have measured 2 independent variables: (1) temperature and (2) the number of people wearing shorts we observe walking down the street in 10 minutes.

Our dependent variable is the number of ice creams we sell.

First we plot temperature against ice creams sold and run a simple linear regression, which finds a significant relationship between sales and temperature.

This makes sense.

We then plot the number of shorts observed against sales and run another simple linear regression, which finds a significant relationship between the number of people wearing shorts we observe in 10 minutes and ice cream sales.

Interesting…perhaps this doesn’t make as much sense.

Then we turn to multiple linear regression, which attempts to model the data in the form:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$

Multiple linear regression is a bit different from simple linear regression.

First, instead of just one independent variable, we can include as many independent variables as we like.

The interpretation differs as well.

If one of the coefficients, say beta_i, is significant, this means that for every one-unit increase in x_i, while holding all other independent variables constant, y changes on average by beta_i, an effect that is unlikely to be due to chance alone.

We run a multiple linear regression with both temperature and shorts in the model and look at the results. Temperature is still significantly related to sales, but shorts is not.

The shorts variable has gone from being significant in simple linear regression to no longer being significant in multiple linear regression.
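The whole ice cream analysis can be sketched in Python with NumPy on simulated data. Everything here is an assumption for illustration, not the author's data or code: the helper name `ols`, the 0.3 correlation between temperature and shorts, the coefficient of 2.0, and the sample size are all hypothetical. NumPy's `Generator.multivariate_normal` plays the role of R's `mvrnorm()`, and p-values use a normal approximation to the t distribution, which is reasonable only for large samples.

```python
import math
import numpy as np

def ols(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor columns.

    Returns (coefficients, t_stats, p_values); index 0 is the intercept.
    p-values use a normal approximation to the t distribution (large n).
    """
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    n, k = X.shape
    sigma2 = residuals @ residuals / (n - k)  # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t = beta / se
    p = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])
    return beta, t, p

# Hypothetical simulation: standardized temperature and shorts with
# correlation 0.3, and sales that depend on temperature ONLY.
rng = np.random.default_rng(42)
n = 5000
cov = [[1.0, 0.3], [0.3, 1.0]]
temperature, shorts = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
sales = 2.0 * temperature + rng.normal(size=n)

# Two simple regressions, then one multiple regression
_, t_temp_simple, _ = ols([temperature], sales)
b_shorts_simple, t_shorts_simple, _ = ols([shorts], sales)
b_multi, t_multi, p_multi = ols([temperature, shorts], sales)

print("simple, temperature: t =", round(t_temp_simple[1], 1))
print("simple, shorts:      t =", round(t_shorts_simple[1], 1))
print("multiple, temperature t = %.1f, shorts t = %.2f (p = %.2f)"
      % (t_multi[1], t_multi[2], p_multi[2]))
```

Because the simulation builds sales from temperature alone, the shorts slope is clearly nonzero in its own simple regression but close to zero once temperature is in the model; the exact t and p values depend on the random seed.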

Why? The answer can be found by plotting shorts against temperature.

There appears to be a relationship.

When we check the correlation between these two variables, we find r = 0.3: shorts and temperature tend to increase together.
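Checking the correlation takes a single call to NumPy's `np.corrcoef`. The data below are hypothetical draws with a population correlation of 0.3, so the sample r should land close to, but not exactly at, that value.

```python
import numpy as np

# Hypothetical data: standardized temperature and shorts counts drawn with
# a population correlation of 0.3 (an assumption, not the author's data)
rng = np.random.default_rng(7)
cov = [[1.0, 0.3], [0.3, 1.0]]
temperature, shorts = rng.multivariate_normal([0.0, 0.0], cov, size=10_000).T

# Pearson correlation: the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(temperature, shorts)[0, 1]
print(f"r = {r:.2f}")
```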

When we ran the simple linear regression and found a relationship between shorts and sales, we were really detecting the relationship between temperature and sales, which carried over to shorts because shorts increase with temperature.
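A short derivation makes this carry-over concrete. Assume, as a simplification, that sales depend only on temperature, y = beta_0 + beta_T T + epsilon, where T is temperature, S is the shorts count, and the error epsilon is uncorrelated with S (the symbols T, S, and beta_T are notation introduced here). The population slope from regressing y on S alone is then:

$$\beta_S^{\text{simple}} = \frac{\operatorname{Cov}(S, y)}{\operatorname{Var}(S)} = \frac{\operatorname{Cov}(S,\, \beta_0 + \beta_T T + \varepsilon)}{\operatorname{Var}(S)} = \beta_T \, \frac{\operatorname{Cov}(S, T)}{\operatorname{Var}(S)}$$

The last step uses Cov(S, beta_0) = 0 and the assumption Cov(S, epsilon) = 0. The result is nonzero whenever temperature affects sales (beta_T is not 0) and shorts co-vary with temperature (Cov(S, T) is not 0), even though shorts have no effect of their own. If S and T are standardized to unit variance, it reduces to beta_T times r.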

When we ran the multiple linear regression, we looked at the relationship between shorts and sales while holding temperature constant, and the relationship vanished.

The true relationship between temperature and sales, however, remained.
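What "holding temperature constant" means can be checked numerically with the Frisch-Waugh-Lovell theorem (a standard result from regression theory, not something the original article invokes): the shorts coefficient from the multiple regression equals the slope you get after removing the linear effect of temperature from both shorts and sales. A minimal sketch on hypothetical simulated data in which sales depend only on temperature; the helper names `residualize` and `slope_through_origin` are illustrative.

```python
import numpy as np

def slope_through_origin(x, y):
    # Least-squares slope with no intercept (the residuals used below have mean 0)
    return float(x @ y / (x @ x))

def residualize(v, z):
    # Residuals from regressing v on an intercept and z:
    # the part of v that z cannot linearly explain
    Z = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

# Hypothetical simulated data: sales depend on temperature only
rng = np.random.default_rng(1)
n = 5000
cov = [[1.0, 0.3], [0.3, 1.0]]
temperature, shorts = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
sales = 2.0 * temperature + rng.normal(size=n)

# Shorts coefficient from the full multiple regression
X = np.column_stack([np.ones(n), temperature, shorts])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)

# Same coefficient by "holding temperature constant": strip temperature out of
# both shorts and sales, then regress what is left over
shorts_resid = residualize(shorts, temperature)
sales_resid = residualize(sales, temperature)
fwl_slope = slope_through_origin(shorts_resid, sales_resid)

print(f"multiple regression shorts coef: {beta[2]:.4f}")
print(f"partialled-out slope:            {fwl_slope:.4f}")
```

The two printed numbers agree up to floating-point error, and both sit near zero because, once temperature's influence is removed, shorts carry no information about sales in this simulation.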

Correlated independent variables frequently cause simple and multiple linear regression to give different results.

Whenever you find a significant relationship using simple linear regression, make sure you follow it up with a multiple linear regression that includes the other variables you have measured.

You might be surprised by the result!

(Note: This data was generated using the mvrnorm() function from R's MASS package.)

Feel free to leave any thoughts or questions in the comments below!