Data Science Interview: Linear Regression Explained
Linear regression explained in a non-technical way
Joy Gracia Harjanto
Apr 5

Data science is an emerging field that attracts interest from undergraduates like myself who want to pursue a career in the field after graduation.
My passion for the field has run up against a lack of resources and an unclear path through the recruiting process.
Furthermore, a good number of open data science positions require a master's degree, a PhD or several years of experience.
I want to help make the process easier by sharing my experience.
I have several friends who are also trying to break into the field, and their experiences range from similar to very different from mine, so this is by no means a one-size-fits-all guide; every position is unique.
The Challenge

Unlike for other technical roles, there are few books on how to study for data science interviews.
I found it difficult to find resources to study and practice from.
Perhaps this is because data science is still a relatively new field.
I have interviewed for three data science positions, and each time I was asked different questions.
The lack of clear resources and a clear path makes preparing for interviews challenging.
Common Theme

Apart from interview experience, I have also reached out to alumni who are working in data science roles.
I asked them about their interview experience and their study technique.
Through my own experience and my conversations with alumni, I have identified a theme common to most if not all roles: you need to be able to explain what you know.
This makes sense, because the job will require you to work with people who don't necessarily share your background.
Purpose

I have decided to write this post for two reasons:
1. I hope that readers without a statistical background will come away understanding linear regression, an important statistical concept.
2. I also hope that readers with a background in statistics who struggle to explain technical concepts in interviews will find this to be one of the many ways to do so.
Linear Regression

For this post, I have decided to cover linear regression because it is the fundamental building block of my personal statistical knowledge.
I have done projects using linear regression and have grown to appreciate its importance over the past several years.
Furthermore, I was asked to explain linear regression in a non-technical way in my most recent interview.
When the interviewer said non-technical, it was immediately clear to me that I was to assume the interviewer had no prior statistical knowledge.
This means I also had to explain the fundamental building blocks of linear regression such as response and explanatory variables.
Given my background in journalism and storytelling, I immediately thought the best way to do this was by using an analogy.
Analogy

Imagine you have a device that tracks the number of hours you sleep, the time you go to sleep and the time you wake up.
Each morning when you wake up, you rate your mood on a scale of one to five.
After keeping track of these numbers for ten days, you become curious about how to improve your mood in the morning.
The data is everything collected by the device, plus your morning mood ratings.
Each day is one row in the data set, so ten days give ten rows.
Your mood, the number you are most concerned with, is the response variable.
The factors collected by the device that potentially affect your mood are the explanatory variables.
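To make the vocabulary concrete, here is a minimal sketch of how the ten-day sleep log might be laid out. It is written in Python purely for illustration; the column names and every number are invented, not taken from a real device.

```python
# Hypothetical sleep log: one tuple per day, so ten days give ten rows.
# Times are hours on a 24-hour clock (23.5 means 11:30 pm).
# All values are made up for illustration.
columns = ("hours_slept", "bedtime", "wake_time", "mood")
rows = [
    (6.0, 23.5, 6.0, 2),
    (7.5, 22.5, 6.5, 4),
    (5.0, 23.75, 5.5, 1),
    (8.0, 22.0, 6.5, 5),
    (6.5, 23.0, 6.0, 3),
    (7.0, 22.5, 6.0, 3),
    (5.5, 23.5, 5.5, 2),
    (8.5, 21.5, 6.5, 5),
    (7.0, 23.0, 6.5, 4),
    (6.0, 23.0, 5.5, 2),
]

# Response variable: the number we care about (morning mood, 1 to 5).
response = [row[3] for row in rows]

# Explanatory variables: everything the device recorded.
explanatory = [row[:3] for row in rows]

n_days = len(rows)  # number of rows in the data set
```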
Linear regression maps the relationship between the response variable and the explanatory variables.
It typically connects them this way:

Y = aX1 + bX2 + cX3

where Y is the response variable and X1, X2 and X3 are the explanatory variables.
The coefficients in front of the Xs (a, b and c) are the weights.
Loosely speaking, each weight reflects how strongly its variable is tied to the response.
Let's assume for simplicity that the number of hours of sleep you get affects your mood the most; then, provided the variables are measured on comparable scales, it will have a greater weight than the other two variables.
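As a worked example of the equation, the sketch below plugs one night's numbers into Y = aX1 + bX2 + cX3. It is in Python for illustration only. The function name predict_mood and the weights a, b and c are hypothetical values picked by hand; in practice the weights would be estimated from the data.

```python
def predict_mood(hours_slept, bedtime, wake_time, a, b, c):
    """Weighted sum of the three explanatory variables: Y = a*X1 + b*X2 + c*X3."""
    return a * hours_slept + b * bedtime + c * wake_time


# Made-up weights: hours slept gets the largest weight, matching the
# assumption in the text that it matters most for mood.
a, b, c = 0.5, -0.02, 0.1

# One hypothetical night: 8 hours of sleep, in bed at 22:00, up at 6:30.
predicted = predict_mood(8.0, 22.0, 6.5, a, b, c)
print(round(predicted, 2))
```

With these invented weights, an extra hour of sleep raises the predicted mood by 0.5, while the other two variables barely move it.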
You can find the weights either with a programming language such as R or by calculating them by hand, although it is much easier to do in R.
After you fit the model in R, the output also tells you which explanatory variables, if any, are not statistically significant.
A variable that is not significant is one for which the data do not show clear evidence of an effect on the response variable; this is not quite the same as proving it has no influence at all.
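For readers curious about what finding the weights involves, here is a rough sketch of ordinary least squares with a single explanatory variable (hours slept). It is in Python with invented data, and it adds an intercept (a baseline mood), which the simplified equation in this post leaves out. The helper name fit_line is hypothetical, and the sketch does not test significance; in R, lm() and summary() would typically handle both steps.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How x and y vary together, and how much x varies on its own.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented ten-day log: hours slept and morning mood (1 to 5).
hours = [6.0, 7.5, 5.0, 8.0, 6.5, 7.0, 5.5, 8.5, 7.0, 6.0]
mood = [2, 4, 1, 5, 3, 3, 2, 5, 4, 2]

slope, intercept = fit_line(hours, mood)
print(f"mood = {intercept:.2f} + {slope:.2f} * hours_slept")
```

The slope is the weight on hours slept: the estimated change in mood for each additional hour of sleep in this made-up data.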
The more you learn about linear regression, the more you will learn about the different ways to model a data set.
This is just the tip of the iceberg.
Conclusion

I hope you have developed an intuitive understanding of how linear regression works.
If you already understand linear regression, then I hope you have learned a new way to break down a technical concept and explain it.
Thanks for reading!