Three reasons:
First, in industry projects it often makes sense to build a quick MVP (minimum viable product) or a simple solution, get some feedback, and iterate on it.
So it is useful to think about a couple of different options for solving a problem and to evaluate which one can be built quickly.
Second, reinforcement learning, deep learning, and regular expressions are just “methods” for solving problems.
They should not be treated as solutions in themselves, irrespective of the problem.
Third, this also tells me whether the candidate just followed the description of a mandatory project in their course, or actually understood the problem and thought about how to solve it.
So, my point is: when we are starting out, it is useful not to stop at one solution, and to ask “what are the other ways of solving this?”
This is what we normally do in a real-world job, too.
We look for the best solution under the given constraints.
It is difficult for someone just starting out to know everything, but part of this is also common sense.
If an interviewer suddenly asks you about an alternative solution, you should have an answer.
This habit will also be very useful as you grow as a data scientist.
Generalized understanding or specific knowledge
Let me again share an anonymized interview experience I had.
A candidate fresh out of college had a bunch of typical projects on their resume: spam classification, digit recognition on the MNIST dataset, sentiment analysis, and so on.
For one of these, the candidate also claimed to be among the top 10 on the Kaggle leaderboard.
While that is impressive, these projects are also very far from real-world project scenarios.
So, what should I do?
Instead of asking questions about the specifics of these well-known datasets and projects, I modified my “problem descriptions” slightly.
I asked the candidate problem-solving questions such as: “Let us say I run an online business and I frequently get customer emails complaining about something.
I only have three customer support departments: orders and billing, returns and refunds, and others.
I want a machine learning solution that routes each customer email to one of these three departments.”
If someone understood the projects they did above, they would have mapped this to a classification problem, potentially similar to spam or sentiment classification.
Even at an entry level, not seeing that connection is a red flag for me.
My point with this question is twofold.
First, it is completely alright if all the data science projects you can show are on standard datasets and Kaggle competitions (a few months ago I wondered whether these are useful, but I have since changed my opinion).
But one needs to know how to generalize the knowledge from them to new problems.
For example, if you previously worked on a text classification problem, you need to be able to recognize another text classification problem and walk through some steps in solving it.
Second, as with Question 1, this tells me whether the candidate really understood what they did, or just followed instructions or online tutorials.
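As a rough illustration of what "walking through the steps" could look like for the email-routing question, here is a minimal sketch that treats it exactly like spam classification, just with three labels instead of two. It assumes a multinomial Naive Bayes classifier with add-one smoothing, written with only the Python standard library. The department names come from the question above; the training emails, `tokenize`, and `NaiveBayesRouter` are hypothetical placeholders made up for this example, and Naive Bayes is our illustrative choice, not the only (or necessarily best) method.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical training data: a handful of made-up emails labelled with the
# three departments from the interview question. A real system needs far more.
TRAINING_EMAILS = [
    ("I was charged twice for my order", "orders and billing"),
    ("My invoice shows the wrong amount", "orders and billing"),
    ("Where is my order? It has not shipped", "orders and billing"),
    ("I want to return these shoes", "returns and refunds"),
    ("When will I get my refund for the returned item", "returns and refunds"),
    ("The product arrived broken, please refund me", "returns and refunds"),
    ("How do I change my account password", "others"),
    ("Do you have a store in my city", "others"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRouter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        # Count documents per label and word occurrences per label.
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for text, label in examples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(examples)
        return self

    def log_score(self, text, label):
        # log P(label) + sum over tokens of log P(word | label), smoothed so
        # unseen words do not zero out the probability.
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / self.total_docs)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denom)
        return score

    def predict(self, text):
        # Route the email to the department with the highest score.
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


router = NaiveBayesRouter().fit(TRAINING_EMAILS)
print(router.predict("Please refund the item I returned"))  # → returns and refunds
```

The point is not the specific model: a candidate who has built a spam filter should recognize that the same pipeline (collect labelled emails, turn text into features, train a classifier, evaluate it) carries over, and could equally propose a keyword-based baseline as a quick first version.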
To summarize: when applying for data science jobs, entry-level candidates need to think about their projects a bit beyond the exact things they did, looking for other possible solutions and for examples of similar problems in real-world applications.
Of course, all of this is my personal opinion, and not necessarily what you will find in each and every data science interview under the sun!