The Data Science Interview Study Guide
121 resources to help you land your data science dream job
SeattleDataGuy, May 19

Data science interviews, like other technical interviews, require plenty of preparation.
There are a number of subjects that need to be covered in order to ensure you are ready for back-to-back questions on statistics, programming and machine learning.
Before we get started, there’s one tip I’d like to share.
I’ve noticed that there are several types of data science interviews that companies conduct.
Some data science interviews are very product and metric driven.
These interviews focus on product questions, such as which metrics you would use to identify what should be improved in a product.
These are often paired with SQL and some Python questions.
The other type of data science interview tends to be a mix of programming and machine learning.
We recommend asking the recruiter if you aren’t sure which type of interview you will be facing.
Some companies are very good at keeping interviews consistent, but even then, teams can deviate depending on what they are looking for.
Here are some examples of what we have noticed about some companies’ data science interviews.
Airbnb: Product heavy, metrics diagnostics, metrics creation, A/B testing, tons of behavioral questions and take-home material
Netflix: Product-sense questions, A/B testing, experimental design, metric design
Microsoft: Programming heavy, binary tree traversal, SQL, machine learning
Expedia: Product, programming, SQL, product sense, machine learning questions about SVM, regression and decision trees

Due to this variance, we’ve created a checklist to keep track of which subject areas you have studied and which you still need to cover.
Data Science Study Checklist
Let’s start by making sure you can explain the basic data science algorithms.
Machine Learning Algorithms
Logistic Regression - Video
A/B Testing - Video
Decision Tree - Post
SVM - Post
How SVM - Video
Principal Component Analysis: PCA - Post
Principal Component Analysis - Video
Adaboost - Post
AdaBoost - Video
A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning - Post
Gradient Boost Part 1: Regression Main Ideas - Video
K-Means Clustering - The Math of Intelligence - Video
Bayesian Network - Post
Neural Network - Post
Dimensionality reduction algorithms - Post
How kNN algorithm works - Video

Probability And Statistics
At large tech companies, it is common to receive an occasional probability or statistics question.
While the questions won’t necessarily require complex math, if you haven’t thought about independent and dependent probabilities in a while, it is good to review how to set up the basic formulas.
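As a quick refresher on setting up those formulas, here is a minimal Python sketch. The helper names and the coin, card and die examples are our own illustrations, not taken from any of the linked resources. It covers the product rule for independent events, the product rule for dependent events using a conditional probability, and the complement rule for "at least once" questions.

```python
from fractions import Fraction


def joint_independent(p_a, p_b):
    """P(A and B) when A and B are independent: P(A) * P(B)."""
    return p_a * p_b


def joint_dependent(p_a, p_b_given_a):
    """P(A and B) in general: P(A) * P(B | A)."""
    return p_a * p_b_given_a


def at_least_once(p_single, trials):
    """P(an event happens at least once in `trials` independent tries)."""
    return 1 - (1 - p_single) ** trials


# Independent: two fair coin flips both land heads.
two_heads = joint_independent(Fraction(1, 2), Fraction(1, 2))

# Dependent: draw two aces in a row from a 52-card deck without replacement.
# After the first ace is drawn, 3 aces remain among 51 cards.
two_aces = joint_dependent(Fraction(4, 52), Fraction(3, 51))

# Complement rule: at least one six in three rolls of a fair die.
one_six_in_three = at_least_once(Fraction(1, 6), 3)

print(two_heads)         # 1/4
print(two_aces)          # 1/221
print(one_six_in_three)  # 91/216
```

Using Fraction keeps the answers exact, which makes it easier to check them against a hand calculation.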
Probability Videos
Dependent probability introduction
Independent & dependent probability
Independent Problems
Conditional Prob Article
Probability Quiz
Probability & Statistics - Set 6
Probability & Statistics - Set 2
Independent Probability
Dependent Probability

Probability Interview Questions
Most of these questions are either similar to the ones we have been asked or taken directly from Glassdoor.
1. A die is rolled twice. What is the probability of showing a 3 on the first roll and an odd number on the second roll?
2. In any 15-minute interval, there is a 20% probability that you will see at least one shooting star. What is the probability that you see at least one shooting star in the period of an hour?
3. Alice has 2 kids and one of them is a girl. What is the probability that the other child is also a girl? You can assume that there is an equal number of males and females in the world.
4. How many ways can you split 12 people into 3 teams of 4?

Statistics Pre-Quizzes
Data Science Probability Statistics 14

Statistics Concepts
Statistics is a broad concept so don’t get too bogged down in the details of each of these videos.
Instead, just make sure you can explain each of these concepts at the surface level.
Bias-Variance Trade-Off
Confusion Matrix
ROC curve
Normal Distribution
P-Value
Pearson Spearman
Normal distribution problem: z-scores (from ck12.org)
Continuous Probability Distributions
Standardizing Normally Distributed Random Variables (fast version)
Statistics 101: Simple Linear Regression, The Very Basics
Statistics 101: Linear Regression, Outliers, and Influential Observations
Statistics 101: ANOVA, A Visual Introduction
Statistics 101: Multiple Regression, The Very Basics
Statistics: Variance of a population | Probability and Statistics | Khan Academy
Expected Value: E(X)
Law of large numbers | Probability and Statistics | Khan Academy
Central limit theorem | Inferential statistics | Probability and Statistics | Khan Academy
Margin of error 1 | Inferential statistics | Probability and Statistics | Khan Academy
Margin of error 2 | Inferential statistics | Probability and Statistics | Khan Academy
Hypothesis testing and p-values | Inferential statistics | Probability and Statistics | Khan Academy
One-tailed and two-tailed tests | Inferential statistics | Probability and Statistics | Khan Academy
Type 1 errors | Inferential statistics | Probability and Statistics | Khan Academy
Large sample proportion hypothesis testing | Probability and Statistics | Khan Academy
Boosting and Bagging

Statistics Post-Quiz
Data Science Probability Statistics 17

Product And Experiment Designs
Product sense is an important skill for data scientists.
Knowing what to measure on new products and why can help determine whether a product is doing well or not.
The funny thing is that a metric moving in the direction you want is not always a good sign.
People might be spending more time on your website because pages are taking longer to load, or because of similar problems.
This is why metrics are tricky and what you measure is important.
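To make the page-load example concrete, here is a small Python sketch. All of the numbers are hypothetical and invented purely for illustration. It compares raw time on site with the time left over once load waits are subtracted.

```python
from statistics import mean

# Hypothetical per-session records: (seconds on site, seconds waiting on page loads).
before = [(120, 6), (95, 5), (140, 7), (110, 6)]
after = [(150, 45), (130, 40), (170, 50), (145, 42)]


def summarize(sessions):
    """Return (average time on site, average time excluding load waits)."""
    total = mean(t for t, _ in sessions)
    active = mean(t - load for t, load in sessions)
    return total, active


total_before, active_before = summarize(before)
total_after, active_after = summarize(after)

# Time on site went up, but time spent actually using the site went down.
print(total_after > total_before)    # True
print(active_after > active_before)  # False
```

In this made-up data the headline metric improves while the adjusted one gets worse, which is the kind of situation a second, supporting metric can help catch.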
Product And Experiment Design Concepts
User Engagement Metrics
Data Scientist’s Toolbox: Experimental Design - Video
A/B Testing Guide
6 Themes Of Metrics

Product And Metrics Questions
1. An important metric goes down. How would you dig into the causes?
2. What metrics would you use to quantify the success of YouTube ads? (This could also be extended to other products like Snapchat filters, Twitter live-streaming, Fortnite new features, etc.)
3. How do you measure the success or failure of a product or product feature?
4. Google has released a new version of its search algorithm, for which they used A/B testing. During the testing process, engineers realized that the new algorithm was not implemented correctly and returned less relevant results. Two things happened during testing:
   People in the treatment group performed more queries than the control group.
   Advertising revenue was higher in the treatment group as well.
   What may be the cause of people in the treatment group performing more searches than the control group? There are different possible answers here.

Question 4 is borrowed from Zarantech; we really enjoyed it and thought it was a good example of how things can go wrong.
Programming
Just because data science doesn’t always require heavy programming doesn’t mean that interviewers won’t ask you to traverse a binary tree.
So make sure you ask your interviewer what to expect.
Don’t be daunted by these questions.
Pick a few to do just so you’re not surprised in an interview.
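Since binary tree traversal came up above, here is a minimal sketch of one common variant, an in-order traversal, in Python. The Node class and the sample tree are our own illustration; an interviewer may just as well ask for pre-order, post-order or level-order instead.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def in_order(node: Optional[Node]) -> Iterator[int]:
    """Yield the left subtree first, then the node itself, then the right subtree."""
    if node is None:
        return
    yield from in_order(node.left)
    yield node.value
    yield from in_order(node.right)


#       4
#      / \
#     2   6
#    / \
#   1   3
root = Node(4, Node(2, Node(1), Node(3)), Node(6))
print(list(in_order(root)))  # [1, 2, 3, 4, 6]
```

On a binary search tree like this one, an in-order traversal visits the values in sorted order, which is a handy fact to mention in an interview.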
Pre-Video Questions
Fizz Buzz
Find The Kth Smallest/Largest Integer In An Array
Nth Fibonacci

Algorithms And Data Structures
Pre-Study Problems
Before going through the video content about data structures and algorithms, consider trying out the problems below.
This will help you know what you need to focus on.
Sum of Even Numbers After Queries
Robot Return to Origin
N-Repeated Element in Size 2N Array
Balanced Binary Tree

Data Structures Videos
Data Structures & Algorithms #1 - What Are Data Structures?
Multi-dim (video)
Data Structures: Linked Lists
Core Linked Lists Vs Arrays (video)
Data Structures: Trees
Data Structures: Heaps
Data Structures: Hash Tables
Data Structures: Stacks and Queues

Algorithm Videos
Python Algorithms for Interviews
Algorithms: Graph Search, DFS and BFS
BFS(breadth-first search) and DFS(depth-first search) (video)
Algorithms: Binary Search
Binary Search Tree Review (video)
Algorithms: Recursion
Algorithms: Bubble Sort
Algorithms: Merge Sort
Algorithms: Quicksort

String Manipulation
Coding Interview Question and Answer: Longest Consecutive Characters
Sedgewick - Substring Search (videos)
SQL

Post-Study Problems
Now that you have studied for a bit and watched a few videos, let’s try some more problems!
Bigger Is Greater
ZigZag Conversion
Reverse Integer
Combination Sum II
Multiplying Strings
Larry’s Array
Short Palindrome
Valid Number
Bigger is Greater
The Full Counting Sort
Lily’s Homework

SQL - Problems
Generally, there will be at least one interview focused on SQL.
In addition, the interviewers may take you through the entire process of developing a product, choosing metrics to track and then querying to measure the effectiveness of that metric.
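The last step, querying to measure a metric, might look something like the sketch below. It uses Python's built-in sqlite3 module so it can run anywhere. The events table, its columns and its rows are hypothetical, and daily active users is just one example of a metric you might be asked to compute.

```python
import sqlite3

# In-memory database with a hypothetical event log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", "search"),
        (1, "2024-01-01", "click"),
        (2, "2024-01-01", "search"),
        (1, "2024-01-02", "search"),
        (3, "2024-01-02", "click"),
        (3, "2024-01-02", "click"),
        (4, "2024-01-02", "search"),
    ],
)

# Daily active users: distinct users with at least one event on each day.
daily_active_users = conn.execute(
    """
    SELECT event_date, COUNT(DISTINCT user_id) AS dau
    FROM events
    GROUP BY event_date
    ORDER BY event_date
    """
).fetchall()

print(daily_active_users)  # [('2024-01-01', 2), ('2024-01-02', 3)]
conn.close()
```

COUNT(DISTINCT user_id) matters here: counting rows instead would measure event volume rather than the number of active users.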
Trips and Users
Human Traffic of Stadium
Department Top Three Salaries
Exchange Seats
HackerRank The Report
Nth Highest Salary
Symmetric Pairs
Occupations
Placements
Ollivander’s Inventory

SQL - Videos
IQ15: 6 SQL Query Interview Questions
Learning about ROW_NUMBER and Analytic Functions
Advanced Implementation Of Analytic Functions
Advanced Implementation Of Analytic Functions Part 2
Wise Owl SQL Videos

Post SQL Problems
Binary Tree Nodes
Weather Observation Station 18
Challenges
Print Prime Numbers
Big Countries
Exchange Seats
SQL Interview Questions: 3 Tech Screening Exercises (For Data Analysts)

Conclusion
Technical interviews can be tough, whether they are for software engineers, data engineers or data scientists.
We do hope this study guide helps you keep track of your progress!
If there is something you think we left off or you have additional resources which you think would be a benefit, please let me know.
Thank you!