What Separates Good from Great Data Scientists?

The most valuable skills in an evolving field

Amadeus Magrabi, Jun 30

The data science job market is changing rapidly.
Being able to build machine learning models used to be a rare skill that only a few distinguished scientists possessed.
But nowadays, anyone with basic coding experience can follow the steps to train a simple scikit-learn or Keras model.
Recruiters are being flooded with applications, because the hype around the “sexiest job of the century” has barely slowed down while the tools are becoming easier to use.
Expectations of what a data scientist should bring to the table have changed, and companies are beginning to understand that training machine learning models is only a small part of what it takes to be successful in data science.
Here are four of the most valuable qualities that set the best data scientists apart.

Strong Focus on Business Impact

One of the most common motivators for data scientists is a natural curiosity for finding patterns in data.
It can be exciting to delve into the detective work of exploring datasets, experimenting with the latest techniques in the field, systematically testing their effects and discovering something new.
This type of scientific motivation is something that data scientists should have.
But it becomes a problem if it is the only motivator.
In that case, it can cause people to think in an isolated bubble and get lost in statistical details without thinking about the concrete applications of their work and the larger context in the company.
The best data scientists understand how their work fits into the company as a whole and have an intrinsic drive to deliver business value.
They do not waste time with complicated techniques when simple solutions are good enough.
They ask about the larger goal of a project and challenge the core assumptions before jumping on a solution.
They focus on the impact of the whole team and proactively communicate with stakeholders.
They are full of ideas for new projects and are not afraid to think outside the box.
They pride themselves on how many people they helped and not on how advanced the technique they used was.
Data science is still a largely unstandardized field and there is a large gap between what data science bootcamps teach and what businesses actually need.
The best data scientists are not afraid to go out of their comfort zone to solve pressing problems and maximize their impact.

Solid Software Engineering Skills

When people think about the ideal data scientist, they often have reputable AI professors from prestigious universities in mind.
Hiring for a profile like that can make sense when companies are in a competitive race to build machine learning models with the highest possible accuracy.
When it is important to squeeze out the last few percentage points of accuracy by any means necessary, you need to pay attention to mathematical details, test the most complex methods or even invent new statistical techniques that are specifically optimized for a particular use case.
But this is rarely necessary in the real world.
For most companies, standard models with decent accuracy are good enough, and it is not worth investing the time and resources to turn decent models into state-of-the-art ones.
It is far more important to build models with passable accuracy quickly and establish feedback cycles early, so you can start to iterate and speed up the process of identifying the most valuable use cases.
Small differences in accuracy are usually not the reason why data science projects succeed or fail, which is why software engineering skills trump scientific skills in the business world.
The typical workflow of data teams often goes like this: the data scientists prototype some solutions through trial and error, producing spaghetti code.
Once the results start to look promising, they hand them over to software engineers, who then have to rewrite everything from scratch to make the solution scalable, efficient and maintainable.
Data scientists cannot be expected to deliver production code on the level of full-time software engineers. But this whole process is much smoother and faster if data scientists are familiar with software engineering principles and aware of the architectural issues that can occur down the line.
Given that more and more parts of the data science workflow are also being replaced by new software frameworks, solid engineering skills are among the most important assets a data scientist can have.
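
To make the contrast with spaghetti code a little more concrete, here is a minimal sketch of what a prototype can look like when it is split into small, typed, testable functions. Everything in it is hypothetical: the `Customer` fields, the thresholds, and the rule-based `churn_risk` function are illustrative stand-ins for a real dataset and a real model, not something taken from a particular project.

```python
"""Hypothetical example: a churn-scoring prototype split into small, testable pieces."""
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Customer:
    # Illustrative fields only; not taken from any real dataset.
    customer_id: str
    monthly_spend: list[float]
    support_tickets: int


def average_spend(customer: Customer) -> float:
    """Return the mean monthly spend, or 0.0 when there is no history."""
    return mean(customer.monthly_spend) if customer.monthly_spend else 0.0


def churn_risk(
    customer: Customer,
    spend_threshold: float = 20.0,  # assumed cutoff, chosen for illustration
    ticket_threshold: int = 3,      # assumed cutoff, chosen for illustration
) -> str:
    """Toy rule-based score standing in for a trained model."""
    low_spend = average_spend(customer) < spend_threshold
    many_tickets = customer.support_tickets >= ticket_threshold
    if low_spend and many_tickets:
        return "high"
    if low_spend or many_tickets:
        return "medium"
    return "low"


def score_all(customers: list[Customer]) -> dict[str, str]:
    """Map each customer id to its risk label."""
    return {c.customer_id: churn_risk(c) for c in customers}
```

Because each step is a plain function with explicit inputs and outputs, an engineer who takes over the code can unit-test and replace the pieces one at a time (for example, swapping `churn_risk` for a real model) instead of rewriting one long notebook cell from scratch.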

Careful Expectation Management

From the outside, data science can be a very vague and confusing field.
Is it just hype or is the world really going through a revolutionary transformation?
Is every data science project a machine learning project?
Are these people scientists, engineers or statisticians?
Is their main output software or dashboards and visualizations?
Why does this model show me a prediction that is wrong? Can someone fix this bug?
What have they been doing for the past month if all they have now is these few lines of code?

There are a lot of things that can be unclear, and the expectation of what a data scientist should do can vary greatly between different people in a company.
It is crucial for data scientists to proactively and consistently communicate with stakeholders to set clear expectations, catch misunderstandings early and get everyone on the same page.
The best data scientists understand how the different backgrounds and agendas of other teams affect their expectations and carefully adapt the way they communicate.
They are able to explain complicated methods in a simple manner to give non-technical stakeholders a better grasp of the goals.
They know when to dampen overly optimistic expectations and when to win over overly pessimistic colleagues.
And most importantly, they stress the inherently experimental nature of data science and do not overpromise when the success of a project is still unclear.

Comfortable with Cloud Services

Cloud computing is a core part of the data science toolkit.
There are simply too many cases in which working in a Jupyter notebook on a local machine reaches its limits and is not enough to get the job done.
Cloud services are particularly useful when you need to, for example, train machine learning models on powerful GPUs, parallelize data preprocessing on a distributed cluster, deploy REST APIs to expose machine learning models, manage and share datasets, or query databases for scalable analytics.
The largest providers are Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP).
Given the huge set of services and the differences between the platforms, it is virtually impossible to be competent in everything that cloud providers have to offer.
But it is important to have a basic understanding of cloud computing to be able to navigate through the documentation and learn how the features work when you need them.
At the very least, this allows you to ask better questions and formulate more specific requirements for your friendly neighborhood data engineers.
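
As a rough illustration of what "deploying a REST API to expose a model" involves, the sketch below wraps a placeholder model in a tiny JSON endpoint using only the Python standard library. It is an assumption-laden toy: `predict` just averages its inputs in place of a trained model, the `/predict` path and the `features` field are made-up names, and it runs locally. Actually hosting something like this on AWS, Azure or GCP is provider-specific and not shown here.

```python
"""Hypothetical sketch: a tiny JSON prediction endpoint using only the standard library."""
from __future__ import annotations

import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features: list[float]) -> float:
    # Placeholder for a trained model: simply the mean of the inputs.
    return sum(features) / len(features) if features else 0.0


def handle_predict(raw_body: bytes) -> tuple[int, dict]:
    """Turn a raw request body into an (HTTP status, JSON-serializable result) pair."""
    try:
        payload = json.loads(raw_body)
        features = [float(x) for x in payload["features"]]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON with a 'features' list of numbers"}
    return 200, {"prediction": predict(features)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/predict":  # made-up endpoint name
            status, result = 404, {"error": "unknown endpoint"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            status, result = handle_predict(self.rfile.read(length))
        body = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Build a server bound to localhost; call serve_forever() on it to start."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Calling `make_server().serve_forever()` starts the endpoint locally, and a POST to `/predict` with a body such as `{"features": [1, 2, 3]}` returns a JSON prediction. Keeping the request logic in `handle_predict`, separate from the HTTP handler, means it can be tested without opening a socket, which is the kind of requirement that is easier to discuss with data engineers once you have seen the moving parts.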
So there you have it.
For companies that are looking to start a data science team from scratch, I would recommend looking for candidates who are pragmatic problem-solvers with strong engineering skills and a fine-tuned sense of business value.
Statistical excellence can bring a lot of value, but it is becoming less important for the majority of use cases, especially in early-stage teams.
Up to now, most companies preferred to hire data scientists with a strong academic background, like a PhD in mathematics or physics.
Given how the industry has evolved in recent years, it will be interesting to see whether a larger share of software engineers and technical product managers will transition into data science roles in the future.