Which data role should I fill in a startup first?
Pawel Koperek, Jan 3

Recently I had the pleasure of speaking with a few startup founders.
They had all recognized the need to bring in someone who could help leverage the data they collect or plan to collect.
The first instinct in this situation is to bring in an experienced data scientist who, by jumping right into the product, could provide deep and insightful analyses.
Unfortunately, in most cases it is not that easy: that person needs time to learn.
They need to learn:
- the company: its structure and mission, how it makes money, its strengths and weaknesses, etc.
- the product: the roadmap and planned features, how it contributes to the greater company vision, what is already known about it and what is unknown, which metrics should be used to measure product development, and which levers the team can use to move them
- the data: what data is collected and when, and what the data actually means
- the data platform: how data is collected, accessed and processed, and how one can start collecting new data points, change how existing processes work or remove existing ones

In a mature company, there should already be some materials that help bring a new person up to speed: documentation of existing data sets, ETL processes and the technologies in use, a playbook of good practices, or just a set of notes left by someone who has worked on the project until now.
In a startup the situation is usually dramatically different: very few people know exactly what data is collected (if any), no one has a holistic, end-to-end understanding of the data flows or of how the numbers they produce should be interpreted, and the data platform might not exist at all.
At this point it becomes clear that the company needs to reconsider whether a data scientist is the role to fill first.
To make a data scientist efficient and successful, some key ingredients need to be in place: clean data and a data platform.
Let's look at this problem from a different angle.
For many people, modern analytics has the following roles:
- Data Engineer: models and defines data sets, writes scalable ETL processes and data collection code, and ensures data is clean and well structured. In general, makes sure data is collected in a reliable and robust manner.
- Data Analyst: creates reports and dashboards and answers business questions. Provides a clear picture of what is happening and helps to drive business strategy.
- Data Scientist: applies advanced statistics to data to discover new insights, builds Machine Learning models, and figures out ways to optimize business processes using data. Leverages the power of data to bring the company to the next level.

In reality, data science (or analytics) skills are more of a spectrum.
Every data role has a component of business understanding and a degree of coding skill.
For a Data Engineer, good coding skills are essential.
They need them to create robust and reliable ETL processes.
On the other hand, their business understanding does not have to be very thorough.
They have to use terminology consistent with the company's nomenclature, but they should not be expected to drive strategy for the next 3 or 5 years.
For a Data Analyst the business knowledge is key.
They need to separate signal from noise and put it in the context of general company trends.
Their coding skills have to allow them to access and use the data effectively, but do not have to be top-notch.
Most early analytics problems in startups are related to collecting data, cleaning it and making sure it is interpreted in the right way.
Collecting data might look different in every scenario: it can be as simple as plugging in Google Analytics or Mixpanel or creating an extract from an existing DB, or it can involve polling remote APIs, scraping websites, or collecting data directly from a large number of end-user devices.
The business questions are initially relatively simple: how many users are using the product, what is the most popular feature, what segment of users is returning to the product etc.
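To show how simple these first questions tend to be, here is a minimal Python sketch that answers all three from a tiny in-memory event log. The log layout `(user_id, feature, day)`, the sample values and the definition of a "returning" user (active on more than one distinct day) are assumptions made for illustration, not something prescribed by any particular tool.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical event log: (user_id, feature, day). All values are made up.
EVENTS = [
    ("u1", "search", date(2019, 1, 1)),
    ("u1", "export", date(2019, 1, 2)),
    ("u2", "search", date(2019, 1, 1)),
    ("u3", "search", date(2019, 1, 3)),
    ("u3", "share", date(2019, 1, 3)),
]


def count_users(events):
    """How many distinct users are using the product?"""
    return len({user for user, _, _ in events})


def most_popular_feature(events):
    """Which feature appears in the most events? None for an empty log."""
    counts = Counter(feature for _, feature, _ in events)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def returning_share(events):
    """Share of users seen on more than one distinct day (assumed definition)."""
    days_by_user = defaultdict(set)
    for user, _, day in events:
        days_by_user[user].add(day)
    if not days_by_user:
        return 0.0
    returning = sum(1 for days in days_by_user.values() if len(days) > 1)
    return returning / len(days_by_user)


if __name__ == "__main__":
    print("users:", count_users(EVENTS))
    print("most popular feature:", most_popular_feature(EVENTS))
    print("returning share:", returning_share(EVENTS))
```

The hard part in practice is rarely this arithmetic. It is getting a trustworthy event log in the first place, which is why data collection and cleaning dominate the early work.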
Which role should be filled first in a startup, then? The answer largely depends on the context:
- the mix of skills already present in the team (e.g. you might have software engineers who are capable of working on an early version of a data platform)
- what data has to be collected, and how much
- which questions would help to move the business forward

My rule of thumb is as follows:
- If the business idea depends on an implementation of an ML model, stop reading now and start hiring the best Data Science talent possible.
- If there is someone who can build the initial version of a data platform, or data can be extracted in an easy way (e.g. through a SaaS app or a simple data extract), hire an experienced Data Analyst who will help to organize it and create a clear picture of the state of the business. Grow the team according to the needs from there.
- If there is no one who could fill the early Data Engineer role, or there are many data sources which need to be accessed frequently, first look for a Data Engineer who would also be able to help with answering the first, simplest questions. When the foundation has been built, grow the team and bring in a Data Analyst or a Data Scientist to take it to the next level.
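The rule of thumb above can be written down as a small decision function. This is only a sketch: the function name, the parameter names and the order in which overlapping conditions are checked (many frequently accessed sources taking priority over an easy extract) are assumptions made for illustration, since the rule itself does not say how to break ties.

```python
def first_data_hire(
    ml_is_core,
    someone_can_build_platform,
    data_easy_to_extract,
    many_frequent_sources=False,
):
    """Return the role to hire first, following the rule of thumb.

    All parameters are booleans describing the startup's situation.
    The tie-breaking order below is an assumption, not part of the rule.
    """
    # The business idea depends on an ML model: hire Data Science talent.
    if ml_is_core:
        return "Data Scientist"
    # Many sources that need frequent access call for a Data Engineer first.
    if many_frequent_sources:
        return "Data Engineer"
    # Someone can build the first platform, or data is easy to extract.
    if someone_can_build_platform or data_easy_to_extract:
        return "Data Analyst"
    # No one can fill the early Data Engineer role.
    return "Data Engineer"


if __name__ == "__main__":
    # A team whose data comes from a SaaS app and needs no custom platform.
    print(first_data_hire(False, False, True))
```

A real decision will weigh more than four flags, but writing the rule down this way makes the implicit priorities visible and easy to argue about.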