Why Business & Product Should Always Define KPIs & Goals For Data-Science
Why business or product should choose a KPI that works for them and why researchers should choose a mathematical KPI that satisfies the business’ or the product’s need.
Ori Cohen, Jun 23
A Key Performance Indicator (KPI) is a type of performance measurement used to evaluate the performance of an organization or of the projects and products in which it engages.
We need KPIs & Goals in order to have a set of expectations that are agreed upon by all relevant stakeholders, which allows us to measure the business impact and to optimize toward a single result.
In companies where the Business or Product (BP) teams are well established, good communication with the data-science (DS) team is essential.
The DS team requires two types of KPIs, those that come from the BP side and those that are created by the DS team in order to meet the BP KPIs & Goals.
In most situations, DS KPIs will be different from BP KPIs.
Knowing the BP KPI in turn allows us to define DS KPIs that align with it.
In many cases, these KPIs don’t need to be the same!
Imagine the following: you need to start a new project or fix an old model, but do not have a BP KPI.
You are told to do your best and iterate until a proper solution is found.
You then think you have figured out what the BP’s need is and formulate a mathematical KPI for it.
Here is why I don’t recommend an iterative process to fit a DS KPI to a BP KPI that did not originate in BP:
Research processes take time.
To achieve a certain Goal, researchers may check many possible solutions.
In practice, this can take weeks!
Having a BP KPI first allows stakeholders to be in agreement, allows us (DS) to focus on a solution that satisfies the business need, and finally is more agile, as it prevents unnecessarily long iterations without a clearly defined goal.
For example, let’s say your stakeholders have an issue with a classifier: they complain that it favors certain classes.
There are many possible solutions; some are expensive in terms of time or money, such as annotating more samples.
Others can be less expensive: over- or under-sampling, synthetic sampling, adjusting class weights or sample weights, trading precision against recall, etc.
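As one illustration of the cheaper options, class weights can be derived directly from the label counts. The sketch below uses the common "balanced" heuristic, weight = n_samples / (n_classes * class_count). The function name and the toy labels are assumptions made for illustration and are not taken from any specific project.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class that is inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Rare classes get large weights, frequent classes get small ones.
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced data set: 90 samples of class "a", 10 of class "b".
labels = ["a"] * 90 + ["b"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

Most training libraries accept such weights per class or per sample, so this kind of fix can usually be tried in hours rather than the weeks that new annotation may take.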
The following four examples each describe a task, a BP KPI or goal, and how it can be translated into a DS KPI:
Task 1: Create models that predict whether a house has an air conditioner, washer-dryer, hot-water boiler, etc.
Note: Electric companies have statistics about appliances in each household.
Product KPI: match the electric companies’ statistical reports.
Goal: be closely aligned with the electric companies’ expectations.
DS KPI: precision.
Goal: 81%.
DS KPI: target distribution.
Goal: same as reported by the electric company.
For example, it is known that 40% of the houses have a dryer.
Therefore, we place a threshold over the classifier so that 40% of the houses would have a dryer.
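The thresholding idea for the target-distribution KPI can be sketched as follows. This is a minimal illustration that assumes the classifier outputs one score per house; the function name, the toy scores, and the tie handling are assumptions, not the original implementation.

```python
def threshold_for_positive_rate(scores, target_rate):
    """Return a cutoff so that about `target_rate` of the samples score at or above it."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < target_rate <= 1:
        raise ValueError("target_rate must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    # Number of samples that should be labeled positive (at least one).
    k = max(1, round(target_rate * len(ranked)))
    return ranked[k - 1]


# Hypothetical classifier scores for ten houses.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.80, 0.95]
threshold = threshold_for_positive_rate(scores, target_rate=0.40)
has_dryer = [score >= threshold for score in scores]
print(threshold, sum(has_dryer) / len(scores))
```

If several houses share the cutoff score, slightly more than the target share will be labeled positive, so the achieved rate is worth checking on real data.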
Task 2: Improve and optimize the company’s overhead spending in various scenarios.
Business KPI: CRC.
Goal: reduce by 10%.
DS KPI: annotation capacity.
Goal: use the same resources to support 2x more clients.
DS KPI: annotation speed.
Goal: provide a solution that enables 3x faster sample annotation.
That is, more annotations allow us to support more clients and get more data.
DS KPI: focus on the most relevant annotation samples.
Goal: reduce non-relevant annotation by 75%.
Task 3: Help one of the teams to be more productive.
Business KPI: time-to-production.
Goal: decrease by half.
DS KPI: precision, recall, task’s time to completion.
Goal: create an automatic process that solves the task, with 80% precision, 90% recall, taking no longer than 5 minutes to complete.
Task 4: Create signal alerts based on anomaly detection.
Business KPI: number of times clients complain about spam alerts.
Goal: reduce by 50%.
DS KPI: false positives.
Goal: allow only 1% false positives.
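One way to turn this goal into a concrete alerting rule is to pick the alert threshold from anomaly scores of samples known to be normal. The sketch below assumes that "1% false positives" means a false-positive rate of at most 1% on normal data, and that a higher score means "more anomalous"; the function name and the synthetic scores are illustrative assumptions.

```python
import math


def threshold_for_max_fpr(normal_scores, max_fpr):
    """Return a cutoff such that at most `max_fpr` of normal samples score strictly above it."""
    if not normal_scores:
        raise ValueError("normal_scores must not be empty")
    ranked = sorted(normal_scores, reverse=True)
    # How many normal samples we can afford to flag; the epsilon guards against float round-off.
    allowed = math.floor(max_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return float("-inf")  # every sample may alert
    return ranked[allowed]


# Hypothetical anomaly scores of 200 normal (non-anomalous) samples.
normal_scores = [i / 200 for i in range(200)]
threshold = threshold_for_max_fpr(normal_scores, max_fpr=0.01)
# An alert fires only when a score is strictly above the threshold.
false_alerts = sum(score > threshold for score in normal_scores)
print(threshold, false_alerts / len(normal_scores))
```

Whether such a threshold also halves client complaints still has to be validated by BP, since the business KPI counts complaints rather than false positives.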
The illusiveness of the right metric
On certain occasions, you will find yourself in a situation where product, business, or even sales promise the client a certain accuracy level for a product.
I understand that sometimes this is inevitable; however, here are some arguments for why accuracy is not always the proper KPI to look at.
A different combination of metrics can achieve the same accuracy.
A certain model with high accuracy can have high precision and low recall.
Another model with lower accuracy can have similar precision with higher recall.
I assume most clients would pick the first one because of its better accuracy; however, the BP’s need might be better satisfied by the second model, because it catches more of the relevant samples at a similar precision.
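The first point, that different combinations of metrics can produce the same accuracy, can be checked with a little arithmetic. The confusion-matrix counts below are made up for illustration (1,000 samples, 100 of them positive) and do not come from a real model.

```python
def metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical model A: very cautious, flags few positives.
model_a = metrics(tp=20, fp=0, fn=80, tn=900)
# Hypothetical model B: catches most positives at the cost of more false alarms.
model_b = metrics(tp=80, fp=60, fn=20, tn=840)

for name, result in (("A", model_a), ("B", model_b)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Both models are correct on 920 of the 1,000 samples, yet A finds 20% of the positives with perfect precision while B finds 80% of them with roughly 57% precision. An accuracy promise alone cannot tell the client which of the two they are getting.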
High accuracy doesn’t guarantee a good result.
For example, in Figure 1, we see a comparison of 4 multi-class algorithms, specifically showing accuracy and F1.
If we only looked at accuracy, algorithm 3 would seem to be the best candidate. However, because their accuracy is similar and, most importantly, their recall is low, algorithms 1, 2 and 3 should not be considered: algorithm 4 catches more samples.
Figure 1: a comparison of accuracy vs. F1 for 4 multi-class algorithms.
The bottom line is that accuracy is not always correlated with our expectations or the BP’s needs, and choosing the right mathematical metrics should be left to your researchers, i.e., BP should choose a KPI that works for them and DS should translate it into the math that satisfies the BP KPI.
My proposed flow
The following flow (Figure 2) tries to bridge the gap between BP and DS in terms of working relations, setting KPIs, validating them, and finally delivering a solution that everybody is happy with.
Figure 2: a proposed flow for working side by side with business and product.
We begin inside the purple box: the BP has a new project and they define KPIs & goals.
The DS team accepts them (pink) and starts formulating mathematical KPIs & goals that satisfy the BP’s need.
Once formulated, DS should communicate the math to the BP to see if there is an alignment.
With that in place, the DS team starts researching (orange), followed by internal validation of the KPIs and goals (yellow).
For example, trying to reach a certain precision point.
When the DS-validation step is completed, deliverables are passed to the BP for validation.
BP goes through their own validation stage (cyan). If it is not successful, they hand the work back to the DS team for another iteration; if it is successful, a solution is delivered and everybody is happy (green).
In the not-so-rare case of deciding that a new KPI or a new goal is needed, the process starts again from the purple box, i.e., it's exactly like saying “we are starting a new project”.
I completely acknowledge that some stakeholders will not fully cooperate with this idea, and it is fairly safe to assume that you will hear comments such as:
Qualitative feedback is enough; therefore, we don't need KPIs and goals.
A certain team isn’t good at defining KPIs & Goals.
It’s the Product’s job to help with that.
Qualitative feedback probably won't translate well into a solution that satisfies the original need. Moreover, without a KPI it's probably impossible to have a “definition of done” when there are multiple stakeholders and an infinite number of possible solutions.
Therefore, it’s advisable to reach out to the Product team and collaborate with them on defining these metrics.
I hope that these ideas will bring some order to the chaos that comes when starting a new project or adding new features to your product, save you some time and hopefully aid the complex relations between business managers, product managers, data scientists, and researchers.
I would like to thank my fellow colleagues, Sefi Keller, Samuel Jefroykin, Yoav Talmi & Ido Ivry for their invaluable advice.
Ori Cohen has a Ph.D. in Computer Science with a focus on machine learning.
He leads the research team at Zencity.io, trying to positively influence citizens’ lives.