I have my own opinion on this topic, of course, but I might be biased by my own attitude and past experience.
So, Patrick Slavenburg and I asked our colleagues on LinkedIn about the most common causes of deployment failure in data science projects.
This article summarizes their answers.
1. Infant Mortality
As everybody knows, a data science project consists of a few mostly standard phases, summarized for example in the CRISP-DM cycle: business case understanding; data exploration; data cleaning and data preparation; model training, model optimization, and model testing; and finally, if everything went well in the previous phases, deployment.
“If everything went well” means:
- If we understood the business specs clearly
- If the data quality is sufficiently good for our task
- If we managed to clean and prepare the data properly
- If we trained, optimized, and tested a sufficiently good model
If all of that succeeded, we might proceed with deployment.
The deployment phase is at the end of the food chain.
It collects all the garbage produced in the previous stages of the process.
That is, the deployment phase is where all previously created and undetected problems might show up to kill the entire project.
And here we are at the first cause for deployment failure: infant mortality.
Infant mortality is due to a deep undetected problem in any of the previous stages.
Kenneth Longo, for example, reports the misunderstanding of the original business question as one of the most common causes for a Data Science project to drown during deployment.
Kenneth describes the problem as follows: “What the Machine Learning model ultimately answers does not align with the original business question, or the business question has shifted during the development process.”
Sometimes, despite all efforts, model performance is still below expectations.
In this case, either expectations were too high to start with (Data Science was sold as magic) or the quality of the data used to train the model was not good enough (the training set might have been too small and/or not covering all possible events).
Sometimes it is possible to discover this data quality issue during the data exploration phase, for example with some exploratory statistics, but sometimes the problem flies under the radar and becomes clear only right before deployment.
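
As an illustration of the kind of exploratory check mentioned above, here is a minimal sketch that flags classes which are missing or under-represented in a training set. The function name `coverage_report`, the threshold of 50 rows, and the delay labels are all assumptions made up for this example; a real project would choose its own classes and thresholds.

```python
from collections import Counter


def coverage_report(labels, expected_classes, min_rows_per_class=50):
    """Flag expected classes that are missing or rare in the training labels.

    The threshold is an arbitrary illustration, not a recommended value.
    """
    counts = Counter(labels)
    report = {}
    for cls in expected_classes:
        n = counts.get(cls, 0)
        if n == 0:
            report[cls] = "missing"
        elif n < min_rows_per_class:
            report[cls] = "too few rows"
        else:
            report[cls] = "ok"
    return report


# Hypothetical training labels: one class is rare, one never occurs.
labels = ["no_delay"] * 120 + ["short_delay"] * 8
report = coverage_report(labels, ["no_delay", "short_delay", "long_delay"])
print(report)
```

A check like this will not catch every data quality problem, but it is cheap to run early and can surface "not covering all possible events" long before deployment.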
A failed deployment of the project, because of infant mortality, is disappointing, but actually not a deployment failure per se.
It just happens during the deployment phase.
I would not count this as a cause of deployment pain, but more as a disappointing late discovery during the deployment phase.
2. Missing the IT Skills
Deployment requires some IT skills.
Whether you need to write the final results to a database, schedule the job execution every X hours, run the application from a remote server with different connection parameters, or run it as a REST service, you will need some help from your IT department, at the very least to get the server URI and the right credentials to access the resources.
And this is, in my experience, the second show stopper for deployment: the missing collaboration between the data science group and the IT department.
The IT department’s task is to protect the data.
The Data Science group’s task is to access the same data.
These two opposite tasks might not foster the best collaboration.
In this situation, it might help if a team within the Data Science group acquires the necessary IT skills to communicate with the IT department and takes responsibility for all (or some) of the applications coming from the group and moving into production, for example on a dedicated server.
In cases where extreme protection of the original data is required, the dedicated server can host all intermediate and final results of the data science applications and, maybe, even a copy of the original data.
In this way, the general IT and the Data Science IT are almost completely decoupled and the IT team of the data science group can take full responsibility for machines and applications.
Of course, choosing the right tool to deploy, schedule, and execute your application on a remote server might save you a lot of time, sweat, and endless discussions with the IT department.
There are a number of tools out there, allowing for server deployment, REST service deployment, scheduling, and final dashboard display on a web browser.
Choose one, the one that best fits your needs and group skills, but seriously choose one!
Training a Machine Learning model to a 99.99% accuracy is a great academic achievement.
However, throwing the same model into real life is what will allow it to fulfil its purpose.
This is a challenge not to underestimate.
So, get all the help you can from the right tool!
3. Life in the Real World is more complicated than Life in the Lab
Real life is a jungle, and throwing your predictor out there means facing the jungle of governance rules and data privacy issues that were not considered during model training.
Your predictor application must be accountable at any moment for the decisions it makes.
This means that all predictions in the real world must be stored, traceable, and archived for the required — often legally required — amount of time.
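
To make the idea of stored and traceable predictions more concrete, here is a minimal sketch of an audit record written as one JSON line per prediction. Everything in it is an assumption for illustration: the function `log_prediction`, the record fields, the model version string, and the in-memory `io.StringIO` sink standing in for whatever append-only file or table your organization actually uses.

```python
import hashlib
import io
import json
from datetime import datetime, timezone


def log_prediction(sink, model_version, features, prediction):
    """Append one prediction record as a JSON line and return the record."""
    # Serialize inputs deterministically so the same input gives the same hash.
    payload = json.dumps(features, sort_keys=True)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "input_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "features": features,
        "prediction": prediction,
    }
    sink.write(json.dumps(record) + "\n")
    return record


# Hypothetical usage with an in-memory sink.
audit_log = io.StringIO()
entry = log_prediction(
    audit_log,
    "churn-model-1.3",
    {"tenure_months": 14, "plan": "basic"},
    "churn",
)
```

Whether the raw input features may be stored at all, and for how long, depends on the privacy rules that apply to you; in some settings keeping only the hash and a reference to the source record may be the safer choice.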
Many countries, especially in Europe, have strong laws about data privacy.
Often, what looked like a good idea during the initial brainstorming clashes with the legal reality of what you can do with an individual’s data.
A model trained on a large amount of data usually does not display information that can be traced back to the original data owner.
However, if the training set is too small or if the application is designed to track each single person, this might turn into a legal problem.
Lakshmi Krishnamurthy states that “Building, tuning and performance testing with training data in lab constraints is manageable; but integrating ‘live’ data and people issues of quality, governance, enterprise architecture constraints and last but not least getting customer buy-in on what your model is/was supposed to do is not always ‘predictable’.
That’s where the challenge is!!”
On a similar note, Sam Taha thinks that the integration with the existing data plumbing inside the company might prove to be more difficult than initially thought.
“Going from a POC with ideal input and pipelines to a solution that works in the wild and across the solution space involves tight alignment with the business and with the upstream systems so that the model becomes part of the production ‘product’.”
Paige Roberts also agrees and says that “a failure to take into account the data engineering work involved in creating production data pipelines to feed the model is another cause for missed deployment”.
4. Data Leakage
Ruben Ten Cate lists data leakage as another big impediment to deployment.
Data leakage is indeed a serious problem, not always easy to detect during training.
This happens when data in the training set contain features that are not available in the real world, for example features that are collected after or in consequence of the event to classify.
To quote Ruben: “To add another cause of missed deployment: a ‘data leak’.
Not a leak in the sense of a privacy breach but a leak of data into the training data, which is not present in the operational environment.
For example: often customers with more customer service interactions have a higher chance of churning.
This is a valid correlation and is a result of the fact that unhappy customers often call customer service to complain or to actually stop their contract (churn).
The problem with a model based on this feature is that in real life, we want to predict churning customers much BEFORE they become unhappy and call customer support (to churn), so the higher rate of interactions has not yet happened and is not present in an operational environment.
As a result, the model performs bad and is discarded before adding value in production.”
A classic example of data leakage is when we try to predict a travel delay using the arrival delay at the destination as an input feature.
Of course, if the means of transportation arrived x minutes late, this qualifies as a travel delay.
But how can we know the arrival delay before we arrive?
There are many more examples like this that become self-evident with hindsight, but are not so clear during training.
Naming and describing your data columns clearly might help prevent this kind of problem.
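
One way to act on clear column descriptions is to record, for each feature, when its value becomes known, and to exclude anything not known before prediction time. The sketch below is only an illustration under that assumption: the catalog, its column names, and the function `split_leaky_features` are hypothetical, loosely based on the travel delay example above.

```python
# Hypothetical column catalog: when does each value become known?
FEATURE_CATALOG = {
    "scheduled_departure_hour": "before_prediction",
    "distance_km": "before_prediction",
    "carrier": "before_prediction",
    "arrival_delay_min": "after_event",  # only known once the trip is over
}


def split_leaky_features(columns, catalog):
    """Separate columns usable at prediction time from suspect ones.

    Columns missing from the catalog are treated as suspect, so that an
    undocumented feature cannot slip into training unnoticed.
    """
    usable, suspect = [], []
    for col in columns:
        if catalog.get(col) == "before_prediction":
            usable.append(col)
        else:
            suspect.append(col)
    return usable, suspect


candidate_columns = [
    "scheduled_departure_hour",
    "distance_km",
    "arrival_delay_min",
    "weather_code",  # not documented in the catalog
]
usable, suspect = split_leaky_features(candidate_columns, FEATURE_CATALOG)
print("train on:", usable)
print("review or drop:", suspect)
```

A catalog like this does not detect leakage by itself; it only forces someone to answer the question "is this value available when we need the prediction?" for every column before training starts.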
5. Design Thinking, Lean Startup and Agile (or lack thereof)
Patrick Slavenburg: Software development (and increasingly other teams, such as marketing & sales) moved to iterative processes like Design Sprints or Scrum.
Agile processes in Data Science are still fairly new but increasingly popular, such as DataOps (see the DataOps manifesto).
User stories are not static, however.
Even more problematic is that businesses often have difficulty articulating which business value they want to derive from AI, and which has the highest priority.
The perennial Product Manager’s dilemma: “Should we build this?” and “Why?”.
Agile & Design Thinking may not be part of the typical Data Scientist’s background; instead, data scientists tend to follow a longer-term “scientific method”.
In the end, they may have solved the problem.
But the problem is no longer in alignment with business needs.
Or it never really was to begin with.
6. Company Politics
Finally, Patrick Slavenburg reports “organizational problems, and tribalism, as a problem for deployment”.
It’s not just the usual office politics that we often find in Dilbert’s comics as well as, unfortunately, in real life.
It’s also the perennial Chinese wall between OT and IT.
Business departments and subject matter experts in Operations alike view IT as an enabler of their needs.
It also means that if Data Science is introduced through IT, it will stay within IT and never become widely adopted.
Yet it is Citizen Data Scientists we need in order to drive that adoption.
Another example is the lack of collaboration between the IT department and the data science group, as reported previously in this article (in the section about the missing IT skills).
In the worst cases, this can even escalate into an ugly, unpleasant war.
Management support is also hard to win.
This sometimes has to do with management expecting too much from data science applications.
Or the opposite: the belief that machines can never replace humans.
There’s sometimes little middle ground to be found.
Conclusions
Yes, deployment hurts.
I have listed here some of the most common causes for the deployment pain.
The pain can be cured by not underestimating any of these issues, choosing a tool that simplifies your life, and dedicating enough time, resources, skills, budget, and early planning from the beginning of the project.
If you want to read the original comments in the LinkedIn discussions, you can find them in these two sections:https://www.
com/feed/update/urn:li:activity:6499534506897211392
I hope I have not killed your enthusiasm for implementing and deploying data science solutions.
My goal was actually to make you aware of the pitfalls, rather than to scare you.
I hope you will make good use of our experience and keep going with the deployment of even more data science applications.