Here I share four principles I have learned from my professional and personal projects.
Make the business goal as clear as possible
Data Science is very much in fashion. It is a powerful tool, but it is sometimes overused or applied where it does not fit.
That is why, before writing any line of code, we have to be sure what we are trying to demonstrate or improve, and what kind of data we have to achieve this goal.
Defining a clear business goal before coding is key in an enterprise setting, because this definition will then be used to define the evaluation metrics of our solution.
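To illustrate how a business goal can turn into an evaluation metric, here is a minimal sketch. Everything in it is an assumption made for the example: the `BusinessGoal` class, the churn scenario and the cost values are hypothetical, and a real project would agree on these numbers with the business before coding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessGoal:
    """Hypothetical container tying a business goal to the costs of errors."""
    description: str
    cost_false_negative: float  # assumed cost of missing a positive case
    cost_false_positive: float  # assumed cost of raising a false alarm


def total_cost(goal, y_true, y_pred):
    """Sum the business cost of every wrong prediction (labels are 0 or 1)."""
    cost = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 0:
            cost += goal.cost_false_negative
        elif truth == 0 and pred == 1:
            cost += goal.cost_false_positive
    return cost


# Illustrative goal: missing a churner is assumed to cost 10x a false alarm.
goal = BusinessGoal("Reduce losses from missed churners", 10.0, 1.0)
y_true = [1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0]
print(total_cost(goal, y_true, y_pred))
```

With a metric like this written down first, two candidate models can be compared on what the business actually cares about rather than on accuracy alone.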
Data has to be prepared
A large part of any Data Science project is making sure that we have data.
Then we have to ensure that the data is well prepared for use in our models.
Missing values and outliers are the kind of things we have to deal with to get good data quality for our model.
The time required for data preparation is often underestimated.
It is not the sexiest part of Machine Learning, but preparing your data and then getting to know it through an EDA (exploratory data analysis) process allows us to challenge the domain experts we are trying to help, and so gain a better understanding of the primary issue (and sometimes find other issues that need to be addressed first).
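As an illustration of the missing-value and outlier handling mentioned above, here is a minimal sketch using only the Python standard library. The `ages` column is made-up data, and median imputation and the 1.5 × IQR rule are just two common choices assumed for the example. In practice a library such as pandas would usually do this work.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def flag_outliers_iqr(values, k=1.5):
    """Return True for each value outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Made-up column with one missing value and one suspicious entry.
ages = [23, 25, None, 31, 29, 27, 240, 26]
filled = impute_median(ages)       # the median is robust to the extreme value
flags = flag_outliers_iqr(filled)  # marks values far from the bulk of the data

for value, is_outlier in zip(filled, flags):
    print(value, "outlier" if is_outlier else "ok")
```

Whether a flagged value should be dropped, capped or kept is a question for the experts who know the data, which is exactly the kind of discussion the EDA step is meant to start.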
Define the delivery precisely
Having a model that performs well on the evaluation metrics is the primary goal of any Data Scientist.
But from the beginning of development, we have to be sure what final delivery we are aiming for and how it will be technically integrated into a global solution.
Are we looking to develop an MVP? Will it run as a shell script? On the web? What do we want to be dynamic? What do we want to hard-code? Is it only a demo or a production project? All these questions have to be answered before coding.
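To make the "dynamic versus hard-coded" question concrete, here is a minimal sketch of a script-style delivery built with `argparse` from the standard library. The option names (`--input`, `--threshold`, `--mode`) and the `MODEL_VERSION` constant are hypothetical choices for the example, not a recommendation for any particular project.

```python
import argparse

# Hard-coded: assumed to stay fixed for the whole project (hypothetical value).
MODEL_VERSION = "v1"


def build_parser():
    """Declare everything the user is allowed to change at run time."""
    parser = argparse.ArgumentParser(description="Score a batch of records (illustrative).")
    parser.add_argument("--input", required=True, help="path to the input file")
    parser.add_argument("--threshold", type=float, default=0.5, help="decision threshold")
    parser.add_argument("--mode", choices=["demo", "production"], default="demo")
    return parser


def describe_run(args):
    """Summarise the run configuration; a real script would load and score data here."""
    return (
        f"model={MODEL_VERSION} input={args.input} "
        f"threshold={args.threshold} mode={args.mode}"
    )


# An explicit argument list is passed here for demonstration only;
# calling parse_args() with no argument reads the real command line.
demo_args = build_parser().parse_args(["--input", "data.csv", "--mode", "production"])
print(describe_run(demo_args))
```

Writing the parser early forces the team to decide which settings belong to the user and which belong to the code, before the model itself exists.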
No black box effect
Machine Learning will be used in more and more industries, at every level.
Given this, we have to be sure that people can properly understand what the machine predicts and why. Machine Learning interpretability is the key to that.
Ethical issues are increasingly addressed in our community. I strongly believe that the “No black box effect” principle will allow us to deliver solutions that anticipate the ethical issues AI may face in the future.
I strongly recommend that every Data Scientist add an ML interpretability section to their basic data science pipeline, as we do for feature engineering or metrics evaluation.
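As one example of what such an interpretability section could contain, here is a minimal sketch of permutation importance: shuffle one feature at a time and measure how much the prediction error grows. It is only one of several interpretability techniques, and the `toy_model` function and the data are hypothetical stand-ins for a trained model. Libraries such as scikit-learn provide a ready-made version (`sklearn.inspection.permutation_importance`).

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, n_repeats=10, seed=0):
    """Average increase in error when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y_true, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only the chosen column replaced.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            error = mean_absolute_error(y_true, [predict(r) for r in permuted])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical "trained model": it uses feature 0 and ignores feature 1.
def toy_model(row):
    return 3.0 * row[0]


rows = [[float(i), float(i % 3)] for i in range(20)]
y_true = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, y_true)
print(scores)  # feature 0 gets a positive score, feature 1 scores zero
```

The technique is model-agnostic: it only needs a `predict` function, so the same check can be run on any model in the pipeline.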
Any suggestions?
Please feel free to share your own advice in the comments section.