3 Tips to Improving Your Data Science Workflow

This happened all too often when I first started learning to code.

There are a few solutions to this, but the simplest I found was to use print statements inside loops to track how close the code was to finishing.

The code in the image below shows exactly how to track the current progress of any loop in an IPython notebook.
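The original code image isn't reproduced here. The following is a minimal sketch under stated assumptions, not the author's exact code. It uses plain `print` calls inside a loop. `slow_step` is a hypothetical placeholder for the real work. The 50-iteration reporting interval and the remaining-time estimate, which assumes every iteration takes about as long as the average so far, are illustrative choices.

```python
import time

def slow_step(i):
    """Hypothetical stand-in for the real work done in each loop iteration."""
    time.sleep(0.001)
    return i * i

n_total = 500
results = []
start = time.perf_counter()

for i in range(n_total):
    results.append(slow_step(i))
    done = i + 1
    # Only report every 50 iterations (and on the last one) to keep output light
    if done % 50 == 0 or done == n_total:
        elapsed = time.perf_counter() - start
        # Naive estimate: assume each remaining iteration takes the average time so far
        remaining = elapsed / done * (n_total - done)
        # "\r" moves back to the start of the line, so each update overwrites the last
        print(f"\r{done}/{n_total} ({done / n_total:.0%}) done, "
              f"{elapsed:.1f}s elapsed, ~{remaining:.1f}s left", end="")
print()  # finish the progress line once the loop is complete
```

In a notebook, calling `IPython.display.clear_output(wait=True)` before a normal `print` is another way to keep only the latest update on screen.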

A more detailed write-up can be found here.

This now means I know whether I have time to grab a cup of tea or will need to leave the code running overnight and focus on another task in the meantime.

It has also helped when I need to give colleagues a timescale for completing work, as I can estimate how long the code will take when it is applied on a larger scale.
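One way to make that estimate is to time a small sample and scale up linearly. This sketch assumes each item costs roughly the same to process. `process`, `full_data` and the sample size are hypothetical placeholders.

```python
import time

def process(x):
    """Hypothetical per-item workload; replace with the real function."""
    return sum(k * k for k in range(x % 1000))

full_data = list(range(100_000))  # the full job we need to give a timescale for
sample = full_data[:2_000]        # a slice small enough to run right now

start = time.perf_counter()
for x in sample:
    process(x)
elapsed = time.perf_counter() - start

# Assumes cost per item is roughly constant, so total time scales linearly
seconds_per_item = elapsed / len(sample)
estimated_total = seconds_per_item * len(full_data)
print(f"Sample of {len(sample)} items took {elapsed:.2f}s")
print(f"Estimated time for all {len(full_data)} items: {estimated_total / 60:.1f} min")
```

If the cost varies a lot between items, a random sample (for example via `random.sample`) gives a fairer estimate than simply taking the first few thousand items.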


Optimising Parameters Efficiently

When I first started learning to apply machine learning, I would manually change the parameter inputs one by one and take a note of the results for my final output.

Although this helped my understanding of the parameters, it was time-consuming and inefficient.

As time has gone on, I have intuitively developed three methods (though I make no claim that I was the first to come up with them) that have greatly improved my parameter tuning:

1. Utilise loops to automate your testing of parameter inputs
2. Iteratively build the output table inside the loop, ready for graphs or publishing
3. Demonstrate the parameter’s impact with interactive animations

The first seems somewhat obvious: instead of manually changing the inputs one by one, use a simple loop to increase the parameter at each interval and output the value or a graph for that increment.
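Here is a minimal sketch of the first method, looping over one parameter instead of changing it by hand. The model and split are assumptions chosen to keep the example self-contained: a NumPy polynomial fit to a noisy sine wave, with `degree` as the tuned parameter and every third point held out for validation.

```python
import numpy as np

# Toy data (an assumption for illustration): a noisy sine wave
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Hold out every third point for validation
val_mask = np.arange(x.size) % 3 == 0
x_tr, y_tr = x[~val_mask], y[~val_mask]
x_val, y_val = x[val_mask], y[val_mask]

def validation_mse(degree):
    """Fit a polynomial of the given degree and return its validation MSE."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    return float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))

# Instead of editing `degree` by hand and re-running, step through the values
errors = {}
for degree in range(1, 9):
    errors[degree] = validation_mse(degree)
    print(f"degree={degree}: validation MSE={errors[degree]:.4f}")
```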

This can even be extended to grid search, where we brute-force check every combination of values across the ranges of several parameters at once, as shown below.
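A grid search uses the same idea, with one loop per parameter or a single loop over `itertools.product`. This sketch assumes a toy ridge-regularised polynomial regression written directly in NumPy. Its two parameters are the polynomial degree and the penalty strength `lam`, and the grid values are arbitrary. In practice, a library tool such as scikit-learn's `GridSearchCV` performs the same brute-force search with cross-validation.

```python
import itertools
import numpy as np

# Toy data (an assumption for illustration): a noisy sine wave
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Hold out every third point for validation
val_mask = np.arange(x.size) % 3 == 0
x_tr, y_tr = x[~val_mask], y[~val_mask]
x_val, y_val = x[val_mask], y[val_mask]

def ridge_val_mse(degree, lam):
    """Fit ridge-regularised polynomial regression and return the validation MSE.

    Ridge is solved as ordinary least squares on an augmented system, which
    minimises ||X w - y||^2 + lam * ||w||^2. For simplicity every coefficient,
    including the intercept, is penalised.
    """
    X_tr = np.vander(x_tr, degree + 1)
    X_val = np.vander(x_val, degree + 1)
    A = np.vstack([X_tr, np.sqrt(lam) * np.eye(degree + 1)])
    b = np.concatenate([y_tr, np.zeros(degree + 1)])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(np.mean((X_val @ w - y_val) ** 2))

degrees = [1, 3, 5, 7]
lambdas = [0.0, 1e-3, 1e-1, 1.0]

# Brute force: evaluate every (degree, lambda) combination in the grid
scores = {}
for degree, lam in itertools.product(degrees, lambdas):
    scores[(degree, lam)] = ridge_val_mse(degree, lam)
    print(f"degree={degree}, lambda={lam:g}: validation MSE={scores[(degree, lam)]:.4f}")

best = min(scores, key=scores.get)
print(f"Best combination: degree={best[0]}, lambda={best[1]:g}")
```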

To improve this further, a good method is to build a data frame that records the output of each increment as the loop runs, rather than simply printing it.

One way to do this is as follows:

1. Introduce an empty pandas data frame
2. Test the parameter inputs inside a loop
3. Apply the append function to add a row to the introduced data frame with the outputs for each loop iteration

This is shown in the full code below, where each iteration's outputs are formatted neatly as a row and added to the previous ones.
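Here is a minimal sketch of these steps, using the same kind of toy polynomial fit as an assumed example. It makes one deliberate change. `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so this version collects one dictionary per iteration in a list and builds the data frame from it once the loop finishes.

```python
import numpy as np
import pandas as pd

# Toy data (an assumption for illustration): a noisy sine wave
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
val_mask = np.arange(x.size) % 3 == 0
x_tr, y_tr = x[~val_mask], y[~val_mask]
x_val, y_val = x[val_mask], y[val_mask]

rows = []  # one dict per loop iteration, collected as the loop runs
for degree in range(1, 9):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    rows.append({
        "degree": degree,
        "train_mse": float(np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)),
        "val_mse": float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)),
    })

# Build the data frame once from the collected rows
results = pd.DataFrame(rows)
print(results.round(4).to_string(index=False))
```

Building the frame once is also cheaper than growing it row by row, because each `pd.concat` call creates a new frame. If you do want the table updated inside the loop, `pd.concat([results, pd.DataFrame([row])], ignore_index=True)` is the modern replacement for `append`.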

This also makes it easy to create summary graphs, and the data frame can be used directly as a table ready for publishing.
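As an example of turning such a results table into something publishable, this sketch records a two-parameter loop in long format, with one row per combination. It then pivots the results so that one parameter runs down the rows and the other runs across the columns. The toy model and the choice of parameters (polynomial `degree` and training-set size `n_train`) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (an assumption for illustration): a noisy sine wave
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
x_val, y_val = x[150:], y[150:]  # last 50 points are held out for validation

rows = []
for n_train in (20, 50, 150):
    for degree in (1, 3, 5):
        coeffs = np.polyfit(x[:n_train], y[:n_train], degree)
        val_mse = float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))
        rows.append({"n_train": n_train, "degree": degree, "val_mse": val_mse})

results = pd.DataFrame(rows)

# Reshape to a compact summary table: degrees as rows, training sizes as columns
table = results.pivot(index="degree", columns="n_train", values="val_mse").round(4)
print(table.to_string())
```

With matplotlib installed, `table.T.plot(marker="o")` would draw one validation-error line per degree against training-set size, and `table.to_csv("results.csv")` (a hypothetical filename) exports the same table for a report.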

Lastly, though perhaps not required for most projects, interactive animations can be used to show how the output changes as a parameter changes.

I have written a full guide on how to do this here, and I used it in this notebook to better illustrate the impact that changing parameters has on the stability of the output.

I hope you find these tips useful and that they help improve your data science endeavours.

