How to avoid rookie mistakes in the field of Data Science?

A guide to the dos and don’ts as a beginner to Data Science

Pritha Saha
Jul 2

I have recently started my journey towards becoming a data scientist through self-study.
The path has not always been smooth, since there was no one to walk me through a detailed, sequential syllabus.
So I tried a few things, didn’t quite succeed, and then picked up from there.
If you are an aspiring data scientist, this article may help you avoid the mistakes I made.
Firstly, one should never try to memorise machine learning algorithms.
The brain retains only part of what it memorises and discards the rest.
The best way to absorb them is through practice.
There is no shortcut!

I made the mistake of going through the course “Machine Learning A-Z: Hands on Python and R in Data Science” on Udemy.
I absorbed the first half of the course, but as a beginner I eventually found the rest tedious to get through.
I did get the intuition behind most of the algorithms, but looking back I feel I could have done without this particular course.
Secondly, programming is a crucial part of mastering Data Science.
We can’t overlook this stage.
Proficiency in a programming language, specifically Python, is a must.
It is the most widely accepted language because it has a wide range of libraries, which give data scientists ready-to-use tools.
Moreover, most courses and competitions require us to code in Python.
Hence a ‘Pythonic’ mindset is pivotal for a career in Data Science!

I used two resources: Codecademy in the beginning and later Datacamp.
I soon quit Codecademy because it started at a very basic level, probably meant for people who are not from technology fields.
I bought a year-long subscription on Datacamp and found it to be a great resource for learning Python from a Data Science perspective.
There are courses on the libraries you will eventually work with, such as numpy and scipy.
There are projects on data analysis and visualisation.
Again, one should remember that there is no need to memorise the syntax.
Building familiarity with the functions and packages of the language is important; the exact syntax can always be googled.
Stack Overflow is a great resource for finding answers to your queries and for answering others’ questions.
Thirdly, one should not even think about the machine learning algorithm without first analysing the dataset.
The machine learning part is just 2–3 lines of code.
The rest of the code is dedicated to detailed data analysis and visualisation.
Without knowing the patterns in the data, it is not possible to ascertain which inputs matter for your output, to remove noise from the data, or to transform the data so that it is ready for the model to consume.
Kaggle is a great resource for getting started on simple machine learning exercises (Titanic & House price prediction) and getting your hands dirty with data cleaning and transformation.
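
To make the "analyse before you model" advice concrete, here is a minimal sketch of the kind of checks I mean: counting missing values, filling a gap, and seeing which inputs move with the output. It uses only the Python standard library so it runs anywhere. The tiny dataset, its column names (area, rooms, price) and the helper functions are all made up for illustration; in practice you would more likely do this with a library such as pandas on a real Kaggle file.

```python
import csv
import io
from statistics import mean, median

# Made-up toy data, for illustration only (not from any real dataset).
RAW = """area,rooms,price
50,2,150
80,3,240
,3,210
120,4,360
65,2,195
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, turning empty cells into None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({k: float(v) if v else None for k, v in row.items()})
    return rows


def count_missing(rows):
    """Count missing values per column."""
    return {col: sum(r[col] is None for r in rows) for col in rows[0]}


def fill_with_median(rows, col):
    """Replace missing values in `col` with the median of the known values."""
    known = [r[col] for r in rows if r[col] is not None]
    fill = median(known)
    for r in rows:
        if r[col] is None:
            r[col] = fill
    return fill


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rows = load_rows(RAW)

# Step 1: find the gaps before doing anything else.
missing = count_missing(rows)

# Step 2: a simple transformation, here median imputation (one choice of many).
fill_value = fill_with_median(rows, "area")

# Step 3: check which inputs move together with the output.
prices = [r["price"] for r in rows]
corr = {col: pearson([r[col] for r in rows], prices) for col in ("area", "rooms")}

print("missing per column:", missing)
print("area filled with median:", fill_value)
print("correlation with price:", corr)
```

Median imputation and Pearson correlation are just two simple options among many; the point is that steps like these take up most of the code, long before any model is fitted.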
I’ll cover more on data engineering in my next article.
Till then, keep the data scientist alive in you and do reach out with any questions or feedback!