Do Aspiring Data Scientists Need Mathematics in Their Toolkit?

John Ohakim
Mar 5

“Mathematics, a veritable sorcerer in our computerized society, while assisting the trier of fact in the search for truth, must not cast a spell over him.”
California Supreme Court, People v. Collins (1968)

So maybe you recently read an article in the Wall Street Journal or listened to an episode of one of their podcasts, in which Data Science was presented to you as a burgeoning field.
And now you have decided to pursue a career in it.
You started investing in yourself by taking online courses in Machine Learning and Python.
That’s certainly a good way to get your feet wet.
As you go through that process, do not forget to invest in equipping yourself with one of the vital tools that a good scientist employs: Mathematics.
This, unfortunately, is an aspect of the process that some of these online courses barely mention, or fail to mention at all.
I would like to make a case for this important aspect of your journey, should you choose to go this way.
I made the career transition into data science.
Luckily for me, I came from economics, a field where mathematics and statistics (and some coding) are our bread and butter.
This prior knowledge before the transition made my entry into the data science space less daunting.
My undergraduate experience presented economics to me in a simplistic way.
The models that were presented to me seldom explored the mathematical foundations that drove them.
For instance, when I first encountered the model of market equilibrium (formally termed the model of price determination) I was enthralled by its simplicity and elegance.
Individual demanders (consumers) and suppliers (firms or businesses) are assumed to take prices as given and then determine their respective optimal responses to those market prices.
This linear model, based on a number of assumptions, contains one equilibrium condition (a state characterized by a lack of tendency to change), plus two behavioral equations which govern the demand and supply sides of the market.
There was some math (and a whole lot of graphs) but nothing overwhelming.
At the time, I convinced myself that I had a firm grasp on how capitalistic societies were organized and functioned.
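For concreteness, the linear market model described above can be written out and solved in a few lines. This is only a sketch: it assumes the common textbook form of the model (demand Qd = a - bP, supply Qs = -c + dP, equilibrium condition Qd = Qs), and the function name and parameter values are made up for illustration.

```python
def market_equilibrium(a, b, c, d):
    """Solve a linear one-good market model (assumed textbook form).

    Demand (behavioral equation):  Qd = a - b * P
    Supply (behavioral equation):  Qs = -c + d * P
    Equilibrium condition:         Qd = Qs
    """
    if b + d == 0:
        raise ValueError("b + d must be non-zero for a unique equilibrium")
    price = (a + c) / (b + d)   # from setting a - b*P equal to -c + d*P
    quantity = a - b * price    # substitute the price back into demand
    return price, quantity


# Hypothetical parameter values, chosen only to give round numbers.
price, quantity = market_equilibrium(a=10, b=2, c=2, d=4)
print(price, quantity)  # 2.0 6.0
```

Substituting the price into the supply equation gives the same quantity (-2 + 4 * 2 = 6), which is exactly what the equilibrium condition requires.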
Fast-forward to my first week in grad school.
It did not take long to realize that underneath all the models I previously saw were a host of math equations.
The “words” previously employed to state assumptions and conclusions were now stated with mathematical symbols and equations.
Literary logic gave way to mathematical theorems.
I found myself immersed in the world of linear algebra and calculus.
Although I had taken a number of math classes in the aforementioned topics, I rarely made any connection to my beloved economics.
Yes, I took differentials and did some algebra in my micro and macro classes in both my junior and senior years.
Yet, the rigor associated with economics in grad school took me by surprise.
This surprise was two-fold: first, one of consternation, and afterward, one of excitement.
The latter came after I realized its value.
It is argued that a mathematical approach has a few advantages.
According to Chiang (2005), these include: 1) the language is more concise and precise; 2) a plethora of mathematical theorems are at our service; 3) it allows us to treat the general n-variable case; 4) we avoid the pitfall of implicitly using unwanted assumptions, since the use of mathematical theorems makes us state our assumptions explicitly.
These I found to be true.
I was soon building mathematical models to assist me in conducting research.
I once wrote a technical paper that had far more mathematical symbols and equations than words.
I, however, lost sight of the role mathematics should play as one objectively seeks to gain insights from data.
The “trier of fact” in his “search for truth” had fallen under the spell of this “veritable sorcerer”.
Mathematics should be a tool and not an end in itself.
Working with data sets, I find myself drawing on my knowledge of mathematics.
Tools such as linear algebra come in handy when working with lists of lists, nested dictionaries, and DataFrames.
They make it much easier to see what is going on under the hood and, therefore, to understand it more quickly.
Let me briefly illustrate. Consider a DataFrame with 5 rows and 3 columns (that is, a 5 x 3 matrix).
Think of a DataFrame as a matrix whose columns are column vectors.
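A minimal sketch of the DataFrame-as-matrix idea, assuming pandas is installed. The column names and values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: 5 observations (rows) of 3 variables (columns).
df = pd.DataFrame(
    {
        "price": [2.0, 2.5, 3.0, 3.5, 4.0],
        "quantity": [10, 9, 8, 7, 6],
        "income": [50, 52, 54, 56, 58],
    }
)

print(df.shape)           # (5, 3): 5 rows, 3 columns

# The same data viewed as a plain 5 x 3 NumPy array.
matrix = df.to_numpy()
print(matrix.shape)       # (5, 3)

# Each column is a Series, which plays the role of a column vector of length 5.
print(df["price"].shape)  # (5,)
```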
More often than not, data scientists operate on or extract specific portions of data.
When they index a DataFrame or Series, they can select exactly the section of the data they want to operate on.
A knowledge of matrices helps here: indexing a DataFrame by row and column works much like picking out the entries or a sub-block of a matrix.
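The matrix view of indexing can be sketched as follows, again assuming pandas and using a made-up 5 x 3 DataFrame. `iloc` selects by integer position (zero-based) and `loc` selects by label.

```python
import pandas as pd

# Hypothetical 5 x 3 DataFrame, used only to illustrate indexing.
df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 5],
        "y": [10, 20, 30, 40, 50],
        "z": [100, 200, 300, 400, 500],
    }
)

# Single entry: row 1, column 2 (zero-based), like one element of a matrix.
entry = df.iloc[1, 2]

# Sub-block: rows 0-1 and columns 1-2; the end of an iloc slice is excluded.
block = df.iloc[0:2, 1:3]

# Label-based selection of one whole column vector.
column_y = df.loc[:, "y"]

print(entry)              # 200
print(block.shape)        # (2, 2)
print(column_y.tolist())  # [10, 20, 30, 40, 50]
```

Reading `df.iloc[i, j]` as "row i, column j" is the same habit one builds when working with matrix entries, except that pandas counts from zero.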
These are just a few of the many cases in which a knowledge of mathematics assists the data scientist in her quest to unearth stories buried in the data.
So, while I stress that mathematics is vital for a successful data science career, the aspiring data scientist must keep this in view: mathematics is a tool that assists in our search for truth; it is not an end in itself.
My advice to aspiring data scientists: take some time to learn some mathematics.
It will make the journey less daunting.