Well, unless it is Thursday, there is a tiny, but yet, real possibility that I will do.
This probability approach was my direct way to answer the question that has been haunting my dreams.
However, I wanted to complicate things, add an extra level of complexity to the problem, and blow my investigation out of proportion.
Thus, like a rational data person I am, I had to try to build a machine learning based predictive system to see if it could learn and predict if I will listen to Potato Salad at a given time and day.
The algorithm I applied to build my predictive model is one known as Support Vector Machine (SVM), a machine learning model that is mostly used to classify things into one of two classes, for example, yes or no, good or bad, spam and no spam.
For this particular problem, I applied a special variant of SVM, called One Class SVM, a technique used for outlier detection.
Now you might be asking, “but hey Juan why did you formulate this problem as an outlier detection one?” Good question.
Take a look at the next image.
This image shows the day of the week and times of the day in which I listen to Potato Salad.
From it, we can observe that my listening moments are usually quite regular in terms of days and times, in other words, there are not many outliers (instances that are not common) in the data (except that time I played the song on a Saturday at midnight).
Because of this, an outlier detection system like the One Class SVM could be used to learn a decision boundary — a “barrier” that divides the outliers from the inliers, a.
my usual Potato times — that knows when I might listen to Ensalada de Papa.
Thus, to predict if I will listen to the song, we need to see if the time and day pair lies inside the decision barrier, which I will show in the next image.
The X-axis shows the date of the week (starting from Monday) as numberThe data points enclosed within the dark red line, formally known as the decision boundary, are the inliers or the ordinary moments in which I play PS, and the points that are not within it, are outliers, or the unusual times in which I played the song, e.
Saturday midnight, Sunday at 4 a.
and Monday at 7 a.
The main takeaway here is that most of my listening times fall into the boundary, implying that my listening times are quite standard and expected.
I will ask again: am I going to listen to Potato Salad today? If the day and time are inside the red line, then maybe, otherwise, “less maybe.
”RecapIn this article, I have shown what happens when a data person is semi-obsessed with a hip-hop song, named Potato Salad.
To prove the said mild level of obsession, I created and analyzed a dataset made of every instance in which I played the song on Spotify, and found out the following facts:During the period of October 10, 2018, to April 13, 2019, I have listened to Potato Salad 104 times.
97% of my music sessions start with PS.
There’s a 100% chance that I will play PS if I open Spotify at 4 a.
I usually play the song around the same time.
One last time: am I going to listen to Potato Salad today? Hopefully, yes.
The real question here is, will you?The code used to produce this experiment is available on my GitHub, at the following link:juandes/potato-saladContribute to juandes/potato-salad development by creating an account on GitHub.
comThanks for reading.
Hope you enjoyed it :)If you have any questions, comments, doubt, want to chat, or tell me whether you like the song (I’m sure you did), leave a comment here or on Twitter and I will be happy to help.
Juan De Dios Santos (@jdiossantos) | TwitterThe latest Tweets from Juan De Dios Santos (@jdiossantos).
Machine Learning/Data Engineer.
Also, Pokemon Master, and…twitter.
comOh, I would appreciate if you share the story on Twitter, or retweet my Tweet with the story.
If I am lucky, Tyler will see this :).