Importance of Exhaust Data in Data Science
Rajdeep Biswas
Jun 24
Image Credit: NASA/JPL-Caltech
I was fascinated when I first heard the term Exhaust Data.
There are a lot of definitions floating around on the same topic, and I wanted to dig an inch deeper.
In simple words, exhaust data is data that is generated without a specific purpose in mind and that might not immediately seem important enough for an organization to spend money on curating, storing and using it.
Let us try again with a few examples, and simultaneously we will try to connect the examples with Data Science.
A few examples (we are just touching the tip of the iceberg here!):
Online shopping activity: Let’s say you visit an online shop to buy shoes.
You looked at a few dozen pairs, finally selected one and placed it in the cart.
After an hour you came back and finally bought it.
For the online store, the primary information is which shoes you clicked on, which one you placed in the cart, your shoe size, and the final transaction information.
Now, let’s think of the exhaust data.
That can be your location, the time you spent on other products, the phrases you searched for, how long the shoe sat in the cart, which device you used, security logs, cookies, etc.
Leveraging this exhaust data, you can use Data Science to get several actionable insights, such as product recommendations, fraud detection, more efficient marketing, improved customer experience, price optimization, etc.
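As a rough illustration of the shopping example, here is a minimal Python sketch that turns a raw event log into a few exhaust-data features. The event schema, the field names, the sample values and the helper functions `cart_dwell_minutes` and `exhaust_features` are all assumptions made up for this sketch; a real store's clickstream will look different.

```python
from datetime import datetime

# Hypothetical clickstream events for one shopper. The schema and values
# are invented for illustration and do not come from a real store.
events = [
    {"type": "search", "query": "running shoes", "ts": "2019-06-24T10:00:00"},
    {"type": "view", "product": "shoe-a", "ts": "2019-06-24T10:01:00"},
    {"type": "view", "product": "shoe-b", "ts": "2019-06-24T10:03:30"},
    {"type": "add_to_cart", "product": "shoe-b", "ts": "2019-06-24T10:05:00"},
    {"type": "purchase", "product": "shoe-b", "ts": "2019-06-24T11:05:00"},
]


def cart_dwell_minutes(events, product):
    """Minutes between first adding a product to the cart and buying it."""
    added = bought = None
    for event in events:
        if event.get("product") != product:
            continue
        ts = datetime.fromisoformat(event["ts"])
        if event["type"] == "add_to_cart" and added is None:
            added = ts
        elif event["type"] == "purchase" and added is not None:
            bought = ts
            break
    if added is None or bought is None:
        return None  # never added to the cart, or never purchased
    return (bought - added).total_seconds() / 60


def exhaust_features(events, product):
    """Collect by-product signals that sit outside the transaction itself."""
    return {
        "search_phrases": [e["query"] for e in events if e["type"] == "search"],
        "other_products_viewed": sorted(
            {e["product"] for e in events
             if e["type"] == "view" and e["product"] != product}
        ),
        "cart_dwell_minutes": cart_dwell_minutes(events, product),
    }


features = exhaust_features(events, "shoe-b")
print(features)
```

Features like these could then feed a recommendation or price optimization model; which ones actually help is something you would have to test on your own data.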
Tweets anyone?: Let’s say someone tweets “The first snow is like the first love - Lara Biyuts.”
Just this tweet and the username might not give us a lot of insight.
Enter exhaust data in the form of metadata: things like location, device ID, time posted, the number of followers the user has, the number of users the user follows, the creation date of the account and so much more.
Now, this data can be used by retail stores to stock up on medicine, supplies and winter boots, by a media streaming company to stream holiday movies, by emergency services, by navigation services to alert drivers, and in many more applications.
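To make the tweet example a little more concrete, the sketch below counts how many tweets mention a keyword in each location, which is the kind of signal a retailer or navigation service might watch. The tweet records, their field names and the function `keyword_counts_by_location` are hypothetical and invented for this sketch; they are not the format of any real Twitter API.

```python
from collections import Counter

# Hypothetical tweets with metadata attached. Fields and values are
# assumptions for illustration only.
tweets = [
    {"text": "The first snow is like the first love", "location": "Denver"},
    {"text": "Snow all over the highway this morning", "location": "Denver"},
    {"text": "Is that snow already?", "location": "Boston"},
    {"text": "Beach day!", "location": "Miami"},
    {"text": "Snow day, no location shared", "location": None},
]


def keyword_counts_by_location(tweets, keyword):
    """Count tweets containing the keyword, grouped by location metadata."""
    counts = Counter()
    needle = keyword.lower()
    for tweet in tweets:
        location = tweet.get("location")
        # Tweets without location metadata cannot be placed, so skip them.
        if location and needle in tweet["text"].lower():
            counts[location] += 1
    return counts


snow_counts = keyword_counts_by_location(tweets, "snow")
print(snow_counts.most_common())
```

Note that the text alone says "snow"; it is the location metadata that tells you where to stock the winter boots.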
Video: This is a very interesting space.
Videos naturally come with some metadata like duration, file size, format, creator/director, actors, etc.
However, the scope for data science on this metadata alone is rather limited.
Enter machine learning, and we can automatically generate text from the audio and detect the emotion of a scene (think of emotion in a still image, since a video is just a very fast series of still images), the location of the scene, its historical importance, and adult content.
Naturally, this metadata gets heavily harvested for personalized video segmentation, content moderation, video recommendation, searching and indexing videos based on content, personalized ad creation, adult content filtering, etc.
IoT space: Again, this is a very broad topic and, along similar lines to the other examples, the exhaust data in industrial IoT is probably the largest in volume.
For example, Rio Tinto, one of the world’s largest metal and mining corporations, generates 2.4 terabytes of data per minute, and the volume of seismic data can add up to petabytes.
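To get a feel for that rate, here is a quick back-of-the-envelope calculation in Python. It assumes the 2.4 terabytes per minute is sustained around the clock and uses decimal units (1 PB = 1,000 TB); neither assumption is stated in the source figure, so treat the result as an order-of-magnitude estimate only.

```python
# Assumptions: the quoted rate is sustained all day, and decimal units
# are used (1 PB = 1,000 TB). Neither is confirmed by the source figure.
TB_PER_MINUTE = 2.4          # rate quoted in the text
MINUTES_PER_DAY = 24 * 60
TB_PER_PB = 1_000


def volume_tb(rate_tb_per_min, minutes):
    """Total volume in terabytes produced at a constant rate."""
    return rate_tb_per_min * minutes


per_hour_tb = volume_tb(TB_PER_MINUTE, 60)
per_day_pb = volume_tb(TB_PER_MINUTE, MINUTES_PER_DAY) / TB_PER_PB
print(f"{per_hour_tb:.0f} TB per hour, {per_day_pb:.2f} PB per day")
```

Under those assumptions the rate works out to roughly 144 TB an hour and about 3.5 PB a day, which is why cheap storage and edge processing matter so much in this space.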
I feel there are two game changers here:
1. The cost of storage and compute has decreased at an exponential rate in the last ten years thanks to cloud computing.
2. Edge compute hardware has shrunk in size while its power has increased (FPGAs, for example).
Thanks to these technology advancements, companies can run machine learning models locally at the edge and also keep storing huge volumes of data for historical analysis.
These are, of course, just a few examples of the growing enthusiasm for harvesting exhaust data.
In this thriving Information Society, every industrial sector is leveraging the power of exhaust data to have a competitive edge and excel in customer satisfaction.
In conclusion, while harvesting the exhaust data presents a great opportunity, it does come with the responsibility to protect personal and sensitive information.
An example would be the NSA’s mass collection of phone records.
Two Stanford graduate students found in their research that surveillance metadata can be used to find callers’ information, medical conditions, and financial and legal connections.
With an open, proactive and ethical mindset, I feel that the field of leveraging exhaust data will become a game changer in the near future.