Northstar: The Latest & Greatest in Drag-and-Drop Data Analytics from MIT and Brown University
Researchers from MIT and Brown University have developed a system for interactive data analytics that runs on touchscreens and lets everyone use machine-learning models to make predictions for medical research, sales, and more.
Dip Ranjan Chatterjee, Jul 4
We keep hearing the term ‘data science for everyone’, but what does it truly mean? From corporations to local shop owners, everyone has some problem they want to solve with data. But while corporations might be able to hire data scientists to do the job for them, the local coffee shop owner might not have that luxury.
The goal is to make data science so easy that everyone can use it to make data-driven decisions in their daily lives.
Admittedly, grand problems may still require grand solutions, but solving everyday problems should not be difficult.
It is in removing this difficulty that Northstar takes a significant step.
Northstar’s aim is to democratize data science by making it easy to do complex analytics, quickly and accurately.
What is it?
Northstar did not come out of the blue to stun the world.
It is the result of years of collaboration between researchers at MIT and Brown: an interactive data-science system that runs in the cloud but has an interface that supports any touchscreen device, including smartphones and large interactive whiteboards.
Users feed the system datasets, then manipulate, combine, and extract features on a user-friendly interface, using their fingers or a digital pen, to uncover trends and patterns.
A new component, called VDS (for “virtual data scientist”), was presented at the ACM SIGMOD conference. It instantly generates machine-learning models to run prediction tasks on datasets.
VDS is based on an increasingly popular technique in artificial intelligence called automated machine learning (AutoML), which lets people with limited data-science know-how train AI models to make predictions based on their datasets.
Currently, the tool leads the DARPA D3M Automatic Machine Learning competition, which every six months decides on the best-performing AutoML tool.
“Even a coffee shop owner who doesn’t know data science should be able to predict their sales over the next few weeks to figure out how much coffee to buy. And in companies that have data scientists, there’s a lot of back and forth between data scientists and nonexperts, so we can also bring them into one room to do analytics together,” says co-author and long-time Northstar project lead Tim Kraska.
How does it work?
The use case demoed in the video at the bottom is a must-see.
Load the data — Northstar starts as a blank, white interface.
Users upload datasets into the system and can then explore, connect, and filter them, performing various types of exploratory data analysis (EDA) with visualization capabilities somewhat similar to Power BI or Tableau.
“It’s like a big, unbounded canvas where you can lay out how you want everything. Then you can link things together to create more complex questions about your data,” says Zgraggen, who is the key inventor of Northstar’s interactive interface.
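As a rough illustration of what linking can mean, the Python sketch below treats each view on the canvas as a filter over records and combines linked views with a logical AND. It is not Northstar’s implementation; the `link` helper, the field names, and the sales records are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Filter = Callable[[Record], bool]


def link(*filters: Filter) -> Filter:
    """Combine filters so a record passes only if every linked filter accepts it."""
    return lambda record: all(f(record) for f in filters)


# Hypothetical coffee-shop sales records.
sales: List[Record] = [
    {"day": "Mon", "drink": "latte", "cups": 40},
    {"day": "Mon", "drink": "espresso", "cups": 25},
    {"day": "Sat", "drink": "latte", "cups": 90},
    {"day": "Sat", "drink": "espresso", "cups": 30},
]

# Two independent views on the canvas, each asking a simple question.
is_weekend: Filter = lambda r: r["day"] in {"Sat", "Sun"}
is_latte: Filter = lambda r: r["drink"] == "latte"

# Linking them asks a more complex question: how many lattes sell on weekends?
weekend_latte = link(is_weekend, is_latte)
weekend_lattes = [r for r in sales if weekend_latte(r)]
total_cups = sum(r["cups"] for r in weekend_lattes)
print(total_cups)  # 90
```

Each added link narrows the question further, which is why chaining a few simple views can answer questions that no single view could.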
Approximating AutoML — With VDS, users can now also run predictive analytics on that data by getting models custom-fit to their tasks, such as data prediction, image classification, or analyzing complex graph structures.
The system automatically finds the best-performing machine-learning pipelines, presented as tabs with constantly updated accuracy percentages.
Users can stop the process at any time, refine the search, and examine each model’s error rates, structure, computations, and other details.
“Together with my co-authors, I spent two years designing VDS to mimic how a data scientist thinks,” Shang says.
The system instantly identifies which models and pre-processing steps it should or shouldn’t run on certain tasks, based on various encoded rules.
It first chooses from a large list of possible machine-learning pipelines and runs simulations on a sample of the data.
In doing so, it remembers results and refines its selection.
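The search described above can be approximated in a few dozen lines. The sketch below is a hypothetical illustration, not VDS’s actual code: it assumes a toy catalogue of four pipelines, a single encoded rule (skip pipelines that do not suit the task type), scoring by mean absolute error on a 200-row random sample, and a dictionary that remembers each pipeline’s score so it is never recomputed. All names (`PIPELINES`, `allowed`, `SampleSearch`) are invented for the example.

```python
import random
from collections import Counter
from statistics import mean
from typing import Callable, Dict, List, Set, Tuple

Point = Tuple[float, float]
Model = Callable[[float], float]


def fit_mean(train: List[Point]) -> Model:
    # Baseline: always predict the average target seen in training.
    avg = mean(y for _, y in train)
    return lambda x: avg


def fit_linear(train: List[Point]) -> Model:
    # Ordinary least squares for y = a * x + b, solved in closed form.
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    return lambda x: a * x + (my - a * mx)


def fit_nearest(train: List[Point]) -> Model:
    # 1-nearest-neighbour: copy the target of the closest training point.
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def fit_majority(train: List[Point]) -> Model:
    # Classification baseline: always predict the most frequent label.
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


# Hypothetical pipeline catalogue: name -> (task types it suits, fit function).
PIPELINES: Dict[str, Tuple[Set[str], Callable[[List[Point]], Model]]] = {
    "mean_baseline": ({"regression"}, fit_mean),
    "linear_regression": ({"regression"}, fit_linear),
    "nearest_neighbor": ({"regression", "classification"}, fit_nearest),
    "majority_class": ({"classification"}, fit_majority),
}


def allowed(task: str) -> List[str]:
    # Encoded rule (hypothetical): skip pipelines that do not suit the task type.
    return [name for name, (tasks, _) in PIPELINES.items() if task in tasks]


class SampleSearch:
    """Score candidate pipelines on a random sample and remember every result."""

    def __init__(self, data: List[Point], sample_size: int, seed: int = 0) -> None:
        sample = random.Random(seed).sample(data, sample_size)
        cut = int(0.75 * sample_size)
        self.train, self.test = sample[:cut], sample[cut:]
        self.scores: Dict[str, float] = {}  # remembered results, keyed by pipeline
        self.evaluations = 0

    def score(self, name: str) -> float:
        # Mean absolute error on the held-out part of the sample (lower is better).
        if name not in self.scores:
            self.evaluations += 1
            model = PIPELINES[name][1](self.train)
            self.scores[name] = mean(abs(model(x) - y) for x, y in self.test)
        return self.scores[name]

    def best(self, task: str) -> str:
        return min(allowed(task), key=self.score)


# Hypothetical data: y = 2x + 1 with a little noise.
rng = random.Random(42)
data: List[Point] = []
for _ in range(10_000):
    x = rng.uniform(0, 100)
    data.append((x, 2 * x + 1 + rng.uniform(-0.1, 0.1)))

search = SampleSearch(data, sample_size=200)
print(search.best("regression"))  # linear_regression
search.best("regression")  # second call reuses the remembered scores
print(search.evaluations)  # 3
```

The rule keeps `majority_class` from ever being fitted on a regression task, and the cache means repeated queries cost nothing. A real AutoML system would search a far larger space of pre-processing steps and models.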
After delivering fast approximate results, the system continues to refine them in the back end, but the final numbers are usually very close to the first approximation.
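The approximate-first, refine-later pattern is similar in spirit to progressive sampling. The article does not say how VDS refines its numbers, so the sketch below simply assumes that technique: it evaluates a fixed, hypothetical candidate model on ever-larger random samples of the data and yields each error estimate as soon as it is available. The `progressive_error` and `candidate` names and the data are invented for illustration.

```python
import random
from statistics import mean
from typing import Callable, Iterator, List, Tuple

Point = Tuple[float, float]


def progressive_error(
    model: Callable[[float], float],
    data: List[Point],
    sizes: List[int],
    seed: int = 0,
) -> Iterator[Tuple[int, float]]:
    """Yield (rows_used, mean_absolute_error) on ever-larger random samples.

    The first estimate arrives after touching only a few rows; each later one
    refines it. The caller may stop iterating as soon as it has seen enough.
    """
    shuffled = data[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    for n in sizes:
        rows = shuffled[: min(n, len(shuffled))]
        yield len(rows), mean(abs(model(x) - y) for x, y in rows)


# Hypothetical data: y = 2x + 1 plus uniform noise in [-1, 1].
rng = random.Random(7)
data: List[Point] = []
for _ in range(50_000):
    x = rng.uniform(0, 100)
    data.append((x, 2 * x + 1 + rng.uniform(-1, 1)))


def candidate(x: float) -> float:
    # Hypothetical model chosen earlier by the search.
    return 2 * x + 1


estimates = list(progressive_error(candidate, data, [100, 1_000, 50_000]))
for rows, err in estimates:
    print(f"{rows:>6} rows: MAE ~ {err:.3f}")
```

Because `progressive_error` is a generator, a caller that breaks out of the loop after the first estimate never pays for the full pass, which mirrors the ability to stop the process at any time. With this noise the true error is 0.5, and even the 100-row estimate typically lands close to it.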
“For using a predictor, you don’t want to wait four hours to get your first results back.
You want to already see what’s going on and, if you detect a mistake, you can immediately correct it.
That’s normally not possible in any other system,” Kraska says.
The researchers evaluated the tool on 300 real-world datasets.
Compared with other state-of-the-art AutoML systems, VDS’s approximations were as accurate but were generated within seconds, much faster than other tools, which take minutes to hours.
What Next?
The researchers are looking to add a feature that alerts users to potential data bias, outliers, or errors.
New users might not recognize that such issues exist in their data, so their analytics could be way off.
“If you’re a new user, you may get results and think they’re great, but we can warn people that there, in fact, may be some outliers in the dataset that may indicate a problem,” Kraska says.
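The article does not say how the planned warning would work. One common, simple choice is Tukey’s fences: flag any value more than 1.5 interquartile ranges below the first quartile or above the third. The sketch below assumes that rule and uses Python’s `statistics.quantiles`; the `outlier_warning` function and the sales figures are hypothetical.

```python
from statistics import quantiles
from typing import List


def outlier_warning(values: List[float], k: float = 1.5) -> List[float]:
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 4:
        return []  # too few points to estimate quartiles meaningfully
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


daily_cups = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]  # hypothetical sales log
suspicious = outlier_warning(daily_cups)
if suspicious:
    print(f"Warning: {len(suspicious)} possible outlier(s): {suspicious}")
    # Warning: 1 possible outlier(s): [95]
```

A rule like this only says a value is unusual, not that it is wrong; the 95 might be a typo or a genuine festival day, which is exactly the kind of question a warning should prompt a new user to ask.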
Northstar in all its glory
References
Official MIT News article on Northstar
Vimeo