We can use classical scientific concepts such as inductive reasoning, but complex systems are too large for us, as data scientists, to comprehend.
The pieces are interacting, but we’re only able to understand small pieces.
Machines can pick up the bigger picture.
Inductive systems don’t lend themselves to these broad data sets, whereas deductive reasoning can help us derive conclusions that we may otherwise miss.
As an example, Cevora looks at a data set of high-value and low-value houses.
In traditional machine learning, he would be attempting to predict which houses would be of high value.
Instead, he’s looking at why certain houses are more valuable than others.
This type of white box machine learning may help reveal patterns that wouldn’t be available otherwise.
Methods For Understanding Data
There are several tools for increasing the explainability of the data.
Non-Topological
Dimensionality reduction is a crucial part of understanding the data because our minds cannot visualize anything beyond three dimensions.
Data with four or more dimensions is something we cannot quite comprehend, so instead, we set aside the extrinsic space and look for patterns in the intrinsic space.
Think of all the weather readings collected from multiple weather stations versus the underlying reasons why the data looks the way it does.
Reducing the information to that simpler description is sometimes called manifold learning.
For these weather patterns, Cevora outlines a few ways to reduce dimensionality in a non-topological sense.
PCA doesn’t work because there are no real linear patterns with the weather.
t-SNE could be a better method, but it assumes that distances are t-distributed.
Isometry could also break for something like weather because weather is highly non-linear.
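To make the PCA limitation concrete, here is a minimal sketch in plain Python. It is not taken from Cevora's talk, and it uses toy data rather than real weather readings: points on a circle, which a single number (the angle) fully describes, and points on a straight line for comparison. The function name `pca_explained_variance_2d` is hypothetical.

```python
import math

def pca_explained_variance_2d(points):
    """Return the share of variance captured by each principal component of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = [half_trace + spread, half_trace - spread]
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

# Toy data (an assumption for illustration): the angle t alone determines each point.
angles = [2 * math.pi * i / 200 for i in range(200)]
circle = [(math.cos(t), math.sin(t)) for t in angles]
# A straight line for comparison, where a linear method should do well.
line = [(t, 2 * t + 1) for t in angles]

circle_ratio = pca_explained_variance_2d(circle)
line_ratio = pca_explained_variance_2d(line)
print("circle:", [round(r, 3) for r in circle_ratio])
print("line:  ", [round(r, 3) for r in line_ratio])
```

On the line, the first principal component captures essentially all of the variance, so one linear axis is enough. On the circle, the variance splits roughly evenly between the two components, so PCA cannot drop a dimension even though the data is intrinsically one-dimensional.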
Topological Data Analysis
Illumr would use topological methods here instead.
Non-linear data can still look linear on a very small scale, giving you better insight into data that doesn’t follow a linear pattern.
Measurements are taken in arbitrary units, and there is no straightforward relationship between those small distances and what’s going on.
TDA is good at preserving local features, which could illuminate new insights into data.
It structures data in a useful way and can help clarify what the fundamental drivers are with each group.
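As a rough illustration of trusting only local distances, the sketch below links points that lie within a chosen scale of each other and counts the resulting groups. This is not Illumr's method, only a toy version of one idea used in topological data analysis: watching how connected groups appear and merge as the scale grows. The two-ring data set and the names `count_components` and `ring` are assumptions made for this example.

```python
import math

def count_components(points, scale):
    """Count connected groups when points within `scale` of each other are linked."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= scale:
                # Merge the two groups.
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})

def ring(radius, n=40):
    """Return n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

# Toy data (an assumption): two concentric rings that no straight line can separate.
points = ring(1.0) + ring(3.0)
for scale in (0.1, 0.6, 2.5):
    print(scale, count_components(points, scale))
```

At a very small scale every point stands alone, at a moderate scale the two rings emerge as separate groups, and at a large scale everything merges into one. The groups that persist across a range of scales are the ones worth examining for their underlying drivers.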
Wrapping Up
You have to understand what your machine is doing in many cases because of things like justifiability, human readability, and reducing discrimination.
It’s vital to view your data sets as a clue to understanding the fundamental drivers for what’s causing the prediction or pattern.
While it may not be useful all the time, it’s more and more necessary as our algorithms get smarter.
We need to retain the human check on data analysis and understand, not just predict.
This video was taken at ODSC London 2018.