Data, Algorithms, and Humans

Making sense of the Information Age

Dawson Eliasen · May 25

Photo by Amanda Dalbjörn on Unsplash

Data aren’t anything more than information that is stored on a computer.
There’s nothing particularly interesting about data besides the fact that we have a lot of it.
“Data are dumb.” — Judea Pearl

Many corporations and media outlets would have you believe that the combination of data and algorithms begets AI, or even that algorithms themselves are AI in some cases.
This is not the case.
Machine learning pioneer and philosopher Judea Pearl put it best: “Data are dumb.” I think he would agree that algorithms, too, are dumb.
The word “algorithm” has a futuristic buzz to it.
It certainly sounds like an intelligent thing to folks who don’t spend any time studying algorithms.
This is not helped by the way we use the word — we say “the algorithm does…,” “the algorithm found…,” “the algorithm predicted…” We personify algorithms.
It’s only natural.
But it makes algorithms seem like intelligent things, and this is a problem for conversations about technology.
In fact, algorithms are quite dumb.
The word “algorithm” could be defined as the specific way in which a programmer implements a function.
As such, the idea of an algorithm is not exactly important to anyone who isn’t a programmer.
But people who aren’t programmers are still members of conversations about technology, privacy, and big data.
And the word “algorithm” is still used.
So it’s important that we make it clear exactly what an algorithm is, lest it be confused with artificial intelligence.
The best way to understand algorithms is to distinguish them from functions.
A function is simply something that has a defined set of inputs and a desired output. Functions differ from algorithms in that a given function could be implemented by any of several algorithms, while an algorithm cannot necessarily be generalized to different functions.
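The distinction can be made concrete with a small sketch (my own illustration, not the author’s): sorting is a function — list in, ordered list out — and it can be realized by entirely different algorithms.

```python
# A "function" specifies what we want: an ordered list from an input list.
# Different algorithms can implement the same function in different ways.

def insertion_sort(items):
    """One algorithm: build the result by inserting each item in place."""
    result = []
    for item in items:
        i = 0
        while i < len(result) and result[i] < item:
            i += 1
        result.insert(i, item)
    return result

def merge_sort(items):
    """A different algorithm for the same function: divide and conquer."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged = []
    while left and right:
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return merged + left + right

# Both algorithms realize the same function: identical inputs, identical outputs.
data = [5, 2, 9, 1]
assert insertion_sort(data) == merge_sort(data) == [1, 2, 5, 9]
```

The function (sorting) is indifferent to which algorithm you pick; but neither algorithm, pointed at a different function, is any use at all.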
The practical upshot of this distinction is simple: a chess-playing computer is not going to beat you at checkers.
That’s because it employs an algorithm, and algorithms cannot necessarily be generalized to different functions.
In other words, the chess-playing computer is not intelligent.
It’s not AI.
This is true even when the algorithm employed is a machine learning algorithm (an algorithm combined with data).
Now, many would insist that I am simply making the distinction between strong or full AI and weak or narrow AI, and that machine learning algorithms are indeed AI — just a narrow version.
But this jargon misleads those unfamiliar with the study of algorithms about what machine learning algorithms are actually capable of.
Besides, so-called narrow artificial intelligence does not exhibit the typical characteristics of what we would naturally understand as “intelligence,” such as self-awareness.
Data are dumb, algorithms are dumb, and two dummies don’t make an artificial intelligence.
This is still true when the data are big and the algorithms are fancy.
Deep learning algorithms are quite powerful, but they’re still algorithms, and they’re still dumb.
Fortunately, (some) humans aren’t dumb.
This is why so many impressive things have come of big data and machine learning — they’re accomplished by an intelligent human, not a dumb algorithm.
Photo by Nik MacMillan on Unsplash

Data, and by extension conclusions obtained from data, are perceived as objective facts.
But it’s all too easy to arrive at false conclusions, even when starting from good data.
Human brains are extremely effective pattern recognition machines — and they’re effective even when there are no true patterns.
Many of us are familiar with the sentiment “correlation is not causation.” It’s true.
In fact, it’s quite difficult to establish causal relationships from data alone.
There are confounders, illusions, and paradoxes that get in the way of definitive conclusions.
Navigating these obstacles is confusing and sometimes impossible.
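A confounder is the classic obstacle, and it is easy to manufacture one. The sketch below (my example, with hypothetical variable names) produces two quantities that are strongly correlated only because both depend on a hidden third factor — neither causes the other.

```python
# Two variables driven by a hidden confounder (e.g., temperature driving
# both ice cream sales and drownings) end up strongly correlated.
import numpy as np

rng = np.random.default_rng(0)
confounder = rng.normal(size=10_000)                       # the hidden cause
ice_cream = confounder + 0.5 * rng.normal(size=10_000)     # depends on confounder
drownings = confounder + 0.5 * rng.normal(size=10_000)     # also depends on it

r = np.corrcoef(ice_cream, drownings)[0, 1]
print(f"correlation: {r:.2f}")  # strong correlation, zero causation
```

No algorithm inspecting only `ice_cream` and `drownings` can tell this apart from a genuine causal link; recognizing the confounder takes a human who understands the domain.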
This is where the human comes in.
The value of big data and machine learning can only be realized at the hands of an intelligent human who can navigate the road to conclusions.
And being intelligent isn’t enough — the human must understand the domain she is working in, the data she is working with, and the other humans that she is working for.
The human must understand the domain because she will have to make assumptions in order to arrive at a conclusion from data.
It takes an intelligent human keen to the context of the problem or question to know which assumptions are okay and which assumptions are dangerous.
The human must understand the data because they are the raw material from which the conclusions are sculpted.
Machine learning can’t add any new material to the sculpture.
Statistics could be defined as the reduction of data, and machine learning is no different — it reduces data to a model or relationship.
Big data is just big reduction.
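The “reduction” framing is easy to see in code. In this sketch (my illustration, with made-up numbers), a fit compresses a thousand observations into just two parameters:

```python
# Statistics as data reduction: 1,000 noisy points -> 2 numbers.
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=1_000)
y = 3.0 * x + 7.0 + rng.normal(size=1_000)   # noisy linear relationship

slope, intercept = np.polyfit(x, y, deg=1)   # reduce the data to a model
print(f"y is approximately {slope:.1f}x + {intercept:.1f}")
```

Everything not captured by those two parameters is thrown away — which is exactly why flaws in the raw material survive into the model.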
If the raw material is incomplete, insufficient, or biased, the conclusions will be as well.
“Garbage in, garbage out,” as they say.
The human must understand other humans because, at some point, the results of her work will be consumed by other humans.
She reduces the data to a value, relationship, or model so that it can be internalized by her boss, some other stakeholder, or the public as an insight.
The consumer of the insight may or may not be aware of all the assumptions made on both ends of the research, which means the insight may be misunderstood or mistaken for objective truth.
The consumer almost certainly does not understand the data and algorithms that went into producing the insight.
Therefore, our scientist must be transparent about the data, algorithms, and assumptions employed and meet the consumer halfway to support an equitable understanding.
To make sense of the Information Age, we must understand the limits of data and algorithms and the role of humans.
First, AI does not exist in the present day.
There are no intelligent machines.
There are only big data and complex algorithms, both of which are dumb.
The forefront of technology is still limited by the intelligence of the human that is wielding it.
Anytime you see the term “AI” being used, it’s nothing more than marketing.
There’s no reason to fear, antagonize, or worship data or algorithms just because they seem intelligent.
The incredible advancements in big data and deep learning are incredible because of incredibly smart people.
The other important consequence of these distinctions is that when the human wielding the data and algorithms is dumb — or simply doesn’t understand which assumptions are safe and which are dangerous — his conclusions will almost certainly be misleading.
It is our duty, then, to be prudent, transparent, and empathetic when conducting data-driven research in order to produce equitable knowledge.