We, at Yuxi Global, have developed a nearly perfect solution for this kind of situation.
As the Head of Data Analytics at Yuxi Global, I was tasked with the construction of a Prezi presentation to aid in explaining what “Data Science”, “Artificial Intelligence”, “Machine Learning” (ML) and “Deep Learning” (DL) mean to medium and upper management folks.
The ultimate goal was to help them guide their strategic as well as commercial decisions about what services and kinds of solutions we want to develop for clients in these areas.
After doing this we thought it would be valuable to share our work with the world, to save every data scientist out there many hours of intense labor with presentation software.
The presentation link is just below this paragraph.
Further below, we also share a script that presents the material in a fully linear fashion.
Of course, you can pick bits and pieces on which to focus the presentation and jump between them.
That was one of the reasons for choosing Prezi as our presentation solution.
Presentation’s Script for “Analytics / Data Science / AI / ML / DL and all that jazz” — Part 1 of 3

Part 1: What is data science / data analytics?

This part consists of two chapters.
The first one focuses on trying to define the somewhat sneaky concept of “Data Science” by yielding not only one but three related definitions.
The idea is that the presenter may choose the definition that best resonates with his own background and understanding of the subject or the one that will be most easily assimilated by his audience.
The second chapter attempts to explain why data science is a trend nowadays when it wasn’t before, which is odd given how natural and fundamental the concept of data science is in light of the definitions.
Chapter 1: Three definitions

Definition 1: Perhaps one of the best ways to understand “data science” is as a process, namely that of using data to understand things. Among the “things” we would like to understand better we can list:

- Your customers
- Your business processes
- How your product is consumed / perceived / liked by your customers
The key words here are “data” and “understanding”.
Data is the main input, understanding is the ultimate goal.
Thus, data science goes far beyond business intelligence, which often stops at measuring and presenting what has happened, without delving into why it happened or how to influence what will happen in the future.
Another simple way to put this is that one lets the “data tell a story”.
After all, humans have always tried to make sense of the world around them by creating narratives that connect the mere facts in logically coherent — if sometimes fantastic — ways, e.g. myths and legends.
Yet another way to put this definition into words is that data science equates to uncovering insights, patterns and trends that are *hidden* behind the data.
Some people like this definition because it is short and mentions two nice buzzwords: “trends” and “insights”.
The meaning of the former is pretty clear to me.
Most people have a very distinct mental picture of what a trend is.
However, insight is a much more abstract notion with a rather unclear meaning.
This is illustrated by the fact that this word doesn’t have a distinct translation into Spanish.
From the translations given you would think that it means “understanding” in some contexts and “knowledge” in others. What I do like about this wording is that it introduces the key notion that data by itself is just the beginning: there are important pieces of the puzzle hidden behind the data that one wants to uncover.
This brings us to a further wording of the same concept, one that I especially like: data science means transforming data into knowledge that can guide decisions.
A wonderful image to bring home the (for some people unexpected) distinction between data and knowledge is the “tomato analogy”. Here it goes:

[Image: the “tomato analogy” — knowledge is knowing that a tomato is a fruit; wisdom is not putting it in a fruit salad.]

The key insight here is that data in itself has practically zero value when it comes to making decisions.
Data is no more than a, sometimes huge, collection of recorded facts about some entities, such as your customers, your sales transactions or your web interactions.
Although it might cost you and your company a lot of money, time and effort to manage this repository of facts, said cost does not translate automatically into value.
Information can be found at the next level, after adding a further layer of value on top of data, in the form of systematization or structure within a rich context.
The business-related aspect of your data only shows up here for the first time. Data becomes information only insofar as it has a relevant meaning within the context of your business.
Now onto knowledge.
The defining characteristic of knowledge is that it gives you a competitive advantage, or as the old apothegm reads “knowledge is power”.
The way you gain such an advantage is by making better decisions than your competitors.
In other words, one doesn’t have (valuable) knowledge about something unless said knowledge can inform your decisions and lead you to more desirable outcomes for your business.
The bit about wisdom is really a joke, in case that isn’t clear.
As someone once said, anything learned with humor is never forgotten.
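As a toy illustration of the data → information → knowledge ladder, consider raw purchase records being aggregated into per-customer metrics and then turned into an actionable decision rule. All names, amounts and the “high value” threshold below are invented for the example:

```python
from collections import defaultdict

# Data: raw recorded facts (customer id, purchase amount in USD).
purchases = [
    ("ana", 120.0), ("ana", 80.0),
    ("ben", 15.0),
    ("eva", 300.0), ("eva", 250.0), ("eva", 95.0),
]

# Information: the same facts, structured within a business context
# (total spend and number of orders per customer).
totals = defaultdict(lambda: {"spend": 0.0, "orders": 0})
for customer, amount in purchases:
    totals[customer]["spend"] += amount
    totals[customer]["orders"] += 1

# Knowledge: an actionable rule that can guide a decision.
# (The $200 "high value" cutoff is an arbitrary assumption.)
high_value = [c for c, t in totals.items() if t["spend"] >= 200.0]

print(high_value)  # customers to target with, say, a loyalty campaign
```

Each step adds a layer of value on top of the previous one: the raw list of facts becomes per-customer structure, and the structure becomes a decision.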
Definition 2: Data science is just science.
This definition emphasizes the fact that science has been around for centuries.
In all that time, the recipe applied — a.k.a. the scientific method — has been one and the same:

- Start by observing some phenomenon and record those observations, i.e. collect data.
- Formulate hypotheses and questions and try to confirm and answer them, based on the data at hand. (Strictly speaking, one can never validate a hypothesis unless it is formulated as the negation of another one. The most one can hope for is to either reject or fail to reject it. However, I advise you to never discuss this subtlety in front of management, as it will most likely lead to confusion and to them erroneously thinking that you are a cheap philosopher.)
- Given enough data and confirmed hypotheses, one can sometimes venture to build a model.
A model is a much more elaborate conceptual / mathematical artifact that further encodes or summarizes your understanding of the process or system in question.
A strong enough model could even be used to generate forward looking predictions.
For some subjects of study, it is possible to design experiments (e.g. A/B tests) to really test the effects of certain conditions on your system in a reproducible way.
This is really the only sensible way to establish cause-and-effect relationships.
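As a minimal sketch of what the analysis of such an experiment might look like, here is a two-proportion z-test comparing conversion rates between two variants of an A/B test; the visitor and conversion counts are invented for illustration:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, p_value) for H0: the two conversion rates are equal."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Variant A: 200 conversions out of 1000 visitors; variant B: 260 / 1000.
z, p = two_proportion_z_test(200, 1000, 260, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")  # a small p-value lets us reject "no difference"
```

Note that, in line with the caveat above, a large p-value would only mean we fail to reject the hypothesis of equal rates, not that we have confirmed it.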
In short, under this definition, data science is no more than the application of the centuries-old (and proven!) scientific method to solve business problems.
Incidentally, this definition helps us see that the “data” bit in “data science” is really redundant, as there has never been any science that doesn’t deal with data!

Definition 3: Data science as an umbrella term.
This is probably the least deep but also the most direct definition.
It reduces data science to a direct sum of some of its tactical components.
Under this definition, data science would be a recent umbrella term that groups up several crafts such as:

- Artificial Intelligence: Some people claim that good data science requires employing at least a pinch of AI. We don’t necessarily agree with this statement. We will have much more to say about this in Part 2 of this series.
- Machine Learning: This will be the subject of Part 3.
- Statistics: really a tool of any scientific endeavor and the only way to turn a “soft” science into a “hard” science.
- ETLs, data cleanup: ETL refers to extraction, transformation and load, the stages previous to data exploration, formulation of hypotheses and construction of models. These are the least glamorous but undoubtedly necessary aspects of data science. However, it has become fashionable to classify these within the realm of “data engineering”.
- Data Visualization: essential for the “storytelling” aspect of data science mentioned in the first definition, as well as for the successful communication of results derived from models or from simple initial explorations of the input data.
- Algorithms and Infrastructure: aimed at efficient manipulation and processing of data. This is the “number crunching” involved in the second stage (after ETL and cleanup) of almost any data science project. Algorithms and infrastructure are really a tool and a means, not an end per se, of data science efforts. Their development and theoretical study really belong in the Computer Science department.
- Efficient data storage and retrieval architectures: quite similar to the previous one. This aspect relates to the technology (both hardware and software) to efficiently acquire, store and retrieve data. Data science necessarily finds support in this field.
- Business knowledge / understanding of the application domain: this is an often overlooked aspect of data science. I personally have experienced many instances of data science projects failing either because the people from the “business side” take for granted that the data scientist completely gets the business dynamics and all of its implicit assumptions and constraints, or because the data scientist himself completely disregards these and focuses only on the data analytics methods.
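To make the ETL stage concrete, here is a tiny, self-contained sketch of the extract-transform-load pattern; the CSV content, column names and SQLite table are all invented for the example:

```python
import csv
import io
import sqlite3

# Extract: read raw records (an in-memory CSV standing in for a source file).
raw_csv = """customer,amount
ana,120.50
ben,
eva,300.00
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: clean up, e.g. drop rows with a missing amount and parse numbers.
clean = [
    {"customer": r["customer"], "amount": float(r["amount"])}
    for r in rows
    if r["amount"]  # skip empty amounts
]

# Load: write the cleaned records into a target store (an in-memory SQLite DB).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (:customer, :amount)", clean)
db.commit()

total = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 420.5
```

In real projects the extract step reads from production systems, the transform step handles far messier data, and the load step targets a warehouse; the three-stage shape, however, stays the same.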
Chapter 2: Why is data science “a thing” nowadays?

Given all that was mentioned in the previous chapter about data science being no more than the natural application of an old method to the field of business in order to make it better and gain an advantage, the obvious natural question is then: why didn’t it happen earlier?

The answer, in our opinion, boils down to two closely related and relatively recent developments:

The advent of the Internet economy and the explosion in mobile apps has caused a deluge of data waiting to be turned into value.
As a result of this there are many companies or so called platforms, whose business consists of shifting bits around.
Data is thus the main asset for many of these companies.
The same applies to many traditional industries that have made the transition to the digital world — think newspapers, TV, hospitality, food delivery, insurance, etc.

[Image caption: 1 exabyte = 1,000,000,000 gigabytes]

The sharp decrease in costs associated with data storage and processing. According to this source and this other source, the cost of a hard drive per GB dropped from around $100 USD in 1997 to $1 USD in 2005 and then again to around $0.03 USD in 2017.
This has fueled trends such as “Big Data” and “distributed computing”.
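As a quick sanity check on those figures, the implied average annual decline in cost per GB can be computed directly (using only the dollar amounts quoted above):

```python
# Cost per GB of hard-drive storage, from the figures quoted above.
cost_1997, cost_2005, cost_2017 = 100.0, 1.0, 0.03

def annual_decline(cost_start, cost_end, years):
    """Average yearly fractional decrease implied by a start/end cost."""
    return 1 - (cost_end / cost_start) ** (1 / years)

d1 = annual_decline(cost_1997, cost_2005, 2005 - 1997)  # 1997 -> 2005
d2 = annual_decline(cost_2005, cost_2017, 2017 - 2005)  # 2005 -> 2017
print(f"~{d1:.0%} per year, then ~{d2:.0%} per year")
```

That is, storage cost fell by a large double-digit percentage every year for two decades straight, which is exactly the kind of sustained drop needed to make hoarding and reprocessing raw data economically viable.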
To a lesser extent, the following factors, which we view as a consequence of the two we just mentioned, have contributed to the popularity of data science and the resurgence of ML, AI, and Deep Learning (DL):

The abundance of open-source tools.
The convenience of sharing source code via public repositories has made open-source software the king of data science and has almost completely supplanted traditional, expensive closed-source solutions such as SAS or IBM SPSS.
The development of a wealth of innovative ML and DL algorithms. This was prompted by the need to process data sets that grow every year and to extract value from non-tabular data (e.g. image or sound files).
On the hardware side, the availability of GPUs that are used to run the heavy computations required by deep learning algorithms.
This was an almost accidental but fortunate side effect of the popularity of gaming. Thanks, gamers!

Conclusion for part 1

In this first part, we presented the general context of data science as seen through three different optics.
We also commented on the environmental changes that directly contributed to the rise of data science in information-centric industries.
Stay tuned for part two in which we shall explain what AI is and how it is really much more than just ML.
References

- “A History of Storage Cost”, Matt Komorowski
- “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast, 2017–2022 Q&A”, Cisco, February 19, 2019