Why those weird requirements?
In reality, the difference between BI and Data Science is so fundamental that it makes everything different: expectations, project methodologies, people involved, etc.
But one has to take a different perspective to see it.
The difference is in the type of questions that they address: BI provides new values of previously known things using some formula that is available.
Data Science works with the unknown (see the first part of this series), answering data questions that nobody has answered before and, therefore, without a formula in hand.
In BI, the business comes to a BI developer with a formula or a method for calculating a report or a KPI, which the business owns.
This means that the business has designed the BI method; it is what they understand and are comfortable using.
In Data Science it is quite different: the business comes with its actual data and a question that has never been answered before.
It is now up to a Data Scientist to test multiple approaches and select the best one, balancing accuracy, simplicity, usability and the capabilities of the production platform.
Once the model is selected and agreed with the business, it becomes a known method for answering the question, and therefore a subject of Data Analytics rather than Data Science.
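To make the contrast concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the monthly numbers are invented, "conversion rate" stands in for any KPI whose formula the business owns, and the two candidate models are just the simplest ones that could be tried. The BI half applies a given formula to new data; the Data Science half has no formula, so it tries candidates and keeps the one with the lowest error on held-out data.

```python
from statistics import mean

# Hypothetical monthly data, invented purely for illustration.
visits = [1200, 1350, 1500, 1650, 1800, 1950, 2100, 2250]
orders = [60, 70, 74, 85, 88, 99, 104, 115]


# BI: a known unknown. The formula is given; only the value is new.
def conversion_rate(orders_count, visits_count):
    return orders_count / visits_count


latest_kpi = conversion_rate(orders[-1], visits[-1])


# Data Science: an unknown unknown. No formula is given, so we try
# several candidate models and compare them on data they have not seen.
def fit_constant(xs, ys):
    """Candidate 1: always predict the average of the training targets."""
    average = mean(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares straight line."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def mean_abs_error(model, xs, ys):
    return mean([abs(model(x) - y) for x, y in zip(xs, ys)])


# Hold out the last two months to judge each candidate.
train_x, test_x = visits[:6], visits[6:]
train_y, test_y = orders[:6], orders[6:]

candidates = {"constant": fit_constant, "linear": fit_line}
scores = {
    name: mean_abs_error(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)

print(f"BI: latest conversion rate = {latest_kpi:.4f}")
print(f"Data Science: best candidate so far = {best}, scores = {scores}")
```

In a real project the list of candidates is not known in advance and accuracy is only one of the criteria, which is exactly why the outcome cannot be planned the way a KPI calculation can.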
You may notice that the above statement about BI is debatable: BI does not deal with completely known things either.
It may have a formula or a method, but it calculates unknown KPI values or even makes predictions using an approved methodology.
To explain this duality, I’m using a nice concept of known unknowns and unknown unknowns, which was popularised by US Secretary of Defence Donald Rumsfeld back in 2002 in his famous answer about the lack of evidence linking the government of Iraq with the supply of WMD to terrorists.
Using this concept, I can now formulate the difference much more briefly:
BI deals with known unknowns, while Data Science deals with unknown unknowns.
This is the biggest and most fundamental difference between them.
At first, it may seem pure formalism, focusing on a difference that is not that significant, but that impression will change once you start thinking about the consequences.
Firstly, when dealing with unknown unknowns at the very beginning of a project, Data Science cannot guarantee success, nor can it predict what the solution will look like or how difficult it will be to implement.
This means it is much more difficult to build a business case and plan a project.
If you continue to unravel these implications, you may come up with a whole list of differences in expectations, project methodologies, tools and people involved.
All of these differences actually arise from that one difference, which seemed insignificant at first.
Indeed, when the solution is unknown from the start, a Data Scientist will use a trial-and-error approach.
In these circumstances, it is wise to use tools and methods that allow a quick turnaround of ideas, so that each new trial does not require too much time to prepare: new data should be readily available if needed, new software and libraries implementing the next method to try should be easy to download and install, infrastructure must be ready to support new software or frameworks, and so on.
That is why a typical Data Scientist’s toolbox and practices are built for flexibility and agility.
Programming languages, open source libraries, microservices, containers, APIs, and extreme agile all help Data Scientists to skim through ideas and find solutions quickly.
All these tools, however, work best in an open, agile and fluid environment, where their use is not limited by external factors.
This type of environment can usually be found in tech companies and start-ups.
A typical corporate environment is very different — it is built for control and reliability, which are delivered through strict process rules, shared responsibilities, multilevel decision making, etc.
This means that IT systems of most big non-tech companies are very regulated and slow to implement changes.
Any Data Science project in these companies will face multiple distracting hurdles: a lengthy process of getting data out of business systems for analysis, an inability to operationalise the solution because the current IT architecture cannot support containers and microservices, etc.
People maintaining corporate systems have very different priorities and mindsets too.
They will not necessarily be excited about making changes to their systems or adding new ones; they might be worried about security compliance when signing off access to company data, and so on.
Business end users might not be very excited about introducing AI or Data Science to their roles either.
They might be worried about their lack of knowledge in a new field, or about the security of their jobs because of a false perception of the AI threat that they got from the media.
Nevertheless, all those people are critical for any Data Science project.
Project leaders and sponsors have to consider those issues to avoid a “clash of cultures” that can ruin any project.
Project leaders have to provide strong support to Data Scientists, who would otherwise find themselves outnumbered and disadvantaged in a hostile, over-regulated environment.
They also have to communicate why a Data Science project is needed for the company and reassure people that it will not affect their job security.
Project sponsors will also need to be capable of pushing exceptional requests through all layers of corporate structure quickly if needed.
In contrast, Business Intelligence is already a well-established part of a typical corporate landscape, and a BI project is mostly free from those issues by definition.
BI functionality is usually provided by one or a very few platforms that are already incorporated into the IT architecture and processes.
Business users are largely familiar and confident with it.
BI projects will deal with known unknowns, which means there is a method of finding those unknowns and therefore the project can be well planned in advance.
There is not much trial and error in BI.
On top of that, a company would usually have a good track record of successful BI projects and good project expertise available.
Summary
From a business perspective, both Data Science and Business Intelligence play the same role in a business process: they both provide fact-based insights to support business decisions.
But they are fundamentally different from another perspective, which makes everything different: expectations, methods, tools used, etc.
The difference is in the type of questions these two address: BI works with known unknowns, where a known formula is used to calculate a new value of a known KPI, while Data Science works with unknown unknowns, answering data questions that no one has answered before.
This little difference in definition means a lot.
Without a formula or a method given, Data Scientists use a trial and error approach.
In these circumstances, generally speaking, Data Science cannot guarantee success before a project begins; it cannot predict how many steps will be needed to find a solution or what that solution will look like.
In order to find solutions as quickly as possible, Data Science employs tools and methods optimised for speed: programming languages, libraries, Docker containers, microservices architecture, etc.
It is very different from a typical corporate environment where IT systems are built for control and reliability.
This difference alone creates many difficulties for the first Data Science projects in any established company.
But that’s not all!
There is another problem lurking around the corner: the use of Machine Learning.
Introducing ML into a business environment can be a big cultural shock for business analysts, whose life is designing and maintaining business rules.
For solutions that use Machine Learning, rules as they know them are no longer required!
Who would like a change like that?
In the next short part I will touch on the tectonic shift that Machine Learning brings to an established corporate culture.
You can find me on LinkedIn and Twitter.