The massive proliferation of data has created an important opportunity for business insight and data-informed decision-making, but many companies struggle to extract value from all that data.
“While data is the new gold, and we are currently experiencing a data-driven ‘gold rush,’ enterprises are still struggling to manage data from multiple sources,” said Franz Faerber, Executive VP, Products & Innovation, at SAP, in an article entitled, Tackling Enterprise Big Data Challenges with SAP Data Hub.
The volume of relevant data for enterprises is growing exponentially, and IDC’s Data Age study projects that it will reach 158 zettabytes (ZB) by 2025.
In a recent interview, Ken Tsai, Global VP at SAP, and Amit Satoor, Senior Director, Product & Solution Marketing at SAP, talked with DATAVERSITY® about digital business trends and transformations related to this exponential growth.
Hyperconnectivity between products and devices is on the rise, and Gartner has predicted that by the year 2022, 85 percent of the data the enterprise needs will not be within the four walls of the business, which means that the data is not directly under the business’s control, said Tsai.
A growing number of applications are cloud-based, and sensor data, Internet of Things (IoT), and other technologies are also using the cloud, but that doesn’t translate into easy connectivity across them.
“Each one of the cloud applications itself is a data silo,” Tsai said, and companies are struggling to responsibly and efficiently manage so many disparate data stores.
Adding to the challenge is the entanglement of managing Big Data in a technologically diverse landscape.
In terms of processing data, Machine Learning and AI have further increased the complexity of existing data refinement, enrichment, and governance processes: “Each one of them has different tools and techniques, and each one of them is isolated without proper orchestration,” he said.
The variety of data types is also a trend, further adding to the complexity.
Unstructured data from social media and the IoT provide opportunities for enterprises to find innovative ways to reach and understand their customers, yet according to SAP’s Data 2020: State of Big Data study, 86 percent of enterprises claim that they are not getting the most out of their data, and 74 percent say their data landscape is so complex that it limits agility.
Data Quality Concerns at Scale

Companies also need good-quality data for the Machine Learning projects that are driving advancement in fraud prevention, predictive maintenance, and supply chain optimization, but according to Faerber, it’s difficult to transform data from distributed locations within the data ecosystem.
He cited Forbes’ findings published in The Data Differentiator, showing that 84 percent of CEOs are concerned about the quality of the data on which they base their decisions.
That same study estimated that the average financial impact of poor Data Quality on organizations per year is $9.
“Additionally, when customers process data disparately, too often customers have redundant data sourcing, refining, and processing that consume precious resources which could be repurposed for something else,” he said.
New Technologies Hampered by Quality & Automation Issues

The promise of new technologies is hampered by Data Quality issues across data silos, remarked Satoor.
One of the biggest challenges is the complexity of preparing, cleansing, and delivering quality data for Machine Learning and IoT use cases, he said, along with the lack of simple, production-ready tools to automate end-to-end processing of all data, including IoT, social, and raw unstructured data.
Satoor said we need automated data preparation and computations to get the most out of Machine Learning projects in large-scale production systems.
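As a generic illustration of the kind of automated preparation Satoor describes (this is not SAP Data Hub code, and the record fields are hypothetical), a minimal cleansing step might deduplicate records by key and fill missing values with defaults before they reach a Machine Learning model:

```python
# Minimal, generic data-preparation sketch (not SAP-specific).
# Two common cleansing steps before ML training: deduplicate records
# by a key, and fill missing fields from a defaults mapping.

def prepare(records, key="id", defaults=None):
    """Deduplicate by `key` (last record wins) and fill missing fields."""
    defaults = defaults or {}
    deduped = {}
    for rec in records:
        deduped[rec[key]] = rec  # later records overwrite earlier duplicates
    cleaned = []
    for rec in deduped.values():
        # Keep only non-None values from the record, backfill from defaults.
        filled = {**defaults, **{k: v for k, v in rec.items() if v is not None}}
        cleaned.append(filled)
    return cleaned

raw = [
    {"id": 1, "amount": 10.0},
    {"id": 1, "amount": 12.5},   # duplicate id: keep the latest
    {"id": 2, "amount": None},   # missing value: fill from defaults
]
print(prepare(raw, defaults={"amount": 0.0}))
```

In production this logic would typically run inside a managed, monitored pipeline rather than an ad hoc script, which is exactly the automation gap Satoor points to.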
Regulations Require Greater Accountability

Tsai commented that “very, very stringent data protection and governance requirements” are causing concern, not just due to the EU’s GDPR, but even in his home state of California, he is seeing regulations that are just as robust.
Satoor said that public datastores, which include personal, sensitive information, photographs, and other unstructured data have become more important recently with that increased regulation.
Uncontrolled data sourcing and consumption, as well as insufficient security and governance across a distributed data landscape make it difficult for companies to monitor and control data usage, said Satoor.
Tsai agreed: “You’re dealing with data silos on every level, but you are ultimately responsible for that protection and privacy,” no matter how difficult the process is for doing so.
A Better Way: SAP Envisions a Solution

“You can look at database integration problems separately. You can look at a computing hub separately, at cloud architecture separately, orchestration of data separately, governance of data usage separately, or ask yourself if there is an opportunity to design a solution,” said Tsai.
But, enterprises want a solution that can manage every one of these disparate elements from a common dashboard.
According to Satoor, “in looking at all these Big Data trends, we realized that we needed one sophisticated, scalable, automated tool.” That’s when SAP’s Data Hub data orchestration solution was created – to provide one anchoring solution that solves a multitude of problems.
SAP Data Hub was developed to bridge raw data, Big Data stores, and enterprise data; to provide data discovery and Data Governance, data pipelines and orchestrations, as well as data ingestion and onboarding, “So you can actually make data-driven decisions,” said Tsai.
In addition to traditional data warehouses, the solution now covers data lakes, data marts, operational databases, and cloud data stores, because a lot of new data now falls into these categories, said Satoor.
Connections can be through SAP’s HANA or other Data Management tools, so that customers can, “Orchestrate data flow from one point to the other, leaving the data where it resides, and then pushing the compute directly to that using our engine or the engine of your choice,” remarked Tsai.
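The “push the compute to the data” idea Tsai describes is not specific to any one product. As a hedged sketch of the general technique (using an in-memory SQLite database purely as a stand-in for a remote store, with a made-up `sales` table), the difference is whether the filter and aggregation run inside the source engine or after every row has been extracted:

```python
import sqlite3

# Stand-in for a remote data store (SQLite purely for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 100.0), ("EMEA", 50.0), ("APJ", 75.0)])

# Pull-based: extract every row, then compute locally (moves all the data).
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
local_total = sum(amount for region, amount in rows if region == "EMEA")

# Push-down: the filter and aggregation run inside the source engine,
# so only the one-row result travels back over the wire.
pushed_total, = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'EMEA'").fetchone()

assert local_total == pushed_total == 150.0
```

Both paths compute the same total; the push-down version leaves the data where it resides and moves only the result, which is the property that matters at zettabyte scale.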
Although there are individual tools on the market that provide one or more of these services, he said, SAP Data Hub is unique: “It’s really streamlined the IT and the data wrangling and data engineering process,” said Tsai.
Four Key Advantages

Satoor described the key attributes of SAP Data Hub:

- Data Landscape Management: Discovery and processing of disparate types of data in a multitude of locations, whether on premise, in the cloud, or hybrid.
- Data Pipelining: Modeling of data pipeline workflows, data enrichment, preparation, and quality management, including distributed data processing.
- Data Governance: Includes an information catalog so customers can discover, define, and understand sources.
- Enhanced User Experience: A console provides a unified, end-to-end view, with multiple personas for users at different levels.
- Containerized Framework: SAP Data Hub and all its components reside inside a Kubernetes framework, which allows for quick and easy deployment.
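In generic terms (this does not use any actual SAP Data Hub API; the stage names are hypothetical), the data-pipelining attribute above amounts to chaining composable stages – enrichment, quality checks, and so on – over a stream of records:

```python
# Generic data-pipeline sketch: each stage is a function from an iterable
# of records to an iterable of records, chained in order. Stage names are
# illustrative, not SAP Data Hub operators.

def enrich(records):
    """Enrichment stage: derive a USD amount from a local amount."""
    for rec in records:
        yield {**rec, "amount_usd": rec["amount"] * rec["fx_rate"]}

def quality_check(records):
    """Quality-management stage: drop records failing a basic rule."""
    for rec in records:
        if rec["amount_usd"] > 0:
            yield rec

def run_pipeline(source, *stages):
    """Chain the stages lazily, then materialize the result."""
    data = source
    for stage in stages:
        data = stage(data)
    return list(data)

source = [
    {"amount": 10.0, "fx_rate": 1.1},
    {"amount": -5.0, "fx_rate": 1.1},  # fails the quality rule
]
result = run_pipeline(source, enrich, quality_check)
print(result)  # only the enriched record that passed the check remains
```

Because the stages are generators, records flow through one at a time, which is the same design idea that lets pipeline engines distribute processing across nodes.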
The growth of data and the complexity of the systems where it resides do not preclude the need for Metadata Management, Data Governance, or Data Quality; in fact, these have become even more important.
Companies are focusing more on the cloud, yet many still have on-premise datastores.
“For practical reasons, a lot of these customers have to keep their existing on-prem investments, but that data orchestration still needs to happen,” in order to do Big Data Analytics and bring in Data Science and technologies such as Machine Learning.
“You can’t spend all your time trying to find data where it’s hiding out,” said Tsai.
“Companies need actionable data to enhance customer-facing activities — such as pricing, customer churn, upselling, and promotion optimization — that drive growth,” wrote Faerber.
Having access to all this data allows enterprises to gain insights about their customers and create a better experience for them, while operating more efficiently.