A Design Thinking Mindset for Data Science
Adapted from a research paper written for The University of Texas capstone.
Rachel Woods
Mar 22
Abstract
Data science has received considerable recent attention in technical research and business strategy. However, there is an opportunity for more research on, and improvement of, the data science research process itself.
Through the research methods described in this paper, we identify potential for applying design thinking to the data science process. The aim is to formalize and improve how data science research projects are carried out.
This paper therefore focuses on three core areas.
The first is background on the data science research process and the common pitfalls data scientists face.
The second is an explanation of how design thinking principles can be applied to data science.
The third is a proposed new process for data science research projects based on the aforementioned findings.
The paper concludes with two things. The first is an analysis of implications for individual data scientists and data science teams. The second is suggestions for future research to validate the proposed framework.
Keywords: data science, research process, design thinking applications
Introduction
Data science is arguably one of the most popular jobs of the century, yet the characteristics of the job remain poorly defined (HBR).
Several factors have contributed to confusion about the role:
- the lack of formal training in university programs;
- unclear role requirements;
- the breadth of the position.

Together, these have led to ambiguity about how to become a good data scientist. They have also led to the idolization of those who can do it all, colloquially called “unicorns”.
Academic advancements in the field of data science have traditionally focused on the development of new statistical analysis techniques, machine learning models, and neural networks.
Little research has examined the process itself. The most prominent exception is the KDD process, a framework for knowledge discovery in databases proposed in 1997.
The KDD framework is out of date and does not address common challenges in today’s data science research process. It also focuses primarily on the data mining step rather than on the research process as a whole.
Process research is more common in other fields, most notably design. Design is a profession that has developed substantial literature on how to solve problems using the mindset of a designer.
Researchers and prominent designers have shared the core methods and reasoning patterns used in design work. As a result, the term “design thinking” and its related practices have become popular within corporations and institutions.
Furthermore, many of the methods and principles of design thinking are widely applicable. They have already been applied to fields such as education, healthcare, and writing studies.
Three factors encourage further research into the data science process as a whole:
- the lack of formality in the data scientist role;
- the lack of existing literature;
- the interdisciplinary overlap with design research.
Thus, this paper aims to identify the common pitfalls of the data science research process. It then highlights the strengths of design thinking as they relate to that process. Finally, it proposes a new framework for solving data science problems using the principles and process of design thinking.
This paper will also explain the methods used to reach such conclusions, explain potential implications, and identify future research opportunities.
Methodology
The data for this paper was gathered through professional first-hand experience and a thorough review and analysis of industry research and academic literature.
The following summarizes the methods used; more detailed sources are listed in the References:
Systematic review of published papers on design thinking and data science using Google Scholar, with search terms such as Design Thinking Process, Design Thinking Applications, Data Science Process, Data Science Challenges, Research Mistakes, and Data Science Research
Review of blogs and content published by subject matter experts at highly regarded institutions such as Stanford D School, IDEO, Springboard, and O’Reilly
Review of concepts in popular books on these topics, including Change by Design, The Elements of Data Analytic Style, and Storytelling with Data
Other process frameworks beyond design thinking were also considered as ways to address the common shortcomings of the data science research process.
Amongst those considered were lean and agile frameworks.
Design thinking was ultimately chosen because it applies directly as a general research process. The other frameworks more closely resemble development processes.
Overview of the Data Science Research Process
For the purposes of this paper, it is first important to distinguish between a data science question and a data science research project.
A data science question is a well-defined question, or set of questions, posed with the aim of reaching an answer quickly.
In contrast, a data science research project is a larger endeavor in which the goal and the answer are developed together along the way.
The two can be distinguished by the time required for completion: a question is answered quickly, while a project evolves and is completed over a longer period.
Another distinction lies in the ambiguity of the task. A question is clearly specified, whereas a project typically begins vaguely, which introduces many challenges.
The following sections detail the process for completing a data science project and the associated common pitfalls.
Frame the problem
Any research process commonly begins with three steps: framing the problem, stating one’s hypothesis, and developing a strategic approach to answering the questions posed.
In practice, data science research is often approached with the limits of the available data in mind rather than a strategic question or direction. However, the rise of big data has fundamentally changed how such research can be approached.
The quantity, quality, and variety of available data have increased, effectively eliminating the need to start a research process from the limitations of the data. Yet the approach to framing data science projects has not evolved.
This is most troublesome among inexperienced data scientists. They often start from what data is available and then struggle to uncover meaningful insights. The alternative is to work backwards from the questions that would be most strategically valuable to research.
Get the data
The next step in the data science research process is gathering, cleaning, and preparing the data, and its importance is often underestimated.
Data integrity is a major concern in research, and it begins with a full understanding of the source of the data.
Without that understanding, it can be difficult to explain several things to an audience:
- how generalizable and accurate the results are;
- what impact the findings have;
- what theoretical basis underlies the models developed.
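One lightweight way to capture an understanding of the data source is to record provenance notes alongside a basic profile of the dataset. The sketch below is a hypothetical illustration, not part of any established framework. The `DataSourceRecord` fields, the `profile_missing` helper, and the sample rows are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DataSourceRecord:
    """Hypothetical provenance notes kept alongside a dataset."""
    name: str
    origin: str             # which system or team produced the data
    collection_method: str  # how the records were gathered
    known_limitations: list[str] = field(default_factory=list)


def profile_missing(rows: list[dict]) -> dict[str, int]:
    """Count missing (None or empty-string) values per column across all rows."""
    counts: dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value is None or value == "":
                counts[column] += 1
    return counts


# Made-up sample rows for illustration only.
rows = [
    {"customer_id": 1, "region": "West", "spend": 120.0},
    {"customer_id": 2, "region": "", "spend": None},
    {"customer_id": 3, "region": "East", "spend": 80.5},
]
source = DataSourceRecord(
    name="orders_sample",
    origin="hypothetical billing export",
    collection_method="nightly batch job",
    known_limitations=["excludes refunded orders"],
)
print(source.name, profile_missing(rows))
```

Writing the limitations down explicitly makes it easier to state later how far the results generalize.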
Explore the data
Data exploration, also known as data mining, is the process of uncovering valuable insights from large datasets, often with the assistance of advanced statistical analysis and visualization.
Surface-level techniques include computing descriptive statistics and checking for correlations. Beyond these, this point in the process can stump data scientists as they struggle to decide what questions to ask of the data.
Success at this step requires substantial knowledge of the research question. It also requires the creativity to move beyond elementary exploration toward valuable insights.
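The surface-level techniques mentioned above can be sketched with Python’s standard library: descriptive statistics per variable and a correlation check between each pair of variables. The column names and values are made-up placeholders. In practice, pandas (`DataFrame.describe`, `DataFrame.corr`) is a common choice for the same checks.

```python
import statistics
from itertools import combinations


def describe(values: list[float]) -> dict[str, float]:
    """Basic descriptive statistics for one numeric variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length variables."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical placeholder data: three numeric columns.
data = {
    "visits": [3.0, 5.0, 2.0, 8.0, 6.0],
    "spend": [30.0, 52.0, 18.0, 85.0, 61.0],
    "tenure": [1.0, 4.0, 2.0, 3.0, 5.0],
}

for column, values in data.items():
    print(column, describe(values))

# Check every pair of columns for linear correlation.
for a, b in combinations(data, 2):
    print(f"{a} vs {b}: r = {pearson(data[a], data[b]):.2f}")
```

Output like this is the starting point the text describes. Deciding which questions to pursue next still depends on knowledge of the research question.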
Perform in-depth analysis
Once the data has been explored, the process typically turns to either further in-depth analysis or model building, depending on the scope of the project.
The common pitfall at this point is becoming so engrossed in the project that one loses sight of the end goal. The data scientist then either goes down a rabbit hole or produces an outcome that is not immediately actionable or valuable to the project’s stakeholders.
Communicate results
As with any research, the final step in a data science research project is to communicate the findings to the relevant stakeholders.
At this point, the data scientist or research team communicates the actions that should be taken based on their findings. They package those findings in a report or presentation and share them with the teams able to implement the relevant insights.
Ideally, every research project would give the business a deeper understanding, if not a direct action, to justify the time invested in research. However, this is often not the case.
In summary, the description above reveals two overarching challenges in the data science process. The first is framing and asking the right questions of the data. The second is producing actionable results for the relevant stakeholders.
A Proposed Mindset: Design Thinking for the Data Science Research Process
Design thinking has also tackled the challenge of framing the problem correctly and ensuring the process produces actionable results.
Defining a set of core principles and a high-level process flow has given designers clarity on how to address these issues.
The following sections combine the strengths of design thinking with the pitfalls identified in data science research. From this, they propose principles and process improvements for data science research:
Core Guiding Principles
Empathy: Identify the relevant stakeholders at the beginning and keep them in mind along the way. Do this by conducting subject-matter expert interviews, gathering informal feedback frequently, and creating an end product packaged specifically to meet the stakeholders’ needs.
Understanding through prototyping: Before investing significant effort in a full solution, a researcher can sketch out hypothetical analyses or models. Doing so helps the researcher grasp the problem’s scope and test possible solutions.
Furthermore, rapid prototyping has been reported to produce results similar to those of unconstrained prototyping in less time.
Active and purposeful feedback: Gathering input frequently and intentionally from both technical and non-technical stakeholders can aid in both developing a deeper understanding of the problem and brainstorming approaches to find a solution.
Diagrams over descriptions: Communication of analysis, models, and findings can become complex.
Help a non-technical audience understand the process; make it visual.
Build on the ideas of others: Look internally and externally for what has been done and how one can build on that work.
Teams often duplicate work because of poor knowledge sharing, and data science teams are no exception.
Embrace creativity and the non-linear journey: If a researcher has not explored many options, they haven’t diverged enough.
Embracing the mentality of quantity over quality for brainstorming provides more options for discovery down the road.
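For a modeling problem, one possible way to apply the “Understanding through prototyping” principle is to test a naive baseline on a small sample before building a full model. If a trivial predictor already comes close to the accuracy the project needs, the full effort may not be justified. The sketch below makes several assumptions, none of which come from the paper:
- a numeric forecasting problem;
- mean absolute error as the yardstick;
- two hypothetical baselines (training mean and last observed value);
- made-up data values.

```python
import statistics


def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return statistics.mean(abs(a - p) for a, p in zip(actual, predicted))


def mean_baseline(train: list[float], n: int) -> list[float]:
    """Naive prototype: predict the training mean for every future point."""
    return [statistics.mean(train)] * n


def last_value_baseline(train: list[float], n: int) -> list[float]:
    """Naive prototype: repeat the most recent observed value."""
    return [train[-1]] * n


# Hypothetical weekly metric, split into a small train/holdout sample.
history = [100.0, 104.0, 98.0, 110.0, 107.0, 111.0, 115.0, 113.0]
train, holdout = history[:6], history[6:]

for name, baseline in [("mean", mean_baseline), ("last value", last_value_baseline)]:
    predictions = baseline(train, len(holdout))
    print(f"{name} baseline MAE: {mean_absolute_error(holdout, predictions):.2f}")
```

Baselines this cheap take minutes to build. They give stakeholders a concrete reference point for judging whether a more complex model is worth pursuing.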
A New Process
Ideation
As described earlier, a successful data science project begins with a strategic question and a deliberate choice of research area.
Design thinking frames strategic selection as a balance of technical feasibility, desirability, and business viability.
This mindset forces designers to consider the human elements of a solution alongside its business and technical aspects, which are often overlooked.
By analogy, this paper proposes a similar Venn diagram for selecting a data science research area, to increase a project’s chance of success. The target is the intersection of technical feasibility, business impact, and data availability.
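To make the intersection idea concrete, the selection can be treated as a filter: a candidate research area is shortlisted only if it satisfies all three criteria. This sketch is hypothetical in several ways:
- The candidate names are invented.
- Each aspect is reduced to a yes/no flag. Real assessments are usually graded judgments made together with stakeholders.
- The `CandidateProject` class is an illustration, not a prescribed tool.

```python
from dataclasses import dataclass


@dataclass
class CandidateProject:
    """A hypothetical candidate research area, judged on the three aspects."""
    name: str
    technically_feasible: bool
    business_impact: bool
    data_available: bool

    def in_intersection(self) -> bool:
        """True only if the project sits where all three circles overlap."""
        return self.technically_feasible and self.business_impact and self.data_available


candidates = [
    CandidateProject("churn drivers", True, True, True),
    CandidateProject("real-time pricing", False, True, True),
    CandidateProject("log-volume trends", True, False, True),
    CandidateProject("competitor forecasts", True, True, False),
]

shortlist = [c.name for c in candidates if c.in_intersection()]
print(shortlist)  # ['churn drivers']
```

Each rejected candidate fails exactly one criterion, which shows why all three circles matter. A feasible, well-supplied project with no business impact is still a poor choice.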
Figure: Demonstrating the Three Aspects of Data Science
Another important part of the ideation phase is framing and scoping the problem, including determining the hypothesis, key questions, and goals of the research.
Designers are often not the subject matter experts. For this reason, in design thinking they conduct primary and secondary research almost from the moment a project is assigned, to start understanding the problem space.
Often during these early stages, designers are pleasantly surprised to gather deeper insights than expected.
Simply asking to be walked through a process may reveal pain points and problems, even when the designer or researcher isn’t explicitly asking about them.
Data scientists can apply the same techniques for three purposes: to familiarize themselves with the area of interest, to get to know the stakeholders, and to uncover insights that help frame their research.
In practice, applying this mindset means beginning a data science research project with ample upfront research.
Examples of activities to perform include:
Informational interviews with other business units and teams
Diagramming processes and concepts to test understanding
Accessing user testimonials and support tickets, when applicable
Exploratory Analysis
Entering the exploratory phase of a project can feel disorganized. This makes it difficult to come up with ideas and can lead to creative burnout.
Design thinking approaches the equivalent phase of the design process by creating process diagrams and frameworks. These help organize key learnings, identify areas of further interest, and communicate decisions to outside stakeholders.
Examples of exploratory tools used in design thinking include As-Is Journey Maps and Customer Journeys.
Similar tools, or simply sketching a concept of the research area, can help data scientists organize and plan their next steps in the exploratory phase.
Brainstorming and creativity in analysis techniques are also a key part of the exploratory phase.
As described earlier, data scientists can become frustrated when they cannot think of new analysis questions while searching for an interesting area to research further.
An important idea that data science can borrow from design thinking is quantity over quality in the early stages of a project.
Instead of limiting ideas from the beginning, start by writing out 100 potential questions or queries to ask of the data, no matter how absurd or unhelpful they seem. Then group these ideas into themes and prioritize the themes by ease versus potential impact.
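The grouping and prioritization step can be sketched as an ease-versus-impact 2x2 grid. The sketch makes these assumptions:
- Each brainstormed question is tagged with a theme and rough 1 to 5 scores for ease and impact.
- Each theme is placed in a quadrant using its average scores, with a threshold of 3.

The questions, themes, scores, threshold, and quadrant labels are all hypothetical. The list is also far shorter than the 100 ideas the exercise calls for.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical brainstormed questions: (question, theme, ease 1-5, impact 1-5).
ideas = [
    ("Which features do new users try first?", "onboarding", 4, 3),
    ("Does onboarding length predict churn?", "onboarding", 3, 5),
    ("Which regions grew fastest last quarter?", "growth", 5, 2),
    ("Do referral users spend more?", "growth", 3, 4),
    ("Can we predict enterprise upgrades?", "expansion", 1, 5),
    ("Are support tickets seasonal?", "support", 4, 2),
]

# Work the quadrants in this order: high impact first, easy before hard.
QUADRANT_ORDER = ["quick win", "big bet", "easy filler", "deprioritize"]


def quadrant(ease: float, impact: float, threshold: float = 3.0) -> str:
    """Place a theme in an ease-versus-impact 2x2 grid."""
    if impact >= threshold:
        return "quick win" if ease >= threshold else "big bet"
    return "easy filler" if ease >= threshold else "deprioritize"


def prioritize(ideas):
    """Bundle ideas into themes, then rank themes by quadrant and impact."""
    themes = defaultdict(list)
    for _question, theme, ease, impact in ideas:
        themes[theme].append((ease, impact))
    summary = []
    for theme, scores in themes.items():
        avg_ease = mean(e for e, _ in scores)
        avg_impact = mean(i for _, i in scores)
        summary.append((theme, quadrant(avg_ease, avg_impact), avg_impact))
    # Sort by quadrant priority, then by higher average impact within a quadrant.
    summary.sort(key=lambda t: (QUADRANT_ORDER.index(t[1]), -t[2]))
    return summary


for theme, label, impact in prioritize(ideas):
    print(f"{theme}: {label} (avg impact {impact})")
```

Scoring whole themes rather than single questions keeps the early brainstorm broad. The most promising clusters still come to the top.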
Modeling, Prototyping and Deeper Analysis
The design thinking process benefits from rapid iteration and targeted feedback from relevant stakeholders. This lets a wider range of possible solutions be considered during selection.
The strategy aims to counter personal bias and to avoid settling on the first idea when a better one might emerge later.
The data science process can benefit from a similar strategy. It can increase creativity and the number of options considered, and avoid pigeonholing a solution based on the first idea that was tried.
Gathering feedback from relevant stakeholders also has benefits when it comes to communication and buy-in.
For example, designers use feedback not only as a method of gathering input on potential solutions, but also for learning how to best position solutions and share findings with specific audiences.
Similarly, data scientists often struggle to ensure the project’s results are relevant to stakeholders and, when they are, to communicate them.
Thus, frequent and intentional feedback throughout modeling and deeper analysis could address these weaknesses.
Presenting Findings and Models
Most research processes gather a plethora of data and insights, typically more than is relevant and valuable to an audience.
As a result, the results often need to be distilled down to what is really important.
Design thinking approaches this step by framing the insights and proposed solution in a story, aimed at taking the audience through the journey of why the end results matter, how those results were achieved, and what to do about it.
Data science research does not often end in a well-crafted story, so creating engaging material for an audience can be challenging.
Borrowing the principles of storytelling used in design thinking suggests a few practices:
Focus on explanatory analysis over exploratory analysis: Explanatory analysis presents an important finding or recommendation first, then explains the process that was taken to get there.
Findings that are merely interesting and not useful are saved for in-depth descriptions of the project, or not included at all.
Use visualizations with purpose: Start a visualization by writing out what needs to be communicated, then create exactly that.
It is often easier to create a set of charts and graphs first, then pull insights from them and craft a story around what has been created.
This approach results in less compelling visualizations.
Instead, start with the purpose.
Document the process as a journey: Sharing the steps taken to reach a conclusion helps an audience understand the final recommendations more deeply and inspires action.
Use the journey of the research to create credibility and get buy-in from important stakeholders.
Potential Implications
The beginning of this paper discussed the expectation of, and search for, the “unicorn” data scientist: one who excels at the technical, business strategy, and communication aspects of data science.
Moreover, the emphasis on technical training in data science has left many practitioners lacking strategic and communication skills.
The proposed Design Thinking Mindset for Data Science could help technically minded practitioners with the other aspects of the process:
- framing the problem;
- expanding ideation through creative methods;
- performing exploratory analysis with the end goal in mind;
- gathering feedback on prototypes to keep stakeholders involved;
- packaging the end results into a compelling story.

Furthermore, the technical components of data science are moving steadily toward automation with AI-based data analysis products such as IBM Watson. As they do, the aspects of data science work identified here will become increasingly important, because they are inherently human-centered and harder to automate.
Conclusion
Existing literature has demonstrated the benefits of the design thinking process and highlighted the pitfalls and challenges of the data science process.
Through synthesizing the research from both disciplines, one can conclude that the data science discipline could benefit from applying proven design thinking methodologies.
The two specific areas in which the data science process can benefit most are asking the right questions of the data, and producing actionable results.
First, a data scientist can apply empathy and prototyping to ensure they have the right framing and are asking the right questions of the data.
Throughout the ideation phase, data scientists can conduct informational interviews to gather a deeper understanding of the problems they are trying to solve while developing empathy for their end user.
Prototyping of analysis and models can be used by data scientists to develop a deeper understanding of the data, get a sense of feasibility of solutions, and gather information on the usefulness of their conclusions to their stakeholders.
Second, data scientists can improve the actionability of their results by seeking proactive feedback on both the findings and communication style of those findings.
By gathering input from both technical and non-technical stakeholders, a data scientist can check throughout the project that they are on track to produce a useful outcome. This avoids waiting until the final presentation to find out.
Feedback is also beneficial at the end of a project when a data scientist is evaluating the best communication style to relay their findings.
The data science research process will become increasingly important as the field of data analytics and machine learning continues to grow.
As this research shows, there is a gap between the known challenges of the process and the research supporting it.
By drawing on another discipline, this paper has proposed a framework that could address these challenges and define the next paradigm for the data science research process.
Future Research
This paper is intended to function as a starting point for a new process framework for performing data science research projects.
Research can be conducted to both assess the validity of the application of design thinking principles to data science research problems, and to build upon baseline ideas with more formalized process recommendations.
Additionally, research could focus on measuring the impact of data science teams on business to help quantify the importance of process improvements and aid in comparing multiple processes.
This research did not focus on quantifying the tradeoffs between “quick and dirty” analysis versus full research projects as it relates to the quality of decision-making and strategy.
The challenge with measuring both impact and tradeoffs between research types lies in how success is defined.
A suitable starting point for further research may therefore be to define a few key performance indicators that help quantify data science work.
Recommended further reading and key inspirations for this paper
Change by Design, Tim Brown
Storytelling with Data, Cole Knaflic
The Elements of Data Analytic Style, Jeffrey Leek