Presenting Your Data
Emily Zhao, May 30

What is Data Visualization?
Data visualization is the process of presenting data.
It is how we communicate findings from data in visually clear, concise, and often aesthetic ways.
A data visualization typically focuses on a specific dataset and aims to communicate a relationship, trend, distribution, or other pattern among variables (or the lack of one).
Visualizations give us a holistic view of a dataset, one we cannot get by simply eyeballing the raw data.
In short, humans need visuals!

Why is it Important?
At the core of it, a data visualization transforms many data points into a single story.
And stories can be good or bad; ideally, a good data visualization — like a good story — holds the reader’s attention and presents information in a concise, easy-to-understand way, leaving the reader with something to take away.
Data visualization supports visual literacy: it lets otherwise complex data be processed and understood visually, in simpler terms.
Visualizations reduce the cognitive load required to understand a dataset, provide overviews of the data, and form a crucial part of exploratory data analysis.
Example of data visualization used in exploratory data analysis. Image credit: https://en.…org/wiki/Exploratory_data_analysis

To get a better sense of why data visualization is important, we can examine existing visualizations, asking what each one is trying to communicate and whether it succeeds.
Junk Charts, a blog run by Kaiser Fung, picks out data visualizations from the media, critiques them, and suggests how to improve them.
Learning to weigh the pros and cons of a visualization shows why it is worth taking the time to produce strong data visualizations (which isn’t always easy).
Clear data presentation goes hand in hand with good design, and not just design in the aesthetically pleasing visual sense.
Good design means clear communication, and its end goal is that viewers of the visualization can describe the relationships in the data.
Data visualizations should have a clear communicative intent.
This is why every choice matters: the type of visualization, data transformations, shapes, use of text and color, and more.
Types of Data Visualization
The Python Graph Gallery provides a comprehensive list and guide to different data visualization types and the information they communicate.
The site splits data visualization types into categories of distribution, correlation, ranking, maps, and more.
There are many types of visualizations, ranging from a simple line chart to a more complex parallel plot.
The type of visualization makes a big difference in what is communicated about the data.
The best choice of visualization depends on the data itself and what relationships in the data we intend to explore.
Take a heat map, for example: this type of visualization works best when you want to understand the correlation among multiple variables, for instance between average gas price and geographic location.
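A correlation heat map is usually drawn from a correlation matrix. Each cell holds the Pearson correlation coefficient for one pair of variables, and the cell is then colored by that value. The minimal sketch below computes such a matrix in plain Python and prints it as a text grid instead of colors.

The variables `price`, `tax`, and `distance` and their values are made up for illustration; they are not the gas-price data in the figure. With a plotting library, you would pass `matrix` to something like matplotlib's `imshow` to color the cells.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return (names, matrix) where matrix[i][j] is corr(column i, column j)."""
    names = list(columns)
    matrix = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical example data: three variables measured on the same five records.
data = {
    "price": [3.1, 3.4, 2.9, 3.8, 3.6],
    "tax": [0.40, 0.50, 0.35, 0.60, 0.55],
    "distance": [120, 80, 150, 60, 90],
}

names, matrix = correlation_matrix(data)

# Text rendering of the heat map: each cell is one correlation coefficient.
print(" " * 9 + "".join(f"{n:>10}" for n in names))
for name, row in zip(names, matrix):
    print(f"{name:>9}" + "".join(f"{v:>10.2f}" for v in row))
```

The diagonal is always 1, since each variable correlates perfectly with itself. The matrix is symmetric, which is why correlation heat maps look mirrored across the diagonal.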
Example of a heat map. Image credit: https://www.…com/story/travel/roadwarriorvoices/2015/01/10/use-this-us-gas-price-heat-map-to-design-cheapest-possible-road-trip/83204036/

On the other hand, a simple histogram can suffice for displaying the distribution of a single variable, such as the arrival time for a certain event.
Example of a histogram. Image credit: https://en.…
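A histogram works by splitting the range of a variable into equal-width bins and counting how many observations land in each. The rough sketch below shows that counting step in plain Python. It assumes made-up arrival times (in minutes after doors open) and an arbitrarily chosen five bins. Plotting functions such as matplotlib's `hist` perform a similar binning before drawing the bars.

```python
def histogram(values, num_bins):
    """Count values into num_bins equal-width bins spanning [min, max].

    Returns a list of (left_edge, right_edge, count) tuples. Bins are
    half-open [left, right), except the last, which also includes the maximum.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Index of the bin this value falls into; clamp the maximum into the last bin.
        i = min(int((v - lo) / width), num_bins - 1)
        counts[i] += 1
    return [(lo + k * width, lo + (k + 1) * width, counts[k]) for k in range(num_bins)]


# Hypothetical arrival times (minutes after doors open).
arrivals = [2, 5, 7, 8, 9, 11, 12, 12, 14, 15, 17, 21, 24, 33, 48]

bins = histogram(arrivals, 5)
for left, right, count in bins:
    # Crude text histogram: one '#' per observation in the bin.
    print(f"{left:5.1f}-{right:5.1f} | {'#' * count}")
```

Most arrivals fall in the first two bins, with a thin tail of late arrivals. This is the kind of skew a histogram makes visible at a glance.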
For more examples of current data visualization use cases, check out The Atlas, a collection of charts and graphs used by Quartz.
Visualization Tips

Choosing a visualization type
The best visualization for your data depends on how much data you have, what kind of data you have, and what questions you are trying to answer with it; often there isn’t one best visualization.
It helps to experiment with different visualization types, because two visualizations of the same data can draw attention to different attributes of it.
This visualization picker lets you specify what you want to explore in the data and filters its suggestions to fit that goal; keep in mind these are only suggestions for starter visualizations.
When you have one variable (univariate):
- If it’s numeric (e.g., histogram, box plot), the visualization should display the distribution/dispersion of the data, the mode, and outliers.
- If it’s categorical (e.g., bar chart), the visualization should display the frequency distribution and skew.

When you have two (or more) variables (bivariate):
- The better visualization type depends on whether the comparison is numeric to numeric (e.g., scatterplot), numeric to categorical (e.g., multiple histograms), or categorical to categorical (e.g., side-by-side bar plot).

The right choice of visualization also depends on what question you are exploring about the data.
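The rules of thumb above amount to a small lookup table keyed on the kinds of variables involved. The sketch below encodes only the univariate and bivariate cases listed in this section. `suggest_charts` is a hypothetical helper written for illustration; it is not part of any library or of the visualization picker mentioned earlier. Read its output as starting suggestions, not the one right answer.

```python
# Hypothetical lookup encoding the rules of thumb above; not a library API.
SUGGESTIONS = {
    ("numeric",): ["histogram", "box plot"],
    ("categorical",): ["bar chart"],
    ("numeric", "numeric"): ["scatterplot"],
    ("categorical", "numeric"): ["multiple histograms"],
    ("categorical", "categorical"): ["side-by-side bar plot"],
}


def suggest_charts(*kinds):
    """Suggest starter chart types for one or two variables.

    Each argument is "numeric" or "categorical". Order does not matter for
    the two-variable case, so the kinds are sorted before the lookup.
    """
    if len(kinds) not in (1, 2):
        raise ValueError("expected one or two variable kinds")
    for kind in kinds:
        if kind not in ("numeric", "categorical"):
            raise ValueError(f"unknown variable kind: {kind!r}")
    key = tuple(sorted(kinds))
    return list(SUGGESTIONS[key])  # copy so callers can't mutate the table


print(suggest_charts("numeric"))                 # ['histogram', 'box plot']
print(suggest_charts("numeric", "categorical"))  # ['multiple histograms']
```

A real picker would also weigh the question being asked and the amount of data, which this table ignores.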
See the diagram below for a loose guide to picking a visualization type.
Diagram for choosing a data visualization type. Image credit: https://www.…com/blog/how-to-choose-the-right-data-visualization-types/

Transforming the data
A log transform is often applied to skewed data to make it less skewed and closer to a normal (bell-shaped) curve, which makes the data easier to perceive visually and to explore.
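One way to check what a log transform does is to compare the skewness of the data before and after it. The sketch below uses made-up, right-skewed values and the moment-based sample skewness (mean cubed deviation divided by the standard deviation cubed). It assumes every value is strictly positive, since the logarithm is undefined at zero and below. Base 10 is an arbitrary choice; changing the base only rescales the logged values, which leaves skewness unchanged.

```python
import math


def skewness(values):
    """Moment-based sample skewness: mean cubed deviation / std ** 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data: most values small, a few very large.
raw = [1, 2, 2, 3, 3, 4, 5, 7, 10, 15, 30, 80, 200]

# Log transform; requires strictly positive values.
logged = [math.log10(v) for v in raw]

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
```

The skewness drops sharply after the transform, because the log compresses the long right tail far more than the bulk of small values.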
When you have a lot of data, smoothing removes noise so that the general shape and important features of the data stand out.
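A centered moving average is one of the simplest smoothing techniques. Each point is replaced by the mean of itself and its nearest neighbors, which averages out short-lived wiggles while keeping the overall trend. It is only one option among many; LOESS and kernel smoothers are common alternatives. The series and the window size of 3 below are hypothetical choices for illustration.

```python
def moving_average(values, window):
    """Centered moving average with an odd window size.

    Near the edges the window is truncated, so each output point is the
    mean of however many neighbors actually exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical noisy series: an upward trend with alternating noise.
noisy = [1, 4, 2, 5, 3, 6, 4, 7, 5, 8]

print(moving_average(noisy, 3))
```

Larger windows smooth more aggressively but can also flatten genuine features, so the window size is a judgment call.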
Use of color
For qualitative/categorical data, choose distinct colors to clearly separate the categories.
For quantitative data, use color gradients to show relative differences from small to large values.

Use of text
Incorporate text to add clarity and intention, but avoid clutter.
Add legends, labels, and captions to point out important features or conclusions.

Resources and Tutorials
For a more comprehensive understanding of data visualization and its purposes, check out this guide on Developing Visualisation Literacy.
For another overview of data visualization and walkthroughs on presenting data using R, check out the book Data Visualization: A Practical Introduction by Kieran Healy (the draft manuscript is available online).
Flowing Data covers data in everyday life and provides tutorials for doing data visualization in R.
For those interested in visual storytelling, The Pudding is a digital publication that uses creative data visualizations to explain ideas in popular culture.
Visualizing Data is an encyclopedia for data visualization, with insights into best practices, examples, interviews with experts, and more.
If you want to learn more about coding data visualizations, this tutorial walks through simple visualizations with matplotlib and Pandas, covering how to filter data and how to draw a line plot, bar chart, and box plot.
This Medium article does a good job of showcasing the use cases for bar charts, scatterplots, and pie charts (though we recommend against pie charts, because it’s hard to compare values accurately across them; bar charts are generally a better option).