A BEGINNER’S GUIDE
Beyond Bar Graphs and Pie Charts
Taylor Fogarty
Jun 12
Using Python, R, Tableau, and RawGraphs to effectively and beautifully communicate your data
I understand.
Maybe you forgot about your presentation this afternoon.
Maybe you have 5 minutes to throw together the 3 visuals your boss wants on his desk by the end of the day.
Maybe you’re just tired of dealing with and looking at your data after spending hours, even days, cleaning it and analyzing it.
I get it, but please don’t slap together a bar graph, add some colors to make it look more put together, and throw it onto your deliverable.
There are so many tools built for the sole purpose of visualizing data, and you’re doing yourself an injustice by creating a basic (and useless) pie chart in PowerPoint when you could be creating something beautiful AND informative.
Even if your visual isn’t for a presentation (maybe it’s part of your data exploration), you still want to make sure you are visualizing it in a manner that allows you to draw the right conclusion.
Lesson 1: Know What’s Available to You
There are two main types of tools you can use to create visuals: coding packages (e.g., ggplot2 in R, seaborn or matplotlib in Python) and visualization software (e.g., Tableau, Chartio, RawGraphs).
Both of these can be useful in different ways and we will visualize some of these differences using a dataset about refugees.
Coding packages are useful if you want to manipulate the data while visualizing it.
For example, if in addition to your line graph you always want to show the confidence band around it, this is much easier to add in code.
Or, if you’re creating visuals to explore your data before running a regression on it, it is easier to do this in code as well, since an exploratory graph that won’t make it into your deliverable doesn’t need to be as fine-tuned.
Finally, if the data requires an extensive amount of cleaning, you’ll have to do this in your code anyway, so it may be beneficial to create the visuals there too in case part of the cleaning needs to be redone.
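To make the confidence-band point concrete, here is a minimal Python sketch. It assumes you have several observations per x-value and uses a normal approximation (mean plus or minus 1.96 standard errors) for the band, which is only one of several ways to build one. The function names, the axis labels, and the toy numbers are all made up for illustration; they are not from the refugee dataset.

```python
import math
import statistics


def confidence_band(samples_by_x, z=1.96):
    """Return (xs, means, lowers, uppers) for a normal-approximation band.

    samples_by_x maps each x-value to a list of at least two observations.
    """
    xs = sorted(samples_by_x)
    means, lowers, uppers = [], [], []
    for x in xs:
        values = samples_by_x[x]
        mean = statistics.mean(values)
        # Standard error of the mean for this x-value.
        sem = statistics.stdev(values) / math.sqrt(len(values))
        means.append(mean)
        lowers.append(mean - z * sem)
        uppers.append(mean + z * sem)
    return xs, means, lowers, uppers


def plot_with_band(samples_by_x):
    """Draw the mean line with a shaded band around it (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so confidence_band works without it

    xs, means, lowers, uppers = confidence_band(samples_by_x)
    fig, ax = plt.subplots()
    ax.plot(xs, means, label="Mean")
    ax.fill_between(xs, lowers, uppers, alpha=0.3, label="Approx. 95% band")
    ax.set_xlabel("Year")
    ax.set_ylabel("Refugees")
    ax.legend()
    return fig


# Made-up numbers purely for illustration (not from the refugee dataset).
toy = {2006: [10, 12, 14], 2007: [20, 22, 24], 2008: [15, 15, 15]}
```

Calling `plot_with_band(toy)` should give a line with a shaded band around it. If you are already in seaborn, `lineplot` aggregates repeated observations per x-value and draws a similar band for you by default.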
Visualization software is useful if you’re trying either to create obscure visual types or to encode a lot of information.
With code, you can generally encode 2, maybe 3, additional variables depending on their nature, but software that does half the work for you makes it easy to encode many more, especially if your data is formatted in an odd way.
To code a graph similar to the bar graph on the left, the data has to be reorganized: the type of refugee must itself become a column, and the values currently spread across those separate columns must be combined into one.
This is simple if your dataset is small, such as in this case, but if you are working with a large data frame, it may be difficult to reformat it.
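As a rough illustration of that reorganization, the sketch below reshapes a "wide" table (one column per refugee type) into a "long" one (a single type column plus a single count column) in plain Python. The column names and numbers are hypothetical, not taken from the refugee dataset.

```python
def wide_to_long(rows, id_key, value_keys, var_name="Type", value_name="Count"):
    """Turn one column per category into a category column and a value column."""
    long_rows = []
    for row in rows:
        for key in value_keys:
            long_rows.append(
                {id_key: row[id_key], var_name: key, value_name: row[key]}
            )
    return long_rows


# Hypothetical wide-format data: one column per type of refugee.
wide = [
    {"Year": 2014, "Government-Assisted": 5, "Privately Sponsored": 3},
    {"Year": 2015, "Government-Assisted": 7, "Privately Sponsored": 4},
]

long_rows = wide_to_long(wide, "Year", ["Government-Assisted", "Privately Sponsored"])
for row in long_rows:
    print(row)
```

If your data lives in a pandas data frame, `pandas.melt` (with `id_vars`, `value_vars`, `var_name`, and `value_name`) performs the same reshaping in one call.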
Lesson 2: Pick the Right Kind of Graph
Another factor that may play into your decision on whether to use a coding platform or visualization software is the type of visual you want to make.
Below are the types of graphs you can create within the four different packages and platforms I mentioned earlier.
[Figure: types of graphs available for each program]
It’s easy to settle for a bar graph or, heaven forbid, a pie chart.
Pie charts are included in the list above for the rare occasions where they are appropriate, but if you haven’t figured it out yet, you should avoid them.
They rarely communicate information effectively, especially when comparing several pie charts side by side.
It’s much simpler to use stacked bar charts in this case and if this isn’t possible, then you shouldn’t have been comparing them anyway.
Another popular mistake is the overuse of line graphs.
If the values on your x-axis are discrete, meaning the space in between them does not have a logical value, then the points should not be connected.
If the x-values could be rearranged or if they’re not numeric (other than dates), stay away from line graphs.
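The rule of thumb above can be written down as a tiny check. This is only a rough heuristic under my own assumptions: it treats numbers and dates as safe to connect and everything else as categories. It cannot tell that numeric codes such as 0 and 1 are really categories, so you still have to think about whether your x-values could be rearranged. The function name is made up for this example.

```python
import datetime
import numbers


def suggest_chart(x_values):
    """Suggest 'line' only when every x-value is a number or a date."""

    def is_continuous(value):
        # bool is a subclass of int in Python, but True/False are categories.
        if isinstance(value, bool):
            return False
        return isinstance(value, (numbers.Real, datetime.date))

    return "line" if all(is_continuous(v) for v in x_values) else "bar"


print(suggest_chart([2006, 2008, 2010]))                # years are numeric
print(suggest_chart(["Single", "Married", "Widowed"]))  # categories can be rearranged
```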
The take-away from the list above is that not every visual will be possible to make in every platform.
Yes, you will always be able to make a scatter plot or a bar graph, but if you want to make a circular dendrogram, you can only accomplish that in RawGraphs.
Or, if you want to make a symbol map of the US, the easiest way to do this is in Tableau (plus Tableau has map layers that can be added on to show more information that may not be in your dataset).
Lesson 3: Customize Your Visuals, Don’t Settle for Orange and Blue
Listen, I love the colors blue and orange.
As a student at the University of Virginia, it’s my favorite color combination.
However, it can get boring when your 20th graphic is still in shades of blue and orange with the occasional red popping up as a third level.
In both code and software, you can choose the colors you use.
Now, color should not be used willy-nilly.
If there is no clear, informative reason to add levels of color, don’t add levels of color.
If an observation’s color encodes the same information as its size, shape, length, etc., then the addition of color will just make your graphs look more confusing.
Compare the two graphics above.
They are displaying the same amount of information, but the bottom one encodes the number of refugees as shades of purple.
Yeah, it looks prettier, but it doesn’t give the viewer any more information than they had from the first graph.
Color can also have accidental connotations.
Let’s say instead of purple, the color scale was red to green.
Not only are these colors not nice to our color-blind friends, but we also culturally have an association of red with bad and green with good (despite Disney’s obsession with lime green).
So if the bar for 5 to 9 year olds was a bright red, contrasting with greens, we might conclude that there are too many 5 to 9 year olds, which may be true but may not be the goal of the visual.
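When color does carry information, one way to sidestep the red-versus-green baggage is a single-hue scale that only changes in lightness. Here is a minimal sketch using Python’s standard `colorsys` module; the function name and the default hue, saturation, and lightness values are my own picks for a purple scale, not anything prescribed by a plotting library.

```python
import colorsys


def sequential_shades(n, hue=0.75, saturation=0.6, lightest=0.85, darkest=0.30):
    """Return n hex colors of one hue, running from light to dark."""
    if n < 1:
        raise ValueError("n must be at least 1")
    shades = []
    for i in range(n):
        # Position along the scale: 0.0 is the lightest shade, 1.0 the darkest.
        t = i / (n - 1) if n > 1 else 0.0
        lightness = lightest + t * (darkest - lightest)
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        shades.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return shades


print(sequential_shades(5))
```

The resulting list can be handed to the `color=` argument of matplotlib’s `bar`, which accepts one color per bar.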
Let’s look at an even worse graphic.
How many things can you spot on this visual that make it much more confusing to read? We have the alphabetic ordering of the ages, the confusing y-axis label, and the weird color scale encoding the number of Applicants.
When the values on an axis are not ordinal or numeric in nature, sort them by the quantity being plotted, in either ascending or descending order.
This makes it much easier to pick out the best, the worst, and everything in between.
Label your axes intentionally, and if you are comparing multiple graphs with similar information, keep the scales of your axes the same.
You don’t need to settle for the name of the column; you can customize it with a little bit of code or some editing in your software.
Lastly, again, be intentional with your colors.
If a scale like this makes sense (if the goal is 2000 applicants and anything too far below is bad), then use this color scale.
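Here is a small sketch of the sorting and labeling advice, assuming matplotlib is installed for the plotting part. The function names, the category counts, and the label text are all hypothetical.

```python
def order_categories(counts, descending=True):
    """Order non-ordinal categories by value so the extremes are easy to spot."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=descending)


def plot_sorted_bars(counts, x_label, y_label, title):
    """Bar chart with value-sorted bars and hand-written axis labels."""
    import matplotlib.pyplot as plt  # imported here so order_categories works without it

    ordered = order_categories(counts)
    names = [name for name, _ in ordered]
    values = [value for _, value in ordered]
    fig, ax = plt.subplots()
    ax.bar(names, values, color="tab:purple")
    # Readable labels instead of raw column names.
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.set_title(title)
    return fig


# Made-up counts purely for illustration.
toy_counts = {"Single": 40, "Married": 75, "Widowed": 10}
print(order_categories(toy_counts))
```

Ordinal categories such as age groups are the exception: keep those in their natural order rather than sorting them by value.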
Lesson 4: Be Careful How You Encode
Color isn’t the only way to add extra information into your visual.
Another approach you can take is making the size of a data point associated with an additional variable.
In this visual, the color of the point tells you how many refugees came in 2006, while the size shows the number of refugees in 2015.
While conclusions can still effectively be drawn from this, be careful with the use of area.
As humans, we generally interpret differences in length and width much more easily than area.
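If you do encode a variable as size, it helps to be deliberate about what "size" means. In matplotlib, the `s` argument of `scatter` is the marker area in points squared, so a value four times larger gets a circle only twice as wide. The sketch below shows that arithmetic; the function names and the default maximums are made up for illustration.

```python
import math


def marker_areas(values, max_area=400.0):
    """Scale values so that marker AREA is proportional to the value."""
    largest = max(values)
    if largest <= 0:
        raise ValueError("values must contain at least one positive number")
    return [max_area * v / largest for v in values]


def marker_radii(values, max_radius=10.0):
    """Radii that give the same proportional areas: radius grows with sqrt(value)."""
    largest = max(values)
    if largest <= 0:
        raise ValueError("values must contain at least one positive number")
    return [max_radius * math.sqrt(v / largest) for v in values]


print(marker_areas([100, 400]))  # areas differ by a factor of 4
print(marker_radii([100, 400]))  # radii differ only by a factor of 2
```

Scaling the radius directly with the value would instead make the larger circle sixteen times the area, which exaggerates the difference.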
One last note: encoded data should not be the primary data you want to communicate.
If what you really want someone to get from the last bar graph is the distribution of applications, encoding it as color will not communicate it effectively.
Encoding is for secondary data: data that is useful to include and may provide additional insight, but may not be crucial to the visual.
Lesson 5: Create Your Graph Completely
The visuals made as examples are not complete.
Again, unless they are for data exploration and will not make it to a slide-deck or presentation, there is more to do than simply make the bar graph.
If you look back to the first graph I made, I used the seaborn package in Python and did not include labels for my x-axis, so you can’t conclude anything from it.
You don’t know what 0 and 1 are; they could be cats and dogs for all you know.
Add labels, add titles, add legends.
Make sure that if somebody looked at your visual and had no understanding of your data, they’d be able to draw the right conclusion.
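As a sketch of what "completing" a graph can look like in Python, the helpers below swap numeric codes for readable names and then add a title, axis labels, and a legend to a matplotlib Axes. The helper names are my own, and the 0/1 mapping is hypothetical; check your own data dictionary for the real coding.

```python
# Hypothetical coding, for illustration only.
SEX_LABELS = {0: "Female", 1: "Male"}


def relabel(codes, mapping):
    """Replace numeric codes with readable labels, failing loudly on unknown codes."""
    missing = sorted({code for code in codes if code not in mapping})
    if missing:
        raise KeyError(f"No label defined for codes: {missing}")
    return [mapping[code] for code in codes]


def finish_axes(ax, title, x_label, y_label, legend_title=None):
    """Add a title, axis labels, and (when anything is labeled) a legend."""
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    # Only draw a legend if at least one plotted element carries a label.
    handles, _ = ax.get_legend_handles_labels()
    if handles:
        ax.legend(title=legend_title)
    return ax


print(relabel([0, 1, 1], SEX_LABELS))
```

`finish_axes` works on any Axes, including the one returned by seaborn’s plotting functions such as `sns.barplot`.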
Useful Cheat Sheets, Code, and Walkthroughs to Help You Out
Seaborn Cheat Sheet and Example Code
import seaborn as sns
# Bar plot of applicants by sex; set_title() adds a title to the returned Axes
sns.barplot(x='Sex', y='Applicants', data=data1).set_title("Number of Refugee Applicants per Sex")
# Scatter plot of refugee totals by marital status
sns.scatterplot(x='Marital Status', y='All', data=data2).set_title("Number of Refugees by Marital Status")
https://www. com/community/blog/seaborn-cheat-sheet-python
ggplot2 Cheat Sheet and Example Code
library(ggplot2)
# Refer to columns by name inside aes() rather than with data$
ggplot(data, aes(x = Year, y = Refugees, color = Continent)) +
  geom_line() +
  labs(title = "Refugees by Continent", x = "Year", y = "Number of Refugees") +
  scale_color_discrete(name = "Continent") +
  scale_x_continuous(breaks = c(2006, 2008, 2010, 2012, 2014))
https://www. pdf
Tableau Cheat Sheet and Walkthrough Example
https://confluence. pdf?api=v2
RawGraphs Walkthrough Example.