Plotly Express Yourself

(and prefacing thoughts)

1. For our test data I found this fun dataset on Kaggle on superheroes (hey, I just saw Avengers: Endgame!).

2. Code for getting and scrubbing the data, as well as the snippets below, can be found in this Jupyter notebook here.

3. If you haven’t installed or imported it yet:

    # Install
    pip install plotly_express

    # Import
    import plotly_express as px

4. We’ll assume you have some conceptual familiarity with the charts shown below, so we won’t be deep-diving into the pros and cons of each. But we’ll also add some thoughts and references when available.

5. The look of the following plots varies wildly because I was curious about the look and feel of the various built-in templates (i.e. themes). So, apologies if that’s annoying, but like I mentioned, I’m trying to flex all of Plotly Express’ muscles.
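A note on the import: Plotly Express was later bundled into the main plotly package (version 4.0 and up) as `plotly.express`, so depending on your install the import line may differ. The sketch below tries the bundled module first and falls back to the standalone `plotly_express` package. Setting `px` to `None` when neither is installed is just a choice made for this illustration.

```python
# Try the bundled module first (plotly >= 4.0), then the standalone package.
try:
    import plotly.express as px
except ImportError:
    try:
        import plotly_express as px
    except ImportError:
        # Neither package is available; the caller can check for None.
        px = None

if px is None:
    print("Plotly Express is not installed")
else:
    print("Plotly Express imported from", px.__name__)
```

Either way, the `px.` calls in the rest of this post stay the same.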

Plotting Univariate Data

Let’s cover a couple of classic ways to perform univariate exploration of continuous variables: Histograms and Boxplots.

Histogram (Univariate):

    px.histogram(
        data_frame=heroes_clean,
        x="Strength",
        title="Strength Distribution : Count of Heroes",
        template='plotly',
    )

Not the prettiest histogram.

You can read more about how cool histograms are here.

Boxplot (Univariate):

…And the boxplot. So elegant in its simplicity.

    px.box(
        data_frame=heroes,
        y="Speed",
        title="Distribution of Heroes' Speed Ratings",
        template='presentation',
    )

Boxplot.

More can be found on boxplots here and here.
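If it helps to see what a boxplot actually summarizes, here is a minimal sketch of the five-number summary using only the standard library. The Speed values are made up for illustration, and quartile conventions differ between tools, so the quartiles a chart reports may not match these exactly. Many boxplots also draw whiskers at 1.5 times the interquartile range rather than at the min and max.

```python
import statistics

# Hypothetical Speed ratings, made up for illustration (not from the dataset)
speeds = [10, 20, 30, 40, 50, 60, 70]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # "inclusive" treats the data as the whole population when interpolating
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


summary = five_number_summary(speeds)
print(summary)  # (10, 25.0, 40.0, 55.0, 70)
```

The box spans Q1 to Q3, and the line inside it marks the median.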

But if you find boxplots to be a bit square, perhaps a violin plot will do?

Violin Plot

    px.violin(
        data_frame=heroes,
        y="Speed",
        box=True,
        title="Distribution of Heroes' Speed Ratings",
        template='presentation',
    )

More of an upright bass plot.

Violin plots are becoming increasingly popular. I like to think of them as boxplot’s cooler, better-looking sibling. Ouch.

But what if the variable or feature you want to explore is categorical, not continuous? In this case, you’ll probably want to start with a bar chart to get a feel for counts of values.

Bar Chart (Univariate)

    px.bar(
        data_frame=heroes_publisher,
        x='publisher',
        y='counts',
        template='plotly_white',
        title='Count of Heroes by Publisher',
    )

Remember when NBC’s Heroes was cool?

Here’s a quick primer on bar charts.

Univariate analysis is all well and good, but really, we usually want to compare variables to other variables to try to tease out interesting relationships, so we can build models.

So let’s keep building our plotly-express superpowers on some examples of bivariate techniques.

Plotting Bivariate Data

Let’s start with comparing continuous variables versus continuous variables.

Scatter Plot

    px.scatter(
        data_frame=heroes,
        x="Strength",
        y="Intelligence",
        trendline='ols',
        title='Heroes Comparison: Strength vs Intelligence',
        hover_name='Name',
        template='plotly_dark',
    )

If a theoretical character has 0 Strength, they at least rate 57 in Intelligence. Hmm.

Scatter plots are the tried and true way of comparing two continuous (numeric) variables.

It’s a great way to quickly assess whether a relationship exists between the two variables.

In the example above, we further give ourselves a helping hand at spotting a relationship by adding a trendline.

It appears that there is a weak positive correlation between Strength and Intelligence.
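For a feel of what the `'ols'` trendline is doing, here is a rough sketch of an ordinary least squares fit written by hand. The data points are hypothetical, and this is not how Plotly Express computes it internally (as far as I know it relies on statsmodels being installed). The intercept is the predicted y value at x = 0, which is where the "0 Strength" reading above comes from, and r is the correlation coefficient.

```python
import math


def ols_fit(xs, ys):
    """Return (slope, intercept, r) for a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical (Strength, Intelligence) values, made up for illustration
strength = [1, 2, 3, 4, 5]
intelligence = [2, 4, 5, 4, 5]

slope, intercept, r = ols_fit(strength, intelligence)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f}")
```

A positive slope with an r well below 1 is what a weak positive correlation looks like in numbers.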

Line Plot

    px.line(
        data_frame=heroes_first_appear_year,
        x='Year',
        y='Num_Heroes',
        template='ggplot2',
        title="Number of Heroes by Year of First Appearance",
        labels={"Num_Heroes": "Number of Heroes"},
    )

The early ’60s was a big turning point in comic superheroes.

A special case of continuous versus continuous comparison is the time series. The classic way to plot one is with a line plot. Almost always the date/time variable will be along the x-axis while the other continuous variable is measured along the y-axis. And now you can see how it changed over time!

What if we want to compare categorical versus continuous variables? Well, it turns out that we can just use univariate techniques, but just “repeat” them! One of my favorite ways is using a stacked histogram. We can make a histogram for our continuous variable, for each value of a categorical variable, and then just stack them!

For example, let’s revisit our earlier histogram on Strength, but this time we'd like to see the data separated out by Gender.

Stacked Histogram

    px.histogram(
        data_frame=heroes[~heroes.Gender.isna()],
        x="Strength",
        color='Gender',
        labels={'count': 'Count of Heroes'},
        title="Strength Distribution : Count of Heroes",
        template='plotly',
    )

I’m guessing the big bar for 10–19 is non-superpowered characters, like Batman. Nerd.
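To make the "one histogram per category, then stack" idea concrete, here is a small standard-library sketch. The rows and the bin width of 10 are assumptions for illustration only. The point is that each category gets its own bin counts, and the height of a stacked bar is the sum of those counts for that bin.

```python
from collections import Counter, defaultdict

# Hypothetical (Gender, Strength) rows, made up for illustration
rows = [
    ("Female", 12), ("Male", 15), ("Male", 85),
    ("Female", 88), ("Male", 18), ("Female", 55),
]


def binned_counts_by_category(rows, bin_width=10):
    """Map each category to a Counter of {bin_start: count}."""
    counts = defaultdict(Counter)
    for category, value in rows:
        # 12 falls in the bin starting at 10, 85 in the bin starting at 80
        bin_start = (value // bin_width) * bin_width
        counts[category][bin_start] += 1
    return counts


counts = binned_counts_by_category(rows)

# Height of the stacked bar for the 10-19 bin: sum over all categories
stacked_10s = sum(per_category[10] for per_category in counts.values())
print(dict(counts["Male"]), dict(counts["Female"]), stacked_10s)
```

Whether the bars end up stacked, grouped side by side, or drawn in separate panels, the per-category counts are the same. Only the layout changes.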

Maybe the stacks are confusing to you and you just want to see the bars grouped by bins:

Stacked Histogram (grouped bins)

    px.histogram(
        data_frame=heroes[~heroes.Gender.isna()],
        x="Strength",
        color='Gender',
        barmode='group',
        labels={'count': 'Count of Heroes'},
        title="Strength Distribution : Count of Heroes",
        template='plotly',
    )

…Or if either of those looks is too visually busy for you, then maybe you just want a chart for each category value. You’ll see this sometimes called faceting (or at least that’s what I’ve come to call it).

Faceted Histograms

    px.histogram(
        data_frame=heroes[~heroes.Gender.isna()],
        x="Strength",
        color='Gender',
        facet_row='Gender',
        labels={'count': 'Count of Heroes'},
        title="Strength Distribution",
        template='plotly',
    )

Wow, I’m histogrammed out. Let’s look at applying the same faceting/splitting concept to box plots.

Split Box Plot

    px.box(
        data_frame=heroes[~heroes.Gender.isna()],
        y="Speed",
        color="Gender",
        title="Distribution of Heroes' Speed Ratings",
        template='presentation',
    )

And whatever box plots can do, violin plots can as well!

Split Violin Plot

    px.violin(
        heroes[~heroes.Gender.isna()],
        y="Speed",
        color="Gender",
        box=True,
        title="Distribution of Heroes' Speed Ratings",
        template='presentation',
    )

‘Agender’ characters have higher median (and likely mean) Speed.

So what if you just want to compare categorical versus categorical values? If that’s the case you usually want to look at relative counts. So stacked bars are a good way to go:

Stacked Bar Chart (Categorical vs Categorical)

    px.histogram(
        data_frame=heroes,
        x="Publisher",
        y="Name",
        color="Alignment",
        histfunc="count",
        title="Distribution of Heroes, by Publisher | Good-Bad-Neutral",
        labels={'Name': 'Characters'},
        template='plotly_white',
    )

Marvel and DC Comics are pretty top heavy with ‘Good’ characters.

Digression: It turns out that stacked bar charts are way easier using px.histogram, since it gives access to histfunc, which lets you apply an aggregation function (here, a count) to the values in each bar. This saves you from having to aggregate first (which you may have noticed was done for the bar chart above).
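For comparison, here is roughly what the "aggregate first" step looks like when done by hand. This is only a sketch with made-up rows. The notebook may well do it differently (for instance with a pandas groupby), and the name `publisher_counts` is hypothetical. The column names mirror the ones passed to the bar chart earlier.

```python
from collections import Counter

# Hypothetical (Name, Publisher) rows, made up for illustration
heroes_rows = [
    ("Batman", "DC Comics"),
    ("Superman", "DC Comics"),
    ("Iron Man", "Marvel Comics"),
    ("Thor", "Marvel Comics"),
    ("Hulk", "Marvel Comics"),
    ("Hellboy", "Dark Horse Comics"),
]


def count_by_publisher(rows):
    """Return column-style data: parallel lists of publishers and counts."""
    tally = Counter(publisher for _name, publisher in rows)
    ordered = tally.most_common()  # largest count first
    return {
        "publisher": [publisher for publisher, _count in ordered],
        "counts": [count for _publisher, count in ordered],
    }


publisher_counts = count_by_publisher(heroes_rows)
print(publisher_counts)
# A dict of equal-length lists like this can be handed to pandas.DataFrame(...)
```

With histfunc="count", the chart does this tallying for you from the raw rows.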

Plotting Three or More Variables

We may be sensing a pattern here.

We can turn any univariate visualization into a bivariate one (or more) by using another visual element, such as color; or by faceting/splitting along category values.

Let’s explore adding a third variable.

A common technique is to add a categorical variable to a scatter plot using color.

Colored Scatter Plot

    px.scatter(
        data_frame=heroes[~heroes.Gender.isna()],
        x="Strength",
        y="Intelligence",
        color="Alignment",
        trendline='ols',
        title='Heroes Comparison: Strength vs Intelligence',
        hover_name='Name',
        opacity=0.5,
        template='plotly_dark',
    )

Similar relationships across Alignments.

Maybe this data is not that interesting with the added category, but categories really stand out when you find the right pattern, such as with the classic iris data set…like this:

credit: https://www.plotly.express/

But going back to our original scatter plot with color, what if we wanted to add on a third continuous variable? How about if we tied it to the size of our markers?

Scatter Plot, with Color and Size

Below we add the continuous Power variable as the size of the markers.

    px.scatter(
        data_frame=heroes[~heroes.Gender.isna()],
        x="Strength",
        y="Intelligence",
        color="Alignment",
        size="Power",
        trendline='ols',
        title='Heroes Comparison: Strength vs Intelligence',
        hover_name='Name',
        opacity=0.5,
        template='plotly_dark',
    )

Wow, Galactus is tops in Strength, Intelligence, and Power!

One thing I noticed is that the chart doesn’t automatically add a legend for Size. That’s a little annoying. What can I say, Plotly Express has already spoiled me over the course of this post!

We’ve barely begun to scratch the surface of what’s possible, based on what I’ve seen in the documentation. We could go on and on, but let’s end our exploration with a couple more examples.

Scatter Matrix

Scatter matrices perform pair-wise scatter plots on a set of continuous variables, which you can then customize with colors, symbols, etc. to stand for categorical variables.

    px.scatter_matrix(
        data_frame=heroes[~heroes['Gender'].isna()],
        dimensions=["Strength", "Speed", "Power"],
        color="Alignment",
        symbol="Gender",
        title='Heroes Attributes Comparison',
        hover_name='Name',
        template='seaborn',
    )

Maybe a future release will add the option to toggle the diagonal into a histogram (or some other univariate chart).
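To spell out the "pair-wise" part, here is a tiny sketch that lists the cells of the grid for the three dimensions used above. It only enumerates the variable pairings and makes no claim about how Plotly Express lays out rows versus columns.

```python
from itertools import product

dimensions = ["Strength", "Speed", "Power"]


def matrix_panels(dims):
    """List a (first_variable, second_variable) pair for every grid cell."""
    return list(product(dims, repeat=2))


panels = matrix_panels(dimensions)

# Diagonal cells pair a variable with itself; the rest are true scatter plots
diagonal = [(a, b) for a, b in panels if a == b]
off_diagonal = [(a, b) for a, b in panels if a != b]

print(len(panels), len(diagonal), len(off_diagonal))  # 9 3 6
```

The diagonal cells plot a variable against itself, which is why a univariate chart there would be the more informative choice.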

Scatter with Marginal Plots

That was neat, but this next one I really like for its simplicity. The idea is you can add any of the univariate plots we’ve covered to the margins of a scatter plot.

    px.scatter(
        data_frame=heroes[~heroes.Gender.isna()],
        x="Strength",
        y="Speed",
        color="Alignment",
        title='Strength vs Speed | by Alignment',
        marginal_x='histogram',
        marginal_y='box',
        hover_name='Name',
        opacity=0.2,
        template='seaborn',
    )

Whew! That…was a lot. But I think it’s a good start on creating a quick reference to the more common plotting techniques, all using Plotly Express. I’m really digging what I’ve seen so far (everything we’ve done in this post is technically a one-liner!) and look forward to future updates!

Thanks for reading.

Sources:

https://www.plotly.express/

plotly. Introducing Plotly Express, 20 Mar 2019, https://medium.com/@plotlygraphs/introducing-plotly-express-808df010143d. Accessed 11 May 2019.