PyViz may have an answer

Python’s Current Visualisation Landscape

The existing Python data visualisation ecosystem appears to be a confusing mesh.
Choosing the best tool for a given job from amongst all of these is tricky.
PyViz tries to fill this gap.
It helps to streamline the process of working with small and large datasets (from a few points to billions) in a web browser, whether doing exploratory analysis, making simple widget-based tools or building full-featured dashboards.

PyViz Ecosystem

PyViz is a coordinated effort to make data visualization in Python easier to use, easier to learn and more powerful.
It consists of a set of open-source Python packages for working with both small and large datasets right in the web browser.
PyViz is a good fit for anything from simple exploratory data analysis (EDA) to building a widget-enabled dashboard.
Here is Python’s visualisation landscape with PyViz.

PyViz Goals

Some of the important goals of PyViz are:

- The emphasis should be on data of any size, not on coding.
- Full functionality and interactivity should be available right in the browser (not on the desktop).
- The focus should be on Python users rather than web programmers.
- The focus should be on 2D visualisation more than 3D.
- General-purpose SciPy/PyData tools that Python users are already familiar with should be exploited.
Also, objects from nearly every other plotting library can be used with Panel, including specific support for all those listed here plus anything that can generate HTML, PNG, or SVG.
HoloViews also supports Plotly for 3D visualizations.

Resources

PyViz provides examples, demos and training materials documenting how to solve visualization problems.
This tutorial provides starting points for solving your own visualization problems.
The entire tutorial material is also hosted in their GitHub repository.
PyViz Tutorial — PyViz 0.13a3 documentation: How to solve visualization problems with Python tools.

Installation

Please consult pyviz.org for full instructions on installation of the software used in these tutorials.
Here is the condensed version of those instructions, assuming you have already downloaded and installed Anaconda or Miniconda:

    conda create -n pyviz-tutorial python=3.6
    conda activate pyviz-tutorial
    conda install -c pyviz/label/dev pyviz
    pyviz examples
    cd pyviz-examples
    jupyter notebook

Once everything is installed, the following cell should print ‘1.0a4’ or later:

    import holoviews as hv
    hv.extension('bokeh', 'matplotlib')
    # should see the HoloViews, Bokeh, and Matplotlib logos

    # Import necessary libraries
    import pandas
    import datashader
    import dask
    import geoviews
    import bokeh

If it completes without errors your environment should be ready to go.

Exploring Data with PyViz

In this section, we will see how different libraries bring out different insights from data, and how using them together helps to analyse data more effectively.

Dataset

The dataset used here records the number of cases of measles and pertussis per 100,000 people over time in each state of the US.
The dataset comes pre-installed with the PyViz tutorial.

Data Exploration with Pandas

In any data science project, it is natural to begin the exploration with pandas.
Let us import and display the first few rows of our dataset.

    import pandas as pd
    diseases_data = pd.
    head()

Numbers are good, but a plot would give us a better idea of the patterns in the data.

Data Exploration with Matplotlib

    %matplotlib inline
    diseases_data.plot();

This doesn’t convey much.
Let’s do some manipulations with pandas to get meaningful results.
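
As a rough sketch of the kind of manipulation meant here, the snippet below groups a small, made-up table by year and sums the measles column. The values and the `toy_data` and `toy_by_year` names are invented for illustration and are not taken from the tutorial dataset; only the `Year` and `measles` column names come from the article.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial data: a few records per year.
toy_data = pd.DataFrame({
    "Year": [1960, 1960, 1961, 1961, 1970, 1970],
    "measles": [120.0, 80.0, 150.0, 50.0, 3.0, 2.0],
})

# Collapse the records to one value per year by summing the measles column.
toy_by_year = toy_data.groupby("Year")["measles"].sum()
print(toy_by_year)
```

Summing per year turns many noisy rows into a single series indexed by year, which is the shape a line plot over time needs.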

    import numpy as np
    diseases_by_year = diseases_data[["Year","measles"]].
    sum)
    diseases_by_year();

This makes much more sense.
Here we can clearly see that around 1970 something brought the rate of measles down to almost nil.
This is consistent with the introduction of measles vaccines in the US around 1963 [Wikipedia].

Data Exploration with HVPlot and Bokeh

The plots above convey the right information but provide no interactivity.
This is because they are static plots, without the pan, hover or zoom functionality available in a web browser.
However, we can get this interactivity simply by importing the hvplot package.

    hvplot();

What is returned by the call is a HoloViews object (here a HoloViews Curve), which displays as a Bokeh plot.
HoloViews plots are much richer and make it easy to capture your understanding while exploring the data.
Let’s see what else can be done with HoloViews.

Capturing important points on the plot itself

1963 was an important year for measles, so let us record this point on the graph itself.
This will also help us to compare the number of measles cases before and after the vaccine was introduced.

    import holoviews as hv
    vline = hv.
    options(color='red')
    vaccination_introduced = diseases_by_year.hvplot() * vline * hv.Text(1963, 27000, "Measles Vaccine Introduced", halign='left')
    vaccination_introduced

HoloViews objects preserve the original data, as opposed to other plotting libraries.
For instance, it is possible to access the original data in tabular format.

    head()

Here we were able to use the same data that was used for making the plot.
Also, it is now very easy to break the data down in many different ways.

    measles_agg = df.
    sum()
    by_state = measles_agg.hvplot('Year', groupby='State', width=500, dynamic=False)
    by_state * vline

Instead of a dropdown, we can place charts side by side for better comparison.

    relabel('Alabama') + by_state["Florida"].relabel('Florida')

We can also change the type of plot, say to a bar chart.
Let us compare the measles pattern from 1980 to 1985 across four states.

    states = ['New York', 'Alabama', 'California', 'Florida']
    measles_agg.
    bar('Year', by='State', rot=90)

It is quite evident from the examples above that by choosing HoloViews and Bokeh plots, we get the ability to explore data in the browser itself, with full interactivity and minimal code.
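
The year-and-state comparison described above relies on first narrowing the aggregated data to a range of years and a handful of states. The following is a minimal sketch of that selection step using plain pandas on invented numbers; the `toy`, `toy_agg` and `subset` names and all values are assumptions made for illustration, and the plotting call itself is left out.

```python
import pandas as pd

# Hypothetical records; the column names mirror those used in the article.
toy = pd.DataFrame({
    "Year": [1979, 1980, 1980, 1983, 1985, 1985, 1986],
    "State": ["Alabama", "Alabama", "Florida", "Texas",
              "Florida", "Alabama", "Florida"],
    "measles": [5.0, 4.0, 7.0, 9.0, 1.0, 2.0, 3.0],
})

states = ["Alabama", "Florida"]

# Total cases per (Year, State) pair.
toy_agg = toy.groupby(["Year", "State"])["measles"].sum()

# Keep only the years 1980-1985 and the chosen states.
years = toy_agg.index.get_level_values("Year")
names = toy_agg.index.get_level_values("State")
mask = years.isin(list(range(1980, 1986))) & names.isin(states)

# Move the states into columns: one row per year, one column per state.
subset = toy_agg[mask].unstack("State")
print(subset)
```

A table with years as rows and states as columns is a convenient input for a grouped bar chart, since each column becomes one bar series.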

Visualising large datasets with PyViz

PyViz also makes it easy to work with very large datasets.
For such datasets, other members of the PyViz suite come into the picture:

- GeoViews
- Datashader
- Panel
- Param
- Colorcet for perceptually uniform colormaps for big data

To show the capabilities of these libraries when handling large volumes of data, let’s work with the NYC taxi dataset, which contains data on 10 million taxi trips.
Again, this data is already provided in the tutorial.

    # Importing the necessary libraries
    import dask.dataframe as dd, geoviews as gv, cartopy.crs as crs
    from colorcet import fire
    from holoviews.
    datashader import datashade
    from geoviews.tile_sources import EsriImagery

Dask is a flexible library for parallel computing in Python.
A Dask DataFrame is a large parallel DataFrame composed of many smaller Pandas DataFrames, split along the index.
These Pandas DataFrames may live on disk for larger-than-memory computing on a single machine, or on many different machines in a cluster.
One Dask DataFrame operation triggers many operations on the constituent Pandas DataFrames.
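
The partitioning idea in the paragraph above can be imitated with plain pandas. The sketch below is not how Dask is implemented and does not use Dask at all; it only illustrates, on an invented six-row frame, how one logical operation turns into one operation per partition plus a combining step. The `fare` column, the `partition_size` value and all variable names are assumptions made for this example.

```python
import pandas as pd

# A small frame standing in for a dataset too large to process in one piece.
frame = pd.DataFrame({"fare": [5.0, 7.5, 3.0, 12.0, 8.5, 4.0]})

# Split along the index into smaller pandas DataFrames ("partitions").
partition_size = 2  # hypothetical value chosen for this sketch
partitions = [
    frame.iloc[start:start + partition_size]
    for start in range(0, len(frame), partition_size)
]

# One logical operation (a sum) becomes one operation per partition,
# followed by a step that combines the partial results.
partial_sums = [part["fare"].sum() for part in partitions]
total = sum(partial_sums)
print(len(partitions), total)
```

Because each partition is handled independently, the per-partition work could in principle run in parallel or on data loaded from disk one piece at a time, which is the property the paragraph describes.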
Cartopy is a Python package designed for geospatial data processing in order to produce maps and other geospatial data analyses.

    topts = dict(width=700, height=600, bgcolor='black', xaxis=None, yaxis=None, show_grid=False)
    tiles = EsriImagery.options(**topts)
    dopts = dict(width=1000, height=600, x_sampling=0.5)

Reading in and plotting the data:

    taxi = dd.
    persist()
    pts = hv.Points(taxi, ['pickup_x', 'pickup_y'])
    trips = datashade(pts, cmap=fire, **dopts)
    tiles * trips

We can also add widgets to control the selections.
This can be done either in the notebook or in a standalone server, by marking the servable objects with .servable() and then running the .ipynb file through Bokeh Server, or by extracting the code to a separate .py file and doing the same thing:

    import param, panel as pn
    from colorcet import palette

    class NYCTaxi(param.Parameterized):
        alpha = param.
        75, doc="Map tile opacity")
        cmap = param.ObjectSelector('fire', objects=['fire','bgy','bgyw','bmy','gray','kbc'])
        location = param.ObjectSelector(default='dropoff', objects=['dropoff', 'pickup'])

        def make_view(self, **kwargs):
            pts = hv.
            location+'_y'])
            trips = datashade(pts, cmap=palette[self.cmap], **dopts)
            return tiles.
            alpha) * trips

    explorer = NYCTaxi(name="Taxi explorer")
    pn.
    servable()

Conclusion

The PyViz tools help us to create beautiful visualisations with only a small amount of code.
This article is only an introduction to the versatile PyViz ecosystem.
Go through the entire tutorial to understand its intricacies and how it can be used with different types of data.