Matplotlib: An Introduction To Its Object Oriented Interface

Kapil Mathur, Mar 5

I write applications for embedded platforms.

One of my applications is deployed in a network where it receives the data from several network nodes and processes them.

Timing is important here.

My application should be able to process the data from all the nodes within a specified amount of time.

This is a hard constraint.

I rely on matplotlib along with pandas to visualise/analyse the time profiling information for each network packet received.

The graph above is a histogram of the processing time of each received packet.

It was generated using matplotlib.

There were almost 200,000 data points for this experiment.

This graph tells me several important things.

It tells me that most of the time a network packet is processed within 30 ms of its arrival.

It also tells me that there are two peaks, one at 10 ms and the other at around 20 ms.

You can see visualisation is important here and matplotlib does the job nicely.

matplotlib is vast.

It is a very useful plotting tool but sometimes it can be confusing.

I got confused when I first started using it.

matplotlib provides two different interfaces for plotting and we can use either of the two to achieve the results.

This was the primary cause of my confusion.

Whenever I searched the web for any help, I found at least two different ways of doing it.

It was then I decided to dig a little deeper into its interfaces and this tutorial is a result of that.

The focus of this tutorial is narrow: to understand the “Object Oriented Interface”.

We will not start with a large dataset.

Doing that would shift the focus to the dataset itself rather than the matplotlib objects.

Most of the time we will be working on very simple data, as simple as a list of numbers.

In the end we will work on a larger dataset to see how matplotlib can be used in the analysis of a larger dataset.

Matplotlib Interfaces

matplotlib provides two interfaces for plotting:

- MATLAB style plotting using pyplot
- Object Oriented Interface

After my study of matplotlib I decided to use its “Object Oriented Interface”.

I find it easier to use.

Every figure is divided into some objects and the object hierarchy is clear.

We work on objects to achieve the results.

So I will be focusing on the object oriented interface in this tutorial.

Some pyplot functionalities will also be used wherever it is convenient to use them.
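To make the difference between the two interfaces concrete, here is a minimal sketch that draws the same data both ways. The data list, the titles and the non-interactive "Agg" backend (chosen only so the snippet runs without a display) are assumptions for illustration and not part of the original examples.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

data = [1, 2, 3, 4]  # illustrative data

# 1. MATLAB style: pyplot functions act on an implicit "current" figure and axes
plt.figure()
plt.plot(data)
plt.title("pyplot style")
pyplot_axes = plt.gca()  # the axes that pyplot created behind the scenes

# 2. Object oriented style: we hold the figure and axes objects ourselves
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(data)
ax.set_title("object oriented style")
```

Both halves produce the same kind of plot. In the second half every call names the object it acts on, which is the style used in the rest of this tutorial.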

Matplotlib Object Oriented Interface

A Figure in matplotlib is divided into two different objects:

- Figure object
- Axes object

A Figure object can contain one or more axes objects.

Each axes represents one plot inside the figure.

In this tutorial we will be working with the axes object directly for all kinds of plotting.

The Figure Object

    import matplotlib.pyplot as plt

    fig = plt.figure()
    print(type(fig))

The output of the above code snippet is matplotlib.figure.Figure.

plt.figure() returns a Figure object.

The built-in type() function in Python is used to find out the type of an object.

So we have an empty Figure object at this moment.

Let's try plotting it.

    # Give the figure a title
    fig.suptitle("Empty figure")
    plt.show()

Executing the above code returns an empty figure.

I am not including the figure here because it's just empty.

The Axes Object

One or more axes objects are required to get us started with plotting.

There is more than one way to obtain an axes object.

We will start with the add_subplot() method and later will explore other ways.

    ax = fig.add_subplot(1, 1, 1)
    # Set the title of the plot
    ax.set_title("Empty plot")
    plt.show()

The add_subplot(num_rows, num_cols, subplot_location) method creates a grid of subplots of size (num_rows x num_cols) and returns an axes object for the subplot at subplot_location.

The subplots are numbered in the following way:

- The first subplot is at the (first row, first column) location.
- Start from this position and continue numbering till the last column of the first row.
- Start from the leftmost position on the second row and continue numbering.

Ex: The 3rd subplot in a grid of 2×2 subplots is at location = (2nd row, 1st column).

So add_subplot(1, 1, 1) returns an axes object at the 1st location in a 1×1 grid of subplots.

In other words, only one plot is generated inside the figure.

Executing the above code gives us an empty plot with x and y axes.
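The numbering rule described above amounts to simple integer arithmetic. The helper below is a hypothetical illustration (subplot_row_col is not a matplotlib function) that converts a subplot_location into its 1-based row and column.

```python
def subplot_row_col(subplot_location, num_rows, num_cols):
    """Return the 1-based (row, column) of a subplot that is numbered row by row."""
    if not 1 <= subplot_location <= num_rows * num_cols:
        raise ValueError("subplot_location is outside the grid")
    row = (subplot_location - 1) // num_cols + 1
    column = (subplot_location - 1) % num_cols + 1
    return row, column


print(subplot_row_col(3, 2, 2))  # (2, 1): 2nd row, 1st column
print(subplot_row_col(1, 1, 1))  # (1, 1): the only cell of a 1x1 grid
```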

Let us take one more example.

We divide the figure in a 2×2 grid of subplots and get the axes object for all the subplots.
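A minimal sketch of this example could look like the following. The subplot titles, the output file name and the "Agg" backend are assumptions made so the snippet runs without a display; for interactive use, plt.show() can replace the savefig() call.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

fig = plt.figure()

# Call add_subplot() once per cell of the 2x2 grid, locations 1 to 4
axes = []
for location in range(1, 5):
    ax = fig.add_subplot(2, 2, location)
    ax.set_title(f"Subplot {location}")  # illustrative titles
    axes.append(ax)

fig.tight_layout()  # keep titles and tick labels from overlapping
fig.savefig("grid_2x2.png")  # hypothetical file name
```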

The output of the above code is a 2×2 grid of subplots.

Once we get the axes object we can call the methods of the axes object to generate plots.

We will be using the following methods of the axes object in our examples:

- plot(x, y) : Generate y vs x graph
- set_xlabel() : Label for the X-axis
- set_ylabel() : Label for the Y-axis
- set_title() : Title of the plot
- legend() : Generate legend for the graph
- hist() : Generate histogram plot
- scatter() : Generate scatter plot

Please refer to the matplotlib axes class page for more details about the axes class: https://matplotlib.org/api/axes_api.html

Ex1 : A Simple XY Plot

We can plot data using the plot() method of the axes object.

This is demonstrated in the example below.
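A minimal sketch of such an example, assuming a short list of integers as data, could be written as follows. The axis labels, the file name and the "Agg" backend are illustrative assumptions; plt.show() can replace savefig() when a display is available.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

x = list(range(10))  # illustrative data
y = x                # y = x

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

ax.plot(x, y)          # draw y against x
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("y = x")

fig.savefig("ex1_xy_plot.png")  # hypothetical file name
```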

Executing the above code will generate the y = x plot.

Ex2: Multiple Graphs In Same Plot

Let's try generating 2 graphs in a single plot window.

One is for y = x and the other one is for z = x².

This time ax.plot() is called with one additional argument: label.

This sets the label for the graph.

This label is used by the ax.legend() method to generate a legend for the plot.
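The idea described above can be sketched as follows. The data, the label strings and the explicit loc="upper left" are assumptions for illustration (by default matplotlib chooses the legend position itself), as are the "Agg" backend and the file name.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

x = list(range(10))              # illustrative data
y = x                            # y = x
z = [value ** 2 for value in x]  # z = x squared

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Each call to plot() adds one graph to the same axes; label names it for the legend
ax.plot(x, y, label="y = x")
ax.plot(x, z, label="z = x^2")
ax.set_xlabel("X")
ax.set_title("Two graphs in one plot")

# legend() collects the labels given to plot()
legend = ax.legend(loc="upper left")

fig.savefig("ex2_two_graphs.png")  # hypothetical file name
```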

As you can see in the output of the above code, two graphs are generated in a single plot window.

Also a legend is placed at the top left corner.

Ex3: Two Plots In A Figure

We will now generate multiple plots in a figure.

Executing the above code generates the following figure:

Ex4: Histogram Plots

Histogram plots are useful in visualising the underlying distribution of data.

Below is an example of a histogram plot.

Data for this example is generated using numpy.

1000 samples are generated from a gaussian distribution with a mean of 10 and a standard deviation of 0.5.
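A minimal sketch of this histogram example, assuming numpy's default_rng generator, could look like this. The seed, the bin count of 30, the labels, the file name and the "Agg" backend are illustrative choices rather than values from the original.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# 1000 samples from a gaussian with mean 10 and standard deviation 0.5
rng = np.random.default_rng(seed=0)  # the seed is an arbitrary choice
samples = rng.normal(loc=10, scale=0.5, size=1000)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# hist() returns the per-bin counts, the bin edges and the drawn bars
counts, bin_edges, patches = ax.hist(samples, bins=30)
ax.set_xlabel("Sample value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of gaussian samples")

fig.savefig("ex_histogram.png")  # hypothetical file name
```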

The x-axis in the above plot has the sample values and the y-axis is the frequency, that is, the number of samples falling in each bin.

We can observe a peak at value 10.

According to the 3 sigma rule, 99.7% of the samples of a gaussian distribution lie within three standard deviations of the mean.

For this example this range is [8.5, 11.5].

This can also be verified from the above plot.
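The same check can be done numerically. The sketch below, which again assumes numpy's default_rng with an arbitrary seed, computes the three sigma range and the fraction of samples that fall inside it. With only 1000 samples the fraction should come out close to, but not exactly, 99.7%.

```python
import numpy as np

mean, std = 10, 0.5

rng = np.random.default_rng(seed=0)  # the seed is an arbitrary choice
samples = rng.normal(loc=mean, scale=std, size=1000)

# Three standard deviations on either side of the mean
lower, upper = mean - 3 * std, mean + 3 * std

# Count the samples inside [lower, upper] and turn that into a fraction
inside = np.count_nonzero((samples >= lower) & (samples <= upper))
fraction = inside / samples.size

print(f"3 sigma range: [{lower}, {upper}]")
print(f"fraction of samples inside the range: {fraction:.3f}")
```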

Plotting on a larger dataset

We will be working with the “California Housing Price Dataset” in this example.

This dataset is used in the book “Hands-On Machine Learning with Scikit-Learn and TensorFlow” by Aurélien Géron.

This dataset can be downloaded from Kaggle from the link: https://www.kaggle.com/camnugent/california-housing-prices

Each row in the dataset contains data for a block.

A block can be considered a small geographical area.

The dataset has the following columns:

- longitude : Longitude in degrees
- latitude : Latitude in degrees
- housing_median_age : Median age of a house within a block
- total_rooms : Total number of rooms in the block
- total_bedrooms : Total number of bedrooms in the block
- population : Population of the block
- households : Total number of households, a group of people residing within a home unit, for a block
- median_income : Median income for households in a block (measured in tens of thousands of US Dollars)
- median_house_value : Median house value for households within a block (in USD)
- ocean_proximity : Location of the house w.r.t. ocean/sea

Let us generate some plots to learn certain things about the dataset.

I am interested in knowing the following things about the dataset:

- Distribution of “median_house_value”
- Distribution of “median_income”

My common sense tells me that houses should be costly in places where income is high, and vice versa.

Also, the number of rooms in a block should be higher in places where the population is high.

Let's try to figure this out by generating some plots.

Pyplot subplots() method: We will use the pyplot subplots() method in this example to get the axes objects.

We have seen that add_subplot() method returns only one axes object at a time.

So add_subplot() method needs to be called for each subplot inside figure.

The pyplot subplots() API solves this problem.

For a grid with more than one subplot, it returns a NumPy ndarray of axes objects.

Generating plots using the axes objects is the same as explained in earlier examples.

I have used the Python pandas library to read the data from the dataset.

The dataset is a csv file with the name ‘housing.csv’.

plt.subplots(2, 2) returns a figure object and a 2D array of axes objects of size 2×2.

The axes object for an individual subplot can be accessed by array indexing over the 2D array of axes objects.
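A minimal sketch of how the four plots could be produced with plt.subplots() is given below. The function name plot_housing_overview is hypothetical. The figure size, bin count, marker size and the choice of which column goes on which axis are assumptions, as is the "Agg" backend, which only lets the snippet run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt


def plot_housing_overview(housing):
    """Draw two histograms and two scatter plots on a 2x2 grid of axes.

    `housing` is anything indexable by column name, e.g. a pandas DataFrame.
    """
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))

    # First row: distributions of house value and income
    axes[0, 0].hist(housing["median_house_value"], bins=50)
    axes[0, 0].set_title("median_house_value")
    axes[0, 1].hist(housing["median_income"], bins=50)
    axes[0, 1].set_title("median_income")

    # Second row: relationships between pairs of columns
    axes[1, 0].scatter(housing["population"], housing["total_rooms"], s=2)
    axes[1, 0].set_xlabel("population")
    axes[1, 0].set_ylabel("total_rooms")
    axes[1, 1].scatter(housing["median_income"], housing["median_house_value"], s=2)
    axes[1, 1].set_xlabel("median_income")
    axes[1, 1].set_ylabel("median_house_value")

    fig.tight_layout()
    return fig, axes


# Usage with the real dataset (requires pandas and housing.csv):
#   import pandas as pd
#   fig, axes = plot_housing_overview(pd.read_csv("housing.csv"))
#   fig.savefig("housing_overview.png")
```

Returning the figure and the array of axes lets the caller adjust individual subplots afterwards, for example axes[0, 0].set_xlabel("USD").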

The first plot has a nice gaussian-like distribution except at the end.

This plot tells us that the mean of the “median_house_value” lies somewhere between 100,000 and 200,000 USD.

The upper cap is at 500,000 USD.

Also, there is a surprisingly high number of houses priced at around 500,000 USD.

The second plot also has a nice distribution.

It tells us that the mean of the median income is somewhere between 20,000 and 40,000 USD.

Also, there are very few blocks with a median income above 80,000 USD.

The third plot (population vs total_rooms) confirms that the number of rooms is higher in places where the population is higher.

The fourth plot (median_income vs median_house_value) confirms our common sense that “median_house_value” should be higher in places where “median_income” is higher, and vice versa.

This is just an example.

More analysis can be done on this dataset but this would be out of scope for this tutorial.

Conclusion

I have provided an introduction to the object oriented interface of matplotlib.

The focus in this tutorial was to explain the Figure and axes objects and their relationship.

I will try to come up with a post where I do a complete analysis of a dataset using pandas, matplotlib and numpy.

All the examples of this tutorial can be downloaded from my github gist account: https://gist.github.com/kapil1987