Data Visualization With Matplotlib Using Python
Data visualization using Python
Pramod Chandrayan
Jun 19

I Feel:
In today’s digital world, data has become as important as air.
Machines and humans alike are constantly breathing data in and out.
People are consuming and generating huge volumes of data knowingly and unknowingly on a daily basis.
It is this bombardment of digital information that businesses are trying to tap and harness to sell more and engage their customers better.
Industries of all types are bringing a personal touch into their services and offerings to give their customers a great user experience.
All of this has become possible thanks to powerful data-science-enabled AI/ML techniques, which empower our machines to make analytical decisions based on the sea of data accessible to them.
To analyze these huge data sets, our machines rely on some really powerful data visualization packages built in Python.
In this series on data visualization using Python, which we will break into several parts, we will try to cover:
1. What is data visualization?
2. What are the data visualization packages?
3. How to use them?
4. Why should you learn them?
Data Visualization In Data Science:
As we know, the human mind is trained to understand more through images.
So the saying goes, “A picture is worth a thousand words”.
This is completely relevant when you are learning data science.
You will be dealing with large data sets, which need visual expression before you can deduce the valuable patterns hidden in them.
Data visualization is a technique in the data science field that lets you tell a compelling story by presenting data and findings in an approachable and stimulating way.
It makes complex data look simple and easy to understand.
Data Visualization Tools:
We will try to cover some of the popular data visualization tools given below:
- Matplotlib
- Seaborn
- Plotly
- Pandas
Learning how to leverage these software tools will help you make sense of data, extract meaningful information, and plot it visually to make more effective data-driven decisions.
So let’s get started with Matplotlib, which we cover in today’s article; the rest will follow in upcoming parts of this data visualization series.
A: Matplotlib:
As per the official Matplotlib portal:
Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.
Matplotlib is a widely used data visualization tool. It works at a low level with a MATLAB-like interface and gives you a lot of flexibility in how you write your code. Writing more code can sometimes be tedious, but it is worth it for the freedom you get.
Installing Matplotlib:
1. Using pip:
    python -m pip install -U pip
    python -m pip install -U matplotlib
2. Using a scientific Python distribution:
There are many third-party scientific distributions, such as:
- Anaconda
- Canopy
- ActiveState
Anaconda is my personal favorite. It is one of the most popular Python data science distributions; it gives you hassle-free installation of data science packages and comes pre-loaded with NumPy, SciPy, Pandas, Matplotlib, Plotly, etc.
I recommend installing it, and you will be all set in a short time.
You can install any package from the conda command prompt/terminal, though you may need to visit the package’s official site to get the exact command format:
    conda install PackageName
For Matplotlib:
    conda install matplotlib
The various types of data visualization Matplotlib provides are:
- Lines, bars and markers
- Images, contours & fields
- Pie & polar charts
- Statistical plotting
- & many more.
They are widely used for line charts, bar charts, histograms, pie charts, etc.
For details, visit the gallery section via the link below:
Gallery – Matplotlib 3.0 documentation (matplotlib.org)
Plotting With Matplotlib: Let’s Learn By Examples:
As discussed, Matplotlib supports various kinds of plots, ranging from scatter plots to bar charts to histograms.
The choice is contextual and depends on your data visualization requirements, such as comparing groups, comparing two quantitative variables with each other, or understanding a data distribution.
We will cover a few popular plotting techniques here.
Basic Requirements:
Before we get our hands dirty with some real examples, we need a few installations in place.
Install Anaconda Distribution:
First, ensure Anaconda is installed. Use the link below to learn the installation process. It is easy and you can get started quickly:
Installation – Anaconda 2.0 documentation
Launch Jupyter Notebook:
Once you are done installing the Anaconda distribution, open Anaconda Navigator on your computer and launch Jupyter Notebook.
We will be using Jupyter Notebook to code our examples.
Check for the Prerequisite Package Installation:
In Anaconda Navigator, go to the Environments menu option and you will see the pre-installed packages on the right.
Search for Pandas and you will see that it is pre-installed; similarly, you can type in any required package and, if it is not already installed, install it through Anaconda Navigator.
Check that matplotlib, numpy, pandas, seaborn, etc. are installed, and install any that are missing.
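If you prefer to check from a notebook cell instead of Anaconda Navigator, something like the sketch below may help. It only uses Python's standard library; the helper name check_packages and the REQUIRED list are my own illustrative choices, not part of Anaconda or Matplotlib.

```python
import importlib.util

# Package names mentioned in this article; extend the list as needed.
REQUIRED = ["matplotlib", "numpy", "pandas", "seaborn"]


def check_packages(names):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'MISSING'}")
```

Any package reported as MISSING can then be installed with conda or pip as described earlier.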
Once you are done installing the required packages, let’s get started with our first plot, the bar chart.
Some Key Points About Matplotlib To Remember:
- Matplotlib has an important module called pyplot, which helps in plotting figures.
- Jupyter Notebook can be used for running the plots; it gives a hassle-free experience and is easy to get started with.
- Import matplotlib.pyplot as plt to call the module’s functions.
- Import the required libraries, and load the dataset to plot using Pandas pd.read_csv().
- Use plt.plot() to plot a line chart; for other chart types, other functions are used in place of plot.
- All plotting functions require data, which is passed to the function through parameters.
- plt.xlabel, plt.ylabel for labeling the x- and y-axis respectively.
- plt.xticks, plt.yticks for labeling the x- and y-axis tick points respectively.
- plt.legend() for identifying the plotted variables.
- plt.title() for setting the title of the plot.
- plt.show() for displaying the plot.
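To see how these pieces fit together, here is a minimal sketch of a line chart that touches each of the functions listed above. The month and sales numbers are made up purely for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data (made up for illustration).
months = [1, 2, 3, 4, 5, 6]
sales = [12, 15, 11, 18, 21, 19]

fig, ax = plt.subplots()

# Data is passed to the plotting function through its parameters.
plt.plot(months, sales, marker="o", label="Monthly sales")

plt.xlabel("Month")        # x-axis label
plt.ylabel("Units sold")   # y-axis label
plt.xticks(months, ["Jan", "Feb", "Mar", "Apr", "May", "Jun"])  # tick positions and their labels
plt.yticks(range(0, 26, 5))
plt.legend()               # uses the label passed to plt.plot
plt.title("Sales per month")
```

In a script, finish with plt.show() to open the figure window; in Jupyter the figure is rendered when the cell runs.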
1. Bar Chart Plotting:
Bar Plotting Example:
    # Here we import the matplotlib package with the alias plt
    import matplotlib.pyplot as plt

    plt.bar([1, 3, 5, 7, 9], [5, 2, 7, 8, 2], label="Example one")
    plt.bar([2, 4, 6, 8, 10], [8, 6, 2, 5, 6], label="Example two", color='g')
    plt.legend()  # show the labels passed to plt.bar above
    plt.title('Wow! We Got Our First Bar Graph')
    plt.show()
Copy the above code into your Jupyter notebook and run it, and you will see the bar plot.
Explanation:
After we import the matplotlib data visualization package, its submodule pyplot provides the bar method, which plots a basic bar graph. The plt.bar method can be better understood from its signature:
    matplotlib.pyplot.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
To make a bar plot: the bars are positioned at x with the given alignment.
Their dimensions are given by width and height.
The vertical baseline is bottom (default 0).
Each of x, height, width, and bottom may either be a scalar applying to all bars, or it may be a sequence of length N providing a separate value for each bar.
For more detail, visit:
matplotlib.pyplot.bar – Matplotlib 3.0 documentation (matplotlib.org)
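The scalar-versus-sequence rule for width and bottom is easiest to see with stacked bars. The sketch below is one possible illustration, using made-up numbers; it calls ax.bar, the Axes method that takes the same parameters as plt.bar.

```python
import matplotlib.pyplot as plt

# Hypothetical figures, made up purely for illustration.
x = [1, 2, 3, 4]
online = [5, 7, 6, 8]
in_store = [3, 2, 4, 1]

fig, ax = plt.subplots()

# width is a scalar here, so it applies to every bar.
lower = ax.bar(x, online, width=0.6, label="Online")

# bottom is a sequence of length N: each in-store bar starts
# where the matching online bar ends, which stacks the two series.
upper = ax.bar(x, in_store, width=0.6, bottom=online, label="In store")

ax.set_xticks(x)
ax.legend()
ax.set_title("Stacked bars using the bottom parameter")
```

Passing a single number as bottom would instead lift every bar by the same amount.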
2. Histogram:
A histogram is a plot of the frequency distribution of a numeric array, obtained by splitting it into small equal-sized bins.
Histograms are used to estimate the distribution of the data, with the frequency of values assigned to a value range called a bin.
If you want to mathematically split a given array into bins and frequencies, use NumPy’s histogram() function.
If you want to visualize the distribution of numeric values, you can use the .hist() plot method to create a simple histogram.
Matplotlib provides the functionality to visualize Python histograms out of the box with a versatile wrapper around NumPy’s histogram():
Example:
    # Histogram code
    import matplotlib.pyplot as plt
    import numpy as np  # importing numpy package for array generation

    np.set_printoptions(precision=3)
    d = np.random.laplace(loc=15, scale=3, size=500)
    print(d[:5])

    # An "interface" to matplotlib's hist() method
    # (alpha sets the bar opacity; 0.7 is an assumed value)
    n, bins, patches = plt.hist(x=d, bins='auto', color='#0504aa', alpha=0.7)
    plt.title('My First Histogram Ever')
    plt.text(23, 45, r'$\mu=15, b=3$')
    maxfreq = n.max()
    # Set a clean upper y-axis limit.
    plt.ylim(top=np.ceil(maxfreq / 10) * 10 if maxfreq % 10 else maxfreq + 10)
Explanation:
The pyplot.hist() function in matplotlib lets you draw the histogram.
It requires the array as input, and you can optionally specify the number of bins.
A plot of a histogram uses its bin edges on the x-axis and the corresponding frequencies on the y-axis.
In the code above, passing bins='auto' chooses between two algorithms to estimate the “ideal” number of bins.
At a high level, the goal of the algorithm is to choose a bin width that generates the most faithful representation of the data.
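To look at the bins and frequencies without drawing anything, you can call NumPy directly. The sketch below assumes the same kind of Laplace-distributed sample as the example above (the seed is arbitrary) and compares the bin count picked by 'auto' with the two estimators that the NumPy documentation says it draws on, 'sturges' and 'fd' (Freedman-Diaconis).

```python
import numpy as np

rng = np.random.default_rng(444)  # arbitrary fixed seed, for reproducibility only
d = rng.laplace(loc=15, scale=3, size=500)

# np.histogram returns the frequency per bin and the bin edges.
counts, edges = np.histogram(d, bins="auto")

# Bin counts suggested by the two individual estimators, for comparison.
n_sturges = len(np.histogram_bin_edges(d, bins="sturges")) - 1
n_fd = len(np.histogram_bin_edges(d, bins="fd")) - 1

print("bins chosen by 'auto':", len(counts))
print("bins from 'sturges': ", n_sturges)
print("bins from 'fd':      ", n_fd)
print("every value is counted once:", counts.sum() == d.size)
```

Note that there is always one more edge than there are bins, because each bin needs a left and a right boundary.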
(Figure: output of the histogram code above.)
3. Scatter Plot:
A scatter plot is a type of plot or mathematical diagram that uses Cartesian coordinates to display values for, typically, two variables in a set of data.
If the points are coded (color/shape/size), one additional variable can be displayed.
The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.
A scatter plot can suggest various kinds of correlations between variables with a certain confidence interval.
For example, to compare weight and height, weight would go on the y-axis and height on the x-axis.
Correlations may be positive (rising), negative (falling), or null (uncorrelated).
If the pattern of dots slopes from lower left to upper right, it indicates a positive correlation between the variables being studied.
If the pattern of dots slopes from upper left to lower right, it indicates a negative correlation.
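The three cases can be sketched with synthetic data. Everything below is made up for illustration: the slopes, the noise level and the helper name pearson_r are my own choices, and np.corrcoef is used to put a number on what the eye sees.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # arbitrary seed
x = rng.uniform(0, 10, size=200)
noise = rng.normal(0, 1, size=200)

y_rising = 2 * x + noise                   # dots slope from lower left to upper right
y_falling = -2 * x + noise                 # dots slope from upper left to lower right
y_unrelated = rng.normal(0, 1, size=200)   # no relationship with x


def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
cases = [("Positive", y_rising), ("Negative", y_falling), ("Uncorrelated", y_unrelated)]
for ax, (name, y) in zip(axes, cases):
    ax.scatter(x, y, s=10)
    ax.set_title(f"{name} (r = {pearson_r(x, y):.2f})")
```

A coefficient near +1 or -1 matches a clearly rising or falling cloud of points, while a value near 0 matches a shapeless one.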
Scatter plot method format:
    matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, verts=None, edgecolors=None, *, plotnonfinite=False, data=None, **kwargs)
x, y : array_like, shape (n, )
    The data positions.
s : scalar or array_like, shape (n, ), optional
    The marker size in points**2. Default is rcParams['lines.markersize'] ** 2.
c : color, sequence, or sequence of color, optional
For more detail about using the scatter plot method, please refer to the link below:
matplotlib.pyplot.scatter – Matplotlib 3.0 documentation (matplotlib.org)
Scatter Plot Example:
    # scatter plot example using matplotlib
    import numpy as np
    import matplotlib.pyplot as plt

    # Create data
    N = 100
    x = np.random.rand(N)
    y = np.random.rand(N)
    colors = (0, 100 / 255, 1.0)  # one RGB color; matplotlib expects channel values in the 0-1 range
    area = np.pi * 3  # marker size in points**2 (assumed value)

    plt.scatter(x, y, s=area, color=colors, alpha=0.5)  # alpha (opacity) is an assumed value
    plt.title('Scatter plot example using matplotlib')
    plt.show()
Run the code in your Jupyter notebook and you will see the resulting scatter plot.
Understanding Data Visualization Through Real Data Sets:
We will be using the automobile data set, which we have downloaded from Kaggle, to understand data visualization using Matplotlib:
Automobile Dataset (Kaggle): the dataset consists of various characteristics of an automobile.
Always Remember:
- Download the Automobile.csv file from the above link.
- Upload the file through Jupyter into the working directory where your current code files live.
Plotting a Histogram: Grouping Data Categorically:
We can have multiple histograms in the same plot.
This helps you compare the distribution of a continuous variable grouped by different categories.
To understand it, we will use the Automobile.csv data set.
Reading the Data Set:
    import pandas as pd

    # Reading data from the automobile data set using the pandas read method
    df = pd.read_csv('Automobile.csv')
    df.head()
When you run this code, you will see the first rows of the data set displayed column-wise.
Let’s compare the distribution of car horsepower for different car makes in the Automobile.csv data set.
Write or copy-paste the code below into your Jupyter notebook:
    import matplotlib.pyplot as plt
    # if you don't want to call plt.show() every time, use this line
    %matplotlib inline

    x1 = df.loc[df.make == 'alfa-romero', 'horsepower']
    x2 = df.loc[df.make == 'audi', 'horsepower']
    x3 = df.loc[df.make == 'bmw', 'horsepower']
    x4 = df.loc[df.make == 'toyota', 'horsepower']
    x5 = df.loc[df.make == 'volvo', 'horsepower']

    kwargs = dict(alpha=0.5)  # shared histogram options; the opacity value is assumed

    plt.hist(x1, **kwargs, color='g', label='alfa-romero')
    plt.hist(x2, **kwargs, color='b', label='audi')
    plt.hist(x3, **kwargs, color='r', label='bmw')
    plt.hist(x4, **kwargs, color='y', label='toyota')
    plt.hist(x5, **kwargs, color='c', label='volvo')  # distinct color so volvo is not confused with toyota
    plt.gca().set(title='Horsepower variation for various car makes', ylabel='Frequency')
    plt.legend();
Running the code produces a histogram plotted from these values.
You can clearly make out that the largest concentration of horsepower lies between 110–120 hp.
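Writing one variable per make gets repetitive as the number of categories grows. One alternative, sketched below, is to loop over a pandas groupby. The small DataFrame is a hypothetical stand-in for Automobile.csv with made-up values; only the column names make and horsepower are taken from the code above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for Automobile.csv (values are made up).
df = pd.DataFrame({
    "make": ["audi", "audi", "audi", "bmw", "bmw", "bmw", "volvo", "volvo", "volvo"],
    "horsepower": [102, 115, 110, 101, 121, 182, 114, 160, 162],
})

fig, ax = plt.subplots()
group_sizes = {}
for make, group in df.groupby("make"):
    # One semi-transparent histogram per category, all on the same axes.
    ax.hist(group["horsepower"], bins=5, alpha=0.5, label=make)
    group_sizes[make] = len(group)

ax.set(title="Horsepower by make (illustrative data)",
       xlabel="Horsepower", ylabel="Frequency")
ax.legend()
```

Matplotlib picks a different color for each call automatically, so the makes stay distinguishable without listing colors by hand.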
Scatter Plot:
Let’s plot a data distribution using a scatter plot.
Here we will look at how price is distributed across the body_style of the cars.
Copy and paste the code below into your Jupyter notebook and run it:
    # Scatter plot
    import matplotlib.pyplot as plt
    %matplotlib inline
    import pandas as pd

    df = pd.read_csv('Automobile.csv')
    bodystyle = df['body_style']  # fetching body type values
    price = df['price']  # fetching price for different body types
    plt.scatter(bodystyle, price, edgecolors='r')
    plt.title('Price variation based on car body type')
Observation:
You can see that there is a lot of data density around sedan-type cars, whose prices mostly fall in the budget range of $10K to $15K.
The second most common car body type turns out to be the hatchback.
Wagon-type cars mostly fall in the low-budget range.
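Observations like these can be double-checked with a quick numeric summary. The sketch below counts the cars and takes the median price per body style; the DataFrame is a hypothetical stand-in with made-up prices, and only the column names body_style and price come from the code above.

```python
import pandas as pd

# Hypothetical stand-in for Automobile.csv (values are made up).
df = pd.DataFrame({
    "body_style": ["sedan", "sedan", "sedan", "sedan",
                   "hatchback", "hatchback", "hatchback",
                   "wagon", "wagon"],
    "price": [10500, 12900, 14800, 21000, 6500, 7900, 9200, 8900, 11200],
})

# Number of cars and median price for each body style, most common first.
summary = (
    df.groupby("body_style")["price"]
      .agg(count="count", median_price="median")
      .sort_values("count", ascending=False)
)
print(summary)
```

The count column backs up statements about which body type is most common, and the median gives a typical price that is less sensitive to a few expensive cars than the mean.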
What’s Next:
There are more plots which we have not covered yet, like:
- Violin plot
- Stacked plot
- Stem plot
- Line plot
- Box plot
We will cover these in part 2 of this series on data visualization.
We will also cover data visualization using the Seaborn package in detail.
When to Use What Type Of Data Visualization Plots/Charts?
I leave you with a pictorial guide to data visualization graph types, which explains what type of graph you can choose based on your data analysis requirements.
Summing Up:
I strongly recommend that all software engineers who want to take advantage of the opportunities this field of data engineering is poised to offer add an understanding of data science to their skill set.
With data engineering augmented by AI/ML techniques, you can grow fast and become an instrument of change for your organization or your own startup.
The future will be all about data analysis, data prediction, product recommendations and process automation; all of these will require a lot of data engineers who can help organizations make accurate, fast and intelligent decisions about their services and product offerings.
So stay tuned and keep checking for my new articles; I will be constantly sharing useful Data Science/AI/ML material for all you awesome readers of @Medium.
Thanks a lot for being there and supporting.