Pretty displaying tricks for columnar data in Python
Improve how Python and its libraries show data and make it readable
Aadarsh Vadakattu
May 22
If you have ever wrangled data extensively with lists, Pandas, or NumPy, you have probably run into trouble printing that data in a readable way.
This is especially true when there are many columns, which makes displaying the data a hassle.
This article shows you how to print large columnar data in Python in a readable way.
To explain clearly, I am using the NYC Property sales data, which has a total of 21 columns.
This is what happens if you have a Pandas DataFrame with many columns and try to print it with a regular print statement:
import pandas as pd
nycdata = pd.read_csv("…")
print(nycdata.head())
[Figure: Data is omitted from printing]
This happens because Pandas detects how many columns fit in your terminal window and hides the rest, which is not what we want.
Most of the columns are replaced by an ellipsis (...) to save space on your screen.
Here are a few ways to fix this.
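If you do not have the NYC file at hand, you can reproduce the truncation with a made-up wide DataFrame. The sketch below assumes a hypothetical 21-column frame named `wide` (columns `col_0` to `col_20`) and forces a 10-column limit with `pd.option_context`, so the result does not depend on your terminal size. The real default limit depends on your pandas version and environment, which is why the sketch just prints it.

```python
import pandas as pd

# Hypothetical stand-in for the 21-column NYC Property Sales data:
# 21 integer columns named col_0 ... col_20, three rows each.
wide = pd.DataFrame({f"col_{i}": [i * 100 + r for r in range(3)] for i in range(21)})

# Inspect the current limits (defaults vary by pandas version and environment).
print(pd.get_option("display.max_columns"))
print(pd.get_option("display.width"))

# Simulate a display limit of 10 columns: pandas keeps the first five and
# the last five columns and puts a "..." column in place of the rest.
with pd.option_context("display.max_columns", 10):
    truncated = repr(wide)
print(truncated)
```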
The GOOD trick
You can raise the maximum number of columns Pandas displays by adding this line to your code:
pd.options.display.max_columns = None
This removes the column limit entirely, so every column is printed.
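A minimal sketch of the same setting, again using a hypothetical 21-column frame `wide` in place of the NYC data. It also shows the equivalent `pd.set_option` call and how to put the default back with `pd.reset_option` once you are done exploring.

```python
import pandas as pd

# Hypothetical 21-column frame standing in for the NYC data.
wide = pd.DataFrame({f"col_{i}": [i * 100 + r for r in range(3)] for i in range(21)})

# Attribute form of the setting...
pd.options.display.max_columns = None
# ...which is equivalent to the function form:
pd.set_option("display.max_columns", None)

full = repr(wide)  # no "..." placeholder column any more
print(wide)

# Restore the default limit when you no longer need it.
pd.reset_option("display.max_columns")
```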
Here is how it looks when printed (showing only the first 11 columns for now):
[Figure: Print result after removing max column limit]
But that is not quite what we need.
The data is split across multiple blocks of lines, and each wrapped header line ends with a backslash.
This is readable and can be useful, but it is still not perfect.
To get around this, add this line to your code, and each row of data is printed on a single line instead of being split into blocks:
pd.options.display.width = None
And this is how it looks now (showing only 7 columns):
[Figure: Print result after removing max width limit]
Beware: most code editors do not display wide columnar data well.
Your output can become cluttered and unreadable, and you need to maximize the output window to see all of the data properly.
PyCharm handles this well: it shows a horizontal scrollbar so you can scroll through the printed data.
This would not be the case with a standard Terminal or editors like VSCode.
If you are using a terminal or VSCode, maximize the window and reduce the font size so that as many columns as possible fit on screen and you get a clean view of your data.
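If you would rather not change the settings for the whole session, `pd.option_context` applies both options to a single block only. The sketch below assumes the same kind of hypothetical 21-column frame `wide`; it also uses `DataFrame.to_string()`, whose defaults show every column and do not wrap lines, independent of the column and width display options. Note that in an interactive session pandas may still fall back to the detected terminal width when `display.width` is `None`.

```python
import pandas as pd

# Hypothetical 21-column frame standing in for the NYC data.
wide = pd.DataFrame({f"col_{i}": [i * 100 + r for r in range(3)] for i in range(21)})

# Apply both settings only inside this block; pandas restores the
# previous values automatically when the block exits.
with pd.option_context("display.max_columns", None, "display.width", None):
    print(wide)

# to_string() defaults to max_cols=None and line_width=None:
# all columns, one line per row, no backslash-wrapped blocks.
as_text = wide.to_string()
header = as_text.splitlines()[0]
print(as_text)
```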
This is how the complete data is displayed in Terminal output (VSCode prints it the same way):
[Figure: All columns in Terminal]
And this is how PyCharm displays the same data:
[Figure: PyCharm's scrollbar print]
Whether or not you use PyCharm, and whether or not maximizing the window and shrinking the font works for you, here is a better way to display your data.
The BETTER trick:
You can use a library called “tabulate” to display data in neat columns.
It is as easy as passing the DataFrame to tabulate's tabulate() function and printing the result.
from tabulate import tabulate
print(tabulate(df, headers='keys'))
And this is how tabulate displays your data:
[Figure: Tabulate's print]
To avoid repeating the arguments in every print statement, you can write a small lambda function:
from tabulate import tabulate
pdtabulate = lambda df: tabulate(df, headers='keys')
print(pdtabulate(df))
Even better, you can choose the output format from a variety of formats.
All you need to do is pass the ‘tablefmt’ argument to the tabulate function and set it to the format of your choice.
My favorite is ‘psql’, which mimics the table style of PostgreSQL's psql command-line client.
from tabulate import tabulate
pdtabulate = lambda df: tabulate(df, headers='keys', tablefmt='psql')
print(pdtabulate(df))
And this is how it looks:
[Figure: Feels like SQL!]
If you need to convert the data into HTML tables, tabulate does that easily too.
from tabulate import tabulate
pdtabulate = lambda df: tabulate(df, headers='keys', tablefmt='html')
print(pdtabulate(df))
[Figure: Data printed to HTML Tables]
Tabulate also works very well for large lists and NumPy arrays.
from tabulate import tabulate
pdtabulate = lambda df: tabulate(df, headers='keys', tablefmt='psql')
# Creating a list of rows filled with placeholder values
rows = [['a', 'b', 'c', 'd'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
print(pdtabulate(rows))
And this is how it looks:
[Figure: Normal vs tabulated prints of 2D lists]
Here is a sample NumPy array visualized using tabulate:
[Figure: Normal vs tabulated prints of a simple NumPy array]
For more information about this library and its many print formats, see the tabulate documentation.
These tricks are very helpful, but the next one shows the data in the best way possible.
The BEST trick:
Forget about PyCharm, Terminal and tabulate: use Jupyter notebooks to display your data instead.
Out of the box, Jupyter notebooks behave like the first case: they omit some columns from the display.
[Figure: Data displayed in a Jupyter Notebook, with a few columns omitted]
To get around that, use the same line from the print example to display all columns of your data:
pd.options.display.max_columns = None
And Jupyter shows a perfectly formatted HTML table for you:
[Figure: Data after removing max column limit]
End of the day.
If you are comfortable using Jupyter notebooks regularly, simply setting the maximum number of columns to None will display all of your data at once.
But if you prefer to write code in an editor and would rather not switch to Jupyter notebooks to understand your data, the tabulate library is the best way to go.