1 quick tip for pulling data from a Pandas dataframe using SQL queriesGeorge SeifBlockedUnblockFollowFollowingJun 13Photo by Bruce Hong on UnsplashThe Pandas library is awesome for anyone that uses Python to analyse their data.
It’s very easy to use, quite fast when applied properly, and flexible in its capabilities.
With Pandas, a lot of the functions that would have normally required more work, for example retrieving some basic statistics about your data, are just a single function call away!Still, sometimes we’re more comfortable with one tool over another.
If you’re used to looking through your data on Excel, Tableau, or SQL, then switching to Pandas is still a bit of a jump.
There’s a great Python library for helping smooth over that transition if you’re coming from a background of SQL: Pandasql.
Pandasql allows you to write SQL queries for querying your data from a pandas dataframe.
This allows you to get around the normal requirement of having to learn a lot of Python in Pandas.
Instead, you can simply write your regular SQL query within a function call and run it on a Pandas dataframe to retrieve your data!Querying Pandas dataframes with PandasqlInstallingWe can install the Pandasql with a quick pip:pip install pandasqlLoading dataLet’s start working with an actual dataset.
We’ll load in the iris flowers dataset using the seaborn library:import pandasqlimport seaborn as snsdata = sns.
load_dataset('iris')SELECT-ingNormally, if we wanted to retrieve the first 20 items in our dataframe, we’d do something like this with pandas:data.
head(20)With pandasql we can write out our standard SQL query in the exact same way we normally do when running it on an SQL database.
Just pass the name of the pandas dataframe as the name of the table you are parsing and the data will be retrieved:sub_data = pandasql.
sqldf("SELECT * FROM data LIMIT 20;", globals())print(sub_data)The regular filtering operations that we can do with WHERE in SQL are also applicable.
Let’s pull all the data where petal_length is greater than 5 using pandas first:sub_data = data[data["petal_length"] > 5.
0]To do it in SQL, wecertain rows simply add our WHERE call to do the same filtering:sub_data = pandasql.
sqldf("SELECT * FROM data WHERE petal_length > 5.
0;", globals())We can of course, always select only the columns we want as well:sub_data = pandasql.
sqldf("SELECT petal_width, petal_length FROM data WHERE petal_length > 5.
0;", globals())And that’s how you can use SQL queries to retrieve data from a pandas dataframe.
Like to learn?Follow me on twitter where I post all about the latest and greatest AI, Technology, and Science!.Connect with me on LinkedIn too!Recommended ReadingWant to learn more about coding in Python?.The Python Crash Course book is the best resource out there for learning how to code in Python!And just a heads up, I support this blog with Amazon affiliate links to great books, because sharing great books helps everyone!.As an Amazon Associate I earn from qualifying purchases.