From Excel To Databases with PythonLearn how to use Python for quick data analysisCostas AndreouBlockedUnblockFollowFollowingMay 5As a business analyst, not a day goes by that I do not find myself fiddling with some data in an excel spreadsheet.
Talk to anyone working in an analytical role, and they will tell you about their love-hate relationship with excel.
For all the good things that excel can do; it’s simply a pain to work with when it comes to larger data sets.
Pivots take for ever to load, the machine runs out of memory, and before you know it the whole thing becomes unmanageable.
Not to mention that excel can only support up to 1,048,576 rows.
Sure, you could consider doing things in VBA, but what’s the point.
Photo by Mika Baumeister on UnsplashIf only there were an easy way to transfer data into a SQL Database, do your analysis and then delete it all.
Well, this is where Python swoops in to save the day.
SQL In PythonTo begin with, let us explore the most popular options when it comes to SQL in Python.
The two most popular SQL DBs to work with in Python is MySQL and SQLite.
MySQL has two popular libraries associated with it: PyMySQL and MySQLDb; while SQLite has SQLite3.
SQLite is what is known as an embedded database, which means it runs within our application and hence it is not required to be installed somewhere first (unlike MySQL).
This is an important difference; and pivotal in our quest for quick data analysis.
As such, we will go ahead and learn how to use SQLite.
Photo by Hitesh Choudhary on UnsplashSetting up SQLite in PythonThe first thing we need to do is import the library:import sqlite3Then we need to determine whether we would like to save this database anywhere or simply hold it in memory while our application in running.
If decided to actually save the database down with any of the data imported, we would then have to give the DB a name, say ‘FinanceExplainedDb’, and have the following command:dbname = 'FinanceExplainedDb'conn = sqlite3.
connect(dbname + '.
sqlite')On the other hand, if we wanted the whole thing in memory, and for it to vanish when we were done, we could use the following command:conn = sqlite3.
connect(':memory:')At this point, SQLite is all set up and ready to be used in Python.
Assuming we had some data loaded in the DB under Table1, we could execute SQL commands in the following way:cur = conn.
execute('SELECT * FROM Table1')for row in cur: print(row)Let us now explore how we can make our data available through our application using Pandas.
NB: SQLite has a very cool browser should you decide you want to run your SQL commands manually and on the fly after you’ve created your db.
You can find it here.
Photo by Ales Krivec on UnsplashUsing Pandas to load data in our applicationAssuming that we already have the data, we would like to carry analysis on, we can use the Python Pandas library to do it.
First, we need to import the Pandas library and then we can load the data in a data frame (You can think of data frames as an array of sorts):import pandas as pd#if we have a csv filedf = pd.
csv')#if we have an excel filedf = pd.
xlsx')Once we have loaded the data, we can put it straight into our SQL Database with a simple command:df.
to_sql(name='Table1', con=conn)If you are loading multiple files within the same table, you can use the if_exists parameter:df.
to_sql(name='Table1', con=conn, if_exists='append')Memory ConsiderationsWhen it comes to dealing with larger data sets, we will not be able to use this one-line commands to load the data.
Our application will run out of memory.
Instead, we will have to load our data little by little.
For this example, let’s assume we will load 10,000 rows at a time:chunksize = 10000for chunk in pd.
csv', chunksize=chunksize): chunk.
to_sql(name='Table1', con=conn, if_exists='append')Photo by Sidharth Bhatia on UnsplashBringing it all togetherTo bring everything together, I have decided to give you a Python script that covers most of the things we talked about.
The script will do the following things:Load some sample data from a Python libraryWrite the data out to a CSVLoad the data back into our application through the CSV in a data frame chunk by chunk and put in a DBThen execute a SELECT statement on the databaseimport sqlite3, pandas as pd, numpy as np#####Creating test data for us — you can ignorefrom sklearn import datasetsiris = datasets.
load_iris()df1 = pd.
c_[iris['data'], iris['target']], columns= iris['feature_names'] + ['target'])df1.
csv',index=False)###########################conn = sqlite3.
connect(':memory:')cur = conn.
cursor()chunksize = 10for chunk in pd.
csv', chunksize=chunksize): chunk.
columns = chunk.
replace(' ', '_') #replacing spaces with underscores for column names chunk.
to_sql(name='Table1', con=conn, if_exists='append')cur.
execute('SELECT * FROM Table1')names = list(map(lambda x: x, cur.
description)) #Returns the column namesprint(names)for row in cur: print(row)cur.
close()There you have it guys.
A brief introduction on how you can import your data into a database for quick analysis.
Do you have any other tips or tricks?.. More details