Definitive Guide of Data ProfessionalsRelational Database Management (RDBMS) Basic for Data ProfessionalsBasic RDBMS with Python SQLite3 and SQLAlchemyVincent TatanBlockedUnblockFollowFollowingJun 23Source from UnsplashData scientists need to work with Database on daily basis.
As data analysts and engineers, we need to be proficient in SQL and Database Management.
Knowing RDBMS will help us access, communicate and work on data.
It will allow us to store and filter alternative data much more quickly and robust.
In this tutorial, we will learn how to:Setup SQLite Disk ConnectionCreate Table with StructureInsert Data Frame values into the tableIn this tutorial we will learn two ways to execute in Python.
The first one is to use SQLite 3, where we will use Python and SQL to execute each learning point.
But, we will also briefly talk about the SQL Alchemy which allows us to execute these learning points just by within 4 lines of code.
(No joke!)Before that, let us talk about RDBMSWhat is RDBMS?A relational database is a type of database.
It uses a structure that allows us to identify and access data in relation to another piece of data in the database.
Often, data in a relational database is organized into tables.
— CodecademyRelational database uses tables which are called records.
These records possess many columns with different names and data types.
We can then establish connections among records by using primary key and foreign key to identify table schema relationships.
Data Schema Sample for RDBMSWhy not just .
csv or .
xlsx compared to RDBMS (SQL)?Today, there are many limitations of Excel and csv to store our data needs which could be resolved with RDBMS:CapacityThe data ecosystem changes every day — What is considered big and fast today, might not be so tomorrow.
This means that we need a dedicated storage which could flexibly host large amount of data.
We need a more scalable storage than Excel and csv.
RDBMS is the solution — it allows scalability based on the server distribution rather than Excel who has limited amount of rows and columns (1,048,576 rows by 16,384 columns).
Dependency with Defined RelationshipRDBMS allows users to establish defined relationships between tables.
This will give users a complete pictures of data definitions.
For example in your shopping receipt, you might several entities such as Product description, Price of item, Store Branch Location, etc.
All of those could be separated and joined based on needs.
Analysis is not Separated from DataWe can now store data separately from our analysis.
In Excel, we need to manage different versions to collaborate with your teammates.
Each of the file needs to combine different versions of data and analysis.
But, in RDBMS, we can now use SQL instructions to reproduce and analyze data separately.
This way, we can make sure your teammates generate the updated data and analysis from a centralized data server.
Refer to this Codecademy article if you want to know more.
Excel to SQL: Why You Should Make the SwitchIf you use spreadsheet applications like Excel or Google Sheets at work, these problems might be familiar to you: It's…news.
comFor data professionals, this skill is valuable.
It creates a one stop data storage where everyone comes and leave with the same updated data from their SQL instructions.
Why Do We Choose SQLite Over PostgreSQL?SQLite provides a lightweight C Library for disk-based database which allows SQL to process CRUD process.
This means we could rely on SQLite for many small applications/use cases:SQLite for quick and easy internal data storageSQLite to develop small prototype quicklySQLite to host Proof of Concept (POC) before migrating to larger databases by PostgreSQL or Oracle.
PostgreSQL is a very advanced open source database to provide a dedicated data server to run its database.
But, SQLite provides a lightweight setup which does not require a dedicated data server.
If our data needs include proper administration and security, then PostgreSQL will be the proper choice.
Otherwise, SQLite will do.
SQLite vs PostgreSQL – Which database to use and why?SQLite and PostgreSQL are among the most widely used relational database management systems (RDMS).
They are both…tableplus.
ioTo build the POC for this article, we will use SQLite.
But feel free to try using PostgreSQL.
Inserting Data Extraction from Lazada into SQLiteProblem StatementWe will reuse the problem where we extract critical products information from Lazada.
Instead of exporting it into csv, we will export it into a SQLite Database.
product_df to store scraped information at Lazada WebsiteIf you are not familiar with this, feel free to skim through my article below.
In 10 minutes: Web Scraping with Beautiful Soup and Selenium for Data ProfessionalsExtract Critical Information Quickly with BS4 and Seleniumtowardsdatascience.
comFor this Tutorial, we would use SQLite3 first to generate connection to SQLite Engine.
This will alllow us to execute SQL commands to insert the values.
After that, we will take a look into SQLAlchemy to shorten and simplify this process without creating any SQL Commands.
Python SQLite3Connect to SQLite DiskWe first establish the connection to a disk file lazada.
db, which is a disk used by SQLite engine to store data.
If the lazada.
db does not exist, it will create a new one which we could connect next time.
import sqlite3conn = sqlite3.
db")c = conn.
cursor()Notice that when we open up the connection,we also establish a cursor.
A database cursor is a tool to traverse over database records.
Using this cursor, we can create tables and execute SQL commands into the database disk.
Create lazada_product TableAfter connecting our SQLite3 to lazada.
db, we will use the cursor to execute SQL query and create lazada_product table.
We will identify our metadata as following.
execute(''' CREATE TABLE lazada_product ( time date_time , id INTEGER , link TEXT NOT NULL, product_title TEXT NOT NULL, product_price DOUBLE NOT NULL, category TEXT NOT NULL, PRIMARY KEY (time, id) ); ''')Notice how we appoint time and id as the primary key.
This means every row has unique id and date time.
If we insert rows with the same id and date time; SQLite will complain and return a duplicate error.
This validation is useful to prevent unclean redundant data to enter the database.
Insert df rows to lazada_product tableLet us insert the extracted product_df to lazada_product table.
def write_from_df_with_sqlite3(df): for index, row in df.
execute( ''' INSERT INTO lazada_product VALUES (CURRENT_TIMESTAMP,?,?,?,?,?) ''', (row['id'], row['link'],row['product_title'],row['product_price'], row['category']) )Once you run this method, you will successfully dump every value into your lazada_product table.
Congratulations you have created RDBMS tables and inserted data into itNotice that there is a limitation to SQLite3 Python.
The code can be hard to read as you combine SQL Commands with Python code in one file.
It also looks verbose.
Therefore, we will take a look using SQLAlchemy to execute table and data insertion in a shorter and SQL-free method.
Python Object Relational Mapper (ORM) SQLAlchemySQLAlchemy is a Python ORM to activate DB Engines.
It creates a pythonic wrapper on top of SQL executions for SQLite.
This allows you to run logic to the table without touching any SQL Command Code.
What is Object Relational Mapper (ORM)?ORM provides a high level abstraction to allow developers to write Python Code to invoke SQL for CRUD and schemas in their database.
Each developer can use the programming language they are comfortable with instead of dealing with SQL statements or stored proceduresConnecting to SQLite Enginefrom sqlalchemy import create_enginedisk_engine = create_engine('sqlite:///lazada_alchemy.
db')Create lazada_product Table and Insert df rowsdef write_from_df_with_alchemy(df): df.
to_sql('lazada_product', disk_engine, if_exists='append')Look at how clean and short the code is.
After we run this method, we instantly create the table with default settings based on our df datatypes.
While at the same time, we will append the values into the lazada_product table without touching any SQL.
Therefore, the biggest of SQLAlchemy is to facilitate high level abstractions of SQL commands to help Python developers extract data using the same language.
Executing SQL without Running SQL Queries.
— SQLAlchemyOf course, this should not replace the importance of knowing SQL Language.
We can handle more complexities better with SQL.
However, to code Data Table Setups and Insertions, SQLAlchemy will save you much time and hassle.
Accessing our SQLite DiskTo check out the contents of our disks, we could interact with the following Web Dashboard Tool.
No DownLoad, No Install.
Online test SQL script…sqliteonline.
comFrom here you can insert your disk as a file and write a simple SQL select statements.
SELECT * from lazada_productClick run and this will display all of your data in lazada_product.
The results after inserting lazada_product inside the table.
In total there are 108 products appendedCongratulations, you have learnt RDBMS and Insert Values using Python SQLite3 and SQLAlchemyConclusionRDBMS provides many benefits to csv or excelsheet due to its larger capacity, dependency check, and separation of analysis and dataCreating a simple RDBMS does not take much time, we can use SQLAlchemy to create the schema just by within 4 lines of code.
We can resume reading and CRUD operations by using SQLite Browser online or download Linux or Microsoft SQLite Browser.
Purpose, Github Code and Your ContributionsVincentTatan/Web-ScrapingWeb Scraping with Beautiful Soup and Selenium.
Contribute to VincentTatan/Web-Scraping development by creating an…github.
comFeel free to clone the repository and contribute.
Finally…I really hope this has been a great read and a source of inspiration for you to develop and innovate.
Please Comment out below to suggest and feedback.
Happy coding :)About the AuthorVincent Tatan is a Data and Technology enthusiast with relevant working experiences from Visa Inc.
and Lazada to implement microservice architectures, business intelligence, and analytics pipeline projects.
Vincent is a native Indonesian with a record of accomplishments in problem solving with strengths in Full Stack Development, Data Analytics, and Strategic Planning.
He has been actively consulting SMU BI & Analytics Club, guiding aspiring data scientists and engineers from various backgrounds, and opening up his expertise for businesses to develop their products .
Please reach out to Vincent via LinkedIn , Medium or Youtube Channel.