Database Normalization ExplainedLearn about database normalization by designing and modifying an example database schema!Lorraine LiBlockedUnblockFollowFollowingJul 2Normalization is a technique for organizing data in a database.
It is important that a database is normalized to minimize redundancy (duplicate data) and to ensure only related data is stored in each table.
It also prevents any issues stemming from database modifications such as insertions, deletions, and updates.
The stages of organization are called normal forms.
In this tutorial we will be redesigning a database for a construction company and ensuring that it satisfies the three normal forms:First Normal Form (1NF):Data is stored in tables with rows uniquely identified by a primary keyData within each table is stored in individual columns in its most reduced formThere are no repeating groupsSecond Normal Form (2NF):Everything from 1NFOnly data that relates to a table’s primary key is stored in each tableThird Normal Form (3NF):Everything from 2NFThere are no in-table dependencies between the columns in each tableNote that there are actually six levels of normalization; however, the third normal form is considered the highest level necessary for most applications so we will only be discussing the first three forms.
Let’s get started!This tutorial is adapted from Next Tech’s Database Fundamentals course which comes with an in-browser MySQL database and interactive tasks and projects to complete.
You can get started here for free!Our Database: Codey’s ConstructionCodey’s Construction’s database schema with a new table that causes the database to violate the rules of normalization.
The database we will be working with in this tutorial is for Codey’s Construction company (Codey is a helpful coding bot that works with you in the course mentioned earlier).
As you can see from the schema above, the database contains the tables projects, job_orders, employees, and project_employees.
Recently, the customers table was added to store customer data.
Unfortunately, this table has not designed in a way that satisfies the three forms of normalization… Let’s fix that!First Normal FormFirst normal form relates to the duplication and over-grouping of data in tables and columns.
Codey’s Construction’s table customers violates all three rules of 1NF.
There is no primary key!.A user of the database would be forced to look up companies by their name, which is not guaranteed to be unique (since unique company names are registered on a state-by-state basis).
The data is not in its most reduced form.
The column contact_person_and_role can be further divided into two columns, such as contact_person and contact_role.
There are two repeating groups of columns — (project1_id, project1_feedback) and (project2_id, project2_feedback).
The following SQL statement was used to create the customers table:Example data for `customers` table.
By modifying some columns, we can help redesign this table so that it satisfies 1NF.
First, we need to add a primary key column called id with data type INT(6):With this statement, we added an automatically incrementing primary key as the first column in the table.
To satisfy the second condition, we need to split the contact_person_and_role column:Here, we simply renamed it as contact_person, and added a column contact_person_role immediately after it.
To satisfy the third condition, we need to move the columns containing project IDs and project feedback to a new table called project_feedbacks.
First, let’s drop these columns from the customers table:And then create the project_feedbacks table:Here’s what the database schema looks like now:Modified schema that now satisfies 1NF.
As you can see, there are no more repeating groups in either the project_feedbacks table or the customers table.
We still know which customer said what since project_feedbacks.
customer_id refers back to the customers table.
Now our customers table satisfies 1NF!.Let’s move on to second normal form.
Second Normal FormTo achieve second normal form, a database must first satisfy all the conditions for 1NF.
After this, satisfying 2NF requires that all data in each table relates directly to the record that the primary key of the table identifies.
We are in violation of 2NF because the contact_person, contact_person_role and phone_number columns track data that relate to the contact person, not the customer.
If the contact person for a customer changes, we would have to edit all of these columns, running the risk that we will change the values in one of the columns but forget to change another.
To help Codey’s Construction fix this table to satisfy 2NF, these columns should be moved to a table containing data on the contact person.
First, let’s remove the columns in `customers` that are not related to our primary key:Note that we kept the contact_person_id so we still know who to contact.
Now, let’s create our new table contact_persons so we have somewhere to store data about each contact.
Codey’s Construction’s database schema now looks like this:Modified schema that now satisfies 2NF.
Now, if the contact person for a customer changes, the construction company just has to insert a record into the contact_persons table and change the contact_person_id in the customers table.
Third Normal FormFor a database to be in third normal form, it must first satisfy all the criteria for 2NF (and therefore, also 1NF).
Then, each column must be non-transitively dependent on the table’s primary key.
This means that all columns in a table should rely on the primary key and no other column.
If column_a relies on the primary key and also on column_b then column_a is transitively dependent on the primary key so the table does not satisfy 3NF.
Does your brain hurt from reading that?.Don’t worry!.It’s explained more below.
This is how the customers table looks after we have satisfied 1NF and 2NF:Example data for modified `customers` table.
The table currently has transitively dependent columns.
The transitively dependent relationship is between city and zip.
The city in which a customer is located relies on the customer, so this satisfies 2NF; however, the city also depends on the zip code.
If a customer relocates, there may be a chance we update one column but not the other.
Because this relationship exists, the database is not in 3NF.
To fix our database to satisfy 3NF, we need to drop the city column from customers, and create a new table zips to store this data:Modified schema that now satisfies 3NF.
That’s it!.Finding issues that violate 3NF can be difficult, but it’s worth it to ensure that your database is resilient to errors caused by only partially updating data.
I hope you enjoyed this tutorial on database normalization!.Codey’s Construction’s database now satisfies the three forms of normalization.
If you’d like to continue learning about databases, Next Tech’s Database Fundamentals course covers all you need to know to get started with databases and SQL.
By helping an interactive coding bot named Codey, you will learn how to create and design databases, modify data, and write SQL queries to answer business questions.
You can get started for free here!.. More details