Database Normalization – Normal Forms 1nf 2nf 3nf Table Examples

As a database grows in size and complexity, it becomes increasingly important to organize the data in a logical and consistent manner. This is where database normalization comes into play. Normalization is the process of structuring a relational database to reduce data redundancy and improve data integrity.

By normalizing a database, you can ensure that data is stored efficiently, updates can be made reliably, and the database is easier to maintain over time. There are several progressive "normal forms" a database can achieve, with each form building upon the previous one.

In this article, we‘ll take an in-depth look at the first three normal forms – 1NF, 2NF and 3NF. I‘ll explain the criteria for each form and walk through examples of normalizing tables from 1NF to 3NF. By the end, you‘ll have a solid grasp of how to normalize a relational database to meet the standards of good database design.

What is Database Normalization?

Database normalization is the process of organizing data in a relational database to minimize redundancy and dependency. The goal is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then logically propagated through the rest of the database.

Dr. Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the First Normal Form (1NF) in 1970. Codd later defined the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971, and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974.

Normalization involves decomposing a table into less redundant tables without losing information. The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then propagated through the rest of the database using the defined foreign keys.

Purpose of Database Normalization

The key objectives of database normalization are to:

Minimize data redundancy: By separating data into multiple tables based on their dependencies, normalization helps eliminate duplicated data. This saves storage space and prevents data inconsistencies.
Avoid data modification issues: Anomalies can occur when inserting, updating, or deleting records in a database that hasn‘t been normalized. These anomalies can lead to data inconsistency or loss of data.
Simplify queries: Normalized databases produce tables with unique rows, which makes it easier to write queries that select the correct data.

While normalization is beneficial for data integrity, it does come with some trade-offs. A fully normalized database often results in more tables, which can lead to slower performance due to the need for more joins in queries. Denormalization, which is the intentional introduction of redundancy into a table, is sometimes necessary to meet performance requirements.

Normal Forms: 1NF, 2NF, 3NF

Let‘s dive into the details of the first three normal forms, looking at the criteria for each form and examples of normalizing tables.

First Normal Form (1NF)

A table is in first normal form if:

There are no repeating groups of columns.
The table has a primary key.
Columns are atomic, meaning cells have a single value, not a list of values.
There are no duplicate rows.

Consider the following table which stores information about books and their authors:

BookTitle	AuthorName	AuthorBirthYear
Database Design	Adrienne Watt	1970
Database Design	Nelson Eng	1975
Web Development	Adrienne Watt	1970

This table is not in 1NF because:

The primary key (BookTitle) does not uniquely identify each row, as there are duplicate rows for "Database Design".
The "AuthorName" and "AuthorBirthYear" columns are not atomic, as they contain data about multiple authors.

To convert this table to 1NF, we need to:

Remove duplicate rows.
Identify a primary key that uniquely identifies each row. In this case, we can add a surrogate key "BookID".
Put the repeating author data into a separate table.

The 1NF version would look like this:

BookID (PK)	BookTitle
1	Database Design
2	Web Development

BookID (FK)	AuthorName	AuthorBirthYear
1	Adrienne Watt	1970
1	Nelson Eng	1975
2	Adrienne Watt	1970

Second Normal Form (2NF)

A table is in second normal form if:

It is in first normal form.
All non-prime attributes (columns that are not part of any candidate key) are fully dependent on the primary key. In other words, there are no partial dependencies.

Consider this table of customer orders:

OrderID	CustomerID	OrderDate	CustomerName	CustomerAddress
1001	101	2023-01-01	John Doe	123 Main St
1002	102	2023-01-02	Jane Smith	456 Park Ave
1003	101	2023-01-03	John Doe	123 Main St

The primary key here is OrderID. However, the CustomerName and CustomerAddress columns are dependent on the CustomerID, not the OrderID. This is a partial dependency and violates 2NF.

To make this table comply with 2NF, we split it into two tables:

OrderID (PK)	CustomerID (FK)	OrderDate
1001	101	2023-01-01
1002	102	2023-01-02
1003	101	2023-01-03

CustomerID (PK)	CustomerName	CustomerAddress
101	John Doe	123 Main St
102	Jane Smith	456 Park Ave

Now, the non-prime attributes in each table are fully dependent on the primary key.

Third Normal Form (3NF)

A table is in third normal form if:

It is in second normal form.
There are no transitive dependencies. A transitive dependency is where a non-prime attribute depends on another non-prime attribute.

Consider the following table about employees:

EmployeeID	EmployeeName	DepartmentID	DepartmentName	ManagerID	ManagerName
1001	John Doe	101	Sales	201	Jane Doe
1002	Bob Smith	102	Marketing	202	Mike Brown
1003	Alice Jones	101	Sales	201	Jane Doe

In this table, DepartmentName is transitively dependent on DepartmentID (it depends on DepartmentID which is not part of the primary key), and ManagerName is transitively dependent on ManagerID. This violates 3NF.

To bring this table to 3NF, we need to split it into three tables:

EmployeeID (PK)	EmployeeName	DepartmentID (FK)	ManagerID (FK)
1001	John Doe	101	201
1002	Bob Smith	102	202
1003	Alice Jones	101	201

DepartmentID (PK)	DepartmentName
101	Sales
102	Marketing

ManagerID (PK)	ManagerName
201	Jane Doe
202	Mike Brown

Now, each non-prime attribute depends only on the primary key in its table.

Benefits of Database Normalization

Normalizing a database offers several significant benefits:

Minimizes data redundancy: By separating data into multiple tables based on their dependencies, normalization reduces the duplication of data across the database.
Ensures data integrity: Normalization helps prevent data inconsistencies that can arise from data redundancy and modification anomalies.
Simplifies data maintenance: With a normalized database, updating data is simpler and less error-prone because you only need to update it in one place.
Facilitates database expansion: As your data grows, a normalized database is easier to extend without needing to modify existing structures.
Improves query performance: While normalization can lead to more complex queries due to the need for joins, it often results in faster query execution because tables are smaller and more efficiently organized.

When to Denormalize

Despite the benefits of normalization, there are situations where denormalizing a database can be beneficial. Denormalization is the intentional introduction of redundancy into a table to improve query performance.

You might consider denormalizing when:

Query performance is critical and the queries are running too slowly.
The queries frequently require data from multiple tables, leading to complex joins.
The database is read-heavy, with few write operations.

However, denormalization should be approached cautiously as it can lead to data inconsistencies if not managed properly. It‘s often best to start with a fully normalized database and only denormalize when necessary based on observed performance issues.

Higher Normal Forms

While we‘ve focused on the first three normal forms, it‘s worth noting that there are higher normal forms, namely:

Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Sixth Normal Form (6NF)

These higher forms deal with more complex dependencies and are less commonly used in practice. For most applications, achieving 3NF is sufficient to ensure a well-structured and efficient database.

Conclusion

Database normalization is a critical aspect of designing a relational database. By organizing data into tables based on their dependencies and ensuring that each table meets the criteria for 1NF, 2NF, and 3NF, you can create a database that minimizes redundancy, ensures data integrity, and is easier to maintain and extend over time.

While the process of normalization can seem complex at first, understanding the basic principles of the first three normal forms provides a solid foundation. By examining your data, identifying functional dependencies, and splitting tables as necessary, you can progressively normalize your database to achieve a well-structured design.

Remember, normalization is a balancing act. While a fully normalized database provides the best data integrity, there may be situations where denormalization is necessary for performance reasons. The key is to start with a normalized design and carefully introduce redundancy only when it‘s justified based on actual performance needs.

By mastering database normalization, you‘ll be able to design databases that are efficient, reliable, and scalable, setting the stage for building robust and data-driven applications.

Database Normalization – Normal Forms 1nf 2nf 3nf Table Examples

What is Database Normalization?

Purpose of Database Normalization

Normal Forms: 1NF, 2NF, 3NF

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Benefits of Database Normalization

When to Denormalize

Higher Normal Forms

Conclusion

Related

How to Connect your Microsoft SQL Server Docker Container with Azure Data Studio

How to Connect Your AWS RDS Microsoft SQL Server Using Azure Data Studio

Firestore: Mastering the Free Tier for Full-Stack Developers

What is a Relational Database? RDBMS Definition

Horizontal vs. Vertical Scaling: A Comprehensive Guide to Scaling Your Database

Joining Real-Time Data from MySQL and Postgres with Materialize

What is Database Normalization?

Purpose of Database Normalization

Normal Forms: 1NF, 2NF, 3NF

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Benefits of Database Normalization

When to Denormalize

Higher Normal Forms

Conclusion

Related

Similar Posts