Concepts: Normalization
Topics
This concept document provides a brief discussion of the topic of data normalization
as it applies to the development of the
Artifact: Data Model. It does not provide a full treatment of normalization,
because the subject is quite broad and has been documented in many texts on
database design. In [NBG01], normalization
is defined as "an analytic technique used to produce a correct relational
database design." In practice, normalization is a procedure for eliminating
redundancy in the Data Model by means of applying restrictive rules. Elimination
of data redundancy in the tables of the Data Model helps enforce referential
integrity of the data in the database.
Normalization is usually performed on the Data Model after an initial version
of the tables and their relationships has been developed in the model. The
exact timing of when to apply normalization depends on the specific project
situation and is up to the database designer. The
normalization process is applied to the tables in the Data Model in series of
steps in which each step applies rules that are stricter than the last.
Levels of Normalization
Normalization is hierarchically classified into numeric forms, with the most
common being first, second, and third normal form. Each level of normalization
is more restrictive than the previous. The first three hierarchical levels of
normalization are:
- First Normal Form-Repeating groups of data columns in tables have
been eliminated such that data is organized into atomic units.
- Second Normal Form-Data is in first normal form, and redundancy on
primary key fields has been eliminated such that column values are wholly
dependent on the primary key field.
- Third Normal Form-Data is in second normal form, and each column
is not dependent on any other non-key column.
Other levels of normalization are possible but are not covered in this discussion.
Information on additional levels of normalization can be found in [DAT99]. The
exact level of normalization to apply to the Data Model is a decision that the
database designer must make based on the specifics of the project situation.
|