Computersight > Programming

Normal Forms of Relations

An overview of four of the normal forms used in modeling of relations, to make them consistent and secure.

Given a relation schema, we need a way to decide whether or not it is a good design, or if we need to decompose it into smaller relations. To help guide us through these decisions, normal forms were defined. If we can say that a relation is of a certain normal form, we know that certain problems cannot arise.

First Normal Form

A relation is said to be in First Normal Form (1NF) if and only if each attribute of the relation is atomic. This means that each columns (visualizing the relation as a table) only contains a single value and that each row contains the same columns.

Violation of 1NF is commonly achieved by storing lists of data in the same field (a comma-separated list for example), which will make it difficult to write queries comparing the separate elements in the list to other entries' elements.

IdIndexes
00,4,6
13,4

Also, by storing the list as separate fields containing the same data, there is also a breach of 1NF.

IdIndex (0)Index(1)Index(2)
0046
134

The desired solution in this case would be:

IdIndex
00
04
06
13
14

Second Normal Form

A 1NF relation is in Second Normal Form (2NF) if and only if all non-key attributes are dependent on the whole of the candidate key, and not just a part of it.

A side-effect of this definition is that all relations in 1NF that has no composite candidate keys are automatically in 2NF.

Given the following relation:

IdInterestAge
0Music18
0Football18
1Stamps46

As you can see, the non-key attribute Age is dependent only on part of the key, namely Id, therefore, this relation is not 2NF. A conversion to make it 2NF is as follows:

IdInterest
0Music
0Football
1Stamps

IdAge
018
146

An Alternative Notation

To make all this easier to work with, we will introduce a more clear notation that gives us only the necessary information. In the case above we could have written it as follows:

R(ABC), F = { A->C, A->B, B->A }, where A = Id, B = Interest and C = Age. One could use the actual names instead of letters, but I find single letters easier to read and understand.

R(ABC) denotes the relation and its attributes, and F denotes the dependencies, where C depends on A, and A and B depends on each other. Analyzing the dependencies will give us the candidate keys, in this case it's only AB, as both A and B is needed to fully identify the entry, but C is not needed, as no other attribute depends on it (note that a different attribute must imply C, or C would be needed as well).

Transforming these into 2NF results in the following:

R(AB), F = { A->B, B->A }, which gives the key AB.

R(AC), F = { A->C }, which gives the key A.

Third Normal Form

A 2NF relation is in Third Normal Form (3NF) if and only if for all the non-trivial functional dependencies X->A, one of the following is true:

  • X is a superkey (X identifies the whole entry unambiguously)
  • A is a prime attribute (i.e. A is part of a candidate key)

To violate 3NF, consider the following relation:

R(ABCD), F = { AB->CD, C->D }, where the key would be AB.

This is a breach of 3NF, as the non-prime attribute D can be identified by C, which is not a superkey. A decomposed relation in 3NF would be:

  • R(ABC), F = { AB->C }, where the key would be AB.
  • R(CD), F = { C->D }, where the key would be C.

Most relations in 3NF are not affected by update, insertion and deletion anomalies, such as updating of redundant data. For instance, if we look at the non-3NF example, when D changes, the CD-pair can be represented multiple places, and unless all of them are updated correctly, an inconsistency will occur. In the 3NF version, this problem will not occur, as the data is only present at one place.

Boyce-Codd Normal Form

This is a slightly more strict version of 3NF, and imposes just one other requirement. Namely, for every non-trivial functional dependency X->A, X is a candidate key.

An example of a 3NF relation that's not BCNF could be the following (using a table to suggest the actual relations):

A=EmployeeIDB=EmployeeSSNC=CustomerIDD=CustomerSSN

A and B are related, and C and D are related. The relation is supposed to model which employee handles which customer in a company. Since both the IDs and the Social Security Numbers are valid identifiers of both customer and employee, there is an implication between all single attributes, as follows:

R(ABCD), F = { A->B, B->A, C->D, D->C, A->C, C->A, B->D, D->B, etc.}

But the candidate keys are always two of the attributes (one that identifies the employee, and one that identifies the customer).

As you can see, the there are dependencies on attributes that are not the whole candidate key, therefore it violates BCNF.

Conclusion

There are even higher normal forms in use (4NF and 5NF), but I will not cover them here. I'm certain a search on Google will give you some information.

Using normal forms is a very efficient and safe way to model and decompose relations that are not prone to update errors and inconsistencies, and an understanding of these are very beneficial and can save a lot of time in error tracking.

1
Liked It
I Like It!
Related Articles
Network Operating System Topology  |  Review: Three Database Design Methods
More Articles by Newstead
The Phantom Problem  |  The Definite Three Step Guide to Creating Your Own RPG
Latest Articles in Programming
Microsoft Sql Server 2005: An Easy Task  |  SQL and Databases
Comments (0)
Post Your Comment:
Name:  
Copy the code into this box:  
Post comment with your Triond credentials?
Inside Computersight

Communication & Networks

 /

Computers

 /

Hardware

 /

Operating Systems

 /

Programming

 /

Software


Popular Tags
Popular Writers
Powered by
Computersight
About Us
Terms of Use
Privacy Policy
Services
Submit an Article
Advertise with Us
Contact

© 2007 Copyright Stanza Ltd. All Rights Reserved.