Design Access databases with normal forms and Excel

Power User Corner

By Colin Wilcox

This article explains how to use Microsoft Excel to create prototypes of your Access databases and how to make sure your data adheres to a set of rules called normal forms. Normal forms aren't hard to understand, and they're critical to designing useful databases.

Applies to
Microsoft Access 2002

To become a true Access power user, you need to understand a key database design concept called normalization. Normalization is the process of ensuring that your data conforms to sets of rules called (logically enough) normal forms. When you normalize your data, you eliminate redundant data and ensure that the information in each table is properly linked.

Normalizing enables your databases to work faster, it saves storage space (hey, disk space may be cheap, but it isn't free), and it also helps protect your data against corruption. If you're getting nervous about this, don't worry. You don't need to be a professional to normalize your data, and you can even try this exercise at home. We'll use Excel because prototyping your databases in Excel allows you to "make all your mistakes on paper," in a sense, before you go through the harder work of creating a new database, defining data types, and so on in Access.

The academics who study database design usually identify seven normal forms. This column explains how to make your data conform to three of the forms: first, second, and third normal forms.

  • First normal form Places your data in columns and rows.
  • Second normal form Adds a key—a column of values that uniquely identify each row in each table—to your data.
  • Third normal form Ensures that each table contains unique data. In other words, it ensures that a table of customer identification data does not contain order data, and so on.

The steps in the rest of this column walk you through the process of using Excel as they explain the normal forms.

A quick look at denormalized data

One of the best ways to understand normalized data is to look at denormalized data—data that does not adhere to the normal forms. Follow these steps:

  1. Download and install the sample spreadsheet, SpreadSheetDesignData.xls, from the Microsoft Download Center.
  2. Start Excel, open the spreadsheet, and select the Raw Data worksheet if it isn't already active.

You'll see several design flaws that denormalize the data in the worksheet:

  • It contains repeated data    For instance, the rows contain data on where each person works, thus wasting space.
  • It contains ambiguous information    What do the dates and numbers in columns F, G, and H mean? In this case, they represent pay periods and hours worked, but you can't grasp that fact easily.
  • It has a structure that leads to bloated data    First, as the year progresses, you have to add new columns for each pay period, thus increasing file size. Second, updating the data can be difficult. For instance, say a new employee joins the company in August. The large number of columns makes it difficult to locate the cells in which you would enter the new employee's hours worked.

Normalize your data

You normalize your data for several reasons:

  • It saves space.
  • It allows for fast retrieval of information.
  • It describes the data clearly. In other words, you never have to wonder what you're seeing.
  • It ensures data integrity. For example, it ensures that a field titled First_Name never contains an address or postal code.

Database designers have used the normal forms for a long time, and they're a standard, proven technology. We'll start by placing the data in first normal form.

Use first normal form to make your data easier to understand

When you put data in first normal form, you place all of it in columns and rows and you eliminate any repeated groups of related information. For example, in the denormalized data that you saw in the previous section, columns F, G, and H contain related data, and they repeat themselves—you have to add new columns for each pay period. First normal form eliminates that kind of repetition.

To see how first normal form changes the data, return to the spreadsheet and click the First Normal sheet tab. What do you see? The pay period and hours worked data now reside in their own columns, and they're labeled clearly, making them easier to interpret.

If you want to try putting your own data into first normal form, start by looking for columns that repeat themselves. Ask yourself if those columns can belong to unique groups, and then manually rearrange your data accordingly (that's what we did with the sample spreadsheet).

First normal form makes it easier to show records for individual pay periods. However, the table also contains three times as many rows as the original, and we would need to add a new row for each employee for each new pay period. That leads to file bloat, and it doesn't make it easier to find errors in data. To start fixing those problems, we'll go to second normal form.

Use second normal form to identify each row in your data

To put data into second normal form, you add values called keys. Keys are columns of values that uniquely identify each row in a table. Follow these steps to add a key:

  1. On the Insert menu, click Worksheet.
  2. In the new sheet, right-click the Sheet1 tab, click Rename, type Second Normal in the tab, and press ENTER.
  3. Go to the First Normal worksheet, and copy all of the data.
  4. Go back to cell B1 in the Second Normal worksheet, and then paste the data.
  5. In cell A1 of the Second Normal sheet, type Employee ID. If necessary, expand the column to the right to accommodate the text.
  6. Starting with cell A2, enter 1 next to all instances of Barnhill; enter 2 next to all instances of Funk; enter 3 next to all instances of Stahl, and so on until you have assigned a sequential number to each of the names listed in column B. The following figure depicts the upper part of the finished table:

The data in second normal form

The Employee ID column becomes the key. The values in the key uniquely identify each row in the table.

That may not look like much of a change, but it readies the data for third normal form, and that's where the structure of your data changes radically.

Use third normal form to make your data easier to enter and maintain

Tables in third normal form conform to two rules:

  • They use keys to uniquely identify each record.
  • In addition to the keys, they store data for single types of entities. For example, a Suppliers table can only contain data about your suppliers. It cannot contain information about shippers, customers, or the weather.

To reach third normal form with our sample data, we need to move the Hours Worked, Pay Period, Dept., and Dept. Location columns to other tables and eliminate repeated data.

First, create the new tables:

  1. Return to the Second Normal worksheet, and on the Insert menu, click Worksheet.
  2. Double-click the Sheet2 tab, type Department, and press ENTER.
  3. Repeat step 2, and name the new sheet Hours Worked. Your workbook should contain the following worksheets:

The new tables for third normal form

Now, you need to move the department data to the Department worksheet and add a key:

  1. Go to the Second Normal worksheet, highlight the data in the Dept. and Dept. Location columns, and then cut and paste the data into columns B and C in the Department worksheet.
  2. In columns B and C, delete all but one instance of each department name and location. When you finish, none of the data should repeat itself.
  3. In cell A1, type Dept. ID, and in cells A2 through A4, type 1, 2, and 3, respectively. Your table should look like this:

The Department table

Next, move the hour data to the Hours Worked worksheet and add two keys:

  1. Go to the Second Normal worksheet, highlight the data in the Pay Period and Hours Worked columns, cut the data, and then paste it into columns C and D in the Hours Workedworksheet.
  2. In cell A1, type Pay Period ID.
  3. In cell A2, type 1.
  4. Point to the lower right corner of cell A1. The pointer changes to a cross.
  5. Drag from cell A1 to cell A22, click the arrow in the AutoFill Options button, and click Fill Series.
  6. In cell B1, type Employee ID.
  7. Enter one instance of each employee ID for each pay period. If you can't remember the numbers, look them up in the Second Normal worksheet. Your finished table should look like this:

The Hours Worked table in third normal form

Finally, remove any redundant information from the Second Normal worksheet:

  1. In the Second Normal table, remove all redundant data.
  2. Change the name of the worksheet to Employee. Your table should look like this:

The Employee table in third normal form

Instead of having one spreadsheet with redundant and confusing data, you now have three easy-to-read worksheets that fit third normal form. At this point, you may ask why three tables are better than one. Remember, you're designing a database, not a spreadsheet. The normal forms ensure accurate data, they make the data easier to enter and update, and they enable your database management system to retrieve records at warp speed.

A quick look at primary and foreign keys

The Hours Worked worksheet now contains two keys: Pay Period ID and Employee ID. Because the Pay Period ID originates in the worksheet, it becomes a primary key. Because the Employee ID values originate from the Employee worksheet, they become a foreign key. Placing foreign keys in worksheets creates one-to-many relationships among the data (the database geeks call this a relation). In this case, each employee listing is now associated with many pay periods. That type of structure is what enables you to retrieve records quickly from large databases. The tables in complex databases often contain multiple foreign keys. The key point to remember is that you create relations when you need a one-to-many relationship.

As a final exercise, let's create a relation between the Department and Employee worksheets:

  1. In the Employee table, click anywhere in Column B, and on the Insert menu, click Columns.
  2. In the new cell, B1, type Dept. ID.
  3. If needed, go to the Department table to look up each department ID, go to the Raw Data table to see where each employee works, and then, in Column B, enter the department ID for each employee.

Those steps create a one-to-many relationship between the departments and the employees who work in them. Now, if an employee changes departments, all you need to do is change the ID in one place, the Employee table. It may seem like a bit more work upfront, but in the long run, your data is easier to enter and maintain, and retrieve later.

Congratulations! You've just created a prototype of a small, normalized database. From here, you can import the tables into a new Access database and create data entry forms without an inordinate amount of work. For more information about performing those tasks, see the Access page on Office Online.

Credits

The data structures for the sample spreadsheet used in this column came from Data Warehousing with SQL Server 7.0 by Jake Sturm, published by Microsoft Press.


About the author

Colin Wilcox writes for the Office Help team. In addition to contributing to the Office Power User Corner column, he writes articles and tutorials for Microsoft Data Analyzer.