Move data from Excel to Access

This article shows you how to move your data from Excel to Access and convert your data to relational tables so that you can use Microsoft Office Excel and Access together. To summarize, Access is best for capturing, storing, querying, and sharing data, and Excel is best for calculating, analyzing, and visualizing data.

Two articles, Using Access or Excel to manage your data and Top 10 reasons to use Access with Excel, discuss which program is best suited for a particular task and how to use Excel and Access together to create a practical solution.

In this article


The essence of relational databases and data modeling

Many data files, including Excel, are called flat files. These files are often large and contain redundant data, columns that are rarely used, and many blank values. You may inherit these files from another system or user, or they might have evolved this way as columns were added over time to satisfy changing requirements. Although the organization of the data in a flat file works for a specific purpose, they are not flexible, and you can find it difficult to answer unanticipated questions about your data.

A flat file

The time-tested solution to a flat file is a relational database. Access is a relational database program and works best when tables are well designed with relationships that conform to the relational database model.

Top of Page Top of Page

The components of a relational table

In a well-designed relational database, each table is a collection of named columns and multiple rows that stores information about a single subject, such as employees. Each column of the table is uniquely named and contains information about the subject, such as an employee's first name and address. The rows of the table contain occurrences of the subject, such as all the current employees in the company. A single value is stored in the intersection of a row and column, and is a single fact, such as "Seattle." Finally, you can re-order rows and columns without changing the meaning of the table.

A relational table

Callout 1 A table represents a single subject — a person, place, thing, event, or concept
Callout 2 Each row is unique and has a primary key, such as a badge number
Callout 3 Each column has a unique, short, and meaningful name
Callout 4 All the values in a column are similar in meaning and format
Callout 5 Each value in the table (The equivalent of a cell in Excel) represents a single fact

Top of Page Top of Page

What is Normalization?

Unfortunately, a well-designed relational table does not just automatically happen in an Access database. You must use a method to analyze data in a flat file and to rearrange data from one table into two or more related tables. This method is called normalization. In a step-by-step process, you split one table into two or more smaller tables by removing repeating values in a column, removing redundant data from rows, and adding primary keys (a field that uniquely identifies each record in a table) and foreign keys (a field that contains values found in a primary key field in a related table) to define relationships between the new tables.

A relational diagram

Callout 1 A relational table
Callout 2 Column names
Callout 3 A Primary Key
Callout 4 A Foreign Key
Callout 5 Relational lines and symbols

Top of Page Top of Page

Tables can be in different normal forms

A table can be in one of four different normal forms: zero, first, second, and third. Each form describes the degree to which data in a table is organized and can be successfully used in a relational database. Zero normal form is the least organized and third normal form is the most efficiently organized.

Zero normal form    One sign that a table is in the least organized state, known as "zero normal form," is when one or more columns contain "non-atomic" values, which means that multiple values are contained in a single cell. For example, a customer address might consist of a street address (such as 2302 Harvard Ave), a city, a state, and a postal (ZIP) code. Ideally, each of these elements of the address are stored in separate columns. As another example, a column that contains full names, such as "Li, Yale" or "Ellen Adams" should be split into separate columns for first and last names. Storing first and last names in separate columns is a good practice that helps you quickly find and sort your data.

Another sign of data in zero normal form is when it contains information about different subjects, such as salespersons, products, customers, and orders. Whenever possible, data should be separated into separate tables for each subject.

First normal form    A table is in first normal form when every column contains atomic values, but one or more columns contain some redundant data, such as salesperson or customer information for each portion of an order. For example, Adams, Ellen is repeated five times in the worksheet because she has two different orders (one with three products and one with two products).

Second normal form    A table is in second normal form when redundant data has been removed, but one or more columns are either not based on the primary key or contain calculated values (such as Price * Discount).

Third normal form    A table is in third normal form when all columns in the table are based solely on the primary key. As shown in the following picture, product and supplier information are stored in separate tables and are joined on a lookup to the Supplier ID field in the Suppliers table.

Tables in zero, first, and second normal form can present problems when you want to change data. For example, updating values that are repeated frequently is an extremely time consuming process. Each time you update a value, you need to check to see if every other row contains that same value. This can waste your time and be an error-prone operation. Furthermore, it's difficult to effectively sort and filter columns that contain repeating values. Tables in first and second normal form are a big improvement over zero normal form, but they can still be a problem when you insert, update or delete data.

When you go through the process of normalizing your data, you convert a table from a lower form to a higher form until all tables are in third normal form. In most cases, third normal form is ideal because:

  • Modification problems can be eliminated when data is inserted, deleted, or updated.
  • The integrity of the data can be maintained with data constraints and business rules.
  • You can query the data in a variety of ways to answer your questions.

Top of Page Top of Page

Relationships and keys

A well-defined relational database contains several tables, each in third normal form, but there are also relationships between these tables that help bring the data together. For example, Employees belong to Departments and are assigned to Projects, Projects have Subtasks, Subtasks are owned by Employees, and Departments manage Projects. In this scenario, a relational database would have four tables defined: Employees, Projects, Subtasks, and Departments, with each of these key relationships defined: belong to, are assigned to, have, are owned by, and manage.

There are three types of relationships:

  • One-to-one (1:1)    For example, each employee has a unique badge ID and each badge ID refers to has a unique employee.
  • One-to-many (1:M)    For example, each employee is assigned to one department, but a department has many employees. This is also called a parent-child relationship.
  • Many-to-many (M:M)    For example, an employee can be assigned to many projects, and each project can have many employees assigned. Note that a special table, called a junction table, is often used to create a one-to-many relationship between each table in third normal form for a total of three tables that together form the many-to-many relationship.

You create a relationship between two or more tables based on primary and foreign keys. A primary key is a column in a table whose values uniquely identify each row in a table, such as a Badge Number or Department Code. A foreign key is a column in a table whose values are the same as the primary key of another table. You can think of a foreign key as a copy of primary key from another relational table. The relationship between two tables is made by matching the values of the foreign key in one table with the values of the primary key in another.

Primary and foreign key relationships

Top of Page Top of Page

Data integrity and validity

After you create a relational database with all tables in third normal form and the correct relationships defined, you want to ensure the integrity of that data. Data integrity means that you can correctly and consistently navigate relationships and manipulate the tables in the database over time as the database is updated. There are two basic rules in relational databases that help to ensure data integrity.

The entity rule    Each row in a table must have a primary key and that primary key must have a value. This rule makes sure that every row in a table can be uniquely identified and is never inadvertently lost. Furthermore, whenever you insert, update, or delete data, the uniqueness and existence of all primary keys can be maintained.

The referential integrity rule    This rule controls insertion and deletion rules of the one-to-many relationship. If a table has a foreign key, every value of the foreign key must either be null (no value) or must match the values in the relational table in which that foreign key is a primary key.

Editing relationships

You can also further ensure data integrity in a relational database by using various data validation rules, including data type (such as Integer), data length (such as 15 characters or less), data format (such as currency), default value (such as 10), and constraints (such as Inventory_Amt > ReOrder_Amt). These data validation rules help ensure that the database has quality data and also conforms to established business rules.

It's worth noting that data entry is an important difference between an Access database and an Excel workbook. Entering data in an Excel worksheet is "free-form." You can enter data just about anywhere and you can easily undo a change. However, an Access database is much more structured and constrained. Furthermore, when you enter data in a table, the change is committed to the database. You cannot undo the data entry in the same way as Excel, although you can delete or update the data to correct any mistakes.

Top of Page Top of Page

Summing up

After you normalize your data into relational tables with well-defined relationships and define its data integrity, it becomes much easier to:

  • Save space and improve performance, because repeating and redundant data is physically removed.
  • Update data accurately and maintain the integrity of the data.
  • Sort, filter, create calculated columns, aggregate, and summarize data.
  • Query data in a variety of ways to answer expected and unanticipated questions.

To be sure, there are advanced aspects to relational database design, such as composite keys (a key that consists of values from two or more columns), additional normal forms (fourth normal form — a multivalued dependency), and denormalization. But for most simple to moderate database needs, you have the essential information about database design that you need to understand the following case studies in this article.

Top of Page Top of Page

The basic steps to moving data from Excel to Access

When you move data from Excel to Access, there are three basic steps to the process.

Three basic steps

Top of Page Top of Page

Step 1: Import data from Excel to Access

Importing data is an operation that can go a lot more smoothly if you take some time to prepare and clean your data. Importing data is like moving to a new home. If you clean out and organize your possessions before you move, settling into your new home is much easier.

Clean your data before you import

Before you import data into Access, in Excel it's a good idea to:

  • Convert cells that contain non-atomic data (that is, multiple values in one cell) to multiple columns. For example, a cell in a "Skills" column that contains multiple skill values, such as "C# programming," "VBA programming," and "Web design" should be broken out to separate columns that each contain only one skill value.
  • Use the TRIM command to remove leading, trailing, and multiple embedded spaces.
  • Remove non-printing characters.
  • Find and fix spelling and punctuation errors.
  • Remove duplicate rows or duplicate fields.
  • Ensure that columns of data do not contain mixed formats, especially numbers formatted as text or dates formatted as numbers.

Watch the following demo to learn how to clean up your data in Excel.


Show me Watch this

                             Video created by Office Online staff writers

For more information, see the following Excel help topics:

 Note   If your data cleaning needs are complex, or you don't have the time or resources to automate the process on your own, you might consider using a third-party vendor. For more information, search for "data cleansing software" or "data quality" by your favorite search engine in your Web browser.

Choose the best data type when you import

During the import operation in Access, you want to make good choices so that you receive few (if any) conversion errors that will require manual intervention. The following table summarizes how Excel number formats and Access data types are converted when you import data from Excel to Access, and offers some tips on the best data types to choose in the Import Spreadsheet Wizard.

Excel number format Access data type Comments Best practice
Text Text, Memo The Access Text data type stores alphanumeric data up to 255 characters. The Access Memo data type stores alphanumeric data up to 65,535 characters. Choose Memo to avoid truncating any data.
Number, Percentage, Fraction, Scientific Number Access has one Number data type that varies based on a Field Size property (Byte, Integer, Long Integer, Single, Double, Decimal). Choose Double to avoid any data conversion errors.
Date Date

Access and Excel both use the same serial date number to store dates. In Access, the date range is larger: from -657,434 (January 1, 100 A.D.) to 2,958,465 (December 31, 9999 A.D.).

Because Access does not recognize the 1904 date system (used in Excel for the Macintosh), you need to convert the dates either in Excel or Access to avoid confusion.

For more information, see Change the date system, format, or two-digit year interpretation and Import or link to data in an Excel workbook

.

Choose Date.
Time Time Access and Excel both store time values by using the same data type. Choose Time, which is usually the default.
Currency, Accounting Currency In Access, the Currency data type stores data as 8-byte numbers with precision to four decimal places, and is used to store financial data and prevent rounding of values. Choose Currency, which is usually the default.
Boolean Yes/No Access uses -1 for all Yes values and 0 for all No values, whereas Excel uses 1 for all TRUE values and 0 for all FALSE values. Choose Yes/No, which automatically converts underlying values.
Hyperlink Hyperlink A hyperlink in Excel and Access contains a URL or Web address that you can click and follow. Choose Hyperlink, otherwise Access may use the Text data type by default.

Once the data is in Access, you can delete the Excel data. Don't forget to backup the original Excel workbook first before deleting it.

For more information, see the Access help topic, Import or link to data in an Excel workbook.

Top of Page Top of Page


Automatically append data the easy way

A common problem Excel users have is appending data with the same columns into one large worksheet. For example, you may have an asset tracking solution that started out in Excel but now has grown to include files from many workgroups and departments. This data may be in different worksheets and workbooks, or in text files that are data feeds from other systems. There is no user interface command or easy way to append similar data in Excel. There is a work-around explained in the following article:

Microsoft Excel Blog: Append multiple text files into a worksheet without code

But even this workaround has limitations.

The best solution is to use Access, where you can easily import and append data into one table by using the Import Spreadsheet Wizard. Furthermore, you can append a lot of data into one table. You can save the import operations, add them as scheduled Microsoft Office Outlook tasks, and even use macros to automate the process.

Top of Page Top of Page


Step 2: Normalize data by using the Table Analyzer Wizard

At first glance, stepping through the process of normalizing your data may seem a daunting task. Fortunately, normalizing tables in Access is a process that is much easier, thanks to the Table Analyzer Wizard.

The Table Analyzer Wizard

Callout 1 Drag selected columns to a new table and automatically create relationships
Callout 2 Use button commands to rename a table, add a primary key, make an existing column a primary key, and undo the last action

You can use this wizard to do the following:

  • Convert a table into a set of smaller tables and automatically create a primary and foreign key relationship between the tables.
  • Add a primary key to an existing field that contains unique values, or create a new ID field that uses the AutoNumber data type.
  • Automatically create relationships to enforce referential integrity with cascading updates. Cascading deletes are not automatically added to prevent accidently deleting data, but you can easily add cascading deletes later.
  • Search new tables for redundant or duplicate data (such as the same customer with two different phone numbers) and update this as desired.
  • Back up the original table and rename it by appending "_OLD" to its name. Then, you create a query that reconstructs the original table, with the original table name so that any existing forms or reports based on the original table will work with the new table structure.

Top of Page Top of Page

Step 3: Connect to Access data from Excel

After the data has been normalized in Access and a query or table has been created that reconstructs the original data, it's a simple matter of connecting to the Access data from Excel. Your data is now in Access as an external data source, and so can be connected to the workbook through a data connection, which is a container of information that is used to locate, log on to, and access the external data source. Connection information is stored in the workbook and can also be stored in a connection file, such as an Office Data Connection (ODC) file (.odc file name extension) or a Data Source Name file (.dsn extension). After you connect to external data, you can also automatically refresh (or update) your Excel workbook from Access whenever the data is updated in Access.

For more information, see Overview of connecting (importing) data and Exchange (copy, import, export) data between Excel and Access.

Top of Page Top of Page

Get your data into Access

This section walks you through the following phases of normalizing your data: Breaking values in the Salesperson and Address columns into their most atomic pieces, separating related subjects into their own tables, copying and pasting those tables from Excel into Access, creating key relationships between the newly created Access tables, and creating and running a simple query in Access to return information.

Example data in non-normalized form

The following worksheet contains non-atomic values in the Salesperson column and the Address column. Both columns should be split into two or more separate columns. This worksheet also contains information about salespersons, products, customers, and orders. This information should also be split further, by subject, into separate tables.

Salesperson Order ID Order Date Product ID Qty Price Customer Name Address Phone
Li, Yale 2348 3/2/09 J-558 4 $8.50 Contoso, Ltd. 2302 Harvard Ave Bellevue, WA 98227 425-555-0222
Li, Yale 2348 3/2/09 B-205 2 $4.50 Contoso, Ltd. 2302 Harvard Ave Bellevue, WA 98227 425-555-0222
Li, Yale 2348 3/2/09 D-4420 5 $7.25 Contoso, Ltd. 2302 Harvard Ave Bellevue, WA 98227 425-555-0222
Li, Yale 2349 3/4/09 C-789 3 $7.00 Fourth Coffee 7007 Cornell St Redmond, WA 98199 425-555-0201
Li, Yale 2349 3/4/09 C-795 6 $9.75 Fourth Coffee 7007 Cornell St Redmond, WA 98199 425-555-0201
Adams, Ellen 2350 3/4/09 A-2275 2 $16.75 Adventure Works 1025 Columbia Circle Kirkland, WA 98234 425-555-0185
Adams, Ellen 2350 3/4/09 F-198 6 $5.25 Adventure Works 1025 Columbia Circle Kirkland, WA 98234 425-555-0185
Adams, Ellen 2350 3/4/09 B-205 1 $4.50 Adventure Works 1025 Columbia Circle Kirkland, WA 98234 425-555-0185
Hance, Jim 2351 3/4/09 C-795 6 $9.75 Contoso, Ltd. 2302 Harvard Ave Bellevue, WA 98227 425-555-0222
Hance, Jim 2352 3/5/09 A-2275 2 $16.75 Adventure Works 1025 Columbia Circle Kirkland, WA 98234 425-555-0185
Hance, Jim 2352 3/5/09 D-4420 3 $7.25 Adventure Works 1025 Columbia Circle Kirkland, WA 98234 425-555-0185
Koch, Reed 2353 3/7/09 A-2275 6 $16.75 Fourth Coffee 7007 Cornell St Redmond, WA 98199 425-555-0201
Koch, Reed 2353 3/7/09 C-789 5 $7.00 Fourth Coffee 7007 Cornell St Redmond, WA 98199 425-555-0201
Sousa, Luis 2354 3/7/09 A-2275 3 $16.75 Contoso, Ltd. 2302 Harvard Ave Bellevue, WA 98227 425-555-0222
Adams, Ellen 2355 3/8/09 D-4420 4 $7.25 Adventure Works 1025 Columbia Circle Kirkland, WA 98234 425-555-0185
Adams, Ellen 2355 3/8/09 C-795 3 $9.75 Adventure Works 1025 Columbia Circle Kirkland, WA 98234 425-555-0185
Li, Yale 2356 3/10/09 C-789 6 $7.00 Contoso, Ltd. 2302 Harvard Ave Bellevue, WA 98227 425-555-0222

Information in its smallest parts: atomic data

Working with the data in this example, you can use the Text to Column command in Excel to separate the "atomic" parts of a cell (such as street address, city, state, and postal code) into discrete columns.

The following table shows the new columns in the same worksheet after they have been split to make all values atomic. Note that the information in the Salesperson column has been split into Last Name and the First Name columns and that the information in the Address column has been split into Street Address, City, State, and ZIP Code columns. This data is in "first normal form."

Last Name First Name   Street Address City State ZIP Code
Li Yale 2302 Harvard Ave Bellevue WA 98227
Adams Ellen 1025 Columbia Circle Kirkland WA 98234
Hance Jim 2302 Harvard Ave Bellevue WA 98227
Koch Reed 7007 Cornell St Redmond Redmond WA 98199
Sousa Luis 2302 Harvard Ave Bellevue WA 98227

Breaking data out into organized subjects in Excel

The several tables of example data that follow show the same information from the Excel worksheet after it has been split into tables for salespersons, products, customers, and orders. The table design isn't final, but it's on the right track.

Watch the following demo to learn how to break your Excel data out into tables.


Show me Watch this

                             Video created by Office Online staff writers

The Salespersons table contains only information about sales personnel. Note that each record has a unique ID (SalesPerson ID). The SalesPerson ID value will be used in the Orders table to connect orders to salespeople.

Salespersons
Salesperson ID Last Name First Name
101 Li Yale
103 Adams Ellen
105 Hance Jim
107 Koch Reed
109 Sousa Luis

The Products table contains only information about products. Note that each record has a unique ID (Product ID). The Product ID value will be used to connect product information to the Order Details table.

Products
Product ID Price
A-2275 16.75
B-205 4.50
C-789 7.00
C-795 9.75
D-4420 7.25
F-198 5.25
J-558 8.50

The Customers table contains only information about customers. Note that each record has a unique ID (Customer ID). The Customer ID value will be used to connect customer information to the Orders table.

Customers
Customer ID Name Street Address City State ZIP Code Phone
1001 Contoso, Ltd. 2302 Harvard Ave Bellevue WA 98227 425-555-0222
1003 Adventure Works 1025 Columbia Circle Kirkland WA 98234 425-555-0185
1005 Fourth Coffee 7007 Cornell St Redmond WA 98199 425-555-0201

The Orders table contains information about orders, salespersons, customers, and products. Note that each record has a unique ID (Order ID). Some of the information in this table needs to be split into an additional table that contains order details so that the Orders table contains only four columns — the unique order ID, the order date, the salesperson ID, and the customer ID. The table shown here has not yet been split into the Order Details table.

Orders
Order ID Order Date SalesPerson ID Customer ID Product ID Qty
2348 3/2/09 101 1001 J-558 4
2348 3/2/09 101 1001 B-205 2
2348 3/2/09 101 1001 D-4420 5
2349 3/4/09 101 1005 C-789 3
2349 3/4/09 101 1005 C-795 6
2350 3/4/09 103 1003 A-2275 2
2350 3/4/09 103 1003 F-198 6
2350 3/4/09 103 1003 B-205 1
2351 3/4/09 105 1001 C-795 6
2352 3/5/09 105 1003 A-2275 2
2352 3/5/09 105 1003 D-4420 3
2353 3/7/09 107 1005 A-2275 6
2353 3/7/09 107 1005 C-789 5
2354 3/7/09 109 1001 A-2275 3
2355 3/8/09 103 1003 D-4420 4
2355 3/8/09 103 1003 C-795 3
2356 3/10/09 101 1001 C-789 5

Order details, such as the product ID and quantity are moved out of the Orders table and stored in a table named Order Details. Keep in mind that there are 9 orders, so it makes sense that there are 9 records in this table. Note that the Orders table has a unique ID (Order ID), which will referred to from the Order Details table.

The final design of the Orders table should look like the following:

Orders
Order ID Order Date SalesPerson ID Customer ID
2348 3/2/09 101 1001
2349 3/4/09 101 1005
2350 3/4/09 103 1003
2351 3/4/09 105 1001
2352 3/5/09 105 1003
2353 3/7/09 107 1005
2354 3/7/09 109 1001
2355 3/8/09 103 1003
2356 3/10/09 101 1001

The Order Details table contains no columns that require unique values (that is, there is no primary key), so it is okay for any or all columns to contain "redundant" data. However, no two records in this table should be completely identical (this rule applies to any table in a database). In this table, there should be 17 records — each corresponding to a product in an individual order. For example, in order 2349, three C-789 products comprise one of the two parts of the entire order.

The Order Details table should, therefore, look like the following:

Order Details
Order ID Product ID Qty
2348 J-558 4
2348 B-205 2
2348 D-4420 5
2349 C-789 3
2349 C-795 6
2350 A-2275 2
2350 F-198 6
2350 B-205 1
2351 C-795 6
2352 A-2275 2
2352 D-4420 3
2353 A-2275 6
2353 C-789 5
2354 A-2275 3
2355 D-4420 4
2355 C-795 3
2356 C-789 5

Copying and pasting data from Excel into Access

Now that the information about salespersons, customers, products, orders, and order details has been broken out into separate subjects in Excel, you can copy that data directly into Access, where it will become tables.

Watch the following demo to learn how to copy your Excel data into new tables in Access.


Show me Watch this

                             Video created by Office Online staff writers

Creating relationships between the Access tables and running a query

After you have moved your data to Access, you can create relationships between tables and then create queries to return information about various subjects. For example, you can create a query that returns the Order ID and the names of the salespersons for orders entered between 3/05/09 and 3/08/09.

In addition, you can create forms and reports to make data entry and sales analysis easier.

Watch the following demo to learn how to create relationships in Access and build and run a query.


Show me Watch this

                             Video created by Office Online staff writers

Top of Page Top of Page

 
 
Applies to:
Excel 2007