Information

Data Cleaning:

If you're cleaning data in excel or access remember to retain the original database key so you always have a route forward and back.





Information





Data Cleaning

Data Migration

A surprising number of projects experience delays because the data isn't ready to migrate to a new system. Data should be prepared and understood as early in the project as is practical. Never underestimate the task.

For simplicity, data can be categorised into three types

  1. critical data
  2. good to have data
  3. data you don't want / need

Only the critical data should hold up the project, i.e. concentrate on getting the critical data right and don't get diverted trying to prepare good to have data. Good to have data such as historical transactions can be problematic and difficult to map into a new system and there may be dubious value in doing so.

Existing databases

Your existing database is likely to contain records which you no longer need. You should filter these out of your critical data as part of the cleaning process. This can reduce the amount of data to clean and cut conversion costs.

One way to filter critical data is based on usage over a certain period of time, i.e. only vendors used within the last x months. The data outside these parameters may be good to have but not critical to the success of the project. You could decide to set these records up as new were you to buy from them again in the future when you're live with your new system.

Understanding your data

Understanding your data early in the project means you will be better prepared. Your project plan will be better informed with costs and resources, and you can sooner reach decisions about your data and what will be in scope. i.e. you will reduce your risk.

It will also mean you can begin sample data cleaning, mapping and to test data conversions, particularly for open items. This will reduce inherent risk even further. It will also highlight key areas where decisions need to be made and agreed between departments. This can include items like coding conventions, product groupings etc.

Also, you may have multiple sources for the same data, for example you might have a number of different systems containing customer data. You need to decide what your prime data source is going to be and what information, if any, you may need from the other sources.

Things to watch out for

Fields in your old system may not exactly match fields in the new system. This means the length of fields and the values they can contain might not be the same. The data will not fit into your new system and the conversion will not work.

Examples where this may arise are where a field used to allow alphanumeric values and will now only allow numeric. A description field used to allow 120 characters and in your new system only allows 100. More specifically, telephone numbers in the old system may contain extensions (i.e. EXT 123) or account numbers (i.e. A/C XYZ) or text (Tues pm). The new system will only allow telephone numbers to be numeric. You will need to separate the data in the old system before importing.

Special characters

Conversion programs are unlikely to be able to understand the context in which special characters are used in your old data. They will interpret the character as the start or end of a record. This will throw the field alignment out and the conversion won't work.

Special characters usually appear in address or description fields. You can use the standard find and replace functions in most applications to weed these out. Typical characters which may cause conversion problems include commas (,), text qualifiers ("), inverted commas (') and the asterisk (*).

Other considerations

The format of data records extracted from your legacy system may have other pecularities you need to address. The record may include whitespace, i.e. blank characters so the record is the same length as the field. You would need to trim these before import into the new system. Dates may be extracted as a text string which you need to reconvert to date format. Depending on how the financial periods have been configured in your new system you might not be able to date open items correctly and would need to amend them or have the software configured to allow them.

In Summary

Don't underestimate the time it can take to prepare your data. File formats extracted from legacy systems may not be in standard database format and may require further manipulation. The earlier you look at your data and understand it the more you'll accomplish in house and the more control you will be able to exert over your timetable and budget.

»Go to the resources section for specific examples


SAP Oracle Sage Coda Microsoft Dynamics
©newITsystem.com 2010, all rights reserved ::

Design: Eriginal