17
Aug

Top 10 things to watch out for in a data migration project

networkingWhen undergoing a data migration, from one system to another, or from one database format to another, there are a number of common issues which crop up.  Any of these can cause the migration to fail.   These issues can be mitigated by careful planning before starting the migration project, and by comprehensive testing before the migration is signed off as complete.

The top 10 data migration issues most likely to cause problems are:

Dates and Times
By far, dates and times cause the biggest issue with any data migration project.  A large number of database types hold dates and times not as strings, but as an integer of the number of days since a given date or time, which means dates need to be converted as part of a data extraction.   In addition, dates can be held in a wide variety of formats including USA (mm/dd/yyyy), European (dd/mm/yyyy), long, short, etc.  Dates can also be invalid (such as 32/03/2010) or missing.

Column types
When migrating between different SQL based databases, not all database flavours support all database column formats.  So when migrating between Informix, Access, SQL Server, or Oracle, a column type in one database may not be available in the new database.  Careful planning will be required on what is the appropriate data type in the new database format.

Unlinked Data
SQL databases generally require data tables to link, with transaction data linking to reference data.  However, when migrating from one database format to another, and especially when migrating older data, the original links may be broken with reference data missing or changed.  This can cause integrity failures in the loading in the new database.

Encrypted Data
Whilst most modern databases are fairly open, when migrating from a suppliers older proprietary database, it is likely that some columns, tables or even the entire database will be encrypted.  This would then need the assistance of the original supplier (or reverse engineering of the encryption process) to provide the un-encryption software for data extraction.

Links to External Data
It cannot be assumed that all data required will be held in a single central database.  For a lot of older databases, references may be made to other external databases including spreadsheets, ODBC linked systems and even text look-ups.  Such links should be included in the data extract routines.

Incomplete Data
As computer systems evolve, so the columns in the original database will also evolve to provide space to hold additional data.  Due to this evolution and possible software issues with the original source system, some fields which are now mandatory may be missing from earlier data.  Rules should be defined for this eventuality.

Binary Data
Some databases include binary objects such as documents, images, sound and video files embedded within the database.   Consideration should be given to the value of this data, the method of migration, and if the data is actually usable once migrated.

Duplicate Entries
It is not uncommon for older systems to incorrectly allow the same records to be recorded more than once in the same file or table.   This can cause problems when migrating records both in terms of which ‘version’ should be migrated, but also the issue of primary key conflicts in the receiving system.

Transient Data
Transient data (temporary data or data that is being changed) can be an issue where the data is in the source database but should have been previously deleted (the temporary saving of order details whilst the entire order is then cancelled, but parts of the record are left in the system by mistake), or records which are being updated as the migration takes place.

Long Data
Finally, thought should be given to long data.   This can occur where data is currently held in unix or DOS text files and is being migrated to a structured data store.  Transferring long text or other data which can range in size from 1 byte to many hundreds of thousands of bytes needs careful planning in terms of data storage types and  method of migration.