Planning ahead around data migrations


Whatever applications or services you might use, you will create data. According to Domo’s Data Never Sleeps report for 2024, 251.1 million emails are sent, 18.8 million text messages are received and more than 5.9 million Google searches take place every minute of every day. For businesses, the data they create is essential to their operations, and the volume of that data is going up all the time.

While your applications might not see the same level of traffic as a Google or a Netflix, you will still have to consider how you manage your data over time. Eventually - whether because you need more space, you are replatforming your application, or you simply need to update your software and avoid end-of-life problems - you will have to move your data.

Planning for a data migration is a big deal. The potential impact of this kind of project can be massive. So how can you avoid problems and make migrations run as smoothly as possible?

Martin Visser, Valkey Tech Lead, Percona.

Planning Ahead

To make your migration process a success, the first step is to understand what you intend to move. By taking an inventory of what your system is built on and what it is connected to, you can create a list of dependencies that have to be supported as part of the migration. This can surface items that you had overlooked, as well as other updates that might be needed to complete your migration successfully.

For example, you may find that you have more instances of the system you want to migrate than you thought. This can include test and development environments for your apps, or other systems that were initially considered out of scope. Finding these before you carry out any move is essential, as you don’t want to deal with these issues when you are mid-way through a migration.

Similarly, review your implementations for any specific deployment patterns, and be aware of the requirements of the target state. Databases all scale differently - for instance, most relational databases are built around a single primary server instance. If they have to scale, then either a bigger machine is needed or additional read replicas can be added to serve more read requests. Others run as sharded environments, where many different nodes work in concert to serve a large dataset.
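
To make the primary/replica pattern concrete, here is a minimal sketch of read/write splitting, with writes going to the single primary and reads served by a replica. The hostnames, schema and the psycopg2 driver are illustrative assumptions, not a prescription for your environment:

```python
# Minimal sketch of read/write splitting across a primary and a read replica.
# Hostnames, credentials, schema and the psycopg2 driver are illustrative
# assumptions, not a recommendation for your environment.
import psycopg2

primary = psycopg2.connect(host="db-primary.internal", dbname="app", user="app")
replica = psycopg2.connect(host="db-replica.internal", dbname="app", user="app")

def save_order(order_id: int, total: float) -> None:
    # Writes must go to the single primary instance.
    with primary, primary.cursor() as cur:
        cur.execute(
            "INSERT INTO orders (id, total) VALUES (%s, %s)", (order_id, total)
        )

def get_order(order_id: int):
    # Reads can be served by a replica, taking load off the primary.
    with replica.cursor() as cur:
        cur.execute("SELECT id, total FROM orders WHERE id = %s", (order_id,))
        return cur.fetchone()
```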

Databases may also have specific capabilities to consider - for instance, Redis uses modules to supply additional functionality alongside its core in-memory database design. PostgreSQL also relies on extensions to the core database for further functionality, so any of these that are in place would also have to be updated during a migration.
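
Checking for these before the move can be as simple as querying each system for what it has loaded. As a hedged sketch (connection details are placeholders, using the psycopg2 and redis-py client libraries), something like this would list PostgreSQL extensions and Redis modules for your inventory:

```python
# Sketch: enumerate PostgreSQL extensions and Redis modules to feed into a
# migration inventory. Connection details are placeholders.
import psycopg2
import redis

# PostgreSQL: pg_extension is the system catalog of installed extensions.
pg = psycopg2.connect(host="db.internal", dbname="app", user="app")
with pg.cursor() as cur:
    cur.execute("SELECT extname, extversion FROM pg_extension ORDER BY extname")
    for name, version in cur.fetchall():
        print(f"postgres extension: {name} {version}")

# Redis: MODULE LIST reports any modules loaded beyond the core server.
r = redis.Redis(host="cache.internal", port=6379, decode_responses=True)
for module in r.module_list():
    print(f"redis module: {module['name']} v{module['ver']}")
```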

Alongside the IT infrastructure components that you have in place, you should carry out a performance evaluation to see how your system currently processes data. This could include tracking metrics like application throughput, latency and patterns in transaction volume over time. Getting these figures in advance of any move provides you with a baseline to compare against once you have completed that move. You can then use this data to plan ahead on any expected growth in traffic levels, or whether you might need to add more capacity as part of any migration.
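
How you capture that baseline depends on your monitoring stack, but even a simple probe gives you comparable numbers. The sketch below assumes a PostgreSQL source and the psycopg2 driver, and the query is a placeholder standing in for one of your own hot paths:

```python
# Sketch: record a simple latency baseline before a migration so you have
# hard numbers to compare against afterwards. The query, connection details
# and sample count are placeholders for your own hot paths.
import statistics
import time

import psycopg2

conn = psycopg2.connect(host="db.internal", dbname="app", user="app")
samples = []

for _ in range(1000):
    start = time.perf_counter()
    with conn.cursor() as cur:
        cur.execute(
            "SELECT count(*) FROM orders "
            "WHERE created_at > now() - interval '1 day'"
        )
        cur.fetchone()
    samples.append((time.perf_counter() - start) * 1000)  # milliseconds

samples.sort()
print(f"p50 latency: {statistics.median(samples):.1f} ms")
print(f"p95 latency: {samples[int(len(samples) * 0.95)]:.1f} ms")
print(f"p99 latency: {samples[int(len(samples) * 0.99)]:.1f} ms")
```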

Understanding The Full Inventory

Understanding the full inventory of components involved in the data migration is crucial. However, it is equally essential to have a clearly defined target and to communicate this target to all stakeholders. This includes outlining the potential implications of the migration for each stakeholder. The impact of the migration will vary significantly depending on the nature of the project. For example, a simple infrastructure refresh will have a much smaller impact than a complete overhaul of the database technology.

In the case of an infrastructure refresh, the primary impact might be a brief period of downtime while the new hardware is installed and the data is transferred. Stakeholders may need to adjust their workflows to accommodate this downtime, but the overall impact on their day-to-day operations should be minimal.

On the other hand, a complete change of database technology could have far-reaching implications. Stakeholders may need to learn new skills to interact with the new database, and existing applications may need to be modified or even completely rewritten to be compatible with the new technology. This could result in a significant investment of time and resources, and there may be a period of adjustment while everyone gets used to the new system.

Therefore, it is essential to have a clear understanding of the target environment and to communicate the potential implications of the migration to all stakeholders well in advance. This will help to ensure that everyone is prepared for the change and that the migration goes as smoothly as possible.

Making The Move

The golden rule for any major data migration project is to work step by step. Rather than a 'big bang' cut-over, isolate each change so that you can track your progress and easily roll back if you need to. Alongside this, you should carry out a full backup of your data so you have a version to fall back on, or to restore separately if something goes wrong. With some migrations, the process of moving back is difficult, so this backup is a necessary backstop in case of a failure.
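
The right backup tooling depends on your database. As one hedged example, a PostgreSQL deployment could take a timestamped logical backup with pg_dump before any cut-over; the host, database name and backup path below are placeholders:

```python
# Sketch: take a timestamped logical backup with pg_dump before migrating.
# The host, database name and backup path are placeholder assumptions.
import subprocess
from datetime import datetime

backup_file = f"/backups/app-{datetime.now():%Y%m%d-%H%M%S}.dump"

subprocess.run(
    [
        "pg_dump",
        "--host=db.internal",
        "--format=custom",   # compressed format, restorable with pg_restore
        "--file=" + backup_file,
        "app",               # database to back up
    ],
    check=True,  # fail loudly rather than migrating without a backup
)
print(f"backup written to {backup_file}")
```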

In the ideal scenario, you will have a complete mirror of your production environment, along with load generators and tests that cover all usage scenarios. This is notoriously difficult and expensive to achieve, and reaching 100% confidence requires a lot of effort. Even if you do get there, Murphy's Law says that something will still go wrong at some point. There are several techniques you can employ to improve your chances of success.

The first of these is a canary deployment. This involves looking at your systems and selecting one that you will migrate over first. This deployment can be used to see how successful the move is over time, and to help you find any potential problems before you move all your systems over to the new database. Just as the canary in a coal mine saves the miners, this initial migration shows you where problems exist and how to fix them before a complete change-over leads to more rework.
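
At the application level, a canary can be as simple as deterministically routing a small slice of traffic to the new database while everything else stays on the old one. The 5% split, the DSNs and the helper names below are illustrative assumptions:

```python
# Sketch: deterministically send a small slice of users to the new database
# during a canary phase. The 5% figure and the DSNs are arbitrary examples.
import hashlib

OLD_DSN = "postgresql://db-old.internal/app"  # placeholder
NEW_DSN = "postgresql://db-new.internal/app"  # placeholder
CANARY_PERCENT = 5

def use_new_database(user_id: str) -> bool:
    # Hashing the user ID keeps the same user on the same side for the whole
    # canary period, so their data access stays consistent.
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100 < CANARY_PERCENT

def pick_dsn(user_id: str) -> str:
    return NEW_DSN if use_new_database(user_id) else OLD_DSN
```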

This approach relies on your before-and-after metrics, so you can spot any discrepancies in performance as well as any failures or integration problems. This can reveal potential issues when you move - for example, many of those migrating away from MySQL 5.7 found that the supported version (MySQL 8.0) had worse performance than their previous deployment. That dip might be a problem for your specific application, but staying put is not an option in this case: MySQL 5.7 has reached its End of Life (EOL).

Although EOL support is available from specialist vendors to keep your systems going, relying on this merely postpones the inevitable. So it is worth investigating why performance is lower and where it can be fixed. Once you have checked your metrics and you are in a position you are comfortable with, you can then move other instances into production.

One additional consideration during the migration itself is how you track performance. Comparing reports or dashboards manually is time-consuming and hard work. To get around this, you can set automated alerts or rules for potential rollbacks. This approach involves creating specific triggers where a problem with your deployment leads to an automatic rollback to the previous deployment. This gives you room to pause when you run into a situation you did not expect, and to take the time to understand the problem.
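
As a hedged sketch of that idea, a simple watchdog can compare live metrics against the baseline you captured earlier and trigger a rollback after sustained breaches. The fetch_p95_latency_ms and rollback hooks below are hypothetical stand-ins for your own monitoring and deployment tooling, and the thresholds are examples rather than recommendations:

```python
# Sketch: automated rollback trigger. fetch_p95_latency_ms() and rollback()
# are hypothetical hooks into your monitoring and deployment tooling, and
# the thresholds are examples, not recommendations.
import time

BASELINE_P95_MS = 42.0       # taken from your pre-migration baseline
MAX_REGRESSION = 1.5         # tolerate up to a 50% latency regression
CONSECUTIVE_BREACHES = 3     # avoid rolling back on a single blip

def fetch_p95_latency_ms() -> float:
    raise NotImplementedError("query your monitoring system here")

def rollback() -> None:
    raise NotImplementedError("repoint traffic at the previous deployment")

breaches = 0
while True:
    if fetch_p95_latency_ms() > BASELINE_P95_MS * MAX_REGRESSION:
        breaches += 1
        if breaches >= CONSECUTIVE_BREACHES:
            rollback()
            break
    else:
        breaches = 0
    time.sleep(60)  # check once a minute
```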

Know Your Status

Alongside the technical aspects of a migration, there is another area that should be planned in advance. When you make a major change like a data migration, a communications plan that brings together everyone involved is essential. For data, this can involve multiple departments across IT, from the application developers responsible for the system through to the database and IT operations professionals who manage the deployment. However, this plan should also include the business teams that rely on that application, as they will be affected by the change too.

This plan provides a framework for talking through developments as they come up, and ensuring that everyone is aware of any incidents. As any unforeseen problems arise, the whole organization can be aware of the impact and how this might affect the migration plan. This can then keep migration plans on track, or get support for any amendments to that plan as needed.

Forrester points to how companies are expanding their operations based on data, and this relies on applications, infrastructure, people and processes as well as the physical data itself. As you plan any migration, you will have to take that mix of dependencies into consideration. By looking at data as part of that wider framework, you can plan ahead and ensure your migration is successful.


