A way to change a foreign key reference with zero downtime
In a project we recently completed, we had to refactor an object so that one of its foreign key associations referred to a new model. We use Ruby/Rails and deploy code to Heroku with Preboot enabled. Implementing this change safely without any downtime was surprisingly tricky to get right (in fact we didn’t at first, and ended up with a brief outage in a part of our product, although fortunately no data was lost). Pedro Belo’s post on hot compatibility lists some patterns to make basic data migrations zero-downtime deployment safe. As Belo mentions at the end of his post, hot compatibility needs to be addressed at the application level, and often takes a lot of planning and work. This article describes a reusable pattern to safely make a change to a foreign key reference with no downtime.
Consider a Rails model ABLink that refers to models A and B. The underlying table has two foreign key columns: a_id and b_id (note, these columns don’t have strictly enforced database level foreign key constraints); each ABLink object belongs to an object of model A and an object of model B:
class ABLink < ActiveRecord::Base
The class needed to be changed so that ABLink instead refers to A and NewB, a new model that has a one-to-one mapping with B.
A simple approach to change the reference is to deploy a single commit that includes the following:
- A migration to change all values in the b_id column to their corresponding NewB ID values.
- Changes to the ABLink model code for the belongs_to association to refer to NewB instead.
- Changes to all controller code that reads from or writes to ABLink to create objects referring to NewB instead of B.
However, this method is not zero-downtime safe. Consider the following sequence of events:
- All new code gets pushed to production.
- A user accesses the control route that creates ABLink objects and adds a new object that refers to a NewB object.
- The migration that is supposed to runs fails at this new record since the the value in the b_id column is already the ID of a NewB object instead of the ID of a B object that the migration expected.
- All the controllers/views that are now changed to read the linking objects and use the association don’t work (for us it was an important portion of our product).
There are a lot of approaches that are similarly prone to race conditions during deployment that can cause downtime. The pattern that we were able to finally employ to safely make this change had five steps:
- In one deploy, include a commit that migrates the database to add a new column called new_b_id to the table underlying ABLink.
- In another deploy, include a commit that adds the new association to the model in rails. Also in this commit, add some code in controllers that update/add linking model objects to double-populate both the original association and the new association.
- Run a script in the console that backfills the new association for all the objects that don’t have it populated. Note, we chose to do this in a script but this can also be done in a data migration with the second deploy.
- In a third deploy, add code to change the controllers to only use the new association and remove the double-population. Update the model to follow the common pattern of adding code to reject the old columns.
- In a final deploy, remove the b_id column from the table in a migration.
This method is generally applicable to any foreign key relationship that needs to refer to a new column and has proven to be very useful for us to refactor our models with zero downtime. Also, although we used this in the context of Rails, the general approach to fix the problem can apply to any web framework.
Sound fun? We’re hiring!