Automatic Rails Migrations in Production – yes or no?

This is a technical post from one of our developers. For more engineering posts, check out our code posts.

At TINT, the engineering team ends each week with a retrospective: a meeting recapping what we liked and didn’t like about the week of work. Did someone break the build? Did we have a p1 bug? I really like how retrospectives fit into TINT culture; they encourage fearless feedback in a low-pressure and supportive environment.

In our last retrospective, the TINT engineering team threw around the idea of doing automatic migrations to production. I was skeptical during the short discussion, but we like to keep our retrospectives from dragging, so I took an action item to review what’s considered the best practice.

The Pros and Cons of Automatic Rails Migration in Production

I don’t find writing a long list of pros and cons most helpful, since that is not always the reality of making a technical decision like this. Instead of listing a slew of pros and cons, I’ll try to focus on weighing the biggest pro and biggest con, starting with my own experiences and viewpoints. Then we will see how it stacks up with Internet Wisdom:

YES: Automatic Migration makes for less manual work during deploys. Less manual work is ALWAYS a best practice during deploys and why I advocate one-script deploys over one-click deploys.

NO: Migrations can break production too easily. Even if the migration worked in development, migrations can act differently on test vs production. If your test database is smaller, less loaded, with a less beefy server or RDS configuration then a deploy during production can definitely be a pain.

One yes, one no — again these are the biggest reasons, in my mind, that automatic migrations should or should not be used in production.

Context Matters

In order to make an informed decision, we must consider the nature of the migrations, the size of the tables, and the database engine. For TINT, the migrations are Rails migrations, and are usually adding a column or adding a table. We have some decently large tables with tens of millions of rows as well as many more with millions of rows. This type of setup is typical of a Not Big Data shop. Rails migrations, Django migrations or your favorite ORM are likely to be similar.

Database Engines

The database engine is going to have a pretty big impact. On a spectrum of “no brainer” to “too fragile to touch,” the two common setups stack up thusly:

PostgreSQL

PostgreSQL is highly capable of safe, simple and no-brainer migrations. Since the early days of PostgreSQL, schema changes are isolated in a transaction, and can be interrupted and rolled back without any impact.

 

Recent PostgreSQL can even create indexes without locking tables, although you have to write the SQL and use “execute” to do this. There is a “gotcha” with adding columns with default values: this can cause a table rewrite, and may have to be split into multiple steps. (create column allowing NULLs, update all values — maybe using some of the advanced CTE-based techniques for doing so concurrently — and finally altering the column).

MySQL

MySQL, on the other hand, lacks support for transactions on schema changes. If a migration fails to apply, you as the poor guy or gal wearing your DBA hat today get to pick up all of the pieces.

What’s worse, MySQL will basically take the work-longer-not-smarter approach and rewrite and lock tables for almost every schema operation. That means schema operations scale proportionally to the number of rows. Millions of rows = minutes, hundreds of millions of rows = hours.

The only truly safe operation is creating a table without constraints (which for better or worse is de riguer on Rails), or operations on very small tables.

TINT is on MySQL. Thats the horse pulling our cart right now; so how does that affect the weight of YES and NO for us?

Before we answer that, lets take a look at Internet Wisdom.

Internet Wisdom on Automatic Database Migrations

At first, my Google-fu was only finding results that assumed automatic migrations are good to use in production but did not address any specific pain points of rails and/or mysql.

In Database Migrations and Java brings up a good point:

 

  • “Why do we actually need such a solution? We aim for frequent, one-click deployments to staging and production environments. We can’t do that without an automated way of syncing the database”

 

In this StackOverflow question, the assumption is that automatic migrations are good, at least for the kind of app one runs on Heroku. One answer makes some insightful points, that you can separate your schema commits from the code that needs them, to allow a push/migrate/push model.

The next article, Practical continuous deployment, doesn’t even mention SQL, databases or migrations; but it does give a meaty introduction into how and why to implement CI (with an emphasis on Atlassian’s panoply of solutions). Key points are that automation of deployments is key for continuous integration and delivery. My takeaway: Continuous deployment is the logical extension of automatic migrations.

And so with a better perspective of what I’m really after, I modified my search terms to “Continuous deployment rails migrations” which yielded some on-point articles:

This article: The biggest obstacle to start with Continuous Deployment – database migrations, describes techniques to write migrations that are safe to auto-deploy. It also features a good list of increasing levels of automation to shoot for with respect to migrations.

The next article, Rails Migrations with Zero Downtime, goes in depth on how to structure Rails migrations to avoid any downtime (although the article focuses on Postgres instead of Mysql).

My goal was to answer the question “Automatic Rails Migrations in Production – yes or no?” and these articles have given the details I needed to reach a conclusion. After an enjoyable morning reading and researching, here is how I weigh things for TINTs environment:

YES – automatic deploys get us closer to the goal of continuous delivery but we need to mitigate and have a good process (read: script 🙂 ) for when less safe migrations need to be run.