Mastering MongoDB Migrations: Automating Schema Changes Like a Pro

As a full-stack developer, one of the most crucial yet challenging aspects of building and maintaining applications is evolving the database schema over time. And if you're working with a flexible NoSQL database like MongoDB, schema changes can be even more frequent and complex than in traditional relational databases.

Consider these statistics:

  • According to a 2019 Stack Overflow survey, nearly 60% of developers have less than 5 years of professional coding experience. This means the majority of developers today are working on rapidly evolving applications and dealing with frequent schema changes.

  • The 2021 State of Database DevOps report found that 75% of organizations deploy database changes daily, weekly, or bi-weekly. And for applications built on NoSQL databases like MongoDB, that number is likely even higher.

  • A 2020 Couchbase survey of 450 IT leaders found that 61% of enterprises are either currently using or planning to use multiple different NoSQL databases. Each of these databases – including MongoDB, Couchbase, Cassandra, and others – has its own approach to schema design and migration.

All of this underscores the importance of having a solid strategy in place for managing schema changes, especially in a polyglot persistence environment. Making these changes manually is simply not feasible beyond the earliest stages of an application.

The Perils of "Lazy Migration"

One common approach I've seen developers take, especially those coming from a NoSQL background, is to write application code that transforms documents to the latest schema version at runtime – sometimes referred to as "lazy migration".

In this model, documents are validated and potentially reshaped in the application layer as they are loaded from the database. And when saving changes to a document, the code ensures it conforms to the newest schema before persisting.
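
As a rough illustration, a lazy-migration read path might look something like the sketch below (the field names anticipate the example used later in this article, and the helper function is hypothetical):

// Hypothetical sketch of lazy migration: every read path needs to know
// how to upgrade older document shapes before the rest of the code can use them.
async function loadUser(users, id) {
  const doc = await users.findOne({ _id: id });
  if (!doc) return null;

  // v1 documents stored a single "name" field; newer code expects firstName/lastName.
  if (doc.name && !doc.firstName) {
    const [firstName, lastName] = doc.name.split(" ");
    doc.firstName = firstName;
    doc.lastName = lastName;
    delete doc.name;
  }

  // v2 documents may be missing "email" entirely.
  if (doc.email === undefined) {
    doc.email = "";
  }

  return doc; // only persisted in the new shape when (and if) it is saved again
}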

While this "schema-less" approach can work for simple applications, in my experience it quickly leads to several problems:

  1. Increased complexity and duplication of migration logic spread throughout the codebase
  2. Lack of visibility into what schema version the database is actually in at any point in time
  3. Risk of data inconsistency if older code interacts with the database
  4. Tightly coupling the application and database layers
  5. Inability to easily roll back changes

I once worked on an application that relied heavily on lazy migrations. We had to load documents, check for the existence of numerous fields, reshape them, and merge updates across the entire application. Reasoning about the state of data and tracking down bugs became a nightmare.

After months of struggling with this approach, we decided to bite the bullet and move to a more formal, scripted migration approach. It was a painful refactor, but the long-term benefits were more than worth it.

A Better Way: Scripted Migrations

The scripted migration approach aims to keep the database schema in sync across all environments by defining migrations as a series of sequential scripts that are executed against the database.

Here's a high-level diagram of how it works:

graph LR
A[Migration Scripts] --> B[Local Database]
A --> C[Test Database]
A --> D[Staging Database]
A --> E[Production Database]

The key components are:

  1. Migration scripts: Each script defines an atomic, reversible schema change. It includes an up function to apply the change and a down function to undo it.

  2. Migration runner: A tool that executes the migration scripts against a target database in the correct order. It tracks which migrations have been run to ensure consistency.

  3. Migration state: Metadata stored somewhere (usually in the database itself) that keeps track of what version the schema is currently in by recording the last migration that was run.

With this approach, the database schema is versioned and evolved in a controlled manner, completely separate from application code. Migrations can be tested and rolled back if needed. And the database always represents a single, consistent version.

Let's walk through an example in MongoDB to illustrate.

Migrating MongoDB with node-migrate

For this example, we'll use the popular node-migrate library to automate migrations for a MongoDB database.
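
If you want to follow along, node-migrate is published on npm under the name migrate, so setting it up typically just means adding it to the project alongside the official MongoDB driver:

$ npm install migrate mongodb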

Suppose we have a users collection where each document looks like this:

{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "name": "John Smith",
  "age": 30
}

We want to make two schema changes:

  1. Split the name field into separate firstName and lastName fields
  2. Add a new email field

Here's what the migration scripts would look like using node-migrate:

// 1-split-name.js

const { MongoClient } = require("mongodb");

const url = "mongodb://localhost:27017";

exports.up = function (next) {
  MongoClient.connect(url)
    .then((client) =>
      client
        .db("mydb")
        .collection("users")
        // Split "name" on the space character into firstName/lastName, then
        // drop the original field (pipeline-style update, MongoDB 4.2+).
        .updateMany({ name: { $exists: true } }, [
          {
            $set: {
              firstName: { $arrayElemAt: [{ $split: ["$name", " "] }, 0] },
              lastName: { $arrayElemAt: [{ $split: ["$name", " "] }, 1] },
            },
          },
          { $unset: ["name"] },
        ])
        .then(() => client.close())
    )
    .then(() => next())
    .catch(next);
};

exports.down = function (next) {
  MongoClient.connect(url)
    .then((client) =>
      client
        .db("mydb")
        .collection("users")
        // Recombine firstName/lastName into a single "name" field.
        .updateMany(
          { firstName: { $exists: true }, lastName: { $exists: true } },
          [
            { $set: { name: { $concat: ["$firstName", " ", "$lastName"] } } },
            { $unset: ["firstName", "lastName"] },
          ]
        )
        .then(() => client.close())
    )
    .then(() => next())
    .catch(next);
};

// 2-add-email.js

const { MongoClient } = require("mongodb");

const url = "mongodb://localhost:27017";

exports.up = function (next) {
  MongoClient.connect(url)
    .then((client) =>
      client
        .db("mydb")
        .collection("users")
        // Add an empty email field to any document that doesn't have one yet.
        .updateMany({ email: { $exists: false } }, { $set: { email: "" } })
        .then(() => client.close())
    )
    .then(() => next())
    .catch(next);
};

exports.down = function (next) {
  MongoClient.connect(url)
    .then((client) =>
      client
        .db("mydb")
        .collection("users")
        // Remove the email field to revert the change.
        .updateMany({ email: { $exists: true } }, { $unset: { email: "" } })
        .then(() => client.close())
    )
    .then(() => next())
    .catch(next);
};

Let's break down what's happening here:

In the first migration (1-split-name.js), the up function does a few things:

  1. Connects to the MongoDB database using the official mongodb driver
  2. Runs an updateMany operation on the users collection, matching every document that has a name field
  3. Uses MongoDB's aggregation pipeline (a pipeline-style update) to split the name string on the space character into firstName and lastName fields
  4. Unsets the original name field
  5. Closes the connection and calls next to signal completion (errors are passed to next via the catch handler)

The down function does the reverse to allow rolling back the migration. It concatenates firstName and lastName back into a single name field and unsets the individual fields.

The second migration (2-add-email.js) is simpler. The up function adds an empty email field to any document that doesn't already have one. The down function unsets the email field to revert the change.

With these migration scripts in place, we can run them using node-migrate:

$ npx migrate up
[INFO] Migrating up to 2-add-email.js
[INFO] Successfully migrated up to 2-add-email.js

This will run the up function of both migration scripts in order, bringing our database schema to the latest version.

We can also migrate back down if needed:

$ npx migrate down
[INFO] Migrating down to 1-split-name.js
[INFO] Successfully migrated down 1 migration

The down function of the last migration will be run, reverting the schema to the previous version.

Behind the scenes, node-migrate is storing the migration state in a .migrate file that keeps track of the last migration that was run:

{
  "lastRun": "2-add-email",
  "migrations": [
    {
      "title": "1-split-name",
      "timestamp": 1620260103075
    },
    {
      "title": "2-add-email", 
      "timestamp": 1620260113200
    }
  ]
}

Migration Best Practices

Of course, not all migrations are as simple as splitting a field or adding a new one. And running them in production requires extra care and planning. Here are some best practices I've learned from my experience managing database migrations:

  1. Keep migrations small and focused. Each migration should ideally make a single, specific change to the schema. This makes it easier to test, reason about, and revert if needed.

  2. Make migrations idempotent. Migrations should be safe to run multiple times without causing issues. This is especially important in production, where a failed migration might need to be retried.

  3. Write non-destructive migrations when possible. Avoid migrations that destroy data or make irreversible changes. If you need to drop a field, consider migrating the data to a new field or collection first.

  4. Test migrations thoroughly. Migrations should be tested just like any other code change. Write unit tests to verify the up and down functions work as expected. Also run migrations against a staging environment that mirrors production before deploying.

  5. Have a rollback plan. Even with testing, there's always a chance a migration could cause issues in production. Have a plan for quickly rolling back migrations if needed, whether that's running the down function or restoring from a backup.

  6. Consider the performance impact. Migrations that touch a large number of documents or make complex updates can take a long time to run and degrade database performance. Break large migrations into smaller batches and run them during low-traffic periods (see the sketch after this list).

  7. Automate the migration process. Running migrations should be a fully automated process that can be executed as part of a CI/CD pipeline. This ensures the database schema is always in sync with the version of the application code being deployed (see the example at the end of this section).

  8. Document migrations. Each migration should include a brief description of what it does and why it's needed. Consider including a reference to the relevant issue or pull request. This helps team members understand the history of schema changes over time.
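
To make points 2 and 6 more concrete, here is a rough, hypothetical sketch of a batched migration, re-using the email example from earlier (the batch size, database name, and connection string are placeholders). Because the filter only matches documents that still lack the field, the migration is also idempotent: re-running it after a partial failure simply picks up where it left off.

// batched-add-email.js (illustrative sketch only)

const { MongoClient } = require("mongodb");

const url = "mongodb://localhost:27017";
const BATCH_SIZE = 1000;

exports.up = function (next) {
  MongoClient.connect(url)
    .then(async (client) => {
      const users = client.db("mydb").collection("users");

      // Only documents that still need the change are selected, so the
      // migration can safely be re-run at any time.
      const cursor = users.find(
        { email: { $exists: false } },
        { projection: { _id: 1 } }
      );

      let batch = [];
      for await (const doc of cursor) {
        batch.push({
          updateOne: {
            filter: { _id: doc._id },
            update: { $set: { email: "" } },
          },
        });
        if (batch.length === BATCH_SIZE) {
          await users.bulkWrite(batch); // flush one batch at a time
          batch = [];
        }
      }
      if (batch.length > 0) {
        await users.bulkWrite(batch);
      }

      await client.close();
    })
    .then(() => next())
    .catch(next);
};

The down function is omitted here for brevity; in a real migration it would remove the field in the same batched fashion.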

Following these guidelines has helped my team keep our MongoDB schema in a consistent state across environments, catch issues before production, and recover from the occasional bad migration quickly.
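
On the automation point (item 7 above), the lowest-friction approach is often to wire the migration runner into the scripts your pipeline already runs. Here is a minimal, hypothetical package.json fragment; the deploy script path is a placeholder for whatever your actual deployment step is:

{
  "scripts": {
    "migrate": "migrate up",
    "deploy": "npm run migrate && node ./scripts/deploy.js"
  }
}

A CI/CD job then only needs to call npm run deploy, which guarantees the schema is migrated before the new application code goes live.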

Real-World Benefits

To illustrate the power of automated database migrations, let me share a quick story from my experience.

A few years ago, I was working on a large Node.js application that used MongoDB as its primary data store. We had a collection of over 10 million documents storing user activity logs.

After running the application for a few months, we realized we needed to make a change to how we were storing these activity entries to support some new query patterns. We wanted to nest related entries under a single parent document to make it easier to retrieve a user's full activity history.

This was a significant change that would touch every document in the collection. Doing it manually would have been error-prone and taken days, if not weeks.

Fortunately, we had already been using node-migrate to automate our database migrations. I was able to write a migration script that used MongoDB's aggregation pipeline to reshape the activity data in a matter of hours.

We thoroughly tested the migration against a copy of our production data to verify the result and check the performance impact. We then scheduled the migration to run over a weekend when traffic was low.

The migration ended up touching over 10 million documents and took a few hours to complete, but it ran without any issues. On Monday morning, our application was able to take advantage of the new schema without any downtime or bugs.

This experience demonstrated to me the power of automated database migrations, especially for large, complex schema changes. It's a night-and-day difference from the manual, risky process I had dealt with earlier in my career.

Conclusion

Evolving your database schema is an inevitable part of building modern applications. But it doesn't have to be a painful, error-prone process.

By adopting a scripted migration approach and using tools like node-migrate, you can automate the process of making and testing schema changes. This ensures your MongoDB database is always in a consistent state and reduces the risk of data loss or downtime.

The key principles to follow are:

  1. Define migrations as a series of small, focused, reversible scripts
  2. Test migrations thoroughly and consider their performance impact
  3. Automate the process of running migrations as part of your deployment pipeline
  4. Have a plan for rolling back bad migrations if issues occur in production

Of course, the specific tools and processes you use will depend on your application's needs and the other technologies in your stack. But the general concepts I've covered here should be applicable to most MongoDB projects.

Adopting automated database migrations has been a game-changer for my team. I encourage you to give it a try on your next project. The upfront investment will more than pay for itself in reduced bugs, easier onboarding, and improved data quality over time.

If you found this article helpful, you can find all the code samples and a full node-migrate tutorial in my GitHub repository. Feel free to star it, submit issues, or even contribute a pull request!

What's your experience been with MongoDB migrations? Do you have any other tips or best practices to share? Let me know in the comments.
