Working with SQL migrations
Now that the migrate tool is installed, let’s illustrate how to use it by creating a new movies table in our database.
The first thing we need to do is generate a pair of migration files using the migrate create command. If you’re following along, go ahead and run the following command in your terminal:
$ migrate create -seq -ext=.sql -dir=./migrations create_movies_table
/home/alex/Projects/greenlight/migrations/000001_create_movies_table.up.sql
/home/alex/Projects/greenlight/migrations/000001_create_movies_table.down.sql
In this command:
- The -seq flag indicates that we want to use sequential numbering like 0001, 0002, ... for the migration files (instead of a Unix timestamp, which is the default).
- The -ext flag indicates that we want to give the migration files the extension .sql.
- The -dir flag indicates that we want to store the migration files in the ./migrations directory (which will be created automatically if it doesn’t already exist).
- The name create_movies_table is a descriptive label that we give the migration files to signify their contents.
If you look in your migrations directory, you should now see a pair of new ‘up’ and ‘down’ migration files like so:
./migrations/
├── 000001_create_movies_table.down.sql
└── 000001_create_movies_table.up.sql
At the moment these two new files are completely empty. Let’s edit the ‘up’ migration file to contain the necessary CREATE TABLE statement for our movies table, like so:
CREATE TABLE IF NOT EXISTS movies (
    id bigserial PRIMARY KEY,
    created_at timestamp(0) with time zone NOT NULL DEFAULT NOW(),
    title text NOT NULL,
    year integer NOT NULL,
    runtime integer NOT NULL,
    genres text[] NOT NULL,
    version integer NOT NULL DEFAULT 1
);
Notice here how the fields and types in this table are analogous to the fields and types in the Movie struct that we created earlier? This is important because it means we’ll be able to easily map the data in each row of our movies table to a single Movie struct in our Go code.
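As a quick refresher, a struct along the following lines maps neatly onto the table. This is just a sketch for illustration; the exact field names and types should match the Movie struct you defined earlier.

type Movie struct {
    ID        int64     // Maps to the bigserial id column.
    CreatedAt time.Time // Maps to the created_at column.
    Title     string    // Maps to the text title column.
    Year      int32     // Maps to the integer year column.
    Runtime   int32     // Maps to the integer runtime column.
    Genres    []string  // Maps to the text[] genres column.
    Version   int32     // Maps to the integer version column.
}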
If you’re not familiar with the different PostgreSQL data types in the SQL statement above, the official documentation provides a comprehensive overview. But the most important things to point out are that:
- The id column has the type bigserial, which is a 64-bit auto-incrementing integer starting at 1. This will be the primary key for the table.
- The genres column has the type text[], which is an array of zero-or-more text values. It’s important to note that arrays in PostgreSQL are themselves queryable and indexable, which is something that we’ll demonstrate later in the book. (See the sketch after this list for how to read an array column in Go.)
- You might remember from Let’s Go that working with NULL values in Go can be awkward, and where possible it’s easiest to just set NOT NULL constraints on every table column along with appropriate DEFAULT values — like we have above.
- For storing strings we’re using the text type, instead of the alternative varchar or varchar(n) types. If you’re interested, this excellent blog post explains why text is generally the best character type to use in PostgreSQL.
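To illustrate the point about array columns, here’s a minimal sketch of reading a genres value in Go using the pq.Array() adapter from the lib/pq driver. The table, column names, and DSN environment variable are assumed to match the examples in this chapter.

package main

import (
    "database/sql"
    "fmt"
    "log"
    "os"

    "github.com/lib/pq" // Registers the driver and provides pq.Array().
)

func main() {
    db, err := sql.Open("postgres", os.Getenv("GREENLIGHT_DB_DSN"))
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    var title string
    var genres []string

    // pq.Array() wraps the []string so that the driver knows how to scan
    // the PostgreSQL text[] value into it.
    row := db.QueryRow("SELECT title, genres FROM movies WHERE id = $1", 1)
    if err := row.Scan(&title, pq.Array(&genres)); err != nil {
        log.Fatal(err)
    }

    fmt.Printf("%s: %v\n", title, genres)
}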
Alright, let’s move on to the ‘down’ migration and add the SQL statement needed to reverse the ‘up’ migration that we just wrote.
DROP TABLE IF EXISTS movies;
The DROP TABLE command in PostgreSQL always removes any indexes and constraints that exist for the target table, so this single statement is sufficient to reverse the ‘up’.
Great, that’s our first pair of migration files ready to go!
While we are at it, let’s also create a second pair of migration files containing CHECK constraints to enforce some of our business rules at the database level. Specifically, we want to make sure that the runtime value is always greater than or equal to zero, the year value is between 1888 and the current year, and the genres array always contains between 1 and 5 items.
Again, if you’re following along, run the command below to create a second pair of migration files:
$ migrate create -seq -ext=.sql -dir=./migrations add_movies_check_constraints
/home/alex/Projects/greenlight/migrations/000002_add_movies_check_constraints.up.sql
/home/alex/Projects/greenlight/migrations/000002_add_movies_check_constraints.down.sql
And then add the following SQL statements to add and drop the CHECK constraints respectively:
ALTER TABLE movies ADD CONSTRAINT movies_runtime_check CHECK (runtime >= 0);

ALTER TABLE movies ADD CONSTRAINT movies_year_check CHECK (year BETWEEN 1888 AND date_part('year', now()));

ALTER TABLE movies ADD CONSTRAINT genres_length_check CHECK (array_length(genres, 1) BETWEEN 1 AND 5);
ALTER TABLE movies DROP CONSTRAINT IF EXISTS movies_runtime_check;

ALTER TABLE movies DROP CONSTRAINT IF EXISTS movies_year_check;

ALTER TABLE movies DROP CONSTRAINT IF EXISTS genres_length_check;
When we insert or update data in our movies table, if any of these checks fail our database driver will return an error similar to this:
pq: new row for relation "movies" violates check constraint "movies_year_check"
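If you want to detect these failures in your Go code, you can check whether the error is a *pq.Error and inspect its code and constraint name. Here’s a minimal sketch, assuming the lib/pq driver and the constraints that we created above:

package main

import (
    "database/sql"
    "errors"
    "fmt"
    "log"
    "os"

    "github.com/lib/pq"
)

func main() {
    db, err := sql.Open("postgres", os.Getenv("GREENLIGHT_DB_DSN"))
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Deliberately violate movies_year_check (the year must be 1888 or later).
    _, err = db.Exec(
        "INSERT INTO movies (title, year, runtime, genres) VALUES ($1, $2, $3, $4)",
        "Casablanca", 1066, 102, pq.Array([]string{"drama"}),
    )

    var pqErr *pq.Error
    if errors.As(err, &pqErr) && pqErr.Code.Name() == "check_violation" {
        // pqErr.Constraint holds the name of the failing constraint,
        // e.g. "movies_year_check".
        fmt.Printf("constraint %q violated\n", pqErr.Constraint)
        return
    }
    if err != nil {
        log.Fatal(err)
    }
}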
Executing the migrations
Alright, now we’re ready to run the two ‘up’ migrations against our greenlight database.
If you’re following along, go ahead and use the following command to execute the migrations, passing in the database DSN from your environment variable. If everything is set up correctly, you should see some output confirming that the migrations have been executed successfully. Similar to this:
$ migrate -path=./migrations -database=$GREENLIGHT_DB_DSN up
1/u create_movies_table (38.19761ms)
2/u add_movies_check_constraints (63.284269ms)
At this point, it’s worth opening a connection to your database and listing the tables with the \dt meta command:
$ psql $GREENLIGHT_DB_DSN
Password for user greenlight:
psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
greenlight=> \dt
List of relations
Schema | Name | Type | Owner
--------+-------------------+-------+------------
public | movies | table | greenlight
public | schema_migrations | table | greenlight
(2 rows)
You should see that the movies table has been created, along with a schema_migrations table, both of which are owned by the greenlight user.
The schema_migrations table is automatically generated by the migrate tool and used to track which migrations have been applied. Let’s take a quick look inside it:
greenlight=> SELECT * FROM schema_migrations;
version | dirty
---------+-------
2 | f
(1 row)
The version column here indicates that our migration files up to (and including) number 2 in the sequence have been executed against the database. The value of the dirty column is false, which indicates that the migration files were cleanly executed without any errors and the SQL statements they contain were successfully applied in full.
If you like, you can also run the \d meta command on the movies table to see the structure of the table and confirm that the CHECK constraints were created correctly. Like so:
greenlight-> \d movies
Table "public.movies"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+------------------------------------
id | bigint | | not null | nextval('movies_id_seq'::regclass)
created_at | timestamp(0) with time zone | | not null | now()
title | text | | not null |
year | integer | | not null |
runtime | integer | | not null |
genres | text[] | | not null |
version | integer | | not null | 1
Indexes:
"movies_pkey" PRIMARY KEY, btree (id)
Check constraints:
"genres_length_check" CHECK (array_length(genres, 1) >= 1 AND array_length(genres, 1) <= 5)
"movies_runtime_check" CHECK (runtime >= 0)
"movies_year_check" CHECK (year >= 1888 AND year::double precision <= date_part('year'::text, now()))
Additional information
Migrating to a specific version
As an alternative to looking at the schema_migrations table, if you want to see which migration version your database is currently on you can run the migrate tool’s version command, like so:
$ migrate -path=./migrations -database=$EXAMPLE_DSN version
2
You can also migrate up or down to a specific version by using the goto command:
$ migrate -path=./migrations -database=$EXAMPLE_DSN goto 1
Executing down migrations
You can use the down command to roll back by a specific number of migrations. For example, to roll back the most recent migration you would run:
$ migrate -path=./migrations -database=$EXAMPLE_DSN down 1
Personally, I generally prefer to use the goto command to perform rollbacks (as it’s more explicit about the target version) and reserve use of the down command for rolling back all migrations, like so:
$ migrate -path=./migrations -database=$EXAMPLE_DSN down
Are you sure you want to apply all down migrations? [y/N]
y
Applying all down migrations
2/d create_bar_table (39.988791ms)
1/d create_foo_table (59.460276ms)
Another variant of this is the drop command, which will remove all tables from the database including the schema_migrations table — but the database itself will remain, along with anything else that has been created like sequences and enums. Because of this, using drop can leave your database in a messy and unknown state, and it’s generally better to stick with the down command if you want to roll back everything.
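For completeness, the drop command follows the same pattern as the other commands (output omitted here; depending on your version of the tool it may ask for confirmation first):

$ migrate -path=./migrations -database=$EXAMPLE_DSN drop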
Fixing errors in SQL migrations
It’s important to talk about what happens when you make a syntax error in your SQL migration files, because the behavior of the migrate tool can be a bit confusing to start with.
When you run a migration that contains an error, all SQL statements up to the erroneous one will be applied and then the migrate tool will exit with a message describing the error. Similar to this:
$ migrate -path=./migrations -database=$EXAMPLE_DSN up
1/u create_foo_table (36.6328ms)
2/u create_bar_table (71.835442ms)
error: migration failed: syntax error at end of input in line 0: CREATE TABLE (details: pq: syntax error at end of input)
If the migration file which failed contained multiple SQL statements, then it’s possible that the migration file was partially applied before the error was encountered. In turn, this means that the database is in an unknown state as far as the migrate tool is concerned.
Accordingly, the version column in the schema_migrations table will contain the number of the failed migration, and the dirty column will be set to true. At this point, if you run another migration (even a ‘down’ migration) you will get an error message similar to this:
Dirty database version {X}. Fix and force version.
What you need to do is investigate the original error and figure out if the migration file which failed was partially applied. If it was, then you need to manually roll back the partially applied migration.
Once that’s done, then you must also ‘force’ the version number in the schema_migrations table to the correct value. For example, to force the database version number to 1 you should use the force command like so:
$ migrate -path=./migrations -database=$EXAMPLE_DSN force 1
Once you force the version, the database is considered ‘clean’ and you should be able to run migrations again without any problem.
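For example, if migration number 2 failed partway through and you’ve manually rolled back its partial changes, you would force the version back to 1 and then re-run the migrations:

$ migrate -path=./migrations -database=$EXAMPLE_DSN force 1
$ migrate -path=./migrations -database=$EXAMPLE_DSN up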
Remote migration files
The migrate tool also supports reading migration files from remote sources including Amazon S3 and GitHub repositories. For example:
$ migrate -source="s3://<bucket>/<path>" -database=$EXAMPLE_DSN up
$ migrate -source="github://owner/repo/path#ref" -database=$EXAMPLE_DSN up
$ migrate -source="github://user:personal-access-token@owner/repo/path#ref" -database=$EXAMPLE_DSN up
More information about this functionality and a full list of the supported remote resources can be found in the migrate documentation.
Running migrations on application startup
If you want, it is also possible to use the golang-migrate/migrate Go package (not the command-line tool) to automatically execute your database migrations on application startup.
We won’t be using this approach in the book, so if you’re following along please don’t change your code. But roughly, the pattern looks like this:
package main

import (
    "context"
    "database/sql"
    "flag"
    "fmt"
    "log"
    "net/http"
    "os"
    "time"

    "github.com/golang-migrate/migrate/v4"                   // New import
    "github.com/golang-migrate/migrate/v4/database/postgres" // New import
    _ "github.com/golang-migrate/migrate/v4/source/file"     // New import
    _ "github.com/lib/pq"
)

func main() {
    ...

    db, err := openDB(cfg)
    if err != nil {
        logger.Error(err.Error())
        os.Exit(1)
    }
    defer db.Close()

    logger.Info("database connection pool established")

    migrationDriver, err := postgres.WithInstance(db, &postgres.Config{})
    if err != nil {
        logger.Error(err.Error())
        os.Exit(1)
    }

    migrator, err := migrate.NewWithDatabaseInstance("file:///path/to/your/migrations", "postgres", migrationDriver)
    if err != nil {
        logger.Error(err.Error())
        os.Exit(1)
    }

    err = migrator.Up()
    if err != nil && err != migrate.ErrNoChange {
        logger.Error(err.Error())
        os.Exit(1)
    }

    logger.Info("database migrations applied")

    ...
}
Although this works — and it might initially seem appealing — tightly coupling the execution of migrations with your application source code can potentially be limiting and problematic in the longer term.
The article decoupling database migrations from server startup provides a good discussion on this, and I recommend reading it if this is a topic that you’re interested in. It’s Python-focused, but don’t let that put you off — the same principles apply in Go applications too.