Graceful shutdown of background tasks

Chapter 13.5.

Graceful shutdown of background tasks

Sending our welcome email in the background is working well, but there’s still an issue we need to address.

When we initiate a graceful shutdown of our application, it won’t wait for any background goroutines that we’ve launched to complete. So — if we happen to shut down our server at an unlucky moment — it’s possible that a new client will be created on our system but they will never be sent their welcome email.

Fortunately, we can prevent this by using Go’s sync.WaitGroup functionality to coordinate the graceful shutdown and our background goroutines.

An introduction to sync.WaitGroup

When you want to wait for a collection of goroutines to finish their work, the principal tool to help with this is the sync.WaitGroup type.

The way that sync.WaitGroup works is conceptually a bit like a ‘counter’. Each time you launch a background goroutine using the WaitGroup.Go() method it increments a counter by 1, and when each goroutine finishes, it decrements the counter by 1. You can then use the WaitGroup.Wait() method to block execution of your code and wait until the counter equals zero — at which point you know that all your background goroutines have finished.

Let’s take a quick look at a standalone example of how it works in practice.

In the code below, we’ll launch five goroutines that print out "hello from a goroutine", and use sync.WaitGroup to wait for them all to complete before the program exits.

package main

import (
	"log"
	"sync"
	"time"
)

func main() {
    // Declare a new WaitGroup.
    var wg sync.WaitGroup

    // Execute a loop 5 times.
    for range 5 {
        // Use the Go() method to launch a new goroutine, which sleeps for 1
        // second and then prints a log message.
        wg.Go(func() {
            time.Sleep(time.Second)
            log.Println("hello from a goroutine ")
        })
    }

    log.Println("all goroutines launched")

    // Wait() blocks until the WaitGroup counter is zero --- essentially blocking until all
    // goroutines have completed.
    wg.Wait()

    log.Println("all goroutines finished")
}

If you run the above code, you’ll see that the output looks like this:

2009/11/10 23:00:00 all goroutines launched
2009/11/10 23:00:01 hello from a goroutine 
2009/11/10 23:00:01 hello from a goroutine 
2009/11/10 23:00:01 hello from a goroutine 
2009/11/10 23:00:01 hello from a goroutine 
2009/11/10 23:00:01 hello from a goroutine 
2009/11/10 23:00:01 all goroutines finished

We can see from the log messages and timestamps that all the goroutines are launched, but the call to wg.Wait() blocks any further execution of our code until all the goroutines finish around 1 second later. Once they have all completed, the block is lifted and the final log message gets printed.

Fixing our application

Let’s update our application to incorporate a sync.WaitGroup that coordinates our graceful shutdown and background goroutines.

We’ll begin in our cmd/api/main.go file, and edit the application struct to contain a new sync.WaitGroup. Like so:

File: cmd/api/main.go

package main

import (
    "context"
    "database/sql"
    "flag"
    "log/slog"
    "os"
    "sync" // New import
    "time"

    "greenlight.alexedwards.net/internal/data"
    "greenlight.alexedwards.net/internal/mailer"

    _ "github.com/lib/pq"
)

...

// Include a sync.WaitGroup in the application struct. The zero value for a
// sync.WaitGroup type is a valid, usable, sync.WaitGroup with a 'counter' value of 0,
// so we don't need to do anything else to initialize it before we can use it.
type application struct {
    config config
    logger *slog.Logger
    models data.Models
    mailer *mailer.Mailer
    wg     sync.WaitGroup
}

...

Next let’s head to the cmd/api/helpers.go file and update the app.background() helper to launch goroutines using this sync.WaitGroup, like so:

File: cmd/api/helpers.go

package main

...

func (app *application) background(fn func()) {
    // Use the Go() method to launch the background goroutine. The code inside 
    // this is exactly the same as in the previous chapter --- we're just now 
    // launching the goroutine using the WaitGroup's Go() method instead of the 
    // regular go() keyword.
	app.wg.Go(func() {
		defer func() {
			pv := recover()
			if pv != nil {
				app.logger.Error(fmt.Sprintf("%v", pv))
			}
		}()

		fn()
	})
}

Then the final thing we need to do is update our graceful shutdown functionality so that it waits for any of these background goroutines to complete before terminating the application. We can do that by adapting our app.serve() method like so:

File: cmd/api/server.go

package main

...

func (app *application) serve() error {
    srv := &http.Server{
        Addr:         fmt.Sprintf(":%d", app.config.port),
        Handler:      app.routes(),
        IdleTimeout:  time.Minute,
        ReadTimeout:  5 * time.Second,
        WriteTimeout: 10 * time.Second,
        ErrorLog:     slog.NewLogLogger(app.logger.Handler(), slog.LevelError),
    }

    shutdownError := make(chan error)

    go func() {
        quit := make(chan os.Signal, 1)
        signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
        s := <-quit

        app.logger.Info("shutting down server", "signal", s.String())

        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()

        // Call Shutdown() on the server like before, but now we only send on the
        // shutdownError channel if it returns an error.
        err := srv.Shutdown(ctx)
        if err != nil {
            shutdownError <- err
        }

        // Log a message to say that we're waiting for any background goroutines to
        // complete their tasks.
        app.logger.Info("completing background tasks", "addr", srv.Addr)

        // Call Wait() to block until our WaitGroup counter is zero --- essentially
        // blocking until the background goroutines have finished. Then we return nil on
        // the shutdownError channel, to indicate that the shutdown completed without
        // any issues.
        app.wg.Wait()
        shutdownError <- nil
    }()

    app.logger.Info("starting server", "addr", srv.Addr, "env", app.config.env)

    err := srv.ListenAndServe()
    if !errors.Is(err, http.ErrServerClosed) {
        return err
    }

    err = <-shutdownError
    if err != nil {
        return err
    }

    app.logger.Info("stopped server", "addr", srv.Addr)

    return nil
}

To try this out, go ahead and restart the API and then send a request to the POST /v1/users endpoint immediately followed by a SIGTERM signal. For example:

$ BODY='{"name": "Edith Smith", "email": "edith@example.com", "password": "pa55word"}'
$ curl -d "$BODY" localhost:4000/v1/users & pkill -SIGTERM api &

When you do this, your server logs should look similar to the output below:

$ go run ./cmd/api
time=2023-09-10T10:58:12.124+02:00 level=INFO msg="database connection pool established"
time=2023-09-10T10:58:12.722+02:00 level=INFO msg="starting server" addr=:4000 env=development
...
time=2023-09-10T10:59:14.494+02:00 level=INFO msg="shutting down server" signal=terminated
time=2023-09-10T10:59:14.569+02:00 level=INFO msg="completing background tasks" addr=:4000
time=2023-09-10T10:59:16.348+02:00 level=INFO msg="stopped server" addr=:4000

Notice that after "completing background tasks" is logged there is a pause of a couple of seconds, while the background email sending completes, finally followed by the "stopped server" message?

This nicely illustrates how the graceful shutdown process waited for the welcome email to be sent (which took about two seconds in my case) before finally terminating the application.