Chapter 10.1. Global rate limiting

Let’s build things up slowly and start by creating a single global rate limiter for our application. This will consider all the requests that our API receives (rather than having separate rate limiters for every individual client).

Instead of writing our own rate-limiting logic from scratch, which would be quite complex and time-consuming, we can leverage the x/time/rate package to help us here. This provides a tried-and-tested implementation of a token bucket rate limiter.

If you’re following along, please go ahead and download the latest version of this package like so:

$ go get golang.org/x/time/rate@latest
go: downloading golang.org/x/time v0.12.0
go: added golang.org/x/time v0.12.0

Before we start writing any code, let’s take a moment to explain how token-bucket rate limiters work. The official x/time/rate documentation says:

A Limiter controls how frequently events are allowed to happen. It implements a “token bucket” of size b, initially full and refilled at rate r tokens per second.

Putting that into the context of our API application, this means that our application would allow a maximum ‘burst’ of b HTTP requests in quick succession, but over time it would allow an average of r requests per second.

To create a token bucket rate limiter from x/time/rate, we need to use the NewLimiter() function, which has the following signature:

// Note that the Limit type is defined as a float64.
func NewLimiter(r Limit, b int) *Limiter

So if we want to create a rate limiter which allows an average of 2 requests per second, with a maximum of 4 requests in a single ‘burst’, we could do so with the following code:

// Allow 2 requests per second, with a maximum of 4 requests in a burst. 
limiter := rate.NewLimiter(2, 4)
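
If you want to see this token bucket behavior in isolation before we wire it into the API, here's a short standalone program (just an illustrative sketch, not part of our application code) which exercises the limiter directly:

package main

import (
    "fmt"
    "time"

    "golang.org/x/time/rate"
)

func main() {
    // Allow 2 requests per second, with a maximum of 4 requests in a burst.
    limiter := rate.NewLimiter(2, 4)

    // The bucket starts off full, so the first 4 calls to Allow() should
    // return true, and the remaining 2 should return false.
    for i := 1; i <= 6; i++ {
        fmt.Printf("call %d allowed: %v\n", i, limiter.Allow())
    }

    // After waiting a second, the bucket will have been refilled with 2 more
    // tokens, so this call should return true again.
    time.Sleep(time.Second)
    fmt.Printf("after 1 second, allowed: %v\n", limiter.Allow())
}

Running this should print true for the first four calls, false for the next two, and true again after the one-second pause while the bucket refills.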

Enforcing a global rate limit

OK, with that high-level explanation out of the way, let’s jump into some code and see how this works in practice.

One of the nice things about the middleware pattern that we are using is that it is straightforward to include ‘initialization’ code which only runs once when we wrap something with the middleware, rather than running on every request that the middleware handles.

func (app *application) exampleMiddleware(next http.Handler) http.Handler {
    
    // Any code here will run only once, when we wrap something with the middleware. 

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        
        // Any code here will run for every request that the middleware handles.

        next.ServeHTTP(w, r)
    })
}

In our case, we’ll make a new rateLimit() middleware method which creates a new rate limiter as part of the ‘initialization’ code, and then uses this rate limiter for every request that it subsequently handles.

If you’re following along, open up the cmd/api/middleware.go file and create the middleware like so:

File: cmd/api/middleware.go
package main

import (
    "fmt"
    "net/http"

    "golang.org/x/time/rate" // New import
)

...

func (app *application) rateLimit(next http.Handler) http.Handler {
    // Initialize a new rate limiter which allows an average of 2 requests per second, 
    // with a maximum of 4 requests in a single 'burst'.
    limiter := rate.NewLimiter(2, 4)

    // The function we are returning is a closure, which 'closes over' the limiter 
    // variable.
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Call limiter.Allow() to see if the request is permitted, and if it's not, 
        // then we call the rateLimitExceededResponse() helper to return a 429 Too Many
        // Requests response (we will create this helper in a minute).
        if !limiter.Allow() {
            app.rateLimitExceededResponse(w, r)
            return
        }

        next.ServeHTTP(w, r)
    })
}

In this code, whenever we call the Allow() method on the rate limiter, exactly one token is consumed from the bucket. If there are no tokens left in the bucket, then Allow() returns false, and that acts as the trigger for us to send the client a 429 Too Many Requests response.

It’s also important to note that the code behind the Allow() method is protected by a mutex and is safe for concurrent use.
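
For illustration, here's another standalone sketch (again, not part of our application code) which shares a single limiter across 10 goroutines. Because Allow() handles the locking internally, we don't need any synchronization of our own around it:

package main

import (
    "fmt"
    "sync"
    "sync/atomic"

    "golang.org/x/time/rate"
)

func main() {
    limiter := rate.NewLimiter(2, 4)

    var wg sync.WaitGroup
    var allowed atomic.Int64

    // Simulate 10 concurrent requests hitting the limiter at the same time.
    // Calling Allow() from multiple goroutines like this is safe.
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            if limiter.Allow() {
                allowed.Add(1)
            }
        }()
    }
    wg.Wait()

    // With a burst size of 4 and all the calls happening near-instantly,
    // this should normally report that 4 of the requests were allowed.
    fmt.Printf("%d of 10 requests allowed\n", allowed.Load())
}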

Let’s now go to our cmd/api/errors.go file and create the rateLimitExceededResponse() helper. Like so:

File: cmd/api/errors.go
package main

...

func (app *application) rateLimitExceededResponse(w http.ResponseWriter, r *http.Request) {
    message := "rate limit exceeded"
    app.errorResponse(w, r, http.StatusTooManyRequests, message)
}
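
This calls the generic errorResponse() helper that we created earlier in the book. If you don't have it to hand, it looks something like the sketch below, which assumes the envelope type and the writeJSON() and logError() helpers from earlier chapters (treat this as a reminder rather than a definitive listing):

func (app *application) errorResponse(w http.ResponseWriter, r *http.Request, status int, message any) {
    env := envelope{"error": message}

    // Write the response using the writeJSON() helper. If this returns an
    // error, log it and fall back to sending an empty 500 Internal Server
    // Error response.
    err := app.writeJSON(w, status, env, nil)
    if err != nil {
        app.logError(r, err)
        w.WriteHeader(http.StatusInternalServerError)
    }
}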

Then, lastly, in the cmd/api/routes.go file we want to add the rateLimit() middleware to our middleware chain. This should come after our panic recovery middleware (so that any panics in rateLimit() are recovered), but otherwise we want it to be used as early as possible to prevent unnecessary work for our server.

Go ahead and update the file accordingly:

File: cmd/api/routes.go
package main

...

func (app *application) routes() http.Handler {
    router := httprouter.New()

    router.NotFound = http.HandlerFunc(app.notFoundResponse)
    router.MethodNotAllowed = http.HandlerFunc(app.methodNotAllowedResponse)

    router.HandlerFunc(http.MethodGet, "/v1/healthcheck", app.healthcheckHandler)

    router.HandlerFunc(http.MethodGet, "/v1/movies", app.listMoviesHandler)
    router.HandlerFunc(http.MethodPost, "/v1/movies", app.createMovieHandler)
    router.HandlerFunc(http.MethodGet, "/v1/movies/:id", app.showMovieHandler)
    router.HandlerFunc(http.MethodPatch, "/v1/movies/:id", app.updateMovieHandler)
    router.HandlerFunc(http.MethodDelete, "/v1/movies/:id", app.deleteMovieHandler)

    // Wrap the router with the rateLimit() middleware.
    return app.recoverPanic(app.rateLimit(router))
}

Now we should be ready to try this out!

Restart the API, then in another terminal window execute the following command to issue a batch of 6 requests to our GET /v1/healthcheck endpoint in quick succession. You should get responses which look like this:

$ for i in {1..6}; do curl http://localhost:4000/v1/healthcheck; done
{
    "status": "available",
    "system_info": {
        "environment": "development",
        "version": "1.0.0"
    }
}
{
    "status": "available",
    "system_info": {
        "environment": "development",
        "version": "1.0.0"
    }
}
{
    "status": "available",
    "system_info": {
        "environment": "development",
        "version": "1.0.0"
    }
}
{
    "status": "available",
    "system_info": {
        "environment": "development",
        "version": "1.0.0"
    }
}
{
    "error": "rate limit exceeded"
}
{
    "error": "rate limit exceeded"
}

We can see from this that the first 4 requests succeed, thanks to our limiter being set up to permit a ‘burst’ of 4 requests in quick succession. But once those 4 tokens were used up, our API began to return the "rate limit exceeded" error response instead.

If you wait a second and rerun this command, you should find that some requests in the second batch succeed again, due to the token bucket being refilled at the rate of two tokens every second.
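
If you like, you can verify this by waiting and then sending a smaller batch of requests. The exact results will depend on timing, but you should see roughly two more successful responses for each second that you wait:

$ sleep 1; for i in {1..4}; do curl http://localhost:4000/v1/healthcheck; done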