IP-based rate limiting
Using a global rate limiter can be useful when you want to enforce a strict limit on the total rate of requests to your API, and you don’t care about where the requests are coming from. But it’s generally more common to want an individual rate limiter for each client, so that one bad client making too many requests doesn’t affect all the others.
A conceptually straightforward way to implement this is to create an in-memory map of rate limiters, using the IP address for each client as the map key.
Each time a new client makes a request to our API, we will initialize a new rate limiter and add it to the map. For any subsequent requests, we will retrieve the client’s rate limiter from the map and check whether the request is permitted by calling its Allow() method, just like we did before.
But there’s one thing to be aware of: by default, maps are not safe for concurrent use. This is a problem for us because our rateLimit() middleware may be running in multiple goroutines at the same time (remember, Go’s http.Server handles each HTTP request in its own goroutine).
From the Go blog:
Maps are not safe for concurrent use: it’s not defined what happens when you read and write to them simultaneously. If you need to read from and write to a map from concurrently executing goroutines, the accesses must be mediated by some kind of synchronization mechanism.
So to get around this, we’ll need to synchronize access to the map of rate limiters using a sync.Mutex (a mutual exclusion lock), so that only one goroutine can read or write to the map at any moment in time.
Now let’s talk about IP addresses.
The request’s r.RemoteAddr field should contain the IP address of the client making the request. But… in the real world it’s possible that there will be proxy servers positioned between your application and the client, meaning that the IP address stored in r.RemoteAddr may not actually be the true IP address of the original client — instead it might be the IP address of a proxy.
Well-behaved proxies will typically add a X-Forwarded-For or X-Real-IP header to the request, containing the IP of the original client. So we can increase the chance of getting the real client’s IP by checking for these headers and — if they exist — using the IP address from them.
Although we could write the logic to do this ourselves, I recommend using the realip package to help with this. It’s very small, and simply retrieves the client IP address from any X-Forwarded-For or X-Real-IP headers, falling back on r.RemoteAddr if neither of them is present.
If you’re following along, go ahead and install the latest version of realip using the go get command:
$ go get github.com/tomasen/realip@latest
go: downloading github.com/tomasen/realip v0.0.0-20180522021738-f0c99a92ddce
go get: added github.com/tomasen/realip v0.0.0-20180522021738-f0c99a92ddce
OK, with that setup out of the way, let’s jump into the code and update our rateLimit() middleware to implement the changes.
package main

import (
    "fmt"
    "net"      // New import
    "net/http"
    "sync"     // New import

    "github.com/tomasen/realip" // New import

    "golang.org/x/time/rate"
)

...

func (app *application) rateLimit(next http.Handler) http.Handler {
    // Declare a mutex and a map to hold the clients' IP addresses and rate limiters.
    var (
        mu      sync.Mutex
        clients = make(map[string]*rate.Limiter)
    )

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Use the realip.FromRequest() function to get the client's IP address.
        ip := realip.FromRequest(r)

        // Lock the mutex to prevent this code from being executed concurrently.
        mu.Lock()

        // Check to see if the IP address already exists in the map. If it doesn't, then
        // initialize a new rate limiter and add the IP address and limiter to the map.
        if _, found := clients[ip]; !found {
            clients[ip] = rate.NewLimiter(2, 4)
        }

        // Call the Allow() method on the rate limiter for the current IP address. If
        // the request isn't allowed, unlock the mutex and send a 429 Too Many Requests
        // response, just like before.
        if !clients[ip].Allow() {
            mu.Unlock()
            app.rateLimitExceededResponse(w, r)
            return
        }

        // Very importantly, unlock the mutex before calling the next handler in the
        // chain. Notice that we DON'T use defer to unlock the mutex, as that would mean
        // that the mutex isn't unlocked until all the handlers downstream of this
        // middleware have also returned.
        mu.Unlock()

        next.ServeHTTP(w, r)
    })
}
Deleting old limiters
The code above will work, but there’s a slight problem — the clients map will grow indefinitely, taking up more and more resources with every new IP address and rate limiter that we add.
To prevent this, let’s update our code so that we also record the last seen time for each client. We can then run a background goroutine in which we periodically delete any clients that we haven’t seen recently from the clients map.
To make this work, we’ll need to create a custom client struct which holds both the rate limiter and last seen time for each client, and launch the background cleanup goroutine when initializing the middleware.
Like so:
package main

import (
    "fmt"
    "net"
    "net/http"
    "sync"
    "time" // New import

    "github.com/tomasen/realip"

    "golang.org/x/time/rate"
)

...

func (app *application) rateLimit(next http.Handler) http.Handler {
    // Define a client struct to hold the rate limiter and last seen time for each
    // client.
    type client struct {
        limiter  *rate.Limiter
        lastSeen time.Time
    }

    var (
        mu sync.Mutex
        // Update the map so the values are pointers to a client struct.
        clients = make(map[string]*client)
    )

    // Launch a background goroutine which removes old entries from the clients map once
    // every minute.
    go func() {
        for {
            time.Sleep(time.Minute)

            // Lock the mutex to prevent any rate limiter checks from happening while
            // the cleanup is taking place.
            mu.Lock()

            // Loop through all clients. If they haven't been seen within the last three
            // minutes, delete the corresponding entry from the map.
            for ip, client := range clients {
                if time.Since(client.lastSeen) > 3*time.Minute {
                    delete(clients, ip)
                }
            }

            // Importantly, unlock the mutex when the cleanup is complete.
            mu.Unlock()
        }
    }()

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        ip := realip.FromRequest(r)

        mu.Lock()

        if _, found := clients[ip]; !found {
            // Create and add a new client struct to the map if it doesn't already exist.
            clients[ip] = &client{limiter: rate.NewLimiter(2, 4)}
        }

        // Update the last seen time for the client.
        clients[ip].lastSeen = time.Now()

        if !clients[ip].limiter.Allow() {
            mu.Unlock()
            app.rateLimitExceededResponse(w, r)
            return
        }

        mu.Unlock()

        next.ServeHTTP(w, r)
    })
}
At this point, if you restart the API and try making a batch of requests in quick succession again, you should find that the rate limiter continues to work correctly from the perspective of an individual client — just like it did before.
$ for i in {1..6}; do curl http://localhost:4000/v1/healthcheck; done
{
"status": "available",
"system_info": {
"environment": "development",
"version": "1.0.0"
}
}
{
"status": "available",
"system_info": {
"environment": "development",
"version": "1.0.0"
}
}
{
"status": "available",
"system_info": {
"environment": "development",
"version": "1.0.0"
}
}
{
"status": "available",
"system_info": {
"environment": "development",
"version": "1.0.0"
}
}
{
"error": "rate limit exceeded"
}
{
"error": "rate limit exceeded"
}
Additional information
Distributed applications
Using this pattern for rate-limiting will only work if your API application is running on a single machine. If your infrastructure is distributed, with your application running on multiple servers behind a load balancer, then you’ll need to use an alternative approach.
If you’re using HAProxy or Nginx as a load balancer or reverse proxy, both of these have built-in functionality for rate limiting that you can use. Alternatively, you could use a fast database like Redis to maintain a request count for clients, running on a server which all your application servers can communicate with.