How to rate limit an API in an Express app


Rate limiting protects backend APIs from attacks and from excessive or unnecessary requests. It does this by capping how many requests a client can make in a given period, which keeps the load on our server under control.

This article explores several common techniques for enforcing rate limits and how they differ. We'll then put theory into practice by adding rate limiting to a Node.js Express app.

What is rate limiting?

Rate limiting is a strategy for controlling the amount of incoming and outgoing network traffic. Here, the network refers to the communication channel between a client (such as a web browser) and our server (e.g., an API).

For instance, we may wish to limit each user to 1,000 API calls per day. Once the user reaches that limit, we can reject further requests and return an error, typically with HTTP status 429 (Too Many Requests), indicating that the user has reached their limit.

Keep in mind that a rate limit can be tracked per:

  1. IP address
  2. Location
  3. User
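In code, this choice usually becomes the key a limiter counts requests against. The helper below is a hypothetical sketch: `rateLimitKey` is not part of Express or any library, `req.ip` is the client address Express provides, and `req.user` and `req.country` are assumed to be set by authentication and geolocation middleware that this article doesn't cover.

```javascript
// Hypothetical helper: picks the key a rate limiter counts requests against.
// Assumes req.user (auth middleware) and req.country (geo lookup) may be set upstream.
function rateLimitKey(req, basis) {
  switch (basis) {
    case "ip":
      return `ip:${req.ip}`;
    case "location":
      // Requests whose location is unknown share one bucket.
      return `location:${req.country ?? "unknown"}`;
    case "user":
      // Fall back to the IP address for anonymous requests.
      return req.user ? `user:${req.user.id}` : `ip:${req.ip}`;
    default:
      throw new Error(`Unknown rate limit basis: ${basis}`);
  }
}

console.log(rateLimitKey({ ip: "203.0.113.7" }, "ip")); // ip:203.0.113.7
console.log(rateLimitKey({ ip: "203.0.113.7", user: { id: 42 } }, "user")); // user:42
```

Prefixing each key with its basis keeps, for example, a user ID from colliding with an IP address in a shared store.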

Rate limit algorithms

There are several algorithms for implementing rate limiting, each with its own benefits and drawbacks. Some of the most widely used are:

Fixed window counter

This is the most straightforward method: count how many requests the user makes in each fixed time window, and reject any requests beyond the limit until the next window starts.

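The term “window” describes the time frame being counted. For example, if I want my API to allow ten requests per minute, the window is 60 seconds long and the limit is ten. The first window runs from 00:00:00 to 00:01:00, the next from 00:01:00 to 00:02:00, and so on, with the counter reset at the start of each window. The drawback is that a client can send a burst at the end of one window and another at the start of the next, briefly getting up to twice the limit.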
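A minimal in-memory sketch of a fixed window counter follows. It assumes a single Node.js process (counters are not shared between servers) and accepts millisecond timestamps explicitly so the behavior is easy to check; `FixedWindowCounter` is a hypothetical name, not a library class.

```javascript
// Minimal in-memory fixed window counter (single process, a sketch only).
class FixedWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;        // allowed requests per window
    this.windowMs = windowMs;  // window length in milliseconds
    this.counters = new Map(); // key -> { windowStart, count }
  }

  // Returns true if the request is allowed, false if it should be rejected.
  allow(key, now = Date.now()) {
    // Align windows to fixed boundaries: 00:00:00, 00:01:00, 00:02:00, ...
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    const entry = this.counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // First request in a new window: reset the count for this key.
      this.counters.set(key, { windowStart, count: 1 });
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false;
  }
}

// Usage: ten requests per 60-second window.
const fixedWindow = new FixedWindowCounter(10, 60 * 1000);
const results = [];
for (let i = 0; i < 11; i++) {
  results.push(fixedWindow.allow("203.0.113.7", 5000));
}
console.log(results.filter(Boolean).length); // 10
console.log(fixedWindow.allow("203.0.113.7", 60 * 1000)); // true: a new window has started
```

Each key needs only one small counter, which is why this approach is cheap; the price is the boundary burst described above.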

Sliding logs

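The sliding logs algorithm records a timestamp for every request a user makes. When a new request arrives, timestamps older than the window are discarded, and the request is allowed only if the number of remaining timestamps is below the limit. The log can be kept in an in-memory HashMap or in Redis; in either case the entries are ordered by time, so expired ones are cheap to remove. This is precise, but memory use grows with the number of requests in the window.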
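Here is a minimal in-memory sketch of a sliding log, using a plain `Map` in place of a HashMap or Redis. It assumes a single process and that timestamps for a given key arrive in non-decreasing order, so the oldest entry is always at the front of the array; `SlidingLog` is a hypothetical name.

```javascript
// Minimal in-memory sliding log (a sketch; timestamps must arrive in non-decreasing order).
class SlidingLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.logs = new Map(); // key -> sorted array of request timestamps
  }

  allow(key, now = Date.now()) {
    let entries = this.logs.get(key);
    if (!entries) {
      entries = [];
      this.logs.set(key, entries);
    }
    // Discard timestamps that are no longer inside the window (oldest first).
    while (entries.length > 0 && entries[0] <= now - this.windowMs) {
      entries.shift();
    }
    if (entries.length >= this.limit) {
      return false; // rejected requests are not logged
    }
    entries.push(now);
    return true;
  }
}

// Usage: at most 3 requests in any 1-second span.
const slidingLog = new SlidingLog(3, 1000);
console.log([0, 100, 200, 300].map((t) => slidingLog.allow("user-1", t))); // [ true, true, true, false ]
console.log(slidingLog.allow("user-1", 1000)); // true: the request at t=0 has expired
```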

Sliding window counter

This method tries to fix the drawbacks of both the fixed window counter and the sliding logs method. Instead of storing a timestamp for every request, it keeps a running count of how many requests fall into each time-based group.

For each user, it groups requests into short intervals (often a small fraction of the window size set by the limit) and counts them. The number of requests in the current window is the sum of the counts for the intervals that fall inside it, so the window slides forward one interval at a time.
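The interval-based counting described above can be sketched as follows. This is a minimal in-memory version that assumes a single process and a window that divides evenly into `bucketCount` intervals; `SlidingWindowCounter` is a hypothetical name, and other implementations approximate the window differently.

```javascript
// Minimal in-memory sliding window counter: the window is split into
// `bucketCount` equal intervals (assumes windowMs divides evenly).
class SlidingWindowCounter {
  constructor(limit, windowMs, bucketCount = 10) {
    this.limit = limit;
    this.bucketCount = bucketCount;
    this.bucketMs = windowMs / bucketCount;
    this.counts = new Map(); // key -> Map(intervalIndex -> request count)
  }

  allow(key, now = Date.now()) {
    const current = Math.floor(now / this.bucketMs);
    let buckets = this.counts.get(key);
    if (!buckets) {
      buckets = new Map();
      this.counts.set(key, buckets);
    }
    // Sum the intervals inside the window and drop the ones that have slid out.
    let total = 0;
    for (const [index, count] of buckets) {
      if (index <= current - this.bucketCount) {
        buckets.delete(index);
      } else {
        total += count;
      }
    }
    if (total >= this.limit) {
      return false;
    }
    buckets.set(current, (buckets.get(current) ?? 0) + 1);
    return true;
  }
}

// Usage: 5 requests per 1-second window, tracked in 100 ms intervals.
const counter = new SlidingWindowCounter(5, 1000, 10);
for (let i = 0; i < 5; i++) counter.allow("user-1", 950);
console.log(counter.allow("user-1", 1000)); // false: a fixed window would have reset here
console.log(counter.allow("user-1", 1950)); // true: the t=950 interval has slid out
```

Memory per key is bounded by `bucketCount` counters rather than one timestamp per request, while the boundary burst of the fixed window is avoided.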

Token bucket

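The token bucket is a technique in which a simple counter tracks the number of remaining tokens, along with a timestamp indicating when the counter was last updated. Each request consumes one token, and a request that arrives when no tokens are left is rejected. The idea comes from packet-switched computer and telecommunications networks, where tokens are added to a fixed-capacity bucket at a constant rate. This lets a client send a burst of up to the bucket's capacity, after which it is held to the refill rate.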
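The counter-plus-timestamp design means tokens never need a background timer: they can be topped up lazily whenever a request arrives. A minimal in-memory sketch, assuming a single process and one token per request (`TokenBucket` is a hypothetical name):

```javascript
// Minimal in-memory token bucket with lazy refill: each key stores only a
// token count and the time it was last updated.
class TokenBucket {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.buckets = new Map(); // key -> { tokens, updatedAt }
  }

  allow(key, now = Date.now()) {
    let bucket = this.buckets.get(key);
    if (!bucket) {
      bucket = { tokens: this.capacity, updatedAt: now }; // new clients start full
      this.buckets.set(key, bucket);
    }
    // Add the tokens earned since the last update, never exceeding capacity.
    const elapsedSeconds = (now - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.updatedAt = now;
    if (bucket.tokens >= 1) {
      bucket.tokens -= 1; // each request consumes one token
      return true;
    }
    return false;
  }
}

// Usage: bursts of up to 3 requests, then 1 request per second.
const bucket = new TokenBucket(3, 1);
console.log([0, 0, 0, 0].map((t) => bucket.allow("user-1", t))); // [ true, true, true, false ]
console.log(bucket.allow("user-1", 1000)); // true: one token refilled
```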

Leaky bucket

The leaky bucket algorithm uses a queue that works on a FIFO basis, meaning that requests are handled in the order they were received, at a constant rate. The queue size is strictly limited: when the queue is full, new requests are dropped. If, for instance, the queue can hold ten requests and requests are processed at ten per minute (one every six seconds), at most ten requests wait at any time and they leave the queue at a steady pace no matter how bursty the incoming traffic is.
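A minimal sketch of that queue, assuming a single process and a caller-supplied handler that processes one request at a time (`LeakyBucket` is a hypothetical name). In a real server `start()` would drain the queue on a timer; the usage below calls `leak()` by hand so the output is predictable.

```javascript
// Minimal leaky bucket: a bounded FIFO queue drained at a constant rate.
class LeakyBucket {
  constructor(capacity, handler) {
    this.capacity = capacity; // maximum number of queued requests
    this.handler = handler;   // processes one request
    this.queue = [];
    this.timer = null;
  }

  // Returns false when the bucket is full and the request is dropped.
  enqueue(request) {
    if (this.queue.length >= this.capacity) {
      return false;
    }
    this.queue.push(request);
    return true;
  }

  // Processes the oldest queued request, if any (FIFO order).
  leak() {
    if (this.queue.length > 0) {
      this.handler(this.queue.shift());
    }
  }

  // Drains one request every intervalMs, e.g. 6000 ms for ten requests per minute.
  start(intervalMs) {
    this.timer = setInterval(() => this.leak(), intervalMs);
  }

  stop() {
    clearInterval(this.timer);
  }
}

// Usage: leak() is called manually instead of start() to keep the example deterministic.
const handled = [];
const leaky = new LeakyBucket(3, (req) => handled.push(req));
console.log(["a", "b", "c", "d"].map((r) => leaky.enqueue(r))); // [ true, true, true, false ]
leaky.leak();
console.log(handled); // [ 'a' ]
```

Unlike the token bucket, this smooths bursts instead of allowing them: requests wait in the queue, which adds latency but keeps the outgoing rate constant.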

Implementing rate limiting in an Express app

Here is a basic index.js file for the Express server.

If you are new to Express applications, please refer to this article to create a new project.

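const express = require('express');
const app = express();

const lang = [
  { id: 1, name: "JS" },
  { id: 2, name: "python" },
  { id: 3 },
];

// Return the list of languages as JSON
app.get('/lang', (req, res) => {
  res.json(lang);
});

app.listen(4000, () => {
  console.log('listening on port 4000');
});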

For this, we'll use a third-party package called express-rate-limit, which counts each client's requests within a configurable time window and rejects requests once the count exceeds the limit.

Install the package

npm i express-rate-limit

This package requires you to use Node 14 or above.


const rateLimit = require('express-rate-limit')

const limiter = rateLimit({
	windowMs: 1 * 60 * 1000, // 1 minute
	max: 10, // Limit each IP to 10 requests per `window` (here, per 1 minute)
	standardHeaders: true, // Return rate limit info in the `RateLimit-*` headers
	legacyHeaders: false, // Disable the `X-RateLimit-*` headers
	message: 'Too many requests from this IP, please try again after a minute',
})

// Apply the rate limiting middleware to all requests.
// Register it before the routes (e.g. above app.get('/lang', ...)) so it runs first.
app.use(limiter)
Here is what each option does:

  1. windowMs: The time frame, in milliseconds, for which requests are counted and kept in memory. It is also used for the Retry-After header when the limit is reached.
  2. max: The maximum number of requests to allow from a client during the window before rate limiting it.
  3. message: The response body to send back when a client is rate limited.
  4. standardHeaders: Whether to return rate limit info in the standard RateLimit-* headers.
  5. legacyHeaders: Whether all responses should include the legacy rate limit headers for the limit (X-RateLimit-Limit), current use (X-RateLimit-Remaining), and reset time (if the store supplies it) (X-RateLimit-Reset). If true, the middleware will also add the Retry-After header to all denied requests.
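To connect the algorithms back to Express, here is a hypothetical hand-rolled middleware, `createRateLimiter`, that applies a fixed window per client IP, starting at the client's first request. It is not express-rate-limit's implementation; it only sketches the idea, keeps counts in the memory of a single process, and never evicts old entries, so the package above remains the better choice in practice. It relies only on `req.ip`, `res.setHeader`, `res.status`, and `res.send`, which Express provides.

```javascript
// Hypothetical hand-rolled Express-style middleware (not express-rate-limit's code):
// a fixed window per client IP that starts at the client's first request.
function createRateLimiter({ windowMs, max, message }) {
  const clients = new Map(); // ip -> { count, resetTime }

  return function rateLimiter(req, res, next) {
    const now = Date.now();
    let client = clients.get(req.ip);
    if (!client || now >= client.resetTime) {
      // First request, or the previous window has expired: start a new window.
      client = { count: 0, resetTime: now + windowMs };
      clients.set(req.ip, client);
    }
    client.count += 1;
    if (client.count > max) {
      // Tell the client how many seconds to wait, then reject with 429.
      res.setHeader("Retry-After", Math.ceil((client.resetTime - now) / 1000));
      return res.status(429).send(message);
    }
    next();
  };
}
```

In the app above it would be registered the same way as the package's limiter, for example app.use(createRateLimiter({ windowMs: 60 * 1000, max: 10, message: 'Too many requests' })), before the routes.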


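[Screenshot: response on the first call]
[Screenshot: response after 10 calls]

From the 11th request within the same minute onward, the limiter responds with HTTP status 429 (Too Many Requests) and the configured message until the window resets.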

Sudeep Mishra
