Innovative Resilience: Utilizing Exponential Backoff and Jitter for Retries under Load

This section delves into the intelligent use of retries with exponential backoff and jitter to handle network failures gracefully without overwhelming backend services. It offers practical patterns, real-world examples, and a comprehensive understanding of these techniques.

ShareShare

Innovative Resilience: Utilizing Exponential Backoff and Jitter for Retries under Load

In distributed systems, failure scenarios such as timeouts, HTTP 503 Service Unavailable responses, dropped packets, and throttled endpoints are inevitable, particularly as the system scales. The instinctive reaction might be to immediately attempt a retry. However, a more strategic approach involves careful retries, progressively increasing delay between these retries, and introducing a degree of randomness into the process.

The strategy of exponential backoff combined with jitter sets the stage for this careful and strategic handling of failures. It enables clients to retry failed requests in a gradual manner, effectively avoiding overwhelming an already strained service with a retry storm.

This deep dive will explore:

  • The dangers of naive retries
  • The concept of exponential backoff
  • The role of jitter in preventing overload
  • Practical JavaScript patterns for implementing retries
  • Real-world examples from tech giants like Google, AWS, and Stripe

The Logic of Retries

In certain situations, retrying is the most logical approach:

  • A flaky mobile network leading to a timeout
  • A server temporarily returning HTTP 500 Internal Server Error or HTTP 429 Too Many Requests
  • A dropped connection during a fetch operation
  • A momentarily full queue

However, immediate or simultaneous retries from multiple clients, also known as a thundering herd, can lead to cascading failures.


The Art of Exponential Backoff

Exponential backoff is a strategy that involves progressively increasing the wait time between retries. More specifically, it increases this delay geometrically, as demonstrated in the following JavaScript function:

const delay = (attempt) => Math.pow(2, attempt) * 100;

The retry timeline with exponential backoff would look something like this:

  • Retry 1 → wait 100ms
  • Retry 2 → wait 200ms
  • Retry 3 → wait 400ms
  • Retry 4 → wait 800ms

By implementing exponential backoff, you provide your servers with the necessary breathing room to recover, thereby reducing the retry load.


Enhancing Safety with Jitter

Despite the benefits of exponential backoff, if all clients retry at the same intervals, the system can still be overloaded. This is where jitter comes into play, adding a level of randomness to the retry delays:

function backoffWithJitter(attempt) {
  const base = Math.pow(2, attempt) * 100;
  return base / 2 + Math.random() * (base / 2);
}

Jitter effectively disperses the retry waves, resulting in smoother traffic flow. There are several common jitter modes:

  • Full jitter: Math.random() * base
  • Equal jitter: base/2 + Math.random() * base/2
  • Decorrelated jitter: min(cap, random_between(base, prev_delay * 3)) (used by AWS)

Practical Retry Pattern in JavaScript

Here's an example of a practical JavaScript function that implements retries using exponential backoff and jitter, and can be used for GET requests, polling operations, and load-on-mount queries:

async function fetchWithRetry(url, retries = 5) {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      const res = await fetch(url);
      if (!res.ok) throw new Error("Failed");
      return await res.json();
    } catch (e) {
      const delay = backoffWithJitter(attempt);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error("Retries exhausted");
}

Available Retry Libraries

There are several libraries that provide built-in support for retries with backoff and/or jitter:

  • retry (npm): This library provides retry decorators and policies.
  • axios-retry: This library plugs into Axios to provide retry functionality.
  • React Query: This library includes built-in support for retries with backoff.
  • SWR: This library supports retries with exponential delay through its onErrorRetry function.

When Retries Should Be Avoided

Retries should ideally be avoided for:

  • Mutations with side effects such as payments or sending data.
  • User-generated form submissions without providing user feedback.
  • Hard failures such as HTTP 401 Unauthorized, 403 Forbidden, and 404 Not Found; these are not likely to be resolved by simply retrying.

In essence, retries should only be used when the failure is likely to be transient.


Real-World Examples of Retry Strategies

AWS SDK

The AWS SDK employs capped exponential backoff combined with jitter for retries. It also provides configurable retry policies for every SDK.

Stripe

Stripe retries HTTP 429 and 500 errors with increasing delays. However, their documentation also provides warnings about which endpoints are safe or unsafe to retry.

Google APIs

Google APIs retry HTTP 5xx errors and rate limit errors using full jitter. They also provide Retry-After headers to control backoff.


Retry Anti-Patterns

Certain practices should be avoided when implementing retries:

  • Retrying without delay, as this can overload the backend.
  • Using a fixed delay for retries, as it can lead to synchronization storms.
  • Retrying indefinitely, as it can cause cascading failures.
  • Ignoring Retry-After headers, which are intended to control retries.
  • Failing silently after retries, without providing any feedback.

Conclusion: Fail Slowly, Recover Gracefully

Retrying isn't just a fallback mechanism — it's a well-thought-out strategy. It's about acknowledging the fact that failures do occur and planning for those failures in a way that doesn't contribute to the chaos. Exponential backoff ensures that retries are conducted in a careful and measured manner, while jitter introduces randomness to prevent synchronization issues. Together, they enable your frontend to be a responsible participant in a distributed environment.

In the face of adversity, the way you choose to retry can make a significant difference.

Subscribe for Deep Dives

Get exclusive in-depth technical articles and insights directly to your inbox.

about

Ehsan Hosseini

Ehsan Hosseini

me [at] ehosseini [dot] info

Staff Software Engineer and Tech Lead with a track record of leading high-performing engineering teams and delivering scalable, end-to-end technology solutions. With experience across web, mobile, and backend systems, I help companies make smart architectural decisions, scale efficiently, and align technical strategy with business goals.

© 2025 Ehsan Hosseini. All rights reserved.