Idempotency Explained: How to Design Safe APIs That Don’t Break in Production


Idempotency is one of those topics that looks simple on paper and brutal in production.

Most engineers understand the definition. Fewer understand the failure modes. Almost no one gets it right the first time in a real distributed system.

I’ve reviewed systems where idempotency was:

  • assumed instead of designed,
  • half-implemented with flags,
  • bolted on after an outage,
  • or misunderstood as an HTTP concern.

This article is not about definitions.
It’s about why idempotency exists, where teams get it wrong, and how to design it correctly in real systems—especially under retries, crashes, and partial failures.


Why Idempotency Exists (The Real Reason)

Idempotency does not exist because:

  • REST says so
  • HTTP verbs say so
  • interviews say so

It exists because networks lie.

In production:

  • requests time out even though the server processed them
  • load balancers drop responses
  • clients retry aggressively
  • users double-click
  • background workers crash mid-job
  • queues redeliver messages

Your system must assume this will happen.

If retries cause:

  • duplicate orders
  • double payments
  • repeated inventory deduction
  • multiple emails or notifications

then your system is not resilient. It’s fragile.

Idempotency is a defensive design against reality.



The Definition Is Technically Correct — and Practically Useless

An operation is idempotent if performing it multiple times has the same effect as performing it once.

This hides the real challenge.

In real systems:

  • requests are not atomic
  • failures happen mid-operation
  • responses may never reach the caller
  • concurrency is unavoidable

Idempotency is not about calling the same function twice.
It’s about executing the same intent multiple times without corrupting state.

That distinction matters.


Where Idempotency Is Mandatory (No Exceptions)

If any of these are true, idempotency is required:

  • Money is involved
  • Inventory is involved
  • State transitions are irreversible
  • External systems are called
  • Background jobs can retry
  • Webhooks are consumed

Examples:

  • POST /payments
  • POST /orders
  • POST /reserve-inventory
  • POST /subscriptions/change
  • queue consumers
  • webhook receivers

If your system retries and the operation isn’t idempotent, you’ve built a bug factory.


The Most Dangerous Lie: “POST Is Not Idempotent”

This line has caused more outages than bad code.

HTTP semantics say:

  • GET → idempotent
  • POST → not guaranteed idempotent

That does not mean:

  • POST cannot be idempotent
  • POST should not be idempotent

In fact, most critical POST endpoints must be idempotent.

Payments, orders, reservations—almost all are POST requests.
If they aren’t idempotent, retries will destroy data integrity.

HTTP does not save you.
Your backend design does.


How Teams Commonly Get This Wrong

Let’s talk about real mistakes—not theory.


❌ “We Just Retry”

Retries without idempotency amplify failure.

What actually happens:

  1. Client sends request
  2. Server processes it successfully
  3. Response times out
  4. Client retries
  5. Server processes it again
  6. Duplicate side effects occur

Retries are safe only after idempotency exists.

If your retry strategy came before idempotency, the system is already broken.


❌ “Check If It Exists, Then Insert”

This pattern looks harmless:

SELECT * FROM orders WHERE order_id = ?
IF NOT EXISTS: INSERT

It fails under concurrency.

Two requests arrive at the same time:

  • both see no record
  • both insert
  • duplicate created

This is not idempotency.
This is a race condition with confidence.
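To make the race concrete, here is a minimal sketch using an in-memory SQLite table. The table layout and helper names are assumptions for illustration, and the two "concurrent" requests are simulated by interleaving their steps by hand so the outcome is deterministic.

```python
import sqlite3

# Hypothetical orders table with no unique constraint on order_id,
# which is what the check-then-insert pattern silently relies on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, amount INTEGER)")

def exists(order_id):
    row = conn.execute(
        "SELECT 1 FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row is not None

def insert(order_id, amount):
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))

# Both requests run their check before either runs its insert.
a_sees_order = exists("order-1")   # request A: no row yet
b_sees_order = exists("order-1")   # request B: no row yet
if not a_sees_order:
    insert("order-1", 500)
if not b_sees_order:
    insert("order-1", 500)

duplicates = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE order_id = 'order-1'"
).fetchone()[0]
print(duplicates)  # prints 2
```

The check and the insert are two separate steps, so nothing stops a second request from slipping in between them.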


❌ “We Use a Processed Flag”

Flags like is_processed = true only work if:

  • the entire operation is atomic
  • no partial failure can occur
  • writes are perfectly ordered

In distributed systems, those assumptions are false.

If a crash happens:

  • after charging
  • before updating the flag

the retry charges again.

Flags are not idempotency. They are optimism.
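The failure is easy to reproduce in a toy model. In this sketch the `Crash` exception, the `charges` list, and the `handle` function are all hypothetical stand-ins: the list plays the role of an external payment provider, and the exception plays the role of a process dying between the two steps.

```python
class Crash(Exception):
    """Stands in for a process dying mid-operation."""

charges = []                      # side effects at a pretend payment provider
job = {"is_processed": False}     # the flag lives in our own database

def handle(job, crash_after_charge=False):
    if job["is_processed"]:
        return "skipped"
    charges.append(100)           # step 1: external side effect
    if crash_after_charge:
        raise Crash()             # process dies before step 2
    job["is_processed"] = True    # step 2: flag update
    return "charged"

try:
    handle(job, crash_after_charge=True)
except Crash:
    pass

retry_result = handle(job)        # the retry still sees is_processed == False
print(retry_result, charges)      # prints: charged [100, 100]
```

The flag only records what happened after the fact, so it cannot protect the step that ran before it was written.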


The Correct Mental Model: Deduplicate Intent, Not Requests

This is the shift most teams miss.

Idempotency is not about requests.
It’s about logical actions.

“Create this order”
“Charge this payment”
“Reserve these items”

You must guarantee that the same intent is executed once, no matter how many times it is requested.

That’s why idempotency keys exist.


Idempotency Keys: The Only Reliable Foundation

An idempotency key is:

  • generated by the client
  • unique per logical action
  • reused on retries

Example:

Idempotency-Key: 9b1de3f4-...

Rules that actually matter:

  • Same logical action → same key
  • Different action → different key
  • Server enforces uniqueness
  • Client must resend the same key on retry

Anything else is incomplete.
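On the client side, the rules above come down to generating the key once, outside the retry loop. The sketch below assumes a hypothetical `FlakyServer` that processes every request but loses its first response; neither the class nor `pay_with_retries` refers to any real library.

```python
import uuid

class FlakyServer:
    """Hypothetical server: handles every request, but loses the first response."""
    def __init__(self):
        self.seen = {}          # idempotency key -> stored response
        self.executions = 0     # how many times the payment really ran
        self.calls = 0

    def post_payment(self, key, amount):
        self.calls += 1
        if key not in self.seen:
            self.executions += 1
            self.seen[key] = {"status": "charged", "amount": amount}
        if self.calls == 1:
            raise TimeoutError("response lost")
        return self.seen[key]

def pay_with_retries(server, amount, max_attempts=3):
    key = str(uuid.uuid4())      # one key per logical action, created once
    for _ in range(max_attempts):
        try:
            return server.post_payment(key, amount)  # same key on every attempt
        except TimeoutError:
            continue
    raise RuntimeError("gave up")

server = FlakyServer()
result = pay_with_retries(server, 500)
```

Here the server is called twice but the payment runs once. Generating the key inside the loop would turn every retry into a new logical action.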


A Correct End-to-End Implementation

Let’s walk through a production-safe design.


1. Persist the Idempotency Record

You need durable storage.
This can be:

  • a database table
  • Redis (with TTL)
  • a strongly consistent key-value store

A typical schema:

idempotency_key (unique)
request_hash
response_payload
status (processing, completed, failed)
created_at

Why store the response?
Because retries must return the same result, not just block execution.
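One way to express the schema above, sketched with SQLite. The table name `idempotency_records` and the column types are assumptions; a production database would likely use native JSON and timestamp types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE idempotency_records (
        idempotency_key  TEXT PRIMARY KEY,   -- uniqueness enforced by the database
        request_hash     TEXT NOT NULL,      -- detects a key reused with a different body
        response_payload TEXT,               -- NULL until the operation finishes
        status           TEXT NOT NULL
            CHECK (status IN ('processing', 'completed', 'failed')),
        created_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")

# List the column names to confirm the table matches the schema.
columns = [row[1] for row in conn.execute("PRAGMA table_info(idempotency_records)")]
print(columns)
```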


2. Enforce Uniqueness at the Storage Layer

This is non-negotiable.

  • Use a unique index
  • Let the database handle concurrency
  • Do not rely on application-level locks

This single constraint closes the duplicate-insert race: when two requests arrive with the same key, exactly one insert succeeds and the other fails.
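A minimal sketch of letting the database arbitrate, again with SQLite. The table, index name, and `try_claim` helper are assumptions for illustration; the point is that the insert itself is the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE idempotency_records (idempotency_key TEXT, status TEXT)")
conn.execute(
    "CREATE UNIQUE INDEX idx_idem_key ON idempotency_records (idempotency_key)"
)

def try_claim(key):
    """Return True if this caller inserted the key, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO idempotency_records VALUES (?, 'processing')", (key,)
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = try_claim("key-1")    # True: this request owns the operation
second = try_claim("key-1")   # False: the unique index rejected the duplicate
```

There is no separate SELECT to race against. Whichever request loses the insert learns immediately that someone else owns the key.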


3. Request Flow

On receiving a request:

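  1. Attempt to insert the idempotency key
  2. If insert fails:
    • fetch existing record
    • if the request hash differs, reject the request
    • if the status is still processing, tell the client to retry later
    • otherwise return stored response
  3. If insert succeeds:
    • process the operation
    • store response and mark the record completed
    • return response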

This ensures:

  • exactly-once execution (logically)
  • safe retries
  • consistent responses
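The flow above can be sketched end to end with SQLite. Everything named here is an assumption for illustration: the `create_order` operation, the table layout, and the status codes (422 for a key reused with a different body, 409 for a request whose original is still in progress). The sketch also adds two checks on the duplicate path, using the request hash and status columns from the schema.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE idempotency_records (
        idempotency_key TEXT PRIMARY KEY,
        request_hash TEXT NOT NULL,
        response_payload TEXT,
        status TEXT NOT NULL
    )
""")

executions = []  # records each real execution of the operation

def create_order(body):
    # Hypothetical business operation.
    executions.append(body)
    return {"order_id": len(executions), "items": body["items"]}

def handle_request(key, body):
    request_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    try:
        with conn:  # step 1: try to claim the key
            conn.execute(
                "INSERT INTO idempotency_records VALUES (?, ?, NULL, 'processing')",
                (key, request_hash),
            )
    except sqlite3.IntegrityError:
        # step 2: the key already exists, so this is a retry or a misuse
        stored_hash, payload, status = conn.execute(
            "SELECT request_hash, response_payload, status "
            "FROM idempotency_records WHERE idempotency_key = ?", (key,)
        ).fetchone()
        if stored_hash != request_hash:
            return 422, {"error": "key reused with a different request"}
        if status != "completed":
            return 409, {"error": "original request still in progress"}
        return 200, json.loads(payload)

    # step 3: we own the key, so run the operation and store its response
    response = create_order(body)
    with conn:
        conn.execute(
            "UPDATE idempotency_records "
            "SET response_payload = ?, status = 'completed' "
            "WHERE idempotency_key = ?",
            (json.dumps(response), key),
        )
    return 200, response

first = handle_request("key-1", {"items": ["book"]})
retry = handle_request("key-1", {"items": ["book"]})
```

The retry returns the stored response without running `create_order` a second time.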

Handling Partial Failures (Where Most Systems Break)

Consider this scenario:

  • payment succeeds
  • server crashes before responding

Without idempotency:

  • client retries
  • payment runs again
  • double charge

With proper idempotency:

  • retry hits the same key
  • server returns stored success
  • no duplicate charge

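This holds only if the result was durably recorded before the crash. If the crash falls between the charge and the write, the record is still marked processing, and the retry has to reconcile with the payment provider rather than charge again.

If your system does not store the response, it is not truly idempotent.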
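A toy version of the scenario, assuming the result is persisted before the server attempts to respond. The `records` dict stands in for a durable store and `charges` for a payment provider; both, along with the `Crash` exception, are hypothetical.

```python
class Crash(Exception):
    """Stands in for the server dying before it can respond."""

records = {}   # idempotency key -> stored response (a durable store in practice)
charges = []   # side effects at a pretend payment provider

def charge(key, amount, crash_before_responding=False):
    if key in records:
        return records[key]                    # retry: replay the stored result
    charges.append(amount)                     # payment succeeds
    records[key] = {"status": "succeeded", "amount": amount}  # result persisted
    if crash_before_responding:
        raise Crash()                          # response never reaches the client
    return records[key]

try:
    charge("key-1", 500, crash_before_responding=True)
except Crash:
    pass

retry = charge("key-1", 500)   # the client retries with the same key
print(retry, charges)          # prints: {'status': 'succeeded', 'amount': 500} [500]
```

The client never saw the first response, yet it ends up with the same result and a single charge.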


Idempotency in Background Jobs and Queues

Queues retry. Most brokers guarantee at-least-once delivery, not exactly-once.

That means:

  • message handlers must be idempotent
  • side effects must be deduplicated

Common pattern:

  • use a job execution key
  • store processed job IDs
  • make downstream writes idempotent

Never assume:

“This message will only be delivered once”

It won’t.
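A sketch of the processed-job-ID pattern, assuming the side effect lives in the same database as the dedup table so both can commit in one transaction. The table names and message shape are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_jobs (job_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 10)")
conn.commit()

def handle_message(message):
    """Apply the message's effect at most once, keyed on its job ID."""
    try:
        with conn:  # one transaction: dedup record and side effect commit together
            conn.execute(
                "INSERT INTO processed_jobs VALUES (?)", (message["job_id"],)
            )
            conn.execute(
                "UPDATE inventory SET quantity = quantity - ? WHERE sku = ?",
                (message["quantity"], message["sku"]),
            )
        return "processed"
    except sqlite3.IntegrityError:
        return "duplicate"   # redelivery: the job ID was already recorded

message = {"job_id": "job-42", "sku": "widget", "quantity": 3}
results = [handle_message(message), handle_message(message)]  # second is a redelivery
remaining = conn.execute(
    "SELECT quantity FROM inventory WHERE sku = 'widget'"
).fetchone()[0]
print(results, remaining)  # prints: ['processed', 'duplicate'] 7
```

When the side effect is a call to an external system, a single transaction cannot cover it, and the downstream call needs its own idempotency key.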


Performance and Storage Trade-offs

Idempotency has costs:

  • extra reads
  • extra writes
  • additional storage
  • TTL management

But here’s the truth:

  • correctness beats performance
  • consistency beats speed
  • financial bugs cost more than infra

Practical strategies:

  • Redis + TTL for short-lived operations
  • Database for payments and orders
  • Reasonable expiration windows (not minutes, not forever)

Never expire a key while a retry of that request is still possible.
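A small sketch of tying the TTL to the retry window. The 24-hour window and the 2x safety margin are assumed numbers, not recommendations; the real values depend on how long your clients and queues keep retrying.

```python
import time

RETRY_WINDOW_SECONDS = 24 * 60 * 60          # assumed: longest time a client may retry
KEY_TTL_SECONDS = 2 * RETRY_WINDOW_SECONDS   # keep keys well past the last retry

def purge_expired(records, now):
    """Drop records older than the TTL; `records` maps key -> created_at."""
    return {
        key: created_at
        for key, created_at in records.items()
        if now - created_at < KEY_TTL_SECONDS
    }

now = time.time()
records = {
    "recent-key": now - 60,                       # one minute old: kept
    "old-key": now - 3 * RETRY_WINDOW_SECONDS,    # past the TTL: dropped
}
kept = purge_expired(records, now)
```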


Security Considerations Engineers Ignore

Idempotency introduces new risks if done carelessly.

Watch out for:

  • replay attacks using old keys
  • global key collisions
  • keys not scoped per user or client

Best practices:

  • scope keys per user/account
  • validate ownership
  • limit TTL
  • never allow unlimited replays for sensitive actions

Idempotency improves safety—but only if scoped correctly.
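Scoping can be enforced in the storage layer by making the account part of the unique key. In this sketch the `account_id` column and the `claim` and `lookup` helpers are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE idempotency_records (
        account_id       TEXT NOT NULL,
        idempotency_key  TEXT NOT NULL,
        response_payload TEXT,
        PRIMARY KEY (account_id, idempotency_key)   -- unique per account, not globally
    )
""")

def claim(account_id, key):
    try:
        with conn:
            conn.execute(
                "INSERT INTO idempotency_records VALUES (?, ?, NULL)",
                (account_id, key),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def lookup(account_id, key):
    """Only ever read records that belong to the calling account."""
    return conn.execute(
        "SELECT response_payload FROM idempotency_records "
        "WHERE account_id = ? AND idempotency_key = ?",
        (account_id, key),
    ).fetchone()

alice_first = claim("alice", "key-1")    # True
bob_same_key = claim("bob", "key-1")     # True: no collision across accounts
alice_retry = claim("alice", "key-1")    # False: a genuine retry
stranger = lookup("mallory", "key-1")    # None: cannot read another account's record
```

With a globally unique key, Bob's request would have been treated as a retry of Alice's and could have been handed her stored response.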


What Idempotency Is Not

Let’s be clear.

Idempotency is not:

  • a retry strategy
  • a boolean flag
  • a best-effort check
  • a database query pattern
  • an HTTP method

It is a system-level guarantee.


The Senior Engineer Takeaway

If your system:

  • talks over a network
  • retries automatically
  • handles money or state
  • runs background jobs

then idempotency is part of correctness, not an enhancement.

Most “random” production bugs are not random.
They’re deterministic idempotency failures.

Design it early.
Implement it deliberately.
Audit it regularly.

Your future on-call self will be grateful.
