
Email Rate Limiting & Queues: How to Send Email at Scale Safely

By MailCub Team · Feb 24, 2026 · 10 min read

If your app sends email directly inside signup, checkout, or password reset requests, it may work at first, but traffic spikes usually expose the weak points. Teams start seeing timeouts, duplicate sends, and provider throttling that quickly turns into support tickets.

The fix is combining email rate limiting with queues. A queue separates the user action from the email delivery workflow, so your API stays fast. Rate limiting then controls outbound send speed so you do not overwhelm your provider or create retry storms during bursts.

This guide walks through a production-safe pattern: enqueue → worker → rate limit → send → retry with backoff → confirm outcomes with logs and events. It also explains how to handle HTTP 429 (Too Many Requests) and the Retry-After header correctly, so your system slows down in a controlled way instead of pushing harder.

If you are implementing this with MailCub, use the MailCub Documentation for integration details and the Transactional Email Service page for product-specific context. You can also review MailCub Pricing when planning for higher sending volume.

Quick Answer

  • Put every email into a queue instead of sending inline from the web request.
  • Add a global limiter for the provider/API and a per-domain limiter for domains like Gmail or Outlook.
  • Handle HTTP 429 by honoring Retry-After when it is present.
  • Use exponential backoff plus jitter when the server does not provide a retry delay.
  • Make workers idempotent so retries do not create duplicate emails.
  • Use a delayed retry queue and a dead-letter queue for jobs that cannot be delivered after multiple attempts.

Why It Matters

Email traffic is bursty at scale. OTP spikes, billing runs, and password reset waves happen in real systems. If you send too fast, providers throttle you. If retry logic is weak, the system retries badly and increases pressure.

HTTP 429 exists to signal that clients must slow down, and Retry-After exists to communicate how long to wait. Respecting these signals is a reliability feature. Queues also protect your product because the API stays responsive while email delivery runs as a controlled background workflow with monitoring and safe retries.

Email Rate Limiting: Picking Safe Limits

Start with two simple limits:

  • Global sends/second to protect your provider and your own infrastructure.
  • Per-domain sends/second to protect deliverability at mailbox-provider level.

Per-domain throttling is especially useful because one domain can get a sudden burst and trigger deferrals. It also prevents important traffic like OTP and password resets from getting blocked behind bulk email traffic.

Keep the first version simple. You can tune later using real metrics such as 429 rate, deferrals, and queue age.

| Strategy | Best for | Why it works | Watch out for |
| --- | --- | --- | --- |
| Fixed rate (N/sec) | steady traffic | predictable throughput | too rigid during bursts |
| Token bucket | bursty traffic | allows bursts when tokens exist | bucket too large = mini-storms |
| Leaky bucket | strict smoothing | outputs at a constant “drip” | can increase latency |
| Per-domain limiter | deliverability | avoids domain spikes | needs correct domain parsing |
| Priority queues | OTP vs bulk | critical mail stays fast | requires routing rules |
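As a first cut, the global and per-domain limits described above can be combined in a single check. This is a minimal sketch using one-second fixed windows; the limit values, the `FixedWindowLimiter` name, and the `allow` helper are illustrative choices, not a prescribed MailCub API.

```python
import time
from collections import defaultdict

# Hypothetical starting limits -- tune against your provider's real quotas.
GLOBAL_LIMIT = 50   # sends per second across all traffic
DOMAIN_LIMIT = 10   # sends per second to any single recipient domain

class FixedWindowLimiter:
    """Counts sends per one-second window, globally and per recipient domain."""

    def __init__(self):
        self.window = int(time.time())
        self.global_count = 0
        self.domain_counts = defaultdict(int)

    def allow(self, recipient):
        now = int(time.time())
        if now != self.window:          # new second: reset all counters
            self.window = now
            self.global_count = 0
            self.domain_counts.clear()
        domain = recipient.rsplit("@", 1)[-1].lower()
        if (self.global_count >= GLOBAL_LIMIT
                or self.domain_counts[domain] >= DOMAIN_LIMIT):
            return False                # caller should requeue the job, not drop it
        self.global_count += 1
        self.domain_counts[domain] += 1
        return True
```

Fixed windows are the simplest version to reason about; the token-bucket variant shown later smooths bursts at window edges.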

Step-by-Step Solution

1) Queue-first: never send inside the web request

Your API should validate input, write a message record, enqueue a job, and return success quickly. The worker should then pull jobs, apply rate limits, send the email, and record attempts and outcomes.

This one change removes email delivery delays from user-facing endpoints and gives you control over retries, pacing, and monitoring.
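The queue-first split can be sketched in a few lines. This uses Python's in-process `queue.Queue` as a stand-in for a real broker (Redis, SQS, RabbitMQ, etc.); the `handle_signup` and `worker` names are illustrative.

```python
import queue
import threading
import uuid

email_jobs = queue.Queue()  # stand-in for a real message broker

def handle_signup(email_address):
    """Web handler: validate, record, enqueue -- never send inline."""
    message_id = str(uuid.uuid4())
    # ... write a message record to your database here ...
    email_jobs.put({"id": message_id, "to": email_address, "template": "welcome"})
    return {"status": "accepted", "message_id": message_id}  # fast response

def worker(send_fn):
    """Background worker: pull jobs, apply rate limits, send, record outcome."""
    while True:
        job = email_jobs.get()
        if job is None:  # sentinel to stop the worker cleanly
            break
        # ... rate limiting and retry handling would go here ...
        send_fn(job)
        email_jobs.task_done()
```

The request handler returns as soon as the job is enqueued; send latency, pacing, and retries all live in the worker.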

2) Add idempotency to prevent duplicates

Create a stable idempotency key for each message: an app_message_id, or a hash based on template, recipient, variables, and a time window.

Store attempts by that key. If the same job is retried, update the attempt record rather than sending a second email. This is what prevents duplicate OTPs and duplicate receipts during failures.
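A minimal sketch of that key and the attempt store, using an in-memory dict in place of a database table; the `record_and_check` helper and field names are illustrative.

```python
import hashlib
import json

def idempotency_key(template, recipient, variables, window_start):
    """Stable key: identical message inputs in the same time window hash identically."""
    payload = json.dumps(
        {"t": template, "r": recipient, "v": variables, "w": window_start},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

attempts = {}  # stand-in for a DB table keyed by idempotency key

def record_and_check(key):
    """Return True only the first time a key is seen; later retries just update state."""
    entry = attempts.setdefault(key, {"attempts": 0, "sent": False})
    entry["attempts"] += 1
    if entry["sent"]:
        return False  # duplicate: record the retry, do not send a second email
    entry["sent"] = True
    return True
```

In production the check-and-set must be atomic (a unique index or conditional write), so two workers racing on the same key cannot both send.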

3) Handle HTTP 429 the right way

When you receive HTTP 429 Too Many Requests, treat it as a control signal to slow down. A 429 response may include a Retry-After header telling you how long to wait before retrying.

Use these rules:

  • If Retry-After exists, schedule the retry after that delay.
  • If it does not exist, use exponential backoff + jitter with a maximum cap.

Do not sleep inside the worker thread. Instead, move the job into a delayed retry queue so worker capacity remains available for other jobs.
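The two rules above can be captured in one delay function. This sketch handles the delta-seconds form of Retry-After and falls back to capped exponential backoff with full jitter; the base delay and cap are assumed values, not provider requirements. The returned delay is meant to schedule a job into a delayed retry queue, not to sleep a worker.

```python
import random

BASE_DELAY = 2.0    # seconds; assumed starting backoff
MAX_DELAY = 300.0   # cap so delays do not grow without bound

def retry_delay(attempt, retry_after_header=None):
    """Seconds to wait before the next attempt for a given retry count (0-based)."""
    if retry_after_header is not None:
        try:
            # Retry-After in its delta-seconds form takes precedence.
            return max(0.0, float(retry_after_header))
        except ValueError:
            pass  # the HTTP-date form would need parsing; fall through to backoff
    capped = min(MAX_DELAY, BASE_DELAY * (2 ** attempt))
    # "Full jitter": spread retries uniformly so clients do not retry in lockstep.
    return random.uniform(0, capped)
```

Full jitter is one common scheme; equal jitter or decorrelated jitter work too, as long as simultaneous clients stop retrying at the same instant.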

MailCub Documentation is the right reference for implementing provider-aware retry logic and delivery status checks.

4) Implement pacing in workers (limiter + concurrency)

You need two controls working together:

  • Limiter (pacing): token bucket or leaky bucket
  • Concurrency (parallelism): number of worker threads or processes

Token bucket is a common choice because it allows short bursts while preserving an average send rate. Leaky bucket is stricter and keeps output smoother, but it can increase latency.

A practical starting setup:

  • Start with low concurrency
  • Add a global token bucket
  • Add per-domain token buckets
  • Measure queue age and 429 rate before scaling up
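A token bucket is small enough to show in full. This sketch is non-blocking: when no token is available the worker requeues the job rather than sleeping. The `now` parameter exists so pacing can be tested deterministically; in a multi-process deployment the bucket state would live in shared storage such as Redis rather than in one process.

```python
import time

class TokenBucket:
    """Refills at `rate` tokens/sec and holds at most `capacity` tokens,
    so short bursts are allowed while the long-run average rate is enforced."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # no token: caller requeues instead of blocking a worker
```

One global bucket plus one bucket per recipient domain gives you both controls from this section; a send proceeds only when both buckets grant a token.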

5) Design retries to avoid retry storms

Retries are normal. Retry storms are a design problem. The goal is to retry in a way that reduces pressure, not increases it.

Use:

  • Exponential backoff + jitter
  • Max attempts (for example, 5–8)
  • Max time window (for example, up to 24 hours for non-critical messages)
  • Dead-letter queue for jobs that have exhausted retries

Route critical mail such as OTP and password resets into a priority queue with reserved capacity so they do not get delayed behind bulk traffic.
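The retry-cap and dead-letter rules above reduce to a small decision function. This is a sketch; the `next_step` name, the attempt cap, and the list standing in for a real dead-letter queue are all illustrative.

```python
MAX_ATTEMPTS = 6   # assumed cap; pick something in the 5-8 range in practice
dead_letter = []   # stand-in for a real dead-letter queue

def next_step(job, failed):
    """After a send attempt, decide: done, retry later, or dead-letter."""
    if not failed:
        return "done"
    job["attempts"] = job.get("attempts", 0) + 1
    if job["attempts"] >= MAX_ATTEMPTS:
        dead_letter.append(job)   # park for alerting and manual inspection
        return "dead_letter"
    return "retry_later"          # schedule via the delayed retry queue
```

Pair `"retry_later"` with the backoff delay function from step 3, and alert on dead-letter volume so exhausted jobs are investigated rather than silently lost.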

The Transactional Email Service can be paired with this queue-first retry model, and MailCub Pricing helps when you are planning throughput and scaling.

Common Mistakes

  • Sending email inline in the web request path.
  • Retrying immediately after 429 instead of honoring Retry-After.
  • Sleeping workers instead of scheduling delayed retries.
  • No idempotency, which leads to duplicate emails.
  • Using one queue for OTP and bulk email so critical messages get delayed.
  • No retry caps, causing infinite retries.

Troubleshooting

429 rate limits keep spiking

Check the following first:

  • Are you honoring Retry-After?
  • Is your limiter set too high for current provider capacity?
  • Are too many workers competing without a shared limiter?

Fixes:

  • Lower global QPS
  • Add jitter
  • Move retries into delayed queues

Queue depth grows and never recovers

Check:

  • Send latency and error rates
  • Database bottlenecks and lock contention
  • Worker concurrency vs limiter mismatch

Fixes:

  • Optimize I/O paths
  • Split queues by priority
  • Scale workers cautiously while keeping limiter control in place

MailCub Documentation is useful here for validating delivery behavior, logs, and retry outcomes when diagnosing queue slowdowns or provider throttling.

FAQ

What is email rate limiting?

Email rate limiting controls how quickly your system sends email so you do not overload providers, trigger throttling, or create retry storms during traffic spikes.

How should I handle HTTP 429 Too Many Requests?

Honor Retry-After when it is provided. If not, use exponential backoff with jitter and retry caps.

What’s the difference between a queue and a rate limiter?

A queue stores work to absorb bursts. A limiter controls output pace so sending stays safe and predictable.

Why do I need per-domain throttling for email?

Recipient domains can throttle or defer mail. Per-domain throttling reduces sudden bursts that cause temporary failures and delays.

How do I prevent duplicate emails during retries?

Use idempotency keys and store send attempts so retries update state instead of sending again.

Token bucket vs leaky bucket: which should I use?

Token bucket is flexible and allows bursts while maintaining an average rate. Leaky bucket smooths output more strictly but can add latency.

Conclusion

Scaling email safely is mostly about control: a queue to absorb spikes, rate limiting to pace output, and retries that do not multiply traffic under failure.

If you implement queue-first sending, shared rate limiting, 429-aware delayed retries, and idempotent workers, your system remains stable during real spikes and your product endpoints stay fast. Treat HTTP 429 and Retry-After as normal control signals in your sending loop, not as unexpected errors.

Use the MailCub Documentation to implement and verify your sending flow, and review the Transactional Email Service and MailCub Pricing when you are ready to scale and optimize.

Tags:
email rate limiting, email queue, send email at scale, HTTP 429, Retry-After, exponential backoff, token bucket, per-domain throttling, worker concurrency, delayed retries, dead-letter queue, MailCub Documentation
