Introduction

If you send transactional emails directly inside a web request, you will eventually lose messages during deploys, worker crashes, timeouts, or rate-limit spikes. The biggest problem is silent failure: the user does not get the email, and your team cannot prove what happened.

This guide is for SaaS and development teams building transactional email pipelines for OTPs, password resets, receipts, and alerts. It shows how to design an email queue that does not silently drop messages by using a transactional outbox, at-least-once processing, idempotency, and dead-letter queues (DLQ). It also covers safe acknowledgment timing, visibility and ack timeouts, and troubleshooting with provider logs and webhook events.

You will not get a magic inbox guarantee. You will get a queue design where every email has a traceable lifecycle and recoverable failure states. Start by reviewing the MailCub Documentation and confirming that a test send appears in the delivery logs, which shows your visibility path is working.

Quick Answer

Use at-least-once delivery and make sending idempotent, because safe duplicates are better than dropped messages.
Implement a transactional outbox so database writes and send intent are persisted together.
Acknowledge queue messages only after the provider accepts the send and you store a message reference.
Add a dead-letter queue and alert on it so poison messages stop looping.
Tune visibility timeout or ack timeout to match worst-case worker time.
Keep an audit trail with app logs, provider logs, and events or webhooks.

Why It Matters

Transactional emails are part of your product reliability. If OTPs or password resets do not arrive, users assume your app is broken or unsafe. If invoices or receipts do not land, support load and disputes increase.

A good queue turns failures into visible states such as queued, processing, retrying, dead-lettered, and accepted-by-provider. That is the difference between “we think we sent it” and “we can show exactly what happened.”

The goal is not perfection. The goal is no silent loss, safe retries, and fast debugging.

What “Never Loses Messages” Really Means

In practice, “never loses messages” means three things:

Durability: the intent to send is persisted before any network call
No silent drops: every failure becomes a visible state you can alert on
Duplicate-safe processing: at-least-once delivery can happen, and it is harmless

Once you accept these rules, the architecture becomes much easier to design.

Step-by-Step Solution

1) Choose your reliability contract: at-least-once + idempotency

Most real queues behave like at-least-once systems, which means a message can be delivered more than once. If your sending logic cannot tolerate duplicates, you will either drop messages or spam users.

Fix this with an idempotency key. Make the key unique per business event, not per job attempt.

Example key shapes:

email:{tenant_id}:{message_type}:{business_event_id}
otp:{tenant_id}:{user_id}:{purpose}:{time_bucket}

Store the idempotency key in a table with a unique constraint so repeat attempts become no-ops.
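
As a rough illustration of the unique-constraint approach, here is a minimal sketch using Python's built-in sqlite3 module. The table name sent_emails and the helpers make_key and claim are assumptions made for this example, not part of any MailCub API. A production system would use its own database and schema.

```python
import sqlite3


def make_key(tenant_id: str, message_type: str, business_event_id: str) -> str:
    """Build the key from the business event, never from the job attempt."""
    return f"email:{tenant_id}:{message_type}:{business_event_id}"


def init_db(conn: sqlite3.Connection) -> None:
    # PRIMARY KEY gives us the unique constraint on the idempotency key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sent_emails ("
        " idempotency_key TEXT PRIMARY KEY,"
        " created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )


def claim(conn: sqlite3.Connection, key: str) -> bool:
    """Return True if this call claimed the key, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sent_emails (idempotency_key) VALUES (?)", (key,)
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected a repeat attempt: treat it as a no-op.
        return False
```

A worker would call claim (or an equivalent lookup) before sending, and skip the send when it returns False.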

2) Use the transactional outbox pattern (stop dual-write loss)

The easiest way to lose messages is doing a database write and queue publish as separate operations. If enqueue fails after the database commit, the email intent is lost.

The transactional outbox pattern fixes this:

Write the business record and an outbox row in the same database transaction
A dispatcher publishes outbox rows to the queue and marks them as published

This ensures that if the business state exists, the send intent exists too.
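
The write path can be sketched as follows, again with sqlite3 so the example runs anywhere. The orders and email_outbox tables, the place_order function, and the "demo" tenant in the key are hypothetical names chosen for illustration. The point is only that both inserts share one transaction.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS orders (
    id INTEGER PRIMARY KEY,
    customer_email TEXT NOT NULL,
    total_cents INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS email_outbox (
    id INTEGER PRIMARY KEY,
    idempotency_key TEXT NOT NULL UNIQUE,
    payload TEXT NOT NULL,
    published_at TEXT  -- NULL until the dispatcher publishes the row
);
"""


def init_db(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def place_order(conn: sqlite3.Connection, customer_email: str, total_cents: int) -> int:
    """Insert the business record and the outbox row in one transaction."""
    with conn:  # both inserts commit together, or neither does
        cur = conn.execute(
            "INSERT INTO orders (customer_email, total_cents) VALUES (?, ?)",
            (customer_email, total_cents),
        )
        order_id = cur.lastrowid
        payload = json.dumps(
            {"to": customer_email, "template": "receipt", "order_id": order_id}
        )
        conn.execute(
            "INSERT INTO email_outbox (idempotency_key, payload) VALUES (?, ?)",
            (f"email:demo:receipt:{order_id}", payload),
        )
    return order_id
```

If the outbox insert fails for any reason, the order insert is rolled back with it, so you never end up with business state that has no send intent.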

3) Publish durably and record the broker reference

When publishing to the queue, record traceable references such as:

Outbox row ID
Broker message ID (if your queue provides one)
Published timestamp

This creates an audit trail and makes “lost message” reports debuggable.
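
A dispatcher that records these references might look like the sketch below. The publish callable stands in for whatever client your broker provides. Its signature (outbox ID and payload in, broker message ID out) is an assumption for this example, as are the table and column names.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS email_outbox ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " broker_message_id TEXT,"
        " published_at TEXT)"
    )


def dispatch_pending(
    conn: sqlite3.Connection,
    publish: Callable[[int, str], str],
    batch_size: int = 100,
) -> int:
    """Publish unpublished outbox rows and record the broker reference."""
    rows = conn.execute(
        "SELECT id, payload FROM email_outbox"
        " WHERE published_at IS NULL ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    published = 0
    for outbox_id, payload in rows:
        # If publish raises, the row stays unpublished and is retried next run.
        broker_id = publish(outbox_id, payload)
        with conn:
            conn.execute(
                "UPDATE email_outbox SET broker_message_id = ?, published_at = ?"
                " WHERE id = ?",
                (broker_id, datetime.now(timezone.utc).isoformat(), outbox_id),
            )
        published += 1
    return published
```

Note that a crash between the publish call and the UPDATE causes the row to be published again on the next run. That is at-least-once behavior, and it is one reason the worker needs the idempotency check.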

4) Worker: acknowledge only after provider acceptance and state write

Your worker should follow a strict sequence:

1. Receive queue message
2. Check idempotency key (already accepted or sent?)
3. Call provider send API
4. Persist provider response (status and message reference)
5. Acknowledge the queue message

If you acknowledge before step 4 (persisting the provider response), a crash can silently drop the message. If you acknowledge only after step 4 but skip the idempotency check, redeliveries create duplicate sends. You need both controls together.
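
The sequence can be sketched as a single handler. The send and ack callables are placeholders for your provider client and queue client. Their signatures, the response fields status and message_id, and the email_sends table are all assumptions for illustration.

```python
import sqlite3
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS email_sends ("
        " idempotency_key TEXT PRIMARY KEY,"
        " provider_status TEXT NOT NULL,"
        " provider_message_id TEXT NOT NULL)"
    )


def handle_message(
    conn: sqlite3.Connection,
    message: dict,
    send: Callable[[dict], dict],
    ack: Callable[[], None],
) -> str:
    """Process one queue message: check, send, persist, then acknowledge."""
    key = message["idempotency_key"]

    # Check the idempotency key: was this event already accepted?
    already = conn.execute(
        "SELECT 1 FROM email_sends WHERE idempotency_key = ?", (key,)
    ).fetchone()
    if already:
        ack()  # safe to ack: the state is already recorded
        return "duplicate"

    # Call the provider. If this raises, we do not ack, so the queue redelivers.
    response = send(message)

    # Persist the provider response before acknowledging.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO email_sends"
            " (idempotency_key, provider_status, provider_message_id)"
            " VALUES (?, ?, ?)",
            (key, response["status"], response["message_id"]),
        )

    # Acknowledge only now that the outcome is durable.
    ack()
    return "sent"
```

One gap remains in this sketch: if the worker crashes after the provider accepts the send but before the state write commits, the redelivered message is sent again. Keeping the window between those two steps as short as possible limits how often that happens.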

5) Tune visibility and ack timeouts for slow sends

If your queue uses a visibility timeout or ack deadline, the message can reappear while a worker is still processing it. A second worker then picks it up and sends the same email again, which is most likely under load or when the provider is slow.

Set the timeout higher than your worst-case processing time, and use heartbeat or timeout extension if your queue supports it. Also limit worker concurrency so you do not overwhelm the provider and trigger throttling spikes.
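
If your queue does support timeout extension, a heartbeat can be sketched as a small context manager that calls an extend function on a background thread while the send is in progress. The extend callable is a placeholder for your queue client's own "extend visibility" or "modify ack deadline" call, which this example does not assume anything about.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def heartbeat(extend: Callable[[], None], interval_seconds: float) -> Iterator[None]:
    """Call extend() every interval_seconds until the with-block exits."""
    stop = threading.Event()

    def beat() -> None:
        # Event.wait returns False on timeout and True once stop is set,
        # so the loop ends promptly when processing finishes.
        while not stop.wait(interval_seconds):
            extend()

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        yield
    finally:
        stop.set()
        thread.join()
```

You would wrap the provider call in the with-block and pick an interval comfortably shorter than the queue's timeout, so the lease is renewed before it expires.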

6) Add retries with backoff and a DLQ for poison messages

Retries should be:

Limited (maximum attempts)
Spaced out (exponential backoff plus jitter)
Classified (retry transient errors, stop on permanent errors)

Then add a dead-letter queue:

After N failures, send the message to the DLQ
Alert on DLQ growth
Keep DLQ retention long enough to investigate and replay safely
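
The retry rules and the DLQ cutoff can be combined into one small decision function. The constants, the two exception classes, and the action names in this sketch are illustrative assumptions. Map your provider's real error responses onto transient and permanent categories yourself.

```python
import random
from typing import Tuple

MAX_ATTEMPTS = 5            # hypothetical limit; tune for your workload
BASE_DELAY_SECONDS = 2.0
MAX_DELAY_SECONDS = 300.0


class TransientError(Exception):
    """Worth retrying, for example a timeout or throttling response."""


class PermanentError(Exception):
    """Retrying cannot help, for example a malformed payload."""


def backoff_delay(attempt: int) -> float:
    """Exponential backoff with full jitter for a 1-based attempt number."""
    ceiling = min(MAX_DELAY_SECONDS, BASE_DELAY_SECONDS * 2 ** (attempt - 1))
    return random.uniform(0, ceiling)


def next_action(attempt: int, error: Exception) -> Tuple[str, float]:
    """Decide what to do after a failed attempt: retry later or dead-letter."""
    if isinstance(error, PermanentError):
        return ("dead_letter", 0.0)   # classified: stop immediately
    if attempt >= MAX_ATTEMPTS:
        return ("dead_letter", 0.0)   # limited: attempts exhausted
    return ("retry", backoff_delay(attempt))  # spaced out: backoff plus jitter
```

The jitter matters when many messages fail at once: without it, they all retry at the same moment and can trigger another throttling spike.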

You can use MailCub Documentation as your operational reference while implementing outbox, retries, and failure visibility, and review the Transactional Email page for delivery logging and event tracking capabilities.

7) Add observability: trace IDs, metrics, and provider logs/events

At minimum, store:

Correlation ID (request ID or user ID)
Outbox ID and idempotency key
Timestamps for queued, processing, accepted, and dead-lettered states
Provider response snapshot (redacted as needed)

Add core metrics:

Queue lag and in-flight count
Retry rate
DLQ count
Provider throttle events

If your provider supports webhooks or events, consume them so your app can automatically record final outcomes such as delivered, bounced, or failed.
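
As one possible shape for this record, the sketch below keeps a per-email trace with first-seen timestamps and applies provider events to it. The event payload format (message_id and type fields) is hypothetical. Check your provider's webhook documentation for the real field names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

STATES = (
    "queued", "processing", "accepted", "dead_lettered",
    "delivered", "bounced", "failed",
)
FINAL_EVENT_TYPES = ("delivered", "bounced", "failed")


@dataclass
class EmailTrace:
    correlation_id: str
    outbox_id: int
    idempotency_key: str
    provider_message_id: Optional[str] = None
    timestamps: Dict[str, str] = field(default_factory=dict)

    def mark(self, state: str) -> None:
        """Record when a state was first reached."""
        if state not in STATES:
            raise ValueError(f"unknown state: {state}")
        # setdefault keeps the first timestamp, so replays do not rewrite history.
        self.timestamps.setdefault(state, datetime.now(timezone.utc).isoformat())


def apply_provider_event(traces: Dict[str, EmailTrace], event: dict) -> bool:
    """Apply a provider event to the matching trace; return False if ignored."""
    trace = traces.get(event.get("message_id"))
    if trace is None or event.get("type") not in FINAL_EVENT_TYPES:
        return False
    trace.mark(event["type"])
    return True
```

In a real system the traces would live in a database table keyed by provider message reference rather than in an in-memory dict.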

Email Queue Reliability Checklist

Layer	Must-have control	What it prevents	“Good” signal
Write path	Transactional outbox	DB updated but no email job	Outbox row exists for every send intent
Processing	At-least-once + idempotency	Duplicate sends becoming incidents	Replays become no-ops
Acking	Ack after provider acceptance	Silent drops on worker crash	No “missing” message without a state
Retries	Backoff + classification	Retry storms and spam	Retry rate remains stable under spikes
DLQ	Dead-letter after max attempts	Infinite retry loops	DLQ alerts and a replay runbook exist
Visibility	Timeout tuned + heartbeats	Accidental duplicates during slow provider responses	Low duplicate rate during provider slowdowns
Observability	IDs + logs + events	Guesswork	Message is traceable end-to-end

Common Mistakes

Doing database writes and queue publish separately, with no outbox
Acknowledging messages before provider acceptance
Retrying permanent failures forever, with no DLQ
No idempotency key, causing duplicate OTPs or receipts
Visibility timeout set too low, causing messages to reappear during processing
No trace ID linking user action to queue message and provider outcome

Troubleshooting

We’re losing emails

Check the pipeline in this order:

Does an outbox record exist?
Did the dispatcher publish it to the queue?
Did a worker pick it up?
Did the provider accept it?
Do provider logs or events show the final status?

If you cannot answer one of these questions, add instrumentation at that step first.
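
If your tracking record stores one field per stage, the ordered check can be automated with a few lines. The field names below are hypothetical and should be replaced with whatever your own schema uses.

```python
from typing import Mapping, Optional

# Ordered to match the checks above; field names are assumptions.
PIPELINE_STAGES = (
    ("outbox_created_at", "No outbox record exists"),
    ("published_at", "Dispatcher did not publish to the queue"),
    ("picked_up_at", "No worker picked the message up"),
    ("accepted_at", "Provider did not accept the send"),
    ("final_status", "No final status in provider logs or events"),
)


def first_gap(record: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the first pipeline stage with no evidence, or None if complete."""
    for field_name, finding in PIPELINE_STAGES:
        if not record.get(field_name):
            return finding
    return None
```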

We’re sending duplicates

The most common causes are:

Missing idempotency key
Visibility timeout too short, so the message reappears
Worker crash after provider accepted but before state was persisted

Fix this by enforcing idempotency and by persisting provider acceptance immediately after the send call, before acknowledgment.

DLQ is growing

Treat the DLQ like a bug queue:

Inspect sample payloads
Fix template, domain, or configuration issues
Replay only after the root cause is resolved

FAQ

Can an email queue truly guarantee zero message loss?

Not in every failure mode. The practical target is no silent loss: durable intent, visible states, and duplicate-safe processing.

What is the transactional outbox pattern and why does it matter?

It ensures the database change and send intent are committed together, so emails are not lost during partial failures between database and queue operations.

When should a worker acknowledge a message?

After the provider accepts the send and your system persists the acceptance status and message reference.

Why is idempotency required for reliable queues?

Because at-least-once delivery can repeat messages. Idempotency makes those repeats safe.

What should go to a DLQ?

Messages that exceed max retries or fail with permanent errors, such as bad payloads or invalid domain configuration.

How do webhooks and events help?

They let your app record final outcomes like delivered, failed, or bounced automatically, which reduces guesswork and improves debugging.

Conclusion

To build an email queue that does not silently lose messages, design for reality: at-least-once delivery, safe duplicates, durable send intent, and visible failure states. The transactional outbox removes the biggest silent-loss risk, DLQs stop poison loops, and strict ack timing prevents drops during worker crashes.

Use MailCub Documentation to guide implementation, explore delivery logging and event workflows on the Transactional Email page, and review MailCub Pricing if you are planning production rollout capacity and operational support.

How to Build an Email Queue That Never Loses Messages