A Postgres job queue instead of a message broker
apsis is a small open-source backend I built to predict satellite passes. Like most systems of its kind it has two flavours of background work: something recurring (refresh orbital data on a schedule) and something reactive (when that data changes, recompute the predictions). The reflex answer is a message broker plus a worker framework. I used PostgreSQL for both. Here is the design, and the part that matters more: when I would not.
Two kinds of background work
- Recurring: ingest TLEs (orbital elements) from CelesTrak every couple of hours.
- Reactive: when a satellite’s orbit changes, recompute its passes over every registered ground station.
Why not a broker
- apsis already runs Postgres. A broker is one more service to deploy, secure, monitor, and reason about.
- The reactive work must fire if and only if the database write committed. With a broker that is the dual-write problem: you can commit the row and fail to publish the message, or publish and fail to commit. Inside one database, the event and the write share a single transaction.
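The dual-write problem goes away once both writes share one transaction. The minimal sketch below shows that property. It uses Python's built-in sqlite3 as a stand-in for Postgres so it runs anywhere. The satellites and outbox_events schemas, the update_orbit function, and the event payload are all hypothetical and are not apsis's actual code.

```python
import json
import sqlite3

# SQLite stands in for Postgres so the sketch runs anywhere.
# Hypothetical schema: the real apsis tables will differ.
SCHEMA = """
CREATE TABLE satellites (catalog_number INTEGER PRIMARY KEY, tle TEXT NOT NULL);
CREATE TABLE outbox_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL
);
"""


def update_orbit(conn: sqlite3.Connection, catalog_number: int, tle: str,
                 fail: bool = False) -> None:
    """Write the business row and its event in ONE transaction (hypothetical helper)."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute(
            "UPDATE satellites SET tle = ? WHERE catalog_number = ?",
            (tle, catalog_number),
        )
        if fail:
            # Simulated crash between the business write and the event write.
            raise RuntimeError("crash before the event was written")
        conn.execute(
            "INSERT INTO outbox_events (event_type, payload) VALUES (?, ?)",
            ("orbit_changed", json.dumps({"catalog_number": catalog_number})),
        )
```

If the process dies between the UPDATE and the INSERT, the rollback discards both writes. You never get a committed orbit change without its event, or an event without its change.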
Scheduled jobs: one table, a lease, no separate scheduler
There is a scheduled_jobs row per recurring job. A worker polls for due rows and claims one with a conditional update:
```sql
-- :id and :lease are bind parameters; :lease must be an interval (e.g. '10 minutes')
UPDATE scheduled_jobs
SET status = 'RUNNING', lease_until = now() + :lease
WHERE id = :id AND status = 'PENDING' AND next_run_at <= now()
RETURNING job_name, interval_seconds;
```
Zero rows back means another worker won the race. No explicit locking (SELECT ... FOR UPDATE) is needed. The UPDATE re-checks status = 'PENDING' atomically, so only one worker can flip the row. With only a handful of jobs, contention is negligible. A lease_until column plus a recovery pass (RUNNING with an expired lease goes back to PENDING) reclaims a job whose worker crashed mid-run. After the handler finishes, the row is rescheduled to now() plus the job's interval_seconds.
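The sketch below covers the claim, recovery, and reschedule steps, again with sqlite3 standing in for Postgres. Timestamps are epoch seconds passed in by the caller rather than now(). Older SQLite builds lack RETURNING, so the claim reads the row back inside the same transaction instead. The schema, the function names, and the 'ingest_tles' job are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for Postgres; times are epoch seconds supplied by the caller
# instead of now(). Hypothetical minimal version of scheduled_jobs.
SCHEMA = """
CREATE TABLE scheduled_jobs (
    id INTEGER PRIMARY KEY,
    job_name TEXT NOT NULL,
    interval_seconds INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'PENDING',
    next_run_at REAL NOT NULL,
    lease_until REAL
);
"""


def claim(conn, job_id, now, lease_seconds):
    """Conditional update: only one caller can flip PENDING -> RUNNING."""
    with conn:
        cur = conn.execute(
            """UPDATE scheduled_jobs
               SET status = 'RUNNING', lease_until = ?
               WHERE id = ? AND status = 'PENDING' AND next_run_at <= ?""",
            (now + lease_seconds, job_id, now),
        )
        if cur.rowcount == 0:
            return None  # another worker won the race, or the job is not due yet
        # Read back in the same transaction (stand-in for RETURNING).
        return conn.execute(
            "SELECT job_name, interval_seconds FROM scheduled_jobs WHERE id = ?",
            (job_id,),
        ).fetchone()


def recover_expired(conn, now):
    """RUNNING rows whose lease has expired go back to PENDING; returns the count."""
    with conn:
        return conn.execute(
            "UPDATE scheduled_jobs SET status = 'PENDING', lease_until = NULL "
            "WHERE status = 'RUNNING' AND lease_until < ?",
            (now,),
        ).rowcount


def finish(conn, job_id, now):
    """After the handler succeeds, reschedule to now + interval_seconds."""
    with conn:
        conn.execute(
            "UPDATE scheduled_jobs SET status = 'PENDING', lease_until = NULL, "
            "next_run_at = ? + interval_seconds WHERE id = ? AND status = 'RUNNING'",
            (now, job_id),
        )
```

A worker loop would call recover_expired, then try claim on each due row, run the handler, and call finish.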
The transactional outbox
When TLE ingest changes a satellite, it inserts an outbox_events row in the same transaction as the update and issues a pg_notify. The event exists if and only if the business change committed. NOTIFY is itself transactional, so the wake-up is only delivered on commit. A separate worker drains the outbox. Woken by LISTEN/NOTIFY, it claims one event with SELECT ... FOR UPDATE SKIP LOCKED LIMIT 1, dispatches it, and commits, one event per transaction. The row stays locked until that commit, so a second worker (a replica, a rolling deploy) skips it instead of grabbing the same event. Failures bump a retry counter with exponential backoff and jitter; past a limit the event is dead-lettered. Per-handler state stored on the row lets a retry skip the handlers that already succeeded.
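The retry policy can be sketched as an in-memory model in plain Python. This is not apsis's code. The constants, the OutboxEvent fields, and the handler_state dictionary are assumptions; a real implementation would persist handler_state on the outbox row, for example as a JSON column. The jitter is the common "full jitter" variant, where the delay is drawn uniformly between zero and the exponential ceiling.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical limits, chosen for illustration only.
BASE_DELAY_S = 2.0
MAX_DELAY_S = 600.0
MAX_ATTEMPTS = 5


def backoff_delay(attempt: int, rng: random.Random) -> float:
    """Full jitter: uniform delay in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


@dataclass
class OutboxEvent:
    id: int
    attempts: int = 0
    status: str = "PENDING"  # PENDING | DONE | DEAD
    next_attempt_at: float = 0.0
    # handler name -> "ok"; stands in for per-handler state stored on the row
    handler_state: Dict[str, str] = field(default_factory=dict)


def dispatch(event: OutboxEvent,
             handlers: Dict[str, Callable[[OutboxEvent], None]],
             now: float, rng: random.Random) -> None:
    """Run handlers not yet marked ok; on failure, back off or dead-letter."""
    failed = False
    for name, handler in handlers.items():
        if event.handler_state.get(name) == "ok":
            continue  # succeeded on an earlier attempt; skip it on retry
        try:
            handler(event)
            event.handler_state[name] = "ok"
        except Exception:
            failed = True
    if not failed:
        event.status = "DONE"
        return
    event.attempts += 1
    if event.attempts >= MAX_ATTEMPTS:
        event.status = "DEAD"  # dead-lettered: kept for inspection, no more retries
    else:
        event.next_attempt_at = now + backoff_delay(event.attempts, rng)
```

Jitter spreads retries out so that many events failing at the same moment (say, a downstream outage) do not all retry in lockstep.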
At-least-once, therefore idempotent
Both mechanisms are at-least-once: a worker can do the work and crash before
marking it done. So the handlers are written to be idempotent. recompute_passes
deletes and reinserts the predictions for each (satellite, ground station) pair;
TLE ingest upserts by catalog number. Re-running converges to the same state.
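Both idempotency strategies fit in a few lines. The sketch below uses sqlite3 as a stand-in. Postgres accepts the same INSERT ... ON CONFLICT ... DO UPDATE syntax; SQLite needs version 3.24 or later. The passes table, its aos column, and both function signatures are hypothetical.

```python
import sqlite3

# SQLite stand-in; hypothetical tables, not the apsis schema.
SCHEMA = """
CREATE TABLE satellites (catalog_number INTEGER PRIMARY KEY, tle TEXT NOT NULL);
CREATE TABLE passes (
    catalog_number INTEGER NOT NULL,
    station_id INTEGER NOT NULL,
    aos REAL NOT NULL  -- acquisition-of-signal time, epoch seconds
);
"""


def ingest_tle(conn, catalog_number, tle):
    """Upsert keyed by catalog number: running it twice leaves one row."""
    with conn:
        conn.execute(
            "INSERT INTO satellites (catalog_number, tle) VALUES (?, ?) "
            "ON CONFLICT (catalog_number) DO UPDATE SET tle = excluded.tle",
            (catalog_number, tle),
        )


def recompute_passes(conn, catalog_number, station_id, predicted_aos):
    """Replace one (satellite, station) pair's predictions in a single transaction."""
    with conn:
        conn.execute(
            "DELETE FROM passes WHERE catalog_number = ? AND station_id = ?",
            (catalog_number, station_id),
        )
        conn.executemany(
            "INSERT INTO passes (catalog_number, station_id, aos) VALUES (?, ?, ?)",
            [(catalog_number, station_id, t) for t in predicted_aos],
        )
```

An at-least-once redelivery might run both calls twice. That still leaves one satellite row and the same set of passes.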
When I would NOT do this
This is the point of the post.
- Very high throughput (tens of thousands of claims per second): row churn and VACUUM pressure will hurt. Reach for a real broker.
- Fan-out to many independent consumers, streaming, or complex routing: that is exactly what brokers are good at.
- Consumers in other languages or other services: a database table is a poor integration boundary across teams.
- You are not already running Postgres: do not add it just for this.
For a single service that already owns a Postgres database and needs reliable, transactional background work at a human scale, a table is usually enough, and you can read it, query it, and back it up with the tools you already have.
The full implementation is open source at github.com/aJustDev/apsis; the trade-offs above are written up as ADRs in the repository.