
A Postgres job queue instead of a message broker

apsis is a small open-source backend I built to predict satellite passes. Like most systems of its kind it has two flavours of background work: something recurring (refresh orbital data on a schedule) and something reactive (when that data changes, recompute the predictions). The reflex answer is a message broker plus a worker framework. I used PostgreSQL for both. Here is the design and, more to the point, when I would not use it.

Two kinds of background work

  • Recurring: ingest TLEs (orbital elements) from CelesTrak every couple of hours.
  • Reactive: when a satellite’s orbit changes, recompute its passes over every registered ground station.

Why not a broker

  • apsis already runs Postgres. A broker is one more service to deploy, secure, monitor, and reason about.
  • The reactive work must fire if and only if the database write committed. With a broker that is the dual-write problem: you can commit the row and fail to publish the message, or publish and fail to commit. Inside one database, the event and the write share a single transaction.
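To make the single-transaction point concrete, here is a minimal sketch. It uses SQLite from the Python standard library as a stand-in for Postgres, and the schema, the update_tle function, and the fail_before_commit flag are all illustrative assumptions, not the apsis code. The only thing it tries to show is that the row change and the event either both commit or both roll back.

```python
import sqlite3

# SQLite stands in for Postgres; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE satellites (catalog_number INTEGER PRIMARY KEY, tle TEXT NOT NULL);
    CREATE TABLE outbox_events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        catalog_number INTEGER NOT NULL
    );
    INSERT INTO satellites VALUES (25544, 'old-tle');
""")


def update_tle(conn, catalog_number, new_tle, fail_before_commit=False):
    """Write the business change and its event in one transaction."""
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.execute(
                "UPDATE satellites SET tle = ? WHERE catalog_number = ?",
                (new_tle, catalog_number),
            )
            conn.execute(
                "INSERT INTO outbox_events (event_type, catalog_number) VALUES (?, ?)",
                ("tle_changed", catalog_number),
            )
            if fail_before_commit:
                raise RuntimeError("simulated crash before commit")
    except RuntimeError:
        pass  # the rollback has already undone both statements


def snapshot(conn):
    """Return (current TLE, number of outbox events) for inspection."""
    tle = conn.execute(
        "SELECT tle FROM satellites WHERE catalog_number = 25544"
    ).fetchone()[0]
    events = conn.execute("SELECT COUNT(*) FROM outbox_events").fetchone()[0]
    return tle, events
```

Calling update_tle with fail_before_commit=True leaves both tables untouched; a normal call changes the row and adds exactly one event. With a broker in the middle there is no equivalent of that shared rollback.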

Scheduled jobs: one table, a lease, no separate scheduler

There is a scheduled_jobs row per recurring job. A worker polls for due rows and claims one with a conditional update:

UPDATE scheduled_jobs
   SET status = 'RUNNING', lease_until = now() + :lease
 WHERE id = :id AND status = 'PENDING' AND next_run_at <= now()
RETURNING job_name, interval_seconds;

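Zero rows back means another worker won the race. No explicit locking (SELECT ... FOR UPDATE) is needed: the UPDATE is atomic, and under Postgres's default READ COMMITTED isolation a concurrent worker that blocks on the same row re-checks the WHERE clause once the winner commits, sees status = 'RUNNING', and matches nothing. With only a handful of jobs, contention is negligible anyway. A lease_until column plus a recovery pass (RUNNING with an expired lease goes back to PENDING) reclaims a job whose worker crashed mid-run. After the handler finishes, the row is set back to PENDING with next_run_at = now() + interval.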
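The claim, recovery, and reschedule steps can be sketched end to end. This is a hedged illustration, not the apsis implementation: SQLite stands in for Postgres, time is passed in as integer seconds instead of now(), the winner is detected from the affected-row count instead of RETURNING, and the function names claim, recover_expired, and finish are my own.

```python
import sqlite3

# SQLite stands in for Postgres; timestamps are plain integer seconds.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE scheduled_jobs (
        id INTEGER PRIMARY KEY,
        job_name TEXT NOT NULL,
        interval_seconds INTEGER NOT NULL,
        status TEXT NOT NULL DEFAULT 'PENDING',
        next_run_at INTEGER NOT NULL,
        lease_until INTEGER
    )
""")
conn.execute(
    "INSERT INTO scheduled_jobs (id, job_name, interval_seconds, next_run_at) "
    "VALUES (1, 'ingest_tles', 7200, 0)"
)
conn.commit()


def claim(conn, job_id, now, lease=300):
    """Try to claim a due job; True only for the worker that wins."""
    with conn:
        cur = conn.execute(
            "UPDATE scheduled_jobs SET status = 'RUNNING', lease_until = ? "
            "WHERE id = ? AND status = 'PENDING' AND next_run_at <= ?",
            (now + lease, job_id, now),
        )
        return cur.rowcount == 1  # zero rows: someone else holds it, or not due


def recover_expired(conn, now):
    """Put RUNNING jobs with an expired lease back to PENDING."""
    with conn:
        cur = conn.execute(
            "UPDATE scheduled_jobs SET status = 'PENDING', lease_until = NULL "
            "WHERE status = 'RUNNING' AND lease_until < ?",
            (now,),
        )
        return cur.rowcount


def finish(conn, job_id, now):
    """Reschedule a completed job one interval into the future."""
    with conn:
        conn.execute(
            "UPDATE scheduled_jobs SET status = 'PENDING', lease_until = NULL, "
            "next_run_at = ? + interval_seconds WHERE id = ?",
            (now, job_id),
        )
```

A second claim on the same job returns False until either finish reschedules it or recover_expired notices that the lease has run out. The lease length (300 seconds here) is an arbitrary choice; it should comfortably exceed the handler's normal run time.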

The transactional outbox

When TLE ingest changes a satellite, it inserts an outbox_events row in the same transaction as the update and issues a pg_notify. Postgres delivers a notification sent inside a transaction only when that transaction commits, so the event exists, and the wake-up fires, if and only if the business change committed. A separate worker drains the outbox: woken by LISTEN/NOTIFY, it claims one event with SELECT ... FOR UPDATE SKIP LOCKED LIMIT 1, dispatches it, and commits. SKIP LOCKED makes a second worker (a replica, a rolling deploy) pass over a row that is already claimed instead of blocking on it or processing it concurrently, and handling one event per transaction keeps each lock short. Failures bump a retry counter with exponential backoff and jitter; past a limit the event is dead-lettered. Per-handler state stored on the row lets a retry skip the handlers that already succeeded.
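The retry side of the outbox worker can be sketched in plain Python. This is an illustration under assumptions, not the apsis code: the event is modelled as a dict rather than a table row, the constants MAX_ATTEMPTS, BASE_DELAY, and MAX_DELAY are made up, and "full jitter" (a uniform draw up to the exponential ceiling) is one common jitter scheme that the post does not claim to use.

```python
import random

MAX_ATTEMPTS = 5    # hypothetical dead-letter limit
BASE_DELAY = 2.0    # seconds; hypothetical
MAX_DELAY = 300.0   # cap on the backoff ceiling; hypothetical


def backoff_delay(attempt, rng=random):
    """Full jitter: uniform in [0, min(MAX_DELAY, BASE_DELAY * 2**attempt)]."""
    ceiling = min(MAX_DELAY, BASE_DELAY * (2 ** attempt))
    return rng.uniform(0, ceiling)


def dispatch(event, handlers, now, rng=random):
    """Run the handlers not yet marked done; update the event in place."""
    done = event.setdefault("handlers_done", [])
    try:
        for name, handler in handlers.items():
            if name in done:
                continue  # succeeded on an earlier attempt, do not repeat
            handler(event["payload"])
            done.append(name)
        event["status"] = "DONE"
    except Exception as exc:
        event["attempts"] = event.get("attempts", 0) + 1
        event["last_error"] = str(exc)
        if event["attempts"] >= MAX_ATTEMPTS:
            event["status"] = "DEAD"  # dead-lettered: needs a human
        else:
            event["status"] = "PENDING"
            event["next_attempt_at"] = now + backoff_delay(event["attempts"], rng)
    return event
```

In a real worker the dict fields would be columns on the outbox row, written back in the same transaction that holds the SKIP LOCKED lock. Recording handlers_done matters because a failure in the second handler should not cause the first one to run again on retry.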

At-least-once, therefore idempotent

Both mechanisms are at-least-once: a worker can do the work and crash before marking it done. So the handlers are written to be idempotent. recompute_passes deletes and reinserts the predictions for each (satellite, ground station) pair; TLE ingest upserts by catalog number. Re-running converges to the same state.
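A small sketch of the delete-and-reinsert pattern shows why a repeated run is harmless. Again SQLite stands in for Postgres; the passes schema and the predict function (a deterministic fake, not real orbital mechanics) are assumptions for illustration, and this recompute_passes is a simplified stand-in for the handler of the same name.

```python
import sqlite3

# SQLite stands in for Postgres; the schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE passes (
        catalog_number INTEGER NOT NULL,
        station_id INTEGER NOT NULL,
        rise_at INTEGER NOT NULL
    )
""")
conn.commit()


def predict(catalog_number, station_id):
    """Hypothetical predictor: returns three fake rise times."""
    return [1000 * station_id + k * 5400 for k in range(3)]


def recompute_passes(conn, catalog_number, station_ids):
    """Replace the stored passes for each (satellite, ground station) pair."""
    for station_id in station_ids:
        with conn:  # the delete and the reinsert commit together, per pair
            conn.execute(
                "DELETE FROM passes WHERE catalog_number = ? AND station_id = ?",
                (catalog_number, station_id),
            )
            conn.executemany(
                "INSERT INTO passes VALUES (?, ?, ?)",
                [(catalog_number, station_id, t)
                 for t in predict(catalog_number, station_id)],
            )


def all_passes(conn):
    """Return every stored pass in a stable order, for comparison."""
    return conn.execute(
        "SELECT catalog_number, station_id, rise_at FROM passes "
        "ORDER BY catalog_number, station_id, rise_at"
    ).fetchall()
```

Running recompute_passes twice for the same satellite leaves the table exactly as one run would, so a worker that crashes after the work but before marking the event done costs only a redundant recompute.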

When I would NOT do this

This is the point of the post.

  • Very high throughput (tens of thousands of claims per second): every claim is an UPDATE that leaves a dead row version behind, so row churn and VACUUM pressure will hurt. Reach for a real broker.
  • Fan-out to many independent consumers, streaming, or complex routing: that is exactly what brokers are good at.
  • Consumers in other languages or other services: a database table is a poor integration boundary across teams.
  • You are not already running Postgres: do not add it just for this.

For a single service that already owns a Postgres database and needs reliable, transactional background work at a human scale, a table is usually enough - and you can read it, query it, and back it up with the tools you already have.

The full implementation is open source at github.com/aJustDev/apsis; the trade-offs above are written up as ADRs in the repository.