Anatomy of a bad migration
Nobody plans for this. It arrives on a Friday, in a deploy that looked fine in review. Here's the same incident playing out two ways — once the way it usually goes, and once with a tested, off-site checkpoint sealed 30 seconds before the migration ran.
4:55 PM Friday — the deploy
A migration ships
An ORM auto-migration renames a table. Under the hood it's a DROP and re-create — and a foreign key turns it into a DROP ... CASCADE that quietly takes three dependent tables with it.
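A drop hiding inside a "rename" is easy to miss in review, because the diff shows the model change and not the SQL it produces. As a rough sketch only (this check is not something the page describes, and the table names are hypothetical), a CI step could scan the generated SQL and flag destructive statements before they run:

```python
import re

# Hypothetical guard: statements that destroy data, and the CASCADE modifier.
DESTRUCTIVE = re.compile(r"^\s*(DROP\s+TABLE|TRUNCATE)\b", re.IGNORECASE)
CASCADE = re.compile(r"\bCASCADE\b", re.IGNORECASE)


def flag_destructive(sql: str) -> list[str]:
    """Return the destructive statements found in a migration script."""
    flagged = []
    # Naive split on ';' is enough for a sketch; it ignores semicolons
    # that appear inside string literals or function bodies.
    for statement in sql.split(";"):
        stmt = " ".join(statement.split())  # collapse whitespace and newlines
        if not stmt:
            continue
        if DESTRUCTIVE.match(stmt):
            label = "DROP/TRUNCATE with CASCADE" if CASCADE.search(stmt) else "DROP/TRUNCATE"
            flagged.append(f"{label}: {stmt}")
    return flagged


# Hypothetical SQL of the kind an auto-migration might emit for a "rename".
migration = """
DROP TABLE customers CASCADE;
CREATE TABLE clients (id bigint PRIMARY KEY, name text);
"""

for finding in flag_destructive(migration):
    print(finding)
```

A true rename (ALTER TABLE ... RENAME TO ...) passes this check untouched, which is the distinction a reviewer would want surfaced.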
The data is gone
The deploy goes green. Orders, payments, and audit history for the last 18 months are no longer in the database. Nobody notices for forty minutes — until support tickets start.
The scramble
You confirm the tables are gone. Now the only question that matters: what's the last good copy, and can we actually restore it?
5:40 PM — the fork in the road
Without a tested off-site backup
- Platform PITR window is 7 days — but it's in the same account, and restoring it rolls back everything, including the good writes since 4:56.
- On a free tier, there's no PITR at all.
- The nightly pg_dump cron exists, but nobody has ever restored from it. The first attempt errors on a missing role.
- Hours later you get most of it back. The gap between the last good dump and 4:56 is gone for good.
- Monday: you write the incident report explaining what was lost.
With an OffsiteDB checkpoint
- The CI pipeline sealed a checkpoint at 4:54:30 — a restore-drilled snapshot, tagged pre-deploy-4f3a9c1, taken 30 seconds before the migration.
- You already know it restores: it was drilled into a real Postgres cluster when it was sealed, the same drill behind every sealed snapshot on last month's report.
- You restore the three dropped tables from the checkpoint — just those tables, leaving the good writes intact.
- restored 184 tables, 9.2M rows — 94 seconds, a command you've watched succeed hundreds of times.
- 5:42 PM: back online. The incident report is two sentences.
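The page doesn't show OffsiteDB's own restore command, so the following is only a sketch of the "just those tables" idea using stock pg_restore against a custom-format dump. The file name, connection string, and table names are hypothetical stand-ins.

```python
import shlex


def build_selective_restore(dump_path: str, target_db: str, tables: list[str]) -> list[str]:
    """Build a pg_restore command that restores only the named tables."""
    if not tables:
        raise ValueError("name at least one table to restore")
    cmd = [
        "pg_restore",
        "--dbname", target_db,
        "--no-owner",            # skip ownership commands, which fail when a role is missing
        "--single-transaction",  # either every named table comes back, or none do
    ]
    for table in tables:
        cmd += ["--table", table]
    cmd.append(dump_path)
    return cmd


cmd = build_selective_restore(
    "pre-deploy-4f3a9c1.dump",            # hypothetical file name built from the tag
    "postgresql://localhost/app",         # hypothetical connection string
    ["orders", "payments", "audit_log"],  # hypothetical table names
)
print(shlex.join(cmd))
```

Because only the named tables are touched, writes made to other tables after the checkpoint stay in place. Note that pg_restore's --table option does not pull in other objects the tables depend on, so a real selective restore may need extra steps.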
What made the second path possible
Nothing heroic — just three things in place before the bad day, which is the only time they can be:
- A pre-migration checkpoint. One step in CI seals a tagged snapshot and blocks until it exists, so there's always a fresh, known-good copy from moments before any migration. See the GitHub Action →
- A backup that was already proven. Every snapshot is restored into a throwaway Postgres cluster and row-counted before it's marked sealed — so “can we restore it?” was answered weeks ago, not at 5:40 on a Friday.
- An off-site copy you own. The checkpoint lives in your own S3/R2 bucket, encrypted — outside the account, region, and blast radius of the database that just broke.
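The first two items boil down to one rule: a snapshot counts only after a restored copy matches the source, and the migration waits until that's true. A minimal sketch of that gate follows, assuming per-table row counts have already been collected from the source (at snapshot time) and from the throwaway cluster. The function names and sample counts are hypothetical, not OffsiteDB's API.

```python
def drill_mismatches(source: dict[str, int], restored: dict[str, int]) -> list[str]:
    """Compare per-table row counts; an empty result means the drill passed."""
    problems = []
    for table, expected in sorted(source.items()):
        actual = restored.get(table)
        if actual is None:
            problems.append(f"{table}: missing from restored copy")
        elif actual != expected:
            problems.append(f"{table}: expected {expected} rows, restored {actual}")
    return problems


def seal_or_block(tag: str, source: dict[str, int], restored: dict[str, int]) -> str:
    """Mark a checkpoint sealed only if its drill passed; otherwise stop the pipeline."""
    problems = drill_mismatches(source, restored)
    if problems:
        raise RuntimeError(f"checkpoint {tag} failed its drill: " + "; ".join(problems))
    return f"{tag} sealed"


# Hypothetical counts, for illustration only.
source_counts = {"orders": 1200, "payments": 950, "audit_log": 40000}
restored_counts = {"orders": 1200, "payments": 950, "audit_log": 40000}

print(seal_or_block("pre-deploy-4f3a9c1", source_counts, restored_counts))
```

In CI, the migration step would run only after seal_or_block returns. A raised error fails the job, which is what "blocks until it exists" amounts to in practice.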
The cheapest insurance you'll ever expense
You will need a backup once. That day, you'll want one that's already proven it restores and sits one command away. Start a free trial, see a sample drill report, or read how it handles your credentials.