THE HABIT / PRE-MIGRATION CHECKPOINTS

Back up before the migration. Every time.

TAGGED · RESTORE-DRILLED · SEALED BEFORE THE DEPLOY PROCEEDS

Production data almost never dies from a crashed disk. It dies from a deploy: an ORM auto-migration that turns a rename into a DROP ... CASCADE, a backfill that updates the wrong rows, an index swap that was supposed to be safe. The fix isn't to deploy less — it's to make every migration trivially reversible by sealing a known-good copy seconds before it runs.

Why your scheduled backup doesn't cover this

A nightly backup gives you yesterday. The migration that goes wrong at 4:55 PM needs 4:54:30 today; restoring last night's snapshot means throwing away a full day of customer writes to undo one bad schema change. And platform rollbacks are all-or-nothing: the whole project goes back, good writes included. (Here's that Friday, minute by minute.) The checkpoint inverts that trade: a recovery point seconds wide, tagged with the exact commit that caused the trouble.

One step in CI, and the migration can't outrun its backup

The GitHub Action seals a checkpoint and blocks the workflow until it's proven — sealed, then restore-drilled into a real Postgres and row-counted. If the checkpoint fails, the step fails and the migration never runs:

- name: Checkpoint database before migration
  uses: offsitedb/checkpoint@v1
  with:
    api-key: ${{ secrets.OFFSITEDB_API_KEY }}
    database: prod-api

- name: Run migrations
  run: npm run migrate

No GitHub Actions? It's one curl from any deploy script:

curl -X POST https://offsitedb.com/api/checkpoint \
  -H "Authorization: Bearer $OFFSITEDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"database": "prod-api", "tag": "pre-deploy-'"$GITHUB_SHA"'"}'

The response comes back only when the snapshot is sealed and drilled — your pipeline isn't proceeding on a promise:

{
  "ok": true,
  "tag": "pre-deploy-4f3a9c1",
  "size": "412.3 MB",
  "restore_drill": "proven"
}

No pipeline at all? The Back up now button in the dashboard is the same checkpoint — click it before you run the migration by hand.

When the migration is wrong anyway

Checkpoints are standard custom-format pg_dump archives in your own bucket, so recovery is surgical: restore just the tables the migration damaged with pg_restore --table, and keep every good write since the deploy. You already know the restore works — it was drilled when the checkpoint was sealed, and the row counts are in your ledger.
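A table-level restore might look like the sketch below. It assumes you have already downloaded the archive from your bucket and are restoring into a scratch database first, so you can inspect the rows before copying them back. The function name, the `DRY_RUN` switch, and the example file, database, and table names are all hypothetical. Note that pg_restore with --table restores only the named tables, not the other objects they depend on, so this suits recovering rows rather than rebuilding a full schema.

```shell
# Usage: restore_tables ARCHIVE TARGET_DB TABLE [TABLE...]
# Restores only the named tables from a custom-format pg_dump archive.
# Set DRY_RUN=1 to print the pg_restore command instead of running it.
restore_tables() (
  if [ "$#" -lt 3 ]; then
    echo "usage: restore_tables ARCHIVE TARGET_DB TABLE [TABLE...]" >&2
    exit 2
  fi
  archive=$1
  target_db=$2
  shift 2

  # Rewrite each remaining argument NAME into a --table=NAME switch.
  # The loop list is fixed when the loop starts, so appending is safe.
  for t in "$@"; do
    set -- "$@" "--table=$t"
    shift
  done

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo pg_restore --dbname="$target_db" --no-owner "$@" "$archive"
  else
    pg_restore --dbname="$target_db" --no-owner "$@" "$archive"
  fi
)
```

For example, `DRY_RUN=1 restore_tables checkpoint.dump scratch orders invoices` prints the command it would run, which is a cheap way to review the table list before touching any database.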

If your migrations are written by AI

This habit was useful when humans wrote migrations slowly. It's essential now that AI writes them fast: schema changes land daily, review is lighter, and the person approving the diff often can't spot the destructive operation hiding inside an innocent one. A pre-migration checkpoint is the mechanism that makes that speed safe — your AI writes fearless migrations, and you keep an undo button it can't delete.

Set up checkpoints · See a drill report

FAQ

Why isn't my nightly backup enough?

A nightly backup gives you yesterday. A migration that goes wrong at 4:55 PM needs 4:54:30 today — restoring last night's snapshot throws away a full day of customer writes. The checkpoint exists precisely to make the recovery point seconds wide, not hours.

Doesn't this slow down every deploy?

A checkpoint takes about as long as a pg_dump of your database — a minute or two for most production apps, and the Action waits up to 15 minutes by default. It's the cheapest step in your pipeline relative to what it covers. If you want, gate it to run only on deploys that contain migration files.
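One way to gate the checkpoint on migration changes, sketched in shell. The `migrations/` path prefix, the `MIGRATION_PATH_PATTERN` variable, and the function name are assumptions for illustration; adjust the pattern to wherever your migration files actually live.

```shell
# Hypothetical default: treat anything under a top-level migrations/ directory
# as a migration file. Override MIGRATION_PATH_PATTERN to match your layout.
MIGRATION_PATH_PATTERN="${MIGRATION_PATH_PATTERN:-^migrations/}"

# Reads changed file paths on stdin, one per line.
# Exits 0 if any path matches the pattern, 1 otherwise.
touches_migrations() {
  grep -Eq "$MIGRATION_PATH_PATTERN"
}

# Example wiring (BASE_SHA is a placeholder for the previously deployed commit):
#   if git diff --name-only "$BASE_SHA" "$GITHUB_SHA" | touches_migrations; then
#     ... seal the checkpoint, then run the migration ...
#   fi
```

If the pattern is too narrow, a migration can slip through without a checkpoint, so when in doubt it is safer to err on the side of matching too much.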

What happens if the checkpoint fails?

The Action exits non-zero and your workflow stops — the migration never runs. That's deliberate: a schema change with no known-good copy seconds before it is exactly the situation this exists to prevent. The failure response includes a plain-English diagnosis of why the backup failed.

Can I restore just one table instead of the whole database?

Yes. Checkpoints are standard custom-format pg_dump archives, so pg_restore can restore a single table (--table) or any subset. After a bad migration, you typically restore only what the migration damaged and keep everything else.

Do I need CI for this?

No. The Back up now button in the dashboard does the same thing, and a single curl works from anywhere: a deploy script, a Makefile, or your terminal right before you run the migration by hand.

My migrations are written by AI. Does that change anything?

It makes the habit more important, not less. AI-generated migrations ship faster and get lighter review, and a rename that's secretly a DROP ... CASCADE looks like an ordinary rename at a glance. The checkpoint is what makes fast, machine-written migrations safe to run: the AI writes fearlessly, you keep an undo button.

Keep reading

Anatomy of a bad migration — the incident this page exists to prevent
Supabase backup: the complete picture — if your database lives there
OffsiteDB vs a cron job running pg_dump — why the cron job is never there at 4:54:30