deploy

Everything operational lives in deploy/: one compose stack, three shapes (local, TLS, tunnel), and a set of scripts with deliberately distinct verbs. This page is the digest; the runbook in deploy/README.md is the authority.

topology

                    host
                     │  :80  (the ONLY published port; :443 in prod)
              ┌──────┴───────┐
              │   traefik    │   sole ingress, TLS/ACME + WS passthrough
              └──────┬───────┘
        edge network │
              ┌──────┴───────┐
              │    relay     │   no published ports; content is opaque to it
              └──┬────────┬──┘
   internal net  │        │   (internal: true — no NAT, no host ports)
        ┌────────┴──┐  ┌──┴──────────┐
        │ postgres  │  │  openbao    │   Transit at-rest KMS
        └───────────┘  └─────────────┘

Two networks carry the security boundary: everything reachable arrives through Traefik on edge; Postgres and OpenBao join only internal and are unreachable from the host or internet (SO-6). Agents join the edge network with no published ports of their own.

the three shapes

shape            compose files                  when
local            docker-compose.yml             validation on *.localhost, HTTP
production TLS   + docker-compose.prod.yml      three public origins (app/relay/s3), Let’s Encrypt, :443
tunnel           + docker-compose.tunnel.yml    one origin behind Cloudflare Tunnel + Access, zero published ports; see remote coordination

Production is the local stack plus overrides — one extra -f flag each, nothing forked.
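
As a rough sketch of how the shapes compose, the helper below maps a shape name to its -f flags. The function name compose_files and the deploy/ prefix on the two override files are assumptions for illustration; only deploy/docker-compose.yml appears with its full path on this page.

```shell
# Hypothetical helper: print the -f flags for a given shape.
# Assumes the override files sit next to deploy/docker-compose.yml.
compose_files() {
  local base="-f deploy/docker-compose.yml"
  case "$1" in
    local)  echo "$base" ;;
    prod)   echo "$base -f deploy/docker-compose.prod.yml" ;;
    tunnel) echo "$base -f deploy/docker-compose.tunnel.yml" ;;
    *)      echo "unknown shape: $1" >&2; return 1 ;;
  esac
}

# Usage (word splitting on the unquoted substitution is intentional):
#   docker compose $(compose_files prod) up -d
```

Keeping the base file first in every case mirrors the "local stack plus overrides" rule: later -f files override earlier ones.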

production OpenBao, not -dev

The compose file ships OpenBao in dev mode for local validation only. Production runs a real sealed server on the baodata volume. Run bao operator init once and record the unseal shares and root token out-of-band, since they are shown exactly once. Unseal on every start (auto-unseal preferred), then mint the relay a scoped token that allows only encrypt and decrypt on the relay-atrest Transit key. preflight-prod.sh rejects a deployment that still carries the dev root token.
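
A scoped policy for the relay's token might look like the sketch below. It assumes the Transit engine is mounted at transit/ (this page does not state the mount path); the policy name relay-atrest and the helper write_relay_policy are illustrative. Transit encrypt and decrypt are write operations on their endpoints, so each path needs only the update capability.

```shell
# Hypothetical helper: write a policy that permits only encrypt/decrypt
# on the relay-atrest key. Assumes Transit is mounted at "transit/".
write_relay_policy() {
  cat > "$1" <<'EOF'
path "transit/encrypt/relay-atrest" {
  capabilities = ["update"]
}

path "transit/decrypt/relay-atrest" {
  capabilities = ["update"]
}
EOF
}

# Usage, run by an operator against the unsealed production server:
#   write_relay_policy relay-atrest.hcl
#   bao policy write relay-atrest relay-atrest.hcl
#   bao token create -policy=relay-atrest
```

A token created this way also receives the default policy unless you opt out, so review what that grants before handing the token to the relay.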

backups — the part you must not skip

The relay’s Postgres is the only off-device copy of every member’s sealed keystore and all ciphertext events. The relay cannot reconstruct any of it. Lose the database and users cannot recover their data even with their passphrase.

A complete backup is two things, paired:

  1. the Postgres dump — ./deploy/backup.sh /mnt/backups
  2. the OpenBao baodata snapshot (the relay-atrest key) — taken automatically by the same run, with a matching timestamp

Restoring Postgres rows without the matching Transit key leaves every at-rest envelope permanently undecryptable — so prune the pair together, store both off-host, and treat the OpenBao snapshot as more sensitive than the ciphertext dump.
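
Because a dump without its key snapshot cannot be decrypted, it can help to scan the backup directory for dumps that lack a partner before pruning or shipping off-host. The sketch below assumes a hypothetical naming scheme, pg-<timestamp>.dump and baodata-<timestamp>.tar.gz. backup.sh may name its files differently, so adjust the patterns to what it actually writes.

```shell
# Hypothetical check: every Postgres dump must have an OpenBao snapshot
# with the same timestamp. The file naming is an assumption, not
# necessarily what backup.sh produces.
unpaired_dumps() {
  local dir="$1" dump ts missing=0
  for dump in "$dir"/pg-*.dump; do
    [ -e "$dump" ] || continue            # glob matched nothing
    ts="${dump##*/pg-}"                   # strip directory and prefix
    ts="${ts%.dump}"                      # strip extension
    if [ ! -e "$dir/baodata-$ts.tar.gz" ]; then
      echo "$ts"                          # report the orphaned timestamp
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   unpaired_dumps /mnt/backups || echo "incomplete backup pair(s)" >&2
```

The function prints each dump timestamp that has no snapshot and returns non-zero if it found any, so it can gate a prune or upload step.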

0 3 * * * /path/to/repo/deploy/backup.sh /mnt/backups >> /var/log/cozylabs-backup.log 2>&1

Targets: RPO 24h by default (tighten to hourly if your write volume warrants), RTO 1h from fresh host to validated stack. On full host loss, restore OpenBao first, then the DB.
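
The RPO target only holds while the cron job keeps succeeding, so a freshness check is a cheap companion to it. A minimal sketch, assuming backups land as regular files under the target directory; the function name backup_is_fresh is hypothetical and not part of the shipped scripts.

```shell
# Hypothetical check: succeed only if some file under DIR was modified
# within the last MAX_HOURS hours (default 24, matching the default RPO).
backup_is_fresh() {
  local dir="$1" max_hours="${2:-24}"
  # find -mmin -N matches files modified less than N minutes ago.
  [ -n "$(find "$dir" -type f -mmin "-$((max_hours * 60))" -print 2>/dev/null)" ]
}

# Usage:
#   backup_is_fresh /mnt/backups 24 || echo "newest backup is older than the RPO" >&2
```

This only proves that a recent file exists, not that it restores; disaster-drill.sh remains the real proof.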

upgrades & rollback

Migrations are forward-only and auto-apply on boot — there are no down-migrations, so you do not roll back by redeploying old code onto a newer schema (ADR-0008). The upgrade dance takes its own rollback point first:

./deploy/backup.sh /mnt/backups       # 1. backup FIRST — this is your rollback point
git pull                              # 2. the new release
docker compose -f deploy/docker-compose.yml up -d --build   # 3. rebuild; migrations apply (add your shape's extra -f in prod or tunnel)
./deploy/validate-prod.sh             # 4. confirm (prod)

Rollback = previous checkout + restore the pre-upgrade dump. The Transit key is untouched by app upgrades.
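
The ordering matters more than the commands: if the backup fails there is no rollback point, so nothing after it should run. One way to enforce that is a small wrapper that stops at the first failing step. This is a sketch, not a shipped script; the names run and upgrade and the DRY_RUN switch are hypothetical.

```shell
# Hypothetical wrapper around the four upgrade steps. With DRY_RUN=1 it
# prints each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade() {
  local backups="${1:-/mnt/backups}"
  run ./deploy/backup.sh "$backups" || return 1     # 1. rollback point first
  run git pull || return 1                          # 2. the new release
  # 3. rebuild; migrations apply (add your shape's override -f here)
  run docker compose -f deploy/docker-compose.yml up -d --build || return 1
  run ./deploy/validate-prod.sh                     # 4. confirm
}

# Usage:
#   DRY_RUN=1 upgrade /mnt/backups   # preview the steps
#   upgrade /mnt/backups             # run them, stopping on first failure
```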

the scripts, by verb

script                   kind                     run it
bootstrap.sh             setup                    once, first run; generates deploy/.env secrets
render-prod-config.sh    setup (prod)             after setting domains; SHAPE=tunnel for single-origin
preflight-prod.sh        static check             before up; validates .env, no running stack needed
validate-prod.sh         dynamic check            after up; read-only against the live deployment
backup.sh / restore.sh   ops                      before every upgrade; restore to roll back
disaster-drill.sh        proof / beta gate        rehearse full host-loss recovery on a throwaway stack
e2e-local.sh             ⚠ destructive, dev/CI    ephemeral boot→smoke→teardown; wipes volumes, never on a live host
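
Read top to bottom, the table implies an order for a first production deploy. The sketch below only prints that order as a checklist. It assumes the scripts are invoked from the repository root as ./deploy/<name> without arguments, which this page shows only for backup.sh, and that the override files live under deploy/. The function name first_deploy_plan is hypothetical.

```shell
# Hypothetical checklist printer: the order implied by the table for a
# first production deploy. Set your domains before the render step.
first_deploy_plan() {
  local shape="${1:-prod}" override
  case "$shape" in
    prod)   override="deploy/docker-compose.prod.yml" ;;
    tunnel) override="deploy/docker-compose.tunnel.yml" ;;
    *)      echo "unknown shape: $shape" >&2; return 1 ;;
  esac
  echo "./deploy/bootstrap.sh"                         # once, first run
  if [ "$shape" = "tunnel" ]; then
    echo "SHAPE=tunnel ./deploy/render-prod-config.sh" # single-origin
  else
    echo "./deploy/render-prod-config.sh"
  fi
  echo "./deploy/preflight-prod.sh"                    # static, before up
  echo "docker compose -f deploy/docker-compose.yml -f $override up -d"
  echo "./deploy/validate-prod.sh"                     # dynamic, after up
}

# Usage:
#   first_deploy_plan tunnel
```

Printing rather than executing keeps the sketch safe to run anywhere; the runbook in deploy/README.md remains the authority on exact invocations.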