# pgbak

## Overview

`pgbak` is a utility for PostgreSQL backups based on WAL archiving. Data is stored in a local directory provided through the `PGBAK` environment variable.

Note that `pgbak` itself does not back up the data to any remote location. Instead the user must provide a script at `$PGBAK/scripts/backup` which is invoked when a backup needs to be performed. If no remote backups are desired, it can simply be a symlink to `/bin/true`.

## Backup directory structure

- `$PGBAK/TIMESTAMP/` — One or more directories containing base backups and WAL segments. The `TIMESTAMP` is when the base backup was taken.
- `$PGBAK/TIMESTAMP/base.tzst` — A compressed tarball containing a full database backup as created by `pg_basebackup`.
- `$PGBAK/TIMESTAMP/pg_wal/NNN.zst` — One or more compressed WAL segments following the full backup.
- `$PGBAK/current` — Symlink to the latest backup directory.
- `$PGBAK/scripts/` — Directory for user-provided scripts.
- `$PGBAK/scripts/backup` — script called after one or more WAL files were archived; required
- `$PGBAK/pgbak.lock` — Lock/state file. Locking is done through `fcntl()`, so deleting this file should never be necessary.

Currently `pgbak` never deletes old backup directories.

## Usage

The following subcommands are provided:

- `pgbak wal PATH` — Archive the given WAL file and exit. A background sync process will be started if one isn't running.
- `pgbak sync` — Run sync in the foreground if necessary.
- `pgbak force-sync` — Run sync in the foreground. The `backup` script is always invoked.
- `pgbak wait [TIMEOUT]` — If a sync is running, wait for it to finish. The `TIMEOUT` is in seconds and defaults to infinity. Exits with non-zero status on error or timeout.

The sync process is responsible for maintaining the `$PGBAK` directory.
In particular, it

- creates a new full backup every now and again
- calls `$PGBAK/scripts/backup`

### Writing `$PGBAK/scripts/backup`

When the `backup` script is started, the current directory is set to the subdirectory of `$PGBAK` that needs to be backed up. The script is given the following arguments:

- the timestamp of the base backup
- the current timestamp

Existing files will never disappear/change, provided the base backup timestamp is the same. However, new compressed WAL files may appear at any time. If this happens while the `backup` script is running, it will be called again with a refreshed current timestamp.

On failure, the `backup` script will be retried indefinitely. The timestamps will be the same if no new WAL files have appeared.

### Restoring from a backup

In order to restore from a backup, simply extract `base.tzst` to an empty `$PGDATA` directory and uncompress all WAL files into `$PGDATA/pg_wal`.

Recovery can take quite a while if many WAL files need to be replayed. It may be beneficial to do it without `fsync` and instead `sync` later:

```sh
mkdir -m700 data
tar -C data -xf backup/base.tzst
zstd -d --output-dir-flat data/pg_wal backup/pg_wal/*
postgres --single -FD data < /dev/null
sync -f data
```

## Example configuration

`postgresql.conf`:

```ini
archive_mode = on
archive_command = 'pgbak wal %p'
archive_timeout = 1h
```

`postgresql.service`:

```ini
[Unit]
# Make sure DNS works while postgres/pgbak is running
After=nss-lookup.target

[Service]
Environment=PGBAK=/media/backup/pgsql
# Wait up to 2 minutes for pgbak sync to finish after postgres is stopped.
ExecStopPost=pgbak wait 120
```

`$PGBAK/scripts/backup`:

```sh
#!/bin/sh
exec rsync -r ./ "backup@host.example:pg-$1/"
```

To make your first backup, simply restart postgres and call `pg_switch_wal()`.

## Misc

`pgbak` uses [Zstandard](http://www.zstd.net/)'s command-line tool for compression. The compression options may be tweaked in `config.h`.
Currently only Linux is supported because of `O_TMPFILE` and `F_OFD_SETLK`.
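
Because `pgbak` never deletes old backup directories, pruning is left to the user. The following is a minimal sketch, not part of `pgbak`, of how candidates for removal might be listed. It relies only on the layout described above: a backup directory contains `base.tzst`, and `current` points at the latest one. The function name `list_stale_backups` is hypothetical, and the function deliberately deletes nothing.

```sh
#!/bin/sh
# Hypothetical helper, not shipped with pgbak.
# Prints every backup directory under the given root that is not the
# target of the `current` symlink. Nothing is deleted.
list_stale_backups() {
    root=$1
    # Resolve the directory that `current` points to (empty if missing).
    current=$(cd "$root/current" 2>/dev/null && pwd -P) || current=
    for dir in "$root"/*/; do
        # Only directories that hold a base backup count as backups;
        # this skips scripts/ and an unmatched glob.
        [ -f "${dir}base.tzst" ] || continue
        resolved=$(cd "$dir" && pwd -P)
        # Skip the latest backup, including the `current` symlink itself.
        [ "$resolved" = "$current" ] && continue
        printf '%s\n' "$resolved"
    done
}
```

Calling `list_stale_backups "$PGBAK"` prints one path per line. Whether a listed directory is actually safe to remove depends on your retention needs and on whether the remote copy is complete, so the output is best reviewed by hand.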