# pgbak

## Overview

`pgbak` is a robust WAL archiver for PostgreSQL. It automates compression as well as the creation of full snapshots. Data is stored in a local directory provided through the `PGBAK` environment variable. A user-provided script may back it up to a remote location if desired.

Some tasks are performed by a "sync" process that is automatically started in the background when necessary. This includes taking full snapshots and running the backup script.

## Backup directory structure

- `$PGBAK/TIMESTAMP/` — One or more directories containing base backups and WAL segments. The `TIMESTAMP` is when the base backup was taken.
- `$PGBAK/TIMESTAMP/base.tzst` — A compressed tarball containing a full database backup as created by `pg_basebackup`.
- `$PGBAK/TIMESTAMP/pg_wal/NNN.zst` — One or more compressed WAL segments following the full backup.
- `$PGBAK/current` — Symlink to the latest backup directory.
- `$PGBAK/scripts/` — Directory for user-provided scripts.
- `$PGBAK/scripts/backup` — script called after one or more WAL files were archived; required
- `$PGBAK/pgbak.lock` — Lock/state file. Locking is done through `fcntl()`, so deleting this file should never be necessary.

Currently `pgbak` never deletes old backup directories.

## Usage

The following subcommands are provided:

- `pgbak wal PATH` — Archive the given WAL file and exit. A background sync process will be started if necessary.
- `pgbak sync` — Run sync in the foreground if a previous run was interrupted.
- `pgbak force-sync` — Run sync in the foreground. The `backup` script is always invoked.
- `pgbak full-sync` — Run sync in the foreground. Take a full snapshot and run the `backup` script.
- `pgbak wait [TIMEOUT]` — If a sync is running, wait for it to finish. The `TIMEOUT` is in seconds and defaults to infinity. Exits with non-zero status on error or timeout.
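
As an illustration of how the exit status of `pgbak wait` might be used from another script, here is a minimal sketch. The helper name `wait_for_sync`, its messages, and the 60-second default are assumptions made for this example and are not part of `pgbak`; only the `pgbak wait [TIMEOUT]` invocation and its non-zero status on error or timeout come from the list above.

```sh
# Hypothetical helper, not shipped with pgbak: block until a running sync
# has finished, or give up after the given number of seconds.
wait_for_sync() {
    timeout="${1:-60}"   # 60 s is an arbitrary default chosen for this sketch
    if pgbak wait "$timeout"; then
        echo "pgbak wait succeeded"
    else
        # pgbak reports both errors and timeouts with a non-zero status,
        # so the two cases cannot be told apart here.
        echo "pgbak wait failed or timed out after ${timeout}s" >&2
        return 1
    fi
}
```

A maintenance script could call `wait_for_sync 120` before unmounting the backup volume and abort if the helper returns non-zero.
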
### Writing `$PGBAK/scripts/backup`

When the `backup` script is started, the current directory is set to the subdirectory of `$PGBAK` that needs to be backed up. The script is given up to three arguments:

1. the timestamp of the base backup
2. the current timestamp
3. an optional `old` flag if this is the last invocation of the backup script before switching to a newer snapshot

Existing files will never disappear or change, provided the base backup timestamp is the same. However, new compressed WAL files may appear at any time. If this happens while the `backup` script is running, it will be called again with a refreshed current timestamp.

On failure, the `backup` script will be retried indefinitely. On a retry, the current timestamp is unchanged if no new WAL files have appeared.

### Restoring from a backup

To restore from a backup, extract `base.tzst` into an empty `$PGDATA` directory and decompress all WAL files into `$PGDATA/pg_wal`.

Recovery can take quite a while if many WAL files need to be replayed. It may be beneficial to run it with `fsync` disabled and to `sync` afterwards instead:

```sh
# Create the data directory with the restrictive permissions postgres expects
mkdir -m700 data
# Unpack the base backup, then decompress the WAL segments next to it
tar -C data -xf backup/base.tzst
zstd -d --output-dir-flat data/pg_wal backup/pg_wal/*
# Replay WAL in single-user mode with fsync off (-F); EOF on stdin ends the session
postgres --single -FD data < /dev/null
# Flush the filesystem holding the data directory to disk
sync -f data
```

## Example configuration

`postgresql.conf`:

```ini
archive_mode = on
archive_command = 'pgbak wal %p'
archive_timeout = 1h
```

`postgresql.service`:

```ini
[Unit]
# Make sure DNS works while postgres/pgbak is running
After=nss-lookup.target

[Service]
Environment=PGBAK=/media/backup/pgsql
# Wait up to 2 minutes for pgbak sync to finish after postgres is stopped.
ExecStopPost=pgbak wait 120
```

`$PGBAK/scripts/backup`:

```sh
#!/bin/sh
# $1 is the base backup timestamp; the current directory is the one to upload
exec rsync -r ./ "backup@host.example:pg-$1/"
```

To make your first backup, simply restart postgres and call `pg_switch_wal()`.

## Misc

By default `pgbak` uses [Zstandard](http://www.zstd.net/)'s command-line tool for compression. The compression command and options may be tweaked in `config.h`.

Currently only Linux is supported because of `O_TMPFILE` and `F_OFD_SETLK`.
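
For readers writing their own `$PGBAK/scripts/backup`, the three-argument convention described in the section on writing that script can be sketched as a small skeleton. The function name `backup_main`, the usage message, and the printed text are assumptions made for illustration; only the meaning of the three arguments is taken from this README.

```sh
# Hypothetical skeleton for $PGBAK/scripts/backup. It only interprets the
# arguments; the actual upload step is left as a comment.
backup_main() {
    base_ts="${1:-}"      # timestamp of the base backup
    current_ts="${2:-}"   # current timestamp, refreshed when new WAL files appear
    flag="${3:-}"         # "old" on the last run before a newer snapshot

    if [ -z "$base_ts" ] || [ -z "$current_ts" ]; then
        echo "usage: backup BASE_TS CURRENT_TS [old]" >&2
        return 2
    fi

    if [ "$flag" = old ]; then
        echo "final upload of snapshot $base_ts (as of $current_ts)"
    else
        echo "incremental upload of snapshot $base_ts (as of $current_ts)"
    fi
    # A real script would now copy the current directory to remote storage,
    # for example with the rsync command from the example configuration.
}
```

A complete script would end with `backup_main "$@"`. Because `pgbak` retries the script on failure, a non-zero return from the upload step is enough to request another attempt.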