pgbak
Overview
pgbak is a robust WAL archiver for PostgreSQL. It automates compression of archived WAL segments as well as the creation of full snapshots.
Data is stored in a local directory specified by the PGBAK environment variable. A user-provided script may copy it to a remote location if desired.
Some tasks are performed by a "sync" process that is automatically started in the background when necessary. These include taking full snapshots and running the backup script.
Backup directory structure
- $PGBAK/TIMESTAMP/: one or more directories containing base backups and WAL segments. TIMESTAMP is when the base backup was taken.
- $PGBAK/TIMESTAMP/base.tzst: a compressed tarball containing a full database backup as created by pg_basebackup.
- $PGBAK/TIMESTAMP/pg_wal/NNN.zst: one or more compressed WAL segments following the full backup.
- $PGBAK/current: symlink to the latest backup directory.
- $PGBAK/scripts/: directory for user-provided scripts.
- $PGBAK/scripts/backup: script called after one or more WAL files have been archived; required.
- $PGBAK/pgbak.lock: lock/state file. Locking is done through fcntl(), so deleting this file should never be necessary.
Currently pgbak never deletes old backup directories.
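Because old directories accumulate, it can be handy to check which one is current and how much WAL it holds. The following is a minimal POSIX sh sketch that relies only on the layout listed above; the function names are hypothetical and not part of pgbak. The README does not say whether the current symlink is relative or absolute, so both are handled, and since the TIMESTAMP format is not specified, the sketch does not try to sort directories.

```sh
# Hypothetical helpers (not part of pgbak). Requires PGBAK to be set.

# Print the backup directory that $PGBAK/current points to.
pgbak_latest_dir() {
    # readlink prints the link target; it may be relative to $PGBAK
    # or absolute, which the README does not specify.
    target=$(readlink "$PGBAK/current") || return 1
    case $target in
        /*) printf '%s\n' "$target" ;;
        *)  printf '%s\n' "$PGBAK/$target" ;;
    esac
}

# Count the compressed WAL segments (pg_wal/*.zst) in a backup directory.
pgbak_wal_count() {
    n=0
    for f in "$1"/pg_wal/*.zst; do
        # An unmatched glob stays literal, so check the file exists.
        [ -e "$f" ] && n=$((n + 1))
    done
    printf '%s\n' "$n"
}
```

For example, `pgbak_wal_count "$(pgbak_latest_dir)"` prints how many WAL segments have been archived since the latest base backup.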
Usage
The following subcommands are provided:
- pgbak wal PATH: archive the given WAL file and exit. A background sync process will be started if necessary.
- pgbak sync: run sync in the foreground if a previous run was interrupted.
- pgbak force-sync: run sync in the foreground. The backup script is always invoked.
- pgbak full-sync: run sync in the foreground. Take a full snapshot and run the backup script.
- pgbak wait [TIMEOUT]: if a sync is running, wait for it to finish. TIMEOUT is in seconds and defaults to infinity. Exits with non-zero status on error or timeout.
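Since pgbak wait signals both errors and timeouts through its exit status, scripts that must not interrupt a sync can simply test that status. This is a minimal sketch; the helper name and the 300-second default are assumptions chosen for illustration.

```sh
# Hypothetical helper (not part of pgbak): wait for a running sync to finish
# and report a failure or timeout on stderr. Returns pgbak's exit status.
pgbak_wait_before() {
    limit=${1:-300}   # seconds; 300 is an arbitrary example value
    status=0
    # "|| status=$?" keeps this working in scripts that use "set -e".
    pgbak wait "$limit" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "pgbak: sync failed or did not finish within ${limit}s (status $status)" >&2
    fi
    return "$status"
}
```

For instance, assuming /media/backup is a separately mounted disk, `pgbak_wait_before 120 && umount /media/backup` only unmounts it once no sync is in progress.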
Writing $PGBAK/scripts/backup
When the backup script is started, the current directory is set to the subdirectory of $PGBAK that needs to be backed up. The script is given up to three arguments:
- the timestamp of the base backup
- the current timestamp
- an optional old flag if this is the last invocation of the backup script before switching to a newer snapshot
Existing files never disappear or change as long as the base backup timestamp stays the same. However, new compressed WAL files may appear at any time. If this happens while the backup script is running, the script will be called again with a refreshed current timestamp.
On failure, the backup script is retried indefinitely. The current timestamp stays the same if no new WAL files have appeared in the meantime.
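A backup script can make use of these guarantees. The sketch below assumes rsync and SSH access to the same hypothetical host as the example configuration; the function name backup_main and the log message are illustrative, not part of pgbak. Because files never change for a given base timestamp, files already present remotely can be skipped, and a non-zero exit status makes pgbak retry.

```sh
#!/bin/sh
# Sketch of $PGBAK/scripts/backup. The remote host and path layout are
# hypothetical; adjust them to your setup.
# $1 = base backup timestamp, $2 = current timestamp, $3 = optional "old".

backup_main() {
    base_ts=$1
    now_ts=$2
    flag=${3:-}
    dest="backup@host.example:pg-$base_ts/"

    # Files never change once written (for the same base timestamp), so
    # anything already on the remote side is skipped. By default rsync
    # writes to a temporary name and renames on success, so an interrupted
    # run does not leave truncated files under their final names.
    rsync -r --ignore-existing ./ "$dest" || return 1

    if [ "$flag" = old ]; then
        # Last call before pgbak switches to a newer snapshot: once this
        # upload has succeeded, the remote copy of the snapshot is complete.
        echo "backup: snapshot $base_ts complete as of $now_ts" >&2
    fi
}

# pgbak passes at least two arguments; the script's exit status is the
# status of backup_main, so a failed upload is retried.
if [ "$#" -ge 2 ]; then
    backup_main "$@"
fi
```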
Restoring from a backup
To restore from a backup, simply extract base.tzst to an empty $PGDATA directory and uncompress all WAL files into $PGDATA/pg_wal.
Recovery can take quite a while if many WAL files need to be replayed. It may be beneficial to run it with fsync disabled and sync the data directory afterwards:
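# Assumes ./backup is a copy of one $PGBAK/TIMESTAMP directory.
mkdir -m700 data
tar -C data -xf backup/base.tzst
# Decompress every WAL segment into data/pg_wal (NNN.zst becomes NNN).
mkdir -p data/pg_wal
zstd -d --output-dir-flat data/pg_wal backup/pg_wal/*
# Replay the WAL in single-user mode with fsync disabled (-F), then flush once.
postgres --single -F -D data < /dev/null
sync -f data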
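Extracting over an existing cluster would mix files from two databases, so a restore script may want to check its inputs first. A minimal sketch, assuming the same backup and data paths as the commands above; both helper names are hypothetical.

```sh
# Hypothetical helpers (not part of pgbak).

# Succeed if DIR does not exist yet, or exists and is empty, i.e. it is
# safe to use as the restore target ($PGDATA).
restore_target_ok() {
    [ ! -e "$1" ] && return 0
    [ -d "$1" ] || return 1
    # ls -A lists everything except . and .., including dotfiles.
    [ -z "$(ls -A "$1")" ]
}

# Succeed if SRC looks like a pgbak backup directory (base.tzst present).
restore_source_ok() {
    [ -f "$1/base.tzst" ]
}
```

Usage: `restore_source_ok backup && restore_target_ok data || { echo "refusing to restore" >&2; exit 1; }`. Note that `mkdir -m700 data` above fails if data already exists; when restoring into an existing empty directory, skip that step and make sure the directory has mode 0700.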
Example configuration
postgresql.conf:
archive_mode = on
archive_command = 'pgbak wal %p'
archive_timeout = 1h
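With this configuration PostgreSQL runs pgbak wal for each completed WAL segment and keeps retrying while it fails, retaining unarchived segments in pg_wal in the meantime. A minimal monitoring sketch using the pg_stat_archiver view, assuming psql can connect with its default settings (PG* environment variables); the function name is hypothetical.

```sh
# Hypothetical check (not part of pgbak): returns 0 if the most recent
# archive_command ("pgbak wal") attempt failed, 1 if archiving looks
# healthy, and 2 if psql itself failed.
archiver_failing() {
    failing=$(psql -XAt -c "SELECT last_failed_time IS NOT NULL
        AND (last_archived_time IS NULL OR last_failed_time > last_archived_time)
        FROM pg_stat_archiver") || return 2
    [ "$failing" = t ]
}
```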
postgresql.service:
[Unit]
# Make sure DNS works while postgres/pgbak is running
After=nss-lookup.target
[Service]
Environment=PGBAK=/media/backup/pgsql
# Wait up to 2 minutes for pgbak sync to finish after postgres is stopped.
ExecStopPost=pgbak wait 120
$PGBAK/scripts/backup:
#!/bin/sh
exec rsync -r ./ "backup@host.example:pg-$1/"
To make your first backup, restart postgres so that the new settings take effect, then call pg_switch_wal() to force the current WAL segment to be archived.
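Scripting this step takes a little care because archiving is asynchronous, and pg_switch_wal() does nothing if no WAL has been written since the last switch. The sketch below polls pg_stat_archiver until one more segment has been archived, then waits for the sync that pgbak wal started in the background. It assumes psql connects with default settings as a role allowed to call pg_switch_wal() (superuser by default); pg_query and pgbak_first_backup are hypothetical names.

```sh
# Hypothetical helpers (not part of pgbak). psql connection settings are
# taken from the usual PG* environment variables / defaults.
pg_query() {
    psql -XAt -c "$1"
}

pgbak_first_backup() {
    before=$(pg_query 'SELECT archived_count FROM pg_stat_archiver') || return 1
    # Close the current WAL segment so that PostgreSQL hands it to
    # archive_command ("pgbak wal"), which starts the first sync.
    pg_query 'SELECT pg_switch_wal()' >/dev/null || return 1

    # Archiving is asynchronous: poll until at least one more segment has
    # been archived successfully, giving up after about 60 seconds.
    tries=0
    while :; do
        now=$(pg_query 'SELECT archived_count FROM pg_stat_archiver') || return 1
        [ "$now" -gt "$before" ] && break
        tries=$((tries + 1))
        if [ "$tries" -ge 60 ]; then
            echo "no WAL segment was archived; check the PostgreSQL log" >&2
            return 1
        fi
        sleep 1
    done

    # Wait for the background sync; optional $1 is a timeout in seconds.
    if [ -n "${1:-}" ]; then
        pgbak wait "$1"
    else
        pgbak wait
    fi
}
```

If another pending segment happens to be archived first, the loop may stop slightly early; pgbak wait then still waits for whichever sync is running.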
Misc
By default pgbak uses Zstandard's command-line tool for compression. The compression command and options may be tweaked in config.h.
Currently only Linux is supported because pgbak relies on O_TMPFILE and F_OFD_SETLK.