
pgbak

Overview

pgbak is a utility for PostgreSQL backups based on WAL archiving. Backups are stored in a local directory specified by the PGBAK environment variable.

Note that pgbak itself does not copy the data to any remote location. Instead, the user must provide a script at $PGBAK/scripts/backup, which is invoked whenever a backup needs to be performed. If no remote backups are desired, this script can simply be a symlink to /bin/true.
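For local-only backups, setup amounts to creating the scripts directory and the symlink. The helper below is a minimal sketch of that. The function name pgbak_init_local is hypothetical and is not part of pgbak.

```shell
# Hypothetical helper (not part of pgbak): prepare a PGBAK directory for
# local-only backups by making the required backup script a no-op.
pgbak_init_local() {
    mkdir -p "$1/scripts" || return 1
    # -f replaces an existing link; -n avoids descending into a linked directory
    ln -sfn /bin/true "$1/scripts/backup"
}
```

Typical use would be `pgbak_init_local "$PGBAK"`.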

Backup directory structure

  • $PGBAK/TIMESTAMP/ — One or more directories, each containing a base backup and the WAL segments that follow it. TIMESTAMP is the time at which the base backup was taken.
    • $PGBAK/TIMESTAMP/base.tzst — A compressed tarball containing a full database backup as created by pg_basebackup.
    • $PGBAK/TIMESTAMP/pg_wal/NNN.zst — One or more compressed WAL segments following the full backup.
  • $PGBAK/current — Symlink to the latest backup directory.
  • $PGBAK/scripts/ — Directory for user-provided scripts.
    • $PGBAK/scripts/backup — Script called after one or more WAL files have been archived (required).
  • $PGBAK/pgbak.lock — Lock/state file. Locking is done through fcntl(), so deleting this file should never be necessary.
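The layout above is enough to inspect a backup with ordinary shell tools. The sketch below relies only on that documented structure. It follows $PGBAK/current and counts the compressed WAL segments in the directory it points to. The name pgbak_status and its output format are hypothetical.

```shell
# Hypothetical helper (not part of pgbak): report which backup directory
# $PGBAK/current points to and how many compressed WAL segments it holds.
pgbak_status() {
    dir=$(readlink -f "$1/current") && [ -d "$dir" ] || return 1
    count=0
    for f in "$dir"/pg_wal/*.zst; do
        # An unmatched glob stays literal, so only count files that exist
        [ -e "$f" ] && count=$((count + 1))
    done
    printf 'base backup %s, %d WAL segment(s)\n' "${dir##*/}" "$count"
}
```

For example, `pgbak_status "$PGBAK"` prints the TIMESTAMP of the latest backup directory and its segment count. It returns non-zero if current is missing or dangling.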

Currently pgbak never deletes old backup directories.

Usage

The following subcommands are provided:

  • pgbak wal PATH — Archive the given WAL file and exit. A background sync process will be started if one isn't running.
  • pgbak sync — Run sync in the foreground if necessary.
  • pgbak force-sync — Run sync in the foreground. The backup script is always invoked.
  • pgbak wait [TIMEOUT] — If a sync is running, wait for it to finish. The TIMEOUT is in seconds and defaults to infinity. Exits with non-zero status on error or timeout.
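Because `pgbak wait` reports failure and timeout through its exit status, other scripts can gate on it, for instance before detaching the backup disk. The wrapper below is a minimal sketch under these assumptions: its name and its 300-second default are hypothetical, and PGBAK is assumed to be set in the environment.

```shell
# Hypothetical wrapper (not part of pgbak): wait for a running sync and
# report failure. PGBAK must already be set in the environment.
pgbak_wait_or_warn() {
    secs=${1:-300}    # assumed default; plain "pgbak wait" waits indefinitely
    if ! pgbak wait "$secs"; then
        echo "pgbak: sync failed or did not finish within ${secs}s" >&2
        return 1
    fi
}
```

For example, `pgbak_wait_or_warn 120 && umount /media/backup` only unmounts the disk once the sync has finished successfully.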

The sync process is responsible for maintaining the $PGBAK directory. In particular, it

  • periodically creates a new full backup
  • calls $PGBAK/scripts/backup

Writing $PGBAK/scripts/backup

When the backup script is started, the current directory is set to the subdirectory of $PGBAK that needs to be backed up. The script is given the following arguments:

  • $1: the timestamp of the base backup
  • $2: the current timestamp

As long as the base backup timestamp stays the same, existing files will never disappear or change. However, new compressed WAL files may appear at any time. If this happens while the backup script is running, the script will be called again with an updated current timestamp.

If the backup script fails, it will be retried indefinitely. A retry receives the same timestamps unless new WAL files have appeared in the meantime.
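These guarantees allow a simple incremental copy. Files already present at the destination can be skipped, and any error can be reported as a non-zero exit status so that pgbak retries. The sketch below mirrors the backup directory into a locally mounted destination such as a network share. MIRROR and mirror_backup are hypothetical names, not part of pgbak. It also assumes that a non-zero exit status counts as failure.

```shell
# Hypothetical backup script body: mirror the backup directory into a local
# mount. MIRROR is an assumed setting, not something pgbak provides.
# pgbak runs the script inside the backup directory with
# $1 = base backup timestamp and $2 = current timestamp.
mirror_backup() {
    dest="${MIRROR:?MIRROR is not set}/pg-$1"
    mkdir -p "$dest/pg_wal" || return 1
    for f in base.tzst pg_wal/*.zst; do
        [ -e "$f" ] || continue          # the glob matched nothing
        [ -e "$dest/$f" ] && continue    # existing files never change
        # Copy under a temporary name first, so an interrupted copy is never
        # mistaken for a complete file on the next (retried) run.
        cp "$f" "$dest/$f.tmp" && mv "$dest/$f.tmp" "$dest/$f" || return 1
    done
}
```

To use it as $PGBAK/scripts/backup, the file would start with `#!/bin/sh`, set MIRROR, define the function and end with `mirror_backup "$@"`. Keying the destination on $1 matters here: it means "already copied" is only ever checked against files from the same base backup.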

Restoring from a backup

To restore from a backup, extract base.tzst into an empty $PGDATA directory and decompress all WAL files into $PGDATA/pg_wal.

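Recovery can take a long time if many WAL files need to be replayed. It may be faster to replay them with fsync disabled and flush everything to disk once at the end. Here, backup is a copy of a single backup directory such as $PGBAK/current: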

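# Restore into a fresh, empty data directory
mkdir -m700 data
tar -C data -xf backup/base.tzst
# Decompress every WAL segment into data/pg_wal
zstd -d --output-dir-flat data/pg_wal backup/pg_wal/*
# Replay WAL with fsync disabled (-F); stdin is empty, so postgres exits after recovery
postgres --single -FD data < /dev/null
# Flush the file system containing data to disk once at the end
sync -f data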

Example configuration

postgresql.conf:

archive_mode = on
archive_command = 'pgbak wal %p'
archive_timeout = 1h

postgresql.service:

[Unit]
# Make sure DNS works while postgres/pgbak is running
After=nss-lookup.target

[Service]
Environment=PGBAK=/media/backup/pgsql
# Wait up to 2 minutes for pgbak sync to finish after postgres is stopped.
ExecStopPost=pgbak wait 120

$PGBAK/scripts/backup:

#!/bin/sh
exec rsync -r ./ "backup@host.example:pg-$1/"

To make your first backup, restart postgres (archive_mode only takes effect at server start) and call pg_switch_wal().
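Both steps can be scripted. The sketch below assumes the systemd unit is named postgresql, as in the example service above, and that psql can connect as the postgres role. The function name is hypothetical.

```shell
# Hypothetical helper (not part of pgbak). Assumes the systemd unit is named
# postgresql and that psql can connect as the postgres role.
pgbak_first_backup() {
    systemctl restart postgresql || return 1
    # Close the current WAL segment so the archiver hands it to "pgbak wal"
    # now, rather than when the segment fills up or archive_timeout expires.
    psql -U postgres -Atc 'SELECT pg_switch_wal();'
}
```

Archiving happens asynchronously. Whether the segment was actually archived can be checked afterwards in the pg_stat_archiver view (last_archived_wal, failed_count).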

Misc

pgbak uses the Zstandard command-line tool (zstd) for compression. The compression options can be adjusted in config.h.

Currently only Linux is supported, because pgbak relies on O_TMPFILE and F_OFD_SETLK.