pgbak

Overview

pgbak is a robust WAL archiver for PostgreSQL. It automates compression as well as the creation of full snapshots.

Data is stored in a local directory provided through the PGBAK environment variable. A user-provided script may back it up to a remote location if desired.

Some tasks are performed by a "sync" process that is automatically started in the background when necessary. This includes taking full snapshots and running the backup script.

Note that pgbak is not a full point-in-time recovery archiver. In some cases, for example during snapshots, some WAL files may be missed.

Backup directory structure

  • $PGBAK/TIMESTAMP/ — One or more directories containing base backups and WAL segments. The TIMESTAMP is when the base backup was taken.
    • $PGBAK/TIMESTAMP/base.tzst — A compressed tarball containing a full database backup as created by pg_basebackup.
    • $PGBAK/TIMESTAMP/pg_wal/NNN.zst — One or more compressed WAL segments following the full backup.
  • $PGBAK/current — Symlink to the latest backup directory.
  • $PGBAK/scripts/ — Directory for user-provided scripts.
    • $PGBAK/scripts/backup — Script called after one or more WAL files have been archived. This script is required.
  • $PGBAK/pgbak.lock — Lock/state file. Locking is done through fcntl(), so deleting this file should never be necessary.

Currently pgbak never deletes old backup directories.
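
Because pruning is left to the operator, it can help to list which directories would be candidates. The following is a minimal sketch, not part of pgbak: it assumes every backup directory is a direct child of $PGBAK containing base.tzst and that the directory "current" points to must be kept. The function name list_old_backups is hypothetical, and nothing is deleted.

```shell
#!/bin/sh
# Hypothetical helper, not part of pgbak: print every backup directory
# under $PGBAK except the one the "current" symlink resolves to.
list_old_backups() {
    root=${1:-$PGBAK}
    # Resolve the symlink target to a physical path for comparison.
    keep=$(cd "$root/current" 2>/dev/null && pwd -P)
    for dir in "$root"/*/; do
        # Only directories holding a base backup count as backups;
        # this skips scripts/ and an unmatched glob.
        [ -f "${dir}base.tzst" ] || continue
        abs=$(cd "$dir" && pwd -P)
        [ "$abs" = "$keep" ] && continue
        printf '%s\n' "$abs"
    done
}
```

Review the output by hand before removing anything: a directory is only safe to delete once a newer base backup and its WAL segments have been copied off-site and verified.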

Usage

The following subcommands are provided:

  • pgbak wal PATH — Archive the given WAL file and exit. A background sync process will be started if necessary.
  • pgbak sync — Run sync in the foreground if a previous run was interrupted.
  • pgbak force-sync — Run sync in the foreground. The backup script is always invoked.
  • pgbak full-sync — Run sync in the foreground. Take a full snapshot and run the backup script.
  • pgbak wait [TIMEOUT] — If a sync is running, wait for it to finish. The TIMEOUT is in seconds and defaults to infinity. Exits with non-zero status on error or timeout.
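
A shutdown or deployment script may want to act on the exit status of pgbak wait. The sketch below wraps it in a hypothetical function, wait_for_sync; the 300-second default and the messages are arbitrary example choices rather than anything pgbak prescribes.

```shell
#!/bin/sh
# Block until a running sync finishes, giving up after a timeout.
# Per the usage above, "pgbak wait" exits non-zero on error or timeout.
wait_for_sync() {
    timeout=${1:-300}   # seconds; 300 is an arbitrary example value
    if pgbak wait "$timeout"; then
        echo "pgbak: nothing left to wait for"
    else
        status=$?       # exit status of the failed "pgbak wait"
        echo "pgbak: wait failed or timed out (exit $status)" >&2
        return "$status"
    fi
}
```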

Writing $PGBAK/scripts/backup

When the backup script is started, the current directory is set to the subdirectory of $PGBAK that needs to be backed up. The script is given the following arguments:

  1. the timestamp of the base backup
  2. the current timestamp

Existing files will never disappear or change, provided the base backup timestamp is the same. However, new compressed WAL files may appear at any time. If this happens while the backup script is running, it will be called again with a refreshed current timestamp.

On failure, the backup script will be retried indefinitely. The "current timestamp" will be the same if no new WAL files have appeared.
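
Since existing files never change, a backup script only has to transfer files that the destination does not have yet. The following sketch illustrates this for a locally mounted destination. DEST_ROOT and backup_to_dir are hypothetical names, and it assumes that "failure" means a non-zero exit status.

```shell
#!/bin/sh
# Sketch of $PGBAK/scripts/backup for a locally mounted destination.
# DEST_ROOT is a hypothetical variable, not something pgbak defines.
backup_to_dir() {
    base_ts=$1                          # argument 1: base backup timestamp
    dest=${DEST_ROOT:?}/pg-$base_ts
    mkdir -p "$dest/pg_wal" || return 1
    # The working directory is the backup directory, so paths are relative.
    for f in base.tzst pg_wal/*.zst; do
        [ -f "$f" ] || continue         # glob matched nothing
        [ -f "$dest/$f" ] && continue   # already copied; files never change
        # Copy under a temporary name, then rename, so an interrupted run
        # never leaves a truncated file that a later retry would skip.
        cp "$f" "$dest/$f.part" && mv "$dest/$f.part" "$dest/$f" || return 1
    done
}
# In an actual script, finish with: backup_to_dir "$1"
```

The second argument (the current timestamp) is not needed here because the copy is idempotent; a script that keeps its own state could use it to detect that nothing new has arrived since the last successful run.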

Restoring from a backup

To restore from a backup, extract base.tzst into an empty $PGDATA directory and decompress all WAL files into $PGDATA/pg_wal.

Recovery can take quite a while if many WAL files need to be replayed. It may be beneficial to do it without fsync and instead sync later:

mkdir -m700 data
tar -C data -xf backup/base.tzst
zstd -d --output-dir-flat data/pg_wal backup/pg_wal/*
postgres --single -FD data < /dev/null
sync -f data

Example configuration

postgresql.conf:

archive_mode = on
archive_command = 'pgbak wal %p'
archive_timeout = 1h

postgresql.service:

[Unit]
# Make sure DNS works while postgres/pgbak is running
After=nss-lookup.target

[Service]
Environment=PGBAK=/media/backup/pgsql
# Wait up to 2 minutes for pgbak sync to finish after postgres is stopped.
ExecStopPost=pgbak wait 120

$PGBAK/scripts/backup:

#!/bin/sh
exec rsync -r ./ "backup@host.example:pg-$1/"

To make your first backup, simply restart postgres and call pg_switch_wal().
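
The first backup can be triggered and checked from a shell roughly as follows. This is a sketch under several assumptions: psql connects as a role allowed to call pg_switch_wal(), PGBAK is set in the calling environment, and the presence of base.tzst under $PGBAK/current is taken as the sign that the snapshot exists. The function name, the polling interval, and the timeouts are illustrative. Note that pg_switch_wal() does nothing if no WAL has been written since the last switch.

```shell
#!/bin/sh
# Force a WAL switch, then poll until a base backup shows up.
first_backup_check() {
    psql -XAtc 'SELECT pg_switch_wal()' >/dev/null || return 1
    tries=${1:-60}                  # number of one-second polls
    while [ "$tries" -gt 0 ]; do
        if [ -f "$PGBAK/current/base.tzst" ]; then
            # Let a still-running sync (backup script) finish.
            pgbak wait 120
            return
        fi
        sleep 1
        tries=$((tries - 1))
    done
    echo "no base backup under $PGBAK/current yet" >&2
    return 1
}
```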

Misc

By default pgbak uses Zstandard's command-line tool for compression. The compression command and options may be tweaked in config.h.

Currently only Linux is supported because of O_TMPFILE and F_OFD_SETLK.