From 8c89da4825322660a912b717ba83326151e0866e Mon Sep 17 00:00:00 2001
From: Hristo Venev
Date: Fri, 23 Jul 2021 16:51:04 +0300
Subject: Initial commit

---
 README.md | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 README.md

(limited to 'README.md')

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..3040ca0
--- /dev/null
+++ b/README.md
@@ -0,0 +1,93 @@
+# pgbak
+
+## Overview
+
+`pgbak` is a utility for PostgreSQL backups based on WAL archiving. Data is stored in a local directory provided through the `PGBAK` environment variable.
+
+Note that `pgbak` itself does not back up the data to any remote location. Instead the user must provide a script at `$PGBAK/scripts/backup` which is invoked when a backup needs to be performed. If no remote backups are desired, it can simply be a symlink to `/bin/true`.
+
+## Backup directory structure
+
+- `$PGBAK/TIMESTAMP/` — One or more directories containing base backups and WAL segments. The `TIMESTAMP` is when the base backup was taken.
+  - `$PGBAK/TIMESTAMP/base.tzst` — A compressed tarball containing a full database backup as created by `pg_basebackup`.
+  - `$PGBAK/TIMESTAMP/pg_wal/NNN.zst` — One or more compressed WAL segments following the full backup.
+- `$PGBAK/current` — Symlink to the latest backup directory.
+- `$PGBAK/scripts/` — Directory for user-provided scripts.
+  - `$PGBAK/scripts/backup` — script called after one or more WAL files were archived; required
+- `$PGBAK/pgbak.lock` — Lock/state file. Locking is done through `fcntl()`, so deleting this file should never be necessary.
+
+Currently `pgbak` never deletes old backup directories.
+
+## Usage
+
+The following subcommands are provided:
+
+- `pgbak wal PATH` — Archive the given WAL file and exit. A background sync process will be started if one isn't running.
+- `pgbak sync` — Run sync in the foreground if necessary.
+- `pgbak force-sync` — Run sync in the foreground. The `backup` script is always invoked.
+- `pgbak wait [TIMEOUT]` — If a sync is running, wait for it to finish. The `TIMEOUT` is in seconds and defaults to infinity. Exits with non-zero status on error or timeout.
+
+The sync process is responsible for maintaining the `$PGBAK` directory. In particular, it
+
+- creates a new full backup every now and again
+- calls `$PGBAK/scripts/backup`
+
+### Writing `$PGBAK/scripts/backup`
+
+When the `backup` script is started, the current directory is set to the subdirectory of `$PGBAK` that needs to be backed up. The script is given the following arguments:
+
+- the timestamp of the base backup
+- the current timestamp
+
+Existing files will never disappear/change, provided the base backup timestamp is the same. However, new compressed WAL files may appear at any time. If this happens while the `backup` script is running, it will be called again with a refreshed current timestamp.
+
+On failure, the `backup` script will be retried indefinitely. The timestamps will be the same if no new WAL files have appeared.
+
+### Restoring from a backup
+
+In order to restore from a backup, simply extract `base.tzst` to an empty `$PGDATA` directory and uncompress all WAL files into `$PGDATA/pg_wal`.
+
+Recovery can take quite a while if many WAL files need to be replayed. It may be beneficial to do it without `fsync` and instead `sync` later:
+
+```sh
+mkdir -m700 data
+tar -C data -xf backup/base.tzst
+zstd -d --output-dir-flat data/pg_wal backup/pg_wal/*
+postgres --single -FD data < /dev/null
+sync -f data
+```
+
+## Example configuration
+
+`postgresql.conf`:
+```ini
+archive_mode = on
+archive_command = 'pgbak wal %p'
+archive_timeout = 1h
+```
+
+`postgresql.service`:
+```ini
+[Unit]
+# Make sure DNS works while postgres/pgbak is running
+After=nss-lookup.target
+
+[Service]
+Environment=PGBAK=/media/backup/pgsql
+# Wait up to 2 minutes for pgbak sync to finish after postgres is stopped.
+ExecStopPost=pgbak wait 120
+```
+
+`$PGBAK/scripts/backup`:
+```sh
+#!/bin/sh
+exec rsync -r ./ "backup@host.example:pg-$1/"
+```
+
+To make your first backup, simply restart postgres and call `pg_switch_wal()`.
+
+## Misc
+
+`pgbak` uses [Zstandard](http://www.zstd.net/)'s command-line tool for compression. The compression options may be tweaked in `config.h`.
+
+Currently only Linux is supported because of `O_TMPFILE` and `F_OFD_SETLK`.
-- 
cgit
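
The README's contract for `$PGBAK/scripts/backup` (started inside the backup directory, given the base-backup timestamp as `$1`, retried on failure, and called again when new WAL files appear) suggests that the script should be safe to run repeatedly. The following is a minimal sketch of such a script that mirrors to a local directory instead of using `rsync`. It is not part of this commit: the function name `mirror_backup`, the `MIRROR_ROOT` variable and its default path are all assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical local-mirror variant of $PGBAK/scripts/backup (not part of pgbak).
# Per the README, the script starts inside the backup directory and receives:
#   $1 = timestamp of the base backup
#   $2 = current timestamp (unused here)
# MIRROR_ROOT is an assumed variable name; pgbak itself does not define it.

mirror_backup() {
    base_ts=$1
    dest="$2/pg-$base_ts"

    mkdir -p "$dest/pg_wal" || return 1

    # Existing files never change for a given base timestamp,
    # so the base backup only needs to be copied once.
    if [ -f base.tzst ] && [ ! -f "$dest/base.tzst" ]; then
        cp base.tzst "$dest/base.tzst.tmp" &&
            mv "$dest/base.tzst.tmp" "$dest/base.tzst" || return 1
    fi

    # Copy only the WAL segments that are not in the mirror yet.
    for wal in pg_wal/*.zst; do
        [ -f "$wal" ] || continue              # glob matched nothing
        name=${wal#pg_wal/}
        [ -f "$dest/pg_wal/$name" ] && continue
        # Copy under a temporary name, then rename, so that an
        # interrupted run never leaves a truncated segment in place.
        cp "$wal" "$dest/pg_wal/$name.tmp" &&
            mv "$dest/pg_wal/$name.tmp" "$dest/pg_wal/$name" || return 1
    done
    return 0
}

# Run only when invoked with arguments, as pgbak would invoke it.
if [ "$#" -ge 1 ]; then
    mirror_backup "$1" "${MIRROR_ROOT:-/var/backups/pgsql-mirror}"
fi
```

Because files that are already mirrored are skipped, a retry after a failure or a repeat call for newly archived WAL files copies only what is missing.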