diff --git a/README.md b/README.md
new file mode 100644
index 0000000..3040ca0
--- /dev/null
+++ b/README.md
@@ -0,0 +1,93 @@
+# pgbak
+
+## Overview
+
+`pgbak` is a utility for PostgreSQL backups based on WAL archiving. Data is stored in a local directory provided through the `PGBAK` environment variable.
+
+Note that `pgbak` itself does not back up the data to any remote location. Instead, the user must provide a script at `$PGBAK/scripts/backup`, which is invoked whenever a backup needs to be performed. If no remote backups are desired, the script can simply be a symlink to `/bin/true`.
+
+## Backup directory structure
+
+- `$PGBAK/TIMESTAMP/` — One or more directories containing base backups and WAL segments. The `TIMESTAMP` is when the base backup was taken.
+  - `$PGBAK/TIMESTAMP/base.tzst` — A compressed tarball containing a full database backup as created by `pg_basebackup`.
+  - `$PGBAK/TIMESTAMP/pg_wal/NNN.zst` — One or more compressed WAL segments following the full backup.
+- `$PGBAK/current` — Symlink to the latest backup directory.
+- `$PGBAK/scripts/` — Directory for user-provided scripts.
+  - `$PGBAK/scripts/backup` — Script called after one or more WAL files have been archived. Required.
+- `$PGBAK/pgbak.lock` — Lock/state file. Locking is done through `fcntl()`, so deleting this file should never be necessary.
+
+Currently `pgbak` never deletes old backup directories.
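+
+Since cleanup is left to the user, it can help to see which directories are no longer the newest. The following is a minimal sketch and not part of `pgbak`: `list_old_backups` is a hypothetical helper that prints every directory containing a `base.tzst` other than the one `current` points to. It only lists and never deletes anything; whether an old directory can be removed depends on your own retention needs.
+
+```sh
+# list_old_backups DIR  (hypothetical helper, not provided by pgbak)
+# Prints each subdirectory of DIR that holds a base.tzst and is not the
+# target of DIR/current. Nothing is modified or deleted.
+list_old_backups() {
+    for d in "$1"/*/; do
+        d=${d%/}
+        [ -L "$d" ] && continue                  # skip the "current" symlink itself
+        [ -f "$d/base.tzst" ] || continue        # skip scripts/ and unrelated entries
+        [ "$d" -ef "$1/current" ] && continue    # skip the latest backup directory
+        printf '%s\n' "$d"
+    done
+}
+```
+
+For example, `list_old_backups "$PGBAK"` prints one path per line.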
+
+## Usage
+
+The following subcommands are provided:
+
+- `pgbak wal PATH` — Archive the given WAL file and exit. A background sync process will be started if one isn't running.
+- `pgbak sync` — Run sync in the foreground if necessary.
+- `pgbak force-sync` — Run sync in the foreground. The `backup` script is always invoked.
+- `pgbak wait [TIMEOUT]` — If a sync is running, wait for it to finish. The `TIMEOUT` is in seconds and defaults to infinity. Exits with non-zero status on error or timeout.
+
+The sync process is responsible for maintaining the `$PGBAK` directory. In particular, it:
+
+- creates a new full backup every now and again
+- calls `$PGBAK/scripts/backup`
+
+### Writing `$PGBAK/scripts/backup`
+
+When the `backup` script is started, the current directory is set to the subdirectory of `$PGBAK` that needs to be backed up. The script is given the following arguments:
+
+- the timestamp of the base backup
+- the current timestamp
+
+Existing files never disappear or change as long as the base backup timestamp stays the same. However, new compressed WAL files may appear at any time. If this happens while the `backup` script is running, the script is called again with a refreshed current timestamp.
+
+If the `backup` script fails, it is retried indefinitely. Both timestamps stay the same across retries unless new WAL files have appeared in the meantime.
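+
+The calling convention above can be sketched as a shell function. This is an illustration under assumptions, not something `pgbak` ships: the helper name `mirror_backup`, the destination layout `DEST/BASE_TS` and the marker file `synced-at` are all made up here, and the sketch assumes that a non-zero exit status is what `pgbak` treats as a failure.
+
+```sh
+# mirror_backup DEST BASE_TS NOW_TS  (hypothetical helper)
+# Expects to run in the backup subdirectory, as the backup script does.
+# Copies everything to DEST/BASE_TS and writes NOW_TS to a marker file
+# last, so the marker is only updated after a complete copy.
+mirror_backup() {
+    dest=$1/$2
+    mkdir -p "$dest" || return 1
+    # Copy the directory contents, including base.tzst and pg_wal/.
+    cp -R ./. "$dest/" || return 1
+    # Record the current timestamp that was passed in.
+    printf '%s\n' "$3" > "$dest/synced-at"
+}
+```
+
+A `$PGBAK/scripts/backup` built on this would define the function and then run `mirror_backup /mnt/mirror "$1" "$2"`, where `/mnt/mirror` is a placeholder path. Copying everything again on each call is wasteful but safe, since existing files do not change for a given base backup timestamp.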
+
+### Restoring from a backup
+
+To restore from a backup, extract `base.tzst` into an empty `$PGDATA` directory and decompress all WAL files into `$PGDATA/pg_wal`.
+
+Recovery can take quite a while if many WAL files need to be replayed. It may be faster to run it with `fsync` disabled and flush everything to disk with `sync` afterwards:
+
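+```sh
+mkdir -m700 data
+# Unpack the base backup (GNU tar detects the zstd compression itself)
+tar -C data -xf backup/base.tzst
+# Decompress all archived WAL segments into pg_wal
+zstd -d --output-dir-flat data/pg_wal backup/pg_wal/*
+# Start in single-user mode with fsync disabled (-F) so the WAL is replayed
+postgres --single -FD data < /dev/null
+# Flush the filesystem containing the data directory to disk
+sync -f data
+```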
+
+## Example configuration
+
+`postgresql.conf`:
+```ini
+archive_mode = on
+archive_command = 'pgbak wal %p'
+archive_timeout = 1h
+```
+
+`postgresql.service`:
+```ini
+[Unit]
+# Make sure DNS works while postgres/pgbak is running
+After=nss-lookup.target
+
+[Service]
+Environment=PGBAK=/media/backup/pgsql
+# Wait up to 2 minutes for pgbak sync to finish after postgres is stopped.
+ExecStopPost=pgbak wait 120
+```
+
+`$PGBAK/scripts/backup`:
+```sh
+#!/bin/sh
+exec rsync -r ./ "backup@host.example:pg-$1/"
+```
+
+To make the first backup, restart PostgreSQL and call `pg_switch_wal()`, for example by running `SELECT pg_switch_wal();` in `psql`.
+
+## Misc
+
+`pgbak` uses [Zstandard](http://www.zstd.net/)'s command-line tool for compression. The compression options may be tweaked in `config.h`.
+
+Currently only Linux is supported, because `pgbak` relies on `O_TMPFILE` and `F_OFD_SETLK`.