# pgbak

## Overview

`pgbak` is a utility for PostgreSQL backups based on WAL archiving. Data is stored in a local directory provided through the `PGBAK` environment variable.

Note that `pgbak` itself does not copy the data to any remote location. Instead, the user must provide a script at `$PGBAK/scripts/backup`, which is invoked whenever a backup needs to be performed. If no remote backups are desired, it can simply be a symlink to `/bin/true`.
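For a setup without remote copies, the symlink takes two commands. In this minimal sketch they are wrapped in a hypothetical `setup_noop_backup` function (not part of `pgbak`), which assumes `PGBAK` is already set:

```sh
# Hypothetical helper (not part of pgbak): make the required backup script
# a no-op that always succeeds, for setups without remote copies.
setup_noop_backup() {
    mkdir -p "${PGBAK:?PGBAK is not set}/scripts" || return 1
    # -f replaces an existing script or symlink of the same name
    ln -sf /bin/true "$PGBAK/scripts/backup"
}
```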

## Backup directory structure

- `$PGBAK/TIMESTAMP/` — One or more directories, each containing one base backup and the WAL segments archived after it. `TIMESTAMP` is the time the base backup was taken.
    - `$PGBAK/TIMESTAMP/base.tzst` — A compressed tarball containing a full database backup as created by `pg_basebackup`.
    - `$PGBAK/TIMESTAMP/pg_wal/NNN.zst` — One or more compressed WAL segments following the full backup.
- `$PGBAK/current` — Symlink to the latest backup directory.
- `$PGBAK/scripts/` — Directory for user-provided scripts.
    - `$PGBAK/scripts/backup` — Script called after one or more WAL files have been archived. Required.
- `$PGBAK/pgbak.lock` — Lock/state file. Locking is done through `fcntl()`, so deleting this file should never be necessary.
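The layout above can be inspected with ordinary shell tools. The following is a sketch of a hypothetical `pgbak_status` helper (not part of `pgbak`). It reports which base backup `$PGBAK/current` points to and how many compressed WAL segments follow it. Because the README does not say whether the symlink target is relative or absolute, the helper handles both.

```sh
# Hypothetical helper (not part of pgbak): show which base backup
# $PGBAK/current points to and how many WAL segments follow it.
pgbak_status() {
    dir=$(readlink "${PGBAK:?PGBAK is not set}/current") || return 1
    # The symlink target may be relative to $PGBAK or absolute
    case $dir in
        /*) ;;
        *) dir=$PGBAK/$dir ;;
    esac
    dir=${dir%/}
    # Count compressed WAL segments; an unmatched glob stays literal
    set -- "$dir"/pg_wal/*.zst
    if [ -e "$1" ]; then n=$#; else n=0; fi
    printf 'base backup: %s\nWAL segments: %d\n' "${dir##*/}" "$n"
}
```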

Currently `pgbak` never deletes old backup directories.
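Old directories therefore have to be removed by hand. Below is a sketch of a hypothetical `pgbak_prune KEEP` helper, which is not something `pgbak` provides. It keeps the `KEEP` newest backup directories and never touches the one `$PGBAK/current` points to. It assumes that `TIMESTAMP` directory names sort chronologically as plain strings. Run it only after `pgbak wait` has returned, and only once the older directories are known to have been copied elsewhere.

```sh
# Hypothetical helper, NOT part of pgbak: delete all but the KEEP newest
# backup directories. Assumes TIMESTAMP names sort chronologically.
pgbak_prune() {
    keep=${1:?usage: pgbak_prune KEEP}
    [ "$keep" -ge 1 ] || { echo "KEEP must be at least 1" >&2; return 1; }
    base=${PGBAK:?PGBAK is not set}
    cur=$(readlink "$base/current") || return 1
    cur=${cur%/}
    cur=${cur##*/}
    # List real backup directories (skipping the "current" symlink and
    # anything without base.tzst, such as scripts/), sort newest first,
    # drop the first KEEP names and remove the rest.
    for d in "$base"/*/; do
        d=${d%/}
        [ -L "$d" ] && continue
        [ -f "$d/base.tzst" ] || continue
        printf '%s\n' "${d##*/}"
    done | sort -r | tail -n "+$((keep + 1))" | while IFS= read -r name; do
        # Never delete the directory $PGBAK/current points to
        [ "$name" = "$cur" ] && continue
        echo "removing $base/$name"
        rm -rf -- "$base/$name"
    done
}
```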

## Usage

The following subcommands are provided:

- `pgbak wal PATH` — Archive the given WAL file and exit. A background sync process will be started if one isn't running.
- `pgbak sync` — Run sync in the foreground if necessary.
- `pgbak force-sync` — Run sync in the foreground. The `backup` script is always invoked.
- `pgbak wait [TIMEOUT]` — If a sync is running, wait for it to finish. The `TIMEOUT` is in seconds and defaults to infinity. Exits with non-zero status on error or timeout.
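In scripts, the exit status of `pgbak wait` can guard steps that must not run during a sync. Here is a minimal sketch using a hypothetical `wait_for_sync` wrapper. The README only promises a non-zero status for both errors and timeouts, so the wrapper does not try to tell them apart.

```sh
# Hypothetical wrapper (not part of pgbak): give a running sync up to
# TIMEOUT seconds (default 120) and report the outcome.
wait_for_sync() {
    pgbak wait "${1:-120}"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "pgbak: no sync pending"
    else
        # Errors and timeouts both end up here
        echo "pgbak: sync failed or did not finish in time (status $status)" >&2
        return "$status"
    fi
}
```

For example, `wait_for_sync 300 && umount /media/backup` would only unmount the backup disk once no sync is running.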

The sync process is responsible for maintaining the `$PGBAK` directory. In particular, it

- creates a new full backup every now and again
- calls `$PGBAK/scripts/backup`

### Writing `$PGBAK/scripts/backup`

When the `backup` script is started, the current directory is set to the subdirectory of `$PGBAK` that needs to be backed up. The script is given the following arguments:

- the timestamp of the base backup
- the current timestamp

Existing files never disappear or change as long as the base backup timestamp stays the same. However, new compressed WAL files may appear at any time. If this happens while the `backup` script is running, it will be called again with an updated current timestamp.

On failure, the `backup` script will be retried indefinitely. The timestamps will be the same if no new WAL files have appeared.
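These guarantees mean a script only has to copy files it has not copied yet, and a retry can safely repeat the work. The following sketch is an alternative to the `rsync` example further down. It mirrors into a local directory named by `BACKUP_MIRROR`, a hypothetical variable that is not a `pgbak` setting. Each file is copied under a temporary `.part` name and then renamed, so an interrupted run never leaves a truncated file that a retry would mistake for a complete one.

```sh
# Hypothetical backup script body: mirror the directory pgbak runs us in
# ($PGBAK/TIMESTAMP) into $BACKUP_MIRROR/pg-<base timestamp>/.
mirror_backup() {
    base_ts=$1 cur_ts=$2
    dest=${BACKUP_MIRROR:?BACKUP_MIRROR is not set}/pg-$base_ts
    mkdir -p "$dest/pg_wal" || return 1
    for f in base.tzst pg_wal/*.zst; do
        [ -e "$f" ] || continue          # unmatched glob
        # Files never change once written, so copied ones can be skipped
        [ -e "$dest/$f" ] && continue
        cp "$f" "$dest/$f.part" && mv "$dest/$f.part" "$dest/$f" || return 1
    done
    # Record which invocation (current timestamp) this copy corresponds to
    printf '%s\n' "$cur_ts" > "$dest/last-sync"
}
```

To use it, `$PGBAK/scripts/backup` would start with `#!/bin/sh`, set `BACKUP_MIRROR`, include the function and end with `mirror_backup "$@"`. A non-zero exit makes `pgbak` retry.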

### Restoring from a backup

In order to restore from a backup, simply extract `base.tzst` to an empty `$PGDATA` directory and uncompress all WAL files into `$PGDATA/pg_wal`.

Recovery can take quite a while if many WAL files need to be replayed. It may be faster to replay with `fsync` disabled and run `sync` once afterwards:

```sh
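# "backup" is a copy of one $PGBAK/TIMESTAMP directory
mkdir -m700 data
tar -C data -xf backup/base.tzst
zstd -d --output-dir-flat data/pg_wal backup/pg_wal/*
# Replay the WAL in single-user mode with fsync disabled (-F)
postgres --single -FD data < /dev/null
# Flush everything to disk once replay has finished
sync -f data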
```
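Before restoring, it may be worth checking that the compressed files are intact. The following sketch uses a hypothetical `verify_backup DIR` helper (not part of `pgbak`), where `DIR` is a copy of a `$PGBAK/TIMESTAMP` directory. It relies on `zstd -t`, which decompresses each file without writing any output and fails if a file is corrupt.

```sh
# Hypothetical helper (not part of pgbak): test every compressed file in a
# backup directory before restoring from it.
verify_backup() {
    dir=${1:?usage: verify_backup DIR}
    if [ ! -f "$dir/base.tzst" ]; then
        echo "missing $dir/base.tzst" >&2
        return 1
    fi
    # Collect base.tzst plus any WAL segments (an empty pg_wal is skipped)
    set -- "$dir/base.tzst"
    for f in "$dir"/pg_wal/*.zst; do
        [ -e "$f" ] && set -- "$@" "$f"
    done
    zstd -q -t -- "$@"
}
```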

## Example configuration

`postgresql.conf`:
```ini
archive_mode = on
archive_command = 'pgbak wal %p'
archive_timeout = 1h
```

`postgresql.service`:
```ini
[Unit]
# Make sure DNS works while postgres/pgbak is running
After=nss-lookup.target

[Service]
Environment=PGBAK=/media/backup/pgsql
# Wait up to 2 minutes for pgbak sync to finish after postgres is stopped.
ExecStopPost=pgbak wait 120
```

`$PGBAK/scripts/backup`:
```sh
#!/bin/sh
exec rsync -r ./ "backup@host.example:pg-$1/"
```

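To make your first backup, restart postgres so the new archive settings take effect, then run `SELECT pg_switch_wal();` (superuser only by default) to force the current WAL segment to be archived.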
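This step can be scripted. The following is a sketch of a hypothetical `first_backup` function, run after the restart. It assumes `psql` connects through the usual libpq environment variables (`PGHOST`, `PGUSER`, ...) as a role allowed to call `pg_switch_wal()`. It polls the standard `pg_stat_archiver` view until the switched segment has been archived, then waits for the background sync. The 60-second polling limit is arbitrary, and comparing segment names as strings assumes a single timeline.

```sh
# Hypothetical helper (not part of pgbak): force a WAL switch, wait until
# archive_command (pgbak wal) has handled it, then wait for the sync.
first_backup() {
    # Close the current segment and get the name of the file to be archived
    seg=$(psql -Atc 'SELECT pg_walfile_name(pg_switch_wal())') || return 1
    # Poll pg_stat_archiver (about 60 s at most) until that segment is archived
    i=0
    while [ "$(psql -Atc "SELECT last_archived_wal >= '$seg' FROM pg_stat_archiver")" != t ]; do
        i=$((i + 1))
        [ "$i" -le 60 ] || { echo "segment $seg not archived yet" >&2; return 1; }
        sleep 1
    done
    # pgbak wal has started a background sync; give it up to 10 minutes
    pgbak wait 600
}
```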

## Misc

`pgbak` uses [Zstandard](http://www.zstd.net/)'s command-line tool for compression. The compression options may be tweaked in `config.h`.

Currently only Linux is supported, because `pgbak` relies on `O_TMPFILE` and `F_OFD_SETLK`.
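This README names three runtime requirements: Linux, the `zstd` command-line tool, and `pg_basebackup`, which creates `base.tzst`. The sketch below checks all three before archiving is enabled. `pgbak_preflight` is a hypothetical helper, not part of `pgbak`.

```sh
# Hypothetical preflight check (not part of pgbak) for the requirements
# mentioned in this README.
pgbak_preflight() {
    [ "$(uname -s)" = Linux ] || { echo "pgbak requires Linux" >&2; return 1; }
    for tool in zstd pg_basebackup; do
        command -v "$tool" >/dev/null 2>&1 || { echo "$tool not found in PATH" >&2; return 1; }
    done
}
```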