Yup. It's that good.
You might think from the picture that it might be a train wreck. But it's only a little bit.
It's not too bad, you just kinda need some back story first.
The story so far.
There are a whole lot of guides that will show you how to setup replication, and replicate, and how to setup pgpool2, and how to pool, but very little in the way of using those things. Certainly nothing that takes us deep as whatever the fuck this ends up being.
I have a docker container, and in that docker container, is postgres database, and one day, that database lost a bunch of data. It shouldn't have happened, but it did. After coming to terms with my now-broken database (ie: several days later) I decided database replication was the way - having a hot spare at all times is good, and the act of creating the hot spare also doubles as doing a backup. Kind of, in the sense that it's more of a backup than not doing any backups, but we're not here to judge how I live.
So I forge off, with little to no knowledge about how any of it works, I open a SearxNG tab and type "docker compose postgres replication"
Eventually I end up at some sites (see references) and develop some stuff. This is that stuff.
But first, the note at the end of this bad boy: https://github.com/docker-library/postgres/blob/master/16/bookworm/Dockerfile
# An additional setting that is recommended for all users regardless of this # value is the runtime "--stop-timeout" (or your orchestrator/runtime's # equivalent) for controlling how long to wait between sending the defined # STOPSIGNAL and sending SIGKILL (which is likely to cause data corruption). # # The default in most runtimes (such as Docker) is 10 seconds, and the # documentation at https://www.postgresql.org/docs/12/server-start.html notes # that even 90 seconds may not be long enough in many instances.
So I think I found why my database corrupted, and why postgres always seems to say it shut down uncleanly.
There's a fuckin' foot-gun built right into every single official docker image! If docker is taking "too long" (by default, longer than 10 seconds) to shut down, it'll send a kill -9 postgres' way, which will lead too ... you guessed it: Having a bad fuckin' day!
Good to know.
I found the reason for needing a better backup solution buried away in a comment at the end of a Dockerfile. How very Vogon.
The solution for that: increase the timeout in your docker-compose.yml:
stop_signal: SIGTERM stop_grace_period: 5m30s
As I've just alluded to, my existing backup solution was sub-optimal. It was connecting to docker, running a pg_dumpall (or pg_dump) and then shipping that via NFS to my NAS.
The hidden failure mode here is that if you've got a corrupt database, then pg_dump it, you'll get a partial dump.
That backup will then be worse than useless, because it will look like it's completed, you'll get a file, it'll get shipped - it'll just be truncated.
The final piece of the replication puzzle for me, was the realisation that setting up a replication slave is a two-part process - part 1: the initial replication, part2: ongoing replication.
And the very last gotchya that I came across: authntication with pgpool2.
By this stage I'd already committed to using scram-sha-256 auth instead of pain text / md5, because it's the new newness and if I was going to AAS this, I may as well go all in.
pgpool2 has it's own laundry list of stupid shit it wants and needs in order to function like normal software, and that was, shall we say, a lernin' kerv.
In short:
- everyone needs a username/password combo in the database, and also in pgpool2.
- for some reason at some point I had to reset everyone's postgres passwords after getting pgpool going. unsure why.
- pgpool uses a keyfile to do something, something, encryption? It let me use 'preparetobepegged!' for a bit, so I think it's just a salt.
- that keyfile needs to be in a known location, and that location passed to tools, like pg_enc.
- if it's wrong/different, no password decryption, and no pgpool client connections.
Enough bullshit already, let's go!
OK.
This is what we're setting up (in not-this order):
- PGPool-II (pgpool)
- PostgreSQL Replication master (db-primary)
- PostgreSQL Replication replca (db-replica)
- pgadmin
We don't need pgadmin, but it's nice, and makes life nice.
We're calling our project peggas. Because it's PostgreSQL As A Service, and also, like getting pegged. And Software As A Service is like getting pegged too, sooo.... make a directory, and call it peggas.
Here's mine:
lucas@BlackVelvet:~/docker/peggas$ ls -la total 3296 drwxr-xr-x 6 lucas lucas 4096 Mar 20 03:32 . drwxr-xr-x 64 lucas lucas 4096 Mar 18 16:06 .. -rw-r--r-- 1 lucas lucas 68 Mar 19 23:07 clean-shutdown.sh -rwxr-xr-x 1 lucas lucas 1295 Mar 20 03:32 create_db_user.sh -rw-r--r-- 1 lucas lucas 4982 Mar 20 02:46 docker-compose.yml -rw-r--r-- 1 lucas lucas 29470 Mar 19 22:39 my-postgres.conf drwxr-xr-x 3 lucas lucas 4096 Mar 20 02:00 pgpool drwxr-xr-x 9 lucas lucas 4096 Mar 19 18:10 postgres drwxr-xr-x 5 root root 4096 Mar 19 17:37 primary drwxr-xr-x 5 root root 4096 Mar 16 18:45 secondary -rw-r--r-- 1 lucas lucas 3300446 Mar 19 23:47 thething.sql
The important bits:
docker-compose.yml - obviously.
./postgres/ - the official docker image git repo. I'm 99% certain you can use vanilla images, I started off on the wrong foot.
./primary/ - primary db
./primary/config/00_init.sql - replication user SQL. I'm not sure it procs.
./primary/data/ - postgres data dir
./primary/waldata/ - postgres WAL dir
./secondary/ - primary db
./secondary/data/ - postgres data dir
./secondary/waldata/ - postgres WAL dir
There's other scripts and bits around, we'll burn those bridges when we get to them.
The docker-compose file:
version: '3' # large parts taken from https://medium.com/@eremeykin/how-to-setup-single-primary-postgresql-replication-with-docker-compose-98c48f233bbf x-postgres-common: &postgres-common build: ./postgres/16/bookworm/ user: postgres restart: always healthcheck: test: 'pg_isready -U pgadminuser --dbname=postgres' interval: 10s timeout: 5s retries: 5 x-postgres-common12: &postgres-common12 build: ./postgres/12/bookworm/ user: postgres restart: always healthcheck: test: 'pg_isready -U pgadminuser --dbname=postgres' interval: 10s timeout: 5s retries: 5 volumes: data-primary: driver: local driver_opts: type: nfs o: addr=10.1.1.10,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 device: 10.1.1.10:/mnt/pool/Services/peggas/data-primary wal-primary: driver: local driver_opts: type: nfs o: addr=10.1.1.10,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 device: 10.1.1.10:/mnt/pool/Services/peggas/wal-primary data-replica: driver: local driver_opts: type: nfs o: addr=10.1.1.10,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 device: 10.1.1.10:/mnt/pool/Services/peggas/data-secondary wal-replica: driver: local driver_opts: type: nfs o: addr=10.1.1.10,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 device: 10.1.1.10:/mnt/pool/Services/peggas/wal-secondary networks: peggas: enable_ipv6: true ipam: config: - subnet: 10.1.32.64/29 gateway: 10.1.32.65 - subnet: 2001:44b8:419c:b316::32:600/120 services: pgpool: build: ./pgpool depends_on: - db-primary - db-replica environment: - PGPOOL_BACKEND_NODES=0:db-primary:5432:4:primary:ALWAYS_PRIMARY,1:db-replica:5432:6:replica - PGPOOL_POSTGRES_USERNAME=pgadminuser - PGPOOL_POSTGRES_PASSWORD=postgrespassword - PGPOOL_ADMIN_USERNAME=pgadmin - PGPOOL_ADMIN_PASSWORD=mysuperdooperreallygoodadminpassword - PGPOOL_ENABLE_LOAD_BALANCING=yes - PGPOOL_ENABLE_STATEMENT_LOAD_BALANCING=yes - PGPOOL_NUM_INIT_CHILDREN=10 - PGPOOL_MAX_POOL=1 - PGPOOL_SR_CHECK_USER=peggas - PGPOOL_SR_CHECK_PASSWORD=getreadytogetpegged - PGPOOLKEYFILE=/etc/pgpool2/.pgpoolkey networks: peggas: ipv4_address: 10.1.32.69 ipv6_address: 2001:44b8:419c:b316::32:669 volumes: - ./pgpool/config/:/etc/pgpool2/ db-primary: <<: *postgres-common12 # ./postgres/ is git clone https://github.com/docker-library/postgres.git restart: always networks: peggas: ipv4_address: 10.1.32.67 ipv6_address: 2001:44b8:419c:b316::32:667 shm_size: 128mb command: | postgres -c wal_level=replica -c hot_standby=on -c max_wal_senders=10 -c max_replication_slots=10 -c hot_standby_feedback=on volumes: - ./primary/data:/var/lib/postgresql/data # - "data-primary:/var/lib/postgresql/data" - ./primary/waldata:/var/lib/postgresql/waldata # - "wal-primary:/var/lib/postgresql/waldata" - ./primary/config/00_init.sql:/docker-entrypoint-initdb.d/00_init.sql stop_signal: SIGTERM stop_grace_period: 5m30s environment: POSTGRES_USER: pgadminuser POSTGRES_PASSWORD: postgrespassword POSTGRES_REPLICATION_PASSWORD: getreadytogetpegged POSTGRES_INITDB_WALDIR: /var/lib/postgresql/waldata POSTGRES_HOST_AUTH_METHOD: "scram-sha-256\nhost replication peggas 2001:44b8:419c:b316::32:600/120 md5\nhost all all 10.1.0.0/16 md5" POSTGRES_INITDB_ARGS: "--auth-host=scram-sha-256" db-replica: <<: *postgres-common12 # build: ./postgres/12/bookworm/ shm_size: 128mb ports: - 5433:5432 environment: PGUSER: peggas PGPASSWORD: getreadytogetpegged stop_signal: SIGTERM stop_grace_period: 5m30s command: | postgres # command: | # bash -c " # until pg_basebackup --pgdata=/var/lib/postgresql/data -R --slot=replication_slot --host=db-primary --port=5432 # do # echo 'Waiting for primary to connect...' # sleep 1s # done # echo 'Backup done, starting replica...' # chmod 0700 /var/lib/postgresql/data # postgres # " depends_on: - db-primary volumes: - "data-replica:/var/lib/postgresql/data" - "wal-replica:/var/lib/postgresql/waldata" # - ./primary/config/00_init.sql:/docker-entrypoint-initdb.d/00_init.sql networks: peggas: ipv4_address: 10.1.32.68 ipv6_address: 2001:44b8:419c:b316::32:668
so that's just my docker-compose.yml
You can, and should, and hopefully, will, do things different. Go ahead and copy & paste it into a new file.
At the top, we have something cool I want to implement everywhere but I wont: the x-postgres-common: / x-postgres-common12: stanzas. they implement the health check for both images, and I think it's cool.
You can delete the x-postgres-common12: one, it's part of a database upgrade test/solution I'm cooking.
Next we have volumes. You can see I have NFS mounts for pg_data and WAL data directories. I'm currently using local (NVMe) storage for the primary, and NFS (ZFS on rust) for the replica.
You should do whatever you want and need. I have mounted samba shares on hosts and bind-mounted them into containers. I have used local storage and attempted convoluted backup scripts. I have also completely forgotten about persistent storage and nuked ephemeral containers with hours of config in them. Whatever it is, it should have been decided before now.
If you can't decide: local for the primary, remote for the replica.
Next up is the network: section. A while ago I setup BGP and so I have defined network configurations, almost nothing runs "on the host" as far as networking goes, it's all network subnets and dedicated IPs.
As you can see, IPv6 is enabled and doing stuff here. In the background, if IPv6 exists and works, that's what everything will use.
Note to self, do the write-up about BGP.
It turns out, we're building infrastructure.
db-primary is, as the name implies, an absolute prick and his mother hates him. Not true. That's all conjecture. db-primary may or may not owe me $20.
Either way, we're plugging a custom command into his container.
Is this going to foot-gun us? Let's look. Nope, the Dockerfile just runs `postgres` as it's final action. No foot-guns here!
I considered creating a postgres.conf file and overridding things and doing stuff, but, these arguments always need to be on the master, and we need to know they're always there so - this is fine.
Now, moving onto the db-replica section, we can see there's two command: stanzas, and one of them is really long, and commented out.
That's our two-part problem coming full circle.
As I understand it, PostgreSQL database replication works as follows:
- There is a master, and the master knows it's a master.
- There is a replication user (peggas) with replication permissions.
- Replicant connects to the master with the replication user, and says "gimme everything"
- Master sends replicant everything, and then hangs up.
- Replicant restarts(*) as a read only database and reconnects to the master, which will continue shipping changes to the replica.
We, the meatbags in control of the keyboards need to do the next bit ourselves. (I mean, automating things is kinda why I started down this path, but here we are)
So, on first run we using the big command: to make the replica database start itself and connect to the master and collect all of the information.
Then, we comment out that giant lot, and restart the DB as a replica, and it will live that way, forever.
Oh Jesus.
I'm this far into "let's automate a thing" and there's a manual step?
Fuck man, get your shit together.
here is a script to create users in both pgpoool and the backend database
#!/bin/bash # # #USERNAME=$1 #PASSWORD=$2 PRIMARYDB=db-primary POOLHOST=10.1.32.69 # Check if username and password are provided if [ $# -lt 2 ]; then echo "Usage: $0[primarydb=${PRIMARYDB}] [poolhost=${POOLHOST}]" exit 1 fi # Assigning arguments to variables USERNAME=$1 PASSWORD=$2 # Optional primarydb and poolhost arguments PRIMARYDB=${3:-db-primary} echo "Primary Database: ${PRIMARYDB}" if [ $# -eq 4 ]; then poolhost=$4 echo "Pool Host: ${POOLHOST}" else poolhost=10.1.32.69 echo "Pool Host: ${POOLHOST} (default)" fi # Printing out username and password echo "Username: ${PASSWORD}" echo "Password: " echo "Adding ${USERNAME} to ${PRIMARYDB}..." echo "CREATE ROLE ${USERNAME} LOGIN NOSUPERUSER NOCREATEDB;" > /tmp/temp.sql echo "ALTER ROLE ${USERNAME} WITH ENCRYPTED PASSWORD '${PASSWORD}';" >> /tmp/temp.sql docker compose cp /tmp/temp.sql ${PRIMARYDB}:/tmp/temp.sql docker compose exec -u postgres -it ${PRIMARYDB} psql -U pgadmin postgres -f /tmp/temp.sql echo -e "${USERNAME}:${PASSWORD}\n" >> ./pgpool/config/pool_plaintext cat ./pgpool/config/pool_plaintext docker compose exec -u root -it pgpool pg_enc -m -k /etc/pgpool2/.pgpoolkey -f /etc/pgpool2/pgpool.conf -i /etc/pgpool2/pool_plaintext
References:
- Image stone from here: https://ghostsofdc.org/2012/07/03/train-collision-laurel-1922/
- Dockerfile for postgres 16: https://github.com/docker-library/postgres/blob/master/16/bookworm/Dockerfile
- Medium article #1: https://medium.com/@eremeykin/how-to-setup-single-primary-postgresql-replication-with-docker-compose-98c48f233bbf
- Medium article #2: https://medium.com/@arkaprabha.chatterjee31/docker-postgres-pgpool2-how-to-set-up-dockerized-postgresql-master-slave-databases-b7f3fb0036a2