Yup. It's that good.

You might think from the picture that it might be a train wreck. But it's only a little bit.

It's not too bad, you just kinda need some back story first.

The story so far.

There are a whole lot of guides that will show you how to setup replication, and replicate, and how to setup pgpool2, and how to pool, but very little in the way of using those things. Certainly nothing that takes us deep as whatever the fuck this ends up being.

I have a docker container, and in that docker container, is postgres database, and one day, that database lost a bunch of data. It shouldn't have happened, but it did. After coming to terms with my now-broken database (ie: several days later) I decided database replication was the way - having a hot spare at all times is good, and the act of creating the hot spare also doubles as doing a backup. Kind of, in the sense that it's more of a backup than not doing any backups, but we're not here to judge how I live.

So I forge off, with little to no knowledge about how any of it works, I open a SearxNG tab and type "docker compose postgres replication"

Eventually I end up at some sites (see references) and develop some stuff. This is that stuff.

But first, the note at the end of this bad boy: https://github.com/docker-library/postgres/blob/master/16/bookworm/Dockerfile


# An additional setting that is recommended for all users regardless of this # value is the runtime "--stop-timeout" (or your orchestrator/runtime's # equivalent) for controlling how long to wait between sending the defined # STOPSIGNAL and sending SIGKILL (which is likely to cause data corruption). # # The default in most runtimes (such as Docker) is 10 seconds, and the # documentation at https://www.postgresql.org/docs/12/server-start.html notes # that even 90 seconds may not be long enough in many instances.

So I think I found why my database corrupted, and why postgres always seems to say it shut down uncleanly.

There's a fuckin' foot-gun built right into every single official docker image! If docker is taking "too long" (by default, longer than 10 seconds) to shut down, it'll send a kill -9 postgres' way, which will lead too ... you guessed it: Having a bad fuckin' day!

Good to know.

I found the reason for needing a better backup solution buried away in a comment at the end of a Dockerfile. How very Vogon.

The solution for that: increase the timeout in your docker-compose.yml:


    stop_signal: SIGTERM
    stop_grace_period: 5m30s

As I've just alluded to, my existing backup solution was sub-optimal. It was connecting to docker, running a pg_dumpall (or pg_dump) and then shipping that via NFS to my NAS.

The hidden failure mode here is that if you've got a corrupt database, then pg_dump it, you'll get a partial dump.

That backup will then be worse than useless, because it will look like it's completed, you'll get a file, it'll get shipped - it'll just be truncated.

The final piece of the replication puzzle for me, was the realisation that setting up a replication slave is a two-part process - part 1: the initial replication, part2: ongoing replication.

And the very last gotchya that I came across: authntication with pgpool2.

By this stage I'd already committed to using scram-sha-256 auth instead of pain text / md5, because it's the new newness and if I was going to AAS this, I may as well go all in.

pgpool2 has it's own laundry list of stupid shit it wants and needs in order to function like normal software, and that was, shall we say, a lernin' kerv.

In short:

  • everyone needs a username/password combo in the database, and also in pgpool2.
  • for some reason at some point I had to reset everyone's postgres passwords after getting pgpool going. unsure why.
  • pgpool uses a keyfile to do something, something, encryption? It let me use 'preparetobepegged!' for a bit, so I think it's just a salt.
  • that keyfile needs to be in a known location, and that location passed to tools, like pg_enc.
  • if it's wrong/different, no password decryption, and no pgpool client connections.

Enough bullshit already, let's go!

OK.

This is what we're setting up (in not-this order):

  • PGPool-II (pgpool)
  • PostgreSQL Replication master (db-primary)
  • PostgreSQL Replication replca (db-replica)
  • pgadmin

We don't need pgadmin, but it's nice, and makes life nice.

We're calling our project peggas. Because it's PostgreSQL As A Service, and also, like getting pegged. And Software As A Service is like getting pegged too, sooo.... make a directory, and call it peggas.

Here's mine:

lucas@BlackVelvet:~/docker/peggas$ ls -la
total 3296
drwxr-xr-x  6 lucas lucas    4096 Mar 20 03:32 .
drwxr-xr-x 64 lucas lucas    4096 Mar 18 16:06 ..
-rw-r--r--  1 lucas lucas      68 Mar 19 23:07 clean-shutdown.sh
-rwxr-xr-x  1 lucas lucas    1295 Mar 20 03:32 create_db_user.sh
-rw-r--r--  1 lucas lucas    4982 Mar 20 02:46 docker-compose.yml
-rw-r--r--  1 lucas lucas   29470 Mar 19 22:39 my-postgres.conf
drwxr-xr-x  3 lucas lucas    4096 Mar 20 02:00 pgpool
drwxr-xr-x  9 lucas lucas    4096 Mar 19 18:10 postgres
drwxr-xr-x  5 root  root     4096 Mar 19 17:37 primary
drwxr-xr-x  5 root  root     4096 Mar 16 18:45 secondary
-rw-r--r--  1 lucas lucas 3300446 Mar 19 23:47 thething.sql

The important bits:

docker-compose.yml - obviously.

./postgres/ - the official docker image git repo. I'm 99% certain you can use vanilla images, I started off on the wrong foot.

./primary/ - primary db

./primary/config/00_init.sql - replication user SQL. I'm not sure it procs.

./primary/data/ - postgres data dir

./primary/waldata/ - postgres WAL dir

./secondary/ - primary db

./secondary/data/ - postgres data dir

./secondary/waldata/ - postgres WAL dir

There's other scripts and bits around, we'll burn those bridges when we get to them.

The docker-compose file:

version: '3'

# large parts taken from https://medium.com/@eremeykin/how-to-setup-single-primary-postgresql-replication-with-docker-compose-98c48f233bbf
x-postgres-common:
  &postgres-common
  build: ./postgres/16/bookworm/
  user: postgres
  restart: always
  healthcheck:
    test: 'pg_isready -U pgadminuser --dbname=postgres'
    interval: 10s
    timeout: 5s
    retries: 5

x-postgres-common12:
  &postgres-common12
  build: ./postgres/12/bookworm/
  user: postgres
  restart: always
  healthcheck:
    test: 'pg_isready -U pgadminuser --dbname=postgres'
    interval: 10s
    timeout: 5s
    retries: 5


volumes:
  data-primary:
    driver: local
    driver_opts:
      type: nfs
      o: addr=10.1.1.10,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2
      device: 10.1.1.10:/mnt/pool/Services/peggas/data-primary


  wal-primary:
    driver: local
    driver_opts:
      type: nfs
      o: addr=10.1.1.10,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2
      device: 10.1.1.10:/mnt/pool/Services/peggas/wal-primary


  data-replica:
    driver: local
    driver_opts:
      type: nfs
      o: addr=10.1.1.10,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2
      device: 10.1.1.10:/mnt/pool/Services/peggas/data-secondary


  wal-replica:
    driver: local
    driver_opts:
      type: nfs
      o: addr=10.1.1.10,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2
      device: 10.1.1.10:/mnt/pool/Services/peggas/wal-secondary

networks:
  peggas:
    enable_ipv6: true
    ipam:
      config:
        - subnet: 10.1.32.64/29
          gateway: 10.1.32.65
        - subnet: 2001:44b8:419c:b316::32:600/120

services:

  pgpool:
    build: ./pgpool
    depends_on:
      - db-primary
      - db-replica
    environment:
      - PGPOOL_BACKEND_NODES=0:db-primary:5432:4:primary:ALWAYS_PRIMARY,1:db-replica:5432:6:replica
      - PGPOOL_POSTGRES_USERNAME=pgadminuser
      - PGPOOL_POSTGRES_PASSWORD=postgrespassword
      - PGPOOL_ADMIN_USERNAME=pgadmin
      - PGPOOL_ADMIN_PASSWORD=mysuperdooperreallygoodadminpassword
      - PGPOOL_ENABLE_LOAD_BALANCING=yes
      - PGPOOL_ENABLE_STATEMENT_LOAD_BALANCING=yes
      - PGPOOL_NUM_INIT_CHILDREN=10
      - PGPOOL_MAX_POOL=1
      - PGPOOL_SR_CHECK_USER=peggas
      - PGPOOL_SR_CHECK_PASSWORD=getreadytogetpegged
      - PGPOOLKEYFILE=/etc/pgpool2/.pgpoolkey
    networks:
      peggas:
        ipv4_address: 10.1.32.69
        ipv6_address: 2001:44b8:419c:b316::32:669
    volumes:
      - ./pgpool/config/:/etc/pgpool2/



  db-primary:
    <<: *postgres-common12
    # ./postgres/ is git clone https://github.com/docker-library/postgres.git
    restart: always
    networks:
      peggas:
        ipv4_address: 10.1.32.67
        ipv6_address: 2001:44b8:419c:b316::32:667
    shm_size: 128mb
    command: |
      postgres
      -c wal_level=replica
      -c hot_standby=on
      -c max_wal_senders=10
      -c max_replication_slots=10
      -c hot_standby_feedback=on
    volumes:
      - ./primary/data:/var/lib/postgresql/data
#      - "data-primary:/var/lib/postgresql/data"
      - ./primary/waldata:/var/lib/postgresql/waldata
#      - "wal-primary:/var/lib/postgresql/waldata"
      - ./primary/config/00_init.sql:/docker-entrypoint-initdb.d/00_init.sql
    stop_signal: SIGTERM
    stop_grace_period: 5m30s
    environment:
      POSTGRES_USER: pgadminuser
      POSTGRES_PASSWORD: postgrespassword
      POSTGRES_REPLICATION_PASSWORD: getreadytogetpegged
      POSTGRES_INITDB_WALDIR: /var/lib/postgresql/waldata
      POSTGRES_HOST_AUTH_METHOD: "scram-sha-256\nhost replication peggas  2001:44b8:419c:b316::32:600/120 md5\nhost all all 10.1.0.0/16 md5"
      POSTGRES_INITDB_ARGS: "--auth-host=scram-sha-256"      




  db-replica:
    <<: *postgres-common12
#    build: ./postgres/12/bookworm/
    shm_size: 128mb
    ports:
      - 5433:5432
    environment:
      PGUSER: peggas
      PGPASSWORD: getreadytogetpegged
    stop_signal: SIGTERM
    stop_grace_period: 5m30s
    command: |
      postgres

#    command: |
#      bash -c "
#      until pg_basebackup --pgdata=/var/lib/postgresql/data -R --slot=replication_slot --host=db-primary --port=5432
#      do
#      echo 'Waiting for primary to connect...'
#      sleep 1s
#      done
#      echo 'Backup done, starting replica...'
#      chmod 0700 /var/lib/postgresql/data
#      postgres
#      "
    depends_on:
      - db-primary
    volumes:
      - "data-replica:/var/lib/postgresql/data"
      - "wal-replica:/var/lib/postgresql/waldata"
#      - ./primary/config/00_init.sql:/docker-entrypoint-initdb.d/00_init.sql
    networks:
      peggas:
        ipv4_address: 10.1.32.68
        ipv6_address: 2001:44b8:419c:b316::32:668



so that's just my docker-compose.yml

You can, and should, and hopefully, will, do things different. Go ahead and copy & paste it into a new file.

At the top, we have something cool I want to implement everywhere but I wont: the x-postgres-common: / x-postgres-common12: stanzas. they implement the health check for both images, and I think it's cool.

You can delete the x-postgres-common12: one, it's part of a database upgrade test/solution I'm cooking.

Next we have volumes. You can see I have NFS mounts for pg_data and WAL data directories. I'm currently using local (NVMe) storage for the primary, and NFS (ZFS on rust) for the replica.

You should do whatever you want and need. I have mounted samba shares on hosts and bind-mounted them into containers. I have used local storage and attempted convoluted backup scripts. I have also completely forgotten about persistent storage and nuked ephemeral containers with hours of config in them. Whatever it is, it should have been decided before now.

If you can't decide: local for the primary, remote for the replica.

Next up is the network: section. A while ago I setup BGP and so I have defined network configurations, almost nothing runs "on the host" as far as networking goes, it's all network subnets and dedicated IPs.
As you can see, IPv6 is enabled and doing stuff here. In the background, if IPv6 exists and works, that's what everything will use.

Note to self, do the write-up about BGP.

It turns out, we're building infrastructure.

db-primary is, as the name implies, an absolute prick and his mother hates him. Not true. That's all conjecture. db-primary may or may not owe me $20.

Either way, we're plugging a custom command into his container.

Is this going to foot-gun us? Let's look. Nope, the Dockerfile just runs `postgres` as it's final action. No foot-guns here!

I considered creating a postgres.conf file and overridding things and doing stuff, but, these arguments always need to be on the master, and we need to know they're always there so - this is fine.


Now, moving onto the db-replica section, we can see there's two command: stanzas, and one of them is really long, and commented out.

That's our two-part problem coming full circle.

As I understand it, PostgreSQL database replication works as follows:

  • There is a master, and the master knows it's a master.
  • There is a replication user (peggas) with replication permissions.
  • Replicant connects to the master with the replication user, and says "gimme everything"
  • Master sends replicant everything, and then hangs up.
  • Replicant restarts(*) as a read only database and reconnects to the master, which will continue shipping changes to the replica.

We, the meatbags in control of the keyboards need to do the next bit ourselves. (I mean, automating things is kinda why I started down this path, but here we are)

So, on first run we using the big command: to make the replica database start itself and connect to the master and collect all of the information.
Then, we comment out that giant lot, and restart the DB as a replica, and it will live that way, forever.

Oh Jesus.

I'm this far into "let's automate a thing" and there's a manual step?

Fuck man, get your shit together.

here is a script to create users in both pgpoool and the backend database

#!/bin/bash
#
#

#USERNAME=$1
#PASSWORD=$2
PRIMARYDB=db-primary
POOLHOST=10.1.32.69


# Check if username and password are provided
if [ $# -lt 2 ]; then
    echo "Usage: $0   [primarydb=${PRIMARYDB}] [poolhost=${POOLHOST}]"
    exit 1
fi

# Assigning arguments to variables
USERNAME=$1
PASSWORD=$2

# Optional primarydb and poolhost arguments
PRIMARYDB=${3:-db-primary}
echo "Primary Database: ${PRIMARYDB}"

if [ $# -eq 4 ]; then
    poolhost=$4
    echo "Pool Host: ${POOLHOST}"
else
    poolhost=10.1.32.69
    echo "Pool Host: ${POOLHOST} (default)"
fi

# Printing out username and password
echo "Username: ${PASSWORD}"
echo "Password: "


echo "Adding ${USERNAME} to ${PRIMARYDB}..."

echo "CREATE ROLE ${USERNAME} LOGIN NOSUPERUSER NOCREATEDB;" > /tmp/temp.sql
echo "ALTER ROLE ${USERNAME}  WITH ENCRYPTED PASSWORD '${PASSWORD}';" >> /tmp/temp.sql

docker compose cp /tmp/temp.sql ${PRIMARYDB}:/tmp/temp.sql

docker compose exec -u postgres -it ${PRIMARYDB} psql -U pgadmin postgres -f /tmp/temp.sql


echo -e "${USERNAME}:${PASSWORD}\n" >> ./pgpool/config/pool_plaintext
cat ./pgpool/config/pool_plaintext


docker compose exec -u root -it pgpool pg_enc -m -k /etc/pgpool2/.pgpoolkey -f /etc/pgpool2/pgpool.conf  -i /etc/pgpool2/pool_plaintext

References: