Hi, Hey, Hello!
First, lots of this information comes from here, which is where I started: https://www.bentasker.co.uk/posts/blog/general/running-mastodon-in-docker-compose.html
So credit where it's due: thank you, Ben! The above link is referred to as "the Original Article" throughout the rest of this post.
Also on the list of "people that have come before" is Thomas Leister with some great sidekiq information! https://thomas-leister.de/en/scaling-up-mastodon/
The current docker-compose.yml and postgres.conf files are available at github: https://github.com/lfitt/mastodon-docker-compose/tree/main
With the rate at which even a small instance receives federated traffic after the twitter migration, the current defaults mean that the sidekiq queues can be permanently full. This does depend on a few factors, one of which is available bandwidth. I have spoken with admins of single-user instances that are just fine with the out-of-box defaults, but they're on 200Mb+ connections, which is not something I have easy access to here in the middle of nowhere.
Notes:
This is aimed at moderately-advanced home-labbers. This will (probably) not run on your Pi. I'm using NVMe SSD storage, 8 full-fat x86 cores, and 8 GB of RAM, so as to not end up hitting the swap file.
I have a web proxy already set up and doing SSL, and this instance does not have a public IP address (the web proxy deals with both of those for me), so this configuration will be listening on 0.0.0.0:3000 if you copy and paste it verbatim.
Use the guidance for nginx+SSL from the Original Article if you're doing this on an internet-facing server.
The version of docker compose we end up with in this guide is "docker compose" and not "docker-compose". The Original Article uses docker-compose, and I do too, randomly; it's a force of habit. If you're getting "command not found" errors, check there's a space and not a hyphen in your "docker compose" commands.
Getting Started
First, we'll start with a blank VM. I'm using Debian 11.5; you can use whatever floats your boat. An install with just "System Utilities" and "SSH Server" is all we need, everything else is going to just take up space. On the topic of space - I'm lazy, and I recommend one giant partition. 200 GB should be enough to get started, but if you're going to be running an instance "in anger", more is better. You'll want 8 cores and 8 GB of RAM at a minimum; the cores are going to be over-provisioned to the max, and the RAM is going to be mostly wasted. I think. We'll see.
If you are starting from a minimal Debian base, you won't have sudo, install it with (as root) `apt install sudo`, and then `usermod -aG sudo your_username_here` then log out of your root account and in with your user account. You won't need to login as root ever again, so don't.
There are some great recommendations for initial server setup over at the Mastodon Docs, including firewall rules, SSH Keys, and such.
Go do that now: https://docs.joinmastodon.org/admin/prerequisites/
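The firewall section there uses raw iptables rules. If you prefer something simpler on a host that is not internet-facing, a minimal ufw setup for SSH looks like the sketch below - this is my shorthand, not a copy of the rules from the docs. One caveat worth knowing: Docker publishes container ports by writing its own iptables rules, so ufw will not fence off port 3000 for you; that's another reason to keep this box off the public internet or do the filtering upstream at your proxy.

sudo apt install ufw
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow OpenSSH
sudo ufw enable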
Next, we'll set up docker and the compose plugin. The official docs are the best place to do this:
https://docs.docker.com/engine/install/debian/#set-up-the-repository
Make sure you also add yourself to the docker group, or life will be 87% more irritating:
https://docs.docker.com/engine/install/linux-postinstall/
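Condensed, the Debian install from those docs looked roughly like this at the time of writing - if anything below disagrees with the current docs, trust the docs:

sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# the post-install step, so you don't need sudo for every docker command:
sudo usermod -aG docker $USER

Log out and back in after the usermod so the group change takes effect.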
Now we should have the ability to run `docker compose version` as a user and see results:
~$ docker compose version
Docker Compose version v2.12.2
If this isn't you, don't move on just yet, you have problem solving to do! It is me, however, so I'm moving on!
Setting up the Mastodon Repo
We need a place to put our stuff. I've chosen /docker/ for this. I like to put the data dirs in their own place, that way moving to other storage solutions is easier than if application data is littered through your container files.
sudo mkdir -p /docker/containers && sudo mkdir -p /docker/data
sudo chown -R $USER /docker
cd /docker/containers
git clone https://github.com/mastodon/mastodon.git
cd mastodon
latest=$(git describe --tags `git rev-list --tags --max-count=1`)
git checkout $latest -b ${latest}-branch
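To sanity-check that you ended up on a release tag rather than the main branch, ask git where you are (the exact tag will obviously depend on when you run this):

git describe --tags
git branch --show-current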
Now we need somewhere to actually put the live docker-compose.yml file so we don't accidentally clobber it with git operations:
cd ..
mkdir live
cd live
cp ../mastodon/docker-compose.yml ./
Now we need to edit our docker-compose.yml
First, adjust all of the volumes to use our /docker/data directory, and adjust the logging. By default the logs are "all of the logs, all of the time."
Add the following to the end of each service:
    logging:
      driver: json-file
      options:
        max-size: 150m
You can adjust the "150m" however you see fit.
"150m" is 150MegaBytes - it works for me, it's enough logs to get an idea of errors, but not so much that a `docker compose logs -f` causes regret, and best of all, won't fill your disks accidentally.
Next adjust the build lines to point to our mastodon git repo. We will be building from source instead of using images. No good reason. We just are.
build: ../mastodon/
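For context, once both edits are applied a single service ends up looking something like this (using the web service as the example, with the unchanged keys elided):

  web:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    # ...rest of the service unchanged...
    logging:
      driver: json-file
      options:
        max-size: 150m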
Optimising the Database
Follow the "Setting up PostgreSQL" instructions from the original article, I'm pretty sure I missed a step. But before you go there, we have some leg-work to do!
We are going to be providing our own postgresql.conf file from /docker/containers/live/postgresql.conf
The example below is all of the not-commented lines from a default postgres 14 install, plus recommendations from pgtune for 4 GB of RAM and 8 cores with SSD storage.
/docker/containers/live/postgresql.conf:
listen_addresses = '*'
log_timezone = 'UTC'
datestyle = 'iso, mdy'
timezone = 'UTC'
default_text_search_config = 'pg_catalog.english'
max_connections = 140
shared_buffers = 2GB
effective_cache_size = 3GB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 5242kB
min_wal_size = 80MB
max_wal_size = 2GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_parallel_maintenance_workers = 4
# remove the following lines if you don't want/need postgresql stats (uses less RAM)
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
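Once the stack is up and running (later in this guide), you can confirm postgres actually loaded this file rather than its built-in defaults. The container name below assumes the compose project is called "live", as it is later in this post - adjust if yours differs:

docker exec -it live-db-1 psql -U postgres -c 'SHOW shared_buffers;'

If that comes back as 2GB rather than the stock 128MB, the custom config is being read.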
Optimising Sidekiq
"Sidekiq exhaustion" will bring your instance to it's knees pretty quickly. Well, not to it's knees, but it will seem very out-of-touch with the rest of the FediVerse. The default configuration has 5 workers, and that's it. We're going to make it much more parallel, thanks to Thomas.
The default sidekiq configuration will use about 400 MB of RAM. This config will bump the RSS memory usage up to 1.6 GB or more; I've seen it hitting nearly 2 GB on my test instance. This is why the database is only optimised for 4 GB of RAM - we still need space for activities!
To do this, we're going to be running multiple sidekiq containers. This lets us adjust the concurrency of a single queue (via the -c command line argument) without having to restart all of sidekiq.
The ingress and pull queues are the ones that usually end up with the deepest backlogs on my test instance, so they're the ones I've bumped.
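If you want to check which queues are backing up on your own instance, sidekiq queues are just Redis lists, so you can peek at them directly. The container name here assumes the compose project is called "live" (as it is later in this post); admin accounts should also be able to see the same numbers in the Sidekiq dashboard at /sidekiq.

docker exec -it live-redis-1 redis-cli llen queue:ingress
docker exec -it live-redis-1 redis-cli llen queue:pull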
You'll notice I've also dropped the log size for each of these instances. This is purely for space efficiency on the host.
  sidekiq:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bundle exec sidekiq -q "default" -c 8
    depends_on:
      - db
      - redis
    networks:
      - external_network
      - internal_network
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    healthcheck:
      test: ['CMD-SHELL', "ps aux | grep '[s]idekiq\ 6' || false"]
    logging:
      driver: json-file
      options:
        max-size: 50m

  sidekiq-ingress:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bundle exec sidekiq -q "ingress" -c 16
    depends_on:
      - db
      - redis
    networks:
      - external_network
      - internal_network
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    healthcheck:
      test: ['CMD-SHELL', "ps aux | grep '[s]idekiq\ 6' || false"]
    logging:
      driver: json-file
      options:
        max-size: 50m

  sidekiq-pull:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bundle exec sidekiq -q "pull" -c 16
    depends_on:
      - db
      - redis
    networks:
      - external_network
      - internal_network
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    healthcheck:
      test: ['CMD-SHELL', "ps aux | grep '[s]idekiq\ 6' || false"]
    logging:
      driver: json-file
      options:
        max-size: 50m

  sidekiq-mailers:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bundle exec sidekiq -q "mailers" -c 4
    depends_on:
      - db
      - redis
    networks:
      - external_network
      - internal_network
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    healthcheck:
      test: ['CMD-SHELL', "ps aux | grep '[s]idekiq\ 6' || false"]
    logging:
      driver: json-file
      options:
        max-size: 50m

  sidekiq-push:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bundle exec sidekiq -q "push" -c 8
    depends_on:
      - db
      - redis
    networks:
      - external_network
      - internal_network
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    healthcheck:
      test: ['CMD-SHELL', "ps aux | grep '[s]idekiq\ 6' || false"]
    logging:
      driver: json-file
      options:
        max-size: 50m

  sidekiq-scheduler:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bundle exec sidekiq -q "scheduler" -c 8
    depends_on:
      - db
      - redis
    networks:
      - external_network
      - internal_network
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    healthcheck:
      test: ['CMD-SHELL', "ps aux | grep '[s]idekiq\ 6' || false"]
    logging:
      driver: json-file
      options:
        max-size: 50m
Our final docker-compose should look like the below example:
/docker/containers/live/docker-compose.yml:
version: '3'
services:
  db:
    restart: always
    image: postgres:14-alpine
    shm_size: 256mb
    networks:
      - internal_network
    healthcheck:
      test: ['CMD', 'pg_isready', '-U', 'postgres']
    volumes:
      - /docker/data/postgres14:/var/lib/postgresql/data
      - /docker/containers/live/postgresql.conf:/etc/postgresql.conf
    command: postgres -c config_file=/etc/postgresql.conf
    environment:
      - 'POSTGRES_HOST_AUTH_METHOD=trust'
    logging:
      driver: json-file
      options:
        max-size: 150m

  redis:
    restart: always
    image: redis:7-alpine
    networks:
      - internal_network
    healthcheck:
      test: ['CMD', 'redis-cli', 'ping']
    volumes:
      - /docker/data/redis:/data
    logging:
      driver: json-file
      options:
        max-size: 150m

  # es:
  #   restart: always
  #   image: docker.elastic.co/elasticsearch/elasticsearch:7.17.4
  #   environment:
  #     - "ES_JAVA_OPTS=-Xms512m -Xmx512m -Des.enforce.bootstrap.checks=true"
  #     - "xpack.license.self_generated.type=basic"
  #     - "xpack.security.enabled=false"
  #     - "xpack.watcher.enabled=false"
  #     - "xpack.graph.enabled=false"
  #     - "xpack.ml.enabled=false"
  #     - "bootstrap.memory_lock=true"
  #     - "cluster.name=es-mastodon"
  #     - "discovery.type=single-node"
  #     - "thread_pool.write.queue_size=1000"
  #   networks:
  #     - external_network
  #     - internal_network
  #   healthcheck:
  #     test: ["CMD-SHELL", "curl --silent --fail localhost:9200/_cluster/health || exit 1"]
  #   volumes:
  #     - /docker/data/elasticsearch:/usr/share/elasticsearch/data
  #   ulimits:
  #     memlock:
  #       soft: -1
  #       hard: -1
  #     nofile:
  #       soft: 65536
  #       hard: 65536
  #   ports:
  #     - '127.0.0.1:9200:9200'
  #   logging:
  #     driver: json-file
  #     options:
  #       max-size: 150m

  web:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bash -c "rm -f /mastodon/tmp/pids/server.pid; bundle exec rails s -p 3000"
    networks:
      - external_network
      - internal_network
    healthcheck:
      # prettier-ignore
      test: ['CMD-SHELL', 'wget -q --spider --proxy=off localhost:3000/health || exit 1']
    ports:
      - '3000:3000'
    depends_on:
      - db
      - redis
      # - es
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    logging:
      driver: json-file
      options:
        max-size: 150m

  streaming:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: node ./streaming
    networks:
      - external_network
      - internal_network
    healthcheck:
      # prettier-ignore
      test: ['CMD-SHELL', 'wget -q --spider --proxy=off localhost:4000/api/v1/streaming/health || exit 1']
    ports:
      - '127.0.0.1:4000:4000'
    depends_on:
      - db
      - redis
    logging:
      driver: json-file
      options:
        max-size: 150m

  sidekiq:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bundle exec sidekiq -q "default" -c 8
    depends_on:
      - db
      - redis
    networks:
      - external_network
      - internal_network
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    healthcheck:
      test: ['CMD-SHELL', "ps aux | grep '[s]idekiq\ 6' || false"]
    logging:
      driver: json-file
      options:
        max-size: 50m

  sidekiq-ingress:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bundle exec sidekiq -q "ingress" -c 32
    depends_on:
      - db
      - redis
    networks:
      - external_network
      - internal_network
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    healthcheck:
      test: ['CMD-SHELL', "ps aux | grep '[s]idekiq\ 6' || false"]
    logging:
      driver: json-file
      options:
        max-size: 50m

  sidekiq-pull:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bundle exec sidekiq -q "pull" -c 16
    depends_on:
      - db
      - redis
    networks:
      - external_network
      - internal_network
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    healthcheck:
      test: ['CMD-SHELL', "ps aux | grep '[s]idekiq\ 6' || false"]
    logging:
      driver: json-file
      options:
        max-size: 50m

  sidekiq-mailers:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bundle exec sidekiq -q "mailers" -c 4
    depends_on:
      - db
      - redis
    networks:
      - external_network
      - internal_network
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    healthcheck:
      test: ['CMD-SHELL', "ps aux | grep '[s]idekiq\ 6' || false"]
    logging:
      driver: json-file
      options:
        max-size: 50m

  sidekiq-push:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bundle exec sidekiq -q "push" -c 8
    depends_on:
      - db
      - redis
    networks:
      - external_network
      - internal_network
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    healthcheck:
      test: ['CMD-SHELL', "ps aux | grep '[s]idekiq\ 6' || false"]
    logging:
      driver: json-file
      options:
        max-size: 50m

  sidekiq-scheduler:
    build: ../mastodon/
    image: tootsuite/mastodon
    restart: always
    env_file: .env.production
    command: bundle exec sidekiq -q "scheduler" -c 8
    depends_on:
      - db
      - redis
    networks:
      - external_network
      - internal_network
    volumes:
      - /docker/data/public/system:/mastodon/public/system
    healthcheck:
      test: ['CMD-SHELL', "ps aux | grep '[s]idekiq\ 6' || false"]
    logging:
      driver: json-file
      options:
        max-size: 50m

  ## Uncomment to enable federation with tor instances along with adding the following ENV variables
  ## http_proxy=http://privoxy:8118
  ## ALLOW_ACCESS_TO_HIDDEN_SERVICE=true
  # tor:
  #   image: sirboops/tor
  #   networks:
  #     - external_network
  #     - internal_network
  #
  # privoxy:
  #   image: sirboops/privoxy
  #   volumes:
  #     - /docker/data/priv-config:/opt/config
  #   networks:
  #     - external_network
  #     - internal_network

networks:
  external_network:
  internal_network:
    internal: true
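Before moving on, it's worth checking that the file actually parses - docker compose will happily point out indentation mistakes before you try to start anything:

cd /docker/containers/live
docker compose config > /dev/null

If that prints nothing, the YAML is valid; any errors will point at the offending line.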
NOW we're finally ready for "configuring the database" from the original article.
Starting the postgres instance will be slightly different for us.
Generate a postgres password:
cat /dev/urandom | tr -dc "a-zA-Z0-9" | fold -w 24 | head -n 1
Put that someplace safe, and also below in place of YOURPASSWORD
Maybe make a second one, for the mastodon database user. I didn't; you should.
We're making two accounts with passwords here: postgres, the PostgreSQL Super User, and mastodon, the actual user that will connect to the mastodon database. They really, really shouldn't have the same password, but also if someone is this far into your system, it's probably a moot point.
docker run --rm --name postgres \
  -v /docker/data/postgres14:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD="YOURPASSWORD" \
  -d postgres:14-alpine
This will pull the image and get it ready. You'll notice we haven't passed in our custom config, it's not required just yet.
Connect to the container, create the mastodon user (and optionally enable the pg_stat_statements extension), and then stop it:
$ docker exec -it postgres psql -U postgres
psql (14.6)
Type "help" for help.

postgres=# CREATE USER mastodon WITH PASSWORD 'YOURPASSWORD' CREATEDB;
CREATE ROLE

(optional, if you want stats)
postgres=# \c mastodon
You are now connected to database "mastodon" as user "postgres".
mastodon=# CREATE extension pg_stat_statements;
CREATE EXTENSION
(/optional)

mastodon=# exit
$ docker stop postgres
Now we need to create our temporary .env.production file.
/docker/containers/live/.env.production:
DB_HOST=db
DB_PORT=5432
DB_NAME=mastodon
DB_USER=mastodon
DB_PASS=YOURPASSWORD
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_PASSWORD=
Mastodon Setup
From here, lean heavily on the "Mastodon Setup" section from the original article.
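For reference, that section boils down to building the images and then running the interactive setup task from /docker/containers/live - roughly the commands below, but follow the Original Article for the details and the surrounding explanation:

cd /docker/containers/live
docker compose build
docker compose run --rm web bundle exec rake mastodon:setup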
Once it has completed, it will ask you if you want to save the .env.production file. Say yes. But it won't actually save it - you'll need to manually copy the output into our .env.production file from above (replace everything with the new output).
You can increase the number of web workers available by adding the following to the end of your .env.production file:
WEB_CONCURRENCY=8
I have 8 cores in my test system, so I've set this to 8. There is a recommendation to set this to 1.5 times the core count on your machine, but that implies all the machine is doing is running the web server.
Obviously, a bigger number means more RAM usage, and depending on how busy your instance gets, this number is a trade-off between sidekiq using all of your CPU and puma using all of your CPU.
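Note that editing .env.production does nothing to containers that are already running; the web container needs to be recreated to pick the change up. Something like this should do it (--force-recreate is the belt-and-braces option):

cd /docker/containers/live
docker compose up -d --force-recreate web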
Make sure you fix your permissions, and keep track of the passwords it gives you!
sudo chown -R 70:70 /docker/data/postgres14
sudo chown -R 991:991 /docker/data/public
One, Two, Skip a Few....
Now I've glossed over the whole nginx+SSL setup, but you should now have a functioning Mastodon instance running!
You can confirm sidekiq is doing its thing from the command line:
$ ps aux | grep -Po '\K[sS]idekiq 6.*'
sidekiq 6.5.7 mastodon [3 of 16 busy]
sidekiq 6.5.7 mastodon [0 of 4 busy]
sidekiq 6.5.7 mastodon [0 of 8 busy]
sidekiq 6.5.7 mastodon [0 of 8 busy]
sidekiq 6.5.7 mastodon [0 of 8 busy]
sidekiq 6.5.7 mastodon [1 of 16 busy]
Next Steps
There are some things we need to do to give our mastodon instance a reasonable life expectancy. SysAdmin Stuff, like a big kid.
My initial plan was to work out how to put all of these steps into the docker-compose.yml file, but frankly, I'm getting lazy.
I'm leveraging cron on the host machine for all of these. Is it the best way? Not really. Will it work? Absolutely!
Database Backups:
First, make a directory to dump our backups to:
mkdir -p /docker/data/postgres-backups
Put the following into your cron. It's all one line, and will spit out a database backup (compressed with gzip) every 8 hours. Change however you see fit, more backups are always more good.
Obviously moving that file to a different host (preferably into a backup) is important, but that requires knowledge of your existing backup solution that I simply do not have.
Anyway, here's the cron line (note the escaped % signs - cron treats an unescaped % as a newline):
0 */8 * * * docker exec -t live-db-1 pg_dumpall -c -U postgres | gzip -9 > /docker/data/postgres-backups/dump_`date +\%d-\%m-\%Y"_"\%H_\%M_\%S`.sql.gz
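If you just want something rather than nothing for off-host copies, a second cron entry along these lines would work - the destination host and path here are placeholders, swap in your own backup target:

30 */8 * * * rsync -a /docker/data/postgres-backups/ backupuser@backuphost:/backups/mastodon-postgres/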
Next we have some mastodon housekeeping to do: cleaning up old/orphan files.
You can check all of these manually by putting --dry-run on the end of the commands to see what they would do.
$ docker exec -it live-web-1 tootctl media usage
Attachments:    28.5 GB (0 Bytes local)
Custom emoji:   187 MB (0 Bytes local)
Preview cards:  688 MB
Avatars:        3.55 GB (75.6 KB local)
Headers:        7.49 GB (269 KB local)
Backups:        0 Bytes
Imports:        0 Bytes
Settings:       0 Bytes

$ docker exec -it live-web-1 tootctl media remove-orphans --dry-run
Found and removed orphan: cache/preview_cards/images/000/000/322/original/13bee8508a89ff8f.png
Found and removed orphan: cache/preview_cards/images/000/003/143/original/cbc16e00eb33ee32.jpeg
Found and removed orphan: cache/preview_cards/images/000/005/203/original/21016c11db78ac35.jpg
Found and removed orphan: cache/preview_cards/images/000/005/203/original/629cd4306c9680c0.jpg
Found and removed orphan: cache/preview_cards/images/000/005/203/original/e548289660f80ce3.jpg
Found and removed orphan: cache/preview_cards/images/000/008/585/original/fa3a8fb30cdb21f6.jpg
Found and removed orphan: cache/preview_cards/images/000/011/459/original/42182b5ceb6f8195.png
Found and removed orphan: cache/preview_cards/images/000/026/716/original/27745f45851e3cd0.jpg
200357/200357 |==============================================| Time: 00:07:15
Removed 8 orphans (approx. 469 KB) (DRY RUN)

$ docker exec -it live-web-1 tootctl media remove --dry-run --days=3
6546/6546 |==============================================| Time: 00:00:05
Removed 6546 media attachments (approx. 3.93 GB) (DRY RUN)

$ docker exec -it live-web-1 tootctl preview_cards remove --dry-run --days=3
4477/4477 |==============================================| Time: 00:00:03
Removed 4477 preview cards (approx. 154 MB) (DRY RUN)
I recommend running these in the reverse order of the above demonstration; you can see by the run times that the orphan cleanup is the slowest / most system-intensive operation.
As noted in the Mastodon documentation, preview_cards should not be cleaned with less than --days=14.
We'll run these once a day just after 3am, staggered slightly:
5 3 * * * docker exec -t live-web-1 tootctl preview_cards remove --days=21
15 3 * * * docker exec -t live-web-1 tootctl media remove --days=14
25 3 * * * docker exec -t live-web-1 tootctl media remove-orphans
You'll notice I have not added elasticsearch to the setup. That's a job for future-me.
And that's it, until a new version of Mastodon releases, when I'll most likely update this with notes on what I broke during the update, if anything.
If you're curious how much space an instance can take up, below are some numbers from my test instance. My test instance was joined to every working relay I could find a few days ago, but it was only stood up 5 days ago, so the numbers are going to be on the lower end; they should still give you a reasonable ballpark figure. If you get aggressive with the cleanup cron jobs, these numbers should be able to be brought down a little.
$ uptime
 02:08:35 up 5 days,  5:55,  1 user,  load average: 0.57, 0.62, 0.60
$ free -m
               total        used        free      shared  buff/cache   available
Mem:            7955        4532         450         390        2972        2602
Swap:            974         823         151
$ sudo du -h /docker/data/ --max-depth=1
84M     /docker/data/postgres-backups
549M    /docker/data/postgres14
40M     /docker/data/elasticsearch
49G     /docker/data/public
18M     /docker/data/redis
50G     /docker/data/
$ sudo du -h --max-depth=1 /var/lib/docker
136K    /var/lib/docker/network
4.0K    /var/lib/docker/trust
18M     /var/lib/docker/image
16K     /var/lib/docker/plugins
4.0K    /var/lib/docker/swarm
15M     /var/lib/docker/buildkit
4.0K    /var/lib/docker/runtimes
28K     /var/lib/docker/volumes
22G     /var/lib/docker/overlay2
20M     /var/lib/docker/containers
4.0K    /var/lib/docker/tmp
22G     /var/lib/docker
If you have any thoughts or questions, feel free to reach out to me on the fediverse! @lucas@fitt.au