# Troubleshooting docker.named.service
I'm writing this so that the next time I touch this service, I can remember how to actually troubleshoot it and what the common pitfalls are.
## Docker-based setup
I was in the *very slow* process of switching `diagonal.blacka.com` from using `docker` to using `podman` and `podman-systemd.unit` when I last touched this. Let's assume we are still on docker.
It took me a while to remember how this service is run. It runs as a simple systemd unit, with no docker-compose or anything similar, but all of the `docker run` options are in a shell script.
The unit file is in `/etc/systemd/system/docker.named.service`.
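
Since the `docker run` options live in a script rather than in the unit itself, a quick way to find that script is to read the `ExecStart=` line of the unit file. A minimal sketch, assuming the unit calls the script directly from `ExecStart=` on a single line; `unit_exec_start` is a made-up helper name, not something in the repo:

```bash
# Hypothetical helper: print the ExecStart= command line(s) from a unit file.
# Assumes each ExecStart= sits on one line (no backslash continuations).
unit_exec_start() {
    sed -n 's/^ExecStart=//p' "$1"
}

UNIT=/etc/systemd/system/docker.named.service
if [ -r "$UNIT" ]; then
    unit_exec_start "$UNIT"
fi
```

`systemctl cat docker.named.service` prints the whole unit, including any drop-in overrides, if the `ExecStart=` line alone is not enough.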
Commands to start and stop the service, check its status, and read its logs:
```bash
sudo systemctl status docker.named.service
sudo systemctl start docker.named.service
sudo systemctl stop docker.named.service
sudo journalctl -u docker.named.service
```
The installed config is all in `/etc/bind`. Its subdirectories are mounted into the container in different spots, but on the host we keep it all together here.
Some pitfalls for working on this:
1. The TSIG keys are encrypted by git-crypt. There is a filter on the source repo that encrypts the files as they are committed. The whole point of the tool is to keep these files checked into git but unreadable. When you check out or pull from the repo, you will need to unlock the repo, which decrypts all of the encrypted files.
   * The main README talks about how to get the key. It is in my 1password, although I have probably already downloaded it. I expect it to just be sitting in `~/src/docker_bind` on diagonal itself.
   * The main point is *check to see that files are not encrypted* before copying the updated config into place.
2. The ISC-provided docker image seems to change its internal UIDs on a periodic basis. This is a critical detail for running this, so I wonder how other users deal with it.
   * There is code in `setup.sh` and `setup_docker.sh` to *create* the bind user and group, but it doesn't yet *correct* the UIDs when they change.
   * In any case, there is some code in there to fetch the UID and GID out of the container, which also gives you a hint on how to run the container and get a shell.
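
Both pitfalls can be checked from the shell. The sketch below is not the code from `setup.sh`; both function names are made up for illustration. The first relies on git-crypt writing a 10-byte header (a NUL byte, the text `GITCRYPT`, another NUL) at the start of every encrypted file. The second assumes the image has `/bin/sh`, `id`, and a user named `bind`.

```bash
# Hypothetical helper: succeed if the file still carries the git-crypt header,
# meaning it is still encrypted and should not be copied into /etc/bind.
is_git_crypt_encrypted() {
    # Strip the NUL bytes before comparing, since shell strings cannot hold them.
    [ "$(head -c 10 -- "$1" | tr -d '\000')" = "GITCRYPT" ]
}

# Hypothetical helper: print "UID:GID" of the bind user inside the given image.
# Assumes the image provides /bin/sh, id, and a user called bind.
container_bind_ids() {
    docker run --rm --entrypoint=/bin/sh "$1" -c 'echo "$(id -u bind):$(id -g bind)"'
}
```

For example, `is_git_crypt_encrypted some.key && echo "still encrypted"` before copying a key file, and `container_bind_ids "$IMAGE"` to compare against the owner of the files under `/etc/bind`.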
In general, if the container doesn't start, the next step is to get a shell in the container and run `named` by hand in the foreground. The way I did this last:
```bash
export IMAGE=docker.io/internetsystemsconsortium/bind9:9.18
docker run -ti --rm --entrypoint=/bin/sh -v /etc/bind/cfg:/etc/bind -v /etc/bind/cache:/var/cache/bind -v /etc/bind/zones:/var/lib/bind -v /etc/bind/log:/var/log "$IMAGE"
# now inside the container
/usr/sbin/named -u bind -g -c /etc/bind/named.conf
```
Then look at the logging. There is a lot of noise (which we should probably investigate), but there should be a real error in there.
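
One way to cut through the noise is to filter the output for lines that look like failures. A rough sketch: `named_errors` is a made-up helper, and the pattern list is a guess rather than a catalogue of the messages `named` actually emits, so a miss here does not mean there was no error.

```bash
# Hypothetical helper: keep only lines that look like real problems.
# The patterns are guesses; widen them if nothing useful shows up.
named_errors() {
    grep -iE 'error|fatal|failed|permission denied|not found'
}

# Usage, from the shell inside the container (named -g logs to stderr):
#   /usr/sbin/named -u bind -g -c /etc/bind/named.conf 2>&1 | named_errors
```

The function has to be defined in the same shell that runs `named`, so paste it into the container shell first.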