People often report and ask about these "failures".
More-so previously, when the `docker kill/rm` output was collected,
but it still happens now when people do `systemctl status
matrix-something` and notice that it says "FAILURE".
Suppressing to avoid further time being wasted on saying "this is
expected".
The `to_nice_yaml` helper will by default wrap any string YAML values on
the first space after column 80. This can in worst case yield invalid
YAML syntax. More details in Ansible's documentation here:
https://docs.ansible.com/ansible/latest/user_guide/playbooks_filters.html#formatting-data-yaml-and-json
In short, you need to explicitly provide a custom width argument of a
high number of some kind to avoid the line wrapping.
Reverts b1b4ba501f, 90c9801c56, a3c84f78ca, ..
I haven't really traced it (yet), but on some servers, I'm observing
`ansible-playbook ... --tags=start` completing very slowly, waiting
to stop services. I can't reproduce this on all Matrix servers I manage.
I suspect that either the systemd version is to blame or that some
specific service is not responding well to some `docker kill/rm` command.
`ExecStop` seems to work great in all cases and it's what we've been
using for a very long time, so I'm reverting to that.
Until now, we were leaving services "enabled"
(symlinks in /etc/systemd/system/multi-user.target.wants/).
We clean these up now. Broken symlinks may still exist in older
installations that enabled/disabled services. We're not taking care
to fix these up. It's just a cosmetic defect anyway.
These are just defensive cleanup tasks that we run.
In the good case, there's nothing to kill or remove, so they trigger an
error like this:
> Error response from daemon: Cannot kill container: something: No such container: something
and:
> Error: No such container: something
People often ask us if this is a problem, so instead of always having to
answer with "no, this is to be expected", we'd rather eliminate it now
and make logs cleaner.
In the event that:
- a container is really stuck and needs cleanup using kill/rm
- and cleanup fails, and we fail to report it because of error
suppression (`2>/dev/null`)
.. we'd still get an error when launching ("container name already in use .."),
so it shouldn't be too hard to investigate.
Now that 0.7.2 is out, the Docker image supports Postgres
and we can do the (SQLite -> Postgres) migration.
I've also found out that we needed to fix up the `tokens.ex_date` column
data type a bit to prevent matrix-registration from raising exceptions
when comparing `datetime.now()` with `ex_date` coming from the database.
Example:
> File "/usr/local/lib/python3.8/site-packages/matrix_registration/tokens.py", line 58, in valid
> expired = self.ex_date < datetime.now()
> TypeError: can't compare offset-naive and offset-aware datetimes
This switches us to a container image maintained by the
matrix-registration developer.
0.7.2 also supports a `base_url` configuration option we can use to
make it easier to reverse-proxy at a different base URL.
We still keep some workarounds, because of this issue:
https://github.com/ZerataX/matrix-registration/issues/47
Auto-migration and everything seems to work. It's just that
matrix-registration cannot load the Python modules required
for talking to a Postgres database.
Tracked here: https://github.com/ZerataX/matrix-registration/issues/44
Until this gets fixed, we'll continue default to 'sqlite'.
v0.7.0 is broken right now, because it calls
`/_matrix/client/r0/admin/register`, which is now at
`/_synapse/admin/v1/register`.
This has been fixed here: 6b26255fea
.. but it's not part of any release.
Switching to `master` (`docker.io/devture/zeratax-matrix-registration:latest`) until it gets resolved.
Reported upstream here: https://github.com/ZerataX/matrix-registration/issues/43
The Docker 19.04 -> 20.10 upgrade contains the following change
in `/usr/lib/systemd/system/docker.service`:
```
-BindsTo=containerd.service
-After=network-online.target firewalld.service containerd.service
+After=network-online.target firewalld.service containerd.service multi-user.target
-Requires=docker.socket
+Requires=docker.socket containerd.service
Wants=network-online.target
```
The `multi-user.target` requirement in `After` seems to be in conflict
with our `WantedBy=multi-user.target` and `After=docker.service` /
`Requires=docker.service` definitions, causing the following error on
startup for all of our systemd services:
> Job matrix-synapse.service/start deleted to break ordering cycle starting with multi-user.target/start
A workaround which appears to work is to add `DefaultDependencies=no`
to all of our services.
`-v` magically creates the source destination as a directory,
if it doesn't exist already. We'd like to avoid this magic
and the potential breakage that it might cause.
We'd rather fail while Docker tries to find things to `--mount`
than have it automatically create directories and fail anyway,
while having contaminated the filesystem.
There's a lot more `-v` instances remaining to be fixed later on.
This is just some start.
Things like `matrix_synapse_container_additional_volumes` and
`matrix_nginx_proxy_container_additional_volumes` were not changed to
use `--mount`, as options for each one are passed differently
(`ro` is `ro`, but `rw` doesn't exist and `slave` is `bind-propagation=slave`).
To avoid breaking people's custom volume mounts, we keep it as it is for now.
A deficiency with `--mount` is that it lacks the `z` option (SELinux
ownership changes), and some of our `-v` instances use that. I'm not
sure how supported SELinux is for us right now, but it might be,
and breaking that would not be a good idea.
Fixes https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/716
This patch makes us use more fully-qualified container image names
(either prefixed with docker.io/ or with localhost/).
The latter happens when self-building is enabled.
We've recently had issues where if an image was removed manually
and the service was restarted (making `docker run` fetch it from Docker Hub, etc.),
we'd end up with a pulled image, even though we're aiming for a self-built one.
Re-running the playbook would then not do a rebuild, because:
- the image with that name already exists (even though it's something
else)
- we sometimes had conditional logic where we'd build only if the git
repo changed
By explicitly changing the name of the images (prefixing with localhost/),
we avoid such confusion and the possibility that we'd automatically pul something
which is not what we expect.
Also, I've removed that condition where building would happen on git
changes only. We now always build (unless an image with that name
already exists). We just force-build when the git repo changes.