# Storing Synapse media files on Amazon S3 with synapse-s3-storage-provider (optional)
If you'd like to store Synapse's content repository (`media_store`) files on Amazon S3 (or another S3-compatible service),
you can use the [synapse-s3-storage-provider](https://github.com/matrix-org/synapse-s3-storage-provider) media provider module for Synapse.
An alternative (which has worse performance) is to use [Goofys to mount the S3 store to the local filesystem](configuring-playbook-s3-goofys.md).
## How it works
The summary below is based on [this article](https://quentin.dufour.io/blog/2021-09-14/matrix-synapse-s3-storage/).
The way media storage providers in Synapse work has some caveats:
- Synapse continues to use locally-stored files (for creating thumbnails, serving files, etc)
- the media storage provider is just an extra storage mechanism (in addition to the local filesystem)
- all files are stored locally at first, and then copied to the media storage provider (either synchronously or asynchronously)
- if a file is not available on the local filesystem, it's pulled from a media storage provider
You may be thinking **if all files are stored locally as well, what's the point**?
You can run some scripts to delete the local files once in a while, thus freeing up local disk space. If these files are needed in the future (for serving them to users, etc.), Synapse will pull them from the media storage provider on demand.
While you will still need some local disk space, it only has to hold the files currently in use (for creating thumbnails, serving files, etc.) and won't grow as large as your S3 store.
## Installing
After [creating the S3 bucket and configuring it](configuring-playbook-s3.md#bucket-creation-and-security-configuration), you can proceed to configure `synapse-s3-storage-provider` in your configuration file (`inventory/host_vars/matrix.<your-domain>/vars.yml`):
```yaml
matrix_synapse_ext_synapse_s3_storage_provider_enabled: true
matrix_synapse_ext_synapse_s3_storage_provider_config_bucket: your-bucket-name
matrix_synapse_ext_synapse_s3_storage_provider_config_region_name: some-region-name # e.g. eu-central-1
matrix_synapse_ext_synapse_s3_storage_provider_config_endpoint_url: https://.. # delete this whole line for Amazon S3
matrix_synapse_ext_synapse_s3_storage_provider_config_access_key_id: access-key-goes-here
matrix_synapse_ext_synapse_s3_storage_provider_config_secret_access_key: secret-key-goes-here
matrix_synapse_ext_synapse_s3_storage_provider_config_storage_class: STANDARD # or STANDARD_IA, etc.
# For additional advanced settings, take a look at `roles/matrix-synapse/defaults/main.yml`
```
If you have existing files in Synapse's media repository (`/matrix/synapse/storage/media-store/..`):
- new files will start being stored both locally and on the S3 store
- the existing files will remain only on the local filesystem until you [migrate them to the S3 store](#migrating-your-existing-media-files-to-the-s3-store)
- at some point (and periodically in the future), you can delete local files which have already been uploaded to the S3 store
## Migrating your existing media files to the S3 store
Migrating your existing data can happen in multiple ways:
- [using the `s3_media_upload` script from `synapse-s3-storage-provider`](#using-the-s3_media_upload-script-from-synapse-s3-storage-provider) (very slow when dealing with lots of data)
- [using another tool in combination with `s3_media_upload`](#using-another-tool-in-combination-with-s3_media_upload) (quicker when dealing with lots of data)
### Using the `s3_media_upload` script from `synapse-s3-storage-provider`
Instead of using `s3_media_upload` directly, which is very slow and painful for an initial data migration, we recommend [using another tool in combination with `s3_media_upload`](#using-another-tool-in-combination-with-s3_media_upload).
To copy your existing files, SSH into the server and run `/usr/local/bin/matrix-synapse-s3-storage-provider-shell`.
This launches a Synapse container which has access to the local media store, the Postgres database and the S3 store, and has some convenient environment variables configured for you to use (`MEDIA_PATH`, `BUCKET`, `ENDPOINT`, `UPDATE_DB_DURATION`, etc).
Then use the following commands (`$` values come from environment variables - they're **not placeholders** that you need to substitute):
- `s3_media_upload update-db $UPDATE_DB_DURATION` - create a local SQLite database (`cache.db`) with a list of media repository files (from the `synapse` Postgres database) eligible for operating on
  - `$UPDATE_DB_DURATION` is influenced by the `matrix_synapse_ext_synapse_s3_storage_provider_update_db_day_count` variable (defaults to `0`)
  - `$UPDATE_DB_DURATION` defaults to `0d` (0 days), which means **include files which haven't been accessed for more than 0 days** (that is, **all files will be included**).
- `s3_media_upload check-deleted $MEDIA_PATH` - check whether files in the local cache still exist in the local media repository directory
- `s3_media_upload upload $MEDIA_PATH $BUCKET --delete --endpoint-url $ENDPOINT` - upload locally-stored files to S3 and delete them from the local media repository directory
The `upload` command may take a lot of time to complete.
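If you run these steps more than once, it may be convenient to wrap them in a small shell function. The following is only a sketch: `migrate_media`, `run` and `DRY_RUN` are hypothetical names (not something the playbook provides), and the function assumes it is pasted into the shell started by `matrix-synapse-s3-storage-provider-shell`, where `MEDIA_PATH`, `BUCKET`, `ENDPOINT` and `UPDATE_DB_DURATION` are already set.

```sh
# Sketch only. `run`, `migrate_media` and DRY_RUN are hypothetical helpers,
# not part of the playbook or of synapse-s3-storage-provider.

# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

migrate_media() {
  # 1. Build the local cache.db with the list of eligible media files.
  run s3_media_upload update-db "$UPDATE_DB_DURATION" || return 1
  # 2. Check whether the files listed in the cache still exist locally.
  run s3_media_upload check-deleted "$MEDIA_PATH" || return 1
  # 3. Upload the remaining files to S3 and delete the local copies.
  run s3_media_upload upload "$MEDIA_PATH" "$BUCKET" --delete --endpoint-url "$ENDPOINT"
}
```

Calling `migrate_media` on its own only prints the three commands, so you can review them first. `DRY_RUN=0 migrate_media` actually runs them and stops at the first step that fails.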
### Using another tool in combination with `s3_media_upload`
To migrate your existing local data to S3, we recommend that you:
- **first** use another tool ([`aws s3`](#copying-data-to-amazon-s3) or [`b2 sync`](#copying-data-to-backblaze-b2), etc.) to copy the local files to the S3 bucket
- **only then** [use the `s3_media_upload` tool to finish the migration](#using-the-s3_media_upload-script-from-synapse-s3-storage-provider) (this checks to ensure all files are uploaded and then deletes the local files)
#### Copying data to Amazon S3
Generally, you need to use the `aws s3` tool.
This documentation section could use an improvement. Ideally, we'd come up with a guide like the one used in [Copying data to Backblaze B2](#copying-data-to-backblaze-b2) - running `aws s3` in a container, etc.
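Until such a guide exists, a rough equivalent of the Backblaze B2 command might look like the sketch below. It is untested against this playbook and rests on assumptions: it uses the `amazon/aws-cli` container image (whose entrypoint is `aws`), it takes the media store path from the Backblaze B2 example, and `S3_BUCKET_NAME`, `DRY_RUN`, `run` and `copy_media_to_s3` are hypothetical names introduced for illustration.

```sh
# Sketch only. The amazon/aws-cli image and the names S3_BUCKET_NAME, DRY_RUN,
# run and copy_media_to_s3 are assumptions, not something the playbook provides.
S3_BUCKET_NAME="${S3_BUCKET_NAME:-YOUR_BUCKET_NAME_GOES_HERE}"

# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

copy_media_to_s3() {
  # `--env NAME` without a value passes NAME through from the calling shell,
  # so export AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION
  # before a real run. The media store is mounted read-only at /work.
  run docker run --rm \
    --env AWS_ACCESS_KEY_ID \
    --env AWS_SECRET_ACCESS_KEY \
    --env AWS_DEFAULT_REGION \
    --mount type=bind,src=/matrix/synapse/storage/media-store,dst=/work,ro \
    amazon/aws-cli \
    s3 sync /work "s3://$S3_BUCKET_NAME"
}

copy_media_to_s3
```

By default this only prints the `docker run` command. Running the script with `DRY_RUN=0` performs the actual sync; since the local media store is mounted read-only, the copy cannot modify your local files.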
#### Copying data to Backblaze B2
To copy to Backblaze B2, start a container like this:
```sh
docker run -it --rm \
  -w /work \
  --env='B2_KEY_ID=YOUR_KEY_GOES_HERE' \
  --env='B2_KEY_SECRET=YOUR_SECRET_GOES_HERE' \
  --env='B2_BUCKET_NAME=YOUR_BUCKET_NAME_GOES_HERE' \
  --mount type=bind,src=/matrix/synapse/storage/media-store,dst=/work,ro \
  --entrypoint=/bin/sh \
  tianon/backblaze-b2:3.6.0 \
  -c 'b2 authorize-account $B2_KEY_ID $B2_KEY_SECRET > /dev/null && b2 sync /work b2://$B2_BUCKET_NAME --skipNewer'
```