Django on Fly.io with Litestream/LiteFS
I’ve been playing with Fly.io a lot recently (see my series on deploying Wagtail to Fly.io).
One of the neat things that has come out of Fly is a renewed interest across the dev world in SQLite - an embedded database that doesn’t need any special servers or delicate configuration.
Some part of this interest comes from the thought that if you had an SQLite database that sat right next to your application, in the same VM, with no network latency, that’s probably going to be pretty quick and pretty easy to deploy.
Although in some ways it feels like this idea comes full circle back to the days of running a MySQL Server alongside our PHP application on a single VPS, we’re also in an era where we need to deal with things like geographic distribution, ephemeral filesystems and scale-to-zero.
So we want to run our apps in a nice PaaS, and we also quite like the idea of our database being local to our application code, but there are a few conflicts here:
- PaaS tools like Heroku/Fly tend to offer ephemeral storage, or no guarantees on the safety of storage. Trying to keep an SQLite database around on this sort of storage just won’t work out.
- A common approach to scaling is to “scale out” - start up more instances of your application and load balance between them. How would that work with SQLite? Even if you could access the same database file from each instance, we’re re-introducing latency and as SQLite can’t be written to by multiple processes at once, we’re probably slowing our app down too.
Thankfully, Fly have been funding the development of some interesting tools, Litestream and LiteFS, which aim to solve this.
The difference between the two is not particularly obvious, so to summarise:
- Litestream was Ben Johnson’s first attempt at solving this problem, and is now focused primarily on disaster recovery. It’s a tool to stream all the changes made to your SQLite database to some remote storage, like S3, and then recover from it when you need to.

  This is great, and it nicely solves our first conflict. Our application can be configured to restore the database from remote storage when it starts, and we can be safe knowing that any changes are being backed up as our application runs.

  Unfortunately, it doesn’t solve our second problem: replicating our database to other instances of our app if we decide to scale out. While there were plans (and an initial implementation) for this in Litestream, live replication was instead moved to the second project, LiteFS.
- LiteFS does some magic with FUSE to allow it to intercept SQLite transactions and then replicate them to multiple instances of your application. It’s a little more complicated, as you need additional tools like Consul so that it knows where to find the primary instance (where it will direct queries that write to the database), but it solves our second conflict!

  Alas, our first conflict isn’t yet solved by LiteFS - if all your nodes go away, there’s nowhere to replicate your database from, so it too will disappear. S3 replication like Litestream’s is on the roadmap, however, so it seems like LiteFS is set to solve all our problems!
So now that we know what these tools do, let’s experiment with getting our Django applications running with them on Fly.io.
Litestream
For Litestream, we’ll need:
- An S3-compatible storage bucket and access keys
- Our Django app, ideally configured with `dj-database-url` for convenience (there’s a settings sketch just after this list)
- The `litestream` binary available to our application. I have the following in my Dockerfile:

  ```
  wget https://github.com/benbjohnson/litestream/releases/download/v0.3.9/litestream-v0.3.9-linux-amd64.deb \
      && dpkg -i litestream-v0.3.9-linux-amd64.deb
  ```
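For reference, the `dj-database-url` part is only a couple of lines in `settings.py`. Here’s a minimal sketch - the `default=` fallback is just an illustrative value for local development:

```python
# settings.py - read the database configuration from the DATABASE_URL env var.
# The default= value is an illustrative fallback for local development.
import dj_database_url

DATABASES = {
    "default": dj_database_url.config(default="sqlite:///db.sqlite3"),
}
```

With those in place: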
- Prepare your Fly application with `flyctl launch` (we don’t need a Postgres database if it asks).
- Set all the environment variables we’re going to need by creating a new file (call it something like `env-vars`):

  ```
  LITESTREAM_ACCESS_KEY_ID=[your S3-compatible access key]
  LITESTREAM_SECRET_ACCESS_KEY=[your S3-compatible secret key]
  DB_DIR=/db
  DATABASE_URL=sqlite:////db/db.sqlite
  S3_DB_URL=s3://[your-bucket-name]/db
  ```

  `DB_DIR` will be the directory where the database is replicated to, `DATABASE_URL` is the path where Django and `dj-database-url` can find your database file, and `S3_DB_URL` is the path to your S3-compatible bucket.

  Run `flyctl secrets import < env-vars` to import these values into your Fly environment.
- Create a `litestream.yml`:

  ```yaml
  exec: uwsgi

  dbs:
    - path: "$DB_DIR/db.sqlite"
      replicas:
        - url: "$S3_DB_URL"
  ```

  Replace your `exec` section with whatever you normally run to start your web server. Litestream will do its stuff and conveniently run our application for us, exiting when our server exits.
- Create a script, `start.sh`, that will run on application start to make sure all our directories are created:

  ```bash
  #!/usr/bin/env bash

  if [[ -z "$DB_DIR" ]]; then
      echo "DB_DIR env var not specified - this should be a path of the directory where the database file should be stored"
      exit 1
  fi

  if [[ -z "$S3_DB_URL" ]]; then
      echo "S3_DB_URL env var not specified - this should be an S3-style URL to the location of the replicated database file"
      exit 1
  fi

  mkdir -p "$DB_DIR"

  litestream restore -if-db-not-exists -if-replica-exists -o "$DB_DIR/db.sqlite" "$S3_DB_URL"

  ./manage.py migrate --noinput
  ./manage.py createcachetable

  chmod -R a+rwX "$DB_DIR"

  exec litestream replicate -config litestream.yml
  ```
  This:

  - Checks important environment variables are set.
  - Creates the database directory and makes sure it’s open enough for the app to read/write to it (you might choose to tighten this up if appropriate).
  - Restores the database using Litestream if it doesn’t already exist.
  - Runs `migrate` to make sure the database is up to date (or creates it if there wasn’t anything to restore), and `createcachetable` (there’s a note on the cache settings below).
  - Runs `litestream replicate`, which will in turn run the `exec` command in the Litestream config, starting the application.
Update your Docker `CMD` to run this `start.sh`.
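As an aside: `start.sh` runs `./manage.py createcachetable`, which only does anything if you’re using Django’s database cache backend (convenient here, since the cache then lives in the same SQLite database that’s being replicated). If that’s your setup, the relevant settings look something like this sketch - the table name is an arbitrary choice:

```python
# settings.py - database-backed cache, so createcachetable has a table to create.
# "django_cache" is an arbitrary table name.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        "LOCATION": "django_cache",
    }
}
```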
Once deployed with `flyctl deploy`, Litestream will start backing up your database. Careful, though: if you try to scale out by adding more instances, at best you’ll see out-of-sync data, and at worst you’ll end up with a corrupt database.
LiteFS
The following is not battle-tested and is not suitable for production, but if you’re interested in experimenting, it’s a bit of fun.
For LiteFS, we’ll need:
- Our Django app, ideally configured with `dj-database-url` for convenience
- The `litefs` binary available to our application. I have the following in my Dockerfile (alternatively, copy the binary from the `litefs` image):

  ```dockerfile
  RUN ARCH="amd64" && \
      VERSION="0.2.0" && \
      cd $(mktemp --directory) && \
      wget "https://github.com/superfly/litefs/releases/download/v${VERSION}/litefs-v${VERSION}-linux-${ARCH}.tar.gz" && \
      tar xvf "litefs-v${VERSION}-linux-${ARCH}.tar.gz" && \
      mv litefs /usr/local/bin
  ```

- Some way to make sure our write requests only end up with the primary (we’ll come back to this).
- Prepare your Fly application with `flyctl launch` (we don’t need a Postgres database if it asks).
- Set all the environment variables we’re going to need by creating a new file (call it something like `env-vars`):

  ```
  DB_DIR=/db
  DATABASE_URL=sqlite:////db/db.sqlite
  ```

  `DB_DIR` will be the directory where the database is replicated to and `DATABASE_URL` is the path where Django and `dj-database-url` can find your database file.

  Run `flyctl secrets import < env-vars` to import these values into your Fly environment.
- In your `fly.toml`, add:

  ```toml
  [experimental]
  enable_consul = true
  ```

  This gives us access to the shared Fly.io-managed Consul instance.
- Create a `litefs.yml`:

  ```yaml
  exec: ./run.sh

  mount-dir: "$DB_DIR"
  data-dir: "/db-data"

  http:
    addr: ":20202"

  consul:
    url: "${FLY_CONSUL_URL}"
    advertise-url: "http://${FLY_REGION}.${FLY_APP_NAME}.internal:20202"
  ```

  Replace your `exec` section with whatever you normally run to start your web server.

  The `mount-dir` is where LiteFS will create its filesystem (where the database will live), and `data-dir` is where it keeps the files it needs for replication. The `http` and `consul` blocks tell the LiteFS instances how to talk to each other and where to find the Fly.io-managed Consul instance.
- Create the `run.sh` that is started by LiteFS. We need things like migrations to run after LiteFS has set up its filesystem, so we do those in this script:

  ```bash
  #!/usr/bin/env bash

  set -e

  echo "Starting app"

  ./manage.py migrate --noinput || true
  ./manage.py createcachetable

  chmod -R a+rwX "$DB_DIR"

  exec /venv/bin/uwsgi
  ```
- Create a script, `start.sh`, that will run on application start to make sure all our directories are created:

  ```bash
  #!/usr/bin/env bash

  if [[ -z "$DB_DIR" ]]; then
      echo "DB_DIR env var not specified - this should be a path of the directory where the database file should be stored"
      exit 1
  fi

  mkdir -p "$DB_DIR"

  exec litefs -config litefs.yml
  ```
  This:

  - Checks the `DB_DIR` environment variable is set.
  - Creates the database directory.
  - Starts LiteFS, which in turn runs the `run.sh` above via the `exec` command in `litefs.yml`.
Update your Docker `CMD` to run this `start.sh`.
. -
We’re not there yet. We need to make sure database writes only go to our primary. To do this, we’ll register a database
execute_wrapper
which intercepts any write queries. I’ve got this in mybase
app’s__init__.py
(heavily based on Adam Johnson’sdjango-read-only
):import os import os.path from pathlib import Path from typing import Any, Callable, Generator from django.apps import AppConfig from djang.conf import settings from django.db import connections from django.db.backends.base.base import BaseDatabaseWrapper from django.db.backends.signals import connection_created read_only = False class BaseConfig(AppConfig): name = "bakerydemo.base" verbose_name = "base" def ready(self) -> None: db_name = settings.DATABASES['default']['NAME'] db_dir = Path(db_name).parent primary_path = db_dir / ".primary" if not primary_path.is_file(): return for alias in connections: connection = connections[alias] install_hook(connection) connection_created.connect(install_hook) def install_hook(connection: BaseDatabaseWrapper, **kwargs: object) -> None: if blocker not in connection.execute_wrappers: connection.execute_wrappers.insert(0, blocker) class QueriesAttemptedError(Exception): pass def blocker( execute: Callable[[str, str, bool, dict[str, Any]], Any], sql: str, params: str, many: bool, context: dict[str, Any], ) -> Any: if should_block(sql): raise QueriesAttemptedError(msg) return execute(sql, params, many, context) def should_block(sql: str) -> bool: return not sql.lstrip(" \n(").startswith( ( "EXPLAIN ", "PRAGMA ", "ROLLBACK TO SAVEPOINT ", "RELEASE SAVEPOINT ", "SAVEPOINT ", "SELECT ", "SET ", ) ) and sql not in ("BEGIN", "COMMIT", "ROLLBACK")
  This will raise an exception if the query would write to the database and the `.primary` file created by LiteFS exists (meaning this instance is not the primary).
- We need something to intercept this exception, so add some middleware:

  ```python
  from pathlib import Path

  from django.conf import settings
  from django.http import HttpResponse

  from . import QueriesAttemptedError


  def replay_middleware(get_response):
      def middleware(request):
          try:
              response = get_response(request)
          except QueriesAttemptedError:
              # Find the name of the primary instance by reading the ".primary"
              # file that LiteFS creates next to the database on replicas.
              db_name = settings.DATABASES["default"]["NAME"]
              db_dir = Path(db_name).parent
              primary_path = db_dir / ".primary"
              primary = primary_path.read_text().strip()

              # Ask Fly.io to replay this request against the primary instance.
              res = HttpResponse()
              res.headers["fly-replay"] = f"instance={primary}"
              return res
          return response

      return middleware
  ```

  and register it in your `MIDDLEWARE` settings (there’s a sketch just after this list).

  This catches the exception raised by the previously registered `execute_wrapper`, finds out where the primary instance is hosted, and returns a `fly-replay` header telling Fly.io: “Sorry, I can’t handle this request, please replay it on the database primary”.
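Registering it looks something like the following sketch - the dotted path is an assumption (it presumes `replay_middleware` lives in `bakerydemo/base/middleware.py`), so adjust it to wherever you defined yours. Listing it first means it wraps everything below it:

```python
# settings.py - the dotted path to replay_middleware is an assumption; adjust
# it to match where the middleware is actually defined.
MIDDLEWARE = [
    "bakerydemo.base.middleware.replay_middleware",
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ...the rest of your existing middleware...
]
```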
Once deployed with `flyctl deploy`, LiteFS will start replicating your database!
The Future
These are fun tools to play with for now, but there’s clearly a lot of work to get them working with our normal apps.
I’m excited about how they could make getting a Django/Wagtail app deployed much more accessible, easier and cheaper, but there’s still some work to be done to make that a reality.
The LiteFS roadmap includes things like S3 replication (so we get similar backup features to Litestream), and write forwarding (so writes to read-replicas will automatically be forwarded to the primary). There’s a lot of promise there and I can’t wait to make more use of it!