Django on Fly.io with Litestream/LiteFS

I’ve been playing with Fly.io a lot recently (see my series on deploying Wagtail to Fly.io).

One of the neat things that has come out of Fly is a renewed interest across the dev world in SQLite - an embedded database that doesn’t need any special servers or delicate configuration.

Some part of this interest comes from the thought that if you had an SQLite database that sat right next to your application, in the same VM, with no network latency, that’s probably going to be pretty quick and pretty easy to deploy.

Although in some ways it feels like this idea comes full circle back to the days of running a MySQL Server alongside our PHP application on a single VPS, we’re also in an era where we need to deal with things like geographic distribution, ephemeral filesystems and scale-to-zero.

So we want to run our apps in a nice PaaS, and we also quite like the idea of our database being local to our application code, but there are a few conflicts here: ephemeral filesystems can lose a local database file on every deploy or restart, and a single local file doesn't play nicely with geographic distribution or scale-to-zero.

Thankfully, Fly have been funding the development of some interesting tools, Litestream and LiteFS, which aim to solve this.

The difference between these tools is not particularly obvious, so to summarise: Litestream continuously replicates an SQLite database to S3-compatible object storage, giving us streaming backup and restore; LiteFS replicates the database between instances through a filesystem layer, giving us read replicas alongside a single writable primary.

Now we know what these tools do, let's experiment with getting our Django applications running with them on Fly.io.

Litestream

For Litestream, we’ll need to:

  1. Prepare your Fly application with flyctl launch (we don’t need a Postgres database if it asks).

  2. Set all the environment variables we’re going to need by creating a new file (call it something like env-vars):

    LITESTREAM_ACCESS_KEY_ID=[your S3-compatible access key]
    LITESTREAM_SECRET_ACCESS_KEY=[your S3 compatible secret key]
    DB_DIR=/db
    DATABASE_URL=sqlite:////db/db.sqlite
    S3_DB_URL=s3://[your-bucket-name]/db

    DB_DIR will be the directory where the database file lives (and is restored to), DATABASE_URL is the URL where Django and dj-database-url can find your database file, and S3_DB_URL is the path to your S3-compatible bucket.

    Run

    flyctl secrets import < env-vars

    to import these values into your Fly environment.

  3. Create a litestream.yml:

    exec: uwsgi
    dbs:
        - path: "$DB_DIR/db.sqlite"
          replicas:
            - url: "$S3_DB_URL"

    Replace the exec section with whatever you normally run to start your web server. Litestream will handle replication and conveniently run our application for us, exiting when our server exits.

  4. Create a script, start.sh, that will run on application start to make sure all our directories are created:

    #!/usr/bin/env bash
    if [[ -z "$DB_DIR" ]]; then
        echo "DB_DIR env var not specified - this should be a path of the directory where the database file should be stored"
        exit 1
    fi
    if [[ -z "$S3_DB_URL" ]]; then
        echo "S3_DB_URL env var not specified - this should be an S3-style URL to the location of the replicated database file"
        exit 1
    fi
    
    mkdir -p "$DB_DIR"
    
    litestream restore -if-db-not-exists -if-replica-exists -o "$DB_DIR/db.sqlite" "$S3_DB_URL"
    
    ./manage.py migrate --noinput
    ./manage.py createcachetable
    
    chmod -R a+rwX "$DB_DIR"
    
    exec litestream replicate -config litestream.yml

    This:

    • Checks important environment variables are set.
    • Creates a database directory and makes sure it’s open enough for the app to read/write to it (you might choose to tighten this up if appropriate).
    • Restores the database using litestream if it doesn’t already exist.
    • Runs migrate to make sure the database is up to date (or creates it if there wasn’t anything to restore).
    • Runs litestream replicate which will in turn run the exec command in the litestream config, starting the application.

    Update your Docker CMD to run this start.sh.

Once deployed with flyctl deploy, Litestream will start backing up your database. Be careful: if you try to scale out by adding more instances, at best you’ll see out-of-sync data; at worst you’ll end up with a corrupt database.
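That four-slash DATABASE_URL can look odd the first time you see it: dj-database-url turns it into a Django DATABASES entry pointing at an absolute file path. A minimal sketch of the sqlite case (the real library handles many more schemes and options, so this is illustrative only):

```python
def sqlite_config_from_url(url: str) -> dict:
    """Rough sketch of what dj-database-url does for sqlite URLs:
    sqlite:////db/db.sqlite -> absolute path /db/db.sqlite
    (three slashes mark an empty host; the fourth starts the path)."""
    prefix = "sqlite:///"
    if not url.startswith(prefix):
        raise ValueError(f"not a sqlite URL: {url!r}")
    return {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": url[len(prefix):],
    }

config = sqlite_config_from_url("sqlite:////db/db.sqlite")
print(config["NAME"])  # /db/db.sqlite
```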

LiteFS

The following is not battle-tested and is not suitable for production, but if you’re interested in experimenting, it’s a bit of fun.

For LiteFS, we’ll need:

  1. Prepare your Fly application with flyctl launch (we don’t need a Postgres database if it asks).

  2. Set all the environment variables we’re going to need by creating a new file (call it something like env-vars):

    DB_DIR=/db
    DATABASE_URL=sqlite:////db/sqlite.db

    DB_DIR will be the directory where LiteFS mounts the database filesystem and DATABASE_URL is the URL where Django and dj-database-url can find your database file.

    Run

    flyctl secrets import < env-vars

    to import these values into your Fly environment.

  3. In your fly.toml, add:

    [experimental]
        enable_consul = true
    

    This gives us access to the shared Fly.io-managed Consul instance.

  4. Create a litefs.yml:

    exec: ./run.sh
    
    mount-dir: "$DB_DIR"
    data-dir: "/db-data"
    
    http:
      addr: ":20202"
    
    consul:
      url: "${FLY_CONSUL_URL}"
      advertise-url: "http://${FLY_REGION}.${FLY_APP_NAME}.internal:20202"

    Replace the exec section with whatever you normally run to start your web server.

    The mount-dir is where LiteFS will create its filesystem (where the database will live), and data-dir is where it keeps the files it needs for replication. The http and consul blocks tell LiteFS instances how to talk to each other and where to find the Fly.io-managed Consul instance.

  5. Create the run.sh that is started by LiteFS. We need things like migrations to run after LiteFS has set up its filesystem, so we do those in this script:

    #!/usr/bin/env bash
    set -e
    echo "Starting app"
    
    ./manage.py migrate --noinput || true
    ./manage.py createcachetable
    chmod -R a+rwX "$DB_DIR"
    
    exec /venv/bin/uwsgi

  6. Create a script, start.sh, that will run on application start to make sure all our directories are created:

    #!/usr/bin/env bash
    if [[ -z "$DB_DIR" ]]; then
        echo "DB_DIR env var not specified - this should be a path of the directory where the database file should be stored"
        exit 1
    fi
    
    mkdir -p "$DB_DIR"
    
    exec litefs -config litefs.yml

    This:

    • Checks the DB_DIR environment variable is set.
    • Creates the mount directory.
    • Starts LiteFS, which mounts its filesystem and then runs the exec command in the LiteFS config (our run.sh).

    Update your Docker CMD to run this start.sh.

  7. We’re not there yet. We need to make sure database writes only go to our primary. To do this, we’ll register a database execute_wrapper which intercepts any write queries. I’ve got this in my base app’s __init__.py (heavily based on Adam Johnson’s django-read-only):

    from pathlib import Path
    from typing import Any, Callable
    
    from django.apps import AppConfig
    from django.conf import settings
    from django.db import connections
    from django.db.backends.base.base import BaseDatabaseWrapper
    from django.db.backends.signals import connection_created
    
    class BaseConfig(AppConfig):
        name = "bakerydemo.base"
        verbose_name = "base"
    
        def ready(self) -> None:
            # LiteFS creates a .primary file (containing the primary's name)
            # only on replicas, so its presence means this instance must not
            # write to the database.
            db_name = settings.DATABASES['default']['NAME']
            db_dir = Path(db_name).parent
            primary_path = db_dir / ".primary"
            if not primary_path.is_file():
                return
            for alias in connections:
                connection = connections[alias]
                install_hook(connection)
            connection_created.connect(install_hook)
    
    
    def install_hook(connection: BaseDatabaseWrapper, **kwargs: object) -> None:
        if blocker not in connection.execute_wrappers:
            connection.execute_wrappers.insert(0, blocker)
    
    
    class QueriesAttemptedError(Exception):
        pass
    
    
    def blocker(
        execute: Callable[[str, Any, bool, dict[str, Any]], Any],
        sql: str,
        params: Any,
        many: bool,
        context: dict[str, Any],
    ) -> Any:
        if should_block(sql):
            raise QueriesAttemptedError(
                "Write queries are not allowed on this read replica"
            )
        return execute(sql, params, many, context)
    
    
    def should_block(sql: str) -> bool:
        return not sql.lstrip(" \n(").startswith(
            (
                "EXPLAIN ",
                "PRAGMA ",
                "ROLLBACK TO SAVEPOINT ",
                "RELEASE SAVEPOINT ",
                "SAVEPOINT ",
                "SELECT ",
                "SET ",
            )
        ) and sql not in ("BEGIN", "COMMIT", "ROLLBACK")

    This raises an exception when a query would write to the database and the .primary file created by LiteFS exists (meaning this instance is not the primary).

  8. We need something to intercept this exception, so add some middleware:

    from pathlib import Path
    
    from django.conf import settings
    from django.http import HttpResponse
    
    from . import QueriesAttemptedError
    
    def replay_middleware(get_response):
        def middleware(request):
            try:
                response = get_response(request)
            except QueriesAttemptedError:
                res = HttpResponse()
                # Find the name of the primary instance by reading the .primary file
                db_name = settings.DATABASES['default']['NAME']
                db_dir = Path(db_name).parent
                primary_path = db_dir / ".primary"
                primary = primary_path.read_text()
                res.headers['fly-replay'] = f"instance={primary}"
                return res
    
            return response
    
        return middleware

    and register it in your MIDDLEWARE settings.

    This catches the exception raised by the previously registered execute_wrapper, finds out where the primary database is hosted, and returns a fly-replay header telling Fly.io: “Sorry, I can’t handle this request, please replay it to the instance hosting the database primary”.
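If you want to sanity-check which statements the execute_wrapper lets through without spinning up Django, you can copy should_block into a standalone script and probe it with a few statements (a quick sketch, not an exhaustive test):

```python
def should_block(sql: str) -> bool:
    # Copied from the execute_wrapper above: allow read-only statements
    # and transaction control, block everything else.
    return not sql.lstrip(" \n(").startswith(
        (
            "EXPLAIN ",
            "PRAGMA ",
            "ROLLBACK TO SAVEPOINT ",
            "RELEASE SAVEPOINT ",
            "SAVEPOINT ",
            "SELECT ",
            "SET ",
        )
    ) and sql not in ("BEGIN", "COMMIT", "ROLLBACK")

# Reads and transaction control pass through...
assert not should_block("SELECT username FROM auth_user")
assert not should_block("BEGIN")
# ...while writes are blocked.
assert should_block("UPDATE auth_user SET is_active = 0")
assert should_block("INSERT INTO django_session VALUES (1)")
```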

Once deployed with flyctl deploy, LiteFS will start replicating your database!
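The pattern both Python snippets rely on, checking for the .primary file next to the database, can be demonstrated with plain Python (using a temporary directory to stand in for the LiteFS mount; the instance name here is made up):

```python
import tempfile
from pathlib import Path

def current_primary(db_path: str):
    """Return the primary's name if this node is a replica, or None if
    this node is the primary (LiteFS only writes .primary on replicas)."""
    primary_path = Path(db_path).parent / ".primary"
    if primary_path.is_file():
        return primary_path.read_text().strip()
    return None

with tempfile.TemporaryDirectory() as mount:
    db = str(Path(mount) / "sqlite.db")
    # No .primary file yet: this node is the primary and may write.
    assert current_primary(db) is None
    # Simulate LiteFS marking this node as a replica.
    (Path(mount) / ".primary").write_text("1234abcd")  # hypothetical instance id
    assert current_primary(db) == "1234abcd"  # replica: replay writes there
```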

The Future

These are fun tools to play with for now, but there’s clearly a lot of work to get them working with our normal apps.

I’m excited about how they could make getting a Django/Wagtail app deployed much more accessible, easier and cheaper, but there’s still some work to be done to make that a reality.

The LiteFS roadmap includes things like S3 replication (so we get similar backup features to Litestream), and write forwarding (so writes to read-replicas will automatically be forwarded to the primary). There’s a lot of promise there and I can’t wait to make more use of it!