Django on Fly.io with Litestream/LiteFS

I’ve been playing with Fly.io a lot recently (see my series on deploying Wagtail to Fly.io).

One of the neat things that has come out of Fly is a renewed interest across the dev world in SQLite - an embedded database that doesn’t need any special servers or delicate configuration.

Part of this interest comes from the thought that an SQLite database sitting right next to your application, in the same VM with no network latency, is probably going to be pretty quick and pretty easy to deploy.

Although in some ways it feels like this idea comes full circle back to the days of running a MySQL Server alongside our PHP application on a single VPS, we’re also in an era where we need to deal with things like geographic distribution, ephemeral filesystems and scale-to-zero.

So we want to run our apps in a nice PaaS, and we also quite like the idea of our database being local to our application code, but there are a few conflicts here:

  • PaaS tools like Heroku/Fly tend to offer ephemeral storage, or no guarantees on the safety of storage. Trying to keep an SQLite database around on this sort of storage just won’t work out.
  • A common approach to scaling is to “scale out” - start up more instances of your application and load balance between them. How would that work with SQLite? Even if you could access the same database file from each instance, we’re re-introducing latency, and since SQLite only allows one writer at a time, we’re probably slowing our app down too.

Thankfully, Fly have been funding the development of some interesting tools, Litestream and LiteFS, which aim to solve this.

The difference between these tools is not particularly obvious, so to summarise:

  • Litestream was Ben Johnson’s first attempt at solving this problem, and is now focused primarily on disaster recovery. It’s a tool to stream all the changes made to your SQLite database to some remote storage, like S3, and then recover from it when you need to.

    This is great, and it nicely solves our first conflict. Our application can be configured to restore the database from remote storage when it starts, and we can be safe knowing that any changes are being backed up as our application runs.

    Unfortunately, it doesn’t solve our second problem: replicating our database to other instances of our app if we decide to scale out. While there were plans (and an initial implementation) for this in Litestream, live replication was instead moved to the second project, LiteFS.

  • LiteFS does some magic with FUSE to allow it to intercept SQLite transactions and then replicate to multiple instances of your application. It’s a little more complicated as you need additional tools like Consul so that it knows where to find the primary instance (where it will direct queries that write to the database), but it solves our second conflict!

    Alas, our first conflict isn’t yet solved by LiteFS - if all your nodes go away, there’s nowhere to replicate your database from, so it too will disappear. S3 replication like in Litestream is on the roadmap, however, so it seems like LiteFS is set to solve all our problems!

Now we know what these tools do, let’s experiment with getting our Django applications running with them on Fly.io.

Litestream

For Litestream, we’ll need:

  • An S3-compatible storage bucket and access keys
  • Our Django app, ideally configured with dj-database-url for convenience (see the settings sketch after this list)
  • The litestream binary available to our application. I have:
    wget https://github.com/benbjohnson/litestream/releases/download/v0.3.9/litestream-v0.3.9-linux-amd64.deb \
     && dpkg -i litestream-v0.3.9-linux-amd64.deb
    
    in my Dockerfile.
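
For reference, “configured with dj-database-url” means something like this in settings.py - a minimal sketch, with the fallback URL only there for local development:

    # settings.py
    import dj_database_url

    DATABASES = {
        # dj-database-url reads DATABASE_URL from the environment,
        # falling back to a local SQLite file in development
        "default": dj_database_url.config(default="sqlite:///db.sqlite3"),
    }
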
  1. Prepare your Fly application with flyctl launch (we don’t need a Postgres database if it asks).

  2. Set all the environment variables we’re going to need by creating a new file (call it something like env-vars):

    LITESTREAM_ACCESS_KEY_ID=[your S3-compatible access key]
    LITESTREAM_SECRET_ACCESS_KEY=[your S3-compatible secret key]
    DB_DIR=/db
    DATABASE_URL=sqlite:////db/db.sqlite
    S3_DB_URL=s3://[your-bucket-name]/db
    

    DB_DIR will be the directory where the database is replicated to, DATABASE_URL is the path where Django and dj-database-url can find your database file, and S3_DB_URL is the path to your S3-compatible bucket.

    Run

    flyctl secrets import < env-vars
    

    to import these values into your Fly environment.

  3. Create a litestream.yml:

    exec: uwsgi
    dbs:
        - path: "$DB_DIR/db.sqlite"
          replicas:
            - url: "$S3_DB_URL"
    

    Replace the exec section with whatever you normally run to start your web server. Litestream will do its replication work and conveniently run our application for us, exiting when our server exits.
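
    For example, if you serve the app with gunicorn instead of uwsgi, the exec line might look something like this (the module path is illustrative):

    exec: gunicorn myproject.wsgi --bind 0.0.0.0:8000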

  4. Create a script, start.sh, that will run on application start to make sure all our directories are created:

    #!/usr/bin/env bash
    if [[ -z "$DB_DIR" ]]; then
        echo "DB_DIR env var not specified - this should be a path of the directory where the database file should be stored"
        exit 1
    fi
    if [[ -z "$S3_DB_URL" ]]; then
        echo "S3_DB_URL env var not specified - this should be an S3-style URL to the location of the replicated database file"
        exit 1
    fi
    
    mkdir -p "$DB_DIR"
    
    litestream restore -if-db-not-exists -if-replica-exists -o "$DB_DIR/db.sqlite" "$S3_DB_URL"
    
    ./manage.py migrate --noinput
    ./manage.py createcachetable
    
    chmod -R a+rwX "$DB_DIR"
    
    exec litestream replicate -config litestream.yml
    

    This:

    • Checks important environment variables are set.
    • Creates a database directory and makes sure it’s open enough for the app to read/write to it (you might choose to tighten this up if appropriate).
    • Restores the database using litestream if it doesn’t already exist.
    • Runs migrate to make sure the database is up to date (or creates it if there wasn’t anything to restore).
    • Runs litestream replicate which will in turn run the exec command in the litestream config, starting the application.

    Update your Docker CMD to run this start.sh.
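
    In the Dockerfile, that might look something like this (paths are illustrative, assuming the scripts are copied into the image’s working directory):

    COPY start.sh litestream.yml ./
    RUN chmod +x start.sh
    CMD ["./start.sh"]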

Once deployed with flyctl deploy, Litestream will start backing up your database. Be careful though: if you try to scale out by adding more instances, at best you’ll see out-of-sync data, and at worst you’ll end up with a corrupt database.
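
If you’re sticking with Litestream, it’s safest to pin the app to a single instance, for example:

    flyctl scale count 1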

LiteFS

💡 The following is not battle-tested and is not suitable for production, but if you’re interested in experimenting, it’s a bit of fun.

For LiteFS, we’ll need:

  • Our Django app, ideally configured with dj-database-url for convenience
  • The litefs binary available to our application. I have:
    RUN ARCH="amd64" && \
    VERSION="0.2.0" && \
    cd $(mktemp --directory) && \
    wget "https://github.com/superfly/litefs/releases/download/v${VERSION}/litefs-v${VERSION}-linux-${ARCH}.tar.gz" && \
    tar xvf "litefs-v${VERSION}-linux-${ARCH}.tar.gz" && \
    mv litefs /usr/local/bin
    
    in my Dockerfile (alternatively, copy the binary from the litefs image, as sketched after this list).
  • Some way to make sure our write requests only end up at the primary (we’ll come back to this).
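
The “copy the binary from the litefs image” alternative looks something like this in a Dockerfile - assuming a flyio/litefs image tag is published for the version you want:

    COPY --from=flyio/litefs:0.2 /usr/local/bin/litefs /usr/local/bin/litefs
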
  1. Prepare your Fly application with flyctl launch (we don’t need a Postgres database if it asks).

  2. Set all the environment variables we’re going to need by creating a new file (call it something like env-vars):

    DB_DIR=/db
    DATABASE_URL=sqlite:////db/sqlite.db
    

    DB_DIR will be the directory where the database is replicated to and DATABASE_URL is the path where Django and dj-database-url can find your database file.

    Run

    flyctl secrets import < env-vars
    

    to import these values into your Fly environment.

  3. In your fly.toml, add:

    [experimental]
        enable_consul = true
    

    This gives us access to the shared Fly.io-managed Consul instance.

  4. Create a litefs.yml:

    exec: ./run.sh
    
    mount-dir: "$DB_DIR"
    data-dir: "/db-data"
    
    http:
      addr: ":20202"
    
    consul:
      url: "${FLY_CONSUL_URL}"
      advertise-url: "http://${FLY_REGION}.${FLY_APP_NAME}.internal:20202"
    

    Replace the exec section with whatever you normally run to start your web server.

    The mount-dir is where LiteFS will create its filesystem (where the database will live), and data-dir is where it keeps the files it needs for replication. The http and consul blocks tell the LiteFS instances how to talk to each other and where to find the Fly.io-managed Consul instance.

  5. Create the run.sh that is started by LiteFS. We need things like migrations to run after LiteFS has set up its filesystem, so we do those in this script:

    #!/usr/bin/env bash
    set -e
    echo "Starting app"
    
    ./manage.py migrate --noinput || true
    ./manage.py createcachetable
    chmod -R a+rwX "$DB_DIR"
    
    exec /venv/bin/uwsgi
    
  6. Create a script, start.sh, that will run on application start to make sure all our directories are created:

    #!/usr/bin/env bash
    if [[ -z "$DB_DIR" ]]; then
        echo "DB_DIR env var not specified - this should be a path of the directory where the database file should be stored"
        exit 1
    fi
    
    mkdir -p "$DB_DIR"
    
    exec litefs -config litefs.yml
    

    This:

    • Checks the DB_DIR environment variable is set.
    • Creates the mount directory that LiteFS will use.
    • Starts LiteFS, which mounts its filesystem and then runs run.sh via the exec setting in litefs.yml.

    Update your Docker CMD to run this start.sh.

  7. We’re not there yet. We need to make sure database writes only go to our primary. To do this, we’ll register a database execute_wrapper which intercepts any write queries. I’ve got this in my base app’s __init__.py (heavily based on Adam Johnson’s django-read-only):

    from pathlib import Path
    from typing import Any, Callable

    from django.apps import AppConfig
    from django.conf import settings
    from django.db import connections
    from django.db.backends.base.base import BaseDatabaseWrapper
    from django.db.backends.signals import connection_created
    
    class BaseConfig(AppConfig):
        name = "bakerydemo.base"
        verbose_name = "base"
    
        def ready(self) -> None:
            db_name = settings.DATABASES['default']['NAME']
            db_dir = Path(db_name).parent
            primary_path = db_dir / ".primary"
            if not primary_path.is_file():
                return
            for alias in connections:
                connection = connections[alias]
                install_hook(connection)
            connection_created.connect(install_hook)
    
    
    def install_hook(connection: BaseDatabaseWrapper, **kwargs: object) -> None:
        if blocker not in connection.execute_wrappers:
            connection.execute_wrappers.insert(0, blocker)
    
    
    class QueriesAttemptedError(Exception):
        pass
    
    
    def blocker(
        execute: Callable[[str, str, bool, dict[str, Any]], Any],
        sql: str,
        params: str,
        many: bool,
        context: dict[str, Any],
    ) -> Any:
        if should_block(sql):
            raise QueriesAttemptedError("Write query attempted on a read replica")
        return execute(sql, params, many, context)
    
    
    def should_block(sql: str) -> bool:
        return not sql.lstrip(" \n(").startswith(
            (
                "EXPLAIN ",
                "PRAGMA ",
                "ROLLBACK TO SAVEPOINT ",
                "RELEASE SAVEPOINT ",
                "SAVEPOINT ",
                "SELECT ",
                "SET ",
            )
        ) and sql not in ("BEGIN", "COMMIT", "ROLLBACK")
    

    This will raise an exception if a query would write to the database and the .primary file created by LiteFS exists (meaning this instance is not the primary).

  8. We need something to intercept this exception, so add some middleware:

    from pathlib import Path

    from django.conf import settings
    from django.http import HttpResponse

    from . import QueriesAttemptedError
    
    def replay_middleware(get_response):
        def middleware(request):
            try:
                response = get_response(request)
            except QueriesAttemptedError:
                res = HttpResponse()
                # Find the name of the primary instance by reading the .primary file
                db_name = settings.DATABASES['default']['NAME']
                db_dir = Path(db_name).parent
                primary_path = db_dir / ".primary"
                primary = primary_path.read_text()
                res.headers['fly-replay'] = f"instance={primary}"
                return res
    
            return response
    
        return middleware
    

    and register it in your MIDDLEWARE settings.
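
    For example (the dotted path is illustrative and should match wherever you put the middleware):

    # settings.py
    MIDDLEWARE = [
        "bakerydemo.base.middleware.replay_middleware",
        # ... the rest of your middleware ...
    ]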

    This catches the exception raised by the previously registered execute_wrapper, finds out which instance hosts the primary database, and returns a fly-replay header telling Fly.io: “Sorry, I can’t handle this request, please replay it on the primary instance”.

Once deployed with flyctl deploy, LiteFS will start replicating your database!
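
To see it in action, you can add a second instance and watch reads get served locally while writes get replayed to the primary, for example:

    flyctl scale count 2
    flyctl status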

The Future

These are fun tools to play with for now, but there’s clearly a lot of work to get them working with our normal apps.

I’m excited about how they could make getting a Django/Wagtail app deployed much more accessible, easier and cheaper, but there’s still some work to be done to make that a reality.

The LiteFS roadmap includes things like S3 replication (so we get similar backup features to Litestream), and write forwarding (so writes to read-replicas will automatically be forwarded to the primary). There’s a lot of promise there and I can’t wait to make more use of it!
