NixOS Planet

March 04, 2021

Tweag I/O

Announcing Gomod2nix

I’m very pleased to announce Gomod2nix, a new tool to create Go packages with Nix!

Gomod2nix is a code generation tool whose main focus is addressing the correctness and usability concerns I have with the current Go packaging solutions. It offers a composable override interface, which allows overrides to be shared across projects, simplifying the packaging of complex packages. As a bonus, it also boasts much better cache hit rates than other similar solutions, owing to not abusing fixed-output derivations.

I also took the opportunity of this new package to address some long-standing annoyances with existing Go Nix tooling. For instance, Gomod2nix disables CGO by default, which lets it produce static Go binaries by default. Changing the defaults in existing tooling would be very difficult as it would break a lot of existing packages, especially those maintained outside of Nixpkgs, which depend on the present behavior.

In order to motivate this new tool, let’s take a look at how Go dependency management evolved.

The development of Trustix (which Gomod2nix was developed for) is funded by the NLnet Foundation and the European Commission’s Next Generation Internet programme through the NGI Zero PET (privacy and trust enhancing technologies) fund.

A history of Go packaging

In Go you don’t add dependencies to a manifest, but instead you add a dependency to your project by simply adding an import to a source file:

package main

import (
	// import path of the dependency
)

func main() {
	// code using the imported package
}

and the go tool will figure out how to fetch this dependency.

From the beginning Go didn’t have package management in the traditional sense. Instead it enforced a directory structure that mimics the import paths: a project that depends on another package is expected to live under $GOPATH, in a directory tree that mirrors the import path of the project itself and of each of its dependencies.

This may not look so bad for a single small project, but since this structure is enforced not only for your packages but also for your dependencies, it quickly becomes messy. The $GOPATH mechanism has been one of the truly sore spots of Go development. Under this packaging paradigm you are expected to always use the latest Git master of all your dependencies, and there is no version locking.

Dep was the first official packaging experiment for Go. This tool improved upon $GOPATH not by removing it, but by hiding that complexity from the user entirely. Besides that, it added lock files and a SAT dependency solver.

Finally, armed with the learnings and some critique of Dep, the Go team decided to develop a new, simpler solution: Go modules. It addressed a number of perceived problems with Dep, like the fact that the use of semver and SAT solvers is far too complicated for the requirements Go has. As of now, Dep is deprecated, and Go modules is the solution I developed against.

A tale of two lock files

Originally, I set out to design gomod2nix in the same way as poetry2nix, a Python packaging solution for Nix. In poetry2nix one refers directly to a Poetry lock file from Nix, and poetry2nix does all the work needed to create the Nix package, which is very convenient.

However, this wasn’t possible here because of the design of Go modules, for reasons that I will explain below. As a consequence, gomod2nix is a Nix code generation tool. One feeds lock files to it, and it generates Nix expressions defining the corresponding packages.

In the following, I will compare the Poetry and Go lock files, and show which limitations the Go file format and import mechanism impose upon us.

Exposed dependency graphs

First, let’s look at an excerpt of Poetry’s lock file:

[[package]]
name = "cachecontrol"
version = "0.12.6"
description = "httplib2 caching for requests"
category = "main"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"

[package.dependencies]
lockfile = {version = ">=0.9", optional = true, markers = "extra == \"filecache\""}
msgpack = ">=0.5.2"
requests = "*"

And also at go.sum:

v1.1.0 h1:ZDRjVQ15GmhC3fiQ8ni8+OwkZQO4DARzQgrnXU1Liz8=
v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
v0.1.0 h1:4G4v2dO3VZwixGIRoQ5Lfboy6nUhCyYzaqnIAPPhYs4=
v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
v3.0.0-20200313102051-9f266ea9e77c h1:dUUwHk2QECo/6vqA44rthZ8ie2QXMNeKRTHCNY2nXvo=
v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

After squinting at these files for a while we can already see some radical differences in semantics, most notably that go.sum is structured as a flat list rather than a graph. Of course, all dependencies of the build are listed in go.sum, but we don’t know what depends on what. What this means for us in Nix is that we have no good unit of incrementality: everything has to be built together, while Poetry can build dependencies separately.
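
To make the flat structure concrete, here is a minimal shell sketch that reads go.sum-style lines. The module paths (example.com/...) and the hashes are made-up placeholders, not entries from a real project; the only assumption about the format is that each line carries a module path, a version (optionally suffixed with /go.mod) and a hash.

```shell
#!/usr/bin/env bash
# Sketch: go.sum is a flat list of "<module> <version>[/go.mod] <hash>" lines.
# Module paths and hashes below are hypothetical placeholders.

sample_go_sum() {
  cat <<'EOF'
example.com/alpha v1.1.0 h1:AAAA=
example.com/alpha v1.1.0/go.mod h1:BBBB=
example.com/beta v0.1.0 h1:CCCC=
example.com/beta v0.1.0/go.mod h1:DDDD=
example.com/gamma v3.0.0 h1:EEEE=
example.com/gamma v3.0.0/go.mod h1:FFFF=
EOF
}

# Print one "module@version" line per distinct module version.
# The "/go.mod" suffix is stripped so both lines of a module map to one key.
list_modules() {
  awk '{ sub(/\/go\.mod$/, "", $2); print $1 "@" $2 }' | sort -u
}

sample_go_sum | list_modules
```

The result is a plain set of module versions. Nothing in the input says whether alpha needs beta or the other way round, which is exactly the information a per-dependency build would need.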

Bespoke formats

Go modules use their own Go-like file format, while Poetry uses TOML to serialize both its manifest and lock files.

While this format is simple and writing a parser for it isn’t hard, it makes the lives of tooling authors harder. It would be much easier if a standard data interchange format were used rather than a custom one.

Bespoke hashes

The next problem with Go modules is its use of a custom hashing mechanism that’s fundamentally incompatible with how Nix hashes paths.

As explained in Eelco Dolstra’s thesis The Purely Functional Software Deployment Model, Nix uses its own reproducible archive format, NAR, which is used both for uploading build results and for directory hashing.

The Go developers faced with similar concerns created their own directory hashing scheme, which unfortunately is fundamentally incompatible with Nix hashes. I don’t see how Go modules could have done this any better, but the situation is unfortunate.
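
To illustrate why the two schemes cannot be converted into each other, here is a rough shell sketch of the idea behind Go's "h1:" directory hash as I understand it: hash every file, write a sorted summary of per-file hashes and names, hash that summary, and base64-encode the result. It assumes GNU coreutils and bash; the module prefix and the sample files are hypothetical, and real go.sum hashes are computed over the contents of the module archive rather than an arbitrary directory.

```shell
#!/usr/bin/env bash
# Sketch of a Go-style "h1:" directory hash using GNU coreutils.
# Assumption: every file is listed as "<prefix>/<relative path>", where the
# prefix is "<module>@<version>"; the prefix used below is hypothetical.

# dir_hash <directory> <prefix>
dir_hash() {
  local dir=$1 prefix=$2 hex
  hex=$(
    cd "$dir" &&
    find . -type f | sed 's|^\./||' | LC_ALL=C sort |
    while IFS= read -r f; do
      # One summary line per file: "<sha256 of contents>  <name>"
      printf '%s  %s/%s\n' "$(sha256sum < "$f" | cut -d' ' -f1)" "$prefix" "$f"
    done | sha256sum | cut -d' ' -f1
  )
  # Convert the hex digest of the summary to raw bytes, then base64-encode it
  printf 'h1:%s\n' "$(printf "$(printf '%s' "$hex" | sed 's/../\\x&/g')" | base64)"
}

demo=$(mktemp -d)
mkdir -p "$demo/pkg"
printf 'module example.com/alpha\n' > "$demo/go.mod"
printf 'package pkg\n' > "$demo/pkg/pkg.go"

dir_hash "$demo" "example.com/alpha@v1.1.0"
```

The input to the final hash is a list of file hashes and names, whereas a Nix hash is taken over a NAR serialization of the whole tree. Neither value can be derived from the other without access to the file contents.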

Dynamic package name resolving

In the previous example, I showed what a Go import path looks like. Sadly, it turns out that the surface simplicity of those paths hides a lot of underlying logic.

Internally, these import paths are handled by the RepoRootForImport family of functions in the vcs (version control system) package, which maps import paths to repository URLs and VCS types. Some of these are matched statically using regex but others use active probing.

This is a true showstopper for a pure-Nix Go packaging solution, and the reason why Gomod2nix is a code generation tool — we don’t have network access in the Nix evaluator, making it impossible to correctly resolve VCS from a Nix evaluation.
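
The difference between static matching and active probing can be sketched in a few lines of bash. The single pattern below is a simplified illustration and not the actual table used by the go tool; the function name and the example paths are hypothetical. To my understanding, for paths that match no static pattern the go tool issues an HTTP request with a ?go-get=1 query and reads a go-import meta tag from the response, which is the step that needs network access.

```shell
#!/usr/bin/env bash
# Sketch: some import paths map to a repository by pattern alone,
# others can only be resolved by asking the remote server.

# resolve_repo <import-path>
# Prints "git <repo-url>" for a statically known host, or "probe" when the
# answer would require network access.
resolve_repo() {
  local path=$1
  if [[ $path =~ ^(github\.com/[^/]+/[^/]+)(/.*)?$ ]]; then
    printf 'git https://%s\n' "${BASH_REMATCH[1]}"
  else
    printf 'probe\n'
  fi
}

resolve_repo github.com/example/project/pkg/sub
resolve_repo example.org/vanity/project
```

A code generator can run the probing step once, outside the sandbox, and record the result, while the Nix evaluator can only ever implement the first branch.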

Solutions to Go modules packaging

The points above make our limitations clear. With these in mind, let’s discuss how Go packaging solutions were conceived.

Code generation: vgo2nix

My first attempt at creating a tool for packaging Go modules was vgo2nix, another code generation tool. It was written very shortly after modules were announced, and at the time the tooling support for them wasn’t good. For example, there wasn’t a parser for go.mod published back then.

It was based on buildGoPackage, the older Nixpkgs Go abstraction, and emulated a $GOPATH-based build. Unfortunately, that emulation relies on some assumptions that are not true.

Let’s again look at an excerpt from go.sum:

v0.9.0/go.mod h1:xyHB1BMZT0cuDHU7I0+g046+BFDTQ8rEZB0s4Yfa6bI=
v0.9.3/go.mod h1:GsRuLYvwzLjjjRoWEIyMUaYq8GNUx2nRB378IPt/1p0=
v0.5.0/go.mod h1:8Z9fGy2MpX0PvDjB1pEgQTmVqjGhiHBW7RJJEciWzS0=
v0.8.0/go.mod h1:Z6vX6WXXuyieHAXwMj0S6HY6e6wcHn37qQMBQlvY3lc=
v0.8.1/go.mod h1:ZjhuQClTqx435SRJ2iMlOxPYt3d2C/T/7TiQCVZSn3Q=
v0.1.0/go.mod h1:plvfp3oPSKwf2DNjlBjWF/7vwR+cUD/ELuzDCXwHUVA=
v0.2.0/go.mod h1:vcORJHLJEh643/Ioh9+vPmf1Ij9AEBM5FuBIXLmIy0g=

These packages are all developed in the same repository but have different tags, and in this case vgo2nix would incorrectly only clone one version of the repository and sacrifice the correctness we get from modules because of how $GOPATH is set up.

Fixed-output derivations: buildGoModule

The buildGoModule tool is the most popular solution for Go packaging right now in Nixpkgs. A typical buildGoModule package looks something like:

{ buildGoModule, fetchFromGitHub, lib }:

buildGoModule {
  pname = "someName";
  version = "0.0.1";

  src = fetchFromGitHub { ... };

  vendorSha256 = "1500vim2lmkkls758pwhlx3piqbw6ap0nnhdwz9pcxih4s4as2nk";
}

The buildGoModule package is designed around fixed-output derivations, which means that a single derivation is created where all the dependencies of the package you want to build are wrapped, and only a single hash of the derivation is specified. It fetches all dependencies in the fixed output, creating a vendor directory which is used for the build.

This has several issues, most notably that there is no sharing of dependencies between packages that depend on the same Go module.

The other notable issue is that it forces developers to remember to edit the vendorSha256 attribute separately from the already existing hash/sha256 attribute on the derivation. Forgetting to do so can not only lead to incorrect builds, but is also frustrating when working with larger packages that take a long time to build: you only notice very late in the build that something is broken, and then have to start over from scratch.

Because of the lack of hash granularity the build needs to clone every dependency every time the vendorSha256 is invalidated, and cannot use the cache from previous builds.

Fixed-output derivations can also be considered an impurity, and there is a push to restrict them.

My solution: gomod2nix

Approach-wise, gomod2nix positions itself right between vgo2nix and buildGoModule. It is still a code generation tool like vgo2nix, but it fully embraces the Go modules world and only supports Go modules based builds; the old GOPATH way is unsupported. It uses the same vendoring approach as buildGoModule, but instead of vendoring the actual sources in a derivation, it populates the vendor directory with symlinks. That way, dependencies can be fetched separately, and identical dependency source trees can be shared between multiple packages in the Nix store.
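
The symlink idea can be sketched outside of Nix with two scratch directories. This is only an illustration of the principle, not how gomod2nix is implemented: the function name, the module paths and the directory that stands in for the Nix store are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a vendor/ tree out of symlinks to separately fetched sources.
# All paths are hypothetical; in Nix the link targets would be store paths.

# link_vendor <vendor-dir> <import-path> <source-dir>
link_vendor() {
  local vendor=$1 import_path=$2 source=$3
  mkdir -p "$vendor/$(dirname "$import_path")"
  ln -sfn "$source" "$vendor/$import_path"
}

store=$(mktemp -d)    # stands in for the Nix store
project=$(mktemp -d)  # stands in for the package being built
mkdir -p "$store/alpha-v1.1.0" "$store/beta-v0.1.0"

link_vendor "$project/vendor" example.com/alpha "$store/alpha-v1.1.0"
link_vendor "$project/vendor" example.com/beta "$store/beta-v0.1.0"

ls -l "$project/vendor/example.com"
```

Because each dependency lives in its own directory, a second project that needs the same version of alpha can link to the same source tree instead of fetching it again.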

From a user perspective the workflow is largely similar to vgo2nix:

  • You write a basic expression looking like:
pkgs.buildGoApplication {
  pname = "gomod2nix-example";
  version = "0.1";
  src = ./.;
  modules = ./gomod2nix.toml;
}
  • Run the code generation tool:
$ gomod2nix


Go packaging looks very simple on the surface, but murky details lurk underneath, and there are lots of tiny details to get right to correctly create a Go package in a sandboxed environment like Nix.

I couldn’t get the best-in-class user experience I was hoping for and had gotten used to with Poetry2nix. Code generation adds extra steps to the development process and requires either a developer or a test pipeline to keep the Nix expressions in sync with the language specific lock files, something that requires discipline and takes extra time and effort. Despite that, it turned out there were major wins to be had from creating a new packaging solution.

The development of gomod2nix is funded by NLnet through the PET (privacy and trust enhancing technologies) fund. gomod2nix is being developed as a part of Trustix.



March 04, 2021 12:00 AM

February 24, 2021

Sander van der Burg

Deploying mutable multi-process Docker containers with the Nix process management framework (or running Hydra in a Docker container)

In a blog post written several months ago, I have shown that the Nix process management framework can also be used to conveniently construct multi-process Docker images.

Although Docker is primarily used for managing single root application process containers, multi-process containers can sometimes be useful to deploy systems that consist of multiple, tightly coupled, processes.

The Docker manual has a section that describes how to construct images for multi-process containers, but IMO the configuration process is a bit tedious and cumbersome.

To make this process more convenient, I have built a wrapper function: createMultiProcessImage around the dockerTools.buildImage function (provided by Nixpkgs) that does the following:

  • It constructs an image that runs a Linux and Docker compatible process manager as an entry point. Currently, it supports supervisord, sysvinit, disnix and s6-rc.
  • The Nix process management framework is used to build a configuration for a system that consists of multiple processes, that will be managed by any of the supported process managers.

Although the framework makes the construction of multi-process images convenient, a big drawback of multi-process Docker containers is upgrading them -- for example, for Debian-based containers you can imperatively upgrade packages by connecting to the container:

$ docker exec -it mycontainer /bin/bash

and upgrade the desired packages, such as file:

$ apt install file

The upgrade instruction above is not reproducible -- apt may install file version 5.38 today, and 5.39 tomorrow.

To cope with these kinds of side-effects, Docker works with images that snapshot the outcomes of all the installation steps. Constructing a container from the same image will always provide the same versions of all dependencies.

As a consequence, to perform a reproducible container upgrade, it is required to construct a new image, discard the container and reconstruct the container from the new image version, causing the system as a whole to be terminated, including the processes that have not changed.

For a while, I have been thinking about this limitation and developed a solution that makes it possible to upgrade multi-process containers without stopping and discarding them. The only exception is the process manager.

To make deployments reproducible, it combines the reproducibility properties of Docker and Nix.

In this blog post, I will describe how this solution works and how it can be used.

Creating a function for building mutable Docker images

As explained in an earlier blog post that compares the deployment properties of Nix and Docker, both solutions support reproducible deployment, albeit for different application domains.

Moreover, their reproducibility properties are built around different concepts:

  • Docker containers are reproducible, because they are constructed from images that consist of immutable layers identified by hash codes derived from their contents.
  • Nix package builds are reproducible, because they are stored in isolation in a Nix store and made immutable (the files' permissions are set read-only). In the construction process of the packages, many side effects are mitigated.

    As a result, when the hash code prefix of a package (derived from all build inputs) is the same, then the build output is also (nearly) bit-identical, regardless of the machine on which the package was built.

By taking these reproducibility properties into account, we can create a reproducible deployment process for upgradable containers by using a specific separation of responsibilities.

Deploying the base system

For the deployment of the base system that includes the process manager, we can stick to the traditional Docker deployment workflow based on images (the only unconventional aspect is that we use Nix to build a Docker image, instead of Dockerfiles).

The process manager that the image provides deploys its configuration from a dynamic configuration directory.

To support supervisord, we can invoke the following command as the container's entry point:

supervisord --nodaemon \
--configuration /etc/supervisor/supervisord.conf \
--logfile /var/log/supervisord.log \
--pidfile /var/run/

The above command starts the supervisord service (in foreground mode), using the supervisord.conf configuration file stored in /etc/supervisor.

The supervisord.conf configuration file has the following structure:



The above configuration automatically loads all program definitions stored in the conf.d directory. This directory is writable and initially empty. It can be populated with configuration files generated by the Nix process management framework.

For the other process managers that the framework supports (sysvinit, disnix and s6-rc), we follow a similar strategy -- we configure the process manager in such a way that the configuration is loaded from a source that can be dynamically updated.

Deploying process instances

Deployment of the process instances is not done in the construction of the image, but by the Nix process management framework and the Nix package manager running in the container.

To allow a processes model deployment to refer to packages in the Nixpkgs collection and install binary substitutes, we must configure a Nix channel, such as the unstable Nixpkgs channel:

$ nix-channel --add
$ nix-channel --update

(As a sidenote: it is also possible to subscribe to a stable Nixpkgs channel or a specific Git revision of Nixpkgs).

The processes model (and relevant sub models, such as ids.nix that contains numeric ID assignments) are copied into the Docker image.

We can deploy the processes model for supervisord as follows:

$ nixproc-supervisord-switch

The above command will deploy the processes model in the NIXPROC_PROCESSES environment variable, which defaults to: /etc/nixproc/processes.nix:

  • First, it builds supervisord configuration files from the processes model (this step also includes deploying all required packages and service configuration files)
  • It creates symlinks for each configuration file belonging to a process instance in the writable conf.d directory
  • It instructs supervisord to reload the configuration so that only obsolete processes get deactivated and new services activated, causing unchanged processes to remain untouched.

(For the other process managers, we have equivalent tools: nixproc-sysvinit-switch, nixproc-disnix-switch and nixproc-s6-rc-switch).
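
The symlink step in the list above can be sketched as a small shell function. This is an illustration under stated assumptions rather than what nixproc-supervisord-switch actually runs: the function name and the scratch directories are hypothetical, with one directory standing in for the generated configuration files and the other for the writable conf.d directory.

```shell
#!/usr/bin/env bash
# Sketch: make a conf.d directory mirror a directory of generated
# configuration files by adding and removing symlinks.

# sync_conf_dir <generated-dir> <conf.d-dir>
sync_conf_dir() {
  local generated=$1 confd=$2 link file name
  # Drop links whose configuration file is no longer generated
  for link in "$confd"/*.conf; do
    [ -L "$link" ] || continue
    name=$(basename "$link")
    [ -e "$generated/$name" ] || rm -f "$link"
  done
  # (Re)create a link for every generated configuration file
  for file in "$generated"/*.conf; do
    [ -e "$file" ] || continue
    ln -sfn "$file" "$confd/$(basename "$file")"
  done
}

generated=$(mktemp -d)  # stands in for the generated configuration
confd=$(mktemp -d)      # stands in for the writable conf.d directory

touch "$generated/webapp.conf" "$generated/nginx.conf"
sync_conf_dir "$generated" "$confd"

# Simulate an upgrade: nginx disappears, webapp2 is added
rm "$generated/nginx.conf"
touch "$generated/webapp2.conf"
sync_conf_dir "$generated" "$confd"

ls "$confd"
```

After such a sync, supervisord still has to be told to pick up the changes. With a stock supervisord this can be done with supervisorctl reread followed by supervisorctl update, which starts and stops only the programs whose definitions were added, removed or changed.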

Initial deployment of the system

Because only the process manager is deployed as part of the image (with an initially empty configuration), the system is not yet usable when we start a container.

To solve this problem, we must perform an initial deployment of the system on first startup.

I applied the lessons learned from the chainloading techniques in s6 (in the previous blog post) and developed a hacky, generated bootstrap script (/bin/bootstrap) that serves as the container's entry point:

cat > /bin/bootstrap <<EOF
#! ${} -e

# Configure Nix channels
nix-channel --add ${channelURL}
nix-channel --update

# Deploy the processes model (in a child process)
nixproc-${input.processManager}-switch &

# Overwrite the bootstrap script, so that it simply just
# starts the process manager the next time we start the
# container
cat > /bin/bootstrap <<EOR
#! ${} -e
exec ${cmd}
EOR

# Chain load the actual process manager
exec ${cmd}
EOF
chmod 755 /bin/bootstrap

The generated bootstrap script does the following:

  • First, a Nix channel is configured and updated so that we can install packages from the Nixpkgs collection and obtain substitutes.
  • The next step is deploying the processes model by running the nixproc-*-switch tool for a supported process manager. This process is started in the background (as a child process) -- we can use this trick to force the managing bash shell to load our desired process supervisor as soon as possible.

    Ultimately, we want the process manager to become responsible for supervising any other process running in the container.
  • After the deployment process is started in the background, the bootstrap script is overwritten by a bootstrap script that becomes our real entry point -- the process manager that we want to use, such as supervisord.

    Overwriting the bootstrap script makes sure that the next time we start the container, it will start instantly without attempting to deploy the system again.
  • Finally, the bootstrap script "execs" into the real process manager, becoming the new PID 1 process. When the deployment of the system is done (the nixproc-*-switch process that still runs in the background), the process manager becomes responsible for reaping it.

With the above script, the workflow of deploying an upgradable/mutable multi-process container is the same as deploying an ordinary container from a Docker image -- the only (minor) difference is that the first time that we start the container, it may take some time before the services become available, because the multi-process system needs to be deployed by Nix and the Nix process management framework.
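
The self-replacing entry point can be tried out in a scratch directory without a container. The following is a sketch under stated assumptions: echo commands stand in for the nixproc-*-switch tool and for the process manager, and the make_bootstrap function name is hypothetical. Unlike the generated script shown earlier, it writes the replacement to a temporary file and moves it into place, so that the running shell keeps reading the old file while the new one appears under the same name.

```shell
#!/usr/bin/env bash
# Sketch of a self-replacing entry point, run in a scratch directory.
# "deploying" and "manager started" are stand-ins for the real commands.

# make_bootstrap <directory>
make_bootstrap() {
  local dir=$1
  cat > "$dir/bootstrap" <<EOF
#!/bin/sh -e
# First start: kick off the deployment in the background
echo "deploying" >> "$dir/log" &

# Replace this script so that later starts skip the deployment
cat > "$dir/bootstrap.new" <<EOR
#!/bin/sh -e
exec echo "manager started"
EOR
chmod 755 "$dir/bootstrap.new"
mv "$dir/bootstrap.new" "$dir/bootstrap"

# Hand over to the (stand-in) process manager
exec echo "manager started"
EOF
  chmod 755 "$dir/bootstrap"
}

workdir=$(mktemp -d)
make_bootstrap "$workdir"

sh -e "$workdir/bootstrap"   # first start: deploys, then starts the manager
sh -e "$workdir/bootstrap"   # second start: only starts the manager
```

Both invocations end by replacing the shell with the stand-in process manager, but only the first one triggers a deployment, because the second one already runs the short replacement script.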

A simple usage scenario

Similar to my previous blog posts about the Nix process management framework, I will use the trivial web application system to demonstrate how the functionality of the framework can be used.

The web application system consists of one or more webapp processes (with an embedded HTTP server) that only return static HTML pages displaying their identities.

An Nginx reverse proxy forwards incoming requests to the appropriate webapp instance -- each webapp service can be reached by using its unique virtual host value.

To construct a mutable multi-process Docker image with Nix, we can write the following Nix expression (default.nix):

let
  pkgs = import <nixpkgs> {};

  nix-processmgmt = builtins.fetchGit {
    url =;
    ref = "master";
  };

  createMutableMultiProcessImage = import "${nix-processmgmt}/nixproc/create-image-from-steps/create-mutable-multi-process-image-universal.nix" {
    inherit pkgs;
  };
in
createMutableMultiProcessImage {
  name = "multiprocess";
  tag = "test";
  contents = [ ];
  exprFile = ./processes.nix;
  idResourcesFile = ./idresources.nix;
  idsFile = ./ids.nix;
  processManager = "supervisord"; # sysvinit, disnix, s6-rc are also valid options
}

The above Nix expression invokes the createMutableMultiProcessImage function that constructs a Docker image that provides a base system with a process manager, and a bootstrap script that deploys the multi-process system:

  • The name, tag, and contents parameters specify the image name, tag and the packages that need to be included in the image.
  • The exprFile parameter refers to a processes model that captures the configurations of the process instances that need to be deployed.
  • The idResourcesFile parameter refers to an ID resources model that specifies from which resource pools unique IDs need to be selected.
  • The idsFile parameter refers to an IDs model that contains the unique ID assignments for each process instance. Examples of unique IDs are TCP/UDP port numbers, user IDs (UIDs) and group IDs (GIDs).
  • We can use the processManager parameter to select the process manager we want to use. In the above example it is supervisord, but other options are also possible.

We can use the following processes model (processes.nix) to deploy a small version of our example system:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  nix-processmgmt = builtins.fetchGit {
    url =;
    ref = "master";
  };

  ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

  sharedConstructors = import "${nix-processmgmt}/examples/services-agnostic/constructors/constructors.nix" {
    inherit pkgs stateDir runtimeDir logDir cacheDir tmpDir forceDisableUserChange processManager ids;
  };

  constructors = import "${nix-processmgmt}/examples/webapps-agnostic/constructors/constructors.nix" {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager ids;
  };
in
rec {
  webapp = rec {
    port = ids.webappPorts.webapp or 0;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };

    requiresUniqueIdsFor = [ "webappPorts" "uids" "gids" ];
  };

  nginx = rec {
    port = ids.nginxPorts.nginx or 0;

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      webapps = [ webapp ];
      inherit port;
    } {};

    requiresUniqueIdsFor = [ "nginxPorts" "uids" "gids" ];
  };
}

The above Nix expression configures two process instances, one webapp process that returns a static HTML page with its identity and an Nginx reverse proxy that forwards connections to it.

A notable difference between the expression shown above and the processes models of the same system shown in my previous blog posts, is that this expression does not contain any references to files on the local filesystem, with the exception of the ID assignments expression (ids.nix).

We obtain all required functionality from the Nix process management framework by invoking builtins.fetchGit. Eliminating local references is required to allow the processes model to be copied into the container and deployed from within the container.

We can build a Docker image as follows:

$ nix-build

load the image into Docker:

$ docker load -i result

and create and start a Docker container:

$ docker run -it --name webapps --network host multiprocess:test
unpacking channels...
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring
created 1 symlinks in user environment
2021-02-21 15:29:29,878 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
2021-02-21 15:29:29,878 WARN No file matches via include "/etc/supervisor/conf.d/*"
2021-02-21 15:29:29,897 INFO RPC interface 'supervisor' initialized
2021-02-21 15:29:29,897 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2021-02-21 15:29:29,898 INFO supervisord started with pid 1
these derivations will be built:
these paths will be fetched (78.80 MiB download, 347.06 MiB unpacked):

As may be noticed by looking at the output, on first startup the Nix process management framework is invoked to deploy the system with Nix.

After the system has been deployed, we should be able to connect to the webapp process via the Nginx reverse proxy:

$ curl -H 'Host: webapp.local' http://localhost:8080
<!DOCTYPE html>
<title>Simple test webapp</title>
Simple test webapp listening on port: 5000

When it is desired to upgrade the system, we can change the system's configuration by connecting to the container instance:

$ docker exec -it webapps /bin/bash

In the container, we can edit the processes.nix configuration file:

$ mcedit /etc/nixproc/processes.nix

and make changes to the configuration of the system. For example, we can change the processes model to include a second webapp process:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  nix-processmgmt = builtins.fetchGit {
    url =;
    ref = "master";
  };

  ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

  sharedConstructors = import "${nix-processmgmt}/examples/services-agnostic/constructors/constructors.nix" {
    inherit pkgs stateDir runtimeDir logDir cacheDir tmpDir forceDisableUserChange processManager ids;
  };

  constructors = import "${nix-processmgmt}/examples/webapps-agnostic/constructors/constructors.nix" {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager ids;
  };
in
rec {
  webapp = rec {
    port = ids.webappPorts.webapp or 0;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };

    requiresUniqueIdsFor = [ "webappPorts" "uids" "gids" ];
  };

  webapp2 = rec {
    port = ids.webappPorts.webapp2 or 0;
    dnsName = "webapp2.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "2";
    };

    requiresUniqueIdsFor = [ "webappPorts" "uids" "gids" ];
  };

  nginx = rec {
    port = ids.nginxPorts.nginx or 0;

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      webapps = [ webapp webapp2 ];
      inherit port;
    } {};

    requiresUniqueIdsFor = [ "nginxPorts" "uids" "gids" ];
  };
}

In the above processes model, a new process instance named webapp2 was added. It listens on a unique port and can be reached with the webapp2.local virtual host value.

By running the following command, the system in the container gets upgraded:

$ nixproc-supervisord-switch

resulting in two webapp process instances running in the container:

$ supervisorctl
nginx RUNNING pid 847, uptime 0:00:08
webapp RUNNING pid 459, uptime 0:05:54
webapp2 RUNNING pid 846, uptime 0:00:08

The first instance: webapp was left untouched, because its configuration was not changed.

The second instance: webapp2 can be reached as follows:

$ curl -H 'Host: webapp2.local' http://localhost:8080
<!DOCTYPE html>
<title>Simple test webapp</title>
Simple test webapp listening on port: 5001

After upgrading the system, the new configuration should also get reactivated after a container restart.

A more interesting example: Hydra

As explained earlier, to create upgradable containers we require a fully functional Nix installation in a container. This observation made me think about a more interesting example than the trivial web application system.

A prominent example of a system that requires Nix and is composed of multiple tightly integrated processes is Hydra: the Nix-based continuous integration service.

To make it possible to deploy a minimal Hydra service in a container, I have packaged all its relevant components for the Nix process management framework.

The processes model looks as follows:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  nix-processmgmt = builtins.fetchGit {
    url =;
    ref = "master";
  };

  nix-processmgmt-services = builtins.fetchGit {
    url =;
    ref = "master";
  };

  constructors = import "${nix-processmgmt-services}/services-agnostic/constructors.nix" {
    inherit nix-processmgmt pkgs stateDir runtimeDir logDir tmpDir cacheDir forceDisableUserChange processManager;
  };

  instanceSuffix = "";
  hydraUser = hydraInstanceName;
  hydraInstanceName = "hydra${instanceSuffix}";
  hydraQueueRunnerUser = "hydra-queue-runner${instanceSuffix}";
  hydraServerUser = "hydra-www${instanceSuffix}";
in
rec {
  nix-daemon = {
    pkg = constructors.nix-daemon;
  };

  postgresql = rec {
    port = 5432;
    postgresqlUsername = "postgresql";
    postgresqlPassword = "postgresql";
    socketFile = "${runtimeDir}/postgresql/.s.PGSQL.${toString port}";

    pkg = constructors.simplePostgresql {
      inherit port;
      authentication = ''
        local hydra all ident map=hydra-users
      '';
      identMap = ''
        hydra-users ${hydraUser} ${hydraUser}
        hydra-users ${hydraQueueRunnerUser} ${hydraUser}
        hydra-users ${hydraServerUser} ${hydraUser}
        hydra-users root ${hydraUser}
        # The postgres user is used to create the pg_trgm extension for the hydra database
        hydra-users postgresql postgresql
      '';
    };
  };

  hydra-server = rec {
    port = 3000;
    hydraDatabase = hydraInstanceName;
    hydraGroup = hydraInstanceName;
    baseDir = "${stateDir}/lib/${hydraInstanceName}";
    inherit hydraUser instanceSuffix;

    pkg = constructors.hydra-server {
      postgresqlDBMS = postgresql;
      user = hydraServerUser;
      inherit nix-daemon port instanceSuffix hydraInstanceName hydraDatabase hydraUser hydraGroup baseDir;
    };
  };

  hydra-evaluator = {
    pkg = constructors.hydra-evaluator {
      inherit nix-daemon hydra-server;
    };
  };

  hydra-queue-runner = {
    pkg = constructors.hydra-queue-runner {
      inherit nix-daemon hydra-server;
      user = hydraQueueRunnerUser;
    };
  };

  apache = {
    pkg = constructors.reverseProxyApache {
      dependency = hydra-server;
      serverAdmin = "admin@localhost";
    };
  };
}

In the above processes model, each process instance represents a component of a Hydra installation:

  • The nix-daemon process is a service that comes with the Nix package manager to facilitate multi-user package installations. The nix-daemon carries out builds on behalf of a user.

    Hydra requires it to perform builds as an unprivileged Hydra user and uses the Nix protocol to more efficiently orchestrate large builds.
  • Hydra uses a PostgreSQL database backend to store data about projects and builds.

    The postgresql process refers to the PostgreSQL database management system (DBMS) that is configured in such a way that the Hydra components are authorized to manage and modify the Hydra database.
  • hydra-server is the front-end of the Hydra service that provides a web user interface. The initialization procedure of this service is responsible for initializing the Hydra database.
  • The hydra-evaluator regularly updates the repository checkouts and evaluates the Nix expressions to decide which packages need to be built.
  • The hydra-queue-runner builds all jobs that were evaluated by the hydra-evaluator.
  • The apache server is used as a reverse proxy server forwarding requests to the hydra-server.

With the following commands, we can build the image, load it into Docker, and deploy a container that runs Hydra:

$ nix-build hydra-image.nix
$ docker load -i result
$ docker run -it --name hydra-test --network host hydra:test

After deploying the system, we can connect to the container:

$ docker exec -it hydra-test /bin/bash

and observe that all processes are running and managed by supervisord:

$ supervisorctl
apache RUNNING pid 1192, uptime 0:00:42
hydra-evaluator RUNNING pid 1297, uptime 0:00:38
hydra-queue-runner RUNNING pid 1296, uptime 0:00:38
hydra-server RUNNING pid 1188, uptime 0:00:42
nix-daemon RUNNING pid 1186, uptime 0:00:42
postgresql RUNNING pid 1187, uptime 0:00:42

With the following commands, we can create our initial admin user:

$ su - hydra
$ hydra-create-user sander --password secret --role admin
creating new user `sander'

We can connect to the Hydra front-end in a web browser by opening http://localhost (this works because the container uses host networking):

and configure a job set to build a project, such as libprocreact:

Another nice bonus feature of having multiple process managers supported is that if we build Hydra's Nix process management configuration for Disnix, we can also visualize the deployment architecture of the system with disnix-visualize:

The above diagram displays the following properties:

  • The outer box indicates that we are deploying to a single machine: localhost
  • The inner box indicates that all components are managed as processes
  • The ovals correspond to process instances in the processes model and the arrows denote dependency relationships.

    For example, the apache reverse proxy has a dependency on hydra-server, meaning that the latter process instance should be deployed first, otherwise the reverse proxy is not able to forward requests to it.
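The ordering constraint expressed by the arrows can be illustrated with the standard tsort utility. This is only a sketch: it is not how Disnix or the framework computes the activation order, and the dependency pairs below are read off the inherit and dependency parameters in the processes model shown earlier.

```shell
#!/bin/sh
# Each input line is "<dependency> <dependent>": the left item must be
# activated before the right one.
activation_order() {
    tsort <<'EOF'
nix-daemon hydra-server
postgresql hydra-server
nix-daemon hydra-evaluator
hydra-server hydra-evaluator
nix-daemon hydra-queue-runner
hydra-server hydra-queue-runner
hydra-server apache
EOF
}

# Prints one valid activation order, one process per line
activation_order
```

Several orders are valid (for example, nix-daemon and postgresql may be swapped), but hydra-server always appears before apache.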

Building a Nix-enabled container image

As explained in the previous section, mutable Docker images require a fully functional Nix package manager in the container.

Since this may also be an interesting sub use case, I have created a convenience function: createNixImage that can be used to build an image whose only purpose is to provide a working Nix installation:

let
  pkgs = import <nixpkgs> {};

  nix-processmgmt = builtins.fetchGit {
    url =;
    ref = "master";
  };

  createNixImage = import "${nix-processmgmt}/nixproc/create-image-from-steps/create-nix-image.nix" {
    inherit pkgs;
  };
in
createNixImage {
  name = "foobar";
  tag = "test";
  contents = [ ];
}

The above Nix expression builds a Docker image with a working Nix setup and a custom package: the Midnight Commander.


In this blog post, I have described a new function in the Nix process management framework: createMutableMultiProcessImage that creates reproducible mutable multi-process container images, by combining the reproducibility properties of Docker and Nix. With the exception of the process manager, process instances in a container can be upgraded without bringing the entire container down.

With this new functionality, the deployment workflow of a multi-process container configuration has become very similar to how physical and virtual machines are managed with NixOS -- you can edit a declarative specification of a system and run a single command-line instruction to deploy the new configuration.

Moreover, this new functionality allows us to deploy a complex, tightly coupled multi-process system, such as Hydra: the Nix-based continuous integration service. In the Hydra example case, we are using Nix for three deployment aspects: constructing the Docker image, deploying the multi-process system configuration and building the projects that are configured in Hydra.

A big drawback of mutable multi-process images is that there is no sharing possible between multiple multi-process containers. Since the images are not built from common layers, the Nix store is private to each container, and all packages are deployed in the writable custom layer. This may lead to substantial disk and RAM overhead per container instance.

Deploying the processes model to a container instance can probably be made more convenient by using Nix flakes -- a new Nix feature that is still experimental. With flakes we can easily deploy an arbitrary number of Nix expressions to a container and pin the deployment to a specific version of Nixpkgs.

Another interesting observation is the word: mutable. I am not completely sure if it is appropriate -- both the layers of a Docker image, as well as the Nix store paths are immutable and never change after they have been built. For both solutions, immutability is an important ingredient in making sure that a deployment is reproducible.

I have decided to still call these deployments mutable, because I am looking at the problem from a Docker perspective -- the writable layer of the container (that is mounted on top of the immutable layers of an image) is modified each time that we upgrade a system.

Future work

Although I am quite happy with the ability to create mutable multi-process containers, there is still quite a bit of work that needs to be done to make the Nix process management framework more usable.

Most importantly, trying to deploy Hydra revealed all kinds of regressions in the framework. To cope with all these breaking changes, a structured testing approach is required. Currently, such an approach is completely absent.

I could also (in theory) automate the still missing parts of Hydra. For example, I have not automated the process that updates the garbage collector roots, which needs to run in a timely manner. To solve this, I need to use a cron service or systemd timer units, which is beyond the scope of my experiment.


The createMutableMultiProcessImage function is part of the experimental Nix process management framework GitHub repository that is still under heavy development.

Because the number of services that can be deployed with the framework has grown considerably, I have moved all non-essential services (not required for testing) into a separate repository. The Hydra constructor functions can be found in this repository as well.

by Sander van der Burg at February 24, 2021 09:46 PM

February 17, 2021

Tweag I/O

Derivation outputs in a content-addressed world

This is another blog post on the upcoming content-addressed derivations for Nix. We’ve already explained the feature and some of its advantages, as well as the reasons why it isn’t easy to implement. Now we’re going to talk about a concrete user-facing change that this feature will entail: the distinction between “derivation outputs” and “output paths”.

Note that the changes presented here might not yet be implemented or merged upstream.

Store paths

Store paths are pervasive in Nix. Run nix build? This will return a store path. Want to move derivation outputs around? Just nix copy their store path. Even if you can run nix copy nixpkgs#hello, this is strictly equivalent to nix build nixpkgs#hello --out-link hello && nix copy $(realpath ./hello). Need to know whether a derivation has been built locally or in a binary cache? Just check whether its output path exists.

This is really nice in a world where the outputs of derivations are input-addressed, because there’s a direct mapping between a derivation and its output paths (the .drv file actually contains them explicitly), which means that given a derivation, Nix can directly know what its output paths are.

However this falls short with the addition of content-addressed derivations: if hello is content-addressed then I can’t introspect the derivation to know its output path anymore (see the previous post on that topic). Locally, Nix has a database that stores (amongst other things) which derivation produced which outputs, meaning that it knows that hello has already been built and that its output path is /nix/store/1234-hello. But if I just copy this output path to another machine, and try to rebuild hello there, Nix won’t be able to know that its output path is already there (because it doesn’t have that mapping), so it will have to rebuild the derivation, only to realise that it yields the output path /nix/store/1234-hello that’s already there and discard the result.

This is very frustrating, as it means that the following won’t work:

$ nix copy --to ssh://somewhereelse nixpkgs.hello
# Try to build with `--max-jobs 0` to make it fail if it needs to rebuild anything
$ ssh somewhereelse nix build nixpkgs.hello --max-jobs 0
error: --- Error ----- nix
252 derivations need to be built, but neither local builds ('--max-jobs') nor remote builds ('--builders') are enabled

We could ask for Nix to copy the mapping between the hello derivation and its output paths as well, but many derivations might have produced this path, so if we just say nix copy --to ssh://somewhereelse /nix/store/1234-hello, does that mean that we want to copy the 1234-hello store path? Or 1234-hello the output of the hello derivation? Or 1234-hello the output of the hello2 derivation? Nix has no way to know that.

This means that we need another way to identify these outputs other than just their store paths.

Introducing derivation outputs

The one thing that we know though, and that uniquely identifies the derivation, is the hash of the derivation itself. The derivation for hello will be stored as /nix/store/xxx-hello.drv (where xxx is a hash), and that xxx definitely identifies the derivation (and is known before the build). As this derivation might have several outputs, we need to append the name of the considered output to get a truly unique identifier, giving us /nix/store/xxx-hello.drv!out.

So given this “derivation output id”, Nix will be able to both retrieve the corresponding output path (if it has been built), and know the mapping (derivation, outputName) -> outputPath.
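To make the shape of such an identifier concrete, here is a minimal shell sketch that splits a derivation output id into the (derivation, outputName) pair used as the key of that mapping. The function name split_drv_output is made up for this illustration; Nix performs this parsing internally and its actual implementation may differ.

```shell
#!/bin/sh
# Split "<drv path>!<output name>" into its two components using
# POSIX parameter expansion.
split_drv_output() {
    id=$1
    drv_path=${id%'!'*}       # everything before the last "!"
    output_name=${id##*'!'}   # everything after the last "!"
    printf '%s %s\n' "$drv_path" "$output_name"
}

split_drv_output '/nix/store/xxx-hello.drv!out'
# prints: /nix/store/xxx-hello.drv out
```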

With this in hand, we now can run nix copy --to ssh://somewhereelse /nix/store/xxx-hello.drv!out, which will both copy the output path /nix/store/1234-hello and register on the remote machine that this path is the output out of the derivation /nix/store/xxx-hello.drv. Likewise, nix copy nixpkgs.hello will be a shortcut for nix copy /nix/store/xxx-hello.drv. And now we can do

$ nix copy --to ssh://somewhereelse nixpkgs.hello
# Try to build with `--max-jobs 0` to make it fail if it needs to rebuild anything
$ ssh somewhereelse nix build nixpkgs.hello --max-jobs 0
$ ./result/bin/hello
Hello, world!

In practice

What will this mean in practice?

This means that the Nix cli will now return or accept either store paths or derivation output ids depending on the context. For example nix build will still create symlinks to the output paths and nix shell will add them to the PATH because that’s what makes sense in the context. But as we’ve seen above, nix copy will accept both store paths and derivation output ids, and these will have different semantics. Copying store paths will just copy the store paths as it used to do (in the case you don’t care about rebuilding them on the other side) while copying derivation outputs will also register these outputs on the remote side.

Once more, right now, this feature is still under development, so the changes presented here might not yet be implemented or merged upstream. So don’t be surprised when the feature lands in the near future!

February 17, 2021 12:00 AM

February 01, 2021

Sander van der Burg

Developing an s6-rc backend for the Nix process management framework

One of my major blog topics last year was my experimental Nix process management framework, that is still under heavy development.

As explained in many of my earlier blog posts, one of its major objectives is to facilitate high-level deployment specifications of running processes that can be translated to configurations for all kinds of process managers and deployment solutions.

The backends that I have implemented so far, were picked for the following reasons:

  • Multiple operating systems support. The most common process management service was chosen for each operating system: On Linux, sysvinit (because this used to be the most common solution) and systemd (because it is used by many conventional Linux distributions today), bsdrc on FreeBSD, launchd for macOS, and cygrunsrv for Cygwin.
  • Supporting unprivileged user deployments. To supervise processes without requiring a service that runs on PID 1, that also works for unprivileged users, supervisord is very convenient because it was specifically designed for this purpose.
  • Docker was selected because it is a very popular solution for managing services, and process management is one of its sub responsibilities.
  • Universal process management. Disnix was selected because it can be used as a primitive process management solution that works on any operating system supported by the Nix package manager. Moreover, the Disnix services model is a superset of the processes model used by the process management framework.

Not long after writing my blog post about the process manager-agnostic abstraction layer, somebody opened an issue on GitHub with the suggestion to also support s6-rc. Although I was already aware that more process/service management solutions exist, s6-rc was a solution that I did not know about.

Recently, I have implemented the suggested s6-rc backend. Although deploying s6-rc services now works quite conveniently, getting to know s6-rc and its companion tools was somewhat challenging for me.

In this blog post, I will elaborate on my learning experiences and explain how the s6-rc backend was implemented.

The s6 tool suite

s6-rc is a software project published on skarnet and part of a bigger tool ecosystem. s6-rc is a companion tool of s6: skarnet.org's small & secure supervision software suite.

On Linux and many other UNIX-like systems, the initialization process (typically /sbin/init) is a highly critical program:

  • It is the first program loaded by the kernel and responsible for setting the remainder of the boot procedure in motion. This procedure is responsible for mounting additional file systems, loading device drivers, and starting essential system services, such as SSH and logging services.
  • The PID 1 process supervises all processes that were directly loaded by it, as well as indirect child processes that get orphaned -- when this happens they get automatically adopted by the process that runs as PID 1.

    As explained in an earlier blog post, traditional UNIX services that daemonize on their own, deliberately orphan themselves so that they remain running in the background.
  • When a child process terminates, the parent process must take notice or the terminated process will stay behind as a zombie process.

    Because the PID 1 process is the common ancestor of all other processes, it is required to automatically reap all relevant zombie processes that become a child of it.
  • The PID 1 process runs with root privileges and, as a result, has full access to the system. When the security of the PID 1 process gets compromised, the entire system is at risk.
  • If the PID 1 process crashes, the kernel crashes (and hence the entire system) with a kernel panic.

There are many kinds of programs that you can use as a system's PID 1. For example, you can directly use a shell, such as bash, but it is far more common to use an init system, such as sysvinit or systemd.

According to the author of s6, an init system is made out of four parts:

  1. /sbin/init: the first userspace program that is run by the kernel at boot time (not counting an initramfs).
  2. pid 1: the program that will run as process 1 for most of the lifetime of the machine. This is not necessarily the same executable as /sbin/init, because /sbin/init can exec into something else.
  3. a process supervisor.
  4. a service manager.

In the s6 tool eco-system, most of these parts are implemented by separate tools:

  • The first userspace program: s6-linux-init takes care of the coordination of the initialization process. It does a variety of one-time boot things: for example, it traps the ctrl-alt-del keyboard combination, it starts the shutdown daemon (that is responsible for eventually shutting down the system), and runs the initial boot script (rc.init).

    (As a sidenote: this is almost true -- the /sbin/init process is a wrapper script that "execs" into s6-linux-init with the appropriate parameters).
  • When the initialization is done, s6-linux-init execs into a process called s6-svscan provided by the s6 toolset. s6-svscan's task is to supervise an entire process supervision tree, which I will explain later.
  • Starting and stopping services is done by a separate service manager started from the rc.init script. s6-rc is the most prominent option (that we will use in this blog post), but other tools can also be used.

Many conventional init systems implement most (or sometimes all) of these aspects in a single executable.

In particular, the s6 author is highly critical of systemd: the init system that is widely used by many conventional Linux distributions today -- he dedicated an entire page with criticisms about it.

The author of s6 advocates a number of design principles for his tool eco-system (that systemd violates in many ways):

  • The Unix philosophy: do one job and do it well.
  • Doing less instead of more (preventing feature creep).
  • Keeping tight quality control over every tool by opening up repository access only to small teams (or rather a single person).
  • Integration support: he is against the bazaar approach on project level, but in favor of the bazaar approach on an eco-system level in which everybody can write their own tools that integrate with existing tools.

The concepts implemented by the s6 tool suite were not completely "invented" from scratch. daemontools is what the author considers the ancestor of s6 (if you look at the web page then you will notice that the concept of a "supervision tree" was pioneered there and that some of the tools listed resemble the same tools in the s6 tool suite), and runit its cousin (that is also heavily inspired by daemontools).

A basic usage scenario of s6 and s6-rc

Although it is possible to use Linux distributions in which the init system, supervisor and service manager are all provided by skarnet tools, a subset of s6 and s6-rc can also be used on any Linux distribution and other supported operating systems, such as the BSDs.

Root privileges are not required to experiment with these tools.

For example, with the following command we can use the Nix package manager to deploy the s6 supervision toolset in a development shell session:

$ nix-shell -p s6

In this development shell session, we can start the s6-svscan service as follows:

$ mkdir -p $HOME/var/run/service
$ s6-svscan $HOME/var/run/service

The s6-svscan is a service that supervises an entire process supervision tree, including processes that may accidentally become a child of it, such as orphaned processes.

The directory parameter is a scan directory that maintains the configurations of the processes that are currently supervised. So far, no supervised processes have been deployed yet.

We can actually deploy services by using the s6-rc toolset.

For example, I can easily configure my trivial example system used in previous blog posts that consists of one or multiple web application processes (with an embedded HTTP server) returning static HTML pages and an Nginx reverse proxy that forwards requests to one of the web application processes based on the appropriate virtual host header.

Contrary to the other process management solutions that I have investigated earlier, s6-rc does not have an elaborate configuration language. It does not implement a parser (for very good reasons as explained by the author, because it introduces extra complexity and bugs).

Instead, you have to create directories with text files, in which each file represents a configuration property.

With the following command, I can spawn a development shell with all the required utilities to work with s6-rc:

$ nix-shell -p s6 s6-rc execline

The following shell commands create an s6-rc service configuration directory and a configuration for a single webapp process instance:

$ mkdir -p sv/webapp
$ cd sv/webapp

$ echo "longrun" > type

$ cat > run <<EOF
#!$(type -p execlineb) -P

envfile $HOME/envfile
exec $HOME/webapp/bin/webapp
EOF

The above shell script creates a configuration directory for a service named: webapp with the following properties:

  • It creates a service with type: longrun. A long run service deploys a process that runs in the foreground that will get supervised by s6.
  • The run file refers to an executable that starts the service. For s6-rc services it is common practice to implement wrapper scripts using execline: a non-interactive scripting language.

    The execline script shown above loads an environment variable config file with the following content: PORT=5000. This environment variable configures the TCP port number to which the service should bind. The script then "execs" into a new process that runs the webapp process.

    (As a sidenote: although it is a common habit to use execline for writing wrapper scripts, this is not a hard requirement -- any executable implemented in any language can be used. For example, we could also write the above run wrapper script as a bash script).
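As an illustration of that sidenote, the same service directory could be generated with a run script written in plain sh. This is a hedged sketch rather than the configuration used in the rest of this post: SV_DIR is a hypothetical scratch location, and sourcing the env file only works for simple KEY=value lines such as PORT=5000.

```shell
#!/bin/sh
# SV_DIR is a hypothetical scratch location; the post itself works in ./sv.
SV_DIR=${SV_DIR:-$(mktemp -d)}
mkdir -p "$SV_DIR/webapp"

echo "longrun" > "$SV_DIR/webapp/type"

# The heredoc delimiter is unquoted, so $HOME is expanded while the file
# is generated, just like in the execline variant.
cat > "$SV_DIR/webapp/run" <<EOF
#!/bin/sh
# Export every variable assigned while sourcing the env file
set -a
. $HOME/envfile
set +a
# Replace the shell with the web application, so that the supervisor
# watches the application process itself
exec $HOME/webapp/bin/webapp
EOF
chmod +x "$SV_DIR/webapp/run"

cat "$SV_DIR/webapp/run"
```

The final exec matters: without it, the shell would stay around as the supervised process and the web application would run as its child.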

We can also configure the Nginx reverse proxy service in a similar way:

$ mkdir -p ../nginx
$ cd ../nginx

$ echo "longrun" > type
$ echo "webapp" > dependencies

$ cat > run <<EOF
#!$(type -p execlineb) -P

foreground { mkdir -p $HOME/var/nginx/logs $HOME/var/cache/nginx }
exec $(type -p nginx) "-p" "$HOME/var/nginx" "-c" "$HOME/nginx/nginx.conf" "-g" "daemon off;"
EOF

The above shell script creates a configuration directory for a service named: nginx with the following properties:

  • It again creates a service of type: longrun because Nginx should be started as a foreground process.
  • It declares the webapp service (that we have configured earlier) a dependency ensuring that webapp is started before nginx. This dependency relationship is important to prevent Nginx doing a redirect to a non-existent service.
  • The run script first creates all mandatory state directories and finally execs into the Nginx process, with a configuration file using the above state directories, and turning off daemon mode so that it runs in the foreground.

In addition to configuring the above services, we also want to deploy the system as a whole. This can be done by creating bundles that encapsulate collections of services:

$ mkdir -p ../default
$ cd ../default

$ echo "bundle" > type

$ cat > contents <<EOF
webapp
nginx
EOF

The above shell instructions create a bundle named: default referring to both the webapp and nginx reverse proxy service that we have configured earlier.

Our s6-rc configuration directory structure looks as follows:

$ find ./sv

If we want to deploy the service directory structure shown above, we first need to compile it into a configuration database. This can be done with the following command:

$ mkdir -p $HOME/etc/s6/rc
$ s6-rc-compile $HOME/etc/s6/rc/compiled-1 $HOME/sv

The above command compiles the service definitions stored in: $HOME/sv into a configuration database stored in: $HOME/etc/s6/rc/compiled-1.

With the following command we can initialize the s6-rc system with our compiled configuration database:

$ s6-rc-init -c $HOME/etc/s6/rc/compiled-1 -l $HOME/var/run/s6-rc \

The above command generates a "live directory" in: $HOME/var/run/s6-rc containing the state of s6-rc.

With the following command, we can start all services in the: default bundle:

$ s6-rc -l $HOME/var/run/s6-rc -u change default

The above command deploys a running system with the following process tree:

As can be seen in the diagram above, the entire process tree is supervised by s6-svscan (the program that we have started first). Every longrun service deployed by s6-rc is supervised by a process named: s6-supervise.

Managing service logging

Another important property of s6 and s6-rc is the way they handle logging. By default, all output that the supervised processes produce on the standard output and standard error is captured by s6-svscan and written to a single log stream (in our case, it will be redirected to the terminal).

When it is desired to capture the output of a service into its own dedicated log file, you need to configure the service in such a way that it writes all relevant information to a pipe. A companion logging service is required to capture the data that is sent over the pipe.

The following command-line instructions modify the webapp service (that we have created earlier) to let it send its output to another service:

$ cd sv
$ mv webapp webapp-srv
$ cd webapp-srv

$ echo "webapp-log" > producer-for
$ cat > run <<EOF
#!$(type -p execlineb) -P

envfile $HOME/envfile
fdmove -c 2 1
exec $HOME/webapp/bin/webapp
EOF

In the script above, we have changed the webapp service configuration as follows:

  • We rename the service from: webapp to webapp-srv. Using suffixes is a convention commonly used for s6-rc services that also have a log companion service.
  • With the producer-for property, we specify that the webapp-srv is a service that produces output for another service named: webapp-log. We will configure this service later.
  • We create a new run script that adds the following command: fdmove -c 2 1.

    The purpose of this added instruction is to redirect all output that is sent over the standard error (file descriptor: 2) to the standard output (file descriptor: 1). This redirection makes it possible that all data can be captured by the log companion service.
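The effect of this redirection can be tried out in an ordinary shell, where 2>&1 plays the same role as fdmove -c 2 1. The sketch below uses a made-up noisy_service function as a stand-in for a real service and counts how many lines arrive on the reading end of a pipe.

```shell
#!/bin/sh
# A stand-in for a service that writes to both output streams (hypothetical).
noisy_service() {
    echo "listening on port 5000"
    echo "warning: something happened" >&2
}

# Without the redirection, only the standard output travels through the pipe.
# The standard error is discarded here to keep the demo quiet; under s6 it
# would end up in the catch-all log stream instead.
without=$(noisy_service 2>/dev/null | wc -l)

# With the redirection, both lines reach the consumer of the pipe.
with=$(noisy_service 2>&1 | wc -l)

echo "lines captured without redirect: $((without))"
echo "lines captured with redirect: $((with))"
```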

We can configure the log companion service: webapp-log with the following command-line instructions:

$ mkdir ../webapp-log
$ cd ../webapp-log

$ echo "longrun" > type
$ echo "webapp-srv" > consumer-for
$ echo "webapp" > pipeline-name
$ echo 3 > notification-fd

$ cat > run <<EOF
#!$(type -p execlineb) -P

foreground { mkdir -p $HOME/var/log/s6-log/webapp }
exec -c s6-log -d3 $HOME/var/log/s6-log/webapp
EOF

The service configuration created above does the following:

  • We create a service named: webapp-log that is a long running service.
  • We declare the service to be a consumer for the webapp-srv (earlier, we have already declared the companion service: webapp-srv to be a producer for this logging service).
  • We configure a pipeline name: webapp causing s6-rc to automatically generate a bundle with the name: webapp in which all involved services are its contents.

    This generated bundle allows us to always manage the service and logging companion as a single deployment unit.
  • The s6-log service supports readiness notifications. File descriptor: 3 is configured to receive that notification.
  • The run script creates the log directory in which the output should be stored and starts the s6-log service to capture the output and store the data in the corresponding log directory.

    The -d3 parameter instructs it to send a readiness notification over file descriptor 3.
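Because the same steps have to be repeated for every longrun service, they can be wrapped in a small helper. The following is a sketch under assumptions: add_log_companion is a hypothetical function that is not part of s6-rc, it writes the logger's run script in plain sh rather than execline, and it does not touch the producer's own run script, which still needs the standard error redirection described earlier.

```shell
#!/bin/sh
# Usage: add_log_companion <source dir> <service name> <log base dir>
add_log_companion() {
    sv=$1
    name=$2
    logdir=$3

    # Rename the service and declare it a producer for the logger
    mv "$sv/$name" "$sv/$name-srv"
    echo "$name-log" > "$sv/$name-srv/producer-for"

    # Create the consumer side with the same properties as webapp-log
    mkdir -p "$sv/$name-log"
    echo "longrun" > "$sv/$name-log/type"
    echo "$name-srv" > "$sv/$name-log/consumer-for"
    echo "$name" > "$sv/$name-log/pipeline-name"
    echo 3 > "$sv/$name-log/notification-fd"

    # The unquoted heredoc expands $logdir and $name at generation time
    cat > "$sv/$name-log/run" <<EOF
#!/bin/sh
mkdir -p $logdir/$name
exec s6-log -d3 $logdir/$name
EOF
}

# Demo on a scratch directory that contains a bare nginx service
demo=$(mktemp -d)
mkdir -p "$demo/sv/nginx"
echo "longrun" > "$demo/sv/nginx/type"
add_log_companion "$demo/sv" nginx "$HOME/var/log/s6-log"
find "$demo/sv" -type f | sort
```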

After modifying the configuration files in such a way that each longrun service has a logging companion, we need to compile a new database that provides s6-rc our new configuration:

$ s6-rc-compile $HOME/etc/s6/rc/compiled-2 $HOME/sv

The above command creates a database with a new filename in: $HOME/etc/s6/rc/compiled-2. We are required to give it a new name -- the old configuration database (compiled-1) must be retained to make the upgrade process work.
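Picking a fresh name by hand becomes tedious after a few upgrades. A minimal sketch of how the next unused name could be derived is shown below; next_compiled_name is a hypothetical helper of my own and not an s6-rc tool, and it assumes that all databases follow the compiled-N naming scheme used in this post.

```shell
#!/bin/sh
# Print the path of the next unused compiled-N database in a directory.
next_compiled_name() {
    dir=$1
    max=0
    for path in "$dir"/compiled-*; do
        [ -e "$path" ] || continue      # the glob matched nothing
        n=${path##*compiled-}
        case $n in
            ''|*[!0-9]*) continue ;;    # skip names without a numeric suffix
        esac
        if [ "$n" -gt "$max" ]; then
            max=$n
        fi
    done
    echo "$dir/compiled-$((max + 1))"
}

# Demo with two existing databases in a scratch directory
rc=$(mktemp -d)
mkdir "$rc/compiled-1" "$rc/compiled-2"
next_compiled_name "$rc"
```

The result could then be passed as the first argument to s6-rc-compile.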

With the following command, we can upgrade our running configuration:

$ s6-rc-update -l $HOME/var/run/s6-rc $HOME/etc/s6/rc/compiled-2

The result is the following process supervision tree:

As you may observe by looking at the diagram above, every service has a companion s6-log service that is responsible for capturing and storing its output.

The log files of the services can be found in $HOME/var/log/s6-log/webapp and $HOME/var/log/s6-log/nginx.

One shot services

In addition to longrun services that are useful for managing system services, more aspects need to be automated in a boot process, such as mounting file systems.

These kinds of tasks can be automated with oneshot services, that execute an up script on startup, and optionally, a down script on shutdown.

The following service configuration can be used to mount the kernel's /proc filesystem:

$ mkdir -p ../mount-proc
$ cd ../mount-proc

$ echo "oneshot" > type

$ cat > up <<EOF
#!$(type -p execlineb) -P
foreground { mount -t proc proc /proc }
EOF

Chain loading

The execline scripts shown in this blog post resemble shell scripts in many ways. One particular aspect that sets execline scripts apart from shell scripts is that all commands make intensive use of a concept called chain loading.

Every instruction in an execline script executes a task, may imperatively modify the environment (e.g. by changing environment variables, or changing the current working directory etc.) and then "execs" into a new chain loading task.

The last parameter of each command-line instruction refers to the command-line instruction that it needs to "exec into" -- typically this command-line instruction is put on the next line.

The execline package, as well as many packages in the s6 ecosystem, contains many programs that support chain loading.

It is also possible to implement custom chain loaders that follow the same protocol.
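A custom chain loader can be as small as the following sketch: a program that changes its own environment and then execs into the remainder of its command line. The name with-port is invented for this example and is not a program shipped with execline or s6.

```shell
#!/bin/sh
# Generate a tiny chain loader in a scratch directory
bin=$(mktemp -d)
cat > "$bin/with-port" <<'EOF'
#!/bin/sh
# Usage: with-port <port> <prog> [args...]
PORT=$1
export PORT
shift
# Replace this process with the remaining command line
exec "$@"
EOF
chmod +x "$bin/with-port"

# Chain two steps: with-port sets PORT and execs into sh, which prints it
"$bin/with-port" 5000 sh -c 'echo "PORT is $PORT"'
# prints: PORT is 5000
```

Because the loader ends with exec, no intermediate process stays behind: the process ID seen by a supervisor remains the same along the whole chain.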

Developing s6-rc function abstractions for the Nix process management framework

In the Nix process management framework, I have added function abstractions for each s6-rc service type: longrun, oneshot and bundle.

For example, with the following Nix expression we can generate an s6-rc longrun configuration for the webapp process:

{createLongRunService, writeTextFile, execline, webapp}:

let
  envFile = writeTextFile {
    name = "envfile";
    text = ''
      PORT=5000
    '';
  };
in
createLongRunService {
  name = "webapp";
  run = writeTextFile {
    name = "run";
    executable = true;
    text = ''
      #!${execline}/bin/execlineb -P

      envfile ${envFile}
      fdmove -c 2 1
      exec ${webapp}/bin/webapp
    '';
  };
  autoGenerateLogService = true;
}

Evaluating the Nix expression above does the following:

  • It generates a service directory that corresponds to the: name parameter with a longrun type property file.
  • It generates a run execline script, that uses a generated envFile for configuring the service's port number, redirects the standard error to the standard output and starts the webapp process (that runs in the foreground).
  • The autoGenerateLogService parameter is a concept I introduced myself, to conveniently configure a companion log service, because this is a very common operation -- I cannot think of any scenario in which you do not want to have a dedicated log file for a long running service.

    Enabling this option causes the service to automatically become a producer for the log companion service (having the same name with a -log suffix) and automatically configures a logging companion service that consumes from it.

In addition to constructing long run services from Nix expressions, there are also abstraction functions to create one shots: createOneShotService and bundles: createServiceBundle.

The function that generates a log companion service can also be directly invoked with: createLogServiceForLongRunService, if desired.

Generating an s6-rc service configuration from a process-manager agnostic configuration

The following Nix expression is a process manager-agnostic configuration for the webapp service, that can be translated to a configuration for any supported process manager in the Nix process management framework:

{createManagedProcess, tmpDir}:
{port, instanceSuffix ? "", instanceName ? "webapp${instanceSuffix}"}:

let
  webapp = import ../../webapp;
in
createManagedProcess {
  name = instanceName;
  description = "Simple web application";
  inherit instanceName;

  process = "${webapp}/bin/webapp";
  daemonArgs = [ "-D" ];

  environment = {
    PORT = port;
  };

  overrides = {
    sysvinit = {
      runlevels = [ 3 4 5 ];
    };
  };
}

The Nix expression above specifies the following high-level configuration concepts:

  • The name and description attributes are just meta data. The description property is ignored by the s6-rc generator, because s6-rc has no equivalent configuration property for capturing a description.
  • A process manager-agnostic configuration can specify both how the service can be started as a foreground process or as a process that daemonizes itself.

    In the above example, the process attribute specifies that the same executable needs to be invoked for both a foregroundProcess and daemon. The daemonArgs parameter specifies the command-line arguments that need to be propagated to the executable to let it daemonize itself.

    s6-rc has a preference for managing foreground processes, because these can be more reliably managed. When a foregroundProcess executable can be inferred, the generator will automatically compose a longrun service making it possible for s6 to supervise it.

    If only a daemon can be inferred, the generator will compose a oneshot service that starts the daemon with the up script, and on shutdown, terminates the daemon by dereferencing the PID file in the down script.
  • The environment attribute set parameter is automatically translated to an envfile that the generated run script consumes.
  • Similar to the sysvinit backend, it is also possible to override the generated arguments for the s6-rc backend, if desired.
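
The choice between a longrun and a oneshot service described above can be sketched as follows. This is a simplified Python model, not the framework's actual generator (which is written in Nix); the function name select_s6_rc_service and the shape of the returned dictionary are assumptions made for illustration, while process, foregroundProcess, daemon and daemonArgs are the configuration properties mentioned in the text:

```python
def select_s6_rc_service(config):
    """Pick an s6-rc service type from a process manager-agnostic config.

    Hypothetical model: `process` acts as a fallback for both
    `foregroundProcess` and `daemon`, as described in the text.
    """
    process = config.get("process")
    foreground = config.get("foregroundProcess", process)
    daemon = config.get("daemon", process)

    if foreground is not None:
        # Foreground processes are preferred, because s6 can supervise them.
        return {"type": "longrun", "run": [foreground]}
    if daemon is not None:
        # Only a daemon is known: start it from a oneshot's up script.
        return {"type": "oneshot", "up": [daemon, *config.get("daemonArgs", [])]}
    raise ValueError("neither a foreground process nor a daemon can be inferred")


webapp = {"process": "/bin/webapp", "daemonArgs": ["-D"]}
daemon_only = {"daemon": "/bin/webapp", "daemonArgs": ["-D"]}
print(select_s6_rc_service(webapp)["type"])       # longrun
print(select_s6_rc_service(daemon_only)["type"])  # oneshot
```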

As already explained in the blog post that covers the framework's concepts, the Nix expression above needs to be complemented with a constructors expression that composes the common parameters of every process configuration and a processes model that constructs process instances that need to be deployed.

The following processes model can be used to deploy a webapp process and an nginx reverse proxy instance that connects to it:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir;
    inherit forceDisableUserChange processManager;
  };
in
rec {
  webapp = rec {
    port = 5000;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };
  };

  nginx = rec {
    port = 8080;

    pkg = constructors.nginxReverseProxyHostBased {
      webapps = [ webapp ];
      inherit port;
    } {};
  };
}

With the following command-line instruction, we can automatically create a scan directory and start s6-svscan:

$ nixproc-s6-svscan --state-dir $HOME/var

The --state-dir parameter causes the scan directory to be created in the user's home directory, making unprivileged deployments possible.

With the following command, we can deploy the entire system, that will get supervised by the s6-svscan service that we just started:

$ nixproc-s6-rc-switch --state-dir $HOME/var \
--force-disable-user-change processes.nix

The --force-disable-user-change parameter prevents the deployment system from creating users and groups and changing user privileges, allowing the deployment as an unprivileged user to succeed.

The result is a running system that allows us to connect to the webapp service via the Nginx reverse proxy:

$ curl -H 'Host: webapp.local' http://localhost:8080
<!DOCTYPE html>
<title>Simple test webapp</title>
Simple test webapp listening on port: 5000

Constructing multi-process Docker images supervised by s6

Another feature of the Nix process management framework is constructing multi-process Docker images in which multiple process instances are supervised by a process manager of choice.

s6 can also be used as a supervisor in a container. To accomplish this, we can use s6-linux-init as an entry point.

The following attribute generates a skeleton configuration directory:

skelDir = pkgs.stdenv.mkDerivation {
  name = "s6-skel-dir";
  buildCommand = ''
    mkdir -p $out
    cd $out

    cat > rc.init <<EOF
    #! ${} -e

    # Stage 1
    s6-rc-init -c /etc/s6/rc/compiled /run/service

    # Stage 2
    s6-rc -v2 -up change default
    EOF

    chmod 755 rc.init

    cat > rc.shutdown <<EOF
    #! ${} -e

    exec s6-rc -v2 -bDa change
    EOF

    chmod 755 rc.shutdown

    cat > <<EOF
    #! ${} -e
    # Empty
    EOF
    chmod 755
  '';
};

The skeleton directory generated by the above sub expression contains three configuration files:

  • rc.init is the script that the init system starts, right after starting the supervisor: s6-svscan. It is responsible for initializing the s6-rc system and starting all services in the default bundle.
  • rc.shutdown is executed on shutdown and stops all services previously started by s6-rc.
  • runs at the very end of the shutdown procedure, after all processes have been killed and all file systems have been unmounted. In the above expression, it does nothing.

In the initialization process of the image (the runAsRoot parameter of dockerTools.buildImage), we need to execute a number of dynamic initialization steps.

First, we must initialize s6-linux-init to read its configuration files from /etc/s6/current using the skeleton directory (that we have configured in the sub expression shown earlier) as its initial contents (the -f parameter) and run the init system in container mode (the -C parameter):

mkdir -p /etc/s6
s6-linux-init-maker -c /etc/s6/current -p /bin -m 0022 -f ${skelDir} -N -C -B /etc/s6/current
mv /etc/s6/current/bin/* /bin
rmdir /etc/s6/current/bin

s6-linux-init-maker generates a /bin/init script that we can use as the container's entry point.

I want the logging services to run as an unprivileged user (s6-log), which requires me to create the user and corresponding group first:

groupadd -g 2 s6-log
useradd -u 2 -d /dev/null -g s6-log s6-log

We must also compile a database from the s6-rc configuration files, by running the following command-line instructions:

mkdir -p /etc/s6/rc
s6-rc-compile /etc/s6/rc/compiled ${profile}/etc/s6/sv

As can be seen in the rc.init script that we have generated earlier, the compiled database: /etc/s6/rc/compiled is propagated to s6-rc-init as a command-line parameter.

With the following Nix expression, we can build an s6-rc managed multi-process Docker image that deploys all the process instances in the processes model that we have written earlier:

let
  pkgs = import <nixpkgs> {};

  createMultiProcessImage = import ../../nixproc/create-multi-process-image/create-multi-process-image-universal.nix {
    inherit pkgs system;
    inherit (pkgs) dockerTools stdenv;
  };
in
createMultiProcessImage {
  name = "multiprocess";
  tag = "test";
  exprFile = ./processes.nix;
  stateDir = "/var";
  processManager = "s6-rc";
}

With the following command, we can build the image:

$ nix-build

and load the image into Docker with the following command:

$ docker load -i result


With the addition of the s6-rc backend in the Nix process management framework, we have a modern alternative to systemd at our disposal.

We can easily let services be managed by s6-rc using the same agnostic high-level deployment configurations that can also be used to target other process management backends, including systemd.

What I particularly like about the s6 tool ecosystem (and this also applies to some extent to its ancestor: daemontools and cousin project: runit) is the idea to construct the entire system's initialization process and its sub concerns (process supervision, logging and service management) from separate tools, each having clear/fixed scopes.

This kind of design reminds me of microkernels -- in a microkernel design, the kernel is basically split into multiple collaborating processes each having their own responsibilities (e.g. file systems, drivers).

The microkernel is the only process that has full access to the system and typically only has very few responsibilities (e.g. memory management, task scheduling, interrupt handling).

When a process crashes, such as a driver, this failure should not tear the entire system down. Systems can even recover from problems, by restarting crashed processes.

Furthermore, these non-kernel processes typically have very few privileges. If a process' security gets compromised (such as a leaky driver), the system as a whole will not be affected.

Aside from a number of functional differences compared to systemd, there are also some non-functional differences.

systemd can only be used on Linux with glibc as the system's libc, whereas s6 can also be used on different operating systems (e.g. the BSDs) and with different libc implementations, such as musl.

Moreover, the supervisor service (s6-svscan) can also be used as a user-level supervisor that does not need to run as PID 1. Although systemd supports user sessions (allowing service deployments from unprivileged users), it still has the requirement to have systemd as an init system that needs to run as the system's PID 1.

Improvement suggestions

Although the s6 ecosystem provides useful tools and has all kinds of powerful features, I also have a number of improvement suggestions. They are mostly usability related:

  • I have noticed that the command-line tools have very brief help pages -- they only enumerate the available options, but they do not provide any additional information explaining what these options do.

    I have also noticed that there are no official manpages, but there is a third-party initiative that seems to provide them.

    The "official" source of reference is the set of HTML pages. For me personally, it is not always convenient to access HTML pages on limited machines with no Internet connection and/or only terminal access.
  • Although each individual tool is well documented (albeit in HTML), I was having quite a few difficulties figuring out how to use them together -- because every tool has a very specific purpose, you typically need to combine them in interesting ways to do something meaningful.

    For example, I could not find any clear documentation on skarnet describing typical combined usage scenarios, such as how to use s6-rc on a conventional Linux distribution that already has a different service management solution.

    Fortunately, I discovered a Linux distribution that turned out to be immensely helpful: Artix Linux. Artix Linux provides s6 as one of its supported process management solutions. I ended up installing Artix Linux in a virtual machine and reading their documentation.

    This lack of clarity seems to be somewhat analogous to common criticisms of microkernels: one of Linus Torvalds' criticisms is that in microkernel designs, the pieces are simplified, but the coordination of the entire system is more difficult.
  • Updating existing service configurations is difficult and cumbersome. Each time I want to change something (e.g. adding a new service), I need to compile a new database, make sure that the newly compiled database co-exists with the previous database, and then run s6-rc-update.

    It is very easy to make mistakes. For example, I ended up overwriting the previous database several times. When this happens, the upgrade process gets stuck.

    systemd, on the other hand, allows you to put a new service configuration file in the configuration directory, such as: /etc/systemd/system. We can conveniently reload the configuration with a single command-line instruction:

    $ systemctl daemon-reload

    I believe that the updating process can still be somewhat simplified in s6-rc. Fortunately, I have managed to hide that complexity in the nixproc-s6-rc-deploy tool.
  • It was also difficult to find out all the available configuration properties for s6-rc services -- I ended up looking at the examples and studying the documentation pages for s6-rc-compile, s6-supervise and service directories.

    I think that it could be very helpful to write a dedicated documentation page that describes all configurable properties of s6-rc services.
  • I believe it is also very common that for each longrun service (with a -srv suffix), you want a companion logging service (with a -log suffix).

    As a matter of fact, I can hardly think of a situation in which you do not want this. Maybe it helps to introduce a convenience property to automatically facilitate the generation of log companion services.


The s6-rc backend described in this blog post is part of the current development version of the Nix process management framework, that is still under heavy development.

The framework can be obtained from my GitHub page.

by Sander van der Burg at February 01, 2021 09:29 PM

January 28, 2021


Safe service upgrades using system.stateVersion

One of the most important features for system administrators who operate NixOS systems is atomic upgrades, which means that a deployment won’t reach an inconsistent state: if building a new system’s configuration succeeds, it will be activated in a single step by replacing the /run/current-system symlink. If a build fails, e.g. due to broken packages, the configuration won’t be activated. This also means that downgrades are fairly simple since a previous configuration can be reactivated in a so-called rollback by changing the symlink to /run/current-system back to the previous store-path.
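
The general technique of switching a symlink in a single step can be sketched as follows. This is a minimal Python illustration, not NixOS's actual activation code; the activate helper and the store paths are placeholders made up for the example. The new link is created under a temporary name and then renamed over the old one, so readers see either the old or the new target, never a missing link:

```python
import os
import tempfile

def activate(profile_link, new_target):
    """Point profile_link at new_target using one atomic rename."""
    tmp_link = profile_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)            # clean up a leftover from an aborted run
    os.symlink(new_target, tmp_link)   # prepare the new link next to the old one
    os.replace(tmp_link, profile_link) # atomically swap it into place

with tempfile.TemporaryDirectory() as root:
    link = os.path.join(root, "current-system")
    activate(link, "/nix/store/aaaa-system-1")  # placeholder store paths
    activate(link, "/nix/store/bbbb-system-2")  # upgrade
    upgraded = os.readlink(link)
    activate(link, "/nix/store/aaaa-system-1")  # rollback is the same operation
    rolled_back = os.readlink(link)
```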

January 28, 2021 10:23 AM

January 22, 2021

Tweag I/O

Programming with contracts in Nickel

In a previous post, I gave a taste of Nickel, a configuration language we are developing at Tweag. One cool feature of Nickel is the ability to validate data and enforce program invariants using so-called contracts. In this post, I introduce the general concept of programming with contracts and illustrate it in Nickel.

Contracts are everywhere

You go to your favorite bakery and buy a croissant. Is there a contract binding you to the baker?

A long time ago, I was puzzled by this very first question of a law class exam. It looked really simple, yet I had absolutely no clue.


A contract should write down terms and conditions, and be signed by both parties. How could buying a croissant involve such a daunting liability?

Well, I have to confess that this exam didn’t go very well.

It turns out the sheer act of selling something implicitly and automatically establishes a legally binding contract between both parties (at least in France). For once, the programming world is not that different from the physical world: if I see a ConcurrentHashMap class in a Java library, given the context of Java’s naming conventions, I rightfully expect it to be a thread-safe implementation of a hashmap. This is a form of contract. If a programmer uses ConcurrentHashMap to name a class that implements a non-thread-safe linked list, they should probably be sent to court.

Contracts may take multiple forms. A contract can be explicit, such as in a formal specification, or implicit, as in the ConcurrentHashMap example. They can be enforced or not, such as a type signature in a statically typed language versus an invariant written as a comment in a dynamically typed language. Here are a few examples:

Contract              Explicitness                              Enforced
Static types          Implicit if inferred, explicit otherwise  Yes, at compile time
Dynamic types         Implicit                                  Yes, at run-time
Documentation         Explicit                                  No
Naming                Implicit                                  No
assert() primitive    Explicit                                  Yes, at run-time
pre/post conditions   Explicit                                  Yes, at run-time or compile time

As often, explicit is better than implicit: it leaves no room for misunderstanding. Enforced is better than not, because I would rather be protected by a proper legal system in case of contract violation.

Programming with Contracts

Until now, I’ve been using the word contract in a wide sense. It turns out contracts also refer to a particular programming paradigm which embodies the general notion pretty well. Such contracts are explicit and enforced, following our terminology. They are most notably used in Racket. From now on, I shall use contract in this more specific sense.

To first approximation, contracts are assertions. They check that a value satisfies some property at run-time. If the test passes, the execution can go on normally. Otherwise, an error is raised.

In Nickel, one can enforce a contract using the | operator:

let x = (1 + 1 | Num) in 2*x

Here, x is bound to a Num contract. When evaluating x, the following steps are performed:

  1. evaluate 1 + 1
  2. check that the result is a number
  3. if it is, return the expression unchanged. Otherwise, raise an error that halts the program.

Let’s see it in action:

$ nickel <<< '1 + 1 | Num'
Done: Num(2.0)

$ nickel <<< 'false | Num'
error: Blame error: contract broken by a value.
  ┌─ :1:1
1 │ Num
  │ --- expected type
  ┌─ <stdin>:1:9
1 │ false | Num
  │         ^^^ bound here

Contracts versus types

I’ve described contracts as assertions, but the above snippet suspiciously resembles a type annotation. How do contracts compare to types? First of all, contracts are checked at run-time, so they would correspond to dynamic typing rather than static typing. Secondly, contracts can check more than just the membership to a type:

let GreaterThan2 = fun label x =>
  if builtins.isNum x then
    if x > 2 then
      x
    else
      contracts.blame (contracts.tag "smaller or equals" label)
  else
    contracts.blame (contracts.tag "not a number" label)
in

(3 | #GreaterThan2) // Ok, evaluate to 3
(1 | #GreaterThan2) // Err, `smaller or equals`
("a" | #GreaterThan2) // Err, `not a number`

Here, we just built a custom contract. A custom contract is a function of two arguments:

  • the label label, carrying information for error reporting.
  • the value x to be tested.

If the value satisfies the condition, it is returned. Otherwise, a call to blame signals rejection with an optional error message attached via tag. When evaluating value | #Contract, the interpreter calls Contract with an appropriate label and value as arguments.
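
For readers more familiar with mainstream languages, the same shape can be mimicked in Python. This is only a rough analogue under stated assumptions: BlameError, blame and apply_contract are hypothetical helpers invented for this sketch, standing in for what the Nickel interpreter does when it evaluates value | #Contract:

```python
class BlameError(Exception):
    """Raised when a value breaks a contract."""

def blame(label, message):
    # Reject the value, attaching the message to the label for error reporting.
    raise BlameError(f"{label}: {message}")

def greater_than_2(label, x):
    """Custom contract: a function of a label and the value to be tested."""
    is_number = isinstance(x, (int, float)) and not isinstance(x, bool)
    if is_number:
        if x > 2:
            return x  # the value satisfies the contract and is returned as-is
        blame(label, "smaller or equals")
    blame(label, "not a number")

def apply_contract(contract, value, label="contract"):
    """Rough stand-in for `value | #Contract`."""
    return contract(label, value)

print(apply_contract(greater_than_2, 3))  # the value passes through
for bad in (1, "a"):
    try:
        apply_contract(greater_than_2, bad)
    except BlameError as err:
        print("rejected:", err)
```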

Such custom contracts can check arbitrary properties. Enforcing the property of being greater than two using static types is rather hard, requiring a fancy type system such as refinement types, while the role of dynamic types generally stops at distinguishing basic datatypes and functions.

Back to our first example 1 + 1 | Num, we could have written instead:

let MyNum = fun label x =>
  if builtins.isNum x then x else contracts.blame label in
(1 + 1 | #MyNum)

This is in fact pretty much what 1 + 1 | Num evaluates to. While a contract is not the same entity as a type, one can derive a contract from any type. Writing 1 + 1 | Num asks the interpreter to derive a contract from the type Num and to check 1 + 1 against it. This is just a convenient syntax to specify common contracts. The # character distinguishes contracts as types from contracts as functions (that is, custom contracts).

To sum up, contracts are just glorified assertions. Also, there is this incredibly convenient syntax that spares us a whole three characters by writing Num instead of #MyNum. So… is that all the fuss is about?

Function contracts

Until now, we have only considered what are called flat contracts, which operate on data. But Nickel is a functional programming language: so what about function contracts? They exist too!

let f | Str -> Num = fun x => if x == "a" then 0 else 1 in ...

Here again, we ask Nickel to derive a contract for us, from the type Str -> Num of functions sending strings to numbers. To find out how this contract could work, we must understand what is the defining property of a function of type Str -> Num that the contract should enforce.

A function of type Str -> Num has a duty: it must produce a number. But what if I call f on a boolean? That’s unfair, because the function has also a right: the argument must be a string. The full contract is thus: if you give me a string, I give you a number. If you give me something else, you broke the contract, so I can’t guarantee anything. Another way of viewing it is that the left side of the arrow represents preconditions on the input while the right side represents postconditions on the output.

More than flat contracts, function contracts show similarities with traditional legal contracts. We have two parties: the caller, f "b", and the callee, f. Both must meet conditions: the caller must provide a string while the callee must return a number.

In practice, inspecting the term f can tell us if it is a function at most. This is because a function is inert, waiting for an argument to hand back a result. In consequence, the contract is doomed to fire only when f is applied to an argument, in which case it checks that:

  1. The argument satisfies the Str contract
  2. The return value satisfies the Num contract

The interpreter performs additional bookkeeping to be able to correctly blame the offending code in case of a higher-order contract violation:

$ nickel <<< 'let f | Str -> Num = fun x => if x == "a" then 0 else 1 in f "a"'
Done: Num(0.0)

$ nickel <<< '... in f 0'
error: Blame error: contract broken by the caller.
  ┌─ :1:1
1 │ Str -> Num
  │ --- expected type of the argument provided by the caller
  ┌─ <stdin>:1:9
1 │ let f | Str -> Num = fun x => if x == "a" then 0 else 1 in f 0
  │         ^^^^^^^^^^ bound here

$ nickel <<< 'let f | Str -> Num = fun x => x in f "a"'
error: Blame error: contract broken by a function.
  ┌─ :1:8
1 │ Str -> Num
  │        --- expected return type
  ┌─ <stdin>:1:9
1 │ let f | Str -> Num = fun x => x in f "a"
  │         ^^^^^^^^^^ bound here

These examples illustrate three possible situations:

  1. The contract is honored by both parties.
  2. The contract is broken by the caller, which provides a number instead of a string.
  3. The contract is broken by the function (callee), which rightfully got a string but returned a string instead of a number.
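
The way a function contract assigns blame can be sketched in Python as a wrapper that checks the argument before the call and the result after it. This is an illustrative model only, not how the Nickel interpreter is implemented; function_contract, is_str and is_num are hypothetical names, and the error messages are borrowed from the output shown above:

```python
class BlameError(Exception):
    """Raised when one of the two parties breaks the contract."""

def is_str(x):
    return isinstance(x, str)

def is_num(x):
    return isinstance(x, (int, float)) and not isinstance(x, bool)

def function_contract(domain, codomain):
    """Build a wrapper enforcing `domain -> codomain` each time f is applied."""
    def attach(f):
        def wrapped(arg):
            if not domain(arg):
                # Precondition violated: the caller is at fault.
                raise BlameError("contract broken by the caller")
            result = f(arg)
            if not codomain(result):
                # Postcondition violated: the function is at fault.
                raise BlameError("contract broken by a function")
            return result
        return wrapped
    return attach

str_to_num = function_contract(is_str, is_num)
f = str_to_num(lambda x: 0 if x == "a" else 1)  # honors the contract
g = str_to_num(lambda x: x)                     # returns a string: breaks it
print(f("a"))
```

Note that nothing is checked when the contract is attached; as in the text, the checks only fire once the function is applied to an argument.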

Combined with custom contracts, function contracts make it possible to succinctly express non-trivial invariants:

let f | #GreaterThan2 -> #GreaterThan2 = fun x => x + 1 in ..

A warning about laziness

Nickel is a lazy programming language. This means that expressions, including contracts, are evaluated only if they are needed. If you are experimenting with contracts and some checks buried inside lists or records do not seem to trigger, you can use the deepSeq operator to recursively force the evaluation of all subterms, including contracts:

let exp = ..YOUR CODE WITH CONTRACTS.. in builtins.deepSeq exp exp
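
The effect of laziness on checks can be imitated in Python, which is eager, by wrapping each field in a zero-argument function (a thunk). This is a loose analogy under that assumption; check_num and deep_force are hypothetical helpers, with deep_force playing the role that builtins.deepSeq plays in Nickel:

```python
def check_num(x):
    """A check that only runs when something asks for the value."""
    if not isinstance(x, (int, float)) or isinstance(x, bool):
        raise TypeError(f"expected a number, got {x!r}")
    return x

# Each field is a thunk, so building the record triggers no check at all.
record = {
    "ok": lambda: check_num(1),
    "bad": lambda: check_num("oops"),
}

def deep_force(value):
    """Recursively evaluate every thunk nested in dicts and lists."""
    if callable(value):
        return deep_force(value())
    if isinstance(value, dict):
        return {key: deep_force(item) for key, item in value.items()}
    if isinstance(value, list):
        return [deep_force(item) for item in value]
    return value

print(record["ok"]())      # only this field's check runs
try:
    deep_force(record)     # forcing everything reveals the buried failure
except TypeError as err:
    print("check failed:", err)
```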


In this post, I introduced programming with contracts. Contracts offer a principled and ergonomic way of validating data and enforcing invariants with a good error reporting story. Contracts can express arbitrary properties that are hard to enforce statically, and they can handle higher-order functions.

Contracts also have a special relationship with static typing. While we compared them as competitors somehow, contracts and static types are actually complementary, reunited in the setting of gradual typing. Nickel has gradual types, which will be the subject of a coming post.

The examples here are illustrative, but we’ll see more specific and compelling usages of contracts in yet another coming post about Nickel’s meta-values, which, together with contracts, serve as a unified way to describe and validate configurations.

January 22, 2021 12:00 AM

January 13, 2021

Finding Non-determinism with

During the last decade, many initiatives focussing on making builds reproducible have gained momentum. is a great resource for anyone interested in how the work progresses in multiple software communities. tracks the current reproducibility metrics in NixOS.

Nix is particularly suited for working on reproducibility, since it by design isolates builds and comes with tools for finding non-determinism. The Nix community also works on related projects, like Trustix and the content-addressed store.

This blog post summarises how can be useful for finding non-deterministic builds, and announces a new feature related to reproducibility!

Repeated Builds

The way to find non-reproducible builds is to run the same build multiple times and check for any difference in results, when compared bit-for-bit. Since Nix guarantees that all inputs will be identical between the runs, just finding differing output results is enough to conclude that a build is non-deterministic. Of course, we can never prove that a build is deterministic this way, but if we run the build many times, we gain a certain confidence in it.
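
The comparison itself is simple enough to sketch in a few lines of Python. This is an illustration of the idea only, not of how Nix implements repeated builds; output_hash and no_difference_found are hypothetical names, and the two build functions are stand-ins for real builds:

```python
import hashlib
import os

def output_hash(data: bytes) -> str:
    """Hash a build output so that runs can be compared bit-for-bit."""
    return hashlib.sha256(data).hexdigest()

def no_difference_found(build, repeat=1):
    """Run `build` once plus `repeat` extra times and compare the outputs.

    False proves non-determinism; True only means no difference was seen.
    """
    first = output_hash(build())
    for _ in range(repeat):
        if output_hash(build()) != first:
            return False
    return True

def stable_build():
    return b""              # always produces the same (empty) output

def unstable_build():
    return os.urandom(16)   # a different output on every run

print(no_difference_found(stable_build, repeat=1))
print(no_difference_found(unstable_build, repeat=1))
```

The asymmetry in the return value mirrors the text: a single mismatch is conclusive, while any number of matching runs only raises confidence.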

To run a Nix build multiple times, simply add the --repeat option to your build command. It will run your build the number of extra times you specify.

Suppose we have the following Nix expression in deterministic.nix:

let
  inherit (import <nixpkgs> {}) runCommand;
in {
  stable = runCommand "stable" {} ''
    touch $out
  '';

  unstable = runCommand "unstable" {} ''
    echo $RANDOM > $out
  '';
}

We can run repeated builds like this (note that the --builders "" option is there to force a local build, to not use

$ nix-build deterministic.nix --builders "" -A stable --repeat 1
these derivations will be built:
building '/nix/store/0fj164aqyhsciy7x97s1baswygxn8lzf-stable.drv' (round 1/2)...
building '/nix/store/0fj164aqyhsciy7x97s1baswygxn8lzf-stable.drv' (round 2/2)...

$ nix-build deterministic.nix --builders "" -A unstable --repeat 1
these derivations will be built:
building '/nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv' (round 1/2)...
building '/nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv' (round 2/2)...
output '/nix/store/g7a5sf7iwdxs7q12ksrzlvjvz69yfq3l-unstable' of '/nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv' differs from previous round
error: build of '/nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv' failed

Running repeated builds on works exactly the same way:

$ nix-build deterministic.nix -A stable --repeat 1
these derivations will be built:
building '/nix/store/wnd5y30jp3xwpw1bhs4bmqsg5q60vc8i-stable.drv' (round 1/2) on 'ssh://'...
copying 1 paths...
copying path '/nix/store/z3wlpwgz66ningdbggakqpvl0jp8bp36-stable' from 'ssh://'...

$ nix-build deterministic.nix -A unstable --repeat 1
these derivations will be built:
building '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' (round 1/2) on 'ssh://'...
[] output '/nix/store/srch6l8pyl7z93c7gp1xzf6mq6rwqbaq-unstable' of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' differs from previous round
error: build of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' on 'ssh://' failed: build was non-deterministic
builder for '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' failed with exit code 1
error: build of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' failed

As you can see, the log output differs slightly between the local and the remote builds. This is because when Nix submits a remote build, it will not do the determinism check itself, instead it will leave it up to the builder ( in our case). This is actually a good thing, because it allows to perform some optimizations for repeated builds. The following sections will enumerate those optimizations.

Finding Non-determinism in Past Builds

If you locally try to rebuild something that has failed due to non-determinism, Nix will build it again at least two times (due to --repeat) and fail it due to non-determinism again, since it keeps no record of the previous build failure (other than the build log).

However, keeps a record of every build performed, also for repeated builds. So when you try to build the same derivation again, is smart enough to look at its past build and figure out that the derivation is non-deterministic without having to rebuild it. We can demonstrate this by re-running the last build from the example above:

$ nix-build deterministic.nix -A unstable --repeat 1
these derivations will be built:
building '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' (round 1/2) on 'ssh://'...
[] output '/nix/store/srch6l8pyl7z93c7gp1xzf6mq6rwqbaq-unstable' of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' differs from previous round
error: build of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' on 'ssh://' failed: a previous build of the derivation was non-deterministic
builder for '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' failed with exit code 1
error: build of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' failed

As you can see, the exact same derivation fails again, but now the build status message says: a previous build of the derivation was non-deterministic. This means didn’t have to run the build, it just checked its past outputs for the derivation and noticed they differed.

When looks at past builds it considers all outputs that have been signed by a key that the account trusts. That means that it can even compare outputs that have been fetched by substitution.
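
The bookkeeping described here can be modelled with a small data structure. The sketch below is purely illustrative and says nothing about the service's real implementation; the BuildHistory class, its method names, and the key and hash values are all assumptions invented for the example:

```python
from collections import defaultdict

class BuildHistory:
    """Remember output hashes per derivation, counting only trusted signers."""

    def __init__(self, trusted_keys):
        self.trusted_keys = set(trusted_keys)
        self._outputs = defaultdict(set)  # derivation path -> set of output hashes

    def record(self, drv, output_hash, signed_by):
        # Outputs signed by an untrusted key are ignored entirely.
        if signed_by in self.trusted_keys:
            self._outputs[drv].add(output_hash)

    def known_non_deterministic(self, drv):
        # Two different trusted outputs for one derivation prove non-determinism.
        return len(self._outputs.get(drv, ())) > 1

history = BuildHistory(trusted_keys={"key-a"})
history.record("unstable.drv", "hash-1", signed_by="key-a")
history.record("unstable.drv", "hash-2", signed_by="key-a")
history.record("stable.drv", "hash-3", signed_by="key-a")
history.record("stable.drv", "hash-4", signed_by="key-b")  # untrusted, ignored
print(history.known_non_deterministic("unstable.drv"))
print(history.known_non_deterministic("stable.drv"))
```

With such a record, a repeated build request can be answered from history alone whenever two differing outputs are already known.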

Scaling Out Repeated Builds

When you use --repeat, will create multiple copies of the build and schedule all of them like any other build would have been scheduled. This means that every repeated build will run in parallel, saving time for the user. As soon as has found proof of non-determinism, any repeated build still running will be cancelled.

Provoking Non-determinism through Filesystem Randomness

As promised in the beginning of this blog post, we have a new feature to announce! is now able to inject randomness into the filesystem that the builds see when they run. This can be used to provoke builds to uncover non-deterministic behavior.

The idea is not new; it is in fact the exact same concept that has been implemented in the disorderfs project by However, we’re happy to make it easily accessible to users. The feature is disabled by default, but can be enabled through a new user setting.

For the moment, the implementation will return directory entries in a random order when enabled. In the future we might inject more metadata randomness.

To demonstrate this feature, let’s use this build:

let
  inherit (import <nixpkgs> {}) runCommand;
in rec {
  files = runCommand "files" {} ''
    mkdir $out
    touch $out/{1..10}
  '';

  list = runCommand "list" {} ''
    ls -f ${files} > $out
  '';
}

The files build just creates ten empty files as its output, and the list build lists those files with ls. The -f option of ls disables sorting entirely, so the file names will be printed in the order the filesystem returns them. This means that the build output will depend on how the underlying filesystem is implemented, which could be considered a non-deterministic behavior.
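
The same dependency on directory order exists in most languages. As a small Python sketch (the list_entries helper and its inject_randomness flag are hypothetical, with the shuffle standing in for randomness injected by the filesystem), an unsorted listing reflects whatever order it is handed, while sorting makes the result independent of that order:

```python
import os
import random
import tempfile

def list_entries(path, inject_randomness=False, sort=False):
    """List a directory, optionally shuffling and/or sorting the result."""
    entries = os.listdir(path)   # like `ls -f`: the order is unspecified
    if inject_randomness:
        random.shuffle(entries)  # simulate a filesystem returning a random order
    if sort:
        entries.sort()           # the fix: never rely on directory order
    return entries

with tempfile.TemporaryDirectory() as root:
    for i in range(1, 11):
        open(os.path.join(root, str(i)), "w").close()  # ten empty files
    shuffled = list_entries(root, inject_randomness=True)
    first_sorted = list_entries(root, inject_randomness=True, sort=True)
    second_sorted = list_entries(root, inject_randomness=True, sort=True)

print(shuffled)                       # order may change from run to run
print(first_sorted == second_sorted)  # sorting removes the dependency
```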

First, we build it locally with --repeat:

$ nix-build non-deterministic-fs.nix --builders "" -A list --repeat 1
these derivations will be built:
building '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' (round 1/2)...
building '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' (round 2/2)...

As you can see, the build succeeded. Then we delete the result from our Nix store so we can run the build again:

rm result
nix-store --delete /nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list

We enable the inject-fs-randomness feature through the shell:

> set inject-fs-randomness true

Then we run the build (with --repeat) on the service:

$ nix-build non-deterministic-fs.nix -A list --repeat 1
these derivations will be built:
building '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' (round 1/2) on 'ssh://'...
copying 1 paths...
copying path '/nix/store/vl13q40hqp4q8x6xjvx0by06s1v9g3jy-files' to 'ssh://'...
[] output '/nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list' of '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' differs from previous round
error: build of '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' on 'ssh://' failed: build was non-deterministic
builder for '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' failed with exit code 1
error: build of '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' failed

Now the service found the non-determinism! We can double check that the directory entries are in a random order by running without --repeat:

$ nix-build non-deterministic-fs.nix -A list
these derivations will be built:
building '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' on 'ssh://'...
copying 1 paths...
copying path '/nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list' from 'ssh://'...

$ cat /nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list

Future Work

There are lots of possibilities to improve the utility of the service when it comes to reproducible builds. Your feedback and ideas are very welcome.

Here are some of the things that could be done:

  • Make it possible to trigger repeated builds for any previous build, without submitting a new build with Nix. For example, there could be a command in the shell allowing a user to trigger a repeated build and report back any non-determinism issues.

  • Implement functionality similar to diffoscope to be able to find out exactly what differs between builds. This could be available as a shell command or through an API.

  • Make it possible to download specific build outputs. The way Nix downloads outputs (and stores them locally) doesn’t allow for having multiple variants of the same output, but the service could provide this functionality through the shell or an API.

  • Inject more randomness inside the sandbox. Since we have complete control over the sandbox environment we can introduce more differences between repeated builds to provoke non-determinism. For example, we can schedule builds on different hardware or use different kernels between repeated builds.

  • Add support for listing known non-deterministic derivations.

by ( at January 13, 2021 12:00 AM

December 29, 2020

The First Year

One year ago the service was announced to the Nix community for the very first time. It then ran as a closed beta for 7 months until it was made generally available on the 28th of August 2020.

This blog post will try to summarize how the service has evolved since GA four months ago, and give a glimpse of its future.

Stability and Performance

Thousands of Nix builds have been built by the service so far, and every build helps in making it more reliable by uncovering possible edge cases in the build environment.

These are some of the stability-related improvements and fixes that have been deployed since GA:

  • Better detection and handling of builds that time out or hang.

  • Improved retry logic should our backend storage not deliver Nix closures as expected.

  • Fixes to the virtual file system inside the KVM sandbox.

  • Better handling of builds that have binary data in their log output.

  • Changes to the virtual sandbox environment so it looks even more like a “standard” Linux environment.

  • Application of the Nix sandbox inside our KVM sandbox. This basically guarantees that the Nix environment provided through the service is identical to the Nix environment for local builds.

  • Support for following HTTP redirects from binary caches.

Even Better Build Reuse

One of the fundamental ideas of the service is to try as hard as possible to not build your builds, if an existing build result can be reused instead. We can trivially reuse an account’s own builds since they are implicitly trusted by the user, but untrusted builds can also be reused under certain circumstances. This has been described in detail in an earlier blog post.

Since GA we’ve introduced a number of new ways build results can be reused.

Reuse of Build Failures

Build failures are now also reused. This means that if someone tries to build a build that is identical (in the sense that the derivation and its transitive input closure are bit-by-bit identical) to a previously failed build, the service will immediately serve back the failed result instead of re-running the build. You will even get the build log replayed.

Build failures can be reused since we are confident that our sandbox is pure, meaning that it will behave exactly the same as long as the build is exactly the same. Only non-transient failures will be reused. So if the builder misbehaves in some way that is out of control for Nix, that failure will not be reused. This can happen if the builder machine breaks down or something similar. In such cases we will automatically re-run the build anyway.

When we fix bugs or make major changes in our sandbox it can happen that we alter the behavior in terms of which builds succeed or fail. For example, we could find a build that fails just because we have missed implementing some specific detail in the sandbox. Once that is fixed, we don’t want to reuse such failures. To avoid that, all existing build failures will be “invalidated” on each major update of the sandbox.
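The two rules above (only non-transient failures are reused, and failures are invalidated on major sandbox updates) can be illustrated with a small Haskell sketch. Every name here is hypothetical, and modelling invalidation as a comparison of sandbox major versions is an assumption made for the example, not a description of the actual implementation:

```haskell
module Main (main) where

-- All types and names are hypothetical and exist only for illustration.
data FailureKind = Transient | NonTransient
  deriving (Eq, Show)

data RecordedFailure = RecordedFailure
  { failureKind  :: FailureKind
  , sandboxMajor :: Int  -- major sandbox version the build failed under
  } deriving (Eq, Show)

-- | A recorded failure is served back only if it was non-transient and
-- was produced by the sandbox major version currently in use.
canReuseFailure :: Int -> RecordedFailure -> Bool
canReuseFailure currentMajor failure =
  failureKind failure == NonTransient
    && sandboxMajor failure == currentMajor

main :: IO ()
main = do
  print (canReuseFailure 3 (RecordedFailure NonTransient 3))
  print (canReuseFailure 3 (RecordedFailure Transient 3))
  print (canReuseFailure 3 (RecordedFailure NonTransient 2))
```

Only the first call reports a reusable failure; the other two would cause the build to run again.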

If a user really wants to re-run a failed build on the service, failure reuse can be turned off using the new user settings (see below).

Reuse of Build Timeouts

In a similar vein to reused build failures, we can also reuse build timeouts. This is not enabled by default, since users can select different timeout limits. A user can activate reuse of build timeouts through the user settings.

The reuse of timed out builds works like this: Each time a new build is submitted, we check if we have any previous build results of the exact same build. If no successful results or plain failures are found, we look for builds that have timed out. We then check if any of the existing timed out builds ran for longer than the user-specified timeout for the new build. If we can find such a result, it will be served back to the user instead of re-running the build.
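The lookup order described above can be sketched in Haskell as follows. The types and names are hypothetical, and treating a previous run of exactly the requested timeout as "long enough" is an assumption made so that a repeated build with an unchanged timeout is covered:

```haskell
module Main (main) where

import Data.List (find)

-- Hypothetical model of what is known about earlier runs of the same build.
data Outcome
  = Succeeded
  | Failed
  | TimedOutAfter Int  -- seconds the build ran before it was cut off
  deriving (Eq, Show)

-- | Pick a previous outcome to serve instead of running the build again.
-- Successes and plain failures are considered first; only if there are
-- none do we look for a timed-out run that lasted at least as long as
-- the timeout requested for the new build.
reusableOutcome :: Int -> [Outcome] -> Maybe Outcome
reusableOutcome timeoutSecs previous =
  case find isFinal previous of
    Just outcome -> Just outcome
    Nothing      -> find ranLongEnough previous
  where
    isFinal (TimedOutAfter _) = False
    isFinal _                 = True
    ranLongEnough (TimedOutAfter secs) = secs >= timeoutSecs
    ranLongEnough _                    = False

main :: IO ()
main = do
  print (reusableOutcome 7200 [TimedOutAfter 3600])
  print (reusableOutcome 7200 [TimedOutAfter 7200])
  print (reusableOutcome 7200 [TimedOutAfter 9000, Failed])
```

A one-hour timeout result says nothing about a two-hour limit, so the first call finds nothing to reuse, while the other two are answered from earlier results.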

This feature can be very useful if you want to avoid re-running builds that time out over and over again (which can be a very time-consuming exercise). For example, say that you have your build timeout set to two hours, and some input needed for a build takes longer than that to build. The first time that input is needed you have to wait two hours to detect that the build will fail. If you then try building something else that happens to depend on the very same input you will save two hours by directly being served the build failure!

Wait for Running Builds

When a new build is submitted, the service will now check if there is any identical build currently running (after checking for previous build results or failures). If there is, the new build will simply hold until the running build has finished. After that, the result of the running build will likely be served back as the result of the new build (as long as the running build wasn’t terminated in a transient way, in which case the new build will have to run from scratch). The identical running builds are checked and reused across accounts.

Before this change, the service would simply start another build in parallel even if the builds were identical.

New Features

User Settings

A completely new feature has been launched since GA: User Settings. This allows end users to tweak the behavior of the service. For example, the build reuse described above can be controlled by user settings. Other settings include controlling the maximum used build time per month, and the possibility to lock down specific SSH keys, which is useful in CI setups.

The user settings can be set in various ways: through the shell, the SSH client environment and even through the Nix derivations themselves.

Even if many users probably never need to change any settings, it can be helpful to read through the documentation to get a feeling for what is possible. If you need to differentiate permissions in any way (different settings for account administrators, developers, CI etc) you should definitely look into the various user settings.

GitHub CI Action

A GitHub Action has been published. This action makes it very easy to use the service as a remote Nix builder in your GitHub Actions workflows. Instead of running your Nix builds on the two vCPUs provided by GitHub you can now enjoy scale-out Nix builds with minimal setup required.

The GitHub Action is developed by the team and there are plans to add more functionality that the service can offer users, like automatically generated cost and performance reports for your Nix builds.

Shell Improvements

Various minor improvements have been made to the shell. It is for example now much easier to get an overview of how large your next invoice will be, through the usage command.

The Future

After one year of real world usage, we are very happy with the progress of the service. It has been well received in the Nix community, proved both reliable and scalable, and it has delivered on our initial vision of a simple service that can integrate into any setup using Nix.

We feel that we can go anywhere from here, but we also realize that we must be guided by our users’ needs. We have compiled a small and informal roadmap below. The items on this list are things that we, based on the feedback we’ve received throughout the year, think are natural next steps for the service.

The roadmap has no dates and no prioritization, and should be seen as merely a hint about which direction the development is heading. Any question or comment concerning this list (or what’s missing from the list) is very welcome.

Support aarch64-linux Builds

Work is already underway to add support for aarch64-linux builds to the service, and so far it is looking good. With the current surge in performant ARM hardware (Apple M1, Ampere Altra etc), we think having aarch64 support is an obvious feature. It is also something that has been requested by our users.

We don’t know yet how the pricing of aarch64 builds will look, or what scalability promises we can make. If you are interested in evaluating aarch64 builds in an early access setting, just send us an email.

Provide an API over SSH and HTTP

Currently the shell is the administrative tool we offer end users. We will keep developing the shell and make it more intuitive for interactive use. But we will also add an alternative, more scriptable variant of the shell.

This alternative version will provide roughly the same functionality as the original shell, only more adapted to scripting instead of interactive use. The reason for providing such an SSH-based API is to make it easy to integrate more tightly into CI and similar scenarios.

There is in fact already a tiny version of this API deployed. You can run the following command to try it out:

$ ssh api show public-signing-key

The above API command is in use by the nixbuild-action for GitHub. So far, this is the only API command implemented, and it should be seen as a very first proof of concept. Nothing has been decided on how the API should look and work in the future.

The API will also be offered over HTTP in addition to SSH.

Upload builds to binary caches

Adding custom binary caches that the service can fetch dependencies from is supported today, although such requests are still handled manually through support.

We also want to support uploading to custom binary caches. That way users could gain performance by not having to first download build results from the service and then upload them somewhere else. This could be very useful for CI setups that can spend a considerable amount of their time just uploading closures.

Provide an HTTP-based binary cache

Using the service as a binary cache is handy since you don’t have to wait for any uploads after a build has finished. Instead, the closures will be immediately available in the binary cache.

It is actually possible to use the service as a binary cache today, by configuring an SSH-based cache (ssh://). This works out of the box right now. You can even use nix-copy-closure to upload paths to the service. We just don’t yet give any guarantees on how long store paths are kept.

However, there are benefits to providing an HTTP-based cache. It would most probably have better performance (serving nar files over HTTP instead of using the nix-store protocol over SSH), but more importantly it would let us use a CDN for serving cache contents. This could help mitigate the fact that the service is only deployed in Europe so far.

Support builds that use KVM

The primary motivation for this is to be able to run NixOS tests (with good performance) on the service.

Thank You!

Finally we’d like to thank all our users. We look forward to an exciting new year with lots of Nix builds!

by ( at December 29, 2020 12:00 AM

December 24, 2020


Postmortem of outage on 20th December

On 20 December, Cachix experienced a six-hour downtime, the second significant outage since the service started operating on 1 June 2018. Here are the details of what exactly happened and what has been done to prevent similar events from happening.

Timeline (UTC)

  • 02:55:07 - Backend starts to emit errors for all HTTP requests

  • 02:56:00 - Pagerduty tries to notify me of outage via email, phone and mobile app

  • 09:01:00 - I wake up and see the notifications

  • 09:02:02 - Backend is restarted and recovers

Root cause analysis

All ~24k HTTP requests that reached the backend during the outage failed with the following exception:

by Domen Kožar ( at December 24, 2020 11:30 AM

December 23, 2020

Ollie Charles

Monad Transformers and Effects with Backpack

A good few years ago Edward Yang gifted us an implementation of Backpack - a way for us to essentially abstract modules over other modules, allowing us to write code independently of implementation. A big benefit of doing this is that it opens up new avenues for program optimization. When we provide concrete instantiations of signatures, GHC compiles it as if that were the original code we wrote, and we can benefit from a lot of specialization. So aside from organizational concerns, Backpack gives us the ability to write some really fast code. This benefit isn’t just theoretical - Edward Kmett gave us unpacked-containers, removing a level of indirection from all keys, and Oleg Grenrus showed us how we can use Backpack to “unroll” fixed sized vectors. In this post, I want to show how we can use Backpack to give us the performance benefits of explicit transformers, but without having library code commit to any specific stack. In short, we get the ability to have multiple interpretations of our program, but without paying the performance cost of abstraction.

The Problem

Before we start looking at any code, let’s look at some requirements, and understand the problems that come with some potential solutions. The main requirement is that we are able to write code that requires some effects (in essence, writing our code to an effect interface), and then run this code with different interpretations. For example, in production I might want to run as fast as possible, in local development I might want further diagnostics, and in testing I might want a pure or in memory solution. This change in representation shouldn’t require me to change the underlying library code.

Seasoned Haskellers might be familiar with the use of effect systems to solve these kinds of problems. Perhaps the most familiar is the mtl approach - perhaps unfortunately named as the technique itself doesn’t have much to do with the library. In the mtl approach, we write our interfaces as type classes abstracting over some Monad m, and then provide instances of these type classes - either by stacking transformers (“plucking constraints”, in the words of Matt Parsons), or by a “mega monad” that implements many of these instances at once (e.g., like Tweag’s capability approach).

Despite a few annoyances (e.g., the “n+k” problem, the lack of implementations being first-class, and a few other things), this approach can work well. It also has the potential to generate great code, but in practice it’s rarely possible to achieve maximal performance. In her excellent talk “Effects for Less”, Alexis King hits the nail on the head - despite being able to provide good code for the implementations of particular parts of an effect, the majority of effectful code is really just threading around inside the Monad constraint. When we’re being polymorphic over any Monad m, GHC is at a loss to do any further optimization - and how could it? We know nothing more than “there will be some >>= function when you get here, promise!” Let’s look at this in a bit more detail.

Say we have the following:

foo :: Monad m => m Int
foo = go 0 1_000_000_000
  where
    go acc 0 = return acc
    go acc i = return acc >> go (acc + 1) (i - 1)

This is obviously “I needed an example for my blog” levels of contrived, but at least small. How does it execute? What are the runtime consequences of this code? To answer, we’ll go all the way down to the STG level with -ddump-stg:

$wfoo =
    \r [ww_s2FA ww1_s2FB]
        let {
          Rec {
          $sgo_s2FC =
              \r [sc_s2FD sc1_s2FE]
                  case eqInteger# sc_s2FD lvl1_r2Fp of {
                    __DEFAULT ->
                        let {
                          sat_s2FK =
                              \u []
                                  case +# [sc1_s2FE 1#] of sat_s2FJ {
                                    __DEFAULT ->
                                        case minusInteger sc_s2FD lvl_r2Fo of sat_s2FI {
                                          __DEFAULT -> $sgo_s2FC sat_s2FI sat_s2FJ;
                                  }; } in
                        let {
                          sat_s2FH =
                              \u []
                                  let { sat_s2FG = CCCS I#! [sc1_s2FE]; } in  ww1_s2FB sat_s2FG;
                        } in  ww_s2FA sat_s2FH sat_s2FK;
                    1# ->
                        let { sat_s2FL = CCCS I#! [sc1_s2FE]; } in  ww1_s2FB sat_s2FL;
          end Rec }
        } in  $sgo_s2FC lvl2_r2Fq 0#;

foo =
    \r [w_s2FM]
        case w_s2FM of {
          C:Monad _ _ ww3_s2FQ ww4_s2FR -> $wfoo ww3_s2FQ ww4_s2FR;

In STG, whenever we have a let we have to do a heap allocation - and this code has quite a few! Of particular interest is what’s going on inside the actual loop $sgo_s2FC. This loop first compares i to see if it’s 0. In the case that it’s not, we allocate two objects and call ww_s2FA. If you squint, you’ll notice that ww_s2FA is the first argument to $wfoo, and it ultimately comes from unpacking a C:Monad dictionary. I’ll save you the labor of working out what this is - ww_s2FA is the >>. We can see that every iteration of our loop incurs two allocations, one for each argument to >>. A heap allocation doesn’t come for free - not only do we have to do the allocation, the entry into the heap incurs a pointer indirection (as heap objects have an info table that points to their entry), and also by merely being on the heap we increase our GC time as we have a bigger heap to traverse. While my STG knowledge isn’t great, my understanding of this code is that every time we want to call >>, we need to supply it with its arguments. This means we have to allocate two closures for this function call - which is basically whenever we pressed “return” on our keyboard when we wrote the code. This seems crazy - can you imagine if you were told in C that merely using ; would cost time and memory?
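To make the dictionary passing visible at the source level, here is a hand-written analogy in which the Monad dictionary is an ordinary record passed as an argument. This is only a sketch: MonadDict, fooWith and ioDict are invented names, and GHC’s real dictionaries carry more fields than the two modelled here.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main (main) where

-- | A hand-rolled stand-in for the part of the Monad dictionary the loop
-- uses. (Hypothetical type; it is not how GHC names or lays things out.)
data MonadDict m = MonadDict
  { dictThen   :: forall a b. m a -> m b -> m b  -- plays the role of (>>)
  , dictReturn :: forall a. a -> m a             -- plays the role of return
  }

-- | The loop from above, with the dictionary passed explicitly. Both
-- arguments handed to 'dictThen' are unevaluated expressions, which is
-- where the two closures per iteration come from when the callee is unknown.
fooWith :: MonadDict m -> Integer -> m Int
fooWith dict = go 0
  where
    go acc 0 = dictReturn dict acc
    go acc i = dictThen dict (dictReturn dict acc) (go (acc + 1) (i - 1))

-- | The dictionary for IO, built from the real class methods.
ioDict :: MonadDict IO
ioDict = MonadDict { dictThen = (>>), dictReturn = return }

main :: IO ()
main = print =<< fooWith ioDict 1000
```

Because fooWith knows nothing about dictThen beyond its type, it has no choice but to package up both arguments and make an unknown call, which mirrors what the STG shows.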

If we compile this code in a separate module, mark it as {-# NOINLINE #-}, and then call it from main - how’s the performance? Let’s check!

module Main (main) where

import Foo

main :: IO ()
main = print =<< foo

$ ./Main +RTS -s
 176,000,051,368 bytes allocated in the heap
       8,159,080 bytes copied during GC
          44,408 bytes maximum residency (1 sample(s))
          33,416 bytes maximum slop
               0 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     169836 colls,     0 par    0.358s   0.338s     0.0000s    0.0001s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0001s    0.0001s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   54.589s  ( 54.627s elapsed)
  GC      time    0.358s  (  0.338s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   54.947s  ( 54.965s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    3,224,078,302 bytes per MUT second

  Productivity  99.3% of total user, 99.4% of total elapsed

OUCH. My i7 laptop took almost a minute to iterate a loop 1 billion times.

A little disclaimer: I’m intentionally painting a severe picture here - in practice this cost is irrelevant to all but the most performance sensitive programs. Also, notice where the let bindings are in the STG above - they are nested within the loop. This means that we’re essentially allocating “as we go” - these allocations are incredibly cheap, and the growth to GC is equally trivial, resulting in more like constant GC pressure, rather than impending doom. For code that is likely to do any IO, this cost is likely negligible compared to the rest of the work. Nonetheless, it is there, and when it’s there, it’s nice to know if there are alternatives.

So, is the TL;DR that Haskell is completely incapable of writing effectful code? No, of course not. There is another way to compile this program, but we need a bit more information. If we happen to know what m is and we have access to the Monad dictionary for m, then we might be able to inline >>=. When we do this, GHC can be a lot smarter. The end result is code that now doesn’t allocate for every single >>=, and instead just gets on with doing work. One trivial way to witness this is to define everything in a single module (Alexis rightly points out this is a trap for benchmarking that many fall into, but for our uses it’s the behavior we actually want).

This time, let’s write everything in one module:

module Main ( main ) where

And the STG:

lvl_r4AM = CCS_DONT_CARE S#! [0#];

lvl1_r4AN = CCS_DONT_CARE S#! [1#];

Rec {
main_$sgo =
    \r [void_0E sc1_s4AY sc2_s4AZ]
        case eqInteger# sc1_s4AY lvl_r4AM of {
          __DEFAULT ->
              case +# [sc2_s4AZ 1#] of sat_s4B2 {
                __DEFAULT ->
                    case minusInteger sc1_s4AY lvl1_r4AN of sat_s4B1 {
                      __DEFAULT -> main_$sgo void# sat_s4B1 sat_s4B2;
          1# -> let { sat_s4B3 = CCCS I#! [sc2_s4AZ]; } in  Unit# [sat_s4B3];
end Rec }

main2 = CCS_DONT_CARE S#! [1000000000#];

main1 =
    \r [void_0E]
        case main_$sgo void# main2 0# of {
          Unit# ipv1_s4B7 ->
              let { sat_s4B8 = \s [] $fShowInt_$cshow ipv1_s4B7;
              } in  hPutStr' stdout sat_s4B8 True void#;

main = \r [void_0E] main1 void#;

main3 = \r [void_0E] runMainIO1 main1 void#;

main = \r [void_0E] main3 void#;

The same program compiled down to a much tighter loop that is almost entirely free of allocations. In fact, the only allocation that happens is when the loop terminates, and it’s just boxing the unboxed integer that’s been accumulating in the loop.

As we might hope, the performance of this is much better:

$ ./Main +RTS -s
  16,000,051,312 bytes allocated in the heap
         128,976 bytes copied during GC
          44,408 bytes maximum residency (1 sample(s))
          33,416 bytes maximum slop
               0 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     15258 colls,     0 par    0.031s   0.029s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0001s    0.0001s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    9.402s  (  9.405s elapsed)
  GC      time    0.031s  (  0.029s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    9.434s  (  9.434s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    1,701,712,595 bytes per MUT second

  Productivity  99.7% of total user, 99.7% of total elapsed

Our time in the garbage collector dropped by a factor of 10, from 0.3s to 0.03s. Our total allocation dropped from 176GB (yes, you read that right) to 16GB (I’m still not entirely sure what this means, maybe someone can enlighten me). Most importantly our total runtime dropped from 54s to just under 10s. All this from just knowing what m is at compile time.

So GHC is capable of producing excellent code for monads - what are the circumstances under which this happens? We need, at least:

  1. The source code of the thing we’re compiling must be available. This means it’s either defined in the same module, or is available with an INLINABLE pragma (or GHC has chosen to add this itself).

  2. The definitions of >>= and friends must also be available in the same way.
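As a sketch of how a library author might try to meet the first condition, the function below is marked INLINABLE so that its unfolding is available to importing modules, and a SPECIALIZE pragma additionally requests a copy at IO. The name countTo is made up for illustration; whether the allocations actually disappear depends on the optimization level and GHC version, so the STG would need to be inspected to confirm it.

```haskell
module Main (main) where

-- | The loop from earlier, parameterised by its iteration count.
-- (Hypothetical name; in a real project this would live in a library module.)
countTo :: Monad m => Integer -> m Int
countTo = go 0
  where
    go acc 0 = return acc
    go acc i = return acc >> go (acc + 1) (i - 1)
-- Keep the definition available so call sites in other modules can specialise it.
{-# INLINABLE countTo #-}
-- Also ask GHC for a copy specialised to IO in this module.
{-# SPECIALIZE countTo :: Integer -> IO Int #-}

main :: IO ()
main = print =<< countTo 1000000
```

The trade-off discussed in the text still applies: every function on the call path needs the same treatment, and the extra unfoldings cost compile time and code size.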

These constraints start to feel a lot like needing whole program compilation, and in practice are unreasonable constraints to reach. To understand why, consider that most real world programs have a small Main module that opens some connections or opens some file handles, and then calls some library code defined in another module. If this code in the other module was already compiled, it will (probably) have been compiled as a function that takes a Monad dictionary, and just calls the >>= function repeatedly in the same manner as our original STG code. To get the allocation-free version, this library code needs to be available to the Main module itself - as that’s the module that is choosing what type to instantiate ‘m’ with - which means the library code has to have marked that code as being inlinable. While we could add INLINE everywhere, this leads to an explosion in the amount of code produced, and can skyrocket compilation times.

Alexis’ eff library works around this by not being polymorphic in m. Instead, it chooses a concrete monad with all sorts of fancy continuation features. Likewise, if we commit to a particular monad (a transformer stack, or maybe using RIO), we again avoid this cost. Essentially, if the monad is known a priori at time of module compilation, GHC can go to town. However, the latter also commits to semantics - by choosing a transformer stack, we’re choosing a semantics for our monadic effects.

With the scene set, I now want to present you with another approach to solving this problem using Backpack.

A Backpack Primer

Vanilla GHC has a very simple module system - modules are essentially a method for name-spacing and separate compilation, they don’t do much more. The Backpack project extends this module system with a new concept - signatures. A signature is like the “type” of a module - a signature might mention the presence of some types, functions and type class instances, but it says nothing about what the definitions of these entities are. We’re going to (ab)use this system to build up transformer stacks at configuration time, and allow our library to be abstracted over different monads. By instantiating our library code with different monads, we get different interpretations of the same program.

I won’t sugar coat - what follows is going to be pretty miserable. Extremely fun, but miserable to write in practice. I’ll let you decide if you want to inflict this misery on your coworkers in practice - I’m just here to show you it can be done!

A Signature for Monads

The first thing we’ll need is a signature for data types that are monads. This is essentially the “hole” we’ll rely on with our library code - it will give us the ability to say “there exists a monad”, without committing to any particular choice.

In our Cabal file, we have:

library monad-sig
  hs-source-dirs:   src-monad-sig
  signatures:       Control.Monad.Signature
  default-language: Haskell2010
  build-depends:    base

The important line here is signatures: Control.Monad.Signature which shows that this library is incomplete and exports a signature. The definition of Control/Monad/Signature.hsig is:

signature Control.Monad.Signature where

data M a
instance Functor M
instance Applicative M
instance Monad M

This simply states that any module with this signature has some type M with instances of Functor, Applicative and Monad.

Next, we’ll put that signature to use in our library code.

Library Code

For our library code, we’ll start with a new library in our Cabal file:

library business-logic
  hs-source-dirs:   lib
  signatures:       BusinessLogic.Monad
  exposed-modules:  BusinessLogic
  build-depends:
    , base
    , fused-effects
    , monad-sig

  default-language: Haskell2010
  mixins:
    monad-sig requires (Control.Monad.Signature as BusinessLogic.Monad)

Our business-logic library itself exports a signature, which is really just a re-export of the Control.Monad.Signature, but we rename it something more meaningful. It’s this module that will provide the monad that has all of the effects we need. Along with this signature, we also export the BusinessLogic module:

{-# language FlexibleContexts #-}
module BusinessLogic where

import BusinessLogic.Monad ( M )
import Control.Algebra ( Has )
import Control.Effect.Empty ( Empty, guard )

businessCode :: Has Empty sig M => Bool -> M Int
businessCode b = do
  guard b
  return 42

In this module I’m using fused-effects as a framework to say which effects my monad should have (though this is not particularly important, I just like it!). Usually Has would be applied to a type variable m, but here we’re applying it to the type M. This type comes from BusinessLogic.Monad, which is a signature (you can confirm this by checking against the Cabal file). Other than that, this is all pretty standard!

Backpack-ing Monad Transformers

Now we get into the really fun stuff - providing implementations of effects. I mentioned earlier that one possible way to do this is with a stack of monad transformers. Generally speaking, one would write a single newtype T m a for each effect type class, and have that transformer dispatch any effects in that class, and lift any effects from other classes - deferring their implementation to m.

We’re going to take the same approach here, but we’ll absorb the idea of a transformer directly into the module itself. Let’s look at an implementation of the Empty effect. The Empty effect gives us a special empty :: m a function, which serves the purpose of stopping execution immediately. As a monad transformer, one implementation is MaybeT:

newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }

But we can also write this using Backpack. First, our Cabal library:

library fused-effects-empty-maybe
  hs-source-dirs:   src-fused-effects-backpack
  default-language: Haskell2010
  build-depends:
    , base
    , fused-effects
    , monad-sig

  exposed-modules: Control.Carrier.Backpack.Empty.Maybe
  mixins:
    monad-sig requires (Control.Monad.Signature as Control.Carrier.Backpack.Empty.Maybe.Base)

Our library exports the module Control.Carrier.Backpack.Empty.Maybe, but also has a hole - the type of base monad this transformer stacks on top of. As a monad transformer, this would be the m parameter, but when we use Backpack, we move that out into a separate module.

The implementation of Control.Carrier.Backpack.Empty.Maybe is short, and almost identical to the body of Control.Monad.Trans.Maybe - we just change any occurrences of m to instead refer to M from our .Base module:

{-# language BlockArguments, FlexibleContexts, FlexibleInstances, LambdaCase,
      MultiParamTypeClasses, TypeOperators, UndecidableInstances #-}

module Control.Carrier.Backpack.Empty.Maybe where

import Control.Algebra
import Control.Effect.Empty
import qualified Control.Carrier.Backpack.Empty.Maybe.Base as Base

type M = EmptyT

-- We could also write: newtype EmptyT a = EmptyT { runEmpty :: MaybeT Base.M a }
newtype EmptyT a = EmptyT { runEmpty :: Base.M (Maybe a) }

instance Functor EmptyT where
  fmap f (EmptyT m) = EmptyT $ fmap (fmap f) m

instance Applicative EmptyT where
  pure = EmptyT . pure . Just
  EmptyT f <*> EmptyT x = EmptyT do
    f >>= \case
      Nothing -> return Nothing
      Just f' -> x >>= \case
        Nothing -> return Nothing
        Just x' -> return (Just (f' x'))

instance Monad EmptyT where
  return = pure
  EmptyT x >>= f = EmptyT do
    x >>= \case
      Just x' -> runEmpty (f x')
      Nothing -> return Nothing

Finally, we make sure that EmptyT can handle the Empty effect:

instance Algebra sig Base.M => Algebra (Empty :+: sig) EmptyT where
  alg handle sig context = case sig of
    L Empty -> EmptyT $ return Nothing
    R other -> EmptyT $ thread (maybe (pure Nothing) runEmpty ~<~ handle) other (Just context)

Base Monads

Now that we have a way to run the Empty effect, we need a base case to our transformer stack. As our transformer is now built out of modules that conform to the Control.Monad.Signature signature, we need some modules for each monad that we could use as a base. For this POC, I’ve just added the IO monad:

library fused-effects-lift-io
  hs-source-dirs:   src-fused-effects-backpack
  default-language: Haskell2010
  build-depends:    base
  exposed-modules:  Control.Carrier.Backpack.Lift.IO

module Control.Carrier.Backpack.Lift.IO where
type M = IO

That’s it!

Putting It All Together

Finally we can put all of this together into an actual executable. We’ll take our library code, instantiate the monad to be a combination of EmptyT and IO, and write a little main function that unwraps this all into an IO type. First, here’s the Main module:

module Main where

import BusinessLogic
import qualified BusinessLogic.Monad

main :: IO ()
main = print =<< BusinessLogic.Monad.runEmpty (businessCode True)

The BusinessLogic module we’ve seen before, but previously BusinessLogic.Monad was a signature (remember, we renamed Control.Monad.Signature to BusinessLogic.Monad). In executables, you can’t have signatures - executables can’t be depended on, so it doesn’t make sense for them to have holes, they must be complete. The magic happens in our Cabal file:

executable test
  main-is:          Main.hs
  hs-source-dirs:   exe
  build-depends:
    , base
    , business-logic
    , fused-effects-empty-maybe
    , fused-effects-lift-io
    , transformers

  default-language: Haskell2010
  mixins:
    fused-effects-empty-maybe (Control.Carrier.Backpack.Empty.Maybe as BusinessLogic.Monad) requires (Control.Carrier.Backpack.Empty.Maybe.Base as BusinessLogic.Monad.Base),
    fused-effects-lift-io (Control.Carrier.Backpack.Lift.IO as BusinessLogic.Monad.Base)

Wow, that’s a mouthful! The work is really happening in mixins. Let’s take this step by step:

  1. First, we can see that we need to mixin the fused-effects-empty-maybe library. The first (X as Y) section specifies a list of modules from fused-effects-empty-maybe and renames them for the test executable that’s currently being compiled. Here, we’re renaming Control.Carrier.Backpack.Empty.Maybe as BusinessLogic.Monad. By doing this, we satisfy the hole in the business-logic library, which was otherwise incomplete.

  2. But fused-effects-empty-maybe itself has a hole: the base monad for the transformer. The requires part lets us rename this hole, but we’ll still need to plug it. For now, we rename Control.Carrier.Backpack.Empty.Maybe.Base to BusinessLogic.Monad.Base.

  3. Next, we mixin the fused-effects-lift-io library, and rename Control.Carrier.Backpack.Lift.IO to be BusinessLogic.Monad.Base. We’ve now satisfied the hole for fused-effects-empty-maybe, and our executable has no more holes and can be compiled.

We’re Done!

That’s “all” there is to it. We can finally run our program:

$ cabal run
Just 42

If you compare against businessCode you’ll see that we got past the guard and returned 42. Because we instantiated BusinessLogic.Monad with a MaybeT-like transformer, this 42 got wrapped up in Just.

Is This Fast?

The best check here is to just look at the underlying code itself. If we add

{-# options -ddump-simpl -ddump-stg -dsuppress-all #-}

to BusinessLogic and recompile, we’ll see the final code output to STDERR. The core is:

  = \ @ sig_a2cM _ b_a13P eta_B1 ->
      case b_a13P of {
        False -> (# eta_B1, Nothing #);
        True -> (# eta_B1, lvl1_r2NP #)

and the STG:

businessCode1 =
    \r [$d(%,%)_s2PE b_s2PF eta_s2PG]
        case b_s2PF of {
          False -> (#,#) [eta_s2PG Nothing];
          True -> (#,#) [eta_s2PG lvl1_r2NP];



In this post, I’ve hopefully shown how we can use Backpack to write effectful code without paying the cost of abstraction. What I didn’t answer is the question of whether or not you should. There’s a lot more to effectful code than I’ve presented, and it’s unclear to me whether this approach can scale to meet those needs. For example, if we needed something like mmorph’s MFunctor, what do we do? Are we stuck? I don’t know! Beyond these technical challenges, it’s clear that Backpack here is also not remotely ergonomic, as is. We’ve had to write five components just to get this done, and I pray for anyone who comes to read this code and has to orientate themselves.

Nonetheless, I think this is an interesting point of the effect design space that hasn’t been explored, and maybe I’ve motivated some people to do some further exploration.

The code for this blog post can be found at

Happy holidays, all!

by Oliver Charles at December 23, 2020 12:00 AM

December 16, 2020

Tweag I/O

Trustix: Distributed trust and reproducibility tracking for binary caches

Downloading binaries from well-known providers is the easiest way to install new software. After all, building software from source is a chore — it requires both time and technical expertise. But how do we know that we aren’t installing something malicious from these providers?

Typically, we trust these binaries because we trust the provider. We believe that they were built from trusted sources, in a trusted computational environment, and with trusted build instructions. But even if the provider does everything transparently and in good faith, the binaries could still be anything if the provider’s system is compromised. In other words, the build process requires trust even if all build inputs (sources, dependencies, build scripts, etc…) are known.

Overcoming this problem is hard — after all, how can we verify the output of arbitrary build inputs? Excitingly, the last years have brought about ecosystems such as Nix, where all build inputs are known and where significant amounts of builds are reproducible. This means that the correspondence between inputs and outputs can be verified by building the same binary multiple times! The r13y project, for example, tracks non-reproducible builds by building them twice on the same machine, showing that this is indeed practical.

But we can go further, and that’s the subject of this blog post, which introduces Trustix, a new tool we are working on. Trustix compares build outputs for given build inputs across independent providers and machines, effectively decentralizing trust. This establishes what I like to call build transparency because it verifies what black box build machines are doing. Behind the scenes Trustix builds a Merkle tree-based append-only log that maps build inputs to build outputs, which I’ll come back to in a later post. This log can be used to establish consensus whether certain build inputs always produce the same output — and can therefore be trusted. Conversely, it can also be used to uncover non-reproducible builds, corrupted or not, on a large scale.

The initial implementation of Trustix, and its description in this post are based on the Nix package manager. Nix focuses on isolated builds, provides access to the hashes of all build inputs as well as a high quantity of bit-reproducible packages, making it the ideal initial testing ecosystem. However, Trustix was designed to be system-independent, and is not strongly tied to Nix.

The development of Trustix is funded by NLNet foundation and the European Commission’s Next Generation Internet programme through the NGI Zero PET (privacy and trust enhancing technologies) fund. The tool is still in development, but I’m very excited to announce it already!

How Nix verifies binary cache results

Most Linux package managers use a very simple signature scheme to secure binary distribution to users. Some use GPG keys, some use OpenSSL certificates, and others use some other kind of key, but the idea is essentially the same for all of them. The general approach is that binaries are signed with a private key, and clients can use an associated public key to check that a binary was really signed by the trusted entity.

Nix for example uses an ed25519-based key signature scheme and comes with a default hard-coded public key that corresponds to the default cache. This key can be overridden or complemented by others, allowing the use of additional caches. The list of signing keys can be found in /etc/nix/nix.conf. The default base64-encoded ed25519 public key with a name as additional metadata looks like this:

trusted-public-keys =

Now, in Nix, software is addressed by the hash of all of its build inputs (sources, dependencies and build instructions). This hash, or the output path, is used to query a cache for a binary.

Here is an example: The hash of the hello derivation can be obtained from a shell with nix-instantiate:

$ nix-instantiate '<nixpkgs>' --eval -A hello.outPath

Here, behind the scenes, we have evaluated and hashed all build inputs that the hello derivation needs (.outPath is just a helper). This hash can then be used to query the default Nix binary cache:

$ curl
StorePath: /nix/store/w9yy7v61ipb5rx6i35zq1mvc2iqfmps1-hello-2.10
URL: nar/15zk4zszw9lgkdkkwy7w11m5vag11n5dhv2i6hj308qpxczvdddx.nar.xz
Compression: xz
FileHash: sha256:15zk4zszw9lgkdkkwy7w11m5vag11n5dhv2i6hj308qpxczvdddx
FileSize: 41232
NarHash: sha256:1mi14cqk363wv368ffiiy01knardmnlyphi6h9xv6dkjz44hk30i
NarSize: 205968
References: 9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31 w9yy7v61ipb5rx6i35zq1mvc2iqfmps1-hello-2.10

Besides links to the archive that contains the compressed binaries, this response includes two relevant pieces of information which are used to verify binaries from the binary cache(s):

  • The NarHash is a hash over all Nix store directory contents
  • The Sig is a cryptographic signature over the NarHash

With this information, the client can check that this binary really comes from the provider’s Nix store.
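The response shown above is a plain "Key: value" text format, so a client can read it with very little code. The following Python sketch is illustrative only: the function name parse_narinfo is hypothetical, the example text is a subset of the response above, and the sketch does not attempt signature verification.

```python
def parse_narinfo(text):
    """Parse the 'Key: value' lines of a narinfo response into a dict."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at the first ': ' only, since values such as hashes contain ':'.
        key, _, value = line.partition(": ")
        info[key] = value
    # References is a space-separated list of store path names.
    info["References"] = info.get("References", "").split()
    return info


example = """\
StorePath: /nix/store/w9yy7v61ipb5rx6i35zq1mvc2iqfmps1-hello-2.10
NarHash: sha256:1mi14cqk363wv368ffiiy01knardmnlyphi6h9xv6dkjz44hk30i
NarSize: 205968
References: 9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31 w9yy7v61ipb5rx6i35zq1mvc2iqfmps1-hello-2.10
"""

info = parse_narinfo(example)
print(info["NarHash"], len(info["References"]))
```

A real client would go on to compare the NarHash against the hash of the unpacked archive and check the signature with a trusted public key.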

What are the limitations of this model?

While this model has served Nix and others well for many years it suffers from a few problems. All of these problems can be traced back to a single point of failure in the chain of trust:

  • First, if the signing key used by the binary cache is ever compromised, all builds that were ever added to the cache can be considered tainted.
  • Second, one needs to put either full trust or no trust at all in the build machines of a binary cache — there is no middle ground.
  • Finally, there is no inherent guarantee that the build inputs described in the Nix expressions were actually used to build what’s in the cache.


Trustix aims to solve these problems by assembling a mapping from build inputs to (hashes of) build outputs provided by many build machines.

Instead of relying on verifying package signatures, like the traditional Nix model does, Trustix only exposes packages that it considers trustworthy. Concretely, Trustix is configured as a proxy for a binary cache, and hides the packages which are not trustworthy. As far as Nix is concerned, the package not being trustworthy is exactly as if the package wasn’t stored in the binary cache to begin with. If such a package is required, Nix will therefore build it from source.

Trustix doesn’t define what a trustworthy package is. What your Trustix considers trustworthy is up to you. The rules for accepting packages are entirely configurable. In fact, in the current prototype, there isn’t a default rule for packages to count as trustworthy: you need to configure trustworthiness yourself.

With this in mind, let’s revisit the above issues:

  • In Trustix, if an entity is compromised, you can rely on all other entities in the network to establish that a binary artefact is trustworthy. Maybe a few hashes are wrong in the Trustix mapping, but if an overwhelming majority of the outputs are the same, you can trust that the corresponding artefact is indeed what you would have built yourself.

    Therefore you never need to invalidate an entire binary cache: you can still verify the trustworthiness of old packages, even if newer packages are built by a malicious actor.

  • In Trustix, you typically never consider any build machine to be fully trusted. You always check their results against the other build machines. You can further configure this by considering some machines as more trusted (maybe because it is a community-operated machine, and you trust said community) or less trusted (for instance, because it has been compromised in the past, and you fear it may be compromised again).

    Moreover, in the spirit of having no single point of failure, Trustix’s mapping is not kept in a central database. Instead every builder keeps a log of its builds; these logs are aggregated on your machine by your instance of the Trustix daemon. Therefore even the mapping itself doesn’t have to be fully trusted.

  • In Trustix, package validity is not ensured by a signature scheme. Instead Trustix relies on the consistency of the input to output mapping. As a consequence, the validity criterion, contrary to a signature scheme, links the output to the input. It makes it infeasible to pass the build result of input I as a build result for input J: it would require corrupting the entire network.
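To make the idea of consensus concrete, here is a minimal Python sketch of one possible acceptance rule. It is not Trustix’s implementation: the function name trusted_output, the builder names, and the 75% threshold are all hypothetical, and the sketch ignores per-builder weighting. As noted above, Trustix leaves the actual rule up to the user.

```python
from collections import Counter


def trusted_output(votes, threshold=0.75):
    """Return the output hash reported by at least `threshold` of builders, else None.

    `votes` maps a builder name to the output hash it logged for one build input.
    """
    if not votes:
        return None
    output_hash, count = Counter(votes.values()).most_common(1)[0]
    return output_hash if count / len(votes) >= threshold else None


votes = {
    "builder-a": "hash-1",
    "builder-b": "hash-1",
    "builder-c": "hash-1",
    "builder-d": "hash-2",  # a compromised or non-reproducible builder
}
print(trusted_output(votes))                 # three of four builders agree
print(trusted_output(votes, threshold=0.9))  # agreement is below 90%
```

With the default threshold, the dissenting builder is simply outvoted; with a stricter threshold, no output is accepted and Nix would fall back to building from source.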

Limitations: reproducibility tracking and non-reproducible builds

A system like Trustix will not work well with builds that are non-reproducible, which is a limitation of this model. After all, you cannot reach consensus if everyone’s opinions differ.

However, Trustix can still be useful, even for non-reproducible builds! By accumulating all the data in the various logs and aggregating them, we can track which derivations are non-reproducible over all of Nixpkgs, in a way that is easier than previously possible. Whereas the r13y project builds a single closure on a single machine, Trustix will index everything ever built on every architecture.
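As a rough illustration of how aggregated logs could expose non-reproducible builds, the following Python sketch flags every build input for which builders disagree on the output. The data layout and the name non_reproducible are assumptions made for this example; the real logs are Merkle tree-based and considerably richer.

```python
from collections import defaultdict


def non_reproducible(logs):
    """Return input hashes for which the builders' logs disagree on the output.

    `logs` maps a builder name to its own {input_hash: output_hash} log.
    """
    outputs = defaultdict(set)
    for log in logs.values():
        for input_hash, output_hash in log.items():
            outputs[input_hash].add(output_hash)
    return sorted(i for i, outs in outputs.items() if len(outs) > 1)


logs = {
    "builder-a": {"in-hello": "out-1", "in-foo": "out-7"},
    "builder-b": {"in-hello": "out-1", "in-foo": "out-8"},
}
print(non_reproducible(logs))
```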


I am very excited to be working on the next generation of tooling for trust and reproducibility, and for the purely functional software packaging model pioneered by Nix to keep enabling new use cases. I hope that this work can be a foundation for many applications other than improving trust, for example by enabling the Nix community to support new CPU architectures with community binary caches.

Please check out the code at the repo or join us for a chat over in #untrustix on Freenode. And stay tuned — in the next blog post, we will talk more about Merkle trees and how they are used in Trustix.



December 16, 2020 12:00 AM

November 29, 2020

Sander van der Burg

Constructing a simple alerting system with well-known open source projects

Some time ago, I experimented with all kinds of monitoring and alerting technologies. For example, with the following technologies, I can develop a simple alerting system with relative ease:

  • Telegraf is an agent that can be used to gather measurements and transfer the corresponding data to all kinds of storage solutions.
  • InfluxDB is a time series database platform that can store, manage and analyze timestamped data.
  • Kapacitor is a real-time streaming data processing engine that can be used for a variety of purposes. I can use Kapacitor to analyze measurements and see if a threshold has been exceeded so that an alert can be triggered.
  • Alerta is a monitoring system that can store and de-duplicate alerts, and arrange black outs.
  • Grafana is a multi-platform open source analytics and interactive visualization web application.

These technologies appear to be quite straightforward to use. However, as I was learning more about them, I discovered a number of oddities that may have big implications.

Furthermore, testing and making incremental changes also turns out to be much more challenging than expected, making it very hard to diagnose and fix problems.

In this blog post, I will describe how I built a simple monitoring and alerting system, and elaborate on my learning experiences.

Building the alerting system

As described in the introduction, I can combine several technologies to create an alerting system. I will explain them in more detail in the upcoming sections.


Telegraf is a pluggable agent that gathers measurements from a variety of inputs (such as system metrics, platform metrics, database metrics etc.) and sends them to a variety of outputs, typically storage solutions (database management systems such as InfluxDB, PostgreSQL or MongoDB). Telegraf has a large plugin ecosystem that provides all kinds of integrations.

In this blog post, I will use InfluxDB as an output storage backend. For the inputs, I will restrict myself to capturing a subset of system metrics only.

With the following telegraf.conf configuration file, I can capture a variety of system metrics every 10 seconds:

[agent]
interval = "10s"

[[outputs.influxdb]]
urls = [ "https://test1:8086" ]
database = "sysmetricsdb"
username = "sysmetricsdb"
password = "sysmetricsdb"

[[inputs.system]]
# no configuration

[[inputs.cpu]]
## Whether to report per-cpu stats or not
percpu = true
## Whether to report total system cpu stats or not
totalcpu = true
## If true, collect raw CPU time metrics.
collect_cpu_time = false
## If true, compute and report the sum of all non-idle CPU states.
report_active = true

[[inputs.mem]]
# no configuration

With the above configuration file, I can collect the following metrics:
  • System metrics, such as the hostname and system load.
  • CPU metrics, such as how much the CPU cores on a machine are utilized, including the total CPU activity.
  • Memory (RAM) metrics.

The data will be stored in an InfluxDB database named sysmetricsdb, hosted on a remote machine with host name test1.


As explained earlier, InfluxDB is a time series platform that can store, manage and analyze timestamped data. In many ways, InfluxDB resembles relational databases, but there are also some notable differences.

The query language that InfluxDB uses is called InfluxQL, which shares many similarities with SQL.

For example, with the following query I can retrieve the first three data points from the cpu measurement, which contains the CPU-related measurements collected by Telegraf:

> precision rfc3339
> select * from "cpu" limit 3

providing me the following result set:

name: cpu
time cpu host usage_active usage_guest usage_guest_nice usage_idle usage_iowait usage_irq usage_nice usage_softirq usage_steal usage_system usage_user
---- --- ---- ------------ ----------- ---------------- ---------- ------------ --------- ---------- ------------- ----------- ------------ ----------
2020-11-16T15:36:00Z cpu-total test2 10.665258711721098 0 0 89.3347412882789 0.10559662090813073 0 0 0.10559662090813073 0 8.658922914466714 1.79514255543822
2020-11-16T15:36:00Z cpu0 test2 10.665258711721098 0 0 89.3347412882789 0.10559662090813073 0 0 0.10559662090813073 0 8.658922914466714 1.79514255543822
2020-11-16T15:36:10Z cpu-total test2 0.1055966209080346 0 0 99.89440337909197 0 0 0 0.10559662090813073 0 0 0

As you may notice by looking at the output above, every data point has a timestamp and a number of fields capturing CPU metrics:

  • cpu identifies the CPU core.
  • host contains the host name of the machine.
  • The remainder of the fields contain all kinds of CPU metrics, e.g. how much CPU time is consumed by the system (usage_system), the user (usage_user), by waiting for IO (usage_iowait) etc.
  • The usage_active field contains the total CPU activity percentage, which is going to be useful to develop an alert that will warn us if there is too much CPU activity for a long period of time.

Aside from the fact that all data is timestamp based, data in InfluxDB has another notable difference compared to relational databases: an InfluxDB database is schemaless. You can add an arbitrary number of fields and tags to a data point without having to adjust the database structure (and migrate existing data to the new database structure).

Fields and tags can contain arbitrary data, such as numeric values or strings. Tags are also indexed so that you can search for these values more efficiently. Furthermore, tags can be used to group data.

For example, the cpu measurement collection has the following tags:

> SHOW TAG KEYS ON "sysmetricsdb" FROM "cpu";
name: cpu

As shown in the above output, the cpu and host fields are tags in the cpu measurement.

We can use these tags to search for all data points related to a CPU core and/or host machine. Moreover, we can use these tags for grouping, allowing us to compute aggregate values, such as the mean value per CPU core and host.
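To illustrate what grouping by tags means, here is a small Python sketch that computes the mean usage_active per host and CPU core over a handful of made-up data points. It only models the idea; it does not talk to InfluxDB, and the helper name mean_by_tags and the sample values are hypothetical. In InfluxQL, a comparable result would come from a query with a GROUP BY "host", "cpu" clause.

```python
from collections import defaultdict
from statistics import mean

# Made-up data points: tags (host, cpu) plus one field (usage_active).
points = [
    {"host": "test2", "cpu": "cpu0", "usage_active": 10.0},
    {"host": "test2", "cpu": "cpu0", "usage_active": 30.0},
    {"host": "test2", "cpu": "cpu1", "usage_active": 50.0},
]


def mean_by_tags(points, tags=("host", "cpu"), field="usage_active"):
    """Group data points by their tag values and average one field per group."""
    groups = defaultdict(list)
    for point in points:
        groups[tuple(point[tag] for tag in tags)].append(point[field])
    return {key: mean(values) for key, values in groups.items()}


print(mean_by_tags(points))
```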

Beyond storing and retrieving data, InfluxDB has many useful additional features:

  • You can automatically sample data and run continuous queries that generate and store sampled data in the background.
  • You can configure retention policies so that data is no longer stored for an indefinite amount of time. For example, you can configure a retention policy to drop raw data after a certain amount of time, but retain the corresponding sampled data.

InfluxDB has an "open core" development model. The free and open source edition (FOSS) of InfluxDB server (which is MIT licensed) allows you to host multiple databases on multiple servers.

However, if you also want horizontal scalability and/or high assurance, then you need to switch to the hosted InfluxDB versions -- data in InfluxDB is partitioned into so-called shards of a fixed size (the default shard size is 168 hours).

These shards can be distributed over multiple InfluxDB servers. It is also possible to deploy multiple read replicas of the same shard to multiple InfluxDB servers improving read speed.


Kapacitor is a real-time streaming data processing engine developed by InfluxData -- the same company that also develops InfluxDB and Telegraf.

It can be used for all kinds of purposes. In my example cases, I will only use it to determine whether some threshold has been exceeded and an alert needs to be triggered.

Kapacitor works with custom tasks that are written in a domain-specific language called the TICK script language. There are two kinds of tasks: stream and batch tasks. Both task types have advantages and disadvantages.

We can easily develop an alert that gets triggered if the CPU activity level is high for a relatively long period of time (more than 75% on average over 1 minute).

To implement this alert as a stream job, we can write the following TICK script:

dbrp "sysmetricsdb"."autogen"

stream
    |from()
        .measurement('cpu')
        .groupBy('host', 'cpu')
        .where(lambda: "cpu" != 'cpu-total')
    |window()
        .period(1m)
        .every(1m)
    |mean('usage_active')
    |alert()
        .message('Host: {{ index .Tags "host" }} has high cpu usage: {{ index .Fields "mean" }}')
        .warn(lambda: "mean" > 75.0)
        .crit(lambda: "mean" > 85.0)
        .alerta()
            .resource('{{ index .Tags "host" }}/{{ index .Tags "cpu" }}')
            .event('cpu overload')
            .value('{{ index .Fields "mean" }}')

A stream job is built around the following principles:

  • A stream task does not execute queries on an InfluxDB server. Instead, it creates a subscription to InfluxDB -- whenever a data point gets inserted into InfluxDB, the data point gets forwarded to Kapacitor as well.

    To make subscriptions work, both InfluxDB and Kapacitor need to be able to connect to each other with a public IP address.
  • A stream task defines a pipeline consisting of a number of nodes (connected with the | operator). Each node can consume data points, filter, transform, aggregate, or execute arbitrary operations (such as calling an external service), and produce new data points that can be propagated to the next node in the pipeline.
  • Every node also has property methods (such as .measurement('cpu')) making it possible to configure parameters.

The TICK script example shown above does the following:

  • The from node consumes cpu data points from the InfluxDB subscription, groups them by host and cpu, and filters out data points with the cpu-total label, because we are only interested in the CPU consumption per core, not the total amount.
  • The window node states that we should aggregate data points over the last 1 minute and pass the resulting (aggregated) data points to the next node after one minute in time has elapsed. To aggregate data, Kapacitor will buffer data points in memory.
  • The mean node computes the mean value for usage_active for the aggregated data points.
  • The alert node is used to trigger an alert of a specific severity level: WARNING if the mean activity percentage is greater than 75%, and CRITICAL if it is greater than 85%. In all other cases, the status is considered OK. The alert is sent to Alerta.
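The threshold rule applied by the alert node can be summarized in a few lines. The Python sketch below is only a model of the rule described above, not Kapacitor code; the function name severity is hypothetical, and the thresholds are the ones from the TICK script.

```python
def severity(mean_usage, warn=75.0, crit=85.0):
    """Classify a mean CPU activity percentage using strict thresholds."""
    if mean_usage > crit:
        return "critical"
    if mean_usage > warn:
        return "warning"
    return "ok"


for value in (100.0, 80.0, 50.0):
    print(value, severity(value))
```

Note that both comparisons are strict, mirroring the > operators in the script: a mean of exactly 75% still counts as OK.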

It is also possible to write a similar kind of alerting script as a batch task:

dbrp "sysmetricsdb"."autogen"

batch
    |query('''
        SELECT mean("usage_active")
        FROM "sysmetricsdb"."autogen"."cpu"
        WHERE "cpu" != 'cpu-total'
    ''')
        .period(1m)
        .every(1m)
        .groupBy('host', 'cpu')
    |alert()
        .message('Host: {{ index .Tags "host" }} has high cpu usage: {{ index .Fields "mean" }}')
        .warn(lambda: "mean" > 75.0)
        .crit(lambda: "mean" > 85.0)
        .alerta()
            .resource('{{ index .Tags "host" }}/{{ index .Tags "cpu" }}')
            .event('cpu overload')
            .value('{{ index .Fields "mean" }}')

The above TICK script looks similar to the stream task shown earlier, but instead of using a subscription, the script queries the InfluxDB database (with an InfluxQL query) for data points over the last minute with a query node.

Which approach for writing a CPU alert is best, you may wonder? Each of these two approaches has its pros and cons:

  • Stream tasks offer low latency responses -- when a data point appears, a stream task can immediately respond, whereas a batch task needs to query every minute all the data points to compute the mean percentage over the last minute.
  • Stream tasks maintain a buffer for aggregating the data points making it possible to only send incremental updates to Alerta. Batch tasks are stateless. As a result, they need to update the status of all hosts and CPUs every minute.
  • Processing data points is done synchronously and in sequential order -- if an update round to Alerta takes too long (which is more likely to happen with a batch task), then the next processing run may overlap with the previous, causing all kinds of unpredictable results.

    It may also cause Kapacitor to eventually crash due to growing resource consumption.
  • Batch tasks may also miss data points -- while querying data over a certain time window, it may happen that a new data point gets inserted in that time window (that is being queried). This new data point will not be picked up by Kapacitor.

    A subscription made by a stream task, however, will never miss any data points.
  • Stream tasks can only work with data points that appear from the moment Kapacitor is started -- it cannot work with data points in the past.

    For example, if Kapacitor is restarted and some important event is triggered in the restart time window, Kapacitor will not notice that event, causing the alert to remain in its previous state.

    To work effectively with stream tasks, a continuous data stream is required that frequently reports on the status of a resource. Batch tasks, on the other hand, can work with historical data.
  • The fact that nodes maintain a buffer may also cause the RAM consumption of Kapacitor to grow considerably, if the data volumes are big.

    A batch task on the other hand, does not buffer any data and is more memory efficient.

    Another compelling advantage of batch tasks over stream tasks is that InfluxDB does all the work. The hosted version of InfluxDB can also horizontally scale.
  • Batch tasks can also aggregate data more efficiently (e.g. computing the mean value or sum of values over a certain time period).

I consider neither of these script types the optimal solution. However, for implementing the alerts I tend to have a slight preference for stream jobs, because of their low latency and incremental update properties.


As explained in the introduction, Alerta is a monitoring system that can store and de-duplicate alerts, and arrange black outs.

The Alerta server provides a REST API that can be used to query and modify alerting data and uses MongoDB or PostgreSQL as a storage database.

There is also a variety of Alerta clients: the alerta-cli allows you to control the service from the command-line, and there is a web user interface that I will show later in this blog post.

Running experiments

With all the components described above in place, we can start running experiments to see if the CPU alert will work as expected. To gain better insight into the process, I can install Grafana, which allows me to visualize the measurements that are stored in InfluxDB.

Configuring a dashboard and panel for visualizing the CPU activity rate was straightforward. I configured a new dashboard, with the following variables:

The above variables allow me to select for each machine in the network, which CPU core's activity percentage I want to visualize.

I have configured the CPU panel as follows:

In the above configuration, I query the usage_active field from the cpu measurement collection, using the dashboard variables cpu and host to filter for the right target machine and CPU core.

I have also configured the field unit to be a percentage value (between 0 and 100).

When running the following command-line instruction on a test machine that runs Telegraf (test2), I can deliberately hog the CPU:

$ dd if=/dev/zero of=/dev/null

The above command reads zero bytes (one-by-one) and discards them by sending them to /dev/null, causing the CPU to remain utilized at a high level:

In the graph shown above, it is clearly visible that CPU core 0 on the test2 machine remains utilized at 100% for several minutes.

(As a sidenote, we can also hog both the CPU and consume RAM at the same time with a simple command line instruction).

If we keep hogging the CPU and wait for at least a minute, the Alerta web interface dashboard will show a CRITICAL alert:

If we stop the dd command, then the TICK script should eventually notice that the mean percentage drops below the WARNING threshold, causing the alert to go back into the OK state and disappear from the Alerta dashboard.

Developing test cases

Being able to trigger an alert with a simple command-line instruction is useful, but not always convenient or effective -- one of the inconveniences is that we always have to wait at least one minute to get feedback.

Moreover, when an alert does not work, it is not always easy to find the root cause. I have encountered the following problems that contribute to a failing alert:

  • Telegraf may not be running and, as a result, not capturing the data points that need to be analyzed by the TICK script.
  • A subscription cannot be established between InfluxDB and Kapacitor. This may happen when Kapacitor cannot be reached through a public IP address.
  • There are data points collected, but only the wrong kinds of measurements.
  • The TICK script is functionally incorrect.

Fortunately, for stream tasks it is relatively easy to quickly find out whether an alert is functionally correct or not -- we can generate test cases that almost instantly trigger each possible outcome with a minimal amount of data points.

An interesting property of stream tasks is that they have no notion of time -- the .window(1m) property may suggest that Kapacitor computes the mean value of the data points every minute, but that is not what it actually does. Instead, Kapacitor only looks at the timestamps of the data points that it receives.

When Kapacitor sees that the timestamps of the data points fit in the 1 minute time window, then it keeps buffering. As soon as a data point appears that is outside this time window, the window node relays an aggregated data point to the next node (that computes the mean value, that in turn is consumed by the alert node deciding whether an alert needs to be raised or not).
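The buffering behaviour described above can be illustrated with a toy model. The sketch below is not Kapacitor's actual algorithm: it assumes a simple non-overlapping window whose boundaries are inclusive, and the function name window_means is made up for this illustration. What it does show is that only the timestamps of the data points decide when a mean value is emitted; the wall clock is never consulted.

```python
from statistics import mean


def window_means(points, period):
    """Yield the mean of each completed window (toy model, not Kapacitor code).

    points: iterable of (timestamp_in_seconds, value), ordered by timestamp.
    period: window size in seconds.
    A window is only closed when a point arrives whose timestamp lies
    outside of it, so the wall-clock time plays no role at all.
    """
    start = None
    buffered = []
    for timestamp, value in points:
        if start is None:
            start = timestamp
        elif timestamp - start > period:
            # The new point is outside the window: relay the aggregate
            yield mean(buffered)
            start, buffered = timestamp, []
        buffered.append(value)


# Two points on the window boundaries, and a third one outside the window
points = [(0, 100), (60, 100), (120, 100)]
results = list(window_means(points, period=60))
print(results)
```

With only the first two data points nothing is emitted yet; the third data point is what causes the mean value to be relayed.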

We can exploit that knowledge to create a very minimal bash test script that triggers every possible outcome: OK, WARNING and CRITICAL:

influxCmd="influx -database sysmetricsdb -host test1"

export ALERTA_ENDPOINT="https://test1"

### Trigger CRITICAL alert

# Force the average CPU consumption to be 100%
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 0000000000"
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 60000000000"
# This data point triggers the alert
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 120000000000"

sleep 1
actualSeverity=$(alerta --output json query | jq -r '.[0].severity')

if [ "$actualSeverity" != "critical" ]
then
    echo "Expected severity: critical, but we got: $actualSeverity" >&2
    exit 1
fi

### Trigger WARNING alert

# Force the average CPU consumption to be 80%
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=80 180000000000"
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=80 240000000000"
# This data point triggers the alert
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=80 300000000000"

sleep 1
actualSeverity=$(alerta --output json query | jq -r '.[0].severity')

if [ "$actualSeverity" != "warning" ]
then
    echo "Expected severity: warning, but we got: $actualSeverity" >&2
    exit 1
fi

### Trigger OK alert

# Force the average CPU consumption to be 0%
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=0 300000000000"
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=0 360000000000"
# This data point triggers the alert
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=0 420000000000"

sleep 1
actualSeverity=$(alerta --output json query | jq -r '.[0].severity')

if [ "$actualSeverity" != "ok" ]
then
    echo "Expected severity: ok, but we got: $actualSeverity" >&2
    exit 1
fi

The shell script shown above automatically triggers all three possible outcomes of the CPU alert:

  • CRITICAL is triggered by generating data points that force a mean activity percentage of 100%.
  • WARNING is triggered by a mean activity percentage of 80%.
  • OK is triggered by a mean activity percentage of 0%.

It uses the Alerta CLI to connect to the Alerta server to check whether the alert's severity level has the expected value.

We need three data points to trigger each alert type -- the first two data points are on the boundaries of the 1 minute window (0 seconds and 60 seconds), forcing the mean value to become the specified CPU activity percentage.

The third data point is deliberately outside the time window (of 1 minute), forcing the alert node to be triggered with a mean value over the previous two data points.

Although the above test strategy works to quickly validate all possible outcomes, one impractical aspect is that the timestamps in the above example start with 0 (meaning 0 seconds after the epoch: January 1st 1970 00:00 UTC).

If we also want to observe the data points generated by the above script in Grafana, we need to configure the panel to go back in time 50 years.

Fortunately, I can also easily adjust the script to start with a base timestamp that is 1 hour in the past:

offset="$(($(date +%s) - 3600))"

With this tiny adjustment, we should see the following CPU graph (displaying data points from the last hour) after running the test script:

As you may notice, we can see that the CPU activity level quickly goes from 100%, to 80%, to 0%, using only 9 data points.
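To make the timestamp arithmetic explicit, here is a small sketch of how the base timestamp can be combined with the relative offsets used in the test script. It assumes that the timestamps passed to InfluxDB are in nanoseconds (as the values in the script above suggest); the helper name insert_statement is hypothetical and only used for illustration.

```python
import time

NANOSECONDS_PER_SECOND = 1_000_000_000


def insert_statement(value, seconds, offset=0):
    """Compose an INSERT statement for the cpu measurement of test2.

    value: the usage_active percentage to record.
    seconds: position of the data point relative to the base timestamp.
    offset: base timestamp in seconds since the epoch (0 means the epoch itself).
    """
    timestamp_ns = (offset + seconds) * NANOSECONDS_PER_SECOND
    return f"INSERT cpu,cpu=cpu0,host=test2 usage_active={value} {timestamp_ns}"


# Base timestamp that is 1 hour in the past
offset = int(time.time()) - 3600

# The three data points that trigger the CRITICAL alert
statements = [insert_statement(100, seconds, offset) for seconds in (0, 60, 120)]
for statement in statements:
    print(statement)
```

Each resulting string could be passed to the influx client with the -execute parameter, in the same way as the test script does.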

Although testing stream tasks (from a functional perspective) is quick and convenient, testing batch tasks in a similar way is difficult. Contrary to the stream task implementation, the query node in the batch task does have a notion of time (because of the WHERE clause that includes the now() expression).

Moreover, the embedded InfluxQL query evaluates the mean values every minute, but the test script does not exactly know when this event triggers.

The only way I could think of to (somewhat reliably) validate the outcomes is by creating a test script that continuously inserts data points for at least double the time window size (2 minutes) until Alerta reports the right alert status (if it does not after a while, I can conclude that the alert is incorrectly implemented).
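The strategy for batch tasks can be sketched as a polling loop. This is only an outline under a few assumptions: insert_point and get_severity are hypothetical callbacks (for example, wrappers around the influx and alerta command-line tools), and the default timeout of 180 seconds is an arbitrary value that exceeds double the time window size.

```python
import time


def wait_for_severity(expected, insert_point, get_severity,
                      timeout=180.0, interval=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Keep inserting data points until the expected severity is reported.

    Returns True as soon as get_severity() returns the expected value and
    False when the timeout expires first, in which case we can conclude
    that the alert is most likely implemented incorrectly.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        insert_point()
        if get_severity() == expected:
            return True
        sleep(interval)
    return False
```

The clock and sleep parameters only exist to make the function easy to exercise without actually waiting.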

Automating the deployment

As you may have already guessed, to be able to conveniently experiment with all these services, and to reliably run tests in isolation, some form of deployment automation is an absolute must-have.

Most people who do not know anything about my deployment technology preferences will probably go for Docker or docker-compose, but I have decided to use a variety of solutions from the Nix project.

NixOps is used to automatically deploy a network of NixOS machines -- I have created a logical and physical NixOps configuration that deploys two VirtualBox virtual machines.

With the following command I can create and deploy the virtual machines:

$ nixops create network.nix network-virtualbox.nix -d test
$ nixops deploy -d test

The first machine: test1 is responsible for hosting the entire monitoring infrastructure (InfluxDB, Kapacitor, Alerta, Grafana), and the second machine (test2) runs Telegraf and the load tests.

Disnix (my own deployment tool) is responsible for deploying all services, such as InfluxDB, Kapacitor, Alerta, and the database storage backends. Contrary to docker-compose, Disnix does not work with containers (or other Docker objects, such as networks or volumes), but with arbitrary deployment units that are managed with a plugin system called Dysnomia.

Moreover, Disnix can also be used for distributed deployment in a network of machines.

I have packaged all the services and captured them in a Disnix services model that specifies all deployable services, their types, and their inter-dependencies.

If I combine the services model with the NixOps network models, and a distribution model (that maps Telegraf and the test scripts to the test2 machine and the remainder of the services to the first: test1), I can deploy the entire system:


$ disnixos-env -s services.nix \
-n network.nix \
-n network-virtualbox.nix \
-d distribution.nix

The following diagram shows a possible deployment scenario of the system:

The above diagram describes the following properties:

  • The light-grey colored boxes denote machines. In the above diagram, we have two of them: test1 and test2 that correspond to the VirtualBox machines deployed by NixOps.
  • The dark-grey colored boxes denote containers in a Disnix-context (not to be confused with Linux or Docker containers). These are environments that manage other services.

    For example, a container service could be the PostgreSQL DBMS managing a number of PostgreSQL databases or the Apache HTTP server managing web applications.
  • The ovals denote services that could be any kind of deployment unit. In the above example, we have services that are running processes (managed by systemd), databases and web applications.
  • The arrows denote inter-dependencies between services. When a service has an inter-dependency on another service (i.e. the arrow points from the former to the latter), then the latter service needs to be activated first. Moreover, the former service also needs to know how the latter can be reached.
  • Services can also be container providers (as denoted by the arrows in the labels), stating that other services can be embedded inside this service.

    As already explained, the PostgreSQL DBMS is an example of such a service, because it can host multiple PostgreSQL databases.
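The activation ordering implied by inter-dependencies boils down to a topological sort. The following sketch illustrates the idea with Python's graphlib module; it is not how Disnix is implemented, and the dependency relationships in the dictionary are hypothetical examples rather than a copy of the actual services model.

```python
from graphlib import TopologicalSorter

# Hypothetical inter-dependencies: each service maps to the services that
# it depends on (the targets of its arrows in the diagram)
inter_dependencies = {
    "influxdb": set(),
    "kapacitor": {"influxdb"},
    "alerta-db": set(),
    "alerta": {"alerta-db"},
}

# Dependencies come first, so every service is activated after the
# services that it needs to reach
activation_order = list(TopologicalSorter(inter_dependencies).static_order())
print(activation_order)
```

Deactivation can simply follow the reverse of this order.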

Although the process components in the diagram above can also be conveniently deployed with Docker-based solutions (i.e. as I have explained in an earlier blog post, containers are somewhat confined and restricted processes), the non-process integrations need to be managed by other means, such as writing extra shell instructions in Dockerfiles.

In addition to deploying the system to machines managed by NixOps, it is also possible to use the NixOS test driver -- the NixOS test driver automatically generates QEMU virtual machines with a shared Nix store, so that no disk images need to be created, making it possible to quickly spawn networks of virtual machines, with very small storage footprints.

I can also create a minimal distribution model that only deploys the services required to run the test scripts -- Telegraf, Grafana and the front-end applications are not required, resulting in a much smaller deployment:

As can be seen in the above diagram, there are far fewer components required.

In this virtual network that runs a minimal system, we can run automated tests for rapid feedback. For example, the following test driver script (implemented in Python) will run my test shell script shown earlier:


With the following command I can automatically run the tests on the terminal:

$ nix-build release.nix -A tests


The deployment recipes, test scripts and documentation describing the configuration steps are stored in the monitoring playground repository that can be obtained from my GitHub page.

Besides the CPU activity alert described in this blog post, I have also developed a memory alert that triggers if too much RAM is consumed for a longer period of time.

In addition to virtual machines and services, there is also deployment automation in place allowing you to also easily deploy Kapacitor TICK scripts and Grafana dashboards.

To deploy the system, you need to use the very latest version of Disnix (version 0.10) that was released very recently.


I would like to thank my employer, Mendix, for giving me the opportunity to write this blog post. Mendix allows developers to work two days per month on research projects, making projects like these possible.


I have given a presentation about this subject at Mendix. For convenience, I have embedded the slides:

by Sander van der Burg at November 29, 2020 08:57 PM

November 18, 2020

Tweag I/O

Self-references in a content-addressed Nix

In a previous post I explained why we were eagerly trying to change the Nix store model to allow for content-addressed derivations. I also handwaved that this was a real challenge, but without giving any hint at why this could be tricky. So let’s dive a bit into the gory details and understand some of the conceptual pain points with content-addressability in Nix, which forced us into some trade-offs in how we handle content-addressed paths.

What are self-references?

This is a self-reference

Théophane Hufschmitt, This very article

A very trivial Nix derivation might look like this:

with import <nixpkgs> {};
writeScript "hello" ''
  #!${bash}/bin/bash
  ${hello}/bin/hello
''

The result of this derivation will be an executable file containing a script that will run the hello program. It will depend on the bash and hello derivations as we refer to them in the file.

We can build this derivation and execute it:

$ nix-build hello.nix
$ ./result
Hello, world!

So far, so good. Let’s now change our derivation to change the prompt of hello to something more personalized:

with import <nixpkgs> {};
writeScript "hello-its-me" ''
  #!${bash}/bin/bash
  echo "Hello, world! This is ${placeholder "out"}"
''

where ${placeholder "out"} is a magic value that will be replaced by the output path of the derivation during the build.

We can build this and run the result just fine:

$ nix-build hello-its-me.nix
$ ./result
Hello, world! This is /nix/store/c0qw0gbp7rfyzm7x7ih279pmnzazg86p-hello-its-me

And we can check that the file is indeed who it claims to be:

$ /nix/store/c0qw0gbp7rfyzm7x7ih279pmnzazg86p-hello-its-me
Hello, world! This is /nix/store/c0qw0gbp7rfyzm7x7ih279pmnzazg86p-hello-its-me

While the hello derivation depends on bash and hello, hello-its-me depends on bash and… itself. This is something rather common in Nix. For example, it’s rather natural for a C program to have /nix/store/xxx-foo/bin/foo depend on /nix/store/xxx-foo/lib/

Self references and content-addressed paths

How do we build a content-addressed derivation foo in Nix? The recipe is rather simple:

  1. Build the derivation in a temporary directory /some/where/
  2. Compute the hash xxx of that /some/where/ directory
  3. Move the directory under /nix/store/xxx-foo/

You might see where things will go wrong with self-references: the reference will point to /some/where rather than /nix/store/xxx-foo, and so will be wrong (in addition to leaking a path to what should just be a temporary directory).

To work around that, we would need to compute this xxx hash before the build, but that’s quite impossible as the hash depends on the content of the directory, including the value of the self-references.

However, we can hack our way around it in most cases by allowing ourselves a bit of heuristic. The only assumption that we need to make is that all the self-references will appear textually (i.e. running strings on a file that contains self-references will print all the self-references out).

Under that assumption, we can:

  1. Build the derivation in our /some/where directory
  2. Replace all the occurrences of a self-reference by a magic value
  3. Compute the hash of the resulting path to determine the final path
  4. Replace all the occurrences of the magic value by the final path
  5. Move the resulting store path to its final path
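The steps above can be sketched in a few lines of Python. This is a simplified illustration and not Nix’s implementation: it treats the build output as a single byte string, uses a truncated hexadecimal SHA-256 digest instead of Nix’s real hashing scheme, and assumes that the temporary path already has the same length as the final store path. The name make_content_addressed is made up for this example.

```python
import hashlib

STORE_DIR = "/nix/store/"
HASH_LENGTH = 32  # assumed length of the hash part of a store path


def make_content_addressed(content: bytes, temp_path: str):
    """Return (final_path, rewritten_content) for an output built at temp_path."""
    # Everything after the hash part, e.g. "-hello-its-me"
    name_suffix = temp_path[len(STORE_DIR) + HASH_LENGTH:]
    magic_path = STORE_DIR + "0" * HASH_LENGTH + name_suffix

    # Step 2: replace every textual self-reference by the magic value
    normalized = content.replace(temp_path.encode(), magic_path.encode())
    # Step 3: hash the normalized content to determine the final path
    digest = hashlib.sha256(normalized).hexdigest()[:HASH_LENGTH]
    final_path = STORE_DIR + digest + name_suffix
    # Step 4: replace the magic value by the final path (same length)
    rewritten = normalized.replace(magic_path.encode(), final_path.encode())
    return final_path, rewritten


temp_path = STORE_DIR + "t" * HASH_LENGTH + "-hello-its-me"
script = ('echo "Hello, world! This is ' + temp_path + '"\n').encode()
final_path, rewritten = make_content_addressed(script, temp_path)
print(final_path)
```

Because the hash is computed over the normalized content, the final path does not depend on which temporary directory the build happened to use.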

Now you might think that this is a crazy hack: there are so many ways it could break. And in theory you’d be right. But, surprisingly, this works remarkably well in practice. You might also notice that, pedantically speaking, this scheme isn’t exactly content-addressing because of the “modulo the final hash” part. But it is close enough to keep all the desirable properties of proper content addressing, while also enabling self-references, which wouldn’t be possible otherwise. For example, the Fugue cloud deployment system used a generalisation of this technique which not only deals with self-references, but with reference cycles of arbitrary length.

However, there’s a key thing that’s required for this to work: patching strings in binaries is generally benign, but the final string must have the same length as the original one. But we can do that: we don’t know what the final xxx hash will be, but we know its length (because it’s a fixed-length hash), so we can just choose a temporary directory that has the right length (like a temporary store path with the same name), and we’re all set!

The annoying thing is that there’s no guarantee that there are no self-references hidden in such a way that a textual replacement won’t catch it (for example inside a compressed zip file). This is the main reason why content-addressability will not be the default in Nix, at first at least.

Non-deterministic builds − the diamond problem strikes back

No matter how hard Nix tries to isolate the build environment, some actions will remain inherently non-deterministic − anything that can yield a different output depending on the order in which concurrent tasks will be executed for example. This is an annoyance as it might prevent early cutoff (see our previous article on the subject in case you missed it).

But more than limiting the efficiency of the cache, this could also hurt the correctness of Nix if we’re not careful enough.

For example, consider the following dependency graph:

Dependency graph for foo

Alice wants to get foo installed. She already built lib0 and lib1 locally. Let’s call them lib0_a and lib1_a. The binary cache contains builds of lib0 and lib2. Let’s call them lib0_b and lib2_b. Because the build of lib0 is not deterministic, lib0_a and lib0_b are different, and so have a different hash. In a content-addressed world, that means they will be stored in different paths.

A simple cache implementation would want to fetch lib2_b from the cache and use it to build foo. This would also pull lib0_b, because it’s a dependency of lib2_b. But that would mean that foo would depend on both lib0_a and lib0_b.

Buggy runtime dependency graph for foo

In the happy case this would just be a waste of space: the dependency is duplicated, so we use twice as much space to store it. But in many cases this would simply blow up at some point; for example, if lib0 is a shared library, the C linker will fail because of the duplicated symbols. Besides that, this breaks the purity of the build, as we get a different behavior depending on what’s already in the store at the start of the build.

Getting out of this

Nix’s foundational paper shows a way out of this by rewriting hashes in substituted paths. This is however quite complex to implement for a first version, so the current implementation settles on a simpler (though not optimal) behavior where we only allow one build for each derivation. In the example above, lib0 has already been instantiated (as lib0_a), so we don’t allow pulling in lib0_b (nor lib2_b, which depends on it) and we rebuild both lib2 and foo.

While not optimal (we’ll end up rebuilding foo even if it’s already in the binary cache), this solution has the advantage of preserving correctness while staying conceptually and technically simple.
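To make the “one build for each derivation” rule concrete, here is a toy model of the decision applied to Alice’s situation. It is only a sketch and bears no relation to Nix’s actual code: the function plan_builds and its data layout are invented for this illustration, and each cache entry is assumed to record which outputs of its dependencies it was built against.

```python
def plan_builds(order, local, cache):
    """Decide per derivation whether to reuse, substitute or build it.

    order: derivation names, dependencies first.
    local: derivation -> name of the output that was already built locally.
    cache: derivation -> (output name, {dependency: output it was built against}).
    """
    chosen = dict(local)  # the single allowed build of each derivation
    actions = {}
    for drv in order:
        if drv in chosen:
            actions[drv] = "reuse local"
            continue
        cached = cache.get(drv)
        # A cached output is only acceptable if it was built against
        # exactly the dependency outputs that we have already settled on
        if cached is not None and all(
            chosen.get(dep) == output for dep, output in cached[1].items()
        ):
            chosen[drv] = cached[0]
            actions[drv] = "substitute"
        else:
            chosen[drv] = drv + "_rebuilt"
            actions[drv] = "build"
    return actions


order = ["lib0", "lib1", "lib2", "foo"]
local = {"lib0": "lib0_a", "lib1": "lib1_a"}
cache = {
    "lib0": ("lib0_b", {}),
    "lib2": ("lib2_b", {"lib0": "lib0_b"}),
}
actions = plan_builds(order, local, cache)
print(actions)
```

In this model lib2_b is rejected because it was built against lib0_b rather than the lib0_a that is already in the store, so lib2 and everything depending on it gets built locally.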

What now?

Part of this has already been implemented but there’s still quite a long way forward.

I hope for it to be usable (though maybe still experimental) for Nix 3.0.

And in the meantime stay tuned to our regular updates on Discourse. Or wait for the next blog post that will explain another change that will be necessary, one that is less fundamental but more user-facing.

November 18, 2020 12:00 AM

November 10, 2020


Write access control for binary caches

As Cachix is growing, I have noticed a few issues along the way: Signing keys are still the best way to upload content and not delegate trust to Cachix, but users have also found that they can be difficult to manage, particularly if the secret key needs to be rotated. At this point, the best option is to clear out the cache completely, and re-sign everything with a newly generated key.

by Domen Kožar at November 10, 2020 11:00 AM

October 31, 2020

Sander van der Burg

Building multi-process Docker images with the Nix process management framework

Some time ago, I described my experimental Nix-based process management framework that makes it possible to automatically deploy running processes (sometimes also ambiguously called services) from declarative specifications written in the Nix expression language.

The framework is built around two concepts. As its name implies, the Nix package manager is used to deploy all required packages and static artifacts, and a process manager of choice (e.g. sysvinit, systemd, supervisord and others) is used to manage the life-cycles of the processes.

Moreover, it is built around flexible concepts allowing integration with solutions that are not qualified as process managers (but can still be used as such), such as Docker -- each process instance can be deployed as a Docker container with a shared Nix store using the host system's network.

As explained in an earlier blog post, Docker has become such a popular solution that it has become a standard for deploying (micro)services (often as a utility in the Kubernetes solution stack).

When deploying a system that consists of multiple services with Docker, a typical strategy (and recommended practice) is to use multiple containers that have only one root application process. The advantages of this approach are that Docker can control the life-cycles of the applications, and that each process is (somewhat) isolated/protected from other processes and the host system.

By default, containers are isolated, but if they need to interact with other processes, then they can use all kinds of integration facilities -- for example, they can share namespaces, or use shared volumes.

In some situations, it may also be desirable to deviate from the one root process per container practice -- for some systems, processes may need to interact quite intensively (e.g. with IPC mechanisms, shared files or shared memory, or a combination of these), in which case the container boundaries introduce more inconveniences than benefits.

Moreover, when running multiple processes in a single container, common dependencies can also typically be more efficiently shared leading to lower disk and RAM consumption.

As explained in my previous blog post (that explores various Docker concepts), sharing dependencies between containers only works if containers are constructed from images that share the same layers with the same shared libraries. In practice, this form of sharing is not always as efficient as we want it to be.

Configuring a Docker image to run multiple application processes is somewhat cumbersome -- the official Docker documentation describes two solutions: one that relies on a wrapper script that starts multiple processes in the background and a loop that waits for the "main process" to terminate, and the other is to use a process manager, such as supervisord.

I realised that I could solve this problem much more conveniently by combining the dockerTools.buildImage {} function in Nixpkgs (that builds Docker images with the Nix package manager) with the Nix process management abstractions.

I have created my own abstraction function: createMultiProcessImage that builds multi-process Docker images, managed by any supported process manager that works in a Docker container.

In this blog post, I will describe how this function is implemented and how it can be used.

Creating images for single root process containers

As shown in earlier blog posts, creating a Docker image with Nix for a single root application process is very straightforward.

For example, we can build an image that launches a trivial web application service with an embedded HTTP server (as shown in many of my previous blog posts), as follows:

{dockerTools, webapp}:

dockerTools.buildImage {
  name = "webapp";
  tag = "test";

  # Instructions that run with root privileges while the image is built
  runAsRoot = ''
    ${dockerTools.shadowSetup}
    groupadd webapp
    useradd webapp -g webapp -d /dev/null
  '';

  config = {
    Env = [ "PORT=5000" ];
    Cmd = [ "${webapp}/bin/webapp" ];
    Expose = {
      "5000/tcp" = {};
    };
  };
}

The above Nix expression (default.nix) invokes the dockerTools.buildImage function to automatically construct an image with the following properties:

  • The image has the following name: webapp and the following version tag: test.
  • The web application service requires some state to be initialized before it can be used. To configure state, we can run instructions in a QEMU virtual machine with root privileges (runAsRoot).

    In the above deployment Nix expression, we create an unprivileged user and group named: webapp. For production deployments, it is typically recommended to drop root privileges, for security reasons.
  • The Env directive is used to configure environment variables. The PORT environment variable is used to configure the TCP port where the service should bind to.
  • The Cmd directive starts the webapp process in foreground mode. The life-cycle of the container is bound to this application process.
  • Expose exposes TCP port 5000 to the public so that the service can respond to requests made by clients.

We can build the Docker image as follows:

$ nix-build

load it into Docker with the following command:

$ docker load -i result

and launch a container instance using the image as a template:

$ docker run -it -p 5000:5000 webapp:test

If the deployment of the container succeeded, we should get a response from the webapp process, by running:

$ curl http://localhost:5000
<!DOCTYPE html>
<title>Simple test webapp</title>
Simple test webapp listening on port: 5000

Creating multi-process images

As shown in previous blog posts, the webapp process is part of a bigger system, namely: a web application system with an Nginx reverse proxy forwarding requests to multiple webapp instances:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  sharedConstructors = import ../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir cacheDir tmpDir forceDisableUserChange processManager;
  };

  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager;
  };
in
rec {
  webapp = rec {
    port = 5000;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };
  };

  nginx = rec {
    port = 8080;

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      webapps = [ webapp ];
      inherit port;
    } {};
  };
}

The Nix expression above shows a simple processes model variant of that system, that consists of only two process instances:

  • The webapp process is (as shown earlier) an application that returns a static HTML page.
  • nginx is configured as a reverse proxy to forward incoming connections to multiple webapp instances using the virtual host header property (dnsName).

    If somebody connects to the nginx server with the following host name: webapp.local then the request is forwarded to the webapp service.
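The host-based forwarding rule can be illustrated with a small lookup function. This is merely a sketch of the routing decision and says nothing about how the generated nginx configuration works internally; the function select_backend is hypothetical, and the dictionary mirrors the webapp attribute set from the processes model.

```python
def select_backend(host_header, webapps):
    """Return the port of the webapp whose dnsName matches the Host header.

    host_header: value of the HTTP Host header, optionally with a port suffix.
    webapps: list of dictionaries with dnsName and port keys.
    Returns None when no webapp matches.
    """
    hostname = host_header.split(":", 1)[0].lower()
    for webapp in webapps:
        if webapp["dnsName"] == hostname:
            return webapp["port"]
    return None


webapps = [{"dnsName": "webapp.local", "port": 5000}]
print(select_backend("webapp.local", webapps))
```

A request carrying the host name webapp.local ends up at the webapp instance listening on port 5000, whereas requests for unknown host names are not matched.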

Configuration steps

To allow all processes in the process model shown to be deployed to a single container, we need to execute the following steps in the construction of an image:

  • Instead of deploying a single package, such as webapp, we need to refer to a collection of packages and/or configuration files that can be managed with a process manager, such as sysvinit, systemd or supervisord.

    The Nix process management framework provides all kinds of Nix function abstractions to accomplish this.

    For example, the following function invocation builds a configuration profile for the sysvinit process manager, containing a collection of sysvinit scripts (also known as LSB Init compliant scripts):

    profile = import ../create-managed-process/sysvinit/build-sysvinit-env.nix {
      exprFile = ./processes.nix;
      stateDir = "/var";
    };

  • Similar to single root process containers, we may also need to initialize state. For example, we need to create common FHS state directories (e.g. /tmp, /var etc.) in which services can store their relevant state files (e.g. log files, temp files).

    This can be done by running the following command:

    nixproc-init-state --state-dir /var
  • Another property that multiple process containers have in common is that they may also require the presence of unprivileged users and groups, for security reasons.

    With the following commands, we can automatically generate all required users and groups specified in a deployment profile:

    ${dysnomia}/bin/dysnomia-addgroups ${profile}
    ${dysnomia}/bin/dysnomia-addusers ${profile}
  • Instead of starting a (single root) application process, we need to start a process manager that manages the processes that we want to deploy. As already explained, the framework allows you to pick multiple options.

Starting a process manager as a root process

Of all the process managers that the framework currently supports, the most straightforward option to use in a Docker container is supervisord.

To use it, we can create a symlink to the supervisord configuration in the deployment profile:

ln -s ${profile} /etc/supervisor

and then start supervisord as a root process with the following command directive:

Cmd = [
"--configuration" "/etc/supervisor/supervisord.conf"
"--logfile" "/var/log/supervisord.log"
"--pidfile" "/var/run/"

(As a sidenote: creating a symlink is not strictly required, but makes it possible to control running services with the supervisorctl command-line tool).

Supervisord is not the only option. We can also use sysvinit scripts, but doing so is a bit tricky. As explained earlier, the life-cycle of a container is bound to a running root process (in foreground mode).

sysvinit scripts do not run in the foreground, but start processes that daemonize and terminate immediately, leaving daemon processes behind that remain running in the background.

As described in an earlier blog post about translating high-level process management concepts, it is also possible to run "daemons in the foreground" by creating a proxy script. We can also make a similar foreground proxy for a collection of daemons:

#!/bin/bash -e

_term()
{
    nixproc-sysvinit-runactivity -r stop ${profile}
    kill "$pid"
    exit 0
}

nixproc-sysvinit-runactivity start ${profile}

# Keep process running, but allow it to respond to the TERM and INT
# signals so that all scripts are stopped properly

trap _term TERM
trap _term INT

tail -f /dev/null & pid=$!
wait "$pid"

The above proxy script does the following:

  • It first starts all sysvinit scripts by invoking the nixproc-sysvinit-runactivity start command.
  • Then it registers a signal handler for the TERM and INT signals. The corresponding callback triggers a shutdown procedure.
  • We invoke a dummy command that keeps running in the foreground without consuming too many system resources (tail -f /dev/null) and we wait for it to terminate.
  • The signal handler properly deactivates all processes in reverse order (with the nixproc-sysvinit-runactivity -r stop command), and finally terminates the dummy command causing the script (and the container) to stop.
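The same foreground proxy pattern can be expressed in other languages as well. The following Python sketch follows the steps described above: start everything, register handlers for TERM and INT, block cheaply, and stop everything in reverse order when a signal arrives. It is an illustration only and not part of the framework: the class ForegroundProxy and its start and stop callbacks are hypothetical stand-ins for invoking the sysvinit scripts.

```python
import signal
import threading


class ForegroundProxy:
    """Stays in the foreground on behalf of a collection of daemons."""

    def __init__(self, services, start, stop):
        self.services = list(services)
        self.start = start  # callback that activates one service
        self.stop = stop    # callback that deactivates one service
        self.done = threading.Event()

    def handle_signal(self, signum=None, frame=None):
        # Deactivate all services in reverse activation order,
        # then allow run() to return
        for service in reversed(self.services):
            self.stop(service)
        self.done.set()

    def run(self):
        for service in self.services:
            self.start(service)
        signal.signal(signal.SIGTERM, self.handle_signal)
        signal.signal(signal.SIGINT, self.handle_signal)
        # Block without consuming many resources until a handler has fired
        while not self.done.wait(timeout=1.0):
            pass
```

Calling run() from the main thread of a script keeps that script in the foreground until it receives a TERM or INT signal.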

In addition to supervisord and sysvinit, we can also use Disnix as a process manager by using a similar strategy with a foreground proxy.

Other configuration properties

The above configuration properties suffice to get a multi-process container running. However, to make working with such containers more practical from a user perspective, we may also want to:

  • Add basic shell utilities to the image, so that you can control the processes, investigate log files (in case of errors), and do other maintenance tasks.
  • Add a .bashrc configuration file to make file coloring work for the ls command, and to provide a decent prompt in a shell session.


The configuration steps described in the previous section are wrapped into a function named: createMultiProcessImage, which itself is a thin wrapper around the dockerTools.buildImage function in Nixpkgs -- it accepts the same parameters with a number of additional parameters that are specific to multi-process configurations.

The following function invocation builds a multi-process container deploying our example system, using supervisord as a process manager:

let
  pkgs = import <nixpkgs> {};
  # Needed by the inherit below
  system = builtins.currentSystem;

  createMultiProcessImage = import ../../nixproc/create-multi-process-image/create-multi-process-image.nix {
    inherit pkgs system;
    inherit (pkgs) dockerTools stdenv;
  };
in
createMultiProcessImage {
  name = "multiprocess";
  tag = "test";
  exprFile = ./processes.nix;
  stateDir = "/var";
  processManager = "supervisord";
}

After building the image, and deploying a container, with the following commands:

$ nix-build
$ docker load -i result
$ docker run -it --network host multiprocess:test

we should be able to connect to the webapp instance via the nginx reverse proxy:

$ curl -H 'Host: webapp.local' http://localhost:8080
<!DOCTYPE html>
<title>Simple test webapp</title>
Simple test webapp listening on port: 5000

As explained earlier, the constructed image also provides extra command-line utilities to do maintenance tasks, and control the life-cycle of the individual processes.

For example, we can "connect" to the running container, and check which processes are running:

$ docker exec -it mycontainer /bin/bash
# supervisorctl
nginx RUNNING pid 11, uptime 0:00:38
webapp RUNNING pid 10, uptime 0:00:38

If we change the processManager parameter to sysvinit, we can deploy a multi-process image in which the foreground proxy script is used as a root process (that starts and stops sysvinit scripts).
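A minimal sketch of that variant is shown below. It reuses the expression from the supervisord example and only changes the processManager value. As an assumption, system is bound to builtins.currentSystem so that the expression evaluates on its own.

```nix
let
  pkgs = import <nixpkgs> {};
  # Assumption: build for the platform that evaluates this expression
  system = builtins.currentSystem;

  createMultiProcessImage = import ../../nixproc/create-multi-process-image/create-multi-process-image.nix {
    inherit pkgs system;
    inherit (pkgs) dockerTools stdenv;
  };
in
createMultiProcessImage {
  name = "multiprocess";
  tag = "test";
  exprFile = ./processes.nix;
  stateDir = "/var";
  processManager = "sysvinit"; # the only change from the supervisord example
}
```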

We can control the life-cycle of each individual process by directly invoking the sysvinit scripts in the container:

$ docker exec -it mycontainer /bin/bash
$ /etc/rc.d/init.d/webapp status
webapp is running with Process ID(s) 33.

$ /etc/rc.d/init.d/nginx status
nginx is running with Process ID(s) 51.

Although having extra command-line utilities to do administration tasks is useful, a disadvantage is that they considerably increase the size of the image.

To save storage costs, it is also possible to disable interactive mode to exclude these packages:

let
  pkgs = import <nixpkgs> {};
  # Assumption: build for the platform that evaluates this expression
  system = builtins.currentSystem;

  createMultiProcessImage = import ../../nixproc/create-multi-process-image/create-multi-process-image.nix {
    inherit pkgs system;
    inherit (pkgs) dockerTools stdenv;
  };
in
createMultiProcessImage {
  name = "multiprocess";
  tag = "test";
  exprFile = ./processes.nix;
  stateDir = "/var";
  processManager = "supervisord";
  interactive = false; # Do not install any additional shell utilities
}


In this blog post, I have described a new utility function in the Nix process management framework: createMultiProcessImage, a thin wrapper around the dockerTools.buildImage function that can be used to conveniently build multi-process Docker images, using any Docker-capable process manager that the Nix process management framework supports.

Besides the fact that we can conveniently construct multi-process images, this function also has the advantage (similar to the dockerTools.buildImage function) that Nix is only required for the construction of the image. To deploy containers from a multi-process image, Nix is not a requirement.

There is also a drawback: as with "ordinary" multi-process container deployments, upgrading a single process requires redeploying the entire container, which also means terminating all other running processes.


The createMultiProcessImage function is part of the current development version of the Nix process management framework that can be obtained from my GitHub page.

by Sander van der Burg at October 31, 2020 03:05 PM