NixOS Planet

May 29, 2021

Craige McWhirter

The Consensus on Branch Names

Consensus: Decisions are reached in a dialogue between equals

There was some kerfuffle in 2020 over the use of the term master in git. The debate over the origins of the term was resolutely settled, so I set about renaming my primary branches to other words.

The one that most people seemed to be using was main, so I started using it too. While main was conveniently brief, it still felt inadequate. Something was wrong and it kept bubbling away in the background.

The word that kept percolating through was consensus.

I kept dismissing it for all the obvious reasons, such as it was too long, too unwieldy, too obscure or just simply not used commonly enough to be familiar or well understood.

The word was persistent though and consensus kept coming back.

One morning recently, I was staring at a git tree when the realisation slapped me in the face that in a git workflow the primary / master / main branches reflected a consensus point in that workflow.

Consensus: Decisions are reached in a dialogue between equals

That realisation settled it pretty hard for me: consensus not only accurately reflected that point in the workflow, but was also the most fitting English word for what the branch represented.

Continue the conversation on Matrix.

by Craige McWhirter at May 29, 2021 08:19 AM

May 08, 2021

Graham Christensen

Flakes are such an obviously good thing

Flakes is a major development for Nix, and I believe one of the most significant changes the project has ever seen. Flakes brings a standardized interchange format for expressions, and dramatically reduces the friction of depending on someone else’s code. However, it needs the community involved to shape and evolve it into a final and wonderful tool.

The Nix community is about 18 years old now. Until recently (~6 years ago) the community was quite small. The project is now much bigger, and growing. The result is that, organizationally, we’re a bit immature. The RFC process is new, and only a few major changes have gone through it.

Unfortunately, Flakes was one of the very first major Nix features to go through the community’s relatively young RFC process.

As a community, we’re neither accustomed to nor practiced at breaking down large and fundamental changes to the ecosystem and shepherding them through RFCs. But there we were: we had a shiny new process and by golly, a change that deserved an RFC! We hadn’t even tied our shoes and yet we were attempting our first triathlon.

I believe everybody approached that RFC with really good intentions and hoped that it would go well and be a productive process. In a lot of ways, it was productive. But the end of that RFC was not good.

What was one RFC probably should have been a very theoretical “this is an idea, should we explore it?” RFC, followed by several RFCs about specific subsections and details of how Flakes would work.

But… we were so new to RFCs, so new to being a large project, we didn’t see the problems coming.

As a result, the RFC was closed, with an agreement to make Flakes “experimental” but merge it into Nix anyway. Maybe in the future, a new RFC would be submitted to review the code as it already existed in Nix.

This has caused quite a lot of bad feelings between all sorts of people, a lot of assumptions about motivations that I don’t believe hold up, and probably a good bit of distrust in RFCs and the legitimacy of the process.

I find this so unfortunate. I believe Flakes are astonishingly important and will be very powerful, but the RFC experience has turned so many people against them on principle. Flakes have a lot of potential, and need a strong community to work through their problems and help them flourish.

The damage to the perception of RFCs is real and tragic. I believe so deeply in this project and its ability to grow, and the organizational distrust from this makes that much, much more difficult.

I feel so disappointed in myself for not seeing the dangers of sending such a fundamental change to Nix the tool through a nearly brand new process which was a fundamental change to the Nix community. The process wasn’t ready for it, the participants weren’t ready for it.

I regret that so much.

I believe we can, as a project and community, move past it. It will take leadership and effort from a lot of people. Project leadership will have to make the first moves there, to mend the distrust and sow new seeds of cooperation. I know we can do this.

I have so much love for this project and its community. I feel so grateful to be part of it, surrounded every day by people so much smarter than me.

I hope to be part of the solution, to be part of the healing and growth.

I have been part of this project for just over five years now, and I am incredibly excited to be part of the next five.

The future of Nix is so bright I can hardly look right at it without looking down at less ambitious futures.

This was originally a series of tweets.

May 08, 2021 12:00 AM

April 26, 2021

Sander van der Burg

A test framework for the Nix process management framework

As already explained in many previous blog posts, the Nix process management framework adds new ideas to earlier service management concepts explored in Nixpkgs and NixOS:

  • It makes it possible to deploy services on any operating system that can work with the Nix package manager, including conventional Linux distributions, macOS and FreeBSD. It also works on NixOS, but NixOS is not a requirement.
  • It allows you to construct multiple instances of the same service, by using constructor functions that identify conflicting configuration parameters. These constructor functions can be invoked in such a way that these configuration properties no longer conflict.
  • We can target multiple process managers from the same high-level deployment specifications. These high-level specifications are automatically translated to parameters for a target-specific configuration function for a specific process manager.

    It is also possible to override or augment the generated parameters, to work with configuration properties that are not universally supported.
  • There is a configuration option that conveniently allows you to disable user changes, making it possible to deploy services as an unprivileged user.

Although the above features are interesting, one particular challenge is that the framework cannot guarantee that all possible variations will work after writing a high-level process configuration. The framework facilitates code reuse, but it is not a write once, run anywhere approach.

To make it possible to validate multiple service variants, I have developed a test framework built on top of the NixOS test driver, which makes it possible to deploy and test a network of NixOS QEMU virtual machines with minimal storage and RAM overhead.

In this blog post, I will describe how the test framework can be used.

Automating tests


Before developing the test framework, I was mostly testing all my packaged services manually. Because a manual test process is tedious and time consuming, I did not have any test coverage for anything but the most trivial example services. As a result, I frequently ran into many configuration breakages.

Typically, when I want to test a process instance, or a system that is composed of multiple collaborative processes, I perform the following steps:

  • First, I need to deploy the system for a specific process manager and configuration profile, e.g. for a privileged or unprivileged user, in an isolated environment, such as a virtual machine or container.
  • Then I need to wait for all process instances to become available. Readiness checks are critical and typically more complicated than expected -- for most services, there is a time window between a successful invocation of a process and its availability to carry out its primary task, such as accepting network connections. Executing tests before a service is ready typically results in errors.

    Although there are process managers that can generally deal with this problem (e.g. systemd has the sd_notify protocol and s6 its own protocol and a sd_notify wrapper), the lack of a standardized and widely adopted protocol still requires me to implement readiness checks manually.

    (As a sidenote: the only readiness check protocol that is standardized is for traditional System V services that daemonize on their own. The calling parent process should terminate almost immediately, but must still wait until the spawned daemon child process notifies it that it is ready.

    As described in an earlier blog post, this notification aspect is more complicated to implement than I thought. Moreover, not all traditional System V daemons follow this protocol.)
  • When all process instances are ready, I can check whether they properly carry out their tasks, and whether the integration of these processes work as expected.
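
The kind of readiness check described above can be sketched as a small shell function that polls a TCP port until it accepts connections. This is my own illustrative sketch, not part of the framework; the function name and timeout handling are assumptions:

```shell
#!/usr/bin/env bash

# Hypothetical port-based readiness check: poll a TCP port until it
# accepts a connection or a timeout (in seconds) expires. Uses bash's
# /dev/tcp pseudo-device, so this requires bash, not plain sh.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-10}
  local deadline=$((SECONDS + timeout))
  while (( SECONDS < deadline )); do
    # Attempt a TCP connection; the subshell closes the fd on exit.
    if (exec 3<> "/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0  # the service accepted a connection: it is ready
    fi
    sleep 1
  done
  return 1      # the service never became ready within the timeout
}
```

The NixOS test driver's machine.wait_for_open_port, used later in this post, performs the same kind of polling from the test script.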

An example


I have developed a Nix function, testService, that automates the above process using the NixOS test driver -- I can use this function to create a test suite for systems that are made out of running processes, such as the webapps example described in my previous blog posts about the Nix process management framework.

The example system consists of a number of webapp processes with an embedded HTTP server returning HTML pages displaying their identities. Nginx reverse proxies forward incoming connections to the appropriate webapp processes by using their corresponding virtual host header values:


{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, libDir ? "${stateDir}/lib"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  sharedConstructors = import ../../../examples/services-agnostic/constructors/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir cacheDir libDir tmpDir forceDisableUserChange processManager;
  };

  constructors = import ../../../examples/webapps-agnostic/constructors/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager;
    webappMode = null;
  };
in
rec {
  webapp1 = rec {
    port = 5000;
    dnsName = "webapp1.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "1";
    };
  };

  webapp2 = rec {
    port = 5001;
    dnsName = "webapp2.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "2";
    };
  };

  webapp3 = rec {
    port = 5002;
    dnsName = "webapp3.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "3";
    };
  };

  webapp4 = rec {
    port = 5003;
    dnsName = "webapp4.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "4";
    };
  };

  nginx = rec {
    port = if forceDisableUserChange then 8080 else 80;
    webapps = [ webapp1 webapp2 webapp3 webapp4 ];

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      inherit port webapps;
    } {};
  };

  webapp5 = rec {
    port = 5004;
    dnsName = "webapp5.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "5";
    };
  };

  webapp6 = rec {
    port = 5005;
    dnsName = "webapp6.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "6";
    };
  };

  nginx2 = rec {
    port = if forceDisableUserChange then 8081 else 81;
    webapps = [ webapp5 webapp6 ];

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      inherit port webapps;
      instanceSuffix = "2";
    } {};
  };
}

The processes model shown above (processes-advanced.nix) defines the following process instances:

  • There are six webapp process instances, each running an embedded HTTP service, returning HTML pages with their identities. The dnsName property specifies the DNS domain name value that should be used as a virtual host header to make the forwarding from the reverse proxies work.
  • There are two nginx reverse proxy instances. The former, nginx, forwards incoming connections to the first four webapp instances; the latter, nginx2, forwards incoming connections to webapp5 and webapp6.

With the following command, I can connect to webapp2 through the first nginx reverse proxy:


$ curl -H 'Host: webapp2.local' http://localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Simple test webapp</title>
</head>
<body>
Simple test webapp listening on port: 5001
</body>
</html>

Creating a test suite


I can create a test suite for the web application system as follows:


{ pkgs, testService, processManagers, profiles }:

testService {
  exprFile = ./processes.nix;

  readiness = {instanceName, instance, ...}:
    ''
      machine.wait_for_open_port(${toString instance.port})
    '';

  tests = {instanceName, instance, ...}:
    pkgs.lib.optionalString (instanceName == "nginx" || instanceName == "nginx2")
      (pkgs.lib.concatMapStrings (webapp: ''
        machine.succeed(
            "curl --fail -H 'Host: ${webapp.dnsName}' http://localhost:${toString instance.port} | grep ': ${toString webapp.port}'"
        )
      '') instance.webapps);

  inherit processManagers profiles;
}

The Nix expression above invokes testService with the following parameters:

  • processManagers refers to a list of names of all the process managers that should be tested.
  • profiles refers to a list of configuration profiles that should be tested. Currently, it supports privileged for privileged deployments, and unprivileged for unprivileged deployments in an unprivileged user's home directory, without changing user permissions.
  • The exprFile parameter refers to the processes model of the system: processes-advanced.nix shown earlier.
  • The readiness parameter refers to a function that does a readiness check for each process instance. In the above example, it checks whether each service is actually listening on the required TCP port.
  • The tests parameter refers to a function that executes tests for each process instance. In the above example, it ignores all but the nginx instances, because explicitly testing a webapp instance is a redundant operation.

    For each nginx instance, it checks whether all webapp instances can be reached from it, by running the curl command.

The readiness and tests functions take the following parameters: instanceName identifies the process instance in the processes model, and instance refers to the attribute set containing its configuration.

Furthermore, they can refer to global process model configuration parameters:

  • stateDir: The directory in which state files are stored (typically /var for privileged deployments)
  • runtimeDir: The directory in which runtime files are stored (typically /var/run for privileged deployments).
  • forceDisableUserChange: Indicates whether to disable user changes (for unprivileged deployments) or not.

In addition to writing tests that work on instance level, it is also possible to write tests on system level, with the following parameters (not shown in the example):

  • initialTests: instructions that run right after deploying the system, but before the readiness checks and instance-level tests.
  • postTests: instructions that run after the instance-level tests.

The above functions also accept the same global configuration parameters, as well as processes, which refers to the entire processes model.

We can also configure other properties useful for testing:

  • systemPackages: installs additional packages into the system profile of the test virtual machine.
  • nixosConfig defines a NixOS module with configuration properties that will be added to the NixOS configuration of the test machine.
  • extraParams propagates additional parameters to the processes model.

Composing test functions


The Nix expression above is not self-contained. It is a function definition that needs to be invoked with all required parameters including all the process managers and profiles that we want to test for.

We can compose tests in the following Nix expression:


{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, processManagers ? [ "supervisord" "sysvinit" "systemd" "disnix" "s6-rc" ]
, profiles ? [ "privileged" "unprivileged" ]
}:

let
  testService = import ../../nixproc/test-driver/universal.nix {
    inherit system;
  };
in
{
  nginx-reverse-proxy-hostbased = import ./nginx-reverse-proxy-hostbased {
    inherit pkgs processManagers profiles testService;
  };

  docker = import ./docker {
    inherit pkgs processManagers profiles testService;
  };

  ...
}

The above partial Nix expression (default.nix) invokes the function defined in the previous Nix expression that resides in the nginx-reverse-proxy-hostbased directory and propagates all required parameters. It also composes other test cases, such as docker.

The parameters of the composition expression allow you to globally configure all the desired service variants:

  • processManagers allows you to select the process managers you want to test for.
  • profiles allows you to select the configuration profiles.

With the following command, we can test our system as a privileged user, using systemd as a process manager:


$ nix-build -A nginx-reverse-proxy-hostbased.privileged.systemd

We can also run the same test, but as an unprivileged user:


$ nix-build -A nginx-reverse-proxy-hostbased.unprivileged.systemd

In addition to systemd, any configured process manager that works on NixOS can be used. The following command runs a privileged test of the same service with sysvinit:


$ nix-build -A nginx-reverse-proxy-hostbased.privileged.sysvinit

Results


With the test driver in place, I have managed to expand my repository of example services, provided test coverage for them and fixed quite a few bugs in the framework caused by regressions.

Below is a screenshot of Hydra, the Nix-based continuous integration service, showing an overview of test results for all kinds of variants of a service:


So far, the following services work multi-instance, with multiple process managers, and (optionally) as an unprivileged user:

  • Apache HTTP server. In the services repository, there are multiple constructors for deploying an Apache HTTP server: to deploy static web applications or dynamic web applications with PHP, and to use it as a reverse proxy (via HTTP and AJP) with HTTP basic authentication optionally enabled.
  • Apache Tomcat.
  • Nginx. For Nginx we also have multiple constructors. One to deploy a configuration for serving static web apps, and two for setting up reverse proxies using paths or virtual hosts to forward incoming requests to the appropriate services.

    The reverse proxy constructors can also generate configurations that will cache the responses of incoming requests.
  • MySQL/MariaDB.
  • PostgreSQL.
  • InfluxDB.
  • MongoDB.
  • OpenSSH.
  • svnserve.
  • xinetd.
  • fcron. By default, the fcron user and group are hardwired into the executable. To facilitate unprivileged user deployments, we automatically create a package build override to propagate the --with-run-non-privileged configuration flag so that it can run as unprivileged user. Similarly, for multiple instances we create an override to use a different user and group that does not conflict with the primary instance.
  • supervisord.
  • s6-svscan.

The following service also works with multiple instances and multiple process managers, but not as an unprivileged user:


The following services work with multiple process managers, but not multi-instance or as an unprivileged user:

  • D-Bus
  • Disnix
  • nix-daemon
  • Hydra

In theory, the above services could be adjusted to work as an unprivileged user, but doing so is not very useful -- for example, the nix-daemon's purpose is to facilitate multi-user package deployments. As an unprivileged user, you only want to facilitate package deployments for yourself.

Moreover, the multi-instance aspect is IMO also not very useful to explore for these services. For example, I cannot think of a useful scenario in which two Hydra instances run next to each other.

Discussion


The test framework described in this blog post is an important feature addition to the Nix process management framework -- it allowed me to package more services and fix quite a few bugs caused by regressions.

I can now finally show that it is doable to package services and make them work under nearly all possible conditions that the framework supports (e.g. multiple instances, multiple process managers, and unprivileged user installations).

The only limitation of the test framework is that it is not operating system agnostic -- the NixOS test driver (that serves as its foundation) only works, as its name implies, with NixOS, which is a Linux distribution. As a result, we cannot automatically test bsdrc scripts, launchd daemons, or cygrunsrv services.

In theory, it is also possible to make a more generalized test driver that works with multiple operating systems. The NixOS test driver is a combination of ideas (e.g. a shared Nix store between the host and guest system, an API to control QEMU, and an API to manage services). We could also dissect these ideas and run them on conventional QEMU VMs running different operating systems (with the Nix package manager).

Although making a more generalized test driver is interesting, it is beyond the scope of the Nix process management framework (which is about managing process instances, not entire systems).

Another drawback is that while it is possible to test all possible service variants on Linux, it may be very expensive to do so.

However, full process manager coverage is often not required to get a reasonable level of confidence. For many services, it typically suffices to implement the following strategy:

  • Pick two process managers: one that prefers foreground processes (e.g. supervisord) and one that prefers daemons (e.g. sysvinit). This is the most significant difference (from a configuration perspective) between all these different process managers.
  • If a service supports multiple configuration variants, and multiple instances, then create a processes model that concurrently deploys all these variants.

Implementing the above strategy only requires you to test four variants, providing a high degree of confidence that the service will work with all other process managers as well.
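
A reduced matrix like this can be generated mechanically. The sketch below prints the nix-build invocations for the four variants; the attribute path follows the examples earlier in this post, and the helper name is my own:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the build commands for the reduced test
# matrix (two process managers x two profiles = four variants).
print_matrix() {
  local attr=$1
  for pm in supervisord sysvinit; do
    for profile in privileged unprivileged; do
      echo "nix-build -A ${attr}.${profile}.${pm}"
    done
  done
}

# Prints four nix-build commands, one per variant.
print_matrix nginx-reverse-proxy-hostbased
```

Piping the output to a shell (or to xargs) then runs the whole reduced matrix in one go.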

Future work


Most of the interesting functionality required to work with the Nix process management framework is now implemented. I still need to make it more robust, and to "dogfood" it on as many of my own problems as possible.

Moreover, the docker backend still requires a bit more work to make it more usable.

Eventually, I plan to write an RFC to upstream the interesting bits of the framework into Nixpkgs.

Availability


The Nix process management framework repository as well as the example services repository can be obtained from my GitHub page.

by Sander van der Burg (noreply@blogger.com) at April 26, 2021 07:32 PM

nixbuild.net

Data Science with Nix: Parameter Sweeps

Parameter sweeping is a technique often utilized in scientific computing and HPC settings. In the mainstream software industry the concept is called a build matrix.

The idea is that you have a task you want to perform with varying input parameters. If the task takes multiple parameters, and you’d like to try it out with multiple values for each parameter, it is easy to end up with a combinatorial explosion.
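
To make the explosion concrete: the total number of runs is the product of the number of values per parameter, so even modest sweeps grow quickly. A tiny illustration (the parameter counts are arbitrary):

```shell
# The size of a sweep is the product of the value counts: three
# parameters with 3, 4 and 2 candidate values already yield 24 runs.
total=1
for values_per_param in 3 4 2; do
  total=$((total * values_per_param))
done
echo "$total"  # prints 24
```

Adding a fourth parameter with just three values would triple that to 72 runs.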

This blog post gives a practical demonstration showing how Nix is a perfect companion for managing parameter sweeps and build matrices, and how nixbuild.net can be used to supercharge your workflow.

My hope is that this text can interest both readers who don’t know anything about Nix and experienced Nix users.

Use Cases

In scientific computing, it is common to run simulations of physical processes. The list of things simulated is endless: weather forecasting, molecular dynamics, celestial movements, FEM analysis, particle physics etc. A simulation is usually implemented directly as a computer program or as a description for a higher level simulation framework. A simulation generally has a set of input parameters that can be defined. These parameters can describe initial states, environmental aspects or tweak the behavior of the simulation algorithm itself. Scientists are interested in comparing simulation results for a range of different parameter values, and the process of doing so is referred to as a parameter sweep.

Parameter sweeping is often built into simulation frameworks. For simulations implemented directly as specialized programs, scientists will simply run the program over and over again with different parameters, collecting and comparing the results. When supercomputers are used for running the simulations, the job scheduler usually has some support for launching multiple simulation instances with varying parameters.

In the software industry, the term build matrix means basically the same thing as parameter sweeping. Typically, build matrices are used to build different variants of the same deliverable. In the simplest case, a programmer builds and packages a program for a set of different targets (Windows, macOS, Android etc.). But more complex build matrices with (much) higher dimensionality are of course also used.

Benchmarking is another area where build matrices are utilized, and combinatorial explosions are common. The demo below will show how a compression benchmark can be implemented with Nix and nixbuild.net.

Embarrassing Parallelism

Parameter sweeping can be classified as an embarrassingly parallel problem. As long as we have enough CPUs, all simulations or builds can be executed in parallel, since there are (usually) no dependencies between them. This is a perfect workload for nixbuild.net, which is built to be very scalable.
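
Because the runs are independent, even a plain local sweep parallelizes trivially, for example with xargs. In this sketch, run_one is a hypothetical stand-in for a simulation or build command:

```shell
#!/usr/bin/env bash

# Sketch: run independent jobs in parallel with xargs. run_one is a
# hypothetical stand-in for a simulation or build command.
run_one() {
  echo "finished job $1"
}
export -f run_one  # make the function visible to the child shells

# -P controls the number of concurrent processes, similar in spirit
# to Nix's max-jobs setting for builds.
printf '%s\n' job1 job2 job3 job4 | xargs -P 4 -I{} bash -c 'run_one {}'
```

The jobs finish in nondeterministic order, which is fine precisely because they do not depend on each other.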

At the same time, it is also easy to get into trouble managing all possible combinations of parameter values. Adding new parameters or parameter values can increase the number of runs exponentially, and the work of managing the runs and their results becomes overwhelming. The next section will show how Nix can help out with this.

How Nix Helps

One of the aspects of Nix that I find most empowering is that it helps you with the boring and stress-inducing task of managing files. Let me see if I can explain.

Assume you have a program called simulation that takes a parameter file as its only argument. The parameter file contains simulation parameters in a simple INI-format like this:

param1=3
param2=92
param3=0

The program outputs a simulation result on its standard output, in CSV format.

Now we want to run the simulation for some different combinations of input parameters. We could manually author the needed parameter files, or we can write a simple shell script for it:

round=1

for param1 in 3 4 5; do
  for param2 in 10 33 50 92; do
    for param3 in 0 1; do
      echo "param1=$param1" >> "round$round.ini"
      echo "param2=$param2" >> "round$round.ini"
      echo "param3=$param3" >> "round$round.ini"
      round=$((round+1))
    done
  done
done

We now have 24 different parameter files with all possible combinations of the values we are interested in:

$ ls -v
round1.ini  round6.ini   round11.ini  round16.ini  round21.ini
round2.ini  round7.ini   round12.ini  round17.ini  round22.ini
round3.ini  round8.ini   round13.ini  round18.ini  round23.ini
round4.ini  round9.ini   round14.ini  round19.ini  round24.ini
round5.ini  round10.ini  round15.ini  round20.ini

$ cat round20.ini
param1=5
param2=33
param3=1

To run the simulations, we simply loop through all parameter files:

for round in round*.ini; do
  simulation $round >> results.csv
done

So far, so good. At this point, we might want to tweak our parameter generation a bit. Maybe there are more parameter values we want to explore, different sweeps to do. So we change our parameter generator script and re-run a few times. We then realise we want to make changes to our simulation program itself. So we do that, and recompile it. Now we want to re-run all the different parameter sweeps we’ve done. Luckily, we saved all the different versions of our parameter generation script, so we can re-run them against the updated simulator.

At the end of our productive simulation session we have the following mess (with all intermediate parameter files removed):

gen-params.sh         results-1.csv   results-12.csv  simulation-3
gen-params-1.sh       results-2.csv   results-13.csv  simulation-O2-1
gen-params-2.sh       results-3.csv   results-14.csv  simulation-O2-2
gen-params-3.sh       results-4.csv   results-15.csv  simulation-O3
gen-params-3v2.sh     results-5.csv   results-16.csv  simulation-debug
gen-params-4.sh       results-6.csv   results-17.csv  simulation-debug2
gen-params-test.sh    results-7.csv   simulation      simulation-wrong
gen-params-test-2.sh  results-8.csv   simulation-1
results.csv           results-10.csv  simulation-2

Admittedly, this is how things usually end up for me when I’m doing any kind of “exploratory” work. I’m sure you all are much more organized. To my rescue comes Nix. It allows me to stop caring entirely about generated files, and only care about how stuff is generated. Additionally, it gives me tools to abstract, parameterize and reuse generators.

Let’s make an attempt at recreating our workflow above with Nix:

{ pkgs ? import <nixpkgs> {} }:

let

  inherit (pkgs) lib callPackage runCommand writeText;

  # Compiles the 'simulation' program, and allows us to provide
  # build-time arguments
  simulation = callPackage ./simulation.nix;

  # Executes the given simulation program with the given parameters
  runSimulation = buildArgs: parameters:
    runCommand "result.csv" {
      buildInputs = [ (simulation buildArgs) ];
      parametersFile = writeText "params.ini" (
        lib.generators.toKeyValue {} parameters
      );
    } ''
      simulation $parametersFile > $out
    '';

  # Merges multiple CSV files into a single one
  mergeResults = results: runCommand "results.csv" {
    inherit results;
  } ''
    cat $results > $out
  '';

in {

  sim_O3_std_sweep = mergeResults (
    lib.forEach (lib.cartesianProductOfSets {
      param1 = [3 4 5];
      param2 = [10 33 50 92];
      param3 = [0 1];
    }) (
      runSimulation {
        optimizationLevel = 3;
      }
    )
  );

  sim_O2_small_sweep = mergeResults (
    lib.forEach (lib.cartesianProductOfSets {
      param1 = [1 3];
      param2 = [20 60 92];
      param3 = [0 1];
    }) (
      runSimulation {
        optimizationLevel = 2;
      }
    )
  );

}

The key function above is perhaps cartesianProductOfSets, from the library functions in nixpkgs. It creates all possible combinations of input parameters, given a list of the possible values for each parameter. Our build function is then mapped over all these combinations using forEach.

We can build one of our parameter sweeps like this:

nix-build -A sim_O3_std_sweep

When all 24 simulation runs are done, Nix will create a single result symlink in the current directory, pointing to a results.csv file containing all simulation results. We can add new sweeps to our Nix file and re-run the build at any time. We never have to care about any generated files, since everything needed to re-generate results exists in the Nix file. The Nix file itself can be version-controlled like any source file.

In addition to the demonstrated ability to parameterize builds, Nix provides us with two more things, for free.

No unnecessary re-builds

In the example above, the sim_O3_std_sweep and sim_O2_small_sweep builds have some overlapping parameter sets. If you build both, Nix will only run the overlapping simulations once, and use the same result.csv files to create the two different results.csv files. This happens without any extra effort from the user. The same is true if you make changes that only affect part of your build. Nix also has support for external caches which makes it easy to share and reuse build results between computers (or you can simply use nixbuild.net to get build sharing without any extra configuration).
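
The underlying idea can be approximated in a few lines of shell: key each result by a hash of everything that went into producing it, and skip the work when the key already exists. This is a toy memoization sketch, not Nix's actual store mechanism (which hashes the full dependency graph, not just the command line):

```shell
#!/usr/bin/env bash

# Toy memoization sketch (not Nix's actual mechanism): results are
# keyed by a hash of the command that produced them, so repeated
# invocations with identical inputs reuse the cached output.
cache=$(mktemp -d)

run_cached() {
  local key
  key=$(printf '%s\n' "$@" | sha256sum | cut -d' ' -f1)
  if [ ! -f "$cache/$key" ]; then
    "$@" > "$cache/$key"   # executed only on a cache miss
  fi
  cat "$cache/$key"
}

run_cached echo "simulating with param1=3"   # runs the command
run_cached echo "simulating with param1=3"   # served from the cache
```

Sharing the cache directory between machines would then give a crude analogue of a Nix binary cache.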

Automatic parallelization

When Nix evaluates an expression, it constructs a build graph that tracks all dependencies between builds. In the example above, the results.csv file depends on the list of result.csv files, which in turn depend on specific builds of simulation. All of these dependencies are implicit; you don’t have to do anything other than simply refer to the things you need to perform your build.

The build graph allows Nix to execute the actual builds with maximum parallelization. However, running as many builds as possible in parallel is often not optimal if your compute resources (usually: your local computer) are limited. Nix has a simplistic build scheduler, which is just a user-configurable limit on the maximum number of concurrent builds. This works in many cases, but quickly becomes suboptimal when you have lots of builds that could run in parallel, or when the builds themselves have varying compute requirements.
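That user-configurable limit behaves like a fixed-size worker pool over the independent builds. A minimal Python sketch of this scheduling model, with a stand-in build function (the parameter values are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def build(params):
    # placeholder for one independent Nix build
    return sum(params.values())

param_sets = [{"param1": p1, "param2": p2} for p1 in (1, 3) for p2 in (20, 60, 92)]

# Like Nix's max-jobs setting: at most 4 builds run at the same time,
# regardless of how many could run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(build, param_sets))

print(results)  # [21, 61, 93, 23, 63, 95]
```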

This is where nixbuild.net can step in. It is able to run an “infinite” number of concurrent Nix builds for you, while keeping all builds perfectly isolated from each other (security- and resource-wise). It also selects compute resources intelligently for each individual build.

From the perspective of scientific computing, you can say that Nix provides a generic framework for parallel workloads, and nixbuild.net acts somewhat as a supercomputer, minus the effort of writing submission scripts and explicitly managing compute results.

Demo: Compression Benchmark

I’m now going to show you an example that is similar to the example in the previous section, but instead of an imaginary simulation we will run an actual benchmark this time. The benchmark will compare the compression ratio of a number of different lossless compression implementations. Since this article is about parameter sweeping, we will vary the following parameters during the benchmark:

  • Compression implementation: brotli, bzip2, gzip, lz4, xz and zstd.

  • Two different versions of each compression implementation. We’ll use the versions packaged in nixpkgs 16.03 and 20.09, respectively.

  • Compression level: 1-9.

  • Corpus type: text, binaries and jpeg files.

  • Corpus size: small, medium and large.

We’ll try out the Cartesian product of the above parameters, resulting in 972 different builds. There is no particular thought behind the parameter selection; they are just picked to demonstrate the abilities of Nix and nixbuild.net. If you were to design a proper benchmark you’d likely come up with different parameters, but the concept would be the same.
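As a quick sanity check on that number, the build count is simply the product of the parameter counts:

```python
implementations = 6   # brotli, bzip2, gzip, lz4, xz, zstd
releases = 2          # nixpkgs 16.03 and 20.09
levels = 9            # compression levels 1-9
corpus_types = 3      # text, binaries, jpeg
corpus_sizes = 3      # small, medium, large

total = implementations * releases * levels * corpus_types * corpus_sizes
print(total)  # 972
```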

Here is the complete Nix expression implementing the benchmark outlined above. The expression is parameterized over package sets from different releases of nixpkgs. There are different ways of actually importing those package sets, but that is out of the scope of this example.

{ pkgs, pkgs_2009, pkgs_1603 }:

let

  inherit (pkgs)
    stdenv fetchurl lib writers runCommand unzip gnutar
    referencesByPopularity uclibc hello zig;

  compressionCommand = pkgs: program: level: {

    brotli = writers.writeBash "brotli-compress" ''
      if [ -x ${pkgs.brotli}/bin/brotli ]; then
        ${pkgs.brotli}/bin/brotli --stdout -${toString level}
      else
        ${pkgs.brotli}/bin/bro --quality ${toString level}
      fi
    '';

    bzip2 = "${pkgs.bzip2}/bin/bzip2 --stdout -${toString level}";

    gzip = "${pkgs.gzip}/bin/gzip --stdout -${toString level}";

    lz4 = "${pkgs.lz4}/bin/lz4 --stdout -${toString level}";

    xz = "${pkgs.xz}/bin/xz --stdout -${toString level}";

    zstd = "${pkgs.zstd}/bin/zstd --stdout -${toString level}";

  }.${program};

  corpus = rec {
    txt.small = calgary-text.small;
    txt.medium = calgary-text;
    txt.large = runCommand "enwik8" {
      buildInputs = [ unzip ];
      src = fetchurl {
        url = "https://mattmahoney.net/dc/enwik8.zip";
        sha256 = "1g1l4n9x8crxghapq956j7i4z89qkycm5ml0hcld3ghfk3cr8yal";
      };
    } ''
      unzip "$src"
      mv enwik8 "$out"
    '';

    pkg.small = closure-tar "uclibc-closure.tar" uclibc;
    pkg.medium = closure-tar "hello-closure.tar" hello;
    pkg.large = closure-tar "zig-closure.tar" zig;

    jpg.small = fetchurl {
      url = "https://people.sc.fsu.edu/~jburkardt/data/jpg/charlie.jpg";
      sha256 = "0cmd8wwm0vaqxsbvb3lxk2f7w2lliz8p361s6pg4nw0vzya6lzrg";
    };
    jpg.medium = fetchurl {
      url = "https://cdn.hasselblad.com/samples/x1d-II-50c/x1d-II-sample-02.jpg";
      sha256 = "15pz84f5d34jmp0ljz61wx3inx8442sgf9n8adbgb8m4v88vifk2";
    };
    jpg.large = fetchurl {
      url = "https://cdn.hasselblad.com/samples/Cam_1_Borna_AOS-H5.jpg";
      sha256 = "0rdcxlxcxanlgfnlxs9ffd3s36a05g8g3ca9khkfsgbyd5spk343";
    };

    calgary-text = stdenv.mkDerivation {
      name = "calgary-corpus-text";
      src = fetchurl {
        url = "https://corpus.canterbury.ac.nz/resources/calgary.tar.gz";
        sha256 = "1dwk417ql549l0sa4jzqab67ffmyli4nmgaq7i9ywp4wq6yyw2g1";
      };
      sourceRoot = ".";
      outputs = [ "out" "small" ];
      installPhase = ''
        cat bib book2 news paper* prog* > "$out"
        cat paper1 > "$small"
      '';
    };

    closure-tar = name: pkg: runCommand name {
      buildInputs = [ gnutar ];
      closure = referencesByPopularity pkg;
    } ''
      tar -c --files-from="$closure" > "$out"
    '';
  };

  benchmark = { release, program, level, corpusType, corpusSize }:
    runCommand (lib.concatStringsSep "-" [
      "zbench" program "l${toString level}" corpusType corpusSize release.rel
    ]) rec {
      corpusFile = corpus.${corpusType}.${corpusSize};
      run = compressionCommand release.pkgs program level;
      version = lib.getVersion release.pkgs.${program};
      tags = lib.concatStringsSep "," [
        program version (toString level) corpusType corpusSize
      ];
    } ''
      orig_size="$(stat -c %s "$corpusFile")"
      result_size="$($run < "$corpusFile" | wc -c)"
      percent="$((100*result_size / orig_size))"
      echo >"$out" "$tags,$orig_size,$result_size,$percent"
    '';

in runCommand "compression-benchmarks" {
  results = map benchmark (lib.cartesianProductOfSets {
    program = [
      "brotli"
      "bzip2"
      "gzip"
      "lz4"
      "xz"
      "zstd"
    ];
    release = [
      { pkgs = pkgs_1603; rel = "1603"; }
      { pkgs = pkgs_2009; rel = "2009"; }
    ];
    level = lib.range 1 9;
    corpusType = [ "txt" "pkg" "jpg" ];
    corpusSize = [ "small" "medium" "large" ];
  });
} ''
  echo program,version,level,corpus,class,orig_size,result_size,ratio > $out
  cat $results >> $out
''

Above, compressionCommand defines the command used for each compression program to compress stdin to stdout with a given level.

The corpus attribute set defines txt, pkg and jpg datasets. For text and jpeg we simply fetch suitable sets, and for the binary (pkg) sets we use Nix itself to create a tar file out of the transitive closure of some different packages. The corpus sizes vary between around 50 kB and 300 MB.

benchmark runs a single compression command for one combination of input parameters.

Finally, we again use the cartesianProductOfSets function to create builds of all possible combinations of parameters, and then simply concatenate all individual results into a big CSV file.

Building the complete benchmark takes about 25 minutes on my somewhat old 8-core workstation, with Nix configured to run at most 8 builds concurrently. If I use nixbuild.net instead, time is cut down to 10 minutes due to the parallelization gains possible when running 972 independent Nix builds.

In the end we get a CSV-file with values for each parameter combination. The first ten lines of the file look like this:

program,version,level,corpus,class,orig_size,result_size,ratio
brotli,0.3.0,1,txt,small,53161,19634,36
brotli,1.0.9,1,txt,small,53161,21162,39
bzip2,1.0.6,1,txt,small,53161,16558,31
bzip2,1.0.6.0.1,1,txt,small,53161,16558,31
gzip,1.6,1,txt,small,53161,21605,40
gzip,1.10,1,txt,small,53161,21605,40
lz4,131,1,txt,small,53161,27936,52
lz4,1.9.2,1,txt,small,53161,28952,54
xz,5.2.2,1,txt,small,53161,18416,34
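Besides dedicated visualization tools, a CSV file like this is easy to inspect programmatically. As a sketch, the snippet below reads a few of the sample rows above with Python's csv module and picks out the best (smallest) ratio among them:

```python
import csv
import io

# a few of the sample rows shown above, inlined for illustration
sample = """program,version,level,corpus,class,orig_size,result_size,ratio
brotli,0.3.0,1,txt,small,53161,19634,36
bzip2,1.0.6,1,txt,small,53161,16558,31
xz,5.2.2,1,txt,small,53161,18416,34
"""

rows = list(csv.DictReader(io.StringIO(sample)))
best = min(rows, key=lambda r: int(r["ratio"]))
print(best["program"], best["ratio"])  # bzip2 31
```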

To quickly get some sort of visualization of the benchmark data, I dumped the CSV contents into rawgraphs.io and produced the following graph:

From this visualization, we can draw a few conclusions:

  • There’s little point in compressing (already compressed) JPG data.

  • xz is the clear winner when it comes to producing small archives of binary data.

  • bzip2 produces almost the same compression ratio for all level settings. It even looks like level 9 can produce slightly worse compression than level 8.

  • lz4 makes a very big jump in compression ratio between level 2 and 3.

To further refine our workflow, we could also produce data visualizations directly in our Nix expression, by creating builds that would feed the CSV data into some visualization software.

Remember, this blog post is not about benchmarking compression, but about how you can use Nix and nixbuild.net for such workflows. Hopefully you’ve gained some insights into how Nix can be used in scientific computing and data science workflows. Let’s wrap up with a summary of why I find Nix useful in these situations:

  • The Nix programming language and standard library provide tools for managing combinatorial problems, and allows us to quickly come up with high level abstractions giving us sensible knobs to turn when exploring parameter sweeps and build matrices.

  • We don’t have to think about parallelization, Nix takes care of it for us.

  • Nix makes it very easy to build specific variants of packages. This is helpful if you want to make comparisons between different software versions or patches. nixpkgs is a huge repository of pre-packaged software available to anyone.

  • nixbuild.net gives you extreme scalability with no adaptation or configuration needed. In the example above we saw build times cut to less than half by sending our Nix builds to nixbuild.net.

  • Reproducibility and build reuse are first-rate in Nix.

Thank you for reading this rather lengthy blog post! If you have any comments or questions about the content or about nixbuild.net in general, don’t hesitate to contact me.

by nixbuild.net (support@nixbuild.net) at April 26, 2021 12:00 AM

March 18, 2021

Tweag I/O

Types à la carte in Nickel

This post is the third one of a series on Nickel, a new configuration language that I’ve been working on. In this post, I explore the design of Nickel’s type system, which mixes static and dynamic typing, and the reasons behind this choice.

  1. Presenting Nickel: better configuration for less
  2. Programming with contracts in Nickel
  3. Types à la carte in Nickel

When other constraints allow it (the right tool for the job and all that), I personally go for a statically typed language whenever I can. But the Nickel language is a tad different, for it is a configuration language. You usually run a terminating program once on fixed inputs to generate a static text file. In this context, any type error will most likely either be triggered at evaluation anyway, typechecker or not, or be irrelevant (dead code). Even more so if you add data validation, which typing can seldom totally replace: statically enforcing that a string is a valid URL, for example, would require a powerful type system. If you have to validate anyway, checking that a value is a number at run-time on the other hand is trivial.

Nickel also aims at being as interoperable with JSON as possible, and dealing with JSON values in a typed manner may be painful. For all these reasons, being untyped[1] in configuration code is appealing.

But this is not true of all code. Library code is written to be reused many times in many different settings. Although specialised in configuration, Nickel is a proper programming language, and one of its value propositions is precisely to provide abstractions to avoid repeating yourself. For reusable code, static typing sounds like the natural choice, bringing in all the usual benefits.

How to get out of this dilemma?

Gradual typing

Gradual typing reconciles the two belligerents by allowing both typed code and untyped code to coexist. Not only to coexist, but most importantly, to interact.

One common use-case of gradual typing is to retrofit static typing on top of an existing dynamically typed language, allowing one to gradually — hence the name — type an existing codebase. In the case of Nickel, gradual typing is used on its own, because optional typing makes sense. In both situations, gradual typing provides a formal and principled framework to have both typed and untyped code living in a relative harmony.

Promises, promises!

Since configuration code is to be untyped, and makes up the majority of Nickel code, untyped is the default. A basic configuration looks like JSON, up to minor syntactic differences:

{
  host = "google.com",
  port = 80,
  protocol = "http",
}

Typechecking is triggered by a type annotation, introduced by :. Annotations can either be apposed to a variable name or to an expression:

let makePort : Str -> Num = fun protocol =>
  if protocol == "http" then
    80
  else if protocol == "ftp" then
    21
  else
    null in
let unusedBad = 10 ++ "a" in
{
  port = makePort protocol,
  protocol = ("ht" ++ "tp" : Str),
}

In this example, makePort is a function taking a string and returning a number. It is annotated, causing the typechecker to kick in. It makes sure that each sub-expression is well-typed. Notice that subterms don’t need any other annotation: Nickel is able to guess most of the types using unification-based type inference.

Such a static type annotation is also called a promise, as you make a firm promise to the typechecker about the type of an expression.

Static typechecking ends with makePort, and although unusedBad is clearly ill-typed (concatenating a number and a string), it won’t cause any typechecking error.

Can you guess the result of trying to run this program?

error: Incompatible types
  ┌─ repl-input-1:7:5
  │
7 │     null in
  │     ^^^^ this expression
  │
  = The type of the expression was expected to be `Num`
  = The type of the expression was inferred to be `Dyn`
  = These types are not compatible

The typechecker rightly complains that null is not a number. If we fix this (for now, substituting it with -1), the program runs correctly:

$ nickel export <<< ...
{
  "port": 80,
  "protocol": "http"
}

unusedBad doesn’t cause any error at run-time. Due to laziness, it is never evaluated. If we were to add a type annotation for it though, the typechecker would reject our program.

To recap, the typechecker is a lurker by default, letting us do pretty much what we want. It is triggered by a type annotation exp : Type, in which case it switches on and statically typechecks the expression.

Who’s to be blamed

So far, so good. Now, consider the following example:

let add : Num -> Num -> Num = fun x y => x + y in
add "a" 0

As far as typechecking goes, only the body of add is to be checked, and it is well-typed. However, add is called with a parameter of the wrong type by an untyped chunk. Without an additional safety mechanism, one would get this runtime type error:

error: Type error
  ┌─ repl-input-0:1:26
  │
1 │ let add : Num -> Num -> Num = fun x y => x + y
  │                                              ^ This expression has type Str, but Num was expected
  │
[..]

The error first points to a location inside the body of add. It doesn’t feel right, and kinda defeats the purpose of typing: while our function should be guaranteed to be well-behaved, any untyped code calling it can sneak in ill-typed terms via the parameters. In turn, this raises errors that are located in well-behaved code. In this case, which is deliberately trivial, the end of the error message elided as [..] turns out to give us enough information to diagnose the actual issue. This is not necessarily the case for more complex real life functions.

There’s not much we can do statically. The whole point of gradual typing being to accommodate for untyped code, we can’t require the call site to be statically typed. Otherwise, types would contaminate everything and we might as well make our language fully statically typed.

We can do something at run-time, though. Assuming type soundness, no type error should arise in the body of a well-typed function at evaluation. The only sources of type errors are the parameters provided by the caller.

Thus, we just have to control the boundary between typed and untyped blocks by checking the validity of the parameters provided by the caller. If we actually input the previous example in the Nickel REPL, we don’t get the above error message, but this one instead:

nickel> let add : Num -> Num -> Num = fun x y => x + y in
add 5 "a"

error: Blame error: contract broken by the caller.
  ┌─ :1:8
  │
1 │ Num -> Num -> Num
  │        --- expected type of the argument provided by the caller
  │
  ┌─ repl-input-6:1:31
  │
1 │ let add : Num -> Num -> Num = fun x y => x + y
  │                               ^^^^^^^^^^^^^^^^ applied to this expression
  │
[..]
note:
  ┌─ repl-input-7:1:1
  │
1 │ add 5 "a"
  │ --------- (1) calling <func>
[..]

This error happens before the body of add is even entered. The Nickel interpreter wraps add in a function that first checks the parameters to be of the required type before actually calling add. Sounds familiar? This is exactly what we described in the post on contracts. That is, typed functions are automatically guarded by a higher-order contract. This ensures that type errors are caught before entering well-typed land, which greatly improves error locality.

In summary, to avoid sneaking ill-typed value in well-typed blocks, the Nickel interpreter automatically protects typed functions by inserting appropriate contracts.
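The same boundary check can be sketched in any dynamic language. Below is a hypothetical Python rendition of a first-order Num -> Num contract wrapper; real Nickel contracts are higher-order and carry more precise blame labels, so this is only an analogy:

```python
def num(value, blame):
    # the Num contract: fail with blame information if value is not a number
    # (bool is excluded explicitly, since it subclasses int in Python)
    if not isinstance(value, (int, float)) or isinstance(value, bool):
        raise TypeError(f"contract broken by {blame}: expected Num, "
                        f"got {type(value).__name__}")
    return value

def guard_num_to_num(f):
    # Wrap a typed Num -> Num function: arguments are checked on entry
    # (blaming the caller), results on exit (blaming the function itself).
    def wrapped(x):
        return num(f(num(x, "the caller")), "the function")
    return wrapped

add_one = guard_num_to_num(lambda x: x + 1)
print(add_one(5))  # 6
```

Calling add_one("a") then fails before the wrapped function's body runs, with the blame pointing at the caller, which is exactly the error-locality property described above.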

A contract with the devil

We have dealt with the issue of calling typed code from untyped code. A natural follow-up is to examine the dual case: how can one use definitions living in untyped code inside a statically typed context? Consider the following example:

// this example does NOT typecheck
let f = fun x => if x then 10 else "a" in
let doStuffToNum: Num -> Num = fun arg =>
  arg + (f true) in

doStuffToNum 1

The typed function doStuffToNum calls an untyped function f. f true turns out to be a number indeed, but f itself is not well-typed, because the types of the if and the else branch don’t match. No amount of additional type annotations can make the typechecker accept this program.

See what happens in practice:

error: Incompatible types
  ┌─ repl-input-1:3:10
  │
3 │   arg + (f true) in
  │          ^ this expression
  │
  = The type of the expression was expected to be `_a -> Num`
  = The type of the expression was inferred to be `Dyn`
  = These types are not compatible

f not being annotated, the typechecker can’t do much better than to give f the dynamic type Dyn (although in some trivial cases, it can infer a better type). Since it was expecting a function returning Num, it complains. It seems we are doomed to restrict our usage of untyped variables to trivial expressions, or to type them all.

Or are we? One more time, contracts come to the rescue. Going back to the post on contracts again, contracts are enforced similarly to types, but using | instead of :. Let us fix our example:

let doStuffToNum: Num -> Num = fun arg =>
  arg + (f true | Num) in

We just applied a Num contract to f true, and surprise, this code typechecks! Our typechecker didn’t get magically smarter. By adding this contract check, we ensure the fundamental property that f true will either evaluate to a number or fail at run-time with an appropriate contract error. In particular, no value of type Str, for example, can ever enter our well-typed paradise. When writing exp | Type in a typed block, two things happen:

  1. The typechecker switches back to the default mode inside exp, where it doesn’t enforce anything until the next promise (annotation).
  2. The typechecker blindly assumes that the type of exp | Type is Type. Hence, contract checks are also called assume.

Put differently, a contract check is considered a type cast by the static type system, whose correctness check is delayed to run-time.

Behold: this implies that something like (5 | Bool) : Bool typechecks. How outrageous, for a typed functional language proponent. But even languages like Haskell have some side conditions to type soundness: b :: Bool doesn’t guarantee that b evaluates to a boolean, for it can loop, or raise an exception. Minimizing the number of such possibilities is surely for the better, but the important point remains that b never silently evaluates to a string.

To conclude, we can use contracts as delayed type casts, to make the typechecker accept untyped terms in a typed context. This is useful to import values from untyped code, or to write expressions that we know are correct but that the typechecker wouldn’t accept.
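The behaviour of exp | Type seen from the typechecker, trust now and verify at run-time, is essentially a checked cast. A small Python sketch of the idea (assume, f and do_stuff_to_num are illustrative names mirroring the Nickel example above):

```python
def assume(value, predicate, type_name):
    # Statically, the typechecker would take our word that value : type_name;
    # dynamically, the claim is verified and fails loudly if wrong.
    if not predicate(value):
        raise TypeError(f"contract broken: expected {type_name}")
    return value

def f(x):
    # untyped helper: returns a number for True, a string otherwise
    return 10 if x else "a"

def do_stuff_to_num(arg):
    # corresponds to: arg + (f true | Num)
    return arg + assume(f(True), lambda v: isinstance(v, int), "Num")

print(do_stuff_to_num(1))  # 11
```

Had f True returned a string, the assume check would fail at run-time instead of letting a string sneak into the "typed" addition.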

Conclusion

There is more to say about Nickel’s type system, which also features parametric polymorphism and structural typing with row polymorphism for records, to cite some interesting aspects. The purpose of this post is rather to explain the essence of gradual typing, why it makes sense in the context of Nickel, and how it is implemented. We’ve seen that contracts are a fundamental ingredient for the interaction between typed and untyped code in both ways.


  1. I will use typed to mean statically typed, and untyped to mean dynamically typed.

March 18, 2021 12:00 AM

March 12, 2021

Sander van der Burg

Using the Nix process management framework as an infrastructure deployment solution for Disnix

As explained in many previous blog posts, I have developed Disnix as a solution for automating the deployment of service-oriented systems -- it deploys heterogeneous systems, that consist of many different kinds of components (such as web applications, web services, databases and processes) to networks of machines.

The deployment models for Disnix are typically not fully self-contained. Foremost, before a service-oriented system can be deployed, all target machines in the network require the presence of the Nix package manager, Disnix, and a remote connectivity service (e.g. SSH).

For multi-user Disnix installations, in which the user does not have super-user privileges, the Disnix service is required to carry out deployment operations on behalf of a user.

Moreover, the services in the services model typically need to be managed by other services, called containers in Disnix terminology (not to be confused with Linux containers).

Examples of container services are:

  • The MySQL DBMS container can manage multiple databases deployed by Disnix.
  • The Apache Tomcat servlet container can manage multiple Java web applications deployed by Disnix.
  • systemd can act as a container that manages multiple systemd units deployed by Disnix.

Managing the life-cycles of services in containers (such as activating or deactivating them) is done by a companion tool called Dysnomia.

In addition to Disnix, these container services also typically need to be deployed in advance to the target machines in the network.

The problem domain that Disnix works in is called service deployment, whereas the deployment of machines (bare metal or virtual machines) and the container services is called infrastructure deployment.

Disnix can be complemented with a variety of infrastructure deployment solutions:

  • NixOps can deploy networks of NixOS machines, both physical and virtual machines (in the cloud), such as Amazon EC2.

    As part of a NixOS configuration, the Disnix service can be deployed that facilitates multi-user installations. The Dysnomia NixOS module can expose all relevant container services installed by NixOS as container deployment targets.
  • disnixos-deploy-network is a tool that is included with the DisnixOS extension toolset. Since services in Disnix can be any kind of deployment unit, it is also possible to deploy an entire NixOS configuration as a service. This tool is mostly developed for demonstration purposes.

    A limitation of this tool is that it cannot instantiate virtual machines and bootstrap Disnix.
  • Disnix itself. The above solutions are all NixOS-based, a software distribution that is Linux-based and fully managed by the Nix package manager.

    Although NixOS is very powerful, it has two drawbacks for Disnix:

    • NixOS uses the NixOS module system for configuring system aspects. It is very powerful but you can only deploy one instance of a system service -- Disnix can also work with multiple container instances of the same type on a machine.
    • Services in NixOS cannot be deployed to other kinds of software distributions: conventional Linux distributions, and other operating systems, such as macOS and FreeBSD.

    To overcome these limitations, Disnix can also be used as a container deployment solution on any operating system that is capable of running Nix and Disnix. Services deployed by Disnix can automatically be exposed as container providers.

    Similar to disnixos-deploy-network, a limitation of this approach is that it cannot be used to bootstrap Disnix.

Last year, I also added a major new feature to Disnix that makes it possible to deploy both application and container services in the same Disnix deployment models, minimizing the infrastructure deployment problem -- the only requirement is to have machines with Nix, Disnix, and a remote connectivity service (such as SSH) pre-installed on them.

Although this integrated feature is quite convenient, in particular for test setups, a separated infrastructure deployment process (that includes container services) still makes sense in many scenarios:

  • The infrastructure parts and service parts can be managed by different people with different specializations. For example, configuring and tuning an application server is a different responsibility than developing a Java web application.
  • The service parts typically change more frequently than the infrastructure parts. As a result, they typically have different kinds of update cycles.
  • The infrastructure components can typically be reused between projects (e.g. many systems use a database backend such as PostgreSQL or MySQL), whereas the service components are typically very project specific.

I also realized that my other project: the Nix process management framework can serve as a partial infrastructure deployment solution -- it can be used to bootstrap Disnix and deploy container services.

Moreover, it can also deploy multiple instances of container services and be used on any operating system that the Nix process management framework supports, including conventional Linux distributions and other operating systems, such as macOS and FreeBSD.

Deploying and exposing the Disnix service with the Nix process management framework


As explained earlier, to allow Disnix to deploy services to a remote machine, a machine needs to have Disnix installed (and run the Disnix service for a multi-user installation), and be remotely connectible, e.g. through SSH.

I have packaged all required services as constructor functions for the Nix process management framework.

The following process model captures the configuration of a basic multi-user Disnix installation:


{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, spoolDir ? "${stateDir}/spool"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids-bare.nix then (import ./ids-bare.nix).ids else {};

  constructors = import ../../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };
in
rec {
  sshd = {
    pkg = constructors.sshd {
      extraSSHDConfig = ''
        UsePAM yes
      '';
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  dbus-daemon = {
    pkg = constructors.dbus-daemon {
      services = [ disnix-service ];
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  disnix-service = {
    pkg = constructors.disnix-service {
      inherit dbus-daemon;
    };

    requiresUniqueIdsFor = [ "gids" ];
  };
}

The above processes model (processes.nix) captures three process instances:

  • sshd is the OpenSSH server that makes it possible to remotely connect to the machine by using the SSH protocol.
  • dbus-daemon runs a D-Bus system daemon, that is a requirement for the Disnix service. The disnix-service is propagated as a parameter, so that its service directory gets added to the D-Bus system daemon configuration.
  • disnix-service is a service that executes deployment operations on behalf of an authorized unprivileged user. The disnix-service has a dependency on the dbus-daemon service, making sure that the latter gets activated first.

We can deploy the above configuration on a machine that has the Nix process management framework already installed.

For example, to deploy the configuration on a machine that uses supervisord, we can run:


$ nixproc-supervisord-switch processes.nix

Resulting in a system that consists of the following running processes:


$ supervisorctl
dbus-daemon RUNNING pid 2374, uptime 0:00:34
disnix-service RUNNING pid 2397, uptime 0:00:33
sshd RUNNING pid 2375, uptime 0:00:34

As may be noticed, the above supervised services correspond to the processes in the processes model.

On the coordinator machine, we can write a bootstrap infrastructure model (infra-bootstrap.nix) that only contains connectivity settings:


{
  test1.properties.hostname = "192.168.2.1";
}

and use the bootstrap model to capture the full infrastructure model of the system:


$ disnix-capture-infra infra-bootstrap.nix

resulting in the following configuration:


{
  "test1" = {
    properties = {
      "hostname" = "192.168.2.1";
      "system" = "x86_64-linux";
    };
    containers = {
      echo = {
      };
      fileset = {
      };
      process = {
      };
      supervisord-program = {
        "supervisordTargetDir" = "/etc/supervisor/conf.d";
      };
      wrapper = {
      };
    };
    "system" = "x86_64-linux";
  };
}

Despite the fact that we have not configured any containers explicitly, the above configuration (infrastructure.nix) already exposes a number of container services:

  • The echo, fileset and process container services are built-in container providers that any Dysnomia installation includes.

    The process container can be used to automatically deploy services that daemonize. Services that daemonize themselves do not require the presence of any external service.
  • The supervisord-program container refers to the process supervisor that manages the services deployed by the Nix process management framework. It can also be used as a container for processes deployed by Disnix.

With the above infrastructure model, we can deploy any system that depends on the above container services, such as the trivial Disnix proxy example:


{ system, distribution, invDistribution, pkgs
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager ? "supervisord"
, nix-processmgmt ? ../../../nix-processmgmt
}:

let
  customPkgs = import ../top-level/all-packages.nix {
    inherit system pkgs stateDir logDir runtimeDir tmpDir forceDisableUserChange processManager nix-processmgmt;
  };

  ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

  processType = import "${nix-processmgmt}/nixproc/derive-dysnomia-process-type.nix" {
    inherit processManager;
  };
in
rec {
  hello_world_server = rec {
    name = "hello_world_server";
    port = ids.ports.hello_world_server or 0;
    pkg = customPkgs.hello_world_server { inherit port; };
    type = processType;
    requiresUniqueIdsFor = [ "ports" ];
  };

  hello_world_client = {
    name = "hello_world_client";
    pkg = customPkgs.hello_world_client;
    dependsOn = {
      inherit hello_world_server;
    };
    type = "package";
  };
}

The services model shown above (services-without-proxy.nix) captures two services:

  • The hello_world_server service is a simple service that listens on a TCP port for a "hello" message and responds with a "Hello world!" message.
  • The hello_world_client service is a package providing a client executable that automatically connects to the hello_world_server.

With the following distribution model (distribution.nix), we can map all the services to our deployment machine (that runs the Disnix service managed by the Nix process management framework):


{infrastructure}:

{
  hello_world_client = [ infrastructure.test1 ];
  hello_world_server = [ infrastructure.test1 ];
}

and deploy the system by running the following command:


$ disnix-env -s services-without-proxy.nix \
  -i infrastructure.nix \
  -d distribution.nix \
  --extra-params '{ processManager = "supervisord"; }'

The last parameter, --extra-params, configures the services model (which indirectly invokes the createManagedProcess abstraction function from the Nix process management framework) in such a way that supervisord configuration files are generated.

(As a side note: without the --extra-params parameter, the process instances are built for the disnix process manager, generating configuration files that can be deployed to the process container. Those expect programs to daemonize on their own and leave behind a PID file with the daemon's process ID. Although this approach is convenient for experiments, because no external service is required, it is not as reliable as managing supervised processes.)

The result of the above deployment operation is that the hello-world-server service is deployed as a service that is also managed by supervisord:


$ supervisorctl
dbus-daemon          RUNNING   pid 2374, uptime 0:09:39
disnix-service       RUNNING   pid 2397, uptime 0:09:38
hello-world-server   RUNNING   pid 2574, uptime 0:00:06
sshd                 RUNNING   pid 2375, uptime 0:09:39

and we can use the hello-world-client executable on the target machine to connect to the service:


$ /nix/var/nix/profiles/disnix/default/bin/hello-world-client
Trying 192.168.2.1...
Connected to 192.168.2.1.
Escape character is '^]'.
hello
Hello world!

Deploying container providers and exposing them


With Disnix, it is also possible to deploy systems that are composed of different kinds of components, such as web services and databases.

For example, the Java variant of the ridiculous Staff Tracker example consists of the following services:


The services in the diagram above have the following purpose:

  • The StaffTracker service is the front-end web application that shows an overview of staff members and their locations.
  • The StaffService service is a web service with a SOAP interface that provides read and write access to the staff records. The staff records are stored in the staff database.
  • The RoomService service provides read access to the room records, which are stored in a separate rooms database.
  • The ZipcodeService service provides read access to zip codes, which are stored in a separate zipcodes database.
  • The GeolocationService infers the location of a staff member from their IP address using the GeoIP service.

To deploy the system shown above, we need a target machine that provides Apache Tomcat (for managing the web application front-end and web services) and MySQL (for managing the databases) as container provider services:


{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, spoolDir ? "${stateDir}/spool"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids-tomcat-mysql.nix then (import ./ids-tomcat-mysql.nix).ids else {};

  constructors = import ../../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };

  containerProviderConstructors = import ../../service-containers-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };
in
rec {
  sshd = {
    pkg = constructors.sshd {
      extraSSHDConfig = ''
        UsePAM yes
      '';
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  dbus-daemon = {
    pkg = constructors.dbus-daemon {
      services = [ disnix-service ];
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  tomcat = containerProviderConstructors.simpleAppservingTomcat {
    commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
    webapps = [
      pkgs.tomcat9.webapps # Include the Tomcat example and management applications
    ];

    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  mysql = containerProviderConstructors.mysql {
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  disnix-service = {
    pkg = constructors.disnix-service {
      inherit dbus-daemon;
      containerProviders = [ tomcat mysql ];
    };

    requiresUniqueIdsFor = [ "gids" ];
  };
}

The processes model above is an extension of the previous processes model, adding two container provider services:

  • tomcat is the Apache Tomcat server. The constructor function: simpleAppservingTomcat composes a configuration for a supported process manager, such as supervisord.

    Moreover, it bundles a Dysnomia container configuration file, and a Dysnomia module: tomcat-webapplication that can be used to manage the life-cycles of Java web applications embedded in the servlet container.
  • mysql is the MySQL DBMS server. The constructor function also creates a process manager configuration file, and bundles a Dysnomia container configuration file and module that manages the life-cycles of databases.
  • The container services above are propagated as containerProviders to the disnix-service. This function parameter is used to update the search paths for container configuration and modules, so that services can be deployed to these containers by Disnix.

After deploying the above processes model, we should see the following infrastructure model after capturing it:


$ disnix-capture-infra infra-bootstrap.nix
{
  "test1" = {
    properties = {
      "hostname" = "192.168.2.1";
      "system" = "x86_64-linux";
    };
    containers = {
      echo = {};
      fileset = {};
      process = {};
      supervisord-program = {
        "supervisordTargetDir" = "/etc/supervisor/conf.d";
      };
      wrapper = {};
      tomcat-webapplication = {
        "tomcatPort" = "8080";
        "catalinaBaseDir" = "/var/tomcat";
      };
      mysql-database = {
        "mysqlPort" = "3306";
        "mysqlUsername" = "root";
        "mysqlPassword" = "";
        "mysqlSocket" = "/var/run/mysqld/mysqld.sock";
      };
    };
    "system" = "x86_64-linux";
  };
}

As may be observed, the tomcat-webapplication and mysql-database containers (with their relevant configuration properties) were added to the infrastructure model.

With the following command we can deploy the example system's services to the containers in the network:


$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

resulting in a fully functional system:


Deploying multiple container provider instances


As explained in the introduction, a limitation of the NixOS module system is that it is only possible to construct one instance of a service on a machine.

Process instances in a processes model deployed by the Nix process management framework, as well as services in a Disnix services model, are instantiated from functions. This makes it possible to deploy multiple instances of the same service to the same machine, by making conflicting properties configurable.

The following processes model was modified from the previous example to deploy two MySQL servers and two Apache Tomcat servers to the same machine:


{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, spoolDir ? "${stateDir}/spool"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids-tomcat-mysql-multi-instance.nix then (import ./ids-tomcat-mysql-multi-instance.nix).ids else {};

  constructors = import ../../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };

  containerProviderConstructors = import ../../service-containers-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };
in
rec {
  sshd = {
    pkg = constructors.sshd {
      extraSSHDConfig = ''
        UsePAM yes
      '';
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  dbus-daemon = {
    pkg = constructors.dbus-daemon {
      services = [ disnix-service ];
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  tomcat-primary = containerProviderConstructors.simpleAppservingTomcat {
    instanceSuffix = "-primary";
    httpPort = 8080;
    httpsPort = 8443;
    serverPort = 8005;
    ajpPort = 8009;
    commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
    webapps = [
      pkgs.tomcat9.webapps # Include the Tomcat example and management applications
    ];
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  tomcat-secondary = containerProviderConstructors.simpleAppservingTomcat {
    instanceSuffix = "-secondary";
    httpPort = 8081;
    httpsPort = 8444;
    serverPort = 8006;
    ajpPort = 8010;
    commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
    webapps = [
      pkgs.tomcat9.webapps # Include the Tomcat example and management applications
    ];
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  mysql-primary = containerProviderConstructors.mysql {
    instanceSuffix = "-primary";
    port = 3306;
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  mysql-secondary = containerProviderConstructors.mysql {
    instanceSuffix = "-secondary";
    port = 3307;
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  disnix-service = {
    pkg = constructors.disnix-service {
      inherit dbus-daemon;
      containerProviders = [ tomcat-primary tomcat-secondary mysql-primary mysql-secondary ];
    };

    requiresUniqueIdsFor = [ "gids" ];
  };
}

In the above processes model, we made the following changes:

  • We have configured two Apache Tomcat instances: tomcat-primary and tomcat-secondary. Both instances can co-exist because they have been configured in such a way that they listen on unique TCP ports and have a unique instance name composed from the instanceSuffix.
  • We have configured two MySQL instances: mysql-primary and mysql-secondary. Similar to Apache Tomcat, they can both co-exist because they listen on unique TCP ports (3306 and 3307) and have unique instance names.
  • Both the primary and secondary instances of the above services are propagated to the disnix-service (with the containerProviders parameter) making it possible for a client to discover them.

After deploying the above processes model, we can run the following command to discover the machine's configuration:


$ disnix-capture-infra infra-bootstrap.nix
{
  "test1" = {
    properties = {
      "hostname" = "192.168.2.1";
      "system" = "x86_64-linux";
    };
    containers = {
      echo = {};
      fileset = {};
      process = {};
      supervisord-program = {
        "supervisordTargetDir" = "/etc/supervisor/conf.d";
      };
      wrapper = {};
      tomcat-webapplication-primary = {
        "tomcatPort" = "8080";
        "catalinaBaseDir" = "/var/tomcat-primary";
      };
      tomcat-webapplication-secondary = {
        "tomcatPort" = "8081";
        "catalinaBaseDir" = "/var/tomcat-secondary";
      };
      mysql-database-primary = {
        "mysqlPort" = "3306";
        "mysqlUsername" = "root";
        "mysqlPassword" = "";
        "mysqlSocket" = "/var/run/mysqld-primary/mysqld.sock";
      };
      mysql-database-secondary = {
        "mysqlPort" = "3307";
        "mysqlUsername" = "root";
        "mysqlPassword" = "";
        "mysqlSocket" = "/var/run/mysqld-secondary/mysqld.sock";
      };
    };
    "system" = "x86_64-linux";
  };
}

As may be observed, the infrastructure model contains two Apache Tomcat instances and two MySQL instances.

With the following distribution model (distribution.nix), we can divide each database and web application over the two container instances:


{infrastructure}:

{
  GeolocationService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-primary";
      }
    ];
  };
  RoomService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-secondary";
      }
    ];
  };
  StaffService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-primary";
      }
    ];
  };
  StaffTracker = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-secondary";
      }
    ];
  };
  ZipcodeService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-primary";
      }
    ];
  };
  rooms = {
    targets = [
      { target = infrastructure.test1;
        container = "mysql-database-primary";
      }
    ];
  };
  staff = {
    targets = [
      { target = infrastructure.test1;
        container = "mysql-database-secondary";
      }
    ];
  };
  zipcodes = {
    targets = [
      { target = infrastructure.test1;
        container = "mysql-database-primary";
      }
    ];
  };
}

Compared to the previous distribution model, the above model uses a more verbose notation for mapping services.

As explained in an earlier blog post, in deployments in which only a single container instance is present, services are automapped to the container that has the same name as the service's type. When multiple instances exist, we must explicitly specify the container to which each service should be deployed.
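This automapping rule can be sketched as a tiny selection function: given a service's type and the container names a target exposes, pick the container named after the type, and fail when no exact match exists so that the distribution model has to name the container explicitly. This is an illustrative sketch of the rule only, not Disnix's actual implementation, and the function name is made up:

```go
package main

import "fmt"

// selectContainer sketches Disnix's automapping rule: a service of a given
// type is mapped to the container that has the same name as that type. When
// no container with that exact name exists (e.g. because multiple instances
// such as "mysql-database-primary" and "mysql-database-secondary" are
// deployed), the mapping fails and the distribution model has to specify the
// target container explicitly.
func selectContainer(serviceType string, containers []string) (string, error) {
	for _, c := range containers {
		if c == serviceType {
			return c, nil
		}
	}
	return "", fmt.Errorf("no container named %q: specify the container explicitly", serviceType)
}

func main() {
	// Single-instance deployment: the mysql-database service type automaps.
	c, _ := selectContainer("mysql-database", []string{"process", "mysql-database"})
	fmt.Println(c) // → mysql-database

	// Multi-instance deployment: no exact match, so automapping fails.
	_, err := selectContainer("mysql-database",
		[]string{"mysql-database-primary", "mysql-database-secondary"})
	fmt.Println(err != nil) // → true
}
```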

After deploying the system with the following command:


$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

we will get a running system with the following deployment architecture:


Using the Disnix web service for executing remote deployment operations


By default, Disnix uses SSH to communicate with target machines in the network. Disnix has a modular architecture and is also capable of communicating with target machines by other means, for example via NixOps, the backdoor client, D-Bus, or by directly executing tasks on a local machine.

There is also an external package: DisnixWebService that remotely exposes all deployment operations from a web service with a SOAP API.

To use the DisnixWebService, we must deploy a Java servlet container (such as Apache Tomcat) with the DisnixWebService application, configured in such a way that it can connect to the disnix-service over the D-Bus system bus.

The following processes model is an extension of the non-multi containers Staff Tracker example, with an Apache Tomcat service that bundles the DisnixWebService:


{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, spoolDir ? "${stateDir}/spool"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids-tomcat-mysql.nix then (import ./ids-tomcat-mysql.nix).ids else {};

  constructors = import ../../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };

  containerProviderConstructors = import ../../service-containers-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };
in
rec {
  sshd = {
    pkg = constructors.sshd {
      extraSSHDConfig = ''
        UsePAM yes
      '';
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  dbus-daemon = {
    pkg = constructors.dbus-daemon {
      services = [ disnix-service ];
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  tomcat = containerProviderConstructors.disnixAppservingTomcat {
    commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
    webapps = [
      pkgs.tomcat9.webapps # Include the Tomcat example and management applications
    ];
    enableAJP = true;
    inherit dbus-daemon;

    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  apache = {
    pkg = constructors.basicAuthReverseProxyApache {
      dependency = tomcat;
      serverAdmin = "admin@localhost";
      targetProtocol = "ajp";
      portPropertyName = "ajpPort";

      authName = "DisnixWebService";
      authUserFile = pkgs.stdenv.mkDerivation {
        name = "htpasswd";
        buildInputs = [ pkgs.apacheHttpd ];
        buildCommand = ''
          htpasswd -cb ./htpasswd admin secret
          mv htpasswd $out
        '';
      };
      requireUser = "admin";
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  mysql = containerProviderConstructors.mysql {
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  disnix-service = {
    pkg = constructors.disnix-service {
      inherit dbus-daemon;
      containerProviders = [ tomcat mysql ];
      authorizedUsers = [ tomcat.name ];
      dysnomiaProperties = {
        targetEPR = "https://$(hostname)/DisnixWebService/services/DisnixWebService";
      };
    };

    requiresUniqueIdsFor = [ "gids" ];
  };
}

The above processes model contains the following changes:

  • The Apache Tomcat process instance is constructed with the containerProviderConstructors.disnixAppservingTomcat constructor function, which automatically deploys the DisnixWebService and provides the required configuration settings so that it can communicate with the disnix-service over the D-Bus system bus.

    Because the DisnixWebService requires the presence of the D-Bus system daemon, it is configured as a dependency of Apache Tomcat, ensuring that it is started before Apache Tomcat.
  • By default, connecting to the Apache Tomcat server (including the DisnixWebService) requires no authentication. To secure the web applications and the DisnixWebService, I have configured an Apache reverse proxy that forwards connections to Apache Tomcat using the AJP protocol.

    Moreover, the reverse proxy protects incoming requests by using HTTP basic authentication requiring a username and password.

We can use the following bootstrap infrastructure model to discover the machine's configuration:


{
  test1.properties.targetEPR = "https://192.168.2.1/DisnixWebService/services/DisnixWebService";
}

The difference between this bootstrap infrastructure model and the previous one is that it uses a different connection property (targetEPR) that refers to the URL of the DisnixWebService.

By default, Disnix uses the disnix-ssh-client to communicate with target machines. To use a different client, we must set the following environment variables:


$ export DISNIX_CLIENT_INTERFACE=disnix-soap-client
$ export DISNIX_TARGET_PROPERTY=targetEPR

The above environment variables instruct Disnix to use the disnix-soap-client executable and the targetEPR property from the infrastructure model as a connection string.

To authenticate ourselves, we must set the following environment variables with a username and password:


$ export DISNIX_SOAP_CLIENT_USERNAME=admin
$ export DISNIX_SOAP_CLIENT_PASSWORD=secret

The following command makes it possible to discover the machine's configuration using the disnix-soap-client and DisnixWebService:


$ disnix-capture-infra infra-bootstrap.nix
{
  "test1" = {
    properties = {
      "hostname" = "192.168.2.1";
      "system" = "x86_64-linux";
      "targetEPR" = "https://192.168.2.1/DisnixWebService/services/DisnixWebService";
    };
    containers = {
      echo = {};
      fileset = {};
      process = {};
      supervisord-program = {
        "supervisordTargetDir" = "/etc/supervisor/conf.d";
      };
      wrapper = {};
      tomcat-webapplication = {
        "tomcatPort" = "8080";
        "catalinaBaseDir" = "/var/tomcat";
        "ajpPort" = "8009";
      };
      mysql-database = {
        "mysqlPort" = "3306";
        "mysqlUsername" = "root";
        "mysqlPassword" = "";
        "mysqlSocket" = "/var/run/mysqld/mysqld.sock";
      };
    };
    "system" = "x86_64-linux";
  };
}

After capturing the full infrastructure model, we can deploy the system with disnix-env if desired, using the disnix-soap-client to carry out all necessary remote deployment operations.

Miscellaneous: using Docker containers as light-weight virtual machines


As explained earlier in this blog post, the Nix process management framework is only a partial infrastructure deployment solution -- you still need to somehow obtain physical or virtual machines with a software distribution running the Nix package manager.

In a blog post written some time ago, I have explained that Docker containers are not virtual machines or even light-weight virtual machines.

In my previous blog post, I have shown that we can also deploy mutable Docker multi-process containers in which process instances can be upgraded without stopping the container.

The deployment workflow for upgrading mutable containers is very machine-like -- NixOS has a similar workflow that consists of updating the machine configuration (/etc/nixos/configuration.nix) and running a single command-line instruction to upgrade the machine (nixos-rebuild switch).

We can actually start using containers as VMs by adding another ingredient to the mix -- we can also assign static IP addresses to Docker containers.

With the following Nix expression, we can create a Docker image for a mutable container, using any of the processes models shown previously as the "machine's configuration":


let
  pkgs = import <nixpkgs> {};

  createMutableMultiProcessImage = import ../nix-processmgmt/nixproc/create-image-from-steps/create-mutable-multi-process-image-universal.nix {
    inherit pkgs;
  };
in
createMutableMultiProcessImage {
  name = "disnix";
  tag = "test";
  contents = [ pkgs.mc pkgs.disnix ];
  exprFile = ./processes.nix;
  interactive = true;
  manpages = true;
  processManager = "supervisord";
}

The exprFile in the above Nix expression refers to a previously shown processes model, and processManager specifies the desired process manager to use, such as supervisord.

With the following command, we can build the image with Nix and load it into Docker:


$ nix-build
$ docker load -i result

With the following command, we can create a network to which our containers (with IP addresses) should belong:


$ docker network create --subnet=192.168.2.0/24 disnixnetwork

The above command creates a subnet with a prefix: 192.168.2.0 and allocates an 8-bit block for host IP addresses.
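As a quick sanity check of the subnet arithmetic: an 8-bit host block corresponds to a /24 prefix (32 address bits minus 24 prefix bits), which can be verified with Go's standard net package:

```go
package main

import (
	"fmt"
	"net"
)

// hostBits returns the number of bits available for host addresses in a
// CIDR subnet: the total address width minus the prefix length.
func hostBits(cidr string) (int, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return 0, err
	}
	ones, bits := ipnet.Mask.Size()
	return bits - ones, nil
}

func main() {
	// The Docker network from the example: a /24 prefix on 192.168.2.0
	// leaves 32 - 24 = 8 bits for host IP addresses such as 192.168.2.1.
	n, err := hostBits("192.168.2.0/24")
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // → 8
}
```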

We can create and start a Docker container named: containervm using our previously built image, and assign it an IP address:


$ docker run --network disnixnetwork --ip 192.168.2.1 \
  --name containervm disnix:test

By default, Disnix uses SSH to connect to remote machines. With the following commands we can create a public-private key pair and copy the public key to the container:


$ ssh-keygen -t ed25519 -f id_test -N ""

$ docker exec containervm mkdir -m0700 -p /root/.ssh
$ docker cp id_test.pub containervm:/root/.ssh/authorized_keys
$ docker exec containervm chmod 600 /root/.ssh/authorized_keys
$ docker exec containervm chown root:root /root/.ssh/authorized_keys

On the coordinator machine, which carries out the deployment, we must add the private key to the SSH agent and configure the disnix-ssh-client to connect to the disnix-service:


$ ssh-add id_test
$ export DISNIX_REMOTE_CLIENT=disnix-client

By executing all these steps, containervm can be (mostly) used as if it were a virtual machine, including connecting to it with an IP address over SSH.

Conclusion


In this blog post, I have described how the Nix process management framework can be used as a partial infrastructure deployment solution for Disnix. It can be used both for deploying the disnix-service (to facilitate multi-user installations) as well as deploying container providers: services that manage the life-cycles of services deployed by Disnix.

Moreover, the Nix process management framework makes it possible to do these deployments on all kinds of software distributions that can use the Nix package manager, including NixOS, conventional Linux distributions and other operating systems, such as macOS and FreeBSD.

If I had developed this solution a couple of years ago, it would probably have saved me many hours of preparation work for my first demo in my NixCon 2015 talk, in which I wanted to demonstrate that it is possible to deploy services to a heterogeneous network consisting of a NixOS, an Ubuntu and a Windows machine. Back then, I had to do all the infrastructure deployment tasks manually.

I also have to admit (though this is based mostly on my personal preferences, not facts) that I find the functional style the framework uses far more intuitive than the NixOS module system for certain service configuration aspects, especially for configuring container services and exposing them with Disnix and Dysnomia:

  • Because every process instance is constructed from a constructor function that makes all instance parameters explicit, you are guarded against common configuration errors such as undeclared dependencies.

    For example, the DisnixWebService-enabled Apache Tomcat service requires access to the dbus-daemon service that provides the system bus. Not having this service in the processes model causes a missing function parameter error.
  • Function parameters in the processes model make it clearer that a process depends on another process and what that relationship is. For example, with the containerProviders parameter it becomes quite clear that the disnix-service uses them as potential deployment targets for services deployed by Disnix.

    In comparison, the implementations of the Disnix and Dysnomia NixOS modules are far more complicated and monolithic -- the Dysnomia module has to figure out all potential container services deployed as part of a NixOS configuration and their properties, convert them to Dysnomia configuration files, and set up the systemd configuration for the disnix-service to get proper activation ordering.

    The wants parameter (used for activation ordering) is just a list of strings, with no guarantee that it contains valid references to services that have already been deployed.

Availability


The constructor functions for the services as well as the deployment examples described in this blog post can be found in the Nix process management services repository.

Future work


Slowly more and more of my personal use cases are getting supported by the Nix process management framework.

Moreover, the services repository is steadily growing. To ensure that all the services that I have packaged so far do not break, I really need to focus my work on a service test solution.

by Sander van der Burg (noreply@blogger.com) at March 12, 2021 10:28 PM

March 04, 2021

Tweag I/O

Announcing Gomod2nix

I’m very pleased to announce Gomod2nix, a new tool to create Go packages with Nix!

Gomod2nix is a code generation tool whose main focus is addressing the correctness and usability concerns I have with the current Go packaging solutions. It offers a composable override interface, which allows overrides to be shared across projects, simplifying the packaging of complex packages. As a bonus, it also boasts much better cache hit rates than other similar solutions, owing to not abusing fixed-output derivations.

I also took the opportunity of this new package to address some long-standing annoyances with existing Go Nix tooling. For instance, Gomod2nix disables CGO by default, which let me enable static Go binaries by default. Changing the defaults in existing tooling would be very difficult, as it would break many existing packages, especially those maintained outside of Nixpkgs, which depend on the present behavior.

In order to motivate this new tool, let’s take a look at how Go dependency management evolved.

The development of Trustix (which Gomod2nix was developed for) is funded by the NLnet Foundation and the European Commission's Next Generation Internet programme through the NGI Zero PET (privacy and trust enhancing technologies) fund.

A history of Go packaging

In Go you don’t add dependencies to a manifest, but instead you add a dependency to your project by simply adding an import to a source file:

package main

import (
    "fmt"
    "github.com/tweag/foo"
)

func main() {
    fmt.Println(foo.SomeVar)
}

and the go tool will figure out how to fetch this dependency.

From the beginning Go didn’t have package management in the traditional sense. Instead it enforced a directory structure that mimics the import paths. A project called github.com/tweag/foo that depends on github.com/tweag/bar expects to be located in a directory structure looking like:

$GOPATH/src/github.com/tweag/foo
$GOPATH/src/github.com/tweag/foo/main.go
$GOPATH/src/github.com/tweag/bar
$GOPATH/src/github.com/tweag/bar/main.go

This may not look so bad in this simple example, but since this structure is enforced not only for your packages but also for your dependencies, it quickly becomes messy. The $GOPATH mechanism has been one of the truly sore spots of Go development. Under this packaging paradigm, you are expected to always use the latest Git master of all your dependencies, and there is no version locking.

Dep was the first official packaging experiment for Go. This tool improved upon $GOPATH not by removing it, but by hiding that complexity from the user entirely. Besides that, it added lock files and a SAT dependency solver.

Finally, armed with the lessons learned from Dep and some criticism of it, the Go team decided to develop a new, simpler solution -- Go modules. It addressed a number of perceived problems with Dep, such as the fact that semver and SAT solvers are far too complicated for Go's requirements. As of now, Dep is deprecated, and Go modules is the solution I developed against.

A tale of two lock files

Originally, I set out to design gomod2nix in the same way as poetry2nix, a Python packaging solution for Nix. In poetry2nix one refers directly to a Poetry lock file from Nix, and poetry2nix does all the work needed to create the Nix package, which is very convenient.

However, this wasn’t possible here because of the design of Go modules, for reasons that I will explain below. As a consequence, gomod2nix is a Nix code generation tool. One feeds lock files to it, and it generates Nix expressions defining the corresponding packages.

In the following, I will compare the Poetry and Go lock files, and show which limitations the Go file format and import mechanism impose upon us.

Exposed dependency graphs

First, let’s look at an excerpt of Poetry’s lock file:

[[package]]
name = "cachecontrol"
version = "0.12.6"
description = "httplib2 caching for requests"
category = "main"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"

[package.dependencies]
lockfile = {version = ">=0.9", optional = true, markers = "extra == \"filecache\""}
msgpack = ">=0.5.2"
requests = "*"

And also at go.sum:

github.com/davecgh/go-spew v1.1.0 h1:ZDRjVQ15GmhC3fiQ8ni8+OwkZQO4DARzQgrnXU1Liz8=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/stretchr/objx v0.1.0 h1:4G4v2dO3VZwixGIRoQ5Lfboy6nUhCyYzaqnIAPPhYs4=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c h1:dUUwHk2QECo/6vqA44rthZ8ie2QXMNeKRTHCNY2nXvo=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

After squinting at these files for a while, we can already see some radical differences in semantics, most notably that go.sum is structured as a flat list rather than a graph. Of course, all dependencies of the build are listed in go.sum, but we don't know what depends on what. What this means for us in Nix is that we have no good unit of incrementality -- everything has to be built together -- while Poetry can build dependencies separately.
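To make the flat structure concrete, here is a minimal sketch of a go.sum line parser. Each line carries only a module path, a version (with "/go.mod" appended when the hash covers just the go.mod file) and a hash; there is simply no field in which a dependency edge could be expressed. The type and function names below are my own:

```go
package main

import (
	"fmt"
	"strings"
)

// sumEntry represents one line of a go.sum file: a module path, a version,
// and a hash. Versions ending in "/go.mod" refer to the hash of the module's
// go.mod file only. Note that nothing in this format records which module
// depends on which: it is a flat list, not a graph.
type sumEntry struct {
	Module, Version, Hash string
	GoModOnly             bool
}

// parseSumLine splits a go.sum line into its three whitespace-separated
// fields and detects the "/go.mod" marker on the version.
func parseSumLine(line string) (sumEntry, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return sumEntry{}, fmt.Errorf("malformed go.sum line: %q", line)
	}
	version := fields[1]
	return sumEntry{
		Module:    fields[0],
		Version:   strings.TrimSuffix(version, "/go.mod"),
		Hash:      fields[2],
		GoModOnly: strings.HasSuffix(version, "/go.mod"),
	}, nil
}

func main() {
	e, err := parseSumLine("github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=")
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Module, e.Version, e.GoModOnly) // → github.com/davecgh/go-spew v1.1.0 true
}
```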

Bespoke formats

Go modules use their own Go-like file format, while Poetry uses TOML to serialize both its manifest and lock files.

While this format is simple and writing a parser for it isn't hard, it makes the lives of tooling authors harder than necessary. A standard data interchange format would have been much easier to work with than a custom one.

Bespoke hashes

The next problem with Go modules is its use of a custom hashing mechanism that’s fundamentally incompatible with how Nix hashes paths.

As explained in Eelco Dolstra's thesis The Purely Functional Software Deployment Model, Nix uses its own reproducible archive format, NAR, which is used both for uploading build results and for directory hashing.

The Go developers, faced with similar concerns, created their own directory hashing scheme, which unfortunately is fundamentally incompatible with Nix hashes. I don't see how Go modules could have done this any better, but the situation is unfortunate.
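To see why the two are incompatible: Go's scheme (dirhash.Hash1, the source of the h1: values in go.sum) hashes a sorted list of per-file SHA-256 lines and base64-encodes the result, whereas Nix hashes a NAR serialization of the whole directory tree. A rough, simplified Python sketch of the Go side (the real implementation prefixes file paths with module@version):

```python
import base64
import hashlib
import os

def go_h1(root):
    """Simplified sketch of Go's dirhash.Hash1: hash every file, build
    sorted "hexdigest  path" lines, hash their concatenation, base64 it.
    (The real scheme prefixes each path with "module@version/".)"""
    lines = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            lines.append("%s  %s\n" % (digest, rel))
    lines.sort()
    outer = hashlib.sha256("".join(lines).encode()).digest()
    return "h1:" + base64.b64encode(outer).decode()
```

Since this hashes a flat list of file digests rather than a NAR archive, there is no way to convert between an h1: hash and a Nix store path hash without re-downloading the sources.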

Dynamic package name resolving

In the previous example, I showed what a Go import path looks like. Sadly, it turns out that the surface simplicity of those paths hides a lot of underlying logic.

Internally, these import paths are handled by the RepoRootForImport family of functions in the vcs (version control system) package, which maps import paths to repository URLs and VCS types. Some of these are matched statically using regex but others use active probing.

This is a true showstopper for a pure-Nix Go packaging solution, and the reason why gomod2nix is a code generation tool -- we don't have network access in the Nix evaluator, making it impossible to correctly resolve the VCS type and repository URL from a Nix evaluation.
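To illustrate the static half of that mapping, here is a toy resolver for a couple of well-known hosts (hypothetical code, not Go's actual implementation; the real vcs package additionally probes unknown hosts over HTTPS for go-import meta tags):

```python
import re

# Toy static host table; the real logic lives in Go's internal vcs
# package and covers far more hosts and VCS types.
_STATIC_HOSTS = [
    (re.compile(r"^(github\.com/[^/]+/[^/]+)"), "git"),
    (re.compile(r"^(gitlab\.com/[^/]+/[^/]+)"), "git"),
]

def resolve_static(import_path):
    """Return (repo_url, vcs) for statically known hosts, else None.
    A None result means network probing would be required -- which is
    exactly what a pure Nix evaluation cannot do."""
    for pattern, vcs in _STATIC_HOSTS:
        match = pattern.match(import_path)
        if match:
            return "https://" + match.group(1), vcs
    return None
```

Any import path that falls through this table (vanity domains like gopkg.in, self-hosted forges, and so on) can only be resolved by performing HTTP requests, hence the need for code generation.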

Solutions to go modules packaging

The points above make our limitations clear. With these in mind, let’s discuss how Go packaging solutions were conceived.

Code generation: vgo2nix

My first attempt at creating a tool for packaging Go modules was vgo2nix, another code generation tool. It was written very shortly after modules were announced, and at the time the tooling support for them wasn’t good. For example, there wasn’t a parser for go.mod published back then.

It was based on the older Nixpkgs Go abstraction buildGoPackage and emulated a $GOPATH-based build, unfortunately with some assumptions that do not hold for modules.

Let’s again look at an excerpt from go.sum:

github.com/Azure/go-autorest/autorest v0.9.0/go.mod h1:xyHB1BMZT0cuDHU7I0+g046+BFDTQ8rEZB0s4Yfa6bI=
github.com/Azure/go-autorest/autorest v0.9.3/go.mod h1:GsRuLYvwzLjjjRoWEIyMUaYq8GNUx2nRB378IPt/1p0=
github.com/Azure/go-autorest/autorest/adal v0.5.0/go.mod h1:8Z9fGy2MpX0PvDjB1pEgQTmVqjGhiHBW7RJJEciWzS0=
github.com/Azure/go-autorest/autorest/adal v0.8.0/go.mod h1:Z6vX6WXXuyieHAXwMj0S6HY6e6wcHn37qQMBQlvY3lc=
github.com/Azure/go-autorest/autorest/adal v0.8.1/go.mod h1:ZjhuQClTqx435SRJ2iMlOxPYt3d2C/T/7TiQCVZSn3Q=
github.com/Azure/go-autorest/autorest/date v0.1.0/go.mod h1:plvfp3oPSKwf2DNjlBjWF/7vwR+cUD/ELuzDCXwHUVA=
github.com/Azure/go-autorest/autorest/date v0.2.0/go.mod h1:vcORJHLJEh643/Ioh9+vPmf1Ij9AEBM5FuBIXLmIy0g=

These packages are all developed in the same repository but have different tags. Because of how $GOPATH is set up, vgo2nix would incorrectly clone only one version of the repository, sacrificing the correctness guarantees that modules give us.

Fixed-output derivations: buildGoModule

buildGoModule is currently the most popular solution for Go packaging in Nixpkgs. A typical buildGoModule package looks something like:

{ buildGoModule, fetchFromGitHub, lib }:

buildGoModule {
  pname = "someName";
  version = "0.0.1";

  src = fetchFromGitHub { ... };

  vendorSha256 = "1500vim2lmkkls758pwhlx3piqbw6ap0nnhdwz9pcxih4s4as2nk";
}

buildGoModule is designed around fixed-output derivations: all the dependencies of the package you want to build are wrapped in a single derivation, for which only a single output hash is specified. That derivation fetches all dependencies and creates a vendor directory which is used for the build.

This has several issues, most notably there is no sharing of dependencies between packages that depend on the same Go module.

The other notable issue is that it forces developers to remember to update the vendorSha256 attribute separately from the already existing hash/sha256 attribute on the derivation. Forgetting to do so can not only lead to incorrect builds, but is also frustrating for larger packages that take a long time to build: you may only notice very late in the build that something was broken, and then have to start over from scratch.

Because of the lack of hash granularity, the build needs to re-clone every dependency every time vendorSha256 is invalidated, and cannot use the cache from previous builds.

Fixed-output derivations can also be considered an impurity, and there is a push to restrict them.

My solution: gomod2nix

Approach-wise gomod2nix positions itself right between vgo2nix and buildGoModule. It’s still a code generation tool like vgo2nix, but fully embraces the Go modules world and only supports Go modules based builds — the old GOPATH way is unsupported. It uses the same vendoring approach that buildGoModule uses, but instead of vendoring the actual sources in a derivation, it uses symlinks instead. In that way, dependencies can be fetched separately, and identical dependency source trees can be shared between multiple different packages in the Nix store.
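The symlink-based vendoring idea can be pictured with a small sketch (simplified and hypothetical; gomod2nix's actual implementation differs, and the source trees would be immutable Nix store paths):

```python
import os

def link_vendor_dir(vendor_dir, modules):
    """Build a vendor/ tree in which every module directory is a symlink
    to an immutable source tree (in gomod2nix's case, a Nix store path),
    so identical dependency sources are shared between packages."""
    for import_path, source in modules.items():
        dest = os.path.join(vendor_dir, *import_path.split("/"))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        os.symlink(source, dest)
```

Because the symlink targets are content-addressed store paths, two packages depending on the same module version point at the very same directory on disk.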

From a user perspective the workflow is largely similar to vgo2nix:

  • You write a basic expression looking like:
pkgs.buildGoApplication {
  pname = "gomod2nix-example";
  version = "0.1";
  src = ./.;
  modules = ./gomod2nix.toml;
}
  • Run the code generation tool: $ gomod2nix

Conclusion

Go packaging looks very simple on the surface, but murky details lurk underneath, and there are lots of tiny details to get right to correctly create a Go package in a sandboxed environment like Nix.

I couldn’t get the best-in-class user experience I was hoping for and had gotten used to with Poetry2nix. Code generation adds extra steps to the development process and requires either a developer or a test pipeline to keep the Nix expressions in sync with the language-specific lock files, something that requires discipline and takes extra time and effort. Despite that, it turned out there were major wins to be had in creating a new packaging solution.

The development of gomod2nix is funded by NLNet through the PET (privacy and trust enhancing technologies) fund. gomod2nix is being developed as a part of Trustix.

March 04, 2021 12:00 AM

February 24, 2021

Sander van der Burg

Deploying mutable multi-process Docker containers with the Nix process management framework (or running Hydra in a Docker container)

In a blog post written several months ago, I have shown that the Nix process management framework can also be used to conveniently construct multi-process Docker images.

Although Docker is primarily used for managing single root application process containers, multi-process containers can sometimes be useful to deploy systems that consist of multiple, tightly coupled processes.

The Docker manual has a section that describes how to construct images for multi-process containers, but IMO the configuration process is a bit tedious and cumbersome.

To make this process more convenient, I have built a wrapper function: createMultiProcessImage around the dockerTools.buildImage function (provided by Nixpkgs) that does the following:

  • It constructs an image that runs a Linux and Docker compatible process manager as an entry point. Currently, it supports supervisord, sysvinit, disnix and s6-rc.
  • The Nix process management framework is used to build a configuration for a system that consists of multiple processes, that will be managed by any of the supported process managers.

Although the framework makes the construction of multi-process images convenient, a big drawback of multi-process Docker containers is upgrading them -- for example, for Debian-based containers you can imperatively upgrade packages by connecting to the container:


$ docker exec -it mycontainer /bin/bash

and upgrade the desired packages, such as file:


$ apt install file

The upgrade instruction above is not reproducible -- apt may install file version 5.38 today, and 5.39 tomorrow.

To cope with these kinds of side-effects, Docker works with images that snapshot the outcomes of all the installation steps. Constructing a container from the same image will always provide the same versions of all dependencies.

As a consequence, to perform a reproducible container upgrade, it is required to construct a new image, discard the container and reconstruct the container from the new image version, causing the system as a whole to be terminated, including the processes that have not changed.

For a while, I have been thinking about this limitation and developed a solution that makes it possible to upgrade multi-process containers without stopping and discarding them. The only exception is the process manager.

To make deployments reproducible, it combines the reproducibility properties of Docker and Nix.

In this blog post, I will describe how this solution works and how it can be used.

Creating a function for building mutable Docker images


As explained in an earlier blog post that compares the deployment properties of Nix and Docker, both solutions support reproducible deployment, albeit for different application domains.

Moreover, their reproducibility properties are built around different concepts:

  • Docker containers are reproducible, because they are constructed from images that consist of immutable layers identified by hash codes derived from their contents.
  • Nix package builds are reproducible, because they are stored in isolation in a Nix store and made immutable (the files' permissions are set read-only). In the construction process of the packages, many side effects are mitigated.

    As a result, when the hash code prefix of a package (derived from all build inputs) is the same, then the build output is also (nearly) bit-identical, regardless of the machine on which the package was built.

By taking these reproducibility properties into account, we can create a reproducible deployment process for upgradable containers by using a specific separation of responsibilities.

Deploying the base system


For the deployment of the base system that includes the process manager, we can stick to the traditional Docker deployment workflow based on images (the only unconventional aspect is that we use Nix to build a Docker image, instead of Dockerfiles).

The process manager that the image provides deploys its configuration from a dynamic configuration directory.

To support supervisord, we can invoke the following command as the container's entry point:


supervisord --nodaemon \
--configuration /etc/supervisor/supervisord.conf \
--logfile /var/log/supervisord.log \
--pidfile /var/run/supervisord.pid

The above command starts the supervisord service (in foreground mode), using the supervisord.conf configuration file stored in /etc/supervisor.

The supervisord.conf configuration file has the following structure:


[supervisord]

[include]
files=conf.d/*

The above configuration automatically loads all program definitions stored in the conf.d directory. This directory is writable and initially empty. It can be populated with configuration files generated by the Nix process management framework.

For the other process managers that the framework supports (sysvinit, disnix and s6-rc), we follow a similar strategy -- we configure the process manager in such a way that the configuration is loaded from a source that can be dynamically updated.

Deploying process instances


Deployment of the process instances is not done in the construction of the image, but by the Nix process management framework and the Nix package manager running in the container.

To allow a processes model deployment to refer to packages in the Nixpkgs collection and install binary substitutes, we must configure a Nix channel, such as the unstable Nixpkgs channel:


$ nix-channel --add https://nixos.org/channels/nixpkgs-unstable
$ nix-channel --update

(As a sidenote: it is also possible to subscribe to a stable Nixpkgs channel or a specific Git revision of Nixpkgs).

The processes model (and relevant sub models, such as ids.nix containing the numeric ID assignments) is copied into the Docker image.

We can deploy the processes model for supervisord as follows:


$ nixproc-supervisord-switch

The above command will deploy the processes model referred to by the NIXPROC_PROCESSES environment variable, which defaults to /etc/nixproc/processes.nix:

  • First, it builds supervisord configuration files from the processes model (this step also includes deploying all required packages and service configuration files)
  • It creates symlinks for each configuration file belonging to a process instance in the writable conf.d directory
  • It instructs supervisord to reload the configuration, so that obsolete processes get deactivated and new services activated, while unchanged processes remain untouched.

(For the other process managers, we have equivalent tools: nixproc-sysvinit-switch, nixproc-disnix-switch and nixproc-s6-rc-switch).
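Conceptually, the switch operation reconciles the symlinks in conf.d with the configuration that was just built, and only the difference is touched -- which is why unchanged processes keep running. A hypothetical sketch of that reconciliation step (not the framework's actual code):

```python
import os

def reconcile_conf_d(conf_d, desired):
    """Reconcile the symlinks in supervisord's conf.d with the desired
    configuration (name -> freshly built config path, e.g. in /nix/store).
    Assumes conf.d contains only symlinks, as the framework maintains it."""
    current = {
        name: os.readlink(os.path.join(conf_d, name))
        for name in os.listdir(conf_d)
    }
    for name, target in current.items():
        if desired.get(name) != target:  # obsolete or changed process
            os.remove(os.path.join(conf_d, name))
    for name, target in desired.items():
        if current.get(name) != target:  # new or changed process
            os.symlink(target, os.path.join(conf_d, name))
    # afterwards, supervisord is told to reload its configuration, so
    # that only processes whose configuration changed are restarted
```

Because the symlink targets are Nix store paths, an unchanged process produces an identical target, and the reconciliation leaves it alone.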

Initial deployment of the system


Because only the process manager is deployed as part of the image (with an initially empty configuration), the system is not yet usable when we start a container.

To solve this problem, we must perform an initial deployment of the system on first startup.

I used my lessons learned from the chainloading techniques in s6 (in the previous blog post) and developed a hacky, generated bootstrap script (/bin/bootstrap) that serves as the container's entry point:


cat > /bin/bootstrap <<EOF
#! ${pkgs.stdenv.shell} -e

# Configure Nix channels
nix-channel --add ${channelURL}
nix-channel --update

# Deploy the processes model (in a child process)
nixproc-${input.processManager}-switch &

# Overwrite the bootstrap script, so that it simply just
# starts the process manager the next time we start the
# container
cat > /bin/bootstrap <<EOR
#! ${pkgs.stdenv.shell} -e
exec ${cmd}
EOR

# Chain load the actual process manager
exec ${cmd}
EOF
chmod 755 /bin/bootstrap

The generated bootstrap script does the following:

  • First, a Nix channel is configured and updated so that we can install packages from the Nixpkgs collection and obtain substitutes.
  • The next step is deploying the processes model by running the nixproc-*-switch tool for a supported process manager. This process is started in the background (as a child process) -- we can use this trick to force the managing bash shell to load our desired process supervisor as soon as possible.

    Ultimately, we want the process manager to become responsible for supervising any other process running in the container.
  • After the deployment process is started in the background, the bootstrap script is overridden by a bootstrap script that becomes our real entry point -- the process manager that we want to use, such as supervisord.

    Overriding the bootstrap script makes sure that the next time we start the container, it will start instantly without attempting to deploy the system again.
  • Finally, the bootstrap script "execs" into the real process manager, becoming the new PID 1 process. When the deployment of the system is done (the nixproc-*-switch process that still runs in the background), the process manager becomes responsible for reaping it.

With the above script, the workflow of deploying an upgradable/mutable multi-process container is the same as deploying an ordinary container from a Docker image -- the only (minor) difference is that the first time that we start the container, it may take some time before the services become available, because the multi-process system needs to be deployed by Nix and the Nix process management framework.

A simple usage scenario


Similar to my previous blog posts about the Nix process management framework, I will use the trivial web application system to demonstrate how the functionality of the framework can be used.

The web application system consists of one or more webapp processes (with an embedded HTTP server) that only return static HTML pages displaying their identities.

An Nginx reverse proxy forwards incoming requests to the appropriate webapp instance -- each webapp service can be reached by using its unique virtual host value.

To construct a mutable multi-process Docker image with Nix, we can write the following Nix expression (default.nix):


let
pkgs = import <nixpkgs> {};

nix-processmgmt = builtins.fetchGit {
url = https://github.com/svanderburg/nix-processmgmt.git;
ref = "master";
};

createMutableMultiProcessImage = import "${nix-processmgmt}/nixproc/create-image-from-steps/create-mutable-multi-process-image-universal.nix" {
inherit pkgs;
};
in
createMutableMultiProcessImage {
name = "multiprocess";
tag = "test";
contents = [ pkgs.mc ];
exprFile = ./processes.nix;
idResourcesFile = ./idresources.nix;
idsFile = ./ids.nix;
processManager = "supervisord"; # sysvinit, disnix, s6-rc are also valid options
}

The above Nix expression invokes the createMutableMultiProcessImage function that constructs a Docker image that provides a base system with a process manager, and a bootstrap script that deploys the multi-process system:

  • The name, tag, and contents parameters specify the image name, tag and the packages that need to be included in the image.
  • The exprFile parameter refers to a processes model that captures the configurations of the process instances that need to be deployed.
  • The idResources parameter refers to an ID resources model that specifies from which resource pools unique IDs need to be selected.
  • The idsFile parameter refers to an IDs model that contains the unique ID assignments for each process instance. Unique IDs resemble TCP/UDP port assignments, user IDs (UIDs) and group IDs (GIDs).
  • We can use the processManager parameter to select the process manager we want to use. In the above example it is supervisord, but other options are also possible.

We can use the following processes model (processes.nix) to deploy a small version of our example system:


{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
nix-processmgmt = builtins.fetchGit {
url = https://github.com/svanderburg/nix-processmgmt.git;
ref = "master";
};

ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

sharedConstructors = import "${nix-processmgmt}/examples/services-agnostic/constructors/constructors.nix" {
inherit pkgs stateDir runtimeDir logDir cacheDir tmpDir forceDisableUserChange processManager ids;
};

constructors = import "${nix-processmgmt}/examples/webapps-agnostic/constructors/constructors.nix" {
inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager ids;
};
in
rec {
webapp = rec {
port = ids.webappPorts.webapp or 0;
dnsName = "webapp.local";

pkg = constructors.webapp {
inherit port;
};

requiresUniqueIdsFor = [ "webappPorts" "uids" "gids" ];
};

nginx = rec {
port = ids.nginxPorts.nginx or 0;

pkg = sharedConstructors.nginxReverseProxyHostBased {
webapps = [ webapp ];
inherit port;
} {};

requiresUniqueIdsFor = [ "nginxPorts" "uids" "gids" ];
};
}

The above Nix expression configures two process instances, one webapp process that returns a static HTML page with its identity and an Nginx reverse proxy that forwards connections to it.

A notable difference between the expression shown above and the processes models of the same system shown in my previous blog posts is that this expression does not contain any references to files on the local filesystem, with the exception of the ID assignments expression (ids.nix).

We obtain all required functionality from the Nix process management framework by invoking builtins.fetchGit. Eliminating local references is required to allow the processes model to be copied into the container and deployed from within the container.

We can build a Docker image as follows:


$ nix-build

load the image into Docker:


$ docker load -i result

and create and start a Docker container:


$ docker run -it --name webapps --network host multiprocess:test
unpacking channels...
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring
created 1 symlinks in user environment
2021-02-21 15:29:29,878 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
2021-02-21 15:29:29,878 WARN No file matches via include "/etc/supervisor/conf.d/*"
2021-02-21 15:29:29,897 INFO RPC interface 'supervisor' initialized
2021-02-21 15:29:29,897 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2021-02-21 15:29:29,898 INFO supervisord started with pid 1
these derivations will be built:
/nix/store/011g52sj25k5k04zx9zdszdxfv6wy1dw-credentials.drv
/nix/store/1i9g728k7lda0z3mn1d4bfw07v5gzkrv-credentials.drv
/nix/store/fs8fwfhalmgxf8y1c47d0zzq4f89fz0g-nginx.conf.drv
/nix/store/vxpm2m6444fcy9r2p06dmpw2zxlfw0v4-nginx-foregroundproxy.sh.drv
/nix/store/4v3lxnpapf5f8297gdjz6kdra8g7k4sc-nginx.conf.drv
/nix/store/mdldv8gwvcd5fkchncp90hmz3p9rcd99-builder.pl.drv
/nix/store/r7qjyr8vr3kh1lydrnzx6nwh62spksx5-nginx.drv
/nix/store/h69khss5dqvx4svsc39l363wilcf2jjm-webapp.drv
/nix/store/kcqbrhkc5gva3r8r0fnqjcfhcw4w5il5-webapp.conf.drv
/nix/store/xfc1zbr92pyisf8lw35qybbn0g4f46sc-webapp.drv
/nix/store/fjx5kndv24pia1yi2b7b2bznamfm8q0k-supervisord.d.drv
these paths will be fetched (78.80 MiB download, 347.06 MiB unpacked):
...

As may be noticed by looking at the output, on first startup the Nix process management framework is invoked to deploy the system with Nix.

After the system has been deployed, we should be able to connect to the webapp process via the Nginx reverse proxy:


$ curl -H 'Host: webapp.local' http://localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Simple test webapp</title>
</head>
<body>
Simple test webapp listening on port: 5000
</body>
</html>

When it is desired to upgrade the system, we can change the system's configuration by connecting to the container instance:


$ docker exec -it webapps /bin/bash

In the container, we can edit the processes.nix configuration file:


$ mcedit /etc/nixproc/processes.nix

and make changes to the configuration of the system. For example, we can change the processes model to include a second webapp process:


{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
nix-processmgmt = builtins.fetchGit {
url = https://github.com/svanderburg/nix-processmgmt.git;
ref = "master";
};

ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

sharedConstructors = import "${nix-processmgmt}/examples/services-agnostic/constructors/constructors.nix" {
inherit pkgs stateDir runtimeDir logDir cacheDir tmpDir forceDisableUserChange processManager ids;
};

constructors = import "${nix-processmgmt}/examples/webapps-agnostic/constructors/constructors.nix" {
inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager ids;
};
in
rec {
webapp = rec {
port = ids.webappPorts.webapp or 0;
dnsName = "webapp.local";

pkg = constructors.webapp {
inherit port;
};

requiresUniqueIdsFor = [ "webappPorts" "uids" "gids" ];
};

webapp2 = rec {
port = ids.webappPorts.webapp2 or 0;
dnsName = "webapp2.local";

pkg = constructors.webapp {
inherit port;
instanceSuffix = "2";
};

requiresUniqueIdsFor = [ "webappPorts" "uids" "gids" ];
};

nginx = rec {
port = ids.nginxPorts.nginx or 0;

pkg = sharedConstructors.nginxReverseProxyHostBased {
webapps = [ webapp webapp2 ];
inherit port;
} {};

requiresUniqueIdsFor = [ "nginxPorts" "uids" "gids" ];
};
}

In the above processes model, a new process instance named webapp2 was added that listens on a unique port and can be reached via the webapp2.local virtual host value.

By running the following command, the system in the container gets upgraded:


$ nixproc-supervisord-switch

resulting in two webapp process instances running in the container:


$ supervisorctl
nginx RUNNING pid 847, uptime 0:00:08
webapp RUNNING pid 459, uptime 0:05:54
webapp2 RUNNING pid 846, uptime 0:00:08
supervisor>

The first instance: webapp was left untouched, because its configuration was not changed.

The second instance: webapp2 can be reached as follows:


$ curl -H 'Host: webapp2.local' http://localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Simple test webapp</title>
</head>
<body>
Simple test webapp listening on port: 5001
</body>
</html>

After upgrading the system, the new configuration should also get reactivated after a container restart.

A more interesting example: Hydra


As explained earlier, to create upgradable containers we require a fully functional Nix installation in a container. This observation made me think about a more interesting example than the trivial web application system.

A prominent example of a system that requires Nix and is composed of multiple tightly integrated processes is Hydra: the Nix-based continuous integration service.

To make it possible to deploy a minimal Hydra service in a container, I have packaged all its relevant components for the Nix process management framework.

The processes model looks as follows:


{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
nix-processmgmt = builtins.fetchGit {
url = https://github.com/svanderburg/nix-processmgmt.git;
ref = "master";
};

nix-processmgmt-services = builtins.fetchGit {
url = https://github.com/svanderburg/nix-processmgmt-services.git;
ref = "master";
};

constructors = import "${nix-processmgmt-services}/services-agnostic/constructors.nix" {
inherit nix-processmgmt pkgs stateDir runtimeDir logDir tmpDir cacheDir forceDisableUserChange processManager;
};

instanceSuffix = "";
hydraUser = hydraInstanceName;
hydraInstanceName = "hydra${instanceSuffix}";
hydraQueueRunnerUser = "hydra-queue-runner${instanceSuffix}";
hydraServerUser = "hydra-www${instanceSuffix}";
in
rec {
nix-daemon = {
pkg = constructors.nix-daemon;
};

postgresql = rec {
port = 5432;
postgresqlUsername = "postgresql";
postgresqlPassword = "postgresql";
socketFile = "${runtimeDir}/postgresql/.s.PGSQL.${toString port}";

pkg = constructors.simplePostgresql {
inherit port;
authentication = ''
# TYPE DATABASE USER ADDRESS METHOD
local hydra all ident map=hydra-users
'';
identMap = ''
# MAPNAME SYSTEM-USERNAME PG-USERNAME
hydra-users ${hydraUser} ${hydraUser}
hydra-users ${hydraQueueRunnerUser} ${hydraUser}
hydra-users ${hydraServerUser} ${hydraUser}
hydra-users root ${hydraUser}
# The postgres user is used to create the pg_trgm extension for the hydra database
hydra-users postgresql postgresql
'';
};
};

hydra-server = rec {
port = 3000;
hydraDatabase = hydraInstanceName;
hydraGroup = hydraInstanceName;
baseDir = "${stateDir}/lib/${hydraInstanceName}";
inherit hydraUser instanceSuffix;

pkg = constructors.hydra-server {
postgresqlDBMS = postgresql;
user = hydraServerUser;
inherit nix-daemon port instanceSuffix hydraInstanceName hydraDatabase hydraUser hydraGroup baseDir;
};
};

hydra-evaluator = {
pkg = constructors.hydra-evaluator {
inherit nix-daemon hydra-server;
};
};

hydra-queue-runner = {
pkg = constructors.hydra-queue-runner {
inherit nix-daemon hydra-server;
user = hydraQueueRunnerUser;
};
};

apache = {
pkg = constructors.reverseProxyApache {
dependency = hydra-server;
serverAdmin = "admin@localhost";
};
};
}

In the above processes model, each process instance represents a component of a Hydra installation:

  • The nix-daemon process is a service that comes with Nix package manager to facilitate multi-user package installations. The nix-daemon carries out builds on behalf of a user.

    Hydra requires it to perform builds as an unprivileged Hydra user and uses the Nix protocol to more efficiently orchestrate large builds.
  • Hydra uses a PostgreSQL database backend to store data about projects and builds.

    The postgresql process refers to the PostgreSQL database management system (DBMS) that is configured in such a way that the Hydra components are authorized to manage and modify the Hydra database.
  • hydra-server is the front-end of the Hydra service that provides a web user interface. The initialization procedure of this service is responsible for initializing the Hydra database.
  • The hydra-evaluator regularly updates the repository checkouts and evaluates the Nix expressions to decide which packages need to be built.
  • The hydra-queue-runner builds all jobs that were evaluated by the hydra-evaluator.
  • The apache server is used as a reverse proxy server forwarding requests to the hydra-server.

With the following commands, we can build the image, load it into Docker, and deploy a container that runs Hydra:


$ nix-build hydra-image.nix
$ docker load -i result
$ docker run -it --name hydra-test --network host hydra:test

After deploying the system, we can connect to the container:


$ docker exec -it hydra-test /bin/bash

and observe that all processes are running and managed by supervisord:


$ supervisorctl
apache RUNNING pid 1192, uptime 0:00:42
hydra-evaluator RUNNING pid 1297, uptime 0:00:38
hydra-queue-runner RUNNING pid 1296, uptime 0:00:38
hydra-server RUNNING pid 1188, uptime 0:00:42
nix-daemon RUNNING pid 1186, uptime 0:00:42
postgresql RUNNING pid 1187, uptime 0:00:42
supervisor>

With the following commands, we can create our initial admin user:


$ su - hydra
$ hydra-create-user sander --password secret --role admin
creating new user `sander'

We can connect to the Hydra front-end in a web browser by opening http://localhost (this works because the container uses host networking):


and configure a jobset to build a project, such as libprocreact:


Another nice bonus feature of having multiple process managers supported is that if we build Hydra's Nix process management configuration for Disnix, we can also visualize the deployment architecture of the system with disnix-visualize:


The above diagram displays the following properties:

  • The outer box indicates that we are deploying to a single machine: localhost
  • The inner box indicates that all components are managed as processes
  • The ovals correspond to process instances in the processes model and the arrows denote dependency relationships.

    For example, the apache reverse proxy has a dependency on hydra-server, meaning that the latter process instance should be deployed first, otherwise the reverse proxy is not able to forward requests to it.

Building a Nix-enabled container image


As explained in the previous section, mutable Docker images require a fully functional Nix package manager in the container.

Since this may also be an interesting sub use case, I have created a convenience function: createNixImage that can be used to build an image whose only purpose is to provide a working Nix installation:


let
  pkgs = import <nixpkgs> {};

  nix-processmgmt = builtins.fetchGit {
    url = "https://github.com/svanderburg/nix-processmgmt.git";
    ref = "master";
  };

  createNixImage = import "${nix-processmgmt}/nixproc/create-image-from-steps/create-nix-image.nix" {
    inherit pkgs;
  };
in
createNixImage {
  name = "foobar";
  tag = "test";
  contents = [ pkgs.mc ];
}

The above Nix expression builds a Docker image with a working Nix setup and a custom package: the Midnight Commander.

Conclusions


In this blog post, I have described a new function in the Nix process management framework: createMutableMultiProcessImage that creates reproducible mutable multi-process container images, by combining the reproducibility properties of Docker and Nix. With the exception of the process manager, process instances in a container can be upgraded without bringing the entire container down.

With this new functionality, the deployment workflow of a multi-process container configuration has become very similar to how physical and virtual machines are managed with NixOS -- you can edit a declarative specification of a system and run a single command-line instruction to deploy the new configuration.

Moreover, this new functionality allows us to deploy a complex, tightly coupled multi-process system, such as Hydra: the Nix-based continuous integration service. In the Hydra example case, we are using Nix for three deployment aspects: constructing the Docker image, deploying the multi-process system configuration and building the projects that are configured in Hydra.

A big drawback of mutable multi-process images is that no sharing is possible between multiple multi-process containers. Since the images are not built from common layers, the Nix store is private to each container and all packages are deployed in the writable custom layer. This may lead to substantial disk and RAM overhead per container instance.

Deploying the processes model to a container instance can probably be made more convenient by using Nix flakes -- a new Nix feature that is still experimental. With flakes we can easily deploy an arbitrary number of Nix expressions to a container and pin the deployment to a specific version of Nixpkgs.

Another interesting observation is the word: mutable. I am not completely sure if it is appropriate -- both the layers of a Docker image, as well as the Nix store paths are immutable and never change after they have been built. For both solutions, immutability is an important ingredient in making sure that a deployment is reproducible.

I have decided to still call these deployments mutable, because I am looking at the problem from a Docker perspective -- the writable layer of the container (that is mounted on top of the immutable layers of an image) is modified each time that we upgrade a system.

Future work


Although I am quite happy with the ability to create mutable multi-process containers, there is still quite a bit of work that needs to be done to make the Nix process management framework more usable.

Most importantly, trying to deploy Hydra revealed all kinds of regressions in the framework. To cope with all these breaking changes, a structured testing approach is required. Currently, such an approach is completely absent.

I could also (in theory) automate the still missing parts of Hydra. For example, I have not automated the process that updates the garbage collector roots, which needs to run in a timely manner. To solve this, I need to use a cron service or systemd timer units, which is beyond the scope of my experiment.

Availability


The createMutableMultiProcessImage function is part of the experimental Nix process management framework GitHub repository that is still under heavy development.

Because the number of services that can be deployed with the framework has grown considerably, I have moved all non-essential services (not required for testing) into a separate repository. The Hydra constructor functions can be found in this repository as well.

by Sander van der Burg (noreply@blogger.com) at February 24, 2021 09:46 PM

February 17, 2021

Tweag I/O

Derivation outputs in a content-addressed world

This is another blog post on the upcoming content-addressed derivations for Nix. We’ve already explained the feature and some of its advantages, as well as the reasons why it isn’t easy to implement. Now we’re going to talk about a concrete user-facing change that this feature will entail: the distinction between “derivation outputs” and “output paths”.

Note that the changes presented here might not yet be implemented or merged upstream.

Store paths

Store paths are pervasive in Nix. Run nix build? This will return a store path. Want to move derivation outputs around? Just nix copy their store path. Even if you can run nix copy nixpkgs#hello, this is strictly equivalent to nix build nixpkgs#hello --out-link hello && nix copy $(realpath ./hello). Need to know whether a derivation has been built locally or in a binary cache? Just check whether its output path exists.

This is really nice in a world where the outputs of derivations are input-addressed, because there’s a direct mapping between a derivation and its output paths − the .drv file actually explicitly contains them − which means that given a derivation Nix can directly know what its output paths are.

However this falls short with the addition of content-addressed derivations: if hello is content-addressed then I can’t introspect the derivation to know its output path anymore (see the previous post on that topic). Locally, Nix has a database that stores (amongst other things) which derivation produced which outputs, meaning that it knows that hello has already been built and that its output path is /nix/store/1234-hello. But if I just copy this output path to another machine, and try to rebuild hello there, Nix won’t be able to know that its output path is already there (because it doesn’t have that mapping), so it will have to rebuild the derivation, only to realise that it yields the output path /nix/store/1234-hello that’s already there and discard the result.

This is very frustrating, as it means that the following won’t work:

$ nix copy --to ssh://somewhereelse nixpkgs.hello
# Try to build with `--max-jobs 0` to make it fail if it needs to rebuild anything
$ ssh somewhereelse nix build nixpkgs.hello --max-jobs 0
error: --- Error ----- nix
252 derivations need to be built, but neither local builds ('--max-jobs') nor remote builds ('--builders') are enabled

We could ask for Nix to copy the mapping between the hello derivation and its output paths as well, but many derivations might have produced this path, so if we just say nix copy --to ssh://somewhereelse /nix/store/1234-hello, does that mean that we want to copy the 1234-hello store path? Or 1234-hello the output of the hello derivation? Or 1234-hello the output of the hello2 derivation? Nix has no way to know that.

This means that we need another way to identify these outputs other than just their store paths.

Introducing derivation outputs

The one thing that we know though, and that uniquely identifies the derivation, is the hash of the derivation itself. The derivation for hello will be stored as /nix/store/xxx-hello.drv (where xxx is a hash), and that xxx definitely identifies the derivation (and is known before the build). As this derivation might have several outputs, we need to append the name of the considered output to get a truly unique identifier, giving us /nix/store/xxx-hello.drv!out.

So given this “derivation output id”, Nix will be able to both retrieve the corresponding output path (if it has been built), and know the mapping (derivation, outputName) -> outputPath.
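The need for this extra mapping can be illustrated with a toy model (just a sketch for intuition, not Nix's actual database schema; the hello2 derivation is the hypothetical second producer mentioned above):

```python
# Toy realisation table: (derivation path, output name) -> output path.
realisations = {
    ("/nix/store/xxx-hello.drv", "out"): "/nix/store/1234-hello",
    ("/nix/store/yyy-hello2.drv", "out"): "/nix/store/1234-hello",
}

def resolve(drv, output):
    """Resolve a derivation output id to its store path, if already built."""
    return realisations.get((drv, output))

# The forward direction is unambiguous...
path = resolve("/nix/store/xxx-hello.drv", "out")

# ...but the reverse is not: several derivation outputs can map to the same
# content-addressed store path, so a bare path cannot identify its producer.
producers = [drv for (drv, out), p in realisations.items()
             if p == "/nix/store/1234-hello"]
```

This is exactly why the derivation output id carries the derivation hash: it names one entry in the left-hand column, while the store path alone only names the right-hand column.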

With this in hand, we now can run nix copy --to ssh://somewhereelse /nix/store/xxx-hello.drv!out, which will both copy the output path /nix/store/1234-hello and register on the remote machine that this path is the output out of the derivation /nix/store/xxx-hello.drv. Likewise, nix copy nixpkgs.hello will be a shortcut for nix copy /nix/store/xxx-hello.drv. And now we can do

$ nix copy --to ssh://somewhereelse nixpkgs.hello
# Try to build with `--max-jobs 0` to make it fail if it needs to rebuild anything
$ ssh somewhereelse nix build nixpkgs.hello --max-jobs 0
$ ./result/bin/hello
Hello, world!

In practice

What will this mean in practice?

This means that the Nix cli will now return or accept either store paths or derivation output ids depending on the context. For example nix build will still create symlinks to the output paths and nix shell will add them to the PATH because that’s what makes sense in the context. But as we’ve seen above, nix copy will accept both store paths and derivation output ids, and these will have different semantics. Copying store paths will just copy the store paths as it used to do (in case you do not mind them being rebuilt on the other side) while copying derivation outputs will also register these outputs on the remote side.

Once more: right now, this feature is still under development, so the changes presented here might not yet be implemented or merged upstream. So don’t be surprised when the feature lands in the near future!

February 17, 2021 12:00 AM

February 01, 2021

Sander van der Burg

Developing an s6-rc backend for the Nix process management framework

One of my major blog topics last year was my experimental Nix process management framework, that is still under heavy development.

As explained in many of my earlier blog posts, one of its major objectives is to facilitate high-level deployment specifications of running processes that can be translated to configurations for all kinds of process managers and deployment solutions.

The backends that I have implemented so far, were picked for the following reasons:

  • Multiple operating systems support. The most common process management service was chosen for each operating system: On Linux, sysvinit (because this used to be the most common solution) and systemd (because it is used by many conventional Linux distributions today), bsdrc on FreeBSD, launchd for macOS, and cygrunsrv for Cygwin.
  • Supporting unprivileged user deployments. To supervise processes without requiring a service that runs on PID 1, that also works for unprivileged users, supervisord is very convenient because it was specifically designed for this purpose.
  • Docker was selected because it is a very popular solution for managing services, and process management is one of its sub responsibilities.
  • Universal process management. Disnix was selected because it can be used as a primitive process management solution that works on any operating system supported by the Nix package manager. Moreover, the Disnix services model is a superset of the processes model used by the process management framework.

Not long after writing my blog post about the process manager-agnostic abstraction layer, somebody opened an issue on GitHub with the suggestion to also support s6-rc. Although I was already aware that more process/service management solutions exist, s6-rc was a solution that I did not know about.

Recently, I have implemented the suggested s6-rc backend. Although deploying s6-rc services now works quite conveniently, getting to know s6-rc and its companion tools was somewhat challenging for me.

In this blog post, I will elaborate about my learning experiences and explain how the s6-rc backend was implemented.

The s6 tool suite


s6-rc is a software project published on skarnet and part of a bigger tool ecosystem. s6-rc is a companion tool of s6: skarnet.org's small & secure supervision software suite.

On Linux and many other UNIX-like systems, the initialization process (typically /sbin/init) is a highly critical program:

  • It is the first program loaded by the kernel and responsible for setting the remainder of the boot procedure in motion. This procedure is responsible for mounting additional file systems, loading device drivers, and starting essential system services, such as SSH and logging services.
  • The PID 1 process supervises all processes that were directly loaded by it, as well as indirect child processes that get orphaned -- when this happens they get automatically adopted by the process that runs as PID 1.

    As explained in an earlier blog post, traditional UNIX services that daemonize on their own, deliberately orphan themselves so that they remain running in the background.
  • When a child process terminates, the parent process must take notice or the terminated process will stay behind as a zombie process.

    Because the PID 1 process is the common ancestor of all other processes, it is required to automatically reap all relevant zombie processes that become a child of it.
  • The PID 1 process runs with root privileges and, as a result, has full access to the system. When the security of the PID 1 process gets compromised, the entire system is at risk.
  • If the PID 1 process crashes, the kernel crashes (and hence the entire system) with a kernel panic.
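The zombie-reaping duty in particular can be sketched in a few lines of Python (a simplification, assuming a POSIX system; a real PID 1 such as s6-svscan does considerably more):

```python
import os

def reap_zombies():
    """Collect every terminated child; return the reaped (pid, status) pairs.

    WNOHANG makes waitpid non-blocking, so an init process can poll for
    terminated children (including adopted orphans) without stalling.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # no child processes remain at all
            break
        if pid == 0:               # children exist, but none has exited yet
            break
        reaped.append((pid, status))
    return reaped
```

A real init would invoke something like this from a SIGCHLD handler or its main loop, so that processes it adopted never linger as zombies.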

There are many kinds of programs that you can use as a system's PID 1. For example, you can directly use a shell, such as bash, but it is far more common to use an init system, such as sysvinit or systemd.

According to the author of s6, an init system is made out of four parts:

  1. /sbin/init: the first userspace program that is run by the kernel at boot time (not counting an initramfs).
  2. pid 1: the program that will run as process 1 for most of the lifetime of the machine. This is not necessarily the same executable as /sbin/init, because /sbin/init can exec into something else.
  3. a process supervisor.
  4. a service manager.

In the s6 tool eco-system, most of these parts are implemented by separate tools:

  • The first userspace program: s6-linux-init takes care of the coordination of the initialization process. It does a variety of one-time boot things: for example, it traps the ctrl-alt-del keyboard combination, it starts the shutdown daemon (that is responsible for eventually shutting down the system), and runs the initial boot script (rc.init).

    (As a sidenote: this is almost true -- the /sbin/init process is a wrapper script that "execs" into s6-linux-init with the appropriate parameters).
  • When the initialization is done, s6-linux-init execs into a process called s6-svscan provided by the s6 toolset. s6-svscan's task is to supervise an entire process supervision tree, which I will explain later.
  • Starting and stopping services is done by a separate service manager started from the rc.init script. s6-rc is the most prominent option (that we will use in this blog post), but also other tools can be used.

Many conventional init systems implement most (or sometimes all) of these aspects in a single executable.

In particular, the s6 author is highly critical of systemd: the init system that is widely used by many conventional Linux distributions today -- he dedicated an entire page with criticisms about it.

The author of s6 advocates a number of design principles for his tool eco-system (that systemd violates in many ways):

  • The Unix philosophy: do one job and do it well.
  • Doing less instead of more (preventing feature creep).
  • Keeping tight quality control over every tool by opening up repository access only to small teams (or rather a single person).
  • Integration support: he is against the bazaar approach on project level, but in favor of the bazaar approach on an eco-system level in which everybody can write their own tools that integrate with existing tools.

The concepts implemented by the s6 tool suite were not completely "invented" from scratch. daemontools is what the author considers the ancestor of s6 (if you look at the web page then you will notice that the concept of a "supervision tree" was pioneered there and that some of the tools listed resemble the same tools in the s6 tool suite), and runit its cousin (that is also heavily inspired by daemontools).

A basic usage scenario of s6 and s6-rc


Although it is possible to use Linux distributions in which the init system, supervisor and service manager are all provided by skarnet tools, a subset of s6 and s6-rc can also be used on any Linux distribution and other supported operating systems, such as the BSDs.

Root privileges are not required to experiment with these tools.

For example, with the following command we can use the Nix package manager to deploy the s6 supervision toolset in a development shell session:


$ nix-shell -p s6

In this development shell session, we can start the s6-svscan service as follows:


$ mkdir -p $HOME/var/run/service
$ s6-svscan $HOME/var/run/service

s6-svscan is a service that supervises an entire process supervision tree, including processes that may accidentally become a child of it, such as orphaned processes.

The directory parameter is a scan directory that maintains the configurations of the processes that are currently supervised. So far, no supervised processes have been deployed yet.

We can actually deploy services by using the s6-rc toolset.

For example, I can easily configure my trivial example system used in previous blog posts that consists of one or multiple web application processes (with an embedded HTTP server) returning static HTML pages and an Nginx reverse proxy that forwards requests to one of the web application processes based on the appropriate virtual host header.

Contrary to the other process management solutions that I have investigated earlier, s6-rc does not have an elaborate configuration language. It does not implement a parser (for very good reasons as explained by the author, because it introduces extra complexity and bugs).

Instead, you have to create directories with text files, in which each file represents a configuration property.

With the following command, I can spawn a development shell with all the required utilities to work with s6-rc:


$ nix-shell -p s6 s6-rc execline

The following shell commands create an s6-rc service configuration directory and a configuration for a single webapp process instance:


$ mkdir -p sv/webapp
$ cd sv/webapp

$ echo "longrun" > type

$ cat > run <<EOF
#!$(type -p execlineb) -P

envfile $HOME/envfile
exec $HOME/webapp/bin/webapp
EOF

The above shell script creates a configuration directory for a service named: webapp with the following properties:

  • It creates a service with type: longrun. A long run service deploys a process that runs in the foreground that will get supervised by s6.
  • The run file refers to an executable that starts the service. For s6-rc services it is common practice to implement wrapper scripts using execline: a non-interactive scripting language.

    The execline script shown above loads an environment variable config file with the following content: PORT=5000. This environment variable configures the TCP port to which the service binds. The script then "execs" into a new process that runs the webapp process.

    (As a sidenote: although it is a common habit to use execline for writing wrapper scripts, this is not a hard requirement -- any executable implemented in any language can be used. For example, we could also write the above run wrapper script as a bash script).

We can also configure the Nginx reverse proxy service in a similar way:


$ mkdir -p ../nginx
$ cd ../nginx

$ echo "longrun" > type
$ echo "webapp" > dependencies

$ cat > run <<EOF
#!$(type -p execlineb) -P

foreground { mkdir -p $HOME/var/nginx/logs $HOME/var/cache/nginx }
exec $(type -p nginx) "-p" "$HOME/var/nginx" "-c" "$HOME/nginx/nginx.conf" "-g" "daemon off;"
EOF

The above shell script creates a configuration directory for a service named: nginx with the following properties:

  • It again creates a service of type: longrun because Nginx should be started as a foreground process.
  • It declares the webapp service (that we have configured earlier) a dependency ensuring that webapp is started before nginx. This dependency relationship is important to prevent Nginx doing a redirect to a non-existent service.
  • The run script first creates all mandatory state directories and finally execs into the Nginx process, with a configuration file using the above state directories, and turning off daemon mode so that it runs in the foreground.

In addition to configuring the above services, we also want to deploy the system as a whole. This can be done by creating bundles that encapsulate collections of services:


mkdir -p ../default
cd ../default

echo "bundle" > type

cat > contents <<EOF
webapp
nginx
EOF

The above shell instructions create a bundle named: default referring to both the webapp and nginx reverse proxy service that we have configured earlier.

Our s6-rc configuration directory structure looks as follows:


$ find ./sv
./sv
./sv/default
./sv/default/contents
./sv/default/type
./sv/nginx/run
./sv/nginx/type
./sv/webapp/dependencies
./sv/webapp/run
./sv/webapp/type

If we want to deploy the service directory structure shown above, we first need to compile it into a configuration database. This can be done with the following command:


$ mkdir -p $HOME/etc/s6/rc
$ s6-rc-compile $HOME/etc/s6/rc/compiled-1 $HOME/sv

The above command compiles the service configuration directory stored in: $HOME/sv into a configuration database file in: $HOME/etc/s6/rc/compiled-1.

With the following command we can initialize the s6-rc system with our compiled configuration database:


$ s6-rc-init -c $HOME/etc/s6/rc/compiled-1 -l $HOME/var/run/s6-rc \
$HOME/var/run/service

The above command generates a "live directory" in: $HOME/var/run/s6-rc containing the state of s6-rc.

With the following command, we can start all services in the: default bundle:


$ s6-rc -l $HOME/var/run/s6-rc -u change default

The above command deploys a running system with the following process tree:


As can be seen in the diagram above, the entire process tree is supervised by s6-svscan (the program that we have started first). Every longrun service deployed by s6-rc is supervised by a process named: s6-supervise.

Managing service logging


Another important property of s6 and s6-rc is the way it handles logging. By default, all output that the supervised processes produce on the standard output and standard error is captured by s6-svscan and written to a single log stream (in our case, it will be redirected to the terminal).

When it is desired to capture the output of a service into its own dedicated log file, you need to configure the service in such a way that it writes all relevant information to a pipe. A companion logging service is required to capture the data that is sent over the pipe.

The following command-line instructions modify the webapp service (that we have created earlier) to let it send its output to another service:


$ cd sv
$ mv webapp webapp-srv
$ cd webapp-srv

$ echo "webapp-log" > producer-for
$ cat > run <<EOF
#!$(type -p execlineb) -P

envfile $HOME/envfile
fdmove -c 2 1
exec $HOME/webapp/bin/webapp
EOF

In the script above, we have changed the webapp service configuration as follows:

  • We rename the service from: webapp to webapp-srv. Using suffixes is a convention commonly used for s6-rc services that also have a log companion service.
  • With the producer-for property, we specify that the webapp-srv is a service that produces output for another service named: webapp-log. We will configure this service later.
  • We create a new run script that adds the following command: fdmove -c 2 1.

    The purpose of this added instruction is to redirect all output that is sent over the standard error (file descriptor: 2) to the standard output (file descriptor: 1). This redirection makes it possible that all data can be captured by the log companion service.
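What fdmove -c 2 1 accomplishes corresponds to a dup2 call. The Python sketch below (an illustration, not execline itself) shows that after duplicating file descriptor 1 onto file descriptor 2, a message printed to stderr travels down the stdout stream, so a single pipe can capture both:

```python
import subprocess, sys

# Run a child that performs the equivalent of `fdmove -c 2 1` and then
# writes one line to stdout and one to stderr.
child = (
    "import os, sys;"
    "os.dup2(1, 2);"                # copy fd 1 onto fd 2
    "print('on stdout');"
    "print('on stderr', file=sys.stderr)"
)
result = subprocess.run([sys.executable, "-c", child],
                        capture_output=True, text=True)
# both lines arrive on the stdout stream; the stderr stream stays empty
```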

We can configure the log companion service: webapp-log with the following command-line instructions:


$ mkdir ../webapp-log
$ cd ../webapp-log

$ echo "longrun" > type
$ echo "webapp-srv" > consumer-for
$ echo "webapp" > pipeline-name
$ echo 3 > notification-fd

$ cat > run <<EOF
#!$(type -p execlineb) -P

foreground { mkdir -p $HOME/var/log/s6-log/webapp }
exec -c s6-log -d3 $HOME/var/log/s6-log/webapp
EOF

The service configuration created above does the following:

  • We create a service named: webapp-log that is a long running service.
  • We declare the service to be a consumer for the webapp-srv (earlier, we have already declared the companion service: webapp-srv to be a producer for this logging service).
  • We configure a pipeline name: webapp, causing s6-rc to automatically generate a bundle named: webapp that contains all involved services.

    This generated bundle allows us to always manage the service and logging companion as a single deployment unit.
  • The s6-log service supports readiness notifications. File descriptor: 3 is configured to receive that notification.
  • The run script creates the log directory in which the output should be stored and starts the s6-log service to capture the output and store the data in the corresponding log directory.

    The -d3 parameter instructs it to send a readiness notification over file descriptor 3.
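The notification protocol itself is simple: the supervisor passes the service an extra file descriptor, and the service writes a newline to it once it is ready. The Python sketch below models both sides (assuming a POSIX system; fd 3 is chosen to match the notification-fd configuration above):

```python
import os

r, w = os.pipe()
pid = os.fork()
if pid == 0:               # child: the service
    os.close(r)
    if w != 3:             # make the write end available as fd 3
        os.dup2(w, 3)
        os.close(w)
    # ... service initialization would happen here ...
    os.write(3, b"\n")     # signal readiness with a newline
    os._exit(0)

os.close(w)                # parent: the supervisor
ready = os.read(r, 1)      # blocks until the readiness byte arrives
os.waitpid(pid, 0)
```

Waiting for this byte is what lets s6-rc delay starting dependent services until a service is actually ready, rather than merely started.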

After modifying the configuration files in such a way that each longrun service has a logging companion, we need to compile a new database that provides s6-rc our new configuration:


$ s6-rc-compile $HOME/etc/s6/rc/compiled-2 $HOME/sv

The above command creates a database with a new filename in: $HOME/etc/s6/rc/compiled-2. We are required to give it a new name -- the old configuration database (compiled-1) must be retained to make the upgrade process work.

With the following command, we can upgrade our running configuration:


$ s6-rc-update -l $HOME/var/run/s6-rc $HOME/etc/s6/rc/compiled-2

The result is the following process supervision tree:


As you may observe by looking at the diagram above, every service has a companion s6-log service that is responsible for capturing and storing its output.

The log files of the services can be found in $HOME/var/log/s6-log/webapp and $HOME/var/log/s6-log/nginx.

One shot services


In addition to longrun services that are useful for managing system services, more aspects need to be automated in a boot process, such as mounting file systems.

These kinds of tasks can be automated with oneshot services, that execute an up script on startup, and optionally, a down script on shutdown.

The following service configuration can be used to mount the kernel's /proc filesystem:


mkdir -p ../mount-proc
cd ../mount-proc

echo "oneshot" > type

cat > run <<EOF
#!$(type -p execlineb) -P
foreground { mount -t proc proc /proc }
EOF

Chain loading


The execline scripts shown in this blog post resemble shell scripts in many ways. One particular aspect that sets execline scripts apart from shell scripts is that all commands make intensive use of a concept called chain loading.

Every instruction in an execline script executes a task, may imperatively modify the environment (e.g. by changing environment variables, or changing the current working directory etc.) and then "execs" into a new chain loading task.

The last parameter of each command-line instruction refers to the command-line instruction that it needs to "exec" into -- typically this command-line instruction is put on the next line.

The execline package, as well as many packages in the s6 ecosystem, contain many programs that support chain loading.

It is also possible to implement custom chain loaders that follow the same protocol.
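The idea can be mimicked in Python. The sketch below is a hypothetical re-implementation of execline's envfile step (assuming a POSIX system): it performs one step's side effect on the process state, then execs into the remainder of the chain:

```python
import os

def envfile_step(path, rest):
    """Load KEY=VALUE pairs from `path` into the environment, then exec
    into the next command in the chain -- this call never returns."""
    with open(path) as f:
        for line in f:
            key, _, value = line.strip().partition("=")
            if key and not key.startswith("#"):
                os.environ[key] = value
    # Replace the current process with the rest of the chain, just as an
    # execline program "execs" into its trailing argument list.
    os.execvp(rest[0], rest)
```

The run script shown earlier is exactly such a chain: envfile loads the variables and execs into exec, which in turn execs into the webapp process.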

Developing s6-rc function abstractions for the Nix process management framework


In the Nix process management framework, I have added function abstractions for each s6-rc service type: longrun, oneshot and bundle.

For example, with the following Nix expression we can generate an s6-rc longrun configuration for the webapp process:


{createLongRunService, writeTextFile, execline, webapp}:

let
  envFile = writeTextFile {
    name = "envfile";
    text = ''
      PORT=5000
    '';
  };
in
createLongRunService {
  name = "webapp";
  run = writeTextFile {
    name = "run";
    executable = true;
    text = ''
      #!${execline}/bin/execlineb -P

      envfile ${envFile}
      fdmove -c 2 1
      exec ${webapp}/bin/webapp
    '';
  };
  autoGenerateLogService = true;
}

Evaluating the Nix expression above does the following:

  • It generates a service directory that corresponds to the: name parameter with a longrun type property file.
  • It generates a run execline script, that uses a generated envFile for configuring the service's port number, redirects the standard error to the standard output and starts the webapp process (that runs in the foreground).
  • The autoGenerateLogService parameter is a concept I introduced myself, to conveniently configure a companion log service, because this is a very common operation -- I cannot think of any scenario in which you do not want to have a dedicated log file for a long running service.

    Enabling this option causes the service to automatically become a producer for the log companion service (having the same name with a -log suffix) and automatically configures a logging companion service that consumes from it.

In addition to constructing long run services from Nix expressions, there are also abstraction functions to create one shots: createOneShotService and bundles: createServiceBundle.

The function that generates a log companion service can also be directly invoked with: createLogServiceForLongRunService, if desired.

Generating an s6-rc service configuration from a process manager-agnostic configuration


The following Nix expression is a process manager-agnostic configuration for the webapp service, that can be translated to a configuration for any supported process manager in the Nix process management framework:


{createManagedProcess, tmpDir}:
{port, instanceSuffix ? "", instanceName ? "webapp${instanceSuffix}"}:

let
  webapp = import ../../webapp;
in
createManagedProcess {
  name = instanceName;
  description = "Simple web application";
  inherit instanceName;

  process = "${webapp}/bin/webapp";
  daemonArgs = [ "-D" ];

  environment = {
    PORT = port;
  };

  overrides = {
    sysvinit = {
      runlevels = [ 3 4 5 ];
    };
  };
}

The Nix expression above specifies the following high-level configuration concepts:

  • The name and description attributes are just meta data. The description property is ignored by the s6-rc generator, because s6-rc has no equivalent configuration property for capturing a description.
  • A process manager-agnostic configuration can specify both how the service can be started as a foreground process or as a process that daemonizes itself.

    In the above example, the process attribute specifies that the same executable needs to be invoked for both a foregroundProcess and daemon. The daemonArgs parameter specifies the command-line arguments that need to be propagated to the executable to let it daemonize itself.

    s6-rc has a preference for managing foreground processes, because these can be more reliably managed. When a foregroundProcess executable can be inferred, the generator will automatically compose a longrun service making it possible for s6 to supervise it.

    If only a daemon can be inferred, the generator will compose a oneshot service that starts the daemon with the up script, and on shutdown, terminates the daemon by dereferencing the PID file in the down script.
  • The environment attribute set parameter is automatically translated to an envfile that the generated run script consumes.
  • Similar to the sysvinit backend, it is also possible to override the generated arguments for the s6-rc backend, if desired.

As already explained in the blog post that covers the framework's concepts, the Nix expression above needs to be complemented with a constructors expression that composes the common parameters of every process configuration and a processes model that constructs process instances that need to be deployed.

The following processes model can be used to deploy a webapp process and an nginx reverse proxy instance that connects to it:


{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir;
    inherit forceDisableUserChange processManager;
  };
in
rec {
  webapp = rec {
    port = 5000;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };
  };

  nginx = rec {
    port = 8080;

    pkg = constructors.nginxReverseProxyHostBased {
      webapps = [ webapp ];
      inherit port;
    } {};
  };
}

With the following command-line instruction, we can automatically create a scan directory and start s6-svscan:


$ nixproc-s6-svscan --state-dir $HOME/var

The --state-dir parameter causes the scan directory to be created in the user's home directory, making unprivileged deployments possible.

With the following command, we can deploy the entire system, which will be supervised by the s6-svscan service that we just started:


$ nixproc-s6-rc-switch --state-dir $HOME/var \
--force-disable-user-change processes.nix

The --force-disable-user-change parameter prevents the deployment system from creating users and groups and changing user privileges, allowing the deployment as an unprivileged user to succeed.

The result is a running system that allows us to connect to the webapp service via the Nginx reverse proxy:


$ curl -H 'Host: webapp.local' http://localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Simple test webapp</title>
</head>
<body>
Simple test webapp listening on port: 5000
</body>
</html>

Constructing multi-process Docker images supervised by s6


Another feature of the Nix process management framework is constructing multi-process Docker images in which multiple process instances are supervised by a process manager of choice.

s6 can also be used as a supervisor in a container. To accomplish this, we can use s6-linux-init as an entry point.

The following attribute generates a skeleton configuration directory:


let
  skelDir = pkgs.stdenv.mkDerivation {
    name = "s6-skel-dir";
    buildCommand = ''
      mkdir -p $out
      cd $out

      cat > rc.init <<EOF
      #! ${pkgs.stdenv.shell} -e
      rl="\$1"
      shift

      # Stage 1
      s6-rc-init -c /etc/s6/rc/compiled /run/service

      # Stage 2
      s6-rc -v2 -up change default
      EOF

      chmod 755 rc.init

      cat > rc.shutdown <<EOF
      #! ${pkgs.stdenv.shell} -e

      exec s6-rc -v2 -bDa change
      EOF

      chmod 755 rc.shutdown

      cat > rc.shutdown.final <<EOF
      #! ${pkgs.stdenv.shell} -e
      # Empty
      EOF
      chmod 755 rc.shutdown.final
    '';
  };

The skeleton directory generated by the above sub expression contains three configuration files:

  • rc.init is the script that the init system starts, right after starting the supervisor: s6-svscan. It is responsible for initializing the s6-rc system and starting all services in the default bundle.
  • rc.shutdown script is executed on shutdown and stops all previously started services by s6-rc.
  • rc.shutdown.final runs at the very end of the shutdown procedure, after all processes have been killed and all file systems have been unmounted. In the above expression, it does nothing.

In the initialization process of the image (the runAsRoot parameter of dockerTools.buildImage), we need to execute a number of dynamic initialization steps.

First, we must initialize s6-linux-init to read its configuration files from /etc/s6/current using the skeleton directory (that we have configured in the sub expression shown earlier) as its initial contents (the -f parameter) and run the init system in container mode (the -C parameter):


mkdir -p /etc/s6
s6-linux-init-maker -c /etc/s6/current -p /bin -m 0022 -f ${skelDir} -N -C -B /etc/s6/current
mv /etc/s6/current/bin/* /bin
rmdir etc/s6/current/bin

s6-linux-init-maker generates a /bin/init script, which we can use as the container's entry point.

I want the logging services (s6-log) to run as an unprivileged user, which requires creating that user and its corresponding group first:


groupadd -g 2 s6-log
useradd -u 2 -d /dev/null -g s6-log s6-log

We must also compile a database from the s6-rc configuration files, by running the following command-line instructions:


mkdir -p /etc/s6/rc
s6-rc-compile /etc/s6/rc/compiled ${profile}/etc/s6/sv

As can be seen in the rc.init script that we have generated earlier, the compiled database: /etc/s6/rc/compiled is propagated to s6-rc-init as a command-line parameter.

With the following Nix expression, we can build an s6-rc managed multi-process Docker image that deploys all the process instances in the processes model that we have written earlier:


let
pkgs = import <nixpkgs> {};

createMultiProcessImage = import ../../nixproc/create-multi-process-image/create-multi-process-image-universal.nix {
inherit pkgs system;
inherit (pkgs) dockerTools stdenv;
};
in
createMultiProcessImage {
name = "multiprocess";
tag = "test";
exprFile = ./processes.nix;
stateDir = "/var";
processManager = "s6-rc";
}

With the following command, we can build the image:


$ nix-build

and load the image into Docker with the following command:


$ docker load -i result

Discussion


With the addition of the s6-rc backend in the Nix process management framework, we have a modern alternative to systemd at our disposal.

We can easily let services be managed by s6-rc using the same agnostic high-level deployment configurations that can also be used to target other process management backends, including systemd.

What I particularly like about the s6 tool ecosystem (and this also applies to some extent to its ancestor, daemontools, and its cousin project, runit) is the idea of constructing the entire system's initialization process and its sub-concerns (process supervision, logging and service management) from separate tools, each with a clear, fixed scope.

This kind of design reminds me of microkernels -- in a microkernel design, the kernel is basically split into multiple collaborating processes each having their own responsibilities (e.g. file systems, drivers).

The microkernel is the only process that has full access to the system and typically only has very few responsibilities (e.g. memory management, task scheduling, interrupt handling).

When a process crashes, such as a driver, this failure should not tear the entire system down. Systems can even recover from problems, by restarting crashed processes.

Furthermore, these non-kernel processes typically have very few privileges. If a process' security gets compromised (such as a leaky driver), the system as a whole will not be affected.

Aside from a number of functional differences compared to systemd, there are also some non-functional differences.

systemd can only be used on Linux with glibc as the system's libc, whereas s6 also works on other operating systems (e.g. the BSDs) and with different libc implementations, such as musl.

Moreover, the supervisor service (s6-svscan) can also be used as a user-level supervisor that does not need to run as PID 1. Although systemd supports user sessions (allowing service deployments from unprivileged users), it still has the requirement to have systemd as an init system that needs to run as the system's PID 1.

Improvement suggestions


Although the s6 ecosystem provides useful tools and has all kinds of powerful features, I also have a number of improvement suggestions. They are mostly usability related:

  • I have noticed that the command-line tools have very brief help pages -- they only enumerate the available options, but they do not provide any additional information explaining what these options do.

    I have also noticed that there are no official manpages, but there is a third-party initiative that seems to provide them.

    The "official" reference documentation is the set of HTML pages. For me personally, it is not always convenient to access HTML pages on limited machines with no Internet connection and/or only terminal access.
  • Although each individual tool is well documented (albeit in HTML), I was having quite a few difficulties figuring out how to use them together -- because every tool has a very specific purpose, you typically need to combine them in interesting ways to do something meaningful.

    For example, I could not find any clear documentation on skarnet describing typical combined usage scenarios, such as how to use s6-rc on a conventional Linux distribution that already has a different service management solution.

    Fortunately, I discovered a Linux distribution that turned out to be immensely helpful: Artix Linux. Artix Linux provides s6 as one of its supported process management solutions. I ended up installing Artix Linux in a virtual machine and reading their documentation.

    This lack of clarity seems to be somewhat analogous to common criticisms of microkernels: one of Linus Torvalds' criticisms is that in microkernel designs, the pieces are simplified, but the coordination of the entire system is more difficult.
  • Updating existing service configurations is difficult and cumbersome. Each time I want to change something (e.g. adding a new service), then I need to compile a new database, make sure that the newly compiled database co-exists with the previous database, and then run s6-rc-update.

    It is very easy to make mistakes. For example, I ended up overwriting the previous database several times. When this happens, the upgrade process gets stuck.

    systemd, on the other hand, allows you to put a new service configuration file in the configuration directory, such as: /etc/systemd/system. We can conveniently reload the configuration with a single command-line instruction:


    $ systemctl daemon-reload
    I believe that the updating process can still be somewhat simplified in s6-rc. Fortunately, I have managed to hide that complexity in the nixproc-s6-rc-deploy tool.
  • It was also difficult to find out all the available configuration properties for s6-rc services -- I ended up looking at the examples and studying the documentation pages for s6-rc-compile, s6-supervise and service directories.

    I think that it could be very helpful to write a dedicated documentation page that describes all configurable properties of s6-rc services.
  • I believe it is also very common that for each longrun service (with a -srv suffix) you want a companion logging service (with a -log suffix).

    As a matter of fact, I can hardly think of a situation in which you do not want this. Maybe it helps to introduce a convenience property to automatically facilitate the generation of log companion services.

Availability


The s6-rc backend described in this blog post is part of the current development version of the Nix process management framework, that is still under heavy development.

The framework can be obtained from my GitHub page.

by Sander van der Burg (noreply@blogger.com) at February 01, 2021 09:29 PM

January 28, 2021

Mayflower

Safe service upgrades using system.stateVersion

One of the most important features for system administrators who operate NixOS systems is atomic upgrades, which means that a deployment won’t reach an inconsistent state: if building a new system configuration succeeds, it is activated in a single step by replacing the /run/current-system symlink. If a build fails, e.g. due to broken packages, the configuration won’t be activated. This also means that downgrades are fairly simple, since a previous configuration can be reactivated in a so-called rollback by pointing the /run/current-system symlink back to the previous store path.
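The atomicity comes from the fact that replacing a symlink can be done with a single rename. A minimal sketch of that pattern, with illustrative paths rather than NixOS's actual activation script:

```python
import os
import tempfile

def activate(store_path, link):
    # Create the new symlink under a temporary name, then rename it over
    # the old one: rename is atomic, so a reader always sees either the
    # old or the new generation, never a broken or missing link.
    # A rollback is the same operation with the previous store path.
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(store_path, tmp)
    os.replace(tmp, link)

# Demo with throwaway paths (hypothetical store paths)
d = tempfile.mkdtemp()
link = os.path.join(d, "current-system")
activate("/nix/store/aaa-system", link)
activate("/nix/store/bbb-system", link)   # upgrade
activate("/nix/store/aaa-system", link)   # rollback
print(os.readlink(link))                  # prints /nix/store/aaa-system
```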

January 28, 2021 10:23 AM

January 22, 2021

Tweag I/O

Programming with contracts in Nickel

  1. Presenting Nickel: better configuration for less
  2. Programming with contracts in Nickel
  3. Types à la carte in Nickel

In a previous post, I gave a taste of Nickel, a configuration language we are developing at Tweag. One cool feature of Nickel is the ability to validate data and enforce program invariants using so-called contracts. In this post, I introduce the general concept of programming with contracts and illustrate it in Nickel.

Contracts are everywhere

You go to your favorite bakery and buy a croissant. Is there a contract binding you to the baker?

A long time ago, I was puzzled by this very first question of a law class exam. It looked really simple, yet I had absolutely no clue.

“Ehm..No?”

A contract should write down terms and conditions, and be signed by both parties. How could buying a croissant involve such a daunting liability?

Well, I have to confess that this exam didn’t go very well.

It turns out the sheer act of selling something implicitly and automatically establishes a legally binding contract between both parties (at least in France). For once, the programming world is not that different from the physical world: if I see a ConcurrentHashmap class in a Java library, given the context of Java’s naming conventions, I rightfully expect it to be a thread-safe implementation of a hashmap. This is a form of contract. If a programmer uses ConcurrentHashmap to name a class that implements a non-thread safe linked list, they should probably be sent to court.

Contracts may take multiple forms. A contract can be explicit, such as in a formal specification, or implicit, as in the ConcurrentHashMap example. They can be enforced or not, such as a type signature in a statically typed language versus an invariant written as a comment in a dynamically typed language. Here are a few examples:

Contract              Explicitness                              Enforced
Static types          Implicit if inferred, explicit otherwise  Yes, at compile time
Dynamic types         Implicit                                  Yes, at run-time
Documentation         Explicit                                  No
Naming                Implicit                                  No
assert() primitive    Explicit                                  Yes, at run-time
pre/post conditions   Explicit                                  Yes, at run-time or compile time

As often, explicit is better than implicit: it leaves no room for misunderstanding. Enforced is better than not, because I would rather be protected by a proper legal system in case of contract violation.

Programming with Contracts

Until now, I’ve been using the word contract in a wide sense. It turns out contracts also refer to a particular programming paradigm which embodies the general notion pretty well. Such contracts are explicit and enforced, following our terminology. They are most notably used in Racket. From now on, I shall use contract in this more specific sense.

To a first approximation, contracts are assertions. They check that a value satisfies some property at run-time. If the test passes, the execution can go on normally. Otherwise, an error is raised.

In Nickel, one can enforce a contract using the | operator:

let x = (1 + 1 | Num) in 2*x

Here, x is bound to a Num contract. When evaluating x, the following steps are performed:

  1. evaluate 1 + 1
  2. check that the result is a number
  3. if it is, return the expression unchanged. Otherwise, raise an error that halts the program.

Let’s see it in action:

$ nickel <<< '1 + 1 | Num'
Done: Num(2.0)

$ nickel <<< 'false | Num'
error: Blame error: contract broken by a value.
  ┌─ :1:1
  │
1 │ Num
  │ --- expected type
  │
  ┌─ <stdin>:1:9
  │
1 │ false | Num
  │         ^^^ bound here
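Conceptually, a flat contract like Num behaves as a checked identity function: it hands the value back unchanged or blames the offending expression. A rough Python model of this idea (illustrative only, not Nickel's implementation):

```python
class Blame(Exception):
    """Raised when a value breaks a contract."""

def num_contract(label, value):
    # A flat contract returns the value unchanged when the check passes,
    # and blames the offending expression otherwise.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    raise Blame(f"{label}: not a number")

x = num_contract("Num", 1 + 1)   # passes, x == 2
try:
    num_contract("Num", False)
except Blame as e:
    print(e)                     # prints Num: not a number
```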

Contracts versus types

I’ve described contracts as assertions, but the above snippet suspiciously resembles a type annotation. How do contracts compare to types? First of all, contracts are checked at run-time, so they would correspond to dynamic typing rather than static typing. Secondly, contracts can check more than just the membership to a type:

let GreaterThan2 = fun label x =>
  if builtins.isNum x then
    if x > 2 then
      x
    else
      contracts.blame (contracts.tag "smaller or equals" label)
  else
    contracts.blame (contracts.tag "not a number" label)
in

(3 | #GreaterThan2) // Ok, evaluate to 3
(1 | #GreaterThan2) // Err, `smaller or equals`
("a" | #GreaterThan2) // Err, `not a number`

Here, we just built a custom contract. A custom contract is a function of two arguments:

  • the label label, carrying information for error reporting.
  • the value x to be tested.

If the value satisfies the condition, it is returned. Otherwise, a call to blame signals rejection with an optional error message attached via tag. When evaluating value | #Contract, the interpreter calls Contract with an appropriate label and value as arguments.

Such custom contracts can check arbitrary properties. Enforcing the property of being greater than two using static types is rather hard, requiring a fancy type system such as refinement types, while the role of dynamic types generally stops at distinguishing basic datatypes and functions.

Back to our first example 1 + 1 | Num, we could have written instead:

let MyNum = fun label x =>
  if builtins.isNum x then x else contracts.blame label in
(1 + 1 | #MyNum)

This is in fact pretty much what 1 + 1 | Num evaluates to. While a contract is not the same entity as a type, one can derive a contract from any type. Writing 1 + 1 | Num asks the interpreter to derive a contract from the type Num and to check 1 + 1 against it. This is just a convenient syntax to specify common contracts. The # character distinguishes contracts as types from contracts as functions (that is, custom contracts).

To sum up, contracts are just glorified assertions. Also, there is this incredibly convenient syntax that spares us a whole three characters by writing Num instead of #MyNum. So… is that all the fuss is about?

Function contracts

Until now, we have only considered what are called flat contracts, which operate on data. But Nickel is a functional programming language: so what about function contracts? They exist too!

let f | Str -> Num = fun x => if x == "a" then 0 else 1 in ...

Here again, we ask Nickel to derive a contract for us, from the type Str -> Num of functions sending strings to numbers. To find out how this contract could work, we must understand what is the defining property of a function of type Str -> Num that the contract should enforce.

A function of type Str -> Num has a duty: it must produce a number. But what if I call f on a boolean? That's unfair, because the function also has a right: the argument must be a string. The full contract is thus: if you give me a string, I give you a number; if you give me something else, you broke the contract, so I can't guarantee anything. Another way of viewing it is that the left side of the arrow represents preconditions on the input while the right side represents postconditions on the output.

More than flat contracts, function contracts show similarities with traditional legal contracts. We have two parties: the caller, f "b", and the callee, f. Both must meet conditions: the caller must provide a string while the callee must return a number.

In practice, inspecting the term f can at most tell us that it is a function. This is because a function is inert, waiting for an argument before it hands back a result. Consequently, the contract can only fire when f is applied to an argument, in which case it checks that:

  1. The argument satisfies the Str contract
  2. The return value satisfies the Num contract
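These two checks can be modeled in any dynamic language by wrapping the function. A rough sketch of the mechanics in Python (illustrative, not how the Nickel interpreter is written):

```python
def arrow_contract(pre, post):
    # Derive a function contract from a domain and a codomain check:
    # the argument check blames the caller, the result check blames
    # the function (callee).
    def wrap(f):
        def checked(arg):
            if not pre(arg):
                raise TypeError("contract broken by the caller")
            result = f(arg)
            if not post(result):
                raise TypeError("contract broken by the function")
            return result
        return checked
    return wrap

# Model of Str -> Num applied to the example function
f = arrow_contract(lambda x: isinstance(x, str),
                   lambda r: isinstance(r, (int, float)))(
        lambda x: 0 if x == "a" else 1)

print(f("a"))   # prints 0
```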

The interpreter performs additional bookkeeping to be able to correctly blame the offending code in case of a higher-order contract violation:

$ nickel <<< 'let f | Str -> Num = fun x => if x == "a" then 0 else 1 in f "a"'
Done: Num(0.0)

$ nickel <<< '... in f 0'
error: Blame error: contract broken by the caller.
  ┌─ :1:1
  │
1 │ Str -> Num
  │ --- expected type of the argument provided by the caller
  │
  ┌─ <stdin>:1:9
  │
1 │ let f | Str -> Num = fun x => if x == "a" then 0 else 1 in f 0
  │         ^^^^^^^^^^ bound here
[..]

$ nickel <<< 'let f | Str -> Num = fun x => x in f "a"'
error: Blame error: contract broken by a function.
  ┌─ :1:8
  │
1 │ Str -> Num
  │        --- expected return type
  │
  ┌─ <stdin>:1:9
  │
1 │ let f | Str -> Num = fun x => x in f "a"
  │         ^^^^^^^^^^ bound here

These examples illustrate three possible situations:

  1. The contract is honored by both parties.
  2. The contract is broken by the caller, which provides a number instead of a string.
  3. The contract is broken by the function (callee), which rightfully got a string but returned a string instead of a number.

Combined with custom contracts, function contracts make it possible to express succinctly non-trivial invariants:

let f | #GreaterThan2 -> #GreaterThan2 = fun x => x + 1 in ..

A warning about laziness

Nickel is a lazy programming language. This means that expressions, including contracts, are evaluated only if they are needed. If you are experimenting with contracts and some checks buried inside lists or records do not seem to trigger, you can use the deepSeq operator to recursively force the evaluation of all subterms, including contracts:

let exp = ..YOUR CODE WITH CONTRACTS.. in builtins.deepSeq exp exp

Conclusion

In this post, I introduced programming with contracts. Contracts offer a principled and ergonomic way of validating data and enforcing invariants with a good error reporting story. Contracts can express arbitrary properties that are hard to enforce statically, and they can handle higher-order functions.

Contracts also have a special relationship with static typing. While we compared them as competitors somehow, contracts and static types are actually complementary, reunited in the setting of gradual typing. Nickel has gradual types, which will be the subject of a coming post.

The examples here are illustrative, but we’ll see more specific and compelling usages of contracts in yet another coming post about Nickel’s meta-values, which, together with contracts, serve as a unified way to describe and validate configurations.

January 22, 2021 12:00 AM

January 13, 2021

nixbuild.net

Finding Non-determinism with nixbuild.net

During the last decade, many initiatives focussing on making builds reproducible have gained momentum. reproducible-builds.org is a great resource for anyone interested in how the work progresses in multiple software communities. r13y.com tracks the current reproducibility metrics in NixOS.

Nix is particularly suited for working on reproducibility, since it by design isolates builds and comes with tools for finding non-determinism. The Nix community also works on related projects, like Trustix and the content-addressed store.

This blog post summarises how nixbuild.net can be useful for finding non-deterministic builds, and announces a new feature related to reproducibility!

Repeated Builds

The way to find non-reproducible builds is to run the same build multiple times and check for any difference in results, when compared bit-for-bit. Since Nix guarantees that all inputs will be identical between the runs, just finding differing output results is enough to conclude that a build is non-deterministic. Of course, we can never prove that a build is deterministic this way, but if we run the build many times, we gain a certain confidence in it.

To run a Nix build multiple times, simply add the --repeat option to your build command. It will run your build the number of extra times you specify.

Suppose we have the following Nix expression in deterministic.nix:

let
  inherit (import <nixpkgs> {}) runCommand;
in {
  stable = runCommand "stable" {} ''
    touch $out
  '';

  unstable = runCommand "unstable" {} ''
    echo $RANDOM > $out
  '';
}

We can run repeated builds like this (note that the --builders "" option is there to force a local build, to not use nixbuild.net):

$ nix-build deterministic.nix --builders "" -A stable --repeat 1
these derivations will be built:
  /nix/store/0fj164aqyhsciy7x97s1baswygxn8lzf-stable.drv
building '/nix/store/0fj164aqyhsciy7x97s1baswygxn8lzf-stable.drv' (round 1/2)...
building '/nix/store/0fj164aqyhsciy7x97s1baswygxn8lzf-stable.drv' (round 2/2)...
/nix/store/6502c5490rap0c8dhvfwm5rhi22i9clz-stable

$ nix-build deterministic.nix --builders "" -A unstable --repeat 1
these derivations will be built:
  /nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv
building '/nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv' (round 1/2)...
building '/nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv' (round 2/2)...
output '/nix/store/g7a5sf7iwdxs7q12ksrzlvjvz69yfq3l-unstable' of '/nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv' differs from previous round
error: build of '/nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv' failed

Running repeated builds on nixbuild.net works exactly the same way:

$ nix-build deterministic.nix -A stable --repeat 1
these derivations will be built:
  /nix/store/wnd5y30jp3xwpw1bhs4bmqsg5q60vc8i-stable.drv
building '/nix/store/wnd5y30jp3xwpw1bhs4bmqsg5q60vc8i-stable.drv' (round 1/2) on 'ssh://eu.nixbuild.net'...
copying 1 paths...
copying path '/nix/store/z3wlpwgz66ningdbggakqpvl0jp8bp36-stable' from 'ssh://eu.nixbuild.net'...
/nix/store/z3wlpwgz66ningdbggakqpvl0jp8bp36-stable

$ nix-build deterministic.nix -A unstable --repeat 1
these derivations will be built:
  /nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv
building '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' (round 1/2) on 'ssh://eu.nixbuild.net'...
[nixbuild.net] output '/nix/store/srch6l8pyl7z93c7gp1xzf6mq6rwqbaq-unstable' of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' differs from previous round
error: build of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' on 'ssh://eu.nixbuild.net' failed: build was non-deterministic
builder for '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' failed with exit code 1
error: build of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' failed

As you can see, the log output differs slightly between the local and the remote builds. This is because when Nix submits a remote build, it does not do the determinism check itself; instead it leaves it up to the builder (nixbuild.net in our case). This is actually a good thing, because it allows nixbuild.net to perform some optimizations for repeated builds. The following sections enumerate those optimizations.

Finding Non-determinism in Past Builds

If you locally try to rebuild something that has failed due to non-determinism, Nix will build it again at least two times (due to --repeat) and fail it due to non-determinism again, since it keeps no record of the previous build failure (other than the build log).

However, nixbuild.net keeps a record of every build performed, also for repeated builds. So when you try to build the same derivation again, nixbuild.net is smart enough to look at its past build and figure out that the derivation is non-deterministic without having to rebuild it. We can demonstrate this by re-running the last build from the example above:

$ nix-build deterministic.nix -A unstable --repeat 1
these derivations will be built:
  /nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv
building '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' (round 1/2) on 'ssh://eu.nixbuild.net'...
[nixbuild.net] output '/nix/store/srch6l8pyl7z93c7gp1xzf6mq6rwqbaq-unstable' of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' differs from previous round
error: build of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' on 'ssh://eu.nixbuild.net' failed: a previous build of the derivation was non-deterministic
builder for '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' failed with exit code 1
error: build of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' failed

As you can see, the exact same derivation fails again, but now the build status message says: a previous build of the derivation was non-deterministic. This means nixbuild.net didn’t have to run the build, it just checked its past outputs for the derivation and noticed they differed.

When nixbuild.net looks at past builds it considers all outputs that have been signed by a key that the account trusts. That means that it can even compare outputs that have been fetched by substitution.

Scaling Out Repeated Builds

When you use --repeat, nixbuild.net will create multiple copies of the build and schedule all of them like any other build would have been scheduled. This means that every repeated build will run in parallel, saving time for the user. As soon as nixbuild.net has found proof of non-determinism, any repeated build still running will be cancelled.

Provoking Non-determinism through Filesystem Randomness

As promised in the beginning of this blog post, we have a new feature to announce! nixbuild.net is now able to inject randomness into the filesystem that the builds see when they run. This can be used to provoke builds to uncover non-deterministic behavior.

The idea is not new; it is in fact the exact same concept as has been implemented in the disorderfs project by reproducible-builds.org. However, we're happy to make it easily accessible to nixbuild.net users. The feature is disabled by default, but can be enabled through a new user setting.

For the moment, the implementation will return directory entries in a random order when enabled. In the future we might inject more metadata randomness.

To demonstrate this feature, let’s use this build:

let
  inherit (import <nixpkgs> {}) runCommand;
in rec {
  files = runCommand "files" {} ''
    mkdir $out
    touch $out/{1..10}
  '';

  list = runCommand "list" {} ''
    ls -f ${files} > $out
  '';
}

The files build just creates ten empty files as its output, and the list build lists those files with ls. The -f option of ls disables sorting entirely, so the file names will be printed in the order the filesystem returns them. This means that the build output will depend on how the underlying filesystem is implemented, which is non-deterministic behavior.
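The same failure mode is easy to reproduce outside Nix: any build step that iterates a directory without sorting inherits whatever order the filesystem returns. A small, self-contained illustration (plain Python, independent of nixbuild.net):

```python
import os
import random
import tempfile

def listdir_shuffled(path):
    # Stand-in for a filesystem (or disorderfs-style layer) that returns
    # directory entries in a random order.
    entries = os.listdir(path)
    random.shuffle(entries)
    return entries

d = tempfile.mkdtemp()
for i in range(1, 11):
    open(os.path.join(d, str(i)), "w").close()

run1 = listdir_shuffled(d)
run2 = listdir_shuffled(d)
# The unsorted listings may differ between runs; sorting restores a
# deterministic result regardless of filesystem order.
assert sorted(run1) == sorted(run2)
```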

First, we build it locally with --repeat:

$ nix-build non-deterministic-fs.nix --builders "" -A list --repeat 1
these derivations will be built:
  /nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv
building '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' (round 1/2)...
building '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' (round 2/2)...
/nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list

As you can see, the build succeeded. Then we delete the result from our Nix store so we can run the build again:

rm result
nix-store --delete /nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list

We enable the inject-fs-randomness feature through the nixbuild.net shell:

nixbuild.net> set inject-fs-randomness true

Then we run the build (with --repeat) on nixbuild.net:

$ nix-build non-deterministic-fs.nix -A list --repeat 1
these derivations will be built:
  /nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv
building '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' (round 1/2) on 'ssh://eu.nixbuild.net'...
copying 1 paths...
copying path '/nix/store/vl13q40hqp4q8x6xjvx0by06s1v9g3jy-files' to 'ssh://eu.nixbuild.net'...
[nixbuild.net] output '/nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list' of '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' differs from previous round
error: build of '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' on 'ssh://eu.nixbuild.net' failed: build was non-deterministic
builder for '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' failed with exit code 1
error: build of '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' failed

Now, nixbuild.net found the non-determinism! We can double check that the directory entries are in a random order by running without --repeat:

$ nix-build non-deterministic-fs.nix -A list
these derivations will be built:
  /nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv
building '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' on 'ssh://eu.nixbuild.net'...
copying 1 paths...
copying path '/nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list' from 'ssh://eu.nixbuild.net'...
/nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list

$ cat /nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list
6
1
2
5
10
7
8
..
9
4
3
.

Future Work

There are lots of possibilities for improving the utility of nixbuild.net when it comes to reproducible builds. Your feedback and ideas are very welcome at support@nixbuild.net.

Here are some of the things that could be done:

  • Make it possible to trigger repeated builds for any previous build, without submitting a new build with Nix. For example, there could be a command in the nixbuild.net shell allowing a user to trigger a repeated build and report back any non-determinism issues.

  • Implement functionality similar to diffoscope to be able to find out exactly what differs between builds. This could be available as a shell command or through an API.

  • Make it possible to download specific build outputs. The way Nix downloads outputs (and stores them locally) doesn’t allow for having multiple variants of the same output, but nixbuild.net could provide this functionality through the shell or an API.

  • Inject more randomness inside the sandbox. Since we have complete control over the sandbox environment we can introduce more differences between repeated builds to provoke non-determinism. For example, we can schedule builds on different hardware or use different kernels between repeated builds.

  • Add support for listing known non-deterministic derivations.

by nixbuild.net (support@nixbuild.net) at January 13, 2021 12:00 AM

December 29, 2020

nixbuild.net

The First Year

One year ago nixbuild.net was announced to the Nix community for the very first time. The service then ran as a closed beta for 7 months until it was made generally available on the 28th of August 2020.

This blog post will try to summarize how nixbuild.net has evolved since GA four months ago, and give a glimpse of the future for the service.

Stability and Performance

Thousands of Nix builds have been built by nixbuild.net so far, and every build helps in making the service more reliable by uncovering possible edge cases in the build environment.

These are some of the stability-related improvements and fixes that have been deployed since GA:

  • Better detection and handling of builds that time out or hang.

  • Improved retry logic should our backend storage not deliver Nix closures as expected.

  • Fixes to the virtual file system inside the KVM sandbox.

  • Better handling of builds that have binary data in their log output.

  • Changes to the virtual sandbox environment so it looks even more like a “standard” Linux environment.

  • Application of the Nix sandbox inside our KVM sandbox. This basically guarantees that the Nix environment provided through nixbuild.net is identical to the Nix environment for local builds.

  • Support for following HTTP redirects from binary caches.

Even Better Build Reuse

One of the fundamental ideas in nixbuild.net is to try as hard as possible to avoid running your builds if an existing build result can be reused instead. We can trivially reuse an account’s own builds since they are implicitly trusted by the user, but untrusted builds can also be reused under certain circumstances. This has been described in detail in an earlier blog post.

Since GA we’ve introduced a number of new ways build results can be reused.

Reuse of Build Failures

Build failures are now also reused. This means that if someone submits a build that is identical (in the sense that the derivation and its transitive input closure are bit-by-bit identical) to a previously failed build, nixbuild.net will immediately serve back the failed result instead of re-running the build. You will even get the build log replayed.

Build failures can be reused since we are confident that our sandbox is pure, meaning that it will behave exactly the same as long as the build is exactly the same. Only non-transient failures will be reused. So if the builder misbehaves in some way that is outside Nix’s control, that failure will not be reused. This can happen if the builder machine breaks down or something similar. In such cases we will automatically re-run the build anyway.

When we fix bugs or make major changes in our sandbox it can happen that we alter the behavior in terms of which builds succeed or fail. For example, we could find a build that fails just because we have missed implementing some specific detail in the sandbox. Once that is fixed, we don’t want to reuse such failures. To avoid that, all existing build failures will be “invalidated” on each major update of the sandbox.

If a user really wants to re-run a failed build on nixbuild.net, failure reuse can be turned off using the new user settings (see below).

Reuse of Build Timeouts

In a similar vein to reused build failures, we can also reuse build timeouts. This is not enabled by default, since users can select different timeout limits. A user can activate reuse of build timeouts through the user settings.

The reuse of timed out builds works like this: Each time a new build is submitted, we check if we have any previous build results of the exact same build. If no successful results or plain failures are found, we look for builds that have timed out. We then check if any of the existing timed out builds ran for longer than the user-specified timeout for the new build. If we can find such a result, it will be served back to the user instead of re-running the build.

This feature can be very useful if you want to avoid re-running builds that time out over and over again (which can be a very time-consuming exercise). For example, say that you have your build timeout set to two hours, and some input needed for a build takes longer than that to build. The first time that input is needed you have to wait two hours to detect that the build will fail. If you then try building something else that happens to depend on the very same input, you will save two hours by directly being served the build failure from nixbuild.net!
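The reuse rules described above can be sketched as follows; this Python is our own illustrative reconstruction, not nixbuild.net’s actual implementation, and all names are hypothetical.

```python
def reusable_timeout(new_timeout_secs, previous_results):
    """Decide whether a previous result can be served back for a new build.

    previous_results is a list of (kind, duration_secs) tuples, where kind is
    'success', 'failure' or 'timeout'. Successful results and plain failures
    take precedence; otherwise a timed-out build is reusable only if some
    earlier attempt already ran at least as long as the new build's timeout.
    """
    # Successful results or plain failures are always preferred.
    for kind, _ in previous_results:
        if kind in ('success', 'failure'):
            return kind
    # Reuse a timeout only if an earlier run exceeded the new limit.
    for kind, duration in previous_results:
        if kind == 'timeout' and duration >= new_timeout_secs:
            return 'timeout'
    return None  # no reusable result; the build must run

# A build that previously timed out after 3 hours can be reused for a new
# submission whose timeout is only 2 hours, but not for a 4-hour timeout.
print(reusable_timeout(2 * 3600, [('timeout', 3 * 3600)]))  # timeout
print(reusable_timeout(4 * 3600, [('timeout', 3 * 3600)]))  # None
```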

Wait for Running Builds

When a new build is submitted, nixbuild.net will now check if there is any identical build currently running (after checking for previous build results or failures). If there is, the new build will simply hold until the running build has finished. After that, the result of the running build will likely be served back as the result of the new build (as long as the running build wasn’t terminated in a transient way, in which case the new build will have to run from scratch). The identical running builds are checked and reused across accounts.

Before this change, nixbuild.net would simply start another build in parallel even if the builds were identical.

New Features

User Settings

A completely new feature has been launched since GA: User Settings. This allows end users to tweak the behavior of nixbuild.net. For example, the build reuse described above can be controlled by user settings. Other settings include controlling the maximum build time used per month, and the possibility to lock down specific SSH keys, which is useful in CI setups.

The user settings can be set in various ways: through the nixbuild.net shell, the SSH client environment and even through the Nix derivations themselves.

Even if many users probably never need to change any settings, it can be helpful to read through the documentation to get a feeling for what is possible. If you need to differentiate permissions in any way (different settings for account administrators, developers, CI etc) you should definitely look into the various user settings.

GitHub CI Action

A GitHub Action has been published. This action makes it very easy to use nixbuild.net as a remote Nix builder in your GitHub Actions workflows. Instead of running your Nix builds on the two vCPUs provided by GitHub you can now enjoy scale-out Nix builds on nixbuild.net with minimal setup required.

The nixbuild.net GitHub Action is developed by the nixbuild.net team, and there are plans to add more of the functionality that nixbuild.net can offer users, like automatically generated cost and performance reports for your Nix builds.
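As an illustration, a minimal workflow using the action might look like the sketch below; the action references, version tags and input name are assumptions, so the action’s own README is the authoritative source.

```yaml
# Sketch of a workflow using nixbuild.net as a remote builder.
# Action names, versions and the secret name are assumptions.
name: ci
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: cachix/install-nix-action@v12
      - uses: nixbuild/nixbuild-action@v2
        with:
          nixbuild_ssh_key: ${{ secrets.NIXBUILD_SSH_KEY }}
      - run: nix-build
```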

Shell Improvements

Various minor improvements have been made to the nixbuild.net shell. For example, it is now much easier to get an overview of how large your next invoice will be, through the usage command.

The Future

After one year of real world usage, we are very happy with the progress of nixbuild.net. It has been well received in the Nix community, proved both reliable and scalable, and it has delivered on our initial vision of a simple service that can integrate into any setup using Nix.

We feel that we can go anywhere from here, but we also realize that we must be guided by our users’ needs. We have compiled a small and informal roadmap below. The items on this list are things that we, based on the feedback we’ve received throughout the year, think are natural next steps for nixbuild.net.

The roadmap has no dates and no prioritization, and should be seen as merely a hint about which direction the development is heading. Any questions or comments concerning this list (or what’s missing from it) are very welcome at support@nixbuild.net.

Support aarch64-linux Builds

Work is already underway to add support for aarch64-linux builds to nixbuild.net, and so far it is looking good. With the current surge in performant ARM hardware (Apple M1, Ampere Altra etc), we think having aarch64 support in nixbuild.net is an obvious feature. It is also something that has been requested by our users.

We don’t know yet how the pricing of aarch64 builds will look, or what scalability promises we can make. If you are interested in evaluating aarch64 builds on nixbuild.net in an early access setting, just send us an email to support@nixbuild.net.

Provide an API over SSH and HTTP

Currently the nixbuild.net shell is the administrative tool we offer end users. We will keep developing the shell and make it more intuitive for interactive use, but we will also add an alternative, more scriptable variant of the shell.

This alternative version will provide roughly the same functionality as the original shell, only more adapted to scripting instead of interactive use. The reason for providing such an SSH-based API is to make it easy to integrate nixbuild.net more tightly into CI and similar scenarios.

There is in fact already a tiny version of this API deployed. You can run the following command to try it out:

$ ssh eu.nixbuild.net api show public-signing-key
{"keyName":"nixbuild.net/bob-1","publicKey":"PmUhzAc4Ug6sf1uG8aobbqMdalxW41SHWH7FE0ie1BY="}

The above API command is in use by the nixbuild-action for GitHub. So far, this is the only API command implemented, and it should be seen as a very first proof of concept. Nothing has been decided on how the API should look and work in the future.

The API will also be offered over HTTP in addition to SSH.

Upload builds to binary caches

Adding custom binary caches that nixbuild.net can fetch dependencies from is supported today, although such requests are still handled manually through support.

We also want to support uploading to custom binary caches. That way users could gain performance by not having to first download build results from nixbuild.net and then upload them somewhere else. This could be very useful for CI setups that can spend a considerable amount of their time just uploading closures.

Provide an HTTP-based binary cache

Using nixbuild.net as a binary cache is handy since you don’t have to wait for any uploads after a build has finished. Instead, the closures will be immediately available in the binary cache, backed by nixbuild.net.

It is actually possible to use nixbuild.net as a binary cache today, by configuring an SSH-based cache (ssh://eu.nixbuild.net). This works out of the box right now. You can even use nix-copy-closure to upload paths to nixbuild.net. We just don’t yet give any guarantees on how long store paths are kept.
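For example, adding the SSH store as a substituter is just a matter of Nix configuration; the fragment below is a sketch based on the store URI above, and you would additionally need to add nixbuild.net’s public signing key (see the api show public-signing-key command earlier) to trusted-public-keys.

```
# Sketch of nix.conf lines adding nixbuild.net's SSH store as a substituter;
# the account's public signing key must also be added to trusted-public-keys.
substituters = ssh://eu.nixbuild.net https://cache.nixos.org
```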

However, there are benefits to providing an HTTP-based cache. It would most probably have better performance (serving nar files over HTTP instead of using the nix-store protocol over SSH), but more importantly it would let us use a CDN for serving cache contents. This could help mitigate the fact that nixbuild.net is only deployed in Europe so far.

Support builds that use KVM

The primary motivation for this is to be able to run NixOS tests (with good performance) on nixbuild.net.

Thank You!

Finally we’d like to thank all our users. We look forward to an exciting new year with lots of Nix builds!

by nixbuild.net (support@nixbuild.net) at December 29, 2020 12:00 AM

December 24, 2020

Cachix

Postmortem of outage on 20th December

On 20 December, Cachix experienced a six-hour downtime, the second significant outage since the service started operating on 1 June 2018. Here are the details of what exactly happened and what has been done to prevent similar events from happening.

Timeline (UTC)

02:55:07 - Backend starts to emit errors for all HTTP requests
02:56:00 - Pagerduty tries to notify me of the outage via email, phone and mobile app
09:01:00 - I wake up and see the notifications
09:02:02 - Backend is restarted and recovers

Root cause analysis

All ~24k HTTP requests that reached the backend during the outage failed with the following exception:

by Domen Kožar (support@cachix.org) at December 24, 2020 11:30 AM