NixOS Planet

January 13, 2021

nixbuild.net

Finding Non-determinism with nixbuild.net

During the last decade, many initiatives focussing on making builds reproducible have gained momentum. reproducible-builds.org is a great resource for anyone interested in how the work progresses in multiple software communities. r13y.com tracks the current reproducibility metrics in NixOS.

Nix is particularly suited for working on reproducibility, since it by design isolates builds and comes with tools for finding non-determinism. The Nix community also works on related projects, like Trustix and the content-addressed store.

This blog post summarises how nixbuild.net can be useful for finding non-deterministic builds, and announces a new feature related to reproducibility!

Repeated Builds

The way to find non-reproducible builds is to run the same build multiple times and compare the results bit for bit. Since Nix guarantees that all inputs will be identical between the runs, finding differing outputs is enough to conclude that a build is non-deterministic. Of course, we can never prove that a build is deterministic this way, but if we run the build many times, we gain a certain confidence in it.

To run a Nix build multiple times, simply add the --repeat option to your build command. It will run your build the number of extra times you specify.

Suppose we have the following Nix expression in deterministic.nix:

let
  inherit (import <nixpkgs> {}) runCommand;
in {
  stable = runCommand "stable" {} ''
    touch $out
  '';

  unstable = runCommand "unstable" {} ''
    echo $RANDOM > $out
  '';
}

We can run repeated builds like this (note that the --builders "" option forces a local build, so nixbuild.net is not used):

$ nix-build deterministic.nix --builders "" -A stable --repeat 1
these derivations will be built:
  /nix/store/0fj164aqyhsciy7x97s1baswygxn8lzf-stable.drv
building '/nix/store/0fj164aqyhsciy7x97s1baswygxn8lzf-stable.drv' (round 1/2)...
building '/nix/store/0fj164aqyhsciy7x97s1baswygxn8lzf-stable.drv' (round 2/2)...
/nix/store/6502c5490rap0c8dhvfwm5rhi22i9clz-stable

$ nix-build deterministic.nix --builders "" -A unstable --repeat 1
these derivations will be built:
  /nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv
building '/nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv' (round 1/2)...
building '/nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv' (round 2/2)...
output '/nix/store/g7a5sf7iwdxs7q12ksrzlvjvz69yfq3l-unstable' of '/nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv' differs from previous round
error: build of '/nix/store/psmn1s3bb97989w5a5b1gmjmprqcmf0k-unstable.drv' failed

Running repeated builds on nixbuild.net works exactly the same way:

$ nix-build deterministic.nix -A stable --repeat 1
these derivations will be built:
  /nix/store/wnd5y30jp3xwpw1bhs4bmqsg5q60vc8i-stable.drv
building '/nix/store/wnd5y30jp3xwpw1bhs4bmqsg5q60vc8i-stable.drv' (round 1/2) on 'ssh://eu.nixbuild.net'...
copying 1 paths...
copying path '/nix/store/z3wlpwgz66ningdbggakqpvl0jp8bp36-stable' from 'ssh://eu.nixbuild.net'...
/nix/store/z3wlpwgz66ningdbggakqpvl0jp8bp36-stable

$ nix-build deterministic.nix -A unstable --repeat 1
these derivations will be built:
  /nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv
building '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' (round 1/2) on 'ssh://eu.nixbuild.net'...
[nixbuild.net] output '/nix/store/srch6l8pyl7z93c7gp1xzf6mq6rwqbaq-unstable' of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' differs from previous round
error: build of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' on 'ssh://eu.nixbuild.net' failed: build was non-deterministic
builder for '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' failed with exit code 1
error: build of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' failed

As you can see, the log output differs slightly between the local and the remote builds. This is because when Nix submits a remote build, it will not do the determinism check itself; instead, it leaves the check to the builder (nixbuild.net in our case). This is actually a good thing, because it allows nixbuild.net to perform some optimizations for repeated builds. The following sections describe those optimizations.

Finding Non-determinism in Past Builds

If you locally try to rebuild something that has failed due to non-determinism, Nix will build it again at least two times (due to --repeat) and fail again with the same non-determinism error, since it keeps no record of the previous build failure (other than the build log).

However, nixbuild.net keeps a record of every build performed, including repeated builds. So when you try to build the same derivation again, nixbuild.net is smart enough to look at its past builds and figure out that the derivation is non-deterministic without having to rebuild it. We can demonstrate this by re-running the last build from the example above:

$ nix-build deterministic.nix -A unstable --repeat 1
these derivations will be built:
  /nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv
building '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' (round 1/2) on 'ssh://eu.nixbuild.net'...
[nixbuild.net] output '/nix/store/srch6l8pyl7z93c7gp1xzf6mq6rwqbaq-unstable' of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' differs from previous round
error: build of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' on 'ssh://eu.nixbuild.net' failed: a previous build of the derivation was non-deterministic
builder for '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' failed with exit code 1
error: build of '/nix/store/6im1drv4pklqn8ziywbn44vq8am977vm-unstable.drv' failed

As you can see, the exact same derivation fails again, but now the build status message says: a previous build of the derivation was non-deterministic. This means nixbuild.net didn’t have to run the build, it just checked its past outputs for the derivation and noticed they differed.

When nixbuild.net looks at past builds it considers all outputs that have been signed by a key that the account trusts. That means that it can even compare outputs that have been fetched by substitution.

Scaling Out Repeated Builds

When you use --repeat, nixbuild.net will create multiple copies of the build and schedule all of them like any other build would have been scheduled. This means that every repeated build will run in parallel, saving time for the user. As soon as nixbuild.net has found proof of non-determinism, any repeated build still running will be cancelled.

Provoking Non-determinism through Filesystem Randomness

As promised in the beginning of this blog post, we have a new feature to announce! nixbuild.net is now able to inject randomness into the filesystem that builds see when they run. This can be used to provoke builds into revealing non-deterministic behavior.

The idea is not new; it is in fact the same concept that has been implemented in the disorderfs project by reproducible-builds.org. However, we’re happy to make it easily accessible to nixbuild.net users. The feature is disabled by default, but can be enabled through a new user setting.

For the moment, when the feature is enabled, directory entries are returned in random order. In the future we might inject randomness into more filesystem metadata.

To demonstrate this feature, let’s use this build:

let
  inherit (import <nixpkgs> {}) runCommand;
in rec {
  files = runCommand "files" {} ''
    mkdir $out
    touch $out/{1..10}
  '';

  list = runCommand "list" {} ''
    ls -f ${files} > $out
  '';
}

The files build just creates ten empty files as its output, and the list build lists those files with ls. The -f option of ls disables sorting entirely, so the file names will be printed in the order the filesystem returns them. This means that the build output will depend on how the underlying filesystem is implemented, which can be considered non-deterministic behavior.

First, we build it locally with --repeat:

$ nix-build non-deterministic-fs.nix --builders "" -A list --repeat 1
these derivations will be built:
  /nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv
building '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' (round 1/2)...
building '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' (round 2/2)...
/nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list

As you can see, the build succeeded. Then we delete the result from our Nix store so we can run the build again:

rm result
nix-store --delete /nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list

We enable the inject-fs-randomness feature through the nixbuild.net shell:

nixbuild.net> set inject-fs-randomness true

Then we run the build (with --repeat) on nixbuild.net:

$ nix-build non-deterministic-fs.nix -A list --repeat 1
these derivations will be built:
  /nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv
building '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' (round 1/2) on 'ssh://eu.nixbuild.net'...
copying 1 paths...
copying path '/nix/store/vl13q40hqp4q8x6xjvx0by06s1v9g3jy-files' to 'ssh://eu.nixbuild.net'...
[nixbuild.net] output '/nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list' of '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' differs from previous round
error: build of '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' on 'ssh://eu.nixbuild.net' failed: build was non-deterministic
builder for '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' failed with exit code 1
error: build of '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' failed

Now, nixbuild.net found the non-determinism! We can double check that the directory entries are in a random order by running without --repeat:

$ nix-build non-deterministic-fs.nix -A list
these derivations will be built:
  /nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv
building '/nix/store/153s3ir379cy27wpndd94qlfhz0wj71v-list.drv' on 'ssh://eu.nixbuild.net'...
copying 1 paths...
copying path '/nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list' from 'ssh://eu.nixbuild.net'...
/nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list

$ cat /nix/store/h1591y02qff8vls5v41khgjz2zpdr2mg-list
6
1
2
5
10
7
8
..
9
4
3
.

Future Work

There are lots of possibilities to improve the utility of nixbuild.net when it comes to reproducible builds. Your feedback and ideas are very welcome at support@nixbuild.net.

Here are some of the things that could be done:

  • Make it possible to trigger repeated builds for any previous build, without submitting a new build with Nix. For example, there could be a command in the nixbuild.net shell allowing a user to trigger a repeated build and report back any non-determinism issues.

  • Implement functionality similar to diffoscope to be able to find out exactly what differs between builds. This could be available as a shell command or through an API.

  • Make it possible to download specific build outputs. The way Nix downloads outputs (and stores them locally) doesn’t allow for having multiple variants of the same output, but nixbuild.net could provide this functionality through the shell or an API.

  • Inject more randomness inside the sandbox. Since we have complete control over the sandbox environment we can introduce more differences between repeated builds to provoke non-determinism. For example, we can schedule builds on different hardware or use different kernels between repeated builds.

  • Add support for listing known non-deterministic derivations.

by nixbuild.net (support@nixbuild.net) at January 13, 2021 12:00 AM

December 29, 2020

nixbuild.net

The First Year

One year ago nixbuild.net was announced to the Nix community for the very first time. The service then ran as a closed beta for 7 months until it was made generally available on the 28th of August 2020.

This blog post will try to summarize how nixbuild.net has evolved since GA four months ago, and give a glimpse of the future for the service.

Stability and Performance

Thousands of Nix builds have been built by nixbuild.net so far, and every build helps in making the service more reliable by uncovering possible edge cases in the build environment.

These are some of the stability-related improvements and fixes that have been deployed since GA:

  • Better detection and handling of builds that time out or hang.

  • Improved retry logic should our backend storage not deliver Nix closures as expected.

  • Fixes to the virtual file system inside the KVM sandbox.

  • Better handling of builds that have binary data in their log output.

  • Changes to the virtual sandbox environment so it looks even more like a “standard” Linux environment.

  • Application of the Nix sandbox inside our KVM sandbox. This basically guarantees that the Nix environment provided through nixbuild.net is identical to the Nix environment for local builds.

  • Support for following HTTP redirects from binary caches.

Even Better Build Reuse

One of the fundamental ideas in nixbuild.net is to try as hard as possible to not build your builds, if an existing build result can be reused instead. We can trivially reuse an account’s own builds since they are implicitly trusted by the user, but untrusted builds can also be reused under certain circumstances. This has been described in detail in an earlier blog post.

Since GA we’ve introduced a number of new ways build results can be reused.

Reuse of Build Failures

Build failures are now also reused. This means that if someone tries to build a build that is identical (in the sense that the derivation and its transitive input closure are bit-by-bit identical) to a previously failed build, nixbuild.net will immediately serve back the failed result instead of re-running the build. You will even get the build log replayed.

Build failures can be reused since we are confident that our sandbox is pure, meaning that it will behave exactly the same as long as the build is exactly the same. Only non-transient failures will be reused. So if the builder misbehaves in some way that is outside Nix’s control, that failure will not be reused. This can happen if the builder machine breaks down or something similar. In such cases we will automatically re-run the build anyway.

When we fix bugs or make major changes in our sandbox it can happen that we alter the behavior in terms of which builds succeed or fail. For example, we could find a build that fails just because we have missed implementing some specific detail in the sandbox. Once that is fixed, we don’t want to reuse such failures. To avoid that, all existing build failures will be “invalidated” on each major update of the sandbox.

If a user really wants to re-run a failed build on nixbuild.net, failure reuse can be turned off using the new user settings (see below).

Reuse of Build Timeouts

In a similar vein to reused build failures, we can also reuse build timeouts. This is not enabled by default, since users can select different timeout limits. A user can activate reuse of build timeouts through the user settings.

The reuse of timed out builds works like this: Each time a new build is submitted, we check if we have any previous build results of the exact same build. If no successful results or plain failures are found, we look for builds that have timed out. We then check if any of the existing timed out builds ran for longer than the user-specified timeout for the new build. If we can find such a result, it will be served back to the user instead of re-running the build.

This feature can be very useful if you want to avoid re-running builds that time out over and over again (which can be a very time-consuming exercise). For example, say that you have your build timeout set to two hours, and some input needed for a build takes longer than that to build. The first time that input is needed you have to wait two hours to detect that the build will fail. If you then try building something else that happens to depend on the very same input you will save two hours by directly being served the build failure from nixbuild.net!

Wait for Running Builds

When a new build is submitted, nixbuild.net will now check if there is any identical build currently running (after checking for previous build results or failures). If there is, the new build will simply hold until the running build has finished. After that, the result of the running build will likely be served back as the result of the new build (as long as the running build wasn’t terminated in a transient way, in which case the new build will have to run from scratch). Identical running builds are detected and reused across accounts.

Before this change, nixbuild.net would simply start another build in parallel even if the builds were identical.

New Features

User Settings

A completely new feature has been launched since GA: User Settings. This allows end users to tweak the behavior of nixbuild.net. For example, the build reuse described above can be controlled by user settings. Other settings include controlling the maximum build time used per month, and the possibility to lock down specific SSH keys, which is useful in CI setups.

The user settings can be set in various ways: through the nixbuild.net shell, the SSH client environment and even through the Nix derivations themselves.

Even though many users will probably never need to change any settings, it can be helpful to read through the documentation to get a feeling for what is possible. If you need to differentiate permissions in any way (different settings for account administrators, developers, CI etc.) you should definitely look into the various user settings.

GitHub CI Action

A GitHub Action has been published. This action makes it very easy to use nixbuild.net as a remote Nix builder in your GitHub Actions workflows. Instead of running your Nix builds on the two vCPUs provided by GitHub you can now enjoy scale-out Nix builds on nixbuild.net with minimal setup required.

The nixbuild.net GitHub Action is developed by the nixbuild.net team, and there are plans to add more of the functionality that nixbuild.net can offer its users, like automatically generated cost and performance reports for your Nix builds.

Shell Improvements

Various minor improvements have been made to the nixbuild.net shell. For example, it is now much easier to get an overview of how large your next invoice will be, through the usage command.

The Future

After one year of real world usage, we are very happy with the progress of nixbuild.net. It has been well received in the Nix community, proved both reliable and scalable, and it has delivered on our initial vision of a simple service that can integrate into any setup using Nix.

We feel that we can go anywhere from here, but we also realize that we must be guided by our users’ needs. We have compiled a small and informal roadmap below. The items on this list are things that we, based on the feedback we’ve received throughout the year, think are natural next steps for nixbuild.net.

The roadmap has no dates and no prioritization, and should be seen as merely a hint about which direction the development is heading. Any question or comment concerning this list (or what’s missing from the list) is very welcome to support@nixbuild.net.

Support aarch64-linux Builds

Work is already underway to add support for aarch64-linux builds to nixbuild.net, and so far it is looking good. With the current surge in performant ARM hardware (Apple M1, Ampere Altra etc), we think having aarch64 support in nixbuild.net is an obvious feature. It is also something that has been requested by our users.

We don’t know yet how the pricing of aarch64 builds will look, or what scalability promises we can make. If you are interested in evaluating aarch64 builds on nixbuild.net in an early access setting, just send us an email to support@nixbuild.net.

Provide an API over SSH and HTTP

Currently the nixbuild.net shell is the administrative tool we offer end users. We will keep developing the shell and make it more intuitive for interactive use. But we will also add an alternative, more scriptable variant of the shell.

This alternative version will provide roughly the same functionality as the original shell, only more adapted to scripting instead of interactive use. The reason for providing such an SSH-based API is to make it easy to integrate nixbuild.net more tightly into CI and similar scenarios.

There is in fact already a tiny version of this API deployed. You can run the following command to try it out:

$ ssh eu.nixbuild.net api show public-signing-key
{"keyName":"nixbuild.net/bob-1","publicKey":"PmUhzAc4Ug6sf1uG8aobbqMdalxW41SHWH7FE0ie1BY="}

The above API command is in use by the nixbuild-action for GitHub. So far, this is the only API command implemented, and it should be seen as a very first proof of concept. Nothing has been decided on how the API should look and work in the future.

The API will also be offered over HTTP in addition to SSH.

Upload builds to binary caches

Adding custom binary caches that nixbuild.net can fetch dependencies from is supported today, although such requests are still handled manually through support.

We also want to support uploading to custom binary caches. That way users could gain performance by not having to first download build results from nixbuild.net and then upload them somewhere else. This could be very useful for CI setups that can spend a considerable amount of their time just uploading closures.

Provide an HTTP-based binary cache

Using nixbuild.net as a binary cache is handy since you don’t have to wait for any uploads after a build has finished. Instead, the closures will be immediately available in the binary cache, backed by nixbuild.net.

It is actually possible to use nixbuild.net as a binary cache today, by configuring an SSH-based cache (ssh://eu.nixbuild.net). This works out of the box right now. You can even use nix-copy-closure to upload paths to nixbuild.net. We just don’t yet give any guarantees on how long store paths are kept.
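
For illustration, here is a rough, untested sketch of how that can look from the command line, based purely on the description above (my-project.nix and the build result are placeholders of mine, not taken from the post):

# Upload the closure of a local build result to nixbuild.net
$ nix-copy-closure --to eu.nixbuild.net $(readlink ./result)

# Ask Nix to consider nixbuild.net as an additional substituter for one build
$ nix-build my-project.nix --option extra-substituters "ssh://eu.nixbuild.net"

Depending on your Nix configuration, substituting from the SSH cache may also require trusting the nixbuild.net signing key (see the api show public-signing-key command above) in trusted-public-keys.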

However, there are benefits to providing an HTTP-based cache. It would most probably have better performance (serving nar files over HTTP instead of using the nix-store protocol over SSH), but more importantly it would let us use a CDN for serving cache contents. This could help mitigate the fact that nixbuild.net is only deployed in Europe so far.

Support builds that use KVM

The primary motivation for this is to be able to run NixOS tests (with good performance) on nixbuild.net.

Thank You!

Finally we’d like to thank all our users. We look forward to an exciting new year with lots of Nix builds!

by nixbuild.net (support@nixbuild.net) at December 29, 2020 12:00 AM

December 24, 2020

Cachix

Postmortem of outage on 20th December

On 20 December, Cachix experienced a six-hour downtime, the second significant outage since the service started operating on 1 June 2018. Here are the details of what exactly happened and what has been done to prevent similar events from happening.

Timeline (UTC)

  • 02:55:07 - Backend starts to emit errors for all HTTP requests
  • 02:56:00 - Pagerduty tries to notify me of outage via email, phone and mobile app
  • 09:01:00 - I wake up and see the notifications
  • 09:02:02 - Backend is restarted and recovers

Root cause analysis

All ~24k HTTP requests that reached the backend during the outage failed with the following exception:

by Domen Kožar (support@cachix.org) at December 24, 2020 11:30 AM

December 23, 2020

Ollie Charles

Monad Transformers and Effects with Backpack

A good few years ago Edward Yang gifted us an implementation of Backpack - a way for us to essentially abstract modules over other modules, allowing us to write code independently of implementation. A big benefit of doing this is that it opens up new avenues for program optimization. When we provide concrete instantiations of signatures, GHC compiles it as if that were the original code we wrote, and we can benefit from a lot of specialization. So aside from organizational concerns, Backpack gives us the ability to write some really fast code. This benefit isn’t just theoretical - Edward Kmett gave us unpacked-containers, removing a level of indirection from all keys, and Oleg Grenrus showed us how we can use Backpack to “unroll” fixed sized vectors. In this post, I want to show how we can use Backpack to give us the performance benefits of explicit transformers, but without having library code commit to any specific stack. In short, we get the ability to have multiple interpretations of our program, but without paying the performance cost of abstraction.

The Problem

Before we start looking at any code, let’s look at some requirements, and understand the problems that come with some potential solutions. The main requirement is that we are able to write code that requires some effects (in essence, writing our code to an effect interface), and then run this code with different interpretations. For example, in production I might want to run as fast as possible, in local development I might want further diagnostics, and in testing I might want a pure or in memory solution. This change in representation shouldn’t require me to change the underlying library code.

Seasoned Haskellers might be familiar with the use of effect systems to solve these kinds of problems. Perhaps the most familiar is the mtl approach - perhaps unfortunately named, as the technique itself doesn’t have much to do with the library. In the mtl approach, we write our interfaces as type classes abstracting over some Monad m, and then provide instances of these type classes - either by stacking transformers (“plucking constraints”, in the words of Matt Parson), or by a “mega monad” that implements many of these instances at once (e.g., Tweag’s capability approach).
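
As a minimal illustration of that style (this snippet is my own and not from the post; the class and functions are made up for demonstration), an effect interface and library code written against it might look like this:

-- A hypothetical effect interface in the mtl style: library code is written
-- against the class, and each concrete monad supplies an instance.
class Monad m => MonadKeyValue m where
  getKey :: String -> m (Maybe String)
  putKey :: String -> String -> m ()

-- Library code only mentions the constraint; the concrete monad (and hence
-- the interpretation) is chosen by whoever instantiates m.
copyKey :: MonadKeyValue m => String -> String -> m ()
copyKey from to = getKey from >>= maybe (pure ()) (putKey to)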

Despite a few annoyances (e.g., the “n+k” problem, the lack of implementations being first-class, and a few other things), this approach can work well. It also has the potential to generate great code, but in practice it’s rarely possible to achieve maximal performance. In her excellent talk “Effects for Less”, Alexis King hits the nail on the head - despite being able to provide good code for the implementations of particular parts of an effect, the majority of effectful code is really just threading around inside the Monad constraint. When we’re being polymorphic over any Monad m, GHC is at a loss to do any further optimization - and how could it? We know nothing more than “there will be some >>= function when you get here, promise!” Let’s look at this in a bit more detail.

Say we have the following:

foo :: Monad m => m Int
foo = go 0 1_000_000_000
  where
    go acc 0 = return acc
    go acc i = return acc >> go (acc + 1) (i - 1)

This is obviously “I needed an example for my blog” levels of contrived, but at least small. How does it execute? What are the runtime consequences of this code? To answer, we’ll go all the way down to the STG level with -ddump-stg:

$wfoo =
    \r [ww_s2FA ww1_s2FB]
        let {
          Rec {
          $sgo_s2FC =
              \r [sc_s2FD sc1_s2FE]
                  case eqInteger# sc_s2FD lvl1_r2Fp of {
                    __DEFAULT ->
                        let {
                          sat_s2FK =
                              \u []
                                  case +# [sc1_s2FE 1#] of sat_s2FJ {
                                    __DEFAULT ->
                                        case minusInteger sc_s2FD lvl_r2Fo of sat_s2FI {
                                          __DEFAULT -> $sgo_s2FC sat_s2FI sat_s2FJ;
                                        };
                                  }; } in
                        let {
                          sat_s2FH =
                              \u []
                                  let { sat_s2FG = CCCS I#! [sc1_s2FE]; } in  ww1_s2FB sat_s2FG;
                        } in  ww_s2FA sat_s2FH sat_s2FK;
                    1# ->
                        let { sat_s2FL = CCCS I#! [sc1_s2FE]; } in  ww1_s2FB sat_s2FL;
                  };
          end Rec }
        } in  $sgo_s2FC lvl2_r2Fq 0#;

foo =
    \r [w_s2FM]
        case w_s2FM of {
          C:Monad _ _ ww3_s2FQ ww4_s2FR -> $wfoo ww3_s2FQ ww4_s2FR;
        };

In STG, whenever we have a let we have to do a heap allocation - and this code has quite a few! Of particular interest is what’s going on inside the actual loop $sgo_s2FC. This loop first compares i to see if it’s 0. In the case that it’s not, we allocate two objects and call ww_s2FA. If you squint, you’ll notice that ww_s2FA is the first argument to $wfoo, and it ultimately comes from unpacking a C:Monad dictionary. I’ll save you the labor of working out what this is - ww_s2FA is the >>. We can see that every iteration of our loop incurs two allocations for each argument to >>. A heap allocation doesn’t come for free - not only do we have to do the allocation, the entry into the heap incurs a pointer indirection (as heap objects have an info table that points to their entry), and also by merely being on the heap we increase our GC time as we have a bigger heap to traverse. While my STG knowledge isn’t great, my understanding of this code is that every time we want to call >>, we need to supply it with its arguments. This means we have to allocate two closures for this function call - which is basically whenever we pressed “return” on our keyboard when we wrote the code. This seems crazy - can you imagine if you were told in C that merely using ; would cost time and memory?

If we compile this code in a separate module, mark it as {-# NOINLINE #-}, and then call it from main - how’s the performance? Let’s check!

module Main (main) where

import Foo

main :: IO ()
main = print =<< foo

$ ./Main +RTS -s
1000000000
 176,000,051,368 bytes allocated in the heap
       8,159,080 bytes copied during GC
          44,408 bytes maximum residency (1 sample(s))
          33,416 bytes maximum slop
               0 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     169836 colls,     0 par    0.358s   0.338s     0.0000s    0.0001s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0001s    0.0001s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time   54.589s  ( 54.627s elapsed)
  GC      time    0.358s  (  0.338s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   54.947s  ( 54.965s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    3,224,078,302 bytes per MUT second

  Productivity  99.3% of total user, 99.4% of total elapsed

OUCH. My i7 laptop took almost a minute to iterate a loop 1 billion times.

A little disclaimer: I’m intentionally painting a severe picture here - in practice this cost is irrelevant to all but the most performance-sensitive programs. Also, notice where the let bindings are in the STG above - they are nested within the loop. This means that we’re essentially allocating “as we go” - these allocations are incredibly cheap, and the extra GC work is equally trivial, resulting in more or less constant GC pressure rather than impending doom. For code that is likely to do any IO, this cost is likely negligible compared to the rest of the work. Nonetheless, it is there, and when it’s there, it’s nice to know if there are alternatives.

So, is the TL;DR that Haskell is completely incapable of writing effectful code? No, of course not. There is another way to compile this program, but we need a bit more information. If we happen to know what m is and we have access to the Monad dictionary for m, then we might be able to inline >>=. When we do this, GHC can be a lot smarter. The end result is code that now doesn’t allocate for every single >>=, and instead just gets on with doing work. One trivial way to witness this is to define everything in a single module (Alexis rightly points out this is a trap for benchmarking that many fall into, but for our uses it’s the behavior we actually want).

This time, let’s write everything in one module:

module Main ( main ) where
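
-- A sketch of the rest of the module: presumably just the foo definition and
-- main from above, now living in this single module.

foo :: Monad m => m Int
foo = go 0 1_000_000_000
  where
    go acc 0 = return acc
    go acc i = return acc >> go (acc + 1) (i - 1)

main :: IO ()
main = print =<< foo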

And the STG:

lvl_r4AM = CCS_DONT_CARE S#! [0#];

lvl1_r4AN = CCS_DONT_CARE S#! [1#];

Rec {
main_$sgo =
    \r [void_0E sc1_s4AY sc2_s4AZ]
        case eqInteger# sc1_s4AY lvl_r4AM of {
          __DEFAULT ->
              case +# [sc2_s4AZ 1#] of sat_s4B2 {
                __DEFAULT ->
                    case minusInteger sc1_s4AY lvl1_r4AN of sat_s4B1 {
                      __DEFAULT -> main_$sgo void# sat_s4B1 sat_s4B2;
                    };
              };
          1# -> let { sat_s4B3 = CCCS I#! [sc2_s4AZ]; } in  Unit# [sat_s4B3];
        };
end Rec }

main2 = CCS_DONT_CARE S#! [1000000000#];

main1 =
    \r [void_0E]
        case main_$sgo void# main2 0# of {
          Unit# ipv1_s4B7 ->
              let { sat_s4B8 = \s [] $fShowInt_$cshow ipv1_s4B7;
              } in  hPutStr' stdout sat_s4B8 True void#;
        };

main = \r [void_0E] main1 void#;

main3 = \r [void_0E] runMainIO1 main1 void#;

main = \r [void_0E] main3 void#;

The same program compiled down to a much tighter loop that is almost entirely free of allocations. In fact, the only allocation that happens is when the loop terminates, and it’s just boxing the unboxed integer that’s been accumulating in the loop.

As we might hope, the performance of this is much better:

$ ./Main +RTS -s
1000000000
  16,000,051,312 bytes allocated in the heap
         128,976 bytes copied during GC
          44,408 bytes maximum residency (1 sample(s))
          33,416 bytes maximum slop
               0 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     15258 colls,     0 par    0.031s   0.029s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0001s    0.0001s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    9.402s  (  9.405s elapsed)
  GC      time    0.031s  (  0.029s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    9.434s  (  9.434s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    1,701,712,595 bytes per MUT second

  Productivity  99.7% of total user, 99.7% of total elapsed

Our time in the garbage collector dropped by a factor of 10, from 0.3s to 0.03s. Our total allocation dropped from 176GB (yes, you read that right) to 16GB (I’m still not entirely sure what this means, maybe someone can enlighten me). Most importantly our total runtime dropped from 54s to just under 10s. All this from just knowing what m is at compile time.

So GHC is capable of producing excellent code for monads - what are the circumstances under which this happens? We need, at least:

  1. The source code of the thing we’re compiling must be available. This means it’s either defined in the same module, or is available with an INLINABLE pragma (or GHC has chosen to add this itself).

  2. The definitions of >>= and friends must also be available in the same way.

These constraints start to feel a lot like needing whole program compilation, and in practice are unreasonable constraints to reach. To understand why, consider that most real world programs have a small Main module that opens some connections or opens some file handles, and then calls some library code defined in another module. If this code in the other module was already compiled, it will (probably) have been compiled as a function that takes a Monad dictionary, and just calls the >>= function repeatedly in the same manner as our original STG code. To get the allocation-free version, this library code needs to be available to the Main module itself - as that’s the module that chooses what type to instantiate m with - which means the library code has to have marked that code as being inlinable. While we could add INLINE everywhere, this leads to an explosion in the amount of code produced, and can skyrocket compilation times.
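
For concreteness (this snippet is my own addition, not from the post), exposing the unfolding of the earlier example across module boundaries would mean annotating it in the library module:

module Foo (foo) where

{-# INLINABLE foo #-}
foo :: Monad m => m Int
foo = go 0 1_000_000_000
  where
    go acc 0 = return acc
    go acc i = return acc >> go (acc + 1) (i - 1)

With that pragma GHC keeps foo’s unfolding in the interface file, so a downstream Main that picks a concrete m can specialize it - at the cost of larger interfaces and longer compile times, as discussed above.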

Alexis’ eff library works around this by not being polymorphic in m. Instead, it chooses a concrete monad with all sorts of fancy continuation features. Likewise, if we commit to a particular monad (a transformer stack, or maybe using RIO), we again avoid this cost. Essentially, if the monad is known a priori at time of module compilation, GHC can go to town. However, the latter also commits to semantics - by choosing a transformer stack, we’re choosing a semantics for our monadic effects.

With the scene set, I now want to present you with another approach to solving this problem using Backpack.

A Backpack Primer

Vanilla GHC has a very simple module system - modules are essentially a method for name-spacing and separate compilation; they don’t do much more. The Backpack project extends this module system with a new concept - signatures. A signature is like the “type” of a module - a signature might mention the presence of some types, functions and type class instances, but it says nothing about what the definitions of these entities are. We’re going to (ab)use this system to build up transformer stacks at configuration time, and allow our library to be abstracted over different monads. By instantiating our library code with different monads, we get different interpretations of the same program.

I won’t sugar coat it - what follows is going to be pretty miserable. Extremely fun, but miserable to write in practice. I’ll let you decide if you want to inflict this misery on your coworkers - I’m just here to show you it can be done!

A Signature for Monads

The first thing we’ll need is a signature for data types that are monads. This is essentially the “hole” we’ll rely on with our library code - it will give us the ability to say “there exists a monad”, without committing to any particular choice.

In our Cabal file, we have:

library monad-sig
  hs-source-dirs:   src-monad-sig
  signatures:       Control.Monad.Signature
  default-language: Haskell2010
  build-depends:    base

The important line here is signatures: Control.Monad.Signature which shows that this library is incomplete and exports a signature. The definition of Control/Monad/Signature.hsig is:

signature Control.Monad.Signature where

data M a
instance Functor M
instance Applicative M
instance Monad M

This simply states that any module with this signature has some type M with instances of Functor, Applicative and Monad.

Next, we’ll put that signature to use in our library code.

Library Code

For our library code, we’ll start with a new library in our Cabal file:

library business-logic
  hs-source-dirs:   lib
  signatures:       BusinessLogic.Monad
  exposed-modules:  BusinessLogic
  build-depends:
    , base
    , fused-effects
    , monad-sig

  default-language: Haskell2010
  mixins:
    monad-sig requires (Control.Monad.Signature as BusinessLogic.Monad)

Our business-logic library itself exports a signature, which is really just a re-export of Control.Monad.Signature, but we rename it to something more meaningful. It’s this module that will provide the monad that has all of the effects we need. Along with this signature, we also export the BusinessLogic module:

{-# language FlexibleContexts #-}
module BusinessLogic where

import BusinessLogic.Monad ( M )
import Control.Algebra ( Has )
import Control.Effect.Empty ( Empty, guard )

businessCode :: Has Empty sig M => Bool -> M Int
businessCode b = do
  guard b
  return 42

In this module I’m using fused-effects as a framework to say which effects my monad should have (though this is not particularly important, I just like it!). Usually Has would be applied to a type variable m, but here we’re applying it to the type M. This type comes from BusinessLogic.Monad, which is a signature (you can confirm this by checking against the Cabal file). Other than that, this is all pretty standard!

Backpack-ing Monad Transformers

Now we get into the really fun stuff - providing implementations of effects. I mentioned earlier that one possible way to do this is with a stack of monad transformers. Generally speaking, one would write a single newtype T m a for each effect type class, and have that transformer dispatch any effects in that class and lift any effects from other classes - deferring their implementation to m.

We’re going to take the same approach here, but we’ll absorb the idea of a transformer directly into the module itself. Let’s look at an implementation of the Empty effect. The Empty effect gives us a special empty :: m a function, which serves the purpose of stopping execution immediately. As a monad transformer, one implementation is MaybeT:

newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }

But we can also write this using Backpack. First, our Cabal library:

library fused-effects-empty-maybe
  hs-source-dirs:   src-fused-effects-backpack
  default-language: Haskell2010
  build-depends:
    , base
    , fused-effects
    , monad-sig

  exposed-modules: Control.Carrier.Backpack.Empty.Maybe
  mixins:
    monad-sig requires (Control.Monad.Signature as Control.Carrier.Backpack.Empty.Maybe.Base)

Our library exports the module Control.Carrier.Backpack.Empty.Maybe, but also has a hole - the type of base monad this transformer stacks on top of. As a monad transformer, this would be the m parameter, but when we use Backpack, we move that out into a separate module.

The implementation of Control.Carrier.Backpack.Empty.Maybe is short, and almost identical to the body of Control.Monad.Trans.Maybe - we just change any occurrences of m to instead refer to M from our .Base module:

{-# language BlockArguments, FlexibleContexts, FlexibleInstances, LambdaCase,
      MultiParamTypeClasses, TypeOperators, UndecidableInstances #-}

module Control.Carrier.Backpack.Empty.Maybe where

import Control.Algebra
import Control.Effect.Empty
import qualified Control.Carrier.Backpack.Empty.Maybe.Base as Base

type M = EmptyT

-- We could also write: newtype EmptyT a = EmptyT { runEmpty :: MaybeT Base.M a }
newtype EmptyT a = EmptyT { runEmpty :: Base.M (Maybe a) }

instance Functor EmptyT where
  fmap f (EmptyT m) = EmptyT $ fmap (fmap f) m

instance Applicative EmptyT where
  pure = EmptyT . pure . Just
  EmptyT f <*> EmptyT x = EmptyT do
    f >>= \case
      Nothing -> return Nothing
      Just f' -> x >>= \case
        Nothing -> return Nothing
        Just x' -> return (Just (f' x'))

instance Monad EmptyT where
  return = pure
  EmptyT x >>= f = EmptyT do
    x >>= \case
      Just x' -> runEmpty (f x')
      Nothing -> return Nothing

Finally, we make sure that EmptyT can handle the Empty effect:

instance Algebra sig Base.M => Algebra (Empty :+: sig) EmptyT where
  alg handle sig context = case sig of
    L Empty -> EmptyT $ return Nothing
    R other -> EmptyT $ thread (maybe (pure Nothing) runEmpty ~<~ handle) other (Just context)

Base Monads

Now that we have a way to run the Empty effect, we need a base case to our transformer stack. As our transformer is now built out of modules that conform to the Control.Monad.Signature signature, we need some modules for each monad that we could use as a base. For this POC, I’ve just added the IO monad:

library fused-effects-lift-io
  hs-source-dirs:   src-fused-effects-backpack
  default-language: Haskell2010
  build-depends:    base
  exposed-modules:  Control.Carrier.Backpack.Lift.IO

module Control.Carrier.Backpack.Lift.IO where
type M = IO

That’s it!

Putting It All Together

Finally we can put all of this together into an actual executable. We’ll take our library code, instantiate the monad to be a combination of EmptyT and IO, and write a little main function that unwraps this all into an IO type. First, here’s the Main module:

module Main where

import BusinessLogic
import qualified BusinessLogic.Monad

main :: IO ()
main = print =<< BusinessLogic.Monad.runEmptyT (businessCode True)

The BusinessLogic module we’ve seen before, but previously BusinessLogic.Monad was a signature (remember, we renamed Control.Monad.Signature to BusinessLogic.Monad). In executables, you can’t have signatures - executables can’t be depended on, so it doesn’t make sense for them to have holes; they must be complete. The magic happens in our Cabal file:

executable test
  main-is:          Main.hs
  hs-source-dirs:   exe
  build-depends:
    , base
    , business-logic
    , fused-effects-empty-maybe
    , fused-effects-lift-io
    , transformers

  default-language: Haskell2010
  mixins:
    fused-effects-empty-maybe (Control.Carrier.Backpack.Empty.Maybe as BusinessLogic.Monad) requires (Control.Carrier.Backpack.Empty.Maybe.Base as BusinessLogic.Monad.Base),
    fused-effects-lift-io (Control.Carrier.Backpack.Lift.IO as BusinessLogic.Monad.Base)

Wow, that’s a mouthful! The work is really happening in mixins. Let’s take this step by step:

  1. First, we can see that we need to mixin the fused-effects-empty-maybe library. The first (X as Y) section specifies a list of modules from fused-effects-empty-maybe and renames them for the test executable that’s currently being compiled. Here, we’re renaming Control.Carrier.Backpack.Empty.Maybe as BusinessLogic.Monad. By doing this, we satisfy the hole in the business-logic library, which was otherwise incomplete.

  2. But fused-effects-empty-maybe itself has a hole - the base monad for the transformer. The requires part lets us rename this hole, but we’ll still need to plug it. For now, we rename this hole from Control.Carrier.Backpack.Empty.Maybe.Base to BusinessLogic.Monad.Base.

  3. Next, we mixin the fused-effects-lift-io library, and rename Control.Carrier.Backpack.Lift.IO to be BusinessLogic.Monad.Base. We’ve now satisfied the hole for fused-effects-empty-maybe, and our executable has no more holes and can be compiled.

We’re Done!

That’s “all” there is to it. We can finally run our program:

$ cabal run
Just 42

If you compare against businessCode you’ll see that we got past the guard and returned 42. Because we instantiated BusinessLogic.Monad with a MaybeT-like transformer, this 42 got wrapped up in Just.

Is This Fast?

The best check here is to just look at the underlying code itself. If we add

{-# options -ddump-simpl -ddump-stg -dsuppress-all #-}

to BusinessLogic and recompile, we’ll see the final code output to STDERR. The core is:

businessCode1
  = \ @ sig_a2cM _ b_a13P eta_B1 ->
      case b_a13P of {
        False -> (# eta_B1, Nothing #);
        True -> (# eta_B1, lvl1_r2NP #)
      }

and the STG:

businessCode1 =
    \r [$d(%,%)_s2PE b_s2PF eta_s2PG]
        case b_s2PF of {
          False -> (#,#) [eta_s2PG Nothing];
          True -> (#,#) [eta_s2PG lvl1_r2NP];
        };

Voila!

Conclusion

In this post, I’ve hopefully shown how we can use Backpack to write effectful code without paying the cost of abstraction. What I didn’t answer is the question of whether or not you should. There’s a lot more to effectful code than I’ve presented, and it’s unclear to me whether this approach can scale to meet those needs. For example, if we needed something like mmorph’s MFunctor, what do we do? Are we stuck? I don’t know! Beyond these technical challenges, it’s clear that Backpack here is also not remotely ergonomic, as is. We’ve had to write five components just to get this done, and I pray for anyone who comes to read this code and has to orientate themselves.

Nonetheless, I think this is an interesting point in the effect design space that hasn’t been explored, and maybe I’ve motivated some people to do some further exploration.

The code for this blog post can be found at https://github.com/ocharles/fused-effects-backpack.

Happy holidays, all!

by Oliver Charles at December 23, 2020 12:00 AM

December 16, 2020

Tweag I/O

Trustix: Distributed trust and reproducibility tracking for binary caches

Downloading binaries from well-known providers is the easiest way to install new software. After all, building software from source is a chore — it requires both time and technical expertise. But how do we know that we aren’t installing something malicious from these providers?

Typically, we trust these binaries because we trust the provider. We believe that they were built from trusted sources, in a trusted computational environment, and with trusted build instructions. But even if the provider does everything transparently and in good faith, the binaries could still be anything if the provider’s system is compromised. In other words, the build process requires trust even if all build inputs (sources, dependencies, build scripts, etc…) are known.

Overcoming this problem is hard — after all, how can we verify the output of arbitrary build inputs? Excitingly, the last few years have brought about ecosystems such as Nix, where all build inputs are known and where significant amounts of builds are reproducible. This means that the correspondence between inputs and outputs can be verified by building the same binary multiple times! The r13y project, for example, tracks non-reproducible builds by building them twice on the same machine, showing that this is indeed practical.

But we can go further, and that’s the subject of this blog post, which introduces Trustix, a new tool we are working on. Trustix compares build outputs for given build inputs across independent providers and machines, effectively decentralizing trust. This establishes what I like to call build transparency because it verifies what black box build machines are doing. Behind the scenes Trustix builds a Merkle tree-based append-only log that maps build inputs to build outputs, which I’ll come back to in a later post. This log can be used to establish consensus whether certain build inputs always produce the same output — and can therefore be trusted. Conversely, it can also be used to uncover non-reproducible builds, corrupted or not, on a large scale.

The initial implementation of Trustix, and its description in this post are based on the Nix package manager. Nix focuses on isolated builds, provides access to the hashes of all build inputs as well as a high quantity of bit-reproducible packages, making it the ideal initial testing ecosystem. However, Trustix was designed to be system-independent, and is not strongly tied to Nix.

The development of Trustix is funded by NLnet foundation and the European Commission’s Next Generation Internet programme through the NGI Zero PET (privacy and trust enhancing technologies) fund. The tool is still in development, but I’m very excited to announce it already!

How Nix verifies binary cache results

Most Linux package managers use a very simple signature scheme to secure binary distribution to users. Some use GPG keys, some use OpenSSL certificates, and others use some other kind of key, but the idea is essentially the same for all of them. The general approach is that binaries are signed with a private key, and clients can use an associated public key to check that a binary was really signed by the trusted entity.

Nix for example uses an ed25519-based key signature scheme and comes with a default hard-coded public key that corresponds to the default cache. This key can be overridden or complemented by others, allowing the use of additional caches. The list of signing keys can be found in /etc/nix/nix.conf. The default base64-encoded ed25519 public key with a name as additional metadata looks like this:

trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY=
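
For example (an illustrative sketch of my own; the second cache and its key are placeholders), a cache operator generates a key pair with nix-store --generate-binary-cache-key, and users trust an additional cache by appending its public key to this whitespace-separated list:

$ nix-store --generate-binary-cache-key example-cache-1 secret-key-file public-key-file

trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= example-cache-1:<contents of public-key-file>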

Now, in Nix, software is addressed by the hash of all of its build inputs (sources, dependencies and build instructions). This hash, or output path, is used to query a cache (like https://cache.nixos.org) for a binary.

Here is an example: The hash of the hello derivation can be obtained from a shell with nix-instantiate:

$ nix-instantiate '<nixpkgs>' --eval -A hello.outPath
"/nix/store/w9yy7v61ipb5rx6i35zq1mvc2iqfmps1-hello-2.10"

Here, behind the scenes, we have evaluated and hashed all build inputs that the hello derivation needs (.outPath is just a helper). This hash can then be used to query the default Nix binary cache:

$ curl https://cache.nixos.org/w9yy7v61ipb5rx6i35zq1mvc2iqfmps1.narinfo
StorePath: /nix/store/w9yy7v61ipb5rx6i35zq1mvc2iqfmps1-hello-2.10
URL: nar/15zk4zszw9lgkdkkwy7w11m5vag11n5dhv2i6hj308qpxczvdddx.nar.xz
Compression: xz
FileHash: sha256:15zk4zszw9lgkdkkwy7w11m5vag11n5dhv2i6hj308qpxczvdddx
FileSize: 41232
NarHash: sha256:1mi14cqk363wv368ffiiy01knardmnlyphi6h9xv6dkjz44hk30i
NarSize: 205968
References: 9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31 w9yy7v61ipb5rx6i35zq1mvc2iqfmps1-hello-2.10
Sig: cache.nixos.org-1:uP5KU8MCmyRnKGlN5oEv6xWJBI5EO/Pf5aFztZuLSz8BpCcZ1fdBnJkVXhBAlxkdm/CemsgQskhwvyd2yghTAg==

Besides links to the archive that contains the compressed binaries, this response includes two relevant pieces of information which are used to verify binaries from the binary cache(s):

  • The NarHash is a hash over all Nix store directory contents
  • The Sig is a cryptographic signature over the NarHash

With this information, the client can check that this binary really comes from the provider’s Nix store.

What are the limitations of this model?

While this model has served Nix and others well for many years it suffers from a few problems. All of these problems can be traced back to a single point of failure in the chain of trust:

  • First, if the key used by cache.nixos.org is ever compromised, all builds that were ever added to the cache can be considered tainted.
  • Second, one needs to put either full trust or no trust at all in the build machines of a binary cache — there is no middle ground.
  • Finally, there is no inherent guarantee that the build inputs described in the Nix expressions were actually used to build what’s in the cache.

Trustix

Trustix aims to solve these problems by assembling a mapping from build inputs to (hashes of) build outputs provided by many build machines.

Instead of relying on verifying package signatures, like the traditional Nix model does, Trustix only exposes packages that it considers trustworthy. Concretely, Trustix is configured as a proxy for a binary cache, and hides the packages which are not trustworthy. As far as Nix is concerned, a package not being trustworthy is exactly as if the package wasn’t stored in the binary cache to begin with. If such a package is required, Nix will therefore build it from source.

Trustix doesn’t define what a trustworthy package is. What your Trustix instance considers trustworthy is up to you. The rules for accepting packages are entirely configurable. In fact, in the current prototype, there isn’t a default rule for packages to count as trustworthy: you need to configure trustworthiness yourself.

With this in mind, let’s revisit the above issues:

  • In Trustix, if an entity is compromised, you can rely on all other entities in the network to establish that a binary artefact is trustworthy. Maybe a few hashes are wrong in the Trustix mapping, but if an overwhelming majority of the outputs are the same, you can trust that the corresponding artefact is indeed what you would have built yourself.

    Therefore you never need to invalidate an entire binary cache: you can still verify the trustworthiness of old packages, even if newer packages are built by a malicious actor.

  • In Trustix, you never typically consider any build machine to be fully trusted. You always check their results against the other build machines. You can further configure this by considering some machines as more trusted (maybe because it is a community-operated machine, and you trust said community) or less trusted (for instance, because it has been compromised in the past, and you fear it may be compromised again).

    Moreover, in the spirit of having no single point of failure, Trustix’s mapping is not kept in a central database. Instead every builder keeps a log of its builds; these logs are aggregated on your machine by your instance of the Trustix daemon. Therefore even the mapping itself doesn’t have to be fully trusted.

  • In Trustix, package validity is not ensured by a signature scheme. Instead Trustix relies on the consistency of the input to output mapping. As a consequence, the validity criterion, contrary to a signature scheme, links the output to the input. It makes it infeasible to pass the build result of input I as a build result for input J: it would require corrupting the entire network.

Limitations: reproducibility tracking and non-reproducible builds

A system like Trustix will not work well with builds that are non-reproducible, which is a limitation of this model. After all, you cannot reach consensus if everyone’s opinions differ.

However, Trustix can still be useful, even for non-reproducible builds! By accumulating all the data in the various logs and aggregating them, we can track which derivations are non-reproducible over all of Nixpkgs, in a way that is easier than previously possible. Whereas the r13y project builds a single closure on a single machine, Trustix will index everything ever built on every architecture.

Conclusion

I am very excited to be working on the next generation of tooling for trust and reproducibility, and for the purely functional software packaging model pioneered by Nix to keep enabling new use cases. I hope that this work can be a foundation for many applications other than improving trust — for example, by enabling the Nix community to support new CPU architectures with community binary caches.

Please check out the code at the repo or join us for a chat over in #untrustix on Freenode. And stay tuned — in the next blog post, we will talk more about Merkle trees and how they are used in Trustix.


December 16, 2020 12:00 AM

November 29, 2020

Sander van der Burg

Constructing a simple alerting system with well-known open source projects


Some time ago, I was experimenting with all kinds of monitoring and alerting technologies. For example, with the following tools, I can develop a simple alerting system with relative ease:

  • Telegraf is an agent that can be used to gather measurements and transfer the corresponding data to all kinds of storage solutions.
  • InfluxDB is a time series database platform that can store, manage and analyze timestamped data.
  • Kapacitor is a real-time streaming data processing engine that can be used for a variety of purposes. I can use Kapacitor to analyze measurements and see if a threshold has been exceeded so that an alert can be triggered.
  • Alerta is a monitoring system that can store and de-duplicate alerts, and arrange blackouts.
  • Grafana is a multi-platform open source analytics and interactive visualization web application.

These technologies appear to be quite straightforward to use. However, as I was learning more about them, I discovered a number of oddities that may have big implications.

Furthermore, testing and making incremental changes also turn out to be much more challenging than expected, making it very hard to diagnose and fix problems.

In this blog post, I will describe how I built a simple monitoring and alerting system, and elaborate on my learning experiences.

Building the alerting system


As described in the introduction, I can combine several technologies to create an alerting system. I will explain them more in detail in the upcoming sections.

Telegraf


Telegraf is a pluggable agent that gathers measurements from a variety of inputs (such as system metrics, platform metrics, database metrics etc.) and sends them to a variety of outputs, typically storage solutions (database management systems such as InfluxDB, PostgreSQL or MongoDB). Telegraf has a large plugin ecosystem that provides all kinds of integrations.

In this blog post, I will use InfluxDB as an output storage backend. For the inputs, I will restrict myself to capturing a subset of system metrics only.

With the following telegraf.conf configuration file, I can capture a variety of system metrics every 10 seconds:


[agent]
interval = "10s"

[[outputs.influxdb]]
urls = [ "https://test1:8086" ]
database = "sysmetricsdb"
username = "sysmetricsdb"
password = "sysmetricsdb"

[[inputs.system]]
# no configuration

[[inputs.cpu]]
## Whether to report per-cpu stats or not
percpu = true
## Whether to report total system cpu stats or not
totalcpu = true
## If true, collect raw CPU time metrics.
collect_cpu_time = false
## If true, compute and report the sum of all non-idle CPU states.
report_active = true

[[inputs.mem]]
# no configuration

With the above configuration file, I can collect the following metrics:
  • System metrics, such as the hostname and system load.
  • CPU metrics, such as how much the CPU cores on a machine are utilized, including the total CPU activity.
  • Memory (RAM) metrics.

The data will be stored in an InfluxDB database named sysmetricsdb, hosted on a remote machine with the host name test1.
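
Jumping slightly ahead to the influx client (described in the next section), a quick sanity check along the following lines should list the cpu, mem and system measurements created by the inputs configured above:


$ influx -database sysmetricsdb -host test1
> SHOW MEASUREMENTS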

InfluxDB


As explained earlier, InfluxDB is a time series platform that can store, manage and analyze timestamped data. In many ways, InfluxDB resembles relational databases, but there are also some notable differences.

The query language that InfluxDB uses is called InfluxQL (that shares many similarities with SQL).

For example, with the following query I can retrieve the first three data points from the cpu measurement, that contains the CPU-related measurements collected by Telegraf:


> precision rfc3339
> select * from "cpu" limit 3

providing me the following result set:


name: cpu
time cpu host usage_active usage_guest usage_guest_nice usage_idle usage_iowait usage_irq usage_nice usage_softirq usage_steal usage_system usage_user
---- --- ---- ------------ ----------- ---------------- ---------- ------------ --------- ---------- ------------- ----------- ------------ ----------
2020-11-16T15:36:00Z cpu-total test2 10.665258711721098 0 0 89.3347412882789 0.10559662090813073 0 0 0.10559662090813073 0 8.658922914466714 1.79514255543822
2020-11-16T15:36:00Z cpu0 test2 10.665258711721098 0 0 89.3347412882789 0.10559662090813073 0 0 0.10559662090813073 0 8.658922914466714 1.79514255543822
2020-11-16T15:36:10Z cpu-total test2 0.1055966209080346 0 0 99.89440337909197 0 0 0 0.10559662090813073 0 0 0

As you can probably see by looking at the output above, every data point has a timestamp and a number of fields capturing CPU metrics:

  • cpu identifies the CPU core.
  • host contains the host name of the machine.
  • The remainder of the fields contain all kinds of CPU metrics, e.g. how much CPU time is consumed by the system (usage_system), the user (usage_user), by waiting for IO (usage_iowait) etc.
  • The usage_active field contains the total CPU activity percentage, which is going to be useful to develop an alert that will warn us if there is too much CPU activity for a long period of time.

Aside from the fact that all data is timestamp based, data in InfluxDB has another notable difference compared to relational databases: an InfluxDB database is schemaless. You can add an arbitrary number of fields and tags to a data point without having to adjust the database structure (and migrate existing data to the new structure).

Fields and tags can contain arbitrary data, such as numeric values or strings. Tags are also indexed so that you can search for these values more efficiently. Furthermore, tags can be used to group data.

For example, the cpu measurement collection has the following tags:


> SHOW TAG KEYS ON "sysmetricsdb" FROM "cpu";
name: cpu
tagKey
------
cpu
host

As shown in the above output, the cpu and host columns are tags in the cpu measurement.

We can use these tags to search for all data points related to a CPU core and/or host machine. Moreover, we can use these tags for grouping, allowing us to compute aggregate values, such as the mean value per CPU core and host.
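
For example, a query along the following lines computes the mean CPU activity per host and CPU core over the last hour (a sketch, using the field and tag names established above):


> SELECT mean("usage_active") FROM "cpu" WHERE time > now() - 1h GROUP BY "host", "cpu"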

Beyond storing and retrieving data, InfluxDB has many useful additional features:

  • You can automatically downsample data by running continuous queries that generate and store the sampled data in the background.
  • You can configure retention policies so that data is not stored indefinitely. For example, you can configure a retention policy to drop raw data after a certain amount of time, but retain the corresponding sampled data (see the example below).
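
To illustrate these two features, the following InfluxQL statements create a retention policy that keeps raw data for 30 days, and a continuous query that stores 5-minute means of the CPU activity in a separate measurement. This is only a sketch: the policy name (raw), the query name (cq_cpu_5m) and the target measurement (cpu_5m) are made up for this example:


> CREATE RETENTION POLICY "raw" ON "sysmetricsdb" DURATION 30d REPLICATION 1 DEFAULT
> CREATE CONTINUOUS QUERY "cq_cpu_5m" ON "sysmetricsdb" BEGIN SELECT mean("usage_active") AS "usage_active" INTO "sysmetricsdb"."autogen"."cpu_5m" FROM "cpu" GROUP BY time(5m), * END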

InfluxDB has an "open core" development model. The free and open source (FOSS) edition of the InfluxDB server (that is MIT licensed) allows you to host multiple databases on multiple servers.

However, if you also want horizontal scalability and/or high availability, then you need to switch to the hosted InfluxDB versions -- data in InfluxDB is partitioned into so-called shards that cover a fixed time range (the default shard duration is 168 hours).

These shards can be distributed over multiple InfluxDB servers. It is also possible to deploy multiple read replicas of the same shard to multiple InfluxDB servers, improving read speed.

Kapacitor


Kapacitor is a real-time streaming data process engine developed by InfluxData -- the same company that also develops InfluxDB and Telegraf.

It can be used for all kinds of purposes. In my example cases, I will only use it to determine whether some threshold has been exceeded and an alert needs to be triggered.

Kapacitor works with custom tasks that are written in a domain-specific language called the TICK script language. There are two kinds of tasks: stream and batch tasks. Both task types have advantages and disadvantages.

We can easily develop an alert that gets triggered if the CPU activity level is high for a relatively long period of time (more than 75% on average over 1 minute).

To implement this alert as a stream job, we can write the following TICK script:


dbrp "sysmetricsdb"."autogen"

stream
|from()
.measurement('cpu')
.groupBy('host', 'cpu')
.where(lambda: "cpu" != 'cpu-total')
|window()
.period(1m)
.every(1m)
|mean('usage_active')
|alert()
.message('Host: {{ index .Tags "host" }} has high cpu usage: {{ index .Fields "mean" }}')
.warn(lambda: "mean" > 75.0)
.crit(lambda: "mean" > 85.0)
.alerta()
.resource('{{ index .Tags "host" }}/{{ index .Tags "cpu" }}')
.event('cpu overload')
.value('{{ index .Fields "mean" }}')

A stream job is built around the following principles:

  • A stream task does not execute queries on an InfluxDB server. Instead, it creates a subscription to InfluxDB -- whenever a data point gets inserted into InfluxDB, the data point gets forwarded to Kapacitor as well.

    To make subscriptions work, both InfluxDB and Kapacitor need to be able to connect to each other with a public IP address.
  • A stream task defines a pipeline consisting of a number of nodes (connected with the | operator). Each node can consume data points, filter, transform, aggregate, or execute arbitrary operations (such as calling an external service), and produce new data points that can be propagated to the next node in the pipeline.
  • Every node also has property methods (such as .measurement('cpu')) making it possible to configure parameters.

The TICK script example shown above does the following:

  • The from node consumes cpu data points from the InfluxDB subscription, groups them by host and cpu and filters out data points with the cpu-total label, because we are only interested in the CPU consumption per core, not the total amount.
  • The window node states that we should aggregate data points over the last 1 minute and pass the resulting (aggregated) data points to the next node after one minute in time has elapsed. To aggregate data, Kapacitor will buffer data points in memory.
  • The mean node computes the mean value for usage_active for the aggregated data points.
  • The alert node is used to trigger an alert of a specific severity level: WARNING if the mean activity percentage is bigger than 75%, and CRITICAL if it is bigger than 85%. In all other cases, the status is considered OK. The alert is sent to Alerta.
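
Once such a TICK script has been written, it needs to be registered with and enabled in Kapacitor. With the kapacitor command-line client, this roughly looks as follows (a sketch: the file name cpu_alert.tick is chosen for this illustration, and the exact flags may differ per Kapacitor version):


$ kapacitor define cpu_alert -type stream -tick cpu_alert.tick
$ kapacitor enable cpu_alert
$ kapacitor show cpu_alert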

It is also possible to write a similar kind of alerting script as a batch task:


dbrp "sysmetricsdb"."autogen"

batch
|query('''
SELECT mean("usage_active")
FROM "sysmetricsdb"."autogen"."cpu"
WHERE "cpu" != 'cpu-total'
''')
.period(1m)
.every(1m)
.groupBy('host', 'cpu')
|alert()
.message('Host: {{ index .Tags "host" }} has high cpu usage: {{ index .Fields "mean" }}')
.warn(lambda: "mean" > 75.0)
.crit(lambda: "mean" > 85.0)
.alerta()
.resource('{{ index .Tags "host" }}/{{ index .Tags "cpu" }}')
.event('cpu overload')
.value('{{ index .Fields "mean" }}')

The above TICK script looks similar to the stream task shown earlier, but instead of using a subscription, the script queries the InfluxDB database (with an InfluxQL query) for data points over the last minute with a query node.

Which approach for writing a CPU alert is best, you may wonder? Each of these two approaches has its pros and cons:

  • Stream tasks offer low latency responses -- when a data point appears, a stream task can immediately respond, whereas a batch task needs to query all the data points every minute to compute the mean percentage over the last minute.
  • Stream tasks maintain a buffer for aggregating the data points, making it possible to only send incremental updates to Alerta. Batch tasks are stateless. As a result, they need to update the status of all hosts and CPUs every minute.
  • Processing data points is done synchronously and in sequential order -- if an update round to Alerta takes too long (which is more likely to happen with a batch task), then the next processing run may overlap with the previous, causing all kinds of unpredictable results.

    It may also cause Kapacitor to eventually crash due to growing resource consumption.
  • Batch tasks may also miss data points -- while querying data over a certain time window, it may happen that a new data point gets inserted in that time window (that is being queried). This new data point will not be picked up by Kapacitor.

    A subscription made by a stream task, however, will never miss any data points.
  • Stream tasks can only work with data points that appear from the moment Kapacitor is started -- it cannot work with data points in the past.

    For example, if Kapacitor is restarted and some important event is triggered in the restart time window, Kapacitor will not notice that event, causing the alert to remain in its previous state.

    To work effectively with stream tasks, a continuous data stream is required that frequently reports on the status of a resource. Batch tasks, on the other hand, can work with historical data.
  • The fact that nodes maintain a buffer may also cause the RAM consumption of Kapacitor to grow considerably, if the data volumes are big.

    A batch task on the other hand, does not buffer any data and is more memory efficient.

    Another compelling advantage of batch tasks over stream tasks is that InfluxDB does all the work. The hosted version of InfluxDB can also horizontally scale.
  • Batch tasks can also aggregate data more efficiently (e.g. computing the mean value or sum of values over a certain time period).

I consider neither of these script types the optimal solution. However, for implementing the alerts I tend to have a slight preference for stream jobs, because of their low latency and incremental update properties.

Alerta


As explained in the introduction, Alerta is a monitoring system that can store and de-duplicate alerts, and arrange blackouts.

The Alerta server provides a REST API that can be used to query and modify alerting data and uses MongoDB or PostgreSQL as a storage database.

There is also a variety of Alerta clients: the alerta CLI allows you to control the service from the command line, and there is a web user interface that I will show later in this blog post.
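
For example, the list of active alerts can be retrieved either with the CLI or with a plain HTTP request against the REST API. The request below is only a sketch: the endpoint and API key are placeholders, and the exact URL path depends on how the API is mounted:


$ alerta --output json query
$ curl -H "Authorization: Key $ALERTA_API_KEY" "$ALERTA_ENDPOINT/alerts"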

Running experiments


With all the components described above in place, we can start running experiments to see if the CPU alert will work as expected. To gain better insights into the process, I can install Grafana, which allows me to visualize the measurements that are stored in InfluxDB.

Configuring a dashboard and panel for visualizing the CPU activity rate was straightforward. I configured a new dashboard with the following variables:


The above variables allow me to select, for each machine in the network, which CPU core's activity percentage I want to visualize.

I have configured the CPU panel as follows:


In the above configuration, I query the usage_active field from the cpu measurement collection, using the dashboard variables cpu and host to filter for the right target machine and CPU core.

I have also configured the field unit to be a percentage value (between 0 and 100).
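
Since the dashboard screenshots are not reproduced here, the definitions roughly correspond to the following InfluxQL queries: two query-type template variables (for host and cpu) and the panel query that uses them together with Grafana's time range. This is an approximation of the original dashboard, not an exact copy:


SHOW TAG VALUES FROM "cpu" WITH KEY = "host"
SHOW TAG VALUES FROM "cpu" WITH KEY = "cpu"

SELECT "usage_active" FROM "cpu" WHERE "host" =~ /^$host$/ AND "cpu" =~ /^$cpu$/ AND $timeFilter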

When running the following command-line instruction on a test machine that runs Telegraf (test2), I can deliberately hog the CPU:


$ dd if=/dev/zero of=/dev/null

The above command reads bytes from /dev/zero (one by one) and discards them by writing them to /dev/null, causing the CPU to remain utilized at a high level:


In the graph shown above, it is clearly visible that CPU core 0 on the test2 machine remains utilized at 100% for several minutes.

(As a sidenote, we can also hog both the CPU and consume RAM at the same time with a simple command line instruction).
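
The exact command the sidenote refers to is not shown here. One possible example (not necessarily the command referred to, and one that should only be run on a disposable test machine) is to let tail buffer an endless stream in memory:


# WARNING: keeps allocating memory until the process is killed (e.g. by the OOM killer)
$ tail /dev/zero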

If we keep hogging the CPU and wait for at least a minute, the Alerta web interface dashboard will show a CRITICAL alert:


If we stop the dd command, then the TICK script should eventually notice that the mean percentage drops below the WARNING threshold causing the alert to go back into the OK state and disappearing from the Alerta dashboard.

Developing test cases


Being able to trigger an alert with a simple command-line instruction is useful, but not always convenient or effective -- one of the inconveniences is that we always have to wait at least one minute to get feedback.

Moreover, when an alert does not work, it is not always easy to find the root cause. I have encountered the following problems that contribute to a failing alert:

  • Telegraf may not be running and, as a result, not capturing the data points that need to be analyzed by the TICK script.
  • A subscription cannot be established between InfluxDB and Kapacitor. This may happen when Kapacitor cannot be reached through a public IP address.
  • Data points are collected, but only the wrong kinds of measurements.
  • The TICK script is functionally incorrect.

Fortunately, for stream tasks it is relatively easy to quickly find out whether an alert is functionally correct or not -- we can generate test cases that almost instantly trigger each possible outcome with a minimal number of data points.

An interesting property of stream tasks is that they have no notion of time -- the window node's .period(1m) property may suggest that Kapacitor computes the mean value of the data points every minute, but that is not what it actually does. Instead, Kapacitor only looks at the timestamps of the data points that it receives.

As long as the timestamps of the data points fit in the 1 minute time window, Kapacitor keeps buffering. As soon as a data point appears that falls outside this time window, the window node relays an aggregated data point to the next node (that computes the mean value, which in turn is consumed by the alert node deciding whether an alert needs to be raised or not).

We can exploit that knowledge to create a very minimal bash test script that triggers every possible outcome: OK, WARNING and CRITICAL:


influxCmd="influx -database sysmetricsdb -host test1"

export ALERTA_ENDPOINT="https://test1"

### Trigger CRITICAL alert

# Force the average CPU consumption to be 100%
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 0000000000"
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 60000000000"
# This data point triggers the alert
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 120000000000"

sleep 1
actualSeverity=$(alerta --output json query | jq '.[0].severity')

if [ "$actualSeverity" != "critical" ]
then
echo "Expected severity: critical, but we got: $actualSeverity" >&2
false
fi

### Trigger WARNING alert

# Force the average CPU consumption to be 80%
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=80 180000000000"
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=80 240000000000"
# This data point triggers the alert
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=80 300000000000"

sleep 1
actualSeverity=$(alerta --output json query | jq '.[0].severity')

if [ "$actualSeverity" != "warning" ]
then
echo "Expected severity: warning, but we got: $actualSeverity" >&2
false
fi

### Trigger OK alert

# Force the average CPU consumption to be 0%
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=0 300000000000"
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=0 360000000000"
# This data point triggers the alert
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=0 420000000000"

sleep 1
actualSeverity=$(alerta --output json query | jq '.[0].severity')

if [ "$actualSeverity" != "ok" ]
then
echo "Expected severity: ok, but we got: $actualSeverity" >&2
false
fi

The shell script shown above automatically triggers all three possible outcomes of the CPU alert:

  • CRITICAL is triggered by generating data points that force a mean activity percentage of 100%.
  • WARNING is triggered by a mean activity percentage of 80%.
  • OK is triggered by a mean activity percentage of 0%.

It uses the Alerta CLI to connect to the Alerta server to check whether the alert's severity level has the expected value.

We need three data points to trigger each alert type -- the first two data points are on the boundaries of the 1 minute window (0 seconds and 60 seconds), forcing the mean value to become the specified CPU activity percentage.

The third data point is deliberately outside the time window (of 1 minute), forcing the alert node to be triggered with a mean value over the previous two data points.

Although the above test strategy works to quickly validate all possible outcomes, one impractical aspect is that the timestamps in the above example start with 0 (meaning 0 seconds after the epoch: January 1st 1970 00:00 UTC).

If we also want to observe the data points generated by the above script in Grafana, we need to configure the panel to go back in time 50 years.

Fortunately, I can also easily adjust the script to start with a base timestamp that is 1 hour in the past:


offset="$(($(date +%s) - 3600))"
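
# The INSERT statements can then use this offset as a base timestamp, e.g.
# (an illustrative sketch; InfluxDB line protocol timestamps are in nanoseconds):
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 $((offset * 1000000000))"
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 $(((offset + 60) * 1000000000))"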

With this tiny adjustment, we should see the following CPU graph (displaying data points from the last hour) after running the test script:


As you may notice, we can see that the CPU activity level quickly goes from 100%, to 80%, to 0%, using only 9 data points.

Although testing stream tasks (from a functional perspective) is quick and convenient, testing batch tasks in a similar way is difficult. Contrary to the stream task implementation, the query node in the batch task does have a notion of time (because of the WHERE clause that includes the now() expression).

Moreover, the embedded InfluxQL query evaluates the mean values every minute, but the test script does not exactly know when this event triggers.

The only way I could think of to (somewhat reliably) validate the outcomes is by creating a test script that continuously inserts data points for at least double the time window size (2 minutes) until Alerta reports the right alert status (if it does not after a while, I can conclude that the alert is incorrectly implemented).

Automating the deployment


As you have probably guessed already, to be able to conveniently experiment with all these services, and to reliably run tests in isolation, some form of deployment automation is an absolute must-have.

Most people who do not know anything about my deployment technology preferences will probably go for Docker or docker-compose, but I have decided to use a variety of solutions from the Nix project.

NixOps is used to automatically deploy a network of NixOS machines -- I have created a logical and physical NixOps configuration that deploys two VirtualBox virtual machines.

With the following command I can create and deploy the virtual machines:


$ nixops create network.nix network-virtualbox.nix -d test
$ nixops deploy -d test

The first machine (test1) is responsible for hosting the entire monitoring infrastructure (InfluxDB, Kapacitor, Alerta, Grafana), and the second machine (test2) runs Telegraf and the load tests.

Disnix (my own deployment tool) is responsible for deploying all services, such as InfluxDB, Kapacitor, Alerta, and the database storage backends. Contrary to docker-compose, Disnix does not work with containers (or other Docker objects, such as networks or volumes), but with arbitrary deployment units that are managed with a plugin system called Dysnomia.

Moreover, Disnix can also be used for distributed deployment in a network of machines.

I have packaged all the services and captured them in a Disnix services model that specifies all deployable services, their types, and their inter-dependencies.

If I combine the services model with the NixOps network models, and a distribution model (that maps Telegraf and the test scripts to the test2 machine and the remainder of the services to the first: test1), I can deploy the entire system:


$ export NIXOPS_DEPLOYMENT=test
$ export NIXOPS_USE_NIXOPS=1

$ disnixos-env -s services.nix \
-n network.nix \
-n network-virtualbox.nix \
-d distribution.nix

The following diagram shows a possible deployment scenario of the system:


The above diagram describes the following properties:

  • The light-grey colored boxes denote machines. In the above diagram, we have two of them: test1 and test2 that correspond to the VirtualBox machines deployed by NixOps.
  • The dark-grey colored boxes denote containers in a Disnix-context (not to be confused with Linux or Docker containers). These are environments that manage other services.

    For example, a container service could be the PostgreSQL DBMS managing a number of PostgreSQL databases or the Apache HTTP server managing web applications.
  • The ovals denote services that could be any kind of deployment unit. In the above example, we have services that are running processes (managed by systemd), databases and web applications.
  • The arrows denote inter-dependencies between services. When a service has an inter-dependency on another service (i.e. the arrow points from the former to the latter), then the latter service needs to be activated first. Moreover, the former service also needs to know how the latter can be reached.
  • Services can also be container providers (as denoted by the arrows in the labels), stating that other services can be embedded inside this service.

    As already explained, the PostgreSQL DBMS is an example of such a service, because it can host multiple PostgreSQL databases.

Although the process components in the diagram above can also be conveniently deployed with Docker-based solutions (i.e. as I have explained in an earlier blog post, containers are somewhat confined and restricted processes), the non-process integrations need to be managed by other means, such as writing extra shell instructions in Dockerfiles.

In addition to deploying the system to machines managed by NixOps, it is also possible to use the NixOS test driver -- the NixOS test driver automatically generates QEMU virtual machines with a shared Nix store, so that no disk images need to be created, making it possible to quickly spawn networks of virtual machines, with very small storage footprints.

I can also create a minimal distribution model that only deploys the services required to run the test scripts -- Telegraf, Grafana and the front-end applications are not required, resulting in a much smaller deployment:


As can be seen in the above diagram, there are far fewer components required.

In this virtual network that runs a minimal system, we can run automated tests for rapid feedback. For example, the following test driver script (implemented in Python) will run my test shell script shown earlier:


test2.succeed("test-cpu-alerts")

With the following command I can automatically run the tests on the terminal:


$ nix-build release.nix -A tests

Availability


The deployment recipes, test scripts and documentation describing the configuration steps are stored in the monitoring playground repository that can be obtained from my GitHub page.

Besides the CPU activity alert described in this blog post, I have also developed a memory alert that triggers if too much RAM is consumed for a longer period of time.

In addition to virtual machines and services, there is also deployment automation in place that allows you to easily deploy Kapacitor TICK scripts and Grafana dashboards.

To deploy the system, you need to use the very latest version of Disnix (version 0.10) that was released very recently.

Acknowledgements


I would like to thank my employer, Mendix, for giving me the opportunity to write this blog post. Mendix allows developers to work two days per month on research projects, making projects like these possible.

Presentation


I have given a presentation about this subject at Mendix. For convenience, I have embedded the slides:

by Sander van der Burg (noreply@blogger.com) at November 29, 2020 08:57 PM

November 18, 2020

Tweag I/O

Self-references in a content-addressed Nix

In a previous post I explained why we were eagerly trying to change the Nix store model to allow for content-addressed derivations. I also handwaved that this was a real challenge, but without giving any hint as to why this could be tricky. So let's dive a bit into the gory details and understand some of the conceptual pain points with content-addressability in Nix, which forced us to make some trade-offs in how we handle content-addressed paths.

What are self-references?

This is a self-reference

Théophane Hufschmitt, This very article

A very trivial Nix derivation might look like this:

with import <nixpkgs> {};
writeScript "hello" ''
  #!${bash}/bin/bash

  ${hello}/bin/hello
''

The result of this derivation will be an executable file containing a script that will run the hello program. It will depend on the bash and hello derivations as we refer to them in the file.

We can build this derivation and execute it:

$ nix-build hello.nix
$ ./result
Hello, world!

So far, so good. Let’s now change our derivation to change the prompt of hello to something more personalized:

with import <nixpkgs> {};
writeScript "hello-its-me" ''
  #!${bash}/bin/bash

  echo "Hello, world! This is ${placeholder "out"}"
''

where ${placeholder "out"} is a magic value that will be replaced by the output path of the derivation during the build.

We can build this and run the result just fine

$ nix-build hello-its-me.nix
$ ./result
Hello, world! This is /nix/store/c0qw0gbp7rfyzm7x7ih279pmnzazg86p-hello-its-me

And we can check that the file is indeed who it claims to be:

$ /nix/store/c0qw0gbp7rfyzm7x7ih279pmnzazg86p-hello-its-me
Hello, world! This is /nix/store/c0qw0gbp7rfyzm7x7ih279pmnzazg86p-hello-its-me

While the hello derivation depends on bash and hello, hello-its-me depends on bash and… itself. This is something rather common in Nix. For example, it's rather natural for a C program to have /nix/store/xxx-foo/bin/foo depend on /nix/store/xxx-foo/lib/libfoo.so.

Self references and content-addressed paths

How do we build a content-addressed derivation foo in Nix? The recipe is rather simple:

  1. Build the derivation in a temporary directory /some/where/
  2. Compute the hash xxx of that /some/where/ directory
  3. Move the directory under /nix/store/xxx-foo/

You might see where things will go wrong with self-references: the reference will point to /some/where rather than /nix/store/xxx-foo, and so will be wrong (in addition to leaking a path to what should just be a temporary directory).

To work around that, we would need to compute this xxx hash before the build, but that’s quite impossible as the hash depends on the content of the directory, including the value of the self-references.

However, we can hack our way around it in most cases by allowing ourselves a bit of heuristic. The only assumption that we need to make is that all the self-references will appear textually (i.e. running strings on a file that contains self-references will print all the self-references out).

Under that assumption, we can:

  1. Build the derivation in our /some/where directory
  2. Replace all the occurrences of a self-reference by a magic value
  3. Compute the hash of the resulting path to determine the final path
  4. Replace all the occurrences of the magic value by the final path
  5. Move the resulting store path to its final path

Now you might think that this is a crazy hack − there are so many ways it could break. And in theory you'll be right. But, surprisingly, this works remarkably well in practice. You might also notice that pedantically speaking this scheme isn't exactly content-addressing because of the “modulo the final hash” part. But this is close enough to keep all the desirable properties of proper content addressing, while also enabling self-references, which wouldn't be possible otherwise. For example, the Fugue cloud deployment system used a generalisation of this technique which not only deals with self-references, but with reference cycles of arbitrary length.

However, there’s a key thing that’s required for this to work: patching strings in binaries is generally benign, but the final string must have the same length as the original one. But we can do that: we don’t know what the final xxx hash will be, but we know its length (because it’s a fixed-length hash), so we can just choose a temporary directory that has the right length (like a temporary store path with the same name), and we’re all set!
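
To make the scheme more concrete, here is a small shell sketch of the rewriting trick on a single text file. It is purely illustrative: the directory names are invented, sha256sum and the truncation stand in for Nix's real hashing of the serialised output, and nothing is actually moved into the store:

# temporary build directory, deliberately as long as a real store path
tmpdir=/tmp/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-hello
# same-length placeholder used while hashing
placeholder=/tmp/store/00000000000000000000000000000000-hello

mkdir -p "$tmpdir"
printf '#!/bin/sh\necho "my own path is %s"\n' "$tmpdir" > "$tmpdir/hello"

# 1. normalise self-references to the placeholder and hash the result
hash=$(sed "s|$tmpdir|$placeholder|g" "$tmpdir/hello" | sha256sum | cut -c1-32)
final="/nix/store/$hash-hello"

# 2. rewrite the self-references to the final path (same length, so offsets in
#    binaries would be preserved) and move the result into place
sed -i "s|$tmpdir|$final|g" "$tmpdir/hello"
echo "would now move $tmpdir to $final"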

The annoying thing is that there’s no guarantee that there are no self-references hidden in such a way that a textual replacement won’t catch it (for example inside a compressed zip file). This is the main reason why content-addressability will not be the default in Nix, at first at least.

Non-deterministic builds − the diamond problem strikes back

No matter how hard Nix tries to isolate the build environment, some actions will remain inherently non-deterministic − anything that can yield a different output depending on the order in which concurrent tasks will be executed for example. This is an annoyance as it might prevent early cutoff (see our previous article on the subject in case you missed it).

But more than limiting the efficiency of the cache, this could also hurt the correctness of Nix if we’re not careful enough.

For example, consider the following dependency graph:

Dependency graph for foo

Alice wants to get foo installed. She already built lib0 and lib1 locally. Let’s call them lib0_a and lib1_a. The binary cache contains builds of lib0 and lib2. Let’s call them lib0_b and lib2_b. Because the build of lib0 is not deterministic, lib0_a and lib0_b are different — and so have a different hash. In a content-addressed word, that means they will be stored in different paths.

A simple cache implementation would want to fetch lib2_b from the cache and use it to build foo. This would also pull lib0_b, because it’s a dependency of lib2_b. But that would mean that foo would depend on both lib0_a and lib0_b.

Buggy runtime dependency graph for foo

In the happy case this would just be a waste of space − the dependency is duplicated, so we use twice as much memory to store it. But in many cases this would simply blow-up at some point — for example if lib0 is a shared library, the C linker will fail because of the duplicated symbols. Besides that, this breaks down the purity of the build as we get a different behavior depending on what’s already in the store at the start of the build.

Getting out of this

Nix's foundational paper shows a way out of this by rewriting hashes in substituted paths. This is however quite complex to implement for a first version, so the current implementation settles on a simpler (though not optimal) behavior where we only allow one build for each derivation. In the example above, lib0 has already been instantiated (as lib0_a), so we don't allow pulling in lib0_b (nor lib2_b, which depends on it) and we rebuild both lib2 and foo.

While not optimal − we’ll end-up rebuilding foo even if it’s already in the binary cache − this solution has the advantage of preserving correctness while staying conceptually and technically simple.

What now?

Part of this has already been implemented but there’s still quite a long way forward.

I hope for it to be usable (though maybe still experimental) for Nix 3.0.

And in the meantime stay tuned with our regular updates on discourse. Or wait for the next blog post that will explain another change that will be necessary — one that is less fundamental, but more user-facing.

November 18, 2020 12:00 AM

November 10, 2020

Cachix

Write access control for binary caches

As Cachix is growing, I have noticed a few issues along the way: Signing keys are still the best way to upload content and not delegate trust to Cachix, but users have also found that they can be difficult to manage, particularly if the secret key needs to be rotated. At this point, the best option is to clear out the cache completely, and re-sign everything with a newly generated key.

by Domen Kožar (support@cachix.org) at November 10, 2020 11:00 AM

October 31, 2020

Sander van der Burg

Building multi-process Docker images with the Nix process management framework

Some time ago, I described my experimental Nix-based process management framework that makes it possible to automatically deploy running processes (sometimes also ambiguously called services) from declarative specifications written in the Nix expression language.

The framework is built around two concepts. As its name implies, the Nix package manager is used to deploy all required packages and static artifacts, and a process manager of choice (e.g. sysvinit, systemd, supervisord and others) is used to manage the life-cycles of the processes.

Moreover, it is built around flexible concepts allowing integration with solutions that are not qualified as process managers (but can still be used as such), such as Docker -- each process instance can be deployed as a Docker container with a shared Nix store using the host system's network.

As explained in an earlier blog post, Docker has become such a popular solution that it has become a standard for deploying (micro)services (often as a utility in the Kubernetes solution stack).

When deploying a system that consists of multiple services with Docker, a typical strategy (and recommended practice) is to use multiple containers that have only one root application process. Advantages of this approach are that Docker can control the life-cycles of the applications, and that each process is (somewhat) isolated/protected from other processes and the host system.

By default, containers are isolated, but if they need to interact with other processes, then they can use all kinds of integration facilities -- for example, they can share namespaces, or use shared volumes.

In some situations, it may also be desirable to deviate from the one root process per container practice -- for some systems, processes may need to interact quite intensively (e.g. with IPC mechanisms, shared files or shared memory, or a combination of these), in which case the container boundaries introduce more inconveniences than benefits.

Moreover, when running multiple processes in a single container, common dependencies can also typically be more efficiently shared leading to lower disk and RAM consumption.

As explained in my previous blog post (that explores various Docker concepts), sharing dependencies between containers only works if containers are constructed from images that share the same layers with the same shared libraries. In practice, this form of sharing is not always as efficient as we want it to be.

Configuring a Docker image to run multiple application processes is somewhat cumbersome -- the official Docker documentation describes two solutions: one that relies on a wrapper script that starts multiple processes in the background and a loop that waits for the "main process" to terminate, and the other is to use a process manager, such as supervisord.

I realised that I could solve this problem much more conveniently by combining the dockerTools.buildImage {} function in Nixpkgs (that builds Docker images with the Nix package manager) with the Nix process management abstractions.

I have created my own abstraction function: createMultiProcessImage that builds multi-process Docker images, managed by any supported process manager that works in a Docker container.

In this blog post, I will describe how this function is implemented and how it can be used.

Creating images for single root process containers


As shown in earlier blog posts, creating a Docker image with Nix for a single root application process is very straightforward.

For example, we can build an image that launches a trivial web application service with an embedded HTTP server (as shown in many of my previous blog posts), as follows:


{dockerTools, webapp}:

dockerTools.buildImage {
  name = "webapp";
  tag = "test";

  runAsRoot = ''
    ${dockerTools.shadowSetup}
    groupadd webapp
    useradd webapp -g webapp -d /dev/null
  '';

  config = {
    Env = [ "PORT=5000" ];
    Cmd = [ "${webapp}/bin/webapp" ];
    Expose = {
      "5000/tcp" = {};
    };
  };
}

The above Nix expression (default.nix) invokes the dockerTools.buildImage function to automatically construct an image with the following properties:

  • The image has the following name: webapp and the following version tag: test.
  • The web application service requires some state to be initialized before it can be used. To configure state, we can run instructions in a QEMU virtual machine with root privileges (runAsRoot).

    In the above deployment Nix expression, we create an unprivileged user and group named: webapp. For production deployments, it is typically recommended to drop root privileges, for security reasons.
  • The Env directive is used to configure environment variables. The PORT environment variable is used to configure the TCP port where the service should bind to.
  • The Cmd directive starts the webapp process in foreground mode. The life-cycle of the container is bound to this application process.
  • Expose exposes TCP port 5000 to the public so that the service can respond to requests made by clients.

We can build the Docker image as follows:


$ nix-build

load it into Docker with the following command:


$ docker load -i result

and launch a container instance using the image as a template:


$ docker run -it -p 5000:5000 webapp:test

If the deployment of the container succeeded, we should get a response from the webapp process, by running:


$ curl http://localhost:5000
<!DOCTYPE html>
<html>
<head>
<title>Simple test webapp</title>
</head>
<body>
Simple test webapp listening on port: 5000
</body>
</html>

Creating multi-process images


As shown in previous blog posts, the webapp process is part of a bigger system, namely: a web application system with an Nginx reverse proxy forwarding requests to multiple webapp instances:


{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  sharedConstructors = import ../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir cacheDir tmpDir forceDisableUserChange processManager;
  };

  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager;
  };
in
rec {
  webapp = rec {
    port = 5000;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };
  };

  nginx = rec {
    port = 8080;

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      webapps = [ webapp ];
      inherit port;
    } {};
  };
}

The Nix expression above shows a simple processes model variant of that system, that consists of only two process instances:

  • The webapp process is (as shown earlier) an application that returns a static HTML page.
  • nginx is configured as a reverse proxy to forward incoming connections to multiple webapp instances using the virtual host header property (dnsName).

    If somebody connects to the nginx server with the following host name: webapp.local then the request is forwarded to the webapp service.

Configuration steps


To allow all processes in the process model shown to be deployed to a single container, we need to execute the following steps in the construction of an image:

  • Instead of deploying a single package, such as webapp, we need to refer to a collection of packages and/or configuration files that can be managed with a process manager, such as sysvinit, systemd or supervisord.

    The Nix process management framework provides all kinds of Nix function abstractions to accomplish this.

    For example, the following function invocation builds a configuration profile for the sysvinit process manager, containing a collection of sysvinit scripts (also known as LSB Init compliant scripts):


    profile = import ../create-managed-process/sysvinit/build-sysvinit-env.nix {
    exprFile = ./processes.nix;
    stateDir = "/var";
    };

  • Similar to single root process containers, we may also need to initialize state. For example, we need to create common FHS state directories (e.g. /tmp, /var etc.) in which services can store their relevant state files (e.g. log files, temp files).

    This can be done by running the following command:


    nixproc-init-state --state-dir /var
  • Another property that multiple process containers have in common is that they may also require the presence of unprivileged users and groups, for security reasons.

    With the following commands, we can automatically generate all required users and groups specified in a deployment profile:


    ${dysnomia}/bin/dysnomia-addgroups ${profile}
    ${dysnomia}/bin/dysnomia-addusers ${profile}
  • Instead of starting a (single root) application process, we need to start a process manager that manages the processes that we want to deploy. As already explained, the framework allows you to pick multiple options.

Starting a process manager as a root process


Of all the process managers that the framework currently supports, the most straightforward option to use in a Docker container is supervisord.

To use it, we can create a symlink to the supervisord configuration in the deployment profile:


ln -s ${profile} /etc/supervisor

and then start supervisord as a root process with the following command directive:


Cmd = [
  "${pkgs.pythonPackages.supervisor}/bin/supervisord"
  "--nodaemon"
  "--configuration" "/etc/supervisor/supervisord.conf"
  "--logfile" "/var/log/supervisord.log"
  "--pidfile" "/var/run/supervisord.pid"
];

(As a sidenote: creating a symlink is not strictly required, but makes it possible to control running services with the supervisorctl command-line tool).

Supervisord is not the only option. We can also use sysvinit scripts, but doing so is a bit tricky. As explained earlier, the life-cycle of a container is bound to a running root process (in foreground mode).

sysvinit scripts do not run in the foreground, but start processes that daemonize and terminate immediately, leaving daemon processes behind that remain running in the background.

As described in an earlier blog post about translating high-level process management concepts, it is also possible to run "daemons in the foreground" by creating a proxy script. We can also make a similar foreground proxy for a collection of daemons:


#!/bin/bash -e

_term()
{
    nixproc-sysvinit-runactivity -r stop ${profile}
    kill "$pid"
    exit 0
}

nixproc-sysvinit-runactivity start ${profile}

# Keep process running, but allow it to respond to the TERM and INT
# signals so that all scripts are stopped properly

trap _term TERM
trap _term INT

tail -f /dev/null & pid=$!
wait "$pid"

The above proxy script does the following:

  • It first starts all sysvinit scripts by invoking the nixproc-sysvinit-runactivity start command.
  • Then it registers a signal handler for the TERM and INT signals. The corresponding callback triggers a shutdown procedure.
  • We invoke a dummy command that keeps running in the foreground without consuming too many system resources (tail -f /dev/null) and we wait for it to terminate.
  • The signal handler properly deactivates all processes in reverse order (with the nixproc-sysvinit-runactivity -r stop command), and finally terminates the dummy command causing the script (and the container) to stop.

In addition to supervisord and sysvinit, we can also use Disnix as a process manager by using a similar strategy with a foreground proxy.

Other configuration properties


The above configuration properties suffice to get a multi-process container running. However, to make working with such containers more practical from a user perspective, we may also want to:

  • Add basic shell utilities to the image, so that you can control the processes, investigate log files (in case of errors), and do other maintenance tasks.
  • Add a .bashrc configuration file to make file coloring work for the ls command, and to provide a decent prompt in a shell session (a small sketch follows below).
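
A minimal sketch of such a .bashrc could look as follows (purely illustrative; this is not the file that the framework actually generates):


# enable colored ls output and provide a simple informative prompt
alias ls='ls --color=auto'
export PS1='\u@\h:\w\$ '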

Usage


The configuration steps described in the previous section are wrapped into a function named createMultiProcessImage, which itself is a thin wrapper around the dockerTools.buildImage function in Nixpkgs -- it accepts the same parameters, plus a number of additional parameters that are specific to multi-process configurations.

The following function invocation builds a multi-process container deploying our example system, using supervisord as a process manager:


let
  pkgs = import <nixpkgs> {};

  createMultiProcessImage = import ../../nixproc/create-multi-process-image/create-multi-process-image.nix {
    inherit pkgs system;
    inherit (pkgs) dockerTools stdenv;
  };
in
createMultiProcessImage {
  name = "multiprocess";
  tag = "test";
  exprFile = ./processes.nix;
  stateDir = "/var";
  processManager = "supervisord";
}

After building the image, and deploying a container, with the following commands:


$ nix-build
$ docker load -i result
$ docker run -it --network host multiprocess:test

we should be able to connect to the webapp instance via the nginx reverse proxy:


$ curl -H 'Host: webapp.local' http://localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Simple test webapp</title>
</head>
<body>
Simple test webapp listening on port: 5000
</body>
</html>

As explained earlier, the constructed image also provides extra command-line utilities to do maintenance tasks, and control the life-cycle of the individual processes.

For example, we can "connect" to the running container, and check which processes are running:


$ docker exec -it mycontainer /bin/bash
# supervisorctl
nginx RUNNING pid 11, uptime 0:00:38
webapp RUNNING pid 10, uptime 0:00:38
supervisor>

If we change the processManager parameter to sysvinit, we can deploy a multi-process image in which the foreground proxy script is used as a root process (that starts and stops sysvinit scripts).

We can control the life-cycle of each individual process by directly invoking the sysvinit scripts in the container:


$ docker exec -it mycontainer /bin/bash
$ /etc/rc.d/init.d/webapp status
webapp is running with Process ID(s) 33.

$ /etc/rc.d/init.d/nginx status
nginx is running with Process ID(s) 51.

Although having extra command-line utilities to do administration tasks is useful, a disadvantage is that they considerably increase the size of the image.

To save storage costs, it is also possible to disable interactive mode to exclude these packages:


let
  pkgs = import <nixpkgs> {};

  createMultiProcessImage = import ../../nixproc/create-multi-process-image/create-multi-process-image.nix {
    inherit pkgs system;
    inherit (pkgs) dockerTools stdenv;
  };
in
createMultiProcessImage {
  name = "multiprocess";
  tag = "test";
  exprFile = ./processes.nix;
  stateDir = "/var";
  processManager = "supervisord";
  interactive = false; # Do not install any additional shell utilities
}

Discussion


In this blog post, I have described a new utility function in the Nix process management framework: createMultiProcessImage -- a thin wrapper around the dockerTools.buildImage function that can be used to conveniently build multi-process Docker images, using any Docker-capable process manager that the Nix process management framework supports.

Besides the fact that we can conveniently construct multi-process images, this function also has the advantage (similar to the dockerTools.buildImage function) that Nix is only required for the construction of the image. To deploy containers from a multi-process image, Nix is not a requirement.

There is also a drawback: similar to "ordinary" multi-process container deployments, when it is desired to upgrade a process, the entire container needs to be redeployed, also requiring a user to terminate all other running processes.

Availability


The createMultiProcessImage function is part of the current development version of the Nix process management framework that can be obtained from my GitHub page.

by Sander van der Burg (noreply@blogger.com) at October 31, 2020 03:05 PM

October 22, 2020

Tweag I/O

Nickel: better configuration for less

We are making the Nickel repository public. Nickel is an experimental configuration language developed at Tweag. While this is not the time for the first release yet, it is an occasion to talk about this project. The goal of this post is to give a high-level overview of the project. If your curiosity is tickled but you are left wanting to learn more, fear not, as we will publish more blog posts on specific aspects of the language in the future. But for now, let’s have a tour!

[Disclaimer: the actual syntax of Nickel still being worked on, I'm freely using as-of-yet non-existing syntax for illustrative purposes. The underlying features are however already supported.]

The inception

We, at Tweag, are avid users of the Nix package manager. As it happens, the configuration language for Nix (also called Nix) is a pretty good configuration language, and would be applicable to many more things than just package management.

All in all, the Nix language is a lazy JSON with functions. It is simple yet powerful. It is used to generate Nix’s package descriptions but would be well suited to write any kind of configuration (Terraform, Kubernetes, etc…).

The rub is that the interpreter for Nix-the-language is tightly coupled with Nix-the-package manager. So, as it stands, using the Nix language for anything else than package management is a rather painful exercise.

Nickel is our attempt at answering the question: what would Nix-the-language look like if it was split from the package manager? While taking the opportunity to improve the language a little, building on the experience of the Nix community over the years.

What’s Nickel, exactly ?

Nickel is a lightweight generic configuration language. As such, it can replace YAML as your application's configuration language. Unlike YAML, though, it anticipates large configurations by being programmable. Another way to use Nickel is to generate static configuration files — e.g. in JSON, YAML — that are then fed to another system. Like Nix, it is designed to have a simple, well-understood core: at its heart, it is JSON with functions.

But past experience with Nix also brings some insights on which aspects of the language could be improved. Whatever the initial scope of a language is, it will almost surely be used in a way that deviates from the original plan: you create a configuration language to describe software packages, and next thing you know, somebody needs to implement a topological sort.

Nickel strives to retain the simplicity of Nix, while extending it according to this feedback. Though, you can do perfectly fine without the new features and just write Nix-like code.

Yet another configuration language

At this point you’re probably wondering if this hasn’t already been done elsewhere. It seems that more and more languages are born every day, and surely there already exist configuration languages with a similar purpose to Nickel: Starlark, Jsonnet, Dhall or CUE, to name a few. So why Nickel?

Typing

Perhaps the most important difference with other configuration languages is Nickel’s approach to typing.

Some languages, such as Jsonnet or Starlark, are not statically typed. Indeed, static types can be seen as superfluous in a configuration language: if your program is only run once on fixed inputs, any type error will be reported at run-time anyway. Why bother with a static type system?

On the other hand, more and more systems rely on complex configurations, such as cloud infrastructure (Terraform, Kubernetes or NixOps), leading the corresponding programs to become increasingly complex, to the point where static types are beneficial. For reusable code — that is, library functions — static types add structure, serve as documentation, and eliminate bugs early.

Although less common, some configuration languages are statically typed, including Dhall and CUE.

Dhall features a powerful type system that is able to type a wide range of idioms. But it is complex, requiring some experience to become fluent in.

CUE is closer to what we are striving for. It has an optional and well-behaved type system with strong guarantees. In exchange, one cannot in general write or type higher-order functions, even if some simple functions can be encoded.

Gradual typing

Nickel features a gradual type system. Gradual types are unobtrusive: they make it possible to statically type the reusable parts of your programs, but you are still free to write configurations without any types. The interpreter safely handles the interaction between the typed and untyped worlds.

Concretely, typed library code like this:

// file: mylib.ncl
{
  numToStr : Num -> Str = fun n => ...;
  makeURL : Str -> Str -> Num -> Str = fun proto host port =>
    "${proto}://${host}:${numToStr port}/";
}

can coexist with untyped configuration code like this:

// file: server.ncl
let mylib = import "mylib.ncl" in
let host = "myproject.com" in
{
  host = host;
  port = 1;
  urls = [
    mylib.makeURL "myproto" host port,
    {protocol = "proto2"; server = "sndserver.net"; port = 4242}
  ];
}

In the first snippet, the bodies of numToStr and makeURL are statically checked: wrongly calling numToStr proto inside makeURL would raise an error, even if makeURL is never used. On the other hand, the second snippet is not annotated, and thus not statically checked. In particular, we mix a URL represented as a string together with one represented as a record in the same list. Instead of rejecting this, the interpreter inserts run-time checks, or contracts, so that if makeURL is misused, the program fails with an appropriate error.

Gradual types also let us keep the type system simple: even in statically typed code, if you want to write a component that the type checker doesn't know how to verify, you don't have to type-check that part.

Contracts

Complementary to the static type system, Nickel offers contracts. Contracts offer precise and accurate dynamic type error reporting, even in the presence of function types. Contracts are used internally by Nickel’s interpreter to insert guards at the boundary between typed and untyped chunks. Contracts are available to the programmer as well, to give them the ability to enforce type assertions at run-time in a simple way.

One pleasant consequence of this design is that the exposure of the user to the type system can be progressive:

  • Users writing configurations can just write Nix-like code while ignoring (almost) everything about typing, since typed functions can be called seamlessly from untyped code.
  • Users writing consumers or verifiers of these configurations would use contracts to model data schemas.
  • Users writing libraries would instead use the static type system.

An example of a contract is given in the next section.

Schemas

While the basic computational blocks are functions, the basic data blocks in Nickel are records (or objects in JSON). Nickel supports writing self-documenting record schemas, such as:

{
  host | type: Str
       | description: "The host name of the server."
       | default: "fallback.myserver.net"
  ;

  port | type: Num
       | description: "The port of the connection."
       | default: 4242
  ;

  url | type: Url
      | description: "The URL of the server."
  ;
}

Each field can contain metadata, such as a description or a default value. These are meant to be displayed in documentation, or queried by tools.

The schema can then be used as a contract. Imagine that a function has swapped two values in its output and returns:

{
  host = "myproject.com",
  port = "myproto://myproject.com:1/",
  url = 1
}

Without types, this is hard to catch. Surely, an error will eventually pop up downstream in the pipeline, but how and when? Using the schema above will make sure that, whenever the fields are actually evaluated, the function will be blamed in the type error.

Schemas are actually part of a bigger story involving merging records together, which, in particular, lets the schema instantiate missing fields with their default values. It is very much inspired by the NixOS module system and the CUE language, but that is a story for another time.

Conclusion

I hope that I gave you a sense of what Nickel is trying to achieve. I only presented its most salient aspects: its gradual type system with contracts, and built-in record schemas. But there is more to explore! The language is not ready to be used in real-world applications yet, but a good share of the design presented here is implemented. If you are curious about it, check it out!

October 22, 2020 12:00 AM

October 08, 2020

Sander van der Burg

Transforming Disnix models to graphs and visualizing them

In my previous blog post, I have described a new tool in the Dynamic Disnix toolset that can be used to automatically assign unique numeric IDs to services in a Disnix service model. Unique numeric IDs can represent all kinds of useful resources, such as TCP/UDP port numbers, user IDs (UIDs), and group IDs (GIDs).

Although I am quite happy to have this tool at my disposal, implementing it was much more difficult and time consuming than I expected. Aside from the fact that the problem is not as obvious as it may sound, the main reason is that the Dynamic Disnix toolset was originally developed as a proof-of-concept implementation for a research paper under very high time pressure. As a result, it has accumulated quite a bit of technical debt that, as of today, is still fairly high (but much better than it was when I completed the PoC).

For the ID assigner tool, I needed to make changes to the foundations of the tools, such as the model parsing libraries. As a consequence, all kinds of related aspects in the toolset started to break, such as the deployment planning algorithm implementations.

Fixing some of these algorithm implementations was much more difficult than I expected -- they were not properly documented, not decomposed into functions, had little to no reuse of common concepts and as a result, were difficult to understand and change. I was forced to re-read the papers that I used as a basis for these algorithms.

To prevent myself from having to go through such a painful process again, I have decided to revise them in such a way that they are better understandable and maintainable.

Dynamically distributing services


The deployment models in the core Disnix toolset are static. For example, the distribution of services to machines in the network is done in a distribution model in which the user has to manually map services in the services model to target machines in the infrastructure model (and optionally to container services hosted on the target machines).
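
For illustration, such a static distribution model has the following shape (a minimal sketch that uses the example service and machine names appearing later in this post); each service is simply mapped to a list of targets from the infrastructure model:

{infrastructure}:

{
  testService1 = [ infrastructure.testtarget1 ];
  testService2 = [ infrastructure.testtarget1 ];
  testService3 = [ infrastructure.testtarget2 ];
}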

Each time a condition changes, e.g. the system needs to scale up or a machine crashes and the system needs to recover, a new distribution model must be configured and the system must be redeployed. For big complex systems that need to be reconfigured frequently, manually specifying new distribution models becomes very impractical.

As I have already explained in older blog posts, to cope with the limitations of static deployment models (and other static configuration aspects), I have developed Dynamic Disnix, in which various configuration aspects can be automated, including the distribution of services to machines.

A strategy for dynamically distributing services to machines can be specified in a QoS model, which typically consists of two phases:

  • First, a candidate target selection must be made, in which for each service the appropriate candidate target machines are selected.

    Not all machines are capable of hosting a certain service, for functional and non-functional reasons -- for example, an i686-linux machine is not capable of running a binary compiled for an x86_64-linux machine.

    A machine can also be exposed to the public internet, and as a result, may not be suitable to host a service that exposes privacy-sensitive information.
  • After the suitable candidate target machines are known for each service, we must decide to which candidate machine each service gets distributed.

    This can be done in many ways. The strategy that we want to use is typically based on all kinds of non-functional requirements.

    For example, we can optimize a system's reliability by minimizing the amount of network links between services, requiring a strategy in which services that depend on each other are mapped to the same machine, as much as possible.

Graph-based optimization problems


In the Dynamic Disnix toolset, I have implemented various kinds of distribution algorithms/strategies for all kinds of purposes.

I did not "invent" most of them -- for some, I got inspiration from papers in the academic literature.

Two of the more advanced deployment planning algorithms are graph-based, to accomplish the following goals:

  • Reliable deployment. Network links are a potential source of unreliability in a distributed system -- connections may fail, become slow, or be interrupted frequently. By minimizing the amount of network links between services (by co-locating them on the same machine), their impact can be reduced. To keep deployments from becoming too expensive, this should be done with a minimal number of machines.

    As described in the paper: "Reliable Deployment of Component-based Applications into Distributed Environments" by A. Heydarnoori and F. Mavaddat, this problem can be transformed into a graph problem: the multiway cut problem (which is NP-hard).

    Unless a proof that P = NP exists, it can only be solved in polynomial time with an approximation algorithm that comes close to the optimal solution.
  • Fragile deployment. Inspired by the above deployment problem, I also came up with the opposite problem (as my own "invention") -- how can we make every connection between services a true network link (not a local one), so that we can test a system for robustness, using a minimal number of machines?

    This problem can be modeled as a graph coloring problem (which is NP-hard as well). I used one of the approximation algorithms described in the paper: "New Methods to Color the Vertices of a Graph" by D. Brélaz to implement a solution.

To work with these graph-based algorithms, I originally did not apply any transformations -- because of time pressure, I directly worked with objects from the Disnix models (e.g. services, target machines) and somewhat "glued" these together with generic data structures, such as lists and hash tables.

As a result, when looking at the implementation, it is very hard to get an understanding of the process and how an implementation aspect relates to a concept described in the papers shown above.

In my revised version, I have implemented a general purpose graph library that can be used to solve all kinds of general graph related problems.

Aside from using a general graph library, I have also separated the graph-based generation processes into the following steps:

  • After opening the Disnix input models (such as the services, infrastructure, and distribution models) I transform the models to a graph representing an instance of the problem domain.
  • After the graph has been generated, I apply the approximation algorithm to the graph data structure.
  • Finally, I transform the resolved graph back to a distribution model that should provide our desired distribution outcome.

This new organization provides better separation of concerns, common concepts can be reused (such as graph operations), and as a result, the implementations are much closer to the approximation algorithms described in the papers.

Visualizing the generation process


Another advantage of having a reusable graph implementation is that we can easily extend it to visualize the problem graphs.

When I combine these features together with my earlier work that visualizes services models, and a new tool that visualizes infrastructure models, I can make the entire generation process transparent.

For example, the following services model:


{system, pkgs, distribution, invDistribution}:

let
  customPkgs = import ./pkgs { inherit pkgs system; };
in
rec {
  testService1 = {
    name = "testService1";
    pkg = customPkgs.testService1;
    type = "echo";
  };

  testService2 = {
    name = "testService2";
    pkg = customPkgs.testService2;
    dependsOn = {
      inherit testService1;
    };
    type = "echo";
  };

  testService3 = {
    name = "testService3";
    pkg = customPkgs.testService3;
    dependsOn = {
      inherit testService1 testService2;
    };
    type = "echo";
  };
}

can be visualized as follows:


$ dydisnix-visualize-services -s services.nix


The above services model and corresponding visualization capture the following properties:

  • They describe three services (as denoted by ovals).
  • The arrows denote inter-dependency relationships (the dependsOn attribute in the services model).

    When a service has an inter-dependency on another service, the latter has to be activated first, and the dependent service needs to know how to reach it.

    testService2 depends on testService1 and testService3 depends on both the other two services.

We can also visualize the following infrastructure model:


{
  testtarget1 = {
    properties = {
      hostname = "testtarget1";
    };
    containers = {
      mysql-database = {
        mysqlPort = 3306;
      };
      echo = {};
    };
  };

  testtarget2 = {
    properties = {
      hostname = "testtarget2";
    };
    containers = {
      mysql-database = {
        mysqlPort = 3306;
      };
    };
  };

  testtarget3 = {
    properties = {
      hostname = "testtarget3";
    };
  };
}

with the following command:


$ dydisnix-visualize-infra -i infrastructure.nix

resulting in the following visualization:


The above infrastructure model declares three machines. Each target machine provides a number of container services (such as a MySQL database server, and echo that acts as a testing container).

With the following command, we can generate a problem instance for the graph coloring problem using the above services and infrastructure models as inputs:


$ dydisnix-graphcol -s services.nix -i infrastructure.nix \
--output-graph

resulting in the following graph:


The graph shown above captures the following properties:

  • Each service translates to a node.
  • When an inter-dependency relationship exists between services, it gets translated to a (bi-directional) link representing a network connection (the rationale is that a service that has an inter-dependency on another service interacts with it by using a network connection).

Each target machine translates to a color, which we can represent with a numeric index -- 0 is testtarget1, 1 is testtarget2 and so on.

The following command generates the resolved problem instance graph in which each vertex has a color assigned:


$ dydisnix-graphcol -s services.nix -i infrastructure.nix \
--output-resolved-graph

resulting in the following visualization:


(As a side note: in the above graph, colors are represented by numbers. In theory, I could also use real colors, but if I also want the graph to remain visually appealing, I need to solve a color picking problem, which is beyond the scope of my refactoring objective).

The resolved graph can be translated back into the following distribution model:


$ dydisnix-graphcol -s services.nix -i infrastructure.nix
{
  "testService2" = [
    "testtarget2"
  ];
  "testService1" = [
    "testtarget1"
  ];
  "testService3" = [
    "testtarget3"
  ];
}

As you may notice, every service is distributed to a separate machine, so that every network link between services is a real network connection between machines.

We can also visualize the problem instance of the multiway cut problem. For this, we also need a distribution model that declares, for each service, which target machines are candidates.

The following distribution model makes all three target machines in the infrastructure model a candidate for every service:


{infrastructure}:

{
  testService1 = [ infrastructure.testtarget1 infrastructure.testtarget2 infrastructure.testtarget3 ];
  testService2 = [ infrastructure.testtarget1 infrastructure.testtarget2 infrastructure.testtarget3 ];
  testService3 = [ infrastructure.testtarget1 infrastructure.testtarget2 infrastructure.testtarget3 ];
}

With the following command we can generate a problem instance representing a host-application graph:


$ dydisnix-multiwaycut -s services.nix -i infrastructure.nix \
-d distribution.nix --output-graph

providing me the following output:


The above problem graph has the following properties:

  • Each service translates to an app node (prefixed with app:) and each candidate target machine to a host node (prefixed with host:).
  • When a network connection between two services exists (implicitly derived from having an inter-dependency relationship), an edge is generated with a weight of 1.
  • When a target machine is a candidate target for a service, then an edge is generated with a weight of n², representing a very large number.

The objective of solving the multiway cut problem is to cut edges in the graph in such a way that each terminal (host node) is disconnected from the other terminals (host nodes), in which the total weight of the cuts is minimized.

When applying the approximation algorithm in the paper to the above graph:


$ dydisnix-multiwaycut -s services.nix -i infrastructure.nix \
-d distribution.nix --output-resolved-graph

we get the following resolved graph:


that can be transformed back into the following distribution model:


$ dydisnix-multiwaycut -s services.nix -i infrastructure.nix \
-d distribution.nix
{
  "testService2" = [
    "testtarget1"
  ];
  "testService1" = [
    "testtarget1"
  ];
  "testService3" = [
    "testtarget1"
  ];
}

As you may notice by looking at the resolved graph (in which the terminals testtarget2 and testtarget3 are disconnected) and the distribution model output, all services are distributed to the same machine, testtarget1, making all connections between the services local connections.

In this particular case, the solution is not only close to the optimal solution, but it is the optimal solution.

Conclusion


In this blog post, I have described how I have revised the deployment planning algorithm implementations in the Dynamic Disnix toolset. Their concerns are now much better separated, and the graph-based algorithms now use a general purpose graph library, that can also be used for generating visualizations of the intermediate steps in the generation process.

This revision was not on my short-term planned features list, but I am happy that I did the work. Retrospectively, I regret that I never took the time to finish things up properly after the submission of the paper. Although Dynamic Disnix's quality is still not where I want it to be, it is quite a step forward in making the toolset more usable.

Sadly, it is almost 10 years ago that I started Dynamic Disnix and there is still no official release. The technical debt in Dynamic Disnix is one of the important reasons why I never made one. Hopefully, with this step, I can do it some day. :-)

The good news is that I made the paper submission deadline and that the paper got accepted for presentation. It brought me to the SEAMS 2011 conference (co-located with ICSE 2011) in Honolulu, Hawaii, allowing me to take pictures such as this one:


Availability


The graph library and new implementations of the deployment planning algorithms described in this blog post are part of the current development version of Dynamic Disnix.

The paper: "A Self-Adaptive Deployment Framework for Service-Oriented Systems" describes the Dynamic Disnix framework (developed 9 years ago) and can be obtained from my publications page.

Acknowledgements


To generate the visualizations I used the Graphviz toolset.

by Sander van der Burg (noreply@blogger.com) at October 08, 2020 09:29 PM

October 01, 2020

Cachix

Changes to Garbage Collection

Based on your feedback, I have made the following two changes:

  • When downloading <store-hash>.narinfo, the timestamp of last access is now updated; previously this would happen only with NAR archives. This change allows tools like nix-build-uncached to prevent unneeded downloads while playing nicely with the Cachix garbage collection algorithm!
  • The ordering used by the garbage collection algorithm has changed. Previously, the algorithm ordered paths first by last-accessed timestamp and then by creation timestamp. That worked well until all entries had a last-accessed timestamp, at which point newly created store paths would get deleted first.

by Domen Kožar (support@cachix.org) at October 01, 2020 09:00 AM

September 30, 2020

Tweag I/O

Fully statically linked Haskell binaries with Bazel

Deploying and packaging Haskell applications can be challenging at times, and runtime library dependencies are one reason for this. Statically linked binaries have no such dependencies and are therefore easier to deploy. They can also be quicker to start, since no dynamic loading is needed. In exchange, all used symbols must be bundled into the application, which may lead to larger artifacts.

Thanks to the contribution of Will Jones of Habito¹, rules_haskell, the Haskell Bazel extension, has gained support for fully static linking of Haskell binaries.

Habito uses Bazel to develop, build, test and deploy Haskell code in a minimal Docker container. By building fully-statically-linked binaries, Docker packaging (using rules_docker) becomes straightforward and easy to integrate into existing build workflows. A static binary can also be stripped once it is built to reduce the size of production artifacts. With static binaries, what you see (just the binary) is what you get, and this is powerful.

In the following, we will discuss the technical challenges of statically linking Haskell binaries and how these challenges are addressed in rules_haskell. Spoiler alert: Nix is an important part of the solution. Finally, we will show you how you can create your own fully statically linked Haskell binaries with Bazel and Nix.

Technical challenges

Creating fully statically linked Haskell binaries is not without challenges. The main difficulties for doing so are:

  • Not all library dependencies are suited for statically linked binaries.
  • Compiling template Haskell requires dynamic libraries on Linux by default.

Library dependencies

Like most binaries on Linux, the Haskell compiler GHC is typically configured to link against the GNU C library glibc. However, glibc is not designed to support fully static linking and explicitly depends on dynamic linking in some use cases. The alternative C library musl is designed to support fully static linking.
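
As a quick illustration of the Nix side of this (an aside, not part of the rules_haskell setup described below): recent Nixpkgs revisions expose a variant of the package set built against musl, which can be used to experiment with musl-linked binaries.

# A minimal sketch, assuming a Linux machine with Nix installed.
# pkgsMusl is the ordinary Nixpkgs package set rebuilt against musl.
let
  pkgs = import <nixpkgs> { };
in
pkgs.pkgsMusl.hello  # GNU hello linked against musl instead of glibc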

Relatedly, there may be licensing reasons to not link some libraries statically. Common instances in the Haskell ecosystem are again glibc which is licensed under GPL, and the core Haskell dependency libgmp which is licensed under LGPL. For the latter GHC can be configured to use the core package integer-simple instead of integer-gmp.

Fortunately, the Nix community has made great progress towards fully statically linked Haskell binaries and we can build on much of this work in rules_haskell. The rules_nixpkgs extension makes it possible to import Nix derivations into a Bazel project, and rules_haskell has first class support for Nix-provided GHC toolchains using rules_nixpkgs under the hood. In particular, it can import a GHC toolchain based on musl from static-haskell-nix.

Template Haskell

By default, GHC is configured to require dynamic libraries when compiling template Haskell. GHC's runtime system (RTS) can be built in various combinations of so-called ways. The relevant way in this context is called dynamic. On Linux, GHC itself is built with a dynamic RTS. However, statically linked code targets a non-dynamic RTS. This may sound familiar if you have ever tried to compile code using template Haskell in profiling mode. As the GHC user guide points out, when evaluating template Haskell splices, GHC will execute compiled expressions in its built-in bytecode interpreter, and this code has to be compatible with the RTS of GHC itself. In short, a GHC configured with a dynamic RTS will not be able to load static Haskell libraries to evaluate template Haskell splices.

One way to solve this issue is to compile all Haskell libraries twice, once with dynamic linking and once with static linking. C library dependencies will similarly need to be available in both static and dynamic forms. This is the approach taken by static-haskell-nix. However, in the context of Bazel we found it preferable to only compile Haskell libraries once in static form and also only have to provide C libraries in static form. To achieve this we need to build GHC with a static RTS and to make sure that Haskell code is compiled as position independent code so that it can be loaded into a running GHC for template Haskell splices. Thanks to Nix, it is easy to override the GHC derivation to include the necessary configuration.

Make your project fully statically linked

How can you benefit from this? In this section we will show how you can set up a Bazel Haskell project for fully static linking with Nix. For further details please refer to the corresponding documentation on haskell.build. A fully working example repository is available here. For a primer on setting up a Bazel Haskell project take a look at this tutorial.

First, you need to configure a Nixpkgs repository that defines a GHC toolchain for fully static linking based on musl. We start by pulling in a base Nixpkgs revision and the static-haskell-nix project. Create a default.nix with the following contents.

let
  baseNixpkgs = builtins.fetchTarball {
    name = "nixos-nixpkgs";
    url = "https://github.com/NixOS/nixpkgs/archive/dca182df882db483cea5bb0115fea82304157ba1.tar.gz";
    sha256 = "0193bpsg1ssr93ihndyv7shz6ivsm8cvaxxl72mc7vfb8d1bwx55";
  };

  staticHaskellNixpkgs = builtins.fetchTarball
    "https://github.com/nh2/static-haskell-nix/archive/dbce18f4808d27f6a51ce31585078b49c86bd2b5.tar.gz";
in

Then we import a Haskell package set based on musl from static-haskell-nix. The package set provides GHC and various Haskell packages. However, we will only use the GHC compiler and use Bazel to build other Haskell packages.

let
  staticHaskellPkgs = (
    import (staticHaskellNixpkgs + "/survey/default.nix") {}
  ).approachPkgs;
in

Next we define a Nixpkgs overlay that introduces a GHC based on musl that is configured to use a static runtime system and core packages built with position independent code so that they can be loaded for template Haskell.

let
  overlay = self: super: {
    staticHaskell = staticHaskellPkgs.extend (selfSH: superSH: {
      ghc = (superSH.ghc.override {
        enableRelocatedStaticLibs = true;
        enableShared = false;
      }).overrideAttrs (oldAttrs: {
        preConfigure = ''
          ${oldAttrs.preConfigure or ""}
          echo "GhcLibHcOpts += -fPIC -fexternal-dynamic-refs" >> mk/build.mk
          echo "GhcRtsHcOpts += -fPIC -fexternal-dynamic-refs" >> mk/build.mk
        '';
      });
    });
  };
in

Finally, we extend the base Nixpkgs revision with the overlay. This makes the newly configured GHC available under the Nix attribute path staticHaskell.ghc.

  args@{ overlays ? [], ... }:
    import baseNixpkgs (args // {
      overlays = [overlay] ++ overlays;
    })

This concludes the Nix part of the setup and we can move on to the Bazel part.

You can import this Nixpkgs repository into Bazel by adding the following lines to your WORKSPACE file.

load(
    "@io_tweag_rules_nixpkgs//nixpkgs:nixpkgs.bzl",
    "nixpkgs_local_repository",
)
nixpkgs_local_repository(
    name = "nixpkgs",
    nix_file = "default.nix",
)

Now you can define a GHC toolchain for rules_haskell that uses the Nix built GHC defined above. Note how we declare that this toolchain has a static RTS and is configured for fully static linking. Add the following lines to your WORKSPACE file.

load(
    "@rules_haskell//haskell:nixpkgs.bzl",
    "haskell_register_ghc_nixpkgs",
)
haskell_register_ghc_nixpkgs(
    version = "X.Y.Z",  # Make sure this matches the GHC version.
    attribute_path = "staticHaskell.ghc",
    repositories = {"nixpkgs": "@nixpkgs"},
    static_runtime = True,
    fully_static_link = True,
)

GHC relies on the C compiler and linker during compilation. rules_haskell will always use the C compiler and linker provided by the active Bazel C toolchain. We need to make sure that we use a musl-based C toolchain as well. Here we will use the same Nix-provided C toolchain that is used by static-haskell-nix to build GHC.

load(
    "@io_tweag_rules_nixpkgs//nixpkgs:nixpkgs.bzl",
    "nixpkgs_cc_configure",
)
nixpkgs_cc_configure(
    repository = "@nixpkgs",
    nix_file_content = """
      with import <nixpkgs> { config = {}; overlays = []; }; buildEnv {
        name = "bazel-cc-toolchain";
        paths = [ staticHaskell.stdenv.cc staticHaskell.binutils ];
      }
    """,
)

Finally, everything is configured for fully static linking. You can define a Bazel target for a fully statically linked Haskell binary as follows.

haskell_binary(
    name = "example",
    srcs = ["Main.hs"],
    features = ["fully_static_link"],
)

You can build your binary and confirm that it is fully statically linked as follows.

$ bazel build //:example
$ ldd bazel-bin/example
      not a dynamic executable

Conclusion

If you’re interested in further exploring the benefits of fully statically linked binaries, you might combine them with rules_docker (e.g. through its container_image rule) to build Docker images as Habito have done. With a rich enough set of Bazel rules and dependency specifications, it’s possible to reduce your build and deployment workflow to a bazel test and bazel run!

The current implementation depends on a Nix-provided GHC toolchain capable of fully static linking that is imported into Bazel using rules_nixpkgs. However, there is no reason why it shouldn’t be possible to use a GHC distribution capable of fully static linking that was provided by other means, for example a Docker image such as ghc-musl. Get in touch if you would like to create fully statically linked Haskell binaries with Bazel but can’t or don’t want to integrate Nix into your build. Contributions are welcome!

We thank Habito for their contributions to rules_haskell.


  1. Habito is fixing mortgages and making homebuying fit for the future. Habito gives people tools, jargon-free knowledge and expert support to help them buy and finance their homes. Built on a rich foundation of functional programming and other cutting-edge technology, Habito is a long time user of and contributor to rules_haskell.

September 30, 2020 12:00 AM

September 24, 2020

Sander van der Burg

Assigning unique IDs to services in Disnix deployment models

As described in some of my recent blog posts, one of the more advanced features of Disnix as well as the experimental Nix process management framework is to deploy multiple instances of the same service to the same machine.

To make running multiple service instances on the same machine possible, these tools rely on conflict avoidance rather than isolation (which is typically used for containers). To allow multiple service instances to co-exist on the same machine, they need to be configured in such a way that they do not allocate any conflicting resources.

Although for small systems it is doable to configure multiple instances by hand, this process gets tedious and time consuming for larger and more technologically diverse systems.

One particular kind of conflicting resource that could be configured automatically is numeric IDs, such as TCP/UDP port numbers, user IDs (UIDs), and group IDs (GIDs).

In this blog post, I will describe how multiple service instances are configured (in Disnix and the process management framework) and how we can automatically assign unique numeric IDs to them.

Configuring multiple service instances


To facilitate conflict avoidance in Disnix and the Nix process management framework, services are configured as follows:


{createManagedProcess, tmpDir}:
{port, instanceSuffix ? "", instanceName ? "webapp${instanceSuffix}"}:

let
  webapp = import ../../webapp;
in
createManagedProcess {
  name = instanceName;
  description = "Simple web application";
  inherit instanceName;

  # This expression can both run in foreground or daemon mode.
  # The process manager can pick which mode it prefers.
  process = "${webapp}/bin/webapp";
  daemonArgs = [ "-D" ];

  environment = {
    PORT = port;
    PID_FILE = "${tmpDir}/${instanceName}.pid";
  };
  user = instanceName;
  credentials = {
    groups = {
      "${instanceName}" = {};
    };
    users = {
      "${instanceName}" = {
        group = instanceName;
        description = "Webapp";
      };
    };
  };

  overrides = {
    sysvinit = {
      runlevels = [ 3 4 5 ];
    };
  };
}

The Nix expression shown above is a nested function that describes how to deploy a simple self-contained REST web application with an embedded HTTP server:

  • The outer function header (first line) specifies all common build-time dependencies and configuration properties that the service needs:

    • createManagedProcess is a function that can be used to define process manager agnostic configurations that can be translated to configuration files for a variety of process managers (e.g. systemd, launchd, supervisord etc.).
    • tmpDir refers to the temp directory in which temp files are stored.
  • The inner function header (second line) specifies all instance parameters -- these are the parameters that must be configured in such a way that conflicts with other process instances are avoided:

    • The instanceName parameter (that can be derived from the instanceSuffix) is a value used by some of the process management backends (e.g. the ones that invoke the daemon command) to derive a unique PID file for the process. When running multiple instances of the same process, each of them requires a unique PID file name.
    • The port parameter specifies which TCP port the service binds to. Binding the service to a port that is already taken by another service causes the deployment of this service to fail.
  • In the function body, we invoke the createManagedProcess function to construct configuration files for all supported process manager backends to run the webapp process:

    • As explained earlier, the instanceName is used to configure the daemon executable in such a way that it allocates a unique PID file.
    • The process parameter specifies which executable we need to run, both as a foreground process or daemon.
    • The daemonArgs parameter specifies which command-line parameters need to be propagated to the executable when the process should daemonize on its own.
    • The environment parameter specifies all environment variables. The webapp service uses these variables for runtime property configuration.
    • The user parameter is used to specify that the process should run as an unprivileged user. The credentials parameter is used to configure the creation of the user account and corresponding user group.
    • The overrides parameter is used to override the process manager-agnostic parameters with process manager-specific parameters. For the sysvinit backend, we configure the runlevels in which the service should run.

Although the convention shown above makes it possible to avoid conflicts (assuming that all potential conflicts have been identified and exposed as function parameters), these parameters are typically configured manually:


{ pkgs, system
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager ? "sysvinit"
, ...
}:

let
  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager;
  };

  processType = import ../../nixproc/derive-dysnomia-process-type.nix {
    inherit processManager;
  };
in
rec {
  webapp1 = rec {
    name = "webapp1";
    port = 5000;
    dnsName = "webapp.local";
    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "1";
    };
    type = processType;
  };

  webapp2 = rec {
    name = "webapp2";
    port = 5001;
    dnsName = "webapp.local";
    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "2";
    };
    type = processType;
  };
}

The above Nix expression is both a valid Disnix services model and a valid processes model. It composes two web application process instances that can run concurrently on the same machine by invoking the nested constructor function shown in the previous example:

  • Each webapp instance has its own unique instance name, by specifying a unique numeric instanceSuffix that gets appended to the service name.
  • Every webapp instance binds to a unique TCP port (5000 and 5001) that should not conflict with system services or other process instances.

Previous work: assigning port numbers


Although configuring two process instances is still manageable, the configuration process becomes more tedious and time consuming when the amount and the kind of processes (each having their own potential conflicts) grow.

Five years ago, I already identified a resource that could be automatically assigned to services: port numbers.

I have created a very simple port assigner tool that allows you to specify a global ports pool and a target-specific ports pool. The former is used to assign globally unique port numbers to all services in the network, whereas the latter assigns port numbers that are unique to the target machine where the service is deployed to (this is to cope with the scarcity of port numbers).

Although the tool is quite useful for systems that do not consist of too many different kinds of components, I ran into a number of limitations when I wanted to manage a more diverse set of services:

  • Port numbers are not the only numeric IDs that services may require. When deploying systems that consist of self-contained executables, you typically want to run them as unprivileged users for security reasons. User accounts on most UNIX-like systems require unique user IDs, and the corresponding users' groups require unique group IDs.
  • We typically want to manage multiple resource pools, for a variety of reasons. For example, when we have a number of HTTP server instances and a number of database instances, then we may want to pick port numbers in the 8000-9000 range for the HTTP servers, whereas for the database servers we want to use a different pool, such as 5000-6000.

Assigning unique numeric IDs


To address these shortcomings, I have developed a replacement tool that acts as a generic numeric ID assigner.

This new ID assigner tool works with ID resource configuration files, such as:


rec {
  ports = {
    min = 5000;
    max = 6000;
    scope = "global";
  };

  uids = {
    min = 2000;
    max = 3000;
    scope = "global";
  };

  gids = uids;
}

The above ID resource configuration file (idresources.nix) defines three resource pools: ports is a resource that represents port numbers to be assigned to the webapp processes, uids refers to user IDs and gids to group IDs. The group IDs' resource configuration is identical to the users' IDs configuration.

Each resource attribute supports the following configuration properties:

  • The min value specifies the minimum ID to hand out, max the maximum ID.
  • The scope value specifies the scope of the resource pool. global (which is the default option) means that the IDs assigned from this resource pool to services are globally unique for the entire system.

    The machine scope can be used to assign IDs that are unique for the machine where a service is distributed to. When the latter option is used, services that are distributed to two separate machines may have the same ID (see the sketch below).
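
As a small sketch, a machine-scoped ports pool would look as follows in the configuration format shown above (only the scope value differs from the global ports pool):

{
  ports = {
    min = 5000;
    max = 6000;
    scope = "machine"; # the same port number may be reused on different target machines
  };
}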

We can adjust the services/processes model in such a way that every service will use dynamically assigned IDs and that each service specifies for which resources it requires a unique ID:


{ pkgs, system
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager ? "sysvinit"
, ...
}:

let
  ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager ids;
  };

  processType = import ../../nixproc/derive-dysnomia-process-type.nix {
    inherit processManager;
  };
in
rec {
  webapp1 = rec {
    name = "webapp1";
    port = ids.ports.webapp1 or 0;
    dnsName = "webapp.local";
    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "1";
    };
    type = processType;
    requiresUniqueIdsFor = [ "ports" "uids" "gids" ];
  };

  webapp2 = rec {
    name = "webapp2";
    port = ids.ports.webapp2 or 0;
    dnsName = "webapp.local";
    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "2";
    };
    type = processType;
    requiresUniqueIdsFor = [ "ports" "uids" "gids" ];
  };
}

In the above services/processes model, we have made the following changes:

  • In the beginning of the expression, we import the dynamically generated ids.nix expression that provides ID assignments for each resource. If the ids.nix file does not exist, we generate an empty attribute set. We implement this construction (in which the absence of ids.nix can be tolerated) to allow the ID assigner to bootstrap the ID assignment process.
  • Every hardcoded port attribute of every service is replaced by a reference to the ids attribute set that is dynamically generated by the ID assigner tool. To allow the ID assigner to open the services model in the first run, we provide a fallback port value of 0.
  • Every service specifies for which resources it requires a unique ID through the requiresUniqueIdsFor attribute. In the above example, both service instances require unique IDs to assign a port number, user ID to the user and group ID to the group.

The port assignments are propagated as function parameters to the constructor functions that configure the services (as shown earlier in this blog post).

We could also implement a similar strategy with the UIDs and GIDs, but a more convenient mechanism is to compose the function that creates the credentials, so that it transparently uses our uids and gids assignments.

As shown in the expression above, the ids attribute set is also propagated to the constructors expression. The constructors expression indirectly composes the createCredentials function as follows:


{pkgs, ids ? {}, ...}:

{
  createCredentials = import ../../create-credentials {
    inherit (pkgs) stdenv;
    inherit ids;
  };

  ...
}

The ids attribute set is propagated to the function that composes the createCredentials function. As a result, it will automatically assign the UIDs and GIDs in the ids.nix expression when the user configures a user or group with a name that exists in the uids and gids resource pools.

To make these UIDs and GIDs assignments go smoothly, it is recommended to give a process instance the same process name, instance name, user and group names.

Using the ID assigner tool


By combining the ID resources specification with the three Disnix models: a services model (that defines all distributable services, shown above), an infrastructure model (that captures all available target machines and their properties) and a distribution model (that maps services to target machines in the network), we can automatically generate an ids configuration that contains all ID assignments:


$ dydisnix-id-assign -s services.nix -i infrastructure.nix \
-d distribution.nix \
--id-resources idresources.nix --output-file ids.nix

The above command will generate an ids configuration file (ids.nix) that provides, for each resource in the ID resources model, a unique assignment to services that are distributed to a target machine in the network. (Services that are not distributed to any machine in the distribution model will be skipped, to not waste too many resources).

The output file (ids.nix) has the following structure:


{
  "ids" = {
    "gids" = {
      "webapp1" = 2000;
      "webapp2" = 2001;
    };
    "uids" = {
      "webapp1" = 2000;
      "webapp2" = 2001;
    };
    "ports" = {
      "webapp1" = 5000;
      "webapp2" = 5001;
    };
  };
  "lastAssignments" = {
    "gids" = 2001;
    "uids" = 2001;
    "ports" = 5001;
  };
}

  • The ids attribute contains for each resource (defined in the ID resources model) the unique ID assignments per service. As shown earlier, both service instances require unique IDs for ports, uids and gids. The above attribute set stores the corresponding ID assignments.
  • The lastAssignments attribute memorizes the last ID assignment per resource. Once an ID is assigned, it will not be immediately reused. This is to allow rollbacks and to prevent data from incorrectly getting owned by the wrong user accounts. Once the maximum ID limit is reached, the ID assigner will start searching for a free assignment from the beginning of the resource pool.

In addition to assigning IDs to services that are distributed to machines in the network, it is also possible to assign IDs to all services (regardless of whether they have been deployed or not):


$ dydisnix-id-assign -s services.nix \
--id-resources idresources.nix --output-file ids.nix

Since the above command does not know anything about the target machines, it only works with an ID resources configuration that defines global scope resources.

When you intend to upgrade an existing deployment, you typically want to retain already assigned IDs, while obsolete ID assignments should be removed, and new IDs should be assigned to services that do not have any yet. This is to prevent unnecessary redeployments.

When removing the first webapp service and adding a third instance:


{ pkgs, system
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager ? "sysvinit"
, ...
}:

let
  ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager ids;
  };

  processType = import ../../nixproc/derive-dysnomia-process-type.nix {
    inherit processManager;
  };
in
rec {
  webapp2 = rec {
    name = "webapp2";
    port = ids.ports.webapp2 or 0;
    dnsName = "webapp.local";
    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "2";
    };
    type = processType;
    requiresUniqueIdsFor = [ "ports" "uids" "gids" ];
  };

  webapp3 = rec {
    name = "webapp3";
    port = ids.ports.webapp3 or 0;
    dnsName = "webapp.local";
    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "3";
    };
    type = processType;
    requiresUniqueIdsFor = [ "ports" "uids" "gids" ];
  };
}

And running the following command (that provides the current ids.nix as a parameter):


$ dydisnix-id-assign -s services.nix -i infrastructure.nix -d distribution.nix \
--id-resources idresources.nix --ids ids.nix --output-file ids.nix

we will get the following ID assignment configuration:


{
  "ids" = {
    "gids" = {
      "webapp2" = 2001;
      "webapp3" = 2002;
    };
    "uids" = {
      "webapp2" = 2001;
      "webapp3" = 2002;
    };
    "ports" = {
      "webapp2" = 5001;
      "webapp3" = 5002;
    };
  };
  "lastAssignments" = {
    "gids" = 2002;
    "uids" = 2002;
    "ports" = 5002;
  };
}

As may be observed, since the webapp2 process is in both the current and the previous configuration, its ID assignments will be retained. webapp1 gets removed because it is no longer in the services model. webapp3 gets the next numeric IDs from the resource pools.

Because the configuration of webapp2 stays the same, it does not need to be redeployed.

The models shown earlier are valid Disnix services models. As a consequence, they can be used with Dynamic Disnix's ID assigner tool: dydisnix-id-assign.

Although these Disnix services models are also valid processes models (used by the Nix process management framework), not every processes model is guaranteed to be compatible with a Disnix services model.

For process models that are not compatible, it is possible to use the nixproc-id-assign tool that acts as a wrapper around the dydisnix-id-assign tool:


$ nixproc-id-assign --id-resources idresources.nix processes.nix

Internally, the nixproc-id-assign tool converts a processes model to a Disnix service model (augmenting the process instance objects with missing properties) and propagates it to the dydisnix-id-assign tool.

A more advanced example


The webapp processes example is fairly trivial and only needs unique IDs for three kinds of resources: port numbers, UIDs, and GIDs.

I have also developed a more complex example for the Nix process management framework that exposes several commonly used system services on Linux systems, such as the Apache HTTP server, PostgreSQL and InfluxDB:


{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir forceDisableUserChange processManager ids;
  };
in
rec {
  apache = rec {
    port = ids.httpPorts.apache or 0;

    pkg = constructors.simpleWebappApache {
      inherit port;
      serverAdmin = "root@localhost";
    };

    requiresUniqueIdsFor = [ "httpPorts" "uids" "gids" ];
  };

  postgresql = rec {
    port = ids.postgresqlPorts.postgresql or 0;

    pkg = constructors.postgresql {
      inherit port;
    };

    requiresUniqueIdsFor = [ "postgresqlPorts" "uids" "gids" ];
  };

  influxdb = rec {
    httpPort = ids.influxdbPorts.influxdb or 0;
    rpcPort = httpPort + 2;

    pkg = constructors.simpleInfluxdb {
      inherit httpPort rpcPort;
    };

    requiresUniqueIdsFor = [ "influxdbPorts" "uids" "gids" ];
  };
}

The above processes model exposes three service instances: an Apache HTTP server (that works with a simple configuration that serves web applications from a single virtual host), PostgreSQL and InfluxDB. Each service requires a unique user ID and group ID so that their privileges are separated.

To make these services more accessible/usable, we do not use a shared ports resource pool. Instead, each service type consumes port numbers from their own resource pools.

The following ID resources configuration can be used to provision the unique IDs to the services above:


rec {
  uids = {
    min = 2000;
    max = 3000;
  };

  gids = uids;

  httpPorts = {
    min = 8080;
    max = 8085;
  };

  postgresqlPorts = {
    min = 5432;
    max = 5532;
  };

  influxdbPorts = {
    min = 8086;
    max = 8096;
    step = 3;
  };
}


The above ID resources configuration defines a shared UIDs and GIDs resource pool, but separate ports resource pools for each service type. This has the following implications if we deploy multiple instances of each service type:

  • All Apache HTTP server instances get a TCP port assignment between 8080-8085.
  • All PostgreSQL server instances get a TCP port assignment between 5432-5532.
  • All InfluxDB server instances get a TCP port assignment between 8086-8096. An InfluxDB instance allocates two port numbers: one for the HTTP server and one for the RPC service (the latter's port number is the base port number + 2). We use a step count of 3 so that we can retain this convention for each InfluxDB instance.

Conclusion


In this blog post, I have described a new tool: dydisnix-id-assign that can be used to automatically assign unique numeric IDs to services in Disnix service models.

Moreover, I have described nixproc-id-assign, which acts as a thin wrapper around this tool to automatically assign numeric IDs to services in the Nix process management framework's processes model.

This tool replaces the old dydisnix-port-assign tool in the Dynamic Disnix toolset (described in the blog post written five years ago), which is much more limited in its capabilities.

Availability


The dydisnix-id-assign tool is available in the current development version of Dynamic Disnix. The nixproc-id-assign is part of the current implementation of the Nix process management framework prototype.

by Sander van der Burg (noreply@blogger.com) at September 24, 2020 06:24 PM

September 16, 2020

Tweag I/O

Implicit Dependencies in Build Systems

In making a build system for your software, you codified the dependencies between its parts. But, did you account for implicit software dependencies, like system libraries and compiler toolchains?

Implicit dependencies give rise to the biggest and most common problem with software builds: the lack of hermeticity. Without hermetic builds, reproducibility and cacheability are lost.

This post motivates the desire for reproducibility and cacheability, and explains how we achieve hermetic, reproducible, highly cacheable builds by taking control of implicit dependencies.

Reproducibility

Consider a developer newly approaching a code repository. After cloning the repo, the developer must install a long list of “build requirements” and plod through multiple steps of “setup”, only to find that, yes indeed, the build fails. Yet, it worked just fine for their colleague! The developer, typically not expert in build tooling, must debug the mysterious failure not of their making. This is bad for morale and for productivity.

This happens because the build is not reproducible.

One very common reason for the failure is that the compiler toolchain on the developer’s system is different from that of the colleague. This happens even with build systems that use sophisticated build software, like Bazel. Bazel implicitly uses whatever system libraries and compilers are currently installed in the developer’s environment.

A common workaround is to provide developers with a Docker image equipped with a certain compiler toolchain and system libraries, and then to mandate that the Bazel build occurs in that context.

That solution has a number of drawbacks. First, if the developer is using macOS, the virtualized build context runs substantially slower. Second, the Bazel build cache, developer secrets, and the source code remain outside of the image and this adds complexity to the Docker invocation. Third, the Docker image must be rebuilt and redistributed as dependencies change and that’s extra maintenance. Fourth, and this is the biggest issue, Docker image builds are themselves not reproducible - they nearly always rely on some external state that does not remain constant across build invocations, and that means the build can fail for reasons unrelated to the developer’s code.

A better solution is to use Nix to supply the compiler toolchain and system library dependencies. Nix is a software package management system somewhat like Debian’s APT or macOS’s Homebrew. Nix goes much farther to help developers control their environments. It is unsurpassed when it comes to reproducible builds of software packages.

Nix facilitates use of the Nixpkgs package set. That set is the largest single set of software packages. It is also the freshest package set. It provides build instructions that work both on Linux and macOS. Developers can easily pin any software package at an exact version.
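
For example, pinning the entire package set to an exact revision takes only a few lines of Nix. The following is a minimal sketch; the revision and hash are simply the ones reused from the rules_haskell example earlier on this page.

let
  pkgs = import (builtins.fetchTarball {
    # An exact Nixpkgs revision: every developer and CI machine gets the
    # same compiler toolchain and system libraries.
    url = "https://github.com/NixOS/nixpkgs/archive/dca182df882db483cea5bb0115fea82304157ba1.tar.gz";
    sha256 = "0193bpsg1ssr93ihndyv7shz6ivsm8cvaxxl72mc7vfb8d1bwx55";
  }) {};
in
pkgs.gcc  # a C compiler pinned at an exact version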

Learn more about using Nix with Bazel, here.

Cacheability

Not only should builds be reproducible, but they should also be fast. Fast builds are achieved by caching intermediate build results. Cache entries are keyed based on the precise dependencies as well as the build instructions that produce the entries. Builds will only benefit from a (shared, distributed) cache when they have matching dependencies. Otherwise, cache keys (which depend on the precise dependencies) will be different, and there will be cache misses. This means that the developer will have to rebuild targets locally. These unnecessary local rebuilds slow development.

The solution is to make the implicit dependencies into explicit ones, again using Nix, making sure to configure and use a shared Nix cache.

Learn more about configuring a shared Bazel cache, here.

Conclusion

It is important to eliminate implicit dependencies in your build system in order to retain build reproducibility and cacheability. Identify Nix packages that can replace the implicit dependencies of your Bazel build and use rules_nixpkgs to declare them as explicit dependencies. That will yield a fast, correct, hermetic build.

September 16, 2020 12:00 AM