Nix OS, Guix OS, and declarative package management
In my article on how to pick a Linux distro I wrote a brief section on Nix OS and Guix OS. In this article, I describe in more detail what these Linux distros are, how they differ from all other distros and are far more technologically advanced, and the difficulties this can cause for you as an end user of Linux software.
Declarative package management
Quoting myself in the article How to pick a Linux distro:
You could call the Nix and Guix software package repositories a form of expert system, which by 1980s standards would be considered a form of artificial intelligence. Although nowadays no one thinks of expert systems as AI (the definition of intelligence is always changing), still it is a highly advanced algorithm for building software. Unfortunately, most software nowadays, especially the Python and JavaScript ecosystems, are not well equipped to facilitate declarative package management, and Python and JavaScript are among the world's most often used programming languages. Therefore I think it is safe to say, the world is not yet ready for Nix and Guix, these distros are ahead of their time.
Nix and Guix use what are called declarative package management algorithms with functional package configuration languages: Nix has its own pure, lazy functional language, while Guix uses Guile Scheme in a functional style. What this ball of jargon really means
is, you don't install software onto your computer,
you declare what software should exist on your
computer, and your package manager automatically computes for
you exactly which pieces of software need to be installed to
satisfy your demands. It then installs exactly those pieces of
software, and nothing else. As long as you are careful to declare
all the pieces of software you need, your computer system will work
well.
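The idea of declaring what should exist, and letting the package manager work out what to install, can be sketched in a few lines of Python. This is only a toy illustration, not how Nix or Guix are actually implemented: the package names and the DEPENDS_ON table are made up for the example, and real package definitions carry far more information.

```python
# Toy sketch of declarative installation: the user lists what should
# exist, and the "package manager" computes exactly what must be installed.
# All package names and the dependency table are hypothetical.

DEPENDS_ON = {
    "web-browser": ["gui-toolkit", "tls-library"],
    "text-editor": ["gui-toolkit"],
    "gui-toolkit": ["c-library"],
    "tls-library": ["c-library"],
    "c-library": [],
}


def closure(declared):
    """Return every package needed to satisfy the declaration."""
    needed = set()
    stack = list(declared)
    while stack:
        package = stack.pop()
        if package in needed:
            continue  # already visited through another dependency path
        needed.add(package)
        stack.extend(DEPENDS_ON[package])
    return needed


def plan(declared, installed):
    """Compute what to add and what to drop so the system matches the declaration."""
    wanted = closure(declared)
    return sorted(wanted - installed), sorted(installed - wanted)


if __name__ == "__main__":
    to_add, to_drop = plan(
        ["web-browser"],
        installed={"text-editor", "gui-toolkit", "c-library"},
    )
    print("install:", to_add)
    print("remove: ", to_drop)
```

Note that the user never names gui-toolkit or c-library in the declaration. They are pulled in only because something declared depends on them, and text-editor is dropped because nothing in the new declaration asks for it.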
You declare what software you want to install using a programming language, so right from the start, you need to have some basic computer programming skills to use these Linux distros. Also, you won't be able to understand much about what is going on without some understanding of the algorithms used by the package manager, and what problem these algorithms solve.
In brief:
Ensuring correctness of software configurations: each software component needs to be tested and released, and each release has a version number attached. Ensuring that every component in a larger configuration of interdependent software components uses the correct version of every other component is a very difficult problem to solve.
An expert system of software packages is constructed, and a functional configuration language is used to declare software configurations in a way that lets an algorithm automatically compute component dependencies and build the whole software system correctly. Running the algorithm is called computing a reproducible build.
It solves the problem, but introduces some problems of its own. The algorithm to compute builds is very costly: it takes a lot of time, and a lot of disk space to cache the results of these computations. It also downloads large portions of the database onto your computer, so it requires a lot of network bandwidth. Furthermore, popular programming languages like JavaScript and Python aren't designed for such rigor, and are not well equipped to be used in declarative package management systems.
The problem with software interdependency
The problem that declarative package management solves is that software is made out of many interdependent units of code, and this makes things very complicated. Each piece of software is maintained by a different group of people, in a different organization or city, often in a different part of the world from the maintainers of all the other software components. How do you get software programmed by so many people spread around the world to all work together without breaking down? Well, with rigorous testing, and this is where the concepts of software releases and version numbers become important. A release is a final product that has been fully tested and is safe to use by other software components. Each release has a version number. And the set of interdependent pieces of software and all of their versions is called a configuration.
This is where tracking the interdependence of software components, the software configuration, becomes difficult, because:
the act of fixing a bug in one software component that another software component uses might introduce a new bug in the dependent code. If it is at all practical (and often it is not), all of the software put together in its final configuration needs to be tested as a whole.
you can have cascading software dependencies in which two pieces of software (code A and code B) that you need for your app each require a different version of the same piece of code (code C). Code A may only work with version 2.0 of code C, but code B may only work with version 1.5 of code C. This is known to professionals as dependency hell.
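The version conflict between code A, code B, and code C can be made concrete with a small Python sketch. The REQUIRES table and both resolver functions are hypothetical, written only for illustration. The first resolver assumes only one version of each component can be installed at a time, which is where the conflict appears. The second gives each app its own exact version of each dependency, which is roughly what a package store that keeps every version as a separate entry makes possible.

```python
# Toy illustration of dependency hell. The component names (A, B, C) and
# version numbers come from the text; the data structures are hypothetical.

REQUIRES = {
    "A": {"C": "2.0"},
    "B": {"C": "1.5"},
}


def resolve_single_version(apps):
    """Resolve as if only one version of each component may be installed."""
    chosen = {}
    for app in apps:
        for dep, version in REQUIRES[app].items():
            if chosen.get(dep, version) != version:
                raise RuntimeError(
                    f"{app} needs {dep} {version}, "
                    f"but {dep} {chosen[dep]} was already chosen"
                )
            chosen[dep] = version
    return chosen


def resolve_side_by_side(apps):
    """Resolve by giving each app its own exact version of each component."""
    return {app: dict(REQUIRES[app]) for app in apps}


if __name__ == "__main__":
    try:
        resolve_single_version(["A", "B"])
    except RuntimeError as error:
        print("conflict:", error)
    print("side by side:", resolve_side_by_side(["A", "B"]))
```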
Using a package database
The solution to these software interdependency problems that Nix and Guix offer is to create what some would call an expert system, and have an algorithm compute the best possible set of interdependent software required to satisfy a set of requirements. This is what makes your software configuration declarative: you state the requirements, and the algorithm computes the precise conditions that satisfy your declaration.
All pieces of software are stored in a database, and the portions of this database that you need are downloaded onto your computer. When you change configurations, say by installing new apps, or upgrading your apps, new software is added to the database. Even if you remove software, it remains in the database, but in parts that are inaccessible to the rest of the system. Also, everything in the database is content addressable.
If you discover that the new apps are bad or broken, you can roll back the installation to the previous declaration, and your software will return to the exact configuration state it was in before your upgrade.
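Rollback is cheap because nothing is overwritten when the configuration changes. A minimal Python sketch of that idea follows. The System class and its method names are hypothetical, and a real package database records far more than a set of names.

```python
# Toy model of a package database with rollback. Changing the configuration
# never deletes anything; it only records a new "generation".
# The class and method names are hypothetical.

class System:
    def __init__(self):
        self.store = set()                # everything ever installed stays here
        self.generations = [frozenset()]  # one entry per declaration
        self.current = 0                  # index of the active generation

    def switch(self, declaration):
        """Activate a new declaration and remember it as a generation."""
        packages = frozenset(declaration)
        self.store |= packages
        self.generations.append(packages)
        self.current = len(self.generations) - 1

    def rollback(self):
        """Return to the previous generation, if there is one."""
        if self.current > 0:
            self.current -= 1

    def active(self):
        """Return the set of packages visible in the active generation."""
        return self.generations[self.current]


if __name__ == "__main__":
    system = System()
    system.switch({"editor-1.0"})
    system.switch({"editor-2.0"})   # the upgrade turns out to be broken
    system.rollback()
    print(sorted(system.active()))  # the old configuration is active again
    print(sorted(system.store))     # but both versions are still stored
```

After the rollback, editor-2.0 is still in the store but is no longer part of the active configuration, which matches the description of removed software remaining in inaccessible parts of the database.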
And when you are building larger software applications to distribute to customers, for example Docker images, Flatpaks, or AppImages, the distribution images built by Nix or Guix contain only the declared software and its dependencies, so they are about as small as they can be. They are still often gigabytes in size, but this might be the absolute smallest that a large software application can reasonably be.
And if you take care to make sure that every single last piece of software in the entire computer system has its version properly tracked in the database, then your software build becomes fully reproducible, that is, you can mathematically prove that the software that is built and installed onto one computer is bit-for-bit identical to the software that was built and installed onto other computer systems. This is also made possible by the content-addressable nature of the database: each piece of software is stored with its cryptographic hash, so if the hashes of two pieces of software match, this provides a reasonable guarantee that they are identical, and if the hashes differ, the two pieces of software are certainly different.
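Comparing builds by their hashes can be sketched with Python's standard hashlib module. This is a simplified illustration under stated assumptions: SHA-256 is picked for the example, the "build outputs" are plain byte strings, and the ContentStore class is hypothetical. How Nix and Guix actually compute and use hashes is more involved.

```python
# Toy content-addressable store: each entry is filed under the hash of its
# own bytes, so identical content always gets the identical key.
# ContentStore and builds_match are hypothetical names for this sketch.

import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    def __init__(self):
        self._entries = {}

    def add(self, data: bytes) -> str:
        """Store the data under its own hash and return that hash."""
        key = content_hash(data)
        self._entries[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._entries[key]


def builds_match(output_a: bytes, output_b: bytes) -> bool:
    """Two build outputs are treated as identical if their hashes agree."""
    return content_hash(output_a) == content_hash(output_b)


if __name__ == "__main__":
    built_on_my_machine = b"compiled program bytes"
    built_on_your_machine = b"compiled program bytes"
    print(builds_match(built_on_my_machine, built_on_your_machine))
```

Two machines only need to exchange the short hash strings to check whether their builds agree; neither has to send the other the full build output.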
This property of reproducible builds is very attractive to engineers who might be responsible for ensuring some reasonable guarantee of correctness of the software on the computer systems they maintain, especially if people's valuable private information is at stake, or even if lives hang in the balance. And this is really one of the goals of both Nix OS and Guix OS: to demonstrate that it is possible to have reproducible software builds of every last piece of software on the operating system. I would say they have proved the concept; it is indeed possible.
Problems inherent in declarative package management
If you use Nix or Guix as your distro of choice, your computer will accumulate hours upon hours of time running the calculations necessary to satisfy the system software configurations that you have declared. It will spend many more hours downloading and installing the software, often downloading dozens of gigabytes of code, and it might have to spend even more hours building and testing that software on your computer.
When you ask Nix or Guix to install new software, you change your package configuration, and this launches the software dependency calculations all over again. Calculations and software bundles are saved in the database, so you don't always need to compute everything again from scratch. But it is common to see software installation take anywhere from 30 minutes to several hours, depending on what software you are installing.
Also, without regular garbage collection, which is removing old calculations and software bundles in the database that are no longer used, you may end up with dozens of gigabytes of space on your computer's hard drive being wasted. But every time you collect garbage, you lose the results of those calculations for those pieces of software, and they may need to be re-computed if you ever roll back to a previous configuration, or if you install other new software that still needs those older calculations.
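Garbage collection in this setting amounts to keeping everything reachable from the configurations you still care about and freeing the rest. The Python sketch below shows that idea as a simple mark-and-sweep pass. The function names, the entry names, and the references table are all hypothetical, chosen only for illustration.

```python
# Toy mark-and-sweep garbage collector for a package database.
# "roots" are the entries the user still wants (for example the packages of
# the configurations being kept); "references" maps each entry to the
# entries it depends on. All names here are hypothetical.

def reachable(roots, references):
    """Mark phase: find every entry reachable from the roots."""
    live = set()
    stack = list(roots)
    while stack:
        entry = stack.pop()
        if entry in live:
            continue
        live.add(entry)
        stack.extend(references.get(entry, ()))
    return live


def collect_garbage(store, roots, references):
    """Sweep phase: split the store into entries to keep and entries to free."""
    live = reachable(roots, references)
    kept = {entry for entry in store if entry in live}
    freed = {entry for entry in store if entry not in live}
    return kept, freed


if __name__ == "__main__":
    store = {"app-1.0", "app-2.0", "lib-1.0", "lib-2.0"}
    references = {"app-1.0": ["lib-1.0"], "app-2.0": ["lib-2.0"]}
    kept, freed = collect_garbage(store, ["app-2.0"], references)
    print("kept: ", sorted(kept))
    print("freed:", sorted(freed))
```

The trade-off described above is visible here: once app-1.0 and lib-1.0 are freed, rolling back to the old configuration means they have to be downloaded or built again.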
Probably the biggest problem that I see with declarative package management is that the world of software engineering doesn't yet seem ready to use it. Two of the world's most popular programming languages, Python and JavaScript, have their own package management systems which are not as rigorously declarative as Nix or Guix. Attempts have been made to translate the Python and JavaScript package databases into a form that Nix or Guix can use to compute software configurations in a way that results in reproducible builds, but there is usually not enough information in these databases to do this properly. Furthermore, Python and JavaScript developers tend not to be overly concerned with correctly, accurately built software, and Python programmers have been known to get testy when dealing with Nix OS people.
Python and JavaScript are scripting languages that are easy to use, and quick to write for non-professionals. Mathematical rigor is just not a concern of many programmers in the milieu of Python or JavaScript applications. But Python and JavaScript applications can also be among the most popular applications in their respective fields of use, especially data science and machine learning, and so you can't just cut them out of your operating system.
Conclusions
Declarative package management, and operating systems like Nix and Guix which use this technique, are indeed fascinating and may become how software is engineered in the future. But there are problems with it that have yet to be solved, problems with:
requiring end users to learn the package configuration language and declare software configurations,
how computationally intensive these algorithms are,
how much code must be downloaded over the internet and stored on your computer,
how very popular modern software applications are not equipped to be built in a declarative way.
Nix OS and Guix OS are ahead of their time. Personally, I think this level of rigor in software installation will become more popular in the future. As huge sums of money, and even people's lives, are increasingly at the mercy of computer software, correctness and rigor in building tractable software will be increasingly in demand. But absent a number of costly or deadly software mistakes, or government regulations, that would force software engineering companies to build such rigorous systems, this technology will probably continue to be used only in niche applications.