Last weekend we had what I consider to be the very successful Arch Conf 2020. This included a talk by Michael Stapelberg about distri, his Linux distribution to research fast package management.
Michael showed an example of installing QEMU in Arch vs distri, very similar to the results given on his post about Linux package managers being slow. There are a couple of examples of installing packages in that blog post – one pacman comes out looking quite good, one it comes out looking OK (at least it beats apt on Debian and dnf on Fedora…).
Before I get into the teardown of these comparisons, I will point out that I am not disagreeing that distri is a very fast package manager. It is designed for pure speed, so it should definitely be the fastest! Also, I am not disagreeing that pacman could be faster in many areas. What I am concerned about is the conclusion that “The difference between the slowest and fastest package managers is 30x!“. This conclusion assumes that the package manager is the difference and not how the distribution chooses to implement the use of the package manager.
Lets look at the first comparison – ack. That is not me agreeing with myself, but rather a small perl program that depends on a single perl library (perl-file-next). Assuming perl is installed in the base image for the distros being compared, this is a reasonable comparison, if not slightly small. However, looking at the output from the installs (which is not provided in full for all package manager + distro combinations), we can see that is not the case. Note, Fedora and dnf require 114MB of downloads compared to Arch and pacman with 6.5MB. It appears that Arch is the only distro to have perl installed by the base image, so this makes pacman look better. The NixOS image does not even have coreutils installed (and why is that needed for ack?)!
This gets worse for the qemu comparison. Given the video comparison in Michael’s talk, I will just focus on Arch + pacman vs distri. Lets look at the dependency lists for the two packages. For distri we get:
$ wget https://repo.distr1.org/distri/supersilverhaze/pkg/qemu-amd64.meta.textproto
$ grep runtime_dep qemu-amd64-4.2.0-9.meta.textproto
On a fresh Arch install, the only package in that list that would not be installed is pixman, which has no futher dependencies (beyond glibc), so that would require two packages to be downloaded. However compare that to the Arch package:
$ pacman -Si qemu
Depends On : virglrenderer sdl2 vte3 libpulse brltty seabios gnutls
libpng libaio numactl libnfs lzo snappy curl vde2
libcap-ng spice libcacard usbredir libslirp libssh zstd
That is a lot more dependencies? Lets quantify this with a number:
$ pactree -s -u base | sort > base.deps
$ pactree -s -u qemu | sort > qemu.deps
$ comm -23 qemu.deps base.deps | wc -l
That is very different! Why? Because Arch’s qemu package provides the graphical frontend. However, Arch does have a qemu-headless package, and …
$ pactree -s -u qemu-headless | sort > qemu-headless.deps
$ comm -23 qemu-headless.deps base.deps | wc -l
That is getting to be a fairer comparison. But we still are not comparing only package manager differences. In fact, this comparison is heavily skewed towards how a distribution chooses to package the software. Pacman appears at a disadvantage, because Arch packages all files for the software in a single package, and does not split binaries, libraries, docs, include files, etc into separate packages. That is a distribution decision, and not a package manager decision – all package managers in the comparison list are capable of splitting packages into smaller units. So maybe not apples to oranges, but rather oranges to orange segments? I don’t think I am good at analogies!
Finally, I’ll note comparisons “include metadata updates such as transferring an up-to-date package list“. This seems reasonable at first glance. However, consider two distributions, where the one has a small package set, and the second includes all the packages from the first plus many more. The second distro would take longer to refresh the packages metadata, but that does not mean the package manager is slower! I need to look at distri closer, but my guess from looking at its repo layout is that it downloads the metadata for a package on demand and includes a full dependency list so does not rely on any transitivity. This appears to be a nice trick, but a distro using pacman could be set-up such that package database refreshes were only needed when performing a system update and not when installing packages. Again, Arch makes the choice not to support older versions of packages and removes them from download immediately on updates, but that is not a pacman limitation.
In summary, package managers are slow… But some distributions decide to make them slower! So we need to be careful when drawing conclusions of relative speed of package managers when more variables are at play.
I appreciate the effort, but I am really struggling to see the problem distri tries to solve anyway. Which modern package manager has speed problems to the point it’s considered an issue? Waiting for few more seconds is not a thing everyone care about..
I think it is trying to answer the question of “can things be more efficient?”
Michael’s talk was very interesting and thought-provoking, and now we’re here. I like it.
I first learned of Stapelberg when he wrote a post about leaving Debian (and giving Arch Linux a try), which makes Allan’s post a bit more relevant.
I haven’t really thought about the differences between Debian and Arch when I read Michael’s “Linux package managers are slow,” but it makes a bit more sense after reading this.