The Great Pacman Bug Hunt of 2012

This is a story about a recent issue discovered in pacman, the Arch Linux package manager, and the difficulties we had hunting it down… The story is long, but so was the process of finding the bug.

It all started on a warm summer’s night (in my timezone and location… – it was probably cold and daytime for the other main pacman developers) with the reporting of FS#27805: “[pacman] seg faults when removing firefox”. Of course, my initial reaction was “bull shit” as we all know there are no bugs in the pacman code. But this was only a couple of weeks since pacman-4.0 was moved into the Arch Linux [core] repo so there was an ever so slight possibility it was real.

Luckily for us, the user reporting the bug was very helpful and installed a version of pacman with debugging symbols and gave us a full backtrace. It was very clear where the segfault was occuring:

#0 0xf7fbd4e7 in _alpm_pkg_cmp (p1=0x8128aa0, p2=0x0) at package.c:644

That function is called in the package removal process when we check that a file that is going to be removed with a package is not also owned by another package (which would require someone using -Sf when they should not). If the package in the local database is the same as the one being removed, we do not need to run this check, and hence the test. As you can see above, for some reason _alpm_pkg_cmp is being passed a null pointer as the package from the local database and KABOOM!

So the question was, how do we get a null value for the package from our local database? Given pacman runs through the list of local packages on each package removal, this null entry must have been generated on the removal of the previous package. Here is a bit of background on how package information is stored in pacman. Package information is stored in a hash table that also provides access to the data as a linked list. This provides us with fast look-up by a package’s name but also allows us to loop through the (generally sorted) package list. Now the hash table code is fairly new (first introduced in pacman-3.5) and the removal of items from a hash with collision resolution done by linear probing is not straight forward, so there could be a bug. Dan pointed his finger my way as I wrote the original hash table code and I pointed my finger his way as he made optimizations to the removal part. But it turns out that both of us were not thinking too hard. It is the list that is being corrupted and that has items removed using code that has been around for years. Despite that, the whole hash table and linked list removal code got an in depth review and no issues were found.

We were stumped. Looking at the the debug output from pacman, we could see that a file that actually did not exist on the system was being “removed” right before the crash, but that is not uncommon and appeared to be handled correctly so was unlikely to be the cause. So back to the reporter to see if we could get more information to replicate. He was very helpful and provided us with a copy of his local package database. We created a chroot with exactly the same packages and had no luck replicating. The user even provided us with a complete copy of his chroot where the error was occurring, but again there was no luck replicating. It must be something specific to that users system. Right? Well, even re-extracting the tarball of the chroot the user provided us onto his own system made the bug go away. All in all, a great candidate for being “not a bug”….

Until on another warm summers evening, while being my usual extremely helpful self on IRC, someone mentioned they were getting a segfault while removing packages. A bug report was filed and, again, the user was extremely helpful and the backtrace provided was exactly the same. A core dump showed us there was definitely something wrong with the linked list. Well… bugger! This bug appears real. Again the red-herring of the removal of a non-existent file was shown in the debug log, but it would be very, very strange for that to break the linked list of package information so was ruled out.

It was time to find a reproducer! So I created a chroot and set this script running:

ret=0
while (( ! ret )); do
  pkg=$(pacman -Sql extra | shuf -n1)
  pacman -S --noconfirm $pkg
  pacman -R --noconfirm -s $pkg
  ret=$?
done

Within five minutes I could replicate the segfault. (It turns out I was very lucky as I ran the same script again for over four hours and did not strike the issue.) Now it was time to get debugging!

The first thing I did was print some debugging info in the linked list node removal code, but for some reason the node removal just before the segfault did not print anything. I was only printing information when removing a node from the middle of the list (because that is where the package causing this issue was located), but just to be sure I also added debug statements for the case of removing the head and tail nodes. And then pacman told me it was removing a node from the end of the list… “Why do you think that package is a the end of the list pacman?”, I asked. “Because the head node’s prev entry tells me it IS the end of the list”, replied pacman. “Oh, crap”, I said. “So it does!” Something was clearly wrong here.

It was time to investigate all removal operations on that list. So I printed the entire linked list before and after each package removal and found the error actually occurred before the removal operation even started. The initial list of the local database passed to the removal operation was already broken with the pointer to the tail entry not pointing to the tail. That was good to know as we had thoroughly reviewed the removal code and not found any issues.

This lead me to believe that the error must occur when reading in the local database. Next step: print out the linked list at the end of reading in the local database. But that was completely fine. So somewhere between reading in the local database and using it, things got broken. And, what do we do with the local database between reading it in and removing items from it? The only place where we modify the local database between those points is when it gets sorted by the package names. Sure enough, the pointer to the tail of the linked list is good going into the sort and bad coming out.

This limited the error to two functions: alpm_list_msort or alpm_list_mmerge. These implement a merge sort. Essentially alpm_list_msort recursively calls itself, dividing the list up into smaller pieces until it can not be divided any further and they are then they are merged in sorted order by alpm_list_mmerge. I had just started staring at the code when I saw something that seemed too obvious for such a hard to track down bug. My exact words on IRC were “I think I can fix this…”. And sure enough I could.

It turns out that when alpm_list_msort split a list into two, it did not set the pointer to the tail nodes in the two new lists correctly (or at all…). So a two line addition and we have the bug fixed. It turns out this bug had been present since the start of 2007. So I am still slightly amazed that we did not see it before now and when it did appear that we got a second report of it so quickly.

And why could we not reproduce the issue even with a copy of a chroot where it was occurring? It is entirely dependent on the order the directory entries are returned from the disk. This determined which package was pointed to as the “tail” of the sorted package list. The package incorrectly referred to as the tail had to be removed during a removal operation, and also not be the last package removed, to expose the bug. Given most systems will have many hundreds of packages on them and removal operations tend to involve one or a few packages, this is a fairly rare occurrence. But even if it occurred only a fraction of a percent of removal operations, I think we should have ran into this bug before now. I guess more people probably did experience the issue, but then could not immediately replicate and did not experience the issue again so did not report it.

And that is the end of the story of one of the most frustrating bugs I have ever managed to track down. A big thank you to the two users who installed versions of pacman with debug symbols and provided us backtraces, coredumps and entire chroots! Without their help, we would probably still be not entirely convinced that the bug was real and it would still be hiding away in the pacman source code.

Pacman Package Signing – 4: Arch Linux

I have previously covered the more technical aspects of the implementation of PGP signing of packages and repository databases in pacman. You can read the previous entries here:

Since then, pacman-4.0 has been released and has been in the [testing] repository in Arch Linux for a while. That means that the signing implementation is starting to get some more widespread usage. No major issues have been found, but there are some areas that could be improved (e.g. the handling of the lack of signatures when installing packages with -UFS#26520 and FS#26729). And it has successfully detected a “bad package” in my repo… (well, not really a bad package, but a bad signature. Lesson: do not try creating detached signatures for multiple files at once because gnupg is crap…).

The Arch repos have been gradually preparing for the package signature checking in pacman-4.0. Support for uploading PGP signatures with packages was added in April and was made mandatory from the beginning of November. As of today, 100% of the packages in the [core] repo and approximately 71% of [extra] and 45% of [community] are signed.

So all the components are coming together nicely. But how does this work from a practical standpoint? I’ll start with setting up the pacman PGP keyring and pacman.conf.

When first installing pacman-4.0, you should initialize your pacman keyring using pacman-key --init. This creates the needed keyring files (pubring.gpg, secring.gpg) with the needed permissions, updates the trust database (obviously empty at this point…), and generates a basic configuration file. It also generates the “Pacman Keychain Master Key”, which is your ultimate trust point for starting a PGP web of trust. You may want to change the default keyserver in the configuration file (/etc/pacman.d/gnupg/gpg.conf) as some people have issues connecting to it.

The set-up of your pacman.conf file is somewhat a matter of personal preference, but the values I use are probably reasonable… I have the global settings for signature checking as the default value (Optional TrustedOnly). This basically sets the need for signatures to be optional, but if they are there then the signature has to be from a trusted source. See the pacman.conf man page for more details. For the Arch Linux repos with all packages signed, I set PackageRequired which forces packages to be signed but not databases. (For the small repo I provide, I use Required as both packages and databases are signed.)

Lets look at some output when installing a signed package:

# pacman -S gcc-libs
warning: gcc-libs-4.6.2-3 is up to date -- reinstalling
resolving dependencies...
looking for inter-conflicts...
 
Targets (1): gcc-libs-4.6.2-3
 
Total Installed Size: 2.96 MiB
Net Upgrade Size: 0.00 MiB
 
Proceed with installation? [Y/n]
(1/1) checking package integrity [######################] 100%
error: gcc-libs: key "F99FFE0FEAE999BD" is unknown
:: Import PGP key EAE999BD, "Allan McRae ", created 2011-06-03? [Y/n] y
(1/1) checking package integrity [######################] 100%
error: gcc-libs: signature from "Allan McRae " is unknown trust
error: failed to commit transaction (invalid or corrupted package (PGP signature))
Errors occurred, no packages were upgraded.

As you can see, pacman struck a package that had a signature from an unknown key. It then asks if you would like to import that key. Given the PGP key fingerprint matches that published in multiple places, importing that key seems fine. Then pacman errors out due to that key not being trusted. Well, that Allan guy seems reasonably trustworthy… so I could just locally sign that key using pacman-key --lsign EAE999BD and that key will now be trusted enough to install packages.

Validating every Arch Linux Developer’s and Trusted User’s PGP key would soon become annoying as there are a fair number of them (35 devs and 30 TUs – with some overlap). To make this (a bit…) simpler, five “Master Keys” have been provided for the Arch Linux repositories. The idea behind these keys is that all developer and TU keys are signed by these keys and you only need to import and trust these keys in order to trust all the keys used to sign packages. These key fingerprints will be published in multiple places so that the user can have confidence in them (see the bottom of this post for a listing of the fingerprints obtained relatively independently of those listed on the Arch website).

To set-up your pacman keyring with these keys, you can do something like:

for key in FFF979E7 CDFD6BB0 4C7EA887 6AC6A4C2 824B18E8; do
    pacman-key --recv-keys $key
    pacman-key --lsign-key $key
    printf 'trustn3nquitn' | gpg --homedir /etc/pacman.d/gnupg/
        --no-permission-warning --command-fd 0 --edit-key $key
done

That will import those keys into your keyring and locally sign them. But that is not quite enough as those keys are not used to sign packages themselves. In order for pacman to trust PGP keys signed by the master keys you have to assign some level of trust to the master keys. The final line gives the master keys “marginal” trust. Note I use gpg directly rather than pacman-key as pacman-key does not understand the --command-fd option. You could use pacman-key --edit-key if you wanted to manually type in the commands to set the trust level. By default, the PGP web of trust is set up such that if a key is signed by three keys of marginal trust, then that key will be trusted. (We have five master keys rather than the minimal three so that we can revoke two – a worst case scenario… – and still have our packages trusted.) Note that setting the master keys to have marginal trust serves as a further safety mechanism as multiple keys would need to be hijacked to create a key that is trusted by the pacman keyring.

Now that the five master keys are nicely imported into your pacman keyring, any time pacman strikes a package from the Arch Linux repos with a signature from a key it does not know, it will import the key and it will automatically be trusted. At least that is the idea… We are still in a transition period so not all Developer and Trusted User keys are fully signed yet by the master keys yet, but we are not too far off. In the future we might provide a pacman-keyring package that streamlines this process a bit, or at least will save the individual downloading of each packager’s key.

That just leaves the signing of the databases, but that is a story for another day!


Arch Linux Master Key fingerprints:
    Allan McRae – AB19 265E 5D7D 2068 7D30 3246 BA1D FB64 FFF9 79E7
    Dan McGee – 27FF C476 9E19 F096 D41D 9265 A04F 9397 CDFD 6BB0
    Ionuț Mircea Bîru – 44D4 A033 AC14 0143 9273 97D4 7EFD 567D 4C7E A887
    Pierre Schmitz – 0E8B 6440 79F5 99DF C1DD C397 3348 882F 6AC6 A4C2
    Thomas Bächler – 6841 48BB 25B4 9E98 6A49 44C5 5184 252D 824B 18E8

Pacman Package Signing – 3: Pacman

And on with the “final” component of the package signing saga… I have previously posted about signing packages and databases and managing the PGP keyring, which was all preparatory work for pacman to be able to verify the signatures.

In the end, most people will not notice pacman verifying signatures unless something goes wrong (at least once it is configured). You will see the same “checking package integrity” line, but instead of verifying the packages md5sum, the PGP signature will be checked if available. But implementing this required substantial reworking of the libalpm backend, with the adding of signature verification abilities through the use of the gpgme library, adding flexible configuration options to control repo and package signature verification, changes to how and when repo databases get loaded (so that we can error out early if the repo signature is bad), and the list goes on… The majority of this was done by Dan McGee, who is the lead pacman developer. In fact, looking at the git shortlog for this development cycle:

$ git shortlog -n -s --no-merges maint..
   296  Dan McGee
   128  Dave Reisner
   124  Allan McRae
   ...

(followed by 18 other contributors with 11 or less commits each). So Dan takes the clear lead with about 50% of all commits in this developmental cycle, while the battle for second place remains intensely competed for!

So what have we ended up with? My opinion is ever so slightly biased, but I think we have ended up with the most complete and flexible package signing implementation yet. Most other package managers signature checking is simply a call to gpgv, which trusts any signature in your keyring. With the more complicated solution using gpgme, pacman has the complete concept of the web of trust, allowing for very precise keyring management. We not only sign packages, but sign databases too. Importantly, we can add expiry times to those signatures, which together prevents a malicious mirror holding back individual package updates or deliberately not providing any updates at all. As an aside, we also now protect against the “endless data attack” where an attacker sends an endless data stream instead of the requested file. Together that covers the most well reported avenues of attack on package managers (I hesitate to say “all” despite not knowing of any others because someone will prove me wrong!).

Onward to the actual use of signature checking in pacman. The main adjustment needed to be made is the addition of the SigLevel directive to pacman.conf. This can be specified at a global level and also on a per-repo basis. The SigLevel directive takes three main values: Required, which forces signature checking to be performed; Optional (default), which will check signatures if present but unsigned packages and databases will be accepted; and Never, which sets no signature checking to be preformed. More fine grained control can be added by prefixing these options with Database and Package and combining multiple options. For example, I have a local repo that has a signed database but not all packages have signatures. So I use SigLevel = Optional for my global default and add SigLevel = DatabaseRequired to enforce the database to be validly signed for that repo. Alternatively, I could use SigLevel = DatabaseRequired PackageOptional to explicitly achieve the same result. You can also specify the level of trust needed in a signing key using the TrustedOnly (default) and TrustAll options. The former will only accept a key if it is fully trusted by your PGP keyring, while the latter only requires the key to be present in the keyring (much like using gpgv).

As I wrote earlier, there is very little change from a users perspective once configured. About the only thing that is really noticeable is that pacman will attempt to download a signature for each database it downloads when the database SigLevel is set to Required or Optional. For example:

$ pacman -Syu
:: Synchronizing package databases...
 allanbrokeit           1464.0B  540.5K/s 00:00:00 [######################] 100%
 allanbrokeit.sig        287.0B    7.0M/s 00:00:00 [######################] 100%
...

Beyond that, the checking of PGP signatures occurs during the usual package integrity check stage so will go largely unnoticed unless something goes wrong. This is both a good thing (we all like pacman because of its simplicity) and a bad thing (as the large amount of work done here is not particularly visible to the user). So when everything with package are repo database signing just works for you, remember to thank your local pacman developer (and if it all goes wrong, it was not our fault…).

Pacman Package Signing – 2: Pacman-key

In this second part of the ongoing series of articles about the implementation of package signing in pacman, I am going to focus on keyring management and the new tool pacman-key that is provided to help with this. You can read the previous entry covering makepkg and repo-add here.

The way in which the PGP keyring for pacman is managed will be an essential aspect of the security of your system. The keyring (in combination with configuration options for pacman itself), will control which package and database signatures that you trust and thus what packages get onto your system. In fact, I’m still not entirely sure how best to set-up the keyring in terms of importing keys and setting their trust levels as the only repo I currently use that has full signing is my own and for that I can just add my own key with ultimate trust. Adding my key with ultimate trust would not be ideal for other people to do, but then again it may be acceptable given it is in a keyring for pacman only. But this is more the social aspect of PGP signing so I will leave discussing that further to another time.

However the keyring is set-up, it is helpful to have a tool to manage it. While this could be done directly using gpg with the --homedir flag, there are a few pacman specific keyring management issues that warranted the creation of a separate tool. Enter pacman-key. Originally this was a port of Debian’s apt-key to pacman by Denis A. Altoé Falqueto, but has slowly become closer to being just a gpg wrapper with additional functions. I’ll also add a shout-out to both Ivan Kanakarakis and Pang Yan Han who also contributed multiple patches towards this script and Guillaume Alaux who provided the initial man page.

The pacman keyring will be located (by default) in /etc/pacman.d/gnupg (although this can be adjusted using the GPGDir directive in pacman.conf). The keyring should be set-up using pacman-key --init to ensure the files have the correct permissions for full pacman signature checking functionality. For example, to verify package signatures as a user (e.g. using pacman -Qip <pkg>), we need to let the user have read permissions on the keyring files and also add a gnupg configuration file to prevent the creation of a lock file (this is currently required to be done globally as the gpgme library used by pacman does not have the ability to control lock file creation…).

Keys can be added to the pacman keyring in several ways. They can be imported from a local file or files using pacman-key -a/--add <file(s)> or from a public key server using pacman-key -r/--receive <keyserver> <keyid(s)>. You can also --import entire sets of keys and trust dbs from other gnupg keyrings you have. Keys are removed using the -d/--delete option. There is also a mechanism for a distribution or other repo provider to supply a keyring containing all their packagers’ PGP keys to be imported into the pacman keyring, but this area is still undergoing development.

Once you have some keys in your keyring, you can manipulate them using pacman-key and some standard gnupg flags including --edit-key, --export, --list-{keys,sigs}, etc. The --edit-key option is fairly important as it allows you to do things like adjust the trust levels or locally sign keys in the keyring, which builds our web of trust. For any more advanced manipulation of the keyring (or just something that is not wrapped by pacman-key), you need to use gpg directly (although I am sure that if it turns out that a commonly used command is not currently wrapped by pacman-key, it can be added on request…).

And that is basically all there is to the pacman-key tool. It is fairly simple but it is also the part of the package signing implementation that has probably received the lowest volume of testing as it is not a script that will be used everyday. If you would like to help test it out while not touching your system pacman, you can build and run it directly from a git checkout. This should get you there:

$ git clone git://projects.archlinux.org/pacman.git
$ cd pacman
$ ./autogen.sh
$ ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
$ make -C scripts
$ ./scripts/pacman-key

Test initializing a new keyring, adding and removing keys, editing a keys trust level, verifying a file with a detached signature (many packages in the Arch repos are already signed) and report any issues you run into.

Pacman Package Signing – 1: Makepkg and Repo-add

With pacman development progressing smoothly towards an upcoming 4.0.0 release, I thought it would be nice to write about everybody’s favourite (and a not at all controversial) topic… package signing. I will separate the discussion into several parts over the coming weeks, writing about a new area when I personally consider the interface in that area to being relatively finalised. That is not to say what is written about here will not change before the final release, just that I find it unlikely. Note also that I will focusing more on the technical details of the package signing implementation in pacman and its tools. So there will be limited discussion on issues a distribution may face using these features and I will not be specifically covering how Arch Linux will make use of these features.

The first thing that you are going to need to sign packages and repo databases is a PGP key. All the details of creating one using GnuPG can be found elsewhere. The only real consideration is the choice of key type. Currently a 2048-bit RSA key seems to be the gold standard. Going to 4096-bit is probably excessive and being a larger key has the side effect of slowing down the verification process (to an extent that is noticeable on older CPUs).

Once you have that sorted, it is time to sign some packages using makepkg. The implementation is quite simple. When a package signature is needed, makepkg simply calls gpg --detach-sign on the package(s) it creates. If you have the GnuPG-Agent is running, you will not even be asked for your passphrase (depending on your set-up). Deciding whether to sign packages or not is primarily controlled through the “signBUILDENV option in makepkg.conf, but can be overridden on the command line using --sign or --nosign. By default, the package will be signed with your primary PGP key. If you wish to use another key, you can set the GPGKEY variable (either in makepkg.conf or the environment), or use the --key option with makepkg.

The additions to repo-add are similarly simple. When adding a package to a repo database, repo-add checks for a detached signature and if present adds it to the package description entry, ready for libalpm to process. Finally, signing packages is not enough. We also need the ability to sign the package database (e.g. to prevent the holding back of an update to an individual package containing a security vulnerability). This is done using similar options to makepkg, with -s/--sign to tell repo-add to sign the database and --key (or the environmental variable GPGKEY) to select a non-default GPG key to sign with. In addition, repo-add has a -v/--verify flag that checks the current signature is valid before proceeding (very important as repo-add adjusts the current database rather than regenerating it from scratch).

As an aside, a couple of other useful security features have made their way into makepkg and repo-add during this development cycle. The ability to automatically check PGP signatures for source files has been added to makepkg (thanks to first time contributer Wieland Hoffmann). This is done by detecting files in the source ending in the standard extensions .sig and .asc. A source file and signature can be quickly be specified using bash expansion like:

sources=($pkgname-$pkgver.tar.gz{,.sig})

which makes it quite clear which source files have signatures. If wanted, this check can be skipped using the --skippgpcheck or the --skipinteg options (the latter of which also ships checksum checks). Also, repo-add includes a SHA256 checksum in the repo database in addition the the current MD5 checksum, although currently libalpm (and thus pacman) does nothing with this entry. (Despite some prior assertion, adding that properly took more than a one line change… but I will leave that there.)

Finally, a quick note on the challenges faced by distributions using these tools for package and database signing. The facilities provided by makepkg and repo-add work well for repositories where the packages get built locally, added to the repo database and then mirrored to their server (such as the repo I provide), but may not be ideal to use for a larger distribution repository maintained by multiple people. For example, if building a package on an remote build server, then the packager should not want to put their private PGP key onto that server to sign the package. It currently appears that there is no easy way around this, so the package building and signing steps need to be separated, with the built package downloaded locally and then signed (although this may change in future GnuPG releases as I see patches have been recently submitted to their mailing list providing a proof-of-concept implementation to improve remote signing functionality). Similarly, how is it best to sign a repository database that is added to by multiple packagers? Having some sort of master key sign it requires some sort of reduction in security of the passphrase (with either all people pushing to the repo knowing it or having it somehow accessible to the script adding the packages to the repo database). If set-up with care, this may be acceptably low risk for a distribution to use (and, from what I understand, this is what is done by several distributions), but personally I do not see it as an ideal solution. And that brings us back to the issue of how to best sign a remote file. So, implementing the tools may actually be the simple part in all of this…

Secondary Package Management With Pacman

Want to try out new software but also keep your system clean of packages that you do not use? Unless you keep good track of everything you install for a trial, you are likely to leave some unwanted packages on your system at some stage. Not that they generally do anything apart from take disk space (at least on Arch Linux), until one day when you are doing an update and you think “What is that package doing on my system?”.

One way I have found to keep track of packages you want to temporarily install is to have a sort of secondary package management system within the main pacman database. This is achieved through abuse of the dependency tracking features of pacman. Any package that is to be installed for a temporary period get installed with the --asdep flag. This tells pacman that the package is a dependency. Given no other package depends on it, it is what is commonly referred to as an “orphan” package and can be listed using pacman -Qtd. Currently on my system I have:

$ pacman -Qtd
gimp 2.6.11-6
vlc 1.1.10-6

When I no longer want these packages on my system, they will be uninstalled in the standard way (pacman -Rs pkg). If I decide to keep the package, I can change the pacman database entry using the little known -D/--database flag. E.g. pacman -D --asexplicit vlc will change the install reason for the vlc package from being “Installed as a dependency for another package” to “Explicitly installed”. It will no longer be listed as an orphan, effectively taking it out of this secondary pacman management system inserting it into the main one.

Pacman 3.5.0 Released

It is time for another major pacman release. Here is a brief overview of the new features:

The feature that will be immediately noticed on the pacman upgrade is the change of database format. This was a step towards reducing the large number of small files pacman had to read, which was a major cause of performance issues (particularly on systems with slow hard-drives). Two major changes occurred: the sync database became a single file per repo and the local database had some of its files merged. The sync databases are now read directly from the database (compressed tarball) that is downloaded from the mirrors. No extraction means no fragmentation of the database across the filesystem. The “depends” and “desc” files in the local package database were merged into one file as there was actually little point for them being separate. This results in an approximately 30% less files to be read for the local database on an average system. A script (pacman-db-upgrade) is provided to preform this database upgrade and pacman will abort if a database in the old format is found. Any scripts that read directly from the database will need to be updated to deal with these new formats. Or better yet, they could be written to use libalpm which would make them robust to future changes (the local database format could be improved further). Combine the database changes with other speed enhancements (improved internal storage of package caches, faster pkgname/depends searches) and this pacman release is notably faster.

Until now, a great way to break your system during an update was to run out of disk space. Pacman now attempts to avoid this in two ways. Firstly, it will (optionally) calculate the amount of disk space needed to perform the update/install and check that your partitions have enough room. Doing this calculation is actually fairly involved and I’m sure we will encounter some case of a filesystem and platform combination that we have not tested where this calculation is not correct… I know for certain that it does not work in chroots. The “solution” in these cases is to disable this check in pacman.conf and make a bug report with all the details needed to replicate the issue (except the chroot case). As a second line of defence for disk space issues, pacman will report any extraction error it encounters and attempt to stop installation on the important ones.

A much missed feature in pacman-3.4 was the ability to select which packages you wanted to install from a group. Well, that is back and better than ever! Additionally, the selection dialog is also extended to package provisions, allowing the user to select which provider package they want installed rather than pacman just installing the first one it found.

A feature that will primarily affect packagers is the removal of the “force” option that would result in packages being installed from the repo even if the version was not considered newer by pacman. This was useful for packages with weird versioning schemes (is that “a” for alpha or the first patch level?), but it resulted in strange update behaviour for those who had built themselves newer versions of a package locally. This has been replaced by the use of an optional “epoch” value in the package version – so a “complete” package version looks like epoch:pkgver-pkgrel. If present, the value of the epoch field takes precedent over the rest of the package version.

The main addition to makepkg is the ability to run a check() function between build() and package(). This optional function is useful for running package test suites (or even better, not running them in the early builds when bootstrapping a package). Other changes include the removal of STRIP_DIRS (now all files are stripped by default), adding a buildflags option to disable CFLAGS etc, and allowing the use of $pkgname in split package functions.

For a more complete list of changes in pacman-3.5, see the NEWS and README files in the source code.

Basic Overview of Pacman Code

I am far from knowledgeable about most areas of the pacman code-base, so whenever I want to implement a new feature I first have to sit down and walk step-by-step through a transaction and figure out the code path. Given I have now done this several times, I thought it a good idea to post it for further reference.

Here is my brief overview of what happens with a pacman -S <pkg> transaction:

pacman/pacman.c:
 -> int main(int argc, char *argv[])
      - parse command line
      - parse config file
 
pacman/sync.c:
 -> int pacman_sync(alpm_list_t *targets)
      - check we have a pacman database
      - check target list for SyncFirst packages
 
 -> static int sync_trans(alpm_list_t *targets)
      - initialise transation
      - process the target list
 
libalpm/trans.c:
 -> int SYMEXPORT alpm_trans_prepare(alpm_list_t **data)
      - perfoms checks (via _alpm_trans_prepare)
         - requested packages have valid architecture
         - all deps are available
         - deps do not directly conflict (not files)
 
 -> int SYMEXPORT alpm_trans_commit(alpm_list_t **data)
 
libalpm/sync.c:
 -> int _alpm_sync_commit(pmtrans_t *trans, pmdb_t *db_local, alpm_list_t **data)
      - download needed files
      - deal with deltas (if any)
      - check package integrity
      - check file conflicts
      - remove packages (conflicts/replacements)
      - install targets

Doing a pacman -Syu takes you down exactly the same code path with the only difference being that the package databases are updated and the list of packages to be updated is calculated in the pacman_sync function.

While that is a fairly basic overview, it has more than enough detail for me to locate where I should implement checking for free disk space before proceeding with a package install (which has been a long time feature request for pacman).

I hope this also shows people that the pacman code is not that complex. There are quite a few old bug reports/feature requests in the bug tracker that are obviously very low on the developers priority list and are good candidates for new contributors. Just step through the code until you find the relevant section and then get started!

Pacman 3.4.0 Released

As Dan has already posted about, pacman-3.4.0 has been released. There are a bunch of new features that I am really enjoying.

Firstly, when updating it database, pacman will only extract the new entries. This is similar to what Xyne’s rebase script does (without all the extra output). I had not realised how awesome this feature was until I updated my chroots this morning. It speeds the process up immensely. The chroots using pacman-3.4 extracted the [extra] repo database with a barely noticeable pause while those using pacman-3.3 took a while.

The other feature that I am enjoying is the addition of a functional ‘which’ to the file ownership query. In the past, to find the owner of a binary in my path I would do something like pacman -Qo $(which makepkg) or provide the full path manually. Now pacman will search for binaries in your path automatically, so this is achieved with pacman -Qo makepkg.

Installing packages with pacman -U has received a nice overhaul, allowing pacman to handle package replacements and install needed dependencies all in one transaction. No more removing a package with pacman -Rd and then installing its replacement.

And makepkg also received its share of upgrades. It now automatically exits on build/packaging errors in PKGBUILDs so there is no more need to have “|| return 1” after the commands. Package splitting has improved with pkgver, pkgrel and arch now being able to be overridden and being able to only build subsets of a split package.

Of course, many other features made it into this pacman release. As always, many changes will hopefully never be noticed by a user (e.g. checking a package architecture matches the system architecture before installing, a major rewrite of the pacman bash completion, overhaul of tests in makepkg, more configurable library stripping during packaging), but all these are very useful contributions. See here for a more detailed summary of the changes and the git log for all the details of changes.

A pacman-3.4.0 package is currently in the [testing] repository for Arch Linux. We all know pacman releases are bug free (as the two patches already in the 3.4.1 queue can attest), so look forward to it being in a [core] repo near you in the not too distant future.

New PKGBUILD syntax options with pacman-3.3

With the pacman 3.3 release expected in the coming weeks, I thought I would write about some of the new features that have been added to PKGBUILDs.

The most common change people will make in their PKGBUILDs is to add a package() function. This limits the of fakeroot to only during the file installation steps (so it is not used during the build process). Using fakeroot only during the install stages is considered a “good thing”, but this also provides a workaround for some bugs in fakeroot that can cause issues while attempting to compile a package. A partial example:

build() {
  cd $srcdir/$pkgname-$pkgver
  ./configure --prefix=/usr
  make
}
 
package() {
  cd $srcdir/$pkgname-$pkgver
  make DESTDIR=$pkgdir
  install -Dm644 $srcdir/license $pkgdir/usr/share/licenses/$pkgname/license
}

Note the “cd” step is required in the package() function as makepkg currently does not remember what directory was being used between the build() and package() functions. The package() function is entirely optional, so all PKGBUILDs without one will continue to work as they always have.

The other main feature addition to PKGBUILDs is the ability to create split packages. In Arch Linux, this is useful for packages that are split due to providing separate packages for libraries and binaries (e.g. gcc) or where documentation is too large to justify distributing together with the main package (e.g. ruby). The Arch KDE-4.3 release will also use package splitting, as many lesser used components pulled in a large number of dependencies and made an unsplit KDE install very heavy.

Creating split packages is rather simple. All you have to do is assign an array of package names to the pkgname variable. e.g.

pkgname=('pkg1' 'pkg2')
pkgbase="pkg"

This tells makepkg that it is creating two packages called pkg1 and pkg2. The pkgbase variable is optional, but can be used to hold (e.g.) the upstream package name and is used by makepkg in its output. Each split package requires its own package() function with name in the format package_foo(). e.g. for the above pkgname, the PKGBUILD would have functions package_pkg1() and package_pkg2(). In these functions, the use of the $pkgdir variable is mandatory as it is no longer equivalent to the deprecated $startdir/pkg. All options and directives for the split packages default to the global values given within the PKGBUILD. Most variables can be overridden within the package function. e.g

...
depends=('glibc')
makedepends=('perl')
...
package_pkg1()
{
  depends=('perl')
}
 
package_pkg2()
{
...
}

The pkg1 package will depend on perl while pkg2 does not override the depends array so will depend on glibc. As a general rule, almost every package in the split packages depends array should probably be present in the global makedepends array.

There are several other useful features added to makepkg such as improved handling of info files (automatic removal of $pkgdir/usr/share/info/dir and compression of info files) and being able to specify LDFLAGS in makepkg.conf. Check out a detailed list of changes in the NEWS file in pacman git repo.

For those that want to try out makepkg before the pacman 3.3 release, you can grab a copy of my makepkg-git package which installs alongside the current version of pacman.

Edit – clarified PKGBUILD directive overrides for split packages and added an note about makepkg-git