Pacman 4.1.0rc1

For those that are mildly adventurous, you can try the pre-release of the upcoming pacman-4.1. There are a handful of us who constantly run pacman from git so it should be fairly safe. All bugs found are to be reported to the bug tracker. (Only one issue found so far – in the rarely used pkgdelta script).

Download: i686 x86_64

I’ll make a post about all the new features when the final 4.1.0 release is made – hopefully before the end of the month.

Advantage of a Simple “Database” Format

One fairly common criticism of the pacman package manager is that it is very slow due to not using some sort of binary database as its backend. I found suggestions to use sqlite dating back to 2005 (although I am sure they go back further) and mailing list activity peaked around late 2007. Speed is one of pacman’s main features – and it beats the competition by a wide margin according to Linux Format – but I guess people want it even faster.

The problem is that we use a filesystem based “database” where each package has its information stored in multiple files. This means that we can get fragmentation of our “database” and the reading of all these files from the filesystem can be quite slow. Usually most of this is cached by the kernel after the first read so speed improves markedly after the first usage.

This was improved a lot in the pacman 3.5 release (March 2011). The sync databases started to be read directly from the downloaded tarball and the local database had the “desc” and “depends” files for each package merged into one file. This increased the speed of reading from the sync databases massively and was a reasonable improvement to the local database too.

So the local package “database” could be improved by reducing it to one or a few files. But every time I think about changing it, I am reminded why I like the plain text file format. I was updating a reasonably out of date computer when I had an issue with the python-pygame package being renamed to python2-pygame. All packages needing it in the Arch Linux repos were rebuilt with the new dependency name, so it did not need a provides entry. But my solarwolf package from the AUR still depended on the old name:

$ pacman -S python2-pygame
resolving dependencies...
looking for inter-conflicts...
:: python2-pygame and python-pygame are in conflict. Remove python-pygame? [y/N] y
error: failed to prepare transaction (could not satisfy dependencies)
:: solarwolf: requires python-pygame

As we have a file based database, adjusting the dependency is easy without rebuilding the package. Just open the relevant file and edit away (or use sed…)

$ vim /var/lib/pacman/local/solarwolf-1.5-5/desc
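
For the sed route, a minimal sketch might look like the following. The rename_dep name is made up for illustration, and the sketch assumes the dependency sits on a line of its own with no version constraint and that the old name contains no characters special to sed (a "." would match any character). Take a backup of the desc file before trying anything like this.

```bash
# rename_dep FILE OLD NEW
# Hypothetical helper: replace a dependency that appears on a line of its
# own in a local database "desc" file. The result goes to a temporary file
# first, so the original is only overwritten if sed succeeds.
rename_dep() {
  local file="$1" old="$2" new="$3" tmp
  tmp=$(mktemp) || return 1
  # Anchoring with ^ and $ leaves lines that merely contain the old name
  # (for example a versioned dependency) untouched.
  if sed "s/^${old}\$/${new}/" "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage (as root):
#   rename_dep /var/lib/pacman/local/solarwolf-1.5-5/desc python-pygame python2-pygame
```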

Now I can see my local database has an issue using the handy testdb tool – solarwolf depends on python2-pygame, but that is not installed.

$ testdb
missing python2-pygame dependency for solarwolf

But now I update as usual, installing python2-pygame which removes python-pygame, and my local pacman database is fully consistent.

I am sure all of this would still be possible if the database was in some other format, but it would have required more tools than a simple text editor. Of course, most people should never need to edit their local database, but I have introduced changes to it several times during pacman development and I consider being able to easily fix or revert these in the category of a “good thing”. And yes, I develop and test directly on my production system…

Of course, it is better to use a real database in performance critical situations. But pacman really does not fall into that category.

Changes To VCS Packaging Support In Makepkg

The current support for building packages from version control systems (VCS) in makepkg is not great for a number of reasons:

  • It relies on obscure (but documented…) variables being specified in the PKGBUILD, which actually achieve nothing in terms of downloading and updating the source as needed.
  • The whole VCS checkout/update mechanism needs to be repeated across every PKGBUILD that uses it, which is a lot of unnecessary code duplication.
  • Building a package from a specific revision/branch/tag/… required using an altered version of this code, resulting in many non-standard work-arounds being made.
  • The automatic updating of the pkgver happens in what may not be an obvious way. For example, the pkgver for git PKGBUILDs is set to the build date, not the date of the last commit. Even if it was the date of the last commit, that can be far from unique. (Why not use git describe? Because that relies on the tag being something suitable for an actual version number and many repos do not follow this.)
  • Even when a revision number is used for the updated pkgver, this results in different behaviour for different VCS. For example, with hg repos, you have to download/update the repo to determine the latest revision.
  • The updating of the pkgver is done before the makedepends are installed, so can fail if it relies on VCS tools.
  • The --holdver flag stopped the pkgver being updated, but the VCS repo was still updated to the latest version as usual.
  • VCS sources ignore $SRCDEST
  • Offline building (using pre-downloaded sources) required adjusting the PKGBUILD
  • You can not create a source package with the VCS sources included using --allsource

In fact, the issues with the current VCS implementation accounted for almost 10% of the bugs in the pacman bug tracker and there are a number more in the Arch bug tracker about how to improve the supplied prototypes for the VCS PKGBUILDs. It was clearly time for a rewrite.

An idea that had seen some discussion over the years was to just put the VCS sources in the source array. Makes sense… right? The problem was choosing an appropriate syntax for the URLs that was consistent with what was already used and also flexible enough to handle the various possibilities of a VCS source. The format decided on is:

source=('[dir::][vcs+]url[#fragment]')

Simple! Well, it will be once I explain the parts… The url component should be obvious. The problem with it is that there is often no way to tell that it is a VCS source. For example, for git repos without the git protocol enabled on the server, this will start with (e.g.) http://. To work around this, an optional vcs prefix can be added to the URL. So for git over http, you would use git+http://. This is based on the already used syntax when downloading a subversion repo over ssh.

At the end of the URL is an optional #fragment. Providing information in a URL after a # character is some sort of standard that I am too lazy to provide a link for… Anyway, it allows us to specify information about what we want to check out when building. For example, I build my pacman-git package using the working branch of my git repo. To check that out, I use:

source=('git+file:///home/arch/code/pacman#branch=working')

Note the use of the git+ prefix there that allows me to check out from a local copy of my repo. The list of recognized fragments is built into makepkg and is documented in the PKGBUILD man page.

Finally, there is the optional dir:: prefix. This allows the specifying of a directory name for makepkg to download the source into. If not specified, makepkg tries to pick a good name from the URL, but there is such variation in VCS URLs that it will often be useful to change it. This is an old, but little known, syntax available in PKGBUILDs, which can be used to rename any source file once it is downloaded.
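
To make the three parts concrete, here is a rough sketch of how an entry in that format could be pulled apart using nothing but shell parameter expansion. It is not the code makepkg uses; parse_source is a hypothetical name, the example.org URLs in the comments are placeholders, and odd cases such as a literal "::" inside the URL are ignored.

```bash
# parse_source ENTRY
# Hypothetical helper (not makepkg's own parser): split a source entry of
# the form [dir::][vcs+]url[#fragment] and print the four parts.
parse_source() {
  local entry="$1" dir='' vcs='' fragment='' proto

  # Optional "dir::" prefix names the download directory.
  case $entry in
    *::*) dir=${entry%%::*}; entry=${entry#*::} ;;
  esac

  # Optional "#fragment" suffix selects a branch, tag, commit, ...
  case $entry in
    *'#'*) fragment=${entry##*'#'}; entry=${entry%'#'*} ;;
  esac

  # Optional "vcs+" prefix sits in front of the transport protocol.
  case $entry in
    *://*)
      proto=${entry%%://*}
      case $proto in
        *+*) vcs=${proto%%+*}; entry=${entry#*+} ;;
      esac
      ;;
  esac

  printf 'dir=%s vcs=%s url=%s fragment=%s\n' "$dir" "$vcs" "$entry" "$fragment"
}

parse_source 'git+file:///home/arch/code/pacman#branch=working'
# prints: dir= vcs=git url=file:///home/arch/code/pacman fragment=branch=working
```

An entry such as pacman-src::git+http://example.org/pacman.git would come out with dir=pacman-src and an empty fragment, while a plain tarball URL leaves everything but url empty.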

So now that VCS sources can be used, even multiple different repos to build the one package, how does makepkg choose how to update the pkgver variable? Short answer is that it doesn’t. You can provide a pkgver() function that outputs a string to be used for the updated package version. This is run after all the sources are downloaded and (make-)dependencies are installed. For my pacman-git package, I use something like:

pkgver() {
  # quote the path in case $srcdir contains spaces
  cd "$srcdir/pacman"
  # turn hyphens into underscores and strip the first "v"
  git describe | sed 's#-#_#g;s#v##'
}
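
To see what that sed expression does, here is a small sketch that runs it on a made-up describe string. The describe_to_pkgver name and the sample input are assumptions for illustration only. The hyphens have to go because a pkgver may not contain them, and note that the second expression strips the first "v" wherever it appears, so it only suits tags that start with one.

```bash
# describe_to_pkgver: apply the same sed expression as the pkgver() function
# above to a "git describe" style string read from standard input.
describe_to_pkgver() {
  sed 's#-#_#g;s#v##'
}

# Hypothetical describe output: tag v4.0.3, 120 commits later, commit abc1234
echo 'v4.0.3-120-gabc1234' | describe_to_pkgver
# prints: 4.0.3_120_gabc1234
```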

Currently supported protocols in the master git branch of pacman are git (branch, commit, tag), hg (branch, revision, tag), svn (revision). That covers ~92% of the VCS PKGBUILDs in the AUR. Adding support for the remaining VCS that are used (bzr, cvs, darcs) – or any other VCS – is quite simple but requires knowing how to efficiently use the VCS tools. I will create a patch to support any additional VCS if someone provides me:

  1. How to checkout a repo to a given folder.
  2. What url “fragments” need to be supported for that VCS.
  3. How to create a working copy of the checked out repo (i.e. “copy” the primary checkout folder) and how to get it to the specified branch/tag/commit/whatever. That can all be in one step.

Note that the old VCS PKGBUILDs will not stop working as such, although they are likely to be broken… At least the pkgver will no longer update. I’m sure there are other subtle incompatibilities too and you would still suffer from all the issues listed above, so it is definitely worth getting proper support for your needed VCS into makepkg.

If you want to take the new implementation for a spin, check out a copy of the pacman git repo and build it. For those that are somewhat brave, you could even use the pacman-git package in the AUR, but make sure you know the risks that running a developmental version of a package manager entails…

The Great Pacman Bug Hunt of 2012

This is a story about a recent issue discovered in pacman, the Arch Linux package manager, and the difficulties we had hunting it down… The story is long, but so was the process of finding the bug.

It all started on a warm summer’s night (in my timezone and location… – it was probably cold and daytime for the other main pacman developers) with the reporting of FS#27805: “[pacman] seg faults when removing firefox”. Of course, my initial reaction was “bull shit” as we all know there are no bugs in the pacman code. But this was only a couple of weeks since pacman-4.0 was moved into the Arch Linux [core] repo so there was an ever so slight possibility it was real.

Luckily for us, the user reporting the bug was very helpful and installed a version of pacman with debugging symbols and gave us a full backtrace. It was very clear where the segfault was occurring:

#0 0xf7fbd4e7 in _alpm_pkg_cmp (p1=0x8128aa0, p2=0x0) at package.c:644

That function is called in the package removal process when we check that a file that is going to be removed with a package is not also owned by another package (which would require someone using -Sf when they should not). If the package in the local database is the same as the one being removed, we do not need to run this check, and hence the test. As you can see above, for some reason _alpm_pkg_cmp is being passed a null pointer as the package from the local database and KABOOM!

So the question was, how do we get a null value for the package from our local database? Given pacman runs through the list of local packages on each package removal, this null entry must have been generated on the removal of the previous package. Here is a bit of background on how package information is stored in pacman. Package information is stored in a hash table that also provides access to the data as a linked list. This provides us with fast look-up by a package’s name but also allows us to loop through the (generally sorted) package list. Now the hash table code is fairly new (first introduced in pacman-3.5) and the removal of items from a hash with collision resolution done by linear probing is not straightforward, so there could be a bug. Dan pointed his finger my way as I wrote the original hash table code and I pointed my finger his way as he made optimizations to the removal part. But it turns out that both of us were not thinking too hard. It is the list that is being corrupted and that has items removed using code that has been around for years. Despite that, the whole hash table and linked list removal code got an in-depth review and no issues were found.

We were stumped. Looking at the debug output from pacman, we could see that a file that actually did not exist on the system was being “removed” right before the crash, but that is not uncommon and appeared to be handled correctly so was unlikely to be the cause. So back to the reporter to see if we could get more information to replicate. He was very helpful and provided us with a copy of his local package database. We created a chroot with exactly the same packages and had no luck replicating. The user even provided us with a complete copy of his chroot where the error was occurring, but again there was no luck replicating. It must be something specific to that user’s system. Right? Well, even re-extracting the tarball of the chroot the user provided us onto his own system made the bug go away. All in all, a great candidate for being “not a bug”…

Until on another warm summer’s evening, while being my usual extremely helpful self on IRC, someone mentioned they were getting a segfault while removing packages. A bug report was filed and, again, the user was extremely helpful and the backtrace provided was exactly the same. A core dump showed us there was definitely something wrong with the linked list. Well… bugger! This bug appears real. Again the red-herring of the removal of a non-existent file was shown in the debug log, but it would be very, very strange for that to break the linked list of package information so was ruled out.

It was time to find a reproducer! So I created a chroot and set this script running:

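ret=0
# keep going until a removal fails (a segfault gives a non-zero exit status)
while (( ! ret )); do
  # pick one random package from the [extra] repo
  pkg=$(pacman -Sql extra | shuf -n1)
  pacman -S --noconfirm "$pkg"
  pacman -R --noconfirm -s "$pkg"
  ret=$?
done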

Within five minutes I could replicate the segfault. (It turns out I was very lucky as I ran the same script again for over four hours and did not strike the issue.) Now it was time to get debugging!

The first thing I did was print some debugging info in the linked list node removal code, but for some reason the node removal just before the segfault did not print anything. I was only printing information when removing a node from the middle of the list (because that is where the package causing this issue was located), but just to be sure I also added debug statements for the case of removing the head and tail nodes. And then pacman told me it was removing a node from the end of the list… “Why do you think that package is at the end of the list, pacman?”, I asked. “Because the head node’s prev entry tells me it IS the end of the list”, replied pacman. “Oh, crap”, I said. “So it does!” Something was clearly wrong here.

It was time to investigate all removal operations on that list. So I printed the entire linked list before and after each package removal and found the error actually occurred before the removal operation even started. The initial list of the local database passed to the removal operation was already broken with the pointer to the tail entry not pointing to the tail. That was good to know as we had thoroughly reviewed the removal code and not found any issues.

This led me to believe that the error must occur when reading in the local database. Next step: print out the linked list at the end of reading in the local database. But that was completely fine. So somewhere between reading in the local database and using it, things got broken. And, what do we do with the local database between reading it in and removing items from it? The only place where we modify the local database between those points is when it gets sorted by the package names. Sure enough, the pointer to the tail of the linked list is good going into the sort and bad coming out.

This limited the error to two functions: alpm_list_msort and alpm_list_mmerge. These implement a merge sort. Essentially alpm_list_msort recursively calls itself, dividing the list up into smaller pieces until it can not be divided any further, and the pieces are then merged in sorted order by alpm_list_mmerge. I had just started staring at the code when I saw something that seemed too obvious for such a hard to track down bug. My exact words on IRC were “I think I can fix this…”. And sure enough I could.

It turns out that when alpm_list_msort split a list into two, it did not set the pointer to the tail nodes in the two new lists correctly (or at all…). So a two line addition and we have the bug fixed. It turns out this bug had been present since the start of 2007. So I am still slightly amazed that we did not see it before now and when it did appear that we got a second report of it so quickly.

And why could we not reproduce the issue even with a copy of a chroot where it was occurring? It is entirely dependent on the order the directory entries are returned from the disk. This determined which package was pointed to as the “tail” of the sorted package list. The package incorrectly referred to as the tail had to be removed during a removal operation, and also not be the last package removed, to expose the bug. Given most systems will have many hundreds of packages on them and removal operations tend to involve one or a few packages, this is a fairly rare occurrence. But even if it occurred in only a fraction of a percent of removal operations, I think we should have run into this bug before now. I guess more people probably did experience the issue, but then could not immediately replicate and did not experience the issue again so did not report it.

And that is the end of the story of one of the most frustrating bugs I have ever managed to track down. A big thank you to the two users who installed versions of pacman with debug symbols and provided us backtraces, coredumps and entire chroots! Without their help, we would probably still be not entirely convinced that the bug was real and it would still be hiding away in the pacman source code.

Pacman Package Signing – 4: Arch Linux

I have previously covered the more technical aspects of the implementation of PGP signing of packages and repository databases in pacman. You can read the previous entries here:

Since then, pacman-4.0 has been released and has been in the [testing] repository in Arch Linux for a while. That means that the signing implementation is starting to get some more widespread usage. No major issues have been found, but there are some areas that could be improved (e.g. the handling of the lack of signatures when installing packages with -U: FS#26520 and FS#26729). And it has successfully detected a “bad package” in my repo… (well, not really a bad package, but a bad signature. Lesson: do not try creating detached signatures for multiple files at once because gnupg is crap…).

The Arch repos have been gradually preparing for the package signature checking in pacman-4.0. Support for uploading PGP signatures with packages was added in April and was made mandatory from the beginning of November. As of today, 100% of the packages in the [core] repo and approximately 71% of [extra] and 45% of [community] are signed.

So all the components are coming together nicely. But how does this work from a practical standpoint? I’ll start with setting up the pacman PGP keyring and pacman.conf.

When first installing pacman-4.0, you should initialize your pacman keyring using pacman-key --init. This creates the needed keyring files (pubring.gpg, secring.gpg) with the needed permissions, updates the trust database (obviously empty at this point…), and generates a basic configuration file. It also generates the “Pacman Keychain Master Key”, which is your ultimate trust point for starting a PGP web of trust. You may want to change the default keyserver in the configuration file (/etc/pacman.d/gnupg/gpg.conf) as some people have issues connecting to it.

The set-up of your pacman.conf file is somewhat a matter of personal preference, but the values I use are probably reasonable… I have the global settings for signature checking as the default value (Optional TrustedOnly). This basically sets the need for signatures to be optional, but if they are there then the signature has to be from a trusted source. See the pacman.conf man page for more details. For the Arch Linux repos with all packages signed, I set PackageRequired which forces packages to be signed but not databases. (For the small repo I provide, I use Required as both packages and databases are signed.)
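
As a rough sketch, the settings described above might look something like this in pacman.conf. The [core] section stands in for any fully signed Arch repo, while the [smallrepo] name and its Server line are placeholders and not a real configuration.

```
[options]
# global default: signatures optional, but must be trusted if present
SigLevel = Optional TrustedOnly

[core]
# all packages signed, databases not (yet)
SigLevel = PackageRequired
Include = /etc/pacman.d/mirrorlist

# hypothetical small repo with signed packages and a signed database
[smallrepo]
SigLevel = Required
Server = http://example.org/smallrepo/$arch
```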

Let’s look at some output when installing a signed package:

# pacman -S gcc-libs
warning: gcc-libs-4.6.2-3 is up to date -- reinstalling
resolving dependencies...
looking for inter-conflicts...
 
Targets (1): gcc-libs-4.6.2-3
 
Total Installed Size: 2.96 MiB
Net Upgrade Size: 0.00 MiB
 
Proceed with installation? [Y/n]
(1/1) checking package integrity [######################] 100%
error: gcc-libs: key "F99FFE0FEAE999BD" is unknown
:: Import PGP key EAE999BD, "Allan McRae ", created 2011-06-03? [Y/n] y
(1/1) checking package integrity [######################] 100%
error: gcc-libs: signature from "Allan McRae " is unknown trust
error: failed to commit transaction (invalid or corrupted package (PGP signature))
Errors occurred, no packages were upgraded.

As you can see, pacman struck a package that had a signature from an unknown key. It then asks if you would like to import that key. Given the PGP key fingerprint matches that published in multiple places, importing that key seems fine. Then pacman errors out due to that key not being trusted. Well, that Allan guy seems reasonably trustworthy… so I could just locally sign that key using pacman-key --lsign-key EAE999BD and that key will now be trusted enough to install packages.

Validating every Arch Linux Developer’s and Trusted User’s PGP key would soon become annoying as there are a fair number of them (35 devs and 30 TUs – with some overlap). To make this (a bit…) simpler, five “Master Keys” have been provided for the Arch Linux repositories. The idea behind these keys is that all developer and TU keys are signed by these keys and you only need to import and trust these keys in order to trust all the keys used to sign packages. These key fingerprints will be published in multiple places so that the user can have confidence in them (see the bottom of this post for a listing of the fingerprints obtained relatively independently of those listed on the Arch website).

To set up your pacman keyring with these keys, you can do something like:

for key in FFF979E7 CDFD6BB0 4C7EA887 6AC6A4C2 824B18E8; do
    pacman-key --recv-keys "$key"
    pacman-key --lsign-key "$key"
    # feed "trust", "3" (marginal) and "quit" to gpg's key editor
    printf 'trust\n3\nquit\n' | gpg --homedir /etc/pacman.d/gnupg/ \
        --no-permission-warning --command-fd 0 --edit-key "$key"
done

That will import those keys into your keyring and locally sign them. But that is not quite enough as those keys are not used to sign packages themselves. In order for pacman to trust PGP keys signed by the master keys you have to assign some level of trust to the master keys. The final line gives the master keys “marginal” trust. Note I use gpg directly rather than pacman-key as pacman-key does not understand the --command-fd option. You could use pacman-key --edit-key if you wanted to manually type in the commands to set the trust level. By default, the PGP web of trust is set up such that if a key is signed by three keys of marginal trust, then that key will be trusted. (We have five master keys rather than the minimal three so that we can revoke two – a worst case scenario… – and still have our packages trusted.) Note that setting the master keys to have marginal trust serves as a further safety mechanism as multiple keys would need to be hijacked to create a key that is trusted by the pacman keyring.

Now that the five master keys are nicely imported into your pacman keyring, any time pacman strikes a package from the Arch Linux repos with a signature from a key it does not know, it will import the key and it will automatically be trusted. At least that is the idea… We are still in a transition period so not all Developer and Trusted User keys are fully signed by the master keys yet, but we are not too far off. In the future we might provide a pacman-keyring package that streamlines this process a bit, or at least will save the individual downloading of each packager’s key.

That just leaves the signing of the databases, but that is a story for another day!


Arch Linux Master Key fingerprints:
    Allan McRae – AB19 265E 5D7D 2068 7D30 3246 BA1D FB64 FFF9 79E7
    Dan McGee – 27FF C476 9E19 F096 D41D 9265 A04F 9397 CDFD 6BB0
    Ionuț Mircea Bîru – 44D4 A033 AC14 0143 9273 97D4 7EFD 567D 4C7E A887
    Pierre Schmitz – 0E8B 6440 79F5 99DF C1DD C397 3348 882F 6AC6 A4C2
    Thomas Bächler – 6841 48BB 25B4 9E98 6A49 44C5 5184 252D 824B 18E8

Pacman Package Signing – 3: Pacman

And on with the “final” component of the package signing saga… I have previously posted about signing packages and databases and managing the PGP keyring, which was all preparatory work for pacman to be able to verify the signatures.

In the end, most people will not notice pacman verifying signatures unless something goes wrong (at least once it is configured). You will see the same “checking package integrity” line, but instead of verifying the package’s md5sum, the PGP signature will be checked if available. But implementing this required substantial reworking of the libalpm backend, with the adding of signature verification abilities through the use of the gpgme library, adding flexible configuration options to control repo and package signature verification, changes to how and when repo databases get loaded (so that we can error out early if the repo signature is bad), and the list goes on… The majority of this was done by Dan McGee, who is the lead pacman developer. In fact, looking at the git shortlog for this development cycle:

$ git shortlog -n -s --no-merges maint..
   296  Dan McGee
   128  Dave Reisner
   124  Allan McRae
   ...

(followed by 18 other contributors with 11 or fewer commits each). So Dan takes the clear lead with about 50% of all commits in this developmental cycle, while the battle for second place remains intensely contested!

So what have we ended up with? My opinion is ever so slightly biased, but I think we have ended up with the most complete and flexible package signing implementation yet. Most other package managers’ signature checking is simply a call to gpgv, which trusts any signature in your keyring. With the more complicated solution using gpgme, pacman has the complete concept of the web of trust, allowing for very precise keyring management. We not only sign packages, but sign databases too. Importantly, we can add expiry times to those signatures, which together prevent a malicious mirror holding back individual package updates or deliberately not providing any updates at all. As an aside, we also now protect against the “endless data attack” where an attacker sends an endless data stream instead of the requested file. Together that covers the most well reported avenues of attack on package managers (I hesitate to say “all” despite not knowing of any others because someone will prove me wrong!).

Onward to the actual use of signature checking in pacman. The main adjustment needed to be made is the addition of the SigLevel directive to pacman.conf. This can be specified at a global level and also on a per-repo basis. The SigLevel directive takes three main values: Required, which forces signature checking to be performed; Optional (default), which will check signatures if present but unsigned packages and databases will be accepted; and Never, which sets no signature checking to be performed. More fine grained control can be added by prefixing these options with Database and Package and combining multiple options. For example, I have a local repo that has a signed database but not all packages have signatures. So I use SigLevel = Optional for my global default and add SigLevel = DatabaseRequired to enforce the database to be validly signed for that repo. Alternatively, I could use SigLevel = DatabaseRequired PackageOptional to explicitly achieve the same result. You can also specify the level of trust needed in a signing key using the TrustedOnly (default) and TrustAll options. The former will only accept a key if it is fully trusted by your PGP keyring, while the latter only requires the key to be present in the keyring (much like using gpgv).
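
Put into pacman.conf, the local repo example above might look roughly like this. The [localrepo] name and the Server path are placeholders chosen for illustration.

```
[options]
# global default: check signatures when present, accept unsigned files
SigLevel = Optional

# hypothetical local repo: the database must be validly signed,
# packages fall back to the global Optional setting
[localrepo]
SigLevel = DatabaseRequired
Server = file:///path/to/localrepo
```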

As I wrote earlier, there is very little change from a user’s perspective once configured. About the only thing that is really noticeable is that pacman will attempt to download a signature for each database it downloads when the database SigLevel is set to Required or Optional. For example:

$ pacman -Syu
:: Synchronizing package databases...
 allanbrokeit           1464.0B  540.5K/s 00:00:00 [######################] 100%
 allanbrokeit.sig        287.0B    7.0M/s 00:00:00 [######################] 100%
...

Beyond that, the checking of PGP signatures occurs during the usual package integrity check stage so will go largely unnoticed unless something goes wrong. This is both a good thing (we all like pacman because of its simplicity) and a bad thing (as the large amount of work done here is not particularly visible to the user). So when everything with package and repo database signing just works for you, remember to thank your local pacman developer (and if it all goes wrong, it was not our fault…).

Pacman Package Signing – 2: Pacman-key

In this second part of the ongoing series of articles about the implementation of package signing in pacman, I am going to focus on keyring management and the new tool pacman-key that is provided to help with this. You can read the previous entry covering makepkg and repo-add here.

The way in which the PGP keyring for pacman is managed will be an essential aspect of the security of your system. The keyring (in combination with configuration options for pacman itself) will control which package and database signatures you trust and thus what packages get onto your system. In fact, I’m still not entirely sure how best to set up the keyring in terms of importing keys and setting their trust levels as the only repo I currently use that has full signing is my own and for that I can just add my own key with ultimate trust. Adding my key with ultimate trust would not be ideal for other people to do, but then again it may be acceptable given it is in a keyring for pacman only. But this is more the social aspect of PGP signing so I will leave discussing that further to another time.

However the keyring is set-up, it is helpful to have a tool to manage it. While this could be done directly using gpg with the --homedir flag, there are a few pacman specific keyring management issues that warranted the creation of a separate tool. Enter pacman-key. Originally this was a port of Debian’s apt-key to pacman by Denis A. Altoé Falqueto, but has slowly become closer to being just a gpg wrapper with additional functions. I’ll also add a shout-out to both Ivan Kanakarakis and Pang Yan Han who also contributed multiple patches towards this script and Guillaume Alaux who provided the initial man page.

The pacman keyring will be located (by default) in /etc/pacman.d/gnupg (although this can be adjusted using the GPGDir directive in pacman.conf). The keyring should be set-up using pacman-key --init to ensure the files have the correct permissions for full pacman signature checking functionality. For example, to verify package signatures as a user (e.g. using pacman -Qip <pkg>), we need to let the user have read permissions on the keyring files and also add a gnupg configuration file to prevent the creation of a lock file (this is currently required to be done globally as the gpgme library used by pacman does not have the ability to control lock file creation…).

Keys can be added to the pacman keyring in several ways. They can be imported from a local file or files using pacman-key -a/--add <file(s)> or from a public key server using pacman-key -r/--receive <keyserver> <keyid(s)>. You can also --import entire sets of keys and trust dbs from other gnupg keyrings you have. Keys are removed using the -d/--delete option. There is also a mechanism for a distribution or other repo provider to supply a keyring containing all their packagers’ PGP keys to be imported into the pacman keyring, but this area is still undergoing development.

Once you have some keys in your keyring, you can manipulate them using pacman-key and some standard gnupg flags including --edit-key, --export, --list-{keys,sigs}, etc. The --edit-key option is fairly important as it allows you to do things like adjust the trust levels or locally sign keys in the keyring, which builds our web of trust. For any more advanced manipulation of the keyring (or just something that is not wrapped by pacman-key), you need to use gpg directly (although I am sure that if it turns out that a commonly used command is not currently wrapped by pacman-key, it can be added on request…).
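
As a rough illustration of the commands described above, the following sketch walks through a typical keyring session. The key file name, keyserver and key ID are made up for the example, and the option spellings are the ones given in this post, so they may differ in a released pacman-key. The commands are only echoed by default; setting PACMAN_KEY=pacman-key (and running as root) would run them for real.

```shell
#!/bin/sh
# Sketch only: packager.asc, keys.example.org and 0x12345678 are hypothetical.
# By default each command is echoed rather than executed.
PACMAN_KEY=${PACMAN_KEY:-echo pacman-key}

$PACMAN_KEY --init                                  # create the keyring with correct permissions
$PACMAN_KEY --add packager.asc                      # import a key from a local file
$PACMAN_KEY --receive keys.example.org 0x12345678   # fetch a key from a keyserver
$PACMAN_KEY --list-keys                             # show what is in the keyring
$PACMAN_KEY --edit-key 0x12345678                   # adjust trust or locally sign the key
$PACMAN_KEY --delete 0x12345678                     # remove the key again
```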

And that is basically all there is to the pacman-key tool. It is fairly simple but it is also the part of the package signing implementation that has probably received the lowest volume of testing as it is not a script that will be used every day. If you would like to help test it out while not touching your system pacman, you can build and run it directly from a git checkout. This should get you there:

$ git clone git://projects.archlinux.org/pacman.git
$ cd pacman
$ ./autogen.sh
$ ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
$ make -C scripts
$ ./scripts/pacman-key

Test initializing a new keyring, adding and removing keys, editing a key’s trust level and verifying a file with a detached signature (many packages in the Arch repos are already signed), and report any issues you run into.

Pacman Package Signing – 1: Makepkg and Repo-add

With pacman development progressing smoothly towards an upcoming 4.0.0 release, I thought it would be nice to write about everybody’s favourite (and not at all controversial) topic… package signing. I will separate the discussion into several parts over the coming weeks, writing about a new area when I personally consider the interface in that area to be relatively finalised. That is not to say what is written about here will not change before the final release, just that I find it unlikely. Note also that I will be focusing more on the technical details of the package signing implementation in pacman and its tools. So there will be limited discussion on issues a distribution may face using these features and I will not be specifically covering how Arch Linux will make use of these features.

The first thing that you are going to need to sign packages and repo databases is a PGP key. All the details of creating one using GnuPG can be found elsewhere. The only real consideration is the choice of key type. Currently a 2048-bit RSA key seems to be the gold standard. Going to 4096-bit is probably excessive, and a larger key has the side effect of slowing down the verification process (to an extent that is noticeable on older CPUs).

Once you have that sorted, it is time to sign some packages using makepkg. The implementation is quite simple. When a package signature is needed, makepkg simply calls gpg --detach-sign on the package(s) it creates. If you have the GnuPG-Agent running, you will not even be asked for your passphrase (depending on your set-up). Deciding whether to sign packages or not is primarily controlled through the “sign” BUILDENV option in makepkg.conf, but can be overridden on the command line using --sign or --nosign. By default, the package will be signed with your primary PGP key. If you wish to use another key, you can set the GPGKEY variable (either in makepkg.conf or the environment), or use the --key option with makepkg.
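
For reference, a makepkg.conf fragment enabling signing might look like the sketch below. The key ID is a placeholder and the other BUILDENV entries are only illustrative; the important part is that sign appears without a leading exclamation mark.

```shell
# makepkg.conf fragment (sketch); makepkg.conf is sourced as bash.
# Entries other than "sign" are illustrative, not a recommendation.
BUILDENV=(fakeroot !distcc color !ccache check sign)

# Optional: sign with a specific key instead of the default one.
# 0x12345678 is a hypothetical key ID.
GPGKEY="0x12345678"
```

With this in place a plain makepkg run signs the package, makepkg --nosign skips signing for a single build, and makepkg --sign --key <keyid> selects a particular key for that build.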

The additions to repo-add are similarly simple. When adding a package to a repo database, repo-add checks for a detached signature and if present adds it to the package description entry, ready for libalpm to process. Finally, signing packages is not enough. We also need the ability to sign the package database (e.g. to prevent the holding back of an update to an individual package containing a security vulnerability). This is done using similar options to makepkg, with -s/--sign to tell repo-add to sign the database and --key (or the environment variable GPGKEY) to select a non-default GPG key to sign with. In addition, repo-add has a -v/--verify flag that checks the current signature is valid before proceeding (very important as repo-add adjusts the current database rather than regenerating it from scratch).
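
Put together, signing a package and adding it to a signed database could look something like this. The repo and package file names and the key ID are invented for the example, and the commands are only echoed unless MAKEPKG and REPO_ADD are pointed at the real tools.

```shell
#!/bin/sh
# Sketch only: myrepo.db.tar.gz, the package file name and 0x12345678
# are hypothetical. Commands are echoed by default rather than executed.
MAKEPKG=${MAKEPKG:-echo makepkg}
REPO_ADD=${REPO_ADD:-echo repo-add}

# Build the package and create a detached signature next to it.
$MAKEPKG --sign --key 0x12345678

# Check the existing database signature, add the package (its detached
# signature is picked up if present), then sign the updated database.
$REPO_ADD --verify --sign --key 0x12345678 myrepo.db.tar.gz foo-1.0-1-x86_64.pkg.tar.xz
```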

As an aside, a couple of other useful security features have made their way into makepkg and repo-add during this development cycle. The ability to automatically check PGP signatures for source files has been added to makepkg (thanks to first time contributor Wieland Hoffmann). This is done by detecting files in the source array ending in the standard extensions .sig and .asc. A source file and signature can quickly be specified using bash expansion like:

source=($pkgname-$pkgver.tar.gz{,.sig})

which makes it quite clear which source files have signatures. If wanted, this check can be skipped using the --skippgpcheck or the --skipinteg options (the latter of which also skips checksum checks). Also, repo-add includes a SHA256 checksum in the repo database in addition to the current MD5 checksum, although currently libalpm (and thus pacman) does nothing with this entry. (Despite some prior assertion, adding that properly took more than a one line change… but I will leave that there.)
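
To see what the brace expansion actually produces, the following can be run in bash (brace expansion is a bash feature, not plain POSIX sh). The package name and version are placeholders, and the array uses the name source, which is what makepkg reads from a PKGBUILD.

```shell
#!/bin/bash
# Placeholder values standing in for a real PKGBUILD.
pkgname=foo
pkgver=1.0

# {,.sig} expands to the empty string and ".sig", giving two entries.
source=($pkgname-$pkgver.tar.gz{,.sig})

# Print one array entry per line.
printf '%s\n' "${source[@]}"
```

This prints foo-1.0.tar.gz followed by foo-1.0.tar.gz.sig, so the tarball and its signature are listed from a single entry.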

Finally, a quick note on the challenges faced by distributions using these tools for package and database signing. The facilities provided by makepkg and repo-add work well for repositories where the packages get built locally, added to the repo database and then mirrored to their server (such as the repo I provide), but may not be ideal to use for a larger distribution repository maintained by multiple people. For example, if building a package on a remote build server, then the packager should not want to put their private PGP key onto that server to sign the package. It currently appears that there is no easy way around this, so the package building and signing steps need to be separated, with the built package downloaded locally and then signed (although this may change in future GnuPG releases as I see patches have been recently submitted to their mailing list providing a proof-of-concept implementation to improve remote signing functionality). Similarly, how is it best to sign a repository database that is added to by multiple packagers? Having some sort of master key sign it requires some sort of reduction in security of the passphrase (with either all people pushing to the repo knowing it or having it somehow accessible to the script adding the packages to the repo database). If set-up with care, this may be acceptably low risk for a distribution to use (and, from what I understand, this is what is done by several distributions), but personally I do not see it as an ideal solution. And that brings us back to the issue of how to best sign a remote file. So, implementing the tools may actually be the simple part in all of this…

Secondary Package Management With Pacman

Want to try out new software but also keep your system clean of packages that you do not use? Unless you keep good track of everything you install for a trial, you are likely to leave some unwanted packages on your system at some stage. Not that they generally do anything apart from take disk space (at least on Arch Linux), until one day when you are doing an update and you think “What is that package doing on my system?”.

One way I have found to keep track of packages you want to temporarily install is to have a sort of secondary package management system within the main pacman database. This is achieved through abuse of the dependency tracking features of pacman. Any package that is to be installed for a temporary period gets installed with the --asdeps flag. This tells pacman that the package is a dependency. Given no other package depends on it, it is what is commonly referred to as an “orphan” package and can be listed using pacman -Qtd. Currently on my system I have:

$ pacman -Qtd
gimp 2.6.11-6
vlc 1.1.10-6

When I no longer want these packages on my system, they will be uninstalled in the standard way (pacman -Rs pkg). If I decide to keep the package, I can change the pacman database entry using the little known -D/--database flag. E.g. pacman -D --asexplicit vlc will change the install reason for the vlc package from being “Installed as a dependency for another package” to “Explicitly installed”. It will no longer be listed as an orphan, effectively taking it out of this secondary pacman management system and inserting it into the main one.
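
The whole workflow can be wrapped in a few small shell functions. This is only a sketch: the trial_* function names are my own invention, --asdeps is the full spelling of the flag, and the commands are echoed unless PACMAN is set to the real binary.

```shell
#!/bin/sh
# Sketch of the trial-install workflow. Commands are echoed unless
# PACMAN is set to the real binary (e.g. PACMAN="sudo pacman").
PACMAN=${PACMAN:-echo pacman}

# Hypothetical helper names, not part of pacman itself.
trial_install() { $PACMAN -S --asdeps "$@"; }      # install marked as a dependency
trial_list()    { $PACMAN -Qtd; }                  # list dependencies nothing requires
trial_keep()    { $PACMAN -D --asexplicit "$@"; }  # promote to explicitly installed
trial_drop()    { $PACMAN -Rs "$@"; }              # remove with its unneeded dependencies

trial_install gimp vlc
trial_list
trial_keep vlc
trial_drop gimp
```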

Pacman 3.5.0 Released

It is time for another major pacman release. Here is a brief overview of the new features:

The feature that will be immediately noticed on the pacman upgrade is the change of database format. This was a step towards reducing the large number of small files pacman had to read, which was a major cause of performance issues (particularly on systems with slow hard-drives). Two major changes occurred: the sync database became a single file per repo and the local database had some of its files merged. The sync databases are now read directly from the database (compressed tarball) that is downloaded from the mirrors. No extraction means no fragmentation of the database across the filesystem. The “depends” and “desc” files in the local package database were merged into one file as there was actually little point in them being separate. This results in approximately 30% fewer files to be read for the local database on an average system. A script (pacman-db-upgrade) is provided to perform this database upgrade and pacman will abort if a database in the old format is found. Any scripts that read directly from the database will need to be updated to deal with these new formats. Or better yet, they could be written to use libalpm which would make them robust to future changes (the local database format could be improved further). Combine the database changes with other speed enhancements (improved internal storage of package caches, faster pkgname/depends searches) and this pacman release is notably faster.

Until now, a great way to break your system during an update was to run out of disk space. Pacman now attempts to avoid this in two ways. Firstly, it will (optionally) calculate the amount of disk space needed to perform the update/install and check that your partitions have enough room. Doing this calculation is actually fairly involved and I’m sure we will encounter some case of a filesystem and platform combination that we have not tested where this calculation is not correct… I know for certain that it does not work in chroots. The “solution” in these cases is to disable this check in pacman.conf and make a bug report with all the details needed to replicate the issue (except the chroot case). As a second line of defence for disk space issues, pacman will report any extraction error it encounters and attempt to stop installation on the important ones.

A much missed feature in pacman-3.4 was the ability to select which packages you wanted to install from a group. Well, that is back and better than ever! Additionally, the selection dialog is also extended to package provisions, allowing the user to select which provider package they want installed rather than pacman just installing the first one it found.

A feature that will primarily affect packagers is the removal of the “force” option that would result in packages being installed from the repo even if the version was not considered newer by pacman. This was useful for packages with weird versioning schemes (is that “a” for alpha or the first patch level?), but it resulted in strange update behaviour for those who had built themselves newer versions of a package locally. This has been replaced by the use of an optional “epoch” value in the package version – so a “complete” package version looks like epoch:pkgver-pkgrel. If present, the value of the epoch field takes precedence over the rest of the package version.
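
To make the epoch:pkgver-pkgrel layout concrete, here is a small sketch that splits a version string into its parts. The split_version helper is hypothetical (pacman does this internally in libalpm), it assumes a pkgrel is always present, and it treats a missing epoch as 0.

```shell
#!/bin/sh
# Hypothetical helper: split "epoch:pkgver-pkgrel" into its parts.
# Assumes a pkgrel is present and treats a missing epoch as 0.
split_version() {
    full=$1
    case $full in
        *:*) epoch=${full%%:*}; rest=${full#*:} ;;
        *)   epoch=0;           rest=$full ;;
    esac
    pkgver=${rest%-*}     # everything before the last "-"
    pkgrel=${rest##*-}    # everything after the last "-"
    printf 'epoch=%s pkgver=%s pkgrel=%s\n' "$epoch" "$pkgver" "$pkgrel"
}

split_version 1:1.0-1    # epoch given explicitly
split_version 2.0a-3     # no epoch in the string
```

Because the epoch is compared first, 1:1.0-1 is treated as newer than 2.0a-3 even though 1.0 on its own looks older than 2.0a.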

The main addition to makepkg is the ability to run a check() function between build() and package(). This optional function is useful for running package test suites (or even better, not running them in the early builds when bootstrapping a package). Other changes include the removal of STRIP_DIRS (now all files are stripped by default), adding a buildflags option to disable CFLAGS etc, and allowing the use of $pkgname in split package functions.
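
As a sketch of where check() sits, here is a cut-down PKGBUILD fragment. The package name, directory layout and make targets are hypothetical, and a real PKGBUILD needs further fields that are left out here.

```shell
# PKGBUILD fragment (sketch; sourced by makepkg as bash). The package
# name and make targets are hypothetical; $srcdir and $pkgdir are
# provided by makepkg at build time.
pkgname=foo
pkgver=1.0
pkgrel=1

build() {
    cd "$srcdir/$pkgname-$pkgver"
    ./configure --prefix=/usr
    make
}

# Optional: runs after build() and before package().
check() {
    cd "$srcdir/$pkgname-$pkgver"
    make check
}

package() {
    cd "$srcdir/$pkgname-$pkgver"
    make DESTDIR="$pkgdir" install
}
```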

For a more complete list of changes in pacman-3.5, see the NEWS and README files in the source code.