Pacman Translations

I was listening to Frostcast in the background today when I heard my name. That always makes me pay some attention. Then I heard wrong information. I don’t know why I care, but I do… so here goes the clarification.

The quote from Philip Müller at 14:35 into the podcast:

The lastest news was Allan McRae – he is a developer of pacman himself – he sent me an email to send over all the translations of Manjaro distribution does. So I forked pacman, and pacman itself has 20 translations and our branch has 44 translations of the same software so Arch Linux is asking us to be upstream and give them our translations…

OK… This is interesting. Time for some background here. When pacman-4.1 was released, we removed the broken SyncFirst option. This is needed by Manjaro Linux to run their update helper script that “fixes” the update process to remove any manual interventions. So Manjaro reverted our patch and brought back SyncFirst to pacman. That required three additional strings to be translated for their version of pacman so they also forked our translation project on Transifex.

As the Arch and Manjaro versions of these projects had started to diverge, I wrote to Phil noting that people were doing more than just translating those three additional strings, and it would be good if the translators were pointed at the Arch project so we all benefited, given the Arch project is the one the pacman developers set up.

Lets compare the status of the Arch and Manjaro translations as of 2013-09-24. There are 24 languages with complete translations in the Arch projects, and being nice and ignoring the additional three strings in the Manjaro project, they have 23. (Of those 23, only 6 actually have the additional three Manjaro strings translated). What are the differences? Manjaro has a complete Hungarian translation while Arch has complete Korean and Romanian translations. The Arch Hungarian translation is at 99%, while the Manjaro Korean and Romanian are at 21% and 62% respectively. So it is clear these languages have diverged since the split, with most of the work done in Arch.

Of the remaining languages with incomplete translations, Manjaro has 19 languages, while Arch has 15. Clearly not a total difference of 20 to 44 languages as claimed. Looking at these in more detail, 9 languages have not deviated between the two projects. The Arabic, Chinese (Taiwan), Dutch, Galician, Polish, Serbian (Latin) translations have all got additional translations in the Arch project since the split with the Manjaro project. So apart from languages that have been have had translations started in Manjaro but not in Arch, the Arch project is behind in 3 strings for the Hungarian language.

Maybe where the Arch translation project for pacman could gain is from the new languages in the Manjaro translation: Czech (Czech Republic) [99%], Bulgarian (Bulgaria) [62%], Uzbek [14%] and Danish (Denmark) [3%]. Also note that 3/4 of those languages have a sub-name there. Taking “Danish (Denmark)” as an example, there is already a “Danish” translation (language code: da) and this is adding a Denmark specialization (language code: da_DK). I might be entirely wrong here, but are there other variants of Czech, Bulgarian and Danish apart from their primary usage, or are these exactly the same and the work is just being repeated?

In summary, the translation project set up by the pacman developers is, and will remain, the upstream translation. I just approached Manjaro to send their translations our way so we would both benefit. Arch from (potentially) more translations, and it would be easier for Manjaro to merge their string translations without ending up removing several hundred perfectly good translations.

We Are Not That Malicious…

I will clarify this just because I have had several people ask me already. No, we did not remove the SyncFirst option in pacman to deliberately cause issues for Manjaro Linux. In fact, it was first discussed in Feburary 2012 and, as far as I can tell, Manjaro has only been around from late March 2012 (looking at the earliest commits in their git repository).

So lets keep the conspiracy theories to a minimum! (or at least come up with a better one…)

Pacman-4.1 Released

I have just released pacman-4.1 and packages are now in the [testing] repo. This is the first time I have made a release for any software project, so I was glad to have released a 4.1RC a few weeks back to learn everything that needed to be done.

It has been over a year since the pacman-4.0 release and there have been a large number of contributions made:

$ git shortlog -n -s --no-merges v4.0.0..v4.1.0
   239 Allan McRae
   185 Dan McGee
   158 Dave Reisner
    52 Andrew Gregory
    23 Simon Gomizelj
    20 William Giokas
    19 Florian Pritz
    15 Daniel Wallace
    ...

I win this time! Apart from the usual three contributors, it was great to see other people regularly helping out, both in providing and reviewing patches. A particular thanks to Andrew Gregory who helped me figure out how to fix something on several occasions and has been actively commenting on patches sent to the mailing list. His patch count also puts him in the top ten contributors of all time. In total we have 45 people with patches accepted for this release. Also a big thank you to our translators – particularly because I was learning how the system worked and may have required additional strings to be translated on a couple of occasions…

Moving on to what has changed. There have been quite a number of features added to pacman and makepkg and a couple of new helper scripts in this release.

The major feature for the release is tight integration between the package manager and systemd. After much discussion about how best to perform updates on a rolling release system, we realized that it was essential to have updates preformed with minimal other processes running. Also, the security aspects of updates mean that it is essential that these get provided as soon as possible. We felt the best way to achieve this was to perform updates on shutdown. This is achieved through a new daemon, pacmand that monitors and downloads updates in the background. When updates are found, it schedules a reboot of the system (hence the need to integrate systemd). At the moment the timing of the reboots is not configurable, but a timer will pop-up to allow you to delay it for a preset amount of time. Configuration will likely be added in pacman-4.2, when pacmanctl will be ready for general use. Until that release is made, Arch Linux will minimize the impact by performing all updates in its [testing] repository and only push updates on a yet to be decided day and time of the week. A news post will be made when that is decided.

Of course, all this makes systemd a hard dependency of pacman. We felt this was acceptable given Arch Linux has officially switched to using systemd. As this release is not tested (and unlikely to work) on systems without systemd, Arch users or other distributions using pacman will be required to make the switch to systemd if they want to continue using pacman as their package manager. The integration with system will become tighter in pacman-4.2 where we plan to use the upcoming kdbus message passing interface – through libsystemd-bus – to allow other programs to interact with pacman, making the development of alternative front-ends easier.

In terms of output, there has been improvements in a couple of areas. First colour support was added. This had been floating around for a long time, but no-one had ever spent the time to create a patchset and submit it. I think the colours for a simple update look good, although those when searching are a bit… rainbow. This can be only configured on or off at the moment. Extra informational output has been added for optdepends, providing details about whether an optdepend is installed or not and giving a warning when removing a package that is an optdepend for another. This also provides the groundwork for more complete optdepend handling in future releases.

When building packages using makepkg from this release, information about all the files in the package is stored, including permissions, modification times, sizes and checksums (md5 and sha256), etc. These can be checked using “pacman -Qkk“, excluding checksums (which requires additional support to be added to libarchive in order to read them in). Other useful features include never overwriting .pacsave files, but instead giving them a number suffix as needed. We have also polished the package signature checking, improving key importing and allowing configuration on how to validate packages installed with “pacman -U“, both using local files and from remote sources.

There are a few improvements to package building too. I have covered support for VCS packaging in makepkg previously, with bzr, git, hg and svn packages just requiring an appropriate line in the source array. Also a pkgver() function can be added to automatically update the pkgver variable in the PKGBUILD. With these VCS source lines, or any other source that is volatile, the value “SKIP” can be used in the checksum array.

An optional prepare() function can now be used in a PKGBUILD for preparation of the sources, such as patching and sed alterations. This function is run after the extraction of the sources and not run when --noextract is used, allowing operations that should only ever been run once on the sources to be skipped. Finally, a new debug option is available that will result all the debug symbols that are stripped from binary files to be stored in a separate package, which can be installed to allow easier debugging (another feature that has had patches floating around for a while).

Finally, two new helper scripts have been added to the contrib section: checkupdates and updpkgsums. The checkupdates script allows you to safely check for package updates without altering the system pacman remote databases. The updpkgsums script will perform an in place update of the checksums in a PKGBUILD, although more complex PKGBUILDs (such as those with different sources for each architecture) will not likely work…

So a long post, but this is a big release! There are enough of running the git version that it should be completely bug free, but just in case I am wrong report any issues to the bug tracker.

Edit: Yes – some of this was April Fools… (moderated comments are now restored too).

Pacman 4.1.0rc1

For those that are mildly adventurous, you can try the pre-release of the upcoming pacman-4.1. There are a handful of us who constantly run pacman from git so it should be fairly safe. All bugs found are to be reported to the bug tracker. (Only one issue found so far – in the rarely used pkgdelta script).

Download: i686 x86_64

I’ll make a post about all the new features when the final 4.1.0 release is made – hopefully before the end of the month.

Advantage of a Simple “Database” Format

One fairly common criticism of the pacman package manager is that is very slow due to not using some sort of binary database as its backend. I found suggestions to use sqlite dating back to 2005 (although I am sure they go back further) and mailing list activity peaked around late 2007. Speed is one of pacman’s main features – and it beats the competition by a wide margin according to Linux Format – but I guess people want it even faster.

The problem is that we use a filesystem based “database” where each package has its information stored in multiple files. This means that we can get fragmentation of our “database” and the reading of all these files from the filesystem can be quite slow. Usually most of this is cached by the kernel after the first read so speed improves markedly after the first usage.

This was improved a lot in the pacman 3.5 release (March 2011). The sync databases started to be read directly from the downloaded tarball and the local database had the “desc” and “depends” files for each package merged into one file. This increased the speed of reading from the sync databases massively and was a reasonable improvement to the local database too.

So the local package “database” could be improved by reducing it to one or a few files. But every time I think about changing it, I am reminded why I like the plain text file format. I was updating a reasonably out of date computer when I had an issue with the python-pygame package being renamed to python2-pygame. All packages needing in the Arch Linux repos were rebuilt with the new dependency name, so it did not need a provides entry. But my solarwolf package from the AUR still depended on the old name:

$ pacman -S python2-pygame
resolving dependencies...
looking for inter-conflicts...
:: python2-pygame and python-pygame are in conflict. Remove python-pygame? [y/N] y
error: failed to prepare transaction (could not satisfy dependencies)
:: solarwolf: requires python-pygame

As we have a file based database, adjusting the dependency is easy without rebuilding the package. Just open the relevant file and edit away (or use sed…)

$ vim /var/lib/pacman/local/solarwolf-1.5-5/desc

Now I can see my local database has an issue using the handy testdb tool – solarwolf depends on python2-pygame, but that is not installed.

$ testdb
missing python2-pygame dependency for solarwolf

But now I update as usual, installing python2-pygame which removes python-pygame, and my local pacman database is fully consistent.

I am sure all of this would still be possible if the database was in some other format, but it would have required more tools than a simple text editor. Of course, most people should never need to edit their local database, but I have introduced changes to it several times during pacman development and I consider being able to easily fix or revert these in the category of a “good thing”. And yes, I develop and test directly on my production system…

Of course, it is better to use a real database in performance critical situations. But pacman really does not fall into that category.

Changes To VCS Packaging Support In Makepkg

The current support from building packages from version control systems (VCS) in makepkg is not great for a number of reasons:

  • It relies on obscure (but documented…) variables being specified in the PKGBUILD, which actually achieve nothing in terms of downloading and updating the source as needed.
  • The whole VCS checkout/update mechanism needs repeated across every PKGBUILD that uses it so is a lot of unnecessary code duplication.
  • Building a package from a specific revision/branch/tag/… required using an altered version of this code, resulting in many non-standard work-arounds being made.
  • The automatic updating of the pkgver happens in what may not be an obvious way. For example, the pkgver for git PKGBUILDs is set to the build date, not the date of the last commit. Even if it was the date of the last commit, that can be far from unique. (Why not use git --describe? Because that relies on the tag being something suitable for an actual version number and many repos do not follow this.)
  • Even when a revision number is used for the updated pkgver, this results in different behaviour for different VCS. For example, with hg repos, you have to download/update the repo to determine the latest revision.
  • The updating of the pkgver is done before the makedepends are installed, so can fail if it relies on VCS tools.
  • The --holdver flag stopped the pkgver being updated, but the VCS repo was still updated to the latest version as usual.
  • VCS sources ignore $SRCDEST
  • Offline building (using pre-downloaded sources) required adjusting the PKGBUILD
  • You can not create a source package with the VCS sources included using --allsource

In fact, the issues with the current VCS implementation accounted for almost 10% of the bugs in the pacman bug tracker and there are a number more in the Arch bug tracker about how to improve the supplied prototypes for the VCS PKGBUILDs. It was clearly time for a rewrite.

An idea that had seen some discussion over the years, was to just put the VCS sources in the source array. Makes sense… right? The problem was choosing an appropriate syntax for the URLs that was consistent with what was already used and also flexible enough to handle the various possibilities of a VCS source. The format decided on is:

source=('[dir::][vcs+]url[#fragment]')

Simple! Well, it will be once I explain the parts… The url component should be obvious. The problem with it is that there is often no way to tell that is a VCS source. For example, for git repos without the git protocol enabled on the server, this will start with (e.g.) http://. To work around this, an optional vcs prefix can be added to the URL. So for git over http, you would used git+http://. This is based on the already used syntax when downloading subversion repo over ssh.

At the end of the URL is an optional #fragment. Providing information in a URL after a # character is some sort of standard that I am too lazy to provide a link for… Anyway, it allows us to specify information about what we want to check out when building. For example, I build my pacman-git package using the working branch of my git repo. To check that out, I use:

source=('git+file:///home/arch/code/pacman#branch=working')

Note the use of the git+ prefix there that allows me to check out from a local copy of my repo. The list of recognized fragments is built into makepkg and is documented in the PKGBUILD man page.

Finally, there is the optional dir:: prefix. This allows the specifying of a directory name for makepkg to download the source into. If not specified, makepkg trys to pick a good name from the URL, but there is such variation in VCS URLs that it will be often useful to change it. This is an old, but little known, syntax available in PKGBUILDs, which can be used to rename any source file once it is downloaded.

So now that VCS sources can be used, even multiple different repos to build the one package, how does makepkg chose how to update the pkgver variable? Sort answer is that it doesn’t. You can provide a pkgver() function that outputs a string to be used for the updated package version. This is run after all the sources are downloaded and (make-)dependencies are installed. For my pacman-git package, I use something like:

pkgver() {
  cd $srcdir/pacman
  echo $(git describe | sed 's#-#_#g;s#v##')
}

Currently supported protocols in the master git branch of pacman are git (branch, commit, tag), hg (branch, revision, tag), svn (revision). That covers ~92% of the VCS PKGBUILDs in the AUR. Adding support for the remaining VCS that are used (bzr, cvs, darcs) – or any other VCS – is quite simple but requires knowing how to efficiently use the VCS tools. I will create a patch to support any additional VCS if someone provides me:

  1. How to checkout a repo to a given folder.
  2. What url “fragments” need supported for that VCS.
  3. How to create a working copy of the checked out repo (i.e. “copy” the primary checkout folder) and how to get it to the specified branch/tag/commit/whatever. That can be in all one step.

Note that the old VCS PKGBUILDs will not stop working as such, although they are likely to be broken… At least the pkgver will no longer update. I’m sure there are other subtle incompatibilities too and you would still suffer from all the issues listed above, so it is definitely worth getting proper support for your needed VCS into makepkg.

If you want to take the new implementation for a spin, checkout a copy of the pacman git repo and build it. For those that are somewhat brave, you could even use the pacman-git package in the AUR, but make sure you know the risks involved in running a developmental version of a package manager entails…

The Great Pacman Bug Hunt of 2012

This is a story about a recent issue discovered in pacman, the Arch Linux package manager, and the difficulties we had hunting it down… The story is long, but so was the process of finding the bug.

It all started on a warm summer’s night (in my timezone and location… – it was probably cold and daytime for the other main pacman developers) with the reporting of FS#27805: “[pacman] seg faults when removing firefox”. Of course, my initial reaction was “bull shit” as we all know there are no bugs in the pacman code. But this was only a couple of weeks since pacman-4.0 was moved into the Arch Linux [core] repo so there was an ever so slight possibility it was real.

Luckily for us, the user reporting the bug was very helpful and installed a version of pacman with debugging symbols and gave us a full backtrace. It was very clear where the segfault was occuring:

#0 0xf7fbd4e7 in _alpm_pkg_cmp (p1=0x8128aa0, p2=0x0) at package.c:644

That function is called in the package removal process when we check that a file that is going to be removed with a package is not also owned by another package (which would require someone using -Sf when they should not). If the package in the local database is the same as the one being removed, we do not need to run this check, and hence the test. As you can see above, for some reason _alpm_pkg_cmp is being passed a null pointer as the package from the local database and KABOOM!

So the question was, how do we get a null value for the package from our local database? Given pacman runs through the list of local packages on each package removal, this null entry must have been generated on the removal of the previous package. Here is a bit of background on how package information is stored in pacman. Package information is stored in a hash table that also provides access to the data as a linked list. This provides us with fast look-up by a package’s name but also allows us to loop through the (generally sorted) package list. Now the hash table code is fairly new (first introduced in pacman-3.5) and the removal of items from a hash with collision resolution done by linear probing is not straight forward, so there could be a bug. Dan pointed his finger my way as I wrote the original hash table code and I pointed my finger his way as he made optimizations to the removal part. But it turns out that both of us were not thinking too hard. It is the list that is being corrupted and that has items removed using code that has been around for years. Despite that, the whole hash table and linked list removal code got an in depth review and no issues were found.

We were stumped. Looking at the the debug output from pacman, we could see that a file that actually did not exist on the system was being “removed” right before the crash, but that is not uncommon and appeared to be handled correctly so was unlikely to be the cause. So back to the reporter to see if we could get more information to replicate. He was very helpful and provided us with a copy of his local package database. We created a chroot with exactly the same packages and had no luck replicating. The user even provided us with a complete copy of his chroot where the error was occurring, but again there was no luck replicating. It must be something specific to that users system. Right? Well, even re-extracting the tarball of the chroot the user provided us onto his own system made the bug go away. All in all, a great candidate for being “not a bug”….

Until on another warm summers evening, while being my usual extremely helpful self on IRC, someone mentioned they were getting a segfault while removing packages. A bug report was filed and, again, the user was extremely helpful and the backtrace provided was exactly the same. A core dump showed us there was definitely something wrong with the linked list. Well… bugger! This bug appears real. Again the red-herring of the removal of a non-existent file was shown in the debug log, but it would be very, very strange for that to break the linked list of package information so was ruled out.

It was time to find a reproducer! So I created a chroot and set this script running:

ret=0
while (( ! ret )); do
  pkg=$(pacman -Sql extra | shuf -n1)
  pacman -S --noconfirm $pkg
  pacman -R --noconfirm -s $pkg
  ret=$?
done

Within five minutes I could replicate the segfault. (It turns out I was very lucky as I ran the same script again for over four hours and did not strike the issue.) Now it was time to get debugging!

The first thing I did was print some debugging info in the linked list node removal code, but for some reason the node removal just before the segfault did not print anything. I was only printing information when removing a node from the middle of the list (because that is where the package causing this issue was located), but just to be sure I also added debug statements for the case of removing the head and tail nodes. And then pacman told me it was removing a node from the end of the list… “Why do you think that package is a the end of the list pacman?”, I asked. “Because the head node’s prev entry tells me it IS the end of the list”, replied pacman. “Oh, crap”, I said. “So it does!” Something was clearly wrong here.

It was time to investigate all removal operations on that list. So I printed the entire linked list before and after each package removal and found the error actually occurred before the removal operation even started. The initial list of the local database passed to the removal operation was already broken with the pointer to the tail entry not pointing to the tail. That was good to know as we had thoroughly reviewed the removal code and not found any issues.

This lead me to believe that the error must occur when reading in the local database. Next step: print out the linked list at the end of reading in the local database. But that was completely fine. So somewhere between reading in the local database and using it, things got broken. And, what do we do with the local database between reading it in and removing items from it? The only place where we modify the local database between those points is when it gets sorted by the package names. Sure enough, the pointer to the tail of the linked list is good going into the sort and bad coming out.

This limited the error to two functions: alpm_list_msort or alpm_list_mmerge. These implement a merge sort. Essentially alpm_list_msort recursively calls itself, dividing the list up into smaller pieces until it can not be divided any further and they are then they are merged in sorted order by alpm_list_mmerge. I had just started staring at the code when I saw something that seemed too obvious for such a hard to track down bug. My exact words on IRC were “I think I can fix this…”. And sure enough I could.

It turns out that when alpm_list_msort split a list into two, it did not set the pointer to the tail nodes in the two new lists correctly (or at all…). So a two line addition and we have the bug fixed. It turns out this bug had been present since the start of 2007. So I am still slightly amazed that we did not see it before now and when it did appear that we got a second report of it so quickly.

And why could we not reproduce the issue even with a copy of a chroot where it was occurring? It is entirely dependent on the order the directory entries are returned from the disk. This determined which package was pointed to as the “tail” of the sorted package list. The package incorrectly referred to as the tail had to be removed during a removal operation, and also not be the last package removed, to expose the bug. Given most systems will have many hundreds of packages on them and removal operations tend to involve one or a few packages, this is a fairly rare occurrence. But even if it occurred only a fraction of a percent of removal operations, I think we should have ran into this bug before now. I guess more people probably did experience the issue, but then could not immediately replicate and did not experience the issue again so did not report it.

And that is the end of the story of one of the most frustrating bugs I have ever managed to track down. A big thank you to the two users who installed versions of pacman with debug symbols and provided us backtraces, coredumps and entire chroots! Without their help, we would probably still be not entirely convinced that the bug was real and it would still be hiding away in the pacman source code.

Pacman Package Signing – 4: Arch Linux

I have previously covered the more technical aspects of the implementation of PGP signing of packages and repository databases in pacman. You can read the previous entries here:

Since then, pacman-4.0 has been released and has been in the [testing] repository in Arch Linux for a while. That means that the signing implementation is starting to get some more widespread usage. No major issues have been found, but there are some areas that could be improved (e.g. the handling of the lack of signatures when installing packages with -UFS#26520 and FS#26729). And it has successfully detected a “bad package” in my repo… (well, not really a bad package, but a bad signature. Lesson: do not try creating detached signatures for multiple files at once because gnupg is crap…).

The Arch repos have been gradually preparing for the package signature checking in pacman-4.0. Support for uploading PGP signatures with packages was added in April and was made mandatory from the beginning of November. As of today, 100% of the packages in the [core] repo and approximately 71% of [extra] and 45% of [community] are signed.

So all the components are coming together nicely. But how does this work from a practical standpoint? I’ll start with setting up the pacman PGP keyring and pacman.conf.

When first installing pacman-4.0, you should initialize your pacman keyring using pacman-key --init. This creates the needed keyring files (pubring.gpg, secring.gpg) with the needed permissions, updates the trust database (obviously empty at this point…), and generates a basic configuration file. It also generates the “Pacman Keychain Master Key”, which is your ultimate trust point for starting a PGP web of trust. You may want to change the default keyserver in the configuration file (/etc/pacman.d/gnupg/gpg.conf) as some people have issues connecting to it.

The set-up of your pacman.conf file is somewhat a matter of personal preference, but the values I use are probably reasonable… I have the global settings for signature checking as the default value (Optional TrustedOnly). This basically sets the need for signatures to be optional, but if they are there then the signature has to be from a trusted source. See the pacman.conf man page for more details. For the Arch Linux repos with all packages signed, I set PackageRequired which forces packages to be signed but not databases. (For the small repo I provide, I use Required as both packages and databases are signed.)

Lets look at some output when installing a signed package:

# pacman -S gcc-libs
warning: gcc-libs-4.6.2-3 is up to date -- reinstalling
resolving dependencies...
looking for inter-conflicts...
 
Targets (1): gcc-libs-4.6.2-3
 
Total Installed Size: 2.96 MiB
Net Upgrade Size: 0.00 MiB
 
Proceed with installation? [Y/n]
(1/1) checking package integrity [######################] 100%
error: gcc-libs: key "F99FFE0FEAE999BD" is unknown
:: Import PGP key EAE999BD, "Allan McRae ", created 2011-06-03? [Y/n] y
(1/1) checking package integrity [######################] 100%
error: gcc-libs: signature from "Allan McRae " is unknown trust
error: failed to commit transaction (invalid or corrupted package (PGP signature))
Errors occurred, no packages were upgraded.

As you can see, pacman struck a package that had a signature from an unknown key. It then asks if you would like to import that key. Given the PGP key fingerprint matches that published in multiple places, importing that key seems fine. Then pacman errors out due to that key not being trusted. Well, that Allan guy seems reasonably trustworthy… so I could just locally sign that key using pacman-key --lsign EAE999BD and that key will now be trusted enough to install packages.

Validating every Arch Linux Developer’s and Trusted User’s PGP key would soon become annoying as there are a fair number of them (35 devs and 30 TUs – with some overlap). To make this (a bit…) simpler, five “Master Keys” have been provided for the Arch Linux repositories. The idea behind these keys is that all developer and TU keys are signed by these keys and you only need to import and trust these keys in order to trust all the keys used to sign packages. These key fingerprints will be published in multiple places so that the user can have confidence in them (see the bottom of this post for a listing of the fingerprints obtained relatively independently of those listed on the Arch website).

To set-up your pacman keyring with these keys, you can do something like:

for key in FFF979E7 CDFD6BB0 4C7EA887 6AC6A4C2 824B18E8; do
    pacman-key --recv-keys $key
    pacman-key --lsign-key $key
    printf 'trustn3nquitn' | gpg --homedir /etc/pacman.d/gnupg/
        --no-permission-warning --command-fd 0 --edit-key $key
done

That will import those keys into your keyring and locally sign them. But that is not quite enough as those keys are not used to sign packages themselves. In order for pacman to trust PGP keys signed by the master keys you have to assign some level of trust to the master keys. The final line gives the master keys “marginal” trust. Note I use gpg directly rather than pacman-key as pacman-key does not understand the --command-fd option. You could use pacman-key --edit-key if you wanted to manually type in the commands to set the trust level. By default, the PGP web of trust is set up such that if a key is signed by three keys of marginal trust, then that key will be trusted. (We have five master keys rather than the minimal three so that we can revoke two – a worst case scenario… – and still have our packages trusted.) Note that setting the master keys to have marginal trust serves as a further safety mechanism as multiple keys would need to be hijacked to create a key that is trusted by the pacman keyring.

Now that the five master keys are nicely imported into your pacman keyring, any time pacman strikes a package from the Arch Linux repos with a signature from a key it does not know, it will import the key and it will automatically be trusted. At least that is the idea… We are still in a transition period so not all Developer and Trusted User keys are fully signed yet by the master keys yet, but we are not too far off. In the future we might provide a pacman-keyring package that streamlines this process a bit, or at least will save the individual downloading of each packager’s key.

That just leaves the signing of the databases, but that is a story for another day!


Arch Linux Master Key fingerprints:
    Allan McRae – AB19 265E 5D7D 2068 7D30 3246 BA1D FB64 FFF9 79E7
    Dan McGee – 27FF C476 9E19 F096 D41D 9265 A04F 9397 CDFD 6BB0
    Ionuț Mircea Bîru – 44D4 A033 AC14 0143 9273 97D4 7EFD 567D 4C7E A887
    Pierre Schmitz – 0E8B 6440 79F5 99DF C1DD C397 3348 882F 6AC6 A4C2
    Thomas Bächler – 6841 48BB 25B4 9E98 6A49 44C5 5184 252D 824B 18E8

Pacman Package Signing – 3: Pacman

And on with the “final” component of the package signing saga… I have previously posted about signing packages and databases and managing the PGP keyring, which was all preparatory work for pacman to be able to verify the signatures.

In the end, most people will not notice pacman verifying signatures unless something goes wrong (at least once it is configured). You will see the same “checking package integrity” line, but instead of verifying the packages md5sum, the PGP signature will be checked if available. But implementing this required substantial reworking of the libalpm backend, with the adding of signature verification abilities through the use of the gpgme library, adding flexible configuration options to control repo and package signature verification, changes to how and when repo databases get loaded (so that we can error out early if the repo signature is bad), and the list goes on… The majority of this was done by Dan McGee, who is the lead pacman developer. In fact, looking at the git shortlog for this development cycle:

$ git shortlog -n -s --no-merges maint..
   296  Dan McGee
   128  Dave Reisner
   124  Allan McRae
   ...

(followed by 18 other contributors with 11 or less commits each). So Dan takes the clear lead with about 50% of all commits in this developmental cycle, while the battle for second place remains intensely competed for!

So what have we ended up with? My opinion is ever so slightly biased, but I think we have ended up with the most complete and flexible package signing implementation yet. Most other package managers signature checking is simply a call to gpgv, which trusts any signature in your keyring. With the more complicated solution using gpgme, pacman has the complete concept of the web of trust, allowing for very precise keyring management. We not only sign packages, but sign databases too. Importantly, we can add expiry times to those signatures, which together prevents a malicious mirror holding back individual package updates or deliberately not providing any updates at all. As an aside, we also now protect against the “endless data attack” where an attacker sends an endless data stream instead of the requested file. Together that covers the most well reported avenues of attack on package managers (I hesitate to say “all” despite not knowing of any others because someone will prove me wrong!).

Onward to the actual use of signature checking in pacman. The main adjustment needed to be made is the addition of the SigLevel directive to pacman.conf. This can be specified at a global level and also on a per-repo basis. The SigLevel directive takes three main values: Required, which forces signature checking to be performed; Optional (default), which will check signatures if present but unsigned packages and databases will be accepted; and Never, which sets no signature checking to be preformed. More fine grained control can be added by prefixing these options with Database and Package and combining multiple options. For example, I have a local repo that has a signed database but not all packages have signatures. So I use SigLevel = Optional for my global default and add SigLevel = DatabaseRequired to enforce the database to be validly signed for that repo. Alternatively, I could use SigLevel = DatabaseRequired PackageOptional to explicitly achieve the same result. You can also specify the level of trust needed in a signing key using the TrustedOnly (default) and TrustAll options. The former will only accept a key if it is fully trusted by your PGP keyring, while the latter only requires the key to be present in the keyring (much like using gpgv).

As I wrote earlier, there is very little change from a users perspective once configured. About the only thing that is really noticeable is that pacman will attempt to download a signature for each database it downloads when the database SigLevel is set to Required or Optional. For example:

$ pacman -Syu
:: Synchronizing package databases...
 allanbrokeit           1464.0B  540.5K/s 00:00:00 [######################] 100%
 allanbrokeit.sig        287.0B    7.0M/s 00:00:00 [######################] 100%
...

Beyond that, the checking of PGP signatures occurs during the usual package integrity check stage so will go largely unnoticed unless something goes wrong. This is both a good thing (we all like pacman because of its simplicity) and a bad thing (as the large amount of work done here is not particularly visible to the user). So when everything with package are repo database signing just works for you, remember to thank your local pacman developer (and if it all goes wrong, it was not our fault…).

Pacman Package Signing – 2: Pacman-key

In this second part of the ongoing series of articles about the implementation of package signing in pacman, I am going to focus on keyring management and the new tool pacman-key that is provided to help with this. You can read the previous entry covering makepkg and repo-add here.

The way in which the PGP keyring for pacman is managed will be an essential aspect of the security of your system. The keyring (in combination with configuration options for pacman itself), will control which package and database signatures that you trust and thus what packages get onto your system. In fact, I’m still not entirely sure how best to set-up the keyring in terms of importing keys and setting their trust levels as the only repo I currently use that has full signing is my own and for that I can just add my own key with ultimate trust. Adding my key with ultimate trust would not be ideal for other people to do, but then again it may be acceptable given it is in a keyring for pacman only. But this is more the social aspect of PGP signing so I will leave discussing that further to another time.

However the keyring is set-up, it is helpful to have a tool to manage it. While this could be done directly using gpg with the --homedir flag, there are a few pacman specific keyring management issues that warranted the creation of a separate tool. Enter pacman-key. Originally this was a port of Debian’s apt-key to pacman by Denis A. Altoé Falqueto, but has slowly become closer to being just a gpg wrapper with additional functions. I’ll also add a shout-out to both Ivan Kanakarakis and Pang Yan Han who also contributed multiple patches towards this script and Guillaume Alaux who provided the initial man page.

The pacman keyring will be located (by default) in /etc/pacman.d/gnupg (although this can be adjusted using the GPGDir directive in pacman.conf). The keyring should be set-up using pacman-key --init to ensure the files have the correct permissions for full pacman signature checking functionality. For example, to verify package signatures as a user (e.g. using pacman -Qip <pkg>), we need to let the user have read permissions on the keyring files and also add a gnupg configuration file to prevent the creation of a lock file (this is currently required to be done globally as the gpgme library used by pacman does not have the ability to control lock file creation…).

Keys can be added to the pacman keyring in several ways. They can be imported from a local file or files using pacman-key -a/--add <file(s)> or from a public key server using pacman-key -r/--receive <keyserver> <keyid(s)>. You can also --import entire sets of keys and trust dbs from other gnupg keyrings you have. Keys are removed using the -d/--delete option. There is also a mechanism for a distribution or other repo provider to supply a keyring containing all their packagers’ PGP keys to be imported into the pacman keyring, but this area is still undergoing development.

Once you have some keys in your keyring, you can manipulate them using pacman-key and some standard gnupg flags including --edit-key, --export, --list-{keys,sigs}, etc. The --edit-key option is fairly important as it allows you to do things like adjust the trust levels or locally sign keys in the keyring, which builds our web of trust. For any more advanced manipulation of the keyring (or just something that is not wrapped by pacman-key), you need to use gpg directly (although I am sure that if it turns out that a commonly used command is not currently wrapped by pacman-key, it can be added on request…).

And that is basically all there is to the pacman-key tool. It is fairly simple but it is also the part of the package signing implementation that has probably received the lowest volume of testing as it is not a script that will be used everyday. If you would like to help test it out while not touching your system pacman, you can build and run it directly from a git checkout. This should get you there:

$ git clone git://projects.archlinux.org/pacman.git
$ cd pacman
$ ./autogen.sh
$ ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
$ make -C scripts
$ ./scripts/pacman-key

Test initializing a new keyring, adding and removing keys, editing a keys trust level, verifying a file with a detached signature (many packages in the Arch repos are already signed) and report any issues you run into.