Basic Overview of Pacman Code

I am far from knowledgeable about most areas of the pacman code-base, so whenever I want to implement a new feature I first have to sit down, walk step-by-step through a transaction and figure out the code path. Given that I have now done this several times, I thought it a good idea to post it for future reference.

Here is my brief overview of what happens with a pacman -S <pkg> transaction:

pacman/pacman.c:
 -> int main(int argc, char *argv[])
      - parse command line
      - parse config file
 
pacman/sync.c:
 -> int pacman_sync(alpm_list_t *targets)
      - check we have a pacman database
      - check target list for SyncFirst packages
 
 -> static int sync_trans(alpm_list_t *targets)
      - initialise transaction
      - process the target list
 
libalpm/trans.c:
 -> int SYMEXPORT alpm_trans_prepare(alpm_list_t **data)
      - performs checks (via _alpm_trans_prepare)
         - requested packages have valid architecture
         - all deps are available
         - deps do not directly conflict (not files)
 
 -> int SYMEXPORT alpm_trans_commit(alpm_list_t **data)
 
libalpm/sync.c:
 -> int _alpm_sync_commit(pmtrans_t *trans, pmdb_t *db_local, alpm_list_t **data)
      - download needed files
      - deal with deltas (if any)
      - check package integrity
      - check file conflicts
      - remove packages (conflicts/replacements)
      - install targets

Doing a pacman -Syu takes you down exactly the same code path with the only difference being that the package databases are updated and the list of packages to be updated is calculated in the pacman_sync function.

While that is a fairly basic overview, it has more than enough detail for me to locate where I should implement checking for free disk space before proceeding with a package install (which has been a long-standing feature request for pacman).
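To make the idea of a free disk space check concrete, here is a minimal sketch in shell. It is only an illustration of the check itself, not pacman code (pacman is written in C), and the function name has_free_space is made up for this example.

```shell
#!/bin/sh
# Illustrative only: not pacman's implementation.
# has_free_space REQUIRED_KIB PATH
# Succeeds when the filesystem holding PATH has at least REQUIRED_KIB
# kibibytes available to the current user.
has_free_space() {
  required_kib=$1
  target=$2
  # "df -Pk" prints one header line and one data line per filesystem;
  # column 4 of the data line is the available space in 1024-byte blocks.
  available_kib=$(df -Pk "$target" | awk 'NR == 2 { print $4 }')
  [ "$available_kib" -ge "$required_kib" ]
}
```

A caller would sum the installed sizes of the targets and then run something like has_free_space 51200 / before going any further, aborting if it fails.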

I hope this also shows people that the pacman code is not that complex. There are quite a few old bug reports/feature requests in the bug tracker that are obviously very low on the developers' priority list and are good candidates for new contributors. Just step through the code until you find the relevant section and then get started!

Pacman 3.4.0 Released

As Dan has already posted about, pacman-3.4.0 has been released. There are a bunch of new features that I am really enjoying.

Firstly, when updating its databases, pacman will only extract the new entries. This is similar to what Xyne’s rebase script does (without all the extra output). I had not realised how awesome this feature was until I updated my chroots this morning. It speeds the process up immensely. The chroots using pacman-3.4 extracted the [extra] repo database with a barely noticeable pause, while those using pacman-3.3 took a while.

The other feature that I am enjoying is the addition of a functional ‘which’ to the file ownership query. In the past, to find the owner of a binary in my path I would do something like pacman -Qo $(which makepkg) or provide the full path manually. Now pacman will search for binaries in your path automatically, so this is achieved with pacman -Qo makepkg.
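As a rough picture of what that lookup amounts to, here is a shell approximation. The function name resolve_target is hypothetical, and this is not how pacman implements it internally; it only mirrors the behaviour described above.

```shell
#!/bin/sh
# Illustrative only: a shell approximation, not pacman's C code.
# resolve_target NAME
# Prints NAME unchanged when it already contains a slash (it is a path);
# otherwise prints what the shell finds for NAME via $PATH.
# Caveat: for shell builtins and functions, "command -v" prints the bare
# name rather than a path.
resolve_target() {
  case $1 in
    */*) printf '%s\n' "$1" ;;
    *)   command -v "$1" ;;
  esac
}
```

With pacman-3.3 the equivalent of the new behaviour would be pacman -Qo "$(resolve_target makepkg)".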

Installing packages with pacman -U has received a nice overhaul, allowing pacman to handle package replacements and install needed dependencies all in one transaction. No more removing a package with pacman -Rd and then installing its replacement.

And makepkg also received its share of upgrades. It now automatically exits on build/packaging errors in PKGBUILDs, so there is no longer any need to add “|| return 1” after each command. Package splitting has improved too: pkgver, pkgrel and arch can now be overridden, and it is possible to build only a subset of a split package.
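For a PKGBUILD author the change looks roughly like this. The configure and make commands are just a generic example build, not taken from any particular package.

```shell
# pacman 3.3 and earlier: every command needed an explicit check.
#
#   build() {
#     cd "$srcdir/$pkgname-$pkgver"
#     ./configure --prefix=/usr || return 1
#     make || return 1
#   }

# pacman 3.4: makepkg stops at the first failing command by itself.
build() {
  cd "$srcdir/$pkgname-$pkgver"
  ./configure --prefix=/usr
  make
}
```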

Of course, many other features made it into this pacman release. As always, many changes will hopefully never be noticed by a user (e.g. checking that a package's architecture matches the system architecture before installing, a major rewrite of the pacman bash completion, an overhaul of the tests in makepkg, more configurable library stripping during packaging), but all these are very useful contributions. See here for a more detailed summary of the changes and the git log for all the details.

A pacman-3.4.0 package is currently in the [testing] repository for Arch Linux. We all know pacman releases are bug free (as the two patches already in the 3.4.1 queue can attest), so look forward to it being in a [core] repo near you in the not too distant future.

New PKGBUILD syntax options with pacman-3.3

With the pacman 3.3 release expected in the coming weeks, I thought I would write about some of the new features that have been added to PKGBUILDs.

The most common change people will make in their PKGBUILDs is to add a package() function. This limits the use of fakeroot to the file installation steps (so it is not used during the build process). Using fakeroot only during the install stages is considered a “good thing”, but this also provides a workaround for some bugs in fakeroot that can cause issues while attempting to compile a package. A partial example:

build() {
  cd "$srcdir/$pkgname-$pkgver"
  ./configure --prefix=/usr
  make
}

package() {
  cd "$srcdir/$pkgname-$pkgver"
  # Install into the staging directory, not the live filesystem.
  make DESTDIR="$pkgdir" install
  install -Dm644 "$srcdir/license" "$pkgdir/usr/share/licenses/$pkgname/license"
}

Note the “cd” step is required in the package() function as makepkg currently does not remember what directory was being used between the build() and package() functions. The package() function is entirely optional, so all PKGBUILDs without one will continue to work as they always have.

The other main feature addition to PKGBUILDs is the ability to create split packages. In Arch Linux, this is useful for packages that are split due to providing separate packages for libraries and binaries (e.g. gcc) or where documentation is too large to justify distributing together with the main package (e.g. ruby). The Arch KDE-4.3 release will also use package splitting, as many lesser used components pulled in a large number of dependencies and made an unsplit KDE install very heavy.

Creating split packages is rather simple. All you have to do is assign an array of package names to the pkgname variable. e.g.

pkgname=('pkg1' 'pkg2')
pkgbase="pkg"

This tells makepkg that it is creating two packages called pkg1 and pkg2. The pkgbase variable is optional, but can be used to hold (e.g.) the upstream package name and is used by makepkg in its output. Each split package requires its own package() function with a name in the format package_foo(), e.g. for the above pkgname, the PKGBUILD would have functions package_pkg1() and package_pkg2(). In these functions, the use of the $pkgdir variable is mandatory as it is no longer equivalent to the deprecated $startdir/pkg. All options and directives for the split packages default to the global values given within the PKGBUILD. Most variables can be overridden within the package function, e.g.

...
depends=('glibc')
makedepends=('perl')
...
package_pkg1()
{
  depends=('perl')
}
 
package_pkg2()
{
...
}

The pkg1 package will depend on perl, while pkg2 does not override the depends array and so will depend on glibc. As a general rule, almost every package in a split package's depends array should probably also be present in the global makedepends array.
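The override rule can be tried out in a few lines of bash. This is a small simulation under my own assumptions, not makepkg's actual code: the helper effective_depends is hypothetical and simply runs each package function in a subshell so that its assignments do not leak into the next package.

```shell
#!/bin/bash
# Illustrative only: mimics the override rule, not makepkg's real code.
pkgname=('pkg1' 'pkg2')
depends=('glibc')        # global default
makedepends=('perl')

package_pkg1() {
  depends=('perl')       # overrides the global value for pkg1 only
}

package_pkg2() {
  :                      # no override: pkg2 keeps the global depends
}

# effective_depends NAME
# Runs package_NAME in a subshell, then prints the depends array that
# results. The subshell discards the assignments afterwards.
effective_depends() {
  (
    "package_$1"
    printf '%s\n' "${depends[*]}"
  )
}

for name in "${pkgname[@]}"; do
  echo "$name depends on: $(effective_depends "$name")"
done
```

Running it reports perl for pkg1 and glibc for pkg2, matching the description above.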

There are several other useful features added to makepkg, such as improved handling of info files (automatic removal of $pkgdir/usr/share/info/dir and compression of info files) and being able to specify LDFLAGS in makepkg.conf. Check out a detailed list of changes in the NEWS file in the pacman git repo.
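The info file handling replaces steps that would otherwise be done by hand in a PKGBUILD. A sketch of the manual equivalent follows; the function name tidy_info is hypothetical and makepkg's own implementation may well differ.

```shell
#!/bin/sh
# Illustrative only: a manual equivalent, not makepkg's actual code.
# tidy_info PKGDIR
# Removes the info directory index and gzips the remaining info files.
tidy_info() {
  info_dir="$1/usr/share/info"
  [ -d "$info_dir" ] || return 0
  # The "dir" index must not be shipped inside a package.
  rm -f "$info_dir/dir"
  # Compress every info file that is not already compressed.
  find "$info_dir" -type f ! -name '*.gz' -exec gzip -9 {} +
}
```

In a pre-3.3 PKGBUILD this would have been called as tidy_info "$pkgdir" at the end of build().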

For those that want to try out makepkg before the pacman 3.3 release, you can grab a copy of my makepkg-git package which installs alongside the current version of pacman.

Edit – clarified PKGBUILD directive overrides for split packages and added a note about makepkg-git