What You Will Learn
  • How to create an archive manually.

    • How to create a simple package to install a Go binary using the dpkg command.

    • What the format of the archive is and how to check its content.

    • How to create the same package using standard Unix tools.

    • Case study: How to create a package in Go.

  • What happens when you install a package using dpkg.

    • What the database contains.

    • How files are copied to the host.

    • What changed in the database.

    • How to check that the package has been installed.

    • Case study: How to install a package like dpkg in Go.

  • What happens when you install a package using apt.

    • How the command apt knows where to search for packages.

    • What is the format of a repository.

    • What the command apt update does.

    • How apt uses dpkg under the hood.

    • Case study: How to install a package like apt in Go.

A Linux package is a bundle of files that your package manager knows how to unpack on your system. You install packages regularly, so I suggest that we look under the hood to understand the steps between the creation and the installation of a Linux package.

Prerequisites

I assume you have already installed many Linux packages. A basic understanding of C and C++ is required, and familiarity with Go will help you follow the case studies.

This post is long, really long.

The repositories dpkg and apt contain more than 100,000 lines of code.

When trying to explain how code works, there is a tough balance to find between showing the code untouched and simplifying it at the risk of distorting it. In this post, I decided to use both approaches. I present the original code slightly annotated, removing only debug messages and the support of command flags not covered in this article. I also present a minimal, richly commented rewrite of these programs in Go. Overall, that represents a lot of code, but as developers, we are used to skimming over large codebases, and I hope you will find your way.

In addition, there are many asides explaining some Dpkg and Apt features. You can safely skip them if you are already familiar with the tools.

Please remember that if you find the post too long to read, just imagine how long it took to write 😁. Happy reading!

How to create a package manually

Linux packages are commonly available as .deb or .rpm files.

  • The .deb files are meant for distributions of Linux that derive from Debian (Ubuntu, Linux Mint, etc.).

  • The .rpm files are used primarily by distributions that derive from Red Hat (Fedora, CentOS, RHEL).

Why two formats?

Because there are two main families of Linux distributions, Red Hat and Debian, and each one has its own file format: .rpm for Red Hat Package Manager and .deb for Debian.

Both package formats have a lot in common and we will only discuss Debian packages in this document. The following table summarizes the main differences between the archive files.

                           .rpm                                     .deb

Archive Format             Uses the cpio command and file format.   Uses the ar command and file format.
Package Manager            rpm (1997, Written in C)                 dpkg (1993, Written in C)
Frontend Package Manager   yum (2011, Written in Python)            apt (1999, Written in C++)
Database                   /var/lib/rpm                             /var/lib/dpkg
Database Format            Berkeley DB files                        DEB822 flat files

A package is a collection of files to distribute applications or libraries via the Debian package management system. The aim of packaging is to allow the automation of installing, upgrading, configuring, and removing computer programs in a consistent manner.

What You Need to Know About the Debian Package Format

A .deb file is an ar archive. The ar command is an ancestor of the common tar command and was already present in the first Unix version in 1971! Now, this command is (mostly) only used by Debian packages. This archive contains 3 files:

  • debian-binary: A text file containing 2.0\n. This states the version of the deb file format. For 2.0, all other lines get ignored.

  • data.tar.gz: A tar archive containing all files that will be installed, with their destination paths. For example:

./
./sbin/
./sbin/parted
./usr/
./usr/share/
./usr/share/man/
./usr/share/man/man8/
./usr/share/man/man8/parted.8.gz
./usr/share/doc/
./usr/share/doc/parted/
./usr/share/doc/parted/README.Debian
./usr/share/doc/parted/copyright
./usr/share/doc/parted/changelog.Debian.gz
./usr/share/doc/parted/changelog.gz
  • control.tar.gz: A tar archive containing various files useful for the dpkg command to do its job: metadata about the package (control) including the list of required dependencies, the md5 sums of every data file to check integrity (md5sums), and also maintainer scripts (ex: postinst for post-installation, prerm for pre-removal, etc.), which are executables that must be run when installing or removing a package.

control
md5sums
postinst
prerm
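
The layout described above can be turned into a small sanity check. The following Go sketch is only an illustration: the function checkMembers is hypothetical (it is not part of dpkg), and it assumes you already have the list of member names, for example from the output of ar -t. It only verifies the first three members and ignores anything that follows.

```go
package main

import (
    "fmt"
    "strings"
)

// checkMembers verifies that the first three members of an ar archive
// follow the .deb layout described in this article:
// debian-binary, then control.tar[.ext], then data.tar[.ext].
// Hypothetical helper written for illustration only.
func checkMembers(names []string) error {
    if len(names) < 3 {
        return fmt.Errorf("expected at least 3 members, got %d", len(names))
    }
    if names[0] != "debian-binary" {
        return fmt.Errorf("first member must be debian-binary, got %q", names[0])
    }
    if !strings.HasPrefix(names[1], "control.tar") {
        return fmt.Errorf("second member must be control.tar*, got %q", names[1])
    }
    if !strings.HasPrefix(names[2], "data.tar") {
        return fmt.Errorf("third member must be data.tar*, got %q", names[2])
    }
    return nil
}

func main() {
    fmt.Println(checkMembers([]string{"debian-binary", "control.tar.gz", "data.tar.gz"}))
    fmt.Println(checkMembers([]string{"data.tar.gz", "control.tar.gz", "debian-binary"}))
}
```

The check is deliberately loose about the compression suffix, since the extension depends on the compressor used to build the package.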

Further documentation:

You can also learn more about Debian packages by installing a Debian package 😀 (the PDF is also available online):

$ apt install packaging-tutorial
# Check /usr/share/doc/packaging-tutorial/packaging-tutorial.pdf

What You Need to Know About the Command dpkg

The project Dpkg started in 1994, at the same time the Debian package format was created, and thus the command dpkg works only with .deb binary archives. You must provide the archive as the command does not know how to retrieve it by itself. The command manages a database stored under /var/lib/dpkg to keep note of everything that is installed on the server, which is essential to determine what to clean when you remove a package.

Note that the command dpkg --build redirects to the command dpkg-deb --build and the command dpkg --list redirects to the command dpkg-query --list. The code of these commands is present in the same repository in ./dpkg-deb/ and ./src/querycmd.c respectively.

To illustrate this post, we will use the Hello World example present in the Go by example tutorial.

$ cat > hello.go << HERE
package main
import "fmt"
func main() {
    fmt.Println("hello world")
}
HERE
$ go run hello.go
hello world
$ env GOOS=linux GOARCH=amd64 go build hello.go # Make sure to build for Linux
$ ls
hello    hello.go
$ chmod +x hello
$ ./hello
hello world

Our goal is to package this binary and the most popular solution to build a Debian package for a Go program is the utility dh-golang. As we want to use the most basic commands to get as close as possible to the process, we will use the standard dpkg command even if that means not building a world-class Debian package.

Prerequisites

To test the packages we are going to build and install, we will use a Debian VM in order to keep your system safe. We will use Vagrant to create this server. Make sure Vagrant is installed on your system by following the installation procedure for your operating system.

There is a companion GitHub repository julien-sobczak/linux-packages-under-the-hood to this blog post. This repository is optional for this article. It mostly contains a Vagrantfile to start the virtual machine, the files to create various Debian versions of the package hello, and also the Go code that reimplements minimal versions of the dpkg and apt commands. You will find more information in the README.md file of this repository.

Then:

$ mkdir sandbox
$ cd sandbox
$ vagrant init
$ cat > Vagrantfile <<EOF
# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  config.vm.box = "debian/buster64"
end
EOF
$ vagrant up
# wait a few minutes
$ vagrant ssh
vagrant$ uname -a
Linux buster 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux

When using Vagrant, the directory containing your Vagrantfile is accessible inside the virtual machine under the directory /vagrant. We will use it to copy our hello binary program:

$ ls
Vagrantfile
$ cp /path/to/hello .
$ vagrant ssh
vagrant$ cd /vagrant
vagrant$ ls
hello Vagrantfile

All commands whose prompt starts with vagrant (vagrant$ or vagrant#) must be run inside the virtual machine. Run all other commands from your host.

We are ready to create a Debian package for our Hello program.

vagrant# cd /vagrant/
vagrant# mkdir -p ./debian/usr/bin (1)
vagrant# cp hello ./debian/usr/bin/
vagrant# mkdir -p ./debian/DEBIAN (1)
vagrant# cat > ./debian/DEBIAN/control <<EOF
Package: hello
Version: 1.1-1
Section: base
Priority: optional
Architecture: amd64
Maintainer: Julien Sobczak
Description: Say Hello
EOF
vagrant# cat > ./debian/DEBIAN/preinst <<EOF (2)
#!/bin/sh
echo "preinst says hello";
EOF
vagrant# cat > ./debian/DEBIAN/postinst <<EOF (2)
#!/bin/sh
echo "postinst says hello";
EOF
vagrant# chmod 755 ./debian/DEBIAN/preinst ./debian/DEBIAN/postinst
vagrant# tree /vagrant/debian/
|-- DEBIAN
|   |-- control
|   |-- preinst
|   `-- postinst
`-- usr
    `-- bin
        `-- hello
1 The first version of our package hello contains only the binary hello built previously and a DEB822 file control with the package metadata.
2 We also add basic maintainer scripts that display a message in the console so that we will know when the installation process runs them.

What You Need to Know About the DEB822 Format

This format can be seen as an ancestor of YAML or JSON. Here is an example showing the three supported types of fields:

FieldSimple: simple value
FieldFolded: very long value
 continuing on the next line starting with a space.
FieldMultiline:
 /usr/bin/cmd1
 /usr/bin/cmd2

The format is used by the file control but also by some files in the dpkg database such as /var/lib/dpkg/status. This format is also used by the command apt, which will be covered later.
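
To make the three kinds of fields more concrete, here is a minimal Go sketch of a parser for a single DEB822 paragraph. It is an illustration under simplifying assumptions, not the parser used by dpkg or apt: the type Paragraph and the function parseParagraph are hypothetical names, comment lines are not handled, and folded and multiline values are treated the same way (continuation lines are joined with a newline).

```go
package main

import (
    "bufio"
    "fmt"
    "strings"
)

// Paragraph maps field names to their values.
type Paragraph map[string]string

// parseParagraph reads one DEB822 paragraph.
// A line starting with a space or a tab continues the previous field.
// A blank line ends the paragraph.
func parseParagraph(text string) (Paragraph, error) {
    p := Paragraph{}
    last := "" // name of the field currently being read
    scanner := bufio.NewScanner(strings.NewReader(text))
    for scanner.Scan() {
        line := scanner.Text()
        if strings.TrimSpace(line) == "" {
            break
        }
        if line[0] == ' ' || line[0] == '\t' {
            if last == "" {
                return nil, fmt.Errorf("continuation line without a field: %q", line)
            }
            cont := strings.TrimSpace(line)
            if p[last] == "" {
                p[last] = cont
            } else {
                p[last] += "\n" + cont
            }
            continue
        }
        name, value, found := strings.Cut(line, ":")
        if !found {
            return nil, fmt.Errorf("invalid line: %q", line)
        }
        last = name
        p[name] = strings.TrimSpace(value)
    }
    return p, scanner.Err()
}

func main() {
    control := "Package: hello\nVersion: 1.1-1\nDescription: Say Hello\n to the world\n"
    p, err := parseParagraph(control)
    if err != nil {
        panic(err)
    }
    fmt.Println(p["Package"], p["Version"])
    fmt.Printf("%q\n", p["Description"])
}
```

A file such as /var/lib/dpkg/status contains many paragraphs separated by blank lines, so a real reader would call a function like this one in a loop.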

Further documentation: Check the deb822(5) man page for additional information.

dpkg --build

We will use the command dpkg --build to build our package:

$ apt install fakeroot # install the fakeroot command
$ fakeroot dpkg --build debian hello_1.1-1_amd64.deb (1)
1 This command builds a Debian package, which, as outlined before, consists of building an ar archive containing two tar archives: the content of our directory DEBIAN/ in control.tar.gz and the other files in data.tar.gz. We use the fakeroot command to make sure files inside the archive are owned by the user root.

We can also reproduce what this command does using standard Unix commands:

$ apt install binutils # install the ar command
$ apt install fakeroot # install the fakeroot command
$ echo 2.0 > debian-binary
$ cd debian && fakeroot tar czf ../data.tar.gz [a-z]* && cd ..
$ cd debian/DEBIAN/ && fakeroot tar czf ../../control.tar.gz * && cd ../..
$ fakeroot ar r hello_1.1-1_amd64.deb debian-binary control.tar.gz data.tar.gz
ar: creating hello_1.1-1_amd64.deb (1)
1 The package will fail most linter checks. Indeed, we ignored many of the best practices that higher-level commands enforce, but we will still be able to install this package on our server.

Now is the time to look at the code. Dpkg is written in C, and the function executed by the command dpkg --build is the function do_build in ./dpkg-deb/build.c.

dpkg-deb/build.c
int
do_build(const char *const *argv)
{
  struct compress_params control_compress_params;
  struct tar_pack_options tar_options;
  struct dpkg_error err;
  struct dpkg_ar *ar;
  const char *dir, *dest;
  char *ctrldir;
  char *debar;
  char *tfbuf;
  int gzfd;

  /* Decode our arguments. */
  dir = *argv++;
  dest = *argv++;

  debar = gen_dest_pathname(dir, dest); (1)
  ctrldir = str_fmt("%s/%s", dir, "DEBIAN");

  /* Now that we have verified everything it is time to actually
   * build something. Let's start by making the ar-wrapper. */
  ar = dpkg_ar_create(debar, 0644); (2)

  /* Create a temporary file to store the control data in. */
  tfbuf = path_make_temp_template("dpkg-deb");
  gzfd = mkstemp(tfbuf);
  free(tfbuf);

  /* Select the compressor to use for our control archive. */
  control_compress_params.type = COMPRESSOR_TYPE_GZIP;
  control_compress_params.strategy = COMPRESSOR_STRATEGY_NONE;
  control_compress_params.level = -1;

  /* Fork a tar to package the control-section of the package. */
  tar_options.mode = "u+rw,go=rX";
  tar_options.root_owner_group = true;
  tarball_pack(ctrldir, control_treewalk_feed, &tar_options,
               &control_compress_params, gzfd);

  free(ctrldir);

  /* We have our first file for the ar-archive. Write a header for it
   * to the package and insert it. */
  const char deb_magic[] = "2.0\n";
  char adminmember[16 + 1];

  sprintf(adminmember, "%s%s", "control.tar",
          compressor_get_extension(control_compress_params.type));

  dpkg_ar_put_magic(ar); (3)
  dpkg_ar_member_put_mem(ar, "debian-binary", deb_magic, strlen(deb_magic)); (4)
  dpkg_ar_member_put_file(ar, adminmember, gzfd, -1); (5)

  close(gzfd);

  /* Control is done, now we need to archive the data. */

  /* Start by creating a new temporary file. */
  tfbuf = path_make_temp_template("dpkg-deb");
  gzfd = mkstemp(tfbuf);
  free(tfbuf);

  /* Pack the directory into a tarball, feeding files from the callback. */
  tar_options.mode = NULL;
  tar_options.root_owner_group = opt_root_owner_group;
  tarball_pack(dir, file_treewalk_feed, &tar_options, &compress_params, gzfd);

  /* Okay, we have data.tar as well now, add it to the ar wrapper. */
  char datamember[16 + 1];

  sprintf(datamember, "%s%s", "data.tar",
          compressor_get_extension(compress_params.type));

  dpkg_ar_member_put_file(ar, datamember, gzfd, -1); (6)

  close(gzfd);

  if (fsync(ar->fd))
    ohshite(_("unable to sync file '%s'"), ar->name);

  dpkg_ar_close(ar); (7)

  free(debar);

  return 0;
}
1 The variable dir is the local directory containing the package files to build. The variable dest is the optional filename for the final package file and debar is the final name as determined by the function gen_dest_pathname, which determines a default name if the argument is missing.
2 The function dpkg_ar_create creates the archive file named after the variable debar.
3 The function dpkg_ar_put_magic writes the magic string !<arch>\n, which identifies the file as an ar archive.
4 The function dpkg_ar_member_put_mem appends the file debian-binary with the content of the variable deb_magic.
5 The function dpkg_ar_member_put_file appends the file control.tar.gz (the name stored in adminmember) with the content of a temporary file.
6 Same as above for data.tar, whose extension depends on the selected compressor.
7 The function dpkg_ar_close is part of the housekeeping logic and simply closes the file descriptor.
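
To see what these functions put on disk, here is a minimal Go sketch that writes the ar layout by hand: the global magic string, then for each member a 60-byte ASCII header followed by the content. The names arMagic and arMember are hypothetical, and the sketch assumes member names of at most 16 characters; it is not a transcription of the dpkg code.

```go
package main

import (
    "bytes"
    "fmt"
)

// arMagic is the global header found at the start of every ar archive.
const arMagic = "!<arch>\n"

// arMember returns the bytes of one archive member: a 60-byte header
// followed by the content, padded with a newline to an even length.
// Header fields are ASCII, left-aligned and padded with spaces:
// name (16), mtime (12), uid (6), gid (6), octal mode (8), size (10),
// followed by the two-byte terminator "`\n".
func arMember(name string, mtime int64, mode int, body []byte) []byte {
    var buf bytes.Buffer
    fmt.Fprintf(&buf, "%-16s%-12d%-6d%-6d%-8o%-10d`\n",
        name, mtime, 0, 0, mode, len(body))
    buf.Write(body)
    if len(body)%2 == 1 {
        buf.WriteByte('\n') // members start on even offsets
    }
    return buf.Bytes()
}

func main() {
    var deb bytes.Buffer
    deb.WriteString(arMagic)
    deb.Write(arMember("debian-binary", 0, 0100644, []byte("2.0\n")))
    fmt.Printf("%d bytes\n%q\n", deb.Len(), deb.String())
}
```

Because every header field is plain text, you can open the first bytes of a .deb file in a text editor and read the member names directly.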

Case Study

What follows is a minimal rewrite of this code in Go. The full code is available on GitHub in the repository julien-sobczak/linux-packages-under-the-hood.

cmd/dpkg/main.go
package main

import (
    "archive/tar"
    "bytes"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
    "strings"

    "github.com/blakesmith/ar"
)

func main() {
    // This program expects two arguments:
    // - The directory containing the resources to package in the archive.
    // - The name of the output .deb file
    if len(os.Args) < 3 {
        log.Fatalf("Missing 'directory' and/or 'dest' arguments.")
    }

    directory := os.Args[1]
    dest := os.Args[2]

    // Create the Debian archive file
    fdeb, err := os.Create(dest)
    if err != nil {
        log.Fatalf("Unable to create %s: %v", dest, err)
    }
    defer fdeb.Close()

    // A Debian package is an archive using the AR format.
    // We use an external Go module to create the archive
    // as the standard library does not support it but supports
    // the tar format that will be used for the control and data files.

    writer := ar.NewWriter(fdeb)
    writer.WriteGlobalHeader()

    // A Debian package contains 3 files that must be
    // added in a precise order.
    // We use two utility functions that will be defined later:
    // - arPutFile is a wrapper around the library to add an entry.
    // - tarballPack creates a tarball using the Go library.

    // Append debian-binary
    arPutFile(writer, "debian-binary", []byte("2.0\n"))

    // Append control.tar
    controlDir := filepath.Join(directory, "DEBIAN")
    controlTarball := tarballPack(controlDir, nil)
    arPutFile(writer, "control.tar", controlTarball)

    // Append data.tar
    dataDir := directory
    dataTarball := tarballPack(dataDir, func(path string) bool {
        // Filter DEBIAN/ files
        return strings.HasPrefix(path, controlDir)
    })
    arPutFile(writer, "data.tar", dataTarball)
}

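// arPutFile adds a new entry to an ar archive.
func arPutFile(w *ar.Writer, name string, body []byte) {
    hdr := &ar.Header{
        Name: name,
        Mode: 0600,
        Uid:  0,
        Gid:  0,
        Size: int64(len(body)),
    }
    if err := w.WriteHeader(hdr); err != nil {
        log.Fatalf("Unable to write the header of %s: %v", name, err)
    }
    if _, err := w.Write(body); err != nil {
        log.Fatalf("Unable to write the content of %s: %v", name, err)
    }
}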

// tarballPack traverses a local directory to add all files under it
// into a tarball.
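func tarballPack(directory string, filter func(string) bool) []byte {
    var bufdata bytes.Buffer
    twdata := tar.NewWriter(&bufdata)
    err := filepath.Walk(
        directory,
        func(path string, info os.FileInfo, errParent error) error {
            if errParent != nil {
                // info may be nil when the path cannot be read
                return errParent
            }
            if info.IsDir() {
                return nil
            }
            // The filter returns true for files to exclude
            if filter != nil && filter(path) {
                return nil
            }
            sep := fmt.Sprintf("%c", filepath.Separator)
            name := strings.TrimPrefix(strings.TrimPrefix(path, directory), sep)
            hdr := &tar.Header{
                Name: name,
                Uid:  0, // root
                Gid:  0, // root
                Mode: 0650,
                Size: info.Size(),
            }
            if err := twdata.WriteHeader(hdr); err != nil {
                return err
            }
            content, err := ioutil.ReadFile(path)
            if err != nil {
                return err
            }
            _, err = twdata.Write(content)
            return err
        })
    if err != nil {
        log.Fatalf("Unable to pack %s: %v", directory, err)
    }
    if err := twdata.Close(); err != nil {
        log.Fatalf("Unable to close the tarball: %v", err)
    }

    return bufdata.Bytes()
}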

To run the code:

$ go run main.go hello hello.deb

To inspect the resulting archive hello.deb, we can use the command dpkg -c to view the data files or use the command ar to view the real content of the archive:

vagrant# dpkg -c /vagrant/hello.deb
-rw-r-x--- 0/0         2034781 1970-01-01 00:00 usr/bin/hello

vagrant# ar -tf /vagrant/hello.deb
debian-binary
control.tar
data.tar
vagrant# ar -xf /vagrant/hello.deb data.tar
vagrant# tar -tf data.tar
usr/bin/hello

🎉 We have finished with the format .deb. This completes the first part of this article. We created a Debian package from scratch! Now, we will inspect the installation process.

What happens when you install a package using dpkg

The command to install a Debian binary package file is dpkg -i myarchive.deb and will be the subject of this second part.

dpkg -i

Let’s run the command on our Debian archive:

vagrant# dpkg -i /vagrant/hello.deb
Selecting previously unselected package hello.
(Reading database ... 32264 files and directories currently installed.)
Preparing to unpack /vagrant/hello.deb ...
preinst says hello
Unpacking hello (1.1-1) ...
Setting up hello (1.1-1) ...
postinst says hello

vagrant# hello
hello world

The command does a lot of interesting things and the code is larger than the previous build command. The man page details the installation steps and we will present the main code for every one of them.

The entry point for the installation of a package is the function archivefiles, and more specifically the function process_archive:

src/archives.c
int
archivefiles(const char *const *argv)
{
  int i;

  modstatdb_open(msdbrw_readonly);

  for (i = 0; argv[i]; i++) {
    process_archive(argv[i]); (1)
  }

  process_queue(); (2)

  trigproc_run_deferred();
  modstatdb_shutdown();

  return 0;
}
1 The main function iterates over all packages to install and delegates to the function process_archive for the unpacking.
2 The function process_queue configures all packages that have been unpacked in the previous step. We will explain the differences between these two steps.
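
The control flow of archivefiles can be summarized with a short Go sketch. It is only a model of the two phases described above: the function installAll is a hypothetical name, and the real unpack and configure logic is replaced by log entries.

```go
package main

import "fmt"

// installAll models the two phases of archivefiles:
// every archive is unpacked first, then every queued package is configured.
// The returned slice lists the steps in the order they would run.
func installAll(archives []string) []string {
    var steps []string
    var queue []string // packages waiting for configuration

    for _, archive := range archives {
        steps = append(steps, "unpack "+archive)
        queue = append(queue, archive)
    }
    for _, pkg := range queue {
        steps = append(steps, "configure "+pkg)
    }
    return steps
}

func main() {
    for _, step := range installAll([]string{"a.deb", "b.deb"}) {
        fmt.Println(step)
    }
}
```

The point of the sketch is the ordering: when several archives are given on the command line, no package is configured before all of them have been unpacked.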

Let’s go!

  1. Extract the control files of the new package.

src/unpack.c
void process_archive(const char *filename) {
  
  cidir = get_control_dir(cidir); (1)
  pid = subproc_fork();
  if (pid == 0) {
    cidirrest[-1] = '\0';
    execlp("dpkg-deb", "dpkg-deb", "--control", filename, cidir, NULL); (2)
    ohshite(_("unable to execute %s (%s)"),
            _("package control information extraction"), BACKEND);
  }
  subproc_reap(pid, "dpkg-deb --control", 0);
  
}
1 Create a temporary directory (commonly /var/lib/dpkg/tmp.ci/).
2 Run the command dpkg-deb --control to extract the DEBIAN/ directory into it.

Then, the code parses the control file to initialize the struct pkginfo, which is the main structure to represent a package. (You can check the const fieldinfos in parse.c to find the mapping between the file and the struct.) Here is a minimal version of this structure with the most important fields annotated:

lib/dpkg/dpkg-db.h
/**
 * Node describing an architecture package instance.
 *
 * This structure holds state information.
 */
struct pkginfo {
  struct pkgset *set;

  enum pkgwant want; (1)
  /** The error flag bitmask. */
  enum pkgeflag eflag; (2)
  enum pkgstatus status;
  enum pkgpriority priority;

  struct pkgbin installed; (3)
  struct pkgbin available; (3)

  struct fsys_namenode_list *files; (4)
  bool files_list_valid; (4)

  /* The status has changed, it needs to be logged. */
  bool status_dirty; (5)
};
1 The enum want determines the expected action for this package, like PKG_WANT_INSTALL for installation, or PKG_WANT_PURGE for the removal of the package and its configuration files.
2 The eflag is initialized if the parser finds an error in the control file (ex: missing field), and also later during the installation process.
3 The installed and available fields contain most of the information present in the control files concerning a possible installed version of the package and the new version to install.
4 Some fields like files are initialized later by other functions like db-fsys-files.c#ensure_packagefiles_available, which reads the file /var/lib/dpkg/info/hello.list to populate this field.
5 The status_dirty flag is set when the current status of the package changes, for example from PKG_STAT_UNPACKED to PKG_STAT_INSTALLED.
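
As a rough Go counterpart, the sketch below keeps only the name, the status and the dirty flag to show how a status change can be tracked. The type Package and the method SetStatus are hypothetical names chosen for this illustration, and only a few of the dpkg statuses are listed.

```go
package main

import "fmt"

// Status is a simplified stand-in for enum pkgstatus.
type Status string

const (
    NotInstalled  Status = "not-installed"
    HalfInstalled Status = "half-installed"
    Unpacked      Status = "unpacked"
    Installed     Status = "installed"
)

// Package keeps only the fields needed to illustrate status tracking.
type Package struct {
    Name        string
    Status      Status
    StatusDirty bool // true when the status changed and must be persisted
}

// SetStatus updates the status and remembers that it changed.
// Setting the same status again leaves the dirty flag untouched.
func (p *Package) SetStatus(s Status) {
    if p.Status == s {
        return
    }
    p.Status = s
    p.StatusDirty = true
}

func main() {
    pkg := &Package{Name: "hello", Status: NotInstalled}
    for _, s := range []Status{HalfInstalled, Unpacked, Installed} {
        pkg.SetStatus(s)
        fmt.Println(pkg.Name, pkg.Status, pkg.StatusDirty)
    }
}
```

In this model, the code that persists the database would reset StatusDirty after writing the new status to disk.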

And now, the function responsible for creating this struct:

src/unpack.c
void process_archive(const char *filename) {
  struct pkginfo *pkg;
  
  parsedb(cidir, parsedb_flags, &pkg); (1)
  
}
1 The function parsedb simply reads a file in DEB822 format, the format we used to write the control file.
  2. If another version of the same package was installed before the new installation, execute the prerm script of the old package.

unpack.c#process_archive
void process_archive(const char *filename) {
  
  oldversionstatus = pkg->status; (1)

  if (oldversionstatus == PKG_STAT_INSTALLED) {
    pkg_set_eflags(pkg, PKG_EFLAG_REINSTREQ);
    pkg_set_status(pkg, PKG_STAT_HALFCONFIGURED); (2)
    modstatdb_note(pkg); (2)
    if (dpkg_version_compare(&pkg->available.version,
                             &pkg->installed.version) >= 0)
      /* Upgrade or reinstall. */
      maintscript_fallback(pkg, PRERMFILE, "pre-removal", cidir, cidirrest,
                           "upgrade", "failed-upgrade"); (3)
    else /* Downgrade => no fallback */
      maintscript_installed(pkg, PRERMFILE, "pre-removal",
                            "upgrade",
                            versiondescribe(&pkg->available.version,
                                            vdew_nonambig),
                            NULL); (3)
    pkg_set_status(pkg, PKG_STAT_UNPACKED); (2)
    oldversionstatus = PKG_STAT_UNPACKED;
    modstatdb_note(pkg); (2)
  }
  
}
1 The status read during parsing is reused to determine if the package is already installed.
2 Update the package status to keep track of the fact that the package has been partially installed. The status will be changed several times during the installation. The function modstatdb_note persists the new state to disk.
3 maintscript_fallback and maintscript_installed delegate to maintscript_exec, defined in the same file src/script.c. This function runs the script in a forked process and aborts if the return code is non-zero. Differences between the various calls are explained in the next step.
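
The same idea (run the script in a child process and fail on a non-zero exit code) can be sketched in Go with os/exec. This is an illustration, not the dpkg logic: the functions runScript and demo are hypothetical, the script is handed to /bin/sh on the assumption that it is a shell script, and the version argument 1.1-1 is an arbitrary example.

```go
package main

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
)

// runScript runs a maintainer script with the given arguments.
// It returns an error if the script cannot be started or exits
// with a non-zero status.
// Assumption: the script is a shell script, so it is passed to /bin/sh.
func runScript(script string, args ...string) error {
    cmd := exec.Command("/bin/sh", append([]string{script}, args...)...)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    if err := cmd.Run(); err != nil {
        return fmt.Errorf("%s %v: %w", filepath.Base(script), args, err)
    }
    return nil
}

// demo writes a throwaway prerm script that succeeds only when called
// with "upgrade", then runs it with two different arguments.
func demo() (upgradeErr, removeErr error) {
    dir, err := os.MkdirTemp("", "maintscript")
    if err != nil {
        panic(err)
    }
    defer os.RemoveAll(dir)

    script := filepath.Join(dir, "prerm")
    body := "#!/bin/sh\necho \"prerm called with: $*\"\n[ \"$1\" = upgrade ]\n"
    if err := os.WriteFile(script, []byte(body), 0755); err != nil {
        panic(err)
    }
    return runScript(script, "upgrade", "1.1-1"), runScript(script, "remove")
}

func main() {
    upgradeErr, removeErr := demo()
    fmt.Println("upgrade:", upgradeErr)
    fmt.Println("remove:", removeErr)
}
```

A caller would typically stop the installation as soon as runScript returns an error, which mirrors the abort described above.
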
  3. Run the preinst script, if provided by the package.

unpack.c#process_archive
void process_archive(const char *filename) {
  
  if (pkg->status == PKG_STAT_NOTINSTALLED) {
    pkg->installed.version = pkg->available.version;
    pkg->installed.multiarch = pkg->available.multiarch;
  }
  pkg_set_status(pkg, PKG_STAT_HALFINSTALLED);
  modstatdb_note(pkg);
  if (oldversionstatus == PKG_STAT_NOTINSTALLED) { (1)
    maintscript_new(pkg, PREINSTFILE, "pre-installation", cidir, cidirrest,
                    "install", NULL);
  } else if (oldversionstatus == PKG_STAT_CONFIGFILES) { (1)
    maintscript_new(pkg, PREINSTFILE, "pre-installation", cidir, cidirrest,
                    "install",
                    versiondescribe(&pkg->installed.version, vdew_nonambig),
                    versiondescribe(&pkg->available.version, vdew_nonambig),
                    NULL);
  } else { (1)
    maintscript_new(pkg, PREINSTFILE, "pre-installation", cidir, cidirrest,
                    "upgrade",
                    versiondescribe(&pkg->installed.version, vdew_nonambig),
                    versiondescribe(&pkg->available.version, vdew_nonambig),
                    NULL);
  }
  
}
1 The function maintscript_new is a variadic function whose trailing arguments are passed to the maintainer script to provide context. For example, the preinst maintainer script can be called using one of these formats: preinst install, preinst install <old-version> <new-version>, or preinst upgrade <old-version> <new-version>. This allows the package developer to take different actions based on the current state of the package.
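
The three branches of the C code above can be condensed into a small Go function. This is a sketch with hypothetical names (Status, preinstArgs); it only models the three cases shown in the snippet and does not run any script.

```go
package main

import "fmt"

// Status is a simplified stand-in for the previous status of the package.
type Status int

const (
    NotInstalled Status = iota // the package was never installed
    ConfigFiles                // only configuration files remain
    Installed                  // any other state: an older version is present
)

// preinstArgs returns the arguments passed to the preinst script,
// following the three branches of process_archive shown above.
func preinstArgs(old Status, oldVersion, newVersion string) []string {
    switch old {
    case NotInstalled:
        return []string{"install"}
    case ConfigFiles:
        return []string{"install", oldVersion, newVersion}
    default:
        return []string{"upgrade", oldVersion, newVersion}
    }
}

func main() {
    fmt.Println(preinstArgs(NotInstalled, "", "1.1-1"))
    fmt.Println(preinstArgs(ConfigFiles, "1.1-1", "2.1-1"))
    fmt.Println(preinstArgs(Installed, "1.1-1", "2.1-1"))
}
```

A preinst script can then branch on its first argument to distinguish a fresh installation from an upgrade.
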
  4. Unpack the new files, and at the same time back up the old files, so that if something goes wrong, they can be restored.

This step is similar to running the command dpkg --unpack. The unpacking process is simple to understand: extract every file present in the data.tar to its destination path. But things are not that simple, as outlined by this comment:

unpack.c#process_archive
  /*
   * Now we unpack the archive, backing things up as we go.
   * For each file, we check to see if it already exists.
   * There are several possibilities:
   *
   * + We are trying to install a non-directory ...
   *  - It doesn't exist. In this case we simply extract it.
   *  - It is a plain file, device, symlink, &c. We do an ‘atomic
   *    overwrite’ using link() and rename(), but leave a backup copy.
   *    Later, when we delete the backup, we remove it from any other
   *    packages' lists.
   *  - It is a directory. In this case it depends on whether we're
   *    trying to install a symlink or something else.
   *   = If we're not trying to install a symlink we move the directory
   *     aside and extract the node. Later, when we recursively remove
   *     the backed-up directory, we remove it from any other packages'
   *     lists.
   *   = If we are trying to install a symlink we do nothing - ie,
   *     dpkg will never replace a directory tree with a symlink. This
   *     is to avoid embarrassing effects such as replacing a directory
   *     tree with a link to a link to the original directory tree.
   * + We are trying to install a directory ...
   *  - It doesn't exist. We create it with the appropriate modes.
   *  - It exists as a directory or a symlink to one. We do nothing.
   *  - It is a plain file or a symlink (other than to a directory).
   *    We move it aside and create the directory. Later, when we
   *    delete the backup, we remove it from any other packages' lists.
   *
   *                   Install non-dir   Install symlink   Install dir
   *  Exists not               X               X                X
   *  File/node/symlink       LXR             LXR              BXR
   *  Directory               BXR              -                -
   *
   *    X: extract file/node/link/directory
   *   LX: atomic overwrite leaving backup
   *    B: ordinary backup
   *    R: later remove from other packages' lists
   *    -: do nothing
   *
   * After we've done this we go through the remaining things in the
   * lists of packages we're trying to remove (including the old
   * version of the current package). This happens in reverse order,
   * so that we process files before the directories (or symlinks-to-
   * directories) containing them.
   *
   * + If the thing is a conffile then we leave it alone for the purge
   *   operation.
   * + Otherwise, there are several possibilities too:
   *  - The listed thing does not exist. We ignore it.
   *  - The listed thing is a directory or a symlink to a directory.
   *    We delete it only if it isn't listed in any other package.
   *  - The listed thing is not a directory, but was part of the package
   *    that was upgraded, we check to make sure the files aren't the
   *    same ones from the old package by checking dev/inode
   *  - The listed thing is not a directory or a symlink to one (ie,
   *    it's a plain file, device, pipe, &c, or a symlink to one, or a
   *    dangling symlink). We delete it.
   *
   * The removed packages' list becomes empty (of course, the new
   * version of the package we're installing will have a new list,
   * which replaces the old version's list).
   *
   * If at any stage we remove a file from a package's list, and the
   * package isn't one we're already processing, and the package's
   * list becomes empty as a result, we ‘vanish’ the package. This
   * means that we run its postrm with the ‘disappear’ argument, and
   * put the package in the ‘not-installed’ state. If it had any
   * conffiles, their hashes and ownership will have been transferred
   * already, so we just ignore those and forget about them from the
   * point of view of the disappearing package.
   *
   * NOTE THAT THE OLD POSTRM IS RUN AFTER THE NEW PREINST, since the
   * files get replaced ‘as we go’.
   */

What You Need to Know About Conffiles

We still haven’t talked about conffiles. When upgrading a package, you want the package manager to overwrite the previous version of the files, except for configuration files. You don’t want to lose your customizations, do you?

A Debian archive can therefore include a file conffiles in the DEBIAN/ directory to list a subset of files present in the data.tar archive. These "conffiles" are files that must be managed specially to take care of preserving user changes.

Conffiles explain the difference between the commands dpkg --remove and dpkg --purge. (The first command leaves conffiles in place while the second removes them completely.)

Version 2.1-1 of our package hello ships a different implementation written in Python, which reads a configuration file /etc/hello/settings.conf, also present in the package. This conffile is referenced in DEBIAN/conffiles.

If we try to create this configuration file manually before installing this new version:

vagrant# mkdir /etc/hello
vagrant# echo "Language: English" > /etc/hello/settings.conf

vagrant# dpkg -i /vagrant/hello/hello_2.1-1_amd64.deb
Selecting previously unselected package hello.
(Reading database ... 25063 files and directories currently installed.)
Preparing to unpack .../hello/hello_2.1-1_amd64.deb ...
preinst says hello
Unpacking hello (2.1-1) ...
Setting up hello (2.1-1) ...

Configuration file '/etc/hello/settings.conf'
 ==> File on system created by you or by a script.
 ==> File also in package provided by package maintainer.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainers version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
*** settings.conf (Y/I/N/O/D/Z) [default=N] ? Y
Installing new version of config file /etc/hello/settings.conf ...
postinst says hello

vagrant# cat /etc/hello/settings.conf
Language: French

The package manager detects the conflict by keeping a checksum of the last installed version of every conffile (recorded in the Conffiles field of the package entry in /var/lib/dpkg/status) and asks the user what to do about it. Options exist to avoid the prompt (for example --force-confold and --force-confnew) and the default is, of course, to preserve existing conffiles.
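
One way to reason about this detection is a three-way comparison between checksums. The Go sketch below is a simplified model, not the dpkg implementation: the names Action, hash and decide are hypothetical, and the rules are reduced to the cases discussed in this aside.

```go
package main

import (
    "crypto/md5"
    "encoding/hex"
    "fmt"
)

// Action is the outcome of the conffile comparison.
type Action string

const (
    KeepCurrent Action = "keep the current file"
    InstallNew  Action = "install the new version"
    AskUser     Action = "ask the user"
)

// hash returns the MD5 checksum of a file content as a hex string.
func hash(content []byte) string {
    sum := md5.Sum(content)
    return hex.EncodeToString(sum[:])
}

// decide compares three checksums:
// - recorded: the conffile as shipped by the previously installed version
//   (empty if the package was never installed),
// - onDisk: the file currently present on the system,
// - shipped: the conffile in the package being installed.
func decide(recorded, onDisk, shipped string) Action {
    switch {
    case onDisk == shipped:
        return KeepCurrent // both versions are already identical
    case recorded == shipped:
        return KeepCurrent // the maintainer did not change the file
    case recorded == onDisk:
        return InstallNew // the user did not change the file
    default:
        return AskUser // both sides changed: conflict
    }
}

func main() {
    onDisk := hash([]byte("Language: English\n"))
    shipped := hash([]byte("Language: French\n"))
    // The package was never installed, so there is no recorded checksum.
    fmt.Println(decide("", onDisk, shipped))
}
```

With the files of the example above (a manually created file and no recorded checksum), this model ends in the conflict case, which matches the prompt displayed by dpkg.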

The unpacking runs the command dpkg-deb --fsys-tarfile to extract the content of data.tar. The command sends each file to a pipe created in the same function process_archive and delegates to the function tarobject defined in archives.c, which implements all the rules presented in the previous comment. The code is rather straightforward but too long to reproduce in this article.

We can mention that the backup process consists of extracting files with a special extension like .dpkg-tmp, .dpkg-old and .dpkg-new. Files are renamed to their definitive name if no problem occurs, except for conffiles, which must wait until the last installation step to be renamed.
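The extract-then-rename pattern can be illustrated with a short Go sketch. It only covers the .dpkg-new part (write under a temporary name, then rename), and leaves out the backups and the rollback that dpkg performs. The functions installFile and demo are hypothetical names used for this example only.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// installFile writes content to dest+".dpkg-new" first, then renames it
// to dest. A failure before the rename leaves the previous dest untouched.
func installFile(dest string, content []byte, mode os.FileMode) error {
	newPath := dest + ".dpkg-new"
	if err := os.WriteFile(newPath, content, mode); err != nil {
		return err
	}
	// On the same filesystem, the rename replaces dest in a single step.
	return os.Rename(newPath, dest)
}

// demo installs two versions of a file in a temporary directory and
// reports the final content and whether a .dpkg-new file was left behind.
func demo() (string, bool, error) {
	dir, err := os.MkdirTemp("", "unpack-sketch")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)

	dest := filepath.Join(dir, "hello")
	for _, version := range []string{"v1\n", "v2\n"} {
		if err := installFile(dest, []byte(version), 0o755); err != nil {
			return "", false, err
		}
	}
	data, err := os.ReadFile(dest)
	if err != nil {
		return "", false, err
	}
	_, statErr := os.Stat(dest + ".dpkg-new")
	return string(data), statErr == nil, nil
}

func main() {
	content, leftover, err := demo()
	fmt.Printf("content=%q leftover=%v err=%v\n", content, leftover, err)
}
```

Postponing the rename is what allows conffiles to stay under their .dpkg-new name until the configuration step.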

  1. If another version of the same package was installed before the new installation, execute the postrm script of the old package. Note that this script is executed after the preinst script of the new package, because new files are written at the same time old files are removed.

The execution code of the maintainer script postrm is similar to the previous scripts.

What is more interesting is what happens at the end of the unpacking step. Indeed, the Dpkg database is updated to reflect the changes.

What You Need to Know About the Dpkg Database

Dpkg maintains a database under /var/lib/dpkg, which groups various files, including the following:

  • /var/lib/dpkg/status: A DEB822 file containing the status information for all packages (i.e., the current state of each package and the fields in their control file).

  • /var/lib/dpkg/status-old: The last backup of the /var/lib/dpkg/status file.

  • /var/lib/dpkg/available: The list of packages available for installation or upgrade from external origins only if you are using dselect as your package manager frontend (instead of apt or aptitude). See details. (not described in this article)

  • /var/lib/dpkg/diversions: The list of diversions used by dpkg and set by dpkg-divert to force a package file to be installed elsewhere. (not described in this article)

  • /var/lib/dpkg/statoverride: The stats used by dpkg and set by dpkg-statoverride to change the default ownership and mode of the package files. (not described in this article)

In addition, for every installed package, Dpkg keeps a list of additional files:

  • /var/lib/dpkg/info/<package_name>.list: The list of files and directories installed by the package (the data.tar listing)

  • /var/lib/dpkg/info/<package_name>.md5sums: The list of MD5 hash values for files installed by the package. Used for example to detect if a conffile has been edited by the user.

  • /var/lib/dpkg/info/<package_name>.conffiles: The list of configuration files. Same as the conffiles file under DEBIAN/

  • /var/lib/dpkg/info/<package_name>.{preinst, postinst, prerm, postrm}: Copies of the maintainer scripts present in the package under DEBIAN/.

  • /var/lib/dpkg/info/<package_name>.config: Debconf-generated configuration files used only by a minority of packages. (not described in this article)
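As an illustration of how simple these files are, here is a small Go sketch that parses the content of an md5sums file. It assumes the usual md5sum layout (a hash, two spaces, then a path relative to the root directory); the function parseMD5Sums is a hypothetical helper and the hash in the example is only a placeholder value.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMD5Sums maps absolute paths to their recorded hash.
// Assumed line format: "<hash>  <path relative to />".
func parseMD5Sums(content string) map[string]string {
	sums := make(map[string]string)
	for _, line := range strings.Split(content, "\n") {
		hash, path, ok := strings.Cut(line, "  ")
		if !ok || hash == "" || path == "" {
			continue // skip blank or malformed lines
		}
		sums["/"+strings.TrimPrefix(path, "/")] = hash
	}
	return sums
}

func main() {
	// Placeholder hash (MD5 of an empty input), for illustration only.
	content := "d41d8cd98f00b204e9800998ecf8427e  usr/bin/hello\n"
	for path, hash := range parseMD5Sums(content) {
		fmt.Println(path, hash)
	}
}
```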

Here are the different functions called to update the different files in the database:

src/unpack.c
void process_archive(const char *filename) {
  

  /* OK, now we can write the updated files-in-this package list,
   * since we've done away (hopefully) with all the old junk. */
  write_filelist_except(pkg, &pkg->available, newfiles_queue.head, 0); (1)

  /* We also install the new maintainer scripts, and any other
   * cruft that may have come along with the package. First
   * we go through the existing scripts replacing or removing
   * them as appropriate; then we go through the new scripts
   * (any that are left) and install them. */
  pkg_infodb_update(pkg, cidir, cidirrest); (2)

  /* We store now the checksums dynamically computed while unpacking. */
  write_filehash_except(pkg, &pkg->available, newfiles_queue.head, 0); (3)

  /* Right, the package we've unpacked is now in a reasonable state.
   * The only thing that we have left to do with it is remove
   * backup files, and we can leave the user to fix that if and when
   * it happens (we leave the reinstall required flag, of course). */
  pkg_set_status(pkg, PKG_STAT_UNPACKED);
  modstatdb_note(pkg); (4)

  ...
}
1 Edit the file /var/lib/dpkg/info/hello.list.
2 Copy all files under DEBIAN/ into /var/lib/dpkg/info/ by prefixing them with the package name hello..
3 Edit the file /var/lib/dpkg/info/hello.md5sums.
4 Update the field Status in /var/lib/dpkg/status for the package hello to set the value install ok unpacked.
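The Status field updated in the last callout is made of three words, as in install ok unpacked. The following Go sketch splits the field and changes only the last word, the package state. The type Status, its field names and the function parseStatus are hypothetical names chosen for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Status holds the three words of the Status field.
type Status struct {
	Want  string // first word, e.g. "install"
	Flag  string // second word, e.g. "ok"
	State string // third word, e.g. "unpacked"
}

// parseStatus splits a Status field into its three words.
func parseStatus(field string) (Status, error) {
	parts := strings.Fields(field)
	if len(parts) != 3 {
		return Status{}, fmt.Errorf("malformed Status field %q", field)
	}
	return Status{Want: parts[0], Flag: parts[1], State: parts[2]}, nil
}

// String formats the status as it appears in the status file.
func (s Status) String() string {
	return s.Want + " " + s.Flag + " " + s.State
}

func main() {
	s, err := parseStatus("install ok unpacked")
	if err != nil {
		panic(err)
	}
	s.State = "installed" // what the end of the configuration step records
	fmt.Println(s)
}
```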

We are getting close to the end of the function process_archive. The last instruction is enqueue_package(pkg). This function simply pushes the package onto a queue of packages waiting to be configured. Since the dpkg command can be executed with several packages to install, the queue ensures all packages have been unpacked before proceeding to their final configuration.

We are now back to the archivefiles function:

src/archives.c
int
archivefiles(const char *const *argv)
{
  int i;

  modstatdb_open(msdbrw_needsuperuser);

  for (i = 0; argv[i]; i++) {
    process_archive(argv[i]);
  }

  process_queue(); (1)

  trigproc_run_deferred();
  modstatdb_shutdown();

  return 0;
}
1 We are here.

What follows is the data structure representing the queue:

src/packages.c
static struct pkg_queue queue = { .head = NULL, .tail = NULL, .length = 0 }; (1)


/*
 * During the packages queue processing, the algorithm for deciding what to
 * configure first is as follows:
 *
 * Loop through all packages doing a ‘try 1’ until we've been round and
 * nothing has been done, then do ‘try 2’, and subsequent ones likewise.
 * The incrementing of ‘dependtry’ is done by process_queue().
 *
 * Try 1:
 *   Are all dependencies of this package done? If so, do it.
 *   Are any of the dependencies missing or the wrong version?
 *     If so, abort (unless --force-depends, in which case defer).
 *   Will we need to configure a package we weren't given as an
 *     argument? If so, abort ─ except if --force-configure-any,
 *     in which case we add the package to the argument list.
 *   If none of the above, defer the package.
 *
 * Try 2:
 *   Find a cycle and break it (see above).
 *   Do as for try 1.
 *
 * Try 3:
 *   Start processing triggers if necessary.
 *   Do as for try 2.
 *
 * Try 4:
 *   Same as for try 3, but check trigger cycles even when deferring
 *   processing due to unsatisfiable dependencies.
 *
 * Try 5 (only if --force-depends-version):
 *   Same as for try 2, but don't mind version number in dependencies.
 *
 * Try 6 (only if --force-depends):
 *   Do anyway.
 */
enum dependtry {
    DEPEND_TRY_NORMAL = 1,
    DEPEND_TRY_CYCLES = 2,
    DEPEND_TRY_TRIGGERS = 3,
    DEPEND_TRY_TRIGGERS_CYCLES = 4,
    DEPEND_TRY_FORCE_DEPENDS_VERSION = 5,
    DEPEND_TRY_FORCE_DEPENDS = 6,
    DEPEND_TRY_LAST,
};
enum dependtry dependtry = DEPEND_TRY_NORMAL; (2)
int sincenothing = 0; (2)
1 The global variable containing the packages to configure.
2 These variables control the algorithm that decides which package must be configured first, which must be postponed, and when to abort the installation completely.

Finally, here is the logic that empties the queue, in the function process_queue:

src/packages.c
void process_queue(void) {
  struct pkginfo *volatile pkg;
  volatile enum action action_todo;

  while (!pkg_queue_is_empty(&queue)) {
    pkg = pkg_queue_pop(&queue);

    ensure_package_clientdata(pkg);
    pkg->clientdata->enqueued = false;

    action_todo = cipaction->arg_int;

    if (sincenothing++ > queue.length * 3 + 2) {
      /* Make sure that even if we have exceeded the queue since not having
       * made any progress, we are not getting stuck trying to progress by
       * trigger processing, w/o jumping into the next dependtry. */
      dependtry++;
      sincenothing = 0;
      if (dependtry >= DEPEND_TRY_LAST)
        internerr("exceeded dependtry %d (sincenothing=%d; queue.length=%d)",
                  dependtry, sincenothing, queue.length);
    } else if (sincenothing > queue.length * 2 + 2) {
      if (dependtry >= DEPEND_TRY_TRIGGERS &&
          progress_bytrigproc && progress_bytrigproc->trigpend_head) {
        enqueue_package(pkg);
        pkg = progress_bytrigproc;
        progress_bytrigproc = NULL;
        action_todo = act_configure;
      } else {
        dependtry++;
        sincenothing = 0;
        if (dependtry >= DEPEND_TRY_LAST)
          internerr("exceeded dependtry %d (sincenothing=%d, queue.length=%d)",
                    dependtry, sincenothing, queue.length);
      }
    }

    debug(dbg_general, "process queue pkg %s queue.len %d progress %d, try %d",
          pkg_name(pkg, pnaw_always), queue.length, sincenothing, dependtry);

    deferred_configure(pkg); (1)
  }

  if (queue.length)
    internerr("finished package processing with non-empty queue length %d",
              queue.length);
}
1 The function deferred_configure is the main function doing the configuration and is the subject of the next step.
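
The deferral mechanism is easier to grasp on a reduced example. The following Go sketch keeps only the core idea: a package whose dependencies are not configured yet goes back to the end of the queue, and a counter detects when a full round brings no progress. It ignores cycle breaking, triggers and the --force options, and it assumes that a dependency is satisfied only if it was configured during the same run. The names pkg and configureAll are hypothetical.

```go
package main

import "fmt"

type pkg struct {
	name string
	deps []string
}

// configureAll returns the order in which packages get configured,
// or an error when the queue stops making progress.
func configureAll(pkgs []pkg) ([]string, error) {
	queue := append([]pkg(nil), pkgs...)
	done := map[string]bool{}
	var order []string
	sinceNothing := 0 // deferrals since the last configured package

	for len(queue) > 0 {
		p := queue[0]
		queue = queue[1:]

		ready := true
		for _, d := range p.deps {
			if !done[d] {
				ready = false
				break
			}
		}
		if ready {
			done[p.name] = true
			order = append(order, p.name)
			sinceNothing = 0
			continue
		}

		queue = append(queue, p) // defer the package
		sinceNothing++
		if sinceNothing > len(queue) {
			return order, fmt.Errorf("no progress: %d package(s) stuck", len(queue))
		}
	}
	return order, nil
}

func main() {
	order, err := configureAll([]pkg{
		{name: "hello", deps: []string{"libhello"}},
		{name: "libhello"},
	})
	fmt.Println(order, err)
}
```

In this example, hello is deferred once and configured after libhello, even though it was enqueued first.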
  1. Configure the package.

    1. Unpack the conffiles, and at the same time back up the old conffiles, so that they can be restored if something goes wrong.

    2. Run postinst script, if provided by the package.

The last step uses the same code as the command dpkg --configure, which may be used to configure a package that has already been unpacked.

The configuration step is implemented by the function deferred_configure, which focuses on a single package. If the configuration cannot proceed yet, the package is either pushed back onto the queue to be processed later, or the installation aborts. Here is a simplified version:

src/configure.c
/**
 * Process the deferred configure package.
 *
 * @param pkg The package to act on.
 */
void
deferred_configure(struct pkginfo *pkg)
{
    struct varbuf aemsgs = VARBUF_INIT;
    struct conffile *conff;
    struct pkginfo *otherpkg;
    enum dep_check ok;

    ok = dependencies_ok(pkg, NULL, &aemsgs); (1)
    if (ok == DEP_CHECK_DEFER) {
        varbuf_destroy(&aemsgs);
        ensure_package_clientdata(pkg);
        pkg->clientdata->istobe = PKG_ISTOBE_INSTALLNEW;
        enqueue_package(pkg);
        return;
    }

    /*
     * At this point removal from the queue is confirmed. This
     * represents irreversible progress wrt trigger cycles. Only
     * packages in PKG_STAT_UNPACKED are automatically added to the
     * configuration queue, and during configuration and trigger
     * processing new packages can't enter into unpacked.
     */
    sincenothing = 0;


    printf(_("Setting up %s (%s) ...\n"), pkg_name(pkg, pnaw_nonambig),
           versiondescribe(&pkg->installed.version, vdew_nonambig));
    log_action("configure", pkg, &pkg->installed);


    if (pkg->status == PKG_STAT_UNPACKED) {
        /* On entry, the ‘new’ version of each conffile has been
         * unpacked as ‘*.dpkg-new’, and the ‘installed’ version is
         * as-yet untouched in ‘*’. The hash of the ‘old distributed’
         * version is in the conffiles data for the package. If
         * ‘*.dpkg-new’ no longer exists we assume that we've
         * already processed this one. */
        for (conff = pkg->installed.conffiles; conff; conff = conff->next) {
            deferred_configure_conffile(pkg, conff); (2)
        }

        pkg_set_status(pkg, PKG_STAT_HALFCONFIGURED);
        modstatdb_note(pkg);
    }

    maintscript_postinst(pkg, "configure",
      dpkg_version_is_informative(&pkg->configversion) ?
            versiondescribe(&pkg->configversion, vdew_nonambig) :
          "",
      NULL); (3)

    pkg_reset_eflags(pkg);
    post_postinst_tasks(pkg, PKG_STAT_INSTALLED); (4)
}
1 In case of a missing dependency, the installation will abort only at this step, after the unpacking of the package files.
2 The function deferred_configure_conffile renames the conffiles still ending with the suffix .dpkg-new created during the unpacking. This function also shows the confirmation prompt.
3 Run the postinst maintainer script.
4 Change the status to PKG_STAT_INSTALLED and force the update in the status database file.

The installation of our package is now completed. We can check the package has been installed by running the hello command:

vagrant# hello
hello world!

Or by using the command dpkg to get the status of the package:

vagrant# dpkg -s hello
Package: hello
Status: install ok installed
Priority: optional
Section: base
Maintainer: Julien Sobczak
Architecture: amd64
Version: 1.1-1
Description: Say Hello

Case Study

What follows is a minimal rewrite in Go of the code covered in this second part. The full code is available on GitHub in the repository julien-sobczak/linux-packages-under-the-hood.

But first, let’s remove the package or we will not be able to test our program:

# dpkg -r hello
(Reading database ... 26963 files and directories currently installed.)
Removing hello (1.1-1) ...

# hello
bash: /usr/bin/hello: No such file or directory

Here is the code:

main.go
package main

import (
    "archive/tar"
    "bytes"
    "fmt"
    "io"
    "log"
    "os"
    "os/exec"
    "path/filepath"
    "strings"

    "github.com/blakesmith/ar"
    "github.com/julien-sobczak/deb822"
)

func main() {
    // This program expects one or more package files to install.
    if len(os.Args) < 2 {
        log.Fatalf("Missing package archive(s)")
    }

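    // Read the DPKG database
    db, err := loadDatabase()
    if err != nil {
        log.Fatalf("Failed to load the database: %v", err)
    }

    // Unpack and configure the archive(s)
    for _, archivePath := range os.Args[1:] {
        if err := processArchive(db, archivePath); err != nil {
            log.Fatalf("Failed to install %s: %v", archivePath, err)
        }
    }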

    // For simplicity reasons, we don't manage a queue to defer
    // the configuration of packages like in the official code.
}

//
// Dpkg Database
//

type Database struct {
    // File /var/lib/dpkg/status
    Status deb822.Document
    // Packages under /var/lib/dpkg/info/
    Packages []*PackageInfo
}

type PackageInfo struct {
    Paragraph deb822.Paragraph // Extracted section in /var/lib/dpkg/status

    // info
    Files             []string          // File <name>.list
    Conffiles         []string          // File <name>.conffiles
    MaintainerScripts map[string]string // File <name>.{preinst,prerm,...}

    Status      string // Current status (as present in `Paragraph`)
    StatusDirty bool   // True to ask for sync
}

func (p *PackageInfo) Name() string {
    // Extract the package name from its section in /var/lib/dpkg/status
    return p.Paragraph.Value("Package")
}

func (p *PackageInfo) Version() string {
    // Extract the package version from its section in /var/lib/dpkg/status
    return p.Paragraph.Value("Version")
}

// isConffile determines if a file must be processed as a conffile.
func (p *PackageInfo) isConffile(path string) bool {
    for _, conffile := range p.Conffiles {
        if path == conffile {
            return true
        }
    }
    return false
}

// InfoPath returns the path of a file under /var/lib/dpkg/info/.
// Ex: "list" => /var/lib/dpkg/info/hello.list
func (p *PackageInfo) InfoPath(filename string) string {
    return filepath.Join("/var/lib/dpkg/info", p.Name()+"."+filename)
}

// We now add a method to change the package status
// and make sure the section in the status file is updated too.
// This method will be used several times at the different steps
// of the installation process.

func (p *PackageInfo) SetStatus(new string) {
    p.Status = new
    p.StatusDirty = true
    // Override in DEB 822 document used to write the status file
    old := p.Paragraph.Values["Status"]
    parts := strings.Split(old, " ")
    newStatus := fmt.Sprintf("%s %s %s", parts[0], parts[1], new)
    p.Paragraph.Values["Status"] = newStatus
}

// Now, we are ready to read the database directory to initialize the structs.

func loadDatabase() (*Database, error) {
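    // Load the status file
    f, err := os.Open("/var/lib/dpkg/status")
    if err != nil {
        return nil, err
    }
    defer f.Close()
    parser, err := deb822.NewParser(f)
    if err != nil {
        return nil, err
    }
    status, err := parser.Parse()
    if err != nil {
        return nil, err
    }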

    // Read the info directory
    var packages []*PackageInfo
    for _, statusParagraph := range status.Paragraphs {
        statusField := statusParagraph.Value("Status") // install ok installed
        statusValues := strings.Split(statusField, " ")

        pkg := PackageInfo{
            Paragraph:         statusParagraph,
            MaintainerScripts: make(map[string]string),
            Status:            statusValues[2],
            StatusDirty:       false,
        }

        // Read the configuration files
        pkg.Files, _ = ReadLines(pkg.InfoPath("list"))
        pkg.Conffiles, _ = ReadLines(pkg.InfoPath("conffiles"))

        // Read the maintainer scripts
        maintainerScripts := []string{"preinst", "postinst", "prerm", "postrm"}
        for _, script := range maintainerScripts {
            scriptPath := pkg.InfoPath(script)
            if _, err := os.Stat(scriptPath); !os.IsNotExist(err) {
                content, err := os.ReadFile(scriptPath)
                if err != nil {
                    return nil, err
                }
                pkg.MaintainerScripts[script] = string(content)
            }
        }
        packages = append(packages, &pkg)
    }

    // We have read everything that interest us and are ready
    // to populate the Database struct.

    return &Database{
        Status:   status,
        Packages: packages,
    }, nil
}

// Now we are ready to process an archive to install.

func processArchive(db *Database, archivePath string) error {

    // Read the Debian archive file
    f, err := os.Open(archivePath)
    if err != nil {
        return err
    }
    defer f.Close()
    reader := ar.NewReader(f)

    // Skip debian-binary
    reader.Next()

    // control.tar
    reader.Next()
    var bufControl bytes.Buffer
    io.Copy(&bufControl, reader)

    pkg, err := parseControl(db, bufControl)
    if err != nil {
        return err
    }

    // Add the new package in the database
    db.Packages = append(db.Packages, pkg)
    db.Sync()

    // data.tar
    reader.Next()
    var bufData bytes.Buffer
    io.Copy(&bufData, reader)

    fmt.Printf("Preparing to unpack %s ...\n", filepath.Base(archivePath))

    if err := pkg.Unpack(bufData); err != nil {
        return err
    }
    if err := pkg.Configure(); err != nil {
        return err
    }

    db.Sync()

    return nil
}

// parseControl processes the control.tar archive.
func parseControl(db *Database, buf bytes.Buffer) (*PackageInfo, error) {

    // The control.tar archive contains the most important files
    // we need to install the package.
    // We need to extract metadata from the control file, determine
    // if the package contains conffiles and maintainer scripts.

    pkg := PackageInfo{
        MaintainerScripts: make(map[string]string),
        Status:            "not-installed",
        StatusDirty:       true,
    }

    tr := tar.NewReader(&buf)

    for {
        hdr, err := tr.Next()
        if err == io.EOF {
            break // End of archive
        }
        if err != nil {
            return nil, err
        }

        // Read the file content
        var buf bytes.Buffer
        if _, err := io.Copy(&buf, tr); err != nil {
            return nil, err
        }

        switch filepath.Base(hdr.Name) {
        case "control":
            parser, _ := deb822.NewParser(strings.NewReader(buf.String()))
            document, _ := parser.Parse()
            controlParagraph := document.Paragraphs[0]

            // Copy control fields and add the Status field in second position
            pkg.Paragraph = deb822.Paragraph{
                Values: make(map[string]string),
            }

            // Make sure the field "Package" comes first, then "Status",
            // then remaining fields.
            pkg.Paragraph.Order = append(
                pkg.Paragraph.Order, "Package", "Status")
            pkg.Paragraph.Values["Package"] = controlParagraph.Value("Package")
            pkg.Paragraph.Values["Status"] = "install ok not-installed"
            for _, field := range controlParagraph.Order {
                if field == "Package" {
                    continue
                }
                pkg.Paragraph.Order = append(pkg.Paragraph.Order, field)
                pkg.Paragraph.Values[field] = controlParagraph.Value(field)
            }
        case "conffiles":
            pkg.Conffiles = SplitLines(buf.String())
        case "prerm":
            fallthrough
        case "preinst":
            fallthrough
        case "postinst":
            fallthrough
        case "postrm":
            pkg.MaintainerScripts[filepath.Base(hdr.Name)] = buf.String()
        }
    }

    return &pkg, nil
}

// Unpack processes the data.tar archive.
func (p *PackageInfo) Unpack(buf bytes.Buffer) error {

    // The unpacking process consists in extracting all files
    // in data.tar to their final destination, except for conffiles,
    // which are copied with a special extension that will be removed
    // in the configure step.

    if err := p.runMaintainerScript("preinst"); err != nil {
        return err
    }

    fmt.Printf("Unpacking %s (%s) ...\n", p.Name(), p.Version())

    tr := tar.NewReader(&buf)
    for {
        hdr, err := tr.Next()
        if err == io.EOF {
            break // End of archive
        }
        if err != nil {
            return err
        }

        var buf bytes.Buffer
        if _, err := io.Copy(&buf, tr); err != nil {
            return err
        }

        switch hdr.Typeflag {
        case tar.TypeReg:
            dest := hdr.Name
            if strings.HasPrefix(dest, "./") {
                // ./usr/bin/hello => /usr/bin/hello
                dest = dest[1:]
            }
            if !strings.HasPrefix(dest, "/") {
                // usr/bin/hello => /usr/bin/hello
                dest = "/" + dest
            }

            tmpdest := dest
            if p.isConffile(tmpdest) {
                // Extract using the extension .dpkg-new
                tmpdest += ".dpkg-new"
            }

            if err := os.MkdirAll(filepath.Dir(tmpdest), 0755); err != nil {
                log.Fatalf("Failed to unpack directory %s: %v", tmpdest, err)
            }

            content := buf.Bytes()
            if err := os.WriteFile(tmpdest, content, 0755); err != nil {
                log.Fatalf("Failed to unpack file %s: %v", tmpdest, err)
            }

            p.Files = append(p.Files, dest)
        }
    }

    p.SetStatus("unpacked")
    p.Sync()

    return nil
}

// Configure processes the conffiles.
func (p *PackageInfo) Configure() error {

    // The configure process consists in renaming the conffiles
    // unpacked at the previous step.
    //
    // We ignore some implementation concerns like checking if a conffile
    // has been updated using the last known checksum.

    fmt.Printf("Setting up %s (%s) ...\n", p.Name(), p.Version())

    // Rename conffiles
    for _, conffile := range p.Conffiles {
        if err := os.Rename(conffile+".dpkg-new", conffile); err != nil {
            return err
        }
    }
    p.SetStatus("half-configured")
    p.Sync()

    // Run maintainer script
    if err := p.runMaintainerScript("postinst"); err != nil {
        return err
    }
    p.SetStatus("installed")
    p.Sync()

    return nil
}

func (p *PackageInfo) runMaintainerScript(name string) error {

    // The control.tar file can contain scripts to be run at
    // specific moments. This function uses the standard Go library
    // to run the `sh` command with a maintainer script as an argument.

    if _, ok := p.MaintainerScripts[name]; !ok {
        // Nothing to run
        return nil
    }

    out, err := exec.Command("/bin/sh", p.InfoPath(name)).Output()
    if err != nil {
        return err
    }
    fmt.Print(string(out))

    return nil
}

// We have covered the different steps of the installation process.
// We still need to write the code to sync the database.

func (d *Database) Sync() error {
    newStatus := deb822.Document{
        Paragraphs: []deb822.Paragraph{},
    }

    // Sync the /var/lib/dpkg/info directory
    for _, pkg := range d.Packages {
        newStatus.Paragraphs = append(newStatus.Paragraphs, pkg.Paragraph)

        if pkg.StatusDirty {
            if err := pkg.Sync(); err != nil {
                return err
            }
        }
    }

    // Make a new version of /var/lib/dpkg/status
    os.Rename("/var/lib/dpkg/status", "/var/lib/dpkg/status-old")
    formatter := deb822.NewFormatter()
    formatter.SetFoldedFields("Description")
    formatter.SetMultilineFields("Conffiles")
    if err := os.WriteFile("/var/lib/dpkg/status",
        []byte(formatter.Format(newStatus)), 0644); err != nil {
        return err
    }

    return nil
}

func (p *PackageInfo) Sync() error {
    // This function synchronizes the files under /var/lib/dpkg/info
    // for a single package.

    // Write <package>.list
    if err := os.WriteFile(p.InfoPath("list"),
        []byte(MergeLines(p.Files)), 0644); err != nil {
        return err
    }

    // Write <package>.conffiles
    if err := os.WriteFile(p.InfoPath("conffiles"),
        []byte(MergeLines(p.Conffiles)), 0644); err != nil {
        return err
    }

    // Write <package>.{preinst,prerm,postinst,postrm}
    for name, content := range p.MaintainerScripts {
        err := os.WriteFile(p.InfoPath(name), []byte(content), 0755)
        if err != nil {
            return err
        }
    }

    p.StatusDirty = false
    return nil
}

/* Utility functions */

func ReadLines(path string) ([]string, error) {
    if _, err := os.Stat(path); !os.IsNotExist(err) {
        content, err := os.ReadFile(path)
        if err != nil {
            return nil, err
        }
        return SplitLines(string(content)), nil
    }
    return nil, nil
}

func SplitLines(content string) []string {
    var lines []string
    for _, line := range strings.Split(content, "\n") {
        if strings.TrimSpace(line) == "" {
            continue
        }
        lines = append(lines, line)
    }
    return lines
}

func MergeLines(lines []string) string {
    return strings.Join(lines, "\n") + "\n"
}

Let’s test the new command:

$ go build -o dpkg main.go
$ vagrant destroy -f # Recreate the VM
$ vagrant up         # to force a fresh installation.
vagrant$ sudo su
vagrant# /vagrant/dpkg /vagrant/hello.deb
Preparing to unpack hello.deb ...
preinst says hello
Unpacking hello (1.1-1) ...
Setting up hello (1.1-1) ...
postinst says hello

vagrant# hello
hello world

vagrant# dpkg -s hello
Package: hello
Status: install ok installed
Priority: optional
Section: base
Maintainer: Julien Sobczak
Architecture: amd64
Version: 1.1-1
Description: Say Hello

Our package has been correctly installed. The standard dpkg command recognizes it and can be used to remove the package like any other installed package:

vagrant# dpkg -r hello
(Reading database ... 25063 files and directories currently installed.)
Removing hello (1.1-1) ...
prerm says hello
postrm says hello

vagrant# hello
bash: /usr/bin/hello: No such file or directory

🎉 We have finished with the command dpkg. We succeeded in creating a package manually and in installing it using a basic Go program. We have a better understanding of how dpkg works and what information is available in its database. Now, we will have a look at the package manager frontend apt to understand how these programs work together to install a package.

What happens when you install a package using apt

The main reason to use apt is its dependency management support. This command understands that in order to install a given package, other packages may need to be installed too, and apt can download and install them. In practice, dpkg is called a package manager and apt is called a package manager frontend.

What You Need to Know About apt, apt-get, aptitude

APT is a vast project started in 1997 and organized around a core library. The command apt-get was the first frontend developed within the project, and apt is the second command provided by APT, which overcomes some design mistakes of apt-get. For example, apt-get upgrade refuses to install packages that were not installed beforehand, whereas apt upgrade installs new dependencies when needed. Under the hood, both tools are built on top of the core library and are thus very close.

External projects like aptitude were developed later to support new features like the automatic removal of packages that are no longer required, but most of these features are now available in apt too.

The most widespread command remains apt, and it is the one that we will use in this section.

APT makes software available to the user by doing the dirty work of downloading all the required packages and installing them using dpkg in the correct order to respect the dependencies. The scope of APT is larger than Dpkg and its behavior is highly configurable.
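Installing packages "in the correct order" boils down to visiting the dependency graph so that every dependency comes before the package that needs it. Here is a minimal sketch in Go using a depth-first traversal. The real resolver in APT is far more involved (versions, alternatives, conflicts), and the function installOrder as well as the dependency list used in main are assumptions made for this example.

```go
package main

import "fmt"

// installOrder returns target and all its transitive dependencies,
// each dependency placed before the packages that require it.
func installOrder(deps map[string][]string, target string) ([]string, error) {
	const (
		visiting = 1
		done     = 2
	)
	state := map[string]int{}
	var order []string

	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("dependency cycle through %s", name)
		}
		state[name] = visiting
		for _, d := range deps[name] {
			if err := visit(d); err != nil {
				return err
			}
		}
		state[name] = done
		order = append(order, name)
		return nil
	}

	if err := visit(target); err != nil {
		return nil, err
	}
	return order, nil
}

func main() {
	// Hypothetical dependencies, for illustration only.
	deps := map[string][]string{
		"hello":   {"libc6", "python3"},
		"python3": {"libc6"},
	}
	order, err := installOrder(deps, "hello")
	fmt.Println(order, err)
}
```

The resulting list can then be handed to dpkg one package after the other, dependencies first.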

What You Need to Know About APT Configuration Files

APT configuration resides under /etc/apt/, which contains the following files:

  • apt.conf and apt.conf.d/: The main configuration files where hundreds of options are available (more about them soon). The command apt-config dump can be used to view all available options with their default values:

    $ apt-config dump
    ...
    Dir "/";
    Dir::State "var/lib/apt";
    Dir::State::status "/var/lib/dpkg/status";
    Dir::Cache "var/cache/apt";
    Dir::Etc "etc/apt";
    Dir::Etc::sourcelist "sources.list";
    Dir::Etc::sourceparts "sources.list.d";
    Dir::Etc::main "apt.conf";
    Dir::Etc::parts "apt.conf.d";
    Dir::Etc::preferences "preferences";
    Dir::Etc::preferencesparts "preferences.d";
    Dir::Etc::trusted "trusted.gpg";
    Dir::Etc::trustedparts "trusted.gpg.d";
    ...
  • sources.list and sources.list.d/: lists of repositories (more about them soon). Here are the default repositories on my Debian server:

    $ cat /etc/apt/sources.list
    deb http://deb.debian.org/debian buster main
    deb-src http://deb.debian.org/debian buster main
    deb http://security.debian.org/debian-security buster-security main
    deb-src http://security.debian.org/debian-security buster-security main
    deb http://deb.debian.org/debian buster-updates main
    deb-src http://deb.debian.org/debian buster-updates main
    deb http://deb.debian.org/debian buster-backports main
    deb-src http://deb.debian.org/debian buster-backports main
  • preferences and preferences.d/: APT pinning is the only available preference. By default, when multiple repositories are configured, a package can exist in several of them and APT applies logic to decide which one must be installed. Pinning allows you to change this logic (called a policy) for some packages. The command apt-cache policy [pkg] can be used to view the global policy when called without argument:

    $ apt-cache policy
    Package files:
     100 /var/lib/dpkg/status
         release a=now
     500 http://security.debian.org/debian-security buster-security/main
         amd64 Packages
         release o=Debian,a=testing-security,n=buster-security,
         l=Debian-Security,c=main,b=amd64
         origin security.debian.org
     500 http://deb.debian.org/debian buster/main amd64 Packages
         release o=Debian,a=testing,n=buster,l=Debian,c=main,b=amd64
         origin deb.debian.org

    You can create preferences files to favor a specific repository for a given package or to prevent this package from being upgraded. (not covered in this article)

  • trusted.gpg and trusted.gpg.d/: keys for secure authentication of packages (known as "Secure APT" and used in Debian since 2005). The command apt-key can be used to show the keys, and to add or remove a key. APT uses public-key (asymmetric) cryptography using GPG:

    $ ls -1 /etc/apt/trusted.gpg.d/
    debian-archive-buster-automatic.gpg
    debian-archive-buster-security-automatic.gpg
    debian-archive-buster-stable.gpg
    debian-archive-stretch-automatic.gpg
    debian-archive-stretch-security-automatic.gpg
    debian-archive-stretch-stable.gpg

    When installing a package, APT retrieves it from an external repository. The Release file, which is the entry point to find the Packages index files, may have been altered in transit, so checking the MD5 sums inside these index files is useless if we can’t guarantee that the Release file itself is safe against a man-in-the-middle attack. This is the goal of secure APT. Concretely, secure APT always downloads a Release.gpg file, if it exists, before downloading a Release file. (NB: The file InRelease has now merged the intent of these two deprecated files.) Using cryptography, APT can be sure that the file is safe and can trust the MD5 sums present inside it to check other files like Packages files. Otherwise, APT will complain with the following messages, which you have probably encountered before:

    # When adding a new repository in `/etc/apt/sources.list.d/`:
    W: GPG error: http://ftp.us.debian.org testing Release:
     The following signatures couldn't be verified
     because the public key is not available:
     NO_PUBKEY 010908312D230C5F
    # When installing a new package from this repository:
    WARNING: The following packages cannot be authenticated!
      libglib-perl libgtk2-perl
    Install these packages without verification [y/N]?

  • auth.conf and auth.conf.d/: The APT configuration and the list of repositories must be readable by any user on the system, but some repositories require login information, which is stored in these files with restrictive permissions. For example, instead of specifying the user/password apt:debian in the source list file directly (deb https://apt:debian@example.org/debian buster main), you can create an entry in auth.conf:

    machine example.org
    login apt
    password debian

    (not covered in this article)

  • listchanges.conf and listchanges.conf.d: Only used by the command apt-listchanges to show what has been changed in a new version of a Debian package, as compared to the version currently installed on the system. It does this by extracting the relevant entries from both the NEWS.Debian and changelog[.Debian] files, usually found in /usr/share/doc/package in Debian package archives. (not covered in this article)

In practice, the .d directories are preferred so that the configuration can be split into several files. The single-file variants may not even exist on your machine and are often deprecated.

Further documentation: APT configuration, Secure APT.

Now is the time to start looking at the code again. APT is written in C++. The entry point for any APT command is the file cmdline/apt.cc, which contains a function GetCommands() that maps each command to a function defined in the directory apt-private/. These functions in turn delegate to the main APT library in the directory apt-pkg/ (i.e., cmdline/ → apt-private/ → apt-pkg/):

cmdline/apt.cc
static std::vector<aptDispatchWithHelp> GetCommands()                        /*{{{*/
{
   return {
      {"list", &DoList, _("list packages based on package names")},
      {"update", &DoUpdate, _("update list of available packages")},
      {"install", &DoInstall, _("install packages")},

      // ...

      {nullptr, nullptr, nullptr}
   };
}

Before invoking the command function, APT simply initializes a few classes like pkgSystem to set the default configuration options.
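
To make the dispatch idea concrete, here is a minimal sketch in Go of a table that maps sub-command names to handlers, in the spirit of GetCommands(). The names command, dispatch, and the do* handlers are hypothetical and only mirror the shape of the C++ table; the handlers are placeholders, not APT logic.

```go
package main

import (
	"errors"
	"fmt"
)

// command pairs a sub-command name with its handler and help text,
// mirroring the {name, function, help} triples of the C++ table.
type command struct {
	name string
	run  func(args []string) error
	help string
}

// Placeholder handlers for this sketch (hypothetical, no real work done).
func doList(args []string) error   { return nil }
func doUpdate(args []string) error { return nil }
func doInstall(args []string) error {
	if len(args) == 0 {
		return errors.New("install: no package given")
	}
	return nil
}

// commands returns the dispatch table.
func commands() []command {
	return []command{
		{"list", doList, "list packages based on package names"},
		{"update", doUpdate, "update list of available packages"},
		{"install", doInstall, "install packages"},
	}
}

// dispatch looks up the sub-command named by argv[0] and runs its handler
// with the remaining arguments.
func dispatch(argv []string) error {
	if len(argv) == 0 {
		return errors.New("no command given")
	}
	for _, c := range commands() {
		if c.name == argv[0] {
			return c.run(argv[1:])
		}
	}
	return fmt.Errorf("unknown command %q", argv[0])
}

func main() {
	fmt.Println(dispatch([]string{"update"}))
	fmt.Println(dispatch([]string{"frobnicate"}))
}
```

A slice keeps the declaration order, which is convenient for printing help; a map would work just as well for lookup alone.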

What You Need to Know About APT Configuration Options

Unlike Dpkg, APT is highly configurable using the files /etc/apt/apt.conf and /etc/apt/apt.conf.d/. The format is similar to the one used by some Linux tools like bind or dhcp.

vagrant$ cat /etc/apt/apt.conf.d/*
APT
{
  NeverAutoRemove
  {
    "^firmware-linux.*";
    "^linux-firmware$";
    "^linux-image-[a-z0-9]*$";
    "^linux-image-[a-z0-9]*-[a-z0-9]*$";
  };
};
DPkg::Pre-Install-Pkgs { "/usr/bin/apt-listchanges --apt || test $? -lt 10"; };
...

The configuration is organized as a tree of functional groups. For instance, APT::Get::Assume-Yes is an option within the APT tool group, for the Get tool. A new scope can be opened with curly braces, like this:

APT {
  Get {
    Assume-Yes "true";
    Fix-Broken "true";
  };
};

You can retrieve the full list of options using the command apt-config:

vagrant# apt-config dump
APT "";
APT::Architecture "amd64";
APT::Build-Essential "";
APT::Build-Essential:: "build-essential";
APT::Install-Recommends "1";
APT::Install-Suggests "0";
APT::Sandbox "";
APT::Sandbox::User "_apt";
… hundreds of other options ...

Inside the code, the configuration is accessible using the class Configuration (defined in apt-pkg/contrib/configuration.h):

#include <apt-pkg/configuration.h>

Configuration *_config = new Configuration;

// Example with a boolean option
if (_config->FindB("pkgCacheFile::Generate", true) == false) {}

// Example with an integer option
int const Limit = _config->FindI("Acquire::QueueHost::Limit",DEFAULT_HOST_LIMIT);
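
For the Go case studies, a flat map is enough to imitate this lookup-with-default pattern. The sketch below is an assumption of mine, not a port of the C++ class: the type Config and its methods only borrow the names FindB and FindI, and the set of strings accepted as booleans (what strconv.ParseBool accepts, plus "yes" and "no") is a simplification.

```go
package main

import (
	"fmt"
	"strconv"
)

// Config is a flat view of the option tree: keys are full paths such as
// "APT::Get::Assume-Yes" and values are the raw strings read from the files.
type Config map[string]string

// FindB returns the option as a boolean, or def when the option is
// missing or cannot be interpreted as a boolean.
func (c Config) FindB(key string, def bool) bool {
	v, ok := c[key]
	if !ok {
		return def
	}
	switch v {
	case "yes":
		return true
	case "no":
		return false
	}
	b, err := strconv.ParseBool(v)
	if err != nil {
		return def
	}
	return b
}

// FindI returns the option as an integer, or def when the option is
// missing or is not a valid integer.
func (c Config) FindI(key string, def int) int {
	v, ok := c[key]
	if !ok {
		return def
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return def
	}
	return n
}

func main() {
	cfg := Config{
		"APT::Get::Assume-Yes":    "true",
		"APT::Install-Recommends": "1",
	}
	fmt.Println(cfg.FindB("APT::Get::Assume-Yes", false))
	fmt.Println(cfg.FindI("quiet", 0))
}
```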

Further documentation: man page

apt update

Here is the entry point when running the command apt update:

apt-private/private-update.cc
bool DoUpdate(CommandLine &CmdL)
{
   CacheFile Cache;

   // Covered in step 1
   // Get the source list
   if (Cache.BuildSourceList() == false)
      return false;
   pkgSourceList *List = Cache.GetSourceList();

   // Covered in step 2
   // do the work
   AcqTextStatus Stat(std::cout, ScreenWidth,_config->FindI("quiet",0));
   ListUpdate(Stat, *List);

   // Covered in step 3
   // Rebuild the cache.
   pkgCacheFile::RemoveCaches();
   if (Cache.BuildCaches(false) == false)
      return false;

   // Covered in step 4
   // show basic stats (if the user whishes)
   if (_config->FindB("APT::Cmd::Show-Update-Stats", false) == true)
   {
      int upgradable = 0;
      if (Cache.Open(false) == false)
         return false;
      for (pkgCache::PkgIterator I = Cache->PkgBegin(); I.end() != true; ++I)
      {
         pkgDepCache::StateCache &state = Cache[I];
         if (I->CurrentVer != 0 && state.Upgradable() && state.CandidateVer != NULL)
            upgradable++;
      }
      const char *msg = P_(
         "%i package can be upgraded. Run 'apt list --upgradable' to see it.\n",
         "%i packages can be upgraded. Run 'apt list --upgradable' to see them.\n",
         upgradable);
      if (upgradable == 0)
         c1out << _("All packages are up to date.") << std::endl;
      else
         ioprintf(c1out, msg, upgradable);
   }

   return true;
}

The command is divided into four steps that we will cover separately:

  1. Read the sources.list and sources.list.d/* files.

// Get the source list
if (Cache.BuildSourceList() == false)
   return false;
pkgSourceList *List = Cache.GetSourceList();

What You Need to Know About Source Lists

APT downloads packages from one or more software repositories, which are often remote servers. The precise list of repositories is determined by the file /etc/apt/sources.list and the files inside /etc/apt/sources.list.d. Two formats are supported: one source per line (the widespread one-line style) or multiline stanzas defining one or more sources per stanza (the newer deb822 style).

Example using the old format:

deb http://us.archive.ubuntu.com/ubuntu focal main restricted
deb http://security.ubuntu.com/ubuntu focal-security main restricted
deb http://us.archive.ubuntu.com/ubuntu focal-updates main restricted

Example using the new format:

Types: deb
URIs: http://us.archive.ubuntu.com/ubuntu
Suites: focal focal-updates
Components: main restricted

Types: deb
URIs: http://security.ubuntu.com/ubuntu
Suites: focal-security
Components: main restricted

We will ignore the new deb822 format in this article.

Further documentation: man 5 sources.list
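
As an illustration of the one-line style, here is a minimal Go sketch that turns such lines into structured entries. The type Source and the function parseSources are hypothetical names, and the sketch makes simplifying assumptions: comments start with #, and bracketed options such as [arch=amd64] are not supported.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Source is one entry of a one-line-style sources.list file.
type Source struct {
	Type       string // "deb" or "deb-src"
	URI        string // root of the repository
	Suite      string // e.g. "focal" or "stable"
	Components []string
}

// parseSources reads one-line-style entries. Blank lines and comments are
// skipped. Bracketed options are NOT handled in this sketch.
func parseSources(content string) ([]Source, error) {
	var sources []Source
	sc := bufio.NewScanner(strings.NewReader(content))
	for n := 1; sc.Scan(); n++ {
		line := sc.Text()
		// Drop everything after a comment marker.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		if len(fields) < 3 || (fields[0] != "deb" && fields[0] != "deb-src") {
			return nil, fmt.Errorf("line %d: malformed entry", n)
		}
		sources = append(sources, Source{
			Type:       fields[0],
			URI:        fields[1],
			Suite:      fields[2],
			Components: fields[3:],
		})
	}
	return sources, sc.Err()
}

func main() {
	list := "deb http://us.archive.ubuntu.com/ubuntu focal main restricted\n" +
		"deb http://security.ubuntu.com/ubuntu focal-security main restricted\n"
	sources, err := parseSources(list)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range sources {
		fmt.Printf("%s %s %s %v\n", s.Type, s.URI, s.Suite, s.Components)
	}
}
```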

The class pkgSourceList represents the list of configured sources and is defined like this:

apt-pkg/sourcelist.h
class pkgSourceList
{
   public:

   typedef std::vector<metaIndex *>::const_iterator const_iterator;

   protected:

   std::vector<metaIndex *> SrcList;

   public:

   void Reset();
   bool ReadMainList();
   bool Read(std::string const &File);

   // List accessors
   inline const_iterator begin() const {return SrcList.begin();};
   inline const_iterator end() const {return SrcList.end();};
   inline unsigned int size() const {return SrcList.size();};
   inline bool empty() const {return SrcList.empty();};

   bool FindIndex(pkgCache::PkgFileIterator File,
                  pkgIndexFile *&Found) const;
   bool GetIndexes(pkgAcquire *Owner, bool GetAll=false) const;

   pkgSourceList();
   virtual ~pkgSourceList();
};

The list is initialized by the method BuildSourceList():

apt-pkg/cachefile.cc
bool pkgCacheFile::BuildSourceList(OpProgress * /*Progress*/)
{
   std::unique_ptr<pkgSourceList> SrcList;
   SrcList.reset(new pkgSourceList());
   if (SrcList->ReadMainList() == false)
      return _error->Error(_("The list of sources could not be read."));
   this->SrcList = SrcList.release();
   return true;
}

The method ReadMainList() is used to read the sources.list files:

apt-pkg/sourcelist.cc
bool pkgSourceList::ReadMainList()
{
   Reset();
   string Main = _config->FindFile("Dir::Etc::sourcelist", "sources.list");
   string Parts = _config->FindDir("Dir::Etc::sourceparts", "sources.list.d");

   _error->PushToStack();
   if (RealFileExists(Main) == true)
      ReadAppend(Main); (1)
   if (DirectoryExists(Parts) == true)
      ReadSourceDir(Parts); (1)

   auto good = _error->PendingError() == false;
   _error->MergeWithStack();
   return good;
}
1 The Read* methods parse the sources files. We omit the parsing code for brevity, but both parsers push a new instance of debReleaseIndex into SrcList.

  2. Fetch index files from each repository (InRelease, Packages, …).

// do the work
AcqTextStatus Stat(std::cout, ScreenWidth,_config->FindI("quiet",0)); (1)
ListUpdate(Stat, *List);
1 AcqTextStatus is used to report the progress of file downloads.

What You Need to Know About Repositories

A repository is a set of Debian binary or source packages organized in a special directory tree, alongside various additional files (checksums, signatures, translations, …). APT downloads some of these files to install a package on your system.

Ex: deb https://deb.debian.org/debian stable main contrib non-free

  • deb is used for binary packages, deb-src for source packages.

  • https://deb.debian.org/debian specifies the root of the repository.

  • stable is the distribution. It is commonly given as a suite (stable, oldstable, testing, unstable), which is an alias for a Debian codename (wheezy, jessie, stretch). Codenames are taken from Toy Story characters.

  • main contrib non-free are the three component types and indicate the licensing terms of the software they contain.

Here is a preview of the file tree for this repository:

https://deb.debian.org/debian
└── dists/
    |── Debian9.13/
    |── Debian10.9/
    |   ├── ChangeLog
    |   ├── InRelease  # Same as Release + Release.gpg
    |   |              # (recommended to have only 1 file to download)
    |   ├── Release  # Lists the index files for this distribution
    |   |            # with their checksums
    |   ├── Release.gpg
    |   ├── contrib/
    |   ├── main/
    |   │   └── binary-all/
    |   │   |   |── Packages.gz
    |   │   |   |── Packages.xz  # Several compression formats are accepted.
    |   |   |   |                # xz compression is required.
    |   │   |   |── Release  # Basic metadata about this directory.
    |   |   |   |            # Not comparable with the main Release file.
    |   │   |── binary-amd64/
    |   │   |── ...
    |   │   |── Contents-all.gz    # Index containing the list
    |   |   |── Contents-amd64.gz  # of all files in package archives
    |   │   |── Contents-arm64.gz  # and their corresponding package archive.
    |   │   |── ...
    |   |   |── i18n/  # Translations of Packages files
    |   |   └── source/  # We ignore source packages in this article
    |   │       |──  Release
    |   │       |──  Sources.gz
    |   │       |──  Sources.xz
    |   └── non-free/
    |── bullseye/  # Future Debian 11
    |── buster/    # Symlink to Debian10.9
    |── stable/    # Symlink to buster
    |── stretch/   # Symlink to Debian9.13
    └── testing/   # Symlink to bullseye

And now the explanations.

The root directory contains a directory dists/, which in turn has a directory for each release and each suite, the latter usually being symlinks to the former. Each release subdirectory contains a signed Release file and a directory for each component. Inside each component are directories for the different architectures, named binary-<arch>, and a directory named source. These contain the files Packages and Sources respectively, which are text files (in deb822 format and often compressed) containing the metadata of the available packages.
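
The layout above is regular enough that the location of the index files can be derived from a source entry. Here is a minimal Go sketch, assuming the repository publishes InRelease and xz-compressed Packages files; the function name indexURLs is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// indexURLs returns the URL of the InRelease file of a suite, followed by
// the URL of one Packages.xz file per component for the given architecture.
func indexURLs(root, suite string, components []string, arch string) []string {
	root = strings.TrimSuffix(root, "/")
	urls := []string{fmt.Sprintf("%s/dists/%s/InRelease", root, suite)}
	for _, c := range components {
		urls = append(urls,
			fmt.Sprintf("%s/dists/%s/%s/binary-%s/Packages.xz", root, suite, c, arch))
	}
	return urls
}

func main() {
	// Corresponds to: deb https://deb.debian.org/debian stable main contrib non-free
	components := []string{"main", "contrib", "non-free"}
	for _, u := range indexURLs("https://deb.debian.org/debian", "stable", components, "amd64") {
		fmt.Println(u)
	}
}
```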

Example of a Packages file:

# 57849 binary packages declarations like this:
Package: wget
Version: 1.20.1-1.1
Installed-Size: 3257
Maintainer: Noël Köthe <noel@debian.org>
Architecture: amd64
Depends: libc6 (>= 2.28), libgnutls30 (>= 3.6.6), libidn2-0 (>= 0.6),
  libnettle6, libpcre2-8-0 (>= 10.32), libpsl5 (>= 0.16.0),
  libuuid1 (>= 2.16), zlib1g (>= 1:1.1.4)
Recommends: ca-certificates
Conflicts: wget-ssl
Description: retrieves files from the web
Multi-Arch: foreign
Homepage: https://www.gnu.org/software/wget/
Description-md5: 63a4a740bcd9e8e94bf661e4f1806e02
Tag: implemented-in::c, interface::commandline, network::client,
 protocol::ftp, protocol::http, protocol::ssl, role::program,
 suite::gnu, use::downloading, works-with::file
Section: web
Priority: standard
Filename: pool/main/w/wget/wget_1.20.1-1.1_amd64.deb
Size: 901956
MD5sum: a7e3faa711503bd9500650de8fc9835e
SHA256: 3821cee0d331cf75ee79daff716f9d320f758f9dff3eaa6d6cf12bae9ef14306

Package: libwget0
Source: wget2
Version: 1.99.1-2
Installed-Size: 387
Maintainer: Noël Köthe <noel@debian.org>
Architecture: amd64
Depends: libassuan0 (>= 2.0.1), libbrotli1 (>= 0.6.0), libbz2-1.0,
  libc6 (>= 2.27), libgnutls30 (>= 3.5.10), libgpg-error0 (>= 1.14),
  libgpgme11 (>= 1.1.2), libidn2-0 (>= 0.6),
  liblzma5 (>= 5.1.1alpha+20120614), libnghttp2-14 (>= 1.3.0),
  libpcre2-8-0 (>= 10.31), libpsl5 (>= 0.16.0), zlib1g (>= 1:1.1.4)
Description: Download library for files and recursive websites
Homepage: https://gitlab.com/gnuwget/wget2
Description-md5: 3cb4ed03cbc78579a7e509e41156a73f
Tag: role::shared-lib
Section: libs
Priority: optional
Filename: pool/main/w/wget2/libwget0_1.99.1-2_amd64.deb
Size: 146028
MD5sum: 944b2824ee264e1b0cc0f91c1a86e6e2
SHA256: 3bf97e4852e76dba5bf2261f4a949a445edda646d09d7d1175dccfdf77bdbc3f

Example of a Sources file:

# 28489 source packages declarations like this:
Package: wget
Binary: wget, wget-udeb
Version: 1.20.1-1.1
Maintainer: Noël Köthe <noel@debian.org>
Build-Depends: debhelper (>> 11.0.0), pkg-config, gettext, texinfo,
  libidn2-0-dev, uuid-dev, libpsl-dev, libpcre2-dev,
  libgnutls28-dev (>= 3.3.15-5), automake,
  libssl-dev (>= 0.9.8k), zlib1g-dev, dh-strip-nondeterminism
Architecture: any
Standards-Version: 4.3.0
Format: 3.0 (quilt)
Files:
 7a84dd8efb09001dcb9af1576b35992c 2092 wget_1.20.1-1.1.dsc
 f6ebe9c7b375fc9832fb1b2028271fb7 4392853 wget_1.20.1.orig.tar.gz
 e0ed66f143f4d81dd0f27a8f01a9c5c8 60872 wget_1.20.1-1.1.debian.tar.xz
Checksums-Sha256:
 b19...261 2092 wget_1.20.1-1.1.dsc
 b78...1b3 4392853 wget_1.20.1.orig.tar.gz
 7ee...01e 60872 wget_1.20.1-1.1.debian.tar.xz
Homepage: https://www.gnu.org/software/wget/
Package-List:
 wget deb web standard arch=any
 wget-udeb udeb debian-installer optional arch=any
Directory: pool/main/w/wget
Priority: source
Section: web
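
Both index files follow the same shape: paragraphs separated by blank lines, Field: value pairs, and continuation lines starting with whitespace. Here is a minimal Go sketch of a parser for that shape. The names Stanza and parseStanzas are hypothetical, continuation lines are joined with a newline (a choice made for this sketch), and very long lines would require a larger scanner buffer.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Stanza holds the fields of one paragraph (one package declaration).
type Stanza map[string]string

// parseStanzas splits a control-style document into stanzas. Paragraphs are
// separated by blank lines; a line starting with a space or a tab continues
// the value of the previous field.
func parseStanzas(content string) ([]Stanza, error) {
	var (
		stanzas []Stanza
		current Stanza
		lastKey string
	)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.TrimSpace(line) == "":
			// A blank line closes the current stanza, if any.
			if current != nil {
				stanzas = append(stanzas, current)
				current, lastKey = nil, ""
			}
		case strings.HasPrefix(line, "#"):
			// Comment line: ignored.
		case line[0] == ' ' || line[0] == '\t':
			if lastKey == "" {
				return nil, fmt.Errorf("continuation without a field: %q", line)
			}
			current[lastKey] += "\n" + strings.TrimSpace(line)
		default:
			key, value, ok := strings.Cut(line, ":")
			if !ok {
				return nil, fmt.Errorf("malformed line: %q", line)
			}
			if current == nil {
				current = Stanza{}
			}
			lastKey = key
			current[key] = strings.TrimSpace(value)
		}
	}
	if current != nil {
		stanzas = append(stanzas, current)
	}
	return stanzas, sc.Err()
}

func main() {
	doc := "Package: wget\nVersion: 1.20.1-1.1\n\nPackage: libwget0\nSource: wget2\n"
	stanzas, err := parseStanzas(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range stanzas {
		fmt.Println(s["Package"], s["Version"])
	}
}
```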

But still no .deb packages…​ We need to move to another directory at the root of the repository to find them:

https://deb.debian.org/debian
└── pool/
    |── contrib/
    |── main/
    |   |── 0/
    |   |── 1/
    |   |── ...
    |   |── 9/
    |   |── a/
    |   |── ...
    |   |── w/
    |   |   |── ...
    |   |   └── wget/
    |   |       |── wget_1.21-1+b1_amd64.deb
    |   |       |── wget_1.21-1.debian.tar.xz
    |   |       |── wget_1.21-1.dsc
    |   |       |── wget_1.21-1_arm64.deb
    |   |       |── wget_1.21.orig.tar.gz
    |   |       └── wget_1.21.orig.tar.gz.asc
    |   |── ...
    |   |── z/
    |   |── liba/
    |   |── ...
    |   |── libw/
    |   |── ...
    |   └── libz/
    └── non-free/

The directory pool/ has a directory for each component, and in these are directories named 0, …, 9, a, …, z, liba, …, libz. These contain directories named after the software packages, which finally contain the actual packages, i.e., the .deb files.
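
The naming rule can be sketched in a few lines of Go. This is my reading of the layout above, not code taken from APT: the function poolDir is hypothetical, and it assumes the directory is chosen from the source package name, using the first four characters for names starting with lib and the first character otherwise.

```go
package main

import (
	"fmt"
	"strings"
)

// poolDir returns the directory under pool/ that holds the files of a
// source package, or an empty string when the name is empty.
func poolDir(component, sourcePkg string) string {
	if sourcePkg == "" {
		return ""
	}
	prefix := sourcePkg[:1]
	if strings.HasPrefix(sourcePkg, "lib") && len(sourcePkg) > 3 {
		prefix = sourcePkg[:4] // e.g. "libx" for "libxml2"
	}
	return fmt.Sprintf("pool/%s/%s/%s", component, prefix, sourcePkg)
}

func main() {
	fmt.Println(poolDir("main", "wget"))
	fmt.Println(poolDir("main", "wget2"))
	fmt.Println(poolDir("main", "libxml2"))
}
```

In practice APT never computes this path: it reads it from the index files, as the Filename field of the Packages example shows.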

Notes:

  • The "single letter" directories are just a trick to avoid having too many entries in a single directory, something many file systems traditionally handle poorly.

  • The pool/ directory avoids file duplication, as binary and source packages are stored only once even if used by many releases under dists/.

  • Packages and Sources index files are control files using a format similar to the one used in the first part of this article when creating our Debian archive package, with a special field (Filename and Directory respectively) linking to the pool/ directory.

  • Release is an index file in the deb822 format but containing only a single paragraph, whose field names refer to the repository (Origin, Suite, Codename, Architectures, Components) and whose field MD5Sum lists the checksums of the index files of this distribution.
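
The checksums listed in the index files are what allows a downloaded file to be verified. Here is a minimal Go sketch of that check, using the Size and SHA256 fields shown in the Packages example rather than MD5. The function verify is a hypothetical helper, and the expected hash is assumed to be lowercase hexadecimal.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// verify checks downloaded data against the Size and SHA256 fields
// advertised for it in an index file.
func verify(data []byte, wantSize int, wantSHA256 string) error {
	if len(data) != wantSize {
		return fmt.Errorf("size mismatch: got %d, want %d", len(data), wantSize)
	}
	sum := sha256.Sum256(data)
	if got := hex.EncodeToString(sum[:]); got != wantSHA256 {
		return fmt.Errorf("SHA256 mismatch: got %s, want %s", got, wantSHA256)
	}
	return nil
}

func main() {
	// "abc" stands in for the content of a downloaded file.
	data := []byte("abc")
	want := "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
	fmt.Println(verify(data, 3, want))
	fmt.Println(verify(data, 4, want))
}
```

Checking the size first is cheap and catches truncated downloads before any hashing is done.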

Further documentation: Debian Repository and the more complete Repository Format

Here is the function ListUpdate that actively downloads index files from the repositories:

apt-pkg/update.cc
bool ListUpdate(pkgAcquireStatus &Stat,
                pkgSourceList &List,
                int PulseInterval)
{
   pkgAcquire Fetcher(&Stat); (1)
   if (Fetcher.GetLock(_config->FindDir("Dir::State::Lists")) == false) (2)
      return false;

   // Populate it with the source selection
   if (List.GetIndexes(&Fetcher) == false) (3)
         return false;

   return AcquireUpdate(Fetcher, PulseInterval, true); (4)
}
1 The class pkgAcquire is the main component of the Acquire subsystem. APT has to retrieve packages from various sources, mainly remote repositories over HTTP, and the Acquire system is responsible for fetching every Item required by APT in the most efficient way. For example, it uses a pool of workers to speed up downloads and is able to look for diff files before downloading full index files.
2 Most APT commands try to acquire a lock to prevent two processes using the APT library from running at the same time. The lock file is /var/lib/apt/lists/lock, but other lock files exist, for example to update the APT cache.
3 The method GetIndexes() creates new items to download InRelease files using the Acquire system.
4 The function AcquireUpdate() collects the results from the Fetcher and updates the cache.
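
The idea of a pool of workers fetching items concurrently can be sketched in Go with goroutines and a channel. This is only an illustration of the pattern, not a description of how the Acquire system is implemented: fetchAll, fetchFunc, and fakeFetch are hypothetical names, and the download function is injected so that the sketch runs without network access.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fetchFunc downloads one item. It is injected so the sketch can run offline.
type fetchFunc func(uri string) error

var errNotFound = errors.New("not found")

// fakeFetch stands in for a real download and fails for the URI "missing".
func fakeFetch(uri string) error {
	if uri == "missing" {
		return errNotFound
	}
	return nil
}

// fetchAll runs fetch on every URI using at most `workers` goroutines
// and returns the outcome for each URI.
func fetchAll(uris []string, workers int, fetch fetchFunc) map[string]error {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string]error, len(uris))
	var mu sync.Mutex // protects results
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for uri := range jobs {
				err := fetch(uri)
				mu.Lock()
				results[uri] = err
				mu.Unlock()
			}
		}()
	}
	for _, u := range uris {
		jobs <- u
	}
	close(jobs) // no more work: lets the workers exit their loop
	wg.Wait()
	return results
}

func main() {
	uris := []string{"deb.debian.org/InRelease", "security.debian.org/InRelease", "missing"}
	results := fetchAll(uris, 2, fakeFetch)
	for _, u := range uris {
		fmt.Printf("%s: %v\n", u, results[u])
	}
}
```
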
What You Need to Know About APT Diffs

Packages files (and some other index files present in a Debian repository) can be relatively large. For example, the compressed Packages.xz file for the architecture amd64 and the component main of the stable Debian repository weighs about 8 MB. These files are typically retrieved every time you run the command apt update, which is wasteful when only a few packages have changed. APT provides a solution to this problem.

Indeed, a Debian repository can contain diff files (whose content is similar to the output of the command diff) alongside the standard files like Packages:

https://deb.debian.org/debian
└── dists/bullseye/main/binary-amd64
    |── Packages.xz  7.8M
    └── Packages.diff/
        |── ... # The Debian official repository keeps ~30 days of diff files.
        |── 2021-04-12-1400.57.gz        33
        |── 2021-04-13-0200.48.gz        7.8K
        |── 2021-04-13-1402.06.gz        637
        |── 2021-04-13-2000.50.gz        660
        |── 2021-04-14-0200.40.gz        2.7K
        |── 2021-04-14-2000.54.gz        5.0K
        |── 2021-04-15-0200.39.gz        3.8K
        └── 2021-04-15-1400.39.gz        220

The apt command will try to retrieve these files and apply successive diffs on top of its local index file.

  3. Read the package lists and build the dependency tree.

// Rebuild the cache.
pkgCacheFile::RemoveCaches();
if (Cache.BuildCaches(false) == false)
   return false;

What You Need to Know About /var/cache/apt/

This directory stores the latest version of the APT cache, used to speed up the execution of most commands:

$ tree /var/cache/apt/
|-- archives  # Storage area for downloaded files
|   |-- lock      # Prevents two APT processes from updating the cache simultaneously
|   |-- partial/  # Storage area for files in transit
|   |-- apt-transport-https_2.0.5_all.deb  # Debian downloaded archives
|   |__ ...                                # are kept for a configurable
|   |-- tree_1.8.0-1_amd64.deb             # retention.
|   `-- ...
|-- pkgcache.bin     # Binary files loaded directly in C++
|                    # using the mmap() system call.
`-- srcpkgcache.bin  # Contains the local index files
                     # and the archives file lists.
                     # Those are low-level files used
                     # for performance optimizations.

The APT Cache files under this directory (except the lock file) can be safely deleted using the command apt clean to reclaim disk space:

$ sudo apt clean --dry-run
Del /var/cache/apt/archives/* /var/cache/apt/archives/partial/*
Del /var/lib/apt/lists/partial/*
Del /var/cache/apt/pkgcache.bin /var/cache/apt/srcpkgcache.bin

APT is highly configurable and there are several options to clean the cache regularly, for example after every package installation.

What You Need to Know About /var/lib/apt/

This directory stores the current state of APT, that is which packages have been installed, what is the latest version of retrieved index files used when updating the cache, etc.

$ tree /var/lib/apt/
.
|-- daily_lock  # Used by the Systemd apt-daily.timer for housekeeping tasks.
|               # Runs /usr/lib/apt/apt.systemd.daily, which cleans the cache,
|               # updates the repositories, creates backups of extended_states...
|               # Not covered in this article.
|-- extended_states  # Extension to /var/lib/dpkg/status to store which
|                    # packages were installed manually or automatically
|                    # (i.e., as a dependency of another package).
|                    # Useful to support autoremove of useless packages.
|-- listchanges.db  # Used by the command apt-listchanges
|                   # Not covered in this article.
|-- lists  # Local version of index files retrieved
|   |      # from repositories in sources.list
|   |-- deb.debian.org_debian_dists_buster-backports_InRelease
|   |-- deb.debian.org_debian_dists_buster-updates_InRelease
|   |-- deb.debian.org_debian_dists_buster_InRelease
|   |-- deb.debian.org_debian_dists_buster_main_binary-amd64_Packages
|   |-- deb.debian.org_debian_dists_buster_main_binary-amd64_Packages.diff_Index
|   |-- deb.debian.org_debian_dists_buster_main_i18n_Translation-en
|   |-- deb.debian.org_debian_dists_buster_main_i18n_Translation-en.diff_Index
|   |-- deb.debian.org_debian_dists_buster_main_source_Sources
|   |-- deb.debian.org_debian_dists_buster_main_source_Sources.diff_Index
|   |-- lock  # Same role as /var/lib/dpkg/lock.
|   |         # Prevents two processes from using the APT library at the same time
|   `-- partial/  # Storage area for index files in transit
|-- mirrors  # Used when using repository mirrors.
|   |        # Not covered in this article.
|   `-- partial
`-- periodic  # Empty files whose timestamps are updated
    |         # by the Systemd apt-daily.timer
    |         # to determine the last execution date.
    |         # Not covered in this article.
    |-- download-upgradeable-stamp
    |-- unattended-upgrades-stamp
    |-- update-stamp
    `-- upgrade-stamp

This directory doesn’t have to be edited like /etc/apt/ and doesn’t have to be cleaned like /var/cache/apt/. It can be safely ignored by APT users, but we will still have to talk about it in this article.

The method pkgCacheFile::BuildCaches() calls the method BuildSourceList() we covered in the previous step, and then delegates to the method pkgCacheGenerator::MakeStatusCache() for the effective cache initialization:

apt-pkg/pkgcachegen.cc
bool pkgCacheGenerator::MakeStatusCache(pkgSourceList &List,OpProgress *Progress,
                        MMap **OutMap,pkgCache **OutCache, bool)
{
   std::vector<pkgIndexFile *> Files;
   if (_system->AddStatusFiles(Files) == false)
      return false;

   // Decide if we can write to the files..
   string const CacheFileName = _config->FindFile("Dir::Cache::pkgcache"); (1)
   string const SrcCacheFileName = _config->FindFile("Dir::Cache::srcpkgcache"); (1)

   if (Progress != NULL)
      Progress->OverallProgress(0,1,1,_("Reading package lists"));

   bool pkgcache_fine = false;
   bool srcpkgcache_fine = false;

   FileFd CacheFile;
   if (CheckValidity(CacheFile, CacheFileName, List, Files.begin(), Files.end()) == true) (2)
   {
      pkgcache_fine = true;
      srcpkgcache_fine = true;
   }

   FileFd SrcCacheFile;
   if (pkgcache_fine == false)
   {
      if (CheckValidity(SrcCacheFile, SrcCacheFileName, List,
            Files.end(), Files.end()) == true) (2)
      {
         srcpkgcache_fine = true;
      }
   }

   if (srcpkgcache_fine == true && pkgcache_fine == true)
   {
      if (Progress != NULL)
         Progress->OverallProgress(1,1,1,_("Reading package lists"));
      return true; (3)
   }

   bool Writeable = false;
   if (srcpkgcache_fine == false || pkgcache_fine == false)
   {
      if (CacheFileName.empty() == false)
         Writeable = access(flNotFile(CacheFileName).c_str(),W_OK) == 0;
      else if (SrcCacheFileName.empty() == false)
         Writeable = access(flNotFile(SrcCacheFileName).c_str(),W_OK) == 0;
   }

   // At this point we know we need to construct something, so get storage ready
   std::unique_ptr<DynamicMMap> Map(CreateDynamicMMap(NULL, 0));

   std::unique_ptr<pkgCacheGenerator> Gen{nullptr};
   map_filesize_t CurrentSize = 0;
   map_filesize_t TotalSize = 0;

   if (srcpkgcache_fine == true && pkgcache_fine == false)
   {
      if (loadBackMMapFromFile(Gen, Map, Progress, SrcCacheFile) == false)
         return false;
      srcpkgcache_fine = true;
      TotalSize += ComputeSize(NULL, Files.begin(), Files.end());
   }
   else if (srcpkgcache_fine == false)
   {
      Gen.reset(new pkgCacheGenerator(Map.get(),Progress));
      if (Gen->Start() == false)
         return false;

      TotalSize += ComputeSize(&List, Files.begin(),Files.end());
      if (BuildCache(*Gen, Progress, CurrentSize, TotalSize, &List,
               Files.end(),Files.end()) == false)
         return false;

      if (Writeable == true && SrcCacheFileName.empty() == false)
         if (writeBackMMapToFile(Gen.get(), Map.get(), SrcCacheFileName) == false)
            return false;
   }

   if (pkgcache_fine == false)
   {
      if (BuildCache(*Gen, Progress, CurrentSize, TotalSize, NULL,
               Files.begin(), Files.end()) == false)
         return false;

      if (Writeable == true && CacheFileName.empty() == false)
         if (writeBackMMapToFile(Gen.get(), Map.get(), CacheFileName) == false)
            return false;
   }

   if (OutMap != nullptr)
      *OutMap = Map.release();

   return true;
}
1 The cache is stored in /var/cache/apt/pkgcache.bin and /var/cache/apt/srcpkgcache.bin. These are binary files that are loaded in memory.
2 The method CheckValidity loads a cache file in memory and checks that it is up to date by verifying that every required index file for every source exists.
3 If both cache files are valid, we can return immediately. Otherwise, we need to rebuild from scratch the ones that are not.
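
The branching in MakeStatusCache() is easier to follow once reduced to its decision table. Here is a minimal Go sketch of that table, based on my reading of the listing above; the type plan and the function rebuildPlan are hypothetical names that do not exist in APT.

```go
package main

import "fmt"

// plan describes the work needed to obtain a valid cache.
type plan struct {
	LoadSrcCache bool // reuse srcpkgcache.bin as the starting point
	BuildSources bool // re-parse the index files of the source list
	BuildStatus  bool // re-parse the status files (e.g. the dpkg status file)
}

// rebuildPlan mirrors the branches of the listing: a valid pkgcache.bin
// means nothing to do; a valid srcpkgcache.bin only requires adding the
// status files; otherwise everything is rebuilt.
func rebuildPlan(pkgcacheFine, srcpkgcacheFine bool) plan {
	switch {
	case pkgcacheFine:
		return plan{}
	case srcpkgcacheFine:
		return plan{LoadSrcCache: true, BuildStatus: true}
	default:
		return plan{BuildSources: true, BuildStatus: true}
	}
}

func main() {
	fmt.Printf("%+v\n", rebuildPlan(true, true))
	fmt.Printf("%+v\n", rebuildPlan(false, true))
	fmt.Printf("%+v\n", rebuildPlan(false, false))
}
```
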
What You Need to Know About APT Cache Files

The APT Cache files are two binary files /var/cache/apt/pkgcache.bin and /var/cache/apt/srcpkgcache.bin.

Basically, these cache files contain all the index files (InRelease, Packages, Sources, and Translations) retrieved from the APT repositories present in the list of sources (/etc/apt/sources.list and /etc/apt/sources.list.d/). The only difference between these two files is that pkgcache.bin also includes the content of /var/lib/dpkg/status.

Therefore, every time a new index file is retrieved by APT or when the Dpkg status file changes, the APT cache must be updated too.

The format of the cache files is optimized for the sole usage of APT, and the main motivations are to speed up the loading of the cache in memory and to reduce memory usage. Therefore, the cache uses a binary format, which means you cannot read the files using your text editor. For example, Header is the first struct written and starts like this:

struct Header
{
   // Signature information
   unsigned long Signature; // 0x98FE76DC
   short MajorVersion;      // 16 in the dump below
   short MinorVersion;      // 0 in the dump below
   ...
}

Field names are naturally omitted: only the values (sometimes converted to enums, like the status string installed that becomes 6 in the binary file) are written in successive order, as confirmed by the command xxd, which dumps a file in hexadecimal:

$ xxd /var/cache/apt/pkgcache.bin  | head -1
00000000: dc76 fe98 1000 0000 a802 1c2c 4038 5818  .v.........,@8X.
#
#  Signature = 4 bytes, each version field = 2 bytes
#  amd64 = little endian (least significant byte first)
#
#        dc --------+
#        76 ------+ |
#        fe ----+ | |       10 ---+           00 ---+
#        98 --+ | | |       00 -+ |           00 -+ |
#             | | | |           | |               | |
#  Signature: 98FE76DC   Major: 0010 = 16  Minor: 0000 = 0
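
The same decoding can be done in Go with the package encoding/binary. This sketch reads the first eight bytes of the dump, assuming a 4-byte signature followed by two 2-byte fields in the order of the struct; the type header and the function decodeHeader are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// header holds the first fields of the cache file as laid out in the dump.
type header struct {
	Signature uint32
	Major     uint16
	Minor     uint16
}

// decodeHeader reads the first eight bytes as little-endian values.
func decodeHeader(b []byte) (header, error) {
	if len(b) < 8 {
		return header{}, fmt.Errorf("need 8 bytes, got %d", len(b))
	}
	return header{
		Signature: binary.LittleEndian.Uint32(b[0:4]),
		Major:     binary.LittleEndian.Uint16(b[4:6]),
		Minor:     binary.LittleEndian.Uint16(b[6:8]),
	}, nil
}

func main() {
	// First eight bytes of the xxd output: dc76 fe98 1000 0000
	dump := []byte{0xdc, 0x76, 0xfe, 0x98, 0x10, 0x00, 0x00, 0x00}
	h, err := decodeHeader(dump)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("signature=0x%08X major=%d minor=%d\n", h.Signature, h.Major, h.Minor)
}
```

Running it prints signature=0x98FE76DC major=16 minor=0.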

When APT is launched, these two files are loaded in memory using the mmap() system call, and the rest of the code interacts with an instance of the class pkgCache and another of the class pkgDepCache. In fact, pkgDepCache wraps pkgCache to add state information about the packages on the system, so that pkgCache is mostly read-only.

The code to initialize these instances is not covered in the article. Check the files apt-pkg/pkgcache.h, apt-pkg/cachefile.h and apt-pkg/pkgcachegen.h if you are curious.

Further Documentation: APT Cache File Format

We will not go deeper into the APT Cache code. We have already inspected the structure of the different index files (InRelease, Packages, …​) and we know that APT commands use pkgCacheFile.GetPkgCache() and pkgCacheFile.GetDepCache() to retrieve information from the cache.

What follows are annotated definitions to give you an idea of the kind of information present in the APT Cache:

apt-pkg/pkgcache.h
class pkgCache
{
   public:

   struct Header;  // The size and count of each following properties
                   // required to jump to the index in the binary format.

   struct Group;  // Packages with the same name form a group, so we have
                  // a simple way to access a package built
                  // for different architectures.
                  // Groups are also used to iterate over all binaries
                  // produced by a source package.
   struct Package;  // A single package with all the available versions
                    // and the possible installed version.
   struct ReleaseFile;  // Release index file.
   struct PackageFile;  // Packages index file.
   struct Version;  // A single version of a package with the list of
                    // dependencies and the list of files in this package.
   struct Description;  // Translation of a single version of a package
   struct DependencyData;  // Information for a single dependency
                           // (the version, the type, ...)

   // Iterators
   class GrpIterator;
   class PkgIterator;
   class VerIterator;
   class DescIterator;
   class DepIterator;
   class RlsFileIterator;
   class PkgFileIterator;

   class Namespace;

   public:

   // Pointers to the arrays of items
   Header *HeaderP;
   Group *GrpP;
   Package *PkgP;
   DescFile *DescFileP;
   ReleaseFile *RlsFileP; // All Release files used to build the cache
   PackageFile *PkgFileP; // All Packages files used to build the cache
   Version *VerP;
   Description *DescP;
   DependencyData *DepDataP;

   // Accessors
   GrpIterator FindGrp(APT::StringView Name);
   PkgIterator FindPkg(APT::StringView Name);

   inline GrpIterator GrpBegin();
   inline GrpIterator GrpEnd();
   inline PkgIterator PkgBegin();
   inline PkgIterator PkgEnd();
   inline PkgFileIterator FileBegin();
   inline PkgFileIterator FileEnd();
   inline RlsFileIterator RlsFileBegin();
   inline RlsFileIterator RlsFileEnd();
};


struct pkgCache::Package
{
   /** \brief Architecture of the package */
   map_stringitem_t Arch;
   /** \brief List of versions sorted from highest version to lowest version */
   map_pointer<Version> VersionList;
   /** \brief index to the installed version */
   map_pointer<Version> CurrentVer;
   /** \brief index of the group this package belongs to */
   map_pointer<pkgCache::Group> Group;

   /** \brief List of all dependencies on this package */
   map_pointer<Dependency> RevDepends;
   /** \brief List of all "packages" this package provide */
   map_pointer<Provides> ProvidesList;

   // Install/Remove/Purge etc
   /** \brief state that the user wishes the package to be in */
   map_number_t SelectedState;     // What
   /** \brief installation state of the package */
   map_number_t InstState;         // Flags
   /** \brief indicates if the package is installed */
   map_number_t CurrentState;      // State
};

struct pkgCache::ReleaseFile
{
   /** \brief physical disk file that this ReleaseFile represents */
   map_stringitem_t FileName;
   map_stringitem_t Archive;
   map_stringitem_t Codename;
   map_stringitem_t Version;
   map_stringitem_t Origin;
   map_stringitem_t Label;
   /** \brief The site the index file was fetched from */
   map_stringitem_t Site;
};

struct pkgCache::PackageFile
{
   /** \brief physical disk file that this PackageFile represents */
   map_stringitem_t FileName;
   /** \brief the release information to keep record of which
    version belongs to which release e.g. for pinning. */
   map_pointer<ReleaseFile> Release;

   map_stringitem_t Component;
   map_stringitem_t Architecture;
};

struct pkgCache::Version
{
   /** \brief complete version string */
   map_stringitem_t VerStr;
   /** \brief section this version is filled in */
   map_stringitem_t Section;
   /** \brief source package name this version comes from
      Always contains the name, even if it is the same as the binary name */
   map_stringitem_t SourcePkgName;
   /** \brief source version this version comes from
      Always contains the version string, even if it is the same as the binary version */
   map_stringitem_t SourceVerStr;

   /** \brief references all the PackageFile's that this version came from

       FileList can be used to determine what distribution(s) the Version
       applies to. If FileList is 0 then this is a blank version.
       The structure should also have a 0 in all other fields excluding
       pkgCache::Version::VerStr and Possibly pkgCache::Version::NextVer. */
   map_pointer<VerFile> FileList;
   /** \brief base of the dependency list */
   map_pointer<Dependency> DependsList;
   /** \brief links to the owning package

       This allows reverse dependencies to determine the package */
   map_pointer<Package> ParentPkg;
   /** \brief list of pkgCache::Provides */
   map_pointer<Provides> ProvidesList;
};

struct pkgCache::DependencyData
{
   /** \brief string of the version the dependency is applied against */
   map_stringitem_t Version;
   /** \brief index of the package this depends applies to

       The generator will - if the package does not already exist -
       create a blank (no version records) package. */
   map_pointer<pkgCache::Package> Package;

   /** \brief Dependency type - Depends, Recommends, Conflicts, etc */
   map_number_t Type;
   /** \brief comparison operator specified on the depends line

       If the high bit is set then it is a logical OR with the previous record. */
   map_flags_t CompareOp;
};

// Other structs are omitted for brevity.

Here is the definition of the class pkgDepCache:

apt-pkg/depcache.h
class pkgDepCache
{
   public:

   enum ModeList {ModeDelete = 0, ModeKeep = 1, ModeInstall = 2, ModeGarbage = 3};

   struct StateCache
   {
      // text versions of the two version fields
      const char *CandVersion;
      const char *CurVersion;

      // Pointer to the candidate install version.
      Version *CandidateVer;

      // Pointer to the install version.
      Version *InstallVer;

      // Various tree indicators
      signed char Status;              // -1,0,1,2
      unsigned char Mode;              // ModeList

      // Various test members for the current status of the package
      inline bool Keep() const {return Mode == ModeKeep;};
      inline bool Upgrade() const {return Status > 0 && Mode == ModeInstall;};
      inline bool Upgradable() const {return Status >= 1 && CandidateVer != NULL;};
      inline bool Downgrade() const {return Status < 0 && Mode == ModeInstall;};
      inline bool Held() const {return Status != 0 && Keep();};
      // ...
   };

   protected:

   // State information
   pkgCache *Cache;
   StateCache *PkgState;

   public:

   // Accessors
   inline StateCache &operator [](PkgIterator const &I) {return PkgState[I->ID];};
   inline StateCache &operator [](PkgIterator const &I) const {return PkgState[I->ID];};

   // read persistent states
   bool readStateFile(OpProgress * const prog);
   bool writeStateFile(OpProgress * const prog, bool const InstalledOnly=true);

   bool Init(OpProgress * const Prog);
   // Generate all state information
   void Update(OpProgress * const Prog = 0);

   pkgDepCache(pkgCache * const Cache,Policy * const Plcy = 0);
   virtual ~pkgDepCache();
};
  1. Display statistics about package upgrades.

This last step simply traverses the cache to extract the relevant information.

// show basic stats (if the user wishes)
if (_config->FindB("APT::Cmd::Show-Update-Stats", false) == true)
{
   int upgradable = 0;
   if (Cache.Open(false) == false)
      return false;
   for (pkgCache::PkgIterator I = Cache->PkgBegin(); I.end() != true; ++I)
   {
      pkgDepCache::StateCache &state = Cache[I]; (1)
      if (I->CurrentVer != 0 && state.Upgradable() && state.CandidateVer != NULL) (2)
         upgradable++;
   }
   const char *msg = P_(
      "%i package can be upgraded. Run 'apt list --upgradable' to see it.\n",
      "%i packages can be upgraded. Run 'apt list --upgradable' to see them.\n",
      upgradable); (3)
   if (upgradable == 0)
      c1out << _("All packages are up to date.") << std::endl;
   else
      ioprintf(c1out, msg, upgradable);
}
1 The operator [] is overloaded in pkgDepCache to return PkgState[I->ID], which is a struct StateCache containing the currently installed version and the candidate version.
2 The method Upgradable() reads the state to determine if a newer candidate version is available. When that is the case for an installed package, the counter is incremented.
3 The macro P_ is defined by #define P_(msg,plural,n) (n == 1 ? msg : plural).
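
To make the counting logic concrete, here is a minimal Go sketch of the same loop. The types stateCache and pkg are hypothetical, flattened stand-ins for pkgDepCache::StateCache and the cache entries (versions are plain strings, and an empty string plays the role of a null pointer). The sample packages and versions are made up.

```go
package main

import "fmt"

// stateCache mirrors the two StateCache fields used by the statistics.
type stateCache struct {
	Status       int    // same values as the C++ field: -1, 0, 1, 2
	CandidateVer string // empty when there is no candidate version
}

// Upgradable follows the C++ test: Status >= 1 && CandidateVer != NULL.
func (s stateCache) Upgradable() bool {
	return s.Status >= 1 && s.CandidateVer != ""
}

// pkg is a hypothetical flattened view of a package in the cache.
type pkg struct {
	Name       string
	CurrentVer string // empty when the package is not installed
	State      stateCache
}

// countUpgradable counts the installed packages with a newer candidate.
func countUpgradable(pkgs []pkg) int {
	n := 0
	for _, p := range pkgs {
		if p.CurrentVer != "" && p.State.Upgradable() {
			n++
		}
	}
	return n
}

// summary picks the singular or plural message, like the macro P_.
func summary(n int) string {
	switch {
	case n == 0:
		return "All packages are up to date."
	case n == 1:
		return "1 package can be upgraded. Run 'apt list --upgradable' to see it."
	default:
		return fmt.Sprintf("%d packages can be upgraded. Run 'apt list --upgradable' to see them.", n)
	}
}

func main() {
	pkgs := []pkg{
		{"bash", "5.1-2", stateCache{Status: 1, CandidateVer: "5.1-3"}},
		{"curl", "7.74.0-1", stateCache{Status: 0, CandidateVer: "7.74.0-1"}},
		{"nginx", "", stateCache{Status: 2, CandidateVer: "1.18.0-6"}},
	}
	fmt.Println(summary(countUpgradable(pkgs)))
}
```

Only bash is counted in this example: curl has no newer candidate, and nginx is not installed, so it is skipped by the test on CurrentVer.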

That’s all for the command apt update. We will now cover other APT commands, reusing the knowledge we built about the APT cache.

apt list

Here is the code of the command apt list. This version omits optional arguments that are used to filter the list of results.

bool DoList(CommandLine &Cmd)
{
   pkgCacheFile CacheFile;
   pkgCache * const Cache = CacheFile.GetPkgCache(); (1)
   pkgRecords records(CacheFile);

   std::string format = "${color:highlight}${Package}"
      "${color:neutral}/${Origin} ${Version} "
      "${Architecture}${ }${apt:Status}"; (2)

   std::list<pkgCache::VerIterator> bag; (3)

   GetVersionSet(CacheFile, &bag);
   std::map<std::string, std::string> output_map;
   for (std::list<pkgCache::VerIterator>::iterator V = bag.begin();
          V != bag.end(); ++V)
   {
      std::stringstream outs;
      ListSingleVersion(CacheFile, records, *V, outs, format);
      output_map.insert(std::make_pair(
               V->ParentPkg().FullName(), outs.str()));
   }

   // output the map
   std::map<std::string, std::string>::const_iterator K;
   for (K = output_map.begin(); K != output_map.end(); ++K)
      std::cout << (*K).second << std::endl;

   return true;
}
1 The function CacheFile.GetPkgCache() delegates to the method BuildCaches() we covered in the previous section about apt update. This method is responsible for building the APT cache.
2 The function ListSingleVersion substitutes the placeholders ${Package}, ${Origin}, … with their actual values.
3 The real implementation uses the type LocalitySortedVersionSet which is a list ordering packages based on their names in the Translation files of the user locale.

As with the apt update command, the code simply uses the information present in the APT cache. In this case, it happens in the function GetVersionSet:

apt-private/private-cacheset.cc
bool GetVersionSet(pkgCacheFile &CacheFile,
                   std::list<pkgCache::VerIterator> *versions)
{
   pkgCache * const Cache = CacheFile.GetPkgCache();
   pkgDepCache * const DepCache = CacheFile.GetDepCache();

   bool const insertCurrentVer = _config->FindB("APT::Cmd::Installed", false);
   bool const insertUpgradable = _config->FindB("APT::Cmd::Upgradable", false);

   for (pkgCache::PkgIterator P = Cache->PkgBegin(); P.end() == false; ++P)
   {
      pkgDepCache::StateCache &state = (*DepCache)[P];
      if (insertCurrentVer == true) (1)
      {
         if (P->CurrentVer != 0)
            versions->push_back(P.CurrentVer());
      }
      else if (insertUpgradable == true) (2)
      {
         if (P.CurrentVer() && state.Upgradable())
            versions->push_back(CacheFile.GetPolicy()->GetCandidateVer(P));
      }
      else (3)
      {
         versions->push_back(P.VersionList());
      }
   }
   return true;
}
1 The command apt list --installed searches for installed packages.
2 The command apt list --upgradable searches for installed packages that can be upgraded.
3 The command apt list --all-versions searches for all packages in the APT cache.
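
The three branches can be sketched in Go as follows. This is only an illustration: the type pkg is a hypothetical flattened view of a cache entry, the method upgradable() is a simplification of StateCache::Upgradable() (it only compares the installed and candidate versions), and the sample data is made up.

```go
package main

import "fmt"

// pkg is a hypothetical, flattened view of a package in the cache.
type pkg struct {
	Name      string
	Versions  []string // sorted from highest to lowest, like VersionList
	Current   string   // empty when the package is not installed
	Candidate string   // version the policy would install
}

// upgradable is a simplification of StateCache::Upgradable(): the package
// is installed and its candidate differs from the installed version.
func (p pkg) upgradable() bool {
	return p.Current != "" && p.Candidate != "" && p.Candidate != p.Current
}

// selectVersions mirrors the three branches of GetVersionSet.
func selectVersions(pkgs []pkg, onlyInstalled, onlyUpgradable bool) []string {
	var out []string
	for _, p := range pkgs {
		switch {
		case onlyInstalled:
			if p.Current != "" {
				out = append(out, p.Name+" "+p.Current)
			}
		case onlyUpgradable:
			if p.upgradable() {
				out = append(out, p.Name+" "+p.Candidate)
			}
		default:
			// Keep the head of the version list, i.e. the highest version.
			if len(p.Versions) > 0 {
				out = append(out, p.Name+" "+p.Versions[0])
			}
		}
	}
	return out
}

func main() {
	pkgs := []pkg{
		{"hello", []string{"3.1-1", "2.1-1"}, "2.1-1", "3.1-1"},
		{"cowsay", []string{"3.03-1"}, "", "3.03-1"},
	}
	fmt.Println(selectVersions(pkgs, true, false))  // installed only
	fmt.Println(selectVersions(pkgs, false, true))  // upgradable only
	fmt.Println(selectVersions(pkgs, false, false)) // everything
}
```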

The packages are then formatted in the function ListSingleVersion():

apt-private/private-output.cc
void ListSingleVersion(pkgCacheFile &CacheFile, pkgRecords &records,
                       pkgCache::VerIterator const &V, std::ostream &out,
                       std::string const &format)
{
   pkgCache::PkgIterator const P = V.ParentPkg();
   pkgDepCache * const DepCache = CacheFile.GetDepCache();
   pkgDepCache::StateCache const &state = (*DepCache)[P];

   std::string output = format; (1)

   output = SubstVar(output, "${db::Status-Abbrev}",
                     GetFlagsStr(CacheFile, P));
   output = SubstVar(output, "${Package}", P.Name());
   std::string const ArchStr = GetArchitecture(CacheFile, P);
   output = SubstVar(output, "${Architecture}", ArchStr);
   std::string const InstalledVerStr = GetInstalledVersion(CacheFile, P);
   output = SubstVar(output, "${installed:Version}", InstalledVerStr);
   std::string const CandidateVerStr = GetCandidateVersion(CacheFile, P);
   output = SubstVar(output, "${candidate:Version}", CandidateVerStr);
   std::string const VersionStr = GetVersion(CacheFile, V);
   output = SubstVar(output, "${Version}", VersionStr);
   output = SubstVar(output, "${Origin}", GetArchiveSuite(CacheFile, V));

   std::string StatusStr = ""; (2)
   if (P->CurrentVer != 0)
   {
      if (P.CurrentVer() == V)
      {
         if (state.Upgradable() && state.CandidateVer != NULL)
            strprintf(StatusStr, _("[installed,upgradable to: %s]"),
                  CandidateVerStr.c_str());
         else if (V.Downloadable() == false)
            StatusStr = _("[installed,local]");
         else if(V.Automatic() == true && state.Garbage == true)
            StatusStr = _("[installed,auto-removable]");
         else if ((state.Flags & pkgCache::Flag::Auto) == pkgCache::Flag::Auto)
            StatusStr = _("[installed,automatic]");
         else
            StatusStr = _("[installed]");
      }
      else if (state.CandidateVer == V && state.Upgradable())
         strprintf(StatusStr, _("[upgradable from: %s]"),
               InstalledVerStr.c_str());
   }
   else if (V.ParentPkg()->CurrentState == pkgCache::State::ConfigFiles)
      StatusStr = _("[residual-config]");
   output = SubstVar(output, "${apt:Status}", StatusStr);
   output = SubstVar(output, "${color:highlight}",
                     _config->Find("APT::Color::Highlight", ""));
   output = SubstVar(output, "${color:neutral}",
                     _config->Find("APT::Color::Neutral", ""));
   output = SubstVar(output, "${Description}",
                     GetShortDescription(CacheFile, records, P));
   output = SubstVar(output, "${LongDescription}",
                     GetLongDescription(CacheFile, records, P));
   output = SubstVar(output, "${ }${ }", "${ }"); (3)
   output = SubstVar(output, "${ }\n", "\n"); (3)
   output = SubstVar(output, "${ }", " "); (3)

   out << output;
}
1 The function does not check which fields are present in the output format and simply tries to replace all of them. If a field is absent from the format, the replacement does nothing.
2 The code uses the state information present in pkgDepCache to determine if the package is installed, upgradable, and so on.
3 The token ${ } is a separator: consecutive separators are collapsed, a separator placed just before a newline is dropped, and the remaining ones become a single space.
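
The substitution technique is easy to reproduce. The following Go sketch is a simplified stand-in for the successive SubstVar calls: the function render and the sample values are hypothetical, and only the handling of the separator ${ } follows the C++ code line by line.

```go
package main

import (
	"fmt"
	"strings"
)

// render replaces every known ${...} placeholder, whether or not it is
// present in the format, then normalizes the "${ }" separator.
func render(format string, fields map[string]string) string {
	out := format
	for name, value := range fields {
		out = strings.ReplaceAll(out, "${"+name+"}", value)
	}
	out = strings.ReplaceAll(out, "${ }${ }", "${ }") // collapse two separators
	out = strings.ReplaceAll(out, "${ }\n", "\n")     // no space before a newline
	out = strings.ReplaceAll(out, "${ }", " ")        // remaining separators
	return out
}

func main() {
	format := "${Package}/${Origin} ${Version} ${Architecture}${ }${apt:Status}\n"
	fields := map[string]string{
		"Package":      "hello",
		"Origin":       "stable",
		"Version":      "3.1-1",
		"Architecture": "amd64",
		"apt:Status":   "[installed]",
		"Description":  "Say Hello", // absent from the format: no effect
	}
	fmt.Print(render(format, fields))

	// With an empty status, the separator before the newline disappears.
	fields["apt:Status"] = ""
	fmt.Print(render(format, fields))
}
```

The second call shows why the separator exists: a plain space in the format would leave a trailing space on the line when the status is empty.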

We will close the APT section by covering the most useful command.

apt install

The entry point is the function DoInstall(), which is called by several commands: install, reinstall, remove, purge, … The code is simplified to keep only the installation use case.

apt-private/private-install.cc
bool DoInstall(CommandLine &CmdL)
{
   CacheFile Cache;

   // Covered in step 1
   if (Cache.OpenForInstall() == false)
      return false;

   std::set<pkgCache::VerIterator> verset;

   // Covered in step 2
   if (!DoCacheManipulationFromCommandLine(CmdL, Cache, verset))
   {
      return false;
   }

   // Covered in step 3
   /* Print out a list of packages that are going to be installed extra
      to what the user asked */
   if (Cache->InstCount() != verset.size()) (1)
   {
      std::list<pkgCache::PkgIterator> extras;
      for (pkgCache::PkgIterator Pkg = Cache->PkgBegin(); Pkg.end() != true; ++Pkg)
      {
         if ((*Cache)[Pkg].Install() == false)
            continue;
         pkgCache::VerIterator const Cand =
            (*Cache)[Pkg].CandidateVerIter(*Cache);
         if (verset.find(Cand) == verset.end())
            extras.push_back(Pkg);
      }
      ShowList(_("The following additional packages will be installed:"),
               extras);
   }

   /* Print out a list of suggested and recommended packages */
   {
      std::list<std::string> Recommends, Suggests, SingleRecommends, SingleSuggests;
      for (auto const &Pkg: pkgCache::PkgIterator(*Cache))
      {
         /* Just look at the ones we want to install */
         if ((*Cache)[Pkg].Install() == false)
           continue;

         // get the recommends/suggests for the candidate ver
         pkgCache::VerIterator CV = (*Cache)[Pkg].CandidateVerIter(*Cache);
         for (pkgCache::DepIterator D = CV.DependsList(); D.end() == false; )
         {
            pkgCache::DepIterator Start;
            pkgCache::DepIterator End;
            D.GlobOr(Start, End); // advances D
            if (Start->Type != pkgCache::Dep::Recommends &&
                Start->Type != pkgCache::Dep::Suggests)
               continue;

            std::string target;
            for (pkgCache::DepIterator I = Start; I != D; ++I)
            {
               if (target.empty() == false)
                  target.append(" | ");
               target.append(I.TargetPkg().FullName(true));
            }
            std::list<std::string> &Type =
                Start->Type == pkgCache::Dep::Recommends ?
                  Recommends :
                  Suggests;
            if (std::find(Type.begin(), Type.end(), target) != Type.end())
               continue;
            Type.push_back(target);
         }

      }
      ShowList(_("Suggested packages:"), Suggests);
      ShowList(_("Recommended packages:"), Recommends);
   }

   bool result;

   // Covered in step 4
   result = InstallPackages(Cache, false);

   return result;
}
1 The problem resolver runs during step 2 and can add packages to satisfy dependencies. Therefore, the number of packages to install can differ from the number of packages specified on the command line.
  1. Load the APT cache

Unsurprisingly, the first step loads the APT cache using the method pkgCacheFile::Open(), which reuses methods we have already discussed.

apt-pkg/cachefile.cc
bool pkgCacheFile::Open(OpProgress *Progress)
{
   if (BuildCaches(Progress) == false)
      return false;

   if (BuildPolicy(Progress) == false)
      return false;

   if (BuildDepCache(Progress) == false)
      return false;

   if (Progress != NULL)
      Progress->Done();
   if (_error->PendingError() == true)
      return false;

   return true;
}
  1. Determine the packages to install

Installing a package can also mean uninstalling other packages. For example, the new version of a package may stop using a dependency that no other package needs, and APT will try to autoremove it. The code is therefore a little more complicated.

For this step, we ignore most of these problems and focus on the installation of new packages whose dependencies are also new. The code is adapted accordingly.

For every package to install, the code updates the state in pkgDepCache using the functions Cache->GetDepCache()->SetCandidateVersion() and Cache.MarkInstall(). After that, the code executes the pkgProblemResolver. The goal is to fix broken packages, that is, packages that would end up with missing or conflicting dependencies if the installation continued. The resolver is huge, with more than 1000 lines of code. To give you an idea of the kind of constraints the resolver must satisfy, here are the relevant fields for a common package:

Package: nginx-core
Description: nginx web/proxy server (standard version)
Version: 1.18.0-6+b1
Architecture: amd64
Replaces: nginx-full (<< 1.18.0-1)
Depends: libnginx-mod-http-geoip (= 1.18.0-6+b1), nginx-common (= 1.18.0-6),
  iproute2, libc6 (>= 2.28), libcrypt1 (>= 1:4.1.0), libpcre3,
  libssl1.1 (>= 1.1.1), zlib1g (>= 1:1.1.4)
Suggests: nginx-doc (= 1.18.0-6)
Conflicts: nginx-extras, nginx-light
Breaks: nginx (<< 1.4.5-1), nginx-full (<< 1.18.0-1)

The code documentation acknowledges that the resolver has become complex and very sophisticated over time. Moreover, the resolver may not even be able to fix all broken packages: packages may be missing and conflicts may still exist. Check the function pkgProblemResolver::ResolveInternal() defined in apt-pkg/algorithms.cc for more details.
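
Before the resolver can reason about such constraints, relationship fields like Depends or Breaks must be parsed. Here is a minimal Go sketch of that parsing, not taken from APT: the type constraint and the function parseDepends are hypothetical, and the parser assumes a space between the operator and the version and ignores architecture qualifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// constraint is one package relationship, e.g. "libc6 (>= 2.28)".
type constraint struct {
	Name    string
	Op      string // "<<", "<=", "=", ">=", ">>" or empty
	Version string // empty when any version is accepted
}

// parseDepends splits a relationship field into groups. Groups are
// separated by commas and must all be satisfied; inside a group, the
// alternatives are separated by "|" and only one is needed.
func parseDepends(field string) [][]constraint {
	var groups [][]constraint
	for _, clause := range strings.Split(field, ",") {
		var alts []constraint
		for _, alt := range strings.Split(clause, "|") {
			alt = strings.TrimSpace(alt)
			if alt == "" {
				continue
			}
			c := constraint{Name: alt}
			if i := strings.Index(alt, "("); i >= 0 && strings.HasSuffix(alt, ")") {
				c.Name = strings.TrimSpace(alt[:i])
				parts := strings.Fields(alt[i+1 : len(alt)-1])
				if len(parts) == 2 {
					c.Op, c.Version = parts[0], parts[1]
				}
			}
			alts = append(alts, c)
		}
		if len(alts) > 0 {
			groups = append(groups, alts)
		}
	}
	return groups
}

func main() {
	field := "nginx-common (= 1.18.0-6), iproute2, libc6 (>= 2.28), " +
		"gsl-ref-psdoc | gsl-doc-pdf"
	for _, group := range parseDepends(field) {
		fmt.Printf("%+v\n", group)
	}
}
```

The last group of the example is not part of the nginx-core fields. It was added to show how alternatives end up in the same group.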

apt-private/private-install.cc
bool DoCacheManipulationFromCommandLine(CommandLine &CmdL, CacheFile &Cache,
                                        std::set<pkgCache::VerIterator> &verset)
{
   std::unique_ptr<pkgProblemResolver> Fix(nullptr);
   Fix.reset(new pkgProblemResolver(Cache));

   for (const char **I = CmdL.FileList + 1; *I != 0; ++I) { (1)
      pkgCache::GrpIterator Grp = Cache.GetPkgCache()->FindGrp(*I);
      pkgCache::PkgIterator Pkg = Grp.FindPreferredPkg();
      verset.insert(Cache.GetPolicy()->GetCandidateVer(Pkg));
   }

   TryToInstall InstallAction(Cache, Fix.get());
   InstallAction = std::for_each(verset.begin(), verset.end(), InstallAction); (2)

   // Call the scored problem resolver
   OpTextProgress Progress(*_config);
   bool const resolved = Fix->Resolve(true, &Progress); (3)

   if (resolved == false)
      return false;

   return true;
}
1 Add one to CmdL.FileList to skip the install command name.
2 Mark this package version to be installed.
3 Ensure the resolver fixed the broken packages before continuing the installation.
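
To get an intuition of what "adding packages to satisfy dependencies" means, here is a deliberately naive Go sketch. It is in no way the scored resolver of APT: it only follows Depends transitively, ignores versions, conflicts, and alternatives, and the function resolve and the dependency data are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// resolve walks the dependency graph breadth-first from the requested
// packages. It returns every package to install and the "extras", i.e.
// the packages that were not requested explicitly.
func resolve(requested []string, depends map[string][]string,
	installed map[string]bool) (install, extras []string) {
	asked := map[string]bool{}
	seen := map[string]bool{}
	var queue []string
	for _, name := range requested {
		asked[name] = true
		if !seen[name] {
			seen[name] = true
			queue = append(queue, name)
		}
	}
	for len(queue) > 0 {
		name := queue[0]
		queue = queue[1:]
		if installed[name] {
			continue // already satisfied, nothing to do
		}
		install = append(install, name)
		if !asked[name] {
			extras = append(extras, name)
		}
		for _, dep := range depends[name] {
			if !seen[dep] {
				seen[dep] = true
				queue = append(queue, dep)
			}
		}
	}
	sort.Strings(install)
	sort.Strings(extras)
	return install, extras
}

func main() {
	// Made-up dependency data for the illustration.
	depends := map[string][]string{
		"hello":  {"cowsay"},
		"cowsay": {"libtext-charwidth-perl", "perl"},
	}
	installed := map[string]bool{"perl": true}

	install, extras := resolve([]string{"hello"}, depends, installed)
	fmt.Println("to install:", install)
	fmt.Println("extras:", extras)
}
```

The difference between the two returned lists is what DoInstall() detects when it compares Cache->InstCount() with verset.size().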
  1. Ask for confirmation of additional packages to install

This step simply iterates over the packages to install and inspects their dependency lists to keep the packages present in the fields Recommends and Suggests. The “recommended” dependencies are the most important ones and considerably improve the functionality offered by the package (recommended packages are now installed by default by APT).

Here is an example of a package with recommended and suggested packages:

/var/lib/apt/lists/deb.debian.org_debian_dists_buster_main_binary-amd64_Packages
...
Package: ngraph-gtk
Version: 6.09.01-1
Maintainer: Hiroyuki Ito <ZXB01226@nifty.com>
Architecture: amd64
Depends: libc6 (>= 2.4), libngraph0 (>= 6.07.02)
Recommends: ngraph-gtk-addins, ngraph-gtk-doc
Suggests: fonts-liberation
Description: create scientific 2-dimensional graphs
...

Note that dependencies of a package can also have recommended and suggested packages, and so on. Therefore, the final list displayed to the user is often pretty long:
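
Extracting these two fields from a stanza of the Packages file takes only a few lines. The Go sketch below is a simplified reader, not the parser used by APT: the functions parseStanza and splitList are hypothetical, and continuation lines are simply joined with a space.

```go
package main

import (
	"fmt"
	"strings"
)

// parseStanza reads "Field: value" lines. A line starting with a space
// or a tab continues the value of the previous field.
func parseStanza(text string) map[string]string {
	fields := map[string]string{}
	last := ""
	for _, line := range strings.Split(text, "\n") {
		if line == "" {
			continue
		}
		if (line[0] == ' ' || line[0] == '\t') && last != "" {
			fields[last] += " " + strings.TrimSpace(line)
			continue
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		last = parts[0]
		fields[last] = strings.TrimSpace(parts[1])
	}
	return fields
}

// splitList splits a comma-separated field into trimmed entries.
func splitList(value string) []string {
	var out []string
	for _, entry := range strings.Split(value, ",") {
		if entry = strings.TrimSpace(entry); entry != "" {
			out = append(out, entry)
		}
	}
	return out
}

func main() {
	stanza := `Package: ngraph-gtk
Version: 6.09.01-1
Architecture: amd64
Depends: libc6 (>= 2.4), libngraph0 (>= 6.07.02)
Recommends: ngraph-gtk-addins, ngraph-gtk-doc
Suggests: fonts-liberation
Description: create scientific 2-dimensional graphs`

	fields := parseStanza(stanza)
	fmt.Println("Recommends:", splitList(fields["Recommends"]))
	fmt.Println("Suggests:", splitList(fields["Suggests"]))
}
```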

vagrant# apt install ngraph-gtk
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  adwaita-icon-theme at-spi2-core dbus-user-session dconf-gsettings-backend
  dconf-service fontconfig fontconfig-config fonts-dejavu-core glib-networking
  glib-networking-common glib-networking-services gsettings-desktop-schemas
  gtk-update-icon-cache hicolor-icon-theme libatk-bridge2.0-0 libatk1.0-0
  libatk1.0-data libatspi2.0-0 libavahi-client3 libavahi-common-data
  libavahi-common3 libcairo-gobject2 libcairo2 libcolord2 libcups2
  libdatrie1 libdconf1 libdeflate0 libepoxy0 libfontconfig1 libfribidi0
  libgdk-pixbuf-2.0-0 libgdk-pixbuf-xlib-2.0-0 libgdk-pixbuf2.0-0
  libgdk-pixbuf2.0-bin libgdk-pixbuf2.0-common libgraphite2-3 libgsl25
  libgslcblas0 libgtk-3-0 libgtk-3-bin libgtk-3-common libgtksourceview-4-0
  libgtksourceview-4-common libharfbuzz0b libjbig0 libjpeg62-turbo
  libjson-glib-1.0-0 libjson-glib-1.0-common liblcms2-2 libngraph0
  libpango-1.0-0 libpangocairo-1.0-0 libpangoft2-1.0-0 libpixman-1-0
  libproxy1v5 librest-0.7-0 librsvg2-2 librsvg2-common libsoup-gnome2.4-1
  libsoup2.4-1 libthai-data libthai0 libtiff5 libwayland-client0
  libwayland-cursor0 libwayland-egl1 libwebp6 libx11-6 libx11-data libxau6
  libxcb-render0 libxcb-shm0 libxcb1 libxcomposite1 libxcursor1 libxdamage1
  libxdmcp6 libxext6 libxfixes3 libxi6 libxinerama1 libxkbcommon0 libxrandr2
  libxrender1 libxtst6 ngraph-gtk-addins ngraph-gtk-addins-base
  ngraph-gtk-doc shared-mime-info x11-common xkb-data
Suggested packages:
  colord cups-common gsl-ref-psdoc | gsl-doc-pdf | gsl-doc-info |
  gsl-ref-html gvfs liblcms2-utils fonts-liberation librsvg2-bin
The following NEW packages will be installed:
  adwaita-icon-theme at-spi2-core dbus-user-session dconf-gsettings-backend
  dconf-service fontconfig fontconfig-config fonts-dejavu-core glib-networking
  glib-networking-common glib-networking-services gsettings-desktop-schemas
  gtk-update-icon-cache hicolor-icon-theme libatk-bridge2.0-0 libatk1.0-0
  libatk1.0-data libatspi2.0-0 libavahi-client3 libavahi-common-data
  libavahi-common3 libcairo-gobject2 libcairo2 libcolord2 libcups2 libdatrie1
  libdconf1 libdeflate0 libepoxy0 libfontconfig1 libfribidi0
  libgdk-pixbuf-2.0-0 libgdk-pixbuf-xlib-2.0-0 libgdk-pixbuf2.0-0
  libgdk-pixbuf2.0-bin libgdk-pixbuf2.0-common libgraphite2-3 libgsl25
  libgslcblas0 libgtk-3-0 libgtk-3-bin libgtk-3-common libgtksourceview-4-0
  libgtksourceview-4-common libharfbuzz0b libjbig0 libjpeg62-turbo
  libjson-glib-1.0-0 libjson-glib-1.0-common liblcms2-2 libngraph0
  libpango-1.0-0 libpangocairo-1.0-0 libpangoft2-1.0-0 libpixman-1-0
  libproxy1v5 librest-0.7-0 librsvg2-2 librsvg2-common libsoup-gnome2.4-1
  libsoup2.4-1 libthai-data libthai0 libtiff5 libwayland-client0
  libwayland-cursor0 libwayland-egl1 libwebp6 libx11-6 libx11-data libxau6
  libxcb-render0 libxcb-shm0 libxcb1 libxcomposite1 libxcursor1 libxdamage1
  libxdmcp6 libxext6 libxfixes3 libxi6 libxinerama1 libxkbcommon0 libxrandr2
  libxrender1 libxtst6 ngraph-gtk ngraph-gtk-addins ngraph-gtk-addins-base
  ngraph-gtk-doc shared-mime-info x11-common xkb-data
0 upgraded, 93 newly installed, 0 to remove and 11 not upgraded.
Need to get 38.5 MB of archives.
After this operation, 137 MB of additional disk space will be used.
Do you want to continue? [Y/n]

We can confirm from the previous output that recommended packages are indeed installed by default.

  1. Proceed to the installation

The last step is managed by the function InstallPackages:

apt-private/private-install.cc
bool InstallPackages(CacheFile &Cache, bool ShwKept, bool Ask)
{
   // Create the download object
   aptAcquireWithTextStatus Fetcher;
   if (Fetcher.GetLock(_config->FindDir("Dir::Cache::Archives")) == false) (1)
      return false;

   // Read the source list
   if (Cache.BuildSourceList() == false)
      return false;
   pkgSourceList * const List = Cache.GetSourceList();

   // Create the text record parser
   pkgRecords Recs(Cache);
   if (_error->PendingError() == true)
      return false;

   // Create the package manager and prepare to download
   std::unique_ptr<pkgPackageManager> PM(_system->CreatePM(Cache)); (2)
   if (PM->GetArchives(&Fetcher, List, &Recs) == false ||
       _error->PendingError() == true)
      return false;

   auto const FetchBytes = Fetcher.FetchNeeded(); (3)
   auto const FetchPBytes = Fetcher.PartialPresent(); (3)

   // Size delta
   ioprintf(c1out,_("After this operation, %sB of additional disk space "
                    "will be used.\n"),
            SizeToStr(Cache->UsrSize()).c_str());

   if (_error->PendingError() == true)
      return false;

   // Prompt to continue
   if (Ask == true) (4)
   {
      if (_config->FindI("quiet", 0) < 2 &&
            _config->FindB("APT::Get::Assume-Yes", false) == false)
      {
         if (YnPrompt(_("Do you want to continue?")) == false)
         {
            cout << _("Abort.") << std::endl;
            exit(1);
         }
      }
   }

   // Run it
   bool Failed = false;
   while (1)
   {
      bool Transient = false;
      if (AcquireRun(Fetcher, 0, &Failed, &Transient) == false)
         return false;

      if (Failed == true && _config->FindB("APT::Get::Fix-Missing",false) == false)
         return _error->Error(_("Unable to fetch some archives, "
           "maybe run apt-get update or try with --fix-missing?"));

      auto const progress = APT::Progress::PackageManagerProgressFactory();
      _system->UnLockInner(); (5)
      pkgPackageManager::OrderResult const Res = PM->DoInstall(progress);
      delete progress;

      if (Res == pkgPackageManager::Failed || _error->PendingError() == true)
         return false;
      if (Res == pkgPackageManager::Completed)
         break;

      _system->LockInner();

      Fetcher.Shutdown();
      if (PM->GetArchives(&Fetcher, List, &Recs) == false)
         return false;

      Failed = false;
   }

   std::set<std::string> const disappearedPkgs = PM->GetDisappearedPackages();
   if (disappearedPkgs.empty() == false) (6)
   {
      ShowList(c1out, P_("The following package disappeared from your system as\n"
               "all files have been overwritten by other packages:",
               "The following packages disappeared from your system as\n"
               "all files have been overwritten by other packages:",
               disappearedPkgs.size()), disappearedPkgs,
            [](std::string const &Pkg) { return Pkg.empty() == false; },
            [](std::string const &Pkg) { return Pkg; },
            [](std::string const &) { return std::string(); });
      cout << _("Note: This is done automatically and on purpose by dpkg.") << std::endl;
   }

   return true;
}
1 APT acquires a lock using the fcntl() system call which is used to manipulate file descriptors. When called using the flag F_SETLK, the call returns -1 if the lock is already held by another process.
2 APT supports multiple package managers but the default is the dpkg command. APT uses the class debSystem and the associated pkgDPkgPM to interact with the dpkg command.
3 The Acquire subsystem is reused to download the archives. Internally, the code keeps two fields for every item to retrieve, FileSize and PartialSize, which are the size of the object to fetch and how much has already been fetched. The methods Fetcher.FetchNeeded() and Fetcher.PartialPresent() iterate over the items to compute the totals.
4 APT asks for confirmation before proceeding to the installation, except if you use options like apt -y install.
5 Release the dpkg lock /var/lib/dpkg/lock so that the dpkg command can acquire it.
6 The package manager reads the file /var/lib/dpkg/status to find out which packages disappeared because all of their files were overwritten by other packages.
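
The locking technique mentioned in the first callout can be sketched in Go with the standard syscall package. This is an illustration under assumptions: it targets Linux (or another Unix where syscall.FcntlFlock is available), the functions getLock and demoLock are hypothetical, and the lock is taken on a temporary file rather than on the real APT lock file.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// getLock opens (or creates) the lock file and takes an exclusive lock
// on it with fcntl(F_SETLK). The call does not block: it fails at once
// when another process already holds the lock.
func getLock(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0o640)
	if err != nil {
		return nil, err
	}
	lk := syscall.Flock_t{
		Type:   syscall.F_WRLCK, // exclusive (write) lock
		Whence: 0,               // offsets are relative to the start of the file
		Start:  0,
		Len:    0, // 0 means "until the end of the file"
	}
	if err := syscall.FcntlFlock(f.Fd(), syscall.F_SETLK, &lk); err != nil {
		f.Close()
		return nil, fmt.Errorf("could not get lock %s: %w", path, err)
	}
	return f, nil
}

// demoLock takes and releases a lock on a file in a temporary directory.
func demoLock() error {
	dir, err := os.MkdirTemp("", "aptlock")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	f, err := getLock(filepath.Join(dir, "lock"))
	if err != nil {
		return err
	}
	return f.Close() // closing the descriptor releases the lock
}

func main() {
	if err := demoLock(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("lock acquired and released")
}
```

Note that these locks belong to a process: taking the same lock twice from the same process succeeds, so a conflict can only be observed by running two instances of the program.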

The installation logic is implemented by the class pkgDPkgPM.

apt-pkg/deb/dpkgpm.h
class pkgDPkgPM : public pkgPackageManager
{
   protected:

   // progress reporting
   struct DpkgState
   {
      const char *state;     // the dpkg state (e.g. "unpack")
      const char *str;       // the human readable translation of the state
   };

   // the dpkg states that the pkg will run through, the string is
   // the package, the vector contains the dpkg states that the package
   // will go through
   std::map<std::string,std::vector<struct DpkgState> > PackageOps;
   // the dpkg states that are already done; the string is the package
   // the int is the state that is already done (e.g. a package that is
   // going to be install is already in state "half-installed")
   std::map<std::string,unsigned int> PackageOpsDone;

   // progress reporting
   unsigned int PackagesDone;
   unsigned int PackagesTotal;

   public:
   struct Item
   {
      enum Ops {Install, Configure, Remove, Purge, ConfigurePending, TriggersPending,
         RemovePending, PurgePending } Op;
      std::string File;
      PkgIterator Pkg;
      Item(Ops Op,PkgIterator Pkg,std::string File = "") : Op(Op),
            File(File), Pkg(Pkg) {};
      Item() {};
   };
   protected:
   std::vector<Item> List; (1)

   virtual bool Install(PkgIterator Pkg,std::string File) override; (2)
   virtual bool Configure(PkgIterator Pkg) override;
   virtual bool Remove(PkgIterator Pkg,bool Purge = false) override;

   virtual bool Go(APT::Progress::PackageManager *progress) override; (3)

   virtual void Reset() override;

   public:

   explicit pkgDPkgPM(pkgDepCache *Cache);
   virtual ~pkgDPkgPM();

   APT_HIDDEN static bool ExpandPendingCalls(std::vector<Item> &List, pkgDepCache &Cache);
};
1 The package manager keeps a list of actions to perform.
2 The method Install simply appends a new item of type Install to List.
3 The method Go reads the list of actions and executes them.
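
This design, where actions are first queued and then executed in one pass, can be sketched in Go. The types below are hypothetical and much simpler than pkgDPkgPM: every item produces its own command line (APT groups several items per dpkg invocation), and the command runner is injected so that the example never executes dpkg.

```go
package main

import (
	"fmt"
	"strings"
)

type op int

const (
	opInstall op = iota
	opConfigure
)

// item is one queued action, like pkgDPkgPM::Item.
type item struct {
	Op   op
	Pkg  string
	File string // path to the .deb archive, only for opInstall
}

type manager struct {
	list []item
}

// Install and Configure only record the action to perform later.
func (m *manager) Install(pkg, file string) {
	m.list = append(m.list, item{Op: opInstall, Pkg: pkg, File: file})
}

func (m *manager) Configure(pkg string) {
	m.list = append(m.list, item{Op: opConfigure, Pkg: pkg})
}

// Go turns every queued item into a dpkg command line and hands it to
// the function run, which is responsible for executing it.
func (m *manager) Go(run func(args []string) error) error {
	for _, it := range m.list {
		args := []string{"dpkg"}
		switch it.Op {
		case opInstall:
			if !strings.HasPrefix(it.File, "/") {
				return fmt.Errorf("pathname to install is not absolute %q", it.File)
			}
			args = append(args, "--unpack", "--auto-deconfigure", it.File)
		case opConfigure:
			args = append(args, "--configure", it.Pkg)
		}
		if err := run(args); err != nil {
			return err
		}
	}
	m.list = nil
	return nil
}

func main() {
	m := &manager{}
	m.Install("hello", "/var/cache/apt/archives/hello_3.1-1_amd64.deb")
	m.Configure("hello")

	// Print the commands instead of running them.
	err := m.Go(func(args []string) error {
		fmt.Println(strings.Join(args, " "))
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```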

The only remaining code is the dpkg command execution:

apt-pkg/deb/dpkgpm.cc
bool pkgDPkgPM::Go(APT::Progress::PackageManager *progress)
{
   ...

   // Generate the base argument list for dpkg
   std::vector<const char *> Args = { "dpkg" };

   // this loop runs once per dpkg operation
   vector<Item>::const_iterator I = List.cbegin();
   while (I != List.end())
   {

      int fd[2];
      if (pipe(fd) != 0)
         return _error->Errno("pipe","Failed to create IPC pipe to dpkg");

      ADDARGC("--status-fd");
      char status_fd_buf[20];
      snprintf(status_fd_buf,sizeof(status_fd_buf),"%i", fd[1]);
      ADDARG(status_fd_buf);
      unsigned long const Op = I->Op;

      switch (I->Op)
      {
         // Skip other operations

         case Item::Install:
         ADDARGC("--unpack");
         ADDARGC("--auto-deconfigure");
         break;
      }

      // Write in the file or package name
      if (I->Op == Item::Install)
      {
         if (I->File[0] != '/')
            return _error->Error("Internal Error, "
               "Pathname to install is not absolute '%s'", I->File.c_str());
         Args.push_back(I->File.c_str());
      }

      pid_t Child = ExecFork(fd[1]); (1)
      if (Child == 0)
      {
         // This is the child
         close(fd[0]); // close the read end of the pipe

         debSystem::DpkgChrootDirectory();

         if (chdir(_config->FindDir("DPkg::Run-Directory","/").c_str()) != 0)
            _exit(100);

         execvp(Args[0], (char**) &Args[0]); (1)
         cerr << "Could not exec dpkg!" << endl;
         _exit(100);
      }

      // we read from dpkg here
      int const _dpkgin = fd[0];
      close(fd[1]); // close the write end of the pipe

      // the result of the waitpid call
      int Status = 0;
      int res;
      bool waitpid_failure = false;
      bool dpkg_finished = false;
      do
      {
         if (dpkg_finished == false)
         {
            if ((res = waitpid(Child, &Status, WNOHANG)) == Child) (1)
               dpkg_finished = true;
            else if (res < 0)
            {
               // error handling, waitpid returned -1
               if (errno == EINTR)
                  continue;
               waitpid_failure = true;
               break;
            }
         }
         if (dpkg_finished)
            break;

      } while (true);

      if (waitpid_failure == true)
      {
         strprintf(d->dpkg_error, "Sub-process %s couldn't be waited for.",
                   Args[0]);
         _error->Error("%s", d->dpkg_error.c_str());
         break;
      }

      ...
   }
}
1 The code is a classic example of Linux programming. The code uses the system calls fork(), exec(), and wait() to delegate to the command dpkg.

After the dpkg command has run, the APT cache will still have to be updated as the state of some packages has been updated. There is nothing really new and we can stop our inspection of the APT code.

Case Study

As in the other parts, we will write a minimal version of the command apt install in Go. We will not bother with a cache and will simply read the Debian repositories on every run.

To test our program, we need a basic package so that we can focus on the core logic of the APT installation process without having to support advanced features. We will use a new version of our package hello (the code is available in the companion GitHub repository):

vagrant# tree /vagrant/hello/3.1-1/
3.1-1/
|-- DEBIAN
|   `-- control
`-- usr
    `-- bin
        `-- hello

vagrant# cat /vagrant/hello/3.1-1/DEBIAN/control
Package: hello
Version: 3.1-1
Section: base
Priority: optional
Architecture: amd64
Maintainer: Julien Sobczak
Description: Say Hello
Depends: cowsay (1)

vagrant# cat /vagrant/hello/3.1-1/usr/bin/hello
#!/bin/bash
echo "hello world" | /usr/games/cowsay (2)
1 Declare a required dependency available in the standard Debian repository.
2 Use the binary installed by this dependency.

To build the new package:

vagrant# dpkg --build 3.1-1 hello_3.1-1_amd64.deb (1)
1 We use the command dpkg but we could also have used our Go version created in the first part.

Example of installation using APT:

vagrant# apt install /vagrant/hello/hello_3.1-1_amd64.deb
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  cowsay
Suggested packages:
  filters cowsay-off
The following NEW packages will be installed:
  cowsay hello
0 upgraded, 2 newly installed, 0 to remove and 11 not upgraded.
After this operation, 94.2 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y

Get:1 /vagrant/hello/hello_3.1-1_amd64.deb hello amd64 3.1-1 [20.7 kB]
Get:2 http://deb.debian.org/debian bullseye/main amd64 cowsay all 3.03+dfsg2-8 [21.4 kB]
Fetched 21.4 kB in 0s (66.6 kB/s)
Selecting previously unselected package cowsay.
(Reading database ... 34384 files and directories currently installed.)
Preparing to unpack .../cowsay_3.03+dfsg2-8_all.deb ...
Unpacking cowsay (3.03+dfsg2-8) ...
Selecting previously unselected package hello.
Preparing to unpack .../hello/hello_3.1-1_amd64.deb ...
preinst says hello
Unpacking hello (3.1-1) ...
Setting up cowsay (3.03+dfsg2-8) ...
Setting up hello (3.1-1) ...
postinst says hello
Processing triggers for man-db (2.9.4-2) ...

vagrant# hello
 _____________
< hello world >
 -------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

The challenge is to install the same package using a basic Go program. We will reuse the dpkg version we wrote in Go.

Here is the code:

main.go
package main

import (
    "bufio"
    "bytes"
    "crypto/md5"
    "crypto/sha256"
    "fmt"
    "io"
    "io/ioutil"
    "net/http"
    "os"
    "os/exec"
    "path/filepath"
    "regexp"
    "strings"
    "sync"

    "github.com/julien-sobczak/deb822"
    "github.com/ulikunitz/xz"
    "golang.org/x/crypto/openpgp"
    "golang.org/x/crypto/openpgp/clearsign"
)

// The command `apt install` requires more code
// than our previous implementation of the command `dpkg`.
// We will introduce the different components successively.

///////////////////////////////////////////////////////////

//
// The Acquire subsystem
//

// Apt accepts package names and needs to retrieve their archives
// from repositories, commonly using HTTP.
// The pkgAcquire struct downloads the various required files
// using a pool of workers to process each item to download.
// Like the real implementation, this system is not a generic downloader
// but contains some Apt logic.

type pkgAcquire struct {
    // The downloaded items are used to populate the Apt cache
    cacheFile *CacheFile

    // The items still not finished.
    pendingJobs int
    jobs        chan Item
    results     chan error
    // Workers are run in goroutines and push new items.
    jobsMutex sync.Mutex
}

// There are different types of files to retrieve from an Apt repository:
// - `InRelease`: the metadata about the repository.
// - `Packages`: the list of packages present in the repository.
// - `.deb` files: the archives to install using `dpkg`.
// Each item is accessible from a URI, must be stored locally, and requires
// some postprocessing like checking the integrity of the files to prevent
// MITM attacks.

type Item interface {
    // DownloadURI returns the URI to retrieve the item.
    DownloadURI() string

    // DestFile returns the path where the file
    // represented by the URI must be written.
    DestFile(uri string) string

    // Done is called when the file has been downloaded.
    // This function updates the cache with the retrieved item
    // and can trigger new downloads.
    Done(c *CacheFile, a *pkgAcquire) error
}

// We will detail each type after the implementation of pkgAcquire.

// NewPkgAcquire initializes the Acquire system.
func NewPkgAcquire(c *CacheFile) *pkgAcquire {
    a := &pkgAcquire{
        cacheFile:   c,
        pendingJobs: 0,
        jobs:        make(chan Item, 1000),
        results:     make(chan error, 1000),
    }

    // Start the workers responsible for processing the items in `jobs`.
    for w := 1; w <= 2; w++ {
        go a.worker(w, a.jobs, a.results)
    }

    return a
}

// Add is used to request the downloading of a new item.
// New items are simply sent to the `jobs` channel.
func (a *pkgAcquire) Add(item Item) {
    // The function is called from different goroutines.
    // We use a lock to prevent data inconsistencies.
    a.jobsMutex.Lock()
    a.jobs <- item
    a.pendingJobs++
    a.jobsMutex.Unlock()
}

// A worker simply reads from the `jobs` channel and uses the different
// methods defined by `Item` to know what to do.

func (a *pkgAcquire) worker(id int, jobs <-chan Item, results chan<- error) {
    for item := range jobs {
        results <- a.downloadItem(item)
    }
}

func (a *pkgAcquire) downloadItem(item Item) error {
    uri := item.DownloadURI()
    dest := item.DestFile(uri)

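    // Download the file
    resp, err := http.Get(uri)
    if err != nil {
        fmt.Printf("Err: %v\n\t%s\n", item, err)
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("unexpected HTTP status for %s: %s", uri, resp.Status)
    }

    // Create the local file
    if err := os.MkdirAll(filepath.Dir(dest), 0755); err != nil {
        return err
    }

    out, err := os.Create(dest)
    if err != nil {
        return err
    }
    defer out.Close()

    // Copy the body to the local file
    if _, err := io.Copy(out, resp.Body); err != nil {
        return err
    }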

    fmt.Printf("Get: %v\n", item)

    return item.Done(a.cacheFile, a)
}

// There is one remaining method to cover.
// The Acquire system will try to download items in parallel
// but the code often needs to block until all items have been downloaded
// to continue. The next function is used to wait.

/**
 * Run downloads all items that have been added to this
 * download process.
 *
 * This method will block until the download completes.
 */
func (a *pkgAcquire) Run() error {
    var errors []string
    var err error

    for {
        // Exit when there are no more remaining jobs
        a.jobsMutex.Lock()
        if a.pendingJobs == 0 {
            a.jobsMutex.Unlock()
            break
        }
        a.jobsMutex.Unlock()

        // Search for errors in the results
        err = <-a.results
        if err != nil {
            errors = append(errors, err.Error())
        }

        a.jobsMutex.Lock()
        a.pendingJobs--
        a.jobsMutex.Unlock()
    }

    if len(errors) > 0 {
        return fmt.Errorf("%s", strings.Join(errors, "\n"))
    }

    return nil
}

// That's all for the Acquire system. What remains is the implementation
// of the various types of Item.

///////////////////////////////////////////////////////////

/*
 * The first kind of `Item` we have to download are `InRelease` files.
 * These files contain metadata about other index files (ex: `Packages`)
 * present in the same repository and are used to check the integrity
 * of these files.
 */

type MetaIndexItem struct { // InRelease/Release files
    // The Debian source pointing to this repository.
    // The source contains fields required to determine the target URI.
    source *pkgSource
}

func NewMetaIndexItem(source *pkgSource) *MetaIndexItem {
    return &MetaIndexItem{
        source: source,
    }
}

func (i *MetaIndexItem) DownloadURI() string {
    // Ex: http://deb.debian.org/debian/dists/buster/InRelease
    return i.source.URI + "/dists/" + i.source.Dist + "/InRelease"
}

func (i *MetaIndexItem) DestFile(uri string) string {
    // Ex: /var/lib/apt/lists/deb.debian.org_debian_dists_buster_InRelease
    s := i.source
    return "/var/lib/apt/lists/" +
        fmt.Sprintf("%s_dists_%s_InRelease", s.EscapedURI(), s.Dist)
}

func (i *MetaIndexItem) Done(c *CacheFile, acq *pkgAcquire) error {
    s := i.source

    filePath := i.DestFile(s.URI)

    // 1. Check the file integrity

    // Apt loads all GPG keys under /etc/apt/trusted.gpg.d/.
    // Here, for simplicity, we load only the single key we really need:
    // /etc/apt/trusted.gpg.d/debian-archive-buster-stable.gpg
    publicKey := fmt.Sprintf(
        "/etc/apt/trusted.gpg.d/debian-archive-%s-stable.gpg", s.Dist)
    decodedContent, err := gpgDecode(filePath, publicKey)
    if err != nil {
        return fmt.Errorf("the following signature couldn't be verified %s\n%v",
            filePath, err)
    }

    // 2. Parse the content to extract metadata like the checksums
    // for other files to download
    parser, err := deb822.NewParser(strings.NewReader(string(decodedContent)))
    if err != nil {
        return fmt.Errorf("malformed Release file: %v", err)
    }
    doc, err := parser.Parse()
    if err != nil {
        return fmt.Errorf("malformed Release file: %v", err)
    }

    // Extract values
    s.doc = doc.Paragraphs[0]
    s.Codename = s.doc.Value("Codename") // Ex: buster
    s.Suite = s.doc.Value("Suite")       // Ex: stable
    s.Origin = s.doc.Value("Origin")     // Ex: Debian
    s.Label = s.doc.Value("Label")       // Ex: Debian
    s.Entries = make(map[string]string)
    for _, entry := range strings.Split(s.doc.Value("MD5Sum"), "\n") {
        // Ex: 0233ae8f041ca0f1aa5a7f395d326e80    57365 contrib/Contents-all.gz
        fields := strings.Fields(entry)
        if len(fields) != 3 {
            continue // Ignore blank or malformed lines
        }
        md5sum, relativePath := fields[0], fields[2]
        s.Entries[relativePath] = md5sum
    }

    // 3. Download the `Packages` files
    acq.Add(NewIndexItem(s, "main", "amd64"))
    // The real code downloads other Packages files in addition,
    // like the ones for the `contrib` and `non-free` components.

    return nil
}
func (i MetaIndexItem) String() string {
    // Ex: https://packages.grafana.com/oss/deb stable InRelease
    return fmt.Sprintf("%s stable InRelease", i.source.URI)
}

///////////////////////////////////////////////////////////

/*
 * The second kind of Item we have to download are index files
 * (Packages and Sources files).
 * In this implementation, we are ignoring Sources index files.
 * Packages index files list the Debian control files (DEBIAN/control)
 * with a few additional fields for every .deb package available.
 */

type IndexItem struct { // `Packages`/`Sources` files
    source       *pkgSource
    component    string // Ex: main, contrib or non-free
    architecture string // Ex: amd64

}

func NewIndexItem(source *pkgSource,
    component string, architecture string) *IndexItem {
    return &IndexItem{
        source:       source,
        component:    component,
        architecture: architecture,
    }
}

func (i *IndexItem) DownloadURI() string {
    // Ex: http://deb.debian.org/debian/dists/buster/main/binary-amd64/Packages.xz
    return i.source.URI + "/dists/" + i.source.Dist + "/" + i.component +
        "/binary-" + i.architecture + "/Packages.xz"
}

func (i *IndexItem) DestFile(uri string) string {
    // Ex: /var/lib/apt/lists/
    //       deb.debian.org_debian_dists_buster_main_binary-amd64_Packages.xz
    s := i.source
    return "/var/lib/apt/lists/" +
        fmt.Sprintf("%s_dists_%s_%s_binary-%s_Packages.xz",
            s.EscapedURI(), s.Dist, i.component, i.architecture)
}

func (i *IndexItem) Done(c *CacheFile, a *pkgAcquire) error {
    s := i.source
    path := i.DestFile(s.URI)

    // 1. Read the file
    file, err := os.Open(path)
    if err != nil {
        return fmt.Errorf("missing file: %v", err)
    }
    defer file.Close()

    b, err := ioutil.ReadAll(file)
    if err != nil {
        return fmt.Errorf("unable to read file %s: %v", path, err)
    }

    // 2. Check integrity
    hash := md5.New()
    if _, err := io.Copy(hash, bytes.NewReader(b)); err != nil {
        return fmt.Errorf("unable to determine MD5 sum: %s", err)
    }
    md5sum := fmt.Sprintf("%x", hash.Sum(nil))
    md5sumRef := s.Entries[i.EntryName()]
    if md5sum != md5sumRef {
        return fmt.Errorf("found MD5 mismatch: %v != %v", md5sum, md5sumRef)
    }

    // 3. Extract content
    r, err := xz.NewReader(bytes.NewReader(b))
    if err != nil {
        return fmt.Errorf("unable to open xz file: %v", err)
    }
    content, err := io.ReadAll(r)
    if err != nil {
        return fmt.Errorf("unable to read index file content: %v", err)
    }

    // 4. Parse content
    parser, err := deb822.NewParser(strings.NewReader(string(content)))
    if err != nil {
        return fmt.Errorf("malformed index file: %v", err)
    }
    doc, err := parser.Parse()
    if err != nil {
        return fmt.Errorf("malformed index file: %v", err)
    }

    // 5. Add the packages into the Apt cache.
    for _, paragraph := range doc.Paragraphs {
        c.AddPackage(&Package{
            doc:    paragraph,
            source: s,
        })
    }

    return nil
}

// EntryName returns the key in MD5Sum for this file in the Release file.
func (i IndexItem) EntryName() string {
    // Ex: main/binary-amd64/Packages.xz
    return fmt.Sprintf("%s/binary-%s/Packages.xz", i.component, i.architecture)
}

func (i IndexItem) String() string {
    // Ex: https://packages.grafana.com/oss/deb stable/main amd64 Packages
    return fmt.Sprintf("%s stable/%s %s Packages",
        i.source.URI, i.component, i.architecture)
}

///////////////////////////////////////////////////////////

/*
 * The last kind of Item we have to download are .deb archives that will
 * be passed to the dpkg command to proceed to the installation.
 * These files are downloaded under /var/cache/apt/archives/.
 */

type PackageItem struct { // `.deb` files
    // The package metadata associated with the archive to download.
    pkg *Package
}

func NewPackageItem(pkg *Package) *PackageItem {
    return &PackageItem{
        pkg: pkg,
    }
}

func (i *PackageItem) DownloadURI() string {
    // Ex: http://deb.debian.org/debian/pool/main/r/rsync/rsync_3.2.3-4_amd64.deb
    return i.pkg.source.URI + "/" + i.pkg.doc.Value("Filename")
}

func (i *PackageItem) DestFile(uri string) string {
    // Ex: /var/cache/apt/archives/rsync_3.2.3-4_amd64.deb
    pkg := i.pkg
    pkg.cacheFilepath = "/var/cache/apt/archives/" + filepath.Base(uri)
    return pkg.cacheFilepath
}

func (i *PackageItem) Done(c *CacheFile, a *pkgAcquire) error {
    // 1. Check file integrity
    f, err := os.Open(i.pkg.cacheFilepath)
    if err != nil {
        return err
    }
    defer f.Close()

    h := sha256.New()
    if _, err := io.Copy(h, f); err != nil {
        return err
    }

    indexChecksum := i.pkg.doc.Value("SHA256")
    effectiveChecksum := fmt.Sprintf("%x", h.Sum(nil))

    if indexChecksum != effectiveChecksum {
        return fmt.Errorf("invalid checksum for %s", i.pkg.cacheFilepath)
    }

    // 2. Nothing more to do.
    // The archive will be processed later when delegating to the `dpkg` command.

    return nil
}

func (i PackageItem) String() string {
    // Ex: https://grafana.com/oss/deb stable/main amd64 grafana amd64 7.5.5
    pkg := i.pkg
    return fmt.Sprintf("%s stable/main %s %s %s", pkg.source.URI,
        pkg.Name(), pkg.Architecture(), pkg.Version())
}

///////////////////////////////////////////////////////////

//
// The Apt Cache
//

// We try to use the same naming as the real implementation,
// with similar structs containing only the main fields.

// CacheFile is the high-level component for the Apt cache.
type CacheFile struct {
    cache    *pkgCache
    depCache *pkgDepCache
    sources  []*pkgSource
}

// pkgCache contains all known packages
// (found in Dpkg database and in repositories)
type pkgCache struct {
    packages map[string]*Package // The key is the package name
}

// pkgDepCache contains the state information for every package
// (installed, to install, upgradable, ...).
type pkgDepCache struct {
    states map[string]*StateCache
    // The ordered list of packages waiting to be installed.
    order []string
}

// pkgSource represents a single line in a sources.list file.
type pkgSource struct {
    doc deb822.Paragraph // `Release` file content

    // parsed from the sources.list file
    Type string
    URI  string
    Dist string

    // parsed from the Release file
    Codename string
    Suite    string
    Origin   string
    Label    string
    Entries  map[string]string // Checksums of all repository files
}

// EscapedURI returns a name based on the URI that can be used in filename.
// Indeed, most retrieved files are stored under /var/lib/apt/
// and are named after their source.
func (s *pkgSource) EscapedURI() string {
    return strings.ReplaceAll(strings.TrimPrefix(s.URI, "http://"), "/", "_")
}

// The core of the Apt cache is the list of packages.

// Package is a Debian package.
type Package struct {
    // The metadata as present in `Packages` or `status` file
    doc deb822.Paragraph
    // The source where this package is coming from.
    // Can be undefined for already installed packages.
    source *pkgSource

    // The path under /var/cache/apt/archives.
    // Initialized after the download of the package.
    cacheFilepath string
}

// We expose a few additional methods to extract attributes
// from the underlying DEB822 document.

func (p *Package) Name() string {
    return p.doc.Value("Package")
}

func (p *Package) Version() string {
    return p.doc.Value("Version")
}

func (p *Package) Architecture() string {
    return p.doc.Value("Architecture")
}

func (p *Package) Depends() []Dependency {
    return ParseDependencies(p.doc.Value("Depends"))
}

func (p *Package) Suggests() []Dependency {
    return ParseDependencies(p.doc.Value("Suggests"))
}

type Dependency struct {
    Name     string
    Version  string
    Relation string
}

func ParseDependencies(values string) []Dependency {
    // Ex: "adduser, gpgv | gpgv2 | gpgv1, libapt-pkg5.0 (>= 1.7.0~alpha3~)"
    depsValues := strings.TrimSpace(values)
    if depsValues == "" {
        return nil
    }

    var deps []Dependency
    for _, value := range strings.Split(depsValues, ", ") {
        deps = append(deps, ParseDependency(value))
    }
    return deps
}

func ParseDependency(value string) Dependency {
    // Example of syntax:
    // "adduser", "gpgv | gpgv2", "libc6 (>= 2.15)",
    // "python3:any (>= 3.5~)", "perl:any", "perlapi-5.28.0", "libstdc++6"
    // Only the first alternative is kept, and architecture restrictions
    // such as "foo [i386]" are not supported.

    var dep Dependency

    r := regexp.MustCompile(`^(?P<name>[\w.+-]+)(?:[:]\w+)?` +
        `(?: [(](?P<relation>(?:>>|>=|=|<=|<<)) ` +
        `(?P<version>\S+)[)])?(?: [|].*)?$`)
    res := r.FindStringSubmatch(value)
    names := r.SubexpNames()
    for i := range res {
        switch names[i] {
        case "name":
            dep.Name = res[i]
        case "relation":
            dep.Relation = res[i]
        case "version":
            dep.Version = res[i]
        }
    }
    return dep
}

// That's all for the different structures relating to the Apt cache.

///////////////////////////////////////////////////////////

// Now, we need to initialize the three main components.
// The first step is thus to create the map containing all known packages.
// This map will be populated in the successive steps.

func (c *CacheFile) BuildCaches() {
    c.cache = &pkgCache{
        packages: make(map[string]*Package),
    }
}

// The second step is to read the lists of sources to find the `Packages` files
// containing the list of available packages.
// So, we need a function to parse these local source files.

// ParseSourceFile parses a single source file.
// It only supports the common one-line format,
// and not the more recent DEB822 format.
func ParseSourceFile(content string) []*pkgSource {
    var results []*pkgSource

    scanner := bufio.NewScanner(strings.NewReader(content))
    // Read line by line
    for scanner.Scan() {
        line := scanner.Text()
        if strings.TrimSpace(line) == "" {
            // Ignore blank lines
            continue
        }
        if strings.HasPrefix(strings.TrimSpace(line), "#") {
            // Ignore comments
            continue
        }
        parts := strings.Fields(line)
        if len(parts) < 3 {
            // Ignore malformed lines
            continue
        }
        // Basic parser (ignore some options or unused attributes)
        source := &pkgSource{
            Type: parts[0],
            URI:  parts[1],
            Dist: parts[2],
        }
        results = append(results, source)
    }

    return results
}

// BuildSourceList parses every source file.
func (c *CacheFile) BuildSourceList() {
    var sources []*pkgSource

    // Read /etc/apt/sources.list
    mainPath := "/etc/apt/sources.list"
    if _, err := os.Stat(mainPath); !os.IsNotExist(err) {
        content, err := ioutil.ReadFile(mainPath)
        if err != nil {
            fmt.Printf("E: Unable to read source file\n\t%s\n", err)
            os.Exit(1)
        }
        sources = append(sources, ParseSourceFile(string(content))...)
    }

    // Read /etc/apt/sources.list.d/
    dirPath := "/etc/apt/sources.list.d/"
    if _, err := os.Stat(dirPath); !os.IsNotExist(err) {
        files, err := ioutil.ReadDir(dirPath)
        if err != nil {
            fmt.Printf("E: Unable to read source dir\n\t%s\n", err)
            os.Exit(1)
        }
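        for _, file := range files {
            // Only *.list files use the format supported by ParseSourceFile
            if file.IsDir() || filepath.Ext(file.Name()) != ".list" {
                continue
            }
            filePath := filepath.Join(dirPath, file.Name())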
            content, err := ioutil.ReadFile(filePath)
            if err != nil {
                fmt.Printf("E: Unable to read source file\n\t%s\n", err)
                os.Exit(1)
            }
            sources = append(sources, ParseSourceFile(string(content))...)
        }
    }
    c.sources = sources
}

// The last step is to read the Dpkg database
// to determine the packages already installed.
// Therefore, we need a function to parse the status file.

func ParseStatus() (*deb822.Document, error) {
    f, err := os.Open("/var/lib/dpkg/status")
    if err != nil {
        return nil, err
    }
    defer f.Close()
    parser, err := deb822.NewParser(f)
    if err != nil {
        return nil, err
    }
    statusContent, err := parser.Parse()
    if err != nil {
        return nil, err
    }
    return &statusContent, nil
}

func (c *CacheFile) BuildDepCache() {
    states := make(map[string]*StateCache)

    // Read /var/lib/dpkg/status
    status, err := ParseStatus()
    if err != nil {
        fmt.Printf("E: The package lists or status file could not be parsed.\n")
        os.Exit(1)
    }

    // Add state for packages already installed
    for _, pkg := range status.Paragraphs {
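        // The status file also contains packages
        // that were partially installed or removed.
        // Ex: "install ok installed", "install ok half-installed"
        status := strings.Fields(pkg.Value("Status"))
        if len(status) == 0 || status[len(status)-1] != "installed" {
            continue
        }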
        state, ok := states[pkg.Value("Package")]
        if !ok {
            state = &StateCache{}
            states[pkg.Value("Package")] = state
        }
        state.CurrentVersion = pkg.Value("Version")
    }

    c.depCache = &pkgDepCache{
        states: states,
    }
}

///////////////////////////////////////////////////////////

// We now have the three functions required to initialize the Apt cache.
// We will hide them behind a simple method.

func (c *CacheFile) Open() {
    // Initialize the Acquire system to download file from repositories
    acq := NewPkgAcquire(c)

    // Initialize the cache structure
    if c.sources == nil {
        c.BuildCaches()
        c.BuildSourceList()
        c.BuildDepCache()
    }

    // Download items from repositories
    for _, source := range c.sources {
        if source.Type == "deb-src" {
            continue // We are interested only in binary packages
        }
        acq.Add(NewMetaIndexItem(source))
    }

    // Wait for all items to be downloaded to return
    err := acq.Run()
    if err != nil {
        fmt.Printf("E: Unable to fetch resources\n\t%s\n", err)
        os.Exit(1)
    }
}

// As we have seen before, the cache content is populated
// from the `Done()` methods of the different types of `Item`.
// We need to expose additional methods to easily add or retrieve
// these packages and their state.

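// packagesMutex protects the map of packages: AddPackage is called
// from the `Done()` methods, which run in concurrent worker goroutines.
var packagesMutex sync.Mutex

func (c *CacheFile) AddPackage(p *Package) {
    packagesMutex.Lock()
    defer packagesMutex.Unlock()
    c.cache.packages[p.Name()] = p
}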

func (c *CacheFile) GetPackage(name string) *Package {
    if p, ok := c.cache.packages[name]; ok {
        return p
    }
    return nil
}

func (c *CacheFile) GetPackages() []*Package {
    values := make([]*Package, 0, len(c.cache.packages))
    for _, v := range c.cache.packages {
        values = append(values, v)
    }
    return values
}

func (c *CacheFile) GetState(pkg *Package) *StateCache {
    state, ok := c.depCache.states[pkg.Name()]
    if !ok {
        // Only the state of installed packages is present.
        // We defer the initialization for other packages until
        // the first access.
        state = &StateCache{
            CandidateVersion: pkg.Version(),
            flagInstall:      false,
        }
        c.depCache.states[pkg.Name()] = state
    }
    return state
}

///////////////////////////////////////////////////////////

// We are almost done with the Apt cache.
// We have mentioned several times the state we keep for each package
// without explaining what it contains.

type StateCache struct {
    // The version that can be installed determined using sources.
    CandidateVersion string
    // The version currently installed determined using the Dpkg database.
    CurrentVersion string
    // A flag to determine if the package is marked for installation.
    flagInstall bool
}

func (s *StateCache) Upgradable() bool {
    return s.CurrentVersion != "" &&
        s.CandidateVersion != "" && s.CurrentVersion != s.CandidateVersion
}

func (s *StateCache) Install() bool {
    return s.flagInstall
}

func (s *StateCache) Installed() bool {
    return s.CurrentVersion != ""
}

// When installing a package, we must make sure its dependencies
// are already installed or we need to install them first.
// The logic is rather complicated as many things can go wrong
// with dependency management like conflicts between two packages.
// For this article, we will use a very basic approach.
// We ignore versions completely and install each missing dependency
// without checking if it breaks other packages. This is another
// reason why you must not run this code on your host directly :).

func (c *CacheFile) MarkForInstallation(pkgName string) {
    pkg := c.GetPackage(pkgName)
    if pkg == nil {
        fmt.Printf("E: Unable to locate package %s\n", pkgName)
        os.Exit(1)
    }

    state := c.GetState(pkg)
    if state.Installed() || state.Install() {
        // Already installed or marked for installation
        return
    }

    // Make sure to mark the package before checking its dependencies
    // to prevent infinite cycles
    state.CandidateVersion = pkg.Version()
    state.flagInstall = true

    // Mark dependencies recursively
    for _, dep := range pkg.Depends() {
        c.MarkForInstallation(dep.Name)
    }

    // The dependencies have been appended by the recursive calls above,
    // so the package comes after them in the installation order
    c.depCache.order = append(c.depCache.order, pkgName)
}

// We end this section with a utility method to report the total
// number of packages that will be installed.
// This number commonly differs from the number of requested packages,
// as packages have dependencies that must be installed too. We will use
// this method to notify the user that more packages will be installed
// than the ones passed as arguments.

func (c *CacheFile) InstCount() int {
    count := 0
    for _, state := range c.depCache.states {
        if state.Install() {
            count++
        }
    }
    return count
}

///////////////////////////////////////////////////////////

//
// Main
//

// We have everything we need to implement the command `apt install`.
// We will integrate everything we have covered so far.

func main() {
    var pkgNames []string
    // The command `apt install` can be called without any package to install.
    if len(os.Args) > 1 {
        pkgNames = append(pkgNames, os.Args[1:]...)
    }

    // Load the Cache
    cache := &CacheFile{}
    cache.Open()

    // Search for the packages to install
    pkgs := make(map[string]*Package)
    for _, pkgName := range pkgNames {
        // The command `apt install` also supports `.deb` files.
        // We ignore this for simplicity to avoid
        // duplicating code from the previous parts of this blog post.
        // Check https://github.com/julien-sobczak/linux-packages-under-the-hood
        // for a more complete implementation.

        cache.MarkForInstallation(pkgName)
        pkgs[pkgName] = cache.GetPackage(pkgName)
    }

    // Print out the list of additional packages to install
    if cache.InstCount() != len(pkgNames) {
        var extras []string
        for _, pkg := range cache.GetPackages() {
            state := cache.GetState(pkg)
            if !state.Install() {
                continue
            }
            if _, ok := pkgs[pkg.Name()]; !ok {
                extras = append(extras, pkg.Name())
            }
        }
        fmt.Printf(
            "The following additional packages will be installed:\n\t%s\n",
            strings.Join(extras, " "))
    }

    // Print out the list of suggested packages
    var suggests []string
    for _, pkg := range cache.GetPackages() {
        state := cache.GetState(pkg)

        // Just look at the ones we want to install
        if !state.Install() {
            continue
        }

        // Get the suggestions for the candidate version
        for _, dependency := range pkg.Suggests() {
            suggests = append(suggests, dependency.Name)
        }
    }
    if len(suggests) > 0 {
        fmt.Printf("Suggested packages:\n\t%s\n", strings.Join(suggests, " "))
    }

    err := InstallPackages(cache)
    if err != nil {
        fmt.Printf("E: %s\n", err)
        os.Exit(1)
    }
}

func InstallPackages(cache *CacheFile) error {
    acq := NewPkgAcquire(cache)

    // 1. Download package archives
    for _, pkgName := range cache.depCache.order {
        pkg := cache.GetPackage(pkgName)
        acq.Add(NewPackageItem(pkg))
    }
    err := acq.Run()
    if err != nil {
        return err
    }

    // 2. Run the command `dpkg -i` to install them
    var archives []string
    for _, pkgName := range cache.depCache.order {
        pkg := cache.GetPackage(pkgName)
        archives = append(archives, pkg.cacheFilepath)
    }

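    // We delegate to the dpkg command to avoid repeating the previous code,
    // but the complete source code in the repository reuses that code instead.
    // Check https://github.com/julien-sobczak/linux-packages-under-the-hood
    // Each archive must be passed as a separate argument: exec.Command
    // does not go through a shell, so a space-joined string would be
    // interpreted by dpkg as a single file name.
    args := append([]string{"-i"}, archives...)
    out, err := exec.Command("dpkg", args...).Output()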
    if err != nil {
        return err
    }
    fmt.Print(string(out))

    return nil
}

///////////////////////////////////////////////////////////

// Helpers

// gpgDecode checks the GPG signature of a clearsigned document and
// returns the content.
func gpgDecode(filename string, publicKey string) ([]byte, error) {
    // Open gpg clearsigned document
    r, err := os.Open(filename)
    if err != nil {
        return nil, fmt.Errorf("error opening signed document: %s", err)
    }
    defer r.Close()

    // Read the content
    data, err := ioutil.ReadAll(r)
    if err != nil {
        return nil, err
    }

    // Decode the content
    b, _ := clearsign.Decode(data)
    if b == nil {
        return nil, fmt.Errorf("not PGP signed")
    }

    // Open the public key to validate the signature
    rk, err := os.Open(publicKey)
    if err != nil {
        return nil, fmt.Errorf("error opening public key: %s", err)
    }
    defer rk.Close()
    keyring, err := openpgp.ReadKeyRing(rk) // binary
    if err != nil {
        return nil, fmt.Errorf("failed to parse public key: %v", err)
    }

    // Check the signature using the public key
    _, err = openpgp.CheckDetachedSignature(keyring,
        bytes.NewBuffer(b.Bytes), b.ArmoredSignature.Body)
    if err != nil {
        return nil, err
    }

    return b.Plaintext, nil
}

🎉 We have finished with the command apt. We have also finished with this article! We created a Debian archive using a basic Go program and we installed the package using Go versions of dpkg and apt.

"One" Last Word

Linux packages are just archives containing files to extract onto a different system. The problem looks trivial, but the devil is in the details.

In this article, we have caught a glimpse of some of the challenges that a package manager must address. Packages depend on other packages, which means the package manager must face one of the most difficult problems in computing: dependency management. Despite that, Dpkg and Apt are still approachable programs.

We wrote basic versions from scratch using only a few hundred lines of Go code. The real commands are much larger because dpkg and apt are interactive and go to great lengths to avoid relying on the user to fix problems, which explains why the two programs together represent approximately 100,000 lines of C and C++ code.

If you are managing a large pool of servers, such as a datacenter, implementing your own package manager can be worthwhile. For example, you could centralize all local databases to ensure that all machines share the same state, or you could take corrective actions like excluding a server from the pool when an upgrade ends in a bad state. Google provides a great example. They decided to implement their own package management system: “Any package change is guaranteed to succeed, or the machine is rolled back completely to the previous state. If the rollback fails, the machine is sent through our repairs process for reinstallation and potential hardware replacement. This approach allows us to eliminate much of the complexity of the package states.”[1] The decision was surely not an easy one, but the benefits are clear.

Implementing a package manager from scratch can be intimidating, but as we have seen in this article, the reality is not so bad, especially if we consider the long list of features that Apt supports that are not useful when managing a large number of homogeneous machines in an automated way.


1. Building Secure and Reliable Systems, O’Reilly, Chapter 9 - Design for Recovery, Footnote 18

About the author

Julien Sobczak works as a software developer for Scaleway, a French cloud provider. He is a passionate reader who likes to see the world differently to measure the extent of his ignorance. His main areas of interest are productivity (doing less and better), human potential, and everything that contributes to being a better person (including a better dad and a better developer).
