Revision 17 as of 2017-04-22 04:04:28 - Editor: PaulWise - Comment: page got renamed

Pluggable Acquire System for APT

Codename: apt-fetcher

External Links: Project Application Launchpad

Configured Build Environment

  • Chroot with Debian codename SID
  • Host OS: Ubuntu Precise 12.04
  • Host Architecture: i386
  • Host IDE: Eclipse Indigo C++ - used for coding because of its development facilities; builds are run inside the schroot
  • VCS: SVN, git, BZR - I'll probably use bazaar, since this is the current VCS for the apt package

State of the Art

  • as presented on 21 May 2012

The Debian Archive

  • Location: link

  • Organized in:
    • distributions: squeeze, wheezy, sid, experimental, etc.

    • sections: main, contrib, non-free

    • architectures: i386, amd64, mips, powerpc, etc.


    • packages are located in the pool/ directory

    • pool/ contains both binary and source packages, organized as described above

    • index files
    • maintainers
    • uploaders
    • Release files: list the index files of the current distribution, together with their sizes and checksums

    • Contents files: mappings of files to packages - fetched by apt-file

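    • Packages files: binary package metadata, as described in The Debian Repository Format (DebianRepository/Format, section "Packages" Indices) - fetched by apt-get

    • Sources files: source package metadata, as described in The Debian Repository Format (DebianRepository/Format, section "Sources" Indices) - fetched by apt-get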

    • Translation files: translated package descriptions, displayed by package management front-ends such as apt-cache and aptitude

    • Diff files (pdiffs): diff files exist for all the files above, so synchronization with the Debian Archive only needs to transfer the changes
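
As a rough illustration of the layout described above, the sketch below builds the conventional paths of the Release, Packages and Sources indexes under dists/. The helper name index_paths and the example values are assumptions made for this illustration; they are not part of APT.

```python
def index_paths(dist, sections, architectures):
    """Return the relative paths of the main index files of one distribution."""
    # One Release file per distribution.
    paths = [f"dists/{dist}/Release"]
    for section in sections:
        # One Packages index per section and architecture.
        for arch in architectures:
            paths.append(f"dists/{dist}/{section}/binary-{arch}/Packages")
        # One Sources index per section.
        paths.append(f"dists/{dist}/{section}/source/Sources")
    return paths


for path in index_paths("sid", ["main", "contrib"], ["i386", "amd64"]):
    print(path)
```

In a real archive these files are usually also offered in compressed form; the sketch leaves the compression suffix out.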

The APT Source Code

  • mirrors the Debian Archive in /var/lib/apt/lists/

  • downloads and installs packages


  • apt-pkg/sourcelist - APT's parsing section

  • class pkgSourceList - the main parsing responsible class

  • parses each line of sources.list, producing an apt-pkg/metaIndex object - this corresponds to a Release file (aka MetaIndex) in the remote Debian Archive

  • a metaIndex is defined by:
    • a URI - location of the data

    • a Dist - the Debian distribution

    • a Type - source / binary package


  • an acquire module - a group of classes that implement the functionality



  • class: pkgAcquire

  • multiple working processes - workers - spawned and managed internally
  • multiple working queues - assigned to the workers
  • multiple fetching methods
  • logger object to send feedback
  • main purpose: scheduling and executing the download of Debian Archive files


  • interface: pkgAcquire::Item

  • specific methods called by the pkgAcquire::Worker:
    • Start() - when the item starts being fetched

    • Done() - upon item fetch completion

    • Failed() - when the item can't be fetched

  • different implementations according to the type of file:
    • pkgAcqMetaSig, pkgAcqMetaIndex, pkgAcqMetaClearSig - metaIndex files

    • pkgAcqSubIndex - records for additional files, other than those contained in metaIndex files (Translations, pdiffs)

    • pkgAcqDiffIndex - index files for package diffs

    • pkgAcqIndexDiffs - all diff files that are required for a specific index file

    • pkgAcqIndex - acquire item responsible for fetching an index file (Packages, Sources)

    • pkgAcqIndexTrans - acquire item responsible for fetching a translated index

    • pkgAcqArchive - package file

    • pkgAcqFile - arbitrary file


    • interface: pkgAcqMethod

    • universal Fetch() method used to fetch a file

    • completely decoupled from the type of the item
    • identified by strings
    • examples: cdrom, http, gzip, copy, rred (apply pdiff patch), etc.
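
The interplay between workers, items and fetch methods can be sketched as follows. This is a simplified Python model, not APT's C++ code: the Item class, the two fake methods and run_worker are hypothetical names used only to show that a worker reports progress through Start/Done/Failed-style callbacks, while the method is chosen by a string and knows nothing about the item type.

```python
class Item:
    """Hypothetical stand-in for pkgAcquire::Item; records which callbacks ran."""

    def __init__(self, uri):
        self.uri = uri
        self.events = []

    def start(self):
        self.events.append("start")

    def done(self, data):
        self.events.append("done")

    def failed(self, error):
        self.events.append("failed")


def fake_http(uri):
    # Pretend download that always succeeds.
    return b"payload of " + uri.encode()


def fake_cdrom(uri):
    # Pretend download that always fails.
    raise OSError("no disc inserted")


# Fetch methods are identified by strings and are independent of item types.
METHODS = {"http": fake_http, "cdrom": fake_cdrom}


def run_worker(item):
    """Fetch one item and report progress through the item's callbacks."""
    scheme = item.uri.split(":", 1)[0]
    item.start()
    try:
        data = METHODS[scheme](item.uri)
    except (KeyError, OSError) as error:
        # Unknown method or failed transfer.
        item.failed(error)
    else:
        item.done(data)
    return item.events


print(run_worker(Item("http://ftp.debian.org/debian/dists/sid/Release")))
print(run_worker(Item("cdrom:/dists/sid/Release")))
```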

    The Metadata Retrieval Algorithm

    • the sourcelist parser parses the sources.list files and produces metaIndex objects for each entry
    • the metaIndex objects are created by a type-corresponding class (deb - debSLTypeDeb, deb-src - debSLTypeDebSrc)

    • the metaIndex objects are responsible for creating the pkgAcq* objects that download the Release file for each entry. The GetIndexes() method constructs these objects and assigns them to a fetcher object (pkgAcquire)

    • a metaIndex will submit to the fetcher:
      • a pkgAcqMetaIndex object - the Release file

      • multiple pkgAcqIndex objects - corresponding to Packages or Sources files. These are optional; when given, they override the objects that would otherwise be built automatically from the pkgAcqMetaIndex.

    • the fetcher will first download the Release file
    • after the Release file is present and validated, it is parsed. The Release file holds information about the other index files in the Debian Archive. The indexes it lists are compared with the desired index files - represented by IndexTarget objects - to decide which indexes to fetch

    • once the set of indexes to fetch is settled, the fetcher decides how to fetch them - at this point it checks whether they can be downloaded using pdiffs or must be downloaded in full

    • to download an index file with pdiffs, an Index file for these diffs must be present in the archive. If the download using pdiffs fails, the fetcher falls back to downloading the whole file.
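
The decision steps of this algorithm can be sketched as below. This is a minimal Python model under stated assumptions: the Release file is reduced to a set of listed paths, IndexTargets are plain path strings, and select_indexes, fetch_index and the three fake fetch functions are hypothetical names that do not exist in APT.

```python
def select_indexes(release_entries, targets):
    """Keep only the wanted indexes that the Release file actually lists."""
    return [target for target in targets if target in release_entries]


def fetch_index(target, release_entries, fetch_pdiffs, fetch_full):
    """Use pdiffs when a diff Index is listed; otherwise, or on failure, fetch in full."""
    if f"{target}.diff/Index" in release_entries:
        try:
            return fetch_pdiffs(target)
        except OSError:
            pass  # pdiff download failed: fall back to the whole file
    return fetch_full(target)


def working_pdiffs(target):
    return f"{target} (patched with pdiffs)"


def failing_pdiffs(target):
    raise OSError("pdiff download failed")


def full_download(target):
    return f"{target} (downloaded in full)"


release_entries = {
    "main/binary-i386/Packages",
    "main/binary-i386/Packages.diff/Index",
    "main/source/Sources",
}
wanted = ["main/binary-i386/Packages", "main/source/Sources", "contrib/source/Sources"]

for target in select_indexes(release_entries, wanted):
    print(fetch_index(target, release_entries, working_pdiffs, full_download))
```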
  • The current indexfile (class pkgIndexFile) interface

    • has a type - binary index file, binary translation file, binary index file describing the local system, source index file

    • implements an interface for the acquire module - where can the specific index file be downloaded from

    • implements the interface for the record parsers - generating the package structure from these files

    • implements the interface for the cache generator - APT uses a memory-mapped cache to optimize the whole package management process

The DEBTAGS Source Code

  • mirrors the Debian Archive in /var/lib/debtags/package-tags (packages and their tags) and /var/lib/debtags/vocabulary (all the tags, and their descriptions)

  • fetches the files using a Python script called debtags-fetch

  • supports different fetching types - local files, http client, apt sources, etc.

The AppStream Project

  • Described in The AppStream Debian Proposal

  • "We propose that the indices are provided alongside the packages files, that is in dists/<SUITE>/<COMPONENT>?/ComponentMetadata.xy and compressed with whatever ftpmaster wish to compress with. This file will only be downloaded on demand, e.g. the Software Center could download it. On servers this file is not really required."

Suggested Design Guidelines

  • All the supported metadata is located in the remote Debian Archive
  • We probably won't need any new objects for control files (e.g. metaIndex, indexfile, indexrecords, etc.)
  • The public parser will be responsible for parsing the sources.list entries and transforming them into Source objects, according to the format of the file (standard, rfc822, rpm, etc.)
  • The Source objects will be transformed into metaIndex objects, using type-specific classes - the existing ones should be fine
  • The metaIndex objects will be passed to the pluggable acquire framework. For each metaIndex object, the framework asks each enabled plugin which files it should get from the archive.
  • Download the files using internal objects (pkgAcq*) and specific technologies (pdiffs, compressed files, etc.)

The sources.list Format

  • the current format is described in man sources.list:

    • multiple entries, one per line, using the format: type [ options ] uri distribution [ component1 ] [ component2 ] ...

    • type can be deb or deb-src

    • options is optional and can be arch or trusted - unknown options are silently ignored

    • uri specifies the base of a Debian Archive (Debian Repository)
    • distribution can be an absolute (/-terminated) path (in which case the component specification is optional) or a relative release path (in which case at least one component must be specified)
    • component may refer to one or more of the main, contrib and non-free Debian sections

    • the sources.list.d/ folder may contain other .list files, with the same format - other files are silently ignored

    • we add support for new entry options, each mapped to a plugin

    • additional options may be tags = yes, contents = no, etc.

    • these options can be defined both for the deb and deb-src entries

    • the parser will provide the read entries, along with the new options
    • according to the options and the installed plugins, the pluggable acquire framework will choose which plugins it will activate and what metadata it will acquire
    • omitting a metadata type option means that it shall not be acquired by the framework
    • if a metadata type option is active (e.g. contents = yes) and there is no plugin for that metadata type (e.g. no apt-file plugin), the pluggable acquire framework will display a warning

    • this approach provides backwards compatibility
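
The proposed handling of entry options can be sketched as follows. This is an illustrative Python model, not the planned C++ parser: parse_entry and select_plugins are hypothetical names, options are assumed to be written as key=value tokens without spaces inside the brackets, and everything after a # is treated as a comment.

```python
KNOWN_OPTIONS = {"arch", "trusted"}  # the options named in the text above


def parse_entry(line):
    """Parse 'type [ options ] uri distribution [component ...]' into a dict."""
    line = line.split("#", 1)[0].strip()  # simplification: '#' always starts a comment
    if not line:
        return None  # blank line or comment
    parts = line.split(None, 1)
    if len(parts) != 2 or parts[0] not in ("deb", "deb-src"):
        raise ValueError(f"malformed entry: {line!r}")
    kind, rest = parts[0], parts[1].strip()
    options = {}
    if rest.startswith("["):
        raw, closed, rest = rest[1:].partition("]")
        if not closed:
            raise ValueError(f"unterminated option list: {line!r}")
        for token in raw.split():
            key, _, value = token.partition("=")
            options[key] = value
    fields = rest.split()
    if len(fields) < 2:
        raise ValueError(f"missing uri or distribution: {line!r}")
    return {
        "type": kind,
        "options": options,
        "uri": fields[0],
        "dist": fields[1],
        "components": fields[2:],
    }


def select_plugins(entry, installed):
    """Return (plugins to activate, warnings) for one parsed entry."""
    active, warnings = [], []
    for name, value in entry["options"].items():
        if name in KNOWN_OPTIONS or value != "yes":
            continue  # not a metadata option, or not requested
        if name in installed:
            active.append(name)
        else:
            warnings.append(f"no plugin installed for metadata type '{name}'")
    return active, warnings


entry = parse_entry("deb [ arch=i386 tags=yes contents=yes ] http://ftp.debian.org/debian sid main contrib")
print(select_plugins(entry, installed={"tags"}))
```

An entry without any of the new options yields no active plugins and no warnings, which matches the backwards compatibility goal stated above.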

The Public Parser

  • Will be implemented as a shared library
  • Classes:
    • Source

      • base abstract class for a Debian Archive source
      • provides remote source info (type, URI, distribution, components, options) and local source info (Filename, LineNumber, Comments)

      • the local source info is used to rebuild the locally parsed files
      • provides a write method to output the parsed source
    • SourceIfstream

      • wrapper for an opened sources.list file
      • the main purpose is to provide seek capabilities for these opened files
    • filter_iterator

      • template class to support predicate iteration over Sources
      • predicates extend the abstract base class SourcePredicate, and will implement the Check(Source) method, which validates a specific source

    • SourcesList

      • container for all the sources of an apt-get instance
      • provides read primitives
      • provides predicate based iterators, using custom iteration predicates
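
A possible shape for these classes is sketched below in Python, only to make the roles concrete. It simplifies the design above: Source is a concrete dataclass rather than an abstract base class, SourceIfstream is left out, and the predicate TypeIs and the method name filter are assumptions made for this illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Source:
    """Remote source info plus the local info needed to rebuild the file."""
    type: str
    uri: str
    dist: str
    components: list
    options: dict = field(default_factory=dict)
    filename: str = ""
    line_number: int = 0

    def write(self):
        """Render the source back into a sources.list line."""
        opts = " ".join(f"{key}={value}" for key, value in self.options.items())
        parts = [self.type]
        if opts:
            parts.append(f"[ {opts} ]")
        parts += [self.uri, self.dist] + list(self.components)
        return " ".join(parts)


class SourcePredicate(ABC):
    @abstractmethod
    def check(self, source):
        """Return True when the source should be visited."""


class TypeIs(SourcePredicate):
    """Hypothetical predicate: accept only sources of one type."""

    def __init__(self, wanted):
        self.wanted = wanted

    def check(self, source):
        return source.type == self.wanted


class SourcesList:
    """Container with plain and predicate-based iteration."""

    def __init__(self, sources):
        self._sources = list(sources)

    def __iter__(self):
        return iter(self._sources)

    def filter(self, predicate):
        return (s for s in self._sources if predicate.check(s))


sources = SourcesList([
    Source("deb", "http://ftp.debian.org/debian", "sid", ["main"], {"tags": "yes"}),
    Source("deb-src", "http://ftp.debian.org/debian", "sid", ["main"]),
])
for source in sources.filter(TypeIs("deb")):
    print(source.write())
```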

The Pluggable Acquire Framework

  • Receives the Source objects from the public parser as input
  • Constructs the necessary objects used by the acquire module
  • Holds metaIndexPlugins and acquireIndexPlugins
  • Holds metaIndexes built from a SourcesList object

Handled Objects

  • metaIndex - corresponds to a Release file and all the other files that are registered to it

  • acquireIndex - corresponds to a single index (metadata) file in the Debian Archive

Handled Plugins

  • metaIndexPlugin

    • builds metaIndexes (construct + merge)
    • builds the acquireIndexes and IndexTargets for a metaIndex

  • acquireIndexPlugin

    • metadata type plugin
    • builds the acquireIndexes for an acquireIndex
    • given an IndexTarget and an indexRecords::checkSum (Release file entry), builds the corresponding pkgAcquire::Item, used by the acquire module to download

From Source objects to the Acquire Module

  • The framework receives a list of Source objects
  • For each unique Type + URI + Dist, builds an individual metaIndex object
  • The metaIndex object holds the information necessary to specify all the index files that need to be downloaded from the Debian Archive
  • This information is stored in acquireIndex objects, built using the metaIndex file and the framework's registered acquireIndexPlugins
  • The main use of the acquireIndex object is to build an IndexTarget object, which is registered to the fetcher (pkgAcquire)
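
The grouping step can be sketched as follows, in Python and with plain dictionaries standing in for Source and metaIndex objects. The function name build_meta_indexes is hypothetical, and merging the components of entries that share a key is an assumption about what "construct + merge" means here.

```python
def build_meta_indexes(sources):
    """Group sources by (type, uri, dist) and merge their components."""
    meta_indexes = {}
    for source in sources:
        key = (source["type"], source["uri"], source["dist"])
        components = meta_indexes.setdefault(key, [])
        for component in source["components"]:
            if component not in components:
                components.append(component)  # keep first-seen order, no duplicates
    return meta_indexes


sources = [
    {"type": "deb", "uri": "http://ftp.debian.org/debian", "dist": "sid", "components": ["main"]},
    {"type": "deb", "uri": "http://ftp.debian.org/debian", "dist": "sid", "components": ["main", "contrib"]},
    {"type": "deb-src", "uri": "http://ftp.debian.org/debian", "dist": "sid", "components": ["main"]},
]
for key, components in build_meta_indexes(sources).items():
    print(key, components)
```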

Acquire Module internal flow

  • first the Release file is downloaded, and then it is parsed to build the set of records - file, size, checksum
  • using the previously registered IndexTargets and these records, the fetcher decides which pkgAcquire::Items to build for download

  • firstly, it tries to download the files using pdiffs - pkgAcqDiffIndex and pkgAcqIndexDiffs objects - which are independent of the type of the index file
  • should this process fail or not be available, the fetcher builds pkgAcquire::Items to download the whole files

  • the fetcher holds a reference to the pluggable acquire framework object
  • using the ShortDescription field of the IndexTarget and the registered metadata plugins - acquireIndexPlugins - the framework builds specific pkgAcquire::Items to download the index files

  • the acquireIndexPlugin implements a subclass of pkgAcquire::Item for its specific metadata type
  • this subclass defines specific behaviors for downloading the metadata piece
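
The plugin lookup described in this flow can be sketched as below. This is a Python model with hypothetical names (AcquireFramework, PackagesPlugin, PackagesItem); IndexTargets and Release file records are reduced to dictionaries, and returning None for an unknown metadata type is an assumption, since the text does not say what happens in that case.

```python
class PackagesItem:
    """Hypothetical item type that a Packages plugin hands to the fetcher."""

    def __init__(self, uri, checksum):
        self.uri = uri
        self.checksum = checksum


class PackagesPlugin:
    """Hypothetical acquireIndexPlugin for one metadata type."""
    short_description = "Packages"

    def build_item(self, target, record):
        return PackagesItem(target["uri"], record["checksum"])


class AcquireFramework:
    """Keeps the registered plugins, keyed by the ShortDescription they handle."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        self._plugins[plugin.short_description] = plugin

    def build_item(self, target, record):
        plugin = self._plugins.get(target["short_description"])
        if plugin is None:
            return None  # assumption: no plugin means no specific item
        return plugin.build_item(target, record)


framework = AcquireFramework()
framework.register(PackagesPlugin())

target = {
    "short_description": "Packages",
    "uri": "http://ftp.debian.org/debian/dists/sid/main/binary-i386/Packages",
}
record = {"checksum": "placeholder-checksum", "size": 0}  # made-up Release file entry
item = framework.build_item(target, record)
print(type(item).__name__, item.uri)
```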