Pluggable Acquire System for APT
Codename: apt-fetcher
External Links: Project Application Launchpad
Configured Build Environment
- Chroot with Debian codename SID
- Host OS: Ubuntu Precise 12.04
- Host Architecture: i386
- Host IDE: Eclipse Indigo C++ - I will code in this IDE for its development facilities; builds will be run inside the schroot
- VCS: SVN, git, BZR - I'll probably use Bazaar, since it is the current VCS of the apt package
State of the Art
as of 21 May 2012
The Debian Archive
Location: link
- Organized in:
distributions: squeeze, wheezy, sid, experimental, etc.
sections: main, contrib, non-free
architectures: i386, amd64, mips, powerpc, etc.
PACKAGES:
located in the pool/ directory
- contains both binary and source packages, organized as described above
METADATA:
- index files
- maintainers
- uploaders
Release files: list the index files of the current distribution, with their sizes and checksums
Contents files: mappings of files to packages - fetched by apt-file
Packages files: binary package metadata:
as described in The Debian Repository Format - fetched by apt-get
tags - fetched by debtags
- checksums - md5, sha, etc.
Sources files: source package metadata:
as described in The Debian Repository Format - fetched by apt-get
- checksums
Translations files: translations of package descriptions, used by configuration tools such as debconf
DIFF files: all of the above files have diff files, so synchronization with the Debian Archive can be done efficiently
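As a rough illustration of the layout above, the sketch below builds the relative paths of the Packages and Sources index files from a distribution, section and architecture. The helper names PackagesIndexPath and SourcesIndexPath are hypothetical (they are not APT functions); the path shape follows The Debian Repository Format.

```cpp
#include <string>

// Hypothetical helper (not part of APT): relative path of a binary
// Packages index inside a Debian Archive.
std::string PackagesIndexPath(const std::string &Dist,
                              const std::string &Section,
                              const std::string &Arch)
{
   return "dists/" + Dist + "/" + Section + "/binary-" + Arch + "/Packages";
}

// Hypothetical helper: relative path of a Sources index, which does not
// depend on the architecture.
std::string SourcesIndexPath(const std::string &Dist,
                             const std::string &Section)
{
   return "dists/" + Dist + "/" + Section + "/source/Sources";
}
```

For example, PackagesIndexPath("sid", "main", "i386") yields dists/sid/main/binary-i386/Packages; the archive usually serves compressed variants of that file.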
The APT Source Code
mirrors the Debian Archive in /var/lib/apt/lists/
- downloads and installs packages
Parsing
apt-pkg/sourcelist - APT's parsing section
class pkgSourceList - the main parsing responsible class
parses each line of sources.list, producing an apt-pkg/metaIndex object - this corresponds to a Release file (aka MetaIndex) in the remote Debian Archive
- a metaIndex is defined by:
a URI - location of the data
a Dist - the Debian distribution
a Type - source / binary package
Acquiring
an acquire module - a group of classes that implement the functionality
Components
THE FETCHER:
class: pkgAcquire
- multiple working processes - workers - spawned and managed internally
- multiple working queues - assigned to the workers
- multiple fetching methods
- logger object to send feedback
- main purpose: scheduling and executing the download of Debian Archive files
THE ITEMS
interface: pkgAcquire::Item
- specific methods called by the pkgAcquire::Worker:
Start() - when the item starts being fetched
Done() - upon item fetch completion
Failed() - when the item can't be fetched
- different implementations according to the type of file:
pkgAcqMetaSig, pkgAcqMetaIndex, pkgAcqMetaClearSig - metaIndex files
pkgAcqSubIndex - records for additional files, other than those contained in metaIndex files (Translations, pdiffs)
pkgAcqDiffIndex - index files for packages diffs
pkgAcqIndexDiffs - all diff files that are required for a specific index file
pkgAcqIndex - acquire item responsible for fetching an index file (Packages, Sources)
pkgAcqIndexTrans - acquire item responsible for fetching a translated index
pkgAcqArchive - package file
pkgAcqFile - arbitrary file
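The callback contract between a worker and an item can be sketched as follows. This is a simplified stand-in, not the real pkgAcquire::Item interface: the real methods take arguments, and the names RecordingItem and RunItem are hypothetical.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for pkgAcquire::Item; the real signatures differ.
class Item {
public:
   virtual ~Item() = default;
   virtual void Start() = 0;   // the item starts being fetched
   virtual void Done() = 0;    // the fetch completed
   virtual void Failed() = 0;  // the item could not be fetched
};

// Hypothetical item that only records which callbacks were invoked.
class RecordingItem : public Item {
public:
   std::vector<std::string> Log;
   void Start() override { Log.push_back("Start"); }
   void Done() override { Log.push_back("Done"); }
   void Failed() override { Log.push_back("Failed"); }
};

// Hypothetical worker step: FetchOk stands in for the outcome reported
// by the fetch method.
void RunItem(Item &I, bool FetchOk)
{
   I.Start();
   if (FetchOk)
      I.Done();
   else
      I.Failed();
}
```

Each file type (Release file, index, pdiff, package) would then supply its own behavior in these callbacks, while the worker stays unaware of the file type.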
THE METHODS
interface: pkgAcqMethod
universal Fetch() method used to fetch a file
- completely decoupled from the type of the item
- identified by strings
examples: cdrom, http, gzip, copy, rred (apply pdiff patch), etc.
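A minimal sketch of string-identified methods, assuming the method name is the part of the URI before the first colon. MethodRegistry and FetchFn are hypothetical names; in APT the methods are separate programs rather than in-process callbacks.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a fetch method: takes a URI and reports success.
using FetchFn = std::function<bool(const std::string &)>;

class MethodRegistry {
   std::map<std::string, FetchFn> Methods;

public:
   void Register(const std::string &Name, FetchFn Fn)
   {
      Methods[Name] = std::move(Fn);
   }

   // Assumption: the method name is everything before the first ':'.
   static std::string NameFor(const std::string &Uri)
   {
      return Uri.substr(0, Uri.find(':'));
   }

   // Dispatches to the registered method; fails if none is registered.
   bool Fetch(const std::string &Uri) const
   {
      auto It = Methods.find(NameFor(Uri));
      return It != Methods.end() && It->second(Uri);
   }
};
```

Because the lookup key is only a string, the caller does not need to know which kind of item is being fetched.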
The Metadata Retrieval Algorithm
- the sourcelist parser parses the sources.list files and produces metaIndex objects for each entry
the metaIndex objects are created by a type-corresponding class (deb - debSLTypeDeb, deb-src - debSLTypeDebSrc)
the metaIndex objects are responsible for creating pkgAcq* objects which will download the release file for each entry. The GetIndexes() method constructs such objects and assigns them to a fetcher object (pkgAcquire)
- a metaIndex will submit to the fetcher:
a pkgAcqMetaIndex object - the Release file
multiple pkgAcqIndex objects - corresponding to Packages or Sources. These are optional; when given, they override the objects that would otherwise be built automatically from the pkgAcqMetaIndex.
- the fetcher will first download the Release file
after the release file is present and validated, it will be parsed. The release file holds information about the other index files in the Debian Archive. The contents of the Debian Archive will be compared with the desired index files - represented by IndexTarget objects - to figure out which indexes to fetch
once it has been settled which indexes to fetch, the fetcher will decide how to fetch them - at this point it checks whether they can be downloaded using pdiffs or whether they should be downloaded entirely
- to download the index files with pdiffs, an Index file for these diffs must be present in the archive. If the download process fails using pdiffs, the fetcher will try downloading the whole files.
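The pdiff-or-whole-file decision described above can be sketched as a small function. This is only an illustration under stated assumptions: FetchIndex, FetchResult and the two callbacks are hypothetical and stand in for the pkgAcq* machinery.

```cpp
#include <functional>

enum class FetchResult { ViaPdiffs, WholeFile, Failed };

// Hypothetical sketch: HasDiffIndex says whether the archive provides an
// Index file for the diffs; TryPdiffs and TryWhole stand in for the two
// download strategies and return true on success.
FetchResult FetchIndex(bool HasDiffIndex,
                       const std::function<bool()> &TryPdiffs,
                       const std::function<bool()> &TryWhole)
{
   // Prefer pdiffs, but only when the diff Index exists.
   if (HasDiffIndex && TryPdiffs())
      return FetchResult::ViaPdiffs;

   // Otherwise, or when the pdiff download failed, fetch the whole file.
   return TryWhole() ? FetchResult::WholeFile : FetchResult::Failed;
}
```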
The current indexfile (class pkgIndexFile) interface
has a type - binary index file, binary translation file, binary index file describing the local system, source index file
implements an interface for the acquire module - where can the specific index file be downloaded from
implements the interface for the record parsers - generating the package structure from these files
implements the interface for the cache generator - APT uses a memory-mapped cache to optimize the whole package management process
The DEBTAGS Source Code
mirrors the Debian Archive in /var/lib/debtags/package-tags (packages and their tags) and /var/lib/debtags/vocabulary (all the tags, and their descriptions)
fetches the files using a python script called debtags-fetch
- supports different fetching types - local files, http client, apt sources, etc.
The AppStream Project
Described in The AppStream Debian Proposal
"We propose that the indices are provided alongside the packages files, that is in dists/<SUITE>/<COMPONENT>?/ComponentMetadata.xy and compressed with whatever ftpmaster wish to compress with. This file will only be downloaded on demand, e.g. the Software Center could download it. On servers this file is not really required."
Suggested Design Guidelines
- All the supported metadata is located in the remote Debian Archive
- We probably won't need any new objects for control files (e.g. metaIndex, indexfile, indexrecords, etc.)
- The public parser will be responsible for parsing the sources.list entries and transforming them into Source objects, according to the format of the file (standard, rfc822, rpm, etc.)
- The Source objects will be transformed into metaIndex objects, using type-specific classes - present ones should be fine
- The metaIndex objects will be passed to the pluggable acquire framework. For each metaIndex object ask each enabled plugin which files it should get from the archive.
- Download the files using internal objects (pkgAcq*) and specific technologies (pdiffs, compressed files, etc.)
The sources.list Format
the current format is described in man sources.list:
multiple entries, one per line, using the format: type [ options ] uri distribution [ component1 ] [ component2 ] ...
type can be deb or deb-src
options is optional and can be arch or trusted - unknown options are silently ignored
- uri specifies the base of a Debian Archive (Debian Repository)
- distribution can be an absolute (/-terminated) path (in which case the component specification is optional) or a relative release path (in this case, at least one component must be specified)
component may refer to one or more of the main, contrib and non-free Debian sections
the sources.list.d/ folder may contain other .list files, with the same format - other files are silently ignored
- ENHANCEMENTS:
we create support for new options for the entries, mapped to plugins
additional options may be tags = yes, contents = no, etc.
these options can be defined both for the deb and deb-src entries
- the parser will provide the read entries, along with the new options
- according to the options and the installed plugins, the pluggable acquire framework will choose which plugins it will activate and what metadata it will acquire
- omitting a metadata type option means that it shall not be acquired by the framework
if a metadata type option is active (e.g. contents = yes) and there is no plugin for that metadata type (e.g. no apt-file plugin), the pluggable acquire framework will display a warning
- this approach provides backwards compatibility
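To make the entry format concrete, here is a minimal parsing sketch for one sources.list line, including the proposed per-entry options. It assumes that the brackets are separated from the options by whitespace and that each option is a single key=value token (for instance tags=yes); Entry and ParseLine are hypothetical names, not APT's parser.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result of parsing one sources.list entry.
struct Entry {
   std::string Type, Uri, Dist;
   std::vector<std::string> Components;
   std::map<std::string, std::string> Options;
};

// Parses "type [ options ] uri distribution [component...]".
// Returns false for malformed lines; pass a fresh Entry for each call.
bool ParseLine(const std::string &Line, Entry &Out)
{
   std::istringstream In(Line);
   std::string Tok;

   if (!(In >> Out.Type) || (Out.Type != "deb" && Out.Type != "deb-src"))
      return false;
   if (!(In >> Tok))
      return false;

   // Optional option block; assumption: "[" and "]" are separate tokens.
   if (Tok == "[") {
      bool Closed = false;
      while (In >> Tok) {
         if (Tok == "]") { Closed = true; break; }
         const std::string::size_type Eq = Tok.find('=');
         if (Eq == std::string::npos)
            return false;
         Out.Options[Tok.substr(0, Eq)] = Tok.substr(Eq + 1);
      }
      if (!Closed || !(In >> Tok))
         return false;
   }

   Out.Uri = Tok;
   if (!(In >> Out.Dist))
      return false;
   while (In >> Tok)
      Out.Components.push_back(Tok);

   // A relative distribution needs at least one component.
   return Out.Dist.back() == '/' || !Out.Components.empty();
}
```

With this shape, an entry such as deb [ arch=amd64 tags=yes ] http://example.org/debian sid main contrib yields two options and two components, and the framework could then look up a plugin for each option whose value is yes.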
The Public Parser
- Will be implemented as a shared library
- Classes:
Source
- base abstract class for a Debian Archive source
provides remote source info (type, URI, distribution, components, options) and local source info (Filename, LineNumber, Comments)
- the local source info is used to rebuild the locally parsed files
- provides a write method to output the parsed source
SourceIfstream
- wrapper for an opened sources.list file
- the main purpose is to provide seek capabilities for these opened files
filter_iterator
- template class to support predicate iteration over Sources
predicates extend the abstract base class SourcePredicate, and will implement the Check(Source) method, which validates a specific source
SourcesList
- container for all the sources of an apt-get instance
- provides read primitives
- provides predicate based iterators, using custom iteration predicates
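The predicate idea can be sketched as below. This is a simplification under stated assumptions: Source is shown as a plain struct rather than an abstract class, and a Filter function replaces the planned filter_iterator template; TypeIs is a hypothetical example predicate.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the planned Source class.
struct Source {
   std::string Type, Uri, Dist;
};

// Abstract predicate, as described above: Check() validates one source.
struct SourcePredicate {
   virtual ~SourcePredicate() = default;
   virtual bool Check(const Source &S) const = 0;
};

// Hypothetical predicate: accepts sources of one type (deb or deb-src).
struct TypeIs : SourcePredicate {
   std::string Wanted;
   explicit TypeIs(std::string T) : Wanted(std::move(T)) {}
   bool Check(const Source &S) const override { return S.Type == Wanted; }
};

// Returns copies of the sources accepted by the predicate.
std::vector<Source> Filter(const std::vector<Source> &All,
                           const SourcePredicate &P)
{
   std::vector<Source> Out;
   for (const Source &S : All)
      if (P.Check(S))
         Out.push_back(S);
   return Out;
}
```

An iterator-based version would avoid the copies, but the predicate interface would stay the same.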
The Pluggable Acquire Framework
- Receives the Source objects from the public parser as input
- Constructs the necessary objects used by the acquire module
- Holds metaIndexPlugins and acquireIndexPlugins
Holds metaIndexes built from a SourcesList object
Handled Objects
metaIndex - corresponds to a Release file and all the other files that are registered to it
acquireIndex - corresponds to a single index (metadata) file in the Debian Archive
Handled Plugins
metaIndexPlugin
- builds metaIndexes (construct + merge)
builds the acquireIndexes and IndexTargets for a metaIndex
acquireIndexPlugin
- metadata type plugin
- builds the acquireIndexes for an acquireIndex
given an IndexTarget and an indexRecords::checkSum (Release file entry), builds the corresponding pkgAcquire::Item, which the acquire module uses for the download
From Source objects to the Acquire Module
- The framework receives a list of Source objects
- For each unique Type + URI + Dist, builds an individual metaIndex object
- The metaIndex object holds the information necessary to specify all the index files that need to be downloaded from the Debian Archive
- This information is stored in acquireIndex objects, built using the metaIndex file and the framework's registered acquireIndexPlugins
The main use of the acquireIndex object is to build an IndexTarget object, which is registered to the fetcher (pkgAcquire)
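The grouping step (one metaIndex per unique Type + URI + Dist) might look like the sketch below. SourceRecord, MetaIndexSketch and BuildMetaIndexes are hypothetical names; merging the components of entries that share a key is an assumption about what "construct + merge" means here.

```cpp
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Simplified stand-in for a parsed Source object.
struct SourceRecord {
   std::string Type, Uri, Dist;
   std::vector<std::string> Components;
};

// Key identifying one metaIndex: Type + URI + Dist.
using MetaKey = std::tuple<std::string, std::string, std::string>;

// Hypothetical stand-in for a metaIndex: only the merged components.
struct MetaIndexSketch {
   std::set<std::string> Components;
};

// Builds one MetaIndexSketch per unique key and merges the components
// of all sources that share that key.
std::map<MetaKey, MetaIndexSketch>
BuildMetaIndexes(const std::vector<SourceRecord> &Sources)
{
   std::map<MetaKey, MetaIndexSketch> Result;
   for (const SourceRecord &S : Sources) {
      MetaIndexSketch &M = Result[MetaKey(S.Type, S.Uri, S.Dist)];
      M.Components.insert(S.Components.begin(), S.Components.end());
   }
   return Result;
}
```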
Acquire Module internal flow
- first the Release file is downloaded, and then it is parsed to build the set of records - file, size, checksum
using the previously registered IndexTargets and these records, the fetcher decides which pkgAcquire::Items to build for download
- first, it tries to download the files using pdiffs - pkgAcqDiffIndex and pkgAcqIndexDiffs objects - which are independent of the type of the index file
- should this process fail or not be available, the fetcher builds pkgAcquire::Items to download the whole files
- the fetcher holds a reference to the pluggable acquire framework object
using the ShortDescription field of the IndexTarget and the registered metadata plugins - acquireIndexPlugins - the framework builds specific pkgAcquire::Items to download the index files
- the acquireIndexPlugin implements a subclass of pkgAcquire::Item for its specific metadata type
- this subclass defines specific behaviors for downloading the metadata piece
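The lookup from an IndexTarget's ShortDescription to a plugin-built item could be sketched as follows. All names here (ItemSketch, IndexTargetSketch, PluginRegistry) are hypothetical stand-ins; the real pkgAcquire::Item and IndexTarget classes carry much more state.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for a pkgAcquire::Item subclass provided by a plugin.
struct ItemSketch {
   std::string Description;
   virtual ~ItemSketch() = default;
};

// Stand-in for IndexTarget: only the fields this sketch needs.
struct IndexTargetSketch {
   std::string ShortDescription, Uri;
};

using ItemFactory =
   std::function<std::unique_ptr<ItemSketch>(const IndexTargetSketch &)>;

// Hypothetical registry of acquireIndexPlugins, keyed by metadata type.
class PluginRegistry {
   std::map<std::string, ItemFactory> Plugins;

public:
   void Register(const std::string &MetadataType, ItemFactory F)
   {
      Plugins[MetadataType] = std::move(F);
   }

   // Returns the plugin-built item, or nullptr when no plugin handles
   // the target's ShortDescription (the caller may then warn).
   std::unique_ptr<ItemSketch> BuildItem(const IndexTargetSketch &T) const
   {
      auto It = Plugins.find(T.ShortDescription);
      if (It == Plugins.end())
         return nullptr;
      return It->second(T);
   }
};
```

A null result corresponds to the case where a metadata type is requested but no plugin is installed for it, which is where the framework would display its warning.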