Pluggable Acquire System for APT
Codename: apt-fetcher
External Links: Project Application Launchpad
Configured Build Environment
- Chroot with Debian codename SID
- Host OS: Ubuntu Precise 12.04
- Host Architecture: i386
- Host IDE: Eclipse Indigo C++ - I will code in this IDE for its development facilities; builds will run inside the schroot
- VCS: SVN, git, BZR - I'll probably use bazaar, since this is the current VCS for the apt package
State of the Art
as presented on 21 May 2012
The Debian Archive
Location: link
- Organized in:
distributions: squeeze, wheezy, sid, experimental, etc.
sections: main, contrib, non-free
architectures: i386, amd64, mips, powerpc, etc.
PACKAGES:
located in the pool/ directory
- contains both binary and source packages, organized as described above
METADATA:
- index files
- maintainers
- uploaders
Release files: list the index files of the current distribution, together with their checksums
Contents files: mappings of files to packages - fetched by apt-file
Packages files: binary package metadata:
as described in The Debian Repository Format - fetched by apt-get
tags - fetched by debtags
- checksums - md5, sha, etc.
Sources files: source package metadata:
as described in The Debian Repository Format - fetched by apt-get
- checksums
Translations files: translations of package descriptions, used by configuration tools such as debconf
DIFF files: diff files exist for all of the above files, so synchronization with the Debian archive can be done efficiently
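The layout above translates into predictable paths under the archive root. The following is a minimal sketch of how the location of a few index files could be derived; the struct and function names are hypothetical and not part of APT, the example URI is only an illustration, and in practice the index files are usually served in compressed form (for example with a .gz suffix).

```cpp
#include <string>

// Hypothetical description of one archive location; not an APT type.
struct ArchiveLocation {
    std::string uri;        // archive root, e.g. "http://ftp.debian.org/debian"
    std::string dist;       // distribution, e.g. "sid"
    std::string component;  // section, e.g. "main"
};

// Release file of a distribution: dists/<dist>/Release
std::string releasePath(const ArchiveLocation& loc) {
    return loc.uri + "/dists/" + loc.dist + "/Release";
}

// Binary package index: dists/<dist>/<component>/binary-<arch>/Packages
std::string packagesPath(const ArchiveLocation& loc, const std::string& arch) {
    return loc.uri + "/dists/" + loc.dist + "/" + loc.component +
           "/binary-" + arch + "/Packages";
}

// Source package index: dists/<dist>/<component>/source/Sources
std::string sourcesPath(const ArchiveLocation& loc) {
    return loc.uri + "/dists/" + loc.dist + "/" + loc.component +
           "/source/Sources";
}
```

For an ArchiveLocation of sid / main and the architecture i386, packagesPath yields a path ending in dists/sid/main/binary-i386/Packages.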
The APT Source Code
mirrors the Debian Archive in /var/lib/apt/lists/
Parsing
apt-pkg/sourcelist
class pkgSourceList
parses each line of sources.list, producing an apt-pkg/metaIndex object
- a metaIndex object contains:
a URI - location of the data
a Dist
a Type - Debian section
Acquiring
apt-pkg/acquire, apt-pkg/acquire-item, apt-pkg/acquire-method, apt-pkg/acquire-worker
the general parent class is pkgAcquire::Item
it is subclassed into: pkgAcqSubIndex, pkgAcqDiffIndex, ... (apt-pkg/acquire-item classes)
Items are downloaded using Download Queues
Download Queues use Download Workers - individual processes that download items using MethodConfig objects. They are spawned using fork and communicate with the package and the system through pipes.
The MethodConfig items are essentially command-line programs used to move data from one place to another - diffindex-download, scp, rcp, cp, etc.
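The fork-and-pipe pattern described above can be illustrated with plain POSIX calls. This is only a sketch of the general mechanism, not APT's worker code: runWorker is a hypothetical helper, it passes a single argument, and it simply collects the child's standard output, whereas the real workers exchange messages with their methods over the pipes.

```cpp
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `program arg` in a child process and returns
// everything the child writes to stdout. Returns "" if pipe/fork fails.
std::string runWorker(const std::string& program, const std::string& arg) {
    int fds[2];
    if (pipe(fds) != 0) return "";

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }
    if (pid == 0) {
        // Child: redirect stdout into the pipe, then become the program.
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execlp(program.c_str(), program.c_str(), arg.c_str(),
               static_cast<char*>(nullptr));
        _exit(127);  // only reached if exec failed
    }

    // Parent: read until the child closes its end of the pipe.
    close(fds[1]);
    std::string output;
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        output.append(buf, static_cast<std::size_t>(n));
    }
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);  // reap the child so it does not linger
    return output;
}
```

A call such as runWorker("echo", "hello") spawns echo as the child and returns what it printed.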
The DEBTAGS Source Code
mirrors the Debian Archive in /var/lib/debtags/package-tags (packages and their tags) and /var/lib/debtags/vocabulary (all the tags, and their descriptions)
fetches the files using a python script called debtags-fetch
- gives the possibility of different fetching types - local files, http client, apt sources, etc.
The AppStream Project
Described in The AppStream Debian Proposal
"We propose that the indices are provided alongside the packages files, that is in dists/<SUITE>/<COMPONENT>?/ComponentMetadata.xy and compressed with whatever ftpmaster wish to compress with. This file will only be downloaded on demand, e.g. the Software Center could download it. On servers this file is not really required."
Suggested Design Guidelines
- For each line in sources.list, we keep track of desired metadata in sources.list.d/
- All the supported metadata is located in the remote Debian Archive
- We probably won't need any new pkgAcquire::Item subclasses
We will need to reorganize the sources.list parsing, and create a separate parsing framework. This will provide an extensible plugin model, to do the following:
Identify the type of the metadata - text / icons (supported in AppStream)
- Where to get it from
- Where to store it
- ... other stuff ...
Retrieve enhanced apt-pkg/metaIndex objects
- These new metaIndex objects should be compatible with the rest of the current apt code, so integration won't break backwards compatibility
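One possible shape for the plugin model outlined above is an abstract interface that answers the three questions (which entry type, where to get the data, where to store it), plus a registry keyed by entry type. Everything in this sketch is an assumption made for illustration: MetadataPlugin, PluginRegistry and ExamplePlugin are hypothetical names, and the "deb-example" type and both paths are placeholders rather than real archive or filesystem locations.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical plugin interface; not part of apt-pkg.
class MetadataPlugin {
public:
    virtual ~MetadataPlugin() = default;
    // The sources.list entry type this plugin handles.
    virtual std::string entryType() const = 0;
    // Where to get the metadata from, given an archive URI and distribution.
    virtual std::string remotePath(const std::string& uri,
                                   const std::string& dist) const = 0;
    // Where to store the metadata locally.
    virtual std::string localPath() const = 0;
};

// Placeholder plugin: type and paths are invented for the example.
class ExamplePlugin : public MetadataPlugin {
public:
    std::string entryType() const override { return "deb-example"; }
    std::string remotePath(const std::string& uri,
                           const std::string& dist) const override {
        return uri + "/dists/" + dist + "/ExampleMetadata";
    }
    std::string localPath() const override { return "/tmp/example-metadata"; }
};

// Maps entry types to the plugin responsible for them.
class PluginRegistry {
public:
    void add(std::unique_ptr<MetadataPlugin> plugin) {
        std::string type = plugin->entryType();
        plugins_[type] = std::move(plugin);
    }
    // Returns nullptr when no plugin is registered for the type.
    const MetadataPlugin* find(const std::string& type) const {
        auto it = plugins_.find(type);
        return it == plugins_.end() ? nullptr : it->second.get();
    }
private:
    std::map<std::string, std::unique_ptr<MetadataPlugin>> plugins_;
};
```

Keeping the interface this small means a new metadata type only needs a new plugin class; the registry and the code that walks over the entries would stay unchanged.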
The sources.list Format
the current format is described in man sources.list:
multiple entries, one per line, using the format: type [ options ] uri distribution [ component1 ] [ component2 ] ...
type can be deb or deb-src
options is optional and can be arch or trusted - unknown options are silently ignored
- uri specifies the base of a Debian Archive (Debian Repository)
- distribution can be an exact path ending in a slash (in which case components must be omitted) or a release name such as sid (in which case at least one component must be specified)
component may refer to one or more of the Debian sections main, contrib and non-free
the sources.list.d/ folder may contain other .list files, with the same format - other files are silently ignored
- ENHANCEMENTS:
we create support for new types of entries, with the same format, but with additional types
additional types may be deb-tags, deb-contents, deb-appstream, etc.
we store these new entries in new .plugin files in sources.list.d/
- the public parser will be capable of assimilating these entries, along with other possible options
- the public parser will provide an API to access the read entries
- the public parser is not responsible for understanding the types of metadata; it just passes the entries on to the pluggable acquire framework
- the pluggable acquire framework will then map the entries to the corresponding plugin, according to the entry type
e.g. deb and deb-src to apt-get, deb-tags to debtags, deb-contents to apt-file, etc.
- each plugin specifies how to acquire the specific metadata: where to get it from, where to store it, etc.
a single apt-get update will fetch the metadata for all active plugins and configured sources -> SUCCESS!!
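Because the new entry types reuse the existing line format, one tokenizer can serve all of them. Below is a minimal sketch of such a line parser; ParsedLine and parseLine are hypothetical names, the parser does not interpret the type at all, and it assumes the option brackets are separated from their contents by whitespace ("[ arch=amd64 ]"). It treats everything after a '#' as a comment.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for one sources.list line.
struct ParsedLine {
    std::string type;                     // deb, deb-src, deb-tags, ...
    std::vector<std::string> options;     // raw "name=value" tokens
    std::string uri;
    std::string dist;
    std::vector<std::string> components;
};

// Parses "type [ options ] uri distribution [component ...]".
// Returns false for blank lines, comments, and lines missing required fields.
bool parseLine(const std::string& line, ParsedLine& out) {
    // Drop a trailing comment, then split on whitespace.
    std::istringstream in(line.substr(0, line.find('#')));
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) tokens.push_back(tok);
    if (tokens.empty()) return false;

    ParsedLine result;
    std::size_t i = 0;
    result.type = tokens[i++];

    // Optional bracketed option block.
    if (i < tokens.size() && tokens[i] == "[") {
        ++i;
        while (i < tokens.size() && tokens[i] != "]") {
            result.options.push_back(tokens[i++]);
        }
        if (i == tokens.size()) return false;  // missing closing bracket
        ++i;                                   // skip "]"
    }

    if (i + 2 > tokens.size()) return false;   // need uri and distribution
    result.uri = tokens[i++];
    result.dist = tokens[i++];
    while (i < tokens.size()) result.components.push_back(tokens[i++]);

    out = result;
    return true;
}
```

With this sketch, a line such as "deb-tags http://ftp.debian.org/debian sid main" parses exactly like a deb line; only the type field differs, which is what lets the acquire framework pick the plugin later.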
The Public Parser
- Will be implemented as a shared library
- Handled objects:
class Option
string Name
vector<string> Values
class Entry
string File - for file(s) reconstruction
string Type - with special values __Blank and __Comment for file reconstruction
vector<Option> Options
string URI
string Dist
vector<string> Components
- We need to store Entry objects in containers so that we provide the following facilities:
reproduction of the initial file(s) - the entries must be accessed in the order they are read
delegation to the pluggable acquire framework - the entries must be accessed ordered by type
identification based on the URI - we must order the resources according to their URIs
- The shared library will be organized as an "Entry-base", initially populated with the contents of the sources.list file(s), and then accessed by the pluggable acquire framework
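The three access patterns listed above can be served by one container that keeps the entries in read order and maintains two secondary indexes. The sketch below is one possible arrangement and not a committed design: Option and Entry follow the fields listed above, while EntryBase and its member names are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Option {
    std::string Name;
    std::vector<std::string> Values;
};

struct Entry {
    std::string File;   // file the entry was read from
    std::string Type;   // deb, deb-src, or a special value for blank/comment lines
    std::vector<Option> Options;
    std::string URI;
    std::string Dist;
    std::vector<std::string> Components;
};

// Hypothetical "Entry-base": read-order storage plus type and URI indexes.
class EntryBase {
    typedef std::multimap<std::string, std::size_t> Index;

public:
    void add(const Entry& e) {
        std::size_t position = entries_.size();
        entries_.push_back(e);
        byType_.insert(std::make_pair(e.Type, position));
        byURI_.insert(std::make_pair(e.URI, position));
    }

    // Entries in the order they were read, for file reconstruction.
    const std::vector<Entry>& inFileOrder() const { return entries_; }

    // Entries of one type, for delegation to the matching plugin.
    std::vector<Entry> ofType(const std::string& type) const {
        return collect(byType_, type);
    }

    // Entries that share one archive URI.
    std::vector<Entry> atURI(const std::string& uri) const {
        return collect(byURI_, uri);
    }

private:
    std::vector<Entry> collect(const Index& index, const std::string& key) const {
        std::vector<Entry> result;
        std::pair<Index::const_iterator, Index::const_iterator> range =
            index.equal_range(key);
        for (Index::const_iterator it = range.first; it != range.second; ++it) {
            result.push_back(entries_[it->second]);
        }
        return result;
    }

    std::vector<Entry> entries_;  // read order
    Index byType_;                // Type -> position in entries_
    Index byURI_;                 // URI  -> position in entries_
};
```

The indexes store positions rather than copies, so the read-order vector stays the single source of truth, and entries with equal keys come back in the order they were added.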
( TO BE CONTINUED )