Revision 24 as of 2012-04-06 10:44:56

Student Application Template

  • Name: Bogdan Purcăreață

  • Contact/Email:

    • IRC: dodgerblue on OFTC, Freenode

    • bogdan.purcareata AT gmail DOT com
    • Romania, UTC + 2
  • Background:

    • Who are you?

      • Senior undergraduate student in Computer Science at the Computer Science and Automatic Control Faculty, Politehnica University of Bucharest.
      • Highly dependable and efficiency-oriented professional.
      • Ambitious, focused and enduring individual.
    • Technical skills / Known technologies

      • Knowledge of C (8 years), as well as C++, Java and Python.

      • Knowledge of Algorithm Design, Data Structures and Project Development Workflow.

      • Knowledge of Compilers, Operating Systems, Networking, Distributed Systems, Systems Programming and Shell Scripting (Bash).

    • Experience

      • Open Source Development Course - I attended this course [1], organized by ROSEdu [2], where I learned about Version Control, Bug Tracking, Netiquette, Unit Testing, Debugging, Security Practices, Licensing and most other aspects of Open Source development and communities.

      • Ixia Internship - ported a desktop client for a network traffic test engine to the Android platform.

      • GPGPU Workshop - GPU programming in C++, held as a summer school at my university.

      • Algorithm Design Course - developed a chess engine that plays using artificial intelligence techniques.

    • What makes you the best person to work on this project?

      • Recently I set up a local Debian build environment running the latest sid (unstable) release and helped fix some minor bugs: [3] (Debian Bug #639008), [4] (Debian Bug #665833), [5] (Debian Bug #649340).

      • Refactoring, optimizing and unifying APT's metadata acquire system would significantly improve the Debian user experience, as well as the consistency of the package management system.
      • To me, this is both a thrilling challenge and a great opportunity to study the internals and core features of Debian.
      • I've been using Debian and Ubuntu for 4 years now, both for my university assignments and to explore the internals of the operating system.


  • Project title: Pluggable Acquire-System for APT

  • Project details:

    • Debian has developed several tools to manage packages, each with its own way of handling metadata. The result is a fragmented package management system, prone to inconsistencies and reduced overall performance. The aim of this project is to build a comprehensive local image of the package metadata, so that all the information is kept in one place and updated at the same time. By choosing which tools to use, the user tells the manager which metadata to acquire, and therefore how detailed a local image of the Debian Archive they want to work with.
  • Synopsis:

    • Current model:

      • Several tools for package management - apt-get, debtags, apt-file.

      • Each one handles its own set of package metadata, therefore its own view of the remote Debian Archive:
        • apt-get: Release, Packages, Sources, Translations.

        • apt-file: Contents.

        • debtags: Debtags (facets and tags).

      • Each one requires individual updating and interaction.
      • apt-get parses the sources.list file with its own private parser.
    • Desired goals

      • Performance: minimal bandwidth usage, achieved through differential metadata retrieval and consistent storage. The current technique of fetching only the changes to the metadata (pdiff) will be retained in the new system.

      • Effectiveness: the user defines which components the acquire system will use, and the system uses only those components. By default, the system provides the existing apt-get update functionality.

      • Scalability: the system is pluggable. Each time a package tool is installed (e.g. apt-file, debtags), it registers itself with the system and activates the corresponding plugin.

      • Forward compatibility: the system will have a generic design, open to future development, especially in the plugin area. I think it would be a good idea to expose a plugin development API, so that developers of new package tools can use this unified metadata acquire system to handle their metadata as well.

      • Backward compatibility: the system should not break existing interfaces, and the transition to the new system should be as transparent as possible to the user.

      • Openness: provide a public parser for the sources.list file that other package management tools can use instead of writing their own.

    • Proposed Architecture:

      • [Figure: proposed architecture diagram]

  • Main components:

    • The Enhanced sources.list File:

      • stores additional information for each URL besides the suite and areas (components), e.g. the enabled plugins.
      • must remain compatible with the private parser currently used by apt-get update, which only understands the present format; this can be addressed either in the design of the new sources.list format or by patching the current parser.
      • another way to extend this static file is to store additional plugin information as snippets in a separate directory, e.g. /etc/apt/sources.list.d/plugins/. The parser will scan the contents of this directory to determine what additional metadata to fetch.

    • The Enhanced Public Parser:

      • is implemented using the libapt-pkg API.
      • provides a parsing API for package management front-ends.
      • represents the pluggable component of the system: plugins are registered at install time with a default configuration, and the user can manage them through a configuration interface.
      • provides the functionality of the existing apt-get update parser by default.
      • supports a generic plugin model for new types of metadata.
      • there are two ways of categorizing the plugins:
        • by the metadata the current tools handle: basic package metadata, apt-file metadata, AppStream metadata.

        • by type of metadata: Release, Packages, Sources, Contents, Tags, Components (AppStream).

      • I suggest using the second one, since it offers finer granularity.

    • The Unified Metadata Backend:

      • invokes the parser to build an index of desired metadata to fetch from the Debian Archive.

      • is responsible for fetching the metadata from the Debian Archive, processing it and passing it on to apt-get update.

      • implements efficient metadata transport mechanisms - e.g. pdiff.

      • implements security enforcement mechanisms, e.g. GnuPG signature verification of the Release files.

Expected Results

  • Deliverables:

    • A new and enhanced format for the sources.list file and the additional information (the enhanced sources.list file).

    • A public, pluggable parser, capable of understanding this new format (the enhanced public parser).

    • An intuitive configuration interface for the parser's plugin management (the enhanced public parser interface).

    • An efficient and secure acquire logic (metadata backend).

    • A generic, extensible model for a plugin: what it handles, how it handles it and when the information changes (generic plugin).

    • Plugins for present tools - apt-get, apt-file, debtags, ... (specific plugins).

    • (Possible plugin for AppStream).

    • All of the above would result in a powerful apt-get update tool capable of handling all OS package management metadata in a structured and coherent way.

  • Benefits to Debian OS:

    • Improving the package management system translates into improving the fundamental layer of the Debian OS.
    • Better bandwidth usage.
    • Fewer configuration and temporary files, all kept in one place.
    • Scalability of the metadata acquire system.
    • Better metadata cohesion.
    • Future package management tools won't be forced to build their own metadata framework; they will only need to provide a plugin.
  • Benefits to Debian Community:

    • Popularity through usability.
    • Popularity through performance.
    • Integration with other communities through AppStream.

Suggested Timeline

  • April 23 - May 21:

    • Administrative Tasks

      • Get in touch with the mentor and discuss the project and its details.
      • Install a local build environment and required development tools.
      • Get familiar with the Debian community and development model.
    • General Research

      • Debian source code structure.
      • C++ is a very powerful language: how many of its advanced features do DDs use, and do I need to improve my knowledge of the language to understand the code?
      • Security issues - authentication, authorization, types of attacks, data integrity.
      • Efficiency issues - responsiveness, bandwidth usage, caching.
    • Research State of the Art

      • The present package metadata acquire logic.
      • The format of the configuration files.
      • The relationships between different pieces of metadata.
      • The Debian Archive format.
      • The AppStream specs and metadata.

    • Design Tasks

      • Static configuration format (sources.list and other info).
      • Parser model and API.
      • Generic plugin model (and API).
      • Metadata acquire logic and technologies.
  • May 21 - July 13

    • MILESTONE 1: apt-get update with default functionality, using the new backend and parser

      • Week 1: May 21 - May 27: sources.list format definition; basic parser interface definition and implementation.

      • Week 2: May 28 - June 3: basic acquire logic and communication security for the backend; integration with the parser.

      • Week 3: June 4 - June 10: integration with apt-get update; simple test cases.

    • MILESTONE 2: support for first plugin

      • Week 4: June 11 - June 17: generic plugin model definition.

      • Week 5: June 18 - June 24: generic model primitives implementation and test cases.

      • Week 6: June 25 - Jul 1: integration with one packaging tool - suggestion: apt-file; testing.

    • MILESTONE 3: backend completion

      • Week 7: Jul 2 - Jul 8: backend optimization of the acquire logic.

      • Week 8: Jul 9 - Jul 13: backend security mechanisms.

    • At this point there should be a complete proof of concept of the new model's functionality, along with a set of tests.

  • July 13 - August 13

    • MILESTONE 4: parser completion

      • Week 9: Jul 16 - Jul 22: parser module and generic plugin primitives final implementation; testing.

    • MILESTONE 5: other plugin support

      • Week 10: Jul 23 - Jul 29: one additional plugin will be implemented: either a debtags plugin, to support an existing tool, or an AppStream plugin.

    • MILESTONE 6: deliverable package

      • Week 11: Jul 30 - Aug 5: user interface for the parser.

      • Week 12: Aug 6 - Aug 12: packaging; Doxygen documentation generated from source comments.

  • August 13 - hard deadline

    • Final touches: system testing, code refactoring, etc.


  • Exams and other commitments:

    • During this period I'm working on my Bachelor's thesis project, which is expected to be finished by May 30. Until then, I estimate I can dedicate 20-30 hours per week to GSoC documentation and community bonding. After that date, GSoC will be my main focus.
  • Other summer plans:

    • This is not certain, but I'm planning a trip abroad this summer, lasting a week at most. During that time I won't be able to work on GSoC.
  • Why Debian?:

    • My first real contact with Linux: I had tried Slackware and SUSE before, but didn't find them as easy to learn.

    • Open Source - powerful in learning new technologies and meeting new professionals.
    • Used worldwide: currently one of the most popular Linux distributions (along with Ubuntu, which is itself based on Debian).
    • C/C++ - my first programming playground, and the one I'm most familiar with.
  • Are you applying for other projects in SoC? No.