Differences between revisions 6 and 28 (spanning 22 versions)
Revision 6 as of 2012-04-01 09:33:04
Size: 5051
Comment:
Revision 28 as of 2012-04-06 14:38:14
Size: 13554
Comment:
Line 8: Line 8:
  * bogdan.purcareata AT gmail DOT com
 * Romania, UTC + 2
Line 10: Line 11:
  * senior undergraduate student at Computer Science and Automatic Control Faculty, Politehnica University of Bucharest
  * knowledge in ''C'' (8 years), as well as ''C++'', ''Java'' and ''Python''
  * knowledge in ''Algorithm Design'', ''Data Structures'' and ''Project Development Workflow''
  * knowledge in ''Compilers'', ''Operating Systems'', ''Networking'', ''Distributed Systems''
  * highly dependable and efficiency oriented professional
  * ambitious, focused and enduring individual
  * familiar with the concepts of Open Source Development - I've pursued an Open Source Development Course[1], organized by ROSEdu[2]
  * refactoring, optimizing and unifying the metadata acquire system for APT would significantly improve the whole Debian user experience, as well as improve the Package Management System's consistency. To me, this is both a thrilling challenge and a great opportunity to analyze the Debian OS internals and core features.
  * ''Who are you?''
   * Senior undergraduate student in Computer Science at the Computer Science and Automatic Control Faculty, University POLITEHNICA of Bucharest.
   * Highly dependable and efficiency oriented professional.
   * Ambitious, focused and enduring individual.
   * More on my [[attachment:CV_BogdanPurcareata_English.pdf | Personal CV]].
  * ''Technical skills / Known technologies''
   * Knowledge in ''C'' (8 years), as well as ''C++'', ''Java'' and ''Python''.
   * Knowledge in ''Algorithm Design'', ''Data Structures'' and ''Project Development Workflow''.
   * Knowledge in ''Compilers'', ''Operating Systems'', ''Networking'', ''Distributed Systems'', ''System programming'', ''Shell Scripting (BASH)''.
  * ''Experience''
   * '''Open Source Development Course''' - I've pursued this course [[ http://cdl.rosedu.org/2012/english | [1]]], organized by ROSEdu [[ http://rosedu.org/ | [2]]]. I've had the chance to learn concepts about ''Version Control'', ''Bug Tracking'', ''Netiquette'', ''Unit Testing'', ''Debugging'', ''Security Practices'', ''Licensing'' and pretty much everything related to ''Open Source development and communities''.
   * '''Ixia Internship''' - ported a desktop client for a network traffic test engine on the Android platform.
   * '''GPGPU Workshop''' - nVIDIA CUDA GPU C++ programming, held as a Summer School at our University.
   * '''Algorithm Design Course''' - developed a ''chess engine'', equipped with artificial intelligence.
  * '''What makes you the best person to work on this project?'''
   * Recently I've installed a Debian build environment locally with the latest sid release, and contributed to solving some minor bugs: [[ http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=639008 | [3]]], [[ http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=665833 | [4]]], [[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649340 | [5]]].
   * Refactoring, optimizing and unifying the metadata acquire system for APT would significantly improve the whole Debian user experience, as well as improve the Package Management System's consistency. To me, this is both a thrilling challenge and a great opportunity to analyze the Debian OS internals and core features.
   * I've been using Debian and Ubuntu for 4 years now, both for my university assignments and for the introspection of the operating system.

== Project ==
Line 19: Line 32:
 * '''Project details''': TODO
 * '''Synopsis''': TODO
 * '''Benefits to Debian''': TODO
 * '''Project details''':
  Debian has developed several tools to manage packages, each with its own way of handling metadata. The result is a fragmented package management system, prone to inconsistencies and to a loss in overall OS performance. The aim of this project is to build one broad local image of the package metadata, so that all the information is kept in one place and updated at the same time. By choosing which tools to use, the user tells the manager which metadata to acquire, and therefore how detailed a local image of the Debian Archive to work with.
 * '''Synopsis''':
  * ''Current model'':
   * Several tools for package management - ''apt-get'', ''debtags'', ''apt-file''.
   * Each one handles its own set of package metadata, therefore its own view of the remote Debian Archive:
    * '''apt-get''': Release, Packages, Sources, Translations.
    * '''apt-file''': Content.
    * '''debtags''': Debtags (facets and tags).
   * Each one requires individual updating and interaction.
   * Private parsing of the sources.list file in apt-get.
  * ''Desired goals''
   * '''Performance''': minimal bandwidth usage achieved by metadata differential retrieval and consistent storage. The current technique of fetching only the changes in metadata - '''pdiff''' - will be present in the new system.
   * '''Effectiveness''': the user defines which components the acquire system will use, and the system only uses these components. By default, the system comes with the old ''apt-get update'' functionality.
   * '''Scalability''': the system is pluggable. Each time a package tool is installed - e.g. ''apt-file'', ''debtags'' - it registers itself to this system, and activates the corresponding plugin.
   * '''Forward compatibility''': the system will have a generic design, open for future development, especially in the plugin area. I think it would be a good idea to '''expose a plugin development API''', so new package tools developers can make use of this unified metadata acquire system to handle their metadata as well.
   * '''Backward compatibility''': it is desired that the system doesn't break the existing interfaces, and the transition to the new system is as transparent as possible to the user.
   * '''Openness''': providing a public parser for the sources.list file, that other package management tools can use instead of inventing their own.
  * ''Proposed Architecture'':
    {{attachment:PluggableAcquireSystemforAPT-ArchitectureDiagram.png|width=800}}
<<BR>><<BR>><<BR>><<BR>><<BR>><<BR>><<BR>><<BR>><<BR>><<BR>><<BR>><<BR>><<BR>>
<<BR>><<BR>><<BR>><<BR>><<BR>>
  * ''Main components'':
   * '''The Enhanced sources.list File''':
    * stores additional information for each URL, besides suite and area - e.g. the enabled plugins.
    * remains compatible with the current private parser of apt-get update, which understands only the present format - this can be achieved either through the design of the new sources.list format or through a patch to the current parser.
    * another option to enhance the functionality of this static file is to store additional plugin info as snippets in a separate directory - e.g. ''/etc/apt/sources.list.d/plugins/''. The parser will scan the contents of this directory to determine what additional metadata to fetch.
   * '''The Enhanced Public Parser''':
    * is implemented using the libapt API.
    * provides a parsing API for the package management tools frontends.
    * represents the pluggable component of the system - plugins are registered at install time with a default configuration. The user may handle plugin management via an interface.
    * comes with apt-get update old parser functionality by default.
    * supports a generic plugin model for new types of metadata.
    * there are two ways of categorizing the plugins:
     * ''per current tools' handled metadata'': '''basic packages''' metadata, '''apt-file''' metadata, '''[[AppStreamDebianProposal | AppStream]]''' metadata.
     * ''per type of metadata'': '''Release''', '''Packages''', '''Sources''', '''Content''', '''Tags''', '''Components''' ([[AppStreamDebianProposal | AppStream]]).
    * ''I suggest the second one is used, due to finer granularity''.
   * '''The Unified Metadata Backend''':
    * invokes the parser to build an ''index'' of desired metadata to fetch from the ''Debian Archive''.
    * is responsible for fetching the metadata from the ''Debian Archive'', processing it, and returning it to ''apt-get update''.
    * implements efficient metadata transport mechanisms - e.g. '''pdiff'''.
    * implements security enforcement mechanisms - e.g. '''GnuPG'''.
== Expected Results ==
Line 23: Line 76:
  * A consistent configuration file / set of files for all Debian package metadata tools
  * An insightful configuration interface
  * An efficient and secure acquire logic
  * A parser for the sources.list file, to build the main config file (and others)
  * A generic, extensible model for a plugin - what it handles, how does it handle it, when does the information change
  * Plugins for present tools - apt-get, apt-file, debtags, ...
  * (Possible plugin for AppStream)
  * All of the above would result in '''a powerful apt-get update tool capable of handling all OS package management metadata in a structured and coherent way'''
 * '''Project schedule''':
  * A new and enhanced format for the sources.list file and the additional information (''the enhanced sources.list file'').
  * A public, pluggable parser, capable of understanding this new format (''the enhanced public parser'').
  * An insightful configuration interface for the parser's plugin management (''the enhanced public parser interface'').
  * An efficient and secure acquire logic (''metadata backend'').
  * A generic, extensible model for a plugin - what it handles, how it handles it, and when the information changes (''generic plugin'').
  * Plugins for present tools - apt-get, apt-file, debtags, ... (''specific plugins'').
  * (Possible plugin for [[AppStreamDebianProposal | AppStream]]).
  * All of the above would result in '''a powerful apt-get update tool capable of handling all OS package management metadata in a structured and coherent way'''.
 * '''Benefits to Debian OS''':
  * Improving the package management system translates into improving the fundamental layer of the Debian OS.
  * Better bandwidth usage.
  * Fewer configuration and temporary files, all kept in one place.
  * Scalability of the metadata acquire system.
  * Better metadata cohesion.
  * Future package management tools won't be forced to build their own metadata framework - they will only need to provide a plugin.
 * '''Benefits to Debian Community''':
  * Popularity through usability.
  * Popularity through performance.
  * Integration with other communities through [[AppStreamDebianProposal | AppStream]].
== Suggested Timeline ==
Line 34: Line 98:
    * get in touch with the mentor
    * install a local build environment
    * get familiar with the Debian community and development model
    * Get in touch with the mentor and discuss the project and its details.
    * Install a local build environment and required development tools.
    * Get familiar with the Debian community and development model.
Line 38: Line 102:
    * Debian source code structure
    * C++ is a very powerful language - how much of its cutting-edge features are used by DD, do I need to improve language knowledge cu cope with understanding the code?
    * Security issues - authentication, authorization, types of attacks, data integrity
    * Efficiency issues - responsiveness, bandwidth usage, caching
    * Debian source code structure.
    * C++ is a very powerful language - how many of its cutting-edge features do DDs use, and do I need to improve my knowledge of the language to understand the code?
    * Security issues - authentication, authorization, types of attacks, data integrity.
    * Efficiency issues - responsiveness, bandwidth usage, caching.
Line 43: Line 107:
    * the present package metadata acquire logic
    * the format of the configuration files
    * the relationships between different pieces of metadata
    * the Debian Archive format
    * the AppStream specs and metadata
    * The present package metadata acquire logic.
    * The format of the configuration files.
    * The relationships between different pieces of metadata.
    * The Debian Archive format.
    * The [[AppStreamDebianProposal | AppStream]] specs and metadata.
Line 49: Line 113:
    * configuration model
    * acquire-logic
    * plugins model
    * parser model
    * Static configuration format (sources.list and other info).
    * Parser model and API.
    * Generic plugin model (and API).
    * Metadata acquire logic and technologies.
Line 54: Line 118:
   * implement the configuration model and acquire logic - first draft
   * implement the metadata acquire backend - first draft
   * integrate apt-get update with the new metadata acquire backend - assert functionality
   * implement the first supported plugin
   * integrate apt-get update with the backend using this plugin - assert functionality
   * implement sources.list parser - first draft
   * generate configuration file using the parser - assert functionality
   * ''at this point there should be a complet proof of concept upon the functionality of the new model''
   * MILESTONE 1: '''apt-get update with default functionality, using the new backend and parser'''
    * ''Week 1: May 21 - May 27'': sources.list format definition; basic parser interface definition and implementation.
    * ''Week 2: May 28 - June 3'': basic acquire logic and communication security for the backend; integration with the parser.
    * ''Week 3: June 4 - June 10'': integration with apt-get update; simple test cases.
   * MILESTONE 2: '''support for first plugin'''
    * ''Week 4: June 11 - June 17'': generic plugin model definition.
    * ''Week 5: June 18 - June 24'': generic model primitives implementation and test cases.
    * ''Week 6: June 25 - Jul 1'': integration with one packaging tool - suggestion: ''apt-file''; testing.
   * MILESTONE 3: '''backend completion'''
    * ''Week 7: Jul 2 - Jul 8'': backend optimization of the acquire logic.
    * ''Week 8: Jul 9 - Jul 13'': backend security mechanisms.
   * ''At this point there should be a complete proof of concept of the new model's functionality, along with a set of tests''.
Line 63: Line 131:
   * configuration interface for the user
   * support for other plugins
   * sources.list parser - final implementation
   * configuration and acquire logic - final implementation
   * metadata acquire backend - final implementation
   * MILESTONE 4: '''parser completion'''
    * ''Week 9: Jul 16 - Jul 22'': parser module and generic plugin primitives final implementation; testing.
   * MILESTONE 5: '''other plugin support'''
    * ''Week 10: Jul 23 - Jul 29'': one additional plugin will be implemented. We can go with implementing the debtags plugin - to support an existing tool - or with [[AppStreamDebianProposal | AppStream]].
   * MILESTONE 6: '''deliverable package'''
    * ''Week 11: Jul 30 - Aug 5'': user interface for the parser.
    * ''Week 12: Aug 6 - Aug 12'': packaging, doxygen documentation from source comments.
Line 69: Line 139:
   * final touches
   * Final touches: system testing, code refactoring, etc.
== Miscellaneous ==
Line 71: Line 142:
   During this period I'm developing my Bachelor Thesis project, which is expected to be done until de 30th of May. After this date, my main interest will switch to GSoC.
   During this period I'm developing my Bachelor Thesis project, which is expected to be finished by the 30th of May. I expect to dedicate 20 - 30 h / week to GSoC documentation and community bonding. After this date, my main focus will switch to GSoC.
Line 73: Line 144:
  * this is not certain, but I'm planning to leave the country for a few days on a trip this summer - a week at most. During this time, I won't be able to code at GSoc.
  * This is not certain, but I'm planning to leave the country for a few days on a trip this summer - a week at most. During this time, I won't be able to code at GSoC.
Line 75: Line 146:
  * my first ''real'' contact with Linux - I had a first try with Slackware, but I failed miserably.
  * My first ''real'' contact with Linux - I had a first try with Slackware and SUSE, but I didn't find them that easy to learn.
Line 77: Line 148:
  * used worldwide - currently one of the most popular Linux distros (along with Ubuntu, which is basically Debian as well).
  * Used worldwide - currently one of the most popular Linux distros (along with Ubuntu, which is basically Debian as well).
Line 80: Line 151:

  [[http://cdl.rosedu.org/2012/english | 1 - Open Source Development Course]]
  [[http://rosedu.org/ | 2 - Romanian Open Source Education]]
  <<BR>>
  [1] [[http://cdl.rosedu.org/2012/english | Open Source Development Course]] <<BR>>
  [2] [[http://rosedu.org/ | Romanian Open Source Education]] <<BR>>
  [3] [[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=639008 | Debian Bug #639008]] <<BR>>
  [4] [[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=665833 | Debian Bug #665833]] <<BR>>
  [5] [[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649340 | Debian Bug #649340]] <<BR>>

Student Application Template

  • Name: Bogdan Purcăreață

  • Contact/Email:

    • IRC: dodgerblue on OFTC, Freenode

    • bogdan.purcareata AT gmail DOT com
    • Romania, UTC + 2
  • Background:

    • Who are you?

      • Senior undergraduate student in Computer Science at the Computer Science and Automatic Control Faculty, University POLITEHNICA of Bucharest.
      • Highly dependable and efficiency oriented professional.
      • Ambitious, focused and enduring individual.
      • More on my Personal CV.

    • Technical skills / Known technologies

      • Knowledge in C (8 years), as well as C++, Java and Python.

      • Knowledge in Algorithm Design, Data Structures and Project Development Workflow.

      • Knowledge in Compilers, Operating Systems, Networking, Distributed Systems, System programming, Shell Scripting (BASH).

    • Experience

      • Open Source Development Course - I've pursued this course [1], organized by ROSEdu [2]. I've had the chance to learn concepts about Version Control, Bug Tracking, Netiquette, Unit Testing, Debugging, Security Practices, Licensing and pretty much everything related to Open Source development and communities.

      • Ixia Internship - ported a desktop client for a network traffic test engine on the Android platform.

      • GPGPU Workshop - nVIDIA CUDA GPU C++ programming, held as a Summer School at our University.

      • Algorithm Design Course - developed a chess engine, equipped with artificial intelligence.

    • What makes you the best person to work on this project?

      • Recently I've installed a Debian build environment locally with the latest sid release, and contributed to solving some minor bugs: [3], [4], [5].

      • Refactoring, optimizing and unifying the metadata acquire system for APT would significantly improve the whole Debian user experience, as well as improve the Package Management System's consistency. To me, this is both a thrilling challenge and a great opportunity to analyze the Debian OS internals and core features.
      • I've been using Debian and Ubuntu for 4 years now, both for my university assignments and for the introspection of the operating system.

Project

  • Project title: Pluggable Acquire-System for APT

  • Project details:

    • Debian has developed several tools to manage packages, each with its own way of handling metadata. The result is a fragmented package management system, prone to inconsistencies and to a loss in overall OS performance. The aim of this project is to build one broad local image of the package metadata, so that all the information is kept in one place and updated at the same time. By choosing which tools to use, the user tells the manager which metadata to acquire, and therefore how detailed a local image of the Debian Archive to work with.
  • Synopsis:

    • Current model:

      • Several tools for package management - apt-get, debtags, apt-file.

      • Each one handles its own set of package metadata, therefore its own view of the remote Debian Archive:
        • apt-get: Release, Packages, Sources, Translations.

        • apt-file: Content.

        • debtags: Debtags (facets and tags).

      • Each one requires individual updating and interaction.
      • Private parsing of the sources.list file in apt-get.
    • Desired goals

      • Performance: minimal bandwidth usage achieved by metadata differential retrieval and consistent storage. The current technique of fetching only the changes in metadata - pdiff - will be present in the new system.

      • Effectiveness: the user defines which components the acquire system will use, and the system only uses these components. By default, the system comes with the old apt-get update functionality.

      • Scalability: the system is pluggable. Each time a package tool is installed - e.g. apt-file, debtags - it registers itself to this system, and activates the corresponding plugin.

      • Forward compatibility: the system will have a generic design, open for future development, especially in the plugin area. I think it would be a good idea to expose a plugin development API, so new package tools developers can make use of this unified metadata acquire system to handle their metadata as well.

      • Backward compatibility: it is desired that the system doesn't break the existing interfaces, and the transition to the new system is as transparent as possible to the user.

      • Openness: providing a public parser for the sources.list file, that other package management tools can use instead of inventing their own.
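The Effectiveness and Scalability goals above can be illustrated with a minimal Python sketch. It is not APT code: the `AcquireRegistry` class and all its method names are hypothetical, and only the tool names and metadata types come from the lists above. It assumes that each installed tool registers the metadata types it needs and that the default set matches what apt-get update already fetches.

```python
# Hypothetical sketch: none of these names exist in APT.
DEFAULT_METADATA = ("Release", "Packages", "Sources", "Translations")


class AcquireRegistry:
    """Tracks which metadata types the acquire system should fetch."""

    def __init__(self):
        # Start with the metadata that apt-get update already handles.
        self._enabled = {"apt-get": set(DEFAULT_METADATA)}

    def register(self, tool, metadata_types):
        """Called when a package tool is installed; activates its plugin."""
        self._enabled[tool] = set(metadata_types)

    def unregister(self, tool):
        """Called when a tool is removed; the default set is never dropped."""
        if tool != "apt-get":
            self._enabled.pop(tool, None)

    def enabled_types(self):
        """Return the sorted union of metadata types requested by all tools."""
        types = set()
        for wanted in self._enabled.values():
            types |= wanted
        return sorted(types)


registry = AcquireRegistry()
registry.register("apt-file", ["Content"])
registry.register("debtags", ["Debtags"])
print(registry.enabled_types())
```

With only the default entry present, the registry describes the old apt-get update behaviour; every additional tool simply widens the set of metadata that a single update run acquires.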

    • Proposed Architecture:

      • [Figure: architecture diagram (PluggableAcquireSystemforAPT-ArchitectureDiagram.png)]

  • Main components:

    • The Enhanced sources.list File:

      • stores additional information for each URL, besides suite and area - e.g. the enabled plugins.
      • remains compatible with the current private parser of apt-get update, which understands only the present format - this can be achieved either through the design of the new sources.list format or through a patch to the current parser.
      • another option to enhance the functionality of this static file is to store additional plugin info as snippets in a separate directory - e.g. /etc/apt/sources.list.d/plugins/. The parser will scan the contents of this directory to determine what additional metadata to fetch.
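As a rough illustration of the snippet-directory option, the Python sketch below parses one classic sources.list line and then scans a directory for plugin snippets. The snippet file format (one metadata type name per line, `#` for comments) is purely an assumption, since the proposal does not define it. `SourceEntry`, `parse_sources_line` and `read_plugin_snippets` are hypothetical names, and bracketed options in sources.list lines are not handled.

```python
import os
from dataclasses import dataclass, field


@dataclass
class SourceEntry:
    kind: str   # "deb" or "deb-src"
    uri: str
    suite: str
    areas: list = field(default_factory=list)  # e.g. main, contrib


def parse_sources_line(line):
    """Parse one classic sources.list line; return None for blanks/comments."""
    text = line.split("#", 1)[0].strip()
    if not text:
        return None
    parts = text.split()
    if len(parts) < 3 or parts[0] not in ("deb", "deb-src"):
        raise ValueError("unsupported sources.list line: %r" % line)
    return SourceEntry(parts[0], parts[1], parts[2], parts[3:])


def read_plugin_snippets(directory):
    """Collect extra metadata types from snippet files.

    Assumed (hypothetical) format: one metadata type name per line.
    """
    wanted = []
    if not os.path.isdir(directory):
        return wanted
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as handle:
            for raw in handle:
                item = raw.split("#", 1)[0].strip()
                if item and item not in wanted:
                    wanted.append(item)
    return wanted


entry = parse_sources_line("deb http://ftp.debian.org/debian sid main contrib")
print(entry)
```

Calling `read_plugin_snippets("/etc/apt/sources.list.d/plugins/")` would then give the list of additional metadata types to fetch, or an empty list when the directory does not exist, which keeps the existing sources.list format untouched.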

    • The Enhanced Public Parser:

      • is implemented using the libapt API.
      • provides a parsing API for the package management tools frontends.
      • represents the pluggable component of the system - plugins are registered at install time with a default configuration. The user may handle plugin management via an interface.
      • comes with apt-get update old parser functionality by default.
      • supports a generic plugin model for new types of metadata.
      • there are two ways of categorizing the plugins:
        • per current tools' handled metadata: basic packages metadata, apt-file metadata, AppStream metadata.

        • per type of metadata: Release, Packages, Sources, Content, Tags, Components (AppStream).

      • I suggest the second one is used, due to finer granularity.
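The "per type of metadata" categorization could look roughly like the Python sketch below, where each plugin knows which index files its metadata type needs. The class and function names are hypothetical and this is not the libapt API. The archive-relative paths follow the usual `dists/<suite>/<area>/...` layout, with compression suffixes left out for brevity.

```python
from abc import ABC, abstractmethod


class MetadataPlugin(ABC):
    """Hypothetical generic plugin: one plugin per type of metadata."""

    name = ""

    @abstractmethod
    def targets(self, suite, area, arch):
        """Return archive-relative paths of the index files to fetch."""


class PackagesPlugin(MetadataPlugin):
    name = "Packages"

    def targets(self, suite, area, arch):
        return ["dists/%s/%s/binary-%s/Packages" % (suite, area, arch)]


class SourcesPlugin(MetadataPlugin):
    name = "Sources"

    def targets(self, suite, area, arch):
        return ["dists/%s/%s/source/Sources" % (suite, area)]


def build_index(plugins, suite, areas, arch):
    """Ordered union of all targets requested by the enabled plugins."""
    index = []
    for area in areas:
        for plugin in plugins:
            for path in plugin.targets(suite, area, arch):
                if path not in index:
                    index.append(path)
    return index


print(build_index([PackagesPlugin(), SourcesPlugin()], "sid", ["main"], "amd64"))
```

One plugin per metadata type keeps each class small, and a tool such as apt-file would only need to contribute one more subclass for the type it consumes.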

    • The Unified Metadata Backend:

      • invokes the parser to build an index of desired metadata to fetch from the Debian Archive.

      • is responsible for fetching the metadata from the Debian Archive, processing it, and returning it to apt-get update.

      • implements efficient metadata transport mechanisms - e.g. pdiff.

      • implements security enforcement mechanisms - e.g. GnuPG.
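To make the pdiff idea more concrete, here is a simplified Python sketch of how a backend might decide which patches to fetch. It assumes the archive publishes an ordered history of (hash of an earlier file state, patch name) pairs plus the hash of the current file, which is a simplification of the real pdiff index format. The function names and patch names are hypothetical, and signature checking with GnuPG is left out.

```python
import hashlib


def sha1_of(data):
    """Hex SHA-1 digest of a bytes object."""
    return hashlib.sha1(data).hexdigest()


def patches_to_apply(local_hash, history, current_hash):
    """Decide how to bring a local index file up to date.

    history: ordered list of (hash_before_patch, patch_name), oldest first.
    Returns [] if already up to date, the patch names to apply in order,
    or None when the local state is unknown and a full download is needed.
    """
    if local_hash == current_hash:
        return []
    for position, (state_hash, _name) in enumerate(history):
        if state_hash == local_hash:
            return [name for _hash, name in history[position:]]
    return None


# Made-up file contents standing in for three successive Packages files.
v1 = b"Package: a\n"
v2 = b"Package: a\nPackage: b\n"
v3 = b"Package: b\n"
history = [(sha1_of(v1), "patch-1"), (sha1_of(v2), "patch-2")]

print(patches_to_apply(sha1_of(v2), history, sha1_of(v3)))
print(patches_to_apply(sha1_of(b"unknown"), history, sha1_of(v3)))
```

A client that is one revision behind only fetches the last patch, while a client whose local file does not appear in the history falls back to downloading the whole file.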

Expected Results

  • Deliverables:

    • A new and enhanced format for the sources.list file and the additional information (the enhanced sources.list file).

    • A public, pluggable parser, capable of understanding this new format (the enhanced public parser).

    • An insightful configuration interface for the parser's plugin management (the enhanced public parser interface).

    • An efficient and secure acquire logic (metadata backend).

    • A generic, extensible model for a plugin - what it handles, how it handles it, and when the information changes (generic plugin).

    • Plugins for present tools - apt-get, apt-file, debtags, ... (specific plugins).

    • (Possible plugin for AppStream).

    • All of the above would result in a powerful apt-get update tool capable of handling all OS package management metadata in a structured and coherent way.

  • Benefits to Debian OS:

    • Improving the package management system translates into improving the fundamental layer of the Debian OS.
    • Better bandwidth usage.
    • Fewer configuration and temporary files, all kept in one place.
    • Scalability of the metadata acquire system.
    • Better metadata cohesion.
    • Future package management tools won't be forced to build their own metadata framework - they will only need to provide a plugin.
  • Benefits to Debian Community:

    • Popularity through usability.
    • Popularity through performance.
    • Integration with other communities through AppStream.

Suggested Timeline

  • April 23 - May 21:

    • Administrative Tasks

      • Get in touch with the mentor and discuss the project and its details.
      • Install a local build environment and required development tools.
      • Get familiar with the Debian community and development model.
    • General Research

      • Debian source code structure.
      • C++ is a very powerful language - how many of its cutting-edge features do DDs use, and do I need to improve my knowledge of the language to understand the code?
      • Security issues - authentication, authorization, types of attacks, data integrity.
      • Efficiency issues - responsiveness, bandwidth usage, caching.
    • Research State of the Art

      • The present package metadata acquire logic.
      • The format of the configuration files.
      • The relationships between different pieces of metadata.
      • The Debian Archive format.
      • The AppStream specs and metadata.

    • Design Tasks

      • Static configuration format (sources.list and other info).
      • Parser model and API.
      • Generic plugin model (and API).
      • Metadata acquire logic and technologies.
  • May 21 - July 13

    • MILESTONE 1: apt-get update with default functionality, using the new backend and parser

      • Week 1: May 21 - May 27: sources.list format definition; basic parser interface definition and implementation.

      • Week 2: May 28 - June 3: basic acquire logic and communication security for the backend; integration with the parser.

      • Week 3: June 4 - June 10: integration with apt-get update; simple test cases.

    • MILESTONE 2: support for first plugin

      • Week 4: June 11 - June 17: generic plugin model definition.

      • Week 5: June 18 - June 24: generic model primitives implementation and test cases.

      • Week 6: June 25 - Jul 1: integration with one packaging tool - suggestion: apt-file; testing.

    • MILESTONE 3: backend completion

      • Week 7: Jul 2 - Jul 8: backend optimization of the acquire logic.

      • Week 8: Jul 9 - Jul 13: backend security mechanisms.

    • At this point there should be a complete proof of concept of the new model's functionality, along with a set of tests.

  • July 13 - August 13

    • MILESTONE 4: parser completion

      • Week 9: Jul 16 - Jul 22: parser module and generic plugin primitives final implementation; testing.

    • MILESTONE 5: other plugin support

      • Week 10: Jul 23 - Jul 29: one additional plugin will be implemented. We can go with implementing the debtags plugin - to support an existing tool - or with AppStream.

    • MILESTONE 6: deliverable package

      • Week 11: Jul 30 - Aug 5: user interface for the parser.

      • Week 12: Aug 6 - Aug 12: packaging, doxygen documentation from source comments.

  • August 13 - hard deadline

    • Final touches: system testing, code refactoring, etc.

Miscellaneous

  • Exams and other commitments:

    • During this period I'm developing my Bachelor Thesis project, which is expected to be finished by the 30th of May. I expect to dedicate 20 - 30 h / week to GSoC documentation and community bonding. After this date, my main focus will switch to GSoC.
  • Other summer plans:

    • This is not certain, but I'm planning to leave the country for a few days on a trip this summer - a week at most. During this time, I won't be able to code at GSoC.
  • Why Debian?:

    • My first real contact with Linux - I had a first try with Slackware and SUSE, but I didn't find them that easy to learn.

    • Open Source - powerful in learning new technologies and meeting new professionals.
    • Used worldwide - currently one of the most popular Linux distros (along with Ubuntu, which is basically Debian as well).
    • C/C++ - my first programming playground, and the one I'm most familiar with.
  • Are you applying for other projects in SoC? No.