 * '''Contact/Email''':
  * IRC: '''dodgerblue''' on OFTC, Freenode
  * bogdan.purcareata AT gmail DOT com
  * Romania, UTC + 2
 * '''Background''':
  * ''Who are you?''
   * Senior undergraduate student at the Computer Science and Automatic Control Faculty, Politehnica University of Bucharest.
   * Highly dependable and efficiency oriented professional.
   * Ambitious, focused and enduring individual.
  * ''Technical skills / Known technologies''
   * Knowledge in ''C'' (8 years), as well as ''C++'', ''Java'' and ''Python''
   * Knowledge in ''Algorithm Design'', ''Data Structures'' and ''Project Development Workflow''
   * Knowledge in ''Compilers'', ''Operating Systems'', ''Networking'', ''Distributed Systems'', ''System programming'', ''Shell Scripting (BASH)''
  * ''Experience''
   * '''Open Source Development Course''' - I've pursued this course [1], organized by ROSEdu [2]. I've had the chance to learn concepts about ''Version Control'', ''Bug Tracking'', ''Netiquette'', ''Unit Testing'', ''Debugging'', ''Security Practices'', ''Licensing'' and pretty much everything related to ''Open Source development and communities''.
   * '''Ixia Internship''' - ported a desktop client for a test engine on the Android platform.
   * '''GPGPU Workshop''' - GPU C++ programming, held as a Summer School at our University.
   * '''Algorithm Design Course''' - developed a ''chess engine'', equipped with artificial intelligence.
  * '''What makes you the best person to work on this project?''': Refactoring, optimizing and unifying the metadata acquire system for APT would significantly improve the whole Debian user experience, as well as the consistency of the package management system. To me, this is both a thrilling challenge and a great opportunity to analyze the Debian OS internals and core features. I've been using Debian and Ubuntu for 4 years now, both for my university assignments and for exploring the internals of the operating system.
== Project ==
 * '''Project details''':
  Debian has developed several tools to manage packages, each one having its own way of handling metadata. This results in a mixed package management system, prone to inconsistencies and to a loss in overall OS performance. The aim of this project is to build a broad image of the package metadata locally, so that all the information is kept in one place and is updated at the same time. By choosing which tools to use, the user tells the manager what metadata to acquire, and therefore how specific a local image of the Debian Archive they wish to interact with.
 * '''Synopsis''':
  * ''Current model'':
   * Several tools for package management - ''apt-get'', ''debtags'', ''apt-file''.
   * Each one handles its own set of package metadata, therefore its own view of the remote Debian Archive:
    * '''apt-get''': Release, Packages, Sources, Translations.
    * '''apt-file''': Contents.
    * '''debtags''': Tags (facets and tags).
   * Each one requires individual updating and interaction.
   * Private parsing of the sources.list file in apt-get.
  * ''Desired goals''
   * '''Performance''': minimal bandwidth usage by efficient diffs and broad local package metadata.
   * '''Effectiveness''': the user defines which components the acquire system will use, and the system only uses these components.
   * '''Scalability''': the system is pluggable.
   * '''Forward compatibility''': the system will have a generic design, open for future development.
   * '''Backward compatibility''': the system should not break the existing interfaces, and the transition to the new system should be as transparent as possible to the user.
   * '''Openness''': providing a public parser for the sources.list file, that other package management tools can use instead of inventing their own.
  * ''Proposed Architecture'':
    {{attachment:PluggableAcquireSystemforAPT-ArchitectureDiagram.png|width=800}}
  * ''Main components'':
   * '''The Enhanced sources.list File''':
    * stores additional information for each URL, besides suite and area - e.g. the enabled plugins.
    * remains compatible with the current private parser of apt-get update, which understands only the present format - this can be solved either in the design of the new sources.list format or with a patch to the current parser.
    * another option to enhance the functionality of this static file is to store additional plugin info in a separate directory - e.g. ''/etc/sources.list.d/plugins/''. The parser will scan the contents of this directory to fetch additional relevant info.
   * '''The Enhanced Public Parser''':
    * is implemented using the libapt API.
    * provides a parsing API for the package management tools frontends.
    * represents the pluggable component of the system - plugins are registered at install time with a default configuration. The user may handle plugin management via an interface.
    * comes with apt-get update old parser functionality by default.
    * supports a generic plugin model for new types of metadata.
    * there are two ways of categorizing the plugins:
     * ''per current tools' handled metadata'': '''basic packages''' metadata, '''apt-file''' metadata, '''[[AppStreamDebianProposal | AppStream]]''' metadata.
     * ''per type of metadata'': '''Release''', '''Packages''', '''Sources''', '''Contents''', '''Tags''', '''Components''' ([[AppStreamDebianProposal | AppStream]]).
    * ''I suggest the second one is used, due to finer granularity''.
   * '''The Unified Metadata Backend''':
    * invokes the parser to build an ''index'' of desired metadata to fetch from the ''Debian Archive''.
    * is responsible for fetching the metadata from the ''Debian Archive'', processing it, and returning it to ''apt-get update''.
    * implements efficient transport mechanisms.
    * implements security enforcement mechanisms.
== Expected Results ==
 * '''Deliverables''':
  * A new and enhanced format for the sources.list file and the additional information (''the enhanced sources.list file'').
  * A public, pluggable parser, capable of understanding this new format (''the enhanced public parser'')
  * An insightful configuration interface for the parser's plugin management (''the enhanced public parser interface'')
  * An efficient and secure acquire logic (''metadata backend'').
  * A generic, extensible model for a plugin - what it handles, how it handles it, and when the information changes (''generic plugin'').
  * Plugins for present tools - apt-get, apt-file, debtags, ... (''specific plugins'').
  * (Possible plugin for [[AppStreamDebianProposal | AppStream]]).
  * All of the above would result in '''a powerful apt-get update tool capable of handling all OS package management metadata in a structured and coherent way'''.
 * '''Benefits to Debian OS''':
  * Improving the package management system translates into improving the fundamental layer of the Debian OS.
  * Better bandwidth usage.
  * Fewer configuration and temporary files, all kept in one place.
  * Scalability of the metadata acquire system.
  * Better metadata cohesion.
  * Future package management tools won't be coerced to build their own metadata framework - they will just have to come up with a plugin.
 * '''Benefits to Debian Community''':
  * Popularity through usability.
  * Popularity through performance.
  * Integration with other communities through [[AppStreamDebianProposal | AppStream]].
== Suggested Timeline ==
  * '''April 23 - May 21''':
   * ''Administrative Tasks''
    * Get in touch with the mentor.
    * Install a local build environment.
    * Get familiar with the Debian community and development model.
   * ''General Research''
    * Debian source code structure.
    * C++ is a very powerful language - how many of its cutting-edge features are used by Debian Developers (DDs), and do I need to improve my knowledge of the language to understand the code?
    * Security issues - authentication, authorization, types of attacks, data integrity.
    * Efficiency issues - responsiveness, bandwidth usage, caching.
   * ''Research State of the Art''
    * The present package metadata acquire logic.
    * The format of the configuration files.
    * The relationships between different pieces of metadata.
    * The Debian Archive format.
    * The [[AppStreamDebianProposal | AppStream]] specs and metadata.
   * ''Design Tasks''
    * Configuration model.
    * Acquire-logic.
    * Plugins model.
    * Parser model.
  * '''May 21 - July 13'''
   * MILESTONE 1: '''apt-get update with default functionality, using the new backend and parser'''
    * ''Week 1: May 21 - May 27'': sources.list format definition; basic parser interface definition and implementation.
    * ''Week 2: May 28 - June 3'': basic acquire logic and communication security for the backend; integration with the parser.
    * ''Week 3: June 4 - June 10'': integration with apt-get update; simple test cases.
   * MILESTONE 2: '''support for first plugin'''
    * ''Week 4: June 11 - June 17'': generic plugin model definition.
    * ''Week 5: June 18 - June 24'': generic model primitives implementation and test cases.
    * ''Week 6: June 25 - Jul 1'': integration with one packaging tool - suggestion: ''apt-file''; testing.
   * MILESTONE 3: '''backend completion'''
    * ''Week 7: Jul 2 - Jul 8'': backend optimization of the acquire logic.
    * ''Week 8: Jul 9 - Jul 13'': backend security mechanisms.
   * ''At this point there should be a complete proof of concept of the new model's functionality, together with a set of tests''.
  * '''July 13 - August 13'''
   * MILESTONE 4: '''parser completion'''
    * ''Week 9: Jul 16 - Jul 22'': parser module and generic plugin primitives final implementation; testing.
   * MILESTONE 5: '''other plugin support'''
    * ''Week 10: Jul 23 - Jul 29'': one additional plugin will be implemented. We can go with implementing the debtags plugin - to support an existing tool - or with [[AppStreamDebianProposal | AppStream]].
   * MILESTONE 6: '''deliverable package'''
    * ''Week 11: Jul 30 - Aug 5'': user interface for the parser.
    * ''Week 12: Aug 6 - Aug 12'': packaging, doxygen documentation from source comments.
  * '''August 13 - hard deadline'''
   * Final touches: system testing, code refactoring, etc.
== Miscellaneous ==
 * '''Exams and other commitments''':
   During this period I'm developing my Bachelor Thesis project, which is expected to be finished by the 30th of May. Until then, I estimate that I can dedicate 20 - 30 h / week to GSoC documentation and community bonding. After this date, my main focus will switch to GSoC.
 * '''Other summer plans''':
  * This is not certain, but I'm planning to leave the country for a few days on a trip this summer - a week at most. During this time, I won't be able to work on GSoC.
 * '''Why Debian?''':
  * My first ''real'' contact with Linux - I had a first try with Slackware and SUSE, but I didn't find them that easy to learn.
  * Open Source - powerful in learning new technologies and meeting new professionals.
  * Used worldwide - currently one of the most popular Linux distros (along with Ubuntu, which is basically Debian as well).
  * C/C++ - my first programming playground, and the one I'm most familiar with.
  <<BR>>
  [1] [[http://cdl.rosedu.org/2012/english | Open Source Development Course]] <<BR>>
  [2] [[http://rosedu.org/ | Romanian Open Source Education]] <<BR>>

Student Application Template

  • Name: Bogdan Purcăreață

  • Contact/Email:

    • IRC: dodgerblue on OFTC, Freenode

    • bogdan.purcareata AT gmail DOT com
    • Romania, UTC + 2
  • Background:

    • Who are you?

      • Senior undergraduate student at the Computer Science and Automatic Control Faculty, Politehnica University of Bucharest.
      • Highly dependable and efficiency oriented professional.
      • Ambitious, focused and enduring individual.
    • Technical skills / Known technologies

      • Knowledge in C (8 years), as well as C++, Java and Python

      • Knowledge in Algorithm Design, Data Structures and Project Development Workflow

      • Knowledge in Compilers, Operating Systems, Networking, Distributed Systems, System programming, Shell Scripting (BASH)

    • Experience

      • Open Source Development Course - I've pursued this course [1], organized by ROSEdu [2]. I've had the chance to learn concepts about Version Control, Bug Tracking, Netiquette, Unit Testing, Debugging, Security Practices, Licensing and pretty much everything related to Open Source development and communities.

      • Ixia Internship - ported a desktop client for a test engine on the Android platform.

      • GPGPU Workshop - GPU C++ programming, held as a Summer School at our University.

      • Algorithm Design Course - developed a chess engine, equipped with artificial intelligence.

    • What makes you the best person to work on this project?: Refactoring, optimizing and unifying the metadata acquire system for APT would significantly improve the whole Debian user experience, as well as the consistency of the package management system. To me, this is both a thrilling challenge and a great opportunity to analyze the Debian OS internals and core features. I've been using Debian and Ubuntu for 4 years now, both for my university assignments and for exploring the internals of the operating system.

Project

  • Project title: Pluggable Acquire-System for APT

  • Project details:

    • Debian has developed several tools to manage packages, each one having its own way of handling metadata. This results in a mixed package management system, prone to inconsistencies and to a loss in overall OS performance. The aim of this project is to build a broad image of the package metadata locally, so that all the information is kept in one place and is updated at the same time. By choosing which tools to use, the user tells the manager what metadata to acquire, and therefore how specific a local image of the Debian Archive they wish to interact with.
  • Synopsis:

    • Current model:

      • Several tools for package management - apt-get, debtags, apt-file.

      • Each one handles its own set of package metadata, therefore its own view of the remote Debian Archive:
        • apt-get: Release, Packages, Sources, Translations.

        • apt-file: Contents.

        • debtags: Tags (facets and tags).

      • Each one requires individual updating and interaction.
      • Private parsing of the sources.list file in apt-get.
    • Desired goals

      • Performance: minimal bandwidth usage by efficient diffs and broad local package metadata.

      • Effectiveness: the user defines which components the acquire system will use, and the system only uses these components.

      • Scalability: the system is pluggable.

      • Forward compatibility: the system will have a generic design, open for future development.

      • Backward compatibility: the system should not break the existing interfaces, and the transition to the new system should be as transparent as possible to the user.

      • Openness: providing a public parser for the sources.list file, that other package management tools can use instead of inventing their own.

    • Proposed Architecture:

      • [Figure: Pluggable Acquire System for APT - architecture diagram]

  • Main components:

    • The Enhanced sources.list File:

      • stores additional information for each URL, besides suite and area - e.g. the enabled plugins.
      • remains compatible with the current private parser of apt-get update, which understands only the present format - this can be solved either in the design of the new sources.list format or with a patch to the current parser.
      • another option to enhance the functionality of this static file is to store additional plugin info in a separate directory - e.g. /etc/sources.list.d/plugins/. The parser will scan the contents of this directory to fetch additional relevant info.

    • The Enhanced Public Parser:

      • is implemented using the libapt API.
      • provides a parsing API for the package management tools frontends.
      • represents the pluggable component of the system - plugins are registered at install time with a default configuration. The user may handle plugin management via an interface.
      • comes with apt-get update old parser functionality by default.
      • supports a generic plugin model for new types of metadata.
      • there are two ways of categorizing the plugins:
        • per current tools' handled metadata: basic packages metadata, apt-file metadata, AppStream metadata.

        • per type of metadata: Release, Packages, Sources, Contents, Tags, Components (AppStream).

      • I suggest the second one is used, due to finer granularity.

    • The Unified Metadata Backend:

      • invokes the parser to build an index of desired metadata to fetch from the Debian Archive.

      • is responsible for fetching the metadata from the Debian Archive, processing it, and returning it to apt-get update.

      • implements efficient transport mechanisms.
      • implements security enforcement mechanisms.
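
The interplay between the enhanced sources.list file, the public parser and the backend's index can be sketched roughly as follows. This is only an illustration under stated assumptions, written in Python for brevity rather than against the libapt API: the `[plugins=...]` option, the `SourceEntry` type and the `parse_line` and `build_index` helpers are hypothetical names that do not exist in APT, and the final format would be settled during the design phase.

```python
from dataclasses import dataclass, field

# Metadata types every entry gets by default (the behaviour of the old parser).
DEFAULT_PLUGINS = ("Release", "Packages")


@dataclass
class SourceEntry:
    kind: str                 # "deb" or "deb-src"
    uri: str
    suite: str
    areas: list
    plugins: list = field(default_factory=list)


def parse_line(line):
    """Parse one sources.list line; return None for blank lines and comments."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    tokens = line.split()
    kind = tokens.pop(0)
    if kind not in ("deb", "deb-src"):
        raise ValueError("unknown entry type: " + kind)
    options = {}
    # Hypothetical extension: an optional "[key=value ...]" block after the type.
    if tokens and tokens[0].startswith("["):
        raw = []
        while tokens:
            tok = tokens.pop(0)
            raw.append(tok.strip("[]"))
            if tok.endswith("]"):
                break
        for item in raw:
            if item:
                key, _, value = item.partition("=")
                options[key] = value
    if len(tokens) < 2:
        raise ValueError("expected a URI and a suite: " + line)
    uri, suite, areas = tokens[0], tokens[1], tokens[2:]
    extra = [p for p in options.get("plugins", "").split(",") if p]
    plugins = list(DEFAULT_PLUGINS) + [p for p in extra if p not in DEFAULT_PLUGINS]
    return SourceEntry(kind, uri, suite, areas, plugins)


def build_index(entries):
    """List the (uri, suite, area, metadata type) items the backend should fetch."""
    index = []
    for entry in entries:
        for plugin in entry.plugins:
            if plugin == "Release":
                # Release is fetched once per suite, not once per area.
                index.append((entry.uri, entry.suite, None, plugin))
            else:
                index.extend((entry.uri, entry.suite, area, plugin)
                             for area in entry.areas)
    return index
```

A line without the option block, such as `deb http://example.org/debian stable main`, yields only the default metadata types, which is how the sketch models backward compatibility with the present format.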

Expected Results

  • Deliverables:

    • A new and enhanced format for the sources.list file and the additional information (the enhanced sources.list file).

    • A public, pluggable parser, capable of understanding this new format (the enhanced public parser)

    • An insightful configuration interface for the parser's plugin management (the enhanced public parser interface)

    • An efficient and secure acquire logic (metadata backend).

    • A generic, extensible model for a plugin - what it handles, how it handles it, and when the information changes (generic plugin).

    • Plugins for present tools - apt-get, apt-file, debtags, ... (specific plugins).

    • (Possible plugin for AppStream).

    • All of the above would result in a powerful apt-get update tool capable of handling all OS package management metadata in a structured and coherent way.

  • Benefits to Debian OS:

    • Improving the package management system translates into improving the fundamental layer of the Debian OS.
    • Better bandwidth usage.
    • Fewer configuration and temporary files, all kept in one place.
    • Scalability of the metadata acquire system.
    • Better metadata cohesion.
    • Future package management tools won't be coerced to build their own metadata framework - they will just have to come up with a plugin.
  • Benefits to Debian Community:

    • Popularity through usability.
    • Popularity through performance.
    • Integration with other communities through AppStream.
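
To make the generic plugin deliverable more concrete, here is a minimal sketch of what such a contract might look like, again in Python and purely as an assumption: `MetadataPlugin`, `ContentsPlugin`, `REGISTRY` and `register` are hypothetical names, and the Contents parsing assumes a simplified layout (a path column followed by a comma-separated list of section/package locations). It covers the three questions from the deliverable: what the plugin handles, how it handles it, and when the information changes.

```python
import hashlib
from abc import ABC, abstractmethod


class MetadataPlugin(ABC):
    """Hypothetical plugin contract: what, how, and when."""

    # What: the metadata type this plugin handles, e.g. "Contents".
    handles = ""

    @abstractmethod
    def process(self, raw):
        """How: turn the downloaded bytes into the plugin's own structure."""

    def needs_update(self, cached, advertised_sha256):
        """When: refetch only if the cached copy differs from the advertised hash."""
        if cached is None:
            return True
        return hashlib.sha256(cached).hexdigest() != advertised_sha256


class ContentsPlugin(MetadataPlugin):
    handles = "Contents"

    def process(self, raw):
        # Assumed line shape: "<path> <section/package>[,<section/package>...]"
        mapping = {}
        for line in raw.decode("utf-8").splitlines():
            parts = line.rsplit(None, 1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            path, locations = parts
            mapping[path] = [loc.rsplit("/", 1)[-1] for loc in locations.split(",")]
        return mapping


# Plugins would be registered at install time; a dict stands in for that here.
REGISTRY = {}


def register(plugin):
    REGISTRY[plugin.handles] = plugin
    return plugin
```

Comparing a hash of the cached file with the one advertised by the archive is one simple way to decide when to refetch; the real backend could equally rely on diffs, as the performance goal suggests.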

Suggested Timeline

  • April 23 - May 21:

    • Administrative Tasks

      • Get in touch with the mentor.
      • Install a local build environment.
      • Get familiar with the Debian community and development model.
    • General Research

      • Debian source code structure.
      • C++ is a very powerful language - how many of its cutting-edge features are used by Debian Developers (DDs), and do I need to improve my knowledge of the language to understand the code?
      • Security issues - authentication, authorization, types of attacks, data integrity.
      • Efficiency issues - responsiveness, bandwidth usage, caching.
    • Research State of the Art

      • The present package metadata acquire logic.
      • The format of the configuration files.
      • The relationships between different pieces of metadata.
      • The Debian Archive format.
      • The AppStream specs and metadata.

    • Design Tasks

      • Configuration model.
      • Acquire-logic.
      • Plugins model.
      • Parser model.
  • May 21 - July 13

    • MILESTONE 1: apt-get update with default functionality, using the new backend and parser

      • Week 1: May 21 - May 27: sources.list format definition; basic parser interface definition and implementation.

      • Week 2: May 28 - June 3: basic acquire logic and communication security for the backend; integration with the parser.

      • Week 3: June 4 - June 10: integration with apt-get update; simple test cases.

    • MILESTONE 2: support for first plugin

      • Week 4: June 11 - June 17: generic plugin model definition.

      • Week 5: June 18 - June 24: generic model primitives implementation and test cases.

      • Week 6: June 25 - Jul 1: integration with one packaging tool - suggestion: apt-file; testing.

    • MILESTONE 3: backend completion

      • Week 7: Jul 2 - Jul 8: backend optimization of the acquire logic.

      • Week 8: Jul 9 - Jul 13: backend security mechanisms.

    • At this point there should be a complete proof of concept of the new model's functionality, together with a set of tests.

  • July 13 - August 13

    • MILESTONE 4: parser completion

      • Week 9: Jul 16 - Jul 22: parser module and generic plugin primitives final implementation; testing.

    • MILESTONE 5: other plugin support

      • Week 10: Jul 23 - Jul 29: one additional plugin will be implemented. We can go with implementing the debtags plugin - to support an existing tool - or with AppStream.

    • MILESTONE 6: deliverable package

      • Week 11: Jul 30 - Aug 5: user interface for the parser.

      • Week 12: Aug 6 - Aug 12: packaging, doxygen documentation from source comments.

  • August 13 - hard deadline

    • Final touches: system testing, code refactoring, etc.

Miscellaneous

  • Exams and other commitments:

    • During this period I'm developing my Bachelor Thesis project, which is expected to be finished by the 30th of May. Until then, I estimate that I can dedicate 20 - 30 h / week to GSoC documentation and community bonding. After this date, my main focus will switch to GSoC.
  • Other summer plans:

    • This is not certain, but I'm planning to leave the country for a few days on a trip this summer - a week at most. During this time, I won't be able to work on GSoC.
  • Why Debian?:

    • My first real contact with Linux - I had a first try with Slackware and SUSE, but I didn't find them that easy to learn.

    • Open Source - powerful in learning new technologies and meeting new professionals.
    • Used worldwide - currently one of the most popular Linux distros (along with Ubuntu, which is basically Debian as well).
    • C/C++ - my first programming playground, and the one I'm most familiar with.
  • Are you applying for other projects in SoC? No.