Debsources as a platform

- I am currently a student at University Claude Bernard Lyon 1, enrolled in the Master's program of the Department of Computer Science (courses list here in French). I have successfully completed my research project on cryptographic procedures used in e-voting and their implementation using elliptic curves. The project's outcomes are a fully functional elliptic curve cryptography module written in Python and an e-voting web application, built with Flask, Tornado and WebSockets, that is used to test the module.

- Through my enrollment in various courses at the university I had the opportunity to work on several projects including:

(The code for some of the university projects is unfortunately not publicly available. I can, however, provide tarballs with the code or ask for authentication codes for the university forge.)

- My experience in web development comes from freelance work and an internship. I refactored this website and created websites for two PhD candidates (http://toumiak.com/, http://atoumazi.com/). I am currently a volunteer at Womenhelp, helping with the development of their website. Progress appears on Bitbucket.

- I attend several conferences in Lyon dealing with free and open source software, such as the Journées du Logiciel Libre (JdLL), PyConFR (http://www.pycon.fr/2014/) and Experiences Numeriques. I am also a member of the hackerspace Lyon Open Laboratory (https://labolyon.fr/) and I will be a volunteer at the 16th edition of the Journées du Logiciel Libre (JdLL).

Current state

Debsources is the portal to the Debian source code. It lets users read the source code of packages and view some statistics. The update tasks currently run synchronously. The search function finds packages, ctags or lines of source code. Other functionalities include pop-up messages, search by SHA checksum and code highlighting. A patch tracker existed in the past but is now defunct.

Work to be done

Blueprint the current web application: separating the views will make the application more modular and extensible and, at the same time, less complex. Develop two new applications, one to serve as a copyright license tracker and the other to "revive" the patch tracker. Use Celery to make the updater asynchronous, which could allow the tasks to run in parallel across different machines.

How I plan to implement the deliverables

Refactoring the existing webapp to use blueprints means refactoring the current views and the templates folder. Each blueprint is responsible for a group of URLs (for example 'stats', 'ctag', 'src', etc.) and has its own templates and views. Setting up the blueprints is a matter of design, and modifications will follow accordingly. Finally, the config file and the function setup_blueprints() need to be updated.
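To make the blueprint split more concrete, here is a minimal sketch of how two URL groups could become Flask blueprints. The view functions, the template folder layout and the signature of setup_blueprints() shown here are my assumptions for illustration, not the existing Debsources code.

```python
from flask import Blueprint, Flask, jsonify

# Hypothetical blueprints, one per URL group; each has its own template folder.
stats_bp = Blueprint("stats", __name__, url_prefix="/stats",
                     template_folder="templates/stats")
src_bp = Blueprint("src", __name__, url_prefix="/src",
                   template_folder="templates/src")


@stats_bp.route("/")
def stats_index():
    # Placeholder payload; a real view would query the database.
    return jsonify(section="stats")


@src_bp.route("/<package>/")
def src_package(package):
    # Placeholder payload; a real view would list the package versions.
    return jsonify(section="src", package=package)


def setup_blueprints(app, enabled=("stats", "src")):
    """Register only the blueprints named in `enabled` (assumed signature)."""
    available = {"stats": stats_bp, "src": src_bp}
    for name in enabled:
        app.register_blueprint(available[name])
    return app


def create_app(enabled=("stats", "src")):
    """Build an application; `enabled` would normally come from the config file."""
    app = Flask(__name__)
    return setup_blueprints(app, enabled)
```

With this layout, a deployment that only wants the source browser could call create_app(enabled=("src",)), and the 'stats' URLs would simply not exist.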

For the copyright tracker I intend to use SQLAlchemy for the models and templates for the views. The plug-in could look for the debian folder inside each package and retrieve the corresponding license. Finally, the rendering part needs to be designed, as does the export of a license file (exported licenses could possibly be verified using Lintian, https://tracker.debian.org/pkg/lintian).
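As an illustration of the mining step, the sketch below reads a machine-readable debian/copyright file (the DEP5 format) and maps each Files pattern to its short license name. It is a simplified parser written for this proposal: the function names are hypothetical, the sample data is invented, and packages whose copyright file is not machine-readable would need separate handling.

```python
def parse_copyright(text):
    """Split a DEP5 file into stanzas, each a dict of field name -> value."""
    stanzas, current, last_field = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current stanza.
            if current:
                stanzas.append(current)
            current, last_field = {}, None
        elif line[0] in " \t" and last_field:
            # Continuation line: append it to the previous field.
            current[last_field] += "\n" + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def licenses_by_pattern(text):
    """Map each Files pattern to the short license name of its stanza."""
    result = {}
    for stanza in parse_copyright(text):
        if "Files" in stanza and "License" in stanza:
            # The first line of the License field is the short name.
            short_name = stanza["License"].split("\n", 1)[0]
            for pattern in stanza["Files"].split():
                result[pattern] = short_name
    return result


# Invented sample input, for illustration only.
SAMPLE = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: hello

Files: *
Copyright: 2014 Jane Doe
License: GPL-3+

Files: debian/* doc/*
Copyright: 2015 John Roe
License: Expat
 Permission is hereby granted, free of charge, to any person
"""

print(licenses_by_pattern(SAMPLE))
```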

I have prepared an entity-relationship diagram to share my view of the new database schema. As the files inside a package can have different licenses, owners, etc., I have chosen to add files and author columns to the copyrights table. With these, one can render a license for the whole package by selecting all the licenses attached to a package_id. In addition, we can search for all the files under a particular license even if they are in different packages.
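The two queries described above can be sketched as follows. To keep the example self-contained it uses the standard sqlite3 module with plain SQL rather than SQLAlchemy models, and the table and column names other than files, author and package_id are assumptions, not the actual Debsources schema.

```python
import sqlite3

# Hypothetical, simplified schema: one copyrights row per (package, files pattern).
SCHEMA = """
CREATE TABLE packages (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    version TEXT NOT NULL
);
CREATE TABLE copyrights (
    id INTEGER PRIMARY KEY,
    package_id INTEGER NOT NULL REFERENCES packages(id),
    files TEXT NOT NULL,
    author TEXT,
    license TEXT NOT NULL
);
"""


def demo_db():
    """Create an in-memory database filled with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO packages VALUES (?, ?, ?)",
                     [(1, "hello", "2.9-2"), (2, "world", "1.0-1")])
    conn.executemany(
        "INSERT INTO copyrights (package_id, files, author, license) "
        "VALUES (?, ?, ?, ?)",
        [(1, "*", "Jane Doe", "GPL-3+"),
         (1, "debian/*", "John Roe", "Expat"),
         (2, "src/*", "John Roe", "Expat")])
    return conn


def licenses_for_package(conn, package_id):
    """All licenses that apply somewhere inside one package."""
    rows = conn.execute(
        "SELECT DISTINCT license FROM copyrights "
        "WHERE package_id = ? ORDER BY license", (package_id,))
    return [row[0] for row in rows]


def files_under_license(conn, license_name):
    """(package name, files pattern) pairs for one license, across packages."""
    rows = conn.execute(
        "SELECT p.name, c.files FROM copyrights c "
        "JOIN packages p ON p.id = c.package_id "
        "WHERE c.license = ? ORDER BY p.name, c.files", (license_name,))
    return rows.fetchall()
```

Storing the files pattern on each copyrights row is what makes both directions cheap: per-package rendering filters on package_id, while the cross-package search filters on the license name.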

The development of the patch tracker requires SQLAlchemy as well as the design of new templates. The different patch formats need to be identified, along with the patches that were actually applied. Furthermore, this application requires new functionality to search for maintainers and to discover patches applied both by Debian and downstream.
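As a first step towards identifying patch formats, the sketch below inspects an unpacked source package: it reads debian/source/format and, for "3.0 (quilt)" packages, lists the patches named in debian/patches/series. The function names and the sample tree are hypothetical, and other ways of shipping patches are deliberately left out of this simplification.

```python
import tempfile
from pathlib import Path


def source_format(pkg_dir):
    """Return the declared source format of an unpacked source package."""
    fmt_file = Path(pkg_dir) / "debian" / "source" / "format"
    if not fmt_file.is_file():
        # dpkg-source assumes format 1.0 when the file is missing.
        return "1.0"
    return fmt_file.read_text().strip()


def applied_patches(pkg_dir):
    """List the patches named in debian/patches/series, in application order."""
    series = Path(pkg_dir) / "debian" / "patches" / "series"
    if source_format(pkg_dir) != "3.0 (quilt)" or not series.is_file():
        return []
    patches = []
    for line in series.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            # A series line may carry patch options after the file name.
            patches.append(line.split()[0])
    return patches


def demo():
    """Build an invented package tree in a temporary directory and inspect it."""
    with tempfile.TemporaryDirectory() as tmp:
        debian = Path(tmp) / "debian"
        (debian / "source").mkdir(parents=True)
        (debian / "patches").mkdir()
        (debian / "source" / "format").write_text("3.0 (quilt)\n")
        (debian / "patches" / "series").write_text(
            "# Debian-specific changes\nfix-typo.patch\nhardening.patch -p1\n")
        return source_format(tmp), applied_patches(tmp)


print(demo())
```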

Refactoring the existing architecture to use Celery involves transforming the current plug-ins into tasks using the appropriate decorators and configuring the handling of the queue. A web interface for viewing the status of the updates (message broker) could also be implemented. If such an interface is implemented, we could add another feature to the API so that third parties could know the status of the updater, find out when packages are updated, etc. The tasks will have to be divided into two groups: event-triggered tasks and cron jobs.

Refactor the current Debsources to use new Flask abstractions and develop two web applications that will serve as a license tracker and a patch tracker. The architecture will become asynchronous.

The current application will become more robust, and updates should be smoother. The former patch tracker will be "revived" and searching for licenses will be user-friendly. Moreover, more statistics can be generated to describe the evolution of Debian licenses.

Two web-apps: License and Patch tracker. Refactor and transform the current architecture to asynchronous.

"Other deliverable": Maintain a weekly blog to disseminate my experience as a Debian contributor for the GSoC.

As I don't have any university requirements, my work can begin as soon as the selection is announced. I plan to work 8 hours per day during phases 1-6 and on a similar basis during phase 0. Here is a suggested "calendar":

Phase 0: (Pre coding period)

[Community bonding] Get to know the mentor and the co-mentor. In addition, devote time to becoming familiar with the code through debugging. Some bugs I intend to work on are: #761869, #761108.

[Webapps] Validate the new database schema and design templates for new web apps.

[Copyright and Patch tracker] Prepare a requirements analysis for the new trackers. Also, study the recommended packaging tools and formats, and expand my knowledge of the DEP3, DEP5 and SPDX formats.

Phase 1: (week 1)

Blueprint the existing application. Split each URL group into a separate blueprint with its own views and templates. If this is already implemented at OPW, or is no longer being considered, I could use this time in phase 4, or work on another feature (statistics for licenses and/or patches) or another task proposed by the mentors.

Phase 2: (week 2-3-4-5)

(Copyright app) Modify the existing schema to meet the goals of the project. Develop the plug-in for mining copyright licenses and then implement the views for the desired features (find, render, export licenses) in the corresponding blueprint.

Phase 3: (week 6)

Deal with pending tasks and bugs.

First Evaluation

Phase 4: (week 7-8-9-10)

(Patch tracker) Update schema to take into consideration the patch tracker. Implement the necessary features.

Phase 5: (week 11-12)

Transform the updater to an asynchronous model using Celery. This involves modifying the plug-ins, setting up Celery in Flask, modifying the existing updater, and setting up the message broker for the updates.

Phase 6: (week 13)

Deal with pending tasks and bugs.

Pencils Down

I am free of university commitments from the beginning of April. No exams.

My choice of Debian is partly technical and partly political.

Technical because Debian is my preferred OS, mainly due to its stable and high-quality releases. I consider Debian the most important GNU/Linux distribution for what it offers to the people using it directly or through one of its child distributions.

Political because I consider it really important to have an independent, non-commercial distribution, built on egalitarian principles, where freedom and security are the system's foundations.

I have worked on the following bugs: #761228, #761232 and #762934. I also contributed to the bar chart, and I have patches pending for #761867 and #761106.