Debsources as a Platform
Description of the project: Debsources provides Web access to all Debian source code. It lets users browse, search, and render that source code, and computes code metrics and statistics that encompass all available source packages. This GSoC project aims to extend Debsources in two directions.
On the one hand, Debsources will be extended to scale better, by switching the Debsources updater to an asynchronous architecture. This change will make it possible to distribute indexing tasks over multiple workers, potentially running on multiple independent machines. It will also make it easy to re-index previously indexed data in batch (e.g., upon changes to the available indexing plugins, or when injecting new releases from scratch), a use case that is challenging to support properly with the current synchronous architecture.
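As a rough illustration of the batch re-indexing idea, the sketch below submits one independent task per package version to a pool of workers and collects the results as they complete. It uses Python's standard `concurrent.futures` as a stand-in for a real distributed task queue such as Celery; the plugin names, the `index_package` and `reindex` functions, and the sample data are all hypothetical and not taken from the Debsources code base.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def index_package(package, version, plugins):
    """Run every indexing plugin on one package version (hypothetical task)."""
    return {name: plugin(package, version) for name, plugin in plugins.items()}


def reindex(packages, plugins, max_workers=4):
    """Submit one independent task per package and gather results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the (package, version) pair it works on
        futures = {
            pool.submit(index_package, pkg, ver, plugins): (pkg, ver)
            for pkg, ver in packages
        }
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


# Toy stand-ins for real indexing plugins (assumptions for illustration only)
PLUGINS = {
    "name_length": lambda pkg, ver: len(pkg),
    "no_debian_revision": lambda pkg, ver: "-" not in ver,
}

if __name__ == "__main__":
    batch = [("hello", "2.10-2"), ("dpkg", "1.17.25")]
    for key, value in sorted(reindex(batch, PLUGINS).items()):
        print(key, value)
```

Because each task depends only on its own arguments, the same batch can be resubmitted whenever the set of plugins changes, which is the re-indexing use case described above. A thread pool only spreads work within one process; spreading it over several machines is what a task queue would add.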
On the other hand, requests to extend Debsources with new features and to support new use cases, not always related to source code publishing, are on the rise. We want to address them by turning Debsources into a base software platform capable of running multiple Web applications on top of the same underlying database. The Debsources code base will be refactored to make this possible. As concrete use cases to test this change, we aim to develop 2 new Web applications on top of Debsources: 1) a "copyright.debian.net" web app for browsing, searching, rendering, and exporting debian/copyright files; 2) a "patch tracker" web app to publish details about the source code differences that Debian packages carry with respect to upstream releases of the same software.
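One possible shape for such a platform, sketched with Flask blueprints: each web app is a separate blueprint with its own URL prefix, and both read from the same shared data. The in-memory `PACKAGES` dictionary stands in for the common database, and the blueprint names, routes, and sample data are assumptions made for illustration rather than the actual Debsources layout.

```python
from flask import Blueprint, Flask, jsonify

# Hypothetical shared data; in a real deployment this would be the common database
PACKAGES = {"hello": {"license": "GPL-3+"}}

# One blueprint per web app, each mounted under its own URL prefix
sources_bp = Blueprint("sources", __name__, url_prefix="/src")
copyright_bp = Blueprint("copyright", __name__, url_prefix="/copyright")


@sources_bp.route("/<package>")
def show_source(package):
    """Source-browsing app: report whether the package is known."""
    return jsonify(package=package, known=package in PACKAGES)


@copyright_bp.route("/<package>")
def show_license(package):
    """Copyright app: report the license recorded for the package."""
    info = PACKAGES.get(package)
    if info is None:
        return jsonify(error="unknown package"), 404
    return jsonify(package=package, license=info["license"])


def create_app():
    """Assemble one Flask application out of the independent blueprints."""
    app = Flask(__name__)
    app.register_blueprint(sources_bp)
    app.register_blueprint(copyright_bp)
    return app


if __name__ == "__main__":
    client = create_app().test_client()
    print(client.get("/src/hello").get_json())
    print(client.get("/copyright/hello").get_json())
```

Keeping each app in its own blueprint means a new app can be added, or an existing one left out of a deployment, without touching the others.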
Confirmed Mentor: StefanoZacchiroli
How to contact the mentor: <info@sources.debian.net> ; see mentors' wiki pages for more
Confirmed co-mentors: MatthieuCaneill
Deliverables of the project:
- refactor the Debsources web app to use blueprints, pluggable views, and other Python/Flask abstractions to separate common features (e.g., browse, search, etc.) from specific web apps
- implement on top of Debsources a new web app, copyright.debian.net, for finding, rendering, and exporting (e.g., in SPDX) debian/copyright files, in particular machine-readable ones
- implement on top of Debsources a new web app, similar to the (now defunct) patch tracker
- refactor the Debsources updater from the current synchronous architecture to an asynchronous architecture based on Celery; convert the current plugins to tasks
- write new Debsources plugins (now workers) to mine debian/copyright information and inject it into the Debsources database
- you shall include as part of your application a Debsources patch that fixes one of the currently outstanding bugs (you might want to start with newcomer bugs); see the HACKING file for info on how to get started with Debsources development
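To give a feel for what mining machine-readable debian/copyright files could involve, here is a minimal sketch of a parser that splits such a file into paragraphs and maps each Files pattern to its short license name. It relies only on the general layout of the format (blank-line-separated paragraphs, `Field: value` lines, indented continuation lines); the function names and the sample file are hypothetical, and a real plugin would more likely build on an existing parsing library.

```python
def parse_paragraphs(text):
    """Split a machine-readable debian/copyright file into field dictionaries."""
    paragraphs, current, last_field = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current paragraph
            if current:
                paragraphs.append(current)
            current, last_field = {}, None
        elif line[0] in " \t" and last_field:
            # An indented line continues the previous field
            current[last_field] += "\n" + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


def licenses_by_pattern(text):
    """Map each Files pattern to the short license name of its paragraph."""
    result = {}
    for para in parse_paragraphs(text):
        if "Files" in para and "License" in para:
            # The short name is the first line of the License field
            short_name = (para["License"].splitlines() or [""])[0]
            for pattern in para["Files"].split():
                result[pattern] = short_name
    return result


# Made-up sample file, for illustration only
SAMPLE = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: hello

Files: *
Copyright: 2014 Example Author
License: GPL-3+

Files: debian/* doc/*
Copyright: 2015 Example Packager
License: Expat
 Permission is hereby granted ...
"""

if __name__ == "__main__":
    for pattern, name in licenses_by_pattern(SAMPLE).items():
        print(pattern, name)
```

A worker built along these lines would store the resulting pattern-to-license pairs in the database, where a web app could then query and export them.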
Desirable skills:
What the student will learn:
- deal with Debian infrastructure components, working on a large-scale service used daily by hundreds of Debian contributors
- apply real-life refactoring on an existing code base, to allow building unexpected features on top of it
- modular Web and infrastructure development, to share code among unrelated services
- deal with "big data" scale services, working on one of the largest existing databases of Free Software source code
