Debsources as a Platform

I am familiar with FLOSS, as a user since I installed Sarge in 2004, and as a developer since 2010, when I made a few contributions to the Weboob framework, which allows easy interaction between console or graphical applications and various websites.

I participated in the 2012 and 2014 GSoC:

I am now familiar with Debian packaging, QA tools, and communication within the project.

Project details

Debsources has already become a useful service for the Debian and FLOSS communities, giving everyone the ability to browse the source code of all packages in several suites, as well as a great research tool.

However, some issues must be addressed before it can reach its full potential.

The currently synchronous and sequential architecture of debsources-updater, the program that feeds data from packages into the database, doesn't permit reindexing existing data in batch, for example when changing a plugin's options. Running workers on several machines would also improve the indexing time.

Although the situation is currently improving, the web application is still very monolithic, which makes providing new services challenging.

For example, debsources' infrastructure could be used to provide a replacement for patch-tracker.debian.org, which is no longer running.

Another interesting application that can be developed on top of debsources' backend is copyright.d.n: the service will leverage the existing advanced search features (search by ctag symbol) and huge database of source code to answer questions like:

Indeed, Free Software projects use numerous licenses, some of which are incompatible with each other (to say nothing of proprietary software), and some projects use different licenses for different parts of their code. As a result, it is currently hard for companies and communities to ensure perfect license compliance.

Moreover, by implementing SPDX export, copyright.d.n could become the largest SPDX database in the world, which would be very useful to companies involved in FLOSS.

Below is the plan for implementing these changes. All new features will be supplied with appropriate unit tests to ensure correctness.

updater

The currently synchronous update daemon will be refactored and made asynchronous, converting the various plugins into Celery tasks.
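To illustrate the shape of that refactoring, here is a minimal sketch in which each plugin becomes an independent task that can be submitted in batch. It is not Celery code: it uses the standard library's concurrent.futures as a stand-in for a worker pool, and the task functions, the PLUGIN_TASKS registry and the package dictionaries are all hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical plugin tasks: each receives a package description
# and returns one result row. Real plugins would write to the database.
def sloccount_task(pkg):
    return ("sloccount", pkg["name"], len(pkg["files"]))


def ctags_task(pkg):
    return ("ctags", pkg["name"], sorted(pkg["files"]))


# Assumed registry mapping plugin names to their task function.
PLUGIN_TASKS = {"sloccount": sloccount_task, "ctags": ctags_task}


def reindex(packages, plugin_names, max_workers=4):
    """Run the selected plugin tasks over all packages concurrently.

    Results are returned in submission order (package, then plugin).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(PLUGIN_TASKS[name], pkg)
            for pkg in packages
            for name in plugin_names
        ]
        return [future.result() for future in futures]
```

With such a layout, re-running a single plugin over existing data (for example after changing its options) amounts to calling reindex with only that plugin's name.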

RabbitMQ seems a sane choice of message broker, but I'll discuss that with Zack.

There are two types of tasks:

Hooks handle two events: add_package and rm_package.

add_package events are sent by the extract stage: extract_new → add_package → _add_package → notify.

rm_package events are sent by the gc stage: garbage_collect → _rm_package → notify.
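The hook mechanism described above can be sketched as a small publish/subscribe registry. The Hooks class and its subscribe method are hypothetical names for illustration; only the two event names and the notify step come from the description above.

```python
from collections import defaultdict


class Hooks:
    """Minimal observer registry for the two updater events."""

    EVENTS = ("add_package", "rm_package")

    def __init__(self):
        self._subscribers = defaultdict(list)

    def _check(self, event):
        if event not in self.EVENTS:
            raise ValueError("unknown event: %s" % event)

    def subscribe(self, event, callback):
        """Register a callback to be run when `event` is notified."""
        self._check(event)
        self._subscribers[event].append(callback)

    def notify(self, event, package):
        """Call every subscriber of `event` and collect the return values."""
        self._check(event)
        return [callback(package) for callback in self._subscribers[event]]
```

In an asynchronous updater, each callback would typically enqueue a task rather than do the work inline, so notify would return quickly.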

Web application refactoring

Before starting, I will check the code coverage on the relevant parts of debsources. If not perfect, I will improve the tests.

Blueprints:

base/sources

All the base features: searching, browsing, listing and displaying packages and their contents

stats

Displaying some stats will be conditional on what plugins are enabled (sloccount, ctags, ...).

infobox

ctags

sloc

I'll need to make the infobox (debsources/app/infobox.py) display the sloccount info only when the sloccount plugin is enabled.
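A minimal sketch of that conditional display, assuming the infobox is built as a dictionary and that the set of enabled plugins is known at request time. The function name, the dictionary keys and the plugin-to-key mapping are assumptions for illustration, not the current contents of debsources/app/infobox.py.

```python
# Assumed mapping from plugin name to the infobox key it provides.
OPTIONAL_FIELDS = {"sloccount": "sloc", "ctags": "ctags_count"}


def build_infobox(package_info, enabled_plugins):
    """Return infobox data, keeping plugin fields only for enabled plugins."""
    infobox = {
        "name": package_info["name"],
        "version": package_info["version"],
    }
    for plugin, key in OPTIONAL_FIELDS.items():
        if plugin in enabled_plugins and key in package_info:
            infobox[key] = package_info[key]
    return infobox
```

The template can then render whatever keys are present, without knowing which plugins exist.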

copyright.d.n

Database

New tables:

author

license

Licenses will be identified by their SPDX identifiers when available. See SPDX's git repository for a machine-readable list.

copyright

author_copyright: junction table for the many-to-many relationship between copyright and author
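One possible layout for these four tables is sketched below. It uses an in-memory SQLite database from the standard library purely so the example is self-contained; all column names are assumptions, and the real schema would live alongside the existing debsources models.

```python
import sqlite3

# Hypothetical columns; only the four table names come from the plan above.
SCHEMA = """
CREATE TABLE author (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE license (
    id INTEGER PRIMARY KEY,
    spdx_id TEXT UNIQUE,          -- NULL when no SPDX identifier exists
    name TEXT NOT NULL
);
CREATE TABLE copyright (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    license_id INTEGER REFERENCES license(id)
);
CREATE TABLE author_copyright (
    author_id INTEGER NOT NULL REFERENCES author(id),
    copyright_id INTEGER NOT NULL REFERENCES copyright(id),
    PRIMARY KEY (author_id, copyright_id)
);
"""


def create_db():
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def authors_of(conn, file_path):
    """Return the sorted author names attached to a file's copyright rows."""
    rows = conn.execute(
        "SELECT a.name FROM author a "
        "JOIN author_copyright ac ON ac.author_id = a.id "
        "JOIN copyright c ON c.id = ac.copyright_id "
        "WHERE c.file_path = ? ORDER BY a.name",
        (file_path,),
    )
    return [row[0] for row in rows]
```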

Worker

This will need to be future-proof, so the worker will be designed in a way that future copyright formats can easily be added.
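One way to keep the worker open to future formats is a registry of (detector, parser) pairs, so that supporting a new format means registering one more pair. The sketch below is an illustration under assumptions: the register decorator and function names are hypothetical, and the machine-readable parser is deliberately simplified (it relies on a Format field in the first paragraph and only keeps paragraphs that have a Files field).

```python
PARSERS = []  # (detect, parse) pairs, tried in registration order


def register(detect):
    """Decorator that registers a parser together with its detector."""
    def decorator(parse):
        PARSERS.append((detect, parse))
        return parse
    return decorator


def _is_machine_readable(text):
    # Assumption: a machine-readable file declares a Format field
    # in its first paragraph.
    first_paragraph = text.strip().split("\n\n", 1)[0]
    return any(line.startswith("Format:") for line in first_paragraph.splitlines())


@register(_is_machine_readable)
def parse_machine_readable(text):
    """Return the paragraphs that have a Files field, as dictionaries."""
    stanzas = []
    for paragraph in text.strip().split("\n\n"):
        fields, key = {}, None
        for line in paragraph.splitlines():
            if line[:1] in (" ", "\t") and key:
                # Continuation line of the previous field.
                fields[key] += "\n" + line.strip()
            elif ":" in line:
                key, value = line.split(":", 1)
                fields[key] = value.strip()
        if "Files" in fields:
            stanzas.append(fields)
    return stanzas


def parse_copyright(text):
    """Dispatch to the first parser whose detector accepts the text."""
    for detect, parse in PARSERS:
        if detect(text):
            return parse(text)
    return None  # unknown format: nothing to inject
```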

App

New blueprint: copyright.

Essential features to implement:

Bonus features:

Routes:

Render the information in a user-friendly way.

API

patch-tracker

The goal is to replace the now defunct patch-tracker.d.o.

Database

New tables:

patch

patch_file: tracks information about what a patch does to a single file

Worker

Parse debian/patches/*. See DEP3.

Problem: do all packages use DEP3 for their patches? I'd guess not, since it was accepted only 3 years ago.
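To measure how widespread DEP3 headers are, the worker could first extract whatever header fields a patch has and then test for the required ones. The sketch below is a heuristic, not a full DEP3 parser: the function names are hypothetical, and the check reflects my reading of DEP3 (a Description or Subject field, plus Origin or an Author/From field).

```python
def parse_patch_header(text):
    """Collect 'Field: value' lines found before the diff itself starts."""
    fields, key = {}, None
    for line in text.splitlines():
        if line.startswith(("---", "diff ", "Index: ")):
            break  # the header ends where the diff begins
        if line[:1] in (" ", "\t") and key:
            # Continuation line of the previous field.
            fields[key] += "\n" + line.strip()
        elif ":" in line and not line.startswith("#"):
            name, value = line.split(":", 1)
            name = name.strip()
            if name and " " not in name:
                key = name
                fields[key] = value.strip()
            else:
                key = None  # free text that happens to contain a colon
        else:
            key = None
    return fields


def looks_like_dep3(fields):
    """Heuristic: required description plus some provenance information."""
    has_description = "Description" in fields or "Subject" in fields
    has_provenance = any(name in fields for name in ("Origin", "Author", "From"))
    return has_description and has_provenance
```

Patches that fail this check can still be indexed from their diff alone, with the metadata columns left empty.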

Bonus feature: parse each diff to gather statistics (number of additions/deletions on a given file)
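A sketch of how such statistics could be gathered from a unified diff. It assumes unified format only, keys results by the path on the "+++" line exactly as written (including any "b/" prefix), and uses the line counts in each hunk header to tell hunk content apart from file headers. The function name is hypothetical.

```python
import re

# Matches "@@ -start[,count] +start[,count] @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def diff_stats(diff_text):
    """Return {path: (additions, deletions)} for a unified diff."""
    stats = {}
    current = None
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                stats[current][0] += 1
                new_left -= 1
            elif line.startswith("-"):
                stats[current][1] += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                # Context line: present in both old and new versions.
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            current = line[4:].split("\t", 1)[0]
            stats.setdefault(current, [0, 0])
        else:
            match = HUNK_RE.match(line)
            if match and current is not None:
                old_left = int(match.group(1)) if match.group(1) is not None else 1
                new_left = int(match.group(2)) if match.group(2) is not None else 1
    return {path: tuple(counts) for path, counts in stats.items()}
```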

App

New blueprint

API

Synopsis

Benefits to Debian

Debsources is becoming more and more important in the ecosystem of Debian web applications: it has been integrated into codesearch.d.n, the PTS (both original and the new tracker.d.o) and is used by firewoes, which will be important to Debile once it overcomes its current issues.

[WIP] These changes are necessary to make Debsources reach its full potential as an important piece of the Debian ecosystem, as a service to FLOSS users and contributors in general, and as a research tool.

Deliverables

updater

  1. new, asynchronous architecture for the updater, using Celery
  2. new worker for parsing debian/copyright files and injecting them into the database
  3. new worker for parsing debian/patches/ and injecting information about the patches into the database

Web application

  1. refactoring of the current web application to make it modular
  2. new web application: copyright.debian.net
  3. new web application: patch-tracker, to replace the defunct patch-tracker.d.o

Project schedule

I am planning buffer periods for each subproject. They will be used for debugging, improving unit tests, and writing documentation. Non-essential ("bonus") features will be developed within those periods when I have time to spare.

Community bonding

If not done before, write a proof of concept of Celery tasks, for example gathering statistics about copyright-format usage in jessie, to get an idea of how many packages don't have machine-readable copyright files.
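The statistics themselves are simple to compute once the copyright files are at hand; a possible sketch follows. It assumes the input is a mapping from package name to the text of its debian/copyright file, and it classifies a file as machine-readable when its first paragraph has a Format field. Both function names are hypothetical.

```python
from collections import Counter


def copyright_format(text):
    """Return the Format field of the first paragraph, or None if absent."""
    for line in text.lstrip().splitlines():
        if not line.strip():
            break  # end of the first paragraph
        if line.lower().startswith("format:"):
            return line.split(":", 1)[1].strip()
    return None


def format_usage(copyright_files):
    """Count machine-readable versus free-form copyright files."""
    counts = Counter()
    for text in copyright_files.values():
        kind = "machine-readable" if copyright_format(text) else "free-form"
        counts[kind] += 1
    return counts
```

In the proof of concept, each call to copyright_format would be one task, and the counting would happen once all results are in.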

Set up a blog where I'll write short daily reports. In addition to helping me write the weekly reports to soc-coordination, it'll allow me to quickly realize where I'm unproductive or getting off track.

Maximize code coverage, especially in areas where I'll break stuff: updater, webapp blueprints and views.

Celerify debsources-updater

Finalize the web app refactoring

Week 3

copyright.d.n

Week 4

Week 5-7

patch-tracker.d.n

Week 8

Week 9-11

End of the summer

Week 12-13

Exams and other commitments

End of exams mid-May, no other commitment after that.

Why Debian?

Are you applying for other projects in SoC?

Yes.