I am a 20-year-old undergraduate student in computer science at the University of Strasbourg (France). Fascinated by the human mind, I plan to study Artificial Intelligence after I complete my Bachelor's degree. For this reason, I've been looking for an opportunity to start doing machine learning on my own.

I am not new to open source development: since late 2010, I have made small contributions to the Weboob project, a Python framework for interacting with websites. This taught me how to work with others and with their code, using Git as the VCS.

Should this be relevant (for interaction with existing code, or implementing performance-critical features), I also have some experience in C programming, mainly from school courses.

Because of my long-term interest in machine learning, and my willingness to acquire as many skills as I can in this domain, I shall be entirely committed to this project. I will probably stick around after the summer is over, particularly if I feel my work is not polished enough.

When a new contributor uploads a package, it is analyzed to extract relevant metadata. This metadata is then compared against a database of existing packages to find those most similar to the new contributor's package. The maintainers of those similar packages are then retained as potential sponsors.

Metadata extraction could be done with a supervised learning algorithm, using existing packages and the Debtags database for training. I am not yet certain this is the right approach, and I will research it before GSoC begins.

Matching a package with sponsors will be done in an unsupervised way, by looking for similarities with existing packages. The database could consist of all packages from registered sponsors, and/or be based on the entire Debian archive, using for example Debtags to limit the amount of computation.
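
To make the matching step more concrete, here is a minimal sketch in Python. It assumes each package is already described by a set of Debtags-style tags, and it uses Jaccard similarity as a stand-in for whatever measure the project ends up using. The function names, the archive layout and the sample data are all hypothetical and only serve as illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two tag sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_sponsors(new_tags, archive, top_n=3):
    """Rank maintainers by the score of their best-matching package.

    `archive` maps a package name to a (maintainer, tags) pair.
    Returns (maintainer, score) pairs, best first, dropping zero scores.
    """
    best = defaultdict(float)
    for _package, (maintainer, tags) in archive.items():
        score = jaccard(new_tags, tags)
        if score > best[maintainer]:
            best[maintainer] = score
    # Sort by descending score, then by name to keep the order stable.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [(name, score) for name, score in ranked if score > 0][:top_n]


# Hypothetical archive: package names and maintainers are made up,
# and the tag strings are only illustrative.
ARCHIVE = {
    "pkg-a": ("alice", {"implemented-in::python", "role::program", "web::browser"}),
    "pkg-b": ("bob", {"implemented-in::c", "role::shared-lib"}),
    "pkg-c": ("alice", {"implemented-in::python", "role::shared-lib"}),
}

if __name__ == "__main__":
    uploaded = {"implemented-in::python", "role::program"}
    for name, score in suggest_sponsors(uploaded, ARCHIVE):
        print(f"{name}: {score:.2f}")
```

Comparing against every package in the archive is linear in its size, which is where pre-filtering by tag (as suggested above) would help. A real implementation would probably also weight rare tags more heavily than common ones.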

Using automatic metadata extraction from packages and learning algorithms, this project aims to match prospective maintainers with potential sponsors more easily and quickly. An efficient Web interface will be developed, so that maintainers and sponsors can access and improve this semantic metadata.

Debian has been my distribution of choice since 2004, for servers and desktops alike. However, I have occasionally been frustrated by how slowly some important packages in Sid (KDE, for example) are updated, and I have always felt that the packaging system, while efficient for users, is unnecessarily complex and opaque for new packagers. For this reason, I regularly try other distributions on my desktop computer, and keep coming back to Debian, because it 'just works'. The grass is not that much greener on the other side.

The research I did after I heard of this project made me realize that package maintenance is not so inaccessible after all. This GSoC looks like a great introduction to the Debian process.