Name: Octavi Font.
Contact/Email: octavi.fs@gmail.com. IRC: octavifs @ Freenode & OFTC.
Background: I am a fourth-year undergraduate pursuing a double degree in Telecommunications and Computer Science at Pompeu Fabra University, Barcelona, Spain. I have a special interest in the scalability of web applications (I am currently writing my graduate thesis on that subject) and in high-performance algorithms in general (I took part in last year's ACM Programming Contest SWERC as part of my university's team).
Relevant experience
- Python and C++ are the languages I am most proficient in. I am fairly knowledgeable in JavaScript, CSS and HTML, and familiar with Bash scripting.
I am currently working part-time with my university's AI research department on the European Spacebook Project. I am in charge of the integration between our module and the ones developed in Edinburgh and Stockholm (an integration I am doing in Python), so I think this experience in dealing with a heterogeneous array of technologies and giving them a homogeneous API will prove useful for the project. I have also developed a demo application for the project using the Flask framework.
I have written and open-sourced a Django webapp that parses my faculty's timetables and allows the creation of customizable ICS calendars. Apart from the Django experience I have acquired, I've discovered, while browsing the PTS source, that both projects handle parsing and updates in quite a similar manner. I think that should prove helpful in this endeavour.
Project title: PTS rewrite in Django
Synopsis: The Debian Package Tracking System is the platform that enables developers to keep track of a package's evolution (versions, bugs, news), either via mail or the web interface. The PTS is built upon a mix of Perl, Python, Bash and XML + XSLT for the HTML rendering. This set of technologies makes maintenance harder than it should be and may discourage people from contributing. Furthermore, the current solution is completely static, so some content might grow stale and cause false bug reports and general dissatisfaction with the service. This project aims to rewrite the PTS in Python + Django. This would homogenize the codebase, add a strong and extensible API, increase its accessibility to new developers and keep the resource usage in check through the use of caching.
Project details: My plan consists of developing this platform as several Django apps (its idiom for modules), each taking care of one function.
pts.core: Will hold the schema and the methods on that schema. This will basically consist of a models.py with Packages, Subscribers, Tags, bug tracking information, development news and useful links. These models will hold all the data related to the PTS and will collect all the operations on that data. It may also be interesting to use Django model inheritance, so that we can establish some basic data fields and behaviours that other distributions could easily extend according to their needs. Most of the functionality would come from this data model and its related methods (which would act as a sort of API). If this part grows too much, I might consider breaking it down into smaller chunks. It would be interesting to keep all the data and the operations on that data separate from the presentation, so that presenting is simply a matter of adapting the output of the model methods to HTML, RDF, RSS or whatever format is used in the future. Also, if we decouple data collection from the models, the models and the views will still hold when the data sources change, for example when moving from the current ones (mails, Ubuntu patches, Ubuntu bugs and update-excuses, to name a few of those that appear in update_incoming.sh) to a more sophisticated source like UDD + AMQP real-time messages.
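The inheritance idea described for pts.core can be sketched in plain Python. This is only an illustration under stated assumptions: it uses dataclasses instead of Django models so that it runs without Django, and the class names, field names and the summary() method are hypothetical, not part of the current PTS.

```python
from dataclasses import dataclass, field


@dataclass
class BasePackage:
    """Fields assumed to be common to every distribution (hypothetical)."""
    name: str
    version: str
    subscribers: set = field(default_factory=set)

    def subscribe(self, email: str) -> None:
        # Normalise the address so the same subscriber is stored only once.
        self.subscribers.add(email.lower())

    def summary(self) -> dict:
        # Format-neutral output: HTML, RSS or JSON views would all start here.
        return {
            "name": self.name,
            "version": self.version,
            "subscribers": len(self.subscribers),
        }


@dataclass
class DebianPackage(BasePackage):
    """A distribution-specific extension adding one extra (hypothetical) field."""
    open_bugs: int = 0

    def summary(self) -> dict:
        data = super().summary()
        data["open_bugs"] = self.open_bugs
        return data
```

In a real implementation the same split would probably be expressed with a Django base model and subclasses; the point here is only that every presentation layer consumes the same summary() output.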
pts.mail: This app will act as a replacement for the Perl mail scripts. It will basically consist of various Django commands that replicate the current functionality.
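As a rough idea of what such a command would do with an incoming control mail, here is a minimal sketch using only the standard library. The command grammar is a simplified assumption ("subscribe" or "unsubscribe", a package name and an optional address), and parse_control_mail is a hypothetical helper, not an existing PTS function.

```python
from email import message_from_string
from email.utils import parseaddr


def parse_control_mail(raw: str) -> list:
    """Return (action, package, address) tuples found in a plain-text mail."""
    msg = message_from_string(raw)
    # Fall back to the sender's address when a command does not give one.
    sender = parseaddr(msg.get("From", ""))[1]
    actions = []
    for line in msg.get_payload().splitlines():
        words = line.split()
        if len(words) >= 2 and words[0].lower() in ("subscribe", "unsubscribe"):
            address = words[2] if len(words) > 2 else sender
            actions.append((words[0].lower(), words[1], address))
    return actions
```

A Django management command would wrap a function like this, read the mail from standard input and apply each action through the pts.core methods.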
pts.web: This would be the webapp. Using Django templates and the methods exposed in pts.core, I'll write a replacement of the current XML + XSLT to HTML implementation.
pts.rss: This would hold the RSS views for the news.
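A feed of this kind is small enough to sketch with the standard library alone. The function name, the example.org link and the shape of the items argument are assumptions made for illustration; the real view would take its data from pts.core and could equally be built on Django's syndication framework.

```python
import xml.etree.ElementTree as ET


def news_feed(package: str, items: list) -> str:
    """Build a minimal RSS 2.0 document from (title, link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"News for {package}"
    # Placeholder URL: the real link would point at the package page.
    ET.SubElement(channel, "link").text = f"https://example.org/{package}"
    ET.SubElement(channel, "description").text = f"Latest news about {package}"
    for title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")
```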
pts.rdf/ttl/xml: The idea would be to also have a drop-in replacement for the Turtle / RDF output. In principle, given that all the data is stored in the models, it would only be a matter of building the appropriate Django templates and views to display the pages accordingly. Ideally, it should be possible to use automatic tools to generate those views, but I would have to test them first.
pts.auth: This module would handle user subscriptions. It would basically consist of the logic needed to validate users and add them to the database (after email confirmation), plus the necessary views. I would also like to add a view listing all the subscribed packages and tags per user, and to allow subscribing and unsubscribing from that page (as well as from each package's page).
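The email confirmation step can be illustrated with a signed token that is mailed to the user and checked when they follow the confirmation link. This is a sketch under assumptions: SECRET_KEY, make_token and confirm are hypothetical names, and a real implementation would probably rely on Django's own signing utilities and add an expiry time.

```python
import hashlib
import hmac

# Assumption: a server-side secret; in Django this would come from settings.
SECRET_KEY = b"change-me"


def make_token(email: str, package: str) -> str:
    """Sign the (email, package) pair so the link cannot be forged."""
    message = f"{email}:{package}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def confirm(email: str, package: str, token: str) -> bool:
    """Accept the subscription only if the token matches this exact pair."""
    expected = make_token(email, package)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, token)
```

The subscription would only be written to the database once confirm() returns True for the token carried by the confirmation link.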
pts.rest: Basically, this module would expose the model methods as REST calls. I don't think a full REST API is needed (only GET methods, really), so the API could be implemented with very simple views returning JSON objects with the corresponding data. It may be interesting to study the advantages and drawbacks of using a more complete solution like Tastypie or Django REST framework.
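To show how little a GET-only JSON API needs, here is a framework-neutral sketch written as a plain WSGI application. It stands in for what would be a simple Django view; the URL layout, the PACKAGES dictionary and its contents are made up for the example.

```python
import json

# Made-up data standing in for what the pts.core models would return.
PACKAGES = {"dpkg": {"version": "1.16.10", "open_bugs": 3}}


def package_api(environ, start_response):
    """WSGI app: GET /packages/<name> returns the package as JSON."""
    if environ["REQUEST_METHOD"] != "GET":
        status, body = "405 Method Not Allowed", {"error": "read-only API"}
    else:
        name = environ["PATH_INFO"].rstrip("/").rsplit("/", 1)[-1]
        if name in PACKAGES:
            status, body = "200 OK", dict(PACKAGES[name], name=name)
        else:
            status, body = "404 Not Found", {"error": "unknown package"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def call(method: str, path: str):
    """Invoke the app directly, without a server, and decode the response."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": method, "PATH_INFO": path}
    body = b"".join(package_api(environ, start_response))
    return captured["status"], json.loads(body)
```

Calling call("GET", "/packages/dpkg") exercises the whole path without starting a server, which is also how such views could be unit tested.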
caching: The cache would be configured project-wide. I think that a good solution would be to use memcached to store the package views, and to invalidate the cached view each time the database performs an update on that entry. This could be done using Django signals (triggered on every Package write). Depending on the production hardware and the number of pages to be stored, memcached may use a lot of RAM (if we are storing entire views), so these hardware restrictions would dictate whether the cache is kept on disk or in memory. Either way, both approaches are supported in Django.
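The invalidate-on-write idea can be sketched without Django. In this illustration the small Signal class plays the role that Django's post_save signal would play, a dictionary stands in for memcached, and all names (package_saved, save_package, package_page) are hypothetical.

```python
class Signal:
    """Tiny stand-in for a Django signal: receivers are called on send()."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, **kwargs):
        for receiver in self._receivers:
            receiver(**kwargs)


package_saved = Signal()
database = {}    # package name -> version (stands in for the real models)
page_cache = {}  # package name -> rendered page (stands in for memcached)
renders = []     # records every cache miss, to make the behaviour visible


def save_package(name, version):
    database[name] = version
    package_saved.send(name=name)  # every write announces itself


def invalidate_page(name):
    page_cache.pop(name, None)  # drop the stale page, if there is one


package_saved.connect(invalidate_page)


def package_page(name):
    if name not in page_cache:
        renders.append(name)
        page_cache[name] = f"<h1>{name} {database[name]}</h1>"
    return page_cache[name]
```

With this arrangement a page is rendered once and then served from the cache until the next write to that package, so readers never see a stale version.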
deployment: I think this application would be best served using NGINX + Gunicorn or NGINX + uWSGI. We could also use Apache + mod_wsgi, but I am not that familiar with it. In any case, NGINX has some built-in cache support that could serve as a rough caching solution (rough because, as far as I know, there would be no easy way to invalidate its cache on incoming data, unlike with Django signals). This could be a quick way of caching the REST requests.
development: I plan to follow a test-driven development approach, so most of the debugging and documentation will be done incrementally. I also think it is very important to have a good test suite, especially for the core methods (which will be used by all the other derived apps, now and in the future).
Benefits to Debian
- Dynamism will allow the PTS to always be up to date.
- Django's MVC pattern and apps focus will make the PTS less monolithic and easier to adapt in the future.
- The usage of current technologies will make the PTS more accessible to new contributors.
Deliverables: A Django implementation of the PTS. It should have the following capabilities:
- email subscription system.
- dynamic web interface, with portal to manage subscriptions and tags.
- REST API to access the data.
- Caching.
- Integration with other Debian infrastructure (UDD, debtags). The exact infrastructure we could benefit most from would be discussed during the project.
- Documentation and tests of the aforementioned deliverables.
Project schedule:
May 27 - June 16: (Community bonding period)
- Discussion with mentors and community.
- Define the core of the data schema.
- First prototype of the new PTS skeleton (to have an idea of where everything will fit).
- Familiarize myself with the current deployment and devise a strategy for the future PTS deployment.
- Familiarize myself with the entirety of the codebase.
June 17 - June 30 (2 Weeks):
- First pts.core version. This would hold the models for all the PTS data and most of the methods necessary to access it.
July 1 - July 7 (1 Week):
- Porting the PTS email interface.
- Iron out pts.core issues.
July 8 - July 21 (2 Weeks):
- Creating the web interface. I plan to have a web interface with most of the basic functionality available for the midterm evaluation.
- Creating RSS feeds view.
July 22 - July 28 (1 Week):
- Creating the REST API view.
July 29 - August 4 (1 Week):
- Creating the portal where subscribers can manage all of their subscriptions and tags.
August 5 - August 11 (1 Week):
- Creating the RDF view.
August 12 - August 25 (2 Weeks):
- Testing different caching solutions and their performance. Benchmarking them against the current solution.
- Load testing the new PTS
August 26 - September 1 (1 Week):
- Testing in production environment
- Packaging of the PTS
September 2 - September 15 (2 Weeks):
- Finish PTS documentation
- Finish PTS deployment documentation
- Feedback from community
- Bugfixing
September 16 - September 23 (1 Week):
- Iron out quirks: documentation, packaging and minor refactors to conform to best practices.
Exams and other commitments: I have 2 exams in June and my dissertation presentation in July (it will be finished before the 17th, though). They shouldn't take more than 6 days in total.
Why Debian?: I have used Debian for 8 years. I am familiar with the project and its philosophy and I would love to contribute and deploy my own work into such a well-esteemed open source community effort.
Are you applying for other projects in SoC? No.