Name: Manish Gill
Contact/Email: gill.manish90@gmail.com (Personal/Melange), mgill25@outlook.com (Mailing Lists). IRC: Naeblis @ Freenode and OFTC.
Background: I am a final-year Computer Science student at HMR Institute of Technology and Management, Guru Gobind Singh Indraprastha University, New Delhi, India. So far, my experience has been in web development, but I am also fascinated by (and motivated to learn) various other things like Compiler Theory, AI, Machine Learning, NLP, etc.
Programming Experience
I dabble in a number of programming languages, including C, Ruby, Scheme, SML, and JavaScript, but my favorite, and the language I am most comfortable with, is Python.
- I have been programming in Python for more than a year now. You can check out my GitHub[1] and Bitbucket[2] profiles to see the various projects I've done in Python. I also worked as an intern at a startup[3] last year for 7 months, during which I worked mostly with Python and JavaScript.
- The applications I worked on as an intern are Batch.me[4], an app to queue and publish tweets, and Sportschimp[5], an HTML5 mobile app that uses a JSON-based Node.js API to serve information, with a crawler running on GAE that keeps the Redis database up-to-date.
- The technologies I have worked with include: Flask, Django, web2py[DAL], Celery, Redis, MongoDB, PostgreSQL, Node.js/Express.js, Google App Engine.
- I am fairly familiar with Version Control Systems like git, bzr, and hg as well.
What makes me the best person to work on this project?
- Having worked with various loosely coupled web applications in the past, I am familiar with the type of work that might go on in creating a Django web application.
- I have been reading the PTS source code, and I think I understand what rewriting it as a Django application would entail.
- I was initially involved in migrating a REST API from Node.js to Flask, so I know something about rewriting code that has similar functionality in a different language/framework.
Project title: PTS Rewrite in Django
Project details:
Current state of PTS
- PTS (The Debian Package Tracking System) is currently a mix of Perl, Python and Bash scripts, working together to serve the Debian package information on the web. This polyglot mashup of technologies makes the PTS harder to maintain, to extend with new features, and to hack on in general. This project aims to rewrite PTS using the Django web framework.
- In the current version of PTS, update_incoming.sh downloads the package information, which is then processed to generate XML files. These files are in turn used to generate the HTML files, which are served live.
New PTS Architecture
- Django follows an MVC-style architecture (models, templates and views), which means that for every kind of page, there will be a dynamic view responsible for serving content on the web.
- The package database, which will contain all the information that might be served, will have to be "told" about the updated package information.
- Allow for updates in real time. The data should always be "recent", and refreshes of the database should be fast. This can be done either with a queuing mechanism that periodically polls the data sources and keeps the data "fresh", or "on demand", whereby a particular view handler fetches the latest information from the database and caches it so that it can be used in subsequent requests.
- PTS currently fetches data from various sources. There have been ideas on the mailing list to unify these and integrate PTS with the Ultimate Debian Database (UDD).
- Email Subscription shall remain more or less the same: basic subscribe/unsubscribe functionality, and "summary" emails, which summarize the information from a given time period and send it to the subscribers (weekly/monthly digests).
- Instead of the XML/HTML combination used to produce static pages, use the Django templating engine to render HTML dynamically.
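To make the two refresh strategies above more concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the planned implementation: `PackageInfoCache`, `fetch` and `ttl_seconds` are hypothetical names, and the fetch function stands in for whatever data source PTS ends up using.

```python
import time
from typing import Callable, Dict, Tuple


class PackageInfoCache:
    """Hypothetical cache that serves package info and re-fetches stale entries."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # assumed data-source lookup, injected by the caller
        self._ttl = ttl_seconds      # how long an entry counts as "fresh"
        self._clock = clock          # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, package: str) -> dict:
        """On-demand strategy: fetch only on a miss or when the entry is stale."""
        now = self._clock()
        entry = self._entries.get(package)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        info = self._fetch(package)
        self._entries[package] = (now, info)
        return info

    def refresh_all(self) -> None:
        """Polling strategy: re-fetch every known package, e.g. from a periodic job."""
        now = self._clock()
        for package in list(self._entries):
            self._entries[package] = (now, self._fetch(package))
```

A view handler would call `get()` on each request, while a scheduled task would call `refresh_all()`. In the actual Django app, the built-in cache framework could probably take over the role of the dictionary used here.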
Synopsis: This project will use modern web technologies, like the Django web framework, to rewrite the Debian PTS. This new version of PTS will be much more loosely coupled, extensible, and dynamic. Its features will include email subscription, caching, and fast data monitoring and updating.
Benefits to Debian
- Allow PTS to update the information as soon as it becomes available, which is much better than the current situation. Currently, the information PTS shows can lag behind the actual information, which might have changed in the meantime.
- A homogeneous codebase instead of a polyglot mashup is much easier to hack on.
Deliverables: A Django implementation of PTS, which serves the package information at packages.qa.debian.org. This app will have:
- A package tracking system that gets updated regularly and provides real time (or as close as we can get) feed to the PTS app.
- An email subscription system integrated within PTS.
- Caching of various dynamic parts of PTS, wherever it is required.
- Possible integration with various Debian infrastructure tools, like UDD and debtags.
- Documentation and Tests for as much of the app as possible. This will be done in conjunction with writing the code.
Project schedule: The following is a tentative schedule. I will try to keep things as close to the timeline as possible, but depending on the design decisions that the mentors make, this might change.
Major Milestones:
- Basic app that allows viewing package information.
- Email Integration.
- Caching Implementation.
- Integration with Debian infrastructure.
Timeline:
May 27 - June 16:
- Community Bonding Period.
- Familiarize myself with all the relevant Debian infrastructure that will be used in the Project.
- Start discussion with mentors and the rest of the community on the project.
- Design the Schema for the app.
- Begin mapping out the initial Django application. Start making models and prototype views.
(Week 1 and 2) June 17 - June 30:
- Work on scripts to pull the packaging information from the various database sources. Basically a rewrite of update_incoming.sh and improvements.
- Possible integration of a queue scheduling system like Celery, or the more lightweight pyres, instead of cron?
- Tests for the scripts as well.
(Week 3 and 4) July 1 - July 14:
- Write views and templates for displaying basic package related information.
- Tests and documentation.
- Begin working on Email Subscription system.
(Week 5 and 6) July 15 - July 28:
- Finish Email Subscription integration.
- This includes writing a system that allows for:
- subscribing/unsubscribing,
- a periodic newsletter/summary email to all those who opt for it.
- Write tests and documentation.
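As a rough illustration of the periodic summary email, the sketch below groups recent package events by subscriber. It assumes a simple in-memory shape for subscriptions and events; `PackageEvent` and `build_digests` are hypothetical names, and the real system would read this data from the Django models.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class PackageEvent:
    """Hypothetical record of something that happened to a package."""
    package: str
    summary: str
    when: datetime


def build_digests(subscriptions: Dict[str, Set[str]],
                  events: Iterable[PackageEvent],
                  period_end: datetime,
                  period: timedelta) -> Dict[str, List[str]]:
    """Return {subscriber email: digest lines} for events inside the period."""
    period_start = period_end - period

    # Keep only the events that fall inside [period_start, period_end).
    by_package: Dict[str, List[PackageEvent]] = defaultdict(list)
    for event in events:
        if period_start <= event.when < period_end:
            by_package[event.package].append(event)

    digests: Dict[str, List[str]] = {}
    for email, packages in subscriptions.items():
        lines = [
            f"{event.package}: {event.summary}"
            for package in sorted(packages)
            for event in sorted(by_package.get(package, []), key=lambda e: e.when)
        ]
        if lines:  # subscribers with nothing to report get no email
            digests[email] = lines
    return digests
```

A weekly digest would call this with `period=timedelta(days=7)` and a monthly one with a longer period. Sending the resulting lines by email is left out of the sketch.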
Mid term evaluation
(Week 7 and 8) July 29 - August 11:
- Learn about various ways caching could be used in PTS.
- Start working on caching implementation.
(Week 9 and 10) August 12 - August 25:
- Finish work on caching.
- Check out other Debian infrastructure use cases and how they can work with PTS.
- Discuss with the community and look into integrating UDD and debtags.
- Start work on any possible solutions.
(Week 11 and 12) August 26 - September 9:
- Continue working with Debian infrastructure services.
- Write tests and documentation for work done.
(Week 13 and 14) September 10 - September 22:
- Final weeks. Full system deployment and testing.
- Feedback from community, bug fixes, additional features.
Final evaluation of GSoC.
Exams and other commitments: I will have end term exams in May or early June. I will update this page when I get the dates.
Other summer plans: No plans, I am available to work with Debian full time during the summer.
Why Debian?: I've been using Debian or Debian-based operating systems for over 2 years. Debian is also the most friendly open source community that I've interacted with. There is a rich environment here for anyone who wants to contribute to the open source community. I've been passionate about that, but I hadn't found the right platform/community before now.
- Are you applying for other projects in SoC? Yes, but I prefer to work with Debian.