Name: Pankaj Kumar Sharma
Contact/Email: sharmapankaj1992@gmail.com
Github: https://github.com/pankajksharma
Background: I am a third-year computer engineering student interested in Linux, open source, Web 2.0, AWS and algorithms. I am comfortable with C++, Java, PHP and Python, with a special interest in Python. I also have a strong grasp of data structures, OOP and DBMS. I like to hack around things (rather than doing stuff manually) and am pretty good at scripting.
Relevant Experience:
I completed a three-month internship as a backend developer at a startup called AadhaarUp, where I worked with Tornado (a Python web framework and asynchronous networking library) and AWS integration.
During the same internship I wrote Python integrations for a few payment gateway APIs that were originally provided in PHP.
- What makes me the best person to work on this project?
- I have been working with Python for the last one and a half years and, as mentioned above, I have hands-on experience with Tornado (a Python web framework, like Django), so picking up Django’s API won’t be an uphill task for me.
- I have experience in rewriting projects from one programming language (PHP) into another (Python).
- I am well acquainted with databases, ORMs and version control systems (including Git and Subversion), which will be useful in this project.
- As this project is a rewrite, it’s important to communicate well with the maintainers and the community behind PTS, so I think my soft skills will be a plus point for this project.
Patches I've written:
Wrote a patch for this wishlist on qa.debian.org
Code [requested with this application]:
Patch to add my name under the To Do section of each package page.
Django Implementation for dispatch.pl.
Project title: PTS rewrite in Django
Project details:
How Stuff works? [Issues with Current PTS]:
- The current PTS is an amalgam of various technologies, including Perl, Python, Bash, XML, etc., which makes it difficult to maintain and extend. This heterogeneous combination discourages many potential contributors from contributing to the project.
- The system runs a cron job 4 times a day to refresh the data. A change (a new release or a reported bug) against a package therefore becomes visible in the PTS only after the next cron job runs. Thus the current PTS fails to be a “live” data monitoring system.
- As stated by the mentors of this project, the current PTS data is very Debian-specific, and the derivatives of Debian find it difficult to set up their own PTS from the current source code.
The wiki page of the PTS provides a list of some pretty cool TODO ideas (I specifically liked “Subscribe to set of packages” and “Summary mails”) which have not been implemented yet, probably because the current PTS is hard to hack.
Getting Stuff going [The New PTS]:
- A new PTS, implemented in the Django framework (along with plain Python where needed), would be useful in the following ways:
- Instead of being an amalgam of different technologies, the new system will use Python and Django only.
The new implementation of the “web” part of the PTS will be entirely dynamic in nature. The pages would be generated “on demand” from a local cache (maintained by the PTS), from the Ultimate Debian Database (UDD) or from the repository itself.
- As pages would be generated on demand, instead of being refreshed at fixed intervals, the chances of differences between the PTS data and the original data would be slim.
- The new implementation would also be “more modular”, so that it could be used by different derivatives of Debian as well.
- Documentation and tests (unit, integration and system tests) would be written for each Django app (module) developed.
How Stuff gonna work? [The Implementation]:
Mail Interface:
Email Subscription:
- Similar to the present PTS, the upcoming system would let users subscribe to or unsubscribe from packages, or modify their preferences, by sending mails.
- In addition to the above procedure, I would develop a web interface where a user/developer can also subscribe to or unsubscribe from packages, or modify their preferences.
Sending Updates:
Users subscribed to various packages would receive updates based upon their subscription preferences. Instead of calling sendmail explicitly (as in the present PTS), I would like to use the Django email API.
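The selection step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `Subscription` record, the `recipients_for` and `build_update` helpers, the keyword names and the header names are all hypothetical, and the sketch uses the standard library's `email.message.EmailMessage` so that it runs without a Django project. In the real system the delivery itself would go through Django's email API (for example `django.core.mail.send_mail`).

```python
from dataclasses import dataclass, field
from email.message import EmailMessage


@dataclass
class Subscription:
    """Hypothetical record: one subscriber's preferences for one package."""
    email: str
    package: str
    keywords: set = field(default_factory=set)


def recipients_for(subscriptions, package, keyword):
    """Return the addresses subscribed to `package` that accept `keyword` mails."""
    return sorted(
        s.email
        for s in subscriptions
        if s.package == package and keyword in s.keywords
    )


def build_update(sender, recipient, package, keyword, body):
    """Build one outgoing update mail; the header names are illustrative."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[{package}] {keyword} update"
    msg["X-PTS-Package"] = package
    msg["X-PTS-Keyword"] = keyword
    msg.set_content(body)
    return msg


# Made-up sample data, only to exercise the helpers.
subs = [
    Subscription("alice@example.org", "dpkg", {"bts", "upload-source"}),
    Subscription("bob@example.org", "dpkg", {"upload-source"}),
    Subscription("carol@example.org", "apt", {"bts"}),
]
to = recipients_for(subs, "dpkg", "bts")
messages = [
    build_update("pts@example.org", addr, "dpkg", "bts", "A new bug was filed.")
    for addr in to
]
```

Keeping recipient selection separate from message building should make both parts easy to unit test before any mail backend is involved.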
New Functionalities:
- Daily/weekly bounce reports to the admin.
- Regular removal of email addresses having more bounces than a certain limit.
- Time permitting, I would also add the “Subscribe to set of packages” and “Summary mails” functionalities.
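The bounce handling listed above could look roughly like the following sketch. The `BounceTracker` class, its method names and the limit of 5 are assumptions made for illustration; the real limit and the storage (most likely a database table) would be decided with the mentors.

```python
from collections import Counter

MAX_BOUNCES = 5  # assumed limit, not taken from the current PTS


class BounceTracker:
    """Hypothetical in-memory bounce counter, keyed by email address."""

    def __init__(self, limit=MAX_BOUNCES):
        self.limit = limit
        self.bounces = Counter()

    def record_bounce(self, email):
        """Count one bounced mail for `email`."""
        self.bounces[email] += 1

    def over_limit(self):
        """Addresses with more bounces than the limit."""
        return sorted(e for e, n in self.bounces.items() if n > self.limit)

    def prune(self, subscribers):
        """Drop over-limit addresses from the `subscribers` set; return them."""
        removed = [e for e in self.over_limit() if e in subscribers]
        subscribers.difference_update(removed)
        for email in removed:
            del self.bounces[email]
        return removed

    def report(self):
        """Plain-text bounce report for the admin, one address per line."""
        return "\n".join(
            f"{email}: {count}" for email, count in sorted(self.bounces.items())
        )


tracker = BounceTracker(limit=2)
for _ in range(3):
    tracker.record_bounce("dead@example.org")
tracker.record_bounce("flaky@example.org")

report_text = tracker.report()
subscribers = {"dead@example.org", "flaky@example.org", "ok@example.org"}
removed = tracker.prune(subscribers)
```

The report is generated before pruning here so that the admin still sees the addresses that are about to be removed.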
Web Interface:
The web interface of the upcoming PTS will be written entirely in Django and will strictly follow the MVC architecture (which Django expresses as models, templates and views). I would like to implement all the features (like RSS, bug tracking, etc.) of the present interface, either in a similar or in an improved way. Some of the key features would be as follows:
Dynamic Web Interface:
Instead of storing static HTML files, I would like to use the Django templating engine to generate web pages “on demand”.
- To get data related to packages, I will define backend procedures that fetch data from the local cache, from the UDD or from the repository itself. These procedures would be called while rendering pages, depending on where the data is available.
- To keep the system “live”, I will divide the package description template into a number of parts. Parts which change frequently (like bug reports) will not be served from the cache, while parts which remain static for a longer time will be cached.
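A minimal sketch of this split between cached and uncached page parts is shown below. `FragmentRenderer`, the fragment names and the one-minute lifetime are hypothetical; in Django the same idea would more likely be expressed with template fragment caching, so this only illustrates the logic.

```python
import time


class FragmentRenderer:
    """Hypothetical renderer: static fragments are cached, volatile ones are not."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache = {}  # (package, fragment) -> (expires_at, html)

    def render(self, package, fragment, producer, volatile=False):
        """Return the HTML for one page part, calling `producer` only when needed."""
        if volatile:
            # Frequently changing parts (e.g. bug reports) always bypass the cache.
            return producer(package)
        key = (package, fragment)
        now = self.clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        html = producer(package)
        self._cache[key] = (now + self.ttl, html)
        return html


calls = {"description": 0, "bugs": 0}


def description(package):
    calls["description"] += 1
    return f"<p>Description of {package}</p>"


def bugs(package):
    calls["bugs"] += 1
    return f"<p>Bugs of {package}</p>"


renderer = FragmentRenderer(ttl_seconds=60)
for _ in range(2):
    page = (
        renderer.render("dpkg", "description", description)
        + renderer.render("dpkg", "bugs", bugs, volatile=True)
    )
```

After the two page loads, the description producer has run once and the bugs producer twice, which is the behaviour the bullet points describe.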
Caching:
- The data would be generated “on demand” and, once generated, cached locally for future requests to improve the performance of the system.
I would like to support a number of caching mechanisms, including Memcached, Redis, database caching and file system caching.
- I will also define an abstraction layer over these caching interfaces. Depending on the resources available, the admin could use any one of the above-mentioned methods.
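The abstraction layer could be sketched as one small interface with interchangeable backends. Everything named here (`CacheBackend`, `MemoryCache`, `FileCache`, `make_cache`) is hypothetical, and only an in-memory and a file system backend are shown so that the example runs with the standard library alone. Django's own cache framework already offers pluggable backends behind a common API, so the final design might be a thin wrapper over it rather than a new layer.

```python
import hashlib
import json
import os
import tempfile
from abc import ABC, abstractmethod


class CacheBackend(ABC):
    """Hypothetical common interface that every cache backend implements."""

    @abstractmethod
    def get(self, key, default=None): ...

    @abstractmethod
    def set(self, key, value): ...

    @abstractmethod
    def delete(self, key): ...


class MemoryCache(CacheBackend):
    """Keeps values in a dict; lost when the process exits."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class FileCache(CacheBackend):
    """Stores each value as a JSON file named after a hash of the key."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".json")

    def get(self, key, default=None):
        try:
            with open(self._path(key), encoding="utf-8") as fh:
                return json.load(fh)
        except FileNotFoundError:
            return default

    def set(self, key, value):
        with open(self._path(key), "w", encoding="utf-8") as fh:
            json.dump(value, fh)

    def delete(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            pass


def make_cache(name, **options):
    """Pick a backend by name, as an admin setting might do."""
    backends = {"memory": MemoryCache, "file": FileCache}
    return backends[name](**options)


# Made-up sample value, only to exercise the interface.
cache = make_cache("file", directory=tempfile.mkdtemp())
cache.set("dpkg:summary", {"open_bugs": 3})
summary = cache.get("dpkg:summary")
```

Because callers only see `get`, `set` and `delete`, switching from `"file"` to `"memory"` (or to a Memcached or Redis backend added later) would not require changes in the calling code.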
RESTful API:
- I will also provide a REST API to let users get information related to packages, such as the stable release, the number of open bugs, any particular bug, etc.
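To make the idea concrete, here is a small framework-free sketch of how such endpoints might map URLs to JSON responses. The URL layout, the `handle` function, the field names and the sample data are all assumptions for illustration; in the actual project these would be Django URL patterns and views.

```python
import json
import re

# Made-up placeholder data; real values would come from the cache or the UDD.
PACKAGES = {
    "dpkg": {"stable_release": "1.0-1", "bugs": {101: "Example bug title"}},
}

NAME = r"(?P<name>[a-z0-9][a-z0-9+.-]+)"
ROUTES = [
    (re.compile(rf"^/api/packages/{NAME}/$"), "summary"),
    (re.compile(rf"^/api/packages/{NAME}/bugs/(?P<bug>\d+)/$"), "bug"),
]


def handle(path):
    """Return (status_code, json_body) for a GET request on `path`."""
    for pattern, kind in ROUTES:
        match = pattern.match(path)
        if match is None:
            continue
        package = PACKAGES.get(match.group("name"))
        if package is None:
            return 404, json.dumps({"error": "unknown package"})
        if kind == "summary":
            body = {
                "name": match.group("name"),
                "stable_release": package["stable_release"],
                "open_bugs": len(package["bugs"]),
            }
            return 200, json.dumps(body, sort_keys=True)
        bug_id = int(match.group("bug"))
        title = package["bugs"].get(bug_id)
        if title is None:
            return 404, json.dumps({"error": "unknown bug"})
        return 200, json.dumps({"id": bug_id, "title": title}, sort_keys=True)
    return 404, json.dumps({"error": "not found"})


status, body = handle("/api/packages/dpkg/")
```

Returning plain status codes and JSON bodies keeps the API usable from scripts as well as from other web services.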
Modularity:
- I would let an admin setting up a new instance of the PTS decide whether to include information (say patches, bugs, etc.) related to other derivatives, simply by including or excluding modules. This will make the new PTS code useful for Debian derivatives as well and will make it much more modular.
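One possible shape for this include/exclude mechanism is a registry of page panels plus a per-instance list of enabled names. The `panel` decorator, the panel names and `render_package` are hypothetical; with Django, the natural equivalent would be adding or removing apps in the `INSTALLED_APPS` setting.

```python
PANEL_REGISTRY = {}


def panel(name):
    """Decorator that registers a function as the renderer of one panel."""
    def register(func):
        PANEL_REGISTRY[name] = func
        return func
    return register


@panel("bugs")
def bugs_panel(package):
    return f"bug summary for {package}"


@panel("patches")
def patches_panel(package):
    return f"patches for {package}"


@panel("derivatives")
def derivatives_panel(package):
    return f"derivative information for {package}"


def render_package(package, enabled_panels):
    """Render only the panels that this PTS instance has enabled."""
    unknown = [name for name in enabled_panels if name not in PANEL_REGISTRY]
    if unknown:
        raise ValueError(f"unknown panels: {unknown}")
    return {name: PANEL_REGISTRY[name](package) for name in enabled_panels}


# A derivative that does not want the "derivatives" panel simply leaves it out.
page = render_package("dpkg", ["bugs", "patches"])
```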
New Functionalities:
- In addition to package information, I will add a user interface as part of the web interface. Users would be able to log in and subscribe to or unsubscribe from packages, or modify their preferences for different keywords (presently this is possible only through the mail interface).
I will also add a new search interface which will list packages with names similar to a possibly misspelled search keyword (presently a 404 page is returned in such a scenario).
- Time permitting, I would also like to implement a few to-dos from the present “Wish Lists” of the PTS web interface.
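The misspelling-tolerant search mentioned above can be prototyped with Python's standard `difflib` module. This is a sketch, assuming the list of package names fits in memory; the `suggest` function, the cutoff of 0.6 and the sample names are illustrative, and a production version might need an index for the full archive.

```python
import difflib


def suggest(query, package_names, limit=5, cutoff=0.6):
    """Return the exact match if present, else the closest package names."""
    if query in package_names:
        return [query]
    return difflib.get_close_matches(query, package_names, n=limit, cutoff=cutoff)


names = ["apt", "dpkg", "perl", "python"]
exact = suggest("dpkg", names)
fuzzy = suggest("pyhton", names)
nothing = suggest("zzzzzz", names)
```

Here `exact` is `["dpkg"]`, `fuzzy` is `["python"]` and `nothing` is an empty list, in which case the interface could still fall back to a "no such package" page.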
Deployment and Benchmarking:
While writing this proposal, I tried to find reliable benchmark results for Django deployment techniques. The best I could find is http://brainacle.com/benchmark-of-django-deployment-techniques.html. The author of this post tried a number of techniques like Apache with mod_wsgi, Nginx+Apache with mod_wsgi, Nginx+FCGI, Cherokee+SCGI, etc. According to these tests, the best result in terms of response time was observed with Nginx + the CherryPy WSGI server.
- I will try all these combinations myself, using the data available for the current PTS (like the maximum number of concurrent clients, available computation resources, etc.) to benchmark these techniques. This will help in deciding the best technique to deploy the new system.
- The results of these deployment benchmarks could also be made available along with the source code.
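A very small load-test harness along these lines could be used to compare the deployments. It is only a sketch: `benchmark`, `timed_call` and the `fake_request` stand-in are hypothetical, and in a real run `request` would issue an HTTP GET (for example with `urllib.request.urlopen`) against each deployment under test, or an established load-testing tool would be used instead.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(request):
    """Run one request and return how long it took, in seconds."""
    start = time.perf_counter()
    request()
    return time.perf_counter() - start


def benchmark(request, total_requests=50, concurrency=5):
    """Issue `total_requests` calls from `concurrency` threads; summarise latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(
            pool.map(lambda _: timed_call(request), range(total_requests))
        )
    latencies.sort()
    return {
        "requests": len(latencies),
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        # Approximate 95th percentile by index into the sorted latencies.
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
        "max_s": latencies[-1],
    }


def fake_request():
    """Stand-in for an HTTP request to a deployed PTS instance."""
    time.sleep(0.001)


result = benchmark(fake_request, total_requests=20, concurrency=4)
```

Running the same harness with the same `total_requests` and `concurrency` against each setup keeps the comparison between deployments fair.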
Performance Assessment:
As mentioned by Stefano Zacchiroli, mentor of this project (link), finding the right methodology to provide live data at a minimum performance cost would be part of this project and would require experimentation. So I would like to include performance assessments as a part of this project. Examples of the assessments I will do are as follows:
- Performance assessments of the static (current) and dynamic (new) PTS.
- Assessment of the various mechanisms developed for fetching data, on the basis of performance and freshness (validity) of data.
Synopsis:
- The present Package Tracking System (PTS) used by Debian uses a number of technologies (Perl, Python, Bash, XML, XSL, to name a few), making it hard to hack on and to extend for Debian derivatives. Further, the data in the PTS is stored as static pages and refreshed 4 times a day; this technique, though good for performance, creates differences between the original data and the data available in the PTS.
- This project aims to rewrite the complete PTS in Python, using its commonly used web framework “Django”. The rewrite aims to make the PTS more modular and a live data monitoring system, and to attract more contributors to the project. All the features of the present PTS would be incorporated in the new system, along with a handful of new features.
Benefits to Debian:
- Debian would get the following benefits out of this project:
- Instead of using various technologies, the new PTS would be entirely in Python, making it easier to maintain and to extend with new functionality.
- Python being a common technology in the Debian community, this rewrite should bring more developers into the project and would help its future evolution.
- Debian will get a new, completely dynamic and modular PTS that can serve as a live data monitoring system for developers and maintainers.
- Being dynamic and cached, the system would reduce the computation that gets wasted in generating and updating information for infrequently visited packages.
- Being modular, the new PTS would be easy to set up for Debian derivatives.
- Newly added features, like the search and user interfaces, would make the PTS more user-friendly.
- Performance assessments and benchmarking (as a part of this project) might also be useful for Debian in deciding the future direction of the PTS.
As suggested by Raphael Hertzog, mentor for this project (link), after this rewrite the project could be extended with ideas like managing debtags, adding upstream metadata, maintaining a list of easy bugs or tasks for new contributors, etc.
- I would like to be a part of these implementations and of PTS maintenance in the post-SoC period as well.
Deliverables:
- The deliverables from this project are as follows:
- A dynamic PTS written in Django + Python, providing almost real-time information.
- A more modular PTS, which could easily be set up for Debian derivatives.
- A new search and user interface.
- Subscription and sending updates with the mail interface.
- Caching support for the new system.
- PTS integration with UDD.
- Tests and documentation for the different modules.
- Performance assessment and benchmarking of the new system.
Project schedule:
[May 27 - June 16]
- Learning about UDD, Django (in detail), caching, etc.
- Learning more about the “Mail” part of the PTS and the Django email APIs
- Work planning and interacting with mentors and the community to understand their expectations and to suggest my own ideas (mostly regarding the mail interface)
- Designing the DB schema based on discussion with mentors and the community
[June 17 - June 30]
- Implementing User subscription via mail
- Implementing mail delivery to subscribers based upon incoming mails and subscriber preferences (extension of the Django app submitted with this proposal)
- Implementing bounce reports and removal of bouncing emails
- Writing tests for mail interface
[July 1 - July 14]
- Coding basic Django app for the web part of PTS
- Designing and coding various backends to retrieve data dynamically
- Improving front end of present PTS
[July 15 - July 31]
- Implementing caching and caching abstraction
- Implementing cache invalidation based upon the frequency of changes to different parts
- Implementing cache invalidation based upon packages
[August 1 - August 25]
- Implementing REST APIs
- Implementing search interface for possibly misspelled search terms
- Implementing User login where users can subscribe/unsubscribe against different packages or modify their preferences
- Writing tests for “Web” part
- Code cleanup and Documentation (for mid-term evaluation)
[August 26 - September 15]
- Writing integration and System tests
- Performance assessments of static and dynamic (new) PTS
- Assessment of the various mechanisms developed for fetching data, on the basis of performance and freshness of data
- Discussion with mentors regarding “new” features and selecting new features to be implemented
- Implementing these “new” features and functionalities
[September 16 - September 23]
- Deployment and benchmarking
- Cleaning code
- Writing and improving code documentation
Exams and other commitments: I might have my next semester exam around mid-September (it usually lasts only one week).
Other summer plans: No, I don’t have any vacation plans for the summer.
Why Debian?:
I have been using Ubuntu for the last 3 years, but that’s not my primary motivation. What motivated me to choose Debian is its community. I found the Debian community the most helpful and encouraging of all the open source communities I’ve interacted with so far. It is a pleasure to contribute to this community and, at the same time, to learn a lot from it.
- Are you applying for other projects in SoC?
No, I am not applying for any other project in SoC.