Name: Nikhil Bafna
Background: I’m currently in my fourth year of Computer Science at BITS, Pilani, where I have received a strong formal grounding in Computer Science through rigorous training.
In addition, I’ve developed a Flex-based application for VSEEL Lab, Purdue University, which serves as a training-cum-experimentation module on the Winner’s Curse. I’m currently wrapping up another Flex-based module for ‘Finvis’, one part of an online training-cum-data-collection program funded by the US Social Security Administration and the RAND Corporation. Executables for both can be downloaded from here
As academic projects, I have worked on a Fuzzy Grading system and a Hostel Complaint Management system for BITS Pilani.
During the summer of 2010, I interned at the National Centre for Antarctic and Ocean Research (NCAOR), Goa, India. There, I worked on a project to estimate India's coastal boundaries beyond the continental shelf, and on a virtual tour of the NCAOR campus, the source of which can be found at http://github.com/nrbafna/Virtual-Tour---NCAOR-Campus
I currently use a Mac dual-booted with Lion and Ubuntu 10.10. I have used version control systems in the past, and I’m currently using git to manage the code for my ongoing work on the demo for this project (debian.herokuapp.com).
What makes me the best person to work on this project?: I am experienced in web development with Python + Django, in developing on Heroku, and in using a version control system. Apart from that, I understand the requirements of the project really well. I have no other commitments for the entire duration of the GSoC project and can put in 40-45 hours of work per week. My thesis semester runs from August to December this year, and during that period I would be able to continue working on the plugin distribution system, putting in 25-30 hours of work per week. From my freelance projects, I have previous experience of working remotely and iterating on feedback. As a proof of concept of my understanding, I have been working on a demo, which can be accessed at http://debian.herokuapp.com/ and which I will keep iterating on at the same URL throughout the application process.
Project title: Improving Debian Team Activity Metrics
Synopsis: 'Debian Team Activity Metrics' is an existing project to measure the performance of teams in the Debian community by inspecting postings on relevant mailing lists, commit statistics from project repositories, and package upload records from the Ultimate Debian Database. I propose to write a RESTful API to access this information, and to develop a web interface that presents it, via the API, in a clear and intuitive way. That would be followed by a period of testing and optimizing the scripts that gather this data. In the last phase, I will write scripts implementing new metrics in addition to the existing three.
Project details: I’ve decomposed the problem into five main parts, and I’ve addressed each one below.
API to access the data: Currently, a cron job collects data from mailing-list activity, commit activity, and package upload records from the Ultimate Debian Database. But this data sits in SQL tables and is not directly accessible to users. I propose to expose it through a RESTful API, which consists only of GET requests and returns JSON-formatted data. Some sample user queries to the API might be:
example.com/api/all = returns JSON data for all metrics for all teams; example.com/api/mailing-list = returns JSON data for mailing-list activity in all teams; example.com/api/mailing-list?team=accessibility&from=2008 = returns mailing-list data only for the Accessibility team, from the year 2008 onwards
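To illustrate the filtering behaviour of the queries above, here is a minimal pure-Python sketch of how the mailing-list endpoint might handle the team and from parameters. The row layout, field names, and function name are illustrative assumptions, not the actual schema; the real rows would come from the SQL tables populated by the existing cron job.

```python
import json

# Illustrative in-memory rows standing in for the SQL tables.
ROWS = [
    {"team": "accessibility", "metric": "mailing-list", "year": 2007, "posts": 310},
    {"team": "accessibility", "metric": "mailing-list", "year": 2009, "posts": 452},
    {"team": "qa", "metric": "mailing-list", "year": 2009, "posts": 1203},
]

def mailing_list_api(team=None, from_year=None):
    """Hypothetical handler for GET /api/mailing-list?team=...&from=...

    Both query parameters are optional; omitting them returns all rows.
    """
    rows = ROWS
    if team is not None:
        rows = [r for r in rows if r["team"] == team]
    if from_year is not None:
        rows = [r for r in rows if r["year"] >= from_year]
    return json.dumps(rows)
```

In a Django view, team and from_year would be read from request.GET and the list comprehensions replaced by queryset filters; the JSON shape returned to the client would stay the same.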
Web interface, serving charts generated on the backend: A JS-only approach to generating charts will not work when the user has disabled JS in the browser or runs extensions like 'noscript'. Hence, when the browser has JS disabled, the charts will be generated dynamically on the backend using matplotlib, and a simple PNG of the chart will be displayed. The parameters for the charts will be user input, such as which metric to view, or show only the top N contributors, etc. A chart will be generated only for the first request, after which it will be cached for repeated queries, reducing the latency of user requests. The various options and parameters for charts are discussed further down.
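The generate-once-then-cache behaviour described above could be sketched as follows. The cache layout, key scheme, and function names are my own assumptions; the actual matplotlib rendering (a figure saved via savefig) is passed in as a callable so the caching logic stays independent of it.

```python
import hashlib
import os
import tempfile

# Hypothetical on-disk cache of rendered chart PNGs.
CACHE_DIR = tempfile.mkdtemp()

def chart_path(params):
    """Map a chart query (metric, team, filters...) to a cache filename."""
    key = hashlib.sha1(repr(sorted(params.items())).encode()).hexdigest()
    return os.path.join(CACHE_DIR, key + ".png")

def get_chart(params, render):
    """Return the cached PNG path, rendering only on a cache miss.

    `render(path, params)` is expected to write the chart image to `path`
    (e.g. with matplotlib's fig.savefig(path)).
    """
    path = chart_path(params)
    if not os.path.exists(path):
        render(path, params)
    return path
```

Repeated requests with identical parameters hit the same cache file, so matplotlib runs only once per distinct query.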
Web interface, serving charts generated on the frontend: While generating charts on the backend works, it is advantageous to leverage JS when it is enabled in the browser. It reduces the load on the server, since the processing is done on client machines, and it avoids the latency of loading backend-generated images that are not yet in the cache. Hence, whenever the browser has JS enabled, I will use jqPlot to generate interactive charts. The data for plotting will be accessed from the API developed in Part I. Apart from charts, I will build an alternate tabular view of the data, which is more versatile in the sense that the table will be sortable and searchable.
Testing: I will write tools to verify that the information gathered using the metrics is correct, i.e. tests for the scripts that gather the data, to confirm that the tools are extracting the right information from a mailing list or a repository. From the discussion on the mailing list, this can be done, for example, by writing tests against a known repository or mailing list (team metrics).
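As a sketch of this testing approach, a tiny fixture with a known answer can stand in for a known mailing-list archive. The mbox-style post counter below is a toy stand-in for the real gathering scripts, and the fixture addresses are invented; the point is only the pattern of asserting a gatherer's output against data whose correct answer is known in advance.

```python
import unittest

def count_posts(mbox_lines):
    """Toy gatherer: count messages via the 'From ' separator lines of an mbox."""
    return sum(1 for line in mbox_lines if line.startswith("From "))

class GathererTest(unittest.TestCase):
    # A small fixture with a known answer (2 messages), standing in
    # for a known mailing-list archive or repository.
    FIXTURE = [
        "From alice@example.org Mon Jan  2 10:00:00 2012",
        "Subject: hello",
        "",
        "From bob@example.org Tue Jan  3 11:00:00 2012",
        "Subject: re: hello",
    ]

    def test_known_archive(self):
        self.assertEqual(count_posts(self.FIXTURE), 2)

if __name__ == "__main__":
    unittest.main()
```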
New metrics and optimization: Currently, the scripts gather data from mailing lists, git repository activity, and package upload records. Based on the needs and discussion with the mentors, I will write scripts to incorporate new metrics and to correlate data from multiple metrics. I have a strong background in mathematics and statistics, and I believe this will be an interesting problem to solve. I will also optimize all code from the previous stages, including the data-gathering scripts.
Before the coding stage starts, there is a period of one month during which I would like to discuss all possible visualizations for the charts, such as bar / line / pie, etc. As an example, visualizing the total number of commits by a team over the years is best done with a line chart and a regression line running through it (as here: http://debian.herokuapp.com/teams/). Representing percentages of something is best done with a pie chart, whereas the contributions of the top N members of a mailing list are best viewed as a histogram (as here: http://debian.herokuapp.com/)
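The regression line through yearly commit totals mentioned above is a plain least-squares fit; a minimal stdlib-only sketch follows. The yearly commit figures are made up for illustration, and in the actual interface the fitted line would be drawn by matplotlib (backend) or jqPlot (frontend) over the real data.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for the regression line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)          # variance term
    sxy = sum((x - mean_x) * (y - mean_y)             # covariance term
              for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Hypothetical yearly commit totals for one team
years = [2008, 2009, 2010, 2011]
commits = [120, 180, 260, 300]
slope, intercept = linear_fit(years, commits)
```

The slope gives the team's average yearly growth in commits, which is exactly the trend the line chart is meant to convey.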
I am confident that I can execute all five parts within the GSoC timeframe, and I have chalked out a schedule for myself in the Project Schedule section.
Benefits to Debian: The gathered metrics are useful indicators of the activity and work of various teams. Building a web interface would allow this data to be easily visualized. Exposing the data publicly through an API allows users to process it, build their own visualizations, or mine it for useful patterns and correlations. Seeing their work visualized will be an incentive for hard-working teams, and showing data about teams and their activities would attract new developers as well.
Deliverables: A web interface to the Team Metrics project; an API to access the Team Metrics data, with documentation on its use; and tools to verify that the information gathered using the metrics is correct.
April 7 to May 20 (before GSOC begins):
- Discuss and finalize the filters and representation of the charts.
- Thoroughly research the implementation details of my proposal, to work out the best implementation.
- Set up a test server to show work in progress.
May 21 to June 3:
- Wireframe and build the skeletal HTML and CSS to be used throughout the web interface, by extending Twitter Bootstrap.
- Develop the RESTful API.
- Document the API with examples and display this documentation in the web interface of the project.
June 4 to June 24:
- Implement charting module, which generates charts in the backend using matplotlib.
- Develop the front-end interface through which the user requests data. The user can view brief info about the teams and links to mailing lists, and can use filters such as show the top N members, or show data only within a certain range, etc.
- Meanwhile, submit the API for user testing on the mailing list and IRC channel, and collect feedback.
June 25 to July 8:
- Implement the feedback from the mailing list and IRC channel and make appropriate changes.
- Implement charting module, which generates charts using jqPlot. The frontend for this is already developed in the previous stage.
- Implement the data visualization in tabular form.
July 9 to July 22:
- Start Part IV i.e. writing tests to verify that the information gathered using the metrics is correct.
- Gather feedback from the mailing list and IRC channel on all the work done so far.
- Test the web interface across all browsers.
July 23 to July 29:
- Implement feedback from all the previous public discussions.
- Start discussion on which new metrics can be developed.
July 30 to July 31:
- Start coding to gather data based on the new metric(s).
- Optimize all code written till now, and also optimize the data gathering scripts.
August 1 to August 7:
- Incorporate feedback from the mentors and from users on the IRC channel and the mailing list.
- Thoroughly test the website across major browsers.
- Start discussion on ways to ensure easy installation of plugins, with options ranging from custom extensions to custom URIs or a single interactive installation script.
August 8 to August 20:
- Documentation, code refactoring, and bug fixes.
- Finish off any remaining work.
- Continue work on plugin distribution system till December.
Exams and other commitments?: None. I have a semester break from May 14 to August 2. The next semester is my thesis semester, and I will not have any exams from May 14 onwards.
Other summer plans?: I don't have any other summer plans, including any other internship, and as I mentioned earlier, I will be able to put in 40-45 hours of work per week during the GSoC period. If the scope increases, I can continue working on the project until the end of December this year, putting in 25-30 hours of work per week.
Why Debian?: Debian is a great example of an organization that believes in the philosophy of UNIX and FOSS. Even a couple of years back, I would have jumped at a chance like this. I hope this project serves as a way to become involved with the community and to work consistently with them on further development.
Are you applying for other projects in SoC?: Yes, one other application.