Progress reports for this project:
(Proposal has been updated on April 6th)
Name: Sukhbir Singh
Contact/Email: [removed] (email), IRC: [removed]
Background: I am a senior undergraduate student of Computer Science and Engineering at RBIEBT, Punjab Technical University, Punjab, India. I was a Google Summer of Code student in 2010 with Pardus GNU/Linux, where I wrote a package testing framework in Python. The project was completed successfully, and everything that was planned was implemented.
Technical skills (with development experience):
- Python - GSoC 2010 and other projects (~ one year)
- C, C++ - I have no experience to show in these languages other than the minor projects I did as part of my coursework.
- I have used Git and Subversion extensively.
- You can check out my projects (including last year's SoC) and other related work on my website.
- I feel I am the best person to work on this project because I:
- have discussed this idea thoroughly with my mentors over a period of twelve days.
- have the requisite technical knowledge required to complete the project.
- am aware of what needs to be done and how it will be implemented.
- have experience in completing projects.
Duality is a BitTorrent client I wrote in Python based on a research paper that I co-authored.
Project title: Measuring Team Performance in Debian Pure Blends
Synopsis: The aim of this project is to gauge the performance of teams in Debian Pure Blends by inspecting postings on relevant mailing lists, package upload records from the Ultimate Debian Database and commit statistics from project repositories. The information gathered will help evaluate team performance by measuring how well the people in a team are working together. An interface for accessing this information easily will also be developed.
Benefits to Debian:
- The status of teams in Pure Blends and packaging is currently unknown in Debian; there isn't any quantifiable measure yet which can show how well the team model is working. With the information that is collected under this project, we intend to help guide people's decisions to create or join teams or to place their packages under team maintenance.
- This will help measure the performance of teams in Debian Pure Blends.
- While this topic is under the Debian Pure Blends headline, it is of general interest for Debian. For instance, one can fetch the statistics of language specific user lists using this project.
When the mailing list statistics were presented by the mentor during DebConf 8 in Argentina, the audience requested an additional lightning talk about them. This is indicative of how useful this feature can be. And with this project, we are including not only mailing lists but a variety of other data sources as well.
- This has received interest from pure packaging teams also.
- A set of tools that will measure performance on the basis of the factors discussed above.
- An interface that will allow this information to be easily accessible.
- Comprehensive documentation for the project.
Project details: (Andreas Tille is the mentor for this project along with Scott Howard as the co-mentor)
In Debian Pure Blends, the performance of teams was measured on the basis of postings on relevant mailing lists as well as by inspecting the package uploads which are recorded in the Ultimate Debian Database. The information gathered received positive feedback from both Blends and pure packaging teams. Andreas wrote the initial code, which is a mixture of Perl and shell scripts. You can check out a version from .
With this project, we intend to completely replace the older code, using existing tools to achieve our task or writing our implementation in Python where required. The aim is to make it more flexible, maintainable and include additional resources that will help in measuring performance.
This project is divided into four parts:
1. List Statistics
This will involve parsing the mailing list to identify the most active contributors. The conventional approach was to download the mailing list archive pages and then parse the HTML. In this project, instead of doing that, we will be parsing the mbox archives. This new approach will be faster and the output will be more precise, as mbox is a standardized format for holding collections of messages.
This phase will be carried out in four steps:
- Fetch the archive in mbox format from the given mailing list,
- Parse the mbox archive,
- Save the information in a database,
- Create graphs that show the top n posters every month.
The current list statistics which were generated by the code Andreas wrote are available at . You can have a look to get an idea of what we are aiming at. With this project, we will improve the flexibility and accuracy of this approach. For the purpose of this task, we will be using MailListStat. The data gathered will then be used to generate the graphs.
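As a rough illustration of the parse and count steps above, the sketch below uses Python's standard `mailbox` module rather than MailListStat. It counts messages per sender address per month. The function names `count_posters` and `top_posters` are hypothetical, and saving to a database and drawing graphs are left out.

```python
import mailbox
from collections import Counter, defaultdict
from email.utils import parseaddr, parsedate_to_datetime


def count_posters(mbox_path):
    """Return {'YYYY-MM': Counter({sender_address: message_count})}."""
    per_month = defaultdict(Counter)
    for message in mailbox.mbox(mbox_path):
        _, address = parseaddr(str(message.get("From", "")))
        date_header = message.get("Date")
        if not address or not date_header:
            continue  # skip messages without usable headers
        try:
            sent = parsedate_to_datetime(str(date_header))
        except (TypeError, ValueError):
            continue  # skip messages whose date cannot be parsed
        per_month[sent.strftime("%Y-%m")][address.lower()] += 1
    return per_month


def top_posters(per_month, n=10):
    """Return the n most active posters for every month, oldest month first."""
    return {
        month: counts.most_common(n)
        for month, counts in sorted(per_month.items())
    }
```

Counting by address alone treats every address as a separate person, so this is only a starting point for the statistics described here.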
2. Most Active Uploaders
In this part, the most active uploaders within a team are measured; this data is available from the Ultimate Debian Database. We will fetch the information from UDD and then generate graphs from it. During this phase, we will also investigate whether it is possible to obtain data about bugs fixed by team members from UDD or the BTS. This is still under discussion and, if it is required, it can be implemented within a week. This will be implemented in Python.
3. Commit Details
For measuring the performance using commits, we decided on two factors:
- number of commits,
- the number of lines committed.
We believe that no single metric is complete. The number of commits cannot be an absolute measure of productivity and neither can the total number of lines committed. Because of this, we have decided to include both of them. To analyze Git repositories, we have decided on using GitStats, which serves our purpose well. However, for parsing the commits from SVN repositories, we will be writing our own implementation. This is because the standard tools available for this don't fit our requirements.
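For illustration only, both metrics could be derived from the text output of `git log --numstat`. This is a minimal sketch and does not reflect how GitStats works internally; the `@` prefix used to mark author lines and the function name `commit_stats` are assumptions made for this example.

```python
from collections import defaultdict


def commit_stats(log_text):
    """Count commits and changed lines per author e-mail.

    Expects author lines prefixed with "@" followed by numstat lines of
    the form "<added>\t<deleted>\t<path>".
    """
    stats = defaultdict(lambda: {"commits": 0, "added": 0, "deleted": 0})
    author = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            author = line[1:].strip().lower()
            stats[author]["commits"] += 1
            continue
        fields = line.split("\t")
        if author is None or len(fields) < 3:
            continue  # blank line or something that is not a numstat row
        added, deleted = fields[0], fields[1]
        # Binary files are reported as "-" and have no line counts.
        if added.isdigit() and deleted.isdigit():
            stats[author]["added"] += int(added)
            stats[author]["deleted"] += int(deleted)
    return dict(stats)
```

Input in this shape can be produced with `git log --numstat --pretty=format:@%ae`, where `%ae` is the author e-mail placeholder. Reporting added and deleted lines separately keeps the choice of how to weigh them open.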
4. Data Presentation
After the information has been gathered, we will be working on making it easily accessible. Other than generating graphs from phases (1), (2) and (3), we will be making our data available in the form of JSON. This is because not only does JSON integrate well with Python, but it is also easy to learn and has bindings for a large number of languages.
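A minimal sketch of what such a JSON export might look like, using Python's standard `json` module. The function name `build_report` and the field names (`team`, `list_statistics`, `uploads`, `commits`) are hypothetical and only meant to show the idea, not a decided format.

```python
import json


def build_report(team, posters, uploaders, committers):
    """Serialize the gathered statistics for one team as a JSON string."""
    report = {
        "team": team,                # hypothetical field names throughout
        "list_statistics": posters,
        "uploads": uploaders,
        "commits": committers,
    }
    # sort_keys gives stable output, which makes diffs between runs readable.
    return json.dumps(report, indent=2, sort_keys=True)
```

Any consumer, whether the planned web interface or a third-party script, could then read the data back with a JSON parser in its own language.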
The final phase will be a web interface that allows anyone to easily access the information. This will allow dynamic generation of data, something along the lines of the Debian Popularity Contest. Note that this is easier than it sounds; the data is ready, we just have to give it an interface through which it can be accessed.
There will still be many technicalities involved at this stage. For example, in the case of list statistics, the same person can appear on a mailing list under several user names. To measure real team performance, matching these names is quite important, and the tool which we are using for this job has no such feature. This will involve patching it or writing our own implementation if required.
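One possible way to approach the name matching is sketched below, under the assumption that two `From` headers belong to the same person whenever they share a display name or an e-mail address. The function name `merge_identities` is hypothetical. Note that merging on display names can wrongly combine two different people who happen to share a name, so a real implementation would likely need manual overrides.

```python
from collections import Counter, defaultdict
from email.utils import parseaddr


def merge_identities(from_headers):
    """Group headers sharing a name or address; return (aliases, count) pairs."""
    parent = {}

    def find(key):
        # Union-find lookup with path halving.
        parent.setdefault(key, key)
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    messages = []
    for header in from_headers:
        name, address = parseaddr(header)
        keys = []
        if name.strip():
            keys.append("name:" + " ".join(name.lower().split()))
        if address:
            keys.append("addr:" + address.lower())
        if not keys:
            continue  # nothing usable in this header
        find(keys[0])
        for other in keys[1:]:
            parent[find(other)] = find(keys[0])  # link name and address
        messages.append(keys[0])

    counts = Counter(find(key) for key in messages)
    aliases = defaultdict(set)
    for key in parent:
        aliases[find(key)].add(key)
    return sorted(
        ((sorted(aliases[root]), total) for root, total in counts.items()),
        key=lambda item: (-item[1], item[0]),
    )
```

Each result pairs the set of names and addresses attributed to one person with the number of messages they sent, most active first.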
Project schedule: This project will be completed within the period of GSoC.
Please note that this is a tentative timeline and as expected, can change when actual development takes place.
Community Bonding Period: I am already familiar with my mentors! I will use this period to further analyze and improve the design of the project. If there is one thing that I learnt from last year's GSoC, it's that a robust design is essential.
23rd May - 15th June : Initial coding, set up the infrastructure by gathering the required statistics and implement a very basic model of the website (presentation).
16th June - 25th June : Fetch the active uploaders list from the UDD by writing a script to do that. Hopefully, integrate the bug details as well.
26th June - 11th July : Feedback campaign across Debian, compiling user feedback on IRC, mailing lists and patching the basic website. Work on improving the data gathered.
11th July - 15th July : Mid-term evaluations! Review the old code and confirm that everything is working fine.
I really want to attend DebConf11 in Banja Luka. It would be awesome to meet my mentors and other community members in person! If this happens, I will adjust my schedule accordingly.
16th July - 15th August : Implement the presentation layers and complete the website that will present the data.
16th August - 22nd August : Improve code, check for bugs and write documentation.
With the updated proposal, we will be spending more time working on a polished interface and documentation and less time on implementing the tools, as suggested by Marc Brockschmidt. Also, a substantial part of our time will be spent on making these tools and our own implementations work for our specific case.
Other summer plans: I will be working exclusively on this project, full-time.
Exams and other commitments: I do not have exams or any other commitments during the GSoC period (April - August).
If you are not a Debian Developer: I am very much interested in Debian and will keep contributing even after the summer. I would absolutely love to! In a short period of ten days, I have become well acquainted with my mentors and would like this student-mentor bond to continue. As I mentioned before, I was a Summer of Code student last year with Pardus GNU/Linux, where I wrote a package testing framework. I still maintain the project there by fixing bugs and adding features. However, I am not a Pardus developer and hence this should not conflict in any way with my desire to work for Debian. In case I get selected for this project, I will continue maintaining my Pardus project but I will be focussing on Debian exclusively. One of the primary reasons I want to work with Debian is the excellent community it has; I feel that one grows faster and has more fun working as part of an active community.
- For this project, I will not be using any code that Andreas wrote. The main idea behind this project is to implement this from the ground up. The existing code is Perl and shell scripts; that will be completely discarded and this will be done in Python exclusively.
- The older code had no interface that allowed the information to be easily accessed. As mentioned in Section 4 (Data Presentation), the most crucial part of this project is to implement such an interface. We will also support dynamic generation of data, something that we are excited about.
- There are some fresh ideas that were discussed with the mentors and those will be implemented in this project.
- The accuracy and the speed of gathering data will be improved significantly because of the proposed approaches.
- To sum it up, I will be using (and improving) the ideas from the previous project and writing a completely new implementation of this.
 - svn://svn.debian.org/svn/blends/blends/trunk/team_analysis_tools
 - http://udd.debian.org
 - http://www.json.org/
 - http://popcon.debian.org
 - http://www.pardus.org.tr/eng