Differences between revisions 32 and 33
Revision 32 as of 2012-04-09 00:32:36
Size: 15192
Editor: VipinNair
Comment:
Revision 33 as of 2012-04-09 00:44:43
Size: 15192
Editor: ?AndrewHobden
Comment: Spelling Mistake

Name

  • Vipin Nair


Contact/Email


Background

  • I am a master's student in computer science & applications at National Institute of Technology, Calicut, with an undergraduate degree in mathematics. I have been using Debian Sid for quite some time and see this as an opportunity to give back to the community. I intend to stay with the project and the community even after the GSOC period :)

    I have more than 2 years of experience in web development, API design, and response time optimization. The majority of my web application projects were in PHP, but I have worked with Python and webapp before, so I am not completely new to this. I also have some system administration experience and maintain an internal Debian Unstable AMD64 mirror at my college. My other interests include pattern recognition and artificial intelligence. I have good experience with version control systems and have used git extensively #1.

    Past Projects:

    • Tuxofwar: Designed a quiz app #2 and its API #3 using the webapp framework running over the GAE.

      • tools: Python, webapp, GAE, git

    • Paathshaala: Designed the backend and API for an internal content repository for college, similar to MIT OCW #4. The project was covered by the Linux For You magazine as an enterprise use of the PostgreSQL database #5.

      • tools: PHP, PostgreSQL, git

    • Ocrn: A handwritten character recognition tool based on a neural network #6.

      • tools: Python, git

    Prototype {i}

    • I am developing a small prototype to support my application. The prototype is called Debmetrics and is hosted on Google App Engine. The code is available on GitHub. The application scrapes live data from Debian servers. I'll use features from the prototype to support individual points in my application below. With this prototype I intend to showcase my Python skills, so I have not focused on the user interface.

  • I feel I am the best person for this project because :

    • I have understood the project requirements after discussing them with the mentors.
    • I have prior experience with similar projects and can anticipate problems that may arise.
    • I have a clear implementation roadmap. (!)

    • I have the required skills and I am a fast learner when it comes to picking up new things.


Project Title

  • Improving Debian Team Activity Metrics


Synopsis

  • A web interface and an API need to be developed to present the data gathered by the team metrics project.


Project Details

  • The Debian Team Metrics project collects data from various sources (mailing list activity, commits, etc.) to gauge the performance of teams in the Debian community. The data collection work was done as part of GSOC last year by Sukhbir Singh (mentored by Andreas Tille and Scott Howard), and all the required data is available in a SQL database (PostgreSQL). A web interface for presenting the data and a data access API need to be designed; both are addressed below. I first discuss the requirements I have identified and then propose a timeline to complete the project.
  • Web Framework

    • First, I will discuss with the mentors and settle on a Python web framework for the project, and familiarise myself with it before the actual coding period starts. Django and Pylons are the suggested frameworks, but since the scope of the project is limited, a micro framework like Flask #7 will suffice. For example, in a mailing list discussion the mentors suggested not using ORM tools because of the simple DB structure. Unlike most other Python web frameworks, Flask takes a minimalistic approach and does not come with an ORM tool, yet it has all the features required to implement this project, such as URL routing, templating, and basic caching. The prototype is built using the Flask framework.
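To illustrate the no-ORM point, here is a minimal sketch of reading a metric with plain SQL. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs anywhere. The `commitstats` table layout, its column names, and the sample rows are hypothetical and are not taken from the real teammetrics schema.

```python
import sqlite3


def fetch_commits_per_year(conn, project):
    """Return [(year, total_commits), ...] for one project using plain SQL."""
    cur = conn.execute(
        "SELECT year, SUM(commits) FROM commitstats "
        "WHERE project = ? GROUP BY year ORDER BY year",
        (project,),  # parameters are bound, never formatted into the SQL string
    )
    return cur.fetchall()


def make_demo_db():
    """Build an in-memory database with a hypothetical commitstats table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commitstats "
        "(project TEXT, name TEXT, year INTEGER, commits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO commitstats VALUES (?, ?, ?, ?)",
        [
            ("teammetrics", "alice", 2010, 12),
            ("teammetrics", "bob", 2010, 5),
            ("teammetrics", "alice", 2011, 7),
            ("other", "carol", 2011, 3),
        ],
    )
    return conn


if __name__ == "__main__":
    print(fetch_commits_per_year(make_demo_db(), "teammetrics"))
```

With a PostgreSQL driver such as psycopg2 the query would look the same, except that the placeholder is `%s` rather than `?`.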

  • Graphical Data Representation

    • As part of last year's Team Metrics project, some R scripts were written that generate plots based on the data. Data in the database is updated monthly by a cron job, so the plots can be regenerated on the server side each time the data is updated. I have discussed the graphical data representation with the mentors, but no particular approach has been fixed yet, so I list the possible approaches below with their pros and cons:
      1. Plots generated at server side using existing R scripts. Map the existing images to appropriate API calls.
        • {OK} Pros

          • Less load on Debian servers.
          • Javascript can be avoided at the client side in accordance with Debian Website Guidelines.

          /!\ Cons

          • Less flexibility: only the pre-generated plots are available.
          • New visualizations are not possible without client-side JavaScript.

          {i} Prototype Example : Teammetrics - Commitlines

      2. Generate plots on the server side on user request if they are not already present, and cache the generated plots. Simple scripts can be written with matplotlib #8 to generate those custom plots.

        • {OK} Pros

          • High flexibility. Visualization based on user parameters. (eg: show me a chart of 'metric' of members of a 'team' for years 'x' and 'y')

          • Javascript can be avoided at the client side in accordance with Debian Website Guidelines.

          /!\ Cons

          • Substantial load on Debian servers, if there are enough variable requests.
          • Slow for plots that aren't already pre-rendered.
      3. Use the Data API. Render plots on the browser using Javascript libraries. gRaphael #9 and D3 #10 are good javascript libraries for client side graph rendering.

        • {OK} Pros

          • High Flexibility.
          • Substantial load reduction on the Debian server. (Only text data is transferred)

          • Interactive charts possible at the browser side.

          /!\ Cons

          • Against Debian Website Guidelines because of high dependency on client side javascript.

          {i} Prototype Example : Teammetrics - Commitlines

      4. Combine different approaches based on browser capability. Use approach (3) on modern browsers and fall back to approach (1) or (2) on less capable browsers.
        • {OK} Pros

          • Highest flexibility among all the approaches.

          /!\ Cons

          • Increased development time.
          • Not sure if it satisfies Debian Website Guidelines.
      All the approaches above will be discussed with the mentors and the Debian community, and a suitable one will be chosen based on the feedback. Personally I would pick approach (1), since it conforms to the Debian Website Guidelines and the API will take care of the data needs of the user. Note that the first approach would not withhold any data from users: everything remains reachable through the API. If a JavaScript dependency is acceptable, I would prefer approach (3).
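The caching step in approach (2) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the cache key layout, and the `render` callback are hypothetical. In practice `render` would wrap a matplotlib script that saves a figure to the given path; here it is left to the caller so that the sketch needs only the standard library.

```python
import hashlib
import os


def plot_cache_path(cache_dir, team, metric, years):
    """Map request parameters to a stable file name inside cache_dir."""
    # Sorting the years makes [2010, 2011] and [2011, 2010] share one entry.
    key = "{}|{}|{}".format(team, metric, ",".join(str(y) for y in sorted(years)))
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest + ".png")


def get_plot(cache_dir, team, metric, years, render):
    """Return (path, was_cached); call render only on a cache miss.

    render(path, team, metric, years) is a caller-supplied function that
    writes the image file, for example a thin wrapper around matplotlib.
    """
    path = plot_cache_path(cache_dir, team, metric, years)
    if os.path.exists(path):
        return path, True
    render(path, team, metric, years)
    return path, False
```

Because the data changes only when the monthly cron job runs, the cache directory could simply be emptied after each update so that stale plots are never served.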
  • Interlinking different metrics and a new metric

    • A mailing list discussion with Sukhbir revealed that there is currently no way to link different metrics of a project together. Linking the metrics is a high priority, as it will give better insight into the working of the teams within the community. This will be done before the API is designed.

    {i} Prototype Code Example : I have linked different metrics of a team in the program. Source.

    {i} Prototype API data Example : Teammetrics project - All metrics

    • Currently there is no metric that can identify all contributions of a particular member, but this data can be derived from the existing metrics. As part of the work done last year in the teammetrics project, it is possible to uniquely identify a member across all his/her aliases, which implies a good accuracy of the derived metric. I propose 'usercontribution' as a new metric that can be added to the teammetrics project and would like to work on it as part of the GSOC. (!)
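A derived 'usercontribution' metric could be computed roughly as sketched below. The input shapes, the alias table, and the sample identities are assumptions made for illustration; the real teammetrics tables and alias handling may look different.

```python
from collections import defaultdict


def user_contribution(metric_rows, aliases):
    """Sum each member's activity per metric after resolving aliases.

    metric_rows: {metric_name: [(identity, count), ...]}
    aliases:     {alias: canonical_name}; unknown identities map to themselves.
    """
    totals = defaultdict(dict)
    for metric, rows in metric_rows.items():
        for identity, count in rows:
            member = aliases.get(identity, identity)
            totals[member][metric] = totals[member].get(metric, 0) + count
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sample data: one member appears under three identities.
    aliases = {"alice@example.org": "alice", "A. Example": "alice"}
    rows = {
        "commitstats": [("alice", 10), ("A. Example", 5), ("bob", 2)],
        "authorstats": [("alice@example.org", 4)],
    }
    print(user_contribution(rows, aliases))
```

The accuracy of such a metric depends entirely on the alias resolution, which is why the existing work on identifying members across aliases matters here.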

  • API Development

    • The project has two goals: one is the development of an API on which other applications can be built, and the other is the web interface, which will be used to present the data. Currently a few metrics (authorstats, bugs, commitlines, commitstats, uploaders) are defined for which data is available. The API will facilitate data access for these metrics.

      {i} API Design & Sample API from Prototype.

      GET /api/v1/team All the project metrics for all the years.

      GET /api/v1/team/metric All the data available for a particular metric of the project for all the years.

      GET /project/metric/<YEAR> All the data available for a particular metric of the project of a specified year.

      GET /project/metric/<YEARRANGE> All the data available for a particular metric of the project over a time period.

      After a thorough discussion with the mentors, I will spend the initial time designing a future-proof API that is general enough to support all the current metrics and any metrics that may be added to the project later, and flexible enough to answer different kinds of user queries. Ideally the API should be able to fetch any data that a SQL statement (at least a simple one) can retrieve. The API will provide data in JSON/XML format. Best practices of RESTful API design will be followed, and complete API documentation will be written. If required, the generated images can be fetched over the API by passing a GET variable.
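As a rough sketch of how such URLs might be interpreted, the function below turns a request path into a small query description. It assumes that every endpoint lives under the `/api/v1/` prefix and that a year range is written as `2009-2011`; neither detail is fixed by the design above, and the function and pattern names are hypothetical.

```python
import re

# Hypothetical URL layout: /api/v1/<team>[/<metric>[/<year>|<start>-<end>]]
API_PATH = re.compile(
    r"^/api/v1/(?P<team>[\w.-]+)"
    r"(?:/(?P<metric>[\w-]+)"
    r"(?:/(?P<start>\d{4})(?:-(?P<end>\d{4}))?)?)?/?$"
)


def parse_api_path(path):
    """Return a dict with team, metric and years, or None if the path is invalid."""
    match = API_PATH.match(path)
    if match is None:
        return None
    years = None
    if match.group("start") is not None:
        start = int(match.group("start"))
        # A single year is treated as a range of length one.
        end = int(match.group("end") or start)
        if start > end:
            return None
        years = (start, end)
    return {
        "team": match.group("team"),
        "metric": match.group("metric"),
        "years": years,
    }
```

Keeping the parsed request in one plain structure like this makes it straightforward to map each field onto a `WHERE` clause later, whichever framework ends up doing the routing.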

  • Web Interface

    • The interface will be designed according to the approach we take for graphical representation of data, and in accordance with the guidelines mentioned here #11. Interface development time can be reduced by using libraries like Bootstrap #12, and the time saved can be used for other low priority tasks.


Benefits To Debian

  • The data collected by the Team Metrics project provides valuable insights into the working of different teams in the Debian community. The interface aims to present the collected data in a methodical way. It can help answer questions such as whether a team is functioning properly, whether there are sufficient interactions on the mailing list, and whether the team size is growing or shrinking over the years.


Deliverables

  • A sufficiently flexible API for data access.
  • A web interface for graphical representation of the data.


Project Schedule

  • The schedule will depend on the approaches we take to solve different issues discussed in the 'Project Details' section above. A general rough schedule is given below:

    Before GSOC

    • Discuss with mentors, decide on the framework, and familiarise myself with it if I am not already familiar.
    • Finalize one implementation scheme from the approaches discussed above.
    • Learn unit testing so that I can follow a test driven development model.
    • Discuss with mentors whether an XML API is needed.

    Week 01

    • Design a flexible API format following the industry best practices.
    • Discuss the possibility of new metrics that can be derived from existing metrics.
    • Design a consistent JSON structure across different API levels so that clients can parse every response the same way.

    Week 02 - 03

    • Implement the API and document it.
    • Decide upon the graphical data representation format.
    • Write code to generate plots using matplotlib/javascript, depending on the approach we are following.
    • Proceed with the design of the interface.

    Week 04

    • Implement the no-JavaScript version of the interface.
    • Ensure cross-browser compatibility and that it follows Debian Community Website guidelines.

    Week 05 - 06

    • Collect feedback on the interface from the Debian website team and the community.
    • Work on the javascript version of the interface.
    • Ensure javascript compatibility and support graceful degradation if javascript is unavailable.

    Week 07 - 08

    • Collect feedback and improve the interface accordingly.
    • Identify new metrics that could be added to teammetrics project.
    • Write data extraction code for the newly identified metrics.
    • Write code to verify the extracted data.
    • Improve existing code, for example, the names.list contains some special characters that could be avoided.

    Week 08 - 11

    • Buffer period for completing any pending tasks.
    • Write documentation and fix any existing bugs.
    • Write tests for the current code.
    I expect to complete the API design and development phase by the first evaluation and probably to have started on the interface design. Other tasks, such as adding new metrics, verifying the correctness of data, and code optimizations, will be done in the buffer period, and I shall continue working on the project even after GSOC.


Exams and other commitments

  • No other commitments during the GSOC period. I'll consider this project as a summer internship and give utmost priority to the project schedule.


Why Debian?

  • I have been using Debian for the past few years and this is the perfect opportunity to contribute back. I am also fascinated by the wonderful Debian community and would love to interact more and be a part of it. I'll continue contributing to the project even after GSOC.


References


* Are you applying for other projects in SoC? Yes, I am very much interested in contributing to Debian and have applied for one more project under the Debian organization.