Revision 17 as of 2012-03-26 23:50:40
Size: 10949
Editor: VipinNair
 . {i} GMT +05:30

Name

  • Vipin Nair


Contact/Email


Background

  • I am a master's student in computer science and applications at the National Institute of Technology, Calicut, with an undergraduate degree in mathematics. I have been using Debian Sid for quite some time and see this as an opportunity to give back to the community. I intend to stay with the project and the community even after the GSOC period :)

    I have more than 2 years of experience in web development, API design, and response time optimization. The majority of my web application projects were in PHP, but I have worked with Python and webapp before, so I am not completely new to this. I also have some system administration experience and maintain an internal Debian Unstable AMD64 mirror at my college. My other interests include pattern recognition and artificial intelligence. I have good experience with version control systems and have used git extensively [1].

    Past Projects:

    • Tuxofwar: Designed a quiz app [2] and its API [3] using the webapp framework running on GAE.

      • tools: Python, webapp, GAE, git

    • Paathshaala: Designed the backend and API for an internal content repository for my college, similar to MIT OCW [4]. The project was covered by the Linux For You magazine as an enterprise use of the PostgreSQL database [5].

      • tools: PHP, PostgreSQL, git

    • Ocrn: A handwritten character recognition tool based on neural networks [6].

      • tools: Python, git

  • I feel I am the best person for this project because:

    • I have understood the project requirements after discussing them with the mentors.
    • I have prior experience with similar projects and can anticipate problems that may arise.
    • I have a clear implementation roadmap. (!)

    • I have the required skills and I am a fast learner when it comes to picking up new things.


Project Title

  • Improving Debian Team Activity Metrics


Synopsis

  • A web interface and an API need to be developed to present the data gathered by the team metrics project.


Project Details

  • The Debian Team Metrics project collects data from various sources (mailing list activity, commits, etc.) to gauge the performance of teams in the Debian community. The data collection work was done as part of GSOC last year by Sukhbir Singh (mentored by Andreas Tille and Scott Howard), and all the required data is available in an SQL database (PostgreSQL). A web interface for presenting the data and a data access API need to be designed, which is what this proposal addresses. Below I discuss the requirements I have identified and then propose a timeline to complete the project.
  • Web Framework

    • First, I will discuss with the mentors and settle on a Python web framework for the project, and familiarise myself with it before the actual coding period starts. Django and Pylons are the suggested frameworks, but since the scope of the project is limited, a micro framework like Flask [7] should suffice. For example, in a mailing list discussion the mentors suggested not using ORM tools because of the simple DB structure. Unlike most other Python web frameworks, Flask takes a minimalistic approach and does not come with an ORM tool, yet it has the features required to implement this project, such as URL routing, templating, and basic caching.

  • Graphical Data Representation

    • As part of last year's Team Metrics project, some R scripts were written that generate plots based on the data. The data in the database is updated monthly by a cron job, so the plots can be regenerated on the server side each time the data is updated. I have discussed the graphical data representation with the mentors, but no particular approach has been fixed yet, so I list the possible approaches with their pros and cons:
      1. Generate plots on the server side using the existing R scripts, and map the resulting images to appropriate API calls.
        • {OK} Pros

          • Less load on Debian servers.
          • Javascript can be avoided at the client side in accordance with Debian Website Guidelines.

          /!\ Cons

          • Less flexibility: only the pre-generated plots are available.
          • New visualizations are not possible without client-side JavaScript.

      2. Generate plots on the server side in response to a user request if they are not already present, and cache the generated plots. Simple scripts can be written with matplotlib [8] to generate these custom plots.

        • {OK} Pros

          • High flexibility. Visualizations are based on user parameters (e.g. show me a chart of 'metric' for the members of a 'team' for years 'x' and 'y').

          • Javascript can be avoided at the client side in accordance with Debian Website Guidelines.

          /!\ Cons

          • Substantial load on the Debian servers if many distinct plots are requested.
          • Slow for plots that aren't already pre-rendered.
      3. Use the Data API and render plots in the browser using JavaScript libraries. gRaphael [9] and D3 [10] are good JavaScript libraries for client-side graph rendering.

        • {OK} Pros

          • High Flexibility.
          • Substantial load reduction on the Debian server. (Only text data is transferred)

          • Interactive charts possible at the browser side.

          /!\ Cons

          • Goes against the Debian Website Guidelines because of the heavy dependency on client-side JavaScript.
      4. Combine different approaches based on browser capability. Use approach (3) on modern browsers and fall back to approach (1) or (2) on less capable browsers.
        • {OK} Pros

          • Highest flexibility among all the approaches.

          /!\ Cons

          • Increased development time.
          • Not sure if it satisfies Debian Website Guidelines.
      All of the above approaches will be discussed with the mentors and the Debian community, and a suitable one will be chosen based on the feedback. Personally, I would pick approach (3) if the JavaScript dependency is not an issue; otherwise I would prefer approach (1). With approach (1) a user might get more information than (s)he requires, which might not be an issue, and since a data API is available, someone can always build on it if more functionality is required in the future.
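      As an illustration of the caching step in approach (2), here is a minimal sketch using only the Python standard library. The function names, the cache layout, and the `fake_render` placeholder are my own assumptions for illustration; a real renderer would draw the plot, for example with matplotlib, and nothing here is part of the existing Team Metrics code.

```python
import hashlib
import json
import os
import tempfile


def cache_key(team, metric, year_from, year_to):
    """Build a stable file name from the request parameters (hypothetical scheme)."""
    canonical = json.dumps(
        {"team": team, "metric": metric, "from": year_from, "to": year_to},
        sort_keys=True,
    )
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest() + ".png"


def get_or_render(cache_dir, team, metric, year_from, year_to, render):
    """Return (image_bytes, was_cached); call render() only on a cache miss."""
    path = os.path.join(cache_dir, cache_key(team, metric, year_from, year_to))
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return fh.read(), True
    image = render(team, metric, year_from, year_to)
    # Write to a temporary file first so a reader never sees a partial plot.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as fh:
        fh.write(image)
    os.replace(tmp_path, path)
    return image, False


def fake_render(team, metric, year_from, year_to):
    """Placeholder renderer standing in for a real plotting script."""
    return ("plot:%s:%s:%d-%d" % (team, metric, year_from, year_to)).encode("utf-8")
```

      Since the data is only updated monthly, one simple option would be to empty the cache directory after each update, so that stale plots are never served.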
  • API Development

    • The project has two goals: one is the development of an API over which other applications can be built, and the other is the web interface, which will be used to present the data. Currently a few metrics (authorstats, bugs, commitlines, commitstats, uploaders) are defined for which data is available. The API will facilitate data access for these metrics.

      A sample Data API might look like this. (project => team)

      GET /project : All the project metrics for all the years.

      GET /project/metric : All the data available for a particular metric of the project.

      GET /project/metric/<YEAR> : All the data available for a particular metric of the project for a specified year.

      GET /project/metric/<YEARRANGE> : All the data available for a particular metric of the project over a time period.
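
      The sample paths above could be interpreted roughly as follows. This is only a sketch: the `parse_path` helper, the `YYYY-YYYY` syntax for a year range, and the choice to reject unknown metric names are assumptions of mine, not decisions already made for the project.

```python
import re

# Metric names currently defined in the Team Metrics data.
METRICS = {"authorstats", "bugs", "commitlines", "commitstats", "uploaders"}

YEAR = re.compile(r"^(\d{4})$")
YEAR_RANGE = re.compile(r"^(\d{4})-(\d{4})$")  # assumed range syntax


def parse_path(path):
    """Split an API path into a query description, or return None if invalid."""
    parts = [p for p in path.split("/") if p]
    if not 1 <= len(parts) <= 3:
        return None
    query = {"team": parts[0], "metric": None, "years": None}
    if len(parts) >= 2:
        if parts[1] not in METRICS:
            return None
        query["metric"] = parts[1]
    if len(parts) == 3:
        single = YEAR.match(parts[2])
        span = YEAR_RANGE.match(parts[2])
        if single:
            year = int(single.group(1))
            query["years"] = (year, year)
        elif span:
            first, last = int(span.group(1)), int(span.group(2))
            if first > last:
                return None
            query["years"] = (first, last)
        else:
            return None
    return query
```

      A single year is stored as a range of one year, so the code that queries the database only has to handle one case.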

      After a thorough discussion with the mentors, I will spend the initial period designing a future-proof API that is general enough to support all the current metrics and any metrics added later, and flexible enough to answer different kinds of user queries. Ideally the API should be able to fetch any data that an SQL statement (at least a simple one) can retrieve. The API will provide data in JSON/XML format and follow the best practices of RESTful API design. Complete API documentation will be written. If required, the generated images can also be fetched over the API by passing a GET parameter.
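
      To make the JSON output concrete, here is a small sketch that turns the result of a simple SQL query into a JSON document without an ORM. It uses an in-memory SQLite database as a stand-in for the PostgreSQL database, and the `commitstats` table layout, the sample rows, and the shape of the JSON response are all hypothetical. With PostgreSQL and psycopg2 the placeholders would be `%s` instead of `?`.

```python
import json
import sqlite3


def fetch_metric(conn, team, first_year, last_year):
    """Return the rows of one team for a range of years as a JSON string."""
    cur = conn.execute(
        "SELECT name, year, commits FROM commitstats "
        "WHERE team = ? AND year BETWEEN ? AND ? "
        "ORDER BY year, name",
        (team, first_year, last_year),
    )
    # Use the column names reported by the cursor as JSON keys.
    columns = [col[0] for col in cur.description]
    rows = [dict(zip(columns, row)) for row in cur.fetchall()]
    return json.dumps({"team": team, "metric": "commitstats", "data": rows})


def demo_connection():
    """Create a throwaway database with a hypothetical table and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commitstats (team TEXT, name TEXT, year INTEGER, commits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO commitstats VALUES (?, ?, ?, ?)",
        [
            ("teamx", "alice", 2010, 12),
            ("teamx", "bob", 2011, 7),
            ("teamy", "carol", 2011, 3),
        ],
    )
    return conn
```

      Passing the parameters separately from the SQL text, as done here, also keeps user-supplied team names and years from being interpreted as SQL.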

  • Web Interface

    • The interface will be designed according to the approach we take for the graphical representation of the data, and in accordance with the guidelines mentioned here [11].


Benefits To Debian

  • The data collected by the Team Metrics project provides valuable insights into the working of different teams in the Debian community. The interface aims to present the collected data in a methodical way. It can help answer questions such as whether a team is functioning properly, whether there are sufficient interactions on the mailing list, and whether the team size is growing or shrinking over the years.


Deliverables

  • A sufficiently flexible API for data access.
  • A web interface for graphical representation of the data.


Project Schedule

  • The schedule will depend on the approaches we take to solve the different issues discussed in the 'Project Details' section above. A rough general schedule is given below:

    Week 01-04

    I expect to finish the API design and development phase in the first 4 weeks of the GSOC period.

    Week 04-08

    The interface design work will begin as soon as the first phase is complete.

    Week 08-10

    Buffer period for any pending tasks, documentation, bug fixes.

    Other tasks like adding new metrics, verifying the correctness of the data, and code optimizations will be done in the buffer period if time permits, and I volunteer to continue the work even after GSOC. I expect to complete the API design and development phase by the first evaluation.


Exams and other commitments

  • I have no other commitments during the GSOC period. I will treat this project as a summer internship and give the project schedule the utmost priority.


Why Debian?

  • I have been using Debian for the past few years, and this is the perfect opportunity to contribute back. I am also fascinated by the wonderful Debian community and would love to interact more and be a part of it. I will continue contributing to the project even after GSOC.


References


* Are you applying for other projects in SoC? No