Name: Aseem Sharma

Contact/Email: Email:, IRC: aseem at OFTC, freenode

Background: I am Aseem Sharma, a Computer Science & Engineering student at Manipal Institute of Technology, India. I have a fair knowledge of C/C++, GTK+, Java and Python. In addition, I have been a web developer/designer for AIESEC (Association internationale des étudiants en sciences économiques et commerciales) at Manipal University, where I developed their first web site. I am currently a web developer/designer for IAESTE (The International Association for the Exchange of Students for Technical Experience) - IAESTE IndiaMIT, so I am fairly good at HTML/CSS and other web technologies such as PHP and JavaScript. I am a returning Summer of Code student: I was selected for last year's SoC, but after successfully completing my mid-term evaluations I met with an accident because of which I was not able to code for about 3 months. My mentor was made aware of this and, as the GTK+ port was a top priority, completed it themselves from where I had left off.

Project title: Semantic Package Review Interface for

Project details: The main aim of the project is to ensure the existence of high quality metadata. The idea is simple: semantic metadata should accurately map the meaning of the data it describes, making it easier for either the automatic process or the manual set-tag process to match packages to the interests of the teams and the sponsors.

Debian QA [1] provides various tools, such as the Package Tracking System, Lintian and Debtags, which can be used to gather metadata on the packages uploaded to debexpo. Debtags in particular has made it very easy to identify packages based on the tags assigned by the developers. For extracting the semantic data, a number of libraries are available. I personally would like to work with Beautiful Soup [2], as it makes dissecting and extracting data easy. For debexpo, various cluster algorithms [3] will be discussed with the mentors and then applied to group the packages which have the same or almost the same metadata.
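As a rough illustration of the grouping step described above (not a chosen design, since the actual algorithm is still to be discussed with the mentors), the sketch below groups packages whose tag sets overlap strongly. It assumes each package has already been reduced to a set of Debtags-style tags. The package names, the example tags, the `jaccard` and `group_packages` helpers and the 0.5 threshold are all made up for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two tag sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # two empty tag sets are treated as identical
    return len(a & b) / len(a | b)


def group_packages(tags_by_package, threshold=0.5):
    """Single-link grouping: two packages end up in the same group when
    their tag sets have a Jaccard similarity of at least `threshold`,
    directly or through a chain of similar packages."""
    parent = {name: name for name in tags_by_package}

    def find(name):
        # Union-find lookup with path compression.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for left, right in combinations(tags_by_package, 2):
        if jaccard(tags_by_package[left], tags_by_package[right]) >= threshold:
            parent[find(left)] = find(right)

    groups = {}
    for name in tags_by_package:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical packages with illustrative Debtags-style tags.
packages = {
    "editor-a": {"use::editing", "interface::x11", "uitoolkit::gtk"},
    "editor-b": {"use::editing", "interface::x11", "uitoolkit::qt"},
    "mailer-c": {"works-with::mail", "interface::commandline"},
}
groups = group_packages(packages)
print(groups)
```

Here the two editors share two of four distinct tags, so they meet the 0.5 threshold and are grouped together, while the mailer stays on its own. A real implementation would probably weight tags differently and use a proper clustering library.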

The storage backend/web query interface will be developed using Pylons/TurboGears - I don't have much experience with Pylons/TurboGears, but I am willing to learn and have already started with Pylons.
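To give an idea of what the storage and query side could look like, here is a minimal sketch that uses Python's built-in `sqlite3` module in place of the Pylons/SQLAlchemy stack named above, purely so that it runs without extra dependencies. The `package_tags` table, its columns, the helper names and the sample data are assumptions for illustration and are not taken from debexpo.

```python
import sqlite3


def build_store(tags_by_package):
    """Create an in-memory database holding one row per (package, tag) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE package_tags ("
        " package TEXT NOT NULL,"
        " tag TEXT NOT NULL,"
        " PRIMARY KEY (package, tag))"
    )
    rows = [
        (package, tag)
        for package, tags in tags_by_package.items()
        for tag in tags
    ]
    conn.executemany(
        "INSERT INTO package_tags (package, tag) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def packages_with_tag(conn, tag):
    """Return the names of all packages carrying `tag`, sorted by name."""
    cursor = conn.execute(
        "SELECT package FROM package_tags WHERE tag = ? ORDER BY package",
        (tag,),
    )
    return [row[0] for row in cursor]


# Hypothetical sample data.
store = build_store({
    "editor-a": {"use::editing", "uitoolkit::gtk"},
    "editor-b": {"use::editing", "uitoolkit::qt"},
    "mailer-c": {"works-with::mail"},
})
matches = packages_with_tag(store, "use::editing")
print(matches)
```

A web query interface would wrap a lookup like `packages_with_tag` in a controller, so that a team or sponsor can list the packages matching their interests.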

Synopsis: Development of a system for extracting and deploying high quality metadata, which will help map projects to teams and make the task of package reviewing and mentoring easier and faster.

Benefits to Debian: After the completion of this project, the package reviewing process will become easier and faster, which is beneficial not only to Debian but also to Debian derivatives. A team or a maintainer can easily look up the projects they are interested in. This benefits Debian in the long run, as the time spent on the reviewing process is considerably shortened.

Deliverables: After completion, I plan to deliver the following to the project:

- A faster and easier to use package reviewing tool.

- A versatile storage backend/web query interface.

- Having looked at alioth [4], I am very much interested in deploying OpenID logins [5], which will make logging in to debexpo easier for everyone.

Project schedule: Regardless of being accepted as a SoC student, I would like to take up the project during the summer. The project timeline is as follows; I will try to follow it as best I can:

9th - 17th April:

-Looking through the debexpo code, getting comfortable with the coding style, reading the documentation, and asking the mentors about anything I do not understand.

-Studying Pylons.

18th - 30th April:

-Reading up more documentation, learning how the packages are reviewed in detail and how they work.

-Studying Pylons and various cluster algorithms, and discussing with the mentors the most convenient way to apply the extraction techniques.

-Learning how to interact with Debian infrastructure - mailing-lists and the bug tracking system.

1st - 8th May:

-Starting to code early to compensate for the days lost due to exams from 15th-26th May. Implementation of the extraction techniques discussed with the mentors is to be started during this phase.

-Studying Cluster Algorithms in detail side by side.

9th - 27th May:

-Not much coding to be done during this period.

-Documenting the code written during the 1st-8th May period, as it would be less time consuming.

-Studying SQLAlchemy when time permits.

28th May - 7th June:

-Picking up where I left off with implementing the extraction of semantic metadata.

-Reading more about the ORM.

8th - 28th June:

-Finishing up with extraction and starting to implement algorithms to group packages according to the extracted metadata.

-Starting up with the storage backend development using Pylons.

28th June - 5th July:

-Documenting the code written, testing and debugging the code.

6th - 28th July:

-Implementing the web query interface based on the new storage backend.

-Filling in the mid-term evaluation form.

-Documenting, testing and debugging the code written during this phase.

-Starting to implement the OpenID login in the later stages.

29th July - 4th August:

-Implementing OpenID logins.

-Documenting and testing the code written.

5th - 12th August:

-Even more testing, deploying the code for a test run and rectifying the errors encountered [ No one can write perfect code the first time, except Linus Torvalds ]

Exams and other commitments: I don't have exams during the SoC period. I do have examinations from 15th-26th May, which is why I would like to start coding a bit earlier to compensate for the time lost during the 5 days (21st-26th May).

Other summer plans: I don't have any summer plans.

Why Debian?: Why are you choosing Debian? What attracts you about Debian?: Debian is the Universal Operating System. I have been using and trying various Linux distros - Ubuntu, Debian, Fedora, Linux Mint etc. - for the past four years, but I have mainly been switching between Debian and Ubuntu. The simplicity of Debian as an OS is what attracted me to it. Also, since Ubuntu is itself a derivative of Debian, I am much more interested in working for the Debian community. The main reason which has prompted me to choose Debian is the development cycle: the quality of application selection for a particular release has been outstanding and impressive.

Are you applying for other projects in SoC?: No