Student Application Template
Name: lool0
Contact/Email: See melange for more information.
Background: I'm a 4th-year medical student. I studied programming for 3 years in high school, mainly C, C++, and Java, and passed the AP Computer Science exam with a score of 5. I picked up Python earlier this year and have been making progress again. My projects include an IRC bot for learning Japanese and interacting with Tatoeba's data offline, a Japanese romanization server also used by Tatoeba, and a minimal Django CMS. You can access them at my GitHub account. My contributions to open source have mainly been to the Tatoeba project, which I've been involved with for a bit over three years because I love languages. I've been using Linux (mainly Ubuntu and Arch Linux) for the past 2 years and have run my own VPS with BIND, nginx, ssh, and other services. I also deployed my own Django blog using nginx, PyPy, and uWSGI. I've already started working on this project and you can find the code here [Check melange for a link to the code].
Project title: Debian Metrics Portal
Synopsis: This project aims to centralize all Debian statistics by providing a consistent interface to add and edit metrics, and by periodically pulling in new statistics from remote sources and integrating them into existing metrics. It also aims to provide a public API for accessing those metrics and raw data from UDD mirrors, and to use them to generate client-side interactive graphs on demand.
Project details:
- The plan is to use django as a backend, ggplot and pandas to generate the static graphs, and rickshaw or some other d3.js based library for the client-side charting. It will consist of the following modules:
- stats module [metrics.stats]: this would contain all the code that handles local metrics
- models [metrics.stats.models]: the database schema
- Stat: this table holds information for a single metric. We can have multiple tables for the different types of data points it takes.
- name [text field]: this field labels the metric and is also used as the title for the graph generated
- indep_label [text field]: labels the x-axis
- dep_group_label [text field]: labels the y-axis
- data_point [many-to-many relation to DataPoint]: holds many references to single x and y pairs in the metric
- IndepPoint: this table holds an x value that can be paired repeatedly with different y values in DataPoint entries. We can have multiple tables mirroring this one for different types of x values (time, integer, categories, etc.).
- stat [foreign key relation to Stat]: this field keeps a reference to the metric this point is part of
- value [integer field]: this holds a value for the point
- DataPoint: this table holds a single data point pair, x to y.
- stat [foreign key relation to Stat]: this field keeps a reference to the metric this point is part of
- indep_point [foreign key to IndepPoint]: this field has a reference to an IndepPoint entry which represents a certain x-value.
- indep_id [integer field]: this is auto-filled by django; it uses the primary key id of an IndepPoint. It's used to keep all datapoints in a consistent order.
- dep_label [text field]: this field labels a y-value. It's used to select the y-values that belong to one of the variables in the metric being compared against a common x-value.
- dep_value [integer field]: this stores a value for a y point.
- views [metrics.stats.views]: dictates the presentation of server side graphs
- admin module [metrics.admin]: out of the box, django provides an admin interface, but the design of the database makes data entry through it tedious and error-prone. This module will build on the admin interface to provide mass import through csv files, direct udd queries, or manual entry of a table.
- views [metrics.admin.views]:
- MassImport: this takes an upload in a specific format (tab separated values for example) and imports it into a new Stat table, then provides a table preview of the data and possibly a graph preview.
- TableInput: this asks for table dimensions (columns) and then generates an input widget where a user can manually enter values and submit them. It then mass imports the table as with the MassImport view and generates table/graph previews.
- UDDImport: provides a way to query UDD for data and generate tables, then mass imports them as with the MassImport view and generates table/graph previews.
- udd module [metrics.udd]:
- models [metrics.udd.models]: this is still a work in progress. I'll work with my mentor to identify any potential tables that will be helpful to have public access to and generate django models to mimic the udd schemas.
- views [metrics.udd.views]:
- UDDList: provides a list of all available udd tables
- UDDDetail: provides a way to browse and query a certain udd table, and possibly to graph some of its data with client-side code.
- api modules [metrics.udd.api | metrics.stats.api]: this uses tastypie resource classes that mirror the models for the relevant app
- utils modules [metrics.utils]: any functionality that can be factored out from views should live here. That includes graphing code, import code, etc...
- ggplot code: this should cover functions that deal with time series, and different graph geometries. Naming would be along the lines of graph_$type_$geometry
- import code: this should cover a number of formats of import and should check the data for consistency before import. Naming would be along the lines of import_$format
- interface module [metrics.ui]: this should contain all template code so that making a new interface module should be easy for designers. Another module depending on this one will contain the default design and javascript code.
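
To make the Stat / IndepPoint / DataPoint layout and the import_$format idea above more concrete, here is a minimal sketch. It is an illustration under assumptions, not the project's code: it uses plain Python dataclasses instead of Django models so that it runs on its own, it leaves out the stat foreign keys, and the function name import_tsv, its parameters, and the series helper are hypothetical. It assumes the first column of the upload is the x axis and every other column is one y series with integer values.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class IndepPoint:
    """One x value; its id plays the role of the primary key (indep_id)."""
    id: int
    value: int


@dataclass
class DataPoint:
    """One (x, y) pair; dep_label names the y series it belongs to."""
    indep_point: IndepPoint
    dep_label: str
    dep_value: int

    @property
    def indep_id(self) -> int:
        # Mirrors the auto-filled indep_id field used for ordering.
        return self.indep_point.id


@dataclass
class Stat:
    """A single metric with its axis labels and data points."""
    name: str
    indep_label: str
    dep_group_label: str
    data_points: list = field(default_factory=list)

    def series(self, dep_label):
        """Return (x, y) pairs for one y series, ordered by indep_id."""
        points = [p for p in self.data_points if p.dep_label == dep_label]
        points.sort(key=lambda p: p.indep_id)
        return [(p.indep_point.value, p.dep_value) for p in points]


def import_tsv(text, name, dep_group_label):
    """Hypothetical import helper: build a Stat from tab-separated text.

    Every row is checked for consistency before any point is created.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need a header with two or more columns and one data row")
    header, body = rows[0], rows[1:]

    parsed = []
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} columns")
        try:
            parsed.append([int(cell) for cell in row])
        except ValueError:
            raise ValueError(f"line {line_no}: non-integer value") from None

    stat = Stat(name=name, indep_label=header[0], dep_group_label=dep_group_label)
    for pk, values in enumerate(parsed, start=1):
        x = IndepPoint(id=pk, value=values[0])
        for label, y in zip(header[1:], values[1:]):
            stat.data_points.append(DataPoint(x, label, y))
    return stat
```

Calling import_tsv on a small table with a header row such as year, uploads, bugs would yield one Stat whose series("bugs") returns the (year, bugs) pairs in row order. In the real portal the three classes would presumably be Django models and the import would run inside a database transaction.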
Benefits to Debian
- Ease of managing and updating metrics
- Accessibility of those metrics
- Provide a continuous stream of organized data on debian's strengths and weaknesses to inform decisions.
Deliverables(copied):
- database structure to store all historical data points of all metrics
- standardized declarative interface to add/remove metrics to be graphed. The interface should allow for both "local" metrics (e.g. data generated by scripts run on the machines hosting the metrics portal) and "remote" metrics (e.g. data generated by remote data sources which are then periodically gathered by the metrics portal)
- cron jobs to periodically fetch new data and generate graphs
- proof of concept: integration of (some of the) existing graphs in the metrics infrastructure
- web interface to show updated graphs of the various metrics
- client-side dynamic web interface to graph, on demand, specific metrics (possibly more than one at a time to look for correlations) over the desired time periods
- (optional) produce a Debian Package of the portal code to ease deployment on Debian-based machines
Technologies used:
- Django (includes an ORM and a templating engine)
- Tastypie (django based api framework)
- ggplot (graphing api interface to matplotlib)
- Rickshaw (js graphing library based on d3.js)
- angular-ui (bootstrap 3 implementation using native angularjs directives for the UI)
Project schedule:
- week 1-2: implement the database structure and json api for it
- week 3-4: implement the server-side views and the graphing utility functions that generate static graphs
- week 5-6: implement the admin interface that allows updating data, uploading/entering data, and regenerating static graphs on demand
- week 7-8: implement the client-side javascript that generates interactive time-series graphs
- week 9-10: create all the metrics needed, generate their graphs, and write scripts that periodically update them. Also work on an aesthetically pleasing, possibly Bootstrap-based, interface design.
- week 11-12: write tests, fix bugs, and deal with unexpected issues that came up during the project.
Exams and other commitments: None.
Other summer plans: I might be in Paris for a week and in Montpellier for another week in July, but I still plan to dedicate 8 hrs a day to the project during my trip.
Why Debian?: I've been a long-time user of Ubuntu and I'd like to see Debian stay strong and continue to improve, as it will definitely benefit me and everyone using it. (I seriously can't recommend any distro that isn't Debian-based to newcomers to the Linux and FOSS world.)
- Are you applying for other projects in SoC? Yes.