The Ultimate Debian Database (UDD) gathers a lot of data about various aspects of Debian in the same SQL database. It allows users to easily access and combine all these data.
Data currently being imported include: Packages and Sources files, from Debian and Ubuntu, Bugs from the Debian BTS, Popularity contest, History of uploads, History of migrations to testing, Lintian, Orphaned packages, Carnivore, Debtags, Ubuntu bugs (from Launchpad), Packages in NEW queue, DDTP translations.
Some example queries are provided as CGI scripts, to make it easy for everyone to run them. You can browse them or view the source code.
Database schema: https://udd.debian.org/schema/udd.html
Source code: available in the git repository of the qa project at https://salsa.debian.org/qa/udd
Database server: runs on PostgreSQL, with plperl and postgresql-debversion
UDD-related services:
- Bugs Search: multi-criteria search engine for bugs
- Bugs Usertags: search for usertags on bugs
- Sponsors Stats: gives some statistics about who is sponsoring uploads to Debian
- Bapase: lets you search for "interesting" packages using various criteria
- public UDD mirror: allows anyone to query UDD using PostgreSQL command-line or GUI clients
For more information, please contact us on #debian-qa or debian-qa@lists.debian.org (mailing list subscription and archives).
Connecting to and using UDD
udd.debian.org runs on ullmann.debian.org. It accepts direct SSL-encrypted connections from master, coccia, quantz (qa) and respighi (release) (firewall config).
command line:
psql service=udd
or: psql -U guest -h udd.debian.org -p 5452 udd
Python:
import psycopg2
conn = psycopg2.connect("service=udd")
cursor = conn.cursor()
cursor.execute("SELECT count(*) from sources where release='sid'")
print(cursor.fetchall()[0][0])
Ruby (DBI): require 'dbi' ; dbh = DBI::connect('DBI:Pg:dbname=udd;port=5452;host=udd.debian.org', 'guest')
Ruby (PG): require 'pg'; conn = PG.connect({:host => 'udd.debian.org', :port => 5452, :user => 'guest', :dbname => 'udd'})
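The Python snippet above relies on a service=udd entry being configured. As a minimal sketch, the same query can be run with the explicit guest parameters shown for psql, assuming psycopg2 is installed and the script runs on one of the machines allowed to connect. The helper names udd_dsn and count_sources are hypothetical and only serve this illustration.

```python
def udd_dsn(host="udd.debian.org", port=5452, user="guest", dbname="udd"):
    """Build a libpq keyword/value connection string for UDD."""
    return f"host={host} port={port} user={user} dbname={dbname}"


def count_sources(release, dsn=None):
    """Return the number of rows in the sources table for one release."""
    import psycopg2  # imported lazily so udd_dsn() is usable without the driver

    conn = psycopg2.connect(dsn or udd_dsn())
    try:
        with conn.cursor() as cursor:
            # Pass the release as a query parameter rather than
            # formatting it into the SQL string.
            cursor.execute(
                "SELECT count(*) FROM sources WHERE release = %s", (release,)
            )
            return cursor.fetchone()[0]
    finally:
        conn.close()
```

Calling count_sources("sid") should return the same number as the one-off query above. Passing the release as a parameter keeps the function reusable for other releases.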
If you want to know precisely when a specific data source is updated, you can take a look at this config file, or at the current UDD status.
If you do not have access to the required machines, you can access the public UDD mirror directly from your own machines:
command line:
psql "postgresql://udd-mirror:udd-mirror@udd-mirror.debian.net/udd"
For quick successes, inspect the database schema on https://udd.debian.org/schema/udd.html. psql shows tables with \dt. Have fun.
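The public mirror can also be queried from Python. The sketch below assumes psycopg2 is installed and that the tables live in the public schema (an assumption, not something this page states). It lists table names, roughly what \dt shows in psql. The names describe_uri and list_tables are hypothetical.

```python
from urllib.parse import urlsplit

# Connection URI taken from the psql example above.
MIRROR_URI = "postgresql://udd-mirror:udd-mirror@udd-mirror.debian.net/udd"

# Assumption: UDD tables are in the "public" schema.
LIST_TABLES_SQL = """
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = 'public'
    ORDER BY table_name
"""


def describe_uri(uri=MIRROR_URI):
    """Split the URI into the fields a GUI client usually asks for."""
    parts = urlsplit(uri)
    return {
        "host": parts.hostname,
        "user": parts.username,
        "password": parts.password,
        "dbname": parts.path.lstrip("/"),
    }


def list_tables(uri=MIRROR_URI):
    """Return the table names visible on the mirror."""
    import psycopg2  # imported lazily; only needed for the live query

    conn = psycopg2.connect(uri)  # libpq accepts connection URIs
    try:
        with conn.cursor() as cursor:
            cursor.execute(LIST_TABLES_SQL)
            return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()
```

describe_uri() needs no network access, so it can be used to check the settings before trying list_tables().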
Credits and citing UDD
UDD started as a Google Summer of Code project by Christian von Essen (Neronus), co-mentored by Lucas Nussbaum, Stefano Zacchiroli and Marc 'HE' Brockschmidt. It is now mainly maintained by Lucas Nussbaum, with help from others.
If you use UDD in research work, please cite this paper:
L. Nussbaum and S. Zacchiroli, "The Ultimate Debian Database: Consolidating bazaar metadata for Quality Assurance and data mining," 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010), Cape Town, 2010, pp. 52-61, doi: 10.1109/MSR.2010.5463277 (open access preprint).
Improving UDD
If you want to help improve UDD, you can set up your own instance as described on UltimateDebianDatabase/Hacking. You can report bugs against the qa.debian.org pseudo-package, using the udd usertag and the user qa.debian.org@packages.debian.org (list of bugs; or list of bugs using UDD).
When reporting bugs by email, use the following pseudo-headers at the top of the message body:
Package: qa.debian.org
User: qa.debian.org@packages.debian.org
Usertags: udd
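As an illustration, such a report could be assembled with Python's standard email module. This is a sketch only: the function name build_bug_report is hypothetical, the sender, subject and text are placeholders, and the submit@bugs.debian.org address is not given on this page (it is the usual Debian BTS submission address). The message is built but not sent.

```python
from email.message import EmailMessage

# Pseudo-headers for UDD bugs, as listed above.
PSEUDO_HEADERS = (
    ("Package", "qa.debian.org"),
    ("User", "qa.debian.org@packages.debian.org"),
    ("Usertags", "udd"),
)


def build_bug_report(sender, subject, description):
    """Return an EmailMessage whose body starts with the pseudo-headers."""
    msg = EmailMessage()
    msg["From"] = sender
    # Assumption: the standard BTS submission address.
    msg["To"] = "submit@bugs.debian.org"
    msg["Subject"] = subject
    header_block = "\n".join(f"{key}: {value}" for key, value in PSEUDO_HEADERS)
    # Pseudo-headers come first, then a blank line, then the report text.
    msg.set_content(f"{header_block}\n\n{description}\n")
    return msg
```

The resulting message can be reviewed with print(msg) before it is handed to a mail client or smtplib.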
Other resources
A full dump of the database is generated every two days (~1.9 GB and growing). It is meant to be restored using pg_restore; see this script for an example.
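A restore can be scripted along these lines. This is a sketch under assumptions, not the script referred to above: the dump file name udd.dump and the helper names are hypothetical, the target database must already exist, and a local PostgreSQL with the pieces UDD depends on (such as postgresql-debversion) is assumed.

```python
import subprocess


def pg_restore_command(dump_path, dbname="udd"):
    """Build the pg_restore invocation as an argument list."""
    # --no-owner skips restoring the original object ownership, which
    # usually does not match the roles of a local installation.
    return ["pg_restore", "--no-owner", "--dbname", dbname, dump_path]


def restore_dump(dump_path, dbname="udd"):
    """Run pg_restore and raise CalledProcessError if it fails."""
    subprocess.run(pg_restore_command(dump_path, dbname), check=True)
```

For example, restore_dump("udd.dump") would load the hypothetical file udd.dump into a local database named udd.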
References
Although UDD's main goal is not to serve as a scientific research tool, it has been featured in several publications and talks:
Norbert Preining. Analyzing Debian packages with Neo4j. Neo4j Online Meetup. Article parts 1, 2, 3, video
Lucas Nussbaum and Stefano Zacchiroli. The Ultimate Debian Database: Consolidating Bazaar Metadata for Quality Assurance and Data Mining. 7th IEEE Working Conference on Mining Software Repositories (MSR'2010). Paper - Slides - HAL
Julius Davies, Hanyu Zhang, Lucas Nussbaum and Daniel M. German. Perspectives on Bugs in the Debian Bug Tracking System. 7th IEEE Working Conference on Mining Software Repositories (MSR'2010): Mining Challenge. Paper - Slides - HAL