Differences between revisions 7 and 103 (spanning 96 versions)
Revision 7 as of 2008-04-23 09:08:03
Size: 1621
Comment:
Revision 103 as of 2022-01-28 09:42:24
Size: 7651
Comment: minor fix to translation header
Line 1: Line 1:
= Ultimate Debian Database =
#language en
##For translators: to have a constantly up-to-date translation header in your page, you can just add a line like the following (with the comment characters at the start of the line removed)
##<<Include(UltimateDebianDatabase, ,from="^##TAG:TRANSLATION-HEADER-START",to="^##TAG:TRANSLATION-HEADER-END")>>
##TAG:TRANSLATION-HEADER-START
~-[[DebianWiki/EditorGuide#translation|Translation(s)]]: [[UltimateDebianDatabase|English]] - [[es/UltimateDebianDatabase|Español]] - [[it/UltimateDebianDatabase|Italiano]] - [[uk/UltimateDebianDatabase|Українська]] -~
##TAG:TRANSLATION-HEADER-END
----
Line 3: Line 9:
The Ultimate Debian Database (UDD for short) is an ongoing effort to create a relational database encoding information about Debian items (packages, bugs, popularity, ...).
'''The Ultimate Debian Database (UDD) gathers a lot of data about various aspects of Debian in the same SQL database. It allows users to easily access and combine all these data.'''
Line 5: Line 11:
It is currently being worked on as a [wiki:SummerOfCode2008/UltimateDebianDatabase Google Summer of Code 2008 project].
Data currently being imported include: Packages and Sources files from Debian and Ubuntu, bugs from the Debian BTS, the Popularity Contest, the history of uploads, the history of migrations to testing, Lintian, orphaned packages, Carnivore, Debtags, Ubuntu bugs (from Launchpad), packages in the NEW queue, and DDTP translations.
Line 7: Line 13:
This page is for coordination during the project's development.
 * Some '''example queries''' are provided as CGI scripts, so that anyone can easily run them. You can [[https://udd.debian.org/cgi-bin/|browse them]] or [[https://salsa.debian.org/qa/udd/tree/master/web/cgi-bin|view the source code]].
 * '''Database schema''': https://udd.debian.org/schema/udd.html
 * '''Source code''': available in the git repository of the [[https://salsa.debian.org/qa|qa project]] at https://salsa.debian.org/qa/udd
 * '''Database server''': runs on PostgreSQL, with the plperl language and the postgresql-debversion extension
 * UDD related services:
  * [[https://udd.debian.org/bugs.cgi|Bugs Search]]: multi-criteria search engine for bugs
  * [[https://udd.debian.org/dmd.cgi|Debian Maintainer Dashboard]]
  * [[https://udd.debian.org/cgi-bin/bts-usertags.cgi|Bugs Usertags]]: search bugs by usertag
  * [[https://udd.debian.org/sponsorstats.cgi|Sponsors Stats]] gives some statistics about who is sponsoring uploads to Debian
  * [[https://udd.debian.org/bapase.cgi|Bapase]] lets you search for "interesting" packages using various criteria
  * [[https://udd-mirror.debian.net|public UDD mirror]] allows anyone to query UDD using PostgreSQL command-line or GUI clients
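Because the server has postgresql-debversion installed, version strings can be compared the way dpkg compares them (so {{{1.10-1}}} sorts after {{{1.9-1}}}, unlike plain text ordering). The following is only a sketch, assuming psycopg2 and a {{{sources}}} table with {{{source}}}, {{{version}}} and {{{release}}} columns; check the schema page for the real column names. The explicit {{{::debversion}}} cast is harmless if the column already has that type.

```python
# Sketch: find the highest version of a source package in a given release,
# relying on the debversion type (postgresql-debversion) for dpkg-style ordering.
# ASSUMPTION: a `sources` table with `source`, `version` and `release` columns.

LATEST_VERSION_SQL = (
    "SELECT version FROM sources "
    "WHERE source = %s AND release = %s "
    "ORDER BY version::debversion DESC "
    "LIMIT 1"
)


def latest_version_query(source, release):
    """Return (sql, params), ready for psycopg2's cursor.execute()."""
    return LATEST_VERSION_SQL, (source, release)


def latest_version(conn, source, release):
    """Run the query on an open DB-API connection; None if nothing matches."""
    cur = conn.cursor()
    try:
        cur.execute(*latest_version_query(source, release))
        row = cur.fetchone()
        return row[0] if row else None
    finally:
        cur.close()
```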
Line 9: Line 25:
== Resources ==
For more information, please contact us on [[irc://irc.debian.org/debian-qa|#debian-qa]] or debian-qa@lists.debian.org (mailing list [[https://lists.debian.org/debian-qa/|subscription and archives]]).
Line 11: Line 27:
 * [wiki:/DataSources data sources] we want to import into the db
== Connecting to and using UDD ==
Line 13: Line 29:
== Involved people ==
'''udd.debian.org''' is running on '''ullmann.debian.org'''. It accepts direct SSL connections from ''master'', ''coccia'', ''quantz'' (''qa'') and ''respighi'' (''release'') ([[https://salsa.debian.org/dsa-team/mirror/dsa-puppet/blob/master/modules/ferm/manifests/per_host.pp#L108|firewall config]]).
Line 15: Line 31:
 * Student: Christian von Essen
 * Mentor: Lucas Nussbaum
   * Co-mentor: Stefano Zacchiroli
   * Co-mentor: Marc 'HE' Brockschmidt
 * command-line:
  * {{{psql service=udd}}}
  * or: {{{psql -U guest -h udd.debian.org -p 5452 udd}}}
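 * Python:{{{
import psycopg2

# "service=udd" names a libpq connection service, looked up in pg_service.conf
conn = psycopg2.connect("service=udd")
cursor = conn.cursor()
cursor.execute("SELECT count(*) FROM sources WHERE release = 'sid'")
print(cursor.fetchone()[0])  # number of rows for sid in the sources table
conn.close()
}}}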
 * Ruby (DBI): {{{require 'dbi' ; dbh = DBI::connect('DBI:Pg:dbname=udd;port=5452;host=udd.debian.org', 'guest') }}}
 * Ruby (PG): {{{require 'pg'; conn = PG.connect({:host => 'udd.debian.org', :port => 5452, :user => 'guest', :dbname => 'udd'}) }}}
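The {{{psql -U guest -h udd.debian.org -p 5452 udd}}} form maps directly onto keyword arguments of {{{psycopg2.connect()}}}. A minimal sketch, assuming psycopg2 is installed and the code runs on one of the hosts listed above; it also passes the release as a query parameter instead of writing it into the SQL string, so psycopg2 handles the quoting.

```python
# Sketch: connect with explicit parameters (same as the psql -U/-h/-p form)
# and count rows per release in the sources table with a parameterized query.

UDD_PARAMS = {
    "host": "udd.debian.org",
    "port": 5452,
    "user": "guest",
    "dbname": "udd",
}

COUNT_SOURCES_SQL = "SELECT count(*) FROM sources WHERE release = %s"


def connect_udd():
    """Open a connection; only reachable from the hosts allowed by the firewall."""
    import psycopg2  # imported lazily so the helpers have no hard dependency

    return psycopg2.connect(**UDD_PARAMS)


def count_sources(conn, release):
    """Return the number of rows in `sources` for `release` (e.g. 'sid')."""
    cur = conn.cursor()
    try:
        # psycopg2 substitutes %s safely; never format values into SQL yourself
        cur.execute(COUNT_SOURCES_SQL, (release,))
        return cur.fetchone()[0]
    finally:
        cur.close()
```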

If you want to know precisely when a specific data source is updated, you can take a look at the [[https://udd.debian.org/crontabs.txt|crontab]] file. The ''timestamps'' table can tell you when a data source was last updated.
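As a sketch, the ''timestamps'' table can be read from Python like this. The column names {{{source}}} and {{{end_time}}} are assumptions, so verify them on the schema page first; {{{conn}}} is any open psycopg2 connection, and the source name you pass should match the naming used by the importers.

```python
# Sketch: when was a given UDD data source last updated?
# ASSUMPTION: the timestamps table has `source` and `end_time` columns.

LAST_UPDATE_SQL = "SELECT max(end_time) FROM timestamps WHERE source = %s"


def last_update(conn, source):
    """Return the end time of the latest run for `source`, or None."""
    cur = conn.cursor()
    try:
        cur.execute(LAST_UPDATE_SQL, (source,))
        row = cur.fetchone()  # max() always yields one row (NULL if no match)
        return row[0] if row else None
    finally:
        cur.close()
```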

If you do not have access to the required machines, you can access the [[https://udd-mirror.debian.net|public UDD mirror]] directly from your own machines:
 * command line:{{{
psql "postgresql://udd-mirror:udd-mirror@udd-mirror.debian.net/udd"
}}} To get started quickly, look at the database schema on https://udd.debian.org/schema/udd.html; in psql, {{{\dt}}} lists the tables. Have fun.
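The mirror is also reachable from Python, since libpq (and therefore {{{psycopg2.connect()}}}) accepts the same {{{postgresql://}}} URI. This sketch lists tables roughly the way {{{\dt}}} does, by querying the standard {{{information_schema.tables}}} view; it assumes psycopg2 is installed and that the UDD tables live in the {{{public}}} schema.

```python
# Sketch: list UDD tables on the public mirror, similar to psql's \dt.
# ASSUMPTIONS: psycopg2 is installed; the tables are in the "public" schema.

MIRROR_URI = "postgresql://udd-mirror:udd-mirror@udd-mirror.debian.net/udd"

LIST_TABLES_SQL = (
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = 'public' AND table_type = 'BASE TABLE' "
    "ORDER BY table_name"
)


def list_mirror_tables(uri=MIRROR_URI):
    """Connect to the mirror and return the table names as a list."""
    import psycopg2  # lazy import: only needed when actually connecting

    conn = psycopg2.connect(uri)  # libpq accepts postgresql:// URIs as a DSN
    try:
        cur = conn.cursor()
        cur.execute(LIST_TABLES_SQL)
        return [name for (name,) in cur.fetchall()]
    finally:
        conn.close()  # also closes the cursor
```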

== Credits and citing UDD ==

UDD started as a Google Summer of Code project by Christian von Essen (Neronus), co-mentored by Lucas Nussbaum, Stefano Zacchiroli and Marc 'HE' Brockschmidt. It is now mainly maintained by Lucas Nussbaum, with help from others.

If you use UDD in research work, please cite [[https://ieeexplore.ieee.org/abstract/document/5463277|this paper]]:
 L. Nussbaum and S. Zacchiroli, "The Ultimate Debian Database: Consolidating bazaar metadata for Quality Assurance and data mining," 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010), Cape Town, 2010, pp. 52-61, [[https://dx.doi.org/10.1109/MSR.2010.5463277|doi: 10.1109/MSR.2010.5463277]] ([[https://hal.inria.fr/inria-00502886/document|open access preprint]]).

== Improving UDD ==
If you want to help improve UDD, you can '''set up your own instance''' as described on [[UltimateDebianDatabase/Hacking]].
You can '''report bugs''' against the qa.debian.org pseudo-package, using the ''udd'' usertag and user ''qa.debian.org@packages.debian.org''. ([[https://bugs.debian.org/cgi-bin/pkgreport.cgi?tag=udd;users=qa.debian.org@packages.debian.org|list of bugs]] ; or [[https://udd.debian.org/bugs/?release=na&merged=ign&done=ign&fnewerval=7&flastmodval=7&fusertag=only&fusertagtag=udd&fusertaguser=qa.debian.org%40packages.debian.org&allbugs=1&cseverity=1&ctags=1&sortby=id&sorto=asc&format=html#results|list of bugs using UDD]])
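The same list can be pulled out of UDD directly. A sketch, assuming a {{{bugs_usertags}}} table with {{{email}}}, {{{tag}}} and {{{id}}} columns (an assumption to check on the schema page); it works with either a direct connection or the public mirror.

```python
# Sketch: list bug numbers carrying the "udd" usertag of the QA user.
# ASSUMPTION: a table bugs_usertags(email, tag, id).

QA_USER = "qa.debian.org@packages.debian.org"

USERTAG_SQL = (
    "SELECT id FROM bugs_usertags "
    "WHERE email = %s AND tag = %s "
    "ORDER BY id"
)


def usertagged_bugs(conn, tag="udd", user=QA_USER):
    """Return the bug numbers tagged `tag` by `user`, in ascending order."""
    cur = conn.cursor()
    try:
        cur.execute(USERTAG_SQL, (user, tag))
        return [bug_id for (bug_id,) in cur.fetchall()]
    finally:
        cur.close()
```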


== Other resources ==
 * [[http://www.loria.fr/~lnussbau/files/debconf9-ultimate-debian-database.pdf|Slides from a UDD talk at Debconf9]]
 * [[https://udd.debian.org/dumps/udd.dump|full dump of the database]] generated every two days. (~1.9 GB and growing, to be restored using pg_restore. See [[https://salsa.debian.org/qa/udd/blob/master/scripts/recreate-db|this script]] for an example)
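Restoring the dump locally comes down to creating an empty database and running pg_restore into it. A minimal sketch, assuming the dump is already downloaded, the PostgreSQL client tools are on PATH, and the file is in pg_dump's custom format (required for {{{--jobs}}}; drop that option otherwise). Since UDD uses postgresql-debversion, the local server will most likely need that extension too. The recreate-db script linked above remains the reference for what is actually done.

```python
# Sketch: restore the UDD dump into a fresh local database.
# ASSUMPTIONS: createdb/pg_restore are on PATH, udd.dump is already downloaded
# and is a custom-format archive (needed for --jobs).
import subprocess


def restore_commands(dump_path, dbname="udd", jobs=4):
    """Build the createdb and pg_restore argument lists without running them."""
    create = ["createdb", dbname]
    restore = [
        "pg_restore",
        "--no-owner",  # skip ownership commands for roles missing locally
        f"--jobs={jobs}",  # parallel restore (custom/directory formats only)
        f"--dbname={dbname}",
        dump_path,
    ]
    return [create, restore]


def restore(dump_path, dbname="udd", jobs=4):
    """Run the commands in order, stopping at the first failure."""
    for cmd in restore_commands(dump_path, dbname, jobs):
        subprocess.run(cmd, check=True)
```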

See also:
 * [[ProjectB]]
 * [[DDE]] (Debian Data Export)

== References ==

Although UDD's main goal was not to serve as a scientific research tool, it has been featured in several publications and talks:

 * Norbert Preining. '''Analyzing Debian packages with Neo4j'''. [[https://www.meetup.com/Neo4j-Online-Meetup/events/243206424/|Neo4j Online Meetup]]. Article parts [[https://www.accelia.net/column/research/09/|1]], [[https://www.accelia.net/column/research/10/|2]], [[https://www.accelia.net/column/research/11/|3]], [[https://www.youtube.com/watch?v=lpqvv36SBQw|video]]

 * Lucas Nussbaum and Stefano Zacchiroli. '''The Ultimate Debian Database: Consolidating Bazaar Metadata for Quality Assurance and Data Mining'''. [[http://msr.uwaterloo.ca/msr2010/index.html|7th IEEE Working Conference on Mining Software Repositories (MSR'2010)]]. [[http://www.loria.fr/~lnussbau/files/msr2010-udd.pdf|Paper]] - [[http://www.loria.fr/~lnussbau/files/msr2010-udd-slides.pdf|Slides]] - [[http://hal.archives-ouvertes.fr/inria-00502886/en|HAL]]

 * Julius Davies, Hanyu Zhang, Lucas Nussbaum and Daniel M. German. '''Perspectives on Bugs in the Debian Bug Tracking System'''. [[http://msr.uwaterloo.ca/msr2010/index.html|7th IEEE Working Conference on Mining Software Repositories (MSR'2010): Mining Challenge]]. [[http://www.loria.fr/~lnussbau/files/msr2010-debianbugs.pdf|Paper]] - [[http://www.loria.fr/~lnussbau/files/msr2010-debianbugs-slides.pdf|Slides]] - [[http://hal.archives-ouvertes.fr/inria-00502883/en|HAL]]

== SubPages ==

<<PageList(re:^UltimateDebianDatabase/)>>

##UltimateDebianDatabase/CreateLocalReplica (for some reason this link is not getting auto-generated)
Line 21: Line 88:

(snipped content about '''data sources''' and moved it to the appropriate sub-page, see Resources above)
== Sources of data we would like to import ==
If you have a use for a data source in mind, add it to "we really want them". If not, add it to "we might want them". You are of course allowed to move sources from "we might" to "we really want": whoever originally added a source might not have thought of your use case :-)
=== We really want them ===
 * information about source and binary packages on each arch (basically the content of *Packages or *Sources, either by importing those files, or by importing projectb)
 * popcon
 * BTS (including usertags)
 * history of uploads (godog's work)
 * history of testing migrations (lucas' work, on http://qa.debian.org/~lucas/testing-status.html)
=== We might want them ===
 * DEHS
 * MIA
 * Carnivore
 * debtags (integrating the tags into the database)
 * wanna-build
 * lintian
##CategoryPermalink --> https://udd.debian.org
 CategoryPermalink
