Packages Website Xapian Search
Mentor: Frank Lichtenheld
Summary: Improve search results ranking and presentation
Required skills:
- Knowledge about full text search tools, preferably Xapian (apt-xapian-index)
- Perl
- Prior experience with Apache and mod_perl would be helpful, but is not required
Description: Search results are currently presented as the exact hit followed by substring hits in alphabetical order. This could be improved by using Xapian to provide a ranking metric. In addition, the text that matched (description, tags, and so on) could be shown to the user below each result, similar to a Google result page.
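To make the difference concrete, here is a minimal sketch in Python (chosen only for brevity; the real code base is Perl). It contrasts the ordering described above with a score-based ordering. The package names, descriptions, function names and weights are all made up for illustration, and the scoring is a stand-in for whatever relevance metric Xapian would actually supply, not a description of the existing implementation.

```python
def current_order(query, names):
    """Exact hit first, then the remaining substring hits alphabetically."""
    hits = [n for n in names if query in n]
    exact = [n for n in hits if n == query]
    rest = sorted(n for n in hits if n != query)
    return exact + rest


def ranked_order(query, packages):
    """Order hits by a hypothetical relevance score, highest first.

    `packages` maps a package name to its short description. The weights
    are invented for this sketch; a real implementation would take the
    score from the search backend instead.
    """
    scored = []
    for name, description in packages.items():
        score = 0
        if name == query:
            score += 100              # exact name match
        elif name.startswith(query):
            score += 50               # name begins with the query
        elif query in name:
            score += 10               # substring match in the name
        # Each whole-word occurrence in the description adds a little.
        score += 5 * description.lower().split().count(query)
        if score > 0:
            scored.append((-score, name, description))
    # Descending score; ties are broken alphabetically by name.
    return [(name, desc) for _, name, desc in sorted(scored)]


# Hypothetical sample data, not real archive contents.
packages = {
    "libfoo-dev": "development files for libfoo",
    "foo": "example tool",
    "foo-utils": "utilities for foo",
    "bar": "frontend for foo with extra foo plugins",
    "baz": "unrelated package",
}

if __name__ == "__main__":
    print(current_order("foo", list(packages)))
    for name, desc in ranked_order("foo", packages):
        # Show the matching text below each result.
        print(name)
        print("    " + desc)
```

In this sketch the score-based ordering also surfaces "bar", which only mentions the query in its description and is therefore missed by a pure name-substring search.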
The current code of packages.d.o can be found at http://git.debian.org/?p=webwml/packages.git;a=summary
See the README for some basic information and INSTALL for instructions on how to set up a test instance.
Tackling this problem requires a solution that is not only elegant but also very fast, since packages.d.o is a high-traffic website.
An important part of this proposal involves figuring out what Xapian can provide and what supplementary data sources might be needed. Keep in mind that the end result should be much more useful than a custom Google search over all the packages pages. Other useful sources such as debtags, changelogs and copyright files could also be parsed if that can reasonably be implemented. This might involve extending apt-xapian-index.
On the other hand, the same code also runs behind http://packages.ubuntu.com/ and http://archive.debian.net/, so I would prefer not to integrate it too tightly into the Debian infrastructure if that would mean losing this flexibility.