Differences between revisions 17 and 33 (spanning 16 versions)
Revision 17 as of 2016-11-11 17:28:06
Size: 6716
Editor: TheAnarcat
Comment:
Revision 33 as of 2019-10-16 09:45:06
Size: 1142
Comment: Add information on how to report issues
Line 1: Line 1:
https://manpages.debian.org/ is a service providing online manpages in HTML format for the public. https://manpages.debian.org/ is a service providing online manual pages for all Debian releases in HTML format for the public.
Line 7: Line 7:
<<Include(Services/Debian manpages archive)>> <<Include(Services/Debian manpages archive,,from="=$",editlink)>>
Line 9: Line 9:
= Possible implementations = = Deployment =
Line 11: Line 11:
There are three known implementations of "man to web" archive generators. The current `manpages.debian.org` service is powered by [[https://github.com/Debian/debiman|debiman]], running on `manziarly.debian.org`.
Line 13: Line 13:
== Current codebase == Details on how to make changes to the site are in the [[https://manpages.debian.org/faq.html|FAQ]].
Line 15: Line 15:
The [[https://anonscm.debian.org/viewvc/ddp/man-cgi/|current codebase]] is a set of Perl and bash CGI scripts that dynamically generate (and search through) manpages. = Issues =
Line 17: Line 17:
The current codebase could be migrated to `manziarly`, provided we have access to the `manpages` group.

The current codebase extracts manpages with the `dpkg --fsys-tarfile` and `tar` commands. It also creates indexes using `man -k` for future searches. Manpages are stored in a directory for each package-version, so it doesn't garbage-collect manpages that have disappeared.

The CGI script just calls `man` and outputs plain text wrapped in `<PRE>` tags.

There is also a copy of the Ubuntu scripts in the source code.

== Ubuntu ==

Ubuntu has their own manpage repository at https://manpages.ubuntu.com/. Their [[https://code.launchpad.net/ubuntu-manpage-repository|codebase]] is partly Python, Perl and Bash.

It looks like there are a [[http://bazaar.launchpad.net/~kirkland/ubuntu-manpage-repository/main/view/head:/bin/make-manpage-repo.sh|bash]] *and* a [[http://bazaar.launchpad.net/~kirkland/ubuntu-manpage-repository/main/view/head:/bin/make-manpage-repo.py|python]] implementation of the same thing. Both process the whole archive (which is assumed to be local) and create a timestamp file for every package found, which avoids processing packages repeatedly (although all packages from the `Packages` listing are still `stat`'d at every run). Both versions extract the manpages with `dpkg -x`, although the Python version uses the `apt` Python package to list files and a simple regex (`^usr/share/man/.*\.gz$`) to find manpages.
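
As a rough illustration of the regex-based filtering described above, here is a minimal Python sketch. Only the pattern itself comes from the text; the function name `find_manpages` and the sample file listing are hypothetical, and this is not the code used by the Ubuntu scripts.

```python
import re

# Pattern quoted in the text above; it only matches gzip-compressed
# files below usr/share/man/, with no leading "./" or "/".
MANPAGE_RE = re.compile(r"^usr/share/man/.*\.gz$")


def find_manpages(paths):
    """Return the paths that the pattern identifies as manpages."""
    return [p for p in paths if MANPAGE_RE.match(p)]


if __name__ == "__main__":
    # Hypothetical file listing for a single package.
    listing = [
        "usr/bin/ls",
        "usr/share/man/man1/ls.1.gz",
        "usr/share/man/fr/man1/ls.1.gz",
        "usr/share/doc/coreutils/README.gz",
    ]
    for path in find_manpages(listing):
        print(path)
```

Note that the pattern is anchored at `usr/`, so a listing whose entries start with `./` or `/` would need to be normalized first.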

To generate the HTML version of the manpage, both programs use the `/usr/lib/w3m/cgi-bin/w3mman2html.cgi` shipped with the DebianPackage:w3m package.

Search is operated by a [[http://bazaar.launchpad.net/~kirkland/ubuntu-manpage-repository/main/view/head:/cgi-bin/search.py|custom Python script]] that looks through manpage filenames or uses Google to do a full text search.

== dgilman codebase ==

A new codebase written by dgilman is available on [[https://github.com/dgilman/manpages|GitHub]]. It is a simple Python script with a sqlite backend. It extracts the tarfile with `dpkg --fsys-tarfile`, then parses it with the Python `tarfile` library. It uses rather complicated regexes to find manpages and stores apropos data and other metadata about manpages in the sqlite database. All manpages are unconditionally extracted.
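
The `dpkg --fsys-tarfile` plus `tarfile` approach can be sketched roughly as follows. This is not dgilman's code: the helper names (`is_manpage`, `iter_manpages`, `manpages_from_deb`) are hypothetical, and the path check is a deliberately crude stand-in for the regexes mentioned above.

```python
import subprocess
import tarfile


def is_manpage(name):
    """Crude check (an assumption): the file lives under usr/share/man/."""
    if name.startswith("./"):
        name = name[2:]
    return name.startswith("usr/share/man/")


def iter_manpages(stream):
    """Yield (name, content) for each regular file under usr/share/man/.

    `stream` is any binary file object producing an uncompressed tar
    archive, which is what `dpkg --fsys-tarfile` writes to stdout.
    """
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            # Symlinked manpages are skipped in this sketch.
            if member.isfile() and is_manpage(member.name):
                yield member.name, tar.extractfile(member).read()


def manpages_from_deb(deb_path):
    """Stream the filesystem tarball of a .deb through iter_manpages."""
    proc = subprocess.Popen(
        ["dpkg", "--fsys-tarfile", deb_path], stdout=subprocess.PIPE
    )
    try:
        yield from iter_manpages(proc.stdout)
    finally:
        proc.stdout.close()
        proc.wait()
```

Reading the archive as a stream (`mode="r|"`) avoids writing the tarball to disk, at the cost of only being able to read members in order.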

== anarcat design ==

The Minimum Viable Product for this project is a service that creates an HTML version of all the manpages of all the packages available in Debian, for all supported suites (including [[LTS]]). Note that the current codebase does not attempt to parse the manpage to generate headers; it only outputs the text.

`apropos(1)` functionality is considered an extra that can be implemented later with existing indexing tools like Xapian (or its web frontend, Omega), Lucene / Solr, Elasticsearch, or a simple homegrown JavaScript-based search (like readthedocs uses).

A possible design would be:

 1. fetch all manpages from the archive, store them on disk (makes them usable for tools like [[http://manpages.ubuntu.com/dman|dman]] that browses remote webpages)
    * layout options:
      * Ubuntu: `$DISTRIB_CODENAME/$LOCALE/man$i/$PAGE.$i.gz` (see [[http://manpages.ubuntu.com/dman|dman]])
      * current codebase: `"${OUTPUTDIR}/${pooldir}/${packagename}_${version}"` (from [[https://anonscm.debian.org/viewvc/ddp/man-cgi/extractor/manpage-extractor.pl?view=markup|manpage-extractor.pl]])
 2. convert manpages to HTML so they are readable in a web browser, possible solutions here:
    * just the plaintext output of man wrapped in `<PRE>` tags
    * DebianPackage:man2html is an old C program that ships with a bunch of CGI scripts
    * there's another man2html that is a [[http://savannah.nongnu.org/bugs/?34721|Perl script]], but I couldn't figure out how to use it correctly.
    * DebianPackage:w3m has a [[https://sources.debian.net/src/w3m/0.5.3-32/scripts/w3mman/w3mman2html.cgi.in/|Perl script]] that is used by the Ubuntu site
    * DebianPackage:roffit is another Perl script. The version in Debian is ancient (2012) and doesn't display the `man(1)` synopsis correctly (newer versions from GitHub also fail)
    * DebianPackage:pandoc can't, unfortunately, read manpages (only write)
    * DebianPackage:man itself can generate an HTML version with `man -Hcat man` and the output is fairly decent, although there is no cross-referencing
 3. index HTML pages in a search engine of some sort
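
The two on-disk layouts listed under step 1 can be compared with a small Python sketch. The path templates come from the list above; the function names and all sample values (suite, locale, pool directory, version) are hypothetical, and what `${pooldir}` actually contains in the current codebase is an assumption here.

```python
from pathlib import PurePosixPath


def ubuntu_style_path(codename, locale, section, page):
    """Build $DISTRIB_CODENAME/$LOCALE/man$i/$PAGE.$i.gz."""
    return PurePosixPath(
        codename, locale, f"man{section}", f"{page}.{section}.gz"
    )


def package_version_dir(outputdir, pooldir, packagename, version):
    """Build ${OUTPUTDIR}/${pooldir}/${packagename}_${version}."""
    return PurePosixPath(outputdir, pooldir, f"{packagename}_{version}")


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(ubuntu_style_path("jessie", "en", 1, "ls"))
    print(package_version_dir(
        "/srv/man", "pool/main/c/coreutils", "coreutils", "8.23-4"
    ))
```

The first layout keys pages by suite and locale, so a page is overwritten when a newer package version ships it; the second keeps one directory per package version, which is why old pages are never garbage-collected.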

Parts 1 and 2 would be generated on `manziarly` and stored on the static.d.o CDN (see below). Part 3 would run on a separate server (or possibly a pair of servers) hosting the search cluster.

next steps:

 1. write the MVP, maybe based on David's work
 2. ask (through a [[rt.debian.org]] ticket) for access to the `manpages` group (./) access requested for `anarcat`
 3. deploy a first dump of the manpages on manziarly
 4. make a patch to the [[https://anonscm.debian.org/cgit/mirror/dsa-puppet.git/|dsa-puppet manifests]] or document how to deploy the scripts for the DSA
 5. ask DSA to deploy the new code, test
 6. if it works, fix the `manpages.debian.org` DNS to point to the static.d.o DNS; at this point, the MVP is in place
 7. make search work...

In the above setup, `manziarly` would be the master server for static file servers in the Debian.org infrastructure. Files saved there would be rsync'd to multiple frontend servers. How this is configured is detailed in the [[https://dsa.debian.org/howto/static-mirroring/|static-mirroring]] DSA documentation; in short, we would need to ask the DSA team for an extra entry for manpages.d.o there to serve static files.

= Hardware =

The old service used to run on `glinka.debian.org`. [[Teams/DSA]] requested that the service be moved to `manziarly.debian.org`.

Note that to configure a vhost on DSA machines, you need to follow the [[https://dsa.debian.org/doc/subdomains/|DSA subdomains documentation]].
Issues can be reported directly in [[https://github.com/Debian/debiman/issues]]. There is also a virtual package `manpages.debian.org` in the Debian bug tracking system which can be used to report bugs: [[http://bugs.debian.org/manpages.debian.org]]
Line 83: Line 21:
Discussions about manpages.debian.org can take place on the regular [[Teams/DDP]] channels, for example the `#debian-doc` IRC channel and `debian-doc@lists.debian.org` mailing list. Discussions about manpages.debian.org take place on the regular [[Teams/DDP]] channels, for example the `#debian-doc` IRC channel and `debian-doc@lists.debian.org` mailing list. If you have any questions about the service, please use these regular channels.
Line 85: Line 23:
You can also subscribe to this wiki page to get updates, which also functions as a ad-hoc forum. You can also subscribe to this wiki page to get updates, which also functions as an ad-hoc forum.

https://manpages.debian.org/ is a service providing online manual pages for all Debian releases in HTML format for the public.

Current status

Deployment

The current manpages.debian.org service is powered by debiman, running on manziarly.debian.org.

Details on how to make changes to the site are in the FAQ.

Issues

Issues can be reported directly in https://github.com/Debian/debiman/issues. There is also a virtual package manpages.debian.org in the Debian bug tracking system which can be used to report bugs: http://bugs.debian.org/manpages.debian.org

Forum

Discussions about manpages.debian.org take place on the regular Teams/DDP channels, for example the #debian-doc IRC channel and debian-doc@lists.debian.org mailing list. If you have any questions about the service, please use these regular channels.

You can also subscribe to this wiki page to get updates, which also functions as an ad-hoc forum.