Revision 12 as of 2016-11-10 22:47:08, edited by TheAnarcat.
https://manpages.debian.org/ is a service providing online manpages in HTML format for the public.

Current status

Debian manpages archive

Possible implementations

There are three known implementations of "man to web" archive generators.

Current codebase

The current codebase is a set of Perl and Bash CGI scripts that dynamically generate (and search through) manpages.

The current codebase could be migrated to manziarly, provided we have access to the manpages group.

Ubuntu

Ubuntu has its own manpage repository at https://manpages.ubuntu.com/. Its codebase is a mix of Python, Perl and Bash.

New codebase

A new codebase written by dgilman is available on GitHub. It is a simple Python script with an SQLite backend.
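The page only states that dgilman's script uses SQLite. Purely to illustrate what such a backend could look like, the sketch below stores rendered pages keyed by suite, package, name, section and language. The schema and the function names (`open_store`, `store_page`, `lookup`) are hypothetical and are not taken from that repository.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical schema for illustration only; NOT dgilman's actual schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS manpages (
    suite   TEXT NOT NULL,
    package TEXT NOT NULL,
    name    TEXT NOT NULL,
    section TEXT NOT NULL,
    lang    TEXT NOT NULL,
    html    TEXT NOT NULL,
    PRIMARY KEY (suite, package, name, section, lang)
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_page(conn: sqlite3.Connection, suite: str, package: str, name: str,
               section: str, html: str, lang: str = "C") -> None:
    # INSERT OR REPLACE makes re-importing a package overwrite its old rows.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO manpages "
            "(suite, package, name, section, lang, html) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (suite, package, name, section, lang, html),
        )


def lookup(conn: sqlite3.Connection, suite: str, name: str,
           section: Optional[str] = None,
           lang: str = "C") -> Optional[Tuple[str, str, str]]:
    """Return (package, section, html) for a page, or None if it is missing."""
    query = ("SELECT package, section, html FROM manpages "
             "WHERE suite = ? AND name = ? AND lang = ?")
    params = [suite, name, lang]
    if section is not None:
        query += " AND section = ?"
        params.append(section)
    # Without an explicit section, take the first one in sort order
    # (a simplification of man(1)'s configurable section search order).
    query += " ORDER BY section, package LIMIT 1"
    return conn.execute(query, params).fetchone()
```

A single table keeps the sketch small; a real service would probably also want an index on `(suite, name)` and some record of which package version each page came from.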

MVP

The Minimum Viable Product for this project is a service that creates an HTML version of all the manpages of all the packages available in Debian, for all supported suites (including LTS).
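To make "all the manpages of all the packages, for all supported suites" concrete, the sketch below maps a manpage path inside a binary package to an output HTML path. The `<suite>/<package>/<name>.<section>[.<lang>].html` layout and the helper names are hypothetical, chosen only for illustration; this page does not define a URL scheme.

```python
from typing import NamedTuple, Optional


class ManPage(NamedTuple):
    name: str     # e.g. "ls"
    section: str  # e.g. "1" or "3pm"
    lang: str     # "C" for untranslated pages, else the locale directory


MAN_PREFIX = "usr/share/man/"


def parse_manpage_path(path: str) -> Optional[ManPage]:
    """Parse a path like usr/share/man/fr/man1/ls.1.gz; None if not a manpage."""
    if path.startswith("./"):
        path = path[2:]
    if not path.startswith(MAN_PREFIX):
        return None
    parts = path[len(MAN_PREFIX):].split("/")
    if len(parts) == 2:      # man1/ls.1.gz
        lang, mandir, filename = "C", parts[0], parts[1]
    elif len(parts) == 3:    # fr/man1/ls.1.gz
        lang, mandir, filename = parts
    else:
        return None
    if not mandir.startswith("man"):
        return None
    if filename.endswith(".gz"):
        filename = filename[:-len(".gz")]
    name, dot, section = filename.rpartition(".")
    if not dot or not name or not section:
        return None
    return ManPage(name, section, lang)


def html_output_path(suite: str, package: str, page: ManPage) -> str:
    # Hypothetical layout: <suite>/<package>/<name>.<section>[.<lang>].html
    lang_suffix = "" if page.lang == "C" else "." + page.lang
    return f"{suite}/{package}/{page.name}.{page.section}{lang_suffix}.html"
```

For example, `usr/share/man/fr/man8/cron.8.gz` from the `cron` package in stretch would land at `stretch/cron/cron.8.fr.html`. Symlinked pages and `.so` redirects, which many packages use for manpage aliases, are not handled here and would need extra logic.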

apropos(1) functionality is considered an extra feature that can be implemented later with existing indexing tools such as Xapian (or its web frontend, Omega), Lucene / Solr, Elasticsearch, or a simple homegrown JavaScript-based search (like the one Read the Docs uses).
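As an illustration of the "simple homegrown" end of that spectrum, here is a tiny inverted index over page names and the one-line descriptions from each page's NAME section, which is roughly the data apropos(1) searches. It is a hypothetical Python sketch, not any of the tools listed above, and it simplifies matching: it matches whole words and requires every query word to be present, whereas apropos matches patterns against names and descriptions.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (name, section, one-line description from the NAME section)
Entry = Tuple[str, str, str]


def tokenize(text: str) -> List[str]:
    # Lower-case alphanumeric words; punctuation acts as a separator.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entries: Iterable[Entry]) -> Dict[str, Set[Entry]]:
    """Map every word of a page name or description to the pages containing it."""
    index: Dict[str, Set[Entry]] = defaultdict(set)
    for entry in entries:
        name, _section, description = entry
        for word in tokenize(name) + tokenize(description):
            index[word].add(entry)
    return index


def search(index: Dict[str, Set[Entry]], query: str) -> List[str]:
    """Return whatis-style lines for pages matching every query word."""
    words = tokenize(query)
    if not words:
        return []
    # .get() avoids inserting empty sets into the defaultdict.
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return [f"{name} ({section}) - {desc}" for name, section, desc in sorted(hits)]
```

The same structure could be serialized to JSON and queried in the browser, which is essentially what static-site JavaScript search does; the dedicated engines add ranking, stemming and scale.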

A possible design would be:

  1. fetch all manpages from the archive, store them on disk (makes them usable for tools like dman that browse remote webpages)
  2. convert manpages to HTML so they are readable in a web browser
  3. index HTML pages in a search engine of some sort
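Step 1 mostly amounts to pulling the files under `usr/share/man/` out of each binary package. A `.deb` is an `ar` archive whose `data.tar.*` member holds the installed files, so a minimal sketch can do this with the Python standard library alone. The helper names `ar_members` and `extract_manpages` are hypothetical; a real deployment would more likely call `dpkg-deb` or use a library such as python-debian, and this sketch skips symlinked pages and any compression the `tarfile` module does not understand.

```python
import io
import tarfile
from typing import Dict

AR_MAGIC = b"!<arch>\n"
AR_HEADER_SIZE = 60


def ar_members(blob: bytes) -> Dict[str, bytes]:
    """Parse an ar archive (the container format of .deb files) into {name: data}."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = {}
    offset = len(AR_MAGIC)
    while offset + AR_HEADER_SIZE <= len(blob):
        header = blob[offset:offset + AR_HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # Name is bytes 0-15 (space padded, optionally "/"-terminated);
        # size is bytes 48-57 in decimal.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        start = offset + AR_HEADER_SIZE
        members[name] = blob[start:start + size]
        offset = start + size + (size % 2)  # member data is 2-byte aligned
    return members


def extract_manpages(deb_bytes: bytes) -> Dict[str, bytes]:
    """Return {path: raw file contents} for regular files under usr/share/man/."""
    members = ar_members(deb_bytes)
    data_name = next((n for n in members if n.startswith("data.tar")), None)
    if data_name is None:
        raise ValueError("no data.tar member found")
    pages = {}
    # "r:*" lets tarfile detect gzip, bzip2, xz or uncompressed data.
    with tarfile.open(fileobj=io.BytesIO(members[data_name]), mode="r:*") as tar:
        for member in tar.getmembers():
            path = member.name[2:] if member.name.startswith("./") else member.name
            if member.isfile() and path.startswith("usr/share/man/"):
                pages[path] = tar.extractfile(member).read()
    return pages
```

The returned contents are still the compressed `.gz` files as shipped; storing them unchanged keeps them usable by tools like dman, while step 2 would decompress and render them.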

Parts 1 and 2 would be generated on manziarly and stored on the static.d.o CDN (see below). Part 3 would run on a separate server (or a pair of servers?) hosting the search cluster.

Next steps:

  1. write the MVP, maybe based on David's work
  2. ask (through an rt.debian.org ticket) for access to the manpages group
  3. deploy a first dump of the manpages on manziarly
  4. make a patch to the dsa-puppet manifests or document how to deploy the scripts for the DSA
  5. ask DSA to deploy the new code, then test it
  6. if it works, update the manpages.debian.org DNS entry to point to static.d.o. At this point, the MVP is in place
  7. make search work...

In the above setup, manziarly would be a master server for static file servers in the Debian.org infrastructure. Files saved there would be rsync'd to multiple frontend servers. How this is configured is detailed in the static-mirroring DSA documentation; basically, we would need to ask the DSA team to add an entry for manpages.d.o there to serve static files.

Hardware

The old service used to run on glinka.debian.org. Teams/DSA requested that the service be moved to manziarly.debian.org.

Forum

Discussions about manpages.debian.org can take place on the regular Teams/DDP channels, for example the #debian-doc IRC channel and debian-doc@lists.debian.org mailing list.

You can also subscribe to this wiki page to get updates; it also functions as an ad-hoc forum.