Differences between revisions 21 and 23 (spanning 2 versions)
Revision 21 as of 2016-11-12 06:28:18
Size: 4431
Editor: TheAnarcat
Comment:
Revision 23 as of 2016-11-14 13:56:25
Size: 1549
Editor: TheAnarcat
Comment: move comparison to the DESIGN document of debmans
Line 9: Line 9:
= Possible implementations = == New implementation ==
Line 11: Line 11:
There are three known implementations of "man to web" archive generators. The current codebase is being rewritten from scratch into a package called [[https://anonscm.debian.org/git/collab-maint/debmans.git/|Debmans]]; see the [[https://anonscm.debian.org/git/collab-maint/debmans.git/tree/README.md|README]] for more information and the [[https://anonscm.debian.org/git/collab-maint/debmans.git/tree/DESIGN.md|DESIGN]] file for a discussion of the implementation.
Line 13: Line 13:
== Current codebase ==

The [[https://anonscm.debian.org/viewvc/ddp/man-cgi/|current codebase]] is a set of Perl and bash CGI scripts that dynamically generate (and search through) manpages.

The current codebase could be migrated to `manziarly`, provided we have access to the `manpages` group.

The current codebase extracts manpages with the `dpkg --fsys-tarfile` and `tar` commands. It also creates indexes using `man -k` for future searches. Manpages are stored in a directory for each package-version, so manpages that have disappeared are never garbage-collected. It also appears that packages are always extracted, even if they have been parsed before.

The CGI script just calls `man` and outputs plain text wrapped in `<PRE>` tags.

There is also a copy of the Ubuntu scripts in the source code.

== Ubuntu ==

Ubuntu has its own manpage repository at https://manpages.ubuntu.com/. Its [[https://code.launchpad.net/ubuntu-manpage-repository|codebase]] is a mix of Python, Perl and Bash.

It looks like there's a [[http://bazaar.launchpad.net/~kirkland/ubuntu-manpage-repository/main/view/head:/bin/make-manpage-repo.sh|bash]] '''and''' [[http://bazaar.launchpad.net/~kirkland/ubuntu-manpage-repository/main/view/head:/bin/make-manpage-repo.py|python]] implementation of the same thing. They process the whole archive (which is assumed to be local) and create a timestamp file for every package found, which avoids processing packages repeatedly (but all packages from the `Packages` listing are `stat`'d at every run). Both versions extract the manpages with `dpkg -x`, although the Python version uses the `apt` python package to list files and a simple regex (`^usr/share/man/.*\.gz$`) to find manpages.
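
As an illustration of the regex filter mentioned above, here is a minimal sketch. Only the pattern itself comes from the text; the function name `find_manpages` is hypothetical, and the paths are assumed to be relative to the package root with no leading `./`.

```python
import re

# Pattern quoted in the text above. It is applied to paths relative to
# the package root (an assumption of this sketch).
MANPAGE_RE = re.compile(r"^usr/share/man/.*\.gz$")


def find_manpages(paths):
    """Return the paths that look like compressed manpages."""
    return [path for path in paths if MANPAGE_RE.match(path)]
```

Note that the pattern only accepts gzip-compressed files under `usr/share/man/`, so an uncompressed page or a path with a leading `./` is skipped.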

It keeps a cache of the md5sum of the package in `$PUBLIC_HTML_DIR/manpages/$dist/.cache/$name` to avoid looking at known packages. The bash version only compares the timestamp of the file with that of the package, and only checks the modification '''year'''.
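
The checksum cache described above could be sketched roughly as follows. This is not the Ubuntu code: the function names `file_md5` and `needs_processing` are hypothetical, and the cache is simplified to one small text file per package name holding the last seen md5sum.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_processing(package_path, cache_dir, name):
    """Return True if the package is new or changed, and record its md5sum.

    The cache layout (one file per package name) is an assumption of
    this sketch.
    """
    cache_file = Path(cache_dir) / name
    current = file_md5(package_path)
    if cache_file.exists() and cache_file.read_text().strip() == current:
        return False
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(current + "\n")
    return True
```

Comparing content checksums rather than timestamps means a package is reprocessed only when its bytes actually change, at the cost of reading the whole file on every run.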

To generate the HTML version of the manpage, both programs use the `/usr/lib/w3m/cgi-bin/w3mman2html.cgi` script shipped with the DebianPackage:w3m package.

Search is handled by a [[http://bazaar.launchpad.net/~kirkland/ubuntu-manpage-repository/main/view/head:/cgi-bin/search.py|custom Python script]] that looks through manpage filenames or uses Google to do a full text search.

== dgilman codebase ==

A new codebase written by dgilman is available on [[https://github.com/dgilman/manpages|GitHub]]. It is a simple Python script with an SQLite backend. It extracts the tarfile with `dpkg --fsys-tarfile` then parses it with the Python `tarfile` library. It uses rather complicated regexes to find manpages and stores various apropos data and metadata about manpages in the SQLite database. All manpages are unconditionally extracted.
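
The general approach of reading the output of `dpkg --fsys-tarfile` with the `tarfile` library could look something like the sketch below. It is not dgilman's code: the names `open_fsys_tarfile` and `iter_manpages` are hypothetical, and a plain path prefix stands in for the more complicated regexes mentioned above.

```python
import subprocess
import tarfile


def open_fsys_tarfile(deb_path):
    """Start `dpkg --fsys-tarfile` and return the process.

    The filesystem tar archive of the package is available on
    the process's stdout.
    """
    return subprocess.Popen(
        ["dpkg", "--fsys-tarfile", deb_path], stdout=subprocess.PIPE
    )


def iter_manpages(tar_stream, prefix="usr/share/man/"):
    """Yield (name, content) for regular files under `prefix`.

    `tar_stream` is any binary file object; it is read in streaming
    mode, so it does not need to be seekable (a pipe works).
    """
    with tarfile.open(fileobj=tar_stream, mode="r|*") as archive:
        for member in archive:
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            if member.isfile() and name.startswith(prefix):
                yield name, archive.extractfile(member).read()
```

With a real package, one would pass `open_fsys_tarfile(path).stdout` to `iter_manpages` and wait for the process afterwards; streaming mode avoids writing the intermediate tar file to disk.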

== anarcat's design ==

After careful analysis of the above options, TheAnarcat started working on his own design, detailed here: https://anonscm.debian.org/git/collab-maint/debmans.git/tree/README.md

At this point, the code can extract all files from a mirror efficiently, although it has not been tested on a full mirror yet, because TheAnarcat is waiting for `manpages` group access and for the dependencies to be installed (`setuptools` from backports and/or `python-click`, `python-apt` and `python-debian`).

HTML conversion is not implemented yet, nor is searching, although it is expected that existing tools will make those steps more about integration than programming.
At this point, the code can extract all files from a mirror efficiently and convert them to HTML. It has not been tested on a full mirror yet, because TheAnarcat is waiting for `manpages` group access and for the dependencies to be installed (`setuptools` from backports and/or `python-click`, `python-apt` and `python-debian`). See the [[https://anonscm.debian.org/git/collab-maint/debmans.git/tree/TODO.md|TODO]] file for more information about the current status of the project and its limitations.
Line 52: Line 18:

Note that to configure a vhost on DSA machines, you need to follow the [[https://dsa.debian.org/doc/subdomains/|DSA subdomains documentation]].

https://manpages.debian.org/ is a service providing online manpages in HTML format for the public.

Current status

Debian manpages archive

New implementation

The current codebase is being rewritten from scratch into a package called Debmans; see the README for more information and the DESIGN file for a discussion of the implementation.

At this point, the code can extract all files from a mirror efficiently and convert them to HTML. It has not been tested on a full mirror yet, because TheAnarcat is waiting for manpages group access and for the dependencies to be installed (setuptools from backports and/or python-click, python-apt and python-debian). See the TODO file for more information about the current status of the project and its limitations.

Hardware

The old service used to run on glinka.debian.org. Teams/DSA requested that the service be moved to manziarly.debian.org.

Forum

Discussions about manpages.debian.org can take place on the regular Teams/DDP channels, for example the #debian-doc IRC channel and the debian-doc@lists.debian.org mailing list.

You can also subscribe to this wiki page to get updates; the page also functions as an ad-hoc forum.