Website transition to Git

We have migrated to Git!

The repo is here:

https://salsa.debian.org/webmaster-team/webwml/

The bug report tracking the migration is

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=845297

Information about the migration is kept in the following subsections for archival purposes.

Moved the TODO up here, so we can coordinate the last bits.

DONE

TODO

Website transition to Git

Debian's website structure is currently still kept in CVS.

We have discussed migrating to another VCS several times.

Links to several discussions:

https://lists.debian.org/debian-www/2016/11/msg00051.html

Bug report: #845297: [www.debian.org] Website transition from CVS to Git

Git repo: https://salsa.debian.org/webmaster-team/webwml/test_webwml_cvs2git (see the bug report for details about how the migration has been done)

See also WebsiteVCSEvaluation, WebsiteSVNTransition and CvsusingGit.

Ideas, pros/cons, discarded approaches

Migrate to Subversion (discarded)

Migrating to Subversion would not be any easier than migrating to Git.

The Publicity team (which involves translators in a similar fashion to the website) migrated from Subversion to Git, and the change was generally well received.

So it's probably better to migrate to Git and not consider SVN.

Migrate the whole webwml repository to Git

CVS version numbers

path/to/file1: 1.1 -> 1.2
path/to/file2: oldversion -> newversion

Transition checklist (needs review and completion)

cvs-revisions 
TRANSLATING.pages 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
changes 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
new_translation.pl 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
english/.wmkrc 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
english/.wmlrc 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
english/Makefile 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
english/contact.wml 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
[..]
english/intro/organization.data 1.572 8db42f8d2205efba68d2f647c2c72ec672eb608d
english/intro/organization.data 1.573 c2d0de0505a28f634bb54d9d31487120e25451eb
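
The listing above can be loaded into a lookup table that answers "which Git commit corresponds to CVS revision X of file Y". The following is only a minimal sketch: it assumes every entry consists of a path, a CVS revision and a Git commit hash separated by whitespace, and that paths contain no spaces. The function name parse_cvs_revisions is hypothetical and is not part of the webwml scripts.

```python
def parse_cvs_revisions(text):
    """Map (path, cvs_revision) -> git commit hash.

    Assumes whitespace-separated "path revision commit" lines;
    anything else (blank lines, elisions such as "[..]") is skipped.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # not a "path revision commit" entry
        path, revision, commit = fields
        table[(path, revision)] = commit
    return table
```

A translation-check header that still carries a CVS revision could then be resolved with a lookup such as table[("english/intro/organization.data", "1.573")].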

Note that there can be multiple levels of translation-check headers chaining through different files, for example:

slovak/international/Slovak/index.wml

-> english/international/Slovak/index.wml
#use wml::debian::translation-check translation="1.14" original="slovak"

-> french/international/Slovak/index.wml
#use wml::debian::translation-check translation="1.14" maintainer="David Prévot"
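
To follow such a chain automatically, a script first has to read the attributes of each translation-check header. The following is a rough sketch that assumes the header always has the form shown in the two examples above (a "#use wml::debian::translation-check" line followed by name="value" pairs). The function name parse_translation_check is hypothetical.

```python
import re

# Assumed header syntax, based only on the examples shown above.
HEADER_RE = re.compile(r'^#use\s+wml::debian::translation-check\s+(.*)$', re.MULTILINE)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_translation_check(wml_source):
    """Return the header attributes as a dict, or None if there is no header."""
    match = HEADER_RE.search(wml_source)
    if match is None:
        return None
    return dict(ATTR_RE.findall(match.group(1)))
```

For the Slovak example, the returned dictionary would contain the keys "translation" and "original", which is enough to decide which file and revision to inspect next in the chain.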

$ git grep Local::Cvs | grep -v Perl/Local
french/international/french/desc.wml:use Local::Cvsinfo;
french/international/french/desc.wml:my $cvs = Local::Cvsinfo->new();
smart_change.pl:use Local::Cvsinfo;
smart_change.pl:        my $cvs = Local::Cvsinfo->new();
stattrans.pl:use Local::Cvsinfo;
stattrans.pl:my $cvs = Local::Cvsinfo->new();
stattrans.pl:my $altcvs = Local::Cvsinfo->new();
touch_translations.pl:use Local::Cvsinfo;
touch_translations.pl:my $cvs = Local::Cvsinfo->new();

Other ideas/approaches

Use a "database" with the version numbers

See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=845297#25 for details

Try to migrate to gettext (.po) files as much content as possible

WML allows the use of gettext in the code, and it is already used for some parts. We already have a system to generate .pot and .po files, so we could consider "gettext-ifying" whole pages and then get rid of the "translation-check" tags that track revision numbers.

Translators are familiar with .po files and can use many different tools to work with them.

We could take advantage of translation memories and reuse the many strings that are identical or nearly identical across files.

We could consider using a Weblate instance to handle the translations. Translations would then be committed to Git directly, and we would not need partial checkouts: people unable to work with the whole repo could use the Weblate web interface to work on the files, and Weblate would commit their changes automatically with the translator's author/committer info.

Two approaches:

po4a-gettextize -f wml -m index.wml -p index.pot

will read the WML and create a POT file (no need to use gettext tags in the file).

Once the PO file has been created (by translating the POT file), we can use:

po4a-translate -f wml -m index.wml -p index.es.po -l ../spanish/index.wml

to generate the localized wml file.

We could automate the POT updates when needed, and compare the "POT-Creation-Date" in the English POT file and in each language's PO file to know whether a translation is outdated. There would be no need for version identifiers tied to commits (CVS or Git).
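
The date comparison could look roughly like the sketch below. It assumes that both files carry a standard gettext header line of the form "POT-Creation-Date: YYYY-MM-DD HH:MM+ZZZZ\n", and that the date in a PO file reflects the POT it was last updated against. The function names pot_creation_date and is_outdated are hypothetical.

```python
import re
from datetime import datetime

# Matches the header line as it appears literally in a PO/POT file,
# including the trailing backslash-n inside the quoted string.
DATE_RE = re.compile(
    r'"POT-Creation-Date: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}[+-]\d{4})\\n"'
)


def pot_creation_date(po_text):
    """Extract POT-Creation-Date as a timezone-aware datetime."""
    match = DATE_RE.search(po_text)
    if match is None:
        raise ValueError("no POT-Creation-Date header found")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M%z")


def is_outdated(pot_text, po_text):
    """True if the PO file was last updated against an older POT."""
    return pot_creation_date(po_text) < pot_creation_date(pot_text)
```

Because the parsed dates are timezone-aware, files generated on machines in different timezones still compare correctly.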

Some inconveniences and their possible solution:

About weblate

Weblate is a translation framework that integrates very well with Git (each translated string becomes a commit in the Git repo). It assumes that we use Git for handling our code, and it handles many different types of translation files but, unfortunately, not WML files.

Weblate could therefore be used for translations only if we decide to handle all the translations with .po files. Weblate provides a nice web interface showing the status of each translation, and it integrates with Git, so we wouldn't need all the scripts that create the translation coordination pages.

Weblate also allows people who want to work offline, but do not want to check out the whole repository, to download the .po files.

Using Weblate does not interfere with working in the Git repo for people who prefer to handle .po files in the same fashion as code, because we can configure Weblate to push each commit automatically and to pull frequently. There can be merge conflicts, though, as in any case where several people work with Git at the same time.

You can have a look at https://weblate.org/en/ and https://hosted.weblate.org/

Weblate is developed by Michal Čihař, who is a Debian Developer. Weblate is not packaged in Debian (RFP #745661) but can be installed on a Debian system. A test repo for the website Git transition can be found at https://weblate.larjona.net (Weblate 2.7 on Debian jessie). Contact LauraArjona if you want an account there.

translation-check headers

There are two options for the identifiers to use in translation-check headers: git commit hashes or git file hashes.

If git commit hashes are chosen, then smart_change.pl will need to make a commit before it can write the translation-check headers for the translations (and there can be multiple levels of translation, for example French -> English -> German). Also, rebasing will break any translation-check headers that refer to the rebased commits.

The advantage of git file hashes is that they can be calculated before a commit and will not change when commits are rebased.
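
To illustrate why no commit is needed: a git file (blob) hash depends only on the file content. The sketch below reproduces what "git hash-object" computes, assuming a repository that uses the default SHA-1 object format and no content filters (such as line-ending conversion). The function name git_blob_hash is hypothetical.

```python
import hashlib


def git_blob_hash(data):
    """Return the git blob hash of the given file content (bytes).

    Git hashes the header "blob <size>" followed by a NUL byte,
    then the raw content.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()
```

For an empty file this yields e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same value git reports for the empty blob. The hash is identical no matter which commit, branch or rebase the file ends up in.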

To translate from a git commit hash and filename (from the git-cvsimport revisions file) to a git file hash:

git ls-tree -zr 7c95dc979cd7184ec4f20b0dd37e73e001a22d4f TRANSLATING.pages | cut -zf1 | cut -z -d' ' -f3
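
The same lookup could be scripted, for example when converting many headers at once. This is a sketch only: it assumes the NUL-terminated "mode type hash TAB path" records that "git ls-tree -z" prints, and the function names are hypothetical.

```python
import subprocess


def blob_hash_from_ls_tree(output, path):
    """Find the blob hash for `path` in raw `git ls-tree -zr` output (bytes)."""
    for entry in output.split(b"\0"):
        if not entry:
            continue  # trailing empty piece after the final NUL
        meta, _, name = entry.partition(b"\t")
        mode, obj_type, obj_hash = meta.split(b" ")
        if name == path and obj_type == b"blob":
            return obj_hash.decode("ascii")
    return None


def file_hash_at_commit(commit, path):
    """Run git in the current directory; assumes it is inside the repo."""
    result = subprocess.run(
        ["git", "ls-tree", "-zr", commit, "--", path],
        capture_output=True,
        check=True,
    )
    return blob_hash_from_ls_tree(result.stdout, path.encode("utf-8"))
```

Splitting on NUL rather than on newlines mirrors the -z option in the shell pipeline, so unusual characters in paths do not break the parsing.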

However, using git file hashes does have a couple of major issues: