Differences between revisions 15 and 16
Revision 15 as of 2018-05-26 23:47:29
Size: 11968
Revision 16 as of 2018-05-27 01:34:29
Size: 11970
Editor: PaulWise
Comment: formatting

Website transition to Git

Debian's website structure is currently still kept in CVS.

We have discussed migrating to another VCS several times.

Links to several discussions:


Bug report: #845297: [www.debian.org] Website transition from CVS to Git

Git repo: https://salsa.debian.org/webmaster-team/webwml/test_webwml_cvs2git (see the bug report for details about how the migration has been done)

See also WebsiteVCSEvaluation, WebsiteSVNTransition and CvsusingGit.

Ideas, pros/cons, discarded approaches

Migrate to Subversion (discarded)

Migrating to Subversion would not be any easier than migrating to Git.

The Publicity team (which involves translators in much the same way as the website) migrated from Subversion to Git, and the move was generally well received.

So it's probably better to migrate to Git and not consider SVN.

Migrate the whole webwml repository to Git

  • The main barrier is how we handle translations. Some scripts read the revision number from CVS (which provides a version number per file), and each translated file carries a "translation-check" tag, filled in manually, that points to the version number of the English file it was translated from. Another script can then compare revision numbers to mark translations as outdated, create the translation coordination pages that show the diffs translators need, etc.
    • This could be done with git keyword expansion, but that is not a recommended way to work in git.
    • This can be done by introducing a comment that includes the version number inside the English file and incrementing it manually whenever the file is edited. This requires human action, is prone to error, and is a barrier to bulk changes.
    • This can be done using commit hashes (commit IDs) instead of "version numbers". We have created a git repo to test this approach: https://salsa.debian.org/webmaster-team/webwml/test_webwml_cvs2git using the script "cvs2git" provided in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=845297#45 . In this repo, the git log messages include the text:

CVS version numbers

path/to/file1: 1.1 -> 1.2
path/to/file2: oldversion -> newversion

  • In CVS we have partial checkouts. In git this could be done with git submodules, but that introduces additional complications to the workflow. Some people have already reported that partial checkouts are important for them.
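The "CVS version numbers" block that the test repo puts in its log messages can be read back mechanically. The following is a minimal sketch, assuming the block looks exactly like the example above (a "CVS version numbers" line followed by "path: old -> new" lines); the function name parse_cvs_versions is hypothetical and not part of any existing webwml script.

```python
import re

# Matches lines such as "path/to/file1: 1.1 -> 1.2" (format assumed from the example).
REVISION_LINE = re.compile(r"^(?P<path>\S.*?):\s+(?P<old>\S+)\s+->\s+(?P<new>\S+)\s*$")


def parse_cvs_versions(log_message):
    """Return {path: (old_revision, new_revision)} found in a git log message."""
    changes = {}
    in_section = False
    for line in log_message.splitlines():
        stripped = line.strip()
        if stripped == "CVS version numbers":
            # Everything after this marker is treated as revision lines.
            in_section = True
            continue
        if not in_section:
            continue
        match = REVISION_LINE.match(stripped)
        if match:
            changes[match.group("path")] = (match.group("old"), match.group("new"))
    return changes
```

A script could feed each commit message from the converted repository through this function to recover which CVS revision a given commit corresponds to for each file.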

Transition checklist (needs review and completion)

  • Make current CVS repository read-only: this will be done by Alioth admins in May 2018
  • Convert the CVS repository to git: tests done. When we decide to actually migrate, we just run the cvs2git script again and upload the resulting repo to Salsa.

  • (DONE: Sledge) Create and run a script to fix the revision numbers in the translation headers of the WML files, using the text in the log messages, or the file .git/cvs-revisions, which is a text file with this format:

TRANSLATING.pages 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
changes 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
new_translation.pl 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
english/.wmkrc 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
english/.wmlrc 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
english/Makefile 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
english/contact.wml 1.1 557f8619f4cd8b7c9b5396856c78d65e0b1aaf65
english/intro/organization.data 1.572 8db42f8d2205efba68d2f647c2c72ec672eb608d
english/intro/organization.data 1.573 c2d0de0505a28f634bb54d9d31487120e25451eb
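As a minimal sketch of how a script could consume this file, the lines above can be loaded into a lookup table keyed by path and CVS revision. It assumes each line has exactly the three fields shown (path, CVS revision, commit hash); the function name parse_cvs_revisions is hypothetical.

```python
def parse_cvs_revisions(lines):
    """Map (path, cvs_revision) -> git commit hash from cvs-revisions lines."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split from the right so that a path containing spaces stays intact.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            raise ValueError("unexpected line format: %r" % raw)
        path, revision, commit = parts
        mapping[(path, revision)] = commit
    return mapping
```

With the sample above, looking up ("english/intro/organization.data", "1.573") would give c2d0de0505a28f634bb54d9d31487120e25451eb. In practice the lines would come from opening .git/cvs-revisions in the converted repository.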

Note that there can be multiple levels of translation-check headers chaining through different files, for example:


-> english/international/Slovak/index.wml
#use wml::debian::translation-check translation="1.14" original="slovak"

-> french/international/Slovak/index.wml
#use wml::debian::translation-check translation="1.14" maintainer="David Prévot"
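The header rewrite for cases like these can be sketched as follows. This is only an illustration, not the script that was actually written: it assumes a lookup table of the form {(path, cvs_revision): commit_hash} built from the cvs-revisions file, and it assumes that the original="..." attribute names the top-level directory of the source language (with "english" as the default). The names source_path and update_header are hypothetical.

```python
import re

TRANSLATION_ATTR = re.compile(r'translation="(?P<rev>[0-9.]+)"')
ORIGINAL_ATTR = re.compile(r'original="(?P<lang>[^"]+)"')


def source_path(translated_path, header_line):
    """Guess which file a translation was made from (assumption: the
    original="..." value is the source language directory, else english)."""
    match = ORIGINAL_ATTR.search(header_line)
    language = match.group("lang") if match else "english"
    _, rest = translated_path.split("/", 1)
    return language + "/" + rest


def update_header(header_line, original_path, revisions):
    """Replace the CVS revision in a translation-check line with a commit hash."""
    if "wml::debian::translation-check" not in header_line:
        return header_line
    match = TRANSLATION_ATTR.search(header_line)
    if not match:
        return header_line
    commit = revisions.get((original_path, match.group("rev")))
    if commit is None:
        # Unknown revision: leave the line untouched for manual review.
        return header_line
    return header_line[:match.start("rev")] + commit + header_line[match.end("rev"):]
```

Resolving the source path per file is what lets the chained case work: the French page is looked up against the English file, while the English page (which has original="slovak") is looked up against the Slovak file.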

$ git grep Local::Cvs | grep -v Perl/Local
french/international/french/desc.wml:use Local::Cvsinfo;
french/international/french/desc.wml:my $cvs = Local::Cvsinfo->new();
smart_change.pl:use Local::Cvsinfo;
smart_change.pl:        my $cvs = Local::Cvsinfo->new();
stattrans.pl:use Local::Cvsinfo;
stattrans.pl:my $cvs = Local::Cvsinfo->new();
stattrans.pl:my $altcvs = Local::Cvsinfo->new();
touch_translations.pl:use Local::Cvsinfo;
touch_translations.pl:my $cvs = Local::Cvsinfo->new();
  • Commit the updated scripts and Makefile changes to the webwml repository, do the final CVS -> git migration, and upload the repo to Salsa.

  • Migrate the commit hooks that are in the Alioth repository (try to get similar functionality, if possible, in Salsa)
  • Check that everything works.
  • Write the corresponding documentation, and update the website content to reflect the new workflow (www.debian.org/devel/website mainly).
  • Send mails to debian-l10n-* and debian-i18n about the migration and the new translation workflow.
  • Sledge's work is currently at https://salsa.debian.org/93sam/test_webwml_cvs2git ; things will be pushed back to cvs shortly.

Other ideas/approaches

Use a "database" with the version numbers

See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=845297#25 for details

Try to migrate to gettext (.po) files as much content as possible

WML allows Gettext to be used in the code, and it is already used for some parts. We already have a system to generate .pot and .po files, so we could think about "Gettext-ifying" whole pages and then get rid of the "translation-check" tags that track revision numbers.

Translators are familiar with .po files and can use many different tools to work with them.

We could take advantage of translation memories and reuse many strings that are the same or more or less the same across many files.

We could consider using a Weblate instance to handle the translations. Translations would then be committed to git directly, and we wouldn't need partial checkouts: people who cannot work with the whole repo could use the Weblate web interface to work on the files, and Weblate would commit them automatically with the translator's author/committer info.

Two approaches:

  • (1) Rewrite the pages to use more templates and <gettext> tags, so that the language wml files would be just a skeleton of code and all the text would be processed via pot and po files. See, for example, https://anonscm.debian.org/viewvc/webwml/webwml/english/doc/books.wml?view=markup : most of the content is gettext-ified via the books.def and books.data files. Gettext tags would still need to be added to the title and the last paragraph.

  • Or (2) use po4a to generate POT files directly from the wml files. There are two tools that we can use:

po4a-gettextize -f wml -m index.wml -p index.pot

will read the WML and create a POT file (no need to use gettext tags in the file).

Once the PO file is created (translating the pot file), we can use:

po4a-translate -f wml -m index.wml -p index.es.po -l ../spanish/index.wml

to generate the localized wml file.

We could automate the POT updates when needed, and compare the "POT-Creation-Date" in the English POT and the language PO files to know whether a translation is outdated. There would be no need for version identifiers tied to commits (CVS or git).
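The date comparison could look roughly like the sketch below. It assumes the usual gettext header line of the form "POT-Creation-Date: 2018-05-26 23:47+0000\n", and it assumes that a language PO file keeps the POT-Creation-Date of the template it was last merged with. The function names are hypothetical.

```python
import re
from datetime import datetime

# Header line as it appears in the file, e.g.
# "POT-Creation-Date: 2018-05-26 23:47+0000\n"
CREATION_DATE = re.compile(r'"POT-Creation-Date:\s*(?P<stamp>[^\\"]+)\\n"')


def pot_creation_date(po_text):
    """Extract POT-Creation-Date from the text of a POT or PO file."""
    match = CREATION_DATE.search(po_text)
    if not match:
        raise ValueError("no POT-Creation-Date header found")
    return datetime.strptime(match.group("stamp").strip(), "%Y-%m-%d %H:%M%z")


def is_outdated(english_pot_text, language_po_text):
    """True if the PO file was based on an older template than the current POT."""
    return pot_creation_date(language_po_text) < pot_creation_date(english_pot_text)
```

Because the timestamps are parsed with their UTC offset, files generated in different time zones still compare correctly.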

Some inconveniences and their possible solution:

  • po4a in Stretch (v0.47) cannot make page titles translatable. This is fixed in version 0.52, which is already available in stretch-backports.
  • When you use .po files and a new string is added but not translated, or a string is changed and thus becomes "fuzzy", the translated page shows the original English string. This produces pages with mixed content (English and target language), which is undesirable and can cause problems (when the target language and English don't mix well together). This can be solved by using "--keep 100" so that the wml file is only written if the PO file is 100% translated. In the end this is a trade-off that depends on whether an up-to-date page matters more or less than a fully translated one, with the caveat that out-of-date information can be very misleading, while untranslated content can always be fed to an automatic web translation tool or similar.

About weblate

Weblate is a translation framework that integrates very well with git (each translated string becomes a commit in the git repo). It assumes that we use git for handling our code, and it handles many different types of translation files, but, unfortunately, it does not handle wml files.

Weblate could be used for translations, though, only if we decide to handle all the translations with .po files. Weblate provides a nice web interface showing the status of each translation, and it integrates with git, so we wouldn't need all the scripts that create the translation coordination pages.

Weblate also allows people who want to work offline, but do not want to check out the whole repository, to download the .po files.

Using Weblate does not interfere with working in the git repo for people who prefer to handle .po files in the same way as they handle code, because we can configure Weblate to push each commit automatically and to pull frequently. There can be merge conflicts, though, just as when several people work with git at the same time.

You can have a look at https://weblate.org/en/ , and https://hosted.weblate.org/

Weblate is developed by Michal Čihař, who is a Debian Developer. Weblate is not packaged in Debian (RFP #745661) but can be installed on a Debian system. A test repo for the website git transition can be found at https://weblate.larjona.net (Weblate 2.7 on Debian jessie). Contact LauraArjona if you want an account there.

translation-check headers

There are two options for identifiers to use in translation-check headers.

  • git commit hashes
  • git file hashes

If git commit hashes are chosen, then smart_change.pl will need to make a commit before it can write the translation-check headers for the translations (and there can be multiple levels of translation French -> English -> German). Also, rebases will break the translation-check headers that refer to the rebased commits.

The advantage of git file hashes is that they can be calculated before a commit and will not change when commits are rebased.
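The reason a file hash is available before any commit exists is that git derives it from the file content alone: the object ID of a blob is the SHA-1 of the header "blob <size>\0" followed by the content. A minimal sketch, assuming the default SHA-1 object format and no content filters (such as line-ending conversion); the function name git_blob_hash is hypothetical.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the git blob (file) hash of some content without calling git."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

Since nothing in the calculation depends on history, the same content yields the same hash in every commit, which is why rebases do not affect it.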

To translate from a git commit hash and filename (from the git-cvsimport revisions file) to a git file hash:

git ls-tree -zr 7c95dc979cd7184ec4f20b0dd37e73e001a22d4f TRANSLATING.pages | cut -zf1 | cut -z -d' ' -f3
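For scripts, the same extraction can be done on the raw output of git ls-tree -z, whose records have the form "<mode> <type> <object>\t<path>" and are separated by NUL bytes. The sketch below only parses such output; the function name blob_hash_from_ls_tree is hypothetical.

```python
def blob_hash_from_ls_tree(output: bytes, path: str):
    """Return the blob hash for `path` from `git ls-tree -z` output, or None."""
    for record in output.split(b"\0"):
        if not record:
            continue
        # Each record is "<mode> <type> <object>\t<path>".
        meta, _, name = record.partition(b"\t")
        mode, object_type, object_id = meta.split(b" ")
        if object_type == b"blob" and name.decode("utf-8") == path:
            return object_id.decode("ascii")
    return None
```

In a real script the bytes would typically come from something like subprocess.run(["git", "ls-tree", "-zr", commit, path], capture_output=True).stdout.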

However, using git file hashes does have a couple of major issues:

  • It's quite alien to most people, even those used to git, so we're likely to see confusion from developers. Most people think in terms of commit hashes, and that's what all the normal git usage models involve.
  • Converting between the file hashes and commit hashes is a really time-consuming operation, meaning potential for performance issues in scripts.