Website transition to Git
The Debian website's source is still kept in CVS, and migrating to another VCS has been discussed several times.
Links to several discussions:
Git repo: https://anonscm.debian.org/cgit/webwml/webwml2git.git (see the bug report for details about how the repo is organized)
Ideas, pros/cons, discarded approaches
Migrate to Subversion (discarded)
Migrating to Subversion instead of Git would not make the migration any easier.
The Publicity team (which also involves translators, in a way similar to the website) migrated from Subversion to Git, and the change was generally well received.
So it is probably better to migrate to Git and not consider Subversion.
Migrate the whole webwml repository to Git
Note: work in this approach will be hosted in https://anonscm.debian.org/cgit/webwml/webwml2git.git/?h=githashes (webwml2git repo, githashes branch).
- The main barrier is how we handle translations: some scripts obtain the revision number from CVS (which provides per-file version numbers), and each translated file has a "translation-check" tag, filled in manually, that points to the revision of the English file it translates. Another script then compares revision numbers to mark translations as outdated, creates the translation coordination pages showing translators the diffs they need, etc.
- This could be done with Git keyword expansion, but that is not a recommended way to work with Git.
- It could also be done by adding a comment with a version number inside each English file and incrementing it manually on every edit. This requires human action, is error-prone, and is a barrier to bulk changes.
- It could also be done using commit IDs (hashes) instead of "version numbers"; git cvsimport can create a file mapping the original CVS revisions to commit IDs. This is the preferred approach for now.
- CVS allows partial checkouts. In Git this could be done with submodules, but that introduces additional complications to the workflow, and some people have already reported that partial checkouts are important for them.
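With commit IDs in the translation-check tag, the "is this translation outdated?" test becomes a lookup in the English file's Git history. The following is a minimal Python sketch, not the existing webwml scripts: it assumes the tag keeps a `translation="..."` attribute on a `#use wml::debian::translation-check` line, and that the commits touching the English file are listed newest first, as `git log --format=%H -- <file>` prints them. All function names are hypothetical.

```python
import re
import subprocess
from typing import List, Optional

# Assumed tag shape (an assumption of this sketch):
#   #use wml::debian::translation-check translation="1.42" maintainer="Jane Doe"
TRANSLATION_CHECK = re.compile(
    r'^#use\s+wml::debian::translation-check\b.*?\btranslation="([^"]+)"',
    re.MULTILINE,
)


def recorded_revision(wml_text: str) -> Optional[str]:
    """Return the English revision (CVS number or commit ID) a translation claims to track."""
    match = TRANSLATION_CHECK.search(wml_text)
    return match.group(1) if match else None


def english_history(repo: str, english_file: str) -> List[str]:
    """Commit IDs that touched english_file, newest first (runs git log)."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%H", "--", english_file],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def translation_status(recorded: Optional[str], history: List[str]) -> str:
    """Classify a translation against the English file's history (newest first).

    A recorded ID may be abbreviated: it matches any full ID it is a prefix of.
    """
    if not recorded:
        return "untracked"
    for behind, commit in enumerate(history):
        if commit.startswith(recorded):
            return "up-to-date" if behind == 0 else f"outdated by {behind} commit(s)"
    return "unknown revision"
```

For example, `translation_status(recorded_revision(text), english_history(".", "english/index.wml"))` classifies one translated page. The number of commits in between plays the role that the gap between two CVS revision numbers used to play.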
Transition checklist (needs review and completion)
- Make the current CVS repository read-only (chmod -R g-w /cvs/webwml on gluck? or cvs admin -l *)
- Convert the cvs repository to git
- Import the converted repository onto Alioth
- Checkout the entire repository
- Create and run a script to fix the revision numbers in the translation headers of the WML files, using the mapping file .git/cvs-revisions (created by git cvsimport when run with -R).
- Update the scripts that handle the file versions (probably creating webwml/Perl/Local/Gitinfo.pm and webwml/Perl/Local/VCS_git.pm is enough?)
- Commit the updated scripts and changes to Makefiles to the new repository
- Migrate the commit hooks in the Alioth repository.
- Check that everything works.
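The header-fixing script from the checklist could look roughly like the following Python sketch. It assumes that .git/cvs-revisions was produced by `git cvsimport -R` (one `path revision commit-id` line per imported file revision), that the paths in it are relative to the webwml root, and that a translated file such as spanish/doc/books.wml tracks english/doc/books.wml. `english_counterpart`, `fix_tree` and the other names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Tuple

# (path relative to the repository root, CVS revision) -> Git commit ID
RevisionMap = Dict[Tuple[str, str], str]

# Assumed header shape: #use wml::debian::translation-check translation="1.42" ...
HEADER = re.compile(
    r'^(#use\s+wml::debian::translation-check\b.*?\btranslation=")([^"]+)(")',
    re.MULTILINE,
)


def load_cvs_revisions(text: str) -> RevisionMap:
    """Parse lines of the form 'path revision commit-id'."""
    mapping: RevisionMap = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # split from the right so paths may contain spaces
        if len(parts) == 3:
            path, revision, commit = parts
            mapping[(path, revision)] = commit
    return mapping


def english_counterpart(translated: str) -> str:
    """Hypothetical helper: 'spanish/doc/books.wml' -> 'english/doc/books.wml'."""
    _, rest = translated.split("/", 1)
    return "english/" + rest


def rewrite_header(wml_text: str, english_path: str, mapping: RevisionMap) -> str:
    """Swap the CVS revision in the header for its commit ID; leave unknown ones alone."""

    def substitute(match) -> str:
        commit = mapping.get((english_path, match.group(2)))
        if commit is None:
            return match.group(0)  # not in the mapping: keep it for manual review
        return match.group(1) + commit + match.group(3)

    return HEADER.sub(substitute, wml_text, count=1)


def fix_tree(root: Path, mapping: RevisionMap) -> int:
    """Rewrite headers in every non-English .wml file under root; return files changed."""
    changed = 0
    for wml in sorted(root.glob("*/**/*.wml")):
        relative = wml.relative_to(root).as_posix()
        if relative.startswith("english/"):
            continue
        # latin-1 maps bytes 1:1, so files in any legacy encoding round-trip unchanged
        text = wml.read_bytes().decode("latin-1")
        fixed = rewrite_header(text, english_counterpart(relative), mapping)
        if fixed != text:
            wml.write_bytes(fixed.encode("latin-1"))
            changed += 1
    return changed
```

Run from the root of the new checkout, e.g. `fix_tree(Path("."), load_cvs_revisions(Path(".git/cvs-revisions").read_text()))`. Revisions missing from the mapping are deliberately left as they are, so they can be reviewed by hand instead of being silently guessed.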
Create a repo with the non-translatable files
Note: work in this approach will be hosted in https://anonscm.debian.org/cgit/webwml/webwml2git.git/?h=untranslatable (webwml2git repo, untranslatable branch).
This could be done while we look for a solution for migrating the whole repository.
This would allow some people to maintain certain files (for example mirrors.masterlist, organization.data, books.data, and others) in Git, and it would also allow handling the .po files with a Weblate instance (for example, as a debian.net service; see below for information about Weblate).
The drawback is that the website data would be split across two repositories, with two different workflows for updating it.
Transition checklist (needs review and completion)
- Check out a copy of the CVS repository, and remove:
  - the CVS-related files
  - all the non-English folders
- Initialize a Git repository with this folder structure (let's call it "git_untranslatable"). There is no need to import the history.
- In the CVS repository, in the english folder, remove all the files that will be kept in git_untranslatable, or make them read-only (e.g. cvs admin -l filename). Their history stays in CVS.
- Translators and English editors can continue working in the CVS repository as usual.
- People needing to change the untranslatable files can work in the git_untranslatable git repo
- Create a build script (similar to the ones in https://anonscm.debian.org/cgit/debwww/cron.git/tree/parts/ ) that pulls the files from git_untranslatable at build time, before the website is built.
- When the migration of the whole website to Git is done, the CVS history will be imported with git cvsimport, and we will only need to add the Git history of the git_untranslatable repository to the combined Git repository.
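The build-time step could be a small script run before the usual build. The following is a minimal Python sketch, not one of the existing cron parts: it assumes git_untranslatable is already cloned on the build host and mirrors the webwml layout (english/...), so its files can simply be copied over the CVS working tree. The paths and function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def update_checkout(checkout: Path) -> None:
    """Fast-forward the existing git_untranslatable checkout."""
    subprocess.run(["git", "-C", str(checkout), "pull", "--ff-only"], check=True)


def overlay(checkout: Path, webwml: Path) -> List[str]:
    """Copy every file of the Git checkout (except .git/) over the CVS working tree.

    Both trees are assumed to share the same layout, so english/<path> in the
    Git checkout lands at english/<path> in the webwml tree.
    """
    copied = []
    for src in sorted(checkout.rglob("*")):
        rel = src.relative_to(checkout)
        if ".git" in rel.parts or not src.is_file():
            continue
        dst = webwml / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # overwrites any stale copy left in the CVS tree
        copied.append(rel.as_posix())
    return copied


def prepare_build(checkout: Path, webwml: Path) -> None:
    update_checkout(checkout)
    for rel in overlay(checkout, webwml):
        print("updated", rel)
```

Copying keeps the WML build unaware of where each file came from. The trade-off is that a file accidentally kept in both repositories is silently overwritten by the Git version.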
Use Gettext (.po files) in the whole website
Note: work in this approach will be hosted in https://anonscm.debian.org/cgit/webwml/webwml2git.git/?h=po4a (webwml2git repo, po4a branch).
WML allows using Gettext in the code, and it is already used for some parts. We already have a system to generate .pot and .po files, so we could consider "Gettext-ifying" whole pages and then getting rid of the "translation-check" tags that track revision numbers.
Translators are familiar with .po files and can use many different tools to work with them.
We could take advantage of translation memories and reuse the many strings that are identical or nearly identical across files.
We could consider using a Weblate instance to handle the translations. Translations would then be committed to Git directly, and we would not need partial checkouts: people unable to work with the whole repository could use the Weblate web interface instead, and Weblate would commit their work automatically, with the translator's author/committer information.
- (1) Rewrite the pages to use more templates and <gettext> tags, so that each language's wml files would be just a code skeleton and all the text would be processed via POT and PO files. See, for example, https://anonscm.debian.org/viewvc/webwml/webwml/english/doc/books.wml?view=markup : most of the content is gettext-ified via the books.def and books.data files; gettext tags would still need to be added to the title and the last paragraph.
- Or (2) use po4a to generate POT files directly from the wml files. There are two tools that we can use:
po4a-gettextize -f wml -m index.wml -p index.pot
reads the WML file and creates a POT file (no gettext tags are needed in the file).
Once the PO file has been created (by translating the POT file), we can use:
po4a-translate -f wml -m index.wml -p index.es.po -l ../spanish/index.wml
to generate the localized wml file.
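Running these two commands for many pages and languages is easy to script. The Python sketch below only builds the command lines shown above and optionally runs them. It assumes the commands are run from the english directory, as in the example, and the language-to-directory table is a hypothetical subset.

```python
import shlex
import subprocess
from typing import Dict, List, Optional

# Hypothetical subset of the language-code -> webwml directory mapping.
LANGUAGE_DIRS: Dict[str, str] = {"es": "spanish", "fr": "french", "de": "german"}


def _stem(page: str) -> str:
    if not page.endswith(".wml"):
        raise ValueError(f"not a WML page: {page}")
    return page[: -len(".wml")]


def gettextize_cmd(page: str) -> List[str]:
    """po4a-gettextize call producing <page>.pot next to the English page."""
    return ["po4a-gettextize", "-f", "wml", "-m", page, "-p", _stem(page) + ".pot"]


def translate_cmd(page: str, lang: str, keep: Optional[int] = None) -> List[str]:
    """po4a-translate call writing ../<language dir>/<page> from <page>.<lang>.po."""
    cmd = [
        "po4a-translate", "-f", "wml", "-m", page,
        "-p", f"{_stem(page)}.{lang}.po",
        "-l", f"../{LANGUAGE_DIRS[lang]}/{page}",
    ]
    if keep is not None:
        cmd += ["--keep", str(keep)]  # only write the page above this % translated
    return cmd


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it too unless dry_run is set."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `run([translate_cmd(p, lang, keep=100) for p in pages for lang in LANGUAGE_DIRS], dry_run=False)` regenerates every localized page for a list of pages.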
We could automate the POT updates when needed, and compare the "POT-Creation-Date" header of the English POT file with that of each language's PO file to know whether a translation is outdated. No version numbers tied to commits (CVS or Git) would be needed.
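The outdated check then reduces to comparing two header fields. The following is a Python sketch with hypothetical function names. It assumes the usual gettext date format (`YYYY-MM-DD HH:MM+ZZZZ`) and that each PO header still carries the POT-Creation-Date of the POT it was last translated from.

```python
import re
from datetime import datetime
from typing import Optional

# Header line inside the PO/POT header entry, e.g. "POT-Creation-Date: 2016-05-01 10:30+0200\n"
POT_DATE = re.compile(r'^"POT-Creation-Date:\s*([^"\\]+?)\s*(?:\\n)?"\s*$', re.MULTILINE)


def pot_creation_date(po_text: str) -> Optional[datetime]:
    """Return the POT-Creation-Date as a timezone-aware datetime, or None."""
    match = POT_DATE.search(po_text)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M%z")
    except ValueError:
        return None  # e.g. an unfilled "YEAR-MO-DA HO:MI+ZONE" placeholder


def is_outdated(english_pot: str, language_po: str) -> bool:
    """True when the PO was made from an older POT than the current English one."""
    pot_date = pot_creation_date(english_pot)
    po_date = pot_creation_date(language_po)
    if pot_date is None or po_date is None:
        return True  # no usable header: flag the page for a human to check
    return po_date < pot_date  # aware datetimes compare correctly across offsets
```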
Some drawbacks and their possible solutions:
- po4a currently cannot make page titles translatable. We could file a bug against the package and help the maintainers fix it.
- When .po files are used and a new string is added but not translated, or a string changes and thus becomes "fuzzy", the translated page shows the original English string. This produces pages with mixed content (English and the target language), which is undesirable and can cause problems when the target language and English do not mix well. This can be solved by using "--keep 100", so that the wml file is only written if the PO file is 100% translated. In the end this is a trade-off between having an up-to-date page and having a fully translated one, with the caveat that out-of-date information can be very misleading, while untranslated content can always be fed to an automatic web translation tool or similar.
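To see how a threshold like "--keep 100" would play out on the existing PO files before enabling it, the completeness figure can be approximated independently. This is a minimal Python sketch, not po4a's own code. It assumes PO files without msgctxt or plural forms, excludes the header entry, and counts fuzzy entries as untranslated; the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Entry = Tuple[str, str, bool]  # (msgid, msgstr, fuzzy); strings kept in raw PO-escaped form

KEYWORD = re.compile(r'^(msgid|msgstr)\s+"(.*)"$')
CONTINUATION = re.compile(r'^"(.*)"$')


def parse_po(text: str) -> List[Entry]:
    """Minimal PO reader; assumes no msgctxt and no plural forms."""
    entries: List[Entry] = []
    msgid: Optional[str] = None
    msgstr = ""
    fuzzy = False
    field: Optional[str] = None  # field that continuation lines extend

    def flush() -> None:
        nonlocal msgid, msgstr, fuzzy, field
        if msgid is not None:
            entries.append((msgid, msgstr, fuzzy))
        msgid, msgstr, fuzzy, field = None, "", False, None

    for line in (raw.strip() for raw in text.splitlines()):
        keyword = KEYWORD.match(line)
        continuation = CONTINUATION.match(line)
        if not line:
            flush()
        elif line.startswith("#"):
            if field == "msgstr":
                flush()  # a comment after a msgstr starts the next entry
            if line.startswith("#,") and "fuzzy" in line:
                fuzzy = True
        elif keyword and keyword.group(1) == "msgid":
            if field == "msgstr":
                flush()
            msgid, field = keyword.group(2), "msgid"
        elif keyword:  # msgstr
            msgstr, field = keyword.group(2), "msgstr"
        elif continuation and field == "msgid":
            msgid += continuation.group(1)
        elif continuation and field == "msgstr":
            msgstr += continuation.group(1)
    flush()
    return entries


def translated_percentage(entries: List[Entry]) -> float:
    """Share of real messages (header excluded) that are translated and not fuzzy."""
    messages = [e for e in entries if e[0] != ""]
    if not messages:
        return 100.0  # nothing to translate
    done = sum(1 for _, msgstr, fuzzy in messages if msgstr and not fuzzy)
    return 100.0 * done / len(messages)


def should_write(po_text: str, keep: float = 100.0) -> bool:
    """Publish the localized page only if the PO file reaches the threshold."""
    return translated_percentage(parse_po(po_text)) >= keep
```

Listing the pages for which `should_write` returns False gives an idea of how many pages would stop being regenerated under "--keep 100".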
Weblate is a translation framework that integrates very well with Git (each translated string becomes a commit in the Git repository). It assumes that we use Git for handling our code, and it handles many different types of translation files, but unfortunately it does not handle wml files.
Weblate could therefore be used for translations only if we decide to handle all the translations with .po files. Weblate provides a nice web interface showing the status of each translation, and it integrates with Git, so we would not need all the scripts that create the translation coordination pages.
Weblate also allows downloading the .po files, for people who want to work offline but do not want to check out the whole repository.
Using Weblate does not interfere with working directly in the Git repository for people who prefer to work with .po files the same way they work with code, because Weblate can be configured to push each commit automatically and to pull frequently. Merge conflicts can still happen, though, as when several people work with Git at the same time.
Weblate is developed by Michal Čihař, who is a Debian Developer. Weblate is not packaged in Debian (RFP #745661), but it can be installed on a Debian system. A test setup for the website Git transition can be found at https://weblate.larjona.net (Weblate 2.7 on Debian jessie). Contact LauraArjona if you want an account there.