I18N for Package Maintainers

This is a draft - Work in Progress

Most documentation available in Debian explains how to do translations. What has been missing is an explanation for package maintainers (who might have limited interest in translations) of how to support translators and how to keep translations in the package.

This wiki page was distilled from the discussion started with Marc Haber's Rant on Debian-devel in March 2025, but tries to be more neutral about the topic. There is still surprisingly little documentation about this topic, so everyone involved with Debian is kindly invited to add their knowledge and experience here.

As of today (March 2025) this cannot yet be seen as authoritative documentation.

For an introduction to the topic of translations, please refer to the existing docs on the Debian wiki i18n portal.

File types

Mark Things as translatable

When writing software, docs or debconf templates, the respective author marks certain strings as translatable. There are a number of conventions for doing so, which depend on the domain and on the respective programming language.

POT files

A preprocessing step pulls all translatable strings from the respective sources into a "PO Template" file with the extension .pot, called the POT file. A translator starting on a language that the strings have not yet been translated into uses this file as the starting point for their work.

POT files have standardized comments and a header. They are generally named after the package with the extension .pot.

Care needs to be taken to generate the POT file with a sensible encoding, usually UTF-8.

The best point to create a POT file is when a development milestone is reached, or when a freeze or a release is being prepared, and the generated POT file is unlikely to undergo big changes before the next release.

Most tools creating POT files overwrite the entire file without preserving the header information. It is important to study the options of the translation software and to set them properly, e.g. for po4a the options --copyright-holder, --package-name and --package-version should be set appropriately. Other options can also guide translators. If those options are not used or are not sufficient, some automatic postprocessing is recommended so that the POT file can go into the release with proper headers, version numbers, copyrights, timestamps, etc. in place.

PO files

These are the files that a translator hands in to the package maintainer and that contain the actual translation. They look similar to the POT file, but already contain translated content.

PO files have standardized comments and a header. They are generally named after the short name of the language (for example, pt_BR or nl) and the extension .po.

When the source of the translatable strings changes and the POT file is regenerated, the changed and/or new strings need to be merged from the POT file into the PO file. The msgmerge tool can be used to accomplish that. A merge is strongly recommended each time a POT file is regenerated, otherwise translators will work on outdated strings. Please note that the toolchain may contain other tools for the merge, e.g. po4a handles this internally.
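The merge step described above can be sketched in Python as follows. This is only an illustration under assumptions: the helper name msgmerge_commands and the directory layout (all PO files next to one POT file) are hypothetical, and the choice of the msgmerge options --update and --previous is one reasonable setup rather than a project requirement.

```python
from pathlib import Path


def msgmerge_commands(po_dir, pot_name):
    """Return one msgmerge argument list per PO file found in po_dir.

    Hypothetical helper: it only builds the command lines and does not
    run msgmerge itself.
    """
    po_dir = Path(po_dir)
    pot_file = po_dir / pot_name
    commands = []
    for po_file in sorted(po_dir.glob("*.po")):
        # --update rewrites the PO file in place; --previous keeps the old
        # msgid as a comment next to entries that become fuzzy.
        commands.append(
            ["msgmerge", "--update", "--previous", str(po_file), str(pot_file)]
        )
    return commands
```

Each returned list can then be passed to subprocess.run(cmd, check=True), for example for debian/po and templates.pot. Building the commands separately from running them makes it easy to review what would be executed first.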

After merging, the PO file can be sent to the respective translators; for new languages, the POT file should be sent instead. Note, however, that translators often pick up the PO(T) files themselves, either from your VCS or from the service pages.

The merge tools usually leave the header of the PO file intact, pulling only some contents from the header of the POT file into the merged PO file. This puts the header of the PO file into the domain of the respective translators. Sadly, the headers of PO files often do not get the attention they need for the translation to be properly documented.

It generally does not make sense to automatically postprocess PO files as they come from too many sources with different methods of work. You can use tools like i18nspector to search for problems. Two things to look out for are correct (updated) e-mail addresses in the header and the copyright (to reach the translator in the future) and a properly set encoding (should be UTF-8 most of the time).
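As a rough illustration of the two checks mentioned above (encoding and e-mail address), the following Python sketch reads the header entry of a PO file and reports obvious problems. It is not a replacement for i18nspector. The function names are hypothetical, and the parser assumes that every header field sits on a single quoted line, which is common but not guaranteed.

```python
import re


def parse_po_header(po_text):
    """Return the fields of the header entry (the one with an empty msgid)."""
    fields = {}
    in_header = False
    for line in po_text.splitlines():
        line = line.strip()
        if line == 'msgstr ""' and not in_header:
            in_header = True
            continue
        if in_header:
            if not line.startswith('"'):
                break  # the header entry ends at the first non-quoted line
            # Drop the surrounding quotes and the trailing literal "\n".
            content = line[1:-1]
            if content.endswith("\\n"):
                content = content[:-2]
            name, sep, value = content.partition(":")
            if sep:
                fields[name.strip()] = value.strip()
    return fields


def header_problems(fields):
    """List the header problems a maintainer may want to look at."""
    problems = []
    content_type = fields.get("Content-Type", "")
    if "charset=utf-8" not in content_type.lower():
        problems.append("charset is not UTF-8: " + (content_type or "missing"))
    translator = fields.get("Last-Translator", "")
    if not re.search(r"<[^<>@\s]+@[^<>@\s]+>", translator):
        problems.append("Last-Translator has no e-mail address")
    return problems
```

Calling header_problems(parse_po_header(text)) on the contents of a received PO file returns an empty list when both checks pass. Whether a reported e-mail address is still valid cannot be checked this way.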

What kinds of translations can be in a package

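In a Debian package, the following three kinds of translations can be done and need to be supported: translation of strings in the software itself, translation of manual pages, and translation of debconf templates. Each kind is described in its own section below.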

Gotchas

All three kinds of translations use the same file names and are distinguished in the source package only by the paths they are stored in. Path information is lost in communication with the translators, so when receiving translations, a package maintainer has to take care not to mix up the received PO files and to put them in the right place in the source package. Mixing them up can destroy past work and make things harder to understand. Therefore, maintainers should carefully look at the e-mails received for wording like "programme translations" or "debconf" to find the proper type. If in doubt, compare the file with the (three) potential POT files to find the correct location.
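The comparison with the candidate POT files can be automated roughly as follows. This Python sketch is an assumption about how one might do it, not an established tool: the function names are hypothetical, and only single-line msgid strings are compared, which is usually enough to tell the three kinds apart.

```python
import re

MSGID_LINE = re.compile(r'^msgid "(.*)"$')


def msgids(po_text):
    """Collect the single-line msgid strings of a PO or POT file."""
    found = set()
    for line in po_text.splitlines():
        match = MSGID_LINE.match(line.strip())
        if match and match.group(1):  # skips empty and multi-line msgids
            found.add(match.group(1))
    return found


def best_matching_pot(po_text, pot_texts):
    """Return (name, share) for the POT file sharing most msgids with the PO file.

    pot_texts maps a POT file name to its contents and must not be empty.
    """
    po_ids = msgids(po_text)
    scores = {}
    for name, pot_text in pot_texts.items():
        pot_ids = msgids(pot_text)
        scores[name] = len(po_ids & pot_ids) / len(po_ids) if po_ids else 0.0
    return max(scores.items(), key=lambda item: item[1])
```

The second element of the result is the share of the PO file's msgids that also occur in the best matching POT file. A low share for all candidates suggests that the PO file is based on a much older POT file and needs a closer look.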

Translation of strings in the software itself

This usually causes the least work for a package maintainer, as translations of strings in the software itself are usually done and collected by Upstream. Since users of the Upstream software can profit from the software translations as well, it usually makes sense not to maintain such translations in Debian. Should a package maintainer receive a translation for strings that are in the Upstream software, the package maintainer should either forward it Upstream or, preferably, encourage the translator to send it Upstream directly, as this establishes a shorter path of communication. Note, however, that quite a few translators will require some guidance on how best to submit translations upstream, as they are normally not programmers but users with a specific focus.

POT and PO files for program translations usually reside in the po directory at the root of the source package and are created/updated using the xgettext and msgmerge tools by the software authors before they release.

Translation of manual pages

Manual pages are usually translated using the po4a Software Tool. Aside from the addendum mechanism, the explanations about POT and PO files in the "File types" section of this page apply.

Often strings from all manual pages are compiled in a single POT file, so that the translator sees all manual pages as a single translation. Strings might be pulled out of context by the POT generation process.

po4a has a configuration file that normally resides in doc/po4a/po4a.conf.

Generally, the POT file is in doc/po4a/po/<package>.pot, while the PO files reside in doc/po4a/po/<language>.po. There are also "addenda", which are delivered by the translator, are already written in the target language and can give the reader information about the translation. po4a puts them into the translated manual pages at the specified place. They reside in doc/po4a/translator_<language>.add and need to be mentioned in doc/po4a/po4a.conf.

In an ideal world, all upstream software would come with man pages and be ready for translation. Sadly, this doesn't apply to all software. The maintainer of a Debian package is expected to support translation efforts for man pages, to bring translators in contact with Upstream, optionally to help Upstream with building translation infrastructure, and to handle translations for man pages that have been written inside the Debian project itself.

As an alternative, man page translations can also go into manpages-l10n. This is especially true where upstream does not host man page translations and hosting them in Debian only is not desirable (especially if the content is not Debian specific). If this is considered, the maintainer of manpages-l10n should be contacted.

Translation of debconf templates

Debconf templates are translated using the po-debconf Software Tool. This works with POT and PO files as well. They generally reside in debian/po inside the source package.

POT and PO files are updated by calling debconf-updatepo, which is frequently seen in the clean target of a debian/rules file. This approach is however not recommended.

In an ideal world, translations of debconf templates would be the only kind of translations that causes actual work for the Debian maintainer of a package.

Workflow to handle translations in a Debian package

Finding the right time

Translation work for a package maintainer tends to come in batches and should be coordinated carefully to avoid duplicate work for both translators and package maintainers.

During a normal Debian release cycle, maintainers work on their packages with different intensity, but there is usually a peak of activity when the freeze approaches. When a maintainer feels that a package might slowly approach a state that is fit to be in the next stable release, the time has come to work on the translations. It is important to do translation work when the software and the package are in a state where the strings are reasonably stable and finalized.

Translation updates are usually allowed in packages even during a freeze, so this is work that can still be done at this time. However, that is usually the time when translators are exceptionally busy, since all packages are asking for translations shortly before or during the freeze. Also keep in mind that translators usually have some kind of QA process (proofreading), which requires some extra time.

It is recommended to keep the PO(T) files up to date, so translators can work (at their own risk) at any point in time (e.g. when they have resources to do so). However, if translators are asked for updates (e.g. using podebconf-report-po, which can be used for all kinds of translations), the aforementioned stability criterion should be applied.

Create POT file

It is a common idiom to generate the POT file in debian/rules' clean target. Doing so is however not recommended for those reasons:

Recommendations for package maintainers:

Question: How would I take a backup of the pre-creation step in git? Is a tag enough, or should one branch off at this point?

Merge existing translations with new POT file

In a next step, existing PO files are individually merged with the newly created POT file. This is either automatically done by the updating tool, or the msgmerge tool (or po4a) is used to accomplish this. msgmerge doesn't modify the headers of existing PO files.

TODO: How does itstool handle this?

When a string changes only slightly (a rewording, a grammatical change, a change in punctuation, or a typo fix), msgmerge can still match the existing translation to the changed original string. The result is called a "fuzzy" translation. That means that the translated string needs to be rechecked, but this is actually helpful for the translator, as "fuzzy" translations can easily be searched for. Therefore, ensure that the option "--previous" is turned on when updating PO files, so that the previous original string is kept alongside each fuzzy entry.
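To get a feeling for how much review work a merge has created, the fuzzy entries in a PO file can be counted. The following Python sketch relies on the PO file convention that flags are stored in comment lines starting with "#,". The function name is hypothetical. Note that a fuzzy flag on the header entry is counted as well.

```python
def count_fuzzy(po_text):
    """Count the entries whose flag line ("#, ...") contains "fuzzy"."""
    count = 0
    for line in po_text.splitlines():
        if line.startswith("#,"):
            # A flag line can carry several flags, e.g. "#, fuzzy, c-format".
            flags = [flag.strip() for flag in line[2:].split(",")]
            if "fuzzy" in flags:
                count += 1
    return count
```

Comparing the count before and after a merge shows how many strings translators are being asked to recheck.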

Having to do this too often demotivates translators, because they have to review the same strings over and over again when they could be creating new translations. Thus, if you intend to clean up the strings (i.e. keep the content, but update markup, punctuation, etc.), try to bundle this work at one point in time instead of spreading it over the development cycle, if possible.

If a translated string is not found any more in the POT file, the resulting translation gets removed by msgmerge. This is actually worse than having a "fuzzy" translation.

TODO: Normally, those strings are put as "translation memory" at the end of the file and they are not removed. Check how this works.

msgmerge is not supposed to destroy any work that has been present in the existing PO file.

PO files have the additional property that they are modified by a tool, but also contain a significant amount of work done by a person.

Recommendations for package maintainers:

Gotcha

If POT and PO files are not updated before a release, translators will work on outdated strings, resulting in incompletely translated software or documentation. Therefore, when building your documentation, include building the POT and PO files as well. Best practice for handling appropriate commits should be added here.

Old concern: It is currently not clear how to make sure that a package maintainer does not forget these updates without creating lots of useless commits with new PO files that differ only in line number and date stamp comments.

Reply: Using msgmerge --no-location reduces this problem. Updated time stamps do not worry translators (they seldom look at them). Translators also do not care about commits: they look at master or a similar branch, or at the service pages, and take what they find. So the most important part is to keep these files current. If you need this for your VCS, you can add code to your build system to discard PO(T) file updates that change only the date stamp.
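One possible way to detect updates that change only the date stamp is sketched below in Python. The function names are hypothetical, and the sketch assumes that POT-Creation-Date is the only header line regenerated on every run. PO-Revision-Date is deliberately not ignored, since a change there usually comes from a translator.

```python
# Header lines that are regenerated on every run and carry no translation work.
IGNORED_PREFIXES = ('"POT-Creation-Date:',)


def significant_lines(po_text):
    """Return all lines except the regenerated date stamp lines."""
    return [
        line
        for line in po_text.splitlines()
        if not line.startswith(IGNORED_PREFIXES)
    ]


def only_timestamps_changed(old_text, new_text):
    """True if the two versions differ, but only in the ignored date lines."""
    return old_text != new_text and significant_lines(old_text) == significant_lines(
        new_text
    )
```

A build script could compare the committed version of each PO(T) file with the regenerated one and restore the committed version whenever only_timestamps_changed returns True.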

Please help here.

Call for Translations

There is some point in the development process when it is time to ask for translations. Translators need a POT file which contains all the translatable strings (if the language is not yet included), and they make a PO file from that which contains the actual translation or use the existing (updated) po file from previous translations in their language. For existing translations, translators need the old PO file (after merging) and optionally the POT file.

podebconf-report-po can be used to generate the calls for translation. One message is sent to the general i18n mailing list with the POT file attached. For existing translations, an individual message is sent for each language to the translators listed in the PO file and to the respective team mailing list, should it be mentioned in the PO file. Only the respective PO file is attached. Care should be taken to use the option --gzip, as otherwise the files might be too large or filters might apply (and thus the e-mail is not delivered).

It is important that all sources (version control, a possibly uploaded package and the file sent out) are in sync with each other. If they are not, translators start from inconsistent states, which will cause incomplete translations and/or frustrated translators and/or users.

Handle Bounces

Depending on the age of the existing translations, a significant part of the individual messages sent out to translators is going to bounce. Since the individual language team is on Cc and those people know which team members are active and inactive, it is generally expected that the language team will take care of updating and sending in the new translation.

A package maintainer may want to double-check whether the team address is still correct, though. It is generally expected that debian-l10n-LANG@lists.debian.org points to an active mailing list, but of course there are exceptions. If the language team address looks strange, consider adding the Debian as well. Also look for spelling mistakes in the list names.
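Spelling mistakes in list names can be caught with a simple pattern check. The Python sketch below only tests whether a Language-Team value contains an address of the form debian-l10n-LANG@lists.debian.org. The function name is hypothetical, and the check cannot tell whether the list actually exists or is active.

```python
import re

# Address pattern taken from the form debian-l10n-LANG@lists.debian.org.
TEAM_LIST = re.compile(r"\bdebian-l10n-[a-z0-9-]+@lists\.debian\.org\b")


def team_address_looks_standard(language_team):
    """True if the Language-Team value contains a debian-l10n-* list address."""
    return TEAM_LIST.search(language_team.lower()) is not None
```

Values for which the function returns False are not necessarily wrong, since some teams use other lists, but they are worth a manual look before the call for translations goes out.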

What the Translator does

When a translator does a translation, they either begin with the old PO file for their language, or the POT file. That's why it's important that all possible sources of the old PO files are in sync.

Translators often don't bother about the header or copyright, so things like package name, license, program names, version numbers, Project-ID-Version and PO-Revision-Date are often questionable, unclear, or just plain wrong. You can use i18nspector to catch some errors. You should check the encoding looks sane (usually UTF-8) and it is recommended to check that the copyright was updated.

Including a PO file in the package

As there is no clear ownership of the PO file header, a package maintainer might want to fix small errors in the header either before sending out the PO file for translation, or after receiving a changed PO file and before committing it to the package. Of course, a package maintainer can also ask the translator to do those fixes, and the final decision on whether a translation is committed rests with the package maintainer.

The only situation when a package maintainer MUST consult the translator before doing a change in a PO file is when there are problems with the copyright entries and/or with the license. It is reasonable to expect that a translation is put under the same license as the original language that is being translated from. In addition, a PO file also contains the original version of the strings that are copyrighted by their respective author.

The new PO file is then committed in the place of the old PO file.

Recommendations for package maintainers:

Appendix

Suggestion for a PO/POT file header

# PACKAGE {program,debconf,manpage} translation
# Copyright (C) (list years, authors and translators here)
# This file is distributed under the same license as the PACKAGE package.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: Debian Maintainer Mail Address\n"
"POT-Creation-Date: YEAR-MO-DA HO:MI+ZONE\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: Translator Name <your.email@example.com>\n"
"Language-Team: Translation Team <team@example.com>\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
<empty line>

The "Plural-Forms" header that some versions of xgettext create is language specific and does not belong in the POT file.

PO/POT file header fields

The meaning of the header fields is explained in the Filling in the Header Entry chapter of the GNU gettext manual.