This document describes the methods currently used to distribute localized data and their known issues. Localized data includes binary MO files, translated documentation files and translated sounds (for example, in games).

Improvements are not discussed here; see TranslationDebs for those.

Overview

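There are currently several methods to distribute localized data:

  1. Bundling localized data for all languages with the application package
  2. Language packages
  3. Several applications' translations in one package per language

The first method is suboptimal because every user installs the data for all languages, while the second is less user-friendly. The third forces the user to have translations for applications that might not be used.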

Bundling localized data for all languages with the application package

Description

This is the simplest solution. Localized data is included in the application package.

Example

The apt package contains localized manual pages in /usr/share/man/ and binary MO files for several languages in /usr/share/locale/. All of the files provided by apt in these directories are localized data.

$ dpkg -L apt|perl -nle 'print if -f'|xargs du -c|tail -n 1
3824    total
$ dpkg -L apt|egrep '/usr/share/man|/usr/share/locale'|perl -nle 'print if -f'|xargs du -c|tail -n 1
2456    total
$ apt-cache show apt|egrep '(Installed-Size|Version)'
Installed-Size: 4312

This shows that localized data makes up between 57 and 64 percent of the size of apt 0.6.46.4-0.1 (2456 KB out of the declared Installed-Size of 4312 KB, or out of the 3824 KB of installed files).
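The share of localized data can be recomputed from the figures printed above. The following is a minimal sketch; the helper name percent is an assumption made for illustration and is not part of any Debian tool.

```shell
# percent PART WHOLE: print PART as a rounded percentage of WHOLE.
# (Hypothetical helper, named here only for illustration.)
percent() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.0f\n", 100 * part / whole }'
}

# Figures in KB, copied from the du and apt-cache output above.
percent 2456 3824   # localized data as a share of the summed file sizes
percent 2456 4312   # localized data as a share of the declared Installed-Size
```

With the numbers above this prints 64 and 57. The two results differ because Installed-Size is larger than the summed sizes of the regular files reported by du.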

Issues

Language packages

Description

Each language has a separate package for its localized data. Since the application and its localized data are shipped in different binary packages, the language packages can be built either from the application's source package or from a separate source package (depending on upstream's approach to translations). Translation updates can therefore be made after a release.

Advantages

  1. Translations can be handled by different maintainers:
    1. the translation release process does not have to depend on the upstream maintainer
    2. custom (or derived) distributions can provide new translations

Example

The iceweasel package relies on iceweasel-l10n-fr to provide the French translation. As of March 2007, there are 50 iceweasel-l10n packages. Their combined installed size is 41 MB, about 1.5 times the installed size of iceweasel itself; shipping them as separate packages keeps that data out of the main package.

$ grep-aptavail -r -P 'iceweasel$' -s Installed-Size
Installed-Size: 26940
$ grep-aptavail --eregex -P 'iceweasel-l10n.*' -s Installed-Size|cut -d ' ' -f2|awk '{t=t+$1}END{print t}'
40912
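The relation between the two sizes can be checked with a short calculation. This is only a sketch; the helper name size_ratio is an assumption made for illustration.

```shell
# Installed sizes in KB, copied from the grep-aptavail output above.
base=26940      # iceweasel
l10n=40912      # sum of all iceweasel-l10n packages

# size_ratio A B: print A / B rounded to one decimal place.
# (Hypothetical helper, named here only for illustration.)
size_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

size_ratio "$l10n" "$base"               # l10n data relative to iceweasel
size_ratio "$((base + l10n))" "$base"    # a bundled package relative to iceweasel
```

This prints 1.5 and 2.5: the language packages together are about 1.5 times the size of iceweasel, so a package bundling all of them would be about 2.5 times the size of the current one.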

Language packages are usually built from a special source package (e.g. iceweasel-l10n).

Issues

  1. Users need to explicitly select the language packages they want to install in order to make their translation available. The main binary package cannot Depend: on the language packages; at most it can Recommend: them.
    1. If the l10n packages are all recommended, all of them get installed by default.
    2. If the l10n packages are recommended as alternatives, one of them (the first) is preferred.
  2. The number of packages in the distribution grows with the number of language packages.
    1. Increases the time needed by package management tools to handle the dependency tree.
    2. Increases the time needed to download Packages.
  3. Langpacks are not automatically removed when the software for which they provide localized data is removed (unless the langpack depends on it).
  4. Installation of an application and the localized data for one language requires the installation of two packages rather than one, which can trigger two disk seeks rather than a single one on mirrors. This significantly reduces performance of mirrors limited by disk I/O.
  5. If the language pack is tightly coupled to a specific software version and its release process does not follow the main software's release process, you might end up with *old* language packs that are not up to date. This is especially true when the language pack is distributed by third parties. In Debian this has caused problems with the propagation of Mozilla packages into testing, because the language packs in testing had not been updated (they have to be removed from testing before the new version of the Mozilla software is introduced).
  6. Buggy translation updates might leave translations "broken" after the release (see Ubuntu Bug #52267).
  7. Even if language packs do not update the binary software, they might introduce bugs that break it (e.g. a segmentation fault in dd due to problematic binary MO files in language packs, Ubuntu Bug #42264).
  8. For software whose translation packages are split upstream (e.g. Firefox), the application might allow language packs to be installed through its own UI mechanism (independent of the system's packages), which might cause confusion (see Ubuntu Bug #31284 and http://librarian.launchpad.net/1566411/update_not_supported.png).
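The issue of langpacks being left behind after their application is removed can be illustrated with a small filter over a list of installed package names. This is a sketch under stated assumptions: the function name orphaned_l10n is hypothetical, and it assumes language packages are named NAME-l10n-LANG, as in the iceweasel example, which not every package follows.

```shell
# orphaned_l10n: read installed package names on stdin (one per line) and
# print every NAME-l10n-LANG package whose base package NAME is not listed.
# (Hypothetical helper; the NAME-l10n-LANG naming scheme is an assumption.)
orphaned_l10n() {
    awk '
        { installed[$0] = 1 }
        END {
            for (p in installed) {
                i = index(p, "-l10n-")
                # substr(p, 1, i - 1) is the part before "-l10n-"
                if (i > 0 && !(substr(p, 1, i - 1) in installed))
                    print p
            }
        }' | sort
}

# Example with a made-up package list: foo is absent, so foo-l10n-de is reported.
printf '%s\n' iceweasel iceweasel-l10n-fr foo-l10n-de | orphaned_l10n
```

On a Debian system the input could come from dpkg-query -W -f='${Package}\n'. Language packs that cover a whole suite rather than one package would be reported even when the suite is installed, so the output is only a list of candidates to review by hand.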

Several applications' translations in one package per language

All localization material for a language for a large set of applications is grouped into one package per language. This method is used by the KDE packages, where all the localization material for a language is placed in a single package and covers the entire KDE suite.

Advantages

  1. The number of packages doesn't grow as much as in the previous case.
  2. It is easy to get all the localization material for a language.

Issues

  1. If the language package is provided for a large collection of software (such as KDE), users have to install the full language pack regardless of which pieces of software they actually use.
  2. The user still has to remove the l10n package by hand, unless it depends on the main package.