This document describes the currently used methods to distribute localized data and current known issues. Localized data includes binary MO files, translated documentation files and translated sounds (for example, in games).
Improvements are not discussed here; see TranslationDebs for those.
Overview
There are currently several methods to distribute localized data:
- Bundling localization data for all available languages in a binary package (either the application binary or a -data package). The main issue with this approach is the size of the package.
- Architecture-independent packages associated with the application packages providing localized data (e.g. Mozilla applications' translations). There is typically one package per language. Since they "enable" the language translation for that software when installed they are called language packages or langpacks. The main issue with this approach is that the language package for an application is not installed automatically.
- Architecture-independent packages that group several packages' translations for a specific language (e.g. KDE translation packages).
The first method inflates package size for every user, while the second is less user-friendly. The third forces the user to have translations for applications that might not be used.
Bundling localized data for all languages with the application package
Description
This is the simplest solution. Localized data is included in the application package.
Example
The apt package contains localized manual pages in /usr/share/man/ and binary MO files for several languages in /usr/share/locale/. All of the files provided by apt in these directories are localized data.
$ dpkg -L apt|perl -nle 'print if -f'|xargs du -c|tail -n 1
3824 total
$ dpkg -L apt|egrep '/usr/share/man|/usr/share/locale'|perl -nle 'print if -f'|xargs du -c|tail -n 1
2456 total
$ apt-cache show apt|egrep '(Installed-Size|Version)'
Installed-Size: 4312
This shows that localized data makes up between 57 and 64 percent of the size of apt 0.6.46.4-0.1 (2456 KB out of the 4312 KB Installed-Size, or out of the 3824 KB of files measured by du).
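The two percentages can be rechecked from the figures in the transcript above. The snippet below is only a minimal sketch: the helper name `pct` is made up for this illustration, and the three numbers are copied from the command output shown in this example.

```shell
#!/bin/sh
# pct PART TOTAL: print PART as a percentage of TOTAL with one decimal.
# (Hypothetical helper, not part of any Debian tool.)
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", part * 100 / total }'
}

localized=2456   # KB of files under /usr/share/man and /usr/share/locale
files=3824       # KB of all regular files shipped by apt (du total)
installed=4312   # Installed-Size field reported by apt-cache

echo "share of Installed-Size: $(pct "$localized" "$installed")%"   # 57.0%
echo "share of shipped files:  $(pct "$localized" "$files")%"       # 64.2%
```

The two results differ because Installed-Size and the du total are not the same measurement of the package's size.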
Issues
- Binary package size grows (for all architectures) because of translations and data associated with localization.
- On multi-architecture mirrors, architecture-specific packages increase disk usage and bandwidth usage for synchronizations.
- Increases bandwidth usage for users and uploading mirrors.
- When a translation is updated (or a new language is introduced), users have to download the whole binary package even if the change is irrelevant to them.
- Increases disk space usage for users. localepurge, considered a hack, exists to diminish this issue.
- Time for installs is increased due to getting and unpacking a larger .deb.
- Localized data is in the same binary package and therefore has to be built from the same source package as the application.
- Localized data cannot be handled by different maintainers.
- Translation updates cannot be made independently from the application binary package and could cause a regression in the application package. It is risky to do translation updates during a freeze.
- A translation update means that the binary package needs to be rebuilt. Consequently:
  - Maintainers might want to wait for a new software release before providing the translation updates, so translators' work is not readily available to users (e.g. debconf updates sitting in the BTS).
  - Translation updates have to go through the sid -> testing transition mechanism.
  - There cannot be any translation updates post-release.
- Users install translations they will never use, wasting bandwidth and consuming disk space (that is the reason why localepurge exists, see http://packages.debian.org/localepurge).
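To get a feel for how much disk space the unused translations of a package take, the du output used in the example above can be filtered by language. The following is a rough sketch under stated assumptions: `unused_locale_kb` is a hypothetical helper (it is not how localepurge works), it only looks at /usr/share/locale/, it matches the language directory name exactly, and the sizes in the sample input are invented.

```shell
#!/bin/sh
# unused_locale_kb KEEP: read "SIZE<TAB>PATH" lines (as printed by du) on
# stdin and print the total SIZE of files under /usr/share/locale/<lang>/
# whose <lang> directory is not KEEP. Hypothetical helper for illustration.
unused_locale_kb() {
    awk -F '\t' -v keep="$1" '
        {
            n = split($2, part, "/")
            # part[1] is empty; a matching path is /usr/share/locale/<lang>/...
            if (n >= 6 && part[2] == "usr" && part[3] == "share" &&
                part[4] == "locale" && part[5] != keep)
                total += $1
        }
        END { print total + 0 }
    '
}

# Sample input with made-up sizes; only the de and ja files are counted.
printf '12\t/usr/share/locale/fr/LC_MESSAGES/apt.mo\n16\t/usr/share/locale/de/LC_MESSAGES/apt.mo\n8\t/usr/share/locale/ja/LC_MESSAGES/apt.mo\n900\t/usr/bin/apt-get\n' |
    unused_locale_kb fr   # prints 24
```

On a Debian system the same function could presumably be fed real data with `dpkg -L apt|perl -nle 'print if -f'|xargs du|unused_locale_kb fr`.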
Language packages
Description
Each language has a separate package for its localized data. Since there are different binary packages for the application and its localized data, the binary packages can be generated either from the application's source package or use another source package (depending on upstream's approach to translations). Translation updates could be made after a release.
Advantages
- Translations can be handled by different maintainers:
  - The translation release process does not have to depend on the upstream maintainer.
  - Custom (or derived) distributions can provide new translations.
Example
The iceweasel package relies on iceweasel-l10n-fr to provide the French translation. As of March 2007, there are 50 iceweasel-l10n packages. Their installed sizes add up to about 41 MB, roughly 1.5 times the installed size of iceweasel itself. Shipping them as separate packages avoids making the iceweasel package about 2.5 times larger.
$ grep-aptavail -r -P 'iceweasel$' -s Installed-Size
Installed-Size: 26940
$ grep-aptavail --eregex -P 'iceweasel-l10n.*' -s Installed-Size|cut -d ' ' -f2|awk '{t=t+$1}END{print t}'
40912
Language packages are usually built from a special source package (e.g. iceweasel-l10n).
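The size comparison in this example can also be done without grep-aptavail. The sketch below assumes control stanzas in the usual Packages format, with the Package field appearing before Installed-Size in each stanza. `sum_installed_size` and `sample_stanzas` are hypothetical names, and the two langpack sizes in the sample are invented; only 26940 and 40912 come from the transcript above.

```shell
#!/bin/sh
# sum_installed_size REGEX: read Packages-style stanzas on stdin and print
# the sum of Installed-Size for packages whose name matches REGEX.
# Hypothetical helper; assumes "Package:" precedes "Installed-Size:".
sum_installed_size() {
    awk -v re="$1" '
        $1 == "Package:"        { pkg = $2 }
        $1 == "Installed-Size:" { if (pkg ~ re) total += $2 }
        END { print total + 0 }
    '
}

# Sample data: the iceweasel size is from this document, the l10n sizes
# are made up for the illustration.
sample_stanzas() {
    cat <<'EOF'
Package: iceweasel
Installed-Size: 26940

Package: iceweasel-l10n-fr
Installed-Size: 800

Package: iceweasel-l10n-de
Installed-Size: 820
EOF
}

sample_stanzas | sum_installed_size '^iceweasel-l10n-'   # prints 1620

# Ratios from the real figures quoted in the example above.
awk 'BEGIN { printf "langpacks / iceweasel: %.2f\n", 40912 / 26940 }'
awk 'BEGIN { printf "bundled / iceweasel:   %.2f\n", (26940 + 40912) / 26940 }'
```

On a Debian system, `apt-cache dumpavail | sum_installed_size '^iceweasel-l10n-'` should give a comparable total for whatever packages are currently available.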
Issues
- Users need to specifically select the packages they want to install in order to make their translation available. The main binary package cannot Depend: on the language packages; at most it can Recommend: them.
  - If all the l10n packages are recommended, they all get installed by default.
  - If the l10n packages are recommended as alternatives, one of them (the first) is preferred.
- The number of packages in the distribution grows with the number of language packages.
  - This increases the time needed by package management tools to handle the dependency tree.
  - This increases the time needed to download the Packages index files.
- Langpacks are not automatically removed when the software for which they provide localized data is removed (unless the langpack depends on it).
- Installation of an application and the localized data for one language requires the installation of two packages rather than one, which can trigger two disk seeks rather than a single one on mirrors. This significantly reduces performance of mirrors limited by disk I/O.
- If the language pack is tightly related to a specific software version and its release process does not follow the main software release process, you might end up with *old* language packs which are not up-to-date. This is especially true when the language pack is distributed by third parties. In Debian this has led to issues with propagation into testing of Mozilla packages due to language packs in testing not being updated (they have to be removed from testing before the new version of the Mozilla software is introduced).
- Buggy translation updates might lead to translations "broken" after the release (see Ubuntu Bug #52267).
- Even if language packs do not update the binary software, they might introduce bugs which break it (e.g. a segmentation fault in dd due to problematic MO binary files in language packs, Ubuntu Bug #42264).
- Software whose translation packages are divided upstream (e.g. Firefox) might make it possible to install language packs through its own UI mechanism, independently of the system's packages, which might cause confusion (see Ubuntu Bug #31284).
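The issue of langpacks left behind after their application is removed can be checked with a small filter over the list of installed package names. This is only a sketch under a naming assumption: it treats every package called `<app>-l10n-<lang>` (as in iceweasel-l10n-fr) as a langpack for `<app>`, which will not hold for every package. `orphaned_langpacks` and `exampleapp` are hypothetical names.

```shell
#!/bin/sh
# orphaned_langpacks: read one installed package name per line on stdin and
# print every <app>-l10n-<lang> package whose <app> is not in the list.
# Hypothetical helper relying on the "-l10n-" naming convention only.
orphaned_langpacks() {
    awk '
        { name[NR] = $1; installed[$1] = 1 }
        END {
            for (i = 1; i <= NR; i++) {
                pos = index(name[i], "-l10n-")
                if (pos > 1 && !(substr(name[i], 1, pos - 1) in installed))
                    print name[i]
            }
        }
    '
}

# iceweasel is present, exampleapp is not, so only its langpack is listed.
printf '%s\n' iceweasel iceweasel-l10n-fr exampleapp-l10n-de | orphaned_langpacks
# prints exampleapp-l10n-de
```

On a Debian system the input could come from something like `dpkg-query -W -f '${Package}\n'`, keeping in mind that this lists every package dpkg knows about, not only fully installed ones.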
Several applications' translations in one package per language
All localization material for a language for a large set of applications is grouped into one package per language. This method is used by the KDE packages, where a single package per language covers the entire KDE suite.
Advantages
- The number of packages doesn't grow as much as in the previous case.
- It is easy to get all the localization material.
Issues
- If the language package is provided for a large collection of software (such as KDE), the user has to install the full language pack regardless of which pieces of software they actually use.
- The user still has to remove the l10n package by hand, unless it depends on the main package.