Notes taken during the discussion about D-I modularization (jfs)
There was an extensive discussion on how to improve the way d-i handles translations so that, in the future, it can ship as many translations as translators provide.
See also the thread starting from a mail from Frans Pop on this subject.
The current d-i mechanism is:
- translation data for every language for which translations are available is shipped in the main initrd for all the "core" udebs in d-i. This includes the localechooser translations, which cover all available countries / languages translated into all languages. Translations are included for all messages (expert and normal modes)
[FJP] Note that which udebs go into an initrd differs per architecture and installation method. There is no "one set of core udebs".
- when the user selects new components (after having selected, and configured, their data sources), the udebs include the translations for all languages regardless of their language selection.
With the current d-i mechanism there are some limitations, since new translations have an impact on:
- the size of the initrd RAM disk
- the bandwidth required (in network-based installations), since the translations take up space in the components themselves
- the memory available for installation tasks (since each new component is downloaded into the RAM disk, with translations taking some space there too)
[FJP] What concerns me most currently is the third issue. The first issue may already be a concern for some architectures, but I'm not sure. It may become a bigger concern at some point in the future though. The second issue is not really a serious concern IMO.
Note that for the 2nd and 3rd issues compression of template files will not help: both the initrd (at least in most cases) and the udebs are already compressed. The initrd size is of course a concern for floppy-based installation, but that is already solved by excluding all translations.
Some alternatives were discussed:
separate translations from components into different udebs, so that d-i would install the component and would only download the translation for the language the user selected. This seems feasible, although it would mean modifying the udeb generation (so that different udebs are generated for the translations) as well as ANNA (so that it is 'intelligent' enough to know it has to download an additional component based on the user's selected language).
[FJP] IMO the explosion in the number of udebs is the main argument against this solution. The only realistic implementation would be to create udebs per "language group", i.e. a maximum of 4 or 5 extra language udebs per udeb.
- Pros: translations can be installed / downloaded independently, users do not have to download translations they will not use (RAM and bandwidth use decrease)
- Cons: many more udebs (# of components x # of languages), requires changes to build process and component download mechanisms, requires changes to how cdebconf manages messages (udebs with translations will be installed before/after the actual package with the debconf messages), it is not possible to get the translation before some media or the network is accessible (as a consequence, the initrd size will not decrease).
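To make the "# of components x # of languages" concern concrete, here is a rough sketch comparing per-language udebs with the per-"language group" variant FJP mentions. Everything in it is an assumption for illustration: the component list is a small sample, the group names are made up, and the '<component>-l10n-<group>' naming scheme does not exist in d-i.

```python
# Hypothetical sketch: sample data and naming scheme are illustrative only.

def extra_udebs(components, variants):
    """Number of additional translation udebs if every component is split
    into one translation udeb per variant (language or language group)."""
    return len(components) * len(variants)


def translation_udebs_for(components, group):
    """Translation udebs a downloader would fetch for one selected group.

    The '<component>-l10n-<group>' naming scheme is invented for this sketch.
    """
    return [f"{name}-l10n-{group}" for name in components]


components = ["partman-base", "netcfg", "localechooser"]  # small sample of udebs
languages = ["de", "fr", "ja", "ko", "pt_BR", "ru"]       # sample languages
groups = ["latin", "cyrillic", "cjk"]                     # made-up language groups

per_language = extra_udebs(components, languages)
per_group = extra_udebs(components, groups)
selected = translation_udebs_for(components, "cjk")

print(per_language, per_group)
print(selected)
```

With real numbers (many more components and languages) the per-language product grows quickly, which is why grouping languages keeps the count bounded at a few extra udebs per component.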
generate different initrds per language "zone" (Europe, Asia, America...), so that the user has to select a language (or 'zone') at boot and the appropriate initrd is loaded.
[FJP] Language zones are a difficult concept and don't really work, for example, for Europe + the Americas. Still, it should be possible to define language groups roughly based on a combination of character set and region.
Generating separate initrds has major implications for d-i and CD building, mirroring, etc.
One option could be that, AIUI, initramfs initrds allow additional initrds to be "added on" at run time. This means that we could have a core initrd that is always loaded and maybe dynamically add on a "language group" initrd from localechooser. Not sure if this really is possible technically though.
- Pros: reduces initial RAM disk size
- Cons: initrd generation needs to be modified, boot loader needs to be modified so that language selection is an option, users just pressing 'enter' might not get what they expect (they don't get their language unless it's in the 'default' zone)
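As a rough illustration of the "language group" idea and of the 'just pressing enter' problem, the selection could look like the sketch below. The groups, their membership, the default and the 'initrd-<group>.gz' file names are all made up for this example; no such grouping was agreed on in the discussion.

```python
# Hypothetical grouping by script/region: names and membership are made up.
LANGUAGE_GROUPS = {
    "latin": {"en", "de", "es", "fr", "pt"},
    "cyrillic": {"ru", "uk", "bg"},
    "cjk": {"ja", "ko", "zh"},
}
DEFAULT_GROUP = "latin"


def group_for(language=None):
    """Return the language group for a locale code such as 'pt_BR'.

    Falls back to the default group when no language was chosen at boot
    or when the language is not listed in any group.
    """
    if language is None:
        return DEFAULT_GROUP
    base = language.split("_")[0]  # 'pt_BR' -> 'pt'
    for group, members in LANGUAGE_GROUPS.items():
        if base in members:
            return group
    return DEFAULT_GROUP


def initrd_for(language=None):
    """Name of the initrd to load; the file naming is an assumption."""
    return f"initrd-{group_for(language)}.gz"


print(initrd_for("ja"))  # a user who selected Japanese at boot
print(initrd_for())      # a user who just pressed 'enter'
```

The second call shows the drawback listed under Cons: without an explicit choice the user only gets the languages of the default group.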
provide translations only for the messages that average users see in the core udeb components. Do not include translations for "expert" messages.
[FJP] Fairly difficult to implement too as priority is defined in the code, not in the PO files.
- Pros: reduces initial RAM disk size
- Cons: translations for expert messages are lost, users might not find it easy to debug tasks (if there is a failure d-i falls back to expert mode)
reduce localechooser translations so that country names are only translated into the languages of users likely to select them (i.e. do not provide Japanese, Chinese, Korean translations for European countries and vice versa).
[FJP] I don't see how this is different from 'different initrds per language "zone"'.
- Pros: reduces initial RAM disk size (localechooser udeb is large)
- Cons: users might find that the country they are in (if they are foreigners living abroad) is not translated (the "Other" option in localechooser will always be shown in English), translations (iso-codes) go unused
provide a single component with all the translations. Extract all messages for a given language and generate two udebs, one for the core system (installed in the initrds) and the other for the rest of the components. When the user selects a language the udeb for their language is downloaded.
[FJP] As explained above, there is not a single set of "core" udebs. Also, combining translations from different udebs into one udeb per language will create huge synchronization problems as (strings in) udebs change all the time, are never uploaded all together and may migrate to testing at different times.
- Pros: translations can be installed / downloaded independently, users do not have to download translations they will not use (RAM and bandwidth use decrease), only needs a few udebs (one per translation), udebs of unused translations in "core" can be removed
- Cons: large udebs (size?), requires changes to build process and component download mechanisms, requires changes to how cdebconf manages messages (udebs with translations will be installed before/after the actual package with the debconf messages), users download translations they will not use, it is not possible to get the translation before some media or the network is accessible (as a consequence, the initrd size will not decrease).
use the 'lowmem' mechanisms to remove unused translations in the core after language selection.
[FJP] As explained on the mailing list I only see this as desirable if implemented within lowmem, not as a general mechanism.
- Pros: reduces RAM disk size (but not initrd size) once the user selects a language, can be extended also to component downloads
- Cons: the user cannot go back and select a different language, requires coding similar to lowmem
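The idea of dropping unused translations after language selection can be sketched on a debconf templates file, where localized fields carry a language suffix (e.g. Description-de.UTF-8). This is only an illustration of the idea, not how lowmem is implemented; the function name and the sample template are made up.

```python
import re

# Matches a localized field such as "Description-de.UTF-8:" or
# "Choices-pt_BR.UTF-8:" and captures the language code.
LOCALIZED_FIELD = re.compile(r"^[A-Za-z]+-([A-Za-z_@]+)(?:\.[\w-]+)?:")


def strip_translations(templates_text, keep_language):
    """Drop every localized field except those for keep_language.

    Untranslated fields (Template, Type, Description, ...) and blank
    stanza separators are always kept.
    """
    kept = []
    keep_current = True
    for line in templates_text.splitlines():
        if line.startswith((" ", "\t")):
            # Continuation line: it belongs to the previous field.
            if keep_current:
                kept.append(line)
            continue
        match = LOCALIZED_FIELD.match(line)
        keep_current = match is None or match.group(1) == keep_language
        if keep_current:
            kept.append(line)
    return "\n".join(kept) + "\n"


# Made-up template stanza for the example.
sample = """Template: example/language
Type: string
Description: Language
 Extended English text.
Description-de.UTF-8: Sprache
 Erweiterter deutscher Text.
Description-fr.UTF-8: Langue
 Texte long.
"""

result = strip_translations(sample, "de")
print(result)
```

Since the other languages are discarded for good, this also shows why the user cannot go back and select a different language afterwards.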