Contents
- Status of DocBook XML Transition
- How to convert DebianDoc SGML source into DocBook XML
- Step 0: Prepare source
- Step 1: Check DebianDoc SGML (en) is in good shape
- Step 2: Convert DebianDoc SGML (en) to DocBook XML (basic)
- Step 3: Convert DebianDoc SGML (non-en) to DocBook XML (basic)
- Step 4: Debug original PO files
- Step 5: Let's think what to do
- Step 6: Touch up *.ent
- Step 7: Convert DebianDoc SGML to DocBook XML and PO
- Step 8: make DocBook XML with comments
- Problems and their solution
- Note: UTF-8 and DebianDoc SGML
The Debian Documentation Project considers the modern DocBook XML in a UTF-8 environment to be a better choice than the older DebianDoc SGML.
Status of DocBook XML Transition
The user tag "docbook-xml-transition" is set for debian-doc@lists.debian.org team to track this migration of documents in the Debian archive.
To make this migration smooth, you need debiandoc-sgml version 1.2.20 or newer, which supports debiandoc2dbk (the wheezy version). If you are using a squeeze environment, installing the wheezy version directly onto your system is good enough (it is a Perl script).
How to convert DebianDoc SGML source into DocBook XML
Conversion from SGML to XML.
Let's assume you have the following files in your working directory:
- manual.en.sgml manual.xx.po manual.yy.po funky.ent
Please note that you also need to install the following packages:
- docbook-xsl moreutils libxml2-utils
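Before starting, it can help to confirm that the commands used in the later steps are actually on your PATH. The following is only a minimal sketch: the function name missing_tools is made up for this example, and the list of commands to check (debiandoc2html and debiandoc2dbk from debiandoc-sgml, sponge from moreutils, xmllint from libxml2-utils) is an assumption about what you will need.

```shell
# missing_tools CMD...
# Print the name of every command given as an argument that is not on PATH.
# (Hypothetical helper, not part of debiandoc-sgml.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running `missing_tools debiandoc2html debiandoc2dbk sponge xmllint` prints nothing when everything is installed. docbook-xsl ships stylesheets rather than a command, so it is not covered by this check.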
Step 0: Prepare source
Copy the example scripts in /usr/share/doc/debiandoc-sgml/examples to the working directory and make them executable.
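This step could be scripted roughly as follows. It is a sketch only: copy_examples is a hypothetical helper name, and it assumes the examples directory contains plain (uncompressed) script files.

```shell
# copy_examples SRC_DIR DEST_DIR
# Copy every regular file from SRC_DIR into DEST_DIR and mark the copies
# executable. (Hypothetical helper, not part of debiandoc-sgml.)
copy_examples() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        cp "$f" "$dest/" && chmod +x "$dest/$(basename "$f")"
    done
}
```

For example, `copy_examples /usr/share/doc/debiandoc-sgml/examples .` copies the scripts into the current working directory.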
Step 1: Check DebianDoc SGML (en) is in good shape
Verify that the DebianDoc SGML (en) source is usable by building the HTML files.
$ debiandoc2html -1 manual.en.sgml
If you need files like funky.ent, generate them now.
Step 2: Convert DebianDoc SGML (en) to DocBook XML (basic)
Test the conversion from the DebianDoc SGML (en) source to DocBook XML.
$ debiandoc2dbk -1 manual.en.sgml
This should work now without problem. (If not, investigate...)
Verify the generated XML file by building HTML from this DocBook XML for English. This can be done by:
$ ./debiandoc2dbkpo --html-dbk manual
If this builds the HTML files without problems, you have converted the English source to DocBook XML files, without comments and with all entities embedded.
Step 3: Convert DebianDoc SGML (non-en) to DocBook XML (basic)
Verify the DebianDoc SGML to DocBook XML conversion from PO for the languages xx and yy.
$ ./debiandoc2dbkpo --html-dbk manual xx yy
If this builds the HTML files without problems, you have converted the English and non-English sources to DocBook XML files, without comments and with all entities embedded.
This ./debiandoc2dbkpo script runs the ./debiandoc-lint4po script to analyse each PO file and remove broken msgstr entries for the most usual cases.
"E: ..." should not happen and may halt processing.
"W: ..." is likely to appear. These warnings are not critical; the ./debiandoc-lint4po script removes the translations that are not suitable for the subsequent PO files for the DocBook XML files.
Please note this is just a first step to check you have decent source.
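To get a quick overview of how many "E: ..." and "W: ..." messages a run produced, you could count them in a saved log. The sketch below assumes that you captured the messages in a file (the name lint.log is made up here) and that each message starts at the beginning of a line; count_lint_messages is a hypothetical helper name.

```shell
# count_lint_messages LOGFILE
# Print how many lines of LOGFILE start with "E:" and how many with "W:".
# (Hypothetical helper; assumes one message per line.)
count_lint_messages() {
    # grep -c exits non-zero when there is no match, hence "|| true".
    errors=$(grep -c '^E:' "$1" || true)
    warnings=$(grep -c '^W:' "$1" || true)
    printf 'errors=%s warnings=%s\n' "$errors" "$warnings"
}
```

One way to produce such a log is `./debiandoc2dbkpo --html-dbk manual xx yy >lint.log 2>&1`, followed by `count_lint_messages lint.log`.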
Step 4: Debug original PO files
In order to have the best conversion result, let's improve the health of the PO files.
The previous commands should have listed many warnings and possibly errors.
You can extract the broken parts of each PO file as follows.
$ ./debiandoc-lint4po -v -u <$MANUAL.xx.po >$MANUAL.xx.unlint.po
$ ./debiandoc-lint4po -v -u <$MANUAL.yy.po >$MANUAL.yy.unlint.po
This should help you fix glitches in the original PO file.
If it is not too much trouble, please fix such problems in the original PO files of the DebianDoc SGML (non-en). This improves the quality of the conversion, but it is not critical if you do not mind losing these parts.
Step 5: Let's think what to do
The above conversion embeds all the entities into the converted DocBook XML files, and all the comments in the source are lost.
Step 6: Touch up *.ent
In order to preserve *.ent, create a touched-up version of it (them) as follows:
$ mv funky.ent funky-orig.ent
$ ./debiandoc2dbk-unent <funky-orig.ent > funky.ent
Step 7: Convert DebianDoc SGML to DocBook XML and PO
Here is a slightly more complicated way of converting, but it is automated with ./debiandoc2dbkpo.
Verify SGML to XML conversion while including this touched-up funky.ent by building HTML via XML for English and Language(xx and yy).
$ ./debiandoc2dbkpo --html-po manual xx yy
This should work in most cases but may fail while creating PO files (*.??.dbk.po). I will discuss possible sources of problems later.
If this builds the PO and HTML files without problems, you have converted to DocBook XML for English and for the languages xx and yy, without comments.
You regain entities by the following:
$ ./debiandoc2dbk-ent <manual.en.dbk | sponge manual.en.dbk
$ ./debiandoc2dbk-ent <manual.xx.dbk.po | sponge manual.xx.dbk.po
$ ./debiandoc2dbk-ent <manual.yy.dbk.po | sponge manual.yy.dbk.po
$ mv funky-orig.ent funky.ent
You may need to put the following at the top of manual.en.lint.dbk.
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % funky SYSTEM "funky.ent" > %funky;
]>
It goes in place of the following, which you see at the top of manual.en.dbk.
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
Step 8: make DocBook XML with comments
The normal conversion process strips comments. The idea is to convert comments
<!--- comment ... --->
into
<p>=====COMMENT===== comment ... =====TNEMMOC=====</p>
so that these can be restored later. As long as comments are located between normal paragraphs, the example script ./debiandoc2dbk-wrap does a good enough job. You may need some manual edits before using it. To start with, all comments before <book> and after </book> need to be removed.
$ edit manual.en.sgml
$ ./debiandoc2dbk-wrap <manual.en.sgml >manual-comment.en.sgml
$ ./debiandoc2dbkpo manual-comment
This takes a bit of trial and error. Repeat it until you get HTML.
If you are successful, create the DocBook XML with comments by
$ ./debiandoc2dbk-unwrap <manual-comment.en.dbk >manual.en.dbk
If you use this as the XML file, you may get some fuzzy entries due to whitespace differences. These may need manual tweaks.
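The wrap and unwrap idea in this step can be illustrated with two small sed filters. This is only a sketch under stated assumptions, not the code of ./debiandoc2dbk-wrap or ./debiandoc2dbk-unwrap: the function names are made up, each comment is assumed to fit on a single line in the notation shown above, and the wrapped <p> is assumed to come out of the conversion as <para>.

```shell
# wrap_comments: read SGML on stdin and turn each single-line
# "<!--- text --->" comment into a marked paragraph.
# (Illustrative only; the real example script may work differently.)
wrap_comments() {
    sed -e 's|<!--- \(.*\) --->|<p>=====COMMENT===== \1 =====TNEMMOC=====</p>|'
}

# unwrap_comments: read DocBook XML on stdin and turn each marked
# paragraph back into a comment. Assumes <p> was converted to <para>.
unwrap_comments() {
    sed -e 's|<para>=====COMMENT===== \(.*\) =====TNEMMOC=====</para>|<!--- \1 --->|'
}
```

Comments that span several lines or sit inside other elements are not handled by this sketch, which is where the manual edits mentioned above come in.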
Problems and their solution
Sometimes a translator adds locale-specific modifications which work in SGML, but the generated XML may no longer have a one-to-one correspondence with the English original.
Sometimes the DTD model of DebianDoc SGML differs from that of DocBook XML, which makes the normal conversion difficult.
These are fundamental problems and need to be addressed manually.
If a translator decided to add new content and sneaked it in by some clever addition, it may cause problems. The ./debiandoc-lint4po script should have removed most of these by now.
If the English SGML source has two different strings at different content levels while their translated text happens to be the same, then you see:
... msgid (at maint-guide.en.lint.dbk:36) is of type 'Content of: <book><chapter><section><itemizedlist><listitem><itemizedlist><listitem><para>' while msgstr (at maint-guide.ja.lint.dbk:36 maint-guide.ja.lint.dbk:59) is of type 'Content of: <book><chapter><section><itemizedlist><listitem><para>'. Original text: <literal>while quilt push; do quilt refresh; done</literal> to apply all patches while removing <emphasis>fuzz</emphasis>; Translated text: <literal>while quilt push; do quilt refresh; done</literal> として <emphasis>fuzz</emphasis> を削除しながら全てのパッチを適用します。 (result so far dumped to gettextization.failed.po) ...
This ERROR needs to be worked around by adding bogus content to one of the duplicated translations in the SGML PO file.
From:
msgid "foo foo and foo"
msgstr "bar bar XX bar"

msgid "foo foo, and foo"
msgstr "bar bar XX bar"
To:
msgid "foo foo and foo"
msgstr "bar bar XX bar"

msgid "foo foo, and foo"
msgstr "bar bar XX bar[XXX_FIXME1_XXX]"
These [XXX_FIXME.*_XXX] markers can be removed again from the final DocBook XML and its PO files by manual touch-up. Since they are so common, ./debiandoc2dbk-ent can handle their recovery.
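To find candidates for this workaround before the conversion fails, you could list the translations that occur more than once in a PO file. The following awk-based sketch is an assumption-laden helper, not part of the example scripts: duplicate_msgstr is a made-up name, and it only looks at msgstr values written on a single line, so wrapped multi-line entries are ignored.

```shell
# duplicate_msgstr POFILE
# Print every non-empty single-line msgstr that appears more than once.
# (Hypothetical helper; multi-line msgstr entries are not examined.)
duplicate_msgstr() {
    awk '
        /^msgstr "/ {
            # Skip empty strings, which also start multi-line entries.
            if ($0 != "msgstr \"\"") count[$0]++
        }
        END {
            for (s in count) if (count[s] > 1) print s
        }
    ' "$1"
}
```

Not every duplicate reported this way causes an error. Only those whose msgid strings sit at different content levels need a bogus marker.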
Sometimes a translator places additional content within an additional <footnote>...</footnote>. For this type, the PO file proofing script makes a best effort to retain the translation by mangling the <footnote> and </footnote> tags.
From:
msgid "foo foo and foo"
msgstr "bar bar <footnote>XX</footnote> bar"
To:
msgid "foo foo and foo"
msgstr "bar bar @@@[tagopen_footnote]@@@XX@@@[tagclose_footnote]@@@ bar"
This content is preserved through the DocBook conversion, but it usually needs some manual touch-up later in the source PO file and the *.ent file.
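As a starting point for that manual touch-up, the mangled markers can be turned back into tags with a small sed filter. This is a sketch only: restore_footnote_tags is a hypothetical name, and it handles just the two footnote markers shown above.

```shell
# restore_footnote_tags: read text on stdin and replace the mangled
# footnote markers with real <footnote> and </footnote> tags.
# (Hypothetical helper; only the footnote markers are handled.)
restore_footnote_tags() {
    sed -e 's|@@@\[tagopen_footnote\]@@@|<footnote>|g' \
        -e 's|@@@\[tagclose_footnote\]@@@|</footnote>|g'
}
```

Review the result by hand afterwards, since the restored footnote may still need to be adapted to the structure that DocBook XML expects.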
Note: UTF-8 and DebianDoc SGML
Recent versions of the debiandoc-sgml package support UTF-8 encoded source and generated files, but this is a hack.