Abstract

To localize software properly for Romanian (translating it, displaying dates and addresses, displaying or using ISBN codes, choosing the right paper format, and so on), a developer must understand the issues that affect the Romanian language, as well as the common mistakes and misconceptions about it.

This page is intended as a guide for software developers who want to introduce proper support for Romanian in their applications but are unsure what needs to be done to accomplish that.

PLEASE CONTACT THE UNIFIED ROMANIAN TRANSLATION TEAM IF YOUR QUESTION IS NOT ANSWERED HERE, OR NOT LISTED AT ALL: diacritice AT_SIGN googlegroups.com

Common mistakes, usual questions

Here is a list of common problems and questions related to localization (abbreviated l10n from now on) into Romanian:

Which is the right 8 bit code page for Romanian? Which encodings really support Romanian?

The preferred encoding for Romanian is UTF-8 (other Unicode encodings also work). The only 8-bit encoding that supports Romanian properly is ISO-8859-16, although it is not commonly used. ISO-8859-2, on the other hand, is commonly used even though it does not properly support all the correct diacritics and signs (see the next question for details).
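The difference between these encodings can be checked with a few lines of Python. This is only a minimal sketch: the sample string and the helper name `can_encode` are made up for illustration, and the codec names are the ones used by Python's standard library.

```python
# Sample text with comma-below letters and the Romanian primary quotes:
# „Știință și țară”
SAMPLE = "\u201E\u0218tiin\u021B\u0103 \u0219i \u021Bar\u0103\u201D"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for name in ("utf-8", "iso8859_16", "iso8859_2"):
    print(name, can_encode(SAMPLE, name))
```

The ISO-8859-2 check fails because that code page has no comma-below letters and no „ quotation mark.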

Which extra Unicode code points (beyond the usual Western European ones) are needed for Romanian?

ș/Ș (0x0219/0x0218) and ț/Ț (0x021B/0x021A) (s/S comma and t/T comma), ă/Ă (0x0103/0x0102) (a/A breve; do not confuse this with ã, a tilde), â/Â (0x00E2/0x00C2) (a/A circumflex) and î/Î (0x00EE/0x00CE) (i/I circumflex). In addition, the quotes „” (99 low and 99 up, code points 0x201E and 0x201D respectively) and «» (code points 0x00AB and 0x00BB) are needed. Older translations, or those using ISO-8859-2, might use t/T cedilla (0x0163/0x0162) and s/S cedilla (0x015F/0x015E) instead of the correct diacritics (see L10n/Romanian/CommaTransition for more details).
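The code points can be double-checked against the Unicode character database that ships with Python. The snippet below is a small sketch; the names `ROMANIAN_EXTRA` and `describe` are made up for this example.

```python
import unicodedata

# Comma-below, breve and circumflex letters, then the two quotation mark pairs.
ROMANIAN_EXTRA = (
    "\u0219\u0218\u021B\u021A"   # s/S comma, t/T comma
    "\u0103\u0102"               # a/A breve
    "\u00E2\u00C2\u00EE\u00CE"   # a/A circumflex, i/I circumflex
    "\u201E\u201D\u00AB\u00BB"   # primary and alternative quotes
)


def describe(ch: str) -> str:
    """Return the code point and the official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in ROMANIAN_EXTRA:
    print(ch, describe(ch))
```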

What is the s/S/t/T comma/cedilla madness?

See L10n/Romanian/CommaTransition.
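As a rough illustration of what moving from cedilla to comma-below letters involves, the sketch below maps the four legacy characters to their correct counterparts. It is an assumption about one possible approach, not the procedure described on the CommaTransition page, and the names `CEDILLA_TO_COMMA` and `to_comma_below` are hypothetical.

```python
# Map the legacy cedilla letters to the correct comma-below letters.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015F": "\u0219",  # s cedilla -> s comma
    "\u015E": "\u0218",  # S cedilla -> S comma
    "\u0163": "\u021B",  # t cedilla -> t comma
    "\u0162": "\u021A",  # T cedilla -> T comma
})


def to_comma_below(text: str) -> str:
    """Replace cedilla s/t with comma-below s/t; other characters are untouched."""
    return text.translate(CEDILLA_TO_COMMA)


print(to_comma_below("\u015Fi \u0163ar\u0103"))
```

Apply such a conversion only to text known to be Romanian, since s cedilla is a legitimate letter in other languages such as Turkish.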

Although I know I have support for encodings that support Romanian, my application doesn't look right in Romanian. Why?

Most likely you are not using a font that has those diacritics. We recommend the DejaVu TrueType fonts. They are based on the Bitstream Vera fonts and contain many more glyphs (the license of the Bitstream Vera fonts forced the DejaVu developers to change the name, but DejaVu is essentially a superset of Bitstream Vera that follows the design of the original font).

Isn't Romanian a Slavic/non-Latin language?

No. Romanian is a Latin (Romance) language and it uses the Latin alphabet with 5 supplemental letters with diacritics (and their upper case counterparts).

Which are the right quotation marks used in Romanian?

„” and «». That is, the primary pair "DOUBLE LOW-9 QUOTATION MARK" (0x201E) and "RIGHT DOUBLE QUOTATION MARK" (0x201D), and the alternative pair « (LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, 0x00AB) and » (RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, 0x00BB). They open and close in the order shown here. The alternative pair is used only inside the primary pair. If further quotation is needed inside the alternative pair, the primary and alternative pairs alternate.

Example: This is a „single Quote”. This is an „outer Quote containing an «inner quote»”.
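The nesting rule can be sketched in Python as follows. The function name `quote` and its `depth` parameter are made up for this illustration: even depths use the primary pair and odd depths use the alternative pair.

```python
PRIMARY = ("\u201E", "\u201D")      # DOUBLE LOW-9 and RIGHT DOUBLE quotation marks
ALTERNATIVE = ("\u00AB", "\u00BB")  # left- and right-pointing double angle marks


def quote(text: str, depth: int = 0) -> str:
    """Wrap text in the quotation pair that belongs to the given nesting depth."""
    open_mark, close_mark = PRIMARY if depth % 2 == 0 else ALTERNATIVE
    return f"{open_mark}{text}{close_mark}"


inner = quote("inner quote", depth=1)
print(quote(f"outer Quote containing an {inner}"))
```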

Which are the abbreviations used for the week day names? Which are the week day names in Romanian? Which is considered the first day of week in Romania? Which is the first working day of the week?

First of all, before implementing any code about the days of the week in your application, remember that glibc already provides this information.

The abbreviations are Lu, Ma, Mi, Jo, Vi, Sb, Du, that is: luni (Monday), marți (Tuesday), miercuri (Wednesday), joi (Thursday), vineri (Friday), sâmbătă (Saturday), duminică (Sunday). The first day of the week is luni/Monday. The first working day of the week is also luni/Monday.
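One way to read the abbreviations from the system locale data instead of hard-coding them is sketched below in Python. It assumes a Unix-like system where a locale named `ro_RO.UTF-8` may be installed; the function name and the fallback table (copied from this page) are illustrative only, and the exact strings returned depend on the installed locale data.

```python
import locale

# Fallback copied from this page, Monday first.
PAGE_ABBREVIATIONS = ["Lu", "Ma", "Mi", "Jo", "Vi", "Sb", "Du"]


def romanian_weekday_abbreviations(locale_name: str = "ro_RO.UTF-8") -> list:
    """Return abbreviated weekday names, Monday first, from the system locale."""
    saved = locale.setlocale(locale.LC_ALL)  # remember the current setting
    try:
        # LC_ALL keeps LC_TIME and LC_CTYPE consistent while decoding the names.
        locale.setlocale(locale.LC_ALL, locale_name)
        # langinfo numbers the days from Sunday (ABDAY_1), so rotate to Monday.
        items = [locale.ABDAY_2, locale.ABDAY_3, locale.ABDAY_4, locale.ABDAY_5,
                 locale.ABDAY_6, locale.ABDAY_7, locale.ABDAY_1]
        return [locale.nl_langinfo(item) for item in items]
    except (locale.Error, AttributeError):
        # Locale not installed, or nl_langinfo unavailable on this platform.
        return list(PAGE_ABBREVIATIONS)
    finally:
        locale.setlocale(locale.LC_ALL, saved)


print(romanian_weekday_abbreviations())
```

Note that `setlocale` changes the locale for the whole process, so a real application would normally set it once at startup rather than switching back and forth.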

Which is the proper date format in Romanian?

First of all, before implementing any date-formatting code in your application, remember that glibc already provides this information.

The compact format is:

$ date +%c
Lu 30 apr 2007 00:50:00 +0300

That is: day of week, day of month (always 2 digits), abbreviated month name, year (all digits), hour (24-hour format, 2 digits), minutes (2 digits), seconds (2 digits), and time zone offset (the time zone abbreviation is EET or EEST, but it is never used).

The long form is:

$ date
luni 30 aprilie 2007, 00:50:03 +0300

The information is in the same order, but the presentation changes. The day of week and the month name are in their long forms, and the day of month uses as few digits as possible (e.g. the 1st of May is "1 mai", not "01 mai").
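For platforms where the Romanian locale data is not available, the long form can be approximated by hand. The Python sketch below is only an illustration of the rules above: the function name `format_long` is made up, and the month table is supplied for this example (the page itself only shows aprilie and mai).

```python
from datetime import datetime, timedelta, timezone

# Monday first, matching datetime.weekday().
WEEKDAYS = ["luni", "marți", "miercuri", "joi", "vineri", "sâmbătă", "duminică"]
MONTHS = ["ianuarie", "februarie", "martie", "aprilie", "mai", "iunie",
          "iulie", "august", "septembrie", "octombrie", "noiembrie", "decembrie"]


def format_long(moment: datetime) -> str:
    """Format a timezone-aware datetime in the long Romanian form."""
    weekday = WEEKDAYS[moment.weekday()]
    month = MONTHS[moment.month - 1]
    # moment.day is an int, so the day of month is never zero-padded.
    clock = moment.strftime("%H:%M:%S %z")
    return f"{weekday} {moment.day} {month} {moment.year}, {clock}"


eest = timezone(timedelta(hours=3))
print(format_long(datetime(2007, 4, 30, 0, 50, 3, tzinfo=eest)))
```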

I have a new application and I don't care about compatibility issues, but I want to support Romanian. What should I do?

Make sure you have support for UTF-8 or some other Unicode encoding (although UTF-8 is preferred), and offer a way to translate your application. Please do not reinvent the wheel; use gettext. It is also legal to use it in commercial applications (at the time of this writing, version 0.15 of this library is available under the LGPL, so you can link it into your application if you only use the i18n support; for more details, see the gettext license).

Somebody contacted me and told me to change a lot of things in my application in order to support Romanian. He proposed a lot of changes, but I am not sure about them. Where can I get more information about the proposed changes?

Look on this page, and ask on either of our mailing lists: diacritice AT googlegroups DOT com, or the debian-l10n-romanian list (debian-l10n-romanian AT lists DOT debian DOT org).

References