4537
Comment: Create unicode coverage of debian fonts page
|
4486
Unicode 14
|
Line 1: | Line 1: |
This page aims to document the Unicode coverage of all the fonts (including non-free ones) in Debian. It was created using gucharmap. You should also read the ["DebianInstaller/GUIFonts"] page to find out about the fonts used by the installer. | #language en This page aims to document the Unicode coverage of all the fonts (including non-free ones) in Debian. It was created using `fontforge`, `font-manager` and `gucharmap`. You should also read the [[DebianInstaller/GUIFonts]] page to find out about the fonts used by the installer. Please add any fonts which might fill in some of these blocks to the [[Fonts/Missing]] page. TODO: automatically create this page based on the latest unicode-data package as well as all fonts in Debian unstable, using the fonts review: https://pkg-fonts.alioth.debian.org/review/ |
Line 5: | Line 12: |
Wikipedia languages whose names Firefox could not render. Inuktitut and Yi could not be rendered at all. Kannada and Sichuan Yi had some problems. | Firefox seems to render all Wikipedia languages: |
Line 7: | Line 14: |
http://meta.wikimedia.org/wiki/List_of_Wikipedias | http://meta.wikimedia.org/wiki/List_of_Wikipedias [[https://phabricator.wikimedia.org/diffusion/MW/browse/master/languages/data/Names.php]] |
Line 9: | Line 18: |
= Strange = !DejaVu sans uses some of the private use area, is this valid? |
|
Line 14: | Line 20: |
With missing characters mentioned, most of these are available as bitmap glyphs in ''Unifont'' or ''Unifont Upper''. | |
Line 15: | Line 22: |
* Symbol modifier letters: U+02EF to U+02F2, U+02F4 to U+02F6, U+02F8 to U+02FF * Hebrew: U+05C6. Looks like several other characters in this block use this symbol - U+05A2, U+05C5 and U+05C7 also have the square with numbers in them. * Syriac: U+072D to U+072F, U+074D to U+074F * Devanagari: U+097D * Gujarati: U+0ABC, U+0AE1 to U+0AE3 * Tamil: U+0BB6, U+0BE6, U+0BE6, U+0BF3 to U+0BFA * Teluga: looks like it draws on 2 chars in some other sections - U+0C55 and U+25CC * Kannada: U+0C8C, U+0CBC, U+0CBD, U+0CE1 and draws on U+25CC * Sinhala: 18 missing chars, plus it draws on a few others. 25 chars missing parts in total. * Thai: 16 chars draw on non-printing character U+200D * Tibetan: U+0FD0 and U+0FD1 look buggy, as they show the letters 0FD0 and 0FD1 instead of being symbols. The font is Tibetan Machine Uni. * Ethiopic: U+1207, U+1247, U+1287 * Phonetic extensions: 18 characters missing * Phonetic extensions supplement: 23 * General punctuation: 5 * Superscripts and subscripts: 5 * Currency: german penny and former argentinian currency (AUSTRAL) * Letterlike symbols: 19 * Arrows: 26 * Mathmatical operators: 13 * Misc. symbols: 14 * Coptic: > 60 * CJK radicals supplement: about 20 * CJK symbols and punctuation: 15 * Katakana: U+30FF * Enclosed CJK letters and month: about half * CJK compatability: about a third * CJK Unified Ideographs: this is a huge block, with about an eighth not defined. * Arabic presentation forms A: about 2 thirds to 3 quarters empty * CJK compatability forms: 5 * small form variants: U+FE58 SMALL EM DASH * half width and fullwidth forms: the hangul ones, 20 or so * linear B syllabry: 14 * linear B ideograms: about half * Aegean numbers: 9 - subunit ones * Deseret: 4 - captial/small EW and OI |
* '''Ahom''': U+11740 to U+11746 * '''Arabic''': U+061D * '''Arabic extended-A''': U+08B5, U+08C8 to U+08D2 and U+08E2 * '''Arabic presentation forms-A''': U+FBC2, U+FD40 to U+FD4F, U+FDCF and U+FDFE to U+FDFF * '''Balinese''': U+1B4C, U+1B7D and U+1B7E * '''Bopomofo extended''': U+31BC to U+31BF * '''Brahmi''': U+11070 to U+11075 * '''Chakma''': U+11147 * '''CJK unified ideographs''': U+9FF0 to U+9FFF * '''CJK unified ideographs extension A''': U+4DB6 to U+4DBF * '''CJK unified ideographs extension B''': U+2A6D7 to U+2A6DF * '''CJK unified ideographs extension C''': U+2B735 to U+2B738 * '''CJK unified ideographs extension G''': U+30000 to U+30728, U+3072A to U+30EDC, U+30EDF to U+3106B and U+3106D to U+3134A * '''Combining diacritical marks extended''': U+1AC1 to U+1ACE * '''Combining diacritical marks supplement''': U+1DFA * '''Currency symbols''': U+20C0 * '''Enclosed alphanumeric supplement''': U+1F10D to U+1F10F, U+1F16D to U+1F16F and U+1F1AD * '''Geometric shapes extended''': U+1F7F0 * '''Glagolitic''': U+2C2F and U+2C5F * '''Hebrew''': U+05EF * '''Ideographic symbols and punctuation''': U+16FF0 and U+16FF1 * '''Kaithi''': U+110C2 * '''Kana extended-A''': U+1B11F to U+1B122 * '''Kannada''': U+0CDD * '''Latin extended-D''': U+!A7C0, U+!A7C1, U+!A7D0, U+!A7D1, U+!A7D3, U+!A7D5 to U+!A7D9 and U+!A7F2 to U+!A7F4 * '''Mongolian''': U+180F * '''Musical symbols''': U+1D1E9 and U+1D1EA * '''Oriya''': U+0B55 * '''Supplemental punctuation''': U+2E53 to U+2E5D * '''Supplemental symbols and pictographs''': U+1F979 and U+1F9CC * '''Symbols and pictographs extended-A''': U+1FA7B, U+1FA7C, U+1FAA9 to U+1FAAC, U+1FAB7 to U+1FABA, U+1FAC3 to U+1FAC5, U+1FAD7 to U+1FAD9, U+1FAE0 to U+1FAE7 and U+1FAF0 to U+1FAF6 * '''Tagalog''': U+170D, U+1715 and U+171F * '''Tangut components''': U+18AF3 to U+18AFF * '''Telugu''': U+0C3C and U+0C5D * '''Takri''': U+116B9 * '''Transport and map symbols''': U+1F6DD to U+1F6DF * '''Vedic extensions''': U+1CFA |
Line 52: | Line 60: |
= Mostly empty = | = Completely empty blocks = |
Line 54: | Line 62: |
Misc tech, Control pics, misc math symbols A, CJK Unified Ideographs Extension A & B, CJK Unified Ideographs supplement. | Most of these blocks have bitmap glyphs available in ''Unifont'' or ''Unifont Upper''. |
Line 56: | Line 64: |
For the CJK parts, Arne Götje (高盛華) says that they are seldom used, that CJK daily use characters are supported and that he is working on the rest of them, but they won't be finished any time soon. | * '''Arabic extended B''' (U+0870...) * '''Chorasmian''' (U+10FB0...) * '''Cypro-Minoan''' (U+12F90...) * '''Dives Akuru''' (U+11900...) * '''Ethiopic extended-B''' (U+1E7E0...) * '''Indic Siyaq numbers''' (U+1EC71...) * '''Kana extended-B''' (U+1AFF0...) * '''Khitan small script''' (U+18B00...) * '''Latin extended-F''' (U+10780...) * '''Latin extended-G''' (U+1DF00...) * '''Lisu supplement''' (U+11FB0) * '''Makasar''' (U+11EE0...) * '''Nandinagari''' (U+119A0...) * '''Old Uyghur''' (U+10F70...) * '''Ottoman Siyaq numbers''' (U+1ED00...) * '''Syriac supplement''' (U+0860...) * '''Tangsa''' (U+16A70...) * '''Tangut supplement''' (U+18D00...) * '''Toto''' (U+1E290...) * '''Unified Canadian Aboriginal Syllabics extended-A''' (U+11AB0...) * '''Vithkuqi''' (U+10570...) * '''Znamenny musical notation''' (U+1CF00...) |
Line 58: | Line 87: |
= No Glyphs or almost empty = Ethiopic supplement, Unified Canadian Aboriginal Syllabics, Myanmar, Ogham, Tagalog, Buhid, Tanbanwa, Mongolian, New Tai Lue, Kanbun, ethiopic extended, Katakana phonetic extensions, Yijing hexagram symbols, Yi symbols, Yi radicals, ancient greek numbers, byzantine musical symbols, musical symbols, ancient greek musical notation, Tai Xian Jing Symbols, combining diacritical marks, OCR, supplemental arrows A & B, misc math symbols B, supplemental math operators, misc symbols and arrows, supplemental punctuation, ideographic description characters, modifier tone letters, variation selectors, mathematical alphanumeric symbols and variation selectors supplement and combining diacritical marks for symbols (only one character - from !FreeSerif). For myanmar, at the moment there is a proposal to update this block to support Mon and some other languages from Burma. There are several sources for fonts for this, although some intrude on parts of Unicode that are not yet defined. Burmese is in a state of flux at the moment. Also Klingon uses U+!F8D0 to U+F8FF in the private use area, but there are no fonts for it in Debian. |
Some conscripts in the [[http://www.evertype.com/standards/csur/|ConScript Unicode Registry]] (such as Klingon, Tengwar and Visible speech) have bitmap glyphs available in ''Unifont CSUR''. |
Line 68: | Line 91: |
http://www.unifont.org/fontguide/ http://www.alanwood.net/unicode/fonts.html http://en.wikipedia.org/wiki/Unicode_font http://www.evertype.com/celtscript/ http://www.travelphrases.info/fonts.html http://my.wikipedia.org/wiki/Wikipedia:Font http://sourceforge.net/projects/prahita http://sourceforge.net/projects/uniburma http://www.evertype.com/celtscript/ogfont.html http://www.gov.nu.ca/font.htm http://www.gov.nu.ca/Nunavut/English/font/ http://www.evertype.com/celtscript/inuktitut.html http://www.google.com/search?q=Inuktitut+fonts http://www.evertype.com/celtscript/ogfont.html |
http://en.wikipedia.org/wiki/Unicode_font http://www.unicode.org/standard/supported.html http://www.unicode.org/standard/unsupported.html http://www.unifont.org/fontguide/ http://www.alanwood.net/unicode/fonts.html http://www.travelphrases.info/fonts.html |
This page aims to document the Unicode coverage of all the fonts (including non-free ones) in Debian. It was created using fontforge, font-manager and gucharmap. You should also read the DebianInstaller/GUIFonts page to find out about the fonts used by the installer.
Please add any fonts which might fill in some of these blocks to the Fonts/Missing page.
TODO: automatically create this page based on the latest unicode-data package as well as all fonts in Debian unstable, using the fonts review:
https://pkg-fonts.alioth.debian.org/review/
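One possible starting point for the TODO above is sketched here in Python. It is only an illustration, not an existing tool: it parses text in the format of the Unicode Character Database file Blocks.txt and sorts blocks into "incomplete" and "empty" given two sets of code points. The sample block data, the function names `parse_blocks` and `classify`, and the way the `assigned` and `covered` sets are obtained are all assumptions made for this sketch.

```python
import re

# Sample lines in the format used by the Unicode Character Database file
# Blocks.txt. A real run would read the file itself (assumed to be shipped by
# the unicode-data package) instead of this hypothetical excerpt.
SAMPLE_BLOCKS = """\
# Start Code..End Code; Block Name
0590..05FF; Hebrew
11700..1174F; Ahom
10FB0..10FDF; Chorasmian
"""

BLOCK_RE = re.compile(r"^([0-9A-F]{4,6})\.\.([0-9A-F]{4,6});\s*(.+)$")


def parse_blocks(text):
    """Return a list of (first, last, name) tuples from Blocks.txt-style text."""
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = BLOCK_RE.match(line)
        if match:
            first, last, name = match.groups()
            blocks.append((int(first, 16), int(last, 16), name))
    return blocks


def classify(blocks, assigned, covered):
    """Sort blocks into 'incomplete' and 'empty'.

    assigned: set of code points that Unicode assigns (hypothetical input).
    covered:  set of code points that at least one font has a glyph for.
    Fully covered blocks and blocks without assigned characters are omitted.
    """
    report = {"incomplete": {}, "empty": []}
    for first, last, name in blocks:
        wanted = {cp for cp in assigned if first <= cp <= last}
        if not wanted:
            continue
        missing = sorted(wanted - covered)
        if len(missing) == len(wanted):
            report["empty"].append(name)
        elif missing:
            report["incomplete"][name] = missing
    return report


if __name__ == "__main__":
    # Made-up example data, not real coverage figures.
    assigned = {0x05EE, 0x05EF, 0x05F0, 0x11740, 0x10FB0}
    covered = {0x05EE, 0x05F0, 0x11740}
    print(classify(parse_blocks(SAMPLE_BLOCKS), assigned, covered))
```

The `covered` set would have to be built by reading the character maps of every font package in Debian unstable, for example with a font library; that step is left out here because it depends on which tool is chosen.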
Quick test
Firefox seems to render all Wikipedia languages:
http://meta.wikimedia.org/wiki/List_of_Wikipedias
https://phabricator.wikimedia.org/diffusion/MW/browse/master/languages/data/Names.php
Incomplete blocks
The characters missing from each block are listed below; most of them are available as bitmap glyphs in Unifont or Unifont Upper.
Ahom: U+11740 to U+11746
Arabic: U+061D
Arabic extended-A: U+08B5, U+08C8 to U+08D2 and U+08E2
Arabic presentation forms-A: U+FBC2, U+FD40 to U+FD4F, U+FDCF and U+FDFE to U+FDFF
Balinese: U+1B4C, U+1B7D and U+1B7E
Bopomofo extended: U+31BC to U+31BF
Brahmi: U+11070 to U+11075
Chakma: U+11147
CJK unified ideographs: U+9FF0 to U+9FFF
CJK unified ideographs extension A: U+4DB6 to U+4DBF
CJK unified ideographs extension B: U+2A6D7 to U+2A6DF
CJK unified ideographs extension C: U+2B735 to U+2B738
CJK unified ideographs extension G: U+30000 to U+30728, U+3072A to U+30EDC, U+30EDF to U+3106B and U+3106D to U+3134A
Combining diacritical marks extended: U+1AC1 to U+1ACE
Combining diacritical marks supplement: U+1DFA
Currency symbols: U+20C0
Enclosed alphanumeric supplement: U+1F10D to U+1F10F, U+1F16D to U+1F16F and U+1F1AD
Geometric shapes extended: U+1F7F0
Glagolitic: U+2C2F and U+2C5F
Hebrew: U+05EF
Ideographic symbols and punctuation: U+16FF0 and U+16FF1
Kaithi: U+110C2
Kana extended-A: U+1B11F to U+1B122
Kannada: U+0CDD
Latin extended-D: U+A7C0, U+A7C1, U+A7D0, U+A7D1, U+A7D3, U+A7D5 to U+A7D9 and U+A7F2 to U+A7F4
Mongolian: U+180F
Musical symbols: U+1D1E9 and U+1D1EA
Oriya: U+0B55
Supplemental punctuation: U+2E53 to U+2E5D
Supplemental symbols and pictographs: U+1F979 and U+1F9CC
Symbols and pictographs extended-A: U+1FA7B, U+1FA7C, U+1FAA9 to U+1FAAC, U+1FAB7 to U+1FABA, U+1FAC3 to U+1FAC5, U+1FAD7 to U+1FAD9, U+1FAE0 to U+1FAE7 and U+1FAF0 to U+1FAF6
Tagalog: U+170D, U+1715 and U+171F
Tangut components: U+18AF3 to U+18AFF
Telugu: U+0C3C and U+0C5D
Takri: U+116B9
Transport and map symbols: U+1F6DD to U+1F6DF
Vedic extensions: U+1CFA
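If the list above were generated rather than maintained by hand, entries could be produced from a plain list of missing code points. The following Python sketch shows one way to do that in the notation used on this page. The function names `to_ranges` and `format_entry` are hypothetical, and the choice to write two adjacent code points individually rather than as a range is an assumption based on entries such as Balinese.

```python
def to_ranges(code_points):
    """Collapse code points into sorted (first, last) runs of consecutive values."""
    ranges = []
    for cp in sorted(set(code_points)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp  # extend the current run
        else:
            ranges.append([cp, cp])  # start a new run
    return [(first, last) for first, last in ranges]


def format_entry(block_name, code_points):
    """Format one list entry, e.g. 'Tagalog: U+170D, U+1715 and U+171F'."""
    parts = []
    for first, last in to_ranges(code_points):
        if first == last:
            parts.append(f"U+{first:04X}")
        elif last == first + 1:
            # Two adjacent code points are listed individually.
            parts.append(f"U+{first:04X}")
            parts.append(f"U+{last:04X}")
        else:
            parts.append(f"U+{first:04X} to U+{last:04X}")
    if not parts:
        raise ValueError("no missing code points given")
    if len(parts) == 1:
        listing = parts[0]
    else:
        listing = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"{block_name}: {listing}"


if __name__ == "__main__":
    print(format_entry("Balinese", [0x1B4C, 0x1B7D, 0x1B7E]))
    print(format_entry("Arabic extended-A", [0x08B5, *range(0x08C8, 0x08D3), 0x08E2]))
```

With the Balinese and Arabic extended-A code points from the list above as input, this reproduces those two entries as they are written on this page.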
Completely empty blocks
Most of these blocks have bitmap glyphs available in Unifont or Unifont Upper.
Arabic extended-B (U+0870...)
Chorasmian (U+10FB0...)
Cypro-Minoan (U+12F90...)
Dives Akuru (U+11900...)
Ethiopic extended-B (U+1E7E0...)
Indic Siyaq numbers (U+1EC71...)
Kana extended-B (U+1AFF0...)
Khitan small script (U+18B00...)
Latin extended-F (U+10780...)
Latin extended-G (U+1DF00...)
Lisu supplement (U+11FB0)
Makasar (U+11EE0...)
Nandinagari (U+119A0...)
Old Uyghur (U+10F70...)
Ottoman Siyaq numbers (U+1ED00...)
Syriac supplement (U+0860...)
Tangsa (U+16A70...)
Tangut supplement (U+18D00...)
Toto (U+1E290...)
Unified Canadian Aboriginal Syllabics extended-A (U+11AB0...)
Vithkuqi (U+10570...)
Znamenny musical notation (U+1CF00...)
Some conscripts in the ConScript Unicode Registry (such as Klingon, Tengwar and Visible speech) have bitmap glyphs available in Unifont CSUR.