Differences between revisions 6 and 7
Revision 6 as of 2006-12-31 17:08:28
Size: 7690
Editor: OsamuAoki
Revision 7 as of 2007-01-12 11:43:16
Size: 7702
Editor: OsamuAoki

[:UTF8vim: (Japanese)] [:OsamuAoki: Wiki links]

Method to edit non-UTF-8 encoded text files under UTF-8 vim

vim running in a UTF-8 environment can handle text in any known language. But not all text data is stored in UTF-8 encoding. To edit a non-UTF-8 encoded text file, you need to convert its contents with iconv before and after the edit. The encodings available for the conversion can be listed with the iconv -l command. (Encoding names are case insensitive, but hyphen and underscore are treated as different characters.) Since converting the encoding manually in the shell is quite cumbersome, this page describes easier ways to do the conversion from within vim.
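For comparison, the manual approach can be sketched as below. This is only an illustration: the file names, the sample text, and the choice of ISO-8859-2 are assumptions made for the example, and the "edit" step is simulated by appending a line.

```shell
#!/bin/sh
# Manual round trip with iconv: convert to UTF-8, edit, convert back.
# All file names and the sample text are hypothetical.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Create a small ISO-8859-2 (latin2) sample file to work with.
printf 'Zażółć gęślą jaźń\n' | iconv -f UTF-8 -t ISO-8859-2 > legacy.txt

# 1. Convert to UTF-8 before editing.
iconv -f ISO-8859-2 -t UTF-8 legacy.txt > legacy.utf8.txt

# 2. Edit legacy.utf8.txt with any UTF-8 editor.
#    (Simulated here by appending one line.)
printf 'łódź\n' >> legacy.utf8.txt

# 3. Convert back to the original encoding after editing.
iconv -f UTF-8 -t ISO-8859-2 legacy.utf8.txt > legacy.txt
```

Every edit needs steps 1 and 3, which is the overhead that the vim-based methods on this page avoid.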

By default, vim in a UTF-8 environment is set to fileencodings=ucs-bom,utf-8,default,latin1. When a file is read, each encoding in this list is tried in order and the first one that decodes the file without error is used. latin1, the last entry, always succeeds, so it is selected automatically when nothing else matches, but the text will not be displayed readably if latin1 is not the file's real encoding. In this case, you can read the file correctly by reloading it with the correct encoding.
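The "first encoding that decodes cleanly wins" idea can be imitated in the shell with iconv. The sketch below is not how vim implements fileencodings; the function name try_encodings and the sample file are assumptions made for illustration only.

```shell
#!/bin/sh
# try_encodings FILE ENC...
# Print the first encoding in the list in which FILE decodes without error.
# Hypothetical helper; it only imitates the idea behind 'fileencodings'.
try_encodings() {
  file=$1
  shift
  for enc in "$@"; do
    if iconv -f "$enc" -t UTF-8 "$file" >/dev/null 2>&1; then
      printf '%s\n' "$enc"
      return 0
    fi
  done
  return 1
}

# Demo: the byte 0xE9 (octal 351) followed by a newline is not valid UTF-8,
# so the UTF-8 attempt fails and the single-byte encoding is reported.
sample=$(mktemp)
printf 'caf\351\n' > "$sample"
try_encodings "$sample" UTF-8 ISO-8859-1
rm -f "$sample"
```

A single-byte encoding such as ISO-8859-1 accepts any byte sequence, which is why it only makes sense as the last entry in the list and why a wrong guess shows up as unreadable text rather than as an error.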

Method to edit non-UTF-8 encoded text files from commandline

  • latin1 == iso-8859-1 will be selected automatically. Western European languages (en, fr, de, it, es, pt, nl)
  • For others, start editing with vim "+e ++enc=... filename" .

    • vim "+e ++enc=cp932 filename" Japanese (Windows3.1J, SHIFT_JIS)

    • vim "+e ++enc=eucjp filename" Japanese (Unix)

    • vim "+e ++enc=iso-2022-jp filename" Japanese (e-mail)

    • vim "+e ++enc=latin2 filename" Polish (latin2 == iso-8859-2)

    • vim "+e ++enc=koi8-r filename" Russian

    • vim "+e ++enc=gb2312 filename" Chinese (zh_CN, Simplified)

    • vim "+e ++enc=big5 filename" Chinese (zh_TW, Traditional)

    • vim "+e ++enc=euckr filename" Korean (Unix)

  • If the file was read with the wrong encoding, reload it by pressing [Esc] and typing :e ++enc=newencoding .

  • To write back in the original encoding, press [Esc] and type :w as usual to overwrite the original file.

  • To write in UTF-8 instead, press [Esc] and type :w ++enc=utf-8 newfilename.txt to save to a new file.
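If you open such files often, the command-line form above can be wrapped in a small shell function. This is a minimal sketch: the name vimenc and the VIM_BIN override are hypothetical, and file names containing spaces or characters special to vim's command line are not handled.

```shell
#!/bin/sh
# vimenc ENCODING FILE
# Open FILE in vim, forcing ENCODING when the file is read.
# VIM_BIN is a hypothetical override for the editor binary; setting it
# to "echo" shows the argument that would be passed instead of starting vim.
vimenc() {
  if [ "$#" -ne 2 ]; then
    echo "usage: vimenc ENCODING FILE" >&2
    return 2
  fi
  "${VIM_BIN:-vim}" "+e ++enc=$1 $2"
}
```

For example, vimenc cp932 memo.txt would start vim "+e ++enc=cp932 memo.txt", and VIM_BIN=echo vimenc cp932 memo.txt only prints that argument.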

Method to edit non-UTF-8 encoded text files via GUI

You can add menus to gvim for handling non-UTF-8 encoded files by adding the following script to ~/.vimrc. You should customize it for the encodings you need.

Then select Reload with ++enc... to reload a file and Save with ++enc... to save a file in a particular encoding.

" Menu:                 Access to old encodings and conversion
" Translated By:        Osamu AOKI  <osamu@debian.org>
" Last Change:          30-Dec-2006.
if has('iconv')
  " Check iconv version
  let support_jisx0213 = (iconv("\x87\x64\x87\x6a", 'cp932', 'euc-jisx0213') ==# "\xad\xc5\xad\xcb") ? 1 : 0

  an 10.328.100.100 &File.&Reload\ with\ ++enc\.\.\..&SJIS<Tab>fenc=cp932 :e ++enc=cp932<CR>
  if !support_jisx0213
    an 10.328.100.110 &File.&Reload\ with\ ++enc\.\.\..EUC&JP<Tab>fenc=euc-jp :e ++enc=euc-jp<CR>
    an 10.328.100.120 &File.&Reload\ with\ ++enc\.\.\..J&IS<Tab>fenc=iso-2022-jp :e ++enc=iso-2022-jp<CR>
  else
    an 10.328.100.110 &File.&Reload\ with\ ++enc\.\.\..EUC&JP<Tab>fenc=euc-jisx0213 :e ++enc=euc-jisx0213<CR>
    an 10.328.100.120 &File.&Reload\ with\ ++enc\.\.\..J&IS<Tab>fenc=iso-2022-jp-3 :e ++enc=iso-2022-jp-3<CR>
  endif
  an 10.328.100.130 &File.&Reload\ with\ ++enc\.\.\..EUC&KR<Tab>fenc=euckr :e ++enc=euckr<CR>
  an 10.328.100.140 &File.&Reload\ with\ ++enc\.\.\..&GB2312(zh_CN)<Tab>fenc=gb2312 :e ++enc=gb2312<CR>
  an 10.328.100.150 &File.&Reload\ with\ ++enc\.\.\..&BIG5(zh_TW)<Tab>fenc=big5 :e ++enc=big5<CR>
  an 10.328.100.200 &File.&Reload\ with\ ++enc\.\.\..-SEPRELOAD1- <Nop>
  an 10.328.100.201 &File.&Reload\ with\ ++enc\.\.\..latin&1<Tab>fenc=latin1 :e ++enc=latin1<CR>
  an 10.328.100.202 &File.&Reload\ with\ ++enc\.\.\..latin&2<Tab>fenc=latin2 :e ++enc=latin2<CR>
  an 10.328.100.203 &File.&Reload\ with\ ++enc\.\.\..latin&3<Tab>fenc=latin3 :e ++enc=latin3<CR>
  an 10.328.100.204 &File.&Reload\ with\ ++enc\.\.\..latin&4<Tab>fenc=latin4 :e ++enc=latin4<CR>
  an 10.328.100.205 &File.&Reload\ with\ ++enc\.\.\..latin&5<Tab>fenc=latin5 :e ++enc=latin5<CR>
  an 10.328.100.206 &File.&Reload\ with\ ++enc\.\.\..latin&6<Tab>fenc=latin6 :e ++enc=latin6<CR>
  an 10.328.100.207 &File.&Reload\ with\ ++enc\.\.\..latin&7<Tab>fenc=latin7 :e ++enc=latin7<CR>
  an 10.328.100.208 &File.&Reload\ with\ ++enc\.\.\..latin&8<Tab>fenc=latin8 :e ++enc=latin8<CR>
  an 10.328.100.209 &File.&Reload\ with\ ++enc\.\.\..latin&9<Tab>fenc=latin9 :e ++enc=latin9<CR>
  an 10.328.100.210 &File.&Reload\ with\ ++enc\.\.\..latin1&0<Tab>fenc=latin10 :e ++enc=latin10<CR>
  an 10.328.100.800 &File.&Reload\ with\ ++enc\.\.\..-SEPRELOAD2- <Nop>
  an 10.328.100.900 &File.&Reload\ with\ ++enc\.\.\..&UTF-8<Tab>fenc=utf-8 :e ++enc=utf-8<CR>

  " Save with ++enc as ...
  an 10.360.120.100 &File.&Save\ with\ ++enc\.\.\..&SJIS<Tab>fenc=cp932 :browse confirm saveas ++enc=cp932<CR>
  if !support_jisx0213
    an 10.360.120.110 &File.&Save\ with\ ++enc\.\.\..EUC&JP<Tab>fenc=euc-jp :browse confirm saveas ++enc=euc-jp<CR>
    an 10.360.120.120 &File.&Save\ with\ ++enc\.\.\..J&IS<Tab>fenc=iso-2022-jp :browse confirm saveas ++enc=iso-2022-jp<CR>
  else
    an 10.360.120.110 &File.&Save\ with\ ++enc\.\.\..EUC&JP<Tab>fenc=euc-jisx0213 :browse confirm saveas ++enc=euc-jisx0213<CR>
    an 10.360.120.120 &File.&Save\ with\ ++enc\.\.\..J&IS<Tab>fenc=iso-2022-jp-3 :browse confirm saveas ++enc=iso-2022-jp-3<CR>
  endif
  an 10.360.120.130 &File.&Save\ with\ ++enc\.\.\..EUC&KR<Tab>fenc=euckr :browse confirm saveas ++enc=euckr<CR>
  an 10.360.120.140 &File.&Save\ with\ ++enc\.\.\..&GB2312(zh_CN)<Tab>fenc=gb2312 :browse confirm saveas ++enc=gb2312<CR>
  an 10.360.120.150 &File.&Save\ with\ ++enc\.\.\..&BIG5(zh_TW)<Tab>fenc=big5 :browse confirm saveas ++enc=big5<CR>
  an 10.360.120.200 &File.&Save\ with\ ++enc\.\.\..-SEPSAVE1- <Nop>
  an 10.360.120.201 &File.&Save\ with\ ++enc\.\.\..latin&1<Tab>fenc=latin1 :browse confirm saveas ++enc=latin1<CR>
  an 10.360.120.202 &File.&Save\ with\ ++enc\.\.\..latin&2<Tab>fenc=latin2 :browse confirm saveas ++enc=latin2<CR>
  an 10.360.120.203 &File.&Save\ with\ ++enc\.\.\..latin&3<Tab>fenc=latin3 :browse confirm saveas ++enc=latin3<CR>
  an 10.360.120.204 &File.&Save\ with\ ++enc\.\.\..latin&4<Tab>fenc=latin4 :browse confirm saveas ++enc=latin4<CR>
  an 10.360.120.205 &File.&Save\ with\ ++enc\.\.\..latin&5<Tab>fenc=latin5 :browse confirm saveas ++enc=latin5<CR>
  an 10.360.120.206 &File.&Save\ with\ ++enc\.\.\..latin&6<Tab>fenc=latin6 :browse confirm saveas ++enc=latin6<CR>
  an 10.360.120.207 &File.&Save\ with\ ++enc\.\.\..latin&7<Tab>fenc=latin7 :browse confirm saveas ++enc=latin7<CR>
  an 10.360.120.208 &File.&Save\ with\ ++enc\.\.\..latin&8<Tab>fenc=latin8 :browse confirm saveas ++enc=latin8<CR>
  an 10.360.120.209 &File.&Save\ with\ ++enc\.\.\..latin&9<Tab>fenc=latin9 :browse confirm saveas ++enc=latin9<CR>
  an 10.360.120.210 &File.&Save\ with\ ++enc\.\.\..latin1&0<Tab>fenc=latin10 :browse confirm saveas ++enc=latin10<CR>
  an 10.360.120.800 &File.&Save\ with\ ++enc\.\.\..-SEPSAVE2- <Nop>
  an 10.360.120.900 &File.&Save\ with\ ++enc\.\.\..&UTF-8<Tab>fenc=utf-8 :browse confirm saveas ++enc=utf-8<CR>
endif

" filler to avoid the line above being recognized as a modeline
" filler
" filler
" filler

Entering characters not found on the keyboard

See [:JapaneseEnvironmentE: (English)] [:JapaneseEnvironment: (Japanese)] as an example.