CSets: Supplemental Unicode Mapping Tables

This archive is freeware.

The CSets collection is a set of mapping tables between various character sets and Unicode, intended to supply mappings that most character set conversion tools do not include.

This distribution grew out of several projects that involved text in many obscure character encodings. Many of these encodings are not supported by the most widely used character set conversion tools (e.g. iconv), so this package was assembled to provide the encoding information in a simple, consistent format.

No program is provided to perform the conversion between character sets, because the text files in which these encodings appear come in too wide a variety of formats. Developers and users are expected to write their own conversion programs using this data.
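As an illustration of the kind of conversion program meant here, the following is a minimal Python sketch, not part of CSets. It assumes, without confirmation from this page, that each table follows the layout of the Unicode Consortium's mapping files: one byte value and one Unicode code point per line, both hexadecimal with a 0x prefix and separated by whitespace, "#" starting a comment, a byte with no Unicode value left undefined, and "+" joining code points when one byte maps to a sequence. The names parse_mapping, decode and encode are hypothetical. Check the header of the actual file before relying on this layout.

```python
# Minimal sketch of a table-driven single-byte converter (hypothetical helper
# names). ASSUMPTION: tables use the Unicode Consortium mapping-file layout,
# "0xBYTE<whitespace>0xCODEPOINT[+0xCODEPOINT...]  # comment".


def parse_mapping(text):
    """Return a dict from byte value (int) to the Unicode text (str) it maps to."""
    table = {}
    for raw in text.split("\n"):
        line = raw.split("#", 1)[0].strip()  # drop comments, whitespace and any \r
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed with no Unicode value: leave it undefined
        # int(..., 16) accepts the "0x" prefix; "+" joins a multi-code-point sequence.
        table[int(fields[0], 16)] = "".join(chr(int(cp, 16)) for cp in fields[1].split("+"))
    return table


def decode(data, table, errors="replace"):
    """Convert bytes to str; unmapped bytes become U+FFFD or raise ValueError."""
    out = []
    for byte in data:
        if byte in table:
            out.append(table[byte])
        elif errors == "replace":
            out.append("\ufffd")
        else:
            raise ValueError(f"byte 0x{byte:02X} has no Unicode mapping")
    return "".join(out)


def encode(text, table):
    """Convert str to bytes, matching the longest mapped sequence first."""
    reverse = {}
    for byte in sorted(table):
        reverse.setdefault(table[byte], byte)  # lowest byte wins if text is duplicated
    longest = max(map(len, reverse))
    out = bytearray()
    i = 0
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in reverse:
                out.append(reverse[chunk])
                i += n
                break
        else:
            raise ValueError(f"{text[i]!r} at index {i} has no byte in this table")
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical sample rows, not copied from any CSets file.
    sample = "0x41\t0x0041\t# A\n0x80\t0x00E9\n0x81\t0x0065+0x0301\n0xFF\t\t# undefined\n"
    table = parse_mapping(sample)
    print(decode(b"A\x80\xff", table))  # 'Aé' followed by U+FFFD
    print(encode("Ae\u0301", table))    # b'A\x81'
```

For a real table, parse_mapping(open("KOI8U.TXT", encoding="latin-1").read()) would replace the inline sample, provided the file follows the assumed layout. The encoder tries the longest Unicode sequence first, so a byte mapped to a base letter plus combining mark is preferred over encoding the bare letter alone.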

Each mapping table listed below can be downloaded individually; complete archives are also available as:

Current Mapping Tables - Version 2.1 - 28 May 2008
No. | File | Description | Notes | Modified
1 | 8859-16.TXT | ISO 8859-16, Latin alphabet No. 10. | - | Added 11 January 2000.
2 | ALTVAR.TXT | Alternativnyj Variant Russian. | - | -
3 | ARMSCII-7.TXT | Armenian Standard Code for Information Interchange 1999, 7-bit encoding for transmission. | - | Added 13 November 2000.
4 | ARMSCII-8.TXT | Armenian Standard Code for Information Interchange 1999, 8-bit encoding for Windows and Unix. | - | Added 13 November 2000.
5 | ARMSCII-8A.TXT | Armenian Standard Code for Information Interchange 1999, alternative 8-bit encoding for DOS and Macintosh. | - | Added 13 November 2000.
6 | AST166-7.TXT | Armenian national standard AST166.1997, 7-bit encoding for transmission. | ARMSCII-7 is more current. | -
7 | AST166-8.TXT | Armenian national standard AST166.1997, 8-bit encoding for Windows and Unix. | ARMSCII-8 is more current. | -
8 | AST166-A.TXT | Armenian national standard AST166.1997, "A" encoding for DOS and MacOS. | ARMSCII-8A is more current. | -
9 | ATEX.TXT | ATeX Arabic transliteration. | - | -
10 | BRM.TXT | Buddhist Relief Mission transliteration encoding for Pali. | - | Added 25 July 2005.
11 | CP1133.TXT | IBM CP1133 Lao mapping. | - | Added 06 December 1999.
12 | CSCD.TXT | Chattha Sangayana CD Pali transliteration encoding. | - | Added 25 July 2005.
13 | CSCSX.TXT | Classical Sanskrit eXtended transliteration encoding. | - | Added 25 July 2005.
14 | CSXPLUS.TXT | Classical Sanskrit eXtended Plus transliteration encoding. | - | Added 25 July 2005.
15 | DECMCS.TXT | DEC Multinational Character Set 1987. | - | -
16 | EGAF.TXT | EGA Farsi (Persian). | Visual encoding. | -
17 | GEO-ITA.TXT | Georgian InfoTech/Academy encoding. | - | -
18 | GEO-PS.TXT | Georgian Parliament encoding. | - | -
19 | GN-LINUX.TXT | Linux console Guarani encoding. | - | Added 22 September 2005.
20 | GN-TIMESG.TXT | A Times New Roman-based variant encoding of Guarani. | - | Added 18 November 2005.
21 | GN-WIN.TXT | WIN-GN Guarani encoding for (La)TeX. | - | Added 22 September 2005.
22 | HAMSH.TXT | Hamshahri Persian encoding. | Visual encoding. | -
23 | IRANSYSTEM.TXT | Common Persian encoding. | Visual encoding. | Updated 21 January 2000.
24 | IRNA.TXT | IRNA Persian encoding. | Visual encoding. | -
25 | ISIRI2900.TXT | Older Persian encoding. | Visual encoding. | -
26 | ISIRI3342.TXT | ISIRI 3342 Persian mapping as actually used in Iran. | - | -
27 | ISO002.TXT | ISO 646 (IRV) mapping. | - | -
28 | ISO006.TXT | ISO 646-1991 mapping. | - | Added 14 November 2000.
29 | ISO053.TXT | ISO 5426-1980 Extended Latin for bibliographic use. | - | Added 03 November 2000.
30 | ISOIR111.TXT | ISO IR 111/ECMA Cyrillic. | - | Added 03 November 2000.
31 | JAGHBUB.TXT | Latin transliteration encoding for Middle Eastern languages. | - | Added 03 February 2006.
32 | KOI8RU.TXT | Obsolete Ukrainian encoding. | - | Updated 20 December 1999.
33 | KOI8U.TXT | KOI8 Ukrainian (RFC 2319). | - | Added 20 December 1999.
34 | KOI8UNI.TXT | Fingertip Software Unified Cyrillic. | - | Updated 20 December 1999.
35 | KZ1048.TXT | Kazakh national standard. | - | Added 14 June 2007.
36 | MOZPALI.TXT | Pali transliteration encoding. | - | Added 25 July 2005.
37 | MULELAO1.TXT | Mule G1 Lao mapping. | - | Added 06 December 1999.
38 | NAVLS.TXT | Linguist's Software Laser Navajo mapping. | - | Added 25 July 2005.
39 | NBSC.TXT | Nota Bene SerboCroat Latin (partial mapping). | - | -
40 | NORMYN.TXT | Normyn transliteration encoding for Pali. | - | Added 25 July 2005.
41 | PAFOR1.TXT | Foreign1 transliteration encoding for Pali. | - | Added 25 July 2005.
42 | PAKEW.TXT | Kew transliteration encoding for Pali. | - | Added 25 July 2005.
43 | PAKH2SKJ.TXT | KH2S_KJ transliteration encoding for Pali. | - | Added 25 July 2005.
44 | PALBIT.TXT | LeedsBit transliteration encoding for Pali. | - | Added 25 July 2005.
45 | PATRA.TXT | Times Roman A transliteration encoding for Pali. | - | Added 25 July 2005.
46 | PAVELT.TXT | Velthuis' (La)TeX sequences for Pali. | - | Added 25 July 2005.
47 | PAVRI.TXT | VRI transliteration encoding for Pali. | - | Added 25 July 2005.
48 | OSNOVAR.TXT | Osnovnoj Variant Russian. | - | -
49 | PTCP154.TXT | Paratype Cyrillic Asian. | - | Added 04 May 2005.
50 | RISCOS.TXT | Acorn RISC OS. | - | Added 09 May 2003.
51 | SEASCII.TXT | Stanford Extended ASCII (from RFC 698). | - | -
52 | SHIFTGB.TXT | Shifted GB2312.1980. | - | Updated 06 December 1999.
53 | SOCNET-C.TXT | Cyrillic font encoding used by http://www.serbianorthodoxchurch.net. | - | Added 27 May 2008.
54 | SOCNET-L.TXT | Latin font encoding used by http://www.serbianorthodoxchurch.net. | - | Added 27 May 2008.
55 | TEX-CMMI.TXT | TeX mapping for the Computer Modern Math Italic fonts. | - | -
56 | TEX-CMR.TXT | TeX mapping for the Computer Modern Roman fonts. | - | -
57 | TEX-CMSY.TXT | TeX mapping for the Computer Modern Symbol fonts. | - | -
58 | TEX-CMTI.TXT | TeX mapping for the Computer Modern Text Italic fonts. | - | -
59 | TEX-CMTT.TXT | TeX mapping for the Computer Modern Typewriter fonts. | - | -
60 | TIS620.TXT | TCCII 2533 1009 / TIS 620 Thai. | - | -
61 | UCODE.TXT | U-Code Russian. | - | -
62 | VIQRI.TXT | Vietnamese Quoted Readable Implicit. | - | -
63 | VISCII.TXT | VISCII 1.1 Vietnamese. | - | -
64 | VN5712-1.TXT | TCVN 5712-1 1993 Vietnamese. | - | -
65 | VN5712-2.TXT | TCVN 5712-2 1993 Vietnamese. | - | -
66 | VNI.TXT | VNISoft-encoded Vietnamese. | - | Added 25 July 2005.
67 | VPS.TXT | VPS-encoded Vietnamese. | - | Added 25 July 2005.


Other Encodings

The following packages are provided for encodings that a simple mapping table cannot adequately describe.


Mapping Table Comparison

Bruno Haible maintains a comparison of the mapping tables shipped by various packages, including CSets. He is also the author of GNU libiconv, which provides the iconv library and program on many Unix-like systems.
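At its core, comparing mapping tables from different packages means diffing them entry by entry. The following minimal Python sketch shows the idea; it is not Haible's tooling. It again assumes, without confirmation from this page, the Unicode Consortium layout of hexadecimal "0xBYTE 0xCODEPOINT" pairs with "#" comments, and the names parse_mapping and compare_tables are hypothetical.

```python
# Sketch: report how two byte-to-Unicode mapping tables disagree.
# ASSUMPTION: "0xBYTE 0xCODEPOINT[+0xCODEPOINT...]  # comment" line layout.


def parse_mapping(text):
    """Return a dict from byte value (int) to mapped Unicode text (str)."""
    table = {}
    for raw in text.split("\n"):
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2:  # skip blanks, comments and unmapped bytes
            table[int(fields[0], 16)] = "".join(chr(int(cp, 16)) for cp in fields[1].split("+"))
    return table


def compare_tables(a, b):
    """Return (bytes only in a, bytes only in b, bytes mapped differently)."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    differ = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_a, only_b, differ


def fmt(text):
    """Format mapped text as U+XXXX code points joined by '+'."""
    return "+".join(f"U+{ord(c):04X}" for c in text)


if __name__ == "__main__":
    # Two hypothetical tables for the same 8-bit charset, not real CSets data.
    one = parse_mapping("0x41 0x0041\n0x80 0x00E9\n0x81 0x00E8\n")
    two = parse_mapping("0x41 0x0041\n0x80 0x00C9\n0x82 0x00EA\n")
    only_one, only_two, changed = compare_tables(one, two)
    for byte in changed:
        print(f"0x{byte:02X}: {fmt(one[byte])} vs {fmt(two[byte])}")  # 0x80: U+00E9 vs U+00C9
```

Reporting bytes that only one table defines separately from bytes that both define differently keeps coverage gaps distinct from genuine disagreements about a character's identity.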