Add Apple Unicode mapping data

Mirrored from ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE
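
For anyone consuming these tables, here is a minimal Python sketch (not part of the mirrored data) of reading the three-column format described in the ARABIC.TXT header and applying its direction tags. The names `parse_table`, `decode`, and `SAMPLE` are hypothetical. The sketch assumes U+202E (RLO) as the override for `<RL>`, alongside the U+202D (LRO) and U+202C (PDF) that the header names. It also simplifies: every run of tagged characters is wrapped in overrides, even where the header notes they could be omitted because the surrounding strong characters already match.

```python
import re

LRO, RLO, PDF = "\u202d", "\u202e", "\u202c"
_OVERRIDE = {"LR": LRO, "RL": RLO}

# Column 1: 0xNN, column 2: optional <LR>+ or <RL>+ tag, then 0xNNNN.
_ENTRY = re.compile(r"^(0x[0-9A-Fa-f]{2})\t(?:<(LR|RL)>\+)?(0x[0-9A-Fa-f]{4})")


def parse_table(lines):
    """Return {mac_byte: (direction or None, unicode_char)}."""
    table = {}
    for line in lines:
        # In a diff view every file line carries a leading '+'; drop it.
        if line.startswith("+"):
            line = line[1:]
        m = _ENTRY.match(line)
        if m:  # comment and blank lines do not match and are skipped
            mac, direction, uni = m.groups()
            table[int(mac, 16)] = (direction, chr(int(uni, 16)))
    return table


def decode(data, table):
    """Map Mac OS Arabic bytes to Unicode, wrapping tagged runs in overrides."""
    out, current = [], None
    for byte in data:
        if byte < 0x20 or byte == 0x7F:
            # Control codes are not listed in the table; pass them through.
            direction, ch = None, chr(byte)
        else:
            direction, ch = table[byte]  # KeyError if the byte is unmapped
        if direction != current:
            if current is not None:
                out.append(PDF)  # close the previous override run
            if direction is not None:
                out.append(_OVERRIDE[direction])  # open a new run
            current = direction
        out.append(ch)
    if current is not None:
        out.append(PDF)
    return "".join(out)


SAMPLE = [
    "+0x20\t<LR>+0x0020\t# SPACE, left-right",
    "+0x24\t<LR>+0x0024\t# DOLLAR SIGN, left-right",
    "+0x28\t<LR>+0x0028\t# LEFT PARENTHESIS, left-right",
    "+0x29\t<LR>+0x0029\t# RIGHT PARENTHESIS, left-right",
    "+0x2B\t<LR>+0x002B\t# PLUS SIGN, left-right",
    "+0x41\t0x0041\t# LATIN CAPITAL LETTER A",
    "+0xAB\t<RL>+0x002B\t# PLUS SIGN, right-left",
]
table = parse_table(SAMPLE)
print([f"U+{ord(c):04X}" for c in decode(bytes([0x24, 0x20, 0x28, 0x29]), table)])
# ['U+202D', 'U+0024', 'U+0020', 'U+0028', 'U+0029', 'U+202C']
```

The printed sequence matches the multi-character example in the ARABIC.TXT header. The reverse direction (Unicode to Mac OS Arabic) needs resolved bidirectional levels and is not attempted here.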
2025-02-22 09:28:58 +00:00 · 2021-12-14 15:50:04 -05:00
commit d2401b963a
parent db4187b65b
29 changed files with 49398 additions and 0 deletions
--- a/charmap/ARABIC.TXT
+++ b/charmap/ARABIC.TXT
@@ -0,0 +1,536 @@
+#=======================================================================
+#   File name:  ARABIC.TXT
+#
+#   Contents:   Map (external version) from Mac OS Arabic
+#               character set to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1994-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-04    Update header comments. Matches internal xml
+#                           <c1.2> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Add comments about character display and
+#                           direction overrides. Update URLs, notes.
+#                           Matches internal utom<b4>.
+#       b02  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b1>, ufrm<b1>, and Text
+#                           Encoding Converter version 1.5.
+#       n10  1998-Feb-05    Show required Unicode character
+#                           directionality in a different way. Matches
+#                           internal utom<n4>, ufrm<n21>, and Text
+#                           Encoding Converter version 1.3. Update
+#                           header comments; include information on
+#                           loose mapping of digits.
+#       n07  1997-Jul-17    Update to match internal utom<n2>, ufrm<n17>:
+#                           Change standard mapping for 0xC0 from U+066D
+#                           to U+274A. Add direction overrides to
+#                           mappings for 0x25, 0x2C, 0x3B, 0x3F. Add
+#                           information on variants.
+#       n03  1995-Apr-18    First version (after fixing some typos).
+#                           Matches internal ufrm<n11>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect, 
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Arabic code (in hex as 0xNN).
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN),
+#       possibly preceded by a tag indicating required directionality
+#       (i.e. <LR>+0xNNNN or <RL>+0xNNNN).
+#     Column #3 is a comment containing the Unicode name.
+#
+#   The entries are in Mac OS Arabic code order.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Arabic character set uses the standard control characters at
+#   0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Arabic:
+# -----------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   1. General
+#
+#   The Mac OS Arabic character set is intended to cover Arabic as
+#   used in North Africa, the Arabian peninsula, and the Levant. It
+#   also contains several characters needed for Urdu and/or Farsi.
+#
+#   The Mac OS Arabic character set is essentially a superset of ISO
+#   8859-6. The 8859-6 code points that are interpreted differently
+#   in the Mac OS Arabic set are as follows:
+#    0xA0 is NO-BREAK SPACE in 8859-6 and right-left SPACE in Mac OS
+#         Arabic; NO-BREAK is 0x81 in Mac OS Arabic.
+#    0xA4 is CURRENCY SIGN in 8859-6 and right-left DOLLAR SIGN in
+#         Mac OS Arabic.
+#    0xAD is SOFT HYPHEN in 8859-6 and right-left HYPHEN-MINUS in
+#         Mac OS Arabic.
+#   ISO 8859-6 specifies that codes 0x30-0x39 can be rendered either
+#   with European digit shapes or Arabic digit shapes. This is also
+#   true in Mac OS Arabic, which determines from context which digit
+#   shapes to use (see below).
+#
+#   The Mac OS Arabic character set uses the C1 controls area and other
+#   code points which are undefined in ISO 8859-6 for additional
+#   graphic characters: additional Arabic letters for Farsi and Urdu,
+#   some accented Roman letters for European languages (such as French),
+#   and duplicates of some of the punctuation, symbols, and digits in
+#   the ASCII block. The duplicate punctuation, symbol, and digit
+#   characters have right-left directionality, while the ASCII versions
+#   have left-right directionality. See the next section for more
+#   information on this.
+#
+#   Mac OS Arabic characters 0xEB-0xF2 are non-spacing/combining marks.
+#
+#   2. Directional characters and roundtrip fidelity
+#
+#   The Mac OS Arabic character set was developed in 1986-1987. At that
+#   time the bidirectional line layout algorithm used in the Mac OS
+#   Arabic system was fairly simple; it used only a few direction
+#   classes (instead of the 19 now used in the Unicode bidirectional
+#   algorithm). In order to permit users to handle some tricky layout
+#   problems, certain punctuation and symbol characters were encoded
+#   twice, one with a left-right direction attribute and the other with
+#   a right-left direction attribute.
+#
+#   For example, plus sign is encoded at 0x2B with a left-right
+#   attribute, and at 0xAB with a right-left attribute. However, there
+#   is only one PLUS SIGN character in Unicode. This leads to some
+#   interesting problems when mapping between Mac OS Arabic and Unicode;
+#   see below.
+#
+#   A related problem is that even when a particular character is
+#   encoded only once in Mac OS Arabic, it may have a different
+#   direction attribute than the corresponding Unicode character.
+#
+#   For example, the Mac OS Arabic character at 0x93 is HORIZONTAL
+#   ELLIPSIS with strong right-left direction. However, the Unicode
+#   character HORIZONTAL ELLIPSIS has direction class neutral.
+#
+#   3. Behavior of ASCII-range numbers in WorldScript
+#
+#   Mac OS Arabic also has two sets of digit codes.
+#
+#   The digits at 0x30-0x39 may be displayed using either European
+#   digit forms or Arabic digit forms, depending on context. If there
+#   is a "strong European" character such as a Latin letter on either
+#   side of a sequence consisting of digits 0x30-0x39 and possibly comma
+#   0x2C or period 0x2E, then the characters will be displayed using
+#   European forms (this will happen even if there are neutral characters
+#   between the digits and the strong European character). Otherwise, the
+#   digits will be displayed using Arabic forms, the comma will be
+#   displayed as Arabic thousands separator, and the period as Arabic
+#   decimal separator. In any case, 0x2C, 0x2E, and 0x30-0x39 are always
+#   left-right.
+#
+#   The digits at 0xB0-0xB9 are always displayed using Arabic digit
+#   shapes, and moreover, these digits always have strong right-left
+#   directionality. These are mainly intended for special layout
+#   purposes such as part numbers, etc.
+#
+#   4. Font variants
+#
+#   The table in this file gives the Unicode mappings for the standard
+#   Mac OS Arabic encoding. This encoding is supported by the Cairo font
+#   (the system font for Arabic), and is the encoding supported by the
+#   text processing utilities. However, the other Arabic fonts actually
+#   implement slightly different encodings; this mainly affects the code
+#   points 0xAA and 0xC0. For these code points the standard Mac OS
+#   Arabic encoding has the following mappings:
+#     0xAA -> <RL>+0x002A ASTERISK, right-left
+#     0xC0 -> <RL>+0x274A EIGHT TEARDROP-SPOKED PROPELLER ASTERISK,
+#                         right-left
+#   This mapping of 0xAA is consistent with the normal convention for
+#   Mac OS Arabic and Hebrew that the right-left duplicates have codes
+#   that are equal to the ASCII code of the left-right character plus
+#   0x80. However, in all of the other fonts, 0xAA is MULTIPLY SIGN, and
+#   right-left ASTERISK may be at a different code point. The other
+#   variants are described below.
+#
+#   The TrueType variant is used for most of the Arabic TrueType fonts:
+#   Baghdad, Geeza, Kufi, Nadeem.  It differs from the standard variant
+#   in the following way:
+#     0xAA -> <RL>+0x00D7 MULTIPLICATION SIGN, right-left
+#     0xC0 -> <RL>+0x002A ASTERISK, right-left
+#
+#   The Thuluth variant is used for the Arabic Postscript-only fonts:
+#   Thuluth and Thuluth bold. It differs from the standard variant in
+#   the following way:
+#     0xAA -> <RL>+0x00D7 MULTIPLICATION SIGN, right-left
+#     0xC0 -> 0x066D ARABIC FIVE POINTED STAR
+#
+#   The AlBayan variant is used for the Arabic TrueType font Al Bayan.
+#   It differs from the standard variant in the following way:
+#     0x81 -> no mapping (glyph just has authorship information, etc.)
+#     0xA3 -> 0xFDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
+#     0xA4 -> 0xFDF2 ARABIC LIGATURE ALLAH ISOLATED FORM
+#     0xAA -> <RL>+0x00D7 MULTIPLICATION SIGN, right-left
+#     0xDC -> <RL>+0x25CF BLACK CIRCLE, right-left
+#     0xFC -> <RL>+0x25A0 BLACK SQUARE, right-left
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+#   1. Matching the direction of Mac OS Arabic characters
+#
+#   When Mac OS Arabic encodes a character twice but with different
+#   direction attributes for the two code points - as in the case of
+#   plus sign mentioned above - we need a way to map both Mac OS Arabic
+#   code points to Unicode and back again without loss of information.
+#   With the plus sign, for example, mapping one of the Mac OS Arabic
+#   characters to a code in the Unicode corporate use zone is
+#   undesirable, since both of the plus sign characters are likely to
+#   be used in text that is interchanged.
+#
+#   The problem is solved with the use of direction override characters
+#   and direction-dependent mappings. When mapping from Mac OS Arabic
+#   to Unicode, we use direction overrides as necessary to force the
+#   direction of the resulting Unicode characters.
+#
+#   The required direction is indicated by a direction tag in the
+#   mappings. A tag of <LR> means the corresponding Unicode character
+#   must have a strong left-right context, and a tag of <RL> indicates
+#   a right-left context.
+#
+#   For example, the mapping of 0x2B is given as <LR>+0x002B; the
+#   mapping of 0xAB is given as <RL>+0x002B. If we map an isolated
+#   instance of 0x2B to Unicode, it should be mapped as follows (LRO
+#   indicates LEFT-RIGHT OVERRIDE, PDF indicates POP DIRECTION
+#   FORMATTING):
+#
+#     0x2B ->  0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
+#
+#   When mapping several characters in a row that require direction
+#   forcing, the overrides need only be used at the beginning and end.
+#   For example:
+#
+#     0x24 0x20 0x28 0x29 -> 0x202D 0x0024 0x0020 0x0028 0x0029 0x202C
+#
+#   If neutral characters that require direction forcing are already
+#   between strong-direction characters with matching directionality,
+#   then direction overrides need not be used. Direction overrides are
+#   always needed to map the right-left digits at 0xB0-0xB9.
+#
+#   When mapping from Unicode to Mac OS Arabic, the Unicode
+#   bidirectional algorithm should be used to determine resolved
+#   direction of the Unicode characters. The mapping from Unicode to
+#   Mac OS Arabic can then be disambiguated by the use of the resolved
+#   direction:
+#
+#     Unicode 0x002B -> Mac OS Arabic 0x2B (if L) or 0xAB (if R)
+#
+#   However, this also means the direction override characters should
+#   be discarded when mapping from Unicode to Mac OS Arabic (after
+#   they have been used to determine resolved direction), since the
+#   direction override information is carried by the code point itself.
+#
+#   Even when direction overrides are not needed for roundtrip
+#   fidelity, they are sometimes used when mapping Mac OS Arabic
+#   characters to Unicode in order to achieve similar text layout with
+#   the resulting Unicode text. For example, the single Mac OS Arabic
+#   ellipsis character has direction class right-left, and there is no
+#   left-right version. However, the Unicode HORIZONTAL ELLIPSIS
+#   character has direction class neutral (which means it may end up
+#   with a resolved direction of left-right if surrounded by left-right
+#   characters). When mapping the Mac OS Arabic ellipsis to Unicode, it
+#   is surrounded with a direction override to help preserve proper
+#   text layout. The resolved direction is not needed or used when
+#   mapping the Unicode HORIZONTAL ELLIPSIS back to Mac OS Arabic.
+#
+#   2. Mapping the Mac OS Arabic digits
+#
+#   The main table below contains mappings that should be used when
+#   strict round-trip fidelity is required. However, for numeric
+#   values, the mappings in that table will produce Unicode characters
+#   that may appear different than the Mac OS Arabic text displayed on
+#   a Mac OS system using WorldScript. This is because WorldScript
+#   uses context-dependent display for the 0x30-0x39 digits.
+#
+#   If roundtrip fidelity is not required, then the following
+#   alternate mappings should be used when a sequence of 0x30-0x39
+#   digits - possibly including 0x2C and 0x2E - occurs in an Arabic
+#   context (that is, when the first "strong" character on either side
+#   of the digit sequence is Arabic, or there is no strong character):
+#
+#     0x2C	0x066C	# ARABIC THOUSANDS SEPARATOR
+#     0x2E	0x066B	# ARABIC DECIMAL SEPARATOR
+#     0x30	0x0660	# ARABIC-INDIC DIGIT ZERO
+#     0x31	0x0661	# ARABIC-INDIC DIGIT ONE
+#     0x32	0x0662	# ARABIC-INDIC DIGIT TWO
+#     0x33	0x0663	# ARABIC-INDIC DIGIT THREE
+#     0x34	0x0664	# ARABIC-INDIC DIGIT FOUR
+#     0x35	0x0665	# ARABIC-INDIC DIGIT FIVE
+#     0x36	0x0666	# ARABIC-INDIC DIGIT SIX
+#     0x37	0x0667	# ARABIC-INDIC DIGIT SEVEN
+#     0x38	0x0668	# ARABIC-INDIC DIGIT EIGHT
+#     0x39	0x0669	# ARABIC-INDIC DIGIT NINE
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version n03 to version n07:
+#
+#   - Change mapping for 0xC0 from U+066D to U+274A.
+#
+#   - Add direction overrides (required directionality) to mappings
+#     for 0x25, 0x2C, 0x3B, 0x3F.
+#
+##################
+
+0x20	<LR>+0x0020	# SPACE, left-right
+0x21	<LR>+0x0021	# EXCLAMATION MARK, left-right
+0x22	<LR>+0x0022	# QUOTATION MARK, left-right
+0x23	<LR>+0x0023	# NUMBER SIGN, left-right
+0x24	<LR>+0x0024	# DOLLAR SIGN, left-right
+0x25	<LR>+0x0025	# PERCENT SIGN, left-right
+0x26	<LR>+0x0026	# AMPERSAND, left-right
+0x27	<LR>+0x0027	# APOSTROPHE, left-right
+0x28	<LR>+0x0028	# LEFT PARENTHESIS, left-right
+0x29	<LR>+0x0029	# RIGHT PARENTHESIS, left-right
+0x2A	<LR>+0x002A	# ASTERISK, left-right
+0x2B	<LR>+0x002B	# PLUS SIGN, left-right
+0x2C	<LR>+0x002C	# COMMA, left-right; in Arabic-script context, displayed as 0x066C ARABIC THOUSANDS SEPARATOR
+0x2D	<LR>+0x002D	# HYPHEN-MINUS, left-right
+0x2E	<LR>+0x002E	# FULL STOP, left-right; in Arabic-script context, displayed as 0x066B ARABIC DECIMAL SEPARATOR
+0x2F	<LR>+0x002F	# SOLIDUS, left-right
+0x30	0x0030	# DIGIT ZERO;  in Arabic-script context, displayed as 0x0660 ARABIC-INDIC DIGIT ZERO
+0x31	0x0031	# DIGIT ONE;   in Arabic-script context, displayed as 0x0661 ARABIC-INDIC DIGIT ONE
+0x32	0x0032	# DIGIT TWO;   in Arabic-script context, displayed as 0x0662 ARABIC-INDIC DIGIT TWO
+0x33	0x0033	# DIGIT THREE; in Arabic-script context, displayed as 0x0663 ARABIC-INDIC DIGIT THREE
+0x34	0x0034	# DIGIT FOUR;  in Arabic-script context, displayed as 0x0664 ARABIC-INDIC DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE;  in Arabic-script context, displayed as 0x0665 ARABIC-INDIC DIGIT FIVE
+0x36	0x0036	# DIGIT SIX;   in Arabic-script context, displayed as 0x0666 ARABIC-INDIC DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN; in Arabic-script context, displayed as 0x0667 ARABIC-INDIC DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT; in Arabic-script context, displayed as 0x0668 ARABIC-INDIC DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE;  in Arabic-script context, displayed as 0x0669 ARABIC-INDIC DIGIT NINE
+0x3A	<LR>+0x003A	# COLON, left-right
+0x3B	<LR>+0x003B	# SEMICOLON, left-right
+0x3C	<LR>+0x003C	# LESS-THAN SIGN, left-right
+0x3D	<LR>+0x003D	# EQUALS SIGN, left-right
+0x3E	<LR>+0x003E	# GREATER-THAN SIGN, left-right
+0x3F	<LR>+0x003F	# QUESTION MARK, left-right
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	<LR>+0x005B	# LEFT SQUARE BRACKET, left-right
+0x5C	<LR>+0x005C	# REVERSE SOLIDUS, left-right
+0x5D	<LR>+0x005D	# RIGHT SQUARE BRACKET, left-right
+0x5E	<LR>+0x005E	# CIRCUMFLEX ACCENT, left-right
+0x5F	<LR>+0x005F	# LOW LINE, left-right
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	<LR>+0x007B	# LEFT CURLY BRACKET, left-right
+0x7C	<LR>+0x007C	# VERTICAL LINE, left-right
+0x7D	<LR>+0x007D	# RIGHT CURLY BRACKET, left-right
+0x7E	0x007E	# TILDE
+#
+0x80	0x00C4	# LATIN CAPITAL LETTER A WITH DIAERESIS
+0x81	<RL>+0x00A0	# NO-BREAK SPACE, right-left
+0x82	0x00C7	# LATIN CAPITAL LETTER C WITH CEDILLA
+0x83	0x00C9	# LATIN CAPITAL LETTER E WITH ACUTE
+0x84	0x00D1	# LATIN CAPITAL LETTER N WITH TILDE
+0x85	0x00D6	# LATIN CAPITAL LETTER O WITH DIAERESIS
+0x86	0x00DC	# LATIN CAPITAL LETTER U WITH DIAERESIS
+0x87	0x00E1	# LATIN SMALL LETTER A WITH ACUTE
+0x88	0x00E0	# LATIN SMALL LETTER A WITH GRAVE
+0x89	0x00E2	# LATIN SMALL LETTER A WITH CIRCUMFLEX
+0x8A	0x00E4	# LATIN SMALL LETTER A WITH DIAERESIS
+0x8B	0x06BA	# ARABIC LETTER NOON GHUNNA
+0x8C	<RL>+0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
+0x8D	0x00E7	# LATIN SMALL LETTER C WITH CEDILLA
+0x8E	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
+0x8F	0x00E8	# LATIN SMALL LETTER E WITH GRAVE
+0x90	0x00EA	# LATIN SMALL LETTER E WITH CIRCUMFLEX
+0x91	0x00EB	# LATIN SMALL LETTER E WITH DIAERESIS
+0x92	0x00ED	# LATIN SMALL LETTER I WITH ACUTE
+0x93	<RL>+0x2026	# HORIZONTAL ELLIPSIS, right-left
+0x94	0x00EE	# LATIN SMALL LETTER I WITH CIRCUMFLEX
+0x95	0x00EF	# LATIN SMALL LETTER I WITH DIAERESIS
+0x96	0x00F1	# LATIN SMALL LETTER N WITH TILDE
+0x97	0x00F3	# LATIN SMALL LETTER O WITH ACUTE
+0x98	<RL>+0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
+0x99	0x00F4	# LATIN SMALL LETTER O WITH CIRCUMFLEX
+0x9A	0x00F6	# LATIN SMALL LETTER O WITH DIAERESIS
+0x9B	<RL>+0x00F7	# DIVISION SIGN, right-left
+0x9C	0x00FA	# LATIN SMALL LETTER U WITH ACUTE
+0x9D	0x00F9	# LATIN SMALL LETTER U WITH GRAVE
+0x9E	0x00FB	# LATIN SMALL LETTER U WITH CIRCUMFLEX
+0x9F	0x00FC	# LATIN SMALL LETTER U WITH DIAERESIS
+0xA0	<RL>+0x0020	# SPACE, right-left
+0xA1	<RL>+0x0021	# EXCLAMATION MARK, right-left
+0xA2	<RL>+0x0022	# QUOTATION MARK, right-left
+0xA3	<RL>+0x0023	# NUMBER SIGN, right-left
+0xA4	<RL>+0x0024	# DOLLAR SIGN, right-left
+0xA5	0x066A	# ARABIC PERCENT SIGN
+0xA6	<RL>+0x0026	# AMPERSAND, right-left
+0xA7	<RL>+0x0027	# APOSTROPHE, right-left
+0xA8	<RL>+0x0028	# LEFT PARENTHESIS, right-left
+0xA9	<RL>+0x0029	# RIGHT PARENTHESIS, right-left
+0xAA	<RL>+0x002A	# ASTERISK, right-left
+0xAB	<RL>+0x002B	# PLUS SIGN, right-left
+0xAC	0x060C	# ARABIC COMMA
+0xAD	<RL>+0x002D	# HYPHEN-MINUS, right-left
+0xAE	<RL>+0x002E	# FULL STOP, right-left
+0xAF	<RL>+0x002F	# SOLIDUS, right-left
+0xB0	<RL>+0x0660	# ARABIC-INDIC DIGIT ZERO, right-left (need override)
+0xB1	<RL>+0x0661	# ARABIC-INDIC DIGIT ONE, right-left (need override)
+0xB2	<RL>+0x0662	# ARABIC-INDIC DIGIT TWO, right-left (need override)
+0xB3	<RL>+0x0663	# ARABIC-INDIC DIGIT THREE, right-left (need override)
+0xB4	<RL>+0x0664	# ARABIC-INDIC DIGIT FOUR, right-left (need override)
+0xB5	<RL>+0x0665	# ARABIC-INDIC DIGIT FIVE, right-left (need override)
+0xB6	<RL>+0x0666	# ARABIC-INDIC DIGIT SIX, right-left (need override)
+0xB7	<RL>+0x0667	# ARABIC-INDIC DIGIT SEVEN, right-left (need override)
+0xB8	<RL>+0x0668	# ARABIC-INDIC DIGIT EIGHT, right-left (need override)
+0xB9	<RL>+0x0669	# ARABIC-INDIC DIGIT NINE, right-left (need override)
+0xBA	<RL>+0x003A	# COLON, right-left
+0xBB	0x061B	# ARABIC SEMICOLON
+0xBC	<RL>+0x003C	# LESS-THAN SIGN, right-left
+0xBD	<RL>+0x003D	# EQUALS SIGN, right-left
+0xBE	<RL>+0x003E	# GREATER-THAN SIGN, right-left
+0xBF	0x061F	# ARABIC QUESTION MARK
+0xC0	<RL>+0x274A	# EIGHT TEARDROP-SPOKED PROPELLER ASTERISK, right-left
+0xC1	0x0621	# ARABIC LETTER HAMZA
+0xC2	0x0622	# ARABIC LETTER ALEF WITH MADDA ABOVE
+0xC3	0x0623	# ARABIC LETTER ALEF WITH HAMZA ABOVE
+0xC4	0x0624	# ARABIC LETTER WAW WITH HAMZA ABOVE
+0xC5	0x0625	# ARABIC LETTER ALEF WITH HAMZA BELOW
+0xC6	0x0626	# ARABIC LETTER YEH WITH HAMZA ABOVE
+0xC7	0x0627	# ARABIC LETTER ALEF
+0xC8	0x0628	# ARABIC LETTER BEH
+0xC9	0x0629	# ARABIC LETTER TEH MARBUTA
+0xCA	0x062A	# ARABIC LETTER TEH
+0xCB	0x062B	# ARABIC LETTER THEH
+0xCC	0x062C	# ARABIC LETTER JEEM
+0xCD	0x062D	# ARABIC LETTER HAH
+0xCE	0x062E	# ARABIC LETTER KHAH
+0xCF	0x062F	# ARABIC LETTER DAL
+0xD0	0x0630	# ARABIC LETTER THAL
+0xD1	0x0631	# ARABIC LETTER REH
+0xD2	0x0632	# ARABIC LETTER ZAIN
+0xD3	0x0633	# ARABIC LETTER SEEN
+0xD4	0x0634	# ARABIC LETTER SHEEN
+0xD5	0x0635	# ARABIC LETTER SAD
+0xD6	0x0636	# ARABIC LETTER DAD
+0xD7	0x0637	# ARABIC LETTER TAH
+0xD8	0x0638	# ARABIC LETTER ZAH
+0xD9	0x0639	# ARABIC LETTER AIN
+0xDA	0x063A	# ARABIC LETTER GHAIN
+0xDB	<RL>+0x005B	# LEFT SQUARE BRACKET, right-left
+0xDC	<RL>+0x005C	# REVERSE SOLIDUS, right-left
+0xDD	<RL>+0x005D	# RIGHT SQUARE BRACKET, right-left
+0xDE	<RL>+0x005E	# CIRCUMFLEX ACCENT, right-left
+0xDF	<RL>+0x005F	# LOW LINE, right-left
+0xE0	0x0640	# ARABIC TATWEEL
+0xE1	0x0641	# ARABIC LETTER FEH
+0xE2	0x0642	# ARABIC LETTER QAF
+0xE3	0x0643	# ARABIC LETTER KAF
+0xE4	0x0644	# ARABIC LETTER LAM
+0xE5	0x0645	# ARABIC LETTER MEEM
+0xE6	0x0646	# ARABIC LETTER NOON
+0xE7	0x0647	# ARABIC LETTER HEH
+0xE8	0x0648	# ARABIC LETTER WAW
+0xE9	0x0649	# ARABIC LETTER ALEF MAKSURA
+0xEA	0x064A	# ARABIC LETTER YEH
+0xEB	0x064B	# ARABIC FATHATAN
+0xEC	0x064C	# ARABIC DAMMATAN
+0xED	0x064D	# ARABIC KASRATAN
+0xEE	0x064E	# ARABIC FATHA
+0xEF	0x064F	# ARABIC DAMMA
+0xF0	0x0650	# ARABIC KASRA
+0xF1	0x0651	# ARABIC SHADDA
+0xF2	0x0652	# ARABIC SUKUN
+0xF3	0x067E	# ARABIC LETTER PEH
+0xF4	0x0679	# ARABIC LETTER TTEH
+0xF5	0x0686	# ARABIC LETTER TCHEH
+0xF6	0x06D5	# ARABIC LETTER AE
+0xF7	0x06A4	# ARABIC LETTER VEH
+0xF8	0x06AF	# ARABIC LETTER GAF
+0xF9	0x0688	# ARABIC LETTER DDAL
+0xFA	0x0691	# ARABIC LETTER RREH
+0xFB	<RL>+0x007B	# LEFT CURLY BRACKET, right-left
+0xFC	<RL>+0x007C	# VERTICAL LINE, right-left
+0xFD	<RL>+0x007D	# RIGHT CURLY BRACKET, right-left
+0xFE	0x0698	# ARABIC LETTER JEH
+0xFF	0x06D2	# ARABIC LETTER YEH BARREE
--- a/charmap/CELTIC.TXT
+++ b/charmap/CELTIC.TXT
@@ -0,0 +1,328 @@
+#=======================================================================
+#   File name:  CELTIC.TXT
+#
+#   Contents:   Map (external version) from Mac OS Celtic
+#               character set to Unicode 2.1 and later
+#
+#   Contacts:   charsets@apple.com, everson@evertype.com
+#
+#   Changes:
+#
+#       c01  2005-Apr-01    First posted version. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Celtic code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN)
+#     Column #3 is a comment containing the Unicode name
+#
+#   The entries are in Mac OS Celtic code order.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Celtic character set uses the standard control characters
+#   at 0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Celtic (partly from Michael Everson):
+# -----------------------------------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   This character set was developed by Michael Everson of Everson
+#   Typography (everson@evertype.com) and was used for the Irish
+#   localizations of Mac OS 6.0.8 and 7.1, for the Welsh localization of
+#   Mac OS 7.1, and for several fonts that can be used on any version of
+#   Mac OS 7.1 or later. Note that while Apple authorized
+#   the Irish and Welsh localizations mentioned above, they were not
+#   systems which shipped with Apple hardware, and were not otherwise
+#   supported by Apple. Fonts conforming to the Mac OS Celtic character
+#   set are available from Everson Typography (http://www.evertype.com)
+#   and MEU Cymru (http://www.meucymru.co.uk). Information about the use
+#   of this character set is available at
+#   http://www.evertype.com/celtscript/celtcode.html.
+#
+#   The Mac OS Celtic encoding shares the script code smRoman (0) with
+#   the standard Mac OS Roman encoding. To determine if the Celtic
+#   encoding is being used in Mac OS 7-9, you should also check if the
+#   system region code is 50, verIreland, or 79, verWales. Otherwise,
+#   you can check for particular fonts that conform to this encoding.
+#
+#   This character set is a variant of standard Mac OS Roman, adding
+#   capital and small y with acute, grave, and circumflex, and capital
+#   and small w with acute, grave, circumflex and diaeresis. It has 14
+#   code point differences from standard Mac OS Roman (0xDE, 0xDF, 0xE2,
+#   0xE3, 0xF6-0xFF).
+#
+#   Before Mac OS 8.5, code point 0xDB was CURRENCY SIGN, and was
+#   mapped to U+00A4. In Mac OS 8.5 and later versions, code point
+#   0xDB is changed to EURO SIGN and maps to U+20AC; the standard
+#   Apple fonts were updated for Mac OS 8.5 to reflect this. There is
+#   a "currency sign" variant of the Mac OS Celtic encoding that still
+#   maps 0xDB to U+00A4; this can be used for older fonts.
+#   Note: U+20AC is new with Unicode 2.1; for earlier Unicode
+#   versions, Mac OS Celtic 0xDB may be mapped to private-use
+#   character U+F8A0.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00C4	# LATIN CAPITAL LETTER A WITH DIAERESIS
+0x81	0x00C5	# LATIN CAPITAL LETTER A WITH RING ABOVE
+0x82	0x00C7	# LATIN CAPITAL LETTER C WITH CEDILLA
+0x83	0x00C9	# LATIN CAPITAL LETTER E WITH ACUTE
+0x84	0x00D1	# LATIN CAPITAL LETTER N WITH TILDE
+0x85	0x00D6	# LATIN CAPITAL LETTER O WITH DIAERESIS
+0x86	0x00DC	# LATIN CAPITAL LETTER U WITH DIAERESIS
+0x87	0x00E1	# LATIN SMALL LETTER A WITH ACUTE
+0x88	0x00E0	# LATIN SMALL LETTER A WITH GRAVE
+0x89	0x00E2	# LATIN SMALL LETTER A WITH CIRCUMFLEX
+0x8A	0x00E4	# LATIN SMALL LETTER A WITH DIAERESIS
+0x8B	0x00E3	# LATIN SMALL LETTER A WITH TILDE
+0x8C	0x00E5	# LATIN SMALL LETTER A WITH RING ABOVE
+0x8D	0x00E7	# LATIN SMALL LETTER C WITH CEDILLA
+0x8E	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
+0x8F	0x00E8	# LATIN SMALL LETTER E WITH GRAVE
+0x90	0x00EA	# LATIN SMALL LETTER E WITH CIRCUMFLEX
+0x91	0x00EB	# LATIN SMALL LETTER E WITH DIAERESIS
+0x92	0x00ED	# LATIN SMALL LETTER I WITH ACUTE
+0x93	0x00EC	# LATIN SMALL LETTER I WITH GRAVE
+0x94	0x00EE	# LATIN SMALL LETTER I WITH CIRCUMFLEX
+0x95	0x00EF	# LATIN SMALL LETTER I WITH DIAERESIS
+0x96	0x00F1	# LATIN SMALL LETTER N WITH TILDE
+0x97	0x00F3	# LATIN SMALL LETTER O WITH ACUTE
+0x98	0x00F2	# LATIN SMALL LETTER O WITH GRAVE
+0x99	0x00F4	# LATIN SMALL LETTER O WITH CIRCUMFLEX
+0x9A	0x00F6	# LATIN SMALL LETTER O WITH DIAERESIS
+0x9B	0x00F5	# LATIN SMALL LETTER O WITH TILDE
+0x9C	0x00FA	# LATIN SMALL LETTER U WITH ACUTE
+0x9D	0x00F9	# LATIN SMALL LETTER U WITH GRAVE
+0x9E	0x00FB	# LATIN SMALL LETTER U WITH CIRCUMFLEX
+0x9F	0x00FC	# LATIN SMALL LETTER U WITH DIAERESIS
+0xA0	0x2020	# DAGGER
+0xA1	0x00B0	# DEGREE SIGN
+0xA2	0x00A2	# CENT SIGN
+0xA3	0x00A3	# POUND SIGN
+0xA4	0x00A7	# SECTION SIGN
+0xA5	0x2022	# BULLET
+0xA6	0x00B6	# PILCROW SIGN
+0xA7	0x00DF	# LATIN SMALL LETTER SHARP S
+0xA8	0x00AE	# REGISTERED SIGN
+0xA9	0x00A9	# COPYRIGHT SIGN
+0xAA	0x2122	# TRADE MARK SIGN
+0xAB	0x00B4	# ACUTE ACCENT
+0xAC	0x00A8	# DIAERESIS
+0xAD	0x2260	# NOT EQUAL TO
+0xAE	0x00C6	# LATIN CAPITAL LETTER AE
+0xAF	0x00D8	# LATIN CAPITAL LETTER O WITH STROKE
+0xB0	0x221E	# INFINITY
+0xB1	0x00B1	# PLUS-MINUS SIGN
+0xB2	0x2264	# LESS-THAN OR EQUAL TO
+0xB3	0x2265	# GREATER-THAN OR EQUAL TO
+0xB4	0x00A5	# YEN SIGN
+0xB5	0x00B5	# MICRO SIGN
+0xB6	0x2202	# PARTIAL DIFFERENTIAL
+0xB7	0x2211	# N-ARY SUMMATION
+0xB8	0x220F	# N-ARY PRODUCT
+0xB9	0x03C0	# GREEK SMALL LETTER PI
+0xBA	0x222B	# INTEGRAL
+0xBB	0x00AA	# FEMININE ORDINAL INDICATOR
+0xBC	0x00BA	# MASCULINE ORDINAL INDICATOR
+0xBD	0x03A9	# GREEK CAPITAL LETTER OMEGA
+0xBE	0x00E6	# LATIN SMALL LETTER AE
+0xBF	0x00F8	# LATIN SMALL LETTER O WITH STROKE
+0xC0	0x00BF	# INVERTED QUESTION MARK
+0xC1	0x00A1	# INVERTED EXCLAMATION MARK
+0xC2	0x00AC	# NOT SIGN
+0xC3	0x221A	# SQUARE ROOT
+0xC4	0x0192	# LATIN SMALL LETTER F WITH HOOK
+0xC5	0x2248	# ALMOST EQUAL TO
+0xC6	0x2206	# INCREMENT
+0xC7	0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC8	0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC9	0x2026	# HORIZONTAL ELLIPSIS
+0xCA	0x00A0	# NO-BREAK SPACE
+0xCB	0x00C0	# LATIN CAPITAL LETTER A WITH GRAVE
+0xCC	0x00C3	# LATIN CAPITAL LETTER A WITH TILDE
+0xCD	0x00D5	# LATIN CAPITAL LETTER O WITH TILDE
+0xCE	0x0152	# LATIN CAPITAL LIGATURE OE
+0xCF	0x0153	# LATIN SMALL LIGATURE OE
+0xD0	0x2013	# EN DASH
+0xD1	0x2014	# EM DASH
+0xD2	0x201C	# LEFT DOUBLE QUOTATION MARK
+0xD3	0x201D	# RIGHT DOUBLE QUOTATION MARK
+0xD4	0x2018	# LEFT SINGLE QUOTATION MARK
+0xD5	0x2019	# RIGHT SINGLE QUOTATION MARK
+0xD6	0x00F7	# DIVISION SIGN
+0xD7	0x25CA	# LOZENGE
+0xD8	0x00FF	# LATIN SMALL LETTER Y WITH DIAERESIS
+0xD9	0x0178	# LATIN CAPITAL LETTER Y WITH DIAERESIS
+0xDA	0x2044	# FRACTION SLASH
+0xDB	0x20AC	# EURO SIGN # before Mac OS 8.5 this was U+00A4 CURRENCY SIGN
+0xDC	0x2039	# SINGLE LEFT-POINTING ANGLE QUOTATION MARK
+0xDD	0x203A	# SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
+0xDE	0x0176	# LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
+0xDF	0x0177	# LATIN SMALL LETTER Y WITH CIRCUMFLEX
+0xE0	0x2021	# DOUBLE DAGGER
+0xE1	0x00B7	# MIDDLE DOT
+0xE2	0x1EF2	# LATIN CAPITAL LETTER Y WITH GRAVE
+0xE3	0x1EF3	# LATIN SMALL LETTER Y WITH GRAVE
+0xE4	0x2030	# PER MILLE SIGN
+0xE5	0x00C2	# LATIN CAPITAL LETTER A WITH CIRCUMFLEX
+0xE6	0x00CA	# LATIN CAPITAL LETTER E WITH CIRCUMFLEX
+0xE7	0x00C1	# LATIN CAPITAL LETTER A WITH ACUTE
+0xE8	0x00CB	# LATIN CAPITAL LETTER E WITH DIAERESIS
+0xE9	0x00C8	# LATIN CAPITAL LETTER E WITH GRAVE
+0xEA	0x00CD	# LATIN CAPITAL LETTER I WITH ACUTE
+0xEB	0x00CE	# LATIN CAPITAL LETTER I WITH CIRCUMFLEX
+0xEC	0x00CF	# LATIN CAPITAL LETTER I WITH DIAERESIS
+0xED	0x00CC	# LATIN CAPITAL LETTER I WITH GRAVE
+0xEE	0x00D3	# LATIN CAPITAL LETTER O WITH ACUTE
+0xEF	0x00D4	# LATIN CAPITAL LETTER O WITH CIRCUMFLEX
+0xF0	0x2663	# BLACK CLUB SUIT = shamrock # future mapping U+2618 SHAMROCK
+0xF1	0x00D2	# LATIN CAPITAL LETTER O WITH GRAVE
+0xF2	0x00DA	# LATIN CAPITAL LETTER U WITH ACUTE
+0xF3	0x00DB	# LATIN CAPITAL LETTER U WITH CIRCUMFLEX
+0xF4	0x00D9	# LATIN CAPITAL LETTER U WITH GRAVE
+0xF5	0x0131	# LATIN SMALL LETTER DOTLESS I
+0xF6	0x00DD	# LATIN CAPITAL LETTER Y WITH ACUTE
+0xF7	0x00FD	# LATIN SMALL LETTER Y WITH ACUTE
+0xF8	0x0174	# LATIN CAPITAL LETTER W WITH CIRCUMFLEX
+0xF9	0x0175	# LATIN SMALL LETTER W WITH CIRCUMFLEX
+0xFA	0x1E84	# LATIN CAPITAL LETTER W WITH DIAERESIS
+0xFB	0x1E85	# LATIN SMALL LETTER W WITH DIAERESIS
+0xFC	0x1E80	# LATIN CAPITAL LETTER W WITH GRAVE
+0xFD	0x1E81	# LATIN SMALL LETTER W WITH GRAVE
+0xFE	0x1E82	# LATIN CAPITAL LETTER W WITH ACUTE
+0xFF	0x1E83	# LATIN SMALL LETTER W WITH ACUTE
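The header notes for Mac OS Celtic describe three possible targets for byte 0xDB: U+20AC (the standard mapping), U+00A4 (the "currency sign" variant for older fonts), and private-use U+F8A0 (for Unicode versions earlier than 2.1). A minimal Python sketch of how a consumer might select among them is shown here. The function name `apply_celtic_variant` and its parameters are hypothetical and are not part of the Apple data or of any library; `table` is assumed to be a plain dict from Mac OS byte value to a one-character string.

```python
EURO_SIGN = "\u20ac"          # standard mapping for 0xDB
CURRENCY_SIGN = "\u00a4"      # "currency sign" variant, for older fonts
PRIVATE_USE_EURO = "\uf8a0"   # fallback for Unicode versions before 2.1


def apply_celtic_variant(table, variant="euro", unicode_has_euro=True):
    """Return a copy of `table` with 0xDB set according to the chosen variant.

    `variant` and `unicode_has_euro` are illustrative parameter names only.
    """
    out = dict(table)  # leave the caller's table untouched
    if variant == "currency":
        out[0xDB] = CURRENCY_SIGN
    elif unicode_has_euro:
        out[0xDB] = EURO_SIGN
    else:
        out[0xDB] = PRIVATE_USE_EURO
    return out
```

Returning a copy keeps the standard table reusable when both the euro and the currency sign variants are needed in the same process.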
--- a/charmap/CENTEURO.TXT
+++ b/charmap/CENTEURO.TXT
@ -0,0 +1,327 @@
+#=======================================================================
+#   File name:  CENTEURO.TXT
+#
+#   Contents:   Map (external version) from Mac OS Central European
+#               character set to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-04    Update header comments. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Update URLs. Matches internal utom<b1>.
+#       b02  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b1>, ufrm<b1>, and Text
+#                           Encoding Converter version 1.5.
+#       n05  1998-Feb-05    Update header comments to new format; no
+#                           mapping changes. Matches internal utom<n3>,
+#                           ufrm<n13>, and Text Encoding Converter
+#                           version 1.3.
+#       n03  1995-Apr-15    First version (after fixing some typos).
+#                           Matches internal ufrm<n5>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect, 
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Central European code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN)
+#     Column #3 is a comment containing the Unicode name
+#
+#   The entries are in Mac OS Central European code order.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Central European character set uses the standard control
+#   characters at 0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Central European:
+# ---------------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported directly in programming
+#   interfaces for QuickDraw Text, the Script Manager, and related
+#   Text Utilities. For other purposes it is supported via transcoding
+#   to and from Unicode.
+#
+#   This character set is intended to cover the following languages:
+#
+#   Polish, Czech, Slovak, Hungarian, Estonian, Latvian, Lithuanian
+#
+#   These are written in Latin script, but using a different set of
+#   accented characters than Mac OS Roman. The Mac OS Central
+#   European character set also includes a number of characters
+#   needed for the Mac OS user interface and localization (e.g.
+#   ellipsis, bullet, copyright sign), several typographic
+#   punctuation symbols, math symbols, etc. However, it has a
+#   smaller set of punctuation and symbols than Mac OS Roman. All of
+#   the characters in Mac OS Central European that are also in the
+#   Mac OS Roman character set are at the same code point in both
+#   character sets; this improves application compatibility.
+#
+#   Note: This does not have the same letter repertoire as ISO
+#   8859-2 (Latin-2); each has some accented letters that the other
+#   does not have.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00C4	# LATIN CAPITAL LETTER A WITH DIAERESIS
+0x81	0x0100	# LATIN CAPITAL LETTER A WITH MACRON
+0x82	0x0101	# LATIN SMALL LETTER A WITH MACRON
+0x83	0x00C9	# LATIN CAPITAL LETTER E WITH ACUTE
+0x84	0x0104	# LATIN CAPITAL LETTER A WITH OGONEK
+0x85	0x00D6	# LATIN CAPITAL LETTER O WITH DIAERESIS
+0x86	0x00DC	# LATIN CAPITAL LETTER U WITH DIAERESIS
+0x87	0x00E1	# LATIN SMALL LETTER A WITH ACUTE
+0x88	0x0105	# LATIN SMALL LETTER A WITH OGONEK
+0x89	0x010C	# LATIN CAPITAL LETTER C WITH CARON
+0x8A	0x00E4	# LATIN SMALL LETTER A WITH DIAERESIS
+0x8B	0x010D	# LATIN SMALL LETTER C WITH CARON
+0x8C	0x0106	# LATIN CAPITAL LETTER C WITH ACUTE
+0x8D	0x0107	# LATIN SMALL LETTER C WITH ACUTE
+0x8E	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
+0x8F	0x0179	# LATIN CAPITAL LETTER Z WITH ACUTE
+0x90	0x017A	# LATIN SMALL LETTER Z WITH ACUTE
+0x91	0x010E	# LATIN CAPITAL LETTER D WITH CARON
+0x92	0x00ED	# LATIN SMALL LETTER I WITH ACUTE
+0x93	0x010F	# LATIN SMALL LETTER D WITH CARON
+0x94	0x0112	# LATIN CAPITAL LETTER E WITH MACRON
+0x95	0x0113	# LATIN SMALL LETTER E WITH MACRON
+0x96	0x0116	# LATIN CAPITAL LETTER E WITH DOT ABOVE
+0x97	0x00F3	# LATIN SMALL LETTER O WITH ACUTE
+0x98	0x0117	# LATIN SMALL LETTER E WITH DOT ABOVE
+0x99	0x00F4	# LATIN SMALL LETTER O WITH CIRCUMFLEX
+0x9A	0x00F6	# LATIN SMALL LETTER O WITH DIAERESIS
+0x9B	0x00F5	# LATIN SMALL LETTER O WITH TILDE
+0x9C	0x00FA	# LATIN SMALL LETTER U WITH ACUTE
+0x9D	0x011A	# LATIN CAPITAL LETTER E WITH CARON
+0x9E	0x011B	# LATIN SMALL LETTER E WITH CARON
+0x9F	0x00FC	# LATIN SMALL LETTER U WITH DIAERESIS
+0xA0	0x2020	# DAGGER
+0xA1	0x00B0	# DEGREE SIGN
+0xA2	0x0118	# LATIN CAPITAL LETTER E WITH OGONEK
+0xA3	0x00A3	# POUND SIGN
+0xA4	0x00A7	# SECTION SIGN
+0xA5	0x2022	# BULLET
+0xA6	0x00B6	# PILCROW SIGN
+0xA7	0x00DF	# LATIN SMALL LETTER SHARP S
+0xA8	0x00AE	# REGISTERED SIGN
+0xA9	0x00A9	# COPYRIGHT SIGN
+0xAA	0x2122	# TRADE MARK SIGN
+0xAB	0x0119	# LATIN SMALL LETTER E WITH OGONEK
+0xAC	0x00A8	# DIAERESIS
+0xAD	0x2260	# NOT EQUAL TO
+0xAE	0x0123	# LATIN SMALL LETTER G WITH CEDILLA
+0xAF	0x012E	# LATIN CAPITAL LETTER I WITH OGONEK
+0xB0	0x012F	# LATIN SMALL LETTER I WITH OGONEK
+0xB1	0x012A	# LATIN CAPITAL LETTER I WITH MACRON
+0xB2	0x2264	# LESS-THAN OR EQUAL TO
+0xB3	0x2265	# GREATER-THAN OR EQUAL TO
+0xB4	0x012B	# LATIN SMALL LETTER I WITH MACRON
+0xB5	0x0136	# LATIN CAPITAL LETTER K WITH CEDILLA
+0xB6	0x2202	# PARTIAL DIFFERENTIAL
+0xB7	0x2211	# N-ARY SUMMATION
+0xB8	0x0142	# LATIN SMALL LETTER L WITH STROKE
+0xB9	0x013B	# LATIN CAPITAL LETTER L WITH CEDILLA
+0xBA	0x013C	# LATIN SMALL LETTER L WITH CEDILLA
+0xBB	0x013D	# LATIN CAPITAL LETTER L WITH CARON
+0xBC	0x013E	# LATIN SMALL LETTER L WITH CARON
+0xBD	0x0139	# LATIN CAPITAL LETTER L WITH ACUTE
+0xBE	0x013A	# LATIN SMALL LETTER L WITH ACUTE
+0xBF	0x0145	# LATIN CAPITAL LETTER N WITH CEDILLA
+0xC0	0x0146	# LATIN SMALL LETTER N WITH CEDILLA
+0xC1	0x0143	# LATIN CAPITAL LETTER N WITH ACUTE
+0xC2	0x00AC	# NOT SIGN
+0xC3	0x221A	# SQUARE ROOT
+0xC4	0x0144	# LATIN SMALL LETTER N WITH ACUTE
+0xC5	0x0147	# LATIN CAPITAL LETTER N WITH CARON
+0xC6	0x2206	# INCREMENT
+0xC7	0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC8	0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC9	0x2026	# HORIZONTAL ELLIPSIS
+0xCA	0x00A0	# NO-BREAK SPACE
+0xCB	0x0148	# LATIN SMALL LETTER N WITH CARON
+0xCC	0x0150	# LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
+0xCD	0x00D5	# LATIN CAPITAL LETTER O WITH TILDE
+0xCE	0x0151	# LATIN SMALL LETTER O WITH DOUBLE ACUTE
+0xCF	0x014C	# LATIN CAPITAL LETTER O WITH MACRON
+0xD0	0x2013	# EN DASH
+0xD1	0x2014	# EM DASH
+0xD2	0x201C	# LEFT DOUBLE QUOTATION MARK
+0xD3	0x201D	# RIGHT DOUBLE QUOTATION MARK
+0xD4	0x2018	# LEFT SINGLE QUOTATION MARK
+0xD5	0x2019	# RIGHT SINGLE QUOTATION MARK
+0xD6	0x00F7	# DIVISION SIGN
+0xD7	0x25CA	# LOZENGE
+0xD8	0x014D	# LATIN SMALL LETTER O WITH MACRON
+0xD9	0x0154	# LATIN CAPITAL LETTER R WITH ACUTE
+0xDA	0x0155	# LATIN SMALL LETTER R WITH ACUTE
+0xDB	0x0158	# LATIN CAPITAL LETTER R WITH CARON
+0xDC	0x2039	# SINGLE LEFT-POINTING ANGLE QUOTATION MARK
+0xDD	0x203A	# SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
+0xDE	0x0159	# LATIN SMALL LETTER R WITH CARON
+0xDF	0x0156	# LATIN CAPITAL LETTER R WITH CEDILLA
+0xE0	0x0157	# LATIN SMALL LETTER R WITH CEDILLA
+0xE1	0x0160	# LATIN CAPITAL LETTER S WITH CARON
+0xE2	0x201A	# SINGLE LOW-9 QUOTATION MARK
+0xE3	0x201E	# DOUBLE LOW-9 QUOTATION MARK
+0xE4	0x0161	# LATIN SMALL LETTER S WITH CARON
+0xE5	0x015A	# LATIN CAPITAL LETTER S WITH ACUTE
+0xE6	0x015B	# LATIN SMALL LETTER S WITH ACUTE
+0xE7	0x00C1	# LATIN CAPITAL LETTER A WITH ACUTE
+0xE8	0x0164	# LATIN CAPITAL LETTER T WITH CARON
+0xE9	0x0165	# LATIN SMALL LETTER T WITH CARON
+0xEA	0x00CD	# LATIN CAPITAL LETTER I WITH ACUTE
+0xEB	0x017D	# LATIN CAPITAL LETTER Z WITH CARON
+0xEC	0x017E	# LATIN SMALL LETTER Z WITH CARON
+0xED	0x016A	# LATIN CAPITAL LETTER U WITH MACRON
+0xEE	0x00D3	# LATIN CAPITAL LETTER O WITH ACUTE
+0xEF	0x00D4	# LATIN CAPITAL LETTER O WITH CIRCUMFLEX
+0xF0	0x016B	# LATIN SMALL LETTER U WITH MACRON
+0xF1	0x016E	# LATIN CAPITAL LETTER U WITH RING ABOVE
+0xF2	0x00DA	# LATIN CAPITAL LETTER U WITH ACUTE
+0xF3	0x016F	# LATIN SMALL LETTER U WITH RING ABOVE
+0xF4	0x0170	# LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
+0xF5	0x0171	# LATIN SMALL LETTER U WITH DOUBLE ACUTE
+0xF6	0x0172	# LATIN CAPITAL LETTER U WITH OGONEK
+0xF7	0x0173	# LATIN SMALL LETTER U WITH OGONEK
+0xF8	0x00DD	# LATIN CAPITAL LETTER Y WITH ACUTE
+0xF9	0x00FD	# LATIN SMALL LETTER Y WITH ACUTE
+0xFA	0x0137	# LATIN SMALL LETTER K WITH CEDILLA
+0xFB	0x017B	# LATIN CAPITAL LETTER Z WITH DOT ABOVE
+0xFC	0x0141	# LATIN CAPITAL LETTER L WITH STROKE
+0xFD	0x017C	# LATIN SMALL LETTER Z WITH DOT ABOVE
+0xFE	0x0122	# LATIN CAPITAL LETTER G WITH CEDILLA
+0xFF	0x02C7	# CARON
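The "Format" notes in CENTEURO.TXT describe three tab-separated columns, with '#' starting a comment, and state that control characters 0x00-0x1F and 0x7F are standard but not listed. A minimal Python sketch of a reader for that layout follows. The names `parse_mapping`, `with_controls`, `decode`, and `SAMPLE` are hypothetical, and the sketch assumes every row carries exactly one Unicode code point, as the rows in this table do; it is not the Text Encoding Converter's implementation.

```python
# A few rows copied from the table above, used only as sample input.
SAMPLE = "\n".join([
    "# comment-only lines are skipped",
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A",
    "0x88\t0x0105\t# LATIN SMALL LETTER A WITH OGONEK",
    "0xE4\t0x0161\t# LATIN SMALL LETTER S WITH CARON",
])


def parse_mapping(text):
    """Build {mac_os_byte: unicode_character} from mapping-table text."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the comment column
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        table[int(fields[0], 16)] = chr(int(fields[1], 16))
    return table


def with_controls(table):
    """Add the unlisted control characters 0x00-0x1F and 0x7F as identity."""
    full = {b: chr(b) for b in list(range(0x20)) + [0x7F]}
    full.update(table)
    return full


def decode(data, table):
    """Decode bytes with the table; an unmapped byte raises KeyError."""
    return "".join(table[b] for b in data)
```

For example, `decode(b"A\x88\n", with_controls(parse_mapping(SAMPLE)))` looks up each byte in turn, taking the line feed from the identity control mappings and the other two bytes from the parsed rows.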
--- a/charmap/CHINSIMP.TXT
+++ b/charmap/CHINSIMP.TXT
--- a/charmap/CHINTRAD.TXT
+++ b/charmap/CHINTRAD.TXT
--- a/charmap/CORPCHAR.TXT
+++ b/charmap/CORPCHAR.TXT
@ -0,0 +1,519 @@
+#=======================================================================
+#   File name:  CORPCHAR.TXT
+#
+#   Contents:   Registry (external version) of Apple use of
+#               Unicode corporate-zone characters.
+#
+#   Copyright:  (c) 1994-2003, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c03  2005-Apr-04    Deprecate 0xF8E6. Matches internal registry
+#                           <c1.3>
+#       c02  2003-Feb-18    Add entry for 0xF802.
+#      b4,c1 2002-Dec-19    Add entries for 0xF700-0xF747 and 0xF803-
+#                           0xF84F; update replacement characters for
+#                           0xF883, 0xF8AA, 0xF8B4, 0xF8B7, 0xF8BD,
+#                           0xF8D7-0xF8E4, 0xF8EB-0xF8F3, 0xF8F5-
+#                           0xF8FE. Deprecate 0xF8E7, 0xF8F4. Delete Mac
+#                           OS Greek mapping for 0xF8A0. Update URLs.
+#                           Matches internal registry <b7>.
+#       b03  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal registry <b3> and Text Encoding
+#                           Converter version 1.5.
+#       b02  1998-Aug-18    Expanded usage of 0xF8A0. Matches internal
+#                           registry <b3>.
+#       n11  1998-Feb-05    Minor update to header comments
+#       n09  1997-Dec-14    Update to match internal registry <n23>:
+#                           Add source hint 0xF850, transcoding hints
+#                           0xF860-0xF86B and 0xF870-0xF872, deprecate
+#                           almost all other non-hint corporate
+#                           characters.
+#       n08  1997-Jul-17    Update to match internal registry <n13>:
+#                           Add characters for Mac OS Chinese, Korean &
+#                           Farsi. Add CJK source hints. Deprecate some
+#                           characters in favor of combinations of
+#                           standard characters and transcoding hints.
+#                           Change header format.
+#       n04  1995-Nov-15    Update to match internal registry <n8>:
+#                           Add characters for Mac OS Hebrew and Thai.
+#       n02  1995-Apr-18    First version. Matches internal registry
+#                           <n5>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect, 
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Two tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Unicode corporate character code point
+#           (in hex as 0xNNNN)
+#     Column #2 is a comment containing:
+#       1)  an informal name describing the Unicode corporate character,
+#           or if it is deprecated, information about what to use
+#           instead.
+#       2)  optionally, another '#', followed by information on which
+#           Mac OS encodings use the Unicode corporate character, and -
+#           if relevant - the Mac OS code points that correspond to the
+#           corporate character.
+#
+#   The entries are in Unicode order.
+#_______________________________________________________________________
+
+# NeXT's OpenStep reserved corporate characters in the range 0xF700 to
+# 0xF8FF for transient use as keyboard function keys. The ones actually
+# assigned in NextStep are 0xF700-0xF747, as follows. These are still
+# used in the Mac OS X AppKit frameworks. Note that there is no glyph
+# associated with these, and they are not mapped or used by the Mac OS
+# Text Encoding Converter.
+0xF700	# NSUpArrowFunctionKey
+0xF701	# NSDownArrowFunctionKey
+0xF702	# NSLeftArrowFunctionKey
+0xF703	# NSRightArrowFunctionKey
+0xF704	# NSF1FunctionKey
+0xF705	# NSF2FunctionKey
+0xF706	# NSF3FunctionKey
+0xF707	# NSF4FunctionKey
+0xF708	# NSF5FunctionKey
+0xF709	# NSF6FunctionKey
+0xF70A	# NSF7FunctionKey
+0xF70B	# NSF8FunctionKey
+0xF70C	# NSF9FunctionKey
+0xF70D	# NSF10FunctionKey
+0xF70E	# NSF11FunctionKey
+0xF70F	# NSF12FunctionKey
+0xF710	# NSF13FunctionKey
+0xF711	# NSF14FunctionKey
+0xF712	# NSF15FunctionKey
+0xF713	# NSF16FunctionKey
+0xF714	# NSF17FunctionKey
+0xF715	# NSF18FunctionKey
+0xF716	# NSF19FunctionKey
+0xF717	# NSF20FunctionKey
+0xF718	# NSF21FunctionKey
+0xF719	# NSF22FunctionKey
+0xF71A	# NSF23FunctionKey
+0xF71B	# NSF24FunctionKey
+0xF71C	# NSF25FunctionKey
+0xF71D	# NSF26FunctionKey
+0xF71E	# NSF27FunctionKey
+0xF71F	# NSF28FunctionKey
+0xF720	# NSF29FunctionKey
+0xF721	# NSF30FunctionKey
+0xF722	# NSF31FunctionKey
+0xF723	# NSF32FunctionKey
+0xF724	# NSF33FunctionKey
+0xF725	# NSF34FunctionKey
+0xF726	# NSF35FunctionKey
+0xF727	# NSInsertFunctionKey
+0xF728	# NSDeleteFunctionKey
+0xF729	# NSHomeFunctionKey
+0xF72A	# NSBeginFunctionKey
+0xF72B	# NSEndFunctionKey
+0xF72C	# NSPageUpFunctionKey
+0xF72D	# NSPageDownFunctionKey
+0xF72E	# NSPrintScreenFunctionKey
+0xF72F	# NSScrollLockFunctionKey
+0xF730	# NSPauseFunctionKey
+0xF731	# NSSysReqFunctionKey
+0xF732	# NSBreakFunctionKey
+0xF733	# NSResetFunctionKey
+0xF734	# NSStopFunctionKey
+0xF735	# NSMenuFunctionKey
+0xF736	# NSUserFunctionKey
+0xF737	# NSSystemFunctionKey
+0xF738	# NSPrintFunctionKey
+0xF739	# NSClearLineFunctionKey
+0xF73A	# NSClearDisplayFunctionKey
+0xF73B	# NSInsertLineFunctionKey
+0xF73C	# NSDeleteLineFunctionKey
+0xF73D	# NSInsertCharFunctionKey
+0xF73E	# NSDeleteCharFunctionKey
+0xF73F	# NSPrevFunctionKey
+0xF740	# NSNextFunctionKey
+0xF741	# NSSelectFunctionKey
+0xF742	# NSExecuteFunctionKey
+0xF743	# NSUndoFunctionKey
+0xF744	# NSRedoFunctionKey
+0xF745	# NSFindFunctionKey
+0xF746	# NSHelpFunctionKey
+0xF747	# NSModeSwitchFunctionKey
+
+# The following (11) are for mapping the Mac OS Keyboard and Mac OS Korean
+# encodings (for Mac OS Korean also see 0xF83D, 0xF840-0xF84F).
+0xF802	# lower left pencil # Keyboard-0x0F
+0xF803	# contextual menu symbol # Keyboard-0x6D
+0xF804	# eject symbol # Keyboard-0x8C
+0xF805	# black diamond minus white square # Korean-0xA658
+0xF806	# black square minus white diamond # Korean-0xA663
+0xF807	# telephone dial # Korean-0xA69F
+0xF808	# five vertical lines # Korean-0xA68F
+0xF809	# one downward-pointing black triangle over two others # Korean-0xA681
+0xF80A	# two interwoven eye shapes # Korean-0xA674
+0xF80B	# narrow-leaf four-petal florette # Korean-0xA696
+0xF80C	# four interleaved fisheyes # Korean-0xA69A
+
+# The following (51) are mainly for mapping the dingbat/fleuron repertoire
+# of the Hoefler Ornaments font, which is otherwise unmappable to Unicode.
+# 0xF83D is also used for mapping MacKorean.
+0xF80D	# horizontal line thickening at center # Hoefler Ornaments glyph 6
+0xF80E	# dotted X design 1 # Hoefler Ornaments glyph 7
+0xF80F	# dotted X design 2 # Hoefler Ornaments glyph 8
+0xF810	# dotted X design 3 # Hoefler Ornaments glyph 9
+0xF811	# dotted X design 4 # Hoefler Ornaments glyph 10
+0xF812	# horizontal line with wasp waist at center # Hoefler Ornaments glyph 11
+0xF813	# horizontal line thickening at center, alternate # Hoefler Ornaments glyph 12
+0xF814	# half-filled fleuron  1 # Hoefler Ornaments glyph 13
+0xF815	# half-filled fleuron  2 # Hoefler Ornaments glyph 14
+0xF816	# half-filled fleuron  3 # Hoefler Ornaments glyph 15
+0xF817	# half-filled fleuron  4 # Hoefler Ornaments glyph 16
+0xF818	# half-filled fleuron  5 # Hoefler Ornaments glyph 17
+0xF819	# half-filled fleuron  6 # Hoefler Ornaments glyph 18
+0xF81A	# half-filled fleuron  7 # Hoefler Ornaments glyph 19
+0xF81B	# half-filled fleuron  8 # Hoefler Ornaments glyph 20
+0xF81C	# half-filled fleuron  9 # Hoefler Ornaments glyph 21
+0xF81D	# half-filled fleuron 10 # Hoefler Ornaments glyph 22
+0xF81E	# half-filled fleuron 11 # Hoefler Ornaments glyph 23
+0xF81F	# half-filled fleuron 12 # Hoefler Ornaments glyph 24
+0xF820	# half-filled fleuron 13 # Hoefler Ornaments glyph 25
+0xF821	# half-filled fleuron 14 # Hoefler Ornaments glyph 26
+0xF822	# half-filled fleuron 15 # Hoefler Ornaments glyph 27
+0xF823	# half-filled fleuron 16 # Hoefler Ornaments glyph 28
+0xF824	# half-filled dingbat 1 # Hoefler Ornaments glyph 29
+0xF825	# half-filled dingbat 2 # Hoefler Ornaments glyph 30
+0xF826	# half-filled dingbat 3 # Hoefler Ornaments glyph 31
+0xF827	# filled fleuron  1 # Hoefler Ornaments glyph 34
+0xF828	# filled fleuron  2 # Hoefler Ornaments glyph 35
+0xF829	# filled fleuron  3 # Hoefler Ornaments glyph 36
+0xF82A	# filled fleuron  4 # Hoefler Ornaments glyph 37
+0xF82B	# filled fleuron  5 # Hoefler Ornaments glyph 38
+0xF82C	# filled fleuron  6 # Hoefler Ornaments glyph 39
+0xF82D	# filled fleuron  7 # Hoefler Ornaments glyph 40
+0xF82E	# filled fleuron  8 # Hoefler Ornaments glyph 41
+0xF82F	# filled fleuron  9 # Hoefler Ornaments glyph 42
+0xF830	# filled fleuron 10 # Hoefler Ornaments glyph 43
+0xF831	# filled fleuron 11 # Hoefler Ornaments glyph 44
+0xF832	# filled fleuron 12 # Hoefler Ornaments glyph 45
+0xF833	# filled fleuron 13 # Hoefler Ornaments glyph 46
+0xF834	# filled fleuron 14 # Hoefler Ornaments glyph 47
+0xF835	# filled fleuron 15 # Hoefler Ornaments glyph 48
+0xF836	# filled fleuron 16 # Hoefler Ornaments glyph 49
+0xF837	# filled dingbat 1 # Hoefler Ornaments glyph 50
+0xF838	# filled dingbat 2 # Hoefler Ornaments glyph 51
+0xF839	# filled dingbat 3 # Hoefler Ornaments glyph 52
+0xF83A	# sun with face # Hoefler Ornaments glyph 53
+0xF83B	# moon with face # Hoefler Ornaments glyph 54
+0xF83C	# crown # Hoefler Ornaments glyph 55
+0xF83D	# fleur-de-lis # Korean-0xA642, Hoefler Ornaments glyph 57
+0xF83E	# sailing ship # Hoefler Ornaments glyph 58
+0xF83F	# fleuron 17 # Hoefler Ornaments glyph 59
+
+# The following (16) are for mapping the Mac OS Korean encoding
+# (also see 0xF805-0xF80C, 0xF83D).
+0xF840	# three asterisks aligned vertically # Korean-0xA16E
+0xF841	# left right up down arrow # Korean-0xA894
+0xF842	# downwards wave arrow # Korean-0xAC54
+0xF843	# leftwards white arrow from wall (cf. U+21F0) # Korean-0xAC42
+0xF844	# black leftwards arrowhead (cf. U+27A4) # Korean-0xAC49
+0xF845	# black-feathered leftwards arrow (cf. U+27B5) # Korean-0xAC5F
+0xF846	# leftwards arrowhead with tail of spreading ripples # Korean-0xA867
+0xF847	# rightwards arrowhead with tail of spreading ripples # Korean-0xA868
+0xF848	# large white leftwards arrow with white fins # Korean-0xA89D
+0xF849	# large white rightwards arrow with white fins # Korean-0xA89C
+0xF84A	# leftwards arrow with bow # Korean-0xAC4B
+0xF84B	# rightwards arrow with bow # Korean-0xAC4A
+0xF84C	# pentagon # Korean-0xA747
+0xF84D	# trapezoid # Korean-0xA74B
+0xF84E	# quadrilateral with shorter right side # Korean-0xA74C
+0xF84F	# quadrilateral with shorter left side # Korean-0xA74D
+
+# The block of 16 characters 0xF850-0xF85F is for source hint characters.
+# These have no display (like zero-width no-break space). If they appear
+# in text, they can only be mapped to tables that include them. If a run
+# of Unicode characters such as Han characters could otherwise be mapped
+# to any of several encodings, including one of these hint characters can
+# force the text to be mapped only to an encoding whose mapping table
+# includes the hint character. Once they have forced mapping to a particular
+# encoding, they no longer apply (they don't need to be cancelled); if a
+# subsequent character cannot be mapped to that encoding, it may be mapped
+# to another encoding. Currently source hints are mainly defined for CJK
+# source disambiguation.
+# NOTE: These are only defined for application developers who have requested
+# them. The Mac OS Text Encoding Converter does not generate these when
+# converting from other CJK encodings to Unicode. However, it will handle
+# these characters correctly when converting from Unicode to other encodings.
+0xF850	# source hint: Reset, try all candidate encodings in preferred order.
+0xF85C	# source hint: Chinese simplified
+0xF85D	# source hint: Chinese traditional
+0xF85E	# source hint: Japanese
+0xF85F	# source hint: Korean
+
+# The block of 32 characters 0xF860-0xF87F is for transcoding hints.
+# These are used in combination with standard Unicode characters to force
+# them to be treated in a special way for mapping to other encodings;
+# they have no other effect.
+#
+# 0xF870-0xF87F are "variant tags" - they are like combining characters,
+# and can follow a standard Unicode character (or a sequence consisting of
+# a base character and other combining characters) to tag it so that it
+# will be unique and treated in a special way for transcoding. These always
+# terminate a sequence of combining characters.
+#
+# 0xF860-0xF86B are "grouping hints" - they precede a group of two to
+# four standard Unicode characters to indicate that they are treated as a
+# group for transcoding. This grouping overrides any other combining
+# behavior.
+#
+# Here are the ones defined so far:
+0xF860	# transcoding hint: group next 2 characters # Japanese,Korean
+0xF861	# transcoding hint: group next 3 characters # Japanese,Korean
+0xF862	# transcoding hint: group next 4 characters # Japanese,Korean
+0xF863	# transcoding hint: group next 4 characters, alt1 # Korean
+0xF864	# transcoding hint: group next 4 characters, alt2 # Korean
+0xF865	# transcoding hint: group next 4 characters, alt3 # Korean
+0xF866	# transcoding hint: group next 4 characters, alt4 # Korean
+0xF867	# transcoding hint: group next 2 characters, alt1 # Korean
+0xF868	# transcoding hint: group next 2 characters, alt2 # Korean
+0xF869	# transcoding hint: group next 2 characters, alt3 # Korean
+0xF86A	# transcoding hint: group next 2 characters, RL # Hebrew
+0xF86B	# transcoding hint: group next 4 characters, RL # Farsi variant
+#
+0xF870	# transcoding hint: variant tag 16 # Symbol, Korean
+0xF871	# transcoding hint: variant tag 15 # Symbol, Korean
+0xF872	# transcoding hint: variant tag 14 # Symbol
+0xF873	# transcoding hint: variant tag 13 # Korean, Thai
+0xF874	# transcoding hint: variant tag 12 # Korean, Thai
+0xF875	# transcoding hint: variant tag 11 # Korean, Thai
+0xF876	# transcoding hint: variant tag 10 # Korean
+0xF877	# transcoding hint: variant tag 9 # Korean
+0xF878	# transcoding hint: variant tag 8 # Korean
+0xF879	# transcoding hint: variant tag 7 # Korean
+0xF87A	# transcoding hint: variant tag 6 # Korean
+0xF87B	# transcoding hint: variant tag 5 # Korean
+0xF87C	# transcoding hint: variant tag 4 # ChineseTrad, Korean, Dingbats
+0xF87D	# transcoding hint: variant tag 3 # ChineseTrad
+0xF87E	# transcoding hint: variant tag 2 # Chinese,Japanese
+0xF87F	# transcoding hint: variant tag 1 # CJK,Symbol,Dingbats,Hebrew
+
+# The following (2) are metrics "characters" so applications can get the
+# height and width of double-byte character glyphs by measuring the glyph of a
+# one-byte character (e.g. calling CharWidth for character 0x82 in a Chinese
+# Traditional font); this approach assumes that the glyphs for all double-byte
+# characters in a font have the same metrics, which is currently true. Note
+# that the width-metric character glyphs are used differently for TrueType and
+# old-style bitmap fonts; for TrueType fonts the metric glyph width is equal
+# to the full width of a double-byte character glyph, while for FBIT/FDEF
+# bitmap fonts the metric glyph width is half the width of a double-byte
+# character glyph.
+0xF880	# height-metric character for double-byte fonts # Chinese Simp&Trad-0x81
+0xF881	# width-metric character for double-byte fonts # Chinese Simp&Trad-0x82
+
+# The following (2) are for the TrueType variant of Mac OS Farsi.
+# NOTE: 0xF883 is deprecated, but is still loosely mapped to 0xA4 in the
+# Mac OS Farsi TrueType variant.
+0xF882	# Arabic ligature "peace on him" # Farsi(TrueType variant)-0x8B
+0xF883	# deprecated, use 0xFDFC (3.2) or 0xF86B+0x0631+0x06CC+0x0627+0x0644 # Farsi(TrueType variant)-0xA4
+
+# The following (22) are for the Mac OS Thai encoding.
+# In this encoding, positional variants of upper vowels, tone marks,
+# and other marks are normally handled automatically by WorldScript I.
+# However, the Thai-DTP keyboard allows the codes for the positional
+# variants to be entered directly, so they must be treated as
+# characters. When the abstract character is treated as a positional
+# variant, it has the right (and high, if relevant) position.
+# NOTE: These are now all deprecated in favor of combinations of standard
+# characters and transcoding hints. The deprecated characters will still
+# be loosely mapped to the appropriate Mac OS Thai character.
+0xF884	# deprecated, use 0x0E31+0xF874 # Thai-0x92
+0xF885	# deprecated, use 0x0E34+0xF874 # Thai-0x94
+0xF886	# deprecated, use 0x0E35+0xF874 # Thai-0x95
+0xF887	# deprecated, use 0x0E36+0xF874 # Thai-0x96
+0xF888	# deprecated, use 0x0E37+0xF874 # Thai-0x97
+0xF889	# deprecated, use 0x0E47+0xF874 # Thai-0x93
+0xF88A	# deprecated, use 0x0E48+0xF874 # Thai-0x98
+0xF88B	# deprecated, use 0x0E48+0xF873 # Thai-0x88
+0xF88C	# deprecated, use 0x0E48+0xF875 # Thai-0x83
+0xF88D	# deprecated, use 0x0E49+0xF874 # Thai-0x99
+0xF88E	# deprecated, use 0x0E49+0xF873 # Thai-0x89
+0xF88F	# deprecated, use 0x0E49+0xF875 # Thai-0x84
+0xF890	# deprecated, use 0x0E4A+0xF874 # Thai-0x9A
+0xF891	# deprecated, use 0x0E4A+0xF873 # Thai-0x8A
+0xF892	# deprecated, use 0x0E4A+0xF875 # Thai-0x85
+0xF893	# deprecated, use 0x0E4B+0xF874 # Thai-0x9B
+0xF894	# deprecated, use 0x0E4B+0xF873 # Thai-0x8B
+0xF895	# deprecated, use 0x0E4B+0xF875 # Thai-0x86
+0xF896	# deprecated, use 0x0E4C+0xF874 # Thai-0x9C
+0xF897	# deprecated, use 0x0E4C+0xF873 # Thai-0x8C
+0xF898	# deprecated, use 0x0E4C+0xF875 # Thai-0x87
+0xF899	# deprecated, use 0x0E4D+0xF874 # Thai-0x8F
+
+# The following (6) are for the Mac OS Hebrew encoding. Four of
+# these are for the obsolete "canoral" codes that were used before
+# System 7.1/WorldScript to control positioning of nikud marks (points).
+# In the future these 4 code points may be redefined.
+# NOTE: Some of these are deprecated in favor of a combination of standard
+# character and transcoding hint. The deprecated characters will still
+# be loosely mapped to the appropriate Mac OS Hebrew character.
+0xF89A	# deprecated, use 0xF86A+0x05DC+0x05B9 # Hebrew-0xC0
+0xF89B	# Hebrew canoral 1 # Hebrew-0xC2
+0xF89C	# Hebrew canoral 2 # Hebrew-0xC3
+0xF89D	# Hebrew canoral 3 # Hebrew-0xC4
+0xF89E	# Hebrew canoral 4 # Hebrew-0xC5
+0xF89F	# deprecated, use 0x05B8+0xF87F # Hebrew-0xDE
+
+# The following (1) is for mapping the single undefined code point in
+# the Mac OS Greek and Turkish encodings, thus permitting full
+# round-trip fidelity. This character is also used for mapping EURO SIGN
+# when mapping to Unicode 1.1 (e.g. for Mac OS Roman and Symbol).
+0xF8A0	# undefined1, also EURO SIGN for Unicode 1.1 # Turkish-0xF5, Roman-0xDB, Symbol-0xA0
+
+# The following (54) are for the Mac OS Japanese encoding.
+# part 1 - Apple corporate Unicode chars for Mac OS Japanese extended
+# characters not in Unicode.
+# NOTE: These are now all deprecated in favor of combinations of standard
+# characters and transcoding hints. The deprecated characters will still
+# be loosely mapped to the appropriate Mac OS Japanese character.
+0xF8A1	# deprecated, use 0xF860+0x0030+0x002E # Jpn-0x8591
+0xF8A2	# deprecated, use 0xF862+0x0058+0x0049+0x0049+0x0049 # Jpn-0x85AB
+0xF8A3	# deprecated, use 0xF861+0x0058+0x0049+0x0056 # Jpn-0x85AC
+0xF8A4	# deprecated, use 0xF860+0x0058+0x0056 # Jpn-0x85AD
+0xF8A5	# deprecated, use 0xF862+0x0078+0x0069+0x0069+0x0069 # Jpn-0x85BF
+0xF8A6	# deprecated, use 0xF861+0x0078+0x0069+0x0076 # Jpn-0x85C0
+0xF8A7	# deprecated, use 0xF860+0x0078+0x0076 # Jpn-0x85C1
+0xF8A8	# deprecated, use 0xFF4D+0xF87F # Jpn-0x8645
+0xF8A9	# deprecated, use 0xFF47+0xF87F # Jpn-0x864B
+0xF8AA	# deprecated, use 0x2113 # Jpn-0x8650
+0xF8AB	# deprecated, use 0xF860+0x0054+0x0042 # Jpn-0x865D
+0xF8AC	# deprecated, use 0xF861+0x0046+0x0041+0x0058 # Jpn-0x869E
+0xF8AD	# deprecated, use 0xF860+0x2193+0x2191 # Jpn-0x86CE
+0xF8AE	# deprecated, use 0x21E8+0xF87A # Jpn-0x86D3
+0xF8AF	# deprecated, use 0x21E6+0xF87A # Jpn-0x86D4
+0xF8B0	# deprecated, use 0x21E7+0xF87A # Jpn-0x86D5
+0xF8B1	# deprecated, use 0x21E9+0xF87A # Jpn-0x86D6
+0xF8B2	# deprecated, use 0xF862+0x6709+0x9650+0x4F1A+0x793E # Jpn-0x87FB
+0xF8B3	# deprecated, use 0xF862+0x8CA1+0x56E3+0x6CD5+0x4EBA # Jpn-0x87FC
+0xF8B4	# deprecated, use 0x301F # Jpn-0x8855
+# part 2 - Apple corporate Unicode chars for Mac OS Japanese vertical
+# forms not in Unicode.
+# NOTE: These are now all deprecated in favor of combinations of standard
+# characters and transcoding hints. The deprecated characters will still
+# be loosely mapped to the appropriate Mac OS Japanese character.
+0xF8B5	# deprecated, use 0x3001+0xF87E # Jpn-0xEB41
+0xF8B6	# deprecated, use 0x3002+0xF87E # Jpn-0xEB42
+0xF8B7	# deprecated, use 0xFFE3+0xF87E # Jpn-0xEB50
+0xF8B8	# deprecated, use 0x30FC+0xF87E # Jpn-0xEB5B
+0xF8B9	# deprecated, use 0x2010+0xF87E # Jpn-0xEB5D
+0xF8BA	# deprecated, use 0x301C+0xF87E # Jpn-0xEB60
+0xF8BB	# deprecated, use 0x2016+0xF87E # Jpn-0xEB61
+0xF8BC	# deprecated, use 0xFF5C+0xF87E # Jpn-0xEB62
+0xF8BD	# deprecated, use 0x2026+0xF87E # Jpn-0xEB63
+0xF8BE	# deprecated, use 0xFF3B+0xF87E # Jpn-0xEB6D
+0xF8BF	# deprecated, use 0xFF3D+0xF87E # Jpn-0xEB6E
+0xF8C0	# deprecated, use 0xFF1D+0xF87E # Jpn-0xEB81
+0xF8C1	# deprecated, use 0x3041+0xF87E # Jpn-0xEC9F
+0xF8C2	# deprecated, use 0x3043+0xF87E # Jpn-0xECA1
+0xF8C3	# deprecated, use 0x3045+0xF87E # Jpn-0xECA3
+0xF8C4	# deprecated, use 0x3047+0xF87E # Jpn-0xECA5
+0xF8C5	# deprecated, use 0x3049+0xF87E # Jpn-0xECA7
+0xF8C6	# deprecated, use 0x3063+0xF87E # Jpn-0xECC1
+0xF8C7	# deprecated, use 0x3083+0xF87E # Jpn-0xECE1
+0xF8C8	# deprecated, use 0x3085+0xF87E # Jpn-0xECE3
+0xF8C9	# deprecated, use 0x3087+0xF87E # Jpn-0xECE5
+0xF8CA	# deprecated, use 0x308E+0xF87E # Jpn-0xECEC
+0xF8CB	# deprecated, use 0x30A1+0xF87E # Jpn-0xED40
+0xF8CC	# deprecated, use 0x30A3+0xF87E # Jpn-0xED42
+0xF8CD	# deprecated, use 0x30A5+0xF87E # Jpn-0xED44
+0xF8CE	# deprecated, use 0x30A7+0xF87E # Jpn-0xED46
+0xF8CF	# deprecated, use 0x30A9+0xF87E # Jpn-0xED48
+0xF8D0	# deprecated, use 0x30C3+0xF87E # Jpn-0xED62
+0xF8D1	# deprecated, use 0x30E3+0xF87E # Jpn-0xED83
+0xF8D2	# deprecated, use 0x30E5+0xF87E # Jpn-0xED85
+0xF8D3	# deprecated, use 0x30E7+0xF87E # Jpn-0xED87
+0xF8D4	# deprecated, use 0x30EE+0xF87E # Jpn-0xED8E
+0xF8D5	# deprecated, use 0x30F5+0xF87E # Jpn-0xED95
+0xF8D6	# deprecated, use 0x30F6+0xF87E # Jpn-0xED96
+
+# The following (14) are for the Mac OS Dingbats encoding.
+# NOTE: These are now all deprecated in favor of standard characters or
+# combinations of standard characters and transcoding hints. The
+# deprecated characters will still be loosely mapped to the appropriate
+# Mac OS Dingbats character.
+0xF8D7	# deprecated, use 0x2768 (3.2) or 0x0028 # Dingbats-0x80
+0xF8D8	# deprecated, use 0x2769 (3.2) or 0x0029 # Dingbats-0x81
+0xF8D9	# deprecated, use 0x276A (3.2) or 0x0028+0xF87F # Dingbats-0x82
+0xF8DA	# deprecated, use 0x276B (3.2) or 0x0029+0xF87F # Dingbats-0x83
+0xF8DB	# deprecated, use 0x276C (3.2) or 0x3008 # Dingbats-0x84
+0xF8DC	# deprecated, use 0x276D (3.2) or 0x3009 # Dingbats-0x85
+0xF8DD	# deprecated, use 0x276E (3.2) or 0x2039 # Dingbats-0x86
+0xF8DE	# deprecated, use 0x276F (3.2) or 0x203A # Dingbats-0x87
+0xF8DF	# deprecated, use 0x2770 (3.2) or 0x3008+0xF87C # Dingbats-0x88
+0xF8E0	# deprecated, use 0x2771 (3.2) or 0x3009+0xF87C # Dingbats-0x89
+0xF8E1	# deprecated, use 0x2772 (3.2) or 0x3014 # Dingbats-0x8A
+0xF8E2	# deprecated, use 0x2773 (3.2) or 0x3015 # Dingbats-0x8B
+0xF8E3	# deprecated, use 0x2774 (3.2) or 0x007B # Dingbats-0x8C
+0xF8E4	# deprecated, use 0x2775 (3.2) or 0x007D # Dingbats-0x8D
+
+# The following (26) are for the Mac OS Symbol encoding.
+# NOTE: Some of these are deprecated in favor of combinations of standard
+# characters and transcoding hints. The deprecated characters will still
+# be loosely mapped to the appropriate Mac OS Symbol character.
+0xF8E5	# radical extender # Symbol-0x60
+0xF8E6	# deprecated, use 0x23D0 (4.0) # Symbol-0xBD
+0xF8E7	# deprecated, use 0x23AF (3.2) # Symbol-0xBE
+0xF8E8	# deprecated, use 0x00AE+0xF87F # Symbol-0xE2
+0xF8E9	# deprecated, use 0x00A9+0xF87F # Symbol-0xE3
+0xF8EA	# deprecated, use 0x2122+0xF87F # Symbol-0xE4
+0xF8EB	# deprecated, use 0x239B (3.2) or 0x0028+0xF870 # Symbol-0xE6
+0xF8EC	# deprecated, use 0x239C (3.2) or 0x0028+0xF871 # Symbol-0xE7
+0xF8ED	# deprecated, use 0x239D (3.2) or 0x0028+0xF872 # Symbol-0xE8
+0xF8EE	# deprecated, use 0x23A1 (3.2) or 0x005B+0xF870 # Symbol-0xE9
+0xF8EF	# deprecated, use 0x23A2 (3.2) or 0x005B+0xF871 # Symbol-0xEA
+0xF8F0	# deprecated, use 0x23A3 (3.2) or 0x005B+0xF872 # Symbol-0xEB
+0xF8F1	# deprecated, use 0x23A7 (3.2) or 0x007B+0xF870 # Symbol-0xEC
+0xF8F2	# deprecated, use 0x23A8 (3.2) or 0x007B+0xF871 # Symbol-0xED
+0xF8F3	# deprecated, use 0x23A9 (3.2) or 0x007B+0xF872 # Symbol-0xEE
+0xF8F4	# deprecated, use 0x23AA (3.2) # Symbol-0xEF
+0xF8F5	# deprecated, use 0x23AE (3.2) or 0x222B+0xF871 # Symbol-0xF4
+0xF8F6	# deprecated, use 0x239E (3.2) or 0x0029+0xF870 # Symbol-0xF6
+0xF8F7	# deprecated, use 0x239F (3.2) or 0x0029+0xF871 # Symbol-0xF7
+0xF8F8	# deprecated, use 0x23A0 (3.2) or 0x0029+0xF872 # Symbol-0xF8
+0xF8F9	# deprecated, use 0x23A4 (3.2) or 0x005D+0xF870 # Symbol-0xF9
+0xF8FA	# deprecated, use 0x23A5 (3.2) or 0x005D+0xF871 # Symbol-0xFA
+0xF8FB	# deprecated, use 0x23A6 (3.2) or 0x005D+0xF872 # Symbol-0xFB
+0xF8FC	# deprecated, use 0x23AB (3.2) or 0x007D+0xF870 # Symbol-0xFC
+0xF8FD	# deprecated, use 0x23AC (3.2) or 0x007D+0xF871 # Symbol-0xFD
+0xF8FE	# deprecated, use 0x23AD (3.2) or 0x007D+0xF872 # Symbol-0xFE
+
+# The following (1) is for the Mac OS Roman encoding
+# (also used in Symbol & Croatian).
+# NOTE: The graphic image associated with the Apple logo character is
+# not authorized for use without permission of Apple, and unauthorized
+# use might constitute trademark infringement.
+0xF8FF	# Apple logo # Roman-0xF0, Symbol-0xF0, Croatian-0xD8
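
The "deprecated, use ..." entries in the list above can be read mechanically. The following is a minimal sketch, not part of Apple's tooling: the names `parse_deprecated`, `classify`, `GROUPING_HINTS`, and `VARIANT_TAGS` are hypothetical, and the two ranges are taken from the comments in the list (0xF860-0xF86B grouping hints, 0xF870-0xF87F variant tags). Where an entry offers alternatives ("0x2768 (3.2) or 0x0028"), the sketch captures only the first one.

```python
import re

# Ranges as described in the corporate character list comments.
GROUPING_HINTS = range(0xF860, 0xF86C)  # 0xF860-0xF86B precede a group
VARIANT_TAGS = range(0xF870, 0xF880)    # 0xF870-0xF87F follow a character

# Matches e.g. "0xF884<TAB># deprecated, use 0x0E31+0xF874 # Thai-0x92".
# Only the first replacement alternative is captured.
DEPRECATED = re.compile(
    r"^(0x[0-9A-Fa-f]{4})\s*#\s*deprecated, use "
    r"(0x[0-9A-Fa-f]{4}(?:\+0x[0-9A-Fa-f]{4})*)"
)


def parse_deprecated(line):
    """Return (corporate_code, [replacement code points]) or None."""
    # Tolerate the leading "+" that a diff view puts before each line.
    match = DEPRECATED.match(line.lstrip("+"))
    if match is None:
        return None
    code = int(match.group(1), 16)
    sequence = [int(part, 16) for part in match.group(2).split("+")]
    return code, sequence


def classify(code_point):
    """Label a code point by the role the list assigns to its range."""
    if code_point in GROUPING_HINTS:
        return "grouping hint"
    if code_point in VARIANT_TAGS:
        return "variant tag"
    return "other"


if __name__ == "__main__":
    entry = "0xF884\t# deprecated, use 0x0E31+0xF874 # Thai-0x92"
    code, sequence = parse_deprecated(entry)
    for cp in sequence:
        print(f"0x{code:04X} -> 0x{cp:04X} ({classify(cp)})")
```

A table built this way could be used to rewrite deprecated corporate characters in existing text into the standard-character-plus-hint form that the list recommends.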
--- a/charmap/CROATIAN.TXT
+++ b/charmap/CROATIAN.TXT
@ -0,0 +1,351 @@
+#=======================================================================
+#   File name:  CROATIAN.TXT
+#
+#   Contents:   Map (external version) from Mac OS Croatian
+#               character set to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-04    Update header comments. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Update URLs, notes. Matches internal
+#                           utom<b3>.
+#       b02  1999-Sep-22    Encoding changed for Mac OS 8.5; change
+#                           mapping of 0xDB from CURRENCY SIGN to EURO
+#                           SIGN. Update contact e-mail address. Matches
+#                           internal utom<b2>, ufrm<b2>, and Text
+#                           Encoding Converter version 1.5.
+#       n07  1998-Feb-05    Minor update to header comments
+#       n05  1997-Dec-14    Update to match internal utom<5>, ufrm<16>:
+#                           Change standard mapping for 0xBD from U+2126
+#                           to its canonical decomposition, U+03A9.
+#       n03  1995-Apr-15    First version (after fixing some typos).
+#                           Matches internal ufrm<6>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Croatian code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN)
+#     Column #3 is a comment containing the Unicode name
+#
+#   The entries are in Mac OS Croatian code order.
+#
+#   One of these mappings requires the use of a corporate character.
+#   See the file "CORPCHAR.TXT" and notes below.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Croatian character set uses the standard control characters
+#   at 0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Croatian:
+# -------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   Mac OS Croatian is used for Croatian and Slovene.
+#
+#   The Mac OS Croatian encoding shares the script code smRoman
+#   (0) with the standard Mac OS Roman encoding. To determine if
+#   the Croatian encoding is being used, you must check if the
+#   system region code is 68, verCroatia (or 25, verYugoCroatian,
+#   only used in older systems).
+#
+#   This character set is a variant of standard Mac OS Roman
+#   encoding, adding five accented letter case pairs to handle
+#   Croatian. It has 20 code point differences from standard
+#   Mac OS Roman, but only 10 differences in repertoire.
+#
+#   Before Mac OS 8.5, code point 0xDB was CURRENCY SIGN, and was
+#   mapped to U+00A4. In Mac OS 8.5 and later versions, code point
+#   0xDB is changed to EURO SIGN and maps to U+20AC; the standard
+#   Apple fonts are updated for Mac OS 8.5 to reflect this. There is
+#   a "currency sign" variant of the Mac OS Croatian encoding that
+#   still maps 0xDB to U+00A4; this can be used for older fonts.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+#   The following corporate zone Unicode character is used in this
+#   mapping:
+#
+#     0xF8FF  Apple logo
+#
+#   NOTE: The graphic image associated with the Apple logo character
+#   is not authorized for use without permission of Apple, and
+#   unauthorized use might constitute trademark infringement.
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version n07 to version b02:
+#
+#   - Encoding changed for Mac OS 8.5; change mapping of 0xDB from
+#     CURRENCY SIGN (U+00A4) to EURO SIGN (U+20AC).
+#
+#   Changes from version n03 to version n05:
+#
+#   - Change mapping of 0xBD from U+2126 to its canonical
+#     decomposition, U+03A9.
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00C4	# LATIN CAPITAL LETTER A WITH DIAERESIS
+0x81	0x00C5	# LATIN CAPITAL LETTER A WITH RING ABOVE
+0x82	0x00C7	# LATIN CAPITAL LETTER C WITH CEDILLA
+0x83	0x00C9	# LATIN CAPITAL LETTER E WITH ACUTE
+0x84	0x00D1	# LATIN CAPITAL LETTER N WITH TILDE
+0x85	0x00D6	# LATIN CAPITAL LETTER O WITH DIAERESIS
+0x86	0x00DC	# LATIN CAPITAL LETTER U WITH DIAERESIS
+0x87	0x00E1	# LATIN SMALL LETTER A WITH ACUTE
+0x88	0x00E0	# LATIN SMALL LETTER A WITH GRAVE
+0x89	0x00E2	# LATIN SMALL LETTER A WITH CIRCUMFLEX
+0x8A	0x00E4	# LATIN SMALL LETTER A WITH DIAERESIS
+0x8B	0x00E3	# LATIN SMALL LETTER A WITH TILDE
+0x8C	0x00E5	# LATIN SMALL LETTER A WITH RING ABOVE
+0x8D	0x00E7	# LATIN SMALL LETTER C WITH CEDILLA
+0x8E	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
+0x8F	0x00E8	# LATIN SMALL LETTER E WITH GRAVE
+0x90	0x00EA	# LATIN SMALL LETTER E WITH CIRCUMFLEX
+0x91	0x00EB	# LATIN SMALL LETTER E WITH DIAERESIS
+0x92	0x00ED	# LATIN SMALL LETTER I WITH ACUTE
+0x93	0x00EC	# LATIN SMALL LETTER I WITH GRAVE
+0x94	0x00EE	# LATIN SMALL LETTER I WITH CIRCUMFLEX
+0x95	0x00EF	# LATIN SMALL LETTER I WITH DIAERESIS
+0x96	0x00F1	# LATIN SMALL LETTER N WITH TILDE
+0x97	0x00F3	# LATIN SMALL LETTER O WITH ACUTE
+0x98	0x00F2	# LATIN SMALL LETTER O WITH GRAVE
+0x99	0x00F4	# LATIN SMALL LETTER O WITH CIRCUMFLEX
+0x9A	0x00F6	# LATIN SMALL LETTER O WITH DIAERESIS
+0x9B	0x00F5	# LATIN SMALL LETTER O WITH TILDE
+0x9C	0x00FA	# LATIN SMALL LETTER U WITH ACUTE
+0x9D	0x00F9	# LATIN SMALL LETTER U WITH GRAVE
+0x9E	0x00FB	# LATIN SMALL LETTER U WITH CIRCUMFLEX
+0x9F	0x00FC	# LATIN SMALL LETTER U WITH DIAERESIS
+0xA0	0x2020	# DAGGER
+0xA1	0x00B0	# DEGREE SIGN
+0xA2	0x00A2	# CENT SIGN
+0xA3	0x00A3	# POUND SIGN
+0xA4	0x00A7	# SECTION SIGN
+0xA5	0x2022	# BULLET
+0xA6	0x00B6	# PILCROW SIGN
+0xA7	0x00DF	# LATIN SMALL LETTER SHARP S
+0xA8	0x00AE	# REGISTERED SIGN
+0xA9	0x0160	# LATIN CAPITAL LETTER S WITH CARON
+0xAA	0x2122	# TRADE MARK SIGN
+0xAB	0x00B4	# ACUTE ACCENT
+0xAC	0x00A8	# DIAERESIS
+0xAD	0x2260	# NOT EQUAL TO
+0xAE	0x017D	# LATIN CAPITAL LETTER Z WITH CARON
+0xAF	0x00D8	# LATIN CAPITAL LETTER O WITH STROKE
+0xB0	0x221E	# INFINITY
+0xB1	0x00B1	# PLUS-MINUS SIGN
+0xB2	0x2264	# LESS-THAN OR EQUAL TO
+0xB3	0x2265	# GREATER-THAN OR EQUAL TO
+0xB4	0x2206	# INCREMENT
+0xB5	0x00B5	# MICRO SIGN
+0xB6	0x2202	# PARTIAL DIFFERENTIAL
+0xB7	0x2211	# N-ARY SUMMATION
+0xB8	0x220F	# N-ARY PRODUCT
+0xB9	0x0161	# LATIN SMALL LETTER S WITH CARON
+0xBA	0x222B	# INTEGRAL
+0xBB	0x00AA	# FEMININE ORDINAL INDICATOR
+0xBC	0x00BA	# MASCULINE ORDINAL INDICATOR
+0xBD	0x03A9	# GREEK CAPITAL LETTER OMEGA
+0xBE	0x017E	# LATIN SMALL LETTER Z WITH CARON
+0xBF	0x00F8	# LATIN SMALL LETTER O WITH STROKE
+0xC0	0x00BF	# INVERTED QUESTION MARK
+0xC1	0x00A1	# INVERTED EXCLAMATION MARK
+0xC2	0x00AC	# NOT SIGN
+0xC3	0x221A	# SQUARE ROOT
+0xC4	0x0192	# LATIN SMALL LETTER F WITH HOOK
+0xC5	0x2248	# ALMOST EQUAL TO
+0xC6	0x0106	# LATIN CAPITAL LETTER C WITH ACUTE
+0xC7	0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC8	0x010C	# LATIN CAPITAL LETTER C WITH CARON
+0xC9	0x2026	# HORIZONTAL ELLIPSIS
+0xCA	0x00A0	# NO-BREAK SPACE
+0xCB	0x00C0	# LATIN CAPITAL LETTER A WITH GRAVE
+0xCC	0x00C3	# LATIN CAPITAL LETTER A WITH TILDE
+0xCD	0x00D5	# LATIN CAPITAL LETTER O WITH TILDE
+0xCE	0x0152	# LATIN CAPITAL LIGATURE OE
+0xCF	0x0153	# LATIN SMALL LIGATURE OE
+0xD0	0x0110	# LATIN CAPITAL LETTER D WITH STROKE
+0xD1	0x2014	# EM DASH
+0xD2	0x201C	# LEFT DOUBLE QUOTATION MARK
+0xD3	0x201D	# RIGHT DOUBLE QUOTATION MARK
+0xD4	0x2018	# LEFT SINGLE QUOTATION MARK
+0xD5	0x2019	# RIGHT SINGLE QUOTATION MARK
+0xD6	0x00F7	# DIVISION SIGN
+0xD7	0x25CA	# LOZENGE
+0xD8	0xF8FF	# Apple logo
+0xD9	0x00A9	# COPYRIGHT SIGN
+0xDA	0x2044	# FRACTION SLASH
+0xDB	0x20AC	# EURO SIGN
+0xDC	0x2039	# SINGLE LEFT-POINTING ANGLE QUOTATION MARK
+0xDD	0x203A	# SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
+0xDE	0x00C6	# LATIN CAPITAL LETTER AE
+0xDF	0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xE0	0x2013	# EN DASH
+0xE1	0x00B7	# MIDDLE DOT
+0xE2	0x201A	# SINGLE LOW-9 QUOTATION MARK
+0xE3	0x201E	# DOUBLE LOW-9 QUOTATION MARK
+0xE4	0x2030	# PER MILLE SIGN
+0xE5	0x00C2	# LATIN CAPITAL LETTER A WITH CIRCUMFLEX
+0xE6	0x0107	# LATIN SMALL LETTER C WITH ACUTE
+0xE7	0x00C1	# LATIN CAPITAL LETTER A WITH ACUTE
+0xE8	0x010D	# LATIN SMALL LETTER C WITH CARON
+0xE9	0x00C8	# LATIN CAPITAL LETTER E WITH GRAVE
+0xEA	0x00CD	# LATIN CAPITAL LETTER I WITH ACUTE
+0xEB	0x00CE	# LATIN CAPITAL LETTER I WITH CIRCUMFLEX
+0xEC	0x00CF	# LATIN CAPITAL LETTER I WITH DIAERESIS
+0xED	0x00CC	# LATIN CAPITAL LETTER I WITH GRAVE
+0xEE	0x00D3	# LATIN CAPITAL LETTER O WITH ACUTE
+0xEF	0x00D4	# LATIN CAPITAL LETTER O WITH CIRCUMFLEX
+0xF0	0x0111	# LATIN SMALL LETTER D WITH STROKE
+0xF1	0x00D2	# LATIN CAPITAL LETTER O WITH GRAVE
+0xF2	0x00DA	# LATIN CAPITAL LETTER U WITH ACUTE
+0xF3	0x00DB	# LATIN CAPITAL LETTER U WITH CIRCUMFLEX
+0xF4	0x00D9	# LATIN CAPITAL LETTER U WITH GRAVE
+0xF5	0x0131	# LATIN SMALL LETTER DOTLESS I
+0xF6	0x02C6	# MODIFIER LETTER CIRCUMFLEX ACCENT
+0xF7	0x02DC	# SMALL TILDE
+0xF8	0x00AF	# MACRON
+0xF9	0x03C0	# GREEK SMALL LETTER PI
+0xFA	0x00CB	# LATIN CAPITAL LETTER E WITH DIAERESIS
+0xFB	0x02DA	# RING ABOVE
+0xFC	0x00B8	# CEDILLA
+0xFD	0x00CA	# LATIN CAPITAL LETTER E WITH CIRCUMFLEX
+0xFE	0x00E6	# LATIN SMALL LETTER AE
+0xFF	0x02C7	# CARON
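
The three-column format described in the CROATIAN.TXT header can be loaded with a few lines of code. This is a minimal sketch under stated assumptions: `parse_mapping`, `decode`, and `SAMPLE` are hypothetical names, the sample holds only four rows copied from the table above, and control characters 0x00-0x1F and 0x7F are passed through unchanged because the header says the encoding uses the standard control characters there. The "currency sign" variant is modelled as a single override of 0xDB, as the notes describe.

```python
# Four rows copied from the Mac OS Croatian table, tab-separated.
SAMPLE = (
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A\n"
    "0xA9\t0x0160\t# LATIN CAPITAL LETTER S WITH CARON\n"
    "0xDB\t0x20AC\t# EURO SIGN\n"
    "0xE8\t0x010D\t# LATIN SMALL LETTER C WITH CARON\n"
)


def parse_mapping(text):
    """Build {mac_code: unicode_code_point} from a mapping table."""
    table = {}
    for raw in text.splitlines():
        # Tolerate diff "+" prefixes, then drop the trailing comment.
        line = raw.lstrip("+").split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a two-column mapping row
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def decode(data, table):
    """Decode bytes with the table; control characters map to themselves."""
    chars = []
    for byte in data:
        if byte < 0x20 or byte == 0x7F:
            chars.append(chr(byte))
        else:
            chars.append(chr(table[byte]))  # KeyError if the byte is unmapped
    return "".join(chars)


table = parse_mapping(SAMPLE)

# Pre-8.5 "currency sign" variant: 0xDB maps to U+00A4 instead of U+20AC.
currency_variant = {**table, 0xDB: 0x00A4}

if __name__ == "__main__":
    print(decode(b"\xa9\xe8A", table))
```

Keeping the variant as an override on top of the standard table mirrors how the notes present it: one encoding, one differing code point.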
--- a/charmap/CYRILLIC.TXT
+++ b/charmap/CYRILLIC.TXT
@ -0,0 +1,352 @@
+#=======================================================================
+#   File name:  CYRILLIC.TXT
+#
+#   Contents:   Map (external version) from Mac OS Cyrillic
+#               character set to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c03  2005-Apr-05    Update header comments. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Update URLs, notes. Matches internal
+#                           utom<b2>.
+#       b02  1999-Sep-22    Encoding changed for Mac OS 9.0 to merge
+#                           with Mac OS Ukrainian and support EURO SIGN;
+#                           Change mappings for 0xA2, 0xB6, and 0xFF.
+#                           Update contact e-mail address. Matches
+#                           internal utom<b2>, ufrm<b2>, and Text
+#                           Encoding Converter version 1.5.
+#       n05  1998-Feb-05    Update header comments to new format; no
+#                           mapping changes.  Matches internal utom<n3>,
+#                           ufrm<n13>, and Text Encoding Converter
+#                           version 1.3.
+#       n03  1995-Apr-15    First version (after fixing some typos).
+#                           Matches internal ufrm<n5>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Cyrillic code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN)
+#     Column #3 is a comment containing the Unicode name
+#
+#   The entries are in Mac OS Cyrillic code order.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Cyrillic character set uses the standard control characters
+#   at 0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Cyrillic:
+# -------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported directly in programming
+#   interfaces for QuickDraw Text, the Script Manager, and related
+#   Text Utilities. For other purposes it is supported via transcoding
+#   to and from Unicode.
+#
+#   This is the "Euro sign" version of Mac Cyrillic for Mac OS 9.0 and
+#   later. Before Mac OS 9.0, there were two separate Slavic Cyrillic
+#   encodings:
+#
+#   1. The Cyrillic currency sign variant (used for localized Russian
+#      and Bulgarian systems), which had the following:
+#       0xA2  U+00A2 CENT SIGN
+#       0xB6  U+2202 PARTIAL DIFFERENTIAL
+#       0xFF  U+00A4 CURRENCY SIGN
+#
+#   2. The Ukrainian currency sign variant (used for localized Ukrainian
+#      systems and the pre-9.0 Cyrillic Language Kit), which had the
+#      following:
+#       0xA2  U+0490 CYRILLIC CAPITAL LETTER GHE WITH UPTURN
+#       0xB6  U+0491 CYRILLIC SMALL LETTER GHE WITH UPTURN
+#       0xFF  U+00A4 CURRENCY SIGN
+#
+#   This new Cyrillic Euro sign version is based on the old Ukrainian
+#   currency sign variant, with 0xFF changed to be EURO SIGN.
+#
+#   The Mac OS Cyrillic encoding includes the Cyrillic letter repertoire
+#   of ISO 8859-5 (although not at the same code points). This covers
+#   most of the Slavic languages written in Cyrillic script.
+#
+#   The Mac OS Cyrillic encoding also includes a number of characters
+#   needed for the Mac OS user interface and localization (e.g.
+#   ellipsis, bullet, copyright sign). All of the characters in Mac OS
+#   Cyrillic that are also in the Mac OS Roman encoding are at the
+#   same code point in both; this improves application compatibility.
+#
+#   Note: There is a common Ukrainian glyph variation in which the glyph
+#   for CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I may or may not
+#   have a dot above.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version n05 to version b02:
+#
+#   - Encoding changed for Mac OS 9.0 to merge with Mac OS Ukrainian and
+#     support EURO SIGN. 0xA2 changed from U+00A2 to U+0490; 0xB6 changed
+#     from U+2202 to U+0491; 0xFF changed from U+00A4 to U+20AC.
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x0410	# CYRILLIC CAPITAL LETTER A
+0x81	0x0411	# CYRILLIC CAPITAL LETTER BE
+0x82	0x0412	# CYRILLIC CAPITAL LETTER VE
+0x83	0x0413	# CYRILLIC CAPITAL LETTER GHE
+0x84	0x0414	# CYRILLIC CAPITAL LETTER DE
+0x85	0x0415	# CYRILLIC CAPITAL LETTER IE
+0x86	0x0416	# CYRILLIC CAPITAL LETTER ZHE
+0x87	0x0417	# CYRILLIC CAPITAL LETTER ZE
+0x88	0x0418	# CYRILLIC CAPITAL LETTER I
+0x89	0x0419	# CYRILLIC CAPITAL LETTER SHORT I
+0x8A	0x041A	# CYRILLIC CAPITAL LETTER KA
+0x8B	0x041B	# CYRILLIC CAPITAL LETTER EL
+0x8C	0x041C	# CYRILLIC CAPITAL LETTER EM
+0x8D	0x041D	# CYRILLIC CAPITAL LETTER EN
+0x8E	0x041E	# CYRILLIC CAPITAL LETTER O
+0x8F	0x041F	# CYRILLIC CAPITAL LETTER PE
+0x90	0x0420	# CYRILLIC CAPITAL LETTER ER
+0x91	0x0421	# CYRILLIC CAPITAL LETTER ES
+0x92	0x0422	# CYRILLIC CAPITAL LETTER TE
+0x93	0x0423	# CYRILLIC CAPITAL LETTER U
+0x94	0x0424	# CYRILLIC CAPITAL LETTER EF
+0x95	0x0425	# CYRILLIC CAPITAL LETTER HA
+0x96	0x0426	# CYRILLIC CAPITAL LETTER TSE
+0x97	0x0427	# CYRILLIC CAPITAL LETTER CHE
+0x98	0x0428	# CYRILLIC CAPITAL LETTER SHA
+0x99	0x0429	# CYRILLIC CAPITAL LETTER SHCHA
+0x9A	0x042A	# CYRILLIC CAPITAL LETTER HARD SIGN
+0x9B	0x042B	# CYRILLIC CAPITAL LETTER YERU
+0x9C	0x042C	# CYRILLIC CAPITAL LETTER SOFT SIGN
+0x9D	0x042D	# CYRILLIC CAPITAL LETTER E
+0x9E	0x042E	# CYRILLIC CAPITAL LETTER YU
+0x9F	0x042F	# CYRILLIC CAPITAL LETTER YA
+0xA0	0x2020	# DAGGER
+0xA1	0x00B0	# DEGREE SIGN
+0xA2	0x0490	# CYRILLIC CAPITAL LETTER GHE WITH UPTURN
+0xA3	0x00A3	# POUND SIGN
+0xA4	0x00A7	# SECTION SIGN
+0xA5	0x2022	# BULLET
+0xA6	0x00B6	# PILCROW SIGN
+0xA7	0x0406	# CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
+0xA8	0x00AE	# REGISTERED SIGN
+0xA9	0x00A9	# COPYRIGHT SIGN
+0xAA	0x2122	# TRADE MARK SIGN
+0xAB	0x0402	# CYRILLIC CAPITAL LETTER DJE
+0xAC	0x0452	# CYRILLIC SMALL LETTER DJE
+0xAD	0x2260	# NOT EQUAL TO
+0xAE	0x0403	# CYRILLIC CAPITAL LETTER GJE
+0xAF	0x0453	# CYRILLIC SMALL LETTER GJE
+0xB0	0x221E	# INFINITY
+0xB1	0x00B1	# PLUS-MINUS SIGN
+0xB2	0x2264	# LESS-THAN OR EQUAL TO
+0xB3	0x2265	# GREATER-THAN OR EQUAL TO
+0xB4	0x0456	# CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
+0xB5	0x00B5	# MICRO SIGN
+0xB6	0x0491	# CYRILLIC SMALL LETTER GHE WITH UPTURN
+0xB7	0x0408	# CYRILLIC CAPITAL LETTER JE
+0xB8	0x0404	# CYRILLIC CAPITAL LETTER UKRAINIAN IE
+0xB9	0x0454	# CYRILLIC SMALL LETTER UKRAINIAN IE
+0xBA	0x0407	# CYRILLIC CAPITAL LETTER YI
+0xBB	0x0457	# CYRILLIC SMALL LETTER YI
+0xBC	0x0409	# CYRILLIC CAPITAL LETTER LJE
+0xBD	0x0459	# CYRILLIC SMALL LETTER LJE
+0xBE	0x040A	# CYRILLIC CAPITAL LETTER NJE
+0xBF	0x045A	# CYRILLIC SMALL LETTER NJE
+0xC0	0x0458	# CYRILLIC SMALL LETTER JE
+0xC1	0x0405	# CYRILLIC CAPITAL LETTER DZE
+0xC2	0x00AC	# NOT SIGN
+0xC3	0x221A	# SQUARE ROOT
+0xC4	0x0192	# LATIN SMALL LETTER F WITH HOOK
+0xC5	0x2248	# ALMOST EQUAL TO
+0xC6	0x2206	# INCREMENT
+0xC7	0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC8	0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC9	0x2026	# HORIZONTAL ELLIPSIS
+0xCA	0x00A0	# NO-BREAK SPACE
+0xCB	0x040B	# CYRILLIC CAPITAL LETTER TSHE
+0xCC	0x045B	# CYRILLIC SMALL LETTER TSHE
+0xCD	0x040C	# CYRILLIC CAPITAL LETTER KJE
+0xCE	0x045C	# CYRILLIC SMALL LETTER KJE
+0xCF	0x0455	# CYRILLIC SMALL LETTER DZE
+0xD0	0x2013	# EN DASH
+0xD1	0x2014	# EM DASH
+0xD2	0x201C	# LEFT DOUBLE QUOTATION MARK
+0xD3	0x201D	# RIGHT DOUBLE QUOTATION MARK
+0xD4	0x2018	# LEFT SINGLE QUOTATION MARK
+0xD5	0x2019	# RIGHT SINGLE QUOTATION MARK
+0xD6	0x00F7	# DIVISION SIGN
+0xD7	0x201E	# DOUBLE LOW-9 QUOTATION MARK
+0xD8	0x040E	# CYRILLIC CAPITAL LETTER SHORT U
+0xD9	0x045E	# CYRILLIC SMALL LETTER SHORT U
+0xDA	0x040F	# CYRILLIC CAPITAL LETTER DZHE
+0xDB	0x045F	# CYRILLIC SMALL LETTER DZHE
+0xDC	0x2116	# NUMERO SIGN
+0xDD	0x0401	# CYRILLIC CAPITAL LETTER IO
+0xDE	0x0451	# CYRILLIC SMALL LETTER IO
+0xDF	0x044F	# CYRILLIC SMALL LETTER YA
+0xE0	0x0430	# CYRILLIC SMALL LETTER A
+0xE1	0x0431	# CYRILLIC SMALL LETTER BE
+0xE2	0x0432	# CYRILLIC SMALL LETTER VE
+0xE3	0x0433	# CYRILLIC SMALL LETTER GHE
+0xE4	0x0434	# CYRILLIC SMALL LETTER DE
+0xE5	0x0435	# CYRILLIC SMALL LETTER IE
+0xE6	0x0436	# CYRILLIC SMALL LETTER ZHE
+0xE7	0x0437	# CYRILLIC SMALL LETTER ZE
+0xE8	0x0438	# CYRILLIC SMALL LETTER I
+0xE9	0x0439	# CYRILLIC SMALL LETTER SHORT I
+0xEA	0x043A	# CYRILLIC SMALL LETTER KA
+0xEB	0x043B	# CYRILLIC SMALL LETTER EL
+0xEC	0x043C	# CYRILLIC SMALL LETTER EM
+0xED	0x043D	# CYRILLIC SMALL LETTER EN
+0xEE	0x043E	# CYRILLIC SMALL LETTER O
+0xEF	0x043F	# CYRILLIC SMALL LETTER PE
+0xF0	0x0440	# CYRILLIC SMALL LETTER ER
+0xF1	0x0441	# CYRILLIC SMALL LETTER ES
+0xF2	0x0442	# CYRILLIC SMALL LETTER TE
+0xF3	0x0443	# CYRILLIC SMALL LETTER U
+0xF4	0x0444	# CYRILLIC SMALL LETTER EF
+0xF5	0x0445	# CYRILLIC SMALL LETTER HA
+0xF6	0x0446	# CYRILLIC SMALL LETTER TSE
+0xF7	0x0447	# CYRILLIC SMALL LETTER CHE
+0xF8	0x0448	# CYRILLIC SMALL LETTER SHA
+0xF9	0x0449	# CYRILLIC SMALL LETTER SHCHA
+0xFA	0x044A	# CYRILLIC SMALL LETTER HARD SIGN
+0xFB	0x044B	# CYRILLIC SMALL LETTER YERU
+0xFC	0x044C	# CYRILLIC SMALL LETTER SOFT SIGN
+0xFD	0x044D	# CYRILLIC SMALL LETTER E
+0xFE	0x044E	# CYRILLIC SMALL LETTER YU
+0xFF	0x20AC	# EURO SIGN
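
The three-column format described in the Mac OS Cyrillic header above can be read with a few lines of code. What follows is a minimal sketch, not part of the Apple data: the names `parse_mapping`, `decode`, `SAMPLE`, and `RUSSIAN_PRE9` are hypothetical, the sample holds only a handful of rows copied from the table, and mapping unlisted bytes to U+FFFD is an assumption of this sketch. It expects raw file lines, without the leading `+` of the diff view.

```python
# Sketch only: helper names are illustrative, not part of the Apple tables.

def parse_mapping(text):
    """Parse 'code<TAB>unicode<TAB># name' rows into {byte value: str}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # '#' begins a comment
        if not line:
            continue  # blank line or comment-only line
        code, uni = line.split()[:2]
        # '+' joins code points in tables that map to Unicode sequences.
        table[int(code, 16)] = "".join(chr(int(u, 16)) for u in uni.split("+"))
    return table


def decode(data, table):
    """Decode bytes with a parsed single-byte table."""
    out = []
    for b in data:
        if b < 0x20 or b == 0x7F:
            # Control characters are not listed in the table; the header
            # says the standard ones are used at 0x00-0x1F and 0x7F.
            out.append(chr(b))
        else:
            # Assumption of this sketch: unlisted bytes become U+FFFD.
            out.append(table.get(b, "\ufffd"))
    return "".join(out)


# A few rows copied from the table above (tab-separated).
SAMPLE = "\n".join([
    "# comment lines are skipped",
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A",
    "0x80\t0x0410\t# CYRILLIC CAPITAL LETTER A",
    "0xA2\t0x0490\t# CYRILLIC CAPITAL LETTER GHE WITH UPTURN",
    "0xB6\t0x0491\t# CYRILLIC SMALL LETTER GHE WITH UPTURN",
    "0xE1\t0x0431\t# CYRILLIC SMALL LETTER BE",
    "0xFF\t0x20AC\t# EURO SIGN",
])

table = parse_mapping(SAMPLE)

# Pre-9.0 "Cyrillic currency sign variant" (Russian/Bulgarian systems),
# using the three code points listed in the header notes.
RUSSIAN_PRE9 = {0xA2: "\u00a2", 0xB6: "\u2202", 0xFF: "\u00a4"}
old_table = {**table, **RUSSIAN_PRE9}
```

Keeping the pre-9.0 variant as an overlay on the current table mirrors how the header describes it: the two versions differ only at 0xA2, 0xB6, and 0xFF.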
--- a/charmap/DEVANAGA.TXT
+++ b/charmap/DEVANAGA.TXT
@ -0,0 +1,447 @@
+#=======================================================================
+#   File name:  DEVANAGA.TXT
+#
+#   Contents:   Map (external version) from Mac OS Devanagari
+#               encoding to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments; add section on
+#                           roundtrip considerations. Matches internal
+#                           xml <c1.1> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Update URLs. Matches internal utom<b1>.
+#       b02  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b1>, ufrm<b1>, and Text
+#                           Encoding Converter version 1.5.
+#       n04  1998-Feb-05    First version; matches internal utom<n9>,
+#                           ufrm<n15>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Devanagari code or code sequence
+#       (in hex as 0xNN or 0xNN+0xNN)
+#     Column #2 is the corresponding Unicode or Unicode sequence
+#       (in hex as 0xNNNN or 0xNNNN+0xNNNN).
+#     Column #3 is a comment containing the Unicode name or sequence
+#       of names. In some cases an additional comment follows the
+#       Unicode name(s).
+#
+#   The entries are in two sections. The first section is for pairs of
+#   Mac OS Devanagari code points that must be mapped in a special way.
+#   The second section maps individual code points.
+#
+#   Within each section, the entries are in Mac OS Devanagari code order.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Devanagari character set uses the standard control characters
+#   at 0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Devanagari:
+# ---------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   Mac OS Devanagari is based on IS 13194:1991 (ISCII-91), with the
+#   addition of several punctuation and symbol characters. However,
+#   Mac OS Devanagari does not support the ATR (attribute) mechanism of
+#   ISCII-91.
+#
+# 1. ISCII-91 features in Mac OS Devanagari include:
+#
+#  a) Overloading of nukta
+#
+#     In addition to using the nukta (0xE9) like a combining dot below,
+#     nukta is overloaded to function as a general character modifier.
+#     In this role, certain code points followed by 0xE9 are treated as
+#     a two-byte code point representing a character which may be
+#     rather different than the characters represented by either of
+#     the code points alone. For example, the character DEVANAGARI OM
+#     (U+0950) is represented in ISCII-91 as candrabindu + nukta.
+#
+#  b) Explicit halant and soft halant
+#
+#     A double halant (0xE8 + 0xE8) constitutes an "explicit halant",
+#     which will always appear as a halant instead of causing formation
+#     of a ligature or half-form consonant.
+#
+#     Halant followed by nukta (0xE8 + 0xE9) constitutes a "soft
+#     halant", which prevents formation of a ligature and instead
+#     retains the half-form of the first consonant.
+#
+#  c) Invisible consonant
+#
+#     The byte 0xD9 (called INV in ISCII-91) is an invisible consonant:
+#     It behaves like a consonant but has no visible appearance. It is
+#     intended to be used (often in combination with halant) to display
+#     dependent forms in isolation, such as the RA forms or consonant
+#     half-forms.
+#
+#  d) Extensions for Vedic, etc.
+#
+#     The byte 0xF0 (called EXT in ISCII-91) followed by any byte in
+#     the range 0xA1-0xEE constitutes a two-byte code point which can
+#     be used to represent additional characters for Vedic (or other
+#     extensions); 0xF0 followed by any other byte value constitutes
+#     malformed text. Mac OS Devanagari supports this mechanism, but
+#     does not currently map any of these two-byte code points to
+#     anything.
+#
+# 2. Mac OS Devanagari additions
+#
+#   Mac OS Devanagari adds characters using the code points
+#   0x80-0x8A and 0x90-0x91 (the latter are some Devanagari additions
+#   from Unicode).
+#
+# 3. Unused code points
+#
+#   The following code points are currently unused, and are not shown
+#   here: 0x8B-0x8F, 0x92-0xA0, 0xEB-0xEF, 0xFB-0xFF. In addition,
+#   0xF0 is not shown here, but it has a special function as described
+#   above.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+# 1. Mapping the byte pairs
+#
+#   If one of the following byte values is encountered when mapping
+#   Mac OS Devanagari text - 0xA1, 0xA6, 0xA7, 0xAA, 0xDB, 0xDC, 0xDF,
+#   0xE8, or 0xEA - then the next byte (if there is one) should be
+#   examined. If the next byte is 0xE9 - or also 0xE8, if the first
+#   byte was 0xE8 - then the byte pair should be mapped using the
+#   first section of the mapping table below. Otherwise, each byte
+#   should be mapped using the second section of the mapping table
+#   below.
+#
+#   - The Unicode Standard, Version 2.0, specifies how explicit
+#     halant and soft halant should be represented in Unicode;
+#     these mappings are used below.
+#
+#   If the byte value 0xF0 is encountered when mapping Mac OS
+#   Devanagari text, then the next byte should be examined. If there
+#   is no next byte (e.g. 0xF0 at end of buffer), the mapping
+#   process should indicate incomplete character. If there is a next
+#   byte but it is not in the range 0xA1-0xEE, the mapping process
+#   should indicate malformed text. Otherwise, the mapping process
+#   should treat the byte pair as a valid two-byte code point with no
+#   mapping (e.g. map it to QUESTION MARK, REPLACEMENT CHARACTER,
+#   etc.).
+#
+# 2. Mapping the invisible consonant
+#
+#   It has been suggested that INV in ISCII-91 should map to ZERO
+#   WIDTH NON-JOINER in Unicode. However, this causes problems with
+#   roundtrip fidelity: The ISCII-91 sequences 0xE8+0xE8 and 0xE8+0xD9
+#   would map to the same sequence of Unicode characters. We have
+#   instead mapped INV to LEFT-TO-RIGHT MARK, which avoids these
+#   problems.
+#
+# 3. Additional loose mappings from Unicode
+#
+#   These are not preserved in roundtrip mappings.
+#
+#   U+0958  0xB3+0xE9  # DEVANAGARI LETTER QA
+#   U+0959  0xB4+0xE9  # DEVANAGARI LETTER KHHA
+#   U+095A  0xB5+0xE9  # DEVANAGARI LETTER GHHA
+#   U+095B  0xBA+0xE9  # DEVANAGARI LETTER ZA
+#   U+095C  0xBF+0xE9  # DEVANAGARI LETTER DDDHA
+#   U+095D  0xC0+0xE9  # DEVANAGARI LETTER RHA
+#   U+095E  0xC9+0xE9  # DEVANAGARI LETTER FA
+#
+# 4. Roundtrip considerations when mapping to decomposed Unicode
+#
+#   Both ISCII-91 (hence Mac OS Devanagari) and Unicode provide multiple
+#   ways of representing certain Devanagari consonants. For example,
+#   DEVANAGARI LETTER NNNA can be represented in Unicode as the single
+#   character 0x0929 or as the sequence 0x0928 0x093C; similarly, this
+#   consonant can be represented in Mac OS Devanagari as 0xC7 or as the
+#   sequence 0xC6 0xE9. This leads to some roundtrip problems. First
+#   note that we have the following mappings without such problems:
+#
+#   ISCII/  standard                  decomposition of  reverse mapping
+#   Mac OS  Unicode mapping           standard mapping  of decomposition
+#   ------  -----------------------   ----------------  ----------------
+#   0xC6    0x0928  ... LETTER NA     0x0928 (same)     0xC6
+#   0xCD    0x092F  ... LETTER YA     0x092F (same)     0xCD
+#   0xCF    0x0930  ... LETTER RA     0x0930 (same)     0xCF
+#   0xD2    0x0933  ... LETTER LLA    0x0933 (same)     0xD2
+#   0xE9    0x093C  ... SIGN NUKTA    0x093C (same)     0xE9
+#
+#   However, the mappings above cause roundtrip problems for the
+#   following mappings if they are decomposed:
+#
+#   ISCII/  standard                  decomposition of  reverse mapping
+#   Mac OS  Unicode mapping           standard mapping  of decomposition
+#   ------  -----------------------   ----------------  ----------------
+#   0xC7    0x0929  ... LETTER NNNA   0x0928 0x093C     0xC6 0xE9
+#   0xCE    0x095F  ... LETTER YYA    0x092F 0x093C     0xCD 0xE9
+#   0xD0    0x0931  ... LETTER RRA    0x0930 0x093C     0xCF 0xE9
+#   0xD3    0x0934  ... LETTER LLLA   0x0933 0x093C     0xD2 0xE9
+#
+#   One solution is to use a grouping transcoding hint with the four
+#   decompositions above to mark the decomposed sequence for special
+#   treatment in transcoding. This yields the following mappings to
+#   decomposed Unicode:
+#
+#   ISCII/                     decomposed
+#   Mac OS                     Unicode mapping
+#   ------                     ----------------
+#   0xC7                       0xF860 0x0928 0x093C
+#   0xCE                       0xF860 0x092F 0x093C
+#   0xD0                       0xF860 0x0930 0x093C
+#   0xD3                       0xF860 0x0933 0x093C
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+##################
+
+# Section 1: Map the following byte pairs as indicated:
+# (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER)
+# (Also see note about 0xF0 in comments above)
+
+0xA1+0xE9	0x0950	# DEVANAGARI OM
+0xA6+0xE9	0x090C	# DEVANAGARI LETTER VOCALIC L
+0xA7+0xE9	0x0961	# DEVANAGARI LETTER VOCALIC LL
+0xAA+0xE9	0x0960	# DEVANAGARI LETTER VOCALIC RR
+0xDB+0xE9	0x0962	# DEVANAGARI VOWEL SIGN VOCALIC L
+0xDC+0xE9	0x0963	# DEVANAGARI VOWEL SIGN VOCALIC LL
+0xDF+0xE9	0x0944	# DEVANAGARI VOWEL SIGN VOCALIC RR
+0xE8+0xE8	0x094D+0x200C	# DEVANAGARI SIGN VIRAMA + ZWNJ # explicit halant
+0xE8+0xE9	0x094D+0x200D	# DEVANAGARI SIGN VIRAMA + ZWJ # soft halant
+0xEA+0xE9	0x093D	# DEVANAGARI SIGN AVAGRAHA
+
+# Section 2: Map the remaining bytes as follows:
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00D7	# MULTIPLICATION SIGN
+0x81	0x2212	# MINUS SIGN
+0x82	0x2013	# EN DASH
+0x83	0x2014	# EM DASH
+0x84	0x2018	# LEFT SINGLE QUOTATION MARK
+0x85	0x2019	# RIGHT SINGLE QUOTATION MARK
+0x86	0x2026	# HORIZONTAL ELLIPSIS
+0x87	0x2022	# BULLET
+0x88	0x00A9	# COPYRIGHT SIGN
+0x89	0x00AE	# REGISTERED SIGN
+0x8A	0x2122	# TRADE MARK SIGN
+#
+0x90	0x0965	# DEVANAGARI DOUBLE DANDA
+0x91	0x0970	# DEVANAGARI ABBREVIATION SIGN
+#
+0xA1	0x0901	# DEVANAGARI SIGN CANDRABINDU
+0xA2	0x0902	# DEVANAGARI SIGN ANUSVARA
+0xA3	0x0903	# DEVANAGARI SIGN VISARGA
+0xA4	0x0905	# DEVANAGARI LETTER A
+0xA5	0x0906	# DEVANAGARI LETTER AA
+0xA6	0x0907	# DEVANAGARI LETTER I
+0xA7	0x0908	# DEVANAGARI LETTER II
+0xA8	0x0909	# DEVANAGARI LETTER U
+0xA9	0x090A	# DEVANAGARI LETTER UU
+0xAA	0x090B	# DEVANAGARI LETTER VOCALIC R
+0xAB	0x090E	# DEVANAGARI LETTER SHORT E
+0xAC	0x090F	# DEVANAGARI LETTER E
+0xAD	0x0910	# DEVANAGARI LETTER AI
+0xAE	0x090D	# DEVANAGARI LETTER CANDRA E
+0xAF	0x0912	# DEVANAGARI LETTER SHORT O
+0xB0	0x0913	# DEVANAGARI LETTER O
+0xB1	0x0914	# DEVANAGARI LETTER AU
+0xB2	0x0911	# DEVANAGARI LETTER CANDRA O
+0xB3	0x0915	# DEVANAGARI LETTER KA
+0xB4	0x0916	# DEVANAGARI LETTER KHA
+0xB5	0x0917	# DEVANAGARI LETTER GA
+0xB6	0x0918	# DEVANAGARI LETTER GHA
+0xB7	0x0919	# DEVANAGARI LETTER NGA
+0xB8	0x091A	# DEVANAGARI LETTER CA
+0xB9	0x091B	# DEVANAGARI LETTER CHA
+0xBA	0x091C	# DEVANAGARI LETTER JA
+0xBB	0x091D	# DEVANAGARI LETTER JHA
+0xBC	0x091E	# DEVANAGARI LETTER NYA
+0xBD	0x091F	# DEVANAGARI LETTER TTA
+0xBE	0x0920	# DEVANAGARI LETTER TTHA
+0xBF	0x0921	# DEVANAGARI LETTER DDA
+0xC0	0x0922	# DEVANAGARI LETTER DDHA
+0xC1	0x0923	# DEVANAGARI LETTER NNA
+0xC2	0x0924	# DEVANAGARI LETTER TA
+0xC3	0x0925	# DEVANAGARI LETTER THA
+0xC4	0x0926	# DEVANAGARI LETTER DA
+0xC5	0x0927	# DEVANAGARI LETTER DHA
+0xC6	0x0928	# DEVANAGARI LETTER NA
+0xC7	0x0929	# DEVANAGARI LETTER NNNA
+0xC8	0x092A	# DEVANAGARI LETTER PA
+0xC9	0x092B	# DEVANAGARI LETTER PHA
+0xCA	0x092C	# DEVANAGARI LETTER BA
+0xCB	0x092D	# DEVANAGARI LETTER BHA
+0xCC	0x092E	# DEVANAGARI LETTER MA
+0xCD	0x092F	# DEVANAGARI LETTER YA
+0xCE	0x095F	# DEVANAGARI LETTER YYA
+0xCF	0x0930	# DEVANAGARI LETTER RA
+0xD0	0x0931	# DEVANAGARI LETTER RRA
+0xD1	0x0932	# DEVANAGARI LETTER LA
+0xD2	0x0933	# DEVANAGARI LETTER LLA
+0xD3	0x0934	# DEVANAGARI LETTER LLLA
+0xD4	0x0935	# DEVANAGARI LETTER VA
+0xD5	0x0936	# DEVANAGARI LETTER SHA
+0xD6	0x0937	# DEVANAGARI LETTER SSA
+0xD7	0x0938	# DEVANAGARI LETTER SA
+0xD8	0x0939	# DEVANAGARI LETTER HA
+0xD9	0x200E	# LEFT-TO-RIGHT MARK # invisible consonant
+0xDA	0x093E	# DEVANAGARI VOWEL SIGN AA
+0xDB	0x093F	# DEVANAGARI VOWEL SIGN I
+0xDC	0x0940	# DEVANAGARI VOWEL SIGN II
+0xDD	0x0941	# DEVANAGARI VOWEL SIGN U
+0xDE	0x0942	# DEVANAGARI VOWEL SIGN UU
+0xDF	0x0943	# DEVANAGARI VOWEL SIGN VOCALIC R
+0xE0	0x0946	# DEVANAGARI VOWEL SIGN SHORT E
+0xE1	0x0947	# DEVANAGARI VOWEL SIGN E
+0xE2	0x0948	# DEVANAGARI VOWEL SIGN AI
+0xE3	0x0945	# DEVANAGARI VOWEL SIGN CANDRA E
+0xE4	0x094A	# DEVANAGARI VOWEL SIGN SHORT O
+0xE5	0x094B	# DEVANAGARI VOWEL SIGN O
+0xE6	0x094C	# DEVANAGARI VOWEL SIGN AU
+0xE7	0x0949	# DEVANAGARI VOWEL SIGN CANDRA O
+0xE8	0x094D	# DEVANAGARI SIGN VIRAMA # halant
+0xE9	0x093C	# DEVANAGARI SIGN NUKTA
+0xEA	0x0964	# DEVANAGARI DANDA
+#
+0xF1	0x0966	# DEVANAGARI DIGIT ZERO
+0xF2	0x0967	# DEVANAGARI DIGIT ONE
+0xF3	0x0968	# DEVANAGARI DIGIT TWO
+0xF4	0x0969	# DEVANAGARI DIGIT THREE
+0xF5	0x096A	# DEVANAGARI DIGIT FOUR
+0xF6	0x096B	# DEVANAGARI DIGIT FIVE
+0xF7	0x096C	# DEVANAGARI DIGIT SIX
+0xF8	0x096D	# DEVANAGARI DIGIT SEVEN
+0xF9	0x096E	# DEVANAGARI DIGIT EIGHT
+0xFA	0x096F	# DEVANAGARI DIGIT NINE
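
The byte-pair rules in note 1 of the Mac OS Devanagari header ("Mapping the byte pairs") can be sketched as a small decoder. This is an illustrative sketch under stated assumptions, not Apple's implementation: the names `PAIRS`, `SINGLES`, `IncompleteCharacter`, `MalformedText`, and `decode_devanagari` are hypothetical, `SINGLES` holds only a few rows from Section 2, and U+FFFD stands in for "a valid two-byte code point with no mapping" and for bytes missing from the abbreviated table.

```python
# Sketch only: names and error types are illustrative, not from the source.

class IncompleteCharacter(ValueError):
    """0xF0 appeared as the last byte of the buffer."""


class MalformedText(ValueError):
    """0xF0 was followed by a byte outside 0xA1-0xEE."""


# Section 1 of the table: byte pairs that must be mapped together.
PAIRS = {
    (0xA1, 0xE9): "\u0950",        # DEVANAGARI OM
    (0xA6, 0xE9): "\u090c",        # LETTER VOCALIC L
    (0xA7, 0xE9): "\u0961",        # LETTER VOCALIC LL
    (0xAA, 0xE9): "\u0960",        # LETTER VOCALIC RR
    (0xDB, 0xE9): "\u0962",        # VOWEL SIGN VOCALIC L
    (0xDC, 0xE9): "\u0963",        # VOWEL SIGN VOCALIC LL
    (0xDF, 0xE9): "\u0944",        # VOWEL SIGN VOCALIC RR
    (0xE8, 0xE8): "\u094d\u200c",  # VIRAMA + ZWNJ, explicit halant
    (0xE8, 0xE9): "\u094d\u200d",  # VIRAMA + ZWJ, soft halant
    (0xEA, 0xE9): "\u093d",        # SIGN AVAGRAHA
}

# A few rows from Section 2; a real decoder would load the full table.
SINGLES = {
    0xA1: "\u0901",  # SIGN CANDRABINDU
    0xB3: "\u0915",  # LETTER KA
    0xD9: "\u200e",  # LEFT-TO-RIGHT MARK (invisible consonant)
    0xE8: "\u094d",  # SIGN VIRAMA (halant)
    0xE9: "\u093c",  # SIGN NUKTA
    0xEA: "\u0964",  # DANDA
}


def decode_devanagari(data, singles=SINGLES):
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b == 0xF0:  # EXT prefix for two-byte extension code points
            if nxt is None:
                raise IncompleteCharacter("0xF0 at end of buffer")
            if not 0xA1 <= nxt <= 0xEE:
                raise MalformedText(f"0xF0 followed by 0x{nxt:02X}")
            out.append("\ufffd")  # valid pair, but nothing is mapped yet
            i += 2
        elif (b, nxt) in PAIRS:  # Section 1 takes priority over Section 2
            out.append(PAIRS[(b, nxt)])
            i += 2
        elif b < 0x80:
            out.append(chr(b))  # controls and 0x20-0x7E map to themselves
            i += 1
        else:
            out.append(singles.get(b, "\ufffd"))
            i += 1
    return "".join(out)
```

Looking up the pair before the single byte is what keeps candrabindu + nukta (0xA1 0xE9) from decoding as U+0901 U+093C instead of DEVANAGARI OM. A consonant followed by nukta, such as 0xB3 0xE9, is not in Section 1 and therefore falls through to two single-byte mappings.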
--- a/charmap/DINGBATS.TXT
+++ b/charmap/DINGBATS.TXT
@ -0,0 +1,329 @@
+#=======================================================================
+#   File name:  DINGBATS.TXT
+#
+#   Contents:   Map (external version) from Mac OS Dingbats
+#               character set to Unicode 3.2 and later.
+#
+#   Copyright:  (c) 1994-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Update mappings for 0x80-0x8D to use new
+#                           Unicode 3.2 characters. Update URLs, notes.
+#                           Matches internal utom<b2>.
+#       b02  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b1>, ufrm<b1>, and Text
+#                           Encoding Converter version 1.5.
+#       n05  1998-Feb-05    Update to match internal utom<n4>, ufrm<n14>,
+#                           and Text Encoding Converter version 1.3:
+#                           Change all mappings to single corporate-zone
+#                           Unicodes to either use standard Unicodes
+#                           or standard Unicodes plus transcoding hints;
+#                           see details below. Also update header
+#                           comments to new format.
+#       n03  1995-Apr-15    First version (after fixing some typos).
+#                           Matches internal ufrm<n4>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Dingbats code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode or Unicode sequence
+#       (in hex as 0xNNNN).
+#     Column #3 is a comment containing the Unicode name.
+#       In some cases an additional comment follows the Unicode name.
+#
+#   The entries are in Mac OS Dingbats code order.
+#
+#   Some of these mappings require the use of corporate characters.
+#   See the file "CORPCHAR.TXT" and notes below.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Dingbats character set uses the standard control characters
+#   at 0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Dingbats:
+# -------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported directly in programming
+#   interfaces for QuickDraw Text, the Script Manager, and related
+#   Text Utilities. For other purposes it is supported via transcoding
+#   to and from Unicode.
+#
+#   The Mac OS Dingbats encoding shares the script code smRoman
+#   (0) with the standard Mac OS Roman encoding. To determine if
+#   the Dingbats encoding is being used, you must check if the
+#   font name is "Zapf Dingbats".
+#
+#   The layout of the Dingbats character set is identical to or
+#   a superset of the layout of the Adobe Zapf Dingbats encoding
+#   vector.
+#
+#   The following code points are unused, and are not shown here:
+#   0x8E-0xA0, 0xF0, 0xFF.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version b02 to version b03/c01:
+#
+#   - The mappings for the following Mac OS Dingbats characters
+#   were changed to use standard Unicode characters added for
+#   Unicode 3.2: 0x80-0x8D.
+#
+#   Changes from version n03 to version n05:
+#
+#   - The mappings for the following Mac OS Dingbats characters
+#   were changed from single corporate-zone Unicode characters
+#   to standard Unicode characters:
+#   0x80-0x81, 0x84-0x87, 0x8A-0x8D.
+#
+#   - The mappings for the following Mac OS Dingbats characters
+#   were changed from single corporate-zone Unicode characters
+#   to combinations of a standard Unicode and a transcoding hint:
+#   0x82-0x83, 0x88-0x89.
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x2701	# UPPER BLADE SCISSORS
+0x22	0x2702	# BLACK SCISSORS
+0x23	0x2703	# LOWER BLADE SCISSORS
+0x24	0x2704	# WHITE SCISSORS
+0x25	0x260E	# BLACK TELEPHONE
+0x26	0x2706	# TELEPHONE LOCATION SIGN
+0x27	0x2707	# TAPE DRIVE
+0x28	0x2708	# AIRPLANE
+0x29	0x2709	# ENVELOPE
+0x2A	0x261B	# BLACK RIGHT POINTING INDEX
+0x2B	0x261E	# WHITE RIGHT POINTING INDEX
+0x2C	0x270C	# VICTORY HAND
+0x2D	0x270D	# WRITING HAND
+0x2E	0x270E	# LOWER RIGHT PENCIL
+0x2F	0x270F	# PENCIL
+0x30	0x2710	# UPPER RIGHT PENCIL
+0x31	0x2711	# WHITE NIB
+0x32	0x2712	# BLACK NIB
+0x33	0x2713	# CHECK MARK
+0x34	0x2714	# HEAVY CHECK MARK
+0x35	0x2715	# MULTIPLICATION X
+0x36	0x2716	# HEAVY MULTIPLICATION X
+0x37	0x2717	# BALLOT X
+0x38	0x2718	# HEAVY BALLOT X
+0x39	0x2719	# OUTLINED GREEK CROSS
+0x3A	0x271A	# HEAVY GREEK CROSS
+0x3B	0x271B	# OPEN CENTRE CROSS
+0x3C	0x271C	# HEAVY OPEN CENTRE CROSS
+0x3D	0x271D	# LATIN CROSS
+0x3E	0x271E	# SHADOWED WHITE LATIN CROSS
+0x3F	0x271F	# OUTLINED LATIN CROSS
+0x40	0x2720	# MALTESE CROSS
+0x41	0x2721	# STAR OF DAVID
+0x42	0x2722	# FOUR TEARDROP-SPOKED ASTERISK
+0x43	0x2723	# FOUR BALLOON-SPOKED ASTERISK
+0x44	0x2724	# HEAVY FOUR BALLOON-SPOKED ASTERISK
+0x45	0x2725	# FOUR CLUB-SPOKED ASTERISK
+0x46	0x2726	# BLACK FOUR POINTED STAR
+0x47	0x2727	# WHITE FOUR POINTED STAR
+0x48	0x2605	# BLACK STAR
+0x49	0x2729	# STRESS OUTLINED WHITE STAR
+0x4A	0x272A	# CIRCLED WHITE STAR
+0x4B	0x272B	# OPEN CENTRE BLACK STAR
+0x4C	0x272C	# BLACK CENTRE WHITE STAR
+0x4D	0x272D	# OUTLINED BLACK STAR
+0x4E	0x272E	# HEAVY OUTLINED BLACK STAR
+0x4F	0x272F	# PINWHEEL STAR
+0x50	0x2730	# SHADOWED WHITE STAR
+0x51	0x2731	# HEAVY ASTERISK
+0x52	0x2732	# OPEN CENTRE ASTERISK
+0x53	0x2733	# EIGHT SPOKED ASTERISK
+0x54	0x2734	# EIGHT POINTED BLACK STAR
+0x55	0x2735	# EIGHT POINTED PINWHEEL STAR
+0x56	0x2736	# SIX POINTED BLACK STAR
+0x57	0x2737	# EIGHT POINTED RECTILINEAR BLACK STAR
+0x58	0x2738	# HEAVY EIGHT POINTED RECTILINEAR BLACK STAR
+0x59	0x2739	# TWELVE POINTED BLACK STAR
+0x5A	0x273A	# SIXTEEN POINTED ASTERISK
+0x5B	0x273B	# TEARDROP-SPOKED ASTERISK
+0x5C	0x273C	# OPEN CENTRE TEARDROP-SPOKED ASTERISK
+0x5D	0x273D	# HEAVY TEARDROP-SPOKED ASTERISK
+0x5E	0x273E	# SIX PETALLED BLACK AND WHITE FLORETTE
+0x5F	0x273F	# BLACK FLORETTE
+0x60	0x2740	# WHITE FLORETTE
+0x61	0x2741	# EIGHT PETALLED OUTLINED BLACK FLORETTE
+0x62	0x2742	# CIRCLED OPEN CENTRE EIGHT POINTED STAR
+0x63	0x2743	# HEAVY TEARDROP-SPOKED PINWHEEL ASTERISK
+0x64	0x2744	# SNOWFLAKE
+0x65	0x2745	# TIGHT TRIFOLIATE SNOWFLAKE
+0x66	0x2746	# HEAVY CHEVRON SNOWFLAKE
+0x67	0x2747	# SPARKLE
+0x68	0x2748	# HEAVY SPARKLE
+0x69	0x2749	# BALLOON-SPOKED ASTERISK
+0x6A	0x274A	# EIGHT TEARDROP-SPOKED PROPELLER ASTERISK
+0x6B	0x274B	# HEAVY EIGHT TEARDROP-SPOKED PROPELLER ASTERISK
+0x6C	0x25CF	# BLACK CIRCLE
+0x6D	0x274D	# SHADOWED WHITE CIRCLE
+0x6E	0x25A0	# BLACK SQUARE
+0x6F	0x274F	# LOWER RIGHT DROP-SHADOWED WHITE SQUARE
+0x70	0x2750	# UPPER RIGHT DROP-SHADOWED WHITE SQUARE
+0x71	0x2751	# LOWER RIGHT SHADOWED WHITE SQUARE
+0x72	0x2752	# UPPER RIGHT SHADOWED WHITE SQUARE
+0x73	0x25B2	# BLACK UP-POINTING TRIANGLE
+0x74	0x25BC	# BLACK DOWN-POINTING TRIANGLE
+0x75	0x25C6	# BLACK DIAMOND
+0x76	0x2756	# BLACK DIAMOND MINUS WHITE X
+0x77	0x25D7	# RIGHT HALF BLACK CIRCLE
+0x78	0x2758	# LIGHT VERTICAL BAR
+0x79	0x2759	# MEDIUM VERTICAL BAR
+0x7A	0x275A	# HEAVY VERTICAL BAR
+0x7B	0x275B	# HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT
+0x7C	0x275C	# HEAVY SINGLE COMMA QUOTATION MARK ORNAMENT
+0x7D	0x275D	# HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT
+0x7E	0x275E	# HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT
+#
+0x80	0x2768	# MEDIUM LEFT PARENTHESIS ORNAMENT # for Unicode 3.2 and later
+0x81	0x2769	# MEDIUM RIGHT PARENTHESIS ORNAMENT # for Unicode 3.2 and later
+0x82	0x276A	# MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT # for Unicode 3.2 and later
+0x83	0x276B	# MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT # for Unicode 3.2 and later
+0x84	0x276C	# MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT # for Unicode 3.2 and later
+0x85	0x276D	# MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT # for Unicode 3.2 and later
+0x86	0x276E	# HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT # for Unicode 3.2 and later
+0x87	0x276F	# HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT # for Unicode 3.2 and later
+0x88	0x2770	# HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT # for Unicode 3.2 and later
+0x89	0x2771	# HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT # for Unicode 3.2 and later
+0x8A	0x2772	# LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT # for Unicode 3.2 and later
+0x8B	0x2773	# LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT # for Unicode 3.2 and later
+0x8C	0x2774	# MEDIUM LEFT CURLY BRACKET ORNAMENT # for Unicode 3.2 and later
+0x8D	0x2775	# MEDIUM RIGHT CURLY BRACKET ORNAMENT # for Unicode 3.2 and later
+#
+0xA1	0x2761	# CURVED STEM PARAGRAPH SIGN ORNAMENT
+0xA2	0x2762	# HEAVY EXCLAMATION MARK ORNAMENT
+0xA3	0x2763	# HEAVY HEART EXCLAMATION MARK ORNAMENT
+0xA4	0x2764	# HEAVY BLACK HEART
+0xA5	0x2765	# ROTATED HEAVY BLACK HEART BULLET
+0xA6	0x2766	# FLORAL HEART
+0xA7	0x2767	# ROTATED FLORAL HEART BULLET
+0xA8	0x2663	# BLACK CLUB SUIT
+0xA9	0x2666	# BLACK DIAMOND SUIT
+0xAA	0x2665	# BLACK HEART SUIT
+0xAB	0x2660	# BLACK SPADE SUIT
+0xAC	0x2460	# CIRCLED DIGIT ONE
+0xAD	0x2461	# CIRCLED DIGIT TWO
+0xAE	0x2462	# CIRCLED DIGIT THREE
+0xAF	0x2463	# CIRCLED DIGIT FOUR
+0xB0	0x2464	# CIRCLED DIGIT FIVE
+0xB1	0x2465	# CIRCLED DIGIT SIX
+0xB2	0x2466	# CIRCLED DIGIT SEVEN
+0xB3	0x2467	# CIRCLED DIGIT EIGHT
+0xB4	0x2468	# CIRCLED DIGIT NINE
+0xB5	0x2469	# CIRCLED NUMBER TEN
+0xB6	0x2776	# DINGBAT NEGATIVE CIRCLED DIGIT ONE
+0xB7	0x2777	# DINGBAT NEGATIVE CIRCLED DIGIT TWO
+0xB8	0x2778	# DINGBAT NEGATIVE CIRCLED DIGIT THREE
+0xB9	0x2779	# DINGBAT NEGATIVE CIRCLED DIGIT FOUR
+0xBA	0x277A	# DINGBAT NEGATIVE CIRCLED DIGIT FIVE
+0xBB	0x277B	# DINGBAT NEGATIVE CIRCLED DIGIT SIX
+0xBC	0x277C	# DINGBAT NEGATIVE CIRCLED DIGIT SEVEN
+0xBD	0x277D	# DINGBAT NEGATIVE CIRCLED DIGIT EIGHT
+0xBE	0x277E	# DINGBAT NEGATIVE CIRCLED DIGIT NINE
+0xBF	0x277F	# DINGBAT NEGATIVE CIRCLED NUMBER TEN
+0xC0	0x2780	# DINGBAT CIRCLED SANS-SERIF DIGIT ONE
+0xC1	0x2781	# DINGBAT CIRCLED SANS-SERIF DIGIT TWO
+0xC2	0x2782	# DINGBAT CIRCLED SANS-SERIF DIGIT THREE
+0xC3	0x2783	# DINGBAT CIRCLED SANS-SERIF DIGIT FOUR
+0xC4	0x2784	# DINGBAT CIRCLED SANS-SERIF DIGIT FIVE
+0xC5	0x2785	# DINGBAT CIRCLED SANS-SERIF DIGIT SIX
+0xC6	0x2786	# DINGBAT CIRCLED SANS-SERIF DIGIT SEVEN
+0xC7	0x2787	# DINGBAT CIRCLED SANS-SERIF DIGIT EIGHT
+0xC8	0x2788	# DINGBAT CIRCLED SANS-SERIF DIGIT NINE
+0xC9	0x2789	# DINGBAT CIRCLED SANS-SERIF NUMBER TEN
+0xCA	0x278A	# DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE
+0xCB	0x278B	# DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT TWO
+0xCC	0x278C	# DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT THREE
+0xCD	0x278D	# DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT FOUR
+0xCE	0x278E	# DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT FIVE
+0xCF	0x278F	# DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT SIX
+0xD0	0x2790	# DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT SEVEN
+0xD1	0x2791	# DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT EIGHT
+0xD2	0x2792	# DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT NINE
+0xD3	0x2793	# DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN
+0xD4	0x2794	# HEAVY WIDE-HEADED RIGHTWARDS ARROW
+0xD5	0x2192	# RIGHTWARDS ARROW
+0xD6	0x2194	# LEFT RIGHT ARROW
+0xD7	0x2195	# UP DOWN ARROW
+0xD8	0x2798	# HEAVY SOUTH EAST ARROW
+0xD9	0x2799	# HEAVY RIGHTWARDS ARROW
+0xDA	0x279A	# HEAVY NORTH EAST ARROW
+0xDB	0x279B	# DRAFTING POINT RIGHTWARDS ARROW
+0xDC	0x279C	# HEAVY ROUND-TIPPED RIGHTWARDS ARROW
+0xDD	0x279D	# TRIANGLE-HEADED RIGHTWARDS ARROW
+0xDE	0x279E	# HEAVY TRIANGLE-HEADED RIGHTWARDS ARROW
+0xDF	0x279F	# DASHED TRIANGLE-HEADED RIGHTWARDS ARROW
+0xE0	0x27A0	# HEAVY DASHED TRIANGLE-HEADED RIGHTWARDS ARROW
+0xE1	0x27A1	# BLACK RIGHTWARDS ARROW
+0xE2	0x27A2	# THREE-D TOP-LIGHTED RIGHTWARDS ARROWHEAD
+0xE3	0x27A3	# THREE-D BOTTOM-LIGHTED RIGHTWARDS ARROWHEAD
+0xE4	0x27A4	# BLACK RIGHTWARDS ARROWHEAD
+0xE5	0x27A5	# HEAVY BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW
+0xE6	0x27A6	# HEAVY BLACK CURVED UPWARDS AND RIGHTWARDS ARROW
+0xE7	0x27A7	# SQUAT BLACK RIGHTWARDS ARROW
+0xE8	0x27A8	# HEAVY CONCAVE-POINTED BLACK RIGHTWARDS ARROW
+0xE9	0x27A9	# RIGHT-SHADED WHITE RIGHTWARDS ARROW
+0xEA	0x27AA	# LEFT-SHADED WHITE RIGHTWARDS ARROW
+0xEB	0x27AB	# BACK-TILTED SHADOWED WHITE RIGHTWARDS ARROW
+0xEC	0x27AC	# FRONT-TILTED SHADOWED WHITE RIGHTWARDS ARROW
+0xED	0x27AD	# HEAVY LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
+0xEE	0x27AE	# HEAVY UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
+0xEF	0x27AF	# NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
+#
+0xF1	0x27B1	# NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
+0xF2	0x27B2	# CIRCLED HEAVY WHITE RIGHTWARDS ARROW
+0xF3	0x27B3	# WHITE-FEATHERED RIGHTWARDS ARROW
+0xF4	0x27B4	# BLACK-FEATHERED SOUTH EAST ARROW
+0xF5	0x27B5	# BLACK-FEATHERED RIGHTWARDS ARROW
+0xF6	0x27B6	# BLACK-FEATHERED NORTH EAST ARROW
+0xF7	0x27B7	# HEAVY BLACK-FEATHERED SOUTH EAST ARROW
+0xF8	0x27B8	# HEAVY BLACK-FEATHERED RIGHTWARDS ARROW
+0xF9	0x27B9	# HEAVY BLACK-FEATHERED NORTH EAST ARROW
+0xFA	0x27BA	# TEARDROP-BARBED RIGHTWARDS ARROW
+0xFB	0x27BB	# HEAVY TEARDROP-SHANKED RIGHTWARDS ARROW
+0xFC	0x27BC	# WEDGE-TAILED RIGHTWARDS ARROW
+0xFD	0x27BD	# HEAVY WEDGE-TAILED RIGHTWARDS ARROW
+0xFE	0x27BE	# OPEN-OUTLINED RIGHTWARDS ARROW
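The three-column format described in the header of the Mac OS Dingbats table above is straightforward to read programmatically. The following is a minimal sketch, not part of the mirrored data; the function names `parse_mapping_line` and `load_table` are hypothetical. It assumes the leading `+` diff marker has already been stripped from each line, and that a Unicode sequence in column 2 is written as code points joined by `+`.

```python
def parse_mapping_line(line):
    """Return (mac_code, [unicode code points]) for a data line, else None."""
    # '#' begins a comment which continues to the end of the line.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None  # blank line or comment-only line
    columns = data.split("\t")
    mac_code = int(columns[0], 16)  # int() accepts the 0x prefix in base 16
    # Assumption: a sequence in column 2 is joined with '+'.
    code_points = [int(part, 16) for part in columns[1].split("+")]
    return mac_code, code_points


def load_table(lines):
    """Build a dict from Mac OS code to a list of Unicode code points."""
    table = {}
    for line in lines:
        parsed = parse_mapping_line(line)
        if parsed is not None:
            table[parsed[0]] = parsed[1]
    return table


sample = [
    "#   Three tab-separated columns;",
    "0x21\t0x2701\t# UPPER BLADE SCISSORS",
    "0x48\t0x2605\t# BLACK STAR",
    "",
]
table = load_table(sample)
```

Code points that the header lists as unused (0x8E-0xA0, 0xF0, 0xFF) are simply absent from the resulting dict, so a caller has to decide how to treat a missing key.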
--- a/charmap/FARSI.TXT
+++ b/charmap/FARSI.TXT
@ -0,0 +1,521 @@
+#=======================================================================
+#   File name:  FARSI.TXT
+#
+#   Contents:   Map (external version) from Mac OS Farsi
+#               character set to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1997-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Add comments about character display and
+#                           direction overrides. Update URLs, notes.
+#                           Matches internal utom<b3>.
+#       b02  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b1>, ufrm<b1>, and Text
+#                           Encoding Converter version 1.5.
+#       n04  1998-Feb-05    Show required Unicode character
+#                           directionality in a different way. Matches
+#                           internal utom<n3>, ufrm<n9>, and Text
+#                           Encoding Converter version 1.3. Update
+#                           header comments; include information on
+#                           loose mapping of digits, and changes to
+#                           mapping for the TrueType variant.
+#       n01  1997-Jul-17    First version. Matches internal utom<n1>,
+#                           ufrm<n2>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Farsi code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN),
+#       possibly preceded by a tag indicating required directionality
+#       (i.e. <LR>+0xNNNN or <RL>+0xNNNN).
+#     Column #3 is a comment containing the Unicode name.
+#
+#   The entries are in Mac OS Farsi code order.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Farsi character set uses the standard control characters at
+#   0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Farsi:
+# ----------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   1. General
+#
+#   The Mac OS Farsi character set is based on the Mac OS Arabic
+#   character set. The main difference is in the right-to-left digits
+#   0xB0-0xB9: For Mac OS Arabic these correspond to right-left
+#   versions of the Unicode ARABIC-INDIC DIGITs 0660-0669; for
+#   Mac OS Farsi these correspond to right-left versions of the
+#   Unicode EXTENDED ARABIC-INDIC DIGITs 06F0-06F9. The other
+#   difference is in the nature of the font variants.
+#
+#   For more information, see the comments in the mapping table for
+#   Mac OS Arabic.
+#
+#   Mac OS Farsi characters 0xEB-0xF2 are non-spacing/combining marks.
+#
+#   2. Directional characters and roundtrip fidelity
+#
+#   The Mac OS Arabic character set (on which Mac OS Farsi is based)
+#   was developed in 1986-1987. At that time the bidirectional line
+#   layout algorithm used in the Mac OS Arabic system was fairly simple;
+#   it used only a few direction classes (instead of the 19 now used in
+#   the Unicode bidirectional algorithm). In order to permit users to
+#   handle some tricky layout problems, certain punctuation and symbol
+#   characters were encoded twice, once with a left-right direction
+#   attribute and once with a right-left direction attribute. This
+#   is the case in Mac OS Farsi too.
+#
+#   For example, plus sign is encoded at 0x2B with a left-right
+#   attribute, and at 0xAB with a right-left attribute. However, there
+#   is only one PLUS SIGN character in Unicode. This leads to some
+#   interesting problems when mapping between Mac OS Farsi and Unicode;
+#   see below.
+#
+#   A related problem is that even when a particular character is
+#   encoded only once in Mac OS Farsi, it may have a different
+#   direction attribute than the corresponding Unicode character.
+#
+#   For example, the Mac OS Farsi character at 0x93 is HORIZONTAL
+#   ELLIPSIS with strong right-left direction. However, the Unicode
+#   character HORIZONTAL ELLIPSIS has direction class neutral.
+#
+#   3. Behavior of ASCII-range numbers in WorldScript
+#
+#   Mac OS Farsi also has two sets of digit codes.
+#
+#   The digits at 0x30-0x39 may be displayed using either European
+#   digit forms or Persian digit forms, depending on context. If there
+#   is a "strong European" character such as a Latin letter on either
+#   side of a sequence consisting of digits 0x30-0x39 and possibly comma
+#   0x2C or period 0x2E, then the characters will be displayed using
+#   European forms (this will happen even if there are neutral characters
+#   between the digits and the strong European character). Otherwise, the
+#   digits will be displayed using Persian forms, the comma will be
+#   displayed as Arabic thousands separator, and the period as Arabic
+#   decimal separator. In any case, 0x2C, 0x2E, and 0x30-0x39 are always
+#   left-right.
+#
+#   The digits at 0xB0-0xB9 are always displayed using Persian digit
+#   shapes, and moreover, these digits always have strong right-left
+#   directionality. These are mainly intended for special layout
+#   purposes such as part numbers, etc.
+#
+#   4. Font variants
+#
+#   The table in this file gives the Unicode mappings for the standard
+#   Mac OS Farsi encoding. This encoding is supported by the Tehran font
+#   (the system font for Farsi), and is the encoding supported by the
+#   text processing utilities. However, the other Farsi fonts actually
+#   implement a somewhat different encoding; this affects nine code
+#   points including 0xAA and 0xC0 (which are also affected by font
+#   variants in Mac OS Arabic). For these nine code points the standard
+#   Mac OS Farsi encoding has the following mappings:
+#       0x8B -> 0x06BA ARABIC LETTER NOON GHUNNA (Urdu)
+#       0xA4 -> <RL>+0x0024 DOLLAR SIGN, right-left
+#       0xAA -> <RL>+0x002A ASTERISK, right-left
+#       0xC0 -> <RL>+0x274A EIGHT TEARDROP-SPOKED PROPELLER ASTERISK,
+#               right-left
+#       0xF4 -> 0x0679 ARABIC LETTER TTEH (Urdu)
+#       0xF7 -> 0x06A4 ARABIC LETTER VEH (for transliteration)
+#       0xF9 -> 0x0688 ARABIC LETTER DDAL (Urdu)
+#       0xFA -> 0x0691 ARABIC LETTER RREH (Urdu)
+#       0xFF -> 0x06D2 ARABIC LETTER YEH BARREE (Urdu)
+#
+#   The TrueType variant is used for the Farsi TrueType fonts: Ashfahan,
+#   Amir, Kamran, Mashad, NadeemFarsi. It differs from the standard
+#   variant in the following ways:
+#       0x8B -> 0xF882 Arabic ligature "peace on him" (corporate char.)
+#       0xA4 -> 0xFDFC RIAL SIGN (added in Unicode 3.2)
+#       0xAA -> <RL>+0x00D7 MULTIPLICATION SIGN, right-left
+#       0xC0 -> <RL>+0x002A ASTERISK, right-left
+#       0xF4 -> <RL>+0x00B0 DEGREE SIGN, right-left
+#       0xF7 -> 0xFDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
+#       0xF9 -> <RL>+0x25CF BLACK CIRCLE, right-left
+#       0xFA -> <RL>+0x25A0 BLACK SQUARE, right-left
+#       0xFF -> <RL>+0x25B2 BLACK UP-POINTING TRIANGLE, right-left
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+#   1. Matching the direction of Mac OS Farsi characters
+#
+#   When Mac OS Farsi encodes a character twice but with different
+#   direction attributes for the two code points - as in the case of
+#   plus sign mentioned above - we need a way to map both Mac OS Farsi
+#   code points to Unicode and back again without loss of information.
+#   With the plus sign, for example, mapping one of the Mac OS Farsi
+#   characters to a code in the Unicode corporate use zone is
+#   undesirable, since both of the plus sign characters are likely to
+#   be used in text that is interchanged.
+#
+#   The problem is solved with the use of direction override characters
+#   and direction-dependent mappings. When mapping from Mac OS Farsi
+#   to Unicode, we use direction overrides as necessary to force the
+#   direction of the resulting Unicode characters.
+#
+#   The required direction is indicated by a direction tag in the
+#   mappings. A tag of <LR> means the corresponding Unicode character
+#   must have a strong left-right context, and a tag of <RL> indicates
+#   a right-left context.
+#
+#   For example, the mapping of 0x2B is given as <LR>+0x002B; the
+#   mapping of 0xAB is given as <RL>+0x002B. If we map an isolated
+#   instance of 0x2B to Unicode, it should be mapped as follows (LRO
+#   indicates LEFT-RIGHT OVERRIDE, PDF indicates POP DIRECTION
+#   FORMATTING):
+#
+#     0x2B ->  0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
+#
+#   When mapping several characters in a row that require direction
+#   forcing, the overrides need only be used at the beginning and end.
+#   For example:
+#
+#     0x24 0x20 0x28 0x29 -> 0x202D 0x0024 0x0020 0x0028 0x0029 0x202C
+#
+#   If neutral characters that require direction forcing are already
+#   between strong-direction characters with matching directionality,
+#   then direction overrides need not be used. Direction overrides are
+#   always needed to map the right-left digits at 0xB0-0xB9.
+#
+#   When mapping from Unicode to Mac OS Farsi, the Unicode
+#   bidirectional algorithm should be used to determine resolved
+#   direction of the Unicode characters. The mapping from Unicode to
+#   Mac OS Farsi can then be disambiguated by the use of the resolved
+#   direction:
+#
+#     Unicode 0x002B -> Mac OS Farsi 0x2B (if L) or 0xAB (if R)
+#
+#   However, this also means the direction override characters should
+#   be discarded when mapping from Unicode to Mac OS Farsi (after
+#   they have been used to determine resolved direction), since the
+#   direction override information is carried by the code point itself.
+#
+#   Even when direction overrides are not needed for roundtrip
+#   fidelity, they are sometimes used when mapping Mac OS Farsi
+#   characters to Unicode in order to achieve similar text layout with
+#   the resulting Unicode text. For example, the single Mac OS Farsi
+#   ellipsis character has direction class right-left, and there is no
+#   left-right version. However, the Unicode HORIZONTAL ELLIPSIS
+#   character has direction class neutral (which means it may end up
+#   with a resolved direction of left-right if surrounded by left-right
+#   characters). When mapping the Mac OS Farsi ellipsis to Unicode, it
+#   is surrounded with a direction override to help preserve proper
+#   text layout. The resolved direction is not needed or used when
+#   mapping the Unicode HORIZONTAL ELLIPSIS back to Mac OS Farsi.
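The direction-forcing rule described above (wrap a run of tagged characters in an override, with the override characters only at the beginning and end of the run) can be sketched as follows. This is an illustrative sketch, not Apple's converter: the names `FARSI_EXCERPT` and `farsi_to_unicode` are hypothetical, the table holds only a handful of entries, and the code always emits overrides rather than omitting them when the surrounding strong characters already match. U+202E RIGHT-TO-LEFT OVERRIDE is used for `<RL>` entries, mirroring the LRO shown in the text.

```python
LRO, RLO, PDF = "\u202D", "\u202E", "\u202C"

# Excerpt only: Mac OS Farsi code -> (direction tag or None, Unicode code point)
FARSI_EXCERPT = {
    0x20: ("LR", 0x0020),
    0x24: ("LR", 0x0024),
    0x28: ("LR", 0x0028),
    0x29: ("LR", 0x0029),
    0x2B: ("LR", 0x002B),
    0x41: (None, 0x0041),
    0xAB: ("RL", 0x002B),
}


def farsi_to_unicode(data):
    """Map Mac OS Farsi bytes to Unicode, forcing direction for tagged runs."""
    out = []
    current = None  # tag of the override that is currently open, if any
    for byte in data:
        tag, code_point = FARSI_EXCERPT[byte]
        if tag != current:
            if current is not None:
                out.append(PDF)  # close the previous override
            if tag is not None:
                out.append(LRO if tag == "LR" else RLO)
            current = tag
        out.append(chr(code_point))
    if current is not None:
        out.append(PDF)  # close an override left open at the end
    return "".join(out)


isolated_plus = farsi_to_unicode(bytes([0x2B]))
forced_run = farsi_to_unicode(bytes([0x24, 0x20, 0x28, 0x29]))
```

In the reverse direction a converter would resolve direction with the Unicode bidirectional algorithm first, pick 0x2B or 0xAB from the resolved direction, and then drop the override characters.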
+#
+#   2. Mapping the Mac OS Farsi digits
+#
+#   The main table below contains mappings that should be used when
+#   strict round-trip fidelity is required. However, for numeric
+#   values, the mappings in that table will produce Unicode characters
+#   that may appear different than the Mac OS Farsi text displayed on
+#   a Mac OS system using WorldScript. This is because WorldScript
+#   uses context-dependent display for the 0x30-0x39 digits.
+#
+#   If roundtrip fidelity is not required, then the following
+#   alternate mappings should be used when a sequence of 0x30-0x39
+#   digits - possibly including 0x2C and 0x2E - occurs in an Arabic
+#   context (that is, when the first "strong" character on either side
+#   of the digit sequence is Arabic, or there is no strong character):
+#
+#     0x2C	0x066C	# ARABIC THOUSANDS SEPARATOR
+#     0x2E	0x066B	# ARABIC DECIMAL SEPARATOR
+#     0x30	0x06F0	# EXTENDED ARABIC-INDIC DIGIT ZERO
+#     0x31	0x06F1	# EXTENDED ARABIC-INDIC DIGIT ONE
+#     0x32	0x06F2	# EXTENDED ARABIC-INDIC DIGIT TWO
+#     0x33	0x06F3	# EXTENDED ARABIC-INDIC DIGIT THREE
+#     0x34	0x06F4	# EXTENDED ARABIC-INDIC DIGIT FOUR
+#     0x35	0x06F5	# EXTENDED ARABIC-INDIC DIGIT FIVE
+#     0x36	0x06F6	# EXTENDED ARABIC-INDIC DIGIT SIX
+#     0x37	0x06F7	# EXTENDED ARABIC-INDIC DIGIT SEVEN
+#     0x38	0x06F8	# EXTENDED ARABIC-INDIC DIGIT EIGHT
+#     0x39	0x06F9	# EXTENDED ARABIC-INDIC DIGIT NINE
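The alternate (loose) mappings listed above can be applied to a digit run once the caller has decided whether the run sits in an Arabic context. The sketch below is illustrative only: `LOOSE_DIGIT_MAP` and `map_digit_run` are hypothetical names, context detection is left to the caller through the `arabic_context` flag, and the non-Arabic branch returns plain ASCII without the `<LR>` overrides that the strict table specifies for 0x2C and 0x2E.

```python
# Alternate mappings for 0x2C, 0x2E and 0x30-0x39 in an Arabic context.
LOOSE_DIGIT_MAP = {0x2C: 0x066C, 0x2E: 0x066B}
LOOSE_DIGIT_MAP.update({0x30 + i: 0x06F0 + i for i in range(10)})


def map_digit_run(run, arabic_context):
    """Map a run of 0x30-0x39, 0x2C and 0x2E bytes to a Unicode string.

    arabic_context is the caller's decision about the surrounding text;
    this sketch does not inspect neighbouring characters itself.
    """
    if arabic_context:
        return "".join(chr(LOOSE_DIGIT_MAP[b]) for b in run)
    # Simplification: strict mapping without direction overrides.
    return "".join(chr(b) for b in run)


persian = map_digit_run(b"1,234.5", arabic_context=True)
european = map_digit_run(b"42", arabic_context=False)
```

Because the loose mapping sends 0x30-0x39 to the same Unicode digits that the strict table reserves for 0xB0-0xB9, text converted this way can no longer be mapped back to the original bytes, which is why it applies only when roundtrip fidelity is not required.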
+#
+#   3. Use of corporate-zone Unicodes (mapping the TrueType variant)
+#
+#   The following corporate zone Unicode character is used in this
+#   mapping:
+#
+#     0xF882  Arabic ligature "peace on him"
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version b02 to version b03/c01:
+#
+#   - Update mapping of 0xA4 in TrueType variant to use new Unicode
+#     character U+FDFC RIAL SIGN added for Unicode 3.2.
+#
+#   Changes from version n01 to version n04:
+#
+#   - Change mapping of 0xA4 in TrueType variant (just described in
+#     header comment) from single corporate character to use
+#     grouping hint
+#
+##################
+
+0x20	<LR>+0x0020	# SPACE, left-right
+0x21	<LR>+0x0021	# EXCLAMATION MARK, left-right
+0x22	<LR>+0x0022	# QUOTATION MARK, left-right
+0x23	<LR>+0x0023	# NUMBER SIGN, left-right
+0x24	<LR>+0x0024	# DOLLAR SIGN, left-right
+0x25	<LR>+0x0025	# PERCENT SIGN, left-right
+0x26	<LR>+0x0026	# AMPERSAND, left-right
+0x27	<LR>+0x0027	# APOSTROPHE, left-right
+0x28	<LR>+0x0028	# LEFT PARENTHESIS, left-right
+0x29	<LR>+0x0029	# RIGHT PARENTHESIS, left-right
+0x2A	<LR>+0x002A	# ASTERISK, left-right
+0x2B	<LR>+0x002B	# PLUS SIGN, left-right
+0x2C	<LR>+0x002C	# COMMA, left-right; in Arabic-script context, displayed as 0x066C ARABIC THOUSANDS SEPARATOR
+0x2D	<LR>+0x002D	# HYPHEN-MINUS, left-right
+0x2E	<LR>+0x002E	# FULL STOP, left-right; in Arabic-script context, displayed as 0x066B ARABIC DECIMAL SEPARATOR
+0x2F	<LR>+0x002F	# SOLIDUS, left-right
+0x30	0x0030	# DIGIT ZERO;  in Arabic-script context, displayed as 0x06F0 EXTENDED ARABIC-INDIC DIGIT ZERO
+0x31	0x0031	# DIGIT ONE;   in Arabic-script context, displayed as 0x06F1 EXTENDED ARABIC-INDIC DIGIT ONE
+0x32	0x0032	# DIGIT TWO;   in Arabic-script context, displayed as 0x06F2 EXTENDED ARABIC-INDIC DIGIT TWO
+0x33	0x0033	# DIGIT THREE; in Arabic-script context, displayed as 0x06F3 EXTENDED ARABIC-INDIC DIGIT THREE
+0x34	0x0034	# DIGIT FOUR;  in Arabic-script context, displayed as 0x06F4 EXTENDED ARABIC-INDIC DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE;  in Arabic-script context, displayed as 0x06F5 EXTENDED ARABIC-INDIC DIGIT FIVE
+0x36	0x0036	# DIGIT SIX;   in Arabic-script context, displayed as 0x06F6 EXTENDED ARABIC-INDIC DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN; in Arabic-script context, displayed as 0x06F7 EXTENDED ARABIC-INDIC DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT; in Arabic-script context, displayed as 0x06F8 EXTENDED ARABIC-INDIC DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE;  in Arabic-script context, displayed as 0x06F9 EXTENDED ARABIC-INDIC DIGIT NINE
+0x3A	<LR>+0x003A	# COLON, left-right
+0x3B	<LR>+0x003B	# SEMICOLON, left-right
+0x3C	<LR>+0x003C	# LESS-THAN SIGN, left-right
+0x3D	<LR>+0x003D	# EQUALS SIGN, left-right
+0x3E	<LR>+0x003E	# GREATER-THAN SIGN, left-right
+0x3F	<LR>+0x003F	# QUESTION MARK, left-right
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	<LR>+0x005B	# LEFT SQUARE BRACKET, left-right
+0x5C	<LR>+0x005C	# REVERSE SOLIDUS, left-right
+0x5D	<LR>+0x005D	# RIGHT SQUARE BRACKET, left-right
+0x5E	<LR>+0x005E	# CIRCUMFLEX ACCENT, left-right
+0x5F	<LR>+0x005F	# LOW LINE, left-right
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	<LR>+0x007B	# LEFT CURLY BRACKET, left-right
+0x7C	<LR>+0x007C	# VERTICAL LINE, left-right
+0x7D	<LR>+0x007D	# RIGHT CURLY BRACKET, left-right
+0x7E	0x007E	# TILDE
+#
+0x80	0x00C4	# LATIN CAPITAL LETTER A WITH DIAERESIS
+0x81	<RL>+0x00A0	# NO-BREAK SPACE, right-left
+0x82	0x00C7	# LATIN CAPITAL LETTER C WITH CEDILLA
+0x83	0x00C9	# LATIN CAPITAL LETTER E WITH ACUTE
+0x84	0x00D1	# LATIN CAPITAL LETTER N WITH TILDE
+0x85	0x00D6	# LATIN CAPITAL LETTER O WITH DIAERESIS
+0x86	0x00DC	# LATIN CAPITAL LETTER U WITH DIAERESIS
+0x87	0x00E1	# LATIN SMALL LETTER A WITH ACUTE
+0x88	0x00E0	# LATIN SMALL LETTER A WITH GRAVE
+0x89	0x00E2	# LATIN SMALL LETTER A WITH CIRCUMFLEX
+0x8A	0x00E4	# LATIN SMALL LETTER A WITH DIAERESIS
+0x8B	0x06BA	# ARABIC LETTER NOON GHUNNA
+0x8C	<RL>+0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
+0x8D	0x00E7	# LATIN SMALL LETTER C WITH CEDILLA
+0x8E	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
+0x8F	0x00E8	# LATIN SMALL LETTER E WITH GRAVE
+0x90	0x00EA	# LATIN SMALL LETTER E WITH CIRCUMFLEX
+0x91	0x00EB	# LATIN SMALL LETTER E WITH DIAERESIS
+0x92	0x00ED	# LATIN SMALL LETTER I WITH ACUTE
+0x93	<RL>+0x2026	# HORIZONTAL ELLIPSIS, right-left
+0x94	0x00EE	# LATIN SMALL LETTER I WITH CIRCUMFLEX
+0x95	0x00EF	# LATIN SMALL LETTER I WITH DIAERESIS
+0x96	0x00F1	# LATIN SMALL LETTER N WITH TILDE
+0x97	0x00F3	# LATIN SMALL LETTER O WITH ACUTE
+0x98	<RL>+0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
+0x99	0x00F4	# LATIN SMALL LETTER O WITH CIRCUMFLEX
+0x9A	0x00F6	# LATIN SMALL LETTER O WITH DIAERESIS
+0x9B	<RL>+0x00F7	# DIVISION SIGN, right-left
+0x9C	0x00FA	# LATIN SMALL LETTER U WITH ACUTE
+0x9D	0x00F9	# LATIN SMALL LETTER U WITH GRAVE
+0x9E	0x00FB	# LATIN SMALL LETTER U WITH CIRCUMFLEX
+0x9F	0x00FC	# LATIN SMALL LETTER U WITH DIAERESIS
+0xA0	<RL>+0x0020	# SPACE, right-left
+0xA1	<RL>+0x0021	# EXCLAMATION MARK, right-left
+0xA2	<RL>+0x0022	# QUOTATION MARK, right-left
+0xA3	<RL>+0x0023	# NUMBER SIGN, right-left
+0xA4	<RL>+0x0024	# DOLLAR SIGN, right-left
+0xA5	0x066A	# ARABIC PERCENT SIGN
+0xA6	<RL>+0x0026	# AMPERSAND, right-left
+0xA7	<RL>+0x0027	# APOSTROPHE, right-left
+0xA8	<RL>+0x0028	# LEFT PARENTHESIS, right-left
+0xA9	<RL>+0x0029	# RIGHT PARENTHESIS, right-left
+0xAA	<RL>+0x002A	# ASTERISK, right-left
+0xAB	<RL>+0x002B	# PLUS SIGN, right-left
+0xAC	0x060C	# ARABIC COMMA
+0xAD	<RL>+0x002D	# HYPHEN-MINUS, right-left
+0xAE	<RL>+0x002E	# FULL STOP, right-left
+0xAF	<RL>+0x002F	# SOLIDUS, right-left
+0xB0	<RL>+0x06F0	# EXTENDED ARABIC-INDIC DIGIT ZERO, right-left (need override)
+0xB1	<RL>+0x06F1	# EXTENDED ARABIC-INDIC DIGIT ONE, right-left (need override)
+0xB2	<RL>+0x06F2	# EXTENDED ARABIC-INDIC DIGIT TWO, right-left (need override)
+0xB3	<RL>+0x06F3	# EXTENDED ARABIC-INDIC DIGIT THREE, right-left (need override)
+0xB4	<RL>+0x06F4	# EXTENDED ARABIC-INDIC DIGIT FOUR, right-left (need override)
+0xB5	<RL>+0x06F5	# EXTENDED ARABIC-INDIC DIGIT FIVE, right-left (need override)
+0xB6	<RL>+0x06F6	# EXTENDED ARABIC-INDIC DIGIT SIX, right-left (need override)
+0xB7	<RL>+0x06F7	# EXTENDED ARABIC-INDIC DIGIT SEVEN, right-left (need override)
+0xB8	<RL>+0x06F8	# EXTENDED ARABIC-INDIC DIGIT EIGHT, right-left (need override)
+0xB9	<RL>+0x06F9	# EXTENDED ARABIC-INDIC DIGIT NINE, right-left (need override)
+0xBA	<RL>+0x003A	# COLON, right-left
+0xBB	0x061B	# ARABIC SEMICOLON
+0xBC	<RL>+0x003C	# LESS-THAN SIGN, right-left
+0xBD	<RL>+0x003D	# EQUALS SIGN, right-left
+0xBE	<RL>+0x003E	# GREATER-THAN SIGN, right-left
+0xBF	0x061F	# ARABIC QUESTION MARK
+0xC0	<RL>+0x274A	# EIGHT TEARDROP-SPOKED PROPELLER ASTERISK, right-left
+0xC1	0x0621	# ARABIC LETTER HAMZA
+0xC2	0x0622	# ARABIC LETTER ALEF WITH MADDA ABOVE
+0xC3	0x0623	# ARABIC LETTER ALEF WITH HAMZA ABOVE
+0xC4	0x0624	# ARABIC LETTER WAW WITH HAMZA ABOVE
+0xC5	0x0625	# ARABIC LETTER ALEF WITH HAMZA BELOW
+0xC6	0x0626	# ARABIC LETTER YEH WITH HAMZA ABOVE
+0xC7	0x0627	# ARABIC LETTER ALEF
+0xC8	0x0628	# ARABIC LETTER BEH
+0xC9	0x0629	# ARABIC LETTER TEH MARBUTA
+0xCA	0x062A	# ARABIC LETTER TEH
+0xCB	0x062B	# ARABIC LETTER THEH
+0xCC	0x062C	# ARABIC LETTER JEEM
+0xCD	0x062D	# ARABIC LETTER HAH
+0xCE	0x062E	# ARABIC LETTER KHAH
+0xCF	0x062F	# ARABIC LETTER DAL
+0xD0	0x0630	# ARABIC LETTER THAL
+0xD1	0x0631	# ARABIC LETTER REH
+0xD2	0x0632	# ARABIC LETTER ZAIN
+0xD3	0x0633	# ARABIC LETTER SEEN
+0xD4	0x0634	# ARABIC LETTER SHEEN
+0xD5	0x0635	# ARABIC LETTER SAD
+0xD6	0x0636	# ARABIC LETTER DAD
+0xD7	0x0637	# ARABIC LETTER TAH
+0xD8	0x0638	# ARABIC LETTER ZAH
+0xD9	0x0639	# ARABIC LETTER AIN
+0xDA	0x063A	# ARABIC LETTER GHAIN
+0xDB	<RL>+0x005B	# LEFT SQUARE BRACKET, right-left
+0xDC	<RL>+0x005C	# REVERSE SOLIDUS, right-left
+0xDD	<RL>+0x005D	# RIGHT SQUARE BRACKET, right-left
+0xDE	<RL>+0x005E	# CIRCUMFLEX ACCENT, right-left
+0xDF	<RL>+0x005F	# LOW LINE, right-left
+0xE0	0x0640	# ARABIC TATWEEL
+0xE1	0x0641	# ARABIC LETTER FEH
+0xE2	0x0642	# ARABIC LETTER QAF
+0xE3	0x0643	# ARABIC LETTER KAF
+0xE4	0x0644	# ARABIC LETTER LAM
+0xE5	0x0645	# ARABIC LETTER MEEM
+0xE6	0x0646	# ARABIC LETTER NOON
+0xE7	0x0647	# ARABIC LETTER HEH
+0xE8	0x0648	# ARABIC LETTER WAW
+0xE9	0x0649	# ARABIC LETTER ALEF MAKSURA
+0xEA	0x064A	# ARABIC LETTER YEH
+0xEB	0x064B	# ARABIC FATHATAN
+0xEC	0x064C	# ARABIC DAMMATAN
+0xED	0x064D	# ARABIC KASRATAN
+0xEE	0x064E	# ARABIC FATHA
+0xEF	0x064F	# ARABIC DAMMA
+0xF0	0x0650	# ARABIC KASRA
+0xF1	0x0651	# ARABIC SHADDA
+0xF2	0x0652	# ARABIC SUKUN
+0xF3	0x067E	# ARABIC LETTER PEH
+0xF4	0x0679	# ARABIC LETTER TTEH
+0xF5	0x0686	# ARABIC LETTER TCHEH
+0xF6	0x06D5	# ARABIC LETTER AE
+0xF7	0x06A4	# ARABIC LETTER VEH
+0xF8	0x06AF	# ARABIC LETTER GAF
+0xF9	0x0688	# ARABIC LETTER DDAL
+0xFA	0x0691	# ARABIC LETTER RREH
+0xFB	<RL>+0x007B	# LEFT CURLY BRACKET, right-left
+0xFC	<RL>+0x007C	# VERTICAL LINE, right-left
+0xFD	<RL>+0x007D	# RIGHT CURLY BRACKET, right-left
+0xFE	0x0698	# ARABIC LETTER JEH
+0xFF	0x06D2	# ARABIC LETTER YEH BARREE
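Note on the ARABIC.TXT rows above: some Unicode values carry an optional <LR> or <RL> tag in front of the code point. A minimal Python sketch of reading such rows follows. It assumes raw file lines (without the leading "+" of this diff view), tab-separated columns, and a single code point per row; the names LINE_RE and parse_line are hypothetical and not part of Apple's files. The sketch only reports the tag and leaves open how a converter should apply the direction requirement.

```python
import re

# One data row: Mac OS code, optional <LR>/<RL> tag, Unicode value, comment.
# Assumption: columns are separated by single tabs, as the file headers describe.
LINE_RE = re.compile(
    r"^0x([0-9A-Fa-f]{2})\t(?:<(LR|RL)>\+)?0x([0-9A-Fa-f]{4})\t#\s*(.*)$"
)


def parse_line(line):
    """Return (mac_code, direction, code_point, comment), or None for non-data lines.

    direction is "LR", "RL", or None when the row has no direction tag.
    """
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None  # comment line, blank line, or a row shape this sketch does not cover
    mac_hex, direction, uni_hex, comment = match.groups()
    return int(mac_hex, 16), direction, int(uni_hex, 16), comment


if __name__ == "__main__":
    for row in ("0xA0\t<RL>+0x0020\t# SPACE, right-left",
                "0xC1\t0x0621\t# ARABIC LETTER HAMZA",
                "#"):
        print(parse_line(row))
```

Keeping the tag as a separate field lets a caller decide later whether to honor it, which seems safer than baking in one interpretation.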
--- a/charmap/GAELIC.TXT
+++ b/charmap/GAELIC.TXT
@ -0,0 +1,337 @@
+#=======================================================================
+#   File name:  GAELIC.TXT
+#
+#   Contents:   Map (external version) from Mac OS Gaelic
+#               character set to Unicode 3.0 and later.
+#
+#   Contacts:   charsets@apple.com, everson@evertype.com
+#
+#   Changes:
+#
+#       c01  2005-Apr-01    First posted version. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Gaelic code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN)
+#     Column #3 is a comment containing the Unicode name
+#
+#   The entries are in Mac OS Gaelic code order.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Gaelic character set uses the standard control characters
+#   at 0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Gaelic (partly from Michael Everson):
+# -----------------------------------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   This character set was developed by Michael Everson of Everson
+#   Typography (everson@evertype.com) and was used for fonts in his
+#   Celtic Utilities and CeltScript font packages for the Mac, as well
+#   as some fonts included with the Irish localizations of Mac OS 6.0.8
+#   and 7.1. Note that while Apple authorized this Irish localization,
+#   it was not a system which shipped with Apple hardware, and was not
+#   otherwise supported by Apple. Fonts conforming to the Mac OS Gaelic
+#   character set are available from Everson Typography
+#   (http://www.evertype.com/celtscript/). Information about the use of
+#   this character set is available at
+#   http://www.evertype.com/celtscript/celtcode.html.
+#
+#   The Mac OS Gaelic encoding shares the script code smRoman (0) with
+#   the standard Mac OS Roman encoding. To determine if the Gaelic
+#   encoding is being used in Mac OS 7-9, you should also check if the
+#   system region code is 81. Otherwise, you can check for particular
+#   fonts that conform to this encoding (since in practice Gaelic fonts
+#   are used with the ordinary US or UK system versions).
+#
+#   This character set is a variant of standard Mac OS Roman, adding
+#   capital and small y with acute, grave, and circumflex; capital and
+#   small w with acute, grave, circumflex and diaeresis; capital and
+#   small b, c, d, f, g, m, p, s, t with dot above; tironian et; small
+#   long r, small long s, and small long s with dot above. It has 36
+#   code point differences from standard Mac OS Roman.
+#
+#   Before Mac OS 8.5, code point 0xDB was CURRENCY SIGN, and was
+#   mapped to U+00A4. In Mac OS 8.5 and later versions, code point
+#   0xDB is changed to EURO SIGN and maps to U+20AC; the standard
+#   Apple fonts are updated for Mac OS 8.5 to reflect this. There is
+#   a "currency sign" variant of the Mac OS Gaelic encoding that still
+#   maps 0xDB to U+00A4; this can be used for older fonts.
+#   Note: U+20AC is new with Unicode 2.1; for earlier Unicode
+#   versions, Mac OS Gaelic 0xDB may be mapped to private-use
+#   character U+F8A0.
+#
+#   Before August 1998, code point 0xE4 was PER MILLE SIGN, and was
+#   mapped to U+2030. Since August 1998, code point 0xE4 is changed
+#   to TIRONIAN SIGN ET and maps to U+204A. There is a "per mille
+#   sign" variant of the Mac OS Gaelic encoding that still
+#   maps 0xE4 to U+2030; this can be used for older fonts.
+#   Note: U+204A is new with Unicode 3.0; for earlier Unicode
+#   versions, TIRONIAN SIGN ET was unified with AMPERSAND.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00C4	# LATIN CAPITAL LETTER A WITH DIAERESIS
+0x81	0x00C5	# LATIN CAPITAL LETTER A WITH RING ABOVE
+0x82	0x00C7	# LATIN CAPITAL LETTER C WITH CEDILLA
+0x83	0x00C9	# LATIN CAPITAL LETTER E WITH ACUTE
+0x84	0x00D1	# LATIN CAPITAL LETTER N WITH TILDE
+0x85	0x00D6	# LATIN CAPITAL LETTER O WITH DIAERESIS
+0x86	0x00DC	# LATIN CAPITAL LETTER U WITH DIAERESIS
+0x87	0x00E1	# LATIN SMALL LETTER A WITH ACUTE
+0x88	0x00E0	# LATIN SMALL LETTER A WITH GRAVE
+0x89	0x00E2	# LATIN SMALL LETTER A WITH CIRCUMFLEX
+0x8A	0x00E4	# LATIN SMALL LETTER A WITH DIAERESIS
+0x8B	0x00E3	# LATIN SMALL LETTER A WITH TILDE
+0x8C	0x00E5	# LATIN SMALL LETTER A WITH RING ABOVE
+0x8D	0x00E7	# LATIN SMALL LETTER C WITH CEDILLA
+0x8E	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
+0x8F	0x00E8	# LATIN SMALL LETTER E WITH GRAVE
+0x90	0x00EA	# LATIN SMALL LETTER E WITH CIRCUMFLEX
+0x91	0x00EB	# LATIN SMALL LETTER E WITH DIAERESIS
+0x92	0x00ED	# LATIN SMALL LETTER I WITH ACUTE
+0x93	0x00EC	# LATIN SMALL LETTER I WITH GRAVE
+0x94	0x00EE	# LATIN SMALL LETTER I WITH CIRCUMFLEX
+0x95	0x00EF	# LATIN SMALL LETTER I WITH DIAERESIS
+0x96	0x00F1	# LATIN SMALL LETTER N WITH TILDE
+0x97	0x00F3	# LATIN SMALL LETTER O WITH ACUTE
+0x98	0x00F2	# LATIN SMALL LETTER O WITH GRAVE
+0x99	0x00F4	# LATIN SMALL LETTER O WITH CIRCUMFLEX
+0x9A	0x00F6	# LATIN SMALL LETTER O WITH DIAERESIS
+0x9B	0x00F5	# LATIN SMALL LETTER O WITH TILDE
+0x9C	0x00FA	# LATIN SMALL LETTER U WITH ACUTE
+0x9D	0x00F9	# LATIN SMALL LETTER U WITH GRAVE
+0x9E	0x00FB	# LATIN SMALL LETTER U WITH CIRCUMFLEX
+0x9F	0x00FC	# LATIN SMALL LETTER U WITH DIAERESIS
+0xA0	0x2020	# DAGGER
+0xA1	0x00B0	# DEGREE SIGN
+0xA2	0x00A2	# CENT SIGN
+0xA3	0x00A3	# POUND SIGN
+0xA4	0x00A7	# SECTION SIGN
+0xA5	0x2022	# BULLET
+0xA6	0x00B6	# PILCROW SIGN
+0xA7	0x00DF	# LATIN SMALL LETTER SHARP S
+0xA8	0x00AE	# REGISTERED SIGN
+0xA9	0x00A9	# COPYRIGHT SIGN
+0xAA	0x2122	# TRADE MARK SIGN
+0xAB	0x00B4	# ACUTE ACCENT
+0xAC	0x00A8	# DIAERESIS
+0xAD	0x2260	# NOT EQUAL TO
+0xAE	0x00C6	# LATIN CAPITAL LETTER AE
+0xAF	0x00D8	# LATIN CAPITAL LETTER O WITH STROKE
+0xB0	0x1E02	# LATIN CAPITAL LETTER B WITH DOT ABOVE
+0xB1	0x00B1	# PLUS-MINUS SIGN
+0xB2	0x2264	# LESS-THAN OR EQUAL TO
+0xB3	0x2265	# GREATER-THAN OR EQUAL TO
+0xB4	0x1E03	# LATIN SMALL LETTER B WITH DOT ABOVE
+0xB5	0x010A	# LATIN CAPITAL LETTER C WITH DOT ABOVE
+0xB6	0x010B	# LATIN SMALL LETTER C WITH DOT ABOVE
+0xB7	0x1E0A	# LATIN CAPITAL LETTER D WITH DOT ABOVE
+0xB8	0x1E0B	# LATIN SMALL LETTER D WITH DOT ABOVE
+0xB9	0x1E1E	# LATIN CAPITAL LETTER F WITH DOT ABOVE
+0xBA	0x1E1F	# LATIN SMALL LETTER F WITH DOT ABOVE
+0xBB	0x0120	# LATIN CAPITAL LETTER G WITH DOT ABOVE
+0xBC	0x0121	# LATIN SMALL LETTER G WITH DOT ABOVE
+0xBD	0x1E40	# LATIN CAPITAL LETTER M WITH DOT ABOVE
+0xBE	0x00E6	# LATIN SMALL LETTER AE
+0xBF	0x00F8	# LATIN SMALL LETTER O WITH STROKE
+0xC0	0x1E41	# LATIN SMALL LETTER M WITH DOT ABOVE
+0xC1	0x1E56	# LATIN CAPITAL LETTER P WITH DOT ABOVE
+0xC2	0x1E57	# LATIN SMALL LETTER P WITH DOT ABOVE
+0xC3	0x027C	# LATIN SMALL LETTER R WITH LONG LEG
+0xC4	0x0192	# LATIN SMALL LETTER F WITH HOOK
+0xC5	0x017F	# LATIN SMALL LETTER LONG S
+0xC6	0x1E60	# LATIN CAPITAL LETTER S WITH DOT ABOVE
+0xC7	0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC8	0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC9	0x2026	# HORIZONTAL ELLIPSIS
+0xCA	0x00A0	# NO-BREAK SPACE
+0xCB	0x00C0	# LATIN CAPITAL LETTER A WITH GRAVE
+0xCC	0x00C3	# LATIN CAPITAL LETTER A WITH TILDE
+0xCD	0x00D5	# LATIN CAPITAL LETTER O WITH TILDE
+0xCE	0x0152	# LATIN CAPITAL LIGATURE OE
+0xCF	0x0153	# LATIN SMALL LIGATURE OE
+0xD0	0x2013	# EN DASH
+0xD1	0x2014	# EM DASH
+0xD2	0x201C	# LEFT DOUBLE QUOTATION MARK
+0xD3	0x201D	# RIGHT DOUBLE QUOTATION MARK
+0xD4	0x2018	# LEFT SINGLE QUOTATION MARK
+0xD5	0x2019	# RIGHT SINGLE QUOTATION MARK
+0xD6	0x1E61	# LATIN SMALL LETTER S WITH DOT ABOVE
+0xD7	0x1E9B	# LATIN SMALL LETTER LONG S WITH DOT ABOVE
+0xD8	0x00FF	# LATIN SMALL LETTER Y WITH DIAERESIS
+0xD9	0x0178	# LATIN CAPITAL LETTER Y WITH DIAERESIS
+0xDA	0x1E6A	# LATIN CAPITAL LETTER T WITH DOT ABOVE
+0xDB	0x20AC	# EURO SIGN # before Mac OS 8.5 this was U+00A4 CURRENCY SIGN
+0xDC	0x2039	# SINGLE LEFT-POINTING ANGLE QUOTATION MARK
+0xDD	0x203A	# SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
+0xDE	0x0176	# LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
+0xDF	0x0177	# LATIN SMALL LETTER Y WITH CIRCUMFLEX
+0xE0	0x1E6B	# LATIN SMALL LETTER T WITH DOT ABOVE
+0xE1	0x00B7	# MIDDLE DOT
+0xE2	0x1EF2	# LATIN CAPITAL LETTER Y WITH GRAVE
+0xE3	0x1EF3	# LATIN SMALL LETTER Y WITH GRAVE
+0xE4	0x204A	# TIRONIAN SIGN ET # change from MacCeltic for Unicode 3.0; before Aug. 1998 this was U+2030 PER MILLE SIGN
+0xE5	0x00C2	# LATIN CAPITAL LETTER A WITH CIRCUMFLEX
+0xE6	0x00CA	# LATIN CAPITAL LETTER E WITH CIRCUMFLEX
+0xE7	0x00C1	# LATIN CAPITAL LETTER A WITH ACUTE
+0xE8	0x00CB	# LATIN CAPITAL LETTER E WITH DIAERESIS
+0xE9	0x00C8	# LATIN CAPITAL LETTER E WITH GRAVE
+0xEA	0x00CD	# LATIN CAPITAL LETTER I WITH ACUTE
+0xEB	0x00CE	# LATIN CAPITAL LETTER I WITH CIRCUMFLEX
+0xEC	0x00CF	# LATIN CAPITAL LETTER I WITH DIAERESIS
+0xED	0x00CC	# LATIN CAPITAL LETTER I WITH GRAVE
+0xEE	0x00D3	# LATIN CAPITAL LETTER O WITH ACUTE
+0xEF	0x00D4	# LATIN CAPITAL LETTER O WITH CIRCUMFLEX
+0xF0	0x2663	# BLACK CLUB SUIT = shamrock # future mapping U+2618 SHAMROCK
+0xF1	0x00D2	# LATIN CAPITAL LETTER O WITH GRAVE
+0xF2	0x00DA	# LATIN CAPITAL LETTER U WITH ACUTE
+0xF3	0x00DB	# LATIN CAPITAL LETTER U WITH CIRCUMFLEX
+0xF4	0x00D9	# LATIN CAPITAL LETTER U WITH GRAVE
+0xF5	0x0131	# LATIN SMALL LETTER DOTLESS I
+0xF6	0x00DD	# LATIN CAPITAL LETTER Y WITH ACUTE
+0xF7	0x00FD	# LATIN SMALL LETTER Y WITH ACUTE
+0xF8	0x0174	# LATIN CAPITAL LETTER W WITH CIRCUMFLEX
+0xF9	0x0175	# LATIN SMALL LETTER W WITH CIRCUMFLEX
+0xFA	0x1E84	# LATIN CAPITAL LETTER W WITH DIAERESIS
+0xFB	0x1E85	# LATIN SMALL LETTER W WITH DIAERESIS
+0xFC	0x1E80	# LATIN CAPITAL LETTER W WITH GRAVE
+0xFD	0x1E81	# LATIN SMALL LETTER W WITH GRAVE
+0xFE	0x1E82	# LATIN CAPITAL LETTER W WITH ACUTE
+0xFF	0x1E83	# LATIN SMALL LETTER W WITH ACUTE
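Note on the GAELIC.TXT header above: it describes a three-column format and two older variants ("currency sign" at 0xDB, "per mille sign" at 0xE4). The following Python sketch shows one way to load the table and derive those variants. The function names and the SAMPLE excerpt are hypothetical. The identity mapping for control characters is an assumption based on the header's remark that the standard controls sit at 0x00-0x1F and 0x7F. Input is assumed to be raw file text without the "+" prefix of this diff view.

```python
def parse_table(text):
    """Build {mac_code: unicode_code_point} from three-column table text."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # '#' starts a comment to end of line
        if not data:
            continue  # header comment or blank line
        fields = data.split()
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def gaelic_variant(table, currency_sign=False, per_mille=False):
    """Return a copy of the table with the older mappings the header describes."""
    variant = dict(table)
    if currency_sign:
        variant[0xDB] = 0x00A4  # CURRENCY SIGN instead of EURO SIGN
    if per_mille:
        variant[0xE4] = 0x2030  # PER MILLE SIGN instead of TIRONIAN SIGN ET
    return variant


def decode(data, table):
    """Decode bytes; codes missing from the table (assumed controls) map to themselves."""
    return "".join(chr(table.get(byte, byte)) for byte in data)


# Hypothetical three-row excerpt in the same shape as the table above.
SAMPLE = (
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A\n"
    "0xDB\t0x20AC\t# EURO SIGN # before Mac OS 8.5 this was U+00A4 CURRENCY SIGN\n"
    "0xE4\t0x204A\t# TIRONIAN SIGN ET\n"
)

if __name__ == "__main__":
    current = parse_table(SAMPLE)
    older = gaelic_variant(current, currency_sign=True, per_mille=True)
    print(decode(b"A\xdb\xe4", current), decode(b"A\xdb\xe4", older))
```

Deriving the variants from the current table, rather than storing separate files, keeps the two differences visible in one place.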
--- a/charmap/GREEK.TXT
+++ b/charmap/GREEK.TXT
@ -0,0 +1,355 @@
+#=======================================================================
+#   File name:  GREEK.TXT
+#
+#   Contents:   Map (external version) from Mac OS Greek
+#               character set to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Update to match changes in Mac OS Greek
+#                           encoding for Mac OS 9.2.2 and later.
+#                           Update URLs, notes. Matches internal
+#                           utom<b3>.
+#       b02  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b1>, ufrm<b1>, and Text
+#                           Encoding Converter version 1.5.
+#       n06  1998-Feb-05    Update to match internal utom<n4>, ufrm<n17>,
+#                           and Text Encoding Converter version 1.3:
+#                           Change mapping for 0xAF from U+0387 to its
+#                           canonical decomposition, U+00B7. Also
+#                           update header comments to new format.
+#       n04  1995-Apr-15    First version (after fixing some typos).
+#                           Matches internal ufrm<n7>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Greek code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN)
+#     Column #3 is a comment containing the Unicode name
+#
+#   The entries are in Mac OS Greek code order.
+#
+#   One of these mappings requires the use of a corporate character.
+#   See the file "CORPCHAR.TXT" and notes below.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Greek character set uses the standard control characters at
+#   0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Greek:
+# ----------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   Although a Mac OS script code is defined for Greek (smGreek = 6),
+#   the Greek localized system does not currently use it (the font
+#   family IDs are in the Mac OS Roman range). To determine if the
+#   Greek encoding is being used when the script code is smRoman (0),
+#   you must check if the system region code is 20, verGreece.
+#
+#   The Mac OS Greek encoding is a superset of the repertoire of
+#   ISO 8859-7 (although characters are not at the same code points),
+#   except that LEFT & RIGHT SINGLE QUOTATION MARK replace the
+#   MODIFIER LETTER REVERSED COMMA & APOSTROPHE (spacing versions of
+#   Greek rough & smooth breathing marks) that are in ISO 8859-7.
+#   The added characters in Mac OS Greek include more punctuation and
+#   symbols and several accented Latin letters.
+#
+#   Before Mac OS 9.2.2, code point 0x9C was SOFT HYPHEN (U+00AD), and
+#   code point 0xFF was undefined. In Mac OS 9.2.2 and later versions,
+#   SOFT HYPHEN was moved to 0xFF, and code point 0x9C was changed to be
+#   EURO SIGN (U+20AC); the standard Apple fonts are updated for Mac OS
+#   9.2.2 to reflect this. There is a "no Euro sign" variant of the Mac
+#   OS Greek encoding that uses the older mapping; this can be used for
+#   older fonts.
+#
+#   This "no Euro sign" variant of Mac OS Greek was the character set
+#   used by Mac OS Greek systems before 9.2.2 except for system 6.0.7,
+#   which used a variant character set but was quickly replaced with
+#   Greek system 6.0.7.1 using the "no Euro sign" character set
+#   documented here. Greek system 4.1 used a variant Greek set that had
+#   ISO 8859-7 in 0xA0-0xFF (with some holes filled in with DTP
+#   characters), and Mac OS Roman accented Roman letters in 0x80-0x9F.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version b02 to version b03/c01:
+#
+#   - The Mac OS Greek encoding changed for Mac OS 9.2.2 and later
+#     as follows:
+#     0x9C, changed from 0x00AD SOFT HYPHEN to 0x20AC EURO SIGN
+#     0xFF, changed from undefined to 0x00AD SOFT HYPHEN
+#
+#   Changes from version n04 to version n06:
+#
+#   - Change mapping of 0xAF from U+0387 to its canonical
+#     decomposition, U+00B7.
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00C4	# LATIN CAPITAL LETTER A WITH DIAERESIS
+0x81	0x00B9	# SUPERSCRIPT ONE
+0x82	0x00B2	# SUPERSCRIPT TWO
+0x83	0x00C9	# LATIN CAPITAL LETTER E WITH ACUTE
+0x84	0x00B3	# SUPERSCRIPT THREE
+0x85	0x00D6	# LATIN CAPITAL LETTER O WITH DIAERESIS
+0x86	0x00DC	# LATIN CAPITAL LETTER U WITH DIAERESIS
+0x87	0x0385	# GREEK DIALYTIKA TONOS
+0x88	0x00E0	# LATIN SMALL LETTER A WITH GRAVE
+0x89	0x00E2	# LATIN SMALL LETTER A WITH CIRCUMFLEX
+0x8A	0x00E4	# LATIN SMALL LETTER A WITH DIAERESIS
+0x8B	0x0384	# GREEK TONOS
+0x8C	0x00A8	# DIAERESIS
+0x8D	0x00E7	# LATIN SMALL LETTER C WITH CEDILLA
+0x8E	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
+0x8F	0x00E8	# LATIN SMALL LETTER E WITH GRAVE
+0x90	0x00EA	# LATIN SMALL LETTER E WITH CIRCUMFLEX
+0x91	0x00EB	# LATIN SMALL LETTER E WITH DIAERESIS
+0x92	0x00A3	# POUND SIGN
+0x93	0x2122	# TRADE MARK SIGN
+0x94	0x00EE	# LATIN SMALL LETTER I WITH CIRCUMFLEX
+0x95	0x00EF	# LATIN SMALL LETTER I WITH DIAERESIS
+0x96	0x2022	# BULLET
+0x97	0x00BD	# VULGAR FRACTION ONE HALF
+0x98	0x2030	# PER MILLE SIGN
+0x99	0x00F4	# LATIN SMALL LETTER O WITH CIRCUMFLEX
+0x9A	0x00F6	# LATIN SMALL LETTER O WITH DIAERESIS
+0x9B	0x00A6	# BROKEN BAR
+0x9C	0x20AC	# EURO SIGN # before Mac OS 9.2.2, was SOFT HYPHEN
+0x9D	0x00F9	# LATIN SMALL LETTER U WITH GRAVE
+0x9E	0x00FB	# LATIN SMALL LETTER U WITH CIRCUMFLEX
+0x9F	0x00FC	# LATIN SMALL LETTER U WITH DIAERESIS
+0xA0	0x2020	# DAGGER
+0xA1	0x0393	# GREEK CAPITAL LETTER GAMMA
+0xA2	0x0394	# GREEK CAPITAL LETTER DELTA
+0xA3	0x0398	# GREEK CAPITAL LETTER THETA
+0xA4	0x039B	# GREEK CAPITAL LETTER LAMDA
+0xA5	0x039E	# GREEK CAPITAL LETTER XI
+0xA6	0x03A0	# GREEK CAPITAL LETTER PI
+0xA7	0x00DF	# LATIN SMALL LETTER SHARP S
+0xA8	0x00AE	# REGISTERED SIGN
+0xA9	0x00A9	# COPYRIGHT SIGN
+0xAA	0x03A3	# GREEK CAPITAL LETTER SIGMA
+0xAB	0x03AA	# GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
+0xAC	0x00A7	# SECTION SIGN
+0xAD	0x2260	# NOT EQUAL TO
+0xAE	0x00B0	# DEGREE SIGN
+0xAF	0x00B7	# MIDDLE DOT
+0xB0	0x0391	# GREEK CAPITAL LETTER ALPHA
+0xB1	0x00B1	# PLUS-MINUS SIGN
+0xB2	0x2264	# LESS-THAN OR EQUAL TO
+0xB3	0x2265	# GREATER-THAN OR EQUAL TO
+0xB4	0x00A5	# YEN SIGN
+0xB5	0x0392	# GREEK CAPITAL LETTER BETA
+0xB6	0x0395	# GREEK CAPITAL LETTER EPSILON
+0xB7	0x0396	# GREEK CAPITAL LETTER ZETA
+0xB8	0x0397	# GREEK CAPITAL LETTER ETA
+0xB9	0x0399	# GREEK CAPITAL LETTER IOTA
+0xBA	0x039A	# GREEK CAPITAL LETTER KAPPA
+0xBB	0x039C	# GREEK CAPITAL LETTER MU
+0xBC	0x03A6	# GREEK CAPITAL LETTER PHI
+0xBD	0x03AB	# GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
+0xBE	0x03A8	# GREEK CAPITAL LETTER PSI
+0xBF	0x03A9	# GREEK CAPITAL LETTER OMEGA
+0xC0	0x03AC	# GREEK SMALL LETTER ALPHA WITH TONOS
+0xC1	0x039D	# GREEK CAPITAL LETTER NU
+0xC2	0x00AC	# NOT SIGN
+0xC3	0x039F	# GREEK CAPITAL LETTER OMICRON
+0xC4	0x03A1	# GREEK CAPITAL LETTER RHO
+0xC5	0x2248	# ALMOST EQUAL TO
+0xC6	0x03A4	# GREEK CAPITAL LETTER TAU
+0xC7	0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC8	0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC9	0x2026	# HORIZONTAL ELLIPSIS
+0xCA	0x00A0	# NO-BREAK SPACE
+0xCB	0x03A5	# GREEK CAPITAL LETTER UPSILON
+0xCC	0x03A7	# GREEK CAPITAL LETTER CHI
+0xCD	0x0386	# GREEK CAPITAL LETTER ALPHA WITH TONOS
+0xCE	0x0388	# GREEK CAPITAL LETTER EPSILON WITH TONOS
+0xCF	0x0153	# LATIN SMALL LIGATURE OE
+0xD0	0x2013	# EN DASH
+0xD1	0x2015	# HORIZONTAL BAR
+0xD2	0x201C	# LEFT DOUBLE QUOTATION MARK
+0xD3	0x201D	# RIGHT DOUBLE QUOTATION MARK
+0xD4	0x2018	# LEFT SINGLE QUOTATION MARK
+0xD5	0x2019	# RIGHT SINGLE QUOTATION MARK
+0xD6	0x00F7	# DIVISION SIGN
+0xD7	0x0389	# GREEK CAPITAL LETTER ETA WITH TONOS
+0xD8	0x038A	# GREEK CAPITAL LETTER IOTA WITH TONOS
+0xD9	0x038C	# GREEK CAPITAL LETTER OMICRON WITH TONOS
+0xDA	0x038E	# GREEK CAPITAL LETTER UPSILON WITH TONOS
+0xDB	0x03AD	# GREEK SMALL LETTER EPSILON WITH TONOS
+0xDC	0x03AE	# GREEK SMALL LETTER ETA WITH TONOS
+0xDD	0x03AF	# GREEK SMALL LETTER IOTA WITH TONOS
+0xDE	0x03CC	# GREEK SMALL LETTER OMICRON WITH TONOS
+0xDF	0x038F	# GREEK CAPITAL LETTER OMEGA WITH TONOS
+0xE0	0x03CD	# GREEK SMALL LETTER UPSILON WITH TONOS
+0xE1	0x03B1	# GREEK SMALL LETTER ALPHA
+0xE2	0x03B2	# GREEK SMALL LETTER BETA
+0xE3	0x03C8	# GREEK SMALL LETTER PSI
+0xE4	0x03B4	# GREEK SMALL LETTER DELTA
+0xE5	0x03B5	# GREEK SMALL LETTER EPSILON
+0xE6	0x03C6	# GREEK SMALL LETTER PHI
+0xE7	0x03B3	# GREEK SMALL LETTER GAMMA
+0xE8	0x03B7	# GREEK SMALL LETTER ETA
+0xE9	0x03B9	# GREEK SMALL LETTER IOTA
+0xEA	0x03BE	# GREEK SMALL LETTER XI
+0xEB	0x03BA	# GREEK SMALL LETTER KAPPA
+0xEC	0x03BB	# GREEK SMALL LETTER LAMDA
+0xED	0x03BC	# GREEK SMALL LETTER MU
+0xEE	0x03BD	# GREEK SMALL LETTER NU
+0xEF	0x03BF	# GREEK SMALL LETTER OMICRON
+0xF0	0x03C0	# GREEK SMALL LETTER PI
+0xF1	0x03CE	# GREEK SMALL LETTER OMEGA WITH TONOS
+0xF2	0x03C1	# GREEK SMALL LETTER RHO
+0xF3	0x03C3	# GREEK SMALL LETTER SIGMA
+0xF4	0x03C4	# GREEK SMALL LETTER TAU
+0xF5	0x03B8	# GREEK SMALL LETTER THETA
+0xF6	0x03C9	# GREEK SMALL LETTER OMEGA
+0xF7	0x03C2	# GREEK SMALL LETTER FINAL SIGMA
+0xF8	0x03C7	# GREEK SMALL LETTER CHI
+0xF9	0x03C5	# GREEK SMALL LETTER UPSILON
+0xFA	0x03B6	# GREEK SMALL LETTER ZETA
+0xFB	0x03CA	# GREEK SMALL LETTER IOTA WITH DIALYTIKA
+0xFC	0x03CB	# GREEK SMALL LETTER UPSILON WITH DIALYTIKA
+0xFD	0x0390	# GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
+0xFE	0x03B0	# GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
+0xFF	0x00AD	# SOFT HYPHEN # before Mac OS 9.2.2, was undefined
--- a/charmap/GUJARATI.TXT
+++ b/charmap/GUJARATI.TXT
@ -0,0 +1,383 @@
+#=======================================================================
+#   File name:  GUJARATI.TXT
+#
+#   Contents:   Map (external version) from Mac OS Gujarati
+#               encoding to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1997-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Update URLs. Matches internal utom<b1>.
+#       b02  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b1>, ufrm<b1>, and Text
+#                           Encoding Converter version 1.5.
+#       n02  1998-Feb-05    First version; matches internal utom<n4>,
+#                           ufrm<n5>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Gujarati code or code sequence
+#       (in hex as 0xNN or 0xNN+0xNN)
+#     Column #2 is the corresponding Unicode or Unicode sequence
+#       (in hex as 0xNNNN or 0xNNNN+0xNNNN).
+#     Column #3 is a comment containing the Unicode name or sequence
+#       of names. In some cases an additional comment follows the
+#       Unicode name(s).
+#
+#   The entries are in two sections. The first section is for pairs of
+#   Mac OS Gujarati code points that must be mapped in a special way.
+#   The second section maps individual code points.
+#
+#   Within each section, the entries are in Mac OS Gujarati code order.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Gujarati character set uses the standard control characters
+#   at 0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Gujarati:
+# -------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   Mac OS Gujarati is based on IS 13194:1991 (ISCII-91), with the
+#   addition of several punctuation and symbol characters. However,
+#   Mac OS Gujarati does not support the ATR (attribute) mechanism of
+#   ISCII-91.
+#
+# 1. ISCII-91 features in Mac OS Gujarati include:
+#
+#  a) Overloading of nukta
+#
+#     In addition to using the nukta (0xE9) like a combining dot below,
+#     nukta is overloaded to function as a general character modifier.
+#     In this role, certain code points followed by 0xE9 are treated as
+#     a two-byte code point representing a character which may be
+#     rather different than the characters represented by either of
+#     the code points alone. For example, the character GUJARATI OM
+#     (U+0AD0) is represented in ISCII-91 as candrabindu + nukta.
+#
+#  b) Explicit halant and soft halant
+#
+#     A double halant (0xE8 + 0xE8) constitutes an "explicit halant",
+#     which will always appear as a halant instead of causing formation
+#     of a ligature or half-form consonant.
+#
+#     Halant followed by nukta (0xE8 + 0xE9) constitutes a "soft
+#     halant", which prevents formation of a ligature and instead
+#     retains the half-form of the first consonant.
+#
+#  c) Invisible consonant
+#
+#     The byte 0xD9 (called INV in ISCII-91) is an invisible consonant:
+#     It behaves like a consonant but has no visible appearance. It is
+#     intended to be used (often in combination with halant) to display
+#     dependent forms in isolation, such as the RA forms or consonant
+#     half-forms.
+#
+#  d) Extensions for Vedic, etc.
+#
+#     The byte 0xF0 (called EXT in ISCII-91) followed by any byte in
+#     the range 0xA1-0xEE constitutes a two-byte code point which can
+#     be used to represent additional characters for Vedic (or other
+#     extensions); 0xF0 followed by any other byte value constitutes
+#     malformed text. Mac OS Gujarati supports this mechanism, but
+#     does not currently map any of these two-byte code points to
+#     anything.
+#
+# 2. Mac OS Gujarati additions
+#
+#   Mac OS Gujarati adds characters using the code points
+#   0x80-0x8A and 0x90.
+#
+# 3. Unused code points
+#
+#   The following code points are currently unused, and are not shown
+#   here: 0x8B-0x8F, 0x91-0xA0, 0xAB, 0xAF, 0xC7, 0xCE, 0xD0, 0xD3,
+#   0xE0, 0xE4, 0xEB-0xEF, 0xFB-0xFF. In addition, 0xF0 is not shown
+#   here, but it has a special function as described above.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+# 1. Mapping the byte pairs
+#
+#   If one of the following byte values is encountered when mapping
+#   Mac OS Gujarati text - 0xA1, 0xAA, 0xDF, or 0xE8 - then the next
+#   byte (if there is one) should be examined. If the next byte is
+#   0xE9 - or also 0xE8, if the first byte was 0xE8 - then the byte
+#   pair should be mapped using the first section of the mapping
+#   table below. Otherwise, each byte should be mapped using the
+#   second section of the mapping table below.
+#
+#   - The Unicode Standard, Version 2.0, specifies how explicit
+#     halant and soft halant should be represented in Unicode;
+#     these mappings are used below.
+#
+#   If the byte value 0xF0 is encountered when mapping Mac OS
+#   Gujarati text, then the next byte should be examined. If there
+#   is no next byte (e.g. 0xF0 at end of buffer), the mapping
+#   process should indicate incomplete character. If there is a next
+#   byte but it is not in the range 0xA1-0xEE, the mapping process
+#   should indicate malformed text. Otherwise, the mapping process
+#   should treat the byte pair as a valid two-byte code point with no
+#   mapping (e.g. map it to QUESTION MARK, REPLACEMENT CHARACTER,
+#   etc.).
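+#
+#   Worked examples (an illustrative sketch, not part of Apple's
+#   original notes), applying the rules above to the table below:
+#
+#     0xB3 0xE8 0xD6       -> 0x0A95 0x0ACD 0x0AB7
+#                             (KA, halant, SSA: single-byte mappings)
+#     0xB3 0xE8 0xE8 0xD6  -> 0x0A95 0x0ACD 0x200C 0x0AB7
+#                             (0xE8+0xE8 taken as a pair: explicit halant)
+#     0xB3 0xE8 0xE9 0xD6  -> 0x0A95 0x0ACD 0x200D 0x0AB7
+#                             (0xE8+0xE9 taken as a pair: soft halant)
+#     0xA1 0xE9            -> 0x0AD0 (pair: GUJARATI OM)
+#     0xA1 0xB3            -> 0x0A81 0x0A95 (no pair; mapped separately)
+#     0xF0 0xA1            -> valid two-byte code point, no mapping
+#     0xF0 0x41            -> malformed text
+#     0xF0 (end of buffer) -> incomplete character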
+#
+# 2. Mapping the invisible consonant
+#
+#   It has been suggested that INV in ISCII-91 should map to ZERO
+#   WIDTH NON-JOINER in Unicode. However, this causes problems with
+#   roundtrip fidelity: The ISCII-91 sequences 0xE8+0xE8 and 0xE8+0xD9
+#   would map to the same sequence of Unicode characters. We have
+#   instead mapped INV to LEFT-TO-RIGHT MARK, which avoids these
+#   problems.
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+##################
+
+# Section 1: Map the following byte pairs as indicated:
+# (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER)
+# (Also see note about 0xF0 in comments above)
+
+0xA1+0xE9	0x0AD0	# GUJARATI OM
+0xAA+0xE9	0x0AE0	# GUJARATI LETTER VOCALIC RR
+0xDF+0xE9	0x0AC4	# GUJARATI VOWEL SIGN VOCALIC RR
+0xE8+0xE8	0x0ACD+0x200C	# GUJARATI SIGN VIRAMA + ZWNJ # explicit halant
+0xE8+0xE9	0x0ACD+0x200D	# GUJARATI SIGN VIRAMA + ZWJ # soft halant
+
+# Section 2: Map the remaining bytes as follows:
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00D7	# MULTIPLICATION SIGN
+0x81	0x2212	# MINUS SIGN
+0x82	0x2013	# EN DASH
+0x83	0x2014	# EM DASH
+0x84	0x2018	# LEFT SINGLE QUOTATION MARK
+0x85	0x2019	# RIGHT SINGLE QUOTATION MARK
+0x86	0x2026	# HORIZONTAL ELLIPSIS
+0x87	0x2022	# BULLET
+0x88	0x00A9	# COPYRIGHT SIGN
+0x89	0x00AE	# REGISTERED SIGN
+0x8A	0x2122	# TRADE MARK SIGN
+#
+0x90	0x0965	# DEVANAGARI DOUBLE DANDA
+#
+0xA1	0x0A81	# GUJARATI SIGN CANDRABINDU
+0xA2	0x0A82	# GUJARATI SIGN ANUSVARA
+0xA3	0x0A83	# GUJARATI SIGN VISARGA
+0xA4	0x0A85	# GUJARATI LETTER A
+0xA5	0x0A86	# GUJARATI LETTER AA
+0xA6	0x0A87	# GUJARATI LETTER I
+0xA7	0x0A88	# GUJARATI LETTER II
+0xA8	0x0A89	# GUJARATI LETTER U
+0xA9	0x0A8A	# GUJARATI LETTER UU
+0xAA	0x0A8B	# GUJARATI LETTER VOCALIC R
+#
+0xAC	0x0A8F	# GUJARATI LETTER E
+0xAD	0x0A90	# GUJARATI LETTER AI
+0xAE	0x0A8D	# GUJARATI VOWEL CANDRA E
+#
+0xB0	0x0A93	# GUJARATI LETTER O
+0xB1	0x0A94	# GUJARATI LETTER AU
+0xB2	0x0A91	# GUJARATI VOWEL CANDRA O
+0xB3	0x0A95	# GUJARATI LETTER KA
+0xB4	0x0A96	# GUJARATI LETTER KHA
+0xB5	0x0A97	# GUJARATI LETTER GA
+0xB6	0x0A98	# GUJARATI LETTER GHA
+0xB7	0x0A99	# GUJARATI LETTER NGA
+0xB8	0x0A9A	# GUJARATI LETTER CA
+0xB9	0x0A9B	# GUJARATI LETTER CHA
+0xBA	0x0A9C	# GUJARATI LETTER JA
+0xBB	0x0A9D	# GUJARATI LETTER JHA
+0xBC	0x0A9E	# GUJARATI LETTER NYA
+0xBD	0x0A9F	# GUJARATI LETTER TTA
+0xBE	0x0AA0	# GUJARATI LETTER TTHA
+0xBF	0x0AA1	# GUJARATI LETTER DDA
+0xC0	0x0AA2	# GUJARATI LETTER DDHA
+0xC1	0x0AA3	# GUJARATI LETTER NNA
+0xC2	0x0AA4	# GUJARATI LETTER TA
+0xC3	0x0AA5	# GUJARATI LETTER THA
+0xC4	0x0AA6	# GUJARATI LETTER DA
+0xC5	0x0AA7	# GUJARATI LETTER DHA
+0xC6	0x0AA8	# GUJARATI LETTER NA
+#
+0xC8	0x0AAA	# GUJARATI LETTER PA
+0xC9	0x0AAB	# GUJARATI LETTER PHA
+0xCA	0x0AAC	# GUJARATI LETTER BA
+0xCB	0x0AAD	# GUJARATI LETTER BHA
+0xCC	0x0AAE	# GUJARATI LETTER MA
+0xCD	0x0AAF	# GUJARATI LETTER YA
+#
+0xCF	0x0AB0	# GUJARATI LETTER RA
+#
+0xD1	0x0AB2	# GUJARATI LETTER LA
+0xD2	0x0AB3	# GUJARATI LETTER LLA
+#
+0xD4	0x0AB5	# GUJARATI LETTER VA
+0xD5	0x0AB6	# GUJARATI LETTER SHA
+0xD6	0x0AB7	# GUJARATI LETTER SSA
+0xD7	0x0AB8	# GUJARATI LETTER SA
+0xD8	0x0AB9	# GUJARATI LETTER HA
+0xD9	0x200E	# LEFT-TO-RIGHT MARK # invisible consonant
+0xDA	0x0ABE	# GUJARATI VOWEL SIGN AA
+0xDB	0x0ABF	# GUJARATI VOWEL SIGN I
+0xDC	0x0AC0	# GUJARATI VOWEL SIGN II
+0xDD	0x0AC1	# GUJARATI VOWEL SIGN U
+0xDE	0x0AC2	# GUJARATI VOWEL SIGN UU
+0xDF	0x0AC3	# GUJARATI VOWEL SIGN VOCALIC R
+#
+0xE1	0x0AC7	# GUJARATI VOWEL SIGN E
+0xE2	0x0AC8	# GUJARATI VOWEL SIGN AI
+0xE3	0x0AC5	# GUJARATI VOWEL SIGN CANDRA E
+#
+0xE5	0x0ACB	# GUJARATI VOWEL SIGN O
+0xE6	0x0ACC	# GUJARATI VOWEL SIGN AU
+0xE7	0x0AC9	# GUJARATI VOWEL SIGN CANDRA O
+0xE8	0x0ACD	# GUJARATI SIGN VIRAMA # halant
+0xE9	0x0ABC	# GUJARATI SIGN NUKTA
+0xEA	0x0964	# DEVANAGARI DANDA
+#
+0xF1	0x0AE6	# GUJARATI DIGIT ZERO
+0xF2	0x0AE7	# GUJARATI DIGIT ONE
+0xF3	0x0AE8	# GUJARATI DIGIT TWO
+0xF4	0x0AE9	# GUJARATI DIGIT THREE
+0xF5	0x0AEA	# GUJARATI DIGIT FOUR
+0xF6	0x0AEB	# GUJARATI DIGIT FIVE
+0xF7	0x0AEC	# GUJARATI DIGIT SIX
+0xF8	0x0AED	# GUJARATI DIGIT SEVEN
+0xF9	0x0AEE	# GUJARATI DIGIT EIGHT
+0xFA	0x0AEF	# GUJARATI DIGIT NINE
--- a/charmap/GURMUKHI.TXT
+++ b/charmap/GURMUKHI.TXT
@ -0,0 +1,441 @@
+#=======================================================================
+#   File name:  GURMUKHI.TXT
+#
+#   Contents:   Map (external version) from Mac OS Gurmukhi
+#               encoding to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1997-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Change mappings for 0x91, 0xD5 based on
+#                           new decomposition rules. Update URLs,
+#                           notes. Matches internal utom<b2>.
+#       b02  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b1>, ufrm<b1>, and Text
+#                           Encoding Converter version 1.5.
+#       n02  1998-Feb-05    First version; matches internal utom<n5>,
+#                           ufrm<n6>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Gurmukhi code or code sequence
+#       (in hex as 0xNN or 0xNN+0xNN)
+#     Column #2 is the corresponding Unicode or Unicode sequence
+#       (in hex as 0xNNNN or 0xNNNN+0xNNNN).
+#     Column #3 is a comment containing the Unicode name or sequence
+#       of names. In some cases an additional comment follows the
+#       Unicode name(s).
+#
+#   The entries are in two sections. The first section is for pairs of
+#   Mac OS Gurmukhi code points that must be mapped in a special way.
+#   The second section maps individual code points.
+#
+#   Within each section, the entries are in Mac OS Gurmukhi code order.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Gurmukhi character set uses the standard control characters
+#   at 0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Gurmukhi:
+# -------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   Mac OS Gurmukhi is based on IS 13194:1991 (ISCII-91), with the
+#   addition of several punctuation and symbol characters. However,
+#   Mac OS Gurmukhi does not support the ATR (attribute) mechanism of
+#   ISCII-91.
+#
+# 1. ISCII-91 features in Mac OS Gurmukhi include:
+#
+#  a) Explicit halant and soft halant
+#
+#     A double halant (0xE8 + 0xE8) constitutes an "explicit halant",
+#     which will always appear as a halant instead of causing formation
+#     of a ligature or half-form consonant.
+#
+#     Halant followed by nukta (0xE8 + 0xE9) constitutes a "soft
+#     halant", which prevents formation of a ligature and instead
+#     retains the half-form of the first consonant.
+#
+#  b) Invisible consonant
+#
+#     The byte 0xD9 (called INV in ISCII-91) is an invisible consonant:
+#     It behaves like a consonant but has no visible appearance. It is
+#     intended to be used (often in combination with halant) to display
+#     dependent forms in isolation, such as the RA forms or consonant
+#     half-forms.
+#
+#  c) Extensions for Vedic, etc.
+#
+#     The byte 0xF0 (called EXT in ISCII-91) followed by any byte in
+#     the range 0xA1-0xEE constitutes a two-byte code point which can
+#     be used to represent additional characters for Vedic (or other
+#     extensions); 0xF0 followed by any other byte value constitutes
+#     malformed text. Mac OS Gurmukhi supports this mechanism, but
+#     does not currently map any of these two-byte code points to
+#     anything.
+#
+# 2. Mac OS Gurmukhi additions
+#
+#   Mac OS Gurmukhi adds characters using the code points
+#   0x80-0x8A and 0x90-0x94 (the latter are some Gurmukhi additions).
+#
+# 3. Unused code points
+#
+#   The following code points are currently unused, and are not shown
+#   here: 0x8B-0x8F, 0x95-0xA1, 0xA3, 0xAA-0xAB, 0xAE-0xAF, 0xB2,
+#   0xC7, 0xCE, 0xD0, 0xD2-0xD3, 0xD6, 0xDF-0xE0, 0xE3-0xE4, 0xE7,
+#   0xEB-0xEF, 0xFB-0xFF. In addition, 0xF0 is not shown here, but it
+#   has a special function as described above.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+# 1. Mapping the byte pairs
+#
+#   If the byte value 0xE8 is encountered when mapping Mac OS
+#   Gurmukhi text, then the next byte (if there is one) should be
+#   examined. If the next byte is 0xE8 or 0xE9, then the byte pair
+#   should be mapped using the first section of the mapping table
+#   below. Otherwise, each byte should be mapped using the second
+#   section of the mapping table below.
+#
+#   - The Unicode Standard, Version 2.0, specifies how explicit
+#     halant and soft halant should be represented in Unicode;
+#     these mappings are used below.
+#
+#   If the byte value 0xF0 is encountered when mapping Mac OS
+#   Gurmukhi text, then the next byte should be examined. If there
+#   is no next byte (e.g. 0xF0 at end of buffer), the mapping
+#   process should indicate incomplete character. If there is a next
+#   byte but it is not in the range 0xA1-0xEE, the mapping process
+#   should indicate malformed text. Otherwise, the mapping process
+#   should treat the byte pair as a valid two-byte code point with no
+#   mapping (e.g. map it to QUESTION MARK, REPLACEMENT CHARACTER,
+#   etc.).
+#
+# 2. Mapping the invisible consonant
+#
+#   It has been suggested that INV in ISCII-91 should map to ZERO
+#   WIDTH NON-JOINER in Unicode. However, this causes problems with
+#   roundtrip fidelity: The ISCII-91 sequences 0xE8+0xE8 and 0xE8+0xD9
+#   would map to the same sequence of Unicode characters. We have
+#   instead mapped INV to LEFT-TO-RIGHT MARK, which avoids these
+#   problems.
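+#
+#   For illustration (a sketch, not part of Apple's original notes),
+#   if INV were mapped to ZERO WIDTH NON-JOINER (0x200C):
+#
+#     0xE8+0xE8 -> 0x0A4D+0x200C   (explicit halant)
+#     0xE8+0xD9 -> 0x0A4D+0x200C   (halant + INV; same result)
+#
+#   With INV mapped to LEFT-TO-RIGHT MARK (0x200E), as in the table
+#   below, the second sequence remains distinct:
+#
+#     0xE8+0xD9 -> 0x0A4D+0x200E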
+#
+# 3. Mappings using corporate characters
+#
+#   Mapping the GURMUKHI LETTER SHA 0xD5 presents an interesting
+#   problem. At first glance, we could map it to the single Unicode
+#   character 0x0A36.
+#
+#   However, our goal is that the mappings provided here should also
+#   be able to generate the mappings to maximally decomposed Unicode
+#   by simple recursive substitution of the canonical decompositions
+#   in the Unicode database. We want mapping tables derived this way
+#   to retain full roundtrip fidelity.
+#
+#   Since the canonical decomposition of 0x0A36 is 0x0A38+0x0A3C,
+#   the decomposition mapping for 0xD5 would be identical with the
+#   decomposition mapping for 0xD7+0xE9, and roundtrip fidelity would
+#   be lost.
+#
+#   We solve this problem by using a grouping hint (one of the set of
+#   transcoding hints defined by Apple).
+#
+#   Apple has defined a block of 32 corporate characters as "transcoding
+#   hints." These are used in combination with standard Unicode characters
+#   to force them to be treated in a special way for mapping to other
+#   encodings; they have no other effect. Sixteen of these transcoding
+#   hints are "grouping hints" - they indicate that the next 2-4 Unicode
+#   characters should be treated as a single entity for transcoding. The
+#   other sixteen transcoding hints are "variant tags" - they are like
+#   combining characters, and can follow a standard Unicode character (or
+#   a sequence consisting of a base character and other combining
+#   characters) to cause it to be treated in a special way for
+#   transcoding. These always terminate a combining-character sequence.
+#
+#   The transcoding hint used in this mapping table is:
+#     0xF860 group next 2 characters
+#
+#   Then we can map 0xD5 as follows:
+#     0xD5 -> 0xF860+0x0A38+0x0A3C
+#
+#   We could also have used a variant tag such as 0xF87F and mapped it
+#   this way:
+#     0xD5 -> 0x0A36+0xF87F
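+#
+#   To illustrate (a sketch added for clarity, using the canonical
+#   decomposition cited above), compare the fully decomposed results:
+#
+#     Without a hint:
+#       0xD5      -> 0x0A36, which decomposes to 0x0A38+0x0A3C
+#       0xD7+0xE9 -> 0x0A38+0x0A3C   (same result; roundtrip is lost)
+#
+#     With the grouping hint:
+#       0xD5      -> 0xF860+0x0A38+0x0A3C
+#       0xD7+0xE9 -> 0x0A38+0x0A3C   (distinct; roundtrip is kept)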
+#
+# 4. Additional loose mappings from Unicode
+#
+#   These are not preserved in roundtrip mappings.
+#
+#   0A59 -> 0xB4+0xE9   # GURMUKHI LETTER KHHA
+#   0A5A -> 0xB5+0xE9   # GURMUKHI LETTER GHHA
+#   0A5B -> 0xBA+0xE9   # GURMUKHI LETTER ZA
+#   0A5E -> 0xC9+0xE9   # GURMUKHI LETTER FA
+#
+#   0A70 -> 0xA2    # GURMUKHI TIPPI
+#
+#   Loose mappings from Unicode should also map U+0A71 (GURMUKHI ADDAK)
+#   followed by any Gurmukhi consonant to the equivalent ISCII-91
+#   consonant plus halant plus the consonant again. For example:
+#
+#   0A71+0A15 -> 0xB3+0xE8+0xB3
+#   0A71+0A16 -> 0xB4+0xE8+0xB4
+#   ...
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version b02 to version b03/c01:
+#
+#   - Change mapping of 0x91 from 0xF860+0x0A21+0x0A3C to 0x0A5C GURMUKHI
+#     LETTER RRA, now that the canonical decomposition of 0x0A5C to
+#     0x0A21+0x0A3C has been deleted.
+#
+#   - Change mapping of 0xD5 from 0x0A36 GURMUKHI LETTER SHA to
+#     0xF860+0x0A38+0x0A3C, now that a canonical decomposition of 0x0A36
+#     to 0x0A38+0x0A3C has been added.
+#
+##################
+
+# Section 1: Map the following byte pairs as indicated:
+# (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER)
+# (Also see note about 0xF0 in comments above)
+
+0xE8+0xE8	0x0A4D+0x200C	# GURMUKHI SIGN VIRAMA + ZWNJ # explicit halant
+0xE8+0xE9	0x0A4D+0x200D	# GURMUKHI SIGN VIRAMA + ZWJ # soft halant
+
+# Section 2: Map the remaining bytes as follows:
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00D7	# MULTIPLICATION SIGN
+0x81	0x2212	# MINUS SIGN
+0x82	0x2013	# EN DASH
+0x83	0x2014	# EM DASH
+0x84	0x2018	# LEFT SINGLE QUOTATION MARK
+0x85	0x2019	# RIGHT SINGLE QUOTATION MARK
+0x86	0x2026	# HORIZONTAL ELLIPSIS
+0x87	0x2022	# BULLET
+0x88	0x00A9	# COPYRIGHT SIGN
+0x89	0x00AE	# REGISTERED SIGN
+0x8A	0x2122	# TRADE MARK SIGN
+#
+0x90	0x0A71	# GURMUKHI ADDAK
+0x91	0x0A5C	# GURMUKHI LETTER RRA
+0x92	0x0A73	# GURMUKHI URA
+0x93	0x0A72	# GURMUKHI IRI
+0x94	0x0A74	# GURMUKHI EK ONKAR
+#
+0xA2	0x0A02	# GURMUKHI SIGN BINDI
+#
+0xA4	0x0A05	# GURMUKHI LETTER A
+0xA5	0x0A06	# GURMUKHI LETTER AA
+0xA6	0x0A07	# GURMUKHI LETTER I
+0xA7	0x0A08	# GURMUKHI LETTER II
+0xA8	0x0A09	# GURMUKHI LETTER U
+0xA9	0x0A0A	# GURMUKHI LETTER UU
+#
+0xAC	0x0A0F	# GURMUKHI LETTER EE
+0xAD	0x0A10	# GURMUKHI LETTER AI
+#
+0xB0	0x0A13	# GURMUKHI LETTER OO
+0xB1	0x0A14	# GURMUKHI LETTER AU
+#
+0xB3	0x0A15	# GURMUKHI LETTER KA
+0xB4	0x0A16	# GURMUKHI LETTER KHA
+0xB5	0x0A17	# GURMUKHI LETTER GA
+0xB6	0x0A18	# GURMUKHI LETTER GHA
+0xB7	0x0A19	# GURMUKHI LETTER NGA
+0xB8	0x0A1A	# GURMUKHI LETTER CA
+0xB9	0x0A1B	# GURMUKHI LETTER CHA
+0xBA	0x0A1C	# GURMUKHI LETTER JA
+0xBB	0x0A1D	# GURMUKHI LETTER JHA
+0xBC	0x0A1E	# GURMUKHI LETTER NYA
+0xBD	0x0A1F	# GURMUKHI LETTER TTA
+0xBE	0x0A20	# GURMUKHI LETTER TTHA
+0xBF	0x0A21	# GURMUKHI LETTER DDA
+0xC0	0x0A22	# GURMUKHI LETTER DDHA
+0xC1	0x0A23	# GURMUKHI LETTER NNA
+0xC2	0x0A24	# GURMUKHI LETTER TA
+0xC3	0x0A25	# GURMUKHI LETTER THA
+0xC4	0x0A26	# GURMUKHI LETTER DA
+0xC5	0x0A27	# GURMUKHI LETTER DHA
+0xC6	0x0A28	# GURMUKHI LETTER NA
+#
+0xC8	0x0A2A	# GURMUKHI LETTER PA
+0xC9	0x0A2B	# GURMUKHI LETTER PHA
+0xCA	0x0A2C	# GURMUKHI LETTER BA
+0xCB	0x0A2D	# GURMUKHI LETTER BHA
+0xCC	0x0A2E	# GURMUKHI LETTER MA
+0xCD	0x0A2F	# GURMUKHI LETTER YA
+#
+0xCF	0x0A30	# GURMUKHI LETTER RA
+#
+0xD1	0x0A32	# GURMUKHI LETTER LA
+#
+0xD4	0x0A35	# GURMUKHI LETTER VA
+0xD5	0xF860+0x0A38+0x0A3C	# GURMUKHI LETTER SHA
+#
+0xD7	0x0A38	# GURMUKHI LETTER SA
+0xD8	0x0A39	# GURMUKHI LETTER HA
+0xD9	0x200E	# LEFT-TO-RIGHT MARK # invisible consonant
+0xDA	0x0A3E	# GURMUKHI VOWEL SIGN AA
+0xDB	0x0A3F	# GURMUKHI VOWEL SIGN I
+0xDC	0x0A40	# GURMUKHI VOWEL SIGN II
+0xDD	0x0A41	# GURMUKHI VOWEL SIGN U
+0xDE	0x0A42	# GURMUKHI VOWEL SIGN UU
+#
+0xE1	0x0A47	# GURMUKHI VOWEL SIGN EE
+0xE2	0x0A48	# GURMUKHI VOWEL SIGN AI
+#
+0xE5	0x0A4B	# GURMUKHI VOWEL SIGN OO
+0xE6	0x0A4C	# GURMUKHI VOWEL SIGN AU
+#
+0xE8	0x0A4D	# GURMUKHI SIGN VIRAMA # halant
+0xE9	0x0A3C	# GURMUKHI SIGN NUKTA
+0xEA	0x0964	# DEVANAGARI DANDA
+#
+0xF1	0x0A66	# GURMUKHI DIGIT ZERO
+0xF2	0x0A67	# GURMUKHI DIGIT ONE
+0xF3	0x0A68	# GURMUKHI DIGIT TWO
+0xF4	0x0A69	# GURMUKHI DIGIT THREE
+0xF5	0x0A6A	# GURMUKHI DIGIT FOUR
+0xF6	0x0A6B	# GURMUKHI DIGIT FIVE
+0xF7	0x0A6C	# GURMUKHI DIGIT SIX
+0xF8	0x0A6D	# GURMUKHI DIGIT SEVEN
+0xF9	0x0A6E	# GURMUKHI DIGIT EIGHT
+0xFA	0x0A6F	# GURMUKHI DIGIT NINE
--- a/charmap/HEBREW.TXT
+++ b/charmap/HEBREW.TXT
@ -0,0 +1,601 @@
+#=======================================================================
+#   File name:  HEBREW.TXT
+#
+#   Contents:   Map (external version) from Mac OS Hebrew
+#               character set to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments; add section on
+#                           roundtrip considerations. Matches internal
+#                           xml <c1.4> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Don't require left-right context for digits
+#                           0x30-0x39. Change mapping of 0x81 to use
+#                           decomposition. Reverse the mappings of 0xA8,
+#                           0xA9. Update URLs, notes. Matches internal
+#                           utom<b7>.
+#       b02  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b1>, ufrm<b1>, and Text
+#                           Encoding Converter version 1.5.
+#       n03  1998-Feb-05    Show required Unicode character
+#                           directionality in a different way. Update
+#                           mappings for 0xC0 and 0xDE to use
+#                           transcoding hints; matches internal utom<n6>,
+#                           ufrm<n20>, and Text Encoding Converter
+#                           version 1.3. Rewrite header comments.
+#       n01  1995-Nov-15    First version. Matches internal ufrm<n8>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Hebrew code (in hex as 0xNN).
+#     Column #2 is the corresponding Unicode or Unicode sequence (in
+#       hex as 0xNNNN, 0xNNNN+0xNNNN, etc.). Sequences of up to 3
+#       Unicode characters are used here. A single Unicode character
+#       may be preceded by a tag indicating required directionality
+#       (i.e. <LR>+0xNNNN or <RL>+0xNNNN).
+#     Column #3 is a comment containing the Unicode name.
+#
+#   The entries are in Mac OS Hebrew code order.
+#
+#   Some of these mappings require the use of corporate characters.
+#   See the file "CORPCHAR.TXT" and notes below.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Hebrew character set uses the standard control characters at
+#   0x00-0x1F and 0x7F.
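The column layout described above can be read with a few lines of code. The following is a minimal sketch, not Apple's tooling: the names `Mapping` and `parse_line` are hypothetical, and it assumes input taken from the mapping file itself (without the leading `+` that this diff view adds to every line).

```python
from typing import NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    mac_code: int                  # column 1, e.g. 0x2B
    direction: Optional[str]       # "LR", "RL", or None when no tag is present
    unicode_seq: Tuple[int, ...]   # column 2 as code points, in order
    comment: str                   # column 3 without the leading '#'


def parse_line(line: str) -> Optional[Mapping]:
    """Parse one table line; return None for blank lines and comment lines."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.lstrip().startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 2:
        return None
    mac_code = int(fields[0], 16)
    direction = None
    codepoints = []
    # Column 2 is a '+'-joined sequence, optionally led by a direction tag.
    for part in fields[1].split("+"):
        if part in ("<LR>", "<RL>"):
            direction = part[1:-1]
        else:
            codepoints.append(int(part, 16))
    comment = fields[2].lstrip("# ").strip() if len(fields) > 2 else ""
    return Mapping(mac_code, direction, tuple(codepoints), comment)


if __name__ == "__main__":
    print(parse_line("0x2B\t<LR>+0x002B\t# PLUS SIGN, left-right"))
    print(parse_line("0xC0\t0xF86A+0x05DC+0x05B9\t# Hebrew ligature lamed holam"))
```

Keeping the direction tag separate from the code point sequence lets a converter decide later whether a direction override is actually needed.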
+#
+# Notes on Mac OS Hebrew:
+# -----------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   1. General
+#
+#   The Mac OS Hebrew character set supports the Hebrew and Yiddish
+#   languages. It incorporates the Hebrew letter repertoire of
+#   ISO 8859-8, and uses the same code points for them, 0xE0-0xFA.
+#   It also incorporates the ASCII character set. In addition, the
+#   Mac OS Hebrew character set includes the following:
+#
+#   - Hebrew points (nikud marks) at 0xC6, 0xCB-0xCF and 0xD8-0xDF.
+#     These are non-spacing combining marks. Note that the RAFE point
+#     at 0xD8 is not displayed correctly in some fonts, and cannot be
+#     typed using the keyboard layouts in the current Hebrew localized
+#     systems. Also note: The character given in Unicode as QAMATS
+#     (U+05B8) actually refers to two different sounds, depending on
+#     context. For example, when ALEF is followed by QAMATS, the QAMATS
+#     can actually refer to two different sounds depending on the
+#     following letters. The Mac OS Hebrew character set separately
+#     encodes these two sounds for the same graphic shape, as "qamats"
+#     (0xCB) and "qamats qatan" (0xDE). The "qamats" character is more
+#     common, so it is mapped to the Unicode QAMATS; "qamats qatan" can
+#     only be used with a limited number of characters, and it is
+#     mapped using a corporate-zone variant tag (see below).
+#
+#   - Various Hebrew ligatures at 0x81, 0xC0, 0xC7, 0xC8, 0xD6, and
+#     0xD7. Also note that the Yiddish YOD YOD PATAH ligature at 0x81
+#     is missing in some fonts.
+#
+#   - The NEW SHEQEL SIGN at 0xA6.
+#
+#   - Latin characters with diacritics at 0x80 and 0x82-0x9F. However,
+#     most of these cannot be typed using the keyboard layouts in the
+#     Hebrew localized systems.
+#
+#   - Right-left versions of certain ASCII punctuation, symbols and
+#     digits: 0xA0-0xA5, 0xA7-0xBF, 0xFB-0xFF. See below.
+#
+#   - Miscellaneous additional punctuation at 0xC1, 0xC9, 0xCA, and
+#     0xD0-0xD5. There is a variant of the Hebrew encoding in which
+#     the LEFT SINGLE QUOTATION MARK at 0xD4 is replaced by FIGURE
+#     SPACE. The glyphs for some of the other punctuation characters
+#     are missing in some fonts.
+#
+#   - Four obsolete characters at 0xC2-0xC5 known as canorals (not to
+#     be confused with cantillation marks!). These were used for
+#     manual positioning of nikud marks before System 7.1 (at which
+#     point nikud positioning became automatic with WorldScript).
+#
+#   2. Directional characters and roundtrip fidelity
+#
+#   The Mac OS Hebrew character set was developed around 1987. At that
+#   time the bidirectional line layout algorithm used in the Mac OS
+#   Hebrew system was fairly simple; it used only a few direction
+#   classes (instead of the 19 now used in the Unicode bidirectional
+#   algorithm). In order to permit users to handle some tricky layout
+#   problems, certain punctuation, symbol, and digit characters have
+#   duplicate code points, one with a left-right direction attribute and
+#   the other with a right-left direction attribute.
+#
+#   For example, plus sign is encoded at 0x2B with a left-right
+#   attribute, and at 0xAB with a right-left attribute. However, there
+#   is only one PLUS SIGN character in Unicode. This leads to some
+#   interesting problems when mapping between Mac OS Hebrew and Unicode;
+#   see below.
+#
+#   A related problem is that even when a particular character is
+#   encoded only once in Mac OS Hebrew, it may have a different
+#   direction attribute than the corresponding Unicode character.
+#
+#   For example, the Mac OS Hebrew character at 0xC9 is HORIZONTAL
+#   ELLIPSIS with strong right-left direction. However, the Unicode
+#   character HORIZONTAL ELLIPSIS has direction class neutral.
+#
+#   3. Font variants
+#
+#   The table in this file gives the Unicode mappings for the standard
+#   Mac OS Hebrew encoding. This encoding is supported by many of the
+#   Apple fonts (including all of the fonts in the Hebrew Language Kit),
+#   and is the encoding supported by the text processing utilities.
+#   However, some TrueType fonts provided with the localized Hebrew
+#   system implement a slightly different encoding; the difference is
+#   only in one code point, 0xD4. For the standard variant, this is:
+#     0xD4 -> <RL>+0x2018  LEFT SINGLE QUOTATION MARK, right-left
+#
+#   The TrueType variant is used by the following TrueType fonts from
+#   the localized system: Caesarea, Carmel Book, Gilboa, Ramat Sharon,
+#   and Sinai Book. For these, 0xD4 is as follows:
+#     0xD4 -> <RL>+0x2007  FIGURE SPACE, right-left
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+#   1. Matching the direction of Mac OS Hebrew characters
+#
+#   When Mac OS Hebrew encodes a character twice but with different
+#   direction attributes for the two code points - as in the case of
+#   plus sign mentioned above - we need a way to map both Mac OS Hebrew
+#   code points to Unicode and back again without loss of information.
+#   With the plus sign, for example, mapping one of the Mac OS Hebrew
+#   characters to a code in the Unicode corporate use zone is
+#   undesirable, since both of the plus sign characters are likely to
+#   be used in text that is interchanged.
+#
+#   The problem is solved with the use of direction override characters
+#   and direction-dependent mappings. When mapping from Mac OS Hebrew
+#   to Unicode, we use direction overrides as necessary to force the
+#   direction of the resulting Unicode characters.
+#
+#   The required direction is indicated by a direction tag in the
+#   mappings. A tag of <LR> means the corresponding Unicode character
+#   must have a strong left-right context, and a tag of <RL> indicates
+#   a right-left context.
+#
+#   For example, the mapping of 0x2B is given as <LR>+0x002B; the
+#   mapping of 0xAB is given as <RL>+0x002B. If we map an isolated
+#   instance of 0x2B to Unicode, it should be mapped as follows (LRO
+#   indicates LEFT-RIGHT OVERRIDE, PDF indicates POP DIRECTION
+#   FORMATTING):
+#
+#     0x2B ->  0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
+#
+#   When mapping several characters in a row that require direction
+#   forcing, the overrides need only be used at the beginning and end.
+#   For example:
+#
+#     0x24 0x20 0x28 0x29 -> 0x202D 0x0024 0x0020 0x0028 0x0029 0x202C
+#
+#   If neutral characters that require direction forcing are already
+#   between strong-direction characters with matching directionality,
+#   then direction overrides need not be used. Direction overrides are
+#   always needed to map the right-left digits at 0xB0-0xB9.
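The forward mapping with direction overrides can be sketched as follows. This is an illustration under stated assumptions, not the Text Encoding Converter's implementation: `SAMPLE_TABLE` holds only a handful of entries copied from the table in this file, `to_unicode` is a hypothetical name, and the sketch always wraps a run of direction-tagged characters in an override. It does not implement the optimization of omitting overrides when the surrounding strong characters already supply the right direction.

```python
LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

# Mac OS Hebrew code -> (direction tag or None, Unicode character).
# Only a small sample of the full table, for illustration.
SAMPLE_TABLE = {
    0x20: ("LR", "\u0020"),
    0x24: ("LR", "\u0024"),
    0x28: ("LR", "\u0028"),
    0x29: ("LR", "\u0029"),
    0x2B: ("LR", "\u002B"),
    0xAB: ("RL", "\u002B"),
    0xE0: (None, "\u05D0"),
}


def to_unicode(data: bytes, table=SAMPLE_TABLE) -> str:
    """Map Mac OS Hebrew bytes to Unicode, one override pair per tagged run."""
    out = []
    current = None  # direction of the override that is currently open, if any
    for byte in data:
        direction, text = table[byte]
        if direction != current:
            if current is not None:
                out.append(PDF)  # close the previous run
            if direction is not None:
                out.append(LRO if direction == "LR" else RLO)
            current = direction
        out.append(text)
    if current is not None:
        out.append(PDF)
    return "".join(out)


if __name__ == "__main__":
    result = to_unicode(bytes([0x24, 0x20, 0x28, 0x29]))
    print(" ".join(f"0x{ord(ch):04X}" for ch in result))
```

For the four-byte example given above, the sketch emits a single LRO at the start and a single PDF at the end, matching the sequence shown in the text.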
+#
+#   When mapping from Unicode to Mac OS Hebrew, the Unicode
+#   bidirectional algorithm should be used to determine resolved
+#   direction of the Unicode characters. The mapping from Unicode to
+#   Mac OS Hebrew can then be disambiguated by the use of the resolved
+#   direction:
+#
+#     Unicode 0x002B -> Mac OS Hebrew 0x2B (if L) or 0xAB (if R)
+#
+#   However, this also means the direction override characters should
+#   be discarded when mapping from Unicode to Mac OS Hebrew (after
+#   they have been used to determine resolved direction), since the
+#   direction override information is carried by the code point itself.
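The reverse direction can be sketched in the same spirit. This is a deliberately simplified illustration: a real converter would run the Unicode bidirectional algorithm to obtain the resolved direction of each character, whereas this sketch only tracks explicit LRO/RLO/PDF characters and otherwise falls back to an assumed default direction. The names `DIRECTIONAL` and `from_unicode` are hypothetical, and the table covers only three characters taken from this file.

```python
LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

# Unicode character -> (Mac OS Hebrew code if left-right, code if right-left).
DIRECTIONAL = {
    "+": (0x2B, 0xAB),
    "$": (0x24, 0xA4),
    "!": (0x21, 0xA1),
}


def from_unicode(text: str, default: str = "L") -> bytes:
    """Pick the Mac OS Hebrew code by direction and drop the override characters.

    Simplification: the direction comes only from explicit overrides, which
    stands in for the resolved direction of the full bidirectional algorithm.
    """
    out = bytearray()
    stack = []  # directions of the overrides that are currently open
    for ch in text:
        if ch == LRO:
            stack.append("L")
        elif ch == RLO:
            stack.append("R")
        elif ch == PDF:
            if stack:
                stack.pop()
        else:
            direction = stack[-1] if stack else default
            ltr_code, rtl_code = DIRECTIONAL[ch]
            out.append(ltr_code if direction == "L" else rtl_code)
    return bytes(out)


if __name__ == "__main__":
    print(from_unicode("\u202E+\u202C").hex())
```

The override characters never reach the output because each Mac OS Hebrew code point already carries its own direction.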
+#
+#   Even when direction overrides are not needed for roundtrip
+#   fidelity, they are sometimes used when mapping Mac OS Hebrew
+#   characters to Unicode in order to achieve similar text layout with
+#   the resulting Unicode text. For example, the single Mac OS Hebrew
+#   ellipsis character has direction class right-left, and there is no
+#   left-right version. However, the Unicode HORIZONTAL ELLIPSIS
+#   character has direction class neutral (which means it may end up
+#   with a resolved direction of left-right if surrounded by left-right
+#   characters). When mapping the Mac OS Hebrew ellipsis to Unicode, it
+#   is surrounded with a direction override to help preserve proper
+#   text layout. The resolved direction is not needed or used when
+#   mapping the Unicode HORIZONTAL ELLIPSIS back to Mac OS Hebrew.
+#
+#   2. Use of corporate-zone Unicodes
+#
+#   The goals in the mappings provided here are:
+#   - Ensure roundtrip mapping from every character in the Mac OS
+#     Hebrew character set to Unicode and back
+#   - Use standard Unicode characters as much as possible, to
+#     maximize interchangeability of the resulting Unicode text.
+#     Whenever possible, avoid having content carried by private-use
+#     characters.
+#
+#   Some of the characters in the Mac OS Hebrew character set do not
+#   correspond to distinct, single Unicode characters. To map these
+#   and satisfy both goals above, we employ various strategies.
+#
+#   a) If possible, use private use characters in combination with
+#   standard Unicode characters to mark variants of the standard
+#   Unicode character.
+#
+#   Apple has defined a block of 32 corporate characters as "transcoding
+#   hints." These are used in combination with standard Unicode characters
+#   to force them to be treated in a special way for mapping to other
+#   encodings; they have no other effect. Sixteen of these transcoding
+#   hints are "grouping hints" - they indicate that the next 2-4 Unicode
+#   characters should be treated as a single entity for transcoding. The
+#   other sixteen transcoding hints are "variant tags" - they are like
+#   combining characters, and can follow a standard Unicode (or a sequence
+#   consisting of a base character and other combining characters) to
+#   cause it to be treated in a special way for transcoding. These always
+#   terminate a combining-character sequence.
+#
+#   Two transcoding hints are used in this mapping table, a grouping
+#   hint and a variant tag:
+#
+#     0xF86A  group next 2 characters, right-left directionality
+#     0xF87F  variant tag
+#
+#   In Mac OS Hebrew, 0xC0 is a ligature for lamed holam. This can also
+#   be represented in Mac OS Hebrew as 0xEC+0xDD, using separate
+#   characters for lamed and holam. The latter sequence is mapped to
+#   Unicode as 0x05DC+0x05B9, i.e. as the sequence HEBREW LETTER LAMED +
+#   HEBREW POINT HOLAM. We want to map the ligature 0xC0 using the same
+#   standard Unicode characters, but for round-trip fidelity we need to
+#   distinguish it from the mapping of the sequence 0xEC+0xDD. Thus for
+#   0xC0 we use a grouping hint, and map as follows:
+#
+#     0xC0 -> 0xF86A+0x05DC+0x05B9
+#
+#   The variant tag is used for "qamats qatan" to mark it as an alternate
+#   for HEBREW POINT QAMATS, as follows:
+#
+#     0xDE -> 0x05B8+0xF87F
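To see why the hints preserve round-trip fidelity, consider the small sketch below. It is only an illustration: the dictionaries contain five entries copied from this table, the function names are hypothetical, and the longest-match strategy is an assumption about how a converter might consume the hinted sequences rather than a description of Apple's implementation.

```python
# Mac OS Hebrew code -> Unicode string (sample entries from this table).
TO_UNICODE = {
    0xC0: "\uF86A\u05DC\u05B9",  # ligature lamed holam, with grouping hint
    0xEC: "\u05DC",              # HEBREW LETTER LAMED
    0xDD: "\u05B9",              # HEBREW POINT HOLAM
    0xCB: "\u05B8",              # HEBREW POINT QAMATS
    0xDE: "\u05B8\uF87F",        # qamats qatan, with variant tag
}
FROM_UNICODE = {seq: code for code, seq in TO_UNICODE.items()}
MAX_LEN = max(len(seq) for seq in FROM_UNICODE)


def mac_to_unicode(data: bytes) -> str:
    return "".join(TO_UNICODE[b] for b in data)


def unicode_to_mac(text: str) -> bytes:
    """Greedy longest match, so hinted sequences win over their plain parts."""
    out = bytearray()
    i = 0
    while i < len(text):
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            code = FROM_UNICODE.get(text[i:i + n])
            if code is not None:
                out.append(code)
                i += n
                break
        else:
            raise ValueError(f"no mapping for U+{ord(text[i]):04X} at offset {i}")
    return bytes(out)


if __name__ == "__main__":
    for sample in (bytes([0xC0]), bytes([0xEC, 0xDD]), bytes([0xDE]), bytes([0xCB])):
        assert unicode_to_mac(mac_to_unicode(sample)) == sample
    print("round trip ok")
```

Without the grouping hint, both 0xC0 and 0xEC+0xDD would produce 0x05DC+0x05B9 and the reverse mapping could not tell them apart.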
+#
+#   b) Otherwise, use private use characters by themselves to map Mac OS
+#   Hebrew characters which have no relationship to any standard Unicode
+#   character.
+#
+#   The following additional corporate zone Unicode characters are used
+#   for this purpose here (to map the obsolete "canorals", see above):
+#
+#     0xF89B  Hebrew canoral 1
+#     0xF89C  Hebrew canoral 2
+#     0xF89D  Hebrew canoral 3
+#     0xF89E  Hebrew canoral 4
+#
+#   3. Roundtrip considerations when mapping to decomposed Unicode
+#
+#   Both Mac OS Hebrew and Unicode provide multiple ways of representing
+#   certain letter-and-point combinations. For example, HEBREW LETTER
+#   VAV WITH HOLAM can be represented in Unicode as the single character
+#   0xFB4B or as the sequence 0x05D5 0x05B9; similarly, it can be
+#   represented in Mac OS Hebrew as 0xC7 or as the sequence 0xE5 0xDD.
+#   This leads to some roundtrip problems. First note that we have the
+#   following mappings without such problems:
+#
+#   Mac   standard                            decomp. of     reverse map
+#   OS    Unicode mapping                     std. mapping   of decomp.
+#   ----  ----------------------------------  -------------  -----------
+#   0xC6  0x05BC  ... POINT DAGESH OR MAPIQ   0x05BC (same)  0xC6
+#   0xE5  0x05D5  ... LETTER VAV              0x05D5 (same)  0xE5
+#   0xDD  0x05B9  ... POINT HOLAM             0x05B9 (same)  0xDD
+#
+#   However, those mappings above cause roundtrip problems for the
+#   following mappings if they are decomposed:
+#
+#   Mac   standard                            decomp. of     reverse map
+#   OS    Unicode mapping                     std. mapping   of decomp.
+#   ----  ----------------------------------  -------------  -----------
+#   0xC7  0xFB4B  ... LETTER VAV WITH HOLAM   0x05D5 0x05B9  0xE5 0xDD
+#   0xC8  0xFB35  ... LETTER VAV WITH DAGESH  0x05D5 0x05BC  0xE5 0xC6
+#
+#   One solution is to use a grouping transcoding hint with the two
+#   decompositions above to mark the decomposed sequence for special
+#   treatment in transcoding. This yields the following mappings to
+#   decomposed Unicode:
+#
+#   Mac                                decomposed
+#   OS                                 Unicode mapping
+#   ----                               --------------------
+#   0xC7                               0xF86A 0x05D5 0x05B9
+#   0xC8                               0xF86A 0x05D5 0x05BC
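The collision and its fix can be reproduced with Python's standard `unicodedata` module. This is a sketch under assumptions: `STANDARD` holds only the five entries discussed above, `to_decomposed` is a hypothetical helper, and prefixing 0xF86A whenever the standard mapping changes under NFD is a shortcut that fits these samples (each decomposes to exactly two characters), not a general rule.

```python
import unicodedata

GROUP_NEXT_2_RL = "\uF86A"  # grouping hint: next 2 characters, right-left

# Mac OS Hebrew code -> standard Unicode mapping (sample entries only).
STANDARD = {
    0xC6: "\u05BC",  # HEBREW POINT DAGESH OR MAPIQ
    0xC7: "\uFB4B",  # HEBREW LETTER VAV WITH HOLAM
    0xC8: "\uFB35",  # HEBREW LETTER VAV WITH DAGESH
    0xDD: "\u05B9",  # HEBREW POINT HOLAM
    0xE5: "\u05D5",  # HEBREW LETTER VAV
}


def to_decomposed(code: int) -> str:
    """Return the decomposed mapping, marking decomposed pairs with the hint."""
    standard = STANDARD[code]
    decomposed = unicodedata.normalize("NFD", standard)
    if decomposed != standard:
        # Mark the pair so it is not confused with the two separate characters.
        return GROUP_NEXT_2_RL + decomposed
    return decomposed


if __name__ == "__main__":
    plain_pair = to_decomposed(0xE5) + to_decomposed(0xDD)
    hinted = to_decomposed(0xC7)
    print([f"0x{ord(c):04X}" for c in plain_pair])
    print([f"0x{ord(c):04X}" for c in hinted])
```

The separately typed pair 0xE5 0xDD and the precomposed 0xC7 now yield different Unicode strings, so the reverse mapping stays unambiguous.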
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version b02 to version b03/c01:
+#
+#   - Stop specifying left-right context for digits 0x30-0x39, since the
+#     corresponding Unicodes 0x0030-0x0039 already have left-right
+#     directionality.
+#
+#   - Change mapping of 0x81 from 0xFB1F HEBREW LIGATURE YIDDISH YOD YOD
+#     PATAH to its canonical decomposition 0x05F2+0x05B7 to improve
+#     cross-platform compatibility (Windows doesn't handle 0xFB1F).
+#
+#   - Interchange the mappings of 0xA8 and 0xA9 to obtain the correct
+#     open/close behavior; they work differently than in Mac Arabic.
+#     The old mapping was
+#         0xA8 <RL>+0x0028 # LEFT PARENTHESIS, right-left
+#         0xA9 <RL>+0x0029 # RIGHT PARENTHESIS, right-left
+#     and the new mapping is
+#         0xA8 <RL>+0x0029 # RIGHT PARENTHESIS, right-left
+#         0xA9 <RL>+0x0028 # LEFT PARENTHESIS, right-left
+#
+#   Changes from version n01 to version n03:
+#
+#   - Change mapping for 0xC0 from single corporate character to
+#     grouping hint plus standard Unicodes
+#
+#   - Change mapping for 0xDE from single corporate character to
+#     standard Unicode plus variant tag
+#
+##################
+
+0x20	<LR>+0x0020	# SPACE, left-right
+0x21	<LR>+0x0021	# EXCLAMATION MARK, left-right
+0x22	<LR>+0x0022	# QUOTATION MARK, left-right
+0x23	<LR>+0x0023	# NUMBER SIGN, left-right
+0x24	<LR>+0x0024	# DOLLAR SIGN, left-right
+0x25	<LR>+0x0025	# PERCENT SIGN, left-right
+0x26	0x0026	# AMPERSAND
+0x27	<LR>+0x0027	# APOSTROPHE, left-right
+0x28	<LR>+0x0028	# LEFT PARENTHESIS, left-right
+0x29	<LR>+0x0029	# RIGHT PARENTHESIS, left-right
+0x2A	<LR>+0x002A	# ASTERISK, left-right
+0x2B	<LR>+0x002B	# PLUS SIGN, left-right
+0x2C	<LR>+0x002C	# COMMA, left-right
+0x2D	<LR>+0x002D	# HYPHEN-MINUS, left-right
+0x2E	<LR>+0x002E	# FULL STOP, left-right
+0x2F	<LR>+0x002F	# SOLIDUS, left-right
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	<LR>+0x003A	# COLON, left-right
+0x3B	<LR>+0x003B	# SEMICOLON, left-right
+0x3C	<LR>+0x003C	# LESS-THAN SIGN, left-right
+0x3D	<LR>+0x003D	# EQUALS SIGN, left-right
+0x3E	<LR>+0x003E	# GREATER-THAN SIGN, left-right
+0x3F	<LR>+0x003F	# QUESTION MARK, left-right
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	<LR>+0x005B	# LEFT SQUARE BRACKET, left-right
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	<LR>+0x005D	# RIGHT SQUARE BRACKET, left-right
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	<LR>+0x007B	# LEFT CURLY BRACKET, left-right
+0x7C	<LR>+0x007C	# VERTICAL LINE, left-right
+0x7D	<LR>+0x007D	# RIGHT CURLY BRACKET, left-right
+0x7E	0x007E	# TILDE
+#
+0x80	0x00C4	# LATIN CAPITAL LETTER A WITH DIAERESIS
+0x81	0x05F2+0x05B7	# HEBREW LIGATURE YIDDISH YOD YOD PATAH
+0x82	0x00C7	# LATIN CAPITAL LETTER C WITH CEDILLA
+0x83	0x00C9	# LATIN CAPITAL LETTER E WITH ACUTE
+0x84	0x00D1	# LATIN CAPITAL LETTER N WITH TILDE
+0x85	0x00D6	# LATIN CAPITAL LETTER O WITH DIAERESIS
+0x86	0x00DC	# LATIN CAPITAL LETTER U WITH DIAERESIS
+0x87	0x00E1	# LATIN SMALL LETTER A WITH ACUTE
+0x88	0x00E0	# LATIN SMALL LETTER A WITH GRAVE
+0x89	0x00E2	# LATIN SMALL LETTER A WITH CIRCUMFLEX
+0x8A	0x00E4	# LATIN SMALL LETTER A WITH DIAERESIS
+0x8B	0x00E3	# LATIN SMALL LETTER A WITH TILDE
+0x8C	0x00E5	# LATIN SMALL LETTER A WITH RING ABOVE
+0x8D	0x00E7	# LATIN SMALL LETTER C WITH CEDILLA
+0x8E	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
+0x8F	0x00E8	# LATIN SMALL LETTER E WITH GRAVE
+0x90	0x00EA	# LATIN SMALL LETTER E WITH CIRCUMFLEX
+0x91	0x00EB	# LATIN SMALL LETTER E WITH DIAERESIS
+0x92	0x00ED	# LATIN SMALL LETTER I WITH ACUTE
+0x93	0x00EC	# LATIN SMALL LETTER I WITH GRAVE
+0x94	0x00EE	# LATIN SMALL LETTER I WITH CIRCUMFLEX
+0x95	0x00EF	# LATIN SMALL LETTER I WITH DIAERESIS
+0x96	0x00F1	# LATIN SMALL LETTER N WITH TILDE
+0x97	0x00F3	# LATIN SMALL LETTER O WITH ACUTE
+0x98	0x00F2	# LATIN SMALL LETTER O WITH GRAVE
+0x99	0x00F4	# LATIN SMALL LETTER O WITH CIRCUMFLEX
+0x9A	0x00F6	# LATIN SMALL LETTER O WITH DIAERESIS
+0x9B	0x00F5	# LATIN SMALL LETTER O WITH TILDE
+0x9C	0x00FA	# LATIN SMALL LETTER U WITH ACUTE
+0x9D	0x00F9	# LATIN SMALL LETTER U WITH GRAVE
+0x9E	0x00FB	# LATIN SMALL LETTER U WITH CIRCUMFLEX
+0x9F	0x00FC	# LATIN SMALL LETTER U WITH DIAERESIS
+0xA0	<RL>+0x0020	# SPACE, right-left
+0xA1	<RL>+0x0021	# EXCLAMATION MARK, right-left
+0xA2	<RL>+0x0022	# QUOTATION MARK, right-left
+0xA3	<RL>+0x0023	# NUMBER SIGN, right-left
+0xA4	<RL>+0x0024	# DOLLAR SIGN, right-left
+0xA5	<RL>+0x0025	# PERCENT SIGN, right-left
+0xA6	0x20AA	# NEW SHEQEL SIGN
+0xA7	<RL>+0x0027	# APOSTROPHE, right-left
+0xA8	<RL>+0x0029	# RIGHT PARENTHESIS, right-left # close parenthesis
+0xA9	<RL>+0x0028	# LEFT PARENTHESIS, right-left # open parenthesis
+0xAA	<RL>+0x002A	# ASTERISK, right-left
+0xAB	<RL>+0x002B	# PLUS SIGN, right-left
+0xAC	<RL>+0x002C	# COMMA, right-left
+0xAD	<RL>+0x002D	# HYPHEN-MINUS, right-left
+0xAE	<RL>+0x002E	# FULL STOP, right-left
+0xAF	<RL>+0x002F	# SOLIDUS, right-left
+0xB0	<RL>+0x0030	# DIGIT ZERO, right-left (need override)
+0xB1	<RL>+0x0031	# DIGIT ONE, right-left (need override)
+0xB2	<RL>+0x0032	# DIGIT TWO, right-left (need override)
+0xB3	<RL>+0x0033	# DIGIT THREE, right-left (need override)
+0xB4	<RL>+0x0034	# DIGIT FOUR, right-left (need override)
+0xB5	<RL>+0x0035	# DIGIT FIVE, right-left (need override)
+0xB6	<RL>+0x0036	# DIGIT SIX, right-left (need override)
+0xB7	<RL>+0x0037	# DIGIT SEVEN, right-left (need override)
+0xB8	<RL>+0x0038	# DIGIT EIGHT, right-left (need override)
+0xB9	<RL>+0x0039	# DIGIT NINE, right-left (need override)
+0xBA	<RL>+0x003A	# COLON, right-left
+0xBB	<RL>+0x003B	# SEMICOLON, right-left
+0xBC	<RL>+0x003C	# LESS-THAN SIGN, right-left
+0xBD	<RL>+0x003D	# EQUALS SIGN, right-left
+0xBE	<RL>+0x003E	# GREATER-THAN SIGN, right-left
+0xBF	<RL>+0x003F	# QUESTION MARK, right-left
+0xC0	0xF86A+0x05DC+0x05B9	# Hebrew ligature lamed holam
+0xC1	<RL>+0x201E	# DOUBLE LOW-9 QUOTATION MARK, right-left
+0xC2	0xF89B	# Hebrew canoral 1
+0xC3	0xF89C	# Hebrew canoral 2
+0xC4	0xF89D	# Hebrew canoral 3
+0xC5	0xF89E	# Hebrew canoral 4
+0xC6	0x05BC	# HEBREW POINT DAGESH OR MAPIQ
+0xC7	0xFB4B	# HEBREW LETTER VAV WITH HOLAM
+0xC8	0xFB35	# HEBREW LETTER VAV WITH DAGESH
+0xC9	<RL>+0x2026	# HORIZONTAL ELLIPSIS, right-left
+0xCA	<RL>+0x00A0	# NO-BREAK SPACE, right-left
+0xCB	0x05B8	# HEBREW POINT QAMATS
+0xCC	0x05B7	# HEBREW POINT PATAH
+0xCD	0x05B5	# HEBREW POINT TSERE
+0xCE	0x05B6	# HEBREW POINT SEGOL
+0xCF	0x05B4	# HEBREW POINT HIRIQ
+0xD0	<RL>+0x2013	# EN DASH, right-left
+0xD1	<RL>+0x2014	# EM DASH, right-left
+0xD2	<RL>+0x201C	# LEFT DOUBLE QUOTATION MARK, right-left
+0xD3	<RL>+0x201D	# RIGHT DOUBLE QUOTATION MARK, right-left
+0xD4	<RL>+0x2018	# LEFT SINGLE QUOTATION MARK, right-left
+0xD5	<RL>+0x2019	# RIGHT SINGLE QUOTATION MARK, right-left
+0xD6	0xFB2A	# HEBREW LETTER SHIN WITH SHIN DOT
+0xD7	0xFB2B	# HEBREW LETTER SHIN WITH SIN DOT
+0xD8	0x05BF	# HEBREW POINT RAFE
+0xD9	0x05B0	# HEBREW POINT SHEVA
+0xDA	0x05B2	# HEBREW POINT HATAF PATAH
+0xDB	0x05B1	# HEBREW POINT HATAF SEGOL
+0xDC	0x05BB	# HEBREW POINT QUBUTS
+0xDD	0x05B9	# HEBREW POINT HOLAM
+0xDE	0x05B8+0xF87F	# HEBREW POINT QAMATS, alternate form "qamats qatan"
+0xDF	0x05B3	# HEBREW POINT HATAF QAMATS
+0xE0	0x05D0	# HEBREW LETTER ALEF
+0xE1	0x05D1	# HEBREW LETTER BET
+0xE2	0x05D2	# HEBREW LETTER GIMEL
+0xE3	0x05D3	# HEBREW LETTER DALET
+0xE4	0x05D4	# HEBREW LETTER HE
+0xE5	0x05D5	# HEBREW LETTER VAV
+0xE6	0x05D6	# HEBREW LETTER ZAYIN
+0xE7	0x05D7	# HEBREW LETTER HET
+0xE8	0x05D8	# HEBREW LETTER TET
+0xE9	0x05D9	# HEBREW LETTER YOD
+0xEA	0x05DA	# HEBREW LETTER FINAL KAF
+0xEB	0x05DB	# HEBREW LETTER KAF
+0xEC	0x05DC	# HEBREW LETTER LAMED
+0xED	0x05DD	# HEBREW LETTER FINAL MEM
+0xEE	0x05DE	# HEBREW LETTER MEM
+0xEF	0x05DF	# HEBREW LETTER FINAL NUN
+0xF0	0x05E0	# HEBREW LETTER NUN
+0xF1	0x05E1	# HEBREW LETTER SAMEKH
+0xF2	0x05E2	# HEBREW LETTER AYIN
+0xF3	0x05E3	# HEBREW LETTER FINAL PE
+0xF4	0x05E4	# HEBREW LETTER PE
+0xF5	0x05E5	# HEBREW LETTER FINAL TSADI
+0xF6	0x05E6	# HEBREW LETTER TSADI
+0xF7	0x05E7	# HEBREW LETTER QOF
+0xF8	0x05E8	# HEBREW LETTER RESH
+0xF9	0x05E9	# HEBREW LETTER SHIN
+0xFA	0x05EA	# HEBREW LETTER TAV
+0xFB	<RL>+0x007D	# RIGHT CURLY BRACKET, right-left
+0xFC	<RL>+0x005D	# RIGHT SQUARE BRACKET, right-left
+0xFD	<RL>+0x007B	# LEFT CURLY BRACKET, right-left
+0xFE	<RL>+0x005B	# LEFT SQUARE BRACKET, right-left
+0xFF	<RL>+0x007C	# VERTICAL LINE, right-left
--- a/charmap/ICELAND.TXT
+++ b/charmap/ICELAND.TXT
@ -0,0 +1,369 @@
+#=======================================================================
+#   File name:  ICELAND.TXT
+#
+#   Contents:   Map (external version) from Mac OS Icelandic
+#               character set to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Update URLs, notes. Matches internal
+#                           utom<b3>.
+#       b02  1999-Sep-22    Encoding changed for Mac OS 8.5; change
+#                           mapping of 0xDB from CURRENCY SIGN to EURO
+#                           SIGN. Update contact e-mail address. Matches
+#                           internal utom<b2>, ufrm<b2>, and Text
+#                           Encoding Converter version 1.5.
+#       n06  1998-Feb-05    Minor update to header comments, add
+#                           information on font variants
+#       n03  1997-Dec-14    Update to match internal utom<n4>, ufrm<n16>:
+#                           Change standard mapping for 0xBD from U+2126
+#                           to its canonical decomposition, U+03A9.
+#       n02  1995-Apr-15    First version (after fixing some typos).
+#                           Matches internal ufrm<n5>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Icelandic code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN)
+#     Column #3 is a comment containing the Unicode name
+#
+#   The entries are in Mac OS Icelandic code order.
+#
+#   One of these mappings requires the use of a corporate character.
+#   See the file "CORPCHAR.TXT" and notes below.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Icelandic character set uses the standard control characters
+#   at 0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Icelandic:
+# --------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   1. General
+#
+#   Mac OS Icelandic is used for Icelandic and Faroese.
+#
+#   The Mac OS Icelandic encoding shares the script code smRoman
+#   (0) with the standard Mac OS Roman encoding. To determine if
+#   the Icelandic encoding is being used, you must also check if
+#   the system region code is 21, verIceland.
+#
+#   This character set is a variant of standard Mac OS Roman,
+#   adding upper and lower eth, thorn, and Y acute. It has 6 code
+#   point differences from standard Mac OS Roman.
+#
+#   Before Mac OS 8.5, code point 0xDB was CURRENCY SIGN, and was
+#   mapped to U+00A4. In Mac OS 8.5 and later versions, code point
+#   0xDB is changed to EURO SIGN and maps to U+20AC; the standard
+#   Apple fonts are updated for Mac OS 8.5 to reflect this. There are
+#   "currency sign" variants of the Mac OS Icelandic encoding that
+#   still map 0xDB to U+00A4; these can be used for older fonts.
+#
+#   2. Font variants
+#
+#   The table in this file gives the Unicode mappings for the standard
+#   Mac OS Icelandic encoding. This encoding is supported by the
+#   Icelandic versions of the fonts Chicago, Geneva, Monaco, and New
+#   York, and is the encoding supported by the text processing
+#   utilities. However, other TrueType fonts implement a slightly
+#   different encoding; the difference is only in two code points.
+#   For the standard variant, these are:
+#     0xBB -> 0x00AA  FEMININE ORDINAL INDICATOR
+#     0xBC -> 0x00BA  MASCULINE ORDINAL INDICATOR
+#
+#   For the TrueType variant (used by the Icelandic versions of the
+#   fonts Courier, Helvetica, Palatino, and Times), these are:
+#     0xBB -> 0xFB01  LATIN SMALL LIGATURE FI
+#     0xBC -> 0xFB02  LATIN SMALL LIGATURE FL
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+#   The following corporate zone Unicode character is used in this
+#   mapping:
+#
+#     0xF8FF  Apple logo
+#
+#   NOTE: The graphic image associated with the Apple logo character
+#   is not authorized for use without permission of Apple, and
+#   unauthorized use might constitute trademark infringement.
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version n06 to version b02:
+#
+#   - Encoding changed for Mac OS 8.5; change mapping of 0xDB from
+#     CURRENCY SIGN (U+00A4) to EURO SIGN (U+20AC).
+#
+#   Changes from version n02 to version n03:
+#
+#   - Change mapping of 0xBD from U+2126 to its canonical
+#     decomposition, U+03A9.
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00C4	# LATIN CAPITAL LETTER A WITH DIAERESIS
+0x81	0x00C5	# LATIN CAPITAL LETTER A WITH RING ABOVE
+0x82	0x00C7	# LATIN CAPITAL LETTER C WITH CEDILLA
+0x83	0x00C9	# LATIN CAPITAL LETTER E WITH ACUTE
+0x84	0x00D1	# LATIN CAPITAL LETTER N WITH TILDE
+0x85	0x00D6	# LATIN CAPITAL LETTER O WITH DIAERESIS
+0x86	0x00DC	# LATIN CAPITAL LETTER U WITH DIAERESIS
+0x87	0x00E1	# LATIN SMALL LETTER A WITH ACUTE
+0x88	0x00E0	# LATIN SMALL LETTER A WITH GRAVE
+0x89	0x00E2	# LATIN SMALL LETTER A WITH CIRCUMFLEX
+0x8A	0x00E4	# LATIN SMALL LETTER A WITH DIAERESIS
+0x8B	0x00E3	# LATIN SMALL LETTER A WITH TILDE
+0x8C	0x00E5	# LATIN SMALL LETTER A WITH RING ABOVE
+0x8D	0x00E7	# LATIN SMALL LETTER C WITH CEDILLA
+0x8E	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
+0x8F	0x00E8	# LATIN SMALL LETTER E WITH GRAVE
+0x90	0x00EA	# LATIN SMALL LETTER E WITH CIRCUMFLEX
+0x91	0x00EB	# LATIN SMALL LETTER E WITH DIAERESIS
+0x92	0x00ED	# LATIN SMALL LETTER I WITH ACUTE
+0x93	0x00EC	# LATIN SMALL LETTER I WITH GRAVE
+0x94	0x00EE	# LATIN SMALL LETTER I WITH CIRCUMFLEX
+0x95	0x00EF	# LATIN SMALL LETTER I WITH DIAERESIS
+0x96	0x00F1	# LATIN SMALL LETTER N WITH TILDE
+0x97	0x00F3	# LATIN SMALL LETTER O WITH ACUTE
+0x98	0x00F2	# LATIN SMALL LETTER O WITH GRAVE
+0x99	0x00F4	# LATIN SMALL LETTER O WITH CIRCUMFLEX
+0x9A	0x00F6	# LATIN SMALL LETTER O WITH DIAERESIS
+0x9B	0x00F5	# LATIN SMALL LETTER O WITH TILDE
+0x9C	0x00FA	# LATIN SMALL LETTER U WITH ACUTE
+0x9D	0x00F9	# LATIN SMALL LETTER U WITH GRAVE
+0x9E	0x00FB	# LATIN SMALL LETTER U WITH CIRCUMFLEX
+0x9F	0x00FC	# LATIN SMALL LETTER U WITH DIAERESIS
+0xA0	0x00DD	# LATIN CAPITAL LETTER Y WITH ACUTE
+0xA1	0x00B0	# DEGREE SIGN
+0xA2	0x00A2	# CENT SIGN
+0xA3	0x00A3	# POUND SIGN
+0xA4	0x00A7	# SECTION SIGN
+0xA5	0x2022	# BULLET
+0xA6	0x00B6	# PILCROW SIGN
+0xA7	0x00DF	# LATIN SMALL LETTER SHARP S
+0xA8	0x00AE	# REGISTERED SIGN
+0xA9	0x00A9	# COPYRIGHT SIGN
+0xAA	0x2122	# TRADE MARK SIGN
+0xAB	0x00B4	# ACUTE ACCENT
+0xAC	0x00A8	# DIAERESIS
+0xAD	0x2260	# NOT EQUAL TO
+0xAE	0x00C6	# LATIN CAPITAL LETTER AE
+0xAF	0x00D8	# LATIN CAPITAL LETTER O WITH STROKE
+0xB0	0x221E	# INFINITY
+0xB1	0x00B1	# PLUS-MINUS SIGN
+0xB2	0x2264	# LESS-THAN OR EQUAL TO
+0xB3	0x2265	# GREATER-THAN OR EQUAL TO
+0xB4	0x00A5	# YEN SIGN
+0xB5	0x00B5	# MICRO SIGN
+0xB6	0x2202	# PARTIAL DIFFERENTIAL
+0xB7	0x2211	# N-ARY SUMMATION
+0xB8	0x220F	# N-ARY PRODUCT
+0xB9	0x03C0	# GREEK SMALL LETTER PI
+0xBA	0x222B	# INTEGRAL
+0xBB	0x00AA	# FEMININE ORDINAL INDICATOR
+0xBC	0x00BA	# MASCULINE ORDINAL INDICATOR
+0xBD	0x03A9	# GREEK CAPITAL LETTER OMEGA
+0xBE	0x00E6	# LATIN SMALL LETTER AE
+0xBF	0x00F8	# LATIN SMALL LETTER O WITH STROKE
+0xC0	0x00BF	# INVERTED QUESTION MARK
+0xC1	0x00A1	# INVERTED EXCLAMATION MARK
+0xC2	0x00AC	# NOT SIGN
+0xC3	0x221A	# SQUARE ROOT
+0xC4	0x0192	# LATIN SMALL LETTER F WITH HOOK
+0xC5	0x2248	# ALMOST EQUAL TO
+0xC6	0x2206	# INCREMENT
+0xC7	0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC8	0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC9	0x2026	# HORIZONTAL ELLIPSIS
+0xCA	0x00A0	# NO-BREAK SPACE
+0xCB	0x00C0	# LATIN CAPITAL LETTER A WITH GRAVE
+0xCC	0x00C3	# LATIN CAPITAL LETTER A WITH TILDE
+0xCD	0x00D5	# LATIN CAPITAL LETTER O WITH TILDE
+0xCE	0x0152	# LATIN CAPITAL LIGATURE OE
+0xCF	0x0153	# LATIN SMALL LIGATURE OE
+0xD0	0x2013	# EN DASH
+0xD1	0x2014	# EM DASH
+0xD2	0x201C	# LEFT DOUBLE QUOTATION MARK
+0xD3	0x201D	# RIGHT DOUBLE QUOTATION MARK
+0xD4	0x2018	# LEFT SINGLE QUOTATION MARK
+0xD5	0x2019	# RIGHT SINGLE QUOTATION MARK
+0xD6	0x00F7	# DIVISION SIGN
+0xD7	0x25CA	# LOZENGE
+0xD8	0x00FF	# LATIN SMALL LETTER Y WITH DIAERESIS
+0xD9	0x0178	# LATIN CAPITAL LETTER Y WITH DIAERESIS
+0xDA	0x2044	# FRACTION SLASH
+0xDB	0x20AC	# EURO SIGN
+0xDC	0x00D0	# LATIN CAPITAL LETTER ETH
+0xDD	0x00F0	# LATIN SMALL LETTER ETH
+0xDE	0x00DE	# LATIN CAPITAL LETTER THORN
+0xDF	0x00FE	# LATIN SMALL LETTER THORN
+0xE0	0x00FD	# LATIN SMALL LETTER Y WITH ACUTE
+0xE1	0x00B7	# MIDDLE DOT
+0xE2	0x201A	# SINGLE LOW-9 QUOTATION MARK
+0xE3	0x201E	# DOUBLE LOW-9 QUOTATION MARK
+0xE4	0x2030	# PER MILLE SIGN
+0xE5	0x00C2	# LATIN CAPITAL LETTER A WITH CIRCUMFLEX
+0xE6	0x00CA	# LATIN CAPITAL LETTER E WITH CIRCUMFLEX
+0xE7	0x00C1	# LATIN CAPITAL LETTER A WITH ACUTE
+0xE8	0x00CB	# LATIN CAPITAL LETTER E WITH DIAERESIS
+0xE9	0x00C8	# LATIN CAPITAL LETTER E WITH GRAVE
+0xEA	0x00CD	# LATIN CAPITAL LETTER I WITH ACUTE
+0xEB	0x00CE	# LATIN CAPITAL LETTER I WITH CIRCUMFLEX
+0xEC	0x00CF	# LATIN CAPITAL LETTER I WITH DIAERESIS
+0xED	0x00CC	# LATIN CAPITAL LETTER I WITH GRAVE
+0xEE	0x00D3	# LATIN CAPITAL LETTER O WITH ACUTE
+0xEF	0x00D4	# LATIN CAPITAL LETTER O WITH CIRCUMFLEX
+0xF0	0xF8FF	# Apple logo
+0xF1	0x00D2	# LATIN CAPITAL LETTER O WITH GRAVE
+0xF2	0x00DA	# LATIN CAPITAL LETTER U WITH ACUTE
+0xF3	0x00DB	# LATIN CAPITAL LETTER U WITH CIRCUMFLEX
+0xF4	0x00D9	# LATIN CAPITAL LETTER U WITH GRAVE
+0xF5	0x0131	# LATIN SMALL LETTER DOTLESS I
+0xF6	0x02C6	# MODIFIER LETTER CIRCUMFLEX ACCENT
+0xF7	0x02DC	# SMALL TILDE
+0xF8	0x00AF	# MACRON
+0xF9	0x02D8	# BREVE
+0xFA	0x02D9	# DOT ABOVE
+0xFB	0x02DA	# RING ABOVE
+0xFC	0x00B8	# CEDILLA
+0xFD	0x02DD	# DOUBLE ACUTE ACCENT
+0xFE	0x02DB	# OGONEK
+0xFF	0x02C7	# CARON
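The three-column format and the two Icelandic variants described in the header above can be sketched in Python. This is a minimal illustration, not Apple's converter: `SAMPLE` is a hand-copied excerpt of the table, and `parse_mapping`, `decode`, and the variant dictionaries are hypothetical names introduced here. Unlisted control codes are passed through unchanged, following the header's note that 0x00-0x1F and 0x7F are the standard controls.

```python
# Excerpt of the Mac OS Icelandic table (assumption: a few rows only).
SAMPLE = (
    "# comment line\n"
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A\n"
    "0xBB\t0x00AA\t# FEMININE ORDINAL INDICATOR\n"
    "0xBC\t0x00BA\t# MASCULINE ORDINAL INDICATOR\n"
    "0xDB\t0x20AC\t# EURO SIGN\n"
    "0xDE\t0x00DE\t# LATIN CAPITAL LETTER THORN\n"
    "0xF0\t0xF8FF\t# Apple logo\n"
)


def parse_mapping(text):
    """Return {mac_byte: unicode_string} for every data line in text."""
    table = {}
    for line in text.splitlines():
        # In a diff view each line carries a leading '+'; drop it if present.
        line = line.lstrip("+")
        # '#' begins a comment that continues to the end of the line.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        mac_code, uni = data.split("\t")[:2]
        # Column 2 may be a '+'-joined sequence in some tables.
        table[int(mac_code, 16)] = "".join(
            chr(int(cp, 16)) for cp in uni.split("+")
        )
    return table


def decode(data, table):
    """Decode bytes with table; unlisted controls map to themselves."""
    out = []
    for b in data:
        if b in table:
            out.append(table[b])
        elif b < 0x20 or b == 0x7F:
            out.append(chr(b))
        else:
            out.append("\ufffd")  # not in the excerpt
    return "".join(out)


standard = parse_mapping(SAMPLE)
# "Currency sign" variant for fonts older than Mac OS 8.5.
currency_variant = {**standard, 0xDB: "\u00a4"}
# TrueType variant (Courier, Helvetica, Palatino, Times).
truetype_variant = {**standard, 0xBB: "\ufb01", 0xBC: "\ufb02"}
```

Expressing each variant as an overlay on the standard table keeps the differences visible: one code point for the currency-sign variant, two for the TrueType variant.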
--- a/charmap/INUIT.TXT
+++ b/charmap/INUIT.TXT
@ -0,0 +1,322 @@
+#=======================================================================
+#   File name:  INUIT.TXT
+#
+#   Contents:   Map (external version) from Mac OS Inuit
+#               character set to Unicode 3.0 and later.
+#
+#   Contacts:   charsets@apple.com, everson@evertype.com
+#
+#   Changes:
+#
+#       c01  2005-Apr-01    First posted version. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Inuit code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN)
+#     Column #3 is a comment containing the Unicode name
+#
+#   The entries are in Mac OS Inuit code order.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Inuit character set uses the standard control characters
+#   at 0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Inuit (partly from Michael Everson):
+# ----------------------------------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   This character set was developed by Michael Everson of Everson
+#   Typography (everson@evertype.com) and was used for the Inuktitut
+#   localizations of Mac OS, as well as for the Inuktitut utilities
+#   package from Everson Typography. Note that while Apple authorized
+#   the Inuktitut localization mentioned above, it was not shipped with
+#   Apple hardware, and was not otherwise supported by Apple. Fonts
+#   conforming to the Mac OS Inuit character set are available from
+#   Everson Typography (http://www.evertype.com/software/apple/).
+#   Information about the use of this character set is available at
+#   http://www.evertype.com/standards/iu/.
+#
+#   The Mac OS Inuit character set shares the script code smEthiopic
+#   (28) with the Ethiopic encoding. To determine if the Inuktitut
+#   encoding is being used, you must also check if the system region
+#   code is 78, verNunavut.
+#
+#   The Mac OS Inuit character set includes the full syllabic letter
+#   repertoire required for Inuktitut; it is a subset of the Unified
+#   Canadian Aboriginal Syllabics set encoded in Unicode. The encoding
+#   is InuitSCII, designed by Doug Hitch for the Government of the
+#   Northwest Territories.
+#
+#   The Mac OS Inuit character set also includes a number of characters
+#   that were needed for the classic Mac OS user interface and
+#   localization (e.g. ellipsis, bullet, copyright sign). All of the
+#   characters in Mac OS Inuit that are also in the Mac OS Roman
+#   encoding are at the same code point in both; this improves
+#   application compatibility.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x1403	# CANADIAN SYLLABICS I
+0x81	0x1404	# CANADIAN SYLLABICS II
+0x82	0x1405	# CANADIAN SYLLABICS O
+0x83	0x1406	# CANADIAN SYLLABICS OO
+0x84	0x140A	# CANADIAN SYLLABICS A
+0x85	0x140B	# CANADIAN SYLLABICS AA
+0x86	0x1431	# CANADIAN SYLLABICS PI
+0x87	0x1432	# CANADIAN SYLLABICS PII
+0x88	0x1433	# CANADIAN SYLLABICS PO
+0x89	0x1434	# CANADIAN SYLLABICS POO
+0x8A	0x1438	# CANADIAN SYLLABICS PA
+0x8B	0x1439	# CANADIAN SYLLABICS PAA
+0x8C	0x1449	# CANADIAN SYLLABICS P
+0x8D	0x144E	# CANADIAN SYLLABICS TI
+0x8E	0x144F	# CANADIAN SYLLABICS TII
+0x8F	0x1450	# CANADIAN SYLLABICS TO
+0x90	0x1451	# CANADIAN SYLLABICS TOO
+0x91	0x1455	# CANADIAN SYLLABICS TA
+0x92	0x1456	# CANADIAN SYLLABICS TAA
+0x93	0x1466	# CANADIAN SYLLABICS T
+0x94	0x146D	# CANADIAN SYLLABICS KI
+0x95	0x146E	# CANADIAN SYLLABICS KII
+0x96	0x146F	# CANADIAN SYLLABICS KO
+0x97	0x1470	# CANADIAN SYLLABICS KOO
+0x98	0x1472	# CANADIAN SYLLABICS KA
+0x99	0x1473	# CANADIAN SYLLABICS KAA
+0x9A	0x1483	# CANADIAN SYLLABICS K
+0x9B	0x148B	# CANADIAN SYLLABICS CI
+0x9C	0x148C	# CANADIAN SYLLABICS CII
+0x9D	0x148D	# CANADIAN SYLLABICS CO
+0x9E	0x148E	# CANADIAN SYLLABICS COO
+0x9F	0x1490	# CANADIAN SYLLABICS CA
+0xA0	0x1491	# CANADIAN SYLLABICS CAA
+0xA1	0x00B0	# DEGREE SIGN
+0xA2	0x14A1	# CANADIAN SYLLABICS C
+0xA3	0x14A5	# CANADIAN SYLLABICS MI
+0xA4	0x14A6	# CANADIAN SYLLABICS MII
+0xA5	0x2022	# BULLET
+0xA6	0x00B6	# PILCROW SIGN
+0xA7	0x14A7	# CANADIAN SYLLABICS MO
+0xA8	0x00AE	# REGISTERED SIGN
+0xA9	0x00A9	# COPYRIGHT SIGN
+0xAA	0x2122	# TRADE MARK SIGN
+0xAB	0x14A8	# CANADIAN SYLLABICS MOO
+0xAC	0x14AA	# CANADIAN SYLLABICS MA
+0xAD	0x14AB	# CANADIAN SYLLABICS MAA
+0xAE	0x14BB	# CANADIAN SYLLABICS M
+0xAF	0x14C2	# CANADIAN SYLLABICS NI
+0xB0	0x14C3	# CANADIAN SYLLABICS NII
+0xB1	0x14C4	# CANADIAN SYLLABICS NO
+0xB2	0x14C5	# CANADIAN SYLLABICS NOO
+0xB3	0x14C7	# CANADIAN SYLLABICS NA
+0xB4	0x14C8	# CANADIAN SYLLABICS NAA
+0xB5	0x14D0	# CANADIAN SYLLABICS N
+0xB6	0x14EF	# CANADIAN SYLLABICS SI
+0xB7	0x14F0	# CANADIAN SYLLABICS SII
+0xB8	0x14F1	# CANADIAN SYLLABICS SO
+0xB9	0x14F2	# CANADIAN SYLLABICS SOO
+0xBA	0x14F4	# CANADIAN SYLLABICS SA
+0xBB	0x14F5	# CANADIAN SYLLABICS SAA
+0xBC	0x1505	# CANADIAN SYLLABICS S
+0xBD	0x14D5	# CANADIAN SYLLABICS LI
+0xBE	0x14D6	# CANADIAN SYLLABICS LII
+0xBF	0x14D7	# CANADIAN SYLLABICS LO
+0xC0	0x14D8	# CANADIAN SYLLABICS LOO
+0xC1	0x14DA	# CANADIAN SYLLABICS LA
+0xC2	0x14DB	# CANADIAN SYLLABICS LAA
+0xC3	0x14EA	# CANADIAN SYLLABICS L
+0xC4	0x1528	# CANADIAN SYLLABICS YI
+0xC5	0x1529	# CANADIAN SYLLABICS YII
+0xC6	0x152A	# CANADIAN SYLLABICS YO
+0xC7	0x152B	# CANADIAN SYLLABICS YOO
+0xC8	0x152D	# CANADIAN SYLLABICS YA
+0xC9	0x2026	# HORIZONTAL ELLIPSIS
+0xCA	0x00A0	# NO-BREAK SPACE
+0xCB	0x152E	# CANADIAN SYLLABICS YAA
+0xCC	0x153E	# CANADIAN SYLLABICS Y
+0xCD	0x1555	# CANADIAN SYLLABICS FI
+0xCE	0x1556	# CANADIAN SYLLABICS FII
+0xCF	0x1557	# CANADIAN SYLLABICS FO
+0xD0	0x2013	# EN DASH
+0xD1	0x2014	# EM DASH
+0xD2	0x201C	# LEFT DOUBLE QUOTATION MARK
+0xD3	0x201D	# RIGHT DOUBLE QUOTATION MARK
+0xD4	0x2018	# LEFT SINGLE QUOTATION MARK
+0xD5	0x2019	# RIGHT SINGLE QUOTATION MARK
+0xD6	0x1558	# CANADIAN SYLLABICS FOO
+0xD7	0x1559	# CANADIAN SYLLABICS FA
+0xD8	0x155A	# CANADIAN SYLLABICS FAA
+0xD9	0x155D	# CANADIAN SYLLABICS F
+0xDA	0x1546	# CANADIAN SYLLABICS RI
+0xDB	0x1547	# CANADIAN SYLLABICS RII
+0xDC	0x1548	# CANADIAN SYLLABICS RO
+0xDD	0x1549	# CANADIAN SYLLABICS ROO
+0xDE	0x154B	# CANADIAN SYLLABICS RA
+0xDF	0x154C	# CANADIAN SYLLABICS RAA
+0xE0	0x1550	# CANADIAN SYLLABICS R
+0xE1	0x157F	# CANADIAN SYLLABICS QI
+0xE2	0x1580	# CANADIAN SYLLABICS QII
+0xE3	0x1581	# CANADIAN SYLLABICS QO
+0xE4	0x1582	# CANADIAN SYLLABICS QOO
+0xE5	0x1583	# CANADIAN SYLLABICS QA
+0xE6	0x1584	# CANADIAN SYLLABICS QAA
+0xE7	0x1585	# CANADIAN SYLLABICS Q
+0xE8	0x158F	# CANADIAN SYLLABICS NGI
+0xE9	0x1590	# CANADIAN SYLLABICS NGII
+0xEA	0x1591	# CANADIAN SYLLABICS NGO
+0xEB	0x1592	# CANADIAN SYLLABICS NGOO
+0xEC	0x1593	# CANADIAN SYLLABICS NGA
+0xED	0x1594	# CANADIAN SYLLABICS NGAA
+0xEE	0x1595	# CANADIAN SYLLABICS NG
+0xEF	0x1671	# CANADIAN SYLLABICS NNGI
+0xF0	0x1672	# CANADIAN SYLLABICS NNGII
+0xF1	0x1673	# CANADIAN SYLLABICS NNGO
+0xF2	0x1674	# CANADIAN SYLLABICS NNGOO
+0xF3	0x1675	# CANADIAN SYLLABICS NNGA
+0xF4	0x1676	# CANADIAN SYLLABICS NNGAA
+0xF5	0x1596	# CANADIAN SYLLABICS NNG
+0xF6	0x15A0	# CANADIAN SYLLABICS LHI
+0xF7	0x15A1	# CANADIAN SYLLABICS LHII
+0xF8	0x15A2	# CANADIAN SYLLABICS LHO
+0xF9	0x15A3	# CANADIAN SYLLABICS LHOO
+0xFA	0x15A4	# CANADIAN SYLLABICS LHA
+0xFB	0x15A5	# CANADIAN SYLLABICS LHAA
+0xFC	0x15A6	# CANADIAN SYLLABICS LH
+0xFD	0x157C	# CANADIAN SYLLABICS NUNAVUT H
+0xFE	0x0141	# LATIN CAPITAL LETTER L WITH STROKE
+0xFF	0x0142	# LATIN SMALL LETTER L WITH STROKE
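Both the Icelandic and the Inuit headers say that the script code alone does not identify the encoding and that the system region code must also be checked. A minimal sketch of that two-step lookup follows. The numeric values come from the header comments. The Python constant names, the table labels, and `select_table` are hypothetical, and only the two documented (script, region) pairs are listed, so this is not a complete list of Mac OS variants.

```python
# Values quoted in the ICELAND.TXT and INUIT.TXT header comments.
SM_ROMAN = 0
SM_ETHIOPIC = 28
VER_ICELAND = 21
VER_NUNAVUT = 78

# (script code, region code) pairs that select a variant encoding.
# The labels on the right are illustrative, not API identifiers.
REGION_OVERRIDES = {
    (SM_ROMAN, VER_ICELAND): "ICELAND",
    (SM_ETHIOPIC, VER_NUNAVUT): "INUIT",
}

# Fallback when no region-specific variant applies (illustrative labels).
SCRIPT_DEFAULTS = {
    SM_ROMAN: "ROMAN",
    SM_ETHIOPIC: "ETHIOPIC",
}


def select_table(script_code, region_code):
    """Pick a mapping-table label; None if the script is not listed."""
    key = (script_code, region_code)
    if key in REGION_OVERRIDES:
        return REGION_OVERRIDES[key]
    return SCRIPT_DEFAULTS.get(script_code)
```

Checking the pair first and falling back to the script default mirrors the wording of the headers: the region code refines the choice that the script code makes.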
--- a/charmap/JAPANESE.TXT
+++ b/charmap/JAPANESE.TXT
--- a/charmap/KEYBOARD.TXT
+++ b/charmap/KEYBOARD.TXT
@ -0,0 +1,234 @@
+#=======================================================================
+#   File name:  KEYBOARD.TXT
+#
+#   Contents:   Map (external version) from Mac OS Keyboard
+#               character set to Unicode 4.0 and later.
+#
+#   Copyright:  (c) 2001-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Change mappings for 0x09, 0x0F, 0x8C; add
+#                           Mac OS X-only mappings for 0x8D-0x8F.
+#                           Update header comments, including
+#                           clarification of Mac OS X usage. Matches
+#                           internal xml <c1.2> and Text Encoding
+#                           Converter 2.0.
+#      b1,c1 2002-Dec-19    First version. Matches internal utom<b6>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Keyboard code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode or Unicode sequence
+#       (in hex as 0xNNNN or 0xNNNN+0xNNNN, etc.).
+#     Column #3 is a comment containing the Unicode name.
+#       In some cases an additional comment follows the Unicode name.
+#
+#   The entries are in Mac OS Keyboard code order.
+#
+#   Some of these mappings require the use of corporate characters.
+#   See the file "CORPCHAR.TXT" and notes below.
+#
+#   The Mac OS Keyboard character set uses the ranges normally set aside
+#   for controls, so those ranges are present in this table.
+#
+# Notes on Mac OS Keyboard:
+# -------------------------
+#
+#   This is the encoding for the legacy font named ".Keyboard". Before
+#   Mac OS X, this font was used by the user-interface system to display
+#   glyphs for special keys on the keyboard. In Mac OS X, that font is
+#   not present and this mapping is not associated with a font; it is
+#   only used as a way to map from a set of Menu Manager constants to
+#   associated Unicode sequences. As such, new mappings added for Mac OS
+#   X only may be one-way mappings: From the Keyboard glyph "encoding"
+#   to Unicode, but not back.
+#
+#   The Mac OS Keyboard encoding shares the script code smRoman
+#   (0) with the Mac OS Roman encoding. To determine if the Keyboard
+#   encoding is being used in Mac OS 8 or Mac OS 9, you must check if
+#   the font name is ".Keyboard".
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+#   The goals in the mappings provided here are:
+#   - For mappings used in Mac OS 8 and Mac OS 9, ensure roundtrip
+#     mapping from every character in the Mac OS Keyboard character set
+#     to Unicode and back. This consideration does not apply to mappings
+#     added for Mac OS X only (noted below).
+#   - Use standard Unicode characters as much as possible, to
+#     maximize interchangeability of the resulting Unicode text.
+#     Whenever possible, avoid having content carried by private-use
+#     characters.
+#
+#   Some of the characters in the Mac OS Keyboard character set do not
+#   correspond to distinct, single Unicode characters. To map these
+#   and satisfy both goals above, we employ various strategies.
+#
+#   a) If possible, use private use characters in combination with
+#   standard Unicode characters to mark variants of the standard
+#   Unicode character.
+#
+#   Apple has defined a block of 32 corporate characters as "transcoding
+#   hints." These are used in combination with standard Unicode
+#   characters to force them to be treated in a special way for mapping
+#   to other encodings; they have no other effect. Sixteen of these
+#   transcoding hints are "grouping hints" - they indicate that the next
+#   2-4 Unicode characters should be treated as a single entity for
+#   transcoding. The other sixteen transcoding hints are "variant tags"
+#   - they are like combining characters, and can follow a standard
+#   Unicode character (or a sequence of a base character and other
+#   combining characters) to cause it to be treated in a special way for
+#   transcoding. These always terminate a combining-character sequence.
+#
+#   The transcoding hints used in this mapping table are two grouping
+#   tags, 0xF860-61, and one variant tag, 0xF87F. Since these are
+#   combined with standard Unicode characters, some characters in the
+#   Mac OS Keyboard character set map to a sequence of two to four
+#   Unicode characters instead of a single Unicode character.
+#
+#   For example, the Mac OS Keyboard character at 0x6F, representing the
+#   F1 key, is mapped to Unicode using the grouping tag F860 (group next
+#   two) followed by U+0046 (LATIN CAPITAL LETTER F) and U+0031 (DIGIT
+#   ONE).
+#
+#   b) Otherwise, use private use characters by themselves to map Mac OS
+#   Keyboard characters which have no relationship to any standard
+#   Unicode character.
+#
+#   The following additional corporate zone Unicode characters are
+#   used for this purpose here:
+#
+#     0xF802  Lower left pencil
+#     0xF803  Contextual menu key symbol
+#     0xF8FF  Apple logo
+#
+#   NOTE: The graphic image associated with the Apple logo character
+#   is not authorized for use without permission of Apple, and
+#   unauthorized use might constitute trademark infringement.
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version c01 to version c02:
+#
+#   - Mapping for 0x09 changed from 0x0009 (wrong) to 0x2423
+#   - Mapping for 0x0F changed from 0x270E (wrong) to 0xF802
+#   - Mapping for 0x8C changed from 0xF804 to 0x23CF (Unicode 4.0)
+#   - Add Mac OS X-only mappings for 0x8D-0x8F
+#
+##################
+
+0x00	0x0000	# control - NUL
+#
+0x02	0x21E5	# RIGHTWARDS ARROW TO BAR # Tab right (left-to-right text)
+0x03	0x21E4	# LEFTWARDS ARROW TO BAR # Tab left (right-to-left text)
+0x04	0x2324	# UP ARROWHEAD BETWEEN TWO HORIZONTAL BARS # Enter key
+0x05	0x21E7	# UPWARDS WHITE ARROW # Shift key
+0x06	0x2303	# UP ARROWHEAD # Control key
+0x07	0x2325	# OPTION KEY # Option key
+0x08	0x0008	# control - BS
+0x09	0x2423	# OPEN BOX # Space key (Mac OS X mapping, duplicates mapping for 0x61, hence no round-trip)
+0x0A	0x2326	# ERASE TO THE RIGHT # Delete right (right-to-left text)
+0x0B	0x21A9	# LEFTWARDS ARROW WITH HOOK # Return key (left-to-right text)
+0x0C	0x21AA	# RIGHTWARDS ARROW WITH HOOK # Return key (right-to-left text)
+0x0D	0x000D	# control - CR
+#
+0x0F	0xF802	# lower left pencil
+0x10	0x21E3	# DOWNWARDS DASHED ARROW
+0x11	0x2318	# PLACE OF INTEREST SIGN # Command key
+0x12	0x2713	# CHECK MARK
+0x13	0x25C6	# BLACK DIAMOND
+0x14	0xF8FF	# Apple logo
+#
+0x17	0x232B	# ERASE TO THE LEFT # Delete left (left-to-right text)
+0x18	0x21E0	# LEFTWARDS DASHED ARROW
+0x19	0x21E1	# UPWARDS DASHED ARROW
+0x1A	0x21E2	# RIGHTWARDS DASHED ARROW
+0x1B	0x238B	# BROKEN CIRCLE WITH NORTHWEST ARROW # Escape key; for Unicode 3.0 and later
+0x1C	0x2327	# X IN A RECTANGLE BOX # Clear key
+#
+0x20	0x0020	# SPACE
+#
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+#
+0x46	0x0046	# LATIN CAPITAL LETTER F
+#
+0x61	0x2423	# OPEN BOX # Blank key
+0x62	0x21DE	# UPWARDS ARROW WITH DOUBLE STROKE # Page up key
+0x63	0x21EA	# UPWARDS WHITE ARROW FROM BAR # Caps lock key
+0x64	0x2190	# LEFTWARDS ARROW
+0x65	0x2192	# RIGHTWARDS ARROW
+0x66	0x2196	# NORTH WEST ARROW
+0x67	0x003F+0x20DD	# QUESTION MARK + COMBINING ENCLOSING CIRCLE # Help key
+0x68	0x2191	# UPWARDS ARROW
+0x69	0x2198	# SOUTH EAST ARROW
+0x6A	0x2193	# DOWNWARDS ARROW
+0x6B	0x21DF	# DOWNWARDS ARROW WITH DOUBLE STROKE # Page down key
+0x6C	0xF8FF+0xF87F	# Apple logo, outline
+0x6D	0xF803	# Contextual menu key symbol
+0x6E	0x2758+0x20DD	# LIGHT VERTICAL BAR + COMBINING ENCLOSING CIRCLE # Power key
+0x6F	0xF860+0x0046+0x0031	# group_2 + F + 1 # F1 key
+0x70	0xF860+0x0046+0x0032	# group_2 + F + 2 # F2 key
+0x71	0xF860+0x0046+0x0033	# group_2 + F + 3 # F3 key
+0x72	0xF860+0x0046+0x0034	# group_2 + F + 4 # F4 key
+0x73	0xF860+0x0046+0x0035	# group_2 + F + 5 # F5 key
+0x74	0xF860+0x0046+0x0036	# group_2 + F + 6 # F6 key
+0x75	0xF860+0x0046+0x0037	# group_2 + F + 7 # F7 key
+0x76	0xF860+0x0046+0x0038	# group_2 + F + 8 # F8 key
+0x77	0xF860+0x0046+0x0039	# group_2 + F + 9 # F9 key
+0x78	0xF861+0x0046+0x0031+0x0030	# group_3 + F + 1 + 0 # F10 key
+0x79	0xF861+0x0046+0x0031+0x0031	# group_3 + F + 1 + 1 # F11 key
+0x7A	0xF861+0x0046+0x0031+0x0032	# group_3 + F + 1 + 2 # F12 key
+#
+0x87	0xF861+0x0046+0x0031+0x0033	# group_3 + F + 1 + 3 # F13 key
+0x88	0xF861+0x0046+0x0031+0x0034	# group_3 + F + 1 + 4 # F14 key
+0x89	0xF861+0x0046+0x0031+0x0035	# group_3 + F + 1 + 5 # F15 key
+0x8A	0x2388	# HELM SYMBOL # Control key (ISO standard), Unicode 3.0 and later
+0x8B	0x2387	# ALTERNATIVE KEY SYMBOL # Unicode 3.0 and later
+0x8C	0x23CF	# EJECT SYMBOL # Unicode 4.0 and later, Mac OS X only
+0x8D	0x82F1+0x6570	# Japanese "eisu" key symbol # Mac OS X only
+0x8E	0x304B+0x306A	# Japanese "kana" key symbol # Mac OS X only
+0x8F	0xF861+0x0046+0x0031+0x0036	# group_3 + F + 1 + 6 # F16 key, Mac OS X only
+#
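Note on the Keyboard table above: several entries map one byte to a sequence of code points joined by `+` (for example 0x6F to 0xF860+0x0046+0x0031), and 0x09 and 0x61 both map to 0x2423, which is why the table flags 0x09 as having no round-trip. The following is a minimal Python sketch of a reader that copes with both cases. It assumes the tab-separated layout visible in the data lines and strips the diff's leading `+`; the names `parse_mapping_line`, `load_table`, and `shared_targets` are hypothetical and are not part of the mirrored data.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_mapping_line(line: str) -> Optional[Tuple[int, str]]:
    """Return (mac_byte, unicode_text) for a data line, or None for comments and blanks."""
    body = line.lstrip("+").split("#", 1)[0].strip()  # drop diff marker and comment
    if not body:
        return None
    fields = body.split("\t")
    if len(fields) < 2:
        return None
    mac_byte = int(fields[0], 16)
    # Column 2 may hold several code points joined by '+', e.g. 0xF860+0x0046+0x0031.
    text = "".join(chr(int(cp, 16)) for cp in fields[1].split("+"))
    return mac_byte, text


def load_table(lines: Iterable[str]) -> Dict[int, str]:
    """Build a byte -> Unicode string table from mapping lines."""
    table: Dict[int, str] = {}
    for line in lines:
        parsed = parse_mapping_line(line)
        if parsed is not None:
            table[parsed[0]] = parsed[1]
    return table


def shared_targets(table: Dict[int, str]) -> Dict[str, List[int]]:
    """List Unicode targets reached from more than one byte (no round-trip possible)."""
    by_target: Dict[str, List[int]] = {}
    for mac_byte, text in sorted(table.items()):
        by_target.setdefault(text, []).append(mac_byte)
    return {text: codes for text, codes in by_target.items() if len(codes) > 1}


# A few lines copied from the table above, with tabs written as \t.
SAMPLE = [
    "+0x09\t0x2423\t# OPEN BOX # Space key",
    "+#",
    "+0x61\t0x2423\t# OPEN BOX # Blank key",
    "+0x67\t0x003F+0x20DD\t# QUESTION MARK + COMBINING ENCLOSING CIRCLE # Help key",
    "+0x6F\t0xF860+0x0046+0x0031\t# group_2 + F + 1 # F1 key",
]

keyboard = load_table(SAMPLE)
duplicates = shared_targets(keyboard)
```

With this sample, `duplicates` groups 0x09 and 0x61 under U+2423, so an encoder built by inverting the table would have to pick one of the two bytes.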
--- a/charmap/KOREAN.TXT
+++ b/charmap/KOREAN.TXT
--- a/charmap/ROMAN.TXT
+++ b/charmap/ROMAN.TXT
@ -0,0 +1,370 @@
+#=======================================================================
+#   File name:  ROMAN.TXT
+#
+#   Contents:   Map (external version) from Mac OS Roman
+#               character set to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1994-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#      b4,c1 2002-Dec-19    Update URLs, notes. Matches internal
+#                           utom<b5>.
+#       b03  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b4>, ufrm<b3>, and Text
+#                           Encoding Converter version 1.5.
+#       b02  1998-Aug-18    Encoding changed for Mac OS 8.5; change
+#                           mapping of 0xDB from CURRENCY SIGN to
+#                           EURO SIGN. Matches internal utom<b3>,
+#                           ufrm<b3>.
+#       n08  1998-Feb-05    Minor update to header comments
+#       n06  1997-Dec-14    Add warning about future changes to 0xDB
+#                           from CURRENCY SIGN to EURO SIGN. Clarify
+#                           some header information
+#       n04  1997-Dec-01    Update to match internal utom<n3>, ufrm<n22>:
+#                           Change standard mapping for 0xBD from U+2126
+#                           to its canonical decomposition, U+03A9.
+#       n03  1995-Apr-15    First version (after fixing some typos).
+#                           Matches internal ufrm<n9>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Roman code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN)
+#     Column #3 is a comment containing the Unicode name
+#
+#   The entries are in Mac OS Roman code order.
+#
+#   One of these mappings requires the use of a corporate character.
+#   See the file "CORPCHAR.TXT" and notes below.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Roman character set uses the standard control characters at
+#   0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Roman:
+# ----------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported directly in programming
+#   interfaces for QuickDraw Text, the Script Manager, and related
+#   Text Utilities. For other purposes it is supported via transcoding
+#   to and from Unicode.
+#
+#   This character set is used for at least the following Mac OS
+#   localizations: U.S., British, Canadian French, French, Swiss
+#   French, German, Swiss German, Italian, Swiss Italian, Dutch,
+#   Swedish, Norwegian, Danish, Finnish, Spanish, Catalan,
+#   Portuguese, Brazilian, and the default International system.
+#
+#   Variants of Mac OS Roman are used for Croatian, Icelandic,
+#   Turkish, Romanian, and other encodings. Separate mapping tables
+#   are available for these encodings.
+#
+#   Before Mac OS 8.5, code point 0xDB was CURRENCY SIGN, and was
+#   mapped to U+00A4. In Mac OS 8.5 and later versions, code point
+#   0xDB is changed to EURO SIGN and maps to U+20AC; the standard
+#   Apple fonts are updated for Mac OS 8.5 to reflect this. There is
+#   a "currency sign" variant of the Mac OS Roman encoding that still
+#   maps 0xDB to U+00A4; this can be used for older fonts.
+#
+#   Before Mac OS 8.5, the ROM bitmap versions of the fonts Chicago,
+#   New York, Geneva, and Monaco did not implement the full Mac OS
+#   Roman character set; they only supported character codes up to
+#   0xD8. The TrueType versions of these fonts have always implemented
+#   the full character set, as with the bitmap and TrueType versions
+#   of the other standard Roman fonts.
+#
+#   In all Mac OS encodings, fonts such as Chicago which are used
+#   as "system" fonts (for menus, dialogs, etc.) have four glyphs
+#   at code points 0x11-0x14 for transient use by the Menu Manager.
+#   These glyphs are not intended as characters for use in normal
+#   text, and the associated code points are not generally
+#   interpreted as associated with these glyphs; they are usually
+#   interpreted (if at all) as the control codes DC1-DC4.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+#   The following corporate zone Unicode character is used in this
+#   mapping:
+#
+#     0xF8FF  Apple logo
+#
+#   NOTE: The graphic image associated with the Apple logo character
+#   is not authorized for use without permission of Apple, and
+#   unauthorized use might constitute trademark infringement.
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version n08 to version b02:
+#
+#   - Encoding changed for Mac OS 8.5; change mapping of 0xDB from
+#     CURRENCY SIGN (U+00A4) to EURO SIGN (U+20AC).
+#
+#   Changes from version n03 to version n04:
+#
+#   - Change mapping of 0xBD from U+2126 to its canonical
+#     decomposition, U+03A9.
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00C4	# LATIN CAPITAL LETTER A WITH DIAERESIS
+0x81	0x00C5	# LATIN CAPITAL LETTER A WITH RING ABOVE
+0x82	0x00C7	# LATIN CAPITAL LETTER C WITH CEDILLA
+0x83	0x00C9	# LATIN CAPITAL LETTER E WITH ACUTE
+0x84	0x00D1	# LATIN CAPITAL LETTER N WITH TILDE
+0x85	0x00D6	# LATIN CAPITAL LETTER O WITH DIAERESIS
+0x86	0x00DC	# LATIN CAPITAL LETTER U WITH DIAERESIS
+0x87	0x00E1	# LATIN SMALL LETTER A WITH ACUTE
+0x88	0x00E0	# LATIN SMALL LETTER A WITH GRAVE
+0x89	0x00E2	# LATIN SMALL LETTER A WITH CIRCUMFLEX
+0x8A	0x00E4	# LATIN SMALL LETTER A WITH DIAERESIS
+0x8B	0x00E3	# LATIN SMALL LETTER A WITH TILDE
+0x8C	0x00E5	# LATIN SMALL LETTER A WITH RING ABOVE
+0x8D	0x00E7	# LATIN SMALL LETTER C WITH CEDILLA
+0x8E	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
+0x8F	0x00E8	# LATIN SMALL LETTER E WITH GRAVE
+0x90	0x00EA	# LATIN SMALL LETTER E WITH CIRCUMFLEX
+0x91	0x00EB	# LATIN SMALL LETTER E WITH DIAERESIS
+0x92	0x00ED	# LATIN SMALL LETTER I WITH ACUTE
+0x93	0x00EC	# LATIN SMALL LETTER I WITH GRAVE
+0x94	0x00EE	# LATIN SMALL LETTER I WITH CIRCUMFLEX
+0x95	0x00EF	# LATIN SMALL LETTER I WITH DIAERESIS
+0x96	0x00F1	# LATIN SMALL LETTER N WITH TILDE
+0x97	0x00F3	# LATIN SMALL LETTER O WITH ACUTE
+0x98	0x00F2	# LATIN SMALL LETTER O WITH GRAVE
+0x99	0x00F4	# LATIN SMALL LETTER O WITH CIRCUMFLEX
+0x9A	0x00F6	# LATIN SMALL LETTER O WITH DIAERESIS
+0x9B	0x00F5	# LATIN SMALL LETTER O WITH TILDE
+0x9C	0x00FA	# LATIN SMALL LETTER U WITH ACUTE
+0x9D	0x00F9	# LATIN SMALL LETTER U WITH GRAVE
+0x9E	0x00FB	# LATIN SMALL LETTER U WITH CIRCUMFLEX
+0x9F	0x00FC	# LATIN SMALL LETTER U WITH DIAERESIS
+0xA0	0x2020	# DAGGER
+0xA1	0x00B0	# DEGREE SIGN
+0xA2	0x00A2	# CENT SIGN
+0xA3	0x00A3	# POUND SIGN
+0xA4	0x00A7	# SECTION SIGN
+0xA5	0x2022	# BULLET
+0xA6	0x00B6	# PILCROW SIGN
+0xA7	0x00DF	# LATIN SMALL LETTER SHARP S
+0xA8	0x00AE	# REGISTERED SIGN
+0xA9	0x00A9	# COPYRIGHT SIGN
+0xAA	0x2122	# TRADE MARK SIGN
+0xAB	0x00B4	# ACUTE ACCENT
+0xAC	0x00A8	# DIAERESIS
+0xAD	0x2260	# NOT EQUAL TO
+0xAE	0x00C6	# LATIN CAPITAL LETTER AE
+0xAF	0x00D8	# LATIN CAPITAL LETTER O WITH STROKE
+0xB0	0x221E	# INFINITY
+0xB1	0x00B1	# PLUS-MINUS SIGN
+0xB2	0x2264	# LESS-THAN OR EQUAL TO
+0xB3	0x2265	# GREATER-THAN OR EQUAL TO
+0xB4	0x00A5	# YEN SIGN
+0xB5	0x00B5	# MICRO SIGN
+0xB6	0x2202	# PARTIAL DIFFERENTIAL
+0xB7	0x2211	# N-ARY SUMMATION
+0xB8	0x220F	# N-ARY PRODUCT
+0xB9	0x03C0	# GREEK SMALL LETTER PI
+0xBA	0x222B	# INTEGRAL
+0xBB	0x00AA	# FEMININE ORDINAL INDICATOR
+0xBC	0x00BA	# MASCULINE ORDINAL INDICATOR
+0xBD	0x03A9	# GREEK CAPITAL LETTER OMEGA
+0xBE	0x00E6	# LATIN SMALL LETTER AE
+0xBF	0x00F8	# LATIN SMALL LETTER O WITH STROKE
+0xC0	0x00BF	# INVERTED QUESTION MARK
+0xC1	0x00A1	# INVERTED EXCLAMATION MARK
+0xC2	0x00AC	# NOT SIGN
+0xC3	0x221A	# SQUARE ROOT
+0xC4	0x0192	# LATIN SMALL LETTER F WITH HOOK
+0xC5	0x2248	# ALMOST EQUAL TO
+0xC6	0x2206	# INCREMENT
+0xC7	0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC8	0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC9	0x2026	# HORIZONTAL ELLIPSIS
+0xCA	0x00A0	# NO-BREAK SPACE
+0xCB	0x00C0	# LATIN CAPITAL LETTER A WITH GRAVE
+0xCC	0x00C3	# LATIN CAPITAL LETTER A WITH TILDE
+0xCD	0x00D5	# LATIN CAPITAL LETTER O WITH TILDE
+0xCE	0x0152	# LATIN CAPITAL LIGATURE OE
+0xCF	0x0153	# LATIN SMALL LIGATURE OE
+0xD0	0x2013	# EN DASH
+0xD1	0x2014	# EM DASH
+0xD2	0x201C	# LEFT DOUBLE QUOTATION MARK
+0xD3	0x201D	# RIGHT DOUBLE QUOTATION MARK
+0xD4	0x2018	# LEFT SINGLE QUOTATION MARK
+0xD5	0x2019	# RIGHT SINGLE QUOTATION MARK
+0xD6	0x00F7	# DIVISION SIGN
+0xD7	0x25CA	# LOZENGE
+0xD8	0x00FF	# LATIN SMALL LETTER Y WITH DIAERESIS
+0xD9	0x0178	# LATIN CAPITAL LETTER Y WITH DIAERESIS
+0xDA	0x2044	# FRACTION SLASH
+0xDB	0x20AC	# EURO SIGN
+0xDC	0x2039	# SINGLE LEFT-POINTING ANGLE QUOTATION MARK
+0xDD	0x203A	# SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
+0xDE	0xFB01	# LATIN SMALL LIGATURE FI
+0xDF	0xFB02	# LATIN SMALL LIGATURE FL
+0xE0	0x2021	# DOUBLE DAGGER
+0xE1	0x00B7	# MIDDLE DOT
+0xE2	0x201A	# SINGLE LOW-9 QUOTATION MARK
+0xE3	0x201E	# DOUBLE LOW-9 QUOTATION MARK
+0xE4	0x2030	# PER MILLE SIGN
+0xE5	0x00C2	# LATIN CAPITAL LETTER A WITH CIRCUMFLEX
+0xE6	0x00CA	# LATIN CAPITAL LETTER E WITH CIRCUMFLEX
+0xE7	0x00C1	# LATIN CAPITAL LETTER A WITH ACUTE
+0xE8	0x00CB	# LATIN CAPITAL LETTER E WITH DIAERESIS
+0xE9	0x00C8	# LATIN CAPITAL LETTER E WITH GRAVE
+0xEA	0x00CD	# LATIN CAPITAL LETTER I WITH ACUTE
+0xEB	0x00CE	# LATIN CAPITAL LETTER I WITH CIRCUMFLEX
+0xEC	0x00CF	# LATIN CAPITAL LETTER I WITH DIAERESIS
+0xED	0x00CC	# LATIN CAPITAL LETTER I WITH GRAVE
+0xEE	0x00D3	# LATIN CAPITAL LETTER O WITH ACUTE
+0xEF	0x00D4	# LATIN CAPITAL LETTER O WITH CIRCUMFLEX
+0xF0	0xF8FF	# Apple logo
+0xF1	0x00D2	# LATIN CAPITAL LETTER O WITH GRAVE
+0xF2	0x00DA	# LATIN CAPITAL LETTER U WITH ACUTE
+0xF3	0x00DB	# LATIN CAPITAL LETTER U WITH CIRCUMFLEX
+0xF4	0x00D9	# LATIN CAPITAL LETTER U WITH GRAVE
+0xF5	0x0131	# LATIN SMALL LETTER DOTLESS I
+0xF6	0x02C6	# MODIFIER LETTER CIRCUMFLEX ACCENT
+0xF7	0x02DC	# SMALL TILDE
+0xF8	0x00AF	# MACRON
+0xF9	0x02D8	# BREVE
+0xFA	0x02D9	# DOT ABOVE
+0xFB	0x02DA	# RING ABOVE
+0xFC	0x00B8	# CEDILLA
+0xFD	0x02DD	# DOUBLE ACUTE ACCENT
+0xFE	0x02DB	# OGONEK
+0xFF	0x02C7	# CARON
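Note on ROMAN.TXT above: the header states that control characters at 0x00-0x1F and 0x7F are used but not listed, and that a "currency sign" variant still maps 0xDB to U+00A4. A minimal Python sketch of how a decode table might account for both points follows. It assumes every data line carries a single code point in column 2 (true of the Roman lines shown) and that unlisted control bytes map to the same-numbered Unicode controls; `build_decode_table`, `decode`, and the `currency_variant` flag are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable


def build_decode_table(lines: Iterable[str], currency_variant: bool = False) -> Dict[int, str]:
    """Build a Mac OS Roman byte -> character table from mapping lines."""
    # Assumption: control codes are not listed in the file, so map them to themselves.
    table: Dict[int, str] = {code: chr(code) for code in list(range(0x20)) + [0x7F]}
    for raw in lines:
        body = raw.lstrip("+").split("#", 1)[0].strip()  # drop diff marker and comment
        if not body:
            continue
        mac_hex, uni_hex = body.split("\t")[:2]
        table[int(mac_hex, 16)] = chr(int(uni_hex, 16))
    if currency_variant:
        # Pre-Mac OS 8.5 behaviour described in the header: 0xDB is CURRENCY SIGN.
        table[0xDB] = "\u00a4"
    return table


def decode(data: bytes, table: Dict[int, str]) -> str:
    """Decode bytes with the table; raises KeyError for bytes the table lacks."""
    return "".join(table[byte] for byte in data)


# Three lines copied from the table above, with tabs written as \t.
SAMPLE = [
    "+0x41\t0x0041\t# LATIN CAPITAL LETTER A",
    "+0xBD\t0x03A9\t# GREEK CAPITAL LETTER OMEGA",
    "+0xDB\t0x20AC\t# EURO SIGN",
]

standard = build_decode_table(SAMPLE)
legacy = build_decode_table(SAMPLE, currency_variant=True)
```

Keeping the variant as a flag on one loader, rather than a second table file, reflects that the two encodings differ only at 0xDB according to the header notes.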
--- a/charmap/ROMANIAN.TXT
+++ b/charmap/ROMANIAN.TXT
@ -0,0 +1,365 @@
+#=======================================================================
+#   File name:  ROMANIAN.TXT
+#
+#   Contents:   Map (external version) from Mac OS Romanian
+#               character set to Unicode 3.0 and later.
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments. Matches internal xml
+#                           <c1.2> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Update mappings for 0xAF, 0xBF, 0xDE, 0xDF
+#                           to use new composed characters added in
+#                           Unicode 3.0. Update URLs, notes. Matches
+#                           internal utom<b3>.
+#       b02  1999-Sep-22    Encoding changed for Mac OS 8.5; change
+#                           mapping of 0xDB from CURRENCY SIGN to EURO
+#                           SIGN. Update contact e-mail address. Matches
+#                           internal utom<b2>, ufrm<b2>, and Text
+#                           Encoding Converter version 1.5.
+#       n05  1998-Feb-05    Minor update to header comments
+#       n03  1997-Dec-14    Update to match internal utom<n5>, ufrm<n16>:
+#                           Change standard mapping for 0xBD from U+2126
+#                           to its canonical decomposition, U+03A9.
+#                           Change mapping of 0xAF,0xBF,0xDE,0xDF from
+#                           composed S/T WITH CEDILLA to S/T with
+#                           COMBINING COMMA BELOW (to match our
+#                           decomposition mappings).
+#       n02  1995-Apr-15    First version (after fixing some typos).
+#                           Matches internal ufrm<n4>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Romanian code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN)
+#     Column #3 is a comment containing the Unicode name
+#
+#   The entries are in Mac OS Romanian code order.
+#
+#   One of these mappings requires the use of a corporate character.
+#   See the file "CORPCHAR.TXT" and notes below.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Romanian character set uses the standard control characters
+#   at 0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Romanian:
+# -------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   Mac OS Romanian is used only for Romanian.
+#
+#   The Mac OS Romanian encoding shares the script code smRoman
+#   (0) with the standard Mac OS Roman encoding. To determine if
+#   the Romanian encoding is being used, you must also check if the
+#   system region code is 39, verRomania.
+#
+#   This character set is a variant of standard Mac OS Roman, adding
+#   upper and lower A breve, S comma below, and T comma below. It
+#   has 6 code point differences from standard Mac OS Roman.
+#
+#   Before Mac OS 8.5, code point 0xDB was CURRENCY SIGN, and was
+#   mapped to U+00A4. In Mac OS 8.5 and later versions, code point
+#   0xDB is changed to EURO SIGN and maps to U+20AC; the standard
+#   Apple fonts are updated for Mac OS 8.5 to reflect this. There is
+#   a "currency sign" variant of the Mac OS Romanian encoding that
+#   still maps 0xDB to U+00A4; this can be used for older fonts.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+#   The following corporate zone Unicode character is used in this
+#   mapping:
+#
+#     0xF8FF  Apple logo
+#
+#   NOTE: The graphic image associated with the Apple logo character
+#   is not authorized for use without permission of Apple, and
+#   unauthorized use might constitute trademark infringement.
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version b02 to version b03/c01:
+#
+#   - Update the mappings for 0xAF, 0xBF, 0xDE, 0xDF to use new
+#     composed Unicode characters 0x0218-0x021B added in Unicode 3.0;
+#     the previous mappings were to the equivalent decomposition
+#     sequences.
+#
+#   Changes from version n05 to version b02:
+#
+#   - Encoding changed for Mac OS 8.5; change mapping of 0xDB from
+#     CURRENCY SIGN (U+00A4) to EURO SIGN (U+20AC).
+#
+#   Changes from version n02 to version n03:
+#
+#   - Change mapping of 0xBD from U+2126 to its canonical
+#     decomposition, U+03A9.
+#   - Change mapping of 0xAF,0xBF,0xDE,0xDF from composed S or T
+#     WITH CEDILLA to S or T with COMBINING COMMA BELOW (to match
+#     our decomposition mappings).
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00C4	# LATIN CAPITAL LETTER A WITH DIAERESIS
+0x81	0x00C5	# LATIN CAPITAL LETTER A WITH RING ABOVE
+0x82	0x00C7	# LATIN CAPITAL LETTER C WITH CEDILLA
+0x83	0x00C9	# LATIN CAPITAL LETTER E WITH ACUTE
+0x84	0x00D1	# LATIN CAPITAL LETTER N WITH TILDE
+0x85	0x00D6	# LATIN CAPITAL LETTER O WITH DIAERESIS
+0x86	0x00DC	# LATIN CAPITAL LETTER U WITH DIAERESIS
+0x87	0x00E1	# LATIN SMALL LETTER A WITH ACUTE
+0x88	0x00E0	# LATIN SMALL LETTER A WITH GRAVE
+0x89	0x00E2	# LATIN SMALL LETTER A WITH CIRCUMFLEX
+0x8A	0x00E4	# LATIN SMALL LETTER A WITH DIAERESIS
+0x8B	0x00E3	# LATIN SMALL LETTER A WITH TILDE
+0x8C	0x00E5	# LATIN SMALL LETTER A WITH RING ABOVE
+0x8D	0x00E7	# LATIN SMALL LETTER C WITH CEDILLA
+0x8E	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
+0x8F	0x00E8	# LATIN SMALL LETTER E WITH GRAVE
+0x90	0x00EA	# LATIN SMALL LETTER E WITH CIRCUMFLEX
+0x91	0x00EB	# LATIN SMALL LETTER E WITH DIAERESIS
+0x92	0x00ED	# LATIN SMALL LETTER I WITH ACUTE
+0x93	0x00EC	# LATIN SMALL LETTER I WITH GRAVE
+0x94	0x00EE	# LATIN SMALL LETTER I WITH CIRCUMFLEX
+0x95	0x00EF	# LATIN SMALL LETTER I WITH DIAERESIS
+0x96	0x00F1	# LATIN SMALL LETTER N WITH TILDE
+0x97	0x00F3	# LATIN SMALL LETTER O WITH ACUTE
+0x98	0x00F2	# LATIN SMALL LETTER O WITH GRAVE
+0x99	0x00F4	# LATIN SMALL LETTER O WITH CIRCUMFLEX
+0x9A	0x00F6	# LATIN SMALL LETTER O WITH DIAERESIS
+0x9B	0x00F5	# LATIN SMALL LETTER O WITH TILDE
+0x9C	0x00FA	# LATIN SMALL LETTER U WITH ACUTE
+0x9D	0x00F9	# LATIN SMALL LETTER U WITH GRAVE
+0x9E	0x00FB	# LATIN SMALL LETTER U WITH CIRCUMFLEX
+0x9F	0x00FC	# LATIN SMALL LETTER U WITH DIAERESIS
+0xA0	0x2020	# DAGGER
+0xA1	0x00B0	# DEGREE SIGN
+0xA2	0x00A2	# CENT SIGN
+0xA3	0x00A3	# POUND SIGN
+0xA4	0x00A7	# SECTION SIGN
+0xA5	0x2022	# BULLET
+0xA6	0x00B6	# PILCROW SIGN
+0xA7	0x00DF	# LATIN SMALL LETTER SHARP S
+0xA8	0x00AE	# REGISTERED SIGN
+0xA9	0x00A9	# COPYRIGHT SIGN
+0xAA	0x2122	# TRADE MARK SIGN
+0xAB	0x00B4	# ACUTE ACCENT
+0xAC	0x00A8	# DIAERESIS
+0xAD	0x2260	# NOT EQUAL TO
+0xAE	0x0102	# LATIN CAPITAL LETTER A WITH BREVE
+0xAF	0x0218	# LATIN CAPITAL LETTER S WITH COMMA BELOW # for Unicode 3.0 and later
+0xB0	0x221E	# INFINITY
+0xB1	0x00B1	# PLUS-MINUS SIGN
+0xB2	0x2264	# LESS-THAN OR EQUAL TO
+0xB3	0x2265	# GREATER-THAN OR EQUAL TO
+0xB4	0x00A5	# YEN SIGN
+0xB5	0x00B5	# MICRO SIGN
+0xB6	0x2202	# PARTIAL DIFFERENTIAL
+0xB7	0x2211	# N-ARY SUMMATION
+0xB8	0x220F	# N-ARY PRODUCT
+0xB9	0x03C0	# GREEK SMALL LETTER PI
+0xBA	0x222B	# INTEGRAL
+0xBB	0x00AA	# FEMININE ORDINAL INDICATOR
+0xBC	0x00BA	# MASCULINE ORDINAL INDICATOR
+0xBD	0x03A9	# GREEK CAPITAL LETTER OMEGA
+0xBE	0x0103	# LATIN SMALL LETTER A WITH BREVE
+0xBF	0x0219	# LATIN SMALL LETTER S WITH COMMA BELOW # for Unicode 3.0 and later
+0xC0	0x00BF	# INVERTED QUESTION MARK
+0xC1	0x00A1	# INVERTED EXCLAMATION MARK
+0xC2	0x00AC	# NOT SIGN
+0xC3	0x221A	# SQUARE ROOT
+0xC4	0x0192	# LATIN SMALL LETTER F WITH HOOK
+0xC5	0x2248	# ALMOST EQUAL TO
+0xC6	0x2206	# INCREMENT
+0xC7	0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC8	0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC9	0x2026	# HORIZONTAL ELLIPSIS
+0xCA	0x00A0	# NO-BREAK SPACE
+0xCB	0x00C0	# LATIN CAPITAL LETTER A WITH GRAVE
+0xCC	0x00C3	# LATIN CAPITAL LETTER A WITH TILDE
+0xCD	0x00D5	# LATIN CAPITAL LETTER O WITH TILDE
+0xCE	0x0152	# LATIN CAPITAL LIGATURE OE
+0xCF	0x0153	# LATIN SMALL LIGATURE OE
+0xD0	0x2013	# EN DASH
+0xD1	0x2014	# EM DASH
+0xD2	0x201C	# LEFT DOUBLE QUOTATION MARK
+0xD3	0x201D	# RIGHT DOUBLE QUOTATION MARK
+0xD4	0x2018	# LEFT SINGLE QUOTATION MARK
+0xD5	0x2019	# RIGHT SINGLE QUOTATION MARK
+0xD6	0x00F7	# DIVISION SIGN
+0xD7	0x25CA	# LOZENGE
+0xD8	0x00FF	# LATIN SMALL LETTER Y WITH DIAERESIS
+0xD9	0x0178	# LATIN CAPITAL LETTER Y WITH DIAERESIS
+0xDA	0x2044	# FRACTION SLASH
+0xDB	0x20AC	# EURO SIGN
+0xDC	0x2039	# SINGLE LEFT-POINTING ANGLE QUOTATION MARK
+0xDD	0x203A	# SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
+0xDE	0x021A	# LATIN CAPITAL LETTER T WITH COMMA BELOW # for Unicode 3.0 and later
+0xDF	0x021B	# LATIN SMALL LETTER T WITH COMMA BELOW # for Unicode 3.0 and later
+0xE0	0x2021	# DOUBLE DAGGER
+0xE1	0x00B7	# MIDDLE DOT
+0xE2	0x201A	# SINGLE LOW-9 QUOTATION MARK
+0xE3	0x201E	# DOUBLE LOW-9 QUOTATION MARK
+0xE4	0x2030	# PER MILLE SIGN
+0xE5	0x00C2	# LATIN CAPITAL LETTER A WITH CIRCUMFLEX
+0xE6	0x00CA	# LATIN CAPITAL LETTER E WITH CIRCUMFLEX
+0xE7	0x00C1	# LATIN CAPITAL LETTER A WITH ACUTE
+0xE8	0x00CB	# LATIN CAPITAL LETTER E WITH DIAERESIS
+0xE9	0x00C8	# LATIN CAPITAL LETTER E WITH GRAVE
+0xEA	0x00CD	# LATIN CAPITAL LETTER I WITH ACUTE
+0xEB	0x00CE	# LATIN CAPITAL LETTER I WITH CIRCUMFLEX
+0xEC	0x00CF	# LATIN CAPITAL LETTER I WITH DIAERESIS
+0xED	0x00CC	# LATIN CAPITAL LETTER I WITH GRAVE
+0xEE	0x00D3	# LATIN CAPITAL LETTER O WITH ACUTE
+0xEF	0x00D4	# LATIN CAPITAL LETTER O WITH CIRCUMFLEX
+0xF0	0xF8FF	# Apple logo
+0xF1	0x00D2	# LATIN CAPITAL LETTER O WITH GRAVE
+0xF2	0x00DA	# LATIN CAPITAL LETTER U WITH ACUTE
+0xF3	0x00DB	# LATIN CAPITAL LETTER U WITH CIRCUMFLEX
+0xF4	0x00D9	# LATIN CAPITAL LETTER U WITH GRAVE
+0xF5	0x0131	# LATIN SMALL LETTER DOTLESS I
+0xF6	0x02C6	# MODIFIER LETTER CIRCUMFLEX ACCENT
+0xF7	0x02DC	# SMALL TILDE
+0xF8	0x00AF	# MACRON
+0xF9	0x02D8	# BREVE
+0xFA	0x02D9	# DOT ABOVE
+0xFB	0x02DA	# RING ABOVE
+0xFC	0x00B8	# CEDILLA
+0xFD	0x02DD	# DOUBLE ACUTE ACCENT
+0xFE	0x02DB	# OGONEK
+0xFF	0x02C7	# CARON
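Note on ROMANIAN.TXT above: the header describes this encoding as a variant of Mac OS Roman with 6 code point differences, and comparing the two tables in this commit shows them at 0xAE, 0xAF, 0xBE, 0xBF, 0xDE, and 0xDF. A minimal Python sketch of decoding Romanian text as "Roman plus overrides" follows. It assumes Python's built-in `mac_roman` codec agrees with the ROMAN.TXT table in this commit; the names `ROMANIAN_OVERRIDES` and `decode_mac_romanian` are hypothetical.

```python
from typing import Dict

# The six code points where ROMANIAN.TXT differs from ROMAN.TXT,
# with the values taken from the data lines above.
ROMANIAN_OVERRIDES: Dict[int, str] = {
    0xAE: "\u0102",  # LATIN CAPITAL LETTER A WITH BREVE
    0xAF: "\u0218",  # LATIN CAPITAL LETTER S WITH COMMA BELOW
    0xBE: "\u0103",  # LATIN SMALL LETTER A WITH BREVE
    0xBF: "\u0219",  # LATIN SMALL LETTER S WITH COMMA BELOW
    0xDE: "\u021a",  # LATIN CAPITAL LETTER T WITH COMMA BELOW
    0xDF: "\u021b",  # LATIN SMALL LETTER T WITH COMMA BELOW
}


def decode_mac_romanian(data: bytes) -> str:
    """Decode byte by byte: use an override if present, else fall back to mac_roman."""
    chars = []
    for byte in data:
        override = ROMANIAN_OVERRIDES.get(byte)
        if override is not None:
            chars.append(override)
        else:
            # Assumption: the built-in codec matches ROMAN.TXT for all other bytes.
            chars.append(bytes([byte]).decode("mac_roman"))
    return "".join(chars)


city = decode_mac_romanian(b"Ia\xbfi")  # bytes for the city name "Iasi" with s comma below
```

Because Romanian shares the script code smRoman with standard Roman, a caller would still need out-of-band information (the header mentions region code 39, verRomania) to know when to apply the overrides; this sketch does not attempt that detection.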
--- a/charmap/ReadMe.txt
+++ b/charmap/ReadMe.txt
@ -0,0 +1,590 @@
+#=======================================================================
+#   File name:  README.TXT
+#
+#   Contents:   Background information on Unicode mapping tables for
+#               Mac OS legacy text encodings
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-04    Update discussion of roundtrip fidelity,
+#                           delete discussion of mappings dependent on
+#                           symmetric swapping (no longer supported),
+#                           provide information on how legacy encodings
+#                           are supported in Mac OS X.
+#      b3,c1 2002-Dec-19    Add Keyboard font encoding. Update URLs,
+#                           notes.
+#       b02  1999-Sep-22    Update information on Cyrillic. Update
+#                           contact e-mail address.
+#       n07  1998-Feb-05    Rewrite to provide additional information
+#                           relevant to using the accompanying mapping
+#                           tables, and to delete some extraneous
+#                           information. Delete Bulgarian (no special
+#                           encoding, uses standard Cyrillic), add
+#                           Farsi, Devanagari, Gurmukhi, Gujarati,
+#                           Celtic, Gaelic, Inuit, Tibetan.
+#       n04  1995-Nov-15    Update info for Hebrew and Thai
+#       n03  1995-Apr-15    First version (after fixing some typos).
+#
+##################
+
+0. Preliminaries
+----------------
+
+For maximum interchangeability, this file and the accompanying Mac OS
+mapping tables use only ASCII characters. They are intended to be
+displayed in a monospaced font.
+
+Apple, the Apple logo, Mac, and Macintosh are trademarks of Apple
+Computer, Inc., registered in the United States and other countries.
+QuickDraw and TrueType are trademarks of Apple Computer, Inc. Unicode is
+a trademark of Unicode Inc. PostScript is a trademark of Adobe Systems
+Inc., which may be registered in certain jurisdictions. IBM is a
+registered trademark of International Business Machines Corporation. ITC
+Zapf Dingbats is a registered trademark of the International Typeface
+Corporation. For the sake of brevity, throughout this document and the
+accompanying tables, "Macintosh" can be used to refer to Macintosh
+computers and "Unicode" can be used to refer to the Unicode standard.
+
+Apple Computer, Inc. ("Apple") makes no warranty or representation,
+either express or implied, with respect to this document and the
+accompanying tables, their quality, accuracy, or fitness for a
+particular purpose. In no event will Apple be liable for direct,
+indirect, special, incidental, or consequential damages resulting from
+any defect or inaccuracy in this document or the accompanying tables.
+
+1. Introduction
+---------------
+
+This document summarizes some Unicode mapping considerations that are
+relevant for the accompanying mapping tables. It also provides an
+overview of Mac OS legacy encodings.
+
+These mapping tables and character lists are subject to change. The
+latest tables should be available from the following:
+
+<http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+
+2. Round-trip fidelity and overview of mapping techniques
+---------------------------------------------------------
+
+For a particular set of national and international standards, Unicode
+provides round-trip fidelity: Text in one of those encodings can be
+mapped to Unicode and back again, yielding the original characters.
+Characters which are distinct in one of these source standards have a
+distinct counterpart in Unicode. Note that this counterpart might not be
+a single Unicode character; as is pointed out in "The Unicode Standard,
+Version 2.0" (page 2-10), "sometimes a single code value in another
+standard corresponds to a sequence of code values in the Unicode
+Standard, or vice versa."
+
+However, Unicode does not attempt to provide round-trip fidelity for
+most vendor standards. Nevertheless, Apple and other platform vendors
+may need to provide such round-trip fidelity for their current platform
+encodings and/or legacy platform encodings (this can be important in
+file systems, for example). In order to do this, Apple makes use of some
+Unicode characters in the corporate-use zone (the upper end of the
+private use area).
+
+Corporate-zone characters must be used with care. Indiscriminate use of
+such characters can result in text which is not easily interchanged with
+other systems, since these characters have no standard meaning outside a
+particular platform. The mappings provided here are intended to minimize
+the use of private use characters, or to use them in such a way that
+basic text content will not be lost if the corporate zone characters are
+dropped when text is transferred to another system.
+
+The tables provided here have three goals, in the following order of
+importance:
+1. Provide 100% round-trip mapping from a Mac OS legacy encoding to
+Unicode and back.
+2. Map characters in a Mac OS encoding into the Unicode characters that
+best represent the interpretation and usage of the Mac OS characters.
+3. When mapping text in a Mac OS encoding to Unicode using the tables,
+the resulting Unicode text should be as interchangeable as possible.
+
+To satisfy these goals, the mappings use a variety of techniques. First
+we attempt to achieve round-trip mappings using any standard Unicode
+feature at our disposal, without resorting to corporate-zone characters.
+This can include the following techniques:
+- Use of all Unicode characters defined in Unicode 2.1 and later,
+  including compatibility characters.
+- Mapping a single character in a Mac OS encoding to a sequence of
+  standard Unicode characters, or vice versa. This requires grouping
+  characters into appropriate chunks for lookup before mapping them
+  (this mainly applies to sequences of Unicode characters).
+- Using Unicode direction overrides to force direction attributes when
+  mapping to Unicode. This requires resolution of Unicode character
+  direction, and use of this information, when mapping from Unicode back
+  to certain Mac OS encodings.
+The requirements imposed on Unicode handling are necessary for other,
+non-transcoding operations in a full Unicode implementation anyway, so
+requiring them for transcoding should not impose much of a burden.
+
+Next, if round-trip fidelity cannot be achieved using the above
+techniques, we attempt to use corporate-zone characters only as
+"transcoding hints" (more on this below). These are combined with one or
+more standard Unicode characters to mark them as special for
+transcoding, but have no other function and can be deleted with no loss
+of basic text content (only of round-trip fidelity).
+
+Finally, if a character in a Mac OS encoding is unrelated to any Unicode
+character or Unicode character sequence, we may map it to a single
+corporate-zone Unicode code point.
+
+These techniques are described in more detail in the following sections.
+
+Some clients of these tables may have a different set of goals. For
+example, some clients may prefer to avoid compatibility characters,
+perhaps sacrificing round-trip fidelity if necessary. In most cases it
+is fairly easy to construct other types of mappings from the mappings
+given here. In particular, the Unicode mappings here have been designed
+so that if they are converted to a restricted form of NFD (a form that
+does NOT decompose or normalize Unicode characters in the ranges
+2000-2FFF or F900-FAFF), the resulting mappings still provide roundtrip
+fidelity. (For certain characters in the Mac OS Hebrew and Devanagari
+encodings, the decomposition mappings must use a grouping transcoding
+hint to ensure roundtrip fidelity; more details on this are provided in
+the mapping tables for those encodings.)
+
+There is one more round-trip issue that should be mentioned. If a
+Unicode character or sequence can be mapped at all into a particular Mac
+OS encoding, then the reverse mapping back to Unicode should yield the
+original Unicode character or sequence (except for possible differences
+in direction overrides or other Unicode characters with General Category
+Cf). The tables here also provide this. For a related issue, see the
+next section.
+
+3. Mapping tolerance: Strict and loose
+--------------------------------------
+
+In many character sets, a single character may have multiple semantics, 
+either by explicit definition, ambiguous definition, or established 
+usage. For example, the JIS character 0x2142, or 0x8161 in Shift-JIS, 
+is specified in the JIS X0208 standard to have two meanings: "double 
+vertical line" and "parallel". Each of these meanings corresponds to a 
+different Unicode character: 0x2016 DOUBLE VERTICAL LINE and 0x2225 
+PARALLEL TO. When mapping from Unicode to Shift-JIS, it is normally 
+desirable to map both of these Unicode characters to the single
+Shift-JIS character. However, when mapping the Shift-JIS character to
+Unicode, we can choose only one of the possible Unicode characters.
+
+For two encodings X and Y, we can define a set of "strict" mappings
+from one to the other as follows: If text in X can be mapped to Y using
+the strict mappings from X to Y, then the resulting text can be mapped
+back using the strict mappings from Y to X to end up with the original
+text from X. Similarly, if text in Y can be mapped to X using the strict
+mappings from Y to X, then the resulting text can be mapped back using
+the strict mappings from X to Y to end up with the original text from Y.
+
+There may be several characters in one encoding that all map to a
+single character in another encoding, but only one of these mappings
+can be strict; the others are "loose".
+
+The mappings given in the accompanying tables are strict mappings.
+However, the Mac OS Text Encoding Converter also supports loose
+mappings and fallback mappings. Some of the accompanying tables provide
+suggestions about possible loose mappings.
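
The strict-mapping property can be checked mechanically. The following
is a minimal sketch, not Apple's implementation: the helper names
parse_table and invert_strict are made up for illustration, and the
parser assumes plain "0xNN <tab> 0xNNNN[+0xNNNN...] <tab> # comment"
lines only. Table entries that carry direction tags or other
annotations would need extra handling.

```python
# Three sample lines in the layout used by the accompanying tables.
SAMPLE = (
    "0xF0\t0xF8FF\t# Apple logo\n"
    "0xF5\t0x0131\t# LATIN SMALL LETTER DOTLESS I\n"
    "0xFF\t0x02C7\t# CARON\n"
)


def parse_table(text):
    """Return {mac_code: unicode_string} for plain hex table lines."""
    to_unicode = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        mac, uni = line.split()[:2]
        # A Unicode column such as 0x5927+0x20DD denotes a sequence.
        to_unicode[int(mac, 16)] = "".join(
            chr(int(cp, 16)) for cp in uni.split("+")
        )
    return to_unicode


def invert_strict(to_unicode):
    """Build the reverse table; fail if two Mac codes share a target.

    A collision would mean at most one of the two mappings can be
    strict, so the table as a whole could not round-trip.
    """
    from_unicode = {}
    for mac, uni in to_unicode.items():
        if uni in from_unicode:
            raise ValueError(
                "0x%02X and 0x%02X map to the same Unicode text"
                % (from_unicode[uni], mac)
            )
        from_unicode[uni] = mac
    return from_unicode
```

If invert_strict succeeds, every entry maps to Unicode and back to the
same Mac OS code; a loose mapping would be kept in a separate table.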
+
+4. Mapping a Mac encoding character to a Unicode sequence or vice versa
+-----------------------------------------------------------------------
+
+In some cases, a character in a Mac OS legacy encoding maps to a
+sequence of Unicode characters. For example, the Mac OS Japanese
+encoding includes a character for the circled CJK ideograph "big".
+Although Unicode encodes other circled ideographs as single characters,
+it does not encode this one. However, this character can be
+unambiguously represented in Unicode as the Unicode sequence
+0x5927+0x20DD, the CJK ideograph for "big" followed by COMBINING
+ENCLOSING CIRCLE.
+
+To handle the reverse mapping, a transcoding process must group the
+Unicode sequence 0x5927+0x20DD as a single element for lookup (the
+Mac OS Text Encoding Converter does this).
+
+In a few cases, a sequence of characters in a Mac OS legacy encoding
+must be grouped for mapping to a single Unicode character or a sequence
+of Unicode characters. For example, in Mac OS Devanagari (based on
+ISCII-91), DEVANAGARI LETTER VOCALIC L is represented as 0xA6+0xE9;
+but this is represented in Unicode by the single character 0x090C.
+Furthermore, explicit halant is represented in Mac OS Devanagari as
+0xE8+0xE8 (double halant) and in Unicode as 0x094D+0x200C (VIRAMA
+plus ZERO WIDTH NON-JOINER). The latter can also be considered as
+a context-dependent mapping of 0xE8, halant.
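
One way to do this grouping is greedy longest-match lookup. The sketch
below is illustrative only: the table holds just the Devanagari entries
mentioned above (the single-halant entry 0xE8 -> 0x094D is implied by
the text rather than quoted from a table), and decode_grouped is a
hypothetical helper name.

```python
# Partial Mac OS Devanagari -> Unicode table, keyed by byte sequence.
MAC_TO_UNICODE = {
    b"\xA6\xE9": "\u090C",        # DEVANAGARI LETTER VOCALIC L
    b"\xE8\xE8": "\u094D\u200C",  # explicit halant: VIRAMA + ZWNJ
    b"\xE8": "\u094D",            # plain halant (VIRAMA)
}


def decode_grouped(data, table):
    """Map bytes to Unicode, trying the longest table key first."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(data):
        for n in range(min(max_len, len(data) - i), 0, -1):
            chunk = bytes(data[i:i + n])
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            raise ValueError(
                "no mapping for byte 0x%02X at offset %d" % (data[i], i)
            )
    return "".join(out)
```

Trying the two-byte keys first is what keeps 0xE8+0xE8 from being read
as two separate halants. The same approach works in the other
direction with Unicode strings as keys.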
+
+Loose mappings from Unicode to a Mac OS encoding often map a single
+Unicode character to a sequence of characters in the Mac OS encoding.
+For example, the Unicode character 0x00BD VULGAR FRACTION ONE HALF
+cannot be mapped into the Mac OS Roman character set as a single
+character, but it has a loose mapping to the sequence 0x31+0xDA+0x32,
+"digit one" + "fraction slash" + "digit two".
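
A strict-then-loose fallback might look like the sketch below. It uses
Python's built-in "mac_roman" codec as a stand-in for the strict Mac OS
Roman table; the LOOSE dictionary and the function name
encode_mac_roman are assumptions made for illustration and contain only
the one example given above.

```python
# Loose fallbacks, consulted only when the strict mapping fails.
LOOSE = {
    "\u00BD": b"\x31\xDA\x32",  # 1/2 -> "1" + fraction slash + "2"
}


def encode_mac_roman(text, loose=LOOSE):
    """Encode to Mac OS Roman, falling back to loose sequences."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("mac_roman")  # strict mapping
        except UnicodeEncodeError:
            if ch not in loose:
                raise  # no strict and no loose mapping
            out += loose[ch]
    return bytes(out)
```

The result does not round-trip: decoding 0x31+0xDA+0x32 yields three
Unicode characters, not 0x00BD, which is why the mapping is loose.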
+
+In some cases a Unicode character such as a direction override may
+simply be discarded when mapping to a Mac OS encoding, since the
+information carried by the override may be represented in a different
+way by the Mac OS encoding. See the next section for an example.
+
+5. Mappings that depend on directionality (or other attributes)
+---------------------------------------------------------------
+
+Strict mappings from Unicode to Mac OS legacy encodings may depend on
+resolved character direction. Loose mappings may depend on additional
+attributes such as whether the text should use vertical form codes if
+available (i.e. whether the text is intended for vertical display on a
+system that cannot automatically substitute vertical forms).
+
+a) Resolved character direction
+
+The Mac OS Arabic and Hebrew character sets were developed in 1986-1987.
+At that time the bidirectional line layout algorithm used in the Mac OS
+was fairly simple; it used only a few direction classes (instead of the
+19 now used in the Unicode bidirectional algorithm). In order to permit
+users to handle some tricky layout problems, certain punctuation and
+symbol characters have duplicate code points, one with a left-right 
+direction attribute and the other with a right-left direction attribute.
+
+For example, plus sign is encoded at 0x2B with a left-right attribute,
+and at 0xAB with a right-left attribute. However, there is only one PLUS
+SIGN character in Unicode. This leads to some interesting problems when
+mapping between Mac OS Arabic or Hebrew and Unicode.
+
+We need a way to map both of these plus signs to Unicode and back. Using
+a single corporate character for one of these plus signs is not a good
+solution, since both of the plus sign characters are likely to be used
+in text that is interchanged, and thus content would be lost.
+
+The problem is solved with the use of direction override characters and
+direction-dependent mappings. When mapping from Mac OS Arabic or Hebrew
+to Unicode, we use direction overrides as necessary to force the
+direction of the resulting Unicode characters. When mapping back from
+Unicode, the Unicode bidirectional algorithm should be used to determine
+resolved direction of the Unicode characters. The mapping from Unicode
+to Mac OS Arabic or Hebrew can then be disambiguated as necessary by
+using the resolved direction.
+
+For example, when mapping from Mac OS Arabic or Hebrew, we can use
+LEFT-RIGHT OVERRIDE (LRO), RIGHT-LEFT OVERRIDE (RLO), and POP DIRECTION
+FORMATTING (PDF) as follows:
+
+  0x2B ->  0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
+  0xAB ->  0x202E (RLO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
+
+When mapping back, we resolve the direction of the Unicode character
+0x002B, and use this information to determine which of the Mac OS
+encoding characters to use:
+
+  0x002B -> 0x2B (if LR) or 0xAB (if RL)
+  
+After direction overrides have been used in this way to force a
+particular resolved direction, they may be discarded when mapping from
+Unicode to Mac OS Arabic and Hebrew (since the information they carried
+in Unicode is represented in the Mac OS encoding by the code point of
+the plus sign).
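
The plus-sign example can be sketched as follows. This is a
deliberately reduced illustration: it tracks only explicit LRO/RLO/PDF
characters and assumes a left-right default outside any override,
whereas a real implementation would run the full Unicode bidirectional
algorithm to resolve direction. The function names are hypothetical.

```python
LRO, RLO, PDF = "\u202D", "\u202E", "\u202C"


def plus_to_unicode(mac_byte):
    """Map the two Mac OS Arabic/Hebrew plus signs to Unicode."""
    if mac_byte == 0x2B:
        return LRO + "+" + PDF
    if mac_byte == 0xAB:
        return RLO + "+" + PDF
    raise ValueError("not a plus sign: 0x%02X" % mac_byte)


def plus_signs_from_unicode(text, default="LR"):
    """Map each PLUS SIGN back using its overridden direction.

    The overrides themselves produce no output; characters other than
    PLUS SIGN are outside this sketch and are skipped.
    """
    stack = []
    out = bytearray()
    for ch in text:
        if ch == LRO:
            stack.append("LR")
        elif ch == RLO:
            stack.append("RL")
        elif ch == PDF:
            if stack:
                stack.pop()
        elif ch == "+":
            direction = stack[-1] if stack else default
            out.append(0x2B if direction == "LR" else 0xAB)
    return bytes(out)
```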
+
+Even when not required for round-trip fidelity, direction overrides
+may be used when mapping from a Mac OS encoding to Unicode in order to
+preserve proper text layout. For example, the single Mac OS Arabic
+ellipsis character has direction class right-left, while the Unicode
+HORIZONTAL ELLIPSIS character has direction class neutral. When 
+mapping the Mac OS ellipsis to Unicode, it is surrounded with a
+direction override to help preserve proper text layout. However,
+resolved direction is not needed or used when mapping the Unicode
+HORIZONTAL ELLIPSIS back to Mac OS Arabic.
+
+b) Horizontal or vertical display
+
+The Mac OS Japanese encoding includes separately-encoded vertical forms
+for some punctuation and kana. When Unicode characters in the CJK
+punctuation and kana ranges are mapped to Mac OS Japanese characters and
+(1) those characters are intended for vertical display, (2) they will be
+displayed in an environment that does not provide automatic vertical
+form substitution, and (3) loose mappings are desired, the Unicode
+characters can be mapped to the corresponding vertical form codes in the
+Mac OS Japanese encoding.
+
+This does not affect mapping of the Unicode vertical presentation forms
+(which always map to the Mac OS Japanese vertical form codes).
+
+6. Use of corporate characters
+------------------------------
+
+Apple has defined a block of 32 corporate characters as "transcoding
+hints." These are used in combination with standard Unicode characters
+to force them to be treated in a special way for mapping to other
+encodings; they have no other effect. Sixteen of these transcoding
+hints are "grouping hints" - they indicate that the next 2-4 Unicode
+characters should be treated as a single entity for transcoding. The
+other sixteen transcoding hints are "variant tags" - they are like
+combining characters, and can follow a standard Unicode character (or
+a sequence consisting of a base character and other combining
+characters) to cause it to be treated in a special way for
+transcoding. These always terminate a combining-character sequence.
+
+Whenever possible, mappings that require corporate-zone characters
+use standard Unicode characters in combination with a single
+transcoding hint (no mapping uses more than one transcoding hint).
+For these mappings, even if the corporate-zone characters are lost in
+interchange, the basic text content will be preserved.
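
A receiving system that has no use for the hints could drop them as in
the sketch below. The code point range 0xF860-0xF87F is an assumption
here (it is the 32-character block as we understand it from
CORPCHAR.TXT); check that file before relying on the values. The
function name is hypothetical.

```python
# ASSUMPTION: Apple transcoding hints occupy 0xF860-0xF87F.
# Verify against CORPCHAR.TXT before use.
HINT_FIRST = 0xF860
HINT_LAST = 0xF87F


def strip_transcoding_hints(text):
    """Remove transcoding hints, keeping all other characters.

    Basic text content survives; only round-trip fidelity to the
    Mac OS encoding is lost.
    """
    return "".join(
        ch for ch in text if not (HINT_FIRST <= ord(ch) <= HINT_LAST)
    )
```

Note that standalone corporate characters such as the Apple logo
(0xF8FF) fall outside this range and are left in place.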
+
+However, some characters in a Mac OS encoding - such as the Apple
+logo character - bear no relation to any standard Unicode character.
+In these cases, the Mac OS character is mapped to a single corporate
+zone character defined by Apple. Fewer than 40 corporate characters
+are used in this way.
+
+All of the corporate characters defined by Apple are listed in the
+accompanying file "CORPCHAR.TXT", including old Apple corporate
+character assignments which are now deprecated (but which are still
+supported as loose mappings by the Mac OS Text Encoding Converter).
+
+7. Font variants
+----------------
+
+For some Mac OS legacy encodings, certain fonts used with that encoding
+may actually implement a slight variant of the standard encoding
+specified in the accompanying mapping tables. The header comments in the
+mapping table files for each encoding describe any font variants
+associated with that encoding.
+
+8. Encodings in Mac OS X
+------------------------
+
+The Mac OS X Cocoa and Carbon environments use Unicode as the primary
+text encoding. Some legacy programming interfaces in the Carbon
+environment - e.g. QuickDraw Text, the Script Manager, and related
+Text Utilities - use and support the following subset of Mac OS legacy
+encodings:
+  Roman
+  Central European
+  Cyrillic
+  Chinese Traditional
+  Chinese Simplified 
+  Japanese
+  Korean
+
+Other legacy Mac OS encodings are supported in Carbon and Cocoa via
+transcoding using the Mac OS Text Encoding Converter or other
+transcoding interfaces; the character repertoires of all Mac OS
+legacy encodings are supported in Unicode on Mac OS X.
+
+Additional legacy encodings are also supported in the Classic
+environment under Mac OS X.
+
+9. Mac OS legacy encodings
+--------------------------
+
+Mac OS versions 7.1 and later supported multiple encodings via the
+Script Manager, QuickDraw Text and related Text Utilities. These
+system components distinguish these encodings primarily by script code:
+font family IDs are grouped into ranges, and each range is associated
+with a script code. 
+
+In some cases, there are several encodings that share a single script
+code. Usually these are closely related. To distinguish among these,
+additional information is required, such as font name or system
+region code (locale code).
+
+The encodings described here (and in the accompanying tables) are the 
+legacy encodings used in Mac OS versions 7.1 and later. In some cases,
+certain earlier system versions have used different encodings. Not all
+of these encodings are directly supported in Mac OS X, but Mac OS X
+does support transcoding between all of these encodings and Unicode.
+
+In all Mac OS legacy encodings, character codes 0x00-0x7F are identical
+to ASCII, except that
+  - in Mac OS Japanese, reverse solidus is replaced by yen sign
+  - in Mac OS Arabic, Farsi, and Hebrew, some of the punctuation in this
+    range is treated as having strong left-right directionality,
+    although the corresponding Unicode characters have neutral
+    directionality
+  - in the three symbol glyphs encodings (Symbol, Dingbats, and Keyboard
+    glyphs), a different mapping is used for the ASCII range. The
+    Keyboard glyphs encoding even has a special mapping for the control
+    characters range 0x00-0x1F.
+Fonts used as "system" fonts (for menus, dialogs, etc.) had four glyphs
+at code points 0x11-0x14 for transient use by the Menu Manager. These
+glyphs were not intended as characters for use in normal text, and the
+associated code points are not generally interpreted as associated with
+these glyphs. (However, a "system font variant" mapping table could
+provide mappings for these).
+
+Note that in general, character sets cannot be determined from font 
+layouts (they are not the same thing!). This is very noticeable with 
+Arabic, Hebrew, and Devanagari, for example.
+
+The following is a list of legacy Mac OS encodings. The accompanying
+tables provide mappings from these encodings to Unicode.
+
+a) Mac OS encodings for script code 0, smRoman.
+
+* Roman - this is the default for script code 0 (when the special
+  cases listed below do not apply). It covers several western European
+  languages, and includes math operators and various symbols.
+
+* Symbol - this is the encoding for the font named "Symbol". It includes
+  Greek letters, math operators, and miscellaneous symbols. The layout
+  of the Symbol character set is identical to the layout of the Adobe
+  Symbol encoding vector, with the addition of the Apple logo at 0xF0
+  and the EURO SIGN at 0xA0.
+
+* Dingbats - this is the encoding for the font named "Zapf Dingbats".
+  The layout of the Dingbats character set is identical to or a superset
+  of the layout of the Adobe Zapf Dingbats encoding vector.
+
+* Keyboard glyphs - this is the encoding for the legacy font named
+  ".Keyboard". Before Mac OS X, this font was used by the user-interface
+  system to display glyphs for special keys on the keyboard. In Mac OS
+  X, this mapping is not associated with a font; it is only used as a
+  way to map from a set of Menu Manager constants to associated Unicode
+  sequences. As such, new mappings added for Mac OS X only may be
+  one-way mappings: From the Keyboard glyph "encoding" to Unicode, but
+  not back.
+
+* Turkish - this is the encoding if the script code is 0 and the system
+  region code is 24, verTurkey. It has 7 code point differences from
+  Mac OS Roman.
+
+* Croatian - this is the encoding if the script code is 0 and the system
+  region code is any of the following:
+    68, verCroatia
+    66, verSlovenian
+    25, verYugoCroatian (only used in older systems)
+  It has 20 code point differences from standard Roman, but only 10
+  differences in repertoire.
+
+* Icelandic - this is the encoding if the script code is 0 and the
+  system region code is either of the following:
+    21, verIceland
+    47, verFaroeIsl
+  It has 6 code point differences from standard Roman. It also has one
+  font variant.
+
+* Romanian - this is the encoding if the script code is 0 and the system
+  region code is 39, verRomania. It has 6 code point differences from
+  standard Roman.
+
+* Celtic - this is the encoding if the script code is 0 and the system
+  region code is any of the following:
+    50, verIreland
+    75, verScottishGaelic
+    76, verManxGaelic
+    77, verBreton
+    79, verWelsh
+  It is a variant of Mac OS Roman with a few extra accented characters
+  for Welsh.
+
+* Gaelic - this is the encoding if the script code is 0 and the system
+  region code is 81, verIrishGaelicScript. It is a variant of Mac OS
+  Roman, and supports the older Irish orthography using dot above.
+
+* Greek (monotonic) - this is the encoding if the script code is 0 and
+  the system region code is 20, verGreece. Although a script code is
+  defined for Greek, the Greek localized system does not use it (the
+  font family IDs are in the smRoman range). This encoding is based on
+  the ISO/IEC 8859-7 repertoire with additional Roman characters for
+  French and German, as well as additional symbols. Greek system 4.1
+  used a different encoding that matched 8859-7 code points for Greek
+  letters. Greek system 6.0.7 also used a variant of the standard
+  encoding, but it was quickly replaced by Greek system 6.0.7.1 which
+  used the standard encoding.
+
+  See also the Central European encoding under script code 29 below.
+
+b) Mac OS encodings for script code 1, smJapanese.
+
+* Japanese - this is the default for script code 1. It is based on a
+  Shift-JIS implementation of JIS X0208-1990 ("fullwidth") and
+  JIS X0201-1976 ("halfwidth"), with 5 additional one-byte characters
+  and one modified character, a set of Apple extension characters which
+  include many industry standard extensions, and separate codes for
+  vertical forms of some punctuation and kana. There are several font
+  variants.
+
+c) Mac OS encodings for script code 2, smTradChinese.
+
+* Chinese Traditional - this is an extension of Big-5.
+
+d) Mac OS encodings for script code 3, smKorean.
+
+* Korean - this is an extension of EUC-KR.
+
+e) Mac OS encodings for script code 4, smArabic.
+
+* Arabic - This is the default for script code 4 (when the special
+  case listed below does not apply). It is based on the ISO/IEC 8859-6
+  repertoire, with additional Arabic letters for Persian and Urdu and
+  with accented Roman letters for European languages. It has the
+  interesting feature mentioned above that certain ASCII punctuation
+  and symbol characters are encoded twice, once for each direction. It
+  has several font variants.
+ 
+* Farsi - This is the encoding if the script code is 4 and the system
+  region code is 48, verIran. It is similar to Mac OS Arabic, but has
+  the "extended" or Persian digits instead of the standard Arabic
+  digits. It has one font variant.
+
+f) Mac OS encodings for script code 5, smHebrew.
+
+* Hebrew - This is based on the ISO/IEC 8859-8 Hebrew letter repertoire,
+  but adds Hebrew points, some Hebrew ligatures, some accented Roman
+  letters for European languages, and some non-ASCII punctuation. As 
+  with Mac OS Arabic, certain ASCII punctuation and symbol characters
+  are encoded twice, once for each direction. This is also true for the
+  European digits. This has one font variant.
+
+g) Mac OS encodings for script code 6, smGreek.
+
+  None currently - see smRoman.
+
+h) Mac OS encodings for script code 7, smCyrillic.
+
+* Cyrillic - This is based on the ISO/IEC 8859-5 Cyrillic character
+  repertoire plus an additional case pair for Ukrainian.
+
+i) Mac OS encodings for script code 9, smDevanagari.
+
+* Devanagari - This is based on IS 13194:1991 (ISCII-91), and adds some
+  punctuation and symbols.
+
+j) Mac OS encodings for script code 10, smGurmukhi.
+
+* Gurmukhi - This is based on IS 13194:1991 (ISCII-91), and adds some
+  punctuation and symbols.
+
+k) Mac OS encodings for script code 11, smGujarati.
+
+* Gujarati - This is based on IS 13194:1991 (ISCII-91), and adds some
+  punctuation and symbols.
+
+l) Mac OS encodings for script code 21, smThai.
+
+* Thai - This is based on TIS 620-2533, except that three of the
+  TIS 620-2533 characters are replaced with other characters. Some
+  undefined code points in TIS 620-2533 are used for additional
+  punctuation characters.
+
+m) Mac OS encodings for script code 25, smSimpChinese.
+
+* Chinese Simplified - this is an extension of EUC-CN.
+
+n) Mac OS encodings for script code 26, smTibetan.
+
+* Tibetan
+
+o) Mac OS encodings for script code 28, smEthiopic.
+
+* Inuit - this is the encoding if the script code is 28 and the
+  system region code is 78, verNunavut (for Inuktitut language).
+  There is no script code for Inuit, so it shares the script code
+  with Ethiopic.
+
+p) Mac OS encodings for script code 29, smCentralEuroRoman.
+
+* Central European - This is similar to standard Roman, but with a
+  different (and larger) set of European characters and with fewer
+  symbols. It is used for Polish, Czech, Slovak, Hungarian, Estonian,
+  Latvian, and Lithuanian.
--- a/charmap/Readme.txt
+++ b/charmap/Readme.txt
@ -0,0 +1,590 @@
+#=======================================================================
+#   File name:  README.TXT
+#
+#   Contents:   Background information on Unicode mapping tables for
+#               Mac OS legacy text encodings
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-04    Update discussion of roundtrip fidelity,
+#                           delete discussion of mappings dependent on
+#                           symmetric swapping (no longer supported),
+#                           provide information on how legacy encodings
+#                           are supported in Mac OS X.
+#      b3,c1 2002-Dec-19    Add Keyboard font encoding. Update URLs,
+#                           notes.
+#       b02  1999-Sep-22    Update information on Cyrillic. Update
+#                           contact e-mail address.
+#       n07  1998-Feb-05    Rewrite to provide additional information
+#                           relevant to using the accompanying mapping
+#                           tables, and to delete some extraneous
+#                           information. Delete Bulgarian (no special
+#                           encoding, uses standard Cyrillic), add
+#                           Farsi, Devanagari, Gurmukhi, Gujarati,
+#                           Celtic, Gaelic, Inuit, Tibetan.
+#       n04  1995-Nov-15    Update info for Hebrew and Thai
+#       n03  1995-Apr-15    First version (after fixing some typos).
+#
+##################
+
+0. Preliminaries
+----------------
+
+For maximum interchangeability, this file and the accompanying Mac OS
+mapping tables use only ASCII characters. They are intended to be
+displayed in a monospaced font.
+
+Apple, the Apple logo, Mac, and Macintosh are trademarks of Apple
+Computer, Inc., registered in the United States and other countries.
+QuickDraw and TrueType are trademarks of Apple Computer, Inc. Unicode is
+a trademark of Unicode Inc. PostScript is a trademark of Adobe Systems
+Inc., which may be registered in certain jurisdictions. IBM is a
+registered trademark of International Business Machines Corporation. ITC
+Zapf Dingbats is a registered trademark of the International Typeface
+Corporation. For the sake of brevity, throughout this document and the
+accompanying tables, "Macintosh" can be used to refer to Macintosh
+computers and "Unicode" can be used to refer to the Unicode standard.
+
+Apple Computer, Inc. ("Apple") makes no warranty or representation,
+either express or implied, with respect to this document and the
+accompanying tables, their quality, accuracy, or fitness for a
+particular purpose. In no event will Apple be liable for direct,
+indirect, special, incidental, or consequential damages resulting from
+any defect or inaccuracy in this document or the accompanying tables.
+
+1. Introduction
+---------------
+
+This document summarizes some Unicode mapping considerations that are
+relevant for the accompanying mapping tables. It also provides an
+overview of Mac OS legacy encodings.
+
+These mapping tables and character lists are subject to change. The
+latest tables should be available from the following:
+
+<http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+
+2. Round-trip fidelity and overview of mapping techniques
+---------------------------------------------------------
+
+For a particular set of national and international standards, Unicode
+provides round-trip fidelity: Text in one of those encodings can be
+mapped to Unicode and back again, yielding the original characters.
+Characters which are distinct in one of these source standards have a
+distinct counterpart in Unicode. Note that this counterpart might not be
+a single Unicode character; as is pointed out in "The Unicode Standard,
+Version 2.0" (page 2-10), "sometimes a single code value in another
+standard corresponds to a sequence of code values in the Unicode
+Standard, or vice versa."
+
+However, Unicode does not attempt to provide round-trip fidelity for
+most vendor standards. Nevertheless, Apple and other platform vendors
+may need to provide such round-trip fidelity for their current platform
+encodings and/or legacy platform encodings (this can be important in
+file systems, for example). In order to do this, Apple makes use of some
+Unicode characters in the corporate-use zone (the upper end of the
+private use area).
+
+Corporate-zone characters must be used with care. Indiscriminate use of
+such characters can result in text which is not easily interchanged with
+other systems, since these characters have no standard meaning outside a
+particular platform. The mappings provided here are intended to minimize
+the use of private use characters, or to use them in such a way that
+basic text content will not be lost if the corporate zone characters are
+dropped when text is transferred to another system.
+
+The tables provided here have three goals, in the following order of
+importance:
+1. Provide 100% round-trip mapping from a Mac OS legacy encoding to
+Unicode and back.
+2. Map characters in a Mac OS encoding into the Unicode characters that
+best represent the interpretation and usage of the Mac OS characters.
+3. When mapping text in a Mac OS encoding to Unicode using the tables,
+the resulting Unicode text should be as interchangeable as possible.
+
+To satisfy these goals, the mappings use a variety of techniques. First
+we attempt to achieve round-trip mappings using any standard Unicode
+feature at our disposal, without resorting to corporate-zone characters.
+This can include the following techniques:
+- Use of all Unicode characters defined in Unicode 2.1 and later,
+  including compatibility characters.
+- Mapping a single character in a Mac OS encoding to a sequence of
+  standard Unicode characters, or vice versa. This requires grouping
+  characters into appropriate chunks for lookup before mapping them
+  (this mainly applies to sequences of Unicode characters).
+- Using Unicode direction overrides to force direction attributes when
+  mapping to Unicode. This requires resolution of Unicode character
+  direction, and use of this information, when mapping from Unicode back
+  to certain Mac OS encodings.
+The requirements imposed on Unicode handling are necessary for other,
+non-transcoding operations in a full Unicode implementation anyway, so
+requiring them for transcoding should not impose much of a burden.
+
+Next, if round-trip fidelity cannot be achieved using the above
+techniques, we attempt to use corporate-zone characters only as
+"transcoding hints" (more on this below). These are combined with one or
+more standard Unicode characters to mark them as special for
+transcoding, but have no other function and can be deleted with no loss
+of basic text content (only of round-trip fidelity).
+
+Finally, if a character in a Mac OS encoding is unrelated to any Unicode
+character or Unicode character sequence, we may map it to a single
+corporate-zone Unicode code point.
+
+These techniques are described in more detail in the following sections.
+
+Some clients of these tables may have a different set of goals. For
+example, some clients may prefer to avoid compatibility characters,
+perhaps sacrificing round-trip fidelity if necessary. In most cases it
+is fairly easy to construct other types of mappings from the mappings
+given here. In particular, the Unicode mappings here have been designed
+so that if they are converted to a restricted form of NFD (a form that
+does NOT decompose or normalize Unicode characters in the ranges
+2000-2FFF or F900-FAFF), the resulting mappings still provide roundtrip
+fidelity. (For certain characters in the Mac OS Hebrew and Devanagari
+encodings, the decomposition mappings must use a grouping transcoding
+hint to ensure roundtrip fidelity; more details on this are provided in
+the mapping tables for those encodings.)
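+
+The restricted form of NFD described above can be sketched roughly as
+follows. This is a minimal illustration, not Apple's implementation:
+the names EXCLUDED_RANGES, is_excluded, and restricted_nfd are
+hypothetical, and the sketch simply copies excluded characters through
+unchanged and applies ordinary NFD to the runs between them, so it
+does not reorder combining marks across an excluded character.

```python
import unicodedata

# Inclusive code point ranges that the restricted form leaves untouched.
EXCLUDED_RANGES = ((0x2000, 0x2FFF), (0xF900, 0xFAFF))


def is_excluded(ch: str) -> bool:
    """Return True if ch lies in a range that must not be decomposed."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXCLUDED_RANGES)


def restricted_nfd(text: str) -> str:
    """Apply NFD to runs of non-excluded characters; copy the rest as-is."""
    out = []
    run = []
    for ch in text:
        if is_excluded(ch):
            # Flush the pending run through ordinary NFD, then copy ch.
            out.append(unicodedata.normalize("NFD", "".join(run)))
            run.clear()
            out.append(ch)
        else:
            run.append(ch)
    out.append(unicodedata.normalize("NFD", "".join(run)))
    return "".join(out)
```
+
+With this sketch, U+00E9 decomposes to U+0065 U+0301 as usual, while
+U+2126 OHM SIGN and U+F900 are left alone even though ordinary NFD
+would replace them.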
+
+There is one more round-trip issue that should be mentioned. If a
+Unicode character or sequence can be mapped at all into a particular Mac
+OS encoding, then the reverse mapping back to Unicode should yield the
+original Unicode character or sequence (except for possible differences
+in direction overrides or other Unicode characters with General Category
+Cf). The tables here also provide this. For a related issue, see the
+next section.
+
+3. Mapping tolerance: Strict and loose
+--------------------------------------
+
+In many character sets, a single character may have multiple semantics, 
+either by explicit definition, ambiguous definition, or established 
+usage. For example, the JIS character 0x2142, or 0x8161 in Shift-JIS, 
+is specified in the JIS X0208 standard to have two meanings: "double 
+vertical line" and "parallel". Each of these meanings corresponds to a 
+different Unicode character: 0x2016 DOUBLE VERTICAL LINE and 0x2225 
+PARALLEL TO. When mapping from Unicode to Shift-JIS, it is normally 
+desirable to map both of these Unicode characters to the single
+Shift-JIS character. However, when mapping the Shift-JIS character to
+Unicode, we can choose only one of the possible Unicode characters.
+
+For two encodings X and Y, we can define a set of "strict" mappings
+from one to the other as follows: If text in X can be mapped to Y using
+the strict mappings from X to Y, then the resulting text can be mapped
+back using the strict mappings from Y to X to end up with the original
+text from X. Similarly, if text in Y can be mapped to X using the strict
+mappings from Y to X, then the resulting text can be mapped back using
+the strict mappings from X to Y to end up with the original text from Y.
+
+There may be several characters in one encoding that all map to a
+single character in another encoding, but only one of these mappings
+can be strict; the others are "loose".
+
+The mappings given in the accompanying tables are strict mappings.
+However, the Mac OS Text Encoding Converter also supports loose
+mappings and fallback mappings. Some of the accompanying tables provide
+suggestions about possible loose mappings.
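+
+The distinction between strict and loose mappings can be illustrated
+with the Shift-JIS example above. This is only a sketch: the table and
+function names are hypothetical, and the choice of U+2016 as the
+strict partner of 0x8161 is an assumption made for illustration (the
+accompanying Japanese table is the authority on which one is strict).

```python
# Strict table: each Shift-JIS code pairs with exactly one Unicode code
# point, so the same pairs can be used in both directions.
# Assumption for illustration: U+2016 is the strict partner of 0x8161.
STRICT_TO_UNICODE = {0x8161: 0x2016}
STRICT_FROM_UNICODE = {u: c for c, u in STRICT_TO_UNICODE.items()}

# Loose table: extra Unicode code points that may also map to 0x8161.
# These mappings do not round-trip back to the same Unicode character.
LOOSE_FROM_UNICODE = {0x2225: 0x8161}


def to_unicode(code):
    """Map a Shift-JIS code to Unicode using the strict table only."""
    return STRICT_TO_UNICODE[code]


def from_unicode(cp, allow_loose=False):
    """Map a Unicode code point to Shift-JIS, or return None."""
    if cp in STRICT_FROM_UNICODE:
        return STRICT_FROM_UNICODE[cp]
    if allow_loose and cp in LOOSE_FROM_UNICODE:
        return LOOSE_FROM_UNICODE[cp]
    return None
```
+
+Under these assumptions, 0x8161 round-trips through U+2016, while
+U+2225 reaches 0x8161 only when loose mappings are allowed and comes
+back as U+2016.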
+
+4. Mapping a Mac encoding character to a Unicode sequence or vice versa
+-----------------------------------------------------------------------
+
+In some cases, a character in a Mac OS legacy encoding maps to a
+sequence of Unicode characters. For example, the Mac OS Japanese
+encoding includes a character for the circled CJK ideograph "big".
+Although Unicode encodes other circled ideographs as single characters,
+it does not encode this one. However, this character can be
+unambiguously represented in Unicode as the Unicode sequence
+0x5927+0x20DD, the CJK ideograph for "big" followed by COMBINING
+ENCLOSING CIRCLE.
+
+To handle the reverse mapping, a transcoding process must group the
+Unicode sequence 0x5927+0x20DD as a single element for lookup (the
+Mac OS Text Encoding Converter does this).
+
+In a few cases, a sequence of characters in a Mac OS legacy encoding
+must be grouped for mapping to a single Unicode character or a sequence
+of Unicode characters. For example, in Mac OS Devanagari (based on
+ISCII-91), DEVANAGARI LETTER VOCALIC L is represented as 0xA6+0xE9;
+but this is represented in Unicode by the single character 0x090C.
+Furthermore, explicit halant is represented in Mac OS Devanagari as
+0xE8+0xE8 (double halant) and in Unicode as 0x094D+0x200C (VIRAMA
+plus ZERO WIDTH NON-JOINER). The latter can also be considered as
+a context-dependent mapping of 0xE8, halant.
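+
+Grouping for lookup can be sketched as a greedy longest-match scan.
+The sketch below is hypothetical and covers only the Mac OS Devanagari
+sequences mentioned above; the single-byte entry for 0xE8 (halant to
+VIRAMA) is an assumption implied by the text, and a real table would
+contain every code in the encoding.

```python
# Partial, illustrative table keyed by tuples of Mac OS Devanagari bytes.
MAC_DEVANAGARI_SAMPLE = {
    (0xA6, 0xE9): "\u090C",        # DEVANAGARI LETTER VOCALIC L
    (0xE8, 0xE8): "\u094D\u200C",  # explicit halant: VIRAMA + ZWNJ
    (0xE8,): "\u094D",             # halant alone (assumed mapping)
}


def decode_grouped(data: bytes, table=MAC_DEVANAGARI_SAMPLE) -> str:
    """Decode bytes, always preferring the longest matching byte group."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(data):
        # Try the longest candidate group first, then shorter ones.
        for n in range(min(max_len, len(data) - i), 0, -1):
            key = tuple(data[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            raise ValueError(f"no mapping for byte 0x{data[i]:02X} at offset {i}")
    return "".join(out)
```
+
+Trying the longest group first is what makes 0xE8+0xE8 decode as a
+unit rather than as two separate halants.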
+
+Loose mappings from Unicode to a Mac OS encoding often map a single
+Unicode character to a sequence of characters in the Mac OS encoding.
+For example,
+the Unicode character 0x00BD VULGAR FRACTION ONE HALF cannot be mapped
+into the Mac OS Roman character set as a single character, but it has a
+loose mapping to the sequence 0x31+0xDA+0x32, "digit one" + "fraction
+slash" + "digit two".
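+
+A loose fallback of this kind might look like the sketch below. It
+uses Python's built-in "mac_roman" codec as a stand-in for the strict
+Mac OS Roman table (an assumption; the codec is not part of these
+tables), and the names LOOSE_SEQUENCES and encode_mac_roman are
+hypothetical.

```python
# Loose one-to-many fallbacks, consulted only when strict encoding fails.
LOOSE_SEQUENCES = {
    "\u00BD": b"\x31\xDA\x32",  # digit one + fraction slash + digit two
}


def encode_mac_roman(text: str, allow_loose: bool = True) -> bytes:
    """Encode text to Mac OS Roman, optionally using loose fallbacks."""
    out = bytearray()
    for ch in text:
        try:
            # Strict path: the character exists in Mac OS Roman.
            out += ch.encode("mac_roman")
        except UnicodeEncodeError:
            if allow_loose and ch in LOOSE_SEQUENCES:
                out += LOOSE_SEQUENCES[ch]
            else:
                raise
    return bytes(out)
```
+
+Decoding the three resulting bytes yields DIGIT ONE, FRACTION SLASH,
+DIGIT TWO rather than U+00BD, which is why this mapping is loose and
+not strict.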
+
+In some cases a Unicode character such as a direction override may
+simply be discarded when mapping to a Mac OS encoding, since the
+information carried by the override may be represented in a different
+way by the Mac OS encoding. See the next section for an example.
+
+5. Mappings that depend on directionality (or other attributes)
+---------------------------------------------------------------
+
+Strict mappings from Unicode to Mac OS legacy encodings may depend on
+resolved character direction. Loose mappings may depend on additional
+attributes such as whether the text should use vertical form codes if
+available (i.e. whether the text is intended for vertical display on a
+system that cannot automatically substitute vertical forms).
+
+a) Resolved character direction
+
+The Mac OS Arabic and Hebrew character sets were developed in 1986-1987.
+At that time the bidirectional line layout algorithm used in the Mac OS
+was fairly simple; it used only a few direction classes (instead of the
+19 now used in the Unicode bidirectional algorithm). In order to permit
+users to handle some tricky layout problems, certain punctuation and
+symbol characters have duplicate code points, one with a left-right 
+direction attribute and the other with a right-left direction attribute.
+
+For example, plus sign is encoded at 0x2B with a left-right attribute,
+and at 0xAB with a right-left attribute. However, there is only one PLUS
+SIGN character in Unicode. This leads to some interesting problems when
+mapping between Mac OS Arabic or Hebrew and Unicode.
+
+We need a way to map both of these plus signs to Unicode and back. Using
+a single corporate character for one of these plus signs is not a good
+solution, since both of the plus sign characters are likely to be used
+in text that is interchanged, and thus content would be lost.
+
+The problem is solved with the use of direction override characters and
+direction-dependent mappings. When mapping from Mac OS Arabic or Hebrew
+to Unicode, we use direction overrides as necessary to force the
+direction of the resulting Unicode characters. When mapping back from
+Unicode, the Unicode bidirectional algorithm should be used to determine
+resolved direction of the Unicode characters. The mapping from Unicode
+to Mac OS Arabic or Hebrew can then be disambiguated as necessary by
+using the resolved direction.
+
+For example, when mapping from Mac OS Arabic or Hebrew, we can use
+LEFT-RIGHT OVERRIDE (LRO), RIGHT-LEFT OVERRIDE (RLO), and POP DIRECTION
+FORMATTING (PDF) as follows:
+
+  0x2B ->  0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
+  0xAB ->  0x202E (RLO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
+
+When mapping back, we resolve the direction of the Unicode character
+0x002B, and use this information to determine which of the Mac OS
+encoding characters to use:
+
+  0x002B -> 0x2B (if LR) or 0xAB (if RL)
+  
+After direction overrides have been used in this way to force a
+particular resolved direction, they may be discarded when mapping from
+Unicode to Mac OS Arabic and Hebrew (since the information they carried
+in Unicode is represented in the Mac OS encoding by the code point of
+the plus sign).
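+
+The plus sign round trip can be sketched as below. This is a heavily
+simplified illustration: it handles only the two plus sign codes, and
+instead of running the Unicode bidirectional algorithm it treats the
+innermost active override as the resolved direction, defaulting to
+left-right when no override is active (an assumption; a real
+converter must resolve direction properly). All names are
+hypothetical.

```python
LRO, RLO, PDF = "\u202D", "\u202E", "\u202C"

# Mac OS Arabic/Hebrew plus signs, wrapped in overrides as shown above.
TO_UNICODE = {
    0x2B: LRO + "+" + PDF,  # left-right plus sign
    0xAB: RLO + "+" + PDF,  # right-left plus sign
}


def decode_plus(data: bytes) -> str:
    """Map Mac OS plus sign codes to override-wrapped Unicode."""
    return "".join(TO_UNICODE[b] for b in data)


def encode_plus(text: str) -> bytes:
    """Map Unicode back, choosing the code by the active override."""
    out = bytearray()
    overrides = []  # stack of currently open LRO/RLO characters
    for ch in text:
        if ch in (LRO, RLO):
            overrides.append(ch)
        elif ch == PDF:
            if overrides:
                overrides.pop()
        elif ch == "+":
            right_left = bool(overrides) and overrides[-1] == RLO
            out.append(0xAB if right_left else 0x2B)
        else:
            raise ValueError(f"character U+{ord(ch):04X} not handled in this sketch")
    # The override characters themselves are discarded, not encoded.
    return bytes(out)
```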
+
+Even when not required for round-trip fidelity, direction overrides
+may be used when mapping from a Mac OS encoding to Unicode in order to
+preserve proper text layout. For example, the single Mac OS Arabic
+ellipsis character has direction class right-left, while the Unicode
+HORIZONTAL ELLIPSIS character has direction class neutral. When 
+mapping the Mac OS ellipsis to Unicode, it is surrounded with a
+direction override to help preserve proper text layout. However,
+resolved direction is not needed or used when mapping the Unicode
+HORIZONTAL ELLIPSIS back to Mac OS Arabic.
+
+b) Horizontal or vertical display
+
+The Mac OS Japanese encoding includes separately-encoded vertical forms
+for some punctuation and kana. When Unicode characters in the CJK
+punctuation and kana ranges are mapped to Mac OS Japanese characters and
+(1) those characters are intended for vertical display, (2) they will be
+displayed in an environment that does not provide automatic vertical
+form substitution, and (3) loose mappings are desired, the Unicode
+characters can be mapped to the corresponding vertical form codes in the
+Mac OS Japanese encoding.
+
+This does not affect mapping of the Unicode vertical presentation forms
+(which always map to the Mac OS Japanese vertical form codes).
+
+6. Use of corporate characters
+------------------------------
+
+Apple has defined a block of 32 corporate characters as "transcoding
+hints." These are used in combination with standard Unicode characters
+to force them to be treated in a special way for mapping to other
+encodings; they have no other effect. Sixteen of these transcoding
+hints are "grouping hints" - they indicate that the next 2-4 Unicode
+characters should be treated as a single entity for transcoding. The
+other sixteen transcoding hints are "variant tags" - they are like
+combining characters, and can follow a standard Unicode character (or
+a sequence consisting of a base character and other combining
+characters) to cause it to be treated in a special way for
+transcoding. These always terminate a combining-character sequence.
+
+Whenever possible, mappings that require corporate-zone characters
+use standard Unicode characters in combination with a single
+transcoding hint (no mapping uses more than one transcoding hint).
+For these mappings, even if the corporate-zone characters are lost in
+interchange, the basic text content will be preserved.
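+
+As a rough sketch of why losing the hints preserves basic content,
+the function below removes transcoding hints from Unicode text. The
+code point ranges are assumptions (grouping hints at U+F860-U+F86F
+and variant tags at U+F870-U+F87F); "CORPCHAR.TXT" is the authority
+for the actual assignments. The function name is hypothetical.

```python
# Assumed ranges; check CORPCHAR.TXT for the authoritative assignments.
GROUPING_HINTS = range(0xF860, 0xF870)  # U+F860..U+F86F
VARIANT_TAGS = range(0xF870, 0xF880)    # U+F870..U+F87F


def strip_transcoding_hints(text: str) -> str:
    """Drop transcoding hints, keeping all other characters unchanged."""
    return "".join(
        ch for ch in text
        if ord(ch) not in GROUPING_HINTS and ord(ch) not in VARIANT_TAGS
    )
```
+
+Under these assumptions, a standard character followed by a variant
+tag is reduced to the standard character alone, while a corporate
+character that carries content by itself (such as the Apple logo) is
+not a hint and is left in place.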
+
+However, some characters in a Mac OS encoding - such as the Apple
+logo character - bear no relation to any standard Unicode character.
+In these cases, the Mac OS character is mapped to a single corporate
+zone character defined by Apple. Fewer than 40 corporate characters
+are used in this way.
+
+All of the corporate characters defined by Apple are listed in the
+accompanying file "CORPCHAR.TXT", including old Apple corporate
+character assignments which are now deprecated (but which are still
+supported as loose mappings by the Mac OS Text Encoding Converter).
+
+7. Font variants
+----------------
+
+For some Mac OS legacy encodings, certain fonts used with that encoding
+may actually implement a slight variant of the standard encoding
+specified in the accompanying mapping tables. The header comments in the
+mapping table files for each encoding describe any font variants
+associated with that encoding.
+
+8. Encodings in Mac OS X
+------------------------
+
+The Mac OS X Cocoa and Carbon environments use Unicode as the primary
+text encoding. Some legacy programming interfaces in the Carbon
+environment - e.g. Quickdraw Text, the Script Manager, and related
+Text Utilities - use and support the following subset of Mac OS legacy
+encodings:
+  Roman
+  Central European
+  Cyrillic
+  Chinese Traditional
+  Chinese Simplified 
+  Japanese
+  Korean
+
+Other legacy Mac OS encodings are supported in Carbon and Cocoa via
+transcoding using the Mac OS Text Encoding Converter or other
+transcoding interfaces; the character repertoires of all Mac OS
+legacy encodings are supported in Unicode on Mac OS X.
+
+Additional legacy encodings are also supported in the Classic
+environment under Mac OS X.
+
+9. Mac OS legacy encodings
+--------------------------
+
+Mac OS versions 7.1 and later supported multiple encodings via the
+Script Manager, QuickDraw Text and related Text Utilities. These
+system components distinguish these encodings primarily by script code:
+font family IDs are grouped into ranges, and each range is associated
+with a script code. 
+
+In some cases, there are several encodings that share a single script
+code. Usually these are closely related. To distinguish among these,
+additional information is required, such as font name or system
+region code (locale code).
+
+The encodings described here (and in the accompanying tables) are the 
+legacy encodings used in Mac OS versions 7.1 and later. In some cases,
+certain earlier system versions have used different encodings. Not all
+of these encodings are directly supported in Mac OS X, but Mac OS X
+does support transcoding between all of these encodings and Unicode.
+
+In all Mac OS legacy encodings, character codes 0x00-0x7F are identical
+to ASCII, except that
+  - in Mac OS Japanese, reverse solidus is replaced by yen sign
+  - in Mac OS Arabic, Farsi, and Hebrew, some of the punctuation in this
+    range is treated as having strong left-right directionality,
+    although the corresponding Unicode characters have neutral
+    directionality
+  - in the three symbol glyphs encodings (Symbol, Dingbats, and Keyboard
+    glyphs), a different mapping is used for the ASCII range. The
+    Keyboard glyphs encoding even has a special mapping for the control
+    characters range 0x00-0x1F.
+Fonts used as "system" fonts (for menus, dialogs, etc.) had four glyphs
+at code points 0x11-0x14 for transient use by the Menu Manager. These
+glyphs were not intended as characters for use in normal text, and the
+associated code points are not generally interpreted as associated with
+these glyphs. (However, a "system font variant" mapping table could
+provide mappings for these).
+
+Note that in general, character sets cannot be determined from font 
+layouts (they are not the same thing!). This is very noticeable with 
+Arabic, Hebrew, and Devanagari, for example.
+
+The following is a list of legacy Mac OS encodings. The accompanying
+tables provide mappings from these encodings to Unicode.
+
+a) Mac OS encodings for script code 0, smRoman.
+
+* Roman - this is the default for script code 0 (when the special
+  cases listed below do not apply). It covers several western European
+  languages, and includes math operators and various symbols.
+
+* Symbol - this is the encoding for the font named "Symbol". It includes
+  Greek letters, math operators, and miscellaneous symbols. The layout
+  of the Symbol character set is identical to the layout of the Adobe
+  Symbol encoding vector, with the addition of the Apple logo at 0xF0
+  and the EURO SIGN at 0xA0.
+
+* Dingbats - this is the encoding for the font named "Zapf Dingbats".
+  The layout of the Dingbats character set is identical to or a superset
+  of the layout of the Adobe Zapf Dingbats encoding vector.
+
+* Keyboard glyphs - this is the encoding for the legacy font named
+  ".Keyboard". Before Mac OS X, this font was used by the user-interface
+  system to display glyphs for special keys on the keyboard. In Mac OS
+  X, this mapping is not associated with a font; it is only used as a
+  way to map from a set of Menu Manager constants to associated Unicode
+  sequences. As such, new mappings added for Mac OS X only may be
+  one-way mappings: From the Keyboard glyph "encoding" to Unicode, but
+  not back.
+
+* Turkish - this is the encoding if the script code is 0 and the system
+  region code is 24, verTurkey. It has 7 code point differences from
+  Mac OS Roman.
+
+* Croatian - this is the encoding if the script code is 0 and the system
+  region code is any of the following:
+    68, verCroatia
+    66, verSlovenian
+    25, verYugoCroatian (only used in older systems)
+  It has 20 code point differences from standard Roman, but only 10
+  differences in repertoire.
+
+* Icelandic - this is the encoding if the script code is 0 and the
+  system region code is either of the following:
+    21, verIceland
+    47, verFaroeIsl
+  It has 6 code point differences from standard Roman. It also has one
+  font variant.
+
+* Romanian - this is the encoding if the script code is 0 and the system
+  region code is 39, verRomania. It has 6 code point differences from
+  standard Roman.
+
+* Celtic - this is the encoding if the script code is 0 and the system
+  region code is any of the following:
+    50, verIreland
+    75, verScottishGaelic
+    76, verManxGaelic
+    77, verBreton
+    79, verWelsh
+  It is a variant of Mac OS Roman with a few extra accented characters
+  for Welsh.
+
+* Gaelic - this is the encoding if the script code is 0 and the system
+  region code is 81, verIrishGaelicScript. It is a variant of Mac OS
+  Roman, and supports the older Irish orthography using dot above.
+
+* Greek (monotonic) - this is the encoding if the script code is 0 and
+  the system region code is 20, verGreece. Although a script code is
+  defined for Greek, the Greek localized system does not use it (the
+  font family IDs are in the smRoman range). This encoding is based on
+  the ISO/IEC 8859-7 repertoire with additional Roman characters for
+  French and German, as well as additional symbols. Greek system 4.1
+  used a different encoding that matched 8859-7 code points for Greek
+  letters. Greek system 6.0.7 also used a variant of the standard
+  encoding, but it was quickly replaced by Greek system 6.0.7.1 which
+  used the standard encoding.
+
+  See also the Central European encoding under script code 29 below.
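+
+  The selection rules for script code 0 listed above can be summarized
+  in a small lookup. This is an illustrative sketch only: the function
+  and table names are hypothetical, and checking the font name before
+  the region code is an assumption about precedence that the list
+  above does not state.

```python
from typing import Optional

# Encodings selected by font name (script code 0, smRoman).
FONT_NAME_ENCODINGS = {
    "Symbol": "Symbol",
    "Zapf Dingbats": "Dingbats",
    ".Keyboard": "Keyboard glyphs",
}

# Encodings selected by system region code (script code 0, smRoman).
REGION_ENCODINGS = {
    24: "Turkish",    # verTurkey
    68: "Croatian",   # verCroatia
    66: "Croatian",   # verSlovenian
    25: "Croatian",   # verYugoCroatian (older systems)
    21: "Icelandic",  # verIceland
    47: "Icelandic",  # verFaroeIsl
    39: "Romanian",   # verRomania
    50: "Celtic",     # verIreland
    75: "Celtic",     # verScottishGaelic
    76: "Celtic",     # verManxGaelic
    77: "Celtic",     # verBreton
    79: "Celtic",     # verWelsh
    81: "Gaelic",     # verIrishGaelicScript
    20: "Greek",      # verGreece
}


def select_roman_encoding(region_code: int, font_name: Optional[str] = None) -> str:
    """Pick the encoding for script code 0; default to Roman."""
    if font_name in FONT_NAME_ENCODINGS:
        return FONT_NAME_ENCODINGS[font_name]
    return REGION_ENCODINGS.get(region_code, "Roman")
```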
+
+b) Mac OS encodings for script code 1, smJapanese.
+
+* Japanese - this is the default for script code 1. It is based on a
+  Shift-JIS implementation of JIS X0208-1990 ("fullwidth") and
+  JIS X0201-1976 ("halfwidth"), with 5 additional one-byte characters
+  and one modified character, a set of Apple extension characters which
+  include many industry standard extensions, and separate codes for
+  vertical forms of some punctuation and kana. There are several font
+  variants.
+
+c) Mac OS encodings for script code 2, smTradChinese.
+
+* Chinese Traditional - this is an extension of Big-5.
+
+d) Mac OS encodings for script code 3, smKorean.
+
+* Korean - this is an extension of EUC-KR.
+
+e) Mac OS encodings for script code 4, smArabic.
+
+* Arabic - This is the default for script code 4 (when the special
+  case listed below does not apply). It is based on the ISO/IEC 8859-6
+  repertoire, with additional Arabic letters for Persian and Urdu and
+  with accented Roman letters for European languages. It has the
+  interesting feature mentioned above that certain ASCII punctuation
+  and symbol characters are encoded twice, once for each direction. It
+  has several font variants.
+ 
+* Farsi - This is the encoding if the script code is 4 and the system
+  region code is 48, verIran. It is similar to Mac OS Arabic, but has
+  the "extended" or Persian digits instead of the standard Arabic
+  digits. It has one font variant.
+
+f) Mac OS encodings for script code 5, smHebrew.
+
+* Hebrew - This is based on the ISO/IEC 8859-8 Hebrew letter repertoire,
+  but adds Hebrew points, some Hebrew ligatures, some accented Roman
+  letters for European languages, and some non-ASCII punctuation. As 
+  with Mac OS Arabic, certain ASCII punctuation and symbol characters
+  are encoded twice, once for each direction. This is also true for the
+  European digits. This has one font variant.
+
+g) Mac OS encodings for script code 6, smGreek.
+
+  None currently - see smRoman.
+
+h) Mac OS encodings for script code 7, smCyrillic.
+
+* Cyrillic - This is based on the ISO/IEC 8859-5 Cyrillic character
+  repertoire plus an additional case pair for Ukrainian.
+
+i) Mac OS encodings for script code 9, smDevanagari.
+
+* Devanagari - This is based on IS 13194:1991 (ISCII-91), and adds some
+  punctuation and symbols.
+
+j) Mac OS encodings for script code 10, smGurmukhi.
+
+* Gurmukhi - This is based on IS 13194:1991 (ISCII-91), and adds some
+  punctuation and symbols.
+
+k) Mac OS encodings for script code 11, smGujarati.
+
+* Gujarati - This is based on IS 13194:1991 (ISCII-91), and adds some
+  punctuation and symbols.
+
+l) Mac OS encodings for script code 21, smThai.
+
+* Thai - This is based on TIS 620-2533, except that three of the
+  TIS 620-2533 characters are replaced with other characters. Some
+  undefined code points in TIS 620-2533 are used for additional
+  punctuation characters.
+
+m) Mac OS encodings for script code 25, smSimpChinese.
+
+* Chinese Simplified - this is an extension of EUC-CN.
+
+n) Mac OS encodings for script code 26, smTibetan.
+
+* Tibetan
+
+o) Mac OS encodings for script code 28, smEthiopic.
+
+* Inuit - this is the encoding if the script code is 28 and the
+  system region code is 78, verNunavut (for Inuktitut language).
+  There is no script code for Inuit, so it shares the script code
+  with Ethiopic.
+
+p) Mac OS encodings for script code 29, smCentralEuroRoman.
+
+* Central European - This is similar to standard Roman, but with a
+  different (and larger) set of European characters and with fewer
+  symbols. It is used for Polish, Czech, Slovak, Hungarian, Estonian,
+  Latvian, and Lithuanian.
--- a/charmap/SYMBOL.TXT
+++ b/charmap/SYMBOL.TXT
@ -0,0 +1,405 @@
+#=======================================================================
+#   File name:  SYMBOL.TXT
+#
+#   Contents:   Map (external version) from Mac OS Symbol
+#               character set to Unicode 4.0 and later.
+#
+#   Copyright:  (c) 1994-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Change mappings for 0xBD, 0xE0. Update
+#                           header comments. Matches internal xml <c1.2>
+#                           and Text Encoding Converter 2.0.
+#      b4,c1 2002-Dec-19    Update mappings for encoded glyph fragments
+#                           0xBE, 0xE6-EF, 0xF4, 0xF6-FE to use new
+#                           Unicode 3.2 characters instead of sequences
+#                           involving corporate-use characters. Update
+#                           URLs, notes. Matches internal utom<b4>.
+#       b03  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b3>, ufrm<b3>, and Text
+#                           Encoding Converter version 1.5.
+#       b02  1998-Aug-18    Encoding changed for Mac OS 8.5; add new
+#                           mapping from 0xA0 to EURO SIGN. Matches
+#                           internal utom<b3>, ufrm<b3>.
+#       n05  1998-Feb-05    Update to match internal utom<n5>, ufrm<n15>
+#                           and Text Encoding Converter version 1.3:
+#                           Use standard Unicodes plus transcoding hints
+#                           instead of single corporate characters, also
+#                           change mappings for 0xE1 & 0xF1 from U+2329
+#                           & U+232A to their canonical decompositions;
+#                           see details below. Also update header
+#                           comments to new format.
+#       n03  1995-Apr-15    First version (after fixing some typos).
+#                           Matches internal ufrm<n4>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Symbol code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode or Unicode sequence
+#       (in hex as 0xNNNN or 0xNNNN+0xNNNN).
+#     Column #3 is a comment containing the Unicode name.
+#       In some cases an additional comment follows the Unicode name.
+#
+#   The entries are in Mac OS Symbol code order.
+#
+#   Some of these mappings require the use of corporate characters.
+#   See the file "CORPCHAR.TXT" and notes below.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Symbol character set uses the standard control characters
+#   at 0x00-0x1F and 0x7F.
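+#
+#   A reader for this three-column format might be sketched as below.
+#   This is a hypothetical illustration, not an official parser: the
+#   function names are invented, the sample lines are written in the
+#   style described above rather than copied from the table, and
+#   extra notation used in some other Apple tables (such as direction
+#   tags) is not handled.

```python
def parse_mapping_line(line: str):
    """Parse one table line; return (code, unicode_string) or None."""
    # Everything from '#' to the end of the line is a comment.
    body = line.split("#", 1)[0].strip()
    if not body:
        return None
    fields = body.split("\t")
    code = int(fields[0], 16)  # column 1: Mac OS code, e.g. 0xE2
    # Column 2: one code point, or several joined with '+'.
    unicode_seq = "".join(chr(int(part, 16)) for part in fields[1].split("+"))
    return code, unicode_seq


def parse_mapping_table(text: str) -> dict:
    """Build a dict from Mac OS code to Unicode string."""
    table = {}
    for line in text.splitlines():
        entry = parse_mapping_line(line)
        if entry is not None:
            code, seq = entry
            table[code] = seq
    return table


# Illustrative sample lines (comments are not quoted from the table).
SAMPLE = (
    "# comment-only lines are skipped\n"
    "0xA0\t0x20AC\t# EURO SIGN\n"
    "0xE2\t0x00AE+0xF87F\t# REGISTERED SIGN, sans-serif variant\n"
)
```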
+#
+# Notes on Mac OS Symbol:
+# -----------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported directly in programming
+#   interfaces for QuickDraw Text, the Script Manager, and related
+#   Text Utilities. For other purposes it is supported via transcoding
+#   to and from Unicode.
+#
+#   The Mac OS Symbol encoding shares the script code smRoman
+#   (0) with the Mac OS Roman encoding. To determine if the Symbol
+#   encoding is being used, you must check if the font name is
+#   "Symbol".
+#
+#   Before Mac OS 8.5, code point 0xA0 was unused. In Mac OS 8.5
+#   and later versions, code point 0xA0 is EURO SIGN and maps to
+#   U+20AC (the Symbol font is updated for Mac OS 8.5 to reflect
+#   this).
+#
+#   The layout of the Mac OS Symbol character set is identical to
+#   the layout of the Adobe Symbol encoding vector, with the
+#   addition of the Apple logo character at 0xF0.
+#
+#   This character set encodes a number of glyph fragments. Some are
+#   used as extenders: 0x60 is used to extend radical signs, 0xBD and
+#   0xBE are used to extend vertical and horizontal arrows, etc. In
+#   addition, there are top, bottom, and center sections for
+#   parentheses, brackets, integral signs, and other signs that may
+#   extend vertically for 2 or more lines of normal text. As of
+#   Unicode 3.2, most of these are now encoded in Unicode; a few are
+#   not, so these are mapped using corporate-zone Unicode characters
+#   (see below).
+#
+#   In addition, Symbol separately encodes both serif and sans-serif
+#   forms for copyright, trademark, and registered signs. Unicode
+#   encodes only the abstract characters, so one set of these (the
+#   sans-serif forms) is also mapped using corporate-zone Unicode
+#   characters (see below).
+#
+#   The following code points are unused, and are not shown here:
+#   0x80-0x9F, 0xFF.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+#   The goals in the mappings provided here are:
+#   - Ensure roundtrip mapping from every character in the Mac OS
+#     Symbol character set to Unicode and back
+#   - Use standard Unicode characters as much as possible, to
+#     maximize interchangeability of the resulting Unicode text.
+#     Whenever possible, avoid having content carried by private-use
+#     characters.
+#
+#   Some of the characters in the Mac OS Symbol character set do not
+#   correspond to distinct, single Unicode characters. To map these
+#   and satisfy both goals above, we employ various strategies.
+#
+#   a) If possible, use private use characters in combination with
+#   standard Unicode characters to mark variants of the standard
+#   Unicode character.
+#
+#   Apple has defined a block of 32 corporate characters as "transcoding
+#   hints." These are used in combination with standard Unicode
+#   characters to force them to be treated in a special way for mapping
+#   to other encodings; they have no other effect. Sixteen of these
+#   transcoding hints are "grouping hints" - they indicate that the next
+#   2-4 Unicode characters should be treated as a single entity for
+#   transcoding. The other sixteen transcoding hints are "variant tags"
+#   - they are like combining characters, and can follow a standard
+#   Unicode character (or a sequence consisting of a base character
+#   and other combining characters) to cause it to be treated in a
+#   special way for transcoding. These always terminate a
+#   combining-character sequence.
+#
+#   The transcoding hint used in this mapping table is the
+#   variant tag 0xF87F. Since this is combined with standard Unicode
+#   characters, some characters in the Mac OS Symbol character set map
+#   to a sequence of two Unicodes instead of a single Unicode character.
+#
+#   For example, the Mac OS Symbol character at 0xE2 is an alternate,
+#   sans-serif form of the REGISTERED SIGN (the standard mapping is for
+#   the abstract character at 0xD2, which here has a serif form). So 0xE2
+#   is mapped to 0x00AE (REGISTERED SIGN) + 0xF87F (a variant tag).
+#
+#   b) Otherwise, use private use characters by themselves to map
+#   Mac OS Symbol characters which have no relationship to any standard
+#   Unicode character.
+#
+#   The following additional corporate zone Unicode characters are
+#   used for this purpose here:
+#
+#     0xF8E5  radical extender
+#     0xF8FF  Apple logo
+#
+#   NOTE: The graphic image associated with the Apple logo character
+#   is not authorized for use without permission of Apple, and
+#   unauthorized use might constitute trademark infringement.
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version c01 to version c02:
+#
+#   - Update mappings for 0xBD from 0xF8E6 to 0x23D0 (use new Unicode
+#     4.0 char)
+#   - Correct mapping for 0xE0 from 0x22C4 to 0x25CA
+#
+#   Changes from version b02 to version b03/c01:
+#
+#   - Update mappings for encoded glyph fragments 0xBE, 0xE6-0xEF, 0xF4,
+#     0xF6-0xFE to use new Unicode 3.2 characters instead of using either
+#     single corporate-use characters (e.g. 0xBE was mapped to 0xF8E7) or
+#     sequences combining a standard Unicode character with a transcoding
+#     hint (e.g. 0xE6 was mapped to 0x0028+0xF870).
+#
+#   Changes from version n05 to version b02:
+#
+#   - Encoding changed for Mac OS 8.5; 0xA0 now maps to 0x20AC, EURO
+#   SIGN. 0xA0 was unmapped in earlier versions.
+#
+#   Changes from version n03 to version n05:
+#
+#   - Change strict mapping for 0xE1 & 0xF1 from U+2329 & U+232A
+#     to their canonical decompositions, U+3008 & U+3009.
+#
+#   - Change mapping for the following to use standard Unicode +
+#     transcoding hint, instead of single corporate-zone
+#     character: 0xE2-0xE4, 0xE6-0xEE, 0xF4, 0xF6-0xFE.
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x2200	# FOR ALL
+0x23	0x0023	# NUMBER SIGN
+0x24	0x2203	# THERE EXISTS
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x220D	# SMALL CONTAINS AS MEMBER
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x2217	# ASTERISK OPERATOR
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x2212	# MINUS SIGN
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x2245	# APPROXIMATELY EQUAL TO
+0x41	0x0391	# GREEK CAPITAL LETTER ALPHA
+0x42	0x0392	# GREEK CAPITAL LETTER BETA
+0x43	0x03A7	# GREEK CAPITAL LETTER CHI
+0x44	0x0394	# GREEK CAPITAL LETTER DELTA
+0x45	0x0395	# GREEK CAPITAL LETTER EPSILON
+0x46	0x03A6	# GREEK CAPITAL LETTER PHI
+0x47	0x0393	# GREEK CAPITAL LETTER GAMMA
+0x48	0x0397	# GREEK CAPITAL LETTER ETA
+0x49	0x0399	# GREEK CAPITAL LETTER IOTA
+0x4A	0x03D1	# GREEK THETA SYMBOL
+0x4B	0x039A	# GREEK CAPITAL LETTER KAPPA
+0x4C	0x039B	# GREEK CAPITAL LETTER LAMDA
+0x4D	0x039C	# GREEK CAPITAL LETTER MU
+0x4E	0x039D	# GREEK CAPITAL LETTER NU
+0x4F	0x039F	# GREEK CAPITAL LETTER OMICRON
+0x50	0x03A0	# GREEK CAPITAL LETTER PI
+0x51	0x0398	# GREEK CAPITAL LETTER THETA
+0x52	0x03A1	# GREEK CAPITAL LETTER RHO
+0x53	0x03A3	# GREEK CAPITAL LETTER SIGMA
+0x54	0x03A4	# GREEK CAPITAL LETTER TAU
+0x55	0x03A5	# GREEK CAPITAL LETTER UPSILON
+0x56	0x03C2	# GREEK SMALL LETTER FINAL SIGMA
+0x57	0x03A9	# GREEK CAPITAL LETTER OMEGA
+0x58	0x039E	# GREEK CAPITAL LETTER XI
+0x59	0x03A8	# GREEK CAPITAL LETTER PSI
+0x5A	0x0396	# GREEK CAPITAL LETTER ZETA
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x2234	# THEREFORE
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x22A5	# UP TACK
+0x5F	0x005F	# LOW LINE
+0x60	0xF8E5	# radical extender # corporate char
+0x61	0x03B1	# GREEK SMALL LETTER ALPHA
+0x62	0x03B2	# GREEK SMALL LETTER BETA
+0x63	0x03C7	# GREEK SMALL LETTER CHI
+0x64	0x03B4	# GREEK SMALL LETTER DELTA
+0x65	0x03B5	# GREEK SMALL LETTER EPSILON
+0x66	0x03C6	# GREEK SMALL LETTER PHI
+0x67	0x03B3	# GREEK SMALL LETTER GAMMA
+0x68	0x03B7	# GREEK SMALL LETTER ETA
+0x69	0x03B9	# GREEK SMALL LETTER IOTA
+0x6A	0x03D5	# GREEK PHI SYMBOL
+0x6B	0x03BA	# GREEK SMALL LETTER KAPPA
+0x6C	0x03BB	# GREEK SMALL LETTER LAMDA
+0x6D	0x03BC	# GREEK SMALL LETTER MU
+0x6E	0x03BD	# GREEK SMALL LETTER NU
+0x6F	0x03BF	# GREEK SMALL LETTER OMICRON
+0x70	0x03C0	# GREEK SMALL LETTER PI
+0x71	0x03B8	# GREEK SMALL LETTER THETA
+0x72	0x03C1	# GREEK SMALL LETTER RHO
+0x73	0x03C3	# GREEK SMALL LETTER SIGMA
+0x74	0x03C4	# GREEK SMALL LETTER TAU
+0x75	0x03C5	# GREEK SMALL LETTER UPSILON
+0x76	0x03D6	# GREEK PI SYMBOL
+0x77	0x03C9	# GREEK SMALL LETTER OMEGA
+0x78	0x03BE	# GREEK SMALL LETTER XI
+0x79	0x03C8	# GREEK SMALL LETTER PSI
+0x7A	0x03B6	# GREEK SMALL LETTER ZETA
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x223C	# TILDE OPERATOR
+#
+0xA0	0x20AC	# EURO SIGN
+0xA1	0x03D2	# GREEK UPSILON WITH HOOK SYMBOL
+0xA2	0x2032	# PRIME # minute
+0xA3	0x2264	# LESS-THAN OR EQUAL TO
+0xA4	0x2044	# FRACTION SLASH
+0xA5	0x221E	# INFINITY
+0xA6	0x0192	# LATIN SMALL LETTER F WITH HOOK
+0xA7	0x2663	# BLACK CLUB SUIT
+0xA8	0x2666	# BLACK DIAMOND SUIT
+0xA9	0x2665	# BLACK HEART SUIT
+0xAA	0x2660	# BLACK SPADE SUIT
+0xAB	0x2194	# LEFT RIGHT ARROW
+0xAC	0x2190	# LEFTWARDS ARROW
+0xAD	0x2191	# UPWARDS ARROW
+0xAE	0x2192	# RIGHTWARDS ARROW
+0xAF	0x2193	# DOWNWARDS ARROW
+0xB0	0x00B0	# DEGREE SIGN
+0xB1	0x00B1	# PLUS-MINUS SIGN
+0xB2	0x2033	# DOUBLE PRIME # second
+0xB3	0x2265	# GREATER-THAN OR EQUAL TO
+0xB4	0x00D7	# MULTIPLICATION SIGN
+0xB5	0x221D	# PROPORTIONAL TO
+0xB6	0x2202	# PARTIAL DIFFERENTIAL
+0xB7	0x2022	# BULLET
+0xB8	0x00F7	# DIVISION SIGN
+0xB9	0x2260	# NOT EQUAL TO
+0xBA	0x2261	# IDENTICAL TO
+0xBB	0x2248	# ALMOST EQUAL TO
+0xBC	0x2026	# HORIZONTAL ELLIPSIS
+0xBD	0x23D0	# VERTICAL LINE EXTENSION (for arrows) # for Unicode 4.0 and later
+0xBE	0x23AF	# HORIZONTAL LINE EXTENSION (for arrows) # for Unicode 3.2 and later
+0xBF	0x21B5	# DOWNWARDS ARROW WITH CORNER LEFTWARDS
+0xC0	0x2135	# ALEF SYMBOL
+0xC1	0x2111	# BLACK-LETTER CAPITAL I
+0xC2	0x211C	# BLACK-LETTER CAPITAL R
+0xC3	0x2118	# SCRIPT CAPITAL P
+0xC4	0x2297	# CIRCLED TIMES
+0xC5	0x2295	# CIRCLED PLUS
+0xC6	0x2205	# EMPTY SET
+0xC7	0x2229	# INTERSECTION
+0xC8	0x222A	# UNION
+0xC9	0x2283	# SUPERSET OF
+0xCA	0x2287	# SUPERSET OF OR EQUAL TO
+0xCB	0x2284	# NOT A SUBSET OF
+0xCC	0x2282	# SUBSET OF
+0xCD	0x2286	# SUBSET OF OR EQUAL TO
+0xCE	0x2208	# ELEMENT OF
+0xCF	0x2209	# NOT AN ELEMENT OF
+0xD0	0x2220	# ANGLE
+0xD1	0x2207	# NABLA
+0xD2	0x00AE	# REGISTERED SIGN # serif
+0xD3	0x00A9	# COPYRIGHT SIGN # serif
+0xD4	0x2122	# TRADE MARK SIGN # serif
+0xD5	0x220F	# N-ARY PRODUCT
+0xD6	0x221A	# SQUARE ROOT
+0xD7	0x22C5	# DOT OPERATOR
+0xD8	0x00AC	# NOT SIGN
+0xD9	0x2227	# LOGICAL AND
+0xDA	0x2228	# LOGICAL OR
+0xDB	0x21D4	# LEFT RIGHT DOUBLE ARROW
+0xDC	0x21D0	# LEFTWARDS DOUBLE ARROW
+0xDD	0x21D1	# UPWARDS DOUBLE ARROW
+0xDE	0x21D2	# RIGHTWARDS DOUBLE ARROW
+0xDF	0x21D3	# DOWNWARDS DOUBLE ARROW
+0xE0	0x25CA	# LOZENGE # previously mapped to 0x22C4 DIAMOND OPERATOR
+0xE1	0x3008	# LEFT ANGLE BRACKET
+0xE2	0x00AE+0xF87F	# REGISTERED SIGN, alternate: sans serif
+0xE3	0x00A9+0xF87F	# COPYRIGHT SIGN, alternate: sans serif
+0xE4	0x2122+0xF87F	# TRADE MARK SIGN, alternate: sans serif
+0xE5	0x2211	# N-ARY SUMMATION
+0xE6	0x239B	# LEFT PARENTHESIS UPPER HOOK # for Unicode 3.2 and later
+0xE7	0x239C	# LEFT PARENTHESIS EXTENSION # for Unicode 3.2 and later
+0xE8	0x239D	# LEFT PARENTHESIS LOWER HOOK # for Unicode 3.2 and later
+0xE9	0x23A1	# LEFT SQUARE BRACKET UPPER CORNER # for Unicode 3.2 and later
+0xEA	0x23A2	# LEFT SQUARE BRACKET EXTENSION # for Unicode 3.2 and later
+0xEB	0x23A3	# LEFT SQUARE BRACKET LOWER CORNER # for Unicode 3.2 and later
+0xEC	0x23A7	# LEFT CURLY BRACKET UPPER HOOK # for Unicode 3.2 and later
+0xED	0x23A8	# LEFT CURLY BRACKET MIDDLE PIECE # for Unicode 3.2 and later
+0xEE	0x23A9	# LEFT CURLY BRACKET LOWER HOOK # for Unicode 3.2 and later
+0xEF	0x23AA	# CURLY BRACKET EXTENSION # for Unicode 3.2 and later
+0xF0	0xF8FF	# Apple logo
+0xF1	0x3009	# RIGHT ANGLE BRACKET
+0xF2	0x222B	# INTEGRAL
+0xF3	0x2320	# TOP HALF INTEGRAL
+0xF4	0x23AE	# INTEGRAL EXTENSION # for Unicode 3.2 and later
+0xF5	0x2321	# BOTTOM HALF INTEGRAL
+0xF6	0x239E	# RIGHT PARENTHESIS UPPER HOOK # for Unicode 3.2 and later
+0xF7	0x239F	# RIGHT PARENTHESIS EXTENSION # for Unicode 3.2 and later
+0xF8	0x23A0	# RIGHT PARENTHESIS LOWER HOOK # for Unicode 3.2 and later
+0xF9	0x23A4	# RIGHT SQUARE BRACKET UPPER CORNER # for Unicode 3.2 and later
+0xFA	0x23A5	# RIGHT SQUARE BRACKET EXTENSION # for Unicode 3.2 and later
+0xFB	0x23A6	# RIGHT SQUARE BRACKET LOWER CORNER # for Unicode 3.2 and later
+0xFC	0x23AB	# RIGHT CURLY BRACKET UPPER HOOK # for Unicode 3.2 and later
+0xFD	0x23AC	# RIGHT CURLY BRACKET MIDDLE PIECE # for Unicode 3.2 and later
+0xFE	0x23AD	# RIGHT CURLY BRACKET LOWER HOOK # for Unicode 3.2 and later
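The table lines above follow the tab-separated layout used throughout these files: a Mac OS code, then one Unicode value or a `+`-joined sequence, then a comment. As a minimal sketch (not Apple's converter), the following Python reads such lines and decodes bytes with the result. The names `parse_mapping`, `decode`, and `SAMPLE` are hypothetical. Tolerating a leading `+` diff marker and substituting U+FFFD for unmapped bytes are assumptions made for illustration only.

```python
# Illustrative sketch only; helper names are hypothetical, not from the source.
SAMPLE = (
    "+#   comment lines and blank lines are skipped\n"
    "+0xD2\t0x00AE\t# REGISTERED SIGN # serif\n"
    "+0xE2\t0x00AE+0xF87F\t# REGISTERED SIGN, alternate: sans serif\n"
    "+0xF0\t0xF8FF\t# Apple logo\n"
)


def parse_mapping(text):
    """Return {mac_code: unicode_string} for the tab-separated table lines."""
    table = {}
    for raw in text.splitlines():
        line = raw.lstrip("+")  # assumption: tolerate the diff's '+' marker
        if not line.strip() or line.startswith("#"):
            continue  # header comments and blank separators carry no mapping
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        mac_code = int(fields[0], 16)  # base 16 accepts the "0x" prefix
        # Column 2 is either 0xNNNN or a sequence such as 0xNNNN+0xNNNN.
        code_points = [int(cp, 16) for cp in fields[1].split("+")]
        table[mac_code] = "".join(chr(cp) for cp in code_points)
    return table


def decode(data, table, missing="\ufffd"):
    """Map each byte through the table; unmapped bytes become `missing`."""
    return "".join(table.get(byte, missing) for byte in data)


table = parse_mapping(SAMPLE)
text = decode(b"\xd2\xe2", table)  # serif form, then the sans-serif variant
```

Keeping each mapped value as a string rather than a single code point lets entries such as 0xE2, which map to a base character plus the variant tag 0xF87F, be handled the same way as one-to-one entries.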
--- a/charmap/THAI.TXT
+++ b/charmap/THAI.TXT
@ -0,0 +1,384 @@
+#=======================================================================
+#   File name:  THAI.TXT
+#
+#   Contents:   Map (external version) from Mac OS Thai
+#               character set to Unicode 3.2 and later.
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Update mapping for 0xDB to use new Unicode
+#                           3.2 WORD JOINER instead of ZWNBSP (BOM).
+#                           Update URLs. Matches internal utom<b3>.
+#       b02  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b1>, ufrm<b2>, and Text
+#                           Encoding Converter version 1.5.
+#       n07  1998-Feb-05    Update to match internal utom<n5>, ufrm<n13>
+#                           and Text Encoding Converter version 1.3:
+#                           Use standard Unicodes plus transcoding hints
+#                           instead of single corporate characters; see
+#                           details below. Also update header comments
+#                           to new format.
+#       n04  1995-Nov-17    First version (after fixing some typos).
+#                           Matches internal ufrm<n6>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Thai code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode or Unicode sequence
+#       (in hex as 0xNNNN or 0xNNNN+0xNNNN).
+#     Column #3 is a comment containing the Unicode name
+#
+#   The entries are in Mac OS Thai code order.
+#
+#   Some of these mappings require the use of corporate characters.
+#   See the file "CORPCHAR.TXT" and notes below.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Thai character set uses the standard control characters at
+#   0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Thai:
+# ---------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   Codes 0xA1-0xDA and 0xDF-0xFB are the character set from Thai
+#   standard TIS 620-2533, except that the following changes are
+#   made:
+#     0xEE is TRADE MARK SIGN (instead of THAI CHARACTER YAMAKKAN)
+#     0xFA is REGISTERED SIGN (instead of THAI CHARACTER ANGKHANKHU)
+#     0xFB is COPYRIGHT SIGN (instead of THAI CHARACTER KHOMUT)
+#
+#   Codes 0x80-0x82, 0x8D-0x8E, 0x91, 0x9D-0x9E, and 0xDB-0xDE are
+#   various additional punctuation marks (e.g. curly quotes,
+#   ellipsis), no-break space, and two special characters "word join"
+#   and "word break".
+#
+#   Codes 0x83-0x8C, 0x8F, and 0x92-0x9C are for positional variants
+#   of the upper vowels, tone marks, and other signs at 0xD1,
+#   0xD4-0xD7, and 0xE7-0xED. The positional variants would normally
+#   be considered presentation forms only and not characters. In most
+#   cases they are not typed directly; they are selected automatically
+#   at display time by the WorldScript software. However, using the
+#   Thai-DTP keyboard, the presentation forms can in fact be typed
+#   directly using dead keys. Thus they must be treated as real
+#   characters in the Mac OS Thai encoding. They are mapped using
+#   variant tags; see below.
+#
+#   Several code points are undefined and unused (they cannot be
+#   typed using any of the Mac OS Thai keyboard layouts): 0x90, 0x9F,
+#   0xFC-0xFE. These are not shown in the table below.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+#   The goals in the Apple mappings provided here are:
+#   - Ensure roundtrip mapping from every character in the Mac OS Thai
+#   character set to Unicode and back
+#   - Use standard Unicode characters as much as possible, to maximize
+#   interchangeability of the resulting Unicode text. Whenever possible,
+#   avoid having content carried by private-use characters.
+#
+#   To satisfy both goals, we use private use characters to mark variants
+#   that are similar to a sequence of one or more standard Unicode
+#   characters.
+#
+#   Apple has defined a block of 32 corporate characters as "transcoding
+#   hints." These are used in combination with standard Unicode characters
+#   to force them to be treated in a special way for mapping to other
+#   encodings; they have no other effect. Sixteen of these transcoding
+#   hints are "grouping hints" - they indicate that the next 2-4 Unicode
+#   characters should be treated as a single entity for transcoding. The
+#   other sixteen transcoding hints are "variant tags" - they are like
+#   combining characters, and can follow a standard Unicode character (or
+#   a sequence consisting of a base character and other combining
+#   characters) to cause it to be treated in a special way for
+#   transcoding. These always terminate a combining-character sequence.
+#
+#   The transcoding hints used in this mapping table are three variant
+#   tags in the range 0xF873-0xF875. Since these are combined with
+#   standard Unicode characters, some characters in the Mac OS Thai
+#   character set map to a sequence of two Unicodes instead of a single
+#   Unicode character. For example, the Mac OS Thai character at 0x83 is a
+#   low-left positional variant of THAI CHARACTER MAI EK (the standard
+#   mapping is for the abstract character at 0xE8). So 0x83 is mapped to
+#   0x0E48 (THAI CHARACTER MAI EK) + 0xF875 (a variant tag).
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version b02 to version b03/c01:
+#
+#   - Update mapping for 0xDB to use new Unicode 3.2 character U+2060
+#     WORD JOINER instead of U+FEFF ZERO WIDTH NO-BREAK SPACE (BOM)
+#
+#   Changes from version n04 to version n07:
+#
+#   - Changed mappings of the positional variants to use standard
+#   Unicodes + transcoding hint, instead of using single corporate
+#   zone characters. This affected the mappings for the following:
+#   0x83-0x8C, 0x8F, 0x92-0x9C
+#
+#   - Just comment out unused code points in the table, instead
+#   of mapping them to U+FFFD.
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+0x81	0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+0x82	0x2026	# HORIZONTAL ELLIPSIS
+0x83	0x0E48+0xF875	# THAI CHARACTER MAI EK, low left position
+0x84	0x0E49+0xF875	# THAI CHARACTER MAI THO, low left position
+0x85	0x0E4A+0xF875	# THAI CHARACTER MAI TRI, low left position
+0x86	0x0E4B+0xF875	# THAI CHARACTER MAI CHATTAWA, low left position
+0x87	0x0E4C+0xF875	# THAI CHARACTER THANTHAKHAT, low left position
+0x88	0x0E48+0xF873	# THAI CHARACTER MAI EK, low position
+0x89	0x0E49+0xF873	# THAI CHARACTER MAI THO, low position
+0x8A	0x0E4A+0xF873	# THAI CHARACTER MAI TRI, low position
+0x8B	0x0E4B+0xF873	# THAI CHARACTER MAI CHATTAWA, low position
+0x8C	0x0E4C+0xF873	# THAI CHARACTER THANTHAKHAT, low position
+0x8D	0x201C	# LEFT DOUBLE QUOTATION MARK
+0x8E	0x201D	# RIGHT DOUBLE QUOTATION MARK
+0x8F	0x0E4D+0xF874	# THAI CHARACTER NIKHAHIT, left position
+#
+0x91	0x2022	# BULLET
+0x92	0x0E31+0xF874	# THAI CHARACTER MAI HAN-AKAT, left position
+0x93	0x0E47+0xF874	# THAI CHARACTER MAITAIKHU, left position
+0x94	0x0E34+0xF874	# THAI CHARACTER SARA I, left position
+0x95	0x0E35+0xF874	# THAI CHARACTER SARA II, left position
+0x96	0x0E36+0xF874	# THAI CHARACTER SARA UE, left position
+0x97	0x0E37+0xF874	# THAI CHARACTER SARA UEE, left position
+0x98	0x0E48+0xF874	# THAI CHARACTER MAI EK, left position
+0x99	0x0E49+0xF874	# THAI CHARACTER MAI THO, left position
+0x9A	0x0E4A+0xF874	# THAI CHARACTER MAI TRI, left position
+0x9B	0x0E4B+0xF874	# THAI CHARACTER MAI CHATTAWA, left position
+0x9C	0x0E4C+0xF874	# THAI CHARACTER THANTHAKHAT, left position
+0x9D	0x2018	# LEFT SINGLE QUOTATION MARK
+0x9E	0x2019	# RIGHT SINGLE QUOTATION MARK
+#
+0xA0	0x00A0	# NO-BREAK SPACE
+0xA1	0x0E01	# THAI CHARACTER KO KAI
+0xA2	0x0E02	# THAI CHARACTER KHO KHAI
+0xA3	0x0E03	# THAI CHARACTER KHO KHUAT
+0xA4	0x0E04	# THAI CHARACTER KHO KHWAI
+0xA5	0x0E05	# THAI CHARACTER KHO KHON
+0xA6	0x0E06	# THAI CHARACTER KHO RAKHANG
+0xA7	0x0E07	# THAI CHARACTER NGO NGU
+0xA8	0x0E08	# THAI CHARACTER CHO CHAN
+0xA9	0x0E09	# THAI CHARACTER CHO CHING
+0xAA	0x0E0A	# THAI CHARACTER CHO CHANG
+0xAB	0x0E0B	# THAI CHARACTER SO SO
+0xAC	0x0E0C	# THAI CHARACTER CHO CHOE
+0xAD	0x0E0D	# THAI CHARACTER YO YING
+0xAE	0x0E0E	# THAI CHARACTER DO CHADA
+0xAF	0x0E0F	# THAI CHARACTER TO PATAK
+0xB0	0x0E10	# THAI CHARACTER THO THAN
+0xB1	0x0E11	# THAI CHARACTER THO NANGMONTHO
+0xB2	0x0E12	# THAI CHARACTER THO PHUTHAO
+0xB3	0x0E13	# THAI CHARACTER NO NEN
+0xB4	0x0E14	# THAI CHARACTER DO DEK
+0xB5	0x0E15	# THAI CHARACTER TO TAO
+0xB6	0x0E16	# THAI CHARACTER THO THUNG
+0xB7	0x0E17	# THAI CHARACTER THO THAHAN
+0xB8	0x0E18	# THAI CHARACTER THO THONG
+0xB9	0x0E19	# THAI CHARACTER NO NU
+0xBA	0x0E1A	# THAI CHARACTER BO BAIMAI
+0xBB	0x0E1B	# THAI CHARACTER PO PLA
+0xBC	0x0E1C	# THAI CHARACTER PHO PHUNG
+0xBD	0x0E1D	# THAI CHARACTER FO FA
+0xBE	0x0E1E	# THAI CHARACTER PHO PHAN
+0xBF	0x0E1F	# THAI CHARACTER FO FAN
+0xC0	0x0E20	# THAI CHARACTER PHO SAMPHAO
+0xC1	0x0E21	# THAI CHARACTER MO MA
+0xC2	0x0E22	# THAI CHARACTER YO YAK
+0xC3	0x0E23	# THAI CHARACTER RO RUA
+0xC4	0x0E24	# THAI CHARACTER RU
+0xC5	0x0E25	# THAI CHARACTER LO LING
+0xC6	0x0E26	# THAI CHARACTER LU
+0xC7	0x0E27	# THAI CHARACTER WO WAEN
+0xC8	0x0E28	# THAI CHARACTER SO SALA
+0xC9	0x0E29	# THAI CHARACTER SO RUSI
+0xCA	0x0E2A	# THAI CHARACTER SO SUA
+0xCB	0x0E2B	# THAI CHARACTER HO HIP
+0xCC	0x0E2C	# THAI CHARACTER LO CHULA
+0xCD	0x0E2D	# THAI CHARACTER O ANG
+0xCE	0x0E2E	# THAI CHARACTER HO NOKHUK
+0xCF	0x0E2F	# THAI CHARACTER PAIYANNOI
+0xD0	0x0E30	# THAI CHARACTER SARA A
+0xD1	0x0E31	# THAI CHARACTER MAI HAN-AKAT
+0xD2	0x0E32	# THAI CHARACTER SARA AA
+0xD3	0x0E33	# THAI CHARACTER SARA AM
+0xD4	0x0E34	# THAI CHARACTER SARA I
+0xD5	0x0E35	# THAI CHARACTER SARA II
+0xD6	0x0E36	# THAI CHARACTER SARA UE
+0xD7	0x0E37	# THAI CHARACTER SARA UEE
+0xD8	0x0E38	# THAI CHARACTER SARA U
+0xD9	0x0E39	# THAI CHARACTER SARA UU
+0xDA	0x0E3A	# THAI CHARACTER PHINTHU
+0xDB	0x2060	# WORD JOINER # for Unicode 3.2 and later
+0xDC	0x200B	# ZERO WIDTH SPACE
+0xDD	0x2013	# EN DASH
+0xDE	0x2014	# EM DASH
+0xDF	0x0E3F	# THAI CURRENCY SYMBOL BAHT
+0xE0	0x0E40	# THAI CHARACTER SARA E
+0xE1	0x0E41	# THAI CHARACTER SARA AE
+0xE2	0x0E42	# THAI CHARACTER SARA O
+0xE3	0x0E43	# THAI CHARACTER SARA AI MAIMUAN
+0xE4	0x0E44	# THAI CHARACTER SARA AI MAIMALAI
+0xE5	0x0E45	# THAI CHARACTER LAKKHANGYAO
+0xE6	0x0E46	# THAI CHARACTER MAIYAMOK
+0xE7	0x0E47	# THAI CHARACTER MAITAIKHU
+0xE8	0x0E48	# THAI CHARACTER MAI EK
+0xE9	0x0E49	# THAI CHARACTER MAI THO
+0xEA	0x0E4A	# THAI CHARACTER MAI TRI
+0xEB	0x0E4B	# THAI CHARACTER MAI CHATTAWA
+0xEC	0x0E4C	# THAI CHARACTER THANTHAKHAT
+0xED	0x0E4D	# THAI CHARACTER NIKHAHIT
+0xEE	0x2122	# TRADE MARK SIGN
+0xEF	0x0E4F	# THAI CHARACTER FONGMAN
+0xF0	0x0E50	# THAI DIGIT ZERO
+0xF1	0x0E51	# THAI DIGIT ONE
+0xF2	0x0E52	# THAI DIGIT TWO
+0xF3	0x0E53	# THAI DIGIT THREE
+0xF4	0x0E54	# THAI DIGIT FOUR
+0xF5	0x0E55	# THAI DIGIT FIVE
+0xF6	0x0E56	# THAI DIGIT SIX
+0xF7	0x0E57	# THAI DIGIT SEVEN
+0xF8	0x0E58	# THAI DIGIT EIGHT
+0xF9	0x0E59	# THAI DIGIT NINE
+0xFA	0x00AE	# REGISTERED SIGN
+0xFB	0x00A9	# COPYRIGHT SIGN
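The roundtrip goal stated in the THAI.TXT header means that a base character followed by a variant tag must map back to the positional-variant code, while the bare base character maps to the standard code. One way to sketch this is a greedy longest-match encoder over the reversed table. The snippet below hard-codes a few entries from the table above. The names `THAI_TO_UNICODE`, `encode`, and `decode` are hypothetical, and raising `ValueError` on unmapped input is an assumption, not documented converter behavior.

```python
# Illustrative sketch only; a handful of entries copied from the table above.
THAI_TO_UNICODE = {
    0x83: "\u0e48\uf875",  # MAI EK, low left position
    0x88: "\u0e48\uf873",  # MAI EK, low position
    0x98: "\u0e48\uf874",  # MAI EK, left position
    0xA1: "\u0e01",        # KO KAI
    0xE8: "\u0e48",        # MAI EK (abstract character)
}
UNICODE_TO_THAI = {u: b for b, u in THAI_TO_UNICODE.items()}
MAX_LEN = max(len(u) for u in UNICODE_TO_THAI)


def decode(data):
    """Mac OS Thai bytes -> Unicode string (KeyError on unmapped bytes)."""
    return "".join(THAI_TO_UNICODE[b] for b in data)


def encode(text):
    """Unicode string -> Mac OS Thai bytes, preferring the longest match."""
    out = bytearray()
    i = 0
    while i < len(text):
        # Try the two-character (base + variant tag) form before the base.
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            code = UNICODE_TO_THAI.get(text[i:i + n])
            if code is not None:
                out.append(code)
                i += n
                break
        else:
            raise ValueError(f"no Mac OS Thai mapping for U+{ord(text[i]):04X}")
    return bytes(out)


original = b"\xa1\x83\x88\x98\xe8"
roundtrip = encode(decode(original))
```

Matching the longest sequence first is what keeps 0x83 and 0xE8 distinct on the way back. A one-character-at-a-time encoder would emit 0xE8 for the base and then fail on the variant tag.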
--- a/charmap/TURKISH.TXT
+++ b/charmap/TURKISH.TXT
@ -0,0 +1,341 @@
+#=======================================================================
+#   File name:  TURKISH.TXT
+#
+#   Contents:   Map (external version) from Mac OS Turkish
+#               character set to Unicode 2.1 and later.
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments. Matches internal xml
+#                           <c1.1> and Text Encoding Converter 2.0.
+#      b3,c1 2002-Dec-19    Update URLs, notes. Matches internal
+#                           utom<b1>.
+#       b02  1999-Sep-22    Update contact e-mail address. Matches
+#                           internal utom<b1>, ufrm<b1>, and Text
+#                           Encoding Converter version 1.5.
+#       n05  1998-Feb-05    Minor update to header comments
+#       n03  1997-Dec-14    Update to match internal utom<n5>, ufrm<n15>:
+#                           Change standard mapping for 0xBD from U+2126
+#                           to its canonical decomposition, U+03A9.
+#       n02  1995-Apr-15    First version (after fixing some typos).
+#                           Matches internal ufrm<n4>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Format:
+# -------
+#
+#   Three tab-separated columns;
+#   '#' begins a comment which continues to the end of the line.
+#     Column #1 is the Mac OS Turkish code (in hex as 0xNN)
+#     Column #2 is the corresponding Unicode (in hex as 0xNNNN)
+#     Column #3 is a comment containing the Unicode name
+#
+#   The entries are in Mac OS Turkish code order.
+#
+#   Two of these mappings require the use of corporate characters.
+#   See the file "CORPCHAR.TXT" and notes below.
+#
+#   Control character mappings are not shown in this table, following
+#   the conventions of the standard UTC mapping tables. However, the
+#   Mac OS Turkish character set uses the standard control characters at
+#   0x00-0x1F and 0x7F.
+#
+# Notes on Mac OS Turkish:
+# ------------------------
+#
+#   This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
+#   environments, it is only supported via transcoding to and from
+#   Unicode.
+#
+#   Mac OS Turkish is used for Turkish.
+#
+#   The Mac OS Turkish encoding shares the script code smRoman
+#   (0) with the Mac OS Roman encoding. To determine if the Turkish
+#   encoding is being used, you must also check if the system region
+#   code is 24, verTurkey.
+#
+#   This character set is a variant of standard Mac OS Roman. It adds
+#   upper & lower G with breve, upper & lower S with cedilla, upper I
+#   with dot, and moves the dotless lower i from its position at 0xF5
+#   in standard Mac OS Roman to a position at 0xDD here (leaving the
+#   0xF5 code point undefined in Mac OS Turkish). This gives a total
+#   of 7 code point differences from standard Mac OS Roman.
+#
+# Unicode mapping issues and notes:
+# ---------------------------------
+#
+#   The following corporate zone Unicode characters are used in this
+#   mapping:
+#
+#     0xF8A0  undefined1, used to map the single undefined code point
+#             in Mac OS Turkish (to obtain roundtrip fidelity for all
+#             code points).
+#     0xF8FF  Apple logo
+#
+#   NOTE: The graphic image associated with the Apple logo character
+#   is not authorized for use without permission of Apple, and
+#   unauthorized use might constitute trademark infringement.
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version n02 to version n03:
+#
+#   - Change mapping of 0xBD from U+2126 to its canonical
+#     decomposition, U+03A9.
+#
+##################
+
+0x20	0x0020	# SPACE
+0x21	0x0021	# EXCLAMATION MARK
+0x22	0x0022	# QUOTATION MARK
+0x23	0x0023	# NUMBER SIGN
+0x24	0x0024	# DOLLAR SIGN
+0x25	0x0025	# PERCENT SIGN
+0x26	0x0026	# AMPERSAND
+0x27	0x0027	# APOSTROPHE
+0x28	0x0028	# LEFT PARENTHESIS
+0x29	0x0029	# RIGHT PARENTHESIS
+0x2A	0x002A	# ASTERISK
+0x2B	0x002B	# PLUS SIGN
+0x2C	0x002C	# COMMA
+0x2D	0x002D	# HYPHEN-MINUS
+0x2E	0x002E	# FULL STOP
+0x2F	0x002F	# SOLIDUS
+0x30	0x0030	# DIGIT ZERO
+0x31	0x0031	# DIGIT ONE
+0x32	0x0032	# DIGIT TWO
+0x33	0x0033	# DIGIT THREE
+0x34	0x0034	# DIGIT FOUR
+0x35	0x0035	# DIGIT FIVE
+0x36	0x0036	# DIGIT SIX
+0x37	0x0037	# DIGIT SEVEN
+0x38	0x0038	# DIGIT EIGHT
+0x39	0x0039	# DIGIT NINE
+0x3A	0x003A	# COLON
+0x3B	0x003B	# SEMICOLON
+0x3C	0x003C	# LESS-THAN SIGN
+0x3D	0x003D	# EQUALS SIGN
+0x3E	0x003E	# GREATER-THAN SIGN
+0x3F	0x003F	# QUESTION MARK
+0x40	0x0040	# COMMERCIAL AT
+0x41	0x0041	# LATIN CAPITAL LETTER A
+0x42	0x0042	# LATIN CAPITAL LETTER B
+0x43	0x0043	# LATIN CAPITAL LETTER C
+0x44	0x0044	# LATIN CAPITAL LETTER D
+0x45	0x0045	# LATIN CAPITAL LETTER E
+0x46	0x0046	# LATIN CAPITAL LETTER F
+0x47	0x0047	# LATIN CAPITAL LETTER G
+0x48	0x0048	# LATIN CAPITAL LETTER H
+0x49	0x0049	# LATIN CAPITAL LETTER I
+0x4A	0x004A	# LATIN CAPITAL LETTER J
+0x4B	0x004B	# LATIN CAPITAL LETTER K
+0x4C	0x004C	# LATIN CAPITAL LETTER L
+0x4D	0x004D	# LATIN CAPITAL LETTER M
+0x4E	0x004E	# LATIN CAPITAL LETTER N
+0x4F	0x004F	# LATIN CAPITAL LETTER O
+0x50	0x0050	# LATIN CAPITAL LETTER P
+0x51	0x0051	# LATIN CAPITAL LETTER Q
+0x52	0x0052	# LATIN CAPITAL LETTER R
+0x53	0x0053	# LATIN CAPITAL LETTER S
+0x54	0x0054	# LATIN CAPITAL LETTER T
+0x55	0x0055	# LATIN CAPITAL LETTER U
+0x56	0x0056	# LATIN CAPITAL LETTER V
+0x57	0x0057	# LATIN CAPITAL LETTER W
+0x58	0x0058	# LATIN CAPITAL LETTER X
+0x59	0x0059	# LATIN CAPITAL LETTER Y
+0x5A	0x005A	# LATIN CAPITAL LETTER Z
+0x5B	0x005B	# LEFT SQUARE BRACKET
+0x5C	0x005C	# REVERSE SOLIDUS
+0x5D	0x005D	# RIGHT SQUARE BRACKET
+0x5E	0x005E	# CIRCUMFLEX ACCENT
+0x5F	0x005F	# LOW LINE
+0x60	0x0060	# GRAVE ACCENT
+0x61	0x0061	# LATIN SMALL LETTER A
+0x62	0x0062	# LATIN SMALL LETTER B
+0x63	0x0063	# LATIN SMALL LETTER C
+0x64	0x0064	# LATIN SMALL LETTER D
+0x65	0x0065	# LATIN SMALL LETTER E
+0x66	0x0066	# LATIN SMALL LETTER F
+0x67	0x0067	# LATIN SMALL LETTER G
+0x68	0x0068	# LATIN SMALL LETTER H
+0x69	0x0069	# LATIN SMALL LETTER I
+0x6A	0x006A	# LATIN SMALL LETTER J
+0x6B	0x006B	# LATIN SMALL LETTER K
+0x6C	0x006C	# LATIN SMALL LETTER L
+0x6D	0x006D	# LATIN SMALL LETTER M
+0x6E	0x006E	# LATIN SMALL LETTER N
+0x6F	0x006F	# LATIN SMALL LETTER O
+0x70	0x0070	# LATIN SMALL LETTER P
+0x71	0x0071	# LATIN SMALL LETTER Q
+0x72	0x0072	# LATIN SMALL LETTER R
+0x73	0x0073	# LATIN SMALL LETTER S
+0x74	0x0074	# LATIN SMALL LETTER T
+0x75	0x0075	# LATIN SMALL LETTER U
+0x76	0x0076	# LATIN SMALL LETTER V
+0x77	0x0077	# LATIN SMALL LETTER W
+0x78	0x0078	# LATIN SMALL LETTER X
+0x79	0x0079	# LATIN SMALL LETTER Y
+0x7A	0x007A	# LATIN SMALL LETTER Z
+0x7B	0x007B	# LEFT CURLY BRACKET
+0x7C	0x007C	# VERTICAL LINE
+0x7D	0x007D	# RIGHT CURLY BRACKET
+0x7E	0x007E	# TILDE
+#
+0x80	0x00C4	# LATIN CAPITAL LETTER A WITH DIAERESIS
+0x81	0x00C5	# LATIN CAPITAL LETTER A WITH RING ABOVE
+0x82	0x00C7	# LATIN CAPITAL LETTER C WITH CEDILLA
+0x83	0x00C9	# LATIN CAPITAL LETTER E WITH ACUTE
+0x84	0x00D1	# LATIN CAPITAL LETTER N WITH TILDE
+0x85	0x00D6	# LATIN CAPITAL LETTER O WITH DIAERESIS
+0x86	0x00DC	# LATIN CAPITAL LETTER U WITH DIAERESIS
+0x87	0x00E1	# LATIN SMALL LETTER A WITH ACUTE
+0x88	0x00E0	# LATIN SMALL LETTER A WITH GRAVE
+0x89	0x00E2	# LATIN SMALL LETTER A WITH CIRCUMFLEX
+0x8A	0x00E4	# LATIN SMALL LETTER A WITH DIAERESIS
+0x8B	0x00E3	# LATIN SMALL LETTER A WITH TILDE
+0x8C	0x00E5	# LATIN SMALL LETTER A WITH RING ABOVE
+0x8D	0x00E7	# LATIN SMALL LETTER C WITH CEDILLA
+0x8E	0x00E9	# LATIN SMALL LETTER E WITH ACUTE
+0x8F	0x00E8	# LATIN SMALL LETTER E WITH GRAVE
+0x90	0x00EA	# LATIN SMALL LETTER E WITH CIRCUMFLEX
+0x91	0x00EB	# LATIN SMALL LETTER E WITH DIAERESIS
+0x92	0x00ED	# LATIN SMALL LETTER I WITH ACUTE
+0x93	0x00EC	# LATIN SMALL LETTER I WITH GRAVE
+0x94	0x00EE	# LATIN SMALL LETTER I WITH CIRCUMFLEX
+0x95	0x00EF	# LATIN SMALL LETTER I WITH DIAERESIS
+0x96	0x00F1	# LATIN SMALL LETTER N WITH TILDE
+0x97	0x00F3	# LATIN SMALL LETTER O WITH ACUTE
+0x98	0x00F2	# LATIN SMALL LETTER O WITH GRAVE
+0x99	0x00F4	# LATIN SMALL LETTER O WITH CIRCUMFLEX
+0x9A	0x00F6	# LATIN SMALL LETTER O WITH DIAERESIS
+0x9B	0x00F5	# LATIN SMALL LETTER O WITH TILDE
+0x9C	0x00FA	# LATIN SMALL LETTER U WITH ACUTE
+0x9D	0x00F9	# LATIN SMALL LETTER U WITH GRAVE
+0x9E	0x00FB	# LATIN SMALL LETTER U WITH CIRCUMFLEX
+0x9F	0x00FC	# LATIN SMALL LETTER U WITH DIAERESIS
+0xA0	0x2020	# DAGGER
+0xA1	0x00B0	# DEGREE SIGN
+0xA2	0x00A2	# CENT SIGN
+0xA3	0x00A3	# POUND SIGN
+0xA4	0x00A7	# SECTION SIGN
+0xA5	0x2022	# BULLET
+0xA6	0x00B6	# PILCROW SIGN
+0xA7	0x00DF	# LATIN SMALL LETTER SHARP S
+0xA8	0x00AE	# REGISTERED SIGN
+0xA9	0x00A9	# COPYRIGHT SIGN
+0xAA	0x2122	# TRADE MARK SIGN
+0xAB	0x00B4	# ACUTE ACCENT
+0xAC	0x00A8	# DIAERESIS
+0xAD	0x2260	# NOT EQUAL TO
+0xAE	0x00C6	# LATIN CAPITAL LETTER AE
+0xAF	0x00D8	# LATIN CAPITAL LETTER O WITH STROKE
+0xB0	0x221E	# INFINITY
+0xB1	0x00B1	# PLUS-MINUS SIGN
+0xB2	0x2264	# LESS-THAN OR EQUAL TO
+0xB3	0x2265	# GREATER-THAN OR EQUAL TO
+0xB4	0x00A5	# YEN SIGN
+0xB5	0x00B5	# MICRO SIGN
+0xB6	0x2202	# PARTIAL DIFFERENTIAL
+0xB7	0x2211	# N-ARY SUMMATION
+0xB8	0x220F	# N-ARY PRODUCT
+0xB9	0x03C0	# GREEK SMALL LETTER PI
+0xBA	0x222B	# INTEGRAL
+0xBB	0x00AA	# FEMININE ORDINAL INDICATOR
+0xBC	0x00BA	# MASCULINE ORDINAL INDICATOR
+0xBD	0x03A9	# GREEK CAPITAL LETTER OMEGA
+0xBE	0x00E6	# LATIN SMALL LETTER AE
+0xBF	0x00F8	# LATIN SMALL LETTER O WITH STROKE
+0xC0	0x00BF	# INVERTED QUESTION MARK
+0xC1	0x00A1	# INVERTED EXCLAMATION MARK
+0xC2	0x00AC	# NOT SIGN
+0xC3	0x221A	# SQUARE ROOT
+0xC4	0x0192	# LATIN SMALL LETTER F WITH HOOK
+0xC5	0x2248	# ALMOST EQUAL TO
+0xC6	0x2206	# INCREMENT
+0xC7	0x00AB	# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC8	0x00BB	# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+0xC9	0x2026	# HORIZONTAL ELLIPSIS
+0xCA	0x00A0	# NO-BREAK SPACE
+0xCB	0x00C0	# LATIN CAPITAL LETTER A WITH GRAVE
+0xCC	0x00C3	# LATIN CAPITAL LETTER A WITH TILDE
+0xCD	0x00D5	# LATIN CAPITAL LETTER O WITH TILDE
+0xCE	0x0152	# LATIN CAPITAL LIGATURE OE
+0xCF	0x0153	# LATIN SMALL LIGATURE OE
+0xD0	0x2013	# EN DASH
+0xD1	0x2014	# EM DASH
+0xD2	0x201C	# LEFT DOUBLE QUOTATION MARK
+0xD3	0x201D	# RIGHT DOUBLE QUOTATION MARK
+0xD4	0x2018	# LEFT SINGLE QUOTATION MARK
+0xD5	0x2019	# RIGHT SINGLE QUOTATION MARK
+0xD6	0x00F7	# DIVISION SIGN
+0xD7	0x25CA	# LOZENGE
+0xD8	0x00FF	# LATIN SMALL LETTER Y WITH DIAERESIS
+0xD9	0x0178	# LATIN CAPITAL LETTER Y WITH DIAERESIS
+0xDA	0x011E	# LATIN CAPITAL LETTER G WITH BREVE
+0xDB	0x011F	# LATIN SMALL LETTER G WITH BREVE
+0xDC	0x0130	# LATIN CAPITAL LETTER I WITH DOT ABOVE
+0xDD	0x0131	# LATIN SMALL LETTER DOTLESS I
+0xDE	0x015E	# LATIN CAPITAL LETTER S WITH CEDILLA
+0xDF	0x015F	# LATIN SMALL LETTER S WITH CEDILLA
+0xE0	0x2021	# DOUBLE DAGGER
+0xE1	0x00B7	# MIDDLE DOT
+0xE2	0x201A	# SINGLE LOW-9 QUOTATION MARK
+0xE3	0x201E	# DOUBLE LOW-9 QUOTATION MARK
+0xE4	0x2030	# PER MILLE SIGN
+0xE5	0x00C2	# LATIN CAPITAL LETTER A WITH CIRCUMFLEX
+0xE6	0x00CA	# LATIN CAPITAL LETTER E WITH CIRCUMFLEX
+0xE7	0x00C1	# LATIN CAPITAL LETTER A WITH ACUTE
+0xE8	0x00CB	# LATIN CAPITAL LETTER E WITH DIAERESIS
+0xE9	0x00C8	# LATIN CAPITAL LETTER E WITH GRAVE
+0xEA	0x00CD	# LATIN CAPITAL LETTER I WITH ACUTE
+0xEB	0x00CE	# LATIN CAPITAL LETTER I WITH CIRCUMFLEX
+0xEC	0x00CF	# LATIN CAPITAL LETTER I WITH DIAERESIS
+0xED	0x00CC	# LATIN CAPITAL LETTER I WITH GRAVE
+0xEE	0x00D3	# LATIN CAPITAL LETTER O WITH ACUTE
+0xEF	0x00D4	# LATIN CAPITAL LETTER O WITH CIRCUMFLEX
+0xF0	0xF8FF	# Apple logo
+0xF1	0x00D2	# LATIN CAPITAL LETTER O WITH GRAVE
+0xF2	0x00DA	# LATIN CAPITAL LETTER U WITH ACUTE
+0xF3	0x00DB	# LATIN CAPITAL LETTER U WITH CIRCUMFLEX
+0xF4	0x00D9	# LATIN CAPITAL LETTER U WITH GRAVE
+0xF5	0xF8A0	# undefined1
+0xF6	0x02C6	# MODIFIER LETTER CIRCUMFLEX ACCENT
+0xF7	0x02DC	# SMALL TILDE
+0xF8	0x00AF	# MACRON
+0xF9	0x02D8	# BREVE
+0xFA	0x02D9	# DOT ABOVE
+0xFB	0x02DA	# RING ABOVE
+0xFC	0x00B8	# CEDILLA
+0xFD	0x02DD	# DOUBLE ACUTE ACCENT
+0xFE	0x02DB	# OGONEK
+0xFF	0x02C7	# CARON
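Reader's note (not part of the vendored file): each TURKISH.TXT data row above has the form Mac OS byte, tab, Unicode code point, tab, "#" and the character name. The following is a minimal Python sketch of how such rows could be loaded into a decode table. The names parse_mapping, decode, SAMPLE and TABLE are hypothetical, the sample embeds only four rows copied from the table, and rows in any other form (for example multi-code-point sequences in other mapping files) are skipped rather than handled.

```python
def parse_mapping(text):
    """Return {mac_byte: code_point} for rows holding one code point each."""
    table = {}
    for line in text.splitlines():
        # Drop the trailing "# NAME" comment; pure comment lines become empty.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: anything else is out of scope for this sketch
        # int() with base 16 accepts the "0x" prefix used in the table.
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def decode(data, table):
    """Decode bytes with the table; bytes missing from it raise KeyError."""
    return "".join(chr(table[b]) for b in data)


# Four rows copied from the table, preceded by a comment line and a blank line.
SAMPLE = (
    "# sample rows\n"
    "\n"
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A\n"
    "0xDA\t0x011E\t# LATIN CAPITAL LETTER G WITH BREVE\n"
    "0xDD\t0x0131\t# LATIN SMALL LETTER DOTLESS I\n"
    "0xF5\t0xF8A0\t# undefined1\n"
)

TABLE = parse_mapping(SAMPLE)
```

With this table, decode(b"\xdaA\xdd", TABLE) yields the three characters U+011E, U+0041, U+0131. Byte 0xF5 maps to the corporate-zone code point U+F8A0, which is what lets the single undefined code point round-trip.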
--- a/charmap/UKRAINE.TXT
+++ b/charmap/UKRAINE.TXT
@ -0,0 +1,106 @@
+#=======================================================================
+#   File name:  UKRAINE.TXT
+#
+#   Contents:   Notes on Mac OS Ukrainian character set
+#
+#   Copyright:  (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
+#               reserved.
+#
+#   Contact:    charsets@apple.com
+#
+#   Changes:
+#
+#       c02  2005-Apr-05    Update header comments.
+#      b3,c1 2002-Dec-19    Update URLs. Matches internal utom<b1>.
+#       b02  1999-Sep-22    Encoding changed for Mac OS 9.0 to merge
+#                           with Mac OS Cyrillic and support EURO SIGN;
+#                           change mappings for 0xFF. For Mac OS 9.0
+#                           there is no longer a separate Mac OS
+#                           Ukrainian character set; the mappings are
+#                           in CYRILLIC.TXT. Update contact e-mail
+#                           address. Matches internal utom<b1>, ufrm<b1>,
+#                           and Text Encoding Converter version 1.5.
+#       n04  1998-Feb-05    Update header comments to new format; no
+#                           mapping changes.  Matches internal utom<2>,
+#                           ufrm<13>, and Text Encoding Converter
+#                           version 1.3.
+#       n02  1995-Apr-15    First version (after fixing some typos).
+#                           Matches internal ufrm<4>.
+#
+# Standard header:
+# ----------------
+#
+#   Apple, the Apple logo, and Macintosh are trademarks of Apple
+#   Computer, Inc., registered in the United States and other countries.
+#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
+#   throughout this document, "Macintosh" can be used to refer to
+#   Macintosh computers and "Unicode" can be used to refer to the
+#   Unicode standard.
+#
+#   Apple Computer, Inc. ("Apple") makes no warranty or representation,
+#   either express or implied, with respect to this document and the
+#   included data, its quality, accuracy, or fitness for a particular
+#   purpose. In no event will Apple be liable for direct, indirect,
+#   special, incidental, or consequential damages resulting from any
+#   defect or inaccuracy in this document or the included data.
+#
+#   These mapping tables and character lists are subject to change.
+#   The latest tables should be available from the following:
+#
+#   <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
+#
+#   For general information about Mac OS encodings and these mapping
+#   tables, see the file "README.TXT".
+#
+# Notes on Mac OS Ukrainian and Mac OS Cyrillic:
+# ----------------------------------------------
+#
+#   Before Mac OS 9.0, there were two separate Slavic Cyrillic
+#   encodings for the Mac OS:
+#
+#   1. The Cyrillic currency sign variant (used for localized Russian
+#      and Bulgarian systems), which had the following:
+#        0xA2  U+00A2 CENT SIGN
+#        0xB6  U+2202 PARTIAL DIFFERENTIAL
+#        0xFF  U+00A4 CURRENCY SIGN
+#
+#   2. The Ukrainian currency sign variant (used for localized Ukrainian
+#      systems and the pre-9.0 Cyrillic Language Kit), which had the
+#      following:
+#        0xA2  U+0490 CYRILLIC CAPITAL LETTER GHE WITH UPTURN
+#        0xB6  U+0491 CYRILLIC SMALL LETTER GHE WITH UPTURN
+#        0xFF  U+00A4 CURRENCY SIGN
+#
+#   Before Mac OS 9.0, the Ukrainian currency sign variant shared the
+#   script code smCyrillic (7) with the Cyrillic currency sign variant.
+#   The Ukrainian currency sign variant was used if either of the
+#   following was true:
+#   - The system region code was 62, verUkraine (indicates Ukrainian
+#     localized system), or
+#   - The system script was not 7, smCyrillic (indicates Cyrillic
+#     Language Kit instead of localized system).
+#
+#   For Mac OS 9.0 and later, both currency sign variants were replaced
+#   with a new Euro sign version of Mac OS Cyrillic, which is similar to
+#   the old Ukrainian currency sign variant but changes 0xFF to EURO
+#   SIGN. Mappings for this are in CYRILLIC.TXT.
+#
+#   Note: There is a common glyph variation in Ukrainian, in which the
+#   glyph for CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I may or
+#   may not have a dot above.
+#
+# Details of mapping changes in each version:
+# -------------------------------------------
+#
+#   Changes from version n04 to version b02:
+#
+#   - Encoding changed for Mac OS 9.0 to merge with Mac OS Cyrillic and
+#     support EURO SIGN; 0xFF changed from U+00A4 to U+20AC. For Mac OS
+#     9.0 there is no longer a separate Mac OS Ukrainian character set, so
+#     the mappings here are deleted; see the mappings in CYRILLIC.TXT.
+#
+##################
+
+##################
+# For mappings, see CYRILLIC.TXT
+##################
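Reader's note (not part of the vendored file): the UKRAINE.TXT header describes which Slavic Cyrillic variant applied before and after Mac OS 9.0. The rule can be sketched in Python as below. The constants 7 (smCyrillic) and 62 (verUkraine) and the three differing byte mappings are taken from the notes above; the function name pick_variant, the dictionary names, and the mac_os_9_or_later flag are hypothetical and only illustrate the selection logic. Only the three bytes that differ between variants are listed, not the full tables.

```python
SM_CYRILLIC = 7    # script code smCyrillic, from the notes above
VER_UKRAINE = 62   # region code verUkraine, from the notes above

# Only the code points that differ between the variants are listed here.
CYRILLIC_CURRENCY_VARIANT = {0xA2: 0x00A2, 0xB6: 0x2202, 0xFF: 0x00A4}
UKRAINIAN_CURRENCY_VARIANT = {0xA2: 0x0490, 0xB6: 0x0491, 0xFF: 0x00A4}
# Mac OS 9.0 and later: like the Ukrainian variant, but 0xFF is EURO SIGN.
EURO_SIGN_VARIANT = {**UKRAINIAN_CURRENCY_VARIANT, 0xFF: 0x20AC}


def pick_variant(region_code, system_script, mac_os_9_or_later=False):
    """Return the differing-byte mappings for the given system settings."""
    if mac_os_9_or_later:
        return EURO_SIGN_VARIANT
    # Pre-9.0: Ukrainian localized system, or Cyrillic Language Kit
    # running on a system whose script is not smCyrillic.
    if region_code == VER_UKRAINE or system_script != SM_CYRILLIC:
        return UKRAINIAN_CURRENCY_VARIANT
    return CYRILLIC_CURRENCY_VARIANT
```

For example, pick_variant(VER_UKRAINE, SM_CYRILLIC) selects the Ukrainian currency sign variant, while any call with mac_os_9_or_later=True selects the Euro sign mappings that now live in CYRILLIC.TXT.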