Mirror of https://github.com/depp/syncfiles.git, synced 2025-02-22 09:28:58 +00:00
Add Apple Unicode mapping data
Mirrored from ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE
parent db4187b65b
commit d2401b963a
charmap/ARABIC.TXT (normal file, 536 lines added)
@@ -0,0 +1,536 @@
#=======================================================================
# File name: ARABIC.TXT
#
# Contents: Map (external version) from Mac OS Arabic
# character set to Unicode 2.1 and later.
#
# Copyright: (c) 1994-2002, 2005 by Apple Computer, Inc., all rights
# reserved.
#
# Contact: charsets@apple.com
#
# Changes:
#
# c02 2005-Apr-04 Update header comments. Matches internal xml
# <c1.2> and Text Encoding Converter 2.0.
# b3,c1 2002-Dec-19 Add comments about character display and
# direction overrides. Update URLs, notes.
# Matches internal utom<b4>.
# b02 1999-Sep-22 Update contact e-mail address. Matches
# internal utom<b1>, ufrm<b1>, and Text
# Encoding Converter version 1.5.
# n10 1998-Feb-05 Show required Unicode character
# directionality in a different way. Matches
# internal utom<n4>, ufrm<n21>, and Text
# Encoding Converter version 1.3. Update
# header comments; include information on
# loose mapping of digits.
# n07 1997-Jul-17 Update to match internal utom<n2>, ufrm<n17>:
# Change standard mapping for 0xC0 from U+066D
# to U+274A. Add direction overrides to
# mappings for 0x25, 0x2C, 0x3B, 0x3F. Add
# information on variants.
# n03 1995-Apr-18 First version (after fixing some typos).
# Matches internal ufrm<n11>.
#
# Standard header:
# ----------------
#
# Apple, the Apple logo, and Macintosh are trademarks of Apple
# Computer, Inc., registered in the United States and other countries.
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
# throughout this document, "Macintosh" can be used to refer to
# Macintosh computers and "Unicode" can be used to refer to the
# Unicode standard.
#
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
# either express or implied, with respect to this document and the
# included data, its quality, accuracy, or fitness for a particular
# purpose. In no event will Apple be liable for direct, indirect,
# special, incidental, or consequential damages resulting from any
# defect or inaccuracy in this document or the included data.
#
# These mapping tables and character lists are subject to change.
# The latest tables should be available from the following:
#
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#
# For general information about Mac OS encodings and these mapping
# tables, see the file "README.TXT".
#
# Format:
# -------
#
# Three tab-separated columns;
# '#' begins a comment which continues to the end of the line.
# Column #1 is the Mac OS Arabic code (in hex as 0xNN).
# Column #2 is the corresponding Unicode (in hex as 0xNNNN),
# possibly preceded by a tag indicating required directionality
# (i.e. <LR>+0xNNNN or <RL>+0xNNNN).
# Column #3 is a comment containing the Unicode name.
#
# The entries are in Mac OS Arabic code order.
#
# Control character mappings are not shown in this table, following
# the conventions of the standard UTC mapping tables. However, the
# Mac OS Arabic character set uses the standard control characters at
# 0x00-0x1F and 0x7F.
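
The format described above could be read with a few lines of Python. This is only a minimal sketch, not a converter shipped with this repository: the names `parse_mapping` and `_LINE` are hypothetical, columns are assumed to be separated by any whitespace (tabs in the original file), and only single-code-point mappings with an optional `<LR>`/`<RL>` tag are handled, which is all ARABIC.TXT uses.

```python
import re

# One data line: Mac OS code, optional direction tag, Unicode value.
# Assumption: exactly one Unicode code point per line, as in ARABIC.TXT.
_LINE = re.compile(r"^0x([0-9A-Fa-f]{2})\s+(?:<(LR|RL)>\+)?0x([0-9A-Fa-f]{4})$")


def parse_mapping(text):
    """Return {mac_code: (direction_or_None, unicode_code_point)}."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # '#' starts a comment
        if not line:
            continue  # comment-only or blank line
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        code, direction, code_point = match.groups()
        table[int(code, 16)] = (direction, int(code_point, 16))
    return table


sample = (
    "# comment line\n"
    "0x2B\t<LR>+0x002B\t# PLUS SIGN, left-right\n"
    "0xAB\t<RL>+0x002B\t# PLUS SIGN, right-left\n"
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A\n"
)
table = parse_mapping(sample)
```

Keeping the direction tag next to the code point, rather than discarding it, leaves the information needed to tell 0x2B and 0xAB apart later.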
#
# Notes on Mac OS Arabic:
# -----------------------
#
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
# environments, it is only supported via transcoding to and from
# Unicode.
#
# 1. General
#
# The Mac OS Arabic character set is intended to cover Arabic as
# used in North Africa, the Arabian peninsula, and the Levant. It
# also contains several characters needed for Urdu and/or Farsi.
#
# The Mac OS Arabic character set is essentially a superset of ISO
# 8859-6. The 8859-6 code points that are interpreted differently
# in the Mac OS Arabic set are as follows:
# 0xA0 is NO-BREAK SPACE in 8859-6 and right-left SPACE in Mac OS
# Arabic; NO-BREAK SPACE is 0x81 in Mac OS Arabic.
# 0xA4 is CURRENCY SIGN in 8859-6 and right-left DOLLAR SIGN in
# Mac OS Arabic.
# 0xAD is SOFT HYPHEN in 8859-6 and right-left HYPHEN-MINUS in
# Mac OS Arabic.
# ISO 8859-6 specifies that codes 0x30-0x39 can be rendered either
# with European digit shapes or Arabic digit shapes. This is also
# true in Mac OS Arabic, which determines from context which digit
# shapes to use (see below).
#
# The Mac OS Arabic character set uses the C1 controls area and other
# code points which are undefined in ISO 8859-6 for additional
# graphic characters: additional Arabic letters for Farsi and Urdu,
# some accented Roman letters for European languages (such as French),
# and duplicates of some of the punctuation, symbols, and digits in
# the ASCII block. The duplicate punctuation, symbol, and digit
# characters have right-left directionality, while the ASCII versions
# have left-right directionality. See the next section for more
# information on this.
#
# Mac OS Arabic characters 0xEB-0xF2 are non-spacing/combining marks.
#
# 2. Directional characters and roundtrip fidelity
#
# The Mac OS Arabic character set was developed in 1986-1987. At that
# time the bidirectional line layout algorithm used in the Mac OS
# Arabic system was fairly simple; it used only a few direction
# classes (instead of the 19 now used in the Unicode bidirectional
# algorithm). In order to permit users to handle some tricky layout
# problems, certain punctuation and symbol characters were encoded
# twice, one with a left-right direction attribute and the other with
# a right-left direction attribute.
#
# For example, plus sign is encoded at 0x2B with a left-right
# attribute, and at 0xAB with a right-left attribute. However, there
# is only one PLUS SIGN character in Unicode. This leads to some
# interesting problems when mapping between Mac OS Arabic and Unicode;
# see below.
#
# A related problem is that even when a particular character is
# encoded only once in Mac OS Arabic, it may have a different
# direction attribute than the corresponding Unicode character.
#
# For example, the Mac OS Arabic character at 0x93 is HORIZONTAL
# ELLIPSIS with strong right-left direction. However, the Unicode
# character HORIZONTAL ELLIPSIS has direction class neutral.
#
# 3. Behavior of ASCII-range numbers in WorldScript
#
# Mac OS Arabic also has two sets of digit codes.
#
# The digits at 0x30-0x39 may be displayed using either European
# digit forms or Arabic digit forms, depending on context. If there
# is a "strong European" character such as a Latin letter on either
# side of a sequence consisting of digits 0x30-0x39 and possibly comma
# 0x2C or period 0x2E, then the characters will be displayed using
# European forms (this will happen even if there are neutral characters
# between the digits and the strong European character). Otherwise, the
# digits will be displayed using Arabic forms, the comma will be
# displayed as Arabic thousands separator, and the period as Arabic
# decimal separator. In any case, 0x2C, 0x2E, and 0x30-0x39 are always
# left-right.
#
# The digits at 0xB0-0xB9 are always displayed using Arabic digit
# shapes, and moreover, these digits always have strong right-left
# directionality. These are mainly intended for special layout
# purposes such as part numbers, etc.
#
# 4. Font variants
#
# The table in this file gives the Unicode mappings for the standard
# Mac OS Arabic encoding. This encoding is supported by the Cairo font
# (the system font for Arabic), and is the encoding supported by the
# text processing utilities. However, the other Arabic fonts actually
# implement slightly different encodings; this mainly affects the code
# points 0xAA and 0xC0. For these code points the standard Mac OS
# Arabic encoding has the following mappings:
# 0xAA -> <RL>+0x002A ASTERISK, right-left
# 0xC0 -> <RL>+0x274A EIGHT TEARDROP-SPOKED PROPELLER ASTERISK,
# right-left
# This mapping of 0xAA is consistent with the normal convention for
# Mac OS Arabic and Hebrew that the right-left duplicates have codes
# that are equal to the ASCII code of the left-right character plus
# 0x80. However, in all of the other fonts, 0xAA is MULTIPLICATION
# SIGN, and right-left ASTERISK may be at a different code point. The
# other variants are described below.
#
# The TrueType variant is used for most of the Arabic TrueType fonts:
# Baghdad, Geeza, Kufi, Nadeem. It differs from the standard variant
# in the following way:
# 0xAA -> <RL>+0x00D7 MULTIPLICATION SIGN, right-left
# 0xC0 -> <RL>+0x002A ASTERISK, right-left
#
# The Thuluth variant is used for the Arabic Postscript-only fonts:
# Thuluth and Thuluth bold. It differs from the standard variant in
# the following way:
# 0xAA -> <RL>+0x00D7 MULTIPLICATION SIGN, right-left
# 0xC0 -> 0x066D ARABIC FIVE POINTED STAR
#
# The AlBayan variant is used for the Arabic TrueType font Al Bayan.
# It differs from the standard variant in the following way:
# 0x81 -> no mapping (glyph just has authorship information, etc.)
# 0xA3 -> 0xFDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
# 0xA4 -> 0xFDF2 ARABIC LIGATURE ALLAH ISOLATED FORM
# 0xAA -> <RL>+0x00D7 MULTIPLICATION SIGN, right-left
# 0xDC -> <RL>+0x25CF BLACK CIRCLE, right-left
# 0xFC -> <RL>+0x25A0 BLACK SQUARE, right-left
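
One way to model the font variants listed above is as small override dictionaries layered on the standard table. The sketch below is illustrative only: `VARIANTS` and `table_for` are hypothetical names, entries are `(direction tag, code point)` pairs, and `None` stands for "no mapping".

```python
# Per-variant differences from the standard table, copied from the lists above.
VARIANTS = {
    "TrueType": {0xAA: ("RL", 0x00D7), 0xC0: ("RL", 0x002A)},
    "Thuluth": {0xAA: ("RL", 0x00D7), 0xC0: (None, 0x066D)},
    "AlBayan": {
        0x81: None,  # no mapping in this variant
        0xA3: (None, 0xFDFA),
        0xA4: (None, 0xFDF2),
        0xAA: ("RL", 0x00D7),
        0xDC: ("RL", 0x25CF),
        0xFC: ("RL", 0x25A0),
    },
}


def table_for(variant, standard):
    """Return a copy of the standard table with one variant's changes applied."""
    table = dict(standard)
    for code, entry in VARIANTS.get(variant, {}).items():
        if entry is None:
            table.pop(code, None)  # the variant has no mapping for this code
        else:
            table[code] = entry
    return table


# Excerpt of the standard table, just enough to show the effect.
STANDARD = {0x81: ("RL", 0x00A0), 0xAA: ("RL", 0x002A), 0xC0: ("RL", 0x274A)}
thuluth = table_for("Thuluth", STANDARD)
```

An unknown variant name falls back to the standard table unchanged, which matches the idea that the standard encoding is the default.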
#
# Unicode mapping issues and notes:
# ---------------------------------
#
# 1. Matching the direction of Mac OS Arabic characters
#
# When Mac OS Arabic encodes a character twice but with different
# direction attributes for the two code points - as in the case of
# plus sign mentioned above - we need a way to map both Mac OS Arabic
# code points to Unicode and back again without loss of information.
# With the plus sign, for example, mapping one of the Mac OS Arabic
# characters to a code in the Unicode corporate use zone is
# undesirable, since both of the plus sign characters are likely to
# be used in text that is interchanged.
#
# The problem is solved with the use of direction override characters
# and direction-dependent mappings. When mapping from Mac OS Arabic
# to Unicode, we use direction overrides as necessary to force the
# direction of the resulting Unicode characters.
#
# The required direction is indicated by a direction tag in the
# mappings. A tag of <LR> means the corresponding Unicode character
# must have a strong left-right context, and a tag of <RL> indicates
# a right-left context.
#
# For example, the mapping of 0x2B is given as <LR>+0x002B; the
# mapping of 0xAB is given as <RL>+0x002B. If we map an isolated
# instance of 0x2B to Unicode, it should be mapped as follows (LRO
# indicates LEFT-TO-RIGHT OVERRIDE, PDF indicates POP DIRECTIONAL
# FORMATTING):
#
# 0x2B -> 0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
#
# When mapping several characters in a row that require direction
# forcing, the overrides need only be used at the beginning and end.
# For example:
#
# 0x24 0x20 0x28 0x29 -> 0x202D 0x0024 0x0020 0x0028 0x0029 0x202C
#
# If neutral characters that require direction forcing are already
# between strong-direction characters with matching directionality,
# then direction overrides need not be used. Direction overrides are
# always needed to map the right-left digits at 0xB0-0xB9.
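
The override grouping described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Apple's Text Encoding Converter: `mac_arabic_to_unicode` and `TABLE` are hypothetical names, the table is a tiny hand-copied excerpt, and the sketch always wraps tagged runs in overrides. It does not implement the optimization of omitting overrides when neighbouring strong characters already supply the right direction.

```python
LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

# Excerpt only: Mac OS Arabic code -> (direction tag or None, code point).
TABLE = {
    0x20: ("LR", 0x0020), 0x24: ("LR", 0x0024), 0x28: ("LR", 0x0028),
    0x29: ("LR", 0x0029), 0x2B: ("LR", 0x002B), 0xAB: ("RL", 0x002B),
    0x41: (None, 0x0041), 0xC7: (None, 0x0627),
}


def mac_arabic_to_unicode(data, table=TABLE):
    """Map bytes to a str, opening one override per run of equal tags."""
    out = []
    current = None  # tag of the override that is currently open, if any
    for byte in data:
        direction, code_point = table[byte]
        if direction != current:
            if current is not None:
                out.append(PDF)  # close the previous run
            if direction is not None:
                out.append(LRO if direction == "LR" else RLO)
            current = direction
        out.append(chr(code_point))
    if current is not None:
        out.append(PDF)
    return "".join(out)


isolated_plus = mac_arabic_to_unicode(bytes([0x2B]))
run = mac_arabic_to_unicode(bytes([0x24, 0x20, 0x28, 0x29]))
```

Both results correspond to the two worked examples in the text: a single wrapped PLUS SIGN, and one override pair around the whole four-character run.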
#
# When mapping from Unicode to Mac OS Arabic, the Unicode
# bidirectional algorithm should be used to determine resolved
# direction of the Unicode characters. The mapping from Unicode to
# Mac OS Arabic can then be disambiguated by the use of the resolved
# direction:
#
# Unicode 0x002B -> Mac OS Arabic 0x2B (if L) or 0xAB (if R)
#
# However, this also means the direction override characters should
# be discarded when mapping from Unicode to Mac OS Arabic (after
# they have been used to determine resolved direction), since the
# direction override information is carried by the code point itself.
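
The reverse direction might look like the sketch below. It assumes the resolved direction of each character has already been computed by some implementation of the Unicode bidirectional algorithm, which is not shown here; the input is therefore a list of `(code point, "L" or "R")` pairs. `REVERSE` and `unicode_to_mac_arabic` are hypothetical names and the table is an excerpt.

```python
# LRO, RLO and PDF: used to resolve direction, then dropped.
OVERRIDES = {0x202D, 0x202E, 0x202C}

# Excerpt only: (code point, resolved direction) -> Mac OS Arabic code.
REVERSE = {
    (0x002B, "L"): 0x2B,
    (0x002B, "R"): 0xAB,
    (0x0028, "L"): 0x28,
    (0x0028, "R"): 0xA8,
}


def unicode_to_mac_arabic(resolved):
    """resolved: iterable of (code_point, 'L' or 'R') pairs."""
    out = bytearray()
    for code_point, direction in resolved:
        if code_point in OVERRIDES:
            continue  # the Mac OS Arabic code itself carries the direction
        out.append(REVERSE[(code_point, direction)])
    return bytes(out)


encoded = unicode_to_mac_arabic(
    [(0x202E, "R"), (0x002B, "R"), (0x202C, "R"), (0x002B, "L")]
)
```

Here the first PLUS SIGN, resolved right-left, becomes 0xAB and the second, resolved left-right, becomes 0x2B.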
#
# Even when direction overrides are not needed for roundtrip
# fidelity, they are sometimes used when mapping Mac OS Arabic
# characters to Unicode in order to achieve similar text layout with
# the resulting Unicode text. For example, the single Mac OS Arabic
# ellipsis character has direction class right-left, and there is no
# left-right version. However, the Unicode HORIZONTAL ELLIPSIS
# character has direction class neutral (which means it may end up
# with a resolved direction of left-right if surrounded by left-right
# characters). When mapping the Mac OS Arabic ellipsis to Unicode, it
# is surrounded with a direction override to help preserve proper
# text layout. The resolved direction is not needed or used when
# mapping the Unicode HORIZONTAL ELLIPSIS back to Mac OS Arabic.
#
# 2. Mapping the Mac OS Arabic digits
#
# The main table below contains mappings that should be used when
# strict round-trip fidelity is required. However, for numeric
# values, the mappings in that table will produce Unicode characters
# that may appear different than the Mac OS Arabic text displayed on
# a Mac OS system using WorldScript. This is because WorldScript
# uses context-dependent display for the 0x30-0x39 digits.
#
# If roundtrip fidelity is not required, then the following
# alternate mappings should be used when a sequence of 0x30-0x39
# digits - possibly including 0x2C and 0x2E - occurs in an Arabic
# context (that is, when the first "strong" character on either side
# of the digit sequence is Arabic, or there is no strong character):
#
# 0x2C 0x066C # ARABIC THOUSANDS SEPARATOR
# 0x2E 0x066B # ARABIC DECIMAL SEPARATOR
# 0x30 0x0660 # ARABIC-INDIC DIGIT ZERO
# 0x31 0x0661 # ARABIC-INDIC DIGIT ONE
# 0x32 0x0662 # ARABIC-INDIC DIGIT TWO
# 0x33 0x0663 # ARABIC-INDIC DIGIT THREE
# 0x34 0x0664 # ARABIC-INDIC DIGIT FOUR
# 0x35 0x0665 # ARABIC-INDIC DIGIT FIVE
# 0x36 0x0666 # ARABIC-INDIC DIGIT SIX
# 0x37 0x0667 # ARABIC-INDIC DIGIT SEVEN
# 0x38 0x0668 # ARABIC-INDIC DIGIT EIGHT
# 0x39 0x0669 # ARABIC-INDIC DIGIT NINE
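
The loose digit mapping above could be applied roughly as follows. This sketch rests on several assumptions that the file does not spell out: only ASCII letters count as "strong European", only 0xC1-0xDA and 0xE1-0xEA count as strong Arabic, a run must contain at least one digit, and a run is treated as being in Arabic context when neither nearest strong neighbour is Latin (the rule from the WorldScript section). All function names are hypothetical.

```python
# Alternate (loose) mappings for comma, period and the ASCII-range digits.
LOOSE = {0x2C: 0x066C, 0x2E: 0x066B}
LOOSE.update({0x30 + d: 0x0660 + d for d in range(10)})


def is_latin(byte):
    return 0x41 <= byte <= 0x5A or 0x61 <= byte <= 0x7A


def is_arabic(byte):
    return 0xC1 <= byte <= 0xDA or 0xE1 <= byte <= 0xEA


def nearest_strong(data, start, step):
    """Scan from start in the given direction for the first strong byte."""
    i = start
    while 0 <= i < len(data):
        if is_latin(data[i]):
            return "latin"
        if is_arabic(data[i]):
            return "arabic"
        i += step
    return None


def loose_overrides(data):
    """Return {index: code point} for bytes that get the alternate mapping."""
    overrides = {}
    i = 0
    while i < len(data):
        if data[i] not in LOOSE:
            i += 1
            continue
        j = i
        while j < len(data) and data[j] in LOOSE:
            j += 1  # extend the run of digits, commas and periods
        has_digit = any(0x30 <= b <= 0x39 for b in data[i:j])
        sides = (nearest_strong(data, i - 1, -1), nearest_strong(data, j, 1))
        if has_digit and "latin" not in sides:
            for k in range(i, j):
                overrides[k] = LOOSE[data[k]]
        i = j
    return overrides


# ALEF, SPACE, "1,2": Arabic on the left, nothing strong on the right.
arabic_context = loose_overrides(bytes([0xC7, 0x20, 0x31, 0x2C, 0x32]))
```

Positions not present in the returned dictionary keep their strict mapping from the main table.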
#
# Details of mapping changes in each version:
# -------------------------------------------
#
# Changes from version n03 to version n07:
#
# - Change mapping for 0xC0 from U+066D to U+274A.
#
# - Add direction overrides (required directionality) to mappings
# for 0x25, 0x2C, 0x3B, 0x3F.
#
##################

0x20 <LR>+0x0020 # SPACE, left-right
0x21 <LR>+0x0021 # EXCLAMATION MARK, left-right
0x22 <LR>+0x0022 # QUOTATION MARK, left-right
0x23 <LR>+0x0023 # NUMBER SIGN, left-right
0x24 <LR>+0x0024 # DOLLAR SIGN, left-right
0x25 <LR>+0x0025 # PERCENT SIGN, left-right
0x26 <LR>+0x0026 # AMPERSAND, left-right
0x27 <LR>+0x0027 # APOSTROPHE, left-right
0x28 <LR>+0x0028 # LEFT PARENTHESIS, left-right
0x29 <LR>+0x0029 # RIGHT PARENTHESIS, left-right
0x2A <LR>+0x002A # ASTERISK, left-right
0x2B <LR>+0x002B # PLUS SIGN, left-right
0x2C <LR>+0x002C # COMMA, left-right; in Arabic-script context, displayed as 0x066C ARABIC THOUSANDS SEPARATOR
0x2D <LR>+0x002D # HYPHEN-MINUS, left-right
0x2E <LR>+0x002E # FULL STOP, left-right; in Arabic-script context, displayed as 0x066B ARABIC DECIMAL SEPARATOR
0x2F <LR>+0x002F # SOLIDUS, left-right
0x30 0x0030 # DIGIT ZERO; in Arabic-script context, displayed as 0x0660 ARABIC-INDIC DIGIT ZERO
0x31 0x0031 # DIGIT ONE; in Arabic-script context, displayed as 0x0661 ARABIC-INDIC DIGIT ONE
0x32 0x0032 # DIGIT TWO; in Arabic-script context, displayed as 0x0662 ARABIC-INDIC DIGIT TWO
0x33 0x0033 # DIGIT THREE; in Arabic-script context, displayed as 0x0663 ARABIC-INDIC DIGIT THREE
0x34 0x0034 # DIGIT FOUR; in Arabic-script context, displayed as 0x0664 ARABIC-INDIC DIGIT FOUR
0x35 0x0035 # DIGIT FIVE; in Arabic-script context, displayed as 0x0665 ARABIC-INDIC DIGIT FIVE
0x36 0x0036 # DIGIT SIX; in Arabic-script context, displayed as 0x0666 ARABIC-INDIC DIGIT SIX
0x37 0x0037 # DIGIT SEVEN; in Arabic-script context, displayed as 0x0667 ARABIC-INDIC DIGIT SEVEN
0x38 0x0038 # DIGIT EIGHT; in Arabic-script context, displayed as 0x0668 ARABIC-INDIC DIGIT EIGHT
0x39 0x0039 # DIGIT NINE; in Arabic-script context, displayed as 0x0669 ARABIC-INDIC DIGIT NINE
0x3A <LR>+0x003A # COLON, left-right
0x3B <LR>+0x003B # SEMICOLON, left-right
0x3C <LR>+0x003C # LESS-THAN SIGN, left-right
0x3D <LR>+0x003D # EQUALS SIGN, left-right
0x3E <LR>+0x003E # GREATER-THAN SIGN, left-right
0x3F <LR>+0x003F # QUESTION MARK, left-right
0x40 0x0040 # COMMERCIAL AT
0x41 0x0041 # LATIN CAPITAL LETTER A
0x42 0x0042 # LATIN CAPITAL LETTER B
0x43 0x0043 # LATIN CAPITAL LETTER C
0x44 0x0044 # LATIN CAPITAL LETTER D
0x45 0x0045 # LATIN CAPITAL LETTER E
0x46 0x0046 # LATIN CAPITAL LETTER F
0x47 0x0047 # LATIN CAPITAL LETTER G
0x48 0x0048 # LATIN CAPITAL LETTER H
0x49 0x0049 # LATIN CAPITAL LETTER I
0x4A 0x004A # LATIN CAPITAL LETTER J
0x4B 0x004B # LATIN CAPITAL LETTER K
0x4C 0x004C # LATIN CAPITAL LETTER L
0x4D 0x004D # LATIN CAPITAL LETTER M
0x4E 0x004E # LATIN CAPITAL LETTER N
0x4F 0x004F # LATIN CAPITAL LETTER O
0x50 0x0050 # LATIN CAPITAL LETTER P
0x51 0x0051 # LATIN CAPITAL LETTER Q
0x52 0x0052 # LATIN CAPITAL LETTER R
0x53 0x0053 # LATIN CAPITAL LETTER S
0x54 0x0054 # LATIN CAPITAL LETTER T
0x55 0x0055 # LATIN CAPITAL LETTER U
0x56 0x0056 # LATIN CAPITAL LETTER V
0x57 0x0057 # LATIN CAPITAL LETTER W
0x58 0x0058 # LATIN CAPITAL LETTER X
0x59 0x0059 # LATIN CAPITAL LETTER Y
0x5A 0x005A # LATIN CAPITAL LETTER Z
0x5B <LR>+0x005B # LEFT SQUARE BRACKET, left-right
0x5C <LR>+0x005C # REVERSE SOLIDUS, left-right
0x5D <LR>+0x005D # RIGHT SQUARE BRACKET, left-right
0x5E <LR>+0x005E # CIRCUMFLEX ACCENT, left-right
0x5F <LR>+0x005F # LOW LINE, left-right
0x60 0x0060 # GRAVE ACCENT
0x61 0x0061 # LATIN SMALL LETTER A
0x62 0x0062 # LATIN SMALL LETTER B
0x63 0x0063 # LATIN SMALL LETTER C
0x64 0x0064 # LATIN SMALL LETTER D
0x65 0x0065 # LATIN SMALL LETTER E
0x66 0x0066 # LATIN SMALL LETTER F
0x67 0x0067 # LATIN SMALL LETTER G
0x68 0x0068 # LATIN SMALL LETTER H
0x69 0x0069 # LATIN SMALL LETTER I
0x6A 0x006A # LATIN SMALL LETTER J
0x6B 0x006B # LATIN SMALL LETTER K
0x6C 0x006C # LATIN SMALL LETTER L
0x6D 0x006D # LATIN SMALL LETTER M
0x6E 0x006E # LATIN SMALL LETTER N
0x6F 0x006F # LATIN SMALL LETTER O
0x70 0x0070 # LATIN SMALL LETTER P
0x71 0x0071 # LATIN SMALL LETTER Q
0x72 0x0072 # LATIN SMALL LETTER R
0x73 0x0073 # LATIN SMALL LETTER S
0x74 0x0074 # LATIN SMALL LETTER T
0x75 0x0075 # LATIN SMALL LETTER U
0x76 0x0076 # LATIN SMALL LETTER V
0x77 0x0077 # LATIN SMALL LETTER W
0x78 0x0078 # LATIN SMALL LETTER X
0x79 0x0079 # LATIN SMALL LETTER Y
0x7A 0x007A # LATIN SMALL LETTER Z
0x7B <LR>+0x007B # LEFT CURLY BRACKET, left-right
0x7C <LR>+0x007C # VERTICAL LINE, left-right
0x7D <LR>+0x007D # RIGHT CURLY BRACKET, left-right
0x7E 0x007E # TILDE
#
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
0x81 <RL>+0x00A0 # NO-BREAK SPACE, right-left
0x82 0x00C7 # LATIN CAPITAL LETTER C WITH CEDILLA
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
0x84 0x00D1 # LATIN CAPITAL LETTER N WITH TILDE
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
0x87 0x00E1 # LATIN SMALL LETTER A WITH ACUTE
0x88 0x00E0 # LATIN SMALL LETTER A WITH GRAVE
0x89 0x00E2 # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
0x8B 0x06BA # ARABIC LETTER NOON GHUNNA
0x8C <RL>+0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
0x8D 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
0x8F 0x00E8 # LATIN SMALL LETTER E WITH GRAVE
0x90 0x00EA # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x91 0x00EB # LATIN SMALL LETTER E WITH DIAERESIS
0x92 0x00ED # LATIN SMALL LETTER I WITH ACUTE
0x93 <RL>+0x2026 # HORIZONTAL ELLIPSIS, right-left
0x94 0x00EE # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x95 0x00EF # LATIN SMALL LETTER I WITH DIAERESIS
0x96 0x00F1 # LATIN SMALL LETTER N WITH TILDE
0x97 0x00F3 # LATIN SMALL LETTER O WITH ACUTE
0x98 <RL>+0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
0x9B <RL>+0x00F7 # DIVISION SIGN, right-left
0x9C 0x00FA # LATIN SMALL LETTER U WITH ACUTE
0x9D 0x00F9 # LATIN SMALL LETTER U WITH GRAVE
0x9E 0x00FB # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
0xA0 <RL>+0x0020 # SPACE, right-left
0xA1 <RL>+0x0021 # EXCLAMATION MARK, right-left
0xA2 <RL>+0x0022 # QUOTATION MARK, right-left
0xA3 <RL>+0x0023 # NUMBER SIGN, right-left
0xA4 <RL>+0x0024 # DOLLAR SIGN, right-left
0xA5 0x066A # ARABIC PERCENT SIGN
0xA6 <RL>+0x0026 # AMPERSAND, right-left
0xA7 <RL>+0x0027 # APOSTROPHE, right-left
0xA8 <RL>+0x0028 # LEFT PARENTHESIS, right-left
0xA9 <RL>+0x0029 # RIGHT PARENTHESIS, right-left
0xAA <RL>+0x002A # ASTERISK, right-left
0xAB <RL>+0x002B # PLUS SIGN, right-left
0xAC 0x060C # ARABIC COMMA
0xAD <RL>+0x002D # HYPHEN-MINUS, right-left
0xAE <RL>+0x002E # FULL STOP, right-left
0xAF <RL>+0x002F # SOLIDUS, right-left
0xB0 <RL>+0x0660 # ARABIC-INDIC DIGIT ZERO, right-left (need override)
0xB1 <RL>+0x0661 # ARABIC-INDIC DIGIT ONE, right-left (need override)
0xB2 <RL>+0x0662 # ARABIC-INDIC DIGIT TWO, right-left (need override)
0xB3 <RL>+0x0663 # ARABIC-INDIC DIGIT THREE, right-left (need override)
0xB4 <RL>+0x0664 # ARABIC-INDIC DIGIT FOUR, right-left (need override)
0xB5 <RL>+0x0665 # ARABIC-INDIC DIGIT FIVE, right-left (need override)
0xB6 <RL>+0x0666 # ARABIC-INDIC DIGIT SIX, right-left (need override)
0xB7 <RL>+0x0667 # ARABIC-INDIC DIGIT SEVEN, right-left (need override)
0xB8 <RL>+0x0668 # ARABIC-INDIC DIGIT EIGHT, right-left (need override)
0xB9 <RL>+0x0669 # ARABIC-INDIC DIGIT NINE, right-left (need override)
0xBA <RL>+0x003A # COLON, right-left
0xBB 0x061B # ARABIC SEMICOLON
0xBC <RL>+0x003C # LESS-THAN SIGN, right-left
0xBD <RL>+0x003D # EQUALS SIGN, right-left
0xBE <RL>+0x003E # GREATER-THAN SIGN, right-left
0xBF 0x061F # ARABIC QUESTION MARK
0xC0 <RL>+0x274A # EIGHT TEARDROP-SPOKED PROPELLER ASTERISK, right-left
0xC1 0x0621 # ARABIC LETTER HAMZA
0xC2 0x0622 # ARABIC LETTER ALEF WITH MADDA ABOVE
0xC3 0x0623 # ARABIC LETTER ALEF WITH HAMZA ABOVE
0xC4 0x0624 # ARABIC LETTER WAW WITH HAMZA ABOVE
0xC5 0x0625 # ARABIC LETTER ALEF WITH HAMZA BELOW
0xC6 0x0626 # ARABIC LETTER YEH WITH HAMZA ABOVE
0xC7 0x0627 # ARABIC LETTER ALEF
0xC8 0x0628 # ARABIC LETTER BEH
0xC9 0x0629 # ARABIC LETTER TEH MARBUTA
0xCA 0x062A # ARABIC LETTER TEH
0xCB 0x062B # ARABIC LETTER THEH
0xCC 0x062C # ARABIC LETTER JEEM
0xCD 0x062D # ARABIC LETTER HAH
0xCE 0x062E # ARABIC LETTER KHAH
0xCF 0x062F # ARABIC LETTER DAL
0xD0 0x0630 # ARABIC LETTER THAL
0xD1 0x0631 # ARABIC LETTER REH
0xD2 0x0632 # ARABIC LETTER ZAIN
0xD3 0x0633 # ARABIC LETTER SEEN
0xD4 0x0634 # ARABIC LETTER SHEEN
0xD5 0x0635 # ARABIC LETTER SAD
0xD6 0x0636 # ARABIC LETTER DAD
0xD7 0x0637 # ARABIC LETTER TAH
0xD8 0x0638 # ARABIC LETTER ZAH
0xD9 0x0639 # ARABIC LETTER AIN
0xDA 0x063A # ARABIC LETTER GHAIN
0xDB <RL>+0x005B # LEFT SQUARE BRACKET, right-left
0xDC <RL>+0x005C # REVERSE SOLIDUS, right-left
0xDD <RL>+0x005D # RIGHT SQUARE BRACKET, right-left
0xDE <RL>+0x005E # CIRCUMFLEX ACCENT, right-left
0xDF <RL>+0x005F # LOW LINE, right-left
0xE0 0x0640 # ARABIC TATWEEL
0xE1 0x0641 # ARABIC LETTER FEH
0xE2 0x0642 # ARABIC LETTER QAF
0xE3 0x0643 # ARABIC LETTER KAF
0xE4 0x0644 # ARABIC LETTER LAM
0xE5 0x0645 # ARABIC LETTER MEEM
0xE6 0x0646 # ARABIC LETTER NOON
0xE7 0x0647 # ARABIC LETTER HEH
0xE8 0x0648 # ARABIC LETTER WAW
0xE9 0x0649 # ARABIC LETTER ALEF MAKSURA
0xEA 0x064A # ARABIC LETTER YEH
0xEB 0x064B # ARABIC FATHATAN
0xEC 0x064C # ARABIC DAMMATAN
0xED 0x064D # ARABIC KASRATAN
0xEE 0x064E # ARABIC FATHA
0xEF 0x064F # ARABIC DAMMA
0xF0 0x0650 # ARABIC KASRA
0xF1 0x0651 # ARABIC SHADDA
0xF2 0x0652 # ARABIC SUKUN
0xF3 0x067E # ARABIC LETTER PEH
0xF4 0x0679 # ARABIC LETTER TTEH
0xF5 0x0686 # ARABIC LETTER TCHEH
0xF6 0x06D5 # ARABIC LETTER AE
0xF7 0x06A4 # ARABIC LETTER VEH
0xF8 0x06AF # ARABIC LETTER GAF
0xF9 0x0688 # ARABIC LETTER DDAL
0xFA 0x0691 # ARABIC LETTER RREH
0xFB <RL>+0x007B # LEFT CURLY BRACKET, right-left
0xFC <RL>+0x007C # VERTICAL LINE, right-left
0xFD <RL>+0x007D # RIGHT CURLY BRACKET, right-left
0xFE 0x0698 # ARABIC LETTER JEH
0xFF 0x06D2 # ARABIC LETTER YEH BARREE
charmap/CELTIC.TXT (normal file, 328 lines added)
@@ -0,0 +1,328 @@
#=======================================================================
# File name: CELTIC.TXT
#
# Contents: Map (external version) from Mac OS Celtic
# character set to Unicode 2.1 and later
#
# Contacts: charsets@apple.com, everson@evertype.com
#
# Changes:
#
# c01 2005-Apr-01 First posted version. Matches internal xml
# <c1.1> and Text Encoding Converter 2.0.
#
# Standard header:
# ----------------
#
# Apple, the Apple logo, and Macintosh are trademarks of Apple
# Computer, Inc., registered in the United States and other countries.
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
# throughout this document, "Macintosh" can be used to refer to
# Macintosh computers and "Unicode" can be used to refer to the
# Unicode standard.
#
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
# either express or implied, with respect to this document and the
# included data, its quality, accuracy, or fitness for a particular
# purpose. In no event will Apple be liable for direct, indirect,
# special, incidental, or consequential damages resulting from any
# defect or inaccuracy in this document or the included data.
#
# These mapping tables and character lists are subject to change.
# The latest tables should be available from the following:
#
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#
# For general information about Mac OS encodings and these mapping
# tables, see the file "README.TXT".
#
# Format:
# -------
#
# Three tab-separated columns;
# '#' begins a comment which continues to the end of the line.
# Column #1 is the Mac OS Celtic code (in hex as 0xNN)
# Column #2 is the corresponding Unicode (in hex as 0xNNNN)
# Column #3 is a comment containing the Unicode name
#
# The entries are in Mac OS Celtic code order.
#
# Control character mappings are not shown in this table, following
# the conventions of the standard UTC mapping tables. However, the
# Mac OS Celtic character set uses the standard control characters
# at 0x00-0x1F and 0x7F.
#
# Notes on Mac OS Celtic (partly from Michael Everson):
# -----------------------------------------------------
#
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
# environments, it is only supported via transcoding to and from
# Unicode.
#
# This character set was developed by Michael Everson of Everson
# Typography (everson@evertype.com) and was used for the Irish
# localizations of Mac OS 6.0.8 and 7.1, for the Welsh localization of
# Mac OS 7.1, and for several fonts that can be used on any version of
# Mac OS 7.1 or later. Note that while Apple authorized
# the Irish and Welsh localizations mentioned above, they were not
# systems which shipped with Apple hardware, and were not otherwise
# supported by Apple. Fonts conforming to the Mac OS Celtic character
# set are available from Everson Typography (http://www.evertype.com)
# and MEU Cymru (http://www.meucymru.co.uk). Information about the use
# of this character set is available at
# http://www.evertype.com/celtscript/celtcode.html.
#
# The Mac OS Celtic encoding shares the script code smRoman (0) with
# the standard Mac OS Roman encoding. To determine if the Celtic
# encoding is being used in Mac OS 7-9, you should also check if the
# system region code is 50, verIreland, or 79, verWales. Otherwise,
# you can check for particular fonts that conform to this encoding.
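
The script-code and region-code check described above amounts to a two-part test. The sketch below only restates that rule; the constant values come from the text, while the function name `uses_mac_celtic` and the idea of passing the two codes in as plain integers are assumptions made for illustration (no Mac OS API is called).

```python
SM_ROMAN = 0      # script code shared with standard Mac OS Roman
VER_IRELAND = 50  # system region code for Ireland
VER_WALES = 79    # system region code for Wales


def uses_mac_celtic(script_code, region_code):
    """Guess whether smRoman text should be read as Mac OS Celtic."""
    return script_code == SM_ROMAN and region_code in (VER_IRELAND, VER_WALES)


irish_system = uses_mac_celtic(0, 50)
```

When the region code is anything else, the text suggests falling back to checking for fonts known to conform to the encoding, which this sketch does not attempt.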
#
# This character set is a variant of standard Mac OS Roman, adding
# capital and small y with acute, grave, and circumflex, and capital
# and small w with acute, grave, circumflex and diaeresis. It has 14
# code point differences from standard Mac OS Roman (0xDE, 0xDF, 0xE2,
# 0xE3, 0xF6-0xFF).
#
# Before Mac OS 8.5, code point 0xDB was CURRENCY SIGN, and was
# mapped to U+00A4. In Mac OS 8.5 and later versions, code point
# 0xDB is changed to EURO SIGN and maps to U+20AC; the standard
# Apple fonts were updated for Mac OS 8.5 to reflect this. There is
# a "currency sign" variant of the Mac OS Celtic encoding that still
# maps 0xDB to U+00A4; this can be used for older fonts.
# Note: U+20AC is new with Unicode 2.1; for earlier Unicode
# versions, Mac OS Celtic 0xDB may be mapped to private-use
# character U+F8A0.
|
||||
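The three possible targets for code point 0xDB can be illustrated with a small sketch. It assumes a hypothetical function `map_celtic_byte` with two hypothetical flags, and it uses only a three-entry excerpt of the table rather than the full mapping.

```python
# Excerpt of the Mac OS Celtic table from this file (not the full mapping).
CELTIC_EXCERPT = {
    0xDB: 0x20AC,  # EURO SIGN (Mac OS 8.5 and later)
    0xDE: 0x0176,  # LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
    0xDF: 0x0177,  # LATIN SMALL LETTER Y WITH CIRCUMFLEX
}

CURRENCY_SIGN = 0x00A4     # target of 0xDB in the "currency sign" variant
PRIVATE_USE_EURO = 0xF8A0  # stand-in for the euro before Unicode 2.1


def map_celtic_byte(byte, currency_sign_variant=False, pre_unicode_2_1=False):
    """Map one Mac OS Celtic byte to a Unicode code point (illustrative only).

    Bytes below 0x80 map to themselves. Other bytes are looked up in the
    excerpt above, so a byte missing from the excerpt raises KeyError.
    """
    if byte < 0x80:
        return byte
    if byte == 0xDB:
        if currency_sign_variant:
            return CURRENCY_SIGN
        if pre_unicode_2_1:
            return PRIVATE_USE_EURO
    return CELTIC_EXCERPT[byte]
```

Keeping the variant as a flag rather than a second table reflects that the two encodings differ only at 0xDB.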
#
# Unicode mapping issues and notes:
# ---------------------------------
#
# Details of mapping changes in each version:
# -------------------------------------------
#
##################

0x20 0x0020 # SPACE
0x21 0x0021 # EXCLAMATION MARK
0x22 0x0022 # QUOTATION MARK
0x23 0x0023 # NUMBER SIGN
0x24 0x0024 # DOLLAR SIGN
0x25 0x0025 # PERCENT SIGN
0x26 0x0026 # AMPERSAND
0x27 0x0027 # APOSTROPHE
0x28 0x0028 # LEFT PARENTHESIS
0x29 0x0029 # RIGHT PARENTHESIS
0x2A 0x002A # ASTERISK
0x2B 0x002B # PLUS SIGN
0x2C 0x002C # COMMA
0x2D 0x002D # HYPHEN-MINUS
0x2E 0x002E # FULL STOP
0x2F 0x002F # SOLIDUS
0x30 0x0030 # DIGIT ZERO
0x31 0x0031 # DIGIT ONE
0x32 0x0032 # DIGIT TWO
0x33 0x0033 # DIGIT THREE
0x34 0x0034 # DIGIT FOUR
0x35 0x0035 # DIGIT FIVE
0x36 0x0036 # DIGIT SIX
0x37 0x0037 # DIGIT SEVEN
0x38 0x0038 # DIGIT EIGHT
0x39 0x0039 # DIGIT NINE
0x3A 0x003A # COLON
0x3B 0x003B # SEMICOLON
0x3C 0x003C # LESS-THAN SIGN
0x3D 0x003D # EQUALS SIGN
0x3E 0x003E # GREATER-THAN SIGN
0x3F 0x003F # QUESTION MARK
0x40 0x0040 # COMMERCIAL AT
0x41 0x0041 # LATIN CAPITAL LETTER A
0x42 0x0042 # LATIN CAPITAL LETTER B
0x43 0x0043 # LATIN CAPITAL LETTER C
0x44 0x0044 # LATIN CAPITAL LETTER D
0x45 0x0045 # LATIN CAPITAL LETTER E
0x46 0x0046 # LATIN CAPITAL LETTER F
0x47 0x0047 # LATIN CAPITAL LETTER G
0x48 0x0048 # LATIN CAPITAL LETTER H
0x49 0x0049 # LATIN CAPITAL LETTER I
0x4A 0x004A # LATIN CAPITAL LETTER J
0x4B 0x004B # LATIN CAPITAL LETTER K
0x4C 0x004C # LATIN CAPITAL LETTER L
0x4D 0x004D # LATIN CAPITAL LETTER M
0x4E 0x004E # LATIN CAPITAL LETTER N
0x4F 0x004F # LATIN CAPITAL LETTER O
0x50 0x0050 # LATIN CAPITAL LETTER P
0x51 0x0051 # LATIN CAPITAL LETTER Q
0x52 0x0052 # LATIN CAPITAL LETTER R
0x53 0x0053 # LATIN CAPITAL LETTER S
0x54 0x0054 # LATIN CAPITAL LETTER T
0x55 0x0055 # LATIN CAPITAL LETTER U
0x56 0x0056 # LATIN CAPITAL LETTER V
0x57 0x0057 # LATIN CAPITAL LETTER W
0x58 0x0058 # LATIN CAPITAL LETTER X
0x59 0x0059 # LATIN CAPITAL LETTER Y
0x5A 0x005A # LATIN CAPITAL LETTER Z
0x5B 0x005B # LEFT SQUARE BRACKET
0x5C 0x005C # REVERSE SOLIDUS
0x5D 0x005D # RIGHT SQUARE BRACKET
0x5E 0x005E # CIRCUMFLEX ACCENT
0x5F 0x005F # LOW LINE
0x60 0x0060 # GRAVE ACCENT
0x61 0x0061 # LATIN SMALL LETTER A
0x62 0x0062 # LATIN SMALL LETTER B
0x63 0x0063 # LATIN SMALL LETTER C
0x64 0x0064 # LATIN SMALL LETTER D
0x65 0x0065 # LATIN SMALL LETTER E
0x66 0x0066 # LATIN SMALL LETTER F
0x67 0x0067 # LATIN SMALL LETTER G
0x68 0x0068 # LATIN SMALL LETTER H
0x69 0x0069 # LATIN SMALL LETTER I
0x6A 0x006A # LATIN SMALL LETTER J
0x6B 0x006B # LATIN SMALL LETTER K
0x6C 0x006C # LATIN SMALL LETTER L
0x6D 0x006D # LATIN SMALL LETTER M
0x6E 0x006E # LATIN SMALL LETTER N
0x6F 0x006F # LATIN SMALL LETTER O
0x70 0x0070 # LATIN SMALL LETTER P
0x71 0x0071 # LATIN SMALL LETTER Q
0x72 0x0072 # LATIN SMALL LETTER R
0x73 0x0073 # LATIN SMALL LETTER S
0x74 0x0074 # LATIN SMALL LETTER T
0x75 0x0075 # LATIN SMALL LETTER U
0x76 0x0076 # LATIN SMALL LETTER V
0x77 0x0077 # LATIN SMALL LETTER W
0x78 0x0078 # LATIN SMALL LETTER X
0x79 0x0079 # LATIN SMALL LETTER Y
0x7A 0x007A # LATIN SMALL LETTER Z
0x7B 0x007B # LEFT CURLY BRACKET
0x7C 0x007C # VERTICAL LINE
0x7D 0x007D # RIGHT CURLY BRACKET
0x7E 0x007E # TILDE
#
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
0x81 0x00C5 # LATIN CAPITAL LETTER A WITH RING ABOVE
0x82 0x00C7 # LATIN CAPITAL LETTER C WITH CEDILLA
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
0x84 0x00D1 # LATIN CAPITAL LETTER N WITH TILDE
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
0x87 0x00E1 # LATIN SMALL LETTER A WITH ACUTE
0x88 0x00E0 # LATIN SMALL LETTER A WITH GRAVE
0x89 0x00E2 # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
0x8B 0x00E3 # LATIN SMALL LETTER A WITH TILDE
0x8C 0x00E5 # LATIN SMALL LETTER A WITH RING ABOVE
0x8D 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
0x8F 0x00E8 # LATIN SMALL LETTER E WITH GRAVE
0x90 0x00EA # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x91 0x00EB # LATIN SMALL LETTER E WITH DIAERESIS
0x92 0x00ED # LATIN SMALL LETTER I WITH ACUTE
0x93 0x00EC # LATIN SMALL LETTER I WITH GRAVE
0x94 0x00EE # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x95 0x00EF # LATIN SMALL LETTER I WITH DIAERESIS
0x96 0x00F1 # LATIN SMALL LETTER N WITH TILDE
0x97 0x00F3 # LATIN SMALL LETTER O WITH ACUTE
0x98 0x00F2 # LATIN SMALL LETTER O WITH GRAVE
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
0x9B 0x00F5 # LATIN SMALL LETTER O WITH TILDE
0x9C 0x00FA # LATIN SMALL LETTER U WITH ACUTE
0x9D 0x00F9 # LATIN SMALL LETTER U WITH GRAVE
0x9E 0x00FB # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
0xA0 0x2020 # DAGGER
0xA1 0x00B0 # DEGREE SIGN
0xA2 0x00A2 # CENT SIGN
0xA3 0x00A3 # POUND SIGN
0xA4 0x00A7 # SECTION SIGN
0xA5 0x2022 # BULLET
0xA6 0x00B6 # PILCROW SIGN
0xA7 0x00DF # LATIN SMALL LETTER SHARP S
0xA8 0x00AE # REGISTERED SIGN
0xA9 0x00A9 # COPYRIGHT SIGN
0xAA 0x2122 # TRADE MARK SIGN
0xAB 0x00B4 # ACUTE ACCENT
0xAC 0x00A8 # DIAERESIS
0xAD 0x2260 # NOT EQUAL TO
0xAE 0x00C6 # LATIN CAPITAL LETTER AE
0xAF 0x00D8 # LATIN CAPITAL LETTER O WITH STROKE
0xB0 0x221E # INFINITY
0xB1 0x00B1 # PLUS-MINUS SIGN
0xB2 0x2264 # LESS-THAN OR EQUAL TO
0xB3 0x2265 # GREATER-THAN OR EQUAL TO
0xB4 0x00A5 # YEN SIGN
0xB5 0x00B5 # MICRO SIGN
0xB6 0x2202 # PARTIAL DIFFERENTIAL
0xB7 0x2211 # N-ARY SUMMATION
0xB8 0x220F # N-ARY PRODUCT
0xB9 0x03C0 # GREEK SMALL LETTER PI
0xBA 0x222B # INTEGRAL
0xBB 0x00AA # FEMININE ORDINAL INDICATOR
0xBC 0x00BA # MASCULINE ORDINAL INDICATOR
0xBD 0x03A9 # GREEK CAPITAL LETTER OMEGA
0xBE 0x00E6 # LATIN SMALL LETTER AE
0xBF 0x00F8 # LATIN SMALL LETTER O WITH STROKE
0xC0 0x00BF # INVERTED QUESTION MARK
0xC1 0x00A1 # INVERTED EXCLAMATION MARK
0xC2 0x00AC # NOT SIGN
0xC3 0x221A # SQUARE ROOT
0xC4 0x0192 # LATIN SMALL LETTER F WITH HOOK
0xC5 0x2248 # ALMOST EQUAL TO
0xC6 0x2206 # INCREMENT
0xC7 0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xC8 0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xC9 0x2026 # HORIZONTAL ELLIPSIS
0xCA 0x00A0 # NO-BREAK SPACE
0xCB 0x00C0 # LATIN CAPITAL LETTER A WITH GRAVE
0xCC 0x00C3 # LATIN CAPITAL LETTER A WITH TILDE
0xCD 0x00D5 # LATIN CAPITAL LETTER O WITH TILDE
0xCE 0x0152 # LATIN CAPITAL LIGATURE OE
0xCF 0x0153 # LATIN SMALL LIGATURE OE
0xD0 0x2013 # EN DASH
0xD1 0x2014 # EM DASH
0xD2 0x201C # LEFT DOUBLE QUOTATION MARK
0xD3 0x201D # RIGHT DOUBLE QUOTATION MARK
0xD4 0x2018 # LEFT SINGLE QUOTATION MARK
0xD5 0x2019 # RIGHT SINGLE QUOTATION MARK
0xD6 0x00F7 # DIVISION SIGN
0xD7 0x25CA # LOZENGE
0xD8 0x00FF # LATIN SMALL LETTER Y WITH DIAERESIS
0xD9 0x0178 # LATIN CAPITAL LETTER Y WITH DIAERESIS
0xDA 0x2044 # FRACTION SLASH
0xDB 0x20AC # EURO SIGN # before Mac OS 8.5 this was U+00A4 CURRENCY SIGN
0xDC 0x2039 # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0xDD 0x203A # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0xDE 0x0176 # LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
0xDF 0x0177 # LATIN SMALL LETTER Y WITH CIRCUMFLEX
0xE0 0x2021 # DOUBLE DAGGER
0xE1 0x00B7 # MIDDLE DOT
0xE2 0x1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
0xE3 0x1EF3 # LATIN SMALL LETTER Y WITH GRAVE
0xE4 0x2030 # PER MILLE SIGN
0xE5 0x00C2 # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0xE6 0x00CA # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0xE7 0x00C1 # LATIN CAPITAL LETTER A WITH ACUTE
0xE8 0x00CB # LATIN CAPITAL LETTER E WITH DIAERESIS
0xE9 0x00C8 # LATIN CAPITAL LETTER E WITH GRAVE
0xEA 0x00CD # LATIN CAPITAL LETTER I WITH ACUTE
0xEB 0x00CE # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0xEC 0x00CF # LATIN CAPITAL LETTER I WITH DIAERESIS
0xED 0x00CC # LATIN CAPITAL LETTER I WITH GRAVE
0xEE 0x00D3 # LATIN CAPITAL LETTER O WITH ACUTE
0xEF 0x00D4 # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0xF0 0x2663 # BLACK CLUB SUIT = shamrock # future mapping U+2618 SHAMROCK
0xF1 0x00D2 # LATIN CAPITAL LETTER O WITH GRAVE
0xF2 0x00DA # LATIN CAPITAL LETTER U WITH ACUTE
0xF3 0x00DB # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0xF4 0x00D9 # LATIN CAPITAL LETTER U WITH GRAVE
0xF5 0x0131 # LATIN SMALL LETTER DOTLESS I
0xF6 0x00DD # LATIN CAPITAL LETTER Y WITH ACUTE
0xF7 0x00FD # LATIN SMALL LETTER Y WITH ACUTE
0xF8 0x0174 # LATIN CAPITAL LETTER W WITH CIRCUMFLEX
0xF9 0x0175 # LATIN SMALL LETTER W WITH CIRCUMFLEX
0xFA 0x1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
0xFB 0x1E85 # LATIN SMALL LETTER W WITH DIAERESIS
0xFC 0x1E80 # LATIN CAPITAL LETTER W WITH GRAVE
0xFD 0x1E81 # LATIN SMALL LETTER W WITH GRAVE
0xFE 0x1E82 # LATIN CAPITAL LETTER W WITH ACUTE
0xFF 0x1E83 # LATIN SMALL LETTER W WITH ACUTE
327
charmap/CENTEURO.TXT
Normal file
@ -0,0 +1,327 @@
#=======================================================================
# File name: CENTEURO.TXT
#
# Contents: Map (external version) from Mac OS Central European
# character set to Unicode 2.1 and later.
#
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
# reserved.
#
# Contact: charsets@apple.com
#
# Changes:
#
# c02 2005-Apr-04 Update header comments. Matches internal xml
# <c1.1> and Text Encoding Converter 2.0.
# b3,c1 2002-Dec-19 Update URLs. Matches internal utom<b1>.
# b02 1999-Sep-22 Update contact e-mail address. Matches
# internal utom<b1>, ufrm<b1>, and Text
# Encoding Converter version 1.5.
# n05 1998-Feb-05 Update header comments to new format; no
# mapping changes. Matches internal utom<n3>,
# ufrm<n13>, and Text Encoding Converter
# version 1.3.
# n03 1995-Apr-15 First version (after fixing some typos).
# Matches internal ufrm<n5>.
#
# Standard header:
# ----------------
#
# Apple, the Apple logo, and Macintosh are trademarks of Apple
# Computer, Inc., registered in the United States and other countries.
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
# throughout this document, "Macintosh" can be used to refer to
# Macintosh computers and "Unicode" can be used to refer to the
# Unicode standard.
#
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
# either express or implied, with respect to this document and the
# included data, its quality, accuracy, or fitness for a particular
# purpose. In no event will Apple be liable for direct, indirect,
# special, incidental, or consequential damages resulting from any
# defect or inaccuracy in this document or the included data.
#
# These mapping tables and character lists are subject to change.
# The latest tables should be available from the following:
#
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#
# For general information about Mac OS encodings and these mapping
# tables, see the file "README.TXT".
#
# Format:
# -------
#
# Three tab-separated columns;
# '#' begins a comment which continues to the end of the line.
# Column #1 is the Mac OS Central European code (in hex as 0xNN)
# Column #2 is the corresponding Unicode (in hex as 0xNNNN)
# Column #3 is a comment containing the Unicode name
#
# The entries are in Mac OS Central European code order.
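A reader for the three-column format described above might look like the following minimal sketch. The function name `parse_mapping` and the sample rows are illustrative assumptions. The sketch handles only rows whose second column is a single hex code point, which is what this file contains; other Apple tables may use richer second columns that it does not cover.

```python
def parse_mapping(text):
    """Parse mapping rows into a dict of {Mac OS code: Unicode code point}.

    Everything from the first '#' to the end of a line is treated as a
    comment, so header lines and the Unicode-name column are skipped.
    """
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line
        fields = data.split()  # splits on tabs or spaces
        mac_code = int(fields[0], 16)      # column 1, e.g. 0x80
        unicode_code = int(fields[1], 16)  # column 2, e.g. 0x00C4
        table[mac_code] = unicode_code
    return table


# Three rows copied from the Central European table, tab-separated.
SAMPLE = """\
# Column #1 is the Mac OS Central European code (in hex as 0xNN)
0x80\t0x00C4\t# LATIN CAPITAL LETTER A WITH DIAERESIS
0x81\t0x0100\t# LATIN CAPITAL LETTER A WITH MACRON
0xFF\t0x02C7\t# CARON
"""

table = parse_mapping(SAMPLE)
```

Splitting on any whitespace rather than strictly on tabs is a deliberate tolerance for copies of the file in which tabs were converted to spaces.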
#
# Control character mappings are not shown in this table, following
# the conventions of the standard UTC mapping tables. However, the
# Mac OS Central European character set uses the standard control
# characters at 0x00-0x1F and 0x7F.
#
# Notes on Mac OS Central European:
# ---------------------------------
#
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
# environments, it is only supported directly in programming
# interfaces for QuickDraw Text, the Script Manager, and related
# Text Utilities. For other purposes it is supported via transcoding
# to and from Unicode.
#
# This character set is intended to cover the following languages:
#
# Polish, Czech, Slovak, Hungarian, Estonian, Latvian, Lithuanian
#
# These are written in Latin script, but using a different set of
# accented characters than Mac OS Roman. The Mac OS Central
# European character set also includes a number of characters
# needed for the Mac OS user interface and localization (e.g.
# ellipsis, bullet, copyright sign), several typographic
# punctuation symbols, math symbols, etc. However, it has a
# smaller set of punctuation and symbols than Mac OS Roman. All of
# the characters in Mac OS Central European that are also in the
# Mac OS Roman character set are at the same code point in both
# character sets; this improves application compatibility.
#
# Note: This does not have the same letter repertoire as ISO
# 8859-2 (Latin-2); each has some accented letters that the other
# does not have.
#
# Unicode mapping issues and notes:
# ---------------------------------
#
# Details of mapping changes in each version:
# -------------------------------------------
#
##################

0x20 0x0020 # SPACE
0x21 0x0021 # EXCLAMATION MARK
0x22 0x0022 # QUOTATION MARK
0x23 0x0023 # NUMBER SIGN
0x24 0x0024 # DOLLAR SIGN
0x25 0x0025 # PERCENT SIGN
0x26 0x0026 # AMPERSAND
0x27 0x0027 # APOSTROPHE
0x28 0x0028 # LEFT PARENTHESIS
0x29 0x0029 # RIGHT PARENTHESIS
0x2A 0x002A # ASTERISK
0x2B 0x002B # PLUS SIGN
0x2C 0x002C # COMMA
0x2D 0x002D # HYPHEN-MINUS
0x2E 0x002E # FULL STOP
0x2F 0x002F # SOLIDUS
0x30 0x0030 # DIGIT ZERO
0x31 0x0031 # DIGIT ONE
0x32 0x0032 # DIGIT TWO
0x33 0x0033 # DIGIT THREE
0x34 0x0034 # DIGIT FOUR
0x35 0x0035 # DIGIT FIVE
0x36 0x0036 # DIGIT SIX
0x37 0x0037 # DIGIT SEVEN
0x38 0x0038 # DIGIT EIGHT
0x39 0x0039 # DIGIT NINE
0x3A 0x003A # COLON
0x3B 0x003B # SEMICOLON
0x3C 0x003C # LESS-THAN SIGN
0x3D 0x003D # EQUALS SIGN
0x3E 0x003E # GREATER-THAN SIGN
0x3F 0x003F # QUESTION MARK
0x40 0x0040 # COMMERCIAL AT
0x41 0x0041 # LATIN CAPITAL LETTER A
0x42 0x0042 # LATIN CAPITAL LETTER B
0x43 0x0043 # LATIN CAPITAL LETTER C
0x44 0x0044 # LATIN CAPITAL LETTER D
0x45 0x0045 # LATIN CAPITAL LETTER E
0x46 0x0046 # LATIN CAPITAL LETTER F
0x47 0x0047 # LATIN CAPITAL LETTER G
0x48 0x0048 # LATIN CAPITAL LETTER H
0x49 0x0049 # LATIN CAPITAL LETTER I
0x4A 0x004A # LATIN CAPITAL LETTER J
0x4B 0x004B # LATIN CAPITAL LETTER K
0x4C 0x004C # LATIN CAPITAL LETTER L
0x4D 0x004D # LATIN CAPITAL LETTER M
0x4E 0x004E # LATIN CAPITAL LETTER N
0x4F 0x004F # LATIN CAPITAL LETTER O
0x50 0x0050 # LATIN CAPITAL LETTER P
0x51 0x0051 # LATIN CAPITAL LETTER Q
0x52 0x0052 # LATIN CAPITAL LETTER R
0x53 0x0053 # LATIN CAPITAL LETTER S
0x54 0x0054 # LATIN CAPITAL LETTER T
0x55 0x0055 # LATIN CAPITAL LETTER U
0x56 0x0056 # LATIN CAPITAL LETTER V
0x57 0x0057 # LATIN CAPITAL LETTER W
0x58 0x0058 # LATIN CAPITAL LETTER X
0x59 0x0059 # LATIN CAPITAL LETTER Y
0x5A 0x005A # LATIN CAPITAL LETTER Z
0x5B 0x005B # LEFT SQUARE BRACKET
0x5C 0x005C # REVERSE SOLIDUS
0x5D 0x005D # RIGHT SQUARE BRACKET
0x5E 0x005E # CIRCUMFLEX ACCENT
0x5F 0x005F # LOW LINE
0x60 0x0060 # GRAVE ACCENT
0x61 0x0061 # LATIN SMALL LETTER A
0x62 0x0062 # LATIN SMALL LETTER B
0x63 0x0063 # LATIN SMALL LETTER C
0x64 0x0064 # LATIN SMALL LETTER D
0x65 0x0065 # LATIN SMALL LETTER E
0x66 0x0066 # LATIN SMALL LETTER F
0x67 0x0067 # LATIN SMALL LETTER G
0x68 0x0068 # LATIN SMALL LETTER H
0x69 0x0069 # LATIN SMALL LETTER I
0x6A 0x006A # LATIN SMALL LETTER J
0x6B 0x006B # LATIN SMALL LETTER K
0x6C 0x006C # LATIN SMALL LETTER L
0x6D 0x006D # LATIN SMALL LETTER M
0x6E 0x006E # LATIN SMALL LETTER N
0x6F 0x006F # LATIN SMALL LETTER O
0x70 0x0070 # LATIN SMALL LETTER P
0x71 0x0071 # LATIN SMALL LETTER Q
0x72 0x0072 # LATIN SMALL LETTER R
0x73 0x0073 # LATIN SMALL LETTER S
0x74 0x0074 # LATIN SMALL LETTER T
0x75 0x0075 # LATIN SMALL LETTER U
0x76 0x0076 # LATIN SMALL LETTER V
0x77 0x0077 # LATIN SMALL LETTER W
0x78 0x0078 # LATIN SMALL LETTER X
0x79 0x0079 # LATIN SMALL LETTER Y
0x7A 0x007A # LATIN SMALL LETTER Z
0x7B 0x007B # LEFT CURLY BRACKET
0x7C 0x007C # VERTICAL LINE
0x7D 0x007D # RIGHT CURLY BRACKET
0x7E 0x007E # TILDE
#
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
0x81 0x0100 # LATIN CAPITAL LETTER A WITH MACRON
0x82 0x0101 # LATIN SMALL LETTER A WITH MACRON
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
0x84 0x0104 # LATIN CAPITAL LETTER A WITH OGONEK
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
0x87 0x00E1 # LATIN SMALL LETTER A WITH ACUTE
0x88 0x0105 # LATIN SMALL LETTER A WITH OGONEK
0x89 0x010C # LATIN CAPITAL LETTER C WITH CARON
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
0x8B 0x010D # LATIN SMALL LETTER C WITH CARON
0x8C 0x0106 # LATIN CAPITAL LETTER C WITH ACUTE
0x8D 0x0107 # LATIN SMALL LETTER C WITH ACUTE
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
0x8F 0x0179 # LATIN CAPITAL LETTER Z WITH ACUTE
0x90 0x017A # LATIN SMALL LETTER Z WITH ACUTE
0x91 0x010E # LATIN CAPITAL LETTER D WITH CARON
0x92 0x00ED # LATIN SMALL LETTER I WITH ACUTE
0x93 0x010F # LATIN SMALL LETTER D WITH CARON
0x94 0x0112 # LATIN CAPITAL LETTER E WITH MACRON
0x95 0x0113 # LATIN SMALL LETTER E WITH MACRON
0x96 0x0116 # LATIN CAPITAL LETTER E WITH DOT ABOVE
0x97 0x00F3 # LATIN SMALL LETTER O WITH ACUTE
0x98 0x0117 # LATIN SMALL LETTER E WITH DOT ABOVE
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
0x9B 0x00F5 # LATIN SMALL LETTER O WITH TILDE
0x9C 0x00FA # LATIN SMALL LETTER U WITH ACUTE
0x9D 0x011A # LATIN CAPITAL LETTER E WITH CARON
0x9E 0x011B # LATIN SMALL LETTER E WITH CARON
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
0xA0 0x2020 # DAGGER
0xA1 0x00B0 # DEGREE SIGN
0xA2 0x0118 # LATIN CAPITAL LETTER E WITH OGONEK
0xA3 0x00A3 # POUND SIGN
0xA4 0x00A7 # SECTION SIGN
0xA5 0x2022 # BULLET
0xA6 0x00B6 # PILCROW SIGN
0xA7 0x00DF # LATIN SMALL LETTER SHARP S
0xA8 0x00AE # REGISTERED SIGN
0xA9 0x00A9 # COPYRIGHT SIGN
0xAA 0x2122 # TRADE MARK SIGN
0xAB 0x0119 # LATIN SMALL LETTER E WITH OGONEK
0xAC 0x00A8 # DIAERESIS
0xAD 0x2260 # NOT EQUAL TO
0xAE 0x0123 # LATIN SMALL LETTER G WITH CEDILLA
0xAF 0x012E # LATIN CAPITAL LETTER I WITH OGONEK
0xB0 0x012F # LATIN SMALL LETTER I WITH OGONEK
0xB1 0x012A # LATIN CAPITAL LETTER I WITH MACRON
0xB2 0x2264 # LESS-THAN OR EQUAL TO
0xB3 0x2265 # GREATER-THAN OR EQUAL TO
0xB4 0x012B # LATIN SMALL LETTER I WITH MACRON
0xB5 0x0136 # LATIN CAPITAL LETTER K WITH CEDILLA
0xB6 0x2202 # PARTIAL DIFFERENTIAL
0xB7 0x2211 # N-ARY SUMMATION
0xB8 0x0142 # LATIN SMALL LETTER L WITH STROKE
0xB9 0x013B # LATIN CAPITAL LETTER L WITH CEDILLA
0xBA 0x013C # LATIN SMALL LETTER L WITH CEDILLA
0xBB 0x013D # LATIN CAPITAL LETTER L WITH CARON
0xBC 0x013E # LATIN SMALL LETTER L WITH CARON
0xBD 0x0139 # LATIN CAPITAL LETTER L WITH ACUTE
0xBE 0x013A # LATIN SMALL LETTER L WITH ACUTE
0xBF 0x0145 # LATIN CAPITAL LETTER N WITH CEDILLA
0xC0 0x0146 # LATIN SMALL LETTER N WITH CEDILLA
0xC1 0x0143 # LATIN CAPITAL LETTER N WITH ACUTE
0xC2 0x00AC # NOT SIGN
0xC3 0x221A # SQUARE ROOT
0xC4 0x0144 # LATIN SMALL LETTER N WITH ACUTE
0xC5 0x0147 # LATIN CAPITAL LETTER N WITH CARON
0xC6 0x2206 # INCREMENT
0xC7 0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xC8 0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xC9 0x2026 # HORIZONTAL ELLIPSIS
0xCA 0x00A0 # NO-BREAK SPACE
0xCB 0x0148 # LATIN SMALL LETTER N WITH CARON
0xCC 0x0150 # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
0xCD 0x00D5 # LATIN CAPITAL LETTER O WITH TILDE
0xCE 0x0151 # LATIN SMALL LETTER O WITH DOUBLE ACUTE
0xCF 0x014C # LATIN CAPITAL LETTER O WITH MACRON
0xD0 0x2013 # EN DASH
0xD1 0x2014 # EM DASH
0xD2 0x201C # LEFT DOUBLE QUOTATION MARK
0xD3 0x201D # RIGHT DOUBLE QUOTATION MARK
0xD4 0x2018 # LEFT SINGLE QUOTATION MARK
0xD5 0x2019 # RIGHT SINGLE QUOTATION MARK
0xD6 0x00F7 # DIVISION SIGN
0xD7 0x25CA # LOZENGE
0xD8 0x014D # LATIN SMALL LETTER O WITH MACRON
0xD9 0x0154 # LATIN CAPITAL LETTER R WITH ACUTE
0xDA 0x0155 # LATIN SMALL LETTER R WITH ACUTE
0xDB 0x0158 # LATIN CAPITAL LETTER R WITH CARON
0xDC 0x2039 # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0xDD 0x203A # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0xDE 0x0159 # LATIN SMALL LETTER R WITH CARON
0xDF 0x0156 # LATIN CAPITAL LETTER R WITH CEDILLA
0xE0 0x0157 # LATIN SMALL LETTER R WITH CEDILLA
0xE1 0x0160 # LATIN CAPITAL LETTER S WITH CARON
0xE2 0x201A # SINGLE LOW-9 QUOTATION MARK
0xE3 0x201E # DOUBLE LOW-9 QUOTATION MARK
0xE4 0x0161 # LATIN SMALL LETTER S WITH CARON
0xE5 0x015A # LATIN CAPITAL LETTER S WITH ACUTE
0xE6 0x015B # LATIN SMALL LETTER S WITH ACUTE
0xE7 0x00C1 # LATIN CAPITAL LETTER A WITH ACUTE
0xE8 0x0164 # LATIN CAPITAL LETTER T WITH CARON
0xE9 0x0165 # LATIN SMALL LETTER T WITH CARON
0xEA 0x00CD # LATIN CAPITAL LETTER I WITH ACUTE
0xEB 0x017D # LATIN CAPITAL LETTER Z WITH CARON
0xEC 0x017E # LATIN SMALL LETTER Z WITH CARON
0xED 0x016A # LATIN CAPITAL LETTER U WITH MACRON
0xEE 0x00D3 # LATIN CAPITAL LETTER O WITH ACUTE
0xEF 0x00D4 # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0xF0 0x016B # LATIN SMALL LETTER U WITH MACRON
0xF1 0x016E # LATIN CAPITAL LETTER U WITH RING ABOVE
0xF2 0x00DA # LATIN CAPITAL LETTER U WITH ACUTE
0xF3 0x016F # LATIN SMALL LETTER U WITH RING ABOVE
0xF4 0x0170 # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
0xF5 0x0171 # LATIN SMALL LETTER U WITH DOUBLE ACUTE
0xF6 0x0172 # LATIN CAPITAL LETTER U WITH OGONEK
0xF7 0x0173 # LATIN SMALL LETTER U WITH OGONEK
0xF8 0x00DD # LATIN CAPITAL LETTER Y WITH ACUTE
0xF9 0x00FD # LATIN SMALL LETTER Y WITH ACUTE
0xFA 0x0137 # LATIN SMALL LETTER K WITH CEDILLA
0xFB 0x017B # LATIN CAPITAL LETTER Z WITH DOT ABOVE
0xFC 0x0141 # LATIN CAPITAL LETTER L WITH STROKE
0xFD 0x017C # LATIN SMALL LETTER Z WITH DOT ABOVE
0xFE 0x0122 # LATIN CAPITAL LETTER G WITH CEDILLA
0xFF 0x02C7 # CARON
7914
charmap/CHINSIMP.TXT
Normal file
File diff suppressed because it is too large
13911
charmap/CHINTRAD.TXT
Normal file
File diff suppressed because it is too large
519
charmap/CORPCHAR.TXT
Normal file
@ -0,0 +1,519 @@
#=======================================================================
# File name: CORPCHAR.TXT
#
# Contents: Registry (external version) of Apple use of
# Unicode corporate-zone characters.
#
# Copyright: (c) 1994-2003, 2005 by Apple Computer, Inc., all rights
# reserved.
#
# Contact: charsets@apple.com
#
# Changes:
#
# c03 2005-Apr-04 Deprecate 0xF8E6. Matches internal registry
# <c1.3>
# c02 2003-Feb-18 Add entry for 0xF802.
# b4,c1 2002-Dec-19 Add entries for 0xF700-0xF747 and 0xF803-
# 0xF84F; update replacement characters for
# 0xF883, 0xF8AA, 0xF8B4, 0xF8B7, 0xF8BD,
# 0xF8D7-0xF8E4, 0xF8EB-0xF8F3, 0xF8F5-
# 0xF8FE. Deprecate 0xF8E7, 0xF8F4. Delete Mac
# OS Greek mapping for 0xF8A0. Update URLs.
# Matches internal registry <b7>.
# b03 1999-Sep-22 Update contact e-mail address. Matches
# internal registry <b3> and Text Encoding
# Converter version 1.5.
# b02 1998-Aug-18 Expanded usage of 0xF8A0. Matches internal
# registry <b3>.
# n11 1998-Feb-05 Minor update to header comments
# n09 1997-Dec-14 Update to match internal registry <n23>:
# Add source hint 0xF850, transcoding hints
# 0xF860-0xF86B and 0xF870-0xF872, deprecate
# almost all other non-hint corporate
# characters.
# n08 1997-Jul-17 Update to match internal registry <n13>:
# Add characters for Mac OS Chinese, Korean &
# Farsi. Add CJK source hints. Deprecate some
# characters in favor of combinations of
# standard characters and transcoding hints.
# Change header format.
# n04 1995-Nov-15 Update to match internal registry <n8>:
# Add characters for Mac OS Hebrew and Thai.
# n02 1995-Apr-18 First version. Matches internal registry
# <n5>.
#
# Standard header:
# ----------------
#
# Apple, the Apple logo, and Macintosh are trademarks of Apple
# Computer, Inc., registered in the United States and other countries.
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
# throughout this document, "Macintosh" can be used to refer to
# Macintosh computers and "Unicode" can be used to refer to the
# Unicode standard.
#
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
# either express or implied, with respect to this document and the
# included data, its quality, accuracy, or fitness for a particular
# purpose. In no event will Apple be liable for direct, indirect,
# special, incidental, or consequential damages resulting from any
# defect or inaccuracy in this document or the included data.
#
# These mapping tables and character lists are subject to change.
# The latest tables should be available from the following:
#
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#
# For general information about Mac OS encodings and these mapping
# tables, see the file "README.TXT".
#
# Format:
# -------
#
# Two tab-separated columns;
# '#' begins a comment which continues to the end of the line.
# Column #1 is the Unicode corporate character code point
# (in hex as 0xNNNN)
# Column #2 is a comment containing:
# 1) an informal name describing the Unicode corporate character,
# or if it is deprecated, information about what to use
# instead.
# 2) optionally, another '#', followed by information on which
# Mac OS encodings use the Unicode corporate character, and -
# if relevant - the Mac OS code points that correspond to the
# corporate character.
#
# The entries are in Unicode order.
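The two-column registry format, with its optional second '#' inside the comment, could be read with a sketch along these lines. The function name `parse_corpchar_line` and the returned tuple layout are assumptions made for illustration; the sample rows are taken from this registry.

```python
def parse_corpchar_line(line):
    """Split one registry row into (code point, informal name, usage).

    Returns None for blank lines and comment-only lines. 'usage' is the
    text after the optional second '#', or None when it is absent.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # Column 1 ends at the first '#'; the rest is the comment column.
    code_field, _, comment = stripped.partition("#")
    # Inside the comment, an optional second '#' introduces usage notes.
    name, _, usage = comment.partition("#")
    return int(code_field.strip(), 16), name.strip(), usage.strip() or None


with_usage = parse_corpchar_line("0xF802\t# lower left pencil # Keyboard-0x0F")
without_usage = parse_corpchar_line("0xF700\t# NSUpArrowFunctionKey")
```

Deprecated entries put replacement information in the name field, so a real consumer would likely need further parsing of that text; this sketch leaves it as a plain string.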
#_______________________________________________________________________

# NeXT's OpenStep reserved corporate characters in the range 0xF700 to
# 0xF8FF for transient use as keyboard function keys. The ones actually
# assigned in NextStep are 0xF700-0xF747, as follows. These are still
# used in the Mac OS X AppKit frameworks. Note that there is no glyph
# associated with these, and they are not mapped or used by the Mac OS
# Text Encoding Converter.
0xF700 # NSUpArrowFunctionKey
|
||||
0xF701 # NSDownArrowFunctionKey
|
||||
0xF702 # NSLeftArrowFunctionKey
|
||||
0xF703 # NSRightArrowFunctionKey
|
||||
0xF704 # NSF1FunctionKey
|
||||
0xF705 # NSF2FunctionKey
|
||||
0xF706 # NSF3FunctionKey
|
||||
0xF707 # NSF4FunctionKey
|
||||
0xF708 # NSF5FunctionKey
|
||||
0xF709 # NSF6FunctionKey
|
||||
0xF70A # NSF7FunctionKey
|
||||
0xF70B # NSF8FunctionKey
|
||||
0xF70C # NSF9FunctionKey
|
||||
0xF70D # NSF10FunctionKey
|
||||
0xF70E # NSF11FunctionKey
|
||||
0xF70F # NSF12FunctionKey
|
||||
0xF710 # NSF13FunctionKey
|
||||
0xF711 # NSF14FunctionKey
|
||||
0xF712 # NSF15FunctionKey
|
||||
0xF713 # NSF16FunctionKey
|
||||
0xF714 # NSF17FunctionKey
|
||||
0xF715 # NSF18FunctionKey
|
||||
0xF716 # NSF19FunctionKey
|
||||
0xF717 # NSF20FunctionKey
|
||||
0xF718 # NSF21FunctionKey
|
||||
0xF719 # NSF22FunctionKey
|
||||
0xF71A # NSF23FunctionKey
|
||||
0xF71B # NSF24FunctionKey
|
||||
0xF71C # NSF25FunctionKey
|
||||
0xF71D # NSF26FunctionKey
|
||||
0xF71E # NSF27FunctionKey
|
||||
0xF71F # NSF28FunctionKey
|
||||
0xF720 # NSF29FunctionKey
|
||||
0xF721 # NSF30FunctionKey
|
||||
0xF722 # NSF31FunctionKey
|
||||
0xF723 # NSF32FunctionKey
|
||||
0xF724 # NSF33FunctionKey
|
||||
0xF725 # NSF34FunctionKey
|
||||
0xF726 # NSF35FunctionKey
|
||||
0xF727 # NSInsertFunctionKey
|
||||
0xF728 # NSDeleteFunctionKey
|
||||
0xF729 # NSHomeFunctionKey
|
||||
0xF72A # NSBeginFunctionKey
|
||||
0xF72B # NSEndFunctionKey
|
||||
0xF72C # NSPageUpFunctionKey
|
||||
0xF72D # NSPageDownFunctionKey
|
||||
0xF72E # NSPrintScreenFunctionKey
|
||||
0xF72F # NSScrollLockFunctionKey
|
||||
0xF730 # NSPauseFunctionKey
|
||||
0xF731 # NSSysReqFunctionKey
|
||||
0xF732 # NSBreakFunctionKey
|
||||
0xF733 # NSResetFunctionKey
|
||||
0xF734 # NSStopFunctionKey
|
||||
0xF735 # NSMenuFunctionKey
|
||||
0xF736 # NSUserFunctionKey
|
||||
0xF737 # NSSystemFunctionKey
|
||||
0xF738 # NSPrintFunctionKey
|
||||
0xF739 # NSClearLineFunctionKey
|
||||
0xF73A # NSClearDisplayFunctionKey
|
||||
0xF73B # NSInsertLineFunctionKey
|
||||
0xF73C # NSDeleteLineFunctionKey
|
||||
0xF73D # NSInsertCharFunctionKey
|
||||
0xF73E # NSDeleteCharFunctionKey
|
||||
0xF73F # NSPrevFunctionKey
|
||||
0xF740 # NSNextFunctionKey
|
||||
0xF741 # NSSelectFunctionKey
|
||||
0xF742 # NSExecuteFunctionKey
|
||||
0xF743 # NSUndoFunctionKey
|
||||
0xF744 # NSRedoFunctionKey
|
||||
0xF745 # NSFindFunctionKey
|
||||
0xF746 # NSHelpFunctionKey
|
||||
0xF747 # NSModeSwitchFunctionKey
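Because these function-key code points have no glyph, an application that receives them in text may want to detect or drop them. The sketch below is only an illustration of the assigned range 0xF700-0xF747 listed above; the helper names are hypothetical and are not AppKit API.

```python
# Assigned NextStep function-key range, taken from the list above.
FUNCTION_KEY_FIRST = 0xF700
FUNCTION_KEY_LAST = 0xF747


def is_assigned_function_key(code_point: int) -> bool:
    """True if the code point is one of the assigned function-key codes."""
    return FUNCTION_KEY_FIRST <= code_point <= FUNCTION_KEY_LAST


def strip_function_keys(text: str) -> str:
    """Remove assigned function-key code points, which have no glyph."""
    return "".join(ch for ch in text if not is_assigned_function_key(ord(ch)))
```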
|
||||
|
||||
# The following (11) are for mapping the Mac OS Keyboard and Mac OS Korean
|
||||
# encodings (for Mac OS Korean also see 0xF83D, 0xF840-0xF84F).
|
||||
0xF802 # lower left pencil # Keyboard-0x0F
|
||||
0xF803 # contextual menu symbol # Keyboard-0x6D
|
||||
0xF804 # eject symbol # Keyboard-0x8C
|
||||
0xF805 # black diamond minus white square # Korean-0xA658
|
||||
0xF806 # black square minus white diamond # Korean-0xA663
|
||||
0xF807 # telephone dial # Korean-0xA69F
|
||||
0xF808 # five vertical lines # Korean-0xA68F
|
||||
0xF809 # one downward-pointing black triangle over two others # Korean-0xA681
|
||||
0xF80A # two interwoven eye shapes # Korean-0xA674
|
||||
0xF80B # narrow-leaf four-petal florette # Korean-0xA696
|
||||
0xF80C # four interleaved fisheyes # Korean-0xA69A
|
||||
|
||||
# The following (51) are mainly for mapping the dingbat/fleuron repertoire
|
||||
# of the Hoefler Ornaments font, which is otherwise unmappable to Unicode.
|
||||
# 0xF83D is also used for mapping MacKorean.
|
||||
0xF80D # horizontal line thickening at center # Hoefler Ornaments glyph 6
|
||||
0xF80E # dotted X design 1 # Hoefler Ornaments glyph 7
|
||||
0xF80F # dotted X design 2 # Hoefler Ornaments glyph 8
|
||||
0xF810 # dotted X design 3 # Hoefler Ornaments glyph 9
|
||||
0xF811 # dotted X design 4 # Hoefler Ornaments glyph 10
|
||||
0xF812 # horizontal line with wasp waist at center # Hoefler Ornaments glyph 11
|
||||
0xF813 # horizontal line thickening at center, alternate # Hoefler Ornaments glyph 12
|
||||
0xF814 # half-filled fleuron 1 # Hoefler Ornaments glyph 13
|
||||
0xF815 # half-filled fleuron 2 # Hoefler Ornaments glyph 14
|
||||
0xF816 # half-filled fleuron 3 # Hoefler Ornaments glyph 15
|
||||
0xF817 # half-filled fleuron 4 # Hoefler Ornaments glyph 16
|
||||
0xF818 # half-filled fleuron 5 # Hoefler Ornaments glyph 17
|
||||
0xF819 # half-filled fleuron 6 # Hoefler Ornaments glyph 18
|
||||
0xF81A # half-filled fleuron 7 # Hoefler Ornaments glyph 19
|
||||
0xF81B # half-filled fleuron 8 # Hoefler Ornaments glyph 20
|
||||
0xF81C # half-filled fleuron 9 # Hoefler Ornaments glyph 21
|
||||
0xF81D # half-filled fleuron 10 # Hoefler Ornaments glyph 22
|
||||
0xF81E # half-filled fleuron 11 # Hoefler Ornaments glyph 23
|
||||
0xF81F # half-filled fleuron 12 # Hoefler Ornaments glyph 24
|
||||
0xF820 # half-filled fleuron 13 # Hoefler Ornaments glyph 25
|
||||
0xF821 # half-filled fleuron 14 # Hoefler Ornaments glyph 26
|
||||
0xF822 # half-filled fleuron 15 # Hoefler Ornaments glyph 27
|
||||
0xF823 # half-filled fleuron 16 # Hoefler Ornaments glyph 28
|
||||
0xF824 # half-filled dingbat 1 # Hoefler Ornaments glyph 29
|
||||
0xF825 # half-filled dingbat 2 # Hoefler Ornaments glyph 30
|
||||
0xF826 # half-filled dingbat 3 # Hoefler Ornaments glyph 31
|
||||
0xF827 # filled fleuron 1 # Hoefler Ornaments glyph 34
|
||||
0xF828 # filled fleuron 2 # Hoefler Ornaments glyph 35
|
||||
0xF829 # filled fleuron 3 # Hoefler Ornaments glyph 36
|
||||
0xF82A # filled fleuron 4 # Hoefler Ornaments glyph 37
|
||||
0xF82B # filled fleuron 5 # Hoefler Ornaments glyph 38
|
||||
0xF82C # filled fleuron 6 # Hoefler Ornaments glyph 39
|
||||
0xF82D # filled fleuron 7 # Hoefler Ornaments glyph 40
|
||||
0xF82E # filled fleuron 8 # Hoefler Ornaments glyph 41
|
||||
0xF82F # filled fleuron 9 # Hoefler Ornaments glyph 42
|
||||
0xF830 # filled fleuron 10 # Hoefler Ornaments glyph 43
|
||||
0xF831 # filled fleuron 11 # Hoefler Ornaments glyph 44
|
||||
0xF832 # filled fleuron 12 # Hoefler Ornaments glyph 45
|
||||
0xF833 # filled fleuron 13 # Hoefler Ornaments glyph 46
|
||||
0xF834 # filled fleuron 14 # Hoefler Ornaments glyph 47
|
||||
0xF835 # filled fleuron 15 # Hoefler Ornaments glyph 48
|
||||
0xF836 # filled fleuron 16 # Hoefler Ornaments glyph 49
|
||||
0xF837 # filled dingbat 1 # Hoefler Ornaments glyph 50
|
||||
0xF838 # filled dingbat 2 # Hoefler Ornaments glyph 51
|
||||
0xF839 # filled dingbat 3 # Hoefler Ornaments glyph 52
|
||||
0xF83A # sun with face # Hoefler Ornaments glyph 53
|
||||
0xF83B # moon with face # Hoefler Ornaments glyph 54
|
||||
0xF83C # crown # Hoefler Ornaments glyph 55
|
||||
0xF83D # fleur-de-lis # Korean-0xA642, Hoefler Ornaments glyph 57
|
||||
0xF83E # sailing ship # Hoefler Ornaments glyph 58
|
||||
0xF83F # fleuron 17 # Hoefler Ornaments glyph 59
|
||||
|
||||
# The following (16) are for mapping the Mac OS Korean encoding
|
||||
# (also see 0xF805-0xF80C, 0xF83D).
|
||||
0xF840 # three asterisks aligned vertically # Korean-0xA16E
|
||||
0xF841 # left right up down arrow # Korean-0xA894
|
||||
0xF842 # downwards wave arrow # Korean-0xAC54
|
||||
0xF843 # leftwards white arrow from wall (cf. U+21F0) # Korean-0xAC42
|
||||
0xF844 # black leftwards arrowhead (cf. U+27A4) # Korean-0xAC49
|
||||
0xF845 # black-feathered leftwards arrow (cf. U+27B5) # Korean-0xAC5F
|
||||
0xF846 # leftwards arrowhead with tail of spreading ripples # Korean-0xA867
|
||||
0xF847 # rightwards arrowhead with tail of spreading ripples # Korean-0xA868
|
||||
0xF848 # large white leftwards arrow with white fins # Korean-0xA89D
|
||||
0xF849 # large white rightwards arrow with white fins # Korean-0xA89C
|
||||
0xF84A # leftwards arrow with bow # Korean-0xAC4B
|
||||
0xF84B # rightwards arrow with bow # Korean-0xAC4A
|
||||
0xF84C # pentagon # Korean-0xA747
|
||||
0xF84D # trapezoid # Korean-0xA74B
|
||||
0xF84E # quadrilateral with shorter right side # Korean-0xA74C
|
||||
0xF84F # quadrilateral with shorter left side # Korean-0xA74D
|
||||
|
||||
# The block of 16 characters 0xF850-0xF85F is for source hint characters.
|
||||
# These have no display (like zero-width no-break space). If they appear
|
||||
# in text, they can only be mapped to tables that include them. If a run
|
||||
# of Unicode characters such as Han characters could otherwise be mapped
|
||||
# to any of several encodings, including one of these hint characters can
|
||||
# force the text to be mapped only to an encoding whose mapping table
|
||||
# includes the hint character. Once they have forced mapping to a particular
|
||||
# encoding, they no longer apply (they don't need to be cancelled); if a
|
||||
# subsequent character cannot be mapped to that encoding, it may be mapped
|
||||
# to another encoding. Currently source hints are mainly defined for CJK
|
||||
# source disambiguation.
|
||||
# NOTE: These are only defined for application developers who have requested
|
||||
# them. The Mac OS Text Encoding Converter does not generate these when
|
||||
# converting from other CJK encodings to Unicode. However, it will handle
|
||||
# these characters correctly when converting from Unicode to other encodings.
|
||||
0xF850 # source hint: Reset, try all candidate encodings in preferred order.
|
||||
0xF85C # source hint: Chinese simplified
|
||||
0xF85D # source hint: Chinese traditional
|
||||
0xF85E # source hint: Japanese
|
||||
0xF85F # source hint: Korean
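Since source hints have no display, code that only needs the visible text can separate them out. The following Python sketch assumes only the five hints listed above are defined; the table and function names are hypothetical, and it does not attempt to model how the Text Encoding Converter selects an encoding.

```python
# Source hint characters defined in the list above.
SOURCE_HINTS = {
    0xF850: "reset",
    0xF85C: "Chinese simplified",
    0xF85D: "Chinese traditional",
    0xF85E: "Japanese",
    0xF85F: "Korean",
}


def split_source_hints(text):
    """Return (hint labels in order of appearance, text without hints)."""
    labels = [SOURCE_HINTS[ord(ch)] for ch in text if ord(ch) in SOURCE_HINTS]
    visible = "".join(ch for ch in text if ord(ch) not in SOURCE_HINTS)
    return labels, visible
```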
|
||||
|
||||
# The block of 32 characters 0xF860-0xF87F is for transcoding hints.
|
||||
# These are used in combination with standard Unicode characters to force
|
||||
# them to be treated in a special way for mapping to other encodings;
|
||||
# they have no other effect.
|
||||
#
|
||||
# 0xF870-0xF87F are "variant tags" - they are like combining characters,
|
||||
# and can follow a standard Unicode character (or a sequence consisting of a base
|
||||
# character and other combining characters) to tag it so that it will be
|
||||
# unique, treated in a special way for transcoding. These always terminate
|
||||
# a sequence of combining characters.
|
||||
#
|
||||
# 0xF860-0xF86B are "grouping hints" - they precede a group of two to
|
||||
# four standard Unicode characters to indicate that they are treated as a
|
||||
# group for transcoding. This grouping overrides any other combining
|
||||
# behavior.
|
||||
#
|
||||
# Here are the ones defined so far:
|
||||
0xF860 # transcoding hint: group next 2 characters # Japanese,Korean
|
||||
0xF861 # transcoding hint: group next 3 characters # Japanese,Korean
|
||||
0xF862 # transcoding hint: group next 4 characters # Japanese,Korean
|
||||
0xF863 # transcoding hint: group next 4 characters, alt1 # Korean
|
||||
0xF864 # transcoding hint: group next 4 characters, alt2 # Korean
|
||||
0xF865 # transcoding hint: group next 4 characters, alt3 # Korean
|
||||
0xF866 # transcoding hint: group next 4 characters, alt4 # Korean
|
||||
0xF867 # transcoding hint: group next 2 characters, alt1 # Korean
|
||||
0xF868 # transcoding hint: group next 2 characters, alt2 # Korean
|
||||
0xF869 # transcoding hint: group next 2 characters, alt3 # Korean
|
||||
0xF86A # transcoding hint: group next 2 characters, RL # Hebrew
|
||||
0xF86B # transcoding hint: group next 4 characters, RL # Farsi variant
|
||||
#
|
||||
0xF870 # transcoding hint: variant tag 16 # Symbol, Korean
|
||||
0xF871 # transcoding hint: variant tag 15 # Symbol, Korean
|
||||
0xF872 # transcoding hint: variant tag 14 # Symbol
|
||||
0xF873 # transcoding hint: variant tag 13 # Korean, Thai
|
||||
0xF874 # transcoding hint: variant tag 12 # Korean, Thai
|
||||
0xF875 # transcoding hint: variant tag 11 # Korean, Thai
|
||||
0xF876 # transcoding hint: variant tag 10 # Korean
|
||||
0xF877 # transcoding hint: variant tag 9 # Korean
|
||||
0xF878 # transcoding hint: variant tag 8 # Korean
|
||||
0xF879 # transcoding hint: variant tag 7 # Korean
|
||||
0xF87A # transcoding hint: variant tag 6 # Korean
|
||||
0xF87B # transcoding hint: variant tag 5 # Korean
|
||||
0xF87C # transcoding hint: variant tag 4 # ChineseTrad, Korean, Dingbats
|
||||
0xF87D # transcoding hint: variant tag 3 # ChineseTrad
|
||||
0xF87E # transcoding hint: variant tag 2 # Chinese,Japanese
|
||||
0xF87F # transcoding hint: variant tag 1 # CJK,Symbol,Dingbats,Hebrew
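To illustrate how grouping hints delimit a unit, the Python sketch below splits a string into single characters and hint-prefixed groups, using the group sizes from the list above. It is a simplification under stated assumptions: the names are hypothetical, each grouped "character" is taken to be one code point, and variant tags and combining sequences are not handled.

```python
# Number of following characters grouped by each hint, from the list above.
GROUP_SIZES = {
    0xF860: 2, 0xF861: 3, 0xF862: 4,
    0xF863: 4, 0xF864: 4, 0xF865: 4, 0xF866: 4,
    0xF867: 2, 0xF868: 2, 0xF869: 2,
    0xF86A: 2, 0xF86B: 4,
}


def split_groups(text):
    """Split text into units: a grouping hint plus its characters, or one character."""
    units = []
    i = 0
    while i < len(text):
        size = GROUP_SIZES.get(ord(text[i]))
        if size is None:
            units.append(text[i])
            i += 1
        else:
            # Keep the hint together with the characters it groups.
            units.append(text[i:i + 1 + size])
            i += 1 + size
    return units
```

For example, the sequence 0xF860+0x0058+0x0056 followed by a plain letter yields two units: the three-code-point group and the letter.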
|
||||
|
||||
# The following (2) are metrics "characters" so applications can get the
|
||||
# height and width of double-byte character glyphs by measuring the glyph of a
|
||||
# one-byte character (e.g. calling CharWidth for character 0x82 in a Chinese
|
||||
# Traditional font); this approach assumes that the glyphs for all double-byte
|
||||
# characters in a font have the same metrics, which is currently true. Note
|
||||
# that the width-metric character glyphs are used differently for TrueType and
|
||||
# old-style bitmap fonts; for TrueType fonts the metric glyph width is equal
|
||||
# to the full width of a double-byte character glyph, while for FBIT/FDEF
|
||||
# bitmap fonts the metric glyph width is half the width of a double-byte
|
||||
# character glyph.
|
||||
0xF880 # height-metric character for double-byte fonts # Chinese Simp&Trad-0x81
|
||||
0xF881 # width-metric character for double-byte fonts # Chinese Simp&Trad-0x82
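The width relationship described above amounts to a small piece of arithmetic. The sketch below is illustrative only: the function name and the `"truetype"` / `"bitmap"` labels are hypothetical, and it does not call any Mac OS API such as CharWidth.

```python
def double_byte_glyph_width(metric_glyph_width: int, font_kind: str) -> int:
    """Derive the double-byte glyph width from the width-metric glyph width."""
    if font_kind == "truetype":
        # TrueType: the metric glyph is as wide as a double-byte glyph.
        return metric_glyph_width
    if font_kind == "bitmap":
        # FBIT/FDEF bitmap fonts: the metric glyph is half as wide.
        return 2 * metric_glyph_width
    raise ValueError(f"unknown font kind: {font_kind!r}")
```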
|
||||
|
||||
# The following (2) are for the TrueType variant of Mac OS Farsi.
|
||||
# NOTE: 0xF883 is deprecated, but is still loosely mapped to 0xA4 in the
|
||||
# Mac OS Farsi TrueType variant.
|
||||
0xF882 # Arabic ligature "peace on him" # Farsi(TrueType variant)-0x8B
|
||||
0xF883 # deprecated, use 0xFDFC (3.2) or 0xF86B+0x0631+0x06CC+0x0627+0x0644 # Farsi(TrueType variant)-0xA4
|
||||
|
||||
# The following (22) are for the Mac OS Thai encoding.
|
||||
# In this encoding, positional variants of upper vowels, tone marks,
|
||||
# and other marks are normally handled automatically by WorldScript I.
|
||||
# However, the Thai-DTP keyboard allows the codes for the positional
|
||||
# variants to be entered directly, so they must be treated as
|
||||
# characters. When the abstract character is treated as a positional
|
||||
# variant, it has the right (and high, if relevant) position.
|
||||
# NOTE: These are now all deprecated in favor of combinations of standard
|
||||
# characters and transcoding hints. The deprecated characters will still
|
||||
# be loosely mapped to the appropriate Mac OS Thai character.
|
||||
0xF884 # deprecated, use 0x0E31+0xF874 # Thai-0x92
|
||||
0xF885 # deprecated, use 0x0E34+0xF874 # Thai-0x94
|
||||
0xF886 # deprecated, use 0x0E35+0xF874 # Thai-0x95
|
||||
0xF887 # deprecated, use 0x0E36+0xF874 # Thai-0x96
|
||||
0xF888 # deprecated, use 0x0E37+0xF874 # Thai-0x97
|
||||
0xF889 # deprecated, use 0x0E47+0xF874 # Thai-0x93
|
||||
0xF88A # deprecated, use 0x0E48+0xF874 # Thai-0x98
|
||||
0xF88B # deprecated, use 0x0E48+0xF873 # Thai-0x88
|
||||
0xF88C # deprecated, use 0x0E48+0xF875 # Thai-0x83
|
||||
0xF88D # deprecated, use 0x0E49+0xF874 # Thai-0x99
|
||||
0xF88E # deprecated, use 0x0E49+0xF873 # Thai-0x89
|
||||
0xF88F # deprecated, use 0x0E49+0xF875 # Thai-0x84
|
||||
0xF890 # deprecated, use 0x0E4A+0xF874 # Thai-0x9A
|
||||
0xF891 # deprecated, use 0x0E4A+0xF873 # Thai-0x8A
|
||||
0xF892 # deprecated, use 0x0E4A+0xF875 # Thai-0x85
|
||||
0xF893 # deprecated, use 0x0E4B+0xF874 # Thai-0x9B
|
||||
0xF894 # deprecated, use 0x0E4B+0xF873 # Thai-0x8B
|
||||
0xF895 # deprecated, use 0x0E4B+0xF875 # Thai-0x86
|
||||
0xF896 # deprecated, use 0x0E4C+0xF874 # Thai-0x9C
|
||||
0xF897 # deprecated, use 0x0E4C+0xF873 # Thai-0x8C
|
||||
0xF898 # deprecated, use 0x0E4C+0xF875 # Thai-0x87
|
||||
0xF899 # deprecated, use 0x0E4D+0xF874 # Thai-0x8F
|
||||
|
||||
# The following (6) are for the Mac OS Hebrew encoding. Four of
|
||||
# these are for the obsolete "canoral" codes that were used before
|
||||
# System 7.1/WorldScript to control positioning of nikud marks (points).
|
||||
# In the future these 4 code points may be redefined.
|
||||
# NOTE: Some of these are deprecated in favor of a combination of standard
|
||||
# character and transcoding hint. The deprecated characters will still
|
||||
# be loosely mapped to the appropriate Mac OS Hebrew character.
|
||||
0xF89A # deprecated, use 0xF86A+0x05DC+0x05B9 # Hebrew-0xC0
|
||||
0xF89B # Hebrew canoral 1 # Hebrew-0xC2
|
||||
0xF89C # Hebrew canoral 2 # Hebrew-0xC3
|
||||
0xF89D # Hebrew canoral 3 # Hebrew-0xC4
|
||||
0xF89E # Hebrew canoral 4 # Hebrew-0xC5
|
||||
0xF89F # deprecated, use 0x05B8+0xF87F # Hebrew-0xDE
|
||||
|
||||
# The following (1) is for mapping the single undefined code point in
|
||||
# the Mac OS Greek and Turkish encodings, thus permitting full
|
||||
# round-trip fidelity. This character is also used for mapping EURO SIGN
|
||||
# when mapping to Unicode 1.1 (e.g. for Mac OS Roman and Symbol).
|
||||
0xF8A0 # undefined1, also EURO SIGN for Unicode 1.1 # Turkish-0xF5, Roman-0xDB, Symbol-0xA0
|
||||
|
||||
# The following (54) are for the Mac OS Japanese encoding.
|
||||
# part 1 - Apple corporate Unicode chars for Mac OS Japanese extended
|
||||
# characters not in Unicode.
|
||||
# NOTE: These are now all deprecated in favor of combinations of standard
|
||||
# characters and transcoding hints. The deprecated characters will still
|
||||
# be loosely mapped to the appropriate Mac OS Japanese character.
|
||||
0xF8A1 # deprecated, use 0xF860+0x0030+0x002E # Jpn-0x8591
|
||||
0xF8A2 # deprecated, use 0xF862+0x0058+0x0049+0x0049+0x0049 # Jpn-0x85AB
|
||||
0xF8A3 # deprecated, use 0xF861+0x0058+0x0049+0x0056 # Jpn-0x85AC
|
||||
0xF8A4 # deprecated, use 0xF860+0x0058+0x0056 # Jpn-0x85AD
|
||||
0xF8A5 # deprecated, use 0xF862+0x0078+0x0069+0x0069+0x0069 # Jpn-0x85BF
|
||||
0xF8A6 # deprecated, use 0xF861+0x0078+0x0069+0x0076 # Jpn-0x85C0
|
||||
0xF8A7 # deprecated, use 0xF860+0x0078+0x0076 # Jpn-0x85C1
|
||||
0xF8A8 # deprecated, use 0xFF4D+0xF87F # Jpn-0x8645
|
||||
0xF8A9 # deprecated, use 0xFF47+0xF87F # Jpn-0x864B
|
||||
0xF8AA # deprecated, use 0x2113 # Jpn-0x8650
|
||||
0xF8AB # deprecated, use 0xF860+0x0054+0x0042 # Jpn-0x865D
|
||||
0xF8AC # deprecated, use 0xF861+0x0046+0x0041+0x0058 # Jpn-0x869E
|
||||
0xF8AD # deprecated, use 0xF860+0x2193+0x2191 # Jpn-0x86CE
|
||||
0xF8AE # deprecated, use 0x21E8+0xF87A # Jpn-0x86D3
|
||||
0xF8AF # deprecated, use 0x21E6+0xF87A # Jpn-0x86D4
|
||||
0xF8B0 # deprecated, use 0x21E7+0xF87A # Jpn-0x86D5
|
||||
0xF8B1 # deprecated, use 0x21E9+0xF87A # Jpn-0x86D6
|
||||
0xF8B2 # deprecated, use 0xF862+0x6709+0x9650+0x4F1A+0x793E # Jpn-0x87FB
|
||||
0xF8B3 # deprecated, use 0xF862+0x8CA1+0x56E3+0x6CD5+0x4EBA # Jpn-0x87FC
|
||||
0xF8B4 # deprecated, use 0x301F # Jpn-0x8855
|
||||
# part 2 - Apple corporate Unicode chars for Mac OS Japanese vertical
|
||||
# forms not in Unicode.
|
||||
# NOTE: These are now all deprecated in favor of combinations of standard
|
||||
# characters and transcoding hints. The deprecated characters will still
|
||||
# be loosely mapped to the appropriate Mac OS Japanese character.
|
||||
0xF8B5 # deprecated, use 0x3001+0xF87E # Jpn-0xEB41
|
||||
0xF8B6 # deprecated, use 0x3002+0xF87E # Jpn-0xEB42
|
||||
0xF8B7 # deprecated, use 0xFFE3+0xF87E # Jpn-0xEB50
|
||||
0xF8B8 # deprecated, use 0x30FC+0xF87E # Jpn-0xEB5B
|
||||
0xF8B9 # deprecated, use 0x2010+0xF87E # Jpn-0xEB5D
|
||||
0xF8BA # deprecated, use 0x301C+0xF87E # Jpn-0xEB60
|
||||
0xF8BB # deprecated, use 0x2016+0xF87E # Jpn-0xEB61
|
||||
0xF8BC # deprecated, use 0xFF5C+0xF87E # Jpn-0xEB62
|
||||
0xF8BD # deprecated, use 0x2026+0xF87E # Jpn-0xEB63
|
||||
0xF8BE # deprecated, use 0xFF3B+0xF87E # Jpn-0xEB6D
|
||||
0xF8BF # deprecated, use 0xFF3D+0xF87E # Jpn-0xEB6E
|
||||
0xF8C0 # deprecated, use 0xFF1D+0xF87E # Jpn-0xEB81
|
||||
0xF8C1 # deprecated, use 0x3041+0xF87E # Jpn-0xEC9F
|
||||
0xF8C2 # deprecated, use 0x3043+0xF87E # Jpn-0xECA1
|
||||
0xF8C3 # deprecated, use 0x3045+0xF87E # Jpn-0xECA3
|
||||
0xF8C4 # deprecated, use 0x3047+0xF87E # Jpn-0xECA5
|
||||
0xF8C5 # deprecated, use 0x3049+0xF87E # Jpn-0xECA7
|
||||
0xF8C6 # deprecated, use 0x3063+0xF87E # Jpn-0xECC1
|
||||
0xF8C7 # deprecated, use 0x3083+0xF87E # Jpn-0xECE1
|
||||
0xF8C8 # deprecated, use 0x3085+0xF87E # Jpn-0xECE3
|
||||
0xF8C9 # deprecated, use 0x3087+0xF87E # Jpn-0xECE5
|
||||
0xF8CA # deprecated, use 0x308E+0xF87E # Jpn-0xECEC
|
||||
0xF8CB # deprecated, use 0x30A1+0xF87E # Jpn-0xED40
|
||||
0xF8CC # deprecated, use 0x30A3+0xF87E # Jpn-0xED42
|
||||
0xF8CD # deprecated, use 0x30A5+0xF87E # Jpn-0xED44
|
||||
0xF8CE # deprecated, use 0x30A7+0xF87E # Jpn-0xED46
|
||||
0xF8CF # deprecated, use 0x30A9+0xF87E # Jpn-0xED48
|
||||
0xF8D0 # deprecated, use 0x30C3+0xF87E # Jpn-0xED62
|
||||
0xF8D1 # deprecated, use 0x30E3+0xF87E # Jpn-0xED83
|
||||
0xF8D2 # deprecated, use 0x30E5+0xF87E # Jpn-0xED85
|
||||
0xF8D3 # deprecated, use 0x30E7+0xF87E # Jpn-0xED87
|
||||
0xF8D4 # deprecated, use 0x30EE+0xF87E # Jpn-0xED8E
|
||||
0xF8D5 # deprecated, use 0x30F5+0xF87E # Jpn-0xED95
|
||||
0xF8D6 # deprecated, use 0x30F6+0xF87E # Jpn-0xED96
|
||||
|
||||
# The following (14) are for the Mac OS Dingbats encoding.
|
||||
# NOTE: These are now all deprecated in favor of standard characters or
|
||||
# combinations of standard characters and transcoding hints. The
|
||||
# deprecated characters will still be loosely mapped to the appropriate
|
||||
# Mac OS Dingbats character.
|
||||
0xF8D7 # deprecated, use 0x2768 (3.2) or 0x0028 # Dingbats-0x80
|
||||
0xF8D8 # deprecated, use 0x2769 (3.2) or 0x0029 # Dingbats-0x81
|
||||
0xF8D9 # deprecated, use 0x276A (3.2) or 0x0028+0xF87F # Dingbats-0x82
|
||||
0xF8DA # deprecated, use 0x276B (3.2) or 0x0029+0xF87F # Dingbats-0x83
|
||||
0xF8DB # deprecated, use 0x276C (3.2) or 0x3008 # Dingbats-0x84
|
||||
0xF8DC # deprecated, use 0x276D (3.2) or 0x3009 # Dingbats-0x85
|
||||
0xF8DD # deprecated, use 0x276E (3.2) or 0x2039 # Dingbats-0x86
|
||||
0xF8DE # deprecated, use 0x276F (3.2) or 0x203A # Dingbats-0x87
|
||||
0xF8DF # deprecated, use 0x2770 (3.2) or 0x3008+0xF87C # Dingbats-0x88
|
||||
0xF8E0 # deprecated, use 0x2771 (3.2) or 0x3009+0xF87C # Dingbats-0x89
|
||||
0xF8E1 # deprecated, use 0x2772 (3.2) or 0x3014 # Dingbats-0x8A
|
||||
0xF8E2 # deprecated, use 0x2773 (3.2) or 0x3015 # Dingbats-0x8B
|
||||
0xF8E3 # deprecated, use 0x2774 (3.2) or 0x007B # Dingbats-0x8C
|
||||
0xF8E4 # deprecated, use 0x2775 (3.2) or 0x007D # Dingbats-0x8D
|
||||
|
||||
# The following (26) are for the Mac OS Symbol encoding.
|
||||
# NOTE: Some of these are deprecated in favor of combinations of standard
|
||||
# characters and transcoding hints. The deprecated characters will still
|
||||
# be loosely mapped to the appropriate Mac OS Symbol character.
|
||||
0xF8E5 # radical extender # Symbol-0x60
|
||||
0xF8E6 # deprecated, use 0x23D0 (4.0) # Symbol-0xBD
|
||||
0xF8E7 # deprecated, use 0x23AF (3.2) # Symbol-0xBE
|
||||
0xF8E8 # deprecated, use 0x00AE+0xF87F # Symbol-0xE2
|
||||
0xF8E9 # deprecated, use 0x00A9+0xF87F # Symbol-0xE3
|
||||
0xF8EA # deprecated, use 0x2122+0xF87F # Symbol-0xE4
|
||||
0xF8EB # deprecated, use 0x239B (3.2) or 0x0028+0xF870 # Symbol-0xE6
|
||||
0xF8EC # deprecated, use 0x239C (3.2) or 0x0028+0xF871 # Symbol-0xE7
|
||||
0xF8ED # deprecated, use 0x239D (3.2) or 0x0028+0xF872 # Symbol-0xE8
|
||||
0xF8EE # deprecated, use 0x23A1 (3.2) or 0x005B+0xF870 # Symbol-0xE9
|
||||
0xF8EF # deprecated, use 0x23A2 (3.2) or 0x005B+0xF871 # Symbol-0xEA
|
||||
0xF8F0 # deprecated, use 0x23A3 (3.2) or 0x005B+0xF872 # Symbol-0xEB
|
||||
0xF8F1 # deprecated, use 0x23A7 (3.2) or 0x007B+0xF870 # Symbol-0xEC
|
||||
0xF8F2 # deprecated, use 0x23A8 (3.2) or 0x007B+0xF871 # Symbol-0xED
|
||||
0xF8F3 # deprecated, use 0x23A9 (3.2) or 0x007B+0xF872 # Symbol-0xEE
|
||||
0xF8F4 # deprecated, use 0x23AA (3.2) # Symbol-0xEF
|
||||
0xF8F5 # deprecated, use 0x23AE (3.2) or 0x222B+0xF871 # Symbol-0xF4
|
||||
0xF8F6 # deprecated, use 0x239E (3.2) or 0x0029+0xF870 # Symbol-0xF6
|
||||
0xF8F7 # deprecated, use 0x239F (3.2) or 0x0029+0xF871 # Symbol-0xF7
|
||||
0xF8F8 # deprecated, use 0x23A0 (3.2) or 0x0029+0xF872 # Symbol-0xF8
|
||||
0xF8F9 # deprecated, use 0x23A4 (3.2) or 0x005D+0xF870 # Symbol-0xF9
|
||||
0xF8FA # deprecated, use 0x23A5 (3.2) or 0x005D+0xF871 # Symbol-0xFA
|
||||
0xF8FB # deprecated, use 0x23A6 (3.2) or 0x005D+0xF872 # Symbol-0xFB
|
||||
0xF8FC # deprecated, use 0x23AB (3.2) or 0x007D+0xF870 # Symbol-0xFC
|
||||
0xF8FD # deprecated, use 0x23AC (3.2) or 0x007D+0xF871 # Symbol-0xFD
|
||||
0xF8FE # deprecated, use 0x23AD (3.2) or 0x007D+0xF872 # Symbol-0xFE
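The "deprecated, use ..." notes in the preceding sections follow a regular pattern: one or more alternatives separated by "or", each a `+`-joined sequence of hexadecimal code points. The Python sketch below extracts those alternatives as strings. The function name is hypothetical, and the parenthesized number such as "(3.2)", which appears to be a Unicode version, is simply ignored.

```python
import re

DEPRECATED_PREFIX = "deprecated, use "


def replacement_alternatives(description):
    """Return the replacement strings named in a 'deprecated, use ...' note."""
    if not description.startswith(DEPRECATED_PREFIX):
        return []
    alternatives = []
    for alternative in description[len(DEPRECATED_PREFIX):].split(" or "):
        # Collect every 0x... code point; other text such as "(3.2)" is skipped.
        codes = re.findall(r"0x[0-9A-Fa-f]+", alternative)
        alternatives.append("".join(chr(int(code, 16)) for code in codes))
    return alternatives
```

Entries that are not deprecated, such as "radical extender", produce an empty list.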
|
||||
|
||||
# The following (1) is for the Mac OS Roman encoding
|
||||
# (also used in Symbol & Croatian).
|
||||
# NOTE: The graphic image associated with the Apple logo character is
|
||||
# not authorized for use without permission of Apple, and unauthorized
|
||||
# use might constitute trademark infringement.
|
||||
0xF8FF # Apple logo # Roman-0xF0, Symbol-0xF0, Croatian-0xD8
|
351
charmap/CROATIAN.TXT
Normal file
@ -0,0 +1,351 @@
|
||||
#=======================================================================
|
||||
# File name: CROATIAN.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Croatian
|
||||
# character set to Unicode 2.1 and later.
|
||||
#
|
||||
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-04 Update header comments. Matches internal xml
|
||||
# <c1.1> and Text Encoding Converter 2.0.
|
||||
# b3,c1 2002-Dec-19 Update URLs, notes. Matches internal
|
||||
# utom<b3>.
|
||||
# b02 1999-Sep-22 Encoding changed for Mac OS 8.5; change
|
||||
# mapping of 0xDB from CURRENCY SIGN to EURO
|
||||
# SIGN. Update contact e-mail address. Matches
|
||||
# internal utom<b2>, ufrm<b2>, and Text
|
||||
# Encoding Converter version 1.5.
|
||||
# n07 1998-Feb-05 Minor update to header comments
|
||||
# n05 1997-Dec-14 Update to match internal utom<5>, ufrm<16>:
|
||||
# Change standard mapping for 0xBD from U+2126
|
||||
# to its canonical decomposition, U+03A9.
|
||||
# n03 1995-Apr-15 First version (after fixing some typos).
|
||||
# Matches internal ufrm<6>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Croatian code (in hex as 0xNN)
|
||||
# Column #2 is the corresponding Unicode (in hex as 0xNNNN)
|
||||
# Column #3 is a comment containing the Unicode name
|
||||
#
|
||||
# The entries are in Mac OS Croatian code order.
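The three-column format described above can be loaded with a few lines of Python. This is a minimal sketch with hypothetical function names; it assumes every data line in this file holds a single Mac OS code and a single Unicode code point, and it would need extending for mapping files whose second column contains sequences.

```python
def parse_mapping_line(line):
    """Return (mac_code, unicode_code_point), or None for comment/blank lines."""
    # Everything after the first '#' is a comment.
    data = line.split("#", 1)[0].split()
    if len(data) < 2:
        return None
    return int(data[0], 16), int(data[1], 16)


def load_table(lines):
    """Build a dict from Mac OS Croatian code to Unicode code point."""
    table = {}
    for line in lines:
        pair = parse_mapping_line(line)
        if pair is not None:
            table[pair[0]] = pair[1]
    return table
```

Because control characters are not listed in the table, a caller would have to map 0x00-0x1F and 0x7F to themselves separately.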
|
||||
#
|
||||
# One of these mappings requires the use of a corporate character.
|
||||
# See the file "CORPCHAR.TXT" and notes below.
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Croatian character set uses the standard control characters
|
||||
# at 0x00-0x1F and 0x7F.
|
||||
#
|
||||
# Notes on Mac OS Croatian:
|
||||
# -------------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported via transcoding to and from
|
||||
# Unicode.
|
||||
#
|
||||
# Mac OS Croatian is used for Croatian and Slovene.
|
||||
#
|
||||
# The Mac OS Croatian encoding shares the script code smRoman
|
||||
# (0) with the standard Mac OS Roman encoding. To determine if
|
||||
# the Croatian encoding is being used, you must check if the
|
||||
# system region code is 68, verCroatia (or 25, verYugoCroatian,
|
||||
# only used in older systems).
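The check described above can be written as a small predicate. The sketch below only restates the numeric values given in the text; the Python constant and function names are hypothetical, and obtaining the script and region codes from the system is outside its scope.

```python
SM_ROMAN = 0             # script code shared with Mac OS Roman
VER_CROATIA = 68         # current region code
VER_YUGO_CROATIAN = 25   # region code used only in older systems


def uses_croatian_encoding(script_code: int, region_code: int) -> bool:
    """True if the smRoman script should be interpreted as Mac OS Croatian."""
    return script_code == SM_ROMAN and region_code in (
        VER_CROATIA,
        VER_YUGO_CROATIAN,
    )
```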
|
||||
#
|
||||
# This character set is a variant of standard Mac OS Roman
|
||||
# encoding, adding five accented letter case pairs to handle
|
||||
# Croatian. It has 20 code point differences from standard
|
||||
# Mac OS Roman, but only 10 differences in repertoire.
|
||||
#
|
||||
# Before Mac OS 8.5, code point 0xDB was CURRENCY SIGN, and was
|
||||
# mapped to U+00A4. In Mac OS 8.5 and later versions, code point
|
||||
# 0xDB is changed to EURO SIGN and maps to U+20AC; the standard
|
||||
# Apple fonts are updated for Mac OS 8.5 to reflect this. There is
|
||||
# a "currency sign" variant of the Mac OS Croatian encoding that
|
||||
# still maps 0xDB to U+00A4; this can be used for older fonts.
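One way to support older fonts is to derive the "currency sign" variant from the standard table rather than keep a second copy. The Python sketch below assumes the table is a dict from Mac OS Croatian code to Unicode code point; the function name is hypothetical.

```python
EURO_SIGN = 0x20AC
CURRENCY_SIGN = 0x00A4


def currency_sign_variant(standard_table):
    """Return a copy of the table with 0xDB mapped to CURRENCY SIGN."""
    variant = dict(standard_table)   # leave the standard table untouched
    variant[0xDB] = CURRENCY_SIGN
    return variant
```

With this approach the standard table keeps 0xDB mapped to U+20AC, and only the derived copy maps it to U+00A4.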
|
||||
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# The following corporate zone Unicode character is used in this
|
||||
# mapping:
|
||||
#
|
||||
# 0xF8FF Apple logo
|
||||
#
|
||||
# NOTE: The graphic image associated with the Apple logo character
|
||||
# is not authorized for use without permission of Apple, and
|
||||
# unauthorized use might constitute trademark infringement.
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
# Changes from version n07 to version b02:
|
||||
#
|
||||
# - Encoding changed for Mac OS 8.5; change mapping of 0xDB from
|
||||
# CURRENCY SIGN (U+00A4) to EURO SIGN (U+20AC).
|
||||
#
|
||||
# Changes from version n03 to version n05:
|
||||
#
|
||||
# - Change mapping of 0xBD from U+2126 to its canonical
|
||||
# decomposition, U+03A9.
|
||||
#
|
||||
##################
|
||||
|
||||
0x20 0x0020 # SPACE
|
||||
0x21 0x0021 # EXCLAMATION MARK
|
||||
0x22 0x0022 # QUOTATION MARK
|
||||
0x23 0x0023 # NUMBER SIGN
|
||||
0x24 0x0024 # DOLLAR SIGN
|
||||
0x25 0x0025 # PERCENT SIGN
|
||||
0x26 0x0026 # AMPERSAND
|
||||
0x27 0x0027 # APOSTROPHE
|
||||
0x28 0x0028 # LEFT PARENTHESIS
|
||||
0x29 0x0029 # RIGHT PARENTHESIS
|
||||
0x2A 0x002A # ASTERISK
|
||||
0x2B 0x002B # PLUS SIGN
|
||||
0x2C 0x002C # COMMA
|
||||
0x2D 0x002D # HYPHEN-MINUS
|
||||
0x2E 0x002E # FULL STOP
0x2F 0x002F # SOLIDUS
0x30 0x0030 # DIGIT ZERO
0x31 0x0031 # DIGIT ONE
0x32 0x0032 # DIGIT TWO
0x33 0x0033 # DIGIT THREE
0x34 0x0034 # DIGIT FOUR
0x35 0x0035 # DIGIT FIVE
0x36 0x0036 # DIGIT SIX
0x37 0x0037 # DIGIT SEVEN
0x38 0x0038 # DIGIT EIGHT
0x39 0x0039 # DIGIT NINE
0x3A 0x003A # COLON
0x3B 0x003B # SEMICOLON
0x3C 0x003C # LESS-THAN SIGN
0x3D 0x003D # EQUALS SIGN
0x3E 0x003E # GREATER-THAN SIGN
0x3F 0x003F # QUESTION MARK
0x40 0x0040 # COMMERCIAL AT
0x41 0x0041 # LATIN CAPITAL LETTER A
0x42 0x0042 # LATIN CAPITAL LETTER B
0x43 0x0043 # LATIN CAPITAL LETTER C
0x44 0x0044 # LATIN CAPITAL LETTER D
0x45 0x0045 # LATIN CAPITAL LETTER E
0x46 0x0046 # LATIN CAPITAL LETTER F
0x47 0x0047 # LATIN CAPITAL LETTER G
0x48 0x0048 # LATIN CAPITAL LETTER H
0x49 0x0049 # LATIN CAPITAL LETTER I
0x4A 0x004A # LATIN CAPITAL LETTER J
0x4B 0x004B # LATIN CAPITAL LETTER K
0x4C 0x004C # LATIN CAPITAL LETTER L
0x4D 0x004D # LATIN CAPITAL LETTER M
0x4E 0x004E # LATIN CAPITAL LETTER N
0x4F 0x004F # LATIN CAPITAL LETTER O
0x50 0x0050 # LATIN CAPITAL LETTER P
0x51 0x0051 # LATIN CAPITAL LETTER Q
0x52 0x0052 # LATIN CAPITAL LETTER R
0x53 0x0053 # LATIN CAPITAL LETTER S
0x54 0x0054 # LATIN CAPITAL LETTER T
0x55 0x0055 # LATIN CAPITAL LETTER U
0x56 0x0056 # LATIN CAPITAL LETTER V
0x57 0x0057 # LATIN CAPITAL LETTER W
0x58 0x0058 # LATIN CAPITAL LETTER X
0x59 0x0059 # LATIN CAPITAL LETTER Y
0x5A 0x005A # LATIN CAPITAL LETTER Z
0x5B 0x005B # LEFT SQUARE BRACKET
0x5C 0x005C # REVERSE SOLIDUS
0x5D 0x005D # RIGHT SQUARE BRACKET
0x5E 0x005E # CIRCUMFLEX ACCENT
0x5F 0x005F # LOW LINE
0x60 0x0060 # GRAVE ACCENT
0x61 0x0061 # LATIN SMALL LETTER A
0x62 0x0062 # LATIN SMALL LETTER B
0x63 0x0063 # LATIN SMALL LETTER C
0x64 0x0064 # LATIN SMALL LETTER D
0x65 0x0065 # LATIN SMALL LETTER E
0x66 0x0066 # LATIN SMALL LETTER F
0x67 0x0067 # LATIN SMALL LETTER G
0x68 0x0068 # LATIN SMALL LETTER H
0x69 0x0069 # LATIN SMALL LETTER I
0x6A 0x006A # LATIN SMALL LETTER J
0x6B 0x006B # LATIN SMALL LETTER K
0x6C 0x006C # LATIN SMALL LETTER L
0x6D 0x006D # LATIN SMALL LETTER M
0x6E 0x006E # LATIN SMALL LETTER N
0x6F 0x006F # LATIN SMALL LETTER O
0x70 0x0070 # LATIN SMALL LETTER P
0x71 0x0071 # LATIN SMALL LETTER Q
0x72 0x0072 # LATIN SMALL LETTER R
0x73 0x0073 # LATIN SMALL LETTER S
0x74 0x0074 # LATIN SMALL LETTER T
0x75 0x0075 # LATIN SMALL LETTER U
0x76 0x0076 # LATIN SMALL LETTER V
0x77 0x0077 # LATIN SMALL LETTER W
0x78 0x0078 # LATIN SMALL LETTER X
0x79 0x0079 # LATIN SMALL LETTER Y
0x7A 0x007A # LATIN SMALL LETTER Z
0x7B 0x007B # LEFT CURLY BRACKET
0x7C 0x007C # VERTICAL LINE
0x7D 0x007D # RIGHT CURLY BRACKET
0x7E 0x007E # TILDE
#
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
0x81 0x00C5 # LATIN CAPITAL LETTER A WITH RING ABOVE
0x82 0x00C7 # LATIN CAPITAL LETTER C WITH CEDILLA
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
0x84 0x00D1 # LATIN CAPITAL LETTER N WITH TILDE
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
0x87 0x00E1 # LATIN SMALL LETTER A WITH ACUTE
0x88 0x00E0 # LATIN SMALL LETTER A WITH GRAVE
0x89 0x00E2 # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
0x8B 0x00E3 # LATIN SMALL LETTER A WITH TILDE
0x8C 0x00E5 # LATIN SMALL LETTER A WITH RING ABOVE
0x8D 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
0x8F 0x00E8 # LATIN SMALL LETTER E WITH GRAVE
0x90 0x00EA # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x91 0x00EB # LATIN SMALL LETTER E WITH DIAERESIS
0x92 0x00ED # LATIN SMALL LETTER I WITH ACUTE
0x93 0x00EC # LATIN SMALL LETTER I WITH GRAVE
0x94 0x00EE # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x95 0x00EF # LATIN SMALL LETTER I WITH DIAERESIS
0x96 0x00F1 # LATIN SMALL LETTER N WITH TILDE
0x97 0x00F3 # LATIN SMALL LETTER O WITH ACUTE
0x98 0x00F2 # LATIN SMALL LETTER O WITH GRAVE
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
0x9B 0x00F5 # LATIN SMALL LETTER O WITH TILDE
0x9C 0x00FA # LATIN SMALL LETTER U WITH ACUTE
0x9D 0x00F9 # LATIN SMALL LETTER U WITH GRAVE
0x9E 0x00FB # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
0xA0 0x2020 # DAGGER
0xA1 0x00B0 # DEGREE SIGN
0xA2 0x00A2 # CENT SIGN
0xA3 0x00A3 # POUND SIGN
0xA4 0x00A7 # SECTION SIGN
0xA5 0x2022 # BULLET
0xA6 0x00B6 # PILCROW SIGN
0xA7 0x00DF # LATIN SMALL LETTER SHARP S
0xA8 0x00AE # REGISTERED SIGN
0xA9 0x0160 # LATIN CAPITAL LETTER S WITH CARON
0xAA 0x2122 # TRADE MARK SIGN
0xAB 0x00B4 # ACUTE ACCENT
0xAC 0x00A8 # DIAERESIS
0xAD 0x2260 # NOT EQUAL TO
0xAE 0x017D # LATIN CAPITAL LETTER Z WITH CARON
0xAF 0x00D8 # LATIN CAPITAL LETTER O WITH STROKE
0xB0 0x221E # INFINITY
0xB1 0x00B1 # PLUS-MINUS SIGN
0xB2 0x2264 # LESS-THAN OR EQUAL TO
0xB3 0x2265 # GREATER-THAN OR EQUAL TO
0xB4 0x2206 # INCREMENT
0xB5 0x00B5 # MICRO SIGN
0xB6 0x2202 # PARTIAL DIFFERENTIAL
0xB7 0x2211 # N-ARY SUMMATION
0xB8 0x220F # N-ARY PRODUCT
0xB9 0x0161 # LATIN SMALL LETTER S WITH CARON
0xBA 0x222B # INTEGRAL
0xBB 0x00AA # FEMININE ORDINAL INDICATOR
0xBC 0x00BA # MASCULINE ORDINAL INDICATOR
0xBD 0x03A9 # GREEK CAPITAL LETTER OMEGA
0xBE 0x017E # LATIN SMALL LETTER Z WITH CARON
0xBF 0x00F8 # LATIN SMALL LETTER O WITH STROKE
0xC0 0x00BF # INVERTED QUESTION MARK
0xC1 0x00A1 # INVERTED EXCLAMATION MARK
0xC2 0x00AC # NOT SIGN
0xC3 0x221A # SQUARE ROOT
0xC4 0x0192 # LATIN SMALL LETTER F WITH HOOK
0xC5 0x2248 # ALMOST EQUAL TO
0xC6 0x0106 # LATIN CAPITAL LETTER C WITH ACUTE
0xC7 0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xC8 0x010C # LATIN CAPITAL LETTER C WITH CARON
0xC9 0x2026 # HORIZONTAL ELLIPSIS
0xCA 0x00A0 # NO-BREAK SPACE
0xCB 0x00C0 # LATIN CAPITAL LETTER A WITH GRAVE
0xCC 0x00C3 # LATIN CAPITAL LETTER A WITH TILDE
0xCD 0x00D5 # LATIN CAPITAL LETTER O WITH TILDE
0xCE 0x0152 # LATIN CAPITAL LIGATURE OE
0xCF 0x0153 # LATIN SMALL LIGATURE OE
0xD0 0x0110 # LATIN CAPITAL LETTER D WITH STROKE
0xD1 0x2014 # EM DASH
0xD2 0x201C # LEFT DOUBLE QUOTATION MARK
0xD3 0x201D # RIGHT DOUBLE QUOTATION MARK
0xD4 0x2018 # LEFT SINGLE QUOTATION MARK
0xD5 0x2019 # RIGHT SINGLE QUOTATION MARK
0xD6 0x00F7 # DIVISION SIGN
0xD7 0x25CA # LOZENGE
0xD8 0xF8FF # Apple logo
0xD9 0x00A9 # COPYRIGHT SIGN
0xDA 0x2044 # FRACTION SLASH
0xDB 0x20AC # EURO SIGN
0xDC 0x2039 # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0xDD 0x203A # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0xDE 0x00C6 # LATIN CAPITAL LETTER AE
0xDF 0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xE0 0x2013 # EN DASH
0xE1 0x00B7 # MIDDLE DOT
0xE2 0x201A # SINGLE LOW-9 QUOTATION MARK
0xE3 0x201E # DOUBLE LOW-9 QUOTATION MARK
0xE4 0x2030 # PER MILLE SIGN
0xE5 0x00C2 # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0xE6 0x0107 # LATIN SMALL LETTER C WITH ACUTE
0xE7 0x00C1 # LATIN CAPITAL LETTER A WITH ACUTE
0xE8 0x010D # LATIN SMALL LETTER C WITH CARON
0xE9 0x00C8 # LATIN CAPITAL LETTER E WITH GRAVE
0xEA 0x00CD # LATIN CAPITAL LETTER I WITH ACUTE
0xEB 0x00CE # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0xEC 0x00CF # LATIN CAPITAL LETTER I WITH DIAERESIS
0xED 0x00CC # LATIN CAPITAL LETTER I WITH GRAVE
0xEE 0x00D3 # LATIN CAPITAL LETTER O WITH ACUTE
0xEF 0x00D4 # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0xF0 0x0111 # LATIN SMALL LETTER D WITH STROKE
0xF1 0x00D2 # LATIN CAPITAL LETTER O WITH GRAVE
0xF2 0x00DA # LATIN CAPITAL LETTER U WITH ACUTE
0xF3 0x00DB # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0xF4 0x00D9 # LATIN CAPITAL LETTER U WITH GRAVE
0xF5 0x0131 # LATIN SMALL LETTER DOTLESS I
0xF6 0x02C6 # MODIFIER LETTER CIRCUMFLEX ACCENT
0xF7 0x02DC # SMALL TILDE
0xF8 0x00AF # MACRON
0xF9 0x03C0 # GREEK SMALL LETTER PI
0xFA 0x00CB # LATIN CAPITAL LETTER E WITH DIAERESIS
0xFB 0x02DA # RING ABOVE
0xFC 0x00B8 # CEDILLA
0xFD 0x00CA # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0xFE 0x00E6 # LATIN SMALL LETTER AE
0xFF 0x02C7 # CARON
352
charmap/CYRILLIC.TXT
Normal file
@ -0,0 +1,352 @@
#=======================================================================
# File name: CYRILLIC.TXT
#
# Contents:  Map (external version) from Mac OS Cyrillic
#            character set to Unicode 2.1 and later.
#
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
#            reserved.
#
# Contact:   charsets@apple.com
#
# Changes:
#
#   c03   2005-Apr-05  Update header comments. Matches internal xml
#                      <c1.1> and Text Encoding Converter 2.0.
#   b3,c1 2002-Dec-19  Update URLs, notes. Matches internal
#                      utom<b2>.
#   b02   1999-Sep-22  Encoding changed for Mac OS 9.0 to merge
#                      with Mac OS Ukrainian and support EURO SIGN;
#                      Change mappings for 0xA2, 0xB6, and 0xFF.
#                      Update contact e-mail address. Matches
#                      internal utom<b2>, ufrm<b2>, and Text
#                      Encoding Converter version 1.5.
#   n05   1998-Feb-05  Update header comments to new format; no
#                      mapping changes. Matches internal utom<n3>,
#                      ufrm<n13>, and Text Encoding Converter
#                      version 1.3.
#   n03   1995-Apr-15  First version (after fixing some typos).
#                      Matches internal ufrm<n5>.
#
# Standard header:
# ----------------
#
# Apple, the Apple logo, and Macintosh are trademarks of Apple
# Computer, Inc., registered in the United States and other countries.
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
# throughout this document, "Macintosh" can be used to refer to
# Macintosh computers and "Unicode" can be used to refer to the
# Unicode standard.
#
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
# either express or implied, with respect to this document and the
# included data, its quality, accuracy, or fitness for a particular
# purpose. In no event will Apple be liable for direct, indirect,
# special, incidental, or consequential damages resulting from any
# defect or inaccuracy in this document or the included data.
#
# These mapping tables and character lists are subject to change.
# The latest tables should be available from the following:
#
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#
# For general information about Mac OS encodings and these mapping
# tables, see the file "README.TXT".
#
# Format:
# -------
#
# Three tab-separated columns;
# '#' begins a comment which continues to the end of the line.
#   Column #1 is the Mac OS Cyrillic code (in hex as 0xNN)
#   Column #2 is the corresponding Unicode (in hex as 0xNNNN)
#   Column #3 is a comment containing the Unicode name
#
# The entries are in Mac OS Cyrillic code order.
#
# Control character mappings are not shown in this table, following
# the conventions of the standard UTC mapping tables. However, the
# Mac OS Cyrillic character set uses the standard control characters
# at 0x00-0x1F and 0x7F.
#
# Notes on Mac OS Cyrillic:
# -------------------------
#
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
# environments, it is only supported directly in programming
# interfaces for QuickDraw Text, the Script Manager, and related
# Text Utilities. For other purposes it is supported via transcoding
# to and from Unicode.
#
# This is the "Euro sign" version of Mac Cyrillic for Mac OS 9.0 and
# later. Before Mac OS 9.0, there were two separate Slavic Cyrillic
# encodings:
#
# 1. The Cyrillic currency sign variant (used for localized Russian
#    and Bulgarian systems), which had the following:
#      0xA2 U+00A2 CENT SIGN
#      0xB6 U+2202 PARTIAL DIFFERENTIAL
#      0xFF U+00A4 CURRENCY SIGN
#
# 2. The Ukrainian currency sign variant (used for localized Ukrainian
#    systems and the pre-9.0 Cyrillic Language Kit), which had the
#    following:
#      0xA2 U+0490 CYRILLIC CAPITAL LETTER GHE WITH UPTURN
#      0xB6 U+0491 CYRILLIC SMALL LETTER GHE WITH UPTURN
#      0xFF U+00A4 CURRENCY SIGN
#
# This new Cyrillic Euro sign version is based on the old Ukrainian
# currency sign variant, with 0xFF changed to be EURO SIGN.
#
# The Mac OS Cyrillic encoding includes the Cyrillic letter repertoire
# of ISO 8859-5 (although not at the same code points). This covers
# most of the Slavic languages written in Cyrillic script.
#
# The Mac OS Cyrillic encoding also includes a number of characters
# needed for the Mac OS user interface and localization (e.g.
# ellipsis, bullet, copyright sign). All of the characters in Mac OS
# Cyrillic that are also in the Mac OS Roman encoding are at the
# same code point in both; this improves application compatibility.
#
# Note: There is a common Ukrainian glyph variation in which the glyph
# for CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I may or may not
# have a dot above.
#
# Unicode mapping issues and notes:
# ---------------------------------
#
# Details of mapping changes in each version:
# -------------------------------------------
#
# Changes from version n05 to version b02:
#
# - Encoding changed for Mac OS 9.0 to merge with Mac OS Ukrainian and
#   support EURO SIGN. 0xA2 changed from U+00A2 to U+0490; 0xB6 changed
#   from U+2202 to U+0491; 0xFF changed from U+00A4 to U+20AC.
#
##################

0x20 0x0020 # SPACE
0x21 0x0021 # EXCLAMATION MARK
0x22 0x0022 # QUOTATION MARK
0x23 0x0023 # NUMBER SIGN
0x24 0x0024 # DOLLAR SIGN
0x25 0x0025 # PERCENT SIGN
0x26 0x0026 # AMPERSAND
0x27 0x0027 # APOSTROPHE
0x28 0x0028 # LEFT PARENTHESIS
0x29 0x0029 # RIGHT PARENTHESIS
0x2A 0x002A # ASTERISK
0x2B 0x002B # PLUS SIGN
0x2C 0x002C # COMMA
0x2D 0x002D # HYPHEN-MINUS
0x2E 0x002E # FULL STOP
0x2F 0x002F # SOLIDUS
0x30 0x0030 # DIGIT ZERO
0x31 0x0031 # DIGIT ONE
0x32 0x0032 # DIGIT TWO
0x33 0x0033 # DIGIT THREE
0x34 0x0034 # DIGIT FOUR
0x35 0x0035 # DIGIT FIVE
0x36 0x0036 # DIGIT SIX
0x37 0x0037 # DIGIT SEVEN
0x38 0x0038 # DIGIT EIGHT
0x39 0x0039 # DIGIT NINE
0x3A 0x003A # COLON
0x3B 0x003B # SEMICOLON
0x3C 0x003C # LESS-THAN SIGN
0x3D 0x003D # EQUALS SIGN
0x3E 0x003E # GREATER-THAN SIGN
0x3F 0x003F # QUESTION MARK
0x40 0x0040 # COMMERCIAL AT
0x41 0x0041 # LATIN CAPITAL LETTER A
0x42 0x0042 # LATIN CAPITAL LETTER B
0x43 0x0043 # LATIN CAPITAL LETTER C
0x44 0x0044 # LATIN CAPITAL LETTER D
0x45 0x0045 # LATIN CAPITAL LETTER E
0x46 0x0046 # LATIN CAPITAL LETTER F
0x47 0x0047 # LATIN CAPITAL LETTER G
0x48 0x0048 # LATIN CAPITAL LETTER H
0x49 0x0049 # LATIN CAPITAL LETTER I
0x4A 0x004A # LATIN CAPITAL LETTER J
0x4B 0x004B # LATIN CAPITAL LETTER K
0x4C 0x004C # LATIN CAPITAL LETTER L
0x4D 0x004D # LATIN CAPITAL LETTER M
0x4E 0x004E # LATIN CAPITAL LETTER N
0x4F 0x004F # LATIN CAPITAL LETTER O
0x50 0x0050 # LATIN CAPITAL LETTER P
0x51 0x0051 # LATIN CAPITAL LETTER Q
0x52 0x0052 # LATIN CAPITAL LETTER R
0x53 0x0053 # LATIN CAPITAL LETTER S
0x54 0x0054 # LATIN CAPITAL LETTER T
0x55 0x0055 # LATIN CAPITAL LETTER U
0x56 0x0056 # LATIN CAPITAL LETTER V
0x57 0x0057 # LATIN CAPITAL LETTER W
0x58 0x0058 # LATIN CAPITAL LETTER X
0x59 0x0059 # LATIN CAPITAL LETTER Y
0x5A 0x005A # LATIN CAPITAL LETTER Z
0x5B 0x005B # LEFT SQUARE BRACKET
0x5C 0x005C # REVERSE SOLIDUS
0x5D 0x005D # RIGHT SQUARE BRACKET
0x5E 0x005E # CIRCUMFLEX ACCENT
0x5F 0x005F # LOW LINE
0x60 0x0060 # GRAVE ACCENT
0x61 0x0061 # LATIN SMALL LETTER A
0x62 0x0062 # LATIN SMALL LETTER B
0x63 0x0063 # LATIN SMALL LETTER C
0x64 0x0064 # LATIN SMALL LETTER D
0x65 0x0065 # LATIN SMALL LETTER E
0x66 0x0066 # LATIN SMALL LETTER F
0x67 0x0067 # LATIN SMALL LETTER G
0x68 0x0068 # LATIN SMALL LETTER H
0x69 0x0069 # LATIN SMALL LETTER I
0x6A 0x006A # LATIN SMALL LETTER J
0x6B 0x006B # LATIN SMALL LETTER K
0x6C 0x006C # LATIN SMALL LETTER L
0x6D 0x006D # LATIN SMALL LETTER M
0x6E 0x006E # LATIN SMALL LETTER N
0x6F 0x006F # LATIN SMALL LETTER O
0x70 0x0070 # LATIN SMALL LETTER P
0x71 0x0071 # LATIN SMALL LETTER Q
0x72 0x0072 # LATIN SMALL LETTER R
0x73 0x0073 # LATIN SMALL LETTER S
0x74 0x0074 # LATIN SMALL LETTER T
0x75 0x0075 # LATIN SMALL LETTER U
0x76 0x0076 # LATIN SMALL LETTER V
0x77 0x0077 # LATIN SMALL LETTER W
0x78 0x0078 # LATIN SMALL LETTER X
0x79 0x0079 # LATIN SMALL LETTER Y
0x7A 0x007A # LATIN SMALL LETTER Z
0x7B 0x007B # LEFT CURLY BRACKET
0x7C 0x007C # VERTICAL LINE
0x7D 0x007D # RIGHT CURLY BRACKET
0x7E 0x007E # TILDE
#
0x80 0x0410 # CYRILLIC CAPITAL LETTER A
0x81 0x0411 # CYRILLIC CAPITAL LETTER BE
0x82 0x0412 # CYRILLIC CAPITAL LETTER VE
0x83 0x0413 # CYRILLIC CAPITAL LETTER GHE
0x84 0x0414 # CYRILLIC CAPITAL LETTER DE
0x85 0x0415 # CYRILLIC CAPITAL LETTER IE
0x86 0x0416 # CYRILLIC CAPITAL LETTER ZHE
0x87 0x0417 # CYRILLIC CAPITAL LETTER ZE
0x88 0x0418 # CYRILLIC CAPITAL LETTER I
0x89 0x0419 # CYRILLIC CAPITAL LETTER SHORT I
0x8A 0x041A # CYRILLIC CAPITAL LETTER KA
0x8B 0x041B # CYRILLIC CAPITAL LETTER EL
0x8C 0x041C # CYRILLIC CAPITAL LETTER EM
0x8D 0x041D # CYRILLIC CAPITAL LETTER EN
0x8E 0x041E # CYRILLIC CAPITAL LETTER O
0x8F 0x041F # CYRILLIC CAPITAL LETTER PE
0x90 0x0420 # CYRILLIC CAPITAL LETTER ER
0x91 0x0421 # CYRILLIC CAPITAL LETTER ES
0x92 0x0422 # CYRILLIC CAPITAL LETTER TE
0x93 0x0423 # CYRILLIC CAPITAL LETTER U
0x94 0x0424 # CYRILLIC CAPITAL LETTER EF
0x95 0x0425 # CYRILLIC CAPITAL LETTER HA
0x96 0x0426 # CYRILLIC CAPITAL LETTER TSE
0x97 0x0427 # CYRILLIC CAPITAL LETTER CHE
0x98 0x0428 # CYRILLIC CAPITAL LETTER SHA
0x99 0x0429 # CYRILLIC CAPITAL LETTER SHCHA
0x9A 0x042A # CYRILLIC CAPITAL LETTER HARD SIGN
0x9B 0x042B # CYRILLIC CAPITAL LETTER YERU
0x9C 0x042C # CYRILLIC CAPITAL LETTER SOFT SIGN
0x9D 0x042D # CYRILLIC CAPITAL LETTER E
0x9E 0x042E # CYRILLIC CAPITAL LETTER YU
0x9F 0x042F # CYRILLIC CAPITAL LETTER YA
0xA0 0x2020 # DAGGER
0xA1 0x00B0 # DEGREE SIGN
0xA2 0x0490 # CYRILLIC CAPITAL LETTER GHE WITH UPTURN
0xA3 0x00A3 # POUND SIGN
0xA4 0x00A7 # SECTION SIGN
0xA5 0x2022 # BULLET
0xA6 0x00B6 # PILCROW SIGN
0xA7 0x0406 # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0xA8 0x00AE # REGISTERED SIGN
0xA9 0x00A9 # COPYRIGHT SIGN
0xAA 0x2122 # TRADE MARK SIGN
0xAB 0x0402 # CYRILLIC CAPITAL LETTER DJE
0xAC 0x0452 # CYRILLIC SMALL LETTER DJE
0xAD 0x2260 # NOT EQUAL TO
0xAE 0x0403 # CYRILLIC CAPITAL LETTER GJE
0xAF 0x0453 # CYRILLIC SMALL LETTER GJE
0xB0 0x221E # INFINITY
0xB1 0x00B1 # PLUS-MINUS SIGN
0xB2 0x2264 # LESS-THAN OR EQUAL TO
0xB3 0x2265 # GREATER-THAN OR EQUAL TO
0xB4 0x0456 # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0xB5 0x00B5 # MICRO SIGN
0xB6 0x0491 # CYRILLIC SMALL LETTER GHE WITH UPTURN
0xB7 0x0408 # CYRILLIC CAPITAL LETTER JE
0xB8 0x0404 # CYRILLIC CAPITAL LETTER UKRAINIAN IE
0xB9 0x0454 # CYRILLIC SMALL LETTER UKRAINIAN IE
0xBA 0x0407 # CYRILLIC CAPITAL LETTER YI
0xBB 0x0457 # CYRILLIC SMALL LETTER YI
0xBC 0x0409 # CYRILLIC CAPITAL LETTER LJE
0xBD 0x0459 # CYRILLIC SMALL LETTER LJE
0xBE 0x040A # CYRILLIC CAPITAL LETTER NJE
0xBF 0x045A # CYRILLIC SMALL LETTER NJE
0xC0 0x0458 # CYRILLIC SMALL LETTER JE
0xC1 0x0405 # CYRILLIC CAPITAL LETTER DZE
0xC2 0x00AC # NOT SIGN
0xC3 0x221A # SQUARE ROOT
0xC4 0x0192 # LATIN SMALL LETTER F WITH HOOK
0xC5 0x2248 # ALMOST EQUAL TO
0xC6 0x2206 # INCREMENT
0xC7 0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xC8 0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xC9 0x2026 # HORIZONTAL ELLIPSIS
0xCA 0x00A0 # NO-BREAK SPACE
0xCB 0x040B # CYRILLIC CAPITAL LETTER TSHE
0xCC 0x045B # CYRILLIC SMALL LETTER TSHE
0xCD 0x040C # CYRILLIC CAPITAL LETTER KJE
0xCE 0x045C # CYRILLIC SMALL LETTER KJE
0xCF 0x0455 # CYRILLIC SMALL LETTER DZE
0xD0 0x2013 # EN DASH
0xD1 0x2014 # EM DASH
0xD2 0x201C # LEFT DOUBLE QUOTATION MARK
0xD3 0x201D # RIGHT DOUBLE QUOTATION MARK
0xD4 0x2018 # LEFT SINGLE QUOTATION MARK
0xD5 0x2019 # RIGHT SINGLE QUOTATION MARK
0xD6 0x00F7 # DIVISION SIGN
0xD7 0x201E # DOUBLE LOW-9 QUOTATION MARK
0xD8 0x040E # CYRILLIC CAPITAL LETTER SHORT U
0xD9 0x045E # CYRILLIC SMALL LETTER SHORT U
0xDA 0x040F # CYRILLIC CAPITAL LETTER DZHE
0xDB 0x045F # CYRILLIC SMALL LETTER DZHE
0xDC 0x2116 # NUMERO SIGN
0xDD 0x0401 # CYRILLIC CAPITAL LETTER IO
0xDE 0x0451 # CYRILLIC SMALL LETTER IO
0xDF 0x044F # CYRILLIC SMALL LETTER YA
0xE0 0x0430 # CYRILLIC SMALL LETTER A
0xE1 0x0431 # CYRILLIC SMALL LETTER BE
0xE2 0x0432 # CYRILLIC SMALL LETTER VE
0xE3 0x0433 # CYRILLIC SMALL LETTER GHE
0xE4 0x0434 # CYRILLIC SMALL LETTER DE
0xE5 0x0435 # CYRILLIC SMALL LETTER IE
0xE6 0x0436 # CYRILLIC SMALL LETTER ZHE
0xE7 0x0437 # CYRILLIC SMALL LETTER ZE
0xE8 0x0438 # CYRILLIC SMALL LETTER I
0xE9 0x0439 # CYRILLIC SMALL LETTER SHORT I
0xEA 0x043A # CYRILLIC SMALL LETTER KA
0xEB 0x043B # CYRILLIC SMALL LETTER EL
0xEC 0x043C # CYRILLIC SMALL LETTER EM
0xED 0x043D # CYRILLIC SMALL LETTER EN
0xEE 0x043E # CYRILLIC SMALL LETTER O
0xEF 0x043F # CYRILLIC SMALL LETTER PE
0xF0 0x0440 # CYRILLIC SMALL LETTER ER
0xF1 0x0441 # CYRILLIC SMALL LETTER ES
0xF2 0x0442 # CYRILLIC SMALL LETTER TE
0xF3 0x0443 # CYRILLIC SMALL LETTER U
0xF4 0x0444 # CYRILLIC SMALL LETTER EF
0xF5 0x0445 # CYRILLIC SMALL LETTER HA
0xF6 0x0446 # CYRILLIC SMALL LETTER TSE
0xF7 0x0447 # CYRILLIC SMALL LETTER CHE
0xF8 0x0448 # CYRILLIC SMALL LETTER SHA
0xF9 0x0449 # CYRILLIC SMALL LETTER SHCHA
0xFA 0x044A # CYRILLIC SMALL LETTER HARD SIGN
0xFB 0x044B # CYRILLIC SMALL LETTER YERU
0xFC 0x044C # CYRILLIC SMALL LETTER SOFT SIGN
0xFD 0x044D # CYRILLIC SMALL LETTER E
0xFE 0x044E # CYRILLIC SMALL LETTER YU
0xFF 0x20AC # EURO SIGN
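The three-column format described in the CYRILLIC.TXT header can be read with a few lines of code. The following is a minimal Python sketch and not part of the Apple data: `parse_charmap` and `pre_mac_os_9_variant` are hypothetical helper names, the sample rows are copied from the table above, and fields are split on any whitespace (an assumption, so that tab-separated and space-separated copies both parse). The second helper rebuilds the two pre-Mac OS 9.0 variants from the three code points listed in the header notes.

```python
# Sample rows copied from CYRILLIC.TXT; a real caller would read the whole file.
SAMPLE = """\
# '#' begins a comment which continues to the end of the line.
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0x80\t0x0410\t# CYRILLIC CAPITAL LETTER A
0xA2\t0x0490\t# CYRILLIC CAPITAL LETTER GHE WITH UPTURN
0xB6\t0x0491\t# CYRILLIC SMALL LETTER GHE WITH UPTURN
0xFF\t0x20AC\t# EURO SIGN
"""


def parse_charmap(text):
    """Return {Mac OS byte value: Unicode character} for single-code rows."""
    table = {}
    for line in text.splitlines():
        # Drop the trailing comment (column 3) and surrounding whitespace.
        data = line.split("#", 1)[0].strip()
        fields = data.split()
        if len(fields) != 2:
            continue  # blank line, comment-only line, or unrelated residue
        mac_code, unicode_code = fields
        table[int(mac_code, 16)] = chr(int(unicode_code, 16))
    return table


def pre_mac_os_9_variant(table, ukrainian):
    """Derive one of the two pre-9.0 variants described in the header notes."""
    variant = dict(table)
    variant[0xFF] = "\u00A4"      # CURRENCY SIGN in both older variants
    if not ukrainian:
        variant[0xA2] = "\u00A2"  # CENT SIGN
        variant[0xB6] = "\u2202"  # PARTIAL DIFFERENTIAL
    return variant


euro = parse_charmap(SAMPLE)
russian = pre_mac_os_9_variant(euro, ukrainian=False)
print(hex(ord(euro[0xFF])), hex(ord(russian[0xFF])))
```

Keeping the variants as overrides on top of the current table mirrors how the header describes them: only 0xA2, 0xB6, and 0xFF differ.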
447
charmap/DEVANAGA.TXT
Normal file
@ -0,0 +1,447 @@
#=======================================================================
# File name: DEVANAGA.TXT
#
# Contents:  Map (external version) from Mac OS Devanagari
#            encoding to Unicode 2.1 and later.
#
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
#            reserved.
#
# Contact:   charsets@apple.com
#
# Changes:
#
#   c02   2005-Apr-05  Update header comments; add section on
#                      roundtrip considerations. Matches internal
#                      xml <c1.1> and Text Encoding Converter 2.0.
#   b3,c1 2002-Dec-19  Update URLs. Matches internal utom<b1>.
#   b02   1999-Sep-22  Update contact e-mail address. Matches
#                      internal utom<b1>, ufrm<b1>, and Text
#                      Encoding Converter version 1.5.
#   n04   1998-Feb-05  First version; matches internal utom<n9>,
#                      ufrm<n15>.
#
# Standard header:
# ----------------
#
# Apple, the Apple logo, and Macintosh are trademarks of Apple
# Computer, Inc., registered in the United States and other countries.
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
# throughout this document, "Macintosh" can be used to refer to
# Macintosh computers and "Unicode" can be used to refer to the
# Unicode standard.
#
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
# either express or implied, with respect to this document and the
# included data, its quality, accuracy, or fitness for a particular
# purpose. In no event will Apple be liable for direct, indirect,
# special, incidental, or consequential damages resulting from any
# defect or inaccuracy in this document or the included data.
#
# These mapping tables and character lists are subject to change.
# The latest tables should be available from the following:
#
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#
# For general information about Mac OS encodings and these mapping
# tables, see the file "README.TXT".
#
# Format:
# -------
#
# Three tab-separated columns;
# '#' begins a comment which continues to the end of the line.
#   Column #1 is the Mac OS Devanagari code or code sequence
#     (in hex as 0xNN or 0xNN+0xNN)
#   Column #2 is the corresponding Unicode or Unicode sequence
#     (in hex as 0xNNNN or 0xNNNN+0xNNNN).
#   Column #3 is a comment containing the Unicode name or sequence
#     of names. In some cases an additional comment follows the
#     Unicode name(s).
#
# The entries are in two sections. The first section is for pairs of
# Mac OS Devanagari code points that must be mapped in a special way.
# The second section maps individual code points.
#
# Within each section, the entries are in Mac OS Devanagari code order.
#
# Control character mappings are not shown in this table, following
# the conventions of the standard UTC mapping tables. However, the
# Mac OS Devanagari character set uses the standard control characters
# at 0x00-0x1F and 0x7F.
#
# Notes on Mac OS Devanagari:
# ---------------------------
#
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
# environments, it is only supported via transcoding to and from
# Unicode.
#
# Mac OS Devanagari is based on IS 13194:1991 (ISCII-91), with the
# addition of several punctuation and symbol characters. However,
# Mac OS Devanagari does not support the ATR (attribute) mechanism of
# ISCII-91.
#
# 1. ISCII-91 features in Mac OS Devanagari include:
#
# a) Overloading of nukta
#
# In addition to using the nukta (0xE9) like a combining dot below,
# nukta is overloaded to function as a general character modifier.
# In this role, certain code points followed by 0xE9 are treated as
# a two-byte code point representing a character which may be
# rather different than the characters represented by either of
# the code points alone. For example, the character DEVANAGARI OM
# (U+0950) is represented in ISCII-91 as candrabindu + nukta.
#
# b) Explicit halant and soft halant
#
# A double halant (0xE8 + 0xE8) constitutes an "explicit halant",
# which will always appear as a halant instead of causing formation
# of a ligature or half-form consonant.
#
# Halant followed by nukta (0xE8 + 0xE9) constitutes a "soft
# halant", which prevents formation of a ligature and instead
# retains the half-form of the first consonant.
#
# c) Invisible consonant
#
# The byte 0xD9 (called INV in ISCII-91) is an invisible consonant:
# It behaves like a consonant but has no visible appearance. It is
# intended to be used (often in combination with halant) to display
# dependent forms in isolation, such as the RA forms or consonant
# half-forms.
#
# d) Extensions for Vedic, etc.
#
# The byte 0xF0 (called EXT in ISCII-91) followed by any byte in
# the range 0xA1-0xEE constitutes a two-byte code point which can
# be used to represent additional characters for Vedic (or other
# extensions); 0xF0 followed by any other byte value constitutes
# malformed text. Mac OS Devanagari supports this mechanism, but
# does not currently map any of these two-byte code points to
# anything.
#
# 2. Mac OS Devanagari additions
#
# Mac OS Devanagari adds characters using the code points
# 0x80-0x8A and 0x90-0x91 (the latter are some Devanagari additions
# from Unicode).
#
# 3. Unused code points
#
# The following code points are currently unused, and are not shown
# here: 0x8B-0x8F, 0x92-0xA0, 0xEB-0xEF, 0xFB-0xFF. In addition,
# 0xF0 is not shown here, but it has a special function as described
# above.
#
# Unicode mapping issues and notes:
# ---------------------------------
#
# 1. Mapping the byte pairs
#
# If one of the following byte values is encountered when mapping
# Mac OS Devanagari text - 0xA1, 0xA6, 0xA7, 0xAA, 0xDB, 0xDC, 0xDF,
# 0xE8, or 0xEA - then the next byte (if there is one) should be
# examined. If the next byte is 0xE9 - or also 0xE8, if the first
# byte was 0xE8 - then the byte pair should be mapped using the
# first section of the mapping table below. Otherwise, each byte
# should be mapped using the second section of the mapping table
# below.
#
# - The Unicode Standard, Version 2.0, specifies how explicit
#   halant and soft halant should be represented in Unicode;
#   these mappings are used below.
#
# If the byte value 0xF0 is encountered when mapping Mac OS
# Devanagari text, then the next byte should be examined. If there
# is no next byte (e.g. 0xF0 at end of buffer), the mapping
# process should indicate incomplete character. If there is a next
# byte but it is not in the range 0xA1-0xEE, the mapping process
# should indicate malformed text. Otherwise, the mapping process
# should treat the byte pair as a valid two-byte code point with no
# mapping (e.g. map it to QUESTION MARK, REPLACEMENT CHARACTER,
# etc.).
#
# 2. Mapping the invisible consonant
#
# It has been suggested that INV in ISCII-91 should map to ZERO
# WIDTH NON-JOINER in Unicode. However, this causes problems with
# roundtrip fidelity: The ISCII-91 sequences 0xE8+0xE8 and 0xE8+0xD9
# would map to the same sequence of Unicode characters. We have
# instead mapped INV to LEFT-TO-RIGHT MARK, which avoids these
# problems.
#
# 3. Additional loose mappings from Unicode
#
# These are not preserved in roundtrip mappings.
#
#   U+0958   0xB3+0xE9   # DEVANAGARI LETTER QA
#   U+0959   0xB4+0xE9   # DEVANAGARI LETTER KHHA
#   U+095A   0xB5+0xE9   # DEVANAGARI LETTER GHHA
#   U+095B   0xBA+0xE9   # DEVANAGARI LETTER ZA
#   U+095C   0xBF+0xE9   # DEVANAGARI LETTER DDDHA
#   U+095D   0xC0+0xE9   # DEVANAGARI LETTER RHA
#   U+095E   0xC9+0xE9   # DEVANAGARI LETTER FA
#
# 4. Roundtrip considerations when mapping to decomposed Unicode
#
# Both ISCII-91 (hence Mac OS Devanagari) and Unicode provide multiple
# ways of representing certain Devanagari consonants. For example,
# DEVANAGARI LETTER NNNA can be represented in Unicode as the single
# character 0x0929 or as the sequence 0x0928 0x093C; similarly, this
# consonant can be represented in Mac OS Devanagari as 0xC7 or as the
# sequence 0xC6 0xE9. This leads to some roundtrip problems. First
# note that we have the following mappings without such problems:
#
#   ISCII/   standard                  decomposition of   reverse mapping
#   Mac OS   Unicode mapping           standard mapping   of decomposition
#   ------   -----------------------   ----------------   ----------------
#   0xC6     0x0928 ... LETTER NA      0x0928 (same)      0xC6
#   0xCD     0x092F ... LETTER YA      0x092F (same)      0xCD
#   0xCF     0x0930 ... LETTER RA      0x0930 (same)      0xCF
#   0xD2     0x0933 ... LETTER LLA     0x0933 (same)      0xD2
#   0xE9     0x093C ... SIGN NUKTA     0x093C (same)      0xE9
#
# However, the mappings above cause roundtrip problems for the
|
||||
# following mappings if they are decomposed:
|
||||
#
|
||||
# ISCII/ standard decomposition of reverse mapping
|
||||
# Mac OS Unicode mapping standard mapping of decomposition
|
||||
# ------ ----------------------- ---------------- ----------------
|
||||
# 0xC7 0x0929 ... LETTER NNNA 0x0928 0x093C 0xC6 0xE9
|
||||
# 0xCE 0x095F ... LETTER YYA 0x092F 0x093C 0xCD 0xE9
|
||||
# 0xD0 0x0931 ... LETTER RRA 0x0930 0x093C 0xCF 0xE9
|
||||
# 0xD3 0x0934 ... LETTER LLLA 0x0933 0x093C 0xD2 0xE9
|
||||
#
|
||||
# One solution is to use a grouping transcoding hint with the four
|
||||
# decompositions above to mark the decomposed sequence for special
|
||||
# treatment in transcoding. This yields the following mappings to
|
||||
# decomposed Unicode:
|
||||
#
|
||||
# ISCII/ decomposed
|
||||
# Mac OS Unicode mapping
|
||||
# ------ ----------------
|
||||
# 0xC7 0xF860 0x0928 0x093C
|
||||
# 0xCE 0xF860 0x092F 0x093C
|
||||
# 0xD0 0xF860 0x0930 0x093C
|
||||
# 0xD3 0xF860 0x0933 0x093C
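The effect of the grouping hint on roundtrip fidelity can be sketched as below. The table entries are taken from the three tables in this note; the function names and the longest-match reverse lookup are assumptions made for illustration, not a description of how any actual converter is implemented.

```python
HINT = "\uF860"  # grouping transcoding hint shown in the table above

# Mac OS Devanagari byte -> decomposed Unicode (subset from this note).
TO_DECOMPOSED = {
    0xC6: "\u0928",
    0xCD: "\u092F",
    0xCF: "\u0930",
    0xD2: "\u0933",
    0xE9: "\u093C",
    0xC7: HINT + "\u0928\u093C",
    0xCE: HINT + "\u092F\u093C",
    0xD0: HINT + "\u0930\u093C",
    0xD3: HINT + "\u0933\u093C",
}

# Reverse table, longest Unicode sequences first so hinted groups win.
REVERSE = sorted(
    ((uni, byte) for byte, uni in TO_DECOMPOSED.items()),
    key=lambda pair: -len(pair[0]),
)


def decode(data: bytes) -> str:
    return "".join(TO_DECOMPOSED[b] for b in data)


def encode(text: str) -> bytes:
    out = bytearray()
    i = 0
    while i < len(text):
        for uni, byte in REVERSE:
            if text.startswith(uni, i):
                out.append(byte)
                i += len(uni)
                break
        else:
            raise ValueError(f"no mapping in this sketch at index {i}")
    return bytes(out)
```

With the hint in place, `0xC7` and the sequence `0xC6 0xE9` decode to different Unicode strings, so each one encodes back to its original bytes.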
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
##################
|
||||
|
||||
# Section 1: Map the following byte pairs as indicated:
|
||||
# (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER)
|
||||
# (Also see note about 0xF0 in comments above)
|
||||
|
||||
0xA1+0xE9 0x0950 # DEVANAGARI OM
|
||||
0xA6+0xE9 0x090C # DEVANAGARI LETTER VOCALIC L
|
||||
0xA7+0xE9 0x0961 # DEVANAGARI LETTER VOCALIC LL
|
||||
0xAA+0xE9 0x0960 # DEVANAGARI LETTER VOCALIC RR
|
||||
0xDB+0xE9 0x0962 # DEVANAGARI VOWEL SIGN VOCALIC L
|
||||
0xDC+0xE9 0x0963 # DEVANAGARI VOWEL SIGN VOCALIC LL
|
||||
0xDF+0xE9 0x0944 # DEVANAGARI VOWEL SIGN VOCALIC RR
|
||||
0xE8+0xE8 0x094D+0x200C # DEVANAGARI SIGN VIRAMA + ZWNJ # explicit halant
|
||||
0xE8+0xE9 0x094D+0x200D # DEVANAGARI SIGN VIRAMA + ZWJ # soft halant
|
||||
0xEA+0xE9 0x093D # DEVANAGARI SIGN AVAGRAHA
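One way to apply Section 1 before Section 2 is to try a two-byte match first and fall back to the single-byte mapping. The sketch below uses only a handful of entries from this file (four pairs from Section 1 and five single bytes from Section 2); the function name `decode` is hypothetical, and 0xF0 handling is left out.

```python
# Subset of Section 1: byte pairs that must be matched first.
PAIRS = {
    b"\xA1\xE9": "\u0950",        # DEVANAGARI OM
    b"\xE8\xE8": "\u094D\u200C",  # VIRAMA + ZWNJ (explicit halant)
    b"\xE8\xE9": "\u094D\u200D",  # VIRAMA + ZWJ (soft halant)
    b"\xEA\xE9": "\u093D",        # AVAGRAHA
}

# Subset of Section 2: single-byte mappings.
SINGLES = {
    0xA1: "\u0901",  # CANDRABINDU
    0xB3: "\u0915",  # KA
    0xE8: "\u094D",  # VIRAMA
    0xE9: "\u093C",  # NUKTA
    0xEA: "\u0964",  # DANDA
}


def decode(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if pair in PAIRS:
            out.append(PAIRS[pair])
            i += 2
        else:
            out.append(SINGLES[data[i]])
            i += 1
    return "".join(out)
```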
|
||||
|
||||
# Section 2: Map the remaining bytes as follows:
|
||||
|
||||
0x20 0x0020 # SPACE
|
||||
0x21 0x0021 # EXCLAMATION MARK
|
||||
0x22 0x0022 # QUOTATION MARK
|
||||
0x23 0x0023 # NUMBER SIGN
|
||||
0x24 0x0024 # DOLLAR SIGN
|
||||
0x25 0x0025 # PERCENT SIGN
|
||||
0x26 0x0026 # AMPERSAND
|
||||
0x27 0x0027 # APOSTROPHE
|
||||
0x28 0x0028 # LEFT PARENTHESIS
|
||||
0x29 0x0029 # RIGHT PARENTHESIS
|
||||
0x2A 0x002A # ASTERISK
|
||||
0x2B 0x002B # PLUS SIGN
|
||||
0x2C 0x002C # COMMA
|
||||
0x2D 0x002D # HYPHEN-MINUS
|
||||
0x2E 0x002E # FULL STOP
|
||||
0x2F 0x002F # SOLIDUS
|
||||
0x30 0x0030 # DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO
|
||||
0x33 0x0033 # DIGIT THREE
|
||||
0x34 0x0034 # DIGIT FOUR
|
||||
0x35 0x0035 # DIGIT FIVE
|
||||
0x36 0x0036 # DIGIT SIX
|
||||
0x37 0x0037 # DIGIT SEVEN
|
||||
0x38 0x0038 # DIGIT EIGHT
|
||||
0x39 0x0039 # DIGIT NINE
|
||||
0x3A 0x003A # COLON
|
||||
0x3B 0x003B # SEMICOLON
|
||||
0x3C 0x003C # LESS-THAN SIGN
|
||||
0x3D 0x003D # EQUALS SIGN
|
||||
0x3E 0x003E # GREATER-THAN SIGN
|
||||
0x3F 0x003F # QUESTION MARK
|
||||
0x40 0x0040 # COMMERCIAL AT
|
||||
0x41 0x0041 # LATIN CAPITAL LETTER A
|
||||
0x42 0x0042 # LATIN CAPITAL LETTER B
|
||||
0x43 0x0043 # LATIN CAPITAL LETTER C
|
||||
0x44 0x0044 # LATIN CAPITAL LETTER D
|
||||
0x45 0x0045 # LATIN CAPITAL LETTER E
|
||||
0x46 0x0046 # LATIN CAPITAL LETTER F
|
||||
0x47 0x0047 # LATIN CAPITAL LETTER G
|
||||
0x48 0x0048 # LATIN CAPITAL LETTER H
|
||||
0x49 0x0049 # LATIN CAPITAL LETTER I
|
||||
0x4A 0x004A # LATIN CAPITAL LETTER J
|
||||
0x4B 0x004B # LATIN CAPITAL LETTER K
|
||||
0x4C 0x004C # LATIN CAPITAL LETTER L
|
||||
0x4D 0x004D # LATIN CAPITAL LETTER M
|
||||
0x4E 0x004E # LATIN CAPITAL LETTER N
|
||||
0x4F 0x004F # LATIN CAPITAL LETTER O
|
||||
0x50 0x0050 # LATIN CAPITAL LETTER P
|
||||
0x51 0x0051 # LATIN CAPITAL LETTER Q
|
||||
0x52 0x0052 # LATIN CAPITAL LETTER R
|
||||
0x53 0x0053 # LATIN CAPITAL LETTER S
|
||||
0x54 0x0054 # LATIN CAPITAL LETTER T
|
||||
0x55 0x0055 # LATIN CAPITAL LETTER U
|
||||
0x56 0x0056 # LATIN CAPITAL LETTER V
|
||||
0x57 0x0057 # LATIN CAPITAL LETTER W
|
||||
0x58 0x0058 # LATIN CAPITAL LETTER X
|
||||
0x59 0x0059 # LATIN CAPITAL LETTER Y
|
||||
0x5A 0x005A # LATIN CAPITAL LETTER Z
|
||||
0x5B 0x005B # LEFT SQUARE BRACKET
|
||||
0x5C 0x005C # REVERSE SOLIDUS
|
||||
0x5D 0x005D # RIGHT SQUARE BRACKET
|
||||
0x5E 0x005E # CIRCUMFLEX ACCENT
|
||||
0x5F 0x005F # LOW LINE
|
||||
0x60 0x0060 # GRAVE ACCENT
|
||||
0x61 0x0061 # LATIN SMALL LETTER A
|
||||
0x62 0x0062 # LATIN SMALL LETTER B
|
||||
0x63 0x0063 # LATIN SMALL LETTER C
|
||||
0x64 0x0064 # LATIN SMALL LETTER D
|
||||
0x65 0x0065 # LATIN SMALL LETTER E
|
||||
0x66 0x0066 # LATIN SMALL LETTER F
|
||||
0x67 0x0067 # LATIN SMALL LETTER G
|
||||
0x68 0x0068 # LATIN SMALL LETTER H
|
||||
0x69 0x0069 # LATIN SMALL LETTER I
|
||||
0x6A 0x006A # LATIN SMALL LETTER J
|
||||
0x6B 0x006B # LATIN SMALL LETTER K
|
||||
0x6C 0x006C # LATIN SMALL LETTER L
|
||||
0x6D 0x006D # LATIN SMALL LETTER M
|
||||
0x6E 0x006E # LATIN SMALL LETTER N
|
||||
0x6F 0x006F # LATIN SMALL LETTER O
|
||||
0x70 0x0070 # LATIN SMALL LETTER P
|
||||
0x71 0x0071 # LATIN SMALL LETTER Q
|
||||
0x72 0x0072 # LATIN SMALL LETTER R
|
||||
0x73 0x0073 # LATIN SMALL LETTER S
|
||||
0x74 0x0074 # LATIN SMALL LETTER T
|
||||
0x75 0x0075 # LATIN SMALL LETTER U
|
||||
0x76 0x0076 # LATIN SMALL LETTER V
|
||||
0x77 0x0077 # LATIN SMALL LETTER W
|
||||
0x78 0x0078 # LATIN SMALL LETTER X
|
||||
0x79 0x0079 # LATIN SMALL LETTER Y
|
||||
0x7A 0x007A # LATIN SMALL LETTER Z
|
||||
0x7B 0x007B # LEFT CURLY BRACKET
|
||||
0x7C 0x007C # VERTICAL LINE
|
||||
0x7D 0x007D # RIGHT CURLY BRACKET
|
||||
0x7E 0x007E # TILDE
|
||||
#
|
||||
0x80 0x00D7 # MULTIPLICATION SIGN
|
||||
0x81 0x2212 # MINUS SIGN
|
||||
0x82 0x2013 # EN DASH
|
||||
0x83 0x2014 # EM DASH
|
||||
0x84 0x2018 # LEFT SINGLE QUOTATION MARK
|
||||
0x85 0x2019 # RIGHT SINGLE QUOTATION MARK
|
||||
0x86 0x2026 # HORIZONTAL ELLIPSIS
|
||||
0x87 0x2022 # BULLET
|
||||
0x88 0x00A9 # COPYRIGHT SIGN
|
||||
0x89 0x00AE # REGISTERED SIGN
|
||||
0x8A 0x2122 # TRADE MARK SIGN
|
||||
#
|
||||
0x90 0x0965 # DEVANAGARI DOUBLE DANDA
|
||||
0x91 0x0970 # DEVANAGARI ABBREVIATION SIGN
|
||||
#
|
||||
0xA1 0x0901 # DEVANAGARI SIGN CANDRABINDU
|
||||
0xA2 0x0902 # DEVANAGARI SIGN ANUSVARA
|
||||
0xA3 0x0903 # DEVANAGARI SIGN VISARGA
|
||||
0xA4 0x0905 # DEVANAGARI LETTER A
|
||||
0xA5 0x0906 # DEVANAGARI LETTER AA
|
||||
0xA6 0x0907 # DEVANAGARI LETTER I
|
||||
0xA7 0x0908 # DEVANAGARI LETTER II
|
||||
0xA8 0x0909 # DEVANAGARI LETTER U
|
||||
0xA9 0x090A # DEVANAGARI LETTER UU
|
||||
0xAA 0x090B # DEVANAGARI LETTER VOCALIC R
|
||||
0xAB 0x090E # DEVANAGARI LETTER SHORT E
|
||||
0xAC 0x090F # DEVANAGARI LETTER E
|
||||
0xAD 0x0910 # DEVANAGARI LETTER AI
|
||||
0xAE 0x090D # DEVANAGARI LETTER CANDRA E
|
||||
0xAF 0x0912 # DEVANAGARI LETTER SHORT O
|
||||
0xB0 0x0913 # DEVANAGARI LETTER O
|
||||
0xB1 0x0914 # DEVANAGARI LETTER AU
|
||||
0xB2 0x0911 # DEVANAGARI LETTER CANDRA O
|
||||
0xB3 0x0915 # DEVANAGARI LETTER KA
|
||||
0xB4 0x0916 # DEVANAGARI LETTER KHA
|
||||
0xB5 0x0917 # DEVANAGARI LETTER GA
|
||||
0xB6 0x0918 # DEVANAGARI LETTER GHA
|
||||
0xB7 0x0919 # DEVANAGARI LETTER NGA
|
||||
0xB8 0x091A # DEVANAGARI LETTER CA
|
||||
0xB9 0x091B # DEVANAGARI LETTER CHA
|
||||
0xBA 0x091C # DEVANAGARI LETTER JA
|
||||
0xBB 0x091D # DEVANAGARI LETTER JHA
|
||||
0xBC 0x091E # DEVANAGARI LETTER NYA
|
||||
0xBD 0x091F # DEVANAGARI LETTER TTA
|
||||
0xBE 0x0920 # DEVANAGARI LETTER TTHA
|
||||
0xBF 0x0921 # DEVANAGARI LETTER DDA
|
||||
0xC0 0x0922 # DEVANAGARI LETTER DDHA
|
||||
0xC1 0x0923 # DEVANAGARI LETTER NNA
|
||||
0xC2 0x0924 # DEVANAGARI LETTER TA
|
||||
0xC3 0x0925 # DEVANAGARI LETTER THA
|
||||
0xC4 0x0926 # DEVANAGARI LETTER DA
|
||||
0xC5 0x0927 # DEVANAGARI LETTER DHA
|
||||
0xC6 0x0928 # DEVANAGARI LETTER NA
|
||||
0xC7 0x0929 # DEVANAGARI LETTER NNNA
|
||||
0xC8 0x092A # DEVANAGARI LETTER PA
|
||||
0xC9 0x092B # DEVANAGARI LETTER PHA
|
||||
0xCA 0x092C # DEVANAGARI LETTER BA
|
||||
0xCB 0x092D # DEVANAGARI LETTER BHA
|
||||
0xCC 0x092E # DEVANAGARI LETTER MA
|
||||
0xCD 0x092F # DEVANAGARI LETTER YA
|
||||
0xCE 0x095F # DEVANAGARI LETTER YYA
|
||||
0xCF 0x0930 # DEVANAGARI LETTER RA
|
||||
0xD0 0x0931 # DEVANAGARI LETTER RRA
|
||||
0xD1 0x0932 # DEVANAGARI LETTER LA
|
||||
0xD2 0x0933 # DEVANAGARI LETTER LLA
|
||||
0xD3 0x0934 # DEVANAGARI LETTER LLLA
|
||||
0xD4 0x0935 # DEVANAGARI LETTER VA
|
||||
0xD5 0x0936 # DEVANAGARI LETTER SHA
|
||||
0xD6 0x0937 # DEVANAGARI LETTER SSA
|
||||
0xD7 0x0938 # DEVANAGARI LETTER SA
|
||||
0xD8 0x0939 # DEVANAGARI LETTER HA
|
||||
0xD9 0x200E # LEFT-TO-RIGHT MARK # invisible consonant
|
||||
0xDA 0x093E # DEVANAGARI VOWEL SIGN AA
|
||||
0xDB 0x093F # DEVANAGARI VOWEL SIGN I
|
||||
0xDC 0x0940 # DEVANAGARI VOWEL SIGN II
|
||||
0xDD 0x0941 # DEVANAGARI VOWEL SIGN U
|
||||
0xDE 0x0942 # DEVANAGARI VOWEL SIGN UU
|
||||
0xDF 0x0943 # DEVANAGARI VOWEL SIGN VOCALIC R
|
||||
0xE0 0x0946 # DEVANAGARI VOWEL SIGN SHORT E
|
||||
0xE1 0x0947 # DEVANAGARI VOWEL SIGN E
|
||||
0xE2 0x0948 # DEVANAGARI VOWEL SIGN AI
|
||||
0xE3 0x0945 # DEVANAGARI VOWEL SIGN CANDRA E
|
||||
0xE4 0x094A # DEVANAGARI VOWEL SIGN SHORT O
|
||||
0xE5 0x094B # DEVANAGARI VOWEL SIGN O
|
||||
0xE6 0x094C # DEVANAGARI VOWEL SIGN AU
|
||||
0xE7 0x0949 # DEVANAGARI VOWEL SIGN CANDRA O
|
||||
0xE8 0x094D # DEVANAGARI SIGN VIRAMA # halant
|
||||
0xE9 0x093C # DEVANAGARI SIGN NUKTA
|
||||
0xEA 0x0964 # DEVANAGARI DANDA
|
||||
#
|
||||
0xF1 0x0966 # DEVANAGARI DIGIT ZERO
|
||||
0xF2 0x0967 # DEVANAGARI DIGIT ONE
|
||||
0xF3 0x0968 # DEVANAGARI DIGIT TWO
|
||||
0xF4 0x0969 # DEVANAGARI DIGIT THREE
|
||||
0xF5 0x096A # DEVANAGARI DIGIT FOUR
|
||||
0xF6 0x096B # DEVANAGARI DIGIT FIVE
|
||||
0xF7 0x096C # DEVANAGARI DIGIT SIX
|
||||
0xF8 0x096D # DEVANAGARI DIGIT SEVEN
|
||||
0xF9 0x096E # DEVANAGARI DIGIT EIGHT
|
||||
0xFA 0x096F # DEVANAGARI DIGIT NINE
|
329
charmap/DINGBATS.TXT
Normal file
@ -0,0 +1,329 @@
|
||||
#=======================================================================
|
||||
# File name: DINGBATS.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Dingbats
|
||||
# character set to Unicode 3.2 and later.
|
||||
#
|
||||
# Copyright: (c) 1994-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-05 Update header comments. Matches internal xml
|
||||
# <c1.1> and Text Encoding Converter 2.0.
|
||||
# b3,c1 2002-Dec-19 Update mappings for 0x80-0x8D to use new
|
||||
# Unicode 3.2 characters. Update URLs, notes.
|
||||
# Matches internal utom<b2>.
|
||||
# b02 1999-Sep-22 Update contact e-mail address. Matches
|
||||
# internal utom<b1>, ufrm<b1>, and Text
|
||||
# Encoding Converter version 1.5.
|
||||
# n05 1998-Feb-05 Update to match internal utom<n4>, ufrm<n14>,
|
||||
# and Text Encoding Converter version 1.3:
|
||||
# Change all mappings to single corporate-zone
|
||||
# Unicodes to either use standard Unicodes
|
||||
# or standard Unicodes plus transcoding hints;
|
||||
# see details below. Also update header
|
||||
# comments to new format.
|
||||
# n03 1995-Apr-15 First version (after fixing some typos).
|
||||
# Matches internal ufrm<n4>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Dingbats code (in hex as 0xNN)
|
||||
# Column #2 is the corresponding Unicode or Unicode sequence
|
||||
# (in hex as 0xNNNN).
|
||||
# Column #3 is a comment containing the Unicode name.
|
||||
# In some cases an additional comment follows the Unicode name.
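A reader for this three-column format might look like the following sketch. The function name `parse_line` is hypothetical. It splits on any whitespace rather than strictly on tabs, which is a deliberate loosening in case the tabs have been converted to spaces.

```python
def parse_line(line: str):
    """Parse one table line.

    Returns (mac_code, unicode_string, unicode_name), or None for
    comment-only and blank lines.
    """
    body, _, comment = line.partition("#")
    fields = body.split()
    if len(fields) < 2:
        return None
    mac_code = int(fields[0], 16)  # int() accepts the "0x" prefix with base 16
    # Column 2 may be a single code point or a '+'-joined sequence.
    unicode_string = "".join(chr(int(part, 16)) for part in fields[1].split("+"))
    # Drop any additional comment that follows the Unicode name.
    unicode_name = comment.split("#")[0].strip()
    return mac_code, unicode_string, unicode_name
```

For example, `parse_line("0x21\t0x2701\t# UPPER BLADE SCISSORS")` yields the Mac OS code `0x21`, the one-character string U+2701, and the name.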
|
||||
#
|
||||
# The entries are in Mac OS Dingbats code order.
|
||||
#
|
||||
# Some of these mappings require the use of corporate characters.
|
||||
# See the file "CORPCHAR.TXT" and notes below.
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Dingbats character set uses the standard control characters
|
||||
# at 0x00-0x1F and 0x7F.
|
||||
#
|
||||
# Notes on Mac OS Dingbats:
|
||||
# -------------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported directly in programming
|
||||
# interfaces for QuickDraw Text, the Script Manager, and related
|
||||
# Text Utilities. For other purposes it is supported via transcoding
|
||||
# to and from Unicode.
|
||||
#
|
||||
# The Mac OS Dingbats encoding shares the script code smRoman
|
||||
# (0) with the standard Mac OS Roman encoding. To determine if
|
||||
# the Dingbats encoding is being used, you must check if the
|
||||
# font name is "Zapf Dingbats".
|
||||
#
|
||||
# The layout of the Dingbats character set is identical to or
|
||||
# a superset of the layout of the Adobe Zapf Dingbats encoding
|
||||
# vector.
|
||||
#
|
||||
# The following code points are unused, and are not shown here:
|
||||
# 0x8E-0xA0, 0xF0, 0xFF.
|
||||
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
# Changes from version b02 to version b03/c01:
|
||||
#
|
||||
# - The mappings for the following Mac OS Dingbats characters
|
||||
# were changed to use standard Unicode characters added for
|
||||
# Unicode 3.2: 0x80-0x8D.
|
||||
#
|
||||
# Changes from version n03 to version n05:
|
||||
#
|
||||
# - The mappings for the following Mac OS Dingbats characters
|
||||
# were changed from single corporate-zone Unicode characters
|
||||
# to standard Unicode characters:
|
||||
# 0x80-0x81, 0x84-0x87, 0x8A-0x8D.
|
||||
#
|
||||
# - The mappings for the following Mac OS Dingbats characters
|
||||
# were changed from single corporate-zone Unicode characters
|
||||
# to combinations of a standard Unicode and a transcoding hint:
|
||||
# 0x82-0x83, 0x88-0x89.
|
||||
#
|
||||
##################
|
||||
|
||||
0x20 0x0020 # SPACE
|
||||
0x21 0x2701 # UPPER BLADE SCISSORS
|
||||
0x22 0x2702 # BLACK SCISSORS
|
||||
0x23 0x2703 # LOWER BLADE SCISSORS
|
||||
0x24 0x2704 # WHITE SCISSORS
|
||||
0x25 0x260E # BLACK TELEPHONE
|
||||
0x26 0x2706 # TELEPHONE LOCATION SIGN
|
||||
0x27 0x2707 # TAPE DRIVE
|
||||
0x28 0x2708 # AIRPLANE
|
||||
0x29 0x2709 # ENVELOPE
|
||||
0x2A 0x261B # BLACK RIGHT POINTING INDEX
|
||||
0x2B 0x261E # WHITE RIGHT POINTING INDEX
|
||||
0x2C 0x270C # VICTORY HAND
|
||||
0x2D 0x270D # WRITING HAND
|
||||
0x2E 0x270E # LOWER RIGHT PENCIL
|
||||
0x2F 0x270F # PENCIL
|
||||
0x30 0x2710 # UPPER RIGHT PENCIL
|
||||
0x31 0x2711 # WHITE NIB
|
||||
0x32 0x2712 # BLACK NIB
|
||||
0x33 0x2713 # CHECK MARK
|
||||
0x34 0x2714 # HEAVY CHECK MARK
|
||||
0x35 0x2715 # MULTIPLICATION X
|
||||
0x36 0x2716 # HEAVY MULTIPLICATION X
|
||||
0x37 0x2717 # BALLOT X
|
||||
0x38 0x2718 # HEAVY BALLOT X
|
||||
0x39 0x2719 # OUTLINED GREEK CROSS
|
||||
0x3A 0x271A # HEAVY GREEK CROSS
|
||||
0x3B 0x271B # OPEN CENTRE CROSS
|
||||
0x3C 0x271C # HEAVY OPEN CENTRE CROSS
|
||||
0x3D 0x271D # LATIN CROSS
|
||||
0x3E 0x271E # SHADOWED WHITE LATIN CROSS
|
||||
0x3F 0x271F # OUTLINED LATIN CROSS
|
||||
0x40 0x2720 # MALTESE CROSS
|
||||
0x41 0x2721 # STAR OF DAVID
|
||||
0x42 0x2722 # FOUR TEARDROP-SPOKED ASTERISK
|
||||
0x43 0x2723 # FOUR BALLOON-SPOKED ASTERISK
|
||||
0x44 0x2724 # HEAVY FOUR BALLOON-SPOKED ASTERISK
|
||||
0x45 0x2725 # FOUR CLUB-SPOKED ASTERISK
|
||||
0x46 0x2726 # BLACK FOUR POINTED STAR
|
||||
0x47 0x2727 # WHITE FOUR POINTED STAR
|
||||
0x48 0x2605 # BLACK STAR
|
||||
0x49 0x2729 # STRESS OUTLINED WHITE STAR
|
||||
0x4A 0x272A # CIRCLED WHITE STAR
|
||||
0x4B 0x272B # OPEN CENTRE BLACK STAR
|
||||
0x4C 0x272C # BLACK CENTRE WHITE STAR
|
||||
0x4D 0x272D # OUTLINED BLACK STAR
|
||||
0x4E 0x272E # HEAVY OUTLINED BLACK STAR
|
||||
0x4F 0x272F # PINWHEEL STAR
|
||||
0x50 0x2730 # SHADOWED WHITE STAR
|
||||
0x51 0x2731 # HEAVY ASTERISK
|
||||
0x52 0x2732 # OPEN CENTRE ASTERISK
|
||||
0x53 0x2733 # EIGHT SPOKED ASTERISK
|
||||
0x54 0x2734 # EIGHT POINTED BLACK STAR
|
||||
0x55 0x2735 # EIGHT POINTED PINWHEEL STAR
|
||||
0x56 0x2736 # SIX POINTED BLACK STAR
|
||||
0x57 0x2737 # EIGHT POINTED RECTILINEAR BLACK STAR
|
||||
0x58 0x2738 # HEAVY EIGHT POINTED RECTILINEAR BLACK STAR
|
||||
0x59 0x2739 # TWELVE POINTED BLACK STAR
|
||||
0x5A 0x273A # SIXTEEN POINTED ASTERISK
|
||||
0x5B 0x273B # TEARDROP-SPOKED ASTERISK
|
||||
0x5C 0x273C # OPEN CENTRE TEARDROP-SPOKED ASTERISK
|
||||
0x5D 0x273D # HEAVY TEARDROP-SPOKED ASTERISK
|
||||
0x5E 0x273E # SIX PETALLED BLACK AND WHITE FLORETTE
|
||||
0x5F 0x273F # BLACK FLORETTE
|
||||
0x60 0x2740 # WHITE FLORETTE
|
||||
0x61 0x2741 # EIGHT PETALLED OUTLINED BLACK FLORETTE
|
||||
0x62 0x2742 # CIRCLED OPEN CENTRE EIGHT POINTED STAR
|
||||
0x63 0x2743 # HEAVY TEARDROP-SPOKED PINWHEEL ASTERISK
|
||||
0x64 0x2744 # SNOWFLAKE
|
||||
0x65 0x2745 # TIGHT TRIFOLIATE SNOWFLAKE
|
||||
0x66 0x2746 # HEAVY CHEVRON SNOWFLAKE
|
||||
0x67 0x2747 # SPARKLE
|
||||
0x68 0x2748 # HEAVY SPARKLE
|
||||
0x69 0x2749 # BALLOON-SPOKED ASTERISK
|
||||
0x6A 0x274A # EIGHT TEARDROP-SPOKED PROPELLER ASTERISK
|
||||
0x6B 0x274B # HEAVY EIGHT TEARDROP-SPOKED PROPELLER ASTERISK
|
||||
0x6C 0x25CF # BLACK CIRCLE
|
||||
0x6D 0x274D # SHADOWED WHITE CIRCLE
|
||||
0x6E 0x25A0 # BLACK SQUARE
|
||||
0x6F 0x274F # LOWER RIGHT DROP-SHADOWED WHITE SQUARE
|
||||
0x70 0x2750 # UPPER RIGHT DROP-SHADOWED WHITE SQUARE
|
||||
0x71 0x2751 # LOWER RIGHT SHADOWED WHITE SQUARE
|
||||
0x72 0x2752 # UPPER RIGHT SHADOWED WHITE SQUARE
|
||||
0x73 0x25B2 # BLACK UP-POINTING TRIANGLE
|
||||
0x74 0x25BC # BLACK DOWN-POINTING TRIANGLE
|
||||
0x75 0x25C6 # BLACK DIAMOND
|
||||
0x76 0x2756 # BLACK DIAMOND MINUS WHITE X
|
||||
0x77 0x25D7 # RIGHT HALF BLACK CIRCLE
|
||||
0x78 0x2758 # LIGHT VERTICAL BAR
|
||||
0x79 0x2759 # MEDIUM VERTICAL BAR
|
||||
0x7A 0x275A # HEAVY VERTICAL BAR
|
||||
0x7B 0x275B # HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT
|
||||
0x7C 0x275C # HEAVY SINGLE COMMA QUOTATION MARK ORNAMENT
|
||||
0x7D 0x275D # HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT
|
||||
0x7E 0x275E # HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT
|
||||
#
|
||||
0x80 0x2768 # MEDIUM LEFT PARENTHESIS ORNAMENT # for Unicode 3.2 and later
|
||||
0x81 0x2769 # MEDIUM RIGHT PARENTHESIS ORNAMENT # for Unicode 3.2 and later
|
||||
0x82 0x276A # MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT # for Unicode 3.2 and later
|
||||
0x83 0x276B # MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT # for Unicode 3.2 and later
|
||||
0x84 0x276C # MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT # for Unicode 3.2 and later
|
||||
0x85 0x276D # MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT # for Unicode 3.2 and later
|
||||
0x86 0x276E # HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT # for Unicode 3.2 and later
|
||||
0x87 0x276F # HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT # for Unicode 3.2 and later
|
||||
0x88 0x2770 # HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT # for Unicode 3.2 and later
|
||||
0x89 0x2771 # HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT # for Unicode 3.2 and later
|
||||
0x8A 0x2772 # LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT # for Unicode 3.2 and later
|
||||
0x8B 0x2773 # LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT # for Unicode 3.2 and later
|
||||
0x8C 0x2774 # MEDIUM LEFT CURLY BRACKET ORNAMENT # for Unicode 3.2 and later
|
||||
0x8D 0x2775 # MEDIUM RIGHT CURLY BRACKET ORNAMENT # for Unicode 3.2 and later
|
||||
#
|
||||
0xA1 0x2761 # CURVED STEM PARAGRAPH SIGN ORNAMENT
|
||||
0xA2 0x2762 # HEAVY EXCLAMATION MARK ORNAMENT
|
||||
0xA3 0x2763 # HEAVY HEART EXCLAMATION MARK ORNAMENT
|
||||
0xA4 0x2764 # HEAVY BLACK HEART
|
||||
0xA5 0x2765 # ROTATED HEAVY BLACK HEART BULLET
|
||||
0xA6 0x2766 # FLORAL HEART
|
||||
0xA7 0x2767 # ROTATED FLORAL HEART BULLET
|
||||
0xA8 0x2663 # BLACK CLUB SUIT
|
||||
0xA9 0x2666 # BLACK DIAMOND SUIT
|
||||
0xAA 0x2665 # BLACK HEART SUIT
|
||||
0xAB 0x2660 # BLACK SPADE SUIT
|
||||
0xAC 0x2460 # CIRCLED DIGIT ONE
|
||||
0xAD 0x2461 # CIRCLED DIGIT TWO
|
||||
0xAE 0x2462 # CIRCLED DIGIT THREE
|
||||
0xAF 0x2463 # CIRCLED DIGIT FOUR
|
||||
0xB0 0x2464 # CIRCLED DIGIT FIVE
|
||||
0xB1 0x2465 # CIRCLED DIGIT SIX
|
||||
0xB2 0x2466 # CIRCLED DIGIT SEVEN
|
||||
0xB3 0x2467 # CIRCLED DIGIT EIGHT
|
||||
0xB4 0x2468 # CIRCLED DIGIT NINE
|
||||
0xB5 0x2469 # CIRCLED NUMBER TEN
|
||||
0xB6 0x2776 # DINGBAT NEGATIVE CIRCLED DIGIT ONE
|
||||
0xB7 0x2777 # DINGBAT NEGATIVE CIRCLED DIGIT TWO
|
||||
0xB8 0x2778 # DINGBAT NEGATIVE CIRCLED DIGIT THREE
|
||||
0xB9 0x2779 # DINGBAT NEGATIVE CIRCLED DIGIT FOUR
|
||||
0xBA 0x277A # DINGBAT NEGATIVE CIRCLED DIGIT FIVE
|
||||
0xBB 0x277B # DINGBAT NEGATIVE CIRCLED DIGIT SIX
|
||||
0xBC 0x277C # DINGBAT NEGATIVE CIRCLED DIGIT SEVEN
|
||||
0xBD 0x277D # DINGBAT NEGATIVE CIRCLED DIGIT EIGHT
|
||||
0xBE 0x277E # DINGBAT NEGATIVE CIRCLED DIGIT NINE
|
||||
0xBF 0x277F # DINGBAT NEGATIVE CIRCLED NUMBER TEN
|
||||
0xC0 0x2780 # DINGBAT CIRCLED SANS-SERIF DIGIT ONE
|
||||
0xC1 0x2781 # DINGBAT CIRCLED SANS-SERIF DIGIT TWO
|
||||
0xC2 0x2782 # DINGBAT CIRCLED SANS-SERIF DIGIT THREE
|
||||
0xC3 0x2783 # DINGBAT CIRCLED SANS-SERIF DIGIT FOUR
|
||||
0xC4 0x2784 # DINGBAT CIRCLED SANS-SERIF DIGIT FIVE
|
||||
0xC5 0x2785 # DINGBAT CIRCLED SANS-SERIF DIGIT SIX
|
||||
0xC6 0x2786 # DINGBAT CIRCLED SANS-SERIF DIGIT SEVEN
|
||||
0xC7 0x2787 # DINGBAT CIRCLED SANS-SERIF DIGIT EIGHT
|
||||
0xC8 0x2788 # DINGBAT CIRCLED SANS-SERIF DIGIT NINE
|
||||
0xC9 0x2789 # DINGBAT CIRCLED SANS-SERIF NUMBER TEN
|
||||
0xCA 0x278A # DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE
|
||||
0xCB 0x278B # DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT TWO
|
||||
0xCC 0x278C # DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT THREE
|
||||
0xCD 0x278D # DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT FOUR
|
||||
0xCE 0x278E # DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT FIVE
|
||||
0xCF 0x278F # DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT SIX
|
||||
0xD0 0x2790 # DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT SEVEN
|
||||
0xD1 0x2791 # DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT EIGHT
|
||||
0xD2 0x2792 # DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT NINE
|
||||
0xD3 0x2793 # DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN
|
||||
0xD4 0x2794 # HEAVY WIDE-HEADED RIGHTWARDS ARROW
|
||||
0xD5 0x2192 # RIGHTWARDS ARROW
|
||||
0xD6 0x2194 # LEFT RIGHT ARROW
|
||||
0xD7 0x2195 # UP DOWN ARROW
|
||||
0xD8 0x2798 # HEAVY SOUTH EAST ARROW
|
||||
0xD9 0x2799 # HEAVY RIGHTWARDS ARROW
|
||||
0xDA 0x279A # HEAVY NORTH EAST ARROW
|
||||
0xDB 0x279B # DRAFTING POINT RIGHTWARDS ARROW
|
||||
0xDC 0x279C # HEAVY ROUND-TIPPED RIGHTWARDS ARROW
|
||||
0xDD 0x279D # TRIANGLE-HEADED RIGHTWARDS ARROW
|
||||
0xDE 0x279E # HEAVY TRIANGLE-HEADED RIGHTWARDS ARROW
|
||||
0xDF 0x279F # DASHED TRIANGLE-HEADED RIGHTWARDS ARROW
|
||||
0xE0 0x27A0 # HEAVY DASHED TRIANGLE-HEADED RIGHTWARDS ARROW
|
||||
0xE1 0x27A1 # BLACK RIGHTWARDS ARROW
|
||||
0xE2 0x27A2 # THREE-D TOP-LIGHTED RIGHTWARDS ARROWHEAD
|
||||
0xE3 0x27A3 # THREE-D BOTTOM-LIGHTED RIGHTWARDS ARROWHEAD
|
||||
0xE4 0x27A4 # BLACK RIGHTWARDS ARROWHEAD
|
||||
0xE5 0x27A5 # HEAVY BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW
|
||||
0xE6 0x27A6 # HEAVY BLACK CURVED UPWARDS AND RIGHTWARDS ARROW
|
||||
0xE7 0x27A7 # SQUAT BLACK RIGHTWARDS ARROW
|
||||
0xE8 0x27A8 # HEAVY CONCAVE-POINTED BLACK RIGHTWARDS ARROW
|
||||
0xE9 0x27A9 # RIGHT-SHADED WHITE RIGHTWARDS ARROW
|
||||
0xEA 0x27AA # LEFT-SHADED WHITE RIGHTWARDS ARROW
|
||||
0xEB 0x27AB # BACK-TILTED SHADOWED WHITE RIGHTWARDS ARROW
|
||||
0xEC 0x27AC # FRONT-TILTED SHADOWED WHITE RIGHTWARDS ARROW
|
||||
0xED 0x27AD # HEAVY LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
|
||||
0xEE 0x27AE # HEAVY UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
|
||||
0xEF 0x27AF # NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
|
||||
#
|
||||
0xF1 0x27B1 # NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
|
||||
0xF2 0x27B2 # CIRCLED HEAVY WHITE RIGHTWARDS ARROW
|
||||
0xF3 0x27B3 # WHITE-FEATHERED RIGHTWARDS ARROW
|
||||
0xF4 0x27B4 # BLACK-FEATHERED SOUTH EAST ARROW
|
||||
0xF5 0x27B5 # BLACK-FEATHERED RIGHTWARDS ARROW
|
||||
0xF6 0x27B6 # BLACK-FEATHERED NORTH EAST ARROW
|
||||
0xF7 0x27B7 # HEAVY BLACK-FEATHERED SOUTH EAST ARROW
|
||||
0xF8 0x27B8 # HEAVY BLACK-FEATHERED RIGHTWARDS ARROW
|
||||
0xF9 0x27B9 # HEAVY BLACK-FEATHERED NORTH EAST ARROW
|
||||
0xFA 0x27BA # TEARDROP-BARBED RIGHTWARDS ARROW
|
||||
0xFB 0x27BB # HEAVY TEARDROP-SHANKED RIGHTWARDS ARROW
|
||||
0xFC 0x27BC # WEDGE-TAILED RIGHTWARDS ARROW
|
||||
0xFD 0x27BD # HEAVY WEDGE-TAILED RIGHTWARDS ARROW
|
||||
0xFE 0x27BE # OPEN-OUTLINED RIGHTWARDS ARROW
|
521
charmap/FARSI.TXT
Normal file
@ -0,0 +1,521 @@
|
||||
#=======================================================================
|
||||
# File name: FARSI.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Farsi
|
||||
# character set to Unicode 2.1 and later.
|
||||
#
|
||||
# Copyright: (c) 1997-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-05 Update header comments. Matches internal xml
|
||||
# <c1.1> and Text Encoding Converter 2.0.
|
||||
# b3,c1 2002-Dec-19 Add comments about character display and
|
||||
# direction overrides. Update URLs, notes.
|
||||
# Matches internal utom<b3>.
|
||||
# b02 1999-Sep-22 Update contact e-mail address. Matches
|
||||
# internal utom<b1>, ufrm<b1>, and Text
|
||||
# Encoding Converter version 1.5.
|
||||
# n04 1998-Feb-05 Show required Unicode character
|
||||
# directionality in a different way. Matches
|
||||
# internal utom<n3>, ufrm<n9>, and Text
|
||||
# Encoding Converter version 1.3. Update
|
||||
# header comments; include information on
|
||||
# loose mapping of digits, and changes to
|
||||
# mapping for the TrueType variant.
|
||||
# n01 1997-Jul-17 First version. Matches internal utom<n1>,
|
||||
# ufrm<n2>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Farsi code (in hex as 0xNN)
|
||||
# Column #2 is the corresponding Unicode (in hex as 0xNNNN),
|
||||
# possibly preceded by a tag indicating required directionality
|
||||
# (i.e. <LR>+0xNNNN or <RL>+0xNNNN).
|
||||
# Column #3 is a comment containing the Unicode name.
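The optional direction tag in column #2 can be separated from the code point as sketched below. The function name `parse_farsi_line` is hypothetical, and the sample lines in the usage note are made up to match the format described above rather than quoted from the table.

```python
def parse_farsi_line(line: str):
    """Parse one table line with an optional <LR>/<RL> direction tag.

    Returns (mac_code, direction, unicode_string, comment), where
    direction is "LR", "RL", or None. Returns None for comment-only
    and blank lines.
    """
    body, _, comment = line.partition("#")
    fields = body.split()
    if len(fields) < 2:
        return None
    mac_code = int(fields[0], 16)
    parts = fields[1].split("+")
    direction = None
    if parts[0] in ("<LR>", "<RL>"):
        direction = parts.pop(0)[1:-1]  # strip the angle brackets
    unicode_string = "".join(chr(int(part, 16)) for part in parts)
    return mac_code, direction, unicode_string, comment.strip()
```

A line such as `0xAB <RL>+0x002B # PLUS SIGN, right-left` would yield direction `"RL"`, while a line without a tag yields `None` in that position.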
|
||||
#
|
||||
# The entries are in Mac OS Farsi code order.
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Farsi character set uses the standard control characters at
|
||||
# 0x00-0x1F and 0x7F.
|
||||
#
|
||||
# Notes on Mac OS Farsi:
|
||||
# ----------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported via transcoding to and from
|
||||
# Unicode.
|
||||
#
|
||||
# 1. General
|
||||
#
|
||||
# The Mac OS Farsi character set is based on the Mac OS Arabic
|
||||
# character set. The main difference is in the right-to-left digits
|
||||
# 0xB0-0xB9: For Mac OS Arabic these correspond to right-left
|
||||
# versions of the Unicode ARABIC-INDIC DIGITs 0660-0669; for
|
||||
# Mac OS Farsi these correspond to right-left versions of the
|
||||
# Unicode EXTENDED ARABIC-INDIC DIGITs 06F0-06F9. The other
|
||||
# difference is in the nature of the font variants.
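The digit difference between the two character sets amounts to a different base code point for the same byte range. A minimal sketch, with a hypothetical function name, that ignores the right-left directionality requirement and returns only the digit character:

```python
def rl_digit_to_unicode(byte: int, farsi: bool) -> str:
    """Map a right-to-left digit byte (0xB0-0xB9) to its Unicode digit.

    farsi=True  -> EXTENDED ARABIC-INDIC DIGITs U+06F0-U+06F9
    farsi=False -> ARABIC-INDIC DIGITs U+0660-U+0669 (Mac OS Arabic)
    """
    if not 0xB0 <= byte <= 0xB9:
        raise ValueError(f"0x{byte:02X} is not a right-to-left digit code")
    base = 0x06F0 if farsi else 0x0660
    return chr(base + (byte - 0xB0))
```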
|
||||
#
|
||||
# For more information, see the comments in the mapping table for
|
||||
# Mac OS Arabic.
|
||||
#
|
||||
# Mac OS Farsi characters 0xEB-0xF2 are non-spacing/combining marks.
|
||||
#
|
||||
# 2. Directional characters and roundtrip fidelity
|
||||
#
|
||||
# The Mac OS Arabic character set (on which Mac OS Farsi is based)
|
||||
# was developed in 1986-1987. At that time the bidirectional line
|
||||
# layout algorithm used in the Mac OS Arabic system was fairly simple;
|
||||
# it used only a few direction classes (instead of the 19 now used in
|
||||
# the Unicode bidirectional algorithm). In order to permit users to
|
||||
# handle some tricky layout problems, certain punctuation and symbol
|
||||
# characters were encoded twice, once with a left-right direction
|
||||
# attribute and once with a right-left direction attribute. This
|
||||
# is the case in Mac OS Farsi too.
|
||||
#
|
||||
# For example, plus sign is encoded at 0x2B with a left-right
|
||||
# attribute, and at 0xAB with a right-left attribute. However, there
|
||||
# is only one PLUS SIGN character in Unicode. This leads to some
|
||||
# interesting problems when mapping between Mac OS Farsi and Unicode;
|
||||
# see below.
|
||||
#
|
||||
# A related problem is that even when a particular character is
|
||||
# encoded only once in Mac OS Farsi, it may have a different
|
||||
# direction attribute than the corresponding Unicode character.
|
||||
#
|
||||
# For example, the Mac OS Farsi character at 0x93 is HORIZONTAL
|
||||
# ELLIPSIS with strong right-left direction. However, the Unicode
|
||||
# character HORIZONTAL ELLIPSIS has direction class neutral.
|
||||
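# The mismatch described above can be observed by looking up the
# Unicode bidirectional class of the mapped characters. This is a
# small sketch, assuming Python's standard unicodedata module; the
# helper names are hypothetical.

```python
import unicodedata

# Bidi classes that carry a strong direction in the Unicode algorithm.
STRONG_CLASSES = {"L", "R", "AL"}


def bidi_class(code_point: int) -> str:
    """Return the Unicode Bidi_Class of a code point, e.g. 'ON' or 'AL'."""
    return unicodedata.bidirectional(chr(code_point))


def has_strong_direction(code_point: int) -> bool:
    """True if the character is strongly left-right or right-left in Unicode."""
    return bidi_class(code_point) in STRONG_CLASSES


# HORIZONTAL ELLIPSIS, PLUS SIGN, EXTENDED ARABIC-INDIC DIGIT ZERO,
# ARABIC LETTER ALEF, LATIN CAPITAL LETTER A
for cp in (0x2026, 0x002B, 0x06F0, 0x0627, 0x0041):
    print(f"U+{cp:04X} {bidi_class(cp):>2} strong={has_strong_direction(cp)}")
```

# Characters that are strong in Mac OS Farsi but not strong in Unicode
# are the ones that carry a direction tag in the main table.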
#
|
||||
# 3. Behavior of ASCII-range numbers in WorldScript
|
||||
#
|
||||
# Mac OS Farsi also has two sets of digit codes.
|
||||
|
||||
# The digits at 0x30-0x39 may be displayed using either European
|
||||
# digit forms or Persian digit forms, depending on context. If there
|
||||
# is a "strong European" character such as a Latin letter on either
|
||||
# side of a sequence consisting of digits 0x30-0x39 and possibly comma
|
||||
# 0x2C or period 0x2E, then the characters will be displayed using
|
||||
# European forms (this will happen even if there are neutral characters
|
||||
# between the digits and the strong European character). Otherwise, the
|
||||
# digits will be displayed using Persian forms, the comma will be
|
||||
# displayed as Arabic thousands separator, and the period as Arabic
|
||||
# decimal separator. In any case, 0x2C, 0x2E, and 0x30-0x39 are always
|
||||
# left-right.
|
||||
#
|
||||
# The digits at 0xB0-0xB9 are always displayed using Persian digit
|
||||
# shapes, and moreover, these digits always have strong right-left
|
||||
# directionality. These are mainly intended for special layout
|
||||
# purposes such as part numbers, etc.
|
||||
#
|
||||
# 4. Font variants
|
||||
#
|
||||
# The table in this file gives the Unicode mappings for the standard
|
||||
# Mac OS Farsi encoding. This encoding is supported by the Tehran font
|
||||
# (the system font for Farsi), and is the encoding supported by the
|
||||
# text processing utilities. However, the other Farsi fonts actually
|
||||
# implement a somewhat different encoding; this affects nine code
|
||||
# points including 0xAA and 0xC0 (which are also affected by font
|
||||
# variants in Mac OS Arabic). For these nine code points the standard
|
||||
# Mac OS Farsi encoding has the following mappings:
|
||||
# 0x8B -> 0x06BA ARABIC LETTER NOON GHUNNA (Urdu)
|
||||
# 0xA4 -> <RL>+0x0024 DOLLAR SIGN, right-left
|
||||
# 0xAA -> <RL>+0x002A ASTERISK, right-left
|
||||
# 0xC0 -> <RL>+0x274A EIGHT TEARDROP-SPOKED PROPELLER ASTERISK,
|
||||
# right-left
|
||||
# 0xF4 -> 0x0679 ARABIC LETTER TTEH (Urdu)
|
||||
# 0xF7 -> 0x06A4 ARABIC LETTER VEH (for transliteration)
|
||||
# 0xF9 -> 0x0688 ARABIC LETTER DDAL (Urdu)
|
||||
# 0xFA -> 0x0691 ARABIC LETTER RREH (Urdu)
|
||||
# 0xFF -> 0x06D2 ARABIC LETTER YEH BARREE (Urdu)
|
||||
#
|
||||
# The TrueType variant is used for the Farsi TrueType fonts: Ashfahan,
|
||||
# Amir, Kamran, Mashad, NadeemFarsi. It differs from the standard
|
||||
# variant in the following ways:
|
||||
# 0x8B -> 0xF882 Arabic ligature "peace on him" (corporate char.)
|
||||
# 0xA4 -> 0xFDFC RIAL SIGN (added in Unicode 3.2)
|
||||
# 0xAA -> <RL>+0x00D7 MULTIPLICATION SIGN, right-left
|
||||
# 0xC0 -> <RL>+0x002A ASTERISK, right-left
|
||||
# 0xF4 -> <RL>+0x00B0 DEGREE SIGN, right-left
|
||||
# 0xF7 -> 0xFDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
|
||||
# 0xF9 -> <RL>+0x25CF BLACK CIRCLE, right-left
|
||||
# 0xFA -> <RL>+0x25A0 BLACK SQUARE, right-left
|
||||
# 0xFF -> <RL>+0x25B2 BLACK UP-POINTING TRIANGLE, right-left
|
||||
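# One way to support both variants is to keep the standard table and
# apply the nine TrueType differences as an overlay. The sketch below
# assumes each mapping is stored as a (direction tag, Unicode) pair,
# with None meaning no tag; that representation and the names are
# illustrative choices, not something defined by this file.

```python
# Standard mappings for the nine affected code points.
STANDARD_NINE = {
    0x8B: (None, 0x06BA),  # ARABIC LETTER NOON GHUNNA
    0xA4: ("RL", 0x0024),  # DOLLAR SIGN, right-left
    0xAA: ("RL", 0x002A),  # ASTERISK, right-left
    0xC0: ("RL", 0x274A),  # EIGHT TEARDROP-SPOKED PROPELLER ASTERISK
    0xF4: (None, 0x0679),  # ARABIC LETTER TTEH
    0xF7: (None, 0x06A4),  # ARABIC LETTER VEH
    0xF9: (None, 0x0688),  # ARABIC LETTER DDAL
    0xFA: (None, 0x0691),  # ARABIC LETTER RREH
    0xFF: (None, 0x06D2),  # ARABIC LETTER YEH BARREE
}

# TrueType variant mappings for the same code points.
TRUETYPE_OVERRIDES = {
    0x8B: (None, 0xF882),  # Arabic ligature "peace on him" (corporate char.)
    0xA4: (None, 0xFDFC),  # RIAL SIGN
    0xAA: ("RL", 0x00D7),  # MULTIPLICATION SIGN, right-left
    0xC0: ("RL", 0x002A),  # ASTERISK, right-left
    0xF4: ("RL", 0x00B0),  # DEGREE SIGN, right-left
    0xF7: (None, 0xFDFA),  # ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
    0xF9: ("RL", 0x25CF),  # BLACK CIRCLE, right-left
    0xFA: ("RL", 0x25A0),  # BLACK SQUARE, right-left
    0xFF: ("RL", 0x25B2),  # BLACK UP-POINTING TRIANGLE, right-left
}


def with_variant(base: dict, overrides: dict) -> dict:
    """Return a copy of base with the variant's mappings applied on top."""
    table = dict(base)
    table.update(overrides)
    return table


truetype = with_variant(STANDARD_NINE, TRUETYPE_OVERRIDES)
print(hex(truetype[0xA4][1]))
```

# In a full converter the base would be the complete 0x20-0xFF table;
# only the nine entries above would change.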
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# 1. Matching the direction of Mac OS Farsi characters
|
||||
#
|
||||
# When Mac OS Farsi encodes a character twice but with different
|
||||
# direction attributes for the two code points - as in the case of
|
||||
# plus sign mentioned above - we need a way to map both Mac OS Farsi
|
||||
# code points to Unicode and back again without loss of information.
|
||||
# With the plus sign, for example, mapping one of the Mac OS Farsi
|
||||
# characters to a code in the Unicode corporate use zone is
|
||||
# undesirable, since both of the plus sign characters are likely to
|
||||
# be used in text that is interchanged.
|
||||
#
|
||||
# The problem is solved with the use of direction override characters
|
||||
# and direction-dependent mappings. When mapping from Mac OS Farsi
|
||||
# to Unicode, we use direction overrides as necessary to force the
|
||||
# direction of the resulting Unicode characters.
|
||||
#
|
||||
# The required direction is indicated by a direction tag in the
|
||||
# mappings. A tag of <LR> means the corresponding Unicode character
|
||||
# must have a strong left-right context, and a tag of <RL> indicates
|
||||
# a right-left context.
|
||||
#
|
||||
# For example, the mapping of 0x2B is given as <LR>+0x002B; the
|
||||
# mapping of 0xAB is given as <RL>+0x002B. If we map an isolated
|
||||
# instance of 0x2B to Unicode, it should be mapped as follows (LRO
|
||||
# indicates LEFT-RIGHT OVERRIDE, PDF indicates POP DIRECTION
|
||||
# FORMATTING):
|
||||
#
|
||||
# 0x2B -> 0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
|
||||
#
|
||||
# When mapping several characters in a row that require direction
|
||||
# forcing, the overrides need only be used at the beginning and end.
|
||||
# For example:
|
||||
#
|
||||
# 0x24 0x20 0x28 0x29 -> 0x202D 0x0024 0x0020 0x0028 0x0029 0x202C
|
||||
#
|
||||
# If neutral characters that require direction forcing are already
|
||||
# between strong-direction characters with matching directionality,
|
||||
# then direction overrides need not be used. Direction overrides are
|
||||
# always needed to map the right-left digits at 0xB0-0xB9.
|
||||
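# The wrapping of runs described above can be sketched as follows.
# This is a simplified illustration: it always emits overrides for
# tagged characters and does not try to omit them when surrounding
# strong characters already supply the direction. It also assumes
# RIGHT-LEFT OVERRIDE (U+202E) for <RL> runs, which the text above
# does not name explicitly. The function name is hypothetical.

```python
LRO = 0x202D  # LEFT-RIGHT OVERRIDE
RLO = 0x202E  # RIGHT-LEFT OVERRIDE (assumed counterpart for <RL>)
PDF = 0x202C  # POP DIRECTION FORMATTING


def apply_overrides(mapped):
    """Wrap runs of tagged characters in one override pair per run.

    mapped is an iterable of (tag, code_point) pairs, where tag is
    "LR", "RL", or None for characters without a direction tag.
    """
    out = []
    current = None  # tag of the override that is currently open
    for tag, code_point in mapped:
        if tag != current:
            if current is not None:
                out.append(PDF)  # close the previous run
            if tag is not None:
                out.append(LRO if tag == "LR" else RLO)
            current = tag
        out.append(code_point)
    if current is not None:
        out.append(PDF)  # close a run that reaches the end of the text
    return out


# Mac OS Farsi 0x24 0x20 0x28 0x29, all tagged <LR> in the main table.
run = [("LR", 0x0024), ("LR", 0x0020), ("LR", 0x0028), ("LR", 0x0029)]
print([f"0x{cp:04X}" for cp in apply_overrides(run)])
```

# For the four-character run this yields the same sequence as the
# example above: one LRO, the four characters, one PDF.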
#
|
||||
# When mapping from Unicode to Mac OS Farsi, the Unicode
|
||||
# bidirectional algorithm should be used to determine resolved
|
||||
# direction of the Unicode characters. The mapping from Unicode to
|
||||
# Mac OS Farsi can then be disambiguated by the use of the resolved
|
||||
# direction:
|
||||
#
|
||||
# Unicode 0x002B -> Mac OS Farsi 0x2B (if L) or 0xAB (if R)
|
||||
#
|
||||
# However, this also means the direction override characters should
|
||||
# be discarded when mapping from Unicode to Mac OS Farsi (after
|
||||
# they have been used to determine resolved direction), since the
|
||||
# direction override information is carried by the code point itself.
|
||||
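# The reverse mapping can be sketched as a lookup keyed by both the
# Unicode character and its resolved direction. The sketch assumes
# the caller has already run the Unicode bidirectional algorithm and
# supplies each character with a resolved direction of "L" or "R";
# that step is not shown. The tables are a small excerpt and the
# names are hypothetical.

```python
LRO, RLO, PDF = 0x202D, 0x202E, 0x202C  # direction formatting characters

# Characters encoded twice in Mac OS Farsi: (Unicode, direction) -> byte.
DIRECTIONAL = {
    (0x002B, "L"): 0x2B, (0x002B, "R"): 0xAB,  # PLUS SIGN
    (0x0028, "L"): 0x28, (0x0028, "R"): 0xA8,  # LEFT PARENTHESIS
}

# Characters encoded once: the resolved direction is not consulted.
SINGLE = {
    0x2026: 0x93,  # HORIZONTAL ELLIPSIS
    0x0627: 0xC7,  # ARABIC LETTER ALEF
}


def encode_farsi(resolved) -> bytes:
    """Encode (code_point, direction) pairs as Mac OS Farsi bytes."""
    out = bytearray()
    for code_point, direction in resolved:
        if code_point in (LRO, RLO, PDF):
            continue  # overrides were only needed to resolve direction
        if code_point in SINGLE:
            out.append(SINGLE[code_point])
        else:
            out.append(DIRECTIONAL[(code_point, direction)])
    return bytes(out)


text = [(LRO, "L"), (0x002B, "L"), (PDF, "L"), (0x002B, "R")]
print(encode_farsi(text).hex())
```

# Dropping the overrides loses nothing here, because the choice
# between 0x2B and 0xAB already records the direction.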
#
|
||||
# Even when direction overrides are not needed for roundtrip
|
||||
# fidelity, they are sometimes used when mapping Mac OS Farsi
|
||||
# characters to Unicode in order to achieve similar text layout with
|
||||
# the resulting Unicode text. For example, the single Mac OS Farsi
|
||||
# ellipsis character has direction class right-left, and there is no
|
||||
# left-right version. However, the Unicode HORIZONTAL ELLIPSIS
|
||||
# character has direction class neutral (which means it may end up
|
||||
# with a resolved direction of left-right if surrounded by left-right
|
||||
# characters). When mapping the Mac OS Farsi ellipsis to Unicode, it
|
||||
# is surrounded with a direction override to help preserve proper
|
||||
# text layout. The resolved direction is not needed or used when
|
||||
# mapping the Unicode HORIZONTAL ELLIPSIS back to Mac OS Farsi.
|
||||
#
|
||||
# 2. Mapping the Mac OS Farsi digits
|
||||
#
|
||||
# The main table below contains mappings that should be used when
|
||||
# strict round-trip fidelity is required. However, for numeric
|
||||
# values, the mappings in that table will produce Unicode characters
|
||||
# that may appear different than the Mac OS Farsi text displayed on
|
||||
# a Mac OS system using WorldScript. This is because WorldScript
|
||||
# uses context-dependent display for the 0x30-0x39 digits.
|
||||
#
|
||||
# If roundtrip fidelity is not required, then the following
|
||||
# alternate mappings should be used when a sequence of 0x30-0x39
|
||||
# digits - possibly including 0x2C and 0x2E - occurs in an Arabic
|
||||
# context (that is, when the first "strong" character on either side
|
||||
# of the digit sequence is Arabic, or there is no strong character):
|
||||
#
|
||||
# 0x2C 0x066C # ARABIC THOUSANDS SEPARATOR
|
||||
# 0x2E 0x066B # ARABIC DECIMAL SEPARATOR
|
||||
# 0x30 0x06F0 # EXTENDED ARABIC-INDIC DIGIT ZERO
|
||||
# 0x31 0x06F1 # EXTENDED ARABIC-INDIC DIGIT ONE
|
||||
# 0x32 0x06F2 # EXTENDED ARABIC-INDIC DIGIT TWO
|
||||
# 0x33 0x06F3 # EXTENDED ARABIC-INDIC DIGIT THREE
|
||||
# 0x34 0x06F4 # EXTENDED ARABIC-INDIC DIGIT FOUR
|
||||
# 0x35 0x06F5 # EXTENDED ARABIC-INDIC DIGIT FIVE
|
||||
# 0x36 0x06F6 # EXTENDED ARABIC-INDIC DIGIT SIX
|
||||
# 0x37 0x06F7 # EXTENDED ARABIC-INDIC DIGIT SEVEN
|
||||
# 0x38 0x06F8 # EXTENDED ARABIC-INDIC DIGIT EIGHT
|
||||
# 0x39 0x06F9 # EXTENDED ARABIC-INDIC DIGIT NINE
|
||||
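# The alternate mappings listed above can be applied with a small
# lookup table. The sketch below assumes the caller has already
# decided that the digit run sits in an Arabic context; it covers
# only the twelve bytes listed. The names are hypothetical.

```python
# Loose (non-roundtrip) mappings for digits and separators.
LOOSE_DIGITS = {
    0x2C: 0x066C,  # ARABIC THOUSANDS SEPARATOR
    0x2E: 0x066B,  # ARABIC DECIMAL SEPARATOR
    **{0x30 + i: 0x06F0 + i for i in range(10)},  # EXTENDED ARABIC-INDIC DIGITs
}


def map_digit_run_loose(run: bytes) -> str:
    """Map a run of 0x30-0x39, 0x2C, and 0x2E bytes to Persian display forms."""
    return "".join(chr(LOOSE_DIGITS[b]) for b in run)


persian = map_digit_run_loose(b"1,234.5")
print([f"U+{ord(ch):04X}" for ch in persian])
```

# Text converted this way does not map back to the original bytes
# 0x30-0x39, since 0x06F0-0x06F9 are the targets of 0xB0-0xB9 in
# the main table.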
#
|
||||
# 3. Use of corporate-zone Unicodes (mapping the TrueType variant)
|
||||
#
|
||||
# The following corporate zone Unicode character is used in this
|
||||
# mapping:
|
||||
#
|
||||
# 0xF882 Arabic ligature "peace on him"
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
# Changes from version b02 to version b03/c01:
|
||||
#
|
||||
# - Update mapping of 0xA4 in TrueType variant to use new Unicode
|
||||
# character U+FDFC RIAL SIGN added for Unicode 3.2
|
||||
#
|
||||
# Changes from version n01 to version n04:
|
||||
#
|
||||
# - Change mapping of 0xA4 in TrueType variant (just described in
|
||||
# header comment) from single corporate character to use
|
||||
# grouping hint
|
||||
#
|
||||
##################
|
||||
|
||||
0x20 <LR>+0x0020 # SPACE, left-right
|
||||
0x21 <LR>+0x0021 # EXCLAMATION MARK, left-right
|
||||
0x22 <LR>+0x0022 # QUOTATION MARK, left-right
|
||||
0x23 <LR>+0x0023 # NUMBER SIGN, left-right
|
||||
0x24 <LR>+0x0024 # DOLLAR SIGN, left-right
|
||||
0x25 <LR>+0x0025 # PERCENT SIGN, left-right
|
||||
0x26 <LR>+0x0026 # AMPERSAND, left-right
|
||||
0x27 <LR>+0x0027 # APOSTROPHE, left-right
|
||||
0x28 <LR>+0x0028 # LEFT PARENTHESIS, left-right
|
||||
0x29 <LR>+0x0029 # RIGHT PARENTHESIS, left-right
|
||||
0x2A <LR>+0x002A # ASTERISK, left-right
|
||||
0x2B <LR>+0x002B # PLUS SIGN, left-right
|
||||
0x2C <LR>+0x002C # COMMA, left-right; in Arabic-script context, displayed as 0x066C ARABIC THOUSANDS SEPARATOR
|
||||
0x2D <LR>+0x002D # HYPHEN-MINUS, left-right
|
||||
0x2E <LR>+0x002E # FULL STOP, left-right; in Arabic-script context, displayed as 0x066B ARABIC DECIMAL SEPARATOR
|
||||
0x2F <LR>+0x002F # SOLIDUS, left-right
|
||||
0x30 0x0030 # DIGIT ZERO; in Arabic-script context, displayed as 0x06F0 EXTENDED ARABIC-INDIC DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE; in Arabic-script context, displayed as 0x06F1 EXTENDED ARABIC-INDIC DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO; in Arabic-script context, displayed as 0x06F2 EXTENDED ARABIC-INDIC DIGIT TWO
|
||||
0x33 0x0033 # DIGIT THREE; in Arabic-script context, displayed as 0x06F3 EXTENDED ARABIC-INDIC DIGIT THREE
|
||||
0x34 0x0034 # DIGIT FOUR; in Arabic-script context, displayed as 0x06F4 EXTENDED ARABIC-INDIC DIGIT FOUR
|
||||
0x35 0x0035 # DIGIT FIVE; in Arabic-script context, displayed as 0x06F5 EXTENDED ARABIC-INDIC DIGIT FIVE
|
||||
0x36 0x0036 # DIGIT SIX; in Arabic-script context, displayed as 0x06F6 EXTENDED ARABIC-INDIC DIGIT SIX
|
||||
0x37 0x0037 # DIGIT SEVEN; in Arabic-script context, displayed as 0x06F7 EXTENDED ARABIC-INDIC DIGIT SEVEN
|
||||
0x38 0x0038 # DIGIT EIGHT; in Arabic-script context, displayed as 0x06F8 EXTENDED ARABIC-INDIC DIGIT EIGHT
|
||||
0x39 0x0039 # DIGIT NINE; in Arabic-script context, displayed as 0x06F9 EXTENDED ARABIC-INDIC DIGIT NINE
|
||||
0x3A <LR>+0x003A # COLON, left-right
|
||||
0x3B <LR>+0x003B # SEMICOLON, left-right
|
||||
0x3C <LR>+0x003C # LESS-THAN SIGN, left-right
|
||||
0x3D <LR>+0x003D # EQUALS SIGN, left-right
|
||||
0x3E <LR>+0x003E # GREATER-THAN SIGN, left-right
|
||||
0x3F <LR>+0x003F # QUESTION MARK, left-right
|
||||
0x40 0x0040 # COMMERCIAL AT
|
||||
0x41 0x0041 # LATIN CAPITAL LETTER A
|
||||
0x42 0x0042 # LATIN CAPITAL LETTER B
|
||||
0x43 0x0043 # LATIN CAPITAL LETTER C
|
||||
0x44 0x0044 # LATIN CAPITAL LETTER D
|
||||
0x45 0x0045 # LATIN CAPITAL LETTER E
|
||||
0x46 0x0046 # LATIN CAPITAL LETTER F
|
||||
0x47 0x0047 # LATIN CAPITAL LETTER G
|
||||
0x48 0x0048 # LATIN CAPITAL LETTER H
|
||||
0x49 0x0049 # LATIN CAPITAL LETTER I
|
||||
0x4A 0x004A # LATIN CAPITAL LETTER J
|
||||
0x4B 0x004B # LATIN CAPITAL LETTER K
|
||||
0x4C 0x004C # LATIN CAPITAL LETTER L
|
||||
0x4D 0x004D # LATIN CAPITAL LETTER M
|
||||
0x4E 0x004E # LATIN CAPITAL LETTER N
|
||||
0x4F 0x004F # LATIN CAPITAL LETTER O
|
||||
0x50 0x0050 # LATIN CAPITAL LETTER P
|
||||
0x51 0x0051 # LATIN CAPITAL LETTER Q
|
||||
0x52 0x0052 # LATIN CAPITAL LETTER R
|
||||
0x53 0x0053 # LATIN CAPITAL LETTER S
|
||||
0x54 0x0054 # LATIN CAPITAL LETTER T
|
||||
0x55 0x0055 # LATIN CAPITAL LETTER U
|
||||
0x56 0x0056 # LATIN CAPITAL LETTER V
|
||||
0x57 0x0057 # LATIN CAPITAL LETTER W
|
||||
0x58 0x0058 # LATIN CAPITAL LETTER X
|
||||
0x59 0x0059 # LATIN CAPITAL LETTER Y
|
||||
0x5A 0x005A # LATIN CAPITAL LETTER Z
|
||||
0x5B <LR>+0x005B # LEFT SQUARE BRACKET, left-right
|
||||
0x5C <LR>+0x005C # REVERSE SOLIDUS, left-right
|
||||
0x5D <LR>+0x005D # RIGHT SQUARE BRACKET, left-right
|
||||
0x5E <LR>+0x005E # CIRCUMFLEX ACCENT, left-right
|
||||
0x5F <LR>+0x005F # LOW LINE, left-right
|
||||
0x60 0x0060 # GRAVE ACCENT
|
||||
0x61 0x0061 # LATIN SMALL LETTER A
|
||||
0x62 0x0062 # LATIN SMALL LETTER B
|
||||
0x63 0x0063 # LATIN SMALL LETTER C
|
||||
0x64 0x0064 # LATIN SMALL LETTER D
|
||||
0x65 0x0065 # LATIN SMALL LETTER E
|
||||
0x66 0x0066 # LATIN SMALL LETTER F
|
||||
0x67 0x0067 # LATIN SMALL LETTER G
|
||||
0x68 0x0068 # LATIN SMALL LETTER H
|
||||
0x69 0x0069 # LATIN SMALL LETTER I
|
||||
0x6A 0x006A # LATIN SMALL LETTER J
|
||||
0x6B 0x006B # LATIN SMALL LETTER K
|
||||
0x6C 0x006C # LATIN SMALL LETTER L
|
||||
0x6D 0x006D # LATIN SMALL LETTER M
|
||||
0x6E 0x006E # LATIN SMALL LETTER N
|
||||
0x6F 0x006F # LATIN SMALL LETTER O
|
||||
0x70 0x0070 # LATIN SMALL LETTER P
|
||||
0x71 0x0071 # LATIN SMALL LETTER Q
|
||||
0x72 0x0072 # LATIN SMALL LETTER R
|
||||
0x73 0x0073 # LATIN SMALL LETTER S
|
||||
0x74 0x0074 # LATIN SMALL LETTER T
|
||||
0x75 0x0075 # LATIN SMALL LETTER U
|
||||
0x76 0x0076 # LATIN SMALL LETTER V
|
||||
0x77 0x0077 # LATIN SMALL LETTER W
|
||||
0x78 0x0078 # LATIN SMALL LETTER X
|
||||
0x79 0x0079 # LATIN SMALL LETTER Y
|
||||
0x7A 0x007A # LATIN SMALL LETTER Z
|
||||
0x7B <LR>+0x007B # LEFT CURLY BRACKET, left-right
|
||||
0x7C <LR>+0x007C # VERTICAL LINE, left-right
|
||||
0x7D <LR>+0x007D # RIGHT CURLY BRACKET, left-right
|
||||
0x7E 0x007E # TILDE
|
||||
#
|
||||
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
|
||||
0x81 <RL>+0x00A0 # NO-BREAK SPACE, right-left
|
||||
0x82 0x00C7 # LATIN CAPITAL LETTER C WITH CEDILLA
|
||||
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
|
||||
0x84 0x00D1 # LATIN CAPITAL LETTER N WITH TILDE
|
||||
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
|
||||
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
|
||||
0x87 0x00E1 # LATIN SMALL LETTER A WITH ACUTE
|
||||
0x88 0x00E0 # LATIN SMALL LETTER A WITH GRAVE
|
||||
0x89 0x00E2 # LATIN SMALL LETTER A WITH CIRCUMFLEX
|
||||
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
|
||||
0x8B 0x06BA # ARABIC LETTER NOON GHUNNA
|
||||
0x8C <RL>+0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
|
||||
0x8D 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
|
||||
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
|
||||
0x8F 0x00E8 # LATIN SMALL LETTER E WITH GRAVE
|
||||
0x90 0x00EA # LATIN SMALL LETTER E WITH CIRCUMFLEX
|
||||
0x91 0x00EB # LATIN SMALL LETTER E WITH DIAERESIS
|
||||
0x92 0x00ED # LATIN SMALL LETTER I WITH ACUTE
|
||||
0x93 <RL>+0x2026 # HORIZONTAL ELLIPSIS, right-left
|
||||
0x94 0x00EE # LATIN SMALL LETTER I WITH CIRCUMFLEX
|
||||
0x95 0x00EF # LATIN SMALL LETTER I WITH DIAERESIS
|
||||
0x96 0x00F1 # LATIN SMALL LETTER N WITH TILDE
|
||||
0x97 0x00F3 # LATIN SMALL LETTER O WITH ACUTE
|
||||
0x98 <RL>+0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
|
||||
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
|
||||
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
|
||||
0x9B <RL>+0x00F7 # DIVISION SIGN, right-left
|
||||
0x9C 0x00FA # LATIN SMALL LETTER U WITH ACUTE
|
||||
0x9D 0x00F9 # LATIN SMALL LETTER U WITH GRAVE
|
||||
0x9E 0x00FB # LATIN SMALL LETTER U WITH CIRCUMFLEX
|
||||
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
|
||||
0xA0 <RL>+0x0020 # SPACE, right-left
|
||||
0xA1 <RL>+0x0021 # EXCLAMATION MARK, right-left
|
||||
0xA2 <RL>+0x0022 # QUOTATION MARK, right-left
|
||||
0xA3 <RL>+0x0023 # NUMBER SIGN, right-left
|
||||
0xA4 <RL>+0x0024 # DOLLAR SIGN, right-left
|
||||
0xA5 0x066A # ARABIC PERCENT SIGN
|
||||
0xA6 <RL>+0x0026 # AMPERSAND, right-left
|
||||
0xA7 <RL>+0x0027 # APOSTROPHE, right-left
|
||||
0xA8 <RL>+0x0028 # LEFT PARENTHESIS, right-left
|
||||
0xA9 <RL>+0x0029 # RIGHT PARENTHESIS, right-left
|
||||
0xAA <RL>+0x002A # ASTERISK, right-left
|
||||
0xAB <RL>+0x002B # PLUS SIGN, right-left
|
||||
0xAC 0x060C # ARABIC COMMA
|
||||
0xAD <RL>+0x002D # HYPHEN-MINUS, right-left
|
||||
0xAE <RL>+0x002E # FULL STOP, right-left
|
||||
0xAF <RL>+0x002F # SOLIDUS, right-left
|
||||
0xB0 <RL>+0x06F0 # EXTENDED ARABIC-INDIC DIGIT ZERO, right-left (need override)
|
||||
0xB1 <RL>+0x06F1 # EXTENDED ARABIC-INDIC DIGIT ONE, right-left (need override)
|
||||
0xB2 <RL>+0x06F2 # EXTENDED ARABIC-INDIC DIGIT TWO, right-left (need override)
|
||||
0xB3 <RL>+0x06F3 # EXTENDED ARABIC-INDIC DIGIT THREE, right-left (need override)
|
||||
0xB4 <RL>+0x06F4 # EXTENDED ARABIC-INDIC DIGIT FOUR, right-left (need override)
|
||||
0xB5 <RL>+0x06F5 # EXTENDED ARABIC-INDIC DIGIT FIVE, right-left (need override)
|
||||
0xB6 <RL>+0x06F6 # EXTENDED ARABIC-INDIC DIGIT SIX, right-left (need override)
|
||||
0xB7 <RL>+0x06F7 # EXTENDED ARABIC-INDIC DIGIT SEVEN, right-left (need override)
|
||||
0xB8 <RL>+0x06F8 # EXTENDED ARABIC-INDIC DIGIT EIGHT, right-left (need override)
|
||||
0xB9 <RL>+0x06F9 # EXTENDED ARABIC-INDIC DIGIT NINE, right-left (need override)
|
||||
0xBA <RL>+0x003A # COLON, right-left
|
||||
0xBB 0x061B # ARABIC SEMICOLON
|
||||
0xBC <RL>+0x003C # LESS-THAN SIGN, right-left
|
||||
0xBD <RL>+0x003D # EQUALS SIGN, right-left
|
||||
0xBE <RL>+0x003E # GREATER-THAN SIGN, right-left
|
||||
0xBF 0x061F # ARABIC QUESTION MARK
|
||||
0xC0 <RL>+0x274A # EIGHT TEARDROP-SPOKED PROPELLER ASTERISK, right-left
|
||||
0xC1 0x0621 # ARABIC LETTER HAMZA
|
||||
0xC2 0x0622 # ARABIC LETTER ALEF WITH MADDA ABOVE
|
||||
0xC3 0x0623 # ARABIC LETTER ALEF WITH HAMZA ABOVE
|
||||
0xC4 0x0624 # ARABIC LETTER WAW WITH HAMZA ABOVE
|
||||
0xC5 0x0625 # ARABIC LETTER ALEF WITH HAMZA BELOW
|
||||
0xC6 0x0626 # ARABIC LETTER YEH WITH HAMZA ABOVE
|
||||
0xC7 0x0627 # ARABIC LETTER ALEF
|
||||
0xC8 0x0628 # ARABIC LETTER BEH
|
||||
0xC9 0x0629 # ARABIC LETTER TEH MARBUTA
|
||||
0xCA 0x062A # ARABIC LETTER TEH
|
||||
0xCB 0x062B # ARABIC LETTER THEH
|
||||
0xCC 0x062C # ARABIC LETTER JEEM
|
||||
0xCD 0x062D # ARABIC LETTER HAH
|
||||
0xCE 0x062E # ARABIC LETTER KHAH
|
||||
0xCF 0x062F # ARABIC LETTER DAL
|
||||
0xD0 0x0630 # ARABIC LETTER THAL
|
||||
0xD1 0x0631 # ARABIC LETTER REH
|
||||
0xD2 0x0632 # ARABIC LETTER ZAIN
|
||||
0xD3 0x0633 # ARABIC LETTER SEEN
|
||||
0xD4 0x0634 # ARABIC LETTER SHEEN
|
||||
0xD5 0x0635 # ARABIC LETTER SAD
|
||||
0xD6 0x0636 # ARABIC LETTER DAD
|
||||
0xD7 0x0637 # ARABIC LETTER TAH
|
||||
0xD8 0x0638 # ARABIC LETTER ZAH
|
||||
0xD9 0x0639 # ARABIC LETTER AIN
|
||||
0xDA 0x063A # ARABIC LETTER GHAIN
|
||||
0xDB <RL>+0x005B # LEFT SQUARE BRACKET, right-left
|
||||
0xDC <RL>+0x005C # REVERSE SOLIDUS, right-left
|
||||
0xDD <RL>+0x005D # RIGHT SQUARE BRACKET, right-left
|
||||
0xDE <RL>+0x005E # CIRCUMFLEX ACCENT, right-left
|
||||
0xDF <RL>+0x005F # LOW LINE, right-left
|
||||
0xE0 0x0640 # ARABIC TATWEEL
|
||||
0xE1 0x0641 # ARABIC LETTER FEH
|
||||
0xE2 0x0642 # ARABIC LETTER QAF
|
||||
0xE3 0x0643 # ARABIC LETTER KAF
|
||||
0xE4 0x0644 # ARABIC LETTER LAM
|
||||
0xE5 0x0645 # ARABIC LETTER MEEM
|
||||
0xE6 0x0646 # ARABIC LETTER NOON
|
||||
0xE7 0x0647 # ARABIC LETTER HEH
|
||||
0xE8 0x0648 # ARABIC LETTER WAW
|
||||
0xE9 0x0649 # ARABIC LETTER ALEF MAKSURA
|
||||
0xEA 0x064A # ARABIC LETTER YEH
|
||||
0xEB 0x064B # ARABIC FATHATAN
|
||||
0xEC 0x064C # ARABIC DAMMATAN
|
||||
0xED 0x064D # ARABIC KASRATAN
|
||||
0xEE 0x064E # ARABIC FATHA
|
||||
0xEF 0x064F # ARABIC DAMMA
|
||||
0xF0 0x0650 # ARABIC KASRA
|
||||
0xF1 0x0651 # ARABIC SHADDA
|
||||
0xF2 0x0652 # ARABIC SUKUN
|
||||
0xF3 0x067E # ARABIC LETTER PEH
|
||||
0xF4 0x0679 # ARABIC LETTER TTEH
|
||||
0xF5 0x0686 # ARABIC LETTER TCHEH
|
||||
0xF6 0x06D5 # ARABIC LETTER AE
|
||||
0xF7 0x06A4 # ARABIC LETTER VEH
|
||||
0xF8 0x06AF # ARABIC LETTER GAF
|
||||
0xF9 0x0688 # ARABIC LETTER DDAL
|
||||
0xFA 0x0691 # ARABIC LETTER RREH
|
||||
0xFB <RL>+0x007B # LEFT CURLY BRACKET, right-left
|
||||
0xFC <RL>+0x007C # VERTICAL LINE, right-left
|
||||
0xFD <RL>+0x007D # RIGHT CURLY BRACKET, right-left
|
||||
0xFE 0x0698 # ARABIC LETTER JEH
|
||||
0xFF 0x06D2 # ARABIC LETTER YEH BARREE
|
337
charmap/GAELIC.TXT
Normal file
@ -0,0 +1,337 @@
|
||||
#=======================================================================
|
||||
# File name: GAELIC.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Gaelic
|
||||
# character set to Unicode 3.0 and later
|
||||
#
|
||||
# Contacts: charsets@apple.com, everson@evertype.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c01 2005-Apr-01 First posted version. Matches internal xml
|
||||
# <c1.1> and Text Encoding Converter 2.0.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Gaelic code (in hex as 0xNN)
|
||||
# Column #2 is the corresponding Unicode (in hex as 0xNNNN)
|
||||
# Column #3 is a comment containing the Unicode name
|
||||
#
|
||||
# The entries are in Mac OS Gaelic code order.
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Gaelic character set uses the standard control characters
|
||||
# at 0x00-0x1F and 0x7F.
|
||||
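# A reader for the format described above might look like the
# following. This is a minimal sketch under a few assumptions: columns
# are split on any whitespace (so it also works if tabs were converted
# to spaces), lines without two 0x-prefixed fields are skipped, and
# direction tags as used in other Apple tables are not handled. The
# function name is hypothetical.

```python
def parse_mapping(text: str) -> dict[int, int]:
    """Parse mapping lines into {Mac OS code: Unicode code point}."""
    table = {}
    for line in text.splitlines():
        # Everything from '#' to the end of the line is a comment.
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # comment-only, blank, or unrelated line
        mac, uni = fields
        if not (mac.startswith("0x") and uni.startswith("0x")):
            continue
        table[int(mac, 16)] = int(uni, 16)
    return table


sample = """\
# Column #1 is the Mac OS Gaelic code (in hex as 0xNN)
0x20\t0x0020\t# SPACE
0xB0\t0x1E02\t# LATIN CAPITAL LETTER B WITH DOT ABOVE
"""
table = parse_mapping(sample)
print({hex(k): hex(v) for k, v in table.items()})
```

# Control characters at 0x00-0x1F and 0x7F are absent from the
# parsed table, so a converter would add those identity mappings
# separately.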
#
|
||||
# Notes on Mac OS Gaelic (partly from Michael Everson):
|
||||
# -----------------------------------------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported via transcoding to and from
|
||||
# Unicode.
|
||||
#
|
||||
# This character set was developed by Michael Everson of Everson
|
||||
# Typography (everson@evertype.com) and was used for fonts in his
|
||||
# Celtic Utilities and CeltScript font packages for the Mac, as well
|
||||
# as some fonts included with the Irish localizations of Mac OS 6.0.8
|
||||
# and 7.1. Note that while Apple authorized this Irish localization,
|
||||
# it was not a system which shipped with Apple hardware, and was not
|
||||
# otherwise supported by Apple. Fonts conforming to the Mac OS Gaelic
|
||||
# character set are available from Everson Typography
|
||||
# (http://www.evertype.com/celtscript/). Information about the use of
|
||||
# this character set is available at
|
||||
# http://www.evertype.com/celtscript/celtcode.html.
|
||||
#
|
||||
# The Mac OS Gaelic encoding shares the script code smRoman (0) with
|
||||
# the standard Mac OS Roman encoding. To determine if the Gaelic
|
||||
# encoding is being used in Mac OS 7-9, you should also check if the
|
||||
# system region code is 81. Otherwise, you can check for particular
|
||||
# fonts that conform to this encoding (since in practice Gaelic fonts
|
||||
# are used with the ordinary US or UK system versions).
|
||||
#
|
||||
# This character set is a variant of standard Mac OS Roman, adding
|
||||
# capital and small y with acute, grave, and circumflex; capital and
|
||||
# small w with acute, grave, circumflex and diaeresis; capital and
|
||||
# small b, c, d, f, g, m, p, s, t with dot above; tironian et; small
|
||||
# long r, small long s, and small long s with dot above. It has 36
|
||||
# code point differences from standard Mac OS Roman.
|
||||
#
|
||||
# Before Mac OS 8.5, code point 0xDB was CURRENCY SIGN, and was
|
||||
# mapped to U+00A4. In Mac OS 8.5 and later versions, code point
|
||||
# 0xDB is changed to EURO SIGN and maps to U+20AC; the standard
|
||||
# Apple fonts are updated for Mac OS 8.5 to reflect this. There is
|
||||
# a "currency sign" variant of the Latin 8 Extended encoding that still
|
||||
# maps 0xDB to U+00A4; this can be used for older fonts.
|
||||
# Note: U+20AC is new with Unicode 2.1; for earlier Unicode
|
||||
# versions, Latin 8 Extended 0xDB may be mapped to private-use
|
||||
# character U+F8A0.
|
||||
#
|
||||
# Before Unicode 3.0, code point 0xE4 was PER MILLE SIGN, and was
|
||||
# mapped to U+2030. Since August 1998, code point 0xE4 is changed
|
||||
# to TIRONIAN SIGN ET and maps to U+204A. There is a "per mille
|
||||
# sign" variant of the Mac OS Gaelic encoding that still
|
||||
# maps 0xE4 to U+2030; this can be used for older fonts.
|
||||
# Note: U+204A is new with Unicode 3.0; for earlier Unicode
|
||||
# versions, Mac OS Gaelic was unified with AMPERSAND.
|
||||
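# The two variants described above each change a single code point,
# so they can be modeled as optional overlays on the standard
# mappings. In this sketch the variant names "currency-sign" and
# "per-mille-sign" and the function name are made up for
# illustration; only the code points come from the notes above.

```python
# Standard mappings for the two code points that have variants.
STANDARD = {
    0xDB: 0x20AC,  # EURO SIGN
    0xE4: 0x204A,  # TIRONIAN SIGN ET
}

# Hypothetical variant names mapped to the code points they change.
VARIANTS = {
    "currency-sign": {0xDB: 0x00A4},   # CURRENCY SIGN, for older fonts
    "per-mille-sign": {0xE4: 0x2030},  # PER MILLE SIGN, for older fonts
}


def select_table(base: dict, variant_names=()) -> dict:
    """Return a copy of base with the named variants applied."""
    table = dict(base)
    for name in variant_names:
        table.update(VARIANTS[name])
    return table


old_font = select_table(STANDARD, ["currency-sign", "per-mille-sign"])
print(hex(old_font[0xDB]), hex(old_font[0xE4]))
```

# The two variants are independent, so either one can be selected
# without the other.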
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
##################
|
||||
|
||||
0x20 0x0020 # SPACE
|
||||
0x21 0x0021 # EXCLAMATION MARK
|
||||
0x22 0x0022 # QUOTATION MARK
|
||||
0x23 0x0023 # NUMBER SIGN
|
||||
0x24 0x0024 # DOLLAR SIGN
|
||||
0x25 0x0025 # PERCENT SIGN
|
||||
0x26 0x0026 # AMPERSAND
|
||||
0x27 0x0027 # APOSTROPHE
|
||||
0x28 0x0028 # LEFT PARENTHESIS
|
||||
0x29 0x0029 # RIGHT PARENTHESIS
|
||||
0x2A 0x002A # ASTERISK
|
||||
0x2B 0x002B # PLUS SIGN
|
||||
0x2C 0x002C # COMMA
|
||||
0x2D 0x002D # HYPHEN-MINUS
|
||||
0x2E 0x002E # FULL STOP
|
||||
0x2F 0x002F # SOLIDUS
|
||||
0x30 0x0030 # DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO
|
||||
0x33 0x0033 # DIGIT THREE
|
||||
0x34 0x0034 # DIGIT FOUR
|
||||
0x35 0x0035 # DIGIT FIVE
|
||||
0x36 0x0036 # DIGIT SIX
|
||||
0x37 0x0037 # DIGIT SEVEN
|
||||
0x38 0x0038 # DIGIT EIGHT
|
||||
0x39 0x0039 # DIGIT NINE
|
||||
0x3A 0x003A # COLON
|
||||
0x3B 0x003B # SEMICOLON
|
||||
0x3C 0x003C # LESS-THAN SIGN
|
||||
0x3D 0x003D # EQUALS SIGN
|
||||
0x3E 0x003E # GREATER-THAN SIGN
|
||||
0x3F 0x003F # QUESTION MARK
|
||||
0x40 0x0040 # COMMERCIAL AT
|
||||
0x41 0x0041 # LATIN CAPITAL LETTER A
|
||||
0x42 0x0042 # LATIN CAPITAL LETTER B
|
||||
0x43 0x0043 # LATIN CAPITAL LETTER C
|
||||
0x44 0x0044 # LATIN CAPITAL LETTER D
|
||||
0x45 0x0045 # LATIN CAPITAL LETTER E
|
||||
0x46 0x0046 # LATIN CAPITAL LETTER F
|
||||
0x47 0x0047 # LATIN CAPITAL LETTER G
|
||||
0x48 0x0048 # LATIN CAPITAL LETTER H
|
||||
0x49 0x0049 # LATIN CAPITAL LETTER I
|
||||
0x4A 0x004A # LATIN CAPITAL LETTER J
|
||||
0x4B 0x004B # LATIN CAPITAL LETTER K
|
||||
0x4C 0x004C # LATIN CAPITAL LETTER L
|
||||
0x4D 0x004D # LATIN CAPITAL LETTER M
|
||||
0x4E 0x004E # LATIN CAPITAL LETTER N
|
||||
0x4F 0x004F # LATIN CAPITAL LETTER O
|
||||
0x50 0x0050 # LATIN CAPITAL LETTER P
|
||||
0x51 0x0051 # LATIN CAPITAL LETTER Q
|
||||
0x52 0x0052 # LATIN CAPITAL LETTER R
|
||||
0x53 0x0053 # LATIN CAPITAL LETTER S
|
||||
0x54 0x0054 # LATIN CAPITAL LETTER T
|
||||
0x55 0x0055 # LATIN CAPITAL LETTER U
|
||||
0x56 0x0056 # LATIN CAPITAL LETTER V
|
||||
0x57 0x0057 # LATIN CAPITAL LETTER W
|
||||
0x58 0x0058 # LATIN CAPITAL LETTER X
|
||||
0x59 0x0059 # LATIN CAPITAL LETTER Y
|
||||
0x5A 0x005A # LATIN CAPITAL LETTER Z
|
||||
0x5B 0x005B # LEFT SQUARE BRACKET
|
||||
0x5C 0x005C # REVERSE SOLIDUS
|
||||
0x5D 0x005D # RIGHT SQUARE BRACKET
|
||||
0x5E 0x005E # CIRCUMFLEX ACCENT
|
||||
0x5F 0x005F # LOW LINE
|
||||
0x60 0x0060 # GRAVE ACCENT
|
||||
0x61 0x0061 # LATIN SMALL LETTER A
|
||||
0x62 0x0062 # LATIN SMALL LETTER B
|
||||
0x63 0x0063 # LATIN SMALL LETTER C
|
||||
0x64 0x0064 # LATIN SMALL LETTER D
|
||||
0x65 0x0065 # LATIN SMALL LETTER E
|
||||
0x66 0x0066 # LATIN SMALL LETTER F
|
||||
0x67 0x0067 # LATIN SMALL LETTER G
|
||||
0x68 0x0068 # LATIN SMALL LETTER H
|
||||
0x69 0x0069 # LATIN SMALL LETTER I
|
||||
0x6A 0x006A # LATIN SMALL LETTER J
|
||||
0x6B 0x006B # LATIN SMALL LETTER K
|
||||
0x6C 0x006C # LATIN SMALL LETTER L
|
||||
0x6D 0x006D # LATIN SMALL LETTER M
|
||||
0x6E 0x006E # LATIN SMALL LETTER N
|
||||
0x6F 0x006F # LATIN SMALL LETTER O
|
||||
0x70 0x0070 # LATIN SMALL LETTER P
|
||||
0x71 0x0071 # LATIN SMALL LETTER Q
|
||||
0x72 0x0072 # LATIN SMALL LETTER R
|
||||
0x73 0x0073 # LATIN SMALL LETTER S
|
||||
0x74 0x0074 # LATIN SMALL LETTER T
|
||||
0x75 0x0075 # LATIN SMALL LETTER U
|
||||
0x76 0x0076 # LATIN SMALL LETTER V
|
||||
0x77 0x0077 # LATIN SMALL LETTER W
|
||||
0x78 0x0078 # LATIN SMALL LETTER X
|
||||
0x79 0x0079 # LATIN SMALL LETTER Y
|
||||
0x7A 0x007A # LATIN SMALL LETTER Z
|
||||
0x7B 0x007B # LEFT CURLY BRACKET
|
||||
0x7C 0x007C # VERTICAL LINE
|
||||
0x7D 0x007D # RIGHT CURLY BRACKET
|
||||
0x7E 0x007E # TILDE
|
||||
#
|
||||
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
|
||||
0x81 0x00C5 # LATIN CAPITAL LETTER A WITH RING ABOVE
|
||||
0x82 0x00C7 # LATIN CAPITAL LETTER C WITH CEDILLA
|
||||
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
|
||||
0x84 0x00D1 # LATIN CAPITAL LETTER N WITH TILDE
|
||||
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
|
||||
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
|
||||
0x87 0x00E1 # LATIN SMALL LETTER A WITH ACUTE
|
||||
0x88 0x00E0 # LATIN SMALL LETTER A WITH GRAVE
|
||||
0x89 0x00E2 # LATIN SMALL LETTER A WITH CIRCUMFLEX
|
||||
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
|
||||
0x8B 0x00E3 # LATIN SMALL LETTER A WITH TILDE
|
||||
0x8C 0x00E5 # LATIN SMALL LETTER A WITH RING ABOVE
|
||||
0x8D 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
|
||||
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
|
||||
0x8F 0x00E8 # LATIN SMALL LETTER E WITH GRAVE
|
||||
0x90 0x00EA # LATIN SMALL LETTER E WITH CIRCUMFLEX
|
||||
0x91 0x00EB # LATIN SMALL LETTER E WITH DIAERESIS
|
||||
0x92 0x00ED # LATIN SMALL LETTER I WITH ACUTE
|
||||
0x93 0x00EC # LATIN SMALL LETTER I WITH GRAVE
|
||||
0x94 0x00EE # LATIN SMALL LETTER I WITH CIRCUMFLEX
|
||||
0x95 0x00EF # LATIN SMALL LETTER I WITH DIAERESIS
|
||||
0x96 0x00F1 # LATIN SMALL LETTER N WITH TILDE
|
||||
0x97 0x00F3 # LATIN SMALL LETTER O WITH ACUTE
|
||||
0x98 0x00F2 # LATIN SMALL LETTER O WITH GRAVE
|
||||
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
|
||||
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
|
||||
0x9B 0x00F5 # LATIN SMALL LETTER O WITH TILDE
|
||||
0x9C 0x00FA # LATIN SMALL LETTER U WITH ACUTE
|
||||
0x9D 0x00F9 # LATIN SMALL LETTER U WITH GRAVE
|
||||
0x9E 0x00FB # LATIN SMALL LETTER U WITH CIRCUMFLEX
|
||||
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
|
||||
0xA0 0x2020 # DAGGER
|
||||
0xA1 0x00B0 # DEGREE SIGN
|
||||
0xA2 0x00A2 # CENT SIGN
|
||||
0xA3 0x00A3 # POUND SIGN
|
||||
0xA4 0x00A7 # SECTION SIGN
|
||||
0xA5 0x2022 # BULLET
|
||||
0xA6 0x00B6 # PILCROW SIGN
|
||||
0xA7 0x00DF # LATIN SMALL LETTER SHARP S
|
||||
0xA8 0x00AE # REGISTERED SIGN
|
||||
0xA9 0x00A9 # COPYRIGHT SIGN
|
||||
0xAA 0x2122 # TRADE MARK SIGN
|
||||
0xAB 0x00B4 # ACUTE ACCENT
|
||||
0xAC 0x00A8 # DIAERESIS
|
||||
0xAD 0x2260 # NOT EQUAL TO
|
||||
0xAE 0x00C6 # LATIN CAPITAL LETTER AE
|
||||
0xAF 0x00D8 # LATIN CAPITAL LETTER O WITH STROKE
|
||||
0xB0 0x1E02 # LATIN CAPITAL LETTER B WITH DOT ABOVE
|
||||
0xB1 0x00B1 # PLUS-MINUS SIGN
|
||||
0xB2 0x2264 # LESS-THAN OR EQUAL TO
|
||||
0xB3 0x2265 # GREATER-THAN OR EQUAL TO
|
||||
0xB4 0x1E03 # LATIN SMALL LETTER B WITH DOT ABOVE
|
||||
0xB5 0x010A # LATIN CAPITAL LETTER C WITH DOT ABOVE
|
||||
0xB6 0x010B # LATIN SMALL LETTER C WITH DOT ABOVE
|
||||
0xB7 0x1E0A # LATIN CAPITAL LETTER D WITH DOT ABOVE
|
||||
0xB8 0x1E0B # LATIN SMALL LETTER D WITH DOT ABOVE
|
||||
0xB9 0x1E1E # LATIN CAPITAL LETTER F WITH DOT ABOVE
|
||||
0xBA 0x1E1F # LATIN SMALL LETTER F WITH DOT ABOVE
|
||||
0xBB 0x0120 # LATIN CAPITAL LETTER G WITH DOT ABOVE
|
||||
0xBC 0x0121 # LATIN SMALL LETTER G WITH DOT ABOVE
|
||||
0xBD 0x1E40 # LATIN CAPITAL LETTER M WITH DOT ABOVE
|
||||
0xBE 0x00E6 # LATIN SMALL LETTER AE
|
||||
0xBF 0x00F8 # LATIN SMALL LETTER O WITH STROKE
|
||||
0xC0 0x1E41 # LATIN SMALL LETTER M WITH DOT ABOVE
|
||||
0xC1 0x1E56 # LATIN CAPITAL LETTER P WITH DOT ABOVE
|
||||
0xC2 0x1E57 # LATIN SMALL LETTER P WITH DOT ABOVE
|
||||
0xC3 0x027C # LATIN SMALL LETTER R WITH LONG LEG
|
||||
0xC4 0x0192 # LATIN SMALL LETTER F WITH HOOK
|
||||
0xC5 0x017F # LATIN SMALL LETTER LONG S
|
||||
0xC6 0x1E60 # LATIN CAPITAL LETTER S WITH DOT ABOVE
|
||||
0xC7 0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
|
||||
0xC8 0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
|
||||
0xC9 0x2026 # HORIZONTAL ELLIPSIS
|
||||
0xCA 0x00A0 # NO-BREAK SPACE
|
||||
0xCB 0x00C0 # LATIN CAPITAL LETTER A WITH GRAVE
|
||||
0xCC 0x00C3 # LATIN CAPITAL LETTER A WITH TILDE
|
||||
0xCD 0x00D5 # LATIN CAPITAL LETTER O WITH TILDE
|
||||
0xCE 0x0152 # LATIN CAPITAL LIGATURE OE
|
||||
0xCF 0x0153 # LATIN SMALL LIGATURE OE
|
||||
0xD0 0x2013 # EN DASH
|
||||
0xD1 0x2014 # EM DASH
|
||||
0xD2 0x201C # LEFT DOUBLE QUOTATION MARK
|
||||
0xD3 0x201D # RIGHT DOUBLE QUOTATION MARK
|
||||
0xD4 0x2018 # LEFT SINGLE QUOTATION MARK
|
||||
0xD5 0x2019 # RIGHT SINGLE QUOTATION MARK
|
||||
0xD6 0x1E61 # LATIN SMALL LETTER S WITH DOT ABOVE
|
||||
0xD7 0x1E9B # LATIN SMALL LETTER LONG S WITH DOT ABOVE
|
||||
0xD8 0x00FF # LATIN SMALL LETTER Y WITH DIAERESIS
|
||||
0xD9 0x0178 # LATIN CAPITAL LETTER Y WITH DIAERESIS
|
||||
0xDA 0x1E6A # LATIN CAPITAL LETTER T WITH DOT ABOVE
|
||||
0xDB 0x20AC # EURO SIGN # before Mac OS 8.5 this was U+00A4 CURRENCY SIGN
|
||||
0xDC 0x2039 # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
|
||||
0xDD 0x203A # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
|
||||
0xDE 0x0176 # LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
|
||||
0xDF 0x0177 # LATIN SMALL LETTER Y WITH CIRCUMFLEX
|
||||
0xE0 0x1E6B # LATIN SMALL LETTER T WITH DOT ABOVE
|
||||
0xE1 0x00B7 # MIDDLE DOT
|
||||
0xE2 0x1EF2 # LATIN CAPITAL LETTER Y WITH GRAVE
|
||||
0xE3 0x1EF3 # LATIN SMALL LETTER Y WITH GRAVE
|
||||
0xE4 0x204A # TIRONIAN SIGN ET # change from MacCeltic for Unicode 3.0; before Aug. 1998 this was U+2030 PER MILLE SIGN
|
||||
0xE5 0x00C2 # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
|
||||
0xE6 0x00CA # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
|
||||
0xE7 0x00C1 # LATIN CAPITAL LETTER A WITH ACUTE
|
||||
0xE8 0x00CB # LATIN CAPITAL LETTER E WITH DIAERESIS
|
||||
0xE9 0x00C8 # LATIN CAPITAL LETTER E WITH GRAVE
|
||||
0xEA 0x00CD # LATIN CAPITAL LETTER I WITH ACUTE
|
||||
0xEB 0x00CE # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
|
||||
0xEC 0x00CF # LATIN CAPITAL LETTER I WITH DIAERESIS
|
||||
0xED 0x00CC # LATIN CAPITAL LETTER I WITH GRAVE
|
||||
0xEE 0x00D3 # LATIN CAPITAL LETTER O WITH ACUTE
|
||||
0xEF 0x00D4 # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
|
||||
0xF0 0x2663 # BLACK CLUB SUIT = shamrock # future mapping U+2618 SHAMROCK
|
||||
0xF1 0x00D2 # LATIN CAPITAL LETTER O WITH GRAVE
|
||||
0xF2 0x00DA # LATIN CAPITAL LETTER U WITH ACUTE
|
||||
0xF3 0x00DB # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
|
||||
0xF4 0x00D9 # LATIN CAPITAL LETTER U WITH GRAVE
|
||||
0xF5 0x0131 # LATIN SMALL LETTER DOTLESS I
|
||||
0xF6 0x00DD # LATIN CAPITAL LETTER Y WITH ACUTE
|
||||
0xF7 0x00FD # LATIN SMALL LETTER Y WITH ACUTE
|
||||
0xF8 0x0174 # LATIN CAPITAL LETTER W WITH CIRCUMFLEX
|
||||
0xF9 0x0175 # LATIN SMALL LETTER W WITH CIRCUMFLEX
|
||||
0xFA 0x1E84 # LATIN CAPITAL LETTER W WITH DIAERESIS
|
||||
0xFB 0x1E85 # LATIN SMALL LETTER W WITH DIAERESIS
|
||||
0xFC 0x1E80 # LATIN CAPITAL LETTER W WITH GRAVE
|
||||
0xFD 0x1E81 # LATIN SMALL LETTER W WITH GRAVE
|
||||
0xFE 0x1E82 # LATIN CAPITAL LETTER W WITH ACUTE
|
||||
0xFF 0x1E83 # LATIN SMALL LETTER W WITH ACUTE
|
355
charmap/GREEK.TXT
Normal file
@ -0,0 +1,355 @@
|
||||
#=======================================================================
|
||||
# File name: GREEK.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Greek
|
||||
# character set to Unicode 2.1 and later.
|
||||
#
|
||||
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-05 Update header comments. Matches internal xml
|
||||
# <c1.1> and Text Encoding Converter 2.0.
|
||||
# b3,c1 2002-Dec-19 Update to match changes in Mac OS Greek
|
||||
# encoding for Mac OS 9.2.2 and later.
|
||||
# Update URLs, notes. Matches internal
|
||||
# utom<b3>.
|
||||
# b02 1999-Sep-22 Update contact e-mail address. Matches
|
||||
# internal utom<b1>, ufrm<b1>, and Text
|
||||
# Encoding Converter version 1.5.
|
||||
# n06 1998-Feb-05 Update to match internal utom<n4>, ufrm<n17>,
|
||||
# and Text Encoding Converter version 1.3:
|
||||
# Change mapping for 0xAF from U+0387 to its
|
||||
# canonical decomposition, U+00B7. Also
|
||||
# update header comments to new format.
|
||||
# n04 1995-Apr-15 First version (after fixing some typos).
|
||||
# Matches internal ufrm<n7>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Greek code (in hex as 0xNN)
|
||||
# Column #2 is the corresponding Unicode (in hex as 0xNNNN)
|
||||
# Column #3 is a comment containing the Unicode name
|
||||
#
|
||||
# The entries are in Mac OS Greek code order.
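The three-column format described above could be read with a parser along the following lines. This is a minimal sketch, not part of the Apple file: the function name `parse_mapping_line` is hypothetical, and it assumes that splitting on any whitespace (rather than strictly on tabs) is acceptable, which tolerates mirrors that collapse tabs into spaces.

```python
def parse_mapping_line(line):
    """Parse one row of a single-byte mapping table.

    Returns (mac_code, unicode_code, name), or None for blank lines
    and lines that hold only a comment.
    """
    # Everything after the first '#' is the comment column.
    data, _, comment = line.partition("#")
    fields = data.split()
    if len(fields) < 2:
        return None
    mac_code = int(fields[0], 16)      # column 1, e.g. "0x9C"
    unicode_code = int(fields[1], 16)  # column 2, e.g. "0x20AC"
    # Some rows carry an extra remark after a second '#'; keep only the name.
    name = comment.split("#")[0].strip()
    return mac_code, unicode_code, name
```

For example, the row for 0x9C yields the byte value, the code point U+20AC, and the name "EURO SIGN", with the trailing remark about Mac OS 9.2.2 dropped.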
|
||||
#
|
||||
# One of these mappings requires the use of a corporate character.
|
||||
# See the file "CORPCHAR.TXT" and notes below.
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Greek character set uses the standard control characters at
|
||||
# 0x00-0x1F and 0x7F.
|
||||
#
|
||||
# Notes on Mac OS Greek:
|
||||
# ----------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported via transcoding to and from
|
||||
# Unicode.
|
||||
#
|
||||
# Although a Mac OS script code is defined for Greek (smGreek = 6),
|
||||
# the Greek localized system does not currently use it (the font
|
||||
# family IDs are in the Mac OS Roman range). To determine if the
|
||||
# Greek encoding is being used when the script code is smRoman (0),
|
||||
# you must check if the system region code is 20, verGreece.
|
||||
#
|
||||
# The Mac OS Greek encoding is a superset of the repertoire of
|
||||
# ISO 8859-7 (although characters are not at the same code points),
|
||||
# except that LEFT & RIGHT SINGLE QUOTATION MARK replace the
|
||||
# MODIFIER LETTER REVERSED COMMA & APOSTROPHE (spacing versions of
|
||||
# Greek rough & smooth breathing marks) that are in ISO 8859-7.
|
||||
# The added characters in Mac OS Greek include more punctuation and
|
||||
# symbols and several accented Latin letters.
|
||||
#
|
||||
# Before Mac OS 9.2.2, code point 0x9C was SOFT HYPHEN (U+00AD), and
|
||||
# code point 0xFF was undefined. In Mac OS 9.2.2 and later versions,
|
||||
# SOFT HYPHEN was moved to 0xFF, and code point 0x9C was changed to be
|
||||
# EURO SIGN (U+20AC); the standard Apple fonts are updated for Mac OS
|
||||
# 9.2.2 to reflect this. There is a "no Euro sign" variant of the Mac
|
||||
# OS Greek encoding that uses the older mapping; this can be used for
|
||||
# older fonts.
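The "no Euro sign" variant described above differs from the standard table at only two code points, so it can be derived from the standard mapping. The sketch below assumes the table has already been loaded into a dictionary from Mac OS Greek byte values to Unicode code points; the function name `greek_no_euro_variant` and the dictionary representation are illustrative assumptions, not something defined by this file.

```python
def greek_no_euro_variant(table):
    """Return a copy of a {mac_byte: code_point} table in the pre-9.2.2 layout."""
    variant = dict(table)    # leave the caller's table untouched
    variant[0x9C] = 0x00AD   # 0x9C was SOFT HYPHEN before it became EURO SIGN
    variant.pop(0xFF, None)  # 0xFF was undefined before Mac OS 9.2.2
    return variant
```

Deriving the variant this way keeps a single source table and makes the two differences explicit.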
|
||||
#
|
||||
# This "no Euro sign" variant of Mac OS Greek was the character set
|
||||
# used by Mac OS Greek systems before 9.2.2 except for system 6.0.7,
|
||||
# which used a variant character set but was quickly replaced with
|
||||
# Greek system 6.0.7.1 using the "no Euro sign" character set
|
||||
# documented here. Greek system 4.1 used a variant Greek set that had
|
||||
# ISO 8859-7 in 0xA0-0xFF (with some holes filled in with DTP
|
||||
# characters), and Mac OS Roman accented Roman letters in 0x80-0x9F.
|
||||
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
# Changes from version b02 to version b03/c01:
|
||||
#
|
||||
# - The Mac OS Greek encoding changed for Mac OS 9.2.2 and later
|
||||
# as follows:
|
||||
# 0x9C, changed from 0x00AD SOFT HYPHEN to 0x20AC EURO SIGN
|
||||
# 0xFF, changed from undefined to 0x00AD SOFT HYPHEN
|
||||
#
|
||||
# Changes from version n04 to version n06:
|
||||
#
|
||||
# - Change mapping of 0xAF from U+0387 to its canonical
|
||||
# decomposition, U+00B7.
|
||||
#
|
||||
##################
|
||||
|
||||
0x20 0x0020 # SPACE
|
||||
0x21 0x0021 # EXCLAMATION MARK
|
||||
0x22 0x0022 # QUOTATION MARK
|
||||
0x23 0x0023 # NUMBER SIGN
|
||||
0x24 0x0024 # DOLLAR SIGN
|
||||
0x25 0x0025 # PERCENT SIGN
|
||||
0x26 0x0026 # AMPERSAND
|
||||
0x27 0x0027 # APOSTROPHE
|
||||
0x28 0x0028 # LEFT PARENTHESIS
|
||||
0x29 0x0029 # RIGHT PARENTHESIS
|
||||
0x2A 0x002A # ASTERISK
|
||||
0x2B 0x002B # PLUS SIGN
|
||||
0x2C 0x002C # COMMA
|
||||
0x2D 0x002D # HYPHEN-MINUS
|
||||
0x2E 0x002E # FULL STOP
|
||||
0x2F 0x002F # SOLIDUS
|
||||
0x30 0x0030 # DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO
|
||||
0x33 0x0033 # DIGIT THREE
|
||||
0x34 0x0034 # DIGIT FOUR
|
||||
0x35 0x0035 # DIGIT FIVE
|
||||
0x36 0x0036 # DIGIT SIX
|
||||
0x37 0x0037 # DIGIT SEVEN
|
||||
0x38 0x0038 # DIGIT EIGHT
|
||||
0x39 0x0039 # DIGIT NINE
|
||||
0x3A 0x003A # COLON
|
||||
0x3B 0x003B # SEMICOLON
|
||||
0x3C 0x003C # LESS-THAN SIGN
|
||||
0x3D 0x003D # EQUALS SIGN
|
||||
0x3E 0x003E # GREATER-THAN SIGN
|
||||
0x3F 0x003F # QUESTION MARK
|
||||
0x40 0x0040 # COMMERCIAL AT
|
||||
0x41 0x0041 # LATIN CAPITAL LETTER A
|
||||
0x42 0x0042 # LATIN CAPITAL LETTER B
|
||||
0x43 0x0043 # LATIN CAPITAL LETTER C
|
||||
0x44 0x0044 # LATIN CAPITAL LETTER D
|
||||
0x45 0x0045 # LATIN CAPITAL LETTER E
|
||||
0x46 0x0046 # LATIN CAPITAL LETTER F
|
||||
0x47 0x0047 # LATIN CAPITAL LETTER G
|
||||
0x48 0x0048 # LATIN CAPITAL LETTER H
|
||||
0x49 0x0049 # LATIN CAPITAL LETTER I
|
||||
0x4A 0x004A # LATIN CAPITAL LETTER J
|
||||
0x4B 0x004B # LATIN CAPITAL LETTER K
|
||||
0x4C 0x004C # LATIN CAPITAL LETTER L
|
||||
0x4D 0x004D # LATIN CAPITAL LETTER M
|
||||
0x4E 0x004E # LATIN CAPITAL LETTER N
|
||||
0x4F 0x004F # LATIN CAPITAL LETTER O
|
||||
0x50 0x0050 # LATIN CAPITAL LETTER P
|
||||
0x51 0x0051 # LATIN CAPITAL LETTER Q
|
||||
0x52 0x0052 # LATIN CAPITAL LETTER R
|
||||
0x53 0x0053 # LATIN CAPITAL LETTER S
|
||||
0x54 0x0054 # LATIN CAPITAL LETTER T
|
||||
0x55 0x0055 # LATIN CAPITAL LETTER U
|
||||
0x56 0x0056 # LATIN CAPITAL LETTER V
|
||||
0x57 0x0057 # LATIN CAPITAL LETTER W
|
||||
0x58 0x0058 # LATIN CAPITAL LETTER X
|
||||
0x59 0x0059 # LATIN CAPITAL LETTER Y
|
||||
0x5A 0x005A # LATIN CAPITAL LETTER Z
|
||||
0x5B 0x005B # LEFT SQUARE BRACKET
|
||||
0x5C 0x005C # REVERSE SOLIDUS
|
||||
0x5D 0x005D # RIGHT SQUARE BRACKET
|
||||
0x5E 0x005E # CIRCUMFLEX ACCENT
|
||||
0x5F 0x005F # LOW LINE
|
||||
0x60 0x0060 # GRAVE ACCENT
|
||||
0x61 0x0061 # LATIN SMALL LETTER A
|
||||
0x62 0x0062 # LATIN SMALL LETTER B
|
||||
0x63 0x0063 # LATIN SMALL LETTER C
|
||||
0x64 0x0064 # LATIN SMALL LETTER D
|
||||
0x65 0x0065 # LATIN SMALL LETTER E
|
||||
0x66 0x0066 # LATIN SMALL LETTER F
|
||||
0x67 0x0067 # LATIN SMALL LETTER G
|
||||
0x68 0x0068 # LATIN SMALL LETTER H
|
||||
0x69 0x0069 # LATIN SMALL LETTER I
|
||||
0x6A 0x006A # LATIN SMALL LETTER J
|
||||
0x6B 0x006B # LATIN SMALL LETTER K
|
||||
0x6C 0x006C # LATIN SMALL LETTER L
|
||||
0x6D 0x006D # LATIN SMALL LETTER M
|
||||
0x6E 0x006E # LATIN SMALL LETTER N
|
||||
0x6F 0x006F # LATIN SMALL LETTER O
|
||||
0x70 0x0070 # LATIN SMALL LETTER P
|
||||
0x71 0x0071 # LATIN SMALL LETTER Q
|
||||
0x72 0x0072 # LATIN SMALL LETTER R
|
||||
0x73 0x0073 # LATIN SMALL LETTER S
|
||||
0x74 0x0074 # LATIN SMALL LETTER T
|
||||
0x75 0x0075 # LATIN SMALL LETTER U
|
||||
0x76 0x0076 # LATIN SMALL LETTER V
|
||||
0x77 0x0077 # LATIN SMALL LETTER W
|
||||
0x78 0x0078 # LATIN SMALL LETTER X
|
||||
0x79 0x0079 # LATIN SMALL LETTER Y
|
||||
0x7A 0x007A # LATIN SMALL LETTER Z
|
||||
0x7B 0x007B # LEFT CURLY BRACKET
|
||||
0x7C 0x007C # VERTICAL LINE
|
||||
0x7D 0x007D # RIGHT CURLY BRACKET
|
||||
0x7E 0x007E # TILDE
|
||||
#
|
||||
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
|
||||
0x81 0x00B9 # SUPERSCRIPT ONE
|
||||
0x82 0x00B2 # SUPERSCRIPT TWO
|
||||
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
|
||||
0x84 0x00B3 # SUPERSCRIPT THREE
|
||||
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
|
||||
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
|
||||
0x87 0x0385 # GREEK DIALYTIKA TONOS
|
||||
0x88 0x00E0 # LATIN SMALL LETTER A WITH GRAVE
|
||||
0x89 0x00E2 # LATIN SMALL LETTER A WITH CIRCUMFLEX
|
||||
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
|
||||
0x8B 0x0384 # GREEK TONOS
|
||||
0x8C 0x00A8 # DIAERESIS
|
||||
0x8D 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
|
||||
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
|
||||
0x8F 0x00E8 # LATIN SMALL LETTER E WITH GRAVE
|
||||
0x90 0x00EA # LATIN SMALL LETTER E WITH CIRCUMFLEX
|
||||
0x91 0x00EB # LATIN SMALL LETTER E WITH DIAERESIS
|
||||
0x92 0x00A3 # POUND SIGN
|
||||
0x93 0x2122 # TRADE MARK SIGN
|
||||
0x94 0x00EE # LATIN SMALL LETTER I WITH CIRCUMFLEX
|
||||
0x95 0x00EF # LATIN SMALL LETTER I WITH DIAERESIS
|
||||
0x96 0x2022 # BULLET
|
||||
0x97 0x00BD # VULGAR FRACTION ONE HALF
|
||||
0x98 0x2030 # PER MILLE SIGN
|
||||
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
|
||||
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
|
||||
0x9B 0x00A6 # BROKEN BAR
|
||||
0x9C 0x20AC # EURO SIGN # before Mac OS 9.2.2, was SOFT HYPHEN
|
||||
0x9D 0x00F9 # LATIN SMALL LETTER U WITH GRAVE
|
||||
0x9E 0x00FB # LATIN SMALL LETTER U WITH CIRCUMFLEX
|
||||
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
|
||||
0xA0 0x2020 # DAGGER
|
||||
0xA1 0x0393 # GREEK CAPITAL LETTER GAMMA
|
||||
0xA2 0x0394 # GREEK CAPITAL LETTER DELTA
|
||||
0xA3 0x0398 # GREEK CAPITAL LETTER THETA
|
||||
0xA4 0x039B # GREEK CAPITAL LETTER LAMDA
|
||||
0xA5 0x039E # GREEK CAPITAL LETTER XI
|
||||
0xA6 0x03A0 # GREEK CAPITAL LETTER PI
|
||||
0xA7 0x00DF # LATIN SMALL LETTER SHARP S
|
||||
0xA8 0x00AE # REGISTERED SIGN
|
||||
0xA9 0x00A9 # COPYRIGHT SIGN
|
||||
0xAA 0x03A3 # GREEK CAPITAL LETTER SIGMA
|
||||
0xAB 0x03AA # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
|
||||
0xAC 0x00A7 # SECTION SIGN
|
||||
0xAD 0x2260 # NOT EQUAL TO
|
||||
0xAE 0x00B0 # DEGREE SIGN
|
||||
0xAF 0x00B7 # MIDDLE DOT
|
||||
0xB0 0x0391 # GREEK CAPITAL LETTER ALPHA
|
||||
0xB1 0x00B1 # PLUS-MINUS SIGN
|
||||
0xB2 0x2264 # LESS-THAN OR EQUAL TO
|
||||
0xB3 0x2265 # GREATER-THAN OR EQUAL TO
|
||||
0xB4 0x00A5 # YEN SIGN
|
||||
0xB5 0x0392 # GREEK CAPITAL LETTER BETA
|
||||
0xB6 0x0395 # GREEK CAPITAL LETTER EPSILON
|
||||
0xB7 0x0396 # GREEK CAPITAL LETTER ZETA
|
||||
0xB8 0x0397 # GREEK CAPITAL LETTER ETA
|
||||
0xB9 0x0399 # GREEK CAPITAL LETTER IOTA
|
||||
0xBA 0x039A # GREEK CAPITAL LETTER KAPPA
|
||||
0xBB 0x039C # GREEK CAPITAL LETTER MU
|
||||
0xBC 0x03A6 # GREEK CAPITAL LETTER PHI
|
||||
0xBD 0x03AB # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
|
||||
0xBE 0x03A8 # GREEK CAPITAL LETTER PSI
|
||||
0xBF 0x03A9 # GREEK CAPITAL LETTER OMEGA
|
||||
0xC0 0x03AC # GREEK SMALL LETTER ALPHA WITH TONOS
|
||||
0xC1 0x039D # GREEK CAPITAL LETTER NU
|
||||
0xC2 0x00AC # NOT SIGN
|
||||
0xC3 0x039F # GREEK CAPITAL LETTER OMICRON
|
||||
0xC4 0x03A1 # GREEK CAPITAL LETTER RHO
|
||||
0xC5 0x2248 # ALMOST EQUAL TO
|
||||
0xC6 0x03A4 # GREEK CAPITAL LETTER TAU
|
||||
0xC7 0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
|
||||
0xC8 0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
|
||||
0xC9 0x2026 # HORIZONTAL ELLIPSIS
|
||||
0xCA 0x00A0 # NO-BREAK SPACE
|
||||
0xCB 0x03A5 # GREEK CAPITAL LETTER UPSILON
|
||||
0xCC 0x03A7 # GREEK CAPITAL LETTER CHI
|
||||
0xCD 0x0386 # GREEK CAPITAL LETTER ALPHA WITH TONOS
|
||||
0xCE 0x0388 # GREEK CAPITAL LETTER EPSILON WITH TONOS
|
||||
0xCF 0x0153 # LATIN SMALL LIGATURE OE
|
||||
0xD0 0x2013 # EN DASH
|
||||
0xD1 0x2015 # HORIZONTAL BAR
|
||||
0xD2 0x201C # LEFT DOUBLE QUOTATION MARK
|
||||
0xD3 0x201D # RIGHT DOUBLE QUOTATION MARK
|
||||
0xD4 0x2018 # LEFT SINGLE QUOTATION MARK
|
||||
0xD5 0x2019 # RIGHT SINGLE QUOTATION MARK
|
||||
0xD6 0x00F7 # DIVISION SIGN
|
||||
0xD7 0x0389 # GREEK CAPITAL LETTER ETA WITH TONOS
|
||||
0xD8 0x038A # GREEK CAPITAL LETTER IOTA WITH TONOS
|
||||
0xD9 0x038C # GREEK CAPITAL LETTER OMICRON WITH TONOS
|
||||
0xDA 0x038E # GREEK CAPITAL LETTER UPSILON WITH TONOS
|
||||
0xDB 0x03AD # GREEK SMALL LETTER EPSILON WITH TONOS
|
||||
0xDC 0x03AE # GREEK SMALL LETTER ETA WITH TONOS
|
||||
0xDD 0x03AF # GREEK SMALL LETTER IOTA WITH TONOS
|
||||
0xDE 0x03CC # GREEK SMALL LETTER OMICRON WITH TONOS
|
||||
0xDF 0x038F # GREEK CAPITAL LETTER OMEGA WITH TONOS
|
||||
0xE0 0x03CD # GREEK SMALL LETTER UPSILON WITH TONOS
|
||||
0xE1 0x03B1 # GREEK SMALL LETTER ALPHA
|
||||
0xE2 0x03B2 # GREEK SMALL LETTER BETA
|
||||
0xE3 0x03C8 # GREEK SMALL LETTER PSI
|
||||
0xE4 0x03B4 # GREEK SMALL LETTER DELTA
|
||||
0xE5 0x03B5 # GREEK SMALL LETTER EPSILON
|
||||
0xE6 0x03C6 # GREEK SMALL LETTER PHI
|
||||
0xE7 0x03B3 # GREEK SMALL LETTER GAMMA
|
||||
0xE8 0x03B7 # GREEK SMALL LETTER ETA
|
||||
0xE9 0x03B9 # GREEK SMALL LETTER IOTA
|
||||
0xEA 0x03BE # GREEK SMALL LETTER XI
|
||||
0xEB 0x03BA # GREEK SMALL LETTER KAPPA
|
||||
0xEC 0x03BB # GREEK SMALL LETTER LAMDA
|
||||
0xED 0x03BC # GREEK SMALL LETTER MU
|
||||
0xEE 0x03BD # GREEK SMALL LETTER NU
|
||||
0xEF 0x03BF # GREEK SMALL LETTER OMICRON
|
||||
0xF0 0x03C0 # GREEK SMALL LETTER PI
|
||||
0xF1 0x03CE # GREEK SMALL LETTER OMEGA WITH TONOS
|
||||
0xF2 0x03C1 # GREEK SMALL LETTER RHO
|
||||
0xF3 0x03C3 # GREEK SMALL LETTER SIGMA
|
||||
0xF4 0x03C4 # GREEK SMALL LETTER TAU
|
||||
0xF5 0x03B8 # GREEK SMALL LETTER THETA
|
||||
0xF6 0x03C9 # GREEK SMALL LETTER OMEGA
|
||||
0xF7 0x03C2 # GREEK SMALL LETTER FINAL SIGMA
|
||||
0xF8 0x03C7 # GREEK SMALL LETTER CHI
|
||||
0xF9 0x03C5 # GREEK SMALL LETTER UPSILON
|
||||
0xFA 0x03B6 # GREEK SMALL LETTER ZETA
|
||||
0xFB 0x03CA # GREEK SMALL LETTER IOTA WITH DIALYTIKA
|
||||
0xFC 0x03CB # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
|
||||
0xFD 0x0390 # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
|
||||
0xFE 0x03B0 # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
|
||||
0xFF 0x00AD # SOFT HYPHEN # before Mac OS 9.2.2, was undefined
|
383
charmap/GUJARATI.TXT
Normal file
@ -0,0 +1,383 @@
|
||||
#=======================================================================
|
||||
# File name: GUJARATI.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Gujarati
|
||||
# encoding to Unicode 2.1 and later.
|
||||
#
|
||||
# Copyright: (c) 1997-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-05 Update header comments. Matches internal xml
|
||||
# <c1.1> and Text Encoding Converter 2.0.
|
||||
# b3,c1 2002-Dec-19 Update URLs. Matches internal utom<b1>.
|
||||
# b02 1999-Sep-22 Update contact e-mail address. Matches
|
||||
# internal utom<b1>, ufrm<b1>, and Text
|
||||
# Encoding Converter version 1.5.
|
||||
# n02 1998-Feb-05 First version; matches internal utom<n4>,
|
||||
# ufrm<n5>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Gujarati code or code sequence
|
||||
# (in hex as 0xNN or 0xNN+0xNN)
|
||||
# Column #2 is the corresponding Unicode or Unicode sequence
|
||||
# (in hex as 0xNNNN or 0xNNNN+0xNNNN).
|
||||
# Column #3 is a comment containing the Unicode name or sequence
|
||||
# of names. In some cases an additional comment follows the
|
||||
# Unicode name(s).
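Because columns 1 and 2 may hold sequences joined with '+', a reader for this table has to return tuples rather than single integers. The following is a minimal sketch under that assumption; the names `parse_sequence` and `parse_gujarati_line` are hypothetical, and splitting on any whitespace instead of strictly on tabs is a convenience choice.

```python
def parse_sequence(field):
    """Turn '0xNN' or '0xNN+0xNN' into a tuple of integers."""
    return tuple(int(part, 16) for part in field.split("+"))


def parse_gujarati_line(line):
    """Return (mac_sequence, unicode_sequence), or None if the line has no mapping."""
    data, _, _ = line.partition("#")  # drop the comment column
    fields = data.split()
    if len(fields) < 2:
        return None
    return parse_sequence(fields[0]), parse_sequence(fields[1])
```

Single code points come back as one-element tuples, so both sections of the table can be handled by the same code.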
|
||||
#
|
||||
# The entries are in two sections. The first section is for pairs of
|
||||
# Mac OS Gujarati code points that must be mapped in a special way.
|
||||
# The second section maps individual code points.
|
||||
#
|
||||
# Within each section, the entries are in Mac OS Gujarati code order.
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Gujarati character set uses the standard control characters
|
||||
# at 0x00-0x1F and 0x7F.
|
||||
#
|
||||
# Notes on Mac OS Gujarati:
|
||||
# -------------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported via transcoding to and from
|
||||
# Unicode.
|
||||
#
|
||||
# Mac OS Gujarati is based on IS 13194:1991 (ISCII-91), with the
|
||||
# addition of several punctuation and symbol characters. However,
|
||||
# Mac OS Gujarati does not support the ATR (attribute) mechanism of
|
||||
# ISCII-91.
|
||||
#
|
||||
# 1. ISCII-91 features in Mac OS Gujarati include:
|
||||
#
|
||||
# a) Overloading of nukta
|
||||
#
|
||||
# In addition to using the nukta (0xE9) like a combining dot below,
|
||||
# nukta is overloaded to function as a general character modifier.
|
||||
# In this role, certain code points followed by 0xE9 are treated as
|
||||
# a two-byte code point representing a character which may be
|
||||
# rather different than the characters represented by either of
|
||||
# the code points alone. For example, the character GUJARATI OM
|
||||
# (U+0AD0) is represented in ISCII-91 as candrabindu + nukta.
|
||||
#
|
||||
# b) Explicit halant and soft halant
|
||||
#
|
||||
# A double halant (0xE8 + 0xE8) constitutes an "explicit halant",
|
||||
# which will always appear as a halant instead of causing formation
|
||||
# of a ligature or half-form consonant.
|
||||
#
|
||||
# Halant followed by nukta (0xE8 + 0xE9) constitutes a "soft
|
||||
# halant", which prevents formation of a ligature and instead
|
||||
# retains the half-form of the first consonant.
|
||||
#
|
||||
# c) Invisible consonant
|
||||
#
|
||||
# The byte 0xD9 (called INV in ISCII-91) is an invisible consonant:
|
||||
# It behaves like a consonant but has no visible appearance. It is
|
||||
# intended to be used (often in combination with halant) to display
|
||||
# dependent forms in isolation, such as the RA forms or consonant
|
||||
# half-forms.
|
||||
#
|
||||
# d) Extensions for Vedic, etc.
|
||||
#
|
||||
# The byte 0xF0 (called EXT in ISCII-91) followed by any byte in
|
||||
# the range 0xA1-0xEE constitutes a two-byte code point which can
|
||||
# be used to represent additional characters for Vedic (or other
|
||||
# extensions); 0xF0 followed by any other byte value constitutes
|
||||
# malformed text. Mac OS Gujarati supports this mechanism, but
|
||||
# does not currently map any of these two-byte code points to
|
||||
# anything.
|
||||
#
|
||||
# 2. Mac OS Gujarati additions
|
||||
#
|
||||
# Mac OS Gujarati adds characters using the code points
|
||||
# 0x80-0x8A and 0x90.
|
||||
#
|
||||
# 3. Unused code points
|
||||
#
|
||||
# The following code points are currently unused, and are not shown
|
||||
# here: 0x8B-0x8F, 0x91-0xA0, 0xAB, 0xAF, 0xC7, 0xCE, 0xD0, 0xD3,
|
||||
# 0xE0, 0xE4, 0xEB-0xEF, 0xFB-0xFF. In addition, 0xF0 is not shown
|
||||
# here, but it has a special function as described above.
|
||||
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# 1. Mapping the byte pairs
|
||||
#
|
||||
# If one of the following byte values is encountered when mapping
|
||||
# Mac OS Gujarati text - 0xA1, 0xAA, 0xDF, or 0xE8 - then the next
|
||||
# byte (if there is one) should be examined. If the next byte is
|
||||
# 0xE9 - or also 0xE8, if the first byte was 0xE8 - then the byte
|
||||
# pair should be mapped using the first section of the mapping
|
||||
# table below. Otherwise, each byte should be mapped using the
|
||||
# second section of the mapping table below.
|
||||
#
|
||||
# - The Unicode Standard, Version 2.0, specifies how explicit
|
||||
# halant and soft halant should be represented in Unicode;
|
||||
# these mappings are used below.
|
||||
#
|
||||
# If the byte value 0xF0 is encountered when mapping Mac OS
|
||||
# Gujarati text, then the next byte should be examined. If there
|
||||
# is no next byte (e.g. 0xF0 at end of buffer), the mapping
|
||||
# process should indicate incomplete character. If there is a next
|
||||
# byte but it is not in the range 0xA1-0xEE, the mapping process
|
||||
# should indicate malformed text. Otherwise, the mapping process
|
||||
# should treat the byte pair as a valid two-byte code point with no
|
||||
# mapping (e.g. map it to QUESTION MARK, REPLACEMENT CHARACTER,
|
||||
# etc.).
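The byte-pair and 0xF0 rules above can be sketched as a small decoding loop. This is an illustration under stated assumptions, not Apple's implementation: the function and exception names are hypothetical, the single-byte table is passed in as a dictionary from byte value to string, control characters are passed through unchanged, and both unmapped single bytes and valid 0xF0 pairs are mapped to U+FFFD REPLACEMENT CHARACTER (one of the options the note allows). The pair table is copied from Section 1 of the mapping table.

```python
PAIR_LEADS = {0xA1, 0xAA, 0xDF, 0xE8}
PAIRS = {  # Section 1 of the mapping table
    (0xA1, 0xE9): "\u0AD0",        # GUJARATI OM
    (0xAA, 0xE9): "\u0AE0",        # GUJARATI LETTER VOCALIC RR
    (0xDF, 0xE9): "\u0AC4",        # GUJARATI VOWEL SIGN VOCALIC RR
    (0xE8, 0xE8): "\u0ACD\u200C",  # explicit halant: VIRAMA + ZWNJ
    (0xE8, 0xE9): "\u0ACD\u200D",  # soft halant: VIRAMA + ZWJ
}
REPLACEMENT = "\uFFFD"  # assumed fallback for code points with no mapping


class IncompleteCharacter(ValueError):
    """0xF0 appeared as the last byte of the buffer."""


class MalformedText(ValueError):
    """0xF0 was followed by a byte outside 0xA1-0xEE."""


def decode_gujarati(data, singles):
    """Decode Mac OS Gujarati bytes using a {byte: str} table for Section 2."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b == 0xF0:
            if nxt is None:
                raise IncompleteCharacter("0xF0 at end of buffer")
            if not 0xA1 <= nxt <= 0xEE:
                raise MalformedText("invalid byte after 0xF0: 0x%02X" % nxt)
            out.append(REPLACEMENT)  # valid two-byte code point, no mapping
            i += 2
        elif b in PAIR_LEADS and (b, nxt) in PAIRS:
            out.append(PAIRS[(b, nxt)])
            i += 2
        elif b < 0x20 or b == 0x7F:
            out.append(chr(b))  # standard control characters
            i += 1
        else:
            out.append(singles.get(b, REPLACEMENT))
            i += 1
    return "".join(out)
```

Raising exceptions for the incomplete and malformed cases is only one possible design; a streaming converter would more likely report the incomplete case to the caller so that the next buffer can supply the missing byte.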
|
||||
#
|
||||
# 2. Mapping the invisible consonant
|
||||
#
|
||||
# It has been suggested that INV in ISCII-91 should map to ZERO
|
||||
# WIDTH NON-JOINER in Unicode. However, this causes problems with
|
||||
# roundtrip fidelity: The ISCII-91 sequences 0xE8+0xE8 and 0xE8+0xD9
|
||||
# would map to the same sequence of Unicode characters. We have
|
||||
# instead mapped INV to LEFT-TO-RIGHT MARK, which avoids these
|
||||
# problems.
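The roundtrip problem described above can be shown in a few lines. The sketch assumes that a lone 0xE8 maps to GUJARATI SIGN VIRAMA (U+0ACD), which is consistent with the Section 1 pair mappings; the function name `map_halant_sequences` is hypothetical.

```python
VIRAMA = "\u0ACD"
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
LRM = "\u200E"   # LEFT-TO-RIGHT MARK


def map_halant_sequences(inv_target):
    """Return the Unicode text for 0xE8+0xE8 and for 0xE8+0xD9.

    inv_target is the character chosen as the mapping of INV (0xD9).
    """
    explicit_halant = VIRAMA + ZWNJ        # 0xE8+0xE8, per Section 1
    halant_then_inv = VIRAMA + inv_target  # 0xE8 and 0xD9 mapped one by one
    return explicit_halant, halant_then_inv
```

With `inv_target` set to ZWNJ the two results are identical, so the original byte sequence cannot be recovered from the Unicode text; with LRM they stay distinct.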
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
##################
|
||||
|
||||
# Section 1: Map the following byte pairs as indicated:
|
||||
# (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER)
|
||||
# (Also see note about 0xF0 in comments above)
|
||||
|
||||
0xA1+0xE9 0x0AD0 # GUJARATI OM
|
||||
0xAA+0xE9 0x0AE0 # GUJARATI LETTER VOCALIC RR
|
||||
0xDF+0xE9 0x0AC4 # GUJARATI VOWEL SIGN VOCALIC RR
|
||||
0xE8+0xE8 0x0ACD+0x200C # GUJARATI SIGN VIRAMA + ZWNJ # explicit halant
|
||||
0xE8+0xE9 0x0ACD+0x200D # GUJARATI SIGN VIRAMA + ZWJ # soft halant
|
||||
|
||||
# Section 2: Map the remaining bytes as follows:
|
||||
|
||||
0x20 0x0020 # SPACE
|
||||
0x21 0x0021 # EXCLAMATION MARK
|
||||
0x22 0x0022 # QUOTATION MARK
|
||||
0x23 0x0023 # NUMBER SIGN
|
||||
0x24 0x0024 # DOLLAR SIGN
|
||||
0x25 0x0025 # PERCENT SIGN
|
||||
0x26 0x0026 # AMPERSAND
|
||||
0x27 0x0027 # APOSTROPHE
|
||||
0x28 0x0028 # LEFT PARENTHESIS
|
||||
0x29 0x0029 # RIGHT PARENTHESIS
|
||||
0x2A 0x002A # ASTERISK
|
||||
0x2B 0x002B # PLUS SIGN
|
||||
0x2C 0x002C # COMMA
|
||||
0x2D 0x002D # HYPHEN-MINUS
|
||||
0x2E 0x002E # FULL STOP
|
||||
0x2F 0x002F # SOLIDUS
|
||||
0x30 0x0030 # DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO
|
||||
0x33 0x0033 # DIGIT THREE
|
||||
0x34 0x0034 # DIGIT FOUR
|
||||
0x35 0x0035 # DIGIT FIVE
|
||||
0x36 0x0036 # DIGIT SIX
|
||||
0x37 0x0037 # DIGIT SEVEN
|
||||
0x38 0x0038 # DIGIT EIGHT
|
||||
0x39 0x0039 # DIGIT NINE
|
||||
0x3A 0x003A # COLON
|
||||
0x3B 0x003B # SEMICOLON
|
||||
0x3C 0x003C # LESS-THAN SIGN
|
||||
0x3D 0x003D # EQUALS SIGN
|
||||
0x3E 0x003E # GREATER-THAN SIGN
|
||||
0x3F 0x003F # QUESTION MARK
|
||||
0x40 0x0040 # COMMERCIAL AT
|
||||
0x41 0x0041 # LATIN CAPITAL LETTER A
|
||||
0x42 0x0042 # LATIN CAPITAL LETTER B
|
||||
0x43 0x0043 # LATIN CAPITAL LETTER C
|
||||
0x44 0x0044 # LATIN CAPITAL LETTER D
|
||||
0x45 0x0045 # LATIN CAPITAL LETTER E
|
||||
0x46 0x0046 # LATIN CAPITAL LETTER F
|
||||
0x47 0x0047 # LATIN CAPITAL LETTER G
|
||||
0x48 0x0048 # LATIN CAPITAL LETTER H
|
||||
0x49 0x0049 # LATIN CAPITAL LETTER I
|
||||
0x4A 0x004A # LATIN CAPITAL LETTER J
|
||||
0x4B 0x004B # LATIN CAPITAL LETTER K
|
||||
0x4C 0x004C # LATIN CAPITAL LETTER L
|
||||
0x4D 0x004D # LATIN CAPITAL LETTER M
|
||||
0x4E 0x004E # LATIN CAPITAL LETTER N
|
||||
0x4F 0x004F # LATIN CAPITAL LETTER O
|
||||
0x50 0x0050 # LATIN CAPITAL LETTER P
|
||||
0x51 0x0051 # LATIN CAPITAL LETTER Q
|
||||
0x52 0x0052 # LATIN CAPITAL LETTER R
|
||||
0x53 0x0053 # LATIN CAPITAL LETTER S
|
||||
0x54 0x0054 # LATIN CAPITAL LETTER T
|
||||
0x55 0x0055 # LATIN CAPITAL LETTER U
|
||||
0x56 0x0056 # LATIN CAPITAL LETTER V
|
||||
0x57 0x0057 # LATIN CAPITAL LETTER W
|
||||
0x58 0x0058 # LATIN CAPITAL LETTER X
|
||||
0x59 0x0059 # LATIN CAPITAL LETTER Y
|
||||
0x5A 0x005A # LATIN CAPITAL LETTER Z
|
||||
0x5B 0x005B # LEFT SQUARE BRACKET
|
||||
0x5C 0x005C # REVERSE SOLIDUS
|
||||
0x5D 0x005D # RIGHT SQUARE BRACKET
|
||||
0x5E 0x005E # CIRCUMFLEX ACCENT
|
||||
0x5F 0x005F # LOW LINE
|
||||
0x60 0x0060 # GRAVE ACCENT
|
||||
0x61 0x0061 # LATIN SMALL LETTER A
|
||||
0x62 0x0062 # LATIN SMALL LETTER B
|
||||
0x63 0x0063 # LATIN SMALL LETTER C
|
||||
0x64 0x0064 # LATIN SMALL LETTER D
|
||||
0x65 0x0065 # LATIN SMALL LETTER E
|
||||
0x66 0x0066 # LATIN SMALL LETTER F
|
||||
0x67 0x0067 # LATIN SMALL LETTER G
|
||||
0x68 0x0068 # LATIN SMALL LETTER H
|
||||
0x69 0x0069 # LATIN SMALL LETTER I
|
||||
0x6A 0x006A # LATIN SMALL LETTER J
|
||||
0x6B 0x006B # LATIN SMALL LETTER K
|
||||
0x6C 0x006C # LATIN SMALL LETTER L
|
||||
0x6D 0x006D # LATIN SMALL LETTER M
|
||||
0x6E 0x006E # LATIN SMALL LETTER N
|
||||
0x6F 0x006F # LATIN SMALL LETTER O
|
||||
0x70 0x0070 # LATIN SMALL LETTER P
|
||||
0x71 0x0071 # LATIN SMALL LETTER Q
|
||||
0x72 0x0072 # LATIN SMALL LETTER R
|
||||
0x73 0x0073 # LATIN SMALL LETTER S
|
||||
0x74 0x0074 # LATIN SMALL LETTER T
|
||||
0x75 0x0075 # LATIN SMALL LETTER U
|
||||
0x76 0x0076 # LATIN SMALL LETTER V
|
||||
0x77 0x0077 # LATIN SMALL LETTER W
|
||||
0x78 0x0078 # LATIN SMALL LETTER X
|
||||
0x79 0x0079 # LATIN SMALL LETTER Y
|
||||
0x7A 0x007A # LATIN SMALL LETTER Z
|
||||
0x7B 0x007B # LEFT CURLY BRACKET
|
||||
0x7C 0x007C # VERTICAL LINE
|
||||
0x7D 0x007D # RIGHT CURLY BRACKET
|
||||
0x7E 0x007E # TILDE
|
||||
#
|
||||
0x80 0x00D7 # MULTIPLICATION SIGN
|
||||
0x81 0x2212 # MINUS SIGN
|
||||
0x82 0x2013 # EN DASH
|
||||
0x83 0x2014 # EM DASH
|
||||
0x84 0x2018 # LEFT SINGLE QUOTATION MARK
|
||||
0x85 0x2019 # RIGHT SINGLE QUOTATION MARK
|
||||
0x86 0x2026 # HORIZONTAL ELLIPSIS
|
||||
0x87 0x2022 # BULLET
|
||||
0x88 0x00A9 # COPYRIGHT SIGN
|
||||
0x89 0x00AE # REGISTERED SIGN
|
||||
0x8A 0x2122 # TRADE MARK SIGN
|
||||
#
|
||||
0x90 0x0965 # DEVANAGARI DOUBLE DANDA
|
||||
#
|
||||
0xA1 0x0A81 # GUJARATI SIGN CANDRABINDU
|
||||
0xA2 0x0A82 # GUJARATI SIGN ANUSVARA
|
||||
0xA3 0x0A83 # GUJARATI SIGN VISARGA
|
||||
0xA4 0x0A85 # GUJARATI LETTER A
|
||||
0xA5 0x0A86 # GUJARATI LETTER AA
|
||||
0xA6 0x0A87 # GUJARATI LETTER I
|
||||
0xA7 0x0A88 # GUJARATI LETTER II
|
||||
0xA8 0x0A89 # GUJARATI LETTER U
|
||||
0xA9 0x0A8A # GUJARATI LETTER UU
|
||||
0xAA 0x0A8B # GUJARATI LETTER VOCALIC R
|
||||
#
|
||||
0xAC 0x0A8F # GUJARATI LETTER E
|
||||
0xAD 0x0A90 # GUJARATI LETTER AI
|
||||
0xAE 0x0A8D # GUJARATI VOWEL CANDRA E
|
||||
#
|
||||
0xB0 0x0A93 # GUJARATI LETTER O
|
||||
0xB1 0x0A94 # GUJARATI LETTER AU
|
||||
0xB2 0x0A91 # GUJARATI VOWEL CANDRA O
|
||||
0xB3 0x0A95 # GUJARATI LETTER KA
|
||||
0xB4 0x0A96 # GUJARATI LETTER KHA
|
||||
0xB5 0x0A97 # GUJARATI LETTER GA
|
||||
0xB6 0x0A98 # GUJARATI LETTER GHA
|
||||
0xB7 0x0A99 # GUJARATI LETTER NGA
|
||||
0xB8 0x0A9A # GUJARATI LETTER CA
|
||||
0xB9 0x0A9B # GUJARATI LETTER CHA
|
||||
0xBA 0x0A9C # GUJARATI LETTER JA
|
||||
0xBB 0x0A9D # GUJARATI LETTER JHA
|
||||
0xBC 0x0A9E # GUJARATI LETTER NYA
|
||||
0xBD 0x0A9F # GUJARATI LETTER TTA
|
||||
0xBE 0x0AA0 # GUJARATI LETTER TTHA
|
||||
0xBF 0x0AA1 # GUJARATI LETTER DDA
|
||||
0xC0 0x0AA2 # GUJARATI LETTER DDHA
|
||||
0xC1 0x0AA3 # GUJARATI LETTER NNA
|
||||
0xC2 0x0AA4 # GUJARATI LETTER TA
|
||||
0xC3 0x0AA5 # GUJARATI LETTER THA
|
||||
0xC4 0x0AA6 # GUJARATI LETTER DA
|
||||
0xC5 0x0AA7 # GUJARATI LETTER DHA
|
||||
0xC6 0x0AA8 # GUJARATI LETTER NA
|
||||
#
|
||||
0xC8 0x0AAA # GUJARATI LETTER PA
|
||||
0xC9 0x0AAB # GUJARATI LETTER PHA
|
||||
0xCA 0x0AAC # GUJARATI LETTER BA
|
||||
0xCB 0x0AAD # GUJARATI LETTER BHA
|
||||
0xCC 0x0AAE # GUJARATI LETTER MA
|
||||
0xCD 0x0AAF # GUJARATI LETTER YA
|
||||
#
|
||||
0xCF 0x0AB0 # GUJARATI LETTER RA
|
||||
#
|
||||
0xD1 0x0AB2 # GUJARATI LETTER LA
|
||||
0xD2 0x0AB3 # GUJARATI LETTER LLA
|
||||
#
|
||||
0xD4 0x0AB5 # GUJARATI LETTER VA
|
||||
0xD5 0x0AB6 # GUJARATI LETTER SHA
|
||||
0xD6 0x0AB7 # GUJARATI LETTER SSA
|
||||
0xD7 0x0AB8 # GUJARATI LETTER SA
|
||||
0xD8 0x0AB9 # GUJARATI LETTER HA
|
||||
0xD9 0x200E # LEFT-TO-RIGHT MARK # invisible consonant
|
||||
0xDA 0x0ABE # GUJARATI VOWEL SIGN AA
|
||||
0xDB 0x0ABF # GUJARATI VOWEL SIGN I
|
||||
0xDC 0x0AC0 # GUJARATI VOWEL SIGN II
|
||||
0xDD 0x0AC1 # GUJARATI VOWEL SIGN U
|
||||
0xDE 0x0AC2 # GUJARATI VOWEL SIGN UU
|
||||
0xDF 0x0AC3 # GUJARATI VOWEL SIGN VOCALIC R
|
||||
#
|
||||
0xE1 0x0AC7 # GUJARATI VOWEL SIGN E
|
||||
0xE2 0x0AC8 # GUJARATI VOWEL SIGN AI
|
||||
0xE3 0x0AC5 # GUJARATI VOWEL SIGN CANDRA E
|
||||
#
|
||||
0xE5 0x0ACB # GUJARATI VOWEL SIGN O
|
||||
0xE6 0x0ACC # GUJARATI VOWEL SIGN AU
|
||||
0xE7 0x0AC9 # GUJARATI VOWEL SIGN CANDRA O
|
||||
0xE8 0x0ACD # GUJARATI SIGN VIRAMA # halant
|
||||
0xE9 0x0ABC # GUJARATI SIGN NUKTA
|
||||
0xEA 0x0964 # DEVANAGARI DANDA
|
||||
#
|
||||
0xF1 0x0AE6 # GUJARATI DIGIT ZERO
|
||||
0xF2 0x0AE7 # GUJARATI DIGIT ONE
|
||||
0xF3 0x0AE8 # GUJARATI DIGIT TWO
|
||||
0xF4 0x0AE9 # GUJARATI DIGIT THREE
|
||||
0xF5 0x0AEA # GUJARATI DIGIT FOUR
|
||||
0xF6 0x0AEB # GUJARATI DIGIT FIVE
|
||||
0xF7 0x0AEC # GUJARATI DIGIT SIX
|
||||
0xF8 0x0AED # GUJARATI DIGIT SEVEN
|
||||
0xF9 0x0AEE # GUJARATI DIGIT EIGHT
|
||||
0xFA 0x0AEF # GUJARATI DIGIT NINE
|
441
charmap/GURMUKHI.TXT
Normal file
@ -0,0 +1,441 @@
|
||||
#=======================================================================
|
||||
# File name: GURMUKHI.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Gurmukhi
|
||||
# encoding to Unicode 2.1 and later.
|
||||
#
|
||||
# Copyright: (c) 1997-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-05 Update header comments. Matches internal xml
|
||||
# <c1.1> and Text Encoding Converter 2.0.
|
||||
# b3,c1 2002-Dec-19 Change mappings for 0x91, 0xD5 based on
|
||||
# new decomposition rules. Update URLs,
|
||||
# notes. Matches internal utom<b2>.
|
||||
# b02 1999-Sep-22 Update contact e-mail address. Matches
|
||||
# internal utom<b1>, ufrm<b1>, and Text
|
||||
# Encoding Converter version 1.5.
|
||||
# n02 1998-Feb-05 First version; matches internal utom<n5>,
|
||||
# ufrm<n6>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Gurmukhi code or code sequence
|
||||
# (in hex as 0xNN or 0xNN+0xNN)
|
||||
# Column #2 is the corresponding Unicode or Unicode sequence
|
||||
# (in hex as 0xNNNN or 0xNNNN+0xNNNN).
|
||||
# Column #3 is a comment containing the Unicode name or sequence
|
||||
# of names. In some cases an additional comment follows the
|
||||
# Unicode name(s).
|
||||
#
|
||||
# The entries are in two sections. The first section is for pairs of
|
||||
# Mac OS Gurmukhi code points that must be mapped in a special way.
|
||||
# The second section maps individual code points.
|
||||
#
|
||||
# Within each section, the entries are in Mac OS Gurmukhi code order.
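# A minimal sketch of reading one line in the format described above.
# The function name parse_mapping_line is hypothetical. Columns are split
# on any run of whitespace, an assumption that also tolerates copies of
# the table in which the tabs were converted to spaces.

```python
def parse_mapping_line(line):
    """Parse one table line into (mac_bytes, unicode_text, comment), or None."""
    # Everything after the first '#' is comment text (it may contain more '#').
    body, _, comment = line.partition("#")
    fields = body.split()
    if len(fields) != 2:
        return None  # blank line or comment-only line
    # Column 1: 0xNN or 0xNN+0xNN; column 2: 0xNNNN or 0xNNNN+0xNNNN.
    mac = bytes(int(tok, 16) for tok in fields[0].split("+"))
    uni = "".join(chr(int(tok, 16)) for tok in fields[1].split("+"))
    return mac, uni, comment.strip()


sample = "0xE8+0xE8\t0x0A4D+0x200C\t# GURMUKHI SIGN VIRAMA + ZWNJ # explicit halant"
parsed = parse_mapping_line(sample)
```

# Entries whose first column holds two bytes belong to the first section
# and entries with one byte to the second, so len(mac) is enough to tell
# the sections apart after parsing.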
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Gurmukhi character set uses the standard control characters
|
||||
# at 0x00-0x1F and 0x7F.
|
||||
#
|
||||
# Notes on Mac OS Gurmukhi:
|
||||
# -------------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported via transcoding to and from
|
||||
# Unicode.
|
||||
#
|
||||
# Mac OS Gurmukhi is based on IS 13194:1991 (ISCII-91), with the
|
||||
# addition of several punctuation and symbol characters. However,
|
||||
# Mac OS Gurmukhi does not support the ATR (attribute) mechanism of
|
||||
# ISCII-91.
|
||||
#
|
||||
# 1. ISCII-91 features in Mac OS Gurmukhi include:
|
||||
#
|
||||
# a) Explicit halant and soft halant
|
||||
#
|
||||
# A double halant (0xE8 + 0xE8) constitutes an "explicit halant",
|
||||
# which will always appear as a halant instead of causing formation
|
||||
# of a ligature or half-form consonant.
|
||||
#
|
||||
# Halant followed by nukta (0xE8 + 0xE9) constitutes a "soft
|
||||
# halant", which prevents formation of a ligature and instead
|
||||
# retains the half-form of the first consonant.
|
||||
#
|
||||
# b) Invisible consonant
|
||||
#
|
||||
# The byte 0xD9 (called INV in ISCII-91) is an invisible consonant:
|
||||
# It behaves like a consonant but has no visible appearance. It is
|
||||
# intended to be used (often in combination with halant) to display
|
||||
# dependent forms in isolation, such as the RA forms or consonant
|
||||
# half-forms.
|
||||
#
|
||||
# c) Extensions for Vedic, etc.
|
||||
#
|
||||
# The byte 0xF0 (called EXT in ISCII-91) followed by any byte in
|
||||
# the range 0xA1-0xEE constitutes a two-byte code point which can
|
||||
# be used to represent additional characters for Vedic (or other
|
||||
# extensions); 0xF0 followed by any other byte value constitutes
|
||||
# malformed text. Mac OS Gurmukhi supports this mechanism, but
|
||||
# does not currently map any of these two-byte code points to
|
||||
# anything.
|
||||
#
|
||||
# 2. Mac OS Gurmukhi additions
|
||||
#
|
||||
# Mac OS Gurmukhi adds characters using the code points
|
||||
# 0x80-0x8A and 0x90-0x94 (the latter are some Gurmukhi additions).
|
||||
#
|
||||
# 3. Unused code points
|
||||
#
|
||||
# The following code points are currently unused, and are not shown
|
||||
# here: 0x8B-0x8F, 0x95-0xA1, 0xA3, 0xAA-0xAB, 0xAE-0xAF, 0xB2,
|
||||
# 0xC7, 0xCE, 0xD0, 0xD2-0xD3, 0xD6, 0xDF-0xE0, 0xE3-0xE4, 0xE7,
|
||||
# 0xEB-0xEF, 0xFB-0xFF. In addition, 0xF0 is not shown here, but it
|
||||
# has a special function as described above.
|
||||
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# 1. Mapping the byte pairs
|
||||
#
|
||||
# If the byte value 0xE8 is encountered when mapping Mac OS
|
||||
# Gurmukhi text, then the next byte (if there is one) should be
|
||||
# examined. If the next byte is 0xE8 or 0xE9, then the byte pair
|
||||
# should be mapped using the first section of the mapping table
|
||||
# below. Otherwise, each byte should be mapped using the second
|
||||
# section of the mapping table below.
|
||||
#
|
||||
# - The Unicode Standard, Version 2.0, specifies how explicit
|
||||
# halant and soft halant should be represented in Unicode;
|
||||
# these mappings are used below.
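# The lookahead rule for 0xE8 can be sketched as follows. This is an
# illustration only: PAIRS and SINGLES hold a small excerpt of the two
# table sections, decode is a hypothetical name, and substituting U+FFFD
# for bytes missing from the excerpt is an assumption of this sketch.

```python
PAIRS = {
    (0xE8, 0xE8): "\u0A4D\u200C",  # virama + ZWNJ, explicit halant
    (0xE8, 0xE9): "\u0A4D\u200D",  # virama + ZWJ, soft halant
}
SINGLES = {
    0xB3: "\u0A15",  # GURMUKHI LETTER KA
    0xD7: "\u0A38",  # GURMUKHI LETTER SA
    0xE8: "\u0A4D",  # GURMUKHI SIGN VIRAMA (halant)
    0xE9: "\u0A3C",  # GURMUKHI SIGN NUKTA
}


def decode(data):
    """Decode Mac OS Gurmukhi bytes, trying byte pairs before single bytes."""
    out = []
    i = 0
    while i < len(data):
        pair = tuple(data[i:i + 2])
        if data[i] == 0xE8 and pair in PAIRS:
            out.append(PAIRS[pair])  # first section: byte pair
            i += 2
        else:
            out.append(SINGLES.get(data[i], "\uFFFD"))  # second section
            i += 1
    return "".join(out)
```

# A halant at the end of the buffer, or one followed by any byte other
# than 0xE8 or 0xE9, falls through to the single-byte branch.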
|
||||
#
|
||||
# If the byte value 0xF0 is encountered when mapping Mac OS
|
||||
# Gurmukhi text, then the next byte should be examined. If there
|
||||
# is no next byte (e.g. 0xF0 at end of buffer), the mapping
|
||||
# process should indicate incomplete character. If there is a next
|
||||
# byte but it is not in the range 0xA1-0xEE, the mapping process
|
||||
# should indicate malformed text. Otherwise, the mapping process
|
||||
# should treat the byte pair as a valid two-byte code point with no
|
||||
# mapping (e.g. map it to QUESTION MARK, REPLACEMENT CHARACTER,
|
||||
# etc.).
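# The three outcomes for 0xF0 described above can be sketched as a small
# classifier. The name classify_ext, the status strings, and the number
# of bytes reported as consumed in each case are assumptions of this
# sketch, not part of the mapping table.

```python
def classify_ext(data, i):
    """Classify the EXT byte at data[i]; return (status, bytes_consumed)."""
    if data[i] != 0xF0:
        raise ValueError("data[i] is not the EXT byte 0xF0")
    if i + 1 >= len(data):
        # 0xF0 at end of buffer: the caller should wait for more input.
        return "incomplete", 0
    if 0xA1 <= data[i + 1] <= 0xEE:
        # Valid two-byte code point, but nothing is mapped to it.
        return "unmapped", 2
    # Any other following byte makes the text malformed.
    return "malformed", 1
```

# A caller would typically emit a substitute such as QUESTION MARK or
# REPLACEMENT CHARACTER for the "unmapped" case and report an error for
# the "malformed" case.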
|
||||
#
|
||||
# 2. Mapping the invisible consonant
|
||||
#
|
||||
# It has been suggested that INV in ISCII-91 should map to ZERO
|
||||
# WIDTH NON-JOINER in Unicode. However, this causes problems with
|
||||
# roundtrip fidelity: The ISCII-91 sequences 0xE8+0xE8 and 0xE8+0xD9
|
||||
# would map to the same sequence of Unicode characters. We have
|
||||
# instead mapped INV to LEFT-TO-RIGHT MARK, which avoids these
|
||||
# problems.
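# The roundtrip problem described above can be checked directly. In this
# sketch decode_pair is a hypothetical helper that handles only the two
# byte sequences under discussion; inv_target is the Unicode character
# chosen for INV.

```python
HALANT, INV = 0xE8, 0xD9
VIRAMA, ZWNJ, LRM = "\u0A4D", "\u200C", "\u200E"


def decode_pair(first, second, inv_target):
    """Decode a two-byte sequence that starts with halant."""
    if first != HALANT:
        raise ValueError("sequence must start with halant")
    if second == HALANT:
        return VIRAMA + ZWNJ        # explicit halant, mapped as a pair
    if second == INV:
        return VIRAMA + inv_target  # halant, then invisible consonant
    raise ValueError("not covered by this sketch")


# With INV mapped to ZWNJ, two distinct byte sequences give the same text.
collides = decode_pair(HALANT, HALANT, ZWNJ) == decode_pair(HALANT, INV, ZWNJ)
# With INV mapped to LEFT-TO-RIGHT MARK, they stay distinct.
distinct = decode_pair(HALANT, HALANT, LRM) != decode_pair(HALANT, INV, LRM)
```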
|
||||
#
|
||||
# 3. Mappings using corporate characters
|
||||
#
|
||||
# Mapping the GURMUKHI LETTER SHA 0xD5 presents an interesting
|
||||
# problem. At first glance, we could map it to the single Unicode
|
||||
# character 0x0A36.
|
||||
#
|
||||
# However, our goal is that the mappings provided here should also
|
||||
# be able to generate the mappings to maximally decomposed Unicode
|
||||
# by simple recursive substitution of the canonical decompositions
|
||||
# in the Unicode database. We want mapping tables derived this way
|
||||
# to retain full roundtrip fidelity.
|
||||
#
|
||||
# Since the canonical decomposition of 0x0A36 is 0x0A38+0x0A3C,
|
||||
# the decomposition mapping for 0xD5 would be identical with the
|
||||
# decomposition mapping for 0xD7+0xE9, and roundtrip fidelity would
|
||||
# be lost.
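# The collision can be reproduced with any Unicode database that carries
# the canonical decomposition of U+0A36. The sketch below uses Python's
# standard unicodedata module; the variable names are illustrative.

```python
import unicodedata

SHA = "\u0A36"              # the single-character candidate for 0xD5
SA_NUKTA = "\u0A38\u0A3C"   # what 0xD7+0xE9 maps to, byte by byte

nfd_sha = unicodedata.normalize("NFD", SHA)
nfd_sa_nukta = unicodedata.normalize("NFD", SA_NUKTA)

# After maximal decomposition the two mappings can no longer be told apart.
same_after_decomposition = nfd_sha == nfd_sa_nukta
```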
|
||||
#
|
||||
# We solve this problem by using a grouping hint (one of the set of
|
||||
# transcoding hints defined by Apple).
|
||||
#
|
||||
# Apple has defined a block of 32 corporate characters as "transcoding
|
||||
# hints." These are used in combination with standard Unicode characters
|
||||
# to force them to be treated in a special way for mapping to other
|
||||
# encodings; they have no other effect. Sixteen of these transcoding
|
||||
# hints are "grouping hints" - they indicate that the next 2-4 Unicode
|
||||
# characters should be treated as a single entity for transcoding. The
|
||||
# other sixteen transcoding hints are "variant tags" - they are like
|
||||
# combining characters, and can follow a standard Unicode character (or a sequence
|
||||
# consisting of a base character and other combining characters) to
|
||||
# cause it to be treated in a special way for transcoding. These always
|
||||
# terminate a combining-character sequence.
|
||||
#
|
||||
# The transcoding hint used in this mapping table is:
|
||||
# 0xF860 group next 2 characters
|
||||
#
|
||||
# Then we can map 0xD5 as follows:
|
||||
# 0xD5 -> 0xF860+0x0A38+0x0A3C
|
||||
#
|
||||
# We could also have used a variant tag such as 0xF87F and mapped it
|
||||
# this way:
|
||||
# 0xD5 -> 0x0A36+0xF87F
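# A minimal sketch of how a converter from Unicode might honor the
# grouping hint. TO_MAC is a three-entry excerpt, encode and to_nfd are
# hypothetical names, and only the hint 0xF860 is handled. Text that
# contains characters outside the excerpt raises KeyError here.

```python
import unicodedata

GROUP_2 = "\uF860"  # transcoding hint: group next 2 characters

TO_MAC = {
    GROUP_2 + "\u0A38\u0A3C": b"\xD5",  # GURMUKHI LETTER SHA
    "\u0A38": b"\xD7",                   # GURMUKHI LETTER SA
    "\u0A3C": b"\xE9",                   # GURMUKHI SIGN NUKTA
}


def encode(text):
    """Encode text to Mac OS Gurmukhi using the excerpt above."""
    out = bytearray()
    i = 0
    while i < len(text):
        # A grouping hint plus the two characters it governs form one key.
        width = 3 if text[i] == GROUP_2 else 1
        out += TO_MAC[text[i:i + width]]
        i += width
    return bytes(out)


def to_nfd(text):
    """Maximally decompose text (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)


hinted = encode(to_nfd("\uF860\u0A38\u0A3C"))
plain = encode(to_nfd("\u0A38\u0A3C"))
```

# The hinted sequence is unchanged by decomposition and still encodes to
# the single byte 0xD5, while SA followed by NUKTA encodes to 0xD7+0xE9.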
|
||||
#
|
||||
# 4. Additional loose mappings from Unicode
|
||||
#
|
||||
# These are not preserved in roundtrip mappings.
|
||||
#
|
||||
# 0A59 -> 0xB4+0xE9 # GURMUKHI LETTER KHHA
|
||||
# 0A5A -> 0xB5+0xE9 # GURMUKHI LETTER GHHA
|
||||
# 0A5B -> 0xBA+0xE9 # GURMUKHI LETTER ZA
|
||||
# 0A5E -> 0xC9+0xE9 # GURMUKHI LETTER FA
|
||||
#
|
||||
# 0A70 -> 0xA2 # GURMUKHI TIPPI
|
||||
#
|
||||
# Loose mappings from Unicode should also map U+0A71 (GURMUKHI ADDAK)
|
||||
# followed by any Gurmukhi consonant to the equivalent ISCII-91
|
||||
# consonant plus halant plus the consonant again. For example:
|
||||
#
|
||||
# 0A71+0A15 -> 0xB3+0xE8+0xB3
|
||||
# 0A71+0A16 -> 0xB4+0xE8+0xB4
|
||||
# ...
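# The ADDAK rule above can be sketched as follows. CONSONANTS is a
# two-entry excerpt and encode_loose is a hypothetical name. Falling back
# to 0x90 for an ADDAK that is not followed by a listed consonant is an
# assumption taken from the strict table entry for 0x90.

```python
ADDAK = "\u0A71"
HALANT = 0xE8
CONSONANTS = {
    "\u0A15": 0xB3,  # GURMUKHI LETTER KA
    "\u0A16": 0xB4,  # GURMUKHI LETTER KHA
}


def encode_loose(text):
    """Encode text, applying the loose ADDAK + consonant rule."""
    out = bytearray()
    i = 0
    while i < len(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if text[i] == ADDAK and nxt in CONSONANTS:
            code = CONSONANTS[nxt]
            out += bytes([code, HALANT, code])  # consonant, halant, consonant
            i += 2
        elif text[i] == ADDAK:
            out.append(0x90)  # lone ADDAK: strict mapping
            i += 1
        else:
            out.append(CONSONANTS[text[i]])
            i += 1
    return bytes(out)
```

# Because the result decodes back to consonant, virama, consonant rather
# than to ADDAK plus consonant, this mapping is one-way, which matches
# the statement above that loose mappings are not preserved in roundtrip.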
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
# Changes from version b02 to version b03/c01:
|
||||
#
|
||||
# - Change mapping of 0x91 from 0xF860+0x0A21+0x0A3C to 0x0A5C GURMUKHI
|
||||
# LETTER RRA, now that the canonical decomposition of 0x0A5C to
|
||||
# 0x0A21+0x0A3C has been deleted
|
||||
#
|
||||
# - Change mapping of 0xD5 from 0x0A36 GURMUKHI LETTER SHA to
|
||||
# 0xF860+0x0A38+0x0A3C, now that a canonical decomposition of 0x0A36
|
||||
# to 0x0A38+0x0A3C has been added.
|
||||
#
|
||||
##################
|
||||
|
||||
# Section 1: Map the following byte pairs as indicated:
|
||||
# (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER)
|
||||
# (Also see note about 0xF0 in comments above)
|
||||
|
||||
0xE8+0xE8 0x0A4D+0x200C # GURMUKHI SIGN VIRAMA + ZWNJ # explicit halant
|
||||
0xE8+0xE9 0x0A4D+0x200D # GURMUKHI SIGN VIRAMA + ZWJ # soft halant
|
||||
|
||||
# Section 2: Map the remaining bytes as follows:
|
||||
|
||||
0x20 0x0020 # SPACE
|
||||
0x21 0x0021 # EXCLAMATION MARK
|
||||
0x22 0x0022 # QUOTATION MARK
|
||||
0x23 0x0023 # NUMBER SIGN
|
||||
0x24 0x0024 # DOLLAR SIGN
|
||||
0x25 0x0025 # PERCENT SIGN
|
||||
0x26 0x0026 # AMPERSAND
|
||||
0x27 0x0027 # APOSTROPHE
|
||||
0x28 0x0028 # LEFT PARENTHESIS
|
||||
0x29 0x0029 # RIGHT PARENTHESIS
|
||||
0x2A 0x002A # ASTERISK
|
||||
0x2B 0x002B # PLUS SIGN
|
||||
0x2C 0x002C # COMMA
|
||||
0x2D 0x002D # HYPHEN-MINUS
|
||||
0x2E 0x002E # FULL STOP
|
||||
0x2F 0x002F # SOLIDUS
|
||||
0x30 0x0030 # DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO
|
||||
0x33 0x0033 # DIGIT THREE
|
||||
0x34 0x0034 # DIGIT FOUR
|
||||
0x35 0x0035 # DIGIT FIVE
|
||||
0x36 0x0036 # DIGIT SIX
|
||||
0x37 0x0037 # DIGIT SEVEN
|
||||
0x38 0x0038 # DIGIT EIGHT
|
||||
0x39 0x0039 # DIGIT NINE
|
||||
0x3A 0x003A # COLON
|
||||
0x3B 0x003B # SEMICOLON
|
||||
0x3C 0x003C # LESS-THAN SIGN
|
||||
0x3D 0x003D # EQUALS SIGN
|
||||
0x3E 0x003E # GREATER-THAN SIGN
|
||||
0x3F 0x003F # QUESTION MARK
|
||||
0x40 0x0040 # COMMERCIAL AT
|
||||
0x41 0x0041 # LATIN CAPITAL LETTER A
|
||||
0x42 0x0042 # LATIN CAPITAL LETTER B
|
||||
0x43 0x0043 # LATIN CAPITAL LETTER C
|
||||
0x44 0x0044 # LATIN CAPITAL LETTER D
|
||||
0x45 0x0045 # LATIN CAPITAL LETTER E
|
||||
0x46 0x0046 # LATIN CAPITAL LETTER F
|
||||
0x47 0x0047 # LATIN CAPITAL LETTER G
|
||||
0x48 0x0048 # LATIN CAPITAL LETTER H
|
||||
0x49 0x0049 # LATIN CAPITAL LETTER I
|
||||
0x4A 0x004A # LATIN CAPITAL LETTER J
|
||||
0x4B 0x004B # LATIN CAPITAL LETTER K
|
||||
0x4C 0x004C # LATIN CAPITAL LETTER L
|
||||
0x4D 0x004D # LATIN CAPITAL LETTER M
|
||||
0x4E 0x004E # LATIN CAPITAL LETTER N
|
||||
0x4F 0x004F # LATIN CAPITAL LETTER O
|
||||
0x50 0x0050 # LATIN CAPITAL LETTER P
|
||||
0x51 0x0051 # LATIN CAPITAL LETTER Q
|
||||
0x52 0x0052 # LATIN CAPITAL LETTER R
|
||||
0x53 0x0053 # LATIN CAPITAL LETTER S
|
||||
0x54 0x0054 # LATIN CAPITAL LETTER T
|
||||
0x55 0x0055 # LATIN CAPITAL LETTER U
|
||||
0x56 0x0056 # LATIN CAPITAL LETTER V
|
||||
0x57 0x0057 # LATIN CAPITAL LETTER W
|
||||
0x58 0x0058 # LATIN CAPITAL LETTER X
|
||||
0x59 0x0059 # LATIN CAPITAL LETTER Y
|
||||
0x5A 0x005A # LATIN CAPITAL LETTER Z
|
||||
0x5B 0x005B # LEFT SQUARE BRACKET
|
||||
0x5C 0x005C # REVERSE SOLIDUS
|
||||
0x5D 0x005D # RIGHT SQUARE BRACKET
|
||||
0x5E 0x005E # CIRCUMFLEX ACCENT
|
||||
0x5F 0x005F # LOW LINE
|
||||
0x60 0x0060 # GRAVE ACCENT
|
||||
0x61 0x0061 # LATIN SMALL LETTER A
|
||||
0x62 0x0062 # LATIN SMALL LETTER B
|
||||
0x63 0x0063 # LATIN SMALL LETTER C
|
||||
0x64 0x0064 # LATIN SMALL LETTER D
|
||||
0x65 0x0065 # LATIN SMALL LETTER E
|
||||
0x66 0x0066 # LATIN SMALL LETTER F
|
||||
0x67 0x0067 # LATIN SMALL LETTER G
|
||||
0x68 0x0068 # LATIN SMALL LETTER H
|
||||
0x69 0x0069 # LATIN SMALL LETTER I
|
||||
0x6A 0x006A # LATIN SMALL LETTER J
|
||||
0x6B 0x006B # LATIN SMALL LETTER K
|
||||
0x6C 0x006C # LATIN SMALL LETTER L
|
||||
0x6D 0x006D # LATIN SMALL LETTER M
|
||||
0x6E 0x006E # LATIN SMALL LETTER N
|
||||
0x6F 0x006F # LATIN SMALL LETTER O
|
||||
0x70 0x0070 # LATIN SMALL LETTER P
|
||||
0x71 0x0071 # LATIN SMALL LETTER Q
|
||||
0x72 0x0072 # LATIN SMALL LETTER R
|
||||
0x73 0x0073 # LATIN SMALL LETTER S
|
||||
0x74 0x0074 # LATIN SMALL LETTER T
|
||||
0x75 0x0075 # LATIN SMALL LETTER U
|
||||
0x76 0x0076 # LATIN SMALL LETTER V
|
||||
0x77 0x0077 # LATIN SMALL LETTER W
|
||||
0x78 0x0078 # LATIN SMALL LETTER X
|
||||
0x79 0x0079 # LATIN SMALL LETTER Y
|
||||
0x7A 0x007A # LATIN SMALL LETTER Z
|
||||
0x7B 0x007B # LEFT CURLY BRACKET
|
||||
0x7C 0x007C # VERTICAL LINE
|
||||
0x7D 0x007D # RIGHT CURLY BRACKET
|
||||
0x7E 0x007E # TILDE
|
||||
#
|
||||
0x80 0x00D7 # MULTIPLICATION SIGN
|
||||
0x81 0x2212 # MINUS SIGN
|
||||
0x82 0x2013 # EN DASH
|
||||
0x83 0x2014 # EM DASH
|
||||
0x84 0x2018 # LEFT SINGLE QUOTATION MARK
|
||||
0x85 0x2019 # RIGHT SINGLE QUOTATION MARK
|
||||
0x86 0x2026 # HORIZONTAL ELLIPSIS
|
||||
0x87 0x2022 # BULLET
|
||||
0x88 0x00A9 # COPYRIGHT SIGN
|
||||
0x89 0x00AE # REGISTERED SIGN
|
||||
0x8A 0x2122 # TRADE MARK SIGN
|
||||
#
|
||||
0x90 0x0A71 # GURMUKHI ADDAK
|
||||
0x91 0x0A5C # GURMUKHI LETTER RRA
|
||||
0x92 0x0A73 # GURMUKHI URA
|
||||
0x93 0x0A72 # GURMUKHI IRI
|
||||
0x94 0x0A74 # GURMUKHI EK ONKAR
|
||||
#
|
||||
0xA2 0x0A02 # GURMUKHI SIGN BINDI
|
||||
#
|
||||
0xA4 0x0A05 # GURMUKHI LETTER A
|
||||
0xA5 0x0A06 # GURMUKHI LETTER AA
|
||||
0xA6 0x0A07 # GURMUKHI LETTER I
|
||||
0xA7 0x0A08 # GURMUKHI LETTER II
|
||||
0xA8 0x0A09 # GURMUKHI LETTER U
|
||||
0xA9 0x0A0A # GURMUKHI LETTER UU
|
||||
#
|
||||
0xAC 0x0A0F # GURMUKHI LETTER EE
|
||||
0xAD 0x0A10 # GURMUKHI LETTER AI
|
||||
#
|
||||
0xB0 0x0A13 # GURMUKHI LETTER OO
|
||||
0xB1 0x0A14 # GURMUKHI LETTER AU
|
||||
#
|
||||
0xB3 0x0A15 # GURMUKHI LETTER KA
|
||||
0xB4 0x0A16 # GURMUKHI LETTER KHA
|
||||
0xB5 0x0A17 # GURMUKHI LETTER GA
|
||||
0xB6 0x0A18 # GURMUKHI LETTER GHA
|
||||
0xB7 0x0A19 # GURMUKHI LETTER NGA
|
||||
0xB8 0x0A1A # GURMUKHI LETTER CA
|
||||
0xB9 0x0A1B # GURMUKHI LETTER CHA
|
||||
0xBA 0x0A1C # GURMUKHI LETTER JA
|
||||
0xBB 0x0A1D # GURMUKHI LETTER JHA
|
||||
0xBC 0x0A1E # GURMUKHI LETTER NYA
|
||||
0xBD 0x0A1F # GURMUKHI LETTER TTA
|
||||
0xBE 0x0A20 # GURMUKHI LETTER TTHA
|
||||
0xBF 0x0A21 # GURMUKHI LETTER DDA
|
||||
0xC0 0x0A22 # GURMUKHI LETTER DDHA
|
||||
0xC1 0x0A23 # GURMUKHI LETTER NNA
|
||||
0xC2 0x0A24 # GURMUKHI LETTER TA
|
||||
0xC3 0x0A25 # GURMUKHI LETTER THA
|
||||
0xC4 0x0A26 # GURMUKHI LETTER DA
|
||||
0xC5 0x0A27 # GURMUKHI LETTER DHA
|
||||
0xC6 0x0A28 # GURMUKHI LETTER NA
|
||||
#
|
||||
0xC8 0x0A2A # GURMUKHI LETTER PA
|
||||
0xC9 0x0A2B # GURMUKHI LETTER PHA
|
||||
0xCA 0x0A2C # GURMUKHI LETTER BA
|
||||
0xCB 0x0A2D # GURMUKHI LETTER BHA
|
||||
0xCC 0x0A2E # GURMUKHI LETTER MA
|
||||
0xCD 0x0A2F # GURMUKHI LETTER YA
|
||||
#
|
||||
0xCF 0x0A30 # GURMUKHI LETTER RA
|
||||
#
|
||||
0xD1 0x0A32 # GURMUKHI LETTER LA
|
||||
#
|
||||
0xD4 0x0A35 # GURMUKHI LETTER VA
|
||||
0xD5 0xF860+0x0A38+0x0A3C # GURMUKHI LETTER SHA
|
||||
#
|
||||
0xD7 0x0A38 # GURMUKHI LETTER SA
|
||||
0xD8 0x0A39 # GURMUKHI LETTER HA
|
||||
0xD9 0x200E # LEFT-TO-RIGHT MARK # invisible consonant
|
||||
0xDA 0x0A3E # GURMUKHI VOWEL SIGN AA
|
||||
0xDB 0x0A3F # GURMUKHI VOWEL SIGN I
|
||||
0xDC 0x0A40 # GURMUKHI VOWEL SIGN II
|
||||
0xDD 0x0A41 # GURMUKHI VOWEL SIGN U
|
||||
0xDE 0x0A42 # GURMUKHI VOWEL SIGN UU
|
||||
#
|
||||
0xE1 0x0A47 # GURMUKHI VOWEL SIGN EE
|
||||
0xE2 0x0A48 # GURMUKHI VOWEL SIGN AI
|
||||
#
|
||||
0xE5 0x0A4B # GURMUKHI VOWEL SIGN OO
|
||||
0xE6 0x0A4C # GURMUKHI VOWEL SIGN AU
|
||||
#
|
||||
0xE8 0x0A4D # GURMUKHI SIGN VIRAMA # halant
|
||||
0xE9 0x0A3C # GURMUKHI SIGN NUKTA
|
||||
0xEA 0x0964 # DEVANAGARI DANDA
|
||||
#
|
||||
0xF1 0x0A66 # GURMUKHI DIGIT ZERO
|
||||
0xF2 0x0A67 # GURMUKHI DIGIT ONE
|
||||
0xF3 0x0A68 # GURMUKHI DIGIT TWO
|
||||
0xF4 0x0A69 # GURMUKHI DIGIT THREE
|
||||
0xF5 0x0A6A # GURMUKHI DIGIT FOUR
|
||||
0xF6 0x0A6B # GURMUKHI DIGIT FIVE
|
||||
0xF7 0x0A6C # GURMUKHI DIGIT SIX
|
||||
0xF8 0x0A6D # GURMUKHI DIGIT SEVEN
|
||||
0xF9 0x0A6E # GURMUKHI DIGIT EIGHT
|
||||
0xFA 0x0A6F # GURMUKHI DIGIT NINE
|
601
charmap/HEBREW.TXT
Normal file
@ -0,0 +1,601 @@
|
||||
#=======================================================================
|
||||
# File name: HEBREW.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Hebrew
|
||||
# character set to Unicode 2.1 and later.
|
||||
#
|
||||
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-05 Update header comments; add section on
|
||||
# roundtrip considerations. Matches internal
|
||||
# xml <c1.4> and Text Encoding Converter 2.0.
|
||||
# b3,c1 2002-Dec-19 Don't require left-right context for digits
|
||||
# 0x30-0x39. Change mapping of 0x81 to use
|
||||
# decomposition. Reverse the mappings of 0xA8,
|
||||
# 0xA9. Update URLs, notes. Matches internal
|
||||
# utom<b7>.
|
||||
# b02 1999-Sep-22 Update contact e-mail address. Matches
|
||||
# internal utom<b1>, ufrm<b1>, and Text
|
||||
# Encoding Converter version 1.5.
|
||||
# n03 1998-Feb-05 Show required Unicode character
|
||||
# directionality in a different way. Update
|
||||
# mappings for 0xC0 and 0xDE to use
|
||||
# transcoding hints; matches internal utom<n6>,
|
||||
# ufrm<n20>, and Text Encoding Converter
|
||||
# version 1.3. Rewrite header comments.
|
||||
# n01 1995-Nov-15 First version. Matches internal ufrm<n8>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Hebrew code (in hex as 0xNN).
|
||||
# Column #2 is the corresponding Unicode or Unicode sequence (in
|
||||
# hex as 0xNNNN, 0xNNNN+0xNNNN, etc.). Sequences of up to 3
|
||||
# Unicode characters are used here. A single Unicode character
|
||||
# may be preceded by a tag indicating required directionality
|
||||
# (i.e. <LR>+0xNNNN or <RL>+0xNNNN).
|
||||
# Column #3 is a comment containing the Unicode name.
|
||||
#
|
||||
# The entries are in Mac OS Hebrew code order.
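# A minimal sketch of reading one line in this format, including the
# optional direction tag. The name parse_hebrew_line and the returned
# tuple layout are assumptions; columns are split on any whitespace.

```python
def parse_hebrew_line(line):
    """Return (mac_code, direction, unicode_text), or None for non-data lines."""
    body = line.partition("#")[0]  # drop the comment column
    fields = body.split()
    if len(fields) != 2:
        return None
    tokens = fields[1].split("+")
    direction = None
    if tokens[0] in ("<LR>", "<RL>"):
        # Required directionality tag precedes the Unicode value.
        direction = tokens.pop(0).strip("<>")
    text = "".join(chr(int(tok, 16)) for tok in tokens)
    return int(fields[0], 16), direction, text


tagged = parse_hebrew_line("0xD4\t<RL>+0x2018\t# LEFT SINGLE QUOTATION MARK, right-left")
untagged = parse_hebrew_line("0xE0\t0x05D0\t# HEBREW LETTER ALEF")
```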
|
||||
#
|
||||
# Some of these mappings require the use of corporate characters.
|
||||
# See the file "CORPCHAR.TXT" and notes below.
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Hebrew character set uses the standard control characters at
|
||||
# 0x00-0x1F and 0x7F.
|
||||
#
|
||||
# Notes on Mac OS Hebrew:
|
||||
# -----------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported via transcoding to and from
|
||||
# Unicode.
|
||||
#
|
||||
# 1. General
|
||||
#
|
||||
# The Mac OS Hebrew character set supports the Hebrew and Yiddish
|
||||
# languages. It incorporates the Hebrew letter repertoire of
|
||||
# ISO 8859-8, and uses the same code points for them, 0xE0-0xFA.
|
||||
# It also incorporates the ASCII character set. In addition, the
|
||||
# Mac OS Hebrew character set includes the following:
|
||||
#
|
||||
# - Hebrew points (nikud marks) at 0xC6, 0xCB-0xCF and 0xD8-0xDF.
|
||||
# These are non-spacing combining marks. Note that the RAFE point
|
||||
# at 0xD8 is not displayed correctly in some fonts, and cannot be
|
||||
# typed using the keyboard layouts in the current Hebrew localized
|
||||
# systems. Also note: The character given in Unicode as QAMATS
|
||||
# (U+05B8) actually refers to two different sounds, depending on
|
||||
# context. For example, when ALEF is followed by QAMATS, the QAMATS
|
||||
# can actually refer to two different sounds depending on the
|
||||
# following letters. The Mac OS Hebrew character set separately
|
||||
# encodes these two sounds for the same graphic shape, as "qamats"
|
||||
# (0xCB) and "qamats qatan" (0xDE). The "qamats" character is more
|
||||
# common, so it is mapped to the Unicode QAMATS; "qamats qatan" can
|
||||
# only be used with a limited number of characters, and it is
|
||||
# mapped using a corporate-zone variant tag (see below).
|
||||
#
|
||||
# - Various Hebrew ligatures at 0x81, 0xC0, 0xC7, 0xC8, 0xD6, and
|
||||
# 0xD7. Also note that the Yiddish YOD YOD PATAH ligature at 0x81
|
||||
# is missing in some fonts.
|
||||
#
|
||||
# - The NEW SHEQEL SIGN at 0xA6.
|
||||
#
|
||||
# - Latin characters with diacritics at 0x80 and 0x82-0x9F. However,
|
||||
# most of these cannot be typed using the keyboard layouts in the
|
||||
# Hebrew localized systems.
|
||||
#
|
||||
# - Right-left versions of certain ASCII punctuation, symbols and
|
||||
# digits: 0xA0-0xA5, 0xA7-0xBF, 0xFB-0xFF. See below.
|
||||
#
|
||||
# - Miscellaneous additional punctuation at 0xC1, 0xC9, 0xCA, and
|
||||
# 0xD0-0xD5. There is a variant of the Hebrew encoding in which
|
||||
# the LEFT SINGLE QUOTATION MARK at 0xD4 is replaced by FIGURE
|
||||
# SPACE. The glyphs for some of the other punctuation characters
|
||||
# are missing in some fonts.
|
||||
#
|
||||
# - Four obsolete characters at 0xC2-0xC5 known as canorals (not to
|
||||
# be confused with cantillation marks!). These were used for
|
||||
# manual positioning of nikud marks before System 7.1 (at which
|
||||
# point nikud positioning became automatic with WorldScript).
|
||||
#
|
||||
# 2. Directional characters and roundtrip fidelity
|
||||
#
|
||||
# The Mac OS Hebrew character set was developed around 1987. At that
|
||||
# time the bidirectional line layout algorithm used in the Mac OS
|
||||
# Hebrew system was fairly simple; it used only a few direction
|
||||
# classes (instead of the 19 now used in the Unicode bidirectional
|
||||
# algorithm). In order to permit users to handle some tricky layout
|
||||
# problems, certain punctuation, symbol, and digit characters have
|
||||
# duplicate code points, one with a left-right direction attribute and
|
||||
# the other with a right-left direction attribute.
|
||||
#
|
||||
# For example, plus sign is encoded at 0x2B with a left-right
|
||||
# attribute, and at 0xAB with a right-left attribute. However, there
|
||||
# is only one PLUS SIGN character in Unicode. This leads to some
|
||||
# interesting problems when mapping between Mac OS Hebrew and Unicode;
|
||||
# see below.
|
||||
#
|
||||
# A related problem is that even when a particular character is
|
||||
# encoded only once in Mac OS Hebrew, it may have a different
|
||||
# direction attribute than the corresponding Unicode character.
|
||||
#
|
||||
# For example, the Mac OS Hebrew character at 0xC9 is HORIZONTAL
|
||||
# ELLIPSIS with strong right-left direction. However, the Unicode
|
||||
# character HORIZONTAL ELLIPSIS has direction class neutral.
|
||||
#
|
||||
# 3. Font variants
|
||||
#
|
||||
# The table in this file gives the Unicode mappings for the standard
|
||||
# Mac OS Hebrew encoding. This encoding is supported by many of the
|
||||
# Apple fonts (including all of the fonts in the Hebrew Language Kit),
|
||||
# and is the encoding supported by the text processing utilities.
|
||||
# However, some TrueType fonts provided with the localized Hebrew
|
||||
# system implement a slightly different encoding; the difference is
|
||||
# only in one code point, 0xD4. For the standard variant, this is:
|
||||
# 0xD4 -> <RL>+0x2018 LEFT SINGLE QUOTATION MARK, right-left
|
||||
#
|
||||
# The TrueType variant is used by the following TrueType fonts from
|
||||
# the localized system: Caesarea, Carmel Book, Gilboa, Ramat Sharon,
|
||||
# and Sinai Book. For these, 0xD4 is as follows:
|
||||
# 0xD4 -> <RL>+0x2007 FIGURE SPACE, right-left
|
||||
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# 1. Matching the direction of Mac OS Hebrew characters
|
||||
#
|
||||
# When Mac OS Hebrew encodes a character twice but with different
|
||||
# direction attributes for the two code points - as in the case of
|
||||
# plus sign mentioned above - we need a way to map both Mac OS Hebrew
|
||||
# code points to Unicode and back again without loss of information.
|
||||
# With the plus sign, for example, mapping one of the Mac OS Hebrew
|
||||
# characters to a code in the Unicode corporate use zone is
|
||||
# undesirable, since both of the plus sign characters are likely to
|
||||
# be used in text that is interchanged.
|
||||
#
|
||||
# The problem is solved with the use of direction override characters
|
||||
# and direction-dependent mappings. When mapping from Mac OS Hebrew
|
||||
# to Unicode, we use direction overrides as necessary to force the
|
||||
# direction of the resulting Unicode characters.
|
||||
#
|
||||
# The required direction is indicated by a direction tag in the
|
||||
# mappings. A tag of <LR> means the corresponding Unicode character
|
||||
# must have a strong left-right context, and a tag of <RL> indicates
|
||||
# a right-left context.
|
||||
#
|
||||
# For example, the mapping of 0x2B is given as <LR>+0x002B; the
|
||||
# mapping of 0xAB is given as <RL>+0x002B. If we map an isolated
|
||||
# instance of 0x2B to Unicode, it should be mapped as follows (LRO
|
||||
# indicates LEFT-RIGHT OVERRIDE, PDF indicates POP DIRECTION
|
||||
# FORMATTING):
|
||||
#
|
||||
# 0x2B -> 0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
|
||||
#
|
||||
# When mapping several characters in a row that require direction
|
||||
# forcing, the overrides need only be used at the beginning and end.
|
||||
# For example:
|
||||
#
|
||||
# 0x24 0x20 0x28 0x29 -> 0x202D 0x0024 0x0020 0x0028 0x0029 0x202C
|
||||
#
|
||||
# If neutral characters that require direction forcing are already
|
||||
# between strong-direction characters with matching directionality,
|
||||
# then direction overrides need not be used. Direction overrides are
|
||||
# always needed to map the right-left digits at 0xB0-0xB9.
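# The override wrapping described above can be sketched as follows. This
# is a minimal illustration, not Apple's converter: TABLE is a hypothetical
# hand-copied subset of the mappings, mac_hebrew_to_unicode is an invented
# name, and the use of 0x202E (RIGHT-TO-LEFT OVERRIDE) for <RL> runs is
# assumed by analogy with the LRO example. The sketch always emits the
# overrides and does not attempt to omit them between strong characters.

```python
LRO, RLO, PDF = "\u202D", "\u202E", "\u202C"

# Hypothetical subset of the table:
# Mac OS Hebrew byte -> (direction tag or None, Unicode character)
TABLE = {
    0x20: ("LR", "\u0020"),
    0x24: ("LR", "\u0024"),
    0x28: ("LR", "\u0028"),
    0x29: ("LR", "\u0029"),
    0x2B: ("LR", "\u002B"),
    0xAB: ("RL", "\u002B"),
    0x41: (None, "\u0041"),
    0xE0: (None, "\u05D0"),
}


def mac_hebrew_to_unicode(data: bytes) -> str:
    """Map bytes to Unicode, wrapping each run of same-tag characters
    in a single override ... PDF pair."""
    out = []
    current = None  # tag of the override that is currently open, if any
    for byte in data:
        tag, char = TABLE[byte]
        if tag != current:
            if current is not None:
                out.append(PDF)  # close the previous run
            if tag is not None:
                out.append(LRO if tag == "LR" else RLO)  # open a new run
            current = tag
        out.append(char)
    if current is not None:
        out.append(PDF)
    return "".join(out)
```

# With this subset, the isolated 0x2B and the run 0x24 0x20 0x28 0x29
# produce the two sequences shown in the examples above.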
|
||||
#
|
||||
# When mapping from Unicode to Mac OS Hebrew, the Unicode
|
||||
# bidirectional algorithm should be used to determine resolved
|
||||
# direction of the Unicode characters. The mapping from Unicode to
|
||||
# Mac OS Hebrew can then be disambiguated by the use of the resolved
|
||||
# direction:
|
||||
#
|
||||
# Unicode 0x002B -> Mac OS Hebrew 0x2B (if L) or 0xAB (if R)
|
||||
#
|
||||
# However, this also means the direction override characters should
|
||||
# be discarded when mapping from Unicode to Mac OS Hebrew (after
|
||||
# they have been used to determine resolved direction), since the
|
||||
# direction override information is carried by the code point itself.
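# A rough sketch of the reverse direction, under strong simplifying
# assumptions: it tracks only explicit LRO/RLO/PDF characters and does not
# implement the Unicode bidirectional algorithm. The `default` parameter
# is a hypothetical stand-in for the resolved direction that a real
# implementation would compute for characters outside any override.
# REVERSE and unicode_to_mac_hebrew are invented names.

```python
LRO, RLO, PDF = "\u202D", "\u202E", "\u202C"

# Hypothetical subset:
# Unicode character -> {resolved direction: Mac OS Hebrew byte}
REVERSE = {
    "+": {"L": 0x2B, "R": 0xAB},
    "$": {"L": 0x24, "R": 0xA4},
}


def unicode_to_mac_hebrew(text: str, default: str = "L") -> bytes:
    out = bytearray()
    stack = []  # directions forced by the currently open overrides
    for ch in text:
        if ch == LRO:
            stack.append("L")
        elif ch == RLO:
            stack.append("R")
        elif ch == PDF:
            if stack:
                stack.pop()
        else:
            direction = stack[-1] if stack else default
            out.append(REVERSE[ch][direction])
        # Override characters are consumed here and never written to the
        # output: the chosen code point already carries the direction.
    return bytes(out)
```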
|
||||
#
|
||||
# Even when direction overrides are not needed for roundtrip
|
||||
# fidelity, they are sometimes used when mapping Mac OS Hebrew
|
||||
# characters to Unicode in order to achieve similar text layout with
|
||||
# the resulting Unicode text. For example, the single Mac OS Hebrew
|
||||
# ellipsis character has direction class right-left, and there is no
|
||||
# left-right version. However, the Unicode HORIZONTAL ELLIPSIS
|
||||
# character has direction class neutral (which means it may end up
|
||||
# with a resolved direction of left-right if surrounded by left-right
|
||||
# characters). When mapping the Mac OS Hebrew ellipsis to Unicode, it
|
||||
# is surrounded with a direction override to help preserve proper
|
||||
# text layout. The resolved direction is not needed or used when
|
||||
# mapping the Unicode HORIZONTAL ELLIPSIS back to Mac OS Hebrew.
|
||||
#
|
||||
# 2. Use of corporate-zone Unicodes
|
||||
#
|
||||
# The goals in the mappings provided here are:
|
||||
# - Ensure roundtrip mapping from every character in the Mac OS
|
||||
# Hebrew character set to Unicode and back
|
||||
# - Use standard Unicode characters as much as possible, to
|
||||
# maximize interchangeability of the resulting Unicode text.
|
||||
# Whenever possible, avoid having content carried by private-use
|
||||
# characters.
|
||||
#
|
||||
# Some of the characters in the Mac OS Hebrew character set do not
|
||||
# correspond to distinct, single Unicode characters. To map these
|
||||
# and satisfy both goals above, we employ various strategies.
|
||||
#
|
||||
# a) If possible, use private use characters in combination with
|
||||
# standard Unicode characters to mark variants of the standard
|
||||
# Unicode character.
|
||||
#
|
||||
# Apple has defined a block of 32 corporate characters as "transcoding
|
||||
# hints." These are used in combination with standard Unicode characters
|
||||
# to force them to be treated in a special way for mapping to other
|
||||
# encodings; they have no other effect. Sixteen of these transcoding
|
||||
# hints are "grouping hints" - they indicate that the next 2-4 Unicode
|
||||
# characters should be treated as a single entity for transcoding. The
|
||||
# other sixteen transcoding hints are "variant tags" - they are like
|
||||
# combining characters, and can follow a standard Unicode character (or a sequence
|
||||
# consisting of a base character and other combining characters) to
|
||||
# cause it to be treated in a special way for transcoding. These always
|
||||
# terminate a combining-character sequence.
|
||||
#
|
||||
# Two transcoding hints are used in this mapping table: a grouping hint
|
||||
# and a variant tag:
|
||||
# hint:
|
||||
# 0xF86A group next 2 characters, right-left directionality
|
||||
# 0xF87F variant tag
|
||||
#
|
||||
# In Mac OS Hebrew, 0xC0 is a ligature for lamed holam. This can also
|
||||
# be represented in Mac OS Hebrew as 0xEC+0xDD, using separate
|
||||
# characters for lamed and holam. The latter sequence is mapped to
|
||||
# Unicode as 0x05DC+0x05B9, i.e. as the sequence HEBREW LETTER LAMED +
|
||||
# HEBREW POINT HOLAM. We want to map the ligature 0xC0 using the same
|
||||
# standard Unicode characters, but for round-trip fidelity we need to
|
||||
# distinguish it from the mapping of the sequence 0xEC+0xDD. Thus for
|
||||
# 0xC0 we use a grouping hint, and map as follows:
|
||||
#
|
||||
# 0xC0 -> 0xF86A+0x05DC+0x05B9
|
||||
#
|
||||
# The variant tag is used for "qamats qatan" to mark it as an alternate
|
||||
# for HEBREW POINT QAMATS, as follows:
|
||||
#
|
||||
# 0xDE -> 0x05B8+0xF87F
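# To illustrate why the hints preserve round-trip fidelity, here is a
# minimal sketch of mapping back from Unicode with a longest-match lookup.
# SEQUENCES is a hypothetical subset built from the mappings in this file,
# and `encode` is an invented helper, not part of any Apple API.

```python
# Unicode sequence -> Mac OS Hebrew byte (hypothetical subset)
SEQUENCES = {
    "\uF86A\u05DC\u05B9": 0xC0,  # grouping hint + lamed + holam (ligature)
    "\u05B8\uF87F": 0xDE,        # qamats + variant tag ("qamats qatan")
    "\u05DC": 0xEC,              # lamed
    "\u05B9": 0xDD,              # holam
    "\u05B8": 0xCB,              # qamats
}
MAX_LEN = max(len(key) for key in SEQUENCES)


def encode(text: str) -> bytes:
    """Greedy longest-match encoding, so hinted sequences win over
    their unhinted component characters."""
    out = bytearray()
    i = 0
    while i < len(text):
        for n in range(MAX_LEN, 0, -1):
            chunk = text[i:i + n]
            if chunk in SEQUENCES:
                out.append(SEQUENCES[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no mapping for U+{ord(text[i]):04X}")
    return bytes(out)
```

# The hinted and unhinted sequences then map back to different bytes:
# the ligature returns to 0xC0 while plain lamed + holam returns to
# 0xEC 0xDD.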
|
||||
#
|
||||
# b) Otherwise, use private use characters by themselves to map Mac OS
|
||||
# Hebrew characters which have no relationship to any standard Unicode
|
||||
# character.
|
||||
#
|
||||
# The following additional corporate zone Unicode characters are used
|
||||
# for this purpose here (to map the obsolete "canorals", see above):
|
||||
#
|
||||
# 0xF89B Hebrew canoral 1
|
||||
# 0xF89C Hebrew canoral 2
|
||||
# 0xF89D Hebrew canoral 3
|
||||
# 0xF89E Hebrew canoral 4
|
||||
#
|
||||
# 3. Roundtrip considerations when mapping to decomposed Unicode
|
||||
#
|
||||
# Both Mac OS Hebrew and Unicode provide multiple ways of representing
|
||||
# certain letter-and-point combinations. For example, HEBREW LETTER
|
||||
# VAV WITH HOLAM can be represented in Unicode as the single character
|
||||
# 0xFB4B or as the sequence 0x05D5 0x05B9; similarly, it can be
|
||||
# represented in Mac OS Hebrew as 0xC7 or as the sequence 0xE5 0xDD.
|
||||
# This leads to some roundtrip problems. First note that we have the
|
||||
# following mappings without such problems:
|
||||
#
|
||||
# Mac standard decomp. of reverse map
|
||||
# OS Unicode mapping std. mapping of decomp.
|
||||
# ---- ---------------------------------- ------------- -----------
|
||||
# 0xC6 0x05BC ... POINT DAGESH OR MAPIQ 0x05BC (same) 0xC6
|
||||
# 0xE5 0x05D5 ... LETTER VAV 0x05D5 (same) 0xE5
|
||||
# 0xDD 0x05B9 ... POINT HOLAM 0x05B9 (same) 0xDD
|
||||
#
|
||||
# However, those mappings above cause roundtrip problems for the
|
||||
# following mappings if they are decomposed:
|
||||
#
|
||||
# Mac standard decomp. of reverse map
|
||||
# OS Unicode mapping std. mapping of decomp.
|
||||
# ---- ---------------------------------- ------------- -----------
|
||||
# 0xC7 0xFB4B ... LETTER VAV WITH HOLAM 0x05D5 0x05B9 0xE5 0xDD
|
||||
# 0xC8 0xFB35 ... LETTER VAV WITH DAGESH 0x05D5 0x05BC 0xE5 0xC6
|
||||
#
|
||||
# One solution is to use a grouping transcoding hint with the two
|
||||
# decompositions above to mark the decomposed sequence for special
|
||||
# treatment in transcoding. This yields the following mappings to
|
||||
# decomposed Unicode:
|
||||
#
|
||||
# Mac decomposed
|
||||
# OS Unicode mapping
|
||||
# ---- --------------------
|
||||
# 0xC7 0xF86A 0x05D5 0x05B9
|
||||
# 0xC8 0xF86A 0x05D5 0x05BC
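# One way to derive the decomposed mappings above is sketched here, using
# Python's standard unicodedata module for canonical decomposition (NFD).
# STANDARD is a hypothetical subset of this table and to_decomposed is an
# invented name; the sketch assumes the 0xF86A hint applies only when the
# decomposition is exactly two characters long.

```python
import unicodedata

GROUP_2_RL = "\uF86A"  # group next 2 characters, right-left directionality

# Mac OS Hebrew byte -> standard Unicode mapping (hypothetical subset)
STANDARD = {
    0xC6: "\u05BC",  # POINT DAGESH OR MAPIQ
    0xC7: "\uFB4B",  # LETTER VAV WITH HOLAM
    0xC8: "\uFB35",  # LETTER VAV WITH DAGESH
    0xDD: "\u05B9",  # POINT HOLAM
    0xE5: "\u05D5",  # LETTER VAV
}


def to_decomposed(byte: int) -> str:
    std = STANDARD[byte]
    decomp = unicodedata.normalize("NFD", std)
    if decomp == std:
        return std  # nothing to decompose, no hint needed
    if len(decomp) != 2:
        raise ValueError("0xF86A groups exactly two characters")
    # Mark the decomposed pair so it can be told apart from the same
    # two characters produced by two separate Mac OS Hebrew bytes.
    return GROUP_2_RL + decomp
```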
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
# Changes from version b02 to version b03/c01:
|
||||
#
|
||||
# - Stop specifying left-right context for digits 0x30-0x39, since the
|
||||
# corresponding Unicodes 0x0030-0x0039 already have left-right
|
||||
# directionality.
|
||||
#
|
||||
# - Change mapping of 0x81 from 0xFB1F HEBREW LIGATURE YIDDISH YOD YOD
|
||||
# PATAH to its canonical decomposition 0x05F2+0x05B7 to improve
|
||||
# cross-platform compatibility (Windows doesn't handle 0xFB1F).
|
||||
#
|
||||
# - Interchange the mappings of 0xA8 and 0xA9 to obtain the correct
|
||||
# open/close behavior; they work differently than in Mac Arabic.
|
||||
# The old mapping was
|
||||
# 0xA8 <RL>+0x0028 # LEFT PARENTHESIS, right-left
|
||||
# 0xA9 <RL>+0x0029 # RIGHT PARENTHESIS, right-left
|
||||
# and the new mapping is
|
||||
# 0xA8 <RL>+0x0029 # RIGHT PARENTHESIS, right-left
|
||||
# 0xA9 <RL>+0x0028 # LEFT PARENTHESIS, right-left
|
||||
#
|
||||
# Changes from version n01 to version n03:
|
||||
#
|
||||
# - Change mapping for 0xC0 from single corporate character to
|
||||
# grouping hint plus standard Unicodes
|
||||
#
|
||||
# - Change mapping for 0xDE from single corporate character to
|
||||
# standard Unicode plus variant tag
|
||||
#
|
||||
##################
|
||||
|
||||
0x20 <LR>+0x0020 # SPACE, left-right
|
||||
0x21 <LR>+0x0021 # EXCLAMATION MARK, left-right
|
||||
0x22 <LR>+0x0022 # QUOTATION MARK, left-right
|
||||
0x23 <LR>+0x0023 # NUMBER SIGN, left-right
|
||||
0x24 <LR>+0x0024 # DOLLAR SIGN, left-right
|
||||
0x25 <LR>+0x0025 # PERCENT SIGN, left-right
|
||||
0x26 0x0026 # AMPERSAND
|
||||
0x27 <LR>+0x0027 # APOSTROPHE, left-right
|
||||
0x28 <LR>+0x0028 # LEFT PARENTHESIS, left-right
|
||||
0x29 <LR>+0x0029 # RIGHT PARENTHESIS, left-right
|
||||
0x2A <LR>+0x002A # ASTERISK, left-right
|
||||
0x2B <LR>+0x002B # PLUS SIGN, left-right
|
||||
0x2C <LR>+0x002C # COMMA, left-right
|
||||
0x2D <LR>+0x002D # HYPHEN-MINUS, left-right
|
||||
0x2E <LR>+0x002E # FULL STOP, left-right
|
||||
0x2F <LR>+0x002F # SOLIDUS, left-right
|
||||
0x30 0x0030 # DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO
|
||||
0x33 0x0033 # DIGIT THREE
|
||||
0x34 0x0034 # DIGIT FOUR
|
||||
0x35 0x0035 # DIGIT FIVE
|
||||
0x36 0x0036 # DIGIT SIX
|
||||
0x37 0x0037 # DIGIT SEVEN
|
||||
0x38 0x0038 # DIGIT EIGHT
|
||||
0x39 0x0039 # DIGIT NINE
|
||||
0x3A <LR>+0x003A # COLON, left-right
|
||||
0x3B <LR>+0x003B # SEMICOLON, left-right
|
||||
0x3C <LR>+0x003C # LESS-THAN SIGN, left-right
|
||||
0x3D <LR>+0x003D # EQUALS SIGN, left-right
|
||||
0x3E <LR>+0x003E # GREATER-THAN SIGN, left-right
|
||||
0x3F <LR>+0x003F # QUESTION MARK, left-right
|
||||
0x40 0x0040 # COMMERCIAL AT
|
||||
0x41 0x0041 # LATIN CAPITAL LETTER A
|
||||
0x42 0x0042 # LATIN CAPITAL LETTER B
|
||||
0x43 0x0043 # LATIN CAPITAL LETTER C
|
||||
0x44 0x0044 # LATIN CAPITAL LETTER D
|
||||
0x45 0x0045 # LATIN CAPITAL LETTER E
|
||||
0x46 0x0046 # LATIN CAPITAL LETTER F
|
||||
0x47 0x0047 # LATIN CAPITAL LETTER G
|
||||
0x48 0x0048 # LATIN CAPITAL LETTER H
|
||||
0x49 0x0049 # LATIN CAPITAL LETTER I
|
||||
0x4A 0x004A # LATIN CAPITAL LETTER J
|
||||
0x4B 0x004B # LATIN CAPITAL LETTER K
|
||||
0x4C 0x004C # LATIN CAPITAL LETTER L
|
||||
0x4D 0x004D # LATIN CAPITAL LETTER M
|
||||
0x4E 0x004E # LATIN CAPITAL LETTER N
|
||||
0x4F 0x004F # LATIN CAPITAL LETTER O
|
||||
0x50 0x0050 # LATIN CAPITAL LETTER P
|
||||
0x51 0x0051 # LATIN CAPITAL LETTER Q
|
||||
0x52 0x0052 # LATIN CAPITAL LETTER R
|
||||
0x53 0x0053 # LATIN CAPITAL LETTER S
|
||||
0x54 0x0054 # LATIN CAPITAL LETTER T
|
||||
0x55 0x0055 # LATIN CAPITAL LETTER U
|
||||
0x56 0x0056 # LATIN CAPITAL LETTER V
|
||||
0x57 0x0057 # LATIN CAPITAL LETTER W
|
||||
0x58 0x0058 # LATIN CAPITAL LETTER X
|
||||
0x59 0x0059 # LATIN CAPITAL LETTER Y
|
||||
0x5A 0x005A # LATIN CAPITAL LETTER Z
|
||||
0x5B <LR>+0x005B # LEFT SQUARE BRACKET, left-right
|
||||
0x5C 0x005C # REVERSE SOLIDUS
|
||||
0x5D <LR>+0x005D # RIGHT SQUARE BRACKET, left-right
|
||||
0x5E 0x005E # CIRCUMFLEX ACCENT
|
||||
0x5F 0x005F # LOW LINE
|
||||
0x60 0x0060 # GRAVE ACCENT
|
||||
0x61 0x0061 # LATIN SMALL LETTER A
|
||||
0x62 0x0062 # LATIN SMALL LETTER B
|
||||
0x63 0x0063 # LATIN SMALL LETTER C
|
||||
0x64 0x0064 # LATIN SMALL LETTER D
|
||||
0x65 0x0065 # LATIN SMALL LETTER E
|
||||
0x66 0x0066 # LATIN SMALL LETTER F
|
||||
0x67 0x0067 # LATIN SMALL LETTER G
|
||||
0x68 0x0068 # LATIN SMALL LETTER H
|
||||
0x69 0x0069 # LATIN SMALL LETTER I
|
||||
0x6A 0x006A # LATIN SMALL LETTER J
|
||||
0x6B 0x006B # LATIN SMALL LETTER K
|
||||
0x6C 0x006C # LATIN SMALL LETTER L
|
||||
0x6D 0x006D # LATIN SMALL LETTER M
|
||||
0x6E 0x006E # LATIN SMALL LETTER N
|
||||
0x6F 0x006F # LATIN SMALL LETTER O
|
||||
0x70 0x0070 # LATIN SMALL LETTER P
|
||||
0x71 0x0071 # LATIN SMALL LETTER Q
|
||||
0x72 0x0072 # LATIN SMALL LETTER R
|
||||
0x73 0x0073 # LATIN SMALL LETTER S
|
||||
0x74 0x0074 # LATIN SMALL LETTER T
|
||||
0x75 0x0075 # LATIN SMALL LETTER U
|
||||
0x76 0x0076 # LATIN SMALL LETTER V
|
||||
0x77 0x0077 # LATIN SMALL LETTER W
|
||||
0x78 0x0078 # LATIN SMALL LETTER X
|
||||
0x79 0x0079 # LATIN SMALL LETTER Y
|
||||
0x7A 0x007A # LATIN SMALL LETTER Z
|
||||
0x7B <LR>+0x007B # LEFT CURLY BRACKET, left-right
|
||||
0x7C <LR>+0x007C # VERTICAL LINE, left-right
|
||||
0x7D <LR>+0x007D # RIGHT CURLY BRACKET, left-right
|
||||
0x7E 0x007E # TILDE
|
||||
#
|
||||
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
|
||||
0x81 0x05F2+0x05B7 # HEBREW LIGATURE YIDDISH YOD YOD PATAH
|
||||
0x82 0x00C7 # LATIN CAPITAL LETTER C WITH CEDILLA
|
||||
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
|
||||
0x84 0x00D1 # LATIN CAPITAL LETTER N WITH TILDE
|
||||
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
|
||||
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
|
||||
0x87 0x00E1 # LATIN SMALL LETTER A WITH ACUTE
|
||||
0x88 0x00E0 # LATIN SMALL LETTER A WITH GRAVE
|
||||
0x89 0x00E2 # LATIN SMALL LETTER A WITH CIRCUMFLEX
|
||||
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
|
||||
0x8B 0x00E3 # LATIN SMALL LETTER A WITH TILDE
|
||||
0x8C 0x00E5 # LATIN SMALL LETTER A WITH RING ABOVE
|
||||
0x8D 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
|
||||
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
|
||||
0x8F 0x00E8 # LATIN SMALL LETTER E WITH GRAVE
|
||||
0x90 0x00EA # LATIN SMALL LETTER E WITH CIRCUMFLEX
|
||||
0x91 0x00EB # LATIN SMALL LETTER E WITH DIAERESIS
|
||||
0x92 0x00ED # LATIN SMALL LETTER I WITH ACUTE
|
||||
0x93 0x00EC # LATIN SMALL LETTER I WITH GRAVE
|
||||
0x94 0x00EE # LATIN SMALL LETTER I WITH CIRCUMFLEX
|
||||
0x95 0x00EF # LATIN SMALL LETTER I WITH DIAERESIS
|
||||
0x96 0x00F1 # LATIN SMALL LETTER N WITH TILDE
|
||||
0x97 0x00F3 # LATIN SMALL LETTER O WITH ACUTE
|
||||
0x98 0x00F2 # LATIN SMALL LETTER O WITH GRAVE
|
||||
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
|
||||
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
|
||||
0x9B 0x00F5 # LATIN SMALL LETTER O WITH TILDE
|
||||
0x9C 0x00FA # LATIN SMALL LETTER U WITH ACUTE
|
||||
0x9D 0x00F9 # LATIN SMALL LETTER U WITH GRAVE
|
||||
0x9E 0x00FB # LATIN SMALL LETTER U WITH CIRCUMFLEX
|
||||
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
|
||||
0xA0 <RL>+0x0020 # SPACE, right-left
|
||||
0xA1 <RL>+0x0021 # EXCLAMATION MARK, right-left
|
||||
0xA2 <RL>+0x0022 # QUOTATION MARK, right-left
|
||||
0xA3 <RL>+0x0023 # NUMBER SIGN, right-left
|
||||
0xA4 <RL>+0x0024 # DOLLAR SIGN, right-left
|
||||
0xA5 <RL>+0x0025 # PERCENT SIGN, right-left
|
||||
0xA6 0x20AA # NEW SHEQEL SIGN
|
||||
0xA7 <RL>+0x0027 # APOSTROPHE, right-left
|
||||
0xA8 <RL>+0x0029 # RIGHT PARENTHESIS, right-left # close parenthesis
|
||||
0xA9 <RL>+0x0028 # LEFT PARENTHESIS, right-left # open parenthesis
|
||||
0xAA <RL>+0x002A # ASTERISK, right-left
|
||||
0xAB <RL>+0x002B # PLUS SIGN, right-left
|
||||
0xAC <RL>+0x002C # COMMA, right-left
|
||||
0xAD <RL>+0x002D # HYPHEN-MINUS, right-left
|
||||
0xAE <RL>+0x002E # FULL STOP, right-left
|
||||
0xAF <RL>+0x002F # SOLIDUS, right-left
|
||||
0xB0 <RL>+0x0030 # DIGIT ZERO, right-left (need override)
|
||||
0xB1 <RL>+0x0031 # DIGIT ONE, right-left (need override)
|
||||
0xB2 <RL>+0x0032 # DIGIT TWO, right-left (need override)
|
||||
0xB3 <RL>+0x0033 # DIGIT THREE, right-left (need override)
|
||||
0xB4 <RL>+0x0034 # DIGIT FOUR, right-left (need override)
|
||||
0xB5 <RL>+0x0035 # DIGIT FIVE, right-left (need override)
|
||||
0xB6 <RL>+0x0036 # DIGIT SIX, right-left (need override)
|
||||
0xB7 <RL>+0x0037 # DIGIT SEVEN, right-left (need override)
|
||||
0xB8 <RL>+0x0038 # DIGIT EIGHT, right-left (need override)
|
||||
0xB9 <RL>+0x0039 # DIGIT NINE, right-left (need override)
|
||||
0xBA <RL>+0x003A # COLON, right-left
|
||||
0xBB <RL>+0x003B # SEMICOLON, right-left
|
||||
0xBC <RL>+0x003C # LESS-THAN SIGN, right-left
|
||||
0xBD <RL>+0x003D # EQUALS SIGN, right-left
|
||||
0xBE <RL>+0x003E # GREATER-THAN SIGN, right-left
|
||||
0xBF <RL>+0x003F # QUESTION MARK, right-left
|
||||
0xC0 0xF86A+0x05DC+0x05B9 # Hebrew ligature lamed holam
|
||||
0xC1 <RL>+0x201E # DOUBLE LOW-9 QUOTATION MARK, right-left
|
||||
0xC2 0xF89B # Hebrew canoral 1
|
||||
0xC3 0xF89C # Hebrew canoral 2
|
||||
0xC4 0xF89D # Hebrew canoral 3
|
||||
0xC5 0xF89E # Hebrew canoral 4
|
||||
0xC6 0x05BC # HEBREW POINT DAGESH OR MAPIQ
|
||||
0xC7 0xFB4B # HEBREW LETTER VAV WITH HOLAM
|
||||
0xC8 0xFB35 # HEBREW LETTER VAV WITH DAGESH
|
||||
0xC9 <RL>+0x2026 # HORIZONTAL ELLIPSIS, right-left
|
||||
0xCA <RL>+0x00A0 # NO-BREAK SPACE, right-left
|
||||
0xCB 0x05B8 # HEBREW POINT QAMATS
|
||||
0xCC 0x05B7 # HEBREW POINT PATAH
|
||||
0xCD 0x05B5 # HEBREW POINT TSERE
|
||||
0xCE 0x05B6 # HEBREW POINT SEGOL
|
||||
0xCF 0x05B4 # HEBREW POINT HIRIQ
|
||||
0xD0 <RL>+0x2013 # EN DASH, right-left
|
||||
0xD1 <RL>+0x2014 # EM DASH, right-left
|
||||
0xD2 <RL>+0x201C # LEFT DOUBLE QUOTATION MARK, right-left
|
||||
0xD3 <RL>+0x201D # RIGHT DOUBLE QUOTATION MARK, right-left
|
||||
0xD4 <RL>+0x2018 # LEFT SINGLE QUOTATION MARK, right-left
|
||||
0xD5 <RL>+0x2019 # RIGHT SINGLE QUOTATION MARK, right-left
|
||||
0xD6 0xFB2A # HEBREW LETTER SHIN WITH SHIN DOT
|
||||
0xD7 0xFB2B # HEBREW LETTER SHIN WITH SIN DOT
|
||||
0xD8 0x05BF # HEBREW POINT RAFE
|
||||
0xD9 0x05B0 # HEBREW POINT SHEVA
|
||||
0xDA 0x05B2 # HEBREW POINT HATAF PATAH
|
||||
0xDB 0x05B1 # HEBREW POINT HATAF SEGOL
|
||||
0xDC 0x05BB # HEBREW POINT QUBUTS
|
||||
0xDD 0x05B9 # HEBREW POINT HOLAM
|
||||
0xDE 0x05B8+0xF87F # HEBREW POINT QAMATS, alternate form "qamats qatan"
|
||||
0xDF 0x05B3 # HEBREW POINT HATAF QAMATS
|
||||
0xE0 0x05D0 # HEBREW LETTER ALEF
|
||||
0xE1 0x05D1 # HEBREW LETTER BET
|
||||
0xE2 0x05D2 # HEBREW LETTER GIMEL
|
||||
0xE3 0x05D3 # HEBREW LETTER DALET
|
||||
0xE4 0x05D4 # HEBREW LETTER HE
|
||||
0xE5 0x05D5 # HEBREW LETTER VAV
|
||||
0xE6 0x05D6 # HEBREW LETTER ZAYIN
|
||||
0xE7 0x05D7 # HEBREW LETTER HET
|
||||
0xE8 0x05D8 # HEBREW LETTER TET
|
||||
0xE9 0x05D9 # HEBREW LETTER YOD
|
||||
0xEA 0x05DA # HEBREW LETTER FINAL KAF
|
||||
0xEB 0x05DB # HEBREW LETTER KAF
|
||||
0xEC 0x05DC # HEBREW LETTER LAMED
|
||||
0xED 0x05DD # HEBREW LETTER FINAL MEM
|
||||
0xEE 0x05DE # HEBREW LETTER MEM
|
||||
0xEF 0x05DF # HEBREW LETTER FINAL NUN
|
||||
0xF0 0x05E0 # HEBREW LETTER NUN
|
||||
0xF1 0x05E1 # HEBREW LETTER SAMEKH
|
||||
0xF2 0x05E2 # HEBREW LETTER AYIN
|
||||
0xF3 0x05E3 # HEBREW LETTER FINAL PE
|
||||
0xF4 0x05E4 # HEBREW LETTER PE
|
||||
0xF5 0x05E5 # HEBREW LETTER FINAL TSADI
|
||||
0xF6 0x05E6 # HEBREW LETTER TSADI
|
||||
0xF7 0x05E7 # HEBREW LETTER QOF
|
||||
0xF8 0x05E8 # HEBREW LETTER RESH
|
||||
0xF9 0x05E9 # HEBREW LETTER SHIN
|
||||
0xFA 0x05EA # HEBREW LETTER TAV
|
||||
0xFB <RL>+0x007D # RIGHT CURLY BRACKET, right-left
|
||||
0xFC <RL>+0x005D # RIGHT SQUARE BRACKET, right-left
|
||||
0xFD <RL>+0x007B # LEFT CURLY BRACKET, right-left
|
||||
0xFE <RL>+0x005B # LEFT SQUARE BRACKET, right-left
|
||||
0xFF <RL>+0x007C # VERTICAL LINE, right-left
|
369
charmap/ICELAND.TXT
Normal file
@ -0,0 +1,369 @@
|
||||
#=======================================================================
|
||||
# File name: ICELAND.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Icelandic
|
||||
# character set to Unicode 2.1 and later.
|
||||
#
|
||||
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-05 Update header comments. Matches internal xml
|
||||
# <c1.1> and Text Encoding Converter 2.0.
|
||||
# b3,c1 2002-Dec-19 Update URLs, notes. Matches internal
|
||||
# utom<b3>.
|
||||
# b02 1999-Sep-22 Encoding changed for Mac OS 8.5; change
|
||||
# mapping of 0xDB from CURRENCY SIGN to EURO
|
||||
# SIGN. Update contact e-mail address. Matches
|
||||
# internal utom<b2>, ufrm<b2>, and Text
|
||||
# Encoding Converter version 1.5.
|
||||
# n06 1998-Feb-05 Minor update to header comments, add
|
||||
# information on font variants
|
||||
# n03 1997-Dec-14 Update to match internal utom<n4>, ufrm<n16>:
|
||||
# Change standard mapping for 0xBD from U+2126
|
||||
# to its canonical decomposition, U+03A9.
|
||||
# n02 1995-Apr-15 First version (after fixing some typos).
|
||||
# Matches internal ufrm<n5>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Icelandic code (in hex as 0xNN)
|
||||
# Column #2 is the corresponding Unicode (in hex as 0xNNNN)
|
||||
# Column #3 is a comment containing the Unicode name
|
||||
#
|
||||
# The entries are in Mac OS Icelandic code order.
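# As an illustration of the format just described, a minimal parser
# sketch follows. parse_line is an invented name. It splits on any
# whitespace rather than strictly on tabs (an assumption, so that copies
# with tabs converted to spaces still parse), and it also accepts the
# "<LR>+" / "<RL>+" tags and "+"-joined sequences used by some of the
# other Apple tables.

```python
def parse_line(line: str):
    """Return (mac_code, direction_tag_or_None, unicode_string),
    or None for blank and comment-only lines."""
    body = line.split("#", 1)[0].strip()  # '#' starts a comment
    if not body:
        return None
    mac, uni = body.split()[:2]
    parts = uni.split("+")
    tag = None
    if parts[0].startswith("<"):  # optional direction tag, e.g. <RL>
        tag = parts.pop(0).strip("<>")
    chars = "".join(chr(int(part, 16)) for part in parts)
    return int(mac, 16), tag, chars
```

# For example, parse_line("0x20\t0x0020\t# SPACE") yields the code 0x20,
# no direction tag, and a one-character string containing U+0020.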
|
||||
#
|
||||
# One of these mappings requires the use of a corporate character.
|
||||
# See the file "CORPCHAR.TXT" and notes below.
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Icelandic character set uses the standard control characters
|
||||
# at 0x00-0x1F and 0x7F.
|
||||
#
|
||||
# Notes on Mac OS Icelandic:
|
||||
# --------------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported via transcoding to and from
|
||||
# Unicode.
|
||||
#
|
||||
# 1. General
|
||||
#
|
||||
# Mac OS Icelandic is used for Icelandic and Faroese.
|
||||
#
|
||||
# The Mac OS Icelandic encoding shares the script code smRoman
|
||||
# (0) with the standard Mac OS Roman encoding. To determine if
|
||||
# the Icelandic encoding is being used, you must also check if
|
||||
# the system region code is 21, verIceland.
|
||||
#
|
||||
# This character set is a variant of standard Mac OS Roman,
|
||||
# adding upper and lower eth, thorn, and Y acute. It has 6 code
|
||||
# point differences from standard Mac OS Roman.
|
||||
#
|
||||
# Before Mac OS 8.5, code point 0xDB was CURRENCY SIGN, and was
|
||||
# mapped to U+00A4. In Mac OS 8.5 and later versions, code point
|
||||
# 0xDB is changed to EURO SIGN and maps to U+20AC; the standard
|
||||
# Apple fonts are updated for Mac OS 8.5 to reflect this. There are
|
||||
# "currency sign" variants of the Mac OS Icelandic encoding that
|
||||
# still map 0xDB to U+00A4; these can be used for older fonts.
|
||||
#
|
||||
# 2. Font variants
|
||||
#
|
||||
# The table in this file gives the Unicode mappings for the standard
|
||||
# Mac OS Icelandic encoding. This encoding is supported by the
|
||||
# Icelandic versions of the fonts Chicago, Geneva, Monaco, and New
|
||||
# York, and is the encoding supported by the text processing
|
||||
# utilities. However, other TrueType fonts implement a slightly
|
||||
# different encoding; the difference is only in two code points.
|
||||
# For the standard variant, these are:
|
||||
# 0xBB -> 0x00AA FEMININE ORDINAL INDICATOR
|
||||
# 0xBC -> 0x00BA MASCULINE ORDINAL INDICATOR
|
||||
#
|
||||
# For the TrueType variant (used by the Icelandic versions of the
|
||||
# fonts Courier, Helvetica, Palatino, and Times), these are:
|
||||
# 0xBB -> 0xFB01 LATIN SMALL LIGATURE FI
|
||||
# 0xBC -> 0xFB02 LATIN SMALL LIGATURE FL
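# The variants described in these notes can be modelled as small override
# tables layered on the standard mapping. This is only a sketch: the
# dictionaries hold a hypothetical subset of the table, and build_table
# is an invented helper.

```python
# Standard Mac OS Icelandic mappings (hypothetical subset)
STANDARD = {
    0xBB: "\u00AA",  # FEMININE ORDINAL INDICATOR
    0xBC: "\u00BA",  # MASCULINE ORDINAL INDICATOR
    0xDB: "\u20AC",  # EURO SIGN (Mac OS 8.5 and later)
}

# TrueType font variant: two code points differ
TRUETYPE_OVERRIDES = {
    0xBB: "\uFB01",  # LATIN SMALL LIGATURE FI
    0xBC: "\uFB02",  # LATIN SMALL LIGATURE FL
}

# "Currency sign" variant for fonts older than Mac OS 8.5
CURRENCY_OVERRIDES = {
    0xDB: "\u00A4",  # CURRENCY SIGN
}


def build_table(*overrides: dict) -> dict:
    """Copy the standard table and apply each override in order."""
    table = dict(STANDARD)
    for override in overrides:
        table.update(override)
    return table
```

# Calling build_table(TRUETYPE_OVERRIDES, CURRENCY_OVERRIDES) would give
# a table for an older TrueType font; build_table() with no arguments
# gives the standard encoding.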
|
||||
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# The following corporate zone Unicode character is used in this
|
||||
# mapping:
|
||||
#
|
||||
# 0xF8FF Apple logo
|
||||
#
|
||||
# NOTE: The graphic image associated with the Apple logo character
|
||||
# is not authorized for use without permission of Apple, and
|
||||
# unauthorized use might constitute trademark infringement.
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
# Changes from version n06 to version b02:
|
||||
#
|
||||
# - Encoding changed for Mac OS 8.5; change mapping of 0xDB from
|
||||
# CURRENCY SIGN (U+00A4) to EURO SIGN (U+20AC).
|
||||
#
|
||||
# Changes from version n02 to version n03:
|
||||
#
|
||||
# - Change mapping of 0xBD from U+2126 to its canonical
|
||||
# decomposition, U+03A9.
|
||||
#
|
||||
##################
|
||||
|
||||
0x20 0x0020 # SPACE
|
||||
0x21 0x0021 # EXCLAMATION MARK
|
||||
0x22 0x0022 # QUOTATION MARK
|
||||
0x23 0x0023 # NUMBER SIGN
|
||||
0x24 0x0024 # DOLLAR SIGN
|
||||
0x25 0x0025 # PERCENT SIGN
|
||||
0x26 0x0026 # AMPERSAND
|
||||
0x27 0x0027 # APOSTROPHE
|
||||
0x28 0x0028 # LEFT PARENTHESIS
|
||||
0x29 0x0029 # RIGHT PARENTHESIS
|
||||
0x2A 0x002A # ASTERISK
|
||||
0x2B 0x002B # PLUS SIGN
|
||||
0x2C 0x002C # COMMA
|
||||
0x2D 0x002D # HYPHEN-MINUS
|
||||
0x2E 0x002E # FULL STOP
|
||||
0x2F 0x002F # SOLIDUS
|
||||
0x30 0x0030 # DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO
|
||||
0x33 0x0033 # DIGIT THREE
|
||||
0x34 0x0034 # DIGIT FOUR
|
||||
0x35 0x0035 # DIGIT FIVE
|
||||
0x36 0x0036 # DIGIT SIX
|
||||
0x37 0x0037 # DIGIT SEVEN
|
||||
0x38 0x0038 # DIGIT EIGHT
|
||||
0x39 0x0039 # DIGIT NINE
|
||||
0x3A 0x003A # COLON
|
||||
0x3B 0x003B # SEMICOLON
|
||||
0x3C 0x003C # LESS-THAN SIGN
|
||||
0x3D 0x003D # EQUALS SIGN
|
||||
0x3E 0x003E # GREATER-THAN SIGN
|
||||
0x3F 0x003F # QUESTION MARK
|
||||
0x40 0x0040 # COMMERCIAL AT
|
||||
0x41 0x0041 # LATIN CAPITAL LETTER A
|
||||
0x42 0x0042 # LATIN CAPITAL LETTER B
|
||||
0x43 0x0043 # LATIN CAPITAL LETTER C
|
||||
0x44 0x0044 # LATIN CAPITAL LETTER D
|
||||
0x45 0x0045 # LATIN CAPITAL LETTER E
|
||||
0x46 0x0046 # LATIN CAPITAL LETTER F
|
||||
0x47 0x0047 # LATIN CAPITAL LETTER G
|
||||
0x48 0x0048 # LATIN CAPITAL LETTER H
|
||||
0x49 0x0049 # LATIN CAPITAL LETTER I
|
||||
0x4A 0x004A # LATIN CAPITAL LETTER J
|
||||
0x4B 0x004B # LATIN CAPITAL LETTER K
|
||||
0x4C 0x004C # LATIN CAPITAL LETTER L
|
||||
0x4D 0x004D # LATIN CAPITAL LETTER M
|
||||
0x4E 0x004E # LATIN CAPITAL LETTER N
|
||||
0x4F 0x004F # LATIN CAPITAL LETTER O
|
||||
0x50 0x0050 # LATIN CAPITAL LETTER P
|
||||
0x51 0x0051 # LATIN CAPITAL LETTER Q
|
||||
0x52 0x0052 # LATIN CAPITAL LETTER R
|
||||
0x53 0x0053 # LATIN CAPITAL LETTER S
|
||||
0x54 0x0054 # LATIN CAPITAL LETTER T
|
||||
0x55 0x0055 # LATIN CAPITAL LETTER U
|
||||
0x56 0x0056 # LATIN CAPITAL LETTER V
|
||||
0x57 0x0057 # LATIN CAPITAL LETTER W
|
||||
0x58 0x0058 # LATIN CAPITAL LETTER X
|
||||
0x59 0x0059 # LATIN CAPITAL LETTER Y
|
||||
0x5A 0x005A # LATIN CAPITAL LETTER Z
|
||||
0x5B 0x005B # LEFT SQUARE BRACKET
|
||||
0x5C 0x005C # REVERSE SOLIDUS
|
||||
0x5D 0x005D # RIGHT SQUARE BRACKET
|
||||
0x5E 0x005E # CIRCUMFLEX ACCENT
|
||||
0x5F 0x005F # LOW LINE
|
||||
0x60 0x0060 # GRAVE ACCENT
|
||||
0x61 0x0061 # LATIN SMALL LETTER A
|
||||
0x62 0x0062 # LATIN SMALL LETTER B
|
||||
0x63 0x0063 # LATIN SMALL LETTER C
|
||||
0x64 0x0064 # LATIN SMALL LETTER D
|
||||
0x65 0x0065 # LATIN SMALL LETTER E
|
||||
0x66 0x0066 # LATIN SMALL LETTER F
|
||||
0x67 0x0067 # LATIN SMALL LETTER G
|
||||
0x68 0x0068 # LATIN SMALL LETTER H
|
||||
0x69 0x0069 # LATIN SMALL LETTER I
|
||||
0x6A 0x006A # LATIN SMALL LETTER J
|
||||
0x6B 0x006B # LATIN SMALL LETTER K
|
||||
0x6C 0x006C # LATIN SMALL LETTER L
|
||||
0x6D 0x006D # LATIN SMALL LETTER M
|
||||
0x6E 0x006E # LATIN SMALL LETTER N
|
||||
0x6F 0x006F # LATIN SMALL LETTER O
|
||||
0x70 0x0070 # LATIN SMALL LETTER P
|
||||
0x71 0x0071 # LATIN SMALL LETTER Q
|
||||
0x72 0x0072 # LATIN SMALL LETTER R
|
||||
0x73 0x0073 # LATIN SMALL LETTER S
|
||||
0x74 0x0074 # LATIN SMALL LETTER T
|
||||
0x75 0x0075 # LATIN SMALL LETTER U
|
||||
0x76 0x0076 # LATIN SMALL LETTER V
|
||||
0x77 0x0077 # LATIN SMALL LETTER W
|
||||
0x78 0x0078 # LATIN SMALL LETTER X
|
||||
0x79 0x0079 # LATIN SMALL LETTER Y
|
||||
0x7A 0x007A # LATIN SMALL LETTER Z
|
||||
0x7B 0x007B # LEFT CURLY BRACKET
|
||||
0x7C 0x007C # VERTICAL LINE
|
||||
0x7D 0x007D # RIGHT CURLY BRACKET
|
||||
0x7E 0x007E # TILDE
|
||||
#
|
||||
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
|
||||
0x81 0x00C5 # LATIN CAPITAL LETTER A WITH RING ABOVE
|
||||
0x82 0x00C7 # LATIN CAPITAL LETTER C WITH CEDILLA
|
||||
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
|
||||
0x84 0x00D1 # LATIN CAPITAL LETTER N WITH TILDE
|
||||
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
|
||||
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
|
||||
0x87 0x00E1 # LATIN SMALL LETTER A WITH ACUTE
|
||||
0x88 0x00E0 # LATIN SMALL LETTER A WITH GRAVE
|
||||
0x89 0x00E2 # LATIN SMALL LETTER A WITH CIRCUMFLEX
|
||||
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
|
||||
0x8B 0x00E3 # LATIN SMALL LETTER A WITH TILDE
|
||||
0x8C 0x00E5 # LATIN SMALL LETTER A WITH RING ABOVE
|
||||
0x8D 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
|
||||
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
|
||||
0x8F 0x00E8 # LATIN SMALL LETTER E WITH GRAVE
|
||||
0x90 0x00EA # LATIN SMALL LETTER E WITH CIRCUMFLEX
|
||||
0x91 0x00EB # LATIN SMALL LETTER E WITH DIAERESIS
|
||||
0x92 0x00ED # LATIN SMALL LETTER I WITH ACUTE
|
||||
0x93 0x00EC # LATIN SMALL LETTER I WITH GRAVE
|
||||
0x94 0x00EE # LATIN SMALL LETTER I WITH CIRCUMFLEX
|
||||
0x95 0x00EF # LATIN SMALL LETTER I WITH DIAERESIS
|
||||
0x96 0x00F1 # LATIN SMALL LETTER N WITH TILDE
|
||||
0x97 0x00F3 # LATIN SMALL LETTER O WITH ACUTE
|
||||
0x98 0x00F2 # LATIN SMALL LETTER O WITH GRAVE
|
||||
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
|
||||
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
|
||||
0x9B 0x00F5 # LATIN SMALL LETTER O WITH TILDE
|
||||
0x9C 0x00FA # LATIN SMALL LETTER U WITH ACUTE
|
||||
0x9D 0x00F9 # LATIN SMALL LETTER U WITH GRAVE
|
||||
0x9E 0x00FB # LATIN SMALL LETTER U WITH CIRCUMFLEX
|
||||
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
|
||||
0xA0 0x00DD # LATIN CAPITAL LETTER Y WITH ACUTE
|
||||
0xA1 0x00B0 # DEGREE SIGN
|
||||
0xA2 0x00A2 # CENT SIGN
|
||||
0xA3 0x00A3 # POUND SIGN
|
||||
0xA4 0x00A7 # SECTION SIGN
|
||||
0xA5 0x2022 # BULLET
|
||||
0xA6 0x00B6 # PILCROW SIGN
|
||||
0xA7 0x00DF # LATIN SMALL LETTER SHARP S
|
||||
0xA8 0x00AE # REGISTERED SIGN
|
||||
0xA9 0x00A9 # COPYRIGHT SIGN
|
||||
0xAA 0x2122 # TRADE MARK SIGN
|
||||
0xAB 0x00B4 # ACUTE ACCENT
|
||||
0xAC 0x00A8 # DIAERESIS
|
||||
0xAD 0x2260 # NOT EQUAL TO
|
||||
0xAE 0x00C6 # LATIN CAPITAL LETTER AE
|
||||
0xAF 0x00D8 # LATIN CAPITAL LETTER O WITH STROKE
|
||||
0xB0 0x221E # INFINITY
|
||||
0xB1 0x00B1 # PLUS-MINUS SIGN
|
||||
0xB2 0x2264 # LESS-THAN OR EQUAL TO
|
||||
0xB3 0x2265 # GREATER-THAN OR EQUAL TO
|
||||
0xB4 0x00A5 # YEN SIGN
|
||||
0xB5 0x00B5 # MICRO SIGN
|
||||
0xB6 0x2202 # PARTIAL DIFFERENTIAL
|
||||
0xB7 0x2211 # N-ARY SUMMATION
|
||||
0xB8 0x220F # N-ARY PRODUCT
|
||||
0xB9 0x03C0 # GREEK SMALL LETTER PI
|
||||
0xBA 0x222B # INTEGRAL
|
||||
0xBB 0x00AA # FEMININE ORDINAL INDICATOR
|
||||
0xBC 0x00BA # MASCULINE ORDINAL INDICATOR
|
||||
0xBD 0x03A9 # GREEK CAPITAL LETTER OMEGA
|
||||
0xBE 0x00E6 # LATIN SMALL LETTER AE
|
||||
0xBF 0x00F8 # LATIN SMALL LETTER O WITH STROKE
|
||||
0xC0 0x00BF # INVERTED QUESTION MARK
|
||||
0xC1 0x00A1 # INVERTED EXCLAMATION MARK
|
||||
0xC2 0x00AC # NOT SIGN
|
||||
0xC3 0x221A # SQUARE ROOT
|
||||
0xC4 0x0192 # LATIN SMALL LETTER F WITH HOOK
|
||||
0xC5 0x2248 # ALMOST EQUAL TO
|
||||
0xC6 0x2206 # INCREMENT
|
||||
0xC7 0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
|
||||
0xC8 0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
|
||||
0xC9 0x2026 # HORIZONTAL ELLIPSIS
|
||||
0xCA 0x00A0 # NO-BREAK SPACE
|
||||
0xCB 0x00C0 # LATIN CAPITAL LETTER A WITH GRAVE
|
||||
0xCC 0x00C3 # LATIN CAPITAL LETTER A WITH TILDE
|
||||
0xCD 0x00D5 # LATIN CAPITAL LETTER O WITH TILDE
|
||||
0xCE 0x0152 # LATIN CAPITAL LIGATURE OE
|
||||
0xCF 0x0153 # LATIN SMALL LIGATURE OE
|
||||
0xD0 0x2013 # EN DASH
|
||||
0xD1 0x2014 # EM DASH
|
||||
0xD2 0x201C # LEFT DOUBLE QUOTATION MARK
|
||||
0xD3 0x201D # RIGHT DOUBLE QUOTATION MARK
|
||||
0xD4 0x2018 # LEFT SINGLE QUOTATION MARK
|
||||
0xD5 0x2019 # RIGHT SINGLE QUOTATION MARK
|
||||
0xD6 0x00F7 # DIVISION SIGN
|
||||
0xD7 0x25CA # LOZENGE
|
||||
0xD8 0x00FF # LATIN SMALL LETTER Y WITH DIAERESIS
|
||||
0xD9 0x0178 # LATIN CAPITAL LETTER Y WITH DIAERESIS
|
||||
0xDA 0x2044 # FRACTION SLASH
|
||||
0xDB 0x20AC # EURO SIGN
|
||||
0xDC 0x00D0 # LATIN CAPITAL LETTER ETH
|
||||
0xDD 0x00F0 # LATIN SMALL LETTER ETH
|
||||
0xDE 0x00DE # LATIN CAPITAL LETTER THORN
|
||||
0xDF 0x00FE # LATIN SMALL LETTER THORN
|
||||
0xE0 0x00FD # LATIN SMALL LETTER Y WITH ACUTE
|
||||
0xE1 0x00B7 # MIDDLE DOT
|
||||
0xE2 0x201A # SINGLE LOW-9 QUOTATION MARK
|
||||
0xE3 0x201E # DOUBLE LOW-9 QUOTATION MARK
|
||||
0xE4 0x2030 # PER MILLE SIGN
|
||||
0xE5 0x00C2 # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
|
||||
0xE6 0x00CA # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
|
||||
0xE7 0x00C1 # LATIN CAPITAL LETTER A WITH ACUTE
|
||||
0xE8 0x00CB # LATIN CAPITAL LETTER E WITH DIAERESIS
|
||||
0xE9 0x00C8 # LATIN CAPITAL LETTER E WITH GRAVE
|
||||
0xEA 0x00CD # LATIN CAPITAL LETTER I WITH ACUTE
|
||||
0xEB 0x00CE # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
|
||||
0xEC 0x00CF # LATIN CAPITAL LETTER I WITH DIAERESIS
|
||||
0xED 0x00CC # LATIN CAPITAL LETTER I WITH GRAVE
|
||||
0xEE 0x00D3 # LATIN CAPITAL LETTER O WITH ACUTE
|
||||
0xEF 0x00D4 # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
|
||||
0xF0 0xF8FF # Apple logo
|
||||
0xF1 0x00D2 # LATIN CAPITAL LETTER O WITH GRAVE
|
||||
0xF2 0x00DA # LATIN CAPITAL LETTER U WITH ACUTE
|
||||
0xF3 0x00DB # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
|
||||
0xF4 0x00D9 # LATIN CAPITAL LETTER U WITH GRAVE
|
||||
0xF5 0x0131 # LATIN SMALL LETTER DOTLESS I
|
||||
0xF6 0x02C6 # MODIFIER LETTER CIRCUMFLEX ACCENT
|
||||
0xF7 0x02DC # SMALL TILDE
|
||||
0xF8 0x00AF # MACRON
|
||||
0xF9 0x02D8 # BREVE
|
||||
0xFA 0x02D9 # DOT ABOVE
|
||||
0xFB 0x02DA # RING ABOVE
|
||||
0xFC 0x00B8 # CEDILLA
|
||||
0xFD 0x02DD # DOUBLE ACUTE ACCENT
|
||||
0xFE 0x02DB # OGONEK
|
||||
0xFF 0x02C7 # CARON
|
322
charmap/INUIT.TXT
Normal file
@ -0,0 +1,322 @@
|
||||
#=======================================================================
|
||||
# File name: INUIT.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Inuit
|
||||
# character set to Unicode 3.0 and later.
|
||||
#
|
||||
# Contacts: charsets@apple.com, everson@evertype.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c01 2005-Apr-01 First posted version. Matches internal xml
|
||||
# <c1.1> and Text Encoding Converter 2.0.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Inuit code (in hex as 0xNN)
|
||||
# Column #2 is the corresponding Unicode (in hex as 0xNNNN)
|
||||
# Column #3 is a comment containing the Unicode name
|
||||
#
|
||||
# The entries are in Mac OS Inuit code order.
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Inuit character set uses the standard control characters
|
||||
# at 0x00-0x1F and 0x7F.
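The format described above can be read with a few lines of code. The following is a minimal sketch, not Apple's or the repository's implementation: the function names `parse_mapping` and `with_controls` are hypothetical, and it assumes that the unlisted control characters map to the Unicode code points with the same values.

```python
def parse_mapping(text):
    """Parse 'code<TAB>unicode<TAB># name' lines into {Mac OS byte: code point}."""
    table = {}
    for line in text.splitlines():
        # '#' begins a comment which continues to the end of the line.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        code, unicode_hex = data.split()[:2]
        table[int(code, 16)] = int(unicode_hex, 16)
    return table


def with_controls(table):
    """Add the controls at 0x00-0x1F and 0x7F, which the table does not list.

    Assumption: each control maps to the code point of the same value.
    """
    full = {c: c for c in list(range(0x20)) + [0x7F]}
    full.update(table)
    return full


# Two lines copied from this table, used as sample input.
SAMPLE = (
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A\n"
    "0x80\t0x1403\t# CANADIAN SYLLABICS I\n"
)
table = with_controls(parse_mapping(SAMPLE))
```

Keeping the control-character step separate leaves `parse_mapping` usable for tables such as KEYBOARD.TXT, where the control ranges are listed explicitly.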
|
||||
#
|
||||
# Notes on Mac OS Inuit (partly from Michael Everson):
|
||||
# ----------------------------------------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported via transcoding to and from
|
||||
# Unicode.
|
||||
#
|
||||
# This character set was developed by Michael Everson of Everson
|
||||
# Typography (everson@evertype.com) and was used for the Inuktitut
|
||||
# localizations of Mac OS, as well as for the Inuktitut utilities
|
||||
# package from Everson Typography. Note that while Apple authorized
|
||||
# the Inuktitut localization mentioned above, it was not shipped with
|
||||
# Apple hardware, and was not otherwise supported by Apple. Fonts
|
||||
# conforming to the Mac OS Inuit character set are available from
|
||||
# Everson Typography (http://www.evertype.com/software/apple/).
|
||||
# Information about the use of this character set is available at
|
||||
# http://www.evertype.com/standards/iu/.
|
||||
#
|
||||
# The Mac OS Inuit character set shares the script code smEthiopic
|
||||
# (28) with the Ethiopic encoding. To determine if the Inuktitut
|
||||
# encoding is being used, you must also check if the system region
|
||||
# code is 78, verNunavut.
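The two-step check described above might look like the sketch below. The numeric values come from the note itself; the function name `encoding_for` and the returned labels are hypothetical and chosen only for illustration.

```python
SM_ETHIOPIC = 28  # script code shared by the Ethiopic and Inuit encodings
VER_NUNAVUT = 78  # system region code that selects the Inuit encoding


def encoding_for(script_code, region_code):
    """Return a label for the encoding in use, or None for other scripts."""
    if script_code != SM_ETHIOPIC:
        return None  # some other script; outside the scope of this sketch
    # The script code alone is ambiguous, so the region code decides.
    return "inuit" if region_code == VER_NUNAVUT else "ethiopic"
```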
|
||||
#
|
||||
# The Mac OS Inuit character set includes the full syllabic letter
|
||||
# repertoire required for Inuktitut; it is a subset of the Unified
|
||||
# Canadian Aboriginal Syllabics set encoded in Unicode. The encoding
|
||||
# is InuitSCII, designed by Doug Hitch for the Government of the
|
||||
# Northwest Territories.
|
||||
#
|
||||
# The Mac OS Inuit character set also includes a number of characters
|
||||
# that were needed for the classic Mac OS user interface and
|
||||
# localization (e.g. ellipsis, bullet, copyright sign). All of the
|
||||
# characters in Mac OS Inuit that are also in the Mac OS Roman
|
||||
# encoding are at the same code point in both; this improves
|
||||
# application compatibility.
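The compatibility claim above can be spot-checked. This sketch assumes that Python's built-in `mac_roman` codec agrees with the Mac OS Roman table for the characters tested; the excerpt is copied from this table, and `roman_conflicts` is a hypothetical helper name.

```python
# Excerpt of this table: shared punctuation plus two Inuit-only entries.
INUIT_EXCERPT = {
    0xA1: 0x00B0,  # DEGREE SIGN
    0xA5: 0x2022,  # BULLET
    0xA6: 0x00B6,  # PILCROW SIGN
    0xA9: 0x00A9,  # COPYRIGHT SIGN
    0xC9: 0x2026,  # HORIZONTAL ELLIPSIS
    0xCA: 0x00A0,  # NO-BREAK SPACE
    0xD0: 0x2013,  # EN DASH
    0x80: 0x1403,  # CANADIAN SYLLABICS I (not in Mac OS Roman)
    0xFE: 0x0141,  # LATIN CAPITAL LETTER L WITH STROKE (not in Mac OS Roman)
}


def roman_conflicts(table):
    """List codes whose character exists in Mac OS Roman at a different byte."""
    conflicts = []
    for code, code_point in sorted(table.items()):
        try:
            roman = chr(code_point).encode("mac_roman")
        except UnicodeEncodeError:
            continue  # character is not in Mac OS Roman, so nothing to compare
        if roman != bytes([code]):
            conflicts.append(code)
    return conflicts
```

An empty result for the excerpt is consistent with the statement that shared characters sit at the same code point in both encodings.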
|
||||
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
##################
|
||||
|
||||
0x20 0x0020 # SPACE
|
||||
0x21 0x0021 # EXCLAMATION MARK
|
||||
0x22 0x0022 # QUOTATION MARK
|
||||
0x23 0x0023 # NUMBER SIGN
|
||||
0x24 0x0024 # DOLLAR SIGN
|
||||
0x25 0x0025 # PERCENT SIGN
|
||||
0x26 0x0026 # AMPERSAND
|
||||
0x27 0x0027 # APOSTROPHE
|
||||
0x28 0x0028 # LEFT PARENTHESIS
|
||||
0x29 0x0029 # RIGHT PARENTHESIS
|
||||
0x2A 0x002A # ASTERISK
|
||||
0x2B 0x002B # PLUS SIGN
|
||||
0x2C 0x002C # COMMA
|
||||
0x2D 0x002D # HYPHEN-MINUS
|
||||
0x2E 0x002E # FULL STOP
|
||||
0x2F 0x002F # SOLIDUS
|
||||
0x30 0x0030 # DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO
|
||||
0x33 0x0033 # DIGIT THREE
|
||||
0x34 0x0034 # DIGIT FOUR
|
||||
0x35 0x0035 # DIGIT FIVE
|
||||
0x36 0x0036 # DIGIT SIX
|
||||
0x37 0x0037 # DIGIT SEVEN
|
||||
0x38 0x0038 # DIGIT EIGHT
|
||||
0x39 0x0039 # DIGIT NINE
|
||||
0x3A 0x003A # COLON
|
||||
0x3B 0x003B # SEMICOLON
|
||||
0x3C 0x003C # LESS-THAN SIGN
|
||||
0x3D 0x003D # EQUALS SIGN
|
||||
0x3E 0x003E # GREATER-THAN SIGN
|
||||
0x3F 0x003F # QUESTION MARK
|
||||
0x40 0x0040 # COMMERCIAL AT
|
||||
0x41 0x0041 # LATIN CAPITAL LETTER A
|
||||
0x42 0x0042 # LATIN CAPITAL LETTER B
|
||||
0x43 0x0043 # LATIN CAPITAL LETTER C
|
||||
0x44 0x0044 # LATIN CAPITAL LETTER D
|
||||
0x45 0x0045 # LATIN CAPITAL LETTER E
|
||||
0x46 0x0046 # LATIN CAPITAL LETTER F
|
||||
0x47 0x0047 # LATIN CAPITAL LETTER G
|
||||
0x48 0x0048 # LATIN CAPITAL LETTER H
|
||||
0x49 0x0049 # LATIN CAPITAL LETTER I
|
||||
0x4A 0x004A # LATIN CAPITAL LETTER J
|
||||
0x4B 0x004B # LATIN CAPITAL LETTER K
|
||||
0x4C 0x004C # LATIN CAPITAL LETTER L
|
||||
0x4D 0x004D # LATIN CAPITAL LETTER M
|
||||
0x4E 0x004E # LATIN CAPITAL LETTER N
|
||||
0x4F 0x004F # LATIN CAPITAL LETTER O
|
||||
0x50 0x0050 # LATIN CAPITAL LETTER P
|
||||
0x51 0x0051 # LATIN CAPITAL LETTER Q
|
||||
0x52 0x0052 # LATIN CAPITAL LETTER R
|
||||
0x53 0x0053 # LATIN CAPITAL LETTER S
|
||||
0x54 0x0054 # LATIN CAPITAL LETTER T
|
||||
0x55 0x0055 # LATIN CAPITAL LETTER U
|
||||
0x56 0x0056 # LATIN CAPITAL LETTER V
|
||||
0x57 0x0057 # LATIN CAPITAL LETTER W
|
||||
0x58 0x0058 # LATIN CAPITAL LETTER X
|
||||
0x59 0x0059 # LATIN CAPITAL LETTER Y
|
||||
0x5A 0x005A # LATIN CAPITAL LETTER Z
|
||||
0x5B 0x005B # LEFT SQUARE BRACKET
|
||||
0x5C 0x005C # REVERSE SOLIDUS
|
||||
0x5D 0x005D # RIGHT SQUARE BRACKET
|
||||
0x5E 0x005E # CIRCUMFLEX ACCENT
|
||||
0x5F 0x005F # LOW LINE
|
||||
0x60 0x0060 # GRAVE ACCENT
|
||||
0x61 0x0061 # LATIN SMALL LETTER A
|
||||
0x62 0x0062 # LATIN SMALL LETTER B
|
||||
0x63 0x0063 # LATIN SMALL LETTER C
|
||||
0x64 0x0064 # LATIN SMALL LETTER D
|
||||
0x65 0x0065 # LATIN SMALL LETTER E
|
||||
0x66 0x0066 # LATIN SMALL LETTER F
|
||||
0x67 0x0067 # LATIN SMALL LETTER G
|
||||
0x68 0x0068 # LATIN SMALL LETTER H
|
||||
0x69 0x0069 # LATIN SMALL LETTER I
|
||||
0x6A 0x006A # LATIN SMALL LETTER J
|
||||
0x6B 0x006B # LATIN SMALL LETTER K
|
||||
0x6C 0x006C # LATIN SMALL LETTER L
|
||||
0x6D 0x006D # LATIN SMALL LETTER M
|
||||
0x6E 0x006E # LATIN SMALL LETTER N
|
||||
0x6F 0x006F # LATIN SMALL LETTER O
|
||||
0x70 0x0070 # LATIN SMALL LETTER P
|
||||
0x71 0x0071 # LATIN SMALL LETTER Q
|
||||
0x72 0x0072 # LATIN SMALL LETTER R
|
||||
0x73 0x0073 # LATIN SMALL LETTER S
|
||||
0x74 0x0074 # LATIN SMALL LETTER T
|
||||
0x75 0x0075 # LATIN SMALL LETTER U
|
||||
0x76 0x0076 # LATIN SMALL LETTER V
|
||||
0x77 0x0077 # LATIN SMALL LETTER W
|
||||
0x78 0x0078 # LATIN SMALL LETTER X
|
||||
0x79 0x0079 # LATIN SMALL LETTER Y
|
||||
0x7A 0x007A # LATIN SMALL LETTER Z
|
||||
0x7B 0x007B # LEFT CURLY BRACKET
|
||||
0x7C 0x007C # VERTICAL LINE
|
||||
0x7D 0x007D # RIGHT CURLY BRACKET
|
||||
0x7E 0x007E # TILDE
|
||||
#
|
||||
0x80 0x1403 # CANADIAN SYLLABICS I
|
||||
0x81 0x1404 # CANADIAN SYLLABICS II
|
||||
0x82 0x1405 # CANADIAN SYLLABICS O
|
||||
0x83 0x1406 # CANADIAN SYLLABICS OO
|
||||
0x84 0x140A # CANADIAN SYLLABICS A
|
||||
0x85 0x140B # CANADIAN SYLLABICS AA
|
||||
0x86 0x1431 # CANADIAN SYLLABICS PI
|
||||
0x87 0x1432 # CANADIAN SYLLABICS PII
|
||||
0x88 0x1433 # CANADIAN SYLLABICS PO
|
||||
0x89 0x1434 # CANADIAN SYLLABICS POO
|
||||
0x8A 0x1438 # CANADIAN SYLLABICS PA
|
||||
0x8B 0x1439 # CANADIAN SYLLABICS PAA
|
||||
0x8C 0x1449 # CANADIAN SYLLABICS P
|
||||
0x8D 0x144E # CANADIAN SYLLABICS TI
|
||||
0x8E 0x144F # CANADIAN SYLLABICS TII
|
||||
0x8F 0x1450 # CANADIAN SYLLABICS TO
|
||||
0x90 0x1451 # CANADIAN SYLLABICS TOO
|
||||
0x91 0x1455 # CANADIAN SYLLABICS TA
|
||||
0x92 0x1456 # CANADIAN SYLLABICS TAA
|
||||
0x93 0x1466 # CANADIAN SYLLABICS T
|
||||
0x94 0x146D # CANADIAN SYLLABICS KI
|
||||
0x95 0x146E # CANADIAN SYLLABICS KII
|
||||
0x96 0x146F # CANADIAN SYLLABICS KO
|
||||
0x97 0x1470 # CANADIAN SYLLABICS KOO
|
||||
0x98 0x1472 # CANADIAN SYLLABICS KA
|
||||
0x99 0x1473 # CANADIAN SYLLABICS KAA
|
||||
0x9A 0x1483 # CANADIAN SYLLABICS K
|
||||
0x9B 0x148B # CANADIAN SYLLABICS CI
|
||||
0x9C 0x148C # CANADIAN SYLLABICS CII
|
||||
0x9D 0x148D # CANADIAN SYLLABICS CO
|
||||
0x9E 0x148E # CANADIAN SYLLABICS COO
|
||||
0x9F 0x1490 # CANADIAN SYLLABICS CA
|
||||
0xA0 0x1491 # CANADIAN SYLLABICS CAA
|
||||
0xA1 0x00B0 # DEGREE SIGN
|
||||
0xA2 0x14A1 # CANADIAN SYLLABICS C
|
||||
0xA3 0x14A5 # CANADIAN SYLLABICS MI
|
||||
0xA4 0x14A6 # CANADIAN SYLLABICS MII
|
||||
0xA5 0x2022 # BULLET
|
||||
0xA6 0x00B6 # PILCROW SIGN
|
||||
0xA7 0x14A7 # CANADIAN SYLLABICS MO
|
||||
0xA8 0x00AE # REGISTERED SIGN
|
||||
0xA9 0x00A9 # COPYRIGHT SIGN
|
||||
0xAA 0x2122 # TRADE MARK SIGN
|
||||
0xAB 0x14A8 # CANADIAN SYLLABICS MOO
|
||||
0xAC 0x14AA # CANADIAN SYLLABICS MA
|
||||
0xAD 0x14AB # CANADIAN SYLLABICS MAA
|
||||
0xAE 0x14BB # CANADIAN SYLLABICS M
|
||||
0xAF 0x14C2 # CANADIAN SYLLABICS NI
|
||||
0xB0 0x14C3 # CANADIAN SYLLABICS NII
|
||||
0xB1 0x14C4 # CANADIAN SYLLABICS NO
|
||||
0xB2 0x14C5 # CANADIAN SYLLABICS NOO
|
||||
0xB3 0x14C7 # CANADIAN SYLLABICS NA
|
||||
0xB4 0x14C8 # CANADIAN SYLLABICS NAA
|
||||
0xB5 0x14D0 # CANADIAN SYLLABICS N
|
||||
0xB6 0x14EF # CANADIAN SYLLABICS SI
|
||||
0xB7 0x14F0 # CANADIAN SYLLABICS SII
|
||||
0xB8 0x14F1 # CANADIAN SYLLABICS SO
|
||||
0xB9 0x14F2 # CANADIAN SYLLABICS SOO
|
||||
0xBA 0x14F4 # CANADIAN SYLLABICS SA
|
||||
0xBB 0x14F5 # CANADIAN SYLLABICS SAA
|
||||
0xBC 0x1505 # CANADIAN SYLLABICS S
|
||||
0xBD 0x14D5 # CANADIAN SYLLABICS LI
|
||||
0xBE 0x14D6 # CANADIAN SYLLABICS LII
|
||||
0xBF 0x14D7 # CANADIAN SYLLABICS LO
|
||||
0xC0 0x14D8 # CANADIAN SYLLABICS LOO
|
||||
0xC1 0x14DA # CANADIAN SYLLABICS LA
|
||||
0xC2 0x14DB # CANADIAN SYLLABICS LAA
|
||||
0xC3 0x14EA # CANADIAN SYLLABICS L
|
||||
0xC4 0x1528 # CANADIAN SYLLABICS YI
|
||||
0xC5 0x1529 # CANADIAN SYLLABICS YII
|
||||
0xC6 0x152A # CANADIAN SYLLABICS YO
|
||||
0xC7 0x152B # CANADIAN SYLLABICS YOO
|
||||
0xC8 0x152D # CANADIAN SYLLABICS YA
|
||||
0xC9 0x2026 # HORIZONTAL ELLIPSIS
|
||||
0xCA 0x00A0 # NO-BREAK SPACE
|
||||
0xCB 0x152E # CANADIAN SYLLABICS YAA
|
||||
0xCC 0x153E # CANADIAN SYLLABICS Y
|
||||
0xCD 0x1555 # CANADIAN SYLLABICS FI
|
||||
0xCE 0x1556 # CANADIAN SYLLABICS FII
|
||||
0xCF 0x1557 # CANADIAN SYLLABICS FO
|
||||
0xD0 0x2013 # EN DASH
|
||||
0xD1 0x2014 # EM DASH
|
||||
0xD2 0x201C # LEFT DOUBLE QUOTATION MARK
|
||||
0xD3 0x201D # RIGHT DOUBLE QUOTATION MARK
|
||||
0xD4 0x2018 # LEFT SINGLE QUOTATION MARK
|
||||
0xD5 0x2019 # RIGHT SINGLE QUOTATION MARK
|
||||
0xD6 0x1558 # CANADIAN SYLLABICS FOO
|
||||
0xD7 0x1559 # CANADIAN SYLLABICS FA
|
||||
0xD8 0x155A # CANADIAN SYLLABICS FAA
|
||||
0xD9 0x155D # CANADIAN SYLLABICS F
|
||||
0xDA 0x1546 # CANADIAN SYLLABICS RI
|
||||
0xDB 0x1547 # CANADIAN SYLLABICS RII
|
||||
0xDC 0x1548 # CANADIAN SYLLABICS RO
|
||||
0xDD 0x1549 # CANADIAN SYLLABICS ROO
|
||||
0xDE 0x154B # CANADIAN SYLLABICS RA
|
||||
0xDF 0x154C # CANADIAN SYLLABICS RAA
|
||||
0xE0 0x1550 # CANADIAN SYLLABICS R
|
||||
0xE1 0x157F # CANADIAN SYLLABICS QI
|
||||
0xE2 0x1580 # CANADIAN SYLLABICS QII
|
||||
0xE3 0x1581 # CANADIAN SYLLABICS QO
|
||||
0xE4 0x1582 # CANADIAN SYLLABICS QOO
|
||||
0xE5 0x1583 # CANADIAN SYLLABICS QA
|
||||
0xE6 0x1584 # CANADIAN SYLLABICS QAA
|
||||
0xE7 0x1585 # CANADIAN SYLLABICS Q
|
||||
0xE8 0x158F # CANADIAN SYLLABICS NGI
|
||||
0xE9 0x1590 # CANADIAN SYLLABICS NGII
|
||||
0xEA 0x1591 # CANADIAN SYLLABICS NGO
|
||||
0xEB 0x1592 # CANADIAN SYLLABICS NGOO
|
||||
0xEC 0x1593 # CANADIAN SYLLABICS NGA
|
||||
0xED 0x1594 # CANADIAN SYLLABICS NGAA
|
||||
0xEE 0x1595 # CANADIAN SYLLABICS NG
|
||||
0xEF 0x1671 # CANADIAN SYLLABICS NNGI
|
||||
0xF0 0x1672 # CANADIAN SYLLABICS NNGII
|
||||
0xF1 0x1673 # CANADIAN SYLLABICS NNGO
|
||||
0xF2 0x1674 # CANADIAN SYLLABICS NNGOO
|
||||
0xF3 0x1675 # CANADIAN SYLLABICS NNGA
|
||||
0xF4 0x1676 # CANADIAN SYLLABICS NNGAA
|
||||
0xF5 0x1596 # CANADIAN SYLLABICS NNG
|
||||
0xF6 0x15A0 # CANADIAN SYLLABICS LHI
|
||||
0xF7 0x15A1 # CANADIAN SYLLABICS LHII
|
||||
0xF8 0x15A2 # CANADIAN SYLLABICS LHO
|
||||
0xF9 0x15A3 # CANADIAN SYLLABICS LHOO
|
||||
0xFA 0x15A4 # CANADIAN SYLLABICS LHA
|
||||
0xFB 0x15A5 # CANADIAN SYLLABICS LHAA
|
||||
0xFC 0x15A6 # CANADIAN SYLLABICS LH
|
||||
0xFD 0x157C # CANADIAN SYLLABICS NUNAVUT H
|
||||
0xFE 0x0141 # LATIN CAPITAL LETTER L WITH STROKE
|
||||
0xFF 0x0142 # LATIN SMALL LETTER L WITH STROKE
|
7728
charmap/JAPANESE.TXT
Normal file
File diff suppressed because it is too large
234
charmap/KEYBOARD.TXT
Normal file
@ -0,0 +1,234 @@
|
||||
#=======================================================================
|
||||
# File name: KEYBOARD.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Keyboard
|
||||
# character set to Unicode 4.0 and later.
|
||||
#
|
||||
# Copyright: (c) 2001-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-05 Change mappings for 0x09, 0x0F, 0x8C; add
|
||||
# Mac OS X-only mappings for 0x8D-0x8F.
|
||||
# Update header comments, including
|
||||
# clarification of Mac OS X usage. Matches
|
||||
# internal xml <c1.2> and Text Encoding
|
||||
# Converter 2.0.
|
||||
# b1,c1 2002-Dec-19 First version. Matches internal utom<b6>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Keyboard code (in hex as 0xNN)
|
||||
# Column #2 is the corresponding Unicode or Unicode sequence
|
||||
# (in hex as 0xNNNN or 0xNNNN+0xNNNN, etc.).
|
||||
# Column #3 is a comment containing the Unicode name.
|
||||
# In some cases an additional comment follows the Unicode name.
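Because column #2 may hold a sequence joined with '+', a reader for this table needs to return a tuple of code points rather than a single value. A minimal sketch follows; the name `parse_sequence_line` is hypothetical.

```python
def parse_sequence_line(line):
    """Return (Mac OS code, tuple of code points), or None for comment lines."""
    # Everything from the first '#' onward is a comment, including any
    # additional comment that follows the Unicode name.
    fields = line.split("#", 1)[0].split()
    if len(fields) < 2:
        return None
    code = int(fields[0], 16)
    # "0xNNNN+0xNNNN" becomes one code point per '+'-separated part.
    sequence = tuple(int(part, 16) for part in fields[1].split("+"))
    return code, sequence
```

For example, the Help key line `0x67 0x003F+0x20DD` yields `(0x67, (0x003F, 0x20DD))`, while a single-character line yields a one-element tuple.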
|
||||
#
|
||||
# The entries are in Mac OS Keyboard code order.
|
||||
#
|
||||
# Some of these mappings require the use of corporate characters.
|
||||
# See the file "CORPCHAR.TXT" and notes below.
|
||||
#
|
||||
# The Mac OS Keyboard character set uses the ranges normally set aside
|
||||
# for controls, so those ranges are present in this table.
|
||||
#
|
||||
# Notes on Mac OS Keyboard:
|
||||
# -------------------------
|
||||
#
|
||||
# This is the encoding for the legacy font named ".Keyboard". Before
|
||||
# Mac OS X, this font was used by the user-interface system to display
|
||||
# glyphs for special keys on the keyboard. In Mac OS X, that font is
|
||||
# not present and this mapping is not associated with a font; it is
|
||||
# only used as a way to map from a set of Menu Manager constants to
|
||||
# associated Unicode sequences. As such, new mappings added for Mac OS
|
||||
# X only may be one-way mappings: From the Keyboard glyph "encoding"
|
||||
# to Unicode, but not back.
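One way to handle one-way mappings when building a Unicode-to-Keyboard table is to exclude them explicitly. The sketch below is an assumption about how a consumer of this file might do that, not a description of Text Encoding Converter; `build_reverse` and its `one_way` parameter are hypothetical. The sample entries are taken from this table, where 0x09 and 0x61 both map to 0x2423.

```python
def build_reverse(forward, one_way=frozenset()):
    """Invert {code: sequence}, skipping codes marked as one-way."""
    reverse = {}
    for code in sorted(forward):
        if code in one_way:
            continue  # decodes to Unicode, but is never produced when encoding
        sequence = forward[code]
        if sequence in reverse:
            raise ValueError(
                f"0x{code:02X} and 0x{reverse[sequence]:02X} map to the same sequence"
            )
        reverse[sequence] = code
    return reverse


FORWARD = {
    0x09: (0x2423,),  # OPEN BOX, Space key (no round-trip)
    0x61: (0x2423,),  # OPEN BOX, Blank key
    0x62: (0x21DE,),  # UPWARDS ARROW WITH DOUBLE STROKE, Page up key
}
REVERSE = build_reverse(FORWARD, one_way={0x09})
```

Raising on an unexpected duplicate, rather than silently keeping one entry, makes it obvious when a new one-way mapping has not been listed.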
|
||||
#
|
||||
# The Mac OS Keyboard encoding shares the script code smRoman
|
||||
# (0) with the Mac OS Roman encoding. To determine if the Keyboard
|
||||
# encoding is being used in Mac OS 8 or Mac OS 9, you must check if
|
||||
# the font name is ".Keyboard".
|
||||
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# The goals in the mappings provided here are:
|
||||
# - For mappings used in Mac OS 8 and Mac OS 9, ensure roundtrip
|
||||
# mapping from every character in the Mac OS Keyboard character set
|
||||
# to Unicode and back. This consideration does not apply to mappings
|
||||
# added for Mac OS X only (noted below).
|
||||
# - Use standard Unicode characters as much as possible, to
|
||||
# maximize interchangeability of the resulting Unicode text.
|
||||
# Whenever possible, avoid having content carried by private-use
|
||||
# characters.
|
||||
#
|
||||
# Some of the characters in the Mac OS Keyboard character set do not
|
||||
# correspond to distinct, single Unicode characters. To map these
|
||||
# and satisfy both goals above, we employ various strategies.
|
||||
#
|
||||
# a) If possible, use private use characters in combination with
|
||||
# standard Unicode characters to mark variants of the standard
|
||||
# Unicode character.
|
||||
#
|
||||
# Apple has defined a block of 32 corporate characters as "transcoding
|
||||
# hints." These are used in combination with standard Unicode
|
||||
# characters to force them to be treated in a special way for mapping
|
||||
# to other encodings; they have no other effect. Sixteen of these
|
||||
# transcoding hints are "grouping hints" - they indicate that the next
|
||||
# 2-4 Unicode characters should be treated as a single entity for
|
||||
# transcoding. The other sixteen transcoding hints are "variant tags"
|
||||
# - they are like combining characters, and can follow a standard
|
||||
# Unicode character (or a sequence consisting of a base character and other
|
||||
# combining characters) to cause it to be treated in a special way for
|
||||
# transcoding. These always terminate a combining-character sequence.
|
||||
#
|
||||
# The transcoding hints used in this mapping table are two
|
||||
# grouping tags, 0xF860-61, and one variant tag, 0xF87F. Since these
|
||||
# are combined with standard Unicode characters, some characters in
|
||||
# the Mac OS Keyboard character set map to a sequence of two to four
|
||||
# Unicode characters instead of a single Unicode character.
|
||||
#
|
||||
# For example, the Mac OS Keyboard character at 0x6F, representing the
|
||||
# F1 key, is mapped to Unicode using the grouping tag F860 (group next
|
||||
# two) followed by U+0046 (LATIN CAPITAL LETTER F) and U+0031 (DIGIT
|
||||
# ONE).
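The F1 example can be worked through in code. This is a sketch under stated assumptions: it treats the 32 transcoding hints as the contiguous range 0xF860-0xF87F (the text gives only the count and three of the values), and the names `HINTS`, `GROUP_SIZE`, `visible_text`, and `group_is_complete` are hypothetical.

```python
HINTS = range(0xF860, 0xF880)  # assumed location of the 32 transcoding hints
GROUP_SIZE = {0xF860: 2, 0xF861: 3}  # the grouping hints used in this table


def visible_text(sequence):
    """Drop transcoding hints and return the remaining characters."""
    return "".join(chr(cp) for cp in sequence if cp not in HINTS)


def group_is_complete(sequence):
    """Check that a leading grouping hint is followed by the right count."""
    expected = GROUP_SIZE.get(sequence[0])
    return expected is None or len(sequence) - 1 == expected


F1 = (0xF860, 0x0046, 0x0031)           # Keyboard code 0x6F
F10 = (0xF861, 0x0046, 0x0031, 0x0030)  # Keyboard code 0x78
```

Since the hints have no effect other than on transcoding, removing them leaves readable text ("F1", "F10") for display purposes, while keeping them allows the sequence to map back to a single Keyboard code.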
|
||||
#
|
||||
# b) Otherwise, use private use characters by themselves to map Mac OS
|
||||
# Keyboard characters which have no relationship to any standard
|
||||
# Unicode character.
|
||||
#
|
||||
# The following additional corporate zone Unicode characters are
|
||||
# used for this purpose here:
|
||||
#
|
||||
# 0xF802 Lower left pencil
|
||||
# 0xF803 Contextual menu key symbol
|
||||
# 0xF8FF Apple logo
|
||||
#
|
||||
# NOTE: The graphic image associated with the Apple logo character
|
||||
# is not authorized for use without permission of Apple, and
|
||||
# unauthorized use might constitute trademark infringement.
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
# Changes from version c01 to version c02:
|
||||
#
|
||||
# - Mapping for 0x09 changed from 0x0009 (wrong) to 0x2423
|
||||
# - Mapping for 0x0F changed from 0x270E (wrong) to 0xF802
|
||||
# - Mapping for 0x8C changed from 0xF804 to 0x23CF (Unicode 4.0)
|
||||
# - Add Mac OS X-only mappings for 0x8D-0x8F
|
||||
#
|
||||
##################
|
||||
|
||||
0x00 0x0000 # control - NUL
|
||||
#
|
||||
0x02 0x21E5 # RIGHTWARDS ARROW TO BAR # Tab right (left-to-right text)
|
||||
0x03 0x21E4 # LEFTWARDS ARROW TO BAR # Tab left (right-to-left text)
|
||||
0x04 0x2324 # UP ARROWHEAD BETWEEN TWO HORIZONTAL BARS # Enter key
|
||||
0x05 0x21E7 # UPWARDS WHITE ARROW # Shift key
|
||||
0x06 0x2303 # UP ARROWHEAD # Control key
|
||||
0x07 0x2325 # OPTION KEY # Option key
|
||||
0x08 0x0008 # control - BS
|
||||
0x09 0x2423 # OPEN BOX # Space key (Mac OS X mapping, duplicates mapping for 0x61, hence no round-trip)
|
||||
0x0A 0x2326 # ERASE TO THE RIGHT # Delete right (right-to-left text)
|
||||
0x0B 0x21A9 # LEFTWARDS ARROW WITH HOOK # Return key (left-to-right text)
|
||||
0x0C 0x21AA # RIGHTWARDS ARROW WITH HOOK # Return key (right-to-left text)
|
||||
0x0D 0x000D # control - CR
|
||||
#
|
||||
0x0F 0xF802 # lower left pencil
|
||||
0x10 0x21E3 # DOWNWARDS DASHED ARROW
|
||||
0x11 0x2318 # PLACE OF INTEREST SIGN # Command key
|
||||
0x12 0x2713 # CHECK MARK
|
||||
0x13 0x25C6 # BLACK DIAMOND
|
||||
0x14 0xF8FF # Apple logo
|
||||
#
|
||||
0x17 0x232B # ERASE TO THE LEFT # Delete left (left-to-right text)
|
||||
0x18 0x21E0 # LEFTWARDS DASHED ARROW
|
||||
0x19 0x21E1 # UPWARDS DASHED ARROW
|
||||
0x1A 0x21E2 # RIGHTWARDS DASHED ARROW
|
||||
0x1B 0x238B # BROKEN CIRCLE WITH NORTHWEST ARROW # Escape key; for Unicode 3.0 and later
|
||||
0x1C 0x2327 # X IN A RECTANGLE BOX # Clear key
|
||||
#
|
||||
0x20 0x0020 # SPACE
|
||||
#
|
||||
0x30 0x0030 # DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO
|
0x33 0x0033 # DIGIT THREE
0x34 0x0034 # DIGIT FOUR
0x35 0x0035 # DIGIT FIVE
0x36 0x0036 # DIGIT SIX
0x37 0x0037 # DIGIT SEVEN
0x38 0x0038 # DIGIT EIGHT
0x39 0x0039 # DIGIT NINE
#
0x46 0x0046 # LATIN CAPITAL LETTER F
#
0x61 0x2423 # OPEN BOX # Blank key
0x62 0x21DE # UPWARDS ARROW WITH DOUBLE STROKE # Page up key
0x63 0x21EA # UPWARDS WHITE ARROW FROM BAR # Caps lock key
0x64 0x2190 # LEFTWARDS ARROW
0x65 0x2192 # RIGHTWARDS ARROW
0x66 0x2196 # NORTH WEST ARROW
0x67 0x003F+0x20DD # QUESTION MARK + COMBINING ENCLOSING CIRCLE # Help key
0x68 0x2191 # UPWARDS ARROW
0x69 0x2198 # SOUTH EAST ARROW
0x6A 0x2193 # DOWNWARDS ARROW
0x6B 0x21DF # DOWNWARDS ARROW WITH DOUBLE STROKE # Page down key
0x6C 0xF8FF+0xF87F # Apple logo, outline
0x6D 0xF803 # Contextual menu key symbol
0x6E 0x2758+0x20DD # LIGHT VERTICAL BAR + COMBINING ENCLOSING CIRCLE # Power key
0x6F 0xF860+0x0046+0x0031 # group_2 + F + 1 # F1 key
0x70 0xF860+0x0046+0x0032 # group_2 + F + 2 # F2 key
0x71 0xF860+0x0046+0x0033 # group_2 + F + 3 # F3 key
0x72 0xF860+0x0046+0x0034 # group_2 + F + 4 # F4 key
0x73 0xF860+0x0046+0x0035 # group_2 + F + 5 # F5 key
0x74 0xF860+0x0046+0x0036 # group_2 + F + 6 # F6 key
0x75 0xF860+0x0046+0x0037 # group_2 + F + 7 # F7 key
0x76 0xF860+0x0046+0x0038 # group_2 + F + 8 # F8 key
0x77 0xF860+0x0046+0x0039 # group_2 + F + 9 # F9 key
0x78 0xF861+0x0046+0x0031+0x0030 # group_3 + F + 1 + 0 # F10 key
0x79 0xF861+0x0046+0x0031+0x0031 # group_3 + F + 1 + 1 # F11 key
0x7A 0xF861+0x0046+0x0031+0x0032 # group_3 + F + 1 + 2 # F12 key
#
0x87 0xF861+0x0046+0x0031+0x0033 # group_3 + F + 1 + 3 # F13 key
0x88 0xF861+0x0046+0x0031+0x0034 # group_3 + F + 1 + 4 # F14 key
0x89 0xF861+0x0046+0x0031+0x0035 # group_3 + F + 1 + 5 # F15 key
0x8A 0x2388 # HELM SYMBOL # Control key (ISO standard), Unicode 3.0 and later
0x8B 0x2387 # ALTERNATIVE KEY SYMBOL # Unicode 3.0 and later
0x8C 0x23CF # EJECT SYMBOL # Unicode 4.0 and later, Mac OS X only
0x8D 0x82F1+0x6570 # Japanese "eisu" key symbol # Mac OS X only
0x8E 0x304B+0x306A # Japanese "kana" key symbol # Mac OS X only
0x8F 0xF861+0x0046+0x0031+0x0036 # group_3 + F + 1 + 6 # F16 key, Mac OS X only
#
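Several rows above map one Mac OS code to a sequence of Unicode values joined with `+`, some of them starting with 0xF860 or 0xF861 (labelled group_2 and group_3 in the row comments). The following is a minimal Python sketch of reading such a row. The function names are hypothetical, and treating the group_N value as "the next N characters belong together" is an assumption based only on those comment labels, not on a definition given in this file.

```python
# Hypothetical helper names; only the row syntax is taken from the table above.
# Assumption: 0xF860 / 0xF861 announce a group of 2 / 3 following characters.
GROUPING_HINTS = {0xF860: 2, 0xF861: 3}


def parse_mapping_line(line):
    """Return (mac_code, [code points]) or None for comment-only or blank lines."""
    fields = line.split("#", 1)[0].split()
    if len(fields) < 2:
        return None
    mac_code = int(fields[0], 16)  # int() accepts the "0x" prefix in base 16
    code_points = [int(part, 16) for part in fields[1].split("+")]
    return mac_code, code_points


def visible_text(code_points):
    """Drop a leading grouping hint (if any) and return the remaining characters."""
    if code_points and code_points[0] in GROUPING_HINTS:
        expected = GROUPING_HINTS[code_points[0]]
        rest = code_points[1:]
        if len(rest) != expected:
            raise ValueError("grouping hint does not match sequence length")
        code_points = rest
    return "".join(chr(cp) for cp in code_points)


row = "0x78 0xF861+0x0046+0x0031+0x0030 # group_3 + F + 1 + 0 # F10 key"
mac_code, code_points = parse_mapping_line(row)
label = visible_text(code_points)
```

For the F10 row, `label` is the three visible characters of the key name. Other corporate-zone values, such as the 0xF87F that follows the Apple logo, are passed through unchanged by this sketch.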
9942
charmap/KOREAN.TXT
Normal file
File diff suppressed because it is too large
370
charmap/ROMAN.TXT
Normal file
@@ -0,0 +1,370 @@
#=======================================================================
# File name: ROMAN.TXT
#
# Contents: Map (external version) from Mac OS Roman
# character set to Unicode 2.1 and later.
#
# Copyright: (c) 1994-2002, 2005 by Apple Computer, Inc., all rights
# reserved.
#
# Contact: charsets@apple.com
#
# Changes:
#
# c02 2005-Apr-05 Update header comments. Matches internal xml
# <c1.1> and Text Encoding Converter 2.0.
# b4,c1 2002-Dec-19 Update URLs, notes. Matches internal
# utom<b5>.
# b03 1999-Sep-22 Update contact e-mail address. Matches
# internal utom<b4>, ufrm<b3>, and Text
# Encoding Converter version 1.5.
# b02 1998-Aug-18 Encoding changed for Mac OS 8.5; change
# mapping of 0xDB from CURRENCY SIGN to
# EURO SIGN. Matches internal utom<b3>,
# ufrm<b3>.
# n08 1998-Feb-05 Minor update to header comments
# n06 1997-Dec-14 Add warning about future changes to 0xDB
# from CURRENCY SIGN to EURO SIGN. Clarify
# some header information
# n04 1997-Dec-01 Update to match internal utom<n3>, ufrm<n22>:
# Change standard mapping for 0xBD from U+2126
# to its canonical decomposition, U+03A9.
# n03 1995-Apr-15 First version (after fixing some typos).
# Matches internal ufrm<n9>.
#
# Standard header:
# ----------------
#
# Apple, the Apple logo, and Macintosh are trademarks of Apple
# Computer, Inc., registered in the United States and other countries.
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
# throughout this document, "Macintosh" can be used to refer to
# Macintosh computers and "Unicode" can be used to refer to the
# Unicode standard.
#
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
# either express or implied, with respect to this document and the
# included data, its quality, accuracy, or fitness for a particular
# purpose. In no event will Apple be liable for direct, indirect,
# special, incidental, or consequential damages resulting from any
# defect or inaccuracy in this document or the included data.
#
# These mapping tables and character lists are subject to change.
# The latest tables should be available from the following:
#
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#
# For general information about Mac OS encodings and these mapping
# tables, see the file "README.TXT".
#
# Format:
# -------
#
# Three tab-separated columns;
# '#' begins a comment which continues to the end of the line.
# Column #1 is the Mac OS Roman code (in hex as 0xNN)
# Column #2 is the corresponding Unicode (in hex as 0xNNNN)
# Column #3 is a comment containing the Unicode name
#
# The entries are in Mac OS Roman code order.
#
# One of these mappings requires the use of a corporate character.
# See the file "CORPCHAR.TXT" and notes below.
#
# Control character mappings are not shown in this table, following
# the conventions of the standard UTC mapping tables. However, the
# Mac OS Roman character set uses the standard control characters at
# 0x00-0x1F and 0x7F.
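The format described above (a hex Mac OS code, a hex Unicode value, and a `#` comment) together with the note on control characters can be turned into a decode table with a few lines of Python. This is only a sketch under stated assumptions: `build_decode_table` and `decode` are hypothetical names, the control codes 0x00-0x1F and 0x7F are filled in as identity mappings because the table omits them, and `SAMPLE` is a three-row excerpt rather than the whole file.

```python
def build_decode_table(mapping_text):
    """Build {mac_code: unicode string} from mapping-file text."""
    # The file omits control characters; assume they map to themselves.
    table = {code: chr(code) for code in list(range(0x20)) + [0x7F]}
    for line in mapping_text.splitlines():
        # Everything after the first '#' is a comment.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # comment-only or blank line
        mac_code = int(fields[0], 16)
        # A value may be a '+'-joined sequence in some of the other tables.
        table[mac_code] = "".join(chr(int(cp, 16)) for cp in fields[1].split("+"))
    return table


def decode(data, table):
    """Decode bytes with the table; an unmapped byte raises KeyError."""
    return "".join(table[byte] for byte in data)


SAMPLE = """\
# excerpt only
0x41 0x0041 # LATIN CAPITAL LETTER A
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
0xDB 0x20AC # EURO SIGN
"""

table = build_decode_table(SAMPLE)
text = decode(b"A\x8e\xdb\n", table)
```

Splitting on whitespace rather than on tabs keeps the sketch working on copies of the file where tabs were converted to spaces.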
#
# Notes on Mac OS Roman:
# ----------------------
#
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
# environments, it is only supported directly in programming
# interfaces for QuickDraw Text, the Script Manager, and related
# Text Utilities. For other purposes it is supported via transcoding
# to and from Unicode.
#
# This character set is used for at least the following Mac OS
# localizations: U.S., British, Canadian French, French, Swiss
# French, German, Swiss German, Italian, Swiss Italian, Dutch,
# Swedish, Norwegian, Danish, Finnish, Spanish, Catalan,
# Portuguese, Brazilian, and the default International system.
#
# Variants of Mac OS Roman are used for Croatian, Icelandic,
# Turkish, Romanian, and other encodings. Separate mapping tables
# are available for these encodings.
#
# Before Mac OS 8.5, code point 0xDB was CURRENCY SIGN, and was
# mapped to U+00A4. In Mac OS 8.5 and later versions, code point
# 0xDB is changed to EURO SIGN and maps to U+20AC; the standard
# Apple fonts are updated for Mac OS 8.5 to reflect this. There is
# a "currency sign" variant of the Mac OS Roman encoding that still
# maps 0xDB to U+00A4; this can be used for older fonts.
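The "currency sign" variant described above differs from the standard table at a single code point, so it can be derived rather than stored separately. A small Python sketch follows, assuming a decode table shaped as a dict from Mac OS code to Unicode string; the function name and the table shape are illustrative choices, not something this file defines.

```python
CURRENCY_SIGN = "\u00a4"  # pre-Mac OS 8.5 meaning of 0xDB
EURO_SIGN = "\u20ac"      # Mac OS 8.5 and later


def currency_sign_variant(table):
    """Return a copy of a decode table with 0xDB mapped to CURRENCY SIGN."""
    variant = dict(table)  # leave the standard table untouched
    variant[0xDB] = CURRENCY_SIGN
    return variant


standard = {0x41: "A", 0xDB: EURO_SIGN}  # two-entry stand-in for the full table
legacy = currency_sign_variant(standard)
```

Which table applies to a given document cannot be detected from the bytes alone; the choice depends on the age of the text or font, as the note above explains.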
#
# Before Mac OS 8.5, the ROM bitmap versions of the fonts Chicago,
# New York, Geneva, and Monaco did not implement the full Mac OS
# Roman character set; they only supported character codes up to
# 0xD8. The TrueType versions of these fonts have always implemented
# the full character set, as with the bitmap and TrueType versions
# of the other standard Roman fonts.
#
# In all Mac OS encodings, fonts such as Chicago which are used
# as "system" fonts (for menus, dialogs, etc.) have four glyphs
# at code points 0x11-0x14 for transient use by the Menu Manager.
# These glyphs are not intended as characters for use in normal
# text, and the associated code points are not generally
# interpreted as associated with these glyphs; they are usually
# interpreted (if at all) as the control codes DC1-DC4.
#
# Unicode mapping issues and notes:
# ---------------------------------
#
# The following corporate zone Unicode character is used in this
# mapping:
#
# 0xF8FF Apple logo
#
# NOTE: The graphic image associated with the Apple logo character
# is not authorized for use without permission of Apple, and
# unauthorized use might constitute trademark infringement.
#
# Details of mapping changes in each version:
# -------------------------------------------
#
# Changes from version n08 to version b02:
#
# - Encoding changed for Mac OS 8.5; change mapping of 0xDB from
# CURRENCY SIGN (U+00A4) to EURO SIGN (U+20AC).
#
# Changes from version n03 to version n04:
#
# - Change mapping of 0xBD from U+2126 to its canonical
# decomposition, U+03A9.
#
##################

0x20 0x0020 # SPACE
0x21 0x0021 # EXCLAMATION MARK
0x22 0x0022 # QUOTATION MARK
0x23 0x0023 # NUMBER SIGN
0x24 0x0024 # DOLLAR SIGN
0x25 0x0025 # PERCENT SIGN
0x26 0x0026 # AMPERSAND
0x27 0x0027 # APOSTROPHE
0x28 0x0028 # LEFT PARENTHESIS
0x29 0x0029 # RIGHT PARENTHESIS
0x2A 0x002A # ASTERISK
0x2B 0x002B # PLUS SIGN
0x2C 0x002C # COMMA
0x2D 0x002D # HYPHEN-MINUS
0x2E 0x002E # FULL STOP
0x2F 0x002F # SOLIDUS
0x30 0x0030 # DIGIT ZERO
0x31 0x0031 # DIGIT ONE
0x32 0x0032 # DIGIT TWO
0x33 0x0033 # DIGIT THREE
0x34 0x0034 # DIGIT FOUR
0x35 0x0035 # DIGIT FIVE
0x36 0x0036 # DIGIT SIX
0x37 0x0037 # DIGIT SEVEN
0x38 0x0038 # DIGIT EIGHT
0x39 0x0039 # DIGIT NINE
0x3A 0x003A # COLON
0x3B 0x003B # SEMICOLON
0x3C 0x003C # LESS-THAN SIGN
0x3D 0x003D # EQUALS SIGN
0x3E 0x003E # GREATER-THAN SIGN
0x3F 0x003F # QUESTION MARK
0x40 0x0040 # COMMERCIAL AT
0x41 0x0041 # LATIN CAPITAL LETTER A
0x42 0x0042 # LATIN CAPITAL LETTER B
0x43 0x0043 # LATIN CAPITAL LETTER C
0x44 0x0044 # LATIN CAPITAL LETTER D
0x45 0x0045 # LATIN CAPITAL LETTER E
0x46 0x0046 # LATIN CAPITAL LETTER F
0x47 0x0047 # LATIN CAPITAL LETTER G
0x48 0x0048 # LATIN CAPITAL LETTER H
0x49 0x0049 # LATIN CAPITAL LETTER I
0x4A 0x004A # LATIN CAPITAL LETTER J
0x4B 0x004B # LATIN CAPITAL LETTER K
0x4C 0x004C # LATIN CAPITAL LETTER L
0x4D 0x004D # LATIN CAPITAL LETTER M
0x4E 0x004E # LATIN CAPITAL LETTER N
0x4F 0x004F # LATIN CAPITAL LETTER O
0x50 0x0050 # LATIN CAPITAL LETTER P
0x51 0x0051 # LATIN CAPITAL LETTER Q
0x52 0x0052 # LATIN CAPITAL LETTER R
0x53 0x0053 # LATIN CAPITAL LETTER S
0x54 0x0054 # LATIN CAPITAL LETTER T
0x55 0x0055 # LATIN CAPITAL LETTER U
0x56 0x0056 # LATIN CAPITAL LETTER V
0x57 0x0057 # LATIN CAPITAL LETTER W
0x58 0x0058 # LATIN CAPITAL LETTER X
0x59 0x0059 # LATIN CAPITAL LETTER Y
0x5A 0x005A # LATIN CAPITAL LETTER Z
0x5B 0x005B # LEFT SQUARE BRACKET
0x5C 0x005C # REVERSE SOLIDUS
0x5D 0x005D # RIGHT SQUARE BRACKET
0x5E 0x005E # CIRCUMFLEX ACCENT
0x5F 0x005F # LOW LINE
0x60 0x0060 # GRAVE ACCENT
0x61 0x0061 # LATIN SMALL LETTER A
0x62 0x0062 # LATIN SMALL LETTER B
0x63 0x0063 # LATIN SMALL LETTER C
0x64 0x0064 # LATIN SMALL LETTER D
0x65 0x0065 # LATIN SMALL LETTER E
0x66 0x0066 # LATIN SMALL LETTER F
0x67 0x0067 # LATIN SMALL LETTER G
0x68 0x0068 # LATIN SMALL LETTER H
0x69 0x0069 # LATIN SMALL LETTER I
0x6A 0x006A # LATIN SMALL LETTER J
0x6B 0x006B # LATIN SMALL LETTER K
0x6C 0x006C # LATIN SMALL LETTER L
0x6D 0x006D # LATIN SMALL LETTER M
0x6E 0x006E # LATIN SMALL LETTER N
0x6F 0x006F # LATIN SMALL LETTER O
0x70 0x0070 # LATIN SMALL LETTER P
0x71 0x0071 # LATIN SMALL LETTER Q
0x72 0x0072 # LATIN SMALL LETTER R
0x73 0x0073 # LATIN SMALL LETTER S
0x74 0x0074 # LATIN SMALL LETTER T
0x75 0x0075 # LATIN SMALL LETTER U
0x76 0x0076 # LATIN SMALL LETTER V
0x77 0x0077 # LATIN SMALL LETTER W
0x78 0x0078 # LATIN SMALL LETTER X
0x79 0x0079 # LATIN SMALL LETTER Y
0x7A 0x007A # LATIN SMALL LETTER Z
0x7B 0x007B # LEFT CURLY BRACKET
0x7C 0x007C # VERTICAL LINE
0x7D 0x007D # RIGHT CURLY BRACKET
0x7E 0x007E # TILDE
#
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
0x81 0x00C5 # LATIN CAPITAL LETTER A WITH RING ABOVE
0x82 0x00C7 # LATIN CAPITAL LETTER C WITH CEDILLA
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
0x84 0x00D1 # LATIN CAPITAL LETTER N WITH TILDE
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
0x87 0x00E1 # LATIN SMALL LETTER A WITH ACUTE
0x88 0x00E0 # LATIN SMALL LETTER A WITH GRAVE
0x89 0x00E2 # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
0x8B 0x00E3 # LATIN SMALL LETTER A WITH TILDE
0x8C 0x00E5 # LATIN SMALL LETTER A WITH RING ABOVE
0x8D 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
0x8F 0x00E8 # LATIN SMALL LETTER E WITH GRAVE
0x90 0x00EA # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x91 0x00EB # LATIN SMALL LETTER E WITH DIAERESIS
0x92 0x00ED # LATIN SMALL LETTER I WITH ACUTE
0x93 0x00EC # LATIN SMALL LETTER I WITH GRAVE
0x94 0x00EE # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x95 0x00EF # LATIN SMALL LETTER I WITH DIAERESIS
0x96 0x00F1 # LATIN SMALL LETTER N WITH TILDE
0x97 0x00F3 # LATIN SMALL LETTER O WITH ACUTE
0x98 0x00F2 # LATIN SMALL LETTER O WITH GRAVE
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
0x9B 0x00F5 # LATIN SMALL LETTER O WITH TILDE
0x9C 0x00FA # LATIN SMALL LETTER U WITH ACUTE
0x9D 0x00F9 # LATIN SMALL LETTER U WITH GRAVE
0x9E 0x00FB # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
0xA0 0x2020 # DAGGER
0xA1 0x00B0 # DEGREE SIGN
0xA2 0x00A2 # CENT SIGN
0xA3 0x00A3 # POUND SIGN
0xA4 0x00A7 # SECTION SIGN
0xA5 0x2022 # BULLET
0xA6 0x00B6 # PILCROW SIGN
0xA7 0x00DF # LATIN SMALL LETTER SHARP S
0xA8 0x00AE # REGISTERED SIGN
0xA9 0x00A9 # COPYRIGHT SIGN
0xAA 0x2122 # TRADE MARK SIGN
0xAB 0x00B4 # ACUTE ACCENT
0xAC 0x00A8 # DIAERESIS
0xAD 0x2260 # NOT EQUAL TO
0xAE 0x00C6 # LATIN CAPITAL LETTER AE
0xAF 0x00D8 # LATIN CAPITAL LETTER O WITH STROKE
0xB0 0x221E # INFINITY
0xB1 0x00B1 # PLUS-MINUS SIGN
0xB2 0x2264 # LESS-THAN OR EQUAL TO
0xB3 0x2265 # GREATER-THAN OR EQUAL TO
0xB4 0x00A5 # YEN SIGN
0xB5 0x00B5 # MICRO SIGN
0xB6 0x2202 # PARTIAL DIFFERENTIAL
0xB7 0x2211 # N-ARY SUMMATION
0xB8 0x220F # N-ARY PRODUCT
0xB9 0x03C0 # GREEK SMALL LETTER PI
0xBA 0x222B # INTEGRAL
0xBB 0x00AA # FEMININE ORDINAL INDICATOR
0xBC 0x00BA # MASCULINE ORDINAL INDICATOR
0xBD 0x03A9 # GREEK CAPITAL LETTER OMEGA
0xBE 0x00E6 # LATIN SMALL LETTER AE
0xBF 0x00F8 # LATIN SMALL LETTER O WITH STROKE
0xC0 0x00BF # INVERTED QUESTION MARK
0xC1 0x00A1 # INVERTED EXCLAMATION MARK
0xC2 0x00AC # NOT SIGN
0xC3 0x221A # SQUARE ROOT
0xC4 0x0192 # LATIN SMALL LETTER F WITH HOOK
0xC5 0x2248 # ALMOST EQUAL TO
0xC6 0x2206 # INCREMENT
0xC7 0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xC8 0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xC9 0x2026 # HORIZONTAL ELLIPSIS
0xCA 0x00A0 # NO-BREAK SPACE
0xCB 0x00C0 # LATIN CAPITAL LETTER A WITH GRAVE
0xCC 0x00C3 # LATIN CAPITAL LETTER A WITH TILDE
0xCD 0x00D5 # LATIN CAPITAL LETTER O WITH TILDE
0xCE 0x0152 # LATIN CAPITAL LIGATURE OE
0xCF 0x0153 # LATIN SMALL LIGATURE OE
0xD0 0x2013 # EN DASH
0xD1 0x2014 # EM DASH
0xD2 0x201C # LEFT DOUBLE QUOTATION MARK
0xD3 0x201D # RIGHT DOUBLE QUOTATION MARK
0xD4 0x2018 # LEFT SINGLE QUOTATION MARK
0xD5 0x2019 # RIGHT SINGLE QUOTATION MARK
0xD6 0x00F7 # DIVISION SIGN
0xD7 0x25CA # LOZENGE
0xD8 0x00FF # LATIN SMALL LETTER Y WITH DIAERESIS
0xD9 0x0178 # LATIN CAPITAL LETTER Y WITH DIAERESIS
0xDA 0x2044 # FRACTION SLASH
0xDB 0x20AC # EURO SIGN
0xDC 0x2039 # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0xDD 0x203A # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0xDE 0xFB01 # LATIN SMALL LIGATURE FI
0xDF 0xFB02 # LATIN SMALL LIGATURE FL
0xE0 0x2021 # DOUBLE DAGGER
0xE1 0x00B7 # MIDDLE DOT
0xE2 0x201A # SINGLE LOW-9 QUOTATION MARK
0xE3 0x201E # DOUBLE LOW-9 QUOTATION MARK
0xE4 0x2030 # PER MILLE SIGN
0xE5 0x00C2 # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0xE6 0x00CA # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0xE7 0x00C1 # LATIN CAPITAL LETTER A WITH ACUTE
0xE8 0x00CB # LATIN CAPITAL LETTER E WITH DIAERESIS
0xE9 0x00C8 # LATIN CAPITAL LETTER E WITH GRAVE
0xEA 0x00CD # LATIN CAPITAL LETTER I WITH ACUTE
0xEB 0x00CE # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0xEC 0x00CF # LATIN CAPITAL LETTER I WITH DIAERESIS
0xED 0x00CC # LATIN CAPITAL LETTER I WITH GRAVE
0xEE 0x00D3 # LATIN CAPITAL LETTER O WITH ACUTE
0xEF 0x00D4 # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0xF0 0xF8FF # Apple logo
0xF1 0x00D2 # LATIN CAPITAL LETTER O WITH GRAVE
0xF2 0x00DA # LATIN CAPITAL LETTER U WITH ACUTE
0xF3 0x00DB # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0xF4 0x00D9 # LATIN CAPITAL LETTER U WITH GRAVE
0xF5 0x0131 # LATIN SMALL LETTER DOTLESS I
0xF6 0x02C6 # MODIFIER LETTER CIRCUMFLEX ACCENT
0xF7 0x02DC # SMALL TILDE
0xF8 0x00AF # MACRON
0xF9 0x02D8 # BREVE
0xFA 0x02D9 # DOT ABOVE
0xFB 0x02DA # RING ABOVE
0xFC 0x00B8 # CEDILLA
0xFD 0x02DD # DOUBLE ACUTE ACCENT
0xFE 0x02DB # OGONEK
0xFF 0x02C7 # CARON
365
charmap/ROMANIAN.TXT
Normal file
@@ -0,0 +1,365 @@
#=======================================================================
# File name: ROMANIAN.TXT
#
# Contents: Map (external version) from Mac OS Romanian
# character set to Unicode 3.0 and later.
#
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
# reserved.
#
# Contact: charsets@apple.com
#
# Changes:
#
# c02 2005-Apr-05 Update header comments. Matches internal xml
# <c1.2> and Text Encoding Converter 2.0.
# b3,c1 2002-Dec-19 Update mappings for 0xAF, 0xBF, 0xDE, 0xDF
# to use new composed characters added in
# Unicode 3.0. Update URLs, notes. Matches
# internal utom<b3>.
# b02 1999-Sep-22 Encoding changed for Mac OS 8.5; change
# mapping of 0xDB from CURRENCY SIGN to EURO
# SIGN. Update contact e-mail address. Matches
# internal utom<b2>, ufrm<b2>, and Text
# Encoding Converter version 1.5.
# n05 1998-Feb-05 Minor update to header comments
# n03 1997-Dec-14 Update to match internal utom<n5>, ufrm<n16>:
# Change standard mapping for 0xBD from U+2126
# to its canonical decomposition, U+03A9.
# Change mapping of 0xAF,0xBF,0xDE,0xDF from
# composed S/T WITH CEDILLA to S/T with
# COMBINING COMMA BELOW (to match our
# decomposition mappings).
# n02 1995-Apr-15 First version (after fixing some typos).
# Matches internal ufrm<n4>.
#
# Standard header:
# ----------------
#
# Apple, the Apple logo, and Macintosh are trademarks of Apple
# Computer, Inc., registered in the United States and other countries.
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
# throughout this document, "Macintosh" can be used to refer to
# Macintosh computers and "Unicode" can be used to refer to the
# Unicode standard.
#
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
# either express or implied, with respect to this document and the
# included data, its quality, accuracy, or fitness for a particular
# purpose. In no event will Apple be liable for direct, indirect,
# special, incidental, or consequential damages resulting from any
# defect or inaccuracy in this document or the included data.
#
# These mapping tables and character lists are subject to change.
# The latest tables should be available from the following:
#
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#
# For general information about Mac OS encodings and these mapping
# tables, see the file "README.TXT".
#
# Format:
# -------
#
# Three tab-separated columns;
# '#' begins a comment which continues to the end of the line.
# Column #1 is the Mac OS Romanian code (in hex as 0xNN)
# Column #2 is the corresponding Unicode (in hex as 0xNNNN)
# Column #3 is a comment containing the Unicode name
#
# The entries are in Mac OS Romanian code order.
#
# One of these mappings requires the use of a corporate character.
# See the file "CORPCHAR.TXT" and notes below.
#
# Control character mappings are not shown in this table, following
# the conventions of the standard UTC mapping tables. However, the
# Mac OS Romanian character set uses the standard control characters at
# 0x00-0x1F and 0x7F.
#
# Notes on Mac OS Romanian:
# -------------------------
#
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
# environments, it is only supported via transcoding to and from
# Unicode.
#
# Mac OS Romanian is used only for Romanian.
#
# The Mac OS Romanian encoding shares the script code smRoman
# (0) with the standard Mac OS Roman encoding. To determine if
# the Romanian encoding is being used, you must also check if the
# system region code is 39, verRomania.
#
# This character set is a variant of standard Mac OS Roman, adding
# upper and lower A breve, S comma below, and T comma below. It
# has 6 code point differences from standard Mac OS Roman.
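The two notes above (Romanian is selected by script code plus region code, and it differs from Mac OS Roman at 6 code points) can be sketched in Python as follows. The numeric values 0 and 39 and the six Unicode values come from this file; the function names and the returned labels are hypothetical, and the fallback to "mac-roman" is a simplification, since other Roman variants also share the smRoman script code.

```python
SM_ROMAN = 0      # script code smRoman, as stated in the notes above
VER_ROMANIA = 39  # region code verRomania, as stated in the notes above

# The six code points where the Romanian table departs from Mac OS Roman,
# with the values listed in the mapping rows of this file.
ROMANIAN_OVERRIDES = {
    0xAE: "\u0102",  # LATIN CAPITAL LETTER A WITH BREVE
    0xAF: "\u0218",  # LATIN CAPITAL LETTER S WITH COMMA BELOW
    0xBE: "\u0103",  # LATIN SMALL LETTER A WITH BREVE
    0xBF: "\u0219",  # LATIN SMALL LETTER S WITH COMMA BELOW
    0xDE: "\u021a",  # LATIN CAPITAL LETTER T WITH COMMA BELOW
    0xDF: "\u021b",  # LATIN SMALL LETTER T WITH COMMA BELOW
}


def pick_encoding(script_code, region_code):
    """Return a hypothetical encoding label for a script/region pair."""
    if script_code != SM_ROMAN:
        return "other"
    # Simplification: only the Romanian variant is distinguished here.
    return "mac-romanian" if region_code == VER_ROMANIA else "mac-roman"


def romanian_table(roman_table):
    """Derive a Romanian decode table from a Mac OS Roman one."""
    table = dict(roman_table)
    table.update(ROMANIAN_OVERRIDES)
    return table


roman_excerpt = {0x41: "A", 0xAE: "\u00c6", 0xDE: "\ufb01"}
romanian_excerpt = romanian_table(roman_excerpt)
```

Keeping the differences as an override dict makes the "6 code point differences" claim easy to check against the two full tables.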
#
# Before Mac OS 8.5, code point 0xDB was CURRENCY SIGN, and was
# mapped to U+00A4. In Mac OS 8.5 and later versions, code point
# 0xDB is changed to EURO SIGN and maps to U+20AC; the standard
# Apple fonts are updated for Mac OS 8.5 to reflect this. There is
# a "currency sign" variant of the Mac OS Romanian encoding that
# still maps 0xDB to U+00A4; this can be used for older fonts.
#
# Unicode mapping issues and notes:
# ---------------------------------
#
# The following corporate zone Unicode character is used in this
# mapping:
#
# 0xF8FF Apple logo
#
# NOTE: The graphic image associated with the Apple logo character
# is not authorized for use without permission of Apple, and
# unauthorized use might constitute trademark infringement.
#
# Details of mapping changes in each version:
# -------------------------------------------
#
# Changes from version b02 to version b03/c01:
#
# - Update the mappings for 0xAF, 0xBF, 0xDE, 0xDF to use new
# composed Unicode characters 0x0218-0x021B added in Unicode 3.0;
# the previous mappings were to the equivalent decomposition
# sequences.
#
# Changes from version n05 to version b02:
#
# - Encoding changed for Mac OS 8.5; change mapping of 0xDB from
# CURRENCY SIGN (U+00A4) to EURO SIGN (U+20AC).
#
# Changes from version n02 to version n03:
#
# - Change mapping of 0xBD from U+2126 to its canonical
# decomposition, U+03A9.
# - Change mapping of 0xAF,0xBF,0xDE,0xDF from composed S or T
# WITH CEDILLA to S or T with COMBINING COMMA BELOW (to match
# our decomposition mappings).
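The change notes above rely on two Unicode equivalences: U+2126 decomposes canonically to U+03A9, and S or T followed by COMBINING COMMA BELOW (U+0326) is equivalent to the composed characters U+0218-U+021B. Assuming Python's standard `unicodedata` module, a short sketch can show why text decoded with the older and newer tables compares equal after normalization; the variable names are illustrative only.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC (canonical composition)."""
    return unicodedata.normalize("NFC", text)


def nfd(text):
    """Normalize to NFD (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)


old_0xbd = "\u2126"   # OHM SIGN, the mapping before version n04 / n03
new_0xbd = "\u03a9"   # GREEK CAPITAL LETTER OMEGA

old_0xaf = "S\u0326"  # S + COMBINING COMMA BELOW, used from n03 to b02
new_0xaf = "\u0218"   # LATIN CAPITAL LETTER S WITH COMMA BELOW

same_omega = nfc(old_0xbd) == new_0xbd
same_s_comma = nfc(old_0xaf) == new_0xaf
```

S WITH CEDILLA (U+015E), the mapping in the first version, is a different character and does not normalize to U+0218, so text decoded with that oldest table needs an explicit substitution rather than normalization.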
#
##################

0x20 0x0020 # SPACE
0x21 0x0021 # EXCLAMATION MARK
0x22 0x0022 # QUOTATION MARK
0x23 0x0023 # NUMBER SIGN
0x24 0x0024 # DOLLAR SIGN
0x25 0x0025 # PERCENT SIGN
0x26 0x0026 # AMPERSAND
0x27 0x0027 # APOSTROPHE
0x28 0x0028 # LEFT PARENTHESIS
0x29 0x0029 # RIGHT PARENTHESIS
0x2A 0x002A # ASTERISK
0x2B 0x002B # PLUS SIGN
0x2C 0x002C # COMMA
0x2D 0x002D # HYPHEN-MINUS
0x2E 0x002E # FULL STOP
0x2F 0x002F # SOLIDUS
0x30 0x0030 # DIGIT ZERO
0x31 0x0031 # DIGIT ONE
0x32 0x0032 # DIGIT TWO
0x33 0x0033 # DIGIT THREE
0x34 0x0034 # DIGIT FOUR
0x35 0x0035 # DIGIT FIVE
0x36 0x0036 # DIGIT SIX
0x37 0x0037 # DIGIT SEVEN
0x38 0x0038 # DIGIT EIGHT
0x39 0x0039 # DIGIT NINE
0x3A 0x003A # COLON
0x3B 0x003B # SEMICOLON
0x3C 0x003C # LESS-THAN SIGN
0x3D 0x003D # EQUALS SIGN
0x3E 0x003E # GREATER-THAN SIGN
0x3F 0x003F # QUESTION MARK
0x40 0x0040 # COMMERCIAL AT
0x41 0x0041 # LATIN CAPITAL LETTER A
0x42 0x0042 # LATIN CAPITAL LETTER B
0x43 0x0043 # LATIN CAPITAL LETTER C
0x44 0x0044 # LATIN CAPITAL LETTER D
0x45 0x0045 # LATIN CAPITAL LETTER E
0x46 0x0046 # LATIN CAPITAL LETTER F
0x47 0x0047 # LATIN CAPITAL LETTER G
0x48 0x0048 # LATIN CAPITAL LETTER H
0x49 0x0049 # LATIN CAPITAL LETTER I
0x4A 0x004A # LATIN CAPITAL LETTER J
0x4B 0x004B # LATIN CAPITAL LETTER K
0x4C 0x004C # LATIN CAPITAL LETTER L
0x4D 0x004D # LATIN CAPITAL LETTER M
0x4E 0x004E # LATIN CAPITAL LETTER N
0x4F 0x004F # LATIN CAPITAL LETTER O
0x50 0x0050 # LATIN CAPITAL LETTER P
0x51 0x0051 # LATIN CAPITAL LETTER Q
0x52 0x0052 # LATIN CAPITAL LETTER R
0x53 0x0053 # LATIN CAPITAL LETTER S
0x54 0x0054 # LATIN CAPITAL LETTER T
0x55 0x0055 # LATIN CAPITAL LETTER U
0x56 0x0056 # LATIN CAPITAL LETTER V
0x57 0x0057 # LATIN CAPITAL LETTER W
0x58 0x0058 # LATIN CAPITAL LETTER X
0x59 0x0059 # LATIN CAPITAL LETTER Y
0x5A 0x005A # LATIN CAPITAL LETTER Z
0x5B 0x005B # LEFT SQUARE BRACKET
0x5C 0x005C # REVERSE SOLIDUS
0x5D 0x005D # RIGHT SQUARE BRACKET
0x5E 0x005E # CIRCUMFLEX ACCENT
0x5F 0x005F # LOW LINE
0x60 0x0060 # GRAVE ACCENT
0x61 0x0061 # LATIN SMALL LETTER A
0x62 0x0062 # LATIN SMALL LETTER B
0x63 0x0063 # LATIN SMALL LETTER C
0x64 0x0064 # LATIN SMALL LETTER D
0x65 0x0065 # LATIN SMALL LETTER E
0x66 0x0066 # LATIN SMALL LETTER F
0x67 0x0067 # LATIN SMALL LETTER G
0x68 0x0068 # LATIN SMALL LETTER H
0x69 0x0069 # LATIN SMALL LETTER I
0x6A 0x006A # LATIN SMALL LETTER J
0x6B 0x006B # LATIN SMALL LETTER K
0x6C 0x006C # LATIN SMALL LETTER L
0x6D 0x006D # LATIN SMALL LETTER M
0x6E 0x006E # LATIN SMALL LETTER N
0x6F 0x006F # LATIN SMALL LETTER O
0x70 0x0070 # LATIN SMALL LETTER P
0x71 0x0071 # LATIN SMALL LETTER Q
0x72 0x0072 # LATIN SMALL LETTER R
0x73 0x0073 # LATIN SMALL LETTER S
0x74 0x0074 # LATIN SMALL LETTER T
0x75 0x0075 # LATIN SMALL LETTER U
0x76 0x0076 # LATIN SMALL LETTER V
0x77 0x0077 # LATIN SMALL LETTER W
0x78 0x0078 # LATIN SMALL LETTER X
0x79 0x0079 # LATIN SMALL LETTER Y
0x7A 0x007A # LATIN SMALL LETTER Z
0x7B 0x007B # LEFT CURLY BRACKET
0x7C 0x007C # VERTICAL LINE
0x7D 0x007D # RIGHT CURLY BRACKET
0x7E 0x007E # TILDE
#
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
0x81 0x00C5 # LATIN CAPITAL LETTER A WITH RING ABOVE
0x82 0x00C7 # LATIN CAPITAL LETTER C WITH CEDILLA
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
0x84 0x00D1 # LATIN CAPITAL LETTER N WITH TILDE
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
0x87 0x00E1 # LATIN SMALL LETTER A WITH ACUTE
0x88 0x00E0 # LATIN SMALL LETTER A WITH GRAVE
0x89 0x00E2 # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
0x8B 0x00E3 # LATIN SMALL LETTER A WITH TILDE
0x8C 0x00E5 # LATIN SMALL LETTER A WITH RING ABOVE
0x8D 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
0x8F 0x00E8 # LATIN SMALL LETTER E WITH GRAVE
0x90 0x00EA # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x91 0x00EB # LATIN SMALL LETTER E WITH DIAERESIS
0x92 0x00ED # LATIN SMALL LETTER I WITH ACUTE
0x93 0x00EC # LATIN SMALL LETTER I WITH GRAVE
0x94 0x00EE # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x95 0x00EF # LATIN SMALL LETTER I WITH DIAERESIS
0x96 0x00F1 # LATIN SMALL LETTER N WITH TILDE
0x97 0x00F3 # LATIN SMALL LETTER O WITH ACUTE
0x98 0x00F2 # LATIN SMALL LETTER O WITH GRAVE
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
0x9B 0x00F5 # LATIN SMALL LETTER O WITH TILDE
0x9C 0x00FA # LATIN SMALL LETTER U WITH ACUTE
0x9D 0x00F9 # LATIN SMALL LETTER U WITH GRAVE
0x9E 0x00FB # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
0xA0 0x2020 # DAGGER
0xA1 0x00B0 # DEGREE SIGN
0xA2 0x00A2 # CENT SIGN
0xA3 0x00A3 # POUND SIGN
0xA4 0x00A7 # SECTION SIGN
0xA5 0x2022 # BULLET
0xA6 0x00B6 # PILCROW SIGN
0xA7 0x00DF # LATIN SMALL LETTER SHARP S
0xA8 0x00AE # REGISTERED SIGN
0xA9 0x00A9 # COPYRIGHT SIGN
0xAA 0x2122 # TRADE MARK SIGN
0xAB 0x00B4 # ACUTE ACCENT
0xAC 0x00A8 # DIAERESIS
0xAD 0x2260 # NOT EQUAL TO
0xAE 0x0102 # LATIN CAPITAL LETTER A WITH BREVE
0xAF 0x0218 # LATIN CAPITAL LETTER S WITH COMMA BELOW # for Unicode 3.0 and later
0xB0 0x221E # INFINITY
0xB1 0x00B1 # PLUS-MINUS SIGN
0xB2 0x2264 # LESS-THAN OR EQUAL TO
0xB3 0x2265 # GREATER-THAN OR EQUAL TO
0xB4 0x00A5 # YEN SIGN
0xB5 0x00B5 # MICRO SIGN
0xB6 0x2202 # PARTIAL DIFFERENTIAL
0xB7 0x2211 # N-ARY SUMMATION
0xB8 0x220F # N-ARY PRODUCT
0xB9 0x03C0 # GREEK SMALL LETTER PI
0xBA 0x222B # INTEGRAL
0xBB 0x00AA # FEMININE ORDINAL INDICATOR
0xBC 0x00BA # MASCULINE ORDINAL INDICATOR
0xBD 0x03A9 # GREEK CAPITAL LETTER OMEGA
0xBE 0x0103 # LATIN SMALL LETTER A WITH BREVE
0xBF 0x0219 # LATIN SMALL LETTER S WITH COMMA BELOW # for Unicode 3.0 and later
0xC0 0x00BF # INVERTED QUESTION MARK
0xC1 0x00A1 # INVERTED EXCLAMATION MARK
0xC2 0x00AC # NOT SIGN
0xC3 0x221A # SQUARE ROOT
0xC4 0x0192 # LATIN SMALL LETTER F WITH HOOK
0xC5 0x2248 # ALMOST EQUAL TO
0xC6 0x2206 # INCREMENT
0xC7 0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0xC8 0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0xC9 0x2026 # HORIZONTAL ELLIPSIS
0xCA 0x00A0 # NO-BREAK SPACE
0xCB 0x00C0 # LATIN CAPITAL LETTER A WITH GRAVE
0xCC 0x00C3 # LATIN CAPITAL LETTER A WITH TILDE
0xCD 0x00D5 # LATIN CAPITAL LETTER O WITH TILDE
0xCE 0x0152 # LATIN CAPITAL LIGATURE OE
0xCF 0x0153 # LATIN SMALL LIGATURE OE
0xD0 0x2013 # EN DASH
0xD1 0x2014 # EM DASH
0xD2 0x201C # LEFT DOUBLE QUOTATION MARK
0xD3 0x201D # RIGHT DOUBLE QUOTATION MARK
0xD4 0x2018 # LEFT SINGLE QUOTATION MARK
0xD5 0x2019 # RIGHT SINGLE QUOTATION MARK
0xD6 0x00F7 # DIVISION SIGN
0xD7 0x25CA # LOZENGE
0xD8 0x00FF # LATIN SMALL LETTER Y WITH DIAERESIS
0xD9 0x0178 # LATIN CAPITAL LETTER Y WITH DIAERESIS
0xDA 0x2044 # FRACTION SLASH
0xDB 0x20AC # EURO SIGN
0xDC 0x2039 # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0xDD 0x203A # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0xDE 0x021A # LATIN CAPITAL LETTER T WITH COMMA BELOW # for Unicode 3.0 and later
0xDF 0x021B # LATIN SMALL LETTER T WITH COMMA BELOW # for Unicode 3.0 and later
0xE0 0x2021 # DOUBLE DAGGER
0xE1 0x00B7 # MIDDLE DOT
0xE2 0x201A # SINGLE LOW-9 QUOTATION MARK
0xE3 0x201E # DOUBLE LOW-9 QUOTATION MARK
0xE4 0x2030 # PER MILLE SIGN
0xE5 0x00C2 # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0xE6 0x00CA # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0xE7 0x00C1 # LATIN CAPITAL LETTER A WITH ACUTE
0xE8 0x00CB # LATIN CAPITAL LETTER E WITH DIAERESIS
0xE9 0x00C8 # LATIN CAPITAL LETTER E WITH GRAVE
0xEA 0x00CD # LATIN CAPITAL LETTER I WITH ACUTE
0xEB 0x00CE # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0xEC 0x00CF # LATIN CAPITAL LETTER I WITH DIAERESIS
0xED 0x00CC # LATIN CAPITAL LETTER I WITH GRAVE
0xEE 0x00D3 # LATIN CAPITAL LETTER O WITH ACUTE
0xEF 0x00D4 # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0xF0 0xF8FF # Apple logo
0xF1 0x00D2 # LATIN CAPITAL LETTER O WITH GRAVE
0xF2 0x00DA # LATIN CAPITAL LETTER U WITH ACUTE
0xF3 0x00DB # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0xF4 0x00D9 # LATIN CAPITAL LETTER U WITH GRAVE
0xF5 0x0131 # LATIN SMALL LETTER DOTLESS I
0xF6 0x02C6 # MODIFIER LETTER CIRCUMFLEX ACCENT
0xF7 0x02DC # SMALL TILDE
0xF8 0x00AF # MACRON
0xF9 0x02D8 # BREVE
0xFA 0x02D9 # DOT ABOVE
0xFB 0x02DA # RING ABOVE
0xFC 0x00B8 # CEDILLA
0xFD 0x02DD # DOUBLE ACUTE ACCENT
0xFE 0x02DB # OGONEK
0xFF 0x02C7 # CARON
590
charmap/ReadMe.txt
Normal file
@ -0,0 +1,590 @@
#=======================================================================
# File name: README.TXT
#
# Contents: Background information on Unicode mapping tables for
# Mac OS legacy text encodings
#
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
# reserved.
#
# Contact: charsets@apple.com
#
# Changes:
#
# c02 2005-Apr-04 Update discussion of roundtrip fidelity,
# delete discussion of mappings dependent on
# symmetric swapping (no longer supported),
# provide information on how legacy encodings
# are supported in Mac OS X.
# b3,c1 2002-Dec-19 Add Keyboard font encoding. Update URLs,
# notes.
# b02 1999-Sep-22 Update information on Cyrillic. Update
# contact e-mail address.
# n07 1998-Feb-05 Rewrite to provide additional information
# relevant to using the accompanying mapping
# tables, and to delete some extraneous
# information. Delete Bulgarian (no special
# encoding, uses standard Cyrillic), add
# Farsi, Devanagari, Gurmukhi, Gujarati,
# Celtic, Gaelic, Inuit, Tibetan.
# n04 1995-Nov-15 Update info for Hebrew and Thai
# n03 1995-Apr-15 First version (after fixing some typos).
#
##################

0. Preliminaries
----------------

For maximum interchangeability, this file and the accompanying Mac OS
mapping tables use only ASCII characters. They are intended to be
displayed in a monospaced font.

Apple, the Apple logo, Mac, and Macintosh are trademarks of Apple
Computer, Inc., registered in the United States and other countries.
QuickDraw and TrueType are trademarks of Apple Computer, Inc. Unicode is
a trademark of Unicode Inc. PostScript is a trademark of Adobe Systems
Inc., which may be registered in certain jurisdictions. IBM is a
registered trademark of International Business Machines Corporation. ITC
Zapf Dingbats is a registered trademark of the International Typeface
Corporation. For the sake of brevity, throughout this document and the
accompanying tables, "Macintosh" can be used to refer to Macintosh
computers and "Unicode" can be used to refer to the Unicode standard.

Apple Computer, Inc. ("Apple") makes no warranty or representation,
either express or implied, with respect to this document and the
accompanying tables, their quality, accuracy, or fitness for a
particular purpose. In no event will Apple be liable for direct,
indirect, special, incidental, or consequential damages resulting from
any defect or inaccuracy in this document or the accompanying tables.

1. Introduction
---------------

This document summarizes some Unicode mapping considerations that are
relevant for the accompanying mapping tables. It also provides an
overview of Mac OS legacy encodings.

These mapping tables and character lists are subject to change. The
latest tables should be available from the following:

<http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>

2. Round-trip fidelity and overview of mapping techniques
---------------------------------------------------------

For a particular set of national and international standards, Unicode
provides round-trip fidelity: Text in one of those encodings can be
mapped to Unicode and back again, yielding the original characters.
Characters which are distinct in one of these source standards have a
distinct counterpart in Unicode. Note that this counterpart might not be
a single Unicode character; as is pointed out in "The Unicode Standard,
Version 2.0" (page 2-10), "sometimes a single code value in another
standard corresponds to a sequence of code values in the Unicode
Standard, or vice versa."

However, Unicode does not attempt to provide round-trip fidelity for
most vendor standards. Nevertheless, Apple and other platform vendors
may need to provide such round-trip fidelity for their current platform
encodings and/or legacy platform encodings (this can be important in
file systems, for example). In order to do this, Apple makes use of some
Unicode characters in the corporate-use zone (the upper end of the
private use area).

Corporate-zone characters must be used with care. Indiscriminate use of
such characters can result in text which is not easily interchanged with
other systems, since these characters have no standard meaning outside a
particular platform. The mappings provided here are intended to minimize
the use of private use characters, or to use them in such a way that
basic text content will not be lost if the corporate zone characters are
dropped when text is transferred to another system.

The tables provided here have three goals, in the following order of
importance:
1. Provide 100% round-trip mapping from a Mac OS legacy encoding to
Unicode and back.
2. Map characters in a Mac OS encoding into the Unicode characters that
best represent the interpretation and usage of the Mac OS characters.
3. When mapping text in a Mac OS encoding to Unicode using the tables,
the resulting Unicode text should be as interchangeable as possible.
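
Goal 1 can be checked mechanically for a single-byte table: a strict
mapping can only round-trip if no two Mac OS code points share the same
Unicode target. The following Python sketch illustrates that check. It
is not part of Apple's tooling; the function name and the "broken"
table are hypothetical, and the sample rows are taken from one of the
accompanying single-byte tables.

```python
from collections import defaultdict


def find_round_trip_collisions(table):
    """Return {unicode_sequence: [mac_codes]} for targets used more than once.

    `table` maps a Mac OS code point (int) to its Unicode mapping, given as
    a tuple of code points so that multi-character sequences also work.
    """
    sources = defaultdict(list)
    for mac_code, unicode_seq in table.items():
        sources[tuple(unicode_seq)].append(mac_code)
    return {seq: codes for seq, codes in sources.items() if len(codes) > 1}


# Sample rows in the style of the accompanying tables.
sample = {
    0xBD: (0x03A9,),  # GREEK CAPITAL LETTER OMEGA
    0xC9: (0x2026,),  # HORIZONTAL ELLIPSIS
    0xF0: (0xF8FF,),  # Apple logo (corporate-zone character)
}

# Hypothetical broken table: two code points share one Unicode target,
# so the reverse mapping could not tell them apart.
broken = dict(sample)
broken[0xFE] = (0x2026,)

print(find_round_trip_collisions(sample))
print(find_round_trip_collisions(broken))
```

An empty result means the table is invertible; any entry in the result
names a Unicode target that would need a direction override, a
transcoding hint, or a corporate-zone character to restore round-trip
fidelity.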

To satisfy these goals, the mappings use a variety of techniques. First
we attempt to achieve round-trip mappings using any standard Unicode
feature at our disposal, without resorting to corporate-zone characters.
This can include the following techniques:
- Use of all Unicode characters defined in Unicode 2.1 and later,
including compatibility characters.
- Mapping a single character in a Mac OS encoding to a sequence of
standard Unicode characters, or vice versa. This requires grouping
characters into appropriate chunks for lookup before mapping them
(this mainly applies to sequences of Unicode characters).
- Using Unicode direction overrides to force direction attributes when
mapping to Unicode. This requires resolution of Unicode character
direction, and use of this information, when mapping from Unicode back
to certain Mac OS encodings.
The requirements imposed on Unicode handling are necessary for other,
non-transcoding operations in a full Unicode implementation anyway, so
requiring them for transcoding should not impose much of a burden.

Next, if round-trip fidelity cannot be achieved using the above
techniques, we attempt to use corporate-zone characters only as
"transcoding hints" (more on this below). These are combined with one or
more standard Unicode characters to mark them as special for
transcoding, but have no other function and can be deleted with no loss
of basic text content (only of round-trip fidelity).

Finally, if a character in a Mac OS encoding is unrelated to any Unicode
character or Unicode character sequence, we may map it to a single
corporate-zone Unicode code point.

These techniques are described in more detail in the following sections.

Some clients of these tables may have a different set of goals. For
example, some clients may prefer to avoid compatibility characters,
perhaps sacrificing round-trip fidelity if necessary. In most cases it
is fairly easy to construct other types of mappings from the mappings
given here. In particular, the Unicode mappings here have been designed
so that if they are converted to a restricted form of NFD (a form that
does NOT decompose or normalize Unicode characters in the ranges
2000-2FFF or F900-FAFF), the resulting mappings still provide roundtrip
fidelity. (For certain characters in the Mac OS Hebrew and Devanagari
encodings, the decomposition mappings must use a grouping transcoding
hint to ensure roundtrip fidelity; more details on this are provided in
the mapping tables for those encodings.)
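
The restricted form of NFD mentioned above can be approximated as
follows. This Python sketch is an assumption about one reasonable
reading of the text, not Apple's implementation: it applies standard
NFD to runs of ordinary characters and copies characters in the two
excluded ranges unchanged. The function names are hypothetical, and the
sketch does not attempt canonical reordering across an excluded
character.

```python
import unicodedata

# Ranges the text says must NOT be decomposed or normalized.
EXCLUDED_RANGES = ((0x2000, 0x2FFF), (0xF900, 0xFAFF))


def is_excluded(ch):
    """True if the character falls in one of the excluded ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXCLUDED_RANGES)


def restricted_nfd(text):
    """Apply NFD to runs of ordinary characters; copy excluded ones as-is."""
    out = []
    run = []
    for ch in text:
        if is_excluded(ch):
            # Flush the pending run through NFD, then keep `ch` untouched.
            out.append(unicodedata.normalize("NFD", "".join(run)))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.append(unicodedata.normalize("NFD", "".join(run)))
    return "".join(out)
```

For example, U+00E9 is decomposed to U+0065 U+0301, while U+2126 OHM
SIGN (which plain NFD would replace with U+03A9) is left unchanged.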

There is one more round-trip issue that should be mentioned. If a
Unicode character or sequence can be mapped at all into a particular Mac
OS encoding, then the reverse mapping back to Unicode should yield the
original Unicode character or sequence (except for possible differences
in direction overrides or other Unicode characters with General Category
Cf). The tables here also provide this. For a related issue, see the
next section.

3. Mapping tolerance: Strict and loose
--------------------------------------

In many character sets, a single character may have multiple semantics,
either by explicit definition, ambiguous definition, or established
usage. For example, the JIS character 0x2142, or 0x8161 in Shift-JIS,
is specified in the JIS X0208 standard to have two meanings: "double
vertical line" and "parallel". Each of these meanings corresponds to a
different Unicode character: 0x2016 DOUBLE VERTICAL LINE and 0x2225
PARALLEL TO. When mapping from Unicode to Shift-JIS, it is normally
desirable to map both of these Unicode characters to the single
Shift-JIS character. However, when mapping the Shift-JIS character to
Unicode, we can choose only one of the possible Unicode characters.

For two encodings X and Y, we can define a set of "strict" mappings
from one to the other as follows: If text in X can be mapped to Y using
the strict mappings from X to Y, then the resulting text can be mapped
back using the strict mappings from Y to X to end up with the original
text from X. Similarly, if text in Y can be mapped to X using the strict
mappings from Y to X, then the resulting text can be mapped back using
the strict mappings from X to Y to end up with the original text from Y.

There may be several characters in one encoding that all map to a
single character in another encoding, but only one of these mappings
can be strict; the others are "loose".

The mappings given in the accompanying tables are strict mappings.
However, the Mac OS Text Encoding Converter also supports loose
mappings and fallback mappings. Some of the accompanying tables provide
suggestions about possible loose mappings.
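
The distinction between strict and loose mappings can be sketched in
Python using the Shift-JIS example from this section. The table and
function names are hypothetical, and the choice of 0x2016 as the strict
target is an assumption made for illustration (the text only says that
one of the two must be chosen).

```python
# Strict pairs round-trip in both directions. The code values come from
# the Shift-JIS example in the text; which target is strict is assumed.
STRICT_TO_UNICODE = {0x8161: 0x2016}  # -> DOUBLE VERTICAL LINE
STRICT_FROM_UNICODE = {u: c for c, u in STRICT_TO_UNICODE.items()}

# Loose mappings are one-way additions that apply only when encoding.
LOOSE_FROM_UNICODE = {0x2225: 0x8161}  # PARALLEL TO -> same Shift-JIS code


def to_unicode(code):
    """Map a legacy code to Unicode; only strict mappings exist here."""
    return STRICT_TO_UNICODE[code]


def from_unicode(cp, allow_loose=False):
    """Map a Unicode code point to the legacy code, optionally loosely."""
    if cp in STRICT_FROM_UNICODE:
        return STRICT_FROM_UNICODE[cp]
    if allow_loose and cp in LOOSE_FROM_UNICODE:
        return LOOSE_FROM_UNICODE[cp]
    raise LookupError(f"U+{cp:04X} has no mapping")
```

With strict mappings only, 0x8161 round-trips through 0x2016 and 0x2225
is rejected; with `allow_loose=True`, 0x2225 is also accepted, but
decoding the result yields 0x2016 rather than the original character.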

4. Mapping a Mac encoding character to a Unicode sequence or vice versa
-----------------------------------------------------------------------

In some cases, a character in a Mac OS legacy encoding maps to a
sequence of Unicode characters. For example, the Mac OS Japanese
encoding includes a character for the circled CJK ideograph "big".
Although Unicode encodes other circled ideographs as single characters,
it does not encode this one. However, this character can be
unambiguously represented in Unicode as the Unicode sequence
0x5927+0x20DD, the CJK ideograph for "big" followed by COMBINING
ENCLOSING CIRCLE.

To handle the reverse mapping, a transcoding process must group the
Unicode sequence 0x5927+0x20DD as a single element for lookup (the
Mac OS Text Encoding Converter does this).

In a few cases, a sequence of characters in a Mac OS legacy encoding
must be grouped for mapping to a single Unicode character or a sequence
of Unicode characters. For example, in Mac OS Devanagari (based on
ISCII-91), DEVANAGARI LETTER VOCALIC L is represented as 0xA6+0xE9;
but this is represented in Unicode by the single character 0x090C.
Furthermore, explicit halant is represented in Mac OS Devanagari as
0xE8+0xE8 (double halant) and in Unicode as 0x094D+0x200C (VIRAMA
plus ZERO WIDTH NON-JOINER). The latter can also be considered as
a context-dependent mapping of 0xE8, halant.
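
The grouping described above can be sketched as a longest-match lookup.
The Python example below uses the Mac OS Devanagari sequences quoted in
the text; the single-byte entry for 0xA6 is an assumption based on
ISCII-91, and all names are hypothetical. It is a sketch of the
technique, not a description of how the Text Encoding Converter is
implemented.

```python
# Multi-byte groups are taken from the text. The single-byte entry for
# 0xA6 is an assumption (ISCII-91 letter I), included so that the
# fallback path has something to map.
TO_UNICODE = {
    b"\xa6\xe9": "\u090c",        # DEVANAGARI LETTER VOCALIC L
    b"\xe8\xe8": "\u094d\u200c",  # explicit halant: VIRAMA + ZWNJ
    b"\xe8": "\u094d",            # halant (VIRAMA)
    b"\xa6": "\u0907",            # assumed: DEVANAGARI LETTER I
}
MAX_GROUP = max(len(key) for key in TO_UNICODE)


def decode_grouped(data):
    """Decode bytes, always preferring the longest matching group."""
    out = []
    i = 0
    while i < len(data):
        for size in range(MAX_GROUP, 0, -1):
            chunk = data[i:i + size]
            if chunk in TO_UNICODE:
                out.append(TO_UNICODE[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no mapping for byte 0x{data[i]:02X}")
    return "".join(out)
```

The same approach works in the other direction: a table keyed by
Unicode strings such as "\u5927\u20dd" lets the sequence be looked up
as one element before falling back to single characters.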

Loose mappings from Unicode to a Mac OS encoding often map a single
Unicode character to a sequence of characters in the Mac OS encoding.
For example, the Unicode character 0x00BD VULGAR FRACTION ONE HALF
cannot be mapped into the Mac OS Roman character set as a single
character, but it has a loose mapping to the sequence 0x31+0xDA+0x32,
"digit one" + "fraction slash" + "digit two".
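
A loose fallback of this kind can be layered on top of a strict
encoder. The Python sketch below uses the standard library's
"mac_roman" codec as the strict step; that codec is an approximation of
Mac OS Roman and may not match the Text Encoding Converter in every
detail. The `LOOSE` table and the function name are hypothetical, and
only the one loose mapping given in the text is included.

```python
# Loose mapping from the text: U+00BD -> "1" + FRACTION SLASH + "2".
LOOSE = {"\u00bd": b"\x31\xda\x32"}


def encode_mac_roman_loose(text):
    """Encode to Mac OS Roman, using loose mappings only as a fallback."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("mac_roman")  # strict mapping
        except UnicodeEncodeError:
            if ch not in LOOSE:
                raise
            out += LOOSE[ch]               # loose fallback
    return bytes(out)
```

Because the mapping is loose, decoding the result gives DIGIT ONE,
FRACTION SLASH, DIGIT TWO rather than the original U+00BD.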

In some cases a Unicode character such as a direction override may
simply be discarded when mapping to a Mac OS encoding, since the
information carried by the override may be represented in a different
way by the Mac OS encoding. See the next section for an example.

5. Mappings that depend on directionality (or other attributes)
---------------------------------------------------------------

Strict mappings from Unicode to Mac OS legacy encodings may depend on
resolved character direction. Loose mappings may depend on additional
attributes such as whether the text should use vertical form codes if
available (i.e. whether the text is intended for vertical display on a
system that cannot automatically substitute vertical forms).

a) Resolved character direction

The Mac OS Arabic and Hebrew character sets were developed in 1986-1987.
At that time the bidirectional line layout algorithm used in the Mac OS
was fairly simple; it used only a few direction classes (instead of the
19 now used in the Unicode bidirectional algorithm). In order to permit
users to handle some tricky layout problems, certain punctuation and
symbol characters have duplicate code points, one with a left-right
direction attribute and the other with a right-left direction attribute.

For example, plus sign is encoded at 0x2B with a left-right attribute,
and at 0xAB with a right-left attribute. However, there is only one PLUS
SIGN character in Unicode. This leads to some interesting problems when
mapping between Mac OS Arabic or Hebrew and Unicode.

We need a way to map both of these plus signs to Unicode and back. Using
a single corporate character for one of these plus signs is not a good
solution, since both of the plus sign characters are likely to be used
in text that is interchanged, and thus content would be lost.

The problem is solved with the use of direction override characters and
direction-dependent mappings. When mapping from Mac OS Arabic or Hebrew
to Unicode, we use direction overrides as necessary to force the
direction of the resulting Unicode characters. When mapping back from
Unicode, the Unicode bidirectional algorithm should be used to determine
resolved direction of the Unicode characters. The mapping from Unicode
to Mac OS Arabic or Hebrew can then be disambiguated as necessary by
using the resolved direction.

For example, when mapping from Mac OS Arabic or Hebrew, we can use
LEFT-RIGHT OVERRIDE (LRO), RIGHT-LEFT OVERRIDE (RLO), and POP DIRECTION
FORMATTING (PDF) as follows:

0x2B -> 0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
0xAB -> 0x202E (RLO) + 0x002B (PLUS SIGN) + 0x202C (PDF)

When mapping back, we resolve the direction of the Unicode character
0x002B, and use this information to determine which of the Mac OS
encoding characters to use:

0x002B -> 0x2B (if LR) or 0xAB (if RL)

After direction overrides have been used in this way to force a
particular resolved direction, they may be discarded when mapping from
Unicode to Mac OS Arabic and Hebrew (since the information they carried
in Unicode is represented in the Mac OS encoding by the code point of
the plus sign).
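
The plus-sign mappings above can be illustrated with a deliberately
simplified Python sketch. It tracks only explicit LRO/RLO/PDF
characters with a small stack; a real converter would run the full
Unicode bidirectional algorithm to obtain the resolved direction. All
names are hypothetical, and characters other than the plus sign are
outside the scope of the sketch.

```python
LRO, RLO, PDF = "\u202d", "\u202e", "\u202c"

# Mac OS Arabic/Hebrew code points for the two plus signs, from the text.
TO_UNICODE = {0x2B: LRO + "+" + PDF, 0xAB: RLO + "+" + PDF}
FROM_UNICODE = {("+", "LR"): 0x2B, ("+", "RL"): 0xAB}


def plus_signs_to_mac(text, base_direction="LR"):
    """Map each '+' to 0x2B or 0xAB using a simplified direction model.

    Only explicit overrides are tracked; this is NOT the Unicode
    bidirectional algorithm, just enough state to show the idea.
    """
    stack = [base_direction]
    out = []
    for ch in text:
        if ch == LRO:
            stack.append("LR")
        elif ch == RLO:
            stack.append("RL")
        elif ch == PDF:
            if len(stack) > 1:
                stack.pop()
        elif ch == "+":
            out.append(FROM_UNICODE[(ch, stack[-1])])
        # Any other character is ignored by this sketch.
    return bytes(out)


unicode_text = TO_UNICODE[0x2B] + TO_UNICODE[0xAB]
mac_bytes = plus_signs_to_mac(unicode_text)
```

Here the override characters themselves produce no output bytes; the
direction they force is carried by the choice between 0x2B and 0xAB.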

Even when not required for round-trip fidelity, direction overrides
may be used when mapping from a Mac OS encoding to Unicode in order to
preserve proper text layout. For example, the single Mac OS Arabic
ellipsis character has direction class right-left, while the Unicode
HORIZONTAL ELLIPSIS character has direction class neutral. When
mapping the Mac OS ellipsis to Unicode, it is surrounded with a
direction override to help preserve proper text layout. However,
resolved direction is not needed or used when mapping the Unicode
HORIZONTAL ELLIPSIS back to Mac OS Arabic.

b) Horizontal or vertical display

The Mac OS Japanese encoding includes separately-encoded vertical forms
for some punctuation and kana. When Unicode characters in the CJK
punctuation and kana ranges are mapped to Mac OS Japanese characters and
(1) those characters are intended for vertical display, (2) they will be
displayed in an environment that does not provide automatic vertical
form substitution, and (3) loose mappings are desired, the Unicode
characters can be mapped to the corresponding vertical form codes in the
Mac OS Japanese encoding.

This does not affect mapping of the Unicode vertical presentation forms
(which always map to the Mac OS Japanese vertical form codes).

6. Use of corporate characters
------------------------------

Apple has defined a block of 32 corporate characters as "transcoding
hints." These are used in combination with standard Unicode characters
to force them to be treated in a special way for mapping to other
encodings; they have no other effect. Sixteen of these transcoding
hints are "grouping hints" - they indicate that the next 2-4 Unicode
characters should be treated as a single entity for transcoding. The
other sixteen transcoding hints are "variant tags" - they are like
combining characters, and can follow a standard Unicode character (or a
sequence consisting of a base character and other combining characters)
to cause it to be treated in a special way for transcoding. These always
terminate a combining-character sequence.

Whenever possible, mappings that require corporate-zone characters
use standard Unicode characters in combination with a single
transcoding hint (no mapping uses more than one transcoding hint).
For these mappings, even if the corporate-zone characters are lost in
interchange, the basic text content will be preserved.

However, some characters in a Mac OS encoding - such as the Apple
logo character - bear no relation to any standard Unicode character.
In these cases, the Mac OS character is mapped to a single corporate
zone character defined by Apple. Fewer than 40 corporate characters
are used in this way.
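
The effect of losing corporate-zone characters in interchange can be
sketched in Python by removing every private-use code point from a
string. The function name is hypothetical. U+F8FF is the Apple logo
mapping shown in the accompanying tables; U+F860 is used below only as
a stand-in for a grouping hint, which is an assumption (the actual
assignments are listed in CORPCHAR.TXT).

```python
import unicodedata


def drop_private_use(text):
    """Remove private-use code points (General Category Co)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Co")


# A hint followed by standard characters: the standard characters survive.
hinted = "\uf860" + "AB"        # U+F860 is an assumed grouping hint
# A stand-alone corporate character: nothing is left to represent it.
logo = "x" + "\uf8ff" + "y"     # U+F8FF, Apple logo

print(drop_private_use(hinted))
print(drop_private_use(logo))
```

This is the difference the section describes: a mapping built from
standard characters plus one hint degrades gracefully, while a
character mapped to a single corporate-zone code point disappears
entirely.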

All of the corporate characters defined by Apple are listed in the
accompanying file "CORPCHAR.TXT", including old Apple corporate
character assignments which are now deprecated (but which are still
supported as loose mappings by the Mac OS Text Encoding Converter).

7. Font variants
----------------

For some Mac OS legacy encodings, certain fonts used with that encoding
may actually implement a slight variant of the standard encoding
specified in the accompanying mapping tables. The header comments in the
mapping table files for each encoding describe any font variants
associated with that encoding.

8. Encodings in Mac OS X
------------------------

The Mac OS X Cocoa and Carbon environments use Unicode as the primary
text encoding. Some legacy programming interfaces in the Carbon
environment - e.g. QuickDraw Text, the Script Manager, and related
Text Utilities - use and support the following subset of Mac OS legacy
encodings:
Roman
Central European
Cyrillic
Chinese Traditional
Chinese Simplified
Japanese
Korean

Other legacy Mac OS encodings are supported in Carbon and Cocoa via
transcoding using the Mac OS Text Encoding Converter or other
transcoding interfaces; the character repertoires of all Mac OS
legacy encodings are supported in Unicode on Mac OS X.

Additional legacy encodings are also supported in the Classic
environment under Mac OS X.

9. Mac OS legacy encodings
--------------------------

Mac OS versions 7.1 and later supported multiple encodings via the
Script Manager, QuickDraw Text and related Text Utilities. These
system components distinguish these encodings primarily by script code:
font family IDs are grouped into ranges, and each range is associated
with a script code.

In some cases, there are several encodings that share a single script
code. Usually these are closely related. To distinguish among these,
additional information is required, such as font name or system
region code (locale code).

The encodings described here (and in the accompanying tables) are the
legacy encodings used in Mac OS versions 7.1 and later. In some cases,
certain earlier system versions have used different encodings. Not all
of these encodings are directly supported in Mac OS X, but Mac OS X
does support transcoding between all of these encodings and Unicode.

In all Mac OS legacy encodings, character codes 0x00-0x7F are identical
to ASCII, except that
- in Mac OS Japanese, reverse solidus is replaced by yen sign
- in Mac OS Arabic, Farsi, and Hebrew, some of the punctuation in this
range is treated as having strong left-right directionality,
although the corresponding Unicode characters have neutral
directionality
- in the three symbol glyphs encodings (Symbol, Dingbats, and Keyboard
glyphs), a different mapping is used for the ASCII range. The
Keyboard glyphs encoding even has a special mapping for the control
characters range 0x00-0x1F.
Fonts used as "system" fonts (for menus, dialogs, etc.) had four glyphs
at code points 0x11-0x14 for transient use by the Menu Manager. These
glyphs were not intended as characters for use in normal text, and the
associated code points are not generally interpreted as associated with
these glyphs. (However, a "system font variant" mapping table could
provide mappings for these).

Note that in general, character sets cannot be determined from font
layouts (they are not the same thing!). This is very noticeable with
Arabic, Hebrew, and Devanagari, for example.

The following is a list of legacy Mac OS encodings. The accompanying
tables provide mappings from these encodings to Unicode.
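
As an illustration of how the accompanying tables might be read, the
following Python sketch parses rows of the simple form seen in the
single-byte tables (Mac OS code, Unicode value, "#" comment). It is a
sketch under stated assumptions rather than a complete reader: the
function name and regular expression are hypothetical, sequences are
assumed to be written with "+" as in this document, and any line that
does not fit this pattern is skipped.

```python
import re

# One mapping row: Mac OS code, a Unicode value or '+'-joined sequence,
# then an optional '#' comment.
ROW = re.compile(
    r"^(0x[0-9A-Fa-f]+)\s+"
    r"(0x[0-9A-Fa-f]+(?:\+0x[0-9A-Fa-f]+)*)\s*"
    r"(?:#\s*(.*))?$"
)


def parse_mapping_lines(lines):
    """Yield (mac_code, unicode_text, comment) for each recognizable row."""
    for line in lines:
        match = ROW.match(line.strip())
        if not match:
            continue  # header comments, blank lines, unsupported syntax
        mac_code = int(match.group(1), 16)
        unicode_text = "".join(
            chr(int(part, 16)) for part in match.group(2).split("+")
        )
        yield mac_code, unicode_text, (match.group(3) or "").strip()


rows = list(parse_mapping_lines([
    "# header comment",
    "0xBD 0x03A9 # GREEK CAPITAL LETTER OMEGA",
    "0xF0 0xF8FF # Apple logo",
]))
```

Each parsed row can then be loaded into a dictionary for the forward
mapping and inverted for the strict reverse mapping.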

a) Mac OS encodings for script code 0, smRoman.

* Roman - this is the default for script code 0 (when the special
cases listed below do not apply). It covers several western European
languages, and includes math operators and various symbols.

* Symbol - this is the encoding for the font named "Symbol". It includes
Greek letters, math operators, and miscellaneous symbols. The layout
of the Symbol character set is identical to the layout of the Adobe
Symbol encoding vector, with the addition of the Apple logo at 0xF0
and the EURO SIGN at 0xA0.

* Dingbats - this is the encoding for the font named "Zapf Dingbats".
The layout of the Dingbats character set is identical to or a superset
of the layout of the Adobe Zapf Dingbats encoding vector.

* Keyboard glyphs - this is the encoding for the legacy font named
".Keyboard". Before Mac OS X, this font was used by the user-interface
system to display glyphs for special keys on the keyboard. In Mac OS
X, this mapping is not associated with a font; it is only used as a
way to map from a set of Menu Manager constants to associated Unicode
sequences. As such, new mappings added for Mac OS X only may be
one-way mappings: From the Keyboard glyph "encoding" to Unicode, but
not back.

* Turkish - this is the encoding if the script code is 0 and the system
region code is 24, verTurkey. It has 7 code point differences from
Mac OS Roman.

* Croatian - this is the encoding if the script code is 0 and the system
region code is any of the following:
68, verCroatia
66, verSlovenian
25, verYugoCroatian (only used in older systems)
It has 20 code point differences from standard Roman, but only 10
differences in repertoire.

* Icelandic - this is the encoding if the script code is 0 and the
system region code is either of the following:
21, verIceland
47, verFaroeIsl
It has 6 code point differences from standard Roman. It also has one
font variant.

* Romanian - this is the encoding if the script code is 0 and the system
region code is 39, verRomania. It has 6 code point differences from
standard Roman.

* Celtic - this is the encoding if the script code is 0 and the system
region code is any of the following:
50, verIreland
75, verScottishGaelic
76, verManxGaelic
77, verBreton
79, verWelsh
It is a variant of Mac OS Roman with a few extra accented characters
for Welsh.

* Gaelic - this is the encoding if the script code is 0 and the system
region code is 81, verIrishGaelicScript. It is a variant of Mac OS
Roman, and supports the older Irish orthography using dot above.

* Greek (monotonic) - this is the encoding if the script code is 0 and
the system region code is 20, verGreece. Although a script code is
defined for Greek, the Greek localized system does not use it (the
font family IDs are in the smRoman range). This encoding is based on
the ISO/IEC 8859-7 repertoire with additional Roman characters for
French and German, as well as additional symbols. Greek system 4.1
used a different encoding that matched 8859-7 code points for Greek
letters. Greek system 6.0.7 also used a variant of the standard
encoding, but it was quickly replaced by Greek system 6.0.7.1 which
used the standard encoding.

See also the Central European encoding under script code 29 below.

b) Mac OS encodings for script code 1, smJapanese.

* Japanese - this is the default for script code 1. It is based on a
Shift-JIS implementation of JIS X0208-1990 ("fullwidth") and
JIS X0201-1976 ("halfwidth"), with 5 additional one-byte characters
and one modified character, a set of Apple extension characters which
include many industry standard extensions, and separate codes for
vertical forms of some punctuation and kana. There are several font
variants.

c) Mac OS encodings for script code 2, smTradChinese.

* Chinese Traditional - this is an extension of Big-5.

d) Mac OS encodings for script code 3, smKorean.

* Korean - this is an extension of EUC-KR.

e) Mac OS encodings for script code 4, smArabic.

* Arabic - This is the default for script code 4 (when the special
case listed below does not apply). It is based on the ISO/IEC 8859-6
repertoire, with additional Arabic letters for Persian and Urdu and
with accented Roman letters for European languages. It has the
interesting feature mentioned above that certain ASCII punctuation
and symbol characters are encoded twice, once for each direction. It
has several font variants.

* Farsi - This is the encoding if the script code is 4 and the system
region code is 48, verIran. It is similar to Mac OS Arabic, but has
the "extended" or Persian digits instead of the standard Arabic
digits. It has one font variant.

f) Mac OS encodings for script code 5, smHebrew.

* Hebrew - This is based on the ISO/IEC 8859-8 Hebrew letter repertoire,
but adds Hebrew points, some Hebrew ligatures, some accented Roman
letters for European languages, and some non-ASCII punctuation. As
with Mac OS Arabic, certain ASCII punctuation and symbol characters
are encoded twice, once for each direction. This is also true for the
European digits. This has one font variant.

g) Mac OS encodings for script code 6, smGreek.

None currently - see smRoman.

h) Mac OS encodings for script code 7, smCyrillic.

* Cyrillic - This is based on the ISO/IEC 8859-5 Cyrillic character
repertoire plus an additional case pair for Ukrainian.

i) Mac OS encodings for script code 9, smDevanagari.

* Devanagari - This is based on IS 13194:1991 (ISCII-91), and adds some
punctuation and symbols.

j) Mac OS encodings for script code 10, smGurmukhi.

* Gurmukhi - This is based on IS 13194:1991 (ISCII-91), and adds some
punctuation and symbols.

k) Mac OS encodings for script code 11, smGujarati.

* Gujarati - This is based on IS 13194:1991 (ISCII-91), and adds some
punctuation and symbols.

l) Mac OS encodings for script code 21, smThai.

* Thai - This is based on TIS 620-2533, except that three of the
TIS 620-2533 characters are replaced with other characters. Some
undefined code points in TIS 620-2533 are used for additional
punctuation characters.

m) Mac OS encodings for script code 25, smSimpChinese.

* Chinese Simplified - this is an extension of EUC-CN.

n) Mac OS encodings for script code 26, smTibetan.

* Tibetan

o) Mac OS encodings for script code 28, smEthiopic.

* Inuit - this is the encoding if the script code is 28 and the
system region code is 78, verNunavut (for Inuktitut language).
There is no script code for Inuit, so it shares the script code
with Ethiopic.

p) Mac OS encodings for script code 29, smCentralEuroRoman.

* Central European - This is similar to standard Roman, but with a
different (and larger) set of European characters and with fewer
symbols. It is used for Polish, Czech, Slovak, Hungarian, Estonian,
Latvian, and Lithuanian.
|
590
charmap/Readme.txt
Normal file
@ -0,0 +1,590 @@
|
||||
#=======================================================================
|
||||
# File name: README.TXT
|
||||
#
|
||||
# Contents: Background information on Unicode mapping tables for
|
||||
# Mac OS legacy text encodings
|
||||
#
|
||||
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-04 Update discussion of roundtrip fidelity,
|
||||
# delete discussion of mappings dependent on
|
||||
# symmetric swapping (no longer supported),
|
||||
# provide information on how legacy encodings
|
||||
# are supported in Mac OS X.
|
||||
# b3,c1 2002-Dec-19 Add Keyboard font encoding. Update URLs,
|
||||
# notes.
|
||||
# b02 1999-Sep-22 Update information on Cyrillic. Update
|
||||
# contact e-mail address.
|
||||
# n07 1998-Feb-05 Rewrite to provide additional information
|
||||
# relevant to using the accompanying mapping
|
||||
# tables, and to delete some extraneous
|
||||
# information. Delete Bulgarian (no special
|
||||
# encoding, uses standard Cyrillic), add
|
||||
# Farsi, Devanagari, Gurmukhi, Gujarati,
|
||||
# Celtic, Gaelic, Inuit, Tibetan.
|
||||
# n04 1995-Nov-15 Update info for Hebrew and Thai
|
||||
# n03 1995-Apr-15 First version (after fixing some typos).
|
||||
#
|
||||
##################
|
||||
|
||||
0. Preliminaries
|
||||
----------------
|
||||
|
||||
For maximum interchangeability, this file and the accompanying Mac OS
|
||||
mapping tables use only ASCII characters. They are intended to be
|
||||
displayed in a monospaced font.
|
||||
|
||||
Apple, the Apple logo, Mac, and Macintosh are trademarks of Apple
|
||||
Computer, Inc., registered in the United States and other countries.
|
||||
QuickDraw and TrueType are trademarks of Apple Computer, Inc. Unicode is
|
||||
a trademark of Unicode Inc. PostScript is a trademark of Adobe Systems
|
||||
Inc., which may be registered in certain jurisdictions. IBM is a
|
||||
registered trademark of International Business Machines Corporation. ITC
|
||||
Zapf Dingbats is a registered trademark of the International Typeface
|
||||
Corporation. For the sake of brevity, throughout this document and the
|
||||
accompanying tables, "Macintosh" can be used to refer to Macintosh
|
||||
computers and "Unicode" can be used to refer to the Unicode standard.
|
||||
|
||||
Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
either express or implied, with respect to this document and the
|
||||
accompanying tables, their quality, accuracy, or fitness for a
|
||||
particular purpose. In no event will Apple be liable for direct,
|
||||
indirect, special, incidental, or consequential damages resulting from
|
||||
any defect or inaccuracy in this document or the accompanying tables.
|
||||
|
||||
1. Introduction
|
||||
---------------
|
||||
|
||||
This document summarizes some Unicode mapping considerations that are
|
||||
relevant for the accompanying mapping tables. It also provides an
|
||||
overview of Mac OS legacy encodings.
|
||||
|
||||
These mapping tables and character lists are subject to change. The
|
||||
latest tables should be available from the following:
|
||||
|
||||
<http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
|
||||
2. Round-trip fidelity and overview of mapping techniques
|
||||
---------------------------------------------------------
|
||||
|
||||
For a particular set of national and international standards, Unicode
|
||||
provides round-trip fidelity: Text in one of those encodings can be
|
||||
mapped to Unicode and back again, yielding the original characters.
|
||||
Characters which are distinct in one of these source standards have a
|
||||
distinct counterpart in Unicode. Note that this counterpart might not be
|
||||
a single Unicode character; as is pointed out in "The Unicode Standard,
|
||||
Version 2.0" (page 2-10), "sometimes a single code value in another
|
||||
standard corresponds to a sequence of code values in the Unicode
|
||||
Standard, or vice versa."
|
||||
|
||||
However, Unicode does not attempt to provide round-trip fidelity for
|
||||
most vendor standards. Nevertheless, Apple and other platform vendors
|
||||
may need to provide such round-trip fidelity for their current platform
|
||||
encodings and/or legacy platform encodings (this can be important in
|
||||
file systems, for example). In order to do this, Apple makes use of some
|
||||
Unicode characters in the corporate-use zone (the upper end of the
|
||||
private use area).
|
||||
|
||||
Corporate-zone characters must be used with care. Indiscriminate use of
|
||||
such characters can result in text which is not easily interchanged with
|
||||
other systems, since these characters have no standard meaning outside a
|
||||
particular platform. The mappings provided here are intended to minimize
|
||||
the use of private use characters, or to use them in such a way that
|
||||
basic text content will not be lost if the corporate zone characters are
|
||||
dropped when text is transferred to another system.
|
||||
|
||||
The tables provided here have three goals, in the following order of
|
||||
importance:
|
||||
1. Provide 100% round-trip mapping from a Mac OS legacy encoding to
|
||||
Unicode and back.
|
||||
2. Map characters in a Mac OS encoding into the Unicode characters that
|
||||
best represent the interpretation and usage of the Mac OS characters.
|
||||
3. When mapping text in a Mac OS encoding to Unicode using the tables,
|
||||
the resulting Unicode text should be as interchangeable as possible.
|
||||
|
||||
To satisfy these goals, the mappings use a variety of techniques. First
|
||||
we attempt to achieve round-trip mappings using any standard Unicode
|
||||
feature at our disposal, without resorting to corporate-zone characters.
|
||||
This can include the following techniques:
|
||||
- Use of all Unicode characters defined in Unicode 2.1 and later,
|
||||
including compatibility characters.
|
||||
- Mapping a single character in a Mac OS encoding to a sequence of
|
||||
standard Unicode characters, or vice versa. This requires grouping
|
||||
characters into appropriate chunks for lookup before mapping them
|
||||
(this mainly applies to sequences of Unicode characters).
|
||||
- Using Unicode direction overrides to force direction attributes when
|
||||
mapping to Unicode. This requires resolution of Unicode character
|
||||
direction, and use of this information, when mapping from Unicode back
|
||||
to certain Mac OS encodings.
|
||||
The requirements imposed on Unicode handling are necessary for other,
|
||||
non-transcoding operations in a full Unicode implementation anyway, so
|
||||
requiring them for transcoding should not impose much of a burden.
|
||||
|
||||
Next, if round-trip fidelity cannot be achieved using the above
|
||||
techniques, we attempt to use corporate-zone characters only as
|
||||
"transcoding hints" (more on this below). These are combined with one or
|
||||
more standard Unicode characters to mark them as special for
|
||||
transcoding, but have no other function and can be deleted with no loss
|
||||
of basic text content (only of round-trip fidelity).
|
||||
|
||||
Finally, if a character in a Mac OS encoding is unrelated to any Unicode
|
||||
character or Unicode character sequence, we may map it to a single
|
||||
corporate-zone Unicode code point.
|
||||
|
||||
These techniques are described in more detail in the following sections.
|
||||
|
||||
Some clients of these tables may have a different set of goals. For
|
||||
example, some clients may prefer to avoid compatibility characters,
|
||||
perhaps sacrificing round-trip fidelity if necessary. In most cases it
|
||||
is fairly easy to construct other types of mappings from the mappings
|
||||
given here. In particular, the Unicode mappings here have been designed
|
||||
so that if they are converted to a restricted form of NFD (a form that
|
||||
does NOT decompose or normalize Unicode characters in the ranges
|
||||
2000-2FFF or F900-FAFF), the resulting mappings still provide round-trip
|
||||
fidelity. (For certain characters in the Mac OS Hebrew and Devanagari
|
||||
encodings, the decomposition mappings must use a grouping transcoding
|
||||
hint to ensure round-trip fidelity; more details on this are provided in
|
||||
the mapping tables for those encodings.)
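The restricted form of NFD described above can be sketched in Python as
follows. This is a minimal illustration, not Apple's implementation: the
names EXCLUDED_RANGES and restricted_nfd are hypothetical, and the sketch
normalizes each run of non-excluded characters independently, so it does
not attempt canonical reordering across an excluded character.

```python
import unicodedata

# Ranges that must NOT be decomposed or normalized, as stated in the text.
EXCLUDED_RANGES = ((0x2000, 0x2FFF), (0xF900, 0xFAFF))


def is_excluded(ch: str) -> bool:
    """Return True if the character lies in one of the excluded ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXCLUDED_RANGES)


def restricted_nfd(text: str) -> str:
    """Apply NFD only to runs of characters outside the excluded ranges."""
    out = []
    run = []
    for ch in text:
        if is_excluded(ch):
            # Flush the pending run through NFD, then copy the excluded
            # character through unchanged.
            if run:
                out.append(unicodedata.normalize("NFD", "".join(run)))
                run = []
            out.append(ch)
        else:
            run.append(ch)
    if run:
        out.append(unicodedata.normalize("NFD", "".join(run)))
    return "".join(out)
```

For instance, U+00C4 is decomposed to U+0041 U+0308, while U+2126 OHM SIGN
and U+F900 are left as they are, although plain NFD would replace them.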
|
||||
|
||||
There is one more round-trip issue that should be mentioned. If a
|
||||
Unicode character or sequence can be mapped at all into a particular Mac
|
||||
OS encoding, then the reverse mapping back to Unicode should yield the
|
||||
original Unicode character or sequence (except for possible differences
|
||||
in direction overrides or other Unicode characters with General Category
|
||||
Cf). The tables here also provide this. For a related issue, see the
|
||||
next section.
|
||||
|
||||
3. Mapping tolerance: Strict and loose
|
||||
--------------------------------------
|
||||
|
||||
In many character sets, a single character may have multiple semantics,
|
||||
either by explicit definition, ambiguous definition, or established
|
||||
usage. For example, the JIS character 0x2142, or 0x8161 in Shift-JIS,
|
||||
is specified in the JIS X0208 standard to have two meanings: "double
|
||||
vertical line" and "parallel". Each of these meanings corresponds to a
|
||||
different Unicode character: 0x2016 DOUBLE VERTICAL LINE and 0x2225
|
||||
PARALLEL TO. When mapping from Unicode to Shift-JIS, it is normally
|
||||
desirable to map both of these Unicode characters to the single
|
||||
Shift-JIS character. However, when mapping the Shift-JIS character to
|
||||
Unicode, we can choose only one of the possible Unicode characters.
|
||||
|
||||
For two encodings X and Y, we can define a set of "strict" mappings
|
||||
from one to the other as follows: If text in X can be mapped to Y using
|
||||
the strict mappings from X to Y, then the resulting text can be mapped
|
||||
back using the strict mappings from Y to X to end up with the original
|
||||
text from X. Similarly, if text in Y can be mapped to X using the strict
|
||||
mappings from Y to X, then the resulting text can be mapped back using
|
||||
the strict mappings from X to Y to end up with the original text from Y.
|
||||
|
||||
There may be several characters in one encoding that all map to a
|
||||
single character in another encoding, but only one of these mappings
|
||||
can be strict; the others are "loose".
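The distinction between strict and loose mappings can be sketched in Python
using the Shift-JIS example above. This is only an illustration: the table
and function names are hypothetical, and the choice of U+2016 as the strict
target for 0x8161 is an assumption made for the example (the text only says
that one of the two Unicode characters must be chosen).

```python
# Strict mapping, Shift-JIS -> Unicode: exactly one target per source code.
# Choosing U+2016 DOUBLE VERTICAL LINE here is an assumption.
SJIS_TO_UNICODE_STRICT = {0x8161: "\u2016"}

# The strict reverse mapping is the exact inverse, so round trips succeed.
UNICODE_TO_SJIS_STRICT = {u: c for c, u in SJIS_TO_UNICODE_STRICT.items()}

# Loose mappings: extra Unicode characters that share the same target.
UNICODE_TO_SJIS_LOOSE = {"\u2225": 0x8161}  # PARALLEL TO


def to_unicode(code):
    """Map one Shift-JIS code to Unicode using the strict table."""
    return SJIS_TO_UNICODE_STRICT[code]


def from_unicode(ch, allow_loose=False):
    """Map one Unicode character to Shift-JIS; return None if unmapped."""
    if ch in UNICODE_TO_SJIS_STRICT:
        return UNICODE_TO_SJIS_STRICT[ch]
    if allow_loose:
        return UNICODE_TO_SJIS_LOOSE.get(ch)
    return None
```

With strict mappings only, U+2225 is unmappable; with loose mappings
enabled it maps to 0x8161, but mapping back yields U+2016 rather than the
original character.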
|
||||
|
||||
The mappings given in the accompanying tables are strict mappings.
|
||||
However, the Mac OS Text Encoding Converter also supports loose
|
||||
mappings and fallback mappings. Some of the accompanying tables provide
|
||||
suggestions about possible loose mappings.
|
||||
|
||||
4. Mapping a Mac encoding character to a Unicode sequence or vice versa
|
||||
-----------------------------------------------------------------------
|
||||
|
||||
In some cases, a character in a Mac OS legacy encoding maps to a
|
||||
sequence of Unicode characters. For example, the Mac OS Japanese
|
||||
encoding includes a character for the circled CJK ideograph "big".
|
||||
Although Unicode encodes other circled ideographs as single characters,
|
||||
it does not encode this one. However, this character can be
|
||||
unambiguously represented in Unicode as the Unicode sequence
|
||||
0x5927+0x20DD, the CJK ideograph for "big" followed by COMBINING
|
||||
ENCLOSING CIRCLE.
|
||||
|
||||
To handle the reverse mapping, a transcoding process must group the
|
||||
Unicode sequence 0x5927+0x20DD as a single element for lookup (the
|
||||
Mac OS Text Encoding Converter does this).
|
||||
|
||||
In a few cases, a sequence of characters in a Mac OS legacy encoding
|
||||
must be grouped for mapping to a single Unicode character or a sequence
|
||||
of Unicode characters. For example, in Mac OS Devanagari (based on
|
||||
ISCII-91), DEVANAGARI LETTER VOCALIC L is represented as 0xA6+0xE9;
|
||||
but this is represented in Unicode by the single character 0x090C.
|
||||
Furthermore, explicit halant is represented in Mac OS Devanagari as
|
||||
0xE8+0xE8 (double halant) and in Unicode as 0x094D+0x200C (VIRAMA
|
||||
plus ZERO WIDTH NON-JOINER). The latter can also be considered as
|
||||
a context-dependent mapping of 0xE8, halant.
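The grouping step for mapping from Unicode can be sketched in Python as a
longest-match lookup. This is a simplified illustration, not the Text
Encoding Converter's algorithm: the names UNICODE_TO_MAC and encode_grouped
are hypothetical, and the table holds only the Devanagari examples given
above. It does not handle the context dependence of halant (for example,
two consecutive U+094D characters would also produce 0xE8+0xE8).

```python
# Reverse table keyed by Unicode strings of one or more characters.
# Entries are taken from the Mac OS Devanagari examples in the text.
UNICODE_TO_MAC = {
    "\u090C": b"\xA6\xE9",        # DEVANAGARI LETTER VOCALIC L
    "\u094D\u200C": b"\xE8\xE8",  # VIRAMA + ZWNJ -> explicit halant
    "\u094D": b"\xE8",            # VIRAMA -> halant
}
MAX_KEY_LEN = max(len(key) for key in UNICODE_TO_MAC)


def encode_grouped(text: str) -> bytes:
    """Encode text, always trying the longest Unicode sequence first."""
    out = bytearray()
    i = 0
    while i < len(text):
        for n in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in UNICODE_TO_MAC:
                out += UNICODE_TO_MAC[chunk]
                i += n
                break
        else:
            # No sequence starting at position i has a mapping.
            raise ValueError(f"no mapping for U+{ord(text[i]):04X}")
    return bytes(out)
```

Trying the longest sequence first ensures that U+094D followed by U+200C is
looked up as one element rather than as VIRAMA plus an unmapped character.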
|
||||
|
||||
Loose mappings from Unicode to a Mac OS encoding often map a single
|
||||
Unicode character to a sequence of characters in the Mac OS encoding. For example,
|
||||
the Unicode character 0x00BD VULGAR FRACTION ONE HALF cannot be mapped
|
||||
into the Mac OS Roman character set as a single character, but it has a
|
||||
loose mapping to the sequence 0x31+0xDA+0x32, "digit one" + "fraction
|
||||
slash" + "digit two".
|
||||
|
||||
In some cases a Unicode character such as a direction override may
|
||||
simply be discarded when mapping to a Mac OS encoding, since the
|
||||
information carried by the override may be represented in a different
|
||||
way by the Mac OS encoding. See the next section for an example.
|
||||
|
||||
5. Mappings that depend on directionality (or other attributes)
|
||||
---------------------------------------------------------------
|
||||
|
||||
Strict mappings from Unicode to Mac OS legacy encodings may depend on
|
||||
resolved character direction. Loose mappings may depend on additional
|
||||
attributes such as whether the text should use vertical form codes if
|
||||
available (i.e. whether the text is intended for vertical display on a
|
||||
system that cannot automatically substitute vertical forms).
|
||||
|
||||
a) Resolved character direction
|
||||
|
||||
The Mac OS Arabic and Hebrew character sets were developed in 1986-1987.
|
||||
At that time the bidirectional line layout algorithm used in the Mac OS
|
||||
was fairly simple; it used only a few direction classes (instead of the
|
||||
19 now used in the Unicode bidirectional algorithm). In order to permit
|
||||
users to handle some tricky layout problems, certain punctuation and
|
||||
symbol characters have duplicate code points, one with a left-right
|
||||
direction attribute and the other with a right-left direction attribute.
|
||||
|
||||
For example, plus sign is encoded at 0x2B with a left-right attribute,
|
||||
and at 0xAB with a right-left attribute. However, there is only one PLUS
|
||||
SIGN character in Unicode. This leads to some interesting problems when
|
||||
mapping between Mac OS Arabic or Hebrew and Unicode.
|
||||
|
||||
We need a way to map both of these plus signs to Unicode and back. Using
|
||||
a single corporate character for one of these plus signs is not a good
|
||||
solution, since both of the plus sign characters are likely to be used
|
||||
in text that is interchanged, and thus content would be lost.
|
||||
|
||||
The problem is solved with the use of direction override characters and
|
||||
direction-dependent mappings. When mapping from Mac OS Arabic or Hebrew
|
||||
to Unicode, we use direction overrides as necessary to force the
|
||||
direction of the resulting Unicode characters. When mapping back from
|
||||
Unicode, the Unicode bidirectional algorithm should be used to determine
|
||||
resolved direction of the Unicode characters. The mapping from Unicode
|
||||
to Mac OS Arabic or Hebrew can then be disambiguated as necessary by
|
||||
using the resolved direction.
|
||||
|
||||
For example, when mapping from Mac OS Arabic or Hebrew, we can use
|
||||
LEFT-RIGHT OVERRIDE (LRO), RIGHT-LEFT OVERRIDE (RLO), and POP DIRECTION
|
||||
FORMATTING (PDF) as follows:
|
||||
|
||||
0x2B -> 0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
|
||||
0xAB -> 0x202E (RLO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
|
||||
|
||||
When mapping back, we resolve the direction of the Unicode character
|
||||
0x002B, and use this information to determine which of the Mac OS
|
||||
encoding characters to use:
|
||||
|
||||
0x002B -> 0x2B (if LR) or 0xAB (if RL)
|
||||
|
||||
After direction overrides have been used in this way to force a
|
||||
particular resolved direction, they may be discarded when mapping from
|
||||
Unicode to Mac OS Arabic and Hebrew (since the information they carried
|
||||
in Unicode is represented in the Mac OS encoding by the code point of
|
||||
the plus sign).
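The plus sign example can be sketched in Python as follows. This is a
deliberately reduced illustration: the function names are hypothetical, only
the two plus sign code points are handled, and direction is resolved from
explicit overrides plus an assumed base direction rather than by the full
Unicode bidirectional algorithm, which a real implementation would need.

```python
LRO, RLO, PDF = "\u202D", "\u202E", "\u202C"

# Mac OS Arabic/Hebrew -> Unicode, using the mappings given in the text.
MAC_TO_UNICODE = {
    0x2B: LRO + "+" + PDF,
    0xAB: RLO + "+" + PDF,
}

# Unicode PLUS SIGN -> Mac OS code point, chosen by resolved direction.
PLUS_BY_DIRECTION = {"LR": 0x2B, "RL": 0xAB}


def decode(data: bytes) -> str:
    """Map Mac OS bytes to Unicode, wrapping each plus sign in overrides."""
    return "".join(MAC_TO_UNICODE[b] for b in data)


def encode(text: str, base_direction: str = "LR") -> bytes:
    """Map Unicode back, discarding overrides after using their direction."""
    out = bytearray()
    stack = []  # directions forced by currently open overrides
    for ch in text:
        if ch == LRO:
            stack.append("LR")
        elif ch == RLO:
            stack.append("RL")
        elif ch == PDF:
            if stack:
                stack.pop()
        elif ch == "+":
            direction = stack[-1] if stack else base_direction
            out.append(PLUS_BY_DIRECTION[direction])
        else:
            raise ValueError(f"no mapping for U+{ord(ch):04X}")
    return bytes(out)
```

The overrides never reach the output of encode: their only effect is to
select 0x2B or 0xAB, which is how the round trip is preserved.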
|
||||
|
||||
Even when not required for round-trip fidelity, direction overrides
|
||||
may be used when mapping from a Mac OS encoding to Unicode in order to
|
||||
preserve proper text layout. For example, the single Mac OS Arabic
|
||||
ellipsis character has direction class right-left, while the Unicode
|
||||
HORIZONTAL ELLIPSIS character has direction class neutral. When
|
||||
mapping the Mac OS ellipsis to Unicode, it is surrounded with a
|
||||
direction override to help preserve proper text layout. However,
|
||||
resolved direction is not needed or used when mapping the Unicode
|
||||
HORIZONTAL ELLIPSIS back to Mac OS Arabic.
|
||||
|
||||
b) Horizontal or vertical display
|
||||
|
||||
The Mac OS Japanese encoding includes separately-encoded vertical forms
|
||||
for some punctuation and kana. When Unicode characters in the CJK
|
||||
punctuation and kana ranges are mapped to Mac OS Japanese characters and
|
||||
(1) those characters are intended for vertical display, (2) they will be
|
||||
displayed in an environment that does not provide automatic vertical
|
||||
form substitution, and (3) loose mappings are desired, the Unicode
|
||||
characters can be mapped to the corresponding vertical form codes in the
|
||||
Mac OS Japanese encoding.
|
||||
|
||||
This does not affect mapping of the Unicode vertical presentation forms
|
||||
(which always map to the Mac OS Japanese vertical form codes).
|
||||
|
||||
6. Use of corporate characters
|
||||
------------------------------
|
||||
|
||||
Apple has defined a block of 32 corporate characters as "transcoding
|
||||
hints." These are used in combination with standard Unicode characters
|
||||
to force them to be treated in a special way for mapping to other
|
||||
encodings; they have no other effect. Sixteen of these transcoding
|
||||
hints are "grouping hints" - they indicate that the next 2-4 Unicode
|
||||
characters should be treated as a single entity for transcoding. The
|
||||
other sixteen transcoding hints are "variant tags" - they are like
|
||||
combining characters, and can follow a standard Unicode character (or a sequence
|
||||
consisting of a base character and other combining characters) to
|
||||
cause it to be treated in a special way for transcoding. These always
|
||||
terminate a combining-character sequence.
|
||||
|
||||
Whenever possible, mappings that require corporate-zone characters
|
||||
use standard Unicode characters in combination with a single
|
||||
transcoding hint (no mapping uses more than one transcoding hint).
|
||||
For these mappings, even if the corporate-zone characters are lost in
|
||||
interchange, the basic text content will be preserved.
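The effect of dropping transcoding hints can be sketched in Python as
follows. The code point range used here (U+F860 through U+F87F, a block of
32) is an assumption for illustration and should be checked against
"CORPCHAR.TXT"; the function name is hypothetical as well.

```python
# ASSUMPTION: the 32 transcoding hints occupy U+F860..U+F87F.
# Verify this range against CORPCHAR.TXT before relying on it.
HINT_FIRST = 0xF860
HINT_LAST = 0xF87F


def strip_transcoding_hints(text: str) -> str:
    """Remove transcoding hints, keeping the standard Unicode characters."""
    return "".join(
        ch for ch in text if not (HINT_FIRST <= ord(ch) <= HINT_LAST)
    )
```

After stripping, the remaining standard characters still carry the basic
text content, but the text may no longer map back to the original Mac OS
code points.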
|
||||
|
||||
However, some characters in a Mac OS encoding - such as the Apple
|
||||
logo character - bear no relation to any standard Unicode character.
|
||||
In these cases, the Mac OS character is mapped to a single corporate
|
||||
zone character defined by Apple. Fewer than 40 corporate characters
|
||||
are used in this way.
|
||||
|
||||
All of the corporate characters defined by Apple are listed in the
|
||||
accompanying file "CORPCHAR.TXT", including old Apple corporate
|
||||
character assignments which are now deprecated (but which are still
|
||||
supported as loose mappings by the Mac OS Text Encoding Converter).
|
||||
|
||||
7. Font variants
|
||||
----------------
|
||||
|
||||
For some Mac OS legacy encodings, certain fonts used with that encoding
|
||||
may actually implement a slight variant of the standard encoding
|
||||
specified in the accompanying mapping tables. The header comments in the
|
||||
mapping table files for each encoding describe any font variants
|
||||
associated with that encoding.
|
||||
|
||||
8. Encodings in Mac OS X
|
||||
------------------------
|
||||
|
||||
The Mac OS X Cocoa and Carbon environments use Unicode as the primary
|
||||
text encoding. Some legacy programming interfaces in the Carbon
|
||||
environment - e.g. QuickDraw Text, the Script Manager, and related
|
||||
Text Utilities - use and support the following subset of Mac OS legacy
|
||||
encodings:
|
||||
Roman
|
||||
Central European
|
||||
Cyrillic
|
||||
Chinese Traditional
|
||||
Chinese Simplified
|
||||
Japanese
|
||||
Korean
|
||||
|
||||
Other legacy Mac OS encodings are supported in Carbon and Cocoa via
|
||||
transcoding using the Mac OS Text Encoding Converter or other
|
||||
transcoding interfaces; the character repertoires of all Mac OS
|
||||
legacy encodings are supported in Unicode on Mac OS X.
|
||||
|
||||
Additional legacy encodings are also supported in the Classic
|
||||
environment under Mac OS X.
|
||||
|
||||
9. Mac OS legacy encodings
|
||||
--------------------------
|
||||
|
||||
Mac OS versions 7.1 and later supported multiple encodings via the
|
||||
Script Manager, QuickDraw Text and related Text Utilities. These
|
||||
system components distinguish these encodings primarily by script code:
|
||||
font family IDs are grouped into ranges, and each range is associated
|
||||
with a script code.
|
||||
|
||||
In some cases, there are several encodings that share a single script
|
||||
code. Usually these are closely related. To distinguish among these,
|
||||
additional information is required, such as font name or system
|
||||
region code (locale code).
|
||||
|
||||
The encodings described here (and in the accompanying tables) are the
|
||||
legacy encodings used in Mac OS versions 7.1 and later. In some cases,
|
||||
certain earlier system versions have used different encodings. Not all
|
||||
of these encodings are directly supported in Mac OS X, but Mac OS X
|
||||
does support transcoding between all of these encodings and Unicode.
|
||||
|
||||
In all Mac OS legacy encodings, character codes 0x00-0x7F are identical
|
||||
to ASCII, except that
|
||||
- in Mac OS Japanese, reverse solidus is replaced by yen sign
|
||||
- in Mac OS Arabic, Farsi, and Hebrew, some of the punctuation in this
|
||||
range is treated as having strong left-right directionality,
|
||||
although the corresponding Unicode characters have neutral
|
||||
directionality
|
||||
- in the three symbol glyphs encodings (Symbol, Dingbats, and Keyboard
|
||||
glyphs), a different mapping is used for the ASCII range. The
|
||||
Keyboard glyphs encoding even has a special mapping for the control
|
||||
characters range 0x00-0x1F.
|
||||
Fonts used as "system" fonts (for menus, dialogs, etc.) had four glyphs
|
||||
at code points 0x11-0x14 for transient use by the Menu Manager. These
|
||||
glyphs were not intended as characters for use in normal text, and the
|
||||
associated code points are not generally interpreted as associated with
|
||||
these glyphs. (However, a "system font variant" mapping table could
|
||||
provide mappings for these).
|
||||
|
||||
Note that in general, character sets cannot be determined from font
|
||||
layouts (they are not the same thing!). This is very noticeable with
|
||||
Arabic, Hebrew, and Devanagari, for example.
|
||||
|
||||
The following is a list of legacy Mac OS encodings. The accompanying
|
||||
tables provide mappings from these encodings to Unicode.
|
||||
|
||||
a) Mac OS encodings for script code 0, smRoman.
|
||||
|
||||
* Roman - this is the default for script code 0 (when the special
|
||||
cases listed below do not apply). It covers several western European
|
||||
languages, and includes math operators and various symbols.
|
||||
|
||||
* Symbol - this is the encoding for the font named "Symbol". It includes
|
||||
Greek letters, math operators, and miscellaneous symbols. The layout
|
||||
of the Symbol character set is identical to the layout of the Adobe
|
||||
Symbol encoding vector, with the addition of the Apple logo at 0xF0
|
||||
and the EURO SIGN at 0xA0.
|
||||
|
||||
* Dingbats - this is the encoding for the font named "Zapf Dingbats".
|
||||
The layout of the Dingbats character set is identical to or a superset
|
||||
of the layout of the Adobe Zapf Dingbats encoding vector.
|
||||
|
||||
* Keyboard glyphs - this is the encoding for the legacy font named
|
||||
".Keyboard". Before Mac OS X, this font was used by the user-interface
|
||||
system to display glyphs for special keys on the keyboard. In Mac OS
|
||||
X, this mapping is not associated with a font; it is only used as a
|
||||
way to map from a set of Menu Manager constants to associated Unicode
|
||||
sequences. As such, new mappings added for Mac OS X only may be
|
||||
one-way mappings: From the Keyboard glyph "encoding" to Unicode, but
|
||||
not back.
|
||||
|
||||
* Turkish - this is the encoding if the script code is 0 and the system
|
||||
region code is 24, verTurkey. It has 7 code point differences from
|
||||
Mac OS Roman.
|
||||
|
||||
* Croatian - this is the encoding if the script code is 0 and the system
|
||||
region code is any of the following:
|
||||
68, verCroatia
|
||||
66, verSlovenian
|
||||
25, verYugoCroatian (only used in older systems)
|
||||
It has 20 code point differences from standard Roman, but only 10
|
||||
differences in repertoire.
|
||||
|
||||
* Icelandic - this is the encoding if the script code is 0 and the
|
||||
system region code is either of the following:
|
||||
21, verIceland
|
||||
47, verFaroeIsl
|
||||
It has 6 code point differences from standard Roman. It also has one
|
||||
font variant.
|
||||
|
||||
* Romanian - this is the encoding if the script code is 0 and the system
|
||||
region code is 39, verRomania. It has 6 code point differences from
|
||||
standard Roman.
|
||||
|
||||
* Celtic - this is the encoding if the script code is 0 and the system
|
||||
region code is any of the following:
|
||||
50, verIreland
|
||||
75, verScottishGaelic
|
||||
76, verManxGaelic
|
||||
77, verBreton
|
||||
79, verWelsh
|
||||
It is a variant of Mac OS Roman with a few extra accented characters
|
||||
for Welsh.
|
||||
|
||||
* Gaelic - this is the encoding if the script code is 0 and the system
|
||||
region code is 81, verIrishGaelicScript. It is a variant of Mac OS
|
||||
Roman, and supports the older Irish orthography using dot above.
|
||||
|
||||
* Greek (monotonic) - this is the encoding if the script code is 0 and
|
||||
the system region code is 20, verGreece. Although a script code is
|
||||
defined for Greek, the Greek localized system does not use it (the
|
||||
font family IDs are in the smRoman range). This encoding is based on
|
||||
the ISO/IEC 8859-7 repertoire with additional Roman characters for
|
||||
French and German, as well as additional symbols. Greek system 4.1
|
||||
used a different encoding that matched 8859-7 code points for Greek
|
||||
letters. Greek system 6.0.7 also used a variant of the standard
|
||||
encoding, but it was quickly replaced by Greek system 6.0.7.1 which
|
||||
used the standard encoding.
|
||||
|
||||
See also the Central European encoding under script code 29 below.
|
||||
|
||||
b) Mac OS encodings for script code 1, smJapanese.
|
||||
|
||||
* Japanese - this is the default for script code 1. It is based on a
|
||||
Shift-JIS implementation of JIS X0208-1990 ("fullwidth") and
|
||||
JIS X0201-1976 ("halfwidth"), with 5 additional one-byte characters
|
||||
and one modified character, a set of Apple extension characters which
|
||||
include many industry standard extensions, and separate codes for
|
||||
vertical forms of some punctuation and kana. There are several font
|
||||
variants.
|
||||
|
||||
c) Mac OS encodings for script code 2, smTradChinese.
|
||||
|
||||
* Chinese Traditional - this is an extension of Big-5.
|
||||
|
||||
d) Mac OS encodings for script code 3, smKorean.
|
||||
|
||||
* Korean - this is an extension of EUC-KR.
|
||||
|
||||
e) Mac OS encodings for script code 4, smArabic.
|
||||
|
||||
* Arabic - This is the default for script code 4 (when the special
|
||||
case listed below does not apply). It is based on the ISO/IEC 8859-6
|
||||
repertoire, with additional Arabic letters for Persian and Urdu and
|
||||
with accented Roman letters for European languages. It has the
|
||||
interesting feature mentioned above that certain ASCII punctuation
|
||||
and symbol characters are encoded twice, once for each direction. It
|
||||
has several font variants.
|
||||
|
||||
* Farsi - This is the encoding if the script code is 4 and the system
|
||||
region code is 48, verIran. It is similar to Mac OS Arabic, but has
|
||||
the "extended" or Persian digits instead of the standard Arabic
|
||||
digits. It has one font variant.
|
||||
|
||||
f) Mac OS encodings for script code 5, smHebrew.
|
||||
|
||||
* Hebrew - This is based on the ISO/IEC 8859-8 Hebrew letter repertoire,
|
||||
but adds Hebrew points, some Hebrew ligatures, some accented Roman
letters for European languages, and some non-ASCII punctuation. As
with Mac OS Arabic, certain ASCII punctuation and symbol characters
are encoded twice, once for each direction. This is also true for the
European digits. This has one font variant.

g) Mac OS encodings for script code 6, smGreek.

None currently - see smRoman.

h) Mac OS encodings for script code 7, smCyrillic.

* Cyrillic - This is based on the ISO/IEC 8859-5 Cyrillic character
repertoire plus an additional case pair for Ukrainian.

i) Mac OS encodings for script code 9, smDevanagari.

* Devanagari - This is based on IS 13194:1991 (ISCII-91), and adds some
punctuation and symbols.

j) Mac OS encodings for script code 10, smGurmukhi.

* Gurmukhi - This is based on IS 13194:1991 (ISCII-91), and adds some
punctuation and symbols.

k) Mac OS encodings for script code 11, smGujarati.

* Gujarati - This is based on IS 13194:1991 (ISCII-91), and adds some
punctuation and symbols.

l) Mac OS encodings for script code 21, smThai.

* Thai - This is based on TIS 620-2533, except that three of the
TIS 620-2533 characters are replaced with other characters. Some
undefined code points in TIS 620-2533 are used for additional
punctuation characters.

m) Mac OS encodings for script code 25, smSimpChinese.

* Chinese Simplified - this is an extension of EUC-CN.

n) Mac OS encodings for script code 26, smTibetan.

* Tibetan

o) Mac OS encodings for script code 28, smEthiopic.

* Inuit - this is the encoding if the script code is 28 and the
system region code is 78, verNunavut (for Inuktitut language).
There is no script code for Inuit, so it shares the script code
with Ethiopic.

p) Mac OS encodings for script code 29, smCentralEuroRoman.

* Central European - This is similar to standard Roman, but with a
different (and larger) set of European characters and with fewer
symbols. It is used for Polish, Czech, Slovak, Hungarian, Estonian,
Latvian, and Lithuanian.
405
charmap/SYMBOL.TXT
Normal file
@@ -0,0 +1,405 @@
|
||||
#=======================================================================
|
||||
# File name: SYMBOL.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Symbol
|
||||
# character set to Unicode 4.0 and later.
|
||||
#
|
||||
# Copyright: (c) 1994-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-05 Change mappings for 0xBD, 0xE0. Update
|
||||
# header comments. Matches internal xml <c1.2>
|
||||
# and Text Encoding Converter 2.0.
|
||||
# b4,c1 2002-Dec-19 Update mappings for encoded glyph fragments
|
||||
# 0xBE, 0xE6-EF, 0xF4, 0xF6-FE to use new
|
||||
# Unicode 3.2 characters instead of sequences
|
||||
# involving corporate-use characters. Update
|
||||
# URLs, notes. Matches internal utom<b4>.
|
||||
# b03 1999-Sep-22 Update contact e-mail address. Matches
|
||||
# internal utom<b3>, ufrm<b3>, and Text
|
||||
# Encoding Converter version 1.5.
|
||||
# b02 1998-Aug-18 Encoding changed for Mac OS 8.5; add new
|
||||
# mapping from 0xA0 to EURO SIGN. Matches
|
||||
# internal utom<b3>, ufrm<b3>.
|
||||
# n05 1998-Feb-05 Update to match internal utom<n5>, ufrm<n15>
|
||||
# and Text Encoding Converter version 1.3:
|
||||
# Use standard Unicodes plus transcoding hints
|
||||
# instead of single corporate characters, also
|
||||
# change mappings for 0xE1 & 0xF1 from U+2329
|
||||
# & U+232A to their canonical decompositions;
|
||||
# see details below. Also update header
|
||||
# comments to new format.
|
||||
# n03 1995-Apr-15 First version (after fixing some typos).
|
||||
# Matches internal ufrm<n4>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Symbol code (in hex as 0xNN)
|
||||
# Column #2 is the corresponding Unicode or Unicode sequence
|
||||
# (in hex as 0xNNNN or 0xNNNN+0xNNNN).
|
||||
# Column #3 is a comment containing the Unicode name.
|
||||
# In some cases an additional comment follows the Unicode name.
|
||||
#
|
||||
# The entries are in Mac OS Symbol code order.
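The format described above can be read with a few lines of code. The following is a minimal sketch in Python, not part of the Apple data: the function name `parse_mapping` and the two-line sample are illustrative assumptions. It splits on any whitespace rather than strictly on tabs, so it also tolerates copies of the table in which tabs were converted to spaces. It handles only the plain `0xNNNN` and `0xNNNN+0xNNNN` forms described here; any extra notation that other mapping files may use is out of scope.

```python
def parse_mapping(text):
    """Return {mac_code: (unicode_code_point, ...)} for Apple-style mapping text."""
    table = {}
    for line in text.splitlines():
        # '#' begins a comment which continues to the end of the line.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line or comment-only line
        mac_code = int(fields[0], 16)
        # Column 2 is a single code point or a '+'-joined sequence.
        table[mac_code] = tuple(int(part, 16) for part in fields[1].split("+"))
    return table


# Two rows copied from the table below, plus a comment-only line.
SAMPLE = """\
# comment-only line
0x22\t0x2200\t# FOR ALL
0xE2\t0x00AE+0xF87F\t# REGISTERED SIGN, alternate: sans serif
"""

table = parse_mapping(SAMPLE)
print({hex(k): [hex(cp) for cp in v] for k, v in table.items()})
```

Storing each target as a tuple keeps single code points and sequences uniform, which simplifies later lookups.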
|
||||
#
|
||||
# Some of these mappings require the use of corporate characters.
|
||||
# See the file "CORPCHAR.TXT" and notes below.
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Symbol character set uses the standard control characters
|
||||
# at 0x00-0x1F and 0x7F.
|
||||
#
|
||||
# Notes on Mac OS Symbol:
|
||||
# -----------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported directly in programming
|
||||
# interfaces for QuickDraw Text, the Script Manager, and related
|
||||
# Text Utilities. For other purposes it is supported via transcoding
|
||||
# to and from Unicode.
|
||||
#
|
||||
# The Mac OS Symbol encoding shares the script code smRoman
|
||||
# (0) with the Mac OS Roman encoding. To determine if the Symbol
|
||||
# encoding is being used, you must check if the font name is
|
||||
# "Symbol".
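The font-name check described above could be sketched as follows. This is only an illustration in Python: the function `encoding_for_roman_script` and the returned labels `"symbol"` and `"roman"` are hypothetical names, not Mac OS API identifiers, and any other font-specific encodings that share smRoman are ignored here.

```python
SM_ROMAN = 0  # script code smRoman, as stated in the note above


def encoding_for_roman_script(script_code, font_name):
    """Pick a (hypothetical) encoding label for text tagged with smRoman."""
    if script_code != SM_ROMAN:
        raise ValueError("this sketch only covers script code smRoman (0)")
    # The script code alone cannot distinguish Symbol from Roman,
    # so the font name is the deciding input.
    return "symbol" if font_name == "Symbol" else "roman"


print(encoding_for_roman_script(SM_ROMAN, "Symbol"))
print(encoding_for_roman_script(SM_ROMAN, "Geneva"))
```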
|
||||
#
|
||||
# Before Mac OS 8.5, code point 0xA0 was unused. In Mac OS 8.5
|
||||
# and later versions, code point 0xA0 is EURO SIGN and maps to
|
||||
# U+20AC (the Symbol font is updated for Mac OS 8.5 to reflect
|
||||
# this).
|
||||
#
|
||||
# The layout of the Mac OS Symbol character set is identical to
|
||||
# the layout of the Adobe Symbol encoding vector, with the
|
||||
# addition of the Apple logo character at 0xF0.
|
||||
#
|
||||
# This character set encodes a number of glyph fragments. Some are
|
||||
# used as extenders: 0x60 is used to extend radical signs, 0xBD and
|
||||
# 0xBE are used to extend vertical and horizontal arrows, etc. In
|
||||
# addition, there are top, bottom, and center sections for
|
||||
# parentheses, brackets, integral signs, and other signs that may
|
||||
# extend vertically for 2 or more lines of normal text. As of
|
||||
# Unicode 3.2, most of these are now encoded in Unicode; a few are
|
||||
# not, so these are mapped using corporate-zone Unicode characters
|
||||
# (see below).
|
||||
#
|
||||
# In addition, Symbol separately encodes both serif and sans-serif
|
||||
# forms for copyright, trademark, and registered signs. Unicode
|
||||
# encodes only the abstract characters, so one set of these (the
|
||||
# sans-serif forms) is also mapped using corporate-zone Unicode
|
||||
# characters (see below).
|
||||
#
|
||||
# The following code points are unused, and are not shown here:
|
||||
# 0x80-0x9F, 0xFF.
|
||||
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# The goals in the mappings provided here are:
|
||||
# - Ensure roundtrip mapping from every character in the Mac OS
|
||||
# Symbol character set to Unicode and back
|
||||
# - Use standard Unicode characters as much as possible, to
|
||||
# maximize interchangeability of the resulting Unicode text.
|
||||
# Whenever possible, avoid having content carried by private-use
|
||||
# characters.
|
||||
#
|
||||
# Some of the characters in the Mac OS Symbol character set do not
|
||||
# correspond to distinct, single Unicode characters. To map these
|
||||
# and satisfy both goals above, we employ various strategies.
|
||||
#
|
||||
# a) If possible, use private use characters in combination with
|
||||
# standard Unicode characters to mark variants of the standard
|
||||
# Unicode character.
|
||||
#
|
||||
# Apple has defined a block of 32 corporate characters as "transcoding
|
||||
# hints." These are used in combination with standard Unicode
|
||||
# characters to force them to be treated in a special way for mapping
|
||||
# to other encodings; they have no other effect. Sixteen of these
|
||||
# transcoding hints are "grouping hints" - they indicate that the next
|
||||
# 2-4 Unicode characters should be treated as a single entity for
|
||||
# transcoding. The other sixteen transcoding hints are "variant tags"
|
||||
# - they are like combining characters, and can follow a standard
|
||||
# Unicode (or a sequence consisting of a base character and other
|
||||
# combining characters) to cause it to be treated in a special way for
|
||||
# transcoding. These always terminate a combining-character sequence.
|
||||
#
|
||||
# The transcoding hint used in this mapping table is the
|
||||
# variant tag 0xF87F. Since this is combined with standard Unicode
|
||||
# characters, some characters in the Mac OS Symbol character set map
|
||||
# to a sequence of two Unicodes instead of a single Unicode character.
|
||||
#
|
||||
# For example, the Mac OS Symbol character at 0xE2 is an alternate,
|
||||
# sans-serif form of the REGISTERED SIGN (the standard mapping is for
|
||||
# the abstract character at 0xD2, which here has a serif form). So 0xE2
|
||||
# is mapped to 0x00AE (REGISTERED SIGN) + 0xF87F (a variant tag).
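The effect of the variant tag on roundtripping can be illustrated with a small sketch. It is written in Python under stated assumptions: `SYMBOL_SUBSET` holds only the two REGISTERED SIGN entries from this table, and `decode` and `encode` are illustrative helpers, not Apple's Text Encoding Converter. The encoder tries the longest sequence first so that a base character followed by 0xF87F is matched as one unit.

```python
# Two entries copied from the table: serif and sans-serif REGISTERED SIGN.
SYMBOL_SUBSET = {
    0xD2: "\u00AE",        # REGISTERED SIGN (serif)
    0xE2: "\u00AE\uF87F",  # REGISTERED SIGN + variant tag (sans serif)
}


def decode(data, table):
    """Map each Mac OS Symbol byte to its Unicode string."""
    return "".join(table[byte] for byte in data)


def encode(text, table):
    """Map Unicode text back to bytes, preferring the longest match."""
    reverse = {seq: byte for byte, seq in table.items()}
    longest = max(len(seq) for seq in reverse)
    out = bytearray()
    pos = 0
    while pos < len(text):
        for size in range(longest, 0, -1):
            chunk = text[pos:pos + size]
            if chunk in reverse:
                out.append(reverse[chunk])
                pos += len(chunk)
                break
        else:
            raise ValueError(f"no mapping for character at index {pos}")
    return bytes(out)


original = bytes([0xD2, 0xE2])
text = decode(original, SYMBOL_SUBSET)
roundtrip = encode(text, SYMBOL_SUBSET)
print(roundtrip == original)
```

Without the tag, both bytes would decode to the same U+00AE and the distinction between 0xD2 and 0xE2 would be lost on the way back.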
|
||||
#
|
||||
# b) Otherwise, use private use characters by themselves to map
|
||||
# Mac OS Symbol characters which have no relationship to any standard
|
||||
# Unicode character.
|
||||
#
|
||||
# The following additional corporate zone Unicode characters are
|
||||
# used for this purpose here:
|
||||
#
|
||||
# 0xF8E5 radical extender
|
||||
# 0xF8FF Apple logo
|
||||
#
|
||||
# NOTE: The graphic image associated with the Apple logo character
|
||||
# is not authorized for use without permission of Apple, and
|
||||
# unauthorized use might constitute trademark infringement.
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
# Changes from version c01 to version c02:
|
||||
#
|
||||
# - Update mapping for 0xBD from 0xF8E6 to 0x23D0 (use new Unicode
|
||||
# 4.0 char)
|
||||
# - Correct mapping for 0xE0 from 0x22C4 to 0x25CA
|
||||
#
|
||||
# Changes from version b02 to version b03/c01:
|
||||
#
|
||||
# - Update mappings for encoded glyph fragments 0xBE, 0xE6-EF, 0xF4,
|
||||
# 0xF6-FE to use new Unicode 3.2 characters instead of using either
|
||||
# single corporate-use characters (e.g. 0xBE was mapped to 0xF8E7) or
|
||||
# sequences combining a standard Unicode character with a transcoding
|
||||
# hint (e.g. 0xE6 was mapped to 0x0028+0xF870).
|
||||
#
|
||||
# Changes from version n05 to version b02:
|
||||
#
|
||||
# - Encoding changed for Mac OS 8.5; 0xA0 now maps to 0x20AC, EURO
|
||||
# SIGN. 0xA0 was unmapped in earlier versions.
|
||||
#
|
||||
# Changes from version n03 to version n05:
|
||||
#
|
||||
# - Change strict mapping for 0xE1 & 0xF1 from U+2329 & U+232A
|
||||
# to their canonical decompositions, U+3008 & U+3009.
|
||||
#
|
||||
# - Change mapping for the following to use standard Unicode +
|
||||
# transcoding hint, instead of single corporate-zone
|
||||
# character: 0xE2-0xE4, 0xE6-0xEE, 0xF4, 0xF6-0xFE.
|
||||
#
|
||||
##################
|
||||
|
||||
0x20 0x0020 # SPACE
|
||||
0x21 0x0021 # EXCLAMATION MARK
|
||||
0x22 0x2200 # FOR ALL
|
||||
0x23 0x0023 # NUMBER SIGN
|
||||
0x24 0x2203 # THERE EXISTS
|
||||
0x25 0x0025 # PERCENT SIGN
|
||||
0x26 0x0026 # AMPERSAND
|
||||
0x27 0x220D # SMALL CONTAINS AS MEMBER
|
||||
0x28 0x0028 # LEFT PARENTHESIS
|
||||
0x29 0x0029 # RIGHT PARENTHESIS
|
||||
0x2A 0x2217 # ASTERISK OPERATOR
|
||||
0x2B 0x002B # PLUS SIGN
|
||||
0x2C 0x002C # COMMA
|
||||
0x2D 0x2212 # MINUS SIGN
|
||||
0x2E 0x002E # FULL STOP
|
||||
0x2F 0x002F # SOLIDUS
|
||||
0x30 0x0030 # DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO
|
||||
0x33 0x0033 # DIGIT THREE
|
||||
0x34 0x0034 # DIGIT FOUR
|
||||
0x35 0x0035 # DIGIT FIVE
|
||||
0x36 0x0036 # DIGIT SIX
|
||||
0x37 0x0037 # DIGIT SEVEN
|
||||
0x38 0x0038 # DIGIT EIGHT
|
||||
0x39 0x0039 # DIGIT NINE
|
||||
0x3A 0x003A # COLON
|
||||
0x3B 0x003B # SEMICOLON
|
||||
0x3C 0x003C # LESS-THAN SIGN
|
||||
0x3D 0x003D # EQUALS SIGN
|
||||
0x3E 0x003E # GREATER-THAN SIGN
|
||||
0x3F 0x003F # QUESTION MARK
|
||||
0x40 0x2245 # APPROXIMATELY EQUAL TO
|
||||
0x41 0x0391 # GREEK CAPITAL LETTER ALPHA
|
||||
0x42 0x0392 # GREEK CAPITAL LETTER BETA
|
||||
0x43 0x03A7 # GREEK CAPITAL LETTER CHI
|
||||
0x44 0x0394 # GREEK CAPITAL LETTER DELTA
|
||||
0x45 0x0395 # GREEK CAPITAL LETTER EPSILON
|
||||
0x46 0x03A6 # GREEK CAPITAL LETTER PHI
|
||||
0x47 0x0393 # GREEK CAPITAL LETTER GAMMA
|
||||
0x48 0x0397 # GREEK CAPITAL LETTER ETA
|
||||
0x49 0x0399 # GREEK CAPITAL LETTER IOTA
|
||||
0x4A 0x03D1 # GREEK THETA SYMBOL
|
||||
0x4B 0x039A # GREEK CAPITAL LETTER KAPPA
|
||||
0x4C 0x039B # GREEK CAPITAL LETTER LAMDA
|
||||
0x4D 0x039C # GREEK CAPITAL LETTER MU
|
||||
0x4E 0x039D # GREEK CAPITAL LETTER NU
|
||||
0x4F 0x039F # GREEK CAPITAL LETTER OMICRON
|
||||
0x50 0x03A0 # GREEK CAPITAL LETTER PI
|
||||
0x51 0x0398 # GREEK CAPITAL LETTER THETA
|
||||
0x52 0x03A1 # GREEK CAPITAL LETTER RHO
|
||||
0x53 0x03A3 # GREEK CAPITAL LETTER SIGMA
|
||||
0x54 0x03A4 # GREEK CAPITAL LETTER TAU
|
||||
0x55 0x03A5 # GREEK CAPITAL LETTER UPSILON
|
||||
0x56 0x03C2 # GREEK SMALL LETTER FINAL SIGMA
|
||||
0x57 0x03A9 # GREEK CAPITAL LETTER OMEGA
|
||||
0x58 0x039E # GREEK CAPITAL LETTER XI
|
||||
0x59 0x03A8 # GREEK CAPITAL LETTER PSI
|
||||
0x5A 0x0396 # GREEK CAPITAL LETTER ZETA
|
||||
0x5B 0x005B # LEFT SQUARE BRACKET
|
||||
0x5C 0x2234 # THEREFORE
|
||||
0x5D 0x005D # RIGHT SQUARE BRACKET
|
||||
0x5E 0x22A5 # UP TACK
|
||||
0x5F 0x005F # LOW LINE
|
||||
0x60 0xF8E5 # radical extender # corporate char
|
||||
0x61 0x03B1 # GREEK SMALL LETTER ALPHA
|
||||
0x62 0x03B2 # GREEK SMALL LETTER BETA
|
||||
0x63 0x03C7 # GREEK SMALL LETTER CHI
|
||||
0x64 0x03B4 # GREEK SMALL LETTER DELTA
|
||||
0x65 0x03B5 # GREEK SMALL LETTER EPSILON
|
||||
0x66 0x03C6 # GREEK SMALL LETTER PHI
|
||||
0x67 0x03B3 # GREEK SMALL LETTER GAMMA
|
||||
0x68 0x03B7 # GREEK SMALL LETTER ETA
|
||||
0x69 0x03B9 # GREEK SMALL LETTER IOTA
|
||||
0x6A 0x03D5 # GREEK PHI SYMBOL
|
||||
0x6B 0x03BA # GREEK SMALL LETTER KAPPA
|
||||
0x6C 0x03BB # GREEK SMALL LETTER LAMDA
|
||||
0x6D 0x03BC # GREEK SMALL LETTER MU
|
||||
0x6E 0x03BD # GREEK SMALL LETTER NU
|
||||
0x6F 0x03BF # GREEK SMALL LETTER OMICRON
|
||||
0x70 0x03C0 # GREEK SMALL LETTER PI
|
||||
0x71 0x03B8 # GREEK SMALL LETTER THETA
|
||||
0x72 0x03C1 # GREEK SMALL LETTER RHO
|
||||
0x73 0x03C3 # GREEK SMALL LETTER SIGMA
|
||||
0x74 0x03C4 # GREEK SMALL LETTER TAU
|
||||
0x75 0x03C5 # GREEK SMALL LETTER UPSILON
|
||||
0x76 0x03D6 # GREEK PI SYMBOL
|
||||
0x77 0x03C9 # GREEK SMALL LETTER OMEGA
|
||||
0x78 0x03BE # GREEK SMALL LETTER XI
|
||||
0x79 0x03C8 # GREEK SMALL LETTER PSI
|
||||
0x7A 0x03B6 # GREEK SMALL LETTER ZETA
|
||||
0x7B 0x007B # LEFT CURLY BRACKET
|
||||
0x7C 0x007C # VERTICAL LINE
|
||||
0x7D 0x007D # RIGHT CURLY BRACKET
|
||||
0x7E 0x223C # TILDE OPERATOR
|
||||
#
|
||||
0xA0 0x20AC # EURO SIGN
|
||||
0xA1 0x03D2 # GREEK UPSILON WITH HOOK SYMBOL
|
||||
0xA2 0x2032 # PRIME # minute
|
||||
0xA3 0x2264 # LESS-THAN OR EQUAL TO
|
||||
0xA4 0x2044 # FRACTION SLASH
|
||||
0xA5 0x221E # INFINITY
|
||||
0xA6 0x0192 # LATIN SMALL LETTER F WITH HOOK
|
||||
0xA7 0x2663 # BLACK CLUB SUIT
|
||||
0xA8 0x2666 # BLACK DIAMOND SUIT
|
||||
0xA9 0x2665 # BLACK HEART SUIT
|
||||
0xAA 0x2660 # BLACK SPADE SUIT
|
||||
0xAB 0x2194 # LEFT RIGHT ARROW
|
||||
0xAC 0x2190 # LEFTWARDS ARROW
|
||||
0xAD 0x2191 # UPWARDS ARROW
|
||||
0xAE 0x2192 # RIGHTWARDS ARROW
|
||||
0xAF 0x2193 # DOWNWARDS ARROW
|
||||
0xB0 0x00B0 # DEGREE SIGN
|
||||
0xB1 0x00B1 # PLUS-MINUS SIGN
|
||||
0xB2 0x2033 # DOUBLE PRIME # second
|
||||
0xB3 0x2265 # GREATER-THAN OR EQUAL TO
|
||||
0xB4 0x00D7 # MULTIPLICATION SIGN
|
||||
0xB5 0x221D # PROPORTIONAL TO
|
||||
0xB6 0x2202 # PARTIAL DIFFERENTIAL
|
||||
0xB7 0x2022 # BULLET
|
||||
0xB8 0x00F7 # DIVISION SIGN
|
||||
0xB9 0x2260 # NOT EQUAL TO
|
||||
0xBA 0x2261 # IDENTICAL TO
|
||||
0xBB 0x2248 # ALMOST EQUAL TO
|
||||
0xBC 0x2026 # HORIZONTAL ELLIPSIS
|
||||
0xBD 0x23D0 # VERTICAL LINE EXTENSION (for arrows) # for Unicode 4.0 and later
|
||||
0xBE 0x23AF # HORIZONTAL LINE EXTENSION (for arrows) # for Unicode 3.2 and later
|
||||
0xBF 0x21B5 # DOWNWARDS ARROW WITH CORNER LEFTWARDS
|
||||
0xC0 0x2135 # ALEF SYMBOL
|
||||
0xC1 0x2111 # BLACK-LETTER CAPITAL I
|
||||
0xC2 0x211C # BLACK-LETTER CAPITAL R
|
||||
0xC3 0x2118 # SCRIPT CAPITAL P
|
||||
0xC4 0x2297 # CIRCLED TIMES
|
||||
0xC5 0x2295 # CIRCLED PLUS
|
||||
0xC6 0x2205 # EMPTY SET
|
||||
0xC7 0x2229 # INTERSECTION
|
||||
0xC8 0x222A # UNION
|
||||
0xC9 0x2283 # SUPERSET OF
|
||||
0xCA 0x2287 # SUPERSET OF OR EQUAL TO
|
||||
0xCB 0x2284 # NOT A SUBSET OF
|
||||
0xCC 0x2282 # SUBSET OF
|
||||
0xCD 0x2286 # SUBSET OF OR EQUAL TO
|
||||
0xCE 0x2208 # ELEMENT OF
|
||||
0xCF 0x2209 # NOT AN ELEMENT OF
|
||||
0xD0 0x2220 # ANGLE
|
||||
0xD1 0x2207 # NABLA
|
||||
0xD2 0x00AE # REGISTERED SIGN # serif
|
||||
0xD3 0x00A9 # COPYRIGHT SIGN # serif
|
||||
0xD4 0x2122 # TRADE MARK SIGN # serif
|
||||
0xD5 0x220F # N-ARY PRODUCT
|
||||
0xD6 0x221A # SQUARE ROOT
|
||||
0xD7 0x22C5 # DOT OPERATOR
|
||||
0xD8 0x00AC # NOT SIGN
|
||||
0xD9 0x2227 # LOGICAL AND
|
||||
0xDA 0x2228 # LOGICAL OR
|
||||
0xDB 0x21D4 # LEFT RIGHT DOUBLE ARROW
|
||||
0xDC 0x21D0 # LEFTWARDS DOUBLE ARROW
|
||||
0xDD 0x21D1 # UPWARDS DOUBLE ARROW
|
||||
0xDE 0x21D2 # RIGHTWARDS DOUBLE ARROW
|
||||
0xDF 0x21D3 # DOWNWARDS DOUBLE ARROW
|
||||
0xE0 0x25CA # LOZENGE # previously mapped to 0x22C4 DIAMOND OPERATOR
|
||||
0xE1 0x3008 # LEFT ANGLE BRACKET
|
||||
0xE2 0x00AE+0xF87F # REGISTERED SIGN, alternate: sans serif
|
||||
0xE3 0x00A9+0xF87F # COPYRIGHT SIGN, alternate: sans serif
|
||||
0xE4 0x2122+0xF87F # TRADE MARK SIGN, alternate: sans serif
|
||||
0xE5 0x2211 # N-ARY SUMMATION
|
||||
0xE6 0x239B # LEFT PARENTHESIS UPPER HOOK # for Unicode 3.2 and later
|
||||
0xE7 0x239C # LEFT PARENTHESIS EXTENSION # for Unicode 3.2 and later
|
||||
0xE8 0x239D # LEFT PARENTHESIS LOWER HOOK # for Unicode 3.2 and later
|
||||
0xE9 0x23A1 # LEFT SQUARE BRACKET UPPER CORNER # for Unicode 3.2 and later
|
||||
0xEA 0x23A2 # LEFT SQUARE BRACKET EXTENSION # for Unicode 3.2 and later
|
||||
0xEB 0x23A3 # LEFT SQUARE BRACKET LOWER CORNER # for Unicode 3.2 and later
|
||||
0xEC 0x23A7 # LEFT CURLY BRACKET UPPER HOOK # for Unicode 3.2 and later
|
||||
0xED 0x23A8 # LEFT CURLY BRACKET MIDDLE PIECE # for Unicode 3.2 and later
|
||||
0xEE 0x23A9 # LEFT CURLY BRACKET LOWER HOOK # for Unicode 3.2 and later
|
||||
0xEF 0x23AA # CURLY BRACKET EXTENSION # for Unicode 3.2 and later
|
||||
0xF0 0xF8FF # Apple logo
|
||||
0xF1 0x3009 # RIGHT ANGLE BRACKET
|
||||
0xF2 0x222B # INTEGRAL
|
||||
0xF3 0x2320 # TOP HALF INTEGRAL
|
||||
0xF4 0x23AE # INTEGRAL EXTENSION # for Unicode 3.2 and later
|
||||
0xF5 0x2321 # BOTTOM HALF INTEGRAL
|
||||
0xF6 0x239E # RIGHT PARENTHESIS UPPER HOOK # for Unicode 3.2 and later
|
||||
0xF7 0x239F # RIGHT PARENTHESIS EXTENSION # for Unicode 3.2 and later
|
||||
0xF8 0x23A0 # RIGHT PARENTHESIS LOWER HOOK # for Unicode 3.2 and later
|
||||
0xF9 0x23A4 # RIGHT SQUARE BRACKET UPPER CORNER # for Unicode 3.2 and later
|
||||
0xFA 0x23A5 # RIGHT SQUARE BRACKET EXTENSION # for Unicode 3.2 and later
|
||||
0xFB 0x23A6 # RIGHT SQUARE BRACKET LOWER CORNER # for Unicode 3.2 and later
|
||||
0xFC 0x23AB # RIGHT CURLY BRACKET UPPER HOOK # for Unicode 3.2 and later
|
||||
0xFD 0x23AC # RIGHT CURLY BRACKET MIDDLE PIECE # for Unicode 3.2 and later
|
||||
0xFE 0x23AD # RIGHT CURLY BRACKET LOWER HOOK # for Unicode 3.2 and later
|
384
charmap/THAI.TXT
Normal file
@@ -0,0 +1,384 @@
|
||||
#=======================================================================
|
||||
# File name: THAI.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Thai
|
||||
# character set to Unicode 3.2 and later.
|
||||
#
|
||||
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-05 Update header comments. Matches internal xml
|
||||
# <c1.1> and Text Encoding Converter 2.0.
|
||||
# b3,c1 2002-Dec-19 Update mapping for 0xDB to use new Unicode
|
||||
# 3.2 WORD JOINER instead of ZWNBSP (BOM).
|
||||
# Update URLs. Matches internal utom<b3>.
|
||||
# b02 1999-Sep-22 Update contact e-mail address. Matches
|
||||
# internal utom<b1>, ufrm<b2>, and Text
|
||||
# Encoding Converter version 1.5.
|
||||
# n07 1998-Feb-05 Update to match internal utom<n5>, ufrm<n13>
|
||||
# and Text Encoding Converter version 1.3:
|
||||
# Use standard Unicodes plus transcoding hints
|
||||
# instead of single corporate characters; see
|
||||
# details below. Also update header comments
|
||||
# to new format.
|
||||
# n04 1995-Nov-17 First version (after fixing some typos).
|
||||
# Matches internal ufrm<n6>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Thai code (in hex as 0xNN)
|
||||
# Column #2 is the corresponding Unicode or Unicode sequence
|
||||
# (in hex as 0xNNNN or 0xNNNN+0xNNNN).
|
||||
# Column #3 is a comment containing the Unicode name.
|
||||
#
|
||||
# The entries are in Mac OS Thai code order.
|
||||
#
|
||||
# Some of these mappings require the use of corporate characters.
|
||||
# See the file "CORPCHAR.TXT" and notes below.
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Thai character set uses the standard control characters at
|
||||
# 0x00-0x1F and 0x7F.
|
||||
#
|
||||
# Notes on Mac OS Thai:
|
||||
# ---------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported via transcoding to and from
|
||||
# Unicode.
|
||||
#
|
||||
# Codes 0xA1-0xDA and 0xDF-0xFB are the character set from Thai
|
||||
# standard TIS 620-2533, except that the following changes are
|
||||
# made:
|
||||
# 0xEE is TRADE MARK SIGN (instead of THAI CHARACTER YAMAKKAN)
|
||||
# 0xFA is REGISTERED SIGN (instead of THAI CHARACTER ANGKHANKHU)
|
||||
# 0xFB is COPYRIGHT SIGN (instead of THAI CHARACTER KHOMUT)
|
||||
#
|
||||
# Codes 0x80-0x82, 0x8D-0x8E, 0x91, 0x9D-0x9E, and 0xDB-0xDE are
|
||||
# various additional punctuation marks (e.g. curly quotes,
|
||||
# ellipsis), no-break space, and two special characters "word join"
|
||||
# and "word break".
|
||||
#
|
||||
# Codes 0x83-0x8C, 0x8F, and 0x92-0x9C are for positional variants
|
||||
# of the upper vowels, tone marks, and other signs at 0xD1,
|
||||
# 0xD4-0xD7, and 0xE7-0xED. The positional variants would normally
|
||||
# be considered presentation forms only and not characters. In most
|
||||
# cases they are not typed directly; they are selected automatically
|
||||
# at display time by the WorldScript software. However, using the
|
||||
# Thai-DTP keyboard, the presentation forms can in fact be typed
|
||||
# directly using dead keys. Thus they must be treated as real
|
||||
# characters in the Mac OS Thai encoding. They are mapped using
|
||||
# variant tags; see below.
|
||||
#
|
||||
# Several code points are undefined and unused (they cannot be
|
||||
# typed using any of the Mac OS Thai keyboard layouts): 0x90, 0x9F,
|
||||
# 0xFC-0xFE. These are not shown in the table below.
|
||||
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# The goals in the Apple mappings provided here are:
|
||||
# - Ensure roundtrip mapping from every character in the Mac OS Thai
|
||||
# character set to Unicode and back
|
||||
# - Use standard Unicode characters as much as possible, to maximize
|
||||
# interchangeability of the resulting Unicode text. Whenever possible,
|
||||
# avoid having content carried by private-use characters.
|
||||
#
|
||||
# To satisfy both goals, we use private use characters to mark variants
|
||||
# that are similar to a sequence of one or more standard Unicode
|
||||
# characters.
|
||||
#
|
||||
# Apple has defined a block of 32 corporate characters as "transcoding
|
||||
# hints." These are used in combination with standard Unicode characters
|
||||
# to force them to be treated in a special way for mapping to other
|
||||
# encodings; they have no other effect. Sixteen of these transcoding
|
||||
# hints are "grouping hints" - they indicate that the next 2-4 Unicode
|
||||
# characters should be treated as a single entity for transcoding. The
|
||||
# other sixteen transcoding hints are "variant tags" - they are like
|
||||
# combining characters, and can follow a standard Unicode (or a sequence
|
||||
# consisting of a base character and other combining characters) to
|
||||
# cause it to be treated in a special way for transcoding. These always
|
||||
# terminate a combining-character sequence.
|
||||
#
|
||||
# The transcoding hints used in this mapping table are three
|
||||
# variant tags in the range 0xF873-0xF875. Since these are combined with
|
||||
# standard Unicode characters, some characters in the Mac OS Thai
|
||||
# character set map to a sequence of two Unicodes instead of a single
|
||||
# Unicode character. For example, the Mac OS Thai character at 0x83 is a
|
||||
# low-left positional variant of THAI CHARACTER MAI EK (the standard
|
||||
# mapping is for the abstract character at 0xE8). So 0x83 is mapped to
|
||||
# 0x0E48 (THAI CHARACTER MAI EK) + 0xF875 (a variant tag).
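One consequence of this scheme is that a consumer that does not care about positional variants can drop the variant tags and recover the abstract character. The sketch below, in Python, assumes only what the table shows: the tags 0xF873, 0xF874, and 0xF875, and three MAI EK variants. The names `THAI_VARIANT_TAGS`, `THAI_SUBSET`, and `strip_variant_tags` are illustrative. Stripping is lossy, so the result no longer roundtrips to the original Mac OS Thai bytes.

```python
# Variant tags that appear in this mapping table.
THAI_VARIANT_TAGS = {0xF873, 0xF874, 0xF875}

# Positional variants of THAI CHARACTER MAI EK, copied from the table.
THAI_SUBSET = {
    0x83: "\u0E48\uF875",  # low left position
    0x88: "\u0E48\uF873",  # low position
    0x98: "\u0E48\uF874",  # left position
}


def strip_variant_tags(text):
    """Remove variant tags, leaving only standard Unicode characters."""
    return "".join(ch for ch in text if ord(ch) not in THAI_VARIANT_TAGS)


for mac_code, mapped in sorted(THAI_SUBSET.items()):
    plain = strip_variant_tags(mapped)
    print(hex(mac_code), [hex(ord(ch)) for ch in plain])
```

All three variants reduce to U+0E48, the standard mapping for the abstract character.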
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
# Changes from version b02 to version b03/c01:
|
||||
#
|
||||
# - Update mapping for 0xDB to use new Unicode 3.2 character U+2060
|
||||
# WORD JOINER instead of U+FEFF ZERO WIDTH NO-BREAK SPACE (BOM)
|
||||
#
|
||||
# Changes from version n04 to version n07:
|
||||
#
|
||||
# - Changed mappings of the positional variants to use standard
|
||||
# Unicodes + transcoding hint, instead of using single corporate
|
||||
# zone characters. This affected the mappings for the following:
|
||||
# 0x83-0x8C, 0x8F, 0x92-0x9C
|
||||
#
|
||||
# - Just comment out unused code points in the table, instead
|
||||
# of mapping them to U+FFFD.
|
||||
#
|
||||
##################
|
||||
|
||||
0x20 0x0020 # SPACE
|
||||
0x21 0x0021 # EXCLAMATION MARK
|
||||
0x22 0x0022 # QUOTATION MARK
|
||||
0x23 0x0023 # NUMBER SIGN
|
||||
0x24 0x0024 # DOLLAR SIGN
|
||||
0x25 0x0025 # PERCENT SIGN
|
||||
0x26 0x0026 # AMPERSAND
|
||||
0x27 0x0027 # APOSTROPHE
|
||||
0x28 0x0028 # LEFT PARENTHESIS
|
||||
0x29 0x0029 # RIGHT PARENTHESIS
|
||||
0x2A 0x002A # ASTERISK
|
||||
0x2B 0x002B # PLUS SIGN
|
||||
0x2C 0x002C # COMMA
|
||||
0x2D 0x002D # HYPHEN-MINUS
|
||||
0x2E 0x002E # FULL STOP
|
||||
0x2F 0x002F # SOLIDUS
|
||||
0x30 0x0030 # DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO
|
||||
0x33 0x0033 # DIGIT THREE
|
||||
0x34 0x0034 # DIGIT FOUR
|
||||
0x35 0x0035 # DIGIT FIVE
|
||||
0x36 0x0036 # DIGIT SIX
|
||||
0x37 0x0037 # DIGIT SEVEN
|
||||
0x38 0x0038 # DIGIT EIGHT
|
||||
0x39 0x0039 # DIGIT NINE
|
||||
0x3A 0x003A # COLON
|
||||
0x3B 0x003B # SEMICOLON
|
||||
0x3C 0x003C # LESS-THAN SIGN
|
||||
0x3D 0x003D # EQUALS SIGN
|
||||
0x3E 0x003E # GREATER-THAN SIGN
|
||||
0x3F 0x003F # QUESTION MARK
|
||||
0x40 0x0040 # COMMERCIAL AT
|
||||
0x41 0x0041 # LATIN CAPITAL LETTER A
|
||||
0x42 0x0042 # LATIN CAPITAL LETTER B
|
||||
0x43 0x0043 # LATIN CAPITAL LETTER C
|
||||
0x44 0x0044 # LATIN CAPITAL LETTER D
|
||||
0x45 0x0045 # LATIN CAPITAL LETTER E
|
||||
0x46 0x0046 # LATIN CAPITAL LETTER F
|
||||
0x47 0x0047 # LATIN CAPITAL LETTER G
|
||||
0x48 0x0048 # LATIN CAPITAL LETTER H
|
||||
0x49 0x0049 # LATIN CAPITAL LETTER I
|
||||
0x4A 0x004A # LATIN CAPITAL LETTER J
|
||||
0x4B 0x004B # LATIN CAPITAL LETTER K
|
||||
0x4C 0x004C # LATIN CAPITAL LETTER L
|
||||
0x4D 0x004D # LATIN CAPITAL LETTER M
|
||||
0x4E 0x004E # LATIN CAPITAL LETTER N
|
||||
0x4F 0x004F # LATIN CAPITAL LETTER O
|
||||
0x50 0x0050 # LATIN CAPITAL LETTER P
|
||||
0x51 0x0051 # LATIN CAPITAL LETTER Q
|
||||
0x52 0x0052 # LATIN CAPITAL LETTER R
|
||||
0x53 0x0053 # LATIN CAPITAL LETTER S
|
||||
0x54 0x0054 # LATIN CAPITAL LETTER T
|
||||
0x55 0x0055 # LATIN CAPITAL LETTER U
|
||||
0x56 0x0056 # LATIN CAPITAL LETTER V
|
||||
0x57 0x0057 # LATIN CAPITAL LETTER W
|
||||
0x58 0x0058 # LATIN CAPITAL LETTER X
|
||||
0x59 0x0059 # LATIN CAPITAL LETTER Y
|
||||
0x5A 0x005A # LATIN CAPITAL LETTER Z
|
||||
0x5B 0x005B # LEFT SQUARE BRACKET
|
||||
0x5C 0x005C # REVERSE SOLIDUS
|
||||
0x5D 0x005D # RIGHT SQUARE BRACKET
|
||||
0x5E 0x005E # CIRCUMFLEX ACCENT
|
||||
0x5F 0x005F # LOW LINE
|
||||
0x60 0x0060 # GRAVE ACCENT
|
||||
0x61 0x0061 # LATIN SMALL LETTER A
|
||||
0x62 0x0062 # LATIN SMALL LETTER B
|
||||
0x63 0x0063 # LATIN SMALL LETTER C
|
||||
0x64 0x0064 # LATIN SMALL LETTER D
|
||||
0x65 0x0065 # LATIN SMALL LETTER E
|
||||
0x66 0x0066 # LATIN SMALL LETTER F
|
||||
0x67 0x0067 # LATIN SMALL LETTER G
|
||||
0x68 0x0068 # LATIN SMALL LETTER H
|
||||
0x69 0x0069 # LATIN SMALL LETTER I
|
||||
0x6A 0x006A # LATIN SMALL LETTER J
|
||||
0x6B 0x006B # LATIN SMALL LETTER K
|
||||
0x6C 0x006C # LATIN SMALL LETTER L
|
||||
0x6D 0x006D # LATIN SMALL LETTER M
|
||||
0x6E 0x006E # LATIN SMALL LETTER N
|
||||
0x6F 0x006F # LATIN SMALL LETTER O
|
||||
0x70 0x0070 # LATIN SMALL LETTER P
|
||||
0x71 0x0071 # LATIN SMALL LETTER Q
|
||||
0x72 0x0072 # LATIN SMALL LETTER R
|
||||
0x73 0x0073 # LATIN SMALL LETTER S
|
||||
0x74 0x0074 # LATIN SMALL LETTER T
|
||||
0x75 0x0075 # LATIN SMALL LETTER U
|
||||
0x76 0x0076 # LATIN SMALL LETTER V
|
||||
0x77 0x0077 # LATIN SMALL LETTER W
|
||||
0x78 0x0078 # LATIN SMALL LETTER X
|
||||
0x79 0x0079 # LATIN SMALL LETTER Y
|
||||
0x7A 0x007A # LATIN SMALL LETTER Z
|
||||
0x7B 0x007B # LEFT CURLY BRACKET
|
||||
0x7C 0x007C # VERTICAL LINE
|
||||
0x7D 0x007D # RIGHT CURLY BRACKET
|
||||
0x7E 0x007E # TILDE
|
||||
#
|
||||
0x80 0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
|
||||
0x81 0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
|
||||
0x82 0x2026 # HORIZONTAL ELLIPSIS
|
||||
0x83 0x0E48+0xF875 # THAI CHARACTER MAI EK, low left position
|
||||
0x84 0x0E49+0xF875 # THAI CHARACTER MAI THO, low left position
|
||||
0x85 0x0E4A+0xF875 # THAI CHARACTER MAI TRI, low left position
|
||||
0x86 0x0E4B+0xF875 # THAI CHARACTER MAI CHATTAWA, low left position
|
||||
0x87 0x0E4C+0xF875 # THAI CHARACTER THANTHAKHAT, low left position
|
||||
0x88 0x0E48+0xF873 # THAI CHARACTER MAI EK, low position
|
||||
0x89 0x0E49+0xF873 # THAI CHARACTER MAI THO, low position
|
||||
0x8A 0x0E4A+0xF873 # THAI CHARACTER MAI TRI, low position
|
||||
0x8B 0x0E4B+0xF873 # THAI CHARACTER MAI CHATTAWA, low position
|
||||
0x8C 0x0E4C+0xF873 # THAI CHARACTER THANTHAKHAT, low position
|
||||
0x8D 0x201C # LEFT DOUBLE QUOTATION MARK
|
||||
0x8E 0x201D # RIGHT DOUBLE QUOTATION MARK
|
||||
0x8F 0x0E4D+0xF874 # THAI CHARACTER NIKHAHIT, left position
|
||||
#
|
||||
0x91 0x2022 # BULLET
|
||||
0x92 0x0E31+0xF874 # THAI CHARACTER MAI HAN-AKAT, left position
|
||||
0x93 0x0E47+0xF874 # THAI CHARACTER MAITAIKHU, left position
|
||||
0x94 0x0E34+0xF874 # THAI CHARACTER SARA I, left position
|
||||
0x95 0x0E35+0xF874 # THAI CHARACTER SARA II, left position
|
||||
0x96 0x0E36+0xF874 # THAI CHARACTER SARA UE, left position
|
||||
0x97 0x0E37+0xF874 # THAI CHARACTER SARA UEE, left position
|
||||
0x98 0x0E48+0xF874 # THAI CHARACTER MAI EK, left position
|
||||
0x99 0x0E49+0xF874 # THAI CHARACTER MAI THO, left position
|
||||
0x9A 0x0E4A+0xF874 # THAI CHARACTER MAI TRI, left position
|
||||
0x9B 0x0E4B+0xF874 # THAI CHARACTER MAI CHATTAWA, left position
|
||||
0x9C 0x0E4C+0xF874 # THAI CHARACTER THANTHAKHAT, left position
|
||||
0x9D 0x2018 # LEFT SINGLE QUOTATION MARK
|
||||
0x9E 0x2019 # RIGHT SINGLE QUOTATION MARK
|
||||
#
|
||||
0xA0 0x00A0 # NO-BREAK SPACE
|
||||
0xA1 0x0E01 # THAI CHARACTER KO KAI
|
||||
0xA2 0x0E02 # THAI CHARACTER KHO KHAI
|
||||
0xA3 0x0E03 # THAI CHARACTER KHO KHUAT
|
||||
0xA4 0x0E04 # THAI CHARACTER KHO KHWAI
|
||||
0xA5 0x0E05 # THAI CHARACTER KHO KHON
|
||||
0xA6 0x0E06 # THAI CHARACTER KHO RAKHANG
|
||||
0xA7 0x0E07 # THAI CHARACTER NGO NGU
|
||||
0xA8 0x0E08 # THAI CHARACTER CHO CHAN
|
||||
0xA9 0x0E09 # THAI CHARACTER CHO CHING
|
||||
0xAA 0x0E0A # THAI CHARACTER CHO CHANG
|
||||
0xAB 0x0E0B # THAI CHARACTER SO SO
|
||||
0xAC 0x0E0C # THAI CHARACTER CHO CHOE
|
||||
0xAD 0x0E0D # THAI CHARACTER YO YING
|
||||
0xAE 0x0E0E # THAI CHARACTER DO CHADA
|
||||
0xAF 0x0E0F # THAI CHARACTER TO PATAK
|
||||
0xB0 0x0E10 # THAI CHARACTER THO THAN
|
||||
0xB1 0x0E11 # THAI CHARACTER THO NANGMONTHO
|
||||
0xB2 0x0E12 # THAI CHARACTER THO PHUTHAO
|
||||
0xB3 0x0E13 # THAI CHARACTER NO NEN
|
||||
0xB4 0x0E14 # THAI CHARACTER DO DEK
|
||||
0xB5 0x0E15 # THAI CHARACTER TO TAO
|
||||
0xB6 0x0E16 # THAI CHARACTER THO THUNG
|
||||
0xB7 0x0E17 # THAI CHARACTER THO THAHAN
|
||||
0xB8 0x0E18 # THAI CHARACTER THO THONG
|
||||
0xB9 0x0E19 # THAI CHARACTER NO NU
|
||||
0xBA 0x0E1A # THAI CHARACTER BO BAIMAI
|
||||
0xBB 0x0E1B # THAI CHARACTER PO PLA
|
||||
0xBC 0x0E1C # THAI CHARACTER PHO PHUNG
|
||||
0xBD 0x0E1D # THAI CHARACTER FO FA
|
||||
0xBE 0x0E1E # THAI CHARACTER PHO PHAN
|
||||
0xBF 0x0E1F # THAI CHARACTER FO FAN
|
||||
0xC0 0x0E20 # THAI CHARACTER PHO SAMPHAO
|
||||
0xC1 0x0E21 # THAI CHARACTER MO MA
|
||||
0xC2 0x0E22 # THAI CHARACTER YO YAK
|
||||
0xC3 0x0E23 # THAI CHARACTER RO RUA
|
||||
0xC4 0x0E24 # THAI CHARACTER RU
|
||||
0xC5 0x0E25 # THAI CHARACTER LO LING
|
||||
0xC6 0x0E26 # THAI CHARACTER LU
|
||||
0xC7 0x0E27 # THAI CHARACTER WO WAEN
|
||||
0xC8 0x0E28 # THAI CHARACTER SO SALA
|
||||
0xC9 0x0E29 # THAI CHARACTER SO RUSI
|
||||
0xCA 0x0E2A # THAI CHARACTER SO SUA
|
||||
0xCB 0x0E2B # THAI CHARACTER HO HIP
|
||||
0xCC 0x0E2C # THAI CHARACTER LO CHULA
|
||||
0xCD 0x0E2D # THAI CHARACTER O ANG
|
||||
0xCE 0x0E2E # THAI CHARACTER HO NOKHUK
|
||||
0xCF 0x0E2F # THAI CHARACTER PAIYANNOI
|
||||
0xD0 0x0E30 # THAI CHARACTER SARA A
|
||||
0xD1 0x0E31 # THAI CHARACTER MAI HAN-AKAT
|
||||
0xD2 0x0E32 # THAI CHARACTER SARA AA
|
||||
0xD3 0x0E33 # THAI CHARACTER SARA AM
|
||||
0xD4 0x0E34 # THAI CHARACTER SARA I
|
||||
0xD5 0x0E35 # THAI CHARACTER SARA II
|
||||
0xD6 0x0E36 # THAI CHARACTER SARA UE
|
||||
0xD7 0x0E37 # THAI CHARACTER SARA UEE
|
||||
0xD8 0x0E38 # THAI CHARACTER SARA U
|
||||
0xD9 0x0E39 # THAI CHARACTER SARA UU
|
||||
0xDA 0x0E3A # THAI CHARACTER PHINTHU
|
||||
0xDB 0x2060 # WORD JOINER # for Unicode 3.2 and later
|
||||
0xDC 0x200B # ZERO WIDTH SPACE
|
||||
0xDD 0x2013 # EN DASH
|
||||
0xDE 0x2014 # EM DASH
|
||||
0xDF 0x0E3F # THAI CURRENCY SYMBOL BAHT
|
||||
0xE0 0x0E40 # THAI CHARACTER SARA E
|
||||
0xE1 0x0E41 # THAI CHARACTER SARA AE
|
||||
0xE2 0x0E42 # THAI CHARACTER SARA O
|
||||
0xE3 0x0E43 # THAI CHARACTER SARA AI MAIMUAN
|
||||
0xE4 0x0E44 # THAI CHARACTER SARA AI MAIMALAI
|
||||
0xE5 0x0E45 # THAI CHARACTER LAKKHANGYAO
|
||||
0xE6 0x0E46 # THAI CHARACTER MAIYAMOK
|
||||
0xE7 0x0E47 # THAI CHARACTER MAITAIKHU
|
||||
0xE8 0x0E48 # THAI CHARACTER MAI EK
|
||||
0xE9 0x0E49 # THAI CHARACTER MAI THO
|
||||
0xEA 0x0E4A # THAI CHARACTER MAI TRI
|
||||
0xEB 0x0E4B # THAI CHARACTER MAI CHATTAWA
|
||||
0xEC 0x0E4C # THAI CHARACTER THANTHAKHAT
|
||||
0xED 0x0E4D # THAI CHARACTER NIKHAHIT
|
||||
0xEE 0x2122 # TRADE MARK SIGN
|
||||
0xEF 0x0E4F # THAI CHARACTER FONGMAN
|
||||
0xF0 0x0E50 # THAI DIGIT ZERO
|
||||
0xF1 0x0E51 # THAI DIGIT ONE
|
||||
0xF2 0x0E52 # THAI DIGIT TWO
|
||||
0xF3 0x0E53 # THAI DIGIT THREE
|
||||
0xF4 0x0E54 # THAI DIGIT FOUR
|
||||
0xF5 0x0E55 # THAI DIGIT FIVE
|
||||
0xF6 0x0E56 # THAI DIGIT SIX
|
||||
0xF7 0x0E57 # THAI DIGIT SEVEN
|
||||
0xF8 0x0E58 # THAI DIGIT EIGHT
|
||||
0xF9 0x0E59 # THAI DIGIT NINE
|
||||
0xFA 0x00AE # REGISTERED SIGN
|
||||
0xFB 0x00A9 # COPYRIGHT SIGN
|
341
charmap/TURKISH.TXT
Normal file
@ -0,0 +1,341 @@
|
||||
#=======================================================================
|
||||
# File name: TURKISH.TXT
|
||||
#
|
||||
# Contents: Map (external version) from Mac OS Turkish
|
||||
# character set to Unicode 2.1 and later.
|
||||
#
|
||||
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-05 Update header comments. Matches internal xml
|
||||
# <c1.1> and Text Encoding Converter 2.0.
|
||||
# b3,c1 2002-Dec-19 Update URLs, notes. Matches internal
|
||||
# utom<b1>.
|
||||
# b02 1999-Sep-22 Update contact e-mail address. Matches
|
||||
# internal utom<b1>, ufrm<b1>, and Text
|
||||
# Encoding Converter version 1.5.
|
||||
# n05 1998-Feb-05 Minor update to header comments
|
||||
# n03 1997-Dec-14 Update to match internal utom<n5>, ufrm<n15>:
|
||||
# Change standard mapping for 0xBD from U+2126
|
||||
# to its canonical decomposition, U+03A9.
|
||||
# n02 1995-Apr-15 First version (after fixing some typos).
|
||||
# Matches internal ufrm<n4>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Format:
|
||||
# -------
|
||||
#
|
||||
# Three tab-separated columns;
|
||||
# '#' begins a comment which continues to the end of the line.
|
||||
# Column #1 is the Mac OS Turkish code (in hex as 0xNN)
|
||||
# Column #2 is the corresponding Unicode (in hex as 0xNNNN)
|
||||
# Column #3 is a comment containing the Unicode name
|
||||
#
|
||||
# The entries are in Mac OS Turkish code order.
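The column format described above can be read with a few lines of code. The following is a minimal sketch, not a reference parser: the function name `parse_mapping` is hypothetical, it accepts any whitespace between columns rather than tabs only, and it splits column 2 on '+' so that multi-code-point rows (such as the Thai rows earlier in this diff) also parse. It does not handle the direction-override tags mentioned in the ARABIC.TXT change log.

```python
from typing import Dict, Tuple


def parse_mapping(text: str) -> Dict[int, Tuple[int, ...]]:
    """Parse mapping-table text into {mac_os_byte: (unicode code points, ...)}."""
    table: Dict[int, Tuple[int, ...]] = {}
    for raw in text.splitlines():
        # '#' begins a comment which continues to the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # comment-only or blank line
        fields = line.split()
        mac_code = int(fields[0], 16)  # column 1, written as 0xNN
        # Column 2 is 0xNNNN; '+' joins several code points in some tables.
        table[mac_code] = tuple(int(part, 16) for part in fields[1].split("+"))
    return table


sample = (
    "0xDC\t0x0130\t# LATIN CAPITAL LETTER I WITH DOT ABOVE\n"
    "0xDD\t0x0131\t# LATIN SMALL LETTER DOTLESS I\n"
    "#\n"
    "0xF5\t0xF8A0\t# undefined1\n"
)
table = parse_mapping(sample)
```

Storing each target as a tuple keeps single-code-point and multi-code-point rows in one shape, at the cost of a little extra unpacking for tables like this one.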
|
||||
#
|
||||
# Two of these mappings require the use of a corporate character.
|
||||
# See the file "CORPCHAR.TXT" and notes below.
|
||||
#
|
||||
# Control character mappings are not shown in this table, following
|
||||
# the conventions of the standard UTC mapping tables. However, the
|
||||
# Mac OS Turkish character set uses the standard control characters at
|
||||
# 0x00-0x1F and 0x7F.
|
||||
#
|
||||
# Notes on Mac OS Turkish:
|
||||
# ------------------------
|
||||
#
|
||||
# This is a legacy Mac OS encoding; in the Mac OS X Carbon and Cocoa
|
||||
# environments, it is only supported via transcoding to and from
|
||||
# Unicode.
|
||||
#
|
||||
# Mac OS Turkish is used for Turkish.
|
||||
#
|
||||
# The Mac OS Turkish encoding shares the script code smRoman
|
||||
# (0) with the Mac OS Roman encoding. To determine if the Turkish
|
||||
# encoding is being used, you must also check if the system region
|
||||
# code is 24, verTurkey.
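The check described above can be sketched as a small predicate. The constant and function names are hypothetical; only the numeric values (script code 0 for smRoman, region code 24 for verTurkey) come from the note, and how a caller obtains the script and region codes is left out.

```python
SM_ROMAN = 0      # script code smRoman, shared with Mac OS Roman
VER_TURKEY = 24   # system region code verTurkey


def is_mac_turkish(script_code: int, region_code: int) -> bool:
    """Return True when the Mac OS Turkish encoding should be assumed."""
    # The script code alone is ambiguous, so the region code must match too.
    return script_code == SM_ROMAN and region_code == VER_TURKEY
```

For example, `is_mac_turkish(0, 24)` is true, while `is_mac_turkish(0, 0)` falls back to plain Mac OS Roman.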
|
||||
#
|
||||
# This character set is a variant of standard Mac OS Roman. It adds
|
||||
# upper & lower G with breve, upper & lower S with cedilla, upper I
|
||||
# with dot, and moves the dotless lower i from its position at 0xF5
|
||||
# in standard Mac OS Roman to a position at 0xDD here (leaving the
|
||||
# 0xF5 code point undefined in Mac OS Turkish). This gives a total
|
||||
# of 7 code point differences from standard Mac OS Roman.
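One way to inspect these differences is to compare two decoders byte by byte. The sketch below assumes that Python's standard `mac_roman` and `mac_turkish` codecs follow the corresponding Apple tables; the helper name `differing_bytes` is hypothetical. Note that the Mac OS Roman table itself has changed over time, so the list reflects whichever table version the codec was built from.

```python
from typing import List


def differing_bytes(a: str = "mac_roman", b: str = "mac_turkish") -> List[int]:
    """Return the byte values that decode differently under codecs a and b."""
    diffs = []
    for value in range(0x20, 0x100):
        byte = bytes([value])
        # errors="replace" keeps the loop running if a codec leaves a byte undefined.
        if byte.decode(a, errors="replace") != byte.decode(b, errors="replace"):
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print([hex(value) for value in diffs])
```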
|
||||
#
|
||||
# Unicode mapping issues and notes:
|
||||
# ---------------------------------
|
||||
#
|
||||
# The following corporate zone Unicode characters are used in this
|
||||
# mapping:
|
||||
#
|
||||
# 0xF8A0 undefined1, used to map the single undefined code point
|
||||
# in Mac OS Turkish (to obtain roundtrip fidelity for all
|
||||
# code points).
|
||||
# 0xF8FF Apple logo
|
||||
#
|
||||
# NOTE: The graphic image associated with the Apple logo character
|
||||
# is not authorized for use without permission of Apple, and
|
||||
# unauthorized use might constitute trademark infringement.
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
# Changes from version n02 to version n03:
|
||||
#
|
||||
# - Change mapping of 0xBD from U+2126 to its canonical
|
||||
# decomposition, U+03A9.
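The relationship between the old and new targets for 0xBD can be checked with Python's `unicodedata` module. This is only an illustration of the canonical decomposition named above, not part of the mapping data.

```python
import unicodedata

ohm = "\u2126"  # OHM SIGN, the pre-n03 target for 0xBD
# Canonical decomposition (NFD) turns OHM SIGN into the Greek letter.
omega = unicodedata.normalize("NFD", ohm)
print(f"U+{ord(omega):04X}", unicodedata.name(omega))
```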
|
||||
#
|
||||
##################
|
||||
|
||||
0x20 0x0020 # SPACE
|
||||
0x21 0x0021 # EXCLAMATION MARK
|
||||
0x22 0x0022 # QUOTATION MARK
|
||||
0x23 0x0023 # NUMBER SIGN
|
||||
0x24 0x0024 # DOLLAR SIGN
|
||||
0x25 0x0025 # PERCENT SIGN
|
||||
0x26 0x0026 # AMPERSAND
|
||||
0x27 0x0027 # APOSTROPHE
|
||||
0x28 0x0028 # LEFT PARENTHESIS
|
||||
0x29 0x0029 # RIGHT PARENTHESIS
|
||||
0x2A 0x002A # ASTERISK
|
||||
0x2B 0x002B # PLUS SIGN
|
||||
0x2C 0x002C # COMMA
|
||||
0x2D 0x002D # HYPHEN-MINUS
|
||||
0x2E 0x002E # FULL STOP
|
||||
0x2F 0x002F # SOLIDUS
|
||||
0x30 0x0030 # DIGIT ZERO
|
||||
0x31 0x0031 # DIGIT ONE
|
||||
0x32 0x0032 # DIGIT TWO
|
||||
0x33 0x0033 # DIGIT THREE
|
||||
0x34 0x0034 # DIGIT FOUR
|
||||
0x35 0x0035 # DIGIT FIVE
|
||||
0x36 0x0036 # DIGIT SIX
|
||||
0x37 0x0037 # DIGIT SEVEN
|
||||
0x38 0x0038 # DIGIT EIGHT
|
||||
0x39 0x0039 # DIGIT NINE
|
||||
0x3A 0x003A # COLON
|
||||
0x3B 0x003B # SEMICOLON
|
||||
0x3C 0x003C # LESS-THAN SIGN
|
||||
0x3D 0x003D # EQUALS SIGN
|
||||
0x3E 0x003E # GREATER-THAN SIGN
|
||||
0x3F 0x003F # QUESTION MARK
|
||||
0x40 0x0040 # COMMERCIAL AT
|
||||
0x41 0x0041 # LATIN CAPITAL LETTER A
|
||||
0x42 0x0042 # LATIN CAPITAL LETTER B
|
||||
0x43 0x0043 # LATIN CAPITAL LETTER C
|
||||
0x44 0x0044 # LATIN CAPITAL LETTER D
|
||||
0x45 0x0045 # LATIN CAPITAL LETTER E
|
||||
0x46 0x0046 # LATIN CAPITAL LETTER F
|
||||
0x47 0x0047 # LATIN CAPITAL LETTER G
|
||||
0x48 0x0048 # LATIN CAPITAL LETTER H
|
||||
0x49 0x0049 # LATIN CAPITAL LETTER I
|
||||
0x4A 0x004A # LATIN CAPITAL LETTER J
|
||||
0x4B 0x004B # LATIN CAPITAL LETTER K
|
||||
0x4C 0x004C # LATIN CAPITAL LETTER L
|
||||
0x4D 0x004D # LATIN CAPITAL LETTER M
|
||||
0x4E 0x004E # LATIN CAPITAL LETTER N
|
||||
0x4F 0x004F # LATIN CAPITAL LETTER O
|
||||
0x50 0x0050 # LATIN CAPITAL LETTER P
|
||||
0x51 0x0051 # LATIN CAPITAL LETTER Q
|
||||
0x52 0x0052 # LATIN CAPITAL LETTER R
|
||||
0x53 0x0053 # LATIN CAPITAL LETTER S
|
||||
0x54 0x0054 # LATIN CAPITAL LETTER T
|
||||
0x55 0x0055 # LATIN CAPITAL LETTER U
|
||||
0x56 0x0056 # LATIN CAPITAL LETTER V
|
||||
0x57 0x0057 # LATIN CAPITAL LETTER W
|
||||
0x58 0x0058 # LATIN CAPITAL LETTER X
|
||||
0x59 0x0059 # LATIN CAPITAL LETTER Y
|
||||
0x5A 0x005A # LATIN CAPITAL LETTER Z
|
||||
0x5B 0x005B # LEFT SQUARE BRACKET
|
||||
0x5C 0x005C # REVERSE SOLIDUS
|
||||
0x5D 0x005D # RIGHT SQUARE BRACKET
|
||||
0x5E 0x005E # CIRCUMFLEX ACCENT
|
||||
0x5F 0x005F # LOW LINE
|
||||
0x60 0x0060 # GRAVE ACCENT
|
||||
0x61 0x0061 # LATIN SMALL LETTER A
|
||||
0x62 0x0062 # LATIN SMALL LETTER B
|
||||
0x63 0x0063 # LATIN SMALL LETTER C
|
||||
0x64 0x0064 # LATIN SMALL LETTER D
|
||||
0x65 0x0065 # LATIN SMALL LETTER E
|
||||
0x66 0x0066 # LATIN SMALL LETTER F
|
||||
0x67 0x0067 # LATIN SMALL LETTER G
|
||||
0x68 0x0068 # LATIN SMALL LETTER H
|
||||
0x69 0x0069 # LATIN SMALL LETTER I
|
||||
0x6A 0x006A # LATIN SMALL LETTER J
|
||||
0x6B 0x006B # LATIN SMALL LETTER K
|
||||
0x6C 0x006C # LATIN SMALL LETTER L
|
||||
0x6D 0x006D # LATIN SMALL LETTER M
|
||||
0x6E 0x006E # LATIN SMALL LETTER N
|
||||
0x6F 0x006F # LATIN SMALL LETTER O
|
||||
0x70 0x0070 # LATIN SMALL LETTER P
|
||||
0x71 0x0071 # LATIN SMALL LETTER Q
|
||||
0x72 0x0072 # LATIN SMALL LETTER R
|
||||
0x73 0x0073 # LATIN SMALL LETTER S
|
||||
0x74 0x0074 # LATIN SMALL LETTER T
|
||||
0x75 0x0075 # LATIN SMALL LETTER U
|
||||
0x76 0x0076 # LATIN SMALL LETTER V
|
||||
0x77 0x0077 # LATIN SMALL LETTER W
|
||||
0x78 0x0078 # LATIN SMALL LETTER X
|
||||
0x79 0x0079 # LATIN SMALL LETTER Y
|
||||
0x7A 0x007A # LATIN SMALL LETTER Z
|
||||
0x7B 0x007B # LEFT CURLY BRACKET
|
||||
0x7C 0x007C # VERTICAL LINE
|
||||
0x7D 0x007D # RIGHT CURLY BRACKET
|
||||
0x7E 0x007E # TILDE
|
||||
#
|
||||
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
|
||||
0x81 0x00C5 # LATIN CAPITAL LETTER A WITH RING ABOVE
|
||||
0x82 0x00C7 # LATIN CAPITAL LETTER C WITH CEDILLA
|
||||
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
|
||||
0x84 0x00D1 # LATIN CAPITAL LETTER N WITH TILDE
|
||||
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
|
||||
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
|
||||
0x87 0x00E1 # LATIN SMALL LETTER A WITH ACUTE
|
||||
0x88 0x00E0 # LATIN SMALL LETTER A WITH GRAVE
|
||||
0x89 0x00E2 # LATIN SMALL LETTER A WITH CIRCUMFLEX
|
||||
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
|
||||
0x8B 0x00E3 # LATIN SMALL LETTER A WITH TILDE
|
||||
0x8C 0x00E5 # LATIN SMALL LETTER A WITH RING ABOVE
|
||||
0x8D 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
|
||||
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
|
||||
0x8F 0x00E8 # LATIN SMALL LETTER E WITH GRAVE
|
||||
0x90 0x00EA # LATIN SMALL LETTER E WITH CIRCUMFLEX
|
||||
0x91 0x00EB # LATIN SMALL LETTER E WITH DIAERESIS
|
||||
0x92 0x00ED # LATIN SMALL LETTER I WITH ACUTE
|
||||
0x93 0x00EC # LATIN SMALL LETTER I WITH GRAVE
|
||||
0x94 0x00EE # LATIN SMALL LETTER I WITH CIRCUMFLEX
|
||||
0x95 0x00EF # LATIN SMALL LETTER I WITH DIAERESIS
|
||||
0x96 0x00F1 # LATIN SMALL LETTER N WITH TILDE
|
||||
0x97 0x00F3 # LATIN SMALL LETTER O WITH ACUTE
|
||||
0x98 0x00F2 # LATIN SMALL LETTER O WITH GRAVE
|
||||
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
|
||||
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
|
||||
0x9B 0x00F5 # LATIN SMALL LETTER O WITH TILDE
|
||||
0x9C 0x00FA # LATIN SMALL LETTER U WITH ACUTE
|
||||
0x9D 0x00F9 # LATIN SMALL LETTER U WITH GRAVE
|
||||
0x9E 0x00FB # LATIN SMALL LETTER U WITH CIRCUMFLEX
|
||||
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
|
||||
0xA0 0x2020 # DAGGER
|
||||
0xA1 0x00B0 # DEGREE SIGN
|
||||
0xA2 0x00A2 # CENT SIGN
|
||||
0xA3 0x00A3 # POUND SIGN
|
||||
0xA4 0x00A7 # SECTION SIGN
|
||||
0xA5 0x2022 # BULLET
|
||||
0xA6 0x00B6 # PILCROW SIGN
|
||||
0xA7 0x00DF # LATIN SMALL LETTER SHARP S
|
||||
0xA8 0x00AE # REGISTERED SIGN
|
||||
0xA9 0x00A9 # COPYRIGHT SIGN
|
||||
0xAA 0x2122 # TRADE MARK SIGN
|
||||
0xAB 0x00B4 # ACUTE ACCENT
|
||||
0xAC 0x00A8 # DIAERESIS
|
||||
0xAD 0x2260 # NOT EQUAL TO
|
||||
0xAE 0x00C6 # LATIN CAPITAL LETTER AE
|
||||
0xAF 0x00D8 # LATIN CAPITAL LETTER O WITH STROKE
|
||||
0xB0 0x221E # INFINITY
|
||||
0xB1 0x00B1 # PLUS-MINUS SIGN
|
||||
0xB2 0x2264 # LESS-THAN OR EQUAL TO
|
||||
0xB3 0x2265 # GREATER-THAN OR EQUAL TO
|
||||
0xB4 0x00A5 # YEN SIGN
|
||||
0xB5 0x00B5 # MICRO SIGN
|
||||
0xB6 0x2202 # PARTIAL DIFFERENTIAL
|
||||
0xB7 0x2211 # N-ARY SUMMATION
|
||||
0xB8 0x220F # N-ARY PRODUCT
|
||||
0xB9 0x03C0 # GREEK SMALL LETTER PI
|
||||
0xBA 0x222B # INTEGRAL
|
||||
0xBB 0x00AA # FEMININE ORDINAL INDICATOR
|
||||
0xBC 0x00BA # MASCULINE ORDINAL INDICATOR
|
||||
0xBD 0x03A9 # GREEK CAPITAL LETTER OMEGA
|
||||
0xBE 0x00E6 # LATIN SMALL LETTER AE
|
||||
0xBF 0x00F8 # LATIN SMALL LETTER O WITH STROKE
|
||||
0xC0 0x00BF # INVERTED QUESTION MARK
|
||||
0xC1 0x00A1 # INVERTED EXCLAMATION MARK
|
||||
0xC2 0x00AC # NOT SIGN
|
||||
0xC3 0x221A # SQUARE ROOT
|
||||
0xC4 0x0192 # LATIN SMALL LETTER F WITH HOOK
|
||||
0xC5 0x2248 # ALMOST EQUAL TO
|
||||
0xC6 0x2206 # INCREMENT
|
||||
0xC7 0x00AB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
|
||||
0xC8 0x00BB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
|
||||
0xC9 0x2026 # HORIZONTAL ELLIPSIS
|
||||
0xCA 0x00A0 # NO-BREAK SPACE
|
||||
0xCB 0x00C0 # LATIN CAPITAL LETTER A WITH GRAVE
|
||||
0xCC 0x00C3 # LATIN CAPITAL LETTER A WITH TILDE
|
||||
0xCD 0x00D5 # LATIN CAPITAL LETTER O WITH TILDE
|
||||
0xCE 0x0152 # LATIN CAPITAL LIGATURE OE
|
||||
0xCF 0x0153 # LATIN SMALL LIGATURE OE
|
||||
0xD0 0x2013 # EN DASH
|
||||
0xD1 0x2014 # EM DASH
|
||||
0xD2 0x201C # LEFT DOUBLE QUOTATION MARK
|
||||
0xD3 0x201D # RIGHT DOUBLE QUOTATION MARK
|
||||
0xD4 0x2018 # LEFT SINGLE QUOTATION MARK
|
||||
0xD5 0x2019 # RIGHT SINGLE QUOTATION MARK
|
||||
0xD6 0x00F7 # DIVISION SIGN
|
||||
0xD7 0x25CA # LOZENGE
|
||||
0xD8 0x00FF # LATIN SMALL LETTER Y WITH DIAERESIS
|
||||
0xD9 0x0178 # LATIN CAPITAL LETTER Y WITH DIAERESIS
|
||||
0xDA 0x011E # LATIN CAPITAL LETTER G WITH BREVE
|
||||
0xDB 0x011F # LATIN SMALL LETTER G WITH BREVE
|
||||
0xDC 0x0130 # LATIN CAPITAL LETTER I WITH DOT ABOVE
|
||||
0xDD 0x0131 # LATIN SMALL LETTER DOTLESS I
|
||||
0xDE 0x015E # LATIN CAPITAL LETTER S WITH CEDILLA
|
||||
0xDF 0x015F # LATIN SMALL LETTER S WITH CEDILLA
|
||||
0xE0 0x2021 # DOUBLE DAGGER
|
||||
0xE1 0x00B7 # MIDDLE DOT
|
||||
0xE2 0x201A # SINGLE LOW-9 QUOTATION MARK
|
||||
0xE3 0x201E # DOUBLE LOW-9 QUOTATION MARK
|
||||
0xE4 0x2030 # PER MILLE SIGN
|
||||
0xE5 0x00C2 # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
|
||||
0xE6 0x00CA # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
|
||||
0xE7 0x00C1 # LATIN CAPITAL LETTER A WITH ACUTE
|
||||
0xE8 0x00CB # LATIN CAPITAL LETTER E WITH DIAERESIS
|
||||
0xE9 0x00C8 # LATIN CAPITAL LETTER E WITH GRAVE
|
||||
0xEA 0x00CD # LATIN CAPITAL LETTER I WITH ACUTE
|
||||
0xEB 0x00CE # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
|
||||
0xEC 0x00CF # LATIN CAPITAL LETTER I WITH DIAERESIS
|
||||
0xED 0x00CC # LATIN CAPITAL LETTER I WITH GRAVE
|
||||
0xEE 0x00D3 # LATIN CAPITAL LETTER O WITH ACUTE
|
||||
0xEF 0x00D4 # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
|
||||
0xF0 0xF8FF # Apple logo
|
||||
0xF1 0x00D2 # LATIN CAPITAL LETTER O WITH GRAVE
|
||||
0xF2 0x00DA # LATIN CAPITAL LETTER U WITH ACUTE
|
||||
0xF3 0x00DB # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
|
||||
0xF4 0x00D9 # LATIN CAPITAL LETTER U WITH GRAVE
|
||||
0xF5 0xF8A0 # undefined1
|
||||
0xF6 0x02C6 # MODIFIER LETTER CIRCUMFLEX ACCENT
|
||||
0xF7 0x02DC # SMALL TILDE
|
||||
0xF8 0x00AF # MACRON
|
||||
0xF9 0x02D8 # BREVE
|
||||
0xFA 0x02D9 # DOT ABOVE
|
||||
0xFB 0x02DA # RING ABOVE
|
||||
0xFC 0x00B8 # CEDILLA
|
||||
0xFD 0x02DD # DOUBLE ACUTE ACCENT
|
||||
0xFE 0x02DB # OGONEK
|
||||
0xFF 0x02C7 # CARON
|
106
charmap/UKRAINE.TXT
Normal file
@ -0,0 +1,106 @@
|
||||
#=======================================================================
|
||||
# File name: UKRAINE.TXT
|
||||
#
|
||||
# Contents: Notes on Mac OS Ukrainian character set
|
||||
#
|
||||
# Copyright: (c) 1995-2002, 2005 by Apple Computer, Inc., all rights
|
||||
# reserved.
|
||||
#
|
||||
# Contact: charsets@apple.com
|
||||
#
|
||||
# Changes:
|
||||
#
|
||||
# c02 2005-Apr-05 Update header comments.
|
||||
# b3,c1 2002-Dec-19 Update URLs. Matches internal utom<b1>.
|
||||
# b02 1999-Sep-22 Encoding changed for Mac OS 9.0 to merge
|
||||
# with Mac OS Cyrillic and support EURO SIGN;
|
||||
# change mappings for 0xFF. For Mac OS 9.0
|
||||
# there is no longer a separate Mac OS
|
||||
# Ukrainian character set; the mappings are
|
||||
# in CYRILLIC.TXT. Update contact e-mail
|
||||
# address. Matches internal utom<b1>, ufrm<b1>,
|
||||
# and Text Encoding Converter version 1.5.
|
||||
# n04 1998-Feb-05 Update header comments to new format; no
|
||||
# mapping changes. Matches internal utom<2>,
|
||||
# ufrm<13>, and Text Encoding Converter
|
||||
# version 1.3.
|
||||
# n02 1995-Apr-15 First version (after fixing some typos).
|
||||
# Matches internal ufrm<4>.
|
||||
#
|
||||
# Standard header:
|
||||
# ----------------
|
||||
#
|
||||
# Apple, the Apple logo, and Macintosh are trademarks of Apple
|
||||
# Computer, Inc., registered in the United States and other countries.
|
||||
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
|
||||
# throughout this document, "Macintosh" can be used to refer to
|
||||
# Macintosh computers and "Unicode" can be used to refer to the
|
||||
# Unicode standard.
|
||||
#
|
||||
# Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
||||
# either express or implied, with respect to this document and the
|
||||
# included data, its quality, accuracy, or fitness for a particular
|
||||
# purpose. In no event will Apple be liable for direct, indirect,
|
||||
# special, incidental, or consequential damages resulting from any
|
||||
# defect or inaccuracy in this document or the included data.
|
||||
#
|
||||
# These mapping tables and character lists are subject to change.
|
||||
# The latest tables should be available from the following:
|
||||
#
|
||||
# <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
|
||||
#
|
||||
# For general information about Mac OS encodings and these mapping
|
||||
# tables, see the file "README.TXT".
|
||||
#
|
||||
# Notes on Mac OS Ukrainian and Mac OS Cyrillic:
|
||||
# ----------------------------------------------
|
||||
#
|
||||
# Before Mac OS 9.0, there were two separate Slavic Cyrillic
|
||||
# encodings for the Mac OS:
|
||||
#
|
||||
# 1. The Cyrillic currency sign variant (used for localized Russian
|
||||
# and Bulgarian systems), which had the following:
|
||||
# 0xA2 U+00A2 CENT SIGN
|
||||
# 0xB6 U+2202 PARTIAL DIFFERENTIAL
|
||||
# 0xFF U+00A4 CURRENCY SIGN
|
||||
#
|
||||
# 2. The Ukrainian currency sign variant (used for localized Ukrainian
|
||||
# systems and the pre-9.0 Cyrillic Language Kit), which had the
|
||||
# following:
|
||||
# 0xA2 U+0490 CYRILLIC CAPITAL LETTER GHE WITH UPTURN
|
||||
# 0xB6 U+0491 CYRILLIC SMALL LETTER GHE WITH UPTURN
|
||||
# 0xFF U+00A4 CURRENCY SIGN
|
||||
#
|
||||
# Before Mac OS 9.0, the Ukrainian currency sign variant shared the
|
||||
# script code smCyrillic (7) with the Cyrillic currency sign variant.
|
||||
# The Ukrainian currency sign variant was being used if one of the
|
||||
# following was true:
|
||||
# - The system region code was 62, verUkraine (indicates Ukrainian
|
||||
# localized system), or
|
||||
# - The system script was not 7, smCyrillic (indicates Cyrillic
|
||||
# Language Kit instead of localized system).
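The two conditions above can be sketched as a predicate for pre-9.0 systems. The names are hypothetical; the values 7 (smCyrillic) and 62 (verUkraine) come from the note, and the function only makes sense when the text is already known to be in script smCyrillic.

```python
SM_CYRILLIC = 7    # script code smCyrillic
VER_UKRAINE = 62   # system region code verUkraine


def used_ukrainian_variant(system_script: int, region_code: int) -> bool:
    """Pre-Mac OS 9.0 only: True if the Ukrainian currency sign variant applied."""
    # A Ukrainian localized system, or a non-Cyrillic system running the
    # Cyrillic Language Kit, used the Ukrainian variant.
    return region_code == VER_UKRAINE or system_script != SM_CYRILLIC
```

Under this rule a localized Russian system (system script 7, some other region code) returns False and uses the Cyrillic currency sign variant.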
|
||||
#
|
||||
# For Mac OS 9.0 and later, both currency sign variants were replaced
|
||||
# with a new Euro sign version of Mac OS Cyrillic, which is similar to
|
||||
# the old Ukrainian currency sign variant but changes 0xFF to EURO
|
||||
# SIGN. Mappings for this are in CYRILLIC.TXT.
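The three code points that distinguish the variants can be written out as small override tables. This is a sketch using hypothetical names; it lists only the code points named in this file (with U+20AC for EURO SIGN as given in the change details), and the full mappings remain in CYRILLIC.TXT.

```python
# Overrides for the pre-9.0 Cyrillic currency sign variant.
CYRILLIC_CURRENCY = {0xA2: 0x00A2, 0xB6: 0x2202, 0xFF: 0x00A4}

# Overrides for the pre-9.0 Ukrainian currency sign variant.
UKRAINIAN_CURRENCY = {0xA2: 0x0490, 0xB6: 0x0491, 0xFF: 0x00A4}

# Mac OS 9.0 and later: like the Ukrainian variant, but 0xFF is EURO SIGN.
CYRILLIC_EURO = dict(UKRAINIAN_CURRENCY)
CYRILLIC_EURO[0xFF] = 0x20AC

for code in sorted(CYRILLIC_EURO):
    print(f"0x{code:02X} -> U+{CYRILLIC_EURO[code]:04X}")
```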
|
||||
#
|
||||
# Note: There is a common glyph variation in Ukrainian, in which the
|
||||
# glyph for CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I may or
|
||||
# may not have a dot above.
|
||||
#
|
||||
# Details of mapping changes in each version:
|
||||
# -------------------------------------------
|
||||
#
|
||||
# Changes from version n04 to version b02:
|
||||
#
|
||||
# - Encoding changed for Mac OS 9.0 to merge with Mac OS Cyrillic and
|
||||
# support EURO SIGN; 0xFF changed from U+00A4 to U+20AC. For Mac OS
|
||||
# 9.0 there is no longer a separate Mac OS Ukrainian character set, so
|
||||
# the mappings here are deleted; see the mappings in CYRILLIC.TXT.
|
||||
#
|
||||
##################
|
||||
|
||||
##################
|
||||
# For mappings, see CYRILLIC.TXT
|
||||
##################
|