

1   Character Sets

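The Compaq Tru64 UNIX (formerly DIGITAL UNIX) operating system software supports the following Chinese character sets, each described in a section of this chapter:

CNS 11643

DTSCS

Big-5

GB2312-80

Extended GB

Unicode

ISO/IEC 10646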

The CNS 11643 and Big-5 character sets are commonly used for Traditional Chinese characters, and the GB2312-80 character set is commonly used for Simplified Chinese characters. The Unicode and ISO/IEC 10646 character sets cover both Traditional and Simplified Chinese.


1.1   CNS 11643

The CNS (Chinese National Standard) 11643 character set standard was published by the National Bureau of Standards of Taiwan in 1986 and was updated in 1992. It was also called "Standard Interchange Code for Generally-used Chinese Character" (SICGCC). CNS 11643 provides 16 character planes for defining Chinese characters. Each character plane is divided into 94 rows, and each row has 94 columns, so each plane can accommodate 8,836 (94 x 94) characters. Character planes 1-11 are reserved for standard Chinese characters, while character planes 12-16 are user-defined areas.
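The plane, row, and column layout described above can be sketched in a few lines of Python. This is an illustrative sketch only: the names `plane_capacity`, `linear_index`, and `is_user_defined` are hypothetical helpers invented for this example and are not part of CNS 11643 or of any Tru64 UNIX interface.

```python
# Geometry of CNS 11643 as described in the text above.
PLANES = 16          # character planes 1-16
ROWS_PER_PLANE = 94  # rows 1-94 in each plane
COLS_PER_ROW = 94    # columns 1-94 in each row


def plane_capacity() -> int:
    """Number of character positions in one plane (94 x 94)."""
    return ROWS_PER_PLANE * COLS_PER_ROW


def is_user_defined(plane: int) -> bool:
    """Planes 12-16 are user-defined areas; planes 1-11 are reserved
    for standard Chinese characters."""
    if not 1 <= plane <= PLANES:
        raise ValueError(f"plane must be 1-{PLANES}, got {plane}")
    return plane >= 12


def linear_index(plane: int, row: int, col: int) -> int:
    """Zero-based position of (plane, row, col) when all planes are
    laid end to end. Hypothetical helper for illustration only."""
    if not 1 <= plane <= PLANES:
        raise ValueError(f"plane must be 1-{PLANES}, got {plane}")
    if not 1 <= row <= ROWS_PER_PLANE:
        raise ValueError(f"row must be 1-{ROWS_PER_PLANE}, got {row}")
    if not 1 <= col <= COLS_PER_ROW:
        raise ValueError(f"col must be 1-{COLS_PER_ROW}, got {col}")
    return ((plane - 1) * ROWS_PER_PLANE + (row - 1)) * COLS_PER_ROW + (col - 1)
```

For example, `plane_capacity()` gives 8,836, and the first position of plane 2 follows immediately after the last position of plane 1.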

Figure 1-1: CNS 11643 Character Planes

The original CNS 11643 standard, published in 1986, defines certain groups of characters only on the first and second character planes. Table 1-1 shows these groups of characters.

Table 1-1: Characters Defined in CNS 11643-1986

Character Plane  Character Type                   Number of Characters
Plane 1          Special characters               651
                 Control characters               33
                 Frequently-used characters       5,401
Plane 2          Less frequently-used characters  7,650

Figure 1-2 and Figure 1-3 illustrate the positions of these characters in the first and second character planes.

Figure 1-2: CNS 11643 First Character Plane

Figure 1-3: CNS 11643 Second Character Plane

Because the CNS 11643-1986 character set was not rich enough to meet most application requirements, such as representing names and addresses, the information industry in Taiwan requested that the character set be expanded. In 1991, the Bureau of National Standard formed a team to study how to expand CNS 11643. On August 4, 1992, the Bureau of National Standard published the revised CNS 11643, the Chinese Standard Interchange Code (CSIC).

The revised CNS 11643, called CNS 11643-1992, defined 651 special characters, 33 control characters and 48,027 Chinese characters, as shown in Table 1-2.

Table 1-2: Characters Defined in CNS 11643-1992

Character Plane  Character Type                                                            Number of Characters
Plane 1          Special characters                                                        651
                 Control characters                                                        33
                 Frequently-used characters                                                5,401
Plane 2          Less frequently-used characters                                           7,650
Plane 3          Rarely-used characters (EDPC Part I)                                      6,148
Plane 4          Used for residency system, ISO 2nd edition DIS 10646 Han characters,      7,298
                 171 EDPC Part II characters
Plane 5          Rarely-used characters (Based on the Ministry of Education publications)  8,603
Plane 6          Variants based on the Ministry of Education publications (<=14 strokes)   6,388
Plane 7          Variants based on the Ministry of Education publications (>14 strokes)    6,539
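The per-plane counts in Table 1-2 can be cross-checked against the stated total of 48,027 Chinese characters. The following Python sketch copies the figures from the table; the names `total_chinese_characters` and `unused_positions` are hypothetical helpers introduced only for this check.

```python
# Chinese character counts per plane, copied from Table 1-2.
CHINESE_CHARACTERS_BY_PLANE = {
    1: 5401,  # frequently-used characters
    2: 7650,  # less frequently-used characters
    3: 6148,  # rarely-used characters (EDPC Part I)
    4: 7298,  # residency system, DIS 10646 Han characters, EDPC Part II
    5: 8603,  # rarely-used characters (Ministry of Education)
    6: 6388,  # variants, 14 strokes or fewer
    7: 6539,  # variants, more than 14 strokes
}

PLANE_CAPACITY = 94 * 94  # 8,836 positions per plane


def total_chinese_characters() -> int:
    """Sum of the Chinese characters listed for planes 1-7."""
    return sum(CHINESE_CHARACTERS_BY_PLANE.values())


def unused_positions(plane: int) -> int:
    """Positions in a plane not occupied by the Chinese characters
    listed for it. For plane 1 this ignores the special and control
    characters, which the table counts separately."""
    return PLANE_CAPACITY - CHINESE_CHARACTERS_BY_PLANE[plane]


if __name__ == "__main__":
    print(total_chinese_characters())
```

The sum agrees with the figure quoted for CNS 11643-1992, and every plane stays within the 8,836 positions that a 94 x 94 plane provides.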

Because CNS 11643-1992 defines far more characters than general use requires, the revised CNS 11643 no longer refers to "Generally-used" characters in its title and is called "Chinese Standard Interchange Code (CSIC)".

Note

In this release, the new characters added in CNS 11643-1992 are not supported. Only the characters defined in CNS 11643-1986 and in DTSCS (described in the next section) are supported.


1.2   DTSCS

In addition to CNS 11643, the Compaq Tru64 UNIX operating system supports the DIGITAL Taiwan Supplemental Character Set (DTSCS). Currently, DTSCS includes only the EDPC Recommended Character Set, which defines a total of 6,319 characters. The EDPC Recommended Character Set was first published by the Electronic Data Processing Center of the Executive Yuan in June 1988.

Figure 1-4: EDPC Recommended Character Set

Computer vendors support the EDPC Recommended Character Set as a de facto standard and assign it to CNS 11643 character plane 14.

In the revised CNS 11643-1992, the 6,319 characters in the EDPC Recommended Character Set are assigned to the third and fourth character planes of CNS 11643, as shown in Table 1-3.

Table 1-3: Mapping of EDPC Recommended Character Set to CNS 11643-1992

EDPC Characters  Character Plane  Number of Characters
Part I           Plane 3          6,148
Part II          Plane 4          171


1.3   Big-5

The Big-5 character set, though not a national standard, is commonly used by the Taiwan information industry, particularly in the PC and workstation market. The Big-5 character set was designed to meet the requirements of five major software vendors in Taiwan. Since its publication, a large body of software, hardware, and peripheral devices has been developed to support Big-5.

Big-5 is very similar to the first two planes of CNS 11643-1992. The 5,401 frequently-used Chinese characters are identical in the two character sets, although their positions in the code tables differ. For the less frequently-used Chinese characters, Big-5 defines two characters in addition to the 7,650 characters defined in the second character plane of CNS 11643, and here too the positions in the code tables differ.


1.4   GB2312-80

The GB2312-80 character set is a standard that was published by the State Bureau of Standardization of the People's Republic of China (PRC) in 1980 and came into force in May 1981.

GB2312-80 defines 7,445 characters, of which 6,763 are Chinese characters:

682 graphic symbols, placed in rows 1-9.

3,755 frequently-used Chinese characters, placed in rows 16-55.

3,008 less frequently-used Chinese characters, placed in rows 56-87. See Figure 1-5.

The GB2312-80 code table is divided into 94 rows (Qu), numbered from 1 to 94. Each row has 94 columns (Wei), also numbered from 1 to 94.
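The row ranges given above can be expressed as a small lookup. The following Python sketch is illustrative only: `classify_row` is a hypothetical helper, and the label used for rows that the text does not mention (10-15 and 88-94) is an assumption of this example, not a statement from the standard.

```python
# Character counts for GB2312-80 as stated in the text.
GRAPHIC_SYMBOLS = 682         # rows 1-9
FREQUENTLY_USED = 3755        # rows 16-55
LESS_FREQUENTLY_USED = 3008   # rows 56-87


def classify_row(qu: int) -> str:
    """Return the kind of character found in row (Qu) `qu`.

    Rows not covered by the ranges in the text are labeled
    "not described here" (an assumption of this sketch).
    """
    if not 1 <= qu <= 94:
        raise ValueError(f"row (Qu) must be 1-94, got {qu}")
    if qu <= 9:
        return "graphic symbols"
    if 16 <= qu <= 55:
        return "frequently-used Chinese characters"
    if 56 <= qu <= 87:
        return "less frequently-used Chinese characters"
    return "not described here"


def total_characters() -> int:
    """Total of the three groups; the text gives 7,445."""
    return GRAPHIC_SYMBOLS + FREQUENTLY_USED + LESS_FREQUENTLY_USED
```

The three groups add up to the 7,445 characters quoted above, and the two Chinese character groups add up to 6,763.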

Figure 1-5: GB2312-80 Character Set



1.5   Extended GB

The extended GB character set provides 8,836 (94 x 94) code points for defining user-defined characters. The 8,836 code points are divided into two regions:

The extended GB code table is similar to the GB2312 code table. It is divided into 94 rows and each row has 94 columns.


1.6   Unicode

The Unicode Standard, Version 2.0 specifies a universal character set (UCS) that contains definitions for 38,885 characters and includes a Private Use Area for vendor-defined or user-defined characters. The main features of this character set are:


1.7   ISO/IEC 10646

The ISO/IEC 10646 standard, which is specified in Information Technology-Universal Multiple-Octet Coded Character Set, ISO/IEC 10646, allows characters to be specified either as 32-bit units or, like Unicode, as 16-bit units. In the 32-bit form, each 16-bit Unicode character value is zero-extended with a second 16-bit unit to conform to ISO/IEC 10646.
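The zero-extension from the 16-bit form to the 32-bit form can be sketched as follows. This is a minimal illustration, not a description of any system interface: the function name `to_32bit_bytes` is hypothetical, and the big-endian byte order (most significant byte first) is an assumption made so that the added zero unit is easy to see.

```python
def to_32bit_bytes(unit16: int) -> bytes:
    """Zero-extend a 16-bit character value to a 32-bit unit.

    The result is a 16-bit unit of zeros followed by the original
    16-bit value, in big-endian order (an assumption of this sketch).
    """
    if not 0 <= unit16 <= 0xFFFF:
        raise ValueError(f"value must fit in 16 bits, got {unit16:#x}")
    high_unit = (0).to_bytes(2, "big")     # the added, all-zero 16-bit unit
    low_unit = unit16.to_bytes(2, "big")   # the original 16-bit value
    return high_unit + low_unit


if __name__ == "__main__":
    # 0x4E2D is the 16-bit value of a common Han character.
    print(to_32bit_bytes(0x4E2D).hex())
```

The numeric value of the character is unchanged by the extension; only the width of the unit that carries it grows from 16 to 32 bits.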

