This chapter describes the internationalization features of Tru64 UNIX. The first section provides a brief internationalization overview (Section 8.1), after which the following topics are discussed:
Supported languages (Section 8.2)
Using the localedef utility to create locales (Section 8.3)
Converting text from one codeset to another (Section 8.4)
Unicode locales and dense code locales for WLS localization (Section 8.5)
Support for the Unicode Standard Version 3.1 and ISO 10646 standards (Section 8.6)
The Configure International Software options (Section 8.7)
Support for the euro currency symbol (Section 8.8)
The dxim input server (Section 8.9)
The internationalized Curses library (Section 8.10)
Additional internationalization utilities and features supported by the operating system (Section 8.11)
The term "internationalization" is formally defined by The Open Group as a "provision within a computer program of the capability of making itself adaptable to the requirements of different native languages, local customs, and coded character sets."
This essentially means that internationalized programs can run in any supported locale without having to be modified. A locale is a software environment that correctly handles the cultural conventions of a particular geographic area, such as China or France, and a language as it is used in that area. For example, when a user selects a Chinese locale, commands, system messages, and keyboard input can all use Chinese characters and are displayed in a way appropriate for Chinese.
Tru64 UNIX is an internationalized operating system that not only allows users to interact with existing applications in their native language, but also supports a full set of application interfaces, referred to as the Worldwide Portability Interfaces (WPI), to enable software developers to write internationalized applications. The original code for these interfaces came from the Open Software Foundation (OSF) and has been enhanced.
The internationalization support in the operating system conforms to The Open Group's CAE specifications for system interfaces and headers (XSH Issue 5), curses (XCURSES Issue 4.2), and commands and utilities (XCU Issue 5). These specifications align with current POSIX and ISO C standards. This conformance ensures that commands, utilities, and libraries have been internationalized, and their corresponding message catalogs have been included in the base system.
Tru64 UNIX conforms to the Chinese Character Input Standard, GB18030-2000, which went into effect on September 1, 2001.
In addition, the operating system supports the X Input Method (XIM) and X Output Method (XOM) to facilitate input of local language characters, text drawing, measurement, and interclient communication. These functions are implemented according to the X11R6.3 specification and include some problem corrections specified by X11R6.4.
Note that the operating system also supports a 32-bit wchar_t data type, which in turn enables support for a wide array of codesets, including the one defined by the ISO 10646 standard.
See the following information about internationalization on the Tru64 UNIX operating system:
Writing Software for the International Market (information for programmers)
The Tru64 UNIX worldwide language support page:
Most locales are included in Worldwide Language Support (WLS) subsets that are optionally installed. Some, as indicated in Table 8-1, are part of the mandatory base operating system.
Locales whose names end in .UTF-8 use the file code and internal process code (wchar_t encoding) defined in the ISO 10646 and Unicode standards.
Other, non-UTF-8 Unicode locales use traditional UNIX and proprietary codesets for the file code while using UTF-32 as the internal process code.
A subset of these Unicode locales has a @ucs4 modifier; however, these locales are the same as the corresponding locales without the @ucs4 modifier.
The universal.UTF-8 locale is also available (for use by applications rather than end users). It supports the complete set of characters in the universal character set (UCS).
See Unicode(5) for more information.
UTF-8 and Latin-9 (ISO 8859-15) locales support the euro currency symbol.
For the most up-to-date list of supported languages and locales, refer to the l10n_intro(5) reference page.
Table 8-1 lists the languages supported by the operating system and their corresponding locales.
Table 8-1: Languages and Locales
| Language | Locale Name |
|---|---|
| Catalan | |
| Chinese, Simplified (PRC) | |
| Chinese, Traditional (Hong Kong) | |
| Chinese, Traditional (Taiwan) | |
| Czech | |
| Danish | |
| Dutch | |
| Dutch, Belgian | |
| English, U.S. (ASCII) | C (POSIX) [Footnote 2] |
| English, U.S. | |
| English, U.K. | |
| European | en_EU.UTF-8@euro [Footnote 4] |
| Finnish | |
| French | |
| French, Belgian | |
| French, Canadian | |
| French, Swiss | |
| German | |
| German, Swiss | |
| Greek | |
| Hebrew | |
| Hungarian | |
| Icelandic | |
| Italian | |
| Japanese | |
| Korean | |
| Lithuanian | |
| Norwegian | |
| Polish | |
| Portuguese | |
| Russian | |
| Slovak | |
| Slovene | |
| Spanish | |
| Swedish | |
| Thai | |
| Turkish | |
Note that you can switch languages or character sets as necessary and can even run multiple processes in different languages or codesets on the same system at the same time.
For more information on a particular coded character set, such as ISO8859-9, see the reference page with the same name.
For more information about UCS-4 and UTF-8 encoding, see Unicode(5) and code_page(5).
8.3 Locale Creation
The localedef utility allows programmers to create their own locales: it compiles locale source files and generates a locale with a unique name. For more information on creating locales, see Writing Software for the International Market.
8.4 Codeset Conversion
The operating system includes the iconv utility and the iconv_open(), iconv(), and iconv_close() functions, which convert text from one codeset to another, thereby helping programmers write international applications. For use with these interfaces, the operating system includes a large set of codeset converters.
The en_US.UTF-8 X locale database file contains font definitions that include all the various fonts used with the operating system. Thus, applications running under the en_US.UTF-8 locale can display all the font characters installed with Worldwide Language Support (WLS). Applications running under the Asian locales display all of the WLS-installed fonts except for ISO8859-2, -4, -5, -7, -8, -9, and TACTIS.
In addition to conversion between different codesets for the same language, these converters support conversion between different Unicode formats, such as UCS-2, UCS-4, and UTF-8. There are also codeset converters that handle the most commonly used PC code-page formats.
Codeset conversion is also used by the printing subsystem and by utilities such as man to allow processing of files in different languages and encoding formats. Additionally, codeset conversion is implemented in mail utilities for mail interchange with systems that use different codesets, and in the X Window System Toolkit for text input, drawing, and interclient communication.
For more information on codeset conversion, see the iconv_intro(5), Unicode(5), and code_page(5) reference pages.
8.5 Unicode and Dense Code Locales
When you install Worldwide Language Support, Tru64 UNIX provides localization support with two types of locales: Unicode locales and dense code locales.
Unicode locales conform to the Unicode and ISO/IEC 10646 standards and use UTF-32 as the wide character encoding. Under UTF-32 wide character encoding, wchar_t values represent the same characters regardless of the locale and, because Unicode standards prevail, the implementation is consistent across platforms.
Dense code locales use dense code for wide character encoding to minimize table size (that is, codepoints are assigned consecutively with no empty positions).
In addition to UTF-8 locales, which use ISO 10646 (Unicode) as both the internal and external representation of characters, the dense code and Unicode locales provide functionally equivalent versions of many locales.
The dense code locales are those with names that end in a codeset other than UTF-8 (for example, ISO8859-1, eucJP, GB18030). The non-UTF-8 Unicode locales are those that include @ucs4 at the end of the locale name. A sample pair of dense code and Unicode locales is pl_PL.ISO8859-2 and pl_PL.ISO8859-2@ucs4.
In general, the same charmaps and locale source can be used for dense code and Unicode locales. However, characters that are not defined in the LC_COLLATE section of the locale source may sort differently in the two types of locales.
For Latin-1 locales (ISO 8859-1), the dense code and Unicode locales are identical because Latin-1 characters are the same as the first 256 characters in Unicode.
The operating system also supports three UCS transformation formats (UTFs): UTF-8, UTF-16, and UTF-32, all of which are defined in the Unicode standard.
See Unicode(5) for more information about Unicode, UCS-4, and the transformation formats.
To switch between Unicode and dense code locales, the system administrator, as root, uses i18nconfig to change the systemwide default, or manually changes the symbolic link /usr/i18n/lib/nls/dloc from ./ucsloc to ./loc.
8.6 Unicode Support
Tru64 UNIX supports the Unicode Standard Version 3.1 and ISO 10646 standards through a set of UCS-4 and UTF-8 based locales.
Codeset conversion among the UCS-4 (UTF-32), UCS-2 (UTF-16), and UTF-8 formats is provided for all supported codesets. Conversion is also provided between Unicode and a number of single-byte PC code pages, and from those PC code pages to the ISO Latin codesets.
For more information on the Unicode locales, see Unicode(5).
8.7 Configure International Software Utility
The Configure International Software utility allows system administrators to manage country support subsets, Asian terminal drivers, installed font files, the local language settings and input method, user accounts, and the Japanese Input Method (Wnn). Configuration of these WLS options establishes an operating system environment for writing and using internationalized applications. These options also allow system administrators and users to display keyboard mappings.
The Configure International Software utility is a menu-oriented function available from the SysMan Menu under the Software option. You must be root, or have the appropriate system administrator privileges, to use the Configure International Software utility to do the following:
View and delete installed support for selected countries. Nonroot users can only view current country support.
Configure support options, including Asian terminal driver support, Thai language support, pseudo terminal drivers with static or dynamic linking, the number of UNIX Terminal Extension (UTX) devices, and the rebuilding of the kernel after changes. Nonroot users cannot perform this task.
View and delete installed fonts. Nonroot users can only view installed fonts.
View installed keyboard map files and sort the display. Nonroot users can also view and sort installed keyboard map files.
View installed locales (consisting of installed languages, country support, and codesets), sort the display, change the system default locale, switch between dense code and Unicode locales, and select a locale input method. Nonroot users can only view and sort the display of installed locales.
Configure user, root, and system accounts for WLS support. Nonroot users can configure only the account from which they started the Configure International Software utility.
Configure Wnn, a character-cell input method for Japanese. Nonroot users can only view current Wnn settings.
8.8 Support for the Euro Currency Symbol
Tru64 UNIX supports the euro currency symbol. Locales that use the UTF-8 or Latin-9 (ISO 8859-15) codesets support the euro character, while locales with a @euro suffix define the local currency sign to be the euro character.
The locale en_EU.UTF-8@euro is an English locale that supports the euro symbol, the comma as the decimal separator, and the period as the thousands separator.
Printer support for the euro character is provided by the generic PostScript print filter wwpsof.
Keyboard entry of the euro character is supported by key sequences defined in keymaps and through use of the Compose key. Also, codeset converters convert file data between the various encoding formats that support the euro character.
See the euro(5) and wwpsof(8) reference pages for more information.
8.9 The dxim Input Server
The multilingual input server dxim gives you the means to use and manage input methods for Korean, as well as traditional and simplified Chinese.
The dxim input server menu has two functional parts: Customizing Input Method Classes and Methods, and Customizing Input Method Window.
Customizing Input Method Classes and Methods allows you to do the following:
Select a class of input methods that is appropriate to the locale of the client application. For an application internationalized for the Chinese language, you select and activate one or more of the following classes: traditional Chinese, simplified Chinese, or Phrase.
Select and activate one or more input methods within a class.
With the exception of the Phrase input method, the traditional and simplified Chinese classes under dxim support the same set of input methods as dxhanziim and dxhanyuim. The Phrase input method is a separate class under dxim and uses a different database from the one used by the operating system Phrase Utility.
Establish an input method class as the default.
Establish an input method as the default for its class.
Customize the simplified Chinese 5-Shape and Intelligent ABC input method classes.
Customize error bell volume and set the input method invocation key.
The Customizing Input Method Window allows you to do the following:
Increase or decrease the root input window font size.
Set the root input window foreground and background color.
Set the root input window line spacing.
The dxim input server can support multiple clients working under different locales. When a client application connects to dxim, the input server determines the client's locale and, if compatible, uses the default input method. If the client locale is not compatible with the default input method, dxim searches for an active input method that is compatible. The input server uses the first compatible input method it finds.
For additional information on the dxim input server, see dxim(1X) and the dxim online help.
8.10 Internationalized Curses Library
The operating system supplies an internationalized Curses library in conformance with X/Open Curses, Issue 4 Version 2. This library provides functions for processing characters that span one or multiple bytes. These characters may be in either wide-character (wchar_t) or complex-character (cchar_t) format. The complex-character format provides for a single logical character made up of multiple wide characters. Some of the components of the complex character may be nonspacing characters.
For information on the syntax and effect of Curses interfaces, see curses(3).
8.11 Additional Internationalization Features
Tru64 UNIX supports the following internationalization utilities and features:
Base tty terminal driver subsystem
This subsystem includes additional BSD line disciplines and STREAMS terminal driver modules for processing data in Chinese, Japanese, Korean, and Thai. For example, the enhanced terminal subsystem supports the following capabilities for these languages:
Japanese Kana-Kanji conversion input method
Character-based line processing in cooked mode
Input line history and editing (BSD line discipline only)
Software on-demand-loading for user-defined characters
Conversion between terminal code and application code
The asort utility
This utility, an extension of the sort command, allows characters of ideographic languages, such as Chinese and Japanese, to be sorted according to multiple collation sequences. For more information on the asort utility, see asort(1).
Multilingual Emacs editor (MULE) for Asian languages
Mule is a multilingual enhancement to GNU Emacs. It provides a facility to display, input, and edit multilingual characters in addition to all GNU Emacs facilities. See mule(1) for more information.
User-defined characters in Chinese, Japanese, and Korean
Users can create and define character fonts and their attributes, including bitmap fonts, with the cedit and cgen utilities. Font-rendering facilities are available so that X clients can use user-defined character (UDC) databases through the X server or font server to obtain bitmap fonts for user-defined characters. For more information on user-defined characters, see Writing Software for the International Market, cedit(1), and cgen(1).
Printing plain text and PostScript files for various languages
Tru64 UNIX provides outline fonts for high-quality printing on PostScript printers. In addition to print filters for a variety of local-language printers, generic internationalized print filters are available for use with a variety of printers. One of these filters, wwpsof, supports printing of local-language files on PostScript printers that do not include the required fonts. For more information on internationalized printing features, see the i18n_printing(5), pcfof(8), and wwpsof(8) reference pages.
Mail and 8-Bit Character Support
By default, the operating system provides support for 8-bit character encoding in mailx, dtmail, MH, and comsat. See mailx(1), dtmail(1), mh(1), and comsat(8) for more information.
The file command
This command recognizes UCS-2 and UCS-4 encoding in any locale setting.
For other encoding formats, the command recognizes file data encoding if it is valid for the current locale setting. This command also has a jfile alias that, in any locale, can recognize DEC Kanji, Japanese EUC, Shift JIS, and 7-bit JIS encoding.
Internationalization for graphical applications
Motif Version 1.2.3 takes advantage of many of the internationalization features of X11R6 and the C library to support locales. Motif Version 1.2.3 also supports the use of alternate input methods, which allow input of non-ISO Latin-1 keystrokes, and delivers an extensively rewritten XmText widget, which supports multibyte and wide-character formats and the on-the-spot input style.
Motif supports multibyte and wide-character encoding through the use of the internationalized X Library functions and C Library functions. In addition, the compound string routines include the X11R6 XFontSet component to allow for the creation of localized strings.
The User Interface Language (UIL) supports the creation of localized UID files through the UIL compiler's -s compile-time option, which causes the compiler to construct localized strings.
Alternate input methods can be specified by a resource on the VendorShell widget. Widgets that are parented by a Shell class widget can take advantage of this resource and register themselves to a specific method for input.