i18n_intro(5)

NAME
  i18n_intro, i18n, LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES,
  LC_MONETARY, LC_NUMERIC, LC_TIME - Introduction to internationalization
  (I18N)

DESCRIPTION
  Internationalization refers to the process of developing programs without
  prior knowledge of the language, cultural data, or character-encoding
  schemes that the programs are expected to handle. In other words,
  internationalization refers to the availability and use of interfaces that
  let programs modify their behavior at run time for operation in a specific
  language environment. The abbreviation I18N is often used for
  internationalization because there are 18 characters between the initial
  "I" and the final "N" of that word.
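
  The arithmetic behind the abbreviation can be checked with a short POSIX
  shell sketch. The function name numeronym is hypothetical and is used here
  only for illustration; it assumes a word of at least two characters.

```shell
# Hypothetical helper: abbreviate a word as first letter, count of the
# letters in between, and last letter.
numeronym() {
    word=$1
    first=${word%"${word#?}"}    # first character of the word
    last=${word#"${word%?}"}     # last character of the word
    printf '%s%d%s\n' "$first" "$(( ${#word} - 2 ))" "$last"
}

numeronym internationalization   # prints i18n
numeronym localization           # prints l10n
```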

  The I18N interfaces and utilities provided in Tru64 UNIX conform to Issue 4
  of X/Open CAE specifications.

  A concept related to internationalization is localization (L10N), which
  refers to the process of establishing information within a computer system
  for each combination of native language, cultural data, and coded character
  set (codeset). A locale is a database that provides information for a
  unique combination of these three components. However, locales do not solve
  all of the problems that localization must address. Many native languages
  require additional support in the form of language-specific print filters,
  fonts, codeset converters, character input methods, and other kinds of
  specialized software.

  For additional introductory information on topics related to
  internationalization, refer to the following reference pages:

  l10n_intro(5)
	  For more information on localization and locales

  iconv_intro(5)
	  For an introduction to codeset conversion

  i18n_printing(5)
	  For a summary of printer support for native languages

  Characters, Character Sets, and Codesets


  A character is a member of a set of elements used for the organization,
  control, or representation of data.

  A character set is a set of alphabetic or other characters used to
  construct the words and other elementary units of a native language or
  computer language. A character set specifies only which characters are
  included in the set. ASCII, CNS 11643, and DTSCS are examples of character
  sets.

  A coded character set (codeset) is a set of unambiguous rules that supports
  one or more character sets and establishes a one-to-one relationship
  between each character and its bit representation. In other words, a
  codeset consists of the code points for characters in one or more character
  sets. For example, DEC Hanyu (dechanyu) is a codeset for Chinese and
  contains code points for characters in the ASCII, CNS 11643-1986 (plane 1
  and plane 2), and DTSCS character sets.
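
  As a small illustration of the relationship between a character and its
  bit representation, the following POSIX shell sketch prints the bytes that
  encode a string. The function name code_bytes is hypothetical, and the
  value shown in the comment assumes an ASCII-compatible codeset.

```shell
# Hypothetical helper: print the bytes of the argument in hexadecimal.
code_bytes() {
    # od writes the bytes as hex; tr strips the whitespace od adds.
    printf '%s' "$1" | od -An -tx1 | tr -d '[:space:]'
    echo
}

code_bytes A    # prints 41, the ASCII code point for "A"
```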

  Language Announcement (Setting Locale)


  Language announcement is the mechanism by which language, cultural data,
  and codeset requirements are set either for the system as a whole or by
  individual users. An application can also set these requirements, although
  it is more common for an internationalized application to use the setting
  in effect for the user who runs the program. Refer to the System
  Administration manual for information about setting systemwide defaults for
  shells. Refer to setlocale(3) and Writing Software for the International
  Market for information on how applications query or set locale requirements
  at run time.

  Language announcement is performed by setting one or more reserved
  environment variables to the name of an installed locale. Each locale has
  associated with it collating sequences, character conversion tables,
  character classification tables, formats for different kinds of data, and
  message catalogs. If the same locale meets user requirements in all these
  categories, set only the LANG environment variable to the locale name. A
  locale name usually has the following format:

  language_territory.codeset[@modifier]

  The following Korn shell example sets LANG to a locale supporting the
  English language, United States cultural data, and ISO8859-1 codeset:

       $ LANG=en_US.ISO8859-1

  The following C shell example sets LANG to a locale supporting the
  Traditional Chinese language, Hong Kong cultural data, and the DEC Hanyu
  codeset:

       % setenv LANG zh_HK.dechanyu
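
  The parts of a locale name can be separated with POSIX shell parameter
  expansion. The following is a minimal sketch that assumes the
  language_territory.codeset[@modifier] format shown above; the function
  name split_locale_name is hypothetical, and the variables it sets are
  global because POSIX sh has no local variables.

```shell
# Hypothetical helper: split language_territory.codeset[@modifier]
# into its components and print them on one line.
split_locale_name() {
    name=$1
    modifier=
    codeset=
    territory=
    case $name in
        *@*) modifier=${name#*@}; name=${name%%@*} ;;
    esac
    case $name in
        *.*) codeset=${name#*.}; name=${name%%.*} ;;
    esac
    case $name in
        *_*) territory=${name#*_}; name=${name%%_*} ;;
    esac
    language=$name
    printf 'language=%s territory=%s codeset=%s modifier=%s\n' \
        "$language" "$territory" "$codeset" "$modifier"
}

split_locale_name en_US.ISO8859-1
split_locale_name zh_HK.dechanyu
```

  Names that do not follow this format, such as C or POSIX, are reported
  with only the language field filled in.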

  Note that locale name formats can vary from vendor to vendor. Use the
  locale -a command to display the names of locales installed on your system.
  Refer to the l10n_intro(5) reference page for a list of the locales
  provided with the Tru64 UNIX product.
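
  Because names vary, a script may want to confirm that a locale is
  installed before announcing it. The following POSIX shell sketch filters a
  list of names such as the one produced by locale -a. The function name
  locale_is_installed is hypothetical, and the sample list passed to it here
  is made up for illustration.

```shell
# Hypothetical helper: succeed if the named locale appears in a list of
# locale names read from standard input, one name per line.
locale_is_installed() {
    grep -Fqx -e "$1"
}

# Typical use on a real system:
#   locale -a | locale_is_installed en_US.ISO8859-1 && LANG=en_US.ISO8859-1
# Demonstration with a made-up list of names:
printf '%s\n' C POSIX en_US.ISO8859-1 |
    locale_is_installed en_US.ISO8859-1 && echo installed
```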

  An alternative way to set locale requirements for all locale categories is
  to set the LC_ALL environment variable. The difference between the LANG and
  LC_ALL variables is that LC_ALL is a high-precedence variable that
  overrides all other locale variables, including LANG. The LANG variable, on
  the other hand, is a low-precedence variable that applies only to
  categories not set by LC_ALL or by a category-specific variable. When used
  by itself, the LANG variable implicitly sets all locale categories to the
  specified locale just as LC_ALL does. However, the LANG variable can be
  used together with variables for specific locale categories to create a
  multilocale environment. The category-specific locale variables and what
  they control follow:

  LC_COLLATE
	  String collation

  LC_CTYPE
	  Character classification

  LC_MESSAGES
	  Translations for messages and valid strings for "yes" and "no"
	  responses

  LC_MONETARY
	  The currency symbol and the format of monetary values

  LC_NUMERIC
	  The format of numeric values

  LC_TIME
	  The format of date and time values

	  A locale can support only one set of date and time formats;
	  however, there can be several sets of date and time formats in use
	  for a particular language and territory. See the l10n_intro(5)
	  reference page for information about creating a site-specific
	  version of a locale to support date and time formats different from
	  those supported by an installed locale.
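
  The precedence among LC_ALL, the category-specific variables, and LANG can
  be sketched as a small POSIX shell function. This is an illustration only:
  the function name resolve_locale is hypothetical, the fallback to the C
  locale when nothing is set is an assumption, and empty values are treated
  as unset.

```shell
# Hypothetical helper: given the values of LC_ALL, one category variable,
# and LANG (in that order), print the locale that takes effect.
resolve_locale() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"      # LC_ALL overrides everything else
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"      # category variable overrides LANG
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"      # LANG is the low-precedence default
    else
        printf 'C\n'            # assumption: fall back to the C locale
    fi
}

# Values are passed as literals so that the sketch does not change the
# caller's environment.
resolve_locale "" "" "en_US.ISO8859-1"                # LANG alone
resolve_locale "" "zh_HK.dechanyu" "en_US.ISO8859-1"  # category wins
resolve_locale "zh_HK.dechanyu" "" "en_US.ISO8859-1"  # LC_ALL wins
```

  To apply the function to the current environment for one category, pass
  "${LC_ALL:-}" "${LC_COLLATE:-}" "${LANG:-}" as the three arguments.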

  Some locale names have one or more @modifier suffixes. A locale with the
  suffix @ucs4 is for use by applications that require internal process code
  to be in UCS-4 format. See Unicode(5) for more information about UCS-4.
  Other @modifier suffixes indicate locale variants that support alternative
  rules for collation in Asian languages. Use locales with these suffixes
  only when setting LC_COLLATE. For example, there are three different sets
  of collation rules (chuyin, radical, and stroke) that can be used with the
  locale supporting the Chinese language, Taiwanese cultural data, and the
  Taiwanese EUC codeset. If Korn shell users want to use this locale, they
  might make the following settings:

       $ LANG=zh_TW.eucTW
       $ LC_COLLATE=zh_TW.eucTW@stroke

  The preceding example implicitly sets all locale category variables to
  zh_TW.eucTW, except for the LC_COLLATE variable, which is set to
  zh_TW.eucTW@stroke. The following locale command displays the variable
  settings after these assignments:

       $ locale
       LANG=zh_TW.eucTW
       LC_COLLATE=zh_TW.eucTW@stroke
       LC_CTYPE="zh_TW.eucTW"
       LC_MONETARY="zh_TW.eucTW"
       LC_NUMERIC="zh_TW.eucTW"
       LC_TIME="zh_TW.eucTW"
       LC_MESSAGES="zh_TW.eucTW"
       LC_ALL=
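
  In the output above, categories whose values are implied by LANG appear in
  double quotation marks, while the explicitly set LC_COLLATE does not. The
  following POSIX shell sketch relies on that convention to list the
  categories that were set explicitly. The function name explicit_categories
  is hypothetical, and the sketch assumes that locale output follows the
  quoting convention shown in this example.

```shell
# Hypothetical helper: read locale output on standard input and print
# the names of LC_* variables whose values are non-empty and unquoted.
explicit_categories() {
    sed -n 's/^\(LC_[A-Z]*\)=[^"].*$/\1/p'
}

# Demonstration with lines taken from the example output:
explicit_categories <<'EOF'
LANG=zh_TW.eucTW
LC_COLLATE=zh_TW.eucTW@stroke
LC_CTYPE="zh_TW.eucTW"
LC_ALL=
EOF
```

  On a live system, the same function can be used as
  locale | explicit_categories.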

SEE ALSO
  Commands: locale(1)

  Functions: setlocale(3)

  Others: i18n_printing(5), iconv_intro(5), l10n_intro(5), Unicode(5)

  Writing Software for the International Market

  System Administration