iconv_intro(5)
NAME
iconv_intro, iconv - Introduction to codeset conversion
DESCRIPTION
Conversion of character encoding from one coded character set (codeset) to
another is an operation that often has to be performed by the operating
system and some applications. For example, the man command supports codeset
conversion to allow one set of reference page files to meet the needs of
locales that support the same language and territory but different codesets
(see man(1)).
The following commands and library interfaces give users and application
developers direct access to codeset conversion operations:
· The iconv command converts characters in a data file from one codeset
to another (see iconv(1)).
· The iconv(), iconv_open(), and iconv_close() functions convert a
string of characters from one codeset to another (see iconv(3),
iconv_open(3), and iconv_close(3)). The iconv command uses these
interfaces to convert characters.
There are two types of codeset converters: algorithmic and table.
Algorithmic converters, which reside in the /usr/lib/nls/loc/iconv
directory, are shared libraries with a predefined entry point for
invocation by functions in the libiconv.so library. Algorithmic converters
are needed for the conversion of multibyte codesets, in part because table
converters cannot handle the required number of character values and also
because some of these codesets require complex handling (see NOTES).
Algorithmic converters are supplied as part of the operating system
product; the internal interfaces that they require are not published for
external use.
Table converters, which reside in the /usr/lib/nls/loc/iconvTable
directory, can be created by using the genxlt command (see genxlt(1)).
These converters can support single-byte codesets and up to 256 encoded
character values.
Names of codeset converters are in the following form:
from-codeset_to-codeset
For example, the following converter converts values from Super DEC Kanji
to Japanese Extended UNIX Code:
sdeckanji_eucJP
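The naming convention above can be sketched in a few lines of Python. This is only an illustration: the helper names converter_name and split_converter_name are hypothetical, and the sketch assumes that neither codeset name contains an underscore, which this reference page does not state.

```python
from typing import Tuple


def converter_name(from_codeset: str, to_codeset: str) -> str:
    """Join two codeset names into a from-codeset_to-codeset name."""
    return f"{from_codeset}_{to_codeset}"


def split_converter_name(name: str) -> Tuple[str, str]:
    """Split a converter name at its first underscore.

    Assumption: codeset names themselves contain no underscore.
    """
    from_codeset, sep, to_codeset = name.partition("_")
    if not sep or not from_codeset or not to_codeset:
        raise ValueError(f"not a from-codeset_to-codeset name: {name!r}")
    return from_codeset, to_codeset


print(converter_name("sdeckanji", "eucJP"))
print(split_converter_name("sdeckanji_eucJP"))
```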
The codeset converters produce an invalid character error in response to
characters that cannot be converted from the source codeset to the
destination codeset. This error is always produced for character codes that
are invalid in the source codeset. However, if the error results from
characters that are valid in the source codeset but have no counterparts in
the destination codeset, you can eliminate the error by defining the
ICONV_DEFSTR environment variable to specify a substitute output string.
See the ENVIRONMENT VARIABLES section for more information about using the
ICONV_DEFSTR variable.
It is possible to convert data directly between two codesets or by way of
an intermediate codeset, such as UCS-2, UCS-4, or UTF-8. For conversion of
Chinese characters, be aware that the results of converting a Traditional
Chinese codeset directly to a Simplified Chinese codeset may not be the
same as the results of converting Traditional Chinese first to UCS-2,
UCS-4, or UTF-8 and then to Simplified Chinese.
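The two-step route through an intermediate codeset can be sketched as follows. The sketch uses Python's built-in codecs (big5, gb2312, utf-8) as stand-ins, not the Tru64 UNIX converters, and the function name convert_via_pivot is hypothetical. Python codecs only re-encode the same characters and do not map Traditional forms to Simplified forms, so the example uses characters that exist in both codesets.

```python
def convert_via_pivot(data: bytes, from_codec: str, to_codec: str,
                      pivot: str = "utf-8") -> bytes:
    """Convert data in two hops: source -> pivot, then pivot -> target."""
    # Hop 1: source codeset to the intermediate (pivot) codeset.
    pivot_bytes = data.decode(from_codec).encode(pivot)
    # Hop 2: intermediate codeset to the destination codeset.
    return pivot_bytes.decode(pivot).encode(to_codec)


source = "中文".encode("big5")
result = convert_via_pivot(source, "big5", "gb2312")
print(result == "中文".encode("gb2312"))
```

For characters shared by both codesets the two routes agree. The differences described above arise only where a converter applies its own Traditional-to-Simplified mapping.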
ENVIRONMENT VARIABLES
Some codeset converters require more complex algorithms than can be
provided through tables. The following environment variables provide
control over conversion behavior for different kinds of codeset converters:
ICONV_ACTION
Controls how one-to-many value mappings are handled during conversion
of Traditional Chinese (except for Traditional Chinese encoded in
Telecode) to Simplified Chinese. The valid settings for this
environment variable are as follows:
batch
Specifies that the preferred mapping value (the first one in the
one-to-many mapping list) is always taken. The batch setting is the
ICONV_ACTION default.
conv_all
Specifies that all the possible values are printed to the standard
output, enclosed by braces ({ }), so that the user can later
manually edit the converted file and select the one to use.
conv_all_nosym
Specifies that all the possible values are printed to the standard
output except for punctuation symbols, for which only the preferred
mapping value is printed. As is true for conv_all, the
conv_all_nosym setting prints value choices enclosed by braces so
that the converted file can later be edited.
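The three ICONV_ACTION settings can be modeled roughly as shown below. This is a sketch under assumptions, not the converter's implementation: the function name render_candidates is hypothetical, the candidate list is illustrative and not taken from the Tru64 UNIX mapping tables, and writing the candidates inside the braces without a separator is a guess, since this page specifies only the enclosing braces.

```python
from typing import Sequence

VALID_ACTIONS = ("batch", "conv_all", "conv_all_nosym")


def render_candidates(candidates: Sequence[str], action: str = "batch",
                      is_punctuation: bool = False) -> str:
    """Render the output for one input character with several mappings."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown ICONV_ACTION setting: {action!r}")
    preferred = candidates[0]  # first entry is the preferred mapping
    if action == "batch" or len(candidates) == 1:
        return preferred
    if action == "conv_all_nosym" and is_punctuation:
        return preferred
    # Assumption: candidates are concatenated inside the braces.
    return "{" + "".join(candidates) + "}"


choices = ["干", "乾"]  # illustrative candidates only
print(render_candidates(choices))
print(render_candidates(choices, "conv_all"))
```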
ICONV_BYTEORDER
Sets byte ordering for UCS-2 or UCS-4 converters only. Valid values are
little-endian (the default) or big-endian. Setting this environment
variable may be necessary when producing UCS-2 or UCS-4 output that
will be processed by codeset converters on platforms other than Tru64
UNIX.
ICONV_DEFSTR[_from-codeset_to-codeset]
Defines the default string to be substituted in output for valid input
characters that cannot be converted from the source codeset to the
destination codeset. The variable value can be an arbitrary string or a
code number. If the value is a code number (for example, 10, 07, 0x10,
or, for Unicode converters, U+1234), the corresponding character in the
output codeset (to-codeset) is printed.
For a given type of codeset conversion, a matching
ICONV_DEFSTR_from-codeset_to-codeset variable has precedence over the
ICONV_DEFSTR variable without the from-codeset_to-codeset suffix. When
defining the
variable with the suffix, replace from-codeset_to-codeset with the name
of the codeset converter to which the variable applies. The
ICONV_DEFSTR variable (defined without the suffix) is used by a
converter when no ICONV_DEFSTR_from-codeset_to-codeset variable has
been defined specifically for the type of conversion being done.
If these variables are not defined or are set to the null string, the
characters that cannot be converted are skipped and have no
representation in converted output.
The following converter-specific restrictions apply to ICONV_DEFSTR*
variables:
· ICONV_DEFSTR* environment variables do not work for converters that
convert between Japanese codesets or between Korean codesets.
· For converters that handle UCS-2, UCS-4, or UTF-8 format, the only
valid variable value is a code number (such as U+1234 or 0x10) or a
string whose value is a single ASCII character (such as ?). For these
converters, any string value other than a single ASCII character is
ignored and any characters that cannot be converted have no
representation in output.
· For converters that handle output in UCS-2, UCS-4, or UTF-8 format,
characters that cannot be converted and for which no valid
ICONV_DEFSTR* value has been defined produce an error condition that
aborts the conversion process.
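The precedence between the suffixed and unsuffixed ICONV_DEFSTR variables can be sketched as a lookup. The function name lookup_defstr and the codeset names codesetA and codesetB are hypothetical. The sketch assumes that a suffixed variable that is defined but set to the null string still takes precedence (and therefore means "skip"), which is one reading of the text above. It does not model the converter-specific restrictions or the interpretation of code numbers.

```python
import os
from typing import Mapping, Optional


def lookup_defstr(from_codeset: str, to_codeset: str,
                  env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the substitute string for a conversion, or None to skip."""
    if env is None:
        env = os.environ
    specific = f"ICONV_DEFSTR_{from_codeset}_{to_codeset}"
    if specific in env:
        value = env[specific]      # converter-specific variable wins
    else:
        value = env.get("ICONV_DEFSTR", "")
    # Undefined or null string: unconvertible characters are skipped.
    return value or None


env = {"ICONV_DEFSTR": "?", "ICONV_DEFSTR_codesetA_codesetB": "#"}
print(lookup_defstr("codesetA", "codesetB", env))
print(lookup_defstr("codesetB", "codesetA", env))
```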
ICONV_NOBOM
Disables generation of the byte-order mark at the beginning of UCS-2 or
UCS-4 output. A valid setting is any value other than a null string.
By default, or if this variable is set to a null string, the byte-order
mark is generated at the beginning of UCS-2 or UCS-4 output.
Codeset converters that process UCS-2 or UCS-4 data on platforms other
than Tru64 UNIX usually require the byte-order mark. Therefore, the
current default behavior of Tru64 UNIX codeset converters produces
output that is more likely to be supported as input to codeset
converters on other platforms. Use the ICONV_NOBOM variable only if
you need backward compatibility with output produced by codeset
converters that were included in versions of Tru64 UNIX prior to Tru64
UNIX Version 4.0D.
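The effect of byte ordering (ICONV_BYTEORDER) and of the byte-order mark (ICONV_NOBOM) on UCS-2 output can be illustrated with Python's standard codecs. This is a sketch, not the Tru64 UNIX converter: the function name encode_ucs2 is hypothetical, and it relies on UTF-16 and UCS-2 being byte-identical for characters in the Basic Multilingual Plane.

```python
import codecs


def encode_ucs2(text: str, byteorder: str = "little-endian",
                bom: bool = True) -> bytes:
    """Encode BMP-only text as UCS-2 with an optional byte-order mark."""
    if byteorder == "little-endian":
        codec, mark = "utf-16-le", codecs.BOM_UTF16_LE
    elif byteorder == "big-endian":
        codec, mark = "utf-16-be", codecs.BOM_UTF16_BE
    else:
        raise ValueError(f"unknown byte order: {byteorder!r}")
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters above U+FFFF")
    body = text.encode(codec)
    return mark + body if bom else body


print(encode_ucs2("A").hex())                # default: little-endian, BOM
print(encode_ucs2("A", "big-endian").hex())  # big-endian, BOM
print(encode_ucs2("A", bom=False).hex())     # BOM suppressed
```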
ICONV_PHRCONV
Activates phrase conversion for converters that convert from a
Traditional Chinese codeset (except for Traditional Chinese encoded in
Telecode) to a Simplified Chinese codeset or the reverse. When phrase
conversion is activated, a whole phrase in Traditional Chinese is
converted to a different phrase in Simplified Chinese or the reverse.
If ICONV_PHRCONV is set to mark, the converted phrases are bracketed
by [ and ] to highlight the conversion result for visual checking.
The phrase conversion databases in the /usr/share/phrdb directory are
normal text files with the same file names as those of the algorithmic
converters in /usr/lib/nls/loc/iconv/*. These phrase conversion
databases contain entries for phrase conversion pairs.
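Phrase conversion with the mark setting can be pictured with the sketch below. It is illustrative only: the function name convert_phrases is hypothetical, the phrase pair is an example and is not taken from the databases in /usr/share/phrdb, the on-disk format of those databases is not modeled, and longest-match scanning is an assumption about how competing phrases would be resolved.

```python
from typing import Mapping


def convert_phrases(text: str, pairs: Mapping[str, str],
                    mark: bool = False) -> str:
    """Replace known phrases in one left-to-right pass, longest first."""
    phrases = sorted((p for p in pairs if p), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for phrase in phrases:
            if text.startswith(phrase, i):
                converted = pairs[phrase]
                # With mark set, bracket the result for visual checking.
                out.append(f"[{converted}]" if mark else converted)
                i += len(phrase)
                break
        else:
            out.append(text[i])  # no phrase starts here; copy as-is
            i += 1
    return "".join(out)


pairs = {"軟體": "软件"}  # illustrative phrase pair only
print(convert_phrases("軟體A", pairs, mark=True))
```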
FILES
/usr/lib/nls/loc/iconv/*
Algorithmic converters
/usr/lib/nls/loc/iconvTable/*
Table converters
/usr/share/phrdb/*
Phrase conversion databases
SEE ALSO
Commands: genxlt(1), iconv(1), phrase(1)
Functions: iconv(3), iconv_close(3), iconv_open(3)
Others: i18n_intro(5), l10n_intro(5)