|
Character set problems are only one part of HTML
internationalization. Almost equally important is identifying the
language of a document. Many aspects of document presentation
depend not only on the character set but also on the language of
the text.
For example, as I've mentioned before, the same ideographs are used
in many Far East languages, but each language renders them with
slightly different glyphs and pronounces them quite differently.
Also, different languages that use the same character set may differ
greatly with respect to hyphenation, spacing, punctuation, and so
on.
To this end, HTML 4.0 introduces the new LANG attribute,
which can be used with most HTML elements to describe the language of
the element contents. A "language" in this context is defined as
"spoken (or written) by human beings for communication of information
to other human beings; computer languages are explicitly excluded."
For example:
<P LANG="fr">Ce paragraphe est en français</P>
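To see how a program might read this attribute, here is a minimal Python sketch built on the standard library's `html.parser` module. The class name `LangCollector` and the sample markup are illustrative assumptions; nothing here is prescribed by HTML 4.0 or reflects how any particular browser works.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Hypothetical helper: records (element, LANG value) pairs as tags open."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases element and attribute names, so LANG and lang
        # both arrive here as "lang"; attribute values keep their case.
        for name, value in attrs:
            if name == "lang" and value:
                self.found.append((tag, value))


collector = LangCollector()
collector.feed('<P LANG="fr">Ce paragraphe est en français</P><P>No language given</P>')
print(collector.found)
```

Only the first paragraph is reported, because the second one carries no LANG attribute.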
The LANG attribute may take as its value a two-letter
abbreviated code (or tag) of the language. These codes
are defined by the ISO 639 standard and should not be confused
with country codes (for example, uk as a language code means
Ukrainian, not United Kingdom).
Also, extended identifiers may be used to designate
different dialects or writing systems of a language, identify the
country in which it is used, and so forth. These extended identifiers
are based on two-letter codes with the addition of subtags
separated by a hyphen (-), for example:
- en-US: English language of the USA (two-letter subtags are always interpreted as country codes)
- no-nynorsk: Nynorsk variant of Norwegian
- az-cyrillic: Azerbaijani language written in Cyrillic script
A registry
of such extended language identifiers is maintained by
IANA.
All LANG values are case insensitive; their complete
syntax is defined by RFC 1766.
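The syntax and the case rule can be sketched in a few lines of Python. This is a rough illustration, assuming the RFC 1766 grammar of a primary tag of one to eight letters followed by hyphen-separated subtags of one to eight letters each, and assuming that a country code, when present, is the first subtag. The function names `parse_lang`, `same_language_tag`, and `country_subtag` are hypothetical helpers invented for this example.

```python
import re

# RFC 1766 grammar: a primary tag of 1-8 letters, followed by any number of
# hyphen-separated subtags of 1-8 letters each.
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def parse_lang(value):
    """Hypothetical helper: split a LANG value into (primary, subtags).

    Returns None when the value does not match the RFC 1766 syntax.
    Everything is lowercased, since LANG values are case insensitive.
    """
    if not _TAG_RE.fullmatch(value):
        return None
    primary, *subtags = value.lower().split("-")
    return primary, subtags


def same_language_tag(a, b):
    """Compare two LANG values without regard to case."""
    pa, pb = parse_lang(a), parse_lang(b)
    return pa is not None and pa == pb


def country_subtag(value):
    """Return the first subtag if it has two letters (assumed to be a country code)."""
    parsed = parse_lang(value)
    if parsed and parsed[1] and len(parsed[1][0]) == 2:
        return parsed[1][0]
    return None
```

With these helpers, `same_language_tag("EN-us", "en-US")` is true, `country_subtag("en-US")` yields `"us"`, and `country_subtag("az-cyrillic")` yields `None`, because `cyrillic` is longer than two letters. The sketch checks syntax only; it does not consult ISO 639 or the IANA registry to confirm that a tag is actually defined.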
Another useful resource is the
document that lists most known languages along with the
character sets they use.
|