No book on HTML is complete without a section on the ways to overcome
the pronounced Western bias in the language and to provide for its
fruitful application in the worldwide multilingual
environment. This chapter covers the main approaches to this
problem, both those used by practicing webmasters all around the
world and those devised by standard-setting bodies.
The primary problem of HTML internationalization (or
i18n, as it is often abbreviated: i, plus the 18 letters in
between, plus n) is the correct rendering of characters used by
other languages. This is why I start by examining the various standards
of character encoding (character sets). These standards are classified
by the length of the bit combinations they use, from 7-bit ASCII to
Unicode and ISO 10646.
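
To make the idea of classifying encodings by bit length concrete, here is a minimal Python sketch. The sample characters and the choice of codecs (Python's `ascii`, `latin-1`, `utf-16-be`, and `utf-32-be`) are my own assumptions for illustration; they stand in for 7-bit, 8-bit, 16-bit, and 32-bit encodings in general.

```python
def bits_per_character(char: str, encoding: str) -> int:
    """Return how many bits the given encoding uses to store one character."""
    return len(char.encode(encoding)) * 8


def can_encode(char: str, encoding: str) -> bool:
    """Report whether the character exists in the encoding's repertoire."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# ASCII is stored in 8-bit bytes, but every value fits in the low 7 bits.
ascii_uses_7_bits = all(byte < 0x80 for byte in "A".encode("ascii"))

# An 8-bit set such as ISO 8859-1 (Latin-1) adds accented Western letters.
latin1_bits = bits_per_character("\u00e9", "latin-1")      # e with acute accent

# 16-bit and 32-bit forms of Unicode / ISO 10646 cover far more scripts.
utf16_bits = bits_per_character("\u0416", "utf-16-be")     # Cyrillic ZHE
utf32_bits = bits_per_character("\u0416", "utf-32-be")

print(ascii_uses_7_bits, latin1_bits, utf16_bits, utf32_bits)
print(can_encode("\u00e9", "ascii"), can_encode("\u0416", "latin-1"))
```

The second line of output shows the Western bias in miniature: the accented letter cannot be expressed in ASCII, and the Cyrillic letter cannot be expressed in Latin-1.
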
Various HTML internationalization issues were first crystallized in
an important document, RFC 2070.
The provisions of RFC 2070 were then incorporated into the DTD for HTML
version 4.0. However, since at the time of this writing no HTML 4.0
specification was available to accompany its DTD, I
discuss, for the most part, the material of RFC 2070, paying
special attention to the cases where it differs from the
declarations of the HTML 4.0 DTD.
In the field of HTML proper, this chapter starts by investigating
the new document character set as defined in RFC 2070 and HTML 4.0.
You will be introduced to the important distinction between the
document character set and external character encoding. You'll learn
about existing methods of specifying external character encoding,
proposed additions to handle multilingual form input, as well as a
number of real-world problems related to HTML character sets.
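
As a rough preview of the distinction between the document character set and external character encoding, consider the following Python sketch. It relies on the standard `html.unescape` function; the choice of the Cyrillic letter ZHE and of KOI8-R and UTF-8 as the two external encodings is my own assumption for illustration. A numeric character reference always names a position in the document character set (ISO 10646), no matter which byte encoding is used to transmit the document.

```python
import html

# "&#1046;" refers to position 1046 (hex 0416) of the document character set,
# which is CYRILLIC CAPITAL LETTER ZHE.
char = html.unescape("&#1046;")
code_position = ord(char)

# The same character can travel over the wire in different external encodings.
koi8_bytes = char.encode("koi8_r")
utf8_bytes = char.encode("utf-8")

# Different byte sequences, same character once each is decoded correctly.
same_character = koi8_bytes.decode("koi8_r") == utf8_bytes.decode("utf-8")

print(hex(code_position), koi8_bytes != utf8_bytes, same_character)
```

The character reference would be written identically in both versions of the document; only the bytes representing directly entered text change with the external encoding.
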
Another big part of the HTML internationalization problem is
language markup, that is, specifying the language of a piece of text
so that user agent software can render it according to the
typographic conventions of that language. Some language-specific
aspects of text presentation are also addressed in RFC 2070, which
introduces tools to control writing direction, cursive joining,
rendering of quotation marks, text alignment, and hyphenation. In
conclusion, I briefly cover the font issues related to HTML
internationalization.