HTML Unleashed: Internationalizing HTML

Language Identification

Character set problems are only one part of HTML internationalization.  Almost equally important is identifying the language of a document.  Many aspects of document presentation depend not only on the character set but also on the language of the text.

For example, as I've mentioned before, the same ideographs are used in many Far East languages, but each language renders them with slightly different glyphs and pronounces them quite differently.  Likewise, languages that share a character set may differ greatly in hyphenation, spacing, use of punctuation, and so on.

To this end, HTML 4.0 introduces the new LANG attribute, which can be used with most HTML elements to describe the language of the element contents.  A "language" in this context is defined as "spoken (or written) by human beings for communication of information to other human beings; computer languages are explicitly excluded." For example:

<P LANG="fr">Ce paragraphe est en français.</P>
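
The ideograph case mentioned earlier can be marked up the same way. The following is a minimal sketch, not taken from the original text: the character 直 is chosen here only as an illustration of an ideograph whose customary glyph differs between Japanese and Chinese typography, and whether a browser actually displays the two paragraphs differently depends on its language support and installed fonts.

```html
<!-- Same character code, different language: a language-aware browser
     may select a different glyph (font) for each paragraph. -->
<P LANG="ja">直</P>
<P LANG="zh">直</P>
```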

The LANG attribute may take as its value a two-letter abbreviated code (or tag) for the language.  These codes are defined by the ISO 639 standard; they should not be confused with country codes (for example, uk as a language code means Ukrainian, not United Kingdom).
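
As an illustration of the difference between language codes and country codes, the sketch below marks a Ukrainian paragraph with uk. The sample sentences are assumptions added for illustration only.

```html
<!-- "uk" is the ISO 639 language code for Ukrainian -->
<P LANG="uk">Цей абзац написано українською мовою.</P>

<!-- English text, wherever it is written, takes the language code "en" -->
<P LANG="en">This paragraph is in English.</P>
```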

Also, extended identifiers may be used to designate different dialects or writing systems of a language, identify the country in which it is used, and so forth.  An extended identifier consists of a two-letter code followed by one or more subtags, each preceded by a hyphen (-), for example:

en-US
English language of the USA (two-letter subtags are always interpreted as country codes)

no-nynorsk
Nynorsk variant of Norwegian

az-cyrillic
Azerbaijani language written in Cyrillic script
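
In markup, these identifiers are used exactly like the two-letter codes. The fragment below is only a sketch; the sample words and sentences are illustrative assumptions, not part of the original text.

```html
<!-- American spelling inside an otherwise unmarked paragraph -->
<P>The word <SPAN LANG="en-US">color</SPAN> is spelled differently in British English.</P>

<!-- Norwegian, Nynorsk variant -->
<P LANG="no-nynorsk">Eg heiter Kari.</P>

<!-- Azerbaijani written in Cyrillic script -->
<P LANG="az-cyrillic">Азәрбајҹан дили</P>
```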

A registry of such extended language identifiers is maintained by IANA.  All LANG values are case insensitive; their complete syntax is defined by RFC 1766.  Another useful resource is the document where most known languages are listed along with the character sets they use.
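
Because LANG values are case insensitive, the spellings in the following sketch should all be treated as the same identifier. The sample word is an assumption chosen for illustration.

```html
<!-- All three paragraphs carry the same language identifier -->
<P LANG="en-US">Elevator</P>
<P LANG="en-us">Elevator</P>
<P LANG="EN-US">Elevator</P>
```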

Produced by Dmitry Kirsanov
Copyright Sams.net Publishing and

Created: Jun. 15, 1997
Revised: Jun. 16, 1997

URL: http://www.webreference.com/dlab/books/html/39-4.html