Tutorial 17: Shady Characters - HTML with Style | 5 | WebReference

Tutorial 17: Shady Characters

Numeric character references

Numeric character references use a number to refer to a character in the document character set. As we saw earlier, HTML's character set is UCS. There are two ways to use numeric character references. One is to use decimal numbers and the other is to use hexadecimal numbers. Here are some examples:

<P>&#229; or &#xE5; is the Latin letter "a"
with a circle above it.</P>
<P>&#1048; or &#x418; is the Cyrillic letter "I".</P>
<P>&#27700; or &#x6C34; is the Chinese character for water.</P>

As you can see, a numeric character reference starts with an ampersand and a hash mark (&#), continues with either a decimal number or the letter "x" followed by a hexadecimal number, and ends with a semicolon (;).

å or å is the Latin letter "a" with a circle above it.

И or И is the Cyrillic letter "I".

水 or 水 is the Chinese character for water.
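If you'd like to check the numbers yourself, here is a minimal Python sketch that builds both forms of a reference from a character. The function name numeric_references is my own invention for this illustration; ord() and html.unescape() come from Python's standard library.

```python
import html


def numeric_references(char):
    """Return the (decimal, hexadecimal) numeric character references for one character."""
    code_point = ord(char)  # the character's number in UCS
    return f"&#{code_point};", f"&#x{code_point:X};"


for char in "åИ水":
    decimal, hexadecimal = numeric_references(char)
    # Both forms must decode back to the very same character.
    assert html.unescape(decimal) == html.unescape(hexadecimal) == char
    print(char, decimal, hexadecimal)
```

Run on the three characters above, this should print the same decimal and hexadecimal references that appear in the example markup.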

If you don't see what you expected above, don't worry. Numeric character references are nice, but they do have some serious niggles. First of all, if you're using Netscape Navigator or an older version of another browser, you probably won't see the hexadecimal references displayed correctly, because many older browsers don't recognize the hexadecimal syntax. For this reason, you should generally avoid the hexadecimal notation until its use becomes more widespread. It is included in the specification because most character sets, including UCS, list their characters by hexadecimal number, which makes it easy to look up a character and insert it straight into your document. To be on the safe side, though, use a scientific calculator or something similar to convert the number to decimal.
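If you have no scientific calculator at hand, a few lines of Python can do the conversion for a whole document. This is only a sketch: the name hex_to_decimal_references is hypothetical, and the pattern assumes well-formed references that end with a semicolon.

```python
import re

# Matches a hexadecimal reference such as &#xE5; and captures the hex digits.
HEX_REFERENCE = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def hex_to_decimal_references(markup):
    """Rewrite every hexadecimal character reference in markup as a decimal one."""
    return HEX_REFERENCE.sub(lambda m: f"&#{int(m.group(1), 16)};", markup)


print(hex_to_decimal_references("<P>&#xE5;, &#x418; and &#x6C34;</P>"))
# prints: <P>&#229;, &#1048; and &#27700;</P>
```

Decimal references and ordinary text pass through untouched, so it should be safe to run the function over markup that mixes both notations.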

Also, many older browsers do not interpret character references with respect to the document character set, as they should, and instead attempt to interpret them according to the character encoding. This is another source of confusion for people trying to understand the difference between character set and character encoding (or, you might say, a result of many programmers not understanding said distinction; but I digress). This should not be a problem with this document, as it uses the UTF-8 encoding, which covers all of UCS, but it might be a problem if you were using an encoding that did not cover, for instance, the Chinese character for water used in the example above.
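The difference is easier to see with actual numbers. The sketch below contrasts the correct reading of &#229; with the faulty one, assuming (purely for illustration) a page encoded in ISO-8859-7, and then checks which encodings cover the water character. All three function names are made up for this example.

```python
def as_character_reference(number):
    """Correct behaviour: the number is a position in the document character set (UCS)."""
    return chr(number)


def as_encoded_byte(number, encoding):
    """Faulty behaviour: the number is treated as a byte value in the page's encoding.

    Only numbers from 0 to 255 fit in one byte; anything larger raises ValueError.
    """
    return bytes([number]).decode(encoding)


def can_encode(char, encoding):
    """Report whether the given encoding covers the character."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(as_character_reference(229))         # å
print(as_encoded_byte(229, "iso-8859-7"))  # ε
print(can_encode("水", "iso-8859-1"))       # False
print(can_encode("水", "utf-8"))            # True
```

The same number, 229, yields a Latin letter under the correct reading and a Greek one under the faulty reading, which is the kind of mix-up described above.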

Another reason you may not see the above characters correctly is that your browser does not have access to a font that contains glyphs (images) for them. Your browser might understand that the above reference stands for the water character, but be unable to display it. If this is the case, the browser will probably display a question mark or a blank box in place of the character. This is rarely much of a problem: if your audience is expected to be able to read a document containing certain characters, they'll probably have appropriate fonts.

And before you start whining, another reason the above characters don't work for you might be that I speak no languages that use them, so I may have gotten them wrong. In this case, I apologise and welcome criticism on the matter.

In general, numeric character references can cause a lot of trouble with browsers that are not properly internationalized. The best solution is to avoid them and use characters from your selected character encoding instead.
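As a rough sketch of that advice, the following Python function replaces each numeric reference with the character itself whenever the chosen encoding can represent it, and leaves the reference alone otherwise. The name inline_references is hypothetical, and keeping markup characters such as < and & escaped is my own precaution rather than anything the specification demands.

```python
import re

# Matches decimal (&#229;) and hexadecimal (&#xE5;) references.
NUMERIC_REFERENCE = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def inline_references(markup, encoding):
    """Replace numeric references with literal characters the encoding can represent."""

    def replace(match):
        hex_digits, decimal_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(decimal_digits)
        try:
            char = chr(code_point)
            char.encode(encoding)
        except ValueError:
            # Out-of-range number or character not covered by the encoding
            # (UnicodeEncodeError is a subclass of ValueError): keep the reference.
            return match.group(0)
        if char in "<>&\"'":
            return match.group(0)  # keep markup-significant characters escaped
        return char

    return NUMERIC_REFERENCE.sub(replace, markup)


print(inline_references("&#229; &#1048; &#27700;", "iso-8859-1"))
print(inline_references("&#229; &#1048; &#27700;", "utf-8"))
```

With ISO-8859-1 only the first reference should become a literal character, since that encoding covers neither the Cyrillic nor the Chinese one; with UTF-8 all three should.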

URL: http://www.webreference.com/html/tutorial17/4.html

Produced by Stephanos Piperoglou
Created: December 02, 1999
Revised: December 15, 1999