Tutorial 17: Shady Characters - HTML with Style | 4 | WebReference

Tutorial 17: Shady Characters

Specifying the character encoding

The best way to specify the character encoding is not with HTML but with HTTP, the HyperText Transfer Protocol. HTTP has a pretty impressive way of negotiating encodings: a browser sends the server a list of the encodings it understands, and the server attempts to send the document in one of them. This means, specifically, that you need to take care to configure your Web server properly. That is entirely beyond the scope of this tutorial, which concerns itself with HTML, but if you are responsible for configuring your Web server, make sure it knows what encodings your documents are in. Most advanced Web servers can even change the encoding of documents on the fly to match a browser's requirements, but they must know what encoding a document uses to begin with before they can translate it into a different one.

This is fine when you actually have control of your Web server's configuration, but these days, a lot of people don't. If your Web server cannot be configured to negotiate encodings properly, there is a hackish little method you can use to declare the encoding of your document yourself, using the META element. Put simply, you can use something like the following in your document head to specify your document's encoding.

<META HTTP-EQUIV="Content-type" 
      CONTENT="text/html;charset=iso-8859-7">

As we have seen before, the HTTP-EQUIV attribute of META is used to attach HTTP headers to an HTML document. In this case, we are setting the Content-type header. Normally, this should be set appropriately by the Web server, but if it is not, the META element can be used instead.

This does, of course, create a problem: the Web browser is already reading the document before it reaches the META element. So all of the document up to and including the META element should be in an encoding that is (a) already understood by the browser and (b) compatible with the encoding specified in the META element. That can be tricky. In this case, I've specified the ISO-8859-7 encoding, which is the ISO standard encoding for Latin and Greek. Happily, this encoding is identical to ISO-8859-1, the standard encoding for Latin text, for all non-accented Latin characters and punctuation. So as long as I don't actually use any Greek characters before the META element, I'm safe. This means that the following will create problems:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
 "http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML LANG="el">
 <HEAD>
  <TITLE>Ισολογισμός Τελευταίου Έτους</TITLE>
  <META HTTP-EQUIV="Content-type" 
      CONTENT="text/html;charset=iso-8859-7">

Here, I'm using Greek characters in the TITLE element's content, but the browser does not recognize them because it hasn't yet been told the encoding (assuming the Web server serving this document didn't select the encoding correctly in the first place). The correct way to do this, then, would be the following:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
 "http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML LANG="el">
 <HEAD>
  <META HTTP-EQUIV="Content-type" 
      CONTENT="text/html;charset=iso-8859-7">
  <TITLE>Ισολογισμός Τελευταίου Έτους</TITLE>

In any case, it's best if you can set up your Web server to correctly understand and negotiate encodings instead of trying to set them in HTML. It's still a good idea to set them in HTML anyway, in case your document has to be viewed out of context. Just remember that the method above is a hack; properly setting your encoding on the server side is the only way to be sure that the message gets across.

Character references

Character references are another thing HTML inherited from its rich daddy SGML. Character references allow authors (that's you) to refer to characters either by their number in the document character set or by a symbolic name. These two types of character references are called numeric character references and character entity references respectively.
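
As a quick sketch of the two forms (the particular characters are my own picks for illustration, not taken from anywhere above), each paragraph in the following fragment writes one character in more than one way, first with a numeric character reference and then with a character entity reference:

```html
<!-- Copyright sign: number 169, symbolic name "copy" -->
<P>&#169; and &copy;</P>
<!-- Latin small letter e with acute accent: number 233, name "eacute" -->
<P>&#233; and &eacute;</P>
<!-- Greek capital letter omega: number 937 (hexadecimal 3A9), name "Omega" -->
<P>&#937;, &#x3A9; and &Omega;</P>
```

This assumes an HTML 4.0 browser, where the document character set is ISO 10646 (Unicode) whatever encoding the file itself is saved in, so these references should display the same way under ISO-8859-1 and ISO-8859-7 alike.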


URL: http://www.webreference.com/html/tutorial17/3.html

Produced by Stephanos Piperoglou
Created: December 02, 1999
Revised: December 15, 1999