Refactoring HTML: Well-Formedness - Part 3

Elliotte Rusty Harold

Convert Text to UTF-8

Reencode all text as Unicode UTF-8.

Motivation

Pages that use any content except basic ASCII have cross-platform display problems. Windows encodings are not interpreted correctly on the Mac and vice versa. Web browsers guess which encoding a page is in, but they often guess wrong.

UTF-8 is a standard encoding that works across all web browsers and is supported by all major text editors and other tools. It is reasonably fast, small, and efficient. It can support all Unicode characters and is a good basis for internationalization and localization of pages.

Potential Trade-offs

You need to be able to control your web server's HTTP response headers to properly implement this. This can be problematic in shared hosting environments. Bad tools do not always recognize UTF-8 when they should.

Mechanics

There are two steps here. First, reencode all content in UTF-8. Second, tell clients that you've done that. Reencoding is straightforward, provided that you know what encoding you're starting with. You have to tell Tidy that you want UTF-8 (for example, by setting its output-encoding option to utf8), but once you do, it will do the work.

TagSoup you don't have to tell. It just produces UTF-8 by default.

A number of command-line tools and other programs will also save content in UTF-8 if you ask, such as GNU recode, BBEdit, and jEdit. You should also set your editor of choice to save in UTF-8 by default.
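
If none of those tools is at hand, the reencoding step can also be scripted. The following is a minimal sketch in Python, not part of the original article; the function name reencode_to_utf8 is hypothetical, and it assumes you already know the source encoding, as the text above requires.

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding):
    """Rewrite the file at `path` as UTF-8, decoding it from `source_encoding`."""
    raw = Path(path).read_bytes()
    # Strict decoding raises UnicodeDecodeError on bytes that are invalid in
    # the source encoding. It cannot catch every wrong guess, because many
    # byte sequences are valid in more than one legacy encoding.
    text = raw.decode(source_encoding)
    # Write the same characters back out as UTF-8 bytes.
    Path(path).write_bytes(text.encode("utf-8"))
    return text
```

Calling reencode_to_utf8("index.html", "cp1252"), for instance, would convert a Windows-1252 page in place; both the file name and the encoding are placeholders. Because the file is overwritten, it is worth running this on a copy or under version control.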

The next step is to tell the browsers that the content is in UTF-8. There are three parts to this.

  1. Add a byte order mark.
  2. Add a meta tag.
  3. Specify the Content-type header.

The byte order mark is Unicode character 0xFEFF, the zero-width no-break space, which UTF-8 encodes as the three bytes 0xEF 0xBB 0xBF. When this is the first character in a document, the browser should recognize the byte sequence and treat the rest of the content as UTF-8. This shouldn't be necessary, but Internet Explorer and some other tools are more reliable if they have it. Some editors add this automatically and some require you to request it.
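
If your editor does not offer the option, the mark can be checked for and added programmatically. This is a small sketch in Python rather than anything the article prescribes; the helper names has_utf8_bom and add_utf8_bom are hypothetical, and the data is assumed to be UTF-8 already.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Report whether the bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Return `data` with a UTF-8 byte order mark prepended, if it lacks one."""
    if has_utf8_bom(data):
        # Leave the data alone so that repeated calls never stack marks.
        return data
    return codecs.BOM_UTF8 + data
```

When reading such a file back in Python, decoding with the "utf-8-sig" codec strips the mark so that it does not appear as a stray character at the start of the text.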

The second step is to add a meta tag in the head, such as this one:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

The charset=UTF-8 part warns browsers that they're dealing with UTF-8 if they haven't figured it out already.
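
When auditing many existing pages, it can help to check mechanically which charset each one declares. The sketch below uses Python's standard html.parser module. The class and function names are hypothetical, and it looks only for the http-equiv form of the declaration discussed here.

```python
from email.message import Message
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the charset declared by the first Content-Type meta tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            # Reuse the standard library's header parsing for the content value.
            header = Message()
            header["Content-Type"] = attributes.get("content") or ""
            self.charset = header.get_content_charset()


def declared_meta_charset(html_text):
    """Return the lowercased charset from the meta tag, or None if absent."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset
```

A page with no such tag yields None, which is a signal that the declaration still needs to be added.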

Finally, you want to configure the web server so that it too specifies that the content is UTF-8. This can be tricky. It requires access to your server's configuration files or the ability to override the configuration locally. This may not be possible on a shared host, but it should be possible on a professionally managed server. On Apache, you can do this by adding the following line to your httpd.conf file or your .htaccess file within the content directory:

AddDefaultCharset utf-8
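
One way to confirm that the server setting took effect is to look at the Content-Type value the server actually sends. The sketch below, in Python, only parses a header value that you have already obtained by whatever means; the function names are hypothetical.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()


def serves_utf8(header_value):
    """Report whether a Content-Type header value declares UTF-8."""
    return charset_from_content_type(header_value) == "utf-8"
```

A value such as text/html with no charset parameter means the server is leaving the browser to guess, which is the situation this step is meant to prevent.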

You really shouldn't have to do all three of these. One should be enough. However, in practice, some tools recognize one of these hints but not the others, and the redundancy doesn't hurt as long as you're consistent.

I do not recommend adding an XML declaration. XML parsers don't need it, and it will confuse some browsers.
