spacer

Webref WebRef   Sitemap · Experts · Tools · Services · Newsletters · About i.com

home / programming / xml / 1 To page 1current pageTo page 3To page 4
[previous][next]

Effective XML: 50 Specific Ways to Improve Your XML

Senior Web Analytics Engineer
Professional Technical Resources
US-OR-Beaverton

Justtechjobs.com Post A Job | Post A Resume
Developer News
Mandrake Linux Founder Back, Virtually
Amazon: We're a Technology Company
Sun Expands MySQL With Closed Source

C0 Control Characters

The first 32 Unicode characters with code points from 0 to 31 are known as the C0 controls. They were originally defined in ASCII to control teletypes and other monospace dumb terminals. Aside from the tab, carriage return, and line feed they have no obvious meaning in text. Since XML is text, it does not include binary characters such as NULL (#x00), BEL (#x07), DC1 (#x11) through DC4 (#x14), and so forth. These noncharacters are historical relics. XML 1.0 does not allow them. This is a good
thing. Although dumb terminals and binary-hostile gateways are far less common today than they were twenty years ago, they are still used, and passing these characters through equipment that expects to see plain text can have nasty consequences, including disabling the screen. (One com-mon problem that still occurs is accidentally paging a binary file on a con-sole. This is generally quite ugly and often disables the console.)

A few of these characters occasionally do appear in non-XML text data. For example, the form feed (#x0C) is sometimes used to indicate a page break. Thus moving data from a non-XML system such as a BLOB or CLOB field in a database into an XML document can unexpectedly cause malformedness errors. Text may need to be cleaned before it can be added to an XML document. However, the far more common problem is that a document’s encoding is misidentified, for example, defaulted as UTF-8 when it’s really UTF-16 or ISO-8859-1. In this case, the parser will notice unexpected nulls and throw a wellformedness error.

XML 1.1 fortunately still does not allow raw binary data in an XML document. However, it does allow you to use character references to escape the C0 controls such as form feed and BEL. The parser will resolve them into the actual characters before reporting the data to the client application. You simply can’t include them directly. For example, the following document uses form feeds to separate pages.

However, this style of page break died out with the line printer. Modern systems use stylesheets or explicit markup to indicate page boundaries. For example, you might place each separate page inside a page element or add a pagebreak element where you wanted the break to occur, as shown below.

Better yet, you might not change the markup at all, just write a stylesheet that assigns each rhyme to a separate page. Any of these options would be superior to using form feeds. Most uses of the other C0 controls are equally obsolete.

There is one exception. You still cannot embed a null in an XML document, not even with a character reference. Allowing this would have caused massive problems for C, C++, and other languages that use null-terminated strings. The null is still forbidden, even with character escaping, which means it’s still not possible to directly embed binary data in XML. You have to encode it using Base64 or some similar format first. (See Item 19.)

home / programming / xml / 1 To page 1current pageTo page 3To page 4
[previous][next]

internet.comearthweb.comDevx.commediabistro.comGraphics.com

Search:

Jupitermedia Corporation has two divisions: Jupiterimages and JupiterOnlineMedia

Jupitermedia Corporate Info

Legal Notices, Licensing, Reprints, Permissions, Privacy Policy.
Advertise | Newsletters | Tech Jobs | Shopping | E-mail Offers

webref The latest from WebReference.com Browse >
Working with the DOM Stylesheets Collection · Administering RBAC in PHP 5 CMS Framework · xref: Automatic Cross Referencing Script
Sitemap · Experts · Tools · Services · Email a Colleague · Contact FREE Newsletters 
 The latest from internet.com
Combine BottomCount() with Other MDX Functions to Add Sophistication · Creating a Daemon with Python · The Coming Voice-over-WiMAX Revolution

Created: March 27, 2003
Revised: October 25, 2003

URL: http://webreference.com/programming/xml/1