Convert " to &quot; or ' to &apos; in attribute values.
A quotation mark that appears inside an attribute value delimited with the same style of quotation mark prematurely closes the value. Different browsers deal differently with this situation, but the result is almost never anything you want. Even if you aren't transitioning to full XHTML, this refactoring is an important fix.
There are no trade-offs. This change can only improve your web pages.
Because this is a real bug that does cause problems on pages, it's unlikely to show up in a lot of significant places. You can usually fix all the occurrences by hand fairly easily.
Because the legality or illegality of any one quote mark depends on others, it's not easy to check for this problem using regular expressions. However, well-formedness testing will find this problem. Indeed, you may need to fix this one before fixing other, lesser problems because it's likely to hide other errors.
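As a rough illustration of well-formedness testing, the following Python sketch hands a fragment to the standard library's XML parser and reports the error, if any. The function name and the sample markup are made up for this example, and the check only works on markup that is otherwise XML-like; it is not a substitute for a real validator.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(markup):
    """Return None if the markup parses as XML, else the parser's error message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        return str(error)
    return None

# A clean attribute value parses without complaint.
print(well_formedness_error('<p title="ok">x</p>'))

# A double quote inside a double-quoted value ends the value early,
# so the parser stops at the stray text that follows it.
print(well_formedness_error('<img alt="The "best" photo" src="a.png"/>'))
```

The error message includes a line and column, which is usually enough to locate the offending quote by hand.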
As with < and &, this problem is most often caused by blindly copying data from a database or other external source without first scanning it for reserved characters. Be sure to clean the data using a function such as PHP's htmlspecialchars, called with the ENT_QUOTES flag so that it converts apostrophes as well as double quotation marks, to turn them into the equivalent entity references before inserting the data into attribute values.
Contrary to popular belief, you do not need to escape all quotation marks, only those inside attribute values. You can escape quote marks in plain text if you want to, but this is superfluous. I usually don't bother. Even inside attribute values, you only need to escape the kind of quote that delimits the attribute value. Because different authors, editors, and tools differ in whether they prefer single or double quote marks, I usually escape both to be safe.
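The "escape both to be safe" approach might look like the following in Python. This is a minimal sketch that uses the standard library's html.escape; the attribute helper and the sample data are invented for illustration. Note that html.escape writes the apostrophe as the numeric reference &#x27; rather than &apos;, which has the same effect.

```python
from html import escape

def attribute(name, value):
    """Build name="value" with &, <, >, and both kinds of quote escaped."""
    # quote=True escapes " and ' in addition to &, <, and >.
    return '{}="{}"'.format(name, escape(value, quote=True))

# Data as it might arrive from a database, containing both kinds of quote.
caption = 'Tom "Big" O\'Neil & Co'
print("<img src=\"tom.png\" " + attribute("alt", caption) + " />")
```

Because both kinds of quote are escaped, the result stays correct even if a later tool rewrites the attribute with single-quote delimiters.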
Tidy and TagSoup cannot reliably fix quotation marks inside attribute values. For example, Tidy turned this:
into this:
You shouldn't encounter a lot of these problems, though, so it's best to fix them by hand once a validator points them out.
Insert an XHTML DOCTYPE declaration at the start of each document.
The DOCTYPE declaration points to the DTD that is used to resolve entity references. Without it, the only entity references you can use are &amp;, &lt;, &gt;, &apos;, and &quot;. Once you've added it, though, you can use the full set of HTML entity references: &copy;, &nbsp;, &eacute;, and so forth.
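The no-DTD case is easy to observe with an XML parser. The Python sketch below is only an illustration, with a helper name chosen for this example. It parses one fragment that uses the five predefined references and one that uses &copy;. Python's built-in parser does not fetch external DTDs, so it demonstrates only what happens when no DTD is available.

```python
import xml.etree.ElementTree as ET

def parsed_text(markup):
    """Return the root element's text, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None

# The five predefined XML entity references resolve without any DTD.
print(parsed_text("<p>&amp; &lt; &gt; &apos; &quot;</p>"))

# An HTML-only reference such as &copy; is rejected when nothing declares it.
print(parsed_text("<p>&copy; 2008</p>"))
```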
The DOCTYPE declaration will also be important in the next chapter when we begin to make documents valid, not merely well-formed.
Adding an XHTML DOCTYPE declaration has the side effect of turning off quirks mode in many browsers. This can affect how a browser renders a document. In general, this is a good thing, because nonquirks mode is much more interoperable. However, if you have old stylesheets that depend on quirks mode for proper appearance, adding a DOCTYPE may break them. You might have to update them to be standards conformant first. This is especially true for stylesheets that do very precise layout calculations.
You can use three possible DTDs for XHTML: frameset, transitional, and strict.
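These are indicated by one of the following three DOCTYPE declarations:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">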
In the short run, it doesn't matter which you pick. In the long run, you'll probably want to migrate your documents to the strict DTD, but for now you can use the frameset DTD on any pages that contain frames and the transitional DTD for other documents.
Browsers look at the public identifier to determine what flavor of HTML they're dealing with. However, they will not actually load the DTD from the specified URL. In essence, they already know what's there and don't need to load it every time.
Other, non-HTML-specific tools such as XSLT processors may indeed load the DTD. In this case, you may wish to replace the remote URLs with local copies. For example:
As long as the public identifiers are the same, the browsers will still recognize these.
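If many files need to point at local copies, the swap can be scripted. The following Python sketch rewrites only the system identifier and leaves the public identifier untouched. The /dtds directory and the function name are hypothetical, and the pattern assumes the standard W3C URLs for the XHTML 1.0 DTDs.

```python
import re

# Captures the public identifier and the DTD file name of an XHTML 1.0 DOCTYPE
# whose system identifier is the usual W3C URL.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"(?P<public>[^"]+)"\s+'
    r'"http://www\.w3\.org/TR/xhtml1/DTD/(?P<file>[^"/]+\.dtd)"\s*>'
)

def localize_doctype(markup, local_dir="/dtds"):
    """Point the system identifier at a local DTD copy; keep the public identifier."""
    def swap(match):
        return '<!DOCTYPE html PUBLIC "{}" "{}/{}">'.format(
            match.group("public"), local_dir.rstrip("/"), match.group("file")
        )
    # Only the first declaration is rewritten; a document has at most one.
    return DOCTYPE_RE.sub(swap, markup, count=1)

page = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html></html>'
)
print(localize_doctype(page))
```

The XHTML 1.0 DTDs refer to their entity files by relative name, so the local directory needs copies of those files as well as the DTDs themselves.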
Some documents on a site may already have DOCTYPE declarations, either XHTML or otherwise. Many tools have added these by default over the years, even though browsers never paid much attention to them. Thus, the first step is to find out what you've already got. Do a multifile search for <!DOCTYPE. Unless you're writing HTML or XML tutorials, any hits you get are almost certain to be preexisting DOCTYPE declarations. In most cases, though, they will not be the right one. Usually, there are only a few variants, so you can do a constant string multifile search and replace to upgrade to the newer XHTML DOCTYPE. Any that don't fit the pattern can be fixed by hand.
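The survey step can also be scripted when an editor's multifile search is not at hand. This Python sketch counts each distinct DOCTYPE declaration under a directory; the function names and the *.html file pattern are assumptions made for the example. It treats everything up to the first > as the declaration, so a DOCTYPE with an internal subset would be cut short.

```python
import re
from collections import Counter
from pathlib import Path

DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)

def find_doctype(markup):
    """Return the first DOCTYPE declaration with whitespace collapsed, or None."""
    match = DOCTYPE_RE.search(markup)
    if match is None:
        return None
    # Collapse line breaks and runs of spaces so equivalent variants compare equal.
    return " ".join(match.group(0).split())

def survey(root, pattern="*.html"):
    """Count each distinct DOCTYPE under root; the None key counts files without one."""
    counts = Counter()
    for path in Path(root).rglob(pattern):
        text = path.read_text(encoding="utf-8", errors="replace")
        counts[find_doctype(text)] += 1
    return counts
```

Calling survey on the site's document root and printing the result of most_common() shows how many variants exist, and so how many constant-string replacements the upgrade will take.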
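Documents that don't have a DOCTYPE are also easy to fix. The DOCTYPE always goes immediately before the <html> start-tag. Thus, all you have to do is search for the regular expression <html\b, which matches the start of the tag whether or not attributes follow, and replace it with the appropriate DOCTYPE declaration followed by <html.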
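A scripted version of this search and replace might look like the Python sketch below. It is an illustration rather than the book's own procedure: the function name is invented, the pattern <html\b uses a word boundary so that both <html> and <html lang="en"> match, and the choice between the frameset and transitional DTDs follows the short-run advice given earlier.

```python
import re

TRANSITIONAL = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
)
FRAMESET = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">'
)

HTML_START = re.compile(r"<html\b", re.IGNORECASE)
HAS_DOCTYPE = re.compile(r"<!DOCTYPE\b", re.IGNORECASE)
HAS_FRAMESET = re.compile(r"<frameset\b", re.IGNORECASE)

def add_doctype(markup):
    """Insert an XHTML DOCTYPE before the <html> start-tag if none is present."""
    if HAS_DOCTYPE.search(markup):
        # Existing declarations are left for the separate upgrade pass.
        return markup
    doctype = FRAMESET if HAS_FRAMESET.search(markup) else TRANSITIONAL
    # A function replacement keeps the DOCTYPE text from being read as a template.
    return HTML_START.sub(lambda m: doctype + "\n" + m.group(0), markup, count=1)

print(add_doctype('<html lang="en"><head><title>t</title></head><body/></html>'))
```

Documents that have no <html> start-tag at all pass through unchanged and still need attention by hand.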
You should also take this opportunity to configure your authoring tools to specify the XHTML DOCTYPE by default. Often it's a simple checkbox in a preference pane somewhere.
This chapter is an excerpt from the book, Refactoring HTML: Improving the Design of Existing Web Applications by Elliotte Rusty Harold, published by Addison-Wesley Professional, May 2008, ISBN 0321503635, Copyright 2008 Pearson Education, Inc.