spacer

Webref WebRef   Sitemap · Experts · Tools · Services · Newsletters · About i.com

home / programming / php / cookbook /chap11/ 1 To page 1To page 2To page 3To page 4To page 5To page 6current page
[previous]

PHP Cookbook: Web Automation

Sr. Web Developer
mediabistro.com
US-NY-New York

Justtechjobs.com Post A Job | Post A Resume
Developer News
Microsoft Shows Off Silverlight 4, IE9 Plans
Metasploit Expands Vulnerability Test Framework
HyperCard Reborn?


Converting ASCII to HTML

Problem

You want to turn plaintext into reasonably formatted HTML.

Solution

First, encode entities with htmlentities( ); then, transform the text into various HTML structures. The pc_ascii2html( ) function shown in Example 11-3 has basic transformations for links and paragraph breaks.

Example 11-3: pc_ascii2html( )

function pc_ascii2html($s) {
  $s = htmlentities($s);
  $grafs = split("\n\n",$s);
  for ($i = 0, $j = count($grafs); $i < $j; $i++) {
    // Link to what seem to be http or ftp URLs
    $grafs[$i] = preg_replace('/((ht|f)tp:\/\/[^\s&]+)/',
                              '<a href="$1">$1</a>',$grafs[$i]);
 
    // Link to email addresses
    $grafs[$i] = preg_replace('/[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}/i',
        '<a href="mailto:$1">$1</a>',$grafs[$i]);
 
    // Begin with a new paragraph 
    $grafs[$i] = '<p>'.$grafs[$i].'</p>';
  }
  return join("\n\n",$grafs);
}

Discussion

The more you know about what the ASCII text looks like, the better your HTML conversion can be. For example, if emphasis is indicated with *asterisks* or /slashes/ around words, you can add rules that take care of that, as follows:

$grafs[$i] = preg_replace('/(\A|\s)\*([^*]+)\*(\s|\z)/',
                          '$1<b>$2</b>$3',$grafs[$i]);
$grafs[$i] = preg_replace('{(\A|\s)/([^/]+)/(\s|\z)}',
                          '$1<i>$2</i>$3',$grafs[$i]);

See Also

Documentation on preg_replace( ) at http://www.php.net/preg-replace.

Converting HTML to ASCII

Problem

You need to convert HTML to readable, formatted ASCII text.

Solution

If you have access to an external program that formats HTML as ASCII, such as lynx, call it like so:

$file = escapeshellarg($file);
$ascii = `lynx -dump $file`;

Discussion

If you can't use an external formatter, the pc_html2ascii( ) function shown in Example 11-4 handles a reasonable subset of HTML (no tables or frames, though).

Example 11-4: pc_html2ascii( )

function pc_html2ascii($s) {
  // convert links
  $s = preg_replace('/<a\s+.*?href="?([^\" >]*)"?[^>]*>(.*?)<\/a>/i',
                    '$2 ($1)', $s);
 
  // convert <br>, <hr>, <p>, <div> to line breaks
  $s = preg_replace('@<(b|h)r[^>]*>@i',"\n",$s);
  $s = preg_replace('@<p[^>]*>@i',"\n\n",$s);
  $s = preg_replace('@<div[^>]*>(.*)</div>@i',"\n".'$1'."\n",$s);
  
  // convert bold and italic
  $s = preg_replace('@<b[^>]*>(.*?)</b>@i','*$1*',$s);
  $s = preg_replace('@<i[^>]*>(.*?)</i>@i','/$1/',$s);
 
  // decode named entities
  $s = strtr($s,array_flip(get_html_translation_table(HTML_ENTITIES)));
 
  // decode numbered entities
  $s = preg_replace('//e','chr(\\1)',$s);
  
  // remove any remaining tags
  $s = strip_tags($s);
  
  return $s;
}

See Also

Recipe 9.8 for more on get_html_translation_table(); documentation on preg_replace( ) at http://www.php.net/preg-replace, get_html_translation_table( ) at http://www.php.net/get-html-translation-table, and strip_tags( ) at http://www.php.net/strip-tags.


home / programming / php / cookbook /chap11 1 To page 1To page 2To page 3To page 4To page 5To page 6current page
[previous]

internet.commediabistro.comJusttechjobs.comGraphics.com

Search:

WebMediaBrands Corporate Info

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | Shopping | E-mail Offers | Freelance Jobs

webref The latest from WebReference.com Browse >
Rolling Out Your Own HTML Application Version Control · HTML 5: Client-side Storage · Working with Ajax Server Extensions
Sitemap · Experts · Tools · Services · Email a Colleague · Contact FREE Newsletters 
 The latest from internet.com
Wi-Fi Product Watch, November 2009 · Chip Market Recovering From '08 Collapse · Low-Cost Tools to Kickstart Your New Business

Created: March 11, 2003
Revised: March 11, 2003

URL: http://webreference.com/programming/php/chap11/1