PHP Cookbook: Web Automation

Converting ASCII to HTML

Problem

You want to turn plaintext into reasonably formatted HTML.

Solution

First, encode entities with htmlentities( ); then, transform the text into various HTML structures. The pc_ascii2html( ) function shown in Example 11-3 has basic transformations for links and paragraph breaks.

Example 11-3: pc_ascii2html( )

function pc_ascii2html($s) {
  $s = htmlentities($s);
  // Split on blank lines; explode() replaces split(), which was removed in PHP 7
  $grafs = explode("\n\n",$s);
  for ($i = 0, $j = count($grafs); $i < $j; $i++) {
    // Link to what seem to be http or ftp URLs
    $grafs[$i] = preg_replace('/((ht|f)tp:\/\/[^\s&]+)/',
                              '<a href="$1">$1</a>',$grafs[$i]);
 
    // Link to email addresses
    $grafs[$i] = preg_replace('/([^@\s]+@([-a-z0-9]+\.)+[a-z]{2,})/i',
        '<a href="mailto:$1">$1</a>',$grafs[$i]);
 
    // Begin with a new paragraph 
    $grafs[$i] = '<p>'.$grafs[$i].'</p>';
  }
  return join("\n\n",$grafs);
}

Discussion

The more you know about what the ASCII text looks like, the better your HTML conversion can be. For example, if emphasis is indicated with *asterisks* or /slashes/ around words, you can add rules that take care of that, as follows:

$grafs[$i] = preg_replace('/(\A|\s)\*([^*]+)\*(\s|\z)/',
                          '$1<b>$2</b>$3',$grafs[$i]);
$grafs[$i] = preg_replace('{(\A|\s)/([^/]+)/(\s|\z)}',
                          '$1<i>$2</i>$3',$grafs[$i]);
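
As a minimal sketch of how these two rules behave on their own, the snippet below wraps them in a helper and runs them on a sample sentence. The function name add_emphasis( ) and the sample text are assumptions made for illustration; they are not part of the recipe.

```php
<?php
// Hypothetical helper (not from the recipe): applies the two emphasis rules.
function add_emphasis(string $text): string {
    // *word* becomes <b>word</b> when surrounded by whitespace or string boundaries
    $text = preg_replace('/(\A|\s)\*([^*]+)\*(\s|\z)/', '$1<b>$2</b>$3', $text);
    // /word/ becomes <i>word</i> under the same boundary conditions
    $text = preg_replace('{(\A|\s)/([^/]+)/(\s|\z)}', '$1<i>$2</i>$3', $text);
    return $text;
}

echo add_emphasis('This is *important* and /subtle/ text.');
// prints: This is <b>important</b> and <i>subtle</i> text.
```

Because each rule requires whitespace or the end of the string after the closing marker, a marker followed directly by punctuation (such as `*important*.`) is left untouched.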

See Also

Documentation on preg_replace( ) at http://www.php.net/preg-replace.

Converting HTML to ASCII

Problem

You need to convert HTML to readable, formatted ASCII text.

Solution

If you have access to an external program that formats HTML as ASCII, such as lynx, call it like so:

$file = escapeshellarg($file);
$ascii = `lynx -dump $file`;
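
A slightly more defensive sketch of the same call is shown here. It uses shell_exec( ), which is the function form of the backtick operator, and returns null when no output comes back. The helper names build_lynx_command( ) and html_file_to_ascii( ) are hypothetical, and only the -dump option from the solution above is assumed.

```php
<?php
// Hypothetical helper: build the command line, quoting the filename for the shell.
function build_lynx_command(string $file): string {
    return 'lynx -dump ' . escapeshellarg($file);
}

// Hypothetical helper: return lynx's text output, or null if nothing came back
// (for example, because lynx is not installed or could not be run).
function html_file_to_ascii(string $file): ?string {
    $output = shell_exec(build_lynx_command($file));
    return (is_string($output) && $output !== '') ? $output : null;
}
```

Keeping the command construction separate makes it easy to log or inspect the exact command before running it.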

Discussion

If you can't use an external formatter, the pc_html2ascii( ) function shown in Example 11-4 handles a reasonable subset of HTML (no tables or frames, though).

Example 11-4: pc_html2ascii( )

function pc_html2ascii($s) {
  // convert links
  $s = preg_replace('/<a\s+.*?href="?([^\" >]*)"?[^>]*>(.*?)<\/a>/i',
                    '$2 ($1)', $s);
 
  // convert <br>, <hr>, <p>, <div> to line breaks
  $s = preg_replace('@<(b|h)r[^>]*>@i',"\n",$s);
  $s = preg_replace('@<p[^>]*>@i',"\n\n",$s);
  $s = preg_replace('@<div[^>]*>(.*)</div>@i',"\n".'$1'."\n",$s);
  
  // convert bold and italic
  $s = preg_replace('@<b[^>]*>(.*?)</b>@i','*$1*',$s);
  $s = preg_replace('@<i[^>]*>(.*?)</i>@i','/$1/',$s);
 
  // decode named entities
  $s = strtr($s,array_flip(get_html_translation_table(HTML_ENTITIES)));
 
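  // decode numbered entities such as &#169; (uses a callback, since the /e
  // modifier was removed in PHP 7)
  $s = preg_replace_callback('/&#(\d+);/',
                             function ($m) { return chr((int) $m[1]); }, $s);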
  
  // remove any remaining tags
  $s = strip_tags($s);
  
  return $s;
}
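
As an alternative to the two entity-decoding steps in the function, PHP's built-in html_entity_decode( ) can decode named and numbered entities in a single call. This is a sketch of a swapped-in technique, not the recipe's own method; the helper name decode_entities( ) is hypothetical, and UTF-8 output is an assumption about the target encoding.

```php
<?php
// Hypothetical helper: decode named and numeric entities in one pass.
// Assumes the resulting text should be UTF-8.
function decode_entities(string $s): string {
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decode_entities('caf&eacute; &#169; 2003 &amp; &#8364;5');
// prints: café © 2003 & €5
```

Unlike chr( ), which produces a single byte, this approach also handles code points above 255, such as the euro sign in the example.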

See Also

Recipe 9.8 for more on get_html_translation_table( ); documentation on preg_replace( ) at http://www.php.net/preg-replace, get_html_translation_table( ) at http://www.php.net/get-html-translation-table, and strip_tags( ) at http://www.php.net/strip-tags.


Created: March 11, 2003
Revised: March 11, 2003

URL: http://webreference.com/programming/php/chap11/1