spacer

Webref WebRef   Sitemap · Experts · Tools · Services · Newsletters · About i.com

home / programming / php / cookbook /chap11/ 1 To page 1To page 2To page 3To page 4To page 5To page 6current page
[previous]

PHP Cookbook: Web Automation

Java Software Engineer / Architect Sr (IL)
Next Step Systems
US-IL-Chicago

Justtechjobs.com Post A Job | Post A Resume
Developer News
OpenOffice 3.2 Lands Amid Critical Changes
Red Hat, IBM Firmly in KVM Virtualization Camp
Red Hat Talks Up Open Source Cloud Plans


Converting ASCII to HTML

Problem

You want to turn plaintext into reasonably formatted HTML.

Solution

First, encode entities with htmlentities( ); then, transform the text into various HTML structures. The pc_ascii2html( ) function shown in Example 11-3 has basic transformations for links and paragraph breaks.

Example 11-3: pc_ascii2html( )

function pc_ascii2html($s) {
  $s = htmlentities($s);
  $grafs = split("\n\n",$s);
  for ($i = 0, $j = count($grafs); $i < $j; $i++) {
    // Link to what seem to be http or ftp URLs
    $grafs[$i] = preg_replace('/((ht|f)tp:\/\/[^\s&]+)/',
                              '<a href="$1">$1</a>',$grafs[$i]);
 
    // Link to email addresses
    $grafs[$i] = preg_replace('/[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}/i',
        '<a href="mailto:$1">$1</a>',$grafs[$i]);
 
    // Begin with a new paragraph 
    $grafs[$i] = '<p>'.$grafs[$i].'</p>';
  }
  return join("\n\n",$grafs);
}

Discussion

The more you know about what the ASCII text looks like, the better your HTML conversion can be. For example, if emphasis is indicated with *asterisks* or /slashes/ around words, you can add rules that take care of that, as follows:

$grafs[$i] = preg_replace('/(\A|\s)\*([^*]+)\*(\s|\z)/',
                          '$1<b>$2</b>$3',$grafs[$i]);
$grafs[$i] = preg_replace('{(\A|\s)/([^/]+)/(\s|\z)}',
                          '$1<i>$2</i>$3',$grafs[$i]);

See Also

Documentation on preg_replace( ) at http://www.php.net/preg-replace.

Converting HTML to ASCII

Problem

You need to convert HTML to readable, formatted ASCII text.

Solution

If you have access to an external program that formats HTML as ASCII, such as lynx, call it like so:

$file = escapeshellarg($file);
$ascii = `lynx -dump $file`;

Discussion

If you can't use an external formatter, the pc_html2ascii( ) function shown in Example 11-4 handles a reasonable subset of HTML (no tables or frames, though).

Example 11-4: pc_html2ascii( )

function pc_html2ascii($s) {
  // convert links
  $s = preg_replace('/<a\s+.*?href="?([^\" >]*)"?[^>]*>(.*?)<\/a>/i',
                    '$2 ($1)', $s);
 
  // convert <br>, <hr>, <p>, <div> to line breaks
  $s = preg_replace('@<(b|h)r[^>]*>@i',"\n",$s);
  $s = preg_replace('@<p[^>]*>@i',"\n\n",$s);
  $s = preg_replace('@<div[^>]*>(.*)</div>@i',"\n".'$1'."\n",$s);
  
  // convert bold and italic
  $s = preg_replace('@<b[^>]*>(.*?)</b>@i','*$1*',$s);
  $s = preg_replace('@<i[^>]*>(.*?)</i>@i','/$1/',$s);
 
  // decode named entities
  $s = strtr($s,array_flip(get_html_translation_table(HTML_ENTITIES)));
 
  // decode numbered entities
  $s = preg_replace('//e','chr(\\1)',$s);
  
  // remove any remaining tags
  $s = strip_tags($s);
  
  return $s;
}

See Also

Recipe 9.8 for more on get_html_translation_table(); documentation on preg_replace( ) at http://www.php.net/preg-replace, get_html_translation_table( ) at http://www.php.net/get-html-translation-table, and strip_tags( ) at http://www.php.net/strip-tags.


home / programming / php / cookbook /chap11 1 To page 1To page 2To page 3To page 4To page 5To page 6current page
[previous]


The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers

webref The latest from WebReference.com Browse >
Search Engine Optimization: Selecting and Embedding Keywords · Are Google's Language Translation Web Services Ready for Prime Time? · Installing and Using Meeplace, the Business Review CMS
Sitemap · Experts · Tools · Services · Email a Colleague · Contact FREE Newsletters 
 The latest from internet.com
IBM DB2 10 for z/OS: Justifying the Upgrade · Living La Vida Colo: Choosing the Right Colocation Facility · FTC Concerns over Social Media Privacy Linger

Created: March 11, 2003
Revised: March 11, 2003

URL: http://webreference.com/programming/php/chap11/1