You want to turn plaintext into reasonably formatted HTML.
First, encode entities with htmlentities(); then, transform the text into various HTML structures. The pc_ascii2html() function shown in Example 11-3 has basic transformations for links and paragraph breaks.
Example 11-3: pc_ascii2html( )
function pc_ascii2html($s) {
    $s = htmlentities($s);
    // Break the text into paragraphs at blank lines
    $grafs = explode("\n\n", $s);
    for ($i = 0, $j = count($grafs); $i < $j; $i++) {
        // Link to what seem to be http or ftp URLs
        $grafs[$i] = preg_replace('/((ht|f)tp:\/\/[^\s&]+)/',
                                  '<a href="$1">$1</a>', $grafs[$i]);

        // Link to email addresses ($0 is the whole match, i.e. the full address)
        $grafs[$i] = preg_replace('/[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}/i',
                                  '<a href="mailto:$0">$0</a>', $grafs[$i]);

        // Begin with a new paragraph
        $grafs[$i] = '<p>'.$grafs[$i].'</p>';
    }
    return join("\n\n", $grafs);
}
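Text that originates on other platforms may separate paragraphs with \r\n or \r rather than \n, or with more than one blank line. The following is a minimal sketch of a more tolerant paragraph splitter. The name split_paragraphs() is a hypothetical helper, not part of the recipe, and the sketch assumes that any run of two or more line endings marks a paragraph break.

```php
<?php
// Hypothetical helper (not from the recipe): split text into paragraphs
// on runs of two or more line endings, whatever their style.
function split_paragraphs($s) {
    // (?>...) is an atomic group, so a single "\r\n" counts as one line
    // ending and is never treated as two separate breaks.
    return preg_split('/(?>\r\n|\r|\n){2,}/', trim($s), -1, PREG_SPLIT_NO_EMPTY);
}

print_r(split_paragraphs("First graf.\r\n\r\nSecond graf,\nsame graf.\n\n\nThird graf."));
```

The returned array could stand in for the paragraph-splitting call at the top of pc_ascii2html() if the input is not known to use bare \n line endings.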
The more you know about what the ASCII text looks like, the better your HTML conversion can be. For example, if emphasis is indicated with *asterisks* or /slashes/ around words, you can add rules that take care of that, as follows:
$grafs[$i] = preg_replace('/(\A|\s)\*([^*]+)\*(\s|\z)/',
                          '$1<b>$2</b>$3', $grafs[$i]);
$grafs[$i] = preg_replace('{(\A|\s)/([^/]+)/(\s|\z)}',
                          '$1<i>$2</i>$3', $grafs[$i]);
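To see the two emphasis rules at work outside the loop, here is a small sketch that applies them to a single string. The sample sentence and the $text variable are assumptions made for illustration.

```php
<?php
// Sample input is an assumption made for illustration
$text = 'This is *important* and /subtle/ text.';

// *word* becomes <b>word</b>
$text = preg_replace('/(\A|\s)\*([^*]+)\*(\s|\z)/', '$1<b>$2</b>$3', $text);
// /word/ becomes <i>word</i>
$text = preg_replace('{(\A|\s)/([^/]+)/(\s|\z)}', '$1<i>$2</i>$3', $text);

echo $text, "\n";
```

Because both patterns require whitespace (or the start or end of the string) on each side of the markers, emphasis followed directly by punctuation is left untouched.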
See the documentation on preg_replace() at http://www.php.net/preg-replace.
You need to convert HTML to readable, formatted ASCII text.
If you have access to an external program that formats HTML as ASCII, such as lynx, call it like so:
$file = escapeshellarg($file);
$ascii = `lynx -dump $file`;
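The escapeshellarg() call matters because the filename ends up inside a shell command. As a rough sketch of what it does, the hypothetical helper lynx_dump_command() below only builds the command string, so the quoting can be inspected without lynx installed. It assumes a Unix-like shell; quoting rules differ on Windows.

```php
<?php
// Hypothetical helper: build the lynx command line for a given file.
// Assumes a Unix-like shell; escapeshellarg() quoting differs on Windows.
function lynx_dump_command($file) {
    return 'lynx -dump ' . escapeshellarg($file);
}

// A filename with a space and one with shell metacharacters
echo lynx_dump_command('my page.html'), "\n";
echo lynx_dump_command('x.html; rm -rf /'), "\n";

// To run it (requires lynx to be installed):
// $ascii = shell_exec(lynx_dump_command($file));
```

In both cases the whole filename is wrapped in single quotes, so the shell passes it to lynx as one argument rather than interpreting the space or the semicolon.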
If you can't use an external formatter, the pc_html2ascii() function shown in Example 11-4 handles a reasonable subset of HTML (no tables or frames, though).
Example 11-4: pc_html2ascii( )
function pc_html2ascii($s) {
    // convert links
    $s = preg_replace('/<a\s+.*?href="?([^\" >]*)"?[^>]*>(.*?)<\/a>/i',
                      '$2 ($1)', $s);

    // convert <br>, <hr>, <p>, <div> to line breaks
    $s = preg_replace('@<(b|h)r[^>]*>@i', "\n", $s);
    $s = preg_replace('@<p[^>]*>@i', "\n\n", $s);
    $s = preg_replace('@<div[^>]*>(.*)</div>@i', "\n".'$1'."\n", $s);

    // convert bold and italic
    $s = preg_replace('@<b[^>]*>(.*?)</b>@i', '*$1*', $s);
    $s = preg_replace('@<i[^>]*>(.*?)</i>@i', '/$1/', $s);

    // decode named entities
    $s = strtr($s, array_flip(get_html_translation_table(HTML_ENTITIES)));

    // decode numbered entities such as &#65; with a callback, since the /e
    // modifier is not available in current PHP; chr() covers single-byte values only
    $s = preg_replace_callback('/&#(\d+);/',
                               function ($m) { return chr((int) $m[1]); }, $s);

    // remove any remaining tags
    $s = strip_tags($s);
    return $s;
}
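Current PHP versions also provide html_entity_decode(), which decodes named and numeric entities in a single call and is aware of the target character set. The sketch below shows how the two entity-decoding steps might be handled that way. It is an alternative to the recipe's approach, not the recipe itself; decode_entities() is a hypothetical helper name, and UTF-8 output is assumed.

```php
<?php
// Alternative to the separate named and numbered entity steps.
// Hypothetical helper name; assumes UTF-8 output is wanted.
function decode_entities($s) {
    // ENT_QUOTES also decodes &quot; and &#039;
    return html_entity_decode($s, ENT_QUOTES, 'UTF-8');
}

echo decode_entities('caf&eacute; &#169; 2003 &amp; &#x41;'), "\n";
```

Unlike chr(), this approach is not limited to single-byte values, and it accepts hexadecimal references such as &#x41; as well as decimal ones.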
See Recipe 9.8 for more on get_html_translation_table(); documentation on preg_replace() at http://www.php.net/preg-replace, get_html_translation_table() at http://www.php.net/get-html-translation-table, and strip_tags() at http://www.php.net/strip-tags.
Created: March 11, 2003
Revised: March 11, 2003
URL: http://webreference.com/programming/php/chap11/1