You want to display a web page, for example a search result, with certain words highlighted.
Use preg_replace( ) with an array
of patterns and replacements:
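$patterns = array('/\bdog\b/', '/\bcat\b/');
$replacements = array('<b style="color:black;background-color:#FFFF00">dog</b>',
                      '<b style="color:black;background-color:#FF9900">cat</b>');
while ($page) {
    // the s modifier lets (.*) span newlines, so multi-line pages still match
    if (! preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s',$page,$matches) ||
        $matches[3] === $page) {
        // nothing could be split off (a stray "<", for example):
        // print the remainder as-is instead of looping forever
        print $page;
        break;
    }
    print preg_replace($patterns,$replacements,$matches[1]);
    print $matches[2];
    $page = $matches[3];
}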
The regular expression used with preg_match( ) matches as much text as
possible before an HTML tag, then an HTML tag, and then the rest of the
content. The text before the HTML tag has the highlighting applied to it,
the HTML tag is printed out without any highlighting, and the rest of the
content goes through the same match on the next pass of the loop. This
prevents any highlighting of words that occur inside HTML tags (in URLs or
alt text, for example), which would prevent the page from displaying properly.
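To make the effect concrete, the following sketch compares a plain preg_replace( ) over a whole fragment with the tag-aware loop. The helper name highlight_outside_tags( ) and the sample markup are assumptions made for this illustration; they are not part of the recipe.

```php
<?php
// Hypothetical helper (not from the recipe): the loop above wrapped in a
// function that returns the highlighted text instead of printing it.
function highlight_outside_tags($page, $patterns, $replacements) {
    $out = '';
    while ($page !== '') {
        // text before a tag, then an optional tag, then the rest
        if (! preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s', $page, $m) ||
            $m[3] === $page) {
            $out .= $page;   // nothing could be split off; keep the rest as-is
            break;
        }
        // highlight only the text part; copy the tag through untouched
        $out .= preg_replace($patterns, $replacements, $m[1]) . $m[2];
        $page = $m[3];
    }
    return $out;
}

// Sample markup invented for this example.
$page = '<a href="/dog.html" title="dog">my dog</a>';

// Naive approach: the word is also replaced inside the tag's attributes.
$naive = preg_replace('/\bdog\b/', '<b>dog</b>', $page);

// Tag-aware approach: only the visible text is touched.
$safe = highlight_outside_tags($page, array('/\bdog\b/'), array('<b>dog</b>'));

print $naive . "\n";
print $safe . "\n";
```

The naive result rewrites the href and title attributes and so breaks the link, while the tag-aware result changes only the words "my dog" between the tags.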
The following program retrieves the URL in $url
and highlights the words in the $words array. Words
are not highlighted when they are part of larger words because they are matched
with the \b Perl-compatible regular expression
operator for finding word boundaries.
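As a small illustration of the word-boundary behavior, this sketch counts matches for one word with and without \b. The sample sentence is an assumption chosen for the example.

```php
<?php
// Sample text invented for this illustration.
$text = 'A cat sat near the catalog. Concatenate the strings.';

// Same construction the program uses: quote the word, wrap it in \b ... \b.
$with_boundary    = '/\b' . preg_quote('cat', '/') . '\b/';
$without_boundary = '/' . preg_quote('cat', '/') . '/';

// With \b only the standalone word matches.
$whole_words = preg_match_all($with_boundary, $text);

// Without \b the pattern also matches inside "catalog" and "Concatenate".
$any_position = preg_match_all($without_boundary, $text);

print "$whole_words\n";
print "$any_position\n";
```

Only the standalone "cat" is counted with \b, so "catalog" and "Concatenate" would be left unhighlighted by the program.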
$colors = array('FFFF00','FF9900','FF0000','FF00FF',
                '99FF33','33FFCC','FF99FF','00CC33');

// build search and replace patterns for regex
$patterns = array();
$replacements = array();
for ($i = 0, $j = count($words); $i < $j; $i++) {
    $patterns[$i] = '/\b'.preg_quote($words[$i], '/').'\b/';
    $replacements[$i] = '<b style="color:black;background-color:#' .
                        $colors[$i % 8] .'">' . $words[$i] . '</b>';
}

// retrieve page
$s = '';
// $php_errormsg is not available in current PHP versions, so report the URL
$fh = fopen($url,'r') or die("Can't open $url");
while (! feof($fh)) {
    $s .= fread($fh,4096);
}
fclose($fh);

if ($j) {
    while ($s) {
        if (! preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s',$s,$matches) ||
            $matches[3] === $s) {
            // nothing could be split off (a stray "<", for example):
            // print the remainder as-is instead of looping forever
            print $s;
            break;
        }
        print preg_replace($patterns,$replacements,$matches[1]);
        print $matches[2];
        $s = $matches[3];
    }
} else {
    print $s;
}
See Recipe 13.7 for information on capturing text inside HTML tags;
documentation on preg_match( ) at http://www.php.net/preg-match
and preg_replace( ) at http://www.php.net/preg-replace.
You need to extract the URLs that are specified inside an HTML document.
Use the pc_link_extractor( ) function
shown in Example 11-2.
Example 11-2: pc_link_extractor( )
function pc_link_extractor($s) {
    $a = array();
    if (preg_match_all('/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i',
                       $s,$matches,PREG_SET_ORDER)) {
        foreach($matches as $match) {
            // $match[1] is the link target, $match[2] is the linked text
            array_push($a,array($match[1],$match[2]));
        }
    }
    return $a;
}
For example:
$links = pc_link_extractor($page);
The pc_link_extractor( ) function
returns an array. Each element of that array is itself a two-element array.
The first element is the target of the link, and the second element is the text
that is linked. For example:
$links=<<<END
Click <a href="http://www.oreilly.com">here</a> to visit a computer book
publisher. Click <a href="http://www.sklar.com">over here</a> to visit
a computer book author.
END;

$a = pc_link_extractor($links);
print_r($a);

Array
(
    [0] => Array
        (
            [0] => http://www.oreilly.com
            [1] => here
        )

    [1] => Array
        (
            [0] => http://www.sklar.com
            [1] => over here
        )

)
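Because each element is a (target, text) pair, the result is easy to walk with foreach and list( ). The sketch below repeats the function from Example 11-2 so that it runs on its own; the sample markup and the variable names $target and $text are assumptions for illustration.

```php
<?php
// Copy of pc_link_extractor() from Example 11-2, repeated so this runs alone.
function pc_link_extractor($s) {
    $a = array();
    if (preg_match_all('/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i',
                       $s, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            array_push($a, array($match[1], $match[2]));
        }
    }
    return $a;
}

// Sample markup invented for this example: one double-quoted href and one
// single-quoted href that follows another attribute.
$page = 'Read the <a href="http://www.php.net/preg-match-all">manual</a> '
      . 'or the <a class="ext" href=\'http://www.oreilly.com\'>publisher</a>.';

$found = pc_link_extractor($page);
foreach ($found as $link) {
    // unpack the two-element array into target and linked text
    list($target, $text) = $link;
    print "$text -> $target\n";
}
```

Both quoting styles are picked up, and the class attribute before href in the second link does not get in the way.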
The regular expression in pc_link_extractor( ) won't work on all links,
such as those that are constructed with JavaScript or some hexadecimal
escapes, but it should function on the majority of reasonably
well-formed HTML.
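The sketch below shows what those limitations can look like in practice. Both sample links are invented for illustration, and it assumes that "hexadecimal escapes" covers numeric character references such as &#x68;, which is only one possible reading. Decoding with html_entity_decode( ) is a suggestion here and not something the recipe does.

```php
<?php
// Same pattern as pc_link_extractor() in Example 11-2.
$link_re = '/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i';

// Hypothetical link whose real destination is set by JavaScript.
$js_link  = '<a href="#" onclick="location.href=\'http://www.example.com/\'">go</a>';

// Hypothetical link whose first letter is written as a hexadecimal
// character reference (&#x68; is "h").
$hex_link = '<a href="&#x68;ttp://www.example.com/">go</a>';

preg_match($link_re, $js_link, $js);
print $js[1] . "\n";     // only the placeholder href is captured

preg_match($link_re, $hex_link, $hex);
print $hex[1] . "\n";    // captured as written, still escaped

// One way to recover the readable URL afterward (an addition, not part of
// the recipe):
$decoded = html_entity_decode($hex[1]);
print $decoded . "\n";
```

In the first case the extractor reports "#" and never sees the URL inside onclick; in the second it returns the escaped text unchanged, so any decoding has to happen afterward.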
See Recipe 13.7 for information on capturing text inside HTML tags;
documentation on preg_match_all( ) at http://www.php.net/preg-match-all.
Created: March 27, 2003
Revised: March 27, 2003
URL: http://webreference.com/programming/php/chap11/1