PHP Cookbook: Web Automation


Marking Up a Web Page

Problem

You want to display a web page, for example a search result, with certain words highlighted.

Solution

Use preg_replace( ) with an array of patterns and replacements:

$patterns = array('/\bdog\b/', '/\bcat\b/');
$replacements = array('<b style="color:black;background-color:#FFFF00">dog</b>',
                      '<b style="color:black;background-color:#FF9900">cat</b>');
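while ($page) {
    // the s modifier lets . match newlines, so multi-line pages still match
    if (preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s',$page,$matches)) {
        print preg_replace($patterns,$replacements,$matches[1]);
        print $matches[2];
        $page = $matches[3];
    } else {
        // no match: print the remainder as-is and stop instead of looping forever
        print $page;
        break;
    }
}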

Discussion

The regular expression used with preg_match( ) matches as much text as possible before an HTML tag, then the HTML tag itself, and then the rest of the content. The text before the tag has the highlighting applied to it, the tag is printed without any highlighting, and the rest of the content goes through the same match on the next pass of the loop. This keeps words that occur inside HTML tags (in URLs or alt text, for example) from being highlighted, which would otherwise break the markup and prevent the page from displaying properly.
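
As a rough sketch of this loop, the hypothetical function below (highlight_outside_tags is a name invented here, not part of the recipe) collects the output in a string instead of printing it. It also stops when a pass consumes nothing, for example at a stray < that never closes; that safeguard is an addition, not something the recipe does. The sample input is made up for illustration.

```php
<?php
// Hypothetical helper: apply $patterns/$replacements only to text outside tags.
function highlight_outside_tags($page, $patterns, $replacements) {
    $out = '';
    while ($page !== '') {
        // text before the next tag, the tag itself, then everything after it
        if (!preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s', $page, $m)) {
            $out .= $page;      // regex failed: keep the remainder untouched
            break;
        }
        if ($m[1] === '' && $m[2] === '') {
            $out .= $page;      // nothing consumed (stray "<"): stop here
            break;
        }
        $out .= preg_replace($patterns, $replacements, $m[1]);  // highlight text
        $out .= $m[2];                                          // copy tag as-is
        $page = $m[3];                                          // continue with rest
    }
    return $out;
}

// Made-up input: "dog" appears in text and inside the tag's attributes.
$sample = 'A dog <img src="dog.png" alt="dog"> and a dog';
$result = highlight_outside_tags($sample, array('/\bdog\b/'), array('<b>dog</b>'));
print $result . "\n";
// prints: A <b>dog</b> <img src="dog.png" alt="dog"> and a <b>dog</b>
```

The two occurrences of dog in the surrounding text are wrapped in <b> tags, while the src and alt attributes of the image tag are left unchanged.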

The following program retrieves the URL in $url and highlights the words in the $words array. Words are not highlighted when they are part of larger words, because each pattern is wrapped in \b, the Perl-compatible regular expression assertion that matches at a word boundary. Matching is case-sensitive, since the patterns carry no i modifier.

$colors = array('FFFF00','FF9900','FF0000','FF00FF',
                '99FF33','33FFCC','FF99FF','00CC33'); 
 
// build search and replace patterns for regex 
$patterns = array();
$replacements = array();
for ($i = 0, $j = count($words); $i < $j; $i++) {
    $patterns[$i] = '/\b'.preg_quote($words[$i], '/').'\b/';
    $replacements[$i] = '<b style="color:black;background-color:#' .
                         $colors[$i % 8] .'">' . $words[$i] . '</b>';
}
 
// retrieve page 
$s = '';  // initialize so the first .= does not act on an undefined variable
$fh = fopen($url,'r') or die("Can't open $url");
while (! feof($fh)) {
    $s .= fread($fh,4096);
}
fclose($fh);
 
if ($j) {
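    while ($s) {
        if (preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s',$s,$matches)) {
            print preg_replace($patterns,$replacements,$matches[1]);
            print $matches[2];
            $s = $matches[3];
        } else {
            // no match: print the remainder as-is and stop instead of looping forever
            print $s;
            break;
        }
    }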
} else {
    print $s;
}
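
The effect of \b and preg_quote( ) in the program can be seen in isolation with a small sketch. The helper name word_pattern and the sample strings are assumptions made up for this illustration; the pattern is built the same way as in the loop that fills $patterns.

```php
<?php
// Hypothetical helper: build a whole-word pattern the way the program does.
function word_pattern($word) {
    // preg_quote() escapes regex metacharacters (and the / delimiter) in $word
    return '/\b' . preg_quote($word, '/') . '\b/';
}

// "dog" is matched as a whole word only, not inside "dogma" or "hotdog".
$marked_words = preg_replace(word_pattern('dog'), '[dog]', 'dog dogma hotdog dog.');
print $marked_words . "\n";   // prints: [dog] dogma hotdog [dog].

// Without preg_quote(), the "." in "1.5" would match any character.
$marked_version = preg_replace(word_pattern('1.5'), '[1.5]', 'PHP 1.5 is not 1x5');
print $marked_version . "\n"; // prints: PHP [1.5] is not 1x5
```

Because \b only matches between a word character and a non-word character, a search word that begins or ends with punctuation may not match as expected with this approach.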

See Also

Recipe 13.7 for information on capturing text inside HTML tags; documentation on preg_match( ) at http://www.php.net/preg-match and preg_replace( ) at http://www.php.net/preg-replace.

Extracting Links from an HTML File

Problem

You need to extract the URLs that are specified inside an HTML document.

Solution

Use the pc_link_extractor( ) function shown in Example 11-2.

Example 11-2: pc_link_extractor( )

function pc_link_extractor($s) {
  $a = array();
  if (preg_match_all('/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i',
                     $s,$matches,PREG_SET_ORDER)) {
    foreach($matches as $match) {
      array_push($a,array($match[1],$match[2]));
    }
  }
  return $a;
}

For example:

$links = pc_link_extractor($page);

Discussion

The pc_link_extractor( ) function returns an array. Each element of that array is itself a two-element array. The first element is the target of the link, and the second element is the text that is linked. For example:

$links=<<<END
Click <a href="http://www.oreilly.com">here</a> to visit a computer book 
publisher. Click <a href="http://www.sklar.com">over here</a> to visit 
a computer book author.
END;
 
$a = pc_link_extractor($links);
print_r($a);

This prints:

Array
(
    [0] => Array
        (
            [0] => http://www.oreilly.com
            [1] => here
        )
    [1] => Array
        (
            [0] => http://www.sklar.com
            [1] => over here
        )
)

The regular expression in pc_link_extractor( ) won't match every link. For example, it misses links that are constructed with JavaScript or that use some hexadecimal escapes. It should, however, work on the majority of reasonably well-formed HTML.
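
One further case the pattern misses, not mentioned above, is link text that spans more than one line: without the s modifier, . does not match a newline. The sketch below reuses the pattern body from pc_link_extractor( ) and compares the original i modifier with is. The input string is made up, and adding s is only one possible adjustment, not part of the recipe.

```php
<?php
// Pattern body copied from pc_link_extractor(); only the modifiers differ below.
$body = '<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>';

// Made-up input: the link text contains a newline.
$html = "<a href=\"http://www.example.com/\">two\nlines</a>";

// Original modifiers: . cannot cross the newline, so nothing matches.
$without_s = preg_match_all('/' . $body . '/i', $html, $m1, PREG_SET_ORDER);

// With the s modifier added, . also matches newlines.
$with_s = preg_match_all('/' . $body . '/is', $html, $m2, PREG_SET_ORDER);

print "without s: $without_s\n";   // prints: without s: 0
print "with s: $with_s\n";         // prints: with s: 1
print $m2[0][1] . "\n";            // prints: http://www.example.com/
```

With s in place, .*? can also run across line breaks elsewhere in the pattern, so this variant may pair up text from separate tags in less tidy HTML.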

See Also

Recipe 13.7 for information on capturing text inside HTML tags; documentation on preg_match_all( ) at http://www.php.net/preg-match-all.



Created: March 27, 2003
Revised: March 27, 2003

URL: http://webreference.com/programming/php/chap11/1