spacer

Webref WebRef   Sitemap · Experts · Tools · Services · Newsletters · About i.com

home / programming / php / cookbook /chap11 / 1 To page 1To page 2To page 3To page 4To page 5current pageTo page 7
[previous] [next]

PHP Cookbook: Web Automation

Sr. Programmer/Analyst
Professional Technical Resources
US-OR-Portland

Justtechjobs.com Post A Job | Post A Resume
Developer News
Metasploit 3.2 Offers More 'Evil Deeds'
'Thank You Apple. Seriously.'
The Buzz: BlackBerry App Store Seen Next

Marking Up a Web Page

Problem

You want to display a web page, for example a search result, with certain words highlighted.

Solution

Use preg_replace( ) with an array of patterns and replacements:

$patterns = array('\bdog\b/', '\bcat\b');
$replacements = array('<b style="color:black;background-color=#FFFF00">dog</b>',
                      '<b style='color:black;background-color=#FF9900">cat</b>');
while ($page) {
    if (preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}',$page,$matches)) {
        print preg_replace($patterns,$replacements,$matches[1]);
        print $matches[2];
        $page = $matches[3];
    }
}

Discussion

The regular expression used with preg_match( ) matches as much text as possible before an HTML tag, then an HTML tag, and then the rest of the content. The text before the HTML tag has the highlighting applied to it, the HTML tag is printed out without any highlighting, and the rest of the content has the same match applied to it. This prevents any highlighting of words that occur inside HTML tags (in URLs or alt text, for example) which would prevent the page from displaying properly.

The following program retrieves the URL in $url and highlights the words in the $words array. Words are not highlighted when they are part of larger words because they are matched with the \b Perl-compatible regular expression operator for finding word boundaries.

$colors = array('FFFF00','FF9900','FF0000','FF00FF',
                '99FF33','33FFCC','FF99FF','00CC33'); 
 
// build search and replace patterns for regex 
$patterns = array();
$replacements = array();
for ($i = 0, $j = count($words); $i < $j; $i++) {
    $patterns[$i] = '/\b'.preg_quote($words[$i], '/').'\b/';
    $replacements[$i] = '<b style="color:black;background-color:#' .
                         $colors[$i % 8] .'">' . $words[$i] . '</b>';
}
 
// retrieve page 
$fh = fopen($url,'r') or die($php_errormsg);
while (! feof($fh)) {
    $s .= fread($fh,4096);
}
fclose($fh);
 
if ($j) {
    while ($s) {
        if (preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s',$s,$matches)) {
            print preg_replace($patterns,$replacements,$matches[1]);
            print $matches[2];
            $s = $matches[3];
        }
    }
} else {
    print $s;
}

See Also

Recipe 13.7 for information on capturing text inside HTML tags; documentation on preg_match( ) at http://www.php.net/preg-match and preg_replace( ) at http://www.php.net/preg-replace.

Extracting Links from an HTML File

Problem

You need to extract the URLs that are specified inside an HTML document.

Solution

Use the pc_link_extractor( ) function shown in Example 11-2.

Example 11-2: pc_link_extractor( )

function pc_link_extractor($s) {
  $a = array();
  if (preg_match_all('/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i',
                     $s,$matches,PREG_SET_ORDER)) {
    foreach($matches as $match) {
      array_push($a,array($match[1],$match[2]));
    }
  }
  return $a;
}

For example:

$links = pc_link_extractor($page);

Discussion

The pc_link_extractor( ) function returns an array. Each element of that array is itself a two-element array. The first element is the target of the link, and the second element is the text that is linked. For example:

$links=<<<END
Click <a href="http://www.oreilly.com">here</a> to visit a computer book 
publisher. Click <a href="http://www.sklar.com">over here</a> to visit 
a computer book author.
END;
 
$a = pc_link_extractor($links);
print_r($a);
Array
(
    [0] => Array
        (
            [0] => http://www.oreilly.com
            [1] => here
        )
    [1] => Array
        (
            [0] => http://www.sklar.com
            [1] => over here
        )
)

The regular expression in pc_link_extractor( ) won't work on all links, such as those that are constructed with JavaScript or some hexadecimal escapes, but it should function on the majority of reasonably well-formed HTML.

See Also

Recipe 13.7 for information on capturing text inside HTML tags; documentation on preg_match_all( ) at http://www.php.net/preg-match-all.


home / programming / php / cookbook /chap11 / 1 To page 1To page 2To page 3To page 4To page 5current pageTo page 7
[previous] [next]



JupiterOnlineMedia

internet.comearthweb.comDevx.commediabistro.comGraphics.com

Search:

Jupitermedia Corporation has two divisions: Jupiterimages and JupiterOnlineMedia

Jupitermedia Corporate Info


Legal Notices, Licensing, Reprints, & Permissions, Privacy Policy.

Advertise | Newsletters | Tech Jobs | Shopping | E-mail Offers

Solutions
Whitepapers and eBooks
IBM Whitepaper: Innovative Collaboration to Advance Your Business
Internet.com eBook: Real Life Rails
Avaya Article: Call Control XML - Powerful, Standards-Based Call Control
Tripwire Whitepaper: Seven Practical Steps to Mitigate Virtualization Security Risks
Internet.com eBook: The Pros and Cons of Outsourcing
Go Parallel Article: Scalable Parallelism with Intel(R) Threading Building Blocks
Internet.com eBook: Best Practices for Developing a Web Site
IBM CXO Whitepaper: The 2008 Global CEO Study "The Enterprise of the Future"
Avaya Article: Call Control XML in Action - A CCXML Auto Attendant
Go Parallel Article: James Reinders on the Intel Parallel Studio Beta Program
IBM CXO Whitepaper: Unlocking the DNA of the Adaptable Workforce--The Global Human Capital Study 2008
Adobe Acrobat Connect Pro: Web Conferencing and eLearning Whitepapers
Go Parallel Article: Getting Started with TBB on Windows
HP eBook: Storage Networking , Part 1
MORE WHITEPAPERS, EBOOKS, AND ARTICLES
Webcasts
Go Parallel Video: Intel(R) Threading Building Blocks: A New Method for Threading in C++
HP Video: Is Your Data Center Ready for a Real World Disaster?
Microsoft Partner Portal Video: Microsoft Gold Certified Partners Build Successful Practices
HP On Demand Webcast: Virtualization in Action
Go Parallel Video: Performance and Threading Tools for Game Developers
Rackspace Hosting Center: Customer Videos
Intel vPro Developer Virtual Bootcamp
HP Disaster-Proof Solutions eSeminar
HP On Demand Webcast: Discover the Benefits of Virtualization
MORE WEBCASTS, PODCASTS, AND VIDEOS
Downloads and eKits
Microsoft Download: Silverlight 2 Software Development Kit Beta 2
30-Day Trial: SPAMfighter Exchange Module
Red Gate Download: SQL Toolbelt
Iron Speed Designer Application Generator
Microsoft Download: Silverlight 2 Beta 2 Runtime
MORE DOWNLOADS, EKITS, AND FREE TRIALS
Tutorials and Demos
IBM IT Innovation Article: Green Servers Provide a Competitive Advantage
Microsoft Article: Expression Web 2 for PHP Developers--Simplify Your PHP Applications
Featured Algorithm: Intel Threading Building Blocks - parallel_reduce
MORE TUTORIALS, DEMOS AND STEP-BY-STEP GUIDES
webref The latest from WebReference.com Browse >
Popular JavaScript Framework Libraries: An Overview · Controllers: Programming Application Logic - Part 2 · How to Use JavaScript to Validate Form Data
Sitemap · Experts · Tools · Services · Email a Colleague · Contact FREE Newsletters 
 The latest from internet.com
Choosing the Right Online Backup Provider · Mother Avaya Nurtures Her Technology Partners · Software as a Service a Winning Model for Hotspot Provider

Created: March 27, 2003
Revised: March 27, 2003

URL: http://webreference.com/programming/php/chap11/1