PHP Cookbook: Web Automation

Chapter 11: Web Automation

Introduction

Most of the time, PHP is part of a web server, sending content to browsers. Even when you run it from the command line, it usually performs a task and then prints some output. PHP can also play the role of a web browser, however, retrieving URLs and then operating on the content. Most recipes in this chapter cover retrieving URLs and processing the results, although a few cover other tasks, such as using templates and processing server logs.

There are four ways to retrieve a remote URL in PHP. Choosing one method over another depends on your needs for simplicity, control, and portability. The four methods are to use fopen( ), fsockopen( ), the cURL extension, or the HTTP_Request class from PEAR.

Using fopen( ) is simple and convenient. We discuss it in Fetching a URL with the GET Method. The fopen( ) function automatically follows redirects, so if you use this function to retrieve the directory http://www.example.com/people and the server redirects you to http://www.example.com/people/, you'll get the contents of the directory index page, not a message telling you that the URL has moved. The fopen( ) function also works with both HTTP and FTP. The downsides to fopen( ) include: it can handle only HTTP GET requests (not HEAD or POST), you can't send additional headers or any cookies with the request, and you can retrieve only the response body with it, not response headers.
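As a rough illustration of the fopen( ) approach, here is a minimal sketch. The helper name fetch_with_fopen( ) is hypothetical, and the sketch assumes allow_url_fopen is enabled in php.ini so that fopen( ) accepts URLs. It returns only the body, which matches the limitation described above.

```php
<?php
// Read everything available from a URL (or any path fopen() understands).
// fetch_with_fopen() is a hypothetical helper name used for illustration.
function fetch_with_fopen($url)
{
    // 'r' opens for reading; remote URLs need allow_url_fopen enabled.
    $fh = @fopen($url, 'r');
    if ($fh === false) {
        return false;
    }
    $body = '';
    while (!feof($fh)) {
        $chunk = fread($fh, 8192);
        if ($chunk === false) {
            break;
        }
        $body .= $chunk;
    }
    fclose($fh);
    return $body;
}

// Example call (placeholder URL from the text; needs network access):
// $page = fetch_with_fopen('http://www.example.com/people');
```

The function returns false when the URL cannot be opened, so callers can tell a failed request apart from an empty page.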

Using fsockopen( ) requires more work but gives you more flexibility. We use fsockopen( ) in Fetching a URL with the POST Method. After opening a socket with fsockopen( ), you need to print the appropriate HTTP request to that socket and then read and parse the response. This lets you add headers to the request and gives you access to all the response headers. However, you need to have additional code to properly parse the response and take any appropriate action, such as following a redirect.
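The extra work that fsockopen( ) requires can be sketched roughly as follows. All three function names are hypothetical. The sketch assumes plain HTTP on port 80 and uses HTTP/1.0 so that the server closes the connection when the response is complete. It does not follow redirects, and a header that appears more than once keeps only its last value.

```php
<?php
// Build the text of a minimal HTTP/1.0 GET request.
// $extraHeaders is an associative array such as ['User-Agent' => 'demo'].
function build_get_request($host, $path, array $extraHeaders = [])
{
    $lines = ["GET $path HTTP/1.0", "Host: $host"];
    foreach ($extraHeaders as $name => $value) {
        $lines[] = "$name: $value";
    }
    // A blank line ends the header section.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

// Split a raw response into a status code, a header map, and the body.
function parse_http_response($raw)
{
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $body = isset($parts[1]) ? $parts[1] : '';
    // The status line looks like "HTTP/1.1 301 Moved Permanently".
    $statusParts = explode(' ', array_shift($headerLines), 3);
    $status = isset($statusParts[1]) ? (int) $statusParts[1] : 0;
    $headers = [];
    foreach ($headerLines as $line) {
        if (strpos($line, ':') === false) {
            continue;
        }
        list($name, $value) = explode(':', $line, 2);
        // Header names are case-insensitive, so store them in lowercase.
        $headers[strtolower(trim($name))] = trim($value);
    }
    return ['status' => $status, 'headers' => $headers, 'body' => $body];
}

// Send the request over a socket and return the parsed response, or false.
function fetch_with_fsockopen($host, $path = '/', array $extraHeaders = [])
{
    $sock = @fsockopen($host, 80, $errno, $errstr, 10);
    if ($sock === false) {
        return false;
    }
    fwrite($sock, build_get_request($host, $path, $extraHeaders));
    $raw = stream_get_contents($sock);
    fclose($sock);
    return parse_http_response($raw);
}
```

A caller that wants to follow a redirect would check for a 3xx status and a location header in the result, then issue a second request itself.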

If you have access to the cURL extension or PEAR's HTTP_Request class, you should use those rather than fsockopen( ). cURL supports a number of different protocols (including HTTPS, discussed in Fetching an HTTPS URL) and gives you access to response headers. We use cURL in most of the recipes in this chapter. To use cURL, you must have the cURL library installed, available at http://curl.haxx.se. Also, PHP must be built with the --with-curl configuration option.
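For comparison, a cURL request that returns both headers and body might look like the sketch below. The function name fetch_with_curl( ) and the 10-second timeout are assumptions chosen for illustration, and the code requires the cURL extension to be loaded.

```php
<?php
// Fetch a URL with cURL and return the status, raw headers, and body.
// fetch_with_curl() is a hypothetical helper name used for illustration.
function fetch_with_curl($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, true);         // include response headers in the result
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds

    $raw = curl_exec($ch);
    if ($raw === false) {
        return ['error' => curl_error($ch)];
    }

    // With redirects followed, the header block covers every hop,
    // and CURLINFO_HEADER_SIZE is the combined size of all of them.
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // The handle is released when $ch goes out of scope.
    return [
        'status'  => $status,
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Example call (placeholder URL; needs network access):
// $result = fetch_with_curl('http://www.example.com/people');
```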

PEAR's HTTP_Request class, which we use in Fetching a URL with the POST Method, Fetching a URL with Cookies, and Fetching a URL with Headers, doesn't support HTTPS, but does give you access to headers and can use any HTTP method. If this PEAR module isn't installed on your system, you can download it from http://pear.php.net/get/HTTP_Request. As long as the module's files are in your include_path, you can use it, making it a very portable solution.

Debugging the Raw HTTP Exchange helps you go behind the scenes of an HTTP request to examine the headers in a request and response. If a request you're making from a program isn't giving you the results you're looking for, examining the headers often provides clues as to what's wrong.

Once you've retrieved the contents of a web page into a program, use the recipes from Marking Up a Web Page through Removing HTML and PHP Tags to help you manipulate those page contents. Marking Up a Web Page demonstrates how to mark up certain words in a page with blocks of color. This technique is useful for highlighting search terms, for example. Extracting Links from an HTML File provides a function to find all the links in a page. This is an essential building block for a web spider or a link checker. Converting between plain ASCII and HTML is covered in Converting ASCII to HTML and Converting HTML to ASCII. Removing HTML and PHP Tags shows how to remove all HTML and PHP tags from a web page.
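To give a flavor of this kind of page manipulation, the sketch below pulls links out of an HTML string with a regular expression and removes tags with PHP's built-in strip_tags( ). The extract_links( ) function is a hypothetical stand-in, not the function from the recipe, and it handles only quoted href values.

```php
<?php
// Return the href values of all <a> tags in an HTML string.
// extract_links() is a hypothetical helper name used for illustration.
function extract_links($html)
{
    $links = [];
    // Matches href="..." or href='...' inside an <a ...> tag. Unquoted
    // attribute values and malformed markup are not handled.
    if (preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is', $html, $m)) {
        $links = $m[2];
    }
    return $links;
}

// Sample markup, made up for this example.
$html = '<p>See <a href="http://www.example.com/people/">people</a> and '
      . "<a class='x' href='/about'>about</a>.</p>";

$links = extract_links($html); // ['http://www.example.com/people/', '/about']
$text  = strip_tags($html);    // 'See people and about.'
```

A regular expression is enough for a quick link checker. For messy real-world markup, an HTML parser is usually the more robust choice.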

Another kind of page manipulation is using a templating system. Templates, discussed in Using Smarty Templates, give you the freedom to change the look and feel of your web pages without changing the PHP plumbing that populates the pages with dynamic data. Similarly, you can change the code that drives the pages without affecting the look and feel. Parsing a Web Server Log File discusses a common server administration task: parsing your web server's access log files.
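As a small illustration of log parsing, the sketch below splits one access log line into fields. It assumes the log uses the Common Log Format. The function name parse_log_line( ) and the sample line are made up for this example.

```php
<?php
// Parse one line in Common Log Format into an associative array.
// parse_log_line() is a hypothetical helper name used for illustration.
function parse_log_line($line)
{
    // host ident user [time] "method path protocol" status bytes
    $pattern = '/^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) ?(\S*)" (\d{3}) (\d+|-)/';
    if (!preg_match($pattern, $line, $m)) {
        return false;
    }
    return [
        'host'     => $m[1],
        'user'     => $m[3],
        'time'     => $m[4],
        'method'   => $m[5],
        'path'     => $m[6],
        'protocol' => $m[7],
        'status'   => (int) $m[8],
        // A "-" in the size field means no body was sent.
        'bytes'    => $m[9] === '-' ? 0 : (int) $m[9],
    ];
}

// Sample line, for illustration only.
$line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
      . '"GET /apache_pb.gif HTTP/1.0" 200 2326';
$entry = parse_log_line($line);
// $entry['path'] is '/apache_pb.gif' and $entry['status'] is 200.
```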

Two sample programs use the link extractor from Extracting Links from an HTML File. The program in Program: Finding Stale Links scans the links in a page and reports which are still valid, which have been moved, and which no longer work. The program in Program: Finding Fresh Links reports on the freshness of links. It tells you when a linked-to page was last modified and if it's been moved.
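The decision both programs make for each link can be sketched as a small function. This is an assumption about how such a check might be structured, not the code from either program. The name classify_link( ) is hypothetical, and it expects header names already lowercased.

```php
<?php
// Decide whether a link is valid, moved, or broken from its HTTP status
// code and response headers, and report its last-modified time if known.
// classify_link() is a hypothetical helper name used for illustration.
function classify_link($status, array $headers = [])
{
    if ($status >= 200 && $status < 300) {
        $state = 'valid';
    } elseif ($status >= 300 && $status < 400 && isset($headers['location'])) {
        $state = 'moved';
    } else {
        $state = 'broken';
    }
    return [
        'state'         => $state,
        // New address, when the server reported one.
        'location'      => isset($headers['location']) ? $headers['location'] : null,
        // Unix timestamp parsed from the Last-Modified header, if present.
        'last_modified' => isset($headers['last-modified'])
            ? strtotime($headers['last-modified'])
            : null,
    ];
}

// Sample values, for illustration only.
$moved = classify_link(301, ['location' => 'http://www.example.com/people/']);
// $moved['state'] is 'moved'.
```

Keeping the classification separate from the network code means it can be reused with whichever retrieval method supplies the status and headers.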


Created: March 27, 2003
Revised: March 27, 2003

URL: http://webreference.com/programming/php/chap11/1