PHP Anthology, Volume 2: Applications. Chapter 5: Caching

Chapter 5. Caching

Table of Contents
How do I prevent Web browsers caching a page?
How do I capture server side output for caching?
Using Output Buffering for Server Side Caching
Chunked Buffering
How do I implement a simple server side caching system?
Cache_Lite Options
Purging the Cache
Caching Function Calls
How do I control client side caching with PHP?
Page Expiry
Page Modification Time
Further Reading
In the good old days, back when building Websites was as easy as knocking up a few HTML pages, the delivery of a Web page to a browser was a simple matter of having the Web server fetch a file. A site’s visitors would see its small, text-only pages almost immediately, unless they were using particularly slow modems. Once the page was downloaded, the browser would cache it somewhere on the local computer so that, should the page be requested again, after performing a quick check with the server to ensure the page hadn’t been updated, the browser could display the locally cached version. Pages were served as quickly and efficiently as possible, and everyone was happy (except those using 9600 bps modems).

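The advent of dynamic Web pages spoiled the party, effectively “breaking” this model of serving pages by introducing two problems:

  1. The server has to do some work, such as executing a PHP script, before it can begin sending a page, so every request is delayed by that processing.

  2. The Web server can no longer tell the browser reliably whether a page has changed since it was last fetched, so the browser can’t make good use of its local cache.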

It’s usually possible to live with both problems in a small PHP application, but as the complexity of your site and the traffic to it increase, you may run into difficulties. Both issues can be solved, however: the first with server side caching, and the second by taking control of client side caching from within your application. The exact approach you use will depend on your application, but in this chapter we’ll see how you can solve both using PHP and a number of class libraries from PEAR.

Note that in this chapter’s discussions of caching, we’ll look at only those solutions implemented in PHP. These should not be confused with the script caching solutions that work by optimizing and caching compiled PHP scripts. Included in this group are the Zend Accelerator, ionCube PHP Accelerator, and Turck MMCache, the latter being the only accelerator that, at the time of writing, is ready for use with Windows-based PHP installations.

Before we look at the approaches you can take to client and server side caching, the first thing we need to understand is how to prevent Web browsers (and proxy servers) from caching pages in the first place. The most basic approach to doing this utilizes HTML meta tags:
<meta http-equiv="Expires" content="Mon, 26 Jul 1997 05:00:00 GMT" />
<meta http-equiv="Pragma" content="no-cache" />

By inserting a past date into the Expires meta tag, we can tell the browser that the cached copy of the page is always out of date. This means the browser should never cache the page. The Pragma: no-cache meta tag is a fairly well-supported convention that most Web browsers follow. Upon encountering this tag, they usually won’t cache the page (although there’s no guarantee; this is just a convention).

It sounds good, but there are two problems associated with the use of meta tags:

  1. If a tag wasn’t present when the page was first requested by a browser, but appears later (for example, you modified the included pageheader.php file, which contains the top of every Web page), the browser will remain blissfully ignorant and keep its cached copy of the original.

  2. Proxy servers that cache Web pages, such as those common to ISPs, generally will not examine the HTML documents themselves. Instead, they rely purely on the HTTP headers sent by the Web server from which the documents came. In other words, a Web browser might know that it shouldn’t cache the page, but the proxy server between the browser and your Web server probably doesn’t, so it will continue to deliver the same out-of-date page to the client.

A better approach is to use the HTTP protocol itself, with the help of PHP’s header function, to produce the equivalent of the two meta tags above:

<?php
header('Expires: Mon, 26 Jul 1997 05:00:00 GMT');
header('Pragma: no-cache');
?>

We can go one step further, using the Cache-Control header that’s supported by HTTP 1.1 capable browsers:

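<?php
// A date in the past marks any cached copy as already expired
header('Expires: Mon, 26 Jul 1997 05:00:00 GMT');
// HTTP 1.1 directives for browsers and proxies
header('Cache-Control: no-store, no-cache, must-revalidate');
// The second argument (false) adds this header alongside the
// previous Cache-Control header instead of replacing it
header('Cache-Control: post-check=0, pre-check=0', false);
// HTTP 1.0 fallback
header('Pragma: no-cache');
?>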

This makes it very unlikely that any Web browser or intervening proxy server will cache the page, so visitors should always receive the latest content. In fact, the Expires header should accomplish this on its own; it is the best way to ensure a page is not cached. The Cache-Control and Pragma headers are added for “insurance” purposes. Though they aren’t honored by all browsers or proxies, they will catch some cases in which the Expires header doesn’t work as intended (e.g. if the client computer’s date is set incorrectly).
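If several scripts need the same headers, it may help to collect them in one place. The following is only a sketch: the function names noCacheHeaders and sendNoCacheHeaders are made up for this example and are not part of PHP or PEAR. It also checks headers_sent() first, since header() can only be used before any output has been sent to the browser.

```php
<?php
// Hypothetical helper (not part of PHP or PEAR): returns the header
// lines from the listing above so they can be inspected or sent together.
function noCacheHeaders()
{
    return array(
        'Expires: Mon, 26 Jul 1997 05:00:00 GMT',
        'Cache-Control: no-store, no-cache, must-revalidate',
        'Cache-Control: post-check=0, pre-check=0',
        'Pragma: no-cache',
    );
}

// Hypothetical helper: sends the headers, or returns false if output
// has already started and headers can no longer be sent.
function sendNoCacheHeaders()
{
    if (headers_sent()) {
        return false;
    }
    $seen = array();
    foreach (noCacheHeaders() as $line) {
        $name = strtolower(substr($line, 0, strpos($line, ':')));
        // Replace on the first use of a header name and append on
        // repeats, so that both Cache-Control lines are sent.
        header($line, !isset($seen[$name]));
        $seen[$name] = true;
    }
    return true;
}
```

A page that must never be cached could then call sendNoCacheHeaders() at the very top of the script, before any HTML is output.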

Of course, to disallow caching entirely introduces the problems we discussed at the start of this chapter. We’ll look at the solution to these issues in just a moment.

Internet Explorer and File Download Caching

Our discussion of PDF rendering in Chapter 3, Alternative Content Types, explained that issues can arise when you’re dealing with caching and file downloads. In serving a file download via a PHP script that uses headers such as Content-Disposition: attachment; filename=myFile.pdf or Content-Disposition: inline; filename=myFile.pdf, you’ll have problems with Internet Explorer if you tell the browser not to cache the page.

Internet Explorer handles downloads in a rather unusual manner, making two requests to the Website. The first request downloads the file, and stores it in the cache before making a second request (without storing the response). This request invokes the process of delivering the file to the end user in accordance with the file’s type (e.g. it starts Acrobat Reader if the file is a PDF document). This means that, if you send the cache headers that instruct the browser not to cache the page, Internet Explorer will delete the file between the first and second requests, with the result that the end user gets nothing. If the file you’re serving through the PHP script will not change, one solution is simply to disable the “don’t cache” headers for the download script.
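As a rough illustration of the first solution, a download script for a file that doesn’t change can send only the headers that describe the file and leave out the “don’t cache” headers altogether. The helper name downloadHeaders, its parameters, and the file path in the usage note are assumptions made for this sketch.

```php
<?php
// Hypothetical helper: builds the headers for a file download and
// deliberately leaves out the "don't cache" headers shown earlier.
function downloadHeaders($filename, $contentType, $length, $inline = false)
{
    // Keep only the base name and drop characters that would break
    // the quoted filename parameter.
    $safeName = str_replace(array('"', "\r", "\n"), '', basename($filename));
    $disposition = $inline ? 'inline' : 'attachment';

    return array(
        'Content-Type: ' . $contentType,
        'Content-Disposition: ' . $disposition . '; filename="' . $safeName . '"',
        'Content-Length: ' . (int) $length,
    );
}

// Usage sketch; the path below is a placeholder.
// $file = '/path/to/myFile.pdf';
// foreach (downloadHeaders($file, 'application/pdf', filesize($file)) as $line) {
//     header($line);
// }
// readfile($file);
```

Returning the header lines as an array, rather than calling header() inside the helper, makes it easy to check exactly what will be sent before anything reaches the browser.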

If the file download will change regularly (i.e. you want the browser to download an up-to-date version), you’ll need to use the Last-Modified header, discussed later in this chapter, and ensure that the time of modification remains the same across the two consecutive requests. You should be able to do this without affecting users of browsers that handle downloads correctly. One final solution is to write the file to your Web server and simply provide a link to it, leaving it to the Web server to report the cache headers for you. Of course, this may not be a viable option if the file is supposed to be secured by the PHP script, which requires a valid session in order to give users access to the file; with this solution, the written file can be downloaded directly.
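One way to keep the modification time identical across the two requests is to derive it from something that doesn’t change between them, such as the file’s own modification time, rather than from the current time. The sketch below assumes the download is backed by a file on disk; the function name lastModifiedHeader and the path in the usage note are made up for this example.

```php
<?php
// Hypothetical helper: formats a Unix timestamp as an HTTP date for
// use in a Last-Modified header.
function lastModifiedHeader($timestamp)
{
    return 'Last-Modified: ' . gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Usage sketch; the path below is a placeholder. filemtime() returns
// the same value for both requests as long as the file itself has
// not been modified in between.
// $file = '/path/to/myFile.pdf';
// header(lastModifiedHeader(filemtime($file)));
```

Using time() here instead would produce a different value on each request, which is exactly what this approach is meant to avoid.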

Created: March 27, 2003
Revised: January 2, 2004

URL: http://webreference.com/programming/phpanth3