Advanced Web Performance Optimization [cont'd]
A specific caching example
Let's look at a specific example as we build up the caching efficiency for WebSiteOptimization.com's logo, l.gif. First we request the image from Internet Explorer:
To demonstrate the default Apache configuration, we eliminated the cache control directives from our httpd.conf file, and the response was as follows:
This image was last modified June 19, 2004 and will not be changed for some time. It is clear from these response headers that this object does not change frequently and can be safely cached for at least a year into the future. Note the lack of Cache-Control headers, and the inclusion of an ETag header for the image. Next we'll show how to add cache control headers.
mod_expires and mod_headers. For Apache, mod_expires and mod_headers handle cache control through HTTP headers sent from the server. Because they are installed by default, you only need to configure them. Before adding the following lines, first check that they are not already enabled. On many operating systems, they are enabled by default. For Apache 1.3x, enable the expires and headers modules by adding the following lines to your httpd.conf file:
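A minimal sketch of what such lines can look like on Apache 1.3.x. The libexec/ path is an assumption that depends on how Apache was built, and on a stock 1.3 configuration the AddModule lines belong after ClearModuleList:

```apache
# Apache 1.3.x: load the shared objects, then activate the modules.
# The libexec/ path is an assumption; check your own build layout.
LoadModule expires_module libexec/mod_expires.so
LoadModule headers_module libexec/mod_headers.so

# AddModule re-enables the modules after ClearModuleList.
AddModule mod_expires.c
AddModule mod_headers.c
```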
For Apache 2.0, enable the modules in your httpd.conf file like so:
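A minimal sketch for Apache 2.0, assuming the shared objects live in the conventional modules/ directory under the server root (the path varies by distribution):

```apache
# Apache 2.0: LoadModule alone is enough; AddModule no longer exists.
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
```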
Target files by extension for caching
One quick way to enable cache control headers for existing sites is to target files by extension. Although this method has some disadvantages (notably the requirement of file extensions), it has the virtue of simplicity. To turn on mod_expires, set ExpiresActive to on:
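As a sketch, assuming mod_expires is already loaded, the setting is a single line in httpd.conf:

```apache
# Enable generation of Expires and Cache-Control: max-age headers.
ExpiresActive On
```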
Next, target your website's root HTML directory to enable caching for your site in one fell swoop. Note that the default web root shown in the following code (/var/www/htdocs) varies among operating systems.
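A minimal sketch of such a block. The extension list and the 30-day value in the second FilesMatch segment are illustrative assumptions, not the site's actual settings:

```apache
# Web root is an assumption; adjust to your system.
<Directory "/var/www/htdocs">
    # Default: expire 300 seconds after the client's access.
    ExpiresDefault A300

    # HTML files: one day (86,400 seconds) after access.
    <FilesMatch "\.html$">
        ExpiresDefault A86400
    </FilesMatch>

    # Hypothetical second segment: static assets, 30 days after access.
    <FilesMatch "\.(gif|jpg|png|js|css)$">
        ExpiresDefault A2592000
    </FilesMatch>
</Directory>
```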
ExpiresDefault A300 sets the default expiry time to 300 seconds after access (A); using M300 would instead set the expiry time to 300 seconds after file modification. The FilesMatch segment sets the cache control header for all .html files to 86,400 seconds (one day).
Note that you can target your files with a more granular approach using multiple directory sections, like this:
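A minimal sketch of the idea, assuming two hypothetical subdirectories (images/ and news/) under the web root and that ExpiresActive is already on:

```apache
# Hypothetical layout: long-lived images, short-lived news pages.
<Directory "/var/www/htdocs/images">
    # 30 days after access.
    ExpiresDefault A2592000
</Directory>

<Directory "/var/www/htdocs/news">
    # 5 minutes after access.
    ExpiresDefault A300
</Directory>
```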
For truly dynamic content, you can force resources not to be cached by setting an age of zero seconds. To keep the resource from being stored anywhere, you can also send a Cache-Control no-store header.
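A minimal sketch using mod_headers, assuming the dynamic content lives in a hypothetical /var/www/cgi-bin directory:

```apache
# Requires mod_headers. The directory path is an assumption.
<Directory "/var/www/cgi-bin">
    # max-age=0 makes any cached copy immediately stale;
    # no-store asks caches not to keep a copy at all.
    Header set Cache-Control "max-age=0, no-store"
</Directory>
```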
Target files by MIME type. The disadvantage of the preceding method is its reliance on the existence of file extensions. In some cases, webmasters elect to use URIs without extensions for portability. A better method is to use the ExpiresByType directive of the mod_expires module. As the name implies, ExpiresByType targets resources for caching by MIME type, like this:
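A minimal sketch in the readable interval syntax. The text intervals are assumptions chosen to match the "short for text, long for images" policy described in this section, and the web root is the same assumed path as before:

```apache
<Directory "/var/www/htdocs">
    # Lets .htaccess files override these settings.
    AllowOverride All

    ExpiresActive On
    ExpiresDefault "access plus 300 seconds"

    # Text resources likely to change: short offsets (assumed values).
    ExpiresByType text/html "access plus 1 day"
    ExpiresByType text/css "access plus 1 day"

    # Infrequently changing images: one year.
    ExpiresByType image/gif "access plus 1 year"
    ExpiresByType image/jpeg "access plus 1 year"
    ExpiresByType image/png "access plus 1 year"
</Directory>
```

The MIME type that matches JavaScript files depends on the server's MIME mappings, so it is left out of this sketch.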
These httpd.conf directives set the same parameters, only in a more flexible and readable way. For expiry times you can use access or modification as the base, depending on whether you want to start counting from the time the file was accessed or from the last time the file was modified. In the case of WebSiteOptimization.com, we chose to use short access offsets for text files likely to change, and longer access offsets for infrequently changing images.
Note the AllowOverride All directive. This allows webmasters to override these settings with .htaccess files for directory-based authentication and redirection. However, allowing overrides of the httpd.conf file causes a performance hit because Apache must traverse the directory tree looking for .htaccess files.
After updating the httpd.conf file with the preceding MIME-based code, we restart the HTTP daemon in Apache for Linux using this command from the shell prompt:
Red Hat Enterprise, Fedora, and CentOS all make use of the service command. Note that the commands to restart the HTTP daemon vary among operating systems. On most systems, you can use the apachectl command or the /etc/init.d/apache2 init script to start, stop, or restart Apache. Some administrators choose to do Apache configuration and control entirely through a web interface such as Webmin, or through an OS-specific graphical utility.
HTTP header results. We updated the httpd.conf configuration file with the MIME type code in the preceding section. Let's look at how the headers change when we request the WebSiteOptimization.com logo (l.gif).
The headers for our home page logo now look like this:
As a result, this resource now has cache control headers. We left the ETag in because we use a single server. Note that the Server field is also stripped down, to save some header overhead. This is done with the ServerTokens directive.
This minimizes the response header from this:
to the minimal:
Our images are now cacheable for one year. We could eliminate other headers, such as Accept-Ranges, but we don't gain as much by doing so.
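The trimming of the Server field mentioned earlier is controlled by the core ServerTokens directive. A minimal sketch, assuming the tersest setting is wanted (which value the site actually used is not stated here):

```apache
# Server-wide setting in httpd.conf (not valid inside <Directory>).
# Prod sends only the product name, e.g. "Server: Apache".
# Min would add the version number; Full adds OS and module details.
ServerTokens Prod
```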
Cache control with Microsoft IIS. You can do cache control in Internet Information Server (IIS) by accessing the IIS Manager and setting headers on files or folders. First, navigate with the IIS Manager to the file or directory that you want to target (see Figure 9.6, "Using IIS Manager to set caching policy").
Right-click it, choose Properties, and select the HTTP Headers tab. This will land you on the screen that includes the content cache options. Check "Enable content expiration" and then set the appropriate time frame (see Figure 9.7, "Setting content expiration in IIS").
If your site is not organized in directories for cache control optimization, it can be quite cumbersome to set cache control policies for a large number of files. See http://www.port80software.com/support/articles/developforperformance2 for more details about IIS cache control. You can't set cache control headers by MIME type settings with this technique, so Port80 wrote CacheRight to deal with this issue. CacheRight is basically "mod_expires plus" for IIS.
With Apache version 2.2, mod_cache has become suitable for production use. mod_cache implements a content cache that you can use to cache local or proxied content.
This improves performance by temporarily storing resources in faster storage. It can use one of two provider modules for storage management:
mod_disk_cache, which implements a disk-based storage manager.
mod_mem_cache, which implements a memory-based storage manager. You can configure mod_mem_cache to operate in two modes: caching open file descriptors or caching objects in heap storage. You can use mod_mem_cache to cache locally generated content or to cache backend server content for mod_proxy when configured using ProxyPass (a.k.a. reverse proxy).
Content is stored in and retrieved from the cache using URI-based keys. Content with access protection is not cached. Example 9.1 shows a sample mod_cache configuration file.
Example 9.1. Sample mod_cache configuration file
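A minimal sketch of such a configuration, assuming Apache 2.2 with mod_cache and mod_disk_cache loaded. The cache root path is hypothetical; only the CacheDirLevels and CacheDirLength values come from the text:

```apache
<IfModule mod_cache.c>
    <IfModule mod_disk_cache.c>
        # Hypothetical cache location; must be writable by Apache.
        CacheRoot /var/cache/apache2/disk_cache
        # Cache everything under / using the disk provider.
        CacheEnable disk /
        # Levels times length must not exceed 20 characters.
        CacheDirLevels 5
        CacheDirLength 3
    </IfModule>
</IfModule>
```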
CacheDirLevels, set to 5, is the number of directory levels below the cache root that will be included in the cache data. CacheDirLength, set to 3, sets the number of characters in proxy cache subdirectory names.
For more details, see the Apache documentation.
This chapter is an excerpt from the book, Website Optimization: Speed, Search Engine & Conversion Rates Secrets by Andrew B. King, published by O'Reilly Media, Inc., July 2008, ISBN 0596515081, Copyright 2008 O'Reilly Media, Inc.
Original: August 25, 2008