Think of caching like maintaining your personal address book. All the phone numbers in there are available in the phonebook, but for speedier access, you maintain a list of the numbers that you call most frequently.
In computer science, caching reduces the delay of data traveling from the storage system to the user. Your computer's RAM, for example, is a caching system that spares your running programs from needing to access the hard drive thousands or even millions of times during execution. Web caching on the Internet works similarly; it prevents a round trip to the original server (which could be on the other end of the world) every time the user requests a resource.
Web caches sit between the web server and the client's browser, where they can save copies of resources (HTML pages, images, scripts, stylesheets, etc.) as they pass them on. When a client requests one of these resources, the web cache can return it itself instead of forwarding the request to the web server.
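The store-and-return behavior described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `SimpleWebCache` and `fake_origin` are hypothetical names invented for illustration, the "web server" is a local function rather than a real network call, and the sketch never expires its copies.

```python
from typing import Callable, Dict


class SimpleWebCache:
    """Toy intermediary that keeps copies of resources as they pass through."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # how many requests reached the web server

    def get(self, url: str) -> bytes:
        if url in self._store:
            # Cache hit: answer directly, no trip to the web server.
            return self._store[url]
        # Cache miss: forward the request, then keep a copy for next time.
        self.origin_requests += 1
        body = self._fetch_from_origin(url)
        self._store[url] = body
        return body


def fake_origin(url: str) -> bytes:
    # Hypothetical stand-in for the real web server.
    return f"contents of {url}".encode()


cache = SimpleWebCache(fake_origin)
cache.get("/logo.png")  # first request is forwarded to the origin
cache.get("/logo.png")  # second request is served from the stored copy
print(cache.origin_requests)  # prints 1
```

Only the first request for `/logo.png` reaches the origin; the second is answered from the copy the cache saved on the way through.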
Are Web Caches Really Useful?
- Reducing latency: Because an intermediate cache (which is closer to the client than the web server) can serve the content, it takes less time for the resource to reach the client. This makes the web site seem more responsive.
- Reducing network traffic: Because several resources are reused and they don't need to be fetched afresh from the server, caching reduces the amount of bandwidth used by the client, thus reducing traffic on the network.
Levels of Cache
There are three levels of web cache: browser cache, proxy cache, and gateway cache.
Web Browser Cache
The first level of cache resides in your web browser. Every time you visit a new page, the browser stores a copy of each resource from that page on your hard drive. Each time the resource is requested again in the same browser session, the browser returns the local copy. This cache is used especially when the user hits the "Back" button. Similarly, if you use the same header graphic on all pages, the graphic is downloaded only once and the pages load faster.
Proxy Cache
Proxy caches work similarly to the browser cache. The primary difference is that proxy caches are shared caches that serve hundreds to thousands of clients instead of just one. So, if thousands of people visit a site while you are in a session on that same site, their requests for resources the proxy has already stored won't be sent all the way to the web server. Rather, they will be serviced by the proxy server. Proxy settings can either be configured in the preferences tab of your browser or be applied automatically by intermediaries on the underlying network. Handling the "freshness" of the resource on the proxy server is an entirely different problem, and that challenge is beyond the scope of this article.
Gateway Cache
Gateway caches are also intermediaries, but instead of being deployed by network administrators to save bandwidth, they are usually managed by webmasters to make their web sites more reliable and scalable.
How to Make the Most of Caching?
You can't avoid caching, so you might as well befriend it. You might wonder if it is worth the effort, but I can tell you that the end result is quite sweet, especially if you expect substantial load on your server.
HTML Meta Tags Are Not Effective
You may have learned that putting a meta tag with the Cache-Control attribute into the <HEAD> section of the HTML document lets you control the caching system. As nice as this may sound, it is not that effective, because only some browser caches honor the tag, namely those that actually read the HTML document. Proxy and gateway caches don't honor it at all because they don't read the HTML document.
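By contrast, a caching directive sent as a real HTTP response header travels outside the HTML, so a cache does not have to parse the document to find it. Here is a minimal sketch using Python's standard `http.server` module. The names `CachedPageHandler` and `make_server`, the page body, and the `public, max-age=3600` value are illustrative assumptions, not a recommended policy.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><head><title>Demo</title></head><body>Hello</body></html>"


class CachedPageHandler(BaseHTTPRequestHandler):
    """Serves one page and states its caching policy in the HTTP headers."""

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        # Illustrative value: let any cache reuse the page for one hour.
        self.send_header("Cache-Control", "public, max-age=3600")
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args) -> None:
        # Silence the default per-request logging.
        pass


def make_server(port: int = 0) -> HTTPServer:
    # Port 0 asks the operating system for any free port.
    return HTTPServer(("127.0.0.1", port), CachedPageHandler)
```

Calling `make_server(8000).serve_forever()` would serve the page locally. Note that the HTML itself contains no caching meta tag; the policy lives entirely in the response headers.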
Pragma HTTP Headers Won't Work Either
Another myth about caching is that specifying Pragma: no-cache for a document will make the document not cacheable. This is not always true, because the HTTP specification sets no guidelines for Pragma response headers. Some caches will react to this header, while most will ignore it as an unknown header.
So, if the two caching methods you know of are not effective, how can you control the cache? Two methods do work, and both are HTTP response headers. However, you can't see the HTTP response headers in the page your browser displays. They are usually set automatically by the web server and, depending on the web server configuration, you can control them to a certain degree.
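Since these headers are not part of the page itself, one way to look at them is to request the resource from a script. Below is a small sketch using Python's standard `urllib`. The function name `cache_headers` and the list of header names it checks are assumptions chosen for illustration; a given server may send other headers or none of these.

```python
from typing import Dict
from urllib.request import urlopen

# Header names commonly associated with caching (an illustrative selection).
CACHE_HEADERS = ("Cache-Control", "Expires", "ETag", "Last-Modified", "Age", "Pragma")


def cache_headers(url: str) -> Dict[str, str]:
    """Return the caching-related response headers the server sent for url."""
    with urlopen(url, timeout=10) as response:
        found = {}
        for name in CACHE_HEADERS:
            value = response.headers[name]  # None when the header is absent
            if value is not None:
                found[name] = value
        return found
```

For example, `print(cache_headers("https://example.com/"))` would show whichever of these headers that server happens to send.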