What is HTTP Delta Encoding?
What is HTTP Delta Encoding?
by Jeffrey C. Mogul (firstname.lastname@example.org)
Web caching is successful because many Web pages don't change between references. But some resources do change, which forces the re-retrieval of modified resources. However, the modifications are often minimal; if one could encode just the differences between the cached (older) page and the new page, in many cases that would require sending very few bytes. This approach is called delta encoding. Several research studies have shown that it can improve Web performance, and a standardization effort is now underway.
- What is delta encoding?
- How much will it help?
- The connection between delta encoding and caching
- The connection between delta encoding and compression
- Delta encoding and rsync compared
- Delta encoding algorithms and formats
- Standardization efforts
- An example HTTP exchange
- Delta encoding and proxies
- Implementations and deployment status
- Printable Version
Most Web clients and proxies have the ability to cache the responses they receive. If the user of a browser, or a client of a proxy, requests a URL that is already in the cache, then the cache can return the stored response and avoid a network transfer from the origin server. This "reuse the entire response" form of caching is simple, and effective. However, it doesn't always work, and various research studies [5, 6] have found that the upper limit on full-response caching, even for infinitely large caches, appears to be in the range of 30% to 50%. It's even lower when cache hit rates are weighted by the size of the responses.
A Web request can cause a cache miss because the resource has changed since the cache entry was created. For example, a page showing the stock price of internet.com might be different each time it is retrieved from the stock quote server. Or a page showing a weather forecast for San Francisco might change every hour or two. In either example, any change in content between requests would (or, at least, should) cause a cache miss.
However, these content changes often affect only a small fraction of the bytes in a Web response. For example, on a stock quote page, except for the actual stock price and daily volume, perhaps nothing else changes very often. So why should we have to retrieve an entire copy of the new page?
In fact, the cost of a Web cache miss could be greatly reduced, using delta encoding. In this technique, the origin server sends the difference (or delta) between the old and new instances of the resource, rather than the entire new instance. (I prefer the term "instance" to "version," since the latter often means different things to different people.) If the delta is much smaller than the entire page, this saves a lot of bandwidth.
A Web request might also cause a cache miss because the cache has never seen this URL before. Delta encoding can help here, too. In many cases, two pages with different URLs have very similar content (for example, the quote pages for two different stocks might share a lot of HTML boilerplate). A technique called delta clustering can be used to retrieve the difference between two pages from different URLs. However, this variation is more complex and less mature than basic delta encoding, so I won't cover it further in this article.
Although the idea of transmitting just the differences between successive pieces of information is an old one (for example, SCCS, RCS, and similar revision-control systems store older revisions as differences), the first use of delta encoding in the Web was developed by Housel and Lindquist at IBM .
Next Page: How much will it help?
Revised: November 14, 2000