HTTP Compression Speeds up the Web
Technical Overview
HTML/XML/JavaScript/text compression: Does it make sense?
The short answer is "only if it can get there quicker." In 99% of all cases it makes sense to compress the data. However, there
are several problems that need to be solved to enable seamless transmission
from the server to the consumer:
- Compression should not conflict with MIME types
- Dynamic compression should not affect server performance
- Server should be smart enough to know whether the user’s browser can decompress the content
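The third requirement is normally handled through HTTP content negotiation. The browser lists the codings it can decode in its `Accept-Encoding` request header, and the server replies with `Content-Encoding: gzip` only when gzip is acceptable. Because the coding travels in `Content-Encoding`, the `Content-Type` stays `text/html`, which is how compression avoids conflicting with MIME types. The following is a minimal Python sketch under those assumptions; `accepts_gzip` and `build_response` are hypothetical helper names, and the header parser is deliberately simplified (it honours `q=0` and the `*` wildcard but ignores relative preferences between codings).

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if an Accept-Encoding header value permits gzip.

    Simplified parser: an explicit gzip/x-gzip entry wins; otherwise the
    '*' wildcard decides. A q-value of 0 means "not acceptable".
    """
    wildcard_ok = False
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if coding in ("gzip", "x-gzip"):
            return q > 0
        if coding == "*":
            wildcard_ok = q > 0
    return wildcard_ok


def build_response(body: bytes, accept_encoding: str, content_type: str = "text/html"):
    """Return (headers, body), gzip-compressing the body only if the client allows it."""
    # Content-Type is left untouched; the compression is signalled separately.
    headers = {"Content-Type": content_type, "Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


# Usage: a hypothetical table-heavy page requested by a gzip-capable browser.
page = b"<table>" + b"<tr><td>Artist</td><td>Title</td></tr>" * 1000 + b"</table>"
headers, payload = build_response(page, "gzip, deflate")
print(headers.get("Content-Encoding"), len(page), "->", len(payload))
```

The `Vary: Accept-Encoding` header tells intermediate caches that the same URL may be served compressed to one client and uncompressed to another.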
Let's create a simple scenario: an HTML file containing a large music listing in the form
of a table.
The file is 679,188 bytes in length.
Let's track this download over a 28K modem and then compare the results before and after compression.
The theoretical throughput over a 28K modem is 3,600 bytes per second. Reality
is more like 2,400 bytes per second but for the sake of this article we will
work at the theoretical maximum. If there were no modem compression, the
file would download in 188.66 seconds (679,188 / 3,600). On average, with modem compression
running, we can expect a download time of about 90 seconds, which indicates roughly
a 2:1 compression factor: modem-to-modem compression effectively "halved" the
number of bytes on the wire. But note that the server still had to push all
679,188 bytes through its TCP/IP subsystem to the modem for
transmission. What happens if we compress the data on the server before transmission?
The file is 679,188 bytes in length. If we compress it
using standard techniques (which are not optimized for HTML), we can expect
the file to shrink to 48,951 bytes. This is a 92.8% reduction in size
(a compression ratio of roughly 13.9:1). We are now transmitting only 48,951 bytes (plus some header
information, which should also be compressed, but that's another story). Modem
compression no longer plays a role because the data is already compressed.
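To get a feel for how well gzip handles this kind of repetitive markup, the sketch below builds a synthetic music-listing table and compresses it with Python's standard `gzip` module. The table contents and the `music_listing_html` helper are hypothetical, made up for illustration, so the exact sizes will not match the 679,188 and 48,951 byte figures above; only the general effect is comparable.

```python
import gzip


def music_listing_html(rows: int) -> bytes:
    """Build a hypothetical HTML table resembling a large music listing."""
    lines = [
        '<html><body><table border="1">',
        "<tr><th>#</th><th>Artist</th><th>Album</th><th>Track</th><th>Length</th></tr>",
    ]
    for i in range(rows):
        # Every row repeats the same tags; only the cell text varies.
        lines.append(
            f"<tr><td>{i}</td><td>Artist {i % 250}</td><td>Album {i % 900}</td>"
            f"<td>Track title number {i}</td><td>{3 + i % 4}:{i % 60:02d}</td></tr>"
        )
    lines.append("</table></body></html>")
    return "\n".join(lines).encode("ascii")


original = music_listing_html(6000)
compressed = gzip.compress(original)  # standard gzip, not tuned for HTML
saving = 100 * (1 - len(compressed) / len(original))

print(f"original:   {len(original):,} bytes")
print(f"compressed: {len(compressed):,} bytes")
print(f"reduction:  {saving:.1f}%")
```

The repeated tags are what gzip exploits; a page with less repetitive content would compress less.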
Where are the performance improvements?
- Bandwidth is conserved
- Compression consumes only a few milliseconds of CPU time
- The server's TCP/IP subsystem only has to serve 48,951 bytes to the modem
- At a transfer rate of 3,600 bytes per second the file arrives in 13.6 seconds instead of 90 seconds
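The timing figures in this scenario follow from dividing the byte count by the link rate. The small Python check below assumes the article's theoretical 3,600 bytes per second and a flat 2:1 modem compression factor; note that a strict 2:1 factor gives about 94 seconds, so the article's "about 90 seconds" implies slightly better modem compression.

```python
MODEM_BYTES_PER_SEC = 3_600  # theoretical 28K modem throughput used in the article
ORIGINAL_BYTES = 679_188     # uncompressed HTML music listing
GZIPPED_BYTES = 48_951       # size after server-side gzip compression
MODEM_COMPRESSION = 2.0      # assumed flat 2:1 hardware modem compression


def transfer_seconds(n_bytes: int, rate: int = MODEM_BYTES_PER_SEC) -> float:
    """Time to push n_bytes over a link running at `rate` bytes per second."""
    return n_bytes / rate


raw = transfer_seconds(ORIGINAL_BYTES)
with_modem = raw / MODEM_COMPRESSION
gzipped = transfer_seconds(GZIPPED_BYTES)
reduction = 100 * (1 - GZIPPED_BYTES / ORIGINAL_BYTES)

print(f"uncompressed:          {raw:.2f} s")
print(f"modem compression 2:1: {with_modem:.2f} s")
print(f"gzip before sending:   {gzipped:.2f} s")
print(f"size reduction:        {reduction:.1f}%")
```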
Compression clearly makes sense as long as it's seamless and doesn't kill server performance.
What else remains to be done?
A lot! Better algorithms need to be invented that compress the data stream more
efficiently than gzip. Remember that gzip is a general-purpose compressor that was not
designed with HTML in mind. Any technique that adds a new compression algorithm will require a thin client to
decode it and possibly tunneling techniques to make it "firewall friendly." To
sum up, we need:
- Improved compression algorithms optimized specifically for HTML/XML
- Header compression. Every time a browser requests a page,
it sends request headers. In the case of WAP browsers, this header information can be
as large as 900 bytes. Compression can reduce it by only 20-25% (to roughly 675-720 bytes), because these headers contain little redundancy.
- Compression for WAP. (Currently WAP/WML does not
support a true entropy encoding technique. It uses binary encoding to compress
the tags while ignoring the content.)
- Dynamic compression for caching servers. (Download the
RCTPD Web Accelerator for caching servers: http://www.remotecommunications.com/rctpd/)
- Real time compression/encryption with tunneling.
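Dynamic and real-time compression mean compressing a response while it is being generated rather than serving pre-compressed files. The article does not describe how RCTPD or any particular product does this; what follows is only a minimal sketch using Python's standard `zlib` module, and `gzip_stream` and `generate_rows` are hypothetical names. It produces a single gzip stream incrementally, flushing after each chunk so that data can be forwarded before the whole page exists.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def gzip_stream(chunks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Compress an iterable of byte chunks into one gzip stream, on the fly.

    wbits=31 (16 + 15) tells zlib to write a gzip header and trailer.
    Z_SYNC_FLUSH after each chunk pushes out everything compressed so far,
    so it can be sent immediately, at the cost of a slightly worse ratio.
    """
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    for chunk in chunks:
        data = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
        if data:
            yield data
    yield comp.flush()  # Z_FINISH: remaining data plus the gzip trailer (CRC, length)


def generate_rows(n: int) -> Iterator[bytes]:
    """Hypothetical page generator that emits a table row by row."""
    yield b"<table>"
    for i in range(n):
        yield f"<tr><td>Track {i}</td></tr>".encode("ascii")
    yield b"</table>"


# Usage: the concatenated pieces form an ordinary gzip body.
body = b"".join(gzip_stream(generate_rows(1000)))
print(len(body), "compressed bytes")
```

Flushing after every chunk trades some compression ratio for lower latency; a server that can buffer larger chunks before flushing will compress better.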
# # # # #
About the author: Peter Cranstone was a Co-Founder and the Chief Software Architect of HyperSpace Communications, Inc., a software company dedicated to data
acceleration technology. He was also a Founder and Principal of The
James Group, another company engaged in the development of advanced data
compression algorithms. Mr. Cranstone has spent most of his professional
career as a technological innovator and inventor. Mr. Cranstone is the
co-inventor on two pending patent applications
covering the HyperSpace® smart engine and the ElseWare Messaging Alert
System, which allows Web-enabled devices to be controlled via simple
e-mail. He can be reached at cranstone@msn.com.