Core Web Application Development with PHP and MySQL: Part 1
This content is excerpted from Chapter 13 of the new book, "Core Web Application Development with PHP and MySQL", by permission of Prentice Hall PTR. ISBN 0131867164, copyright 2005. All rights reserved. To learn more, please visit: http://www.phptr.com/bookstore/product.asp?isbn=0131867164&rl=1.
Chapter 13: Web Applications and the Internet
Now that we have covered the basics of programming with PHP and MySQL, it is time to turn our attention to the task at hand: writing web applications. Before we begin to show you some of the techniques and tools for doing this, we are going to spend time discussing many of the key concepts, considerations, and issues to keep in mind when writing them.
Over the course of this chapter, we will
A Closer Look at the World Wide Web
We will begin this chapter by taking a closer look at the technologies that make the Internet, specifically the World Wide Web (WWW), work. This discussion will be a useful refresher for many, and will provide a solid foundation for showing how our web applications work. It will also frame our later discussions of performance, scalability, and security.
The Internet: It's Less Complicated Than You Think
Many beginning programmers will confess that the workings of the World Wide Web are something only slightly less fantastical than black magic. However, this could not be further from the truth. One of the most powerful features of the Internet, and surely one of the greatest factors leading to its rapid acceptance and widespread adoption, is that it is based almost entirely on simple, open specifications and technologies that are freely available for all.
Most of the work on the Internet is done by a few key protocols or mechanisms by which computers talk to each other. For example, TCP/IP is the base protocol by which computers communicate. Other protocols, which provide progressively more and more functionality, are built on top of this. As we shall see, the Hypertext Transfer Protocol (HTTP) is simply a series of text messages (albeit with some well-defined structure to them) sent from one computer to another via TCP/IP.
While this flexibility and openness facilitate adoption and allow for easy customization and extensions, they do have their drawbacks, most notably in the realm of security. Because the protocols have a well-known structure and free implementations are widely available, there are many opportunities for people with less than noble intentions to exploit this openness. We will cover this in greater detail in Chapter 16, "Securing Your Web Applications: Planning and Code Security."
One important factor that should be considered when writing a web application is the speed at which users can connect to the Internet. Although there has been a surge in availability of high-bandwidth connections over mediums such as DSL, television cable, and satellite, a large portion of users are still using standard modems no faster than 56kbps.
If you are designing an application that you know will only be used by corporate customers with high-speed connections, it might not be a problem to include large, high-resolution images or video in your site. However, home users might be quickly turned off, opting to go somewhere less painfully slow.
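As a rough worked example of why connection speed matters, the sketch below estimates how long a page would take to download at two link speeds. The page size (500 KB) and the broadband figure (1.5 Mbps) are illustrative assumptions, not figures from this chapter, and the calculation ignores latency and protocol overhead.

```python
def download_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Idealized transfer time: payload bits divided by link speed."""
    return (size_bytes * 8) / link_bits_per_second


PAGE_BYTES = 500 * 1000        # assumed page weight: 500 KB of images and markup
MODEM_BPS = 56_000             # 56kbps modem, the best case mentioned in the text
BROADBAND_BPS = 1_500_000      # assumed 1.5 Mbps DSL line

modem_time = download_seconds(PAGE_BYTES, MODEM_BPS)
broadband_time = download_seconds(PAGE_BYTES, BROADBAND_BPS)
print(f"modem: {modem_time:.1f} s, broadband: {broadband_time:.1f} s")
```

Under these assumptions, the modem user waits more than a minute for a page that the broadband user receives in under three seconds.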
Computers Talking to Computers
The key technology that makes the Internet work as we know it today is the TCP/IP protocol, which is actually a pair of protocols. The Internet Protocol (IP) is a mechanism by which computers identify and talk to each other. Each computer has what is called an IP address, or a set of numbers (not entirely unlike a telephone number) that identify the computer on the Internet. The Internet Protocol allows two computers with IP addresses to send each other messages.
The format of these IP addresses depends on the version of IP in use, but the one most commonly used today is IPv4, where IP addresses consist of four one-byte numbers, each ranging from 0 to 255 (some values are reserved; 255, for example, is used for broadcasting to large numbers of computers), and are typically written as xxx.yyy.zzz.www (such as 192.168.100.1). There are various ways of grouping the possible IPv4 addresses so that when a particular machine wants to send a message to another, it does not need to know the destination's exact location, but instead can send the message to intermediaries that know how to ensure the message ends up at the correct computer (see Figure 13-1).
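To make the "four one-byte numbers" idea concrete, here is a small sketch using Python's standard ipaddress module. It shows that the dotted form is just a readable way of writing a single 32-bit number; the variable names are only for illustration.

```python
import ipaddress

addr = ipaddress.IPv4Address("192.168.100.1")

# The four one-byte parts, exactly as written in dotted notation.
octets = list(addr.packed)

# The same address viewed as one 32-bit number.
as_int = int(addr)

# Rebuild the integer by hand to show how the four bytes combine.
rebuilt = (192 << 24) | (168 << 16) | (100 << 8) | 1

print(octets, as_int, as_int == rebuilt)
```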
However, one problem with IPv4 is that the number of unallocated IP addresses is running low, and addresses are often unevenly distributed. (For example, there are a few universities in the USA with more IP addresses than all of China!) One way of conserving IP addresses is for organizations to use reserved address ranges for internal use and only have a few computers directly exposed to the Internet. These reserved ranges (192.168.x.y, 10.x.y.z, and 172.16.x.y through 172.31.x.y) can be used by anybody and are designed for internal networks. Their traffic is usually routed to the Internet by machines performing network address translation (NAT), which lets computers with these "nonpublic" addresses still reach services on the Internet.
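The following sketch checks whether an address falls inside one of the ranges reserved for internal networks. The helper name is_internal is made up for this example, and the list includes 172.16.0.0/12, which is reserved for private use alongside the 192.168.x.y and 10.x.y.z ranges.

```python
import ipaddress

# Ranges reserved for private, internal networks.
RESERVED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),        # 10.x.y.z
    ipaddress.ip_network("172.16.0.0/12"),     # 172.16.x.y through 172.31.x.y
    ipaddress.ip_network("192.168.0.0/16"),    # 192.168.x.y
]


def is_internal(address: str) -> bool:
    """Return True if the address belongs to one of the reserved ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RESERVED_NETWORKS)


for candidate in ("192.168.100.1", "10.20.30.40", "8.8.8.8"):
    print(candidate, is_internal(candidate))
```

An address for which this returns True cannot be reached directly from the Internet; a machine using it would typically sit behind a NAT device.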
Figure 13-1: Two computers talking over the Internet via TCP/IP.
A new version of IP, IPv6, has been developed and is seeing increasing adoption. While it would not hurt to learn about this new version and its addressing scheme (which has a significantly larger address space than IPv4), we mostly use IPv4 in our examples (although we do not do anything to preclude the use of IPv6). Many key pieces of software, including Apache HTTP Server and PHP, are including IPv6 support in newer releases for early adopters.
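To give a sense of how much larger the IPv6 address space is, this short sketch counts the addresses in each scheme with Python's standard ipaddress module. The totals are raw counts that do not subtract reserved ranges.

```python
import ipaddress

# Every possible address in each scheme, counted by treating the whole
# address space as a single network.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses   # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses        # 2**128

# How many complete IPv4 address spaces fit inside IPv6.
ratio = ipv6_total // ipv4_total                               # 2**96

print(ipv4_total)
print(ipv6_total)
```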
The Internet Protocol does little else than allow computers to send messages to each other. It does nothing to verify that messages arrive in the order they were sent without corruption. (Only the key header data is verified.)
To provide this functionality, the Transmission Control Protocol (TCP) was designed to sit directly on top of IP. TCP makes sure that packets actually arrive in the correct order and makes an attempt to verify that the contents of the packet are unchanged. This implies some extra overhead and less efficiency than IP alone, but the only alternative would be for every single program to do this work itself, a truly unpleasant prospect.
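The two jobs described above, restoring order and detecting corruption, can be illustrated with a toy model. This is a deliberately simplified sketch rather than real TCP: actual TCP numbers bytes rather than whole segments, its checksum also covers header fields, and it asks for retransmission instead of raising an error. The function names are invented for this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement sum, the style of checksum TCP uses."""
    if len(data) % 2:
        data += b"\x00"                             # pad to whole 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)    # fold any carry back in
    return ~total & 0xFFFF


def make_segment(seq: int, payload: bytes) -> dict:
    """Sender side: label the payload with a sequence number and checksum."""
    return {"seq": seq, "payload": payload, "checksum": internet_checksum(payload)}


def reassemble(segments: list) -> bytes:
    """Receiver side: restore the order and reject corrupted segments."""
    result = b""
    for segment in sorted(segments, key=lambda s: s["seq"]):
        if internet_checksum(segment["payload"]) != segment["checksum"]:
            raise ValueError(f"segment {segment['seq']} is corrupt")
        result += segment["payload"]
    return result


# IP makes no promise about ordering, so segments may arrive shuffled.
arrived = [
    make_segment(2, b"wor"),
    make_segment(0, b"hel"),
    make_segment(3, b"ld"),
    make_segment(1, b"lo "),
]
message = reassemble(arrived)
print(message)
```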
TCP introduces a key concept on top of the IP address, called a port, that permits computers to expose a variety of services or offerings over the network. Various port numbers are reserved for different services, and these numbers are both published and well known. On the machine exposing the services, there are programs, called services or daemons, that listen for traffic on a particular port. For example, most e-mail occurs over port 25, while HTTP traffic for the WWW (with which we are dealing extensively in this book) occurs on port 80. You will occasionally see a reference to a web site (URL) written as http://www.mywebsitehooray.com:8080, where the :8080 tells your web browser what port number to use (in this case, port 8080).
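As a small illustration of how a client decides which port to use, the sketch below pulls the port out of a URL and falls back to port 80 when none is given. The helper port_for and its lookup table are assumptions made for this example; urlsplit comes from Python's standard library.

```python
from urllib.parse import urlsplit

# Port 80 is the reserved HTTP port named in the text.
DEFAULT_PORTS = {"http": 80}


def port_for(url: str) -> int:
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not name a port explicitly.
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(port_for("http://www.mywebsitehooray.com:8080"))
print(port_for("http://www.mywebsitehooray.com/"))
```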
The one other key piece of the Internet puzzle is the means by which names, such as www.warmhappybunnies.com, are mapped to an IP address. This is done by a system called the Domain Name System, or DNS. This is a hierarchical system of naming that maps names onto IP addresses, giving each server a name that is easier to read and remember than a numeric address.
The system works by having a number of "top level" domains (com, org, net, edu, ca, de, cn, jp, biz, info, and so on) that then have their own domains within them. When you enter a name, such as www.warmhappybunnies.com, the software on your computer knows to connect to a DNS server (also known as a "name server") which, in turn, knows to start at a "root name server," find the servers responsible for the com domain, and get more information for the warmhappybunnies domain. Eventually, your computer gets an IP address back, which the TCP/IP software in the operating system knows how to use to talk to the desired server.
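The hierarchy can be sketched by listing the domains a lookup passes through, from the top-level domain down to the full name. The helper lookup_order is a made-up illustration of the naming structure, not of the DNS wire protocol. The resolve function shows the real lookup through the operating system's resolver; it is defined but not called here because it needs network access.

```python
import socket


def lookup_order(name: str) -> list:
    """List the domains consulted, from the top-level domain downward."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


def resolve(name: str) -> str:
    """Ask the operating system's resolver for an IPv4 address."""
    return socket.gethostbyname(name)


print(lookup_order("www.warmhappybunnies.com"))
```

Calling resolve with a name that actually exists returns a dotted IPv4 address string, which is what the TCP/IP software then uses to open a connection.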
The Hypertext Transfer Protocol
The web servers that make the WWW work typically "listen" on port 80, the port reserved for HTTP traffic. These servers operate in a very simple manner. Somebody, typically called the "client," connects to the port and makes a request for information from the "server." The request is analyzed and processed. Then a response is sent, with content or with an error message if the request was invalid or unable to be processed. After all of this, the connection is closed and the server goes back to listening for somebody else to connect. The server does not care who is connecting and asking for data and, apart from some simple logging, does not remember anything about the connection. This is why HTTP is sometimes called a stateless protocol: no information is shared between connections to the server.
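The connect, request, respond, close cycle can be sketched with plain sockets. This is a minimal illustration, not a usable web server: it handles a single connection, reads the request with one recv call, and listens on a free port chosen by the operating system rather than port 80. The function names and the path /index.html are assumptions made for the example.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one client, answer its request, close, and remember nothing."""
    conn, _addr = listener.accept()
    with conn:
        # Simplification: assume the whole request arrives in one read.
        request = conn.recv(4096).decode("ascii", errors="replace")
        request_line = request.split("\r\n", 1)[0]
        body = f"You asked for: {request_line}\n"
        response = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n"
            "\r\n"
            f"{body}"
        )
        conn.sendall(response.encode("ascii"))


def fetch(port: int, path: str) -> str:
    """Act as the client: connect, send one request, read until closed."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        request = f"GET {path} HTTP/1.1\r\nHost: localhost\r\n\r\n"
        client.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = client.recv(4096)
            if not data:            # an empty read means the server closed
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii")


def demo(path: str) -> str:
    """Run one complete request/response cycle on the local machine."""
    listener = socket.create_server(("127.0.0.1", 0))   # port 0: any free port
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    try:
        return fetch(port, path)
    finally:
        worker.join(timeout=5)
        listener.close()
```

Calling demo("/index.html") returns the full response text. Because serve_once keeps no variables between calls, a second request would be handled with no knowledge of the first, which is the statelessness described above.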
The format of both the HTTP request and response is very simple. Both share the following plain text format:
Initial request or response line
Optional Header: value
[Other optional headers and values]
[blank line, consisting of CR/LF]
Optional Body that comes after the single blank line.
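Because the format is plain text, it can be taken apart with a few string operations. The sketch below assumes each message begins with an initial request or status line, followed by the headers, a blank line, and the body. The function name and the sample header values are illustrative.

```python
def parse_http_message(raw: str):
    """Split an HTTP message into its start line, headers, and body."""
    # The first blank line (CR/LF twice in a row) ends the header section.
    head, _separator, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<p>Hello</p>\n"
)
start_line, headers, body = parse_http_message(raw)
print(start_line)
print(headers)
```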
An HTTP request might look something like the following:
User-Agent: WoobaBrowser/3.4 (Windows)
[this is a blank line]
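A complete request also begins with a request line naming the method, the resource, and the protocol version. The sketch below assembles one as text; the path /index.html and the host www.example.com are hypothetical placeholders, and build_request is a name invented for this example.

```python
def build_request(method: str, path: str, host: str, user_agent: str) -> str:
    """Assemble the text of an HTTP/1.1 request that has no body."""
    lines = [
        f"{method} {path} HTTP/1.1",    # request line
        f"Host: {host}",                # HTTP/1.1 requires a Host header
        f"User-Agent: {user_agent}",
        "",                             # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


request = build_request(
    "GET",
    "/index.html",                  # hypothetical resource
    "www.example.com",              # hypothetical host
    "WoobaBrowser/3.4 (Windows)",
)
print(request)
```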
The response to the previous request might be something as follows:
HTTP/1.1 200 OK
There are a couple of other HTTP methods similar to the GET method, most notably POST and HEAD. The HEAD method is similar to the GET method with the exception that the server only sends the headers rather than the actual content.
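The difference between GET and HEAD can be sketched from the server's point of view: the headers are identical, and only the content is left out. The respond function and the sample page are assumptions made for illustration.

```python
def respond(method: str, content: str) -> str:
    """Build a response; a HEAD request gets the headers without the content."""
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(content.encode('utf-8'))}\r\n"
        "\r\n"
    )
    if method == "HEAD":
        return headers              # same headers, but no content follows
    return headers + content


page = "<p>Hello</p>"
get_reply = respond("GET", page)
head_reply = respond("HEAD", page)
print(head_reply)
```

A client can use HEAD this way to learn the size or type of a resource without paying the cost of downloading it.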
Created: March 27, 2003
Revised: September 26, 2005