HTTP

Introduction

HTTP (the Hypertext Transfer Protocol) is the protocol used to retrieve Web pages from a server (normally on port 80) and also to send information entered in a form back to a server. It is perhaps the most complex of the protocols that we will meet.

Like other protocols, this one can be explored using Telnet to act as a primitive web browser, sending and receiving information according to the protocol.

An HTTP exchange proceeds as follows:

  1. the connection. The client makes a TCP connection with the server, normally on port 80, unless some other port is specified.
  2. a request by the client. The client asks the server for a page at a specified location. A typical request is:

    GET /index.html HTTP/1.0

    which gives only the path name, since the machine name is already implied by the connection.
  3. the response by the server. The server sends the data back to the client as lines of ASCII text. The first line is typically:

    HTTP/1.0 200 OK
  4. closing the connection. Either the client or the server, or both, close the connection.

Thus, in HTTP/1.0, a separate connection is used for each request.
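The four steps above can be sketched in Python using the standard socket module. This is a minimal illustration, not a complete client: the function names build_request and fetch are made up for this example, the request carries no header lines, and it relies on the server closing the connection to mark the end of the response. On the wire each line ends with a carriage return and line feed, written "\r\n" in the code.

```python
import socket


def build_request(path, version="HTTP/1.0"):
    """Return a minimal GET request: the request line followed by an empty line."""
    return f"GET {path} {version}\r\n\r\n".encode("ascii")


def fetch(host, path="/index.html", port=80, timeout=10.0):
    """Perform one HTTP/1.0 exchange and return the raw response bytes."""
    # Step 1: the client makes a TCP connection with the server.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Step 2: the client sends its request.
        conn.sendall(build_request(path))
        # Step 3: read the response until the server closes its side.
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Step 4: leaving the with-block closes the client's side of the connection.
    return b"".join(chunks)
```

Calling fetch with the name of a web server and a path returns the status line, the header lines and the page contents as a single byte string. Many present-day servers also expect a Host line in the request, which this sketch does not send.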


The response code 200 OK is the most common response, signaling that the request was successful. There are many response codes. They are grouped as shown below:

Response code    Meaning
200 - 299        success
300 - 399        web browser needs to go to another page
400 - 499        client error
500 - 599        server error
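The grouping of response codes can be expressed as a short function. This is only a sketch: the name status_class and the label "unknown" for codes outside the four listed ranges are choices made for this example.

```python
def status_class(code):
    """Map a numeric HTTP response code to its group."""
    if 200 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "redirection"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    # Codes outside the four ranges listed are not classified here.
    return "unknown"


for code in (200, 301, 404, 503):
    print(code, status_class(code))
```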

 

Some common response codes are:

Common Response Codes

Response code              Meaning
200 OK                     The request was successful.
301 Moved Permanently      The page has moved to a new URL.
304 Not Modified           The client asked for the page only if it had changed since an earlier retrieval, and it has not changed.
400 Bad Request            The request has faulty syntax.
401 Unauthorized           Authorization is needed to access this page. Either the authorization is wrong or it has not been supplied.
404 Not Found              The server cannot find the page. This is a common error.
503 Service Unavailable    The server is temporarily unable to handle the request, perhaps due to maintenance or overloading.
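Python's standard library already knows these codes and their wording, through the HTTPStatus enumeration in the http module. The helper below, whose name describe is invented for this example, looks up the phrase that goes with a numeric code.

```python
from http import HTTPStatus


def describe(code):
    """Return the code with its standard phrase, e.g. '404 Not Found'."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # The number is not one of the codes this Python version knows.
        return f"{code} (unrecognised code)"
    return f"{status.value} {status.phrase}"


for code in (200, 301, 304, 400, 401, 404, 503):
    print(describe(code))
```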

 


The Request

The client sends a request, for example:

    GET /index.html HTTP/1.0
    Accept: text/html
    Accept: image/gif
    User-Agent: Lynx/2.4

This is a sequence of lines, in ASCII, terminated by an empty line. As we have seen, the second item on the first line is the path name. This is followed by the version of the HTTP protocol that the client understands. This line is all that is required. However, other information can be provided by the client. Each piece of information is on a separate line and takes the form:

keyword: value

For example:

Accept: text/html

says that the client can accept HTML documents. Another example is:

Accept: image/gif

which again allows the server to tailor information to what the client is able to process. The client can also say which web browser and version it is, for example:

User-Agent: Lynx/2.4
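A request of this shape can be assembled mechanically: the request line, then one "keyword: value" line per piece of information, then the empty line that ends the request. The sketch below assumes a helper named format_request (not part of any library) and writes each line ending as "\r\n", the carriage return and line feed used on the wire.

```python
def format_request(path, headers=(), method="GET", version="HTTP/1.0"):
    """Build the text of an HTTP request from a path and (keyword, value) pairs."""
    # The first line: request type, path name, protocol version.
    lines = [f"{method} {path} {version}"]
    # Each further piece of information goes on its own line.
    for keyword, value in headers:
        lines.append(f"{keyword}: {value}")
    # An empty line terminates the request.
    lines.append("")
    return "\r\n".join(lines) + "\r\n"


request = format_request(
    "/index.html",
    headers=[
        ("Accept", "text/html"),
        ("Accept", "image/gif"),
        ("User-Agent", "Lynx/2.4"),
    ],
)
print(request)
```

The headers are passed as a list of pairs rather than a dictionary so that a keyword such as Accept can appear more than once, as it does in the example request.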

There are other request types in addition to GET:

HEAD retrieves only the header lines of the response, not the file itself, so that the browser can see, for example, whether the file has been updated since it last retrieved a copy.

POST is used in conjunction with forms and CGI (see later).

 


The Response

The response consists of a number of header lines, followed by an empty line, followed by the contents of the file, usually HTML. For example:

    HTTP/1.1 200 OK
    Date: Mon, 12 Jul 1999 12:42:22 GMT
    Server: Apache/1.3.6 (Unix)
    Last-Modified: Wed, 07 Jul 1999 17:14:42 GMT
    ETag: "fcdd-17e-37838b02"
    Accept-Ranges: bytes
    Content-Length: 382
    Connection: close
    Content-Type: text/html

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
    <HTML>
    <HEAD>
    etc.

 

The first line gives the HTTP version number and a response code (see above).

The third line is the name of the server program and version number.

The last line of the header specifies the MIME type of the content.
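The structure of a response (status line, header lines, empty line, contents) makes it straightforward to take apart. The following is a rough sketch rather than a robust parser: the name parse_response is invented here, keywords are stored in lower case as a convenience, and a keyword that appears twice keeps only its last value.

```python
def parse_response(raw):
    """Split raw response bytes into (version, code, reason, headers, body)."""
    # The first empty line separates the header from the contents.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # First line: HTTP version, response code, and an optional reason phrase.
    parts = lines[0].split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    # Remaining lines: "keyword: value" pairs.
    headers = {}
    for line in lines[1:]:
        keyword, _, value = line.partition(":")
        headers[keyword.strip().lower()] = value.strip()
    return version, code, reason, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Mon, 12 Jul 1999 12:42:22 GMT\r\n"
    b"Server: Apache/1.3.6 (Unix)\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<HTML>"
)
version, code, reason, headers, body = parse_response(sample)
print(version, code, reason)
print(headers["server"])
print(headers["content-type"])
```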