Internet Principles

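This page reviews the central concepts of software on the Internet. It briefly explains TCP/IP, UDP, IP addresses, domain names, the Domain Name System, clients and servers, port numbers, sockets, streams and URLs.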


The Internet is the network that connects computers all over the world. It works according to a set of agreed protocols. TCP (Transmission Control Protocol) and IP (Internet Protocol) are the most commonly used protocols on the Internet, although there are others at lower levels. The combination is known simply as TCP/IP.

The Internet is a packet switching system. Any message is broken into packets that are transmitted independently across the Internet (sometimes by different routes). These packets are called datagrams. The route chosen for each datagram depends on the traffic at that point in time. Each datagram has a header of between 20 and 60 bytes, followed by a payload of up to 65,515 bytes of data. The header contains, amongst other data:

  1. the version number of the protocol in use
  2. IP address of sender
  3. IP address of destination
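
The header fields listed above can be picked out of the raw bytes of a datagram. The following is a minimal sketch, assuming an IP version 4 header of the minimum 20 bytes. The function name parse_ipv4_header and the sample addresses (taken from ranges reserved for documentation) are illustrative and are not part of these notes.

```python
import socket
import struct

def parse_ipv4_header(header: bytes) -> dict:
    """Extract version, header length and addresses from an IPv4 header."""
    first_byte = header[0]
    version = first_byte >> 4              # top 4 bits: protocol version
    header_len = (first_byte & 0x0F) * 4   # low 4 bits: length in 32-bit words
    source, destination = struct.unpack("!4s4s", header[12:20])
    return {
        "version": version,
        "header_length": header_len,
        "source": socket.inet_ntoa(source),
        "destination": socket.inet_ntoa(destination),
    }

# A hand-built 20-byte header (hypothetical values, checksum left as zero).
sample_header = struct.pack(
    "!BBHHHBBH4s4s",
    (4 << 4) | 5,   # version 4, header length 5 words = 20 bytes
    0,              # type of service
    20 + 11,        # total length: header plus an 11-byte payload
    0, 0,           # identification, flags and fragment offset
    64,             # time to live
    6,              # protocol number 6 = TCP
    0,              # header checksum (not computed in this sketch)
    socket.inet_aton("192.0.2.1"),
    socket.inet_aton("198.51.100.7"),
)

fields = parse_ipv4_header(sample_header)
print(fields)
```

The header length is stored in units of 32-bit words, which is why a 4-bit field can describe headers of between 20 and 60 bytes.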

TCP breaks a message down into packets and, at the destination, re-assembles the packets into the message. It attaches a checksum to each packet. If this does not match the checksum computed at the destination, the packet is re-transmitted. Thus TCP ensures reliable transmission of information. In summary, TCP provides:

  1. re-transmission of lost or damaged data
  2. delivery of data in the correct order
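
The checksum comparison described above can be sketched as follows. These notes do not say how the checksum is calculated; the sketch assumes the standard 16-bit Internet checksum and, for brevity, applies it to the message bytes only (real TCP also covers its own header and some IP fields). The function name internet_checksum is illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

# Sender: compute the checksum and attach it to the packet.
message = b"hello, world"
checksum = internet_checksum(message)

# Receiver: recompute and compare; a mismatch means the packet was damaged.
damaged = b"hellp, world"
print(internet_checksum(message) == checksum)   # packet arrived intact
print(internet_checksum(damaged) == checksum)   # packet was corrupted in transit
```

A mismatch tells the receiver only that something is wrong, not what; recovery relies on the packet being sent again.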

Exercise: Suggest some typical causes of lost or damaged data during transmission.

IP is concerned with routing. IP attaches the address of the destination to each packet and ensures that packets get to the right place.

TCP is the higher-level protocol that uses the lower-level IP.

When an application is written, the general principle is to use the highest-level protocol available, provided that it offers the functionality and performance required. Many applications can be written on top of TCP/IP without using it directly. For example, a Web browser can be written in Java using only URLs, without any explicit mention of sockets.
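
As an illustration of working at the highest available level, here is a minimal sketch in Python (the notes mention Java; Python is used for the examples on this page). The client names only a URL and never touches a socket. So that the sketch runs without an outside network, it starts a tiny stand-in Web server on the local machine; the class name HelloHandler and the page contents are made up for the example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in Web server so the example needs no outside network."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet

server = HTTPServer(("127.0.0.1", 0), HelloHandler)   # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client side: only a URL is mentioned, no sockets.
url = f"http://127.0.0.1:{server.server_port}/index.html"
# An empty ProxyHandler makes sure the local request is not sent to a proxy.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url, timeout=5) as response:
    status = response.status
    page = response.read().decode("ascii")

server.shutdown()
server.server_close()
print(status, page)
```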

The relationship between the protocols can be visualized as shown in the following diagram. On each machine, an application program makes calls on procedures in the transport layer (normally TCP). In turn, the transport layer makes calls on the Internet layer (normally IP). In turn, the Internet layer makes calls on the physical layer, which differs according to the technology of the communication link.

At the destination machine, information is passed up through the layers to the application program. Each application program acts as if it is communicating directly with the application on another machine. The lower levels of the communication software and hardware are invisible.

This four-layer model is sufficient for understanding Internet software. There are other models, such as the seven-layer ISO model, that use a different number of layers.

The application layer produces some data, adds a header to it and passes the complete package to the transport layer. The transport layer adds another header and passes the package to the Internet layer. The Internet layer adds another header and passes it to the physical layer, which adds its own. The application data is thus enclosed by four headers, one for each layer. This process can be thought of as repeatedly putting a letter into an envelope and then addressing the envelope.
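
The envelope analogy can be sketched in a few lines. This is a toy model only: the bracketed layer names stand in for real headers, and the functions add_header and remove_header are invented for the illustration.

```python
def add_header(layer: str, package: bytes) -> bytes:
    """Wrap a package in a (made-up) header naming the layer that added it."""
    return f"[{layer}]".encode("ascii") + package

def remove_header(layer: str, package: bytes) -> bytes:
    """Strip the header added by the matching layer at the destination."""
    header = f"[{layer}]".encode("ascii")
    if not package.startswith(header):
        raise ValueError(f"expected a {layer} header")
    return package[len(header):]

LAYERS = ["application", "transport", "internet", "physical"]

# Sending: each layer, from the top down, adds its own header.
package = b"Hello"
for layer in LAYERS:
    package = add_header(layer, package)
wire_format = package
print(wire_format)

# Receiving: the layers are unwrapped in the opposite order.
for layer in reversed(LAYERS):
    package = remove_header(layer, package)
print(package)
```

The header added last (by the physical layer) is outermost, so it is the first one removed at the destination.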


Most applications use TCP. However, audio streaming is an example of a situation in which a simpler, less reliable protocol is desirable. Downloading a sound file can take some time, even if it is compressed, and you have to wait for the complete file to arrive before it can be played. The alternative is to listen to the sound as it is being downloaded: streaming. One of the most popular streaming technologies is called RealAudio.

RealAudio does not use TCP because of its overhead. Instead, the sound file is sent in IP packets using UDP (User Datagram Protocol). UDP is an unreliable protocol: it doesn't re-send a packet that is missing or damaged, and it doesn't assemble packets into the correct order. But it is faster than TCP. In this application, losing a few bits of data is better than waiting for the re-transmission of some missing data. The application's major mission is to keep playing the sound without interruption. (In contrast, the main goal of a file transfer program is to transmit the data accurately.)
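
The behaviour of UDP can be tried out on a single machine. The sketch below sends three datagrams to the local machine and collects whatever arrives within a short time; the "audio chunk" messages are made up and merely stand in for sound data.

```python
import socket

# Receiver: a UDP socket bound to any free port on the local machine.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)   # UDP gives no delivery guarantee, so never wait forever
address = receiver.getsockname()

# Sender: no connection is set up; each datagram is simply sent.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for number in range(3):
    sender.sendto(f"audio chunk {number}".encode("ascii"), address)

received = []
try:
    for _ in range(3):
        data, _source = receiver.recvfrom(1024)
        received.append(data.decode("ascii"))
except socket.timeout:
    pass   # a lost datagram is simply skipped, as a streaming player would

sender.close()
receiver.close()
print(received)
```

On one machine the datagrams will normally all arrive, and in order, but nothing in the program depends on that.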

The same mechanism is used with video streaming.

UDP is a protocol at the same level as TCP, above the level of IP.

IP Address

An IP address is a unique address for every host computer in the world. In version 4 of IP, it consists of 4 bytes (32 bits). It is represented in quad notation (or dot notation) as four 8-bit numbers, each in the range 0 to 255, separated by dots.
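
The link between quad notation and the underlying 32-bit value can be sketched as follows. The address 192.0.2.1 is an illustrative one from a range reserved for documentation, and the function name quad_to_int is made up for this example.

```python
import ipaddress

def quad_to_int(quad: str) -> int:
    """Combine the four 8-bit numbers of a dotted quad into one 32-bit value."""
    parts = [int(part) for part in quad.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError(f"not a valid dotted quad: {quad!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part   # shift earlier bytes up, append this one
    return value

example = "192.0.2.1"   # illustrative address reserved for documentation
as_int = quad_to_int(example)
print(as_int, format(as_int, "032b"))

# The standard library performs the same conversion.
print(int(ipaddress.IPv4Address(example)) == as_int)
```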

IP addresses are registered so that they stay unique.

You can find the IP address of the local machine under Windows NT by typing the following command at the DOS prompt (Start-Programs-MSDOS Command Prompt):


Under Unix, this command is:


Exercise: Type this command at the Windows NT DOS command prompt or the Unix prompt.

The IP address 127.0.0.1 is a special address, called the local loopback address, that denotes the local machine. A message sent to this address simply returns to the sender without leaving the machine. It is useful for testing purposes.

Domain name

The domain name is the user-friendly equivalent of an IP address. It is used because the numbers in an IP address are hard to remember and use. It is also known as a host name.

Such a name starts with the most local part and ends with the most general. The whole name space is a tree whose root has no name. The first level of the tree contains com, org, edu, uk, etc.

The parts of a domain name don't correspond to the parts of an IP address. Indeed, domain names don't always have four parts: they can have two, five or any other number.

All applications that use an address should work whether an IP address or a domain name is used. In fact, a domain name is converted to an IP address before it is used.

Exercise: Compare and contrast IP addresses with domain names.

Domain Name System

A program, say a Web browser, that has a domain name usually needs to convert it into an IP address before making contact with the server. The domain name system (DNS) provides the mapping between domain names and IP addresses. This information cannot all be held in one place, so DNS is a distributed database.
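
A program normally leaves this conversion to the operating system's resolver, which consults DNS along with any local configuration. The sketch below looks up the name localhost, which conventionally denotes the local machine, so that no outside server has to be reachable; the function name resolve is made up for the example.

```python
import ipaddress
import socket

def resolve(name: str) -> str:
    """Ask the system resolver for one IP (version 4) address for a host name."""
    return socket.gethostbyname(name)

# "localhost" conventionally names the local machine.
address = resolve("localhost")
print(address)

# An address that is already numeric is returned unchanged, which is why
# applications can accept either form.
print(resolve("127.0.0.1"))
```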

Clients, Servers and Peers

A network application usually involves a client and a server. Each is a process (an independently running program), usually running on a different computer.

A server runs on a host and provides some particular service, e.g. e-mail, access to local Web pages. Thus a Web server is a server. A commonly-used web server program is called Apache.

A client runs on a host but needs to connect with a server on another host to accomplish its task. Usually, different clients are used for different tasks, e.g. Web browsing and e-mail. Thus a Web browser is a client.

Some programs are not structured as clients and servers. For example, a game played across the Internet by two or more players is a peer-to-peer relationship. Other examples are chat, Internet phone and shared whiteboards.

Port Numbers

To identify a host machine, an IP address or a domain name is needed. To identify a particular server on a host, a port number is used. A port is like a logical connection to a machine. Port numbers can take values from 1 to 65,535. A port has no correspondence with the physical connections, of which there might be just one. Each type of service has, by convention, a standard port number. Thus 80 usually means Web serving and 21 means file transfer. If the default port number is used, it can be omitted in the URL (see below). For each port supplying a service there is a server program waiting for requests. Thus a Web server program listens on port 80 for incoming requests. All these server programs run in parallel on the host machine.

When a packet of information is received by a host, the port number is examined and the packet sent to the program responsible for that port. Thus the different types of request are distinguished and dispatched to the relevant program.
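
The dispatching step can be pictured as a table lookup. The following toy sketch is not how an operating system is actually implemented; the two service functions are stand-ins invented for the illustration, attached to the conventional port numbers 7 (echo) and 80 (Web).

```python
def echo_service(data: bytes) -> bytes:
    """Stand-in for an echo server: reply with whatever was sent."""
    return data

def web_service(data: bytes) -> bytes:
    """Stand-in for a Web server: always reply with a fixed status line."""
    return b"HTTP/1.0 200 OK\r\n\r\n"

# Which program is responsible for which port (7 = echo, 80 = Web serving).
services = {7: echo_service, 80: web_service}

def dispatch(port: int, data: bytes) -> bytes:
    """Hand the packet's data to the program listening on the given port."""
    handler = services.get(port)
    if handler is None:
        raise LookupError(f"no server is listening on port {port}")
    return handler(data)

print(dispatch(7, b"hello"))
print(dispatch(80, b"GET /index.html HTTP/1.0\r\n\r\n"))
```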

The following table lists the common services, together with their normal port numbers. These conventional port numbers are sometimes not used for a variety of reasons. One example is when a host provides (say) multiple web servers, so only one can be on port 80. Another reason is where the server program has not been assigned the privilege to use port 80.

Protocol name   Port number   Nature of service

echo            7             The server simply echoes the data sent to it. This is useful for testing purposes.
daytime         13            Provides the ASCII representation of the current date and time on the server.
ftp-data        20            Transferring files. (FTP uses two ports.)
ftp             21            Sending FTP commands like RETR and STOR.
telnet          23            Remote login and command line interaction.
smtp            25            E-mail (Simple Mail Transfer Protocol).
nntp            119           Usenet (Network News Transfer Protocol).

Nearly all of these protocols are described later in these notes.

Exercise: Use a telnet program to enhance your understanding of ports, investigate the services provided by various servers, and understand something of high level protocols such as HTTP.

Although the main use of telnet is remote login, it can be used as a general-purpose client that simply sends text (character by character) to a port on a server and then displays any reply. Thus it can act as the client for any text-based protocol: date, echo, HTTP, SMTP, FTP, etc.

Run telnet under Windows NT by selecting:

start | programs | tools and accessories | accessories | telnet.

Select the connect menu. Connect to a host of your choice by entering the host name and a port number.


Start with the SHU server and port number 13 to obtain the date and time. Go to some servers in other parts of the world and see whether the time difference makes sense.


Continue with port 7, the echo service, which sends back whatever you type. Contrary to what you might expect, the text appears only once! If you want to see what you type as well as the echo, select the Terminal menu, select Preferences and click on Local Echo.

Go to a server at the other side of the world and see if the echo takes longer.

Web (HTTP)

Connect to a Web server of your choice on port 80 (HTTP). Then type:

GET /index.html HTTP/1.0

then an empty line - a line with nothing on it.

Use upper case as shown. GET is one of several HTTP commands. You might choose to type some file name other than the one shown (the index file). Observe the reply. The meanings of error messages are also given in these notes.

If it is a long web page, it may be difficult to display because this Telnet program does not allow its window to be scrolled. You might want to switch on logging before connecting. Go to menu Terminal, select Start Logging and specify the file where you want the log placed. Thereafter you can display the dialogue using Notepad or Word. You should see several lines of heading, followed by the raw HTML of the Web page.
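
The same dialogue can be carried out by a short program instead of telnet. The sketch below sends the GET request shown above over a plain socket and prints the reply. So that it runs without an outside network, it talks to a small stand-in Web server started on the local machine; the class name PageHandler and the page contents are made up.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class PageHandler(BaseHTTPRequestHandler):
    """Stand-in Web server so that the example runs without a network."""

    def do_GET(self):
        body = b"<html><body>Index page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the server's request log

server = HTTPServer(("127.0.0.1", 0), PageHandler)   # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# What telnet does: open a connection, send the typed text, show the reply.
with socket.create_connection(("127.0.0.1", server.server_port), timeout=5) as conn:
    conn.sendall(b"GET /index.html HTTP/1.0\r\n\r\n")   # request, then an empty line
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:          # the server closes the connection when done
            break
        chunks.append(chunk)

server.shutdown()
server.server_close()

reply = b"".join(chunks).decode("ascii")
headers, _, html = reply.partition("\r\n\r\n")
print(headers)
print(html)
```

The empty line that ends the request is also what separates the heading lines from the HTML in the reply.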

If you are feeling adventurous, try some of the other port numbers, for example those for SMTP and FTP.


Sockets

A socket is the software mechanism for one program to connect to another. A pair of programs open a socket connection between themselves. This then acts like a telephone connection - they can converse in both directions for as long as the connection is open. (In fact, data can flow in both directions at the same time.) More than one socket can use any particular port. The network software ensures that data is routed to or from the correct socket.

When a server (on a particular port number) gets an initial request, it often spawns a separate thread to deal with the client. This is because different clients may well run at different speeds. Having one thread per client means that the different speeds can be accommodated. The new thread creates a (software) socket to use as the connection to the client. Thus one port may be associated with many sockets.
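
A thread-per-client server can be sketched as follows. This is a minimal illustration, not a production design: the server here simply returns each client's text in upper case, and the function names handle_client and serve are made up. Both clients run on the local machine and connect to the same port.

```python
import socket
import threading

def handle_client(connection: socket.socket) -> None:
    """Serve one client on its own socket: send back its text in upper case."""
    with connection:
        data = connection.recv(1024)
        connection.sendall(data.upper())

def serve(listener: socket.socket, client_count: int) -> None:
    """Accept clients on the listening port, one new thread per client."""
    for _ in range(client_count):
        connection, _address = listener.accept()   # a new socket for this client
        threading.Thread(target=handle_client, args=(connection,)).start()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))     # port 0: let the system pick a free port
listener.listen()
port = listener.getsockname()[1]
server_thread = threading.Thread(target=serve, args=(listener, 2))
server_thread.start()

# Two clients are connected to the same port at the same time;
# each is served through its own socket by its own thread.
clients = [socket.create_connection(("127.0.0.1", port), timeout=5) for _ in range(2)]
replies = []
for client, text in zip(clients, (b"first client", b"second client")):
    client.sendall(text)
    replies.append(client.recv(1024))
    client.close()

server_thread.join()
listener.close()
print(replies)
```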


Streams

Accessing information across the Internet is accomplished using streams. A stream is a serial sequence of data. An output stream is like the data sent to a printer, a display or a serial file. Similarly, an input stream is like the data that arrives from a keyboard or is read from a serial file. Thus reading from or writing to another program across a network is just like reading or writing a serial file.
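
The similarity between network connections and serial files can be seen in a short sketch. Here a connected pair of sockets on the local machine stands in for a client and a server, and each socket is wrapped in a file-like stream object; the variable names are made up for the example.

```python
import socket

# A connected pair of sockets stands in for a client and a server.
client, server = socket.socketpair()

# Wrap each socket in a file-like stream object.
client_out = client.makefile("w", encoding="ascii", newline="\n")
server_in = server.makefile("r", encoding="ascii", newline="\n")

# Writing to the network looks just like writing to a serial file...
client_out.write("first line\n")
client_out.write("second line\n")
client_out.flush()
client_out.close()
client.close()          # closing the sending side marks the end of the stream

# ...and reading from it looks just like reading a file line by line.
lines = [line.rstrip("\n") for line in server_in]
server_in.close()
server.close()
print(lines)
```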


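URLs

A URL (Uniform Resource Locator) is the address of a resource on the Internet, together with the protocol used to reach it. A URL has the structure:

protocol://hostname[:port][/pathname][/filename][#section]

Things in square brackets indicate that the item can be omitted.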

The first part of a URL is the particular protocol. Some commonly-used protocols are:

  http      The service is the Web. The file is accessed using the HTTP protocol.
  ftp       The service is file transfer protocol. The URL locates a file, a directory or an FTP server.
  telnet    The service is remote login to a host. No file name is needed.
  mailto    The service is e-mail.
  news      The URL specifies a Usenet newsgroup.
  file      This locates a file on the local system. The server part of the URL is omitted.

The host name is the name of the server that provides the service. This can either be a domain name or an IP address.

The port number is only needed when the server does not use the default port number. For example, 80 is the default port number for HTTP.

A pathname (optional) specifies a directory (folder). The pathname is not the complete directory name, but is relative to some directory (folder) designated by the administrator as the one in which publicly-accessible files are held. It would be unusual for a server to make all of its file system available to clients.

The file name can either be a data file name or can specify an executable file that produces a valid HTML document as its output. A file name is often omitted. In this case, the server decides which file to use. Many servers send a default file from the directory specified in the path name - for example a file called default.html, index.html or welcome.html.

The section part of a URL (optional) specifies a named anchor in an HTML document. Such a place in a document is specified by an HTML entry like:

<A NAME="thisplace"></A>

which would be referred to by thisplace as the section in the URL.
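
The parts of a URL described above can be separated by a program. The sketch below uses Python's standard URL parser on a made-up URL (www.example.com is a name reserved for examples); the DEFAULT_PORTS table is an illustrative fragment, not a complete list.

```python
from urllib.parse import urlsplit

# A made-up URL that uses every part described above.
url = "http://www.example.com:8080/docs/index.html#thisplace"
parts = urlsplit(url)

print("protocol:", parts.scheme)
print("host name:", parts.hostname)
print("port number:", parts.port)
print("path and file name:", parts.path)
print("section:", parts.fragment)

# When the port is omitted, the default for the protocol applies.
DEFAULT_PORTS = {"http": 80, "ftp": 21}
short = urlsplit("http://www.example.com/index.html")
port = short.port or DEFAULT_PORTS[short.scheme]
print(port)
```

Note that the parser returns the pathname and file name together as a single path; splitting them is left to the server.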


Exercise: Distinguish between a domain name, a host name, a URL, a path name and an e-mail address.