Experts Exchange

Proxy Server Operation
Initially, proxy servers were intended for solving a specific task, namely, caching data received from the Internet. For example, you may have a network of a hundred computers that all connect to the Internet using one physical communications link. It is well known that most users load the same pages several times a day. Loading the same page again and again wastes the bandwidth of that shared link.

Do a simple calculation. Every day you use a search system, for example, Yahoo (www.yahoo.com) or Google (www.google.com). Assuming that on average, 10 requests are made from each of the 100 computers, about 1,000 loads of the same page will be made every day. I will not calculate how many megabytes this is, for it is already obvious that bandwidth is wasted.
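The calculation above can be sketched in a few lines of Python. The computer and request counts come from the text; the page size is an assumed figure chosen only for illustration, and the sketch assumes the page never changes during the day.

```python
# Figures from the text: 100 computers, about 10 requests each per day.
COMPUTERS = 100
REQUESTS_PER_COMPUTER = 10

# Assumption for illustration only: average size of the page in kilobytes.
ASSUMED_PAGE_KB = 100


def daily_loads(computers, requests_per_computer):
    """Number of times the same page crosses the Internet link without a cache."""
    return computers * requests_per_computer


def wasted_kb(loads, page_kb):
    """Traffic a cache could avoid: every load after the first one."""
    return max(loads - 1, 0) * page_kb


loads = daily_loads(COMPUTERS, REQUESTS_PER_COMPUTER)
print(loads)                              # 1000
print(wasted_kb(loads, ASSUMED_PAGE_KB))  # 99900
```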

A proxy server solves this problem by storing (caching) a Web page on the local disk the first time it is accessed. The next time a local user asks for this page, the proxy serves it from the local disk cache instead of requesting it from the remote server. The saving is obvious. With time, proxy servers have been enhanced and currently offer the following functions:

Caching documents received from the network

Caching the results of DNS requests

Organizing a network access gateway

Controlling Internet access

Providing anonymous Internet access by hiding addresses

Reducing IP address use

This page considers squid, the most popular proxy server for Linux.

To reduce traffic over the Internet link and to increase the loading speed, a special program is installed on the server that provides access to the Internet (Fig. 9.1). When a page, for example, www.yahoo.com, is loaded on one of the local network's computers for the first time, all of its contents are saved in the proxy's cache. The next time the same page is requested from the local network, the images it contains are loaded not from the Internet but from the provider's proxy server, and the text (depending on the contents of the page and the changes to it) may be loaded from the source server.

Accessing the Internet via a proxy server
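The caching behavior described above can be modeled with a toy sketch. This is not how squid is implemented; the class name, the in-memory dictionary (a real proxy uses the disk), and the stand-in fetch function are all assumptions made for illustration, and cache expiry is ignored.

```python
class PageCache:
    """Toy model of a caching proxy: fetch once, then serve the local copy."""

    def __init__(self, fetch):
        self._fetch = fetch      # function that loads a URL from the remote server
        self._store = {}         # URL -> cached content
        self.remote_loads = 0    # how many requests actually went to the Internet

    def get(self, url):
        if url not in self._store:
            # First request for this URL: go to the remote server and keep a copy.
            self._store[url] = self._fetch(url)
            self.remote_loads += 1
        return self._store[url]


def fake_fetch(url):
    # Stand-in for a real HTTP request so the sketch runs without a network.
    return "contents of " + url


cache = PageCache(fake_fetch)
for _ in range(1000):
    cache.get("www.yahoo.com")
print(cache.remote_loads)  # 1
```

Of the 1,000 requests, only the first one reaches the remote server; the other 999 are answered from the cache.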

As a rule, the graphical contents of a page take up most of its volume. The text part of a page does not usually exceed 15 KB, but the graphical part can be 100 KB and more. Loading the latter information from the local proxy server makes it possible to reduce the bandwidth load and increase the page-loading speed.

The loading speed is increased because the proxy server sends most of the Web page data (all graphics and the unmodified text) at the local network rate, which currently is 100 Mb/sec on even the cheapest network equipment. The dial-up Internet connection speed is much lower, ranging from 2 to 8 KB/sec. At this lower rate, only the text data that have changed are loaded (most often, HTML file contents).

In addition to caching Web pages, a proxy server can cache the results of DNS requests. This can also have a positive effect on performance. Although humans prefer to use symbolic Web page names, computers work with the corresponding numerical IP addresses. Thus, before a page can be loaded, some time is taken for converting the symbolic address to its IP form. However, if the site being accessed has already been accessed, its IP address will be saved in the proxy's cache. So instead of going to a DNS server for the IP address, the proxy will take it from its cache.
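A minimal sketch of DNS result caching follows. The class and its counter are hypothetical names introduced for illustration. By default it would resolve names with the standard library's socket.gethostbyname, but the demonstration uses a stub resolver and a documentation-only address so that it runs without a network. Real caches also expire entries after a time limit, which is omitted here.

```python
import socket


class DnsCache:
    """Remember name-to-address results so repeated lookups skip the DNS server."""

    def __init__(self, resolve=socket.gethostbyname):
        self._resolve = resolve   # function that performs the real lookup
        self._addresses = {}      # host name -> IP address
        self.lookups = 0          # how many real DNS lookups were made

    def address_of(self, name):
        if name not in self._addresses:
            self._addresses[name] = self._resolve(name)
            self.lookups += 1
        return self._addresses[name]


def stub_resolver(name):
    # Stand-in for a DNS server; 192.0.2.1 is an address reserved for examples.
    return "192.0.2.1"


dns = DnsCache(resolve=stub_resolver)
dns.address_of("www.yahoo.com")
dns.address_of("www.yahoo.com")
print(dns.lookups)  # 1
```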

As the World Wide Web has been developing and the requirements of its users have been increasing, capabilities of proxy servers have also been growing. Now, a proxy server can perform gateway functions and provide Internet access without additional software or equipment. Moreover, it serves as a shield guarding the network against invasions from the outside. When any of the proxy clients sends a Web page request to the Internet, the proxy server hides the client's IP address and sends the packets on its own behalf. This means that hackers can see only the address of the proxy server and will attempt to break into it and not into the computers it services. This makes it much easier to organize defense against outside attacks, because you can give more attention to one computer, that is, the proxy server, instead of spreading it among all client computers. However, the protection capabilities of proxy servers are too basic and are easily circumvented, so they should be supplemented by a good firewall and an eagle-eyed administrator.

The IP address concealment feature also makes it possible to save IP addresses. Because only the proxy server has the actual Internet connection, only it must have a public IP address. The rest of the computers in the local network can have unroutable addresses reserved for private networks (in the 192.168.x.x or 10.x.x.x ranges).
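Whether an address belongs to one of the private ranges can be checked with Python's standard ipaddress module. The sketch below lists the two ranges named in the text plus 172.16.0.0/12, a third private range that the text does not mention; the function name is an assumption for illustration.

```python
import ipaddress

# Address blocks reserved for private networks.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.x.x.x
    ipaddress.ip_network("172.16.0.0/12"),    # not mentioned in the text
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.x.x
]


def is_private(address):
    """Return True if the address lies in one of the private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)


print(is_private("192.168.0.15"))  # True
print(is_private("10.1.2.3"))      # True
print(is_private("8.8.8.8"))       # False
```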

There are two types of proxy servers: transparent and anonymous. Transparent proxies simply forward a client's packets to the requested Web server without changing the sender's address. A proxy that conceals the sender's IP address is called anonymous. This server communicates with the external world on behalf of clients under its own name. This feature is often taken advantage of by miscreants. For example, hackers do their break-ins through anonymous proxy servers so that the owners of the burglarized machines cannot determine the address from which the break-in was carried out.

Today, there are many servers on the Internet claiming to offer anonymous proxy services, but not all of them actually do. Some of them make the request source IP available to the system to which the request is directed; others log all traffic, including IP addresses, and the logs are available to law-enforcement agencies. Consequently, you can never be sure that the server is as anonymous as it claims to be.
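One common way a proxy makes the source address available to the destination is by adding an HTTP header such as X-Forwarded-For to the forwarded request. The sketch below assumes you can see the headers that arrived at the destination (for example, on a test server you control) and checks whether the client's address appears in any of them. The function name and the sample addresses are hypothetical. A clean result does not prove anonymity, because the proxy may still log the address.

```python
import re


def leaks_client_ip(headers, client_ip):
    """Return True if the client's address appears in any header value."""
    for value in headers.values():
        # Split on separators so "10.0.0.5" does not match inside "110.0.0.50".
        tokens = re.split(r'[,\s;="]+', value)
        if client_ip in tokens:
            return True
    return False


# Headers as they might arrive at the destination server (made-up values).
revealing = {"X-Forwarded-For": "10.0.0.5, 203.0.113.7"}
concealing = {"Via": "1.1 proxy.example"}

print(leaks_client_ip(revealing, "10.0.0.5"))   # True
print(leaks_client_ip(concealing, "10.0.0.5"))  # False
```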

Because not all of a network's computers are allowed Internet access, user authentication can be carried out at the proxy server level.

Some proxy versions have a handy feature: They can exchange their cache data. For example, several offices may share one local network, but each of them has separate Internet access through its own proxy server to keep the Internet bills separate. The individual proxies can be combined into a sort of a proxy network so that if one of them does not have the information requested in its cache, it will check the caches of the other proxies for it.

Most often, this cache-sharing feature is implemented using the Internet Cache Protocol (ICP). If one server does not find the requested document in its cache, it sends an ICP request to the other proxies. If one of the proxies gives a positive reply, the information will be taken from its cache.
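The lookup order just described (own cache, then the other proxies, then the source server) can be illustrated with a small simulation. This is not the ICP wire format: ordinary method calls stand in for the network messages, and the class, the proxy names, and the choice to keep a local copy of a document obtained from a sibling are assumptions made for the sketch.

```python
class Proxy:
    """Toy proxy that asks its siblings before going to the source server."""

    def __init__(self, name):
        self.name = name
        self.cache = {}      # URL -> content
        self.siblings = []   # other proxies that can be queried

    def has(self, url):
        """Answer a sibling's query: is this URL in my cache?"""
        return url in self.cache

    def get(self, url, fetch):
        """Return (content, source), where source names who supplied it."""
        if url in self.cache:
            return self.cache[url], self.name
        for sibling in self.siblings:
            if sibling.has(url):          # positive reply from a sibling
                self.cache[url] = sibling.cache[url]
                return self.cache[url], sibling.name
        self.cache[url] = fetch(url)      # nobody has it: go to the source
        return self.cache[url], "origin"


def fetch_from_origin(url):
    # Stand-in for a real request to the source server.
    return "origin copy of " + url


URL = "www.example.com/report"
office_a, office_b = Proxy("office_a"), Proxy("office_b")
office_a.siblings = [office_b]
office_b.cache[URL] = "cached copy"

first = office_a.get(URL, fetch_from_origin)
second = office_a.get(URL, fetch_from_origin)
print(first)   # ('cached copy', 'office_b')
print(second)  # ('cached copy', 'office_a')
```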

Using cache sharing does not lead to a significant loading-speed increase when requesting small documents, because it takes extra time to search for the documents in the shared caches. With a large request load on the servers and a sizable cache base, the search time may even be so long that it eliminates any loading-speed advantage. The bandwidth saving remains, however, which may be important for those who have to watch each megabyte of traffic.

Not all proxy servers will have the main features just considered. It all depends on what purposes a particular proxy was developed for, and some of them are intended to address only one task.

To work through a proxy server, you have to properly configure the program you want to use the proxy with. Consider the Mozilla browser as an example. Launch the browser and select the Edit/Preferences menu sequence. A tree of categories that can be configured is located in a pane on the left in the Preferences dialog window. Select the Advanced/Proxies category sequence to configure proxy server connections. The default is no proxy: Direct connection to the Internet. You should select the Manual proxy configuration and specify the IP address and port of the proxy server for each protocol.

Configuring a proxy server in Mozilla

When configured to use a proxy server, the browser will send all requests to the proxy server, which will then forward them to the destination server. The proxy server must be running at all times and must listen on a specific port (or on several ports for different protocols).

A separate port is allocated for each protocol. For HTTP, which is used to load Web pages, port 8080 is used most often; however, this value depends on the server and can be changed. Before using proxy server software, make sure that it has the necessary features and supports the necessary protocols. If the proxy does not support a certain protocol, traffic for that protocol will go to the Internet directly.
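Programs other than browsers are configured in a similar way. As a sketch, Python's standard urllib.request module accepts a per-protocol mapping of proxy addresses through its ProxyHandler class. The proxy address 192.168.0.1 is a hypothetical value for illustration; the port is the 8080 mentioned above. The helper function name is also an assumption.

```python
import urllib.request

# Hypothetical proxy address for illustration; adjust to your own network.
PROXY_HOST = "192.168.0.1"
PROXY_PORT = 8080


def make_proxy_opener(host, port):
    """Build an opener that sends HTTP requests through the given proxy."""
    proxy_url = "http://{}:{}".format(host, port)
    # Only HTTP is listed here, so only HTTP requests use this proxy.
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


opener = make_proxy_opener(PROXY_HOST, PROXY_PORT)
# opener.open("http://www.example.com/") would now be sent via the proxy.
```

Because the mapping names only the http protocol, requests for other protocols made with this opener are not sent to the proxy.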

To enhance the security of your network, you should configure the firewall to prohibit incoming connections from the Internet to the ports used by the squid proxy service. For example, squid uses port 3128 for the HTTP proxy by default. Prohibiting incoming connections to this port from outside will prevent the proxy server from being used for purposes other than those it is intended for, for example, breaking into the network.
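After setting up the firewall, you can verify the result from a machine outside the local network by trying to open a TCP connection to the proxy port. The following is a minimal sketch using Python's standard socket module; the function name is an assumption, and the address in the usage comment is a documentation-only placeholder that must be replaced with your server's external address.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered, timed out, or unreachable.
        return False


# Example (run from outside the local network; placeholder address):
# print(port_accepts_connections("203.0.113.10", 3128))
```

If the firewall is configured as described, the check should return False when run from outside and True when run from a client inside the local network.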