Experts Exchange: Working with Squid Proxy Server

Working with Squid
Here I will consider some security aspects of the squid service and the supplementary features that can accelerate Internet operations.

Squid Security
When I first read the squid documentation, I found the following two directives interesting: cache_effective_user and cache_effective_group. If squid is started as root, it replaces its user and group identifiers with those specified by these directives. The user and group identifiers are set to squid by default:

cache_effective_user squid
cache_effective_group squid

This way, squid does not work with root rights: when it is started as root, the service itself lowers its rights to those of the squid user. I do not recommend modifying these directives. There is no need to give the squid service greater rights, because access to the cache directory is all it needs.

Site Acceleration
Squid can be used to access a certain site more rapidly by acting as an httpd accelerator. At least three parameters have to be specified for this:

httpd_accel_host address — This indicates the host name of the accelerated server.

httpd_accel_port port — This sets the port, to which the accelerated requests are to be forwarded. Most often, this is the default port (port 80).

httpd_accel_uses_host_header on|off — The HTTP request header contains a Host field, which squid does not check. This may be a source of security problems. The developers recommend setting this option to off. It should be set to on if squid is operating in the transparent mode.

httpd_accel_with_proxy on|off — This needs to be set to on for the cache to function as both a Web cache and an accelerator.

User Agent Field
Many statistical systems do not take into account or do not allow entry to users in whose requests the User Agent field is blank. This field being blank indicates that the request was channeled through a proxy.

Another company I used to work for limited Internet access by IP addresses. I was the only programmer and the network administrator in my department. Only the department head, his assistant, and I were allowed Internet access. A few hours after I was hired, all other department workers had Internet access. How? Simple: I installed a proxy server on my computer, to which all of my coworkers could connect without having to go through an authentication process. The proxy redirected all requests from my coworkers to the main corporate proxy. Because all these requests were coming from me, the main proxy did not suspect anything.

The main proxy could have become suspicious, though, because there is a small flaw in this charitable solution: the User Agent field was blanked out when requests passed through my proxy. But there is a solution to this problem: the field can be filled in manually in the configuration file with the fake_user_agent directive. For example, the following line emulates requests coming from a Netscape browser:

fake_user_agent Netscrape/1.0 (CP/M; 8-bit)

Network Protection
The squid service is a two-edged sword: it can be used both to protect the network and to penetrate it. To prevent outside users from using the proxy server to connect to computers in the local network, the following directives have to be added to the configuration file:

tcp_incoming_address downstream_address
tcp_outgoing_address upstream_address
udp_incoming_address downstream_address
udp_outgoing_address upstream_address

In the preceding list, downstream_address is the address of the network connection on the squid computer that is directed to the local network; upstream_address is the address of the network connection directed to the Internet. If the addresses are specified incorrectly, it will be possible to connect to computers in the local network from the outside. The following is an example of squid configured incorrectly:

tcp_incoming_address upstream_address
tcp_outgoing_address downstream_address
udp_incoming_address upstream_address
udp_outgoing_address downstream_address

Fighting Banners and Popup Windows
It was already mentioned that most traffic from any site is graphics. Most browsers allow the image-viewing feature to be disabled; this, however, will make Web surfing less convenient. Without graphics, some sites become less informative and more difficult to navigate; thus, it is not possible to dispense with graphics display altogether.

But there is a type of graphics that irritates and does not carry any useful information, the kind we would love to, and can, get rid of. I am talking about banners. Consider how to disable banner display at the proxy server level. For this, first define the following rules in the squid.conf file:

acl banners_regex url_regex "/usr/etc/banners_regex"
acl banners_path_regex urlpath_regex "/usr/etc/banners_path_regex"
acl banners_exclusion url_regex "/usr/etc/banners_exclusion"

The first entry creates an ACL named banners_regex of the url_regex type that allows a complete URL to be searched. The last parameter specifies the /usr/etc/banners_regex file, in which the URLs of banner systems will be stored.

The second entry creates an ACL named banners_path_regex of the urlpath_regex type, which searches only the path part of a URL. The last parameter here specifies the /usr/etc/banners_path_regex file, in which the URL paths to be disallowed will be defined.

The third entry creates an ACL of the same type as the first one, named banners_exclusion and linked to the /usr/etc/banners_exclusion file. In the first two files, descriptions of URLs or templates to be used for killing banners will be stored. Sometimes, however, you may want to view a particular banner. In this case, its URL can be recorded in this file and the banner will be loaded.

Next, specify the following operators for the created ACLs:

http_access deny banners_path_regex !banners_exclusion
http_access deny banners_regex !banners_exclusion

Both directives do basically the same thing: They prohibit loading from the addresses that match the banners_path_regex and banners_regex lists unless they also match the banners_exclusion list.
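The effect of the two rules can be modeled in a few lines of Python. This is only a sketch of the decision logic, not of squid's implementation: the three pattern lists stand in for the three files, the patterns and URLs are invented for illustration, Python's re module replaces squid's regular expression library, and treating the query string as part of the path is an assumption.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stand-ins for the contents of the three pattern files.
BANNERS_REGEX = [r"^http://ads\.example\.com/"]      # tested against the whole URL
BANNERS_PATH_REGEX = [r"/banners/.+\.gif$"]          # tested against the path only
BANNERS_EXCLUSION = [r"^http://www\.example\.org/banners/logo\.gif$"]


def matches_any(patterns, text):
    """Return True if at least one pattern is found in text."""
    return any(re.search(p, text) for p in patterns)


def is_denied(url):
    """Model of 'http_access deny <banner ACL> !banners_exclusion'."""
    parts = urlsplit(url)
    # Assumption: the path ACL sees the path plus any query string.
    path = parts.path + ("?" + parts.query if parts.query else "")
    is_banner = matches_any(BANNERS_PATH_REGEX, path) or matches_any(BANNERS_REGEX, url)
    return is_banner and not matches_any(BANNERS_EXCLUSION, url)


if __name__ == "__main__":
    for u in ("http://ads.example.com/img/1.gif",
              "http://www.example.org/banners/top.gif",
              "http://www.example.org/banners/logo.gif",
              "http://www.example.org/index.html"):
        print(u, "->", "denied" if is_denied(u) else "allowed")
```

The third address matches the path template but is also on the exclusion list, so it is the only banner that gets through.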

Consider the following fragment of the contents of the /usr/etc/banners_regex file:

^http://members\.tripod\.com/adm/popup/.+html
^http://www\.geocities\.com/ad_container/pop\.html

As you should remember, this file contains URL templates, and all addresses that match them will be filtered out.

The first entry describes a template that prohibits loading of addresses of the following type:

http://members.tripod.com/adm/popup/popup.html

As you can see, it is easy to do away with the popup windows from the www.tripod.com site. If you know how to build regular expressions, you will be able to create a similar list for any banner system and cut off the most sophisticated paths of graphical pests. The subject of regular expressions is not considered in this book because it is too extensive and requires a book all to itself.
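Before a template goes into the file, it can be tried against a few sample addresses. The sketch below uses Python's re module as an approximation of squid's regular expression matching, which should be close enough for the constructs used here (the ^ anchor, escaped dots, and .+). The first sample URL comes from the text; the other two are invented for illustration.

```python
import re

# The two templates from the banners_regex fragment.
PATTERNS = [
    r"^http://members\.tripod\.com/adm/popup/.+html",
    r"^http://www\.geocities\.com/ad_container/pop\.html",
]


def is_filtered(url):
    """Return True if the URL matches any of the templates."""
    return any(re.search(p, url) for p in PATTERNS)


samples = [
    "http://members.tripod.com/adm/popup/popup.html",     # from the text
    "http://members.tripod.com/adm/popup/roadmap.shtml",  # invented: also ends in "html"
    "http://members.tripod.com/index.html",               # invented: outside /adm/popup/
]

for url in samples:
    print(("blocked " if is_filtered(url) else "allowed ") + url)
```

The second sample is blocked as well, because .+html accepts any characters before the final "html". Templates of this kind tend to match more than the single address they were written for, which is usually what you want for banners but is worth checking.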

In your fight with banners, be prepared for the resurrection of the banners you thought you had killed off. This is because banners are simply commercials allowing sites to earn money to stay in business. Some especially clever administrators are constantly looking for ways to prevent users from getting rid of banners. One of the ways they achieve this is by changing the addresses, from which the banners are served, to neutralize regular expressions.

Replacing Banners
Even though in most cases banners and popup windows are irritating pests, they provide some artistic dressing for pages. Having eliminated them, you may find pages dull and unattractive. This problem can be alleviated by replacing removed banners and popup windows with your own images, which are stored on the local server and, thus, do not have to be loaded from the Internet.

The tool to implement this task is a redirector. In squid, this is an external program that replaces addresses. For example, if the page code contains an address for a banner and your banner-filter program detects it, the redirector replaces the address of the other guy's banner with the address of whatever you may want to load in its place.

There is only one little problem with this: Linux has no ready program for this task and you will have to write it yourself. Any programming language will do, but I will show an example implemented in Perl. If you know how to program in this language, I am certain you will like replacing banners better than simply killing them using ACLs.

Listing 9.2 shows an example of a classic redirector program. I tried to simplify it as much as possible to make it easier to adapt for your needs.

Listing 9.2: Perl redirector program

#!/usr/bin/perl

$| = 1;

# Specify the URL on your Web server where the images
# are stored.
$YOURSITE = 'http://yourserver.com/squid';
$LOG = '/usr/etc/redirectlog';
$LAZY_WRITE = 1;

if ($LOG) {
    open LOG, ">> $LOG";
    # Unless lazy writing is enabled, flush the log after
    # every write.
    unless ($LAZY_WRITE) {
        select LOG;
        $| = 1;
        select STDOUT;
    }
}

# Comments have to stay outside the qw() lists: inside a list,
# every word of a comment would be treated as a pattern.

# Add descriptions of the 468 x 60 banners' URLs to this list.
@b468_60 = qw(
    www\.sitename\.com/cgi/
);

# Add descriptions of the 100 x 100 banners' URLs to this list.
@b100_100 = qw(
    www\.sitename\.com/cgi/
);

# Add descriptions of non-standard size banners' URLs to this list.
@various = qw(
    www\.sitename\.com/cgi/
);

# Add descriptions of popup windows' URLs to this list.
@popup_window = qw(
    ^http://members\.tripod\.com/adm/popup/.+html
    ^http://www\.geocities\.com/ad_container/pop\.html
    ^http://www\.geocities\.com/toto\?
);

# Descriptions of where images are located
$b468_60 = "$YOURSITE/468_60.gif";
$b100_100 = "$YOURSITE/100_100.gif";
$various = "$YOURSITE/empty.gif";
$closewindow = "$YOURSITE/close.htm";

while (<>)
{
    ($url, $who, $ident, $method) = /^(\S+) (\S+) (\S+) (\S+)$/;
    $prev = $url;

    # A check for 468 x 60 banners
    $url = $b468_60 if grep $url =~ m%$_%, @b468_60;

    # A check for 100 x 100 banners
    $url = $b100_100 if grep $url =~ m%$_%, @b100_100;

    # A check for non-standard size banners
    $url = $various if grep $url =~ m%$_%, @various;

    # A check for popup windows
    $url = $closewindow if grep $url =~ m%$_%, @popup_window;

    # An individual site not included in the list at the
    # beginning of the file
    $url = "$YOURSITE/empty.gif" if $url =~ m%hitbox\.com/Hitbox\?%;

    # Log every replacement as "day.month.year hour:min:sec".
    if ($LOG and $url ne $prev)
    {
        my ($sec, $min, $hour, $mday, $mon, $year) = localtime;
        printf LOG "%02d.%02d.%04d %02d:%02d:%02d: %s\n",
            $mday, $mon + 1, $year + 1900, $hour, $min, $sec,
            "$who $prev > $url";
    }
    print "$url $who $ident $method\n";
}
close LOG if $LOG;

Save this program in the /usr/etc/redirector file and give squid the rights to execute it. Afterward, add the following entry to the squid.conf file:

redirect_program /usr/etc/redirector

For the program to work, you will have to create the following files on your Web server:

468_60.gif — A 468 × 60 image.

100_100.gif — A 100 × 100 image.

empty.gif — An image that will replace all nonstandard banners. It is best to make it 1 × 1 pixels so that it does not spoil the aesthetics of the site's design.

close.htm — An HTML file to close popup windows. It calls the window.close() JavaScript function to close the window. Listing 9.3 shows an example of the contents of this file.

Listing 9.3: JavaScript for killing popup windows

All these files should be stored on the Web server in one directory. Don't forget to specify the correct path to this directory in the script's $YOURSITE variable.

I tried to explain the most important code areas in Listing 9.2 with comments. If you have Perl programming experience, you will have no problems making it all work.
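The redirector's contract with squid is simple: squid writes one request per line to the program's standard input, and the program answers with one line per request. To try a rewriting rule without running squid, the core of Listing 9.2 can be restated as a small Python function. This is a sketch under assumptions: it expects the same four-field input line that the Perl code parses (URL, client, ident, method), it contains only one of the patterns from the listing, and, unlike the Perl code, it returns a line that does not have four fields unchanged.

```python
import re

YOURSITE = "http://yourserver.com/squid"   # same placeholder as in Listing 9.2

# (pattern, replacement) pairs, checked in order; only one rule is shown.
RULES = [
    (re.compile(r"^http://members\.tripod\.com/adm/popup/.+html"),
     YOURSITE + "/close.htm"),
]

# URL, client address, ident, and method, separated by single spaces.
LINE_RE = re.compile(r"^(\S+) (\S+) (\S+) (\S+)$")


def rewrite_line(line):
    """Return the reply for one request line."""
    line = line.rstrip("\n")
    m = LINE_RE.match(line)
    if m is None:
        return line            # unexpected format: pass it through untouched
    url, who, ident, method = m.groups()
    for pattern, replacement in RULES:
        if pattern.search(url):
            url = replacement
            break
    return f"{url} {who} {ident} {method}"


if __name__ == "__main__":
    print(rewrite_line("http://members.tripod.com/adm/popup/popup.html 192.168.1.5/- - GET"))
    print(rewrite_line("http://www.example.com/index.html 192.168.1.5/- - GET"))
```

The first request comes back with the address of close.htm in place of the popup's address; the second is returned as it was sent.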

Barring Sites
I had a conversation with an acquaintance not long ago, and he offered a definition of the Internet that I found amusing: The World Wide Web was created for and lives by pornography. Although I do not completely agree with him, I feel he might be partially right in that the sites with sexy content are most frequently visited (if you don't take into account the Microsoft update site, from which users download patches for software from this company).

No employers will be happy if their workers visit sites with illicit content during work hours. This produces not only traffic waste but also other expenses unrelated to work. Parents do not want their children to be exposed to sites like these either, and they strive to shelter their sensibilities from too many facts of life. I am saying this as a father of two children.

Pornography sites can easily be banned using the same methods as those used to kill banners. For example, you could disallow any site whose URL contains the word "sex." But this method can produce false positives. For example, an address may contain the text "GasExpo." Because it contains a letter combination that spells "sex" when letter case is ignored, this site will be barred. This is a real-life example, in which a user was not allowed to load a gas-equipment exhibition site.
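The false positive is easy to reproduce. The following sketch compares a plain substring pattern with a slightly stricter one that requires the word not to be embedded in a longer run of letters. Both host names are invented, and the stricter pattern is only one possible refinement, not a recommendation from the squid documentation.

```python
import re

NAIVE = r"sex"
# Stricter: "sex" must not have a letter directly before or after it.
STRICTER = r"(^|[^a-z])sex([^a-z]|$)"


def is_blocked(pattern, url):
    """Case-insensitive search, done by lowercasing the URL first."""
    return re.search(pattern, url.lower()) is not None


for url in ("http://www.GasExpo.example/", "http://sex.example/"):
    print(url, is_blocked(NAIVE, url), is_blocked(STRICTER, url))
```

The stricter pattern lets the exhibition site through, but it also misses addresses in which the word is glued to other letters, so it trades one kind of error for another.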

Although creating lists of prohibited sites is a difficult task, it is not an impossible one. Currently, most sites of erotic persuasion have folded their activities in the com domain and are settling down in other domains, which usually belong to small island nations. In some of such domains, almost 90% of sites are of the adult entertainment nature. These you could bar without any fear that someone won't be able to catch up on the latest in the gas equipment developments.

Limiting Bandwidth
Frequently, when organizing Internet access some users have to be provided a high-speed connection. How can this be accomplished if, by default, all users are peers and can access the Internet at the maximum speed available? You have to establish some priorities to achieve this.

If a user requires a wide bandwidth channel to work with applications requiring a high data-exchange rate (e.g., for presentations), you have to reserve for this user a channel of wider bandwidth than that set aside for the rest of the users. This can be achieved only by borrowing bandwidth from other users.

Limiting the external channel is easy to accomplish using squid. The following example lists the directives used to achieve this:

delay_pools 3
delay_class 1 1
delay_class 2 2
delay_class 3 1
delay_parameters 1 256000/256000
delay_access 1 allow admins
delay_access 1 deny all
delay_parameters 2 256000/256000 4000/8000
delay_access 2 deny admins
delay_access 2 deny bigboss
delay_access 2 allow all
delay_parameters 3 64000/64000
delay_access 3 allow bigboss
delay_access 3 deny all

Add this code to the /etc/squid/squid.conf configuration file after the following comment:

# DELAY POOL PARAMETERS (all require DELAY_POOLS compilation option).
#--------------------------------------------------------------------

Most of the parameters are already set by default and have to be modified.

The first line, delay_pools n, specifies that n delay pools (rules describing access speeds) will be used. By default, n equals 0, which means that no delay pools are defined and bandwidth is not limited. Because you are going to create three pools, n is set to 3.

Next, the pools are actually created using the delay_class n c directive, where n is the pool number and c is the class number.

There are three different pool classes. These are the following:

1 — The download rates of all connections in the class are added together, and the aggregate is kept below a given maximum value. For example, you can limit the download speed from all adult entertainment sites (defined in advance using acl tag) to 32 Kb/sec. If your Internet connection bandwidth is, for example, 256 Kb/sec, no matter how many people try to download hot stuff, they will have only 32 Kb/sec to share, with the rest of the users guaranteed the remaining 224 Kb/sec of bandwidth.

2 — The aggregate bandwidth for all connections in the class and the bandwidth of each connection in the class is limited. For example, with a 256 Kb/sec Internet connection, you can limit a certain class of users to 128 Kb/sec and ensure that no single user gets more than his or her fair share of this bandwidth.

3 — The aggregate bandwidth for all connections and the bandwidth for each IP range and the bandwidth for each connection is limited. Suppose you have four IP ranges (subnetworks) in your local network and an Internet connection speed of 512 Kb/sec. You want to leave 64 Kb/sec available for mail and other service traffic. This leaves 512 - 64 = 448 Kb/sec for all four subnetworks. Each of the four subnetworks is further limited to about 112 Kb/sec. Each user of each subnetwork is then limited to his or her share of the subnetwork's bandwidth, the actual bandwidth depending on the number of users and their download habits.
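The arithmetic of the class 3 example can be written down as a short calculation. The function name is made up for this sketch, and the even split between subnetworks is the one used in the text.

```python
def split_class3(link_kbit, reserved_kbit, subnets):
    """Return (aggregate, per_subnet) limits in Kb/sec.

    link_kbit      total speed of the Internet connection
    reserved_kbit  bandwidth kept free for mail and other service traffic
    subnets        number of IP ranges that share the rest equally
    """
    aggregate = link_kbit - reserved_kbit
    return aggregate, aggregate / subnets


aggregate, per_subnet = split_class3(512, 64, 4)
print(aggregate, per_subnet)   # 448 112.0
```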

In the example, I used delay pools class 1, class 2, and class 1 again. I did it on purpose to make the example more illustrative.

Next, speed limits are set on each pool as follows:

delay_parameters delay_pool aggregate_bandwidth network_bandwidth user_bandwidth

The delay_pool parameter is the number of the pool whose bandwidth is being limited. In the example, the following line limits the bandwidth of the first pool:

delay_parameters 1 256000/256000

Because pool 1 is of the type 1 class (delay_class 1 1), only its aggregate bandwidth can be limited, so the directive takes only one bandwidth parameter: aggregate_bandwidth (the value 256000/256000). The parameter's value consists of two numbers separated by a slash. The first number is the actual speed limit (in bytes per second). The second number is the threshold, in bytes downloaded, at which this speed limit kicks in. For example, with a value of 4000/16000, the first 16,000 bytes of a large file will be downloaded at the normal speed, whatever it is. But then the limit will kick in and the remainder of the file will download at 4,000 bytes per second (32 Kb/sec).
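The two numbers behave like a bucket of bytes: the bucket starts full at the threshold value, a download empties it, and it is refilled at the speed limit. The following simulation is a simplified model of that idea in one-second steps. It assumes that the line itself is never the bottleneck and is not meant to reproduce squid's internal accounting.

```python
def simulate(restore, maximum, total_bytes, seconds):
    """Return the number of bytes delivered in each one-second step.

    restore      bytes added to the bucket per second (the speed limit)
    maximum      bucket size in bytes (the threshold)
    total_bytes  size of the download
    seconds      number of steps to simulate
    """
    bucket = maximum            # the bucket starts full
    remaining = total_bytes
    delivered = []
    for _ in range(seconds):
        sent = min(bucket, remaining)
        bucket -= sent
        remaining -= sent
        delivered.append(sent)
        bucket = min(maximum, bucket + restore)   # refill, never above the maximum
    return delivered


# A 40,000-byte download with a value of 4000/16000:
print(simulate(4000, 16000, 40000, 5))   # [16000, 4000, 4000, 4000, 4000]
```

The first 16,000 bytes pass at once, and the rest trickles through at 4,000 bytes per second, as described above.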

The number of parameters depends on the pool class. For a class 1 pool, which limits only the aggregate connection bandwidth, two parameters have to be specified, the pool number and the aggregate bandwidth:

delay_parameters delay_pool aggregate_bandwidth

The directive for the second pool class looks as follows:

delay_parameters delay_pool aggregate_bandwidth user_bandwidth

Thus, the first directive sets the aggregate bandwidth of all connections to 256,000 bytes per second (about 2 Mb/sec). No bandwidth limit is imposed if the value is specified as -1.
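Because the directive takes bytes per second while connection speeds are usually quoted in bits, a pair of conversion helpers saves mistakes. The helper names are made up, and 1 Kb is taken as 1,000 bits, which is the convention the figures in the text follow (4,000 bytes per second equals 32 Kb/sec).

```python
def bytes_to_kbit(bytes_per_sec):
    """Convert bytes per second to kilobits per second (1 Kb = 1000 bits)."""
    return bytes_per_sec * 8 / 1000


def kbit_to_bytes(kbit_per_sec):
    """Convert kilobits per second to whole bytes per second."""
    return kbit_per_sec * 1000 // 8


for value in (256000, 64000, 4000):
    print(value, "bytes/sec =", bytes_to_kbit(value), "Kb/sec")

print(kbit_to_bytes(32))   # 4000
```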

After the bandwidth limitations for the first pool are specified, access rights to the pool are set by the delay_access directive as follows:

delay_access delay_pool allow|deny acl

The first parameter is the pool number. This is followed by the allow or the deny option for the members of the list, given as the last parameter (acl).

In the example, access rights to pool 1 are set for two groups, admins and all:

delay_access 1 allow admins
delay_access 1 deny all

Squid checks the delay_access rules of a pool from top to bottom and applies the first one that matches, so the allow rule has to come before deny all. The first directive gives the members of the admins ACL access to the given bandwidth, and the second bars all other users from it. It is assumed that only administrators are members of this ACL.

Next, the bandwidth limitations and access rights for the second pool are described:

delay_parameters 2 256000/256000 4000/8000
delay_access 2 deny admins
delay_access 2 deny bigboss
delay_access 2 allow all

The second pool is of the type 2 class. Here, the aggregate bandwidth limitation is specified (256,000 bytes per second), as well as the bandwidth limitation for individual connections (4,000 bytes per second). All users but the administrators and the boss will work at this speed. The two deny lines come first because squid applies the first delay_access rule that matches; without them, allow all would also put the administrators and the boss into this pool.

Finally, there could be some problems if you limit the boss to the bandwidth of 4,000 bytes per second like a regular user. To avoid potential problems, separate permission is given to the boss as follows:

delay_parameters 3 64000/64000
delay_access 3 allow bigboss
delay_access 3 deny all
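Squid checks the pools in numerical order and, within a pool, applies the first delay_access rule that matches, so the order of the lines decides which pool a request ends up in. The following Python model is a sketch of that selection rule only. It assumes that every pool's rule list ends with a rule for all, so that some rule always matches, and it represents a request simply by the set of ACL names it belongs to.

```python
def pool_for(request_acls, pools):
    """Return the number of the first pool that allows the request, or None.

    request_acls  set of ACL names the request matches ("all" is added)
    pools         list of (pool_number, rules); rules is an ordered list of
                  (action, acl_name) pairs, and the first matching rule wins
    """
    acls = set(request_acls) | {"all"}
    for number, rules in pools:
        for action, acl in rules:
            if acl in acls:
                if action == "allow":
                    return number
                break          # first match was a deny: go on to the next pool
    return None


DENY_FIRST = [(1, [("deny", "all"), ("allow", "admins")])]
ALLOW_FIRST = [(1, [("allow", "admins"), ("deny", "all")])]

THREE_POOLS = [
    (1, [("allow", "admins"), ("deny", "all")]),
    (2, [("deny", "admins"), ("deny", "bigboss"), ("allow", "all")]),
    (3, [("allow", "bigboss"), ("deny", "all")]),
]

print(pool_for({"admins"}, DENY_FIRST))    # None
print(pool_for({"admins"}, ALLOW_FIRST))   # 1
print(pool_for({"bigboss"}, THREE_POOLS))  # 3
print(pool_for(set(), THREE_POOLS))        # 2
```

With deny all in front, the allow rule for admins is never reached, so the more specific rule has to be listed first. The same reasoning explains why pool 2 has to turn the boss away: otherwise its allow all would catch the boss before pool 3 is ever consulted.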

The bandwidth limitation feature can be used to bar loading of multimedia files during work hours. Listing 9.4 shows how to configure squid to read Web pages at regular speeds but to limit speeds for loading media files during work hours.

Listing 9.4: Limiting speed for loading media during work hours

# ACL matching URLs that contain the local network's address
acl fullspeed url_regex -i 192.168.1
# ACL describing media files that must put the brakes on
# during work hours
acl mediaspeed url_regex -i ftp .exe .mp3 .avi .mpeg .iso .wav
# The period, during which the restriction on the
# download speed of media files applies
acl day time 08:00-18:59

# Two second-class pools are needed.
delay_pools 2
delay_class 1 2
delay_class 2 2

# The first pool has no restrictions for anyone.
delay_parameters 1 -1/-1 -1/-1
delay_access 1 allow fullspeed

# The second pool restricts the speed of media downloads
# during the day. ACLs listed on one line must all match.
delay_parameters 2 4000/100000 4000/100000
delay_access 2 allow day mediaspeed
delay_access 2 deny all

I believe the comments to the code are sufficient to understand how it functions. The media file download speed, however, is limited for all users. If you want to exempt certain users from this restriction, you can create an ACL for them (named, for example, allowfull) and add the following line before the other delay_access 2 lines, because squid applies the first rule that matches:

delay_access 2 deny allowfull
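The intended effect of the listing, slowing down media files but only during work hours, can be checked with a small model. The sketch also shows a side effect of the media patterns: an unescaped dot matches any character, and ftp matches anywhere in the address. The stricter pattern list is a suggestion made for this example, all URLs are invented, and treating the time range as inclusive at both ends is an assumption.

```python
import re
from datetime import time

# Patterns as written in the listing, and a stricter variant (an assumption).
MEDIA_LOOSE = ["ftp", ".exe", ".mp3", ".avi", ".mpeg", ".iso", ".wav"]
MEDIA_STRICT = [r"^ftp://", r"\.(exe|mp3|avi|mpeg|iso|wav)$"]


def matches(patterns, url):
    """Case-insensitive search, like url_regex with the -i option."""
    return any(re.search(p, url, re.IGNORECASE) for p in patterns)


def in_work_hours(t, start=time(8, 0), end=time(18, 59)):
    """Model of 'acl day time 08:00-18:59'."""
    return start <= t <= end


def is_throttled(url, t, patterns):
    """A request is slowed down only if both conditions hold."""
    return in_work_hours(t) and matches(patterns, url)


song = "http://www.example.com/music/song.mp3"
page = "http://www.example.com/giftpack/index.html"   # contains "ftp" by accident

print(is_throttled(song, time(10, 0), MEDIA_LOOSE))    # True
print(is_throttled(song, time(20, 0), MEDIA_LOOSE))    # False
print(is_throttled(page, time(10, 0), MEDIA_LOOSE))    # True
print(is_throttled(page, time(10, 0), MEDIA_STRICT))   # False
```

The ordinary page is slowed down by the loose patterns only because its address happens to contain the letters "ftp"; anchoring the patterns avoids that, at the price of missing media addresses that carry a query string after the file name.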