Setting up a proxy server to save on WWT bandwidth usage

At the AAS233 meeting, we ran several WWT workshops where lots of people were using WWT at the same time in the same room. Traditionally, the WiFi at these meetings has been very poor, leading to a very unpleasant experience as everyone struggles to use the WWT webclient with extremely limited bandwidth. It’s frustrating for everyone involved.

To mitigate this, I decided to try setting up a local HTTP proxy server. The idea was that if everyone channeled their web traffic through the proxy, we might get some significant bandwidth savings: if 30 people are trying to pull down the same file at once, the proxy will (ideally) collapse those requests into one fetch from the Internet, cutting the traffic on the venue’s Internet connection by a factor of 30.

“Unfortunately” the WiFi was actually very good this year, so we didn’t use the proxy! But I thought I’d document what I did just in case it comes in handy to someone else. Please keep in mind, however, that this recipe is not battle-tested!

Proxy Setup

I ended up using squid, of course. In particular, I used the datadog/squid Docker image, even though squid shouldn’t be hard to install on any reasonable Linux distro. The choice of image ended up being important: with the current WWT HTTP behavior, squid 4.x is able to cache very few assets locally, making the proxy essentially useless. The Datadog image contains squid 3.x, which you can force to cache more aggressively, and that ends up being a lot more helpful.

In particular, WWT currently serves its DSS tiles with HTTP headers that prevent caching. After some experimentation, I ended up devising the following squid.conf to force the proxy to save them, and a few other common assets too. The first half of the file is basically Squid boilerplate.

# Default ACL stuff. In production, you'll probably need to add IP ranges here!
acl localnet src	# RFC1918 possible internal network
acl localnet src	# RFC1918 possible internal network
acl localnet src	# RFC1918 possible internal network
acl localnet src fc00::/7       # RFC 4193 local private network range
acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines

acl SSL_ports port 443
acl Safe_ports port 80		# http
acl Safe_ports port 21		# ftp
acl Safe_ports port 443		# https
acl Safe_ports port 70		# gopher
acl Safe_ports port 210		# wais
acl Safe_ports port 1025-65535	# unregistered ports
acl Safe_ports port 280		# http-mgmt
acl Safe_ports port 488		# gss-http
acl Safe_ports port 591		# filemaker
acl Safe_ports port 777		# multiling http

http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost manager
http_access deny manager
http_access deny to_localhost
http_access allow localnet
http_access allow localhost
http_access deny all

# Other defaults:
http_port 3128
cache_dir ufs /var/spool/squid 100 16 256
coredump_dir /var/spool/squid

# More URL detail in logs:
logformat squid      %ts.%03tu %6tr %>a %Ss/%03>Hs %<st %rm %>ru %[un %Sh/%<a %mt

# Force caching of various static assets
refresh_pattern ^http://(www\.)?worldwidetelescope\.org/wwtweb/dss\.aspx\?q= 144000 100% 144000 ignore-private
refresh_pattern ^http://(www\.)?worldwidetelescope\.org/wwtweb/catalog\.aspx\? 144000 100% 144000 ignore-private ignore-reload
refresh_pattern ^http://(www\.)?worldwidetelescope\.org/wwtweb/thumbnail\.aspx\? 144000 100% 144000 ignore-private
refresh_pattern .		0	20%	4320

# Try to get us to cache searchdata.min.js
quick_abort_min -1 KB
quick_abort_max 100 MB

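The “acl localnet src” lines at the top of the file are left without addresses. In production, you’ll need to fill in the IP ranges of your local network there (for instance the RFC 1918 private ranges 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16) to tell squid to allow requests from your local users.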

You can then use a Docker volume to launch squid in the container using your customized file:

docker run --rm --name squid -d -p 3128:3128 -v "$(pwd)/squid.conf:/etc/squid/squid.conf:rw,z" datadog/squid
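
Once the proxy has seen some traffic, it’s handy to know whether it is actually serving anything from its cache. Here’s a minimal sketch of a shell helper that tallies hits and misses from access-log lines. It assumes the logformat defined in the squid.conf above (the fourth field is the squid result code and the fifth is the reply size in bytes). The function name summarize_squid_log is made up for illustration, and like the rest of this recipe it is not battle-tested.

```shell
#!/usr/bin/env bash
# Summarize cache hits vs. misses from squid access-log lines read on stdin.
# Assumption: lines follow the logformat in the squid.conf above, so that
#   field 4 = "<squid result code>/<HTTP status>", e.g. TCP_MISS/200
#   field 5 = reply size in bytes
# summarize_squid_log is a hypothetical helper name, not part of squid.
summarize_squid_log() {
  awk '
    {
      # Split "TCP_MEM_HIT/200" into the result code and the HTTP status.
      split($4, status, "/")
      # Any result code containing HIT (TCP_HIT, TCP_MEM_HIT, ...) counts as a hit.
      if (status[1] ~ /HIT/) { hits++;   hit_bytes  += $5 }
      else                   { misses++; miss_bytes += $5 }
    }
    END {
      printf "hits=%d misses=%d hit_bytes=%d miss_bytes=%d\n",
             hits, misses, hit_bytes, miss_bytes
    }
  '
}
```

Something like "docker exec squid cat /var/log/squid/access.log | summarize_squid_log" should then print a one-line summary, but the log path inside the Datadog image is an assumption on my part, so check where your container actually writes its access log.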

Local Network Setup

Once you have a proxy running, the next step would be to set up your local network so that your users’ computers are able to talk to it. I’m not going to document this step, though, since it’s always going to be deployment-specific. Basically, you need a way for all of your clients to talk to the squid server over a fast local connection, while the squid server needs a way to talk to the wider Internet over as fast a connection as you can wrangle.

User Setup

Finally, the users would need to set up their computers to talk to the proxy. Here is a Google Doc that I put together for AAS233 with instructions. However, I never used it because the standard WiFi performance was far, far better than usual, so there is probably room for improvement here.
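
For command-line tools, pointing a client at the proxy and checking that it works can be sketched roughly as follows. This is a hedged example that is independent of the Google Doc: proxy_url and cache_status are hypothetical helper names, 192.168.1.10 is a placeholder address, and the check relies on the X-Cache response header that squid 3.x normally adds. Web browsers generally take their proxy settings from the browser or operating system preferences rather than from environment variables, so this only covers tools like curl.

```shell
#!/usr/bin/env bash
# Hypothetical client-side helpers; the host and port are placeholders
# for whatever your deployment uses.

# Print the proxy URL for a host and optional port.
# The default of 3128 matches http_port in the squid.conf above.
proxy_url() {
  local host="$1" port="${2:-3128}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# Read HTTP response headers on stdin and print HIT, MISS, or UNKNOWN,
# based on an "X-Cache: HIT from <host>" style header if one is present.
cache_status() {
  awk '
    BEGIN { result = "UNKNOWN" }
    tolower($1) == "x-cache:" { result = toupper($2) }
    END { print result }
  '
}

# Example usage (commented out because it needs a running proxy):
#   export http_proxy="$(proxy_url 192.168.1.10)"
#   curl -s -o /dev/null -D - "$SOME_HTTP_URL" | cache_status
# Fetching the same URL twice should ideally print MISS and then HIT.
```

Note that only plain http:// requests can be cached this way. HTTPS traffic passes through the proxy as an opaque CONNECT tunnel.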