A caching service is a set of reverse proxies that keep a smaller cache of content in memory to speed up access to resources on a slower backend web server.

Tutorial

To inspect the current cache hit ratio, head over to the cache health dashboard in grafana. It should be at least 75% and generally over or close to 90%.

How-to

Traffic inspection

A quick way to see how much traffic is flowing through the cache is to fire up slurm on the public interface of the caching server (currently cache01 and cache-02):

slurm -i eth0

This will display a realtime graphic of the traffic going in and out of the server. It should be below 1Gbit/s (or around 120MB/s).
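The rule of thumb above is just unit arithmetic; a quick sketch to double-check it (1Gbit/s in decimal units is 125MB/s, close to the ~120MB/s figure once protocol overhead is accounted for):

```shell
# convert a link speed in Gbit/s to MB/s (decimal units: 1 MB = 10^6 bytes)
gbits=1
awk -v g="$gbits" 'BEGIN { printf "%.1f MB/s\n", g * 1000 / 8 }'
# prints: 125.0 MB/s
```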

Another way to see throughput is to use iftop, in a similar way:

iftop -i eth0 -n

This will show per host traffic statistics, which might allow pinpointing possible abusers. Hit the L key to turn on the logarithmic scale, without which the display quickly becomes unreadable.

Log files are in /var/log/nginx (although those might eventually go away, see ticket #32461). The lnav program can be used to show those log files in a pretty way and do extensive queries on them. Hit the i button to flip to the "histogram" view and z multiple times to zoom all the way into a per-second hit rate view. Hit q to go back to the normal view, which is useful to inspect individual hits and diagnose why they fail to be cached, for example.

Immediate hit ratio can be extracted from lnav thanks to our custom log parser shipped through Puppet. Load the log file in lnav:

lnav /var/log/nginx/ssl.blog.torproject.org.access.log

then hit ; to enter the SQL query mode and issue this query:

SELECT count(*), upstream_cache_status FROM logline WHERE status_code < 300 GROUP BY upstream_cache_status;
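A similar tally can be done with standard shell tools, assuming the upstream_cache_status values (HIT, MISS, etc.) appear as bare words in the log lines, as they do in our cacheprivacy format. A sketch, run here against fabricated sample lines rather than a real log file:

```shell
# count cache statuses in an access log and print a rough hit ratio.
# the sample lines below are fabricated; on a real server, point the
# pipeline at /var/log/nginx/ssl.blog.torproject.org.access.log instead
printf '%s\n' \
  '... HIT 0.002' \
  '... MISS 0.130' \
  '... HIT 0.001' \
  '... HIT 0.003' |
grep -oE 'HIT|MISS|EXPIRED|UPDATING|STALE' |
sort | uniq -c |
awk '{ n[$2] = $1; total += $1 }
     END { printf "hits: %d/%d (%.0f%%)\n", n["HIT"], total, 100 * n["HIT"] / total }'
# prints: hits: 3/4 (75%)
```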

See also logging for more information about lnav.

Pager playbook

The only monitoring for this service checks that the proper number of nginx processes is running. If this gets triggered, the fix might be to just restart nginx:

service nginx restart

... although it might be a sign of a deeper issue requiring further traffic inspection.

Disaster recovery

In case of fire, head to the torproject.org zone in dns/domains and flip the DNS record of the affected service back to the backend. See ticket #32239 for details on that.

TODO: this could be improved. How to deal with DDOS? Memory, disk exhaustion? Performance issues?

Reference

Installation

Include roles::cache in Puppet.

TODO: document how to add new sites in the cache. See ticket #32462 for that project.

SLA

The service should generally stay online as much as possible, because it fronts critical web sites for the Tor Project, but otherwise shouldn't especially differ from other services' SLAs.

Hit ratio should be high enough to reduce costs significantly on the backend.

Design

The cache service generally consists of two or more servers in geographically distinct areas that run a webserver acting as a reverse proxy. In our case, we run the Nginx webserver with the proxy module for the https://blog.torproject.org/ website (and eventually others, see ticket #32462). One server is in the ganeti cluster, and another is a VM in the Hetzner Cloud (2.50EUR/mth).

DNS for the site points to cache.torproject.org, an alias for the caching servers, which are currently two: cache01.torproject.org [sic] and cache-02. An HTTPS certificate for the site was issued through letsencrypt. Like the Nginx configuration, the certificate is deployed by Puppet in the roles::cache class.

When a user hits the cache server, content is served from the cache stored in /var/cache/nginx, with a filename derived from the proxy_cache_key and proxy_cache_path settings. Those files should end up being cached by the kernel in virtual memory, which should make those accesses fast. If the cache is present and valid, it is returned directly to the user. If it is missing or invalid, it is fetched from the backend immediately. The backend is configured in Puppet as well.
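For reference, the on-disk file name can be derived by hand: nginx stores each entry under the MD5 of the proxy_cache_key, and with levels=1:2 the last character of the hash and the two characters before it become subdirectories. A sketch with a made-up key (the real key depends on the proxy_cache_key setting):

```shell
# derive the cache file path nginx would use for a given cache key,
# assuming proxy_cache_path ... levels=1:2 (the key here is a made-up example)
key='https://blog.torproject.org/'
md5=$(printf '%s' "$key" | md5sum | awk '{ print $1 }')
# levels=1:2: first level is the last hash character,
# second level is the two characters before it
l1=$(printf '%s' "$md5" | tail -c 1)
l2=$(printf '%s' "$md5" | tail -c 3 | head -c 2)
echo "/var/cache/nginx/$l1/$l2/$md5"
```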

Requests to the cache are logged to the disk in /var/log/nginx/ssl.$hostname.access.log, with IP address and user agent removed. Then mtail parses those log files and increments various counters and exposes those as metrics that are then scraped by prometheus. We use grafana to display that hit ratio which, at the time of writing, is about 88% for the blog.

Puppet architecture

Because the Puppet code isn't public yet (ticket #29387), here's a quick overview of how we set things up for others to follow.

The entry point in Puppet is the roles::cache class, which configures an "Nginx server" (like an Apache vhost) to do the caching of the backend. It also includes our common Nginx configuration in profile::nginx which in turns delegates most of the configuration to the Voxpupuli Nginx Module.

The role essentially consists of:

include profile::nginx

nginx::resource::server { 'blog.torproject.org':
  ssl_cert              => '/etc/ssl/torproject/certs/blog.torproject.org.crt-chained',
  ssl_key               => '/etc/ssl/private/blog.torproject.org.key',
  proxy                 => 'https://live-tor-blog-8.pantheonsite.io',
  # no serviceable parts below
  ipv6_enable           => true,
  ipv6_listen_options   => '',
  ssl                   => true,
  # part of HSTS configuration, the other bit is in add_header below
  ssl_redirect          => true,
  # proxy configuration
  #
  # pass the Host header to the backend (otherwise the proxy URL above is used)
  proxy_set_header      => ['Host $host'],
  # should map to a cache zone defined in the nginx profile
  proxy_cache           => 'default',
  # start caching redirects and 404s. this code is taken from the
  # upstream documentation in
  # https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_valid
  proxy_cache_valid     => [
    '200 302 10m',
    '301      1h',
    'any 1m',
  ],
  # allow serving stale content on error, timeout, or refresh
  proxy_cache_use_stale => 'error timeout updating',
  # allow only first request through backend
  proxy_cache_lock      => 'on',
  # purge headers from backend we will override. X-Served-By and Via
  # are merged into the Via header, as per rfc7230 section 5.7.1
  proxy_hide_header     => ['Strict-Transport-Security', 'Via', 'X-Served-By'],
  add_header            => {
    # this is a rough equivalent to Varnish's Age header: it shows
    # when the page was cached, instead of its age
    'X-Cache-Date'              => '$upstream_http_date',
    # if this was served from cache
    'X-Cache-Status'            => '$upstream_cache_status',
    # replace the Via header with ours
    'Via'                       => '$server_protocol $server_name',
    # cargo-culted from Apache's configuration
    'Strict-Transport-Security' => 'max-age=15768000; preload',
  },
  # cache 304 not modified entries
  raw_append            => "proxy_cache_revalidate on;\n",
  # caches shouldn't log, because it is too slow
  #access_log            => 'off',
  format_log            => 'cacheprivacy',
}

There are also firewall (to open the monitoring, HTTP and HTTPS ports) and mtail (to read the log files for hit ratios) configurations, but those are not essential to get Nginx itself working.

The profile::nginx class is our common Nginx configuration that also covers non-caching setups:

# common nginx configuration
#
# @param client_max_body_size max upload size on this server. upstream
#                             default is 1m, see:
#                             https://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size
class profile::nginx(
  Optional[String] $client_max_body_size = '1m',
) {
  include webserver
  class { 'nginx':
    confd_purge           => true,
    server_purge          => true,
    manage_repo           => false,
    http2                 => 'on',
    server_tokens         => 'off',
    package_flavor        => 'light',
    log_format            => {
      # built-in, according to: http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format
      # 'combined' => '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'

      # "privacy" censors the client IP address from logs, taken from
      # the Apache config, minus the "day" granularity because of
      # limitations in nginx. we remove the IP address and user agent
      # but keep the original request time, in other words.
      'privacy'      => '0.0.0.0 - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "-"',

      # the "cache" formats adds information about the backend, namely:
      # upstream_addr - address and port of upstream server (string)
      # upstream_response_time - total time spent talking to the backend server, in seconds (float)
      # upstream_cache_status - state of the cache (MISS, HIT, UPDATING, etc)
      # request_time - total time spent answering this query, in seconds (float)
      'cache'        => '$server_name:$server_port $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $upstream_addr $upstream_response_time $upstream_cache_status $request_time',  #lint:ignore:140chars
      'cacheprivacy' => '$server_name:$server_port 0.0.0.0 - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "-" $upstream_addr $upstream_response_time $upstream_cache_status $request_time',  #lint:ignore:140chars
    },
    # XXX: doesn't work because a default is specified in the
    # class. doesn't matter much because the puppet module reuses
    # upstream default.
    worker_rlimit_nofile  => undef,
    accept_mutex          => 'off',
    # XXX: doesn't work because a default is specified in the
    # class. but that doesn't matter because accept_mutex is off so
    # this has no effect
    accept_mutex_delay    => undef,
    http_tcp_nopush       => 'on',
    gzip                  => 'on',
    client_max_body_size  => $client_max_body_size,
    run_dir               => '/run/nginx',
    client_body_temp_path => '/run/nginx/client_body_temp',
    proxy_temp_path       => '/run/nginx/proxy_temp',
    proxy_connect_timeout => '60s',
    proxy_read_timeout    => '60s',
    proxy_send_timeout    => '60s',
    proxy_cache_path      => '/var/cache/nginx/',
    proxy_cache_levels    => '1:2',
    proxy_cache_keys_zone => 'default:10m',
    # XXX: hardcoded, should just let nginx figure it out
    proxy_cache_max_size  => '15g',
    proxy_cache_inactive  => '24h',
    ssl_protocols         => 'TLSv1 TLSv1.1 TLSv1.2 TLSv1.3',
    # XXX: from the apache module see also https://trac.torproject.org/projects/tor/ticket/32351
    ssl_ciphers           => 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS', # lint:ignore:140chars
  }
  # recreate the default vhost
  nginx::resource::server { 'default':
    server_name         => ['_'],
    www_root            => "/srv/www/${webserver::defaultpage::defaultdomain}/htdocs/",
    listen_options      => 'default_server',
    ipv6_enable         => true,
    ipv6_listen_options => 'default_server',
    # XXX: until we have an anonymous log format
    access_log          => 'off',
    ssl                 => true,
    ssl_redirect        => true,
    ssl_cert            => '/etc/ssl/torproject-auto/servercerts/thishost.crt',
    ssl_key             => '/etc/ssl/torproject-auto/serverkeys/thishost.key';
  }
}

There are lots of config settings there, but most are provided only to reduce the diff between the upstream Debian package defaults and the Nginx module from the Forge. This was filed upstream as a bug.

Issues

Only serious issues, or issues that are not in the cache component but still relevant to the service, are listed here:

  • the cipher suite is an old hardcoded copy derived from Apache, see ticket #32351
  • the Nginx puppet module diverges needlessly from upstream and Debian package configuration, see puppet-nginx-1359

The service was launched as part of improvements to the blog infrastructure, in ticket #32090. The launch checklist and progress were tracked in ticket #32239.

File or search for issues in the services - cache component.

Sources

Discussion

This section collects notes that were gathered during the research, configuration, and deployment of the service. That includes goals, cost, benchmarks and configuration samples.

Launch was done in the first week of November 2019 as part of ticket #32239, to front the https://blog.torproject.org/ site.

Overview

The original goal of this project was to create a pair of caching servers in front of the blog to reduce the bandwidth costs we're being charged there.

Goals

Must have

  • reduce the traffic on the blog, hosted at a costly provider (#32090)
  • HTTPS support in the frontend and backend
  • deployment through Puppet
  • anonymized logs
  • hit rate stats

Nice to have

  • provide a frontend for our existing mirror infrastructure, a home-made CDN for TBB and other releases
  • no on-disk logs
  • cute dashboard or grafana integration
  • well-maintained upstream Puppet module

Approvals required

  • approved and requested by vegas

Non-Goals

  • global CDN for users outside of TPO
  • geoDNS

Cost

Somewhere between 11EUR and 100EUR/mth for bandwidth and hardware.

We apparently get around 2.2M "page views" per month at Pantheon. That is about 1 hit per second and 12 terabytes per month, or 36Mbit/s on average:

$ qalc
> 2 200 000 ∕ (30d) to hertz

  2200000 / (30 * day) = approx. 0.84876543 Hz

> 2 200 000 * 5Mibyte

  2200000 * (5 * mebibyte) = 11.534336 terabytes

> 2 200 000 * 5Mibyte/(30d) to megabit / s

  (2200000 * (5 * mebibyte)) / (30 * day) = approx. 35.599802 megabits / s
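The same arithmetic can be reproduced with plain awk, for those without qalc at hand:

```shell
# 2.2M requests/month: request rate, monthly volume at 5MiB/page,
# and average bandwidth, reproducing the three qalc results above
awk 'BEGIN {
  req   = 2200000
  page  = 5 * 1024 * 1024    # bytes per page (5 MiB)
  month = 30 * 24 * 3600     # seconds in 30 days
  printf "rate:      %.2f req/s\n",    req / month
  printf "volume:    %.2f TB/month\n", req * page / 1e12
  printf "bandwidth: %.2f Mbit/s\n",   req * page * 8 / month / 1e6
}'
# prints: rate:      0.85 req/s
#         volume:    11.53 TB/month
#         bandwidth: 35.60 Mbit/s
```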

Hetzner charges 1EUR/TB/month over our 1TB quota, so bandwidth would cost 11EUR/month on average. If costs become prohibitive, we could switch to a Hetzner VM which includes 20TB of traffic per month at costs ranging from 3EUR/mth to 30EUR/mth depending on the VPS size (between 1 vCPU, 2GB ram, 20GB SSD and 8vCPU, 32GB ram and 240GB SSD).

Dedicated servers start at 34EUR/mth (EX42, 64GB ram 2x4TB HDD) for unlimited gigabit.

We first went with a virtual machine in the ganeti cluster and also a VM in the Hetzner Cloud (2.50EUR/mth).

Proposed Solution

Nginx will be deployed on two servers. ATS was found to be somewhat difficult to configure and debug, while Nginx has a more "regular" configuration file format. Furthermore, performance was equivalent or better in Nginx.

Finally, there is the possibility of converging all HTTP services towards Nginx if desired, which would reduce the number of moving parts in the infrastructure.

Benchmark results overview

Hits per second:

Server        AB    Siege   Bombardier   B. HTTP/1
Upstream      n/a   n/a     2800         n/a
ATS, local    800   569     n/a          n/a
ATS, remote   249   241     2050         1322
Nginx         324   269     2117         n/a

Throughput (megabyte/s):

Server        AB    Siege   Bombardier   B. HTTP/1
Upstream      n/a   n/a     145          n/a
ATS, local    42    5       n/a          n/a
ATS, remote   13    2       105          14
Nginx         17    14      107          n/a

Launch checklist

See #32239 for a followup on the launch procedure.

Benchmarking procedures

Will require a test VM (or two?) to hit the caches.

Common procedure

  1. punch a hole in the firewall to allow cache2 to access cache1

    iptables -I INPUT -s 78.47.61.104 -j ACCEPT
    ip6tables -I INPUT -s 2a01:4f8:c010:25ff::1 -j ACCEPT
    
  2. point the blog to cache1 on cache2 in /etc/hosts:

    116.202.120.172 blog.torproject.org
    2a01:4f8:fff0:4f:266:37ff:fe26:d6e1 blog.torproject.org
    
  3. disable Puppet:

    puppet agent --disable 'benchmarking requires /etc/hosts override'
    
  4. launch the benchmark

Siege

Siege configuration sample:

verbose = false
fullurl = true
concurrent = 100
time = 2M
url = http://www.example.com/
delay = 1
internet = false
benchmark = true

Might require this, which might work only with varnish:

proxy-host = 209.44.112.101
proxy-port = 80

Alternative is to hack /etc/hosts.

apachebench

Classic commandline:

ab2 -n 1000 -c 100 -X cache01.torproject.org https://example.com/

-X also doesn't work with ATS, so we hacked /etc/hosts instead.

bombardier

Unfortunately, the bombardier package in Debian is not the HTTP benchmarking tool but a commandline game. It's still possible to install it in Debian with:

export GOPATH=$HOME/go
apt install golang
go get -v github.com/codesenberg/bombardier

Then running the benchmark is as simple as:

./go/bin/bombardier --duration=2m --latencies https://blog.torproject.org/

Baseline benchmark, from cache02:

anarcat@cache-02:~$ ./go/bin/bombardier --duration=2m --latencies https://blog.torproject.org/  -c 100
Bombarding https://blog.torproject.org:443/ for 2m0s using 100 connection(s)
[================================================================================================================================================================] 2m0s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec      2796.01     716.69    6891.48
  Latency       35.96ms    22.59ms      1.02s
  Latency Distribution
     50%    33.07ms
     75%    40.06ms
     90%    47.91ms
     95%    54.66ms
     99%    75.69ms
  HTTP codes:
    1xx - 0, 2xx - 333646, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   144.79MB/s

This is strangely much higher in throughput, and lower in latency, than testing against our own servers. Different avenues were explored to explain that disparity with our servers:

  • jumbo frames? nope, both connections see packets larger than 1500 bytes
  • protocol differences? nope, both go over IPv6 and (probably) HTTP/2 (at least not over UDP)
  • different link speeds

The last theory is currently the only one standing. Indeed, 144.79MB/s should not be possible on regular gigabit ethernet (GigE), as it is actually more than 1000Mbit/s (1158.32Mbit/s). Sometimes the above benchmark even gives 152MB/s (1222Mbit/s), way beyond what a regular GigE link should be able to provide.
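The conversion behind that number is simply bytes to bits:

```shell
# convert a measured throughput in MB/s to Mbit/s (8 bits per byte)
awk -v m=144.79 'BEGIN { printf "%.2f MB/s = %.2f Mbit/s\n", m, m * 8 }'
# prints: 144.79 MB/s = 1158.32 Mbit/s
```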

Other tools

Siege has trouble going above ~100 concurrent clients because of its design (and ulimit) limitations. Its interactive features are also limited; here's a set of interesting alternatives:

  • bombardier - golang, HTTP/2, better performance than siege in my (2017) tests, not in debian
  • boom - python rewrite of apachebench, supports duration, HTTP/2, not in debian, unsearchable name
  • go-wrk - golang rewrite of wrk with HTTPS, had performance issues in my first tests (2017), no duration target, not in Debian
  • hey - golang rewrite of apachebench, similar to boom, not in debian (ITP #943596), unsearchable name
  • Jmeter - interactive behavior, can replay recorded sessions from browsers
  • Locust - distributed, can model login and interactive behavior, not in Debian
  • Tsung - multi-protocol, distributed, erlang
  • wrk - multithreaded, epoll, Lua scriptable, no HTTPS, only in Debian unstable

Alternatives considered

Four alternatives were seriously considered:

  • Apache Traffic Server
  • Nginx proxying + caching
  • Varnish + stunnel
  • Fastly

Other alternatives were not.

Apache Traffic Server

Summary of online reviews

Pros:

  • HTTPS
  • HTTP/2
  • industry leader (behind cloudflare)
  • out of the box clustering support

Cons:

  • load balancing is an experimental plugin (at least in 2016)
  • no static file serving? or slower?
  • no commercial support

Used by Yahoo, Apple and Comcast.

First impressions

Pros:

  • Puppet module available
  • no query logging by default (good?)
  • good documentation, but a bit lacking in tutorials
  • nice little dashboard shipped by default (traffic_top) although it could be more useful (doesn't seem to show hit ratio clearly)

Cons:

  • configuration spread out over many different configuration files
  • complex and arcane configuration language (e.g. try to guess what this actually does: CONFIG proxy.config.http.server_ports STRING 8080:ipv6:tr-full 443:ssl ip-in=192.168.17.1:80:ip-out=[fc01:10:10:1::1]:ip-out=10.10.10.1)
  • configuration syntax varies across config files and plugins
  • couldn't decouple the backend hostname and the passed Host header without a bad random tutorial found on the internet
  • couldn't figure out how to make HTTP/2 work
  • no prometheus exporters

Configuration

apt install trafficserver

Default Debian config seems sane when compared to the Cicimov tutorial. One thing we will need to change is the listening port, which defaults to:

CONFIG proxy.config.http.server_ports STRING 8080 8080:ipv6

We want something more like this:

CONFIG proxy.config.http.server_ports STRING 80 80:ipv6 443:ssl 443:ssl:ipv6

We also need to tell ATS to keep the original Host header:

CONFIG proxy.config.url_remap.pristine_host_hdr INT 1

It's clearly stated in the official documentation, but mistakenly in Cicimov's tutorial.

Then we also need to configure the path to the SSL certs, we use the self-signed certs for benchmarking:

CONFIG proxy.config.ssl.server.cert.path STRING /etc/ssl/torproject-auto/servercerts/
CONFIG proxy.config.ssl.server.private_key.path STRING /etc/ssl/torproject-auto/serverkeys/

When we have a real cert issued through Let's Encrypt, we can use:

CONFIG proxy.config.ssl.server.cert.path STRING /etc/ssl/torproject/certs/
CONFIG proxy.config.ssl.server.private_key.path STRING /etc/ssl/private/

Either way, we need to tell ATS about those certs:

#dest_ip=* ssl_cert_name=thishost.crt ssl_key_name=thishost.key
ssl_cert_name=blog.torproject.org.crt ssl_key_name=blog.torproject.org.key

We need to add trafficserver to the ssl-cert group so it can read those:

adduser trafficserver ssl-cert

Then we setup this remapping rule:

map https://blog.torproject.org/ https://backend.example.com/

(backend.example.com is the prod alias of our backend.)

And finally curl is able to talk to the proxy:

curl --proxy-cacert /etc/ssl/torproject-auto/servercerts/ca.crt --proxy https://cache01.torproject.org/ https://blog.torproject.org

Troubleshooting

Proxy fails to hit backend:

curl: (56) Received HTTP code 404 from proxy after CONNECT

Same with plain GET:

# curl -s -k -I --resolve *:443:127.0.0.1 https://blog.torproject.org | head -1
HTTP/1.1 404 Not Found on Accelerator

It seems that the backend on the right side of the remap rule needs to respond correctly, as ATS doesn't reuse the Host header, which is a problem because the backend wants to redirect everything to the canonical hostname for SEO purposes. We could tweak that and make backend.example.com the canonical host, but that would make disaster recovery much harder, and could make some links point there instead of the real canonical host.

I tried the mysterious regex_remap plugin:

map http://cache01.torproject.org/ http://localhost:8000/ @plugin=regex_remap.so @pparam=maps.reg @pparam=host

with this in maps.reg:

.* $s://$f/$P/

... which basically means "redirect everything to the original scheme, host and path", but that (obviously, maybe) fails with:

# curl -I -s http://cache01.torproject.org/ | head -1
HTTP/1.1 400 Multi-Hop Cycle Detected

It feels like it really doesn't want to act as a transparent proxy...

I also tried a header rewrite:

map http://cache01.torproject.org/ http://localhost:8000/ @plugin=header_rewrite.so @pparam=rules1.conf

with rules1.conf like:

set-header host cache01.torproject.org
set-header foo bar

... and the Host header is untouched. The rule works though because the Foo header appears in the request.

The solution to this is the proxy.config.url_remap.pristine_host_hdr documented above.

HTTP/2 support missing

Next hurdle: no HTTP/2 support, even when using proto=http2;http (falls back on HTTP/1.1) and proto=http2 only (fails with WARNING: Unregistered protocol type 0).

Benchmarks

Same host tests

With blog.tpo in /etc/hosts, because proxy-host doesn't work, and running on the same host as the proxy (!), cold cache:

root@cache01:~# siege https://blog.torproject.org/
** SIEGE 4.0.4
** Preparing 100 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions:                  68068 hits
Availability:                 100.00 %
Elapsed time:                 119.53 secs
Data transferred:             654.47 MB
Response time:                  0.18 secs
Transaction rate:             569.46 trans/sec
Throughput:                     5.48 MB/sec
Concurrency:                   99.67
Successful transactions:       68068
Failed transactions:               0
Longest transaction:            0.56
Shortest transaction:           0.00

Warm cache:

root@cache01:~# siege https://blog.torproject.org/
** SIEGE 4.0.4
** Preparing 100 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions:                  65953 hits
Availability:                 100.00 %
Elapsed time:                 119.71 secs
Data transferred:             634.13 MB
Response time:                  0.18 secs
Transaction rate:             550.94 trans/sec
Throughput:                     5.30 MB/sec
Concurrency:                   99.72
Successful transactions:       65953
Failed transactions:               0
Longest transaction:            0.62
Shortest transaction:           0.00

And traffic_top looks like this after the second run:

         CACHE INFORMATION                     CLIENT REQUEST & RESPONSE        
Disk Used   77.8K    Ram Hit     99.9%   GET         98.7%    200         98.3%
Disk Total 268.1M    Fresh       98.2%   HEAD         0.0%    206          0.0%
Ram Used    16.5K    Revalidate   0.0%   POST         0.0%    301          0.0%
Ram Total  352.3K    Cold         0.0%   2xx         98.3%    302          0.0%
Lookups    134.2K    Changed      0.1%   3xx          0.0%    304          0.0%
Writes      13.0     Not Cache    0.0%   4xx          2.0%    404          0.4%
Updates      1.0     No Cache     0.0%   5xx          0.0%    502          0.0%
Deletes      0.0     Fresh (ms)   8.6M   Conn Fail    0.0     100 B        0.1%
Read Activ   0.0     Reval (ms)   0.0    Other Err    2.8K    1 KB         2.0%
Writes Act   0.0     Cold (ms)   26.2G   Abort      111.0     3 KB         0.0%
Update Act   0.0     Chang (ms)  11.0G                        5 KB         0.0%
Entries      2.0     Not (ms)     0.0                         10 KB       98.2%
Avg Size    38.9K    No (ms)      0.0                         1 MB         0.0%
DNS Lookup 156.0     DNS Hit     89.7%                        > 1 MB       0.0%
DNS Hits   140.0     DNS Entry    2.0   
             CLIENT                                ORIGIN SERVER                
Requests   136.5K    Head Bytes 151.6M   Requests   152.0     Head Bytes 156.5K
Req/Conn     1.0     Body Bytes   1.4G   Req/Conn     1.1     Body Bytes   1.1M
New Conn   137.0K    Avg Size    11.0K   New Conn   144.0     Avg Size     8.0K
Curr Conn    0.0     Net (bits)  12.0G   Curr Conn    0.0     Net (bits)   9.8M
Active Con   0.0     Resp (ms)    1.2   
Dynamic KA   0.0                        
cache01                                    (r)esponse (q)uit (h)elp (A)bsolute

ab:

# ab -c 100 -n 1000 https://blog.torproject.org/
[...]
Server Software:        ATS/8.0.2
Server Hostname:        blog.torproject.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        blog.torproject.org

Document Path:          /
Document Length:        52873 bytes

Concurrency Level:      100
Time taken for tests:   1.248 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      53974000 bytes
HTML transferred:       52873000 bytes
Requests per second:    801.43 [#/sec] (mean)
Time per request:       124.776 [ms] (mean)
Time per request:       1.248 [ms] (mean, across all concurrent requests)
Transfer rate:          42242.72 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        8   47  20.5     46     121
Processing:     6   75  16.2     76     116
Waiting:        1   13   6.8     12      49
Total:         37  122  21.6    122     196

Percentage of the requests served within a certain time (ms)
  50%    122
  66%    128
  75%    133
  80%    137
  90%    151
  95%    160
  98%    169
  99%    172
 100%    196 (longest request)

Separate host

Those tests were performed from one cache server to the other, to avoid the benchmarking tool fighting for resources with the server.

In .siege/siege.conf:

verbose = false
fullurl = true
concurrent = 100
time = 2M
url = https://blog.torproject.org/
delay = 1
internet = false
benchmark = true

Siege:

root@cache-02:~# siege
** SIEGE 4.0.4
** Preparing 100 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions:              28895 hits
Availability:             100.00 %
Elapsed time:             119.73 secs
Data transferred:         285.18 MB
Response time:              0.40 secs
Transaction rate:         241.33 trans/sec
Throughput:             2.38 MB/sec
Concurrency:               96.77
Successful transactions:       28895
Failed transactions:               0
Longest transaction:            1.26
Shortest transaction:           0.05

Load went to about 2 (Load average: 1.65 0.80 0.36 after test), with one CPU constantly busy and the other at about 50%, memory usage was low (~800M).

ab:

# ab -c 100 -n 1000 https://blog.torproject.org/
[...]
Server Software:        ATS/8.0.2
Server Hostname:        blog.torproject.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,4096,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        blog.torproject.org

Document Path:          /
Document Length:        53320 bytes

Concurrency Level:      100
Time taken for tests:   4.010 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      54421000 bytes
HTML transferred:       53320000 bytes
Requests per second:    249.37 [#/sec] (mean)
Time per request:       401.013 [ms] (mean)
Time per request:       4.010 [ms] (mean, across all concurrent requests)
Transfer rate:          13252.82 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       23  254 150.0    303     549
Processing:    14  119  89.3    122     361
Waiting:        5  105  89.7    105     356
Total:         37  373 214.9    464     738

Percentage of the requests served within a certain time (ms)
  50%    464
  66%    515
  75%    549
  80%    566
  90%    600
  95%    633
  98%    659
  99%    675
 100%    738 (longest request)

Bombardier results are much better and almost max out the gigabit connection:

anarcat@cache-02:~$ ./go/bin/bombardier --duration=2m --latencies https://blog.torproject.org/  -c 100
Bombarding https://blog.torproject.org:443/ for 2m0s using 100 connection(s)
[=========================================================================] 2m0s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec      2049.82     533.46    7083.03
  Latency       49.75ms    20.82ms   837.07ms
  Latency Distribution
     50%    48.53ms
     75%    57.98ms
     90%    69.05ms
     95%    78.44ms
     99%   128.34ms
  HTTP codes:
    1xx - 0, 2xx - 241187, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   104.67MB/s

It might be because it supports HTTP/2 requests: indeed, the throughput drops to 14MB/s when we use the --http1 flag, along with request rates closer to ab's:

anarcat@cache-02:~$ ./go/bin/bombardier --duration=2m --latencies https://blog.torproject.org/ --http1 -c 100
Bombarding https://blog.torproject.org:443/ for 2m0s using 100 connection(s)
[=========================================================================] 2m0s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec      1322.21     253.18    1911.21
  Latency       78.40ms    18.65ms   688.60ms
  Latency Distribution
     50%    75.53ms
     75%    88.52ms
     90%   101.30ms
     95%   110.68ms
     99%   132.89ms
  HTTP codes:
    1xx - 0, 2xx - 153114, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    14.22MB/s
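A quick way to confirm which protocol actually gets negotiated is curl's %{http_version} write-out variable:

```shell
# print the HTTP version curl negotiates with the server;
# "2" means HTTP/2 was used, "1.1" means plain HTTP/1.1
curl -sI --http2 -o /dev/null -w '%{http_version}\n' https://blog.torproject.org/
```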

Inter-server communication is good, according to iperf3:

[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec  1.00 GBytes   859 Mbits/sec                  receiver

So we see the roundtrip does add significant overhead for ab and siege. It's possible this is due to the nature of the virtual machine running the tests, which is much less powerful than the server. This seems to be confirmed by bombardier's results, since it is possibly better designed than the other two tools at maximizing resources on the client side.

Nginx

Summary of online reviews

Pros:

  • provides a full webserver stack, which means much more flexibility and the possibility of converging on a single solution across the infrastructure
  • very popular
  • load balancing (but no active check in free version)
  • can serve static content
  • HTTP/2
  • HTTPS

Cons:

  • provides a full webserver stack (!), which means a larger attack surface
  • no ESI or ICP?
  • does not cache out of the box, requires config which might imply lesser performance
  • opencore model with paid features, especially "active health checks", "Cache Purging API" (although there are hackish ways to clear the cache and a module), and "session persistence based on cookies"
  • most plugins are statically compiled in different "flavors", although it's possible to have dynamic modules

Used by Cloudflare, Dropbox, MaxCDN and Netflix.

First impressions

Pros:

  • "approved" Puppet module
  • single file configuration
  • config easy to understand and fairly straightforward
  • just frigging works
  • easy to serve static content in case of problems
  • can be leveraged for other applications
  • performance comparable or better than ATS

Cons:

Configuration

We pick the "light" Debian package. The modules that would be interesting in the other flavors are "cache purge" (from extras) and "geoip" (from full):

apt install nginx-light

Then drop this config file in /etc/nginx/sites-available and symlink into sites-enabled:

server_names_hash_bucket_size 64;
proxy_cache_path /var/cache/nginx/ levels=1:2 keys_zone=blog:10m;

server {
    listen 80;
    listen [::]:80;
    listen 443 ssl;
    listen [::]:443 ssl;
    ssl_certificate /etc/ssl/torproject/certs/blog.torproject.org.crt-chained;
    ssl_certificate_key /etc/ssl/private/blog.torproject.org.key;

    server_name blog.torproject.org;
    proxy_cache blog;

    location / {
        proxy_pass https://live-tor-blog-8.pantheonsite.io;
        proxy_set_header Host       $host;

        # cache 304
        proxy_cache_revalidate on;

        # add cookie to cache key
        #proxy_cache_key "$host$request_uri$cookie_user";
        # not sure what the cookie name is
        proxy_cache_key $scheme$proxy_host$request_uri;

        # allow serving stale content on error, timeout, or refresh
        proxy_cache_use_stale error timeout updating;
        # allow only first request through backend
        proxy_cache_lock on;

        # add header
        add_header X-Cache-Status $upstream_cache_status;
    }
}

... and reload nginx.
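The deploy steps can be sketched as follows (the blog.torproject.org file name is an assumption, adjust to the actual site file):

```shell
# enable the site and reload nginx, validating the config first
ln -s /etc/nginx/sites-available/blog.torproject.org /etc/nginx/sites-enabled/
nginx -t
service nginx reload
```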

I tested that logged-in users bypass the cache and that things generally work well.
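Thanks to the add_header X-Cache-Status line in the config above, cache behavior can be spot-checked from any client; on a warm cache a repeated request should show a HIT:

```shell
# issue two HEAD requests for the same URL; the first may be a
# MISS, the second should come back as X-Cache-Status: HIT
curl -sI https://blog.torproject.org/ | grep -i '^x-cache-status'
curl -sI https://blog.torproject.org/ | grep -i '^x-cache-status'
```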

A key problem with Nginx is getting decent statistics out of it. The upstream nginx exporter basically supports only hits per second, through the stub status module, a very limited module shipped with core Nginx. The commercial version, Nginx Plus, supports a more extensive API which includes the hit rate, but that's not an option for us.
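For reference, enabling stub status is a one-liner, sketched here as a localhost-only endpoint (the location path is arbitrary); it exposes only connection and request counters, not the hit rate:

```nginx
# minimal stub_status endpoint for the prometheus nginx exporter,
# restricted to localhost
location = /stub_status {
    stub_status;
    allow 127.0.0.1;
    deny all;
}
```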

There are three solutions to work around this problem:

  • create our own metrics using the Nginx Lua Prometheus module: this can have performance impacts and involves a custom configuration
  • write and parse log files, the way the munin plugin works - these could possibly be fed directly into mtail to avoid storing logs on disk but still get the data (include $upstream_cache_status in the logs)
  • use a third-party module like vts or sts and the exporter to expose those metrics - the vts module doesn't seem to be very well maintained (no release since 2018) and it's unclear if this will work for our use case

Here's an example of how to do the mtail hack. First tell nginx to write to syslog, which acts as a buffer so that log parsing doesn't slow down request processing; excerpt from the nginx.conf snippet:

# Log response times so that we can compute latency histograms
# (using mtail). Works around the lack of Prometheus
# instrumentation in NGINX.
log_format extended '$server_name:$server_port '
            '$remote_addr - $remote_user [$time_local] '
            '"$request" $status $body_bytes_sent '
            '"$http_referer" "$http_user_agent" '
            '$upstream_addr $upstream_response_time $request_time';

access_log syslog:server=unix:/dev/log,facility=local3,tag=nginx_access extended;

(We would also need to add $upstream_cache_status in that format.)
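A sketch of that change, simply appending the variable to the format above (untested):

```nginx
# same extended format as above, with $upstream_cache_status appended
# so the cache hit rate can be derived from the logs
log_format extended '$server_name:$server_port '
            '$remote_addr - $remote_user [$time_local] '
            '"$request" $status $body_bytes_sent '
            '"$http_referer" "$http_user_agent" '
            '$upstream_addr $upstream_response_time $request_time '
            '$upstream_cache_status';
```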

Then count the different stats using mtail, excerpt from the mtail config snippet:

# Define the exported metrics.
counter nginx_http_request_total
counter nginx_http_requests by host, vhost, method, code, backend
counter nginx_http_bytes by host, vhost, method, code, backend
counter nginx_http_requests_ms by le, host, vhost, method, code, backend 

/(?P<hostname>[-0-9A-Za-z._:]+) nginx_access: (?P<vhost>[-0-9A-Za-z._:]+) (?P<remote_addr>[0-9a-f\.:]+) - - \[[^\]]+\] "(?P<request_method>[A-Z]+) (?P<request_uri>\S+) (?P<http_version>HTTP\/[0-9\.]+)" (?P<status>\d{3}) ((?P<response_size>\d+)|-) "[^"]*" "[^"]*" (?P<upstream_addr>[-0-9A-Za-z._:]+) ((?P<ups_resp_seconds>\d+\.\d+)|-) (?P<request_seconds>\d+)\.(?P<request_milliseconds>\d+)/ {

    nginx_http_request_total++
    # [...]
}

We'd also need to check the cache status in that parser.
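A hedged sketch of what that could look like, assuming $upstream_cache_status is appended as the last field of the log line (the counter name and regex are illustrative, not our actual config):

```mtail
# count requests by nginx cache status (HIT, MISS, EXPIRED, etc.),
# assuming it is the last field on the log line
counter nginx_http_cache by cache_status

/ (?P<cache_status>HIT|MISS|EXPIRED|STALE|UPDATING|REVALIDATED|BYPASS|-)$/ {
    nginx_http_cache[$cache_status]++
}
```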

A variation of the mtail hack was adopted in our design.

Benchmarks

ab:

root@cache-02:~# ab -c 100 -n 1000 https://blog.torproject.org/
[...]
Server Software:        nginx/1.14.2
Server Hostname:        blog.torproject.org
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,4096,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        blog.torproject.org

Document Path:          /
Document Length:        53313 bytes

Concurrency Level:      100
Time taken for tests:   3.083 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      54458000 bytes
HTML transferred:       53313000 bytes
Requests per second:    324.31 [#/sec] (mean)
Time per request:       308.349 [ms] (mean)
Time per request:       3.083 [ms] (mean, across all concurrent requests)
Transfer rate:          17247.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       30  255  78.0    262     458
Processing:    18   35  19.2     28     119
Waiting:        7   19   7.4     18      58
Total:         81  290  88.3    291     569

Percentage of the requests served within a certain time (ms)
  50%    291
  66%    298
  75%    303
  80%    306
  90%    321
  95%    533
  98%    561
  99%    562
 100%    569 (longest request)

About 50% faster than ATS (291ms median total time here, versus ATS's 464ms above).

Siege:

Transactions:              32246 hits
Availability:             100.00 %
Elapsed time:             119.57 secs
Data transferred:        1639.49 MB
Response time:              0.37 secs
Transaction rate:         269.68 trans/sec
Throughput:            13.71 MB/sec
Concurrency:               99.60
Successful transactions:       32246
Failed transactions:               0
Longest transaction:            1.65
Shortest transaction:           0.23

Almost an order of magnitude faster than ATS. Update: that's for the throughput; the transaction rate is actually similar, which implies the page size might have changed between benchmarks.

Bombardier:

anarcat@cache-02:~$ ./go/bin/bombardier --duration=2m --latencies https://blog.torproject.org/  -c 100
Bombarding https://blog.torproject.org:443/ for 2m0s using 100 connection(s)
[=========================================================================] 2m0s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec      2116.74     506.01    5495.77
  Latency       48.42ms    34.25ms      2.15s
  Latency Distribution
     50%    37.19ms
     75%    50.44ms
     90%    89.58ms
     95%   109.59ms
     99%   169.69ms
  HTTP codes:
    1xx - 0, 2xx - 247827, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   107.43MB/s

Almost maxes out the gigabit connection as well, but is only marginally faster (~3%?) than ATS.

Does not reach the theoretical gigabit maximum, which is apparently around 118MB/s without jumbo frames (and 123MB/s with).

Varnish

Pros:

  • specifically built for caching
  • very flexible
  • grace mode can keep objects even after TTL expired (when backends go down)
  • third most popular, after Cloudflare and ATS

Cons:

  • no HTTPS support on frontend or backend in the free version, would require stunnel hacks
  • configuration is compiled and a bit weird
  • static content needs to be generated in the config file, or sidecar
  • no HTTP/2 support

Used by Fastly.

Fastly itself

We could just put Fastly in front of all this and shove the costs on there.

Pros:

  • easy
  • possibly free

Cons:

  • might go over our quotas during large campaigns
  • sending more of our visitors to Fastly, non-anonymously

Sources

Benchmarks:

Tutorials and documentation: