TLS is the Transport Layer Security protocol, previously known as SSL; HTTPS is HTTP running over TLS. This page documents how TLS is used across the TPA infrastructure and specifically how we manage the X.509 certificates that make this work.

Tutorial

How to get an X.509 certificate for a domain with Let's Encrypt

  1. If not already done, clone the letsencrypt-domains and backup-keys git repositories:

    git clone ssh://git@git-rw.torproject.org/admin/letsencrypt-domains
    cd letsencrypt-domains
    git clone pauli.torproject.org:/srv/puppet.torproject.org/git/tor-backup-keys.git backup-keys
    
  2. Add your domain name and optional alternative names (SAN) to the domains file:

    $EDITOR domains
    
  3. Push the updated domain list to the letsencrypt-domains repository:

    git diff domains
    git add domains
    git commit
    git push
    

The last command will show the output of the dehydrated command, which runs on the DNS primary (currently nevii) to issue new certificates and renew existing ones.

The new keys and certificates are then copied to the LDAP host (currently pauli) under /srv/puppet.torproject.org/from-letsencrypt/, where Puppet picks them up in the ssl module. Use the ssl::service resource to deploy them.

See the "Design" section below for more information on how that works.

See also static-component for an example of how to deploy an encrypted virtual host and onion service.
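Before adding a name to the domains file in step 2 of the tutorial, it can help to check whether an existing certificate already covers it. The helpers below are a hypothetical sketch and are not part of the letsencrypt-domains repository. They assume the domains file follows dehydrated's domains.txt layout: one certificate per line, the primary name first, then the alternative names, all separated by whitespace.

```shell
#!/bin/sh
# Hypothetical helpers, assuming the "domains" file follows dehydrated's
# domains.txt layout: one certificate per line, primary name first, then
# alternative names (SANs), separated by whitespace; "#" starts a comment.

# domains_has_name FILE NAME: succeed if NAME appears as a primary or
# alternative name on any certificate line of FILE.
domains_has_name() {
    awk -v name="$2" '
        NF == 0 || $1 ~ /^#/ { next }    # skip blank lines and comments
        { for (i = 1; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# domains_primary_for FILE NAME: print the primary name of the first
# certificate line that contains NAME (prints nothing if there is none).
domains_primary_for() {
    awk -v name="$2" '
        NF == 0 || $1 ~ /^#/ { next }
        { for (i = 1; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}
```

For example, `domains_has_name domains new.torproject.org && domains_primary_for domains new.torproject.org` shows which certificate line already lists a name. In that case, you would add the name to that line rather than create a new one.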

Renewing a certificate before its expiry date

If a certificate has been revoked, it should be renewed before its expiry date. To do so, you can drop a special file in the per-domain-config directory to widen the renewal window, then run the script by hand.

On the DNS master, create a file named after the primary domain name of the certificate:

cat <<EOF > /srv/letsencrypt.torproject.org/repositories/letsencrypt-domains/per-domain-config/example.torproject.org
RENEW_DAYS="85"
EOF

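Here we tell the ACME client (dehydrated) to renew the certificate if it expires in less than 85 days (instead of the default 30 days). Since Let's Encrypt certificates are valid for 90 days, this renews any certificate older than about 5 days.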

Then run the script by hand (or wait for cron to do its thing):

letsencrypt@nevii:~$ dehydrated-wrap --cron
[...]
Processing example.torproject.org with alternative names: example.torproject.org
 + Using certificate specific config file!
   + RENEW_DAYS = 85
 + Checking domain name(s) of existing cert... unchanged.
 + Checking expire date of existing cert...
 + Valid till May 18 20:40:45 2020 GMT Certificate will expire
(Less than 85 days). Renewing!
 + Signing domains...
[..]

Then remove the file. Otherwise the certificate will keep being renewed every few days.
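The RENEW_DAYS check boils down to comparing the time left on the certificate with a window of RENEW_DAYS days. The sketch below illustrates that logic and is not dehydrated's own code:

* `renew_due` is a pure-arithmetic version working on Unix timestamps.
* `renew_due_cert` applies the same test to a real certificate with `openssl x509 -checkend`.

Both function names are hypothetical.

```shell
#!/bin/sh
# Illustration of the renewal window logic; not dehydrated's own code.

# renew_due EXPIRY_EPOCH NOW_EPOCH RENEW_DAYS: succeed if a certificate
# expiring at EXPIRY_EPOCH (seconds since the epoch) has fewer than
# RENEW_DAYS days left at NOW_EPOCH, i.e. if it should be renewed.
renew_due() {
    [ $(( $1 - $2 )) -lt $(( $3 * 86400 )) ]
}

# renew_due_cert CERT_PEM RENEW_DAYS: same test on a real certificate.
# "openssl x509 -checkend N" exits 0 if the certificate will not expire
# within the next N seconds, and 1 otherwise.
renew_due_cert() {
    ! openssl x509 -noout -checkend $(( $2 * 86400 )) -in "$1" >/dev/null
}
```

For example, `renew_due_cert /path/to/cert.pem 85 && echo "would be renewed"` (the path is a placeholder) predicts whether the next run will renew a given certificate.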

How-to

Enabling HPKP

HPKP is generally considered DEPRECATED. Google Chrome deprecated it in 2017 and has since removed support for it, so it should not be used anymore.

This section should generally be skipped unless you really need key pinning for some obscure reason.

  1. To generate backup HPKP keys, use the script provided in the domains.git repository:

    ./bin/manage-backup-keys create
    

    See tor-passwords/000-backup-keys for the passphrase when prompted.

    The private key is a backup RSA key that can be used to issue a replacement HTTPS certificate in case of a compromise, while still matching the pins sent in Public-Key-Pins headers.

  2. Push the new key to the backup-keys repo:

    cd backup-keys
    git status
    git add $yourfiles
    git commit
    git push
    cd ..
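An HPKP pin is the base64-encoded SHA-256 hash of a key's DER-encoded SubjectPublicKeyInfo (RFC 7469). The header must list at least one backup pin that does not match any key in the current certificate chain, which is the role of the backup key created above. The sketch below is a hypothetical illustration, not the code of manage-backup-keys or bin/get-pin. The function names and the max-age value in the usage note are assumptions.

```shell
#!/bin/sh
# Hypothetical helpers illustrating RFC 7469 pins; not bin/get-pin.

# spki_pin_from_key KEY_PEM: print the pin-sha256 value of a private key,
# i.e. base64(SHA-256(DER-encoded SubjectPublicKeyInfo)).
# openssl prompts for the passphrase if the key is encrypted.
spki_pin_from_key() {
    openssl pkey -in "$1" -pubout -outform DER |
        openssl dgst -sha256 -binary |
        openssl base64
}

# pkp_header MAX_AGE PIN...: format a Public-Key-Pins header from at
# least two pins (at least one of them must be a backup pin).
pkp_header() {
    max_age=$1
    shift
    if [ "$#" -lt 2 ]; then
        echo "pkp_header: need at least two pins" >&2
        return 1
    fi
    header="Public-Key-Pins:"
    for pin in "$@"; do
        header="$header pin-sha256=\"$pin\";"
    done
    printf '%s max-age=%s\n' "$header" "$max_age"
}
```

For example, `pkp_header 5184000 "$(spki_pin_from_key current.key)" "$(spki_pin_from_key backup.key)"` formats the header, where both key file names are placeholders.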
    

Disabling HPKP

To disable key pinning (HPKP) on a given domain, remove the backup key from the backup-keys repository (the Apache template only deploys key pinning when the backup .pin file is present):

cd backup-keys
git rm example.torproject.org*
git commit
git push

Then run Puppet on all affected hosts, for example the static mirrors:

cumin 'C:roles::static_mirror_web' 'puppet agent -t'
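Once Puppet has run on the affected hosts, you can check that the web server no longer sends the header. The sketch below assumes curl is available. `has_pkp_header` is a hypothetical helper that reads HTTP response headers on standard input.

```shell
#!/bin/sh
# has_pkp_header: succeed if the HTTP response headers read on stdin
# contain a Public-Key-Pins or Public-Key-Pins-Report-Only header.
# Header names are case-insensitive, and curl output ends lines in CRLF.
has_pkp_header() {
    tr -d '\r' | grep -qi '^public-key-pins[:-]'
}
```

For example, `curl -sI https://example.torproject.org/ | has_pkp_header && echo "still pinned"` prints a warning only if the header is still present.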

Pager playbook

  • if you get email from Digicert, ask the Tor Browser team: they use it to sign code (see "Design" below for more information about which CAs are in use)

Disaster recovery

No disaster recovery plan yet (TODO).

Reference

Installation

There is no documentation on how to deploy this service from scratch. To deploy a new cert, see the tutorial above and the ssl::service Puppet resource.

SLA

TLS is critical and should be highly available when relevant. It should fail closed, that is, if it fails a security check, it should not allow a connection.

Design

TLS is one of two major transport security protocols used at TPA (the other being IPsec). It is used by web servers (Apache, HAProxy, Nginx), backup servers (Bacula), mail servers (Postfix), and possibly more.

Let's Encrypt certificates are generated by git hooks, and auto-ca certificates by a Makefile run from cron; see below for details.

Certificate authorities in use at Tor

This document mostly covers the Let's Encrypt certificates used by websites and other services managed by TPA.

But there are other certificate authorities in use inside TPA and, more broadly, at Tor. Here's the list of known CAs in operation at the time of writing (2020-04-15):

  • Let's Encrypt: automatically issues certificates for most websites and domains, managed by TPA
  • GlobalSign: used by the Fastly CDN that distributes Tor Browser (TBB) updates
  • Digicert: used by other teams to sign software releases for Windows
  • Puppet: our configuration management infrastructure has its own X.509 certificate authority which allows "Puppet agents" to authenticate and verify the "Puppet Master"; see our documentation and the upstream documentation for details
  • internal "auto-ca": all nodes in Puppet get their own X.509 certificate, signed by a standalone certificate authority with a self-signed certificate, documented below
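When triaging an unexpected certificate or CA email, the issuer tells you which of the CAs above is involved. In the rough sketch below, `fetch_issuer` prints the issuer of the certificate a server presents, and `ca_owner` maps it onto the list above. Both function names are hypothetical. The matching patterns are assumptions about typical issuer names; for example, Puppet's default CA name starts with "Puppet CA". The auto-ca issuer name is not documented here, so anything unmatched is reported as unknown.

```shell
#!/bin/sh
# fetch_issuer HOST [PORT]: print the issuer of the certificate presented
# by HOST, sending HOST as the SNI server name.
fetch_issuer() {
    openssl s_client -connect "$1:${2:-443}" -servername "$1" </dev/null 2>/dev/null |
        openssl x509 -noout -issuer
}

# ca_owner ISSUER: map an issuer string onto the CAs listed above.
# The patterns are assumptions about common issuer names, not an
# exhaustive or authoritative list.
ca_owner() {
    case "$1" in
        *"Let's Encrypt"*) echo "Let's Encrypt (TPA)" ;;
        *GlobalSign*)      echo "GlobalSign (Fastly CDN)" ;;
        *DigiCert*)        echo "DigiCert (other teams, code signing)" ;;
        *"Puppet CA"*)     echo "Puppet CA (TPA)" ;;
        *)                 echo "unknown (possibly internal auto-ca)" ;;
    esac
}
```

For example, `ca_owner "$(fetch_issuer example.torproject.org)"` tells you whom to ask about a given certificate.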

Internal auto-ca

The internal "auto-ca" is a standalone certificate authority running on the Puppet master (currently pauli), in /srv/puppet.torproject.org/auto-ca.

The CA is driven by a Makefile which takes care of creating, revoking, and distributing certificates to all nodes. Certificates are valid for 365 days. If a certificate expires in less than 30 days, it gets revoked and removed.

The Makefile then iterates over the known hosts (as listed in /var/lib/misc/thishost/ssh_known_hosts, generated from LDAP) and creates certificates for each host. Since certificates close to expiry were removed in the previous step, this is what renews them before they expire. It also removes the certificates of machines that are no longer known, which is the source of the revoked client emails TPA gets when a machine gets retired.

Each host gets two certificates: a "clientcert" (in clientcerts/) and a "server" (?) cert (in certs/). The former is used by Bacula and Postfix clients to authenticate with the central servers for backups and mail delivery, respectively. The latter is used by those servers to authenticate to their clients, and also as the default HTTPS certificate on new Apache hosts.
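The cleanup of unknown hosts and the 30-day threshold can be sketched as follows. This is a hypothetical illustration, not the actual Makefile. It makes three assumptions:

* Certificates are stored as certs/<fqdn>.crt.
* The first field of each ssh_known_hosts line is a comma-separated list of names and addresses, as in OpenSSH's known_hosts format.
* The CA certificate (ca.crt) must be skipped.

```shell
#!/bin/sh
# Hypothetical sketch of the auto-ca cleanup; not the actual Makefile.

# known_fqdns KNOWN_HOSTS: print the fully qualified host names found in
# the first (comma-separated) field of an OpenSSH known_hosts file,
# skipping short names, IPv4 addresses, comments and marker lines.
known_fqdns() {
    awk 'NF > 0 && $1 !~ /^[@#]/ {
        n = split($1, names, ",")
        for (i = 1; i <= n; i++)
            if (names[i] ~ /\./ && names[i] !~ /^[0-9.]+$/) print names[i]
    }' "$1" | sort -u
}

# stale_certs KNOWN_HOSTS CERT_DIR: list certificate files in CERT_DIR
# whose host name (file name minus .crt) is not a known host.
stale_certs() {
    known=$(known_fqdns "$1")
    for crt in "$2"/*.crt; do
        [ -e "$crt" ] || continue          # the glob matched nothing
        host=$(basename "$crt" .crt)
        [ "$host" = ca ] && continue       # keep the CA certificate itself
        printf '%s\n' "$known" | grep -Fqx "$host" || printf '%s\n' "$crt"
    done
}

# expiring_soon CERT_PEM: succeed if CERT_PEM expires within 30 days,
# the threshold after which auto-ca revokes and recreates a certificate.
expiring_soon() {
    ! openssl x509 -noout -checkend $(( 30 * 86400 )) -in "$1" >/dev/null
}
```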

Once all certs are created, revoked, and/or removed, they get copied into Puppet's file hierarchy, in the following locations:

  • /etc/puppet/modules/ssl/files/certs/: server certs
  • /etc/puppet/modules/ssl/files/clientcerts/: client certs
  • /etc/puppet/modules/ssl/files/clientcerts/fingerprints: colon-separated SHA256 fingerprints of all "client certs", one per line
  • /etc/puppet/modules/ssl/files/certs/ca.crt: the CA's certificate
  • /etc/puppet/modules/ssl/files/certs/ca.crl: certificate revocation list
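The fingerprints file can be thought of as the output of a loop over the client certificates. A client certificate can then be checked against it with a fixed-string, whole-line grep. This is a hypothetical sketch assuming a clientcerts/*.crt layout, not the auto-ca Makefile. The digest name that `openssl x509 -fingerprint` prints before "Fingerprint=" may be upper or lower case depending on the OpenSSL version, so the whole prefix is stripped.

```shell
#!/bin/sh
# Hypothetical sketch; not the auto-ca Makefile.

# strip_fp_prefix: turn openssl's "<digest> Fingerprint=AA:BB:..." lines
# into bare colon-separated fingerprints.
strip_fp_prefix() {
    sed 's/^.*Fingerprint=//'
}

# build_fingerprints CLIENTCERT_DIR: print one SHA-256 fingerprint per
# client certificate, in the format of the fingerprints file.
build_fingerprints() {
    for crt in "$1"/*.crt; do
        [ -e "$crt" ] || continue
        openssl x509 -noout -fingerprint -sha256 -in "$crt" | strip_fp_prefix
    done
}

# fingerprint_known FINGERPRINTS_FILE FP: succeed if FP is listed,
# comparing case-insensitively and on whole lines only.
fingerprint_known() {
    grep -Fqxi "$2" "$1"
}
```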

This process runs from cron every day: /etc/cron.daily/local-auto-ca calls make -s install.

Let's Encrypt workflow

When you push to the git repository on the git-rw.torproject.org server (currently cupani):

  1. a per-repository hook gets called in /srv/git.torproject.org/git-helpers/post-receive-per-repo.d/admin\%letsencrypt-domains/trigger-letsencrypt-server

  2. this hook connects to the DNS master over SSH (as letsencrypt@nevii), where the authorized_keys file hardcodes the command to /srv/letsencrypt.torproject.org/bin/from-githost

  3. ... which in turn just calls bin/update in the same directory (/srv/letsencrypt.torproject.org)

  4. ... which in turn pulls the letsencrypt-domains repository and runs dehydrated-wrap --cron with a special BASE variable that points dehydrated at our configuration, in etc/dehydrated-config, again in the same directory

  5. Through that special configuration, the dehydrated command is configured to call a custom hook (bin/le-hook) which implements logic around the DNS-01 authentication challenge, notably adding challenges, bumping serial numbers in the primary nameserver, and waiting for secondaries to sync. Note that there's a configuration file for that hook in /etc/dsa/le-hook.conf.

  6. The le-hook also distributes the results: it calls bin/deploy, which installs the certificate files in var/result.

  7. It also generates a Public Key Pin (PKP) hash with the bin/get-pin command and appends the Diffie-Hellman parameters (dh-$size.pem) to the certificate chain.

  8. Finally, it calls bin/push, which runs rsync to the Puppet server, whose authorized_keys file in turn hardcodes where those files are dumped (pauli:/srv/puppet.torproject.org/from-letsencrypt).

  9. Those certificates are then collected by Puppet through the ssl module. Pay close attention to how the tor-puppet/modules/apache2/templates/ssl-key-pins.erb template works: it will not deploy key pinning if the backup .pin file is missing.
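Steps 5 to 8 rely on dehydrated's hook interface. dehydrated calls the hook script with an operation name as its first argument, followed by that operation's parameters:

* `deploy_challenge DOMAIN TOKEN_FILENAME TOKEN_VALUE`: for DNS-01, TOKEN_VALUE is the content of the `_acme-challenge` TXT record.
* `deploy_cert DOMAIN KEYFILE CERTFILE FULLCHAINFILE CHAINFILE TIMESTAMP`

The skeleton below is a hypothetical illustration of that interface, not our bin/le-hook. It only prints the zone record and a bumped serial, assuming YYYYMMDDnn-style zone serials. The real hook also edits the zone, waits for the secondaries, and calls bin/deploy and bin/push.

```shell
#!/bin/sh
# Hypothetical skeleton of a dehydrated DNS-01 hook; not bin/le-hook.

# challenge_record DOMAIN TOKEN_VALUE: print the zone file line that
# publishes a DNS-01 challenge (the TTL of 300 is an arbitrary choice).
challenge_record() {
    printf '_acme-challenge.%s. 300 IN TXT "%s"\n' "$1" "$2"
}

# next_serial OLD_SERIAL TODAY: bump a YYYYMMDDnn zone serial. TODAY is
# YYYYMMDD, e.g. from "date -u +%Y%m%d". If OLD_SERIAL is already from
# today (or later), increment it; otherwise start at TODAY00.
next_serial() {
    if [ "$1" -ge "${2}00" ]; then
        echo $(( $1 + 1 ))
    else
        echo "${2}00"
    fi
}

# hook_main OPERATION ARGS...: dispatch on the operation name passed by
# dehydrated as the first argument; other operations are ignored.
hook_main() {
    op=$1
    shift
    case "$op" in
        deploy_challenge)  # DOMAIN TOKEN_FILENAME TOKEN_VALUE
            challenge_record "$1" "$3" ;;
        clean_challenge)   # same arguments as deploy_challenge
            echo "remove _acme-challenge.$1 TXT record" ;;
        deploy_cert)       # DOMAIN KEYFILE CERTFILE FULLCHAINFILE CHAINFILE TIMESTAMP
            echo "deploy $1: key=$2 fullchain=$4" ;;
        *) ;;              # other hook operations are not handled here
    esac
}
```

A real hook script would end with `hook_main "$@"` so that dehydrated's arguments reach the dispatcher. Ignoring unknown operations matters because dehydrated calls the hook for more operations than the ones handled here.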

Issues

There is no issue tracker specifically for this project. File or search for issues in the generic internal services component.

Monitoring and testing

When an HTTPS certificate is configured on a host, it MUST be (manually) configured in Nagios. This can be done by adding the host to the apache-https-host, haproxy-https-host, or nginx-https-hosts group, depending on the web server implementation. If the TLS server is another implementation, a new check SHOULD be written.

All Let's Encrypt certificates are automatically checked for expiry by Nagios as well, on top of the above checks.
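For a TLS server not covered by the existing checks, a new check could follow the usual Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN. The sketch below is hypothetical and is not a check TPA currently runs. It assumes GNU date for parsing openssl's notAfter field, and it uses arbitrary thresholds of 14 and 7 days. The function names are also hypothetical.

```shell
#!/bin/sh
# Hypothetical Nagios-style certificate expiry check.

# expiry_state DAYS_LEFT WARN_DAYS CRIT_DAYS: print a status line and
# return the matching Nagios plugin exit code.
expiry_state() {
    if [ "$1" -lt "$3" ]; then
        echo "CRITICAL: certificate expires in $1 days"; return 2
    elif [ "$1" -lt "$2" ]; then
        echo "WARNING: certificate expires in $1 days"; return 1
    fi
    echo "OK: certificate expires in $1 days"
    return 0
}

# days_left HOST [PORT]: days until the certificate served by HOST
# expires. Relies on GNU date to parse openssl's notAfter format.
days_left() {
    end=$(openssl s_client -connect "$1:${2:-443}" -servername "$1" </dev/null 2>/dev/null |
        openssl x509 -noout -enddate | sed 's/^notAfter=//')
    [ -n "$end" ] || return 3
    echo $(( ( $(date -u -d "$end" +%s) - $(date -u +%s) ) / 86400 ))
}

# check_tls_expiry HOST [PORT]: full check with 14/7 day thresholds.
check_tls_expiry() {
    days=$(days_left "$@") || { echo "UNKNOWN: cannot fetch certificate from $1"; return 3; }
    expiry_state "$days" 14 7
}
```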

Discussion

Overview

There are no plans to make major changes to the TLS configuration, although a review of the cipher suites is in progress (as of April 2020). We should have mechanisms to perform such audits on a more regular basis, and to facilitate changes to those configurations across the entire infrastructure.

Alternatives considered

The auto-ca machinery could be replaced by Puppet code. We could also leverage the ACME protocol designed by Let's Encrypt to run our own CA instead of just OpenSSL, although that might be overkill. In general, it might be preferable to reuse an existing solution rather than maintain our own software in Make.