Tutorial

This page covers a lot of ground! There's a Reference section that explains how everything is set up, then a few How-to guides that show how to do more specific things. But this first section aims to walk you through a simple task, so you can get something done correctly quickly.

In this tutorial, we will add an IP address to the global allow list, on all firewalls on all machines. This is a big deal! It will allow that IP address to access the SSH servers on all boxes and more. This should be a static IP address on a trusted network.

If you have never used Puppet before or if you are at all nervous about making such a change, it's a good idea to have a more experienced sysadmin nearby to help you. They can also confirm that this tutorial is what you actually need to do.

  1. To make any change on the Puppet server, you will first need to clone the git repository:

    git clone pauli.torproject.org:/srv/puppet.torproject.org/git/tor-puppet
    

    This only needs to be done once.

  2. The firewall rules are defined in the ferm module, which lives in modules/ferm. The file you specifically need to change is modules/ferm/templates/defs.conf.erb, so open that in your editor of choice:

    $EDITOR modules/ferm/templates/defs.conf.erb
    
  3. The definition you are looking for is ADMIN_IPS. Add a @def for your IP address, then add the new macro to the ADMIN_IPS macro. When you exit your editor, git should show you a diff that looks something like this:

    --- a/modules/ferm/templates/defs.conf.erb
    +++ b/modules/ferm/templates/defs.conf.erb
    @@ -77,7 +77,10 @@ def $TPO_NET = (<%= networks.join(' ') %>);
     @def $linus   = ();
     @def $linus   = ($linus 193.10.5.2/32); # kcmp@adbc
     @def $linus   = ($linus 2001:6b0:8::2/128); # kcmp@adbc
    -@def $ADMIN_IPS = ($weasel $linus);
    +@def $anarcat = ();
    +@def $anarcat = ($anarcat 203.0.113.1/32); # home IP
    +@def $anarcat = ($anarcat 2001:DB8::DEAD/128 2001:DB8:F00F::/56); # home IPv6
    +@def $ADMIN_IPS = ($weasel $linus $anarcat);
    
    
     @def $BASE_SSH_ALLOWED = ();
    
  4. Then you can commit this and push:

    git commit -m'add my home address to the allow list' && git push
    
  5. Then you should log in to one of the hosts and make sure the code applies correctly:

    ssh -tt perdulce.torproject.org sudo puppet agent -t
    

Puppet shows colorful messages. If nothing is red and the run completes correctly, you are done. If that doesn't work, go back to step 2. If you are still stuck, ask a colleague in the Tor sysadmin team for help.
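
If you prefer exit codes over colors: puppet agent -t enables Puppet's detailed exit codes, where 0 means no changes were needed, 2 means changes were applied successfully, and 4 or 6 indicate failures. For example:

ssh -tt perdulce.torproject.org sudo puppet agent -t; echo "exit code: $?"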

If this works, congratulations, you have made your first change across the entire Puppet infrastructure! You might want to look at the rest of the documentation to learn more about how to do different tasks and how things are set up. A key How-to we recommend is the Progressive deployment section below, which will teach you how to make a change like the above while making sure you don't break anything, even if it affects a lot of machines.

Reference

This section documents how things are generally set up.

Before it all starts

  • puppet.tpo is currently being run on pauli.tpo
  • This is where the tor-puppet git repo lives
  • The repo has hooks to populate /etc/puppet with its contents, most notably the modules directory.
  • All paths in this document are relative to the root of this repository.

File layout

  • 3rdparty/modules includes modules that are shared publicly and do not contain any TPO-specific configuration. There is a Puppetfile there that documents where each module comes from and can be maintained with r10k or librarian.

  • modules includes roles, profiles, and classes that make up the bulk of our configuration.

  • In there, the roles class (modules/roles/manifests/init.pp) maps services to roles, using the $nodeinfo variable.

  • The torproject_org module (modules/torproject_org/manifests/init.pp) performs basic host initialisation, like configuring Debian mirrors and APT sources, installing a base set of packages, configuring Puppet and the timezone, setting up a bunch of rc-files and running ud-replicate.

  • In there, local.yaml (modules/torproject_org/misc/local.yaml) defines services and lists which host(s) supply each service. local.yaml is read by the roles class above for setting up the $localinfo and $nodeinfo variables. It also defines the $roles parameter and defines ferm macros.

  • There is also the hoster.yaml file (modules/torproject_org/misc/hoster.yaml), which defines hosting providers and specifies things like which network blocks they use, and whether they have a DNS resolver or a Debian mirror. hoster.yaml is read by

    • the nodeinfo() function (modules/puppetmaster/lib/puppet/parser/functions/nodeinfo.rb), used for setting up the $nodeinfo variable
    • ferm's def.conf template (modules/ferm/templates/defs.conf.erb)
    • the entropy provider (modules/puppetmaster/lib/puppet/parser/functions/entropy_provider.rb) TODO
  • The root of definitions and execution in Puppet is found in the manifests/site.pp file, but this file is now mostly empty, in favor of Hiera.

Note that the above is the current state of the file hierarchy. As part of the transition to Hiera, a lot of the above architecture will change in favor of the more standard role/profile/module pattern. See ticket #29387 for an in-depth discussion.

Custom facts

modules/torproject_org/lib/facter/software.rb defines our custom facts, making it possible to get answers to questions like "Is this host running apache2?" by simply looking at a Puppet variable.
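
For illustration, such a fact might look roughly like this (a sketch only, not the actual contents of software.rb):

# hypothetical sketch of a package-presence fact
Facter.add('apache2') do
  setcode do
    # assumption: the fact reports whether the apache2 binary is present
    !Facter::Core::Execution.which('apache2ctl').nil?
  end
end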

Style guide

Puppet manifests should generally follow the Puppet style guide. This can be easily done with Flycheck in Emacs, vim-puppet, or a similar plugin in your favorite text editor.

Many files do not currently follow the style guide, as they predate the creation of said guide. Files should not be completely reformatted unless there's a good reason. For example, if a conditional covering a large part of a file is removed and the file needs to be reindented, it's a good opportunity to fix the style of the file. The same applies if a file is split in two components or is completely rewritten for some other reason.

Otherwise the style already in use in the file should be followed.

Hiera

Hiera is a "key/value lookup tool for configuration data" which Puppet uses to look up values for class parameters and node configuration in general.

We are in the process of transitioning over to this mechanism from our previous custom YAML lookup system. This section documents the way we currently use Hiera.

Class definitions

Each host declares which classes it should include through a classes parameter. For example, this is what configures a Prometheus server:

classes:
  - roles::monitoring

Roles should be abstract and not implementation specific. Each role includes a set of profiles which are implementation specific. For example, the monitoring role includes profile::prometheus::server and profile::grafana. Do not include profiles directly from Hiera.

As a temporary exception to this rule, old modules can be included as we transition from the has_role mechanism to Hiera, but eventually those should be ported to shared modules from the Puppet forge, with our glue built into a profile on top of the third-party module. The role roles::monitoring follows that pattern correctly.

Node configuration

On top of the host configuration, some node-specific configuration can be performed from Hiera. This should be avoided as much as possible, but sometimes there is just no other way. A good example is the build-arm-* nodes, which include the following configuration:

bacula::client::ensure: "absent"

This disables backups on those machines, which are normally configured everywhere. This is done because they are behind a firewall and therefore not reachable, an unusual condition in the network. Another example is nutans, which sits behind a NAT and therefore doesn't know its own IP address. To export proper firewall rules, the allow address has been overridden as follows:

bind::secondary::allow_address: 89.45.235.22

Such parameters are normally guessed automatically inside the modules' classes, but they can be overridden from Hiera.

Note: eventually all host configuration will be done here, but there are currently still some configurations hardcoded in individual modules. For example, the Bacula director is hardcoded in the bacula base class (in modules/bacula/manifests/init.pp). That should be moved into a class parameter, probably in common.yaml.
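
As a minimal sketch of what that change could look like, with a hypothetical $director_address parameter:

# in modules/bacula/manifests/init.pp (parameter name is hypothetical)
class bacula (
  String $director_address,
) {
  # ... use $director_address instead of the hardcoded value ...
}

# then, in hiera/common.yaml (illustrative value):
# bacula::director_address: 'director.torproject.org'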

How-to guides

Modifying an existing configuration

For new deployments, this is NOT the preferred method. For example, if you are deploying new software that is not already in use in our infrastructure, do not follow this guide and instead follow the Adding new configuration guide below.

If you are touching an existing configuration, however, things are much simpler: you simply go to the module where the code already exists and make your changes. You git commit and git push the code, then immediately run puppet agent -t on the affected node.
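
In short, the loop looks something like this (the module path and host name are placeholders):

$EDITOR modules/some_module/manifests/init.pp   # make the change
git commit -a -m'describe the change'
git push
ssh -tt affected-host.torproject.org sudo puppet agent -t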

Look at the File layout section above to find the right piece of code to modify. If you are making changes that potentially affect more than one host, you should also definitely look at the Progressive deployment section below.

Adding new configuration

This is a broad topic, but let's take the Prometheus monitoring system as an example which followed the role/profile/module pattern.

First, the Prometheus modules on the Puppet forge were evaluated for quality and popularity. There was a clear winner there: the Prometheus module from Vox Pupuli had hundreds of thousands more downloads than the next option, which was deprecated.

Next, the module was added to the Puppetfile (in 3rdparty/Puppetfile):

mod 'puppet-prometheus', '6.4.0'

... and librarian was run:

librarian-puppet install

This fetched a lot of code from the Puppet forge: the stdlib, archive and system modules were all installed or updated. All those modules were audited manually, by reading each file and looking for obvious security flaws or backdoors. Then the code was committed into git:

git add 3rdparty
git commit -m'install prometheus module after audit'

Then the module was configured in a profile, in modules/profile/manifests/prometheus/server.pp:

class profile::prometheus::server {
  class { 'prometheus::server':
    # follow prom2 defaults
    localstorage      => '/var/lib/prometheus/metrics2',
    storage_retention => '15d',
  }
}

The above contains our local configuration for the upstream prometheus::server class installed in the 3rdparty directory. In particular, it sets a retention period and a different path for the metrics, so that they follow the new Prometheus 2.x defaults.

Then this profile was added to a role, in modules/roles/manifests/monitoring.pp:

# the monitoring server
class roles::monitoring {
  include profile::prometheus::server
}

Notice how the role does not refer to any implementation detail, like the fact that the monitoring server uses Prometheus. It looks like a trivial, even useless, class, but it can actually grow to include multiple profiles.
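
For example, as noted in the Hiera section above, the monitoring role now also includes the Grafana profile:

# the monitoring server
class roles::monitoring {
  include profile::prometheus::server
  include profile::grafana
}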

Then that role is added to the Hiera configuration of the monitoring server, in hiera/nodes/hetzner-nbg1-01.torproject.org.yaml:

classes:
  - roles::monitoring

And Puppet was run on the host, with:

puppet agent --enable ; puppet agent -t --noop ; puppet agent --disable "testing prometheus deployment"

This led to some problems as the upstream module doesn't support installing from Debian packages. Support for Debian was added to the code in 3rdparty/modules/prometheus, and committed into git:

emacs 3rdparty/modules/prometheus/manifests/*.pp # magic happens
git commit -m'implement all the missing stuff' 3rdparty
git push

And the above Puppet command line was run again, continuing that loop until things were good.

If you need to deploy the code to multiple hosts, see the Progressive deployment section below. To contribute changes back upstream (and you should do so), see the section right below.

Contributing changes back upstream

For simple changes, the above workflow works well, but eventually it is preferable to actually fork the upstream repo and operate on our fork until the changes are merged upstream.

First, the modified module is moved out of the way:

mv 3rdparty/modules/prometheus{,.orig}

The module is then forked on GitHub or wherever it is hosted, and then added to the Puppetfile:

mod 'puppet-prometheus',
    :git => 'https://github.com/anarcat/puppet-prometheus.git',
    :branch => 'deploy'

Then Librarian is run again to fetch that code:

librarian-puppet install

Because Librarian is a little dumb, it might check out your module in "detached head" mode, in which case you will want to fix the checkout:

cd 3rdparty/modules/prometheus
git checkout deploy
git reset --hard origin/deploy
git pull

Note that the deploy branch here is a merge of all the different branches proposed upstream in different pull requests, but it could also be the master branch or a single branch if only a single pull request was sent.
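
Building such a deploy branch might look like this (the merged branch names are hypothetical):

git checkout -b deploy master
git merge pr-debian-support pr-other-fix   # the upstream pull request branches
git push origin deploy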

Since you now have a clone of the upstream repository, you can push and pull normally with upstream. When you make a change, however, you need to commit (and push) the change both in the sub-repo and the main repository:

cd 3rdparty/modules/prometheus
$EDITOR manifests/init.pp # more magic stuff
git commit -m'change the frobatz to a argblu'
git push
cd ..
git commit -m'change the frobatz to a argblu'
git push

Often, I make commits directly in our main Puppet repository, without pushing to the third-party fork, until I am happy with the code, and then I craft a nice pretty commit that can be pushed upstream, reversing that process:

$EDITOR 3rdparty/modules/prometheus/manifests/init.pp # dirty magic stuff
git commit -m'change the frobatz to a quuxblah'
git push
# see if that works, generally not
git commit -m'rah. wanted a quuxblutz'
git push
# now we are good, update our pull request
cd 3rdparty/modules/prometheus
git commit -m'change the frobatz to a quuxblutz'
git push

It's annoying to double-commit things, but I haven't found a better way just yet. This problem is further discussed in ticket #29387.

Also note that when you update code like this, the Puppetfile does not change, but the Puppetfile.lock file does change. The GIT.sha parameter needs to be updated. This can be done by hand, but since that is error-prone, you might want to simply run this to update modules:

librarian-puppet update

This will also update dependencies, so make sure you audit those changes before committing and pushing.
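
For reference, the stanza that changes in Puppetfile.lock looks something like this (the sha shown is a placeholder):

GIT
  remote: https://github.com/anarcat/puppet-prometheus.git
  ref: deploy
  sha: <new commit hash>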

Listing all hosts under puppet

This will list all active hosts known to the Puppet master:

ssh -t pauli.torproject.org 'sudo -u postgres psql puppetdb -P pager=off -A -t -c "SELECT c.certname FROM certnames c WHERE c.deactivated IS NULL"'

The following will list all hosts under Puppet and their virtual value:

ssh -t pauli.torproject.org "sudo -u postgres psql puppetdb -P pager=off -F',' -A -t -c \"SELECT c.certname, value_string FROM factsets fs INNER JOIN facts f ON f.factset_id = fs.id INNER JOIN fact_values fv ON fv.id = f.fact_value_id INNER JOIN fact_paths fp ON fp.id = f.fact_path_id INNER JOIN certnames c ON c.certname = fs.certname WHERE fp.name = 'virtual' AND c.deactivated IS NULL\""  | tee hosts.csv

The resulting file is a Comma-Separated Values (CSV) file which can be used for other purposes later.
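
Each line holds a host name and its virtual fact, for example (illustrative pairings):

alberti.torproject.org,physical
pauli.torproject.org,kvm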

Possible values of the virtual field can be obtained with a similar query:

ssh -t pauli.torproject.org "sudo -u postgres psql puppetdb -P pager=off -A -t -c \"SELECT DISTINCT value_string FROM factsets fs INNER JOIN facts f ON f.factset_id = fs.id INNER JOIN fact_values fv ON fv.id = f.fact_value_id INNER JOIN fact_paths fp ON fp.id = f.fact_path_id WHERE fp.name = 'virtual';\""

The currently known values are: kvm, physical, and xenu.

Other ways of extracting a host list

  • Using the PuppetDB API:

     curl -s -G http://localhost:8080/pdb/query/v4/facts  | jq -r ".[].certname"
    

    The fact API is quite extensive and allows for very complex queries. For example, this shows all hosts with the apache2 fact set to true:

     curl -s -G http://localhost:8080/pdb/query/v4/facts --data-urlencode 'query=["and", ["=", "name", "apache2"], ["=", "value", true]]' | jq -r ".[].certname"
    

    This will list all hosts sorted by their report date, older first, followed by the timestamp, space-separated:

     curl -s -G http://localhost:8080/pdb/query/v4/nodes  | jq -r 'sort_by(.report_timestamp) | .[] | "\(.certname) \(.report_timestamp)"' | column -s\  -t
    
  • Using cumin

  • Using LDAP:

     HOSTS=$(ssh alberti.torproject.org 'ldapsearch -h db.torproject.org -x -ZZ -b dc=torproject,dc=org -LLL "hostname=*.torproject.org" hostname | awk "\$1 == \"hostname:\" {print \$2}" | sort')
     for i in `echo $HOSTS`; do mkdir hosts/x-$i 2>/dev/null || continue; echo $i; ssh $i ' ...'; done
    

    the mkdir is there so that I can run the same command in many terminal windows and each host gets visited only once

Batch jobs on all hosts

With that trick, a job can be run on all hosts with parallel-ssh, for example to check the uptime:

cut -d, -f1 hosts.csv | parallel-ssh -i -h /dev/stdin uptime

This would do the same, but only on physical servers:

grep 'physical$' hosts.csv | cut -d, -f1 | parallel-ssh -i -h /dev/stdin uptime

This would fetch the /etc/motd on all machines:

cut -d, -f1 hosts.csv | parallel-slurp -h /dev/stdin -L motd /etc/motd motd

To run batch commands through sudo that require a password, you will need to fool both sudo and ssh a little more:

cut -d, -f1 hosts.csv | parallel-ssh -P -I -i -x -tt -h /dev/stdin -o pvs sudo pvs

You should then type your password, followed by Control-d. Warning: this will show your password on your terminal and probably in the logs as well.

Batch jobs can also be run on all Puppet hosts with Cumin:

ssh -N -L8080:localhost:8080 pauli.torproject.org &
cumin '*' uptime

See cumin for more examples.

Generating secrets

bacula::director inherits bacula, which defines $bacula_director_secret using hkdf() and generates /etc/bacula/bacula-dir.conf using that secret.
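
As a rough sketch (the hkdf() arguments shown are assumptions; see the actual bacula classes for the real call), the flow looks like this:

# hypothetical sketch; names and arguments are assumed
class bacula {
  $bacula_director_secret = hkdf('/etc/puppet/secret', "bacula-${::fqdn}")
}

class bacula::director inherits bacula {
  file { '/etc/bacula/bacula-dir.conf':
    content => template('bacula/bacula-dir.conf.erb'),  # template uses the secret
  }
}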

Progressive deployment

If you are making a major change to the infrastructure, you may want to deploy it progressively. A good way to do so is to include the new class manually in the node configuration, say in hiera/nodes/$fqdn.yaml:

classes:
  - my_new_class

Then you can check the effect of the class on the host with the --noop mode. First, make sure you disable Puppet so that automatic runs do not actually execute the code:

puppet agent --disable "testing my_new_class deployment"

Then the new manifest can be simulated with this command:

puppet agent --enable ; puppet agent -t --noop ; puppet agent --disable "testing my_new_class deployment"

Examine the output and, once you are satisfied, you can re-enable the agent and actually run the manifest with:

puppet agent --enable ; puppet agent -t

If the change is inside an existing class, that change can be enclosed in a class parameter and that parameter can be passed as an argument from Hiera. This is how the transition to a managed /etc/apt/sources.list file was done:

  1. first, a parameter was added to the class that would remove the file, defaulting to false:

    class torproject_org(
      Boolean $manage_sources_list = false,
    ) {
      if $manage_sources_list {
        # the above repositories overlap with most default sources.list
        file {
          '/etc/apt/sources.list':
            ensure => absent,
        }
      }
    }
    
  2. then that parameter was enabled on one host, say in hiera/nodes/brulloi.torproject.org.yaml:

    torproject_org::manage_sources_list: true
    
  3. Puppet was run on that host using the simulation mode:

    puppet agent --enable ; puppet agent -t --noop ; puppet agent --disable "testing my_new_class deployment"
    
  4. when satisfied, the real operation was done:

    puppet agent --enable ; puppet agent -t
    
  5. then this was added to two other hosts, and Puppet was run there

  6. finally, all hosts were checked to see if the file was still present anywhere and had any content, with cumin (see above for an alternative way of running a command on all hosts):

    cumin '*' 'du /etc/apt/sources.list'
    
  7. since it was missing everywhere, the parameter was set to true by default (see the sketch after this list) and the custom configuration was removed from the three test nodes

  8. then Puppet was run by hand everywhere, using Cumin, with a batch of 5 hosts at a time:

    cumin -o txt -b 5 '*' 'puppet agent -t'
    

    Because Puppet returns a non-zero value when changes are made, the above will stop whenever any host in a batch of 5 actually makes a change. You can then examine the output and see if the change is legitimate, or abort the configuration change.
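
The default flip mentioned in step 7 is just the class signature changing:

class torproject_org(
  Boolean $manage_sources_list = true,
) {
  # ... body unchanged ...
}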

Debugging things

When a Puppet manifest is not behaving as it should, the first step is to run it by hand on the host:

puppet agent -t

If that doesn't yield enough information, you can see pretty much everything that Puppet does with the --debug flag. This will, for example, include the commands run by Exec resources' onlyif checks and allow you to see why they do not work correctly (a common problem):

puppet agent -t --debug
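
For example, one way to zero in on what the Exec resources are doing is to filter that (very verbose) output:

puppet agent -t --debug 2>&1 | grep Exec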

Finally, some errors show up only on the Puppetmaster: look for those in /var/log/daemon.log there.