Tutorial

Most work on Bacula happens on the director, which is where backups are coordinated. Actual data is stored on the storage daemon, but the director is where commands are issued and jobs are managed.

All commands below are run from the bconsole shell, which can be started on the director with:

root@dictyotum:~# bconsole 
Connecting to Director dictyotum.torproject.org:9101
1000 OK: 103 torproject-dir Version: 9.4.2 (04 February 2019)
Enter a period to cancel a command.
*

Then you end up with a shell with * as a prompt where you can issue commands.

Checking last jobs

To see the jobs that ran most recently, you can check the status of the director:

*status director
torproject-dir Version: 9.4.2 (04 February 2019) x86_64-pc-linux-gnu debian 9.7
Daemon started 22-Jul-19 10:30, conf reloaded 23-Jul-2019 12:43:41
 Jobs: run=868, running=1 mode=0,0
 Heap: heap=7,536,640 smbytes=701,360 max_bytes=21,391,382 bufs=4,518 max_bufs=8,576
 Res: njobs=74 nclients=72 nstores=73 npools=291 ncats=1 nfsets=2 nscheds=2

Scheduled Jobs:
Level          Type     Pri  Scheduled          Job Name           Volume
===================================================================================
Full           Backup    15  03-Aug-19 02:10    BackupCatalog      *unknown*
====

Running Jobs:
Console connected using TLS at 02-Aug-19 15:41
 JobId  Type Level     Files     Bytes  Name              Status
======================================================================
107689  Back Full          0         0  chiwui.torproject.org is waiting for its start time (02-Aug 19:32)
====

Terminated Jobs:
 JobId  Level      Files    Bytes   Status   Finished        Name 
====================================================================
107680  Incr      51,879    2.408 G  OK       02-Aug-19 13:16 rouyi.torproject.org
107682  Incr         355    361.2 M  OK       02-Aug-19 13:33 henryi.torproject.org
107681  Diff      12,864    715.9 M  OK       02-Aug-19 13:34 pauli.torproject.org
107683  Incr         274    30.78 M  OK       02-Aug-19 13:50 forrestii.torproject.org
107684  Incr       3,423    2.398 G  OK       02-Aug-19 13:55 meronense.torproject.org
107685  Incr         288    32.24 M  OK       02-Aug-19 14:12 nevii.torproject.org
107686  Incr         341    69.64 M  OK       02-Aug-19 14:51 getulum.torproject.org
107687  Incr         289    26.24 M  OK       02-Aug-19 15:11 dictyotum.torproject.org
107688  Incr         376    57.62 M  OK       02-Aug-19 15:22 kvm5.torproject.org
107690  Incr         238    20.88 M  OK       02-Aug-19 15:32 opacum.torproject.org

====

Here we see that no backup is actually transferring data (one job is queued, waiting for its start time), and that the last jobs all completed successfully.

You can also check the status of individual clients with the status client command.
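
For example, to check on one of the clients seen in the output above:

*status client=dictyotum.torproject.org-fd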

Checking messages

The messages command shows the latest messages on the bconsole. It's useful to run this command when you start your session as it will flush the (usually quite long) buffer of messages. That way the next time you call the command, you will only see the result of your latest jobs.
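
If there is nothing new, bconsole will simply say so:

*messages
You have no messages.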

How to...

This section is more in-depth and will explain more concepts as we go. Relax, take a deep breath, it should go fine.

Configure backups on new machines

Backups for new machines should be automatically configured by Puppet using the bacula::client class, included everywhere (through hiera/common.yaml).
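
Once Puppet has run on both the new machine and the director, the new client should appear in the director's configuration, which you can verify from bconsole by listing all defined clients:

*show clients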

There are special configurations required for MySQL and PostgreSQL databases, see the design section for more information on those.

Restore files

Short version:

$ ssh -tt dictyotum.torproject.org bconsole
*restore

... and follow the instructions. Reminder: by default, backups are restored on the originating server. Use llist jobid=N and messages to follow progress.

The bconsole program has a pretty good interactive restore mode which you can just call with restore. It needs to know which "jobs" you want to restore from. As a given backup job is typically an incremental job, you normally need multiple jobs to restore to a given point in time.

The first thing to know is that restores are done from the server to the client, i.e. files are restored directly on the machine that was backed up. This means a restore can overwrite existing files and is therefore pretty powerful.

A simple way of restoring a given client to a given point in time is to use the Select the most recent backup for a client option (number 5 below). So:

  1. enter bconsole in a shell on the director

  2. call the restore command:

    *restore
    Automatically selected Catalog: MyCatalog
    Using Catalog "MyCatalog"
    
    First you select one or more JobIds that contain files
    to be restored. You will be presented several methods
    of specifying the JobIds. Then you will be allowed to
    select which files from those JobIds are to be restored.
    
  3. you now have a list of possible ways of restoring; choose 5: Select the most recent backup for a client:

    To select the JobIds, you have the following choices:
         1: List last 20 Jobs run
         2: List Jobs where a given File is saved
         3: Enter list of comma separated JobIds to select
         4: Enter SQL list command
         5: Select the most recent backup for a client
         6: Select backup for a client before a specified time
         7: Enter a list of files to restore
         8: Enter a list of files to restore before a specified time
         9: Find the JobIds of the most recent backup for a client
        10: Find the JobIds for a backup for a client before a specified time
        11: Enter a list of directories to restore for found JobIds
        12: Select full restore to a specified Job date
        13: Cancel
    Select item:  (1-13): 5
    
  4. you will see a list of machines, pick the machine you want to restore from by entering its number:

    Defined Clients:
         1: alberti.torproject.org-fd
    [...]
       117: yatei.torproject.org-fd
    Select the Client (1-117): 87
    
  5. you now get dropped into a file browser where you use the mark and unmark commands to select files for restore. the commands support wildcards like *; use mark * to mark all files in the current directory. see also the full list of commands, and the short navigation example after this list:

    Automatically selected FileSet: Standard Set
    +---------+-------+----------+-----------------+---------------------+----------------------------------------------------------+
    | jobid   | level | jobfiles | jobbytes        | starttime           | volumename                                               |
    +---------+-------+----------+-----------------+---------------------+----------------------------------------------------------+
    | 106,348 | F     |  363,125 | 157,545,039,843 | 2019-07-16 09:42:43 | torproject-full-perdulce.torproject.org.2019-07-16_09:42 |
    | 107,033 | D     |    9,136 |     691,803,964 | 2019-07-25 06:30:15 | torproject-diff-perdulce.torproject.org.2019-07-25_06:30 |
    | 107,107 | I     |    4,244 |     214,271,791 | 2019-07-26 06:11:30 | torproject-inc-perdulce.torproject.org.2019-07-26_06:11  |
    | 107,181 | I     |    4,285 |     197,548,921 | 2019-07-27 05:30:51 | torproject-inc-perdulce.torproject.org.2019-07-27_05:30  |
    | 107,257 | I     |    4,273 |     197,739,452 | 2019-07-28 04:52:15 | torproject-inc-perdulce.torproject.org.2019-07-28_04:52  |
    | 107,334 | I     |    4,302 |     218,259,369 | 2019-07-29 04:58:23 | torproject-inc-perdulce.torproject.org.2019-07-29_04:58  |
    | 107,423 | I     |    4,400 |     287,819,534 | 2019-07-30 05:42:09 | torproject-inc-perdulce.torproject.org.2019-07-30_05:42  |
    | 107,504 | I     |    4,278 |     413,289,422 | 2019-07-31 06:11:49 | torproject-inc-perdulce.torproject.org.2019-07-31_06:11  |
    | 107,587 | I     |    4,401 |     700,613,429 | 2019-08-01 07:51:52 | torproject-inc-perdulce.torproject.org.2019-08-01_07:51  |
    | 107,653 | I     |      471 |      63,370,161 | 2019-08-02 06:01:35 | torproject-inc-perdulce.torproject.org.2019-08-02_06:01  |
    +---------+-------+----------+-----------------+---------------------+----------------------------------------------------------+
    You have selected the following JobIds: 106348,107033,107107,107181,107257,107334,107423,107504,107587,107653
    
    Building directory tree for JobId(s) 106348,107033,107107,107181,107257,107334,107423,107504,107587,107653 ...  mark etc
    ++++++++++++++++++++++++++++++++++++++++++++++
    335,060 files inserted into the tree.
    
    You are now entering file selection mode where you add (mark) and
    remove (unmark) files to be restored. No files are initially added, unless
    you used the "all" keyword on the command line.
    Enter "done" to leave this mode.
    
    cwd is: /
    $ mark etc
    1,921 files marked.
    

    Do not use the estimate command as it can take a long time to run and will freeze the shell.

  6. when done selecting files, call the done command

    $ done
    
  7. this will drop you in a confirmation dialog showing what will happen. note the Where parameter which shows where the files will be restored, on the RestoreClient. Make sure that location has enough space for the restore to complete.

    Bootstrap records written to /var/lib/bacula/torproject-dir.restore.6.bsr
    
    The Job will require the following (*=>InChanger):
       Volume(s)                 Storage(s)                SD Device(s)
    ===========================================================================
    
        torproject-full-perdulce.torproject.org.2019-07-16_09:42 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
        torproject-diff-perdulce.torproject.org.2019-07-25_06:30 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
        torproject-inc-perdulce.torproject.org.2019-07-26_06:11 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
        torproject-inc-perdulce.torproject.org.2019-07-27_05:30 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
        torproject-inc-perdulce.torproject.org.2019-07-29_04:58 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
        torproject-inc-perdulce.torproject.org.2019-07-31_06:11 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
        torproject-inc-perdulce.torproject.org.2019-08-01_07:51 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
        torproject-inc-perdulce.torproject.org.2019-08-02_06:01 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
    
    Volumes marked with "*" are in the Autochanger.
    
    
    1,921 files selected to be restored.
    
    Using Catalog "MyCatalog"
    Run Restore job
    JobName:         RestoreFiles
    Bootstrap:       /var/lib/bacula/torproject-dir.restore.6.bsr
    Where:           /var/tmp/bacula-restores
    Replace:         Always
    FileSet:         Standard Set
    Backup Client:   perdulce.torproject.org-fd
    Restore Client:  perdulce.torproject.org-fd
    Storage:         File-perdulce.torproject.org
    When:            2019-08-02 16:43:08
    Catalog:         MyCatalog
    Priority:        10
    Plugin Options:  *None*
    
  8. this doesn't restore the backup immediately, but schedules a job that does so, like so:

    OK to run? (yes/mod/no): yes
    Job queued. JobId=107693
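
Before moving on: as mentioned in step 5, the file browser accepts a few more commands than mark and unmark. Here is a short, hypothetical navigation session (the paths and file names are illustrative; type help in the browser for the full list of commands):

$ cd etc
cwd is: /etc/
$ ls
[...]
$ mark passwd
1 file marked.
$ unmark passwd
1 file unmarked.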
    

You can see the status of the jobs on the director with status director, but you can also check the status of a specific job with llist jobid=N, for example:

*llist JobId=107697
           jobid: 107,697
             job: RestoreFiles.2019-08-02_16.43.40_17
            name: RestoreFiles
     purgedfiles: 0
            type: R
           level: F
        clientid: 9
      clientname: dictyotum.torproject.org-fd
       jobstatus: R
       schedtime: 2019-08-02 16:43:08
       starttime: 2019-08-02 16:43:42
         endtime: 
     realendtime: 
        jobtdate: 1,564,764,222
    volsessionid: 0
  volsessiontime: 0
        jobfiles: 0
        jobbytes: 0
       readbytes: 0
       joberrors: 0
 jobmissingfiles: 0
          poolid: 0
        poolname: 
      priorjobid: 0
       filesetid: 0
         fileset: 
         hasbase: 0
        hascache: 0
         comment:

The JobStatus column is an internal database field that will show T ("terminated normally") when completed or R or C when still running or not started, and anything else if, well, anything else is happening. The full list of possible statuses is hidden deep in the developer documentation, obviously.
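
The same information can also be read straight from the catalog database on the director. A minimal sketch, assuming the catalog is the PostgreSQL database named bacula (as in the director restore procedure below):

sudo -u postgres psql bacula -c "SELECT jobid, name, jobstatus FROM job ORDER BY jobid DESC LIMIT 10;"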

The messages command also provides a good way of showing the latest status, although it will flood your terminal if it hasn't been run in a long time. You can hit "enter" to see if there are new messages.

*messages
[...]
02-Aug 16:43 torproject-sd JobId 107697: Ready to read from volume "torproject-inc-perdulce.torproject.org.2019-08-02_06:01" on File device "FileStorage-perdulce.torproject.org" (/srv/backups/bacula/perdulce.torproject.org).
02-Aug 16:43 torproject-sd JobId 107697: Forward spacing Volume "torproject-inc-perdulce.torproject.org.2019-08-02_06:01" to addr=328
02-Aug 16:43 torproject-sd JobId 107697: Elapsed time=00:00:03, Transfer rate=914.8 K Bytes/second
02-Aug 16:43 torproject-dir JobId 107697: Bacula torproject-dir 9.4.2 (04Feb19):
  Build OS:               x86_64-pc-linux-gnu debian 9.7
  JobId:                  107697
  Job:                    RestoreFiles.2019-08-02_16.43.40_17
  Restore Client:         dictyotum.torproject.org-fd
  Where:                  /var/tmp/bacula-restores
  Replace:                Always
  Start time:             02-Aug-2019 16:43:42
  End time:               02-Aug-2019 16:43:50
  Elapsed time:           8 secs
  Files Expected:         1,921
  Files Restored:         1,921
  Bytes Restored:         2,528,685 (2.528 MB)
  Rate:                   316.1 KB/s
  FD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Restore OK

Once the job is done, the files will be present in the chosen location (Where) on the given server (RestoreClient).

See the upstream manual for more information about the restore command.

Restore the director

If the storage daemon disappears catastrophically, there's nothing we can do: the data is lost. But if the director disappears, we can still restore from backups. The instructions below should cover the case where we need to rebuild the director from backups.

The director is, essentially, a PostgreSQL database. Therefore, the restore procedure is to restore that database, along with some configuration.

TODO: this procedure is untested.

The first step is therefore to run Puppet with the bacula::director class applied to the node. This should restore a basic Bacula configuration, along with all client jobs. This will not, unfortunately, configure the PostgreSQL server, which is configured by hand.

TODO: Do consider deploying it with Puppet, as discussed in postgresql.

TODO: Document how to setup PostgreSQL by hand.

Then you need to restore the actual database. This can be done by extracting the database file from the catalog backup. The catalog backup is on the storage server (currently bungei), in /srv/backups/bacula/Catalog/. You can find the latest file and list its contents with this oneliner:

bls $(ls /srv/backups/bacula/Catalog/* | tail -1)

This is necessary because bls only takes absolute paths. It should show something like this:

root@bungei:~# bls $(ls /srv/backups/bacula/Catalog/* | tail -1)
bls: butil.c:292-0 Using device: "/srv/backups/bacula/Catalog" for reading.
19-aoû 19:53 bls JobId 0: Ready to read from volume "torproject-catalog.2019-08-19_10:57" on File device "FileStorage-catalog" (/srv/backups/bacula/Catalog).
bls JobId 0: -rw-------   1 systemd- systemd-        14880261463 2019-08-19 10:57:32  /var/lib/bacula/bacula.sql.gz
19-aoû 19:53 bls JobId 0: End of Volume "torproject-catalog.2019-08-19_10:57" at addr=14891296413 on device "FileStorage-catalog" (/srv/backups/bacula/Catalog).
1 files found.

You can then extract this entire "catalog" (which is really a compressed SQL dump stored inside a Bacula-specific archive) using bextract:

bextract /srv/backups/bacula/Catalog/torproject-catalog.2019-08-19_10:57 /var/tmp/restore

The second argument should be a previously created directory (create it with mkdir if needed) where the extracted file will be stored. You now have an SQL dump of the database. Create an actual database in Postgres to restore the file into, if it doesn't already exist:

sudo -u postgres psql -c 'CREATE DATABASE bacula;'

Finally, the database can be restored with this command.

WARNING: that will OVERWRITE any existing Bacula catalog. Make sure you run this on the right machine!

gunzip -c /var/tmp/restore/var/lib/bacula/bacula.sql.gz | sudo -u postgres psql bacula
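
To sanity-check the import, you can count the rows in one of the restored tables, for example:

sudo -u postgres psql bacula -c 'SELECT count(*) FROM job;'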

Then everything should be fairies and magic and happiness all over again. Check that everything works with:

bconsole

Run a few of the basic commands from the tutorial above, just to make sure.
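
For example:

*status director
*messages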

Restore PostgreSQL databases

See postgresql for restore instructions on PostgreSQL databases.

Restore MySQL databases

MySQL restoration should be fairly straightforward. Install MySQL:

apt install mysql-server

Load each database dump:

for dump in 20190812-220301-mysql.xz 20190812-220301-torcrm_prod.xz; do
    # the dumps are xz-compressed, so decompress them on the fly
    xzcat /var/backups/local/mysql/$dump | mysql
done

Design

This section documents how backups are setup at Tor. It should be useful if you wish to recreate or understand the architecture.

Backups are configured automatically by Puppet on all nodes, and use Bacula with TLS encryption over the wire.

Backups are pulled from machines to the backup server, which means a compromise on a machine shouldn't allow an attacker to delete backups from the backup server.

Bacula splits the different responsibilities of the backup system among multiple components, namely:

  • storage daemon (bacula::storage in Puppet, currently bungei)
  • director (bacula::director in Puppet, currently dictyotum, PostgreSQL configured by hand)
  • file daemon (bacula::client, on all nodes)

In our configuration, the Admin workstation, Database server, and Backup server are all on the same machine, the bacula::director.

Volumes are stored in the storage daemon, in /srv/backups/bacula/. Each client stores its volumes in a separate directory, which makes it easier to purge offline clients and evaluate disk usage.
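
For example, to evaluate how much space each client uses on the storage daemon:

du -sh /srv/backups/bacula/*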

We do not keep a bootstrap file, as advised by the upstream documentation, because we do not use tapes or tape libraries, which are what makes volumes hard to locate. Instead, our catalog is backed up in /srv/backups/bacula/Catalog and each backup contains a single file, the compressed database dump, which is sufficient to re-bootstrap the director.

See the introduction to Bacula for more information on those distinctions.

PostgreSQL backup system

Database backups are handled specially. We use PostgreSQL (postgres) everywhere apart from a few rare exceptions (currently only CiviCRM) and therefore use postgres-specific configurations to do backups of all our servers.

See postgresql for that server's specific backup/restore instructions.

MySQL backup system

MySQL also requires special handling, and it's done in the mariadb::server Puppet class. It deploys a script (backup-mysql) which runs every hour and calls mysqldump to store plaintext copies of all databases in /var/backups/local/mysql.

It also stores the SHA256 checksum of each backup file, encoded in the name of a hardlink to the file, for example:

1184448 -rw-r----- 2 root    154820 aug 12 21:03 SHA256-665fac68c0537eda149b22445fb8bca1985ee96eb5f145019987bdf398be33e7
1184448 -rw-r----- 2 root    154820 aug 12 21:03 20190812-210301-mysql.xz

Those both point to the same file, inode 1184448.
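
This means a backup file can be verified by recomputing its checksum and comparing the result with the name of its SHA256-* hardlink, for example:

cd /var/backups/local/mysql
sha256sum 20190812-210301-mysql.xz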

Those backups then get included in the normal Bacula backups.
