Migrating from Bacula to Burp

I migrated from the Bacula backup/restore tool (v5) to the Burp software. In this short article I will talk a bit about the rationale behind that decision and give some pointers on how to set up a more flexible Burp environment.

Why choose Burp over Bacula?

Let me start with a bit of background regarding my use of Bacula. I started using Bacula well over 10 years ago, when it was still at v2.x and long before its main author Kern Sibbald decided to split the software into a community and an enterprise version. At the time of writing, the community version has matured to v7.x. As far as I can recall, there was never a major release v4 of Bacula: v2 became v3 and then the versioning was bumped up to v5. After v5, even release numbers went to the enterprise version and odd release numbers to the community version, to indicate the audience for which each release is intended.

I have used Bacula in a variety of places and situations over the years, from backing up home servers (covering both Linux and Windows hosts) to safeguarding real-life production data in a commercial setting. I usually configured Bacula for disk-based backups, but at one point I invested in a small, second-hand DDS auto-changer and ran my home backups and restores from tape. Unfortunately the hardware died a few months after I acquired it, which forced me to revert to a disk-based set-up.

Burp is a fork of the popular Bacula tool and its author Graham Keeling explains quite well why he forked from the original project on this page: http://burp.grke.org/why.html. Below are my own motivations, which partly overlap with Graham's:

Bacula does not work well for disk-based backups: Bacula is built on the assumption that backups are tape-based. To back up to disk, it writes a continuous stream of data (single or multiplexed) into large files that emulate a tape drive or tape volume. This type of sequential storage is not the most efficient way to store and restore files when the underlying hardware does not limit you to sequential access. In a typical single-file restore, Bacula needs to fast-forward through the appropriate backup volume to find the right file marker and restore from there onwards. Though this works quite fast, it is not convenient. Also, Bacula backup volumes can only be accessed through its standard console interface or the low-level bextract and/or bls tools: it is not possible to single out files from a backup volume without the actual Bacula tools.
Bacula requires a database catalog: backing up millions of files requires good tracking and housekeeping of what was backed up and when. Bacula supports multiple database back-ends for this purpose, ranging from SQLite over MySQL/PostgreSQL to commercial databases like Oracle. To guarantee consistency and avoid concurrency problems, a database like MySQL really is a must in any decent Bacula set-up. Such a database requires extra resources (CPU, memory, etc.) and proper maintenance (including a backup of the catalog itself). In one of my cases, Bacula is used on a set of servers that are containerized with the OpenVZ virtualization software. On those hosts, Bacula runs on the hardware node (or system container) to achieve maximum flexibility and performance when backing up the guest systems. However, good practice and common sense dictate that you minimize the number of services/processes running on the hardware node, keeping it as pristine, and as secure, as possible. Running a database like MySQL on an OpenVZ hardware node goes against that practice.
Bacula has cumbersome retention policies, or, to put it correctly: it is quite difficult to devise retention policies without running into problems with recycled and non-appendable volumes. Bacula supports many of the concepts of enterprise-grade backup products when it comes to volumes, pools, etc. A backup administrator needs to be careful when defining how many volumes are allowed in a pool, how long a volume is protected from being recycled (retention) and under what conditions Bacula may create a new volume on demand or even forcibly overwrite an old one. In my experience, this is not always as straightforward as it seems. On the upside, Bacula has gradually improved the way it handles retention in practice over the years (pruning, purging, truncating, etc.). In my case, the truncate-on-purge option was actually the single most useful feature added to Bacula in years, as it prevents already recycled data from needlessly taking up disk space. Unfortunately, that feature did not make it into Bacula until v5.0.1.
Bacula has a steep learning curve: a typical Bacula environment runs a multitude of daemons (director, file, storage) on a number of hosts, each driven by its own configuration file, and these files need to be carefully matched to each other in different places. For example, a storage definition in the director must correspond to a matching storage definition in the storage daemon's configuration file. Bacula also comes with a slew (flexibility!) of options that can be tweaked in its various components. Needless to say, Bacula is powerful but it is no walk in the park for a complete newbie (and yes: I did not even mention all the different passwords that need to match between the components).
Bacula does not properly track backup dependencies: in backup schemes that alternate full with differential and incremental backups, Bacula does not guarantee that a restore will succeed when a backup volume becomes unavailable or a backup job is accidentally deleted. For example, any incremental backups on top of a full backup should be invalidated as soon as the full backup they rely on becomes null and void. This is not the case with Bacula.
Bacula uses server-side scheduling: though this may seem the preferred way of scheduling backups, given some of Bacula's limitations it actually makes scheduling backups quite a bit more difficult. Bacula does allow backups to run in parallel (which earlier versions did not, because of multiplexing problems with the backup volumes) or to be queued for later execution. But queuing jobs can be tricky when it comes to keeping volume retention times consistent for future backup runs.
Bacula always backs up entire files: Bacula always backs up the full content of a file, even if only a single byte has changed since its last known backup. For files that change little between backups, this leads to undesirably large backups in terms of both network use and storage capacity.

So how does Burp make things easier and better compared to Bacula? To summarize:

Burp works much better for file storage (disk-based) as it stores objects in individual files and uses delta differencing and/or hard linking to reduce the number of stored objects. See also http://burp.grke.org/docs/shuffling.html
Burp does not require a catalog, each backup is its own catalog. No catalog, no overhead, no maintenance.
Burp uses very simple retention policies and properly tracks backup dependencies: Burp simply asks how many copies of an object you wish to keep, regardless of time interval and/or available disk space/volumes. Additionally, it will not allow a backup to be accidentally deleted while other backups still rely on it. See also: http://burp.grke.org/docs/retention.html
Burp is easier to learn, configure and maintain: Burp consists of only a client and a server component and limits the number of configurable options to a workable minimum. Administrators with previous Bacula experience will quickly get the hang of the set-up. But even for complete newbies, Burp is a good deal easier to tackle from the start than Bacula. See also: http://burp.grke.org/docs/quickstart.html
Burp uses client-side scheduling: a typical Burp backup job is scheduled from a local cron job using a timer mechanism which queries the burp server and asks if it is backup time already. Both client and server have configuration options to steer when a client should be backed up (backup time has arrived). See also: http://burp.grke.org/docs/timer_script.html
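The hard linking mentioned in the first point above is a plain filesystem feature. As a minimal sketch (it uses GNU coreutils `stat -c` and makes no claims about Burp's actual storage layout; the function and file names are hypothetical), a hard-linked "copy" of an unchanged file shares the original's inode, so a second backup of it costs no extra data blocks:

```shell
# Hypothetical demo of generic hard-link behaviour; the names used here
# have nothing to do with Burp's real on-disk layout.
hardlink_demo() {
    local dir
    dir=$(mktemp -d)
    echo "unchanged content" > "$dir/backup1-file"
    # A second "backup" of the unchanged file is just another name for it
    ln "$dir/backup1-file" "$dir/backup2-file"
    # Print the inode number of both names (identical) ...
    stat -c '%i' "$dir/backup1-file" "$dir/backup2-file"
    # ... and the link count of the data (2 names, 1 copy on disk)
    stat -c '%h' "$dir/backup1-file"
    rm -rf "$dir"
}
```

Running `hardlink_demo` prints the same inode number twice, followed by a link count of 2.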

Tips and tricks for a flexible Burp set-up

I will not cover the basic set-up of a Burp backup environment. Please refer to the Burp documentation and web site for this.

Use include files

Burp allows configuration information to be sourced from other configuration files. This allows for a re-usable and modular set-up. For example, I use the following standard exclude snippets.

A snippet to exclude files from being compressed since they are in compressed format already:

$ vi incexc-comp.conf
# include/exclude from compression
exclude_comp=7z
exclude_comp=ace
exclude_comp=apk
exclude_comp=arc
exclude_comp=ark
exclude_comp=arj
exclude_comp=bz2
exclude_comp=cab
exclude_comp=cbr
exclude_comp=cbz
exclude_comp=dar
exclude_comp=deb
exclude_comp=exe
exclude_comp=gz
exclude_comp=ice
exclude_comp=jar
exclude_comp=lhz
exclude_comp=lz
exclude_comp=lzo
exclude_comp=pka
exclude_comp=rar
exclude_comp=rpm
exclude_comp=sis
exclude_comp=tgz
exclude_comp=uha
exclude_comp=xz
exclude_comp=zip
exclude_comp=z
exclude_comp=zoo
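Rather than typing every `exclude_comp=` line by hand, the snippet can be generated from a plain list of extensions. A minimal sketch; the helper name `make_comp_snippet` is hypothetical and it simply writes one `exclude_comp=` line per argument:

```shell
# Hypothetical helper: write an exclude_comp snippet from a list of
# file extensions.
# Usage: make_comp_snippet <output-file> <ext> [<ext> ...]
make_comp_snippet() {
    local out=$1
    shift
    {
        echo "# include/exclude from compression"
        # printf repeats its format once for every remaining argument
        printf 'exclude_comp=%s\n' "$@"
    } > "$out"
}
```

For example: `make_comp_snippet /etc/burp/incexc-comp.conf 7z ace apk arc ark arj bz2 cab gz rar xz zip` (extend the list to taste).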

A snippet that excludes pseudo-filesystems exposing kernel and device information, as well as the mount points typically used for removable or external devices:

$ vi incexc-dev.conf
# include/exclude filesystems
exclude=/proc
exclude=/sys
exclude=/media
exclude=/mnt

A snippet that excludes filesystems which are temporary in nature:

$ vi incexc-tmp.conf
# include/exclude filesystems
exclude=/donotbackup
exclude=/tmp

# include/exclude filesystem types
exclude_fs=tmpfs
exclude_fs=devtmpfs
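Before relying on `exclude_fs`, it can help to check which mount points on a host actually carry those filesystem types. A minimal sketch, assuming a Linux host where a mounts table such as /proc/mounts lists device, mount point and filesystem type in its first three columns; the helper name `mounts_of_type` is hypothetical:

```shell
# Hypothetical helper: list the mount points whose filesystem type matches
# one of the given types, reading a table in /proc/mounts format.
# Usage: mounts_of_type <mounts-file> <fstype> [<fstype> ...]
mounts_of_type() {
    local table=$1
    shift
    awk -v types="$*" '
        # Build a lookup set of the requested filesystem types
        BEGIN { n = split(types, t, " "); for (i = 1; i <= n; i++) want[t[i]] = 1 }
        # Column 3 is the filesystem type, column 2 the mount point
        ($3 in want) { print $2 }
    ' "$table"
}
```

For example, `mounts_of_type /proc/mounts tmpfs devtmpfs` lists the paths the `exclude_fs` lines above are expected to skip.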

The above snippets can be added to any client backup definition (burp.conf) by sourcing them as include files:

# The following options specify exactly what to backup.
# The server will override them if there is at least one 'include=' line on
# the server side.
. /etc/burp/incexc-client.conf
. /etc/burp/incexc-comp.conf
. /etc/burp/incexc-dev.conf
. /etc/burp/incexc-tmp.conf

In this example the snippet incexc-client.conf will contain the actual file paths that do need backing up.
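With several include files in play, a typo in one path is easy to miss. A minimal sanity-check sketch, assuming includes are written as ". /path" at the start of a line as in the example above; the helper name `check_includes` is hypothetical and it only checks that each sourced file exists:

```shell
# Hypothetical sanity check: report every file sourced with a ". /path"
# line in a Burp config file that does not exist.
# Returns non-zero if at least one included file is missing.
# Usage: check_includes /etc/burp/burp.conf
check_includes() {
    local conf=$1 report
    # sed strips the leading ". " and prints only the include lines
    report=$(sed -n 's/^\.[[:space:]]\{1,\}//p' "$conf" |
        while read -r inc; do
            [ -f "$inc" ] || echo "missing include: $inc"
        done)
    if [ -n "$report" ]; then
        echo "$report"
        return 1
    fi
}
```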

Create re-usable scheduling & retention policies

By applying the concept of include files, we can define a set of scheduling and retention policies which can be re-used in different backup definitions.

Example: a three-month schedule with a backup taken once a week on Sunday between midnight and 7am (4 x 3 = 12 backups):

$ more 3-months-weekly.conf
# keep backup for 3 months, run weekly

# retention to 4x3
keep = 4
keep = 3

# schedule only once a week on Sunday (00hrs-07hrs)
timer_arg = 1w
timer_arg = Sun,00,01,02,03,04,05,06
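The second `timer_arg` line is a time window: a day list followed by the hours in which a backup may start. Burp's own timer script (see the timer_script page linked earlier) makes the real decision; the following is only a simplified, hypothetical re-implementation of the window part, to illustrate how such a line is read:

```shell
# Simplified, hypothetical check of a timer_arg window such as
# "Sun,00,01,02,03,04,05,06": succeed only if the given day AND the given
# hour both appear in the comma-separated list.
# The real Burp timer script also checks the interval (e.g. "1w") since
# the last successful backup; that part is not modelled here.
# Usage: in_window <Day> <HH> <window>
in_window() {
    local day=$1 hour=$2 window=",$3,"
    # Surrounding commas ensure whole-field matches only
    case $window in
        *",$day,"*) ;;
        *) return 1 ;;
    esac
    case $window in
        *",$hour,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `in_window "$(LC_ALL=C date +%a)" "$(date +%H)" "Sun,00,01,02,03,04,05,06" && echo "inside backup window"`.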

Example: a three-month schedule with a backup taken once a day between 3am and 7am (7 x 4 x 3 = 84 backups):

$ more 3-months-daily.conf
# keep backup for 3 months, run daily

# retention to 7x4x3
keep = 7
keep = 4
keep = 3

# schedule daily (between 03hrs-07hrs)
timer_arg = 1d
timer_arg = Mon,Tue,Wed,Thu,Fri,Sat,Sun,03,04,05,06

Scheduling & retention policies are best kept on the Burp server and included from the client configuration files in /etc/burp/clientconfdir. And keep in mind: 84 backups do not mean 84 copies of your data! (see also http://burp.grke.org/docs/shuffling.html)
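The 4 x 3 = 12 and 7 x 4 x 3 = 84 figures in the examples are simply the product of the `keep` values. A small hypothetical helper (`keep_product`) can reproduce that arithmetic for any schedule file; see the Burp retention documentation for exactly how multiple `keep` values map onto stored backups:

```shell
# Hypothetical helper: multiply all "keep" values found in a schedule file,
# reproducing the arithmetic used in the examples (4 x 3, 7 x 4 x 3).
# Usage: keep_product /etc/burp/clientconfdir/schedule/3-months-daily.conf
keep_product() {
    awk -F'=' '
        # Only lines whose key is exactly "keep" (spaces around "=" allowed)
        $1 ~ /^[ \t]*keep[ \t]*$/ {
            gsub(/[ \t]/, "", $2)
            product = (seen ? product * $2 : $2 + 0)
            seen = 1
        }
        # Print 0 when the file contains no keep lines at all
        END { print (seen ? product : 0) }
    ' "$1"
}
```

Run against the two example files, it prints 12 for 3-months-weekly.conf and 84 for 3-months-daily.conf.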

Define multiple clients

Out of the box, Burp does not support multiple backup definitions for a single client; it treats a client system or host as a single entity. Yet in many cases it is useful to split your backup into different parts based on the type of data you need to safeguard, for example OS vs. database backups. Luckily, this kind of set-up is possible in Burp by defining multiple backup clients for a single host. To achieve this, we configure a separate burp.conf file for each backup client. This also gives us the flexibility to distinguish backups by:

specifying different include/exclude paths
using different scheduling schemes
using different retention policies

In the following example we will define two backup definitions: one for backing up the operating system (OS) and one for backing up the WWW root of a web server.

First we do the necessary configuration on the Burp server:

Create two client configuration files in /etc/burp/clientconfdir:

$ vi /etc/burp/clientconfdir/os_client
$ vi /etc/burp/clientconfdir/www_client

Customize both files by defining at least an entry for the client password:

password = <client_password>

You can opt to use a different password for each backup client or the same password for all of them. In any case, the password must match the password configured in the client configuration file on the Burp client host (see below).

Set the retention and scheduling policies for each backup client in the above configuration files:

$ vi /etc/burp/clientconfdir/os_client
# retention & schedule
. /etc/burp/clientconfdir/schedule/3-months-weekly.conf

$ vi /etc/burp/clientconfdir/www_client
# retention & schedule
. /etc/burp/clientconfdir/schedule/3-months-daily.conf

In this example we used the scheduling and retention policies described in the previous section. Both policy files are sourced from the schedule sub-directory, which you will of course need to create and populate first.
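The server-side steps above can be scripted once you have more than a couple of backup clients. A minimal sketch; the helper `make_server_client` and the `CLIENTCONFDIR` override are hypothetical, and the file layout is the one used in this example:

```shell
# Hypothetical helper: create one server-side client file containing the
# password entry and the schedule include shown above.
# CLIENTCONFDIR defaults to /etc/burp/clientconfdir; point it at a scratch
# directory to try this out first.
# Usage: make_server_client <client_name> <password> <schedule-file-name>
make_server_client() {
    local name=$1 pass=$2 schedule=$3
    local confdir=${CLIENTCONFDIR:-/etc/burp/clientconfdir}
    mkdir -p "$confdir"
    # The here-document body must stay unindented
    cat > "$confdir/$name" <<EOF
password = $pass
# retention & schedule
. $confdir/schedule/$schedule
EOF
}
```

For example: `make_server_client os_client '<client_password>' 3-months-weekly.conf` and `make_server_client www_client '<client_password>' 3-months-daily.conf`.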

Next, we do the configuration on the Burp client system:

Create a subdirectory for each backup definition (aka client) in /etc/burp:

$ mkdir /etc/burp/os_client
$ mkdir /etc/burp/www_client

Copy the default burp.conf into each of the above sub-directories.
Customize both burp.conf files by changing at least:

server = <server_name>
password = <client_password>
cname = <client_name>

You can choose to use a different password for each backup client or the same password for all of them. In any case, the password must match the password configured in the client configuration file on the Burp server (see above).

Also adjust the paths for the ssl_cert and ssl_key settings. Since we are going to have two different backup clients, it makes sense to generate two different SSL certificates as well. They will be stored in each backup client's sub-directory. For example, for the www_client:

# Client SSL certificate
ssl_cert = /etc/burp/www_client/ssl_cert-client.pem

# Client SSL key
ssl_key = /etc/burp/www_client/ssl_cert-client.key
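The client-side steps above (create the sub-directory, copy the default burp.conf, change server, password, cname, ssl_cert and ssl_key) can be combined into one helper. A minimal sketch under stated assumptions: the helper name and the `BURP_DIR` override are hypothetical, each setting is assumed to appear as "key = value" at the start of a line in the default burp.conf, `sed -i` is the GNU form, and values must not contain `|`, `&` or backslashes:

```shell
# Hypothetical helper: create <BURP_DIR>/<client>/burp.conf from the default
# burp.conf and rewrite the per-client settings.
# BURP_DIR defaults to /etc/burp and can be overridden for a dry run.
# Usage: make_client_conf <client_name> <server> <password>
make_client_conf() {
    local name=$1 server=$2 pass=$3
    local base=${BURP_DIR:-/etc/burp}
    local conf="$base/$name/burp.conf"
    mkdir -p "$base/$name"
    cp "$base/burp.conf" "$conf"
    # "^ssl_cert = " needs the space, so ssl_cert_ca is left untouched
    sed -i \
        -e "s|^server = .*|server = $server|" \
        -e "s|^password = .*|password = $pass|" \
        -e "s|^cname = .*|cname = $name|" \
        -e "s|^ssl_cert = .*|ssl_cert = $base/$name/ssl_cert-client.pem|" \
        -e "s|^ssl_key = .*|ssl_key = $base/$name/ssl_cert-client.key|" \
        "$conf"
}
```

For example: `make_client_conf www_client <server_name> '<client_password>'`.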

Do the initial client → server exchange (SSL certificates etc.) for both backup clients. Make sure you use the correct, custom burp.conf for this! For example, for the www_client:

$ burp -c /etc/burp/www_client/burp.conf -a l

2015-04-22 13:37:43: burp[370248] before client
2015-04-22 13:37:43: burp[370248] begin client
2015-04-22 13:37:43: burp[370248] auth ok
2015-04-22 13:37:43: burp[370248] Server version: 1.4.36
2015-04-22 13:37:43: burp[370248] Server will sign a certificate request
2015-04-22 13:37:43: burp[370248] Generating SSL key and certificate signing request
2015-04-22 13:37:43: burp[370248] Running '/usr/sbin/burp_ca --key --keypath /etc/burp/www_client/ssl_cert-client.key --request --requestpath /etc/burp/CA-client/foobar.csr --name foobar'
generating key foobar: /etc/burp/www_client/ssl_cert-client.key
Generating RSA private key, 2048 bit long modulus
...............................................+++
........................+++
e is 65537 (0x10001)
generating request foobar
2015-04-22 13:37:43: burp[370248] /usr/sbin/burp_ca returned: 0
2015-04-22 13:37:43: burp[370248] Sent /etc/burp/CA-client/foobar.csr
2015-04-22 13:37:43: burp[370248] Received: /etc/burp/www_client/ssl_cert-client.pem.370248
2015-04-22 13:37:43: burp[370248] Received: /etc/burp/ssl_cert_ca.pem.370248
2015-04-22 13:37:43: burp[370248] Rewriting config file: /etc/burp/www_client/burp.conf
2015-04-22 13:37:43: burp[370248] Re-opening connection to server
2015-04-22 13:37:48: burp[370248] begin client
2015-04-22 13:37:48: burp[370248] auth ok
2015-04-22 13:37:48: burp[370248] Server version: 1.4.36
2015-04-22 13:37:48: burp[370248] nocsr ok
2015-04-22 13:37:48: burp[370248] SSL is using cipher: DHE-RSA-AES256-GCM-SHA384 TLSv1.2 Kx=DH

2015-04-22 13:37:48: burp[370248] List finished ok
2015-04-22 13:37:48: burp[370248] after client
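The same exchange has to be repeated for every backup client on the host. A minimal loop sketch using the same `-c ... -a l` invocation as above; the helper name `init_clients` and the `BURP` override (e.g. `BURP=echo` for a dry run) are hypothetical:

```shell
# Hypothetical helper: run the initial "-a l" request, which triggers the
# certificate exchange shown above, for each named backup client.
# Stops at the first client that fails.
# Usage: init_clients os_client www_client
init_clients() {
    local burp=${BURP:-/usr/sbin/burp} client
    for client in "$@"; do
        "$burp" -c "/etc/burp/$client/burp.conf" -a l || return 1
    done
}
```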

Create a cron entry for each backup client:

$ vi /etc/cron.d/burp

# os_client
15,35,55 * * * * root    /usr/sbin/burp -c /etc/burp/os_client/burp.conf -a t
# www_client
17,37,57 * * * * root    /usr/sbin/burp -c /etc/burp/www_client/burp.conf -a t
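The cron entries follow a simple pattern: the same `-a t` timer call per client, each shifted by two minutes so the clients do not contact the server at the same moment. A hypothetical generator (`make_cron`) reproducing that pattern; because all three minutes wrap modulo 60, it stays collision-free for up to ten clients:

```shell
# Hypothetical generator for /etc/cron.d/burp: one commented timer entry
# per backup client, each offset by two minutes from the previous one.
# Base minutes 15,35,55 mirror the example above.
# Usage: make_cron os_client www_client > /etc/cron.d/burp
make_cron() {
    local offset=0 client
    for client in "$@"; do
        printf '# %s\n' "$client"
        printf '%d,%d,%d * * * * root    /usr/sbin/burp -c /etc/burp/%s/burp.conf -a t\n' \
            $(( (15 + offset) % 60 )) $(( (35 + offset) % 60 )) $(( (55 + offset) % 60 )) \
            "$client"
        offset=$((offset + 2))
    done
}
```

Called as `make_cron os_client www_client`, it produces the four lines of the cron file shown above.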
