Assimilate — Front-End to Borg Backup

Version: 0.0a0
Released: 2024-12-08
Please report all bugs and suggestions on GitHub_.

What is Assimilate?

Assimilate is a simple command line utility to orchestrate backups. It is built as a front-end to BorgBackup_, a powerful and fast de-duplicating backup program. With Assimilate, you specify all the details about your backups once in advance, and then use a very simple command line interface for your day-to-day activities.

Use of Assimilate does not preclude the use of Borg directly on the same repository. The philosophy of Assimilate is to provide commands that you would use often and in an interactive manner with the expectation that you would use Borg directly for more unusual or esoteric situations.

Assimilate is the next generation of Emborg_ and is designed to work with Borg 2.0. If you are using an earlier version of Borg, you should use Emborg. If you are currently an Emborg user, you will need to switch to Assimilate when you upgrade to Borg 2.0.

Why Assimilate?

There are alternatives to Assimilate such as BorgMatic_ and Vorta_, both of which are also front-ends to BorgBackup. BorgMatic has a command line interface like Assimilate while Vorta is GUI-based. Assimilate distinguishes itself by providing a command line interface that is very efficient for common tasks, such as creating archives (backups), restoring files or directories, or comparing existing files to those in an archive. Also, Assimilate naturally supports multiple configurations. This feature can be used to simultaneously back up to a local repository, which provides rapid restores, and an off-site repository, which provides increased safety in case of a local disaster. Or it can be used to apply different retention rules to different directories. For example, you might want to apply conservative retention rules to precious files that are not frequently accessed, such as photos, and aggressive retention rules to source code that is held in data management systems like GitHub.

Why Borg?

Well, everyone needs to back up their files. So perhaps the question should be: why not Duplicity? Duplicity_ has been the standard way to do backups on Unix systems for many years.

Duplicity provides full and incremental backups. A full backup makes complete copies of each file. With an incremental backup, only the difference between the current and previous versions of the file is saved. Thus, to retrieve a file from the backup, Duplicity must first get the original version of the file, and then apply each change. That approach results in the following issues:

  1. The recovery process is slow because the desired file is reconstructed from possibly a large number of change sets, each of which must be downloaded from a remote repository before it can be applied. The change sets are large, so the recovery of even small files can require downloading a large amount of data. It is common that the recovery of a single small file could require many hours.

  2. Because the recovery process requires so many steps, it can be fragile. Apparently Duplicity keeps all the change sets open during the recovery process, and so the recovery process can fail because the operating system limits how many files you can open at any one time.

  3. Generally, when there are problems, you only find them when you try to recover a file. At that point it is too late.

  4. Duplicity does not do de-duplication, so if you were to have multiple copies of the same file, or if you moved a file, then you would keep multiple copies of it.

The first two issues can be reduced with frequent full backups, but this greatly increases the space you need to hold your backups.

Borg works in a very different way. When Borg encounters a file, it first determines whether it is new or not. The file is determined to be new if the contents of that file do not already exist in the repository, in which case it copies the contents into the repository. Then, either way, it associates a pointer to the file’s contents with the file path. This makes it naturally de-duplicating. When it comes time to recover a file, it simply uses the file path to find the contents. In this way, it only retrieves the data it needs. There is no complicated and fragile process needed to reconstruct the file from a long string of differences.
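The idea can be pictured with a small Python sketch. It is a toy model, not Borg's implementation: it treats each file as a single unit, whereas Borg splits files into chunks and de-duplicates the chunks, and it ignores encryption and compression. The class and method names are hypothetical.

```python
import hashlib


class Repository:
    """Toy model: unique file contents plus archives that point at them."""

    def __init__(self):
        self.contents = {}   # content id -> bytes, each stored only once
        self.archives = {}   # archive name -> {file path: content id}

    def create(self, archive_name, files):
        """Back up `files`, a dict that maps file path to bytes."""
        snapshot = {}
        for path, data in files.items():
            content_id = hashlib.sha256(data).hexdigest()
            # Copy the contents only if the repository does not hold them yet.
            self.contents.setdefault(content_id, data)
            # Either way, associate the path with a pointer to the contents.
            snapshot[path] = content_id
        self.archives[archive_name] = snapshot

    def extract(self, archive_name, path):
        """Recover one file by following its pointer; nothing else is read."""
        return self.contents[self.archives[archive_name][path]]


repo = Repository()
repo.create("monday", {"a.txt": b"hello", "copy-of-a.txt": b"hello"})
repo.create("tuesday", {"moved/a.txt": b"hello", "b.txt": b"world"})
```

In this model the duplicate on Monday and the moved file on Tuesday add no new contents to the repository. Only the pointers differ between the two archives.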

After living with Duplicity for many years, I now find the Borg recovery process stunningly fast and extremely reliable. I am completely sold on Borg and will never use Duplicity again.

Terminology

It is helpful to understand a few terms that are used by Borg to describe your backups.

repository:

This is the location where all of your files are backed up to. It may be on a local file system or it may be remote, in which case it is accessed using ssh.

A repository consists of a collection of disembodied and deduplicated file contents along with a collection of archives.

archive:

This is a snapshot of the files that existed when a particular backup was run. Basically, it is a collection of file paths along with pointers to the contents of those files.

command:

An operation to perform on a repository, such as creating a new archive (backing up) or extracting a file or directory.

In addition, Assimilate adds a new concept.

configuration:

A collection of rules that define all aspects of how to maintain a repository. The rules take the form of settings that define the location of the repository, which files should be copied to the repository when backing up, what encryption and compression schemes should be used, how to prune out archives that are no longer worth keeping, etc.

In Borg everything needed by a command must be specified as command line options. In Assimilate the addition of configurations allows you to simply specify the configuration to the command. The command can then extract the information it needs from the configuration. In this case, command line options are only needed for things not available from the configuration, which are things that tend to vary from invocation to invocation, like which file you wish to extract.

Assimilate also allows you to specify a default configuration, so in most cases you need not even specify the configuration to a command.
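As a rough picture of that division of labor, the following Python sketch combines stored settings with the few options given on the command line. It is only an illustration of the idea and not how Assimilate is implemented. The function name, the dictionary layout and the local repository path are all hypothetical.

```python
def plan_command(command, configs, default_config, config=None, **options):
    """Combine stored settings with per-invocation options (toy model)."""
    # Fall back to the default configuration when none is named.
    name = config if config is not None else default_config
    settings = configs[name]   # everything that was specified once, in advance
    return {
        "command": command,
        "config": name,
        "settings": settings,
        "options": options,    # only what varies from invocation to invocation
    }


configs = {
    "home": {"repository": "/mnt/backups/home", "compression": "zstd,1"},
    "root": {"repository": "borgbase:backups", "compression": "zstd,1"},
}

# No configuration is named, so the default is used. Only the file to
# restore needs to be given.
plan = plan_command("restore", configs, default_config="home", paths=["accounts.json"])
```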

Quick Tour

You must initially describe your repository or repositories to Assimilate. You do so by adding configuration files to ~/.config/assimilate. At least two are required. First is the file that contains settings that are shared between all configurations. This is a NestedText_ file located at ~/.config/assimilate/shared.conf.nt. Here is an example:

# configurations
default config: home

# basic settings
default mount point: ~/tmp/ASSIMILATE
passcommand: avendesora value --stdout laptop-borg passcode
encryption: repokey-blake2-chacha20-poly1305
compression: zstd,1
notifier: notify-send -u critical {prog_name} "{msg}"

# things to exclude
exclude if present: .nobackup
exclude caches: 'yes
exclude nodump: 'yes

# command aliases
command aliases:
    repo-list:
        - archives
        - recent --last 5
    list: paths

# logging
logging:
    keep for: 1w

There must also be an individual settings file for each backup configuration. These are also NestedText files. The example below is for a configuration named root, which is described in ~/.config/assimilate/root.conf.nt. It is designed to back up the whole machine:

# repository settings
repository: borgbase:backups
archive: {host_name}-{config_name}-{{now}}
match archives: sh:{host_name}-{config_name}-*

# basic settings
passphrase: singer reread marry crucible
prune after create: 'yes
check after create: 'yes
notify: admin@mydomain.com

# what to backup
patterns:
    - R /etc
    - R /home
    - R /root
    - R /var
    - R /srv
    - R /opt
    - R /usr/local

    # what to exclude
    - - /var/cache
    - - /var/lock
    - - /var/run
    - - /var/tmp
    - - /root/.cache
    - - /home/*/.cache

# prune settings
keep daily: 7
keep weekly: 4
keep monthly: 6

Since this configuration needs to back up files that may not be accessible by normal users, it should be run by the root user.
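The daily, weekly and monthly prune settings in the example above are passed on to Borg, which decides which archives survive. The following Python sketch is a simplified model of that selection, based on the behavior described in the Borg prune documentation: working from the newest archive backwards, each rule keeps the last archive of each period, and archives already kept by an earlier rule do not count toward a later one. It covers only three rules and omits Borg's other rules and edge cases. The function name and the keys of the `keep` dictionary are hypothetical.

```python
from datetime import datetime, timedelta

# How each rule names the period that an archive falls in.
PERIODS = {
    "daily": lambda t: (t.year, t.month, t.day),
    "weekly": lambda t: tuple(t.isocalendar())[:2],   # ISO year and week
    "monthly": lambda t: (t.year, t.month),
}


def archives_to_keep(archives, keep):
    """archives maps name to creation time; keep maps rule name to a count."""
    newest_first = sorted(archives, key=archives.get, reverse=True)
    kept = set()
    for rule, period_of in PERIODS.items():
        wanted = keep.get(rule, 0)
        count = 0
        last_period = None
        for name in newest_first:
            if count >= wanted:
                break
            period = period_of(archives[name])
            if period == last_period:
                continue          # a newer archive already represents this period
            last_period = period
            if name not in kept:  # kept by an earlier rule: does not count here
                kept.add(name)
                count += 1
    return kept


# Ten nightly archives, pruned with the counts from the example configuration.
start = datetime(2025, 4, 14, 18, 35)
archives = {f"root-{i}": start + timedelta(days=i) for i in range(10)}
kept = archives_to_keep(archives, {"daily": 7, "weekly": 4, "monthly": 6})
```

With only ten nightly archives, the daily rule keeps the seven newest and the weekly and monthly rules find nothing further to keep. The longer periods only start to matter once the history grows.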

Once you have created these files, you can use Assimilate to perform common tasks that involve your backups.

The first step is to initialize the remote repository. A repository must be initialized before it can first be used. To do so, use the repo-create command:

$ assimilate repo-create

Once the repository is initialized, it is ready for use. The create command creates an archive, meaning that it backs up your current files.

$ assimilate create

Once one or more archives have been created, you can list the available archives using the repo-list command.

$ assimilate repo-list

The list command displays all the files in the most recent archive.

$ assimilate list

You can restrict the listing to those files contained in the current working directory using:

$ assimilate list .

If you give the name of an archive, it displays all the files in the specified archive.

$ assimilate list --archive continuum-root-2025-04-23T18:35:33

Or, you can give a date, in which case the most recent archive created before that date is used.

$ assimilate list --before 2025-04-23

You can also specify the date and time relative to the current moment:

$ assimilate list --before 1w

The compare command allows you to see and manage the differences between your local files and those in an archive. You can compare individual files or entire directories. You can use the date and archive options to select the particular archive to compare against. You can use the interactive version of the command to graphically view changes and merge them back into your local files.

$ assimilate compare --interactive doc/thesis

The restore command restores files or directories in place, meaning it replaces the current version with the one from the archive. You can also use the date and archive options to select the particular archive to draw from.

$ cd ~/bin
$ assimilate restore accounts.json

The mount command creates a directory ‘BACKUPS’ and then mounts an archive or the whole repository on this directory. This allows you to move into the archive or repository, navigating, examining, and retrieving files as if it were a file system. Again, you can use the date and archive options to select the particular archive to mount.

$ assimilate mount BACKUPS

The umount command un-mounts the archive or repository after you are done with it.

$ assimilate umount BACKUPS

The due command tells you when the last successful backup was performed.

$ assimilate due

The help command shows you information on how to use Assimilate.

$ assimilate help

There are more commands, but the above are the most commonly used.

Status

The overdue command can be run from a cron script on either the client or the server machine. It notifies you if your backups have not completed successfully within a specified period of time. In addition, Assimilate can be configured to update monitoring services such as HealthChecks.io_ with the status of the backups.
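The test at the heart of such a check is simple, as the following Python sketch shows. It is a hypothetical illustration and not Assimilate's implementation. In particular, it does not model how the time of the last successful backup is obtained or how the notification is sent, and the function and parameter names are assumptions.

```python
from datetime import datetime, timedelta


def is_overdue(last_success, max_age, now=None):
    """True if no backup has completed successfully within max_age."""
    if last_success is None:
        return True               # no successful backup on record at all
    if now is None:
        now = datetime.now()
    return now - last_success > max_age


# A backup that last succeeded three days ago, with a one day allowance.
late = is_overdue(
    datetime(2025, 4, 20, 18, 35),
    timedelta(days=1),
    now=datetime(2025, 4, 23, 18, 35),
)
```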

Borg

Borg has more power than what is exposed with Assimilate. You may use it directly or through the Assimilate borg command when you need that power. More information can be found at BorgBackup_.

Precautions

You should ensure you have a backup copy of the encryption key and its passphrase in a safe place (run ‘borg key export’ to extract the encryption keys). This is very important. If the only copy of the encryption credentials is on the disk being backed up and that disk fails, you will not be able to access your backups. I recommend the use of SpareKeys_ as a way of ensuring that you always have access to the essential information, such as your Borg passphrase and keys, that you would need to get started after a catastrophic loss of your disk.

If you keep the passphrase in an Assimilate configuration file then you should set the permissions for that file so that it is not readable by others:

chmod 600 ~/.config/assimilate/*
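If you want to confirm that the permissions took effect, a short Python sketch such as the following lists any files in a directory that group or other users can still access. It is not part of Assimilate, and the function name is hypothetical.

```python
import stat
from pathlib import Path


def exposed_files(directory):
    """Return the files in `directory` that group or others can access."""
    exposed = []
    for path in sorted(Path(directory).expanduser().iterdir()):
        mode = path.stat().st_mode
        # Any group or other permission bit means the file is not private.
        if path.is_file() and mode & (stat.S_IRWXG | stat.S_IRWXO):
            exposed.append(path)
    return exposed
```

Calling exposed_files("~/.config/assimilate") should return an empty list once the chmod command above has been run.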

Better is to simply not store the passphrase in Assimilate configuration files. You can use the passcommand setting for this, or you can use Avendesora_, which is a flexible password management system. The interface to Avendesora is already built in to Assimilate, but its use is optional (it need not be installed).

It is also best, if it can be arranged, to keep your backups at a remote site so that your backups do not get destroyed in the same disaster, such as a fire or flood, that claims your original files. One option is RSync_. Another is BorgBase_. I have experience with both, and both seem quite good. One I have not tried is Hetzner_.

Borg supports many different ways of excluding files and directories from your backup. Thus it is always possible that a small mistake results in essential files being excluded from your backups. Once you have performed your first backup, you should mount the most recent archive, carefully examine the resulting snapshot, and make sure it contains all the expected files.

Finally, it is a good idea to practice a recovery. Pretend that you have lost all your files and then see if you can do a restore from backup. Working out the kinks beforehand can save you if you ever do lose your files.

Issues

Please ask questions or report problems on GitHub_.
