Finding a 4D Backup Solution
Date last modified: Tue Nov 19 2024 8:28 AM
'Time spent in reconnaissance is seldom wasted' - The Art of War, Sun Tzu, 400BC
This page records my research and conclusions about backup software, primarily for Windows-based machines. It was mostly written during 2008, though some information has since been updated.
What type of backup?
- Push or Pull? What I call a 'pull' backup relies on the backup server pulling the information from the operating computers, whereas with a 'push' backup the operating computer pushes its data to the backup server. In general I think a push backup system is better because it is more secure: the backup server does not need to be able to gain access to any of the operating computers.
- 3D or 4D? By '3D Backup' I mean that a backup archive is essentially a snapshot of your files as they were at one time; 4D Backup offers you the chance to recover earlier versions of files going back in time, both files that no longer exist on your operating computer and files that still exist but have changed. So it adds the 4th dimension (time) to a backup archive. But it must do so elegantly, not just by transferring every file in full each time it changes (which requires huge bandwidth) and then storing it in full (requiring huge backup storage). To win my 4D Backup badge a solution must provide backup history and must optimise backup both for transfer and for backup storage - for example by using so-called 'diffs' or 'deltas', or by deduplication.
- Local or Remote (Offsite)? Local backup is good but what if a fire destroys the premises? The best combination is to have both: a primary onsite backup and a secondary offsite backup. If the backup can be encrypted before it is transferred to the secondary backup then this could be a third-party facility e.g. a commercial offsite backup service. Having two backups also obviates the need for RAID.
- Complete or important data only? Although it seems attractive to have a 'complete' backup of operating computers so that they could be restored seamlessly if they encountered any sort of problem, this is not a very practical approach. The amount of data to be backed up is huge (involving a lot of duplication of system files, although a clever backup system such as BackupPC can avoid this overhead), and in any case in many 'wipeout' situations there will be no alternative to using a new installation. My conclusion was that for Windows computers in normal use you might, for example, only need to back up the 'Documents' folder (and subfolders), 'Desktop', perhaps email files, and some other specific folders for specific applications. At present for 11 computers we are using 80GB of backup space.
- Backup encryption. Some backup solutions offer encryption of the backup storage, so that if it is stolen it cannot be read by a third party. More complete protection is also possible if even the administrator of the backup server cannot read the backed-up files.
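To make the storage side of the '4D' idea above more concrete, here is a minimal Python sketch of deduplication. It is an illustration only, not how any of the products on this page work internally: the class name DedupStore, the fixed 4-byte chunk size and the use of SHA-256 are all assumptions chosen to keep the example short.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once.

    Hypothetical illustration; real tools use far larger (and often
    variable-sized) chunks.
    """

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # tiny, so the example is easy to follow
        self.chunks = {}              # SHA-256 hex digest -> chunk bytes
        self.versions = {}            # (filename, timestamp) -> list of digests

    def backup(self, name, timestamp, data):
        """Record one version of a file; return the number of bytes newly stored."""
        new_bytes = 0
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only chunks never seen before cost any storage
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            digests.append(digest)
        self.versions[(name, timestamp)] = digests
        return new_bytes

    def restore(self, name, timestamp):
        """Rebuild a stored version by joining its chunks in order."""
        return b"".join(self.chunks[d] for d in self.versions[(name, timestamp)])


store = DedupStore()
first = store.backup("report.txt", 1, b"AAAABBBBCCCC")   # everything is new
second = store.backup("report.txt", 2, b"AAAAXXXXCCCC")  # only the middle chunk changed
```

Both versions remain restorable, but the second backup only adds the one chunk that changed. Fixed-size chunks are the weak point of this sketch: inserting a single byte near the start of a file shifts every later chunk, which is why real tools tend to use rolling checksums or content-defined chunk boundaries instead.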
Which Software?
This is a list of backup software that wins my 4D Backup badge, free or with a free version, which I discovered as I searched for my optimal solution:
- Minarca. This is a self-hosted open source 4D Backup product from the maintainer of Rdiffweb, using the same backend tools as TimeDicer (i.e. rdiff-backup and Rdiffweb), fully optimised for transfer and storage. The product appears to be free to use, well-documented with a free mailing list and bugtracker; more in-depth support is available at a price. [June 2022]
- TimeDicer. This is a free 4D Backup product. I have to declare an interest because I wrote it! It is the result of the researches recorded here and uses rdiff-backup with dedicated primary and secondary backup servers to give easy and high-level backup storage for Windows computers, with encryption and onsite/offsite options. It does not use cloud storage - data stays on your own machines, which gives comfort and makes data security compliance (GDPR etc) more straightforward. A TimeDicer Server can also be used for backup from Linux or from any other OS that can run rdiff-backup.
- tarsnap. This is a commercial 4D Backup product (author Colin Percival) which claims to be fully optimised for transfer and storage. Data is stored in the cloud (on Amazon AWS apparently) in encrypted form and the keys are known (apparently) only to the user. The charging model is by prepayment and you are billed based on storage and bandwidth used; however, it claims that because of deduplication the amount of storage space is much less than you might expect. It is designed only for UNIX-like operating systems (BSD, Linux, MacOS X etc), although it might be usable for data under Windows via Cygwin or 'Bash on Ubuntu on Windows'. An old thread discussing tarsnap can be seen here.
- Ahsay Backup. This is a commercial 4D Backup product (except AhsayACB) which is (I think) fully optimised for transfer and storage. It appears to work with 'forward diffs' i.e. it stores an original copy of a file and then stores incremental or differential files which can be combined with the original to create a later version of the same file. The backup server conducts regular checks to ensure consistency of the data. They offer much more powerful options (including replication servers) and are apparently the backbone for remote backup services offered by thousands of providers around the world - they even provide a list of them! They state that all files are first zipped and encrypted with the operating computer's defined encrypting key before being sent to Ahsay backup server, so they are safe even from the prying eyes of the backup server administrator.
- Box Backup is a 4D Backup open source automatic on-line backup system primarily for Unix platforms (including Linux) but with a Windows client. All backed up data is stored on the server in files on a filesystem, all data is encrypted and the administrator of the server has no access to the saved data. A backup daemon copies encrypted data to the server when it notices changes. Only changes within files are sent to the server, just like rsync, and old versions of files on the server are stored as changes from the current version (reverse diffs), so old versions and deleted files are available. Backup behaviour can be optimised for document or server backup, and it is designed to be easy and cheap to run a server with a portable implementation, and optional RAID implemented in userland for reliability without complex server setup or expensive hardware. There is a separate project for a GUI front-end to Box Backup called Boxi. I admit I only discovered this solution long after fixing on rdiff-backup (below). [Does Box Backup use rdiff-backup or librsync under the hood? I don't know.]
- rdiff-backup. "rdiff-backup backs up one directory to another, including between machines over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync. Thus you can use rdiff-backup and ssh to securely back a hard drive up to a remote location, and only the differences will be transmitted. Finally, rdiff-backup is easy to use and settings have sensical defaults." This is a fully-qualified 4D Backup solution; however it is a command-line utility and there is no elegant GUI. [Note: a new arrival on the scene [October 2010] JBackPack offers a GUI for rdiff-backup.] Rdiff-backup is now [2010] stable at version 1.2.8 but not under active development/improvement, and is available for Windows. It does not directly offer encryption of the backup data, though this can be achieved by workarounds with some limitations.
- duplicity. This free Linux utility is similar to rdiff-backup but backs up to encrypted tar-format volumes. Duplicity grew out of rdiff-backup; however, rdiff-backup's archives are meant to be as easy to view as possible, while duplicity's are as hard to view as possible and can be encrypted with GnuPG. Duplicity saves data in the more conventional full + forward delta format instead of rdiff-backup's mirror + reverse deltas. rdiff-backup requires another copy of rdiff-backup on the remote destination, while duplicity can access remote locations with scp or ftp (other backends may be supported later). Its primary advantage is the encryption of the archives, and there is also a space saving from the compressed volumes (whereas rdiff-backup stores the most recent copy of each file uncompressed). Duplicity wins a 4D Backup badge, but it is not (at the time of writing) available for Windows except by using Cygwin or, perhaps, 'Bash on Ubuntu on Windows'. Duplicity is currently (2017) actively maintained. The use of forward delta format means that you cannot delete very old backups without breaking the chain, and that recovering a recent backup depends upon having a complete and perfect set of backups from the original until the recent date. To reduce this dependency and the associated risk many duplicity users carry out new full backups every month or so, but this means you lose all your older backups (or, if you retain them and create a parallel new archive, you lose most of the advantage of the delta storage).
Here are some other good backup packages which do not however count as 4D Backup solutions:
- Sync.com is rather like tarsnap (above) but I cannot verify that it offers the same data transfer and storage efficiency and so it does not (on my definition) qualify as a 4D Backup solution. On the plus side, they offer 5GB free storage and desktop client programs - including for Windows. It's possible - just my theory - that it is based on tarsnap code?
- Duplicati allows you to back up files and folders with strong AES-256 encryption, save space with incremental backups and data deduplication, and run backups on any machine through the web-based interface or via command line interface. It has a built-in scheduler and auto-updater. Duplicati works with standard protocols like FTP, SSH, WebDAV as well as popular services like Microsoft OneDrive, Amazon Cloud Drive & S3, Google Drive, box.com, Mega, hubiC and many others. And it is free and open source.
- Bacula (Wikipedia page here) is a set of open source programs that manage backup, recovery, and verification of computer data across a network of computers of different kinds. There are some 5 components to the system and you are warned off if you are not a Unix expert; however a client is available for Windows. It offers comprehensive job control, stores lists of backed up files in a database for faster retrieval, and can back up to disk, DVD, and tape. It supports encrypted transfer and encrypted storage. Looks a bit intimidating to me, but certainly has a lot of features. It does not (at the time of writing) use diffs (deltas) for file storage or transfer and so it does not (on my definition) qualify as a 4D Backup solution.
- rsync. This free Linux utility is widely used for backup, including remote backup. But although rsync is very efficient for transferring data because it uses a 'delta' system to work out the differences in files and then sends only these, it does not get involved in storing files - so if you want the chance to recover multiple earlier versions of a file that has changed over time you need to keep multiple copies of the same file, which eats disk space on the backup server. Mike Rubel has a page of impressive scripts and this led me on to:
- rsnapshot. Nathan Rosenquist's free utility is also based on rsync but is more like a polished application, and well documented. It is basically a 'pull' solution though. Some information about making it work with Windows can be found here. It has also been adapted to work with rdiff-backup instead of rsync - see here.
- DeltaCopy. This free Windows software is based around a port of rsync, with a GUI. You install a DeltaCopy Server on the Windows backup server and DeltaCopy clients on the Windows operating computers, but DeltaCopy clients can also connect to a Linux rsync-based backup computer. It is a 'push' solution - the operating computers initiate and control the backup process.
- BackupPC. As recommended on the rdiff-backup 'related' page, this is a well-considered package essentially aimed at intranet backup of Windows and Linux computers on a network. Where it scores is ease of use and its ability to recognise multiple copies of the same file - coming from different computers - and avoid storage duplication. If you are backing up system files as well as user files this can make a huge difference to the size of the backup. It can use rsync for transfer.
- How encrypted cloud storage works. A web page about the concept of 'zero-knowledge' cloud data storage, where the data is encrypted at your end before you upload it so (hopefully) neither the cloud service provider nor anyone else can read it (e.g. tarsnap - see above). There is some further information about cloud storage here.
rdiff-backup: Notes on mirror + reverse incremental backup approach
rdiff-backup uses reverse incremental diffs: each time that a backup of a changed file is performed the current version of the file is retained 'clean' on the backup server (i.e. a 'mirror' of the original), a diff file is created (in a subdirectory) to allow for retrospective migration to the previous saved copy, and then the previous saved clean copy is deleted. As the file changes and is backed up over time these diff files accumulate. It is a reverse diff because it is used to go back in time to an earlier version of the file; it is incremental because in order to get a given version of the file you apply each diff file in reverse chronological sequence.
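The mirror + reverse diff bookkeeping described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: it works on lists of text lines and uses the standard difflib module to build deltas, as a stand-in for the binary deltas a real tool would use, and the names make_delta, apply_delta and ReverseIncrementalArchive are invented for the illustration.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Describe how to rebuild `target` from `source` (both lists of lines).

    Unchanged runs are stored as references into `source`; only lines that
    differ are stored in full.
    """
    delta = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("data", target[j1:j2]))
        # "delete": lines found only in `source` are simply not copied
    return delta


def apply_delta(source, delta):
    """Rebuild the target version from `source` plus a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(source[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class ReverseIncrementalArchive:
    """Keeps the newest version in full, plus a chain of reverse diffs."""

    def __init__(self):
        self.mirror = None       # current version, stored 'clean'
        self.reverse_diffs = []  # oldest first; each turns a version into its predecessor

    def backup(self, lines):
        if self.mirror is not None:
            # Store how to get from the NEW version back to the old one,
            # then let the new version replace the old mirror.
            self.reverse_diffs.append(make_delta(lines, self.mirror))
        self.mirror = list(lines)

    def restore(self, steps_back):
        if not 0 <= steps_back <= len(self.reverse_diffs):
            raise ValueError("no such version in the archive")
        version = self.mirror
        first = len(self.reverse_diffs) - steps_back
        # Apply the newest diff first, walking backwards in time
        for delta in reversed(self.reverse_diffs[first:]):
            version = apply_delta(version, delta)
        return version


v1 = ["alpha", "beta", "gamma"]
v2 = ["alpha", "beta2", "gamma"]
v3 = ["alpha", "beta2", "gamma", "delta"]

archive = ReverseIncrementalArchive()
for version in (v1, v2, v3):
    archive.backup(version)
```

Here archive.restore(0) returns the mirror directly, with no diffs involved, while archive.restore(2) has to apply both stored diffs, newest first, to get back to v1. That asymmetry is the point of the design: the version you are most likely to want is also the cheapest and safest to recover.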
By contrast, a 'differential' rather than 'incremental' backup strategy allows recovery to any file version with only 2 sources - the 'original' and the 'differential diff' file for that version. This approach will lead to increasingly large and repetitious diff files as the current files diverge from an original, and indeed it is not a practicable solution for reverse diffs (but see below regarding LVM snapshots).
The processing of the diff files is handled by rdiff-backup 'under the hood' - the user does not have to understand what is going on. If a diff file is missing or corrupted for a given day then because of the incremental nature of the backups any versions of the original file for that day or for any earlier day cannot be recovered. Although this presents obvious dangers, it is at least for most purposes a safer approach than forward incremental diffs, where the corruption of one older diff will mean that more recent versions of the file cannot be recovered.
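The difference in failure behaviour between the two approaches can be checked with a small Python model. It is a sketch under simplifying assumptions: versions are numbered from 0 (oldest), exactly one full copy is kept, and the function names are made up for this example.

```python
def recoverable_reverse(num_versions, damaged):
    """Versions recoverable from a mirror + reverse incremental archive.

    The newest version is held in full; reverse diff k turns version k+1
    back into version k. `damaged` is the set of unusable diff numbers.
    """
    newest = num_versions - 1
    return [v for v in range(num_versions)
            if all(k not in damaged for k in range(v, newest))]


def recoverable_forward(num_versions, damaged):
    """Versions recoverable from a full + forward incremental archive.

    The oldest version is held in full; forward diff k turns version k
    into version k+1.
    """
    return [v for v in range(num_versions)
            if all(k not in damaged for k in range(0, v))]


versions = 5
damaged = {1}  # one corrupted diff, fairly early in the history

reverse_ok = recoverable_reverse(versions, damaged)
forward_ok = recoverable_forward(versions, damaged)
```

With the same single damaged diff, the reverse scheme loses only the two oldest versions (reverse_ok is [2, 3, 4]), whereas the forward scheme loses everything after the damage, including the most recent backup (forward_ok is [0, 1]).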
LVM
Linux's LVM2 'logical volume manager' makes 'logical volumes' (which can then be formatted to provide usable storage space) flexible and independent of the physical storage medium. Here is information about why LVM is good, although it omits to mention the availability in LVM of snapshots, which are particularly useful for making backups. [An LVM snapshot could be considered a local reverse differential backup of the corresponding logical volume, but it cannot be used tout court as a long-term backup solution, only as a way of getting a temporarily-frozen fileset from which a backup can proceed, because it can grow rapidly and ultimately run out of disk space, while it may also impact the performance of the filesystem.]
LVM has the ability to 'revert' to a snapshot: you can take an LVM snapshot, make changes to the data from which the snapshot was taken, and then revert to the snapshot using lvconvert --merge, discarding all the changes. More information can be found by googling, or here.
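As a rough guide to the commands involved, the following Python sketch builds (but deliberately does not run) the LVM command lines for creating a snapshot, reverting to it, or discarding it. The volume group vg0, logical volume data, snapshot name data_snap and the 1G snapshot size are placeholders, and the helper function is hypothetical; check the lvcreate, lvconvert and lvremove man pages for your system before running anything as root.

```python
import shlex


def snapshot_commands(vg, lv, snap_name, snap_size):
    """Return the LVM command lines for one snapshot lifecycle (nothing is executed)."""
    origin = f"/dev/{vg}/{lv}"
    snap = f"{vg}/{snap_name}"
    return {
        # Freeze the current state of the origin volume
        "create": ["lvcreate", "--snapshot", "--size", snap_size,
                   "--name", snap_name, origin],
        # Roll the origin back to the snapshot, discarding later changes
        "revert": ["lvconvert", "--merge", snap],
        # Or keep the changes and just drop the snapshot
        "discard": ["lvremove", "--yes", snap],
    }


cmds = snapshot_commands("vg0", "data", "data_snap", "1G")
for step, argv in cmds.items():
    print(f"{step}: {shlex.join(argv)}")
```

For a backup you would normally use 'create', back up from the mounted snapshot, then 'discard'; 'revert' is for undoing changes. Note that if the origin volume is in use, the merge is typically deferred until the volume is next activated.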
Useful Links
- TimeDicer - how to set up Linux primary and secondary backup servers, and a Windows script for automating backups using rdiff-backup
- Rdiff-backup
- Differential Backup in Windows with Delta Files - Using 7-Zip and RDiff
- BackupNinja - centralized backup for Linux machines via configuration files (can use rdiff-backup and other backup systems)
- Vshadow - how to use the Windows volume shadow (= snapshot) service with Windows XP to copy files while they are in use
Donation
I have provided this software free gratis and for nothing. If you would like to thank me with a contribution, please let me know and I will send you a link. Thank you!
My Other Sites
- TimeDicer - Onsite/offsite data backup for Windows (uses rdiff-backup)
- Web Scraping How To - extracting data from web sites
My Programs
Here is a selection of some (other) programs I have written, most of which run under GNU/Linux from the command line (CLI), are freely available and can be obtained by clicking on the links. Dependencies are shown. Most were written and tested on an x86-based Linux server, but they should run on a Raspberry Pi, and many can run under Windows using Windows Subsystem for Linux (WSL) or Cygwin. Email me if you have problems or questions, or if you think I could help with a programming requirement.
Backup Utilities
- TimeDicer - Onsite/offsite data backup for Windows (uses rdiff-backup) [ GNU/Linux & MS Windows©: 2008-20 ]
- rdiff-backup-regress - GNU/Linux script to regress an rdiff-backup archive. [ GNU/Linux: 2012-24 ]
Debian/Ubuntu kernel and LVM Utilities
- kernel-remove - GNU/Linux script which lists the installed GNU/Linux kernels in a Debian-based distro (e.g. Ubuntu), and can be used to remove an unwanted kernel and related packages, updating grub appropriately. [ GNU/Linux-Debian/Ubuntu: 2010-24 ]
- lvm-usage - GNU/Linux script to show available disk space and how it is used; run as cron job to warn if usage is above a set percentage. Provides additional information if LVM is in use. [ GNU/Linux: 2012-24 ]
- lvm-delete-snapshot - GNU/Linux script to remove LVM snapshot that has been left over by another process. [ GNU/Linux: 2012-21 ]
- netnames - GNU/Linux script shows current name, biosdevname and 'predictable name' of network device - helps with network device name scheme migration. [ GNU/Linux-Debian/Ubuntu: 2020-20 ]
- lv-convert2cache - GNU/Linux script to convert an existing LV into a cache LV using a smaller faster device as a cache. [ GNU/Linux: 2022-24 ]
Miscellaneous Programs
- sleepwalker - Windows© program which can be run from a remote machine to 'wake up' a Windows© machine behind a router, wait for it to start and then initiate Remote Desktop session. [MS Windows©: 2008-22]
- numliststat - GNU/Linux program giving statistical value(s) for a piped-in list of numbers. [ GNU/Linux: 2022-24 ]
- relay-enforcer - GNU/Linux program enabling a postfix-based mail server relaying to Gmail to act on reports from Gmail about blocked emails. [ GNU/Linux: 2016-24 ]
- pdf-compress - GNU/Linux program to create smaller b/w pdf file from an original large pdf file, especially when original resulted from scanning. [ GNU/Linux: 2016-24 ]
- tiny-device-monitor - GNU/Linux program to test webpages (including password-protected) or machines to check they are live; use as a cron job for your own websites, for hardware presenting a webpage, or for any machines with a presence on your local LAN or on the internet. [ GNU/Linux: 2009-24 ]
- form-extractor - GNU/Linux program to extract form tags from a web page or downloaded file. [ GNU/Linux: 2012-24 ]
- mythic-dns-sync - GNU/Linux program to update DNS record at mythic-beasts.com to match local external ip. [ GNU/Linux: 2016-23 ]
- saynoto0870 - For UK, a GNU/Linux script which performs automated lookup of the www.saynoto0870.com database, finding cheap or free geographic number replacements for expensive non-geographic (087* or 084*) numbers. [ GNU/Linux: 2012-12 ]
- bind9-resolved-switch - GNU/Linux program for switching permanently between using bind9 or systemd-resolved as the system DNS resolver. [ GNU/Linux: 2016-24 ]
- unlock - GNU/Linux remote program for easy entering of decrypt passphrase on a remote machine which has root dm-crypt+LUKS. [ GNU/Linux: 2017-18 ]
- wifi-updown - GNU/Linux program to take down wifi interface if there is a working wired interface (or restore wifi if not). [ GNU/Linux: 2018-23 ]
- routefix - GNU/Linux program to restore a default ip traffic route if there is none such (e.g. after running wifi-updown). [ GNU/Linux: 2018-23 ]
- dutree - GNU/Linux program to show a tree-style list of files and directories at the specified location which are greater than the specified size (default 1GB). [ GNU/Linux: 2012-24 ]
- Accounts - Multi-business multi-currency accounting software, uses Access [MS Windows©: 1996-2024]
- Rents Program - Residential lettings/landlord front office program, with many special features for UK market [MS Windows©: 1991-2024]