Backup Information

Differential Backup in Windows with Delta Files - Using 7-Zip and RDiff

Date last modified: Tue Nov 19 2024 8:29 AM

Introduction

This page was written in 2007, when I first began researching backups. Using 7-Zip and Rdiff in the way outlined here can work, but it is a laborious way of backing up files. A page I wrote later about backups can be seen here, and my eventual solution, now in production use, is TimeDicer.

A file-by-file differential backup strategy is one in which you take an original (full) backup and thereafter back up (say, daily) only the files that have changed since that original. By taking the original backup and applying the 'differential' backup for any particular day you can completely restore the data source to its state at the time of that differential backup. Differential backups are likely to be much smaller than the original backup, so you can keep one original and many differential backups. Igor Pavlov's 7-Zip offers free, highly efficient and powerful archiving, including (once you understand them) options for creating differential backups.

Differential backups are likely to get bigger over time because the number of changes from the original backup will grow. An alternative is to use 'incremental' backups, which save only those files that have changed since the last backup (original or incremental). The problem with this approach is that restoring a complete data set as at a given point in time requires you to apply every incremental backup up to that point, in sequence. That is complicated and, if you have lost even one incremental backup, fatal. So in most situations I think a differential backup strategy is to be preferred.
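The difference between the two strategies comes down to which reference point a file's change is measured against. The Python sketch below illustrates this, assuming that a file's modification time is a good enough test of "changed" (7-Zip and other tools may decide differently); the function names are my own and purely illustrative.

```python
from pathlib import Path


def changed_since(root, reference_time):
    """Return relative paths of files under root modified after reference_time."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime > reference_time
    )


def differential_set(root, full_backup_time):
    # Differential: always compare against the time of the ORIGINAL backup,
    # so the set can only grow until a new original backup is taken.
    return changed_since(root, full_backup_time)


def incremental_set(root, last_backup_time):
    # Incremental: compare against the time of the most recent backup,
    # so each set is small but all of them are needed for a restore.
    return changed_since(root, last_backup_time)
```

A restore from a differential backup needs only the original plus one differential set; a restore from incremental backups needs the original plus every set produced since.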

But there can be another problem with file-by-file differential backups. Some files, such as those for databases or emails (e.g. *.mdb, *.pst), can be extremely large and yet change daily. In fact most of the data in these files does not change: 2000 emails may be stored in an email file (which may appear as a folder within an email program such as Outlook, Outlook Express or Thunderbird), but only 10 new emails may arrive each day. Nevertheless, a file-by-file differential backup may still be very large, because each of these large files has to be backed up in full every time.

A better approach for these large files is to use what is called 'delta' technology, which looks at the changes inside files. Some proprietary solutions (but by no means all) offer this technology, but it is also available free through a utility called 'RDiff', which is built on the same delta algorithm used in Linux by programs such as rsync, rdiff-backup and duplicity.
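To give a feel for how a delta can be made from a signature alone, here is a deliberately simplified Python sketch. It is not rdiff's file format or its actual implementation: it assumes a tiny block size, uses a plain SHA-256 hash at every byte position instead of the cheap rolling checksum that makes the real algorithm fast, and all names are hypothetical.

```python
import hashlib

BLOCK = 4  # unrealistically small block size, chosen so the demo is easy to follow


def _digest(chunk):
    return hashlib.sha256(chunk).hexdigest()


def make_signature(old, block=BLOCK):
    """Record a hash for every complete fixed-size block of the old data."""
    sig = {}
    for offset in range(0, len(old) - block + 1, block):
        sig.setdefault(_digest(old[offset:offset + block]), offset)
    return sig


def make_delta(sig, new, block=BLOCK):
    """Describe the new data as copies from the old file plus literal bytes."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        offset = sig.get(_digest(chunk)) if len(chunk) == block else None
        if offset is None:
            literal.append(new[pos])  # no known block starts here: keep one byte
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", offset, block))  # block already exists in old file
            pos += block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_patch(old, ops):
    """Rebuild the new data from the old data and the delta operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)
```

Note that make_delta never sees the old data, only its signature, which is why the old file can stay in the original backup while only the small delta is stored each day.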

Note that rsync uses this technology to transfer files, but backup utilities built around rsync do not necessarily use it when storing backups: rsnapshot, for instance, stores a complete copy of each changed file.

So we can use 7-Zip to create differential backups, but to keep the size of these backups down we should combine it with RDiff, which looks at large files and records only the changes within them. In this way differential backups can be a fraction of the size and still fully effective. At the heart of RDiff is the library librsync, which implements the rsync algorithm and allows the same delta technology that rsync uses for file transfer to be used for file storage. You do not need a separate librsync.dll file to use rdiff.exe on Windows - instead you need the (included) cygwin-1.dll.

How to

My Windows XP batch file delta.bat handles this. It requires the following free utilities: datetime.vbs, rdiff (Windows port), Filebinreplace.exe, SendMail.exe and Robocopy (in Windows Server 2003 Resource Kit Tools).

To rewrite: [When creating the original backup you first use rdiffrse.bat (Windows batch file). This creates copies of the large files in a new specified location (which should be within the location from which you will create the backup) and then creates signature files from these. Then create your original backup, ensuring that these copies of the large files are included in the backup (but you do not need to include the same files in the original location, and it is better if you don't because these files might have changed in the meantime and so the signatures might not match them).

With the original backup safe you can delete the copies in the specified location if you like (as they may be taking up a lot of disk space).

When you need to do a differential backup you first use rdiffrde.bat. This uses the signature files (originally created with rdiffrse.bat) and the existing large files and creates delta (*.rde) files. These rde files must be included in the differential backup which follows - but you do not need to back up the existing large files, which should be excluded.]
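The delta step described above might be sketched in Python as follows. This is not the content of rdiffrde.bat, which is not reproduced on this page; it is a hypothetical planner that assumes signature files are named by appending .rse to the original file name, that rdiff is on the PATH, and that the three folders are passed in by the caller.

```python
from pathlib import Path


def plan_deltas(live_dir, sig_dir, out_dir, rdiff="rdiff"):
    """Build one 'rdiff delta' command per signature file found in sig_dir."""
    live_dir, sig_dir, out_dir = Path(live_dir), Path(sig_dir), Path(out_dir)
    commands = []
    for sig in sorted(sig_dir.glob("*.rse")):
        name = sig.name[: -len(".rse")]  # "mail.pst.rse" -> "mail.pst"
        current = live_dir / name        # today's version of the large file
        if not current.is_file():
            continue                     # file has gone; nothing to compare
        delta = out_dir / (name + ".rde")
        commands.append([rdiff, "delta", str(sig), str(current), str(delta)])
    return commands
```

Each returned list can be passed to subprocess.run; the resulting .rde files are what go into the differential backup in place of the large files themselves.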

Background Information

Using Rdiff

Rdiff is a (free) Linux utility which was ported to Windows by Attila Zimler. You can download it here - this includes cygwin-1.dll in the zip file.

Rdiff can recreate a changed file if you have the original file and a 'delta' file of the differences. To create the 'delta' file you first need to have created a 'signature' file of the original - and you must keep the original file:

  1. use the original file to create a 'signature' file (my suggestion: .rse extension):
    RDIFF SIGNATURE [original-file] [signature-file].rse
  2. use the changed file and the (above) signature file to create a 'delta' file (my suggestion: .rde extension)
    RDIFF DELTA [signature-file].rse [changed-file] [delta-file].rde
  3. keep the original file and the delta file safe (backed up). To recreate the changed file:
    RDIFF PATCH [original-file] [delta-file].rde [new copy of changed-file]
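The three steps can also be driven from a script. The Python sketch below builds the same three command lines and, optionally, runs them as a round-trip check; it assumes an rdiff executable is reachable under the name given, and the helper names are illustrative only.

```python
import filecmp
import subprocess


def signature_cmd(original, signature, rdiff="rdiff"):
    # Step 1: original file -> signature file
    return [rdiff, "signature", original, signature]


def delta_cmd(signature, changed, delta, rdiff="rdiff"):
    # Step 2: signature file + changed file -> delta file
    return [rdiff, "delta", signature, changed, delta]


def patch_cmd(original, delta, restored, rdiff="rdiff"):
    # Step 3: original file + delta file -> new copy of the changed file
    return [rdiff, "patch", original, delta, restored]


def round_trip_ok(original, changed, signature, delta, restored, rdiff="rdiff"):
    """Run all three steps, then check the restored file equals the changed file."""
    for cmd in (
        signature_cmd(original, signature, rdiff),
        delta_cmd(signature, changed, delta, rdiff),
        patch_cmd(original, delta, restored, rdiff),
    ):
        subprocess.run(cmd, check=True)  # raises if rdiff exits non-zero
    return filecmp.cmp(changed, restored, shallow=False)
```

Running a round trip like this on a sample file before trusting the deltas in a real backup seems a sensible precaution.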

rdiff does not accept wildcards (or at least the Windows port of it does not). You can overcome this in a Windows batch file as in this example, which creates signature files for all *.mdb files located in "i:\Our Documents\mdbs\":

for /f "usebackq delims=" %%A in (`dir "i:\Our Documents\mdbs\*.mdb" /b`) do c:\utils\rdiff signature "i:\Our Documents\mdbs\%%A" "i:\Our Documents\mdbs\%%A.rse"
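For comparison, the same wildcard expansion can be done in Python. This is a rough equivalent rather than a drop-in replacement: it assumes the rdiff location used in the batch line above, and it only builds the commands so they can be inspected before anything is run.

```python
from pathlib import Path


def signature_commands(folder, pattern="*.mdb", rdiff=r"c:\utils\rdiff"):
    """Build one 'rdiff signature' command for each file matching pattern."""
    folder = Path(folder)
    return [
        [rdiff, "signature", str(f), str(f) + ".rse"]  # "x.mdb" -> "x.mdb.rse"
        for f in sorted(folder.glob(pattern))
        if f.is_file()
    ]
```

Each command in the returned list can be passed to subprocess.run(cmd, check=True) to create the signature files.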

rdiff --help:

Usage:

Options:

Delta-encoding options:

IO options:

from http://linux.die.net/man/1/rdiff:

rdiff(1) - Linux man page

Name: rdiff - compute and apply signature-based file differences

Synopsis:

rdiff [options] signature old-file signature-file
rdiff [options] delta signature-file new-file delta-file
rdiff [options] patch basis-file delta-file new-file

Description: In every case where a filename must be specified, - may be used instead to mean either standard input or standard output as appropriate.

Return Value: 0 for successful completion, 1 for environmental problems (file not found, invalid options, IO error, etc), 2 for a corrupt file and 3 for an internal error or unhandled situation in librsync or rdiff.
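A script that calls rdiff can turn these return values into readable messages. A minimal Python sketch, using only the four codes listed above (the names are my own):

```python
RDIFF_EXIT_CODES = {
    0: "successful completion",
    1: "environmental problem (file not found, invalid options, IO error, etc)",
    2: "corrupt file",
    3: "internal error or unhandled situation in librsync or rdiff",
}


def describe_exit(code):
    """Translate an rdiff return value into the wording of the man page."""
    return RDIFF_EXIT_CODES.get(code, "undocumented return value %d" % code)
```

For example, after result = subprocess.run(cmd), describe_exit(result.returncode) gives a message suitable for a log file or an alert email.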

See Also: librsync(3)

Author: Martin Pool. The original rsync algorithm was discovered by Andrew Tridgell. Rdiff development has been supported by Linuxcare, Inc and VA Linux Systems.

Links

See my other pages: