Files getting lost or corrupted? A most heinous challenge, dude! So when strange things are afoot at the Linux workstation, we totally hit our backups. We need to get started… but what commands should we use? We can use:
1. tar to make daily backups of all files,
2. or combine find and tar to back up changed files…
3. or use the magical rsync.
Some need more scripting to back up, and some need more scripting to restore. Let’s hop in the phone booth and zoom through our options…
Tar: Whole Directories
Excellent: The clear advantage to tarring whole directories is that each tarball is a complete snapshot, so files that were deleted before that point in time stay deleted when you restore from it. You can also do a wholesale recovery very quickly, just by expanding the tarball for that directory. Below is an example that uses lbzip2, a parallel bzip2 compressor, to speed up compression:
tar cf "$backup.tbz2" --use-compress-program lbzip2 .
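A minimal sketch of a dated whole-directory backup, assuming bash and GNU tar. It uses tar's -C switch so the archive is written outside the directory being backed up, with paths stored relative to that directory. The function name full_backup and its optional third argument (the compressor, defaulting to lbzip2) are hypothetical, not part of any tool:

```shell
# Hypothetical helper: full_backup SRC_DIR ARCHIVE [COMPRESSOR]
# Archives the contents of SRC_DIR (paths relative to SRC_DIR) into ARCHIVE,
# compressing with COMPRESSOR (default: lbzip2).
full_backup() {
    local src="$1" archive="$2" compressor="${3:-lbzip2}"
    # -C changes into SRC_DIR before archiving ".", so the archive itself
    # can live outside the tree being backed up.
    tar cf "$archive" --use-compress-program "$compressor" -C "$src" .
}

# Example: full_backup /home "/mnt/backups/home-$(date +%F).tbz2"
```

Passing gzip as the third argument is handy on machines without lbzip2 installed.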
Bogus: Many of your data files are pretty large: megabytes, even gigabytes. Backing one of those up every day is likely going to be costly, especially when it rarely changes.
Tar: Changed Files
Excellent: We immediately start saving space. We can even detect when directories have had no changes since the last backup and avoid making a misleading empty archive. You can detect when files disappear by saving a list of the files with find:
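find . -maxdepth 1 -type f | sort > .files      # top-level files, to spot deletions later
stamp=$(mktemp)                                 # note the time before scanning for changes
find . -type f -newer .recent > backup-list     # everything changed since the last backup
tar cf "$backup.tbz2" --use-compress-program lbzip2 -T backup-list
touch -r "$stamp" .recent && rm -f "$stamp"     # the next run compares against this time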
The .files file will be picked up in the backup-list file. And while we’re here, let’s make a shortcut function for our tar command, so we can save our righteous keystrokes:
function Tarup() { local archive="$1"; shift; tar cf "$archive" --use-compress-program lbzip2 "$@"; }
function Untar() { local archive="$1"; shift; tar xf "$archive" --use-compress-program lbzip2 "$@"; }
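The "avoid a misleading empty archive" idea can be sketched as a bash function, assuming GNU find and tar and a .recent timestamp file left by an earlier full backup. The name incremental_backup and its arguments are hypothetical. It skips the archive when nothing was modified and the top-level file list still matches .files, and it only advances .recent after tar succeeds:

```shell
# Hypothetical helper: incremental_backup ARCHIVE [COMPRESSOR]
# Run it from inside the directory being backed up. Assumes an earlier full
# backup created .recent. File names containing newlines are not handled.
incremental_backup() {
    local archive="$1" compressor="${2:-lbzip2}"
    local stamp listing changed status=0
    stamp=$(mktemp)     # taken *before* scanning, so later edits are caught next run
    listing=$(mktemp)
    changed=$(mktemp)

    # Top-level files right now (plus .files itself), to detect deletions.
    { find . -maxdepth 1 -type f ! -path ./.files; echo ./.files; } | sort > "$listing"
    # Everything modified since the last backup, ignoring the .files list.
    find . -type f -newer .recent ! -path ./.files > "$changed"

    # Nothing modified, added, or deleted: skip the empty archive.
    if [ ! -s "$changed" ] && [ -f .files ] && cmp -s "$listing" .files; then
        echo "No changes since last backup; skipping $archive"
    else
        cp "$listing" .files
        echo ./.files >> "$changed"
        if tar cf "$archive" --use-compress-program "$compressor" -T "$changed"; then
            touch -r "$stamp" .recent   # the next run compares against this moment
        else
            status=1
        fi
    fi
    rm -f "$stamp" "$listing" "$changed"
    return "$status"
}
```

Taking the timestamp before the scan means a file edited while tar runs gets archived again next time, which is the safe direction to err in.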
Bogus: Restoring from this kind of backup is more difficult. You first have to restore a full backup (a grandfather backup, yearly full, monthly full, or such). Then you have to apply each incremental archive in order, and at the end of that series, use your .files list to remove any files that were not present during the last backup.
cd /home
yesterday=2016-11-02
Untar /mnt/backups/yearly-home.tbz2
# Apply each daily archive in date order, stopping after yesterday's
for f in /mnt/backups/home-*.tbz2 ; do
    Untar "$f"
    [ "$f" = "/mnt/backups/home-$yesterday.tbz2" ] && break
done
# Files present now that were not listed in the last backup's .files
newfiles=$(mktemp)
find . -maxdepth 1 -type f | sort > "$newfiles"
comm -13 .files "$newfiles"
read -p "These files will be deleted, proceed? " ANS
if [ "$ANS" = y ] ; then
    comm -13 .files "$newfiles" | while IFS= read -r F; do rm -f -- "$F" ; done
fi
rm -f "$newfiles"
You will have to verify this process. File names with spaces and subdirectories might not work with this example as I have coded it. This is why you totally verify your backup and restore process!
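One way to verify a whole-directory tarball is to extract it into a scratch directory and compare it with the live tree. A minimal sketch, assuming bash, GNU tar, and diff, and an archive whose paths are relative to the backed-up directory (as when you run tar from inside it); verify_restore is a hypothetical name. For an incremental chain you would replay the whole restore sequence into the scratch directory before comparing:

```shell
# Hypothetical helper: verify_restore ARCHIVE SRC_DIR [COMPRESSOR]
# Extracts ARCHIVE into a scratch directory and compares it with SRC_DIR.
# Returns 0 when the trees match, non-zero otherwise.
verify_restore() {
    local archive="$1" src="$2" compressor="${3:-lbzip2}"
    local scratch status
    scratch=$(mktemp -d)
    if tar xf "$archive" --use-compress-program "$compressor" -C "$scratch"; then
        # diff -r reports files that differ or exist on only one side.
        diff -r "$src" "$scratch"
        status=$?
    else
        status=2
    fi
    rm -rf "$scratch"
    return "$status"
}
```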
Rsync and Daily Backups
Excellent: there are a lot of advantages to rsync:
- If files are modified but their mtime doesn’t change, they still get backed up when their size changes, and with the -c (--checksum) switch, even when the size stays the same.
- For simple backups you typically need little to no scripting.
- It is most excellent with ssh!
- It comes with a clever exclude syntax.
- With the --delete switch you can remove files from the backup that were deleted on your disk.
So, rsync is great if you want to mirror directories.
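A minimal mirroring sketch, assuming rsync is installed. The function name mirror, the .cache/ exclude, and the paths are hypothetical; the destination could also be an ssh target such as user@backuphost:/mnt/backups/home/. The trailing slash on the source tells rsync to copy the directory's contents rather than the directory itself:

```shell
# Hypothetical helper: mirror SRC_DIR DEST
# Makes DEST an exact copy of SRC_DIR, deleting files that vanished from
# SRC_DIR and skipping a (hypothetical) .cache directory.
mirror() {
    local src="$1" dest="$2"
    # -a: recurse and preserve permissions, times, symlinks, etc.
    # --delete: remove files from DEST that no longer exist in SRC_DIR.
    # "${src%/}/" guarantees exactly one trailing slash on the source.
    rsync -a --delete --exclude='.cache/' "${src%/}/" "$dest"
}
```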
Totally cool: rsync can also do hard links across directories! You can save radical space by pointing it at a previous backup directory on the backup file system, so each unchanged file is stored only once. You use the --link-dest switch:
rsync -a --delete --link-dest=/mnt/backups/home-$yesterday.d \
    /home /mnt/backups/home-$today.d
With yesterday=2016-11-06 and today=2016-11-07, this requires an existing home-2016-11-06.d directory and creates the home-2016-11-07.d directory. Files deleted on the seventh are still there in the sixth’s directory. Files that are the same are just hard links between the directories home-2016-11-06.d and home-2016-11-07.d. (Refer to Jeff Layton’s article about using rsync for incremental backups.)
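The daily rotation can be scripted along these lines. A sketch assuming bash, GNU date (for date -d yesterday), and rsync; snapshot and its arguments are hypothetical. Unlike the command above, it puts a trailing slash on the source, so each snapshot directory holds the contents of /home directly instead of a home/ subdirectory. It falls back to a plain copy when yesterday's snapshot is missing:

```shell
# Hypothetical helper: snapshot SRC_DIR BACKUP_ROOT [NAME]
# Creates BACKUP_ROOT/NAME-YYYY-MM-DD.d, hard-linking unchanged files to
# yesterday's snapshot when it exists. Assumes GNU date.
# BACKUP_ROOT should be absolute: rsync resolves a relative --link-dest
# against the destination directory.
snapshot() {
    local src="${1%/}" root="$2" name="${3:-home}"
    local today yesterday dest prev
    today=$(date +%F)
    yesterday=$(date -d yesterday +%F)
    dest="$root/$name-$today.d"
    prev="$root/$name-$yesterday.d"
    if [ -d "$prev" ]; then
        # Unchanged files become hard links into yesterday's snapshot.
        rsync -a --delete --link-dest="$prev" "$src/" "$dest"
    else
        # No snapshot from yesterday: fall back to a full copy.
        rsync -a --delete "$src/" "$dest"
    fi
}
```

Run it once a day, for example from cron, as snapshot /home /mnt/backups.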
Bogus: Rsync might not be excellent for your needs:
- No on-disk compression of backups (the -z switch only compresses data in transit)
- A point-in-time set of backups needs a new directory for each day, using the hard-link technique above. This requires scripting.
- Be careful of the -b option! It means backup… but it renames each preexisting file in the destination directory instead of replacing it. Give each daily run its own --suffix and, after two years of backups, a file that changed every day leaves you 730 copies:
.ssh/known_hosts.729
.ssh/known_hosts.728
.ssh/known_hosts.727
...
.ssh/known_hosts
- Millions of small files? Whoa, dude: rsync can slow down with very large sets of small files. You might need to run a few rsync commands in parallel.
- Limited memory? I’ve seen rsync take up hundreds of megabytes of memory when backing up hundreds of thousands of files (to store the list of file attributes for comparison). You might have to script your rsync runs so they walk the directory tree and only back up hundreds of files at a time.
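The "few rsync commands in parallel" idea, sketched for bash with GNU find and xargs: one rsync handles the top-level files, then one rsync per top-level subdirectory runs, several at a time. parallel_mirror and the default of four jobs are hypothetical choices. Splitting at the top level only is the simplest way to walk the tree; a single huge subdirectory would still need finer splitting:

```shell
# Hypothetical helper: parallel_mirror SRC_DIR DEST_DIR [JOBS]
# One rsync for the top-level files, then one rsync per top-level
# subdirectory, JOBS at a time, so each rsync keeps a smaller file list.
# Limitation: top-level directories deleted from SRC_DIR are not removed
# from DEST_DIR.
parallel_mirror() {
    local src="${1%/}" dest="${2%/}" jobs="${3:-4}"
    mkdir -p "$dest" || return 1
    # Top-level entries that are not directories ('*/' excludes directories,
    # which also protects them from --delete on the receiving side).
    rsync -a --delete --exclude='*/' "$src/" "$dest/" || return 1
    # Each top-level subdirectory in its own rsync, up to $jobs in parallel.
    find "$src" -mindepth 1 -maxdepth 1 -type d -print0 \
        | xargs -0 -P "$jobs" -I{} rsync -a --delete {} "$dest/"
}
```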
Remember to back up! Stay Excellent!