
Backup Strategies

With my primary hard drive (a three-year-old WD Raptor WD740) having been on life support, so to speak, for the last three months, I’ve been a lot more diligent about keeping backup copies of my data. Every couple of days, I log out entirely and run a simple rsync script to copy my entire /home directory to a dedicated partition on my secondary disk, which I keep mounted at /mnt/backup for simplicity’s sake.

While its parameter handling can be a bit quirky, I find rsync extremely useful for two reasons. The first more or less negates the quirky parameter handling: clear and thorough documentation, with lots of example program calls. The second is that it saves me a lot of time in copying the files. Similar to the DeltaRPM feature I raved about with Fedora 11, it copies over only the changed content instead of the entire directory tree. With my home directory at nearly 20 GB, incrementally updating my backup like this prevents a good 90+% of the data from needing to be copied again.
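
A minimal version of such a script (call it backup-home.sh; the exact flags are a matter of taste, but -a plus --delete gives a faithful mirror):

    #!/bin/bash
    # backup-home.sh: mirror /home onto the backup partition at /mnt/backup.
    # -a        archive mode: recursive, preserving permissions, ownership, and timestamps
    # --delete  remove files from the mirror that no longer exist in /home
    rsync -a --delete /home/ /mnt/backup/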

In this way, I know that I have at least two copies of my data at any given time. A major plus to copying the directory tree as-is is that, once the drive does die and I replace it, I merely need to copy the tree back over, without changing anything, unpacking huge tarballs, applying diffs, and so on.

The disadvantage to this is that I only have one consistent backup copy of my data at any given time, and that backup sits on a hard drive in the same computer. So, should there be a massive system failure of some sort that takes out both drives (knock on wood!), I would lose my data entirely. I also intend to purchase CD-RWs for this purpose – that is, as an additional backup medium – in the near future. But for right now, the second on-disk copy suffices. I also want to set up a RAID array in my next computer build…but that’ll have to wait. 🙂

So this simple rsync method, like any storage decision, has its benefits and drawbacks:
Pros:

  • Easy to configure;
  • Can be automatically run (e.g., in a cron job; see the sample crontab entry after these lists);
  • Updates occur via content deltas, not full copies;
  • Backup data is “as-is”, and can be used immediately after copying.

Cons:

  • Only one backup copy;
  • Physical proximity to original data;
  • Requires space for an entire duplicate of the directory tree.
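
To expand on the cron point above: an entry like the following (the script path is hypothetical) would run the backup every other night at 2:30, though for now I still prefer to log out and kick it off by hand:

    # minute  hour  day-of-month  month  day-of-week  command
    30  2  */2  *  *  /home/peter/bin/backup-home.sh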

For me, though, this method works out well. Do others have a similar system? Would you suggest any improvements/simplifications? I’d like to hear your thoughts on the matter! Thanks.

  1. June 20th, 2009 at 22:01 | #1

    You might be interested in something like duplicity. It allows you to encrypt your backups, and allows you to store incremental backups, which would let you restore things from different points of time.
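
    In its simplest form (from memory, so double-check the man page) an encrypted backup to the local backup partition, and a restore of its state from three days ago, look something like:

    duplicity /home file:///mnt/backup/duplicity
    duplicity -t 3D file:///mnt/backup/duplicity /tmp/home-restore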

    • June 20th, 2009 at 23:23 | #2

      That might be a good idea. A simple mirroring scheme like the one I described doesn’t help when the backup faithfully mirrors the fact that you accidentally deleted a very important file, or something along those lines. I will look into it. Thanks! 🙂

  2. Andy Burns
    June 20th, 2009 at 22:03 | #3

    rsync has to read the entire source copy and the entire destination copy to decide which differences to transfer. Normally this is a win, since the reads (independent at each end, to calculate the chunk checksums) are fast compared to the copy itself (usually over a slow remote link).

    But in your case, the copy runs at a similar SATA speed to the reads, and both reads happen on the same machine, competing with each other for I/O bandwidth and for CPU time to compute the checksums.

    So rsync *might* not be the fastest in your case …

    • June 20th, 2009 at 23:22 | #4

      Well, each drive is on its own dedicated channel (one of the inherent benefits of SATA), and I’ve got a dual-core machine with plenty of RAM, so I’m not worried about it being too slow. Thanks for the comment though…hadn’t really given that much thought. 🙂

  3. Lucas
    June 21st, 2009 at 00:46 | #5

    One way to have multiple/incremental backups with rsync:

    First, move your existing backup off the root of the backup partition (say, for example, to /mnt/backup/latest).

    Then, whenever you want to take a new backup, first create a hard-link farm of the latest backup:

    cp -al /mnt/backup/latest /mnt/backup/tmp

    Rename latest to something else and tmp to latest, then rsync as normal to /mnt/backup/latest. Because hard links are used, only files that have changed since the last backup are written, so it’s very cheap in terms of disk space. You may have to watch your inode usage (df -i) after a while, though.
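
    Spelled out, the whole cycle looks roughly like this (the dated directory name is just an example):

    cp -al /mnt/backup/latest /mnt/backup/tmp        # hard-link farm; no file data is copied
    mv /mnt/backup/latest /mnt/backup/2009-06-21     # keep the old snapshot under its date
    mv /mnt/backup/tmp /mnt/backup/latest            # the link farm becomes the new "latest"
    rsync -a --delete /home/ /mnt/backup/latest/     # changed files get new inodes; unchanged ones stay shared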

    • June 21st, 2009 at 01:01 | #6

      That’s an interesting idea…I’d never thought of using hard links in that way. I’ll play with these some more over the coming days/week. Thanks for all the ideas! 🙂

  4. June 21st, 2009 at 03:11 | #7

    Peter,

    It sounds like you want the “rdiff-backup” package, which is already in Fedora – I used it at a previous workplace to back up their user and config data to fixed storage – easier than Bacula and much faster to restore than tape. There’s a web frontend out there too, making browsing points in time / restores much easier.
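
    Basic usage is roughly as follows (from memory, and the repository path is just an example):

    rdiff-backup /home /mnt/backup/rdiff                # current mirror plus reverse increments
    rdiff-backup --list-increments /mnt/backup/rdiff    # show the stored points in time
    rdiff-backup -r 3D /mnt/backup/rdiff /tmp/home-3d   # restore the tree as it was three days ago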

  5. Luca Botti
    June 22nd, 2009 at 00:00 | #8

    Hi,

    you should have a look at both rsnapshot and BackupPC. Both work with rsync, using hard links so that only changed files consume additional disk space. The latter is a “server calls client” solution, so if you have a server of some sort, I think it works better.
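
    For the flavour of it, a stripped-down rsnapshot.conf might look like this (directives from memory; fields must be tab-separated):

    snapshot_root   /mnt/backup/snapshots/
    interval        daily   7
    interval        weekly  4
    backup          /home/  localhost/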

  7. October 6th, 2009 at 11:52 | #10

    You might want to have a look at rdiff-backup as well. It uses a mixture of hard links and librsync to give you history.

    Also, why not look at backing up to Amazon S3? 20 GB is not much, really, so it would be pretty cheap.

  8. October 6th, 2009 at 16:48 | #11

    @Chris Cowley Re: Amazon S3 – I’m far too paranoid about my home data to have it stored remotely, even encrypted. (Plus, my upstream speed is not very good, so it would most likely take several days to copy everything over.)
