The Robservatory

Robservations on everything…

 

How to not accidentally delete all your rsync backups

With my Time Machine-like rsync backups running well, I decided it was time to migrate over the cleanup portion of my old script—namely, the bit that removes older backups. Soon after I added this bit to my new script, though, I had a surprise: All of my backups, save the most recent, vanished.

In investigating why this happened, I stumbled across two rsync/macOS behaviors that I wasn’t aware of…and if you’re using rsync for backup, they may be of interest to you, too.

The first behavior has to do with the creation date on folders created by rsync (though this had nothing to do with my vanishing backups). My blog’s structure on the server looks something like this:

top level folder > robservatory_com > HTML files

On my Mac, I wanted the backup folders structured like this:

backups folder > robservatory > date-specific folder > HTML files

To get that structure, my rsync command looked like this:

/usr/local/bin/rsync -aP \
  --link-dest=$homeDir/TMrobservatory/current $userhost:$remoteHTML/robservatory_com/ \
  --exclude "errors.csv" \
  --delete --delete-excluded \
  $homeDir/TMrobservatory/back-$newtime

The trailing slash on the source path (robservatory_com/) means that rsync will grab the contents of the indicated folder, and place them in the directory specified on the last line (the date-specific folder). What I discovered is that if you do things this way, the creation date on the server’s robservatory_com folder gets applied to the date-specific folder, even though that folder was created by the rsync command! In my case, that means it was dated November of 2009, when the folder was originally created on the server.

That was odd, and I fixed it by using SetFile to change the creation date for my existing folders. More importantly, I removed the trailing slash, which forces rsync to copy the folder itself, and everything in it. Now my local backup structure looks like this:

backups folder > robservatory > date-specific folder > robservatory_com > HTML files

It’s more depth than I’d prefer, but the weird creation date stays with robservatory_com when done this way. And as noted, this issue didn’t have anything to do with the vanishing backups. But it was important to fix, as it’s now used in my new cleanup command.
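To see the trailing-slash rule on its own, here is a minimal local sketch. It assumes rsync is installed, and it uses throwaway folders under a temporary directory (the names with-slash and without-slash are made up for the demo) rather than my real server paths.

```shell
#!/usr/bin/env bash
# Local-only demo of rsync's trailing-slash rule; every path here is throwaway.
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/robservatory_com"
echo "<html></html>" > "$work/robservatory_com/index.html"

if command -v rsync >/dev/null 2>&1; then
  # Trailing slash on the source: copy the folder's contents into the target.
  rsync -a "$work/robservatory_com/" "$work/with-slash"
  # No trailing slash: copy the folder itself, so it appears inside the target.
  rsync -a "$work/robservatory_com" "$work/without-slash"
  # List the copied files relative to the temporary directory.
  find "$work/with-slash" "$work/without-slash" -type f | sed "s|$work/||"
else
  echo "rsync not found; skipping demo"
fi
```

The first copy should end up with index.html directly inside with-slash, while the second gains the extra robservatory_com level, which is the deeper structure I have now.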

The vanishing backups were all the fault of the second behavior, which, to be honest, I still don’t completely understand. What seems to happen is that folders created by rsync during the scripted backup aren’t really “seen” by the macOS Finder until I navigate into one. What I mean by that is this…

On the left is the Info window for a backup that my script created yesterday at 4:30pm. Note that that’s both the creation date and the modification date; all I’ve done at this point is select the folder and show the Inspector. On the right is that same folder, a few seconds later, showing a different modification date. So what modification did I make? None whatsoever.

All I did was navigate into the folder; as soon as I did that, macOS updated the modification date—and notice the size changed, too. It’s like that first step into the folder resolves some of the hard links, which forces a change in the modification date. Now here’s the weird part: It only does this on the first dive into the folder. After that, it behaves like any other folder, and the modification time only changes if something in the folder is modified.

I’ve only seen this behavior on these folders of hard links created by rsync, so I don’t know if it’s a bug, a feature, or something in between. If anyone can explain this, I’d love to know what’s going on!
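For what it’s worth, one ordinary filesystem behavior fits the symptom: adding any entry to a folder updates that folder’s modification time, even if nothing already in it changes. The sketch below only illustrates that behavior with throwaway paths. That Finder writes a file (such as .DS_Store) on the first visit is an assumption here, not something I have verified.

```shell
#!/usr/bin/env bash
# Throwaway demo: adding an entry to a directory updates the directory's mtime.
set -euo pipefail

work=$(mktemp -d)
mkdir "$work/backup"
touch -t 202001010000 "$work/backup"   # pretend the folder was last modified in 2020
touch -t 202001020000 "$work/marker"   # reference file dated one day later

if [ "$work/marker" -nt "$work/backup" ]; then
  before="older than marker"
else
  before="newer than marker"
fi

# Stand-in for whatever Finder might write on a first visit (an assumption).
touch "$work/backup/.DS_Store"

if [ "$work/backup" -nt "$work/marker" ]; then
  after="newer than marker"
else
  after="older than marker"
fi

echo "before: $before"
echo "after:  $after"
```

The existing contents of the folder are never touched, yet the folder’s own modification time jumps to the present as soon as the new entry appears.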

While I don’t know why this happens, I do know that it’s responsible for the demise of my backups. The line I had been using to trim the older backups was this one:

find /path/to/backups/ -d 1 -type d -mmin +$((60*24*4)) -maxdepth 1 -delete

That should have deleted all backups older than four days, based on the modification date. And in general, it did…as long as I either hadn’t ever visited a backup folder, or had done so very soon after it was created. But if I happened to first navigate into one of the older folders many days after it was created, the modification date would change, and poof, that backup would vanish on the script’s next run.

One day, I went into every folder, as I was verifying that all the hard links were working…and in so doing, I set up all the folders for deletion the next time the script ran. Yikes! To fix this problem, I’m now using this version of the find command:

find /path/to/backups/ -d 1 -type d -Bmin +$((60*4*24)) -maxdepth 1 -exec rm -r {} +

The -Bmin flag checks the creation date, not the modification date, so I avoid the weird “change modification time on first touch” issue. I also swapped the -delete for -exec rm -r {} + as I read that it should be faster. One very important bit of that command is the -d 1. Without it, find will match the starting directory (the backups folder itself), too, and delete it…thereby (again) wiping out all your backups. Do not ask me how I know this.

So far, in limited testing, this is working well.
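If you want to convince yourself of the starting-directory problem without risking real backups, here is a small sketch on a throwaway folder. It uses -mindepth 1 as the guard, which is my substitution for -d 1 in this demo, and the back-1 and back-2 folder names are made up. It also leaves out the age test entirely, so it only shows which folders are candidates.

```shell
#!/usr/bin/env bash
# Throwaway demo of why the cleanup must exclude the starting directory.
set -euo pipefail

backups=$(mktemp -d)
mkdir "$backups/back-1" "$backups/back-2"

# Without a minimum depth, the starting directory itself is one of the matches.
unguarded=$(find "$backups" -maxdepth 1 -type d | wc -l)

# -mindepth 1 skips the starting directory and leaves only its children.
guarded=$(find "$backups" -mindepth 1 -maxdepth 1 -type d | wc -l)

# $(( )) strips the padding some wc implementations add around the count.
echo "unguarded matches: $((unguarded))"
echo "guarded matches:   $((guarded))"

# Dry run: print the candidates first, and only then swap -print for the rm.
find "$backups" -mindepth 1 -maxdepth 1 -type d -print
```

Running the real command with -print in place of the -exec rm -r {} + is a cheap way to check the list before anything is deleted.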

One side note: If you’re making Time Machine-like backups like this, do not delete them in Finder. Processing the hard links is incredibly slow, and if you have tens of thousands of files, it could literally take days to delete them all. Remove them in Terminal, via rm -r, or via the find command, as shown above. Trust me on this one; I learned through (thankfully) a friend’s experience, as he waited the better part of 25 working hours for his trash to empty.

5 Comments

  1. I’ve used rsnapshot in the past, which handles incremental updates, backup rotation, and deletion.
    I’ve got nothing against rolling your own, but it may save you some time if there are issues you’re trying to work out.
    http://rsnapshot.org/

    1. I saw that, and it looks very good. Despite the occasional pain, I like the learning that comes from building these scripts. And in this case, the rsync bit is just one part of a larger backup that also backs up SQL files, and moves some other stuff around.

      -rob.

  2. I’m not sure it would make a difference, but in AppleScript, you can tell Finder to tell a window to “update every item with necessity”. That might let you eliminate some of the uncertainty about Finder’s involvement.

  3. No big mystery. Navigating into the folder likely created a .DS_Store file, which caused the folder’s modification date to change since an item was added to the folder. ; )

    1. Hmm, could be … the size change is 28KB, which seems a bit large for a .DS_Store file … but that must be it.

      -rob.


The Robservatory © 2017