As much as I want to keep SuperDuper!'s UI as spare as possible, there are times when there's no way around adding things.

For years, I've been trying to come up with a clever, transparent way of dealing with missing drives. There's obvious tension between the reasons this can happen:

  1. A user is rotating drives, and wants two schedules, one for each drive
  2. When away from the desktop, the user may not have the backup drive available
  3. A drive is actually unavailable due to failure
  4. A drive is unavailable because it wasn't plugged in

While the first two cases are "I meant to do that", the latter two are not. But unfortunately, there's insufficient context to be able to determine which case we're in.

Getting to "Yes"

That didn't stop me from trying to derive the context:

  1. I could look at all the schedules that are there, see whether they were trying to run the same backup to different destinations, and look at when they'd last run... and if the last time was longer than a certain amount, I could prompt. (This is the Time Machine method, basically.)

    But that still assumes that the user is OK with a silent failure for that period of time...which would leave their data vulnerable with no notification.

  2. I could try to figure out, from context (network name, IP address), what the typical "at rest" configuration is, and ignore the error when the context wasn't typical.

    But that makes a lot of assumptions about a "normal" context, and again leaves user data vulnerable. Plus, a laptop moves around a lot: is working on the couch different than at your desk, and how would you really know, in the same domicile? What about taking notes during a meeting?

  3. How do you know the difference between this and being away from the connection?
  4. Ditto.

So after spinning my wheels over and over again, the end result was: it's probably not possible. But the user knows what they want. You just have to let them tell you.

And so:

Show drive error

The schedule sheet now lets you indicate you're OK with a drive being missing, and that we shouldn't give an error.

In that case, SuperDuper won't even launch: there'll just be blissful silence.

The default is checked, and you shouldn't uncheck it unless you're sure that's what you want...but now, at least, you can tell SuperDuper what you want, and it'll do it.

And who knows? Maybe in the future we'll also warn you that it's been a while.

Maybe.

He's Dead, Jim

If you're not interested in technical deep-dives, or the details of how SuperDuper does what it does, you can skip to the next section...although you might find this interesting anyway!

In Practices Make Perfect (Backups), I discuss how simpler, more direct backups are inherently more reliable than using images or networks.

That goes for applications as well. The more complex they get, the more chances there are for things to go wrong.

But, as above, sometimes you have no choice. In the case of scheduling, things are broken down into little programs (LaunchAgents) that implement the various parts of the scheduling operation:

  • sdbackupbytime manages time-based schedules
  • sdbackuponmount handles backups that occur when volumes are connected

Each of those use sdautomatedcopycontroller to manage the copy, and that, in turn, talks to SuperDuper to perform the copy itself.

As discussed previously, sdautomatedcopycontroller talks to SuperDuper via AppleEvents, using a documented API: the same one available to you. (You can even use sdautomatedcopycontroller yourself: see this blog post.)

Frustratingly, we've been seeing occasional errors during automated copies. These were previously timeout errors (-1712), but after working around those, we started seeing the occasional "bad descriptor" errors (-1701) which would result in a skipped scheduled copy.

I've spent most of the last few weeks figuring out what's going on here, running pretty extensive stress tests (with >1600 schedules all fighting to run at once; a crazy case no one would ever run) to try to replicate the issue and—with luck—fix it.

Since sdautomatedcopycontroller talks to SuperDuper, it needs to know when it's running. Otherwise, any attempts to talk to it will fail. Apple's Scripting Bridge facilitates that communication, and the scripted program has an isRunning property you can check to make sure no one quit the application you were talking to.

Well, the first thing I found is this: isRunning has a bug. It will sometimes say "the application isn't running" when it plainly is, especially when a number of external programs are talking to the same application. The Scripting Bridge isn't open source, so I don't know what the bug actually is (not that I especially want to debug Apple's code), but I do know it has one: it looks like it's got a very short timeout on certain events it must be sending.

When that short timeout (way under the 2-minute default event timeout) happens, we would get the -1712 and -1701 errors...and we'd skip the copy, because we were told the application had quit.

To work around that, I'm no longer using isRunning to determine whether SuperDuper is still running. Instead, I observe terminated in NSRunningApplication...which, frankly, is what I thought Scripting Bridge was doing.

That didn't entirely eliminate the problems, but it helped. In addition, I trap the two errors, check to see what they're actually doing, and (when possible) return values that basically have the call run again, which works around additional problems in the Scripting Bridge. With those changes, and some optimizations on the application side (pro tip: even if you already know the pid, creating an NSRunningApplication is quite slow), the problem looks to be completely resolved, even in the case where thousands of instances of sdautomatedcopycontroller are waiting to do their thing while other copies are executing.

Time's Up

Clever readers (Hi, Nathan!) recognized that there was a failure case in Smart Wake. Basically, if the machine fell asleep in the 59 seconds before the scheduled copy time, the copy wouldn't run, because the wake event was canceled.

This was a known-but-unlikely issue that I didn't have time to fix and test before release (we had to rush out version 3.2.3 due to the Summer Time bug). I had time to complete the implementation in this version, so it's fixed in 3.2.4.

And with that, it's your turn:

Download SuperDuper! 3.2.4