Workaround runaround Monday, January 21, 2008

A few days ago, we were happy to get a workaround for the bug mentioned in a previous post, so we went about implementing and testing—and things were working quite well.

Unfortunately, we've found a case where the workaround doesn't, um, work. I think I have a way of dealing with it, and so we're going to implement Workaround Part IV: The Reworkening and run it through the wringer.

Again.

These tests are really time-consuming, as you might guess: it can take eight hours to rebuild a complex-case volume (so I try to keep one building while another is testing), and then more hours to run the test. I can't begin to tell you how frustrating it is to have a set of simple tests work and then find the thing we found rearing its ugly head again in another (more complex) scenario that can (and did) happen in "real life".

This is why we don't just implement a fix and release, of course: we want to ensure it works before we toss it out there, and so we smoke test, test internally and then, if internal tests are OK, test externally, every time. We're always hoping for green lights at every stage. Sometimes, 99% of the way there, you get red. Then the failure needs to be investigated, understood, reproduced...

Anyway, we'll leave that between me and my ulcer. If my new idea works, this will hopefully only delay things a few days, which will be plenty of time to fill the comments with complaints of our laziness, incompetence, lack of communication and general suckitude. Have at it!

Note: I've had to turn comments off for a little while because the subscriptions to comments are overloading my outbound servers (every time someone posts something, it's sent to the 200 people above their post), causing delays for regular support mail, etc. You can still review existing comments by clicking the title of this post. I'll turn them back on after the backlog clears. Sorry for any inconvenience: it's not that I don't want to hear what you want to say...

I’m Bugged Monday, January 14, 2008

Last post, I mentioned that we were bitten by a bug that showed up during late-in-the-game testing. It didn't make a lot of sense, and was quite nasty in certain complex situations. This bug caused the "release process" to grind to a halt.

Well, I'm happy to say that, as of about two minutes ago, I've managed to figure out what's going on.

Basically, a folder can become "magic" in some situations, and even when the conditions that made it "magic" are reversed, the "magic" sticks around when it shouldn't. Unfortunately, this "magic" acted as a sort of "protective spell" on the folder, and was preventing us from doing anything.

Unfortunately, there's no 'external' visibility for when the "magic" sticks around, so we were seeing something that basically didn't make any sense. On top of that, it's new behavior in Leopard, which is why we've never seen it before. Fortunately, thanks to Amit Singh's recently-updated-for-Leopard hfsdebug (thanks, Amit -- love the book, too), I was able to drop down into the guts of HFS+ and determine what's happening.

Now that the problem's understood, we can implement an effective workaround. The workaround will mean that in some situations it'll re-copy a bit more than it should when Smart Updating. But at least the result will be correct, and the workaround will break the spell and remove the "magic".

Which is—let me tell you—a relief. (For those of you out there who have hit this kind of WTF-roadblock, where you have no idea what's wrong and thus can't even estimate how long it's going to take to figure it out and fix it, you know what I mean.) And for any Apple engineers reading, the (incorrect) behavior is described in rdar://5687977.

So, anyway, now that that data-integrity-related bug is getting wrapped up, we're back to putting together a final test build (should be in the next day or two), a bit of time to let our test group run their scenarios, etc.

So, barring another similar issue (please, no) showing up during testing, as I indicated in the comments of the previous post, it should be a week or so...

Quick Update Thursday, January 03, 2008

We look to still be on schedule, so hopefully you'll have the new version (which, by the way, I've decided will be 2.5, not 2.1.5) in a week or so.

We've recently been working through a problem with copying hard folder links in a complicated source volume that's got us a bit stymied (what we're seeing doesn't make sense), but hopefully we'll get it wrapped up shortly.

Sorry for the briefness of the entry: both busy and sick with some stomach flu... more when there's something new/interesting to say.
