The Robservatory

Robservations on everything…

 

Remember, kids, RAID is not a backup!

Major update: The QNAP box failed tonight (Aug 3) after running flawlessly for three days straight. I went out to grab some dinner (I shouldn't leave, ever, apparently), and came back to the RAID offline with just a power light, no USB or drive lights.

I moved the drive from a USB hub on a long cable to directly into my Mac on a short cable. Same problem. I then pulled the drives from the array and dropped each into my drive dock, and they were both fine. (All my data was gone, though—thankfully I had literally cloned the drive just before I went out.)

Needless to say, the QNAP box is going back. I've ordered a different unit, with a different chipset in it, but it won't be here for about a week. In the interim, I've put my new drives in external enclosures, and I'll just use Carbon Copy Cloner to mirror them every 30 minutes or so. I've edited the post to reflect my experience.

I'll edit and repost this once the new box is here and (hopefully) working, though I might wait more than three days after it arrives, just to be sure!

On my iMac, I have a fair amount of data—somewhere around eight terabytes or so spread across 15TB of drive space. Until last week, I had it split between the internal SSD (work and personal files I access a lot), an external 6TB USB drive (archive stuff I want to keep but not regularly access), and an external 8TB RAID box (a whole bunch of music, movies, home videos, work videos, etc.)

Being paranoid, I also had relatively good (but not bulletproof, as I discovered) backup strategies for all of these things. And it's a good thing I did, as last week, my external RAID box died in spectacular fashion. While I was out of town, no less. And that's why they say, "RAID is not a backup!" (Many RAID levels duplicate your data, but if something happens to the RAID box itself, the data is toast.)

So what happened, how'd I recover, and what's my new plan going forward?

What happened

As background, since 2014 I've used a LaCie 5Big Thunderbolt 2 as my external RAID box. I originally chose this pricey but powerful solution because it seemingly offered the best of both worlds: It had hardware support for RAID 10, which combines RAID 0's striping (writing data across two or more drives) for speed with RAID 1's mirroring (copying data to protect against drive failure). To do that, it uses four drives, and the fifth is a "hot spare" that takes over automatically in the event a single drive fails.

Over the years, the LaCie worked great for me. I had three drives fail over that time, and each time, the hot spare seamlessly came online and resynced its data. It continued to work great…right up until it failed. When I got back from my trip, I looked through the log file, and it seemed to indicate that three drives failed at the exact same time, and the odds of that being true seem incredibly low. While the LaCie could handle one drive dying for sure, and two if they were the "right" two, there's no way it could handle three dead drives. My RAID, and all the data it held, were gone.
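To make the "right two" point concrete, here is a rough sketch that enumerates drive failures in a four-drive RAID 10. It assumes the drives form two mirrored pairs with data striped across the pairs; the LaCie's actual internal layout isn't known, and the sketch ignores the hot spare and rebuild timing. The names `MIRROR_PAIRS`, `array_survives`, and `survival_counts` are made up for illustration.

```python
from itertools import combinations

# Assumed layout: four active drives arranged as two mirrored pairs,
# with data striped across the pairs. The real LaCie layout may differ.
MIRROR_PAIRS = [(0, 1), (2, 3)]
DRIVES = [drive for pair in MIRROR_PAIRS for drive in pair]


def array_survives(failed):
    """The array survives if every mirrored pair still has a working drive."""
    failed = set(failed)
    return all(any(d not in failed for d in pair) for pair in MIRROR_PAIRS)


def survival_counts(num_failed):
    """Return (surviving combinations, total combinations) for num_failed dead drives."""
    combos = list(combinations(DRIVES, num_failed))
    return sum(array_survives(c) for c in combos), len(combos)


for n in (1, 2, 3):
    ok, total = survival_counts(n)
    print(f"{n} failed drive(s): array survives in {ok} of {total} cases")
```

Under these assumptions, one dead drive is always survivable, two dead drives are survivable only when they sit in different mirrored pairs, and three dead drives never are.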

What happened? I still don't know, but even after replacing the three supposedly-dead drives with working drives, I was unable to get the box to create a new array; it would sound an alarm and fail as soon as I tried. If I had fought with it for a while, and maybe invested in five brand-new drives, perhaps it would have worked. But given the age of the LaCie, and the amount of effort it might take to get it working, only to have it perhaps fail again in the near future, I didn't think that was a worthy investment.

Instead—after trying and failing to give it away for parts—I did some deconstructing…

[Image: the deconstructed LaCie]

One can never have too many tiny screws in their collection of Things That May Be Useful Someday™.

The recovery

The good news is that I'm paranoid about backups. The bad news is that I found I'm not paranoid enough about backups. I would semi-regularly clone the RAID to a local backup drive, but that's a manual process. And really key files are backed up to another drive via a Carbon Copy Cloner routine every hour. Then, once a week, I would clone the array to a drive that my wife keeps at her office. But, as you can see, there's no true real-time backup in here, nor is there an online component.

Thankfully, the timing worked out well, as we'd made an offsite backup just a couple days before I left town, and my often-changed work and personal files are on other drives. In the end, all I lost was a handful of iPhone photos (thankfully, nothing irreplaceable in that small batch). So I used Carbon Copy Cloner (CCC) to restore the backed-up data from the offsite drive to my new storage solution.

But what'd I restore the data to, you may be asking? That was, indeed, the trickiest part of this process: What to use going forward for my storage solution?

The new plan

I got back to town last Thursday afternoon, and I wanted to get my data back as quickly as possible—especially as my iTunes (yes, still on Mojave), Usher, and Photos collections were on the RAID array. I considered three different replacement strategies:

  1. A new multi-disk RAID 10 box from LaCie or PROMISE
  2. Separate external hard drives
  3. A hybrid approach with RAID and separate drives

As much as I liked my RAID 10 setup, this crash has instilled a fear of stripes (RAID 0, which is part of RAID 10) in me: If the enclosure has an issue, then even if a drive is perfectly fine, you can't get to the data on that drive unless it's in the array, because not all the data exists on any one drive. The other issue with such a box is that if I ever wanted to expand its capacity, I'd have to replace four drives at a time, which gets pricey. So while I could go with the first option, those boxes are really expensive, and not really needed if I'm not going to use RAID 10.
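The striping problem can be illustrated with a toy model. The sketch below treats each "drive" as a plain bytes object; the chunk size, the sample data, and the function names `stripe` and `mirror` are all made up for illustration, and real RAID controllers work on disk blocks rather than short byte strings.

```python
def stripe(data, num_drives=2, chunk_size=4):
    """RAID 0 style: deal fixed-size chunks out to the drives in turn."""
    drives = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), chunk_size):
        target = (offset // chunk_size) % num_drives
        drives[target].append(data[offset:offset + chunk_size])
    return [b"".join(chunks) for chunks in drives]


def mirror(data, num_drives=2):
    """RAID 1 style: every drive gets a complete copy."""
    return [bytes(data) for _ in range(num_drives)]


data = b"made-up sample data for the toy array"  # illustrative only
striped = stripe(data)
mirrored = mirror(data)

# A single striped drive holds only every other chunk of the data...
print(striped[0] == data)
# ...while a single mirrored drive holds everything.
print(mirrored[0] == data)
```

Pull one drive out of the striped set and what remains is a file with every other chunk missing; pull one drive out of the mirrored set and it is still a complete copy.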

I seriously considered separate external hard drives, such as the LaCie Professional 10TB external drive, which uses enterprise-class hard drives and includes a five-year warranty. The plan would be to purchase two drives, and use CCC to clone one to the other on a regular (and automatic) basis.

The advantage of this approach would be that the drives are truly separate—each has its own power supply, so I wouldn't lose multiple drives at once due to a failed power supply. The two downsides I saw were that they do have separate power supplies, meaning more things to plug in, and more cables to manage. In addition, as good as CCC is, the copying wouldn't be real time, leaving me open to at least some level of possible data loss.
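For a sense of what a scheduled clone involves, and why it can never be truly real time, here is a minimal one-way mirror sketch. It is not how Carbon Copy Cloner works internally; `mirror_tree` and `_needs_copy` are hypothetical helpers that compare only file size and modification time, and they never delete files that have vanished from the source.

```python
import os
import shutil


def _needs_copy(src, dst):
    """Copy when the destination is missing, a different size, or older."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or s.st_mtime > d.st_mtime


def mirror_tree(source, destination):
    """Copy new or changed files from source to destination.

    Returns the number of files copied on this pass.
    """
    copied = 0
    for folder, _subfolders, filenames in os.walk(source):
        relative = os.path.relpath(folder, source)
        target_folder = os.path.join(destination, relative)
        os.makedirs(target_folder, exist_ok=True)
        for name in filenames:
            src = os.path.join(folder, name)
            dst = os.path.join(target_folder, name)
            if _needs_copy(src, dst):
                shutil.copy2(src, dst)  # copy2 also preserves timestamps
                copied += 1
    return copied
```

Anything written to the source between two passes exists in only one place until the next pass runs, which is the window of possible data loss described above.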

So in the end, I chose a hybrid approach: I'm using a two-drive RAID enclosure with separately purchased drives, plus a single super-fast Thunderbolt 3 NVMe drive in an enclosure. Here's exactly what I'm using:

In total, this setup cost $1,001 ($759 for the 10TB drives and RAID box; $242 for the NVMe and enclosure). While that may sound expensive, it's less than half the cost of a big five- or six-bay RAID box with drives: Apple sells the six-bay 24TB PROMISE Pegasus32 R6 for $2,299, which is comparable to what the LaCie cost back in 2014. But why an extra drive when a single RAID box worked for me before?
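The cost comparison is simple arithmetic, sketched here using only the prices quoted above (the variable names are just labels):

```python
raid_and_drives = 759      # 10TB drives plus RAID box, USD
nvme_and_enclosure = 242   # NVMe drive plus enclosure, USD
pegasus_r6 = 2299          # six-bay 24TB PROMISE Pegasus32 R6, USD

total = raid_and_drives + nvme_and_enclosure
ratio = total / pegasus_r6

print(f"Total: ${total:,} ({ratio:.0%} of the Pegasus price)")
```

That works out to $1,001, or roughly 44% of the Pegasus price.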

The NVMe drive

I decided that now was a good time to move my Photos library from the RAID box to something faster: I tend to launch, browse, and quit Photos multiple times a day, and my database is 1.3TB in size, so I wanted it to be speedier. And as I'd never used an external NVMe drive, I thought this would be a good way to test such a drive (with good backups in place, of course!).

I picked a well-reviewed model for both the drive and the enclosure, though I have no prior experience with either brand, other than probably owning some PNY memory at some point.

[Image: NVMe drive and enclosure]

Assembly was easy; the enclosure is tool-free, the drive slides in, and you insert a plastic/rubber bit to hold down the other end.

Connected to the Thunderbolt 3 port on my iMac, the drive is quite fast; reads and writes are in the 950MB/s range using the Blackmagic Disk Speed Test app. This isn't quite as fast as the internal NVMe in my iMac (in the 1,300MB/s range), but it's notably faster than USB3 SSDs, and miles ahead of any traditional hard drive. And the thing is tiny, making it very easy to take with me when I travel.

I did run into one anomaly when connecting the drive. I first connected it directly to a USB3 port on the back of the iMac, as the Thunderbolt 3 ports were full at that time (I was still trying to save the RAID box). However, copying seemed incredibly slow, so I checked with the Blackmagic Disk Speed Test app, and was shocked at the results: 27MB/s write and 42MB/s read. For comparison, an external spinning-platter USB3 drive I have scores in the 165MB/s range. Yikes!

I then moved the drive to a USB3 hub that was connected to another USB3 hub which was finally connected to the iMac. Plugged in that way, it scored 422MB/s write and 415MB/s read. This makes no sense to me at all, but as my long-term plan was to use the Thunderbolt 3 ports, I didn't really worry about it. It may have something to do with the circuitry in the enclosure I chose, who knows, but it's something to be aware of if you see much slower performance than expected with a similar setup.
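To put those numbers in perspective, here is a back-of-the-envelope estimate of how long the 1.3TB Photos library would take to copy at each measured speed. It assumes decimal units and that the benchmark speed is sustained for the whole copy, which real transfers rarely manage; the labels and the `hours_to_copy` helper are just for illustration.

```python
LIBRARY_MB = 1.3 * 1_000_000  # 1.3TB Photos library, in decimal megabytes

# Speeds quoted in the post, in MB/s. Sustained throughput is an assumption.
SPEEDS_MB_PER_S = {
    "NVMe over Thunderbolt 3": 950,
    "NVMe via two chained USB3 hubs (write)": 422,
    "Spinning-platter USB3 drive": 165,
    "NVMe on a direct USB3 port (write)": 27,
}


def hours_to_copy(size_mb, speed_mb_per_s):
    """Idealized copy time in hours at a constant transfer speed."""
    return size_mb / speed_mb_per_s / 3600


for label, speed in SPEEDS_MB_PER_S.items():
    print(f"{label}: {hours_to_copy(LIBRARY_MB, speed):.1f} hours")
```

At 27MB/s the copy would take more than 13 hours; at 950MB/s it would take well under half an hour.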

The RAID box

If you're using a Mac, don't buy this box! I have no idea why mine failed, but it did, and did so despite trying multiple connection methods. I've edited the following text, leaving generic bits alone, but striking out QNAP-specific references.

One reason I went with internal drives and an enclosure was to cut down on the wires and power bricks I had to manage. But the main reason was to add hardware RAID support. No, not RAID10, which uses the striping that I now consider evil. I wanted a box that supported RAID 1 (mirroring) in hardware, so I would have a duplicate of my data. That way, if one drive died, the other would have a full copy of my data. And if the RAID box itself died, the drives would be usable as they each contain a full copy of the data. And best of all, it'd be automatic and in real time, with no work on my part.

Amazon offers roughly 27,275 different hard drive enclosures. After doing some reading of reviews, I chose the QNAP two-drive box. Using DIP switches, you can configure the box for RAID 0, RAID 1, JBOD, or individual drives. And while DIP switches seem antiquated, I figured this was a "set and forget" operation, so it didn't bother me.

I've had this running for a couple days now, and so far, I'm impressed. The first impression isn't great—the trays are plastic, and the locking mechanism feels really cheap. But the drives seat well on the plastic trays (and include snap-on side panels to hold them in place, plus a set of optional-to-use screws to hold them down). If I were going to be switching drives regularly, the plastic construction would bother me, but for occasional use, they seem more than sturdy enough.

One very pleasant surprise was the presence of a separate (optional and free) RAID management app. With the DIP switches set to "software," you can use QNAP's app to create and manage the array. The app has a simple interface and is easy to use, and includes SMART status checks, a log viewer and exporter, and a firmware updater.

It doesn't include any way to email alerts for problems, however, so it's more basic than what was included with my LaCie. But being able to interact with the RAID in some manner is definitely better than setting some DIP switches and hoping everything's working right.

Now that everything's set up, and most of my data is copied back to the new drive (or is that drives?), I've found the speed to be more than acceptable. It's not as fast as the striped RAID10 in the LaCie was, but it's faster (slightly on write, more notable on read) than my other external USB3 drives. There's a fan in the enclosure, but it's very quiet—much quieter than the LaCie; it never got louder even when copying terabytes of data to two drives at once.

Once again, I would not recommend the QNAP box to anyone using a Mac.

Back up

With the data recovered and things back to relative normal, that just leaves the question of improving my backup strategy. The one thing missing is a real-time offsite backup, which means some sort of online solution. I'll be looking into Backblaze and iDrive over the next few days (along with some others, I imagine).

I'll continue to do my onsite manual backups, along with the weekly offsite, but adding an online copy of my data should patch the last hole in my backup strategy.

Wrap up

Although it's never fun to deal with dead hard drives and data loss, I escaped relatively unscathed thanks to a good (but not perfect) backup strategy. The LaCie gave me seven years of reliable service, which makes its high initial cost seem not quite so bad. I'm hopeful that my new solution will work equally well, but I intend to improve my backup strategy in case that doesn't turn out to be true.

For much less money than I spent in 2014, I gained 4TB in storage space (2TB on the array, and 2TB on the NVMe), reduced the physical size of the RAID box, and lowered the fan noise level. All things considered, perhaps it was good that the LaCie died and forced my hand.

4 Comments

  1. This is a good reminder. RAID is a backup against (limited) drive failure. Not deletion, overwrite, or other stuff (including ransomware). And with the speed of SSDs to do the major work, you barely even need the speed benefits of RAID.

    I do not manage major storage, but I've thought about it and I follow the DataHoarder subreddit for ideas. Personally, I like the idea of JBOD + mergerfs along with backups (SnapRAID? others?). Then you get the larger size of multiple drives without the interdependency. And, maybe it is just that I have been lucky enough to never have a drive fail (knocks wood), I prefer backup to immediate robustness. Especially since downtime won't be a big deal for me.

    I am a bit surprised that you didn't already have a real-time backup, especially of important things like photos. They weren't in iCloud and/or a Time Machine backup?

    Backblaze is great if you can deal with the limitations. I hear the restore process is less-than-amazing but it is a last-line of defense so that isn't too bad.

    1. Sorry, I do have Time Machine, but not for everything—I use it for things I'm more likely to want versions of, which turns out to be mostly stuff on my internal and USB external, not the RAID. I'm going to get a larger TM drive, though, and change that approach, too.

      -rob.

  2. I'm assuming the reason that you're using external drives for storage as opposed to NAS boxes is for the speed of Thunderbolt?

    The benefit of using NAS vs direct-connect is that the NAS box will be smart enough to back itself up to other locations, so even with a catastrophic failure, you'll have a good chance to get everything back.

    I have 3 main Synology servers, one for data storage, one to capture video surveillance and one for Plex/VMs. Each Mac in the house backs up to #1 via Carbon Copy Cloner, to #2 via Time Machine, to #3 via Arq, and a fourth backup to Microsoft OneDrive via Arq.

    The main data server backs up to OneDrive and Backblaze on a regular basis, and all servers back up to an ancient #4 Synology NAS that was left over after a recent upgrade. Not to mention backups to drives in my Safety Deposit Box, optical media and photos to Amazon Prime.

    Yes, I'm paranoid. I've been dealing with computers far too long not to be.

    1. I've tried NAS a couple times, but I just can't handle the (lack of) speed. I've also run into compatibility issues with Time Machine. And one of our own apps, Usher, doesn't work well with media files on NAS drives due to some QuickTime limitations. (We could write a custom driver, but that's a huge amount of work for an app that sells in very small numbers.)

      So just from a personal perspective, I've always preferred local storage for primary storage. I do use a Time Capsule for Time Machine backups of the laptops in the house, though.

      -rob.


The Robservatory © 2021