By William Van Winkle
 
 
SAS: DEFINING RELIABILITY

"We go through enormous lengths in our SAS drive to ensure that the data that arrives at the connector gets to the media and back again without being changed," says Willis Whittington, senior marketing manager for the enterprise compute business at Seagate. "SAS's data integrity checking essentially guarantees the host that the data that was sent from the host during the write is the data that the host will get back during a read. People say, ‘Well, you added ECC to it, but desktop drives do that too.' That's true. But there are different levels."

Data written to both SAS and SATA media is protected, but in getting to the media through the electronics, SAS adds a considerable amount of protection in the form of both error detection codes (EDC) and error correction codes (ECC). Whenever information gets written into a memory cell and read back out again, there is a small possibility that the data could have an error. There are various causes for this—alpha particles, basic conditions in the silicon, transient effects in the environment, and so on. The simple act of moving bits from one point to another is risky, and desktop drives provide little or no protection against such eventualities. The insidious part of such blown bits is that they're not detectable by the host unless the system is constantly running software meant to compare two independently maintained versions of the same image. Over time, the host will accrue these undetected "miscompare errors"—errors you only find out about after it's too late. RAID isn't designed to detect such mistakes; arrays merely store and copy.
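The kind of host-side comparison software mentioned above can be sketched roughly as follows. This is only an illustration, not any vendor's tool: the 512-byte block size, the use of SHA-256, and the function names `block_digests` and `find_miscompares` are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 512  # bytes per block; an assumption for this sketch


def block_digests(image: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-256 digest per fixed-size block of the image."""
    return [
        hashlib.sha256(image[offset:offset + block_size]).digest()
        for offset in range(0, len(image), block_size)
    ]


def find_miscompares(copy_a: bytes, copy_b: bytes,
                     block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indices of blocks whose contents differ between two copies."""
    if len(copy_a) != len(copy_b):
        raise ValueError("copies must be the same length")
    digests_a = block_digests(copy_a, block_size)
    digests_b = block_digests(copy_b, block_size)
    return [i for i, (a, b) in enumerate(zip(digests_a, digests_b)) if a != b]


if __name__ == "__main__":
    original = bytes(range(256)) * 8      # 2,048 bytes, i.e. four blocks
    damaged = bytearray(original)
    damaged[700] ^= 0x01                  # flip one bit inside block 1
    print(find_miscompares(original, bytes(damaged)))
```

A comparison like this only tells the host that the two copies disagree, not which one is right, which is part of why catching the error inside the data path matters.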

Like Night and Day

When Seagate describes enterprise (SAS/FC) drives trouncing desktop (SATA) drives by 2.5X or more, these are the numbers behind the claims. SAS wins on sequential reads and blows away SATA on random reading.

Perhaps there's no better illustration of "mission critical" than this: If you're talking about video data and miscompare errors, nobody is ever going to miss this or that given pixel. A flipped bit may give you a gray dot instead of a black one for one-thirtieth of a second. But if an "A" on a database form used in the field becomes a "B" and we're talking about blood types, somebody just died.

Fortunately, SAS drives guard against miscompares through the use of these extra correction methods, although there are no set standards here and exact approaches will vary between manufacturers. Still, every time data moves from one gate to another, SAS technology makes sure that the data going out is the exact same data that came in.

"A SATA drive says that if there's data at its interface, it must be the right data, so it'll go ahead and store it," says Seagate's Whittington. "There's no checking about where it came from, if it's right, if it's intact. If you use Fibre Channel or a SAS interface, then you've got a check on where the data came from, where it's going, and how valid it is. So on its way from, say, Sydney to a data center in Minneapolis, that data can go through multiple switches, multiple servers. And at any point where it changes address, there could be an error. We take care of that in the protocol by putting in metadata that says its origin, its destination, what piece of the data this is, whether it's in or out of sequence, what coding is being used. The data is almost fully protected."

Of course, you might point out that the same factors that can plague data inside of a drive also exist outside it. Alpha particles aren't picky. If you're going to sell customers SAS on the basis of superior data protection, then due diligence demands that you look to the entire solution. Unfortunately, an end-to-end answer is hard to find in the channel so far. The best option in the offing relies on something called Data Integrity Field (DIF), sometimes called Protection Information Model (PIM). Intel calls it BlockGuard. Whatever the name, DIF is a feature found in a handful of devices, such as Seagate's high-end enterprise drives and LSI's 2 Gbps LSIFC929XL dual-channel Fibre Channel controller. DIF is an error detection scheme whipped up by Agilent and promoted for RAID controllers. Essentially, eight bytes of error detection code get appended to every 512-byte data block: two bytes of cyclic redundancy code followed by six bytes of assignable metadata, which are frequently used for storing Logical Block Address data.
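To make the idea concrete, here is a minimal sketch of a DIF-style protected block, assuming a layout of a 512-byte data block followed by a two-byte CRC and six bytes of metadata holding the Logical Block Address. The CRC polynomial (0x8BB7, the one commonly associated with T10 DIF), the big-endian byte order, and the function names `protect` and `verify` are assumptions for illustration; real controllers and drives may split and use the metadata bytes differently.

```python
import struct

BLOCK_SIZE = 512   # user data bytes per block, as described in the article
POLY = 0x8BB7      # CRC-16 polynomial; an assumption for this sketch


def crc16(data: bytes, poly: int = POLY) -> int:
    """Bitwise CRC-16, most significant bit first, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect(block: bytes, lba: int) -> bytes:
    """Append 2 bytes of CRC and 6 bytes of metadata (here, the LBA)."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("block must be exactly 512 bytes")
    guard = struct.pack(">H", crc16(block))
    metadata = lba.to_bytes(6, "big")
    return block + guard + metadata


def verify(protected: bytes, expected_lba: int) -> bytes:
    """Return the user data if the CRC and LBA check out; raise otherwise."""
    if len(protected) != BLOCK_SIZE + 8:
        raise ValueError("protected block must be exactly 520 bytes")
    block = protected[:BLOCK_SIZE]
    guard = protected[BLOCK_SIZE:BLOCK_SIZE + 2]
    metadata = protected[BLOCK_SIZE + 2:]
    if struct.unpack(">H", guard)[0] != crc16(block):
        raise ValueError("guard CRC mismatch: data changed in transit")
    if int.from_bytes(metadata, "big") != expected_lba:
        raise ValueError("block arrived at the wrong address")
    return block
```

Any device along the path could call something like `verify` before forwarding or storing the block; a flipped bit in the data or a misrouted block then raises an error instead of being written silently.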

Making a Case for Vibration
Given that vibration can have a serious impact on drive performance, resellers need to pay extra attention to drive enclosures. Out of 33 cabinets Seagate tested, one-third allowed for crippling vibration results.

All of this serves to identify not only the data but also where it's coming from and where it's going. At every point in the chain where data changes direction, the metadata allows each device to double-check the user data. If something seems wrong, the device knows there's no point in forwarding the data, so the information gets returned to the sender along with an error message. These checks and balances exist in the data all the way to the drive. Only after confirming that the data and metadata are intact will a SAS or Fibre Channel drive record the data to the media. Exactly the same thing happens on the way back when the data gets read. The data can't leave the drive until it passes those integrity checks. SATA controllers make no provision for any of this. Unfortunately, neither do SAS controllers. To date, only the Fibre Channel market has semi-seriously entertained DIF, but the hope is that the scheme will spread more broadly throughout enterprise storage equipment later in 2007 and beyond.

SATA isn't a complete slouch when it comes to advanced features, though. Seagate's Barracuda ES nearline SATA drives, for example, integrate a SCSI command into the firmware called Write Same. This allows RAID rebuilds to happen a little faster than you'd see with the desktop-class standard Barracuda. Nearline SATA drives also often add improved vibration tolerance. Western Digital gave this a fancy name, Rotary Acceleration Feed Forward (RAFF), while Seagate generically refers to it as tolerance to rotational vibration (RV). Frustratingly, the two vendors sometimes use different measurements for this, making direct comparisons difficult. But Seagate does give this point of reference: A Barracuda desktop drive has an RV tolerance spec of 6 radians/sec². The nearline Barracuda ES jumps to 12.5. The enterprise SAS/FC Cheetah line exceeds 21.

Vibration tolerance is no small spec depending on your operating environment. Every drive gives off some amount of vibration as its platters spin (faster spin usually means more vibration), and, particularly in lower-quality racks or chassis, the vibrational harmonics generated by a dozen drives running together can be substantial. Add to this system fans, nearby air conditioner motors, "bursty" workloads within the system, and other factors. Vibration can knock finely tuned heads out of their track alignment and generate time-wasting errors and retries. Many enterprise drives place accelerometers on the drive's PCB, compute the vibrational motion of the drive, and signal the drive's servo system to correct accordingly. As you can see in Seagate's rotational vibration chart, one-third of the storage cabinets tested yielded dangerously unacceptable vibration levels in the Cheetah 10K.6 drives used as a benchmark. Only five were suitable for desktop hard drives.

Fastest Animal on Four Platters
The Cheetah 15K.5 introduces perpendicular recording to Seagate's 15,000 RPM SAS/FC/Ultra320 lineup, cramming up to 75GB on each of the drive's 2.6” platters and sustaining throughput at 73 to 125 MB/sec.

Drives unable to cope with vibration are prone to data errors and premature failure. Most data errors can be recovered, but the retries alone can crush overall throughput. With a moderate RV condition of 10 radians/sec², Seagate notes that a desktop drive will lose roughly 30% of its performance whereas a nearline SATA drive will only lose about 10%. Enterprise drives show virtually no loss up until about 20 radians/sec². If a storage rack were to be seriously out of whack and yielding an RV level of 30 radians/sec², enterprise drives would still perform at about 95% throughput whereas nearline topples to 50% and desktop stops working altogether.

Our next storage crisis-in-waiting: thermals. You know that spinning platters and motors generate heat, and cramming drives into an enclosure like sardines may compound heat buildup. Unlike with desktops, few people care if storage boxes generate considerable fan noise; the main concern is keeping everything running properly. In the case of nearline storage, that means running properly around the clock every day of the year.

"Availability 24x7 actually isn't that big of a deal," notes Seagate's Whittington. "It's the ‘burstiness'—how hard the drive has to work in a given amount of time. In a high-work period, the temperature goes up. In nearline drives, we have checks in the drive to make sure that if the duty cycle is going way above spec, getting a lot warmer and doing a lot more work than it was supposed to do, the drive will actually throttle back. If we write data to a drive, we'll follow that write command with a read command to make sure the data was properly written at the high temperature or high usage rate. This doesn't kick in immediately. It'll take a while for the drive to do this throttling. So not only does this make sure the data was properly written, but while the drive is reading, it's almost like being idle. So it actually cuts the performance in half, which gives the drive a chance to cool." In a nearline environment, such performance throttling is a sensible compromise. Users aren't likely to mind a throughput cut for a few minutes. Whittington doesn't even mention that Barracuda ES drives have a non-default option for advanced power management, wherein the drive will switch into a low-power mode when idle for more than one second. But all of this ignores one potentially major pitfall with nearline SATA: When data errors and excessive vibration join together in a large RAID, it can spell disaster for users' data.

In short, RAID 5 is built to withstand a single drive failure. But in a large RAID 5, rebuilds can take a significant amount of time. While the RAID 5 is rebuilding, the parity protection is at risk. If failure hits another drive, that's it. The data is lost. If this boils down to a damaged block or two, the loss can be minimal. If a second drive fails outright, then you can pretty much kiss the whole array goodbye.

Is this a real possibility? Consider that the unrecoverable error rate for a SATA desktop drive is 1 in every 10¹⁴ bits read. That's every time the data is read. A 500GB drive has 1/25 × 10¹⁴ bits, so in a 2.0TB RAID 5, meaning five 500GB drives, a full rebuild entails transferring 5/25 × 10¹⁴ bits, so there is a 20% chance of an unrecoverable error happening during a rebuild. That error might land on an empty sector. It might be in that blood type database field. But remember that these are the error rates on desktop drives under normal conditions. We've seen the impact that vibration can have on read/write conditions, and high temperatures only compound the unrecoverable error risk. Also figure in that, at moderate duty cycles relative to one another in an enterprise setting, desktop drives have 2.5X the annual failure rate of enterprise drives. This is before figuring in the higher temperatures, longer duty cycles, and greater powered-up hours inherent in enterprise settings.
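The rebuild arithmetic above can be reproduced with a short sketch. It follows the article in counting all five drives and in treating 500GB as 500 × 10⁹ bytes. The function name and the "exact" figure, which assumes every bit fails independently at the quoted rate, are additions for illustration and not Seagate's method.

```python
import math


def rebuild_error_probability(drives: int, drive_bytes: int,
                              bit_error_rate: float) -> tuple[float, float]:
    """Estimate the chance of at least one unrecoverable read error.

    Returns (linear, exact):
      linear - bits read multiplied by the error rate, as in the article
      exact  - 1 - (1 - rate) ** bits, assuming independent bit errors
    """
    bits_read = drives * drive_bytes * 8
    linear = bits_read * bit_error_rate
    # log1p/expm1 keep precision when the rate is tiny and the bit count huge
    exact = -math.expm1(bits_read * math.log1p(-bit_error_rate))
    return linear, exact


if __name__ == "__main__":
    # Five 500GB desktop drives at 1 error per 10^14 bits read
    linear, exact = rebuild_error_probability(5, 500 * 10**9, 1e-14)
    print(f"linear estimate: {linear:.1%}, independent-bit estimate: {exact:.1%}")
```

Under these assumptions the linear estimate matches the 20% quoted above, while the independent-bit estimate comes out slightly lower, at roughly 18%. Either way, the risk is far from negligible.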

"A SAS drive operates anywhere from 1.5 to 3 or 4 times faster than a desktop drive does," observes Whittington. "So if you take a SATA drive and try to run it in exactly the same enterprise application, the very fact that the drive is, let's say, three times slower than an enterprise drive means that it will take three times longer to do the work. If an enterprise drive is busy 33% of the time, a desktop drive would be always busy with no idle time whatsoever. The usage goes up, the temperature goes up. When temperature goes up, typically reliability comes down. So you're going to get more failures."

There are two chief ways to minimize this danger. The first is to use enterprise SATA drives rather than desktop SATA. According to Seagate statistics, nearline SATA drives show error rates an order of magnitude lower than desktop SATA: 1 in 10¹⁵ bits read rather than 1 in 10¹⁴. In a three-drive RAID 5, this drops the probability of an unrecoverable error from 12% down to well under 2%.

The second path to safer living is to opt for RAID 6 instead of RAID 5. RAID 6 is just like RAID 5 except that it institutes two parity blocks instead of one and requires a minimum of four disks instead of three. With double distributed parity, two drives can fail in a RAID 6 at any time with no permanent data loss. The capacity equation is N-2, so four 500GB drives in a RAID 6 would leave you with 1TB of storage space, a proposition in some ways worse than RAID 1 since basic mirroring doesn't carry RAID 6's performance penalty on write operations. However, the impact of N-2 becomes less the more drives you have in the array. Past a certain point, the benefits of double protection outweigh the loss of a few hundred gigabytes. Because of this common sense approach to capacity and the additional compute overhead for double parity, there are currently no motherboard chipsets offering RAID 6 support.
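The capacity trade-off is easy to tabulate. The sketch below assumes equal-size drives and uses hypothetical helper names (`usable_capacity_gb`, `parity_overhead`); it simply applies N-1 for RAID 5 and N-2 for RAID 6.

```python
PARITY_DRIVES = {"raid5": 1, "raid6": 2}   # drives' worth of capacity lost to parity
MIN_DRIVES = {"raid5": 3, "raid6": 4}      # minimum array sizes given in the article


def usable_capacity_gb(level: str, drives: int, drive_gb: int) -> int:
    """Usable capacity of an array of equal-size drives, in GB."""
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MIN_DRIVES[level]} drives")
    return (drives - PARITY_DRIVES[level]) * drive_gb


def parity_overhead(level: str, drives: int) -> float:
    """Fraction of raw capacity consumed by parity."""
    return PARITY_DRIVES[level] / drives


if __name__ == "__main__":
    for n in (4, 8, 16):
        usable = usable_capacity_gb("raid6", n, 500)
        print(f"{n} x 500GB RAID 6: {usable}GB usable, "
              f"{parity_overhead('raid6', n):.1%} lost to parity")
```

With four 500GB drives, RAID 6 gives up half the raw capacity; at 16 drives the share lost to parity falls to one-eighth, which is the sense in which the N-2 penalty shrinks as arrays grow.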


Copyright © 2007 RAM Magazine. All rights reserved.
Do not duplicate or redistribute in any form.