RAID Problems With Intel Core 2?

Posted by Zonk on Thu Jul 06, 2006 02:49 PM
from the arr-aye-eye-dee dept.
Nom du Keyboard writes "The Inquirer is reporting that the new Intel Core 2 processors Woodcrest and Conroe are suffering badly when running RAID 5 disk arrays, even when using non-Intel controllers. Can Intel afford to make a misstep now with even in the small subset of users running RAID 5 systems?" From the article: "The performance in benchmarks is there, but the performance in real world isn't. While synthetic benchmarks will do the thing and show RAID5-worthy results, CPU utilization will go through the roof no matter what CPU is used, and the hiccups will occur every now and then. It remains to be seen whether this can be fixed via BIOS or micro-code update."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • don't worry (Score:5, Funny)

    by sum.zero (807087) on Thursday July 06 2006, @02:52PM (#15669984)
    it's not a bug, just errata ;)

    sum.zero
  • If you're running raid5 it's probably in an enterprise setup. If so, why aren't you running a dedicated controller? The CPU should have little to no impact on the raid subsystem...

    Seems odd to me that the inquirer is the only one reporting this. How about a real hardware review site?
    • I agree with this. For most people, backing up your data every week is a LOT better option for data security. Users who should be using RAID 5 should also have dedicated controllers.


      Still, this is a problem for Intel. Their products are supposed to do what they do extremely well under all conditions. I hope that they find a way to fix this admittedly niche problem.

      • by moggie_xev (695282) on Thursday July 06 2006, @03:04PM (#15670111)
        Reading the article it's all about software raid and the performance they get.

        The interesting question is what other pieces of software that we run will get unexpectedly bad performance.

        ( I have > 2TB of hardware RAID 5 at home so I was wondering ... )
          • I've never heard it called FAKERAID; maybe it should just be called FAID? I'll file that one back for use later...

            Anyway, it's not entirely a hw/sw combo. These types of raid controllers are entirely software based. They consist basically of an ata or sata controller and an interrupt handler. When the disk is being accessed in legacy bios mode (ie during an os install, etc) the cpu pulls the interrupt to write to the disk and the BIOS calls the software stored on the card. This software is executed by the BI
          • Err, NO! It's about FAKERAID, which is a H/W S/W combo.

            RAID stands for Redundant Array of Inexpensive/Independent Disks. Nowhere does it say "Controlled By A Dedicated CPU" ("RAIDCBADC"? Doesn't quite sing like "RAID"). Software RAID is as much RAID as a top of the line server RAID controller with RAM and a battery backup. It isn't as fast, sure, and it loads the system CPU, but it is still RAID. Calling it "FAKERAID" is just pretentious and misleading. The data integrity benefits are still present, as are some performance benefits in some circumstances (in fact, Linux RAID is demonstrably faster in some workloads than a top end Adaptec hardware RAID controller, though this is the exception rather than the rule)

            That said, I hate pretty much all RAID controllers (whether software or hardware). Linux software RAID means that I can drop the disks into any PC and access the data. Every RAID controller from Promise, Adaptec, and Tektronic requires me to use their disk format, and if I lose the controller I lose the data until I can get another controller. Sure, in high availability environments, you keep a spare...but with Linux software RAID, every PC in the office is a spare controller. That's my kinda redundancy. I've even had two identical Adaptecs with different firmware lead to pretty massive data loss during a server migration. Thankfully there were good backups. I've never had similar problems moving Linux software RAID disks into a new Linux box.
      • You are correct that RAID isn't a backup solution, but incorrect when you say if you're using RAID5 you should be in a data centre.

        What if you have a lot of photos, music or movies - these aren't unusual things these days. I don't want to go rummaging through DVDs to find the picture I want, I want to fire up f-spot and see it there straight away.

        RAID5 provides sensible protection against data loss when using consumer hard disks - software RAID5 is readily available on linux and hard disks in the 2-300GB range are easily affordable. You can often pick them up for $50 after rebates. So I can get a TB of storage for a few hundred dollars, but to use hardware RAID5 would probably double the cost. Fine if you're an enterprise, but not fine if you're using it at home.
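The arithmetic behind that estimate can be sketched in a few lines. This is a minimal illustration, not a sizing tool: the function names are made up for this example, and the disk count, size, and price are hypothetical figures picked from the ranges quoted above. It relies only on the standard RAID 5 property that one disk's worth of space goes to parity.

```python
def raid5_usable_gb(disk_count: int, disk_size_gb: int) -> int:
    """Usable capacity of a RAID 5 array: one disk's worth of space holds parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disk_count - 1) * disk_size_gb


def array_cost(disk_count: int, price_per_disk: float) -> float:
    """Total disk cost only, ignoring controller, cables, and enclosure."""
    return disk_count * price_per_disk


# Hypothetical shopping list: five 250 GB disks at $50 each.
usable = raid5_usable_gb(5, 250)
cost = array_cost(5, 50.0)
print(f"{usable} GB usable for ${cost:.0f}")
```

With those assumed numbers, five disks give four disks' worth of usable space, which is where the "TB for a few hundred dollars" figure comes from.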

        • by myz24 (256948) on Thursday July 06 2006, @03:45PM (#15670516) Journal
          I agree, it seems on slashdot (and actually, some of my friends) that you're an idiot if you're not running RAID but you're equally dumb if you're running RAID5 because it's not a backup solution. It's as if there can't be any gray area in the matter. People make it seem like RAID5 has no purpose or benefit and everyone should just be using striping+backup. To me, the point of RAID5 or other redundancy RAID setups is it's your first line of recovery for a disk failure. If a disk fails, you replace it and you've suffered little downtime. If something major happens then yes, you restore from backup.

          My other issue is with people forgetting the idea behind being sensible about what needs to be protected and how much it should cost. There is no reason why my personal collection of photos, music and video should cost me so much. Software RAID is way more than adequate for providing a cheap way to store my files. If data protection AND peak performance are what you need, then yes you need to go full hardware. WHERE'S THE MIDDLE GROUND PEOPLE?
          • by Nutria (679911) on Thursday July 06 2006, @10:55PM (#15673141)
            WHERE'S THE MIDDLE GROUND PEOPLE?

            There's no "middle ground", there's cost-benefit analysis.

            I.e., is it worth my time to spend $50, $100, $200, $500, etc, and an hour a week to mirror a pr0n collection? Some people would say $50 and 5 minutes, and others would say $500 and 6 hours a week. And some would say, "Chunk it. If the disk dies, I'll just download it all again."

          • "3ware card for a few hundred bucks "

            The nice 3ware cards for 100 bucks are NOT hardware RAID; they use the CPU to calculate the RAID. The literature might even say they are, but working in tech support at a company that sells servers that use 3ware for 80% of its business, I can definitely tell you this isn't the case.

            You CAN get a hardware based 3ware card, but then you are looking at 400-500 bucks (+some for the battery backup unit).

            Plus if you read the parent correctly, 4 300GB hard drives for 50 bucks t
            • These 3ware cards are definitely hardware RAID. You are spreading FUD.

              The parallel card is $110 on Newegg.

              From Newegg: StorSwitch switched architecture delivers the full performance benefit of Parallel ATA's point-to-point architecture, up to 133MB/sec per port. On-board processor provides true hardware-based RAID and intelligent drive management functions. BIOS setup utility and 3ware Disk Management (3DM) web-based management software. Bootable array support for greater system fault tolerance.

              http://3ware [3ware.com]
      • by jelle (14827) on Thursday July 06 2006, @03:31PM (#15670353) Homepage
        "I agree with this. For most people, backing up your data every week is a LOT better option for data security. Users who should be using RAID 5 should also have dedicated controllers."

        You're generalizing a little too much. For example: I have >1TB storage on my mythtv box (I just like to have a good selection of stuff to watch when I finally get to watch tv, and I'm never at home when the shows I like are being broadcast), and I'm using software RAID5 on that. That is, software raid5, on shared controllers: All together seven disks off the mainboard, from a mixture of pata and sata connectors. I wouldn't do this on something like a server, but it's plenty fast enough for mythtv. It also gives a lot of protection for the array of disks, and it's a much, much better option than the weekly backup you suggest (first of all, a backup would take ages, cost waay more in disks (which wouldn't even fit in the HTPC), and last but not least: without raid5, if one disk dies, I could lose up to 7 days of recordings...).

    • The point TFA makes is not that a RAID5 setup would be used on a desktop, but that real-world performance seems to suffer on this chip.

      Am I hallucinating to recall something happening like this a long time ago with Intel?
    • Yes, I thought about this too.

      Using RAID5 in software (be it completely in software like Linux MD or Windows Dynamic Disks, or 99% in software, like most onboard RAID controllers out there) isn't a good idea if you want to run an "enterprise" setup. It might be okay for your mom's basement, or for test systems.

      But productive systems should be using real raid controllers, equipped with half a gig of cache memory, a battery backup in case of a power failure for the cache, and dedicated processor for the raid5
    • I totally agree. If this is actually a RAID-5 setup, then it requires at minimum 3 drives. Most onboard (intel) RAID controllers are only setup for 0,1,0+1, or 10. And not RAID 5. I don't see how it could possibly be correlated to the CPU. It seems much more likely that, if it is a new North/South bridge, the problem is with the IO controller.

      CPU utilization in RAID5 configurations is almost entirely offloaded to the RAID controller.

      The article (including spelling errors) fails to mention a lick about t
      • by ocbwilg (259828) on Thursday July 06 2006, @04:21PM (#15670869)
        Most onboard (intel) RAID controllers are only setup for 0,1,0+1, or 10. And not RAID 5. I don't see how it could possibly be correlated to the CPU.

        That's because you can do RAID 0, 1 or any combination of 0 and 1 without needing parity data. The performance killer on RAID 5 (and any other form of RAID that requires parity) is in the XOR operations used to compute and verify the parity information. In order for RAID 5 to perform at a satisfactory rate and not totally bog down your CPU, the XOR calculations should be handled on a dedicated hardware controller, not in software.
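As a rough illustration of the XOR parity operation described above, here is a minimal sketch. It models one stripe as a list of equal-length byte blocks; the function names are made up for this example and do not come from any real RAID implementation.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(blocks: list[bytes]) -> bytes:
    """Parity block for one stripe: the XOR of all data blocks."""
    return reduce(xor_blocks, blocks)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one lost data block from the survivors plus parity."""
    return reduce(xor_blocks, surviving, parity)


stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = compute_parity(stripe)

# Pretend the middle disk died and rebuild its block from the other two.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
print(recovered == stripe[1])
```

Verifying parity is the same computation: recompute it from the data blocks and compare with what is stored.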

        However, for non-parity RAID setups the amount of CPU overhead is almost trivial, so referring to "fake RAID" or "software RAID" with the integrated RAID controllers on most motherboards is a misnomer. That being said, at least one of these articles is talking about servers using third-party RAID controllers.
        • That's because you can do RAID 0, 1 or any combination of 0 and 1 without needing parity data. The performance killer on RAID 5 (and any other form of RAID that requires parity) is in the XOR operations used to compute and verify the parity information. In order for RAID 5 to perform at a satisfactory rate and not totally bog down your CPU, the XOR calculations should be handled on a dedicated hardware controller, not in software.

          No, no, no, no. The processing overhead of parity calculations is minuscule on any remotely modern CPU (even a paltry 300MHz Pentium 2 has a parity throughput of ~700M/sec).

          The performance killer on parity-based RAID configuration is the additional disk reads required to calculate the parity, *not* the parity calculations themselves. Which is why modern software RAID is typically faster than hardware RAID until you get into large numbers of disks and/or machines with limited bus bandwidth.

          This "RAID 5 is slow because of parity calculations" meme must die (although, admittedly, it's a good indicator of whether or not someone really understands what's going on).
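The extra-reads argument can be made concrete with a small sketch of a read-modify-write update. This is a toy model under stated assumptions: `CountingDisk` and `small_write` are hypothetical names, the "disks" are in-memory lists, and only the one data disk being written plus the parity disk are modeled, since the other disks in the stripe are not touched by this kind of update.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


class CountingDisk:
    """In-memory stand-in for a disk that counts its reads and writes."""

    def __init__(self, block_count: int, block_size: int) -> None:
        self.blocks = [bytes(block_size) for _ in range(block_count)]
        self.reads = 0
        self.writes = 0

    def read(self, index: int) -> bytes:
        self.reads += 1
        return self.blocks[index]

    def write(self, index: int, data: bytes) -> None:
        self.writes += 1
        self.blocks[index] = data


def small_write(data_disk, parity_disk, index, new_data):
    """Update one data block and keep parity consistent (read-modify-write)."""
    old_data = data_disk.read(index)        # extra read 1
    old_parity = parity_disk.read(index)    # extra read 2
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    data_disk.write(index, new_data)
    parity_disk.write(index, new_parity)


data, parity = CountingDisk(4, 8), CountingDisk(4, 8)
small_write(data, parity, 0, b"newdata!")
total_io = data.reads + data.writes + parity.reads + parity.writes
print(total_io)
```

One logical write turns into two reads and two writes, while the XOR work itself is two passes over a single block.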

    • If you're running raid5 it's probably in an enterprise setup. If so, why aren't you running a dedicated controller? The CPU should have little to no impact on the raid subsystem...

      This test is interesting for two reasons:

      • Cheap cluster nodes or desktops - one might not want to shell out $300+ for a dedicated controller
      • RAID code basically just munches data around. If software RAID performance is bad, it is likely that the performance of interpreted and bytecode/JIT languages (such as perl, python, tcltk,
      • Python/perl/java have not suffered in any tests I've seen. I guess that leads me to question these findings even more.
      • My personal "analysis", is that this sounds much more like a DMA issue, either in chipsets, in the processors, or in OSes. Core 2 is doing some speculative prefetching and a quite different cache management scheme, so some naive ideas would be that some piece of code or hardware got away with doing things improperly before, a very rare race condition might have become commonplace. If that's the reason, it might be easy to fix. Of course, it might also mean that the prefetching or cache sharing between the c
      • Actually the market has become so diluted with everyone's jumping into the RAID game (thanks to Highpoint Tech and Intel with their hybrid solutions) that it's becoming increasingly difficult to discern the true hardware RAID controllers from the hybrid models. Of course there are the companies that won't so much as touch software RAID (namely 3ware) but Promise, Koutech, and even Adaptec all are very slick with their descriptions of the controllers and make it unclear as to whether or not their products ar
    • Not to mention that most workstations and home PCs don't run RAID 5. If the Core/Core2 chip sets are targeted for machines that don't run RAID, it's not a big deal. If you are running RAID 5, it's likely in a server environment where you would probably have a RAID controller and an Opteron or Xeon based chip.

      -Rick
    • Because it's often slower to do so. We ran tests on a good Adaptec u320 RAID controller about a year back, and though CPU usage was good, we got much better performance out of Linux softraid5. I would suspect this was because the host CPU was faster than that on the controller.

      Not to mention there is a huge cost savings in going with a softraid solution.

    • Seems odd to me that the Inquirer is the only one reporting this.
      Do you consider it equally odd when a news article is only reported in Pravda, The Sun, the Washington Times, or WorldNetDaily?
    • by temojen (678985) on Thursday July 06 2006, @04:20PM (#15670865) Journal
      If so, why aren't you running a dedicated controller?

      Because if your dedicated controller goes you have to find the same make & model of controller. On no notice. Possibly a few years after that make and model has been discontinued.

      With software RAID-5, any controller that works with your host bus (PCI) and HDD bus (ATA, SATA, or SCSI) will do just fine.

      • I'm slightly confused.

        The articles are both very light on technical details, and somewhat vague as to what's really going on. (Admittedly, maybe they don't know it.) In the first article [theinquirer.net], they allude to the problems being the result of the "softmodem"-like RAID systems that modern integrated motherboards use, which would remove some of the blame from the processor. But then they also suggest that the same problem occurs with dedicated RAID controllers [theinquirer.net] (IBM ServeRAIDs -- I think these are dedicated controllers), which don't cause too much CPU load at all ... further implicating the mobo. However, similar mobos with AMD processors didn't experience the problem, so there's obviously something going on that's Intel's fault.

        It doesn't seem like it would be that difficult to pin the blame down to the particular component: is it the integrated RAID subsystem utilizing the processor inefficiently? Or is it the processor itself, being slow? And if it was the processor, why wouldn't this slowness be exhibited in other situations?

        Seems to me that what needs to happen, is for somebody to do a test with a Conroe processor in a motherboard that doesn't include any of the integrated, offload-work-to-the-processor type of integrated subsystems (RAID, sound, Ethernet), use a 'real' hardware RAID controller, and see what the results are. If there are still problems in that scenario, then there would seem to be something wrong with the processor, and this could be confirmed with simulative benchmarks.

        As a criticism of Intel's complete "systems" (processor plus chipset) I suppose this is a valid criticism, but I'd like to see more of a breakdown as to where the performance hit is coming from.
        • If it happens with a dedicated controller such as ServeRAID, then my first hunch would be that the chipset isn't handling memory contention very well. We used to see this at Dolphin a lot; the Intel chipsets at the time would behave terribly if there was any kind of serious memory traffic coming from the "far side" of the memory controller. This could also be a problem on the "softmodem-like" RAID controllers, where one core is trying to bring previously DMAed data in for its XOR while the other is trying
  • Problem (Score:4, Insightful)

    by laffer1 (701823) <lukeNO@SPAMfoolishgames.com> on Thursday July 06 2006, @02:56PM (#15670031) Homepage
    I don't get what the problem is. Are there specific instructions used often in raid 5 algorithms that are slow on the new chips? Is it bus contention?
    • My guess is it's speed throttling introducing delay into the occasional execution of these instructions whereas the chip is running full out when running through an artificial benchmark. That's pure speculation on my part though.
    • Software RAID 5 does:

      Load byte 1.

      Load byte 2.

      XOR bytes 1 and 2.

      Store result.

      There are a few things that could be wrong here. The XOR performance could be bad. This seems a bit unlikely but XOR is not an incredibly common operation so it wouldn't slow down too much else.

      It could be that the pattern of data was bad for cache usage. This would be slightly odd, since it should be a series of 4K linear blocks.

      It could be low I/O performance between the chip and the on-board controller. This seems the
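The load/load/XOR/store loop listed above can be written out literally. This is only a sketch of the idea, assuming two equal-length 4K blocks filled with made-up test patterns; a real software RAID driver would work on machine words or SIMD registers rather than single bytes, which the second function loosely imitates by XORing each block as one big integer.

```python
def xor_bytewise(a: bytes, b: bytes) -> bytes:
    """Literal version of the loop: load, load, XOR, store, one byte at a time."""
    out = bytearray(len(a))
    for i in range(len(a)):
        byte1 = a[i]            # load byte 1
        byte2 = b[i]            # load byte 2
        out[i] = byte1 ^ byte2  # XOR, then store result
    return bytes(out)


def xor_bulk(a: bytes, b: bytes) -> bytes:
    """Same result, treating each block as one big integer."""
    n = len(a)
    value = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return value.to_bytes(n, "big")


BLOCK = 4096  # the 4K linear block size mentioned above
block1 = bytes(i % 256 for i in range(BLOCK))
block2 = bytes((i * 7) % 256 for i in range(BLOCK))
assert xor_bytewise(block1, block2) == xor_bulk(block1, block2)
```

Both versions walk the blocks linearly from start to end, which is the access pattern the cache comment above refers to.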

      • XOR is very common (Score:4, Informative)

        by HaeMaker (221642) on Thursday July 06 2006, @03:24PM (#15670300) Homepage
        You use XOR to clear a register. XOR CX, CX sets the CX register to 0. It is faster than MOV CX, 0.
      • Re:Problem (Score:3, Interesting)

        Seems more likely to be a scheduling issue to me...

        Core 0 loads byte 1, Core 1 loads byte 2, Core 1 or Core 2 has a cache miss on the XOR...(Do the cores share a cache?) Or it could be a locking problem. XOR is very common, and it would surprise me if it was slower than on previous intel chips.
  • by b00m3rang (682108) * on Thursday July 06 2006, @03:04PM (#15670113)
    You should be using a controller with a dedicated processor, anyway.
  • Timing problem (Score:4, Insightful)

    by toybuilder (161045) on Thursday July 06 2006, @03:26PM (#15670312)
    This sounds like a timing problem -- the processors are too fast, causing the system to slow down.

    There was a similar problem that I had to wrestle with on Linux when running 3Ware RAID controllers w/ RHEL3 on fast dual-processor servers. When battery-backed write caching was turned on, the fast acceptance of IO requests (by the CPUs and then by the hardware RAID controller) led to awesome sustained performance for short bursts, but under constant load would suddenly hit a wall and then IO would practically hang. (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=121434)
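The "great for short bursts, then hits a wall" pattern can be sketched with a toy queue model. To be clear about the assumptions: this is not a model of the 3Ware driver or of the linked bug, the function name and all the numbers are hypothetical, and it only shows throughput collapsing to the drain rate once a write cache fills, not the near-hang described above.

```python
def simulate_write_cache(cache_blocks, arrive_per_tick, drain_per_tick, ticks):
    """Return how many blocks the host managed to submit on each tick."""
    queued = 0
    accepted_per_tick = []
    for _ in range(ticks):
        free = cache_blocks - queued
        accepted = min(arrive_per_tick, free)   # host stalls once cache is full
        queued += accepted
        queued -= min(drain_per_tick, queued)   # disks drain at their own pace
        accepted_per_tick.append(accepted)
    return accepted_per_tick


# Hypothetical numbers: 100-block cache, host offers 30/tick, disks drain 10/tick.
history = simulate_write_cache(100, 30, 10, 8)
print(history)
```

In this model the host sees full speed for the first few ticks and is then held to whatever the disks can actually absorb.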

      • I was implicitly blaming bad driver/kernel design when I said that there was a timing problem caused by the processor being too fast. I'm not talking about a timing problem within a single IO/memory bus access cycle -- I'm talking about driver code that breaks because some critical section of code used to run slow enough while a peripheral was processing a request, but due to faster processing, was now allowed to livelock [wikipedia.org].
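For readers unfamiliar with the term, a livelock can be illustrated with the classic corridor example. This is a generic, deterministic sketch with made-up function names, not a reconstruction of any driver code: two walkers keep reacting to each other, both stay busy, and neither makes progress until something breaks the symmetry in their timing.

```python
def corridor_livelock(max_steps: int) -> tuple[bool, int]:
    """Two walkers meet head-on and both sidestep the same way each step.

    Returns (passed, steps_taken). Both stay busy, yet neither makes progress.
    """
    side_a, side_b = 0, 0          # 0 = left side of corridor, 1 = right
    for step in range(1, max_steps + 1):
        if side_a != side_b:
            return True, step      # they are on different sides: they can pass
        # Each politely moves to the other side, at exactly the same moment.
        side_a, side_b = 1 - side_a, 1 - side_b
    return False, max_steps


def corridor_with_backoff(max_steps: int) -> tuple[bool, int]:
    """Same setup, but walker B only reacts on even steps, breaking the symmetry."""
    side_a, side_b = 0, 0
    for step in range(1, max_steps + 1):
        if side_a != side_b:
            return True, step
        side_a = 1 - side_a
        if step % 2 == 0:
            side_b = 1 - side_b
    return False, max_steps


print(corridor_livelock(1000))
print(corridor_with_backoff(1000))
```

The second variant hints at why a change in relative speed matters: when one side reacts on a different schedule the two stop colliding, and when both react in lockstep they never do.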
  • by DysenteryInTheRanks (902824) on Thursday July 06 2006, @03:27PM (#15670318) Homepage
    Can Intel afford to make a misstep now with even in the small subset of users running RAID 5 systems?

    No. No, it cannot. Sell your stock. Rip the CPU out of your boxen. One hundred ten billion dollars in market capitalization has disappeared in a flash with the publication of this groundbreaking article in the Inquirer.

    Intel has signed its own death warrant. As goes RAID5, so goes the world.
  • by jgarzik (11218) on Thursday July 06 2006, @04:32PM (#15670952) Homepage
    This crap does not happen on Linux, on the same hardware. Most likely *BSD is not affected either, though I have not tested such.

    It's almost a certainty that this is a software problem of some sort. Driver bugs are the most common source of "hardware" instability, particularly on Windows. Drivers are often written by clueless intern-level engineers, and quickly forgotten once the drivers initially pass basic Windows hardware quality tests.

    Jeff, the Linux SATA driver guy

    • Yeah, so if this is software RAID, what OS are they using? XP I guess?

      Why is Intel's hardware bad just because Windows performs poorly on it?

      I'm a little surprised that, all the way through BOTH articles and this thread, it took this long for someone to ask if it was a HARDWARE or SOFTWARE issue.
       
      As Carlos Mencia would say, "Dee Tuh Dee".
      • If some CPU instructions are more equal than other CPU instructions, Intel should have said so a long time ago.
        Yeah, I'm outraged!

        Oh wait, they did. Not only did they say so a long time ago, they publish documentation and maintain a compiler to help you optimize for the way their processors work.
    • Software RAID is faster and more reliable than hardware RAID. Should your non-RAID controller fail, you just chuck it and get a random new one. If your RAID-controller fails, you have to get another controller exactly the same, sometimes even the same firmware revision, or kiss your data goodbye. And RAID-controllers are notoriously underpowered (SmartArray, I'm looking at you!)
      • Software is more reliable and faster? What are you smoking? To get the performance you've got to turn on write caching, and if the system goes down with write caching on you're very likely (almost guaranteed) to have a corrupt filesystem. To get the reliability you turn off write caching, and performance plummets. Any hardware RAID worth more than 3 cents has battery-backed cache that allows you to have write caching and maintain reliability, not even taking into account being able to do some RAID5 operations with only 1 disk IOP.
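The write-caching trade-off described above can be sketched with a toy write-back cache. Everything here is an assumption made for illustration: the class, its methods, and the two-block workload are hypothetical, and a real controller or filesystem is far more involved. The only point shown is that acknowledged-but-unflushed writes vanish on power loss unless the cache contents survive.

```python
class WriteBackCache:
    """Toy write-back cache in front of a disk (a dict of block -> data)."""

    def __init__(self, battery_backed: bool) -> None:
        self.battery_backed = battery_backed
        self.pending = {}   # acknowledged writes not yet on disk
        self.disk = {}

    def write(self, block: int, data: bytes) -> None:
        self.pending[block] = data      # acknowledged immediately: fast

    def flush(self) -> None:
        self.disk.update(self.pending)
        self.pending.clear()

    def power_loss(self) -> None:
        if not self.battery_backed:
            self.pending.clear()        # volatile cache contents are gone

    def power_restored(self) -> None:
        self.flush()                    # whatever survived gets written out


def surviving_blocks(battery_backed: bool) -> int:
    """Count blocks that reach the disk after a crash with unflushed writes."""
    cache = WriteBackCache(battery_backed)
    cache.write(0, b"journal")
    cache.write(1, b"metadata")
    cache.power_loss()
    cache.power_restored()
    return len(cache.disk)


print(surviving_blocks(False), surviving_blocks(True))
```

Without the battery, both acknowledged writes are lost; with it, both reach the disk once power returns.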
      • "If your RAID-controller fails, you have to get another controller exactly the same"

        This is why we always buy a spare card whenever we get a new RAID controller. That way we know that there will be something that will read the disks and know how the data is set up. Next time you buy a RAID controller remember to get another one just like it. Otherwise you might be stuck with disks that will only be read by a card that is out of production.
        • I was going to recommend 3ware as well. I haven't done any administration work in years, but one of my former employers used them for their servers for that reason. If it dies you can replace the board, and we even had a few stored in case of a failure.

          Organizations should look into this and not the vendor for their server for any raid setup. It would be nice if they all did as a server is not a desktop and the data is needed NOW when it goes down.
    • They are "wide spread" because a lot of SATA-based boards have these "RAID Controllers" built in, whether you want them or not. Something like 80% of the popular A64 boards have "RAID chips" on them, usually just the RAID 0/1 variety. And there are a lot of $30 add-in cards that are of the same ilk.
    • by afidel (530433) on Thursday July 06 2006, @03:11PM (#15670184)
      Actually I would trust the Linux RAID5 software setup more than a LOT of the RAID controller firmware setups which I have had no end of problems with over the years, including a card that rebuilt an array from the new drive on insertion instead of the other way around! Firmware is after all simply software, and software that tends to get a lot less scrutiny than a lot of other classes of software, especially potentially data-eating code in a project like Linux or one of the BSDs.