America Online Data Storage Databases Hardware

AOL Spends $1M On Solid State Memory SAN

Posted by Soulskill
from the go-big-or-go-home dept.
Lucas123 writes "AOL recently completed the roll out of a 50TB SAN made entirely of NAND flash in order to address performance issues with its relational database. While the flash memory fixed the problem, it didn't come cheap, at about four times the cost of a typical Fibre Channel disk array with the same capacity, and it performs at about 250,000 IOPS. One reason the flash SAN is so fast is that it doesn't use a SAS or PCIe backbone, but instead has a proprietary interface that offers up 5 to 6Gb/s throughput. AOL's senior operations architect said the SAN cost about $20 per gigabyte of capacity, or about $1 million. But, as he puts it, 'It's very easy to fall in love with this stuff once you're on it.'"

  • AOL? (Score:5, Funny)

    by roman_mir (125474) on Friday October 15, 2010 @03:35PM (#33912504) Homepage Journal

    What is surprising to me is not the amount of money spent on what was bought, but the fact that AOL has any performance issues at all. They still have users? They have an entire database of users?

    • Re:AOL? (Score:5, Informative)

      by MrDiablerie (533142) on Friday October 15, 2010 @03:40PM (#33912560) Homepage
      It's a common misconception that AOL's primary business is still dial-up access. They make more money nowadays with their content sites like TMZ, Moviefone, Engadget, etc.
      • Re: (Score:3, Interesting)

        by Trepidity (597)

        It's true that they make more money now with their content sites, but only slightly more: ISP subscriptions still make up around 40% of its revenues.

    • Re:AOL? (Score:5, Funny)

      by T Murphy (1054674) on Friday October 15, 2010 @03:57PM (#33912776) Journal
      No, they think they still have lots of users. The cancellation department is separate from HQ- at 56k it's still going to be a few decades before the suits finish receiving all the cancellation notices.
    • Re:AOL? (Score:5, Funny)

      by Mikkeles (698461) on Friday October 15, 2010 @04:08PM (#33912934)

      Me Too!!!

    • Re: (Score:3, Funny)

      by dintech (998802)

      They have an entire database of users?

      No, the 50TB is for a museum of all the different CDs they sent out.

    • No, silly. The database is an historic record that they keep of every demo CD and floppy they ever sent out (date, name, address etc.). It was designed to ensure that they never sent more than 999 to the same person.
    • by abarrow (117740)

      They probably did it because their database vendor (Microsoft?) claimed that their database problems had to be due to their hardware. It couldn't possibly be software performance issues...

    • What's surprising to me is that they managed to extract such awful performance out of so many SSDs. I mean, seriously, a pitiful 250k IOPS with $1M of SSD? You could do better with a dozen SSDs from the corner store!

      • by Vancorps (746090)
        I dunno, for 300k from NetApp you can get 100k IOPS over 60TB and that was three years ago, my new unit will do that with 100TB and costs even less, takes up half the space, and uses half the power, oh, and it'll do 200k IOPS, at least projected. Of course if your database is jacked by that back-end storage then your database storage is poorly defined especially since Oracle OCFS can utilize multiple storage back-ends simultaneously. MS SQL server can achieve this as well through other means but most si
  • What? (Score:5, Insightful)

    by EndlessNameless (673105) on Friday October 15, 2010 @03:39PM (#33912544)

    As a DBA, I would love to have solid-state storage instead of needing to segment my databases properly and work with the software dev guys to make sure we have reasonable load distribution.

    Where can I get someone to pay a million dollars so I can do substandard work?

    • Re:What? (Score:5, Funny)

      by Jimmy King (828214) on Friday October 15, 2010 @03:44PM (#33912606) Homepage Journal
      As long as you come really cheap, I can probably get you on where I work. You won't get cool hardware like that, but you can have the other half. Management seems to be ok with substandard work as long as apologizing to the customers continues to be cheaper than doing a good job or buying the hardware to cover up the poor job.
      • Re:What? (Score:4, Funny)

        by konohitowa (220547) on Friday October 15, 2010 @06:27PM (#33914128) Journal

        As long as you come really cheap, I can probably get you on where I work. You won't get cool hardware like that, but you can have the other half. Management seems to be ok with substandard work as long as apologizing to the customers continues to be cheaper than doing a good job or buying the hardware to cover up the poor job.

        You could always take a long lunch, cross the bridge from Redmond to Seattle, and apply at Amazon. I'm sure Microsoft would give you a couple of hours off to do that, right?

    • Re: (Score:3, Insightful)

      by Threni (635302)

      You're the DBA - do what you do best, and start Googling! :)

    • Re: (Score:3, Insightful)

      by eln (21727)
      I was thinking something very much along these lines. I can't believe that AOL is doing something more I/O intensive than everyone else in the world. If you're looking at buying something this expensive, you really need to go through your database design and application code with a fine-toothed comb and look for inefficiencies first.

      Of course, in the real world, this sort of thing (maybe not to this scale) happens all the time. We just had a customer that was having major performance problems. They d
      • by Nexzus (673421)
        The cynic in me believes that, in addition to a whiz-bang storage network, AOL also got some free publicity in tech circles with the implication that they're leading edge.
      • by Lifyre (960576)

        Not talking about any specific real world case:

        When does it become cheaper to throw more power at it than to improve code efficiency? It seems to me that this is taking the same path a large amount of software has: it is cheaper to use a more powerful processor than to optimize the code...

        Granted they likely jumped the gun a little bit but the world needs early adopters...

        • by Vancorps (746090)
          They are hardly early adopters. NetApp has had SSD trays for a few years now and they have units that easily bust through 250k IOPS. I have a middle-tier NetApp storage implementation and I've got almost that much bandwidth available to me at a third of the cost. NetApp also has solid state modules for 10k that are used as cache to facilitate the early morning rush. This is nothing new and not really that impressive. For me it just reinforces all the AOL stereotypes about inefficiency. I will never come clo
      • Re:What? (Score:5, Informative)

        by fluffy99 (870997) on Friday October 15, 2010 @08:23PM (#33914774)

        I have a feeling AOL just spent $1,000,000 on something they didn't really need as well.

        They admitted as much in the article. They decided that it was cheaper to improve the hardware throughput than to spend the money on developers to try to trim the demand. They were also probably losing money by not meeting SLAs and a quick fix was cheaper in the long run. They also reduced power and cooling requirements, so there may be some long term payback there as well. The free publicity certainly didn't hurt either.

    • by h4rr4r (612664)

      Mod parent way up. This is where big companies waste bundles of money. Rather than do the work right they throw ever more hardware at it.

    • Re:What? (Score:4, Funny)

      by hxnwix (652290) on Friday October 15, 2010 @04:25PM (#33913138) Journal

      You could probably get by with a cloud of 486s, but why the fuck would you bother?

    • by Kjella (173770)

      Where can I get someone to pay a million dollars so I can do substandard work?

      You try claiming the next big work they want will take more than a million dollars in DEV/DBA work compared to buying a million dollar SAN. At this point three things could happen:

      1. They say "um, never mind"
      2. They pony up the cash
      3. They call you on it

      While I've seen some rather dysfunctional companies, I still haven't seen any where the PHBs try reestimating the IT cost themselves. Mind you, I haven't seen overwhelmingly many companies that have a spare million dollars lying around either so I figure #1

    • Re: (Score:3, Interesting)

      by kanad (541484)
      Careful what you wish for. Virgin Blue airlines in Australia suffered a 2 day blackout costing $20 million due to a single solid state drive failure. They have since gone back to normal drives. Read it at http://www.theaustralian.com.au/australian-it/still-no-clue-to-virgin-blues-20m-question/story-e6frgakx-1225937335722 [theaustralian.com.au]
      • Re: (Score:2, Insightful)

        by Anonymous Coward

        certainly the failure of an entire infrastructure after the failure of a single drive is the fault of the drive manufacturer. spinning disks never fail?

    • That's exactly what I was thinking, unless AOL is doing something 'amazing', it's very much likely that the requirements of their DB infrastructure are similar to that of everyone else. The way everyone else solves these problems is through a marriage of well-designed infrastructure and reactive software systems, afaik. That said, it still sounds uber cool and the ultimate DB toy/tool.
    • by Firehed (942385)

      If databases were implemented correctly, they'd take care of the load distribution themselves. Of course we'd all still be perfectly capable of writing stupid queries, but a lot of the bullshit we have to deal with when it comes to databases stems from rotational hard drives being so ill-suited to the random seeks that databases are so useful for.

      As far as I'm concerned, running your database on solid-state drives just amounts to a bug-fix in the database software. Stuff like data denormalization, avoiding

    • Not that I've crunched the numbers or anything. But I'm willing to bet that a team of DBAs outsourced to India and cheap hardware made in China provides a better ROI than an expensive team of American DBAs and a standard server configuration for the task.

      Sucks doesn't it?

    • Depends. For certain workloads like mixed read/write (let's say 70%/30% - 40%/60%), solid state approaches are pretty good. If you've got lots of writes that you need to read back randomly, then buying lots of memory or doing multi-master or master-slave replication is not ideal.

      I definitely see a use-case for flash based approaches, where you both need the read and the write IOPS and don't have warehousing amounts of data, but the usecase is narrower than people think.

      Reasonable load distribution can
  • It does mention that SAS can 'only' deliver 5Gbit/sec - but is that not the bandwidth for each disk and thus not a problem at all?

    The reason the SSD is so much faster is most likely the nice seek time for SSD. And I really like the concept of them using flash chips directly. Now we just need something cheeper then 20$/GB :}

    • At the rate SSD storage is growing (and the capacity is being used), it is conceivable that a company could choose cheap MLC drives and simply plan on upgrading them before their expected time of death.

      With modern wear-leveling algorithms, reduced write amplification, and better physical longevity, I can see cheap SSDs lasting the 2-3 years their capacity would be good for.

      SATA SSD over iSCSI is starting to look very appealing now compared to Fibre Channel or SAS. Since silicon performance and capacity scal

      • by hairyfeet (841228)

        I don't know about you, but I haven't seen SSDs battle tested enough for me to truly trust the things yet. With mechanical drives I've yet to have one "just die" as I ALWAYS got warnings something was going via drive noise, heat, random small errors, etc. And now SMART just makes that even easier to spot. On the other hand I've been told time and time again flash just "don't die" they wear out, yet I've had plenty of flash drives just go dead, and a couple of my early adopters managed to have dead SSDs. Not

        • Re: (Score:3, Informative)

          by rsborg (111459)

          I don't know about you, but I haven't seen SSDs battle tested enough for me to truly trust the things yet. With mechanical drives I've yet to have one "just die" as I ALWAYS got warnings something was going via drive noise, heat, random small errors, etc. And now SMART just makes that even easier to spot.

          Google found differently in their massive hard drive survey [engadget.com]... sometimes drives would just up and die with no SMART warnings. Also the most common SSD failure-case is lack of writes, at least you can retri

          • Re: (Score:2, Informative)

            by hairyfeet (841228)

            Yes I've read the Google report but IIRC we have NO access to their source, hell we don't even have access to detailed measurements of what types of load and I/O they were running. same as I wouldn't be surprised if you are running massive databases the trade off for SSD would make the $$$ worth it as in TFA, I want to see more "real world" average server and workstation loads, not the "insane pound the shit out of the drives" that someone like Google does. I also noticed in TFL they are basing everything o

          • Re: (Score:3, Informative)

            by EXrider (756168)

            With mechanical drives I've yet to have one "just die" as I ALWAYS got warnings something was going via drive noise, heat, random small errors, etc. And now SMART just makes that even easier to spot.

            Google found differently in their massive hard drive survey [engadget.com]... sometimes drives would just up and die with no SMART warnings. Also the most common SSD failure-case is lack of writes, at least you can retrieve data off the drive as opposed to a completely opaque device if the platter is frozen.

            Yeah, I've seen quite the opposite. Let me preface this with saying that I'm strictly talking about consumer and midrange drives, I've seen very few SCSI and SAS drives die without warning.

            In the past 10 years, in a company with about 200 nodes, I can literally count on one hand the amount of hard drives that have given any SMART warnings leading up to their imminent failure. They pretty much always die while the OS accumulates log entries of bad blocks and I/O errors. Most of the time it was either

        • by SQL Error (16383)

          We run all our databases on SSD. Just like disk drives, and unlike your claim, sometimes they simply drop dead without warning, even the high-end ones.

          The performance gains are entirely worth it, though.

    • by TheRaven64 (641858) on Friday October 15, 2010 @03:59PM (#33912806) Journal

      Now we just need something cheeper then 20$/GB

      Actually, the price was the most interesting part of this:

      at about four times the cost of a typical Fibre Channel disk array with the same capacity

      Four times the price and, what, ten? A hundred? times the IOPS? That makes NAND pretty much a no brainer for any heavy-use database.

      • by fusiongyro (55524)

        The first problem on my mind right now though is that nearly all widely used relational databases are built with a lot of algorithmic assumptions about the disk. They spend a great deal of time ensuring that they only fetch the minimum number of blocks, and many higher end databases go to lengths to ensure that related blocks wind up near each other on disk, implement block caches and things like that. A lot of this is done to mitigate seek time.

        With SSDs, seek time is basically constant and there's no need

        • by Skal Tura (595728)

          Something is wrong when the DB is handling that ... The OS underneath should draw these conclusions and optimize based on type of storage, without intervention of the DB software.

          Of course, applications do have to manage the load they generate to a degree, but down to hardware level? That's simply too much, better trust the kernel to make the right decisions! Then again, the world isn't perfect ...

          We have to battle with profoundly bad HDD IO management on the software (Still that software is best for our business), but it

      • by borgboy (218060)

        Yeah. The relevant metric for databases really is $/IOPS, not $/GB.

        So, off the cuff, I figure you need a 700-disk array of 146GB drives to do this much storage at RAID 10 ( or 0+1 for you pedants ). That's a lot of random IO capacity. I don't know how poorly IOPS scale for systems at this magnitude, but I'd be surprised if the SSD solution was 10x IOPS over 700 15k spindles. Maybe 2-5x?
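
The 700-disk estimate above can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 TB = 1000 GB) and that RAID 10 mirrors every block exactly once, and the function name is made up for illustration.

```python
def raid10_usable_gb(disk_count: int, disk_size_gb: float) -> float:
    """Usable space of a RAID 10 set: each block is stored twice, so half the raw space."""
    if disk_count % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return disk_count * disk_size_gb / 2


usable_gb = raid10_usable_gb(700, 146)
print(f"{usable_gb / 1000:.1f} TB usable")  # 51.1 TB usable
```

So 700 mirrored 146GB drives land just above the 50TB mentioned in the summary, which is presumably where the figure comes from.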

        • by jon3k (691256)
          I believe the article claimed a 4x performance increase.
        • by jon3k (691256)
          How exactly do you get 250k IOPS out of 700 disks by the way? If we assume 15k rpm 2.5" disks you _might_ PEAK at about 250 IOPS/disk, and that's being very generous. That's only 175k IOPS, and that's assuming straight reads, if you have a high mix of writes to a RAID volume (which could double your writes in RAID 10) you'd dramatically cut that down. By my math you'd need at LEAST 1,000 FC disks to get 250k IOPS.

          I'm just curious, how did you come up with 250k IOPS with 700 disks? Short-stroking?
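
The spindle math in this comment is easy to reproduce. A rough sketch, taking the poster's generous 250 IOPS per 15k disk at face value and assuming the usual RAID 10 rule of thumb that each logical write costs two physical writes; the function names are illustrative, not from any vendor tool.

```python
import math


def aggregate_read_iops(disks: int, iops_per_disk: int) -> int:
    """Best-case random read IOPS if every spindle is kept busy."""
    return disks * iops_per_disk


def disks_needed(target_iops: int, iops_per_disk: int) -> int:
    """Minimum spindle count to reach a target, ignoring controller limits."""
    return math.ceil(target_iops / iops_per_disk)


def raid10_effective_iops(raw_iops: float, write_fraction: float) -> float:
    """Host-visible IOPS when each write is mirrored (write penalty of 2)."""
    return raw_iops / ((1 - write_fraction) + 2 * write_fraction)


print(aggregate_read_iops(700, 250))  # 175000
print(disks_needed(250_000, 250))     # 1000
```

Passing a non-zero `write_fraction` to `raid10_effective_iops` shows how quickly the host-visible number drops below the read-only peak once writes enter the mix.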
          • by borgboy (218060)

            I didn't get 250k IOPS. I _said_ 250k IOPS was 2-5x better. I used the same math you did, and specifically hedged about not knowing how poor the scaling was with these kinds of systems. I am _not_ a storage engineer, just a developer with a (professional) interest in high performance random IO systems.

            • by jon3k (691256)
              Then your guess was pretty close! I thought I read it quoted at 4x faster. There's also the fact that it uses 90% less power, and I would assume 90% less cooling as well? Not to mention the dramatic reduction in floor space (not sure on their cost per sq ft obviously). I wonder what the difference in operational costs of the SSD array would be vs magnetic disks. And of course, no matter how many spinning disks you throw at it, you'll never get 1ms access times, unless it's coming out of RAM cache. I ca
              • by borgboy (218060)

                Power and cooling are a big win here, no doubt.

                What would really be news here would be database engines and/or filesystems that grokked SSD performance patterns well and could combine pools of spinning disks and SSDs in optimal ways for a given workload.

        • by Anpheus (908711)

          Your transactions per second won't scale as well as your IOPS because with spinning disks, there's still a significant latency before your data actually gets written to disk.

          RAID just increases the number of in-flight IOs, widening the throughput but not decreasing its latency per disk.
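
The point about in-flight IOs versus latency follows from Little's law: sustained throughput equals the number of outstanding requests divided by the time each one takes. A small sketch with made-up latencies (5 ms for a disk seek, 1 ms for flash), chosen purely for illustration:

```python
def ops_per_second(in_flight: int, latency_ms: float) -> float:
    """Little's law: throughput = outstanding requests / time per request."""
    return in_flight * 1000 / latency_ms


# One serial stream of dependent operations (queue depth 1).
print(ops_per_second(1, 5))    # 200.0  (hypothetical 5 ms spinning disk)
print(ops_per_second(1, 1))    # 1000.0 (hypothetical 1 ms flash)

# Many independent requests spread across a wide array of disks.
print(ops_per_second(700, 5))  # 140000.0
```

Adding spindles raises the third number, but a workload that has to wait for each IO before issuing the next stays pinned at the first one.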

          • by borgboy (218060)

            RAID by itself does only increase the number of in-flight IOs, but it almost always comes with that most magical of pixie dust, the battery-backed cache.

            The other point that I'll make is that often the only writes your RDBMS is waiting on are log writes, which are sequential anyway.

            In any case - I'll cede your point that spinning rust will likely NEVER scale as well as NAND.

    • by NevarMore (248971)

      Cheaper than $20 a GB.

      I know that's expensive now, but I'm just old enough to remember when a GB of spinning magnetic disk was a big effing deal.

    • by Skal Tura (595728)

      SAS can deliver 6Gbps, as can SATA nowadays too (Tho rare). Fastest SSDs hit this limit, but there's no simple way to go beyond (PCIe controller).

      SANs have their own bottleneck: SAN switches, which due to "centralized nature" (all traffic from all nodes goes through certain set of switches, or single switch) lowers the overall throughput.

      There are ways to have waaaay more IOPS, and waaaay higher throughput total, for way less money. These ways are what we intend to use in our VM cluster to be brought up nex

  • It is called HDSL... (Score:4, Informative)

    by Yaa 101 (664725) on Friday October 15, 2010 @03:42PM (#33912582) Journal

    You can read more about that here:

    http://www.google.com/search?q=High-Speed+Data+Link [google.com]

    • by afidel (530433)
      Whee, 1GBps (10Gbps) per direction. How is this significantly better than COTS 8Gbps FC or 10Gb iSCSI/FCoE that don't need a proprietary card? Heck if you want LOTS of bandwidth use 40Gb IB. Plus according to the manufacturer's site [violin-memory.com] they DO support the hosts through COTS connections.
  • Really? (Score:4, Informative)

    by Archangel Michael (180766) on Friday October 15, 2010 @03:47PM (#33912644) Journal

    My impression has been that this has been what has been going on for some time now with all the larger database operations, and one of the reasons why SSD have not yet come down in price is that all the best units and tech are going to the big companies as fast as they can get it from the manufacturers. I wouldn't be surprised to see someone like Google saying something like "yawn, 50TB" and saying that they have PETABYTE versions already out there.

    If you run a Database of any size, especially ones with large read to write ratios, SSD would only make things faster. And speed counts.

    • I wouldn't be surprised to see someone like Google saying something like "yawn, 50TB" and saying that they have PETABYTE versions already out there.

      Yeah, because at $1MM for 50TB, a $20MM investment by a publicly owned company in such a thing would entirely fly under the radar...

      • by jon3k (691256)
        Well considering it could easily be hidden in any of their billions of dollars (?) in infrastructure line items in their SEC filings I assume they could "hide" it reasonably well, no?
  • "but instead has a proprietary interface that offers up 5 to 6Gb/s throughput."

    You know that SAS offers 6Gb/s throughput and InfiniBand up to 300Gb/s (with 8 and 16 being more common)?

    Either way, $1M for a bunch of SAS SSD (even SAS NVRAM) is way overpriced imho. They could've done it cheaper.

    • Re: (Score:3, Funny)

      by Galestar (1473827)

      They could've done it cheaper.

      It's AOL, would you actually expect them to make intelligent, informed decisions?

    • by jon3k (691256)
      They claim $20/GB which I have to assume includes more than the bare disks themselves. People routinely pay north of $30/GB for 15K Fibre Channel storage systems. Lots of things to consider - shelves, HBAs, rack enclosures, I/O directors, maybe even power and cooling? Right now Intel SLC drives are over $11/GB and that's just for bare drives. I'd be curious to see if you could build a 50TB RAID5 flash based storage system for under $1M.
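
For reference, the per-gigabyte prices quoted in this comment scale to 50TB as follows. A back-of-the-envelope sketch only: it assumes decimal units (1 TB = 1000 GB) and ignores RAID parity overhead, spares, and everything besides the quoted $/GB.

```python
GB_PER_TB = 1000  # decimal units, an assumption


def system_cost(capacity_tb: int, dollars_per_gb: int) -> int:
    """Total price for a given capacity at a flat $/GB rate."""
    return capacity_tb * GB_PER_TB * dollars_per_gb


print(system_cost(50, 20))  # 1000000  (AOL's quoted $20/GB)
print(system_cost(50, 11))  # 550000   (bare Intel SLC drives at $11/GB)
print(system_cost(50, 30))  # 1500000  (15K Fibre Channel at $30/GB)
```

At those rates the bare drives alone would eat a bit over half of the $1M budget before any shelves, controllers, or parity drives are counted.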
  • Just curious, have they exhausted all of their software avenues for this? While yes, I understand they have a huge relational DB, I know other companies that are just as big/bigger and they have next to no issues. Maybe it's just poorly designed? That's a hell of a lot of (albeit super sexy) hardware to throw at what could be a software problem. Thoughts?
    • by jandrese (485)
      They mentioned in the article (albeit obliquely) that the sysadmin thought he could probably reduce the load by working with the software guys, but in the end it would cost more than the $1M he spent on this solution. Plus, it might not even work if the software guys were in fact competent and the problem is just that you have too many users for the old hardware.
  • once you figure the total energy savings (reduced power needs, reduced cooling needs, etc) over the lifetime of the drive I wonder how much more expensive it is. I can't wait for SSD to become more affordable. I'd like to have that in our SANs too.

    • by afidel (530433)
      It's not even close, power is ~5% of the TCO of anything enterprise grade, maybe 10-15% if you include capital costs for UPS, generator and AC into the equation.
      • by outZider (165286)

        Hm. I've always seen power as the most expensive part of an enterprise deployment -- see also why these companies are building data centers in cheap-power areas.

        • by afidel (530433)
          For scale out power's probably a bigger percentage of the total, there you're talking about cheap hardware, no software licensing fees, and no support contracts.
  • by Lifyre (960576)

    Does this mean AOL is doing something novel and progressive? Something doesn't feel right about that...

    I'm so confused!

  • by PatPending (953482) on Friday October 15, 2010 @04:05PM (#33912884)

    I wonder what the read/write rating is vs. a hard disk?

    Wikipedia puts flash at 1,000,000 program-erase cycles [wikipedia.org]

    • by pz (113803)

      Troll. Not even a very good one.

    • Re: (Score:3, Interesting)

      by SQL Error (16383)

      It's a non-problem. With Intel's 64GB X25-E drive, for example, you can do non-stop random writes for 6 years before you run into problems. We run all our databases on SSDs, mostly Intel and FusionIO ioDrives.

      That said, we've had drives simply drop dead with a controller failure. You still have to run a RAID array, even with SSDs.

      • by jon3k (691256)
        Wow! I've always wanted to talk to someone who was running production databases on Fusion IO ioDrives. Can you explain what the storage setup is like? Do you use Fusion IO ioDrives in a Tier 1 sort of configuration, backing it with SSDs, or are they totally independent? What's the server hardware configuration like? x86? What manufacturer? How many ioDrives per host? What's the total capacity and performance (throughput, IOPS) ? What about the total SSD capacity? Are you using magnetic media for anyth
  • Ok guys.... (Score:3, Funny)

    by mrsteveman1 (1010381) on Friday October 15, 2010 @04:10PM (#33912960)

    It's very easy to fall in love with this stuff once you're on it.

    I said the same thing about coke in the 70's....

    I guess what I'm saying is, no one loan money to AOL until they admit they have a problem.

  • From summary:

    One reason the flash SAN is so fast is that it doesn't use a SAS or PCIe backbone, but instead has a proprietary interface that offers up 5 to 6Gb/s throughput.

    What are they talking about? The violin memory website says the appliances themselves support FC, 10 GbE, and Infiniband connections [violin-memory.com]. Their performance page [violin-memory.com] says that the appliance can be directly connected to a pcie bus, presumably using some sort of pass-through interface card, but what physical connector and media are used?

    • by LordMyren (15499)
      I just enjoyed the fact that 5-6Gb/s is a breath-stealing 150% the speed of a single lane of PCIe v2.0, and equal to SATA3's rate. Your implicit question of "what actually runs this SAN," what's behind this interface pitched as blazing fast, is oh so much more dirt on the grave of this fluff piece. Still, from the outset, the "facts" presented are already pretty funny.
  • 6Gb/s huh? Ok, so I'm assuming you have some special cable connecting to the SAN... I know offhand that Dell sells the MD3200 - a DAS unit that transfers 6Gb/s... Although I estimated it was about 10GB in 30 seconds.

    I've got to be missing something here. The seek times are probably out of this world with this "specialized" SAN, but then we have EqualLogic SANs that can have 48 SSDs and have 10Gb/s...

    Hey AOL - you are in the arctic right? Can I interest you in some of this amazing ice?
  • Hei folks,

    20$/GB is not that much IMHO... is that net capacity, does it include geographical replication? Depending on the answer, the real news could be that SSD storage is so much more competitive than one might have thought... :D

    • by AcquaCow (56720)

      Once you factor in the total cost of ownership for a disk-based SAN eg: heat/cooling/maintenance/etc... Flash is actually pretty cheap.

      • by jon3k (691256)
        I'd ignore the cost ($20/GB) even though it's actually pretty good. The real interesting thing here is $/IOPS. To build out a similar system based on $/IOPS using 15K FC disks would cost significantly more, and wouldn't ever provide the same access time (less than 1 millisecond). I'd love to see a storage vendor quote out a DELIVERED COMPLETE SYSTEM that provides 250k IOPS using fiber channel disks and do it for under $1M. I don't think any of the big guys (EMC, 3PAR, Hitachi, etc) could touch it.
  • I look to Google, Facebook, and other massively scaled companies that build highly distributed systems running on low availability commodity systems. These guys are not throwing Solid State Memory at biggus relational databases. Sorry, but this is a bandaid for a dinosaur.
    • by jon3k (691256)
      I think it's just a dramatic difference in workloads. I think Facebook and Google have massive storage capacity requirements, whereas AOL just wanted more IOPS and/or throughput. But, the bottom line is, it was cheaper to throw hardware at this particular problem than engineering expertise. Right tool for the job, I suppose.
  • RAID 5? (Score:3, Insightful)

    by daver_au (213961) on Friday October 15, 2010 @06:24PM (#33914106)

    They wanted performance and went *RAID 5*? That pretty much sums the entire approach up. Let's not optimise the application first, the database second, but instead hide the problem by throwing hardware at it. Then what we'll do is use a RAID configuration that hobbles the write performance of the arrays and let's not mention what happens to performance when we lose a disk (don't say it won't happen).

    Sure, RAID 5 is the answer to some things, but not when the question is database *PERFORMANCE*.

    Also - latency is more important than IOP/s. I don't care how many IOP/s you can do, if your latency is high, the performance won't be. Most garden variety storage engineers don't seem to grasp this concept.

    • by jon3k (691256)
      I thought that was very odd as well, but maybe their workload is dramatically more reads than writes? RAID 5 obviously gives them a LOT more capacity, so maybe it made sense for them?
  • by roc97007 (608802)

    What the hell does AOL need a database for? Users still on hold trying to cancel their accounts?

    • by jon3k (691256)
      TMZ, etc, they own quite a few popular sites these days. Not to mention the untold millions of e-mail boxes I'm sure they still service to this day.
  • by John Jamieson (890438) on Friday October 15, 2010 @10:10PM (#33915184)

    It is hard to know anything for sure with this limited amount of info. But it appears to me that they have not accomplished such a great feat.

    I put together a server this year that pushes over 9 GB/s. I did this with a mere 150 2.5 inch drives. (144 raid 10 + 6 live spares). This was SAS 2.0 of course, because in the real world SAS kicks FC's A**.

    We found that the real bottleneck to throughput is not the drives and not the SAS cards. We have 8 SAS 2.0 lanes coming into each card, multiply that by 6 cards, and you have a heck of a lot of potential.

    No, the real problem is you saturate your PCIe slots, and chipsets sometimes choke when you feed this much data. So, the chipset and PCI-e bus tend to be the restraining factor, not the archaic rotating platters.

    • by shri (17709)
      Care to share what this looks like? Several servers connected to a SAN with 150 drives?
    • by jon3k (691256)
      Throughput is far less relevant in this scenario than IOPS. Your 150 drives would put out a PEAK theoretical rate, on sequential reads (no writes!), of about 27k IOPS. Or about 10% of the total IOPS of AOL's SSD-based storage system. You're also comparing individual serial connection bandwidth (a single FC or SAS connection) with the entire throughput of your director. They're doing 4GB/s (32Gb/s) per connection vs your 6Gb/s per connection.
    • by jon3k (691256)
      The article summary is wrong, from TFA:
      "So you're getting 4GB/sec. of PCIe bandwidth, not the 5Gbit/sec. or 6Gbit/sec. SAS bandwidth. You're getting almost an order of magnitude of bandwidth to the storage internally just because you're using an interface that's capable of it," Pollack said.

      That's 4GB(ytes) per second. Not 4 gigaBIT per second. That's 32Gbit/s vs your 6Gbit/s via SAS.
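
The bytes-versus-bits mix-up is a factor of eight, which a couple of lines make explicit. A minimal sketch; the helper name is made up, and the figures are the ones quoted from TFA above.

```python
BITS_PER_BYTE = 8


def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """Convert a rate in gigabytes/s to gigabits/s."""
    return gbytes_per_s * BITS_PER_BYTE


pcie_gbit = gbytes_to_gbits(4)  # the 4GB/s quoted in TFA
sas_gbit = 6                    # SAS 2.0 per-link line rate in Gbit/s

print(pcie_gbit)                       # 32
print(f"{pcie_gbit / sas_gbit:.1f}x")  # 5.3x
```

On these numbers the internal link is a bit over five times a single 6Gbit/s SAS link, so "almost an order of magnitude" is on the generous side.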
  • Serial ATA 3.0 and SAS achieve 5-6 Gb/s. This system delivers 4 GB/s. It's really sad how these sloppy summaries make it to the front page.

    Quote from TFA: "So you're getting the 4GB/sec. of PCIe bandwidth, not the 5Gbit/sec. or 6Gbit/sec. SAS bandwidth. You're getting almost an order of magnitude of bandwidth to the storage internally just because you're using an interface that's capable of it," Pollack said.

  • They will probably save money compared to powering and cooling the equivalent disk array.
  • Wait a while until Write Amplification kicks in. Then they'll be screwed.

  • Wow, 50TB of flash is a lot of thumbdrives!
