AOL Spends $1M On Solid State Memory SAN 158
Lucas123 writes "AOL recently completed the rollout of a 50TB SAN made entirely of NAND flash in order to address performance issues with its relational database. While the flash memory fixed the problem, it didn't come cheap, at about four times the cost of a typical Fibre Channel disk array with the same capacity, and it performs at about 250,000 IOPS. One reason the flash SAN is so fast is that it doesn't use a SAS or PCIe backbone, but instead has a proprietary interface that offers up 5 to 6Gb/s throughput. AOL's senior operations architect said the SAN cost about $20 per gigabyte of capacity, or about $1 million. But, as he puts it, 'It's very easy to fall in love with this stuff once you're on it.'"
AOL? (Score:5, Funny)
What's surprising to me is not the amount of money spent or what it was spent on, but the fact that AOL has any performance issues at all. They still have users? They have an entire database of users?
Re: (Score:3, Interesting)
It's true that they make more money now with their content sites, but only slightly more: ISP subscriptions still make up around 40% of its revenues.
Re:AOL? (Score:4, Informative)
Neither. AOL separated into its own company again.
As a very casual observer, it seems like the entire TW/AOL debacle could not have been mismanaged worse - well, I guess both companies could have gone titsup, but that's about it. TW vastly overpaid for AOL when AOL was at its peak ($160 billion). Then, just as AOL had started to climb out of the bottom, they spun it off for a song ($2.5 billion). Since then, AOL has been doing a decent enough job of reinventing itself as a "new media" company - the kind of thing TW seems to be struggling with.
That's why corporate CEOs get the big bucks, though!
Re: (Score:2, Informative)
AOL bought TW. It was a very shrewd move for AOL, because TW had a much higher intrinsic value to set a floor on the stock price when the internet bubble burst.
Re: (Score:3, Informative)
This isn't true - what the AC says is true. TW was bought by AOL when AOL could leverage its 160-billion-dollar fairy-dust value into tangible assets. If they hadn't done so, they would have been gone years ago. It was a brilliant move by AOL, and at the time TW thought it was a great deal as well. TW got suckered, as we know now. But in that day and age it looked like a good move: AOL had the internet savvy, TW the IP - combine them and rule the Internet. Of course, that didn't quite go as planned.
Re:AOL? (Score:5, Informative)
AOL is Advertising.com and some flagship sites. And yes, they still have dialup users. The access business is steadily shrinking, but it's pretty profitable, since they basically stopped upgrading it and now just sort of run it.
If they maintain their current path, yes, they will eventually disappear and fail, but the process is much longer than you might think. Not all of their acquisitions were as retarded as Bebo.
What they probably need the SAN for is the Advertising business. That is profitable and requires a shitload of storage. They don't need that for their websites.
Re:AOL? (Score:4, Funny)
Also, there are users who wanted to cancel years ago, but are still lost in the phone tree. Those are still active accounts too.
Re:AOL? (Score:5, Funny)
Me Too!!!
Re: (Score:3, Funny)
No, the 50TB is for a museum of all the different CDs they sent out.
Re: (Score:2)
They probably did it because their database vendor (Microsoft?) claimed that their database problems had to be due to their hardware. It couldn't possibly be software performance issues...
Re: (Score:2)
What's surprising to me is that they managed to extract such awful performance out of so many SSDs. I mean, seriously, a pitiful 250k IOPS with $1M of SSD? You could do better with a dozen SSDs from the corner store!
What? (Score:5, Insightful)
As a DBA, I would love to have solid-state storage instead of needing to segment my databases properly and work with the software dev guys to make sure we have reasonable load distribution.
Where can I get someone to pay a million dollars so I can do substandard work?
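To be clear, by "segment my databases properly" I mean things like hash-partitioning the hot tables across instances. A minimal sketch of the routing idea, with made-up shard names (nothing vendor-specific):

    # Toy hash-sharding: route each key to one of N database instances
    # so the hot working set is spread across spindles/servers.
    import hashlib

    SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical instance names

    def shard_for(user_id: str) -> str:
        # A stable hash means the same user always lands on the same shard.
        digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    print(shard_for("user42"))  # always the same shard for this key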
Re:What? (Score:4, Funny)
You could always take a long lunch, cross the bridge from Redmond to Seattle, and apply at Amazon. I'm sure Microsoft would give you a couple of hours off to do that, right?
Re: (Score:3, Insightful)
You're the DBA - do what you do best, and start Googling! :)
Re: (Score:3, Insightful)
Of course, in the real world, this sort of thing (maybe not to this scale) happens all the time. We just had a customer that was having major performance problems. They d
Re: (Score:2)
Not talking about any specific real world case:
When does it become cheaper to throw more power at it than to improve code efficiency? It seems to me that this is following the same path a lot of software has taken: it's cheaper to use a more powerful processor than to optimize the code...
Granted, they likely jumped the gun a little bit, but the world needs early adopters...
Re:What? (Score:5, Informative)
I have a feeling AOL just spent $1,000,000 on something they didn't really need as well.
They admitted as much in the article. They decided that it was cheaper to improve the hardware throughput than to spend the money on developers to try to trim the demand. They were also probably losing money by not meeting SLAs, and a quick fix was cheaper in the long run. They reduced power and cooling requirements too, so there may be some long-term payback there. The free publicity certainly didn't hurt, either.
Re: (Score:2)
Mod parent way up. This is where big companies waste bundles of money: rather than do the work right, they throw ever more hardware at it.
Re:What? (Score:4, Funny)
You could probably get by with a cloud of 486s, but why the fuck would you bother?
Re: (Score:3)
Where can I get someone to pay a million dollars so I can do substandard work?
You try claiming that the next big piece of work they want will take more than a million dollars in DEV/DBA work, compared to buying a million-dollar SAN. At this point, three things could happen:
1. They say "um, never mind"
2. They pony up the cash
3. They call you on it
While I've seen some rather dysfunctional companies, I still haven't seen any where the PHBs try re-estimating the IT cost themselves. Mind you, I haven't seen many companies that have a spare million dollars lying around either, so I figure #1.
Re: (Score:2, Insightful)
Certainly the failure of an entire infrastructure after the failure of a single drive is the fault of the drive manufacturer. Spinning disks never fail?
Re: (Score:2)
If databases were implemented correctly, they'd take care of the load distribution themselves. Of course we'd all still be perfectly capable of writing stupid queries, but a lot of the bullshit we have to deal with when it comes to databases stems from rotational hard drives being so ill-suited to the random seeks that databases are so useful for.
As far as I'm concerned, running your database on solid-state drives just amounts to a bug-fix in the database software. Stuff like data denormalization, avoiding
Re: (Score:2)
I definitely see a use case for flash-based approaches, where you need both the read and the write IOPS and don't have warehousing amounts of data, but the use case is narrower than people think.
Reasonable load distribution can
Sas bandwidth constrained??? (Score:2)
It does mention that SAS can 'only' deliver 5Gbit/sec - but isn't that the bandwidth for each disk, and thus not a problem at all?
The reason the SSD is so much faster is most likely the nice seek time. And I really like the concept of them using flash chips directly. Now we just need something cheaper than $20/GB :}
Re: (Score:2)
At the rate SSD storage is growing (and the capacity is being used), it is conceivable that a company could choose cheap MLC drives and simply plan on upgrading them before their expected time of death.
With modern wear-leveling algorithms, reduced write amplification, and better physical longevity, I can see cheap SSDs lasting the 2-3 years their capacity would be good for.
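Back-of-the-envelope, a minimal wear-out sketch (all figures are assumptions for illustration, not vendor specs):

    # Rough MLC wear-out estimate under an assumed server write load.
    capacity_gb = 256               # assumed drive size
    pe_cycles = 3_000               # typical-ish MLC program/erase endurance
    write_amp = 1.5                 # assumed write amplification
    host_writes_gb_per_day = 100    # assumed workload

    total_host_writes_gb = capacity_gb * pe_cycles / write_amp
    years = total_host_writes_gb / host_writes_gb_per_day / 365
    print(f"~{years:.0f} years to wear-out")   # ~14 years at this load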
SATA SSD over iSCSI is starting to look very appealing now compared to Fibre Channel or SAS. Since silicon performance and capacity scal
Re: (Score:3, Informative)
Google found differently in their massive hard drive survey [engadget.com]... sometimes drives would just up and die with no SMART warnings. Also, the most common SSD failure case is losing the ability to write; at least you can still retrieve data off the drive, as opposed to a completely opaque device if the platter is frozen.
Yeah, I've seen quite the opposite. Let me preface this by saying that I'm strictly talking about consumer and midrange drives; I've seen very few SCSI and SAS drives die without warning.
In the past 10 years, in a company with about 200 nodes, I can literally count on one hand the number of hard drives that have given any SMART warnings leading up to their imminent failure. They pretty much always die while the OS accumulates log entries of bad blocks and I/O errors. Most of the time it was either
Re: (Score:2)
We run all our databases on SSD. Just like disk drives, and unlike your claim, sometimes they simply drop dead without warning, even the high-end ones.
The performance gains are entirely worth it, though.
Re:Sas bandwidth constrained??? (Score:5, Insightful)
Now we just need something cheaper than $20/GB
Actually, the price was the most interesting part of this:
at about four times the cost of a typical Fibre Channel disk array with the same capacity
Four times the price and, what, ten times the IOPS? A hundred? That makes NAND pretty much a no-brainer for any heavy-use database.
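The arithmetic, for the record (the $/IOPS line uses the summary's 250k figure):

    # TFA's numbers: $20/GB for 50TB of flash, at 4x the cost of a
    # same-capacity FC disk array.
    flash_cost = 20 * 50_000        # $/GB x GB = $1,000,000
    disk_cost = flash_cost / 4      # implied FC array cost: $250,000
    print(flash_cost, disk_cost)
    print(flash_cost / 250_000)     # $4 per IOPS for the flash SAN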
Re: (Score:2)
The first problem on my mind right now though is that nearly all widely used relational databases are built with a lot of algorithmic assumptions about the disk. They spend a great deal of time ensuring that they only fetch the minimum number of blocks, and many higher end databases go to lengths to ensure that related blocks wind up near each other on disk, implement block caches and things like that. A lot of this is done to mitigate seek time.
With SSDs, seek time is basically constant and there's no need
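A crude cost model shows why all that machinery exists on rotating media (timings assumed, purely illustrative):

    # Why clustering related blocks matters on spinning disk but
    # barely at all on flash. Per-block timings are assumptions.
    SEEK_MS, XFER_MS = 8.0, 0.1    # HDD: seek vs. sequential transfer
    FLASH_MS = 0.1                 # SSD: roughly flat cost per block

    def hdd_cost_ms(blocks: int, clustered: bool) -> float:
        seeks = 1 if clustered else blocks   # one seek if blocks are adjacent
        return seeks * SEEK_MS + blocks * XFER_MS

    print(hdd_cost_ms(100, clustered=True))    # 18 ms
    print(hdd_cost_ms(100, clustered=False))   # 810 ms
    print(100 * FLASH_MS)                      # 10 ms on flash, either way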
Re: (Score:2)
Something is wrong when the DB is handling that... The OS underneath should draw those conclusions and optimize based on the type of storage, without intervention from the DB software.
Of course, applications do have to manage their load to a degree, but down to the hardware level? That's simply too much; better to trust the kernel to make the right decisions! Then again, the world isn't perfect...
We have to battle profoundly bad HDD IO management in the software (still, that software is best for our business), but it
Re: (Score:2)
Yeah. The relevant metric for databases really is $/IOPS, not $/GB.
So, off the cuff, I figure you need a 700-disk array of 146GB drives to do this much storage at RAID 10 (or 0+1 for you pedants). That's a lot of random IO capacity. I don't know how poorly IOPS scales for systems of this magnitude, but I'd be surprised if the SSD solution was 10x the IOPS of 700 15k spindles. Maybe 2-5x?
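Showing my work, since TFA gives no per-drive numbers (the per-spindle IOPS figure is an assumption):

    # 50TB usable on 146GB drives at RAID 10, plus a guessed IOPS rate.
    drives = 50_000 / 146 * 2          # mirroring doubles the drive count
    iops_per_15k_drive = 190           # assumed for a 15k spindle
    print(round(drives))               # ~685 drives
    print(round(drives * iops_per_15k_drive))   # ~130,000 IOPS aggregate
    # 250k IOPS of flash is roughly 2x that, consistent with my guess.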
Re: (Score:2)
I'm just curious, how did you come up with 250k IOPS with 700 disks? Short-stroking?
Re: (Score:2)
I didn't get 250k IOPS. I _said_ 250k IOPS was 2-5x better. I used the same math you did, and specifically hedged about not knowing how poor the scaling was with these kinds of systems. I am _not_ a storage engineer, just a developer with a (professional) interest in high performance random IO systems.
Re: (Score:2)
Power and cooling are a big win here, no doubt.
What would really be news here would be database engines and/or filesystems that grokked SSD performance patterns well and could combine pools of spinning disks and SSDs in optimal ways for a given workload.
Re: (Score:2)
Your transactions per second won't scale as well as your IOPS because, with spinning disks, there's still significant latency before your data actually gets written to disk.
RAID just increases the number of in-flight IOs, widening the throughput but not decreasing the latency per disk.
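That's just Little's Law; a sketch with assumed latencies:

    # Throughput = concurrency / latency. RAID raises concurrency;
    # it does nothing for the latency of any single IO.
    def iops(in_flight: int, latency_s: float) -> int:
        return int(in_flight / latency_s)

    print(iops(1, 0.005))    # one outstanding IO on a ~5ms disk: 200 IOPS
    print(iops(64, 0.005))   # wide RAID stripe: 12,800 IOPS
    print(iops(1, 0.0001))   # flash at ~0.1ms: 10,000 IOPS even serialized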
Re: (Score:2)
RAID by itself only increases the number of in-flight IOs, but it almost always comes with that most magical of pixie dusts: the battery-backed cache.
The other point that I'll make is that often the only writes your RDBMS is waiting on are log writes, which are sequential anyway.
In any case - I'll cede your point that spinning rust will likely NEVER scale as well as NAND.
Re: (Score:2)
Cheaper than $20 a GB.
I know that's expensive now, but I'm just old enough to remember when a GB of spinning magnetic disk was a big effing deal.
Re: (Score:2)
SAS can deliver 6Gbps, as can SATA nowadays (though rare). The fastest SSDs hit this limit, and there's no simple way to go beyond it (short of a PCIe controller).
SANs have their own bottleneck: the SAN switches, which due to their centralized nature (all traffic from all nodes goes through a certain set of switches, or a single switch) lower the overall throughput.
There are ways to get waaaay more IOPS, and waaaay higher total throughput, for way less money. These are what we intend to use in our VM cluster to be brought up nex
It is called HDSL... (Score:4, Informative)
You can read more about that here:
http://www.google.com/search?q=High-Speed+Data+Link [google.com]
Really? (Score:4, Informative)
My impression is that this is what has been going on for some time now with all the larger database operations, and one of the reasons SSDs have not yet come down in price is that all the best units and tech are going to the big companies as fast as they can get them from the manufacturers. I wouldn't be surprised to see someone like Google say "yawn, 50TB" and claim they have PETABYTE versions already out there.
If you run a database of any size, especially one with a large read-to-write ratio, SSDs will only make things faster. And speed counts.
Re: (Score:2)
Yeah, because at $1MM for 50TB, a $20MM investment by a publicly owned company in such a thing would entirely fly under the radar...
Well, their purchasing agents got Microsofted (Score:2)
"but instead has a proprietary interface that offers up 5 to 6Gb/s throughput."
You know that SAS offers 6Gb/s throughput and InfiniBand up to 300Gb/s (with 8 and 16 being more common).
Either way, $1M for a bunch of SAS SSDs (even SAS NVRAM) is way overpriced IMHO. They could've done it cheaper.
Re: (Score:3, Funny)
They could've done it cheaper.
It's AOL, would you actually expect them to make intelligent, informed decisions?
I wonder how the total cost compares (Score:2)
once you figure in the total energy savings (reduced power needs, reduced cooling needs, etc.) over the lifetime of the drives, I wonder how much more expensive it really is. I can't wait for SSDs to become more affordable. I'd like to have that in our SANs too.
Re: (Score:2)
Hm. I've always seen power as the most expensive part of an enterprise deployment -- see also why these companies are building data centers in cheap-power areas.
Wait! (Score:2)
Does this mean AOL is doing something novel and progressive? Something doesn't feel right about that...
I'm so confused!
Finite number of program-erase cycles? (Score:3, Informative)
I wonder what the read/write rating is vs. a hard disk?
Wikipedia puts flash at 1,000,000 program-erase cycles [wikipedia.org]
Re: (Score:2)
Troll. Not even a very good one.
Re: (Score:3, Interesting)
It's a non-problem. With Intel's 64GB X25-E drive, for example, you can do non-stop random writes for 6 years before you run into problems. We run all our databases on SSDs, mostly Intel and FusionIO ioDrives.
That said, we've had drives simply drop dead with a controller failure. You still have to run a RAID array, even with SSDs.
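For what it's worth, the 6-year figure roughly checks out if you assume ~100k P/E cycles for SLC and the drive's sustained random-write rate (both numbers are my assumptions):

    # Sanity check: 64GB SLC drive, non-stop random writes.
    capacity_bytes = 64e9
    pe_cycles = 100_000            # assumed SLC endurance
    write_rate = 33e6              # ~33 MB/s sustained random writes, assumed
    seconds = capacity_bytes * pe_cycles / write_rate
    print(seconds / (3600 * 24 * 365))   # ~6.1 years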
Ok guys.... (Score:3, Funny)
It's very easy to fall in love with this stuff once you're on it.
I said the same thing about coke in the '70s....
I guess what I'm saying is, no one loan money to AOL until they admit they have a problem.
interface? (Score:2)
From the summary:
What are they talking about? The Violin Memory website says the appliances themselves support FC, 10 GbE, and InfiniBand connections [violin-memory.com]. Their performance page [violin-memory.com] says that the appliance can be directly connected to a PCIe bus, presumably using some sort of pass-through interface card, but what physical connector and media are used?
Re: (Score:2)
The Slashdot post is incorrect, according to the article. The actual throughput is about 4GB/s.
And? (Score:2)
I've got to be missing something here. The seek times are probably out of this world with this "specialized" SAN, but then we have EqualLogic SANs that can hold 48 SSDs and have 10Gb/s...
Hey AOL - you're in the Arctic, right? Can I interest you in some of this amazing ice?
Re: (Score:2)
Nah
Cheap (Score:2)
Hey folks,
$20/GB is not that much IMHO... Is that net capacity? Does it include geographical replication? Depending on the answer, the real news could be that SSD storage is much more competitive than one may have thought... :D
Re: (Score:2)
Once you factor in the total cost of ownership for a disk-based SAN (heat, cooling, maintenance, etc.), flash is actually pretty cheap.
RAID 5? (Score:3, Insightful)
They wanted performance and went *RAID 5*? That pretty much sums the entire approach up. Let's not optimise the application first and the database second, but instead hide the problem by throwing hardware at it. Then what we'll do is use a RAID configuration that hobbles the write performance of the arrays - and let's not mention what happens to performance when we lose a disk (don't say it won't happen).
Sure, RAID 5 is the answer to some things, but not when the question is database *PERFORMANCE*.
Also - latency is more important than IOPS. I don't care how many IOPS you can do; if your latency is high, the performance won't be. Most garden-variety storage engineers don't seem to grasp this concept.
AOL is still around? (Score:2, Redundant)
What the hell does AOL need a database for? Users still on hold trying to cancel their accounts?
They may have wasted the cash (Score:3, Interesting)
It is hard to know anything for sure with this limited amount of info. But it appears to me that they have not accomplished such a great feat.
I put together a server this year that pushes over 9 GB/s. I did this with a mere 150 2.5-inch drives (144 in RAID 10 + 6 live spares). This was SAS 2.0, of course, because in the real world SAS kicks FC's A**.
We found that the real bottleneck to throughput is not the drives and not the SAS cards. We have 8 SAS 2.0 lanes coming into each card; multiply that by 6 cards and you have a heck of a lot of potential.
No, the real problem is that you saturate your PCIe slots, and chipsets sometimes choke when you feed them this much data. So the chipset and PCIe bus tend to be the restraining factor, not the archaic rotating platters.
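The rough math on where the ceiling sits (per-lane and per-slot rates are standard spec figures; the layout is as described above):

    # Aggregate link bandwidth for 6 HBAs x 8 SAS 2.0 lanes vs. the
    # PCIe 2.0 x8 slots feeding them.
    sas_lanes, cards = 8, 6
    sas2_gbps = 6                                       # per SAS 2.0 lane
    sas_total_gbs = sas_lanes * cards * sas2_gbps / 8   # 36 GB/s of SAS links
    pcie2_x8_gbs = 4                                    # ~4 GB/s usable per x8 slot
    print(sas_total_gbs, cards * pcie2_x8_gbs)          # 36 vs 24 GB/s
    # ...and we still only sustain ~9 GB/s, so the chipset is the wall.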
Re: (Score:2)
"So you're getting 4GB/sec. of PCIe bandwidth, not the 5Gbit/sec. or 6Gbit/sec. SAS bandwidth. You're getting almost an order of magnitude of bandwidth to the storage internally just because you're using an interface that's capable of it," Pollack said.
That's 4GB(ytes) per second. Not 4 gigaBIT per second. That's 32Gbit/s vs your 6Gbit/s via SAS.
Bad summary. Not 5-6 Gb/s but 4 GB/s. (Score:2)
Serial ATA 3.0 and SAS achieve 5-6 Gb/s. This system delivers 4 GB/s. It's really sad how these sloppy summaries make it to the front page.
Quote from TFA: "So you're getting the 4GB/sec. of PCIe bandwidth, not the 5Gbit/sec. or 6Gbit/sec. SAS bandwidth. You're getting almost an order of magnitude of bandwidth to the storage internally just because you're using an interface that's capable of it," Pollack said.
Wait a while until Write Amplification kicks in (Score:2)
Wait a while until Write Amplification kicks in. Then they'll be screwed.
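For those unfamiliar: write amplification is the ratio of physical NAND writes to host writes. A toy illustration of what it does to endurance (all constants assumed):

    # Effective endurance shrinks by the write-amplification factor.
    capacity_gb, pe_cycles = 256, 3_000     # assumed MLC drive
    for wa in (1.1, 2.0, 5.0):              # light load -> GC thrashing
        host_tb = capacity_gb * pe_cycles / wa / 1000
        print(f"WA {wa}: ~{host_tb:.0f} TB of host writes before wear-out")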