IBM Building 120PB Cluster Out of 200,000 Hard Disks 290
MrSeb writes "Smashing all known records by some margin, IBM Research Almaden, California, has developed hardware and software technologies that will allow it to strap together 200,000 hard drives to create a single storage cluster of 120 petabytes, or 120 million gigabytes. The data repository, which currently has no name, is being developed for an unnamed customer, but with a capacity of 120PB, its most likely use will be as a storage device for a governmental (or Facebook) supercomputer. With IBM's GPFS (General Parallel File System), over 30,000 files can be created per second, and with massive parallelism, and no doubt thanks to the 200,000 individual drives in the array, single files can be read or written at several terabytes per second."
What's it for? (Score:2)
Re: (Score:2)
Billionaire porn collections are stored in multiple locations. billionaire pr0n collection [google.com]
Sounds like a data orgy.. (Score:2)
...for hoarding whorecookies.
Re: (Score:2)
What's it for? No surprise, domestic spying.
Re:What's it for? (Score:5, Funny)
What's it for? No surprise, domestic spying.
I think you mean "protecting your freedoms, fellow patriot."
Re: (Score:2)
What's it for? No surprise, domestic spying.
Well the butler always did it, this way they'll have proof.
Re: (Score:2)
Re: (Score:2)
Why would a billionaire need porn?
Re: (Score:2)
Being a billionaire would attract a lot of women regardless of how you look.
Re: (Score:2)
Being a billionaire would attract a lot of women regardless of how you look. [Emphasis mine]
I don't think that gender matters much here.
Re: (Score:2)
I'm just suggesting that billionaires would have a better option than porn. Why collect porn when you could collect porn stars?
Re:What's it for? (Score:4, Funny)
Yes, he's an admitted petaphyle.
Re: (Score:2)
China's internet surveillance records?
Re: (Score:2, Informative)
A billionaire's porn collection is called a "harem".
Re: (Score:2)
Re: (Score:2)
Could be a 4 and a half day buffer of raw data from the LHC (ok, unlikely). The data rate that thing generates blows my mind.
Not done yet (Score:2)
Re:Not done yet (Score:5, Funny)
Punch cards.
Re: (Score:3)
Imagine a Beowulf cluster of these!
I wonder.. (Score:3)
...about the sound and torque generated when all these disks start to spin-up.
Re: (Score:3)
Re: (Score:3)
Yup. Don't mount them all in the same orientation as the Earth's axis or you can probably measure the change in the day's length.
Re: (Score:2)
If the torque were an issue (which it's not), you could mount the drives in alternating directions to balance them out.
Re: (Score:3)
My geek nature disapproves such torque-negating behavior. Instead, it totally wants to see the petabytes spin at some insane RPM, cancelling the gravity and possibly crushing some enemies.
Re: (Score:2)
Re: (Score:2)
Alternating directions you say? How exactly do you expect that to cancel torque?
Upside-down.
Re: (Score:3)
Yes, alternating directions. That assumes the drives are mounted vertically. If they're mounted horizontally, then yes, upside-down.
If they're using SSDs, then they need special leveling algorithms to keep the accesses spread out so that they don't get out of balance. If you access the left side of all your SSDs in the rack, the rack might fall over. :)
imagine the brown out! (Score:2)
Can you just imagine the brownout when they power up the drive farm?
In practice they would be doing sequential spin-up. I do, however, wonder how long it would take to sequentially spin up 200k drives.
Re: (Score:2)
Re: (Score:2)
And the heat, assuming they're using all their old Hitachi Deskstar drives.
That sounds like a plot to a disaster movie... "Sir, the cluster won't shut down! We're looking at a full melt down!"
Finally... (Score:2)
Somewhere I can store _all_ my porn in one spot.
Re: (Score:2)
I think you mean "store _all_ THE porn".
Paranoid much? (Score:2)
its most likely use will be a storage device for a governmental (or Facebook) supercomputer.
Actually, given the explosion of data storage needs in the bio-informatics area, its most likely use would be in storing DNA sequences for research purposes.
Re: (Score:2)
The human genome can effectively be stored in about 750MB (each base being only 2 bits). The largest genomes are only about 10x that size. IIRC the FASTA files for it take only about 3GB uncompressed.
Even with specific protein sequences, etc., I think that's a bit excessive for the bio-informatics field.
Also, I'm not sure if even the NIH could afford that kind of storage cluster.
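For anyone who wants to check the 2-bits-per-base figure, here is a rough sketch in Python. The base count of roughly 3.1 billion is my own assumption, not a number from the comment above, and the function name is just for illustration.

```python
# Rough size of a genome packed at 2 bits per base (A, C, G, T only).
# ASSUMPTION: ~3.1 billion bases for a human genome; the comment above
# only gives the resulting size, not the base count.
HUMAN_GENOME_BASES = 3_100_000_000


def packed_size_bytes(num_bases: int, bits_per_base: int = 2) -> int:
    """Bytes needed to hold num_bases at bits_per_base, rounded up."""
    return (num_bases * bits_per_base + 7) // 8


size_mb = packed_size_bytes(HUMAN_GENOME_BASES) / 1e6
print(f"about {size_mb:.0f} MB")  # about 775 MB
```

That lands in the same ballpark as the "about 750MB" quoted above; the exact figure depends on which base count you plug in.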
Re: (Score:3, Informative)
Modern genome compression techniques only store the edits needed to convert the reference genome to your genome. And the diff file is just around 24 MB per person. I am an ex-bioinformatician.
Re: (Score:2)
So am I. I was just talking about the base genome, not the diffs.
Re: (Score:2)
...the diff file is just around 24 MB per person.
OK, so 120 petabytes will store the genomes for about 5 billion people, not accounting for the further compression that could probably happen. Maybe this is for everyone's genome.
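The division is easy to sanity-check. A quick sketch, assuming decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and ignoring any filesystem or redundancy overhead:

```python
# How many 24 MB genome diffs fit in 120 PB?
# ASSUMPTIONS: decimal units, no filesystem or redundancy overhead.
CLUSTER_BYTES = 120 * 10**15   # 120 PB
DIFF_BYTES = 24 * 10**6        # 24 MB per person

people = CLUSTER_BYTES // DIFF_BYTES
print(f"{people:,} genomes")   # 5,000,000,000 genomes
```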
Re: (Score:2)
Re: (Score:3)
Re: (Score:2)
My understanding is that that amount of data is post-processed information, and that there are reasons not to be throwing out some of the intermediate data (it could be re-analyzed by better algorithms in the future), but it gets thrown out anyway just because there is no space to store it.
Fill 'er up (Score:5, Funny)
All I know is that if you put it on my computer, I'll have it filled in two years and have no idea what's actually on it.
Re: (Score:2)
Sadly... that would apply to me as well :)
Finally! (Score:3)
Would be a good fit for CERN LHC (Score:3)
Re: (Score:2)
Re: (Score:2)
A good solution to THEIR data problems.
Irregardless, grammer and spelin ain't no science, its a art form... and for all intensive porpoises, I lost power last night do to the slight'est bit of wind from whether system Irene (I live in south florida) from about ~6:30PM till ~3:00AM... at lease my generator worked, and FPL was on my road buy 8PM. Not sure why I came into work... I'm so tierd.
Not the government. (Score:5, Interesting)
It's not the government guys, at least not the cloak and dagger kind. They're too paranoid to let you know how much data they can store. They also don't want you to know that even with all that data, they're still only able to utilize a fraction of it. People are still going through WWII wire intercepts *today*. No, the problem in the intelligence community is making the data useful and organized as efficiently as possible, not collecting it.
That leaves only one real option: Scientific research. Look at how much data the Large Hadron Collider produces in a day. ..
Re: (Score:2)
http://www.youtube.com/watch?v=-tNMGev1t9M [youtube.com]
Re:Not the government. (Score:5, Insightful)
This is generally something I have a hard time convincing people of. I've worked for spooky organizations. Not at the highest levels or on the most secret projects, but in the general vicinity. The government is not monitoring you. Not because they lack the legal capability (though they do, and that is mostly, but not always, respected), but because they lack the technical ability. There are only so many analysts, only so much computer time, only so much storage. Except in cases of explicit corruption or misuse of resources, those analysts, that computer time, and that storage are not being wasted on monitoring Joe and Jane Average.
I'm not going to say that there aren't abuses by the people who have access to some of this stuff; they are human and weak like the rest of us and are often tempted to take advantage of their situation I'm sure. In general however, unless you've done something that got a warrant issued for your information, the government doesn't care. They just don't have the resources to be big brother, even if they want to be.
Re: (Score:3)
There are only so many analysts, only so much computer time, only so much storage.
The government has found a solution to that problem. Distribute the computing and storage requirements.
These days if you want a license to sell alcohol in your shop you have to get agreement from the police, and they usually require you to have extensive CCTV systems covering the area outside your shop as well as inside it. They shift the burden of installing and maintaining the system to the shop owner and can access the video any time they like. If a crime is reported the shop owner gets a demand for CCTV
Re:big brother (Score:2)
I'll give you credit for "this used to be true" back in the day when a computer was a 486 on a modem. It's absolutely not true any more.
Govt is Big Brother, and they Like it. And they absolutely have the resources to do it.
Why? Because all they need to do is a Red Flag system. Joe Average doesn't really produce that much data per day all by himself, and .gov isn't trying to perfectly reproduce the entire activity. They just need to know if something is getting juicy.
"Look! Here's a 12 Gig file of Joe's acti
Re: (Score:2)
It's not beside the point, it's the practical side of the point. This doesn't mean we should ignore questions of morality, how much power is too much, how much monitoring is appropriate, etc... It just means that while these philosophical questions are both interesting and relevant you don't really need to be worried about the practical implications day by day. Practically, the government *can't* watch you all the time, or really at all, unless you are the subject of some investigation worth those resources
Re: (Score:2)
Practically, the government *can't* watch you all the time, or really at all, unless you are the subject of some investigation worth those resources
Trouble is, if the government does something I don't like, and I start taking (perfectly legal) political action against that, I become someone "worth" watching. So surveillance capability is something to worry about now; otherwise, when something directly problematic comes up, you're a dissident and it's too late.
Re: (Score:2)
Things can change though. For example right now, monitoring by the USG is not on my list of worries, because I'm sure I'd bore to tears any people watching.
However, governments can change; the LEOs who are looking for felonies being committed and are abiding by their oath have a possibility of being replaced by people more interested in getting rid of any opposition.
Take a system for figuring out if someone gets an intensive or routine search at customs. That same technology can be used to data mine social n
Re: (Score:2)
don't really need to be worried about the practical implications day by day
Normal activities like traveling or opening a bank account are quite noticeably affected by government surveillance of the financial and transport systems. Any practical limitation of government capabilities can be made up for by requisitioning private resources or by simply blocking events that are difficult to monitor from happening at all.
Re: (Score:2)
Do you own a cell phone? Your carrier knows where you are, right now and has records of where you've been this week. They know who you've talked to, and they're more than happy to share that information with 'interested parties' in the government, no warrant required.
Given that the data sizes are small, there's no reason they can't store everyone's location/ph
Re: (Score:2)
Entirely besides the point. They could, therefore it can be abused, therefore there must be EXTREME oversight. Period.
Yeah, we should setup a government board or require government courts to monitor and oversee the government...er wait...
Re: (Score:2)
1. Get back on the meds, really.
2. If they did get rid of the "organs of state secrecy" how would you know? They are secret after all.
Really if you got rid of the CIA, NSA, and NRO they would still be there. No nation can survive without intelligence gathering organisations. So what would happen is they would be hidden and secret. Being public means that there is oversight.
So really get back on the meds and the voices will stop.
I propose a name for it ... (Score:2)
FTFS:
It's the tech equivalent of Prince - it's "the data repository with no name." We can denote it with some sort of unicode glyph that slashdot will mangle.
And of course it has amazingly fast read speeds - if each drive has a 32 meg cache, that's 6.4 terabytes just for the cache.
BTW, it's for the ^@#%^&^+++NO CARRIER
Loading times (Score:2)
Good job for a HS kid... (Score:2, Interesting)
Run around with a shopping cart and swap out drives as they fail. Kind of like they did back in the early computer days with vacuum tubes.
Constant failures? (Score:2)
With 200,000 hard drives, won't there always be at least one hard drive that is failing? You'll need an IT guy 24/7 swapping out the failed drives. As soon as he swaps out one drive, another one will fail. It just seems kinda ridiculous.
Re: (Score:3)
This is what MTBF is all about. "Enterprise" drives are rated at 1.2 million hours MTBF. 1,200,000 hours / 200,000 drives = 6 hours per drive failure. Not too bad, only 4 a day.
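Here is that back-of-the-envelope math as a small Python sketch. It assumes the simplest possible model (independent drives, constant failure rate, so the fleet-wide interval is just MTBF divided by drive count); real fleets show infant mortality and wear-out, so treat it as a rough estimate only. The function names are made up for illustration.

```python
# Fleet-wide failure interval from a per-drive MTBF rating.
# ASSUMPTION: independent drives with a constant failure rate.
MTBF_HOURS = 1_200_000   # rating quoted above
DRIVES = 200_000


def hours_between_failures(mtbf_hours: float, drives: int) -> float:
    """Expected hours between failures somewhere in the fleet."""
    return mtbf_hours / drives


def failures_per_day(mtbf_hours: float, drives: int) -> float:
    """Expected number of drive failures per 24 hours."""
    return 24 * drives / mtbf_hours


print(f"{hours_between_failures(MTBF_HOURS, DRIVES):.1f} hours between failures")  # 6.0
print(f"{failures_per_day(MTBF_HOURS, DRIVES):.1f} failures per day")              # 4.0
```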
Re: (Score:3)
Re: (Score:2)
Even ancient RAID5 implementations are not that bad. Most likely this is really some sort of RAID over RAID over RAID, or some sort of RAID like software that does similar actions. This means no downtime and most likely nearly no speed costs for a single drive.
Re: (Score:2)
Also, that 1% figure is bullshit. Expect 6% to die in the first year (nearly 3% in the first 3 months)
Re: (Score:2)
Re: (Score:2)
I would guess that would be the reason for the water cooling, to increase the drives' reliability.
Also from the article it sounds like they may have more than 200,000 hard drives hooked up, but only use 200,000 at a time so the computer can automatically begin recreating the dead drive as soon as it occurs.
Re: (Score:2)
Re: (Score:2)
Given that 'water' cooling in computers is never done with water (and most other closed systems besides cars are neither) but with an inert fluid it's not really that big of a problem.
Even in home computers, "water" in water cooling (as some dweebs have indeed used tap water) has been known to calcify, have algae growths and/or corrode the components and there are a lot of other liquids that are better at transferring heat than water.
Also, pure water (the undrinkable kind) is inert.
Re: (Score:2)
http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/ [backblaze.com]
Backblaze provides some metrics about their drive failure rates. It's surprisingly low (1-5% per year). If they had 200k drives, they would need to replace 39-192 per week. I'm sure the cluster is built with lots of redundancy that doesn't require a person to immediately replace a failed drive. They'll probably need a full time staff of at least 3 to maintain it.
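The weekly numbers can be reproduced with a few lines of Python. This is only a sketch: it assumes failures are spread evenly over a 365-day year and rounds up to whole drives, which is my guess at how the figures were derived. The function name is hypothetical.

```python
import math

# Weekly replacements implied by an annual failure rate.
# ASSUMPTIONS: failures spread evenly over a 365-day year,
# rounded up to whole drives.
DRIVES = 200_000


def replacements_per_week(drives: int, annual_failure_pct: int) -> int:
    """Drives to replace per week, given an annual failure rate in percent."""
    return math.ceil(drives * annual_failure_pct * 7 / (100 * 365))


low = replacements_per_week(DRIVES, 1)
high = replacements_per_week(DRIVES, 5)
print(f"{low}-{high} drives per week")  # 39-192 drives per week
```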
Google? (Score:2)
Not so impressive as a floppy RAID (Score:2, Informative)
If they could make a 120PB cluster using floppy disks, I would be much more entertained by this.
Re: (Score:2)
Man.. and make sure it's the 5.25" drives that love to chatter... kind of like a Commodore 64 drive loading up Flight Simulator ][
Re: (Score:2)
lots of aluminum (Score:2)
Someone should manufacture industrial sized hard drives for this type of application. Like full height x2, so you could cram 30 platters in there.
Re: (Score:2)
That's only 400MB for every US American (Score:2)
Re: (Score:2)
Not everybody has more than 400MB in their e-mail account and a LOT of that can be compressed or de-duplicated (spam). Google doesn't need THAT much. I think for ALL their data they're probably close to 100 PB which is again, not all that impressive these days. Of course they have it redundantly in every data center so their capacity is much larger.
From a scientific standpoint, this would be capable of storing video of everything a person has seen in his life or when running a simulation of the Universe, s
If it weren't for those meddling disk manufacturer (Score:2)
It would be 122PB. 2PB lost on bad marketing. Gimme my 1024 bytes back. But all-in-all this isn't that surprising. You can get 1PB in a 42U rack these days.
As a fun side note: You'll also need 122PB of tape storage (or 1.5 systems like this) just for backups. That's a lot of tape.
Someday, we will carry these in our pockets (Score:2)
Failure rate? (Score:2)
We know the capacity. We know the transfer rate. But how quickly do disks need to be moved in and out of the system in order to keep it running?
200,000 is a lot of disks. I assume they are all hot swap with a great deal of redundancy because I would expect multiple drive failures every day. A raid0 with that many disks might never boot.
MTBF Question (Score:2)
Re: (Score:2)
I do manage large storage farms in the petabytes range. There is a curve to the rate at which disks die. It mostly seems kinda obvious.
#1 - Infant mortality. I see a bunch of drives fail within the first few months of a new install.
#2 - Increased death rate as the drives age. Usually when the drives start to reach the warranty age. This can be accelerated depending on the IO load of the system.
There's a lot of great info out there. Here's one good whitepaper:
http://static.googleusercontent.com/externa [googleusercontent.com]
Re: (Score:2)
Re: (Score:2)
Because it's a target regardless of who owns it. God could own it and call it the garden of Eden and people would still blow it up
Re:Depressing (Score:5, Insightful)
Facebook and presumably a spy agency?
You're repeating yourself.
emo? (Score:2)
Anyone else find it depressing that the two top suspects for the use of this system are Facebook and presumably a spy agency?
Can humanity come up with no better use for the biggest iron than a bunch of frivolous, narcissistic ad profiling and covert spying on people living in an allegedly free country?
No wonder F@H doesn't post more progress. Our hardware is going towards people sharing their naked bong photos and government spooks cataloging your naked bong photos.
You are trying too hard looking for something to be upset about (in a very attention-whorish manner to boot.)
Re: (Score:2)
Can humanity come up with no better use for the biggest iron than a bunch of frivolous, narcissistic ad profiling and covert spying on people living in an allegedly free country?
Yes. It ain't that hard to come to that answer, you know? The Slashdot story half-seriously hints at either a government agency (NSA) or somebody like Facebook. And obviously in Emo fashion, you took it as a statement about humanity. It's more a statement about you.
I find these type of opinions rather simplistic as other opportunities in large-scale application engineering abound:
Re: (Score:2)
Don't forget the IRS.
Re: (Score:2)
I'd go with the flux capacitor personally. Then you can go back in time, invest in IBM, Microsoft, Google, and Apple when shares are still cheap and buy the 120PB cluster. Assuming you drive a DeLorean anyway.
Re: (Score:2)
What do you mean: can big disk arrays be built so that replacements can be automated? Of course they can be built, it would not even be that hard. Well, as long as you don't put drive/server production and delivery of the components or auto assembly in the automated system. I could not find one on google, I guess on such a large drive array, you can afford a human to replace some disks now and then. Humans are more flexible and more prone to see other problems occurring as well.
Re: (Score:2)
Re: (Score:2)
Your hot-spares provide immediate 'replacement', which allows you to make physical replacement less time-critical just by adding more drives to the system, and most big-huge-storage systems have front mounted indicator lights for drive health.
Having a human on duty who g
Re: (Score:2)
I looked into making a hard drive silo as a business. Even dropped the business proposal by some vendors. You would put bare SATA or SAS drives in a load port and they would be dropped into place in groups for reading/writing. Critical data would have four HDDs writing at a time (three way mirror, plus one HDD that would go offsite.) Non critical would get 5-8 HDDs writing in a RAID 6 configuration. It would have been nice to have because disks can be erased faster than tapes for security (just do an A
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
600GB drives? (Score:2)
Based on 120 million GB and 200k drives, the per-drive capacity works out to 600GB apiece. Sounds like they're stringing together a bunch of WD Velociraptors.
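The per-drive figure is just the summary's two numbers divided. A quick check in Python, assuming every drive contributes equally and none of the raw capacity goes to redundancy or spares (the summary does not say either way):

```python
# Average capacity per drive from the figures in the summary.
# ASSUMPTION: all 200,000 drives contribute equally and no capacity
# is set aside for redundancy or hot spares.
TOTAL_GB = 120_000_000   # 120 PB expressed in GB
DRIVES = 200_000

per_drive_gb = TOTAL_GB // DRIVES
print(f"{per_drive_gb} GB per drive")  # 600 GB per drive
```

If some of the drives are spares or parity, the individual drives would have to be larger than this to reach 120PB of usable space.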
Re: (Score:2)
So how long would a consumer grade "raid chipset" take to rebuild that raid if it was a raid 5 setup (with the drives split into 3 different raid 0 setups)?