
IBM Building 120PB Cluster Out of 200,000 Hard Disks

Posted by Soulskill
from the go-big-or-go-home dept.
MrSeb writes "Smashing all known records by some margin, IBM Research Almaden, California, has developed hardware and software technologies that will allow it to strap together 200,000 hard drives to create a single storage cluster of 120 petabytes — 120 million gigabytes. The data repository, which currently has no name, is being developed for an unnamed customer, but with a capacity of 120PB, its most likely use will be as a storage device for a governmental (or Facebook) supercomputer. With IBM's GPFS (General Parallel File System), over 30,000 files can be created per second — and with massive parallelism, no doubt thanks to the 200,000 individual drives in the array, single files can be read or written at several terabytes per second."


  • A billionaire's porn collection?
  • Do they back up to tape or external USB drive?
  • by eexaa (1252378) on Friday August 26, 2011 @10:21AM (#37218586) Homepage

    ...about the sound and torque generated when all these disks start to spin-up.

    • by jhoegl (638955)
      It may very well alter time as we know it!
      • by rrohbeck (944847)

        Yup. Don't mount them all in the same orientation as the Earth's axis or you can probably measure the change in the day's length.

    • by crow (16139)

      If the torque were an issue (which it's not), you could mount the drives in alternating directions to balance them out.

      • by eexaa (1252378)

        My geek nature disapproves of such torque-negating behavior. Instead, it totally wants to see the petabytes spin at some insane RPM, cancelling gravity and possibly crushing some enemies.

      • by rubycodez (864176)
        mounting in alternating directions? I saw some twin girl porns like that.....
      • Alternating directions you say? How exactly do you expect that to cancel torque?

        Upside-down.

        • by crow (16139)

          Yes, alternating directions. That assumes the drives are mounted vertically. If they're mounted horizontally, then yes, upside-down.

          If they're using SSDs, then they need special leveling algorithms to keep the accesses spread out so that they don't get out of balance. If you access the left side of all your SSDs in the rack, the rack might fall over. :)

    • Can you just imagine the brown-out when they power up the drive farm?
      In practice they would be doing sequential spin-up. I do wonder, however, how long it would take to sequentially spin up 200k drives.
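A quick back-of-the-envelope sketch of the staggered spin-up question above. The per-drive spin-up time and the stagger-group size are assumptions of mine, not figures from the article:

```python
drives = 200_000
spin_up_s = 10        # assumed seconds for one drive to reach full speed
group_size = 1_000    # assumed drives spun up concurrently per stagger group

groups = -(-drives // group_size)   # ceiling division
total_s = groups * spin_up_s
print(total_s / 60)                 # total staggered spin-up time in minutes
```

With these guesses the whole farm is spinning in roughly half an hour; a bigger stagger group (limited by the power supplies' inrush budget) shrinks that linearly.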

  • Somewhere I can store _all_ my porn in one spot.

  • its most likely use will be as a storage device for a governmental (or Facebook) supercomputer.

    Actually, given the explosion of data storage needs in the bio-informatics area, its most likely use would be storing DNA sequences for research purposes.

    • by ByOhTek (1181381)

      The human genome can effectively be stored in about 750MB (each base being only 2 bits). The largest genomes are only about 10x that size. IIRC the FASTA files for it take only about 3GB uncompressed.

      Even with specific protein sequences, etc., I think that's a bit excessive for the bio-informatics field.

      Also, I'm not sure if even the NIH could afford that kind of storage cluster.
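The 2-bits-per-base arithmetic above is easy to check, assuming roughly 3 billion bases for a human genome:

```python
bases = 3_000_000_000   # approximate length of the human genome
bits_per_base = 2       # A, C, G, T encode in 2 bits each

size_bytes = bases * bits_per_base // 8
print(size_bytes / 10**6)   # about 750 MB
```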

      • Re: (Score:3, Informative)

        by Anonymous Coward

        Modern genome compression techniques store only the edits needed to convert the reference genome into your genome, and the diff file is just around 24 MB per person. I am an ex-bioinformatician.

        • by ByOhTek (1181381)

          So am I. I was just talking about the base genome, not the diffs.

        • by Beorytis (1014777)

          ...the diff file is just around 24 MB per person.

          OK, so 120 petabytes will store the genomes for about 5 billion people, not accounting for the further compression that could probably happen. Maybe this is for everyone's genome.
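The division behind the 5-billion-genomes estimate, using the 24 MB diff figure quoted above:

```python
PB = 10**15
MB = 10**6

capacity = 120 * PB
diff_per_person = 24 * MB
print(capacity // diff_per_person)   # people whose diffs fit in 120 PB
```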

      • by tomknight (190939)
        Data requirements are doubling faster than disk storage capabilities. We need to find ways of dealing with this, ideally without simply asking for more money for more disks. I've just been told a new academic here will need about 200TB in a few months. I can see my (fairly small) set of Bioinformatics researchers needing a PB before the end of next year.
      • by biodata (1981610)
        Our modest lab turns out roughly 100GB a week of finished sequence, from a single sequencer, which is only a very small fraction of the temporary disk storage needed along the way to get to finished sequence. Genome centres with many machines will turn out an order of magnitude (or two) more, and believe me, these machines are kept busy week after week. Once we have finished sequences, the assembly process adds a multiple to this. Yes, a genome is only XMB, but when you have to effectively sequence it 40
      • by skids (119237)

        My understanding is that that amount of data is post-processed information, and that there are reasons not to be throwing out some of the intermediate data (it could be re-analyzed by better algorithms in the future), but it gets thrown out anyway just because there is no space to store it.

  • Fill 'er up (Score:5, Funny)

    by mmarlett (520340) on Friday August 26, 2011 @10:22AM (#37218596)

    All I know is that if you put it on my computer, I'll have it filled in two years and have no idea what's actually on it.

  • by AngryDeuce (2205124) on Friday August 26, 2011 @10:23AM (#37218604)
    Woot! Torrent all the things!
  • by Tynin (634655) on Friday August 26, 2011 @10:30AM (#37218714)
    My understanding is that the LHC generates so much data that most of it is discarded immediately without going to disk. Seems like this would be a good solution to their data problems.
    • by rubycodez (864176)
      they discard the common uninteresting decays, no point in storing it
  • Not the government. (Score:5, Interesting)

    by girlintraining (1395911) on Friday August 26, 2011 @10:31AM (#37218724)

    It's not the government guys, at least not the cloak and dagger kind. They're too paranoid to let you know how much data they can store. They also don't want you to know that even with all that data, they're still only able to utilize a fraction of it. People are still going through WWII wire intercepts *today*. No, the problem in the intelligence community is making the data useful and organized as efficiently as possible, not collecting it.

    That leaves only one real option: scientific research. Look at how much data the Large Hadron Collider produces in a day...

    • by DrgnDancer (137700) on Friday August 26, 2011 @10:50AM (#37218946) Homepage

      This is generally something I have a hard time convincing people of. I've worked for spooky organizations. Not at the highest levels or on the most secret projects, but in the general vicinity. The government is not monitoring you. Not because they lack the legal capability (though they do, and that is mostly, but not always, respected), but because they lack the technical ability. There are only so many analysts, only so much computer time, only so much storage. Except in cases of explicit corruption or misuse of resources, those analysts, that computer time, and that storage are not being wasted on monitoring Joe and Jane Average.

      I'm not going to say that there aren't abuses by the people who have access to some of this stuff; they are human and weak like the rest of us and are often tempted to take advantage of their situation, I'm sure. In general, however, unless you've done something that got a warrant issued for your information, the government doesn't care. They just don't have the resources to be Big Brother, even if they want to be.

      • by AmiMoJo (196126)

        There are only so many analysts, only so much computer time, only so much storage.

        The government has found a solution to that problem. Distribute the computing and storage requirements.

        These days if you want a license to sell alcohol in your shop you have to get agreement from the police, and they usually require you to have extensive CCTV systems covering the area outside your shop as well as inside it. They shift the burden of installing and maintaining the system to the shop owner and can access the video any time they like. If a crime is reported the shop owner gets a demand for CCTV

      • I'll give you credit for "this used to be true" back in the day when a computer was a 486 on a modem. It's absolutely not true any more.

        Govt is Big Brother, and they Like it. And they absolutely have the resources to do it.

        Why? Because all they need to do is a Red Flag system. Joe Average doesn't really produce that much data per day all by himself, and .gov isn't trying to perfectly reproduce the entire activity. They just need to know if something is getting juicy.

        "Look! Here's a 12 Gig file of Joe's acti

  • FTFS:

    The data repository, which currently has no name, is being developed for an unnamed customer,

    It's the tech equivalent of Prince - it's "the data repository with no name." We can denote it with some sort of unicode glyph that slashdot will mangle.

    And of course it has amazingly fast read speeds - if each drive has a 32 meg cache, that's 6.4 terabytes just for the cache.

    BTW, it's for the ^@#%^&^+++NO CARRIER
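The aggregate-cache arithmetic in the comment above checks out, assuming (as the commenter does) 32 MB of on-board cache per drive:

```python
drives = 200_000
cache_mb = 32   # assumed on-board cache per drive

total_mb = drives * cache_mb
print(total_mb / 10**6)   # aggregate cache in terabytes: 6.4
```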

  • Perhaps this cluster can load Deus Ex : Human Revolution levels in a reasonable amount of time!
  • Run around with a shopping cart and swap out drives as they fail. Kind of like they did back in the early days of computing with vacuum tubes.

  • With 200,000 hard drives, won't there always be at least one hard drive that is failing? You'll need an IT guy 24/7 swapping out the failed drives. As soon as he swaps out one drive, another one will fail. It just seems kinda ridiculous.

    • by SuperQ (431) *

      This is what MTBF is all about. "Enterprise" drives are rated at 1.2 million hours MTBF. 1,200,000 hours / 200,000 drives = 6 hours per drive failure. Not too bad, only 4 a day.
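The MTBF arithmetic above, spelled out:

```python
mtbf_hours = 1_200_000   # rated MTBF for an "enterprise" drive
drives = 200_000

hours_per_failure = mtbf_hours / drives          # 6 hours between failures
failures_per_day = 24 / hours_per_failure        # 4 failed drives per day
print(hours_per_failure, failures_per_day)
```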

      • How long does it take for the cluster to rebuild after a drive fails, and does this involve downtime?
        • by h4rr4r (612664)

          Even ancient RAID5 implementations are not that bad. Most likely this is really some sort of RAID over RAID over RAID, or some sort of RAID like software that does similar actions. This means no downtime and most likely nearly no speed costs for a single drive.
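Rebuild time for a single failed drive can be sketched as capacity divided by sustained rebuild throughput. Both numbers below are hypothetical; the article doesn't give drive sizes or rebuild rates:

```python
capacity_bytes = 10**12       # hypothetical 1 TB drive
rebuild_rate = 100 * 10**6    # assumed 100 MB/s sustained rebuild throughput

hours = capacity_bytes / rebuild_rate / 3600
print(round(hours, 1))        # hours to rebuild one drive
```

In practice, declustered schemes like GPFS spread the rebuild reads and writes across many surviving drives, so the effective rate is far higher than a single spindle's.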

    • Since they can't back up to tape, maybe they will convert their old tape library to swap out hard drives.
    • by Jeng (926980)

      I would guess that would be the reason for the water cooling: to increase the drives' reliability.

      Also, from the article it sounds like they may have more than 200,000 hard drives hooked up but only use 200,000 at a time, so the computer can automatically begin rebuilding a dead drive's contents as soon as a failure occurs.

      • I'm assuming that IBM has better plumbers than I do; because "reliability" is not the first word that comes to mind when somebody suggests water-cooling 200,000 hard drives...
        • by guruevi (827432)

          Given that 'water' cooling in computers is rarely done with actual water (and the same goes for most other closed systems besides cars) but with an inert fluid, it's not really that big of a problem.

          Even in home computers, the "water" in water cooling (some dweebs have indeed used tap water) has been known to calcify, grow algae, and/or corrode the components, and there are plenty of other liquids better suited to a closed cooling loop than water.

          Also, pure water (the undrinkable kind) is largely inert.

    • by Manfre (631065)

      http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/ [backblaze.com]

      Backblaze provides some metrics about their drive failure rates. It's surprisingly low (1-5% per year). If they had 200k drives, they would need to replace 39-192 per week. I'm sure the cluster is built with lots of redundancy that doesn't require a person to immediately replace a failed drive. They'll probably need a full time staff of at least 3 to maintain it.
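The weekly replacement estimate above follows directly from the quoted 1-5% annual failure rates:

```python
drives = 200_000

for annual_rate in (0.01, 0.05):
    per_week = drives * annual_rate / 52
    print(round(per_week, 1))   # drives to replace per week at this rate
```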

  • This just kinda strikes me as a "who would need this?" situation. Backing up the entire internet has to take up some space.
  • If they could make a 120PB cluster using floppy disks, I would be much more entertained by this.

    • by rrossman2 (844318)

      Man... and make sure it's the 5.25" drives that love to chatter... kind of like a Commodore 64 drive loading up Flight Simulator ][

    • by jandrese (485)
      Just for the heck of it, I worked out the math on this. Assuming 1.44MB 3.5" floppies, you will need 83,333,333,333 disks to store all of that data. Not even accounting for the drives, the disks alone would fill a volume of 2,240,418.91 m^3 (591,856,062 US gallons). I don't know for sure, but I suspect that number exceeds the number of floppies that have ever existed, although it is only about 12 floppies for every man, woman, and child on the Earth.
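The floppy arithmetic above can be reproduced. The disk dimensions are my own rough figures for a 3.5" floppy, so the volume comes out slightly different from the parent's:

```python
PB = 10**15
floppy_bytes = 1.44 * 10**6    # the usual marketing "1.44 MB"

disks = 120 * PB / floppy_bytes
print(round(disks))            # about 83.3 billion disks

# rough 3.5" floppy dimensions: 90 x 94 x 3.3 mm
volume_m3 = disks * (0.090 * 0.094 * 0.0033)
per_person = disks / 7_000_000_000   # ~2011 world population
print(round(volume_m3), round(per_person))
```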
  • Someone should manufacture industrial sized hard drives for this type of application. Like full height x2, so you could cram 30 platters in there.

    • It's not as straightforward as that, because current multi-platter hard drives have all the read heads attached to the same actuator arm. So even with a 4-platter drive, you can only read 1 platter at a time, I assume, unless they somehow sync the platters together. With a 30-platter drive, your throughput would be much worse than with 10 3-platter drives, because the 10 drives would have 10 times the usable read heads at any given moment.
  • If this were for an American spy agency, maybe that would be enough. But when I think about how I have ten times this much data in my Gmail, and that Gmail isn't limited to only the US, I suspect that Google has a lot more storage space than this. Of course it's probably all very decentralized.
    • by guruevi (827432)

      Not everybody has more than 400MB in their e-mail account, and a LOT of that can be compressed or de-duplicated (spam). Google doesn't need THAT much. I think for ALL their data they're probably close to 100 PB, which, again, is not all that impressive these days. Of course they have it redundantly in every data center, so their capacity is much larger.

      From a scientific standpoint, this would be capable of storing video of everything a person has seen in his life or when running a simulation of the Universe, s

  • It would be 122PB. 2PB lost on bad marketing. Gimme my 1024 bytes back. But all-in-all this isn't that surprising. You can get 1PB in a 42U rack these days.

    As a fun side note: You'll also need 122PB of tape storage (or 1.5 systems like this) just for backups. That's a lot of tape.

  • Hard to imagine? Yes. But forty years ago, the largest computing center on earth had 57GB of disc storage.
  • We know the capacity. We know the transfer rate. But how quickly do disks need to be moved in and out of the system in order to keep it running?

    200,000 is a lot of disks. I assume they are all hot-swap with a great deal of redundancy, because I would expect multiple drive failures every day. A RAID 0 with that many disks might never boot.

  • Just curious, for anyone with experience managing large mechanical disk arrays: if you installed an array of this size using identical hard drives and brought everything online at roughly the same time, would there be an increased likelihood of ALL the drives dying at about the same time? Could failure statistics bite you with enough simultaneous failures to negate redundancy?
    • by SuperQ (431) *

      I do manage large storage farms in the petabytes range. There is a curve to the rate at which disks die. It mostly seems kinda obvious.

      #1 - Infant mortality. I see a bunch of drives fail within the first few months of a new install.
      #2 - Increased death rate as the drives age. Usually when the drives start to reach the warranty age. This can be accelerated depending on the IO load of the system.

      There's a lot of great info out there. Here's one good whitepaper:
      http://static.googleusercontent.com/externa [googleusercontent.com]
