News for nerds, stuff that matters

IBM's High Performance File System

Posted by Zonk on Fri Mar 10, 2006 01:19 PM
from the but-who-will-use-it dept.
HoosierPeschke writes "BetaNews is running a story about IBM's new file system, General Parallel File System (GPFS). The short and skinny is that the new file system attained a 102 Gigabyte per second transfer rate. The size of the file system is also astonishing at 1.6 petabytes (petabyte == 1,024 terabytes). IBM has a page up with more information and specs on the system."

Related Stories

[+] Petabyte Storage Array 185 comments
knight13 writes "Engadget is reporting that EMC is rolling out a petabyte RAID array. From the article, "And if you're ready for that level of storage, there's now someplace to get it: EMC has launched its first petabyte array, a version of the company's flagship Symmetrix DMX-3 system that includes nine room-filling cabinets of drives." The price? A mere $4 million."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by kperrier (115199) on Friday March 10 2006, @01:21PM (#14891656)
    There is nothing new about GPFS. It's been around for years.
  • But what kind of performance does this give on relatively small ( 10Tbytes) file systems? Petabyte arrays are still kind of out of reach for most.
    • by KDan (90353) on Friday March 10 2006, @01:39PM (#14891835) Homepage
      You puny geekling. It's been years since I migrated my enormous collection of pr0n to my petabyte array...

      Running out of space too... maybe I should build a beowulf cluster of them.

      Daniel
    • Re: 10 Tbytes? (Score:5, Insightful)

      by chris_eineke (634570) on Friday March 10 2006, @02:01PM (#14891994) Homepage Journal
      relatively small ( 10Tbytes) file systems
      Seagate recently released a 500GB hard drive. It costs $431.99CAD. 2 of them make 1 TB. 2000 make 1 PB. (Yes, that's overly simplified because it doesn't take into account interconnection cost, cooling, hydro, &c.)

      2000 x 431.99 = $863,980CAD

      I don't think that that's a lot of money for a petabyte RAID. Hell, you might even get a 20% discount. Now think back about 20 years. That sum of money could have bought you 1 GB - that is six orders of magnitude less in hard drive space. But here is the kicker:
      Approx. 20 years down the road you will get at least two orders of magnitude more for the same amount of money (w/o inflation). Why? Because approx. 30 years ago, that sum of money bought you 1 MB of space.

      Ray Kurzweil calls it the "Law of Accelerating Returns" [kurzweilai.net]. 20 years down the road I will call it my petaporn array. Or maybe better not [peta.org]. ;)
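
The back-of-the-envelope drive arithmetic in the comment above can be checked with a short Python sketch. The drive size and price are the figures quoted in the comment; the helper names and the decision to ignore RAID overhead, enclosures, and power are assumptions made only for illustration.

```python
import math

# Figures quoted in the comment; everything else here is a simplifying assumption.
DRIVE_CAPACITY_GB = 500
DRIVE_PRICE_CAD = 431.99
PB_IN_GB = 1_000_000  # decimal petabyte, matching "2000 make 1 PB"


def drives_needed(target_gb: int, drive_gb: int = DRIVE_CAPACITY_GB) -> int:
    """Drives required to reach target_gb of raw capacity, rounded up."""
    return -(-target_gb // drive_gb)


def raw_cost_cad(target_gb: int) -> float:
    """Cost of the bare drives only (no controllers, cabling, or cooling)."""
    return drives_needed(target_gb) * DRIVE_PRICE_CAD


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Base-10 gap between two capacities expressed in the same unit."""
    return math.log10(larger / smaller)


print(drives_needed(PB_IN_GB))             # drive count for 1 PB
print(round(raw_cost_cad(PB_IN_GB), 2))    # raw drive cost in CAD
print(orders_of_magnitude(PB_IN_GB, 1))    # gap between 1 PB and 1 GB
```

The last line is a reminder that a gigabyte and a petabyte differ by six powers of ten, which is the scale the 20-year comparison is really about.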
    • by Linker3000 (626634) on Friday March 10 2006, @02:12PM (#14892109)
      Typical porn movies per hour (TPMH)??
      • From the articles I've read, this was accomplished using (some subset of) ASC Purple, which is full of a lot of either custom or IBM-proprietary stuff (or else stuff that nobody but IBM seems to be using).

        According to the published/unclassified spec sheet [llnl.gov]:

        "Purple has 2 million gigabytes of storage from more than 11,000 Serial ATA and Fibre Channel disks. ... Each login node has eight 10-gigabytes-per-second network connections for parallel file transfer protocol and two 1-gigabyte-per-second network connections for network file systems and secure shell protocol. The system has a three-stage 1,536 port dual plane Federation switch interconnect ..."

        I think that it was this last thing, the Federation interconnect, that they were pushing the data over in this test, since it forms the backbone of the machine and links the storage nodes to the login node controllers, which then connect to the login nodes themselves (of which there are apparently over 1,400, according to this [llnl.gov]). I couldn't find much information on Federation, as it seems to only be used in a few systems, of which Purple is the most notable. One reference [sandia.gov] I found seems to put it at 1.49 GB/sec (11.92 Gbit/s) bandwidth, although it's not clear if that's "dual plane" Federation or not. 4X SDR InfiniBand is around 10 Gbit/sec, IIRC, so Federation's a little faster.

        It does sound a little like it was a case of "hey, what can we do with $230M worth of hardware? I know, let's break some records." So they did. I'm not sure that there's anything there that anyone else couldn't do, with different technologies, given the same investment of capital -- it's just a matter of who else wants to, and has the capability.
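
For a rough sense of scale, the per-link figure quoted above can be set against the 102 GB/s headline number. This is only a sketch: it assumes the aggregate rate is spread evenly over identical links with no protocol overhead, which the thread does not confirm, and the names are made up for illustration.

```python
import math

BITS_PER_BYTE = 8

FEDERATION_GB_PER_S = 1.49   # per-link figure from the Sandia reference quoted above
AGGREGATE_GB_PER_S = 102.0   # transfer rate reported in the story


def gbytes_to_gbits(gb_per_s: float) -> float:
    """Convert gigabytes per second to gigabits per second."""
    return gb_per_s * BITS_PER_BYTE


def min_links(aggregate_gb_per_s: float, link_gb_per_s: float) -> int:
    """Fewest fully saturated links needed to carry the aggregate rate."""
    return math.ceil(aggregate_gb_per_s / link_gb_per_s)


print(gbytes_to_gbits(FEDERATION_GB_PER_S))                 # roughly 11.92 Gbit/s
print(min_links(AGGREGATE_GB_PER_S, FEDERATION_GB_PER_S))   # idealized lower bound
```

Under those assumptions the record needs on the order of several dozen links running flat out, which is small next to a 1,536-port switch.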
  • Can I use it? (Score:4, Interesting)

    by ShieldW0lf (601553) on Friday March 10 2006, @01:24PM (#14891696) Journal
    Is this stuff available in a fashion where we might see it ported for use on standard x86 hardware? Is it GPL'd? I want this in my living room!
  • Fast Stuff (Score:4, Funny)

    by britneysimpson (960285) on Friday March 10 2006, @01:27PM (#14891719) Homepage Journal
    Wow that's fast stuff, plus with the ability to slow light to save energy IBM should have some great new systems coming out!
  • by Anonymous Coward on Friday March 10 2006, @01:28PM (#14891741)
    I thought this article was going to be about IBM's HPFS from OS/2.
  • by Nom du Keyboard (633989) on Friday March 10 2006, @01:28PM (#14891742)
    I'm surprised that the content industries (read **AA) let them release this. After all, everyone knows that the only reason for large amounts of writable storage is to store stolen content and deprive artists of their just rewards. All things considered, I'm also surprised that IBM doesn't have to close a non-existent Analogue Hole, nor implement a Broadcast Flag to prevent the storage of infringing materials.

    That aside, how do I get one for my TiVo?

  • by frankie (91710) on Friday March 10 2006, @01:31PM (#14891766) Journal
    ...let's see if I can, never having heard of GPFS before 10 minutes ago:
    • GPFS is not new; GPFS 1.0 dates to 1998
    • IBM is touting its latest point update, v2.3
    • analogy: desktop PC is to BlueGene as RAID is to GPFS cluster

    It's basically data striping across 1000 disks. I suppose the hard part is coordinating all of that parallelism.

    So, could someone who actually knows this stuff tell me how well I did?
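
The "data striping across 1000 disks" idea can be sketched in a few lines of Python. This is a generic round-robin layout, not GPFS's actual allocation scheme; the 256 KiB block size, the disk count, and the function name `locate` are assumptions made up for illustration.

```python
BLOCK_SIZE = 256 * 1024   # hypothetical stripe unit in bytes
NUM_DISKS = 1000          # hypothetical disk count, echoing the comment


def locate(offset: int, block_size: int = BLOCK_SIZE, num_disks: int = NUM_DISKS):
    """Map a byte offset in a file to (disk index, byte offset on that disk).

    Consecutive blocks go to consecutive disks, wrapping around after the
    last disk, so a large sequential read touches every disk in turn.
    """
    block = offset // block_size           # which logical block holds the byte
    disk = block % num_disks               # round-robin choice of disk
    stripe = block // num_disks            # how many full passes came before
    return disk, stripe * block_size + offset % block_size


# The first 1000 blocks land on 1000 different disks and can be read in parallel.
disks_touched = {locate(i * BLOCK_SIZE)[0] for i in range(NUM_DISKS)}
print(len(disks_touched))
```

The mapping itself is trivial; the coordination the comment mentions (locking, metadata, failure handling across nodes) is where a real cluster file system spends its effort, and none of that is modeled here.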

  • by Anonymous Coward on Friday March 10 2006, @01:39PM (#14891825)
    GPFS FAQ - http://publib.boulder.ibm.com/infocenter/clresctr/index.jsp?topic=/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfs_faqs.html [ibm.com]

    GPFS Whitepaper - http://www-03.ibm.com/servers/eserver/pseries/software/whitepapers/gpfsprimer.pdf [ibm.com]

    "GPFS is a cluster file system providing normal application interfaces, and has been available on AIX® operating system-based clusters since 1998 and Linux operating system-based clusters since 2001. GPFS distinguishes itself from other cluster file systems by providing concurrent, high-speed file access to applications executing on multiple nodes in an AIX 5L cluster, a Linux cluster or a heterogeneous cluster of AIX 5L and Linux machines. The processors supporting this cluster may be a mixture of IBM System p5(TM), p5 and pSeries® machines, IBM BladeCenter(TM) or IBM xSeries® machines based on Intel® or AMD processors. GPFS supports the current releases of AIX 5L and selected releases of Red Hat and SUSE LINUX Enterprise Server distributions. See the GPFS FAQ for a current list of tested machines and also tested Linux distribution levels. It is possible to run GPFS on compatible machines from other hardware vendors, but you should contact your IBM sales representative for details.

    GPFS for AIX 5L and GPFS for Linux are derived from the same programming source and differ principally in adapting to the different hardware and operating system environments. The functionality of the two products is identical. GPFS V2.3 allows AIX 5L and Linux nodes, including Linux nodes on different machine architectures, to exist in the same cluster with shared access to the same GPFS file system. A cluster is a managed collection of computers which are connected via a network and share access to storage. Storage may be shared directly using storage networking capabilities provided by a storage vendor or by using IBM supplied capabilities which simulate a storage area network (SAN) over an IP network.

    GPFS V2.3 is enhanced over previous releases of GPFS by introducing the capability to share data between clusters. This means that a cluster with proper authority can mount and directly access data owned by another cluster. It is possible to create clusters which own no data and are created for the sole purpose of accessing data owned by other clusters. The data transport uses either GPFS SAN simulation capabilities over a general network or SAN extension hardware.

    GPFS V2.3 also adds new facilities in support of disaster recovery, recoverability and scaling. See the product publications for details."

  • binary prefixes (Score:5, Insightful)

    by Lord Ender (156273) on Friday March 10 2006, @02:03PM (#14892017) Homepage
    The submitter and editors need to learn their numeric prefixes. Come on! This web site is supposed to be for people who understand computer technology!

    A petabyte == 1000 terabytes
    A pebibyte == 1024 tebibytes

    Please see the NIST definition page:
    http://physics.nist.gov/cuu/Units/binary.html [nist.gov]
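
The gap between the decimal and binary readings is easy to put in numbers. A minimal Python sketch follows; the constant names are chosen here for illustration, and the 1.6 figure is the file system size from the story.

```python
# Decimal (SI) and binary (IEC) units, in bytes.
TERABYTE = 10**12
TEBIBYTE = 2**40
PETABYTE = 1000 * TERABYTE   # 10**15
PEBIBYTE = 1024 * TEBIBYTE   # 2**50


def difference_percent(binary_unit: int, decimal_unit: int) -> float:
    """How much larger the binary unit is than the decimal one, in percent."""
    return (binary_unit / decimal_unit - 1) * 100


print(f"{PETABYTE:,} bytes in a petabyte")
print(f"{PEBIBYTE:,} bytes in a pebibyte")
print(f"{difference_percent(PEBIBYTE, PETABYTE):.1f}% larger")

# The story's 1.6 "petabytes" under each reading, in bytes.
print(f"{1.6 * PETABYTE:.3e} vs {1.6 * PEBIBYTE:.3e}")
```

At the peta scale the two readings differ by more than 12 percent, so on a file system this size the choice of prefix is worth a couple of hundred terabytes.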
    • Re:binary prefixes (Score:5, Informative)

      by Richard Steiner (1585) <rsteiner@visi.com> on Friday March 10 2006, @02:30PM (#14892300) Homepage Journal
      The new SI prefixes are nice and all, but there are three or four decades of prior usage that have to be unlearned before some of us will use them intuitively. Or at all. :-)

      Context-sensitive conversion of SI prefixes isn't all that difficult. Really. It's commonly understood that data is stored in powers of 2, and the subject is only relevant if (1) you're a sales type, or (2) you are being overly pedantic about an unwanted and unneeded SI standard.
  • by jm91509 (161085) on Friday March 10 2006, @02:05PM (#14892040) Homepage
    ZFS from Sun is 128-bit. According to this guy [sun.com]
    that's a whole load of data:

    "Although we'd all like Moore's Law to continue forever, quantum mechanics imposes some fundamental limits on the computation rate and information capacity of any physical device. In particular, it has been shown that 1 kilogram of matter confined to 1 liter of space can perform at most 10^51 operations per second on at most 10^31 bits of information [see Seth Lloyd, "Ultimate physical limits to computation." Nature 406, 1047-1054 (2000)]. A fully-populated 128-bit storage pool would contain 2^128 blocks = 2^137 bytes = 2^140 bits; therefore the minimum mass required to hold the bits would be (2^140 bits) / (10^31 bits/kg) = 136 billion kg.

    That's a lot of gear."
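
The arithmetic in the quoted passage can be reproduced directly. The sketch below assumes 512-byte blocks, which is what "2^128 blocks = 2^137 bytes" implies, and takes the 10^31 bits/kg density from the quote at face value; the variable names are illustrative.

```python
BLOCKS = 2**128
BYTES_PER_BLOCK = 2**9        # 512-byte blocks, implied by 2^128 blocks = 2^137 bytes
BITS_PER_BYTE = 8
BITS_PER_KG = 10**31          # storage limit quoted from Lloyd (2000)

total_bytes = BLOCKS * BYTES_PER_BLOCK
total_bits = total_bytes * BITS_PER_BYTE
mass_kg = total_bits / BITS_PER_KG

print(total_bytes == 2**137, total_bits == 2**140)
print(f"{mass_kg:.3e} kg")
```

Evaluated exactly, the division comes out a little under 1.4 x 10^11 kg, the same order of magnitude as the 136 billion kg quoted.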

    • by FreeUser (11483) on Friday March 10 2006, @02:50PM (#14892544) Homepage
      "Although we'd all like Moore's Law to continue forever, quantum mechanics imposes some fundamental limits on the computation rate and information capacity of any physical device. In particular, it has been shown that 1 kilogram of matter confined to 1 liter of space can perform at most 10^51 operations per second on at most 10^31 bits of information

      Um, no, that's wrong.

      Bremermann's Limit [wikipedia.org] is the maximum computational speed in the physical universe (as defined by relativity and quantum mechanical limitations) and is approximately 2 x 10^47 bits per second per gram (or, for those who prefer sexagesimal [jean.nu], one jezend [jean.nu], 60^11, bits per second per gram).

      Bousso's covariant entropy bound [elyseum.com] also called the holographic bound is a theoretical refinement on the Bekenstein Bound [wikipedia.org] that may define the limit of how compact information may be stored, based on current understanding of quantum mechanical limits, and is theorized to be equal to approximately one yezend [jean.nu] (60^37, or ~10^66) bits of information contained in a space enclosed by a spherical surface of 1 sq. cm.

      Given this, 1 kg of matter can perform approximately 2 x 10^50 bit operations per second, in a space much smaller than 1 liter. Of course, other physical constraints (non-quantum related) probably limit us to a couple of orders of magnitude less computation, in a couple of orders of magnitude more space, but of course what those limits might be is very speculative.
  • by localman (111171) on Friday March 10 2006, @02:31PM (#14892318) Homepage
    We used GPFS in our production environment for about 9 months in 2004/2005. We chose it specifically because it allowed several machines to share the file system (like NFS) but with file locking. It was also supposed to be very fault tolerant with no single point of failure. We set it up using a fiberchannel SAN.

    Unfortunately we had a lot of problems with it. For one, performance was quite bad in certain cases... doing an ls in a large directory would take a very long time. Doing finds would take a very long time. Once you had a specific file you wanted, opening and reading it was reasonable (though all disk ops were still on the slow side), but multi file operations lagged on the level of 10s of seconds or more. I think it was having to issue network checks to every machine in the set for each file or something.

    Also, the CPU usage was very high across all our machines, primarily from lock manager communications. It really taxed the system. And perhaps worst of all, it would cause crashes sometimes. A single machine in the set would die (usually a GPFS assert), and though that didn't break the set permanently, a multi-minute freeze on all disk reads would take place until the set determined the machine was unavailable. We spoke with IBM about all this stuff... provided debugging output and everything, we used the latest patches. But we never got the issues resolved. It was a very rough few months indeed. I probably averaged 4 hours sleep per night.

    When I say "slow" what am I comparing it to? In the end we switched to NFS and we came up with a somewhat clever way to avoid the need for file locking. NFS used the same SAN hardware, but had a single point of failure: the head server. We doubled up there with warm failover. The load on all servers dropped dramatically (I'm talking from ~40 load to ~.1 load). Disk operations were orders of magnitude faster. And we've not had a single NFS related lockup or failure in the past year and a half *knocks on wood*.

    Anyways -- GPFS probably has some good uses. But I would not recommend it for a very high-volume (lots of files, lots of traffic) mission critical situation. Unless they've made some major improvements.

    Cheers.
    • Re:Well.... (Score:5, Funny)

      by ackthpt (218170) * on Friday March 10 2006, @01:42PM (#14891861) Homepage Journal
      At least someone can make a new filesystem... *cough* Microsoft *cough*

      Oh, come now. They just finished winning their latest legal round on FAT [slashdot.org]

      Give them a moment to catch their breath, will you?

      introducing OrigamiFS, you write it out on paper then fold it in half as many times as you can

    • SCREW THAT!!! ;-) (Score:4, Insightful)

      by Ossifer (703813) on Friday March 10 2006, @02:12PM (#14892115)
      Do you even read your own links?

      the exact number in common practice could be either one of the following:
      • 1,000,000,000,000,000 bytes -- 1000^5, or 10^15.
      • 1,125,899,906,842,624 bytes -- 1024^5, or 2^50.

      Real geeks use powers of two; powers of ten were only introduced for marketing purposes, which real geeks eschew.