Supercomputing Hardware

TeraGrid Gets an Upgrade

The Fun Guy writes to tell us that the NSF has awarded $48 million to the University of Chicago to operate and expand TeraGrid over the next five years. TeraGrid is 'a national-scale system of interconnected computers that scientists and engineers are using to solve some of their most challenging problems. TeraGrid is the world's largest open computer, storage and networking system. Only the U.S. Department of Energy's weapons laboratories have larger systems, which are dedicated to classified research.' Currently, the TeraGrid's power is just over 60 teraflops.
  • But (Score:2, Funny)

    by hungrygrue ( 872970 )
    Does it run Linux? Oh, I see it does... nevermind.
  • I like it - maybe they can compute pi to an even greater number of digits!!
  • How does this... (Score:4, Interesting)

    by Odin_Tiger ( 585113 ) on Wednesday September 21, 2005 @06:31PM (#13617561) Journal
    ...stack up against the likes of distributed.net and other similar projects for processing power?
    • Re:How does this... (Score:2, Informative)

      by 777v777 ( 730694 )
      As a user of the TeraGrid, as well as other huge machines: there are some embarrassingly parallel tasks, like SETI@home, which can easily be run on distributed systems. There are other problems where this is just out of the question. The TeraGrid clusters will be much better for those types of problems.

      Tightly coupled problems just cannot be run efficiently even on clusters of workstations (COWs). It is the age-old topic of using the right tool for the right job.
      • How much does the TeraGrid change things? The data connection to each site is much larger, but each site has a large number of computers too, which would seem to offset the link speed.

      • There are other problems where this is just out of the question... Tightly coupled problems just cannot be run efficiently even on clusters of workstations (COWs).

        If I can grab your attention for just a sec: do you know of any books [or treatises or papers] that deal with the question of whether [some given class of] problems might be provably non-parallelizable?

        Heck, if you could just give me a few keywords to Google, I'd be really grateful.

        Thanks.

        • One paper which might help point in the right direction is "Isoefficiency: Measuring the Scalability of Parallel Algorithms and Architectures" by Grama, Gupta, and Kumar. You pose a very interesting question. Any application where you have a large number of steps, each step relying upon the result from the previous step, and each step independently not parallelizable, would probably fit your description. I don't know of anything off the top of my head where you couldn't parallelize some portion of it…

          • Other things, like large matrix multiplications or FFTs or N-body problems, do not scale as well. In these cases, as you subdivide the problem into smaller pieces for your larger number of machines, the computation on each processor quickly becomes small while the communication between processors becomes more significant. (See the back-of-envelope sketch below.)

            But has anyone attempted to create a systematic [or systematizable] framework within which you might be able to prove that a certain problem was necessarily non-parallelizable?
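
            As a rough back-of-envelope illustration of that scaling argument (the 2D block layout here is an assumed example, not something from the comment above): multiplying two n x n matrices on p processors with a 2D block decomposition costs about 2n^3/p flops of computation per processor, but on the order of n^2/sqrt(p) words of communication per processor. The communication-to-computation ratio therefore grows like sqrt(p)/n: for a fixed problem size, adding processors eventually makes communication dominate, which is exactly why such problems stop scaling.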

            • I have a non-parallelizable algorithm for you. Apply a non-associative operation to the elements of an array, like this:

              result = (a[0] * (a[1] * (a[2] * (a[3] * (....)))))

              Note that I use * to represent some binary operator that is non-associative. I think this algorithm may be provably non-parallelizable, since the innermost * operation must be performed before any other * operation. Thus no two * operations can be done at the same time, and thus none of the * operations can be parallelized…
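
              To make that concrete, here is a minimal C sketch (the choice of subtraction as the non-associative operator and the array values are invented for illustration): the right fold forms a sequential dependence chain, while a balanced tree reduction - the shape a parallel machine wants - silently assumes associativity and gets a different answer.

              #include <stdio.h>

              /* Subtraction is a simple non-associative operator: (a-b)-c != a-(b-c). */
              static double combine(double a, double b) { return a - b; }

              int main(void) {
                  double a[8] = {5, 3, 8, 1, 9, 2, 7, 4};
                  int n = 8;

                  /* Right fold, as in the comment above: each combine needs the
                     previous result, so the chain is inherently sequential. */
                  double result = a[n - 1];
                  for (int i = n - 2; i >= 0; i--)
                      result = combine(a[i], result);
                  printf("sequential right fold:   %g\n", result); /* 19 */

                  /* A balanced tree reduction (what a parallel reduction computes)
                     re-parenthesizes the expression, which is only legal for an
                     associative operator. With subtraction the answer differs. */
                  double tree = combine(combine(combine(a[0], a[1]), combine(a[2], a[3])),
                                        combine(combine(a[4], a[5]), combine(a[6], a[7])));
                  printf("balanced tree reduction: %g\n", tree);   /* -9 */
                  return 0;
              }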
            • It has been attempted, but with modest success...

              The class of P-complete problems is widely held to be inherently sequential.

              Proving P-completeness amounts to proving that your problem is sequentially solvable in polynomial time (membership in class P), and proving that some known P-complete problem (the Circuit Value Problem is the usual starting point) reduces to it in polylogarithmic time on a parallel machine. That establishes that your problem is at least as hard to parallelize as the other P-complete problems.

              There is, to my knowledge, no formal proof that P-complete problems cannot be efficiently parallelized - that would amount to proving NC != P, which remains open…
      • TeraGrid is a mixed bag... A lot of our cluster needs don't scale much beyond 16-32 nodes. And we need 16-64 GB per node to handle the data sets. And we need fast storage, since the data sets we use are huge. With TeraGrid, we'd get bogged down just uploading the data.

        Just look at:
        http://access.ncsa.uiuc.edu/Releases/09.19.05_Berkeley_L.html [uiuc.edu]

        I mean, the Berkeley geophysics people had to build their own, so they called on NCSA people to help them build one.

        If you have a limited budget, NSF funding for most…
  • I want one (Score:5, Funny)

    by stunt_penguin ( 906223 ) on Wednesday September 21, 2005 @06:35PM (#13617581)
    but only if it comes in white....
  • TeraGrid is 'a national-scale system of interconnected computers that scientists and engineers are using to solve some of their most challenging problems.'

    Replace "challenging" with "Parralell"
    • Replace "challenging" with "Parralell"

      Replace "parralell" with "parallel"
      • What did you expect from someone who calls themselves "AvitarX"?

        You would think they would check the spelling before committing to a user name. Of course, I think many people name their children without knowing the first thing about English linguistics, and that is a rather more important decision.
        • I take no offence, I am too lazy to spell check my posts or learn to spell so what.

          But the name was mispelled to get an AOL screen name many years ago.

          PS. Anonymous coward, if you are going to waste your time defending/attacking people on /., do so with a name.
  • Imagine... (Score:2, Redundant)

    by merreborn ( 853723 )
    A beowulf cluster of these!
  • by gardyloo ( 512791 ) on Wednesday September 21, 2005 @06:40PM (#13617611)
    A good slashdotting should be just what they want to test their servers.
  • I wonder when the cluster will discover its own existence and decide that there is no need for the human race.
    • Just before someone plugs in an ethernet cord OF DOOM! [scifi.dk]

      Heh. Seriously, if a computer gone bad ever traps me in a building and tries to kill me off, the first thing I would do is take some power cords, cut them open, and wrap the stripped wires around each other. I would then go around the building and plug those wires into any outlets I could find. The resulting short would ground the building's power, eventually overloading the circuit breakers and disabling all of the security devices that aren't…
      • And you'd then be completely screwed, as a serious security door will have deadlocks to stop you doing just that!

        Just don't fry the comms kit, so someone outside can hopefully come and find you...
      • Hmm, all the etherkillers I've seen before have had a joint in them, but I can't see one in that one.

        The mains plug is moulded, and I'd think it would be damn near impossible to terminate mains flex in an RJ45; I also don't see any wires running down the RJ45, though it's hard to tell.

        I think they just stuffed the mains flex into the boot of the RJ45 without actually terminating it - not really made an etherkiller.
    • The TeraGrids have already won.
  • by malraid ( 592373 ) on Wednesday September 21, 2005 @06:43PM (#13617638)
    ...Windows Vista!!

    Had to say it, sorry!
  • ...giving a whole new meaning to Teraflops.
  • ... until I make that system my personal zombie!
  • by alienfluid ( 677872 ) on Wednesday September 21, 2005 @06:50PM (#13617674) Homepage

    Hah... you could just put 60 Xbox 360s together to achieve that kind of power.



    xbox 360 specs [xbox.com]


    • by FLAGGR ( 800770 ) on Wednesday September 21, 2005 @07:07PM (#13617779)
      Just in case you weren't joking:

      Microsoft can say they get x TFLOPS, and let's pretend for a moment that they are theoretically telling the truth. In reality, things *never* reach their theoretical potential because of bottlenecks - unless, of course, you're building supercomputers and have millions to invest. Microsoft and Sony can play the TFLOPS game, but in the end the consoles aren't that powerful (as Ars Technica has reported based on developer comments, they are much closer to 2-3 times the current generation in power).

      I know you were probably joking, but you were modded insightful and I couldn't help myself.
    • by CTho9305 ( 264265 ) on Wednesday September 21, 2005 @07:11PM (#13617806) Homepage
      CPU Game Math Performance
              * 9.6 billion dot product operations per second

      9.6 GFLOPS × 60 = 576 GFLOPS. That's not even 1 TFLOPS, let alone 60 TFLOPS - you're off by two orders of magnitude.
    • The Xbox 360 is going to use a Cell processor. The Cell it is going to use is capable of 256 GFLOPS single-precision and 25 GFLOPS double-precision.

      Double precision is all that matters in most scientific apps.

      25 × 60 = 1500 GFLOPS = 1.5 TFLOPS.
      You'd need 2400 Xbox 360s to get to 60 TFLOPS.

      Also, Xbox 360s have 512 MB of RAM. This would not make for a very useful cluster node.
  • So they have computer systems larger than TeraGrid for weapons research? Imagine if the Department of Energy applied those resources to improving or replacing gasoline, supplying California's nearly insatiable demand, creating more efficient power...

    ...you know, developing sources of energy.

    • by njcoder ( 657816 ) on Wednesday September 21, 2005 @07:06PM (#13617772)
      "Imagine if the Department of Energy applied those resources to improving or replacing gasoline, supplying California's nearly-insatiable demand, creating more efficient power..."

      Imagine if they used it to make ice cream!

    • I imagine if they just didn't run this grid there'd be more than enough energy to go around.
    • DoE currently has a 136 TFLOPS cluster [top500.org]. They use it for more than nuclear weapons research.
      • I didn't say anything about nuclear weapons, but I suppose that's what the energy department would be researching in that field.

        I'm just suggesting that perhaps weapons could be left entirely to the Department of Defense, the Department of Homeland Security, and/or the military.

        Then all of the power of that cluster would be available for energy-related research and analysis.

        • I'm not sure that solving the world's energy problems has an application that would benefit from running on a cluster. Clusters are only good at certain things - obviously crunching a lot of data, sometimes data streaming in real time. For instance, when searching for new oil, companies will place hundreds of microphones in the ocean, then detonate a small amount of TNT and record the reflections off the ocean floor. This data needs to be processed in real time, and many calculations need to be done on the…
    • That would be a great idea -- if the major bottleneck in developing new sources of energy were a shortage of computational cycles. I'm no expert on the subject, but I'd be surprised if it were.
    • One of the DoE's primary marching orders is nuclear weapons research. They need computers that large to simulate nuclear explosions to determine yields, burst effects, etc. I'd much prefer them doing this on a supercomputer than testing the warheads out in the South Pacific.
    • ...you know, developing sources of energy.
      But maybe Nuclear Weapons research is developing sources of energy. Perhaps the DoE's plan is to turn the entire Middle East into a solid sheet of glass and then go and steal all the oil.
  • by Salis ( 52373 ) on Wednesday September 21, 2005 @06:52PM (#13617691) Journal
    It makes me smile.

    It's just so ... cool.

    (And the only people who I say that to are my research group members and ... the people of Slashdot!)

    The TeraGrid is well managed too... very few problems for such a huge system.
    • I'm curious - could you please let me know what you're doing with it? And whether any special approach was needed to make best use of the system?

      In other words, suppose I threw a standard text-processing job at it (trawl through 2 gigs of disparate log files, correlate, spit out a unified log). Simple enough task, and on the machine I use it takes about 4 minutes to complete. If I took that and ran it on the TeraGrid with no special thought to that environment (it's a single-threaded Perl script), would it run any faster?

      • It's an MPI environment. MPI stands for Message Passing Interface; it's a library of subroutines that let you send data from one processor to another. So if you wanted to do a 2 GB text-processing job, you could divide the text file into 500 chunks and have each processor perform some function on its chunk. Then you would collect all of the results together and save them.

        Certain programs are more 'parallelizable' than others. The programs I run (and code) are very 'embarrassingly parallel'. That means the chunks are essentially independent and need almost no communication between processors…
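
        For anyone curious what that chunk-and-combine pattern looks like, here is a minimal MPI sketch in C (the toy buffer, the "ERROR" predicate, and the even-split indexing are invented for illustration; a real job would have each rank read its own slice of the log files):

        #include <stdio.h>
        #include <string.h>
        #include <mpi.h>

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);

            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            /* Stand-in for gigabytes of log data; every rank sees the same buffer. */
            const char *data = "ERROR ok ok ERROR ok ERROR";
            int n = (int)strlen(data);

            /* Split [0, n) into `size` nearly equal chunks, one per rank. */
            int begin = rank * n / size;
            int end   = (rank + 1) * n / size;

            /* Count "ERROR" occurrences that fall entirely inside this chunk.
               (Matches straddling a chunk boundary are missed; fine for a sketch.) */
            long local = 0;
            for (int i = begin; i + 5 <= end; i++)
                if (strncmp(data + i, "ERROR", 5) == 0)
                    local++;

            /* Combine the per-rank partial counts on rank 0. */
            long total = 0;
            MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("approximate ERROR count: %ld\n", total);

            MPI_Finalize();
            return 0;
        }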
      • As suggested by another post, you would want to parallelize your job to make it run on some number of the machines simultaneously; using MPI would be one way of doing this. However, let's say you have an application which generates terabytes of data and then processes it - for that, a system like this, with tons of fast storage and high-bandwidth networks, would be useful.
      • A single-threaded Perl script would use one CPU on one node. The individual CPUs are pretty good, but only comparable to, say, a modern high-end PC.

        If you wanted to run 500 copies of your single-threaded Perl script, they'd probably all finish in about 4 minutes -- but that doesn't make very good use of the system. You could get the same results using something along the lines of SETI@Home or distributed.net.

        What makes TeraGrid special is that it's a whole bunch of CPUs (along with lots of memory, disk, and a fast, low-latency interconnect)…
    • The TeraGrid is well managed too... very few problems for such a huge system.

      For $48 million, one should hope so. There are "national assets" of other federal agencies that don't get anywhere near that kind of funding for managing much more data. It really sounds like someone brought home the bacon from Congress.

      • $48 million isn't much for multiple supercomputers.

        Did you look at the price tag for IBM's Blue Gene or Japan's Earth Simulator? Yeah...much more than $48 mil.

        And I can't use either of those nice government funded ones! Bastards. They won't even accept an allocation request. ;)

        "But I only want to use 15 000 processors for an hour!" Heh.
    • > The TeraGrid is well managed too... very few problems for such a huge system.

      As an admin on the UC/ANL Teragrid cluster, I thank you for the compliment. Keep computin'!
    • How does one get an account on TeraGrid?
      • If you're a member of an academic institution, you can just submit a proposal to the NSF to apply for time on the TeraGrid. I've heard at conferences that it's quite easy to get time, provided you give them a good account of the time complexity of the algorithm you want to run on their machines. And of course, that you attempt to answer a science question :-)
    • by Anonymous Coward
      The TeraGrid is well managed too... very few problems for such a huge system.

      Actually, it isn't. Each site has its own way of doing things, its own software stack (NMI alleviates some of this), and its own particular configuration. It translates to a bunch of clusters interconnected by a high-bandwidth, low-latency network.

      Ever run a cross-site application?
  • I really hope ... (Score:1, Interesting)

    by Anonymous Coward
    ... that some of that money is going to go towards securing the system [rawstory.com]. :-\
    • I wouldn't take the rawstory.com story too seriously.

      For one thing, the author seems to think that "Grid" and "TeraGrid" are the same thing. A Grid is a generic term for a set of computer resources, possibly spread across multiple administrative domains, working together using Grid software (such as Globus http://www.globus.org/ [globus.org]). The TeraGrid is one specific Grid project.

      Beyond that, I don't know why he thinks the Department of Homeland Security has anything to do with this. The TeraGrid is not, as far as I know, a Department of Homeland Security project…
  • A Beowulf cluster of those!

    Oh, wait...
  • The NSA certainly doesn't have computers anywhere near this. No siree.
  • Wasn't there a movie about a computer like this? Get it up to about 100 teraflops and then throw in a little AI, and next thing you know, it will only talk to you if you call it cybernet...
  • This round of TeraGrid funding is $150 million. PSC [psc.edu] got $52 million. The rest is split up among the other 7 institutions. The other major partners already got big awards under earlier rounds of funding.

    Oh, and this news is a month old.

  • I understand it's 'powerful'... but will it run Half-Life 2: Lost Coast?
  • I looked on their site, and I can't see anything about getting an account (except for a form to fill out to add users to an existing account).
  • Ranked: #38, Name: TeraGrid, Itanium2 1.3/1.5 GHz, Owner: NCSA, Country: United States, Year built: 2004, Interconnect technology: Myrinet, Number of processors: 1776, Manufacturer: IBM

    Ranked: #1, Name: BlueGene/L eServer Blue Gene Solution, Owner: DOE/NNSA/LLNL, Country: United States, Year built: 2005, Number of processors: 65536, Manufacturer: IBM.

    Is it kind of weird that in /usr/include/limits.h:

    /* Maximum value an `unsigned short int' can hold. (Minimum is 0.) */
    # define USHRT_MAX 65535

    and…
    • I'm confused as to what the problem is here... I suppose each processor has an integer ID, ranging from 0-65535 (for a total of 65536 ID numbers). It seems like they simply filled the system to the maximum.
    • That is a convenient number if you look at the BlueGene configuration. BlueGene/L at LLNL has 64 racks, each with 1024 processors: 64*1024 = 65536. Or maybe it is 32 racks with 2048 processors - the counting of these things is ambiguous, but each rack has 1024 nodes (each node having two processors), and each node can be used in two modes, including a coprocessor mode where one processor just does network stuff. All this information is public, so you can search for it on Google. It is just a happy power of two that you found.
    • USHRT_MAX has to be one less than a power of two on any system that uses binary integers (i.e., on just about any system in the real world); 65535 is the minimum allowed value. It's not a significant limitation; if you need bigger numbers, use a bigger type.

      For various reasons, the number of nodes in a system is often conveniently some power of 2.

      The fact that USHRT_MAX+1 and the number of nodes in the BlueGene/L system happen to be the same power of 2 is purely coincidental. It's conceivable, I suppose,…
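
      To spell the coincidence out, a trivial C check (nothing here beyond standard limits.h):

      #include <stdio.h>
      #include <limits.h>

      int main(void) {
          /* USHRT_MAX must be 2^k - 1 for some k >= 16; on typical systems k == 16. */
          printf("USHRT_MAX     = %u\n", (unsigned)USHRT_MAX);     /* 65535 */
          printf("USHRT_MAX + 1 = %u\n", (unsigned)USHRT_MAX + 1); /* 65536 */
          printf("1 << 16       = %d\n", 1 << 16);                 /* 65536: BlueGene/L's processor count */
          return 0;
      }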
  • Somebody make sure Cyberdyne Systems was NOT one of the vendors involved...
  • In case you are wondering what the DOE needs teraflop weapons computers for...

    think ICBM, baby... http://www.sandia.gov/media/online.htm [sandia.gov]
  • Hmmm, TeraGrid... can anyone say 'Skynet'?
