Hardware Technology

Grid Processing

Posted by Hemos
from the all-together-now dept.
c1ay writes "We've all heard the new buzzword 'grid computing' quite a bit in the news recently. Now EE Times reports that a team of computer architects at the University of Texas plans to develop prototypes of an adaptive, grid-like processor that exploits instruction-level parallelism. The prototypes will include four Trips (Tera-op Reliable Intelligently Adaptive Processing System) processors, each containing 16 execution units laid out in a 4 x 4 grid. By the end of the decade, when 32-nanometer process technology is available, the goal is to have tens of processing units on a single die, delivering more than 1 trillion operations per second. In an age where clusters are becoming more prevalent for parallel computing, I've often wondered where the parallel processor was. How about you?"
  • by PakProtector (115173) <cevkiv@g[ ]l.com ['mai' in gap]> on Monday September 15, 2003 @08:53AM (#6962911) Journal
    To make a brick of these things, or some kind of cube, with massive processing power that one could just carry around and interface with via their PDA?

    Just think about carrying around something as fast as your desktop, if not faster, that fits in the palm of your hand.
  • by exebeoex (561339) on Monday September 15, 2003 @08:55AM (#6962928)
    A question for anyone with such experience:

    I assume it would be somewhat difficult to program such systems efficiently. I don't mean just getting programs to run, but getting the most bang for your buck. Can anyone here confirm or deny this? Also, does anyone know where to find resources on programming such machines (and no, I am not talking about SMP docs or Beowulf docs or even PVM docs)?
  • by pridkett (2666) <slashdot@wagCHEETAHstrom.net minus cat> on Monday September 15, 2003 @09:00AM (#6962953) Homepage Journal
    This is not an example of the Grid Computing (à la Globus [globus.org]) that we've been hearing about. This is another example of laying out processor cores on a chip. A better comparison would be to the ideas for the UltraSPARC V and IBM BlueGene computers, where multiple processing cores are put on one chip and then arranged in a grid (think physical grid) architecture.

    Grid Computing deals with sharing computation and information seamlessly across a network; they always used to compare it to how the power grid works. In reality that comparison is about right, since the power grid doesn't always work as advertised either.

    Anyway, Grid Computing is mainly concerned with software that allows multiple computers to work together seamlessly. This includes registry services, single sign-on, information transfer, etc.

    This appears to be the rather fortunate result of a phenomenon called "buzzword collision", where two different projects pick the same buzzword in hopes of really confusing people who don't read the articles and tricking PHBs into thinking that each project is ueberimportant.
  • by Tangurena (576827) on Monday September 15, 2003 @09:04AM (#6962966)
    Transputers were processors designed from the ground up for parallel processing. They have been around for years, but no one in America noticed them, so they did not exist. I am surprised at the constant reinvention of the wheel because of the NIH (Not Invented Here) principle.

    There are some programming languages designed for parallelism. The biggest hassle is efficiently partitioning problems into something parallel. Not every problem can be solved faster by doing more of it at once.
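    One standard way to put a number on "not every problem gets faster by doing more of it at once" is Amdahl's law (the comment doesn't name it; bringing it in here is an addition). If a fraction p of the runtime parallelizes perfectly across N execution units and the rest stays serial, the speedup is at most 1 / ((1 - p) + p / N). A minimal sketch:

```python
def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Upper bound on speedup when only part of the work runs in parallel.

    parallel_fraction: share of the original runtime that parallelizes perfectly (0..1).
    n_units: number of execution units sharing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_units)


# With 90% parallel work, the serial 10% caps the speedup below 10x,
# no matter how many execution units are on the die.
for n in (1, 16, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

    The bound ignores communication and load-balancing costs, so real speedups are usually lower still.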

  • by Rolken (703064) on Monday September 15, 2003 @09:08AM (#6962989)
    They do work on the same principle though. It's just that grid computing on a network involves processors that are vastly separated and consume different resources, whereas the "new" grid computing involves tightly bound, hardwired processors that share resources. It's not like you have to be an engineer to figure out the difference... and if you don't read about it and you get confused, that's your own fault. ;)
  • BS & hype (Score:5, Interesting)

    by master_p (608214) on Monday September 15, 2003 @09:10AM (#6963004)

    The prototypes will include four Trips processors, each containing 16 execution units laid out in a 4 x 4 grid. By the end of the decade, when 32-nanometer process technology is available, the goal is to have tens of processing units on a single die, delivering more than 1 trillion operations per second.

    At 32 nanometers, Intel could put tens of HT Pentium cores on a single chip, achieving the same result.

    "One key question is, Will this novel architecture perform well on a variety of commercial applications?"

    For computational problems that can be broken down into parallel computations, the answer is yes. For all other types of problems, the answer is no. Although I have to admit that most algorithmic bottlenecks are in iterative tasks that are highly parallelizable.

    On Trips, a traditional program is compiled so that the program breaks down into hyperblocks. The machine loads the blocks so that they go down trees of interconnected execution units. As one instruction is executed, the next one is loaded, and so on.

    *cough* EPIC *cough* VLIW architecture *cough*
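    The quoted description of Trips loading hyperblocks onto trees of interconnected execution units suggests dataflow-style execution inside each block: an instruction fires when its operands arrive rather than when a program counter reaches it. The toy sketch below illustrates only that idea; the instruction encoding and the function name `run_dataflow_block` are hypothetical, and real Trips hardware surely differs in many details. Instructions that become ready together form one "wave" that a grid of execution units could run in parallel.

```python
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL toy encoding, not the Trips instruction format:
# (result name, operation, names of the values the operation consumes)
Instr = Tuple[str, Callable[..., int], List[str]]


def run_dataflow_block(block: List[Instr], inputs: Dict[str, int]):
    """Execute a block in dataflow order: an instruction fires as soon as all
    of its operands exist, not when a program counter reaches it.

    Returns (values, waves); all instructions in one wave were ready at the
    same time, so they could execute in parallel on separate units.
    """
    values = dict(inputs)
    pending = list(block)
    waves: List[List[str]] = []
    while pending:
        ready = [ins for ins in pending if all(src in values for src in ins[2])]
        if not ready:
            raise ValueError("block has a cycle or a missing operand")
        for name, op, srcs in ready:
            values[name] = op(*(values[s] for s in srcs))
        fired = {name for name, _, _ in ready}
        waves.append([name for name, _, _ in ready])
        pending = [ins for ins in pending if ins[0] not in fired]
    return values, waves


# (a + b) * (c - d): the add and the subtract are independent, so they share a wave.
block = [
    ("sum", lambda x, y: x + y, ["a", "b"]),
    ("diff", lambda x, y: x - y, ["c", "d"]),
    ("prod", lambda x, y: x * y, ["sum", "diff"]),
]
values, waves = run_dataflow_block(block, {"a": 2, "b": 3, "c": 10, "d": 4})
print(values["prod"], waves)
```

    The contrast with EPIC/VLIW is roughly where the grouping happens: a VLIW compiler fixes which operations issue together, while here readiness is discovered at run time.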

    I support parallelism and I am looking forward to seeing it on my desktop, as it will increase the computational power of my computer tremendously. Unfortunately, it will mean new compilers and maybe programming languages that have primitives for expressing parallelism.

    By the way, the transputer [google.com] chip was promising. The idea of lots of computational units running in parallel is nothing new (maybe each memory block must have its own processor to locally process and compute the data).

  • by *weasel (174362) on Monday September 15, 2003 @09:17AM (#6963045)
    ... because nearly all programs are data-centric. Parallelizing the execution of code has an upper bound with regard to increased efficiency, particularly when considering the increased overhead in memory management and control flow.

    Parallelizing the data processing itself (e.g. SETI@home), whereby the data being worked on is spread amongst 'loosely parallel' execution units, is much more practical, and doesn't suffer from the overhead involved in creating parallel-processor servers, or even parallel-execution chips. It also alleviates the memory bottlenecks of parallel execution cores.
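    The data-parallel approach described above can be sketched as: split the data into independent chunks, process each chunk on its own, then combine the small partial results. In this minimal sketch, `analyze_chunk` is a hypothetical stand-in for real work. It uses a thread pool for brevity; under CPython's global interpreter lock, CPU-bound Python code would need `ProcessPoolExecutor` or separate machines for real parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split(data: Sequence[int], n_chunks: int) -> List[Sequence[int]]:
    # Cut the data into roughly equal, independent slices (ceiling division for the size).
    if not data:
        return []
    size = -(-len(data) // max(1, n_chunks))
    return [data[i:i + size] for i in range(0, len(data), size)]


def analyze_chunk(chunk: Sequence[int]) -> int:
    # HYPOTHETICAL stand-in for real per-chunk work; here it just sums squares.
    return sum(x * x for x in chunk)


def data_parallel(data: Sequence[int], n_workers: int = 4) -> int:
    # Each chunk is processed independently; only the partial results are combined.
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(analyze_chunk, chunks))
    return sum(partials)


print(data_parallel(list(range(1000))))
```

    No chunk needs to talk to another while it runs, which is what lets this style spread across loosely coupled machines.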

    I always wondered what kind of an app demands the kind of big iron that Cray and NEC churn out - that couldn't be more cost effectively realized through distributed processing amongst many independent computers (a la Google).

    It seems that even cyclical, result-dependent processing (weather prediction) could be coded to work in such a manner.

    1,000 bare-bones P4 3 GHz PCs (~$600 each) have more processing power (2,500 MFLOPS each) than a single X1 cabinet (819 GFLOPS @ $2.5M), and as you can see, for less than 1/4 of the cost.
    (2.5 TFLOPS @ $600,000 vs 819 GFLOPS @ $2.5M)
    (P4 MFLOPS hit 5,700 each w/ SSE2)
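    The cluster-versus-X1 comparison can be re-checked with a few lines of arithmetic. The figures below are only the ones quoted in the comment (peak ratings and 2003 prices as the poster gives them, not independently verified); sustained performance on communication-heavy workloads can differ a lot from peak.

```python
# Figures as quoted in the comment above (peak ratings, poster's prices).
pc_count = 1000
pc_price_usd = 600
pc_mflops = 2500          # per P4 3 GHz, without SSE2
x1_gflops = 819
x1_price_usd = 2_500_000

cluster_gflops = pc_count * pc_mflops / 1000   # MFLOPS -> GFLOPS
cluster_price = pc_count * pc_price_usd

print(f"cluster: {cluster_gflops:.0f} GFLOPS for ${cluster_price:,}")
print(f"cost per GFLOPS: cluster ${cluster_price / cluster_gflops:,.0f}, "
      f"X1 ${x1_price_usd / x1_gflops:,.0f}")
print(f"cluster price as a fraction of one X1: {cluster_price / x1_price_usd:.2f}")
```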

    Now I imagine there have to be exceptions. There -has- to be a reason to have such big iron for certain problems. There must be a reason that very smart people advise their superiors to buy up around $8b of this stuff each year.

    But I don't personally see the applications, and given the monumental cost of developing a new processor nowadays, the market doesn't seem to either.

    So that's my $0.02 on why complex, esoteric chips designed for parallel execution remain so rare.
  • by Adm1n (699849) on Monday September 15, 2003 @09:45AM (#6963253)
    Hypercube theory handles this quite well. Addressing would be n-dimensional. You can google hypercube and find lots of nifty SGI docs for their old Onyx architecture, but it also applies to Beowulfs, PVM, MPI, Cray, and any other massively parallel architecture. This would be a hypercube on a chip as opposed to a hypercube of chips. And I'm not going to mention the complexities of queueing theory, but at 32nm it's a doctoral thesis waiting to happen.
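    The comment doesn't spell out the addressing scheme, so the sketch below assumes the textbook binary hypercube: each of the 2^n nodes gets an n-bit address, and two nodes are wired together exactly when their addresses differ in one bit. Routing uses dimension-order ("e-cube") routing, a common choice but an assumption here. The function names are illustrative only.

```python
from typing import List


def neighbors(node: int, dims: int) -> List[int]:
    # Nodes of an n-dimensional hypercube have n-bit addresses; two nodes are
    # directly linked exactly when their addresses differ in a single bit.
    return [node ^ (1 << d) for d in range(dims)]


def ecube_route(src: int, dst: int, dims: int) -> List[int]:
    # Dimension-order ("e-cube") routing: correct the differing address bits
    # from the lowest dimension to the highest, one hop per bit.
    path = [src]
    node = src
    for d in range(dims):
        if (node ^ dst) & (1 << d):
            node ^= 1 << d
            path.append(node)
    return path


print(neighbors(0b000, 3))           # the three neighbors of node 000
print(ecube_route(0b000, 0b101, 3))  # 000 -> 001 -> 101
```

    A message never needs more than n hops, and each node needs only n links, which is why the topology scales better than any-to-any wiring.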
  • by Rufosx (693184) on Monday September 15, 2003 @10:24AM (#6963624)
    If this really was just a grid layout of cores on a chip, then no, I would not call it grid computing.

    But from looking at the diagram and rereading the article a few times, I think this goes far beyond that and approaches something that really could be called grid computing.

    Instead of just being issued instructions from a central control unit, these units seem to have far more developed abilities to communicate with each other and work together. Not just for the issuing of instructions, but during execution.
  • by AlecC (512609) <aleccawley@gmail.com> on Monday September 15, 2003 @10:27AM (#6963661)
    This is very much not new. The basic idea has come and gone several times in the last twenty years, to my knowledge. Both SIMD and MIMD systems have been tried several times. NCR even had one called the Grid, IIRC. Then there was Thinking Machines (as seen in Jurassic Park I). The Inmos transputer was designed for exactly this sort of connectivity. Intel had a development machine (?iWarp?) which tried to use it. And I am sure there were others that I don't recall. (As a user and fan of the transputer, I used to follow the field from a distance.)

    But the problem has always been the programming. Ordinary software does not map very well onto these architectures. Certain specific problems can be mapped well onto them, which results in spectacular performance claims for the system. But generally such systems perform well only on those problems for which they were specifically designed.

    Communications is a common reason for failure. They scale very badly. In the early days of development, the first few processors have any-to-any connectivity, so the application really flies. But since the connectivity rises as the square of the number of processors, this cannot hold for very long. As soon as connectivity becomes limited, communications bottlenecks start to appear, and you get processors being held up either sending messages or waiting for them to arrive. Buffering (which many did not implement in their communications architectures) helps, but it doesn't solve the problem. (A bit like lubrication: a small amount brings a considerable improvement in performance, but past a certain point it only adds to costs.)
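    The "rises as the square" point is easy to tabulate. Any-to-any wiring needs one link per pair of processors, n(n - 1)/2, whereas a design with a fixed number of links per processor (four, say, as on the transputer) grows only linearly. This is a simple counting sketch with illustrative function names:

```python
def full_mesh_links(n: int) -> int:
    # Any-to-any connectivity: one dedicated link per pair of processors, n*(n-1)/2.
    return n * (n - 1) // 2


def fixed_degree_links(n: int, degree: int = 4) -> int:
    # Each processor has a fixed number of links; each link joins two processors.
    return n * degree // 2


for n in (16, 64, 256, 1024):
    print(f"{n:5d} processors: full mesh {full_mesh_links(n):7d} links, "
          f"4 links each {fixed_degree_links(n):5d} links")
```

    The fixed-degree design stays buildable, but messages between distant processors must now be relayed through intermediate ones, which is where the bottlenecks described above come from.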

    Another problem is load balancing. It is very difficult to design your system so that you don't end up with most of the CPUs waiting for one overloaded CPU to finish its job. The only architecture that really worked was the farm model: a central dispatcher sends tasks to a "farm" of identical "workers", which request work units as and when they need them. This means that the whole code for the system has to be loaded into each worker; not necessarily a killer at today's memory prices, but it would be nice to be more efficient. It also requires the task to be divisible into a very large number of chunks that can be executed independently without too much communication. That is OK for large-volume simulations etc., but a disaster for (say) database programming or image/voice recognition.
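    The farm model above can be sketched in a few lines: a shared queue plays the central dispatcher, and identical workers pull the next work unit whenever they go idle. This is a minimal illustration, not how any transputer system was actually built; `run_farm` is a hypothetical name, and Python threads stand in for separate processors.

```python
import queue
import threading
from typing import Callable, List


def run_farm(tasks: List[int], work: Callable[[int], int], n_workers: int = 4) -> List[int]:
    """Processor-farm sketch: identical workers pull the next task whenever they are idle."""
    todo: queue.Queue = queue.Queue()  # plays the role of the central dispatcher
    for index, task in enumerate(tasks):
        todo.put((index, task))
    results: List[int] = [0] * len(tasks)

    def worker() -> None:
        # Every worker runs the same code, i.e. the whole program is loaded into each one.
        while True:
            try:
                index, task = todo.get_nowait()
            except queue.Empty:
                return  # no work left; this worker retires
            results[index] = work(task)  # each slot is written by exactly one worker

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_farm(list(range(10)), lambda n: n * n, n_workers=3))
```

    Because a worker asks for more only when it finishes, a slow work unit delays only the worker that drew it; the price is that every unit must be independent of the others.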

    It also doesn't help that not many people really think multi-threaded in their program design. Again, no one that I know of has a good object-oriented multi-threading model. Current models are analogous to either pre-structured programming or early structured programming, which means that people, reasonably, treat multi-threading as a dangerous monster to be approached only when absolutely necessary, with great care, and if possible in flame-proof armour. For this sort of system to be of much use, we need a development that does for current threading what inheritance did for pre-OO languages: something that makes it so simple that, once over the hump of initial unfamiliarity, people use it all the time without even thinking about it.

    I designed one of the larger heterogeneous transputer-based systems to ship: up to 100 transputers in 6 different roles. Load and communications balancing was a real hassle from the day the system first started to work for real, and we were constantly tuning buffers, fiddling with routing algorithms, and moving bits of processing from this CPU to that one to get the performance up. (Not to mention that Inmos completely blew their second-generation transputer, which we had been hoping would solve many of our problems.)
  • by AlecC (512609) <aleccawley@gmail.com> on Monday September 15, 2003 @10:32AM (#6963703)
    Obviously there's a lot of work to be done in parallel processing. You can hardly blame Inmos's problems on geography (or America for Inmos's problems). They looked very promising for a while, but just didn't keep up.

    Seconded, loudly. Inmos was a classic case of great engineering trashed by lousy management. When the transputer came out, it was fantastic, leading-edge stuff. But Inmos turned everybody off by saying that you had to use it their way and no other.

    What shows how good the transputer was is that it was still selling ten years after it first came out, when it had been overtaken and lapped several times by conventional CPUs. But that cannot go on forever: by the time they died, you could simulate a transputer on a conventional CPU that cost less but ran faster.

  • by pmz (462998) on Monday September 15, 2003 @03:30PM (#6966844) Homepage
    Good examples of parallel processing are the ones we know...

    On a coarser level this also includes any multi-user UNIX system that is actually used by multiple users. While not allowing per-person scaling, it allows very significant institutional scaling.
  • Good question (Score:3, Interesting)

    by epepke (462220) on Monday September 15, 2003 @03:38PM (#6966918)

    I spent 13 years at the Supercomputer Computations Research Institute, an interdisciplinary research institute whose job it was to figure such things out. Amongst other goodies, we had the first CM-2 (a SIMD box with 65536 processors) with floating-point chips, at the time the fastest machine in the world. We also had a homegrown machine for quantum chromodynamics. And a cluster with 150+ nodes, and some shared-memory machines, yada yada yada. Lots of stuff.

    So, from my experience:

    It's a little bit tricky to do. Sometimes you find an algorithm that someone abandoned fifty years ago that turns out to map better onto the hardware. However, it isn't all that tricky to do, and there are plenty of algorithms and libraries to make the job easier.

    But it still doesn't happen anyway, because even a small amount of work is more than no work at all. And besides, what people want to do is run their old dusty decks but just have them run faster. And in the meantime, Intel has just come out with a faster scalar processor, so why bother?

    The only thing I can see coming out of this is if, say, NVidia makes a faster graphics card based on it.
