Grid Processing 130
c1ay writes "We've all heard the new buzzword "grid computing" quite a bit in the news recently. Now the EE Times reports that a team of computer architects at the University of Texas plans to develop prototypes of an adaptive, gridlike processor that exploits instruction-level parallelism. The prototypes will include four Trips (Tera-op Reliable Intelligently Adaptive Processing System) processors, each containing 16 execution units laid out in a 4 x 4 grid. By the end of the decade, when 32-nanometer process technology is available, the goal is to have tens of processing units on a single die, delivering more than 1 trillion operations per second. In an age where clusters are becoming more prevalent for parallel computing, I've often wondered where the parallel processor was. How about you?"
Yep (Score:1, Funny)
And before anyone says it, no, I have never thought about a Beowulf cluster of those...
Re:Yep (Score:1)
!
Re:Terminator? (Score:1)
Re:For the rest of us (Score:3, Funny)
Which is why most of your tech jobs are being shipped overseas.
Re:For the rest of us (Score:1)
Re:For the rest of us (Score:1)
Interesting promise. I guess it depends on what you mean by "pretty much"...
Would it be possible... (Score:2, Interesting)
Just think about carrying around something as fast, if not faster, than your desktop that fits in the palm of your hand.
Re:Would it be possible... (Score:1)
Re:Would it be possible... (Score:1)
Battletoads anyone? (Score:2)
Re:Battletoads anyone? (Score:2)
Just out of curiosity.... (Score:5, Interesting)
I assume it would be somewhat difficult to program efficiently for such systems. I don't mean just getting programs to run, but getting the most bang for your buck. Can anyone here confirm or deny this? Also, does anyone know where to find resources on the topic of programming such machines (and no, I am not talking about SMP docs or Beowulf docs or even PVM docs)?
Re:Just out of curiosity.... (Score:2, Informative)
Re:Just out of curiosity.... (Score:2, Interesting)
Re:Just out of curiosity.... (Score:5, Informative)
E.g., who cares how many instructions you can process in parallel if module A requires data from module B? In these cases parallelisation is limited to making each module run faster (if it doesn't have sub-dependencies, of course); the entire program doesn't benefit from the parallelisation.
Good examples of parallel processing are the ones we know - distributed apps like SETI@home, graphics rendering, etc.
Bad systems are everyday data processing systems - they typically work on a single lump of data at a time, in sequence.
A good source of parallel programming is http://wotug.ukc.ac.uk/parallel/ or, of course, google.
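The parent's distinction can be sketched in Python (an illustrative toy only - the workload and function names here are made up, not from any real application):

```python
# Hypothetical sketch: one parallel-friendly workload, one not.
from multiprocessing.dummy import Pool  # a thread pool is enough for a demo

def process_chunk(chunk):
    # Each chunk is independent of every other: classic SETI@home-style
    # data parallelism, so chunks can run on separate processors.
    return sum(x * x for x in chunk)

def dependent_steps(data):
    # "Module A requires data from module B": each step consumes the
    # previous result, so extra processors cannot shorten the chain.
    acc = 0
    for x in data:
        acc = acc * 2 + x
    return acc

chunks = [list(range(i, i + 100)) for i in range(0, 1000, 100)]
with Pool(4) as pool:
    partials = pool.map(process_chunk, chunks)  # chunks run concurrently
total = sum(partials)
```

The first workload scales with the number of workers; the second is a pure dependency chain and gains nothing, which is exactly the parent's point.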
Re:Just out of curiosity.... (Score:3, Interesting)
On a coarser level this also includes any multi-user UNIX system that is actually used by multiple users. While not allowing per-person scaling, it allows very significant institutional scaling.
Re:Just out of curiosity.... (Score:1)
Re:Just out of curiosity.... (Score:2)
Think of a computer system as virtual boxes within boxes. At the lowest level you have the physical logic gates that make up the processor. Above that level you have the microcode, a very simple application that forms the underlying structure of a virtual von Neumann machine and associated extensions - accessible via an assembler.
Fortran 95 oddly enough is multi-processor aware. (Score:5, Informative)
For parallel processing, Fortran boasts many language-level features that give ANY code implicit parallelism, implicit multi-threading and implicit distribution of memory WITHOUT the programmer cognizantly invoking multiple threads or having to use special libraries or overloaded commands.
An example of this is the FORALL and WHERE statements that replace the usual "for" and "if" in C.
FORALL (I = 1:5)
  WHERE (A(I,:) > 0)
    A(I,:) = log(A(I,:))
  ENDWHERE
  call some_slow_disk_write(A(I,:))
END FORALL
The FORALL runs the loop with the variable "I" over the range 1 to 5, but in any order, not just 1,2,3,4,5, and of course it can be done in parallel if the compiler or OS, not the programmer, sees the opportunity on the run-time platform. The statement is a clue from the programmer to the compiler not to worry about dependencies. Moreover the program can intelligently multi-thread so the slow disk-write operation does not stall the loop on each iteration.
The WHERE is like an "if" but tells the compiler to map the conditional over the array in parallel. What this means is that you can place conditional tests inside loops and the compiler knows how to factor the "if" out of the loop in a parallel and non-dependent manner.
Moreover, since the WHERE and FORALL tell the compiler that there are no memory-dependent interactions it must worry about, it can simply distribute pieces of the A array to different processors without having to maintain concurrency between the array pieces used by different processors, thus eliminating shared-memory bottlenecks.
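For readers who don't speak Fortran, here is a rough pure-Python analogue of the WHERE/FORALL idea (an illustrative sketch only - Python of course does none of Fortran's implicit parallelisation; the point is the element-independence):

```python
import math

def where_log(row):
    # Elementwise WHERE: take the log where the element is positive,
    # leave other elements untouched. No element depends on any other,
    # which is exactly what lets a Fortran compiler hand slices of A
    # to different processors without coherence traffic.
    return [math.log(x) if x > 0 else x for x in row]

A = [[0.5, 2.0, -1.0],
     [4.0, -3.0, 1.0]]

# The FORALL over rows: any order, or all at once, gives the same result.
A = [where_log(row) for row in A]
```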
Another parallelism feature is that the header declarations not only declare the "type" of variable
Other rather nice virtues of Fortran are that it uses references rather than pointers (like Java), and amazingly the syntax makes typos that compile almost impossible. That is, a missing +, =, comma or semicolon, the wrong number of array indices, etc. will not compile (in contrast to ==, ++, =+ and [][] etc.).
One sad reason the world does not know about these wonderful features, or repeats the myths about the Fortran language missing features, is GNU. Yes, I know it's a crime to criticize GNU on Slashdot, but bear with me here, because in this case they deserve some for releasing a non-DEC-compatible language.
For the record, ancient Fortran 77 as well as modern Fortran 95 DOES do dynamic allocation, support complex data structures (classes), and have pointers (references) in every professional Fortran compiler. Sadly GNU Fortran 77, the free Fortran, lacks these language features, and there is no GNU Fortran 95 yet. This lack prevents a lot of people from writing code in this modern language. If GNU g77 did not exist, the professional compilers would be much more affordable. So I hope some reader who knows about compiler design is motivated to give the languishing GNU Fortran 95 project the push it needs to finish.
In the age of ubiquitous dual processing, Fortran could well become a valuable scientific language due to its ease of programming and resistance to syntax errors.
Re:Fortran 95 oddly enough is multi-processor awar (Score:3, Informative)
Gnu Fortran 95 compilation server? (Score:2)
I wonder if there is some way someone could practically and legally set up a compilation server for F95 using a non-GNU Fortran. One could probably talk one of the proprietary compiler vendors (Portland Group or Absoft) into allowing this, since it would actually promote sales of their products.
The reason this would improve sales is that it would alleviate the dilemma programmers face. Writ
Re:Fortran 95 oddly enough is multi-processor awar (Score:2, Informative)
I wouldn't put much blame on GNU. Fortran 77 was a fairly unpleasant language, even before GNU existed. Compiler extensions sometimes helped but weren't too great for portability.
Not that I don't want to see a GNU Fortran 95, but if you can tolerate free as in beer software, Intel makes their fortran compiler availa
Re:Just out of curiosity.... (Score:2)
Re:Just out of curiosity.... (Score:1)
Memory is like an orgasm. It's a lot better if you don't have to fake it. -- Seymour Cray
Good question (Score:3, Interesting)
I spent 13 years at the Supercomputer Computations Research Institute, an interdisciplinary research institute whose job it was to figure such things out. Amongst other goodies, we had the first CM-2 (a SIMD box with 65536 processors) with floating point chips, at the time the fastest machine in the world. We also had a homegrown machine for quantum chromodynamics. And a cluster with 150+ nodes, and some shared memory machines, yada yada yada. Lots of stuff.
So, from my experience:
It's a little bit tri
Re:Good question (Score:1)
Uh oh, Terminator androids will rule the earth! (Score:4, Funny)
Don't say I didn't warn you!
This is not "Grid Computing" (Score:5, Interesting)
Grid Computing deals with computation and information sharing seamlessly across a network - they used to always say it's like how the power grid works. Which in reality is about right, as it doesn't always work as advertised.
Anyway, Grid Computing is mainly concerned with software to allow multiple computers to work together seamlessly. This includes registry services, single sign-on, information transfer, etc.
This appears to be the rather unfortunate result of a phenomenon called "buzzword collision", where two different projects pick the same buzzword in hopes of really confusing people who don't read the articles and tricking PHBs into thinking that each project is ueberimportant.
Re:This is not "Grid Computing" (Score:2, Interesting)
Buzzword collision (Score:1)
Re:This is not "Grid Computing" (Score:3, Interesting)
But from looking at the diagram and rereading the article a few times, I think this goes far beyond that and approaches something that really could be called grid computing.
Instead of just being issued instructions from a central control unit, these units seem to have far more developed abilities to communicate with each other and work together. Not just for the issuing of instructions, but during ex
What about Transputers? (Score:5, Interesting)
There are some programming languages designed for parallelism. Biggest hassle is efficiently partitioning problems into something parallel. Not all problems can be done faster by doing more of it at once.
connection machines too (Score:1)
The Connection Machine was another parallel computing system (64k little bitty processors hooked together into a grid) that had a flurry of excitement around it (almost 70 of them in operation at the peak of activity!) and then sorta died off. A lot of the problems with systems like this weren't really flaws in the basic idea, just economic issues. If you can make a cheap non-parallel system run some ugly hack of a solution to the problem in something semi-close to the time
Re:What about Transputers? (Score:3, Funny)
Re:What about Transputers? (Score:3, Funny)
pregnant on a rotation schedule, they will produce one baby per month, with some variance and the occasional miscarriage.
As a domain expert with years in parallel computing under my belt, I claim dibs on that job.
Re:What about Transputers? (Score:3, Informative)
We used transputers on quite a large number of projects right here at the University of Texas.
the NIH principle
Actually, the problem was that they were slow and complicated. They went so long between family upgrades that eventually we could replace a large array of transputers with a few regular CPUs. Not to mention that we can also get a handy little thing like an OS on general purpose CPUs.
programming languages designed for parallelism
Did I mention complicated? Occam w
Re:What about Transputers? (Score:5, Interesting)
Seconded, loudly. Inmos was a classic case of great engineering trashed by lousy management. When the transputer came out, it was fantastic, leading-edge stuff. But Inmos turned everybody off by saying that you had to use it their way and no other.
The thing that shows how good the transputer was is that it was still selling ten years after it first came out, when it had been overtaken and lapped several times by conventional CPUs. But that cannot go on forever - by the time they died, you could simulate a transputer on a conventional CPU that cost less but ran faster.
Re:What about Transputers? (Score:2)
Back to the OS. I think it was in use by Southampton University, and IIRC the m
Re:What about Transputers? (Score:2)
Helios. The Atari transputer workstation used it.
Sun may already be ahead of the game here(!) (Score:4, Informative)
Read about plans for Sun's "Niagara" core [theregister.co.uk]
I understand they hope to create blade systems using high densities of these multiscalar cores for incredible throughput.
There's your parallel/grid computing.
Re:Sun may already be ahead of the game here(!) (Score:2, Informative)
Grid computing? (Score:5, Informative)
And is exemplified by projects like MyGrid [man.ac.uk].
Grid confusion (Score:5, Informative)
Re:CPU 'Blackouts'? (Score:1)
What sort of computations will this be good at? (Score:4, Insightful)
I use parallel computing on a cluster, in which I divide up my computational domain into a number of chunks, and each chunk is farmed out to a processor. Communication between the processes is required at the chunk boundaries.
For this case, I see how my code is partitioned, and I also understand (on a general level, at least) what the limitations on speed are: information passed between the chunks.
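The chunk-plus-boundary scheme described above can be sketched serially in Python (a toy 1-D Jacobi-style smoothing pass; the names and decomposition are illustrative, not the poster's actual code - on a real cluster each chunk would live on its own node and the ghost cells would be exchanged via MPI):

```python
def smooth_step(u):
    # One smoothing pass: each interior point becomes the average of its
    # neighbours; endpoints are held fixed. Stencil width 1 means each
    # chunk only needs one "ghost" point from each neighbouring chunk.
    return [u[0]] + [(u[i-1] + u[i+1]) / 2 for i in range(1, len(u)-1)] + [u[-1]]

def smooth_by_chunks(u, nchunks):
    n = len(u)
    size = n // nchunks
    out = list(u)
    for c in range(nchunks):
        lo = c * size
        hi = (c + 1) * size if c < nchunks - 1 else n
        # Each worker gets its chunk plus one ghost cell on each side;
        # this halo exchange is the only communication needed per step.
        glo, ghi = max(lo - 1, 0), min(hi + 1, n)
        local = smooth_step(u[glo:ghi])
        off = lo - glo
        out[lo:hi] = local[off:off + (hi - lo)]  # keep only owned points
    return out

u = [0.0] * 8 + [8.0] * 8
v = smooth_by_chunks(u, 4)  # identical to smooth_step(u), chunked or not
```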
Now, how will this processor do its 'instruction level' parallelization? Will it be great at do loops (one 'do' per processor)? Will it be like a mini vector processor? What will break down the efficiency of the parallelization?
I have found that efficiency in parallelization is very application-dependent after about 8-32 processors. Will this break that barrier?
Most importantly, will it kick butt for MY applications?
Re:What sort of computations will this be good at? (Score:1)
If they're cheap and you can get the density up high enough maybe AES [nist.gov] won't last as long as we thought.
Gridlike Computing Vs Grid Computing (Score:3, Informative)
The article doesn't actually have anything to do with "grid computing", but the processor's design is like a grid. The term "grid computing" [globus.org] often refers to large-scale resource sharing (processing/storage).
Re:beowulf (Score:1)
This is usually limited by the amount of resources on the platforms. At times, this has been governed by such things as the number of open sockets the OS supported and/or how long it took to open all the connections to all the machines before RSH or the like started timing out and closing the connections.
Has anyone thought of making a beowulf cluster of beowulf clusters yet?
Yes. Grid computing encompasses this idea (things li
Re:beowulf (Score:1)
BS & hype (Score:5, Interesting)
The prototypes will include four Trips processors, each containing 16 execution units laid out in a 4 x 4 grid. By the end of the decade, when 32-nanometer process technology is available, the goal is to have tens of processing units on a single die, delivering more than 1 trillion operations per second.
At 32 nanometers, Intel could put tens of HT pentium cores on a single chip, achieving the same result.
"One key question is, Will this novel architecture perform well on a variety of commercial applications?"
For computational problems that can be broken down into parallel computations, the answer is yes. For all the other types of problems, the answer is no. Although I have to admit that most algorithmic bottlenecks are in iterative tasks that are highly parallelizable.
On Trips, a traditional program is compiled so that the program breaks down into hyperblocks. The machine loads the blocks so that they go down trees of interconnected execution units. As one instruction is executed, the next one is loaded, and so on.
*cough* EPIC *cough* VLIW architecture *cough*
I support parallelism and I am looking forward to seeing it on my desktop, as it will increase the computational power of my computer tremendously. Unfortunately, it will mean new compilers and maybe programming languages that have primitives for expressing parallelism.
By the way, the transputer [google.com] chip was promising. The idea of lots of computational units running in parallel is nothing new (maybe each memory block must have its own processor to locally process and compute the data).
Re:BS & hype (Score:3, Informative)
At 32 nanometers, Intel could put tens of HT pe
Re:BS & hype (Score:1)
One of the more interesting processor designs would be the FORTH based 25xC18 using 25 C18 cpu cores which could achieve up to 60,000 (!!!) MIPS using a very low power design. The 25xC18 was designed by Chuck Moore. The interesting thing about the FORTH processors is that they use an extremely small instruction set (~24 instructions) and require only ~10K transistors per CPU allowing for very fast and low power operation. It also allows one to add on-chip DRAM right next to the core allowing 1ns memory acce
Re:BS & hype (Score:1)
No, they couldn't, because HT pentium cores use way too much power to be packed in at that density. This (and other similar) research is based on using many simple (but fast) low-power cores, usually in an adaptive fashion. (e.g., for one app I use certain processor cores for one portion of processing, for another I use them for something else entirely - and the mapping is usually done explicitly eithe
How does this compare to VLIW? (Score:3, Insightful)
read the comments from the horse's mouth (Score:5, Informative)
-- emery berger, dept. of cs, univ. of massachusetts
Grid posting (Score:2)
Why parallel processors aren't common (Score:5, Interesting)
Parallelizing the data processing itself (e.g. SETI@home), whereby the data being worked on is spread amongst 'loosely parallel' execution units, is much more practical, and doesn't suffer from the overhead involved in creating parallel-processor servers, or even parallel-execution chips. It also alleviates the memory bottlenecks of parallel execution cores.
I always wondered what kind of an app demands the kind of big iron that Cray and NEC churn out - that couldn't be more cost effectively realized through distributed processing amongst many independent computers (a la Google).
It seems even cyclical, result-dependent processing (weather prediction) could be coded to work in such a manner.
1000 bare-bones P4 3GHz PCs (~$600) have more processing power (2500 MFLOPS each) than a single X1 cabinet (819 GFLOPS @ $2.5M), and, as you can see, for less than 1/4 of the cost.
( 2.55 TFLOPS @ $600,000 vs 819 GFLOPS @ $2.5M )
( p4 MFLOPS hit 5700 each w/ SSE2 )
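The poster's arithmetic roughly checks out (figures are the poster's circa-2003 numbers, reproduced here only to verify the comparison; 1000 x 2500 MFLOPS comes to 2.5 TFLOPS, close to the quoted 2.55):

```python
# Sanity-check of the poster's price/performance comparison.
pcs = 1000
pc_flops = 2500e6            # 2500 MFLOPS per bare-bones P4 (poster's figure)
pc_cost = 600                # dollars each

cluster_flops = pcs * pc_flops       # ~2.5 TFLOPS
cluster_cost = pcs * pc_cost         # $600,000

x1_flops = 819e9             # one Cray X1 cabinet (poster's figure)
x1_cost = 2.5e6

# FLOPS per dollar for each option: the cluster wins by over 10x,
# which is the poster's "less than 1/4 the cost" point, and then some.
cluster_value = cluster_flops / cluster_cost
x1_value = x1_flops / x1_cost
```

Of course, as the surrounding comments note, raw FLOPS per dollar ignores interconnect bandwidth, which is exactly where big iron earns its price.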
Now I imagine there have to be exceptions. There -has- to be a reason to have such big iron for certain problems. There must be a reason that very smart people advise their superiors to buy up around $8b of this stuff each year.
But I don't personally see the applications, and given the monumental cost of developing a new processor nowadays, the market doesn't seem to either.
So that's my $0.02 as to why more complex, esoteric parallel-execution chip designs remain so rare.
Re:Why parallel processors aren't common (Score:3, Insightful)
Many of the approaches to these problems take the form of a grid of elements that have local and possibly non-local interactions with each other. Each processor gets a subset of the points to work with and has to communicate with the neighboring processor's memory space to get information about neighboring points.
In a cluster, handling the points at the edges (or any non-local effec
Cray's Ideas (Score:1)
Re:Why parallel processors aren't common (Score:2)
AKA Reconfigurable Computing (Score:3, Informative)
Carly Fiorina (Score:2)
Re:Carly Fiorina (Score:1)
And the projected date and time for (Score:1)
Main memory bandwidth limits HPC today (Score:3, Insightful)
They don't seem to be considering business servers here, but they are more main memory latency limited than bandwidth limited, so multiple cores can help a lot. But you need more than simply lots of cores to have a good design. A critical thing to have is major software support which means using an existing ISA, not a new one.
So I'd expect this to be quite an obscure product in reality.
PS3? (Score:1)
Re:PS3? (Score:1)
Re:PS3? (Score:1)
Re:PS3? (Score:1)
The Parrallel Processor (Score:2)
I may be thinking in different terms than you, but my understanding of future chip design is that multiple CPU cores on one chip is basically becoming the norm. To some extent, is this what hyperthreading does on the newest Intel chips? I recall also reading the PPC G5 chip in the newer Macs has multiple processor cores.
So, to answer... where are parall
Re:The Parrallel Processor (Score:4, Informative)
OK, HT double-clocks the cache, so you have two caches for the price of one! The G5 is a multicore chip, as are Cell Linky [zive.net] and the Opteron; the difference (apart from the arch!) is the way VLIWs are fed to each of these. They are NOT parallel processors. Parallelism can be defined as the maintenance of cache coherence, either inclusive (Cray) or exclusive (RS/6000), and requires a lot of bandwidth (local x-bar versus network). Whereas parallel computers are not cache coherent and have a remote x-bar architecture; it all adds up to the same hypercube.
Re:The Parrallel Processor (Score:2)
Huh? I thought hyperthreading gave you a second instruction pipeline so that when the first one doesn't provide enough instructions for the processor's parallel execution units (which have been a feature since the Pentium), instructions from a second thread can be executed with the spare power.
Parallelism can be defined as the maintenance of cache coherence, either inclusive (Cray) or exclusive (RS/6000), and requires a lot o
Re:The Parrallel Processor (Score:1, Informative)
The Connection Machine (Score:3, Insightful)
Danny Hillis, the guy who founded Thinking Machines, designed a machine called The Connection Machine [base.com] (this story [svisions.com] has a cooler, more sci-fi lookin' pic of the old beastie [svisions.com]); the central design philosophy was to achieve MASSIVE computing power through parallelism. It had 65,536 procs, each of which lived on a wafer with DRAM thereon and a high-bandwidth connection to (if I remember correctly) up to 4 other procs. Young sir Danny wrote a book on his exploits, [barnesandnoble.com] well worth checking out (seemingly, it's been calling to me from my bookshelf for about a year now).
And as someone pointed out, it seems we've seen this topic before. [slashdot.org] I'd have modded him up, [slashdot.org] (hint, hint) but I really like mentioning the connection machine where appropriate.
Re:The Connection Machine (Score:2)
1-bit bit-slice processors on each chip, so a 65k-node CM-2, for example, had 4k "beta" chips. You could program it as a 65kb-wide VLIW machine. But TMC quickly discovered that the bulk of sales opportunities were related to the Cold War, and for that purpose what was wanting was not the vast symbol-pushing capacity of the CM-1, but lots and lots of FLOPS. So they added FPUs. The FPUs used blocks of 32 1-bit CPUs like MMUs. This led to enormous complications. In order
Die Yields (Score:1)
Re:Die Yields (Score:2, Informative)
Deja Vu all over again (Score:5, Interesting)
But the problem has always been the programming. Ordinary software does not map very well onto these architectures. Certain specific problems can be mapped well onto them, which results in spectacular performance claims for the system. But generally such systems perform well only on those problems for which they were specifically designed.
Communications is a common reason for failure. It scales very badly. In the early days of development, the first few processors have any-to-any connectivity, so the application will really fly. But since the connectivity rises as the square of the number of processors, this cannot hold for very long. As soon as connectivity becomes limited, communications bottlenecks start to appear, and you get processors being held up either sending messages or waiting for them to arrive. Buffering (which many did not implement in their communications architectures) helps, but it doesn't solve the problem. (A bit like lubrication - a small amount brings a considerable improvement in performance, but past a certain point, it only adds to costs.)
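The "square of the number of processors" claim is just the link count of a fully connected graph, which a one-liner makes concrete:

```python
# Any-to-any connectivity needs n*(n-1)/2 links: fine for a handful of
# processors, hopeless as n grows -- the scaling described above.
def full_links(n):
    return n * (n - 1) // 2

# 4 processors need 6 links; 1024 need over half a million.
for n in (4, 16, 64, 1024):
    print(n, full_links(n))
```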
Another problem is load balancing. It is very difficult to design your system so you don't end up with most of the CPUs waiting for one overloaded CPU to finish its job. The only architecture which really worked was the farm model - a central dispatcher sends tasks to a "farm" of identical "workers", which request work units as and when they need them. This means that the whole code for the system has to be loaded into each worker; not necessarily a killer at today's memory prices, but it would be nice to be more efficient. It also requires the task to be divisible into a very large number of chunks which can be executed independently without too much communication. OK for large-volume simulations etc., but a disaster for (say) database programming or image/voice recognition.
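The farm model described above can be sketched in a few lines of Python (illustrative only - a real farm would run workers on separate machines over MPI or sockets, not threads in one process; the squaring "work unit" is a stand-in):

```python
import queue
import threading

tasks = queue.Queue()      # the central dispatcher's work-unit queue
results = queue.Queue()

def worker():
    # Identical workers pull tasks as they free up, so a slow task
    # delays one worker, not the whole farm -- that is the load balancing.
    while True:
        item = tasks.get()
        if item is None:              # poison pill: shut this worker down
            break
        results.put(item * item)      # stand-in for a real work unit

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()
for i in range(100):                  # dispatcher hands out 100 work units
    tasks.put(i)
for _ in workers:                     # one pill per worker
    tasks.put(None)
for w in workers:
    w.join()

total = sum(results.get() for _ in range(100))
```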
It also doesn't help that not many people really think multi-threaded in their program design. Again, no-one that I know has a good object-oriented multi-threading model. Current models are analogous to either pre-structured programming or early structured programming. Which means that people, reasonably, approach multi-threading as a dangerous monster to be tackled only when absolutely necessary, with great care, and if possible in flame-proof armour. For this sort of system to be much use we need a development which does to current threading what inheritance did to pre-OO languages: something that makes it so simple that, once over the hump of initial unfamiliarity, people use it all the time without even thinking about it.
I designed one of the larger heterogeneous transputer-based systems to ship - up to 100 transputers in 6 different roles. Load and communications balancing was a real hassle from the day the system first started to work for real, and we were constantly tuning buffers, fiddling with routing algorithms, moving bits of processing from this CPU to that to get the performance up. (Not to mention that Inmos completely blew their second-generation transputer, which we had been hoping would solve many of our problems.)
Re:Deja Vu all over again (Score:1)
Read the Article (Score:3, Insightful)
Re:Read the Article (Score:2)
You've got a whole bunch of non-traditional processors and you try and divide the work between them.
The individual CPUs are, as you say, more flexible than current CPUs. Like hyperthread
project home page (Score:2, Informative)
They have some papers available there...
I'm sorry. (Score:2)
Internal parsing error reported.
I always thought the "single" processor paradigm has gone on way too long. I guess soon we'll be able to plug in multiple processors like we do RAM. But a question.
(1/R_eq) = (1/R1) + (1/R2) + (1/R3)
The Grid... (Score:2)
karma whoring. (Score:1)
Globus Toolkit [globus.org]
LSF [platform.com]
openPBS [openpbs.org]
gridengine [sunsource.net]
OSCAR [sourceforge.net]
ROCK MPP [rocklinux.net]
maui [supercluster.org]
and last but not least: beowulf cluster [beowulf.org]
---
Grid computing: get a clue (Score:1)
Links:
See also: Throughput Computing [sun.com]
Urgonomics (Score:1)
the difference being... (Score:1)
Ramifications for encryption? (Score:1)
I.e., how soon will the average processor available on the street be able to crack a 56-bit DES key? A 128-bit key? Will a 1024-bit key ever be crackable by brute force?
We keep hearing that "all the king's computers and all the king's men" could never crack 1024 bits by brute force in millennia of trying. But does the continued exponential advancement of computing power threaten this state o
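A back-of-envelope calculation puts the question in perspective (assuming, very generously, one key test per "operation" at the article's 1 trillion ops/sec target - a real key test costs many operations, so these are lower bounds on difficulty in the attacker's favour):

```python
# Expected brute-force time = half the keyspace / test rate.
rate = 1e12                       # assumed key tests per second (1 Tops/s)
year = 365.25 * 24 * 3600         # seconds per year

des_keys = 2 ** 56                # 56-bit DES keyspace
aes128_keys = 2 ** 128            # 128-bit AES keyspace

des_seconds = des_keys / (2 * rate)          # ~10 hours: DES is toast
aes_years = aes128_keys / (2 * rate) / year  # ~5e18 years: AES is not
```

So even a trillion-ops chip moves the needle on DES but not remotely on 128-bit keys, let alone 1024-bit ones.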
Not hard to program (Score:1)