IBM Releases Cell SDK
derek_farn writes "IBM has released an SDK running under Fedora Core 4 for the Cell Broadband Engine (CBE) processor. The software includes many GNU tools, but the underlying compiler does not appear to be GNU-based. For those keen to start running programs before they get their hands on actual hardware, a full system simulator is available. The minimum system requirement specification has obviously not been written by the marketing department: 'Processor - x86 or x86-64; anything under 2GHz or so will be slow to the point of being unusable.'"
Well . . . (Score:2, Funny)
Oh. Well, okay then.
Re:Well . . .Next question (Score:2)
Well, we know the answer to that. Next we want to know, will it kill Intel?
Wikipedia article question (Score:2, Insightful)
Why?
Re:Wikipedia article question (Score:2)
Re:Wikipedia article question (Score:5, Insightful)
Cell processors aren't really anything all that new per se. The multi-core design makes them superficially similar to GPUs (which are also vector processors), with the difference that GPUs use multiple pipelines for parallel processing whereas each cell is a self-contained pipeline capable of true multi-threaded execution. In theory, the interplay between these chips could accelerate a lot of the work currently done through a combination of software and hardware. For example, all the work that graphics drivers do to process OpenGL commands into vector instructions could be done on one or two cells, thus allowing those cells to feed the other cells with data.
I guess you could say that the cell processor is the start of a general-purpose vector processing design. I'm not really sure if it will take off, but the unbroken throughput on these things is just incredible.
Re:Wikipedia article question (Score:5, Insightful)
I'm sure IBM and Sony have much better documentation on the CPU than I do, but that's it in a nutshell. Everything else you hear about it is just marketing. Oh yeah, almost forgot. Microsoft's "Xenon" processor for the Xbox360 is pretty much just 3 of those stripped down, in-order PPC cores in one cpu die.
Re:Wikipedia article question (Score:2, Interesting)
You know, I'm looking back at all these replies to the poor guy, and I can't help but think that he's sitting in front of his computer wondering, "Can't anyone explain it in ENGLISH?!?"
For instance, you have to unroll your "for" loops to start, since those SIMD co-processors can't do loops.
Actually, we need a new program
Re:Wikipedia article question (Score:3, Funny)
0..9 { |i| puts i }
Re:Wikipedia article question (Score:2, Informative)
gcc and other compilers have options such as -funroll-loops, which will unroll loops (no matter how they were specified) for you if the count can be determined at compile time. So you wind up with "Do yo
Re:Wikipedia article question (Score:3, Informative)
As for the other posters, the real reason you want to unroll loops is basically to avoid the cost of managing the loop, e.g.
a simple loop like
for (a = i = 0; i < b; i++) a += data[i];
In x86 would amount to
mov ecx,b
loop:
add eax,[ebx]
add ebx,4
dec ecx
jnz loop
So only one of the four instructions does useful work: 25% efficiency at best. Now if you unroll it to
mov ecx,b
shr ecx,1
loop:
add eax,[e
Re:Wikipedia article question (Score:3, Interesting)
shr ecx,2
loop:
add eax,[ebx]
add eax,[ebx+4]
add eax,[ebx+8]
add eax,[ebx+12]
add ebx,16
dec ecx
jnz loop
With SIMD instructions, you can execute all four of those adds in one instruction. I wish I knew SSE a bit better, then I could rewrite the above. Sadly, I haven't gotten around to learning the precise syntax.
However, there's a fairly good (if a bit dated) explanation of SIMD here [mackido.com].
Re:Wikipedia article question (Score:2)
iirc the instruction is "paddd", and you'd do four parallel adds then shuffle and add twice to get the single sum.
Tom
Re:Wikipedia article question (Score:2)
mov ecx, b
shr ecx, 2
pxor xmm1, xmm1;
loop_:
movdqa xmm0, [ebx]
paddd xmm1, xmm0
add ebx, 16
dec ecx
jnz loop_
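For reference, the same idea can be written in C with SSE2 intrinsics from <emmintrin.h>. A hedged sketch, not from the thread: `simd_sum` is a made-up name, I use the unaligned load `_mm_loadu_si128` where the `movdqa` above would require 16-byte alignment, and n is assumed to be a multiple of 4. The two shuffle-and-add steps at the end are the horizontal sum mentioned upthread.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Sum an int array four lanes at a time with paddd.
   Assumes n is a multiple of 4. */
int simd_sum(const int *data, int n)
{
    __m128i acc = _mm_setzero_si128();                 /* pxor xmm1,xmm1 */
    for (int i = 0; i < n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)&data[i]); /* movdqu */
        acc = _mm_add_epi32(acc, v);                   /* paddd xmm1,xmm0 */
    }
    /* Horizontal sum: shuffle and add twice. */
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, _MM_SHUFFLE(1, 0, 3, 2)));
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(acc);                     /* low lane holds the total */
}
```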
Re:Wikipedia article question (Score:2)
LOOP_START:
LOOP LOOP_START
Re:Wikipedia article question (Score:2)
In most cases, I think template metaprogramming (in C++) is pedantic garbage. In this case, however, you could probably use it to great effect (i.e., the compiler will unroll your loops for you). The syntax is still pretty horrible, though.
Re:Wikipedia article question (Score:2)
(Must (resist (temptation (to (joke (about (syntax)))))))
Re:Wikipedia article question (Score:2, Funny)
Ack, pfft, says the evil Schemer. This is just insipid syntactic sugar for what you really mean:
instead of whatever dark magic your buggy
ends up being mangled into by your CommonLis
Re:Wikipedia article question (Score:2)
loop:
branch to loop
However, you can "unroll" loops. If you have a loop that always runs 8 times, instead of doing a for loop you can just put the statement there 8 times. It makes the code larger in memory, but it saves processing time since you don't have to check exit conditions or jump around. This would be something done by the compiler, so OP's po
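In C, the transformation described above might look like this (illustrative names, not from the thread; the unrolled version assumes the trip count is a multiple of 4, which is why compilers emit a cleanup loop for the remainder):

```c
/* Rolled: the loop-control work (increment, compare, branch)
   executes once per element. */
int sum_rolled(const int *data, int n)
{
    int a = 0;
    for (int i = 0; i < n; i++)
        a += data[i];
    return a;
}

/* Unrolled by 4: one increment/compare/branch per FOUR elements.
   Larger code, fewer exit-condition checks and jumps. */
int sum_unrolled(const int *data, int n)
{
    int a = 0;
    for (int i = 0; i < n; i += 4) {
        a += data[i];
        a += data[i + 1];
        a += data[i + 2];
        a += data[i + 3];
    }
    return a;
}
```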
Re:Wikipedia article question (Score:2)
Re:Wikipedia article question (Score:3, Informative)
You have this backwards. Optimizing compilers will turn tail-recursive style source into "normal" loops.
You can write a loop recursively, so that:
becomes
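One way to sketch the idea (a hypothetical sum; function names are made up): a tail-recursive version whose recursive call is the last thing the function does, which an optimizing compiler can rewrite as the plain loop below.

```c
/* Tail-recursive sum: the recursive call is in tail position, so an
   optimizing compiler (e.g. gcc -O2) can compile it as a jump with
   no stack growth. */
int sum_rec(const int *data, int n, int acc)
{
    if (n == 0)
        return acc;
    return sum_rec(data + 1, n - 1, acc + data[0]);
}

/* ...which is equivalent to the "normal" loop the compiler produces: */
int sum_iter(const int *data, int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += data[i];
    return acc;
}
```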
Re:Wikipedia article question (Score:2)
Thanks, no I don't. I said "like writing in tail recursive style" - I know what it means.
My point is that, just like one can write recursions in a form that a compiler can turn them into something more (stack) efficient, so one might write iterations in a style that they can be unwound more easily (like using a primitive type as the counter, rather than an OO-style iterator)...
Re:Wikipedia article question (Score:2, Interesting)
I now think you were using a simile or making an analogy to argue that compilers can benefit from careful construction of loops in the source code.
If so, then of course I agree with you.
Saying this in a much more general way: careful choice of syntax can make the semantics more clear to the compiler.
A high level language with "dotimes (count) { action }" syntax lets the compiler make good choices about loop unrol
Re:Wikipedia article question (Score:2)
Quite (hence my second post) - I couldn't work out what you thought I meant - at first I wondered if you thought I meant that iterations could be turned into recursions by the Cell compiler (i.e. the opposite to the normal optimisation, which is why I was trying to make it clear that I know what direction this happens in), then I realised you'd mistaken my analogy for an example... Rather
Re:Wikipedia article question (Score:2)
Re:Wikipedia article question (Score:2)
Re:Wikipedia article question (Score:2)
Frak is actually a made up swear word from Battlestar Galactica.
It is sometimes used by the super geeky. Like er, um, er.. me.
Re:Wikipedia article question (Score:2)
Think about it just in broad terms. Computer programming is like math. It really is best expressed visually. Think of a math class with no whiteboard and just someone lecturing. Pretty useless. So even if you have an AI as smart as or smarter than a person, it will probably still want to see what you're talking about.
Not to mention that AIs as smart as a hamster are still years or decades away.
Re:Wikipedia article question (Score:5, Informative)
The reason you want to unroll loops is because of various other delays. If it takes 7 cycles to load from the local store to a register, you want to throw a few more operations in there to fill the stall slots. Unrolling can provide those operations, as well as reduce the relative importance of branch overheads.
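One concrete way unrolling supplies those fill operations is to keep several independent accumulator chains in flight. A sketch, not SPE code: `sum_two_chains` is a made-up name, and n is assumed even.

```c
/* Two independent accumulators: while the add feeding a0 waits on
   its load, the a1 chain still has work to issue, so part of the
   load latency is hidden.  Assumes n is a multiple of 2. */
int sum_two_chains(const int *data, int n)
{
    int a0 = 0, a1 = 0;
    for (int i = 0; i < n; i += 2) {
        a0 += data[i];      /* chain 0 */
        a1 += data[i + 1];  /* chain 1, independent of chain 0 */
    }
    return a0 + a1;         /* combine the chains once at the end */
}
```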
Re:Wikipedia article question (Score:2)
Um... I'm not sure that's what he's trying to say. SIMD by definition is Single Instruction, Multiple Data. i.e. You give it a couple of instructions and watch it perform them on every item in the stream of data. By definition, a loop is an iteration over each instruction, multiple times. a.k.a. Multiple Instruction Multiple Data (MIMD).
What's neede
Re:Wikipedia article question (Score:2)
Minor correction. That's supposed to be Single Instruction, Single Data. (SISD) My bad.
Re:Wikipedia article question (Score:2)
Re:Wikipedia article question (Score:2)
Optimization won't be a problem. At least it won't be the main problem. The instruction set is rich enough to provide scalar and vector integer/fp/dp operations along with both conditional branching and conditional assignment. And it can be programmed in C using intrinsics for SIMD instead of assembly. That brings up the really important part-- 128 128-bit registers. Current x86 compilers suck balls at intrinsics mostly because SSE is such
Re:Wikipedia article question (Score:2)
Re:Wikipedia article question (Score:2, Informative)
A modern desktop computer has one master CPU, then several smaller CPUs each running their own software. Graphics, Sound, CD/DVD, HD, not to mention all the CPUs in all the peripherals.
But the analogy ends there. The Cell has certain limitations and wouldn't be able to operate very efficiently as a full computer system with no other processors. I believe the PS3 has a s
Re:Wikipedia article question (Score:2)
Each cell processor includes not only the multiple processors mentioned elsethread, but addressable memory, DMA controller, and a controller for what is essentially a proprietary network. The last is somewhat open to argument -- for example, current AMD CPUs include HyperTransport controllers, which are somewhat similar.
In any case, IBM does (e.g. here [ibm.com]) talk about the Cell as a System on a Chip, t
Re:Wikipedia article question (Score:2)
Why?
Actually, each of the SPUs resembles a system-on-a-chip. They each have local memory, a CPU, and I/O. The Cell itself actually resembles a network-on-a-chip (or, in slashdotology, a Beowulf-Cluster-on-a-Chip) if you consider main memory to be I/O storage.
Is this the same Cell processor used in the PS3? (Score:2)
Re:Is this the same Cell processor used in the PS3 (Score:5, Funny)
Re:Is this the same Cell processor used in the PS3 (Score:2, Funny)
Unproductive? (Score:5, Funny)
My favorite quote from TFA...
Re:Unproductive? (Score:2)
Since the submitter didn't bother to explain... (Score:5, Informative)
Re:Since the submitter didn't bother to explain... (Score:2)
Re:Since the submitter didn't bother to explain... (Score:2)
Short story: the cool thing about the Cell is the SPEs, which are the best thing since sliced bread if you have lots of matrix-vector operations to perform but more or less useless otherwise.
IBM is eager to run Linux on it because the Cell could make one hell of a supercomputing grid. (Although it loses lots of flops i
Re:Branch Prediction (Score:2)
Re:Branch Prediction (Score:2)
Most of the other posters have no idea what they are talking about. The PPE is a fully PowerPC compliant two-way SMT processor and absolutely has a branch predictor. It is the SPEs (SIMD vector units) that do not have branch prediction, but they do ha
Re:Branch Prediction (Score:2)
Re:Branch Prediction (Score:2)
Re:Branch Prediction (Score:2)
I think it's very early to talk about the integer performance of Cell. I have been working on Cell for a few months now, and all I can say is that the integer performance of the PPE core is on par with the competition; and it beats them handily using hand-written code to take advantage of the SPEs.
Re:Branch Prediction (Score:2)
Re:Since the submitter didn't bother to explain... (Score:2)
With 8 or more semi-independent "Synergistic Processing Unit" pipelines, it doesn't really need to have a lot of complex branch prediction logic. It could adopt a bit of a quantum methodology and assign a different SPU to proceed for each possible outcome of a compare/branch instruction, and then once the correct outcome has been established, discard the "dead-end" pipelines.
Then again, I learned microprocessor design principles back when the PPC 601 was s
Re:Since the submitter didn't bother to explain... (Score:2)
Re:Since the submitter didn't bother to explain... (Score:2)
You're bitter about something. Care to share?
Steve most certainly made a decision to go Intel. No "pretending" involved. Just what dollar value do you ascribe to "5% of IBM's chip volume", BTW?
4) With Cell processors in Macs no longer an option for Apple, the sour grapes meme that the idiot above parroted starts to make its rounds in Mac circles.
Cell wouldn't be that great for its clock speed, but it would certain
Source for actual chips? (Score:4, Interesting)
I should have added... (Score:2)
But I would like to know.
Mike.
Re:Source for actual chips? (Score:2)
Re:Source for actual chips? (Score:2)
Thanks,
Mike.
Re:Source for actual chips? (Score:2)
Re:Source for actual chips? (Score:2)
Re:Source for actual chips? (Score:2)
Re:Source for actual chips? (Score:2)
I'd forgotten that these processors are not made on the 3 micron processes like the chips I used to work on!
Re:Source for actual chips? (Score:2)
3 microns? Wow. That's huge! The top-of-the-line chips these days are easily below 0.5 microns. (The PIV chips are 0.18 and 0.13 microns!) I know I was just shocked when I got my Spartan III FPGA kit. I couldn't believe how small the thing was in its packaging. I can't even imagine how small the actual die must be!
Re:Source for actual chips? (Score:2)
cool! I just got a Spartan III dev board in the post last week too. First thing I did was hook it up to a monitor and twiddle a few buttons
Fancy chatting about it by email?
mikehibbett at oceanfree (dot) net
Mike.
What about a PPC SDK and simulator? (Score:5, Interesting)
It'd be nice if IBM released a PPC SDK for Fedora; it would have the potential to run much faster than an x86 SDK and simulator.
Re:What about a PPC SDK and simulator? (Score:2)
Re:What about a PPC SDK and simulator? (Score:2)
Another question about the simulator (Score:2)
I wonder if it'll take advantage of multi-core chips? Might make sense to do so, especially since that's also (sort of) similar to the hardware being simulated.
Re:What about a PPC SDK and simulator? (Score:2)
Got it in one try. Anyone interested in actually using this thing has a spare PC to load FC4 on; almost no one has a spare top-of-the-line PowerMac in the closet. Heck, most don't have a top-of-the-line PowerMac, period.
Re:What about a PPC SDK and simulator? (Score:2)
I keep having this fantasy that a PCI-E development board will come out and I'll be able to do something interesting with it (what I have no idea but I'm open to suggestions). I'd really like OS X development environment for it to tinker with.
Not a PPC Processor (Score:2, Informative)
Once again, the Cell is not a PPC processor. It is not PPC-based. The Cell going into the PlayStation 3 has a POWER-based PPE (Power Processing Element) that is used as a controller, not a main system processor. Releasing an SDK for Macs would not give any advantage over an x86-based SDK because you are still emulating another platform.
Wiki [wikipedia.org]
Re:Not a PPC Processor (Score:2)
Re:Not a PPC Processor (Score:3, Informative)
What is Power Architecture technology? [ibm.com]
"Power Architecture is an umbrella term for the PowerPC® and POWER4(TM) and POWER5(TM) processors produced by IBM, as well as PowerPC processors from other suppliers."
Re:What about a PPC SDK and simulator? (Score:3, Informative)
Comment removed (Score:5, Interesting)
Re:GNU toolchain (Score:4, Informative)
Re:GNU toolchain (Score:2)
who the heck says they have to keep the GCC they distribute with the software development kit platform agnostic??? what a stupid concept. It has absolutely NOTHING to do with the GCC leads..
Re:GNU toolchain (Score:4, Informative)
Echoes of Redhat (Score:3, Insightful)
Why Fedora is so often considered the default target distribution I don't know. Even the project page [redhat.com] states it's an unsupported, experimental OS, and one now comparatively marginal when tallied [distrowatch.com].
Must be a case of 'brand leakage' from a distant past, one that held Red Hat as the most popular desktop Linux distribution.
Shame, I guess IBM is missing out on where the real action is.
I agree! (Score:2)
BTW, this parent might be offtopic, but he is no troll. Shame on you mods!
Re:I agree! (Score:2)
Re:I agree! (Score:2)
Well, first I'd like to reiterate what you already pointed out: that neither has unpatched vulnerabilities.
Second, you're comparing EVERY release of Gentoo ever to Fedora Core 4.0. Notice how Fedora Core 4.0 doesn't have any vulnerabilities before Feb 2005? That's because it didn't exist much before then.
Yo
Re:Echoes of Redhat (Score:4, Insightful)
When it comes down to it, Fedora is the most advanced Linux distribution out there. It comes standard with SELinux and virtualization. It uses LVM by default and integrates exec-shield and other code-fortifying techniques into all major services. It has the latest and greatest of everything. Things just work in Fedora because a large portion of that code was written by Red Hat. Red Hat maintains GCC and glibc, they commit more kernel code than anyone else, and they play a large role in everything from Apache and GNOME to creating GCJ to get Java to run natively under Linux. Whether you like it or not, Fedora is the distro most professionals go with, despite what the Slashdot popular opinion is and despite the large amounts of noise that a few Ubuntu users create.
Out of the big two, Novell and Red Hat, Novell has never been worse off and Red Hat has never been healthier. Red Hat doesn't officially provide support for Fedora, but it is built and paid for by Red Hat and their engineers (in addition to the community contributions). By targeting Fedora, IBM knows that it is targeting a stable platform with the largest array of hardware support. IBM is in bed with both Novell and Red Hat; they didn't choose Fedora because they were paid to or something... they chose Fedora on technical merits. Claiming that Fedora is unstable is no different than claiming GMail is in beta; both products are still the best in their respective industries. Why do people go spreading FUD about such a good product when they've never used it themselves? Whether you want to admit it or not, Fedora is the platform to target for most. It is compatible in large part with RHEL, so you're getting the most bang for your buck.
IBM doesn't just mess around, or make decisions for dumb reasons. If Fedora is good enough for IBM, it is good enough for anyone. Apparently this is a common opinion, as more and more businesses switch to Fedora desktops. Here [computerworld.com.au] is one recent story of a major Australian company, Kennards, replacing 400 desktops with Fedora. Don't be so closed-minded or you might be left behind.
Regards,
Steve
Re:Echoes of Redhat (Score:2)
Re:Echoes of Redhat (Score:2)
You cannot prove Fedora better because it just plain is not. It is impossible to debate against the truth. Thus you must resort to ad hominem attacks, which instantly prove that I am the victor in this debate.
New & Improved (Score:2, Funny)
Linux on PS3? (Score:2)
The real question is whether the PS3 will have a Linux hard disk option like the PS2. If that is the case, it may be the cheapest way to get actual development hardware.
Re:Linux on PS3? (Score:2, Interesting)
Cell Hardware... (Score:4, Informative)
How does one get a hold of a real CBE-based system now? It is not easy: Cell reference and other systems are not expected to ship in volume until spring 2006 at the earliest. In the meantime, one can contact the right people within IBM [ibm.com] to inquire about early access.
By the end of Q1 2006 (or thereabouts), we expect to see shipments of Mercury Computer Systems' Dual Cell-Based Blades [mc.com]; Toshiba's comprehensive Cell Reference Set development platform [toshiba.co.jp]; and of course the Sony PlayStation 3 [gamespot.com].
Sony has poisoned (Score:2)
Purchasing IBM's (or perhaps Mercury Computer's) reference CBE-based platform is now my only choice. Sony's NRE for the PS3 might make their platform a "best buy" pri
Re: (Score:2, Interesting)
Re:Rosetta to the rescue? (Score:2)
Re:Rosetta to the rescue? (Score:2, Interesting)
http://www.mactech.com/articles/mactech/Vol.10/10. 09/Emulation/ [mactech.com]
So you end up doing four instructions to decode the 68K instruction, and then whatever it takes to actually do the operation, typically 2-4.
JIT emulators would profile
Re:Rosetta to the rescue? (Score:2)
This is a sim, not just an emulator. It's not just vaguely implementing the output; it is at least to some extent modeling the instruction pipelining, branch miss penalties, and so on.
"cell" architecture is all about local memory (Score:5, Informative)
The cell processors can do DMA to and from main memory while computing. As IBM puts it, "The most productive SPE memory-access model appears to be the one in which a list (such as a scatter-gather list) of DMA transfers is constructed in an SPE's local store so that the SPE's DMA controller can process the list asynchronously while the SPE operates on previously transferred data." So the cell processors basically have to be used as pipeline elements in a messaging system.
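The shape of that list-driven, compute-while-transferring style is classic double buffering. A hedged sketch in plain portable C, not the SPE API: on a real SPE, `dma_get` would be an `mfc_get` tagged per buffer and `dma_wait` would poll the tag-status register; here they are simulated synchronously with `memcpy` so only the structure is shown.

```c
#include <string.h>

#define CHUNK 4  /* elements per transfer; illustrative only */

/* Stand-ins for asynchronous DMA into local store. */
static void dma_get(int *local, const int *main_mem, int n)
{
    memcpy(local, main_mem, n * sizeof *local);  /* really an mfc_get(...) */
}
static void dma_wait(int tag) { (void)tag; }     /* really a tag-status poll */

/* Double-buffered sum: kick off the transfer for chunk i+1, then
   compute on chunk i while (conceptually) that transfer proceeds. */
int double_buffered_sum(const int *main_mem, int nchunks)
{
    int buf[2][CHUNK];
    int sum = 0;

    dma_get(buf[0], main_mem, CHUNK);            /* prefetch chunk 0 */
    for (int i = 0; i < nchunks; i++) {
        if (i + 1 < nchunks)                     /* start the next transfer early */
            dma_get(buf[(i + 1) & 1], main_mem + (i + 1) * CHUNK, CHUNK);
        dma_wait(i & 1);                         /* make sure chunk i has landed */
        for (int j = 0; j < CHUNK; j++)          /* compute on chunk i */
            sum += buf[i & 1][j];
    }
    return sum;
}
```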
That's a tough design constraint. It's fine for low-interaction problems like cryptanalysis. It's OK for signal processing. It may or may not be good for rendering; the cell processors don't have enough memory to store a whole frame, or even a big chunk of one.
This is actually an old supercomputer design trick. In the supercomputer world, it was not too successful; look up the nCube and the BBN Butterfly, both of which were a bunch of non-shared-memory machines tied to a control CPU. But the problem was that those machines were intended for heavy number-crunching on big problems, and those problems didn't break up well.
The closest machine architecturally to the "cell" processor is the Sony PS2. The PS2 is basically a rather slow general purpose CPU and two fast vector units. Initial programmer reaction to the PS2 was quite negative, and early games weren't very good. It took about two years before people figured out how to program the beast effectively. It was worth it because there were enough PS2s in the world to justify the programming headaches.
The small memory per cell processor is going to be a big hassle for rendering. GPUs today let the pixel processors get at the frame buffer, dealing with the latency problem by having lots of pixel processors. The PS2 has a GS unit which owns the frame buffer and does the per-pixel updates. It looks like the cell architecture must do all frame buffer operations in the main CPU, which will bottleneck the graphics pipeline. For the "cell" scheme to succeed in graphics, there's going to have to be some kind of pixel-level GPU bolted on somewhere.
It's not really clear what the "cell" processors are for. They're fine for audio processing, but seem to be overkill for that alone. The memory limitations make them underpowered for rendering. And they're a pain to program for more general applications. Multicore shared-memory multiprocessors with good caching look like a better bet.
Read the cell architecture manual. [ibm.com]
Re:"cell" architecture is all about local memory (Score:2, Informative)
Re:"cell" architecture is all about local memory (Score:2, Informative)
There was a Toshiba demo showing 8 Cells: 6 used to decode 48 HDTV MPEG-4 streams simultaneously, 1 for scaling the results to display, and 1 left over. A spare, I guess?
This reminds me of the Texas Instruments 320C80 processor: 1 RISC general-purpose CPU plus four DSP-oriented CPUs, each with a 4KB chunk of on-chip memory. After the experience of programming for the C80, 256KB would be fantastic; it will be plenty of memory to work on a tile of framebuffer.
Re:"cell" architecture is all about local memory (Score:2)
Suppose your program is 48k and you use 32k of memory dynamically; that leaves 176k for data, which is double-buffered, which means the program can only process 88k of data at a time.
But it sure can do it fast.
The NVidia GPU in the PS3 (Score:3, Informative)
SCEA press release:
SONY COMPUTER ENTERTAINMENT INC. AND NVIDIA ANNOUNCE JOINT GPU DEVELOPMENT FOR SCEI'S NEXT-GENERATION COMPUTER ENTERTAINMENT SYSTEM> [playstation.com].
TOKYO and SANTA CLARA, CA
DECEMBER 7, 2004
"Sony Computer Entertainment Inc. (SCEI) and NVIDIA Corporation (Nasdaq: NVDA) today announced that the companies have been collaborating on bringing advanced graphics technology and computer entertainment technology to SCEI's highly anticipated next-generation computer enterta
why? (Score:2)
Why? Is the cell processor expected to go anywhere past the PS3? There is obviously no OS port planned, and I have no access to a PS3 game SDK. I have read some pretty awesome posts regarding the technical details of Cell vs. x86 or Mac architectures, but none that would encourage me to download, install, and play around with this in the hope of ever making a buck.
Re:why? (Score:2)
I would buy one of these, and no, I don't plan to get a PS3.