Octopiler to Ease Use of Cell Processor 423

Sean0michael writes "Ars Technica is running a piece about The Octopiler from IBM. The Octopiler is supposed to be a compiler designed to handle the Cell processor (the one inside Sony's PS3). From the article: 'Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip. So Cell has immense performance potential, but if you want to make it programmable by mere mortals then you need a compiler that can ingest code written in a high-level language and produce optimized binaries that fit not just a programming model or a microarchitecture, but an entire multiprocessor system.' The article also has several links to some technical information released by IBM."
This discussion has been archived. No new comments can be posted.

  • by ScrewMaster ( 602015 ) on Sunday February 26, 2006 @04:49PM (#14805048)
    Hire "Real Programmers". You know, the ones that only code in Assembler, and if they can't do it in Assembler then it isn't worth doing.
  • Makes you wonder (Score:5, Insightful)

    by Egregius ( 842820 ) on Sunday February 26, 2006 @04:50PM (#14805050)
    It makes you wonder what the release titles of the PS3 will be like, if they didn't have a decent compiler until now. And 'the PS3 is due out in 2006.'
  • Hello, Itanium... (Score:5, Insightful)

    by general_re ( 8883 ) on Sunday February 26, 2006 @04:50PM (#14805052) Homepage
    Sound familiar? "All we need to make it work as advertised is a really slick compiler that doesn't actually exist yet..."
    • Sound familiar? "All we need to make it work as advertised is a really slick compiler that doesn't actually exist yet..."

      From TFA:
      "I say "intended to become," because judging from the paper the guys at IBM are still in the early stages of taming this many-headed beast. This is by no means meant to disparage all the IBM researchers who have done yeoman's work in their practically single-handed attempts to move the entire field of computer science forward by a quantum leap. No, the Octopiler paper is full of
    • Re:Hello, Itanium... (Score:4, Informative)

      by Brain_Recall ( 868040 ) <brain_recall@yaho[ ]om ['o.c' in gap]> on Sunday February 26, 2006 @05:13PM (#14805145)
      More familiar than you may think. Some of the first Itanium compilers were spitting out nearly 40% NOPs, which are simply do-nothings. Because the IA-64 is explicitly parallel, instructions are generated and bundled together to be executed in parallel. The problem is branches, which destroy parallelism since they can change the code direction. On average, there are about 6 instructions between branches, so such a design is very costly since the memory controller will be stuck fetching instructions that are empty. Of course, speculation and branch prediction are generally a good way to increase performance, but like many things on the IA-64, that's left to the compiler to figure out. These are some of the exact same problems with the Cell, although I wish I knew what the instruction set was like. If it's more like Itanium, then they get all of the problems of the Itanium. If it's more of a direct approach, they may be able to pull it off because of the work on multi-processor systems being done today. But they simply can't expect the "super-computer" numbers Sony keeps flashing around. It may be good on certain tightly coded scientific calculations, but when it comes down to real-world code, it's stuck with the stripped-down Power4 that is coordinating the Cells.


      They didn't call it the Itanic for nothing...

    • by timeOday ( 582209 ) on Sunday February 26, 2006 @05:23PM (#14805181)
      Everybody prefers a simpler programming model, there's no doubt about that. But with the recent lack of progress in single-core speeds, something has to give, and apparently that "something" is programming complexity. While the PC world moves from 1 to 2 cores, the PS3 is jumping straight to 8. But going from 1 to 2 threads is a bigger conceptual jump than going from 2 to 8 anyway.

      Fortunately for IBM and Sony, games are one place where hand-optimizing certain algorithms is still practical. I doubt they will place all their eggs in the Octopiler basket. I can't imagine a compiler will find that much parallelism in code that isn't explicitly written to be parallel. Personally, I think they should instead focus on explicitly parallel libraries for common game algorithms like collision detection.

    • Sound familiar? "All we need to make it work as advertised is a really slick compiler that doesn't actually exist yet..."

      That's kind of a weird comparison given the differences in innovation, demonstrated results and company attitudes.

      IBM's Cell is a much more radical break from previous chips like Itanium [wikipedia.org], but the CES demo was reported to be very impressive. IBM has already released the SDK [slashdot.org] and openly published all specifications [slashdot.org]. The pace of development has been very rapid and people are predicting th [linuxinsider.com]

      • The pace of development has been very rapid and people are predicting the replacement of Intel.

        Sorry, you lost all credibility there. The Cell is a single core with a bunch of DSPs tacked on. It's a great replacement for a general purpose PowerPC in many embedded applications, but won't touch Intel's target market any time soon. In the year and a half since that article was written we've learned how much Intel and AMD can do to keep ahead of the game and how applicable to general-purpose computing the

      • >We can only wonder how things would have been if Intel had opened things up like IBM has, instead of making it so people have to figure things out on their own.

        It's not quite as clean as it looks. "Full specifications" doesn't include any information on instruction latencies, cache performance, etc. They've documented the platform itself, but not the specific implementation. This makes optimization difficult.

        I've had to distill information from several publications to determine even basic things like ho
      • by Macka ( 9388 )

        The Itanium on the other hand was obsolete on its launch. Even HP dumped it after killing their own better-performing 64-bit processor for it and spending billions of dollars and ten years building it.

        HP most certainly have not dumped it. If anything they're pushing harder than ever. All I hear from HP these days is Itanium, Itanium, Itanium .... and I've been to a few HP pre-sales events in the last couple of months where they've been pushing it very hard. In a few months they'll be revising their I

  • by mosel-saar-ruwer ( 732341 ) on Sunday February 26, 2006 @04:53PM (#14805064)

    'Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip.'

    Sadly, there's almost no FPU hardware to speak of: 32-bit single precision floats in hardware; 64-bit double precision floats are [somehow?] implemented in software and bring the chip to its knees [wikipedia.org].

    Why can't someone invent a chip for math geeks? With 128-bit hardware doubles? Are we really that tiny a proportion of the world's population?

    • Math geeks that would need 128-bit double precision are a subset of all math geeks...

      Therefore an even smaller portion of an already small population.
      • Math geeks that would need 128-bit double precision are a subset of all math geeks...

        Perhaps you meant long double precision. Math geeks that can live with 32-bit floating point precision are also a small subset - most of those who do heavy math (not pixel processing) pretty much require 64-bit double precision. And that is not available in hardware from Cell (come to think of it, not for AltiVec, either)
    • SPARCv8(?) and up have quad precision.

      I've also implemented a simple double-double (represents numbers as an unevaluated sum of two non-overlapping doubles) arithmetic in CL. It was ~25% as fast as doubles (mostly branchless, each op expands into ~2-8 double precision ops). That gives an upper bound on the slowdown ratio for the emulation of doubles with singles.
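
      In C, the core of that double-double trick looks roughly like the sketch below: Knuth's error-free TwoSum followed by a renormalisation step. The names and layout are purely illustrative, not anyone's actual library:

      typedef struct { double hi, lo; } dd_t;    /* value = hi + lo, non-overlapping */

      static dd_t dd_add(dd_t a, dd_t b)
      {
          double s  = a.hi + b.hi;                       /* approximate sum            */
          double bb = s - a.hi;
          double e  = (a.hi - (s - bb)) + (b.hi - bb);   /* exact rounding error of s  */
          e += a.lo + b.lo;                              /* fold in the low words      */
          double hi = s + e;                             /* renormalise                */
          dd_t r = { hi, e - (hi - s) };
          return r;
      }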
    • The basic purpose of the Cell is to make the PS3 work. The basic purpose of the PS3 is to play games. Games, as a rule, don't give a damn about 64-bit floating point. Games can get away with 32-bit because they don't need to be incredibly accurate, they just need to be fast. No gamer will care whether or not the trajectory of the bullet was out by 0.000000000023~ as long as it moves fluidly. So, in making a chip for gaming, you are far better off making 32-bit really fast than spending time and die space on
      • by Animats ( 122034 ) on Sunday February 26, 2006 @05:59PM (#14805304) Homepage
        Games, as a rule, don't give a damn about 64-bit floating point.

        You wish. In a big 32-bit game world, effort has to be made to re-origin the data as you move. Suppose you want vertices to be positioned to within 1cm (worse than that and you'll see it), and you're 10km from the origin. The low order bit of a 32-bit floating point number is now more than 1cm.

        It's even worse for physics engines, but that's another story.

        If the XBox 360 had simply been a dual- or quad-core IA-32, life would have been much simpler for the game industry.

        • True

          Actually, what I can't figure out is why you want floating point at all. Floating-point data stores a certain number of bits of actual data, and a certain number of bits as a scaling factor. To use your example, this would mean that while items near the origin would be picture-perfect, the object 10km away would be out by well more than a cm.

          Back when integer arithmetic was so much faster than floating point that it was worth the effort, game coders used to use fixed-point arithmetic. This kept a uniform
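
          A minimal sketch of that fixed-point idea in C, assuming a 16.16 format (the format choice and names are just illustrative):

          #include <stdint.h>

          typedef int32_t fixed16_t;            /* 16 integer bits, 16 fractional bits      */
          #define FIX_ONE (1 << 16)             /* resolution: 1/65536 of a unit, uniform
                                                   everywhere in the world, near or far     */

          static inline fixed16_t fix_from_float(float f)   { return (fixed16_t)(f * FIX_ONE); }
          static inline float     fix_to_float(fixed16_t x) { return (float)x / FIX_ONE; }

          static inline fixed16_t fix_mul(fixed16_t a, fixed16_t b)
          {
              return (fixed16_t)(((int64_t)a * b) >> 16);  /* widen first to avoid overflow */
          }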

    • Doesn't the Itanium do pretty well on floating point?
      -Ack
    • Comment removed based on user account deletion
    • They have, although outside of certain implementations of double-complex, 64-bit double-precision (REAL*8 to Real Programmers) is enough.

      Those machines are Cray Vector Processors, MIPS R8K and later, DEC Alpha, HP/Intel Itanium, IBM Power 4/5/n, IBM Vector Facility for the 3090, etc.

      Notice how many of those you see every day, and how many fewer of those you can still buy.

      Yes, unfortunately, you are that tiny a proportion of the world pop. I had hoped by this point that we'd have Cray Vector Proces
      • You are forgetting Grid Computing where you can have 1000 or more CPUs working on the problem components or in parallel. I've seen some pretty hairy physics problems solved on these. Also, a fair amount of the scientific community seems to be buying the Sun SPARC IV+ architecture. Power 5 is going to be around a while, but when they start cranking the chip speeds past about 3.5GHz then they will need liquid cooling. Itanium is hanging by a thread. I wouldn't invest in that. Best new things I see on the hor
    • Why can't someone invent a chip for math geeks? With 128-bit hardware doubles? Are we really that tiny a proportion of the world's population?

      Yes, in fact you are a really tiny proportion of the world's population!

    • by OldManAndTheC++ ( 723450 ) on Sunday February 26, 2006 @06:10PM (#14805358)
      Are we really that tiny a proportion of the world's population?

      You math geeks need to multiply. :)

    • Why can't someone invent a chip for math geeks? With 128-bit hardware doubles?

      Because the math geeks won't pay for the fab plants.

      Are we really that tiny a proportion of the world's population?

      Yes. You're the math geek - you do the math.

    • Not quite as bad as you make it out to be --

      Each SPU can do 2 DP FMACs (in one vector) in 6 cycles -- not pipelined, and at 3.2 GHz. Then you can add the single pipelined DP FMAC unit in the PPE.

      Sure, it's an order of magnitude less than SP, but it's not that anemic. And if I weren't still under NDA, I could speculate about what IBM/Sony might be doing about that situation. But I won't.
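
      Back-of-the-envelope, taking those numbers at face value: 2 FMACs is 4 flops, so each SPE manages roughly 4/6 of a flop per cycle, about 2.1 GFLOPS at 3.2 GHz, or somewhere around 17 GFLOPS double precision across all 8 SPEs, against the roughly 200 GFLOPS single-precision peak usually quoted for the SPEs. So yes, about an order of magnitude down, but hardly zero.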

      Oh and back on topic, I used to work at IBM on compilers, and I recognized some of the names on the list of authors of the
  • Octointerpreter (Score:3, Interesting)

    by yerdaddie ( 313155 ) on Sunday February 26, 2006 @04:55PM (#14805072) Homepage
    Reading this is making me nostalgic for LISP machines [comcast.net] and interpreter environments that let programmers really play with the machine instead of abstracting it away. What I'd really like to see is someone who takes all the potential for reconfiguration and parallelism and doesn't hide it away but makes it available.
    • Lisp machines were dog slow and hugely expensive, even compared to the workstations around at the time. If you want that kind of environment, you can emulate it by running any of the interpreted languages so popular on Linux, Windows, and Macintosh (although you'll be hard pressed to be as slow as the Lisp machine).

      As for not hiding reconfigurability: you can buy anything you desire as an add-on board, like an FPGA board or an array processor. People don't use them a lot because they are a pain to program
  • isn't this a bit of a pipe dream? A compiler that optimizes a program for multiple processors is a nice idea, but how can you foresee worst-case-scenarios that only emerge with human use? Take driving as a very abstract example. You "write" a car. You want it to both accelerate and brake on a dime while still being fuel efficient. Without knowing the driving conditions, city or country, how can you optimize your driving for efficiency?
    • But you do know the driving conditions: they're the specs of the target architecture. It's still not an easy problem, of course, but it's not like you are supposed to write a compiler that emits perfect code for any target architecture - that would indeed be a rather hard problem.
    • You engineer programs in a sense similar to cars, yes. But, you interact with your tools on a much higher level than putting in a pedal and a brake pad. I suspect you do in actual car design too: it wouldn't be a huge step to be able to model a car in a 3D app and ask the computer how that shape of car will perform in terms of aerodynamics, gears, engine power and therefore miles per gallon or acceleration etc.

      It's similar with programming. Instead of saying, this is a car, and it goes in that world, and
  • Is it just me or is it a bad idea to make something that completely breaks most programming paradigms, and requires a special compiler to compile it properly, and *then* use it in a next gen console, due out this year?

    Surely it was screaming at them that this isn't something that's meant to be released so soon. I mean, the compiler has 4 tiers of 'optimisation', which are meant for the programmers to set so the compiler doesn't make a mess of their memory-management code if they memory manage correctly, or
    • Is it just me or is it a bad idea to make something that completely breaks most programming paradigms, and requires a special compiler to compile it properly, and *then* use it in a next gen console, due out this year?

      Not really, it's future proofing. It can be used as pretty much a still pretty powerful single core machine for the initial release titles, and as the programmers get to grips with how to get the most out of the cell architecture, and better tools come out, the titles will keep getting better
    • I would imagine that a lot of the problem is trying to generate SPU code from a language such as C. I would have thought that the solution would be to design a language more like Erlang[1] that is designed for parallelism, and allow your programmers to express their algorithms in this, rather than getting them to program for the PDP-11 and then trying to turn this into optimal code for something like the Cell.

      [1] Much as I like Erlang, it would not actually be quite suitable for the Cell.

    • On the PS2, there are two vector units (vu0 and vu1), which are basically where all the grunt work is done - the mips chip is there for housekeeping and non-time-critical code. Each VU has 2 code-paths (the instruction word is 64-bit, and there are two 32-bit instructions in each word). There are limitations on what you can do in each of the two words simultaneously. Sony have a GUI tool (in their professional kit) which allows the programmer to write essentially sequential code, and have it take full advan
  • Microsoft's Todd Proebsting claims [microsoft.com] that compiler optimization only adds 4% performance per year, based on some back-of-the-envelope calculations on x86 hardware.

    This radical a change in architecture should at least provide accelerated growth from introduction through the next several years, which I'm sure will provide added incentive for those involved in compiler optimization -- finally, some real enhancements.
    • I'm glad to see some real progress in the processor world. We are so guided by the enterprise market that we've had to support x86 WAY longer than we should have. The cell looks like it has a real chance of becoming the next big advancement. For one, IBM is working heavily with the open source community. This is possibly one of the best things they could have done to help the cell. By doing this, you make open source developers happy and more inclined to port over their applications. One of the hardest thin
      • The cell looks like it has a real chance of becoming the next big advancement.

        It will be interesting to compare the Cell with the UltraSPARC T1 (Niagara). They both have about 8 cores (T1 is 8 cores, Cell is 8+1), but the T1 can do 32 threads of execution simultaneously. The Cell has good floating point performance, but the T1 only has 1 FPU for all 8 cores (it's specifically not designed for FP performance). The T1 has very low power requirements, at about 72 watts (79 peak), while (as far as I can tell fr
      • This is possibly one of the best things they could have done to help the cell. By doing this, you make open source developers happy and more inclined to port over their applications.

        It's too bad that the only popular commercial implementation of the Cell processor for several years is going to be in a machine with a lockout chip, a technical measure that prohibits end users from compiling Free software on the machine. Otherwise, game developers could develop a Free engine subsidized by keeping game asse

    • There is no magic silver bullet to vectorizing code. Compilers need to guarantee that your app will run how you meant it to run and that is no small task when it needs to infer from a language without explicit parallelism support. If the PS3 uses standard C++, I doubt this compiler will do much to help measurably.

      At the last PDC, Microsoft announced some very exciting ideas it is looking at to propose for the next C++ standard that will give language support for parallelism, essentially letting you do
      • There is no magic silver bullet to vectorizing code.

        It's even harder when there's no memory protection. One might imagine (within reason) that a Java compiler could separate independent tasks by tracking what variables are used in what sections of code, and inferring that one section must be independent of another until you reach line X (at which point you may need to synchronize access to a variable the two pieces have in common, or join the threads). That could (perhaps) achieve decent multithreaded perfor
    • This radical of a change in architecture

      There's nothing "radical" about it--it's just a bunch of CPUs on a chip. It's about the least radical way in which you can put a bunch of CPUs on a chip, beyond multicore.
  • Posit: Parallel processing can solve certain types of problems much faster than serial processing.
    Posit: The Cell architecture is highly parallel.
    Posit: Most programmers today are good at writing serial, not parallel, code.

    Hypothesis: A compiler can be developed that takes serially written programs and auto-transforms them into parallel programs to exploit the benefits of parallelism.

    Now comes the research to attempt to validate that hypothesis. Will it succeed? We'll find out in several years. There are
    • by irexe ( 567524 ) on Sunday February 26, 2006 @05:32PM (#14805213)
      Hypothesis: A compiler can be developed that takes serially written programs and auto-transforms them into parallel programs to exploit the benefits of parallelism.

      Parallel programming and automated parallelization have already been researched exhaustively throughout the last thirty years of the 20th century. The outcome of all this research is that it is not feasible/tractable to create a compiler that is capable of recognising parallelism, as you suggest. Compilers that can do this are sometimes called 'heroic' compilers, for the reason that the required transformations are so incredibly difficult, and heroic compilers that actually work (well) simply don't exist.

    • See my reply above (vcl v2 [slashdot.org]) and look on the linux for PS2 website for VCL [playstation2-linux.com].

      VCL takes sequential code and splits it up into parallel code based on the constraints of the vector-units (each VU is dual-issue, with some restrictions). It'll re-order code, insert wait states, etc. Certainly it's a good start at auto-parallelisation of the code. It's supposed to do as well as a skilled engineer...

      Simon
    • The compiler may have pragma instructions or linker bindings for parallelism, which would be easily taken advantage of by higher-level libraries, even if end-users don't know how to use it (though, imho, they can learn easily enough).
    • Well, there's already been one parallel processing success story - the GPU. Granted, the GPU provides a more restrictive programming environment and memory model than the Cell, but with the right training and the right tools, it is possible to write code that effectively exploits parallelism.

      Let's also not lose sight of the big picture with regard to the Cell: the 8 parallel vector processors are coupled with a single CPU core derived from the PowerPC chip. So the overarching structure of the Cell isn't a

  • by SmallFurryCreature ( 593017 ) on Sunday February 26, 2006 @05:03PM (#14805106) Journal
    I seem to remember that the PS2 was a bitch to code for as well and that many of the early titles did not make full use of its capabilities. So?

    All this meant that as the PS2 aged it could 'keep up' because the coders kept getting better and better.

    Mere mortals do not write the latest graphics engines. I think there are a lot more tier-1 people running around than /. seems to think. They are just too busy to comment here.

    All that really matters is whether the launch titles will be 'good' enough. Then the full power of the system can be unleashed over its lifespan.

    If you're a game company and you're faced with the choice of either making just another engine OR spending some money on the kind of people that code for supercomputers and getting an engine that will blow the competition out of the water, then it will be a simple choice.

    Just because some guy on a website finds it hard doesn't mean nobody can do it.

    • "All that really matters is wether the launch titles will be 'good' enough. Then the full power of the system can be unleashed over its lifespan."

      Yea, but what's the full power of a system? Prettier graphics?

      The "full power" of the PS1 seemed to be that its games became marginally less ugly as time went on, although FF7 was very well done since it didn't use textured polygons for most of it (the shading methods were much sexier). When I think about FF9, I don't like it more because it uses the PS1 at a fu
  • compilers ... (Score:5, Insightful)

    by dioscaido ( 541037 ) on Sunday February 26, 2006 @05:07PM (#14805126)
    ... can get you only so far. You need to have parallelism in mind when you write the high-level code, otherwise it may end up with needless dependence on serial execution that a compiler may not be able to break, reducing the benefits of such an architecture. It will be interesting to see how well games are suited for concurrent execution. Logically there are lots of computations that can be performed independently (AI, physics) but all of it has inherent interaction with a central data source (the game world).
    • There are compiler extensions that allow for multi-threaded code etc., specifically designed with parallelism in mind. However, yes, your point is good. I think the PlayStation will need some well thought-out high-level engine APIs even if the compiler is good, before many games with optimal performance are released. However, I'll be surprised if the Cell becomes cheap and has good raw performance, but isn't readily adopted and adapted to by the high-performance computing crowd.
  • Always wondered why there is no cooperation between chip makers and even video card companies to make a compiler like this.

    • There has been. Itanium is the most recent example. Most of those efforts fail because, in the real world, getting good performance only with a single compiler from a single vendor, and then usually only if the stars align right, isn't good enough.
  • Far too complex? (Score:2, Insightful)

    by hptux06 ( 879970 )
    Cell's big programming problem goes right down to each SPE: the assembler instructions for the SPEs cannot actually address main memory! Every time information is read into or out of the 256K "local storage" on each SPE, a DMA command must be issued. Now, while this is Cell's greatest asset (execution continues while seriously slow memory movement occurs), it is also difficult to work with.

    Your average C programmer doesn't take architecture into account, and so there's no user indication of whether a variable can be
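
    To make the DMA round trip concrete, it looks roughly like the sketch below. The intrinsic names are the ones from IBM's SPE SDK (spu_mfcio.h), but the exact signatures and alignment/size rules here are from memory and should be treated as approximate:

    #include <spu_mfcio.h>

    #define CHUNK 4096                                /* bytes; DMA sizes and addresses must be aligned */
    static char buf[CHUNK] __attribute__((aligned(128)));

    void process_chunk(unsigned long long ea)         /* ea = effective (main-memory) address */
    {
        unsigned int tag = 1;

        mfc_get(buf, ea, CHUNK, tag, 0, 0);           /* pull the chunk into local store      */
        mfc_write_tag_mask(1 << tag);
        mfc_read_tag_status_all();                    /* block until that DMA completes       */

        /* ... compute on buf entirely out of the 256K local store ... */

        mfc_put(buf, ea, CHUNK, tag, 0, 0);           /* push the results back to main memory */
        mfc_write_tag_mask(1 << tag);
        mfc_read_tag_status_all();
    }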
    • by stedo ( 855834 )
      Your average C programmer will not be developing the core code. Most likely, a group of very good coders will create a game engine, and the average C programmers can use the API that the highly-skilled, highly-paid engine coders created to hide unnecessary implementation details.
    • Your average C programmer doesn't take architecture into account,

      That's because, to the average C (or C++) programmer, speed doesn't matter -- ease of coding and debugging and maintenance does. However, that's not the case with games developers (or, more correctly, games engine developers these days), or high-performance computing people (ie, scientists who write weather prediction programs and such). To them, it matters, and they'll code for it. But, they also have tools like MPI and PVM, which are desi

  • Hmm, that FA was totally devoid of any real details. As it seems to me (and granted, I do not develop on Cell processors, and I am not a stickler for the "next big thing"), these things may be interesting. Unfortunately, if they want me to use them I need to know it works for me. I want my existing code to compile with minimal changes so I can test the new platform in the raw. I have the resources to test a few "maybe good, maybe not" systems a year. What I want to know, in short, is: if it "could" work
  • by idlake ( 850372 ) on Sunday February 26, 2006 @05:36PM (#14805229)
    If a CPU needs a special compiler in order to give good performance, it's basically dead; there are simply too many different applications that do binary code generation.

    Also, the division into "expert programmer" and "regular programmer" is silly. Most coding is done by people who aren't experts in the cell architecture (or any other architecture). That's not because people are too stupid to do this sort of thing, it's because it's not worth the investment.

    If Cell can't deliver top-notch performance with a simple compiler back-end and regular programmers who know how to write decent imperative code, then Cell is going to lose. Hardware designers really need to get over the notion that they can push off all the hard stuff into software. People want hardware that works reliably, predictably, and with a minimum of software complexity.

    Maybe CISC wasn't such a bad idea after all--you may get less bang for the buck, but at least you get a predictable bang for the buck.
    • by theJML ( 911853 ) on Sunday February 26, 2006 @06:33PM (#14805458) Homepage
      As a programmer, there's only so much that can be done in software. Sure you can parallelize things, and you can come up with newer/faster algorithms, but if we didn't get dual-proc systems, that would have been pointless. So with parallel procs, we get better parallel code. Hardware advances will create software advances, and new algorithms will direct hardware futures. This is the way the world works, and I think it's worked out fairly well so far. Let's see what the Cell and processors after it can do!
    • If a CPU needs a special compiler in order to give good performance, it's basically dead

      Pretty much all modern CPUs need special compilers to give good performance. Unless you can keep track of the number of pipeline stages, the degree of superscalar architecture, etc. you will get sub-optimal code. The P4, for example, can have 140 instructions in-flight at once. Can you keep track of your code over a 140-instruction window and make sure there are no hazards? If not, then you're probably better off

  • Is it just me or is it that we went from cisc to risc and now going back to risc again?

    I assumed less complex chips with optimizations coming from compile time were more efficient or cost effective?
    • Re:CISC? (Score:2, Interesting)

      by tarpitcod ( 822436 )
      A key problem with CISC was that doing virtual memory and handling page faults on a CISC processor was so incredibly, insanely complicated that you ended up going insane: your pipeline could throw multiple page faults on one instruction, and you had a god-awful mess to clean up.

      The problem with the Cell is actually pretty interesting. They decided to go for in-order CPUs for the SPEs, which means that to get good performance you sure as hell better know what your dependencies are and take into
    • Re:CISC? (Score:3, Funny)

      by Tim Browse ( 9263 )
      Is it just me or is it that we went from cisc to risc and now going back to risc again?

      Yeah, but the advantage of doing it this way is that the 2nd transition (from risc back to risc) is really quick!

  • I recall that a common complaint by development houses about Sega consoles was that they were very difficult to code for because of hardware complexity. Isn't Sony now making the very same mistake that doomed Sega's console business? Speaking of which, is the XB360 easier to code for than the PS3?
    • "I recall a common complaint by development houses about Sega consoles were that they were very difficult to code for because of hardware complexity. Isn't Sony now making the very same mistake that doomed Sega's console business?"

      Sega didn't make a single mistake, they made a LOT of them. I imagine you're thinking of the Saturn. It was supposed to be a SNES killer. In other words, all the fancy technology it had was meant to throw sprites on the screen. Then Sony showed up with its fancy-ass 3D archit
    • Dreamcast is one of the easiest game consoles for programmers.
    • by CarpetShark ( 865376 ) on Sunday February 26, 2006 @07:06PM (#14805580)
      The Cell doesn't seem to be that complex. It's a powerful processor, with multiple elements and associated timing issues that you have to be aware of, but that's nothing like the Gamecube or similar, which had all these weird modes and issues that I can't even recall now, probably because my brain blocked it out ;) It'll be a challenge for people who don't know parallel programming, and it might frustrate some who imagine that a cpu with 8 SPEs should act like 8 entirely independent machines, each with its own SPE. But, I think games developers these days will take it as par for the course. There seems to be a trend now that only the biggest and best games companies actually develop game engines (ie, write low-level optimised code), while the other companies just rent the technology and develop levels and artwork and scripting based on that engine. So, the big question is how many of the engine developers will get on board early and if they'll be sufficiently inspired and up to the task. I think they'll find a way :)
  • I haven't done a lot of multi-threaded programming, so maybe this is actually commonly available, but I think a nice language-level parallelism feature would be something that could handle a really basic "for each" type loop:

    serialCode();

    pfor(element in collection) {
        element.parallelCode();
    }

    serialCode();

    without having to worry about manually setting up the threads, etc - if there are multiple resources available, they get used, if not, then it happens in serial. Is there anything like this out now?
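
    For what it's worth, OpenMP (which several C/C++ compilers already support) gives you roughly that shape today; a minimal sketch, with the element type and work function made up for illustration:

    #include <stdio.h>

    typedef struct { double value; } element_t;       /* illustrative element type */

    static void parallelCode(element_t *e) { e->value += 1.0; }

    int main(void)
    {
        static element_t collection[1000];
        int i;

        /* serialCode(); */

        /* Each iteration is independent; the runtime spreads them across however
           many cores are available, or just runs them serially on a single core. */
        #pragma omp parallel for
        for (i = 0; i < 1000; i++)
            parallelCode(&collection[i]);

        /* serialCode();  -- there is an implicit barrier before this point */

        printf("%g\n", collection[0].value);
        return 0;
    }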

    • FORTRAN has a construct somewhat like this. In FORTRAN you can operate on vectors as if they were scalars, which makes it much easier to generate vector unit code from FORTRAN than from (for example) C. This doesn't help with the problem of generating code for SPUs, however. A language that would make it easy to generate SPU code would have to have message passing built in, similar to that found in Erlang, but designed for larger messages and vector features similar to those found in shader languages.
  • (Warning : troll venting off.)
    Let me summarize
    1. take one of the most unsafe, slowest-to-compile, pitfall-ish, unspecified languages in existence (ok, I might be exaggerating on the "unspecified" part)
    2. add even more #pragmas and other half-specified annotations which are going to change the result of a program near invisibly
    3. don't provide a debugger
    4. require even more interactions between the programmer and the profiler, just to understand what's going on with his code
    5. add unguaranteed and slow static analy
    • by Bazzalisk ( 869812 ) on Sunday February 26, 2006 @07:37PM (#14805684) Homepage
      C lacks a lot of features of more modern languages - but I think you'd be hard-pressed to find a modern garbage-collected, dynamically typed, modular language which can handle low-level programming anything like as well as C.

      Certainly if I'm writing a pleasant little modern desktop application I'm going to write in Objective C or C# - would seem a little silly not to ... but for writing a compiler, a network stack, or gods forbid a kernel, I don't know of anything that works even close to as well as C. C still has a niche; can't really change that.

  • by joshv ( 13017 ) on Sunday February 26, 2006 @07:54PM (#14805741)
    The problems IBM programmers are having are emblematic of the problems that the PC industry is going to be facing in a few years. Multi-core is the future of PC performance. Increasing GHz and IPC of single processors has pretty much hit a wall. Creating Dual and multi-core CPUs is the best approach we have left for increasing performance with future increases in transistor count/density.

    The problem is that single-threaded programs will run just as slowly on your quad-core 'Core-Quattro' in 2008 as they did on your old Pentium 4, c. 2005. Great, yeah, I know, server loads parallelize very nicely (witness the miracle of Niagara), but consumer-grade CPUs are where the volume is at, and people are going to have to notice a real difference in performance in order to stay on the hardware upgrade treadmill. This necessitates that Intel/AMD/IBM come up with new programming models that make it easy to parallelize existing code. Parallelized libraries and frameworks are all well and good, but it will be 20 years before everyone gets around to recoding the existing codebase to the new platform - and most of them are probably not going to generate optimal code.

    No, what we need are compilers that take programs written in a serial fashion, and emit code that scales well on multiple processors. The problems with the PS3 are only the beginning.
  • I remember (Score:3, Interesting)

    by DSP_Geek ( 532090 ) on Monday February 27, 2006 @12:20AM (#14806415)
    About ten years ago VM Labs came out with something not too far off conceptually from the Cell - vector instructions, local memory you had to DMA in and out of, 4 processors on a chip. It wasn't floating point, however, and the development tools were best described as rudimentary: the best way of debugging was to deliberately crash the box and examine the register dump barfed back over TCP/IP.

    They called a developer's conference in August 1998, where after the presentation a veteran game coder shrugged: "Another weird British assembler programming cult".

    The Cell strikes me the same way, and for the same reasons, although Big Blue likely has more development tool budget than VM ever did. Not to take anything away from the smart guys at IBM, but I suspect they'll have a fun time working around the Cell's limitations. I can tell them from experience that DMAed local memory will be much more of a pain in the ass than they can imagine, and unless they can guarantee sync in hardware they'll be wasting a bunch of time schlepping spinlocks in and out of memory. The vector stuff will also be nontrivial: the best way to make that usable, apart from having everyone write vector code from the git-go, would be to provide a stonking great math library in the style of the Intel Integrated Performance Primitives.

    As an aside, the PS3 is in the tradition of Sony not caring about who programs their machine: the PS1 was easier to code than the Saturn, which was a true horror, the PS2 upped the difficulty a fair bit, and now even experienced coders are bitching about the PS3. Meanwhile Microsoft is learning from their mistakes: the X360 is easier than the X1, and if you doubt that makes a difference, check out game development budgets and time to delivery. I don't care, really: I eat algorithms and machine code for breakfast, so this just means more jobs and money for me.
  • by Animats ( 122034 ) on Monday February 27, 2006 @01:21AM (#14806551) Homepage
    The basic problem with the Cell processor is that the SPEs each have only 256K of private memory, with uncached, although asynchronous, access to main memory. It's the unshared memory that's the problem.

    This architecture has been tried before, for supercomputers. Mostly unsuccessful supercomputers you've never heard of, such as the nCube [wikipedia.org] and the BBN Butterfly. [paralogos.com] There's no hardware problem building such machines; in fact, it's much easier than building an efficient shared-memory machine with properly interlocked caches. But these beasts are tough to program. The last time around, everybody gave up, mainly because more vanilla hardware came along and it wasn't worth dealing with weird architectures.

    The approach works fine if you're doing something that looks like "streaming", such as multi-stream MPEG compression or cell phone processing. If you want to do eight unrelated things on eight processors, you're good.

    But applying eight such processors to the same problem is tough. You've got to somehow break the problem into sections which can be pumped into the little CPUs in chunks that don't require access to any data in main memory. The chunks can't be bigger than 50-100K or so, because you have to double buffer (to overlap the transfers to and from main memory with computation) and you have to fit all the code to process the chunk into the same 256K. That's a program architecture problem; the compiler can't help you much there. Your whole program has to be architected around this limitation. That's the not-fun part.
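
    A minimal sketch of that double-buffered shape, again using the SDK's DMA intrinsics (spu_mfcio.h) with details that are only approximate: start the transfer for chunk n+1, compute on chunk n, then wait and swap.

    #include <spu_mfcio.h>

    #define CHUNK 16384
    static char buf[2][CHUNK] __attribute__((aligned(128)));

    void stream(unsigned long long ea, int nchunks)   /* ea = start address in main memory */
    {
        int cur = 0, n;

        mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);      /* prefetch chunk 0 */

        for (n = 0; n < nchunks; n++) {
            int next = cur ^ 1;
            if (n + 1 < nchunks)                      /* overlap: start fetching chunk n+1 */
                mfc_get(buf[next], ea + (unsigned long long)(n + 1) * CHUNK,
                        CHUNK, next, 0, 0);

            mfc_write_tag_mask(1 << cur);
            mfc_read_tag_status_all();                /* wait for chunk n to arrive */

            /* ... compute on buf[cur]; code plus both buffers must fit in 256K ... */

            cur = next;
        }
    }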

    You have to make sure that you do enough work on each chunk to justify pumping it in and out of the Cell processor. It's like cluster programming, although the I/O overhead is much less.

    In some ways, C and C++ are ill-suited to this kind of architecture. There's a basic assumption in C and C++ that all memory is equally accessible, that the way to pass data around is by passing a pointer or reference to it, and that data can be linked to other data. None of that works well on the Cell. You need a language that encourages copying, rather than linking. Although it's not general-purpose, OpenGL shader language is such a language, with "in" and "out" parameters, no pointers, and no interaction between shader programs.

    Note that the Cell processors don't do the rendering in the PS3. Sony gave up on that idea and added a conventional NVidia graphics chip. (This guaranteed that the early games would work, even if they didn't do much with the Cell engines.) Since the cell processors didn't have useful access to the frame buffer, that was essential. So, unlike the PS2, the processors with the new architecture aren't doing the rendering.

    It's possible to work around all these problems, but development cost, time, and risk all go up. If somebody builds a low-priced 8-core shared memory multiprocessor, the Cell guys are toast. The Cell approach is something you do because you have to, not because you want to.
