
Ars Dissects POWER5, UltraSparc IV, and Efficeon

Burton Max writes "There's an interesting article here at Ars about the POWER5, UltraSparc IV, and Efficeon CPUs. It's a self-styled 'overview of three specific upcoming processors: IBM's POWER5, Sun's UltraSparc IV, and Transmeta's Efficeon.' I found the insights into the Efficeon (successor to Crusoe) to be particularly good (although it paints a sad picture of Transmeta, methinks)."
This discussion has been archived. No new comments can be posted.

  • Good article (Score:5, Interesting)

    by The_Ronin ( 202785 ) on Thursday November 20, 2003 @03:02PM (#7522734)
    Too bad they focused so much on Power and Transmeta while spending so little time on UltraSparc IV and V, and ignored Itanium entirely. A little more balance and it would have been a great read.
    • Re:Good article (Score:5, Interesting)

      by AKAImBatman ( 238306 ) <<moc.liamg> <ta> <namtabmiaka>> on Thursday November 20, 2003 @03:16PM (#7522847) Homepage Journal
      I think it would have been best to have an article devoted to the TransMeta chip, and split the Power5/UltraSparc discussion out into its own article. That way he could have given a great deal more attention to the powerhouse chips and how they're going to change the future. TransMeta's chips are on the level of ARM, not UltraSparc.

    • Re:Good article (Score:3, Informative)

      by Anonymous Coward
      There's a reason they ignored Itanium: the article is about upcoming processor technologies. Last I checked, there wasn't a new, soon-to-be-released Itanium that Intel was pushing.

      In fact, the current Intel processor roadmap [intel.com] shows the same Itanium 2 processor for the first half of 2004 as it did for the second half of 2003.
    • Re:Good article (Score:3, Informative)

      by jaberwaki ( 565104 )
      I believe he didn't spend much time on the UltraSparc IV because, quote:

      "To get the "hyperthreading" effect of two processors on one chip, Sun stuck two full-blown UltraSparc III cores on a single chip, which is chip-pin compatible with the UltraSparc III."

      He assumes the interested reader will already know something about the UltraSparc III. Sun didn't fundamentally change the chip architecture. Also, the Itanium architecture is already discussed ad nauseam in other articles. It wasn't meant to be a ba
      • Re:Good article (Score:4, Informative)

        by pmz ( 462998 ) on Thursday November 20, 2003 @07:22PM (#7524625) Homepage
        Sun didn't fundamentally change the chip architecture.

        Probably the most significant outcome of the USIV will be 212-CPU Sun Fire 15K servers. That seems to imply something like 5 or 6 CPUs per rack-unit (although it appears the 15K is somewhat bigger than a standard rack).

        • A standard 15K currently supports 72 CPUs (4 CPUs on each of 18 system boards). This can be expanded to 106 by putting CPUs in I/O slots, although those CPUs are partially crippled by the latency/bandwidth constraints of doing so. USIV is form-factor equivalent to USIII, so it should allow a simple doubling of capacity, assuming the firmware and OS support it.

          As for rack sizes, the 15K racks are about the same size as normal racks, but are slightly deeper. The system is not like a standard rack +

  • I had a brain fart for a second while reading the article:

    "This is why the advances that have the most striking impact on the nature and function of the computer are the ones that move data closer to the functional units. A list of such advances might look something like: DRAM, PCI, on-die caches, DDR signaling, and even the Internet"

    For a second there, I thought that the list of advances started with DRM, not DRAM, and I almost had a heart attack.
  • by Anonymous Coward on Thursday November 20, 2003 @03:04PM (#7522759)
    Since day 1 they have skirted the benchmark issue, always trying to deflect the question.

    Just like that article yesterday on their new chip. Did they ever cite a single benchmark? NO.

    The basic performance of your CPU product, as measured by industry standard benchmarks, is essential knowledge.

    I was under NDA on the previous gen Transmeta stuff. It was amusing how the other OEMs reacted - it was crap, but nobody could say anything in public.

    • > Since day 1 they have skirted the benchmark issue

      That's because they aren't going for speed. They are going for low power consumption. To compare Transmeta to Intel based purely on speed would be missing the point entirely.
  • Sun? (Score:4, Interesting)

    by Raven42rac ( 448205 ) * on Thursday November 20, 2003 @03:08PM (#7522783)
    Why the heck did Sun's offering get thrown in there? For variety? The Efficeons look awfully nice to people who want less power-hungry computing devices. If all you do is word processing and such, why the heck even use an Intel/AMD chip? Less heat, less power, what is not to love? Now the IBM chips have really piqued my interest; I am a huge fan of IBM's chips, especially in Apple computers (I am a proud owner of a 12" Powerbook).
    • ... I am a huge fan of IBM's chips, especially in Apple computers (I am a proud owner of a 12" Powerbook).

      I am sorry to break this to you, but the 12" PB has a Motorola chip in it...

      it just had to be said....

    • Re:Sun? (Score:5, Insightful)

      by illumin8 ( 148082 ) on Thursday November 20, 2003 @04:50PM (#7523629) Journal
      I am a huge fan of IBM's chips, especially in Apple computers (I am a proud owner of a 12" Powerbook).

      I don't mean to burst your bubble, but your 12" PowerBook uses a Motorola processor, not an IBM one. I own a 15" PowerBook though and I love it.

      That having been said, the IBM PPC 970 or G5 is breathing new life into the PowerMac line and Apple is doing really well because of it. I can't wait until they get it stuffed into a PowerBook.
      • Assuming the Powerbook owner has a new 'book, then you are most certainly correct. However, Apple has produced many computers under the "Powerbook" moniker, and some of them had 12" screens. I would wager some even included G3s or 603e's, which would have been produced by IBM. You most likely know that, but there are a lot of new folks interested in Macs these days who might believe that IBM is a new player in Mac CPUs. IBM has been supplying Apple with CPUs since 1994.
    • Re:Sun? (Score:3, Insightful)

      by Kunta Kinte ( 323399 )
      Why the heck did Sun's offering get thrown in there? For variety? The Efficeons look awful nice...

      "I don't like or use it so one else does"

      Real smart.

      Any idea how many Sun systems are out there? People who use Sun hardware and software and, *gasp*, like it?! Should we only evaluate chips that currently do OK in the Slashdot market?

  • rTransmeta

    What's this?

  • One Power 5... (Score:5, Interesting)

    by Realistic_Dragon ( 655151 ) on Thursday November 20, 2003 @03:09PM (#7522796) Homepage
    Will show up as _4_ processors to the OS! (2 cores both doing SMT.)

    This means that in a (say) 512 processor box the OS will have to handle 2048 processors efficiently. That's placing a lot of control in the hands of the software designers, and a lot of money in the hands of the companies that license per processor.

    On the other hand, UNIX is getting pretty efficient at scaling to large systems; perhaps it (and by extension Linux, thanks to SGI and IBM) will be able to handle it with no problems. One thread per processor on a desktop system might prove to be quite efficient :o)
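
    In case the core/thread arithmetic isn't obvious, here's a back-of-envelope sketch (the per-processor license price is a made-up placeholder, not anyone's real pricing):

    # Back-of-envelope sketch of the POWER5 core/SMT arithmetic above.
    # The per-CPU license price is a made-up placeholder, not a real quote.

    def logical_cpus(chips, cores_per_chip=2, smt_threads_per_core=2):
        """Number of 'processors' the OS sees for a given chip count."""
        return chips * cores_per_chip * smt_threads_per_core

    for chips in (1, 32, 512):
        print(chips, "POWER5 chips ->", logical_cpus(chips), "logical CPUs seen by the OS")

    # If a vendor licensed per logical processor at, say, $1000 a seat:
    print("512-chip box:", logical_cpus(512) * 1000, "dollars in per-CPU licenses")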
    • Re:One Power 5... (Score:4, Interesting)

      by stevesliva ( 648202 ) on Thursday November 20, 2003 @03:17PM (#7522852) Journal
      I'm getting a lot of karma mileage from this Power5 MCM review [theinquirer.net] these days. They visited the same Microprocessor Forum that Ars did.
      • Geez! That thing looks like you could club someone over the head with it! Does putting 8 processors into a block of cement really improve things that much over multiple processors?

        • Re:One Power 5... (Score:3, Informative)

          by stevesliva ( 648202 )
          It's four dual-core SMT processor chips, and four L3 cache chips per MCM, actually. I think cache is sexy, but I'm biased.
          • Re:One Power 5... (Score:3, Insightful)

            by AKAImBatman ( 238306 )
            Alright, 4 two-way chips. But does it actually improve anything over individual processors? If I have to yank a board on an UltraSparc, I'm not going to throw away the entire board and all its processors! I'm simply going to replace the bad one and slap the board right back in the system. With IBM's design, I have to throw the whole thing away and get a new block of cement^W^W^W processor chip for my machine.

            • I don't know. Although I do think that the MCM package is trickling down IBM's eServer line from the hugely reliable mainframe zSeries into the pSeries. As far as bus advantages, etc, I'm not enough of a systems geek...
            • For one thing, there's a lot of interconnects inside that cement block, so it's not just like exposed pins for all the chips on the other side. For another... how often do chips die? If you can afford a machine with one of those chips, you can probably afford to replace a whole brick.
              • For another... how often do chips die? If you can afford a machine with one of those chips, you can probably afford to replace a whole brick.

                Actually, that's what a support contract is for. The bigger problem is availability. Each brick requires four processors, plus the various work to mold all the interconnects into place. The yield on a process like that can't be very high. Not to mention all the custom parts that would be needed to fit a chip like this.

                In other words, if my processor fails, there's
                • Chips are tested before they are put into MCMs, so that's not really a problem. As for supply, your downtime would be measured in hours if you had the cash to buy a big system like these.
                • That's one of the dumbest things I've heard here (and that's saying a lot)

                  IBM won't be making these things to order; the minute your RS/6000 (pSeries) loses a processor in the brick, a CE will be out that day to fix it.

                  • the above post is supposed to respond to this [slashdot.org] post
                  • Dude, chill out. It's called "playing devil's advocate". And my point is that a block like this is going to cause yield problems that could impact IBM's ability to supply them. Yes, even IBM can run into supply problems.

            • Re:One Power 5... (Score:2, Interesting)

              by redgren ( 183312 )
              Reliability is pretty impressive on these designs, considering the complexity. At least on the Power4 (similar design), MTBFs are measured in decades.
              • Re:One Power 5... (Score:3, Interesting)

                by AKAImBatman ( 238306 )
                Yes, but what are the advantages? IBM, Sun, and HP all make a business out of selling components with very high MTBF. Yet, if I have a 64 processor machine chugging along for years on end, I have a reasonably good chance of seeing a failure. (Particularly when chips come from a bad batch.)

                So, IBM is taking away the ability to hot swap individual chips in exchange for... what? That's the big question. If there's some major improvement in the design, say so! Inquiring minds want to know! :-)
                • by Anonymous Coward
                  Ok, so you are worried that your parts are no longer accessible.

                  One of the first computers I built had individual TTL parts (74xx type things) to make the CPU. If I fried one of those, I would just replace that single part and be going again. No need to replace the whole CPU.

                  I, for one, would never go back to that. Not just the size, but the performance and the cost.

                  It used to be that I would buy 4K-bit RAM chips. Buy 8 of those to make an 8x4K RAM array (4K bytes) and then add a simple address decoder
                • Re:One Power 5... (Score:4, Informative)

                  by isaac ( 2852 ) on Thursday November 20, 2003 @04:57PM (#7523673)
                  So, IBM is taking away the ability to hot swap individual chips in exchange for... what? That's the big question. If there's some major improvement in the design, say so! Inquiring minds want to know! :-)
                  Damn, dude, RTFA if you're that curious!

                  What is gained is full-speed interconnect between processors within the same module. No "multipliers" - the bus between the cores within the module runs at chip speed. The timings are so tight at 2+ GHz that this is simply impossible to do with individual chips.

                  -Isaac

                • It really is just like the AC that replied to you said: it's a matter of creating denser and denser integrated products. The chip-to-chip interconnects number somewhere around 5500 on a POWER4 MCM; if you attempted to route all of that out of each chip and into a PCB, you would not get anywhere near the interconnect speed you get by keeping it all inside an MCM. MCM-to-MCM interconnect speed drops by a factor of 3 compared with chip-to-chip within an MCM.
                • I have a 64 processor machine chugging along for years on end, I have a reasonably good chance of seeing a failure. (Particularly when chips come from a bad batch.)

                  So source bricks from the same batch and source multi-brick systems from different batches. If you have to toss the whole brick at once, it's best to keep the stuff that's more likely to fail on that brick.

        • From the inquirer article, I'd guess the major advantages are 2 things - An *extremely* fast bus between the CPUs on the same package, and increased density.
      • oh... my... sweet... jebus...
    • And if one of the threads blocks on I/O? You would actually want more processes running than the total number of processors. The exact number depends on how many and how often processes get blocked for various reasons, but I think 1.5 or 2 is considered a good factor. That means something like 4000 processes would make pretty efficient use of a 512-processor box.
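
      Roughly, the sizing rule of thumb looks like this (the 1.5-2x blocking factor is just the guess above, not a measured number):

      # Rough sketch of the "more runnable threads than processors" rule of thumb.
      # The 1.5-2x blocking factor is the guess above, not a measured value.

      def suggested_threads(logical_cpus, blocking_factor):
          """Threads needed to keep a box busy when some of them block on I/O."""
          return int(logical_cpus * blocking_factor)

      logical = 512 * 2 * 2                    # 512 chips x 2 cores x 2 SMT threads
      print(suggested_threads(logical, 1.5))   # 3072
      print(suggested_threads(logical, 2.0))   # 4096, i.e. "something like 4000"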
    • Re:One Power 5... (Score:1, Informative)

      by Anonymous Coward
      The POWER4-based systems from IBM are only available up to 32-way. I'd expect them to try to double that to 64-way, so the OS would see 128 processors with the POWER5.
    • This means that in a (say) 512 processor box the OS will have to handle 2048 processors efficiently. That's placing a lot of control in the hands of the software designers, and a lot of money in the hands of the companies that license per processor.

      Fortunately for IBM, they are both the hardware designers and, frequently, the software designers. They can ensure that their big iron will be supported by software.
  • by G4from128k ( 686170 ) on Thursday November 20, 2003 @03:11PM (#7522806)
    The history of Wintel suggests that top-rated raw CPU performance is not the best predictor of adoption. Compatibility with market-dominating software platforms is a greater determinant of CPU sales. We might hope that advances in compiler design and flexible cores can help any CPU run x86 code, but there are always the little nits that prevent true compatibility and drive computer buyers toward the dominant platform.
  • by Animats ( 122034 ) on Thursday November 20, 2003 @03:18PM (#7522864) Homepage
    First "Hyperthreading", now "prioritized hyperthreading".

    It's amusing seeing this. It reflects mostly that Microsoft has finally managed to ship in volume OSs that can do more than one thing at a time. (Bear in mind that most of Microsoft's installed base is still Windows 95/98/ME. Transitioning the customer base to NT/Win2K/XP has gone much more slowly than planned.)

    But Microsoft takes the position that if you have multiple CPUs, you have to pay more to run their software. So these strange beasts with multiple decoders sharing ALU resources emerge.

    • Microsoft will eventually provide XP Home in an SMP flavor, it's only a matter of time. Perhaps they will have an HT edition before that happens. But SMP for free is just another selling point for Linux, so they won't let it be a sticking point forever.

    • I really fail to see how Microsoft's multi-processor licensing scheme falls into this. Correct me if I'm wrong, but since when has any MS OS run on a POWER chip or a Sparc?

      If these were x86 chips I think the licensing question would be valid, but since they're not...

    • Bullshit (Score:1, Informative)

      by Anonymous Coward
      MS has shipped preemptive multitasking and multithreading for a long time. You are confusing that with multiprocessing (which is different).

      Win95/98/ME are not multiprocessor but are preemptive multitasking and multithreading. They can certainly do "more than one thing at a time". Unlike Apple, who first shipped this capability only recently, MS first shipped it in Windows/386 back in the late '80s.
    • Bear in mind that most of Microsoft's installed base is still Windows 95/98/ME.

      Is that really true? Judging by the web logs from my employer's site, it looks like about 65% of our users are on NT/2K/XP. Our customers are all in the construction industry, not the tech industry, so they aren't likely to be early adopters.

      If you're talking MS's home users, then that's pretty plausible, but home users aren't the majority of Microsoft's installed base.

      I'd be interested to see some numbers, though, if you

  • power consumption (Score:5, Interesting)

    by bigpat ( 158134 ) on Thursday November 20, 2003 @03:22PM (#7522888)
    Wasn't low power consumption the number one benefit that Transmeta was looking to provide, so that you could get twice the battery life (or something like that) without sacrificing too much performance? Did Transmeta shoot itself in the foot by letting people think that it was going to provide higher-performance chips than the competition?

    The main selling point of Transmeta was always power consumption, so have they lost their edge in that area? If so, that would be serious for them, but the article doesn't answer that question.

    • The problem was that they provided around 10% more battery life at around 50% the performance.
    • My instinct is to say that under such benchmark conditions, Efficeon would edge out Centrino in terms of MIPS/Watt on the same process technology. However, would it be by enough of a margin to warrant going with the smaller TM instead of the larger Intel, both in terms of any added design costs incurred from using the TM parts and in political terms of angering Intel, and of course in terms of questions surrounding TM's long-term viability? Of that I'm not so sure.

      The article didn't answer the questi

    • No, they're still great for power consumption. The problem is that the CPU isn't the only thing in most devices sucking power, and they built up expectations that their chips would be able to perform much better than they have turned out to do. I still think they are good choices for a lot of devices that don't really need any more power - they're basically like ARM with x86 compatibility built in, and there are plenty of cases where something like that makes sense - but they definitely haven't lived up to the

    • Actually, the original intent of Transmeta was to produce a chip with performance on par with or better than comparable Intel/AMD offerings.

      When the project failed to do that (quite badly), the marketeers refocused the company message to start talking about 'low power' and efficiency. This deflects the critics who do not understand computer architecture and things like power efficiency.

      Yeah, it's a neat research project and having Linus work there didn't hurt PR at all, but the performance just isn'

  • ...ship them some super fast cpus for their web server - it's smokin'
  • by joib ( 70841 ) on Thursday November 20, 2003 @03:50PM (#7523086)
    Seems like the POWER5 will be able to run only two threads per core, like the Pentium 4. For the P4 it is understandable that they want to reduce cost as much as possible, but why be so frugal on a high-end CPU like the POWER5?

    I mean, the MTA supercomputer, which pioneered the entire SMT concept, was able to run 128 threads per CPU. OK, so they had different design constraints as well. Basically, the idea was that the CPUs didn't have any cache at all, thus making them simpler and cheaper. To avoid the performance hit usually associated with this, they simply switched to another thread when one thread became blocked waiting for memory access.

    Anyway, is there any specific reason why IBM didn't put more than 2, say 8 or 16, threads per CPU on the POWER5?
    • contexts != threads (Score:5, Informative)

      by kcm ( 138443 ) on Thursday November 20, 2003 @04:03PM (#7523214) Homepage

      First, you don't just automatically get a linear increase with the width of the multiple-threading capabilities. It's not like it's free to increase the RF size and/or FUs, etc.

      You're also confusing contexts with active threads. The Tera^WCray MTA had 128 contexts available -- so that thread switching is more lightweight, more or less -- but only one could be active at a time.

      SMT in its various forms has more than one active thread, which introduces the problem of competing for resources in the issue and retire stages, etc.
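
      To make the contexts-vs-active-threads distinction concrete, here's a toy cycle-by-cycle model -- grossly simplified, with made-up workloads and fixed stall times, nothing like real hardware -- of "many contexts, one issuing per cycle" (barrel/MTA style) versus a 2-wide SMT core:

      # Toy model: barrel/MTA style (many contexts, one op issues per cycle)
      # vs. 2-wide SMT (up to two threads issue per cycle into shared slots).
      # Grossly simplified: made-up workloads, fixed 10-cycle stalls, no caches.

      MEM_LATENCY = 10

      def make_thread(n_ops, miss_every):
          # 'c' = 1-cycle compute op, 'm' = op that stalls its thread 10 cycles
          return ['m' if i % miss_every == 0 else 'c' for i in range(1, n_ops + 1)]

      def run(threads, issue_width):
          """Cycles to drain all threads, issuing at most issue_width ops per
          cycle and skipping contexts that are stalled on 'memory'."""
          pc = [0] * len(threads)          # next op index per context
          ready_at = [0] * len(threads)    # cycle at which each context is ready
          start, cycle = 0, 0
          while any(pc[i] < len(t) for i, t in enumerate(threads)):
              issued = 0
              for k in range(len(threads)):
                  i = (start + k) % len(threads)
                  t = threads[i]
                  if issued < issue_width and pc[i] < len(t) and ready_at[i] <= cycle:
                      if t[pc[i]] == 'm':
                          ready_at[i] = cycle + MEM_LATENCY
                      pc[i] += 1
                      issued += 1
              start = (start + 1) % len(threads)   # rotate issue priority
              cycle += 1
          return cycle

      work = [make_thread(200, miss_every=5) for _ in range(8)]
      print("8 contexts, 1 active per cycle (barrel-ish):", run(work, 1), "cycles")
      print("8 threads,  2 active per cycle (SMT-ish):   ", run(work, 2), "cycles")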

    • I recall an article on Forbes (of all places... they talk to the right guys, though) on the matter, comparing the design goals of Sun's Niagara with IBM's and Intel's. Basically, the answer was that it's not clear what kinds of apps benefit from 8, 16, or 32 threads of parallelism. This is a low-tech description, but there are other bottlenecks: you have to have that many "threads" of code ready to run to benefit from it, or else it's cheaper to context switch.

      Subsequently, I don't know how much you've pla

      • In other words, you're laying out the basic problems of:

        1) Being able to FIND parallelism
        2) Being able to take advantage of it:
        a) Issuing multiple instructions (limited fetch bandwidth)
        b) Executing them in parallel (limited FUs)
        c) Committing them to memory / retiring

        20% is generous, but that's a limitation of the simplicity of HT with respect to the EV8 / UltraSparc-V scale of SMT implementation, which leans towards a more full-issue design.
        • Actually, the multithreaded design is an answer to the lack of parallelism, as most designs are able to deal with it at the thread or process level; hence the parallelism is implicit and does not need to be "found". That is the whole point. You are citing the limitations of superscalar designs, not SMT designs.
          • I wasn't clear.

            Limited resources run out, hence four (independent) threads running in parallel cannot write to the RF or fetch from memory concurrently. If your parallelism involves many different types of operations, it's much easier.

            I suppose my original comment was worded badly -- being *able to* HARNESS the inherent (independent) parallelism with the resources at hand is the key, you are correct.
      • it's not clear what kinds of apps benefit from 8, 16, or 32 threads of parallelism.

        SunRay servers come to mind, where there are lots of single-threaded users sharing a system.

        In Solaris, for example, every process gets a kernel thread, and every process thread gets a kernel thread. On my workstation, right now, just running CDE and a few apps gets reported as 189 light-weight processes (essentially threads). Having a system shared by 1000 users could result in over 100,000 threads with approximately 100
        • by Anonymous Coward
          This is a false economy. Just because you have 32 threads to run doesn't mean you would benefit from 32-way SMT. Remember that you don't just need 32 contexts in your CPU, you need enough cache to be able to feed 32 unrelated threads. The reason SMT sometimes slows down a CPU is that the 2 or more threads running concurrently compete for cache space. If you just run a single thread at a time, it has a whole quantum to fill up the cache and use it.

          The way this worked on the aforementioned MTA machine is th
          • No, multithreading gives you the ability to hide memory latency. While one thread is waiting on a memory operation, another thread can fill the functional units, hence "hiding" the bubbles the stalled thread would have generated. If you have two threads competing for memory resources, then there is a problem with the processor logic. The Tera had no data cache, because it can hide all memory latencies when it comes to data requests. The memory requests could be pipelined through maximum levels of banking and
    • The MTA was a fine-grained multithreaded machine, way different from the SMT approach required for complex superscalar threaded machines. The Tera did have an instruction cache, though; no data cache.
    • I mean, the MTA supercomputer which pioneered the entire SMT concept, was able to run 128 threads per cpu.

      It is an older concept (20 years or maybe 30!); look up barrel processors sometime. I'm pretty sure the MTA executed one thread per CPU per cycle with no penalty for switching between threads on different cycles. It would switch threads any time a load was issued, any time the store buffer was full and a store was issued, and after X cycles. The resources you need for an MTA thread would be more

  • by csoto ( 220540 ) on Thursday November 20, 2003 @03:51PM (#7523098)
    the author suggests that it's not worth "pissing off Intel" to go with Transmeta. Give me a break. Transmeta is the only thing pushing Intel to make Centrino and other lower-wattage chips. They recognize that anybody in the mobile computing/devices world will seriously consider anything that gives their customers increased battery life and less toasty pockets.
    • Centrino is not a chip!

      it's a package of Intel wireless, an Intel CPU, and some other stuff.
      • But, it's a chip PLATFORM that depends on certain Pentium-M chips. Naturally, systems built around Transmeta "chips" will also require Transmeta-compatible support devices (e.g. the "Transmeta PLATFORM").

        My point is, this low-voltage thing was a non-issue before Transmeta came along. Intel just told everyone to "put bigger fans" in their laptops and shut up. I've got this Dell with seriously huge fans, and it gets HOT (but it's pretty durn fast, has a big screen and built-in DVD/CD-RW). I don't need lo
  • by pz ( 113803 ) on Thursday November 20, 2003 @03:53PM (#7523115) Journal
    Multiple times while reviewing the Efficeon architecture the article's author suggests that the tradeoff of additional storage required for Transmeta's code-morphing approach will easily balance out the power savings from making a simpler CPU. This belies a deep misunderstanding of power consumption in digital systems, as readily evidenced by the fact that modern non-Transmeta processors dissipate multiple tens of Watts of power (often nearly 100W) and a full complement of memory (4G, in modern machines) dissipates a few Watts at most.

    Also in the article, the author suggests that processors spend most of their time waiting on loads, and then argues that since the code-morphing approach means more instruction fetches, the Efficeon processor will be spending disproportionately more time on loads. Then, after this assertion, he admits that he does not know *where* the translated Efficeon code is held. Might it be in one-cycle-accessible L1 cache? That point is conveniently sidestepped. He does not understand under what circumstances the profiling takes place, although he regurgitates the sales pitch nicely. He argues that transistors hold the translated code (trying to argue against the transistors-for-software tradeoff) but then does not realize that transistors in memory do not equate to transistors in logic (neither in power, as they are not cycled as frequently, nor in speed characteristics).

    In all, I find the author's treatment of the Transmeta architecture sophomoric, and, after finding that section lacking, I left the rest of the article unread. Your mileage may vary.
    • It read much like a financial review of a company: take the buzzwords, guess wildly, base predictions on your guesses. Granted, the author was intelligent and understood the basics, but without a deeper understanding of the specifics he cannot really give reasons for performance or lack thereof.
    • by Hannibal_Ars ( 227413 ) on Thursday November 20, 2003 @04:53PM (#7523646) Homepage
      "Multiple times while reviewing the Efficion architecture the article's author suggests that the tradeoff of additional storage required for Transmeta's code-morphing approach will easily balance out the power savings from making a simpler CPU."

      I neither suggest nor imply anything this simplistic. In fact, I go to great pains to show how complicated the whole power picture is for Efficeon.

      "This belies a deep misunderstanding of power consumption in digital systems, as readily evidences by the fact that modern non-Transmeta processers dissipate multiple tens of Watts of power (often nearly 100W) and a full complement of memory (4G, in modern machines) dissipates a few Watts at most."

      Er... you do realize, don't you, that comparing Efficeon to a 100W processor is not only unfair, but it's stupid and I didn't do it anywhere in the article. A more appropriate comparison is Centrino, which approaches Efficeon in MIPS/Watt without any help at all from any kind of CMS software. I think that you might be the one who needs to learn a bit more about digital systems.

      "Also in the article, the author suggests that processors spend most of their time wating on loads, and then argues that since the code-morphing approach means more instruction fetches, the Efficion processor will be spending disproportionatly more time on loads. Then, after this assertion, he admits that he does not know *where* the translated Efficion code is held. Might it be in one-cycle-accessible L1 cache? "

      No, it is most certainly all not stored in L1. TM claimed that the original CMS software that came with Crusoe took up about 16MB of RAM, and that this was paged in from a flash module on boot. What I'm not 100% certain of are the exact specs for Efficeon, but I've assumed in this article that they're similar. This is a reasonable assumption, especially given the fact that the new version of CMS contains significant enhancements and is unlikely to be smaller. In fact, it's much more likely to be larger than the original 16MB CMS footprint, especially given that DRAM modules have increased in speed and decreased in cost/MB, which gives TM more headroom and flexibility to increase the code size a bit.

      "That point is conveniently sidestepped. He does not understand under what circumstances the profiling takes place, although he regurgitates the sales pitch nicely. He argues that transistors hold the translated code (trying to argue against the transistors-for-software tradeoff) but then does not realize that transistors in memory do not equate transistors in logic (neither in power, as they are not cycled as frequently, nor in speed characteristics)."

      Of course I know that transistors in memory are not the same as transistors on the CPU. My point, though, is that they're still not "free" in terms of power draw, and that it also costs power both to page CMS into RAM and to move it from RAM to the L1. And even having pointed that out, I still don't claim that this cancels out all the power-saving advantages of TM's approach.

      As far as relying on the sales pitch for info on CMS's profiling, well, TM doesn't exactly release the source for CMS, nor do they make a detailed user manual for it available to the public. As their core technology, details about CMS are highly guarded, and the only information that either you or I will likely ever have access to about it is whatever they put in the sales pitch. So I, like everyone else, must draw inferences from their presentations and do the best I can.

      Anyway, if you don't like the article, that's fine. But being a hater about it just makes you look lame.
      • by PastaAnta ( 513349 ) on Thursday November 20, 2003 @07:43PM (#7524763)
        First of all, thank you for a great article. You have some interesting views on the Transmeta approach. But like the parent poster, I feel you may jump to some conclusions based on assumptions.

        It is true that CMS has a cost in terms of RAM usage, but this does not necessarily translate into extra load latency. As I understand it, the trick is to exploit the fact that common code spends most of its time executing a very small portion of itself (the 90%/10% rule, or whatever). Much should be gained by heavily optimizing these "inner loops", which should translate into reduced load latency as fewer instructions are executed in total. The cost of the four optimisation runs, or JIT compilation, should drown in the millions of times these inner loops are executed.

        You could say that it is a complete waste of transistors and power to have many transistors performing the same optimisation over and over again in conventional processors. These hardware-based optimisations will also never be as efficient, as their scope is limited.
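
        In spirit, the software only has to do something like the toy "translate when hot" loop below. To be clear, this is just the general JIT idea, not how the real CMS works internally:

        # Toy "translate when hot" loop, in the spirit of the 90/10 argument above.
        # This sketches the general JIT idea only -- it has nothing to do with how
        # Transmeta's actual CMS is implemented.

        HOT_THRESHOLD = 50       # interpret a block this many times before optimizing

        translation_cache = {}   # block address -> "optimized" code (here: a closure)
        exec_counts = {}

        def interpret(block):
            """Slow path: decode and execute the guest block one op at a time."""
            result = 0
            for op, arg in block["ops"]:
                if op == "add":
                    result += arg
                elif op == "mul":
                    result *= arg
            return result

        def translate(block):
            """One-time (expensive) optimization; afterwards the block runs 'natively'."""
            ops = list(block["ops"])              # freeze the decoded ops
            def optimized():
                r = 0
                for op, arg in ops:
                    r = r + arg if op == "add" else r * arg
                return r
            return optimized

        def run_block(block):
            addr = block["addr"]
            native = translation_cache.get(addr)
            if native is not None:
                return native()                   # hot path: reuse the translation
            exec_counts[addr] = exec_counts.get(addr, 0) + 1
            if exec_counts[addr] >= HOT_THRESHOLD:
                translation_cache[addr] = translate(block)
            return interpret(block)

        # An "inner loop" block executed a million times amortizes one translation.
        loop_block = {"addr": 0x1000, "ops": [("add", 3), ("mul", 2), ("add", 1)]}
        for _ in range(1000000):
            run_block(loop_block)
        print("translated blocks:", [hex(a) for a in translation_cache])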

        There are some interesting perspectives with the Transmeta approach as well. You state that POWER5, UltraSparc IV and Prescott tackle the load-latency problem by using SMT to fill pipeline bubbles from data stalls and thereby increase utilisation of the execution units. This should be possible for Transmeta as well, by upgrading their CMS to emulate two logical processors instead of one.

        But you are right! A complete theoretical comparison is impossible - only real world experience will show...
      • I have to concur with the "hater". This article wasn't up to my standard for Ars, and I recoiled at exactly the same implications regarding the power draw of CMS itself.

        CMS and its translation buffer take a small fraction of the available RAM, and all of RAM takes a small fraction of the power the CPU does, so we're talking about a fraction of a fraction. Translations live in RAM, btw, and are cached like any other executable code, when needed.
    • The power consumption of a clocked transistor device is directly proportional to (a) the number of transistors, and (b) the square of the clock speed. Let's call it 1U for 1 transistor at 1 MHz, so (power consumption in U) = (# transistors)x((clock speed in MHz)^2). This means that 2 transistors at 1MHz will consume 2U, where 1 transistor at 2MHz will consume 4U.

      The Pentium 4 has upwards of about 55 million transistors on the die. SDRAM needs 1 transistor and 1 capacitor per bit; for 8x1024x1024x1024 bits

      • Oh my god! Please stop comparing apples and banjos and try to make sense of it!

        DDR SDRAM does not "run" at around 400MHz - the frequency of the data bus is 400MHz. As you state yourself, the power usage is very dependent on the usage pattern, and only very few memory cells actually change state during each write (up to 8 for an 8-bit RAM). I would guess that leakage and discharge of the capacitor cells is a significant factor, which you totally ignore.

        In a processor on the other hand, a lot of transistors ch
      • The base clock speed of the aforementioned P4 is 3.0 GHz, whereas the fastest DDR SDRAM runs at around 400MHz.

        Yes, except that the fraction of transistors switching in the two at any given moment is vastly different: in the P4 it will be reasonably high, in memory chips, it will be vanishingly low. Thus your analysis is inaccurate at best and potentially misleading at worst.

        Think of the following empirical observations: a modern processor cannot run without a heatsink without going into thermal failure.
  • Well, I tried to read the FA before making a comment, but it was futile: there was some enormous flashing strobe-y advert attached to it that was just painful to have on the screen.

    So, I gave up. I have no clue what the advert was for; it had a sort of minimalist man icon in it and lots of flashing colours - that's all I know. I do, however, know a lot more about advertising than the idiots who thought that one up.

    Simon.
  • by Anonymous Coward
    Interesting article indeed, yet there is one thing I don't quite understand about ILP (instruction-level parallelism): if the number of decoded instructions is higher, then - the CPU being superscalar - the probability of having all pipelines working grows, which means that ILP also goes up. Of course, the ILP depends on the compiler quality and the program code itself, but having a good parallelism capacity in the CPU is also a key factor.

  • ...and why is he taking UltraSparcs apart anyway?
  • One detail that they didn't mention was the integrated AGP and DDR memory controller on Efficeon. Blades don't use graphics, so I'm thinking that Efficeon was designed primarily for Japanese laptops.

    Efficeon allows for a low chip count design. That could mean a smaller and more reliable laptop design.
  • by crgrace ( 220738 ) on Thursday November 20, 2003 @04:43PM (#7523576)
    I actually read the article!!!!!

    All my questions were answered so I have nothing to say.
  • by Anonymous Coward
    I don't understand why Transmeta still comes up in conversation. Besides the fact that they hired Linus, what exactly have they done to merit this inclusion alongside IBM, Sun, and Intel? There are plenty of other CPU manufacturers that sell x86 clones now... I think Cyrix was bought by some Taiwanese fab plant company, weren't they?

    Until Transmeta becomes a real contender, let's just keep out of the Linux biases and concentrate on the real contenders.

    My prediction is that if they don't produce a real h
    • Perhaps because (Score:3, Insightful)

      by TCaM ( 308943 )
      unlike the other x86 knockoff manufacturers, they have actually attempted something somewhat new and different in their designs. They may not have met with roaring success marketwise, but they certainly did try to attack things from a different angle. The point of the article seems to be comparing the somewhat different approaches the various CPU makers took in their designs, not how many millions of chips they have sold or billions of dollars they have in the bank.
  • by Effugas ( 2378 ) on Thursday November 20, 2003 @06:10PM (#7524213) Homepage
    Since the author of this article is lurking here, I thought I'd ask:

    You make a rather big deal about Transmeta needing to run all x86 code through a "code morpher" (dynamic recompiler, actually), and come up with a decently large set of conclusions based on it.

    What's the big deal? No processor executes raw x86 anymore. Everything translates into an internal microcode that bears little resemblance to the original asm. Of course, normal chips have hardware-accelerated microcode translators, whereas Transmeta must recode in software -- but Transmeta's entire architecture was designed from day one to do that, and conceivably they have more context available to do recoding by involving main memory in the process.
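
    (To make the "everything gets translated" point concrete: conceptually, a decoder cracks a memory-operand x86 instruction into load/ALU/store micro-ops, something like the toy sketch below. The micro-op names and temp register are invented for illustration -- no real decoder, hardware or Transmeta's CMS, looks like this.)

    # Toy illustration of cracking a CISC-style x86 instruction into RISC-like
    # micro-ops. Micro-op names and temp registers are invented for the example;
    # this is not any real chip's internal format.

    def crack(instr):
        """Split one x86-ish instruction into simple micro-ops."""
        op, dst, src = instr
        if dst.startswith("["):                    # memory destination: read-modify-write
            addr = dst.strip("[]")
            return [("LOAD",  "tmp0", addr),       # tmp0 <- mem[addr]
                    (op.upper(), "tmp0", src),     # tmp0 <- tmp0 OP src
                    ("STORE", addr, "tmp0")]       # mem[addr] <- tmp0
        return [(op.upper(), dst, src)]            # register destination: one ALU uop

    # "add [ebx+8], eax" becomes a load, an add, and a store:
    for uop in crack(("add", "[ebx+8]", "eax")):
        print(uop)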

    And what is it with you neglecting the equivalence of main memory? Yes, transistors are necessary to store the translated program. They're also necessary to store the original one -- the Mozilla client I'm presently tapping away inside sure as hell doesn't fit in L1 on my C3! Outside of a small static penalty on load, and a smaller dynamic penalty from ongoing profiling, you can't blame performance on the fact that software needs to be in RAM. Software always needs to be in RAM.

    Don't get me wrong -- Transmeta's a performance dog, and everyone's known that since day one. But I think it's reasonable to say the cause is mostly one of attention -- every man-hour they threw into allowing the system to emulate x86 took away from adding pipelines, increasing clock rates, tweaking caches, etc. In other words, yes, it's a feat that they got the code to work, but you don't need to blame the feat for the quality of the work -- they simply did a lot of work nobody else had to waste time on, and fell behind because of it.

    Much easier explanation. Might even be true.

    Yours Truly,

    Dan Kaminsky
    DoxPara Research
    http://www.doxpara.com
    • by kma ( 2898 ) on Thursday November 20, 2003 @10:59PM (#7525662) Homepage Journal
      Ehh. In my opinion, people overestimate how big a deal x86 architecture complexity is, in part because it flatters their preconception that Intel is evil. ("If only dastardly Intel hadn't been holding the world back with this demon architecture from hell, think how fast CPUs could be now!") While working at VMware, I've gotten to know the x86 architecture on a first name basis. He now lets me call him "Archie."

      While Archie is undoubtedly an ugly, drunk screw-up, he's really a droplet in the ocean of effort that goes into a competitive CPU implementation. Yeah, we've got lots of code to deal with him, and he's an ongoing source of work, but not all that much code, nor that much work. If Archie were really such a terrible guy, it wouldn't be possible for Intel and AMD to be eating so many RISC vendors' lunches.

      Mike Johnson, the lead x86 designer at AMD, probably put it most succinctly when he said, "The x86 isn't all that complex -- it just doesn't make a lot of sense." It's peculiar all right, but not so peculiar that it can explain Transmeta's failure to be performance competitive. From speaking with Transmetans, I get the strong impression that they got bogged down because making a high performance dynamic translation system is ridiculously hard, rather than, say, because they just couldn't get the growdown segment descriptors right.
  • In fact, you could tell the story of the past 15 years of computer evolution -- from the rise of the PC to the rise of the Internet -- in terms of the effects of the amount of time it takes various components -- from a processor all the way out to a networked computer -- to load data.

    I like this assessment. Forget about Moore's Law as a measure of our progress; latency and throughput are far more important than processing power.

    Computers used to be for processing information; these days, most people use
  • Seriously. Every other day there's an Ars Technica this, an Ars Technica that. Let's make an icon for it.
  • Wouldn't it just be better if we had computers with lots of tiny CPU cores, instead of such big mammoths like the POWER5 or the UltraSparc V? For example, wouldn't an array of 256 32-bit CPUs make life simpler and more efficient at the hardware level as well as at the software level?

    By the way, I would like to have a computer that has SRAM only and a bandwidth of 100 GB/sec... Is that possible with current technology?
