Intel Hardware

Intel Launches Power-Efficient Penryn Processors 172

Bergkamp10 writes "Over the weekend Intel launched its long-awaited new 'Penryn' line of power-efficient microprocessors, designed to deliver better graphics and application performance as well as virtualization capabilities. The processors are the first to use high-k metal-gate transistors, which make them faster and less leaky than earlier processors with silicon gates. The processors are lead-free, and by next year Intel plans to produce chips that are halogen-free as well, making them more environmentally friendly. Penryn processors jump to higher clock rates and feature cache and design improvements that boost performance over earlier 65-nm processors, which should attract business workstation users and gamers looking for improved system and media performance."

  • Still sticking (Score:2, Interesting)

    by guruevi ( 827432 ) on Monday November 12, 2007 @11:56AM (#21323947)
    It's sad that the industry is still sticking to the x86 instruction set. It should have been replaced a long time ago with a pure RISC instruction set, especially now with the quest for less power-hungry chips. The Power/PowerPC architecture was good, but because there wasn't enough demand, the price was high and development slow. A few failures (compare NetBurst) and their customers (among them Apple) went running to the competitors.

    We're still running PowerPC here because the chips are low-power and do certain mathematics very well (I'm not the science guy). Hopefully Apple will switch back to PowerPC now that their software is fully "Universal" and IBM has some promising chips lined up.
  • Re:RISC vs. CISC (Score:4, Interesting)

    by TheRaven64 ( 641858 ) on Monday November 12, 2007 @01:00PM (#21324845) Journal

    The transistor budgets are so high that the space taken by instruction decoders isn't an issue anymore (L1, L2, and sometimes even L3 caches are on-chip).
    Transistor space, no. Debugging time? Hell yes. Whenever I talk to people who design x86 chips, their main complaint is that the complex side effects an x86 chip must implement (or people complain that their legacy code breaks) make debugging a nightmare.

    Execution is out-of-order, and the pipeline stalls are greatly reduced. The out-of-order execution engine runs a RISC-like instruction set to begin with (micro-ops or r-ops).
    Most non-x86 architectures are moving back to in-order execution. Compilers are good enough that they place instructions far enough apart to avoid dependencies (much easier to do when you have lots of registers), and the die-space savings from using an in-order core allow them to put more cores on each chip.

    There is one dominant platform (Wintel) and software costs dominate (compatibility is essential).
    Emulation has come a long way in the last few years. With dynamic recompilation you can get code running very fast (see Rosetta, the emulator Apple licensed from a startup in Manchester). More importantly, a lot of CPU-limited software is now open source and can be recompiled for a new architecture.

    x86-64 added 8 more general purpose registers, and the situation is much better (that's why most people see a 10-20% speedup when migrating to x86-64 - more registers)
    Unfortunately, you can only use the 16 GPRs (and, finally, they are more or less real GPRs) when you are in 64-bit mode. That means every pointer has to be 64-bit, which causes a performance hit. Most 64-bit workstations spend a lot of their time in 32-bit mode, because the lower memory usage (capacity and bandwidth) and reduced cache churn give a performance boost; they only run programs that need more than 4GB of address space in 64-bit mode. Embedded chips like ARM often do the same thing with 32/16-bit modes. If x86-64 let you have the extra registers with the smaller pointers, you would probably see another performance gain.
  • by Z-MaxX ( 712880 ) on Monday November 12, 2007 @01:28PM (#21325219) Journal
    An often overlooked benefit of the way that modern IA32 processors achieve high performance through translating the CISC x86 instructions into microcode instructions is that the chip designers are free to change the internal microcode architecture for every CPU in order to implement new optimizations or to tune the microcode language for the particular chip's strengths. If we were all coding (or if our compilers were coding for us) in this RISCy microcode, then we, or the compiler, would have to do the optimizations that the CPU can do in its translation to microcode. I agree that the Power architecture is pretty cool, but I'm tired of hearing people bash the Intel x86 architecture for its "obsolete" nature. As long as it is the fastest and best thing I can buy for a reasonable amount of money, it's my top choice.
  • by enc0der ( 907267 ) on Monday November 12, 2007 @01:31PM (#21325241) Homepage
    Smaller size means faster, but at the expense of more power. As a chip designer I can tell you that the smaller you go, the more leakage you have to deal with in the gates, and it goes up FAST. Now, with the new Intel chips, they are employing some new techniques to limit the leakiness of the gates; these techniques are not standard across the industry, so it will be interesting to see how they hold up.

    I do not understand what you mean by signal-fixing/synchronization hardware. Design-specific signal synchronization doesn't change across gate sizes. What changes is the techniques people use as they find better ways to do these things; those are not technology-specific and tend to find their way back into older technologies to improve performance there as well.

    In addition, cost is NOT always cheaper, because die yield is generally MUCH LESS at newer technologies, for those on the bleeding edge. Development costs also go up, because design-specific limitations, process variance, and physical limitations make designs MUCH HARDER to physically implement than at larger sizes. Things like electromigration, leakage power, ESD, OPC, DRC, and foundry design rules are MUCH worse.

    What is true is that these people want faster chips, and you can get that, as I said, although the speed differences are not that amazing. Personally, I don't think the cost justifies the improvement in what I have worked on, especially on power. A few years from now, as they solve these problems at these specific gate geometries, THEN we will start to see the benefits of the size overall.
  • by dreamchaser ( 49529 ) on Monday November 12, 2007 @01:53PM (#21325505) Homepage Journal
    You should probably check the prices again with an eye towards price/performance ratios. AMD hasn't been cheaper for a long time. You can save a few bucks by settling for lower performance, but not enough to upgrade that video card or any other significant components.
  • Re:RISC vs. CISC (Score:5, Interesting)

    by vlad_petric ( 94134 ) on Monday November 12, 2007 @02:10PM (#21325689) Homepage
    High-performance computing isn't moving away from out-of-order execution any time soon. Itanic was a failure. The current generation of consoles is in-order, indeed, but keep in mind that they serve a workload niche (a rather large niche in terms of deployment, sure, but still a workload niche).

    The argument that the compiler can do a reasonable job at scheduling instructions ... well, it's simply false. Reason #1: most applications have rather small basic blocks (SPEC 2000 integer, for instance, has basic blocks in the 6-10 instruction range). You can do slightly better with hyperblocks, but for that you need rather heavy profiling to figure out which paths are frequently taken. Reason #2: the compiler operates on static instructions; the dynamic scheduler operates on the dynamic stream. The compiler can't differentiate between instances of an instruction that hit in the cache (with a latency of 3-4 cycles) and those that miss all the way to memory (200+ cycles); the dynamic scheduler can. Why do you think Itanium has such large caches? Because it doesn't have out-of-order execution, it is slowed down by cache misses to a much larger extent than out-of-order processors are.

    I agree that there are always ways to statically improve the code to behave better on in-order machines (hoist loads and make them speculative, add prefetches, etc), but for the vast majority of applications none are as robust as out-of-order execution.

  • Come Full Circle (Score:2, Interesting)

    by IorDMUX ( 870522 ) <<moc.liamg> <ta> <3namremmiz.kram>> on Monday November 12, 2007 @02:46PM (#21326195) Homepage
    Once upon a time (the 1970s), everybody used metal for their FET gates. Those aluminum gates are where we got the names MOSFET (Metal-Oxide-Semiconductor Field-Effect Transistor) and CMOS (Complementary MOS). In the 1980s, pretty much every fab gave up metal gates for the polysilicon that has been used since, amidst various enhancements in polysilicon deposition technology, self-aligned gates, etc.

    Now, the trend seems to be to return to the metal gates of yesteryear and ditch the oxide (the 'O' in MOSFET) for high-k dielectrics (not high-k metals, as the summary seems to say)...

    That's all well and good, but I have one question... when will we get around to updating the term "CMOS"?
  • Re:Still sticking (Score:1, Interesting)

    by Anonymous Coward on Monday November 12, 2007 @02:53PM (#21326271)
    The instruction set the programmer "sees" is not the instruction set that the chip actually runs.

    Huh. That's a strange definition of "replaced" you've got.

    This is like having ATMs that only gave out dimes, complaining about the dimes, and being told "no, we do all transactions in units of $10; the dimes you 'see' are not the same monies that we actually transfer".

    As a user, I don't care what the processor does internally -- it could use black magic for all I care. I've written PPC compilers before, but I can't wrap my brain around x86. Could this be why so few new (non-byte)compiled languages exist -- because nobody can figure out how to write a code-emitter for the monstrosities that pass as recent CPUs?
  • Names of Rivers? (Score:4, Interesting)

    by spineboy ( 22918 ) on Monday November 12, 2007 @03:02PM (#21326415) Journal
    I'm just wondering which will end first - Moore's Law, or the number of river names left in Washington. For those of you who don't know, all of Intel's chip names are named after rivers in Washington state.
  • by emil ( 695 ) on Monday November 12, 2007 @04:03PM (#21327263)
    • While POWER5 was out-of-order, POWER6 is now in-order. That's how they plan to hit 5 GHz.
    • While you've added 8 more registers, you've also doubled the size of pointers (and thus doubled the memory bandwidth required for them). We've seen several cases where applications compiled for 32-bit SPARC are faster than the 64-bit build on the same platform - so I'd benchmark an application in 32-bit mode before taking the 64-bit version.
