AMD Hardware Technology

AMD's Piledriver To Hit 4GHz+ With Resonant Clock Mesh

Posted by Unknown Lamer
from the inventing-a-better-sliderule dept.
MojoKid writes about some interesting news from AMD. From the article: "Advanced Micro Devices plans to use resonant clock mesh (PDF) technology developed by Cyclos Semiconductor to push its Piledriver processor architecture to 4GHz and beyond, the company announced at the International Solid State Circuits Conference (ISSCC) in San Francisco. Cyclos is the only supplier of resonant clock mesh IP, which AMD has licensed and implemented into its x86 Piledriver core for Opteron server processors and Accelerated Processing Units. Resonant clock mesh technology will not only lead to higher clocked processors, but also significant power savings. According to Cyclos, the new technology is capable of reducing power consumption by 10 percent or bumping up clockspeeds by 10 percent without altering the TDP." Unfortunately, aside from a fuzzy whitepaper, actual technical details are all behind IEEE and other paywalls with useless abstracts.

  • Re:but really (Score:4, Interesting)

    by arbiter1 (1204146) on Monday February 27, 2012 @08:36PM (#39180295)
    The Bulldozer and the i7-2600K were about the same performance-wise, but that is an 8-core CPU vs. 4 cores + HT. Power usage of both machines at the wall was about 250 watts under load. When you overclocked both, the Bulldozer to 4.8GHz and the i7 to 5GHz, the i7 used 80 more watts, while the Bulldozer doubled its draw to over 500 watts; I think it was 550 watts.
  • Re:vaporware (Score:2, Interesting)

    by networkBoy (774728) on Monday February 27, 2012 @08:36PM (#39180301) Homepage Journal

    Is AMD really doing that badly?
    Seriously I am out of the loop from an AMD perspective*, but I assumed they were still rocking the cost/performance on the low end of the CPU ranges, and was hoping this would allow them to push into the mid-range i5 territory.
    -nB

    *all I work on at work & at home is Intel stuff, so I don't have any relevant AMD info.

  • Re:vaporware (Score:2, Interesting)

    by Anonymous Coward on Monday February 27, 2012 @09:09PM (#39180643)

    80% of Intel performance at 12% of the cost.

  • Re:vaporware (Score:5, Interesting)

    by tyrione (134248) on Monday February 27, 2012 @09:24PM (#39180807) Homepage
    You must not work in parallel programming or do any heavy engineering analysis/modeling. Taking advantage of all those threads and cores within Bulldozer and utilizing it with OpenCL along with the GPGPUs is a dream come true. More and more modeling environments are leveraging all that this architecture offers, but to you, if your game doesn't presently use it, it's worthless. To each their own.
  • by TheSync (5291) on Monday February 27, 2012 @09:47PM (#39181023) Journal

    How can the mesh be resonant to a square wave (with lots of high frequency harmonics over a huge band)?

    I can imagine it being resonant to a single frequency sine wave.

    But if the clock mesh is powered by a sine wave, you have to turn it back into a square wave to drive gates, and to do that you have to compare the clock voltage level with some known voltage levels, and there you may have process inaccuracies.

  • Re:vaporware (Score:5, Interesting)

    by Anthony Mouse (1927662) on Monday February 27, 2012 @10:34PM (#39181341)

    Now, here's the puzzling part: they want to use bulldozer, the failure, as the new core for the A series, the success. I hope they find a way to fix it, otherwise my next rig will have an Intel for the first time in ten years.

    I think the people calling bulldozer a failure have the wrong expectations. The core used in the existing A series is a direct descendant of the original Athlon from 1999, which itself was very similar to (and designed by the same people as) the DEC Alpha introduced in 1992, predating even the Pentium Pro. Suffice it to say that there isn't a lot of optimizing left to be done on the design.

    Bulldozer is a clean slate. The current implementation has some obvious shortcomings, not least of which is that the cache architecture is lame. (The L1 is too small and the L2 latency is too high. They might actually do pretty well to make a smaller, lower latency, non-exclusive L2 and use the extra transistors for a bigger L3 or even an L4.) But that's not a bad thing. It's something they can fix and make future generations faster than the current generation. Which is the problem with the old K10 -- there are no easy little changes left to be made to make it substantially faster than it is now.

    The other part of the problem is that people want Bulldozer to be something it's not. It isn't designed for first in class single thread performance. It's designed to have adequate single thread performance while reducing the number of transistors per core so that you can have a lot of cores. It's designed for the server market, in other words. And to a lesser extent the workstation market. They designed something that would let them compete in the space that has the highest margins. So now all the high-end gamers who only care about single thread performance are howling at the moon because AMD concluded it couldn't compete with Intel in that sector and stopped trying.

    What you have to realize is that it isn't that the design is flawed. It's that you aren't the target market. They could have built something that achieved 90-100% of Intel's best on single threads instead of 60-80% by doubling the number of transistors per thread and halving the number of threads and cores, but think about who would buy that. PC enthusiasts who comprise about 0% of the market. It wouldn't sell in the server market because the performance per core * number of cores would be lower. It wouldn't sell in the budget market because it would require too many transistors per thread and therefore cost too much to manufacture.

    Instead, with Bulldozer they can use more modules and sell to the server market or anyone else with threaded software, and then use fewer modules in combination with a GPU and sell to the budget market and the midrange gaming market, and leave the six dozen howling high-end PC gamers to Intel.
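
    The trade-off argued above can be put into rough numbers. This is only a sketch: it takes the midpoints of the 60-80% and 90-100% ranges from the comment, assumes 8 threads versus 4, and assumes the workload scales perfectly across threads, which real software rarely does. The function name is made up for illustration.

```python
def aggregate_throughput(per_thread_perf: float, threads: int) -> float:
    """Total throughput for a perfectly parallel workload.

    per_thread_perf is relative to Intel's best single-thread result (1.0).
    Perfect scaling across threads is an assumption, not a measurement.
    """
    return per_thread_perf * threads


if __name__ == "__main__":
    # Midpoints of the ranges quoted in the comment; thread counts are illustrative.
    many_small_cores = aggregate_throughput(0.70, 8)   # Bulldozer-style design
    few_big_cores = aggregate_throughput(0.95, 4)      # hypothetical "fat core" design
    print(f"many small cores: {many_small_cores:.2f}")
    print(f"few big cores:    {few_big_cores:.2f}")
```

    Under those assumptions the many-core design wins on threaded work and loses on single-thread work, which is the server-versus-gamer split the comment describes.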

  • by subreality (157447) on Monday February 27, 2012 @11:15PM (#39181665)

    How can the mesh be resonant to a square wave (with lots of high frequency harmonics over a huge band)?

    There's no such thing as a square wave at 4GHz. You can draw them like that on paper, but in reality the edges smear into a pretty good approximation of a sine wave.

    Regardless, it will still have some higher frequency components, but you don't have to worry about them. The resonance won't help generate nice sharp edges, but that's the line driver's job. The resonance is just to save energy by helping pump the voltage at the fundamental frequency.

    (Disclaimer, not an EE, but I've looked over their shoulders a bunch of times)
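
    For a sense of scale on the harmonics question, here is a small sketch using the textbook Fourier series of an ideal, 50% duty-cycle square wave swinging between -1 and +1. It says nothing about the actual on-chip waveform, which is assumed away here; the function names are made up for illustration.

```python
import math


def square_wave_harmonic_amplitude(n: int) -> float:
    """Amplitude of the n-th harmonic of an ideal +/-1 square wave."""
    if n % 2 == 0:
        return 0.0  # an ideal 50% duty-cycle square wave has no even harmonics
    return 4.0 / (math.pi * n)


def power_fraction(n: int) -> float:
    """Fraction of the total signal power carried by the n-th harmonic.

    A sine of amplitude a has mean-square power a^2 / 2, and the +/-1
    square wave itself has mean-square power 1.
    """
    a = square_wave_harmonic_amplitude(n)
    return a * a / 2.0


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(f"harmonic {n}: {power_fraction(n):.4f} of total power")
```

    Even for the ideal square wave, the fundamental carries 8/pi^2 of the power, a bit over 80%, so a mesh that resonates only at the fundamental is already handling most of the energy.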

  • Re:Not really (Score:5, Interesting)

    by Cajun Hell (725246) on Tuesday February 28, 2012 @12:06AM (#39181947) Homepage Journal

    It's even more ridiculous than that. My motherboard automatically overclocked my 2500K to 4.3GHz.

    It's even more ridiculous than that. Tom's boys say that by overclocking it a little, you might even make it more efficient [tomshardware.com].

  • by Ungrounded Lightning (62228) on Tuesday February 28, 2012 @01:41AM (#39182361) Journal

    Agreed [that it looks like vaporware]. It's a breathlessly ebullient press release sales pitch.

    Agreed it's a sales pitch. But not vaporware at all. Very neat solution. (I saw another with similar properties a couple years ago but this one is 'way better.)

    The issue is the power consumption of the clocking of the chip. Modern designs are primarily layers of D-type flip-flop registers separated by small amounts of random logic and all the flip flops are clocked simultaneously, all the time. The clock signal is input to ALL the flipflops and a bit of the random logic. I'm guessing somewhere between one in five and one in ten gate inputs are driven about equally by CLK or ~CLK. Further, the other signals flip between one and zero once, sometimes, on each cycle. ALL the CLK signals flip from zero to one and back to zero EVERY cycle. So there's a lot of activity on the clock.

    In CMOS the load on the clock is primarily capacitive - the stray capacitance of the CMOS gates and wiring - plus some losses, mainly due to the resistance of the wiring. The stray capacitance has to be charged and discharged every cycle. The charge represents energy. In a conventional design the clock drivers are essentially the same thing as logic gates (inverters). New energy is supplied from the power supply (and about half of it, excluding signal-line resistive losses, dumped as heat in the pullup transistors of the drivers) every cycle as the lines are charged. Then the charge is dumped to ground (and the rest of the energy dumped as heat in the pulldown transistors). All that energy gets lost as heat every cycle, and it represents about 30% of the power consumed by the chip. It would be nice to scavenge it and reuse most of it for the next tick.
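
    The charge-and-dump accounting above is the standard C * V^2 * f estimate for switching power. A minimal sketch follows; the capacitance, voltage and frequency are made-up illustrative values, not figures for any AMD part, and wiring resistance is ignored.

```python
def energy_split_per_cycle(capacitance_farads: float, supply_volts: float):
    """Return (pullup_loss, pulldown_loss) in joules for one clock cycle.

    Charging draws C*V^2 from the supply: half is burned in the pullup
    transistors and half ends up stored on the capacitance. Discharging
    then burns the stored half in the pulldown transistors.
    """
    stored = 0.5 * capacitance_farads * supply_volts ** 2
    return stored, stored


def clock_power_watts(capacitance_farads: float, supply_volts: float,
                      frequency_hz: float) -> float:
    """Power of a conventionally driven clock network: C * V^2 * f."""
    pullup, pulldown = energy_split_per_cycle(capacitance_farads, supply_volts)
    return (pullup + pulldown) * frequency_hz


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the arithmetic.
    c_clock = 5e-9   # 5 nF of total clock load (assumption)
    vdd = 1.0        # 1.0 V supply (assumption)
    f_clk = 4e9      # 4 GHz
    print(f"clock power: {clock_power_watts(c_clock, vdd, f_clk):.1f} W")
```

    With those assumed values the clock alone burns roughly 20 W, all of it as heat, which is why recycling that energy is attractive.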

    A previous invention used a half-wave transmission line looped around the chip and connected plus-to-minus. A big mobius strip. The CLK and ~CLK loads acted as distributed capacitance around the transmission line. A clock waveform circulated continuously, twice per cycle. Instead of a sea of drivers providing new energy and then throwing it away every cycle, the transmission ring had a few drivers distributed around it, keeping the wave circulating and correctly formed, and pumping in enough energy to replace the resistive losses while the bulk of the energy went round-and-round. Result: Most of the clock power requirements and heating load go away.

    Unfortunately, the circulating clock wave meant the region completing a computation ALSO went round-and-round, rather than everything switching at the same time. Stock design tools assume CLK/~CLK is simultaneous (except for minor variations) across the whole chip. So using that earlier system would require a major rewrite on the stock tools and new design methodologies.

    THIS system does a similar hack energetically, but with everything in sync. Instead of a sea of drivers driven by a carefully-balanced tree of pre-drivers, the CLK and ~CLK are constructed as a pair of heavy-conductor meshes - like two stacked layers of flattened-out window screens. These form two plates of a capacitor. These plates are connected by an inductor, forming a resonant "tank circuit". When this is "pumped up" by a few drivers and is "ringing", energy alternates between being an electric field between the screens and a magnetic field in the inductor coil, twice (once for each polarity) each cycle. Again the bulk of the energy is reused over and over while the drivers only have to replace the (mostly) resistive losses (and pump it up initially, over a number of cycles). Again the bulk of the clock power and heating is gone. But this time the whole chip is switching essentially simultaneously, so the stock design tools just work.
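
    The tank circuit described above can be sized, to first order, with the ideal LC resonance formula f = 1 / (2 * pi * sqrt(L * C)). The sketch below treats the mesh as a single lumped capacitor and ignores resistance entirely; the 1 nF mesh capacitance is a made-up value, and nothing here comes from the Cyclos design itself.

```python
import math


def resonant_frequency_hz(inductance_henries: float,
                          capacitance_farads: float) -> float:
    """Resonant frequency of an ideal (lossless, lumped) LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_henries * capacitance_farads))


def inductance_for_frequency(frequency_hz: float,
                             capacitance_farads: float) -> float:
    """Inductance needed to make a given capacitance resonate at frequency_hz."""
    omega = 2.0 * math.pi * frequency_hz
    return 1.0 / (omega ** 2 * capacitance_farads)


if __name__ == "__main__":
    c_mesh = 1e-9    # hypothetical mesh capacitance, 1 nF
    target = 4e9     # 4 GHz clock
    l_needed = inductance_for_frequency(target, c_mesh)
    print(f"inductance needed: {l_needed * 1e12:.2f} pH")
    print(f"check: resonates at {resonant_frequency_hz(l_needed, c_mesh) / 1e9:.3f} GHz")
```

    Because the resonance is fixed by L and C, moving the clock far from that frequency loses the energy savings, which matches the downside noted in the next paragraph.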

    Neat!

    Downside (of both inventions): You can't quickly start and stop the clock in a given area or run it more than a few percent off the speed set by the resonance of the tank circuit or transmission line. No overclocking. Also no clock gating to save power on quiesc

  • Re:vaporware (Score:4, Interesting)

    by TheLink (130905) on Tuesday February 28, 2012 @03:35AM (#39182767) Journal

    This might be enlightening: http://hardforum.com/showpost.php?p=1037482638&postcount=88 [hardforum.com]

    What did happen is that management decided there SHOULD BE such cross-engineering, which meant we had to stop hand-crafting our CPU designs and switch to an SoC design style. This results in giving up a lot of performance, chip area, and efficiency. The reason DEC Alphas were always much faster than anything else is they designed each transistor by hand. Intel and AMD had always done so at least for the critical parts of the chip. That changed before I left - they started to rely on synthesis tools, automatic place and route tools, etc. I had been in charge of our design flow in the years before I left, and I had tested these tools by asking the companies who sold them to design blocks (adders, multipliers, etc.) using their tools. I let them take as long as they wanted. They always came back to me with designs that were 20% bigger, and 20% slower than our hand-crafted designs, and which suffered from electromigration and other problems.

    That is now how AMD designs chips. I'm sure it will turn out well for them [/sarcasm]

    And that comment was back in 2010. No surprise now Bulldozer is slower and uses more power, and the only advantage is it has more cores (meh, any idiot can add more cores, at worst case you just add another computer[1]).

    [1] The same embarrassingly parallel tasks that do well on multiple cores will do well on multiple computers.
