Intel Hardware

Intel Details Upcoming Gulftown Six-Core Processor

Posted by samzenpus
from the give-me-the-numbers dept.
MojoKid writes "With the International Solid-State Circuits Conference less than a week away, Intel has released additional details on its upcoming hexa-core desktop CPU, next gen mobile, and dual-core Westmere processors. Much of the dual-core data was revealed last month when Intel unveiled their Clarkdale architecture. However, when Intel set its internal goals for what it's calling Westmere 6C, the company aimed to boost both core and cache count by 50 percent without increasing the processor's thermal envelope. Westmere 6C (codename Gulftown) is a native six-core chip. Intel has crammed 1.17 billion transistors into a die that's approximately 240mm sq. The new chip carries 12MB of L3 (up from Nehalem's 8MB) and a TDP of 130W at 3.33GHz. In addition, Intel has built in AES encryption instruction decode support as well as a number of improvements to Gulftown's power consumption, especially in idle sleep states."

  • by TheStonepedo (885845) on Thursday February 04, 2010 @08:14AM (#31021210) Homepage Journal

    Perhaps a jump in number of cores will convince people outside the Apple and FreeBSD camps to port Grand Central Dispatch.
    Letting the kernel team handle the hairier parts of multi-threaded design should make it easy for barely-optimized software to use powerful hardware.
    Could its Apache license work with the #1 OS family?

    • by TheRaven64 (641858) on Thursday February 04, 2010 @08:33AM (#31021366) Journal

      Porting libdispatch requires a generic event delivery framework, where the userspace process can wait for a variety of different types of event (signals, I/O, timers). On Darwin, Apple used the kqueue() mechanism that was ported from FreeBSD, so it's quite easy to port the code to FreeBSD (just #ifdef the bits that deal with Mach messages appearing on the queue). Kqueue is also ported to NetBSD and OpenBSD, so porting it to these systems will be easy too.

      Solaris and Windows both have completion ports, which provide the same functionality but with different interfaces. Porting to Solaris would require replacing the kqueue stuff with completion port stuff. Porting to Windows would ideally also require replacing the pthread stuff with win32 thread calls. Even Symbian has a nice event delivery framework that could be used, although I'm not sure what the pthread implementation is like in the Symbian POSIX layer.

      Linux is the odd system out. All different types of kernel events are delivered to userspace via different mechanisms, so it's really hairy trying to block waiting until the next kernel event. This also makes it harder to write low-power Linux apps, because your app can't spend so long sleeping and so the kernel can't spend so much time with the CPU in standby mode.

      If you don't need the event input stuff (which, to be honest, you do; it's really nice), you can use toydispatch, which is a reimplementation that I wrote of the core workqueue model using just portable pthread stuff.

      It also adds some pthread extensions for determining the optimal number of threads per workqueue (or workqueues per thread, depending on the number of cores and the load), but these are not required. The FreeBSD 8.0 port doesn't have them; they were added with FreeBSD 8.1.
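
A rough sketch of the core workqueue model described above: callables are queued and run in FIFO order on a background thread. This is not toydispatch's or libdispatch's actual API; the names `WorkQueue`, `dispatch_async`, and `shutdown` are made up for illustration, and one worker thread per queue is an assumption of the sketch.

```python
import queue
import threading


class WorkQueue:
    """Serial work queue: one worker thread runs submitted callables in FIFO order."""

    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self):
        self._items = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Block until work arrives, run it, repeat until the sentinel shows up.
        while True:
            item = self._items.get()
            if item is self._STOP:
                break
            func, args = item
            func(*args)

    def dispatch_async(self, func, *args):
        """Enqueue func(*args) and return immediately."""
        self._items.put((func, args))

    def shutdown(self):
        """Let everything already queued run, then stop the worker."""
        self._items.put(self._STOP)
        self._worker.join()


results = []
wq = WorkQueue()
for i in range(5):
    wq.dispatch_async(results.append, i * i)
wq.shutdown()
print(results)
```

Because a single thread drains the queue, the work items never run concurrently with each other, so the shared `results` list needs no extra locking here.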

      • by ronocdh (906309)

        If you don't need the event input stuff (which, to be honest, you do; it's really nice), you can use toydispatch, which is a reimplementation that I wrote of the core workqueue model using just portable pthread stuff.

        Can you provide a link for this? A Google search for "linux toydispatch" yields 3 hits, one of which is your post above:

        Results 1 - 3 of 3 for linux toydispatch. (0.31 seconds)

        • Re: (Score:3, Informative)

          by TheRaven64 (641858)

          Subversion repository [gna.org]. Note that it's designed specifically to do stuff in the background for libobjc2. It only implements a tiny subset of the libdispatch functionality, and not as efficiently (one thread per workqueue, for example). It's not intended to replace libdispatch, just to let me use some of the libdispatch APIs in code that has to be portable. The 'toy' in the name is not self-deprecation, it's an accurate assessment.

          Oh, and you get better results if you search for 'toydispatch' not 'lin

  • 1.17 billion transistors into a die that's approximately 240mm sq

    That's a big chip.

    • Re:240mm square? (Score:4, Insightful)

      by goldaryn (834427) on Thursday February 04, 2010 @08:21AM (#31021270) Homepage

      1.17 billion transistors into a die that's approximately 240mm sq

      That's a big chip.

      240 mm sq, that's 15.49mm x 15.49mm

      • 240 mm sq, that's 15.49mm x 15.49mm

        But not nearly as amusing. ;-)

      • Re: (Score:3, Informative)

        by IBBoard (1128019)

        Isn't it 240mm sq = 240mm x 240mm (as in (240mm) squared) and 240 sq mm is 240 x 1mm x 1mm (as in 240 x (square mms))? It's always an awkward one to represent and be clear on.

        • that'd be how I read it, and I'm pretty sure how the GGP read it, too... 240 mm sq = 240mm squared, which is different from 240 square millimeters.

        • by vadim_t (324782)

          No.

          1sq mm is a square with 1mm sides
          240sq mm is 240 of them.

          The side is sqrt 240 for a square shape.
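
For anyone who wants to check the two readings being argued about, here is the arithmetic as a quick sketch. The variable names are only for illustration.

```python
import math

# Reading 1: "240 square millimeters" is an area.
area_mm2 = 240.0
side_if_area = math.sqrt(area_mm2)   # side of a square die with that area

# Reading 2: "240 mm squared" is a square with 240 mm sides.
side_mm = 240.0
area_if_side = side_mm * side_mm

print(f"{side_if_area:.2f} mm per side")
print(f"{area_if_side:.0f} square mm")
```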

        • by rrohbeck (944847)

          And how much is that in football fields?

      • What? Why?

        In every other case I can think of where you need to denote area it's the opposite.

        240 mm sq is 240 millimeters each side for a total area of 240mm x 240mm, or 57,600 square millimeters.

        If you want to refer directly to the area then the unit descriptor comes AFTER the square designator, like this: 240 square millimeters.

        You wouldn't write " 42 feet square" when you meant "42 square feet" would you?

        Did I miss something somewhere?

        • by radish (98371)

          The correct notation for an area of 240 square millimeters is very hard to type (it's 240mm^2), my guess is the OP just turned "^2" into "sq" (for "squared") leading to the confusion.

  • Talked to our HP rep a few weeks ago about them. As soon as HP ships ProLiant servers with the new CPUs, we're going to buy 4 of them. Just haven't decided if we're going with 36GB RAM or 72GB RAM. 72GB RAM is only $2000 more than 36GB RAM these days.

    • I'm guessing that 36GB and 72GB refer to three DIMMs per channel times 6 channels (three per processor) and 2GB or 4GB modules. IIRC with DDR3 if you put three DIMMs on a channel you are limited to DDR3-800 speeds.
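
The guess above works out as follows. This is only a sketch of the arithmetic; the two-socket layout is the assumption implied by "6 channels (three per processor)".

```python
sockets = 2               # assumption: a two-processor server
channels_per_socket = 3   # "three per processor"
dimms_per_channel = 3     # "three DIMMs per channel"

slots = sockets * channels_per_socket * dimms_per_channel
# Total capacity in GB for each module size mentioned in the comment.
capacities_gb = {module_gb: slots * module_gb for module_gb in (2, 4)}
print(slots, capacities_gb)
```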

  • by TubeSteak (669689) on Thursday February 04, 2010 @08:31AM (#31021356) Journal

    Just so you know, I made this joke almost two years ago:
    http://hardware.slashdot.org/comments.pl?sid=465898&cid=22548916 [slashdot.org]

    They could have gone to 3 cores, like the competition. That seems like the logical thing to do, but they said "Fuck it, we're going to six". What part of this don't you understand? If two cores is good, and four cores is better, obviously six cores would make them the best fucking CPU that ever existed.

    http://www.theonion.com/content/node/33930 [theonion.com]
    /I'm just waiting for the day Intel says "this one goes to 11"

    It's the CPU joke that will never die.

  • by Gr8Apes (679165) on Thursday February 04, 2010 @08:50AM (#31021536)

    So I skimmed TFA (gasp!) and it appears that Intel is finally following AMD's lead by keeping thermal envelopes constant.

    I note that this is still effectively 2 CPUs with 3 cores each, but that's better than legacy Intel approaches, which would have been 3 sets of dual cores.

    It will be interesting to see how independent performance benchmarks play out between the new processors that are coming out.

    • I'm pretty sure it is one die, with communication possible between any cores. It just looks like 2x3 due to the way it is laid out.

  • Obligatory (Score:2, Insightful)

    by Mattskimo (1452429)
    blah blah Beowulf blah blah
  • 1.17 Billion transistors. Anyone remember the 6502, the 6800, and then the 68000? 68K transistors was a LOT in 1980 and made for a fantastic 32bit architecture. Now we're at 17000 times that count. Sometimes you just have to stop for a moment and think just about the numbers.
    • ... 68K transistors was a LOT in 1980 and made for a fantastic 32bit architecture....

      I'm guessing you're still caffeine deprived, and meant 8 bit architecture.

      Anyway, what I want to know is where are the 3.2GHz 6502s and Z80s? You'd think making an existing architecture run like a bat out of hell would be far easier than a new Pentium chip. With less than 1.17 billion transistors, you could put an entire C64 or Apple II on one chip and run all the old software.

      • The Motorola 68000 was/is a 32-bit architecture.

      • Re: (Score:2, Informative)

        by wtfbill (1408123)
        No, I bet his caffeine content is fine. The 68K transistors would refer to the 68000 procs from Motorola which were 16- or 32-bit depending on configuration. Some of them could be switched at boot time by holding one of the pins high or low (I forget which...where are those old data sheets I have on those?) Of course the 65xx series and the 6800 series were 8-bit, however, they didn't have close to 68K transistors. But GP is right on, 68K transistors for a 32-bit architecture.
      • by rrohbeck (944847)

        I have a feeling that Loderunner would be difficult to play at 3.2 GHz.

    • Re: (Score:3, Interesting)

      by sznupi (719324)

      And yet, latest ARM cores are much closer to that 68k transistors from 1980, while not being nearly that far behind i7 in performance as the relation between numbers of transistors would suggest.

      Perhaps ARM found the sweet spot.

  • Instead of churning out cores they should tweak the x86 ISA to use multiple cores efficiently. 1/2-word atomic compare and swap is not enough; you cannot make atomic lockless doubly linked lists with that. No wonder something as interesting as http://valerieaurora.org/synthesis/SynthesisOS/ [valerieaurora.org] is not possible on x86 without major hacks.
    • Re: (Score:3, Informative)

      1/2-word?
      I'm pretty sure that there are instructions for atomic compare and swap of pointer-sized values, at least.

    • Re: (Score:3, Informative)

      by master_p (608214)

      Isn't the xchg instruction atomic for all sizes (8/16/32/64 bits)?
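
For readers unfamiliar with the primitive being argued about in this subthread, here is a sketch of compare-and-swap semantics and the usual retry loop built on it. Python has no hardware CAS, so a lock stands in for the CPU's atomicity guarantee; `AtomicCell` and `atomic_increment` are hypothetical names for illustration, not a real library.

```python
import threading


class AtomicCell:
    """Emulates one machine word that supports compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for the hardware's atomicity

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`; report success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_increment(cell):
    # Classic CAS loop: read, compute, try to publish, retry if someone else won.
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + 1):
            return


def worker(cell, count):
    for _ in range(count):
        atomic_increment(cell)


counter = AtomicCell(0)
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())
```

Each successful swap changes exactly one word. The complaint at the top of this subthread is that a doubly linked list needs two separate pointers to change together, which a single swap like this cannot do in one step.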

  • Why put AES on-board? I thought AES was relatively fast as encryption algorithms go. Plus, it is inevitable that AES will be replaced at some point, so why include something so specific in a chip now? It will suck to have to implement that in the processor in 20 years when nobody uses AES any longer. This is the whole point of a processor - include generic instructions that are useful for implementing any algorithm.

    • Re:on-board AES? (Score:5, Informative)

      by 0123456 (636235) on Thursday February 04, 2010 @10:05AM (#31022394)

      Why put AES on-board?

      They're not: they're putting extra instructions on-board which help implement AES more efficiently. They may also allow you to implement other algorithms more efficiently, though I haven't looked at them in enough detail to be sure.

      I thought AES was relatively fast as encryption algorithms go.

      That still doesn't make it fast at an absolute level. Particularly when you're doing full-disk encryption with user account encryption on top and IPSEC on all your network connections.

      • Re:on-board AES? (Score:4, Informative)

        by wirelessbuzzers (552513) on Thursday February 04, 2010 @03:18PM (#31026270)

        Why put AES on-board?

        They're not: they're putting extra instructions on-board which help implement AES more efficiently. They may also allow you to implement other algorithms more efficiently, though I haven't looked at them in enough detail to be sure.

        The instructions perform a single round of AES (which has 10-14 rounds depending on key size), either encrypting or decrypting. Certain other algorithms such as Lex, Camellia, Fugue and Grostl use AES S-boxes in their core, and can probably benefit from these instructions. However, they will not achieve nearly as much of a speedup as AES.
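
As a small illustration of "10-14 rounds depending on key size": the AES specification fixes the round count from the key length, which can be written as key words plus six. The function name `aes_rounds` is just for this sketch.

```python
def aes_rounds(key_bits: int) -> int:
    """Number of AES rounds for a given key size (key length in 32-bit words, plus 6)."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192, or 256 bits")
    return key_bits // 32 + 6


rounds = {bits: aes_rounds(bits) for bits in (128, 192, 256)}
print(rounds)
```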

        The AES instructions themselves will approximately double the speed of sequential AES computations. This is very unimpressive; VIA's AES instructions are much faster. They will also make it resistant to cache-timing attacks without losing speed, which is unimpressive because you can already do this on Penryn and Nehalem. The low speed results from the AES instructions having latency 6; if you can use a parallel mode (GCM, OCB, PMAC, or CBC-decrypt, for example) then the performance should be 10-12x the fastest current libraries. Hopefully, this will cause people to stop using CBC mode, but perhaps I'm too optimistic.
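
A toy model of why latency 6 hurts sequential modes but not parallel ones. It assumes the core can issue at most one AES round instruction per cycle (an assumption of this sketch, not a figure from the comment) and ignores key setup, the initial key XOR, and the mode's own work, so only the ratio is meaningful.

```python
def cycles(num_blocks, rounds, latency, chained):
    """Cycles until every block has finished all of its AES rounds."""
    if chained:
        # Sequential mode: each block cannot start until the previous one is
        # done, so every round instruction waits out the full latency.
        return num_blocks * rounds * latency

    # Parallel mode: blocks are independent, so their rounds are interleaved.
    ready = [0] * num_blocks  # earliest cycle each block's next round may issue
    clock = 0                 # next free issue slot
    for _ in range(rounds):
        for b in range(num_blocks):
            issue = max(clock, ready[b])
            ready[b] = issue + latency
            clock = issue + 1
    return max(ready)


sequential = cycles(num_blocks=6, rounds=10, latency=6, chained=True)
parallel = cycles(num_blocks=6, rounds=10, latency=6, chained=False)
print(sequential, parallel, round(sequential / parallel, 1))
```

With six independent blocks in flight the latency is hidden almost entirely, which is in the same ballpark as the gap between "approximately double" and "10-12x" quoted above.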

        Intel also added an instruction called PCLMULQDQ which does polynomial multiplication over F_2. If it's fast (I can't find timing numbers, but hopefully it's something like latency 2 and throughput 1) then it will be very useful for cryptography in general, speeding up certain operations by an order of magnitude or more. This is more exciting to me than the AES stuff, because it might enable faster, simpler elliptic-curve crypto and similarly simpler message authentication codes. Unfortunately, these operations are still slow on other processors, so cryptographers will be hesitant to use them until similar instructions become standard. If the guy you're communicating with has to do 10x the work so that you can do half the work... well, I guess it's still a win if you're the server.
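
To make "polynomial multiplication over F_2" concrete, here is a plain software sketch of carry-less multiplication. It shows the math only; the real instruction multiplies two 64-bit operands into a 128-bit result in hardware, and the function name `clmul` is made up for this example.

```python
def clmul(a: int, b: int) -> int:
    """Multiply a and b as polynomials with coefficients in F_2 (bit i = x^i)."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            # Adding polynomials over F_2 is XOR, so no carries propagate.
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


# (x + 1) * (x + 1) = x^2 + 1 over F_2, because the middle term 2x vanishes.
square = clmul(0b11, 0b11)
print(bin(square))
```

Compare with ordinary integer multiplication, where 3 * 3 is 9 (0b1001): the only difference is that the partial products are combined with XOR instead of addition.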

        I thought AES was relatively fast as encryption algorithms go.

        That still doesn't make it fast at an absolute level. Particularly when you're doing full-disk encryption with user account encryption on top and IPSEC on all your network connections.

        AES is fast for a block cipher, but modern stream ciphers such as Salsa20/12, Rabbit, HC and SOSEMANUK are about 3-4x faster. (In other words, they are still faster than AES in a sequential mode on Westmere.) AES is still competitive, though, if you can use OCB mode to encrypt and integrity-protect the data at the same time.

        The fastest previous Intel processor with cutting-edge libraries in the most favorable mode could probably encrypt or decrypt 500MB/s/core at 3-3.5GHz. This is fast enough for most purposes, but in real life with ordinary libraries you'd probably get a third of that. So this will significantly improve disk and network encryption if they use a favorable cipher mode.

        Cred: I am a cryptographer, and I wrote what is currently the fastest sequential AES library for Penryn and Nehalem processors. But the calculations above are back-of-the-envelope, so don't depend on them.

    • Yeah, AES is relatively fast with the keyword being relatively. Those of us who like to use disk encryption applaud this move since it would greatly reduce the need for separate and expensive crypto hardware.

  • by CoffeeDregs (539143)

    >Westmere 6C (codename Gulftown)

        Really? I fricking hate codenamed codenames...

  • This comes the same month as the release of 16-core processors by IBM and Oracle, and a 12-core from AMD. This isn't that impressive.
    • Re: (Score:2, Informative)

      by Microlith (54737)

      Sure, but neither the Oracle nor the IBM chips will be available for less than several grand, and never in consumer level equipment (I can't exactly order one off Newegg.) And there's no telling how long it will be until the AMD chip trickles down from Opteron class to Phenom class, while it will probably be short order for the Core i9 to appear in stores.

      I suspect that AMD will drop the 6-core version as an X6 pretty soon, but it will likely be outperformed (possibly significantly) by the Gulftown.

  • with an AMD X3 Core 2 Duo or AMD X3X2...

    Though I wonder why we are going to 6 rather than 8. Core 2 Quad Duo's? Head 'esplodes....

    I just can't wait till the Quad Quads... or something spiffy, like Quad Squared. 16 is probably a ways off from the consumer market anyway.

    • by Yvan256 (722131)

      The nice thing about running Doom on a Core 2 Quad is that all weapons do four times as much damage, all the time.
