Intel Hardware

Intel Shows 48-Core x86 Processor

Vigile writes "Intel unveiled a completely new processor design today, one the company is dubbing the 'Single-chip Cloud Computer' (previously codenamed Bangalore). Justin Rattner, the company's CTO, discussed the new product at a press event in Santa Clara and revealed some interesting information about the goals and design of the new CPU. While terascale processing has been discussed for some time, this new CPU is the first to integrate full IA x86 cores rather than simple floating point units. The 48 cores are set two to a 'tile', and each tile communicates with the others via a 2D mesh network capable of 256 GB/s, rather than through a large shared cache structure."


  • by EdipisReks ( 770738 ) on Wednesday December 02, 2009 @05:50PM (#30303374)

    Doesn't the fact that none* of the Apple operating system names are animals native to America seem odd? (*Apart from "Kodiak", which can be found in Alaska.) Mac OS X v10.0 "Cheetah", v10.1 "Puma", v10.2 "Jaguar", v10.3 "Panther", v10.4 "Tiger", v10.5 "Leopard", v10.6 "Snow Leopard".

    There are pumas in the American west and in Florida; they are just called mountain lions, or cougars, or Florida panthers. Same thing.

  • Re:So ... (Score:2, Informative)

    by Avtuunaaja ( 1249076 ) on Wednesday December 02, 2009 @05:52PM (#30303400)
    Linux can handle 4096 cores without trouble in the main kernel tree, with support for much larger images already existing in trees forked by people who actually need such things.
  • by IYagami ( 136831 ) on Wednesday December 02, 2009 @05:56PM (#30303474)
  • by Anonymous Coward on Wednesday December 02, 2009 @05:58PM (#30303514)

    Here's the Wired story.

    http://www.wired.com/gadgetlab/2007/08/64-core-chips-a/

  • by eabrek ( 880144 ) <eabrek@bigfoot.com> on Wednesday December 02, 2009 @06:17PM (#30303864)
    That's what each channel is. I forget exactly, but each DDR channel is something like 200+ pins (RDRAM was considered a big win because it was about 80). And pins == money (mainly in die area).
  • Re:Codenames (Score:5, Informative)

    by azrael29a ( 1349629 ) on Wednesday December 02, 2009 @06:20PM (#30303920)

    Why can't companies come up with decent code names? For instance, this would be the perfect case for the codename "Beowulf".

    They're using geographical names (cities, places, lakes, rivers) to avoid having to register the codename as a trademark. Geographical names can't be trademarked, so no one can claim your codename as their own trademark.

  • Re:Windows 12 (Score:5, Informative)

    by Rockoon ( 1252108 ) on Wednesday December 02, 2009 @06:37PM (#30304204)
    A mutex (MUTual EXclusion) is a software mechanism by which one thread or process can (usually temporarily) lock a resource (such as a memory location) so that no other thread or process may access it.

    It is most often required because resources are normally not 'atomic.' For instance, a string in memory is made up of many machine words, and a CPU cannot read or write multiple machine words in one operation. The danger is that while one CPU is writing to such a non-atomic collection of values, another might be trying to read from (or write to) it, creating a situation where the second process reads part of the old data and part of the new data (essentially garbage data).

    So the idea of a mutex is born, in which an atomic value is leveraged to allow a thread to reserve such resources, signaling others (if they respect the mutex as well) to wait their turn.
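    To make that concrete, here's a minimal C++ sketch (the names and the two threads are invented for illustration; std::mutex and std::thread are standard C++11): a writer updates a multi-word string under a lock, so a reader can never see half-old, half-new data.

        #include <iostream>
        #include <mutex>
        #include <string>
        #include <thread>

        std::mutex name_mutex;          // the atomic "reservation" value
        std::string name = "old value"; // many machine words; not atomic

        void writer(const std::string& value) {
            std::lock_guard<std::mutex> lock(name_mutex); // take our turn
            name = value;              // safe: nobody can read mid-write
        }                              // lock released when 'lock' dies

        std::string reader() {
            std::lock_guard<std::mutex> lock(name_mutex); // wait our turn
            return name;               // guaranteed all-old or all-new
        }

        int main() {
            std::thread t1(writer, std::string("completely new value"));
            std::thread t2([] { std::cout << reader() << '\n'; });
            t1.join();
            t2.join();
        }

    Without the lock, the reader could observe a torn value, which is exactly the garbage-data scenario described above.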
  • by eabrek ( 880144 ) <eabrek@bigfoot.com> on Wednesday December 02, 2009 @06:38PM (#30304226)

    Multiple channels and overlapped memory access? The hardware does it automatically. No need to program anything different (well, I guess there is BIOS code somewhere that configures all the channels and bank information - but most people shouldn't see that).

    Now, programming a 48 core FPU monster? That is a much harder problem!

  • by eabrek ( 880144 ) <eabrek@bigfoot.com> on Wednesday December 02, 2009 @06:43PM (#30304332)

    "ARM states that the Cortex-A8 occupies up to 3 mm when fabricated in a 65 nm process." (Source [insidedsp.com]).

    Each dual core "tile" is 3mm^2. So only 1 per tile, or 24.
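    The arithmetic, spelled out (a sketch using only the figures quoted above):

        #include <iostream>

        int main() {
            const int cores = 48;
            const int cores_per_tile = 2;
            const double tile_area_mm2 = 3.0; // per-tile area quoted above
            const double a8_area_mm2 = 3.0;   // "up to 3 mm^2" at 65 nm

            const int tiles = cores / cores_per_tile;          // 24 tiles
            const int a8s_per_tile =
                static_cast<int>(tile_area_mm2 / a8_area_mm2); // 1
            std::cout << tiles * a8s_per_tile
                      << " Cortex-A8 cores in the same area\n"; // 24
        }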

  • Not the same thing (Score:4, Informative)

    by Sycraft-fu ( 314770 ) on Wednesday December 02, 2009 @07:04PM (#30304678)

    Sun's processors are heavily multithreaded per core: an 8-core CPU where each core can handle 8 threads in hardware. Intel's solution is 48 separate cores, and the article doesn't say how many threads per core.

    The difference? Well lots of threads on one core leads to that core being well used. Ideally, you can have it such that all its execution units are always full, it is working to 100% capacity. However it leads to slower execution per thread, since the threads are sharing a core and competing for resources.

    Something like Sun's solution would be good for servers, if you have a lot of processes and you want to avoid the context switching penalty you get from going back and forth, but no process really uses all that much power. Web servers with lots of scripts and DB access and such would probably benefit from it quite a lot.

    However it wouldn't be so useful for a program that tosses out multiple threads to get more power. Like say you have a 3D rendering engine and it has 4 rendering threads. If all those threads got assigned to one core, well it would run little faster than a single thread running on that core. What you want is each thread on its own core to give you, ideally, a 4x speed increase over a single thread.

    So in general, with Intel's chips you don't see a lot of threads per core. 1 and 2 are all they've had so far (P4s and Core i7s are 2 threads per core, Core 2s are 1 thread per core). They also have features such as the ability for a single core to boost its clock speed if the others are not being used much, to get more performance for one thread and still stay in the thermal spec. These are generally desktop or workstation oriented features. You aren't necessarily running many different apps that need power, you are running one or maybe two apps that need power.

    As for this, well I don't know what they are targeting, or how many threads/core it supports.
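    To make the rendering-engine point concrete: on Linux you can pin each worker to its own core so four render threads never pile onto one. A hedged sketch (pthread_setaffinity_np is a glibc extension; render_slice is invented for illustration):

        #include <pthread.h>
        #include <sched.h>
        #include <iostream>
        #include <thread>
        #include <vector>

        // Hypothetical render job: heavy, independent work per slice.
        void render_slice(int slice) {
            std::cout << "slice " << slice << " done\n";
        }

        int main() {
            std::vector<std::thread> workers;
            for (unsigned i = 0; i < 4; ++i) {
                workers.emplace_back(render_slice, static_cast<int>(i));

                // Pin thread i to core i; otherwise the scheduler is
                // free to stack all four onto the same core.
                cpu_set_t set;
                CPU_ZERO(&set);
                CPU_SET(i, &set);
                pthread_setaffinity_np(workers.back().native_handle(),
                                       sizeof(set), &set);
            }
            for (auto& t : workers) t.join();
        }

    With each thread on its own core you get (ideally) the 4x speedup described above; stacked on one core, they would mostly just take turns.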

  • by TheRaven64 ( 641858 ) on Wednesday December 02, 2009 @07:11PM (#30304790) Journal
    Not sure why you think that. Intel owes its current existence to its Israeli team, which was the only group producing working designs with a usable power envelope while the American design team was following the US automobile industry in concept. Most Intel products are codenamed based on a location near the design team. Several recent Intel chipsets have been designed in east Asia. Plugging 48 x86 cores onto a die, when you have access to Intel's designs, is not a particularly hard task compared to chipset design, so I wouldn't be at all surprised if one of their Indian teams did it. There are a billion people in India; it's not hard to imagine that among that population there are some who are at least as competent as any of Intel's American designers.
  • by nullchar ( 446050 ) on Wednesday December 02, 2009 @07:20PM (#30304928)

    I thought it was ID10T (eye dee ten tee).

  • by TheRaven64 ( 641858 ) on Wednesday December 02, 2009 @07:32PM (#30305074) Journal
    Processors access memory via a cache. When you load a word from memory to a register, it is loaded from cache. If it is not already in cache, then you get a cache miss, the pipeline stalls (and runs another context on SMT chips), and the memory controller fetches a cache line of data from memory. Cache lines are typically around 128 bytes. Modern memory is typically connected via a channel that is 64 bits wide. That means it takes 16 reads to fill a cache line. If you have your memory arranged in matched pairs of modules then it can be filled in 8 pairs of reads instead, which takes half as long.
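    That last step, as a back-of-the-envelope sketch (using only the 128-byte line and 64-bit channel figures above):

        #include <iostream>

        int main() {
            const int line_bytes = 128;   // typical cache line
            const int channel_bits = 64;  // one DDR channel
            const int bytes_per_beat = channel_bits / 8; // 8 bytes/read

            std::cout << "single channel: "
                      << line_bytes / bytes_per_beat << " reads\n"   // 16
                      << "dual channel:   "
                      << line_bytes / (bytes_per_beat * 2)
                      << " paired reads\n";                          // 8
        }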

    On any vaguely recent non-Intel chip (including workstation and server chips for most architectures), you have a memory controller on die for each chip (sometimes for each core). Each chip is connected to a separate set of memory. A simple example of this is a two-way Opteron. Each has its own, private, memory. If you need to access memory attached to the other processor then the request has to be forwarded over the HyperTransport link (a point-to-point message passing channel that AMD uses to run a cache coherency protocol). If your OS does a good job of scheduling, then all of the RAM allocated to a process will be on the RAM chips close to where the process is running.

    The reason Intel and Sun are pushing fully buffered DIMMs for their new chips is that FBDIMMs use a serial channel, rather than a parallel one, for connecting the memory to the memory controller. This means that you need fewer pins on the memory controller for connecting up a DIMM and so you can have several memory controllers on a single die without your chip turning into a porcupine. You probably wouldn't have 48 memory controllers on a 48-core chip, but you might have six, with every 8 cores sharing a level-3 cache and a memory controller.

  • by Anonymous Coward on Wednesday December 02, 2009 @07:34PM (#30305120)

    ...which is why he pointed it out as an exception to his statement. You have bad reading comprehension.

  • Re:So ... (Score:4, Informative)

    by TheRaven64 ( 641858 ) on Wednesday December 02, 2009 @07:40PM (#30305192) Journal
    Ugh, I hate seeing this repeated so often. The 4096-processor SGI machines that Linux runs on 'in the main tree' are clusters. They run a separate instance of Linux on each node and have some very complex hardware managing cache coherency between them. Architecturally, they are nothing like a standard SMP system.
  • by sznupi ( 719324 ) on Wednesday December 02, 2009 @08:22PM (#30305710) Homepage

    The 48-core chip that Intel demonstrated is 45nm!

    Also, Cortex-A9: "For 2000 DMIPS of performance when designed in a TSMC 65 nanometer (nm) generic process the core logic costs less than 1.5 mm^2 of silicon." ( http://www.arm.com/products/CPUs/ARMCortex-A9SingleCore.html [arm.com] ) So it seems "up to 3 mm^2" in your quote really means "up to" (and for a much older core of course, when it was just launching 4 years ago)

    And Cortex-A9 "consumes less than 250mW per core"...

  • by kharchenko ( 303729 ) on Wednesday December 02, 2009 @09:19PM (#30306242)
    >If UCSB had such a system back in the 1990s, then UCSB would likely have produced as much multiprocessor research as Stanford University
    Actually, UCSB had exactly such a system in the 90's, called Meiko: "The Department of Computer Science at UCSB purchased a 64-processor CS-2 in June 1994." [ucsb.edu]
  • What is worse is that they've done away with cache coherence, so I don't think you can take a 48-thread mysql/java process and just scale it. You COULD use forked processes that don't share much (i.e. postgres/apache/php).
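    A minimal sketch of that shared-nothing, forked-worker style (plain POSIX fork(); handle_work is invented for illustration). Each child gets its own copy-on-write address space, so there is nothing for the hardware to keep coherent:

        #include <sys/wait.h>
        #include <unistd.h>
        #include <cstdio>

        // Hypothetical independent unit of work (one request, one shard...).
        void handle_work(int worker_id) {
            std::printf("worker %d in its own address space\n", worker_id);
        }

        int main() {
            const int workers = 4; // say, one per core

            for (int i = 0; i < workers; ++i) {
                pid_t pid = fork();
                if (pid == 0) {     // child: private memory, shares nothing
                    handle_work(i);
                    _exit(0);
                }
            }
            for (int i = 0; i < workers; ++i)
                wait(nullptr);      // parent reaps the children
        }

    Anything the workers do need to share then goes over explicit messages (pipes, sockets), which is exactly the style a non-coherent chip like this one pushes you toward.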
  • Re:Windows 12 (Score:4, Informative)

    by JWSmythe ( 446288 ) <jwsmytheNO@SPAMjwsmythe.com> on Wednesday December 02, 2009 @10:10PM (#30306622) Homepage Journal

        It doesn't matter much. The first sibling to grab key 1a is usually running for the car. Even if the other sibling grabbed key 1b, they'll be looking at an empty parking spot, complaining to mom. :)

  • by afidel ( 530433 ) on Wednesday December 02, 2009 @10:28PM (#30306752)
    The reason the i7 gains nothing going from double to triple channel memory is that the memory controller is power limited and so can only run at reduced clocking in triple channel configurations (800MHz, down from 1333MHz). Of course for most workloads having 50% more data in RAM instead of on glacially slow storage is a win =)
  • by Anonymous Coward on Wednesday December 02, 2009 @11:51PM (#30307288)

    Firstly, stop being xenophobic.
    Maybe the name is Bangalore because of this?
    http://nextbigfuture.com/2009/12/intel-makes-single-chip-cloud-computer.html [nextbigfuture.com]
    "This represents the latest achievement from Intel's Tera-scale Computing Research Program. The research was co-led by Intel Labs Bangalore, India, Intel Labs Braunschweig, Germany and Intel Labs researchers in the United States. "
    And Intel is an international company headquartered in the US; it gets just 20% of its revenue from the Americas:
    http://www.forbes.com/feeds/businesswire/2009/10/13/businesswire130140595.html [forbes.com]

    And Bangalore has nothing to do with the current or the previous US recession. India imports more from the US than it exports to the US. Hence the US has a trade surplus with India. The current crisis was caused by reckless behavior by American financial institutions and the American housing bubble and it has affected the rest of the world.

    Stop being so driven by hatred and nationalist sentiment. We all live in this same world, are human, are dependent on each other, and deserve respect from all other human beings. Hatred is so 2008... grow up.

  • by Bengie ( 1121981 ) on Thursday December 03, 2009 @01:03AM (#30307628)

    Cache coherency should be handled by the programmer, not by the hardware. Cache coherency protocols consume more bandwidth the more cores you have, and the more cores you have, the more important that bandwidth becomes; at some point cache coherency becomes the bottleneck. We've been holding quite well to doubling transistor count every 18 months, so if we go from a few strong cores to many somewhat weaker ones, not only do more cores fit into the same transistor count, the total transistor count keeps growing as well.

    Imagine: our 4-core CPUs will be 8-core in ~18 months, then 16-core ~18 months after that. Intel has hyper-threading and AMD has a similar thing, so that's like 32 cores. So, in ~3 years, at our current rate, we could have 32 logical CPUs reporting in on low-to-mid, sub-$1.5k computers.
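    The projection, written out (just the arithmetic as stated above):

        #include <iostream>

        int main() {
            int cores = 4;                      // today's quad core
            for (int gen = 0; gen < 2; ++gen)   // two ~18-month doublings
                cores *= 2;                     // 8, then 16

            const int threads_per_core = 2;     // hyper-threading style SMT
            std::cout << cores * threads_per_core
                      << " logical CPUs in ~3 years\n"; // 32
        }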

  • by Anonymous Coward on Thursday December 03, 2009 @06:59AM (#30308914)

    called Meiko

    Meiko was the company that built the machine.

    INMOS begat the Transputer, which begat Meiko and the CS-2, which begat the Elan/Elite interconnect, which begat Quadrics, which begat QsNet & QsNet II (& almost, but not quite, QsNet III), which begat a whole bunch of redundant people in spring 2009 when they finally folded.

  • by apposite ( 113190 ) on Thursday December 03, 2009 @07:32AM (#30309022) Homepage

    parent++ (I'm not saying much more than the parent post has already said.)

    We've more or less hit the limits of useful gains from increasing pipeline depth (and thus clock frequency) or increasing Instruction Level Parallelism (which gives you superscalar/multiple dispatch per clock cycle). The silicon required to do the bookkeeping starts costing more than the gain, compared with simply rolling back to a simpler core and having more of them, which is precisely what has happened. As of about 2007, clock rates were generally down from their peak, with increased throughput coming from the addition of multiple cores.

    Multiple cores (full cores, with FP and everything!) are useful for Task Level Parallelism, which can be difficult to achieve on a single job but is a very nice fit for many server loads (like web serving) where individual threads have very little interaction. Desktops will no doubt inherit many-core (8+) CPUs from the server world, but I'd guess that we'll actually see desktop CPUs shrinking, requiring less power and following the laptop power curves. There may even be a more pronounced separation between the "power desktop user" who uses their CPU for intensive graphic rendering (i.e. a graphics workstation or gamer machine) and everyone else (who ends up using a mere 4 or 8 core machine which requires little or no active cooling).
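    A tiny sketch of what Task Level Parallelism looks like in code (std::async is standard C++11; handle_request is invented for illustration): independent requests fan out across cores with no interaction between them.

        #include <future>
        #include <iostream>
        #include <string>
        #include <vector>

        // Hypothetical request handler: touches no shared state.
        std::string handle_request(int id) {
            return "response " + std::to_string(id);
        }

        int main() {
            std::vector<std::future<std::string>> pending;

            // Each task is independent, so the runtime is free to spread
            // them over as many cores as it has. This is TLP, not ILP.
            for (int id = 0; id < 8; ++id)
                pending.push_back(
                    std::async(std::launch::async, handle_request, id));

            for (auto& f : pending)
                std::cout << f.get() << '\n';
        }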

    Servers will continue to pack in more and more cores with more and more memory. The bandwidth bottleneck is RAM, not disk as was mentioned in one comment (any serious server setup uses a variety of strategies to serve most content from RAM and only writes to disk for persistence or tail-end performance). This also means they'll have more NICs, and there will be pressure to push network speeds up to keep the CPU and RAM busy.

    The reference book on this sort of thing (and apologies for anything I got wrong) is "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson. Very readable and amazingly comprehensive.

  • by mrboyd ( 1211932 ) on Thursday December 03, 2009 @08:17AM (#30309168)
    Really? Then wtf is that job posting on Intel's website for a CPU Architect in Bangalore for?

    In this position, you will be responsible for architecting advanced client platforms for 2015 and beyond. We are now in the early research and pathfinding for the 2015 generation of CPU products. Our team engages in early architecture analysis, microarchitecture research and/or development, performance and/or power modeling and analysis, including detailed architecture validation versus RTL

    Here's what they do in Bangalore: http://www.intel.com/jobs/india/iidc/index.htm [intel.com]. Seems like some people in India have enough skills to design a CPU.

  • by morgen_m ( 1688614 ) on Thursday December 03, 2009 @12:05PM (#30311184)
    Do you have any idea what you are talking about? Intel, AMD, and many other companies design their chips in Bangalore. E.g. the Xeon 7400 series [pcworld.com] by Intel, and AMD's competitor to the Xeon, the Opteron [siliconindia.com], were designed in B'lore. Infineon, Cisco, GE and a whole lot of other companies have chip design operations in Bangalore.
