Intel Shows 48-Core x86 Processor 366
Vigile writes "Intel unveiled a completely new processor design today that the company is dubbing the 'Single-chip Cloud Computer' (but which was previously codenamed Bangalore). Justin Rattner, the company's CTO, discussed the new product at a press event in Santa Clara and revealed some interesting information about the goals and design of the new CPU. While terascale processing has been discussed for some time, this new CPU is the first to integrate full IA x86 cores rather than simple floating point units. The 48 cores are arranged two to a 'tile,' and each tile communicates with the others via a 2D mesh network capable of 256 GB/s, rather than through a large cache structure."
Re:Code Name is Offensive (Score:2, Informative)
There are pumas in the American West and in Florida; they are just called mountain lions, cougars, or Florida panthers. Same thing.
Re:So ... (Score:2, Informative)
Sun HAS a 64 thread processor: UltraSPARC T2 (Score:4, Informative)
More info at:
http://www.sun.com/processors/UltraSPARC-T2/specs.xml [sun.com]
That's nothing, how about 64 cores for $435? (Score:1, Informative)
Here's the Wired story.
http://www.wired.com/gadgetlab/2007/08/64-core-chips-a/
Re:Advantages over just adding more FPUs? (Score:4, Informative)
Re:Codenames (Score:5, Informative)
Why can companies not come up with decent code names? For instance, this would be the perfect case for it being codenamed "Beowulf".
They're using geographical names (cities, places, lakes, rivers) to avoid having to register the codename as a trademark. Geographical names can't be trademarked, so no one can register your codename as their trademark.
Re:Windows 12 (Score:5, Informative)
It is most often required because resources are normally not 'atomic.' For instance, a string in memory is made up of many machine words, and a CPU cannot read or write multiple machine-word values in one operation. The danger is that while one CPU is writing to such a non-atomic collection of values, another might be trying to read from (or write to) it, creating a situation where that second process reads part of the old data and part of the new data (essentially garbage data).
So the idea of a MUTEX is born, in which an atomic value is leveraged to allow a thread to reserve such a resource, signaling others (if they respect the MUTEX as well) to wait their turn.
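That description maps straight onto a lock in any threading library. A quick Python sketch (the `record`/`writer` names are made up for illustration, and Python's `Lock` stands in for the hardware atomic):

```python
import threading

# A non-atomic resource: two "machine words" a CPU can't update in one step.
record = ["first-0", "last-0"]
lock = threading.Lock()  # the MUTEX

def writer(i):
    # Without the lock, a reader could observe e.g. ["first-3", "last-7"]:
    # part old data, part new data, i.e. garbage.
    with lock:
        record[0] = "first-%d" % i
        record[1] = "last-%d" % i

def reader():
    with lock:
        return tuple(record)  # always a matched pair

threads = [threading.Thread(target=writer, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

first, last = reader()
# Both halves always come from the same writer.
assert first.split("-")[1] == last.split("-")[1]
```

Every thread that touches `record` has to respect the same lock, which is exactly the "if they respect the MUTEX as well" caveat above.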
Re:Advantages over just adding more FPUs? (Score:3, Informative)
Multiple channels and overlapped memory access? The hardware does it automatically. No need to program anything different (well, I guess there is BIOS code somewhere that configures all the channels and bank information - but most people shouldn't see that).
Now, programming a 48 core FPU monster? That is a much harder problem!
Re:Sounds like Sinclair's waffer scale intergratio (Score:2, Informative)
"ARM states that the Cortex-A8 occupies up to 3 mm^2 when fabricated in a 65 nm process." (Source [insidedsp.com]).
Each dual-core "tile" is 3 mm^2, so only one would fit per tile, or 24 total.
Not the same thing (Score:4, Informative)
Sun's processors are heavily multi-threaded per core. The T2 is an 8-core CPU where each core can handle 8 threads in hardware. Intel's solution is 48 separate cores; the article doesn't say how many threads per core.
The difference? Well, lots of threads on one core leads to that core being well used. Ideally, all its execution units are always full and it is working at 100% capacity. However, it leads to slower execution per thread, since the threads are sharing a core and competing for resources.
Something like Sun's solution would be good for servers, where you have a lot of processes, you want to avoid the context-switching penalty you get from going back and forth, but no single process uses all that much power. Web servers with lots of scripts and DB access and such would probably benefit from it quite a lot.
However it wouldn't be so useful for a program that spawns multiple threads to get more power. Say you have a 3D rendering engine and it has 4 rendering threads. If all those threads got assigned to one core, it would run little faster than a single thread on that core. What you want is each thread on its own core to give you, ideally, a 4x speed increase over a single thread.
So in general, with Intel's chips you see not a lot of threads per core. One and two are all they've had so far (P4s and Core i7s are 2 threads per core, Core 2s are 1 thread per core). They also have features such as the ability for a single core to boost its clock speed when the others are mostly idle, to get more performance for one thread while staying within the thermal spec. These are generally desktop- or workstation-oriented features: you aren't necessarily running many different apps that need power, you are running one or maybe two apps that do.
As for this, well I don't know what they are targeting, or how many threads/core it supports.
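To see why you want those rendering threads spread across cores, here's a toy Python sketch using one process per "core" (`render_rows` is a made-up stand-in for a real rendering kernel, not anything from the article):

```python
from multiprocessing import Pool

def render_rows(rows):
    # Made-up stand-in for a per-pixel rendering kernel.
    return sum((r * 31 + c) % 7 for r in rows for c in range(64))

if __name__ == "__main__":
    all_rows = list(range(256))
    # Split the frame into 4 chunks, one per "rendering thread".
    chunks = [all_rows[i::4] for i in range(4)]
    # Pool gives each chunk its own process, which the OS can schedule on
    # its own core: the ideal 4x case, instead of 4 threads sharing one.
    with Pool(processes=4) as pool:
        partials = pool.map(render_rows, chunks)
    # Same answer as doing it serially, just spread across cores.
    assert sum(partials) == render_rows(all_rows)
```

On a chip like the T2, the scheduler could legally park all four workers on one core's hardware threads, which is the "runs little faster than a single thread" case the parent describes.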
Re:Code Name is Offensive (Score:3, Informative)
Re:Code Name is Offensive (Score:2, Informative)
I thought it was ID10T (eye dee ten tee).
Re:Advantages over just adding more FPUs? (Score:4, Informative)
On any vaguely recent non-Intel chip (including workstation and server chips for most architectures), you have a memory controller on die for each chip (sometimes for each core). Each chip is connected to a separate set of memory. A simple example of this is a two-way Opteron. Each will have its own, private, memory. If you need to access memory attached to the other processor then it has to be forwarded over the HyperTransport link (a point-to-point message passing channel that AMD uses to run a cache coherency protocol). If your OS did a good job of scheduling, then all of the RAM allocated to a process will be on the RAM chips close to where the process is running.
The reason Intel and Sun are pushing fully buffered DIMMs for their new chips is that FBDIMMs use a serial channel, rather than a parallel one, for connecting the memory to the memory controller. This means that you need fewer pins on the memory controller for connecting up a DIMM and so you can have several memory controllers on a single die without your chip turning into a porcupine. You probably wouldn't have 48 memory controllers on a 48-core chip, but you might have six, with every 8 cores sharing a level-3 cache and a memory controller.
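That hypothetical six-controller layout works out to a simple static mapping. A Python sketch (purely illustrative; the 8-cores-per-controller figure is the guess above, not anything Intel has announced):

```python
CORES = 48
CONTROLLERS = 6                      # the guess: six on-die controllers
PER_DOMAIN = CORES // CONTROLLERS    # 8 cores share an L3 and a controller

def home_controller(core):
    # Static mapping: each consecutive block of 8 cores homes to one
    # memory controller, NUMA-style.
    return core // PER_DOMAIN

assert home_controller(0) == 0 and home_controller(7) == 0   # same domain
assert home_controller(8) == 1                               # next domain
assert home_controller(47) == CONTROLLERS - 1
```

An OS that understands this topology would try to allocate a process's pages from the controller its core homes to, just like the Opteron case described above.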
Re:Code Name is Offensive (Score:1, Informative)
...which is why he pointed it out as an exception to his statement. You have bad reading comprehension.
Re:So ... (Score:4, Informative)
Re:Sounds like Sinclair's waffer scale intergratio (Score:3, Informative)
The 48-core chip that Intel demonstrated is 45nm!
Also, Cortex-A9: "For 2000 DMIPS of performance when designed in a TSMC 65 nanometer (nm) generic process the core logic costs less than 1.5 mm^2 of silicon." ( http://www.arm.com/products/CPUs/ARMCortex-A9SingleCore.html [arm.com] ) So it seems "up to 3 mm^2" in your quote really means "up to" (and for a much older core of course, when it was just launching 4 years ago)
And Cortex-A9 "consumes less than 250mW per core"...
Re:48 is sufficient for most Ph.D. dissertations. (Score:4, Informative)
Actually, UCSB had exactly such a system in the '90s, called Meiko: "The Department of Computer Science at UCSB purchased a 64-processor CS-2 in June 1994." [ucsb.edu]
Re:Advantages over just adding more FPUs? (Score:5, Informative)
Re:Windows 12 (Score:4, Informative)
It doesn't matter much. The first sibling to grab key 1a is usually running for the car. Even if the other sibling grabbed key 1b, they'll be looking at an empty parking spot, complaining to mom. :)
Re:Advantages over just adding more FPUs? (Score:5, Informative)
Re:Code Name is Offensive (Score:1, Informative)
Firstly, stop being xenophobic.
Maybe the name is Bangalore because of this?
http://nextbigfuture.com/2009/12/intel-makes-single-chip-cloud-computer.html [nextbigfuture.com]
"This represents the latest achievement from Intel's Tera-scale Computing Research Program. The research was co-led by Intel Labs Bangalore, India, Intel Labs Braunschweig, Germany and Intel Labs researchers in the United States. "
And Intel is an international company headquartered in the US. Intel gets just 20% of its revenue from the Americas:
http://www.forbes.com/feeds/businesswire/2009/10/13/businesswire130140595.html [forbes.com]
And Bangalore has nothing to do with the current or the previous US recession. India imports more from the US than it exports to the US. Hence the US has a trade surplus with India. The current crisis was caused by reckless behavior by American financial institutions and the American housing bubble and it has affected the rest of the world.
Stop being so driven by hatred and nationalist sentiment. We all live in the same world, are all human, depend on each other, and deserve respect from all other human beings. Hatred is so 2008... grow up.
Re:Advantages over just adding more FPUs? (Score:4, Informative)
Cache coherency should be handled by the programmer, not by the hardware. Cache coherency protocols consume more bandwidth the more cores you have, and the more cores you have, the more that bandwidth matters; at some point cache coherency will become a bottleneck. We've been holding quite well to doubling transistor count every 18 months, so if we go from strong single cores to somewhat weaker multi-cores, each generation not only packs more cores into the same transistor count but also has more transistors to spend.
Imagine: our 4-core CPUs will be 8-core in ~18 months, then 16-core ~18 months after that. Intel has Hyper-Threading and AMD has a similar thing, so call it 32 logical CPUs. In ~3 years, at our current rate, we could have 32 logical CPUs in low-to-mid-range, sub-$1.5k computers.
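The arithmetic, spelled out (Python; the 18-month doubling cadence is the assumption above, not a promise from anyone's roadmap):

```python
cores = 4          # today's quad-core desktop
months = 0
while months < 36:     # ~3 years of 18-month doublings
    cores *= 2
    months += 18
logical = cores * 2    # x2 for Hyper-Threading-style SMT
assert cores == 16     # 4 -> 8 -> 16 physical cores
assert logical == 32   # 32 logical CPUs visible to the OS
```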
Re:48 is sufficient for most Ph.D. dissertations. (Score:1, Informative)
Meiko was the company who built the machine.
INMOS begat the Transputer, which begat Meiko and the CS-2, which begat the Elan/Elite interconnect, which begat Quadrics, which begat QsNet & QsNet II (& almost, but not quite, QsNet III), which begat a whole bunch of redundant people in spring 2009 when they finally folded.
Re:Advantages over just adding more FPUs? (Score:2, Informative)
parent++ (I'm not saying much more than parent post has already said.)
We've more or less hit the limits of useful gains from increasing pipeline depth (and thus clock frequency) or from increased instruction-level parallelism (which gives you superscalar/multiple dispatch per clock cycle). The silicon required to do the bookkeeping starts costing more than you gain by simply rolling back to a simpler core and having more of them, which is precisely what has happened. As of about 2007, clock rates were generally down from their peak, with increased throughput coming from the addition of multiple cores.
Multiple cores (full cores, with FP and everything!) are useful for task-level parallelism, which can be difficult to achieve within a single job but is a very nice fit for many server loads (like web serving) where individual threads have very little interaction. Desktops will no doubt inherit many-core (8+) CPUs from the server world, but I'd guess that we'll actually see desktop CPUs shrinking, requiring less power and following the laptop power curves. There may even be a more pronounced separation between the "power desktop user" who uses their CPU for intensive graphics rendering (i.e. a graphics workstation or gamer machine) and everyone else (who ends up using a mere 4- or 8-core machine that requires little or no active cooling).
Servers will continue to pack more and more cores with more and more memory. The bandwidth bottleneck is RAM, not Disk as was mentioned in one comment (any serious server setup uses a variety of strategies to serve most content from RAM and only writes to Disk for persistence or tail end performance). This also means they'll have more NICs, and there will be pressure to push the network speed up to keep the CPU and RAM busy.
The reference book on this sort of thing (and apologies for anything I got wrong) is "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson. Very readable and amazingly comprehensive.
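The parent doesn't name it, but the standard way to quantify "more cores only help the parallel part" is Amdahl's law. A quick Python sketch (the 95% figure is just an example, not a measurement of any real workload):

```python
def amdahl_speedup(parallel_fraction, cores):
    # The serial part stays fixed; only the parallel part divides across cores.
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# A 95%-parallel workload scales nicely to 8 cores...
assert 5.9 < amdahl_speedup(0.95, 8) < 6.0
# ...but even with unlimited cores it can never beat 1/0.05 = 20x,
# which is why low-interaction server loads are the natural fit.
assert amdahl_speedup(0.95, 10**9) < 20
```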
Re:Code Name is Offensive (Score:3, Informative)
In this position, you will be responsible for architecting advanced client platforms for 2015 and beyond. We are now in the early research and pathfinding for the 2015 generation of CPU products. Our team engages in early architecture analysis, microarchitecture research and/or development, performance and/or power modeling and analysis, including detailed architecture validation versus RTL
Here's what they do in Bangalore: http://www.intel.com/jobs/india/iidc/index.htm [intel.com]. Seems like some people in India have enough skills to design a CPU.
Re:Code Name is Offensive (Score:2, Informative)