Intel Shows 48-Core x86 Processor
Vigile writes "Intel unveiled a completely new processor design today that the company is dubbing the 'Single-chip Cloud Computer' (previously codenamed Bangalore). Justin Rattner, the company's CTO, discussed the new product at a press event in Santa Clara and revealed some interesting information about the goals and design of the new CPU. While terascale processing has been discussed for some time, this new CPU is the first to integrate full IA x86 cores rather than simple floating-point units. The 48 cores are arranged two to a 'tile', and each tile communicates with the others via a 2D mesh network capable of 256 GB/s rather than through a large shared cache structure. "
Larrabee? (Score:1, Interesting)
This seems like it would be very related to their Larrabee GPU project.
Re:Code Name is Offensive (Score:0, Interesting)
It makes sense, after all.
Intel made a new 48-core x86 chip billed as a cloud computer on a chip.
Funny thing is, cloud computing is like outsourcing your computer hardware to a bigger machine that's cheaper to use thanks to rental pricing, etc.
The name of the chip you ask?
BANGALORE (outsourcing capital of the world)
Re:Windows 12 (Score:5, Interesting)
Microsoft once had a podcast where they were talking about multi-core CPU kernels. Their belief was that once you had 50+ cores, you would be able to have a mutex for every single COM object element, simply because you could.
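A minimal sketch (in Python, not actual COM code) of what that per-object locking looks like: each object carries its own mutex, so threads working on different objects never contend. The `ComObject` class here is purely hypothetical, a stand-in for the COM objects the podcast described.

```python
import threading

class ComObject:
    """Toy stand-in for a COM object: with enough cores, giving every
    object its own mutex (rather than one coarse global lock) is cheap."""
    def __init__(self, value=0):
        self._lock = threading.Lock()   # one mutex per object
        self._value = value

    def increment(self):
        with self._lock:                # only contends with users of THIS object
            self._value += 1
            return self._value

# Two threads hammering two different objects never block each other.
a, b = ComObject(), ComObject()
t1 = threading.Thread(target=lambda: [a.increment() for _ in range(1000)])
t2 = threading.Thread(target=lambda: [b.increment() for _ in range(1000)])
t1.start(); t2.start(); t1.join(); t2.join()
print(a.increment(), b.increment())  # 1001 1001
```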
Re:Yet another cloud? (Score:3, Interesting)
The term "cloud" is over-used, but a 48-core chip is certainly a good match for anyone who uses virtualization, and cloud-style data services are absolutely big users of virtualization.
Cloud computing is certainly a big deal. I recently explained to my boss that instead of spending weeks going through tickets, bureaucracy, approvals, and procurement to get a server in our own datacenter, we could go to Amazon, type the credit card number, and be up-and-running with a few clicks!
I don't know if he understood exactly what cloud computing *is*, but he knows it is important and will have a major impact on IT. So when someone mentions the word "cloud" he listens. Marketers are aware of this sort of thing, so they deliberately use these terms as liberally as possible.
48 is sufficient for most Ph.D. dissertations. (Score:5, Interesting)
Unlike Stanford University, UCSB lacks the money to build a full-blown multiprocessor system. If UCSB had such a system back in the 1990s, then UCSB would likely have produced as much multiprocessor research as Stanford University.
Because this 48-core chip will eventually be a commercial product mass-produced by the millions of units, it will be cheap. That will enable UCSB to build or buy an affordable multiprocessor system.
A bunch of graduate students are already salivating at the prospect.
Is there enough cpu to chipset bandwidth to make use (Score:5, Interesting)
Is there enough cpu to chipset bandwidth to make use of all this cpu power?
Re:Advantages over just adding more FPUs? (Score:5, Interesting)
Can someone elaborate on why you'd want 48 full processors, rather than a processor with two (dual) or four (quad) "cores" (I'm presuming core in this case == FPU in the article)? Supposedly Win7's SMP support becomes much more effective at the 12-16 core threshold.
The first thought that comes to mind is video processing and CGI animation, because those applications are embarrassingly parallel [wikipedia.org].
And those companies usually have the money to spend on top of the line hardware.
Eventually this will trickle down to the consumer level as always, and within 10 years people at home will be able to do real-time movie-quality CGI on their home computers.
Re:Advantages over just adding more FPUs? (Score:4, Interesting)
GPUs are using 256-bit-wide data paths now to improve data throughput; I think it is only a matter of time until the memory bus is a whole cache line (256 bits?) in width, enabling reading/writing of entire cache lines in a single operation. Seems simple to me, but your pin count and power usage go up, as well as the number of separate DRAM chips you need for a wider memory bus.
Re:Advantages over just adding more FPUs? (Score:3, Interesting)
Embarrassingly parallel is right. Cache coherency was sacrificed in order to up the number of cores, though I suppose a Beowulf on a chip is still useful for some things.
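Without cache coherency, cores can't safely share memory, so they communicate by explicit message passing, exactly like nodes in a Beowulf cluster, just on one die. A minimal sketch of that pattern, using Python threads and queues as stand-ins for cores and the mesh network:

```python
import threading, queue

# With no coherent cache, "cores" exchange explicit messages instead of
# touching shared data structures.
inbox_a, inbox_b = queue.Queue(), queue.Queue()
result = {}  # only used here to collect the final answer for display

def core_a():
    inbox_b.put(21)              # send a work item to core B
    result["a"] = inbox_a.get()  # block until B's reply arrives

def core_b():
    n = inbox_b.get()            # receive the message
    inbox_a.put(n * 2)           # reply with the computed result

ta = threading.Thread(target=core_a)
tb = threading.Thread(target=core_b)
ta.start(); tb.start(); ta.join(); tb.join()
print(result["a"])  # 42
```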
Re:Is there enough cpu to chipset bandwidth to make (Score:4, Interesting)
Is there enough cpu to chipset bandwidth to make use of all this cpu power?
That's really going to depend on the intended use. And on whether the intended use involves problems that a) can be efficiently parallelized, and more importantly, b) actually have been efficiently parallelized. But unless each core gets its own memory bus and its own dedicated memory with its own cache, I rather expect that the only things that are going to be parallelized to their maximum potential are wait states. All that said, it will still probably run faster than a two- or four-core CPU for many tasks, but it won't be running 48 times faster. I would not, however, refuse a manufacturer's sample if one was handed to me. ;)
On the positive side, if this beast actually makes it to market, it might help spur the development of new parallel software.
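The "won't be running 48 times faster" point is just Amdahl's law: the serial fraction of a program caps the speedup no matter how many cores you add. A quick back-of-the-envelope calculation:

```python
# Amdahl's law: speedup = 1 / (serial + parallel/cores).
# Even a 99%-parallel program tops out well short of 48x on 48 cores.
def speedup(cores, parallel_fraction):
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

for p in (0.50, 0.90, 0.99):
    print(f"{p:.0%} parallel -> {speedup(48, p):.1f}x on 48 cores")
```

Even with only 1% of the work serial, 48 cores deliver roughly a 33x speedup, not 48x.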
NUMA vs SMP (Score:3, Interesting)
In my experience Windows 7 64-bit is noticeably faster in a NUMA configuration (the Windows Experience Index is significantly higher because of improved memory throughput), and the majority of applications also run up to 10% faster.
I don't know if this is because the Nehalem Xeon CPUs have faster access to CPU-local memory in a NUMA configuration, or whether Windows is also optimized for this.
Re:Advantages over just adding more FPUs? (Score:3, Interesting)
I was recently reading an article about multi-core designs, and it said they'll have to drop cache coherency at some point soon and redesign locking a bit. Some other architectures already forgo cache coherency to help with scaling, but x86 isn't one of them.
Re:Advantages over just adding more FPUs? (Score:4, Interesting)
A cache line on a modern Intel/AMD processor is actually 512 bits, or 64 bytes.
A memory bus 512 bits wide wouldn't really help much, though -- right now when dealing with memory, most of the time is spent in the various latencies. When you are fetching a lot of memory sequentially, you can get insane speeds even today. But that's not how you usually read memory -- instead, you read a few words from different locations, and the memory controller needs to activate the correct bank, row and column before you get what you need. On typical PC-10600 DDR3, that means at least 15 bus cycles just waiting around for the memory to adjust. Making the bus 512 bits wide would speed up the actual transfer from the 4 bus cycles it takes currently to one, but that would only mean an improvement of about 15% -- at a huge cost for having to accommodate those 384 extra data lines on the chip, socket, motherboard and RAM. It's better just to try to speed up the memory so burst transfers happen "fast enough".
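The arithmetic behind that ~15% figure, using the cycle counts from the post above:

```python
# ~15 bus cycles of latency per random access, then the burst transfer:
# 4 cycles on today's bus, 1 cycle on a hypothetical 512-bit bus.
latency = 15
narrow = latency + 4   # 19 cycles total per random access today
wide = latency + 1     # 16 cycles total with a 512-bit bus
print(f"improvement: {1 - wide / narrow:.1%}")  # about 15.8%
```

Latency dominates either way, which is why widening the bus buys so little for random access patterns.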
I don't know about NVIDIA cards, but at least for ATI the card doesn't actually have a 256-bit memory interface -- instead, it has 4 completely separate 64-bit memory channels connected to a fast ring bus. The interleaving of data across those separate memory channels is done very coarsely -- basically, entire textures and such are allocated on a single channel. This means that while one texture is being fetched, the 3 other channels can serve other requests.
This is the way I see CPUs evolving too -- even on current hardware, namely the Phenom II, you get better performance when you ungang the memory channels and wait 8 cycles for a single memory transfer instead of 4, because that way you wait out separate latencies on the separate channels at the same time. Of course, in the perverse case all the data you want to access resides on one channel, but the chance of that happening by accident is pretty much nil.
Re:48 is sufficient for most Ph.D. dissertations. (Score:3, Interesting)
That's pretty funny.
Made me think about how I created beautiful reports, using LaTeX, on a simple 100 MHz Pentium machine running Slackware Linux. Now there's Office 2010 coming up, and I'm not sure what the system requirements are, but I'm pretty sure it doesn't do ligatures [wikipedia.org].
(Ligatures: when you write "finally", the dot on the i would collide awkwardly with the top of the f, so LaTeX substitutes a single specially designed character, a ligature, just to make it look good.)
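A minimal LaTeX sketch of the effect: TeX merges the f and i into one ligature glyph automatically, and inserting an empty group between them suppresses it, so you can compare the two side by side.

```latex
\documentclass{article}
\begin{document}
finally   % TeX substitutes the ``fi'' ligature glyph automatically

f{}inally % an empty group between f and i suppresses the ligature
\end{document}
```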