AMD's Fusion CPU + GPU Will Ship This Year
mr_sifter writes "Intel might have beaten AMD to the punch with a CPU featuring a built-in GPU, but it relied on a relatively crude process of simply packaging two separate dies together. AMD's long-discussed Fusion product integrates the two key components into one die, and the company is confident it will be out this year — earlier than had been expected."
Re:OK, they're integrated "properly", but... (Score:5, Informative)
I recently went from an older AMD dual core to a Phenom II. With the exact same board and hardware, my memory performance increased by about 20% thanks to the independent memory controllers.
AMD also makes strikingly capable on-board graphics, so Fusion will likely eliminate the need for separate on-board or discrete video in the average person's computer. That means cheaper, simpler motherboards and, hopefully, better integration of GPGPU functionality for massively parallel computational tasks.
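To make "massively parallel computational tasks" concrete, here's a toy sketch of SAXPY, the textbook GPGPU-friendly kernel (plain Python here just to show the shape of the computation, not an actual GPU implementation):

```python
# SAXPY: y = a*x + y. Each output element depends only on the same
# index of the inputs, so every element could be computed in parallel
# on a GPU with no coordination between them.
def saxpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

result = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)  # [12.0, 24.0, 36.0]
```

On a GPU, each list element would map to one hardware thread; this is exactly the kind of work an on-die GPU could pick up.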
Re:OK, they're integrated "properly", but... (Score:4, Informative)
AMD Fusion is about GPGPU (Score:3, Informative)
Re:OK, they're integrated "properly", but... (Score:3, Informative)
Is your CPU + motherboard combo cheaper than a typical combo from some other manufacturer with notably higher performance and compatibility?
With the growing use of GPUs for general computation, the point is that it's not only gamers who give a fuck nowadays.
PS. If something runs HL2, it can run Portal. My old Radeon 8500 did, so the parent poster's integrated 9100 certainly can too.
Re:OK, they're integrated "properly", but... (Score:5, Informative)
You're behind the times.
http://en.wikipedia.org/wiki/ATI_Hybrid_Graphics [wikipedia.org]
http://en.wikipedia.org/wiki/Scalable_Link_Interface#Hybrid_SLI [wikipedia.org]
Re:CUDA? (Score:2, Informative)
CUDA is Nvidia.
ATI has Stream.
Re:OK, they're integrated "properly", but... (Score:5, Informative)
The page for the GMA 950 [intel.com] even has this hilarious tidbit:
"With a powerful 400MHz core and DirectX* 9 3D hardware acceleration, Intel® GMA 950 graphics provides performance on par with mainstream graphics card solutions that would typically cost significantly more."
Whoever wrote that line must have been borrowing Steve's Reality Distortion Field.
Re:This Is Good For everyone (Score:3, Informative)
Actually, Intel had a radical way to handle this - Larrabee. It was going to be 48 in-order x86 cores on a die with the Larrabee New Instructions. There was a Siggraph paper with very impressive scalability figures [intel.com] for a bunch of games running DirectX in software - they captured the DirectX calls from a machine with a conventional CPU and GPU and injected them into a Larrabee simulator.
This would have been a very interesting machine: good but not great gaming performance and killer server performance. Servers are naturally "embarrassingly parallel" because you can have one thread per client. A sort of x86 take on Sun's Niagara.
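The "one thread per client" point can be sketched in a few lines of Python (the handler and client IDs are invented placeholders, just to show the structure):

```python
from concurrent.futures import ThreadPoolExecutor

# Each client request is independent of the others, so a many-core
# chip like Larrabee or Niagara can simply dedicate one handler
# thread per client - no inter-thread communication needed.
def handle_request(client_id):
    # stand-in for real per-client work (parsing, DB lookup, ...)
    return f"response for client {client_id}"

with ThreadPoolExecutor(max_workers=8) as pool:
    responses = list(pool.map(handle_request, range(4)))
print(responses)
```

Because the requests share no state, throughput scales almost linearly with core count - which is why server workloads suited Larrabee's design far better than games did.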
Of course there are problems with this sort of approach. Most current games are not very well threaded - they have a small number of threads that would run poorly on an in-order CPU. So if the only chip you had was a Larrabee acting as both CPU and GPU, the GPU part would be well balanced across multiple cores; the CPU part likely would not. You have to wonder about memory bandwidth, too.
Larrabee was switched to be a GPU only and then canned.
Of course as a pure GPU it is a bit of a poor design. Real GPUs don't drag in x86 compatibility - they can implement whatever instruction set is best and nothing else. The instruction set is not publicly exposed and can change from generation to generation. You can cram a lot more than 48 cores onto a GPU and the peak performance is higher. Power consumption is lower too.
Still, a modern gaming GPU is huge - there's no way you're going to cram it and a modern CPU onto one die and get something affordable. Then again, CPU+GPU chips are probably not aimed at gamers - there's an argument for having a CPU and a stripped-down integrated GPU on one chip for netbooks, as the latest Atoms do.
You could cram in a chipset too to reduce the price on netbooks.
Re:This Is Good For everyone (Score:3, Informative)
Of course there are problems with this sort of approach. Most current games are not very well threaded - they have a small number of threads that will run poorly on an in order CPU. So if the only chip you had was a Larrabee and it was both a CPU and a GPU the GPU part would be well balanced across multiple cores. The CPU part would likely not. You have to wonder about memory bandwidth too.
I believe it was in fact memory bandwidth that killed Larrabee. A GPU's memory controller is nothing like a CPU's memory controller, so trying to make a many-core CPU behave like a GPU while still also behaving like a CPU just doesn't work very well.
Modern, well-performing GPUs require the memory controller to be specifically tailored to filling large cache blocks. Latency isn't that big of an issue: the GPU is likely to need the entire cache line, so latency is sacrificed for more bandwidth and amortized over many, many operations.
CPUs, on the other hand, require the memory controller to be tailored to filling small cache blocks. Latency is a big issue: the CPU may only want or need 4 bytes from that cache line, so latency can't be sacrificed for bandwidth, and it may not be amortized over many operations.
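A back-of-envelope model shows why burst size matters so much (the latency and peak-bandwidth figures below are illustrative, not measurements of any real part):

```python
# Effective throughput of a single memory stream:
#   time per transfer = latency + bytes / peak_bandwidth
# Big GPU-style bursts amortize the latency; tiny CPU-style
# accesses are dominated by it.
def effective_bw(burst_bytes, latency_ns, peak_gbps):
    secs = latency_ns * 1e-9 + burst_bytes / (peak_gbps * 1e9)
    return burst_bytes / secs / 1e9  # GB/s delivered

# Illustrative numbers: 100 ns latency, 50 GB/s peak bandwidth.
print(effective_bw(4, 100, 50))      # 4-byte access: ~0.04 GB/s
print(effective_bw(65536, 100, 50))  # 64 KB burst:   ~46 GB/s
```

Same memory, three orders of magnitude difference in delivered bandwidth - which is why one controller design can't serve both masters well.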
Re:This Is Good For everyone (Score:2, Informative)
It seems like the caching issues could be fixed with prefetch instructions that can fetch bigger chunks - which Larrabee apparently has.
Still, just fetching instructions for 48 cores takes a huge amount of bandwidth.
http://perilsofparallel.blogspot.com/2010/01/problem-with-larrabee.html [blogspot.com]
Let's say there are 100 processors (high end of numbers I've heard). 4 threads / processor. 2 GHz (he said the clock was measured in GHz).
That's 100 cores x 4 threads x 2 GHz x 2 bytes = 1600 GB/s.
Let's put that number in perspective:
* It's moving more than the entire contents of a 1.5 TB disk drive every second.
* It's more than 100 times the bandwidth of Intel's shiny new QuickPath system interconnect (12.8 GB/s per direction).
* It would soak up the output of 33 banks of DDR3-SDRAM, all three channels, 192 bits per channel, 48 GB/s aggregate per bank.
In other words, it's impossible.
So even 48 cores would need 16 banks of DDR3-SDRAM.
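The arithmetic above is easy to check (same assumptions as the blog post: 4 threads per core, 2 GHz, 2 bytes of instruction fetch per thread per cycle, 48 GB/s per DDR3 bank):

```python
# Instruction-fetch bandwidth estimate from the linked blog post.
threads, clock_ghz, bytes_per_instr = 4, 2, 2

# 100-core case: GHz x bytes gives GB/s directly.
gb_per_s = 100 * threads * clock_ghz * bytes_per_instr
print(gb_per_s)  # 1600 GB/s

# 48-core case, expressed in 48 GB/s DDR3 banks.
banks = 48 * threads * clock_ghz * bytes_per_instr / 48
print(banks)     # 16.0 banks
```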