AMD Details Next-Gen Kaveri APU's Shared Memory Architecture

crookedvulture writes "AMD has revealed more details about the unified memory architecture of its next-generation Kaveri APU. The chip's CPU and GPU components will have a shared address space and will also share both physical and virtual memory. GPU compute applications should be able to share data between the processor's CPU cores and graphics ALUs, and the caches on those components will be fully coherent. This so-called heterogeneous uniform memory access, or hUMA, supports configurations with either DDR3 or GDDR5 memory. It's also based entirely in hardware and should work with any operating system. Kaveri is due later this year and will also have updated Steamroller CPU cores and a GPU based on the current Graphics Core Next architecture." bigwophh links to the Hot Hardware take on the story and adds, "AMD claims that programming for hUMA-enabled platforms should ease software development and potentially lower development costs as well. The technology is supported by mainstream programming languages like Python, C++, and Java, and should allow developers to more simply code for a particular compute resource with no need for special APIs."
  • The PS4 (Score:4, Interesting)

    by MXPS ( 1091249 ) on Tuesday April 30, 2013 @01:38PM (#43592549)
    will feature this technology. It will be interesting to see how it stacks up.
  • I'm curious how long it will be before these optimizations are found in the compilers themselves.

  • As usual, AMD is leaving out some key information. What will be the TDP of such chips? I've always rooted for AMD and all my systems were built with them. You can't beat an Ivy Bridge chip for performance per watt though. Against the i7-3770K, AMD doesn't offer anything compelling. I like the idea that they're using the GCN architecture to assist with processing, but have they done anything to the lithography or power consumption? Intel's Haswell chips come out soon and those are even better.
    • Power is key in the mobile space where a lot of chips are going. -Joe

      I hope that your i7-3770K is serving you well in your cell phone.

        • I guess I need to provide more information to help get my point across. Intel has 4th gen chips that run on a 7 watt TDP. The performance per watt is pretty remarkable. Intel's i7-3770K has a 77 watt TDP. AMD's FX-8350 has a 125 watt TDP, gets spanked by Intel in most benchmarks, and doesn't have any graphics chip on die to drive a monitor. Translating that down, Intel has an advantage. I would love to be proven wrong though.
          • Intel's i7-3770K has a 77 watt TDP. AMD's FX-8350 has a 125 watt TDP, gets spanked by Intel in most benchmarks, and doesn't have any graphics chip on die to drive a monitor.

          You know, that might be exactly the problem here. This is something completely different. If the GPU is any good, chances are that a combination of an APU with a high-end GPU and a lot of GDDR5 memory would make many HPC people much happier than Haswell ever could. In some application areas, it's all about bandwidth. Today, if you're trying to do HPC on, say, a 20GB dataset in memory, on a single machine, you're screwed.

        • Translating that down, Intel has an advantage.

          i7 3770k: £250
          FX 8350: £160

          Yes. Advantage Intel. Also take into account that quality motherboards are usually cheaper for AMD and that one can also upgrade more easily.

          The more apt comparison is to some i5. At that point, the 8350 beats it in a large number of benchmarks (and does actually beat the much more expensive i7). Basically in multi threaded code the FX8350 wins. In single threaded code the i5 wins.

          • I do agree with you. I'm simply referring to the simple tasks the general public does. Web surfing, iTunes, emails, etc. These are not heavily threaded tasks. Granted the difference is marginal because any modern processor can handle this with ease. Sure in highly threaded workloads the AMDs offer a better bang for your buck, but the general public does not do this on a day to day basis.
        • Re: (Score:2, Interesting)

          by skids ( 119237 )

          Speaking as someone currently considering buying slightly behind the curve, I was all set to jump on an Intel-based fanless system because of the TDP figures. However, with the PowerVR versions of the Intel GPU c**k-blocking linux graphics, and with AMD finally open-sourcing UVD, I'm now back to considering a Brazos. Less choices for fanless pre-built systems, though. May have to skip on the pay-a-younger-geek-because-I-dont-enjoy-playing-legos-anymore part.

          So no, for some markets, Intel has not yet real

    • by serviscope_minor ( 664417 ) on Tuesday April 30, 2013 @01:56PM (#43592737) Journal

      You can't beat an Ivy Bridge chip for performance for watt though.

      Ehugh. Yes no kind of.

      For "general" workloads IVB chips are the best in performance per Watt.

      In some specific workloads, the high core count piledrivers beat IVB, but that's rare. For almost all x86 work IVB wins.

      For highly parallel, churny work that GPUs excel at, they beat all x86 processors by a very wide margin. This is not surprising. They replace all the expensive silicon that makes general-purpose processors go fast and put in MOAR ALUs. So, much like the long line of accelerators, co-processors, DSPs and so on, they make certain kinds of work go very fast and are useless at others.

      But for quite a few classes of work, GPUs trounce IVB at performance per Watt.

      The trouble is that GPUs suck. They have teeny amounts of local memory and a slow interconnect to main memory. They also suck at certain things, and batting data between the fast (for some things) GPU and fast (for other things) CPU is a real drag because of the latency. This limits the applicability of GPUs.

      With the new architecture, which I (and presumably many others) hoped was AMD's long-term goal, a number of these problems disappear, since the link is very low latency and the memory is fully shared.

      This means the GPU, with its vastly superior performance per watt (for some things), can be used for a wider range of tasks.

      So yes, this should do a lot for power consumption for a number of tasks.

      • The trouble is that GPUs suck. They have teeny amounts of local memory and a slow interconnect to main memory. They also suck at certain things, and batting data between the fast (for some things) GPU and fast (for other things) CPU is a real drag because of the latency. This limits the applicability of GPUs.

        The "slow interconnect" you're talking about to main memory, PCI Express v3.0 has an effective bandwidth of 32GB/s which actually exceeds the best main memory bandwidth you'd get out of an Ivy Bridge CPU with very fast memory, so no, that's not a bottleneck for bandwidth, though yes, there is some latency there.

        I don't know why everyone seems to forget that GPUs aren't just fast because they have a lot of ALUs (TFA included), they are fast because of the highly specialized GDDR memory they are attached t

        • by bored ( 40072 ) on Tuesday April 30, 2013 @02:52PM (#43593315)

          The "slow interconnect" you're talking about to main memory, PCI Express v3.0 has an effective bandwidth of 32GB/s which actually exceeds the best main memory bandwidth you'd get out of an Ivy Bridge CPU with very fast memory, so no, that's not a bottleneck for bandwidth, though yes, there is some latency there.

          It's both. For my application, the GPU is roughly 3x-5x as fast as a high-end CPU. This is fairly common on a lot of GPGPU workloads: the GPU provides a decent but not huge performance advantage.

          But, we don't use the GPU! Why not? Because copying the data over the PCIe link, waiting for the GPU to complete the task, and then copying the data back over the PCI bus yields a net performance loss over just doing it on the CPU.

          In theory, a GPU sharing the memory subsystem with the CPU avoids this copy latency. Nor does it preclude still having a parallel memory subsystem dedicated to local accesses on the GPU. That is the "nice" thing about OpenCL/CUDA: the programmer can control the memory subsystems at a very fine level.

          Whether or not AMD's solution helps our application remains to be seen. Even if it doesn't, it's possible it helps some portion of the GPGPU community.

          BTW:
          In our situation it's a server system, so it has more memory bandwidth than your average desktop. On the other hand, I've never seen a GPU pull more than a small percentage of the memory bandwidth over the PCIe links doing copies. Nvidia ships a raw copy benchmark with the CUDA SDK; try it on your machines, the results (theoretical vs. reality) might surprise you.
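
          A rough back-of-envelope on the break-even problem described above. The figures are assumptions for illustration (loosely matching the 3x-5x speedup mentioned), not measurements from this system; PCIe 3.0 x16 tops out around 15.75 GB/s per direction in theory, and real copies usually achieve less.

          #include <stdio.h>

          int main(void) {
              /* Assumed, illustrative numbers -- adjust for your own workload. */
              double bytes    = 512e6;           /* 512 MB working set               */
              double pcie_bw  = 12e9;            /* achievable PCIe 3.0 x16, one way */
              double cpu_time = 0.080;           /* 80 ms to do the job on the CPU   */
              double gpu_time = cpu_time / 4.0;  /* GPU ~4x faster at the kernel     */

              double copies = 2.0 * bytes / pcie_bw;  /* copy in + copy out */
              printf("CPU only:           %.1f ms\n", cpu_time * 1e3);
              printf("GPU incl. copies:   %.1f ms\n", (gpu_time + copies) * 1e3);
              printf("GPU, shared memory: %.1f ms\n", gpu_time * 1e3);
              return 0;
          }

          With the copies included, the GPU loses to the CPU despite the faster kernel; remove the copies, as a shared-memory APU promises to, and it wins comfortably.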

        • The "slow interconnect" you're talking about to main memory, PCI Express v3.0 has an effective bandwidth of 32GB/s

          32GB/s doesn't sound like a lot when you divide it amongst the 400 stream processors that an upper-end AMD APU has, and that's as favorable a light as I can shine on your inane bullshit. There is a reason that discrete graphics cards have their own memory, and it isn't because they have more stream processors (these days they do, but they didn't always)... it's because PCI Express isn't anywhere near fast enough to feed any modern GPU.

          Llano APUs have been witnessed pulling 500 GFLOPS. Does 32GB/s still sound

          • Sigh. Here I go feeding the trolls.

            I'm not sure what point you're trying to make here, since MY main point in the rest of this topic was that modern GPUs are mostly limited by memory bandwidth, which makes the development in TFA pretty pointless. You're right! 32GB/s isn't enough to make the most of the computing resources available on a modern GPU! That was my point: how exactly would the GPU accessing main memory directly help? The fastest system RAM currently available in consumer markets in the fastest

            • by cynyr ( 703126 )

              Because you wouldn't need to transfer it between the CPU and GPU? You could just point the GPU at main system RAM and let it have at it.

      • by Kjella ( 173770 )

        Assuming you're willing to write special software that'll only see a benefit on AMD's APUs, not on Intel or anything with discrete GPUs. I suppose it's different for the PS4 or Xbox 720, where you can assume that everyone who'll use the software will have it, but for most PC software the advantages would have to be very big indeed. If you need tons of shading power it's better to run on discrete GPUs; even with unified memory, switching between shaders and cores isn't entirely free, so it might not do that much

    • by Luckyo ( 1726890 )

      In terms of APUs, they have Intel not just beaten but utterly demolished. Intel has absolutely nothing on AMD when it comes to the combination of a slowish, low-TDP CPU and a built-in GPU with the performance of a low-end discrete GPU.

      And while they lack CPU power for the high end, wouldn't you want a discrete CPU with a discrete GPU in that segment in the first place?

  • by Anonymous Coward

    This should really help round-trip times through the GPU. With most existing setups, doing a render to texture and getting the results back CPU-side is quite expensive, but this should help a lot. It should also work great for procedurally editing/generating/swapping geometry that you are rendering. Getting all those high-poly LODs onto the GPU will no longer be an issue with systems like this.

    Interestingly enough, this is somewhat similar to what Intel has now for their integrated graphics, except it looks

  • One question they never seem to answer is why bother unifying the memory architecture at all? CPU and GPU memory architectures have always been different for the same reasons that CPUs and GPUs themselves are different; one is designed for fast execution of serial instructions with corresponding random smaller reads and writes to memory, and the other is designed for fast execution of parallel instructions with corresponding contiguous reads and writes that are much larger in size. It seems like you're just
    • Re:Why compromise? (Score:5, Informative)

      by SenatorPerry ( 46227 ) on Tuesday April 30, 2013 @02:13PM (#43592891)

      In OpenCL you need to copy items from system memory to the GPU's memory and then load the kernel on the GPU to start execution. Then you must copy the data back from the GPU's memory after execution. AMD is saying that you can instead pass a pointer to the data in main memory rather than making copies of the data.

      This should reduce some of the memory shifting on the system and speed up OpenCL execution. It will also eliminate some of the memory constraints on OpenCL regarding what you can do on the GPU. On a larger scale it will open up some opportunities for optimizing work.
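
      A minimal host-side sketch of the two patterns being described, using the standard OpenCL C API. The context, queue, kernel and sizes here are placeholders, and the second path is only a sketch of what hUMA is supposed to make cheap, not AMD's actual programming model:

      #include <CL/cl.h>

      /* Today's common pattern: allocate on the device, copy in, run, copy out. */
      void run_with_copies(cl_context ctx, cl_command_queue q, cl_kernel k,
                           float *data, size_t n)
      {
          cl_int err;
          cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                      n * sizeof(float), NULL, &err);
          clEnqueueWriteBuffer(q, buf, CL_TRUE, 0, n * sizeof(float), data,
                               0, NULL, NULL);                   /* copy in  */
          clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
          size_t global = n;
          clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
          clEnqueueReadBuffer(q, buf, CL_TRUE, 0, n * sizeof(float), data,
                              0, NULL, NULL);                    /* copy out */
          clReleaseMemObject(buf);
      }

      /* What shared memory is aiming to make the normal case: wrap the existing
       * host allocation and hand the GPU a pointer to it instead of a copy. On
       * current hardware CL_MEM_USE_HOST_PTR may still copy behind your back and
       * needs map/unmap for coherent access; with fully coherent shared memory
       * that bookkeeping should largely go away. */
      void run_shared(cl_context ctx, cl_command_queue q, cl_kernel k,
                      float *data, size_t n)
      {
          cl_int err;
          cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                      n * sizeof(float), data, &err);
          clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
          size_t global = n;
          clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
          /* Map to guarantee the host sees the results, then unmap. */
          void *p = clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_READ, 0,
                                       n * sizeof(float), 0, NULL, NULL, &err);
          clEnqueueUnmapMemObject(q, buf, p, 0, NULL, NULL);
          clReleaseMemObject(buf);
      }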

    • by dgatwood ( 11270 )

      I can see the benefit of being able to allocate a GPU/CPU-shared memory region in VRAM for fast passing of information to the GPU without a copy, but apart from making the above concept slightly cheaper to implement, the only benefit I could come up with for allowing the GPU access to main memory is making password theft easier. That and letting their driver developers write sloppier code that doesn't have to distinguish between two types of addresses....

      The most hilarious part of this is that while they'

    • Re:Why compromise? (Score:5, Insightful)

      by forkazoo ( 138186 ) <<wrosecrans> <at> <gmail.com>> on Tuesday April 30, 2013 @02:13PM (#43592895) Homepage

      Because when you are doing stuff like OpenCL, dispatching from CPU space to GPU space has a huge overhead. The GPU may be 100x better at doing a problem than the CPU, but it takes so long to transfer data over to the GPU and set things up that it may still be faster to do it on the CPU. It's basically the same argument that led to the FPU being moved onto the same chip as the CPU a generation ago. There was a time when the FPU was a completely separate chip, and there were valid reasons why it ought to be. But moving it on chip was ultimately a huge performance win. The idea behind AMD's strategy is basically to move the GPU so close to the CPU that you use it as freely as we currently use the FPU.

      • Wrong! The GPU is only 100x faster at certain problems because of the fast GDDR memory it is attached to, which is optimized for very large sequential reads and writes. There are a tiny number of applications that require huge numbers of FLOPS on very small amounts of data (Bitcoin mining and password-hashing attacks come to mind, but that's about it).
    • In low-cost systems the CPU and GPU are combined on a single chip with a single (slow) memory controller. Given that constraint, AMD is trying to at least wring as much efficiency as they can from that single cheap chip. I salute them for trying to give customers more for their money, but let's admit that this hUMA thing is not about breaking performance records.

    • nah. providing wider and faster memory will help even purely CPU codes, even those that are often quite cache-friendly. the main issue is that people want to do more GPUish stuff - it's not enough to serially recalculate your excel spreadsheet. you want to run 10k MC sims driven from that spreadsheet, and that's a GPU-like load.

      but really it's not up to anyone to choose. add-in GPU cards are dying fast, and CPUs almost all have GPUs. so this is really about treating APUs honestly, rather than trying to

  • They talk about passing pointers back and forth as though the GPU and CPU effectively share an MMU. The problem is, GPUs and CPUs don't work the same way. GPUs need to access shared resources that are per-system, whereas CPUs need to limit access to resources on a per-process basis. It would be devastating if a GPU could, for example, allow an arbitrary user-space process to overwrite parts of the kernel and inject virus code that runs with greater-than-root privilege. It would similarly be devastating

    • My understanding is that there will indeed be something like RWX control. Not just for security, but also for performance. If both sides can freely write to a chunk of memory, you can get into difficulties accounting for caches in a fast way.

      That said, if the CPU and the GPU are basically sharing an MMU, then the GPU may be restricted from accessing pages that belong to processes that aren't being rendered/computed. There's no reason why two different applications should be able to clobber each other's tex

    • GPUs need to access shared resources that are per-system, whereas CPUs need to limit access to resources on a per-process basis.

      If you plan to make the GPU easy to use as a general computing resource (which, according to the writeup, seems to be what they're aiming at), the GPU also needs to work on a per-process basis and be linked to the main system memory so that results are easily available to the main system for I/O, etc.

      Of course, even if this is their goal, one question still remains... Will thi

  • Name is a pun (Score:2, Informative)

    by Anonymous Coward

    Apparently not too many Finnish speakers here yet. Kaveri => partner/pal/mate, APU => help.

    HTH,

    ac

    • by Anonymous Coward

      Apparently not too many Finnish speakers here yet. Kaveri => partner/pal/mate, APU => help.

      HTH,

      ac

      "Kaveri" is actually the name of a major river in Karnataka, a state in India. AMD names its cores on major rivers all around the world.

      HTH.

    • by Radak ( 126696 )

      It's the night before Vappu. We're way too busy getting drunk in Finland.

    • It's not just Finnish. Hebrew chaver [wiktionary.org] may be the common etymology for both this and the Finnish word. It is also the origin of the Dutch word gabber [wikipedia.org].

      OTOH, the pun with APU is harder to explain without Finnish.

  • With a GPU next to the CPU, the latency between them is reduced, which is awesome for OpenCL applications. Imagine you wanted to work a Markov model into your AI and needed to do a large number of matrix calculations to get it to run properly, in real time; I think this might solve that problem. I'm imagining game AI improving with the adoption of this style of processor. Anyone see this differently?

    • by godrik ( 1287354 )

      I don't know... This heterogeneous computing with low latency seems interesting if it does not harm raw performance. The main advantage would be to transport data back and forth between the two. If the computation on one side is long, then the decrease in latency is not very useful. If both of them are really fast, then there is not too much to gain to begin with.

      It really helps when you need fast turnaround, so for small and very synchronous computations. I am waiting to see one good use case.

    • by gmueckl ( 950314 )

      Learning AIs in games have been problematic in the past. Mostly it is about control over the experience that gets delivered to the customer: as a designer your job is to get it just right. You can do this easily with current more or less heuristic AI algorithms. The ability to learn opens the scope of possible behaviours so much that it's not possible anymore to deliver a guaranteed experience.

      Short version: the designer can't stop the game from going nuts in unpredictable ways because of stupid player inpu

  • I think AMD overrates heterogeneous computing. The assumption is that all applications can take advantage of GPGPU. This is simply not true. Only certain types of application are suitable, such as multimedia and simulation, where it's very obvious what part of the code can be parallelised.

  • by juancn ( 596002 ) on Tuesday April 30, 2013 @03:11PM (#43593527) Homepage
    Today I read an article on Gamasutra [gamasutra.com] that details some of the internals of the PlayStation 4, and the architecture looks a lot like what's described here.

    With GDDR5 memory this could be very interesting.

    • Holy crap -- has hell frozen over? Sony is actually thinking about developers for once!? Using (mostly) off-the-shelf commodity parts is definitely going to help win back some developers. Time will tell if "they are less evil than Microsoft"

      Thanks for the great read.

  • by Shinobi ( 19308 ) on Tuesday April 30, 2013 @04:11PM (#43594197)

    OK, so the SGI O2's UMA has now been reinvented for a new generation, just with more words tacked on....

  • I'm interested to see what the software model for this will be. Sure they could use OpenCL, but it seems like a lot of the pain in using OpenCL derives from the underlying memory architecture. With a shared virtual address space and fully coherent caches all in hardware, it should be possible to have a much simpler software model than OpenCL. I guess it doesn't really matter what the software model is though since now that everything is in main memory, GPU functions can be called just like regular functi
    • Indeed it should be easier. There will still be some cost, since the processors are still in thread bundles and still trade speed for throughput, but the cost will be much lower. I expect the break even point will be pretty small though and won't have the huge disadvantage of limited memory for very large things.

      I wonder what the low level locking primitives between the GPU and CPU will be. Those will have some effect on the speed.

      I also wonder what/how the stream processors will be dealt with by the OS an

  • Why would a graphics card want to use virtual memory? Also, what motherboard takes GDDR5? Who the heck wrote this nonsense?
    • Why would a graphics card want to use virtual memory?

      Shared physical memory avoids the cost of copying data to and from the GPU but without shared virtual memory the data will end up at different addresses on the CPU and GPU. This means that you cannot use pointers to link parts of the data together and must rely on indexes of some sort. This makes it harder to port existing code and data structures to use GPU computation.

      Also, with shared physical memory you have to tell the device which memory you want to use (so that it can tell you which address to use).
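
      A tiny illustration of why the virtual part matters (the struct names are made up, purely illustrative): a pointer-linked structure only means the same thing to the GPU if both sides share a virtual address space; without that, the links have to be flattened into indices before handing the data over.

      /* Works directly under shared virtual memory: the same 'next'
       * pointer is valid on both the CPU and the GPU.                */
      struct node {
          float        value;
          struct node *next;
      };

      /* Typical workaround when address spaces differ: store indices
       * into a flat array instead of pointers, and translate on each
       * side.                                                        */
      struct flat_node {
          float value;
          int   next_index;   /* -1 marks the end of the list */
      };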

  • In my experience the GPU, and especially GPGPU, bottleneck is not the amount of memory but memory access bandwidth. A 256-512 bit bus is not adequate for existing apps. Before the amount of memory becomes important, manufacturers should move to at least a 2048-bit memory bus and also increase the number of registers per core several times.
  • I haven't seen this magical word in the presentation. Moreover, I do not see the CPU/GPU convergence often talked about; it sounds more like marketing hype. The ecosystem could also be enriched with DSP or network-processor cores all uniformly offering their resources to software, but I did not see that either.
  • The technology is supported by mainstream programming languages like Python, C++, and Java, and should allow developers to more simply code for a particular compute resource with no need for special APIs.

    So how do you do this in Java, Python? Did nobody ask? I did a search for "java huma uniform memory access" and this page came up first with nothing from java.com or oracle in sight.

    Ok more searching says to use OpenCL and lots of stackoverflow questions... but they're not new... and OpenCL is not Java. W
