AMD Previews New Processor Extensions 198
An anonymous reader writes "It has been all over the news today: AMD announced the first of its Extensions for Software Parallelism, a series of x86 extensions to make parallel programming easier. The first are the so-called 'lightweight profiling extensions.' They would give software access to information about cache misses and retired instructions so data structures can be optimized for better performance. The specification is here (PDF). These extensions have a much wider applicability than just parallel programming — they could be used to accelerate Java, .Net, and dynamic optimizers." AMD gave no timeframe for when these proposed extensions would show up in silicon.
Nice, but let's get Barcelona out the door, OK? (Score:1, Interesting)
I realize that the software people and hardware people both have their own projects to work on, and they work largely independently in terms of time-frame, and I figure this news might be timed to say, "Hey! Look at us! We're doing stuff!" But it only serves to frustrate me that there still aren't any real numbers on Barcelona and that, on the whole, AMD seems to have dropped the ball.
Silicon Problems. (Score:0, Interesting)
The first stab at Barcelona we're getting is going to under-perform pathetically compared to the competition.
Re:And so it goes on... (Score:3, Interesting)
Re:I wish AMD and Intel teamed up for once (Score:1, Interesting)
Re:I wish AMD and Intel teamed up for once (Score:2, Interesting)
Yeah, I know, Intel tried with Itanium.
They already did. I believe the 486 was the last CPU to run x86 instructions natively. Everything since the Pentium has decoded them into a RISC-like internal ISA, which can be changed every generation if desired. The only drawback is that a relatively small area of the chip needs to be dedicated to decoding x86 instructions into whatever the internal ISA is.
And guess what? One of the things people dislike about x86 is its variable-length instructions. Turns out they actually lead to more compact code. And the speed gains from reduced cache usage more than make up for the effort and chip real estate expended on those decoders.
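To make the compactness argument concrete, here's a toy sketch. The x86-64 byte counts below are real encodings, but the instruction mix is invented purely for illustration; a fixed-width RISC ISA spends 4 bytes on every one of them regardless.

```python
# Toy comparison: x86 variable-length encoding vs. a fixed 4-byte RISC
# encoding. The x86 byte counts are real; the instruction mix is made up.
x86_sizes = {
    "push rbp":       1,  # 55
    "pop rbp":        1,  # 5D
    "ret":            1,  # C3
    "mov eax, imm32": 5,  # B8 xx xx xx xx
    "add eax, ebx":   2,  # 01 D8
    "jmp rel8":       2,  # EB xx
}

x86_total = sum(x86_sizes.values())
risc_total = 4 * len(x86_sizes)  # fixed-width ISA: 4 bytes per instruction

print(f"x86 bytes:  {x86_total}")   # 12
print(f"RISC bytes: {risc_total}")  # 24
```

Half the bytes for the same instruction count means the same I-cache holds roughly twice as many instructions, which is the cache-efficiency gain the parent is pointing at.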
So let's stick with x86 for now, since the gains you foresee are either non-existent or tiny and are never, ever going to outweigh the drawbacks.
Re:Nice, but let's get Barcelona out the door, OK? (Score:1, Interesting)
AMD was "winning" for quite a long time... what happened that has made it impossible for them to come up with something even mildly exciting?
Re:I wish AMD and Intel teamed up for once (Score:2, Interesting)
Re:I wish AMD and Intel teamed up for once (Score:5, Interesting)
Sure, now it is, since the decoding of CISC instructions into micro-ops has largely decoupled the ISA from the microarchitecture, allowing many of those neat-o performance features you mention, like out-of-order execution. In the past, though, this wasn't the case, and a lot of x86's odd behaviors that seemed like good ideas when they were introduced were serious performance limiters. Take the global eflags register, which is only partially written by various instructions (and they always write it, even when the result isn't needed).
Even today, I would say that all those RISC ISAs are better than x86, simply from the standpoint that they are cleaner, easier to decode, have fewer tricky modes to deal with, fewer odd dependencies, and all the other things that make building an actual x86 chip a pain in the arse. No, in the end it makes no difference in performance. Yet if you had it to do all over again, building the One ISA to Rule Them All without concern for software compatibility, and you decided to make something that was more like x86 than Alpha, I'd slap the taste out of your mouth.
But we do have to be concerned with software compatibility, and that I think was the GP's main point. All of those other ISAs failed to dominate -- even when there were actual performance implications! -- simply because they were not x86 and hence didn't run the majority of software. IA64 failed not because it was itself all that bad, but because it couldn't run x86 software well. So when AMD came out with 64-bit backward-compatible x86, everyone stopped caring about IA64. Because it wasn't x86, and AMD64 was.
So ultimately I agree with you both, and I don't think the GP was nonsense at all. It's a very valid point -- backward compatibility is king, so x86 wins by default no matter what. Your point -- that x86 isn't actually hurting us anymore -- is just the silver lining on that cloud.
Map and reduce? (Score:4, Interesting)
Re:Logical reasons to buy AMD (Score:3, Interesting)
Gosh, maybe you should go tell AMD that they aren't having any trouble with leakage, the yield of their 65nm parts is optimal and they can start volume production right now! The time AMD has spent not shipping Barcelona has been costing them dearly. Did you see the loss they posted last quarter? Did you notice their market cap right now is just a tad over what they paid for ATI?
AMD knows Intel has the better fab, but AMD is selling super cheap. You can get a dual-core processor for half what Intel charges
Yeah, you can get a share of AMD for about half of what Intel's cost as well.
On a performance per dollar basis, AMD wins hands down.
So rush out and buy an AMD now, before their super-low margins bankrupt them altogether!
There is a mountain of evidence against Intel for anti-trust violations, and I try not to financially support evil. The EU is also coming down on Intel for anti-trust violations.
You know if Intel did what AMD has done back when AMD had the faster product - cut their margins down to almost nothing to undersell AMD and gain market share - you would be screaming about the evil monopolist Intel. Somehow it is exactly the opposite of evil when AMD does it.
Even if the anti-trust suits both come through, AMD is near bankruptcy, and I prefer choice in the marketplace. I am terrified of the day when Intel has no competition pushing them and can sell whatever they want at whatever price they want.
Oh please. Regulators would never allow Intel to buy AMD's IP, and there are plenty of companies out there willing to jump in and try their hand at the x86 game. If Intel starts driving up prices, that just makes jumping in look much more appealing.
Re:Map and reduce? (Score:2, Interesting)
And would be terrible for performance. Why on earth does everybody assume that fine-grained parallelism will ever work? You need a very highly specialized processor to make it work, and those failed a decade ago when "standard CPUs" just blew them away. Remember the Connection Machine, a box with exactly that fine a grain of parallelization? It was programmed in C and Fortran with specialized extensions to express parallelism; incidentally, those live on in the way you program GPUs, and SSE is another example of even finer-grained parallelism.
Fine grained parallelism only works on very small and specific tasks. In general you want high level parallelism with very little communication and very little dependency on each other. As that is another extreme you have to find a compromise, but to assume the compiler can magically extract a real speed up from a bunch of simple for-loops is just completely unrealistic.
You will have to learn to handle the parallelism. It takes different algorithms and a different way to structure programs. Also you will have to accept that there are things which will not work in parallel. You can parallelize them, but the speed up is just not there to make it useful.
Parallel programming is hard, and blaming it on programming languages and claiming another one will solve all the problems is just the usual silver bullet. Those languages have been around forever; functional programming languages can be parallelized automatically. So if they make it so much easier, why aren't they used? Could it be that you have to pay for the easy parallelization with something?
Re:I wish AMD and Intel teamed up for once (Score:3, Interesting)
This is true, they're the ones who have to make it actually work. I think who it -really- hurts is anyone who isn't Intel or AMD trying to make an x86 chip. Unfortunately there's a lot of x86 behavior that isn't actually documented -anywhere- except inside the heads of Intel and AMD engineers and the HDL they write. Whereas a couple of grad students could code up a fully Alpha-compatible CPU in a few weeks (it wouldn't be fast, but it would work). It creates a higher barrier to entry into the x86 market, and to me that's unfortunate. AMD and Intel obviously have a handle on the ISA.
And in reality a lot of the complexity of x86 decoding has been moved into the microcode engine so that the actual hardware decoders are pretty efficient.
Well, in so much as getting every operation that has to occur when you do something like a protected mode code segment load, sure the microcode deals with that. But the really hard part of x86 decode is dealing with variable-length instructions. To have a super-scalar architecture you need to be able to decode more than one instruction in a cycle, which means you need to know where the second instruction starts. A few ways of dealing with this include speculatively decoding the 2nd instruction assuming it starts at various points using parallel decoders (doesn't scale well at all), or saving marker bits in the instruction cache that tell you where instructions start (unavailable the first time you see an instruction so you have to use a slower method). Also fun to deal with is when an instruction crosses a cache-line boundary. And the essentially arbitrary number of prefixes that can be used.
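The sequential dependency is easy to see in a toy decoder. This sketch uses an invented two-opcode "ISA" (not real x86): you cannot find where instruction N+1 starts until you have at least length-decoded instruction N, which is what forces the speculative or marker-bit tricks described above.

```python
# Toy variable-length decoder (invented 2-opcode "ISA", not real x86).
# The point: the start of instruction N+1 depends on the decoded length
# of instruction N, so naive decode is inherently sequential.
LENGTHS = {0x01: 1, 0x02: 3}  # opcode -> total instruction length in bytes

def decode(stream):
    starts = []
    pc = 0
    while pc < len(stream):
        starts.append(pc)
        pc += LENGTHS[stream[pc]]  # must know this length to find the next start
    return starts

code = bytes([0x01, 0x02, 0xAA, 0xBB, 0x01, 0x02, 0xCC, 0xDD])
print(decode(code))  # [0, 1, 4, 5]
```

With a fixed 4-byte ISA the `starts` list would just be `range(0, len(stream), 4)`, computable for all instructions at once -- that's the decode-parallelism RISC gives you for free.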
Also, people shouldn't forget some of the advantages of x86, like variable instruction lengths. PowerPC and ARM may be easier to decode, but their code takes up a lot more space, and that causes a significant decrease in cache and memory efficiency. For example, I think the average x86 instruction is only 2 bytes (many are only 1 byte; if your program uses mostly 1-byte instructions you can get a LOT of performance this way). PowerPC is fixed at 4 bytes.
You aren't going to get very far using just 1-byte instructions. The average x86 instruction going by the spec may be 2 bytes, but the average x86 instruction in actual code is going to be more. If you're doing FP, you would be using SSE instructions, which mostly use 3 bytes *just* for the opcode, not including register or memory operands, which can add 1-3 more bytes, and potentially more if you're using any prefixes. In general I think this advantage of x86 isn't very significant. It would be interesting to measure the average instruction size used in actual code. Personally, I'd take fixed-width instructions any day.
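Measuring the average size in real code is actually easy to sketch. Assuming the common GNU `objdump -d` line format ("address: hex-bytes mnemonic"), something like this would do it; the sample below is real x86-64 disassembly so the numbers can be checked by hand.

```python
# Sketch: estimate average instruction size by parsing `objdump -d` output.
# Assumes the usual GNU objdump format: "  addr:\t<hex bytes>\t<mnemonic>".
import re

LINE = re.compile(r"^\s*[0-9a-f]+:\s+((?:[0-9a-f]{2} )+)")

def average_insn_size(disassembly):
    sizes = []
    for line in disassembly.splitlines():
        m = LINE.match(line)
        if m:
            sizes.append(len(m.group(1).split()))  # count the hex byte pairs
    return sum(sizes) / len(sizes)

# A few lines of real x86-64 objdump output for demonstration:
sample = """
  401000:\t55                   \tpush   %rbp
  401001:\t48 89 e5             \tmov    %rsp,%rbp
  401004:\tb8 2a 00 00 00       \tmov    $0x2a,%eax
  401009:\tc3                   \tret
"""
print(average_insn_size(sample))  # (1 + 3 + 5 + 1) / 4 = 2.5
```

Run against a whole binary's disassembly instead of this four-instruction sample, the same function would give the real-code average the parent is asking about.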