AMD Previews New Processor Extensions
An anonymous reader writes "It has been all over the news today: AMD announced the first of its Extensions for Software Parallelism, a series of x86 extensions to make parallel programming easier. The first are the so-called 'lightweight profiling extensions.' They would give software access to information about cache misses and retired instructions so data structures can be optimized for better performance. The specification is here (PDF). These extensions have a much wider applicability than just parallel programming — they could be used to accelerate Java, .Net, and dynamic optimizers." AMD gave no timeframe for when these proposed extensions would show up in silicon.
Just performance counters? (Score:3, Informative)
Re:And so it goes on... (Score:3, Informative)
As for 16-bit vs. 32-bit modes: the instructions are mostly the same. A code segment is specified as being either 16 or 32 bit. That size is the default data size used by instructions within that segment. There is a "size override" prefix (0x66 for operand size) which, if found immediately before an instruction, tells the CPU that the instruction should use the opposite of the default size.
I don't remember the specifics, but 64 bit mode just continues along with the same ideas. There aren't many changes from 32 bit code to 64 bit.
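To make the size-override idea concrete, here is a toy sketch in Python. It decodes only one instruction form, MOV (E)AX, imm (opcode 0xB8), and the function name `decode_mov_ax_imm` is made up for illustration. It is nowhere near a real decoder; it only shows how the 0x66 prefix flips whatever the segment default is.

```python
# Toy decoder for one x86 instruction form: MOV (E)AX, imm (opcode 0xB8).
# It only illustrates how the 0x66 operand-size prefix flips the segment default.

OPERAND_SIZE_PREFIX = 0x66
MOV_EAX_IMM = 0xB8


def decode_mov_ax_imm(code: bytes, default_bits: int):
    """Return (register, immediate, bytes consumed) for a MOV (E)AX, imm."""
    if default_bits not in (16, 32):
        raise ValueError("default_bits must be 16 or 32")
    pos = 0
    bits = default_bits
    if code[pos] == OPERAND_SIZE_PREFIX:
        # The prefix selects the opposite of the segment's default size.
        bits = 48 - default_bits  # 16 <-> 32
        pos += 1
    if code[pos] != MOV_EAX_IMM:
        raise ValueError("this sketch only handles opcode 0xB8")
    pos += 1
    width = bits // 8
    imm_bytes = code[pos:pos + width]
    if len(imm_bytes) != width:
        raise ValueError("truncated immediate")
    imm = int.from_bytes(imm_bytes, "little")  # x86 immediates are little-endian
    reg = "eax" if bits == 32 else "ax"
    return reg, imm, pos + width
```

So in a 32-bit segment the bytes `B8 78 56 34 12` load a 32-bit immediate into eax, while `66 B8 34 12` load a 16-bit immediate into ax. In a 16-bit segment the roles are reversed: the unprefixed form is the 16-bit one.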
Re:I wish AMD and Intel teamed up for once (Score:3, Informative)
and did away with the aging x86 instruction set and came up with something new.
They did, at least with the FP (floating point) instructions. FP instructions were based on this awful stack architecture, and it has largely gone away with the SSE and 64-bit extensions.
The x86 instruction set has evolved greatly over time, and will continue to evolve. Why replace it entirely from scratch? Who's to say that an entirely new instruction set won't have a whole new host of problems?
Re:Will Intel Adopt These Instructions? (Score:4, Informative)
EM64T [wikipedia.org]?
Re:I wish AMD and Intel teamed up for once (Score:3, Informative)
Close, it was the original Pentium. The Pentium Pro -- which, despite a name that made it sound like a minor improvement to the Pentium for business/servers, was actually a completely new architecture -- is where they introduced the CISC->RISC conversion. This was in part to make out-of-order execution feasible, which many said CISC processors would never have. Turns out they were both right and wrong.
So let's stick with x86 for now, since the gains you foresee are either non-existent or tiny and are never, ever going to outweigh the drawbacks.
As much as I hate x86 from an aesthetic point of view, I must agree with you here.
You can get the x86/EM64T documentation from Intel (Score:3, Informative)
Also, I know from asm on SPARC that many op codes are really just variations of other ops (and/or pseudo ops). For instance, (I'm not sure of the x86 equivalent)
Re:Just performance counters? (Score:4, Informative)
The application can then gather the information, which is stored in its address space, and do with it what it will (optimize on-the-fly).
Of particular interest is that the OS can allow the profile information to be dumped to the address space of other threads/processes as well as the one that the data is collected on. The OS controls the switching of the cached profile information during a context switch.
This is both cool (in that a secondary core/thread can help optimize the first) and scary (one thread getting access to another's instruction address information). I predict there will be exactly 42 Windows patches released 3.734 days after the service pack that allows Windows to take advantage of this feature because of security reasons.
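As a rough picture of the "application gathers the information from its own address space" part, here is a small Python simulation. Everything in it is an assumption for illustration: the record fields, the `ProfileRing` class and the `hottest_miss_sites` helper are made-up names, and the layout is not taken from the AMD specification. The "hardware" side is just a `push()` call.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class EventRecord:
    # Hypothetical record layout, not taken from the specification.
    kind: str        # e.g. "cache_miss" or "instructions_retired"
    instr_addr: int  # address of the instruction that triggered the event


class ProfileRing:
    """Fixed-size ring buffer standing in for the in-address-space event buffer."""

    def __init__(self, capacity: int) -> None:
        self.slots = [None] * capacity
        self.head = 0     # next slot the producer (simulated hardware) writes
        self.tail = 0     # next slot the consumer (the application) reads
        self.dropped = 0  # events lost because the buffer was full

    def push(self, rec: EventRecord) -> None:
        nxt = (self.head + 1) % len(self.slots)
        if nxt == self.tail:
            # One slot is kept empty to tell "full" apart from "empty".
            self.dropped += 1
            return
        self.slots[self.head] = rec
        self.head = nxt

    def drain(self) -> list:
        """Consume every pending record, oldest first."""
        out = []
        while self.tail != self.head:
            out.append(self.slots[self.tail])
            self.tail = (self.tail + 1) % len(self.slots)
        return out


def hottest_miss_sites(records, top=3):
    """Count cache-miss events per instruction address, most frequent first."""
    misses = Counter(r.instr_addr for r in records if r.kind == "cache_miss")
    return misses.most_common(top)
```

A runtime (a JIT, say) could call `drain()` every so often and feed the output of `hottest_miss_sites()` into whatever re-optimization it does. The same drain loop could just as well run in a helper thread, which is exactly the cool-and-scary case above.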
Re:I wish AMD and Intel teamed up for once (Score:4, Informative)
Oh who am I kidding, that could never happen.
Re:I wish AMD and Intel teamed up for once (Score:3, Informative)
What's not to like?
But maybe you meant missing instructions? No load-linked/store conditional or bus snooping. No double (or even 1.5) compare-and-swap. No hardware transactional memory support. Those three make it pretty hard to write fast concurrent code. And streaming operations are improving, but could be much better; there's a reasonable chance that cache coherency will soon be too expensive for practical use.
Maybe you're interested in single-threaded, native code performance; this is, after all, what x86 traditionally shines at. Here you'll find the lack of 3-register instructions to be a performance problem, even if the chip reduces this burden. There's no shuffle (like AltiVec, although something like that is coming in Penryn, I think?), finite-field or bit-twiddling operations, or conditional operations (a la ARM).
So yeah. There are a lot of things that the x86 instruction set could do better. I don't expect it to do them all, but there are certainly a lot of reasons to change it.
Jombeewoof, get off the Internet. (Score:0, Informative)