AMD Previews New Processor Extensions

An anonymous reader writes "It has been all over the news today: AMD announced the first of its Extensions for Software Parallelism, a series of x86 extensions to make parallel programming easier. The first are the so-called 'lightweight profiling extensions.' They would give software access to information about cache misses and retired instructions so data structures can be optimized for better performance. The specification is here (PDF). These extensions have a much wider applicability than just parallel programming — they could be used to accelerate Java, .Net, and dynamic optimizers." AMD gave no timeframe for when these proposed extensions would show up in silicon.
  • by Anonymous Coward on Wednesday August 15, 2007 @04:46PM (#20241289)
    These extensions could be useful, but speaking as someone from the target audience... I just don't care right now. No minor improvement (such as these extensions might bring) is as important to me as seeing a viable alternative to Intel. Not because I'm an AMD fanboy, but because competition brings prices down and accelerates the release of faster chips. From what I hear now, we'll finally see Barcelona chips out on September 10th at -maybe- up to 2.3 GHz if you're one of the cherished few, but most retail ones will be 1.9 GHz. I haven't seen the (valid) numbers, so I can't say for sure, but I'm worried about how competitive this will be.

    I realize that the software people and hardware people both have their own projects to work on, and they work largely independently in terms of time-frame, but I figure this news might be timed to say, "Hey! Look at us! We're doing stuff!" It only serves to frustrate me that there still aren't any real numbers on Barcelona and, on the whole, that AMD seems to have dropped the ball. /Grumble
  • Silicon Problems. (Score:0, Interesting)

    by Anonymous Coward on Wednesday August 15, 2007 @04:56PM (#20241367)
    They can't get the chips to clock up nicely as a whole; an individual chip, or a few dozen individuals, can, but most of them are binning in the sub-2GHz category, and that's simply atrocious. No matter how much "better" they are than Intel's quad cores, Intel's are already pushing 3GHz and benchmarking roughly 50% better, which means the two architectures perform about the same clock-for-clock.

    The first stab at Barcelona we're getting is going to under-perform pathetically compared to the competition.
  • by Ant P. ( 974313 ) on Wednesday August 15, 2007 @04:59PM (#20241385)
    It was at least 200 the last time I checked - and the source was an 80486 programming book. I think there are at least that many more in the different versions of SSE.
  • by Anonymous Coward on Wednesday August 15, 2007 @05:01PM (#20241411)
    I read somewhere that modern x86 processors don't really process x86 opcodes anymore--there's a "translator" that takes the CISC x86 code and converts it into some kind of RISC code. If true, maybe they should enable a way for the processor to use that RISC code without the conversion.
  • by Slashcrap ( 869349 ) on Wednesday August 15, 2007 @05:01PM (#20241415)
    and did away with the aging x86 instruction set and came up with something new.

    Yeah, I know, Intel tried with Itanium.


    They already did. I believe the original Pentium was the last Intel CPU to run x86 instructions natively. Everything since the Pentium Pro has decoded them to a RISC-like internal ISA, which can be changed every generation if desired. The only drawback is that a relatively small area of the chip needs to be dedicated to decoding x86 instructions into whatever the internal ISA is.

    And guess what? One of the things people dislike about x86 is its variable-length instructions. It turns out they actually lead to more compact code, and the speed gains from reduced cache usage more than make up for the effort and chip real estate expended on those decoders.

    So let's stick with x86 for now, since the gains you foresee are either non-existent or tiny and are never, ever going to outweigh the drawbacks.
  • by Pojut ( 1027544 ) on Wednesday August 15, 2007 @05:12PM (#20241543) Homepage
    What I would like to know is how AMD got its ass handed to it so viciously by Intel with the Core 2, and yet STILL isn't even remotely close to having something that can compete?

    AMD was "winning" for quite a long time... what happened that has made it impossible for them to come up with something even mildly exciting?
  • by Anonymous Coward on Wednesday August 15, 2007 @05:32PM (#20241793)
    IBM's PPC compiler kicked the shit out of every x86 compiler. (Apples and oranges, but the quality was much better). Same for ARM's compiler and Sun's (SPARC) compiler. Fact is, x86 is the ugly girl at the party, but it gets more attention from GCC, MS, Intel, etc. Native compilers on other architectures beat the shit out of it.
  • by Chris Burke ( 6130 ) on Wednesday August 15, 2007 @07:18PM (#20242897) Homepage
    Why is this nonsense still perpetuated? The instruction set is irrelevant - it's just an interface to tell the processor what to do.

    Sure, now it is, since the decoding of CISC instructions into micro-ops has largely decoupled the ISA from the microarchitecture, allowing many of those neat-o performance features you mention, like out-of-order execution. In the past, however, this wasn't the case, and a lot of x86's odd behaviors that seemed like good ideas at the time were serious performance limiters. Like a global eflags register that is only partially written by various instructions (and they always write it, even if the result isn't needed).

    Even today, I would say that all those RISC ISAs are better than x86, simply from the standpoint that they are cleaner, easier to decode, have fewer tricky modes to deal with, fewer odd dependencies, and all the other things that make building an actual x86 chip a pain in the arse. No, in the end it makes no difference in performance. Yet, if you had it to do all over again, building the One ISA to Rule Them All without concern for software compatibility, and you decided to make something more like x86 than Alpha, I'd slap the taste out of your mouth.

    But we do have to be concerned with software compatibility, and that I think was the GP's main point. All of those other ISAs failed to dominate -- even when there were actual performance implications! -- simply because they were not x86 and hence didn't run the majority of software. IA64 failed not because it was itself all that bad, but because it couldn't run x86 software well. So when AMD came out with 64-bit backward-compatible x86, everyone stopped caring about IA64. Because it wasn't x86, and AMD64 was.

    So ultimately I agree with you both, and I don't think the GP was nonsense at all. It's a very valid point -- backward compatibility is king, so x86 wins by default no matter what. Your point -- that x86 isn't actually hurting us anymore -- is just the silver lining on that cloud.
  • Map and reduce? (Score:4, Interesting)

    by tepples ( 727027 ) <tepples@nospAm.gmail.com> on Wednesday August 15, 2007 @07:40PM (#20243117) Homepage Journal

    Compilers don't know how to extract parallelism very well. It's an *incredibly* difficult problem
    It's not that compilers can't extract parallelism. It's that the C and C++ language standards lack a way to express parallelism. Often, you want to compute a function for each element in an array, resulting in a new array. In some languages, this is called map(). In Python, this is [expression_involving(el) for el in some_list]. An ideal language would provide a way to express that a function has no side effects, allowing map() to farm out different slices of the array to different CPUs. However, iterators in C++ and many other popular languages assume that the computation may have side effects, and provide no way inside the standard language to ask the compiler to break the computation into slices.
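
    As a rough illustration of what that kind of parallel map looks like when spelled out by hand, here is a minimal Python sketch; the function transform and the data are invented for the example, and multiprocessing.Pool.map is what does the slicing across worker processes:

        # Minimal sketch: farming a side-effect-free per-element computation
        # out to multiple CPUs.  'transform' and 'data' are invented for
        # illustration; multiprocessing.Pool.map does the slicing.
        from multiprocessing import Pool

        def transform(el):
            # Pure function: no side effects, so different slices of the
            # input can safely be computed in different worker processes.
            return el * el + 1

        if __name__ == "__main__":
            data = list(range(1000000))
            with Pool() as pool:  # one worker process per CPU by default
                result = pool.map(transform, data, chunksize=10000)
            # Sequential equivalent: [transform(el) for el in data]

    The catch, of course, is that nothing stops transform from having side effects; the programmer has to promise purity, which is exactly the guarantee a language-level map() could enforce.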
  • by Wavicle ( 181176 ) on Thursday August 16, 2007 @01:50AM (#20245865)
    Funny. I've seen a $59 Brisbane core (1.9 GHz out of the box) overclocked to 2.9 GHz with just air cooling, so I'm not sure why everyone insists AMD can't hit the 3GHz barrier, especially when AMD keeps displaying 3GHz Barcelonas.

    Gosh, maybe you should go tell AMD that they aren't having any trouble with leakage, the yield of their 65nm parts is optimal and they can start volume production right now! The time AMD has spent not shipping Barcelona has been costing them dearly. Did you see the loss they posted last quarter? Did you notice their market cap right now is just a tad over what they paid for ATI?

    AMD knows Intel has the better fab, but AMD is selling super cheap. You can get a dual-core processor for half what Intel charges

    Yeah, you can get a share of AMD for about half of what Intel's cost as well.

    On a performance per dollar basis, AMD wins hands down.

    So rush out and buy an AMD now, before their super-low margins bankrupt them altogether!

    There is a mountain of evidence against Intel for anti-trust violations, and I try not to financially support evil. The EU is also coming down on Intel for anti-trust violations.

    You know if Intel did what AMD has done back when AMD had the faster product - cut their margins down to almost nothing to undersell AMD and gain market share - you would be screaming about the evil monopolist Intel. Somehow it is exactly the opposite of evil when AMD does it.

    Even if the anti-trust suits both come through, AMD is near bankruptcy, and I prefer choice in the marketplace. I am terrified of the day when Intel has no competition pushing them and they can just sell what they want and whatever price they want.

    Oh please. Regulators would never allow Intel to buy AMD's IP, and there are plenty of companies out there willing to jump in and try their hand at the x86 game. If Intel starts driving up prices, that just makes jumping in look much more appealing.
  • Re:Map and reduce? (Score:2, Interesting)

    by Josef Meixner ( 1020161 ) on Thursday August 16, 2007 @04:41AM (#20246625) Homepage

    An ideal language would provide a way to express that a function has no side effects, allowing map() to farm out different slices of the array to different CPUs.

    And it would be terrible for performance. Why on earth does everybody assume that fine-grained parallelism will ever work? You need a very highly specialized processor to make it work, and those failed a decade ago when "standard CPUs" simply blew them away. Remember the Connection Machine? That was a box with exactly that fine a grain of parallelism. It was programmed in C and Fortran with specialized extensions to express parallelism; incidentally, those live on in the way you program GPUs, and SSE is another example of even finer-grained parallelism.

    Fine-grained parallelism only works on very small and specific tasks. In general you want high-level parallelism with very little communication and very few dependencies between the pieces. As that is the other extreme, you have to find a compromise, but to assume the compiler can magically extract a real speed-up from a bunch of simple for-loops is just completely unrealistic.

    You will have to learn to handle the parallelism. It takes different algorithms and a different way to structure programs. You will also have to accept that there are things which will not work in parallel: you can parallelize them, but the speed-up is just not there to make it useful.

    Parallel programming is hard, and blaming it on programming languages and claiming another one will solve all problems is just the usual silver bullet. Those languages have been around forever; functional programming languages can be parallelized automatically. So if they make it so much easier, why aren't they used? Could it be that you have to pay for the easy parallelization with something?

  • by Chris Burke ( 6130 ) on Thursday August 16, 2007 @01:23PM (#20251497) Homepage
    The people who really suffer from this are Intel and AMD. They're the ones that have to design the nasty decoders for x86. They obviously find the advantages of decades of industry-wide expertise in the x86 ISA worth the effort.

    This is true, they're the ones who have to make it actually work. I think who it -really- hurts is anyone who isn't Intel or AMD trying to make an x86 chip. Unfortunately there's a lot of x86 behavior that isn't actually documented -anywhere- except inside the heads of Intel and AMD engineers and the HDL they write. Whereas a couple of grad students could code up a fully Alpha-compatible CPU in a few weeks (it wouldn't be fast, but it would work). It creates a higher barrier to entry into the x86 market, and to me that's unfortunate. AMD and Intel obviously have a handle on the ISA.

    And in reality a lot of the complexity of x86 decoding has been moved into the microcode engine so that the actual hardware decoders are pretty efficient.

    Well, insofar as getting through every operation that has to occur when you do something like a protected-mode code segment load, sure, the microcode deals with that. But the really hard part of x86 decode is dealing with variable-length instructions. To have a superscalar architecture you need to be able to decode more than one instruction per cycle, which means you need to know where the second instruction starts. A few ways of dealing with this include speculatively decoding the 2nd instruction at several possible start points using parallel decoders (doesn't scale well at all), or saving marker bits in the instruction cache that tell you where instructions start (unavailable the first time you see an instruction, so you have to fall back on a slower method). Also fun to deal with is when an instruction crosses a cache-line boundary. And the essentially arbitrary number of prefixes that can be used.
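
    As a toy sketch of that boundary problem (not real x86; the opcode-to-length table below is invented), compare how instruction start points are found for fixed-width versus variable-length encodings:

        # Toy illustration, not real x86: the opcode-to-length table is invented.
        FIXED_WIDTH = 4
        VAR_LENGTHS = {0x01: 1, 0x02: 2, 0x03: 3, 0x0F: 5}  # hypothetical encodings

        def fixed_boundaries(code):
            # Every boundary is known up front, so all slots can decode in parallel.
            return list(range(0, len(code), FIXED_WIDTH))

        def variable_boundaries(code):
            # Boundary N+1 depends on having decoded instruction N first (serial),
            # unless you speculate at every offset or cache boundary markers.
            offsets, pc = [], 0
            while pc < len(code):
                offsets.append(pc)
                pc += VAR_LENGTHS[code[pc]]
            return offsets

        stream = bytes([0x01, 0x03, 0x00, 0x00, 0x0F, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00])
        print(variable_boundaries(stream))  # [0, 1, 4, 9]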

    Also, people shouldn't forget some of the advantages of x86, like variable instruction lengths. PowerPC and ARM may be easier to decode, but their instructions take up a ton more space, and that causes a significant decrease in cache and memory efficiency. For example, I think the average x86 instruction is only 2 bytes (many are only 1 byte; if your program uses mostly 1-byte instructions you can get a LOT of performance this way). PowerPC is fixed at 4 bytes.

    You aren't going to get very far using just 1 byte instructions. The average x86 instruction going by the spec may be 2 bytes, but the average x86 instruction in actual code is going to be more. If you're doing FP then you would be using SSE instructions which mostly use 3 bytes *just* for the opcode, not including register or memory arguments which could use 1-3 more bytes and potentially more if you're using any prefixes. In general I think this advantage of x86 isn't very significant. I think it would be interesting to measure the average instruction size used in actual code. Personally, I'd take fixed-width instructions any day.
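
    If you wanted to check that on real code, one rough way (assuming a Linux box with GNU binutils; the script simply counts the byte column that objdump -d prints for each instruction) would be something like:

        # Rough sketch: estimate the average instruction length in a binary by
        # counting the byte column that `objdump -d` prints per instruction
        # (assumes the usual GNU output: "address:<tab>bytes<tab>mnemonic ...").
        # Bytes that wrap onto a continuation line for very long instructions
        # are skipped, so this slightly undercounts.
        import subprocess
        import sys

        def average_insn_length(path):
            out = subprocess.run(["objdump", "-d", path],
                                 capture_output=True, text=True, check=True).stdout
            lengths = []
            for line in out.splitlines():
                parts = line.split("\t")
                # Instruction lines have three tab-separated fields; continuation
                # lines (bytes only), headers, and symbol labels do not.
                if len(parts) >= 3 and parts[0].strip().endswith(":"):
                    lengths.append(len(parts[1].split()))
            return sum(lengths) / len(lengths)

        if __name__ == "__main__":
            print(average_insn_length(sys.argv[1] if len(sys.argv) > 1 else "/bin/ls"))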
