AMD Previews New Processor Extensions

Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

AMD Previews New Processor Extensions 198

Posted by kdawson on Wednesday August 15, 2007 @04:37PM from the parallel-universes dept.

An anonymous reader writes "It has been all over the news today: AMD announced the first of its Extensions for Software Parallelism, a series of x86 extensions to make parallel programming easier. The first are the so-called 'lightweight profiling extensions.' They would give software access to information about cache misses and retired instructions so data structures can be optimized for better performance. The specification is here (PDF). These extensions have a much wider applicability than just parallel programming — they could be used to accelerate Java, .Net, and dynamic optimizers." AMD gave no timeframe for when these proposed extensions would show up in silicon.

This discussion has been archived. No new comments can be posted.

AMD Previews New Processor Extensions

Load All Comments

Search 198 Comments Log In/Create an Account

Comments Filter:

Just performance counters? (Score:3, Informative)

by Erich ( 151 ) writes: on Wednesday August 15, 2007 @04:42PM (#20241243) Homepage Journal

Looks like there isn't a whole lot there that you couldn't get using existing performance counters and a tool like oprofile....

Share
twitter facebook
- Re: (Score:2)
  
  by pipatron ( 966506 ) writes:
  
  But this could probably do it dynamic, in realtime, which might be nice. Dunno, didn't RTFA of course.
- Re:Just performance counters? (Score:4, Informative)
  
  by imgod2u ( 812837 ) writes: on Wednesday August 15, 2007 @06:34PM (#20242447) Homepage
  
  Looking at the PDF, it supposedly gathers profile data in the background (in local caches on the chip itself) and dumps periodically depending on the OS/application settings. This allows it to profile on-the-fly with very little impact on application performance.
  
  The application can then gather the information, which is stored in its address space, and do with it what it will (optimize on-the-fly).
  
  Of particular interest is that the OS can allow the profile information to be dumped to the address space of other threads/processes as well as the one that the data is collected on. The OS controls the switching of the cached profile information during a context switch.
  
  This is both cool (in that a secondary core/thread can help optimize the first) and scary (one thread getting access to another's instruction address information). I predict there will be exactly 42 Windows patches released 3.734 days after the service pack that allows Windows to take advantage of this feature because of security reasons.
  
  Parent Share
  twitter facebook
- reminds me of the PS2's PA (Score:2)
  
  by acidrain ( 35064 ) writes:
  
  Looks like there isn't a whole lot there that you couldn't get using existing performance counters and a tool like oprofile....
  Sony had a $10k PS2 called the PA that recorded exactly what happened to every cycle on the cpu, gpu etc. without changing the way the game ran. It was the most incredible thing, like you had been sitting in the dark for years and then suddenly someone turned on the lights.
  Is it cache misses, dma contention, background threads, branch stalls or actual work? Optimizing on the PC
- Re: (Score:2)
  
  by Nefarious Wheel ( 628136 ) * writes:
  
  I wonder if this isn't part of the series of changes announced at MS TechEd, where it was said the Ring 0 (Kernel) instructions would be emulated to provide a bit of a speed-up for the VS Hypervisor. It was said that both Intel and AMD were preparing designs to support virtualisation in silicon. That would put it out somewhere near the end of 2007 I think.
I wish AMD and Intel teamed up for once (Score:3, Funny)

by rolfwind ( 528248 ) writes: on Wednesday August 15, 2007 @04:43PM (#20241261)

and did away with the aging x86 instruction set and came up with something new.

Yeah, I know, Intel tried with Itanium.

Share
twitter facebook
- Re: (Score:3, Insightful)
  
  by Chris Burke ( 6130 ) writes:
  
  Yeah, I know, Intel tried with Itanium.
  
  And you want them to try *again*? As far as I'm concerned the most amazing achievement of IA64 was that they got to start over from scratch, and ended up with an ISA with a manual even bigger than the IA32 manual! Going to prove that the only thing worse than an ISA developed through 20 years of engineering hackery is one developed by committee.
  - Re: (Score:2)
    
    by gilesjuk ( 604902 ) writes:
    
    Indeed, devices at the lowest level don't always look that pretty. As Linus said, with Itanium Intel threw away all the good bits.
    - Re: (Score:2)
      
      by dfghjk ( 711126 ) writes:
      
      "As Linus said, with Itanium Intel threw away all the good bits."
      
      It's a good thing Linus leveraged his considerable processor architecture experience while at Transmeta. Where would they be now had he not provided useful advice like that?
      - Re: (Score:2)
        
        by Chris Burke ( 6130 ) writes:
        
        They'd have been even worse off even sooner than what actually happened. Any other questions?
  - Re: (Score:2)
    
    by LordPhantom ( 763327 ) writes:
    
    It's like the saying goes: None of us is as dumb as all of us....
  - Re: (Score:2)
    
    by hitmark ( 640295 ) writes:
    
    and this is the same corp that came up with ACPI and EFI, iirc.
    not good...
    
    hell, if i didnt know better, i would suspect that intel was government owned, why? because they seems to overengineer to a degree that only nasa tops.
- Re:I wish AMD and Intel teamed up for once (Score:4, Insightful)
  
  by realmolo ( 574068 ) writes: on Wednesday August 15, 2007 @04:58PM (#20241381)
  
  Yup. They tried it with Itanium, and it didn't work.
  
  The thing is, at this stage in processor design, the actual instruction set isn't all that important.
  
  But *compilers* are more important than ever, and writing a good compiler is hard work. x86 compilers have been tweaked and improved for nearly 30 years. A new instruction set could NEVER achieve that kind of optimization.
  
  Interestingly,the Itanium and the EPIC architecture were designed to move all the hard work of "parallel processing" to the compiler. Unfortunately, they could never get the compiler to work all that well on most kinds of code. The compiler could never really "extract" the parallelism that Itanium CPUs needed to work at full speed.
  
  Which is *exactly* the problem we have now with our multi-core CPUs. Compilers don't know how to extract parallelism very well. It's an *incredibly* difficult problem that Intel has already thrown untold billions of dollars at. Essentially, even though Itanium/EPIC never caught on, we're having to deal with all the same problems it had, anyway.
  
  Parent Share
  twitter facebook
  - Re: (Score:2, Interesting)
    
    by Anonymous Coward writes:
    
    IBM's PPC compiler kicked the shit out of every x86 compiler. (Apples and oranges, but the quality was much better). Same for ARM's compiler and Sun's (SPARC) compiler. Fact is, x86 is the ugly girl at the party, but it gets more attention from GCC, MS, Intel, etc. Native compilers on other architectures beat the shit out of it.
    - Re: (Score:3, Insightful)
      
      by jguthrie ( 57467 ) writes:
      
      Okay, I'll feed the troll. Tell me where I can buy an ATX (or smaller) PPC motherboard and CPU new for, oh, say $200, and I'll look at PPC again. The reason that x86 gets all the software is because it's the cheapest, it's the cheapest because all the motherboard manufacturers make motherboards for it, and all the motherboard manufacturers make motherboards for it because it gets all the software.
    - Re:I wish AMD and Intel teamed up for once (Score:4, Informative)
      
      by x2A ( 858210 ) writes: on Wednesday August 15, 2007 @09:01PM (#20243815)
      
      So what we need really is a "native" x86 compiler, say, from Intel, that would maybe outperform the multi-platform GCC compiler... an Intel C/C++ Compiler, or 'ICC' we could call it... maybe...
      
      Oh who am I kidding, that could never happen.
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by Carewolf ( 581105 ) writes:
        
        And one that doesn't artificially limit the performance of other (let's say non-Intel) x86 CPU's.
        
        I am not kidding, that would never happen.
  - Map and reduce? (Score:4, Interesting)
    
    by tepples ( 727027 ) writes: <tepplesNO@SPAMgmail.com> on Wednesday August 15, 2007 @07:40PM (#20243117) Homepage Journal
    
    Compilers don't know how to extract parallelism very well. It's an *incredibly* difficult problem
    It's not that compilers can't extract parallelism. It's that the C and C++ language standards lack a way to express parallelism. Often, you want to compute a function for each element in an array, resulting in a new array. In some languages, this is called map(). In Python, this is [expression_involving(el) for el in some_list]. An ideal language would provide a way to express that a function has no side effects, allowing map() to farm out different slices of the array to different CPUs. However, iterators in C++ and many other popular languages assume that the computation may have side effects, and provide no way inside the standard language to ask the compiler to break the computation into slices.
    
    Parent Share
    twitter facebook
    - Re: (Score:2, Interesting)
      
      by Josef Meixner ( 1020161 ) writes:
      
      An ideal language would provide a way to express that a function has no side effects, allowing map() to farm out different slices of the array to different CPUs.
      And would be terrible for performance. Why on earth does everybody assume that fine grained parallelism will ever work? You need a very highly specialized processor to make it work and those have failed a decade ago as the "standard CPUs" just blew them away. Remember the Connection Machine, that was a box with exactly that fine grain of paralleliz
      - Then again, schools are partly to blame (Score:2)
        
        by tepples ( 727027 ) writes:
        
        You will have to learn to handle the parallelism. It takes different algorithms and a different way to structure programs.
        Why are these parallel algorithms not taught in university computer science classes from day 1?
        Those languages have been around for ever, functional programming languages can be parallelized automatically. So if they make it so much easier, why aren't they not used?
        Educational inertia probably makes up a large part of it.
    - Re: (Score:2)
      
      by Just Some Guy ( 3352 ) writes:
      
      An ideal language would provide a way to express that a function has no side effects, allowing map() to farm out different slices of the array to different CPUs.
      I wrote something like that [honeypot.net] for Python. The idea is that you'd use a "decorator" to indicate that a method is parallelizable (doesn't have any side effects) and roughly how many processes to spread it across (because you don't want to hit your database with 10,000 simultaneous queries just because your client could theoretically do so, for instance). For example:
      @parallelizable(10, perproc=4) def timestwo(x, y): return (x + y) * 2 print map(timestwo, [1, 2, 3, 4], [7, 8, 9, 10])
      
      would tell the multipr
  - Re: (Score:2)
    
    by be-fan ( 61476 ) writes:
    
    But *compilers* are more important than ever, and writing a good compiler is hard work. x86 compilers have been tweaked and improved for nearly 30 years.
    
    Compilers have gotten better, but mostly at CPU-independent optimization. Compilers for x86 aren't better than compilers for other architectures, it's just that x86 CPUs are extraordinarily insensitive to mediocre code generation. The reason is two-fold. First, they kind of have to be, because x86 doesn't really have enough registers to make fancy schedulin
- Re: (Score:2, Interesting)
  
  by Slashcrap ( 869349 ) writes:
  
  and did away with the aging x86 instruction set and came up with something new.
  
  Yeah, I know, Intel tried with Itanium.
  
  They already did. I believe the 486 was the last CPU to run x86 instructions natively. Everything since the Pentium has decoded them to a RISC like ISA which can be changed every generation if desired. The only drawback is that a relatively small area of the chip needs to be dedicated to decoding x86 instructions to whatever the internal ISA is.
  
  And guess what? One of the things that people d
  - Re: (Score:3, Informative)
    
    by Chris Burke ( 6130 ) writes:
    
    I believe the 486 was the last CPU to run x86 instructions natively.
    
    Close, it was the original Pentium. The Pentium Pro -- which despite the name which just made it sound like a minor improvement to the Pentium for business/servers was actually a completely new architecture -- is where they introduced the CISC->RISC conversion. This was in part to make it feasible to have out-of-order execution which many said CISC processors would never have. Turns out they were both right and wrong.
    
    So let's stick wi
- Re: (Score:3, Informative)
  
  by Vellmont ( 569020 ) writes:
  
  and did away with the aging x86 instruction set and came up with something new.
  
  They did, at least with the FP (floating point) instructions. FP instructions were based on this awful stack architecture, and it's gone away with all the SSE and 64 bit extensions.
  
  The x86 instruction set has evolved greatly over time, and will continue to evolve. Why replace it entirely from scratch? Who's to say that an entirely new instruction set won't have a whole new host of problems?
- Re:I wish AMD and Intel teamed up for once (Score:5, Insightful)
  
  by LWATCDR ( 28044 ) writes: on Wednesday August 15, 2007 @05:03PM (#20241437) Homepage Journal
  
  Well we had the 68000 family which had much better instruction set then the X86.
  We have the Power and PowerPC which had a much better instruction set than the X86.
  We have the ARM which is a much better instruction set then the X86.
  We have the MIPS which is pretty nice.
  And we had the Alpha and still do for a little while longer.
  The problem with all of them is that they didn't run X86 code. Intel and AMD both made so much money from selling billions of CPUs that they could plow a lot of money into making the X86 the fastest pig with lipstick that the world has ever seen.
  What made the IA-64 such a disaster was that it was slow running X86 code.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by ZachPruckowski ( 918562 ) writes:
    
    I don't know why you aren't modded +5 (at the moment anyway), but you're precisely correct.
    
    The number one requirement for a new instruction set is that it runs Windows and most Win32 programs at speeds comparable to existing processors. Given the size and scope of Windows, Microsoft probably can't easily port Windows and Win32 and Visual Studio's compiler over to another instruction set easily.
    
    This means that we either need hardware or software emulation of x86 (and possibly x86-64) on whatever new instr
    - Re: (Score:2)
      
      by jgrahn ( 181062 ) writes:
      
      Given the size and scope of Windows, Microsoft probably can't easily port Windows and Win32 and Visual Studio's compiler over to another instruction set easily.
      
      Whatever the cause is, it isn't size and scope. Practically any piece of free software compiles on a dozen architectures. For example, Debian Gnu/Linux ships around thirteen gigabytes of software for each of eleven architectures ...
  - Re:I wish AMD and Intel teamed up for once (Score:5, Insightful)
    
    by Criffer ( 842645 ) writes: on Wednesday August 15, 2007 @06:00PM (#20242079)
    
    Not again.
    
    Why is this nonsense still perpetuated? The instruction set is irrelevant - it's just an interface to tell the processor what to do. Internally, Barcelona is a very nice RISC core capable of doing so many things at once its insane. The only thing that performs better is a GPU, and that's only because they're thrown at embarassingly parallel problems. The fastest general purpose CPUs come from Intel and AMD, and it has nothing to do with instruction set.
    
    AMD64, and the new Core2 and Barcelona chips are very nice chips. 16 64-bit registers, 16 128-bit registers, complete IEEE-754 floating point support, integer and floating-point SIMD instructions, out-of-order execution, streaming stores and hardware prefetch. Add to that multiple cores with very fast busses, massive caches - with multichip cache coherency - and the ability to run any code compiled in the last 25 years. What's not to like?
    
    Parent Share
    twitter facebook
    - Re: (Score:3, Insightful)
      
      by Chirs ( 87576 ) writes:
      
      The instruction set *is* relevent to low-level designers. Working with the PowerPC instruction set is much nicer than x86...for me at least.
      
      As for "the fastest general purpose CPUs come from Intel and AMD", have you ever looked at a Power5? It's stupid fast. Stupid expensive, too.
    - Re:I wish AMD and Intel teamed up for once (Score:5, Interesting)
      
      by Chris Burke ( 6130 ) writes: on Wednesday August 15, 2007 @07:18PM (#20242897) Homepage
      
      Why is this nonsense still perpetuated? The instruction set is irrelevant - it's just an interface to tell the processor what to do.
      
      Sure, now it is, since the decoding of CISC instructions into micro-ops has largely decoupled ISA from the microarchitecture, allowing many of those neat-o performance features you meantion like out-of-order execution. However in the past this wasn't the case and a lot of x86's odd behaviors that seemed like good ideas when they were made were serious performance limiters. Like a global eflags register that is only partially written by various instructions (and they always write even if the result isn't needed).
      
      Even today, I would say that all those RISC ISAs are better than x86, simply from the standpoint that they are cleaner, easier to decode, have fewer tricky modes to deal with, fewer odd dependencies, and all the other things that make building an actual x86 chip a pain in the arse. No, in the end it makes no difference in performance. Yet, if you had it to do all over again, building the One ISA to Rule Them All without concern for software compatability, and you decided to make something that was more like x86 than Alpha, I'd slap the taste out of your mouth.
      
      But we do have to be concerned with software compatability, and that I think was the GP's main point. All of those other ISAs failed to dominate -- even when there were actual performance implications! -- simply because they were not x86 and hence didn't run the majority of software. IA64 failed not because it was itself all that bad, but because it couldn't run x86 software well. So when AMD came out with 64-bit backward-compatible x86, everyone stopped caring about IA64. Because it wasn't x86, and AMD64 was.
      
      So ultimately I agree with you both, and I don't think the GP was nonsense at all. It's a very valid point -- backward compatability is king, so x86 wins by default no matter what. Your point -- that x86 isn't actually hurting us anymore -- is just the silver lining on that cloud.
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by truesaer ( 135079 ) writes:
        
        Even today, I would say that all those RISC ISAs are better than x86, simply from the standpoint that they are cleaner, easier to decode, have fewer tricky modes to deal with, fewer odd dependencies, and all the other things that make building an actual x86 chip a pain in the arse.
        The people who really suffer from this are Intel and AMD. They're the ones that have to design the nasty decoders for x86. They obviously find the advantages of decades of expertise in x86 ISA throughout the industry is worth th
        
        Re: (Score:3, Interesting)
        
        by Chris Burke ( 6130 ) writes:
        
        The people who really suffer from this are Intel and AMD. They're the ones that have to design the nasty decoders for x86. They obviously find the advantages of decades of expertise in x86 ISA throughout the industry is worth the effort.
        
        This is true, they're the ones who have to make it actually work. I think who it -really- hurts is anyone who isn't Intel or AMD trying to make an x86 chip. Unfortunately there's a lot of x86 behavior that isn't actually documented -anywhere- except inside the heads of Int
    - Re: (Score:3, Informative)
      
      by wirelessbuzzers ( 552513 ) writes:
      
      Why is this nonsense still perpetuated? The instruction set is irrelevant - it's just an interface to tell the processor what to do...
      
      What's not to like?
      To start with, the complexity makes it a total pain in the ass to write kernels, compilers, runtime systems, analyses, debuggers and verifiers for x86. On top of that, it costs lots of engineering time, silicon and power to implement all those microcode crackers and fancy superscalar optimizations; this is why x86 can't hold a candle to ARM in the embedded world.
      
      But maybe you meant missing instructions? No load-linked/store conditional or bus snooping. No double (or even 1.5) compare-and-swap. No hardw
    - Re: (Score:2)
      
      by LWATCDR ( 28044 ) writes:
      
      I don't believe that ISA doesn't matter. If for no other reason than the X86 has a real shortage of GP registers. To gain the extra registers you must run in 64 bit mode so you must live with 64 bit addressing even if you really don't need it. As you said the X86 is fast which is also what I said. The ISA is very messy and and a real pain to write code for. There will always be some people that must write assembly. Yes the x86 is really fast even without a good ISA. It is also be updated over the years to
    - - Re: (Score:2, Insightful)
        
        by Verte ( 1053342 ) writes:
        
        AMEN! The lack of general purpose registers is a serious drawback to x86. The MMU is the same- well, it's not that it isn't feature packed, but it's so slow that we need a TLB, and the TLB can't handle threads, so all non-globals need to be flushed when context switching. Yuck.
        
        All the other features the GP mentioned, except for the last one if you mean COMPILED code, are also available on most RISC chips :P and the performance data really spoke for itself [Alphas had four times the floating point perform
        
        Re: (Score:3)
        
        by x2A ( 858210 ) writes:
        
        "the TLB can't handle threads, so all non-globals need to be flushed when context switching"
        
        Isn't this not true on modern processors, at least up to a point? With some space per TLB entry put aside for a task ID, means that when you switch to a different process, it will won't use TLB entries with a different task ID. Of course the OS has to support this (tell the processor when it's task switching which memory space it's switching to), and I'm not sure how big the space on the TLB is for this (it may be on
  - Re: (Score:3, Insightful)
    
    by wonkavader ( 605434 ) writes:
    
    No, the problem with the IA-64 was not that it was slow running x86 code. The problem was that it was slow running x86 code and not that great at running non-x86 code. Spectacular performance on non-x86 would have made it a much greater success, but it was lackluster from the start. After so long spent on designing a new chip, you'd expect some real results -- it was not much better than the alternatives. "Why bother?", the world said, and says even now.
  - - Re:ARM CPUS outnumber x86 by a huge factor -probab (Score:2)
      
      by LWATCDR ( 28044 ) writes:
      
      Yes but then the 8051 then is probably out numbers the X86 and the Arm. The Mips, Arm, Power, and even the 68k still exists in the embedded market. For example the Power is in all three of the new game consoles. Arms are in a lot of the WAPs. I keep wondering if we will see the a CPU the size of the latest AMD but containing 16 or more ARM cores. Sort of a T1 competitor.
- Re: (Score:2)
  
  by nbert ( 785663 ) writes:
  
  Not that it would make much of a difference - in the end most of the instruction set won't be used by programmers and especially compilers (CISC vs. RISC anyone?). But to get back to the topic: The overhead caused by upwards compatibility isn't that big after all. Problems a normal user experiences are not caused by bad hardware design nowadays.
- Re: (Score:2)
  
  by Ant P. ( 974313 ) writes:
  
  The thing is, what would they replace it with that they can sell? The only choices are emulation or translating code on the fly, both of which have sunk already.
  - - Re: (Score:2)
      
      by Ant P. ( 974313 ) writes:
      
      Now that you mention it, that worked for Apple...
- Re: (Score:3, Insightful)
  
  by servognome ( 738846 ) writes:
  
  and did away with the aging x86 instruction set and came up with something new.
  I wish they'd do away with English and come up with something new - a language based on consistant & logical rules.
  I don't know how anything gets done using a set of words cobbled together over hundreds of years with all sorts of special rules and idioms.
  - Re: (Score:2)
    
    by rolfwind ( 528248 ) writes:
    
    Yes, it's called German:) (Actually, English stems from it.)
  - - Re: (Score:2)
      
      by UncleFluffy ( 164860 ) writes:
      
      if/when they get tired of sending money to Britain to pay for translators.
      
      if/when they get tired of returning a small amount of Britain's contribution back to it to pay for translators.
      
      There, fixed that for you...
- - Re: (Score:2)
    
    by dunkelfalke ( 91624 ) writes:
    
    yep, that is right.
    especially interesting is the transmeta crusoe cpu which can load different instruction sets and translate them into its native code.
    
    but the thing is, as far as i remember, back at those days when transmeta crusoe was just near the release, linus said something like "i compiled the linux kernel to the native crusoe vliw instructions and it was actually slower than the x86 code"
  - Re: (Score:3, Insightful)
    
    by Chris Burke ( 6130 ) writes:
    
    If true, maybe they should enable a way for the processer to use that RISC code without the conversion.
    
    I don't think that's a good idea. The internal micro-ops are machine-dependent, and they will change as the microarchitecture changes. By designing the micro-ops specific to the architecture, they can try to make the x86 instruction translate into an optimal sequence of micro-ops. As hardware functionality changes, existing x86 instructions can have the underlying ops changed to suit without you having
I think this is great (Score:3, Funny)

by P3NIS_CLEAVER ( 860022 ) writes: on Wednesday August 15, 2007 @05:25PM (#20241711) Journal

I for one
think this
is good
news.

Share
twitter facebook
- Re: (Score:3, Informative)
  
  by edwdig ( 47888 ) writes:
  
  There's very little difference between the instructions in the different modes. The memory management unit is where most of the differences are. Properly written 16 bit real mode code will still run in 16 bit protected mode. The only difference is how the segment portion of the pointer in interpreted.
  
  As for 16 bit vs 32 bit modes. The instructions are mostly the same. A code segment is specified as being either 16 or 32 bit. That size is the default data sized used by instructions within that segment. There
  - - Re: (Score:2)
      
      by AnyoneEB ( 574727 ) writes:
      
      Java and .Net are JIT compiled. C++ is a normal compiled language. I assume the extensions are helpful to JIT compilers because they would allow the compilers to recompile the code with different optimizations based on the data they get.
- Re: (Score:3, Interesting)
  
  by Ant P. ( 974313 ) writes:
  
  It was at least 200 last time I read - and the source was an 80486 programming book. I think there's at least that many more in the different versions of SSE.
- You can get the x86/EMT64 documentation from intel (Score:3, Informative)
  
  by Gazzonyx ( 982402 ) writes:
  
  If you root around Intel's site a bit, you can get the developer manuals for asm on their chips; I think there's like 5 of them @ 300 pages+ each. It's all the documentation, I think only 1 book is the actual language specs. Anyways, if you ask them nicely via email, they'll send the manuals to you for free. I got mine in under a week from when I emailed them. They even pay shipping.
  Also, I know from asm on SPARC that many op codes are really just variations of other ops (and/or pseudo ops). For in
  - - Re: (Score:2)
      
      by Gazzonyx ( 982402 ) writes:
      
      Yeah, but I couldn't find a way to get AMD to mail me a hard copy of their documentation (at least, not for free). If they do so, please correct me, as I haven't looked in quite a few months.
- - Re: (Score:2)
    
    by TheOrquithVagrant ( 582340 ) writes:
    
    If only it were so. Unfortunately, it's not. There's a distressing amount of 16-bit real-mode code being executed in between power-on and your OS kernel switching into 32 or 64 bit mode even on the most modern PC.
    - - Re: (Score:2)
        
        by WhatAmIDoingHere ( 742870 ) * writes:
        
        In windows, thanks to people who love making things easy for the end user and who also are always looking to the future.. a lot of install programs for games and apps are 16bit.
        
        Another fun thing is a lot of games broke in Vista because the game had the "MY DOCUMENTS" folder location hard coded.
        
        Future looking programmers..
  - Re: (Score:2)
    
    by BillyBlaze ( 746775 ) writes:
    
    In a sense, the 16 bit instructions have been dropped, if only when running in 64-bit mode. Which is actually kind of annoying, because it means some of those old Windows 3 and DOS programs won't run without emulation.
- Re:Will Intel Adopt These Instructions? (Score:4, Informative)
  
  by The Real Nem ( 793299 ) writes: on Wednesday August 15, 2007 @05:09PM (#20241493) Homepage
  
  EM64T [wikipedia.org]?
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by Anarke_Incarnate ( 733529 ) writes:
    
    Technically correct and wrong at the same time. EM64T has a kludge in the way that memory is addressed. The EM64T chips cannot access memory above 4GB without using pointers.
    - Re: (Score:2)
      
      by edwdig ( 47888 ) writes:
      
      Technically correct and wrong at the same time. EM64T has a kludge in the way that memory is addressed. The EM64T chips cannot access memory above 4GB without using pointers.
      
      You can't access any memory without pointers.
      
      You're probably thinking of Page Addressing Extensions (PAE), which let you swap out parts of the page tables to point to memory above 4 GB. That's existed since the Pentium Pro or so. EM64T is just the damage control name Intel's marketing department came up with for their implementation of
      - Re: (Score:2)
        
        by Anarke_Incarnate ( 733529 ) writes:
        
        No, this is not regarding PAE. PAE should be irrelevant with 64bit extensions since it should be able to address over 32bits worth of RAM. PAE was for older generation processors. The issue is that the EM64T spec does not change the addressable amounts of RAM. I wish I had the link from Red Hat about the kernel hacks that were needed to make it work.
        By the way, you do not need pointers to address memory, and what I had stated was that in order to address higher than 4GB of RAM, the EM64T chips have
        
        Re: (Score:2)
        
        by andreyw ( 798182 ) writes:
        
        I don't know what you're smoking, but I want some of it.
        
        Let's start with some basic facts, that you can verify for yourself by hitting the long mode specs in AMD and Intel manuals:
        1) You need PAE enabled (in CR4). Long mode uses a 4-level paging table scheme (PML4 - PDPT - PD - PT, although you can get away with only using the first three levels if you are fine with a 2MB granularity.
        2) The linear address space is 64 bits.
        3) The physical address space, ATM AFAIK, is 52 bits, with the other bits reserved for
        
        Re: (Score:2)
        
        by Anarke_Incarnate ( 733529 ) writes:
        
        I simplified the fucking explanation due to being tired, sue me.
        
        https://www.redhat.com/docs/manuals/enterprise/RHE L-3-Manual/release-notes/as-amd64/RELEASE-NOTES-U2 -x86_64-en.html [redhat.com]
        
        From the reference itself
        " Software IOTLB -- Intel® EM64T does not support an IOMMU in hardware while AMD64 processors do. This means that physical addresses above 4GB (32 bits) cannot reliably be the source or destination of DMA operations. Therefore, the Red Hat Enterprise Linux 3 Update 2 kernel "bounces" all DMA operatio
        
        Re: (Score:2)
        
        by edwdig ( 47888 ) writes:
        
        Well, you missed the point of it. This is referring to DMA. DMA is means of fast transfer of data between main memory and an expansion device. Basically, this means that, for example, to send a packet of data to your network card, the data must exist within the first 4 GB of memory.
        
        It simply means that when the kernel allocates buffers for data transfer to/from hardware, it has to be a little careful about where it does it. This doesn't have any impact whatsoever on userspace code.
        
        Also, at least in the earl
        
        Re: (Score:2)
        
        by andreyw ( 798182 ) writes:
        
        No, you do not understand the crux of the problem. This has nothing with memory access from from user-space or kernel-space code. This has nothing to with the CPU instruction set architecture. This has everything to do with direct memory access by I/O devices, and is a result of lack of chipset support for this, wherein the entire north bridge is integrated in the AMD chips making any support issues moot. This cripples I/O bandwith.
        
        You didn't simplify the explanation - you did not understand it, and you STI
    - Re: (Score:2)
      
      by forkazoo ( 138186 ) writes:
      
      Technically correct and wrong at the same time. EM64T has a kludge in the way that memory is addressed. The EM64T chips cannot access memory above 4GB without using pointers.
      Could you clarify that at all? I'm not the end-all, be-all expert on these things, but I do know enough to be sure that what you wrote is so not-correct as to not even be wrong...
      
      Pointers really only matter from a relatively high-level software perspective. From a low level hardware perspective, you can either say that pointers don't
  - - Re: (Score:2)
      
      by thegnu ( 557446 ) writes:
      
      Apple Acolyte wrote:
      Oops, kind of forgot about that case. Sorry for the stupid question.
      --
      PowerPC zealot since 1994
      
      Apparently you're a mac person, so it's understandable. :-)
      - woops (Score:2)
        
        by thegnu ( 557446 ) writes:
        
        i don't mean that you'd be an idiot for being a mac person, but that x86 cpu particulars would slip your mind. :D
- Re: (Score:2)
  
  by SEE ( 7681 ) writes:
  
  x86-64 (AMD64) is the classic case.
  
  Prior to that, the closest thing was when NexGen (just before AMD bought them) developed an MMX-like extension for the Nx686 (released by AMD as the K6) and cut a deal for Cyrix to use them, which is what provoked Intel into creating MMX with cross-licensing to AMD and Cyrix.
- Re: (Score:2)
  
  by Necroman ( 61604 ) writes:
  
  Intel and AMD have some nice agreements between one another where they are allowed to share information about x86 processor extensions and the like. This means if one company designs a cool new extension, the other can pick it up with little hassle.
  
  (Or at least that's how I remember it working)
- Re: (Score:2)
  
  by x2A ( 858210 ) writes:
  
  Aside from what others mentioned, "3D-Now!" was added by AMD on the K6/2, as a kind of floating point MMX type thing I seem to recall. Intel later incorporated* these instruction sets, and AMD incorporated* Intels SSE instruction sets.
  
  (*mostly)
- Re: (Score:2)
  
  by tji ( 74570 ) writes:
  
  They don't necessarily need to. But, if they are useful, Intel could either adopt them or make their equivalent.
  
  Adopt: x86-64 (AMD Created, Intel adopted it when the Itanium sunk)
  
  Co-existing features: SIMD: MMX/SSE and 3DNow! (SSE eventually won out, but they co-existed for a long time).
  Virtualization: Intel VT and AMD-V co-exist today, and both are used by virtualization projects like Xen.
- - Re: (Score:3, Insightful)
    
    by HandsOnFire ( 1059486 ) writes:
    
    What happened is that the P4 architecture was more of a marketing scheme to push MHz, but not performance. AMD came out with an architecture directed at high performance. Intel came out with the Core 2 products which also focused on peroformance instead of clock speed. Intel has a lead in the manufacturing process side with respect to node size. This helps them to produce a lot at a lower cost. And If you look at Intel's and AMD's financials, you'll see how much each has to spend on R&D. Intel has a lot
  - Re: (Score:3)
    
    by Surt ( 22457 ) writes:
    
    Major architectural changes (historically) have been years between. AMD had the lead arch, and intel took years to respond with core. Now intel has the lead, and AMD won't compete until their new arch. The problem is compounded for AMD by intel deciding to make a major push to speed up their arch cycle time. AMD's new arch will have to do battle with intel's refined core2 shortly after release, and intel's next arch is due as soon as next year, so their window is tight. AMD is of course also trying to
  - Re: (Score:2)
    
    by GiMP ( 10923 ) writes:
    
    What I would like to know is how is it that AMD got it's ass handed to itself so viciously by Intel with the Core 2, and yet STILL isn't even remotely close to having something that can compete?
    As I see it... The memory bandwidth limitations on Intel's FSB are so restricting that for many applications it matters little how many cores or threads their CPUS can push. The reality is that Intel's chips cannot push memory around fast enough for those processors to be worthwhile. Rather than a dual quad-core sy
  - Re: (Score:2)
    
    by Enderandrew ( 866215 ) writes:
    
    It is pretty simple. On the same fab, the AMD architecture was proving superior for a couple years. However, Intel has a superior manufacturing process right now. A 45nm process certainly beats the 90nm AMD was doing for a while, and the 65nm process they are using now.
    
    The question I used to have is when both were using the same manufacturing process, why AMD was kicking the teeth in on the P4 line and why it took so long for Intel to catch up. It goes back and forth.
- - Logical reasons to buy AMD (Score:4, Insightful)
    
    by Enderandrew ( 866215 ) writes: <enderandrew@NOsPAM.gmail.com> on Wednesday August 15, 2007 @10:16PM (#20244623) Homepage Journal
    
    Funny. I've seen a $59 Brisbane core (1.9 out of the box) overclocked to 2.9 GHz with just air cooling, so I'm not sure why everyone insists AMD can't hit the 3GHz barrier, especially when AMD keeps displaying 3GHz Barecelonas.
    
    There are three reasons to buy AMD right now.
    
    1. Price, price and price. AMD knows Intel has the better fab, but AMD is selling super cheap. You can get a dual-core processor for half what Intel charges, and for the average user, it is more than enough. I'm running Oblivion at 30 FPS with a $59 processor, and I've barely overclocked it. The cheapest Intel dual-core proc was $120 when I bought my $59 proc. Most people have no idea that their proc these days often underclocks itself, and you rarely touch the full potential of your proc. Intel is faster, and no one doubts that today, but if you never see the speed benefit, why spend the extra dollars? On a performance per dollar basis, AMD wins hands down.
    
    2. There is a mountain of evidence against Intel for anti-trust violations, and I try not to financially support evil. The EU is also coming down on Intel for anti-trust violations.
    
    3. Even if the anti-trust suits both come through, AMD is near bankruptcy, and I prefer choice in the marketplace. I am terrified of the day when Intel has no competition pushing them and they can just sell what they want and whatever price they want.
    
    Parent Share
    twitter facebook
    - Re: (Score:3, Interesting)
      
      by Wavicle ( 181176 ) writes:
      
      Funny. I've seen a $59 Brisbane core (1.9 out of the box) overclocked to 2.9 GHz with just air cooling, so I'm not sure why everyone insists AMD can't hit the 3GHz barrier, especially when AMD keeps displaying 3GHz Barecelonas.
      
      Gosh, maybe you should go tell AMD that they aren't having any trouble with leakage, the yield of their 65nm parts is optimal and they can start volume production right now! The time AMD has spent not shipping Barcelona has been costing them dearly. Did you see the loss they posted la
      - Re: (Score:2)
        
        by Enderandrew ( 866215 ) writes:
        
        Are you trolling? Sure seems like it to me.
        
        Intel demanded that people not carry or display AMD products, or they'd refuse to ship product they already purchased. That is pretty clearly evil.
        
        Intel doesn't have to buy AMD's IP. If AMD goes belly up, then Intel will have an unchallenged monopoly, and no one has suggested trying to compete with them.
        
        Barcelona is late, and Intel does have a better manufacturing process. No one is contesting either of these points, but cheap AMD processors are reaching the 3
        
        Re: (Score:2)
        
        by Wavicle ( 181176 ) writes:
        
        Are you trolling? Sure seems like it to me.
        
        No, I'm stating some inconvenient truths.
        
        Intel demanded that people not carry or display AMD products, or they'd refuse to ship product they already purchased. That is pretty clearly evil.
        
        It's an alleged evil. Since the only major Intel-only brand in the US was Dell, I don't find it a particularly compelling case of evil. In fact it is a pretty short walk from saying Intel was manipulating Dell to saying Dell was manipulating Intel (hey Intel, I hear AMD has some p
        
        Re: (Score:2)
        
        by Enderandrew ( 866215 ) writes:
        
        When I bought my last processor, it was a few months back before all the AMD price cuts by the 3600+ was $59 and the cheapest Core 2 Duo was $120.
  - - Re: (Score:2, Funny)
      
      by Anonymous Coward writes:
      
      94.3% of all statistics are made up on the spot.
      - Re: (Score:2)
        
        by jombeewoof ( 1107009 ) writes:
        
        You can make up statistics to prove anything. 16% of all people know that.
        
        Re: (Score:2)
        
        by BillyBlaze ( 746775 ) writes:
        
        Only 3% of Slashdot users haven't heard that joke, and only 2% of those who have still think it's funny for the (on average) 36.4th time.
        
        Re: (Score:3, Funny)
        
        by jombeewoof ( 1107009 ) writes:
        
        It looks like I have a fan.
        good times. I guess I'll have to start wearing pants now though.
  - - Re: (Score:2)
      
      by be-fan ( 61476 ) writes:
      
      Entertainingly, PPC code is much larger than AMD64 code, prefix bytes or no.
- Re: (Score:2)
  
  by ZakuSage ( 874456 ) writes:
  
  You can poo-poo Java all you want, but the reality is that it's made programming a lot easier for the "rest of us", especially in a world where cross platform compatibility is key.
  - Re: (Score:2, Funny)
    
    by kiddygrinder ( 605598 ) writes:
    
    Java never made anything easier for anyone and you know it.
  - Re: (Score:2)
    
    by Enderandrew ( 866215 ) writes:
    
    Java is a great concept with piss-poor execution.
    
    Oddly enough, the same code can often be compiled cross-architecture and cross-platform quite easily on GCC that provides a nice, fast executable native to each platform and architecture and it uses a fraction of the start-up speed and resources of Java.
    
    I'm a crappy programmer, and even that is transparent to me.
- Re: (Score:2)
  
  by Chris Burke ( 6130 ) writes:
  
  Yet another waste of silicon to 'accellerate' badly written software.
  
  AND well-written software. What, you think you could write code that's just as fast without all the "hardware acceleration" being done for you, without using any instruction set extensions that have been added over the years? You are on crack.
  
  Instead of devoting transistors to speed up the latest toy programming languages ('managed' code), why can't we just train programmers better?
  
  And better profiling tools are contrary to this goal how
  - Re: (Score:2)
    
    by 12357bd ( 686909 ) writes:
    
    There's a matter of degree, to be sure, but even still you're most likely wasting your time "optimizing" individual lines of C code since the compiler can probably do a better job and that's been the case for quite a while.
    
    Terrible, if people start to give up to optimize the code (and understanding why it works), the net result will always be a noticeable decrease in programming quality (a very usual situation).
    I know that you are aiming at premature optimization, and you are really right on this one, b
- Re: (Score:2)
  
  by x2A ( 858210 ) writes:
  
  Profiling is useful for code produced by any language, and being able to profile without adding code, eg, at the beginning of functions, means you get to see how the actual software runs, without doing things that affects caching etc (for example, profiling code might push certain instructions onto a different cache line, skewing the results)
- Re:Nothing special for Java or .NET (Score:5, Insightful)
  
  by Wesley Felter ( 138342 ) writes: <wesley@felter.org> on Wednesday August 15, 2007 @06:34PM (#20242451) Homepage
  
  Performance counters could be used by JITs to generate more optimized code. I wonder which programming languages use JITs...
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by TheNetAvenger ( 624455 ) writes:
  
  That must have been speculation or a SWAG from the poster to suggest it could be used to accelerate Java and/or .NET. There is nothing special about java or net that would allow this optimization.
  
  Ok, sorry, wrong, and yes, wrong again...
  
  The notes about .NET and JAVA come specifically from AMD themselves.
  
  The reason it would benefits these environments is because they are processed on the fly and the environment could make the 'adjustments' to the code at runtime instead of it be 'locked' as natively compiled
- Re: (Score:2)
  
  by cbhacking ( 979169 ) writes:
  
  You're missing the point here. They aren't talking about accelerating the frameworks (the JRE, the Common Language Runtime, any other program that was compiled to native code at or before install time), they are about accelerating the applications that are run using those frameworks. The reason is that both frameworks use JIT-compiled code (I believe old JVMs translated instructions individually, but these days I'm pretty sure the whole .class gets compiled to native code just before execution).
  
  The advantag
- Re: (Score:2)
  
  by x2A ( 858210 ) writes:
  
  "There is nothing special about java or net that would allow this optimization"
  
  Sure there are. A profiler could quickly pick up on a function that's getting called many times from within a loop, and decide it could speed it up more by inlining it. Or, a bit of inline code that isn't being used often could be moved out of line, so the rest of the loop fits into a single cache line.
- - - Re: (Score:2)
      
      by TheNetAvenger ( 624455 ) writes:
      
      There already are systems which do exactly that (optimise dynamically C programs)
      
      I don't disagree with the notion that any natively compiled language could be scaled to take advantage of this, a good solution would be an OS level scheduling mechanism for natively compiled applications that could make the decisions based on the information the AMD instructions would be offering.
      
      However, the reference you cite is more about basic instruction changing and not the dynamics of testing to see what threads are bus
- Re: (Score:2)
  
  by gnasher719 ( 869701 ) writes:
  
  >> "They would give software access to information about cache misses..." Yeah that ought to help significantly with side-channel attacks against crypto software.
  
  I think you didn't read the spec. All that information is only available to the thread that is profiled; everything is context-switched so it can't leak out to other threads and definitely not to other processes.
- Re: (Score:2, Funny)
  
  by Short Circuit ( 52384 ) writes:
  
  I see all fuss about programming. easy. don't what the is parallel It's
  I see all fuss about programming. easy. don't what the is parallel It's
  I hereby propose that execution is in order for out of order speech.
- Re: (Score:2)
  
  by UID30 ( 176734 ) writes:
  
  I think I'm missing some op-codes.
- Re: (Score:2)
  
  by Courageous ( 228506 ) writes:
  
  I never quite understood why chip manufacturers had added cores long after memory bandwidth had became a problem.
  
  They've been increasing bandwidth while adding cores, and those cores also happen to have things like L1 and L2 caches, and so forth.
  
  C//

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Just performance counters? (Score:3, Informative)

Re: (Score:2)

Re:Just performance counters? (Score:4, Informative)

reminds me of the PS2's PA (Score:2)

Re: (Score:2)

I wish AMD and Intel teamed up for once (Score:3, Funny)

Re: (Score:3, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:I wish AMD and Intel teamed up for once (Score:4, Insightful)

Re: (Score:2, Interesting)

Re: (Score:3, Insightful)

Re:I wish AMD and Intel teamed up for once (Score:4, Informative)

Re: (Score:2)

Map and reduce? (Score:4, Interesting)

Re: (Score:2, Interesting)

Then again, schools are partly to blame (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Interesting)

Re: (Score:3, Informative)

Re: (Score:3, Informative)

Re:I wish AMD and Intel teamed up for once (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

Re:I wish AMD and Intel teamed up for once (Score:5, Insightful)

Re: (Score:3, Insightful)

Re:I wish AMD and Intel teamed up for once (Score:5, Interesting)

Re: (Score:2)

Re: (Score:3, Interesting)

Re: (Score:3, Informative)

Re: (Score:2)

Re: (Score:2, Insightful)

Re: (Score:3)

Re: (Score:3, Insightful)

Re:ARM CPUS outnumber x86 by a huge factor -probab (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Insightful)

I think this is great (Score:3, Funny)

Re: (Score:3, Informative)

Re: (Score:2)

Re: (Score:3, Interesting)

You can get the x86/EMT64 documentation from intel (Score:3, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:Will Intel Adopt These Instructions? (Score:4, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

woops (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Insightful)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Logical reasons to buy AMD (Score:4, Insightful)

Re: (Score:3, Interesting)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)