AMD Showcases Quad-Core Barcelona CPU


Posted by Zonk on Saturday February 10, 2007 @03:29AM from the four-times-the-fun dept.

Gr8Apes writes "AMD has showcased their new 65nm Barcelona quad-core CPU. It is labeled a quad-core Opteron, but according to Infoworld's Tom Yeager, is really a redefinition of x86. Each core has a new vector math processing unit (SSE128), separate integer and floating point schedulers, and new nested paging tables (to vastly improve hardware virtualization). According to AMD, the new vector math units alone should improve floating point operation by 80%. Some analysts are skeptical, waiting for benchmarks. Will AMD dethrone Intel again? Only time will tell."

This discussion has been archived. No new comments can be posted.


Bit Slice (Score:2)

by flyingfsck ( 986395 ) writes:

Things have come a long way since the heady days of bit slice processors. The first microcode I wrote was for an XOR operation - I could not think of anything simpler, that would actually do something useful...
But SSE is already 128 bits! (Score:2)

by pammon ( 831694 ) writes:

Anyone know what "SSE128" means? SSE registers have been 128 bit from day one.
- Re:But SSE is already 128 bits! (Score:5, Informative)
  
  by Zenki ( 31868 ) writes: on Saturday February 10, 2007 @03:45AM (#17960622)
  
  SSE+ operations up until now were operated on 64 bit at a time within the processor. SSE128 just means the new AMD chip will complete an SSE instruction in one pass.
  
  This was pretty much the reason why most people only bothered with MMX optimizations in their applications.
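As a rough illustration of the "one pass" point above, here is a toy model (not how any real chip is implemented) of a 128-bit packed add over four 32-bit lanes. The function name `packed_add` and the `lanes_per_pass` parameter are made up for this sketch; a 64-bit datapath is assumed to handle two lanes per pass and a 128-bit datapath all four.

```python
def packed_add(a, b, lanes_per_pass):
    """Add two equal-length lane lists, handling lanes_per_pass lanes per pass."""
    result = []
    passes = 0
    for start in range(0, len(a), lanes_per_pass):
        chunk = slice(start, start + lanes_per_pass)
        result.extend(x + y for x, y in zip(a[chunk], b[chunk]))
        passes += 1
    return result, passes


# Four 32-bit lanes stand in for one 128-bit SSE register.
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

half_width = packed_add(a, b, lanes_per_pass=2)  # 64-bit datapath: two passes
full_width = packed_add(a, b, lanes_per_pass=4)  # 128-bit datapath: one pass
print(half_width, full_width)
```

Both calls produce the same sums; only the pass count differs, which is the whole claim being made for SSE128.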
  
  - Re: (Score:2, Interesting)
    
    by pammon ( 831694 ) writes:
    
    SSE+ operations up until now were operated on 64 bit at a time within the processor
    Hmm...do you mean specifically on AMD's hardware? That stopped being true for Intel starting with the Core, which has 1-cycle latency on SSE instructions.
    - Re:But SSE is already 128 bits! (Score:5, Informative)
      
      by waaka! ( 681130 ) writes: on Saturday February 10, 2007 @06:00AM (#17961192)
      
      Hmm...do you mean specifically on AMD's hardware? That stopped being true for Intel starting with the Core, which has 1-cycle latency on SSE instructions.
      
      Core2 has single-cycle throughput on most SSE instructions, not single-cycle latency. Most of these instructions still take 3-5 cycles to generate results, which is similar to the Pentium M, but now a vector of results finishes every cycle, instead of every two or four cycles.
      
      An important consequence of this is that if your instructions are poorly scheduled by the compiler (or assembly programmer) and the processor spends too much time waiting for results of previous operations, the advantages of single-cycle throughput mostly disappear.
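The latency versus throughput distinction above can be sketched with a very simplified pipeline model. It assumes a fully pipelined unit that can start one operation per cycle, and a latency of 4 cycles picked from the 3-5 cycle range mentioned in the comment; `cycles_needed` is a hypothetical helper, not a measurement of any real processor.

```python
def cycles_needed(num_ops, latency, dependent):
    """Cycles to finish num_ops on a unit that can issue one op per cycle."""
    if num_ops == 0:
        return 0
    if dependent:
        # Each op waits for the previous result, so the latencies add up.
        return num_ops * latency
    # Independent ops overlap: one issues per cycle, and the last one
    # finishes `latency` cycles after it was issued.
    return (num_ops - 1) + latency


LATENCY = 4  # assumed value for illustration only

independent = cycles_needed(100, LATENCY, dependent=False)
chained = cycles_needed(100, LATENCY, dependent=True)
print(independent, chained)
```

With well-scheduled independent work the cost approaches one cycle per operation; with a dependency chain it approaches the full latency per operation, which is why poor scheduling erases most of the benefit.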
      
      - Re:But SSE is already 128 bits! (Score:5, Informative)
        
        by pammon ( 831694 ) writes: on Saturday February 10, 2007 @06:33AM (#17961332)
        
        Core2 has single-cycle throughput on most SSE instructions, not single-cycle latency
        Well, certainly you won't be able to get a square root through in one clock cycle, but many/most of the simple integer arithmetic, bitwise, and MOV SSE instructions on the Core 2 really do have single cycle latency. source [agner.org]. None do on the AMD64, which supports the theory that SSE128 means more "new for us" than "new for everyone." Not to put AMD down - many of the other features sound promising (but the article is long on breathlessness and light on details, alas).
        
- Comment removed (Score:5, Informative)
  
  by account_deleted ( 4530225 ) writes: on Saturday February 10, 2007 @03:50AM (#17960650)
  
  Comment removed based on user account deletion
  
  - Re: (Score:3, Interesting)
    
    by adam31 ( 817930 ) writes:
    
    With the other chips, you have to load the first part(if it's a full 128bit instruction, or if it's multiple instructions added together), save, load, save, add, execute.
    Please explain this. Do I understand correctly that you think some SSE instructions are 16 bytes? Issuing is one thing, and latency another. In most cases I've found AMD/Intel can issue 1 mulps/shufps/adds per cycle, the *ss instructions at 2 per (AMD sometimes 3 per cycle). If you mean that only the first 64-bits, 2 components, are
  - Re: (Score:2)
    
    by Lost Race ( 681080 ) writes:
    
    SSE first appeared in the Katmai (that's why SSE was also known as "KNI" or "Katmai New Instructions") which was produced in a 250 nm (0.25 micron) process. 250 nm was already pretty mature when the Katmai came out so I doubt they ever targeted the design for 350 nm production.
Dethrone? No. (Score:2, Insightful)

by NXprime ( 573188 ) writes:

"Will AMD dethrone Intel again?" Dear AMD, meet Larrabee. http://www.theinquirer.net/default.aspx?article=37548 [theinquirer.net] AMD might kick Intel in the nuts a little but definitely not dethrone.
- Re: (Score:2, Interesting)
  
  by robinvanleeuwen ( 1009809 ) writes:
  
  No, but a good, hard, well-aimed, holding-nothing-back kick in the nuts can leave them impotent,
  so they'll have to do some ugly procedures to survive it in the long run. A couple of identical
  blows in the meantime could leave them sterile, so that when the current setups begin to die out
  and Intel has no more babies waiting, they will not be dethroned, but will be getting
  an honorable mention in the history books.
- GPU not CPU - Re:Dethrone? No. (Score:3, Insightful)
  
  by dave1g ( 680091 ) writes:
  
  Read the article: that is an x86 GPU. It wouldn't be able to compete with general-purpose CPUs.
Is dethroning Intel the point? (Score:5, Insightful)

by Weaselmancer ( 533834 ) writes: on Saturday February 10, 2007 @03:50AM (#17960654)

As long as AMD and Intel continue to chase each other in the x86 market, high end chips become low end in the span of six months. Just keep buying 6 months behind the press releases and you get great processors for next to nothing.

Huh? (Score:2)

by sumdumass ( 711423 ) writes:

It is labeled a quad-core Opteron, but according to Infoworld's Tom Yeager, is really a redefinition of x86.

I don't get the surprise or disappointment here. It appears that the submitter thinks x86 isn't an Opteron or something. As far as I know, the Opteron is the same thing, i.e. an extension to the x86 that can handle 64-bit extensions.

Am I missing something or am I completely wrong?
Well.... (Score:2, Interesting)

by Creepy Crawler ( 680178 ) writes:

Keeping in scientific fact, how much heat has to be generated for 1 MIPS?

The fact is, absolutely none. It has been shown that only the destruction of information via AND and like instructions creates entropy (heat). As long as you use only 3 types of gates (pass through, not, xor), you can create a heat-free CPU. Provided we do want to check for bit errors, we could maintain a very low heat via ECC-like checking. Estimates on that are 10^8 lower than present.

We could keep 98% of our efficiency of current day
- Re: (Score:2)
  
  by DimGeo ( 694000 ) writes:
  
  If memory serves right, you need the constant 1 and xor to form a Boolean base with xor.
- Re: (Score:3, Informative)
  
  by Khyber ( 864651 ) writes:
  
  Heat-free? Did you forget the Second Law? Or did you just forget about pure friction itself? Moving ANYTHING is going to involve friction. Nothing moves without SOME force, and friction will happen.
  - Re: (Score:2)
    
    by OrangeTide ( 124937 ) writes:
    
    Electrical resistance formulas cannot be derived from friction formulas. Friction is a macroscopic thing that is the statistical accumulation of many microscopic effects.
  - Re: (Score:2)
    
    by Enrique1218 ( 603187 ) writes:
    
    Take it more fundamental than that. Temperature arises from the motion of atoms in a material. Switching transistors the way they do allows those electrons to really get those atoms moving.
- - Re:Well.... (Score:4, Interesting)
    
    by Creepy Crawler ( 680178 ) writes: on Saturday February 10, 2007 @04:48AM (#17960916)
    
    ---That sounds very interesting. Would you mind providing a link to the literature that discusses that ? I have some trouble figuring out the thermodynamics of this. Perpetum mobile and such, you know....
    
    Of course. It, at first, sounds too good, but here you go.
    
    Rolf Landauer showed in 1961 that reversible logic operations could, in principle, be performed with neither putting energy in nor taking heat out. The same could not be said for irreversible logic operations.
    
    "Irreversibility and Heat Generation in the Computing Process" IBM Journal of Research and Development 5 (1961): 183-91, IBM PDF [ibm.com]
    
    ___
    
    In 1973, Charles Bennett proved that any computation could be carried out using only reversible operations.
    
    Charles H. Bennett "Logical Reversibility of Computation" IBM Journal of Research and Development 17 (1973): 525-32, IBM PDF [ibm.com]
    
    ___
    
    Later on, Fredkin and Toffoli presented a review of the ideas of reversible computing. The essential idea is that you can save all the intermediate states of an algorithm on the way to the answer, and then reverse the process so that no energy is used and no heat is generated. Fredkin also indicates that if we switched from irreversible to reversible computing, we would expect to lose no more than 1% efficiency.
    
    International Journal of Theoretical Physics 21 (1982):219-53 PDF [digitalphilosophy.org]
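To make "reversible" concrete, the sketch below checks the standard Toffoli (controlled-controlled-NOT) gate against an ordinary AND gate. It only demonstrates the logical property of losing or keeping information; it says nothing about how such gates would be built in hardware. The helper names are chosen for this example.

```python
from itertools import product


def toffoli(a, b, c):
    """Controlled-controlled-NOT: flips c only when a and b are both 1."""
    return a, b, c ^ (a & b)


def and_gate(a, b):
    return a & b


inputs = list(product((0, 1), repeat=3))
outputs = [toffoli(*bits) for bits in inputs]

# Reversible: all eight inputs map to distinct outputs, and applying
# the gate a second time restores the original input.
is_bijection = len(set(outputs)) == len(inputs)
is_self_inverse = all(toffoli(*toffoli(*bits)) == bits for bits in inputs)

# Irreversible: AND squeezes four input pairs into two output values,
# so the inputs cannot be recovered from the output.
and_outputs = {and_gate(a, b) for a, b in product((0, 1), repeat=2)}

# With c = 0, the third output carries a AND b while a and b are preserved.
and_via_toffoli = all(
    toffoli(a, b, 0)[2] == (a & b) for a, b in product((0, 1), repeat=2)
)
print(is_bijection, is_self_inverse, sorted(and_outputs), and_via_toffoli)
```

The last check shows why the extra output lines matter: the AND result is still available, but nothing has been thrown away to get it.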
    
    ___
    
    And as an unsubstantiated claim, I remember hearing that due to heat/radiation sources, that volatile memory gains errors of 1 bit per billion with a time from 1 minute to 1 day ( I forget the exact time). To correct this would only require the entropy of deleting that incorrect bit. In other words, 10^8 or so magnitude heat shrinkage. But trust the stuff above.
    
    (Many of these ideas were taken from "The Singularity is Near" by Ray Kurzweil from page 130)
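For a sense of scale, the "entropy of deleting a bit" referred to above is bounded by Landauer's limit, k_B * T * ln 2 per erased bit. The sketch below evaluates it at an assumed temperature of 300 K; the erase rate of 10^18 bits per second is a made-up figure for illustration and does not come from the thread or from any real processor.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact SI value)


def landauer_limit_joules(temperature_kelvin):
    """Minimum heat dissipated when one bit of information is erased."""
    return BOLTZMANN * temperature_kelvin * math.log(2)


per_bit = landauer_limit_joules(300.0)  # assumed room temperature

# Hypothetical erase rate, chosen only to show the order of magnitude.
bits_erased_per_second = 1e18
minimum_power_watts = per_bit * bits_erased_per_second

print(f"{per_bit:.3e} J per erased bit at 300 K")
print(f"{minimum_power_watts:.3e} W at the assumed erase rate")
```

This is a lower bound on erasure alone. It does not account for resistive losses, leakage, or anything else that dominates heat output in real chips.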
    
    - Re:Well.... (Score:5, Insightful)
      
      by drgonzo59 ( 747139 ) writes: on Saturday February 10, 2007 @05:25AM (#17961066)
      
      And how exactly is your reversible computing going to reduce the resistance of millions and millions of conductors to 0? You are confusing a theoretical issue relating to computer science (and very relevant to quantum computing) with a practical problem of CPU design. Just moving information around _without_ deleting it will generate heat.
      
      Or did you actually think that those "stupid" CPU designers, battling with heat dissipation for all these years, never thought of, oh.. simply replacing the nand gates with reversible Fredkin and Toffoli gates and 'poof' magically all the heat issues are gone, processors will run @ hundreds of GHz, the world's electrical power consumption will go down and the geeks won't be able to boast about their huge ass sinks anymore...
      
      - Re: (Score:2)
        
        by chthon ( 580889 ) writes:
        
        Yeah, I think there was an article back in 1993 or 1994 in Byte about such processors. It seems that in practice, the theory doesn't add up.
      - Re: (Score:2)
        
        by balloonhead ( 589759 ) writes:
        
        Very aggressive tone you have there. Most likely to get modded flamebait (as you have done) and get people to disregard what you say.
        Mr. Coward, in case you have not read the article, the conversation actually is about AMD's new processor, which is a real processor. That processor will generate some amount of heat ... real heat, not theoretical heat.
        The conversation might have started out as that, but this thread has gone somewhere else. This is a natural part of any discussion. It does not make it off
    - - Re: (Score:2)
        
        by rbarreira ( 836272 ) writes:
        
        If you reverse the computation, are you still "allowed" to know the answer ?
        
        As far as I remember reading, outputting answers adds a bit of heat output to the equation, but doesn't prevent you from using reversible circuits.
      - Re: (Score:2)
        
        by Mr Z ( 6791 ) writes:
        
        It's not enough to merely limit yourself to NOT, XOR and pass-through, as traditional implementations still destroy information in a way. Traditional gates are made of switches: When you switch the input to an inverter (NOT gate) off, the output switches on by closing a switch to Vdd and opening the switch to ground. Some current flows from Vdd to the inputs of whatever gates the inverter is driving. When you switch the input to that inverter on, the switch to Vdd opens and the switch to ground closes.
AMD64 is very fast (Score:5, Interesting)

by GreatDrok ( 684119 ) writes: on Saturday February 10, 2007 @04:17AM (#17960760) Journal

In my own benchmarks (generic C integer and floating point scientific code) I have found that the Core Duo and Core 2 Duo aren't all that quick compared with an AMD64. Clock for clock the AMD64 Opterons we have are about 50% quicker than an equivalent Core 2 Duo for integer work. I know this doesn't agree with all the usual magazine benchmarks but they are heavily biased towards using SSE instructions where possible and it is SSE where the Core 2 Duo has been a real improvement over previous Intel designs and also bests the AMD chips. Hopefully, AMD has recognised this and the new SSE implementation will bring them back on par with Intel for these benchmarks but even today an AMD64 processor is a beast and more than a match for anything Intel produces.

- Re: (Score:3, Informative)
  
  by pjbass ( 144318 ) writes:
  
  Care to publish your numbers that debunk all the other hardware sites that are typically AMD-biased anyways?
  
  And pointing out that it isn't fair to compare because a Core 2 Duo already executes the full SSE instruction in one pass vs. the 2 clocks for a current AMD64 is the same as saying it's not fair to compare the on-die memory controller on AMD's vs. Intel's FSB. But people didn't seem to care when the numbers went in AMD's favor.
  
  I'd really be interested in seeing your numbers, your programs, and what com
  - Re:AMD64 is very fast (Score:5, Informative)
    
    by GreatDrok ( 684119 ) writes: on Saturday February 10, 2007 @04:54AM (#17960946) Journal
    
    "Care to publish your numbers that debunk all the other hardware sites that are typically AMD-biased anyways?"
    
    OK. I can't give you the code but it is my own implementation of a pretty standard bioinformatics sequence comparison program which doesn't use SSE/MMX type instructions and is single threaded. On all platforms it was compiled using gcc with -O3 optimisation. I have tried adding other optimisations but it doesn't really make much difference to these numbers (no more than a couple of percent at best).
    
    AMD Opteron 2.0Ghz (HP wx9300) - 205 Million calculations per second
    Intel Core 2 Duo 2.66Ghz (Mac Pro) - 146 Million
    Intel Core Duo 2.0 Ghz (MacBook Pro) - 94 Million
    IBM G5 PPC 2.3 Ghz (Apple Xserve) - 81 Million
    Motorola G4 PPC 1.42 Ghz (Mac mini) - 72 Million
    Intel P4 2.0 Ghz (Dell desktop) - 61 Million
    Intel PIII 1.0 Ghz (Toshiba laptop) - 45 Million
    
    Interesting things about these numbers. The Core Duo is clearly a close relative of the PIII since the performance at 2Ghz is roughly twice that of the PIII at 1Ghz. The P4 at 2Ghz is really very poor indeed which isn't a huge surprise as it was never very efficient. The G4 PPC puts in a reasonable result easily beating the much higher clocked P4 (what, the Mac people were right? Shock!) although I have to say that the performance of the G5 is disappointing. The Core 2 Duo isn't a bad performer although it does have the highest clock speed of any processor in this set but it is seriously beaten by the Opteron. From these numbers, a Core 2 Duo at 2Ghz would be about half as quick as an Opteron at the same speed.
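The clock-for-clock comparisons in the paragraph above can be reproduced by dividing each posted figure by its clock speed. The sketch below uses only the numbers from the list in this comment and assumes performance scales linearly with clock, which is the same assumption the comment makes.

```python
# (name, clock in GHz, million calculations per second) as posted above
results = [
    ("AMD Opteron", 2.0, 205),
    ("Intel Core 2 Duo", 2.66, 146),
    ("Intel Core Duo", 2.0, 94),
    ("IBM G5 PPC", 2.3, 81),
    ("Motorola G4 PPC", 1.42, 72),
    ("Intel P4", 2.0, 61),
    ("Intel PIII", 1.0, 45),
]

# Normalise every result to "million calculations per second per GHz".
per_ghz = {name: score / ghz for name, ghz, score in results}

for name, value in sorted(per_ghz.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:18s} {value:6.1f} million calculations/s per GHz")

core2_vs_opteron = per_ghz["Intel Core 2 Duo"] / per_ghz["AMD Opteron"]
print(f"Core 2 Duo per-clock result relative to Opteron: {core2_vs_opteron:.2f}")
```

On these figures the Core 2 Duo lands at a little over half the Opteron's per-clock result, and the Core Duo's per-clock result is close to the PIII's, matching the observations in the comment.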
    
    - Show us your source code (Score:2, Insightful)
      
      by rbarreira ( 836272 ) writes:
      
      Well, until you show us your source code those numbers are as believable as anything else one might randomly type here...
      - Re: (Score:2)
        
        by GreatDrok ( 684119 ) writes:
        
        "Well, until you show us your source code those numbers are as believable as anything else one might randomly type here..."
        
        I can't because the program is really large and it doesn't entirely belong to me (you know, work for people, they own your code).
        
        You're right, I could just be making these numbers up and if you prefer to believe that then there is nothing I can do to change your mind. All I can say is that this is my own (admittedly anecdotal) experience.
        
        Re: (Score:2)
        
        by rbarreira ( 836272 ) writes:
        
        Actually I was thinking more about benchmarking/coding flaws than lying from your part.
        
        Re: (Score:3)
        
        by GreatDrok ( 684119 ) writes:
        
        "Actually I was thinking more about benchmarking/coding flaws than lying from your part."
        
        Certainly a possibility. In my defense I would like to point out that all benchmarks are open to question. I know my own code, I know what it does and it doesn't do much but it does a lot of it so the performance figures are what they are. I originally wrote this code on an SGI, ported it to Linux on a 486, SPARC, Alpha, PPC and so on. It's old and simple but does real work. While I could make it faster using SSE an
    - Re: (Score:2)
      
      by kestasjk ( 933987 ) * writes:
      
      I hope the benchmarks don't take get advantage out of using 64-bit arithmetic.
      - Re: (Score:2)
        
        by GreatDrok ( 684119 ) writes:
        
        "I hope the benchmarks don't take get advantage out of using 64-bit arithmetic."
        
        Nope, straight 32 bit. If it had been 64 bit then the Core 2 Duo would also have seen a more significant boost versus its 32 bit predecessor not to mention the G5 should have been better than the G4 which it wasn't.
        
        Re: (Score:2)
        
        by NovaX ( 37364 ) writes:
        
        No, Core2 has worse [xbitlabs.com] 64-bit performance.
    - Re: (Score:2)
      
      by jez9999 ( 618189 ) writes:
      
      Any numbers for an Athlon 64? I just bought a 3800+ single core and would like to be made really excited about it. :-P
      
      Also which of these chips are single, and which are dual, and which are quad cores?
      
      What's the point of dual and quad core, anyway? Anyone figured out why it's better than just having 2/4 CPUs?
      - Re: (Score:2)
        
        by GreatDrok ( 684119 ) writes:
        
        "Any numbers for an Athlon 64? I just bought a 3800+ single core and would like to be made really excited about it. :-P"
        
        Pretty much the same as the Opteron in this case. The program doesn't really hammer cache or main memory, just the CPU. Work out your clock speed as a percentage of 2Ghz and do the sums and that should be the number.
        
        The Opteron, Core 2 Duo and Core Duo are all dual core chips in this test, the others single core although the G5 was a dual processor system. Since the program is single th
      - Re: (Score:2)
        
        by ocbwilg ( 259828 ) writes:
        
        What's the point of dual and quad core, anyway? Anyone figured out why it's better than just having 2/4 CPUs?
        
        It's better than just having 2/4 CPUs because you can now get dual CPU functionality on consumer-level mainboards. You get SMP without having to shell out for workstation or server level hardware. Of course, if you do have workstation or server boards with 2 or 4 CPU sockets on it, then you can put dual or quad core CPUs in those sockets as well. So instead of having 2-way SMP with 2 sockets yo
    - Re:AMD64 is very fast (Score:4, Insightful)
      
      by waaka! ( 681130 ) writes: on Saturday February 10, 2007 @06:19AM (#17961272)
      
      OK. I can't give you the code but it is my own implementation of a pretty standard bioinformatics sequence comparison program which doesn't use SSE/MMX type instructions and is single threaded. On all platforms it was compiled using gcc with -O3 optimisation. I have tried adding other optimisations but it doesn't really make much difference to these numbers (no more than a couple of percent at best).
      
      When you say you've tried "adding other optimizations," are you referring only to other GCC optimization flags? If your program's algorithms have any moderate degree of parallelism and you haven't tried vectorization either by compiler (GCC and ICC can both do this) or by hand, the benchmark you've done is not unlike a race where no one is allowed to shift out of first gear. Can you go into any more specifics about how this program does sequence comparisons?
      
      Also, the disappointing numbers from the G5 may be partially explained by the fact that its integer unit has higher latency than the other desktop processors in that list. The G5 isn't exactly known for blistering integer performance, anyway.
      
      - Re: (Score:2)
        
        by GreatDrok ( 684119 ) writes:
        
        I should say that this program was written a very long time ago originally. It implements an efficient but standard Smith and Waterman dynamic programming algorithm. I have done vectorisation of this algorithm in the past and the performance improvement was dramatic (about x20). With this test program though, it hasn't really benefited from extreme compiler optimisations. I do remember running it after compiling with ccc on an Alpha and seeing a 30% speedup so there is definitely room for improvement bu
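For readers unfamiliar with the algorithm named above, here is a minimal, unoptimised sketch of Smith-Waterman local alignment scoring with a linear gap penalty. It is not the poster's program, which was never published; the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are arbitrary choices for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between sequences a and b."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell depends on its diagonal, upper and left neighbours;
            # the floor at 0 is what makes the alignment local.
            current[j] = max(
                0,
                previous[j - 1] + substitution,
                previous[j] + gap,
                current[j - 1] + gap,
            )
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("TTACGTTT", "ACGT"))
```

The inner loop runs once per pair of positions, so the work grows as the product of the two sequence lengths. It is mostly integer compares and adds on a small working set, which fits the description of a CPU-bound benchmark given in this thread.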
      - perfectly fine for a CPU benchmark (Score:2)
        
        by r00t ( 33219 ) writes:
        
        We're not testing the compiler. IMHO, turning optimization OFF would be a fine idea, or at least unobjectionable.
        
        The only important thing is that the compiler choices and options are fair. Using gcc on the Opteron and icc on the Core Duo would not be fair. Using gcc everywhere, with the same options, is completely fair.
        
        One can also define "fair" as "all systems tweaked to the max", but this is rather difficult to do right. (see also: OS benchmarks, where the benchmarker knows all the ways to tweak the OS he
    - Re: (Score:3, Interesting)
      
      by NovaX ( 37364 ) writes:
      
      AMD64 is not a processor, it is an instruction set. So you need to clarify whether you compiled your programs using 32-bit or 64-bit x86 instructions. I am not a gcc user, but I'm assuming that it chooses the default architecture based on your environment settings, thus AMD64 on 64-bit Linux. Since you've included a PowerPC processor, it's really not obvious.
      
      When the Core2 was released, benchmarks made it clear that Intel did not optimize for 64-bit performance. They have the architecture, but they pushed th
      - 32 vs 64 (Score:2)
        
        by mrnick ( 108356 ) writes:
        
        It depends on what kind of data you are processing. If you are doing 32 bit calculations then you would want to compile your code for 32 bit, assuming your processor can handle it, as most 64 bit CPUs can. If you are using 64 bit calculations then of course the 64 bit CPU would outperform the 32 bit one, as you would have to do additional coding steps to simulate 64 bit on a 32 bit architecture: multiple 32 bit operations with bit shifting and the like.
        
        If you took code that was written for 32 bit operations and
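The "multiple 32 bit operations" mentioned above can be sketched as follows: a 64-bit addition built from two 32-bit additions plus a carry. Python integers are arbitrary precision, so the 32-bit width is imposed by masking; the function names are invented for this example, and a real compiler would emit an add followed by an add-with-carry instruction instead.

```python
MASK32 = 0xFFFFFFFF


def split64(value):
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def join64(low, high):
    """Rebuild a 64-bit value from its 32-bit halves."""
    return (high << 32) | low


def add64_with_32bit_ops(a_low, a_high, b_low, b_high):
    """Add two 64-bit values using only 32-bit-wide results."""
    low_sum = a_low + b_low
    carry = low_sum >> 32  # 1 if the low halves overflowed 32 bits
    low = low_sum & MASK32
    high = (a_high + b_high + carry) & MASK32  # wraps like a 64-bit register
    return low, high


x, y = 0x00000001FFFFFFFF, 0x0000000000000001
total = join64(*add64_with_32bit_ops(*split64(x), *split64(y)))
print(hex(total))
```

A 64-bit processor does the same addition in a single instruction, which is the extra work the comment refers to.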
        
        Re: (Score:2)
        
        by NovaX ( 37364 ) writes:
        
        You're assuming that the only difference between 32-bit and 64-bit x86 instructions is the bit size. That's not true, and the most immediate gain from AMD64 is the extra registers. There are a lot of changes to the ISA that will dramatically skew the results. The only negative results you would get compiling 32-bit code to 64-bit would be: A) The cache can contain fewer entries; B) Platform assumptions, such as when performing pointer arithmetic, would break. His code is probably fairly clean since he
    - Re: (Score:2)
      
      by ben there... ( 946946 ) writes:
      
      Intel Core 2 Duo 2.66Ghz (Mac Pro) - 146 Million
      
      Where did you get a Mac Pro with a Core 2 Duo?
      
      Should be LGA-771 2-socket Xeon Woodcrest, and not fit a LGA-775 C2D, right?
    - Re: (Score:2)
      
      by jcupitt65 ( 68879 ) writes:
      
      I have some benchmarks too:
      
      http://www.vips.ecs.soton.ac.uk/index.php?title=Benchmarks [soton.ac.uk]
      
      Again, plain C code, no SSE/whatever. It is threaded, which makes it slightly different. The source is there too.
      
      Results:
      
      Opteron 850, 2.4 GHz, 4 CPUs, 4.5s
      Opteron 254, 2.7 GHz, 2 CPUs, 6.9s
      P4 Xeon (64 bit), 3.6 GHz, 2 CPUs (4 threads), 7s
      Core Duo, 2.0 GHz, 2 CPUs, 18.1s
      P4 Xeon (32 bit), 3.0 GHz, 2 CPUs (4 threads), 19.7s
      P4 (Dell desktop), 2.4 GHz, 1 CPU, 36.6s
      PM (HP laptop), 1.8 GHz, 1 CPU, 58.5s
      
      So I agree: an Opteron
    - Re: (Score:2)
      
      by RightSaidFred99 ( 874576 ) writes:
      
      Good god, how did this get modded as informative? "This just in, random poster redefines reality, AMD64 really faster than Core 2 Duo regardless of the tons of real world application performance data which completely contradicts this!!!"
      Please people, get a grip. This guys little application does tons of random memory reads. This is the one area where the Opteron still kicks ass because it has an IMC. The number of applications where this is useful is fairly small, and it's been known for a long time.
      - Re: (Score:3, Informative)
        
        by GreatDrok ( 684119 ) writes:
        
        "This guys little application does tons of random memory reads"
        
        If only that was the case but actually it is very linear. The application can hold the whole of its memory requirements in cache these days so it hardly has to touch main memory and it was designed to do all the inner loop code using only registers. Heck, I doubled the size of the inner loop just to avoid a single register copy because it made a significant performance increase.
        
        The reason I like this code is that it shows how many operations y
        
        Java is slow on x86 (Score:2)
        
        by r00t ( 33219 ) writes:
        
        Java is big-endian, like the SPARC and G4.
        
        Java has strictly-defined floating-point math that is incompatible with the x86. An x86 chip must save floating-point options out to memory to force the exponent to be the right size.
        
        JIT/emulation systems in general, including Java, do better with more registers. The G4 has about 6x as many once you exclude registers that are unavailable. (about 5 for x86, but at least 30 for the G4)
    - Re: (Score:2)
      
      by pjbass ( 144318 ) writes:
      
      One of the things that makes a big impact on performance, on any platform, is the type and speed of memory used. Looking at your list of platforms above, I see an HP Workstation used with the Opteron. Not having one in front of me to verify, but reading on HP's website what chipset and memory is available, you could get a very distinct increase in performance simply due to lower memory latency through the chipset and memory type.
      
      What chipset(s) and memory were used in the Mac's? Were they on-par with a w
    - Re: (Score:2)
      
      by MemoryDragon ( 544441 ) writes:
      
      Ok thanks for the interesting result, an IBM Power5 would be more interesting in this comparison. But your core Duo Opteron comparison is somewhat flawed in the logic, you compare basically single core performance of those processors, if you split the problem up in multiple threads then the results will look entirely different, and that is what multiple cores are about, as many threads and processes as possible without significant slowdown. Given modern operating systems hosting 20-100 processes each of the
    - - Re:AMD64 is very fast (Score:4, Informative)
        
        by GreatDrok ( 684119 ) writes: on Saturday February 10, 2007 @06:03AM (#17961208) Journal
        
        "The P3 you list looks a Coppermine, I suspect a P3 Tualatin would perform much better."
        
        Pretty sure it is a Tualatin since it is a 1Ghz PIII Mobile which I bought in early 2002 (http://www.theregister.co.uk/2001/01/31/chipzilla_readies_1ghz_mobile_piii would seem to support this).
        
        Given that it is a Tualatin, the performance of the Core Duo at 2Ghz looks about right. From all the blurb I have read, the Core 2 Duo gets about 10% better performance clock for clock, except when it comes to SSE, where it is about twice as fast. So the performance figure of 146 million also looks pretty much on the mark: a 2Ghz Core 2 Duo should be able to manage about 110 million if you scale the figure for clock speed, and that is (surprise) ~10% quicker than the Core Duo at 2Ghz (94 million). The basic integer performance of the Core 2 Duo is therefore better than the Core Duo's, but it doesn't compare with the 205 million the 2.0Ghz Opteron manages.
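The clock scaling described above is simple enough to check directly. The sketch below assumes, as the comment does, that the result scales linearly with clock speed; `scale_to_clock` is a helper named for this example.

```python
def scale_to_clock(score_millions, measured_ghz, target_ghz):
    """Linearly rescale a benchmark score to a different clock speed."""
    return score_millions * target_ghz / measured_ghz


# Figures posted earlier in the thread.
core2_at_2ghz = scale_to_clock(146, measured_ghz=2.66, target_ghz=2.0)
advantage_over_core_duo = core2_at_2ghz / 94 - 1
fraction_of_opteron = core2_at_2ghz / 205

print(f"Core 2 Duo scaled to 2.0 GHz: {core2_at_2ghz:.1f} million")
print(f"Advantage over Core Duo:      {advantage_over_core_duo:.1%}")
print(f"Fraction of Opteron result:   {fraction_of_opteron:.2f}")
```

Taken at face value, the scaled figure does come out at about 110 million, though that is nearer 17% than 10% above the Core Duo's 94 million. Either way it remains a little over half of the Opteron's 205 million.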
        
    - - Re: (Score:2)
        
        by MemoryDragon ( 544441 ) writes:
        
        Ahem, you have to see it differently: you can launch way more programs in parallel without having a huge impact on performance. Multicores can go a long way, especially if you do VM stuff and Java development where you have to juggle threads by the dozens...
      - that's what we all run though, and it can be OK (Score:3, Interesting)
        
        by r00t ( 33219 ) writes:
        
        The proper fix is to run multiple copies of the benchmark.
        
        I'm using Linux, with single-threaded apps, but so what? I run lots of things at once:
        
        X, window manager, xterm, editor -- that is 4, plus the kernel
        
        X, xterm, tar, gzip -- that is 4, plus the kernel
        
        X, xterm, make, bash, cc1, cc1, cc1, gas, gas, ld... -- that's a lot of things!
    - - vector units mostly sit idle (Score:2)
        
        by r00t ( 33219 ) writes:
        
        The main use of vector units is running crappy Windows gamer "benchmarks" and MacOS Photoshop "benchmarks". The games don't even use the vector units all that much. It's just the benchmarks that use the vector units.
        
        In the real world, vector units aren't good for much at all. You can do radar processing with them, but that isn't exactly a desktop app. Linux can use them for software RAID.
- - Re: (Score:2)
    
    by GreatDrok ( 684119 ) writes:
    
    "perhaps you need to write some more cache efficient code to test with. goto BLAS can feed the beast like no other."
    
    goto BLAS uses SSE so doesn't count. It has already been acknowledged that the SSE implementation of Core 2 Duo is very good. The new AMD chips may address this but we won't know until we see the benchmarks. For non-SSE the Core 2 Duo is a little better than the Core Duo which was similar to the PIII/PII/PentiumPro clock for clock. The current Opteron is much quicker clock for clock for no
- - Re: (Score:2)
    
    by MemoryDragon ( 544441 ) writes:
    
    No, but after reading this thread, I came to the conclusion that most people in here are completely clueless about multiprocessing and multithreading and that is what it is all about. If you dump one thread on a single core machine and on a multicore machine you probably will get better results on the single core machine, if you raise the thread number, the single core machine will relatively early reach its peak while the multicore machine will reach its performance peak way later, and that is all about. O
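The scaling argument in the comment above can be put as a toy model. This is only a sketch under loud assumptions: `throughput`, the per-core rates and the core counts are made-up illustration values, and the model ignores scheduling overhead, caches and memory bandwidth.

```python
def throughput(threads, cores, per_core_rate):
    """Toy model: each runnable thread keeps one core busy, up to the core count."""
    return min(threads, cores) * per_core_rate

thread_counts = (1, 2, 4, 8)

# Hypothetical machines: one faster single core vs. four slightly slower cores.
single_core = [throughput(n, cores=1, per_core_rate=1.2) for n in thread_counts]
quad_core = [throughput(n, cores=4, per_core_rate=1.0) for n in thread_counts]

for n, s, q in zip(thread_counts, single_core, quad_core):
    print(f"{n} thread(s): single-core {s:.1f}, quad-core {q:.1f}")
```

In this model the single core wins with one thread and is flat from then on, while the quad core keeps gaining until all four cores are busy.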
Wow (Score:2)

by 2ms ( 232331 ) writes:

I don't think I've ever read such an admiring review of a CPU design. The last time I remember a chip sounding so fantastic was the Alpha, something like ten years ago. If all the new things really work the way they sound in theory, then yeah, I guess it's evident this Barcelona thing is really going to be something else.

The design for VW performance sounds extra interesting
how do they fit a fourwheeler in the chip? (Score:2, Funny)

by pseudosero ( 1037784 ) writes:

I want a floating quad.
SSE128 means... (Score:2)

by adam31 ( 817930 ) writes:

Does SSE128 mean some significant departure from the doomed SSE instruction set?
I'm not kidding. In SSE I'm familiar with, one of the input registers is always an output register, which means its contents are destroyed. Another flaw is that there aren't enough registers... SSE uses 8, where 32 are commonly not enough when latency is longish (especially with SoA-style programming, where pragmatically a single vec3 occupies 3 128-bit registers).
... or Madd. You know, multiply-add. Does it have that?
- Re: (Score:2)
  
  by philthedrill ( 690129 ) writes:
  
  Does SSE128 mean some significant departure from the doomed SSE instruction set?
  
  No. It means 128 bit SSE ops can be done in a single cycle instead of two (64-bit chunks).
  
  In SSE I'm familiar with, one of the input registers is always an output register, which means its contents are destroyed
  
  How is this different from regular x86 (non-SSE) instructions? They have two operands where one is a source and destination.
  
  Another flaw is that there aren't enough registers... SSE uses 8
  
  AMD64 specifies 16 SSE (XMM) regi
  - Re: (Score:2)
    
    by edwdig ( 47888 ) writes:
    
    The trade off to have 32 registers was probably not worth the die space and extra complexity. Having 16 probably gave most of the benefit, and having 32 provided diminishing returns.
    
    At least with the general purpose registers, AMD wanted to go to 32, but couldn't do it without changing the instruction set. I'd assume the same thing applies to the SSE registers.
    - Re: (Score:2)
      
      by philthedrill ( 690129 ) writes:
      
      At least with the general purpose registers, AMD wanted to go to 32, but couldn't do it without changing the instruction set. I'd assume the same thing applies to the SSE registers.
      
      How so? Unless I'm missing something here, I think the only cost is in the size of the register file and rename register set, but nothing ISA-related.
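The "one pass instead of two 64-bit chunks" point from this thread can be illustrated with a small model. This is a sketch only: `simd_add` and its parameters are hypothetical names, plain Python floats stand in for 32-bit lanes, and nothing here reflects how the real execution units are built.

```python
def simd_add(a, b, datapath_bits, register_bits=128, lane_bits=32):
    """Add two 4-lane vectors, processing only as many lanes per pass
    as the (modelled) execution unit is wide."""
    lanes_per_pass = datapath_bits // lane_bits
    total_lanes = register_bits // lane_bits
    result = []
    passes = 0
    for start in range(0, total_lanes, lanes_per_pass):
        end = start + lanes_per_pass
        result.extend(x + y for x, y in zip(a[start:end], b[start:end]))
        passes += 1
    return result, passes

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

narrow_result, narrow_passes = simd_add(a, b, datapath_bits=64)   # split in halves
wide_result, wide_passes = simd_add(a, b, datapath_bits=128)      # full width
print(narrow_passes, wide_passes)
```

Both calls produce the same four sums; only the number of passes differs, which is the whole claim: same instruction set, faster internal implementation.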
Great (Score:3, Funny)

by Trogre ( 513942 ) writes: on Saturday February 10, 2007 @05:33AM (#17961096) Homepage

So now I'll see four penguins at startup!

Barcelona vs Itanium in single and double float ? (Score:2, Interesting)

by tchiwam ( 751440 ) writes:

What really interests me is how it compares in single and double precision calculations. If AMD gets in the range of Itanium performance, will Intel follow and kill their own Itanium by boosting Core 2 FP?
SUNDAY SUNDAY SUNDAY (Score:2)

by un1xl0ser ( 575642 ) writes:

Can we start rejecting 'scoops' that sound like a radio/TV demolition derby or monster-truck madness advertisement?
Junk article, full of inaccuracies. (Score:5, Informative)

by barracg8 ( 61682 ) writes: on Saturday February 10, 2007 @08:43AM (#17961908)
- Each of Barcelona's four cores incorporates a new vector math unit referred to as SSE128
SSE has always been 128bit (the 64bit simd extensions were called MMX). AMD used to funnel the instructions through a 64bit execution unit by splitting the work into two halves, the new core has a full 128bit SSE pipeline so doesn't need to split the operations. Nothing new here, just a faster internal implementation. Can this deliver an 80% improvement in benchmark performance? - quite possibly. Take a look at the Core2 FP performance numbers - it also has a full 128bit implementation of SSE.
- And separating integer and floating-point schedulers also accelerates this thing called virtualization
Huh. Hardware virtualization affects how the processor handles certain instructions such as privileged operations. FP instruction execution is unaffected. Virtualized workloads will benefit no more than non-virtualized workloads. Separate issue queues are good but does it specifically benefit virtualization? - no.
- Barcelona blacks out power to individual portions of the chip that are idled, from in-core execution units to on-die bus controllers. This hasn't made it into PCs before ...
Intel call this 'intelligent power capability'.
http://www.intel.com/technology/magazine/computing/core-architecture-0306.htm?iid=search& [intel.com]
- Barcelona adds Level 3 cache, a newcomer to the x86
Xeons have featured L3 caches for years. http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors [wikipedia.org]
- Barcelona is genius, a genuinely new CPU that frees itself entirely of the millstone of the Pentium legacy.
- Barcelona is a new CPU, not a doubling of cores and not extensions strapped on here and there.
Barcelona is an Opteron, with a doubling of cores and some extensions strapped on here and there.
I'm not meaning to detract from AMD here - the fact that they have still not had to make any radical changes to the opteron micro-architecture is a testament to the quality of the original design. They are slightly ahead of the game on virtualization - they're going to beat Intel to nested page tables - but other than that this chip is playing catchup. Overall this is going to be a very nice piece of kit to work with. But nothing radical and new here.
G.
- Re: (Score:3, Informative)
  
  by ocbwilg ( 259828 ) writes:
  
  Xeons have featured L3 caches for years. http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors [wikipedia.org]
  
  Actually, if you go waaaay back to the Socket 7 days you could have L3 cache as well. The AMD K6 and K6-2 CPUs only had on-die L1, and the L2 cache was on the mainboard. But the K6-3 CPU had 256KB or 512KB of on-die L2 and was compatible with the same mainboards. So when you put that K6-3 in a socket 7 mainboard the mainboard's cache actually functioned as L3. Sure it wasn't on-chip, but L3 cache
- Re: (Score:3, Informative)
  
  by ScriptedReplay ( 908196 ) writes:
  I fully agree, the article is mainly empty of information - it took words from AMD briefings and produced a meaningless salad.
  
  Now, as far as some claims, in detangled order:
  
  FPU boost: this seems to be based on several things - one is the obvious widening of SSE2 issues. Others are increasing instruction fetch from 16B/cycle to 32B/cycle, making the FPU scheduler 128bit, unaligned loads and a doubling of cache bandwidth.
  Virtualization: Nested page tables and reduced switching times for the hypervisor.
  Powe
  - Re: (Score:2)
    
    by Bj�rn ( 4836 ) writes:
    
    Here is another article or post [siliconinvestor.com] that has a relatively long lists of the improvements in Barcelona.
- Re: (Score:2)
  
  by mczak ( 575986 ) writes:
  
  * And separating integer and floating-point schedulers also accelerates this thing called virtualization
  Separate issue queues are good but does it specifically benefit virtualization? - no.
  True. Additionally, the article implies this is something new. All K8 chips (=Opterons, Athlon64) however always had separate schedulers for float and int instructions (in contrast to the intel core2 chips, so amd is touting that as an advantage - it's more of a design choice than really a simple "better" or "worse" for either solution probably). There is a reason the codename of Barcelona is K8L! As you mentioned, it'
  - Re: (Score:2)
    
    by Bj�rn ( 4836 ) writes:
    
    There is a reason the codename of Barcelona is K8L! As you mentioned, it's certainly not somehow a completely new chip.
    From an article [theinquirer.net] in The Inquirer:
    "WE'VE BEEN HEARING the "K8L" codename for ages now, but we can say now, straight from the horse's mouth, K8L was never a codename for AMD's upcoming generation of chips."
    If we are to believe the article, K8L was apparently the code name for the Turion64 where the L stands for Low-power. K9 was the X2 processors, so that would make the upcoming Barcelon
Barcelona??? (Score:2, Funny)

by master_p ( 608214 ) writes:

Rumours have it that their next CPU model will be named 'Real Madrid'...
Paging Tables (Score:5, Informative)

by Doc Ruby ( 173196 ) writes: on Saturday February 10, 2007 @11:16AM (#17962778) Homepage Journal

Nested paging tables is a per-core feature that will light the afterburners on x86 hardware virtualization. A paging table holds the map that translates virtual memory addresses to physical memory addresses, and each CPU core has only one. Virtual machines have to load and store their page tables as they get and lose their slice of the CPU. AMD solved the problem with nested paging tables. Simplified, each VM maintains its own paging table that stays fixed in place. Instead of loading and saving paging tables as your system flips from VM to VM, your system just supplies Barcelona with the ID of the virtual machine being activated. The CPU core flips page tables automatically and transparently. This is another feature that's implemented for each core.

Context-switching has long been the weakest design point for x86 in "PCs", especially servers. x86 arch is rooted in single-user, single-threaded, single-context apps. The in-core registers that CPU operations execute directly against have to be swapped out for each context switch. In *nix, that means every time a different process gets a timeslice, it's got to execute two slow copies between registers and at best cache RAM, at worst offchip RAM (over some offchip bus). If the register count is larger than the bus width (even onchip), that's another multiple on that slow cycle. That context-switch overhead can be larger than the timeslice allocated to each process's "turn" in the schedule for lower-latency / higher-response (lower "nice") processes, approaching realtime.

Unix was designed for multiusers, context-switching from the beginning. The chips it's run on coevolved with it. Linux arrived when x86 CPUs ran fast enough that context-switching was OK, but still a big waste compared with, say, MicroVAX multiple register sets. Windows architecture is rooted in the x86 architecture that DOS was designed for, though perhaps Vista has finally lost all of the old design baggage originated in the 8088/8086, but its long history of UI multitasking means it's context-switching all the time, which will gain in speed. The MacOS switch to BSD means it's got lots of power bound up in the context switches that could be released with Barcelona.

So while low-level benchmarks might show something like 80% FPU improvement, the high level (application) performance could improve quite a lot more. Recompiling apps to machine code that exploits more registers without the context-switching penalties could find multiples, especially apps with realtime multimedia that run concurrently with other apps. Intel's hyperthreading already gets past some of these bottlenecks in distributing tasks among multiple cores, but the Barcelona paging tables go even deeper, for likely extra performance (on top of Barcelona's own hyperthreading and new L3 cache).

Aside from the marketing "vapormarks" we'll surely see out of AMD (and their sockpuppets) before it's actually released "midyear", I'm looking forward to seeing how this thing really runs in multitasking apps. I'm expecting "like a greased snake across a griddle".
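The nested paging idea quoted at the top of this comment can be sketched as a two-stage address translation. This is a toy model under stated assumptions, not AMD's implementation: the function names, the single-level dict "page tables" and the VM names `vm_a` / `vm_b` are all invented for illustration.

```python
PAGE_SIZE = 4096

def translate(table, addr):
    """Map an address through a one-level table (page number -> frame number)."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise KeyError(f"page fault at {addr:#x}")
    return table[page] * PAGE_SIZE + offset

def nested_translate(guest_table, nested_table, guest_virtual):
    """Stage 1: guest virtual -> guest physical (the guest OS's own table).
    Stage 2: guest physical -> host physical (the per-VM nested table)."""
    guest_physical = translate(guest_table, guest_virtual)
    return translate(nested_table, guest_physical)

# Both hypothetical VMs happen to use the same guest page table layout...
guest_table = {0: 5, 1: 7}
# ...but each VM has its own nested table, selected by VM id on a switch.
nested_tables = {"vm_a": {5: 40, 7: 41}, "vm_b": {5: 90, 7: 91}}

host_a = nested_translate(guest_table, nested_tables["vm_a"], 0x1010)
host_b = nested_translate(guest_table, nested_tables["vm_b"], 0x1010)
print(hex(host_a), hex(host_b))
```

Switching VMs in this model means picking a different entry of `nested_tables`; nothing is copied or rebuilt, which is the saving the quoted paragraph is describing.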

- Re:Honestly... (Score:5, Insightful)
  
  by mabinogi ( 74033 ) writes: on Saturday February 10, 2007 @03:43AM (#17960600) Homepage
  
  I don't care if it's 65nm, 45nm or 10mm - that's a completely irrelevant (to me as a user and purchaser) implementation detail. I care about the results - how fast is it for my workloads? How much is it? How much power does it use?
  
  Obsession about process size is sillier than obsession over clock speeds.
  
  If AMD can produce a better performing chip at 65nm, then who the hell cares if Intel - or anyone else - move to a 45nm process?
  
  - Re: (Score:2)
    
    by Khyber ( 864651 ) writes:
    
    I would care. POWER CONSUMPTION/EFFICIENCY. If I want a space heater, I'll stick with a 3.4 GHz P4 with HyperThreading. I DON'T WANT ONE. As it is, for what I like doing and for what I want to do, current-gen processors work just fine. I can play my games, make my music, draw shit, upload data, and check out sites like this, while maintaining my bank account, talk with other people, and more, at the same time. I got over the clock speed thing the second I actually owned a G3. Granted, Windows emulation su
    - Re: (Score:2)
      
      by Dahamma ( 304068 ) writes:
      
      You haven't really added much from your original post. Die shrink is an implementation detail, probably something you read that sounded "futuristic"... The real goals are performance, and (usually secondarily) power consumption. Doesn't really matter how they achieve those goals.
      
      I agree with you on one point - I think as with your requirements, the goal for the average non-technical home consumer should be focused more on efficiency than multi-core 64 bit 4MB cache, etc. But not everyone spends 95% of t
  - I neglected to mention something else... (Score:3, Interesting)
    
    by Khyber ( 864651 ) writes:
    
    Clock speed doesn't mean crap anyways. It's all in the code. I see guitar tuning programs for the computer... TEN megs in size, slow as hell, and inaccurate! I believe APTuner is FAR smaller than most, faster and far more accurate. People just don't know how to code, plus the fastest ways to code are copyrighted, which they shouldn't be since they'd be utterly obvious to any programmer with that standard "ordinary" knowledge in that language, so one has to make workarounds that inevitably end up being slower
    - Re: (Score:2)
      
      by cduffy ( 652 ) writes:
      
      the fastest ways to code are copyrighted, which they shouldn't be since they'd be utterly obvious to any programmer with that standard "ordinary" knowledge in that language
      
      An individual implementation can be copyrighted. A way of doing something can't be covered by copyright, and needs to be patented. That's what you meant, right?
  - Re: (Score:2)
    
    by suv4x4 ( 956391 ) writes:
    
    I don't care if it's 65nm, 45nm or 10mm - that's a completely irrelevant (to me as a user and purchaser) implementation detail.
    
    This is Slashdot. We care about those details. You can read more about the "super fast, super cool, super cheap!" market speak on the company's official press releases section.
  - Re:Honestly... (Score:5, Insightful)
    
    by epine ( 68316 ) writes: on Saturday February 10, 2007 @05:49AM (#17961160)
    
    If AMD can produce a better performing chip at 65nm, then who the hell cares if Intel - or anyone else - move to a 45nm process?
    
    Feature size has denominated progress (as measured either by raw performance or performance per watt) over an unbroken 30 year period. Do you recall the very passionate debates about RISC vs CISC? Did a RISC design at one feature size ever beat a CISC design at the next shrink? I think not. Design has never mattered anywhere near as much as feature size. Not that you can't get design wrong. But then you can get a shrink wrong, too, and end up with 1% yields. AMD managed briefly to remain competitive with Intel playing a full shrink behind when Intel did that rather stupid marketron-driven face-plant into the thermal wall (against good advice from their Israel team, who later came to the rescue with Core Duo).
    
    With the recent skyrocket of leakage current, the holy grail of feature size is somewhat tarnished, but it still dominates the performance curve. You completely missed the relationship between feature shrinks and the performance crown. If Intel has better process technology than AMD (almost always) and AMD has a better design (most of the time since the Athlon was first launched) and both companies shrink every 18 months following the Moore projection (that unbroken 30 year historical trend) and AMD always shrinks 9 months behind Intel, then the performance crown will pass back and forth exactly as often as either company announces their next product.
    
    So I agree with you: feature size has no importance to the customer who wants performance for their dollar. Except that you can set your clock by it and project ten years into the future effective performance levels of shrinks we haven't even seen yet. Except for that part, yeah, I'm with you.
    
  - Re: (Score:3, Interesting)
    
    by jmv ( 93421 ) writes:
    
    If AMD can produce a better performing chip at 65nm, then who the hell cares if Intel - or anyone else - move to a 45nm process?
    
    They care. Just moving the chip from 65 nm to 45 nm means you can produce twice as much on the same silicon wafer. Also, if a 65 nm chip performs well, then a 45 nm version of it (with slight modifications of course) will work even better.
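The "twice as much on the same silicon wafer" figure follows from simple area arithmetic. A rough sketch, assuming an ideal shrink where every dimension scales with the feature size; `area_scale` is a hypothetical helper, and real dies lose some of this to wafer-edge waste, yield and parts that do not shrink.

```python
def area_scale(old_nm, new_nm):
    """Ideal die-area ratio after a shrink: both width and height scale linearly."""
    return (new_nm / old_nm) ** 2

scale = area_scale(65, 45)    # fraction of the original die area
dies_ratio = 1 / scale        # how many more dies fit in the same wafer area

print(f"area: {scale:.2f}x, dies per wafer: {dies_ratio:.2f}x")
```

Under these assumptions the 45 nm die needs a bit under half the area, so roughly twice as many fit per wafer.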
    - Re: (Score:2)
      
      by zCyl ( 14362 ) writes:
      
      Just moving the chip from 65 nm to 45 nm means you can produce twice as much on the same silicon wafer. Also, if a 65 nm chip performs well, then a 45 nm version of it (with slight modifications of course) will work even better.
      But how much does this really affect the retail price of a cpu? From randomly googling around, it looks like silicon wafer cost only translates to a dollar or two per cpu, so who cares if they can drop this expense by half? Surely other factors would be more important for cost, suc
  - Re: (Score:2)
    
    by Charcharodon ( 611187 ) writes:
    
    don't care if it's 65nm, 45nm or 10mm - that's a completely irrelevant (to me as a user and purchaser) implementation detail. I care about the results - how fast is it for my workloads? How much is it? How much power does it use?
    The best way of looking at things. I started off Intel and stuck with them up to about 1Ghz, jumped ship stayed with AMD until my X2 3800, now I'm back to Intel with a Duo 2 6600. We'll see in 1-2 years who I'll be with next. The same goes for video cards and soda. Peps
  - Re: (Score:2)
    
    by Deliveranc3 ( 629997 ) writes:
    
    I don't care if it's 65nm, 45nm or 10mm - that's a completely irrelevant (to me as a user and purchaser) implementation detail. I care about the results - how fast is it for my workloads? How much is it? How much power does it use?
    
    Obsession about process size is sillier than obsession over clock speeds.
    
    If AMD can produce a better performing chip at 65nm, then who the hell cares if Intel - or anyone else - move to a 45nm process?
    
    If you move to a smaller transistor size you get more processors per w
  - - Re:Honestly... (Score:5, Insightful)
      
      by mabinogi ( 74033 ) writes: on Saturday February 10, 2007 @04:38AM (#17960858) Homepage
      
      45nm is not inherently "better" than 65nm any more than 3Ghz is inherently "better" than 1Ghz. A smaller process size is a means to an end, it's not an end in itself.
      
      The end is the delicate balance of improving performance / watt while increasing overall performance and keeping the price down. If AMD can deliver a chip that does a better job of that at 65nm than an Intel 45nm one, then the AMD chip is not somehow "worse" than the Intel one just because it doesn't use 45nm. That's just stupid.
      
      I'm not saying AMD can do that, but I think that criticizing them for not being ready for 45nm yet is more than premature.
      AMD's actually guilty of the same flawed logic though - their criticism of Intel's 4 core processor being just 2 dual cores stuck together is just as pointless. It doesn't matter; what matters is how well the processor meets the requirements of its target market.
      
      - Re: (Score:3, Interesting)
        
        by OrangeTide ( 124937 ) writes:
        
        Likely Intel has an edge because they are [almost] ready for 45nm process, while AMD is just getting started on 65nm.
        
        But it is interesting to see the two companies approach the problem from different ends. Do you improve the silicon process or do you alter the architecture and instruction set? I bet you the best answer will be to do both.
        
        quad cores that actually share cache would be nice. these double duals kind of suck because architecturally they can never share cache. although AMD and Intel don't have ve
      - Re: (Score:2)
        
        by l3v1 ( 787564 ) writes:
        
        AMD's actually guilty of the same flawed logic though - their criticism of Intel's 4 core processor being just 2 dual cores stuck together is just as pointless. It doesn't matter; what matters is how well the processor meets the requirements of its target market.
        
        I don't think it's about that. I mean, since Intel quickly pumped out something which seems like being 4 core cpu which took far shorter than to develop a new quad core design cpu makes them seem to lag behind, so what can you do to explain ? Not mu
        
        Re: (Score:2)
        
        by Gr8Apes ( 679165 ) writes:
        
        This is way more than a mere quad core design. I was hoping to impart that. It is actually dropping some of the legacy x86 architecture internally, and adding big-iron features - the nested paging per core will be a huge plus for businesses that run multiple CPU machines with lots of virtualization for instance.
        
        The separated schedulers for floating and integer math allows for more parallelism, another speed up.
        
        The shared L3 and reduced latency L2 caches should put Barcelona ahead of Clovertown's split caches
      - Re: (Score:3, Interesting)
        
        by Targon ( 17348 ) writes:
        
        The problem I have with performance/watt is that it distorts the true "value" to the system owner. You NEED to break it down, because while power usage is important, the real issue comes down to "is the higher performance WORTH the extra power the chip draws". I personally don't CARE about performance/watt, except when the power draw is excessive, and I believe that is how MOST people will look at it.
        
        Most laptop processors have a higher performance/watt than desktop processors because they are designed
      - Quad Core (Score:3, Interesting)
        
        by Gary W. Longsine ( 124661 ) writes:
        
        Actually, from the important perspective of the difficulty of building a new machine around it, the Intel "dual-dual" core chips really are quad core -- they drop into the same socket as the previous dual core chip, placing four cores into the socket. That certainly helped speed the time to market for the chip.
- Re:Intel's Responds (Score:5, Interesting)
  
  by pchan- ( 118053 ) writes: on Saturday February 10, 2007 @04:31AM (#17960822) Journal
  
  "Lets make a Octa-core processor!"
  
  Oh, here's one [sun.com]. Though it's been out since before Intel had quad-core chips.
  
- Re: (Score:3, Informative)
  
  by Anonymous Coward writes:
  
  8 core (two quad core chips in a single package) is already on Intel's internal roadmaps.
  
  (this was anonymous for a reason)
- Re:If its true (Score:5, Funny)
  
  by Anonymous Coward writes: on Saturday February 10, 2007 @04:54AM (#17960940)
  
  Three quad cores for the pasty-nerds under the sky,
  Seven for the WoW-nerds in their halls of stone,
  Nine for Diablo Men doomed to die,
  One for the Dark Nerd on his dark throne
  In the Land of Silicon where the corporations lie.
  One quad core to rule them all, One quad core to find them,
  One quad core to bring them all and in the darkness bind them
  In the Land of Silicon where the corporations lie.
  He paused, and then said in a deep voice,
  This is the Master-quad core, the One quad core to rule them all.
  
- Re: (Score:2)
  
  by Khyber ( 864651 ) writes:
  
  Who says news can't come from an advertising section? It's still a source of information.....
- Re: (Score:2)
  
  by OrangeTide ( 124937 ) writes:
  
  I think I would cry if the answer was "No".
- Re: (Score:2)
  
  by OrangeTide ( 124937 ) writes:
  
  well if your cpu ever gets powerful enough to do some sort of extremely computationally expensive compression (like with fractals or something?). maybe you could squeeze a little bit more out of your slow link?
  
  I like dial-up, nobody can call me (one phone line, disable call waiting), and I really only do IRC and text browsing. Honestly who wants to give the cable company or phone company $50 a month, those bastards are rich enough.
- Re: (Score:2)
  
  by drsmithy ( 35869 ) writes:
  
  Intel comes up with some harebrained scheme that "More is better!". (like Viagra) They design something new and decide to make it faster (or in this case just glue more of them together). Back in the day it was the "GHz" now it's all about how many "Cores" you got. This tactic seems to suit Intel quite well and dethrones AMD for about a year and a half... During this time AMD massively redesigns their chips to integrate new, emerging technologies. The gamers and server operators of the world sit by their
  - Re: (Score:2)
    
    by RightSaidFred99 ( 874576 ) writes:
    
    Yeah, that is pretty cute. I know _I_ certainly want to claim that my CPU is more "technologically advanced" and don't really care about performance. I'm sure server operators are the same way. I'd take the most technologically advanced CPU in the world even if it only ran at 80286 speeds, because then I get nerd bragging rights!!
    "Hey, boss, we need to buy another 100 machines to support these validation runs. Or we could buy 80 machines of this other brand which will accomplish the same thing and save
- Re: (Score:3, Interesting)
  
  by canuck57 ( 662392 ) writes:
  
  I will not be surprised if AMD dethrones Intel again. It is a classical Intel vs. AMD battle...
  I am not sure Intel ever did beat out AMD.
  I went down to Best Buy where the Intel rep was hard peddling a Core Duo 2 machine and compared his $1500 machine to an AMD X2 clearance one for $600. I had nothing to do that day but be a clown, so I went and got a DVD with software on it, and said these are both XP right? Copy the contents to the hard drive and compress it. I am going to measure it. Core Duo 2 result
