IBM Hardware

IBM Unveils Fastest Microprocessor Ever

Posted by samzenpus
from the greased-lightning dept.
adeelarshad82 writes "IBM revealed details of its 5.2-GHz chip, the fastest microprocessor ever announced. IBM described the z196, a chip costing hundreds of thousands of dollars that will power its Z-series of mainframes. The z196 contains 1.4 billion transistors on a chip measuring 512 square millimeters fabricated on 45-nm PD SOI technology. It contains a 64KB L1 instruction cache, a 128KB L1 data cache, a 1.5MB private L2 cache per core, plus a pair of co-processors used for cryptographic operations. IBM is set to ship the chip in September."
This discussion has been archived. No new comments can be posted.

  • by TaoPhoenix (980487) <TaoPhoenix@yahoo.com> on Thursday September 02, 2010 @08:01AM (#33447790) Journal

    So what is this beast supposed to be, a 64 core machine?

    Didn't we retire the GHz wars 5 years ago? I know, AMD style "more done per cycle", but isn't a quad core 3.1 GHz per chip with 20% logistic overhead faster?

  • by Vectormatic (1759674) on Thursday September 02, 2010 @08:15AM (#33447912)

    Intel's NetBurst architecture (of Pentium 4 fame) featured the 'Rapid Execution Engine', which consisted of two ALUs running at double the clock speed. On a 3.8 GHz Pentium 4, that would be 7.6 GHz.

    Granted, that is not the entire CPU, but still..

  • Re:Microchip? (Score:3, Interesting)

    by the_fat_kid (1094399) on Thursday September 02, 2010 @08:19AM (#33447944)

    iChip?

  • by Carewolf (581105) on Thursday September 02, 2010 @08:24AM (#33447998) Homepage

    BTW, TFA mentions L1 cache per core but doesn't mention how many cores this chip scales up to. Could it be just one?

    It later mentions using 128Mbyte just for level 1 cache, so that would be around 1024 cores.
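
The core-count estimate above is a simple division, sketched here. The 128 MB total is the commenter's reading of the article, not a confirmed specification, and the function name is made up for illustration. Note that the estimate changes if the 64KB instruction cache from the summary is counted alongside the 128KB data cache.

```python
# Figures taken from the story summary and the comment above.
# The 128 MB total is an unconfirmed reading of the article.
L1_TOTAL_KB = 128 * 1024   # 128 MB expressed in KB
L1_DATA_KB = 128           # L1 data cache per core
L1_INSTR_KB = 64           # L1 instruction cache per core


def cores_implied(total_kb, per_core_kb):
    """Hypothetical helper: how many cores a total L1 budget would cover."""
    return total_kb / per_core_kb


data_only = cores_implied(L1_TOTAL_KB, L1_DATA_KB)
data_plus_instr = cores_implied(L1_TOTAL_KB, L1_DATA_KB + L1_INSTR_KB)

print(data_only)               # 1024.0
print(round(data_plus_instr))  # 683
```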

  • by Sycraft-fu (314770) on Thursday September 02, 2010 @08:51AM (#33448228)

    These days, compilers take care of almost everything. It has gotten complex to the extent that a programmer trying to do things all in assembly will probably do a worse job than a good compiler. Chips have many, many tools to solve their problems.

    That isn't to say it is never done; in some programs there may be some hand-optimized assembly for various super speed-critical functions. However, even then it is most likely written in a high-level language, compiled to assembly (you can order most compilers to do that), tuned and then put back in the program.

    Memory is cheap and compilers are powerful so assembly is just not as needed as it once was, at least on desktops/servers where you see these massive chips.

  • by Anonymous Coward on Thursday September 02, 2010 @10:01AM (#33449382)

    But the main thing is that not all programs are multi-threaded, and a program with a single thread can only run on one processor. So yeah, GHz are still useful. Maybe for large single-thread batch processing - which is the kind of thing a mainframe would do.
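
One common way to put numbers on this point is Amdahl's law, which the comment does not name but which models the same idea: the serial part of a program limits what extra cores can buy. A minimal sketch, with the parallel fractions chosen purely for illustration:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# A purely single-threaded job gains nothing from extra cores.
single_threaded = amdahl_speedup(parallel_fraction=0.0, cores=64)

# Even a job that is 90% parallel stays below 10x on 64 cores.
mostly_parallel = amdahl_speedup(parallel_fraction=0.9, cores=64)

print(single_threaded, mostly_parallel)
```

For the single-threaded case the only lever left is making that one core faster, which is where clock speed still matters.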

    I'm betting the code used on these z196 systems is multi-threaded. Shit, if you're paying hundreds of thousands of dollars per CPU you can afford some top notch programmers.

    Actually I think this mainframe is for getting the last little bit of performance out of thirty-year-old COBOL code. And the original top-notch programmers are long dead.

    A lot of people think mainframes are "faster" than PCs. They aren't. Modern mainframes use microprocessors just like PCs, but generally somewhat slower, since mainframes aren't intended to be bleeding-edge. What really makes a mainframe a mainframe is attitude (none of this "reboot and start over") and throughput. Mainframes are optimized for doing lots of concurrent I/O.

    However, recent trends in mainframes have also had some additional considerations. One is virtualization. You can have many - in some cases thousands - of Linux VMs in the system. The other is Java. IBM would dearly love to sell you Java on a mainframe. And considering what a dog WebSphere can be, it almost demands a mainframe.

    Forget the top-notch programmer nonsense. The watchword on software these days is "Git 'er Dun!". And IBM is hardly setting a standard, considering how much cheap offshore labor they use internally.

  • by asliarun (636603) on Thursday September 02, 2010 @10:10AM (#33449556)

    "clockspeed is NOT related to throughput"

    Of course it is. It is not, however, the only factor, and other factors may indeed (and commonly do) outweigh it.

    You took my comment out of context. I was responding to the original post that focused purely on clock speed as a magic mantra. What you say is only true if you are talking about a clock speed increase within the same microarchitecture, ceteris paribus. Making a blanket claim that we have the fastest CPU because we have clocked it at 5 GHz means nothing. I could overclock a P4 to 5 GHz using exotic cooling and my laptop would still probably beat it in terms of performance.
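
A rough way to see why clock speed only compares cleanly within one microarchitecture is to model throughput as instructions per cycle (IPC) times clock rate. The sketch below is a first-order simplification, and the IPC values are hypothetical, not measurements of any real chip.

```python
def instructions_per_second(ipc, clock_ghz):
    """First-order throughput model: instructions per cycle times cycles per second."""
    return ipc * clock_ghz * 1e9


# Hypothetical IPC values, chosen only to illustrate the argument.
high_clock_chip = instructions_per_second(ipc=0.8, clock_ghz=5.0)
lower_clock_chip = instructions_per_second(ipc=2.5, clock_ghz=2.4)

# The chip with less than half the clock rate still comes out ahead.
print(lower_clock_chip > high_clock_chip)
```

Within a single microarchitecture the IPC term stays roughly fixed, so a higher clock does mean more throughput; across architectures both terms move.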

    I think you underestimate IBM's technical ability. They do have some idea of what they're doing.

    Of course they do. I wasn't talking trash about the chip. The point I was trying to make is that the days of exotic chips and boutique chip manufacturers are coming to an end, at least in the mainstream server space. IBM is just trying to be performance competitive and retain the mainframe server niche. If you notice the trend in servers, commodity servers are becoming more powerful and stable at a much faster rate than niche servers.

    Having said this, performance may not even be the most important consideration in large servers. Other factors like stability, ability to handle failures, platform, etc. are probably much more important. I suspect that sensationalized headlines like this are only a marketing ruse and meant for boasting rights.

    This is not to take anything away from IBM, I'm just making a comment on the overall trend and where this will eventually lead.

    That's like saying a Ferrari is a poor performance car because it can't compete against a Ford Focus on cost-per-max-speed or miles-per-gallon.

    Sorry, wrong analogy. I was actually being cautious when I said this since I hadn't really seen any benchmarks. Even on pure performance, I am not too sure if the IBM chip will really trounce the upcoming CPUs from Intel and AMD.

  • by gorzek (647352) <gorzek.gmail@com> on Thursday September 02, 2010 @11:25AM (#33451170) Homepage Journal

    Yeah, it's actually kind of funny how today's Intel desktop processors actually trace their lineage to the Pentium M, which was a mobile chip. When the Pentium 4 came around, the Pentium Pro (Pentium II, Pentium III) architecture was pretty much relegated to the mobile market while Pentium 4 represented their desktop line. As you said, they ran into heat (and power) issues with the Pentium 4s and basically had no more room for expansion there. They went back to the Pentium M, which was doing pretty nicely in the notebook space, and since it was low-power and efficient it became the basis for their future desktop CPUs--the Core line, in particular. They just stopped playing up the clock speed because that architecture's clock speeds were substantially lower than the Pentium 4, despite being able to do more work. I read once that a Pentium M could do about 40% more work than a Pentium 4 of the same clock, so in essence a 2GHz Pentium M was about as powerful as a 3.2 GHz P4.
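
As a quick check of the recalled figures, the arithmetic can be worked both ways. This sketch assumes "40% more work at the same clock" means a 1.4x per-clock multiplier; the numbers are the comment's own recollection, not benchmark data.

```python
pentium_m_ghz = 2.0
per_clock_advantage = 1.40   # "about 40% more work" at the same clock

# Pentium 4 clock that a 2 GHz Pentium M would match at a 1.4x advantage.
equivalent_p4_ghz = pentium_m_ghz * per_clock_advantage

# Per-clock advantage needed for 2 GHz to match a 3.2 GHz Pentium 4.
advantage_needed = 3.2 / pentium_m_ghz

print(equivalent_p4_ghz)   # 2.8
print(advantage_needed)    # 1.6
```

So the two recalled figures only line up loosely: a 40% advantage gives roughly a 2.8 GHz equivalent, while a 3.2 GHz equivalent would need closer to 60%.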

    Switching everything over to the low-power and parallel-friendly Pentium M line is probably one of the smartest things Intel ever did. They would've dug their own grave had they stuck with building on Pentium 4 to the bitter end.

  • by Anonymous Coward on Thursday September 02, 2010 @12:02PM (#33451934)

    There is a bank in Canada that had zero (as in none) downtime, scheduled or not, in twelve years. That includes hardware upgrades, software upgrades, application upgrades... This is what a mainframe is all about.

  • by David Greene (463) on Thursday September 02, 2010 @05:45PM (#33457664)

    A couple of things:

    In the first example, 'm' is not being moved to the constant data section. The constant vector being assigned to m is placed there. MSVC is missing the vectorization, not placement of constants into constant memory. You can see that it fetches the constant values from memory using scalar moves while gcc and icc use vector moves.

    I'm not familiar with MSVC switches but you might need to tell it explicitly to vectorize. I'm curious why you didn't try -ftree-vectorize with gcc, for example.

    Floating-point optimization is a tricky thing. Many compilers will be very conservative to retain bitwise equivalent results regardless of optimization level. Some will even go as far as maintaining bitwise equivalence between scalar and vector code. That can severely degrade optimization. Again, most compilers have a switch to enable "unsafe" floating-point optimization. This may be what's tripping up these compilers in some cases.
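
The reason reordering threatens bitwise equivalence is that floating-point addition is not associative. A minimal illustration in Python, whose floats are IEEE 754 doubles, rather than in any of the compilers discussed:

```python
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # the order a scalar loop would use
regrouped = a + (b + c)       # one possible order after reassociation

# The two groupings round differently, so the results are not bit-identical.
print(left_to_right == regrouped)   # False
print(left_to_right, regrouped)
```

A vectorizer that sums in a different order can change results in exactly this way, which is why a conservative compiler holds back unless told that such differences are acceptable.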

    NaNs are also an issue with floating-point. The compiler is not allowed to eliminate anything which might raise an exception.
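
A related consequence is that algebraic identities which look safe on paper do not hold once infinities and NaNs are possible, so a compiler cannot simply fold them away. A small sketch of the value behaviour in Python (it shows the resulting values only, not hardware exception flags; the function names are made up for illustration):

```python
import math


def times_zero(x):
    """Looks like it could be folded to the constant 0.0, but it cannot be in general."""
    return x * 0.0


def minus_self(x):
    """Looks like it could be folded to 0.0, but it cannot be in general."""
    return x - x


inf = float("inf")
nan = float("nan")

print(times_zero(2.0))               # 0.0
print(math.isnan(times_zero(inf)))   # True: inf * 0 is NaN
print(math.isnan(minus_self(inf)))   # True: inf - inf is NaN
print(nan == nan)                    # False: NaN never compares equal to itself
```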

    When encountering intrinsics, many compilers will do exactly as you say, as noted in the article. That's not a bug, it's a feature. When people use intrinsics, they usually are trying to hand-code something and often don't want the compiler to mess with it.

    Some of these tests (the shuffle one for example) are a little out-of-the-ordinary. Compiler developer time is at a premium and it's not worth doing these kinds of micro-optimizations if such code is never seen in the wild. That said, it's clear that some compilers (gcc, for example, and LLVM) do these sorts of things.

    On x86, it's often just fine to spill things to the stack and reload them. My studies show that the number of spills does not matter so much but rather what is spilled. So the number of loads/stores, while a gross indicator of performance, doesn't tell the whole story.

    The comparison test is, I think, one of those cases not worth optimizing. I can't recall ever seeing a vector compare where the operands are known statically. Doing that optimization would require loading static vectors of various combinations of 1s and 0s from memory. It is almost certainly faster to just do the compare. This isn't a missed optimization. In gcc's case it's the compiler doing what it should, regardless of what the programmer expects.

    Even so, these are interesting code examples. It would be neat to see what happens when we turn on -ftree-vectorize, use a newer gcc or try LLVM.
