Forgot your password?
typodupeerror
IBM Upgrades Hardware

IBM Mainframe Running World's Fastest Commercial Processor 158

Posted by timothy
from the overclocked-with-pencil-lead-of-course dept.
dcblogs writes "IBM's new mainframe includes a 5.5-GHz processor, which may be the world's fastest commercial processor, say analysts. This new system, the zEnterprise EC12, can also support more than 6-TB of flash memory to help speed data processing. The latest chip has six cores, up from four in the prior generation two years ago. But Jeff Frey, the CTO of the System Z platform, says they aren't trading off single-thread performance in the mainframe with the additional cores. There are still many customers who have applications that execute processes serially, such as batch applications, he said. This latest chip was produced at 32 nanometers, versus 45 nanometers in the earlier system. This smaller size allows more cache on the chip, in this case 33% more Level-2 cache. The system has doubled the L3 and L4 cache over the prior generation."
This discussion has been archived. No new comments can be posted.

IBM Mainframe Running World's Fastest Commercial Processor

Comments Filter:
  • ...gives me a bit of a cognitive dissonance sensation. It shouldn't, really, but it does. Is it just me?
    • by Anonymous Coward on Tuesday August 28, 2012 @11:37AM (#41149795)

      Mainframes run a surprising amount of critical workloads in the real world. They're vastly different than open systems, but they can be kept running through almost anything, if you're willing to spend enough money.

    • by gstoddart (321705) on Tuesday August 28, 2012 @01:30PM (#41151895) Homepage

      ...gives me a bit of a cognitive dissonance sensation. It shouldn't, really, but it does. Is it just me?

      It may not be just you. But I think a lot of people really have no idea of just how many mainframes are still chugging away doing what they've always done.

      My wife does outsourced SAN storage, and they still have a couple of clients with big iron running.

      Every couple of years when everybody has forgotten about the machines, an IBM tech will call up and say that the machine has phoned home and has a part that needs to be swapped out and that he needs to go onsite. Which usually leads to several hours of people trying to remember what it is and where it is (except the guys who work in the data center, who can't miss it).

      I've worked in several places that have had mainframes for literally decades. And I've even worked on a project or two which tried to replace ancient, purpose built software with some shiny new stuff. In the cases I've seen, after spending a few years a a few million dollars ... they still can't replace the mainframe and scrap the project.

      I knew someone in the early 2000's who had retired from his job with a full pension, and was back as a consultant making at least 3x his old salary because they no longer could find someone who knew the machines and the software like he did.

      Mainframes haven't gone away. Not by any stretch. And I bet this one still runs the stuff from the IBM 360 days quite nicely.

      • by mlts (1038732) *

        Mainframes do a bunch of tasks extremely well. The problem is that there is a "cheapest at any cost" mentality in IT, which is why this type of technology seems to be outmoded.

        If businesses looked at the TCO of a mainframe, oftentimes, they would be better off, especially because of the CPU power per square foot of server room space, which a mainframe excels at. This is also true to a lesser extent with the higher end Oracle SPARC and IBM POWER7 machines.

        The one advantage of mainframes is that once set up

  • by afidel (530433) on Tuesday August 28, 2012 @11:17AM (#41149487)

    How does the L4 cache in these processors work? Generally going to anything off die is going to induce a major latency penalty due to the need to go through a driver stage which can handle outside interference. How can they make the L4 cache fast enough that its small size doesn't make it basically pointless versus just going to main memory?

    • Small is a relative term. The L4 cache is almost 200 Mb on these. Of course, it all depends on the how the math works out. As long as it's faster than going to RAM there will be plenty of situations where it pays off.

    • by sjames (1099)

      The outboard cache is off the die, but within the same ceramic module. That means no sockets in the way and very short connecting wires. The cvache chips themselves can also be faster than the chips you'd find on a DIMM.

  • Ming Mecca (Score:5, Funny)

    by unixhero (1276774) on Tuesday August 28, 2012 @11:21AM (#41149541)
    That's a Ming Mecca chip. Those aren't even declassified yet!
    • I'm not sure the mainframe crowd will know this pop culture reference, and may end up thinking of the guy with a pointy beard from Flash Gordon.
      • by Jeng (926980)

        Or they will be like me, be interested, look it up, and then laugh.

        I've since added the movie to the queue.

        Then again I'm not part of the mainframe crowd. Damn cool kids with their expensive toys.

  • So it was my understanding that part of the reason consumer CPUs didn't tend to go above 3-4GHz was that, at those speeds, the electrons can't actually move through the wires fast enough. Specifically for doing memory reads -- at 5.5GHz, I'm calculating about 4cm per clock cycle -- which may be further than the memory is physically located on a normal desktop PC. Meaning it would take not just two, but possibly three or four clock cycles to read a value from main memory.

    Granted, on a server, main memory may

    • by Anonymous Coward on Tuesday August 28, 2012 @11:36AM (#41149769)

      CPUs have not accessed main memory synchronously in decades. There are many hundreds of cycles lost if the processor stalls on a RAM access, not just from the length of the wiring but the addressing logic too. In fact, modern CPUs don't do word-level access to RAM, but rather pull in whole cache lines in a more packetized memory access protocol. Even in a multi-CPU SMP system, they don't actually communicate through system RAM anymore, but rather communicate CPU-to-CPU with a cache coherency protocol that provides the illusion of a shared system RAM. Each CPU really has its own set of local RAM behind its own cache and on-chip memory controller.

      Even the L2 or L3 caches are unable to keep up with the CPU, but they are still significantly faster than system RAM, so they still help when the working set can fit there.

      • by Rockoon (1252108) on Tuesday August 28, 2012 @12:19PM (#41150461)
        To add to this, the Sandy Bridge has an L1 latency of 4 or 5 cycles (depending on access mode), the L2's latency is 12 cycles, and the L3's latency is 46 cycles plus the response time of the memory chips (typically between 60ns to 70ns)

        These chips make up for the high latencies by having many instructions being executed simultaneously, so if one dependency chain completely stalls out on a cache miss any other dependency chains can still fill up the execution units keeping the processor just as busy as if there were no stall at all until everything left in the pipeline is dependent on the result of the stalled out operation.
        • Yes, everybody does that (out-of-order execution, pipelining, etc., etc.) And then...you still need to keep the CPU well fed to boost performance. Enormous 4-level caches help do that. Having a continuous 5.5 GHz clock speed is also quite helpful. So is having 101+ cores that can access the same cache rather than, say, 8 such cores. And at least a couple hundred (at least) other IBM performance tricks, many of which cost money to deliver and thus probably won't find their way into save-a-nickel parts of the
        • To add to this, the Sandy Bridge has an L1 latency of 4 or 5 cycles

          Wow, I thought L1 cache was much faster. Thank God there are 16 registers.
          But it looks that passing arguments in the stack is moderately expensive for a small function. Am I right?

  • Except.... (Score:2, Informative)

    Except the 5.5GHz may not be all that fast, as the Z-line of CPUs are the old IBM 360 instruction set, which is is so large, complex, and baroque that it is mostly usually implemented through a thick layer of microcode.

    So 5.5GHz may be the speed of the microcode level, the actual "machine instructions" may be a considerable sub-multiple of that.

    • Re:Except.... (Score:4, Interesting)

      by BBCWatcher (900486) on Tuesday August 28, 2012 @12:24PM (#41150543)
      No, that's not a correct supposition -- quite the opposite, actually. All processors, including Intel X86, use microcode (or what IBM calls millicode) to a degree. IBM knows it well. After all, they invented microcode/millicode in the System/360 in 1965. But IBM uses microcode comparatively less nowadays than other processor architectures. The vast majority of zEC12 instructions are implemented entirely in hardware, including IEEE-754-2008 decimal floating point as an example. There's some really, really interesting new stuff in the instruction set, like the first transactional memory ("transaction execution facility") instructions in a commercial server, and some "feedback" instructions that can tell Java applications/the JVM how to dynamically tune itself in a live running environment. Very cutting edge -- so cutting edge I've got to crack open some engineering manuals to try to figure out what they've done, although they probably need to write those manuals.
      • Re:Except.... (Score:5, Informative)

        by Guy Harris (3803) <guy@alum.mit.edu> on Tuesday August 28, 2012 @01:23PM (#41151709)

        No, that's not a correct supposition -- quite the opposite, actually. All processors, including Intel X86, use microcode (or what IBM calls millicode) to a degree.

        At least from what I've read about the past few generations of S/3x0 chips, millicode is more like PALcode on the Alpha processor than like traditional microcode, i.e. it's a combination of regular machine code and processor-specific instructions that access specialized registers etc., running in a special processor mode with (presumably) fast entry and exit, support for said processor-specific instructions (which presumably trap in either both "problem state", i.e. user mode, and "supervisor state", i.e. kernel mode), and its own bank of general-purpose registers (part of the "fast entry and exit"). Instructions implemented in millicode trap to millicode routines that implement them.

        What IBM called "microcode" rather than "millicode" was implemented using processor-specific instructions completely different from the machine's instruction set (instructions often having fields that directly controlled gates).

        (And then there's System/38 and the pre-PowerPC AS/400, where the processor instruction set was a CISC instruction set implemented using microcode, and where the compilers available to customers generated code in an extremely CISCy instruction set [ibm.com] that the low levels of the OS translated into machine code and ran. For legal reasons - they didn't want to have to be required to make the low-level OS code available to "plug-compatible manufacturers", i.e. cloners - they not only called the microcode that implemented the processor instruction set "microcode" ("horizontal microcode", as it probably was "fields directly control gates"-style horizontal microcode), they also called the aforementioned low level OS code "microcode" as well, even though it ran from main memory and its instruction set was the instruction set that was actually executed in application code ("vertical microcode"), and had the group working on that code report to a manager in the hardware group. See Frank Soltis's Inside the AS/400.)

        IBM knows it well. After all, they invented microcode/millicode in the System/360 in 1965.

        "Invented", no; the paper generally considered to have introduced the concept was "Microprogramming and the Design of the Control Circuits in an Electronic Digital Computer" [microsoft.com], by Maurice Wilkes and J. B. Stringer, from 1953. S/360 may have been the first line of computers to use microcode in most of the processors (S/360 Model 75 was, I think, implemented completely in hardwired logic).

        Very cutting edge -- so cutting edge I've got to crack open some engineering manuals to try to figure out what they've done, although they probably need to write those manuals.

        Well, for the previous generation, there's Volume 56, Issue 1.2 of the IBM Journal of Research and Development [ieee.org] has some papers on the z196, but, alas, not for free online. They may publish an issue on the zEC12 at some point.

  • How about a price list in TFS for budget planning?
  • We could have gotten some meaningful benchmarks. According tho this Register arcticle [theregister.co.uk]

    When you add it all up, the single-engine performance of a z12 chip is about 25 per cent higher than on the z11 that preceded it. IBM has not released official MIPS ratings (a throwback to the days when IBM actually counted the millions of instructions per second that a machine could process) for the z12 engines, but given that the top-end core in a z11 processor delivered 1,200 MIPS, that puts the z12 core at around 1,600

    • Arrgh
        That last sentence should be "On the other hand, if your old and creaky code can't be divvied up among a multiplicity of cores, the existence of a far cheaper 64 core, 8 way Nehelem EX machine (or its current equivalent) that's only twice as fast as a single zEC12 core shouldn't much matter."

    • OK, now go license 64 cores of Oracle DB (for example) and get less performance than one core on a zEC12, as you say. I'll help you out: you'd probably pay about $1.5M in database software licensing plus $300K+ in annual maintenance for your 64 X86 cores versus $47K and $9.4K on a zEC12 core. And that's one cost factor among many, not the only one. So which server is "cheaper"? Is a bicycle cheaper than a truck? (Not an Olympic racing bicycle, probably.) It depends on what you're trying to do. Though I've n
      • by bored (40072)

        Those are emulated cores under hercules. If you were running oracle you would run them natively on the cores, and in that case its closer to 1:1 with the mainframe.

        • by Wovel (964431)

          Huh? I think his Orale example misses the point, but your statement is a bit off. In any case. I doubt you see a lot of people going the Orale on Z route.

    • by bws111 (1216812)

      The numbers provided by TurboHercules are most certainly complete bullshit. Actual MIPS are determined by running standard benchmarks against simulated workload. A 1200 MIPS machine is going to be driving a whole lot of I/O in those benchmarks, and there is no way that Hercules emulated processor and emulated I/O is going to be able to pull that off.

      If they didn't test with IBMs standard LSPR tests, their numbers are useless.

  • by noname444 (1182107) on Tuesday August 28, 2012 @12:54PM (#41150989)

    I'll believe their claims when I see some test results they can back it up with.

    • Only test results? (Yes, 5.5 GHz is fast. A test -- or even a spec sheet -- will tell you that.) But aren't real world results more useful? Go visit any large bank's (for example) data center if they'll let you. How many transactions, how much batch, etc. (and concurrently) do they push through their (one or two) IBM mainframe(s)? And has it ever quit? Is it secure? Does it...work?
  • The article seems to have typo-ed in the editing phase. The technology is "Cache Express" not Flash Express. Flash memory is SLOOOOOwwww memory. Do a Google for "IBM L2 Cache Express" if you are interested.

    With flash memory you read a block, flip some bits, and write it back to modify that block. Not only that, but Flash memory will wear out after so many reads and writes. That would be devistating to a CPU.

    • No, no typo. There's indeed Flash Express -- and yes, IBM's engineers have figured out a way to add yet another memory tier using (very high quality) flash memory. The processor can directly address it -- it's all mapped within the 64-bit virtual address space from what I've read. Yes, it's slower than DRAM but it's faster than storage-attached SSD (which at least has a longer distance to travel). Flash Express is great for things like paging, memory dumps, gigantic in-memory databases, and certain things t

Today's scientific question is: What in the world is electricity? And where does it go after it leaves the toaster? -- Dave Barry, "What is Electricity?"

Working...