Intel Supercomputing Hardware

Intel Launches 72-Core Knight's Landing Xeon Phi Supercomputer Chip (hothardware.com) 179

MojoKid writes: Intel announced a new version of their Xeon Phi line-up today, otherwise known as Knight's Landing. Whatever you want to call it, the pre-production chip is a 72-core coprocessor solution manufactured on a 14nm process with 3D Tri-Gate transistors. The family of coprocessors is built around Intel's MIC (Many Integrated Core) architecture, which itself is part of a larger PCI-E add-in card solution for supercomputing applications. Knight's Landing succeeds the current version of Xeon Phi, codenamed Knight's Corner, which has up to 61 cores. The new Knight's Landing chip ups the ante with double-precision performance exceeding 3 teraflops and over 8 teraflops of single-precision performance. It also has 16GB of on-package MCDRAM memory, which Intel says is five times as power efficient as GDDR5 and three times as dense.
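For a rough sense of where a "3 teraflops double-precision" figure could come from, here is a back-of-the-envelope sketch of theoretical peak throughput. Only the 72-core count comes from the summary, and the 512-bit vector width is mentioned by a commenter further down. The clock speed, the number of vector units per core, and the use of fused multiply-add are placeholder assumptions for illustration, not figures from the announcement.

```python
def peak_gflops(cores, clock_ghz, vector_bits, vpus_per_core,
                element_bits=64, fma=True):
    """Theoretical peak GFLOPS: cores x vector units x lanes x flops per lane x clock."""
    lanes = vector_bits // element_bits      # e.g. 512-bit vector / 64-bit double = 8 lanes
    flops_per_lane = 2 if fma else 1         # a fused multiply-add counts as two operations
    return cores * vpus_per_core * lanes * flops_per_lane * clock_ghz


# cores=72 is from the summary; clock_ghz and vpus_per_core are assumed values.
dp = peak_gflops(cores=72, clock_ghz=1.3, vector_bits=512, vpus_per_core=2)
print(f"double precision peak: {dp / 1000:.2f} TFLOPS")
```

This is a paper number in the same sense as any vendor peak: it assumes every vector lane does useful work on every cycle, which real code rarely manages.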

  • LOL ... Crikey ... (Score:5, Insightful)

    by gstoddart ( 321705 ) on Monday November 16, 2015 @11:08PM (#50945011) Homepage

    So, somewhere someone at AMD is going "fuck it, we're going to 128 cores".

    Damn ... that's a crap pile of cores ... that's like, Skynet in a box or something.

    The mind reels.

    • Yep, but it'll be 128 integer cores with 64 floating-point cores, and someone will take them to court over it... because... butthurt.

    • by AK Marc ( 707885 )
      And someone, somewhere is muttering "256 cores should be enough for anyone".
    • Damn ... that's a crap pile of cores

      IMHO the only metrics that aren't subjective are transistor count and process tech...

  • I want one of these in my next notebook....
  • by Required Snark ( 1702878 ) on Monday November 16, 2015 @11:22PM (#50945091)
    If you watch the video in the linked article, it is 100% buzzword marketspeak with zero information content. Disruptive technology blah blah integration blah blah innovation blah blah software continuity blah blah...

    It is probably a good chip for its niche, so you would think they would have less bloviation in their intro video. If this was anyone else I would assume they were mostly trying to fleece more investors before they inevitably went belly up. It's so bad that major league sports style animation with yelling pitchman and a pounding beat would be an improvement. That bad.

    • Re: (Score:2, Informative)

      by Anonymous Coward

      Better article [anandtech.com].

      Also the summary is wrong: it's on the 14nm process (the previous gen one was 22).

      Really the memory looks like the only interesting thing here.

  • So what exactly is the real world application of such a beast? Are there that many x64 based supercomputers out there?
    • They pretty much all are, Los Alamos, Blue Waters, Oak Ridge National Laboratory, all the medium to long term weather forecasting I know of, etc.

      Generally 10,000 to 100,000+ nodes.

      • With the exception of the IBM machines, which held most of the top spots in 2012. The LLNL one is still #3... http://www.top500.org/list/201... [top500.org] and the ANL #5. The general idea was cheaper processors that don't do pipelining, but OTOH their cost is massively decreased, and they all had an instruction unit capable of two threads per core, so the onus is on the programmer to make sure that the ALU is kept constantly fed.
      • by JanneM ( 7445 )

        The Japanese K built by Fujitsu uses Sparc64.

    • by gl4ss ( 559668 )

      raytrace version of wolfenstein, pretty much.

      cost effectivity for other uses.. well..

    • by Jeremi ( 14640 ) on Tuesday November 17, 2015 @12:51AM (#50945443) Homepage

      Bitcoin mining, of course -- it may not be as fast as a similarly-priced GPU farm, but the coins it creates will be of the highest possible quality and workmanship.

    • by dbIII ( 701233 )
      Heaps of x86 based supercomputers, but these are a bit bandwidth limited by the bus they are plugged into and how chatty these things are.
      For applications when a working dataset is small enough that you can fit it on these cards they are apparently very good. If you need to shift a lot of stuff in from main memory on frequent occasions they are not and the AMD systems hooked together with infiniband look a lot better. For things that benefit from a huge amount of shared memory (2TB plus onboard and 160 cor
    • Does it execute x86 code? Does it support virtualization? I guess you could use it then to host lots of Linux VMs.

      • It does execute x86. However, I'm pretty sure that the VM hypervisor would need to be tailored to use these, and your memory bandwidth is severely reduced because it sits on a PCI-e link.

        These things are made for the same workloads that people use CUDA and OpenCL for. Seriously parallel processing with small-ish data sets.

    • So what exactly is the real world application of such a beast?

      All of the things where you really, really wish that you could do GPU offloading, but can't because you have diverging flow control and the GPU version ends up coming nowhere near to the theoretical peak performance of the hardware. The Xeon Phi cores are pretty simple, but there are loads of them, they have real branch prediction and caches (so handle the same kind of workloads as normal CPU cores, just a bit slower) and have fairly beefy vector units (so when they're running in a straight line they're ac

      • The architecture isn't really that different from a GPU, whatever Intel might try to make you believe. It has 512 bit vectors, compared to 1024 bit vectors on NVIDIA, so it's slightly less hurt by divergent flow control, but only slightly. The theoretical maximum teraflops (8 for single, 3 for double), are pretty similar to what NVIDIA is claiming for the just announced M40.

        And don't forget, Intel massively hyped the first generation of MIC, and it then turned out to be next to worthless. Hopefully they'

    • So what exactly is the real world application of such a beast? Are there that many x64 based supercomputers out there?

      The two fastest supercomputers in the world are x86_64 based, as are in fact all but three of the top ten.

    • by LWATCDR ( 28044 )

      " Are there that many x64 based supercomputers out there?"
      Including the number one and number two on the top 500 list.

  • So Intel is top dog given that nVidia is only producing 2.3Tflops, right?
    I guess AMD gave up on HPC. If I read the wiki right, their top card does 0.1Tflops
    • Guess I read it wrong. FirePro S9170 produces 2.6Tflops.
    • by jon3k ( 691256 )
      I think the real question is FLOP/Watt. I really don't know how the two will stack up. Might also depend on whether or not the stream processors in nvidia gpus are better suited to the workload than x86 cores?
      • by dbIII ( 701233 )
        That was what Transmeta thought but their customers didn't. FLOP/$ is what matters more (sometimes) since the power bill over a lifetime is going to be less than the difference in price between a mid range AMD system (64 cores ~$10k) and a top end Intel system (80 faster hyperthreading cores ~$80k).
        • FLOPS/Watt matters a lot to the customers of this kind of thing. When you're spending a few tens of millions on the supercomputer, you really don't care what the CPUs or accelerator cores cost. You do care about power consumption though, because that translates to cooling requirements and directly limits how big a system you can build.
          • by dbIII ( 701233 )
            So please explain why Transmeta didn't take off despite aiming directly for that metric and why there are so many power hungry Xeons out there.
            You can't explain it?
            You don't know?

            Note to posters - please do not counter specific examples with a gut feeling - it makes you look like an idiot.
            • Did someone urinate in your cereal this morning?

            • So please explain why Transmeta didn't take off despite aiming directly for that metric

              and good performance, there was no reason to buy Transmeta processors. Note that they're not actually dead: nVidia bought Transmeta and used their ideas in their Project Denver ARM SoCs, which are selling pretty well now, in a different market where performance-per-Watt does matter.

              and why there are so many power hungry Xeons out there.

              There aren't. Xeons are so popular precisely because they give you the best performance within a given power envelope that you can currently buy (unless you're willing to go with custom accelerators or less general cores such a

              • by dbIII ( 701233 )
                With respect Mr gut feeling your ramblings are what is known as secondary considerations, as you would know if you did a bit more than guessing and going with the first thing that sounds right.

                Xeons became really popular once they started beating Opterons in performance per Watt

                They are already popular despite that not happening yet. Wrong guess. Maybe try something other than a guess next time?

                • They are already popular despite that not happening yet. Wrong guess. Maybe try something other than a guess next time?

                  Where are you getting your numbers from? That's why we bought them, and it's why the companies that I talk to who buy them in lots of a thousand buy them. In the P4 days, we were buying Opterons almost exclusively. I think it's been five years since we last bought one.

          • It may matter a lot, but it's not the most important metric.

            If you're building a supercomputer, you're building it to calculate shit, and to calculate it as accurately as possible, as fast as possible. You design the computer first, and then the facility to house it after the design is done.

            Someone dropping tens of millions on a super computer isn't going to say "well, we already have this room here that can handle X watts of heat, so design your computer to simulate global weather patterns / thermonuclear

      • FLOP/watt calculation is bounded by time and operating budget.
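The FLOPS/Watt versus FLOPS/$ argument above can be made concrete with a small lifetime-cost calculation. This is only a sketch: the $10k and $80k purchase prices are the ones quoted in the comment above, while the power draw, electricity price, service life, and cooling overhead are hypothetical values chosen for illustration. It does not settle the argument either way, since at large scale the facility's power and cooling limits enter as a constraint rather than as a line on the bill.

```python
def lifetime_cost(purchase_usd, avg_watts, years, usd_per_kwh, overhead=1.0):
    """Purchase price plus electricity over the service life.

    overhead is a multiplier for cooling and power distribution
    (1.0 means only the machine's own draw is counted).
    """
    hours = years * 365 * 24
    energy_kwh = avg_watts / 1000 * hours * overhead
    return purchase_usd + energy_kwh * usd_per_kwh


# Prices are from the comment above; 1000 W, 5 years, $0.10/kWh and a 1.5x
# overhead are assumptions, not measured figures for either system.
for name, price in (("mid-range system", 10_000), ("top-end system", 80_000)):
    total = lifetime_cost(price, avg_watts=1000, years=5,
                          usd_per_kwh=0.10, overhead=1.5)
    print(f"{name}: ${total:,.0f} over five years")
```

With these assumed inputs the electricity comes to a few thousand dollars per box, so the result is dominated by whichever wattage and price figures are plugged in.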
    • by Junta ( 36770 )

      nvidia is pumping out 2.9 TFlop DP on their K80 (on paper). Of course on paper the numbers are as good as imaginary (across the board, Rpeak has been more and more a fantasy over time).

    • You must be looking at one of the low end NVIDIA GPUs. Tesla K40 gets 4.29 teraflops. Tesla M40 (just announced last week), supposedly gets 7.

  • McRAM?

    Yes, I would like fries with that.

  • by zoid.com ( 311775 ) on Monday November 16, 2015 @11:48PM (#50945211) Homepage Journal

    I've been asleep for 20 years so I guess CISC won?

    • Re:CISC? (Score:4, Informative)

      by ndykman ( 659315 ) on Tuesday November 17, 2015 @12:39AM (#50945397)

      Kind of. The advantages of RISC faded pretty fast. The footprint of a decoder between something like x86 and say, ARM is really not that much, and a decoder is just a small part of a core these days. Clock speed is an issue of thermal footprint. So, all the disadvantages of the x86 (and its x64 extensions) faded in the face of Intel's focus on process improvements. In the end, not even the Itanium could eke out enough of a win to dethrone the x86 architecture.

    • Neither CISC nor RISC won.

      Data-driven design won out over faith-based instruction set architecture.

      • I would not call it faith ;) There were compelling reasons why all processors were once CISC.
        Considering that x86 is the only major CISC processor left, and that it internally translates CISC instructions into sets of RISC instructions before they get executed, and considering that everything else bigger than an 8 or 16 bit microcontroller is RISC (Arm, Mips etc.), I would say: RISC has won.

    • by LWATCDR ( 28044 )

      No x86/x64 won.

    • The complexity of the instruction set matters very little when you can just cache the decoded instructions in the processor. Intel solved that with Pentium Pro in 1995. By using ever-decreasing fabrication processes, they have die space to heap tons of cache in there - I think the current Xeons are somewhere around 2MB/core of cache...

      So your 20 year nap is just about right.

  • ...but I suppose 640 kilocores should be enough for anybody.

  • So how fast can it calculate a Knight's Tour [wikipedia.org]?

  • Or be able to load Windows 11?

  • Excellent! Now we'll be able to process even more bullshit widgets on websites!
  • I am still annoyed that Skylake still only comes with 4 meager cores and some lousy graphics I will never make use of, and anything beyond that is a hockey stick price increase. Taunting us with 72 is just cruel.

    • It still won't let you have a baby in a month.

    • They also don't plan to have a desktop / same-socket Intel Xeon chip without built-in video. For the last gen you can get a 4 core + HT chip for about $100 less than an i7.

      • I would imagine that the built-in video is actually wanted in the Xeon line, so you don't have to waste motherboard real estate adding a crappy video chip to the bill of materials.

        Many, if not practically every, server uses on-board video. Unless they run completely headless.

        • But a 16-32MB video chip with its own RAM is better than eating system RAM, and it can be an issue in multi-CPU systems.

  • by ndykman ( 659315 ) on Tuesday November 17, 2015 @01:05AM (#50945497)

    While supercomputing is a very small section of the computing world, it's not that hard to understand.

    First of all, this would make for a terrible graphics card. This (deliberately) sits between a CPU and GPU. Each core in a Phi has more branching support, memory space, more complex instructions, etc than a GPU core, but is still more limited than a Xeon core (but it has wider SIMD paths).

    A GPU has many more cores that have a much more limited set of operations, which is what is needed for rapid graphics rendering. But those limited sets of operations can also be very useful in scientific computing.

    I haven't seen anybody try a three pronged approach (CPU/Phi/Nvidia Tesla), but I will admit I didn't look very hard. This is all in the name of solving really big problems.

  • I have eight (8) cores on my laptop. Frequently, a single multiprocessor-unaware application will hog an entire core, getting it hot, while asking nothing of the other seven (7). These applications are typically very expensive ones, so you might think that they would make use of them.

    Oh, but no. Give me two cores, 100 cores, or anywhere in between. I, as a power-user, will actually never notice a difference.

    Get the programmers to write MPA software. Only then will I think about believing the hype about

    • That is what happens when u use Windows.
      • That is what happens when u use Windows.

        Actually, this is what happens when you (I) use Adobe products.

        The open-source Image-J is far more agile in processing my 100,000+ image-stacks.

      • Uhhhh, this is what happens when you use any application written in C, C++, Java, C#, PHP, Python or just about any programming language without adding threading code. The OS is irrelevant.
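The point about single-threaded applications can be put in numbers with Amdahl's law: if only a fraction p of a program's run time can be spread across n cores, the best possible speedup is 1 / ((1 - p) + p / n). The sketch below is generic arithmetic, not a measurement of any application named in this thread, and the 95% figure is just an example value.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# A fully serial program (fraction 0.0) gains nothing from extra cores;
# a hypothetical 95%-parallel one still tops out well below the core count.
for cores in (1, 2, 8, 72):
    serial = amdahl_speedup(0.0, cores)
    mostly_parallel = amdahl_speedup(0.95, cores)
    print(f"{cores:>2} cores: serial {serial:.2f}x, 95% parallel {mostly_parallel:.2f}x")
```

An application with no threading code is the p = 0 case, which is why it behaves the same on two cores as on a hundred.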

    • by dbIII ( 701233 )

      Frequently, a single multiprocessor-unaware application will hog an entire core, getting it hot

      While that is very annoying, at least the OS switches it over to another core every now and again to avoid overheating, as a process monitor like "gkrellm" will show you.

      Get the programmers to write MPA software

      Give them a break, developers are only just getting their teeth into 64 bit and you want them to write stuff as if it's 1999? Please give them at least twenty years to get used to the hardware :)

    • The good news is that a Xeon Phi isn't ever going to be installed anywhere but a data center, so you don't have to worry about it. It will churn through data sets by running an application specifically written for it.

      This isn't a high-volume product for Intel - they probably have a couple hundred customers that use these things. But when they do use them, they use a LOT of them because they are building supercomputers that have thousands of cores.

  • Something fast enough to run Minecraft!
  • So the product is Intel's not quite released compute accelerator, featuring new micro architecture, memory technology, and using the latest chip fab capabilities.

    The most readily available competition with released numbers is an nVidia K80, a year old product using 5 year old memory technology and 5 year old chip fab capabilities. It is set to be superseded by their refresh using state of the art fab, memory, and microarchitecture, which would actually compete toe to toe with what Intel announced.

    This *should* make

    • GPU floating point performance has been leading general purpose x86 CPU floating point performance by an order of magnitude - for many many years now. There's nothing new in what you are saying.

      What is indeed new is that this is the first general purpose x86 based solution that gives you similar floating point performance as a graphics card. And you get all the advantages of the general purpose CPUs as well as all the x86 codebase you might want to support.

      There must also be a reason why the number 1 superc

    • Yeah, Nvidia can compete toe-to-toe with their next-gen product, until a branch comes along. Branching on GPU compute is ridiculously expensive. This is not so with Xeon Phi.

      That's where this product makes sense.
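Why branching hurts lockstep hardware can be shown with a toy model. It assumes a group of lanes that must execute the same instruction stream, so when lanes disagree on an if/else the group runs both sides one after the other with the inactive lanes masked off. This is a deliberate simplification and not a description of any specific GPU or of Xeon Phi; the function names and cycle costs are made up for illustration.

```python
def simd_branch_cost(takes_then, then_cost, else_cost):
    """Cycles a lockstep group spends on an if/else under a simple masking model."""
    if not takes_then:
        raise ValueError("need at least one lane")
    any_then = any(takes_then)        # some lane needs the 'then' side
    any_else = not all(takes_then)    # some lane needs the 'else' side
    return (then_cost if any_then else 0) + (else_cost if any_else else 0)


def lane_utilization(takes_then, then_cost, else_cost):
    """Fraction of issued lane-cycles that did useful work."""
    issued = simd_branch_cost(takes_then, then_cost, else_cost) * len(takes_then)
    useful = sum(then_cost if t else else_cost for t in takes_then)
    return useful / issued


uniform = [True] * 32                 # every lane takes the same side
split = [True] * 16 + [False] * 16    # lanes disagree, both sides run
print(lane_utilization(uniform, 10, 10))
print(lane_utilization(split, 10, 10))
```

In this model a group that agrees on the branch keeps every lane busy, while an even split with equal-cost sides wastes half the issued lane-cycles. The same masking penalty applies to any wide vector unit, which is the point the earlier comment about 512-bit versus 1024-bit vectors was making.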
