Hardware Technology

Finnish Startup 'Flow' Claims It Can 100x Any CPU's Power With Its Companion Chip (techcrunch.com)

An anonymous reader quotes a report from TechCrunch: A Finnish startup called Flow Computing is making one of the wildest claims ever heard in silicon engineering: by adding its proprietary companion chip, any CPU can instantly double its performance, increasing to as much as 100x with software tweaks. If it works, it could help the industry keep up with the insatiable compute demand of AI makers. Flow is a spinout of VTT, a Finland state-backed research organization that's a bit like a national lab. The chip technology it's commercializing, which it has branded the Parallel Processing Unit, is the result of research performed at that lab (though VTT is an investor, the IP is owned by Flow). The claim, Flow is first to admit, is laughable on its face. You can't just magically squeeze extra performance out of CPUs across architectures and code bases. If so, Intel or AMD or whoever would have done it years ago. But Flow has been working on something that has been theoretically possible -- it's just that no one has been able to pull it off.

Central Processing Units have come a long way since the early days of vacuum tubes and punch cards, but in some fundamental ways they're still the same. Their primary limitation is that as serial rather than parallel processors, they can only do one thing at a time. Of course, they switch that thing a billion times a second across multiple cores and pathways -- but these are all ways of accommodating the single-lane nature of the CPU. (A GPU, in contrast, does many related calculations at once but is specialized in certain operations.) "The CPU is the weakest link in computing," said Flow co-founder and CEO Timo Valtonen. "It's not up to its task, and this will need to change."

CPUs have gotten very fast, but even with nanosecond-level responsiveness, there's a tremendous amount of waste in how instructions are carried out simply because of the basic limitation that one task needs to finish before the next one starts. (I'm simplifying here, not being a chip engineer myself.) What Flow claims to have done is remove this limitation, turning the CPU from a one-lane street into a multi-lane highway. The CPU is still limited to doing one task at a time, but Flow's Parallel Processing Unit (PPU), as they call it, essentially performs nanosecond-scale traffic management on-die to move tasks into and out of the processor faster than has previously been possible. [...] Flow is just now emerging from stealth, with [about $4.3 million] in pre-seed funding led by Butterfly Ventures, with participation from FOV Ventures, Sarsia, Stephen Industries, Superhero Capital and Business Finland.
The primary challenge Flow faces is that for its technology to be integrated, it requires collaboration at the chip-design level. This means chipmakers need to redesign their products to include the PPU, which is a substantial investment.

Given the industry's cautious nature and the existing roadmaps of major chip manufacturers, the uptake of this new technology might be slow. Companies are often reluctant to adopt unproven technologies that could disrupt their long-term plans.

The white paper can be read here. A Flow Computing FAQ is also available here.
This discussion has been archived. No new comments can be posted.

  • by locater16 ( 2326718 ) on Tuesday June 11, 2024 @03:49PM (#64541571)
    Smells like, venture capital funding
    • by Antique Geekmeister ( 740220 ) on Tuesday June 11, 2024 @03:52PM (#64541579)

      Smells like venture capital fraud.

      • It will be enough to demo some random solutions without providing "100x compute power on any problem and for any instruction the CPU does". Likely, it is an ASIC that has a few cherrypicked instructions printed onto silicon. A demo that does something like computes an elliptic curve or decodes AES will run 100x faster, but that's what ASICs and GPUs already do. This will be no different.
    • Don't bother with this. The room temperature superconductor shtick still has some life in it.

    • Either that or it's just a hyped up co-processor. Toss a 287 on your old IBM computer and watch the spreadsheets fly.

      The problem is that if they just had a really good co-processor for a specific application they would probably be selling that instead of this ridiculous idea of doubling CPU speed. So yeah the whole thing is fishy as hell
    • A parallel processing unit that can, with software "tweaks" make my computer go 100x faster in some cases?

      It's not bullshit, it's the opposite. NVidia make them for example.

      • Depending on the application, co-processors can do wonders. The reason that multi-cores don't do well is because of the interconnection bus among processors, and cache/wait states for true parallel processing.

        NVIDIA does well at math functions and array processing (a math function). The post describes something else. It might be naive or given that it requires design interface, perform look-ahead that doesn't expose buffers or malloc in future designs. Who knows?

      • by Mal-2 ( 675116 )

        GPUs are great at anything that can be reduced to matrix multiplication, because they are optimized for multiply-and-accumulate in massive parallel. That's why they're so useful for LLMs, which are mostly matrix multiplication at the lowest level. What they're not real good for is handling conditional events such as general purpose programs invoke. That's why even with a GPU, you still need a CPU to feed it jobs.

  • by Pseudonymous Powers ( 4097097 ) on Tuesday June 11, 2024 @03:51PM (#64541575)

    A coprocessor multiplying performance a hundredfold? Impossible!

    Unless... unless somehow this company has finally found a way to harness the limitless potential of homeopathy for computing purposes?

    [Pours glass of water into this computer.]

    • Ah, but remember when floating point was done with coprocessors which also greatly increased performance?
      Note also that most modern programming involves highly inefficient code; optimizing one's code is considered an aberration or a waste of time. All the time spent on research into improving compilers to automatically parallelize code at both high and low levels, goes to naught if all the programmers prefer javascript, python, or dotnet. There is so much inherent inefficiency in modern code.

      The drawback,

    • by Xenx ( 2211586 )

      A coprocessor multiplying performance a hundredfold? Impossible!

      They claim it doubles performance, but has the potential of up to 100x with software tweaks. Software optimization is already a thing, and the results can vary quite a bit. From that standpoint, 100X is feasible but would require the right circumstances. Either very specialized tasks, or a really poor starting point. That said, I'm still skeptical of the overall claim. I'm just pointing out the inaccuracy.

    • by HiThere ( 15173 )

      Well, I'll wait a bit before bothering to disbelieve this one...but I'm sure not going to bother investigating it until I hear it from many other sources.

  • Perpetual motion machines. Limitless energy with no input. And 100x processing power through an add-on chip. It's a beautiful daydream. But still, just a daydream.

    • by The Cat ( 19816 )

      Orbiting moon + tidal capture tank = perpetual motion machine.

      • by Anonymous Coward
        Believe it or not, that helps decay the moon's orbit. Not perpetual.
        • The moon's orbit is not decaying. It's moving farther away from us
          • But wouldn't that be a gangsta move

          • by Mal-2 ( 675116 )

            Either way -- sapping energy from the system increases how fast the instability increases. If the moon was spiraling in, sapping energy would make it spiral in faster. But since it's spiraling OUT, sapping energy will make it spiral OUT faster.

            • In orbital mechanics, "spiralling out faster" would require increasing the velocity of the orbiting object, or a loss of gravitational mass in the gravity well the object is orbiting.

              So, no. "Spiralling out faster" would mean the moon is accelerating in its orbit, and that energy is coming from somewhere - likely the pull of a larger gravitational object (the sun).

              • by Mal-2 ( 675116 )

                The spiraling out is coming at the expense of the Earth's rotation. It used to be much closer, and days used to be much shorter.

        • by Shaitan ( 22585 )

          True but anything this effectively perpetual does tend to get lumped into the 'perpetual motion machine' category by skeptics. I have to toss out the disclaimer "Yes, it's probably bullshit but something which is for useful purposes practically indistinguishable from a perpetual motion machine is physically possible and only a true perpetual motion machine is not."

    • I think you are thinking about it wrong.

      Yes clearly something is fishy. I think it's not that it's implausible, it's just not new. I have an addon chip in a pcie x16 slot that makes some things go hundreds of times faster. It's called a GPU.

    • I was thinking things like SIMD, hardware accelerated video encoding, wavetable synthesis, ... Things are faster when done with hardware acceleration, a LOT faster actually. I'm not saying I truly believe this chip would make everything run 100x faster. But I also know VTT and the fact that they don't waste their money on fairytale projects. So I think they actually might have something useful, but the 100x case is most likely for some few specific cases where the latency is currently the bottleneck, and t
    • Perpetual motion machines. Limitless energy with no input. And 100x processing power through an add-on chip. It's a beautiful daydream. But still, just a daydream.

      I wandered over to their web site. I don't think it's an add-on chip so much as a re-design of superscalar. The idea would be Intel, AMD, Arm, or Apple buys some IP from Flow and incorporates that in the next CPU package.

      Why exactly this is better than what four massive and well-funded architecture teams came up with over the last 30 years is a little unclear to me. Maybe people building RISC-V processors would be interested.

  • Just download one! It worked for RAM, didn't it? xD
    • by hawk ( 1151 )

      It *used* to.

      Well, ok, we couldn't download in those days, unless stealing at 2400 on a BBS, but we could order the disk.

      RAM Doubler for the Mac would effectively double memory on 68030 machines years before macOS was doing it with virtual memory. It could delay allocation until actually needed, compress in memory, and page to disk as a last resort.

      I assume there were similar things for 486 on the dos side.

      • Yeah, that's what I'm joking about, I remember the RAM compressor they had for MSDOS. It sort-of worked, but since it was all 386 and 486 processors, it sucked because it just slowed everything down, and also if I remember correctly it just didn't work with everything.
        • by hawk ( 1151 )

          I recall a see-saw on the DOS side as to whether it was faster to use the CPU to compress before writing or not, but I sure don't remember any details!

  • by Casandro ( 751346 ) on Tuesday June 11, 2024 @03:56PM (#64541593)

    It's been done for decades. Whole computer architectures, like the Transputer, have been built around it. The basic idea is that you pipeline processing elements one after another. This is also how data processing in hardware (e.g. for 1980s style digital TV chipsets like the Digit 2000) has managed to decode PAL with a very modest amount of resources. It's a bit more tricky to do this in a programmable way, but typically you solve this by having each processing element be a small computer with fast interfaces. (see the Transputer)

    However there is nothing that stops you from doing the same on general purpose computers. In fact if you do shell scripting you will likely have done some primitive flow based computing by using pipes. Implementing the same ideas in a more efficient way isn't that hard, but requires your problem to be in a certain form.
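
The shell-pipe analogy above can be sketched in a few lines of Python. This is only an illustration of stages chained one after another: the stage names (`read_lines`, `grep`, `count`) are made up for the example, and Python generators interleave the stages lazily on one thread, whereas real shell pipes run each stage as a separate process.

```python
from typing import Iterable, Iterator


def read_lines(text: str) -> Iterator[str]:
    """Source stage: emit one line at a time, roughly like `cat`."""
    for line in text.splitlines():
        yield line


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Filter stage: pass through only lines containing `needle`."""
    for line in lines:
        if needle in line:
            yield line


def count(lines: Iterable[str]) -> int:
    """Sink stage: count the items that reach the end, roughly like `wc -l`."""
    return sum(1 for _ in lines)


def run_pipeline(text: str, needle: str) -> int:
    # Equivalent in spirit to: cat text | grep needle | wc -l
    # Each item flows through all stages before the next one is read.
    return count(grep(read_lines(text), needle))


if __name__ == "__main__":
    sample = "cpu fast\ngpu faster\ncpu again\n"
    print(run_pipeline(sample, "cpu"))
```

The "certain form" the comment mentions shows up here as the requirement that every stage consumes a stream and produces a stream; a problem that needs random access to all of its input at once does not fit this shape.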

    • by augustw ( 785088 )

      It's been done for decades. Whole computer architectures, like the Transputer, have been built around it. The basic idea is that you pipeline processing elements one after another..

      That's NOT how the transputer worked.

    • The basic idea is that you pipeline processing elements one after another.

      I wonder how it can cope with x86 instruction set. Some instructions can take an unit busy for hundreds of cycles, that does not help pipelining.

  • by Alain Williams ( 2972 ) <addw@phcomp.co.uk> on Tuesday June 11, 2024 @04:01PM (#64541615) Homepage

    then it probably is.

    Extraordinary claims require extraordinary evidence.

    • They're talking about a GPU. Read the description, it is exactly how a GPU works, massively parallel, the nVIDIA GPU in my computer has 16,000 or so cores, and for certain things like AI, it is a 100x acceleration over the CPU.

      So not sure what the big deal is, they made a GPU.

  • Would you like to bet this works? I'll take your money.
    Would you like to bet this is a scam? Yeah, everybody is on this side of the boat.

    But hey, slow Tuesday on New-Slashdot where everything gets approved.

  • Cool so... (Score:5, Insightful)

    by Shaitan ( 22585 ) on Tuesday June 11, 2024 @04:08PM (#64541641)

    Slap this puppy on one of the open risc designs and prove it.

    • Re:Cool so... (Score:5, Informative)

      by Tailhook ( 98486 ) on Tuesday June 11, 2024 @05:35PM (#64541883)

      LOL. That's the acid test for bullshit here isn't it? They don't actually need "collaboration at the chip-design level" for this at all. They could "just" make a stupid fast RISC-V design and show everyone how it's done.

      Thing is even without that affordance, you know this is bullshit just from the information already provided: Add silicon and increase performance. We can already do that. It's done by all of the large scale CPU manufacturers today. Big, expensive CPUs have higher IPC because they have more silicon for predictors, cache, etc. to keep pipelines full.

      Further, the premise of their claims, that CPUs are wastefully serialized, is absurd. Advanced CPUs do outrageous amounts of gymnastics to keep their components working on instructions. To the point where they've been found to compromise isolation and exhibit huge security flaws.

      • We can already do that. It's done by all of the large scale CPU manufacturers today.

        While I think that the article is nothing but bullshit, I caution your view of the "experts" here. When all you have is a hammer, every problem looks like a nail. This applies equally to CPUs, the peddlers of which infamously dismissed the idea of video accelerator cards thinking they could just throw more CPU silicon at the problem, after all it worked with the FPU so it will work with everything else too right? They were of course very wrong and proceeded to spend the rest of IT history playing catchup (o

        • by Tailhook ( 98486 )

          I don't doubt they have at least something of note. Likely it is indeed some sophisticated new predictor that yields better results than the prevailing predictors. The claims that are made here and the viability of the business model are the issues I see.

          Doubling performance? No, that is highly unlikely. That is classic bullshit for credulous investors. There isn't any fruit hanging that low in prevailing CPU design. Maybe there are a small number of cherry picked applications where something appr

        • by Tailhook ( 98486 )

          There may be a better way that people who think x86 is the only thing the world needs may be missing.

          Those people are fictions that exist exclusively inside your head.

          Here is another falsehood plaguing your mind: thinking that ARM or RISC-V or any other ISA you're a fan of actually matters.

          The most valuable silicon in the world today isn't x86 or ARM or RISC-V or Apple's stuff. It's GPUs and ML accelerators. The conventional CPU ISAs and their various implementations are just the bookkeepers that feed GPUs and other co-processors. Jim Keller explained all this recently in an interview: the spec

      • by Shaitan ( 22585 )

        "Add silicon and increase performance. We can already do that."

        Yeah but that is a false equivalency, some uses of silicon can bring greater performance increases than others. Usually it happens that someone did some napkin math and added another dimension to mathematically prove something could cover all the possibilities. But the reality is the real problem space doesn't include all the possibilities and there is some scheme that puts more in information space in a scheme that lets them more efficiently ta

    • They appear to have been funded to €4M. Masks and prototype run in a modern process are, what $95M now?
      • And how much is a FPGA? As the smaller variants of RISC-V fit without problems into one, couldn't they just add their stuff? Then they can demonstrate, that their stuff works.

        • by Shaitan ( 22585 )

          Right, so maybe due to the differences it only yields 5-10x benefit vs 100x it theoretically could bring to the table. If a REAL demonstration showed a 20% improvement, let alone 500-1000% you could make a case to one of the big dogs that it is worth taking a real look at what you have.

        • by Shaitan ( 22585 )

          AND here it is:

          "Therein lies the primary challenge to Flow’s success as a business: Unlike a software product, Flow’s tech needs to be included at the chip-design level, meaning it doesn’t work retroactively, and the first chip with a PPU would necessarily be quite a ways down the road. Flow has shown that the tech works in FPGA-based test setups.

          Per the whitepaper they integrate a massively parallel coprocessor that communicates directly with the L1/L2 CPU caches [think thousands of strea

          • Thanks for pointing it out. But I see no mention of concrete numbers or even which architecture they attached it to in the "white paper". I also don't understand, how they intend to work without recompilation and just detecting the pthread stuff in a CPU and quite honestly who writes code like that?

            But at least I got reminded of that stuff, I remember the foundations of that from a remote presentation by Danny Hillis for the Connection Machine back in the 90's. But they at least used some extensions to C to

  • by WankerWeasel ( 875277 ) on Tuesday June 11, 2024 @04:09PM (#64541647)

    This one gets filed away with the weekly reports of BREAKTHROUGH battery technology that'll allow 1000x more power in 1/100th the size battery and it'll charge in 10 seconds.

    • Exactly, don't forget Cold Fusion and quantum computing in the same pile. Funny how they cannot produce results first then tell us how well it works.
      • by Gleenie ( 412916 )

        Oh, and room temperature FTL, perpetual cancer machines and AIs that run on water ... ah, wait, I think I might be mixing up my bullshit a bit here...

  • by Mirnotoriety ( 10462951 ) on Tuesday June 11, 2024 @04:16PM (#64541663)
    > Central Processing Units have come a long way .. but .. they can only do one thing at a time.

    Pipelining: Superscalar Architecture: Speculative Execution: Branch Prediction: Out-of-Order Execution: Simultaneous Multi-Threading - SMT: Multi-Core Processors:
    • Yeah, they kind of lost credibility with me here. Parallel computing has been around since the 50's. Some of it is still done through time slicing but multi-core processors and specialty instruction set cores have been around for more than a decade.
      • Yeah, they kind of lost credibility with me here. Parallel computing has been around since the 50's. Some of it is still done through time slicing but multi-core processors and specialty instruction set cores have been around for more than a decade.

        Even further. Superscalar and hyperthreading were introduced in the '90s. Multicore was introduced in the early oughties.

        For the other ancient dinosaurs around here, my favorite mid-80s idea for parallelism was the data flow architecture [wikipedia.org]. Flow's design may want to acknowledge its intellectual great-great-grandfather.

    • Right, but so much of that depends upon a narrow window of code that gets affected. Superscalar is essentially dataflow computing, but it's not looking at huge amounts of code at the same time, just the code that is there in the assembly language stream (following N branches at once). But modern code has lost the concept of "locality of reference". If a value is computed but separated from when the value is used by a thousand intermediate instructions then superscalar can't help with it much. Now imagin

    • by Xenx ( 2211586 )
      I think that is more about the article not differentiating between CPU and CPU core. I'm not saying I trust the claims, but I'm of a mind to follow Hanlon's razor when it comes to that part of it.
  • VTT, Though (Score:4, Interesting)

    by abEeyore ( 8687599 ) on Tuesday June 11, 2024 @04:16PM (#64541665)
    In spite of the absurd initial claim, the fact that an arm of the Finnish Government is involved suggests that there is - at a minimum - an interesting underlying technology there. Also, looking at the recent one line MeteorLake optimization that improved performance on Linux by more than 50% - code optimizations can, and do still matter.

    The dollars on the table are still relatively small, though, so if there is anything there, the most likely outcome is being acquired by Intel or AMD. If, on the other hand, VC's, or Elon Musk get involved, it'll be safe to assume it's garbage.
    • Nah, an arm of the UK government gave funding to one of the em drive variants. A device that is no different from a perpetual motion machine.

      Anyway, it's just a parallel coprocessor by the sounds of the summary, so like a gpu.

  • by rta ( 559125 ) on Tuesday June 11, 2024 @04:20PM (#64541673)

    There's definitely some info/claims in that whitepaper and FAQ. Not a CPU implementation person, so hard to understand how this co-processor that you have to recompile for differs from the FPGA co-processors that have recently come on the market or even from old school SIMD type extensions for parallelism (let alone from the many stages of pipelining that modern CPUs (that is since like... the 80s? ) already rely on to increase throughput by having multiple instructions in flight at the same time )

    Hope they're not just going to try to patent troll everyone with their "novel IP" which turns out to be stuff that everyone's been doing in one way or another already for decades.

  • by Anonymous Coward on Tuesday June 11, 2024 @04:22PM (#64541681)

    Looked at the white paper.
    Basically it seems like they integrate a massively parallel coprocessor that communicates directly with the L1/L2 CPU caches, and takes instruction from the CPU.
    A 100x improvement isn't that hard of a pill to swallow if you add hundreds or thousands of streaming processors to a normal CPU to do highly parallelizable work, because it isn't multiplying your CPU's processing power, it's adding a highly parallel coprocessor.
    Your GPU is also 100x (some big num) faster than your CPU for vector math. In fact this might not be very different conceptually than a CPU with an onboard general purpose accelerator, like an integrated GPU.
    As per the white paper, you would need to recompile to explicitly take advantage of the coprocessor.

    Techcrunch says "It just more efficiently uses the CPU cycles that are already taking place." I think Techcrunch is misinterpreting what the company is actually saying - the white paper makes it more clear.

    • This sounds like the old HSA stuff from AMD, which could give huge speedups by putting the GPU on the same side of the MMU and cache as the CPU. I remember benchmarks of a Bulldozer APU creaming Intel's best in particular tasks.

      Naturally AMD fucked it up with a mix of inadequate software support and a weird product lineup where only the low end processors could go fast.

    • Good job on actually reading the paper.

      The tag line of "Can improve 100x Any CPU with its companion chip" makes it sound like it's set up like a Game Genie on your motherboard - and reporting that tries to put it into layman's terms is always going to miss a bunch of stuff, or misrepresent it.

      There might be something in there, but everyone is so jaded (and maybe rightfully so) from so many false promise tech startups that I think they need to bring their advertising a bit more down to earth.

    • So it is a GPU in die that shares fast memory with the CPU, like the Apple M1 chip.

      PCIe5 x16 with an nVIDIA board and DMA does the exact same thing though and scales larger. Some people even hang them together over InfiniBand to scale to many CPU and many GPU accessible in a single fabric.

  • They invented the Math Co-processor.

  • by Whateverthisis ( 7004192 ) on Tuesday June 11, 2024 @04:37PM (#64541719)
    From the article:

    "A Finnish startup called Flow Computing is making one of the wildest claims ever heard in silicon engineering: by adding its proprietary companion chip, any CPU can instantly double its performance, increasing to as much as 100x with software tweaks." - Ok, let's see it.

    "Flow’s big achievement, in other words, isn’t high-speed traffic management, but rather doing it without having to modify any code on any CPU or architecture that it has tested." - Ok sounds like an achievement, but could be replicated by others.

    "Therein lies the primary challenge to Flow’s success as a business: Unlike a software product, Flow’s tech needs to be included at the chip-design level, meaning it doesn’t work retroactively, and the first chip with a PPU would necessarily be quite a ways down the road. Flow has shown that the tech works in FPGA-based test setups, but chipmakers would have to commit quite a lot of resources to see the gains in question." - Ok no thanks.

    You can't say it can work on *Any CPU* and then turn around and say it has to be integrated into the chip design; by definition that rules out every CPU built to this point. That story implies you can make any computer faster, but then you realize it's any computer except any one currently in existence. Fun tech, probably not hard to replicate in numerous ways, but business-wise, this just isn't a business.

  • I mean, yeah, it's almost certainly garbage but if it works (and is cheap enough) I can totally see the retro computer and console scene putting these in consoles to boost the performance and run more modern games. Things like Fuji.net, GameDrive, etc. already add extra chips to these old systems to add features.
  • by 50000BTU_barbecue ( 588132 ) on Tuesday June 11, 2024 @04:53PM (#64541769) Journal

    3D printed in the cloud with quantum computers mining bitcoins to buy AI apps on Elon's Mars colony so you can mine asteroids to build a space elevator to make humanity a multi planet species?

    • I feel like there should be carbon nano-tubes in there somewhere. Maybe that was just implied in the space elevator.
  • > Their primary limitation is that as serial rather than parallel processors, they can only do one thing at a time

    CPUs since the mid-1990s have had multiple functional units and retire multiple instructions per cycle. They are very much parallel.

    Of course the CDC 6600 was doing that in the 1960s, but it wasn't exactly a single chip.

  • Why not design the entire chip and sell an entire solution?

  • Not merely as much as 94% or 107%, but exactly 100%.

    Not to be confused with companies focused on sports medicine, who always give 110%.
  • Their primary limitation is that as serial rather than parallel processors, they can only do one thing at a time

    Anyone thinking modern CPUs do this clearly has never worked on modern CPUs. Case in point.

    add eax, eax
    add eax, eax
    add eax, eax
    add eax, eax

    In dependent code, this would be a clock per instruction. But the same code made independent, like so:

    add eax, eax
    add ebx, ebx
    add ecx, ecx
    add edx, edx

    Will get packed together and the above will be part of a reciprocal throughput prediction in the pipe. Since the add ALU circuit can take four values (the IO of the ALU circuit is eight times a GP register, 51
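
The difference between the two listings above can be made concrete with a toy scheduler. This is a deliberately idealized sketch, not a model of any real core: it assumes every instruction has a latency of one cycle, that up to `width` instructions can issue per cycle, and that only true read-after-write dependences matter. The function name `cycles_needed` and the `(dest, sources)` encoding are made up for the example.

```python
def cycles_needed(instrs, width=4):
    """Cycles to finish `instrs` on an idealized machine.

    instrs: list of (dest_register, [source_registers]) in program order.
    Assumptions: 1-cycle latency, at most `width` issues per cycle,
    only read-after-write dependences are tracked.
    """
    ready_at = {}   # register -> first cycle its newest value can be read
    issued = {}     # cycle -> number of instructions issued in that cycle
    finish = 0
    for dest, srcs in instrs:
        # An instruction cannot issue before all of its inputs are ready.
        cycle = max((ready_at.get(s, 0) for s in srcs), default=0)
        # Nor can it issue in a cycle whose issue slots are all taken.
        while issued.get(cycle, 0) >= width:
            cycle += 1
        issued[cycle] = issued.get(cycle, 0) + 1
        ready_at[dest] = cycle + 1
        finish = max(finish, cycle + 1)
    return finish


# add eax, eax repeated four times: each add needs the previous result.
dependent = [("eax", ["eax"])] * 4
# Four adds on four different registers: no add needs another's result.
independent = [(r, [r]) for r in ("eax", "ebx", "ecx", "edx")]

if __name__ == "__main__":
    print(cycles_needed(dependent))    # one add per cycle
    print(cycles_needed(independent))  # all four share a cycle
```

Under these assumptions the dependent chain takes four cycles and the independent version takes one, which is the "clock per instruction" versus "packed together" contrast the comment describes.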

  • The talk about the CPU only being able to do one thing at a time instantly makes me think the claims are bogus. Modern CPUs do far too many things at the same time, that's the whole pipelined and superscalar architecture thing and it's why branch mis-prediction causes such a huge performance hit. If the company's glossing over such a huge thing, chances are it's just marketing spew.

    Now, if they were describing a way to execute both legs of a branch, to a reasonably large depth, and then prune the result tre

    • It sounds like they figured out a way to more efficiently shuttle instructions between RAM and the CPU. Notably the RAM to CPU pipeline is a huge bottleneck.
  • Flow is just now emerging from stealth, with [about $4.3 million] in pre-seed funding

    That's effectively nothing. That's saying that nobody has any faith in this idea yet.

  • I'm encouraged to see that the prison rehabilitation program is working well and Elizabeth Holmes already has a new venture running from inside prison.

  • Modern chips already do this; it is called out-of-order execution. Intel and AMD both do this, so what makes this different?
    At the extreme end of this, the Mill design does this with each instruction packing up to 32 commands. What is the thing this is doing exactly? Is it able to address all the cores at once, maybe? Currently this is done on a per core basis, and multiple instructions have been in Intel chips since the Pentium Pro days at least. If they could somehow do this across all the cores on a chip, I

  • Basically they have a different sort of way of doing parallel operations similar to how Intel MIC and Nvidia GPGPU stuff works. A big part of it is some different memory design stuff.

    Might work, might not, time will tell.

  • Reading the white paper, it looks like the old barrel processing https://en.wikipedia.org/wiki/... [wikipedia.org]. It requires 100+ threads to get 100x speedup. On the parallel part. Then you bonk into Amdahl's law.
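
The Amdahl's law point above is easy to put numbers on. The sketch below uses the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of run time that can be parallelized and n is the number of parallel units; the function name `amdahl_speedup` and the sample fractions are illustrative, not figures from the white paper.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Overall speedup when `parallel_fraction` of the work runs on n units."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


if __name__ == "__main__":
    # Illustrative fractions only: how much of a 100x parallel part survives.
    for p in (0.5, 0.9, 0.99, 1.0):
        print(f"p={p:.2f}  speedup with 100 units: {amdahl_speedup(p, 100):.2f}")
```

With 100 units, the full 100x only appears when the whole program is parallel (p = 1). At p = 0.9 the overall speedup is a little over 9x, and at p = 0.5 it can never reach 2x no matter how many units are added.
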
  • I did see a notable performance improvement after installing my 387 math coprocessor chip. :)
  • "Latency of memory references is hidden by executing other threads while accessing the memory. No coherency problems since no caches are placed in the front of the network. Scalability is provided via a high-bandwidth network-on-chip."

    So basically they describe what a GPU is, just with a ton of on-chip memory or a ton of fetching from off-chip memory. Then, get this, you need to **rewrite** your software to take advantage of it. They aren't claiming it's 100x faster than a GPU. They are claiming it's 100x f
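
The quoted claim that memory latency is "hidden by executing other threads" can be illustrated with a small simulation. This is a toy model under loud assumptions: one pipeline, round-robin thread selection, and every thread issuing a single one-cycle instruction followed by a memory reference of fixed latency. The function name `utilization` and the 99-cycle latency are invented for the example and are not taken from the white paper.

```python
def utilization(num_threads: int, mem_latency: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which the single pipeline issues an instruction."""
    ready_at = [0] * num_threads  # cycle at which each thread may issue again
    busy = 0
    next_thread = 0
    for cycle in range(cycles):
        # Round-robin search for a thread whose memory reference has returned.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                # Issue one instruction, then wait out a memory reference.
                ready_at[t] = cycle + 1 + mem_latency
                busy += 1
                next_thread = (t + 1) % num_threads
                break
    return busy / cycles


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        print(f"{n:3d} threads: pipeline busy {utilization(n, 99):.0%} of cycles")
```

In this model a single thread keeps the pipeline busy 1% of the time, and it takes 100 ready threads to keep it busy every cycle. That is the sense in which such a design needs a large supply of independent threads before the headline speedup can appear.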

  • You know those annoying fuel additives you can pay extra for at the pump, that promise to make your engine more efficient and last longer? Yeah, it's kind of like that.

  • If "a few software tweaks" could make the code 100x faster, the OS or chip makers would have implemented those tweaks long ago. Performance profiling is a thing, and it works, and has already been done.

    Remember those rumors of carburetors that could make your car able to do 200 mpg, but the oil companies secretly squashed? No, such a thing never was possible, but it's hard to kill rumors.
