Finnish Startup 'Flow' Claims It Can 100x Any CPU's Power With Its Companion Chip (techcrunch.com) 124
An anonymous reader quotes a report from TechCrunch: A Finnish startup called Flow Computing is making one of the wildest claims ever heard in silicon engineering: by adding its proprietary companion chip, any CPU can instantly double its performance, increasing to as much as 100x with software tweaks. If it works, it could help the industry keep up with the insatiable compute demand of AI makers. Flow is a spinout of VTT, a Finnish state-backed research organization that's a bit like a national lab. The chip technology it's commercializing, which it has branded the Parallel Processing Unit, is the result of research performed at that lab (though VTT is an investor, the IP is owned by Flow). The claim, Flow is the first to admit, is laughable on its face. You can't just magically squeeze extra performance out of CPUs across architectures and code bases. If you could, Intel or AMD or whoever would have done it years ago. But Flow has been working on something that has been theoretically possible -- it's just that no one has been able to pull it off.
Central Processing Units have come a long way since the early days of vacuum tubes and punch cards, but in some fundamental ways they're still the same. Their primary limitation is that as serial rather than parallel processors, they can only do one thing at a time. Of course, they switch that thing a billion times a second across multiple cores and pathways -- but these are all ways of accommodating the single-lane nature of the CPU. (A GPU, in contrast, does many related calculations at once but is specialized in certain operations.) "The CPU is the weakest link in computing," said Flow co-founder and CEO Timo Valtonen. "It's not up to its task, and this will need to change."
CPUs have gotten very fast, but even with nanosecond-level responsiveness, there's a tremendous amount of waste in how instructions are carried out simply because of the basic limitation that one task needs to finish before the next one starts. (I'm simplifying here, not being a chip engineer myself.) What Flow claims to have done is remove this limitation, turning the CPU from a one-lane street into a multi-lane highway. The CPU is still limited to doing one task at a time, but Flow's Parallel Processing Unit (PPU), as they call it, essentially performs nanosecond-scale traffic management on-die to move tasks into and out of the processor faster than has previously been possible. [...] Flow is just now emerging from stealth, with [about $4.3 million] in pre-seed funding led by Butterfly Ventures, with participation from FOV Ventures, Sarsia, Stephen Industries, Superhero Capital and Business Finland. The primary challenge Flow faces is that for its technology to be integrated, it requires collaboration at the chip-design level. This means chipmakers need to redesign their products to include the PPU, which is a substantial investment.
Given the industry's cautious nature and the existing roadmaps of major chip manufacturers, the uptake of this new technology might be slow. Companies are often reluctant to adopt unproven technologies that could disrupt their long-term plans.
The white paper can be read here. A Flow Computing FAQ is also available here.
I love the smell of bullshit in the morning (Score:5, Funny)
Re:I love the smell of bullshit in the morning (Score:5, Insightful)
Smells like venture capital fraud.
Re: (Score:3)
Re: (Score:2)
It's the "Cell Architecture" BS (see: Sony PS3) all over again. Nothing will address it meaningfully without being hand-coded assembly, so it's basically vaporware.
Re: I love the smell of bullshit in the morning (Score:3)
Re: (Score:3)
Re: (Score:2)
Bah (Score:3)
Don't bother with this. The room temperature superconductor shtick still has some life in it.
Re: (Score:2)
The problem is that if they just had a really good co-processor for a specific application, they would probably be selling that instead of this ridiculous idea of doubling CPU speed. So yeah, the whole thing is fishy as hell.
Re: (Score:2)
A parallel processing unit that can, with software "tweaks" make my computer go 100x faster in some cases?
It's not bullshit, it's the opposite. NVidia make them for example.
Re: (Score:2)
Depending on the application, co-processors can do wonders. The reason that multi-cores don't do well is because of the interconnection bus among processors, and cache/wait states for true parallel processing.
NVIDIA does well at math functions and array processing (a math function). The post describes something else. It might be naive, or -- given that it requires a design interface -- it might perform look-ahead that doesn't expose buffers or malloc in future designs. Who knows?
Re: (Score:3)
GPUs are great at anything that can be reduced to matrix multiplication, because they are optimized for multiply-and-accumulate in massive parallel. That's why they're so useful for LLMs, which are mostly matrix multiplication at the lowest level. What they're not real good for is handling conditional events such as general purpose programs invoke. That's why even with a GPU, you still need a CPU to feed it jobs.
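The comment above can be illustrated with a toy matrix product (plain Python standing in for what a GPU does with one thread per output cell; the matrices and numbers here are made up for the example):

```python
# A 2x3 times 3x4 matrix product. Each output cell is an independent
# multiply-accumulate chain -- exactly the shape of work a GPU runs
# in massive parallel, while branchy general-purpose code stays on
# the CPU that feeds it jobs.
A = [[0, 1, 2],
     [3, 4, 5]]
B = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]

def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    # No (i, j) cell below depends on any other cell: a GPU assigns
    # one thread per cell and computes them all at once.
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

print(matmul(A, B))  # [[20, 23, 26, 29], [56, 68, 80, 92]]
```

Anything that can't be flattened into this cell-per-thread shape -- conditional, branchy work -- is what stays on the CPU.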
"Do like you oughter, add ASUS to water." (Score:5, Funny)
A coprocessor multiplying performance a hundredfold? Impossible!
Unless... unless somehow this company has finally found a way to harness the limitless potential of homeopathy for computing purposes?
[Pours glass of water into this computer.]
Re: (Score:2)
Ah, but remember when floating point was done with coprocessors which also greatly increased performance?
Note also that most modern programming involves highly inefficient code; optimizing one's code is considered an aberration or a waste of time. All the time spent on research into improving compilers to automatically parallelize code at both high and low levels, goes to naught if all the programmers prefer javascript, python, or dotnet. There is so much inherent inefficiency in modern code.
The drawback,
Re:"Do like you oughter, add ASUS to water." (Score:4, Interesting)
Or...
There are UART co-processors.... Sound/audio co-processors.... Video, numeric, dma... You name it. All exist to do things way faster than the CPU.
Having said that, I've always wanted a co-processor like the Xilinx or Altera PCI cards that I have. A huge gate array that could have application specific code.
Re: (Score:2)
Re: (Score:2)
A coprocessor multiplying performance a hundredfold? Impossible!
They claim it doubles performance, but has the potential of up to 100x with software tweaks. Software optimization is already a thing, and the results can vary quite a bit. From that standpoint, 100X is feasible but would require the right circumstances. Either very specialized tasks, or a really poor starting point. That said, I'm still skeptical of the overall claim. I'm just pointing out the inaccuracy.
Re: (Score:2)
Well, I'll wait a bit before bothering to disbelieve this one... but I'm sure not going to bother investigating it until I hear it from many other sources.
I was just thinking about snake oil. (Score:2)
Perpetual motion machines. Limitless energy with no input. And 100x processing power through an add-on chip. It's a beautiful daydream. But still, just a daydream.
Re: (Score:2)
Orbiting moon + tidal capture tank = perpetual motion machine.
Re: (Score:1)
Re: (Score:2)
Re: I was just thinking about snake oil. (Score:2)
But wouldn't that be a gangsta move
Re: (Score:3)
Either way -- sapping energy from the system increases how fast the instability increases. If the moon was spiraling in, sapping energy would make it spiral in faster. But since it's spiraling OUT, sapping energy will make it spiral OUT faster.
Re: (Score:2)
In orbital mechanics, "spiralling out faster" would require increasing the velocity of the orbiting object, or a loss of gravitational mass in the gravity well the object is orbiting.
So, no. "Spiralling out faster" would mean the moon is accelerating in its orbit, and that energy is coming from somewhere - likely the pull of a larger gravitational object (the sun).
Re: (Score:2)
The spiraling out is coming at the expense of the Earth's rotation. It used to be much closer, and days used to be much shorter.
Re: (Score:2)
If the moon's orbit is getting wider, it's very likely due to gravitational pull from the sun. If we could figure out how much delta-v the sun is applying, that gives you a good budget for what you could take out of the system to keep a stable lunar orbit.
Of course, it's nowhere as simple as that, and because of the mass of the moon we're talking about shitloads of energy there, but the concept of "balance" is still there.
Re: (Score:2)
True, but anything this effectively perpetual does tend to get lumped into the 'perpetual motion machine' category by skeptics. I have to toss out the disclaimer: "Yes, it's probably bullshit, but something which is, for practical purposes, indistinguishable from a perpetual motion machine is physically possible; only a true perpetual motion machine is not."
Re: (Score:2)
It really just goes back to the idea that all of our energy sources really are solar powered, with different steps between the sun's fusion and our excitement of electrons in a wire - whether it's direct photon interaction with a substrate causes electron flow (PV panels), a very long process of creating the conditions for plant life to exist and thrive, and cause trillions of tons of carbon to be captured from the air and deposited as decaying plant matter deep underground, or having sufficient mass to giv
Re: (Score:2)
I think you are thinking about it wrong.
Yes clearly something is fishy. I think it's not that it's implausible, it's just not new. I have an addon chip in a pcie x16 slot that makes some things go hundreds of times faster. It's called a GPU.
Re: I was just thinking about snake oil. (Score:2)
Re: (Score:2)
Perpetual motion machines. Limitless energy with no input. And 100x processing power through an add-on chip. It's a beautiful daydream. But still, just a daydream.
I wandered over to their web site. I don't think it's an add-on chip so much as a re-design of superscalar. The idea would be Intel, AMD, Arm, or Apple buys some IP from Flow and incorporates that in the next CPU package.
Why exactly this is better than what four massive and well-funded architecture teams came up with over the last 30 years is a little unclear to me. Maybe people building RISC-V processors would be interested.
Need a faster CPU? (Score:2)
Re: (Score:2)
It *used* to.
Well, ok, we couldn't download in those days, unless stealing at 2400 on a BBS, but we could order the disk.
RAM Doubler for the Mac would effectively double memory on 68030 machines years before the Mac OS did it natively with virtual memory. It could delay allocation until actually needed, compress in memory, and page to disk as a last resort.
I assume there were similar things for 486 on the dos side.
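The compress-instead-of-page idea is easy to demonstrate; here is a minimal sketch using Python's stdlib zlib (RAM Doubler's actual algorithm was proprietary, so this is only an analogy):

```python
import zlib

# Cold memory pages are often highly redundant, which is what made
# "RAM doubling" viable: compress a page in RAM instead of paging it
# out to a slow disk, and only hit the disk as a last resort.
page = b"A" * 2048 + b"some repeated structure " * 80
packed = zlib.compress(page)

print(len(page), "->", len(packed))
assert len(packed) < len(page) // 4  # redundant pages shrink a lot
```

Random or already-compressed data wouldn't shrink at all, which is why these tools only helped with typical workloads.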
Re: (Score:2)
Re: (Score:2)
I recall a see-saw on the DOS side over whether or not it was faster to use the CPU to compress before writing -- but I sure don't remember any details!
Well it's flow processing (Score:3, Insightful)
It's been done for decades. Whole computer architectures, like the Transputer, have been built around it. The basic idea is that you pipeline processing elements one after another. This is also how data processing in hardware (e.g. 1980s-style digital TV chipsets like the Digit 2000) managed to decode PAL with a very modest amount of resources. It's a bit more tricky to do this in a programmable way, but typically you solve this by having each processing element be a small computer with fast interfaces. (See the Transputer.)
However there is nothing that stops you from doing the same on general purpose computers. In fact if you do shell scripting you will likely have done some primitive flow based computing by using pipes. Implementing the same ideas in a more efficient way isn't that hard, but requires your problem to be in a certain form.
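The shell-pipe analogy can be sketched with generators, where each stage acts like a small processing element consuming the previous stage's stream (a rough model of flow-based computing in general, not of Flow's design):

```python
# Three chained stages, analogous to `producer | filter | reducer`
# in a shell: each value flows through every stage without
# materializing intermediate results, like data moving between
# pipeline elements.

def produce(n):        # stage 1: emit raw data
    yield from range(n)

def keep_even(items):  # stage 2: filter
    return (x for x in items if x % 2 == 0)

def square(items):     # stage 3: transform
    return (x * x for x in items)

total = sum(square(keep_even(produce(10))))
print(total)  # 0 + 4 + 16 + 36 + 64 = 120
```

As the comment says, the trick is getting your problem into this form in the first place.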
Re: (Score:2)
It's been done for decades. Whole computer architectures, like the Transputer, have been built around it. The basic idea is that you pipeline processing elements one after another..
That's NOT how the transputer worked.
Re: (Score:3)
The basic idea is that you pipeline processing elements one after another.
I wonder how it can cope with the x86 instruction set. Some instructions can keep a unit busy for hundreds of cycles, and that does not help pipelining.
Re: (Score:2)
Those ones aren't included in the benchmarks.
If something sounds too good to be true ... (Score:4, Insightful)
then it probably is.
Extraordinary claims require extraordinary evidence.
Re: If something sounds too good to be true ... (Score:2)
They're talking about a GPU. Read the description: it is exactly how a GPU works, massively parallel. The nVIDIA GPU in my computer has 16,000 or so cores, and for certain things like AI it is a 100x acceleration over the CPU.
So not sure what the big deal is, they made a GPU.
I'll take the under on this one. (Score:1)
Would you like to bet this works? I'll take your money.
Would you like to bet this is a scam? Yeah, everybody is on this side of the boat.
But hey, slow Tuesday on New-Slashdot where everything gets approved.
Cool so... (Score:5, Insightful)
Slap this puppy on one of the open risc designs and prove it.
Re:Cool so... (Score:5, Informative)
LOL. That's the acid test for bullshit here isn't it? They don't actually need "collaboration at the chip-design level" for this at all. They could "just" make a stupid fast RISC-V design and show everyone how it's done.
Thing is even without that affordance, you know this is bullshit just from the information already provided: Add silicon and increase performance. We can already do that. It's done by all of the large scale CPU manufacturers today. Big, expensive CPUs have higher IPC because they have more silicon for predictors, cache, etc. to keep pipelines full.
Further, the premise of their claims, that CPUs are wastefully serialized, is absurd. Advanced CPUs do outrageous amounts of gymnastics to keep their components working on instructions. To the point where they've been found to compromise isolation and exhibit huge security flaws.
Re: (Score:2)
We can already do that. It's done by all of the large scale CPU manufacturers today.
While I think that the article is nothing but bullshit, I caution your view of the "experts" here. When all you have is a hammer, every problem looks like a nail. This applies equally to CPUs, the peddlers of which infamously dismissed the idea of video accelerator cards thinking they could just throw more CPU silicon at the problem, after all it worked with the FPU so it will work with everything else too right? They were of course very wrong and proceeded to spend the rest of IT history playing catchup (o
Re: (Score:2)
I don't doubt they have at least something of note. Likely it is indeed some sophisticated new predictor that yields better results than the prevailing predictors. The claims that are made here and the viability of the business model are the issues I see.
Doubling performance? No, that's highly unlikely. That is classic bullshit for credulous investors. There isn't any fruit hanging that low in prevailing CPU design. Maybe there are a small number of cherry picked applications where something appr
Re: (Score:2)
There may be a better way that people who think x86 is the only thing the world needs may be missing.
Those people are fictions that exist exclusively inside your head.
Here is another falsehood plaguing your mind: thinking that ARM or RISC-V or any other ISA you're a fan of actually matters.
The most valuable silicon in the world today isn't x86 or ARM or RISC-V or Apple's stuff. It's GPUs and ML accelerators. The conventional CPU ISAs and their various implementations are just the bookkeepers that feed GPUs and other co-processors. Jim Keller explained all this recently in an interview: the spec
Re: (Score:2)
"Add silicon and increase performance. We can already do that."
Yeah but that is a false equivalency, some uses of silicon can bring greater performance increases than others. Usually it happens that someone did some napkin math and added another dimension to mathematically prove something could cover all the possibilities. But the reality is the real problem space doesn't include all the possibilities and there is some scheme that puts more in information space in a scheme that lets them more efficiently ta
Re: (Score:3)
Re: (Score:2)
And how much is an FPGA? Since the smaller variants of RISC-V fit into one without problems, couldn't they just add their stuff? Then they could demonstrate that their stuff works.
Re: (Score:2)
Right, so maybe due to the differences it only yields 5-10x benefit vs 100x it theoretically could bring to the table. If a REAL demonstration showed a 20% improvement, let alone 500-1000% you could make a case to one of the big dogs that it is worth taking a real look at what you have.
Re: (Score:3)
AND here it is:
"Therein lies the primary challenge to Flow’s success as a business: Unlike a software product, Flow’s tech needs to be included at the chip-design level, meaning it doesn’t work retroactively, and the first chip with a PPU would necessarily be quite a ways down the road. Flow has shown that the tech works in FPGA-based test setups."
Per the whitepaper they integrate a massively parallel coprocessor that communicates directly with the L1/L2 CPU caches [think thousands of strea
Re: (Score:2)
Thanks for pointing it out. But I see no mention of concrete numbers, or even of which architecture they attached it to, in the "white paper". I also don't understand how they intend to work without recompilation, just by detecting the pthread stuff in a CPU -- and quite honestly, who writes code like that?
But at least I got reminded of that stuff, I remember the foundations of that from a remote presentation by Danny Hillis for the Connection Machine back in the 90's. But they at least used some extensions to C to
And batteries! (Score:3)
This one gets filed away with the weekly reports of BREAKTHROUGH battery technology that'll allow 1000x more power in 1/100th the size battery and it'll charge in 10 seconds.
Re: (Score:2)
Re: (Score:2)
Oh, and room temperature FTL, perpetual cancer machines and AIs that run on water ... ah, wait, I think I might be mixing up my bullshit a bit here...
CPUs can only do one thing at a time ? (Score:5, Informative)
Pipelining, superscalar architecture, speculative execution, branch prediction, out-of-order execution, simultaneous multi-threading (SMT), multi-core processors.
Re: (Score:1)
Re: (Score:2)
Yeah, they kind of lost credibility with me here. Parallel computing has been around since the 50's. Some of it is still done through time slicing but multi-core processors and specialty instruction set cores have been around for more than a decade.
Even further. Superscalar and hyperthreading were introduced in the '90s. Multicore was introduced in the early oughties.
For the other ancient dinosaurs around here, my favorite mid-80s idea for parallelism was the data flow architecture [wikipedia.org]. Flow's design may want to acknowledge its intellectual great-great-grandfather.
Re: (Score:2)
At last, a claim I can get behind: running Crysis on my VIC-20.
Re: (Score:2)
Right, but so much of that depends upon a narrow window of code that gets affected. Superscalar is essentially dataflow computing, but it's not looking at huge amounts of code at the same time, just the code that is there in the assembly language stream (following N branches at once). But modern code has lost the concept of "locality of reference". If a value is computed but separated from when the value is used by a thousand intermediate instructions then superscalar can't help with it much. Now imagin
Re: (Score:2)
Welcome to the Huudrawlic Press Channel!
Re: (Score:2)
Oookkkkaaaay! The hydraulic press guy is hilarious.
Re: (Score:2)
Well, I never said they don't have an accent :-)
Re: (Score:2)
VTT, Though (Score:4, Interesting)
The dollars on the table are still relatively small, though, so if there is anything there, the most likely outcome is being acquired by Intel or AMD. If, on the other hand, VC's, or Elon Musk get involved, it'll be safe to assume it's garbage.
Re: (Score:2)
Nah, an arm of the UK government has funded one of the EmDrive variants -- a device that is no different from a perpetual motion machine.
Anyway, it's just a parallel coprocessor by the sounds of the summary, so like a gpu.
Re: (Score:2)
I mean, it doesn't sound much different from AMD's old HSA system. That put the GPU on the same side of the MMU and cache as the CPU, which allowed passing data back and forth with roughly nanosecond latency from what I recall. This allowed much tighter coupling of CPU and GPU code, so much smaller amounts could be done on the GPU.
Result was some of the low end APUs could outperform the top end i7s on certain benchmarks which weren't really helped with a GPU due to the penalty of latency.
AMD however AMD'd i
Hope they're not patent trolls (Score:3)
There's definitely some info/claims in that whitepaper and FAQ. Not a CPU implementation person, so it's hard to understand how this co-processor that you have to recompile for differs from the FPGA co-processors that have recently come on the market, or even from old-school SIMD-type extensions for parallelism (let alone from the many stages of pipelining that modern CPUs -- since, like... the '80s? -- already rely on to increase throughput by having multiple instructions in flight at the same time).
Hope they're not just going to try to patent troll everyone with their "novel IP" which turns out to be stuff that everyone's been doing in one way or another already for decades.
Looked at the white paper (Score:5, Informative)
Looked at the white paper.
Basically it seems like they integrate a massively parallel coprocessor that communicates directly with the L1/L2 CPU caches, and takes instruction from the CPU.
A 100x improvement isn't that hard of a pill to swallow if you add hundreds or thousands of streaming processors to a normal CPU to do highly parallelizable work, because it isn't multiplying your CPU's processing power, it's adding a highly parallel coprocessor.
Your GPU is also 100x (some big num) faster than your CPU for vector math. In fact this might not be very different conceptually than a CPU with an onboard general purpose accelerator, like an integrated GPU.
As per the white paper, you would need to recompile to explicitly take advantage of the coprocessor.
Techcrunch says "It just more efficiently uses the CPU cycles that are already taking place." I think Techcrunch is misinterpreting what the company is actually saying - the white paper makes it more clear.
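Whether a claimed 100x is plausible is really an Amdahl's-law question: the overall speedup is capped by the fraction of the workload the coprocessor can absorb. A quick back-of-the-envelope check (the fractions below are illustrative, not Flow's numbers):

```python
def amdahl(parallel_fraction, coprocessor_speedup):
    """Overall speedup when only part of the workload is accelerated."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / coprocessor_speedup)

# Even an absurdly fast coprocessor can't beat 1 / (serial fraction):
print(round(amdahl(0.50, 1000), 2))  # 2.0   -- half the work offloaded
print(round(amdahl(0.99, 1000), 2))  # 90.99 -- ~100x needs ~99% parallel code
```

So "2x out of the box, 100x with software tweaks" reads as: modest gains on ordinary code, big gains only on code rewritten to be almost entirely parallelizable.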
Re: (Score:2)
This sounds like the old HSA stuff from AMD, which could give huge speedups by putting the GPU on the same side of the MMU and cache as the CPU. I remember benchmarks of a Bulldozer APU creaming Intel's best in particular tasks.
Naturally AMD fucked it up with a mix of inadequate software support and a weird product lineup where only the low end processors could go fast.
Re: (Score:2)
Good job on actually reading the paper.
The tag line of "Can improve 100x Any CPU with its companion chip" makes it sound like it's set up like a Game Genie on your motherboard - and reporting that tries to put it into layman's terms is always going to miss a bunch of stuff, or misrepresent it.
There might be something in there, but everyone is so jaded (and maybe rightfully so) from so many false promise tech startups that I think they need to bring their advertising a bit more down to earth.
Re: Looked at the white paper (Score:2)
So it is a GPU in die that shares fast memory with the CPU, like the Apple M1 chip.
PCIe5 x16 with an nVIDIA board and DMA does the exact same thing, though, and scales larger. Some people even hang them together over InfiniBand to scale to many CPUs and many GPUs accessible in a single fabric.
Re: (Score:2)
The claim here is not that it executes x86 (100x) faster; it executes certain SIMD instructions fast. Which is exactly what nVIDIA/AMD GPUs do. If you look at datacenter fabrics, the nVIDIA GPU can become the host and use the underlying hardware (such as NVMe storage) without the x86 CPU (using something like a BlueField DPU), and with those fabrics the GPU can even address other x86 CPUs and memory in a different chassis. It is all possible to do this in software, which is the same thing this company claims to do
Re: (Score:2)
I see you haven't worked with GPGPU recently, with modern datacenter GPU, they can run and schedule work independently, hell, even regular network cards can do that these days, get access to hardware resources etc. That's why IOMMU in hypervisors exist, to shield other processes/memory from rogue tenants.
Re: (Score:2)
Neither can this board. There is not a single CPU outside of Intel/AMD that is even licensed to execute x86 instructions. The article is talking about SIMD instructions and it can't just query the CPU, take things out of its queue and execute them independently.
Blast from the past (Score:2)
They invented the Math Co-processor.
These two things are mutually exclusive (Score:3)
"A Finnish startup called Flow Computing is making one of the wildest claims ever heard in silicon engineering: by adding its proprietary companion chip, any CPU can instantly double its performance, increasing to as much as 100x with software tweaks." - Ok, let's see it.
"Flow’s big achievement, in other words, isn’t high-speed traffic management, but rather doing it without having to modify any code on any CPU or architecture that it has tested." - Ok sounds like an achievement, but could be replicated by others.
"Therein lies the primary challenge to Flow’s success as a business: Unlike a software product, Flow’s tech needs to be included at the chip-design level, meaning it doesn’t work retroactively, and the first chip with a PPU would necessarily be quite a ways down the road. Flow has shown that the tech works in FPGA-based test setups, but chipmakers would have to commit quite a lot of resources to see the gains in question." - Ok no thanks.
You can't say it can work on *Any CPU* and then turn around and say it has to be integrated into the chip design; by definition that rules out every CPU built to this point. That story implies you can make any computer faster, but then you realize it's any computer except any one currently in existence. Fun tech, probably not hard to replicate in numerous ways, but business-wise, this just isn't a business.
I will be able to run FarCry on my Atari 800! (Score:2)
Re: (Score:2)
forget the 800; think of what this means for the 2600!
Ah but is it (Score:3)
3D printed in the cloud with quantum computers mining bitcoins to buy AI apps on Elon's Mars colony so you can mine asteroids to build a space elevator to make humanity a multi planet species?
Re: (Score:2)
yeah sure (Score:2)
> Their primary limitation is that as serial rather than parallel processors, they can only do one thing at a time
CPUs since the mid-1990s have had multiple functional units and retire multiple instructions per cycle. They are very much parallel.
Of course the CDC 6600 was doing that in the 1960s, but it wasn't exactly a single chip.
Why stop there? (Score:2)
Why not design the entire chip and sell an entire solution?
"Desperately seeking suction." (Score:2)
Not to be confused with companies focused on sports medicine, who always give 110%.
Modern CPUS (Score:2)
Their primary limitation is that as serial rather than parallel processors, they can only do one thing at a time
Anyone thinking modern CPUs do this clearly has never worked on modern CPUs. Case in point:
add eax, eax
add eax, eax
add eax, eax
add eax, eax
In dependent code, this would be a clock per instruction. But same code that's independent, like such:
add eax, eax
add ebx, ebx
add ecx, ecx
add edx, edx
Will get packed together and the above will be part of a reciprocal throughput prediction in the pipe. Since the add ALU circuit can take four values (the IO of the ALU circuit is eight times a GP register, 51
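The dependent-versus-independent distinction above can be modeled at a higher level in any language -- a dependent chain forces strictly sequential evaluation, while independent accumulators form separate chains an out-of-order core can overlap (a conceptual sketch, not a cycle-accurate model):

```python
# Dependent chain: each step needs the previous result, so no two
# steps can overlap -- roughly one retired add per cycle.
x = 1
for _ in range(4):
    x = x + x              # serial: 1 -> 2 -> 4 -> 8 -> 16

# Independent work: four separate accumulators form four chains an
# out-of-order core can issue in the same cycles.
a, b, c, d = 1, 1, 1, 1
a, b, c, d = a + a, b + b, c + c, d + d

print(x, a + b + c + d)  # 16 8
```

Same number of adds either way; the hardware just gets to overlap the second version.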
Only one thing at a time? (Score:2)
The talk about the CPU only being able to do one thing at a time instantly makes me think the claims are bogus. Modern CPUs do far too many things at the same time, that's the whole pipelined and superscalar architecture thing and it's why branch mis-prediction causes such a huge performance hit. If the company's glossing over such a huge thing, chances are it's just marketing spew.
Now, if they were describing a way to execute both legs of a branch, to a reasonably large depth, and then prune the result tre
Re: (Score:2)
No funding (Score:2)
Flow is just now emerging from stealth, with [about $4.3 million] in pre-seed funding
That's effectively nothing. That's saying that nobody has any faith in this idea yet.
Looks like the rehabilitation program is working (Score:2)
I'm encouraged to see that the prison rehabilitation program is working well and Elizabeth Holmes already has a new venture running from inside prison.
So, the same thing Chips do now? (Score:2)
Modern chips already do this; it is called out-of-order execution. Intel and AMD both do this -- what makes this different?
At the extreme end of this, the Mill design does this with each instruction packing up to 32 commands. What is the thing this is doing exactly? Is it able to address all the cores at once, maybe? Currently this is done on a per core basis, and multiple instructions have been in Intel chips since the Pentium Pro days at least. If they could somehow do this across all the cores on a chip, I
Intel MIC/Nvidia GPGPU Type Stuff (Score:2)
Basically they have a different sort of way of doing parallel operations similar to how Intel MIC and Nvidia GPGPU stuff works. A big part of it is some different memory design stuff.
Might work, might not, time will tell.
Barrel processing (Score:2)
Please let the part number end in 7. :) (Score:2)
I read their whitepaper, it's called a GPU (Score:2)
"Latency of memory references is hidden by executing other threads while accessing the memory. No coherency problems since no caches are placed in the front of the network. Scalability is provided via a high-bandwidth network-on-chip."
So basically they describe what a GPU is, just with a ton of on-chip memory or a ton of fetching from off-chip memory. Then, get this, you need to **rewrite** your software to take advantage of it. They aren't claiming it's 100x faster than a GPU. They are claiming it's 100x f
Like the fuel additives you can pay extra for (Score:2)
You know those annoying fuel additives you can pay extra for at the pump, that promise to make your engine more efficient and last longer? Yeah, it's kind of like that.
100x performance isn't that easy (Score:2)
If "a few software tweaks" could make the code 100x faster, the OS or chip makers would have implemented those tweaks long ago. Performance profiling is a thing, and it works, and has already been done.
Remember those rumors of carburetors that could make your car able to do 200 mpg, but the oil companies secretly squashed? No, such a thing never was possible, but it's hard to kill rumors.
Re: (Score:2)
Valid summary, in a way no LLM could.
Re: (Score:2)
I'm glad someone did mention it!
https://en.wikipedia.org/wiki/... [wikipedia.org]
I thought the same guys came back with another product.