Ask Slashdot: How Much Faster Is an ASIC Than a Programmable GPU? 63
dryriver writes: When you run a real-time video processing algorithm on a GPU, you notice that some math functions execute very quickly on the GPU and some math functions take up a lot more processing time or cycles, slowing down the algorithm. If you were to implement that exact GPU algorithm as a dedicated ASIC hardware chip or perhaps on a beefy FPGA, what kind of speedup -- if any -- could you expect over a midrange GPU like a GTX 1070? Would hardwiring the same math operations as ASIC circuitry lead to a massive execution time speedup as some people claim -- e.g. 5x or 10x faster than a general purpose Nvidia GPU -- or are GPUs and ASICs close to each other in execution speed?
Bonus question: Is there a way to calculate the speed of an algorithm implemented as an ASIC chip without having an actual physical ASIC chip produced? Could you port the algorithm to, say, Verilog or similar languages and then use a software tool to calculate or predict how fast it would run if implemented as an ASIC with certain properties (clock speed, core count, manufacturing process... )?
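To make the question concrete, here is a minimal CUDA sketch of the kind of per-pixel kernel the submitter describes; the kernel name, image size, and constants are hypothetical, and it simply contrasts a cheap multiply-add path with a more expensive transcendental path of the sort that tends to dominate shader run time:

    #include <cuda_runtime.h>

    // Hypothetical per-pixel kernel: each thread processes one pixel.
    // The multiply/add path is cheap; the powf/sinf path leans on the GPU's
    // special function units and costs noticeably more per pixel.
    __global__ void tonemap(const float* in, float* out, int n, float gamma)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float v = in[i];
        float cheap  = v * 1.8f + 0.05f;            // a couple of fused multiply-adds
        float costly = powf(cheap, gamma)           // transcendental: slower
                     + 0.1f * sinf(6.2831853f * v);
        out[i] = costly;
    }

    int main()
    {
        const int n = 1920 * 1080;                  // one 1080p frame worth of pixels
        float *d_in, *d_out;
        cudaMalloc((void**)&d_in,  n * sizeof(float));
        cudaMalloc((void**)&d_out, n * sizeof(float));
        cudaMemset(d_in, 0, n * sizeof(float));     // placeholder input data

        int threads = 256;
        int blocks  = (n + threads - 1) / threads;
        tonemap<<<blocks, threads>>>(d_in, d_out, n, 2.2f);
        cudaDeviceSynchronize();

        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }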
Hard/impossible to quantify, but the diff is huge (Score:2)
Re: (Score:2)
Re:Hard/impossible to quantify, but the diff is hu (Score:5, Informative)
That's not how ASICs work. The whole point of using an ASIC is that you don't have the von Neumann bottleneck. You don't have to have a central processing unit, even. Everything depends on the input stream, the desired output, and the transformations to be performed in between.
Re: (Score:3, Informative)
Just look at bitcoin mining to see.
That's not a fair comparison. That's largely because while GPGPUs are supposedly "General Purpose Graphics Processors," they're still really ASICs tuned to handle game graphics. You're hacking a vector multiplication chip to execute hashing functions, which it isn't as good at.
The question is whether tuned Hardware Transform and Lighting chips for a game could run faster than a GPU. The answer is that yes, they could, so long as you didn't want nice graphics. If you wanted modern game engine graphics they would
Re:Hard/impossible to quantify, but the diff is hu (Score:5, Informative)
Another example of ASIC performance would be the EFF DES cracker, aka Deep Crack [wikipedia.org] which was a machine built by the Electronic Frontier Foundation (EFF) in 1998, to perform a brute force search of the Data Encryption Standard (DES) cipher's key space – that is, to decrypt an encrypted message by trying every possible key. The aim in doing this was to prove that the key size of DES was not sufficient to be secure.
What does this have to do with performance? This guy estimates 136 years to crack DES on an i7 [stackexchange.com], and the ASIC-based Deep Crack can do it in less than a day -- a speedup on the order of 50,000x.
That said, the DES ASIC was designed by a group of college professors and then produced at a custom foundry for a total 'working machine' cost of $250,000. So it helps to have industry experts and a bunch of money to get that sort of performance boost.
Re: (Score:2)
That's because DES is bit manipulation. That's not something that GPUs are made for.
Re: (Score:3)
Any recent GPU with CUDA support natively supports a full set of bitwise logic and shift ops. Whether massively parallel logic ops are actually very useful in something like video encoding is debatable.
Re: Hard/impossible to quantify, but the diff is h (Score:3)
For $250,000 you can buy lots of i7s with GPUs that can crack DES faster than the ASIC. You could even rent $250k worth of gear for a couple of hundred dollars.
The question is generally not whether it's possible, it's whether one thing is more cost efficient than the other.
If your job is to run a single process as fast as possible non-stop for 5 years, an ASIC is always the best performing, but the development time and cost have to be taken into account (what's the scale of the operation?).
If you want decent pe
Re: (Score:3)
Re: (Score:2)
The numbers would have been even further in the ASIC's favor if you compared it to publicly available processors of the same time period.
Sure, you can try to make a price comparison, as long as you ignore that the i7 did not exist when Deep Crack was made in 1998.
Cut the number of threads and the clock speed down to 1998 levels and the 136-year run time for the i7 increases by a factor of 10, and the cost of the sheer number of complete computer systems needed would be waaay over the $250,000 spent on Deep Crack.
Re: (Score:3)
That's not a fair comparison. That's largely because while GPGPUs are supposedly "General Purpose Graphics Processors," they're still really ASICs tuned to handle game graphics. You're hacking a vector multiplication chip to execute hashing functions, which it isn't as good at.
You mean the "AS" part of "ASIC" stands for "Application Specific"?
Who knew...?
Re: (Score:2)
The question is whether tuned Hardware Transform and Lighting chips for a game could run faster than a GPU. The answer is that yes, they could, so long as you didn't want nice graphics.
I'm pretty sure the original question was not about graphics, but about doing specific, non-graphics calculations. The GPUs are technically ASICs as well, but the application there is graphics-related calculations.
Re: (Score:2)
Re: (Score:1)
Hardware T&L (Score:2)
What you're asking about is Hardware T&L. It was a fixed-function set of operations for transforming and lighting triangles.
The problem is that it doesn't allow programmable shaders. And if you want lots of different materials, but not lots of silicon wasted on taping out chips for each individual shader, it's way faster to implement shading as a *general purpose* processor -- general purpose up to a point, and still largely designed for 3D shaders.
The Geforce 2 was slightly faster than Geforce 3 in limi
Re:Hardware T&L (Score:5, Informative)
What they're asking about isn't exactly clear, which leads to the follow-up question: "What precisely are you trying to do?" ASICs can do the one specific thing they're designed for very well and everything else very badly or not at all.
Even then, for a lot of things a close-enough GPU refined over years of work by experts at nVidia or whatever will outperform some ASIC glued together from various IP blocks by a contractor.
"What /precisely/?" is such a dumb question. (Score:2)
Anyone who has ever programmed knows that you want your functions to be as generic as possible, and only as specific as necessary.
You do not want to solve one /precise/ problem. You want to solve the largest set possible with the fewest most generic functions.
Your question is like somebody asking for the fastest sort function, and you asking them "For what record data type?" (Hint: sorting is a higher-order function that takes the comparator function as a parameter, so the type of the record is irrelevant.)
yes, simulators exist (Score:2)
Also FPGA tools give you the clock speed of your design.
But it's a lot of work and will always depend on the technology / process being used.
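As a rough illustration of how such an estimate goes, here is a minimal back-of-the-envelope sketch in plain host-side code (every number below is an assumption, not the output of any real tool run): once the FPGA or ASIC tools report an achievable clock, the throughput of a fully pipelined design is roughly the clock rate times the number of items it accepts per cycle.

    #include <cstdio>

    // Hypothetical back-of-envelope: all figures are made up for illustration.
    int main()
    {
        double clock_hz         = 400e6;   // e.g. 400 MHz reported by the FPGA tools
        double pixels_per_cycle = 2.0;     // how many pixels the pipeline accepts per tick
        double pixels_per_frame = 1920.0 * 1080.0;

        double pixels_per_second = clock_hz * pixels_per_cycle;
        double frames_per_second = pixels_per_second / pixels_per_frame;

        std::printf("~%.0f Mpix/s, ~%.0f fps at 1080p\n",
                    pixels_per_second / 1e6, frames_per_second);
        return 0;
    }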
Re: (Score:1)
However, if you are doing something like video processing, then you potentially need a lot of memory. So depending on how much, and what the task is, there may be no advantage to having an ASIC. If you have a memory bandwidth bottleneck then you can potentially solve it, but GPUs already work hard to solve this, so you would need to work even harder.
If you only need a frame buffer it would be no issue, it is when you are doing complex stuff and also if you need to ac
Re: (Score:2)
This is a nonsensical thing to say. Look, an ASIC isn't magic; it's just a chip where they've taken the FPGA design and baked it into custom silicon, at a price. (There's a little more complexity, and ASIC design tends to rely on pre-existing IP blocks.)
An FPGA is more or less the same, except that the gates are software-configured. Once configured, however, it behaves precisely the same, because it *is* the same. Obviously different FPGAs have different limitations, and
Re: (Score:1)
FPGAs are closer to ASICs than CPUs. But an ASIC will still run much faster than an FPGA programmed with the same circuit, as there is still all the routing logic between the gates that the signals need to propagate through, not to mention that the gates would be physically further apart. On an ASIC there are only the necessary connections, and the gates have a much more optimal layout, minimizing the distance the signals have to travel.
A lot (Score:1)
Until much better computer designs are a thing again...
Try a GPU? Try a CPU? Try a
Find smart people who understand the math and let them find the very best hardware that year for the math...
Performance vs flexibility/cost (Score:5, Insightful)
Only when you have static, simple tasks like mining bitcoins can you take advantage of the architectural differences between the GPU and FPGA/ASIC.
That is when you can get 10x~50x speed improvements.
For generic tasks, GPUs are just as fast as FPGAs nowadays.
But there are some major architectural differences.
Best see the FPGA as an expensive programmable ASIC.
You build and test your design on an FPGA, and when you're 100% sure that you got it right, you order the ASIC.
Since an ASIC is dedicated silicon, it is about 10x faster than the FPGA and much more energy efficient.
The startup costs to make an ASIC are huge.
You only consider making an ASIC if you need high volumes like a bitcoin farm.
Modern tools allow you to build and test your ASIC in a virtual environment and can calculate how fast it will perform the operations.
Re: (Score:1)
Best see the FPGA as an expensive programmable ASIC.
So, an FPGA is an "expensive ... ASIC"? I think it's best NOT to see it that way. You know, because of the wrongness.
Re: (Score:1)
So, an FPGA is an "expensive ... ASIC"? I think it's best NOT to see it that way. You know, because of the wrongness.
How do you figure a single $250 part is cheaper than a single $100 or even $10 part?
ASIC runs are in the million count range; spreading your quarter-million setup charges and $3 per chip over the whole run makes each chip $10.
Likewise, a run of 100k will spread the setup cost out to just under $100 per chip.
The FPGA will cost the same from a small to a medium run and still be $200-$250 each.
It's by far cheaper to get a couple of FPGAs to test your design on and get all the bugs worked out, and only then order your ASIC.
Re: (Score:1)
Re: (Score:2)
In all practicality, an FPGA IS pretty much an expensive per-single-unit ASIC, though?
Because most products that actually ship with FPGAs in them rarely, if ever, have the FPGA reconfigured during the lifetime of said device. It's just cheaper to make.
Then again, you could just use an ARM chip for lots of things you would have used an FPGA for before now, too... you know, because it's cheaper.
Re:Performance vs flexibility/cost (Score:5, Interesting)
Sorry, not even close. It depends completely on the task required.
Take for example video (h26x for example) encoding, which is FAR from a 'generic task'.
A long time ago, Nvidia had a version that ran using the GPGPU resources - it worked, and was faster than a CPU, but not by a big margin.
These days, pretty much all GPUs have a small dedicated area (an ASIC in effect) for this encoding. These modules run MUCH faster than the GPU cores ever could, while doing a better job (fewer shortcuts).
As you imply, an FPGA is a middle ground, in both performance and size. However you cannot always just 'order an ASIC' once you have a working FPGA (at least not if you want a good ASIC).
You are dead right that development cost (both in time and resources) is almost exactly inversely proportional to speed here.
GPU FPGA ASIC
I suspect the person asking this question is the same one who asked nearly exactly the same question here a few months ago from memory.. and surprisingly, the answers have not changed.
FPGAs require you to have a big market or consumers with deep pockets for your application.
ASICs require you to have a huge market, or consumers with BIG budgets.
Basically, if you need to ask the question, then you don't know enough to understand the answer - go find someone who does, and pay them to find out properly.
Re: (Score:2)
I suspect the person asking this question is the same one who asked nearly exactly the same question here a few months ago from memory.. and surprisingly, the answers have not changed.
I'm glad it's not just me who recognised it. So I went and looked:
https://slashdot.org/story/360... [slashdot.org]
Yup, same guy.
Re: (Score:2)
> Take for example video (h26x for example) encoding, which is FAR from a 'generic task'.
Actually it is - every single person doing h.26x encoding can use the same silicon and be happy. It's a generic task.
Possibly, there's some edge case where certain h.26x profiles can't be done by commodity hardware and would be loaded onto a fast GPU, but those would be oddball specific, custom tasks.
Genericizing to silicon isn't just for simple gates - any algorithm that millions of people are happy doing the same
Re: (Score:2)
For some applications, speed with flexibility points to FPGAs as the only practical solution. For example, their ability to be re-programmed remotely makes them viable for platforms that cannot easily be accessed physically: satellites, deep-space missions, landers, or other long-term observing platforms at remote locations or at great sea depths are all candidates.
Re: (Score:2)
Only when you have static, simple tasks like mining bitcoins can you take advantage of the architectural differences between the GPU and FPGA/ASIC.
That is when you can get 10x~50x speed improvements.
For generic tasks, GPUs are just as fast as FPGAs nowadays.
But there are some major architectural differences.
Best see the FPGA as an expensive programmable ASIC.
You build and test your design on an FPGA, and when you're 100% sure that you got it right, you order the ASIC.
Since an ASIC is dedicated silicon, it is about 10x faster than the FPGA and much more energy efficient.
The startup costs to make an ASIC are huge.
You only consider making an ASIC if you need high volumes like a bitcoin farm.
Modern tools allow you to build and test your ASIC in a virtual environment and can calculate how fast it will perform the operations.
It sounds like what they are asking is whether there are any specific calculations that are common among all (or most) modern games that can be split off to a co-processor to allow the GPU to handle everything else -- things like physics, or vector math, etc.
Depends on the algorithm. (Score:3)
But unless the difference is worth a few million dollars, forget about it.
An FPGA is a better starting point, but unless you're doing something weird or super-specific, I'd think a video card is going to do a decent job. And a lot of the energy used in a video card goes into simply moving the data around; an ASIC isn't going to help with that.
If you could build a video card with your special math op built in, you could speed it up by however much that instruction is slowing it down. And if you had a large enough order, you might convince Intel, Nvidia, AMD, Qualcomm, or Imagination... to do just that for you.
The best case for an ASIC is stream processing, where you dump a batch of data in, it runs through the chip and creates an output, the intermediates don't matter to you in any way, and it doesn't require additional inputs once it starts. That's why it works for hashing, but it isn't that common a technique anymore.
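To illustrate what that streaming model looks like on the GPU side, here is a minimal CUDA sketch (buffer sizes, the stand-in transform, and the stream count are all made up) that pushes chunks of data through copy-in, compute, and copy-out stages using CUDA streams, so transfers and compute can overlap:

    #include <cuda_runtime.h>

    // Stand-in for the real per-element math.
    __global__ void transform(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i] * 0.5f + 1.0f;
    }

    int main()
    {
        const int chunks = 4;
        const int chunk  = 1 << 20;                    // four 1M-element chunks
        float *h_in, *h_out, *d_in, *d_out;
        cudaMallocHost((void**)&h_in,  chunks * chunk * sizeof(float));  // pinned host memory
        cudaMallocHost((void**)&h_out, chunks * chunk * sizeof(float));
        cudaMalloc((void**)&d_in,  chunks * chunk * sizeof(float));
        cudaMalloc((void**)&d_out, chunks * chunk * sizeof(float));

        cudaStream_t s[chunks];
        for (int c = 0; c < chunks; ++c) cudaStreamCreate(&s[c]);

        // Each chunk flows in one end and out the other, per stream.
        for (int c = 0; c < chunks; ++c) {
            size_t off = (size_t)c * chunk;
            cudaMemcpyAsync(d_in + off, h_in + off, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, s[c]);
            transform<<<(chunk + 255) / 256, 256, 0, s[c]>>>(d_in + off, d_out + off, chunk);
            cudaMemcpyAsync(h_out + off, d_out + off, chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, s[c]);
        }
        cudaDeviceSynchronize();

        for (int c = 0; c < chunks; ++c) cudaStreamDestroy(s[c]);
        cudaFreeHost(h_in); cudaFreeHost(h_out);
        cudaFree(d_in); cudaFree(d_out);
        return 0;
    }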
SHA256: ASIC vs GPU (Mining hardware comparison) (Score:3)
https://en.bitcoin.it/wiki/Non... [bitcoin.it]
Re: (Score:2)
> If you were to implement that exact GPU algorithm as a dedicated ASIC hardware chip or perhaps on a beefy FPGA
Then you'd be doing it wrong, I believe.
I could be wrong here, but I think that's like saying:
You wanted to move to a new house, so you loaded up a semi truck and a rocket car with identical loads (same weight and volume of cargo in each) ...
I think if you design an special-purpose ASIC that is supposed to do one task and then you pretend it's a general-purpose GPU and use it like a GPU, yo
Most 10G isn't copper (Score:2)
You know something about twisted pair. Cool.
Probably most 10GBASE links are 10GBASE-SR or some other -*R variant, as opposed to -T.
Does any chip exist that does only 10GBASE-T? The cheapest chip I found that does copper also does multiple fiber standards, which run at 10.3125 GHz.
Questions answered (Score:3)
Just like SIMD operations on a CPU, GPUs are typically designed to execute the same instructions in parallel over a set of data. Even if the hardware executes each operation more slowly, parallelizing across a data stream can end up being much faster than executing them quickly one at a time, especially since hardware buses perform better with bursts of data.
The advantage of FPGAs and custom/special-purpose ASICs is that you can choose to optimize less for generality and more for handling specialized tasks. If you have enough transistors/LUTs, you can utilize more and more of them to reduce how many clock cycles your algorithm takes, all the way down to a single clock cycle (as long as you're not bumping into routing/path-length timing limitations). FPGA speeds are typically less than 800 MHz, so even single-cycle operations can't get any faster than your FPGA's maximum clock, whereas an ASIC can be designed to run at much higher clock speeds.
ASICs are typically simulated on fairly beefy FPGAs, often several working in concert, before being produced so calculating speed is obviously doable.
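To put assumed numbers on that clock-speed tradeoff, here is a tiny host-side sketch (every figure below is invented for illustration): the same single-cycle pipeline gains throughput on an ASIC both from the higher clock and from the extra pipeline copies that fit in the same area.

    #include <cstdio>

    // All figures are assumptions: the same single-cycle pipeline,
    // instantiated on a hypothetical FPGA and a hypothetical ASIC.
    int main()
    {
        double fpga_results_per_s = 300e6 * 4;   // assumed 300 MHz, 4 pipeline copies fit
        double asic_results_per_s = 1.5e9 * 16;  // assumed 1.5 GHz, 16 copies in the same area

        std::printf("FPGA: ~%.1f G results/s\n", fpga_results_per_s / 1e9);
        std::printf("ASIC: ~%.1f G results/s\n", asic_results_per_s / 1e9);
        std::printf("gap:  ~%.0fx from clock and density alone\n",
                    asic_results_per_s / fpga_results_per_s);
        return 0;
    }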
Re: (Score:2)
Second answer, for the bonus question: look up what test benches are.
Re: (Score:2)
ASICs are typically simulated on fairly beefy FPGAs, often several working in concert, before being produced so calculating speed is obviously doable.
Keep in mind that a board full of beefy FPGAs could buy you a nice house on the beach.
It becomes about power consumption (Score:5, Interesting)
The most efficient bitcoin miners are at 11,000 megahashes per joule. GPUs topped out at something like 3 MH/J. CPUs are more like 0.03. That puts the ASICs over 3,000x more efficient than a GPU and several hundred thousand times more efficient than a CPU.
The more efficient it is, the tighter you can pack in more parallel processing units.
Depends on memory usage (Score:2)
ASICs are screaming fast when everything fits on the chip. They can still be worthwhile when the optimal memory access pattern maps poorly to the facilities offered by the GPU. If your data is large, comes in chunks that the GPU likes to deal with, and the access pattern is well tuned to the GPU, you may not be able to win at all.
There is no simple answer (Score:2)
A GPU shader will perform a set of operations sequentially, processing many data values in parallel -- both because there are many instances of the shader running on separate compute units, and because each compute unit is processing multiple values in separate registers.
When you port your implementation to an FPGA / ASIC, while you would want to speed up each operation to run in a single clock tick, you should also consider adding a pipeline, performing each of those operations, on different data values, at
Let's talk about ASICs, people. (Score:5, Informative)
First of all, the question assumes that the only use case for ASICs is speed. This displays a poor understanding of ASICs in general, and it must be addressed.
Let's review the basics. ASIC stands for "Application Specific Integrated Circuit". Therefore, that tells us two things about ASICs:
1. They are application specific, as opposed to general purpose.
2. They are integrated circuits. This means that, fundamentally, they are in the same family as everything from a 74C jellybean part to an Intel i9 microprocessor.
Note that neither of the above has any inherent meaning with respect to FPGAs.
At one point in time, ASICs were the only real game in town. Need to make a DMM? Build an ASIC. Need to make a calculator for finance? Build an ASIC. Need to make a musical birthday card? Build an ASIC. Need to make a talking teddy bear? Build an ASIC. The ASIC allowed you to achieve the functionality you needed at the scale you needed, at the lowest cost possible. In addition, if you were clever with your ASIC design, you could sell your ASIC to other manufacturers, who would then integrate your ASIC into their products. It was all about revenue.
The general purpose microprocessor at the time (and even now), did not have enough flexibility or the low cost at scale that ASICs had.
The FPGA came about after the ASIC, and it had a marketing battle of its own for relevancy. There was no fundamental reasoning that FPGAs were somehow "designed to be" inferior to the ASIC. The FPGA is merely the sum of its parts. It has advantages and limitations inherent to its design, which is a trivial exercise to understand, just by looking at the FPGA's internal design.
However, if your logic is sufficiently simple, it should be a trivial exercise to compute how fast your ASIC could be. You know the process you'll be using, the transistor operating characteristics and the longest route in the logic. It's actually more difficult to determine how fast an FPGA is at doing something. The advantage with FPGAs is that a computer can do the difficult work for you pretty much automatically, while the ASIC involves (usually much) more human effort.
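For instance, a minimal sketch of that back-of-the-envelope calculation (every delay below is invented for illustration): the maximum clock of a block of logic is roughly the reciprocal of its longest register-to-register path.

    #include <cstdio>

    // Illustrative only: all delays are assumed values, not real process data.
    int main()
    {
        double gate_delay_s   = 30e-12;   // assumed per-gate delay on some process
        double gates_in_path  = 20;       // assumed depth of the longest logic path
        double setup_margin_s = 100e-12;  // flip-flop setup / clock uncertainty budget

        double critical_path  = gates_in_path * gate_delay_s + setup_margin_s;
        double f_max          = 1.0 / critical_path;

        std::printf("critical path ~%.0f ps -> f_max ~%.2f GHz\n",
                    critical_path * 1e12, f_max / 1e9);
        return 0;
    }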
So what are the actual advantages of ASICs over FPGAs if not outright performance?
1. ASICs have a huge advantage in logic density.
2. ASICs can have tremendous amounts of flexibility in design - especially when it comes to things like analog circuits.
3. ASICs can be designed to perform more operations per clock cycle than an FPGA.
All else being equal, #1 and #3 mean that using an ASIC has tremendous consequences when it comes to thermal and energy efficiency for computation.
Therefore, if the logic density and operations per cycle for an ASIC design on a specific ASIC process exceed an equivalent FPGA's capabilities, then it's almost always certain that a properly designed ASIC on that process will outperform the FPGA.
Re: (Score:2)
This is an excellent summary of both the history and design tradeoffs. Hats off.
The one thing I would add: designing ASICs is trickier than designing FPGAs, and both require significantly different skill sets than designing software (whether for CPUs or GPUs). With the advent of cloud-based FPGA platforms (where you can try out your FPGA designs without having to invest in a dedicated hardware lab), I suddenly had folks asking about translating their web-based applications onto AWS F1 instances, as though t
Homework (Score:2)
Why is it these sound like homework questions to me? Everyone knows stackoverflow is for homework questions, and slashdot is for an argument.
Re: (Score:2)
Why is it these sound like homework questions to me? Everyone knows stackoverflow is for homework questions, and slashdot is for an argument.
They never answer the homework questions on Stack; they can smell that shit a mile away. Sometimes you get the n00b who starts answering, and someone comes along and shuts it down.
Re: (Score:2)
I agree.
You can almost always tell when someone is asking an academic or superficial question about ASICs when they are only concerned about performance.
Someone who was truly serious about using ASICs would almost always ask questions involving scale / cost or analog/RF design.
GPU for parallel - ASIC for pipelined serial (Score:2)
GPUs excel when the same simple math operations need to be performed on massive amounts of data in parallel. Algorithms like the Fast Fourier Transform and all kinds of linear algebra benefit from massive parallel execution. Computer graphics is nothing more than applied linear algebra, and the algorithms are essentially identical to signal processing. There are two major bottlenecks in GPU programs:
- supplying data as fast as it can be processed and retrieving results as fast as they can be produced
- condit
Old, let's talk tech... (Score:1)
First, the basics: what are FPGAs and GPGPUs?
A GPGPU is an ASIC. The designer wants to make a GPU which can also be used for general purpose compute tasks. The designer chooses how many shaders to include, how they are architected, and how it is all controlled. The designer puts registers where it makes sense to, to optimize this architecture.
A CPU is similar, also an ASIC, and the designer figures out how to most optimally (in his opinion) implement the instruction set, what a core looks like, and how i
Yes, ASICs are faster due to high specialization (Score:1)
There is a general positive correlation between a processor's efficiency and how specialized its function is. Human brains are highly adaptable (at least compared to a modern CPU), and can do a wide variety of tasks, but most of them very slowly. This is why we are perfectly capable of doing basic calculations by hand, but we prefer to have a calculator handy. A CPU is a lot more specialized in that it has a limited set of operations it will be required to perform (the x86_64 instruction set for example)
How about this comparison: (Score:2)
Depends (Score:2)
Unless you invest significant money and time in getting that ASIC manufactured on a modern process, or the task you want to solve is actually unsuitable for GPUs, the answer may well be that the ASIC is slower.
Can you outperform the world's best ASIC team? (Score:3)
Do you have data? (Score:2)
Does your application have a lot of data to process (e.g., machine learning applications)? If yes, then your bottleneck is the bandwidth at which you can transport your data, not the processing power. If you're doing video analysis, the bottleneck is going to be the cable between the camera and the chips processing it, and a GPU is more than enough. If your application has no data (e.g., mining bitcoins), then pure processing power is interesting.
Of course, you can design the input of your ASIC, but everyt
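To put assumed numbers on that bandwidth point, here is a tiny host-side sketch comparing an uncompressed 1080p60 RGBA stream against a nominal ~16 GB/s PCIe 3.0 x16 link (both figures are round illustrative values, not measurements):

    #include <cstdio>

    // Rough, assumed numbers for the bandwidth argument above.
    int main()
    {
        double frame_bytes   = 1920.0 * 1080.0 * 4.0;   // RGBA, 8 bits per channel
        double stream_bps    = frame_bytes * 60.0;      // 60 frames per second
        double pcie3_x16_bps = 16e9;                    // ~16 GB/s, nominal PCIe 3.0 x16

        std::printf("video stream: ~%.2f GB/s\n", stream_bps / 1e9);
        std::printf("that is ~%.1f%% of a ~16 GB/s PCIe 3.0 x16 link\n",
                    100.0 * stream_bps / pcie3_x16_bps);
        return 0;
    }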
How much faster is an ASIC than a GPU? (Score:2)
I dunno. Is it a European ASIC or an African one?
It depends on the task and how much you spend (Score:1)
It depends (Score:2)
It completely depends on the algorithm involved and the bottlenecks faced by the GPU in performing those algorithms. In general, ASICs are about 2-3x as fast as an FPGA, and can contain about 4 times as much logic. They also cost a LOT of money to make; think 5-10 million USD for a mask set of a large ASIC in modern technology. Depending on the algorithm, you may be able to run many pipelines in parallel, creating a huge calculation capacity, and end up with your PCI Express bus being the limiting factor. AS
a GPU is closely related to an ASIC (Score:1)
The GPU is full of hardwired pieces, and the technology to fabricate GPUs is the same as for an ASIC. If you want to spend the time developing and testing hardwired blocks for every case, then the ASIC is definitely faster, cheaper, or lower power. But probably not all three if you hit the same feature set. If you try to implement every combination of functionality in an ASIC, then you run out of area and have to give up some parallelism to meet the basic requirements. GPUs came out of boiling down some primitiv
AAA Game Dev response (Score:1)
AAA Game programmer here. I'm disappointed to see so many dismissive and rude responses.
I would estimate the speedup as at least 100x (just an educated ballpark). I chose this number because the speedup for a single shader would be well over 1000x, but games don't use just one shader.
Game engines use different shader programs, from dozens to thousands, to implement different drawing techniques. The first constraint would be that you'd need to choose which shaders you want. Shaders are often *a little bit* gen
Learn how algorithms in hardware really work (Score:1)
I suspect then you can (Score:1)
I suspect, then, that you can alter the equations as much as you like and achieve only modest results. With highly intensive video operations you will most likely be moving vast amounts of data in and out of the processing unit, and optimizing this memory access will account for the bulk of any speed-ups.
Apples to Oranges (Score:2)