Intel Defends AVX-512 Against Critics Who Hope It 'Dies a Painful Death' (pcworld.com)
"I hope AVX512 dies a painful death," Linus Torvalds said last month, "and that Intel starts fixing real problems instead of trying to create magic instructions to then create benchmarks that they can look good on."
Friday PC World published some reactions from Intel: Torvalds wasn't the only person to kick AVX-512 in the shins, either. Former Intel engineer Francois Piednoel also said the specialized instructions simply didn't belong in laptops, as the power and die-area trade-offs just aren't worth it.
But Intel Chief Architect Raja Koduri says their community loves it because they're seeing a huge performance boost: "AVX-512 is a great feature. Our HPC community, AI community, love it," Koduri said, responding to a question from PCWorld about the AVX-512 kerfuffle during Intel's Architecture Day on August 11. "Our customers on the data center side really, really, really love it." Koduri said Intel has been able to help customers achieve a 285X increase in performance in "our good old CPU socket" just by taking advantage of the extension...
Koduri acknowledged some validity to Torvalds' heat, too. "Linus' criticism from one angle that 'hey, are there client applications that leverage this vector bit yet?' may be valid," he said. Koduri explained further that Intel has to maintain a hardware-software contract all the way from servers to laptops, because that's been the magic of the ecosystem. "(That's) the great thing about the x86 ecosystem, you could write a piece of software for your notebook and it could also run on the cloud," Koduri said. "That's been the power of the x86 ecosystem..."
And no, hate on AVX-512 and special instructions all you want: Intel isn't going to change direction. Koduri said it will continue to lean on AVX-512 as well as other instructions. "We understand Linus' concerns, we understand some of the issues with first-generation AVX-512 that had impact on the frequencies, etc.," he said, "and we are making it much, much better with every generation."
They also summarize some performance testing by blogger Travis Downs, saying it found AVX-512 "doesn't appear to enforce much of a penalty at all" on a laptop. Downs' testing found the clock speed dropped only 100MHz when using one active core under AVX-512.
"At least, it means we need to adjust our mental model of the frequency-related cost of AVX-512 instructions," Downs concluded. "Rather than 'generally causing significant downclocking,' on this Ice Lake chip we can say that AVX-512 causes insignificant or zero license-based downclocking, and I expect this to be true on other Ice Lake client chips as well."
AMD vs. Intel (Score:4, Insightful)
AMD didn't implement the AVX-512 instructions, and yet the newer chips provide basically the same performance for a cheaper price simply by throwing more cores at it. Likewise, significant AVX-512 workloads can likely be moved to GPU processing for even better price/performance. As such, it isn't providing much benefit for the vast majority of cases--you can simplify the CPU and throw more cores at the problem, pair it with a decent GPU, and you have a solution that provides benefits for more users, while still giving the same overall performance even for the specialized cases. It is a solution looking for a problem.
Re:AMD vs. Intel (Score:5, Informative)
the newer chips provide basically the same performance for a cheaper price simply by throwing more cores at it
Not really. You're comparing a specific instruction set for an edge case to general performance. AMD absolutely are pantsing Intel. Intel still have an edge in IPC and single-threaded workloads, but even that edge may disappear when Zen 3 hits the market.
If you don't make use of AVX-512 and you have a parallel problem, then AMD will win with its core count. If you only occasionally execute an AVX-512 instruction, the gap becomes even larger, because there are actual performance impacts in executing that instruction.
However, if you have a workload that makes heavy use of AVX-512, Intel ends up wiping the floor with AMD even if the workload is parallel. The problem is that those workloads are usually executed on GPUs anyway, so really the AVX-512 thing makes little sense.
Re: AMD vs. Intel (Score:2)
AFAIK the single-threaded edge is already gone.
Re: (Score:2)
Not quite yet, but honestly it's so close as to be irrelevant in practical terms clock for clock. The only reason Intel really still tops all the single core charts is due to having a far higher single core turbo boost than AMD.
Re: (Score:2)
The only reason Intel really still tops all the single core charts is due to having a far higher single core turbo boost than AMD.
That's part of it, but the 6% higher core speed doesn't account for the nearly 15% performance delta.
Intel cores also have vastly superior IPC, which is offset by better data path subsystems tacked onto Zen2.
Really, the performance of the two cores is wildly different from what benchmarks allude to, and both really do perform different tasks at wildly different levels of proficiency.
Re: (Score:2)
The ipc edge is gone
Negative, it is not.
AMD chips are getting more done per clock
Now this is in some cases true, but it's not related to how many instructions they can pump through. It has more to do with how much data they can access, and how quickly.
Re: (Score:2)
From what I understand, AVX-512 is an attempt to shoehorn GPU-style processing into a CPU. There's no point doing it on the GPU, since that's what the GPU already excels at, and it doesn't need a special instruction.
Re: (Score:2)
No, and yes. GPGPU doesn't need/want anything to do with AVX512.
Maybe in some dystopian alternate reality where Larrabee took off as a GPU, yes, that could have happened.
Other way around (Score:3)
Speaking of...could Intel's implementation of AVX-512 set the foundation for later integration into the iGPU for better graphic performance?
The other way around.
AVX512 was born out of their older failed attempt at making a dGPU (project Larrabee):
as GPGPU computation started to become popular back then, but was still a bit cumbersome (most of it was done by abusing OpenGL, plus a little of the early low-level APIs available, such as the BrookGPU implementation running atop AMD's CTM), the idea Intel had was to pair a very large number of very simple cores, each with extremely large SIMD units: you still got the ultra-wide SIMD popular on GPUs, but as
Re: (Score:2)
That is of course caveated by the fact that there is more to the performance of a CPU than merely IPC.
AMD chips are generally paired with faster RAM, as an example, which comes very close to evening out their severe IPC deficit.
Re: (Score:2)
I'm not sure that a GPU is the right place to handle that level of precision though.
There are likely some use cases where it's useful, but I don't think mainstream processors should have AVX-512.
Re:AMD vs. Intel (Score:5, Interesting)
The entire thing was born out of the Larrabee project, back when that project was about rendering. What Intel found was that no matter what they did, they could not feed that much data to the CPU without changing the cache architecture, and that such changes to the cache architecture would negatively affect regular performance with crushing memory latency.
So we end up in a situation where Intel knew that they would not be able to process entire AVX-512 registers in one go on all threads, so did not include the execution units necessary to do it even on a single core, let alone have the bandwidth to do it on all of them.
So as Linus rightly notes, the shit is more or less useless right now, and it costs a lot of execution time, because AVX-512 registers are enormous and, like all registers, need saving between context switches; that saving is slow because of that lack of bandwidth. A single AVX-512 register is as large as all the general purpose registers combined.
There is no solution other than time.
Re: (Score:2)
"like all registers need saving between context switches, saving that is slow because of that lack of bandwidth."
Architectural registers are not physical registers. While the register allocation table is usually thought of as enabling out-of-order execution, it also aids context switching.
Re: (Score:2)
Some "mainstream processors" already use AVX512. It's implemented in Cannonlake, IceLake, and TigerLake (with TigerLake having the broadest selection of AVX512 instructions; AVX512 is a mess of different sub-standards). The trick behind the "consumer" implementations of AVX512 is that the total width of the vector units didn't improve at all from Haswell onward. You aren't going to see a whole lot of performance benefit using AVX512 over AVX2 on any modern Intel design, since the only way Cannonlake/IceLake/TigerL
Re: AMD vs. Intel (Score:2)
Unless they are AMD APUs, of course. :)
See: Game consoles.
Re: (Score:2)
Same performance? Intel doesn't have anything faster than Rome, and Rome was LAST year's server CPU. Milan would like a word with you.
Re: (Score:3)
So what is "license-based downclocking"? Does it downclock some CPUs not for heat or power management, but because it would be too fast for the tier the user paid for?
Re: (Score:2)
From my perspective the only really practical use of 512-bit FP would be for astronomical data processing, but I think a completely different HW architecture would be needed overall for that anyway, to avoid the constraint of the 64-bit data bus in the main processor. Or maybe reserve it for processors with 8 memory channels and make those processors more or less 512-bit processors.
For general purpose processing I'd say that Linus is right. In general purpose processing it's more useful with more core
Re: (Score:3)
AVX512 can provide benefits in any number of HPC applications - but so can competing ISA extensions like SVE2. SVE2 is a much-more-elegant system. Also, Intel's implementation of AVX512 on processors that can actually use it fully (Skylake-SP, Cascade Lake, Cooper Lake, upcoming IceLake-SP, etc.) has to downclock due to the extra power draw/heat generation. Intel hasn't developed a sophisticated method for determining how many cores are running those instructions or how often, so any time those instructio
Re: (Score:2, Funny)
Who is using 512 bit FP operations in machine learning?
Nobody... so why are you asking?
AFAIK everyone is reducing precision in the machine learning space, to improve performance.
I see. You are asking because you are completely ignorant. You think AVX-512 does 512-bit floating point. Then, with that oh-so-accurate knowledge, you decided to pretend to be an expert on Slashdot.
It's called SIMD, you fucking pretending dishonest fuck.
Re: (Score:2)
Who is using 512 bit FP operations in machine learning?
Nobody... so why are you asking?
Because the guy from Intel said the AI community likes it. Apparently you didn't even read the summary, you dumb person.
Re: (Score:2)
Because the guy from Intel said the AI community likes it. Apparently you didn't even read the summary, you dumb person.
Nobody in the AI community is using 512 bit floating point, which is fine because AVX isn't doing 512 bit floating point. What people do like is being able to perform 512 bits worth of float32 operations in parallel.
Re: (Score:2)
Is this really something you would use for a tensor flow model?
Yes, definitely.
Who is using 512 bit FP operations in machine learning?
No one, but what's that got to do with AES?
AFAIK everyone is reducing precision in the machine learning space, to improve performance.
Yes, and...? AVX does integer operations too.
Re: (Score:2)
lol fair enough. I've said some pretty dumb shit on here stone cold sober.
Re: (Score:2)
AVX512 is a mess of instruction subsets. The ML-related ones are stuff like bfloat16 (present in Cooper Lake). Look it up.
Re: Linus isn't really a floating-point kind of gu (Score:2)
You seriously need to learn your logical fallacies, kid.
Like "argument from authority".
Parent's argument was that this is the sub-area where Linus does not have a clue. Which was the same fallacy as well.
In the end, Linus was the only one making actual arguments. You may show counter-arguments, if you got some. Otherwise, why don't you two monkey brains shut up?
That's not a retort. (Score:2)
The critics say AVX-512 has no place in desktops and laptops. Countering that the HPC and AI community love it is sort of supporting the points of the critics. Throw it in Xeons and special purpose chips, leave it out of the rest. Maybe then you could save some money and be price competitive with AMD.
Right now you'd be mad to build a general purpose computer based on an Intel CPU.
Re: (Score:1)
Right now, AVX-512 is only supported in Intel's HEDT systems and some Xeon Phi models: https://en.wikipedia.org/wiki/AVX-512 [wikipedia.org]. Which makes it very niche indeed. No testing something at home on your old i7.
Re: (Score:2)
Ice Lake client is out and has more support for AVX-512 than even the HEDT systems right now (see VBMI2).
https://en.wikichip.org/wiki/i... [wikichip.org]
Re: (Score:2)
Intel actually implemented AVX512 in Cannonlake as well. TigerLake will have it. Sadly, it won't offer much performance since, unlike those HEDT systems and server systems (Skylake-SP, Cascade Lake, Cooper Lake), the consumer CPUs I mentioned above only support 512b SIMD via op fusion. 2x256b vs 2x512b and all that.
You're not THAT much better off running AVX512 on IceLake-U than you are AVX2.
Re: (Score:2)
There may be some uses where it really shines, but it's simply not common enough to advertise it as a useful feature for the broader market, or even for professionals.
The only application in my personal workflow that makes notable use of AVX 2 is Blender in the Cycles renderer. And at that my Ryzen 9 3900X at stock speeds has still more processing power than a 25% more expensive i
Re: (Score:2)
Every compiler around supports auto-vectorization and uses it in their equivalent of -O3.
In kernels, they're used to move data around/compare/transform with fewer instructions, resulting in significant speedups, like Netfilter's ~420% speedup using AVX2 on Rome cores.
And at that my Ryzen 9 3900X at stock speeds has still more processing power than a 25% more expensive i9 10900k.
Aggregate, yes. But core-for-core, the Intel walks you like a dog.
That has its own applications (generally anything that performs better with lower instruction latency will perform better in ca
Re: (Score:2)
Do you mean Blender viewport performance? I explicitly mentioned Blender Cycles. That's a situation where people spend a lot of time waiting for the renders to be finished at a high power draw of their system unless they send their files to a remote render farm. Show me the benchmarks where the 10900k is faster there.
I'll go ahead and start by showing data that supports what I wrote by simply googling "10900k blender"
Re: (Score:2)
According to Chromium compile times the 3900X is also about 27% faster than the 10900k.
I never argued that the 3900X didn't have more aggregate CPU cycles to throw at a problem. It has 2 more cores and 4 more hardware threads.
I argued that core-per-core, it walks the 3900X like a dog. And it does.
But let's address your 27% claim- because frankly, it's horse shit.
Let's look at the source of it- the gamersnexus article.
Notice that the 3900X (stock) is literally the same as the 3900X overclocked? Ya. They fucked up.
They're not measuring stock speed. So it's logical we should be comparing ov
Re: (Score:2)
Apparently not. You saw a post by someone you thought to be an Intel hater and had to counter with some "Real World Performance" nobody asked for and which isn't backed up by data either.
My point is that I care little for core-to-core performance here. In these workloads, that make me and many others that work in that field money, AMD offers more cores with a higher total performance which also have a lower power draw fo
Re: (Score:2)
There's nothing about AVX512 that's making Intel consumer CPUs "more expensive". Cannonlake/IceLake/TigerLake support AVX512 via op fusion. They have the same 2x256b config that Intel has used since Haswell. AVX512 provides no real performance advantage for any of Intel's current lineup outside of Cascade Lake/Cooper Lake (where Intel has implemented 2x512b configs).
Skylake-S, CoffeeLake-S, and Comet Lake-S (and any of the mobile derivatives of those CPU groups) do not support AVX512 at all, so it's a mo
Summary (Score:2)
So, one of these things has more things than the other thing, and the other thing doesn't even have the thing so everyone is arguing about the thing and if it's really a thing. Intel spokesperson says, "Yes", it will be a thing.
At least that's what I got out of it. Oh and it has something to do with graphics or numbers or something like that.
Re: Summary (Score:2)
Sure, if you lossy-reduce a discussion to uselessness, it's gonna be useless. You might as well reduce it further, to "x". Or to ⊥ (bottom), aka undefined. (In Haskell parlance.)
Maybe it's just you not getting things. Maybe you need some coffee. Or do something that's more fun for you.
Re: (Score:2)
No no I get it- the first thing has twice as many things that the other one doesn't have.
Re: (Score:3)
The irony is that "EditorDavid" committed one of the most fundamental errors an "editor" (in the journalistic sense) can make.
A whole effin' summary THAT NEVER EXPLAINS WHAT THE *EXPLETIVE* AVX-512 EXTENSION IS!!!!!!
This thing! This terrible thing! Linus hates it! Intel Loves it! And we can't be arsed to tell you what it is!!!
I used to program in Assembly. I consider myself mildly knowledgeable about x86 architecture. I don't know what avx-512 is. My first thought was that it must be an audio/video c
Re: (Score:2)
AVX512 is a mess of instruction sub-sets that's actually hard to explain. In its simplest terms, it's supposed to be 512-bit SIMD. A simple example would be adding two vectors of 32-bit fp values to one another, element by element.
128b SIMD lets you add four values at once. 256b SIMD lets you add 8 values at once. 512b SIMD lets you add 16, and so forth. The wider your vector implementation, the more stress you put on load/store blah blah blah you should know the drill by now right?
I don't care! My next custom built (Score:2)
in "our good old CPU socket"
and
"Our customers on the data center side really, really, really love it."
The second and third "really" sold me right there. Such a powerful selling point coming from a marketing dweeb. More useless marketing double speak.
Re: (Score:2)
AMD will probably support AVX512 eventually, unless they do something radical and try to force the x86 world to switch to SVE2 (which I wish they would).
What the hell is AVX-512? (Score:3)
Can someone explain, in language understandable by a tech person with no particular experience in CPU or compiler design, what AVX-512 is, what need it is supposed to fill, and why exactly people are criticizing it?
Re: (Score:2)
Intel AVX-512 enables twice the number of floating point operations per second (FLOPS) per clock cycle compared to its predecessor
Incorrect.
Execution units do that. 512 bit registers are not required.
Re: (Score:2)
Are you really about to argue that we can forgo vectorization and simply add more execution units?
That'll be awesome. I, personally, can't wait for pipelines so long that a stall takes seconds to resolve.
Or were you hoping the execution units would peer into the future?
Re:What the hell is AVX-512? (Score:5, Informative)
GPU-like processing on the CPU, but not as fast as on a GPU. And it still requires specially optimized programs.
Re: (Score:2)
And it still requires specially optimized programs.
This made me laugh. Only someone who has never written an OpenCL/CUDA kernel would say something that fucking stupid.
Perhaps you should limit yourself to offering opinions on things you actually know something about.
Re: (Score:3)
Except when it needs to warm up the AVX-512 unit. It really only works when it is doing big batches, for small pieces of AVX-512 code you have to wait for it to spin the special unit on the CPU up and clock the CPU down, then run the code.
Re: (Score:2)
Except when it needs to warm up the AVX-512 unit.
Ya, you have a good point.
Warming up the AVX-512 unit is super expensive compared to the startup cost of the shader cores. </sarcasm>
Re: What the hell is AVX-512? (Score:2, Informative)
And that is deliberately misleading, in typical Intel fashion: "It's better because we shifted the shit out to the stuff around it, which we hid, so it looks better." In other words: the instruction needs a special mode and special caching behavior, and even then the CPU cannot feed it constantly, so you only get that speed for a short time, before and after which the CPU will waste so much time on related stuff that in total it is much slower. And if you try to do it for longer, to make it worth it, it
Re: (Score:2)
You should be looking at SVE2 instead.
Re: (Score:2)
They didn't hobble Titan RTX.
The Titan V uses Volta, while Titan RTX uses Turing.
One is a rebranded professional card, while the other is a rebranded RTX 2080 Ti.
The pro card has dedicated 64-bit units, while the consumer card ditched those for a bit higher FP32 performance and dedicated ray-tracing units.
Re: (Score:2)
Nvidia A100:
9.7 TFLOPS FP64
19.5 TFLOPS FP32
78 TFLOPS FP16
Nvidia marketing materials [nvidia.com]
Re: (Score:2)
These are vector instructions, meaning that they operate on several things at once. So for example say you have 16 sets of multiplications to do, you can do them all with one instruction.
The advantage is that the CPU only needs to read and decode one instruction and then all 16 multiplications happen at the same time in parallel. That's really useful for a lot of things.
The first x86 SIMD extensions had 64-bit registers (MMX), and Intel kept increasing their size: 128 bits with SSE, 256 with AVX. 512 is huge and few things benefit from
Re:What the hell is AVX-512? (Score:5, Informative)
Current responses are lacking, so..
AVX-512 is a SIMD extension. Single instruction, multiple data: rather than having to decode instructions, fetch data from memory, and execute each instruction in sequence, it'll fetch a block of memory and perform the (limited set of) operations on the whole block at once.
It started with MMX, which back in 1997 (according to Wikipedia) allowed you to process an array of integer data -- something like four pieces of data at once, perform an integer operation on all of them, and in the space of a single multiply or add, you get four of them done. Great! This helped with image decoding, audio processing (how many voices in a game?), DVD playback, and the list goes on.
AMD had their own implementation with 3DNow!. These grew up, through MMX2, MMX3, 3dNow2, and blah.
It was seen how useful these SIMD instructions were, but they were just so limited - add, subtract, multiply, divide of integers. What about floating point? This is where the successors come in -- SSE. According to the 'pedia, SSE2 was introduced with the P4, so I'm guessing SSE itself was introduced on the Pentium 3 (which seems to have introduced a _lot_ of new architecture).
Well, SSE was killer, too -- it helped with better precision for floating point, started getting used in games, and cheap AI purposes. Cool!
SSE is old news, though. SSE4 is probably the last I've seen mentioned (Sept 2006?), and Intel keeps introducing new things, and growing up. Really, Intel was hurting -- all of these nVidia cards being used for AI and HPC purposes -- Intel wanted a cut of that HPC market, which they were completely locked out of. Intel doesn't have a discrete graphics processor, and the Intel Iris Graphics just isn't going to cut it for HPC purposes.
And so: Intel AVX instructions.
The AVX instruction set is really about doing on the CPU -- without OpenCL, CUDA, or another compiler/language/processing environment, without copying data over the PCIe bus, without referencing a discrete component -- the HPC things that needed to be done on graphics cards. Remember, graphics cards are about processing _lots_ of data at once (300-2000 shaders consumer, up to 7000 cores HPC), and AVX is trying to pick up some of that (AVX and AVX2 at 256 bits, and now AVX-512). These are in-CPU, smaller versions of the graphics card, without the linguistic changes, or scheduling, or DMA, or other complexities associated therewith.
The benefits are clear -- if you're hashing something, you can do that very quickly with the AVX instructions, especially without any latency of copying data to another device.
The drawbacks are less-clear, but very apparent: graphics cards are rated to 300 watts. You're now trying to stuff a portion of that processing power into the CPU, and back in the early 2010's, benchmarking showed this to cause the CPUs to run VERY hot. Much hotter, much more quickly than the heat sink could cool them. (I worked at a computer manufacturer -- running Prime95 with AVX instruction set would regularly cause problems.) Apparently, from other comments, the CPU also doesn't have the memory bandwidth to fetch the data quickly enough. Remember, graphics cards use High Bandwidth Memory now to supply up to 1500 shader cores. Really, with AVX, the memory bus can't keep up -- unless you're doing thousands of iterations over the same, cached data, you can do one instruction and then you have to wait.
So -- with AVX, Intel got a serious performance boost for games, graphics, RAID, AI workloads, encryption, compression, and so on.
Re: (Score:2)
That's a great post, but it doesn't explain why SSE and AVX were great extensions that everybody adopted without question and AVX-512 is not.
SSE allowed processing of (among other things) vectors of 4 32-bit FP numbers (16 * 128-bit registers, 256 bytes total). You don't need to look very far to find applications: anything 3D. Very useful, then.
What SSE _also_ allowed was to finally forget about the x87 FPU. That was a mess, because it used a stack architecture, which meant _a lot_ of x87 instructions wer
Re: (Score:3)
If your computer sits around all day solving linear algebra problems, you'd like your dot products to chew through 16 elements per instruction instead of one. That's what AVX and similar instruction sets do. They make your CPU act less like a Commodore and more like a Cray.
However, if you do anything else, you would prefer your CPU vendor to spend the considerable transistor budget associated with vector instruction sets on something else.
The thing is, almost every interesting problem boils down to linear algeb
Re: (Score:2)
It's SIMD - Single Instruction Multiple Data
Intel has been working on SIMD since MMX in 1996/97.
You should be familiar with some of these terms: SSE, SSE2, SSE3, SSE4.1/4.2/4a, XOP, AVX, AVX2
Each one is a selection of CPU instructions you can use in some capacity to carry out various floating-point (or in some cases, integer) operations on multiple points of data without using more than one instruction.
In general terms, the bit-width of the SIMD standard determines how much data can be processed in one ins
Re: (Score:2)
How many AVX-512 units are there on a die? If I'm trying to do Mandelbrot on 8 threads (4 cores), will each thread stall until a AVX unit is available or what?
Re: (Score:2)
Per core? Depends on the CPU!
Cannonlake, IceLake, and Tigerlake have 2x256b per core, and perform AVX512 via op fusion in hardware. No real gain there over AVX2 unless there's some instruction in AVX512 you're just dying to use.
Skylake-SP, Cascade Lake-SP/AP, and Cooper Lake all have 2x512b per core. They carry out AVX512 natively, though they are the ones that downclock at the first sign of an AVX512 instruction. Sometimes substantially. Unless you're smashing all cores non-stop with AVX512 instruction
How? (Score:2)
"Our customers on the data center side really, really, really love it." Koduri said Intel has been able to help customers achieve a 285X increase in performance in "our good old CPU socket" just by taking advantage of the extension...
The only way I can think of where this would be useful at all in a datacenter would be with encryption, so encrypting HTTPS connections. That's a small portion of total cost in most datacenters, though.
Only 100MHz (Score:1)
From 2200? That's a lot!
Re: (Score:2)
More importantly, the dude is being deceptive, since the one thing you want to do with such instructions is run them in parallel -- and that includes using all cores!
Effect on task switching? (Score:2)
Pushing / popping that many bytes must give a noticeable hit on task switching as well, and everyone is paying it, even if you aren't using AVX512. Anybody have any numbers on how large the effect is?
Re: (Score:3)
Pushing / popping that many bytes must give a noticeable hit on task switching as well, and everyone is paying it, even if you aren't using AVX512. Anybody have any numbers on how large the effect is?
That's one of the things Linus was complaining about: real negative performance metrics combined with almost no realizable benefit. There are no AVX-512-supporting processors that can't do the same work in the same time using SSE's 128-bit registers instead.
Re: (Score:2)
There are no AVX-512 supporting processors that cant do the same work in the same time using SSE's 128-bit registers instead.
That's downright absurd.
Why are you making shit up?
Re: (Score:2)
I assume the CPU has some way of telling the OS if it has touched the AVX-512 registers. They are not just double size, but there are also twice as many of them. Saving them all would take 4 times as long as normal.
Sure, sure... now show us that HPC laptop... (Score:2)
HPC is a marketeering wank term anyway, like "cloud".
No person of clue uses it.
AVX-512's problem is that it is incoherent (Score:5, Interesting)
Mod this post up to 11 (Score:2)
Spot on. If Intel actually implemented this in a way useful to the developer of a software application that was not restricted to one processor of one generation, maybe there would be a use for it.
Re: (Score:2)
Take a look at SVE2. It's much cleaner than AVX512.
What some blogger found is irrelevant (Score:2)
Intel Who? (Score:4, Insightful)
When was the last time anyone cared what Intel engineers claimed?
It's a processor with added purpose built instr. (Score:2)
Wow, I can't believe that people are getting so fanatical (as in religious) over this!
Really, now we need to rag on a processor because it has added instructions meant to accelerate a specific kind of task?!
Give it a rest!
The real reason why Intel has AVX-512... (Score:2)
The real reason why Intel has AVX-512, and keeps presenting it as a very special, needed thing, is that it's probably the only differentiating feature they have compared to AMD. On the contrary, they have fewer features: they don't have PCIe 4.0, they don't have more cores, they don't have support for ECC RAM in as wide a range of products, they don't have full memory encryption; those are some things I can name off the top of my head. They only have a bit higher single-threaded performance (at the cost of
Re: IDS (Score:3)
Don't you mean the deep state and branch prediction? ;)
Re: (Score:2)