Intel Announces Cascade Lake With Up To 56 Cores and Optane Persistent Memory DIMMs (tomshardware.com)
At its Data-Centric Innovation Day, Intel today announced its Cascade Lake line of Xeon Scalable data center processors. From a report: The second-generation lineup of Xeon Scalable processors comes in 53 flavors that span up to 56 cores and 12 memory channels per chip, but as a reminder that Intel is briskly expanding beyond "just" processors, the company also announced the final arrival of its Optane DC Persistent Memory DIMMs along with a range of new data center SSDs, Ethernet controllers, 10nm Agilex FPGAs, and Xeon D processors. This broad spectrum of products leverages Intel's overwhelming presence in the data center (it currently occupies ~95% of the world's server sockets) as a springboard to chew into other markets, including its new assault on the memory space with the Optane DC Persistent Memory DIMMs. The long-awaited DIMMs open a new market for Intel and have the potential to disrupt the entire memory hierarchy, but they also serve as a potentially key component that can help the company fend off AMD's coming 7nm EPYC Rome processors.
Re: (Score:2, Insightful)
Did he mention how many data vulnerabilities this chip has due to shared memory and mutually cached areas?
Compare to nvidia (Score:3)
High-end consumer GPUs have about 56 streaming multiprocessors. Each multiprocessor can run 2 to 4 SIMT ops on 32 four-byte numbers at a time. These MPs are slower than a typical CPU core.
This Intel chip will have 56 cores, and each core presumably has 4 four-byte SIMD channels. It will likely hyperthread (maybe not) and has pipelined instructions, branch prediction, and larger caches.
These things might actually start closing the gap with GPUs while keeping all the great general-purpose advantages of CPUs.
Re: (Score:1)
Re: (Score:2)
Each multiprocessor can run 2 to 4 SIMT ops on 32 four-byte numbers at a time. These MPs are slower than a typical CPU core.
Key phrase is "2 to 4 SIMT ops on 32 four-byte numbers at a time". That is some pretty massive parallelism. The "32 four-byte numbers at a time" would be the equivalent of a hypothetical AVX-1024.
Overall I would be surprised if Cascade Lake can match a GPU for massively parallel stuff. Also, the TDP is even more obscene than a Radeon VII's: up to 400W.
( o_o)
Re: (Score:2)
Basically it's in a similar ballpark, but with 2-3 times the power consumption and ~10 times the price. Something like that.
Re:Compare to nvidia (Score:5, Insightful)
One of the biggest challenges in machine learning is moving data around from storage and OSs to the machine learning hardware for training and execution. General purpose CPUs typically have direct high performance access to data and this can have a dramatic effect on overall system performance and ease of implementation.
Re: (Score:3)
> With AVX instructions, I believe that each core can perform 32 fused add/multiply operation per clock cycle.
With AVX-512, each core should be able to do 32 FMAs per cycle in single precision. So that should be about 7 TFLOPS single precision.
Intel has been playing with half precision; I wonder if they are going to go that route and give us 14 TFLOPS half precision.
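The arithmetic behind those figures can be checked in a few lines. The two FMA units per core and the ~2 GHz sustained AVX clock are my assumptions for illustration; the thread only gives the core count and the per-core FMA rate:

```python
# Back-of-envelope peak FLOPS for a 56-core AVX-512 chip.
cores = 56
simd_lanes = 512 // 32    # 16 single-precision lanes per 512-bit register
fma_units = 2             # AVX-512 FMA units per core (assumption)
ops_per_fma = 2           # a fused multiply-add counts as two FLOPs
clock_ghz = 2.0           # assumed sustained all-core AVX clock

flops_per_cycle_per_core = simd_lanes * fma_units * ops_per_fma  # 64
tflops_sp = cores * flops_per_cycle_per_core * clock_ghz / 1000
print(f"{tflops_sp:.1f} TFLOPS single precision")    # 7.2 TFLOPS
print(f"{tflops_sp * 2:.1f} TFLOPS half precision")  # if FP16 doubles throughput
```

Sensitive, of course, to the assumed AVX clock — heavy AVX-512 use is known to pull sustained clocks well below base.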
Re: (Score:2)
> This intel will have 56 cores and each core presumably has 4 four-byte simd channels. It will likely hyperthread (maybe not) and have pipelined instructions and predictive branching and larger caches.
Intel has supported AVX-512 for a couple of generations now, so each core will probably be able to do 2 512-bit FMAs per clock cycle.
The main difference in speed between CPUs and GPUs has been in the memory subsystem more than FLOPS. (And also in programmability: CUDA is much easier to write than AVX code...)
Re: (Score:2)
The last time I looked into wide-body AVX, a single use of such an instruction on a single core on any thread deturboed the entire CPU down to the 2 GHz range for milliseconds thereafter.
Re: (Score:2)
The main difference in speed between CPUs and GPUs has been in the memory subsystem more than FLOPS. (And also in programmability: CUDA is much easier to write than AVX code...)
I only have experience on OpenCL/GL instead of CUDA, but I would guess it's rather similar in broad terms. You need all kinds of setting up to get your code and data on the GPU and then back, so I don't think it's any easier overall. CPU languages have all kinds of helpers to use SIMD instructions without departing from the main CPU code or thread, so it's much more transparent.
I agree that GPUs are in some ways easier for parallel workloads, because the coding tools are built for parallelism to begin with.
Re: (Score:2)
Anyone have thoughts on this?
Yeah. Intel already attempted that and it was an abysmal failure. https://en.wikipedia.org/wiki/Larrabee_(microarchitecture) [wikipedia.org]
Re: (Score:2)
And AMD Epyc has 128 lanes of PCIe that can be used for any PCIe device, not just Intel-only Optane stuff.
Re: (Score:2)
Optane is not Intel-only. The NVMe-mounted stuff works perfectly for me in an ancient Phenom II, or in a RockPro64. And it's not just some "pass-through cache" but a real disk, with latency 3-4 times better than on the best SSDs, great linear read speeds, and not-so-stellar but still nice linear writes. The linear speed can be fixed with RAID0 -- I found someone selling suspiciously cheap but apparently (according to SMART) new 16GB disks, so I snagged four. Still waiting for the delivery of a machine I can put them in.
Re: (Score:2)
https://www.supermicro.com/Apl... [supermicro.com]
AMD 1 CPU
all flash!
NO PCI-E SWITCH OVERHEAD NEEDED
Also with 56 vulnerabilities for CIA, NSA and FBI (Score:4, Funny)
The fucking company is literally called "Intel"!
For much the same reason the US's first army ... (Score:3)
Re:Why is it called "intelligence" anyway ... and not "spying" or "surveillance" or, even better, "data kraken"?
For much the same reason the US's first army, back during the revolution, was called the "Second Army," or the atomic bomb project was called "The Manhattan Project."
It's the "Fog of War": The name is not for clarity. It's a tool to advance the organization's objectives.
When the enemy is battling the Second Army, his attention is distracted, wondering if the First Army is about to attack from behind.
Re: (Score:2)
I still think you pulled all of it out of your ass.
That's intelligence.
Re: (Score:3)
I don't buy that. "intelligence agency" is not a good example of obfuscation, because its meaning is instantly obvious to anyone, unlike, say, "Manhattan project".
One of the definitions of 'intelligence' is "the faculty of understanding". In this case, the faculty of understanding your enemies.
Gathering data is just the first part of what an intelligence agency does. The real value is in analyzing that data into a coherent picture of what your enemy is capable of and what he will do next.
Re: (Score:1)
And nothing of note to indicate Intel even gives a shit about the ongoing supply issues (shortage) of normal consumer chips... just the stuff that pads their bank accounts the most (high end and servers).
epyc also have more pci-e lanes good for IO (Score:1)
epyc also have more pci-e lanes good for IO
Re: (Score:1)
AMD is a profit-oriented company too.
I have more sympathy for them than for Intel, or maybe I should say less antipathy, mostly due to Intel's business methods. But ultimately AMD will also take the prices the market will bear.
OTOH, if they are successful with Epyc, I guess we will see price drops at Intel rather than climbing prices from AMD. Good for the customer. Just my guess of course.
Re: (Score:2)
Oh hey, that's me.
You have no idea what you're talking about.
"Single thread performance" is absolutely important, even in applications that are multithreaded. The only time that it becomes irrelevant is in embarrassingly parallel problem spaces, which are nearly non-existent in my datacenters.
We've done the math, and that's why we're still using Intel. We're open to AMD at some point when they're not trying to win th
56 cores? (Score:3)
I don't know if I could handle having a 56 core processor when the whole time I'll know... deep inside. It's not an 8x8 array of cores in there. :|
Re: (Score:2)
It probably is; but they disable the first 8 cores that fail testing, to get enough production quantity.
I know the Microsoft Windows dev team has had access to prototype 128-core systems for years.
Re: (Score:2)
No. Yields aren't good enough to support a product with the full chip and everything enabled. You over-design, then trim back. If yields improve, or the market saturates, you can release a part with fewer things disabled later. Happens with CPUs and GPUs.
Re: (Score:2)
Stop talking out of your ass.
Re: (Score:2)
Care to show us pictures of this recently announced processor?
Re: (Score:2)
To debunk his claim as bullshit, I merely needed to look up naked pictures of current high-core-count processors. I used 48-core AMDs as an example.
Any other logic lessons I can teach you today?
Could be 7x8 (Score:1)
A grid of processor cores 7 wide by 8 tall would give you exactly 56 cores though. And these things are determined by the available chip area, which is itself determined by manufacturer goals for performance, heat output, electrical consumption, chip yields, etc.
So yeah, it could be an 8x8 grid, but it could easily be a 7x8 grid too.
Re: (Score:1)
Since Skylake, the 'high core count' Xeons use a mesh layout (instead of a dual ring).
Look at the first two diagrams here to get an idea about both layouts. https://www.anandtech.com/show/11544/intel-skylake-ep-vs-amd-epyc-7000-cpu-battle-of-the-decade/5
You'll see that the 28-core Xeon is made up of 6x6 blocks that each have a connection to the mesh. 28 of them are cores, 2 are memory controllers (that each have 3 memory channels!), 4 are PCI Express controllers, and 2 are QPI controllers.
The 56core 92xxx proces
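The tile accounting in the parent checks out: a 6x6 mesh has 36 stops, and the agents listed above fill them exactly. A trivial sanity check using only the counts given:

```python
# Skylake-SP high-core-count die: 6x6 mesh of tiles, per the parent's breakdown.
mesh_tiles = 6 * 6
agents = {"cores": 28, "memory controllers": 2, "PCIe controllers": 4, "QPI controllers": 2}
assert mesh_tiles == sum(agents.values())  # 36 == 28 + 2 + 4 + 2
print(f"{mesh_tiles} tiles fully accounted for")
```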
Re: (Score:1)
In principle, AMD does something similar with Epyc, especially with Rome: one central I/O chiplet in 12nm(?) with several 7nm compute chiplets around it. To me it looks like this actually spreads the heat over a larger area.
I wonder how much heat Rome gives off, by the way. Cascade Lake has been announced with up to 400W TDP. If Rome takes less, it will obviously have fewer problems with cooling too.
Great! (Score:2)
Can I get all that in a laptop? :-)
Re: (Score:2)
Re: (Score:2)
battery life 15 min or less
400w (Score:5, Interesting)
This new top-end CPU comes in at 400W and requires water cooling. Who the hell wants water cooling in the data center!? This just seems like a massive disaster waiting to happen. Also, they're no longer socketed, but instead soldered directly to the motherboard, just like SoCs.
Re: (Score:3)
Re: (Score:1)
Why not think of it as a fast disk instead of slow RAM?
Re: (Score:2)
Re: (Score:1)
It can function as DRAM, but as you said, why would anyone want that? See https://nvdimm.wiki.kernel.org... [kernel.org]
Re: (Score:2)
Because you need software designed to recognize and treat it as such to take advantage of the fact that it's persistent.
Re: (Score:2)
So you need the memory controller and its initialisation to see that it's Optane and act accordingly, and then when that's done, have OS support (i.e. not see it as ordinary DRAM, but allow it to be read and written to via the memory controller as if it were DRAM). Given both, there's hardly a problem.
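That division of labor can be made concrete. Once firmware and the OS expose a persistent-memory region through the filesystem (e.g. a DAX mount on Linux), an application just maps it, stores through the mapping, and flushes. A minimal sketch, using an ordinary temp file as a stand-in for a pmem-backed mapping (an assumption for illustration; real Optane use would map a file on a `-o dax` mount):

```python
import mmap
import tempfile

# Stand-in for a file on a DAX-mounted pmem filesystem; any regular
# file runs this sketch, only the persistence guarantees differ.
with tempfile.NamedTemporaryFile() as f:
    f.truncate(4096)
    mm = mmap.mmap(f.fileno(), 4096)
    mm[:5] = b"hello"  # store directly through the mapping, no write() syscall
    mm.flush()         # on real pmem, this is where durability is established
    readback = bytes(mm[:5])
    mm.close()

assert readback == b"hello"
```

The point is that nothing in the load/store path changes for the application; only the flush semantics and the OS-level setup distinguish pmem from DRAM.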
Re:400w (Score:5, Informative)
This new top-end CPU comes in at 400W and requires water cooling. Who the hell wants water cooling in the data center!? This just seems like a massive disaster waiting to happen. Also, they're no longer socketed, but instead soldered directly to the motherboard, just like SoCs.
Mainframes used to use water cooling. See old IBMs.
Power RF uses water cooling.
Power machinery uses water cooling.
Internal combustion engines use water cooling.
Do it right and it's reliable.
Re: (Score:2, Informative)
Or use mineral oil, like old Crays did. Mineral oil does not conduct electricity.
Re: (Score:2)
I guess Fluorinert is out of favor these days, being a CFC.
Re: (Score:2)
Do it right and it's reliable.
That's the trick though, isn't it? Everyone knows that nuclear can be done right. TEPCO proves that people will NOT do it right. Same with water cooling or any other project a person might do.
Of course, the knee-jerk, programmed response is to prohibit doing any such thing rather than setting it up so that it has to be done correctly. Short sighted and stupid is how they like us.
Re: (Score:2)
Mainframes used to use water cooling. See old IBMs.
How else would one cool 1.8kW per MCM? :) But let me reformulate that: Mainframes are increasingly using water cooling. The air-cooling option for the IBM systems was still available at least with the previous generation of products and probably still is.
I'm not familiar with new ones. I last worked on such things in the early 90s. It's good to hear that it's still going strong.
Re: (Score:2)
Re: (Score:2)
If a pipe leaks, your full rack may just melt down. Maybe the power will cut out after your $20K motherboard is wasted but before your storage is damaged.
Re: (Score:2)
Who the hell wants water cooling in the data center!?
This is not only common in the past but there are several current data centre products on the market for water cooling.
Persistent Memory (Score:2, Insightful)
Just what you want... persistent memory... so your keys are easier to steal and the government can see what you were doing when they broke in and stole all of your computers.
Optane write-cycles (Score:1, Interesting)
Unless this Optane is very different from the Optane that Intel has been selling as a hard disk cache, the number of writes per bit before failure falls very short of medium-grade SSDs. That's okay for a lightly used consumer laptop but will soon fail as a disk cache in a heavily used system. Main-memory for a server is even worse - they'll run through the expected life in months, if not weeks.
Re: (Score:2)
Not sure where you read that, but that is completely wrong.
Re: (Score:2)
Not sure where you read that, but that is completely wrong.
Got a source? Because a general Google search shows endurance falling short of traditional SSDs.
Re: (Score:2)
https://www.tomshardware.co.uk... [tomshardware.co.uk]
And a quick comparison:
https://www.samsung.com/semico... [samsung.com] - 1200TBW
vs
https://www.intel.com/content/... [intel.com] - 17,520TBW
Re: (Score:3)
One more:
https://www.anandtech.com/show... [anandtech.com]
Write endurance for the 983 ZET also falls short of the bar set by Intel's Optane SSDs, with 8.5 DWPD for the 480GB 983 ZET and 10 DWPD for the 960 GB model, while the Optane SSD debuted with a 30 DWPD rating that has since been increased to 60 DWPD.
And that is comparing Samsung's latest (released last month) SSD, specifically designed to try and compete with Optane. In some respects it does well, and in others not so much. Its latency is 30us vs Optane's 10us, and its write IOPS is pretty poor at 75K IOPS vs 550K IOPS. But if all you do is read, and you read in a heavily loaded server, then it does well with 750K IOPS vs Optane's 575K IOPS. That isn't really a likely scenario for most workloads, and its w
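DWPD and TBW are two views of the same endurance rating, which makes the figures in this subthread easy to cross-check. A quick converter (the 5-year warranty term and the Optane capacity are my assumptions for illustration, not figures from the linked spec sheets):

```python
def dwpd_to_tbw(dwpd, capacity_gb, warranty_years):
    """Drive-writes-per-day -> total terabytes written over the warranty."""
    return dwpd * capacity_gb * 365 * warranty_years / 1000

# 983 ZET-style rating quoted above: 960 GB at 10 DWPD, assumed 5-year warranty
print(dwpd_to_tbw(10, 960, 5))   # 17520.0 TBW
# Optane-style rating: a hypothetical 750 GB drive at 60 DWPD
print(dwpd_to_tbw(60, 750, 5))   # 82125.0 TBW
```

Note that the first result lines up with the 17,520 TBW figure quoted earlier in this subthread, consistent with a 5-year warranty assumption.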
Re: (Score:3)
There's a reason Micron bowed out of the relationship, neglected to release any 1st-generation products (and looks to not be releasing any 2nd-generation products), and has instead doubled down on their investment into traditional DRAM design and manufacturing. 3D XPoint (Optane) does not meet any of the specs they've claimed it would (even after revising them all, unfavorably, by multiple orders of magnitude). It needs several more years in the oven, and even then it may not pan out.
Re: (Score:2)
Main-memory for a server is even worse - they'll run through the expected life in months, if not weeks.
If you bothered to read TFA, the endurance is expected to last 5 years when being driven at maximum possible throughput the whole time -- and if a given stick fails within that 5 years, you get warranty replacement.
So they almost hit Newegg's April Fool's joke? (Score:1)
Except of course the 1.4 PHz clock, 200 threads, and RGB LEDs on the die cover.
Re: (Score:1)
This may be a troll but I'll bite:
Do you have any Epyc results for comparison?
Obviously Rome is not available yet, but the "old" 32-core Epyc model should give at least an idea of how much Intel has improved here.
Re: (Score:3)
As soon as Intel and its partners find one, they'll let you know.
I think MS SQL supports it, maybe in some preview build not sure. But to that end, why not just use the already-existing functionality of memory optimized tables, persisted memory DBs, etc.? The only real advantage Optane has is capacity per price, but it sacrifices speed and longevity (down to traditional flash or worse) to get it.
It offers a transparent non-volatile storage, but we've had transparent, disk-backed RAM drives for ages. Opta
BGA for $10k CPU?! (Score:2, Informative)
From here: https://www.tomshardware.com/reviews/intel-cascade-lake-xeon-optane,6061.html
"Instead of being socketed processors, the 9200-series processors come in a BGA (Ball Grid Array) package that is soldered directly to the host motherboard via a 5903-ball interface."
Who is excited to attempt RMA'ing a $10k to $20k Motherboard?
Intel has been on this train for a while now:
https://phys.org/news/2012-11-intel-broadwell-cpu-swap-outs.html
https://www.techpowerup.com/186846/intel-roadmap-outlines-lga-to-bga-tr
Server boards need too much flex to be BGA (Score:2)
Right now there are like 11 Supermicro boards for 1-socket LGA 3647,
and like 20+ different 2-socket boards.
Most of the difference is to fit different case sizes and different I/O choices.
Also only 40 PCIe lanes per socket.
With no socket you are going to end up where you can't get X CPU with X board, or your big-case board has a minimum CPU that is overkill for your needs.
AMD will crush Intel again.
Re: (Score:2)
2012 and 2013 articles?
Those journalists' unfounded claims didn't pan out, did they?
Re: (Score:2)
Who is excited to attempt RMA'ing a $10k to $20k Motherboard?
No one is excited. People who are buying $10k to $20k motherboards have SLAs that make the entire process incredibly boring and uneventful complete with spare part in place instantly so you don't even need to care about if or when your RMA goes through.
Essential Feature (Score:2)
We're not in Kansas anymore Dorothy (Score:1)
I think we are in a new era. 56 cores means that the next gen will be >100 cores, which means we could be within striking distance of 1000 cores within 10 years.
Maybe?
Question to Slashdot: until what year did humanity have fewer than 56 CPUs in total? 1959, '60, '61? I wonder...
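The parent's extrapolation is easy to make precise: on the premise that core counts roughly double each generation (the poster's assumption, not a roadmap), it takes five doublings to pass 1000:

```python
# Count generations of doubling from 56 cores until we exceed 1000.
cores, doublings = 56, 0
while cores < 1000:
    cores *= 2
    doublings += 1
print(doublings, cores)  # 5 doublings: 56 -> 1792
```

At two or three years per generation, that lands right around the ten-year horizon the poster suggests.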
Persistent Memory (Score:2)
Persistent malware!
Re: (Score:2)
As persistent as that on a legacy disk? That's terrible!