Intel, AMD Just Created a Headache for Datacenters (theregister.com)
An anonymous reader shares a report: In pursuit of ever-higher compute density, chipmakers are juicing their chips with more and more power, and according to the Uptime Institute, this could spell trouble for many legacy datacenters ill-equipped to handle new, higher-wattage systems. AMD's fourth-gen Epyc (Genoa) server processors, announced late last year, and Intel's long-awaited fourth-gen Xeon Scalable silicon, released earlier this month, are the duo's most powerful and power-hungry chips to date, sucking down 400W and 350W respectively, at least at the upper end of the product stack. The higher TDP arrives in lockstep with higher core counts and clock speeds than previous generations from either vendor.
It's now possible to cram as many as 192 x64 cores into your typical 2U dual-socket system, something that just five years ago would have required at least three nodes. However, as Uptime noted, many legacy datacenters were not designed to accommodate systems this power-dense. A single dual-socket system from either vendor can easily exceed a kilowatt, and depending on the kinds of accelerators being deployed in these systems, boxen can consume well in excess of that figure. The rapid trend towards hotter, more power-dense systems upends decades-old assumptions about datacenter capacity planning, according to Uptime, which added: "This trend will soon reach a point when it starts to destabilize existing facility design assumptions."
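The "at least three nodes" comparison is easy to sanity-check. A minimal sketch, assuming the top-bin parts are AMD's 96-core Genoa today and the 32-core first-gen Epyc (Naples) five years earlier; those core counts are assumptions here, not figures from the story.

```python
import math

def nodes_needed(target_cores: int, cores_per_socket: int, sockets_per_node: int = 2) -> int:
    """Dual-socket nodes needed to reach target_cores, rounding up to whole nodes."""
    cores_per_node = cores_per_socket * sockets_per_node
    return math.ceil(target_cores / cores_per_node)

# Assumed top-bin core counts: 96-core Genoa (2022) vs. 32-core Naples (2017).
genoa_2s_cores = 96 * 2
print(genoa_2s_cores)                       # 192
print(nodes_needed(genoa_2s_cores, 32))     # 3 dual-socket Naples nodes
```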
A typical rack remains under 10kW of design capacity, the analysts note. But with modern systems trending toward higher compute density, and by extension power density, that's no longer adequate. While Uptime notes that for new builds datacenter operators can optimize for higher rack power densities, they still need to account for 10 to 15 years of headroom. As a result, datacenter operators must speculate about long-term power and cooling demands, which invites the risk of under- or overbuilding. With that said, Uptime estimates that within a few years a quarter rack will reach 10kW of consumption. That works out to approximately 1kW per rack unit for a standard 42U rack.
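The "approximately 1kW per rack unit" figure follows from dividing the quarter-rack projection by 10.5U. A quick sketch of the arithmetic; extrapolating to a full rack assumes every unit is populated at the same density, which is an assumption rather than something Uptime states.

```python
RACK_UNITS = 42          # standard full-height rack
QUARTER_RACK_KW = 10.0   # Uptime's projected quarter-rack consumption

quarter_rack_units = RACK_UNITS / 4            # 10.5U
kw_per_u = QUARTER_RACK_KW / quarter_rack_units
full_rack_kw = kw_per_u * RACK_UNITS           # assumes the density holds across the whole rack

print(f"{kw_per_u:.2f} kW per U, {full_rack_kw:.0f} kW per full rack")
# 0.95 kW per U, 40 kW per full rack
```

Against the sub-10kW design capacity of a typical rack, that extrapolation is roughly four times over budget.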
Boxen (Score:5, Funny)
I thoroughly appreciate the article using this woefully underused plural.
Re: (Score:2)
I thoroughly appreciate the article using this woefully underused plural.
I heard that used way back in college and early jobs where we had several DEC VAX systems -- VAXen. My first exposure to that system was using the VAX-11/785 running 4.3BSD in my university's CS department in the mid-80s. [ I'm old. :-) ]
Re:Boxen (Score:4, Informative)
What is the Vaxen equivalent to Boxen?
I worked at a company that used to have on-site mainframes
The power and cooling remained in place as the mainframes were first replaced with refrigerator-sized DEC Alphas alongside GIGANTIC (8U?) Compaq Intel systems, and then finally with rows of racks of 4U AMD64- and Xeon-based blade servers.
In all of that time (3+ decades), through growth in company size (from single state to nationwide) and growth in compute capability (from green screens over modem to desktops pushed everywhere supporting a private company cloud), they were supported by the same old Liebert systems for cooling and power conditioning.
TLDR: If you are not getting significantly more compute power for the same wattage, then upgrading to the new hotness does not make sense in a market that regularly breaks records for FLOPS/watt.
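The upgrade rule of thumb here can be written as a comparison of FLOPS per watt. A minimal sketch; the 1.5x "significantly more" threshold and the example systems are hypothetical, not taken from the comment.

```python
def efficiency_gain(old_flops: float, old_watts: float,
                    new_flops: float, new_watts: float) -> float:
    """Ratio of the new system's FLOPS/W to the old system's FLOPS/W."""
    return (new_flops / new_watts) / (old_flops / old_watts)

def worth_upgrading(old_flops: float, old_watts: float,
                    new_flops: float, new_watts: float,
                    threshold: float = 1.5) -> bool:
    # threshold is a hypothetical "significantly more" cut-off, not from the comment
    return efficiency_gain(old_flops, old_watts, new_flops, new_watts) >= threshold

# Hypothetical systems: old box does 1 TFLOPS at 500 W.
print(worth_upgrading(1e12, 500, 4e12, 1000))    # True: 2x the FLOPS/W
print(worth_upgrading(1e12, 500, 2.2e12, 1000))  # False: only 1.1x
```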
pointless pursuit of headline speed (Score:4, Insightful)
Re: (Score:3)
Well, you just leave the racks half empty and still get more performance out of the same space. You can use the extra space for things like SSD storage (uses less power than traditional drives) or have larger heatsinks, etc.
Or you buy the lower spec processors which consume the same power as the old ones, while also providing more cores and higher performance.
Re: (Score:2)
Only if you go by the assumption that you _have_ to cram as much of this equipment into the data center as it will physically hold. Basically, a space like a data center has a certain amount of physical capacity to hold racks/servers, a certain amount of power capacity and a certain capacity for dealing with waste heat. If you have racks everywhere you can put racks and every rack is as full as it can get, but you have not exceeded the power capacity or heat capacity of the data center, you can lament that
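The three separate capacities described here (space, power, heat) can be sketched as a minimum over three limits. The figures below are hypothetical, chosen only to show how a rack can be physically half empty yet out of power.

```python
def max_servers(free_units: int, units_per_server: int,
                power_budget_w: int, cooling_budget_w: int,
                watts_per_server: int) -> tuple[int, str]:
    """Servers that fit, and which of space, power, or cooling runs out first."""
    limits = {
        "space": free_units // units_per_server,
        "power": power_budget_w // watts_per_server,
        # Nearly all electrical input ends up as heat, so cooling is budgeted in watts too.
        "cooling": cooling_budget_w // watts_per_server,
    }
    binding = min(limits, key=limits.get)
    return limits[binding], binding

# Hypothetical rack: 40U free, 2U servers drawing 1,200 W each.
print(max_servers(40, 2, 10_000, 12_000, 1_200))   # (8, 'power')
```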
Re: (Score:2)
The performance per watt is probably still better. More cores and more RAM but only one motherboard, many other parts consolidated. It's the density that's the problem, not the amount - because it requires more wattage and more cooling in less space.
Re:pointless pursuit of headline speed (Score:4, Informative)
Then you are not the target market for these parts.
You know who is? Amazon, Google, Microsoft, Facebook, Apple. And fortune-100 companies that operate their own data centers.
They all want more cores without having to build a whole new datacenter to house them, so they can then use that capacity to run even more VMs on the expanded capacity. If that means pulling in another electrical hookup from the grid to run it (and the cooling for all the waste heat) so be it - more cores = more billable business.
Intel and AMD wouldn't be doing it if they didn't think there was a huge market for it.
Re:pointless pursuit of headline speed (Score:5, Insightful)
That is a fair point, but the Iron Mountains of this world would probably like to see power consumption come down as well, since it represents an additional cost to them. A cooler part means more room for profits, and less money spent on overhead like cooling and power conditioning.
btw, wasn't ARM going to go big on datacenters due to power efficiency and compute density?
Re: (Score:2)
You can always just lower the processor power or get a cheaper, less powerful SKU. The performance per watt is increasing, they're just also cramming more of it in one chip and then maxing out the power to look good in benchmarks.
Re: (Score:2)
wasn't ARM going to go big on datacenters due to power efficiency and compute density?
No, just due to power efficiency. amd64 chips blow ARM completely out of the water on the high end, ARM can't even get close. ARM has made huge inroads into data centers, though. A lot of fancy supercomputers use ARM CPUs to handle I/O, and some other kind of logic to do the actual processing.
Re: (Score:2)
btw, wasn't ARM going to go big on datacenters due to power efficiency and compute density?
They are starting, but it will take a good while yet. Lower-end ARM chips are quite efficient, which is great for the likes of mobile phones. But at the high end they are still woefully behind.
E.g., the number 2 supercomputer in the world is ARM-based according to the TOP500 list. But according to the Green500 list, the first ARM system appears at number 43 and clocks in at under 1/4 of the energy efficiency of Xeons and EPYCs, though it does run toe-to-toe with Power9.
Re: (Score:2)
That is a fair point, but the Iron Mountains of this world would probably like to see power consumption come down as well, since it represents an additional cost to them. A cooler part means more room for profits, and less money spent on overhead like cooling and power conditioning.
These parts have more performance per watt, and more performance per dollar. So whatever their power consumption desire is, these units will give them more compute for that power. If they don't want to improve their power supply or cooling then they can save even more money by leaving rack slots open and not buying blades to put in there.
Re: (Score:2)
Yes, I completely agree with you, and I suspect that concrete saws are running somewhere to bring more power into current data centers.
I toured an IO datacenter about 10 years ago, before they were acquired by Iron Mountain. There was a significant amount of empty space as they were constantly building out, and this seems to have continued, as several other Iron Mountain facilities have cropped up within a mile of the original facility (oddly enough along some Level(3) conduit laid before the dot-com crash).
I
Re: (Score:2)
Yeah, but that's not top end.
Back then you could get quad-socket Opterons with 16 cores apiece, giving 64 cores in 1U at probably 600 W of system power. That's 128 cores and 1200 W in 2U.
So, um, what is the article whining about?
Really though, the cheapish Supermicro boards completely blew the blade crap out of the water in terms of price and compute density. The current systems are not remotely unprecedented, but you get more cores and a lot more compute for your kilowatt in those 2U.
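The point that power per U is not unprecedented, while work per kilowatt keeps rising, can be checked with a small comparison. The Opteron numbers are the commenter's; the 1,400 W figure for a dual-socket Genoa box is an assumption for illustration. Cores per kW also understates the gap, since per-core performance has risen too.

```python
from dataclasses import dataclass

@dataclass
class Chassis:
    name: str
    cores: int
    watts: float
    rack_units: int

    def watts_per_u(self) -> float:
        return self.watts / self.rack_units

    def cores_per_kw(self) -> float:
        return self.cores / (self.watts / 1000)

# Commenter's figures: two 1U quad-socket, 16-core Opteron boxes.
opteron = Chassis("2x quad-socket Opteron 1U", cores=128, watts=1200, rack_units=2)
# Hypothetical modern box: 2x 96-core Genoa, assuming ~1,400 W at the wall.
genoa = Chassis("dual-socket Genoa 2U", cores=192, watts=1400, rack_units=2)

for c in (opteron, genoa):
    print(f"{c.name}: {c.watts_per_u():.0f} W/U, {c.cores_per_kw():.0f} cores/kW")
```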
Re: (Score:2)
Epyc Rome's top-end 7763 runs at 2GHz. The speed-select 7H12 version runs at 2.25. The highest count Ice Lake and Sapphire Rapids processors, similarly, top o
Re: (Score:3)
A single unit can draw more power but you can run with fewer units if the workload stays the same.
Unfortunately the workload increases over time so it's just necessary to stay ahead. The reason for increased workload is more functionality, bloated frameworks and inefficient coding.
One consideration I have is that not every routine you execute needs a 64-bit core. You can do most tasks with a 32-bit core, and a considerable number of tasks can be done with a 16-bit core. But to be able to utilize a processor w
Re: (Score:1)
The wrong thing is to try to use 100% of the physical space when using smaller machines with more consumption.
So will we need to draw energy directly from the Sun if we shrink things to the best possible power/space ratio?
This is why it is important not only to increase computing power, but also to improve efficiency, with better software that uses the available resources in the best possible way.
And cryptocurrency madness must stop being conceived around computing waste; this is nonsense, just to trash energ
Re: (Score:2)
Performance/watt is still going up. They're just cramming more performance into fewer sockets.
Re: (Score:2)
This saves on datacenter floor space, which is good because land = money, but it makes everything else way harder. Once the dual-socket half-U systems arrive, we're going to have 384 cores emitting up to 1.6kW per U. Times 32U of compute (yielding the remaining 8 to network and power), that's 50kW give or take. Those are also going to demand four top-of-rack fibre links per U at at least 50
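The "50kW give or take" figure checks out. A sketch using only the commenter's numbers, plus the story's 10kW typical-rack figure for comparison; reading 384 cores per U as two dual-socket half-U nodes of 96-core parts is an interpretation, not something the comment spells out.

```python
CORES_PER_U = 384     # commenter's projection (consistent with 2 half-U nodes x 2 sockets x 96 cores)
KW_PER_U = 1.6        # commenter's upper estimate per U
COMPUTE_UNITS = 32    # rack units given to compute; the rest go to network and power
TYPICAL_RACK_KW = 10  # design capacity Uptime calls typical

rack_kw = KW_PER_U * COMPUTE_UNITS
rack_cores = CORES_PER_U * COMPUTE_UNITS
print(f"{rack_cores} cores, {rack_kw:.1f} kW, {rack_kw / TYPICAL_RACK_KW:.1f}x a typical rack")
```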
Re: (Score:2)
Now, I see some people suggesting that datacenters will just install more power, but that's much more compl
just wait for load shedding to hit them as well! (Score:4, Insightful)
just wait for load shedding to hit them as well!
Time for more nukes to power them all
Not really a new problem (Score:5, Insightful)
Power density has been an issue for years in data centers, and has been steadily increasing over time. High-density blade chassis were a big jump, but that was 15-20 years ago now, IIRC. In some places you might be able to get a full rack but be limited to a 30A/208V supply.
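For a sense of scale, a 30A/208V feed is not much. A sketch of the circuit math, assuming unity power factor and the common 80% continuous-load derating; the comment doesn't say whether the feed is single- or three-phase, so both are shown.

```python
import math

def usable_watts(amps: float, volts: float, three_phase: bool = False,
                 continuous_derate: float = 0.8) -> float:
    """Power available from one branch circuit, assuming unity power factor.

    continuous_derate models the common practice of loading a breaker to
    80% for continuous loads; adjust for local code.
    """
    phase_factor = math.sqrt(3) if three_phase else 1.0
    return amps * volts * phase_factor * continuous_derate

print(round(usable_watts(30, 208)))                    # 4992 (single-phase)
print(round(usable_watts(30, 208, three_phase=True)))  # 8646 (three-phase)
```

Either way, that budget covers only a handful of the kilowatt-plus dual-socket systems described in the story.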
Re: (Score:2)
Big data centers have been working that way for years. You have separate line items for the rack, the power, and the network/cross-connects. If you didn't get that, it probably means you're on a bundle (which means you're paying for more of something that you don't need) or in a data center that doesn't understand their own costs.
Treat them like crypto miners (Score:1)
Meh... they asked for it (Score:5, Insightful)
Data centers wouldn't NEED this kind of power density if it wasn't for this push to "cloudify ALL the things".
If you plan on housing all the computing needs of a nation full of customers under one roof AND need to plan for future expansion too? Then I guess you wind up where they're headed now?
The personal computer was such a breakthrough in the 20th century because it freed everyone from having to connect to centralized mainframes or minicomputers from dumb terminals to do everything. Now we've just taken all the microcomputers, made them glorified dumb terminals again, and made the connection distance to the "mainframe" a lot longer than it used to be.
Maybe failures like Google Stadia indicate that the line was crossed, at least for computer gamers? But I think there's going to be some kind of reversal of this trend at some point. Amazon's EC2 is already a running joke with how high your monthly fees can climb.
Re: (Score:3)
The question is, will the powers that be allow the change.
Re: (Score:1)
This is really funny.
Then we pay each month how much? $12? $25? ... for the equivalent power of having a Raspberry Pi in our hands ... whereas if we buy the Pi 400 kit, it costs $100 in total for as long as we use the machine.
A balance has been broken, because the cloud is important for some things but NOT for everything. Paying for backup storage makes some sense, but storing data is mainly a passive task that doesn't need much CPU.
Absolute solutions are plain wrong. The ideal is hybrid.
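The rent-versus-buy comparison is simple arithmetic. A sketch using the commenter's prices; it ignores electricity, networking, and the difference in what the two options actually deliver, so treat it as an illustration only.

```python
import math

def breakeven_months(one_off_cost: float, monthly_fee: float) -> int:
    """First month in which cumulative fees reach the one-off hardware price."""
    return math.ceil(one_off_cost / monthly_fee)

for fee in (12, 25):
    print(f"${fee}/month matches a $100 kit after {breakeven_months(100, fee)} months")
```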
Re:Meh... they asked for it (Score:5, Insightful)
Data centers wouldn't NEED this kind of power density if it wasn't for this push to "cloudify ALL the things".
Or just write more efficient software. One place to start would be to go back to writing a thing called a "computer program" that runs in a process.
That would probably use orders of magnitude less processing power than setting up dozens of containers that have to exchange hundreds of REST calls amongst themselves just to serve up a single page of data to the end user.
It would also help to cut back on the layers of abstraction so that 90% of the business logic isn't wasted on copying fields between slightly different versions of the same data structures in each layer, often serializing and deserializing while they're at it.
And one more thing: People ought to stop patting themselves on the back for using statically typed languages for speed, then setting up architectures that need to use reflection for every operation because really using static typing is too much work. This usually ends up being less efficient than just using a dynamically typed language in the first place.
Re: (Score:2)
You are trading efficiency for development time/resources. Sure, you can create a monolithic application with asm or bare naked C, but now you are paying for developers' time to write it, debug it, maintain and service it. What will take weeks or months in that direction can be done in a day or two with the fat dev tools and abstractions. Capitalism only cares about fewer dollars being spent and more dollars coming in. If "efficiency" gets in the way of that, that is the definition of counterproductive
Re:Meh... they asked for it (Score:4, Informative)
It's also not efficient to have to develop code using an environment where dozens of containers and VMs all have to interact perfectly, and obscure failures and network issues constantly pop up.
Sure, it might be easy for any one person to work on their little container, but the system as a whole can become a convoluted morass to anybody who actually has to use the stinking thing. Too much of this shortcut dev tool stuff is penny wise and pound foolish.
When I've been stuck using such monstrosities, I try to keep telling myself that all the time I waste dealing with the inevitable issues at least goes toward my paycheck. However, my employer is basically wasting that money since I could be doing actual useful work instead.
Re: (Score:2)
Pretend the externalized costs of a thing don't exist. Capitalism 101.
Re: (Score:2)
All of these data sources have their own security need
Re: (Score:2)
Personally, I greatly prefer letting my cloud provider worry about this crap. Getting adequate power from local hosting companies was always a pain in the ass.
Re: (Score:2)
We're currently on our way to the cloud right now. (Well, trading an in house cloud for an external cloud.) All work spaces and servers are headed for foggy land. The bill is going to be awesome.
Re: (Score:2)
Yes, they would still need the density. The total market for datacenters/hosting might be smaller, but the density demand would be the same.
Clients want DCs near major population centers where latency to clients will be low. Those are places where real estate ain't cheap, generally speaking. It's also true that there are other efficiency gains to be had by being able to run more VMs per host or do more compute work on a single host. The more efficient DC will be the more profitable, more competitive DC.
As long as ther
Re: (Score:2)
When compute was expensive, dumb terminals to an obscenely expensive mainframe. Then compute was cheap and networks were slow, so PCs running local software. Now that network is inexpensive - and local compute is hitting thermal brick walls - local terminals that store all their brains in a remote system are making a comeback.
Re: (Score:2)
As most data centres effectively charge by the U, who wouldn't want to get as much in as they can? If I lease an entire suite, then I've still only got a certain footprint, which equates to a total number of U's I've got to fit everything into.
Re: (Score:2)
How is SaaS substantially different in this respect than a company leasing space in the same DC and managing their own hardware?
There's an assumption here that the top-end SKUs will be the driving factors. That isn't true today and it won't be tomorrow. There is typically a price/performance curve. Production where all the cores work at high GHz is priced at a premium per socket, and the number shipped is distinctly lower than a bit down the [sometimes binned] price/performance curve. CPU often is not t
192x64 ? (Score:2)
Re: (Score:2)
Intel? No.
Re: (Score:3)
192 cores of "x64" architecture.
x64 is a current, modern Intel or AMD "64-bit" CPU core. Contrast with an ARM core, or a CUDA core, or a GPU core; this is an "x64 CPU core".
A typical desktop with a 13th-generation Intel i5 has 6 performance cores and 8 efficiency cores, for a total of 14.
A top-line 2023 Xeon has 60 cores. And you can put more than one CPU on a motherboard.
A top-line AMD CPU has 96 cores, and you can put more than one CPU on a motherboard (e.g. 2 CPU sockets, with top-line AMD would allow fo
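Putting the figures from this reply together shows where the headline number comes from. A sketch assuming a two-socket board for each server part; the per-chip core counts are the commenter's.

```python
def total_cores(sockets: int, cores_per_socket: int) -> int:
    return sockets * cores_per_socket

desktop_i5 = 6 + 8                    # performance + efficiency cores
xeon_2s = total_cores(2, 60)          # two top-line 2023 Xeons
epyc_2s = total_cores(2, 96)          # two top-line AMD parts
print(desktop_i5, xeon_2s, epyc_2s)   # 14 120 192
```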
Is it a problem? Depends... (Score:3)
I contend that his hand-wringing over whether the 'socket per U' density is maintained is silly. The question is whether you have the capacity/performance per U. If a system easily has 4X the CPU, memory, storage, and network of the prior generation, it's no huge problem if you can only go half as dense 'socket-wise'; you would still be way ahead of the 'dense'-friendly generation.
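The argument reduces to capacity per rack unit rather than sockets per rack unit. A sketch with the commenter's illustrative ratios (4X capacity, half the socket density); the 1U-versus-2U split is an assumption for concreteness.

```python
def capacity_per_u(capacity_per_system: float, units_per_system: float) -> float:
    """Relative capacity (CPU, memory, storage, network) per rack unit."""
    return capacity_per_system / units_per_system

old = capacity_per_u(1.0, 1)   # previous generation: baseline capacity in 1U
new = capacity_per_u(4.0, 2)   # 4x the capacity, spread over 2U to stay within power limits
print(new / old)               # 2.0
```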
But there are folks that are fixated on how many *sockets* they can cram in a volume, and this just doesn't make any practical sense. If anything, you get some relief on things like weight on the floor for same benefit.
If you really want to go with 4 sockets per U no matter what, go for it, though expect to have to deal with crazy high power densities and water cooling. Otherwise, there's no shame in being a facility that can *only* accommodate about a socket per U, or even per 2U, when each socket is up to 96 cores and 12 terabytes of RAM.
The 140 ton railroad car (Score:2)
I never understood the hand-wringing about the ever heavier generation of railroad cars damaging the tracks.
The railroad tracks in the U.S. are privately owned. If the savings in operating costs for the heavier loads outweigh, to excuse the pun, the damage done to the tracks, run the heavier cars and pay to repair the tracks. Otherwise, don't build those heavier railroad cars.
If the operational benefits of the denser, higher power-consumption chips outweigh the cost of buying new racks and upgradin
Re:The 140 ton railroad car (Score:4, Insightful)
Otherwise, what's the crisis? Just run the earlier generation chips and stop whinging about it.
Or even run the newest chips, just don't completely fill your server racks.
I know that in the data center I worked in ~20 years ago (government), with some of the newest blade servers we'd only have been able to put a single server in a whole rack, power-wise. More power lines would need to be run, and more HVAC systems installed, to be able to put more in.
On the other hand, said single blade could easily handle about 5 racks' worth of what used to be there. The newest stuff is not only more power-dense, but also still more power-efficient.
The only problem that would occur is that, theoretically at least, if they're paying a lease based on square footage, another company could lease a more modern facility with much higher power density (and cooling!) and less square footage, and thus have a higher profit margin because they can actually fill their racks. Thing is, I think that any place leasing property is going to factor in the energy capabilities of the place, so we'd be back to "the square footage charge is a footnote compared to the power capability charge".
But balance this against a semi-smart hosting site that, you know, owns the facility, and thus the facility size is a sunk cost. They would have the space to lease out to less energy dense needs(data storage, perhaps?) and still make some extra money that way.
Or even, at some point, shrink their operation area and offer storage/office space or something.
Overall, isn't efficiency going up? (Score:3)
Yes, having more cores in a smaller chassis does mean more power density... but it also means the ability to distribute tasks, and free up rack space. For example, if one has five ESXi servers, and a server upgrade means that the I/O, RAM and core count is doubled for each of them, then at least one of the old servers can be decommissioned, if not two, depending on redundancy needs. The newer servers may take more power, but overall, it can mean fewer servers in the rack, which can be better overall.
Re: (Score:2)
But you need spare capacity, even if only temporarily, to drain one node at a time, run updates, and reboot it.
Does ESXi have live, no-reboot updates now?
Re: (Score:2)
This is why you keep more than a minimum going, at least "n+1", but "n+2" would be better since a node could be taken down. One still needs a minimum of nodes for vMotion, but what was done before with nine nodes could be done with seven. Fewer servers, but still enough for breathing room in case one is down for maintenance, and another drops off the air.
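The sizing logic in this exchange (consolidate, but keep n+1 or n+2 headroom for maintenance) can be sketched as follows. The load figure is hypothetical: five old hosts run n+1 carry four hosts' worth of work, and each new host is assumed to have twice the capacity.

```python
import math

def hosts_needed(load: float, capacity_per_host: float, spares: int) -> int:
    """Hosts required to carry `load` with `spares` extra for maintenance (n+spares)."""
    return math.ceil(load / capacity_per_host) + spares

load = 4.0  # four old hosts' worth of work (five hosts, n+1)
for spares in (1, 2):
    n = hosts_needed(load, capacity_per_host=2.0, spares=spares)
    print(f"n+{spares}: {n} new hosts, {5 - n} old hosts freed")
```

This matches the "at least one of the old servers can be decommissioned, if not two" estimate, depending on the redundancy level.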
Re: (Score:2)
Yes, overall efficiency and perf/watt are going up. You can buy lower-wattage SKUs if you don't need as many cores or as much clockspeed.
Nothing new (Score:2)
Time for a universal DC standard? (Score:2)
I think of this daily at my home, with the number of wall warts, power bricks, USB chargers etc., as well as the circuitry needed to convert the AC in my wires to DC for LED bulbs. If my house just had DC to begin with, I could convert it with a large and efficient DC converter in a central location and probably cut my winter electricity use by 5-10%, as well as save lots of money on light bulbs and consumer electronics if they could assume reliable DC power to begin with and not convert it from AC.
Re: (Score:2)
Probably not much. Modern switching power supplies are very efficient, particularly when they can be designed for the expected load. A typical modern computer requires many different voltages anyway, and is pretty picky about noise, so if you really wanted to centralize it you'd be running a whole bunch of cable, some of it very low voltage. That would be subject to interference and line losses.
Re: (Score:2)
How much energy is wasted by having the servers convert AC to DC?
Less than 10%.
Why not have the DataCenter route DC to each board directly?
Conductor size. The DC stuff would have to be re-engineered to higher voltages; right now all of it is 48V or less.
If my house just had DC to begin with, I could convert it with a large and efficient DC converter in a central location and probably cut my winter electricity use by 5-10% as well as save lots of money on light bulbs and consumer electronics if they could assume reliable DC power to begin with and not convert it from AC.
Nope. Your average buck converter is not much more efficient than a good power supply, which is what high end servers come with, so there really isn't any meaningful efficiency advantage to using DC. And either you need much much bigger conductors to handle your low voltage, or you have to have more expensive buck converters, either way you're going to spend more on hardware for ver
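The conductor-size point comes down to current: for the same power, lower voltage means proportionally more current and quadratically more resistive loss, I^2 R. A sketch with a hypothetical 1 kW load and a made-up 0.05 ohm round-trip resistance; only the ratio matters here.

```python
def conductor_loss_w(power_w: float, volts: float, resistance_ohm: float) -> float:
    """I^2 R loss in the supply conductors for a resistive load drawing power_w."""
    current = power_w / volts
    return current ** 2 * resistance_ohm

R = 0.05  # hypothetical round-trip conductor resistance in ohms
for v in (48, 240):
    print(f"{v} V: {1000 / v:.1f} A, {conductor_loss_w(1000, v, R):.2f} W lost")
```

Dropping from 240V to 48V multiplies the loss by (240/48)^2 = 25 for the same wire, which is why low-voltage DC needs much fatter conductors.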
Re: (Score:2)
Recently I was a bit surprised to learn that lots of telco equipment still runs on 48VDC. A Verizon FiOS interface box had a 48V wall wart, but a single 12V battery inside.
Also, you probably know the pro audio world uses 48V to "phantom power" many microphones and other input interfaces and gizmos.
Re: (Score:2)
... cut my winter electricity use by 5-10% as well as save lots of money on light bulbs and consumer electronics if they could assume reliable DC power to begin with and not convert it from AC.
All good thoughts, and you and Thomas Edison would have gotten along very well. :) (He was very big on DC, for those who might not know; Tesla wanted AC.)
I'm all for the highest efficiency / least power loss possible, but everyone try to remember: all that loss becomes heat, which offsets your heating costs in winter. I.e., none is wasted in heating season.
The thermodynamic efficiency nut in me would love to have a total energy control system which would take in all heat sources into a heat pump / storage /
I'm not seeing the issue here: (Score:2)
It is true that lower spec datacenters aren'
Ya know ... (Score:2)
At this point they could, at least, include a slot (like an Easy Bake Oven) so people could make cookies and such while we work. At the moment, they're just *super* expensive space heaters.
Utter Bullshit (Score:1)
If you designed for 192 cores to take 6U of rack space, you can still do so. The fact that 192 cores now only use 2U is irrelevant.
The whole article is ill-conceived from the get-go and written by an obvious moron.
Boo hoo (Score:2)
Are they suggesting that a configuration management engineer can't figure out that they'll have to put fewer CPUs per rack to manage the additional wattage? Did they even bother to figure out $$$/IPC?
Well, how does that core count work with Licensing (Score:1)
So it looks like to put MS SQL on that box it might be $1.3 million. I assume we can negotiate a lower price???
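The $1.3 million figure is per-core licensing multiplied across 192 cores. A sketch assuming a simple model in which every physical core is licensed and licenses are sold in 2-core packs; real terms (editions, minimums, virtualization rights, discounts) vary, so the price is left as a parameter rather than quoted.

```python
import math

def per_core_license_cost(physical_cores: int, price_per_two_core_pack: float) -> float:
    """Cost when every physical core is licensed and licenses come in 2-core packs."""
    packs = math.ceil(physical_cores / 2)
    return packs * price_per_two_core_pack

implied_per_core = 1_300_000 / 192    # what the commenter's estimate works out to
print(round(implied_per_core))        # 6771
```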
TDP is specified by the platform (Score:2)
If you define a new platform, you can sell a processor SKU at the TDP specified by that platform. If you don't define a new platform, you can't just shove a processor with a different TDP in there.
This is irrelevant to current platforms. Stop crying about the power of current platforms going up.
Overblown concern? (Score:2)
Re: (Score:2)
Even large enterprises that are still running their own DCs are finding that they are essenti