What Does It Take to Build the World's Largest Computer Chip? (newyorker.com) 23
The New Yorker looks at Cerebras, a startup that has raised nearly half a billion dollars to build massive, plate-sized chips targeted at AI applications — the largest computer chip in the world.
In the end, said Cerebras's co-founder Andrew Feldman, the mega-chip design offers several advantages. Cores communicate faster when they're on the same chip: instead of being spread around a room, the computer's brain is now in a single skull. Big chips handle memory better, too. Typically, a small chip that's ready to process a file must first fetch it from a shared memory chip located elsewhere on its circuit board; only the most frequently used data might be cached closer to home...
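The locality argument is easy to put rough numbers on. Here is a minimal sketch, in Python, using assumed order-of-magnitude latencies (none of these figures come from the article or from Cerebras) to show why a fetch that stays on the silicon beats one that crosses a circuit board or a network:

```python
# Illustrative only: assumed, order-of-magnitude memory-access latencies.
# None of these numbers come from the article or from Cerebras.
latency_ns = {
    "on-chip SRAM (local to the core)": 1,        # assumed ~1 ns
    "off-chip DRAM (same board)":       100,      # assumed ~100 ns
    "another machine (over a network)": 10_000,   # assumed ~10 microseconds
}

local = latency_ns["on-chip SRAM (local to the core)"]
for path, ns in latency_ns.items():
    # Show each path's latency and its penalty relative to staying on-chip.
    print(f"{path:34s} ~{ns:>6,} ns  ({ns / local:>6,.0f}x local)")
```

With ballpark figures like these, a round trip across a board or a network costs two to four orders of magnitude more than a local access, which is the case for keeping all the cores and memory on one wafer.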
A typical, large computer chip might draw three hundred and fifty watts of power, but Cerebras's giant chip draws fifteen kilowatts — enough to run a small house. "Nobody ever delivered that much power to a chip," Feldman said. "Nobody ever had to cool a chip like that." In the end, three-quarters of the CS-1, the computer that Cerebras built around its WSE-1 chip, is dedicated to preventing the motherboard from melting. Most computers use fans to blow cool air over their processors, but the CS-1 uses water, which conducts heat better; connected to piping and sitting atop the silicon is a water-cooled plate, made of a custom copper alloy that won't expand too much when warmed, and polished to perfection so as not to scratch the chip. On most chips, data and power flow in through wires at the edges, in roughly the same way that they arrive at a suburban house; for the more metropolitan Wafer-Scale Engines, they needed to come in perpendicularly, from below. The engineers had to invent a new connecting material that could withstand the heat and stress of the mega-chip environment. "That took us more than a year," Feldman said...
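For scale, a quick back-of-envelope division of the two wattage figures quoted above (a sanity check, not a benchmark):

```python
# Power delivered to one piece of silicon, using the figures quoted above.
typical_chip_watts = 350      # "a typical, large computer chip"
wafer_watts = 15_000          # Cerebras's wafer-scale chip

print(f"~{wafer_watts / typical_chip_watts:.0f}x the power of a typical chip")
# ~43x: that much power in, and that much heat out, through one wafer --
# which is why most of the CS-1's volume is dedicated to cooling.
```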
[I]n a rack in a data center, it takes up the same space as fifteen of the pizza-box-size machines powered by G.P.U.s. Custom-built machine-learning software works to assign tasks to the chip in the most efficient way possible, and even distributes work in order to prevent cold spots, so that the wafer doesn't crack.... According to Cerebras, the CS-1 is being used in several world-class labs — including the Lawrence Livermore National Laboratory, the Pittsburgh Supercomputing Center, and E.P.C.C., the supercomputing centre at the University of Edinburgh — as well as by pharmaceutical companies, industrial firms, and "military and intelligence customers." Earlier this year, in a blog post, an engineer at the pharmaceutical company AstraZeneca wrote that it had used a CS-1 to train a neural network that could extract information from research papers; the computer performed in two days what would take "a large cluster of G.P.U.s" two weeks.
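Taking the AstraZeneca figures at face value, that works out to roughly a sevenfold speedup:

```python
# The AstraZeneca result, quoted above, reduced to a simple ratio.
gpu_cluster_days = 14   # "two weeks" on "a large cluster of G.P.U.s"
cs1_days = 2            # two days on a CS-1
print(f"~{gpu_cluster_days / cs1_days:.0f}x faster")  # ~7x
```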
The U.S. National Energy Technology Laboratory reported that its CS-1 solved a system of equations more than two hundred times faster than its supercomputer, while using "a fraction" of the power consumption. "To our knowledge, this is the first ever system capable of faster-than-real-time simulation of millions of cells in realistic fluid-dynamics models," the researchers wrote. They concluded that, because of scaling inefficiencies, there could be no version of their supercomputer big enough to beat the CS-1.... Bronis de Supinski, the C.T.O. for Livermore Computing, told me that, in initial tests, the CS-1 had run neural networks about five times as fast per transistor as a cluster of G.P.U.s, and had accelerated network training even more.
It all suggests one possible work-around for Moore's Law: optimizing chips for specific applications. "For now," Feldman tells the New Yorker, "progress will come through specialization."
It takes... (Score:3, Funny)
Lots of dupes
Re: (Score:3, Funny)
By having all the dupes on one web site it concentrates the power.
Re: (Score:2)
It is not a dupe.
It is not a trupe.
It is a quadrupe.
This may be a new record for Slashdot.
https://hardware.slashdot.org/... [slashdot.org]
https://hardware.slashdot.org/... [slashdot.org]
https://hardware.slashdot.org/... [slashdot.org]
Re: (Score:2)
Duping the silicon is an important part of the manufacturing process.
Dupe (Score:1)
Dupe,
Dupe, dupe dupe duplicate dupe.
Cerebras is doing a big PR push (Score:2)
Re: (Score:2)
Tesla DOJO is an array of chips.
It is not wafer-scale.
Re: (Score:2)
"Wafer scale" is mostly a marketing term. At best it is a proxy for the attributes that customers care about: throughput, bandwidth, latency, cost, and power efficiency.
Imagine what we could do with that. (Score:5, Funny)
With so much processing power it should be possible to search Slashdot for Dupes https://hardware.slashdot.org/... [slashdot.org]
Tech vs nature (Score:2)
Meanwhile, the human brain consumes on average 12-15 W while having a measly 86 billion neurons.
I wonder why the tech we've created, which has long since been scaled down smaller than biological neurons, is so inefficient.
Re: (Score:2)
Give us that kind of timeframe and I'm pretty sure we'll surpass that level of efficiency.
Re: (Score:2)
Meanwhile, the human brain consumes on average 12-15 W
When doing facial recognition, a human brain can process about 0.2 images per second.
A deep ANN running on Cerebras can process 10,000 images per second.
So Cerebras consumes fewer Joules per image. It also produces measurably better results.
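Taking the figures in this thread at face value, the joules-per-image arithmetic does favor the machine:

```python
# Energy per image, using the (face-value) figures from the posts above.
brain_watts, brain_images_per_sec = 15, 0.2         # human brain, face recognition
wafer_watts, wafer_images_per_sec = 15_000, 10_000  # deep ANN on the wafer

print(f"Brain: {brain_watts / brain_images_per_sec:.1f} J/image")  # 75.0 J/image
print(f"Wafer: {wafer_watts / wafer_images_per_sec:.1f} J/image")  # 1.5 J/image
```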
while having a measly 86 billion neurons.
Comparing the number of neurons to the number of transistors doesn't mean much. A neuron can do way more than a transistor. But it does it much slower.
I wonder why the tech we've created, which has long since been scaled down smaller than biological neurons, is so inefficient.
Biological neurons have been evolving for 500 million years.
Artificial neural networks have been in development for less than a century.
Plus ça change ... (Score:3)
Think how "antiquated" this new "wafer scale" fabrication process is, reminiscent of the photogravure process used to prepare photographs for printing. The first example of this process, almost 200 years old now, is credited to Nicephore Niepce, who also is credited for making the first permanent photograph with a camera.
"A chip begins as a cylindrical ingot of crystallized silicon, about a foot across; the ingot gets sliced into circular wafers a fraction of a millimeter thick. Circuits are then "printed" onto the wafer, through a process called photolithography. Chemicals sensitive to ultraviolet light are carefully deposited on the surface in layers; UV beams are then projected through detailed stencils called reticles, and the chemicals react, forming circuits"
Niépce used direct sunlight, in 1822, instead of UV, but the engraving process is essentially the same.
https://en.wikipedia.org/wiki/... [wikipedia.org]
Can it run Cyberpunk? (Score:2)
So it begins (Score:2)
The MineFrame! (Score:2)
So it's a mainframe/supercomputer on a chip. Interesting, but limited application. Oh wait - MOAR MINING!
Some Tech writers don't know Tech (Score:2)