Too Much Gold Delays World's Fastest Supercomputer
Nerval's Lobster writes "The fastest supercomputer in the world, Oak Ridge National Laboratory's 'Titan,' has been delayed because an excess of gold on its motherboard connectors has prevented it from working properly. Titan was originally turned on last October and climbed to the top of the Top500 list of the fastest supercomputers shortly thereafter. Problems with Titan were first discovered in February, when the supercomputer just missed its stability requirement. At that time, the problems with the connectors were isolated as the culprit, and ORNL decided to take some of Titan's 200 cabinets offline and ship their motherboards back to the manufacturer, Cray, for repairs. The connectors affected the ability of the GPUs in the system to talk to the main processors. Oak Ridge Today's John Huotari noted the problem was due to too much gold mixed in with the solder."
Re:can someone please explain (Score:5, Informative)
I'm not a chemist either but fortunately somebody who is working on it is.
Munger also reported the problems with the connector pins, which Oak Ridge Today's John Huotari noted were due to too much gold mixed in with the solder. Gold is used for connectors because it does not oxidize quickly, and because of its high electrical conductivity; however, when mixed with solder that contains tin, the gold and tin can combine, making the combination brittle under certain conditions. Cray is reportedly replacing the connectors to alleviate the problem.
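To put rough numbers on "too much gold mixed in with the solder", here is a back-of-the-envelope sketch in Python. Every figure in it is an illustrative assumption, not Titan or Cray data: the pad area, solder volume, plating thicknesses, and solder density are made up for the example, and the 3% by weight limit is only a commonly quoted rule of thumb for gold embrittlement, not a number from the article.

```python
# Rough estimate of the gold weight fraction in a solder joint.
# All inputs are illustrative assumptions, not measurements from Titan.

GOLD_DENSITY = 19.3    # g/cm^3, which is numerically the same as mg/mm^3
SOLDER_DENSITY = 7.4   # g/cm^3, assumed value for a tin-rich lead-free solder
RULE_OF_THUMB_LIMIT = 0.03  # assumed embrittlement guideline (3% gold by weight)


def gold_weight_fraction(plating_um, pad_area_mm2, solder_volume_mm3):
    """Gold mass divided by total joint mass, assuming all plating dissolves."""
    gold_volume_mm3 = plating_um * 1e-3 * pad_area_mm2   # micrometres -> mm
    gold_mass_mg = gold_volume_mm3 * GOLD_DENSITY
    solder_mass_mg = solder_volume_mm3 * SOLDER_DENSITY
    return gold_mass_mg / (gold_mass_mg + solder_mass_mg)


# Hypothetical joint: 1 mm^2 pad, 0.1 mm^3 of solder, two plating thicknesses.
thin = gold_weight_fraction(0.1, 1.0, 0.1)
thick = gold_weight_fraction(2.5, 1.0, 0.1)

for label, fraction in (("thin plating", thin), ("thick plating", thick)):
    verdict = "over" if fraction > RULE_OF_THUMB_LIMIT else "under"
    print(f"{label}: {fraction:.2%} gold by weight ({verdict} the assumed limit)")
```

The point is only that the same joint can land on either side of the guideline depending on plating thickness; real limits depend on the solder alloy and the process.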
Re:can someone please explain (Score:3, Informative)
Quoting from the article "Gold is used for connectors because it does not oxidize quickly, and because of its high electrical conductivity; however, when mixed with solder that contains tin, the gold and tin can combine, making the combination brittle under certain conditions."
Re:can someone please explain (Score:5, Informative)
I can only guess, but perhaps the coating on the terminals has to maintain certain mechanical properties over time. A wrongly formulated alloy, or the wrong plating thickness, will give you a connector that, perhaps, degrades in the presence of heat and vibration. Or perhaps it plastically deforms in the contact area, thus lowering the contact pressure and eventually leading to loss of a reliable connection. When you have a small contact area, the contact pressure is sufficient to provide an essentially gas-tight connection. As the contact area grows, the pressure drops and eventually you expose your contact area to the atmosphere. At that point things usually go wrong.
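The contact pressure argument is just force divided by area, which a few lines of Python can illustrate. The 1 N normal force and the three contact areas are assumed values picked for the example, not figures for any real connector.

```python
# Contact pressure falls as the contact spot spreads out under a fixed force.
# The force and areas below are assumptions chosen only for illustration.

def contact_pressure_mpa(normal_force_n, contact_area_mm2):
    """Pressure in MPa; 1 N/mm^2 equals 1 MPa."""
    return normal_force_n / contact_area_mm2


NORMAL_FORCE_N = 1.0  # assumed spring force pressing the contacts together

pressures = {}
for area_mm2 in (0.01, 0.1, 1.0):
    pressures[area_mm2] = contact_pressure_mpa(NORMAL_FORCE_N, area_mm2)
    print(f"area {area_mm2} mm^2 -> {pressures[area_mm2]:.0f} MPa")
```

With the spring force held constant, a contact spot that creeps from 0.01 mm^2 to 1 mm^2 sees its pressure fall by a factor of 100.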
Pure gold is soft and by itself it has about the worst properties imaginable for any sort of a connector surface. It literally rubs off, it's so soft. Its low resistance is irrelevant, since the gold layer is very thin. Gold's bulk conductance plays little role in overall resistance of a mated contact pair. You could replace gold with a metal that has 10x lower conductance, usually with little or no measurable change in contact resistance -- that is, if you can find something that can match gold in other properties (wetting of underlying surfaces, resistance to oxygen, etc.).
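A quick estimate shows why the bulk conductance of the plating hardly matters. The sketch below treats the plating as a thin slab that current crosses through its thickness, so R = resistivity * thickness / area. The gold resistivity is the usual room-temperature handbook value; the 1 micrometre thickness, the 0.01 mm^2 contact spot, and the 5 milliohm figure for a whole mated pair are assumptions for illustration, and the model ignores constriction and film resistance, which dominate in a real contact.

```python
# Series resistance of a thin plating layer crossed through its thickness.
# Thickness, spot area, and the mated-pair resistance are assumed values.

GOLD_RESISTIVITY = 2.44e-8  # ohm*m, approximate value near room temperature


def layer_resistance_ohm(resistivity_ohm_m, thickness_m, area_m2):
    """R = rho * t / A for a uniform slab."""
    return resistivity_ohm_m * thickness_m / area_m2


THICKNESS_M = 1e-6        # 1 micrometre of plating (assumed)
SPOT_AREA_M2 = 1e-8       # 0.01 mm^2 contact spot (assumed)
MATED_PAIR_OHM = 5e-3     # assumed total resistance of a mated contact pair

r_gold = layer_resistance_ohm(GOLD_RESISTIVITY, THICKNESS_M, SPOT_AREA_M2)
r_worse = layer_resistance_ohm(10 * GOLD_RESISTIVITY, THICKNESS_M, SPOT_AREA_M2)

print(f"gold layer:           {r_gold * 1e6:.2f} micro-ohm")
print(f"10x worse conductor:  {r_worse * 1e6:.2f} micro-ohm")
print(f"share of mated pair:  {r_worse / MATED_PAIR_OHM:.2%}")
```

Under these assumptions even the 10x worse metal adds tens of micro-ohms, well under 1% of the assumed mated-pair resistance.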
Gold is also useless as plating for high current terminals. I have designed plenty of connectors where some pins were for small signals and were gold plated, and others were for power and were silver plated. Gold plated power contacts simply lose the gold and then you have all the problems of an unplated contact pair that's exposed to the atmosphere since the gold erodes away leaving craters. It's no fun.
When you get relays with gold-plated contacts, there are often two sets of ratings. One is for low-current use, where the gold is guaranteed to stay on the contacts. Another rating is for sufficiently high current use where the gold is vaporized away and you're left with some other coating material that works well in this application. You can't swap such relays around without realizing what's going on, since contact pairs that were exposed to high currents will perform horribly in small signal, small current applications.
I also can't quite understand why people still buy gold jewelry -- all it took for me was a gold wedding band. I switched to tungsten carbide after a decade and I'm not looking back. The standard 18K alloy is a joke.
Re:Which is another way of saying not enough lead. (Score:5, Informative)
The lead-free solder has cost billions in failures.
http://en.wikipedia.org/wiki/Whisker_(metallurgy) [wikipedia.org]
http://nepp.nasa.gov/WHISKER/ [nasa.gov]
NASA lost satellites because of lead-free solder (despite them requesting leaded solder). The funny thing is, leaded solder completely prevents whisker formation.
Now, you may not care about whiskers if you just throw away your electronics every year or two, but if you want longevity, these things will kill you. So much for lead-free solder preventing pollution. We are producing much more garbage now thanks to whisker-caused short circuit failures.
Re:Which is another way of saying not enough lead. (Score:5, Informative)
The problem in solder is not the lead. It's the tin.
Tin by itself forms whiskers spontaneously. Some of the worst culprits aren't the solder but the hardware: the tin-plated hardware used to mount PCBs and so on seems to whisker the most and cause problems. Plenty of research has shown which kinds of tin lead to the worst problems ("bright" tin is the worst, and manufacturers only recently stopped using it).
Leaded solder suffers from whiskering as well. Anytime you use tin, you'll have whiskers. It's just a matter of time: use the wrong tin and it'll whisker quickly. Use the right tin and it'll whisker slowly. And it's not the result of electrochemistry, electromigration, or anything like that. It's just tin atoms migrating to relieve stress in the crystalline structure. They diffuse through the structure, so the atoms aren't pulled from one local spot but from the entire bulk.
We knew this when the first solders were created for electronics. At the time, they experimented and found that lead worked "well enough". They never went looking for any other substitute. Massive amounts of R&D are now going on in materials science to find alternatives.
Re:Which is another way of saying not enough lead. (Score:4, Informative)
There are a number of process factors that also impact the reliability of a solder joint, including heating and cooling rates, flux chemistry, and the plating of the connected parts. These can affect microstructure, intermetallic formation, and void formation. Like you say, tin-lead has been studied in depth for decades, while the focus on lead-free has only been going on for about 15 years.