Twitter Data Storage Social Networks The Internet

Extreme California Heat Knocks Key Twitter Data Center Offline (cnn.com) 62

Extreme heat in California has left Twitter without one of its key data centers, and a company executive warned in an internal memo obtained by CNN that another outage elsewhere could result in the service going dark for some of its users. CNN reports: "On September 5th, Twitter experienced the loss of its Sacramento (SMF) datacenter region due to extreme weather. The unprecedented event resulted in the total shutdown of physical equipment in SMF," Carrie Fernandez, the company's vice president of engineering, said in an internal message to Twitter engineers on Friday. Major tech companies usually have multiple data centers, in part to ensure their service can stay online if one center fails; this is known as redundancy.

As a result of the outage in Sacramento, Twitter is in a "non-redundant state," according to Fernandez's Friday memo. She explained that Twitter's data centers in Atlanta and Portland are still operational but warned, "If we lose one of those remaining datacenters, we may not be able to serve traffic to all Twitter's users." The memo goes on to prohibit non-critical updates to Twitter's product until the company can fully restore its Sacramento data center services. "All production changes, including deployments and releases to mobile platforms, are blocked with the exception of those changes required to address service continuity or other urgent operational needs," Fernandez wrote.
In a statement about the Sacramento outage, a Twitter spokesperson told CNN, "There have been no disruptions impacting the ability for people to access and use Twitter at this time. Our teams remain equipped with the tools and resources they need to ship updates and will continue working to provide a seamless Twitter experience."
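The memo's "non-redundant state" can be illustrated with a simple capacity check. This is a minimal sketch under stated assumptions: the site labels ATL and PDX, the capacity figures, and the peak load are all hypothetical placeholders, not Twitter's real numbers, and `is_redundant` is a helper invented here for illustration.

```python
def is_redundant(capacities, peak_load):
    """Return True if peak_load can still be served after losing any one site."""
    total = sum(capacities.values())
    return all(total - cap >= peak_load for cap in capacities.values())


# Hypothetical capacities in arbitrary traffic units (assumed, not real data).
sites = {"SMF": 50, "ATL": 50, "PDX": 50}
peak = 90

# With all three sites up, losing any one still leaves 100 units for a 90-unit peak.
all_up = is_redundant(sites, peak)

# With SMF already down, losing one more site leaves only 50 units.
remaining = {name: cap for name, cap in sites.items() if name != "SMF"}
without_smf = is_redundant(remaining, peak)

print(all_up, without_smf)
```

Under these assumed numbers the service stays up after the first loss but has no margin left for a second one, which is the situation the memo describes.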

  • Twatter (Score:5, Funny)

    by Anonymouse Cowtard ( 6211666 ) on Tuesday September 13, 2022 @05:21AM (#62876851) Homepage

    As a result of the outage in Sacramento, Twitter is in a "non-redundant state"

    Oh no, no, no, no, no. It's redundant, baby. It's redundant.

  • by mccalli ( 323026 ) on Tuesday September 13, 2022 @05:29AM (#62876855) Homepage
    I am suddenly more of a fan of global warming than I was prior to this article being posted.
  • Was the temperature SO extreme that they could not possibly CONCEIVE of such a high temperature when they designed their cooling system ?

    And of course, now everyone else is forewarned by their mistake, so the next time it happens it is DEFINITELY incompetence or corner-cutting.

    • by Freischutz ( 4776131 ) on Tuesday September 13, 2022 @06:02AM (#62876901)

      Was the temperature SO extreme that they could not possibly CONCEIVE of such a high temperature when they designed their cooling system ?

      And of course, now everyone else is forewarned by their mistake, so the next time it happens it is DEFINITELY incompetence or corner-cutting.

      I was thinking the same thing, but apparently nobody could have foreseen cooling problems when building a data centre in an area prone to heatwaves, any more than anybody could have foreseen a meltdown when building a nuclear plant in an earthquake & tsunami zone. Sounds like a whole bunch of engineers and business people need some serious lessons in obvious problem prediction.

      • Re: (Score:3, Funny)

        by Megane ( 129182 )
        Specifically, they need a ride on the B-Ark.
      • Sounds like a whole bunch of engineers and business people need some serious lessons in obvious problem prediction.

        The entire business model was built around nothing but hot air from the start...

      • by gweihir ( 88907 ) on Tuesday September 13, 2022 @07:58AM (#62877103)

        Sounds like a whole bunch of engineers and business people need some serious lessons in obvious problem prediction.

        Typically in these cases, the engineers did a good job but made a key mistake: They presented "options" to management with different probabilities of things failing. And management predictably took the cheapest one that still seemed somewhat reasonable to them, but, as usual, management did not understand what was going on.

        I learned from my mother how to do this right: Present 3 options to the customer. Make one so obviously bad the customer will see that themselves. Make one the option you want them to choose. And then add one that is more expensive, with non-necessary frills and some gold-plating. The customer will usually choose the one you wanted and sometimes the gold-plated one. Only the most stupid customers will select the bad one, and usually you can still convince them otherwise.

        • Sounds like a whole bunch of engineers and business people need some serious lessons in obvious problem prediction.

          Typically in these cases, the engineers did a good job but made a key mistake: They presented "options" to management with different probabilities of things failing. And management predictably took the cheapest one that still seemed somewhat reasonable to them, but, as usual, management did not understand what was going on.

          I learned from my mother how to do this right: Present 3 options to the customer. Make one so obviously bad the customer will see that themselves. Make one the option you want them to choose. And then add one that is more expensive, with non-necessary frills and some gold-plating. The customer will usually choose the one you wanted and sometimes the gold-plated one. Only the most stupid customers will select the bad one, and usually you can still convince them otherwise.

          And what happens when they pick option #1 because they are greedy morons? It's easy to lay the blame at the feet of 'management'. Those people usually do their homework before investing huge quantities of money in a project, but sometimes you run into a bunch of greedy salivating morons. The fact that any of this was allowed to happen represents an epic chain of failures all the way from engineering, through risk assessment and business planning, up to CEO level. If management wants you to do something that

      • If these people are building datacentres for twitter instead of key infrastructure like nuclear plants, we're making progress. Maybe we can get them jobs building datacentres for facebook and tiktok too!

        • If these people are building datacentres for twitter instead of key infrastructure like nuclear plants, we're making progress. Maybe we can get them jobs building datacentres for facebook and tiktok too!

          Given the history of cost overruns, engineering failures, meltdowns and other disasters in the history of nuclear power it seems a fair few of them are indeed building nuclear power plants.

    • by Shimbo ( 100005 ) on Tuesday September 13, 2022 @06:16AM (#62876921)

      Was the temperature SO extreme that they could not possibly CONCEIVE of such a high temperature when they designed their cooling system ?

      That's not how you engineer things. You make an assessment on how likely extreme events are and do a cost benefit analysis. And "we already have redundant data centres, so losing one temporarily isn't a big deal" comes into that equation.
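The cost-benefit reasoning above can be sketched as an expected-value comparison. Everything here is an assumption made for illustration: the event probability, the outage costs, and the hardening cost are invented numbers, and `worth_hardening` is a hypothetical helper, not anyone's actual planning method.

```python
def expected_annual_loss(p_event, outage_cost):
    """Expected yearly loss from an event with annual probability p_event."""
    return p_event * outage_cost


def worth_hardening(p_event, outage_cost, annual_hardening_cost):
    """True if hardening costs less per year than the loss it is assumed to prevent."""
    return annual_hardening_cost < expected_annual_loss(p_event, outage_cost)


# All figures below are hypothetical.
p_heat_event = 0.01            # assumed 1% chance per year
hardening_per_year = 200_000   # assumed yearly cost of extra cooling

# If losing the site took the whole service down, the outage would be expensive.
no_redundancy = worth_hardening(p_heat_event, 50_000_000, hardening_per_year)

# If other data centres absorb the load, the same event costs far less.
with_redundancy = worth_hardening(p_heat_event, 500_000, hardening_per_year)

print(no_redundancy, with_redundancy)
```

With these made-up inputs, hardening pays off only when there is no redundant site to fall back on, which is the point about redundancy entering the equation.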

      • Was the temperature SO extreme that they could not possibly CONCEIVE of such a high temperature when they designed their cooling system ?

        That's not how you engineer things. You make an assessment on how likely extreme events are and do a cost benefit analysis. And "we already have redundant data centres, so losing one temporarily isn't a big deal" comes into that equation.

        And if the company is less than honest, the cost of a rare but likely event that can take the entire system offline is balanced against the extra profits from not having to engineer for it. Thus the company is more profitable, and the consumers may be without a necessary service (not talking twitter here, actual necessary services) for a period of time long enough to be a problem, but the company is more profitable.

      • Moreover, when you design a data center for high efficiency you lose a few buffers. Things like air cooled chillers are often used for backup/redundant/low-water operations, but they have a hard upper limit on outside air temperature where they can still provide meaningful cooling.

        When you design a facility with a cold aisle temperature of 82-85F you don't have much time if it creeps up to 90F.

    • could not possibly CONCEIVE of such a high temperature
      Planning for every eventuality is not possible, which is why one tries to develop a fault-flexible architecture. One product I worked on was designed to survive nine out of eleven simultaneous data center outages with zero downtime. They simulated three out of six, then called that "good enough". I would have tested it to the full 9's but I don't have to pay the bills either.

      Almost every answer to "Why don't they..." is the same.
      Cost, effort, tim

    • by splutty ( 43475 )

      Yes. Yes it was. If your data center is designed to handle X days of 40C+ and you get X+30 days of 40C+ at one point you're not going to be able to cool properly anymore.

      Unless you want everything to be a lot more expensive to be able to handle what were, at the time of construction, extremely low chance events.

    • by thegarbz ( 1787294 ) on Tuesday September 13, 2022 @09:05AM (#62877297)

      You can conceive anything. What you can't do is ever justify the cost of building something to withstand everything you can conceive. When someone says something was designed to withstand up to a 1-in-100-year event, what they mean is there's a 1% chance it may fail in any given year.
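To put a number on the 1% figure, here is a small sketch of how that annual chance accumulates over a facility's lifetime. It assumes each year is independent with the same probability, which is a simplification (and one that a warming climate would undermine); the function name is invented for illustration.

```python
def prob_at_least_one(annual_probability, years):
    """Chance of at least one event in `years`, assuming independent years."""
    return 1.0 - (1.0 - annual_probability) ** years


# A "1-in-100-year" event has an annual probability of 0.01.
for years in (1, 10, 30, 100):
    print(years, round(prob_at_least_one(0.01, years), 3))
```

Under that independence assumption, the chance of seeing at least one such event over 100 years is roughly 63%, not 100%, and over a 30-year lifetime it is already above one in four.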

      • I was bemoaning a product I helped work on. The product manager said to not feel bad, it has a 99% success rate. I responded that we sold millions of them.

        • And? Absolute numbers themselves are not sufficient information. What did the failure of your product do, that is the relevant part. Did it mildly inconvenience people like a McDonalds toy which broke causing a kid to cry? Did it leave someone without internet for a week during an RMA process? Did it kill tens of thousands of your customers?

          99% success rate (or in some cases even far lower) is perfectly acceptable for a low consequence product. If on the other hand we were talking about the brakes of a car

          • A high consequence product, sold to industries not consumers. Failure means a high expense to the customer, possibly other consequences.
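The exchange above comes down to two multiplications: failure rate times volume, then failures times consequence. A minimal sketch follows, where the unit count and the cost per failure are hypothetical stand-ins for the "millions" and "high expense" mentioned in the thread.

```python
def expected_failures(units_sold, success_rate):
    """Expected number of failing units for a given success rate."""
    return units_sold * (1.0 - success_rate)


def expected_failure_cost(units_sold, success_rate, cost_per_failure):
    """Expected total cost of failures; cost_per_failure captures the consequence."""
    return expected_failures(units_sold, success_rate) * cost_per_failure


units = 1_000_000  # hypothetical: "millions" sold, one million used here
failures = expected_failures(units, 0.99)

# Hypothetical consequence levels: a broken toy versus an industrial failure.
low_consequence = expected_failure_cost(units, 0.99, 5)
high_consequence = expected_failure_cost(units, 0.99, 50_000)

print(round(failures), round(low_consequence), round(high_consequence))
```

The same 99% rate gives the same count of failed units in both cases; only the assumed cost per failure separates a tolerable product from an expensive problem.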

      • True, you can indeed conceive anything.

        But I find it hard to believe that - knowing that global warming is occurring - they did not conceive that they would encounter the temperatures that they did just encounter. That those temperatures were soooo far outside the norm.

    • Taking that to absurd conclusions: should they have also built the place with a 3 meter thick reinforced concrete dome over it too, because meteor strikes can't be ruled out and can be conceived of as well, and could take the whole facility offline too. Or, maybe they should have built it with protection from electromagnetic pulses - you never know what those crazy Russians are going to do...

      At some point the probability of an event gets low enough that it's not worth the extra cost to harden against it, e

      • "At some point the probability of an event gets low enough [...]"

        That is the point. Was the probability of encountering the temperature that they have just encountered *really* so low that they could assume they would never have to deal with it, knowing that global warming is occurring ?

        I find that hard to believe.

    • Certainly never in Sacramento, a city founded when they discovered molten gold flowed there in the summer.

    • They probably would have had to utilize something like geocooling to keep the systems running. That's not always feasible.

  • And yet (Score:5, Funny)

    by memory_register ( 6248354 ) on Tuesday September 13, 2022 @06:09AM (#62876913)
    Nothing of value was lost.
  • by gavron ( 1300111 ) on Tuesday September 13, 2022 @06:37AM (#62876941)

    Usually on a weekend, slashdot editors take a wild joyride with word choice. This time it extends past the weekend.

    Redundancy is not replication. Redundancy means some effort at play to mitigate loss. Replication means
    duplication with the goal of maintaining sustained availability in the face of outages. The Sacramento
    datacenter is an example of the latter.

    Finally, the "going dark" thing. This is a Fear Uncertainty Doubt (FUD) thing that is used by everyone
    from LEOs and encryption to ermagod FB having to turn down a datacenter. It's just become absurd.

    We don't care about "going dark" if all that means is less twitter, less FB, and less BeauHD. We care
    when the underlying data, article, UGC, or site is of some value. None of these are.

    Replication is the key to avoid these issues, and it's soundly in place.

    You can go back to sleep now, knowing no matter what happens in California, the swill you crave
    will be online just fine.

    • by Anonymous Coward

      We don't care about "going dark" if all that means is less twitter, less FB, and less BeauHD...

      When you sustain a product that answers to shareholders, outages matter.

      Stop making stupid assumptions.

      • by gavron ( 1300111 )

        > When you sustain a product that answers to shareholders, outages matter.

        Products don't answer to shareholders. Period.

        Product outages don't answer to shareholders. Period.

        Management's abilities to have duplicate datacenters such that RANDOM NONPAYING PEOPLE (note: not shareholders) can access RANDOM DATA uploaded by other RANDOM PEOPLE (UGC and again nothing to do with shareholders) has nothing to do with anything.

        In a public company management's role and goal is to increase shareholder value. In so

    • by EvilSS ( 557649 )
      I didn't realize /. editors also worked weekends for CNN
  • I like twitter only slightly better than facebook, and I hate 'em both. I do have a twitter account for when I get bored and want to read drivel other than 4chan's brand of drivel. If I used twitter's app, the spamvertising would make twitter unusable, so I use a browser with adblocking. It's twitter's fault: they are the ones that turned up the noise in the signal-to-noise ratio. If they kept the spamvertising to a minimum, I would not bother with adblocking.
  • After all, this is Twitter that we are talking about.
  • ...telling us that Twitter is toxic.

    Pray for more extreme events at twitter data centers. We still might be able to save humanity before twitter becomes skynet.

  • But they know how to play a long game. Spending years to fight against anti-climate change initiatives and support ever increasing pollution all to knock offline a datacentre who blocked their favourite president.

  • For anything that requires a high reliability power supply. Why?
  • Twitter's redundant datacentre approach is working: an entire datacentre is down and twitter is still up, but if that's the end of Twitter's redundancy, it's risky. As to why the entire datacentre is down, that's certainly a problem worth solving, but one we don't have enough information about to be able to comment intelligently. I'd guess a flaw in the cooling (or maybe power) being triggered under stress. ACs are electrical-mechanical systems, they fail most often when they're needed most (heatwaves) beca
  • Hate and stupidity halted in their tracks for who knows how long.

  • And there was much rejoicing!

  • Was this datacentre shut off because it couldn't get sufficient power for cooling, or was it shut off because it couldn't get any power at all?

    • It was shut off because the existing cooling system was insufficient to keep the equipment cool enough.

      • Kind of makes you wonder if they could just let the CPUs run in lower p-states? Or would they be too slow, operating thusly, to be worth keeping online?

  • You would think an SF company would realize that Sacto is like the hottest place in the summer, while SF is the coolest. But, SF electric power is provided by expensive capitalist PG&E, while Sacto has socialist electric power, provided by cheap SMUD.
