Power IT

Multiple Sites Down In SF Power Outage

Posted by kdawson
from the or-was-it-a-drunk dept.
corewtfux writes with word of a major outage apparently centered on 365 Main, a datacenter on the edge of San Francisco's Financial District. Valleywag initially claimed that a drunken person had gotten in and damaged 40 racks, but an update from Technorati's Dave Sifry says the problem is a widespread power outage. Sites affected include Technorati, Netflix (these display nice "We're Dead" pages), Typepad, LiveJournal, Sun.com, and Craigslist (these just time out).

  • by slug_bait (118345) * on Tuesday July 24, 2007 @06:18PM (#19976327)
    I can verify that it affected much of the Financial District here in SF. We had the power go out 3 times. Seems to be back now. Haven't heard any explanation yet.
  • Oblig.... (Score:5, Funny)

    by Anonymous Coward on Tuesday July 24, 2007 @06:18PM (#19976329)
    im in ur datacentr
    trashin ur racks
  • Redundant? (Score:5, Insightful)

    by DogDude (805747) on Tuesday July 24, 2007 @06:19PM (#19976335) Homepage
    Don't these large sites have failover-capable, redundant servers in multiple physical locations? Why should a failure in one rack, one room, or heck, even one state for the giant sites, affect them?
    • Because those who bought colo services were in fact ripped off, and should now be proceeding to San Francisco to seek vengeance upon those who can do little more than process credit card payments.
      • Re: (Score:3, Funny)

        by RobertB-DC (622190) *
        Because those who bought colo services were in fact ripped off, and should now be proceeding to San Francisco to seek vengeance upon those who can do little more than process credit card payments.

        Perhaps they could begin their vengeful wrath by hiring a few (more?) winos...
    • You could then get all the geeks to crank the handles and keep the web running!
    • Re:Redundant? (Score:5, Informative)

      by Anonymous Coward on Tuesday July 24, 2007 @07:17PM (#19977053)
      They do, but one of the dirty little secrets of most data centers is that they don't have enough generator capacity for all the cooling. They'll woo you with the generator, the 2,000 gallons of diesel, and N+1 array of UPSes, but when utility power dies, it gets hot very quickly. And some racks must go down.
    • Re:Redundant? (Score:5, Interesting)

      by ryanisflyboy (202507) on Tuesday July 24, 2007 @07:32PM (#19977247) Homepage Journal
      Some of these sites are a lot more centralized than you might realize. If they failed to build their systems with a secondary site in mind, it can be near impossible to get the "CTO" types to pony up the dollars for it later. The biggest issue I have seen that affects this is storage. Either they aren't using suitable SAN technologies, or they didn't put enough money behind the storage initiative to set up secondary site replication. I agree with you though. This is a problem that has been solved. Perhaps Netflix thought - wth - if we go out for a few hours and people can't choose their movies, that's just tough luck.

      Sun.com going down is a good example of someone totally screwing up. They have absolutely NO excuse. The others - maybe they can get away with it and we won't care. If Sun can't keep their own site up, how can I expect them to keep mine up?
  • Other sites.. (Score:3, Informative)

    by king-manic (409855) on Tuesday July 24, 2007 @06:19PM (#19976337)
    Gamefaqs/Gamespot is also down. I wonder if it's related.
    • Re: (Score:3, Informative)

      by nuzak (959558)
      Gamefaqs/Gamespot is C|Net, located on Rincon Hill in downtown SF, and their servers are probably in 365main. So yeah.

      Anyway, PG&E says it's over now, but they still don't have an explanation as to why. Shyeah (rolls eyes)
    • by Virak (897071)
      I don't think it is. While the main site has been having some problems for a little while now, I haven't had any trouble reaching db.gamefaqs.com (the domain for the actual FAQs and such), which seems to be on the same server.
  • by msimm (580077) on Tuesday July 24, 2007 @06:19PM (#19976349) Homepage
    Does this mean backup generators have failed or is the fault somewhere outside the datacenter? Time to start shopping.
    • by Gazzonyx (982402)
      It means they (the sites) didn't bother to set up failovers to another, geographically separate site. Now we know who keeps all of their eggs in one basket. :)
  • by riceboy50 (631755) on Tuesday July 24, 2007 @06:23PM (#19976393)
    It's interesting that so many major sites would go down in a local power outage. Are they all sharing one data center in SF? If so, why don't they have co-locations in other cities?
  • um, like i said.
  • (That's the fantasy sports site that works like a stock market, if you didn't know.)
  • by nsanders (208050)
    I can hear it now, the sound of a million emos all finally committing suicide.
    • by eln (21727) * on Tuesday July 24, 2007 @06:36PM (#19976573) Homepage
      Impossible, they would never commit suicide without posting a note in the form of bad angst-filled poetry to their blog first. There is no chance any of them will actually kill themselves until the site is back online.
    • Re: (Score:3, Funny)

      by dextromulous (627459)

      I can hear it now, the sound of a million emos all finally committing suicide.
      Nah, they wouldn't commit suicide if they couldn't blog about it afterwards...
  • by fromtheblueline (717915) on Tuesday July 24, 2007 @06:28PM (#19976447)
    At least 20,000 without power in downtown S.F.
    Marisa Lagos and Demian Bulwa, Chronicle Staff Writers
    Tuesday, July 24, 2007

    (07-24) 15:12 PDT SAN FRANCISCO -- At least 20,000 customers of Pacific Gas and Electric Co. in downtown San Francisco lost power this afternoon, the utility said.

    Brian Swanson, a spokesman for the utility, said outages have been reported throughout downtown and along the Embarcadero, including at PG&E's office on Beale Street near the Ferry Building. It was unclear initially how many customers who lost power remained without it for a sustained period.

    Power outages were also reported in the South of Market neighborhood, the Outer Mission and down the 3rd Street corridor south of Mission Bay. PG&E officials said they did not know why power had gone out, but most customers appeared to be back online by 3 p.m.

    The outage has prompted Muni to run shuttles in the place of cable cars, a spokeswoman said. The T-Third Metro line was unable to cross the 4th Street Bridge for a short time, but power was restored to the drawbridge by 3 p.m. Muni bus lines 14, 49, 30, 41 and 45 were without power for about 30 minutes following the outage, but are now working, spokeswoman Maggie Lynch said. Parking Control officers were deployed to the Outer Mission, 3rd Street and Monterey Avenue for traffic control, she added.

    Power first went offline around 1:50 p.m. and came back at least three times in the downtown area before shutting off again. The same problems were reported in South of Market all the way to AT&T Park and the Caltrain station at Fourth and King streets, and traffic lights were out as far south as Monterey Boulevard.

    At the Westfield Center at Market and Fifth streets, only one of six Nordstrom elevators was working while the shopping mall ran on a backup generator. Shoppers milled around as the lights flickered on and off.

    BART is still running trains but the lights at its downtown stations have flickered on and off several times, said spokesman Linton Johnson. The transit agency also has concerns about the ventilation system, which is on the same grid as the lights, he said, but will keep its downtown stations open so long as the lights and ventilation continue to work.

    Workers at several downtown and South of Market offices were reportedly sent home for the day following the outage. Additionally, the datacenter 365 Main -- which hosts Web sites including Craigslist and Yelp -- lost power.
  • by Darth_brooks (180756) <clipper377@gm3.1415926ail.com minus pi> on Tuesday July 24, 2007 @06:36PM (#19976571) Homepage
    We are working with our co-location facility managers to assess why it is back-up power generators failed to provide the necessary back-up power to prevent our site going down. We apologize for any inconvenience caused by our site being unavailable this afternoon.

    I think that's admin speak for:

    I warned these idiots eight months ago during my review that the datacenter had outgrown its generator capacity. But did they listen? Fuck no, they just kept counting money and worrying about the bottom line. The beancounters looked at me like I'd asked them for a blowjob from their grandmothers when I submitted the workup for additional generator capacity. And now that the shit's hit the fan, whose ass are they screaming for? Screw this, I'm applying at Taco Bell.
  • According to this article [myway.com] it appears the Netflix outage is unrelated to the power outage in downtown San Francisco.

    Netflix's Web site - the hub of its rental system - went down Monday evening and remained inaccessible as of Tuesday afternoon (EDT). Spokesman Steve Swasey attributed the outage to an unanticipated problem that he declined to describe. Engineers hoped to fix the trouble by 2 p.m. EDT.

  • Someone came in shitfaced drunk, got angry, went berserk, and fucked up a lot of stuff. There's an outage on 40 or so racks at minimum.

    Libel lawsuit in 3...2...

  • LOLcurrent (Score:3, Funny)

    by carou (88501) on Tuesday July 24, 2007 @06:51PM (#19976745) Homepage Journal
    I is not in ur datacenter, 2 power ur servers.
  • Just called a friend at One Market, the big office tower downtown at the end of Market Street, and she says the power has been going on and off there for hours. Building alarms were sounding, but nothing serious was happening other than power loss.

  • by Honig the Apothecary (515163) on Tuesday July 24, 2007 @06:53PM (#19976773)
    Press Release on Red Envelope having 2 years of uptime at 365 Main - San Francisco from today: http://365main.com/press_releases/pr_7_24_07_red_envelope.html [365main.com]
    • by Animats (122034) on Tuesday July 24, 2007 @07:16PM (#19977049) Homepage

      Data sheet for 365 Main [365main.com]:

      The company's San Francisco facility includes two complete back-up systems for electrical power to protect against a power loss. In the unlikely event of a cut to a primary power feed, the state-of-the-art electrical system instantly switches to live back-up generators, avoiding costly downtime for tenants and keeping the data center continuously running.

      They use a Hitec Continuous Power System [pageprocessor.nl], which is a motor, generator, flywheel, clutch, and Diesel engine all on the same shaft. They don't use batteries.

      With this type of equipment, if for some reason you lose power and the generator doesn't start before the flywheel runs down, you're dead. There's no way to start the thing without external power. Unless you buy the optional Black Start feature [pageprocessor.nl], which has an extra battery pack for starting the Diesel. "Usually the black start facility will not be often needed but it won't hurt to consider installing one. Just imagine if you were unable to start up your UPS system because the mains supply is not available." Did 365 Main buy that option?

  • ... it was down a few months back, and as every blog owner and their dog include a little technorati script or graphic on their sites, they were loading very slowly, if at all.

    So I edited my hosts file so technorati points at my localhost.

    Can't say that's degraded my blog-reading experience in the least.

  • As someone who lives and works in San Francisco, I can attest that "a crazy homeless dude did it" is a fairly sensible first guess for most problems.
  • by duplicate-nickname (87112) on Tuesday July 24, 2007 @07:12PM (#19977009) Homepage
    This has got to be some type of joke: RedEnvelope Reports Two Years of Continuous Uptime at 365 Main's San Francisco Data Center [365main.com].

    It was released today....
  • by linuxwrangler (582055) on Tuesday July 24, 2007 @07:13PM (#19977021)
    It's been a long time since I went on a tour of several data centers to locate a new facility for our dot-com. I believe that 365 Main was a facility that does not use a battery UPS. Instead, they have an engine-backed flywheel UPS system (see http://www.enterprisenetworksandservers.com/monthly/art.php?2813 [enterprise...ervers.com] for a description). At the time, they had 10 2-megawatt generators on the roof in an N+2 configuration. The engines are kept heated and are spec'd to go from stop to engage-clutch/deliver-power in 3 seconds. The flywheel can deliver 11 seconds of power so they can fail through a couple of bad engines before running out of flywheel power. They periodically do a 20-hour load test into a pair of 500,000 watt heat-sinks. Time will tell if this outage was a failure of design, failure of maintenance, or outright malfeasance. But it wasn't supposed to happen. They've got some 'splainin' to do.
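
    As a rough sanity check on those numbers, here is a minimal Python sketch. It assumes start attempts happen one after another, that each attempt takes the full 3 seconds quoted above, and that N+2 means two of the ten generators are spares. The function names are made up for illustration, and none of this is 365 Main's actual control logic.

```python
import math


def start_attempts(ride_through_s, start_time_s):
    """Back-to-back engine start attempts that can finish before the flywheel runs down."""
    return math.floor(ride_through_s / start_time_s)


def tolerated_failed_starts(ride_through_s, start_time_s):
    """Failed starts that can be absorbed while still leaving time for one good start."""
    return max(start_attempts(ride_through_s, start_time_s) - 1, 0)


def design_load_mw(generators, spares, mw_each):
    """Load the plant can carry with the spare units held in reserve."""
    return (generators - spares) * mw_each


if __name__ == "__main__":
    # Figures quoted in the comment: 11 s of flywheel power, 3 s engine start,
    # ten 2 MW generators in an N+2 arrangement.
    print(start_attempts(11, 3))           # attempts that fit in the window
    print(tolerated_failed_starts(11, 3))  # failed starts that can be absorbed
    print(design_load_mw(10, 2, 2.0))      # MW available with two spares
```

    With the figures above this gives three attempts inside the 11-second window, so two failed starts can be absorbed, and a design load of 16 MW.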

    As to diesel storage, use of diesel is widespread for emergency use everywhere from hospitals to emergency services. Those systems are run regularly - typically weekly. The use of biocides, stabilizers, mobile fuel-scrubbing services, and extra filtration systems can maintain the fuel quality. Our colo currently maintains a 1-week fuel-supply and has multiple quick-refuel contracts in place. I can't imagine any colo having less than 24-48 hours in-the-tank with quick-refill on-call.

    But one thing that is missing is cooling. Our colo has a typical contract that says something like blah-blah won't exceed 80F for more than 4 hours blah blah. OK, but a rack full of blade servers can crank out 15-20kW of heat load and a data center can heat up real quick without AC. By contract, 150F for 3.5 hours would be in-spec.
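
    To see why that clause is weaker than it sounds, here is a small Python sketch of the check as paraphrased above. The segment format and the function name are assumptions for illustration, and real contract language will differ. The point is only that the rule limits how long the room stays above 80F, not how hot it gets.

```python
def in_spec(segments, limit_f=80.0, max_hours=4.0):
    """Return True if no continuous stretch above limit_f lasts longer than max_hours.

    segments is a list of (duration_hours, temp_f) pairs in time order,
    a hypothetical format chosen for this sketch.
    """
    longest = 0.0
    current = 0.0
    for duration_h, temp_f in segments:
        if temp_f > limit_f:
            # Still above the limit: extend the current hot stretch.
            current += duration_h
            longest = max(longest, current)
        else:
            # Dropped back to the limit or below: the stretch resets.
            current = 0.0
    return longest <= max_hours


if __name__ == "__main__":
    print(in_spec([(3.5, 150.0)]))  # 150F for 3.5 hours
    print(in_spec([(4.5, 81.0)]))   # 81F for 4.5 hours
```

    Under this reading, 3.5 hours at 150F passes while 4.5 hours at 81F fails, which is the loophole described above.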
  • by akita (16773)
    Pinging openbsd.org [199.185.137.3] with 32 bytes of data:
    Reply from 199.185.137.3: bytes=32 time=239ms TTL=236

    Pinging freebsd.org [69.147.83.40] with 32 bytes of data:
    Reply from 69.147.83.40: bytes=32 time=191ms TTL=47

    Pinging netbsd.org [204.152.190.12] with 32 bytes of data:
    Reply from 204.152.190.12: bytes=32 time=213ms TTL=241

    Lost irony.
  • Wanted to look at 365 main in google maps' street view but the button isn't available.

    Doesn't seem to be showing airborne/satellite images either.
  • I just tried to look at my blog on livejournal, and got a 403 error, not 404. Intermittent errors are quite common on lj, so I thought I'd try again later.

    So then I checked my Netflix queue, and couldn't get to it (got a 404 error there, though, not a nice "we're dead" message). Two sites in a row indicate the problem might be local.

    Good thing slashdot was my next stop, not one of the many others. I had no idea all those sites were run out of the same location in SF.

    San Francisco has always seemed to m
  • How coincidental that I was actually trying to reach a Sun page before and couldn't get to it. I don't even remember what it was anymore, I really need to make my Firefox closed tabs list longer than 5.
  • Not that uncommon (Score:3, Interesting)

    by Phil Wherry (122138) on Tuesday July 24, 2007 @07:50PM (#19977463) Homepage
    I really feel for all the folks who have to deal with this outage; it's no fun at all!

    A client of mine had a number of servers in a Sterling, Virginia data center managed by Verio/NTT. It's a good data center and seems to be well-run.

    Last September, the data center experienced two complete power failures in the span of three days. To their immense credit, data center management was straight with customers about what had happened. For those who might be interested, their statements about the problem appear here. [dedicatedserver.com]

    My point? Make sure you know how to bring your systems back up from a completely cold start, and that you find a way to test this periodically. While we work to ensure that this sort of situation occurs rarely, the fact remains that these sorts of failures DO occur, and they're not as uncommon as the sales and marketing folks would like you to believe.

    Phil
  • by Animats (122034) on Tuesday July 24, 2007 @10:20PM (#19978695) Homepage

    The press release "RedEnvelope Reports Two Years of Continuous Uptime at 365 Main's San Francisco Data Center", which was on the 365 Main web site earlier today, has disappeared from there. [365main.com]

    But they sent the press release to PR Newswire, [prnewswire.com] and you can still read it there.

  • by Meridian Umbrios (1132753) on Wednesday July 25, 2007 @04:46AM (#19980655)
    Here is the e-mail that 365 is sending out to their customers. The best is their tagline, "The World's Finest Data Centers".

    365 Main Customer,

    At 1:49 p.m. on Tuesday, July 24, 365 Main's San Francisco data center was affected by a power surge caused when a PG&E transformer failed in a manhole under 560 Mission St.

    An initial investigation has revealed that certain 365 Main back-up generators did not start when the initial power surge hit the building. On-site facility engineers responded and manually started affected generators allowing stable power to be restored at approximately 2:34 p.m. across the entire facility.

    As a result of the incident, continuous power was interrupted for up to 45 mins for certain customers. We're certain colo rooms 1, 3 and 4 were directly affected, though other colocation rooms are still being investigated. We are currently working with Hitec, Valley Power Systems, Cupertino Electric and PG&E to further investigate the incident and determine the root cause.

    All generators will continue to operate on diesel until the root cause of the event has been identified and corrected. Generators are currently fueled with over 4 days of fuel and additional fuel has already been ordered.

    We understand the seriousness of this issue and will provide full details once they come available. We sincerely apologize for the impact this has had on your operations.

    Regards,
    Vice President, Security
    365 Main
    "The World's Finest Data Centers"

    Just send me a big fat check and all is forgiven.
    • Re: (Score:3, Funny)

      by AK Marc (707885)
      On-site facility engineers responded and manually started affected generators allowing stable power to be restored at approximately 2:34 p.m. across the entire facility.

      Wow, on-site engineers took 45 minutes just to be able to turn on generators? The generator for our facility has a master switch and a big green button. I think a monkey could get it running in 20 seconds by slinging poo at it. So, what other problems did they have that they aren't telling us? Someone else mentioned a flywheel system.
