Are Data Center "Tiers" Still Relevant?

miller60 writes "In their efforts at uptime, are data centers relying too much on infrastructure and not enough on best practices? That question is at the heart of an ongoing industry debate about the merits of the tier system, a four-level classification of data center reliability developed by The Uptime Institute. Critics assert that the historic focus on Uptime tiers prompts companies to default to Tier III or Tier IV designs that emphasize investment in redundant UPSes and generators. Uptime says that many industries continue to require mission-critical data centers with high levels of redundancy, which are needed to perform maintenance without taking a data center offline. Given the recent series of data center outages and the current focus on corporate cost control, the debate reflects the industry focus on how to get the most uptime for the data center dollar."
  • by japhering ( 564929 ) on Tuesday September 22, 2009 @12:30PM (#29505521)

    Data center redundancy is a needed thing. However, most data center designs forget to address the two largest causes of downtime ... people and software. People are people and will always make mistakes, but there are still things that can be done to reduce the impact of human error.

    Software is very rarely designed for use in redundant systems. More likely, the design is for a hot-cold or hot-warm recovery scenario. Very rarely is it designed to run hot across multiple data centers.

    Remember: good disaster avoidance, done right, is always cheaper than disaster recovery.

  • by Sarten-X ( 1102295 ) on Tuesday September 22, 2009 @12:32PM (#29505537) Homepage

    "A stick of RAM costs how much? $50?"

    I don't remember the source of that quote, but it was in relation to a company spending money (far more than $50) to reduce the memory use of their program. Sure, there's a lot of talk in computer science curricula about using efficient algorithms, but from what I've seen and heard, companies almost always respond to performance problems by buying bigger and better hardware. If software weren't grossly inefficient, how would that affect data centers? Less power consumption, cheaper hardware, and more "bang for your buck", so to speak.

    Eventually, this whole debate becomes moot: data centers can get more income from the same hardware and still provide the uptime, redundancy, and features without needing to cut costs. Once those basic needs are out of the way, there's room for expansion into other less-than-critical offerings and, finally, innovation in areas other than uptime.

  • by jeffmeden ( 135043 ) on Tuesday September 22, 2009 @12:36PM (#29505593) Homepage Journal

    Given the recent series of data center outages and the current focus on corporate cost control, the debate reflects the industry focus on how to get the most uptime for the data center dollar.

    Repeat after me: There is no replacement for redundancy. There is no replacement for redundancy. Every outage you read about involves a failure in a feature of the datacenter that was not redundant and was assumed to not need to be redundant... assumed *incorrectly*. Redundancy is irreplaceable. If you rely on your servers (the servers housed in one place) you had better have redundancy for EVERY. SINGLE. OTHER. ASPECT. If not, you can expect downtime, and you can expect it to happen at the worst possible moment.

  • by Maximum Prophet ( 716608 ) on Tuesday September 22, 2009 @12:51PM (#29505825)
    That works if you have one program that you have to run every so often to produce a report. If your datacenter is more like Google, where you have 100,000+ servers, a 10% increase in efficiency could eliminate 10,000 servers. Figure $1,000 per server and it would make sense to offer a $1,000,000 prize to a programmer who can increase the efficiency of the Linux kernel by > 10% (rough numbers are sketched below).

    BTW, adding one stick of RAM might increase the efficiency of a machine, but in the case above the machines are probably maxed out w.r.t. RAM. Adding more might not be an option without an expensive retrofit.
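
    A quick back-of-the-envelope sketch of that arithmetic in Python; the 100,000-server fleet, 10% efficiency gain, and $1,000-per-server figure are the commenter's hypotheticals, not real data:

```python
# Rough economics of a fleet-wide efficiency gain, using the
# hypothetical figures from the comment above.
servers = 100_000
efficiency_gain = 0.10
cost_per_server = 1_000

servers_eliminated = int(servers * efficiency_gain)       # 10,000 servers
hardware_savings = servers_eliminated * cost_per_server   # $10,000,000

prize = 1_000_000
print(f"Servers eliminated: {servers_eliminated:,}")
print(f"Hardware savings:   ${hardware_savings:,}")
print(f"Net after a ${prize:,} prize: ${hardware_savings - prize:,}")
```

    Even before counting power and cooling, the hypothetical prize pays for itself roughly ten times over.
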
  • uptime matters (Score:3, Insightful)

    by Spazmania ( 174582 ) on Tuesday September 22, 2009 @12:54PM (#29505861) Homepage

    Designing nontrivial systems without single points of failure is difficult and expensive. Worse, it has to be built in from the ground up. Which it rarely is: by the time a system is valuable enough to merit the cost of a failover system, the design choices which limit certain components to single devices have long since been made.

    Which means uptime matters. 1% downtime is more than 3 days a year. Unacceptable.

    The TIA-942 data center tiers are a formulaic way of achieving satisfactory uptime. They've been carefully studied: statistically, tier-3 data centers achieve three 9's of uptime (99.9%) while tier-4 data centers achieve four 9's. Tiers 1 and 2 only achieve two 9's. (The downtime each of those figures allows is sketched below.)

    Are there other ways of achieving the same or better uptime? Of course. But they haven't been as carefully studied, which means you can't assign as high a confidence to your uptime estimate.

    Is it possible to build a tier-4 data center that doesn't achieve four 9's? Of course. All you have to do is put your eggs in one basket (like buying all the same brand of UPS) and then have yourself a cascade failure. But with a competent system architect, a tier-4 data center will tend to achieve at least 99.99% annual uptime.
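
    For reference, a small sketch of what each of those availability figures allows in downtime per year; the tier-to-nines mapping is the one quoted in the comment above:

```python
# Annual downtime implied by each availability level ("nines").
# The arithmetic is exact; the tier-to-availability mapping is the
# commenter's, per the TIA-942 discussion above.
HOURS_PER_YEAR = 365 * 24  # 8,760

for label, availability in [
    ("Tier 1/2 (~99%)", 0.99),
    ("Tier 3 (99.9%)", 0.999),
    ("Tier 4 (99.99%)", 0.9999),
]:
    downtime_hours = HOURS_PER_YEAR * (1 - availability)
    print(f"{label}: about {downtime_hours:.1f} hours of downtime per year")

# Approximate output:
#   Tier 1/2 (~99%): about 87.6 hours of downtime per year (~3.7 days)
#   Tier 3 (99.9%): about 8.8 hours of downtime per year
#   Tier 4 (99.99%): about 0.9 hours of downtime per year
```

    The ~3.7 days for two nines is the "1% downtime is more than 3 days a year" figure cited above.
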

  • by Maximum Prophet ( 716608 ) on Tuesday September 22, 2009 @12:54PM (#29505873)
    Code scales, hardware doesn't. If you have one machine, yes, it's cheaper to get a bigger, better machine, or to wait for one to be released.

    If you have 20,000 machines, even a 10% increase in efficiency is important.
  • by japhering ( 564929 ) on Tuesday September 22, 2009 @01:03PM (#29505965)

    And if you had two identical data centers, where each in and of itself was redundant, with software designed to function seamlessly across the two in a hot-hot configuration... there would have been NO downtime. The university would have been up the entire time with little to no data loss. (A rough availability calculation follows below.)

    So say I'm Amazon and my data center burns down... 48 hours with ZERO sales under a disaster recovery scenario, vs. normal operations for the whole time it takes to rebuild/move the burned data center...

    I think I'll take disaster avoidance and keep selling things :-)
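
    A rough sketch of why hot-hot across two sites looks so attractive on paper. It assumes the two facilities fail independently, which (as the reply below about maintenance windows points out) is rarely entirely true:

```python
# Combined availability of two hot-hot sites, assuming failures are
# statistically independent -- a big assumption in practice.
site_availability = 0.999                    # a single three-nines facility
both_down = (1 - site_availability) ** 2     # probability both are down at once
combined = 1 - both_down

hours = 365 * 24
print(f"Single site: {site_availability:.4%} -> "
      f"{(1 - site_availability) * hours:.1f} h/yr down")
print(f"Two sites:   {combined:.6%} -> "
      f"{both_down * hours * 60:.1f} min/yr down")
```

    On paper, two independent three-nines sites yield well under a minute of overlapping downtime per year; correlated failures (shared software bugs, simultaneous maintenance) are what eat into that number.
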

  • by Ephemeriis ( 315124 ) on Tuesday September 22, 2009 @01:10PM (#29506063)

    I've been involved in this field for about 15 years. The funniest misconception I've run into, time and time again, is that an unmaintained UPS, unmaintained battery bank, unmaintained transfer switch, and unmaintained generator will somehow act as magical charms so as to be more reliable than the commercial power they are supposedly backing up.

    A lot of folks don't really contemplate what a loss of power means to their business.

    Some IT journal or salesperson or someone tells them that they need backup power for their servers, so they throw in a pile of batteries or generators or whatever... And when the power goes out they're left in dark cubicles with dead workstations. Or their manufacturing equipment doesn't run, so it doesn't really matter if the computers are up. Or all their internal network equipment is happy, but there's no electricity between them and the ISP - so their Internet is down anyway.

    I'll stand behind a few batteries for servers... Enough to keep them running until they can shut down properly... But actually staying up and running while the power is out? From what I've seen that's basically impossible.
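
    A minimal sketch of that "ride it out just long enough to shut down cleanly" approach. The battery query is a placeholder for whatever your UPS tooling actually exposes (apcupsd, NUT, a vendor SNMP agent, etc.), and the 40% threshold is an assumed value:

```python
# Sketch: poll the UPS and trigger a clean shutdown before the
# battery runs out. The UPS query is deliberately left as a stub.
import subprocess
import time

SHUTDOWN_THRESHOLD = 40  # percent battery remaining; assumed value
POLL_SECONDS = 30


def get_battery_percent() -> int:
    """Placeholder: query the UPS for remaining battery charge."""
    raise NotImplementedError("wire this up to your UPS monitoring tool")


def main() -> None:
    while True:
        if get_battery_percent() < SHUTDOWN_THRESHOLD:
            # Shut the box down cleanly while there is still battery left.
            subprocess.run(
                ["shutdown", "-h", "+1", "UPS battery low"], check=True
            )
            return
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    main()
```

    Note that this only protects the servers; as the comment says, it does nothing for the workstations, the manufacturing floor, or the ISP link.
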

  • by aaarrrgggh ( 9205 ) on Tuesday September 22, 2009 @01:23PM (#29506239)

    Unless you were doing maintenance in the second facility when a problem hit the first. That is what real risk management is about; when you assume hot-hot will cover everything, you have to make sure that is really the case. Far too often there are a few things that will either cause data loss or significant recovery time even in a hot-hot system when there is a failure.

    Even with hot-hot systems, all facilities should be reasonably redundant and reasonably maintainable. Fully redundant and fully maintainable can be a pipe-dream.

  • by Timothy Brownawell ( 627747 ) <tbrownaw@prjek.net> on Tuesday September 22, 2009 @02:05PM (#29506773) Homepage Journal

    Every outage you read about involves a failure in a feature of the datacenter that was not redundant and was assumed to not need to be redundant... assumed *incorrectly*.

    No, I've also heard about cases where both redundant systems failed at the same time (due to poor maintenance) and where the fire department wouldn't allow the generators to be started. Everything within the datacenter can be redundant, but the datacenter itself is still a single physical location.

    Redundancy is irreplaceable.

    Distributed fault-tolerant systems are "better", but they're also harder to build. Likewise redundancy is more expensive than lack of redundancy, and if you have to choose between $300k/year for a redundant location with redundant people vs. a million-dollar outage every few years, well, the redundancy might not make sense.
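
    Putting that trade-off into numbers: all figures below are the commenter's hypotheticals, with "every few years" assumed here to mean every three years:

```python
# Expected annual cost of skipping redundancy vs. paying for it,
# using the hypothetical figures from the comment above.
redundant_site_cost = 300_000    # $/year for the redundant location and staff
outage_cost = 1_000_000          # $ per major outage
outage_interval_years = 3        # "every few years", assumed as 3 here

expected_outage_cost_per_year = outage_cost / outage_interval_years  # ~$333k

print(f"Redundancy:       ${redundant_site_cost:,.0f}/year")
print(f"Expected outages: ${expected_outage_cost_per_year:,.0f}/year")
```

    With these numbers the two options are roughly a wash; stretch the outage interval to four or five years and skipping the redundancy comes out ahead, which is the commenter's point.
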
