Are Data Center "Tiers" Still Relevant?

miller60 writes "In their efforts at uptime, are data centers relying too much on infrastructure and not enough on best practices? That question is at the heart of an ongoing industry debate about the merits of the tier system, a four-level classification of data center reliability developed by The Uptime Institute. Critics assert that the historic focus on Uptime tiers prompts companies to default to Tier III or Tier IV designs that emphasize investment in redundant UPSes and generators. Uptime says that many industries continue to require mission-critical data centers with high levels of redundancy, which are needed to perform maintenance without taking a data center offline. Given the recent series of data center outages and the current focus on corporate cost control, the debate reflects the industry focus on how to get the most uptime for the data center dollar."


  • It depends (Score:5, Interesting)

    by afidel ( 530433 ) on Tuesday September 22, 2009 @12:26PM (#29505457)
    If you are large enough to survive one or more site outages, then sure, you can go for a cheaper $/sq ft design without redundant power and cooling. If, on the other hand, you are like most small to medium shops, you probably can't afford the downtime, because you haven't reached the scale where you can geographically diversify your operations. In that case downtime is probably still much more costly than even the most expensive hosting facility. When we looked for a site to host our DR site, we only considered Tier IV datacenters, because the assumption was that if our primary facility is gone, we will be timesharing the significantly reduced-performance gear we keep at DR, so further downtime wouldn't be acceptable. By going that route we saved ~$500k on equipment that would have made DR equivalent to production, at a cost of a few thousand a month for a top-tier datacenter; those numbers are easy to work.
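    A rough back-of-the-envelope version of that trade-off, treating the figures as assumptions: the ~$500k is the number given above, but the comment only says "a few thousand a month", so the $5,000/month fee below is illustrative.

        # Back-of-the-envelope comparison, using assumed figures:
        # ~$500k of extra DR equipment avoided by renting space in a Tier IV facility,
        # versus a hypothetical $5,000/month hosting fee.
        equipment_avoided = 500_000   # one-time capital cost not spent
        monthly_hosting = 5_000       # assumed recurring cost of the top-tier DR site

        breakeven_months = equipment_avoided / monthly_hosting
        print(f"Hosting stays cheaper than buying for {breakeven_months:.0f} months "
              f"({breakeven_months / 12:.1f} years)")   # ~100 months, ~8.3 years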
  • by CherniyVolk ( 513591 ) on Tuesday September 22, 2009 @12:28PM (#29505475)

    Infrastructure is more important than "best practices". Infrastructure is a physical, concrete thing; practices really aren't that important once the critical, physical disasters begin. As an example, good hardware will continue to run for years. Most of the downtime on good hardware will be due to misconfiguration, human error, that sort of thing: a sysadmin banks on a wrong assumption, messes up a script, or hits the wrong command, but the hardware is still physically able and the infrastructure has not been jeopardized. Reverse the situation, with top-notch paper plans and procedures but crappy hardware, and well... the realities of physical failures are harder to argue with than our personal, nebulous, intangible philosophies of "good/better/best" management procedures and practices.

    So to me the question "In their efforts at uptime, are data centers relying too much on infrastructure and not enough on best practices?" is best translated as "To belittle the concept of uptime and its association with reliability, are data centers relying too much on the raw realities of the universe and the physical laws that govern it, and not enough on some random guy's philosophies about problems that only manifest in our imaginations?"

    Or, as a medical analogy... "In their efforts in curing cancer, are doctors relying too much on science and not enough on voodoo/religion?"

  • RAID (Score:5, Interesting)

    by QuantumRiff ( 120817 ) on Tuesday September 22, 2009 @12:43PM (#29505703)
    Why go with a huge, multiple-nines datacenter when you can go the way of Google and have a RAID:
    Redundant Array of Inexpensive Datacenters...

    Is it really better to have 1000 machines in a five-nines location, or 500 systems in each of two four-nines locations, with extra cash in hand?
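    A quick sketch of the availability arithmetic behind that question, under a simplified model that assumes the two cheaper sites fail independently and that either one alone can carry the load (the nines figures are the ones in the comment):

        # Availability math for the "RAID of datacenters" argument above.
        # Assumes two four-nines sites fail independently and either can serve all traffic.
        five_nines = 0.99999    # single premium site
        four_nines = 0.9999     # each of two cheaper sites

        combined = 1 - (1 - four_nines) ** 2   # probability at least one cheap site is up
        print(f"One 5-nines site : {five_nines:.8f}  (~5.3 min/yr down)")
        print(f"Two 4-nines sites: {combined:.8f}  (~0.3 sec/yr both down)")

    Of course that model ignores correlated failures and the cost of making the application actually run live in both places at once, which several of the comments below get into.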
  • by R2.0 ( 532027 ) on Tuesday September 22, 2009 @02:23PM (#29507027)

    It's not just in IT. I work for an organization that uses a LOT of refrigeration in the form of walk-in refrigerators and freezers. Each one can hold product worth up to $1M, and all of it can be lost in a temperature excursion. So we started designing in redundancy: two separate refrigeration systems per box, a backup controller, redundant power feeds from different transfer switches over diverse routing (Browns Ferry lessons learned). Oh, and each facility had twice as many boxes as needed for the inventory.

    After installation, we began getting calls and complaints about how our "wonder boxes" were pieces of crap, that they were failing left and right, etc. We freaked out and did some analysis. It turned out that, in almost every instance, a trivial component had failed in one compressor, the system had failed over to the other, run for weeks or months, and then that one failed too. When we asked why they never fixed the first failure, they said "What failure?" When we asked about the alarm the controller raised on the mechanical failure, we were told it had gone off repeatedly but was ignored because the temperature readings were still good, and that's all Operations cared about. In some instances the wires to the buzzer were cut, and in one instance a "massive controller failure" was really a crash caused by the alarm log filling system memory.

    Yes, we did some design changes, but we also added another base principle to our design criteria: "You can't engineer away stupid."
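    The failure mode described here, redundancy silently absorbing the first fault until the second one takes the box down, is the classic argument for alarming on loss of redundancy rather than only on the symptom operators watch. A hypothetical sketch of such a check (the box fields, thresholds, and names are made up for illustration):

        # Hypothetical health check: escalate when redundancy is lost, even if the
        # temperature (the only thing Operations watched) is still in range.
        from dataclasses import dataclass

        @dataclass
        class WalkInBox:
            name: str
            temp_c: float
            compressors_ok: int        # how many of the refrigeration systems are healthy
            compressors_total: int = 2

        def alerts(box: WalkInBox, temp_limit_c: float = 5.0):
            if box.temp_c > temp_limit_c:
                yield f"{box.name}: CRITICAL temperature {box.temp_c:.1f} C"
            if box.compressors_ok < box.compressors_total:
                # Still cold, but running without a spare -- page someone anyway.
                yield (f"{box.name}: WARNING redundancy lost "
                       f"({box.compressors_ok}/{box.compressors_total} systems healthy)")

        for msg in alerts(WalkInBox("Freezer 12", temp_c=-18.0, compressors_ok=1)):
            print(msg)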

  • by Anonymous Coward on Tuesday September 22, 2009 @04:10PM (#29508297)

    I work for a very very large European bank. And yes - we're highly risk averse.

    Here's the interesting thing - we built a bunch of Tier 3 and Tier 4 datacenters because the infrastructure guys thought that it was what the organization needed.

    But they didn't talk to the consumers of their services - the application development folks.

    So what do we have -

    Redundant datacenters with redundant power supplies with redundant networks with redundant storage networks with redundant WAN connections with redundant database servers running in them.

    The app guys then said "to hell with this": whenever we try to fail over, we can't get it to work anyway, because there's always something small the infra guys missed. So they built HA and auto-failover into their applications, or better still, made them live-live (see the sketch after this comment).

    Hence all of the redundant infrastructure is ... redundant. And a complete waste of money.

    In the end the app devs want cheap and full control of their infrastructure. Which is why they all want to go buy the cheapest hosting they can get.

    The good thing is that it's now apparent that this is the case. And so, soon, there will probably be a lot of redundant infrastructure people - and that will be a good thing.

    The Uptime Institute, in my mind, has the major piece of accountability for peddling this rubbish.

    It's a scam.
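    A minimal sketch of the "live-live" approach the application teams took, assuming nothing beyond what the comment describes: the application talks to the same service deployed in more than one datacenter and routes around whichever one is unhealthy. The endpoint names and timeout below are hypothetical.

        # Hypothetical live-live client: the application, not the infrastructure,
        # decides which datacenter to use and fails over on its own.
        import urllib.request

        ENDPOINTS = [
            "https://dc-a.example.internal/api/quote",   # assumed datacenter A deployment
            "https://dc-b.example.internal/api/quote",   # assumed datacenter B deployment
        ]

        def fetch_quote(timeout_s: float = 2.0) -> bytes:
            last_error = None
            for url in ENDPOINTS:                        # try each site in turn
                try:
                    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                        return resp.read()
                except OSError as err:                   # timeout, refused, DNS failure...
                    last_error = err                     # remember it and try the next site
            raise RuntimeError(f"all datacenters unreachable: {last_error}")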

  • by japhering ( 564929 ) on Tuesday September 22, 2009 @04:19PM (#29508365)

    Precisely. I spent the last 12 years (prior to being laid off) working on a hot-hot-hot solution. Each center was fully redundant and ran at no more than 50% utilization. Each data center got one week of planned maintenance every quarter for hardware and software updates, during which that data center was completely offline, leaving a hot-hot solution... if something else happened we still had a "live" data center while scrambling to recover the other two.

    We ran completely without change windows: we would simply de-advertise an entire data center, do the work, re-advertise, then move on to the next data center (a rough sketch of that rotation follows this comment). For anything high-importance, say a CERT advisory requiring an immediate update, we would follow the same procedure as soon as all the requisite management paperwork was complete.

    And yes, we were running some of the most visible and highest traffic websites on the internet.
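    A hedged sketch of that rotation, under the assumptions the comment implies: three sites, any two of which can carry the load, and some mechanism for pulling a site out of service advertisement. The site names are hypothetical, and the advertise/withdraw functions are placeholders for whatever BGP, DNS, or load-balancer mechanism actually steered the traffic.

        # Rolling maintenance across a hot-hot-hot deployment: never more than one
        # site out of rotation at a time, so no change window is needed.
        DATACENTERS = ["dc-east", "dc-west", "dc-central"]   # hypothetical site names

        def withdraw(site: str) -> None:
            print(f"de-advertising {site}: traffic drains to the remaining sites")

        def advertise(site: str) -> None:
            print(f"re-advertising {site}: back in rotation")

        def apply_updates(site: str) -> None:
            print(f"patching hardware and software at {site}")

        def quarterly_maintenance(sites: list[str]) -> None:
            for site in sites:          # one site at a time, then move on to the next
                withdraw(site)
                apply_updates(site)
                advertise(site)

        quarterly_maintenance(DATACENTERS)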
