Cloud Data Storage IT

Amazon Wants To Replace Tape With Slow But Cheap Off-Site "Glacier" Storage 187

Nerval's Lobster writes with a piece at SlashCloud that says "Amazon is expanding its reach into the low-cost, high-durability archival storage market with the newly announced Glacier. While Glacier allows companies to transfer their data-archiving duties to the cloud — a potentially money-saving boon for many a budget-squeezed organization—the service comes with some caveats. Its cost structure and slow speed of data retrieval make it best suited for data that needs to be accessed infrequently, such as years-old legal records and research data. If that sounds quite a bit like Amazon Simple Storage Service, otherwise known as Amazon S3, you'd be correct. Both Amazon S3 and Glacier have been designed to store and retrieve data from anywhere with a Web connection. However, Amazon S3 — 'designed to make Web-scale computing easier for developers,' according to the company — is meant for rapid data retrieval; contrast that with a Glacier data-retrieval request (referred to as a 'job'), where it can take between 3 and 5 hours before it's ready for downloading."
  • by alen ( 225700 ) on Tuesday August 21, 2012 @09:56AM (#41068145)

    my company pays for offsite storage of our tapes and i did some quick math

    $2000 a month to store over 1000 tapes for us. I think the minimum bill is like $1500 if you only have a few tapes

    $.01/GB is $10 to $20 per LTO-4 tape per month. i know the specs are less but ive seen LTO-4 tapes hold close to 4GB of data.
    i send out one tape per month for storage and keep a bunch more locally. so even on the cheap end that's $240 per month for the first year.

    • i know the specs are less but ive seen LTO-4 tapes hold close to 4GB of data.

      That's 4TB, right?

      • by alen ( 225700 ) on Tuesday August 21, 2012 @10:10AM (#41068327)

        yep

        specs say 1.6TB max compressed but i've seen my tapes hold 3TB and 4TB. LTO-5 is even better but too expensive.

        PHB is always complaining about the cost of our off site storage so this made me look at it right away. and LTO4 is fast if you have decent server hardware

        • by dave562 ( 969951 )

          What kind of data are you backing up? 3-4TB seems WAY above average. We get about 2TB per LTO-4 tape.

    • by l0ungeb0y ( 442022 ) on Tuesday August 21, 2012 @10:22AM (#41068499) Homepage Journal

      The cost for Glacier Storage is $10 per Terabyte per month. Not sure why you are saying it's $10 - $20 per 4GB, perhaps you meant 4TB, I'm not familiar with LTO Tapes. If you are storing about 4TB of data, that would be $40/month for Glacier. However, reading back data will incur costs of $10 per Terabyte retrieved.

      I probably would never use Glacier for storing internal document records, but for safely archiving DB records/snapshots and usage logs from services running on an EC2 instance after running them through analytics and aggregation, it seems like an excellent service.

    • by Trepidity ( 597 ) <[delirium-slashdot] [at] [hackish.org]> on Tuesday August 21, 2012 @10:25AM (#41068533)

      Yeah, I don't think this is competitive with tape robots for large operations. I see it as gaining inroads, at least at the current price point, among customers who don't have that kind of equipment onsite, so would be otherwise using regular backup services for their archival needs. By adding Glacier to the existing S3 service, as a cheaper but higher-latency storage option for stuff that you're keeping "just in case" (lawsuit/whatever) as opposed to for likely access, Amazon basically incrementally expands the range of use-cases they're competitive in.

    • by wvmarle ( 1070040 ) on Tuesday August 21, 2012 @10:37AM (#41068677)

      I think your organisation is too big for Glacier.

      When you're big enough, it usually pays off to do stuff in-house, as you have economy of scale.

      Everyone smaller than that is struggling to do proper back-ups. I, for one, have something like 50 GB of data to back up. Way too small for tape. It's HD size. But HDs are not exactly suitable to drop in a tote bag and take home on the train. Also they're a bit expensive to have a new HD every week/month so you have to rotate, making the transport even worse. I've looked into using memory cards or USB sticks, but I need 64GB ones which are still very expensive. A service like this I should seriously look into (especially now I have a 20 Mbit up/down Internet connection).

      Privacy remains an issue of course.

      • by mlts ( 1038732 ) * on Tuesday August 21, 2012 @10:46AM (#41068799)

        At the 50GB level, that is where this service becomes useful. For maximum security, I'd create a TrueCrypt volume, stuff all the stuff needing to go into the archive into it, gpg sign the volume, and upload the volume and its signature. That would mean 50 cents a month indefinitely, but at the minimum, if the upload is successful, Amazon would be storing the data on a SAN with at least RAID 5 or 6 on the backend.

        Of course, with a Blu-Ray burner, I can spend a couple bucks and burn the data onto BD-R media to store indefinitely.

        For business critical data, perhaps the best thing would be both burning a local copy to optical media, then uploading a TC container to AWS. This allows recovery in a lot more circumstances. This way, one doesn't need to sit there waiting for stuff to get readied, then download, but if there are no working local copies, the data is still accessible.

        • by hawguy ( 1600213 )

          At the 50GB level, that is where this service becomes useful. For maximum security, I'd create a TrueCrypt volume, stuff all the stuff needing to go into the archive into it, gpg sign the volume, and upload the volume and its signature. That would mean 50 cents a month indefinitely, but at the minimum, if the upload is successful, Amazon would be storing the data on a SAN with at least RAID 5 or 6 on the backend.

          Of course, with a Blu-Ray burner, I can spend a couple bucks and burn the data onto BD-R media to store indefinitely.

          For business critical data, perhaps the best thing would be both burning a local copy to optical media, then uploading a TC container to AWS. This allows recovery in a lot more circumstances. This way, one doesn't need to sit there waiting for stuff to get readied, then download, but if there are no working local copies, the data is still accessible.

          For 50GB you may as well use regular S3 storage... at 12 cents/GB, that's $6/month and you have instant access to your data, no need to wait 3 to 5 hours to do a restore from Glacier storage (and they say "most jobs" can be retrieved in that timeframe, they didn't say if 5 hours is the upper bound). If you save yourself an hour or two during the year when doing a critical file restore, then your saved labor costs should cover the additional cost of using S3.
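
The S3-versus-Glacier arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the per-GB prices are the ones quoted in this thread (2012 figures), the function names are made up for illustration, and request, retrieval, and transfer fees are ignored.

```python
# Per-GB monthly prices as quoted in this thread (2012 figures, not current).
S3_USD_PER_GB_MONTH = 0.12
GLACIER_USD_PER_GB_MONTH = 0.01


def monthly_cost(size_gb: float, usd_per_gb_month: float) -> float:
    """Storage-only monthly cost; ignores request, retrieval, and transfer fees."""
    return size_gb * usd_per_gb_month


def yearly_premium_for_s3(size_gb: float) -> float:
    """Extra dollars per year paid for keeping the data in S3 rather than Glacier."""
    s3 = monthly_cost(size_gb, S3_USD_PER_GB_MONTH)
    glacier = monthly_cost(size_gb, GLACIER_USD_PER_GB_MONTH)
    return (s3 - glacier) * 12


if __name__ == "__main__":
    size_gb = 50
    print(f"S3:      ${monthly_cost(size_gb, S3_USD_PER_GB_MONTH):.2f}/month")
    print(f"Glacier: ${monthly_cost(size_gb, GLACIER_USD_PER_GB_MONTH):.2f}/month")
    print(f"S3 premium: ${yearly_premium_for_s3(size_gb):.2f}/year")
```

For 50GB the yearly premium works out to a few tens of dollars, which is the figure to weigh against an hour or two of saved labor during a restore.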

          • For 50GB you may as well use regular S3 storage... at 12 cents/GB, that's $6/month and you have instant access to your data, no need to wait 3 to 5 hours to do a restore from Glacier storage (and they say "most jobs" can be retrieved in that timeframe, they didn't say if 5 hours is the upper bound).

            For 50GB, given an average internet connection and a 4 hour retrieval delay, I expect that most of your total down time is going to be spent downloading -- you can't open a partially-downloaded truecrypt volume,

        • by CastrTroy ( 595695 ) on Tuesday August 21, 2012 @11:43AM (#41069495)
          Yeah, for any appreciable amount of data, it's going to be quite time consuming to transfer the data. It's not unheard of to run a website off a 10 Mbit line, but transferring 50 GB over a 10 Mbit line is going to take over 113 hours. So if you have to backup 50 GB a day, it's impossible. If you have a 100 mbps line, you're down to 11 hours of saturating your line, just to transfer out the 50 GB of data. Unless your data center has some kind of peering agreement with Amazon where they can give you a really fast unmetered line, I don't really see this working out all that well.
          • by heypete ( 60671 )

            It's not unheard of to run a website off a 10 Mbit line, but transferring 50 GB over a 10 Mbit line is going to take over 113 hours.

            You're off by a factor of ten: 50GB / 10Mbps = 11.37 hours.

            Still, point taken.

          • Hopefully, you're doing incremental backups. Doing full 50GB backups each day is a bit of a waste, unless your use case requires it for some reason.
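
For anyone who wants to redo the transfer-time math from this subthread, here is a minimal sketch. It assumes decimal units (1 GB = 8000 megabits), a fully saturated link, and no protocol overhead, so real transfers will take longer; `transfer_hours` is just an illustrative name.

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to move size_gb over a link of link_mbps.

    Assumes decimal units (1 GB = 8000 megabits), a saturated link,
    and zero protocol overhead.
    """
    megabits = size_gb * 8000
    seconds = megabits / link_mbps
    return seconds / 3600


if __name__ == "__main__":
    # Link speeds mentioned in the thread: 10, 20, and 100 Mbit/s.
    for mbps in (10, 20, 100):
        print(f"50 GB at {mbps} Mbit/s: {transfer_hours(50, mbps):.1f} hours")
```

With these assumptions 50 GB over a 10 Mbit line comes out at roughly 11 hours, in line with the correction above, and a 100 Mbit line at a bit over one hour.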

    • by hawguy ( 1600213 ) on Tuesday August 21, 2012 @11:21AM (#41069217)

      my company pays for offsite storage of our tapes and i did some quick math

      $2000 a month to store over 1000 tapes for us. I think the minimum bill is like $1500 if you only have a few tapes

      $.01/GB is $10 to $20 per LTO-4 tape per month. i know the specs are less but ive seen LTO-4 tapes hold close to 4GB of data.
      i send out one tape per month for storage and keep a bunch more locally. so even on the cheap end that's $240 per month for the first year.

      Compress your data before you send it to Amazon and you'll have a more fair comparison. An LTO-4 tape holds 800GB native, so your thousand tapes is 800TB of data, which would cost you $8000/month on Amazon Glacier.

      If you store multiple copies of your data (to protect against tape failure) and could get by with only 200TB of Glacier space, then it might be cost effective. Lower labor costs in loading tapes and shipping them offsite, and dropping maintenance on your tape library (or libraries), may also sway the decision.

      The numbers change for LTO-5 (1.5TB native), but then you're looking at a large capital cost to swap out your tapes and upgrade your tape drives.

      I'm in a little different situation - I have my data replicated to a colocated storage array with less than 100TB of data. Amazon Glacier storage would cost about the same as I pay in maintenance on the array (ignoring colocation fees). Glacier is not a drop-in replacement for the array, since the storage array also runs my DR VMware cluster, but it may be more cost effective to get rid of the colocated array cabinet and VMware cluster hardware and rent some VM's with a small amount of storage for the critical servers I need for disaster recovery, using Glacier to store the rest of my data.
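
The tape-to-Glacier comparison above is easy to reproduce. The sketch below uses the $0.01/GB/month price quoted in the thread and the native capacities stated in this comment (800GB for LTO-4, 1.5TB for LTO-5); the names are hypothetical and retrieval and transfer fees are left out.

```python
GLACIER_USD_PER_GB_MONTH = 0.01  # price quoted in the thread (2012 figure)

# Native (uncompressed) capacities in GB, as stated in the comment above.
NATIVE_GB = {"LTO-4": 800, "LTO-5": 1500}


def glacier_monthly_cost(tape_count: int, generation: str) -> float:
    """Monthly Glacier storage cost for the native contents of tape_count tapes.

    Assumes every tape is full and the data is compressed before upload,
    so the native capacity is the right figure to bill against.
    """
    stored_gb = tape_count * NATIVE_GB[generation]
    return stored_gb * GLACIER_USD_PER_GB_MONTH


if __name__ == "__main__":
    for count, gen in ((1000, "LTO-4"), (40, "LTO-4"), (1000, "LTO-5")):
        cost = glacier_monthly_cost(count, gen)
        print(f"{count} x {gen}: ${cost:,.0f}/month")
```

A thousand full LTO-4 tapes gives the $8000/month figure above; the same function can be pointed at a smaller tape count to see where Glacier drops below an offsite-storage bill.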

      • by alen ( 225700 )

        nope

        most of my 1000 some tapes are ancient DLT. i have about 40 LTO-4 tapes in storage. even by itself that is like $800 per month if you figure 2TB on average.

        the $2000 monthly charge includes shipping off site. guy comes once a month and i give him a tape. takes a few minutes. what labor cost? takes 5 minutes to take it out of the robot.

        and the above doesn't include another 100TB archive i have as well at 10TB or so of tapes that i rotate for some other backups for archives.

        i can see this working for small

        • by hawguy ( 1600213 )

          nope

          most of my 1000 some tapes are ancient DLT. i have about 40 LTO-4 tapes in storage. even by itself that is like $800 per month if you figure 2TB on average.

          the $2000 monthly charge includes shipping off site. guy comes once a month and i give him a tape. takes a few minutes. what labor cost? takes 5 minutes to take it out of the robot.

          and the above doesn't include another 100TB archive i have as well at 10TB or so of tapes that i rotate for some other backups for archives.

          i can see this working for smallish businesses

          But you're not storing 2TB of data on a single LTO-4 tape, you're storing 800GB of compressed data on a tape, so those 40 tapes are holding 32TB of data. You can apply the same (or better) compression to the data that the LTO drive does before you ship your data to Amazon, so you need to look at the native capacity of the tapes, not the compressed capacity. Let your backup software do the compression and Amazon will store the same amount of data that you can store natively on a tape.

          If you have 500 DLT1 tap

          • by alen ( 225700 )

            no i'm storing 2TB and sometimes more on an LTO tape. when i first noticed i couldn't believe but i asked around on some backup forums and people said that its true. LTO tapes will frequently store a lot more data than they are rated for.

            i'm using an HP MSL 8096. except for a bunch of bad drives that were replaced under warranty i haven't had stuck tapes or any other problems. if i need to pull a tape out i look in Netbackup for the tape # and slot #. issue the command to unlock the magazines. pull them out

            • by hawguy ( 1600213 )

              no i'm storing 2TB and sometimes more on an LTO tape. when i first noticed i couldn't believe but i asked around on some backup forums and people said that its true. LTO tapes will frequently store a lot more data than they are rated for.

              i'm using an HP MSL 8096. except for a bunch of bad drives that were replaced under warranty i haven't had stuck tapes or any other problems. if i need to pull a tape out i look in Netbackup for the tape # and slot #. issue the command to unlock the magazines. pull them out. pull tape out. takes a few minutes total time.

              i buy HP branded LTO-4 tapes for $30 each. maybe $32. they are so cheap and store so much i don't rotate that much. just on data that we don't need past 6 months. i buy 40 tapes per year. i even have a secret stash of backups with a lot more data than i send offsite. its cheaper buying LTO-4 tapes than calling the backup company to bring back a tape the next day. even if its only once a year.

              and i remember calling PHB and asking for 300 DLT tapes which cost $25,000 back in the day

              You don't seem to understand the distinction between compressed storage and native storage.

              I fully believe that you're writing 2TB of data to your LTO-4 tape drive and it's storing that data on the tape. But it's still writing only 800GB of data to the tape, but the tape drive uses built-in compression software to compress the data while writing. It's completely transparent to you and the application writing to the drive (well, most backup software is aware of tape drive compression and can turn it off if i

    • by Rich0 ( 548339 )

      A few things here - your math is off as others have pointed out.

      However, all the Amazon offerings are basically more expensive than doing it yourself, if you can utilize your own capacity 100%.

      However, consider carefully even your own example. First you need to own an LTO drive, and a bunch of tapes. Those tapes aren't cheap, and if you only have a few GB of data to store then 99% of that tape is wasted. Then as you point out that warehouse has a minimum bill.

      Then consider administrative overhead. You n

  • And simple (Score:5, Funny)

    by smittyoneeach ( 243267 ) * on Tuesday August 21, 2012 @09:56AM (#41068147) Homepage Journal
    Walkabout the glacier
    With stubble on the face. You're
    Returning to a place sure
    To need a smoother face, pure.
    Burma Shave
  • by mjackson14609 ( 69635 ) on Tuesday August 21, 2012 @10:01AM (#41068199) Homepage

    Do you have to submit a properly-formatted JCL card to get your data back?

    • by mwvdlee ( 775178 )

      Not if it can be retrieved within 5 hours.

      (ex OS-390 programmer)

    • There are 11 types of Slashdotters: Those who get the joke, those who don't get the joke and don't care, and those who laugh anyway but have no clue what "JCL" means.
  • Where should I put sensitive documents that must be safely stored for a long time? In the cloud, of course!

    • If by "cloud" you mean a remote location, yes.

      Proper backup procedures generally include an on-site backup, plus an off-site. The "cloud" is perfect for an off-site backup.

    • by Kjella ( 173770 ) on Tuesday August 21, 2012 @10:41AM (#41068725) Homepage

      Where should I put sensitive documents that must be safely stored for a long time? In the cloud, of course!

      Yeah, going to a specialized 3rd party provider for safe long term storage is insane, you'd never put anything valuable in a bank vault would you? Would I put them in any random cloud? Not any more than I'd store my valuables in a shed, but with the right agreements in place on redundancy, backups, access control procedures and so on... maybe. Perhaps I'd use two and have redundant providers too. At least as a company you have to remember that either way it's going to be run by people; whether you outsource it or not, there could be bad apples. Maybe you think you can smell a bad one better among your own employees than they can, but most lack good self-assessment skills.

      • by ceoyoyo ( 59147 )

        Yeah, Iron Mountain seems to have made a pretty good business out of doing just this. The only difference with Amazon is that they don't send a truck to pick up your tapes.

      • you'd never put anything valuable in a bank vault would you?

        You know that governments guarantee bank deposits everywhere, because when they didn't people simply didn't put their money at the banks, right?

  • So ... (Score:5, Funny)

    by PPH ( 736903 ) on Tuesday August 21, 2012 @10:10AM (#41068325)

    ... does this mean that deleting data from Amazon Simple Storage is called an ASS-wipe?

  • If transferring the gigabytes of data nightly over the internet was feasible, we'd be using rsync to an offsite server for a fraction of the cost. Bandwidth / sync time is the issue here, not whether or not it's on tape. Why would I use Amazon if I can just run rsync to my remote server for (probably) a much lower cost? We use tape because there is not enough time to run these backups over the web. Maybe as some kind of secondary backup solution so Joe doesn't have to go get the tapes, but it prob
    • by jythie ( 914043 )
      Well, as you say, 'for us'. Having an offsite server to back up to is indeed a similar solution, in fact that is basically what this is... only someone else maintains the server and worries about maintenance. Anyone can set up an extra server in their closet at home for pretty cheap, but this isn't intended for that type of ad-hoc solution. I suspect services like this also make a company's insurance carrier happier.
      • The point here is, no matter what the service, until we get either less data or unlimited bandwidth, transferring backups nightly over the net is not an option. I was just saying this is essentially an offsite server that is probably going to be more expensive; likening it to tape is really a misnomer. I can already see the suits hearing about this and trying to get rid of tape backups because Amazon does this and they won't have to tell Joe IT guy to go get them. Then I have to start explaining why the inte
        • by jythie ( 914043 )
          I would wager it actually IS tape. I am surprised it is even news, actually. Off-site tape backup centers managed by a 3rd party are nothing new. Generally bandwidth is not that big of an issue for such setups, one is not doing backups of large amounts of data at a time, but instead fairly small amounts (like customer records, payroll, etc) done over and over so they can roll back to any particular date easily.

          Though ideally, one does not use such a service to replace local tape backups, but instead to a
  • by hey ( 83763 ) on Tuesday August 21, 2012 @10:17AM (#41068431) Journal

    I look forward to see what services are built on top of this. Easy and cheap backup?

  • by retep ( 108840 ) on Tuesday August 21, 2012 @10:17AM (#41068435)

    A robotic tape system would generally give you your data back in a few minutes at most, but Amazon is saying you can expect multiple hours of waiting. I'm assuming this system is literally based on people moving around boxes of tapes and inserting them into tape readers; inconvenient but reassuring in its own way. Perhaps they've managed to automate things even further, say by setting up carts of hundreds of tapes carried around by a forklift that get plugged into the robotic tape loading system.

    Also sounds like an interesting operations challenge though in trying to co-ordinate all the read request jobs when your customers can store as little as 1 byte. You can see why they penalize any attempt to actually read your data, especially if you send in a read request job within a short time period of storing the data.

    • by jythie ( 914043 )
      It still might be a tape robot, with the wait time being their best guess regarding how active the system will be and thus how long the robot's queue is. A robotic system indeed is quite quick, when you are the only user.
  • It usually takes us a couple days to put in the request, get the tapes from offsite, then restore the data, hoping we picked the right dates.

  • by Rei ( 128717 )

    will system meltdowns on Glacier be referred to as Jökulhlaup [wikipedia.org]?

  • This sounds amazingly like someone put money into a data storage system that turned out to be far slower than they'd wanted. Now marketing is picking up the slack by calling it Glacier.

    In other words, they're stuck trying to sell white salmon by claiming "Guaranteed to never turn pink in the can!"

    • by IpSo_ ( 21711 )

      Actually, it sounds like a network engineer asking how to better utilize the terabits of available DOWNSTREAM bandwidth that Amazon has available. Running servers by its very nature primarily uses UPSTREAM bandwidth (serving content), so having people send them loads of data often and rarely reading it I'm sure will do wonders to better utilize that available bandwidth. Not to mention backups/archives often happen during non-peak periods, so it's a win-win for Amazon.

  • I think this opens the possibility for a middle-man company to provide long term archival tools for end users. This firm would spend its energy focused on front end tools for the end user and make use of Amazon's back end long term storage for the actual infrastructure.

    There are many amateur and even professional photographers, for example, with almost no alternatives for very long term storage. Home writable media is nearly all flawed in terms of true long term storage. I'm sure there are many use cases in this space.

    In terms of mid-size and larger companies, I think a critical feature will need to be a simple interface that encrypts at the client side prior to sending the data using a private key only available on the client side. I cannot think a responsible I.T. professional would store company critical or customer data on a third party site like that without such protections in place.

    • by Bodero ( 136806 )

      I think this opens the possibility for a middle-man company to provide [...] tools for end users.

      You hit the nail on the head about AWS' goal: They are providing the APIs for others to develop consumer-level tools and products by utilizing their existing infrastructure. Everything, from EC2 to S3 to R53, is geared towards developers (which will then market to end users) by providing full functionality via an API. Glacier is no exception, and as you said, there will be great tools available for end users for

      • Where are all the good end-user tools for S3 now?

        You can find one or two, but it's curious that a Google search for "Amazon S3 client comparison" turns up links from 2009 and 2010.

        More curious is the fact that Dropbox, SugarSync, the MS solution, Google's new solution etc seem to be thriving and providing exactly the kind of services that you'd expect third party S3 clients to provide.

        I'm not saying these clients don't exist, but I don't seem to find them very easily compared to other cloud storage options,

        • by isaac ( 2852 )

          More curious is the fact that Dropbox, SugarSync, the MS solution, Google's new solution etc seem to be thriving and providing exactly the kind of services that you'd expect third party S3 clients to provide.

          Dropbox IS a consumer interface to S3.

          https://www.dropbox.com/help/7/en [dropbox.com]

          "Once a file is added to your Dropbox, the file is then synced to Dropbox's secure online servers. All files stored online by Dropbox are encrypted and kept securely on Amazon's Simple Storage Service (S3) in multiple data centers lo

        • You can find one or two, but it's curious that a Google search for "Amazon S3 client comparison" turns up links from 2009 and 2010.

          More curious is the fact that Dropbox, SugarSync, the MS solution, Google's new solution etc seem to be thriving and providing exactly the kind of services that you'd expect third party S3 clients to provide.

          Dropbox and SugarSync both are applications using Amazon S3 for infrastructure (SugarSync says they use "two carrier-grade data centers, including Amazon's S3 facility.") So

        • Jungle Disk does transparent folder sync a la Dropbox on top of S3 (or Rackspace), and also lets you mount your cloud store as a filesystem, with clients provided for Windows, OS X and Linux.

        • by Bodero ( 136806 )

          Where are all the good end-user tools for S3 now?

          As others have mentioned, Dropbox and SugarSync are consumer interfaces to S3. I think the fact that Amazon references "objects" and "buckets" in S3 terminology is directly because they didn't really build S3 to be an "online file system" type service (though s3fuse [google.com] provides it). They intended to be merely the backend for the consumer services you mentioned.

          That being said, clients aren't always strictly downloadable software. My most-used S3 client is buil

  • by rossdee ( 243626 ) on Tuesday August 21, 2012 @10:44AM (#41068769)

    Apparently someone at Amazon didn't watch the long term weather forecast - climate change means all the glaciers will be gone in a few decades.

  • This is essentially what Amazon (and Google mail/docs for that matter) is doing - Aiming to become your company's new IT department. No CEO in their right mind is going to pay multiple salaries/benefits for a staffed IT department when they can get it from Google and Amazon way cheaper. Even if they pay $10k/month, that's cheaper than paying to staff a 4 person IT department.

    And before you start in about how this helps small startups who can't afford an IT staff, well think again. They can't afford the

  • They obviously could use some help [slashdot.org].

  • by bogie ( 31020 ) on Tuesday August 21, 2012 @11:11AM (#41069077) Journal

    It's $10/month per 1TB which imho is pretty fair. Maybe not doable if you have 1,000 1TB tapes like someone else posted but for most other businesses that's not bad.

  • The examples all use the Retrieval pricing:
    http://aws.amazon.com/glacier/faqs/ [amazon.com]

    Not having ever used AWS, I'm wondering what is the difference between a "Transfer Out" and a "Retrieval"?

    • Not having ever used AWS, I'm wondering what is the difference between a "Transfer Out" and a "Retrieval"?

      You could read the page you linked to for the answer. That page defines data transfer:

      Data transfer "in" and "out" refers to transfer into and out of an AWS Region. There is no Data Transfer charge for data transferred between Amazon EC2 and Amazon Glacier within the same Region. Data transferred between Amazon EC2 and Amazon Glacier across all other Regions will be charged at Internet Data Transfer rat

  • Okay, I thought Google Play was a terrible name, but Amazon Glacier leaves me speechless.

  • Well I would have got first post if I wasn't using Amazon Glacier for my swap file.

  • You mean "Web Scale" is a term that people ACTUALLY use? I thought that that youtube video was just exaggerating for theatrical effect.

    *face palm*

  • This sounds like an ideal medium for PACS - medical imaging. PACS generates large quantities of data, which may be required to be retained for a very long time to be available for medico-legal reasons. For clinical purposes, 97% of the data over three years old is never referenced, but trying to get anyone to agree to an ILM policy that isn't at least 30 years is a real problem. Given the average acute hospital is generating 20TB of image data per year, this service from Amazon might be quite popular. DICOM
