Data Storage

How the LHC Is Reviving Magnetic Tape 267

sandbagger writes "The Large Hadron Collider is the world's biggest science experiment. When spinning, it reportedly generates up to six gigs of data per second. Today's six-terabyte tape cartridges fill rapidly when you're creating that amount of material. The Economist reports that despite the advances in SSDs and hard drives, tape still seems to be the way to go when you need to store massive amounts of digital assets."
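To put the summary's numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes "six gigs" means six gigabytes (not gigabits), decimal units, and that the peak rate is sustained, none of which the summary confirms.

```python
# Figures taken from the summary; the unit interpretation is an assumption.
DATA_RATE_GB_PER_S = 6      # "up to six gigs of data per second", read as gigabytes
CARTRIDGE_CAPACITY_TB = 6   # "six-terabyte tape cartridges"
GB_PER_TB = 1000            # assumption: decimal (SI) units
SECONDS_PER_DAY = 86400


def seconds_to_fill(capacity_tb, rate_gb_per_s):
    """Seconds needed to fill one cartridge at a constant data rate."""
    return capacity_tb * GB_PER_TB / rate_gb_per_s


def cartridges_per_day(capacity_tb, rate_gb_per_s):
    """Cartridges filled in 24 hours if the rate never drops."""
    return SECONDS_PER_DAY / seconds_to_fill(capacity_tb, rate_gb_per_s)


fill_s = seconds_to_fill(CARTRIDGE_CAPACITY_TB, DATA_RATE_GB_PER_S)
per_day = cartridges_per_day(CARTRIDGE_CAPACITY_TB, DATA_RATE_GB_PER_S)
print(f"One cartridge fills in about {fill_s / 60:.1f} minutes")
print(f"That is roughly {per_day:.0f} cartridges per day at the peak rate")
```

Under those assumptions a cartridge lasts well under twenty minutes at peak, which is the sense in which they "fill rapidly".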

  • No shit Sherlock (Score:4, Informative)

    by morcego ( 260031 ) on Monday December 02, 2013 @12:42PM (#45575235)

    No one in the data retention business ever stopped using tapes. See the numbers on LTO units being sold, if you need proof.

    This is a shitty article.

  • by MightyYar ( 622222 ) on Monday December 02, 2013 @12:44PM (#45575253)

    Just be careful - optical disks degrade, too. Years ago before hard drives became so incredibly dirt cheap, I would do my little video editing thing and then back up the project files to DVD. And not just any DVD - I did my homework and found the best-rated archival DVDs (sorry, don't remember the brand - only that they came from Japan). Anyway, I just sucked them back onto my NAS, and some of them had developed a teeny bit of unreadable data. Fortunately, I had made PAR2 files for everything. Between par2repair and ddrescue, I was able to recover the data. But the moral of the story is don't rely on optical disks to be magical storage that does not degrade.

  • by dshk ( 838175 ) on Monday December 02, 2013 @01:00PM (#45575423)

    Yes, they are surprisingly fast. The maximum speed of a current Tandberg LTO-6 drive is 160 megabytes/s if the data is incompressible. With the usual compressible data it can be about 320 megabytes/s (officially 400).

    These drives can even be too fast. The drives do speed matching, but they have a minimum speed, below that they start shoe-shining. One reason I chose an older-generation LTO-3 tape drive instead of the current generation is that I cannot easily feed an LTO-6 with at least 60 MB/s, which is the minimum speed of the drive. Considering compression, that is about 120 MB/s, which saturates a 1Gb network.
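    The arithmetic in that last paragraph can be sketched roughly as follows. The 2:1 compression ratio is inferred from the 60 -> 120 MB/s figures above, the function names are made up for illustration, and protocol overhead on the network is ignored.

```python
def network_mb_per_s(link_gbit_per_s):
    """Raw link capacity in MB/s (decimal units, no protocol overhead)."""
    return link_gbit_per_s * 1000 / 8


def required_source_rate(min_native_mb_per_s, compression_ratio):
    """Uncompressed MB/s the host must supply so that the compressed
    stream still reaches the drive's minimum native speed."""
    return min_native_mb_per_s * compression_ratio


def will_shoe_shine(source_mb_per_s, min_native_mb_per_s, compression_ratio):
    """True if the source is too slow to keep the tape streaming."""
    return source_mb_per_s < required_source_rate(
        min_native_mb_per_s, compression_ratio
    )


LTO6_MIN_NATIVE = 60   # MB/s, minimum speed quoted in the comment
COMPRESSION = 2        # assumption: 2:1, implied by 60 -> 120 MB/s

needed = required_source_rate(LTO6_MIN_NATIVE, COMPRESSION)
link = network_mb_per_s(1)
print(f"Source must deliver {needed} MB/s; a 1Gb link carries at most {link} MB/s")
print("Shoe-shining at 100 MB/s:", will_shoe_shine(100, LTO6_MIN_NATIVE, COMPRESSION))
```

    With real-world overhead, a 1Gb link delivers less than its raw capacity, so the margin is effectively gone.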

  • by rickb928 ( 945187 ) on Monday December 02, 2013 @01:07PM (#45575493) Homepage Journal

    The bottom line in managing long-term archiving (5+ years) is that you need to both refresh and verify your storage, at several different levels.

    1. Shoot the initial copy.
    2. Copy this asap. "Copy1"
    3. Stash both in disparate locations.
    4. Go back to the 'original' on a 6-9 month schedule and verify it.
    5. Go back to the 'copy1' and verify it on a different schedule.
    6. Go back to the 'original' on a different 9-12 month schedule and refresh(copy) it, stored to the other site.
    7. Go back to the 'copy1' on a different schedule and refresh (copy) it, stored to the other site.
    8. Repeat 4&5 on a yearly schedule. Do you need to re-write the data in 'current' formats and retain both original and new? Are you moving to new media?
    9. Repeat 6&7 on a yearly schedule. Ditto the rest of step 8.
    10. We should be at year 2 or 2.5. Repeat steps 1-9 once for a 6+/- year retention, again for 10+ year retention.

    Are you changing data formats, and is it possible to ensure integrity by copying and archiving in new formats?
    As you change media, do you need to retain old media systems, or will you move to the new media?
    At what point is the data no longer valid, determined by the owners?
    Are the 'owners' the only stakeholders? If not, expand the set.

    In all of this, you have a dedicated media management system including media drives, copy/verify capabilities, and stand-in for restoration.
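    The "verify" steps above could look something like the sketch below: record a checksum manifest when the initial copy is made, then re-check each copy against it on its schedule. This is only an illustration; the function names are hypothetical, and a real media management system would do this at the tape level, not on a mounted directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(root, manifest):
    """Return the sorted relative paths that are missing or no longer match."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        candidate = root / rel
        if not candidate.is_file() or sha256_of(candidate) != expected:
            bad.append(rel)
    return sorted(bad)
```

    The idea would be to call `build_manifest` once at step 1, store the result with both copies, and run `verify_copy` against 'original' and 'copy1' at steps 4 and 5. Any non-empty result means that copy needs to be refreshed from the other one.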

    This is all very interesting to me. Medical records in particular seem to be assumed to have a lifetime retention, but other than the date and nature of the event, how important are the details of your appendectomy performed at age 5 when you are 60? Is that benign tumor removed at age 12 important at age 45? How much LHC data collected in 2013 will be useful in 2023? Different criteria. Different processes.

  • by Doc Hopper ( 59070 ) on Monday December 02, 2013 @01:46PM (#45575911) Homepage Journal

    The drives do speed matching, but they have a minimum speed, below that they start shoe-shining.

    Agreed. At my work we do parallel streams to multiple Sun T10000 T2 tapes (T10K "C" drives) at 250 MB/s uncompressed (500 MB/s compressed, more or less, usually quite a bit more). If for some reason we push less than about 120 MB/s, the tape rewind times cause all kinds of issues.

    We make the same kind of decision when choosing Sun T10000 "B" drives instead of "C" or the new "D" drives if the source cannot push data fast enough.

    I've long laughed at articles saying tape is dead. For large-scale* backup, retention, transport, and legal hold problems, there simply is no other solution that scales reasonably well.

    *My definition of "large-scale" for this specific context: hundreds of terabytes or more, much of it transported thousands of miles regularly. If you don't work with hundreds of terabytes and at least dozens of petabytes on a daily basis, you may suffer from optimistic delusions regarding disk storage capabilities, ones which disk storage vendors are all too glad to reinforce, to the detriment of customers faced with half-baked solutions that cannot hope to meet their throughput requirements. Given "large-scale" data, there's no replacement for tape at present; everything else is a low-throughput also-ran, typically harboring enormous and unplanned complications. We're also heavy users of VTL, replication, cloning, S3-workalikes, and various disk technologies. Tape remains vital to large enterprise operations, and those predicting its imminent death have been the butt of jokes about marketing wonks for a decade and a half.

  • by dshk ( 838175 ) on Monday December 02, 2013 @01:50PM (#45575951)
    Sequential access speed is only relevant if you back up huge non-fragmented files or entire raw partitions, and nothing else.
  • by Doc Hopper ( 59070 ) on Monday December 02, 2013 @01:55PM (#45576009) Homepage Journal

    long-life optical discs fail... [store] tape in a cool, dark place...

    This, this, one-thousand times this. I've worked in data centers for a decade and a half, and seen innumerable optical media go bad within just a few years (typically about 3 years) even in DVD jukeboxes in climate-controlled environments. Meanwhile, we restore from fairly ancient tapes on a regular basis.

    In reality, most companies don't store tapes longer than 7 years anyway; that's the upper limit of typical audit liability. The data on the tapes may be older than that, kept indefinitely on-disk, but most large companies have a fairly aggressive destruction/over-write schedule for data on tape older than 7 years.

    It's very unlikely we'll need data off a tape 20 years from now, but kept in the right conditions -- like the bat-cave of a tape silo room housing tens of thousands of 10TB tapes a few feet away from me right now -- there's a really good chance the data will be readable. While we do have plenty of tape failures (hundreds per year), they are almost always caught at write-time by the verification head.

    A modern tape drive usually has several dozen "heads", and there will be two sets of them, each with its own mechanism to align it with a precision of just a few microns. Pretty amazing, really; if you drop by the Denver, CO area some time, the Oracle/Sun building engineers there can often arrange a tour of our tape testing facilities if you sign an NDA and represent a potential sale. Anyway, the second mechanism will be engaged on the tape in order to read what the first just wrote and verify it before it passes the "successful write" confirmation back up the fibre channel chain. This way you can guarantee you don't get "write once, read never" media.
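    For anyone who wants the read-after-write idea in software terms, here is a loose analogy. A real drive does this in hardware with the second head set; the function below is a hypothetical sketch against any seekable file-like object, and the CRC check and retry count are my assumptions, not how the drive firmware works.

```python
import io
import zlib


def write_verified(medium, block, max_retries=3):
    """Write a block, read it straight back, and only report success
    once the read-back data matches. Returns the attempt number used."""
    expected = zlib.crc32(block)
    start = medium.tell()
    for attempt in range(1, max_retries + 1):
        medium.seek(start)
        medium.write(block)
        medium.flush()
        medium.seek(start)
        readback = medium.read(len(block))
        if len(readback) == len(block) and zlib.crc32(readback) == expected:
            # Position is now just past the verified block, ready for the next one.
            return attempt
    raise IOError(f"block at offset {start} failed verification")


tape = io.BytesIO()  # stand-in for the medium
write_verified(tape, b"first block")
write_verified(tape, b"second block")
print(len(tape.getvalue()), "bytes written and verified")
```

    The point is the ordering: the "successful write" is not reported until the data has been read back, so a bad spot is caught at write time and not years later at restore time.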

  • by Doc Hopper ( 59070 ) on Monday December 02, 2013 @02:09PM (#45576103) Homepage Journal

    ...you need to both refresh and verify your storage...

    You came pretty close with the process, but for most businesses you're not quite there. Here are a few clarifications on the process.

    1. Typically large companies (including those, like us, with stringent HIPAA requirements) take two simultaneous copies from the original source. We don't copy a copy if it can be avoided, and we have enough tape drives to do this.
    2. We contract out with a local storage company to grab the tapes within a few days and store for the given retention period off-site. One copy usually remains on-site as well for long-term retention and rapid restoration. With plenty of capacity in the silo (tens of thousands of tapes in an Oracle/Sun SL8500), we are not terribly concerned about retention policies. If we get tight on space, we'll just expand the silo again.
    3. The same data usually still exists as on-disk media marked read-only, available for the legal folks who insisted we archive it in the first place. Often it also exists at a second geographical location thousands of miles (at minimum) from the first, with its own backup tapes. Plus it exists on two tapes at each site, one near-line and one off-site. Given tape reliability, three layers of data protection is typically sufficient. If "legal hold" is involved, we also insist that the disk array be kept on a valid support contract to reduce the risk of failed disks in the storage appliance.
    4. Retention policies dictate we keep around at least a few tape drives of every generation we've ever used which has tapes archived with our off-site storage facility. Even if they are not in the silo, they're in a storage closet waiting for us to bring them to life if needed up to twenty years later.

    I do this kind of thing all the time. Feel free to ping me at my easily-figured-out email address (firstname@lastname.org) if I can answer additional questions for you.

  • by Anonymous Coward on Monday December 02, 2013 @03:11PM (#45576757)

    >"shoe-shining"

    When the tape drive repeatedly and quickly does forward and reverse operations over the same piece of tape due to a data fault on the tape or some buffer problem (or other reasons, too). The analogy is to the quick back-and-forth of a shoe-shine rag that runs the same piece of rag over a shoe many times.

    Ah, memories...

  • Re:Tape is bullshit. (Score:4, Informative)

    by Doc Hopper ( 59070 ) on Monday December 02, 2013 @03:21PM (#45576857) Homepage Journal

    Tape is slow, expensive, proprietary and unreliable.

    The only people who still use it are those who have to, or idiots with money to burn.

    Fact check on the troll.

    "Tape is slow". Absolutely false for throughput; true only for IOPS. A modern tape is much faster than a modern hard drive. That's the point of the article, and my personal experience as well. Random I/O to/from tape drives is incredibly slow, but no hard drive can touch a modern tape drive's throughput. It's the reason LHC uses it.

    "Tape is expensive": True only in a non-ROI sense, therefore mostly false. You'll find a modern, large tape silo of equivalent capacity to a modern, large storage appliance usually works out much cheaper both in initial cost and cost over time if you intend to use the hardware for at least three to five years. That said, the cost of admission to the world of enterprise tape is pretty high; it's the ongoing costs that are much lower than hard drives.

    "Tape is Proprietary": Both true and false. LTO is an open (licensable) standard, but the fastest/largest tape drives on the planet are typically proprietary right now, because being the fastest/largest causes more sales, and therefore funds innovation in faster/larger tape technology.

    "The only people who still use it are those who have to...": False. There are many, many use cases for tape where it is not a requirement, but is just more convenient, reliable, faster, and less expensive than a hard-disk solution. I could list them, but, well, you're a troll and I don't want to type much more.

    "The only people who still use it are... [those] with money to burn.": False. ROI is what drives most of our tape purchases, and we save an enormous amount of money by using tape in appropriate scenarios. Hard disks are appropriate for some use cases, tapes are mandatory or just a smart purchase in others.

  • Re: maybe (Score:4, Informative)

    by Doc Hopper ( 59070 ) on Monday December 02, 2013 @03:36PM (#45577011) Homepage Journal

    Late 2013 pricing.

    4TB hard drive: around $400
    5TB tape: around $160
    8.5TB tape (same media as 5TB, newer drive): still about $160

    Cost per terabyte of disk: about $100.
    Cost per terabyte of tape: about $19 (at 8.5TB per tape; about $32 at 5TB)

    I'm ignoring the cost of the tape drive, just like I'm ignoring the cost of the head(s) involved in NAS/SAN storage.

    To fix your quote to be in line with reality:

    Glacier is cold storage; the drives are only spinning when they are filled, when retrieving, and when scrubbing / consolidating. Just like tape but at least five times more expensive.
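    The per-terabyte figures above work out as in this quick sketch. The prices are the late-2013 numbers from this comment; drive and head costs are left out, as stated above.

```python
# (capacity in TB, media price in USD), late-2013 figures from the comment above
MEDIA = {
    "4TB hard drive": (4.0, 400.0),
    "5TB tape": (5.0, 160.0),
    "8.5TB tape": (8.5, 160.0),
}


def cost_per_tb(capacity_tb, price_usd):
    """Media cost per terabyte, ignoring drives, heads, power, and labor."""
    return price_usd / capacity_tb


for name, (capacity, price) in MEDIA.items():
    print(f"{name}: about ${cost_per_tb(capacity, price):.0f} per TB")

disk = cost_per_tb(*MEDIA["4TB hard drive"])
tape = cost_per_tb(*MEDIA["8.5TB tape"])
print(f"Disk costs about {disk / tape:.1f}x as much per TB as 8.5TB tape")
```

    The "about $19" figure corresponds to the 8.5TB case; the same cartridge written at 5TB comes to about $32 per terabyte, which is still roughly a third of the disk price.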
