
Your Internet Data is Rotting (theconversation.com) 141

MySpace, which recently lost 50 million files uploaded between 2003 and 2015, is not alone in encountering problems. As the internet grows, batches of old information are increasingly disappearing from it. From a story: Amazon cloud services, for example, also experienced a substantial outage in 2011 and another in 2017. Though temporary, and without actual loss of data, these outages left users without access to precious and important files for some time. Preserving content or intellectual property on the internet presents a conundrum. If it's accessible, then it isn't safe; if it's safe, then it isn't accessible. Accessible content is subject to tampering, theft or other sorts of bad actions. Only content that is inaccessible can be locked and protected from hacking.

The internet currently accesses about 15 zettabytes of data, and is growing at a rate of 70 terabytes per second. It is an admittedly leaky vessel, and content is constantly going offline to wind up lost forever. Massive and desperate efforts are underway to preserve whatever is worth preserving, but even sorting out what is and what is not is itself a formidable undertaking. What will be of value in 10 years -- or 50 years? And how to preserve it? Acid-free paper can last 500 years; stone inscriptions even longer. But magnetic media like hard drives have a much shorter life, lasting only three to five years. They also need to be copied and verified on a very short life cycle to avoid data degradation at observed failure rates between 3% and 8% annually. Then there is also a problem of software preservation: How can people today or in the future interpret those WordPerfect or WordStar files from the 1980s, when the original software companies have stopped supporting them or gone out of business?
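To put the quoted 3% to 8% annual failure rates in perspective, here is a rough sketch of how they compound over five years. It assumes failures are independent between years and between drives, and that failed copies are never replaced. Both are simplifications, and the function names are invented for illustration.

```python
# Sketch only: compounds a constant annual failure rate (AFR).
# Assumes independent failures and no re-copying after a drive dies.

def survival_probability(annual_failure_rate: float, years: int) -> float:
    """Chance that a single drive is still working after `years` years."""
    return (1.0 - annual_failure_rate) ** years


def loss_probability(annual_failure_rate: float, years: int, copies: int) -> float:
    """Chance that every one of `copies` independent drives has failed."""
    single_failure = 1.0 - survival_probability(annual_failure_rate, years)
    return single_failure ** copies


if __name__ == "__main__":
    for rate in (0.03, 0.08):
        for copies in (1, 2, 3):
            p = loss_probability(rate, years=5, copies=copies)
            print(f"AFR {rate:.0%}, {copies} copies, 5 years: {p:.2%} chance all copies are gone")
```

Under these assumptions each extra independent copy multiplies the loss probability by the single-drive failure probability, which is why periodic copying and verification matter more than the lifetime of any one drive.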

This discussion has been archived. No new comments can be posted.


  • precious files? (Score:5, Interesting)

    by Anonymous Coward on Friday May 17, 2019 @02:18PM (#58609846)

    Your files are far from precious. The data that your files generates however, is.

    Your files are stored on the cheapest disks we can find, and backed up whenever we feel like it, and shared globally.

    OUR data however, is stored on quad-redundant disks, backed up regularly, on multiple formats, and off site locations, and secured.

    Once we snarf all the useful bits of data from your file, we couldn't care less what happens to it.

    • by skids ( 119237 )

      Only content that is inaccessible can be locked and protected from hacking.

      If only we had some kind of device that you could write data to, but not rewrite the data. Maybe they could have some sort of breakable plastic tab that prevented the drive write mechanisms from engaging at a hardware level, so you couldn't alter them without physical access. Or better yet maybe some physical property of the media would prevent it from being altered digitally once it had been run over once... of course nobody can protect against a building fire and all, but surely that might prevent "hack

      • I love this idea! I think lasers would work really well for writing the data due to their precision.
      • I know you were joking, but it's important to note that magnetic storage fails fairly quickly and writable CDs cloud over with time; we simply don't yet have a reliable means of permanently storing digital data. Lots of digital archivists and technologists have been working on this problem for a really long time; it's not as if nobody thought to use existing technology. Every existing fiscally feasible technology fails us in some way when it comes to long-term digital archiving.

        I work on a team at a la

        • It's not "digital" but we have a reliable means of long-term storage of data. It's called microfilm, and it has a life expectancy of 500+ years.
          • You could also use microfilm to store things like photos & scanned images in hybrid analog + digital form... an image preview of the original page, alongside a high-res scanned copy in 2D barcode form (or "3D", using multiple densities of gray to pack multiple bits into each barcode dot).

            The main downside is extraordinarily slow production & restoration. It takes time to develop microfilm to archival standards, and you'd probably have some degree of (correctable) error right off the top once you fin
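As a toy illustration of the "multiple densities of gray" idea above: with four gray levels, each barcode dot can carry two bits. The sketch below is purely hypothetical. The gray values, the function names, and the nearest-level decoding are assumptions for illustration, not any real microfilm or barcode format.

```python
# Hypothetical sketch: four gray levels, so each dot carries 2 bits.
GRAY_LEVELS = [0, 85, 170, 255]


def encode(data: bytes) -> list[int]:
    """Turn each byte into four dots, most significant bit pair first."""
    dots = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            dots.append(GRAY_LEVELS[(byte >> shift) & 0b11])
    return dots


def decode(dots: list[int]) -> bytes:
    """Snap each scanned dot to the nearest known gray level and rebuild bytes."""
    out = bytearray()
    for i in range(0, len(dots), 4):
        byte = 0
        for gray in dots[i:i + 4]:
            symbol = min(range(4), key=lambda s: abs(GRAY_LEVELS[s] - gray))
            byte = (byte << 2) | symbol
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    scanned = [min(255, d + 20) for d in encode(b"archive")]  # simulate a slightly dark scan
    print(decode(scanned))
```

More gray levels pack more bits per dot but leave less margin for fading or scanner noise, which is presumably where the correctable errors mentioned above would come from.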

          • by rtb61 ( 674572 )

            As my life, can not be backed up, nor is it all that enduring, I am not really all that fussed about data. Seriously any data I really want to keep, is so minimal in reality, that hard copy is good enough. As an introvert computer geek, I have always taken the view, any data I create, I can recreate and more often than not, do a better job the second time round. Back up sometimes but not that often, except for the more essential stuff, which in the end I often just delete some years later because it was not

    • >Your files are far from precious. The data that your files generates however, is.
      Your files are far from precious. The data that your files generate however, are.

      There, fixed that for you.

      • >Your files are far from precious. The data that your files generates however, is.
        Your files are far from precious. The data that your files generate however, are.

        There, fixed that for you.

        It's like a thousand tech writers all screamed in agony and then fell silent.

    • I sometimes wonder what some future techno-archeologist will have to sift through to learn about our time.

  • MySpace, which recently lost 50 million files uploaded between 2003 and 2015...

    MySpace still exists?

  • In the late 90s there was a Rice University site called, "Things the warning label told you not to do".
    Sadly it is gone, especially the videos of them vaporizing watermelons with a potato cannon with 200 psi behind it. At 30 frames a second you could see the melon expanding at the moment of impact, and the next frame just showed red mist.

    Pre YouTube brilliance and now it is all gone...
    • by Anonymous Coward

      I was part of the group working on those videos while I was at Rice getting my PhD. Glad you liked them. We were picking up watermelon slurry for hours. I might still have some of the raw footage lying around on 3.5" floppies, though I bet I can't read them anymore.

  • by iggymanz ( 596061 ) on Friday May 17, 2019 @02:23PM (#58609872)

    The "soffice" command-line converter of LibreOffice has a WordPerfect filter for conversion, and for WordStar just use LibreOffice 3.6.6, which has a filter (version 4 dropped support).

    All the common 1980s stuff has converters in the open source world, dbase and foxpro files, Lotus 1-2-3, etc.
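For anyone wanting to batch this, a minimal sketch of driving the converter from a script might look like the following. It assumes `soffice` is on the PATH and that the installed LibreOffice version still ships an import filter for the source format; the helper names and the `.wpd` file name are made up for the example.

```python
import subprocess
from pathlib import Path


def build_convert_command(source: Path, out_dir: Path, target_format: str = "odt") -> list[str]:
    """Build the headless LibreOffice command line for a single file."""
    return [
        "soffice",
        "--headless",                   # run without opening the GUI
        "--convert-to", target_format,  # output format, e.g. odt or pdf
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(source: Path, out_dir: Path, target_format: str = "odt") -> Path:
    """Run the conversion and return the path where the result is expected."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, out_dir, target_format), check=True)
    return out_dir / f"{source.stem}.{target_format}"


if __name__ == "__main__":
    # Hypothetical file name; prints the command rather than running it.
    print(" ".join(build_convert_command(Path("letter.wpd"), Path("converted"))))
```

Keeping the original file alongside the converted copy is probably wise, since the conversion may not preserve every layout detail.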

    • What about my TRS-80 Electric Pencil files?

    • by Sigma 7 ( 266129 )

      Conversion is only half the issue. In reality, some users might also have plenty of similar files on old computers, stored on floppy disks, or an ATA hard drive that requires manual CHS configuration. Maybe it could be farther back, and really be a Commodore 64 Speedscript document. Regardless of which conversion tools are around, they aren't helpful if you can't access the documents.

      LibreOffice has WordPerfect filter for conversion

      I had two troubles with the conversion. First was that the WordPerfect docum

      • You have to convert the WordPerfect file on command line before opening. Never had a problem myself. You want specific number of lines per page for a poem? Suck it up and hand edit it afterwards!

        USB floppy drives exist, for both 3-1/2 and 5-1/4"

        Commodore 64 disks can be read by a normal PC using a universal floppy controller; those can read all the old Apple, Atari, Amiga, NeXT, etc. disks.

        The info for ATA and MFM drive settings is on the web, and the ISA controller boards are still sold.

        Only the unmotivated would hav

  • by ctilsie242 ( 4841247 ) on Friday May 17, 2019 @02:23PM (#58609874)

    One rule is to have some means of checking data. Every so often, verify it on whatever media it is on, then perhaps move it somewhere else. For example, verify what is stored on Amazon Glacier, and move it to Glacier Deep Archive (when it goes GA) if it is archive data that is never touched. Or copy the LTO-5 tapes the stuff is on to a new LTO-8 tape.

    Another rule is to have archives on different media. Burn a copy to optical media (M-DISC Blu-Ray), and store a copy in AWS Glacier, or a copy on a hard disk, and another copy stored in a Wasabi bucket. This way, if you lose the online copy, you have a local one.
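One way to do the "every so often, verify it" step is to keep a checksum manifest next to each archive copy and recheck it on a schedule. This is only a sketch using Python's standard `hashlib`; the manifest layout and function names are assumptions, not a feature of Glacier, LTO, or any other product mentioned here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every file still matches."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {rel}")
    return problems
```

Storing the manifest on a different medium than the data it describes keeps a single failure from taking out both, which fits the second rule above.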

  • by QuietLagoon ( 813062 ) on Friday May 17, 2019 @02:23PM (#58609876)
    ... then you either have placed it in the wrong spot, or you don't care about it in the first place.
  • by Anonymous Coward

    You mean if I hand my data to someone else for safe keeping, I can't be ensured they'll protect it forever? Shocking.

  • If you outsource management of your systems to an external party, you also outsource your control over it. Is this a surprise for anyone?

    • That is what encryption is for. Of course, key management becomes an issue, but it is a lot less to keep track of compared to terabytes of data. Worst case, print out the key, or use this [cryptosteel.com] to ensure you have a backup of your key that won't be rotting anytime soon.

  • by Anonymous Coward

    who knows what the grand kids will consider important and worth holding on to... There is so much dilution of information these days that there's no guarantee that what is cherished today will remain so tomorrow.

  • Stone tablets shatter, scrolls and paper rot, we have fires and floods, and we just run out of space and a decision on what data stays and goes is made. I remember volunteering for the library; they had me go through the books and take out books that hadn't been checked out in 5 years. Then the librarian would go through these books and decide if they were classics that deserved to be saved, and the rest went up for sale.

    If information is deemed vitally important, we make copies, use different mediums, thus is protecte

    • Data rot follows Moore's Law. We are losing more data at ever increasing rates.

    • "Oh who cares, it's just a stupid vase" -- Citizen of Pompeii circa 79 AD

      • Good for that citizen, since getting your self out ASAP and forgetting about most possessions is the best move.

      • "Oh who cares, it's just a stupid vase" -- Citizen of Pompeii circa 79 AD

        At the time, it was.

        Seriously though, if only I'd been smart enough to save my original Major Matt Mason Space Crawler and my Matchbox cars, I could sell them on eBay and retire.

        • that's my point, what we consider mundane and pointless (like facebook/twitter -- whatever) would be a potential treasure trove for future archaeologists. (and remember, they're currently sifting through the remains of civilizations who also thought they were the pinnacle of humanity.)

    • by flippy ( 62353 )

      Stone tablets shatter, scrolls and paper rot, we have fires and floods, and we just run out of space and a decision on what data stays and goes is made. I remember volunteering for the library; they had me go through the books and take out books that hadn't been checked out in 5 years. Then the librarian would go through these books and decide if they were classics that deserved to be saved, and the rest went up for sale.

      If information is deemed vitally important, we make copies, use different mediums, thus is protected from data rot. I think we have learned our lesson from the Library of Alexandria. MySpace and Facebook data isn't that valuable in general.

      ^^^This. If this type of thing hadn't been happening from the beginning of civilization, we'd have perfect records back to the beginning of civilization. People decide what's important to make sure is safe, and what isn't. It's always been that way.

      • Where are the data, important or otherwise, that we are moving to longer lived storage media since we think it is important?
        • by flippy ( 62353 )

          That's up to the individual owner of that data. I can answer for myself, my important data is on multiple drives (mostly flash, whether it's a USB drive or an internal SSD) in multiple copies that I keep locally, as well as online. Used to be CD-R or DVD-R. As formats change and/or become no longer viable, I make sure it gets copied to newer tech.

          The good thing here, and what's different from the rest of human history, is that it's possible to make perfect copies of digital data. In the middle ages, whe

    • Although I agree in general with this point of view, I think one of the concerns is:

      MySpace and Facebook data isn't that valuable in general.

      Is it? Maybe there was something of value there that should have been saved? Or some meaningful connection between people that would make sense centuries later to someone studying certain social circles? Who knows (or is entitled to decide)?

      But again, the fight against information rot is a fight against the second law of thermodynamics, and we all know it is doomed to be lost.

    • by Anonymous Coward

      We have been loosing hounds for thousands of years.

    • by jbmartin6 ( 1232050 ) on Friday May 17, 2019 @03:29PM (#58610268)
      One shortcoming with this approach is that what is considered important now may not be so for future generations. We've learned a lot about history and how folks centuries ago lived by looking at trash heaps. If our trash heap is entirely unrecoverable we will be a complete mystery to them. I forget who, but I recall one author whose fictional future referred to the current era as "the Dark Ages" since all the records kept entirely on short-lived magnetic media were all lost.
      • by Kjella ( 173770 )

        I forget who, but I recall one author whose fictional future referred to the current era as "the Dark Ages" since all the records kept entirely on short-lived magnetic media were all lost.

        It doesn't matter if we're losing 99.999% when we're recording a million times more. The original moon landing tapes are lost [reuters.com]. It's probably one of the most newsworthy moments of the 20th century and it wasn't valuable enough to keep and that was only 50 years ago. What's the chance we lose the original recording of the first Mars landing? 0%. 0.00000% unless the camera/transfer fails in the first place. Let's say WW3 breaks out today, all shit is loose and the nukes go flying. Would we lose tons and tons o

    • If you've been loosing data you need to tighten it back up.
      Needing some space, I just grabbed a few of the old computers my biz ran on in the 90's and checked for content.
      All the disks worked fine, I even found some things I thought I'd lost and recovered them via emailing them to myself (USB didn't work on old enough machines). Not quite back to MFM drives, but hey - ~ 30 years, no issues. Kinda hurts to toss them in the green box, but a raspberry pi is now a more potent machine.
      .

      Maybe that few years

    • Media. Mediums are those who do spiritualism, and fleece the gullible in the process.
  • TFS should state data is disappearing but data quality can deteriorate over time - more like rotting.

    I used to argue with a manager over indiscriminately keeping data acquired during experiments. He wanted to keep everything, calculating the cost was minimal. My point was that there were real costs, corresponding to locating what you wanted, discriminating bad (known issues with the collection) from good data or, more generally, understanding what the data contained as context information may be missing or

    • I learned that lesson the hard way - but the one your boss wanted you to learn. The trouble with data reduction at that time is that if you come to a different understanding or want to ask a different question later - you've done what amounts to lossy compression and that original data is gone - totally, not just degraded.
  • Video is harder to preserve now that easy-to-use VCRs have been discontinued and replaced by digital video, which has more DRM and can be copyright-claimed away at any time. Wikipedia can make you an unperson if you do not live up to their arbitrary standards of notability. Media formats have shifted from floppy disk to Zip disk to CD-RW to USB drive to microSD to cloud. I'm surprised Slashdot keeps comments from 20 years ago. A disaster could take this archive of nerd lore away from us.
    • Nah, easy to preserve, get a USB video converter, they take RCA, PAL, NTSC, various other composite formats and even the channel 3/4 output of VCR.

      Less than $20

  • You can run your old DOS programs all the way back to version 1. There are converters for all the popular 1980s PC software formats.

    The only issue is if you care enough about your data to do something about it now when the tools to preserve, run or convert exist.

    If you lived in the CP/M, Xenix, Atari, VMS, or PDP-11 Unix worlds there are open source solutions out there too.

    Not really seeing any problem...

  • Because computers are growing far more powerful at very high rates, I would think that any advanced computer will be able to read any stored data, or even auto-create an OS that can run old software, as if it were a trivial thing to do at a moment's notice.
    • by godrik ( 1287354 )

      If it is purely data, then I guess it is true. But if it is code, then you also need an execution platform and peripherals for it.

      The screen for the vectrex (an old gaming system) was not a raster screen but a vector screen. So emulation on a raster screen never looks quite right.

      The Wii's input may become particularly hard to emulate in the future: there are quite a few buttons placed in particular places, motion control, a screen pointer. Having a controller that feels true may be difficult. And that's befor

      • I probably can't run my old FOCAL programs from college on a PDP-8 emulator. Then again it probably would work. I have a few tubes of new 6100 processors on hand, actually.

      • We don't need to archive Wii controllers, just the information to produce them. 3D models for the printer, protocol specifications. In two hundred years the Museum of Early Computed Recreation can just hire an electronics engineer to reproduce the insides, stick it in a 3D printed case, and they have a controller.

  • There is no doubt that the correlation between value of data and popularity is fraught with - heck, you may as well call it a randomness.

    When someone doing a dance in a video is millions of times more popular than a scientific paper on the impact of climate change, who determines what is expendable?

    I'll take my bets that should humanity still be around in 100 years, that dance video has more chance of entering the cultural history of the planet than the scientific paper has of being remembered and cited.

    Thi

    • A cat GIF in an otherwise serious technical presentation can often elicit more response and enjoyment than the entire presentation itself.

      That's often a sign that the technical presentation is crap.

      If someone is making a technical presentation about how to cheaply transmute wood to gold or a cure for cancer, the audience isn't going to want to be distracted by cat videos.

  • I guess I'd better plan to replace my 6 year old backup NAS drive soon then.
  • by IWantMoreSpamPlease ( 571972 ) on Friday May 17, 2019 @03:06PM (#58610168) Homepage Journal

    I recently (jan of this year) needed to find a file for a client on a batch of 3.5" floppy discs.
    Of the 50 I had to search through, some were *27* years old, and all but one were readable.

    Flip side:
    A few years ago I wanted to install WinNT 4.0 on a modern(ish) laptop, just to see if I could, and get everything working (including USB, which you can).

    Just finding service packs, patches, etc. online was a monumental task. MS has made a concerted effort to expunge these from their systems, and most links I ran across simply pointed to dead links on MS' servers.

    In the end I had to visit some eastern European university FTP links to find the service packs and whatnot.

    No moral to this story, just pointing out much of the old info is actively being removed.

    • by kackle ( 910159 )
      I still use floppy disks about every other week. They do wear out from use. And their longevity varies greatly based on their quality (it used to be that certain brands were better than others).
      • IIRC these were all Verbatim discs.
        Which I seem to recall when I used such things, to be a solid brand.

        On the flip side, remember Zip100 discs and the disaster the "click of death" was to both the disc and the drive?

        What a mess...

        • by kackle ( 910159 )
          That's funny because, if I recall, I had much trouble with the 5.25" Verbatim disks in my 1980s/Commodore 64 days. But honestly, I was a kid then, I had never dealt with such media before, and I could very well have had a flaky floppy drive that the Verbatim brand was sensitive to.
    • by Pascoea ( 968200 )

      A few years ago I wanted to install WinNT 4.0 on a modern(ish) laptop, just to see if I could, and get everything working (including USB, which you can)

      Have you tried these new things called "Friends" and "Outdoors". (I jest, of course)

    • Yeah, a lot of this is self-panicking. Maybe marketing driven, or just not realizing that things last a lot longer on the shelf than they do when you spin them up and down all the time? I've had plenty of the same experiences: decades-old stuff just works again when I check it before tossing it (because, space, man, space...).
  • When these articles come up I always try to ignore them, but this time I'll bite...

    1) Magnetic storage can last 20~30 years depending on how it's stored (ask any sysadmin who still uses tape), not 3 to 5 years as the author suggests. I have old-school RAID arrays that have been up longer than that, and I have yet to replace any of their disks. I have a 486 running Slackware with an 8GB Maxtor IDE disk in it that still has not failed. So those numbers they are shouting at the top of their lungs are utter non

    • You ought to add that acid-free paper rarely lasts 500 years, unless it's protected from humidity, light, insects, dust, fire, and a host of other problems.
    • by xtal ( 49134 )

      Are you actually running those in production, or just as morbid amusement?

      It seems wildly irresponsible to not replace such equipment, and wasteful of power not to virtualize?

      Production (e.g. makes me money) equipment is cycled out every 18-24mo.

      I have much older kit around for amusement, but not in use.

    • you are wrong

      your cute little home toys' lifespans are not indicative of the averages of production systems. The averages they are "shouting at the top of their lungs" are backed by data; they are facts.

      • There is nothing wrong about my post, notice I said ***I*** and then went on to say drives can fail prematurely? Yea you should kinda read that and as a solution Ceph + ZFS or CephFS is known to allow data to live for a very long time, how long? As long as you can maintain the systems, that's how long... Also my "cute" little toys store data for a company, the RAID arrays aren't mine, the 486 is mine (I keep it around to ensure any encryption keys I generate don't have flawed RNG) because Intel and AMD stil
        • The big failures in the first four years for home-grade drives have nothing to do with batches but everything to do with manufacturing defects, which will occur across batches or in batches regardless. 10% of home-grade drives fail in 3 years, 20% in 4...

          that is scary. that is reality.

  • There's a point missing here: if data on the internet is increasing at 70 TB/sec, and this is to be backed up redundantly, it needs > 140 TB/sec to be archived somewhere. Where are we going to store all this media? And who is going to test it periodically to ensure it is still readable?

    I imagine that almost all this data is useless anyway: archeologists search rubbish heaps to find how people lived, but how many FB pages of cat photos does posterity need? And given the amounts of nonsense, lies, and jus
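To get a feel for the scale, here is the back-of-the-envelope arithmetic behind the 70 TB/sec figure, taking the summary's number at face value and assuming decimal units (1 TB = 10^12 bytes, 1 ZB = 10^21 bytes) and a constant rate all year.

```python
# Rough arithmetic only; the 70 TB/s growth rate comes from the story summary.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
TB = 10**12  # bytes, decimal terabyte
ZB = 10**21  # bytes, decimal zettabyte


def yearly_bytes(rate_tb_per_s: int, copies: int = 1) -> int:
    """Bytes written per year at a constant rate, times the number of copies kept."""
    return rate_tb_per_s * TB * SECONDS_PER_YEAR * copies


if __name__ == "__main__":
    for copies in (1, 2):
        print(f"{copies} copy/copies: {yearly_bytes(70, copies) / ZB:.2f} ZB per year")
```

At that rate a single copy comes to a little over 2 ZB a year, and one redundant backup doubles it, so the "where do we put it and who tests it" question is a fair one.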

  • I have some old resumes that were saved with Microsoft Works. I haven't yet found a modern editor that can open them.
    I have an old GEnie email archive from the early 1990s...I can open it in a text editor and "kind of" read through them, but it's not easy.
    I have old bookkeeping records saved in Microsoft Money format. That too is long gone.

    Any data that is not actively maintained...rots. Even if that data is on good old paper.

A morsel of genuine history is a thing so rare as to be always valuable. -- Thomas Jefferson
