Your Internet Data is Rotting (theconversation.com)
MySpace, which recently lost 50 million files uploaded between 2003 and 2015, is not alone in encountering problems. As the internet grows, batches of old information are increasingly disappearing from it. From a story: Amazon cloud services, for example, also experienced a substantial outage in 2011 and another in 2017. Though temporary, and without actual loss of data, these outages left users without access to precious and important files for some time. Preserving content or intellectual property on the internet presents a conundrum. If it's accessible, then it isn't safe; if it's safe, then it isn't accessible. Accessible content is subject to tampering, theft or other sorts of bad actions. Only content that is inaccessible can be locked and protected from hacking.
The internet currently accesses about 15 zettabytes of data, and is growing at a rate of 70 terabytes per second. It is an admittedly leaky vessel, and content is constantly going offline to wind up lost forever. Massive and desperate efforts are underway to preserve whatever is worth preserving, but even sorting out what is and what is not is itself a formidable undertaking. What will be of value in 10 years -- or 50 years? And how to preserve it? Acid-free paper can last 500 years; stone inscriptions even longer. But magnetic media like hard drives have a much shorter life, lasting only three to five years. They also need to be copied and verified on a very short life cycle to avoid data degradation at observed failure rates between 3% and 8% annually. Then there is also a problem of software preservation: How can people today or in the future interpret those WordPerfect or WordStar files from the 1980s, when the original software companies have stopped supporting them or gone out of business?
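To put the quoted 3% to 8% annual failure rate in perspective, here is a minimal sketch of how it compounds over several years. It assumes a constant annual failure rate and fully independent copies that are never repaired or re-copied; both are simplifying assumptions, not claims from the article, and the function names are made up for illustration.

```python
def survival_probability(annual_failure_rate: float, years: int) -> float:
    """Chance that one drive is still alive after `years`,
    assuming the same independent failure rate every year."""
    return (1.0 - annual_failure_rate) ** years


def loss_probability(annual_failure_rate: float, years: int, copies: int) -> float:
    """Chance that every one of `copies` independent, never-refreshed
    drives has failed after `years` (i.e. the data is gone)."""
    single_loss = 1.0 - survival_probability(annual_failure_rate, years)
    return single_loss ** copies


if __name__ == "__main__":
    # Rates taken from the summary above; 5 years is the upper "life" it quotes.
    for rate in (0.03, 0.08):
        for copies in (1, 2, 3):
            p = loss_probability(rate, years=5, copies=copies)
            print(f"rate={rate:.0%} copies={copies} loss after 5y={p:.4f}")
```

Under these assumptions a single drive has roughly a 14% to 34% chance of failing within five years, which is why the summary stresses copying and verifying on a short cycle rather than trusting any one device.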
precious files? (Score:5, Interesting)
Your files are far from precious. The data that your files generates however, is.
Your files are stored on the cheapest disks we can find, and backed up whenever we feel like it, and shared globally.
OUR data, however, is stored on quad-redundant disks, backed up regularly, in multiple formats, at off-site locations, and secured.
Once we snarf all the useful bits of data from your file, we couldn't care less what happens to it.
Re: (Score:2)
Only content that is inaccessible can be locked and protected from hacking.
If only we had some kind of device that you could write data to, but not rewrite the data. Maybe they could have some sort of breakable plastic tab that prevented the drive write mechanisms from engaging at a hardware level, so you couldn't alter them without physical access. Or better yet maybe some physical property of the media would prevent it from being altered digitally once it had been run over once... of course nobody can protect against a building fire and all, but surely that might prevent "hack
Re: (Score:1)
Re: (Score:3)
I know you were joking, but it's important to note that magnetic storage fails fairly quickly and writable CDs cloud over with time; we simply don't yet have a reliable means of permanently storing digital data. Lots of digital archivists and technologists have been working on this problem for a really long time; it's not as if nobody thought to use existing technology. Every existing fiscally feasible technology fails us in some way when it comes to long-term digital archiving.
I work on a team at a la
Re: (Score:3)
Re: precious files? (Score:2)
You could also use microfilm to store things like photos & scanned images in hybrid analog + digital form... an image preview of the original page, alongside a high-res scanned copy in 2D barcode form (or "3D", using multiple densities of gray to pack multiple bits into each barcode dot).
The main downside is extraordinarily slow production & restoration. It takes time to develop microfilm to archival standards, and you'd probably have some degree of (correctable) error right off the top once you fin
Re: (Score:2)
As my life cannot be backed up, nor is it all that enduring, I am not really all that fussed about data. Seriously, any data I really want to keep is so minimal in reality that hard copy is good enough. As an introvert computer geek, I have always taken the view that any data I create I can recreate and, more often than not, do a better job the second time round. I back up sometimes, but not that often, except for the more essential stuff, which in the end I often just delete some years later because it was not
Re: (Score:2)
>Your files are far from precious. The data that your files generates however, is.
Your files are far from precious. The data that your files generate however, are.
There, fixed that for you.
Re: (Score:2)
>Your files are far from precious. The data that your files generates however, is.
Your files are far from precious. The data that your files generate however, are.
There, fixed that for you.
It's like a thousand tech writers all screamed in agony and then fell silent.
Re: (Score:2)
No, you really didn't
https://www.onlinegrammar.com.... [onlinegrammar.com.au]
That's some revisionist grammar nonsense you found there.
Re: (Score:2)
I did. I called it revisionist - of the form "People have been getting this wrong since they started buying computers in the 80s, so it's right now". No it isn't. It's still wrong and sounds weird. Try saying 'bytes' instead - "The bytes that your files generates however, is".
Re: (Score:2)
I sometimes wonder what some future techno-archeologist will have to sift through to learn about our time.
I'm shocked! (Score:2)
MySpace still exists?
The most important things are already gone (Score:2)
Sadly it is gone, especially the videos of them vaporizing watermelons with a potato cannon with 200psi behind it. 30 frames a second and you could see the melon expanding the moment of impact and the next frame just showed red mist.
Pre YouTube brilliance and now it is all gone...
Re: (Score:1)
I was part of the group working on those videos while I was at Rice getting my PhD. Glad you liked them. We were picking up watermelon slurry for hours. I might still have some of the raw footage lying around on 3.5" floppies, though I bet I can't read them anymore.
Wordperfect and WordStar... no problem! (Score:3)
The "soffice" command line converter of LibreOffice has a WordPerfect filter for conversion, and for WordStar just use LibreOffice 3.6.6, which has a filter (version 4 dropped support).
All the common 1980s stuff has converters in the open source world, dbase and foxpro files, Lotus 1-2-3, etc.
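For anyone who wants to batch this, a minimal sketch of driving the converter from a script follows. It assumes `soffice` is on the PATH and that the installed LibreOffice build includes the WordPerfect import filter; the file names and the helper names `build_convert_command` and `convert` are hypothetical. The `--headless`, `--convert-to` and `--outdir` options are the standard LibreOffice command line switches.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: Path, out_dir: Path,
                          target_format: str = "odt") -> List[str]:
    """Build the soffice command line for one headless conversion."""
    return [
        "soffice", "--headless",
        "--convert-to", target_format,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(source: Path, out_dir: Path, target_format: str = "odt") -> Path:
    """Run the conversion and return the expected output path.
    Raises if soffice is not installed or exits with an error."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, out_dir, target_format),
                   check=True)
    return out_dir / f"{source.stem}.{target_format}"


if __name__ == "__main__":
    # Hypothetical file name, only printed here, not executed.
    print(" ".join(build_convert_command(Path("letter.wpd"), Path("converted"))))
```

Converting to an open format such as ODT and keeping the original file alongside it seems the safer habit, since a filter that exists today may be dropped later, as happened with WordStar.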
Re: Wordperfect and WordStar... no problem! (Score:1)
What about my TRS-80 Electric Pencil files?
Re: (Score:2)
you mean this? https://www.classic-computers.... [classic-computers.org.nz]
Re: (Score:2)
Conversion is only half the issue. In reality, some users might also have plenty of similar files on old computers, stored on floppy disks, or an ATA hard drive that requires manual CHS configuration. Maybe it could be farther back, and really be a Commodore 64 Speedscript document. Regardless of which conversion tools are around, they aren't helpful if you can't access the documents.
I had two troubles with the conversion. First was that the WordPerfect docum
Re: Wordperfect and WordStar... no problem! (Score:1)
There are utilities out there that let you edit the CHS numbers on BIOSes that only have hard coded drive types. It reads the BIOS image off the chip in-system, then lets you edit the drive table and makes a hex to burn into a fresh eprom.
There are a LOT of old utilities out there. Best to keep the old archives like SIMTELNET on CDROM for perpetuity.
Re: (Score:2)
You have to convert the WordPerfect file on command line before opening. Never had a problem myself. You want specific number of lines per page for a poem? Suck it up and hand edit it afterwards!
USB floppy drives exist, for both 3-1/2 and 5-1/4"
Commodore 64 disks can be read by a normal PC using a universal floppy controller; those can read all the old Apple, Atari, Amiga, NeXT, etc.
The info for ATA and MFM drive settings is on the web, and the ISA controller boards are still sold.
Only the unmotivated would hav
Re: (Score:2)
USB floppy drives exist, for both 3-1/2 and 5-1/4"
USB floppy drives exist for 3-1/2", *not* 5-1/4".
Re: (Score:2)
having trouble with your search engine?
they absolutely exist, we have one at work
http://shop.deviceside.com/ [deviceside.com]
Don't let data stay static... check it (Score:3)
One rule is to have some means of checking data. Every so often, verify it on whatever media, then perhaps move it to somewhere else. For example, verify what is stored on Amazon Glacier, and move it to Glacier Deeparchive (when it goes GA), if it is archive data that is never touched. Or, copy the LTO-5 tapes the stuff is on to a new LTO-8 tape.
Another rule is to have archives on different media. Burn a copy to optical media (M-DISC Blu-Ray), and store a copy in AWS Glacier, or a copy on a hard disk, and another copy stored in a Wasabi bucket. This way, if you lose the online copy, you have a local one.
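The "verify it every so often" rule can be sketched as a checksum manifest: record a hash per file once, then re-check each copy against it later. This is only an illustration under assumptions: SHA-256 and the JSON-friendly manifest layout are my choices, and the helper names are hypothetical; it says nothing about how Glacier, LTO or M-DISC tooling does its own verification.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its hash."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def verify(root: Path, manifest: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (relative path, problem) pairs for files that are
    missing or whose contents no longer match the manifest."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(target) != expected:
            problems.append((rel, "checksum mismatch"))
    return problems
```

Build the manifest once from the master copy, store it with every copy, and run `verify` against each medium on a schedule; any non-empty result means that copy should be rebuilt from one that still verifies.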
Re: Don't let data stay static... check it (Score:1)
It can be important if it's YOUR data. Lots of people have massive collections that aren't really their own data. That complete collection of Playboy centerfolds you saved off a binary usenet group? Probably plenty of copies out there. The same for "songs" and movies. Probably 99% of internet traffic fits in that category.
If your data rots away... (Score:3)
Re: (Score:1)
When I did IT support the first thing we would look at on systems when backing up is what really needs backing up? Usually that came down to just da
Re:Management is key to archiving? (Score:2)
Other things end up in an application's documents folder, or the catch-all Documents folder, or worse the Downloads folder, and I don't think I e
Re: (Score:1)
Good luck auto-deleting the printouts in my desk drawer.
Re: If your data rots away... (Score:1)
Good luck grepping those documents.
Re: (Score:2)
A fire, leaky roof or broken water main just might do that. Or the glass of water you drop on them.
What else is new? (Score:1)
You mean if I hand my data to someone else for safe keeping, I can't be ensured they'll protect it forever? Shocking.
Yes of course (Score:2)
If you outsource management of your systems to an external party, you also outsource your control over it. Is this a surprise for anyone?
Re: (Score:2)
That is what encryption is for. Of course, key management becomes an issue, but it is a lot less to keep track of compared to terabytes of data. Worst case, print out the key, or use this [cryptosteel.com] to ensure you have a backup of your key that won't be rotting anytime soon.
Re: (Score:2)
Encryption will help you against server failure? That's new.
Look Upon My Works Ye Mighty and Despair! (Score:1)
who knows what the grand kids will consider important and worth holding on to... There is so much dilution of information these days that there's no guarantee that what is cherished today will remain so tomorrow.
We have been loosing data for thousands of years (Score:2)
Stone tablets shatter, scrolls and paper rot. We have fires and floods, or we just run out of space and a decision is made on what data stays and what goes. I remember volunteering for the library: they had me go through the books and take out any book that hadn't been checked out in 5 years. Then the librarian would go through these books and decide if they were classics that deserved to be saved, and the rest went up for sale.
If information is deemed vitally important, we make copies, use different mediums, thus is protecte
Re: (Score:2)
Data rot follows Moore's Law. We are losing more data at ever increasing rates.
Re: (Score:2)
"Oh who cares, it's just a stupid vase" -- Citizen of Pompeii circa 79 AD
Re: (Score:2)
Good for that citizen, since getting yourself out ASAP and forgetting about most possessions is the best move.
Re: (Score:2)
"Oh who cares, it's just a stupid vase" -- Citizen of Pompeii circa 79 AD
At the time, it was.
Seriously though, if only I'd been smart enough to save my original Major Matt Mason Space Crawler and my Matchbox cars, I could sell them on eBay and retire.
Re: (Score:2)
that's my point, what we consider mundane and pointless (like facebook/twitter -- whatever) would be a potential treasure trove for future archaeologists. (and remember, they're currently sifting through the remains of civilizations who also thought they were the pinnacle of humanity.)
Re: (Score:3)
Stone tablets shatter, scrolls and paper rot. We have fires and floods, or we just run out of space and a decision is made on what data stays and what goes. I remember volunteering for the library: they had me go through the books and take out any book that hadn't been checked out in 5 years. Then the librarian would go through these books and decide if they were classics that deserved to be saved, and the rest went up for sale.
If information is deemed vitally important, we make copies, use different mediums, thus is protected from data rot. I think we have learned our lesson from the Library of Alexandria. MySpace and Facebook data isn't that valuable in general.
^^^This. If this type of thing hadn't been happening from the beginning of civilization, we'd have perfect records back to the beginning of civilization. People decide what's important to make sure is safe, and what isn't. It's always been that way.
Re: (Score:2)
Re: (Score:2)
That's up to the individual owner of that data. I can answer for myself, my important data is on multiple drives (mostly flash, whether it's a USB drive or an internal SSD) in multiple copies that I keep locally, as well as online. Used to be CD-R or DVD-R. As formats change and/or become no longer viable, I make sure it gets copied to newer tech.
The good thing here, and what's different from the rest of human history, is that it's possible to make perfect copies of digital data. In the middle ages, whe
Re: (Score:1)
Although I agree in general with this point of view, I think one of the concerns is:
MySpace and Facebook data isn't that valuable in general.
Is it? Maybe there was something of value there that should have been saved? Or some meaningful connection between people that would make sense centuries later when studying certain social circles? Who knows, or is entitled to decide?
But again, the fight against information rot is a fight against the second law of thermodynamics, and we all know it is doomed to be lost.
Re: (Score:1)
We have been loosing hounds for thousands of years.
Re:We have been loosing data for thousands of year (Score:4, Insightful)
Re: (Score:2)
I forget who, but I recall one author whose fictional future referred to the current era as "the Dark Ages" since all the records kept entirely on short-lived magnetic media were all lost.
It doesn't matter if we're losing 99.999% when we're recording a million times more. The original moon landing tapes are lost [reuters.com]. It's probably one of the most newsworthy moments of the 20th century and it wasn't valuable enough to keep and that was only 50 years ago. What's the chance we lose the original recording of the first Mars landing? 0%. 0.00000% unless the camera/transfer fails in the first place. Let's say WW3 breaks out today, all shit is loose and the nukes go flying. Would we lose tons and tons o
Re: (Score:2)
Needing some space, I just grabbed a few of the old computers my biz ran on in the 90's and checked for content.
All the disks worked fine, I even found some things I thought I'd lost and recovered them via emailing them to myself (USB didn't work on old enough machines). Not quite back to MFM drives, but hey - ~ 30 years, no issues. Kinda hurts to toss them in the green box, but a raspberry pi is now a more potent machine.
Maybe that few years
Re: (Score:2)
Not just data survival (Score:2)
TFS should state data is disappearing but data quality can deteriorate over time - more like rotting.
I used to argue with a manager over indiscriminately keeping data acquired during experiments. He wanted to keep everything, calculating the cost was minimal. My point was that there were real costs, corresponding to locating what you wanted, discriminating bad (known issues with the collection) from good data or, more generally, understanding what the data contained as context information may be missing or
Re: (Score:2)
Lots of things (Score:2)
Re: (Score:2)
Nah, easy to preserve, get a USB video converter, they take RCA, PAL, NTSC, various other composite formats and even the channel 3/4 output of VCR.
Less than $20
PC software preservation mostly non-issue (Score:2)
You can run your old DOS programs all the way back to version 1. There are converters for all the popular 1980s PC software formats.
The only issue is if you care enough about your data to do something about it now when the tools to preserve, run or convert exist.
If you lived in the CP/M, Xenix, Atari, VMS, or PDP-11 Unix worlds there are open source solutions out there too.
Not really seeing any problem...
Re: Let it begone. (Score:1)
A friend of mine once told me a story about a party he was at in the early 60's. They had a quantity of old glass slide photographs of Native Americans that they got drunk and smashed up.
One Portion Answered (Score:2)
Re: (Score:2)
If it is purely data, then I guess it is true. But if it is code, then you also need an execution platform and peripherals for it.
The screen for the vectrex (an old gaming system) was not a raster screen but a vector screen. So emulation on a raster screen never looks quite right.
The Wii's input may become particularly hard to emulate in the future: there are quite a few buttons placed in particular places, motion control, and a screen pointer. Having a controller that feels true may be difficult. And that's befor
Re: One Portion Answered (Score:1)
I probably can't run my old FOCAL programs from college on a PDP-8 emulator. Then again it probably would work. I have a few tubes of new 6100 processors on hand, actually.
Re: (Score:2)
We don't need to archive Wii controllers, just the information to produce them. 3D models for the printer, protocol specifications. In two hundred years the Museum of Early Computed Recreation can just hire an electronics engineer to reproduce the insides, stick it in a 3D printed case, and they have a controller.
How much is really valuable and how would we know? (Score:1)
There is no doubt that the correlation between value of data and popularity is fraught with - heck, you may as well call it a randomness.
When someone doing a dance in a video is millions of times more popular than a scientific paper on the impact of climate change, who determines what is expendable?
I'll take my bets that should humanity still be around in 100 years, that dance video has more chance of entering the cultural history of the planet than the scientific paper has of being remembered and cited.
Thi
Re: (Score:2)
A cat GIF in an otherwise serious technical presentation can often elicit more response and enjoyment than the entire presentation itself.
That's often a sign that the technical presentation is crap.
If someone is making a technical presentation about how to cheaply transmute wood to gold or a cure for cancer, the audience isn't going to want to be distracted by cat videos.
Only 3 to 5 years for a HD? (Score:2)
Two sided story here- (Score:3)
I recently (jan of this year) needed to find a file for a client on a batch of 3.5" floppy discs.
Of the 50 I had to search through, some were *27* years old, and all but one were readable.
Flip side:
A few years ago I wanted to install WinNT 4.0 on a modern(ish) laptop, just to see if I could, and get everything working (including USB, which you can).
Just finding the service packs, patches, etc. online was a monumental task. MS has made a concerted effort to expunge this from their systems, and most links I ran across simply pointed to dead links on MS' servers.
In the end I had to visit some Eastern European university FTP links to find the service packs and whatnot.
No moral to this story, just pointing out much of the old info is actively being removed.
Re: (Score:2)
Re: (Score:2)
IIRC these were all Verbatim discs.
Which I seem to recall when I used such things, to be a solid brand.
On the flip side, remember Zip100 discs and the disaster the "click of death" was to both the disc and the drive?
What a mess...
Re: (Score:2)
Re: (Score:2)
A few years ago I wanted to install WinNT 4.0 on a modern(ish) laptop, just to see if I could, and get everything working (including USB, which you can)
Have you tried these new things called "Friends" and "Outdoors". (I jest, of course)
Re: (Score:2)
No?! Is this a new website? /h
Does it require an invite or something?
Re: (Score:2)
Author is ill informed (Score:1)
When these articles come up I always try to ignore them, but this time I'll bite...
1) Magnetic storage can last 20~30 years depending on how it's stored (ask any sysadmin who still uses tape), not 3 to 5 years as the author suggests. I have old school RAID arrays that have been up longer than that, and I have yet to replace any of their disks. I have a 486 running Slackware with an 8GB Maxtor IDE disk in it that still has not failed. So those numbers they are shouting at the top of their lungs are utter non
Re: (Score:2)
Re: (Score:3)
Are you actually running those in production, or just as morbid amusement?
It seems wildly irresponsible to not replace such equipment, and wasteful of power not to virtualize?
Production (e.g. makes me money) equipment is cycled out every 18-24mo.
I have much older kit around for amusement, but not in use.
Re: (Score:2)
you are wrong
your cute little home toys' lifespans are not indicative of the averages of production systems. The averages they are "shouting at the top of their lungs" are backed by data; they are facts.
Re: (Score:1)
Re: (Score:2)
The big failures for the first four years in home grade drives have nothing to do with batches but everything to do with manufacturing defects, which will occur across batches or in batches regardless. 10% of home grade drives fail in 3 years, 20% in 4...
that is scary. that is reality.
70 TB/sec (Score:2)
There's a point missing here: if data on the internet is increasing at 70 TB/sec, and this is to be backed up redundantly, it needs > 140 TB/sec to be archived somewhere. Where are we going to store all this media? And who is going to test it periodically to ensure it is still readable?
I imagine that almost all this data is useless anyway: archeologists search rubbish heaps to find how people lived, but how many FB pages of cat photos does posterity need? And given the amounts of nonsense, lies, and jus
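The scale in the comment above is easy to work out. This sketch takes the summary's 70 TB/sec at face value, uses decimal units (1 ZB = 10^9 TB), and treats the 20 TB drive size as a purely hypothetical figure chosen for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignoring leap years
TB_PER_ZB = 10 ** 9                    # decimal units: 1 ZB = 1e21 bytes


def yearly_growth_zb(rate_tb_per_s: float, copies: int = 1) -> float:
    """Zettabytes written per year at a given ingest rate,
    multiplied by the number of redundant copies kept."""
    return rate_tb_per_s * copies * SECONDS_PER_YEAR / TB_PER_ZB


def drives_per_second(rate_tb_per_s: float, copies: int,
                      drive_capacity_tb: float) -> float:
    """How many drives of the given capacity fill up every second."""
    return rate_tb_per_s * copies / drive_capacity_tb


if __name__ == "__main__":
    print(f"one copy:   {yearly_growth_zb(70):.2f} ZB/year")
    print(f"two copies: {yearly_growth_zb(70, copies=2):.2f} ZB/year")
    # 20 TB per drive is an assumed, hypothetical capacity.
    print(f"drives/sec: {drives_per_second(70, 2, 20.0):.1f}")
```

At that rate a single copy grows by about 2.2 ZB a year, and two copies by about 4.4 ZB, against the roughly 15 ZB total quoted in the summary, so the question of who stores and re-verifies it all is not a small one.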
Re: (Score:2)
They'll probably conclude that most of the 21st century was just porn.
Data rots even when it's not "missing" (Score:2)
I have some old resumes that were saved with Microsoft Works. I haven't yet found a modern editor that can open them.
I have an old GEnie email archive from the early 1990s...I can open it in a text editor and "kind of" read through them, but it's not easy.
I have old bookkeeping records saved in Microsoft Money format. That too is long gone.
Any data that is not actively maintained...rots. Even if that data is on good old paper.