Data Storage

Avoiding a Digital Dark Age

al0ha writes to recommend a worthwhile piece up at American Scientist on the problems of archiving and data preservation in an age where all data are stored digitally. "It seems unavoidable that most of the data in our future will be digital, so it behooves us to understand how to manage and preserve digital data so we can avoid what some have called the 'digital dark age.' This is the idea — or fear! — that if we cannot learn to explicitly save our digital data, we will lose that data and, with it, the record that future generations might use to remember and understand us. ... Unlike the many venerable institutions that have for centuries refined their techniques for preserving analog data on clay, stone, ceramic or paper, we have no corresponding reservoir of historical wisdom to teach us how to save our digital data. That does not mean there is nothing to learn from the past, only that we must work a little harder to find it."
This discussion has been archived. No new comments can be posted.

  • The question remains (Score:2, Interesting)

    by Anonymous Coward on Tuesday February 23, 2010 @07:53PM (#31252842)

    How much of it is really worth saving? Except The Goatse image and a good RickRoll video I mean...

  • by malkavian ( 9512 ) on Tuesday February 23, 2010 @08:03PM (#31252966)

    About storing data will change. Historically, we've stored on paper, stone, or whatever could be inscribed. The 'backups' for data have been more about attempting to 'inscribe' media with the digital info.
    Perhaps we're entering an era where we'll be trying to keep information 'live' perpetually, with the internet the first attempt at having an active library (though there are currently lots of cracks for information to be lost).

    Many of the laws that overly stymie information flow (DMCA etc.), I think, are just a knee-jerk reaction, in the way printing presses were suppressed and controlled until everyone realised the benefits of having them opened up.

    Still, having the long term offline stores is no bad thing..

  • by jaavaaguru ( 261551 ) on Tuesday February 23, 2010 @08:19PM (#31253164) Homepage

    You mean like I actually went there recently to look at old Geocities, and was shocked that they don't have it all backed up there. has pretty much everything else I've looked for. Any idea why geocities is not there?

  • Re:Won't matter (Score:5, Interesting)

    by Third Position ( 1725934 ) on Tuesday February 23, 2010 @08:24PM (#31253230)

    Our landfills will provide all the info they need.

    Well, I'm not entirely sure of that. If you pick up a stone or a paper with characters on it, you at least have an idea what its purpose was. But 5000 years from now, how does someone interpret a shiny little disk? It might be a long, long time before someone is able to discern its purpose, let alone figure out how it's encoded and how to decode it. And that's even before getting a look at the language, and learning how to translate that.

    That's one advantage of paper, stone and parchment - they don't assume a technical infrastructure in order to use them.

    I have heard that some of the braided ropes left by Mayans might actually be a "written" language. But consider that it's taken us over 500 years to suspect these braids are a form of media, let alone learn to read them, and you can imagine what a future civilization might be confronting when trying to figure out our digital media.

  • by rudy_wayne ( 414635 ) on Tuesday February 23, 2010 @08:24PM (#31253232)

    The Domesday Book was commissioned in December 1085 by King William (aka William the Conqueror, who invaded England in 1066). The first draft was completed in August 1086 and contained records for 13,418 settlements in the English counties south of the rivers Ribble and Tees (the border with Scotland at the time). It is a detailed statement of lands held by the king and by his tenants and of the resources that went with those lands. It records which manors rightfully belonged to which estates, thus ending years of confusion resulting from the gradual and sometimes violent dispossession of the Anglo-Saxons by their Norman conquerors.

    In 1986, at a cost of £2.5 million, the UK compiled the contents of the Domesday Book into electronic form that was stored on laserdiscs. The information stored on the laserdiscs, which is the equivalent of several sets of encyclopedias, is now unreadable because the equipment needed to read the discs is no longer available. Meanwhile the original book is still readable after more than 900 years.

  • by syousef ( 465911 ) on Tuesday February 23, 2010 @08:28PM (#31253282) Journal

    In my own quest to preserve my digital photos, I've created multiple backups on hard disk including a remote backup which gets updated every few months. I use different disks created by different manufacturers and buy new disks every couple of years (but do not throw away old copies).

    I've recently come across another aspect that isn't addressed by the article. Data that is in use in an online copy can be modified (including corrupted). There is no point in copying/propagating data if the data you are copying is damaged. Typically this has happened when I've tried DAM software like Lightroom, which will modify the original file despite claiming to be non-destructive. I have no proof that photos were re-encoded or quality was reduced, but I do know original files were altered, and I want an original unaltered file preserved.

    Most people, when they back up files, do very little verification to ensure the files they are copying today are the same files that were created 5 or 10 years ago. They rely too much on backup software to do this for them, with no attention paid to what's happened to the data between copies. To keep this under control I've started putting checksums on all my photo files, which I check when I create a fresh copy.
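The checksum routine described above could be sketched roughly as follows. This is only an illustration, not the poster's actual setup: the choice of SHA-256 and the helper names `sha256_of`, `build_manifest` and `verify_copy` are assumptions made for the example. The idea is to record a hash for every file once, then compare each fresh copy against that record before trusting it.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest, copy_root):
    """Compare a copy against a manifest; return (path, problem) pairs."""
    copy_root = Path(copy_root)
    problems = []
    for rel_path, expected in manifest.items():
        candidate = copy_root / rel_path
        if not candidate.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_of(candidate) != expected:
            problems.append((rel_path, "changed"))
    return problems
```

An empty list from `verify_copy` means every file in the manifest is present in the copy and byte-for-byte identical to what was hashed originally. Keeping the manifest alongside each backup set lets an alteration made years ago be spotted before it propagates to newer copies.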

    Of course, where my photos are captured in a proprietary format I copy to an open or at least well documented format (typically jpg, sometimes also tif). This is done as soon as I transfer the photos, which are not removed from the camera card until I have 2 additional copies. So I shouldn't have the same issues that the author had, assuming jpg can still be read throughout my lifetime.


  • by Ltap ( 1572175 ) on Tuesday February 23, 2010 @09:01PM (#31253672) Homepage

    is the solution, and this is just one of those problems where throwing money at it actually works - give them more bandwidth, more contributors, and more disk space, and they could work wonders.

  • Re:Won't matter (Score:4, Interesting)

    by ObsessiveMathsFreak ( 773371 ) on Wednesday February 24, 2010 @12:52AM (#31255604) Homepage Journal

    A related but more pertinent point is that no-one right now is able to archive or in most cases obtain anything because of copyrights and DRM.

    I work in academia and I can tell you that future researchers are not going to be able to get their hands on 90%+ of the papers written today, because the private companies that own them will lose the data when they inevitably go bust (or just lose it). It will be one of the huge ironies of history.

  • Re:The fight is lost (Score:3, Interesting)

    by BrokenHalo ( 565198 ) on Wednesday February 24, 2010 @01:42AM (#31255930)
    I have code and documents dating back to 1976 on a HDD on this machine. Until 15 years ago I had it all stored on 800BPI mag tapes, but before I left my last serious "big-iron mainframe" site I transferred it across to floppies. I doubt if I'll ever need the files again, but since they don't make any significant dent in my storage, there's no reason to throw them away.

    I know many historians (in fact my wife is one), and one day someone might be more interested in a perspective on '70s and '80s programming than I am right now. If I throw it out, that information will be gone forever.
  • Re:Won't matter (Score:3, Interesting)

    by jc42 ( 318812 ) on Wednesday February 24, 2010 @12:20PM (#31260414) Homepage Journal

    But 5000 years from now, how does someone interpret a shiny little disk?

    They wouldn't be able to use that stuff because of copyrights and DRM

    You're probably right; copyright extensions will probably eventually last that long.

    But historically, this whole issue is nothing new. A number of historians have pointed out that, for example, our "scientific methods" have been independently discovered by intelligent people in most societies. The "scientific revolution" in the Western world wasn't based on discovery of scientific methods of investigation. Rather, it was the result of the new concept of open publication. Before that, technical knowledge was generally closely held by "guilds", organizations that controlled and restricted access to their specialized knowledge. This is clear in medical areas, where there is all sorts of "folk medicine" that typically turns out to include knowledge of a lot of naturally-occurring drugs. Western medicine surpassed the "medicine men" of other cultures in the late 1900s, because the Western medical people started publishing their results. This slowly produced a distributed body of knowledge that was available to all the practitioners.

    But even in modern Western societies, there's a major exception: Knowledge developed by "private" organizations such as corporations is still closely held, and protected by the legal system. Even the patent system, which generally requires publishing the details of a patent, limits the usability of the information to people who have paid the appropriate license fees. The motive is profit, but the actual effect is very often to block use of the information by others. Trade Secret is, of course, the old guild-style secrecy in a modern form. Information handled this way is routinely lost when the single owner dies or loses interest due to low profits.

    In any case, what's happening with digital records fits right in with the historic secrecy that most societies have had. The Western scientific community is one of the few known cases of free and open sharing of information. You'd think its spectacular success over the past few centuries would have taught us something. But this seems to not be the case. Even the computer industry, which is especially dependent on modern science for its very existence, tends to keep most of its records in closed, proprietary forms that are protected by trade secrecy. We do have a significant "Open Source" subculture whose results have been in line with the rest of the scientific world's. But most of the commercial part of the industry treats the Open Source approach with contempt and does extensive PR to prevent its adoption by the wider culture.

    So it shouldn't be surprising that digitized records are a sinkhole of information. Such data has been and will continue to be quickly lost in the same way that the traditional Guild system lost most of its information within a few generations. If you want information saved, you have to put it into a form that will survive and be readable in the future. Most of our current digital media won't even outlive the people who write to it.

  • Re:Won't matter (Score:3, Interesting)

    by jc42 ( 318812 ) on Wednesday February 24, 2010 @12:47PM (#31260828) Homepage Journal

    These people are talking about on the scale of thousands of years, the point where recovering data is a matter of "What were our ancestors like?" rather than "Oh look, a zip disk!".

    We have had a very recent success story of a case like this. After the Spanish destroyed most of the civilization of Central America, the Mayan writing system became unreadable. There are thousands of "stelae", stone columns covered with writing, all over the area, and they were generally understood to be historic markers. They did contain dates that were readable (since any mathematician will recognize the numbers as such and will understand the notation), but nothing else could be decoded. The writing was cracked in the 1970s, with a lot of help from speakers of the modern Mayan languages. Scholars are now busy decoding all those historic markers and the inscriptions on ruins of old buildings. Most of the original Mayan libraries were burned as "heretical", and only a handful of books survived (ironically smuggled to Europe by Catholic priests who understood their value). One of them is an astronomical reference text that tells us a lot about the capabilities of Mayan astronomers, but we don't have much else of what could have been thousands of other technical works.

    So in that case, we've gone from "Oh look, a carved stone historical marker; I wonder what happened here" to "Hmmm ... On July 27, 1147, there was a battle here in which the forces of so-and-so city won. I wonder where that city was? Does anyone recognize the name?"

    A funnier example: A few years ago, someone published recipes for drinks made from ground chocolate beans and hot peppers (and no sugar), translated from some old writings. There are several companies in Central America now selling the ingredients and instructions for making such potions, which were apparently quite popular with the Mayan upper crust a thousand years ago. They're really macho drinks, guaranteed to fry your tonsils.

    Both of these do tell us a lot more about those societies than just "Oh look, a stone pillar covered with writing!"

    (In case anyone wants a good scholarly project to work on, people are also tackling the pre-Mayan writing systems, e.g. the Olmec writing. They're all related, so cracking Mayan writing is helping decode the others. There are images online, so interested people anywhere in the world can get involved. That should satisfy just about anyone's nerdy desire to get involved in decoding obscure encoding systems. ;-)
