Data Storage

Avoiding a Digital Dark Age 287

Posted by kdawson
from the you-must-remember-this dept.
al0ha writes to recommend a worthwhile piece up at American Scientist on the problems of archiving and data preservation in an age where all data are stored digitally. "It seems unavoidable that most of the data in our future will be digital, so it behooves us to understand how to manage and preserve digital data so we can avoid what some have called the 'digital dark age.' This is the idea — or fear! — that if we cannot learn to explicitly save our digital data, we will lose that data and, with it, the record that future generations might use to remember and understand us. ... Unlike the many venerable institutions that have for centuries refined their techniques for preserving analog data on clay, stone, ceramic or paper, we have no corresponding reservoir of historical wisdom to teach us how to save our digital data. That does not mean there is nothing to learn from the past, only that we must work a little harder to find it."
This discussion has been archived. No new comments can be posted.

  • Won't matter (Score:4, Insightful)

    by countertrolling (1585477) on Tuesday February 23, 2010 @06:50PM (#31252808) Journal

    Our landfills will provide all the info they need.

    • Re: (Score:3, Insightful)

      by rubycodez (864176)

      right on, and for that matter it is silly to say we don't have paper records any more. We have even more of them than ever before. Receipts, leases, mortgages, contracts, invoices, manifests, packing slips, explanation of benefits (EOB), licenses, warranties, guarantees, manuals....fuck, if anyone thinks digital age means less just order a single piece of software on Amazon and by the time you take everything out of the box you'll have generated at least eight items on the list I just mentioned. God damn!

      • Re: (Score:3, Insightful)

        Unfortunately, as someone who's tried to challenge a tax audit for old expenses, I can tell you that a lot of those records do not keep well. Many poor quality printouts are not likely to last even a few years due to the poor quality of the paper and the ink. This is especially true of receipts, which are on the cheapest printers possible.

    • Re:Won't matter (Score:5, Interesting)

      by Third Position (1725934) on Tuesday February 23, 2010 @07:24PM (#31253230)

      Our landfills will provide all the info they need.

      Well, I'm not entirely sure of that. If you pick up a stone or a paper with characters on it, you at least have an idea what its purpose was. But 5000 years from now, how does someone interpret a shiny little disk? It might be a long, long time before someone is able to discern its purpose, let alone figure out how it's encoded and how to un-encode it. And that's even before getting a look at the language, and learning how to translate that.

      That's one advantage of paper, stone and parchment - they don't assume a technical infrastructure in order to use them.

      I have heard that some of the knotted cords (quipu) left by the Incas might actually be a "written" language. But consider that it's taken us over 500 years to suspect these cords are a form of media, let alone learn to read them, and you can imagine what a future civilization might be confronting trying to figure out our digital media.

      • by JustOK (667959) on Tuesday February 23, 2010 @07:34PM (#31253352) Journal

        They wouldn't be able to use that stuff because of copyrights and DRM

        • Re:Won't matter (Score:4, Interesting)

          by ObsessiveMathsFreak (773371) <obsessivemathsfreak@nosPAm.eircom.net> on Tuesday February 23, 2010 @11:52PM (#31255604) Homepage Journal

          A related but more pertinent point is that no-one right now is able to archive or in most cases obtain anything because of copyrights and DRM.

          I work in academia and I can tell you that future researchers are not going to be able to get their hands on 90%+ of the papers written today because the private companies that own them will lose the data when they inevitably go bust (Or just lose it). It will be one of the huge ironies of history.

        • Re: (Score:3, Interesting)

          by jc42 (318812)

          But 5000 years from now, how does someone interpret a shiny little disk?

          They wouldn't be able to use that stuff because of copyrights and DRM

          You're probably right; copyright extensions will probably eventually last that long.

          But historically, this whole issue is nothing new. A number of historians have pointed out that, for example, our "scientific methods" have been independently discovered by intelligent people in most societies. The "scientific revolution" in the Western world wasn't based on discovery o

      • Re: (Score:3, Insightful)

        by Redlazer (786403)
        While I agree to some extent, an advanced culture is advanced not just with its technology, but also in the way it thinks.

        I would suspect that, in the future, our ability to understand and figure things out will be far higher than it is today. The question of what a DVD is for is especially clear - not just to us now; I would imagine that even someone with no preconceived notion would be able to piece together the clues to what it might have been used for.

        A reflective on one side, perfectl

      • I like the method shown by the George Pal version of "The Time Machine". (I think it was George Pal's - it's the one that debuted a young, red-haired Michael Doohan). Little rings on a round table top. You can't help but want to spin the first one you see. Brilliant piece of UI design, I think, although difficult to accomplish without an opposable thumb (it's anthropocentric, of course - our robot overlords might not have the knack of spinning a ring on a table) but I doubt that's a problem if we limit
      • It's not about the texts. Archaeologists learn [arizona.edu] a lot from trash [learnnc.org] :

        "The unusable or unwanted remnants of everyday life end up in the garbage. By studying what people have thrown away, archaeologists can learn a great deal about a culture. This is true not only of prehistoric peoples who left no written record about their lives, but also of people today. Archaeologist Bill Rathje studies the garbage of Americans. He has learned many things about the relationships of human behavior and trash disposal, informati

  • The question remains (Score:2, Interesting)

    by Anonymous Coward

    How much of it is really worth saving? Except The Goatse image and a good RickRoll video I mean...

    • How much of it is really worth saving?

      Pictures of my grandmother from the '20s? Priceless. Mostly useless to anyone but our family, but there ya go.
  • How many of you have digital files from 15 years ago that you can read today? 20 years? There was no DMCA back then, now just imagine the future....

    • by Bluesman (104513)

      I do. The cool thing is that they all fit in a tiny portion of my portable hard drive.

      It's amazing that the entirety of everything I've produced since high school can fit on a single $100 device, with plenty of room to spare.

    • I have files from at least as far back as 1982 (Z80 source code). Of course they are not on their original media (which might have been 8" floppies, I really don't recall.)
      • Re: (Score:3, Interesting)

        by BrokenHalo (565198)
        I have code and documents dating back to 1976 on a HDD on this machine. Until 15 years ago I had it all stored on 800BPI mag tapes, but before I left my last serious "big-iron mainframe" site I transferred it across to floppies. I doubt if I'll ever need the files again, but since they don't make any significant dent in my storage, there's no reason to throw them away.

        I know many historians (in fact my wife is one), and one day someone might be more interested in a perspective on '70s and '80s programming
    • by peragrin (659227)

      I do. While they are only 12 years old, I have tax forms from 1998 in PDF format that open just fine. I efiled those taxes too.

      Of course I worried about that 8 years ago, and switched all my files out of Excel and Word formats to text, RTF, and at the time OpenOffice. Now they are in ODF, since ODF and PDF formats are easy to read by many programs, and thus open. Emails are stored in mbox. All of it is stored in multiple locations, with encryption (and decryption software) used as needed.

      I moved all my data beca

    • by iggymanz (596061)

      And what parts of those digital records would be *important* information? c'mon, you are talking about personal crap. Important records (birth certificate, medical records, academic records, insurance, account statements) will be on paper

      • Re: (Score:3, Insightful)

        by turbidostato (878842)

        "And what parts of those digital records would be *important* information? c'mon, you are talking about personal crap."

        What do you think History is but a lot of "personal craps" tied together?

    • by Kenshin (43036)

      Not much of note. A few old websites I made, a few 3D renders (can't find the models, though.) I didn't produce much worth holding onto back then.

      But pretty much all my stuff from 10~12 years back until now stays on my hard disk, and moves from new disk to new disk as I upgrade. All of my music and photos are managed by library apps, and I have automatic backups at least weekly. (Backing up is more convenient now, since I recently moved to a Mac and have Time Machine set to do it when I plug in my external

    • /raises hand.

      I still have the "utils" directory from my '286 even though the programs have been obsolete for a 'coon's age. (A 'coon in captivity, that is.) Every company I've worked at has had a "data refresh" plan of some sort where we move old archived data to new media. And only one of those companies ever mined that data for a useful purpose. The rest kept it "just in case".

      I have a feeling the problem isn't going to be that we retain very little important information but that, of the vast mountain

      • by Locklin (1074657)

        You are talking about files that you have personally retained a couple decades. I can go to the library and find transcripts of Darwin's, Galileo's or Da Vinci's notes and letters.

    • by Locklin (1074657)

      Exactly. It's not a technological problem it's a legal problem. The institutions that preserve printed material (mostly libraries) would be happy to help preserve digital information -if it was possible. Unfortunately nearly all digital content cannot be copied legally, and thus cannot be effectively archived by libraries. It takes a team of lawyers to do what archive.org does with web pages, and forget anything multimedia.

    • I still have a working Atari and a few dozen game cartridges, if those count...
  • by Anonymous Coward

    to ensure this never happens. This is the same reason why DVDs and Blu-rays will never work in 100 years' time.

    DRM will destroy any record of our current culture, but looking around at the abyss, I really have to say it's for the better.

    But I already feel bad for the eventual people that will spend far too much time trying to recover "scary movie part 15" or some other 'gem' from our time. But much like 'abandonware' and other areas of trying to preserve machine code, lawyers will always race in to make sure

  • by MagikSlinger (259969) on Tuesday February 23, 2010 @06:58PM (#31252900) Homepage Journal
    The main way ancient writing reached us is because someone copied it. Lots of copies. Sometimes translated into another language and back, for example, a lot of Greek learning went into Arabic and came back out into Latin or Greek. With all the copy protection and encryption on our media today, can we ever copy the data and be able to decipher it again?
    • by a whoabot (706122)

      "a lot of Greek learning went into Arabic and came back out into Latin or Greek."

      Can you give me an example of a significant text? I'm pretty sure it is a myth that lots of Greek learning has gone through this process. I see the claim made a lot, but I've never come across a text which has done this (the philosophers, the dramatists, the historians, the lyric poets, all seem to come from the original Greek).

      • by lennier (44736)

        I don't know about Greek, but it's a good thing that at least the authentication server for the Epic of Gilgamesh is still online.

      • Re: (Score:3, Informative)

        Probably the most significant texts to undergo this process were Ptolemy's Almagest and Euclid's Elements; both had been lost to Western Europe, and were thus translated in the Middle Ages to Latin from Arabic by Gerard of Cremona and Adelard of Bath, respectively. I believe in both cases the original Greek texts were eventually recovered by the West and used for later direct translations, but for a while Western Europe knew Hipparkhos/Hipparchus as "Abrachir."
      • by Ltap (1572175) on Tuesday February 23, 2010 @07:59PM (#31253654) Homepage
        The problem is that very few identifiably Greek writings survive. In ancient times, copying was a bit like playing telephone - writing at the time was very politicized, so scribes would often alter works while copying them, mostly to give a local slant or simply to change the names. This makes it frustrating to trace things like legends (see: Noah's Ark/Epic of Gilgamesh and its infinite variations with every other culture that existed nearby). A lot of Greek and Roman writings are now quite simply lost for good, but almost certainly inspired works that aren't lost. For instance, the Odyssey and the Iliad were originally just two parts of the epic story of Troy (out of, AFAIK, four or five parts in total), and the set of works that we derive most of our knowledge of Rome from, Ab Urbe Condita, is only partially preserved - it was a set that chronicled the history of Rome from its founding to when the volumes stopped being produced, and there were hundreds, enough to fill entire libraries. It was only in the Renaissance that anyone tried to assemble a collection, and we've only been able to come up with about 30 - if we had the full set, we would know a great deal more about Rome than we do now.
      • I see other people have responded, but a lot of the mathematical texts came that way. For example, Euclid's Elements was the most famous of these. There were a lot of books on the geometry and mathematical knowledge of that age, as well as most of the ancient astronomy. If I still had my text book from the "History of Mathematics" class I took, I could give you specific names and titles.

        Almost all of today's surviving texts of Archimedes came via Arabic: http://en.wikipedia.org/wiki/Archimedes#Writings [wikipedia.org]

        F

    • by DrYak (748999) on Tuesday February 23, 2010 @08:22PM (#31253898) Homepage

      The main way ancient writing reached us is because someone copied it. Lots of copies. {...} With all the copy protection and encryption on our media today, can we ever copy the data and be able to decipher it again?

      (And as another example of copies being important for preservation: Fritz Lang's Metropolis [wikipedia.org] recently had another 30 minutes of its missing footage recovered from a copy located in Argentina)

      After a long enough time, virtually any DRM measure ends up being broken. What only matters is time, resources and some clever tricks (to avoid waiting until universe heat-death while bruteforcing a 4096-bit key).
      So DRM has only 2 direct effects :
      - it annoys legitimate users everywhere with no practical reason.
      - it forces the basement-dwelling teen with too much free time on their hands to wait until 2 weeks before official launch date, instead of 3 weeks before, because it took the pirates 1 week to find a way to break the DRM.

      This implies 2 results :
      - 99.99% of pirate users will never ever interact with the DRM nor be affected by it in any way.
      - The important part: DRM-protected data will get copied, eventually and a lot. Lots of copies will exist and virtually 99.99% of these copies will be the "pirated" copies. Be it legal backup or unlicensed copies.

      So in the end, the DRM-protected data will survive, only not the DRM version itself, but the DRM-free version as found on The Pirate Bay and similar. Case in point : Classics emulation.
      Most of the companies which produced the games we played as children have now gone belly up. Of the few remaining, few have kept the assets of their old productions. Few of them are interested in doing anything with these old assets. The few who do generally do modern re-imaginings and re-interpretations, rather than re-issuing the old.

      So in short, if you ever wanted to pull back some of your childhood memories out of the grave, don't count on the original companies.
      Sometimes you can find still-working vintage equipment and media - but these will eventually break.
      Today, the biggest part of these oldies are available ... as images of pirated disks. It's practically sure that, if in 2010 you want to play the same game as in 1985, you'd probably see a cracktro in the beginning.

      All your Commodore C64, Amiga, etc. favourite games are currently best sourced from download sites which contain warez copies that were carried over from that era, while at the same time the companies went belly up and/or let their assets rot.

      So, in 25 years, when most of the current media companies have either disappeared, or completely forgotten about today's media, your children's best way to find a copy of them to remember fond memories, would be finding a copy which will be the digital descendant of what's today on pirate bay.
      Yes, **AA, today's EVIL pirate, might be tomorrow's heroic archivist.

      In 25 years, when the current maker of

    • Re: (Score:3, Funny)

      by agrif (960591)

      I'm not sure where I heard this idea, but it bears repeating:

      Future historians will hate us, with a passion, because we encrypt even the most banal things. We encrypt movies, for God's sake! Where's the justification in that? We're robbing the future of our culture, even from things like movies with talking hamsters!

  • Quick... (Score:3, Funny)

    by eegad (588763) on Tuesday February 23, 2010 @06:59PM (#31252910)

    Everybody print out all their emails!!!

  • Those that forget history are doomed to repeat it - but these days, it seems that there's more and more effort put into actively avoiding learning from history.

    Or maybe I've just hit that age when The Kids ought to get off My Lawn.

    • by MrEricSir (398214)

      History is like software, it needs maintainers or it's doomed to disappear in the next version.

      • by iggymanz (596061)

        you mean "needs rewriters and revisionists."

        "History" is written by the winners to appease their benefactors, as they say.....

  • by malkavian (9512) on Tuesday February 23, 2010 @07:03PM (#31252966) Homepage

    About storing data will change. Historically, we've stored on paper, stone, or whatever could be inscribed. The 'backups' for data have been more about attempting to 'inscribe' media with the digital info.
    Perhaps we're entering an era where we'll be trying to keep information 'live' perpetually, with the internet the first attempt at having an active library (though there are currently lots of cracks for information to be lost).

    Many of the laws that overly stymie information flow (DMCA etc.), I think, are just a knee jerk reaction in the way printing presses were suppressed, and controlled until everyone realised the benefits of having them opened up.

    Still, having the long term offline stores is no bad thing..

    • by CharlyFoxtrot (1607527) on Tuesday February 23, 2010 @09:26PM (#31254488)

      Many of the laws that overly stymie information flow (DMCA etc.), I think, are just a knee jerk reaction in the way printing presses were suppressed, and controlled until everyone realised the benefits of having them opened up.

      Barbarians have always burned down libraries. No reason to think they'd stop doing that just because they wear ties these days.

  • by Eravnrekaree (467752) on Tuesday February 23, 2010 @07:04PM (#31252976)

    It is indeed a big problem. The problem was illustrated recently when Yahoo suddenly pulled the plug on Geocities, wiping out a vast cultural archive that went back to the early days of the internet. A lot of valuable information was lost as a result. Yahoo's blatant arrogance caused me to refuse to ever use any of their products again. Geocities was actually a fairly nice service; people often criticised it because of the ads, but how else do you pay to continue to offer a free service? The loss of Geocities was a perfect example of the need for a permanent store or online archive of information, personal websites and so on that can be maintained as a cultural legacy and informational resource.

    • If only there were such a thing [archive.org]...
      • Re: (Score:3, Insightful)

        by Eravnrekaree (467752)

        I checked archive.org backups of Geocities. Half of the sites are not backed up correctly. Mine was never backed up, it seems, at all. With most sites 90% of the files are missing. Is archive.org the solution? Apparently not.

        • Re: (Score:3, Interesting)

          by Ltap (1572175)
          Archive.org is the solution, and this is just one of those problems where throwing money at it actually works - give them more bandwidth, more contributors, and more disk space, and they could work wonders.
    • by jaavaaguru (261551) on Tuesday February 23, 2010 @07:19PM (#31253164) Homepage

      You mean like archive.org? I actually went there recently to look at old Geocities, and was shocked that they don't have it all backed up there. Archive.org has pretty much everything else I've looked for. Any idea why geocities is not there?

      • Interesting that you mention that. I have a Geocities site so I knew of the Archive.org thing that was supposed to back up the site. I checked archive.org backups of Geocities. Half of the sites are not backed up correctly. Mine was never backed up, it seems, at all. With most sites 90% of the files are missing. Is archive.org the solution? Apparently not.

        • Re: (Score:3, Insightful)

          by Ltap (1572175)
          Part of the problem is of manpower - geocities was just so massive, and Yahoo gave them very little time to archive anything properly, so most of it was simply a dash to copy as much as they could before it was deleted. When you look at public domain audio, video, and texts, you'll see that things have been done much better.
      • by lennier (44736) on Tuesday February 23, 2010 @07:51PM (#31253562) Homepage

        Perhaps because others were doing it. A number of independent projects tried to back up Geocities, and may have between them recovered most of the data.

        * http://geociti.es/ [geociti.es]
        * http://reocities.com/ [reocities.com]
        * http://www.archiveteam.org/ [archiveteam.org]

  • by Animaether (411575) on Tuesday February 23, 2010 @07:04PM (#31252978) Journal

    Seriously, Slashdot.. until there's a revolutionary insight into this matter.. quit posting these stories ad nauseam.

    For further commentary, see previous stories... here's one.. it's from september 2009 and -nothing has changed-.

    http://ask.slashdot.org/story/09/09/29/1646251/Archiving-Digital-Artwork-For-Museum-Purchase [slashdot.org]

    • by iggymanz (596061)

      especially that the main insight is that 99% of digital records are useless crap. Just like it won't matter if archaeologists never find 99.9999999% of our cities, when you've seen one Starbucks next to a McDonalds next to a Walmart, you've more than seen them all. The ditto mark will be the most used character recording our drivel... don't even get me started on our mostly devoid of talent "music" and "art" (is a frontal lobotomy prerequisite to being a rap star?)

  • You'll have to go to .wav (not FLAC)--just straight bits. This does away with both copy-protection and compression.

  • by enoz (1181117) on Tuesday February 23, 2010 @07:08PM (#31253018)

    http://archive.org/ [archive.org]

    They've already got a copy of your Geocities sites from the first Digital Dark Age.

    • ...One Site to find them, One Site to bring them all and in the Darkness bind them...
    • They aren't archiving everything the way they used to... I've been trying to convince them to archive a couple of sites for 2 years now and they still haven't shown up. I think they're allergic to Wordpress weblogs...
  • To forget is good (Score:3, Insightful)

    by Anonymous Coward on Tuesday February 23, 2010 @07:09PM (#31253026)

    IMHO we'll find that our problem is that we drown in a sea of useless information because we can't find the islands of relevance. Trying to archive everything will only lead to failing to archive anything. On the other hand I doubt that we'll lose much important information despite failing at organized preservation attempts, because important information is copied all the time, which is the only way for information to survive quickly changing technologies and file formats anyway.

    In a more philosophical light, I think that forgetting is good for us. It frees us from the constraints of our past and makes way for new ideas. Archives are backwards-facing, but we all live in the future, all the time.

  • by presidenteloco (659168) on Tuesday February 23, 2010 @07:10PM (#31253030)

    I think that many people are failing to appreciate the longevity of information preservation
    that cloud computing (more specifically, redundant, geographically distributed network storage) can bring.

    If we get the protocols right, and insist on open standards for data interchange, we can obtain
    properties such as:

    Data bundles that know how to move themselves to more recently commissioned, and/or more
    reliable hosts.

    Data bundles that know how to check in with copies of themselves, to make sure there are enough of
    them alive, and that they are adequately geographically distributed, at every given moment.
    If not, then more baby copies of the same data would be produced and stored elsewhere automatically.

    There are other issues to longevity of course, like maintenance of software that understands different
    versions of data etc. Not trivial but very doable.

    How long an individual disk or SSD or stone tablet lasts is COMPLETELY IRRELEVANT to
    the prospects for information longevity, given the network, and new levels of automated distribution
    that will take place on it going forward.
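The "check in with copies of themselves" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Bundle` class, the `plan_repairs` function, the host and region labels, and the default targets of three copies in two regions are all hypothetical and do not come from any real storage protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Bundle:
    """A data bundle plus the hosts believed to hold a copy (hypothetical model)."""
    name: str
    # host name -> region label; both are made-up identifiers
    replicas: dict = field(default_factory=dict)


def plan_repairs(bundle, hosts, min_copies=3, min_regions=2):
    """Return the hosts that should receive a new copy of the bundle.

    `hosts` maps every currently reachable host to its region. Replicas on
    hosts that are no longer reachable are treated as lost.
    """
    live = {h: r for h, r in bundle.replicas.items() if h in hosts}
    regions = set(live.values())
    new_hosts = []
    # Prefer hosts in regions that do not hold a copy yet, then sort by name.
    candidates = sorted(
        (h for h in hosts if h not in live),
        key=lambda h: (hosts[h] in regions, h),
    )
    for host in candidates:
        enough_copies = len(live) + len(new_hosts) >= min_copies
        if enough_copies and len(regions) >= min_regions:
            break
        new_hosts.append(host)
        regions.add(hosts[host])
    return new_hosts


hosts = {"a": "eu", "b": "eu", "c": "us", "d": "asia"}
letters = Bundle("letters", {"a": "eu", "x": "eu"})  # host "x" has vanished
print(plan_repairs(letters, hosts))
```

A real system would also have to verify that each surviving copy is intact, not merely present, and would need the open interchange formats the post asks for so that any host can read what it stores.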

    • How long an individual disk or SSD or stone tablet lasts is COMPLETELY IRRELEVANT to the prospects for information longevity, given the network, and new levels of automated distribution that will take place on it going forward.

      I don't know that I agree with that.

      Compare, for example, letters written during the Civil War [dsu.edu], with email messages sent and received by those involved in either of the Gulf Wars. Which do you think had, at the time they were written, a better chance of being available to future hist

      • by Ltap (1572175)
        It depends on how the senders and receivers think about the information. I know people who kept every postcard and letter they'd ever received - I doubt you could say the same about that with email. People still just don't consider email a serious medium.
    • by lgw (121541)

      Information "in the cloud" will disappear the moment you stop paying for it. Corporate information in the cloud will come with a destruction date (as do paper corporate records in a storage facility, so no real difference there).

      • Well, one thing that will have to happen is the establishment of "public library" clouds. I guess you could imagine these being funded by various governments or non-profit associations.

        Or you could go with the massively P2P model in which the data is stored, in little fragmented encrypted chunks, on millions of edge devices on the net. i.e. peoples' personal computers, en masse, each contributing a tiny bit to the perpetual storage cloud.

    • Yeah, what could go wrong...

      Cloud is nice, and I have been very appreciative of Yahoo and Google archiving my e-mail for the last 10+ years without losing any (as opposed to my local copies), but a massive redundant system is no guarantee of future service in the face of bankruptcy, war, etc.

      The post Roman era dark age was brought about by the collapse of society - the next time we manage to do that (and, in geologic time, it's coming soon), it's going to be quite the mess. At least there wasn't much
    • by guspasho (941623)

      This assumes that the information age lasts forever. Technology continuously improves and human civilization survives without interruption. And most importantly, that information survives brief or even extended periods of irrelevance.

      Basically, your answer to how to survive a digital dark age is to assume that one will never occur?

  • Far outlasts stone, and if you did it right I'll bet you could get nearly 1Mbit per card without running into the problems of Lace Cards [wikipedia.org]

    • Re: (Score:3, Insightful)

      by lgw (121541)

      The army had a program to design a means of storing data in case of really being nuked back into the stone age. They chose punched metal tape. Most plastic doesn't last long when exposed to sunlight or weather, and the downside of a card deck is obvious the first time you drop one down the stairs. It's a clever idea really, since you can read punch tape manually if you have to, and it's far faster than cards.

  • by BluBrick (1924) <blubrick @ g m ail.com> on Tuesday February 23, 2010 @07:13PM (#31253082) Homepage

    We will naturally make multiple copies of everything we consider important, continually transcribing important data onto the latest generation data storage media. (Consider what was the very first publication printed on Gutenberg's big invention.) Unfortunately, that's not necessarily what will be considered important many generations into the future.

    I have every confidence that, far into the future, we will have or be able to develop the capability to read any media we preserve today. The problem then becomes how to determine what data we should preserve now rather than how to preserve it. What do we know now that will be important and useful to someone 10^n years from today?

  • Not so hard (Score:3, Funny)

    by T Murphy (1054674) on Tuesday February 23, 2010 @07:17PM (#31253130) Journal
    Just put a massive data server in a spaceship and accelerate it near the speed of light. Data loss would be slowed enough that it would be negligible, and if we have to retrieve anything it should have a fast enough processor to respond to a request in a timely fashion and send off a pre-made copy of the needed data (as it may take too long to copy petabytes at near light speed).

    This should work out perfectly- by the time we have the technology to do this, today's worthwhile material should finally be coming out of copyright.
  • by rudy_wayne (414635) on Tuesday February 23, 2010 @07:24PM (#31253232)

    The Domesday Book was commissioned in December 1085 by King William (aka William the Conqueror, who invaded England in 1066). The first draft was completed in August 1086 and contained records for 13,418 settlements in the English counties south of the rivers Ribble and Tees (the border with Scotland at the time). It is a detailed statement of lands held by the king and by his tenants and of the resources that went with those lands. It records which manors rightfully belonged to which estates, thus ending years of confusion resulting from the gradual and sometimes violent dispossession of the Anglo-Saxons by their Norman conquerors.

    In 1986, at a cost of £2.5 million, the UK compiled the contents of the Domesday Book into electronic form that was stored on laserdiscs. The information stored on the laserdiscs, which is the equivalent of several sets of encyclopedias, is now unreadable because the equipment needed to read the discs is no longer available. Meanwhile the original book is still readable after more than 900 years.

    • Re: (Score:3, Informative)

      by tomhudson (43916)
      We can scan in the surface pits of the laserdisc at high-enough resolution to decode the bit patterns - we no longer need the original readers.
    • I invite people to read more about your inaccurate statement. Copy paste much?

      http://www.si.umich.edu/CAMILEON/domesday/domesday.html [umich.edu]

      It's a NEW Domesday, not a scan of the old one. Emulating old computers is not that hard. COPYRIGHT seems to be the real problem.

    • by elronxenu (117773) on Tuesday February 23, 2010 @11:21PM (#31255422) Homepage

      Because they forgot key parts of the process:

      • Keep it simple
      • Make lots of copies which are readily available
      • Keep converting to new formats over the years

      The UK fouled up by inventing new proprietary storage formats which needed custom hardware and software to read and process the data. The laserdisc needed a special laserdisc player and a BBC Micro. The BBC who produced this were years ahead of their time and had to invent a lot of stuff. Unfortunately the rest of the world invented a lot of different stuff, which is what we use today.

      And how many of these systems were produced? I don't know, but they cost 4000 pounds each which is a significant investment for a school and certainly the high price reduced the number of items which were sent into the community.

      Even though we have extracted the data from the original formats (and also obtained improved images by re-mastering original video footage) it seems that one of the main impediments to putting this data online is copyright - the contents of the 1986 project won't be out of copyright until 2090!

      The above two points come together with "keep converting to new formats". If your stuff is all proprietary, it may be hard to convert to new formats. If your stuff is copyrighted, you may be able to convert it but you can't distribute it, and widespread distribution is one of the requirements of effective data preservation.

      The data which was produced in 1986 wasn't lost and won't be lost. People are working with it and upgrading it. However, you won't be able to see it, primarily due to the shortsightedness of the original project.

      So loss of digital data is not so much a technical problem, more a social problem, of shortsightedness in creation, distribution and copyright.

      Kinda like the BBC's lost videotapes of Monty Python (or was it Dr Who?) ... priceless recordings were allowed to degrade and become unusable, were thrown away, or were overwritten ("media re-used"). I don't mean to point the finger only at the BBC - NASA did it too. Lack of foresight, folks.

  • by syousef (465911) on Tuesday February 23, 2010 @07:28PM (#31253282) Journal

    In my own quest to preserve my digital photos, I've created multiple backups on hard disk including a remote backup which gets updated every few months. I use different disks created by different manufacturers and buy new disks every couple of years (but do not throw away old copies).

    I've recently come across another aspect that isn't addressed by the article. Data that is in use in an online copy can be modified (including corrupted). There is no point in copying/propagating data if the data you are copying is damaged. Typically this has happened when I've tried DAM software like Lightroom, which will modify the original file despite claiming to be non-destructive. I have no proof that photos were re-encoded or quality was reduced, but I do know original files were altered, and I want an original unaltered file preserved.

    Most people, when they back up files, do very little verification to ensure the files they are copying today are the same files that were created 5 or 10 years ago. They rely too much on backup software to do this for them, with no attention paid to what's happened to the data between copies. To keep this under control I've started putting checksums on all my photo files, which I check when I create a fresh copy.
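
    The checksum routine described above might look roughly like the following. This is only a sketch under assumptions: the choice of SHA-256 and the helper names (sha256_of, build_manifest, verify_copy) are illustrative and not the poster's actual tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Return relative paths that are missing or altered in the copy."""
    problems = []
    for rel_path, expected in manifest.items():
        candidate = copy_root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```

    The manifest would be built once from the known-good originals and kept alongside them; an empty list from verify_copy means the fresh copy matches the manifest byte for byte.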

    Of course where my photos are captured in a proprietary format I copy to an open or at least well documented format (typically jpg, sometimes also tif). This is done as soon as I transfer the photos, which are not removed from the camera card until I have 2 additional copies. So I shouldn't have the same issues that the author had, assuming jpg can still be read throughout my lifetime.

    --
    Sammy

    • by rubycodez (864176)

      bad choice! jpeg is lossy format, information is deliberately dropped to make an approximate reproduction!

      you're like the guy in the Indiana Jones movie who drinks from a fancy chalice, has the flesh and guts dissolve and burn from his bones: "...he chose....poorly....."

      really, if you value your work do a little research, maybe standard such as "TIFF Revision 6.0 Final" or similar should be used, and perhaps with widely known and well documented lossless compression.

      • bad choice! jpeg is lossy format, information is deliberately dropped to make an approximate reproduction!

        Many cameras only capture in a lossy format such as jpg. Even those that have RAW sometimes use lossy RAW. Losses only occur once per save. So to mitigate you don't modify files repeatedly. If there is a need to do this, go back to the original, save to TIFF and edit from there. So long as you have the original preserved you can always reapply any edits.

        you're like the guy in the Indiana Jones movie who drinks from a fancy chalice, has the flesh and guts dissolve and burn from his bones: "...he chose....poorly....."

        I can't tell if you're trolling or just being melodramatic.

        really, if you value your work do a little research, maybe standard such as "TIFF Revision 6.0 Final" or similar should be used, and perhaps with widely known and well documented lossless compression.

        TIFFs are a poor choice unless multiple edits are going to be made. They slow down current hardwa

        • by rubycodez (864176)

          was being melodramatic for fun, but I've had photos butchered by subsequent moving from one jpeg software to another.

          Yes, Windows XP will choke on your little ~52 megabyte TIFF. Real operating systems won't, Mac OSX with sufficient RAM and good software and there's no problem at all. Linux does ok too, though available free software not as high quality.

      • by swillden (191260)

        bad choice! jpeg is lossy format, information is deliberately dropped to make an approximate reproduction!

        That's the SECOND bad choice!

        The FIRST bad choice is taking digital photos! They're inherently limited approximations of the actual scene; numerous issues with optics, internal filters, sensors and digital post-processing lose tremendous amounts of information.

        Actually the ZEROTH bad choice is taking photos at all! Film has its own problems, and not only that, any single-lens image capture device automatically discards all depth information!

        Seriously, the tiny amount of information discarded by JPEG

  • by drDugan (219551) * on Tuesday February 23, 2010 @07:29PM (#31253288) Homepage

    we are generating data far, far faster than we can save. We have for some time, and while trends for storage are catching up, we will always be able to generate more than we store, as a function of how computing and communications work.

    So what to save? The Director of the NLM had a unique insight on this exact question: [paraphrasing] "What is used, is saved." Basically, it's the utility of information: the information that people find useful and actually use is the best proxy for long term value. The good thing is that all people are motivated to store and maintain the data they find useful, or that their constituents or customers desire. As long as people keep wanting data, it will be stored and available.

    This is a very different situation to real-world archeology. In the digital, connected world we can access data today once it's publicly available, evaluate it and use it if we want. There is no dust that covers old data, it does not get buried...

    • by guspasho (941623)

      What we preserve based on our present biases may not be what interests future archeologists. It may skew their perception of us as well. Imagine if all they found of us was the porn.

  • by drDugan (219551) *

    This is what counts for science nowadays?

    http://www.americanscientist.org/include/popup_fullImage.aspx?key=vo50G9YwnF6SwlOk2usL5R9EyqRLsNX+YiPzweX/0ZsH0IeSOOXIBip7qwN2/ZRY [americanscientist.org]

    Look carefully at the 'digital encoding' of the "simple tone" sine wave. ??? Really? What encoder is that?
    cf. http://en.wikipedia.org/wiki/Fourier_Transform [wikipedia.org]

    • by imsabbel (611519)

      Glasshouse, meet stone.

      Or in other words: Are you functionally retarted?

      Look up the meaning of "digital". Hint: it has nothing to do with all the stuff happening if you _compress_ digital signals.

  • This topic involves two vastly different things:

    - File formats - easy - just make sure everything is stored in an open format, or something so ubiquitous it's as good as an open format (odt, txt, jpg, pdf, csv, ogg, etc) and it will be readable forever.

    - Physical media - this is the risk - most new machines these days can't read 3 1/2" floppies, let alone anything older, but so long as you migrate contents of your old physical media onto new media formats - AND you have multiple copies of important stuff - t

    • by enoz (1181117)

      - Physical media - this is the risk - most new machines these days can't read 3 1/2" floppies, let alone anything older, but so long as you migrate contents of your old physical media onto new media formats - AND you have multiple copies of important stuff - that shouldn't be a problem.

      I get your point but you've used a terrible example. If you NEED to read a 3 1/2" floppy, then you can go and buy a new or used floppy drive for under $10. Most mobos still come with a floppy interface, just the drive is not bundled because most people don't use them.

  • With all the pushing by law enforcement for permanent archiving of everybody's web use the problem will solve itself!

    Rah! Rah! for terrorists - they hate our freedom but they have saved our culture from fading from history! [livingwithanerd.com]

  • Virtual machines really eliminate a lot of those concerns. But what we really have to worry about is silent bit rot. I've found a few old files of mine that are corrupted. Not cool. ZFS and drobos... I don't really see a good end-user ready backup system that verifies data integrity.
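
    Without a checksumming filesystem, one rough way to spot silent bit rot is to compare redundant copies and flag whichever one disagrees with the majority. The sketch below assumes at least three copies are available and that each fits in memory; the function name find_rotted_copies is hypothetical, and this is not how ZFS or a Drobo works internally.

```python
import hashlib
from collections import Counter


def find_rotted_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the names of copies whose digest disagrees with the majority.

    Raises ValueError when no digest is held by more than half of the
    copies, because then there is no way to tell which copy is intact.
    """
    if not copies:
        raise ValueError("no copies supplied")
    digests = {
        name: hashlib.sha256(data).hexdigest() for name, data in copies.items()
    }
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        raise ValueError("no majority; cannot tell which copy is intact")
    return sorted(name for name, d in digests.items() if d != majority_digest)
```

    With only two copies a mismatch can be detected but not attributed, which is why the function refuses to guess in that case.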

  • This is something that I've seriously taken a look into on the personal side of things. I look at all the digital data I've collected (and lost due to a drive failure, virus, corruption, disaster, etc.) over the years and it really makes your head go foggy. I only hit this realization putting together a wedding anniversary party for my parents in the last few months. My parents brought over bucket loads of photos and keepsakes that I have to rummage through for an overhead slideshow. On top of

  • I'm not worried. We are pretty soon going to have a bunch of people that are heartbroken about their data from 10 years ago being lost. The travel photos, the e-mailed love letters, the brilliant blog posts. And these people will create demand for longer-term storage and data collection techniques we don't have now. Why should it happen in the near future if it hasn't already? Because we first needed a generation of people that use computers and the internet as the primary way of expressing their life.
    • "consumer-grade "lifetime" storage options will enjoy a more prominent place on the market."

      Long-lived consumer-grade gadgets, no matter if logical or physical, are at odds with the corporate-grade greed for benefits: you won't sell a device that won't be replaced in fifty years if you can find a way to sell fifty one-year devices in that time span, will you?

      Look around: which one do you think is the best bet? Consumer or corporation interests?

  • by Lazarian (906722) on Tuesday February 23, 2010 @08:18PM (#31253864)
    If you want to preserve your data, back up your data yourself, and keep it on its own storage medium. There seems to be a growing impetus where "cloud computing" and "thin clients" are envisioned to replace traditional architectures where data is stored and decoded by the individual who owns/created it. I'd rather store my data myself than ask permission to access it through the equivalent of a 1980's green screen dumb terminal from some corporation whose interests run contrary to mine.
  • Preserving digital data is inherently hard.

    Not only do you need to preserve the bits, but you also need to preserve the knowledge about what the bits mean.

    So...instead of addressing this issue as important, the content owners have decided to add another layer...

    Now, they encrypt the data, to prevent copying.

    This makes the problem A LOT HARDER!

    The content owners are the ones to blame if we lose entire decades of art and culture.

  • Seriously. Many copies. Multiple, ad nauseam uber redundancy. And, so what about that DRM crap? Is it _that_ important to preserve pop music? If so, when did DRM ever stop us? Burn a CD/DVD/Blu-Ray and then RIP it, upload the thing. Put it on a hardened RAID, what...ever.

    True, technology is an ever more complex cycle. I guess we should try to get the info/code down to lowest common denominator. Text? If so, what language? Boggles the mind..but if the unthinkable happens then maybe we can assist the great mi

  • like an odd sock (Score:4, Insightful)

    by timmarhy (659436) on Wednesday February 24, 2010 @12:42AM (#31255928)
    this is the same shit story that keeps popping up on /. every 6 months or so.

    typically kdawson posts it, what a tard.
