Data Storage Open Source Programming Software

GitHub Buries Giant Open-Source Archive In An Arctic Vault (zdnet.com) 44

Microsoft-owned GitHub has finally moved its snapshot of all active public repositories on the site to a vault in Norway. ZDNet reports: GitHub announced the archiving plan last November and on February 20 followed through with the 21-terabyte snapshot written to 186 reels of film. GitHub cancelled plans for a team to "personally escort the world's open-source code to the Arctic" due to the coronavirus pandemic, leaving the job to local partners who received the boxed films and deposited them in an old coal mine on July 8. The archive is being stored in Svalbard, Norway, a group of islands that's also home to the global seed bank.

"The code landed in Longyearbyen, a town of a few thousand people on Svalbard, where our boxes were met by a local logistics company and taken into intermediate secure storage overnight," said Julia Metcalf, director of strategic programs at GitHub. "The next morning, it traveled to the decommissioned coal mine set in the mountain, and then to a chamber deep inside hundreds of meters of permafrost, where the code now resides fulfilling their mission of preserving the world's open-source code for over 1,000 years." The archive includes active public code repositories as well as significant dormant repos. The snapshot consists of the HEAD of the default branch of each repository, minus any binaries larger than 100 kB. Each repository is packaged as a single TAR file, and for efficiency's sake, most of the data is stored as QR codes. A human-readable index and guide itemizes the location of each repository and explains how to recover the data.
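The per-repository packaging described above can be sketched roughly. This is an illustrative stand-in, not GitHub's actual pipeline: the 100 kB cutoff comes from the article, while the binary-detection heuristic and function names are assumptions for the sake of the sketch.

```python
# Sketch: snapshot one checked-out repo the way the summary describes --
# take the working tree at HEAD, skip binaries over 100 kB, and pack the
# rest into a single TAR file. Illustrative only, not GitHub's real tooling.
import os
import tarfile

MAX_BINARY_BYTES = 100 * 1024  # the 100 kB cutoff mentioned in the summary

def looks_binary(path):
    """Crude binary check (an assumption): a NUL byte in the first 8 kB."""
    with open(path, "rb") as f:
        return b"\x00" in f.read(8192)

def snapshot_repo(repo_dir, out_tar):
    """Pack a checked-out repository into one TAR, dropping large binaries."""
    with tarfile.open(out_tar, "w") as tar:
        for root, dirs, files in os.walk(repo_dir):
            dirs[:] = [d for d in dirs if d != ".git"]  # skip git metadata
            for name in files:
                path = os.path.join(root, name)
                if os.path.getsize(path) > MAX_BINARY_BYTES and looks_binary(path):
                    continue  # omit large binaries, as the archive does
                tar.add(path, arcname=os.path.relpath(path, repo_dir))
```

In the real archive each such TAR would then be encoded as QR codes on film rather than stored as a file.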

This discussion has been archived. No new comments can be posted.


Comments Filter:
  • Maybe Slashdot should store their backups there too?
    • Microsoft-owned GitHub has finally moved its snapshot of all active public repositories on the site to a vault in Norway. ZDNet reports:

      Maybe Slashdot should store their backups there too?

      You are right, backups is the more appropriate term. Snapshots alone are useless without the complete file-system content. As for Slashdot, it seems like they lost one or two days of content in the last outage.

  • A great snapshot from the start of this year. However, code changes, so I wonder how they intend to keep this up to date. One would not want our great-great-... grandkids laughing at today's bugs!

    BTW: I am curious how much this cost them to do?

    • Yeah, I just patched a bug in one of my open source libraries. Now I know that bug is going to be sitting in that archive for the next 1000 years, dammit. How am I supposed to sleep at night knowing this?

      • Either stop writing bugs, or stop fixing them.

        One of these options is easier and more profitable.

      • Yeah, I just patched a bug in one of my open source libraries. Now I know that bug is going to be sitting in that archive for the next 1000 years, dammit.

        Just send in a change request and wait a couple centuries for the update.

      • If they used a couple of off-the-shelf 14 TB hard drives they could do it for less than US$1,500. Add RAID 6 and it would still be less than $3k.
    • Those MS guys, they never miss an opportunity to Bury some Open source stuff. :)
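The drive math in the thread above at least works out on capacity. A quick back-of-envelope sketch, using the sizes from the comments (prices not modeled); the RAID 6 usable-capacity formula is standard, everything else is the commenter's numbers:

```python
# Capacity check for the drive-based alternative suggested in the comments.
snapshot_tb = 21   # size of the GitHub snapshot, per the summary
drive_tb = 14      # one off-the-shelf hard drive, per the comment

# "A couple" of drives as plain storage: 28 TB raw, enough for 21 TB.
assert 2 * drive_tb >= snapshot_tb

def raid6_usable(n_drives, size_tb):
    """RAID 6 keeps (n - 2) drives' worth of usable space."""
    return (n_drives - 2) * size_tb

# Four drives is the smallest RAID 6 array that still fits the snapshot.
assert raid6_usable(4, drive_tb) >= snapshot_tb  # 28 TB usable
```

Of course, spinning drives would still need periodic replacement over a 1,000-year horizon, which is the whole point of the film.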

  • QR codes-on-film (Score:4, Informative)

    by schweini ( 607711 ) on Friday July 17, 2020 @07:37PM (#60302223)
    For anyone else confused about why they are using "film" instead of regular tape: they are storing the data in QR codes exposed on photographic film. Seems that this is more long-term stable than magnetic tape.

    Complete manual of how this works is here: https://github.com/github/arch... [github.com]
    • No, they are using film and QR codes because they know that tape drives won't work in the future :)

      An Overview Of The Archive
      The archive consists of 201 reels of film: one "guide reel" of human-readable information and guidance, and 200 reels of archived software. Each reel includes 65,000 individual frames. The frames at the beginning of each reel, and the frames of the guide reel, include human-readable text and images. All other frames of film consist of digital data stored in a visual form known as QR codes.
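The reel and frame counts quoted above imply a per-frame data budget. A rough calculation, assuming the ~21 TB snapshot size from the summary and ignoring the human-readable leader frames:

```python
# Back-of-envelope: how much data each film frame must hold.
total_bytes = 21 * 10**12   # ~21 TB snapshot, per the summary
data_reels = 200            # reels holding archived software
frames_per_reel = 65_000

data_frames = data_reels * frames_per_reel        # 13,000,000 frames
bytes_per_frame = total_bytes / data_frames       # ~1.6 MB per frame
print(f"{data_frames:,} frames, ~{bytes_per_frame / 1e6:.1f} MB per frame")
```

That ~1.6 MB per frame is far beyond a single standard QR code (a few kB at most), which is consistent with each frame packing a dense grid of many codes.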

  • _Written_ archives on papyrus, parchment, and good quality, low-acidity linen paper have lasted hundreds or even thousands of years. There are very few _rooms_ that have lasted 1000 years; the data is vulnerable to EMP unless the room shielding is _very_ good, and to cosmic-ray-generated bitrot. It's also vulnerable to chemical degradation of _whatever_ the medium is, which is not completely tested, because no one has yet _tried_ to keep high-density media for more than a few years.

    • _Written_ archives on papyrus, parchment, and good quality, low-acidity linen paper have lasted hundreds or even thousands of years. There are very few _rooms_ that have lasted 1000 years; the data is vulnerable to EMP unless the room shielding is _very_ good, and to cosmic-ray-generated bitrot. It's also vulnerable to chemical degradation of _whatever_ the medium is, which is not completely tested, because no one has yet _tried_ to keep high-density media for more than a few years.

      A solid point. We're certainly given over to technology and modern information storage techniques, yet perhaps we should store valuable information both ways, in computer memory and with old-fashioned dead tree paper backups... you know, shite, in case all we can do is read in some dystopian future.

      • Instead of digital storage, keep a copy of the source printed in the same very small analog font on nickel discs used for the Rosetta Project: https://en.wikipedia.org/wiki/... [wikipedia.org]
        These discs could be read by a good 19th-century microscope.

        The offsite backup could be a lava tube near Musk Base on the lunar farside.

      • by kiviQr ( 3443687 )
        sounds like you should choose three different media, each impervious to the weaknesses of the other two.
        • I've done so for critical data. Negotiating the budgeting and storage can be difficult, and it often winds up with a single point of failure.

    • They're storing everything on stable optical black-and-white film. It's made out of plastic. If you think paper lasts a long time, wait until you see what plastic can do in a waterproof, UV-light-proof case when it only needs to store black-and-white binary data, not even accurate color.

      • I hadn't read enough of the article; that does seem like good protection from EMP. I'll submit that chemical degradation over the course of 1000 years is still likely: _nothing_ plastic-based is that old, so there's no way to verify the longevity of any particular composition. Lightproof containers and modest or even cold temperatures are good; projecting the degradation observed in a few years in a laboratory out to 1000 years is, if I may say, optimistic.

  • by h33t l4x0r ( 4107715 ) on Friday July 17, 2020 @07:54PM (#60302267)
    Whoever survives the apocalypse is going to need my curated list of instagram milfs.
  • It will just wash up on the beach and some kids will find it [nocookie.net]

  • to freezing the code... :D
  • Man if I'd only known I'd have put my Bitcoin cold wallet on GitHub.
    Then it would truly be safe in cold storage... ...and...

    FREE SHIPPING!!!

    OMG /hit head in shame

    E
    P.S. Yeah some followup /.er will one-up me by saying "Well how will you get it out, idiot?" SHOVELS!!!

  • I consider this to be a romantic gesture, but nothing more.

    Software needs to be used and improved, or it loses its worth. What is being stored there is so much data that no human alone is capable of appreciating it. This isn't an art collection. It would require a computer and analytic software to draw any human-comprehensible conclusion from it in the first place. So this is closer to a pile of sawdust, created as part of the process of making good software into better software, than something to treasure.

  • It's only a backup if it is tested occasionally. Hmm, seems tricky...

  • GitHub has finally moved its snapshot of all active public repositories on the site to a vault in Norway.

    ... the vault isn't open to the public.

  • In 1000 years the Arctic Vault will be as obscure as pyramids are to us. People will think it's a shrine to the gods of open source.
    • We should stuff it into RMS's corpse. Then archaeologists really will think this is our religion.

  • In 100 years people can look at the code and laugh. What am I saying? People will be living in feudal groups among the ruins of the ancients then.
    • In 100 years people can look at the code and laugh. What am I saying? People will be living in feudal groups among the ruins of the ancients then.

      The way things are going, another year or two of this and we'll be shivering in the dark, eating cold beans from a can and fighting over scraps of roasted rat.

  • "... written to 186 reels of film."

    Errrrr....film? I confess I don't understand this bit.
