Internet Archive Gets 4.5PB Data Center Upgrade
Lucas123 writes "The Internet Archive, the non-profit organization that scrapes the Web every two months in order to archive web page images, just cut the ribbon on a new 4.5 petabyte data center housed in a metal shipping container that sits outside. The data center supports the Wayback Machine, the Web site that offers the public a view of the 151 billion Web page images collected since 1997. The new data center houses 63 Sun Fire servers, each with 48 1TB hard drives running in parallel to support both the web crawling application and the 200,000 visitors to the site each day."
Story is meaningless without LOC measurement (Score:5, Funny)
Storage Envy (Score:5, Funny)
Own the internet! (Score:5, Funny)
So all one needs to do to "own the internet" is drive a big rig and ... lift the container off their parking lot?
Slight problem? (Score:5, Funny)
I can now theoretically steal "the internet" with a flatbed truck and a lift. There's something to be said for conventional data centers: They're rather hard to load onto a truck and drive off with.
Re:Where do they store 4.5TB off site (Score:5, Funny)
One would assume that something like this does regular off-site backups, which must add up to a hell of a lot. Could someone with experience in such matters shed a little insight into the logistics of backing up such a vast system?
floppy disks.
lots of floppy disks.
Re:Where do they store 4.5TB off site (Score:5, Funny)
Re:Where do they store 4.5TB off site (Score:5, Funny)
It's simple, the backups are compressed -- they simply remove all those useless zeroes from the binary data.
Re:Own the internet! (Score:5, Funny)
well if you plug in a laser printer you can print off a hard copy for your boss.
They store 4.5PB in Egypt! (Score:5, Funny)
The Internet Archive also works with about 100 physical libraries around the world whose curators help guide deep Internet crawls. The Internet Archive's massive database is mirrored to the Bibliotheca Alexandrina, the new Library of Alexandria in Egypt, for disaster recovery purposes.
Re:Where do they store 4.5TB off site (Score:2, Funny)
Not reliable enough.
I suggest that this important resource be backed up to punched cards.
This would also enable handy comparisons in units that us oldies understand, such as ELOCs
(Equivalent Library of Congress).
I'd calculate it myself, but seem to have mislaid my slide rule...
Re:What about 1996 and earlier? (Score:5, Funny)
I would love to revisit the web as it appeared when I first discovered it (1994 at psu.edu).
No, you wouldn't.
Re:What about 1996 and earlier? (Score:2, Funny)
The entire internet prior to 1996 is archived on an old PC that I'm currently trying to get the 5GB disk restored on.. why I've kept all that old porn for so long completely escapes me tho. :)
Re:Where do they store 4.5TB off site (Score:5, Funny)
They'd better have it backed up. Last time the Alexandria library burned down, we lost about one thousand years of collected information from ancient Greece and Rome. Ooopsie.
Re:Where do they store 4.5TB off site (Score:1, Funny)
They have Charlie Babbitt on their staff. No need to replicate.
Re:Where do they store 4.5TB off site (Score:3, Funny)
I'd suggest also using stone slabs. Water can do serious damage to paper, and don't get me started on fire hazards. Good old Stone Slabs resist both of those really well. I'm not sure what the write speed is, however, so you'll probably need to hire many stonecutters to work in parallel.
Re:You can ship it over OC-192... (Score:5, Funny)
You can ship 4.5 petabytes over a single OC-192 link in about 71 days.
yeah, but just at the 70th day, someone will pick up the phone and the whole thing will have to be resent.
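For anyone who wants to sanity-check the 71-day figure, here is a back-of-the-envelope sketch. It assumes decimal petabytes, the nominal OC-192 line rate of 9,953.28 Mbit/s, and a link kept fully busy with no protocol overhead; none of those assumptions come from the thread, and the helper names are made up for illustration. Under them the copy takes roughly 42 days, so 71 days would correspond to a lower sustained throughput.

```python
# Rough transfer-time estimate for a bulk copy over a single link.
# Assumptions (not from the thread): decimal petabytes, the nominal
# OC-192 line rate, and a fully utilised link with zero overhead.

OC192_BITS_PER_SECOND = 9_953_280_000  # nominal OC-192 line rate
PETABYTE = 10**15                      # decimal petabyte, in bytes
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: int, bits_per_second: float) -> float:
    """Days needed to move size_bytes at a sustained bits_per_second."""
    return size_bytes * 8 / bits_per_second / SECONDS_PER_DAY


def implied_rate(size_bytes: int, days: float) -> float:
    """Sustained bit rate needed to move size_bytes in the given days."""
    return size_bytes * 8 / (days * SECONDS_PER_DAY)


archive_bytes = 45 * PETABYTE // 10  # 4.5 PB

days = transfer_days(archive_bytes, OC192_BITS_PER_SECOND)
rate = implied_rate(archive_bytes, 71)
print(f"At full line rate: {days:.1f} days")
print(f"Rate implied by 71 days: {rate / 1e9:.2f} Gbit/s")
```

Swapping in binary petabytes (2**50 bytes) or a payload rate below the line rate stretches the estimate, but the phone-pickup risk remains unmodeled.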
Re:Where do they store 4.5TB off site (Score:2, Funny)
It's simple, the backups are compressed -- they simply remove all those useless zeroes from the binary data.
In music today, there is a so-called 'loudness war', and I think I've discovered what it is: they're removing the zeroes, thinking that 'all ones' will make the music even louder!
I wonder if it's reversible? Where do the zeroes go? Can they be unzeroed? We should try to find them.
Re:They store 4.5PB in Egypt! (Score:4, Funny)
Egypt could be a good choice. The area is fairly famous for reliable persistent storage. From papyrus scrolls to stone engravings, things tend to keep there better than most places. There really aren't many other geographical areas on earth that can claim the same kind of data retention rates over the time periods they've dealt with. Though despite their impeccable track record with avoiding hardware failures, they've done significantly worse when it comes to data loss due to theft and/or hackers/pirates.
The one curious part about that choice is that the library at Alexandria is the one notable case where mass amounts of data were irreparably lost. So it's odd that they'd choose to entrust their data to that specific institution. Perhaps they felt that since it's under new management, the previous problems will have been resolved.
However, had the choice been mine, I would have chosen to store my offsite data in Luxor. Its data retention was quite good, and included one data store that was preserved in its entirety for over 3000 years. As an added benefit, it seems that they've opened a second location [luxor.com] that's significantly more convenient for the IA since there's no overseas transmission to worry about.
Re:Slight problem? (Score:3, Funny)
take THAT, Ted Stevens!
The off-site backup IS the Internet. (Score:5, Funny)
They're keeping the offsite backup distributed around the Internet, using the World-Wide Web to store it in real time.
Part of it may even be on *your* machine! We've really got to stop Brewster from leeching all your storage and make him store his backup himself - this business of using the originals to back up the backup just isn't sustainable!
Re:Where do they store 4.5TB off site (Score:5, Funny)
Can you say, Parallelism?
Parallelogram.... crap
Parallellellell... dammit
Parapalouza... >
Why did you have to point that out to everyone? :(
Re:Where do they store 4.5TB off site (Score:3, Funny)
It's simple, the backups are compressed -- they simply remove all those useless zeroes from the binary data.
Compressed with XML! Because XML makes everything better... right?
Right?
Re:Story is meaningless without LOC measurement (Score:4, Funny)
Re:Where do they store 4.5TB off site (Score:5, Funny)
I'd suggest also using stone slabs. Water can do serious damage to paper, and don't get me started on fire hazards. Good old Stone Slabs resist both of those really well. I'm not sure what the write speed is, however, so you'll probably need to hire many stonecutters to work in parallel.
A math problem. My favorite. I don't know much about stone cutters, but let's assume they can write one bit every 2 seconds. That's 1 byte in 16 seconds. The Internet Archive is (4.5 x 1,125,899,906,842,624) 5,066,549,580,791,808 (5 quadrillion) bytes. That works out to 81,064,793,292,668,928 (81 quadrillion) seconds, or about 2,570,547,732 (2.5 billion) years. That is far too long for their stringent 2-month backup cycle. They would need 15,423,286,395 (15.4 billion) stone cutters to keep to schedule, assuming they had unlimited stone. Last time I checked there were only between 6 and 7 billion people, with only a small fraction of them being stone cutters. That leaves but one solution: force the web developers to become stone cutters. This would not only increase the work force but also reduce the amount that needs backing up, because fewer people will be making web pages to back up.
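The arithmetic above can be checked with a short sketch. The 365-day year and the six two-month cycles per year are my assumptions; the comment does not state them, but they reproduce its figures. The variable names are only for illustration.

```python
# Reproduces the stone-cutter arithmetic from the comment above.
# Assumptions (chosen because they match the comment's numbers):
# a 365-day year and six 2-month backup cycles per year.

SECONDS_PER_BIT = 2                      # one chiselled bit every 2 seconds
BITS_PER_BYTE = 8
PEBIBYTE = 2**50                         # 1,125,899,906,842,624 bytes
SECONDS_PER_YEAR = 365 * 24 * 60 * 60    # 31,536,000
CYCLES_PER_YEAR = 6                      # one backup every 2 months

archive_bytes = 9 * PEBIBYTE // 2        # 4.5 x 2**50, kept as an exact integer
total_seconds = archive_bytes * BITS_PER_BYTE * SECONDS_PER_BIT

# Whole years a single stone cutter would need, rounded down.
years_for_one_cutter = total_seconds // SECONDS_PER_YEAR

# Stone cutters working in parallel to finish within one cycle, rounded
# down as in the comment (strictly, one more is needed for the remainder).
seconds_per_cycle = SECONDS_PER_YEAR // CYCLES_PER_YEAR
cutters_needed = total_seconds // seconds_per_cycle

print(f"{archive_bytes:,} bytes")
print(f"{total_seconds:,} seconds")
print(f"{years_for_one_cutter:,} years for one stone cutter")
print(f"{cutters_needed:,} stone cutters for a 2-month cycle")
```

Integer arithmetic is used throughout so that no digits are lost to floating-point rounding.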
Re:Story is meaningless without LOC measurement (Score:1, Funny)
Re:Where do they store 4.5TB off site (Score:3, Funny)
"XML is like violence. If it doesn't solve your problem, you're not using enough of it."