PetaBox: Big Storage in Small Boxes
An anonymous reader writes "LinuxDevices.com is reporting that a Linux-based system comprising more than a petabyte of storage has been delivered to the Internet Archive, the non-profit organization that creates periodic snapshots of the Internet. The PetaBox products, made by Capricorn Technologies, are based on Via mini-ITX motherboards running Debian or Fedora Linux. The IA's PetaBox installation consists of about 16 racks housing 600 systems with 2,500 spinning drives, for a total capacity of roughly 1.5 petabytes, according to the article. Now to strap one of those puppies to my iPod!" The Internet Archive continues to astound.
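For a sense of scale, the figures in the summary can be turned into rough per-rack and per-drive averages. This is only back-of-the-envelope arithmetic on the quoted numbers, all of which are approximate; the decimal definition of a petabyte (10^15 bytes) and the function name `averages` are assumptions made for this sketch, not details from the article.

```python
# Figures quoted in the summary; all are approximate ("about", "roughly").
RACKS = 16
SYSTEMS = 600
DRIVES = 2500
TOTAL_BYTES = 1.5e15  # 1.5 PB, assuming decimal petabytes (an assumption)


def averages():
    """Return rough averages implied by the quoted totals."""
    return {
        "systems_per_rack": SYSTEMS / RACKS,
        "drives_per_system": DRIVES / SYSTEMS,
        "gb_per_drive": TOTAL_BYTES / DRIVES / 1e9,
        "tb_per_rack": TOTAL_BYTES / RACKS / 1e12,
    }


if __name__ == "__main__":
    for name, value in averages().items():
        print(f"{name}: {value:.2f}")
```

Since the inputs are rounded, the results are averages rather than the specification of any individual node or drive.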
A Great Historical Tool (Score:5, Insightful)
I for one think that archive.org should turn into some UN effort, with a mission to chronicle and store daily/timely snapshots of the internet and the culture at the time, preserving it for future generations. What a tool for future historians!
The ability to look at a large representation of society at one single critical moment in time, and to have first-hand sources for all that information, is something that can truly change the way history is recorded (and not in the bad newspeak ingsoc way either). In fact, a holistic archive of what happens day-to-day, in an easily accessible format, might well help written history to be more representative of actual history (instead of, say, the history Bush wants us to believe: that the Iraq war was for human rights and not WMDs). I love Foucault.
The internet archive rocks... really hope this project continues full blast.
- Peace
Re:Good to see. (Score:3, Insightful)
I had a similar experience. I was playing around on IRC back when we were swapping video files through DCC. Apparently some downloading got out of hand and paged the admin, who contacted me and politely pointed out that I had a process running wild and filling
Re:A Great Historical Tool (Score:2, Insightful)
We already had people writing diaries and taking lots of pictures in WWII. The improvement isn't that great.
Re:A Great Historical Tool (Score:3, Insightful)
Funny you should mention that, but this whole "Internet as history" thing has me wound up tight.
Books cannot be changed. They can be destroyed, reprinted and banned but the first edition will always exist in a collection.
The first edition of a website only exists in digital form and there is no way to stop the original from being edited and timestamped back to the expected date.
The IA is the MiniTruth's dream come true.
But who cares? History has always been written by the victorious, hasn't it?
Re:copyright (Score:2, Insightful)
1. FAIR USE!
2. Google is merely providing a service. If you don't like it you can opt out.
The Google Cache is not fair use, as it reproduces the entirety of a web page's text for none of the purposes for which Fair Use is defined. (Under Fair Use you are entitled to use a portion of a copyrighted work, not the whole thing.)
The second one just cracks me up. I thought the Slashdot crowd didn't like being asked to opt out.
Now, trifish, how can the Internet Archive evade copyright laws by reproducing the entirety of many copyrighted pages? Don't try and argue that they're a library. Libraries buy books; they don't photocopy them.
Re:No RAID?! (Score:3, Insightful)
So, while yes, if it really was just one giant supercomputer with a bajillion hard drives in it, RAID 50 would be an ideal solution (as long as the stripes were large enough to prevent too many accesses crossing too many drives, the one big advantage of JBOD here), but that's not what's really in use here.
RAID 5 is inferior to JBOD (Score:1, Insightful)
Depends heavily on the purpose of the system, of course.
If you need something that is highly available and has good performance, then RAID is wonderful. But archives don't need to be highly available; they just need to be highly redundant and backed up to several places.
For instance, if you have a RAID 5 array, then a single hard drive failing couldn't take it out. But a single controller failing could. If one drive starts spewing out nonsense, then that corruption could be replicated automatically between hard drives in an array before anybody notices or hardware monitors shut everything down.
So in this sense, simply having multiple copies on different computers on different disks is actually preferable to a RAID setup. It is simpler, and as long as you have a high-quality distributed file system, it's easier to restore material. It'll be easier to access down the line.
It just won't have the higher performance or high availability that RAID will provide... but then again, it doesn't really need it.
And remember:
RAID != backups.
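To make the "multiple independent copies" idea concrete, here is a minimal sketch of it: write the same file to several separate locations, record a checksum, and only trust a copy whose checksum still matches. This is not how the Internet Archive or the PetaBox software does it (the article doesn't say); the function names and the use of plain directories to stand in for separate machines are assumptions for illustration only.

```python
import hashlib
from pathlib import Path


def sha256(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def store_copies(data: bytes, name: str, roots) -> str:
    """Write one full copy of data under every root; return its digest."""
    for root in roots:
        root = Path(root)
        root.mkdir(parents=True, exist_ok=True)
        (root / name).write_bytes(data)
    return sha256(data)


def read_verified(name: str, roots, digest: str) -> bytes:
    """Return the first copy whose checksum matches the recorded digest."""
    for root in roots:
        try:
            data = (Path(root) / name).read_bytes()
        except OSError:
            continue  # copy missing or unreadable; try the next location
        if sha256(data) == digest:
            return data
    raise IOError(f"no intact copy of {name}")


def repair(name: str, roots, digest: str) -> int:
    """Overwrite missing or corrupt copies from a good one; return the count fixed."""
    good = read_verified(name, roots, digest)
    fixed = 0
    for root in roots:
        path = Path(root) / name
        try:
            ok = sha256(path.read_bytes()) == digest
        except OSError:
            ok = False
        if not ok:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(good)
            fixed += 1
    return fixed
```

Unlike the RAID 5 case above, a copy that starts returning nonsense is never propagated here: it fails the checksum, gets skipped on read, and is rewritten from a good copy on repair.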
Re:copyright (Score:3, Insightful)
If you consider your music to be copyrighted material, might I ask why the hell it's being played on the radio in the first place?
If you consider your book to be copyrighted material, might I ask why the hell it's being lent out in the library in the first place?
If you consider your movie to be copyrighted material, might I ask why the hell it's being broadcast on HBO in the first place?
Just because something is available for free doesn't mean that the producer has granted you a permanent license to distribute it for commercial gain, as Google does with its cache.
Re:No RAID?! (Score:1, Insightful)
It basically triples the price without getting you much for such a large setup, where point replacement of lost systems without imperiling your other systems is much, much easier.