The Ultimate All-In-One Storage Solution 387
karnifex writes "Filled up your LaCie Bigger Disk already, and looking for a little more storage space? Good news! The Petabox is ready! 'The petabox by the Internet Archive is a machine designed to safely store and process one petabyte of information (a petabyte is a million gigabytes).' And luckily, as the Internet Archive notes, it's shipping-container friendly (20' x 8' x 8'). So save on delivery costs and order two!"
In 10 years ... (Score:5, Insightful)
Re:Price? (Score:3, Insightful)
Not really a Petabyte...yet (Score:4, Insightful)
PILOT STATUS 5/2004
* The first 100TB Rack is up and running!
* The second 100TB Rack will be up by the end of May
* Thermal Targets have been met
* Systems Booted from USB Dongle
* Reiser FS running
* PC-based Router running
Maybe I'm missing something, but it looks to me like they don't actually have a petabyte of storage working yet; they have plans to scale to a petabyte, with only 100 TB up and running now. Not that 100 TB is anything to brush off.
Too bad it won't last... (Score:5, Insightful)
Re:two words (Score:5, Insightful)
100 disks -> 1 TB
15,000 disks -> 150 TB.
Netflix has a "mere" collection of 15,000 disks. Your petabyte disk is only 1/6th full.
You upload all music CDs: 1 GB per disk (feeling generous).
How many CDs can be in print? Maybe 500,000?
That is only 500 TB. Now your disk is 2/3rds full.
Let's upload all printed material. It may or may not fit in the rest.
Then again, if you want to archive the internet: ~6G pages at 10 kB each is 60 TB per crawl. Store the last 16 versions -> ~1 PB.
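The arithmetic above can be sanity-checked in a few lines; all the per-item sizes and counts are the parent post's rough assumptions, not measured figures:

```python
# Back-of-envelope capacity check using the parent post's assumptions.
PETABYTE_TB = 1000  # decimal units: 1 PB = 1000 TB

dvd_tb = 15_000 * 10 / 1000        # 15,000 DVDs at ~10 GB each -> 150 TB
cd_tb = 500_000 * 1 / 1000         # 500,000 CDs at ~1 GB each  -> 500 TB
crawl_tb = 6e9 * 10e3 / 1e12       # 6G pages * 10 kB -> 60 TB per crawl
sixteen_crawls_tb = 16 * crawl_tb  # keep the last 16 snapshots

print(f"DVDs:      {dvd_tb:.0f} TB ({dvd_tb / PETABYTE_TB:.0%} of a PB)")
print(f"CDs:       {cd_tb:.0f} TB ({cd_tb / PETABYTE_TB:.0%} of a PB)")
print(f"16 crawls: {sixteen_crawls_tb:.0f} TB")
```

Sixteen 60 TB crawls come to 960 TB, which is why "~1 PB" (not 1 TB) is the right order of magnitude.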
Comment removed (Score:5, Insightful)
There's an easier way to do this... (Score:3, Insightful)
Re:Price? (Score:3, Insightful)
What would be interesting to know is the estimated maintenance cost as well. With that many drives, I imagine you'd be changing them like light bulbs, especially as time passes and the probability of each drive failing gets higher and higher.
If one were really clever, one could use the failure rate of a typical hard disk and Moore's Law to estimate monthly replacement costs for the next 100 years or so. I would expect them to rise in the short term as the drives age, but fall in the long term as Moore's Law catches up.
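For the curious, here's a toy version of that estimate. Every number in it is a made-up assumption (drive count, base failure rate, how fast failures rise with age, today's drive price, the price-halving period), not data about the Petabox:

```python
# Toy monthly-replacement-cost model: failure rate climbs as drives age,
# while the replacement price halves every ~18 months (Moore's-Law-ish).
def monthly_replacement_cost(month, n_drives=4000, base_afr=0.03,
                             afr_growth=0.02, drive_price=150.0):
    """Expected dollar cost of replacing failed drives in a given month.

    base_afr:    annualized failure rate for new drives (assumed 3%)
    afr_growth:  extra AFR per year of drive age (assumed 2%/year)
    drive_price: replacement drive price today (assumed $150)
    """
    years_old = month / 12
    afr = base_afr + afr_growth * years_old        # failures rise with age
    monthly_failures = n_drives * afr / 12
    price_now = drive_price * 0.5 ** (month / 18)  # price decay over time
    return monthly_failures * price_now

for m in (0, 12, 24, 60):
    print(f"month {m:3d}: ~${monthly_replacement_cost(m):,.0f}")
```

Under these assumptions the model does exactly what the parent predicts: costs rise for the first year or so as failures accumulate, then the falling drive price wins and the monthly bill declines.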
Re:Business idea (Score:3, Insightful)
The power requirements are also quite hefty. It shouldn't be necessary to run all those drives (and the computers behind them) unless the unit is near capacity and access is random (which I'm sure would rarely be the case). Instead, they should be dynamically powering drives and computers up and down, and migrating data to a reasonably small 'working set' of drives.
On the hardware front, the device in this article also incorporates 800 "low-end PCs." IOW it's a big cluster that happens to be heavy on storage. If all you want is the storage, surely there is some way to get rid of all those motherboards and CPUs with their fault-prone, power-hungry fans. They need to develop a controller that can directly handle, say, 64 hard drives, analogous to a big network switch.
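The "working set" idea could be sketched as a simple LRU-style power manager. This is purely my own illustration of the concept, not anything from the Petabox design; the class name, the power budget, and the drive IDs are all hypothetical:

```python
# Sketch of dynamic drive power management: only a bounded "working set"
# of drives stays spinning; the least-recently-used drive spins down
# whenever a new drive must spin up. (Hypothetical, not the Petabox design.)
from collections import OrderedDict

class DrivePowerManager:
    def __init__(self, max_spinning=8):
        self.max_spinning = max_spinning
        self.spinning = OrderedDict()  # drive_id -> True, kept in LRU order

    def access(self, drive_id):
        """Record an access, spinning drives up/down to stay in budget."""
        if drive_id in self.spinning:
            self.spinning.move_to_end(drive_id)  # mark as recently used
            return "already spinning"
        if len(self.spinning) >= self.max_spinning:
            self.spinning.popitem(last=False)    # spin down the LRU drive
        self.spinning[drive_id] = True           # spin the new drive up
        return "spun up"

mgr = DrivePowerManager(max_spinning=2)
mgr.access("d1"); mgr.access("d2"); mgr.access("d1")
mgr.access("d3")  # over budget: d2 (least recently used) spins down
print(sorted(mgr.spinning))  # ['d1', 'd3']
```

A real version would also have to debounce spin-up/spin-down cycles (each spin-up stresses the drive) and migrate hot data onto the drives that stay powered.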
Anyways, it sounds like a fun project!
Ozymandias (Score:4, Insightful)
You're complaining that these hard drives won't run forever, and you're right. Neither will CDs. However, I would also like to point out that the vast majority of ancient Egyptian papyrus isn't around today. And don't start going off about using clay or stone tablets, because they break (even the Rosetta Stone is broken).
Honestly, computers are still far superior to what we were using before. It's not like we've got Homer's original version of the Iliad sitting in a museum somewhere; we just have many copies that have been duplicated over the years. You're right that hard drives fail and CDs break, but we can keep migrating onto new media. Besides, when a monk adds an iota while transcribing the Bible, Jesus goes from being God to merely godlike. When a computer adds an iota, the checksum fails and the data is resent.
Somebody is also going to point out that, as systems change, data can become unreadable. Heck, I had a professor who couldn't update his lab instructions because the software that read the lab printouts wouldn't run on new machines, and the file format wasn't understood by any other software. So, want to stop our data from becoming unreadable? Well, let's just do what the Etruscans did! Of course, we don't have a clue what they did, because nobody can read Etruscan. For a more familiar example, think of hieroglyphics before the Rosetta Stone. It's pretty common for data to become lost and unreadable. This brings us back to the solution: along with the data, include the source code for the software that can read it. If you really want to be thorough, you could even include the source to an emulator for the machine it was designed to run on.
Still, you might point out, 400 years from now we'll have lost 99% of that to failures of whatever nature. Once again, you would be right. However, do you honestly believe that we have 1% of all the data that was created in 1604? Hell, most people couldn't even write, so we don't know ANYTHING about their lives. I'm sorry that we can't digitally preserve our wondrous society for all eternity, but it's completely blind to believe that this makes us in ANY way different from every other culture. Read Percy Shelley's "Ozymandias" before complaining about how people in the future won't know what our lives were like.
Replacing HDs could be a pain (Score:4, Insightful)
Re:In 10 years ... (Score:3, Insightful)
The storage problems I have these days are almost entirely organisational.
Re:You think they could spare a boot disk. (Score:3, Insightful)
The last thing you want with a setup like this is having to haul hardware around or disconnect things because, for whatever reason, you can't boot off the disks anymore. And you certainly don't want to reduce density by filling space with other stuff when it could hold disks.
Re:Replacing HDs could be a pain (Score:3, Insightful)
So assuming 3 failures a day, at most 3 RAID arrays would be running degraded on any given day. Assuming 4 disks per array, that's at most 12 disks at reduced performance, or about 0.3% of the total data set unavailable at full speed. If that is an issue, you duplicate any data that MUST be available across multiple nodes.
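That 0.3% figure follows from an assumed total of roughly 4,000 drives in the box (the parent doesn't state the exact count):

```python
# Checking the parent's degraded-capacity arithmetic.
failures_per_day = 3
disks_per_array = 4      # assumed RAID group size from the parent post
total_drives = 4000      # rough drive count, assumed to make 0.3% work out

degraded_disks = failures_per_day * disks_per_array  # 12 disks
fraction = degraded_disks / total_drives             # 0.003 -> 0.3%
print(f"{degraded_disks} disks degraded = {fraction:.1%} of the drives")
```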