Digital Big Bang — 161 Exabytes In 2006
An anonymous reader tips us to an AP story on a recent study of how much data we are producing. IDC estimates that in 2006 we created, captured, and replicated 161 exabytes of digital information. The last time anyone tried to estimate global information volume, in 2003, researchers at UC Berkeley came up with 5 exabytes. (The current study tries to account for duplicating data — on the same assumptions as the 2003 study it would have come out at 40 exabytes.) By 2010, according to IDC, we will be producing far more data than we will have room to store, closing in on a zettabyte.
XXX (Score:5, Funny)
Re:XXX (Score:5, Funny)
Re: (Score:2, Funny)
Re: (Score:1)
Re:XXX (Score:4, Funny)
Re: (Score:2)
Re: (Score:1, Redundant)
I'd imagine more in the 95%-99% range...
Re: (Score:2)
The number is way too low! (Score:3, Insightful)
If we consider all digital data, not just the stuff that flows over the internet, then this is way too low. Consider the data in all the DTVs, GPS receivers, etc.
A top-end GPS is grinding over 10^9 bits per second in its correlators (about 50 correlator channels x 20Mbps or so sampling rate). That ends up being approx 4x10^15 bytes per year per GPS... or 40,000-odd top-end GPSs would be grinding 1.61x10^20 bytes per year. There a
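A quick back-of-the-envelope check of those figures (a sketch, assuming the stated 50-channel x 20 Mbps correlator rate and a 365-day year):

```python
# Sanity check of the GPS correlator arithmetic above
# (assumes 50 channels x 20 Mbps sampling, as stated).
SECONDS_PER_YEAR = 365 * 24 * 3600        # ~3.15e7 s

bits_per_second = 50 * 20e6               # 1e9 bits/s in the correlators
bytes_per_year = bits_per_second / 8 * SECONDS_PER_YEAR
print(f"per receiver: {bytes_per_year:.2e} bytes/year")   # ~3.94e15

receivers = 1.61e20 / bytes_per_year      # receivers needed to match 161 EB
print(f"receivers to match 161 EB/yr: {receivers:,.0f}")  # ~40,800
```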
Re: (Score:3, Informative)
Re:The number is way too low! (Score:4, Insightful)
If a NMEA lat-lon string gets spit out of the serial port of a GPS and there's nothing there to capture it, it is not part of their count. They're not counting bitrate on data generators and multiplying times bandwidth. They're counting discrete blocks of saved data. You cannot arrive at the latter from the former, just like you can't tell how much water is behind Hoover Dam on average during the year by measuring the average daily flow rate of the Colorado River and multiplying by 365.
Re: (Score:2)
Re: (Score:2)
It was only 9 megs (Score:5, Funny)
Re:It was only 9 megs (Score:5, Insightful)
But seriously, I wonder what percentage of this data is text. I'd guess it's a very, very small amount. When I had a film camera, I bet I took fewer than 100 rolls of film in twenty years. With digital cameras I've taken thousands of pictures, sometimes a dozen or more of the same subject, just because the cost to me is practically zero. Now there are vendors that will let me upload large numbers of these amateurish photos for free, and let's pretend that there are enough people interested in seeing my pictures that these companies can pay for this storage with advertising. That's scary.
Excluding attachments, I think it would be practically impossible for anyone to use up Google's 2 gigs of storage, but I've heard of people using it up in little more than a week by mailing large attachments back and forth (oh yeah, I HAVE to have every single iteration of that Word document, sure I do!)
But what's scarier is that for some nominal fee (like $20 a year) they place no limit at all on my ability to hog a disk drive somewhere. I know people who are messed up in the head enough to want to test these claims. Give them 5 gig for photos and they've filled it up in a week, give them "unlimited" and they upload pure junk to see if they can break the thing.
Like any house of cards, this thing is gonna come down sooner or later. I just hope that people who are making sensible use of these online services don't lose everything along with the abusers.
Re: (Score:2)
Re: (Score:1)
Finally, an excuse... (Score:5, Funny)
Re: (Score:2)
How many... (Score:3, Funny)
Alternatively, you can answer in anime episodes or mp3 files.
Re:How many... (Score:5, Funny)
Honestly, I don't know why the
Comment removed (Score:5, Funny)
Obvious Joke: (Score:2)
In the Library of Congress, you're not allowed to lick the pages.
Re: (Score:1)
Re:How many... (Score:5, Funny)
This is a complete lie (Score:2)
Re: (Score:2)
Re:How many... (Score:5, Informative)
I don't have anime estimates, but I can make a Heroes [wikipedia.org] analogy. A hi-def episode is more or less 700mb. Since the first season has 23 episodes, that makes 16.1gb per season. So 161 exabytes would be 10,000,000,000 (ten billion) seasons of Heroes. Since the earth currently has around 6.6 billion people, this would mean that you would have 1 season for each person on the planet, and all the people of China, India and the US would have a second season. That's how big it is.
Regarding the storage space, I call shenanigans. We already have hard drives that store a terabyte. A couple of years from now, MS Office will require that much space to be installed.
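For anyone who wants to check the arithmetic, a minimal sketch (assuming the 700 MB/episode and 23-episode figures above):

```python
# Sanity check of the Heroes analogy above
# (assumed sizes: 700 MB per hi-def episode, 23 episodes per season).
EXABYTE = 1e18
episode_bytes = 700e6
season_bytes = 23 * episode_bytes               # ~16.1 GB per season
seasons = 161 * EXABYTE / season_bytes
print(f"{seasons:.1e} seasons")                 # ~1.0e10 -- ten billion
print(f"{seasons / 6.6e9:.2f} seasons/person")  # ~1.52 -- one each, plus change
```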
Re: (Score:2)
Re: (Score:2)
Sorry, my fault... (Score:5, Funny)
Re: (Score:2)
Re:Sorry, my fault... (Score:5, Interesting)
Re:Sorry, my fault... (Score:5, Funny)
Re: (Score:3, Funny)
And here I thought Malthus was dead (Score:5, Insightful)
Re:And here I thought Malthus was dead (Score:4, Informative)
Malthus has just gone down to the shops (Score:3, Insightful)
I remember when software came on cassettes and when food came from close to where you lived.
When floppy disks were too small, we made higher-density floppy disks, and we still needed a whole box of them.
When there wasn't enough of a particular food, we got it shipped from further away.
When CD-ROMs came out, we still ended up not only filling them but spreading things over multipl
Re: (Score:2)
Re: (Score:2)
I think you underestimate the size of the planet - it's pretty big.
There are 6.5 billion people on the earth (well, roughly - last time I counted, I think I lost count at about 4 billion, but I'm pretty sure I was more than halfway at that point).
Assuming everyone lived in households of 3, and each household had its own acre of land, you would be able to fit the entire populati
Re: (Score:3, Funny)
Re: (Score:2)
Correction: we haven't run out of food, yet.
But we animals have a self-correcting system as far as that goes; if the food supply isn't sufficient to support a large population, then the population simply grows at a slower rate. The same will happen with our data; if we reach a point where we can't store all the non-ephemeral data we generate, we'll reflexively limit the amount of non-ephemeral data that we generate.
Re: (Score:2, Insightful)
They wouldn't fit comfortably, and you'd certainly have to stack them. The acceleration of gravity would increase as more humans were added to the mass of the Earth.
Possibly, you'd have to import food from throughout the universe... I'm not sure if conservation of mass applies to a planet and all living (or not living) entities on it... Debris from space that enters Earth's atmosphere may
Re: (Score:2)
Only if the humans are added from another source, say, some sort of space-humans. Humans grown on earth would retain the current mass.
What's an exabyte? (Score:2, Informative)
10^18 bytes, or one million terabytes
Re: (Score:3, Informative)
Re: (Score:2)
Re: (Score:2)
Yotta is the largest metric prefix and it's the next one after Zetta, so it looks like the standards people are going to have to get together to name some more prefixes.
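For scale, a minimal conversion sketch using decimal SI prefixes (an exabyte is 10^18 bytes, as stated above):

```python
# Decimal SI byte prefixes; an exabyte is 10**18 bytes.
PREFIX = {"TB": 1e12, "PB": 1e15, "EB": 1e18, "ZB": 1e21, "YB": 1e24}

total = 161 * PREFIX["EB"]
print(f"{total / PREFIX['TB']:,.0f} terabytes")  # 161,000,000
print(f"{total / PREFIX['ZB']:.3f} zettabytes")  # 0.161 -- "closing in on a zettabyte"
```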
What if ISP's are forced to retain data? (Score:5, Interesting)
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Funny)
No, of course not. Any law or regulation that the government comes up with doesn't have any hidden costs.
Re: (Score:3, Funny)
Must be the space donuts (Score:5, Funny)
The awesome information we retain (Score:5, Insightful)
And there used to be so little on-line data (Score:5, Interesting)
What's really striking is how little data was available in machine-readable form well into the computer era. In the 1970s, the Stanford AI lab got a feed from the Associated Press wire, simply to get a source of machine-readable text for test purposes. There wasn't much out there.
In 1971, I visited Western Union's installation in Mahwah, NJ, which was mostly UNIVAC gear. (I worked at a UNIVAC site a few miles away, so I was over there to see how they did some things.) I was shown the primary Western Union international gateway, driven by a pair of real-time UNIVAC 494 computers. All Western Union message traffic between the US and Europe went through there. And the traffic volume was so small that the logging tape was just writing a block every few seconds. Of course, each message cost a few dollars to send; these were "international telegrams".
Sitting at a CRT terminal was a woman whose job it was to deal with mail bounces. About once a minute, a message would appear on her screen, and she'd correct the address if possible, using some directories she had handy, or return the message to the sender. Think about it. One person was manually handling all the e-mail bounces for all commercial US-Europe traffic. One person.
"closing in on a zettabyte" (Score:3, Funny)
Re: (Score:2)
It doesn't seem like that much (Score:2)
Re: (Score:1)
Every OS file, application file, and mp3 file you have is only counted once. So 10000 gelflings is still only 40GB.
Re: (Score:2)
Supply and demand (Score:5, Insightful)
"producing far more data than we will have room to store"
That's like saying: for the last 2 months, my profit has increased by 10% per month. If my profit keeps increasing at 10% per month, then pretty soon I'll own all the money in the world, and then I'll own more money than exists! Damn, I must stop making money now before I destroy the world economy!!!
Who are these people who draw straight lines on growth curves? Why do people print the garbage they write, and why weren't they the first against the wall after the dot-com bust?
The only things that seem certain are death, taxes, entropy and stupid people...
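To see why the extrapolation is silly, here's the fallacy in four lines (a sketch; the 10%-per-month figure is just the parent's example):

```python
# Naive straight-line (compound) extrapolation, as mocked above:
# 10% profit growth per month, continued for a decade.
profit = 100.0                          # arbitrary starting units
for month in range(120):
    profit *= 1.10
print(f"after 10 years: {profit:.2e}")  # ~9.3e6 -- a ~93,000x increase
```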
Re:Supply and demand (Score:4, Insightful)
Though with data, some people, or even companies, are merely sinks. They store huge amounts of data, mostly for auditing purposes. Access logs for webservers. Windows NT event logs. Setup logs for Windows Installer apps. For ISPs, a track record of people who got assigned an IP address, in case they get a subpoena. Change logs for DoD documents. Even CVS for developers, to keep track of umpteen old versions of software. Even the casual Web browsing session replicates information in your browser cache. Many more of these examples could be given.
We also need to produce more and more hardware to store these archived data, the most ubiquitous of which is the common hard drive. In the end, we'll need more metal and magnetic matter than the Earth can provide.
Martian space missions, anyone?
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
We also need to produce more and more hardware to store these archived data, the most ubiquitous of which is the common hard drive. In the end, we'll need more metal and magnetic matter than the Earth can provide.
Right, and we ran out of wood because we used it all up heating our stone houses and all our land is taken up by pasture to feed the horses we use for transportation.
Extrapolating our future needs based on the most common /current/ technology is a bit shortsighted.
Re: (Score:1)
Re: (Score:1)
We won't produce more data than can be stored. (Score:5, Funny)
Then again, the past no longer exists anyway, the future doesn't exist yet, and the present has no duration, so maybe the data never existed anyway. Maybe you don't exist?!?! Aw man, maybe I *~/ disappears in a puff of logic*
----
Kudos to Augustine and Adams
Re: (Score:2)
I routinely have to compile static versions of my company's web stores in order to archive them, and they are about 1GB each of HTML once compiled.
Each store, however, is about 100 megs of assets, and then the data in the DB makes up another 50M or so. All of this is then generated dynamically and sent to client browsers that will just cache it temporarily. So the data transmitted may be huge, but what people are storing would appear to be less.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Of course we will (Score:3, Interesting)
Re:We won't produce more data than can be stored. (Score:5, Funny)
Great. Now we're all going to be inhaling second-hand logic. There ought to be a law...
2010 (Score:1)
Internet | uniq (Score:3, Insightful)
My work machine that I backed up a couple weeks ago came to a 30MB zip file, and 3/4 of that was my local CVS tree. So out of 30GB, less than 1/3000th was not OS, software, or just copied locally from a data store.
At home, I've saved every email, every picture, everything from my Windows, Linux, OSX and every other box I've ever had since ~1992, and that's barely a few GB uncompressed.
The amount of non-duplicate useful material is far, far smaller than you would think.
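In the spirit of the "| uniq" in the title, a minimal sketch of what deduplicated counting could look like: hash file contents so N identical copies contribute one file's size (the /home path is just an example):

```python
# Count bytes once per unique file by hashing content, so duplicate
# copies contribute a single file's size to the total.
import hashlib
import os

def unique_bytes(root):
    seen, total = set(), 0
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            try:
                size = os.path.getsize(path)
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
            except OSError:
                continue                  # unreadable file; skip it
            digest = h.hexdigest()
            if digest not in seen:
                seen.add(digest)
                total += size
    return total

print(f"{unique_bytes('/home') / 1e9:.1f} GB of unique data")
```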
Re: (Score:2)
Re: (Score:2)
I will admit the digital camera turned it from a CD into a DVD of backups, but I just need to get a really good 3-4MP camera, instead of a rather bad 8MP one. Also just deleting
2nd the motion of Firehed (Score:2)
Text (code, misc letters) IS very small. Up until just a couple of years ago, all the "good stuff" would fi
Internet a product of biology? (Score:2, Interesting)
In River Out of Eden [wikipedia.org] Richard Dawkins traces the data explosion of the information age right back to the big bang.
How much is actually used? (Score:5, Interesting)
My question is how much of this data is actually being used? I'm horrible for constantly downloading e-books, movies, software, OSes, and other stuff that I'm *intending* to do something with, but often don't get around to. I end up with gigabytes of "stuff" just sucking up disc space or wasting CDs. I burned a DivX copy of Matt Stone and Trey Parker's popular pre-South Park indie film "Orgazmo" in about 2001. I've since seen the film 2 or 3 times on TV. I STILL haven't watched the DivX version I have, and now I can't find the CD I put it on. I know I'm not the only one who does this either, as many of my friends are using up loads of storage space on files they've just been too busy to have a look at.
Right now I'm on a project digitizing patient files for a neurologist. We're going up to 10 years deep with files for over 18,000 patients. Most of this is *just* for legal purposes and nobody is EVER going to open and read the majority of these files. The doctor does electronic clinics where he consults the patient and adds new pages to their file, which will probably sit there undisturbed until the Ethernet Disk fails someday.
I think a more interesting story (although probably MUCH more difficult to research) would be "How much computerized data is never used beyond its original creation on a given storage medium?"
Exabyte tapes (Score:3, Funny)
Google Says: (Score:3, Interesting)
[1] Total est. of people on the Internet:
http://www.internetworldstats.com/stats.htm [internetworldstats.com]
Low SNR (Score:5, Insightful)
As interesting as the sheer volume is, most of it is garbage. I'd rather have 50 terabytes of organized and accurate information than 500 exabytes of data that isn't organized and whose accuracy is questionable at best. In essence, even if you manage to find what you want, the correctness of that information is likely to be very low.
I've long said we are not in the information age, we are in the data age. The information age will be when we've successfully organized all this crap we're storing/transmitting.
Sarbanes-Oxley (Score:2)
Yes...but is it useful (Score:4, Interesting)
Yes, we can store exabytes of data, but what's the point? Is that data any more useful to people than the selective data that was used to run the world 50, 60 or 100 years ago?
We as individuals are only capable of assimilating a limited amount of information, so most of those exabytes are just rolling around like so many gears in an old machine. If they are minimally used or never used, they simply become a storage liability.
As an example, the internet has not made *better* doctors. Even with all the latest information at their fingertips, professionals are still only the sum of what they can mentally absorb. Too much data, or wrong data (i.e., Wikipedia), can lead to the same levels of inefficiency seen prior to the 'information age'. What would a single doctor do with 160 exabytes of reading material, schedule it into the work day?
Also, if the amount of information is rated purely in bytes but not in *useful content*, the stats get skewed. Things like movies and music should be ranked by the length of the script and/or notation. That would make the numbers much less than 160 exabytes.
Saying that the whole world produced 160 exabytes of information is like saying the whole world used 50 billion tonnes of water. Did somebody actually drink it to sustain life?
Mechanistic stats are stupid.
Re: (Score:2)
Dr Evil (Score:5, Funny)
Data or simply format shift? (Score:2)
1 Photography
2 Letters and correspondence
3 Filing and records
4 Music
5 Telephone calls & faxes
6 Newspapers and magazines
7 Novels and books
8 Board games and puzzles
9 Movies
10 Radio and TV broadcasts
11 ??
All these forms of data existed before. None of them was digital before. The numbers represent a format shift, not new content. Not many people archived ev
Poor estimation, poor predictions, poor conclusion (Score:2)
Re:Poor estimation, poor predictions, poor conclus (Score:2)
Is this a concept that is so hard to understand? Many replies above don't seem to grasp the concept of data not actually being kept.
Re:Poor estimation, poor predictions, poor conclus (Score:2)
Re: (Score:2)
I'm sorry that I have to clarify myself, but neither I nor the survey include temporary data in the discussion. We're talking data that is stored and represents archived information.
Otherwise where do we stop? Do we count copies of the programs in RAM, swap files, temporary caches and so on? It'll become pointless.
Re: (Score:2)
50 Exabytes for $30.5 Billion (Score:5, Informative)
RAID6 (24 Drives -2{Parity} -1{Hot Spare} = 21) 750GB, 13.48TB ZFS/Solaris:
93,345,048 750GB Hard Drives: $17,735,559,120
3,889,377 Areca ARC-1280ML: $4,317,208,470
1,944,689 Motherboards/Mem/CPU: $766,207,466
1,944,689 5U Rackmount Chassis: $4,546,682,882
194,469 4 Post 50U Racks: $45,700,215
3,684 528-port 1Gbps Switches: $374,294,400
40 96-port 10Gbps Switches: $11,424,000
1,948,935 Network Cables: $2,020,812
? Assembly Robots/Misc. $111,000,000
Sub Total: $27,910,097,365
Tax/Shipping: $2,645,915,779
Grand Total: $30,556,013,144
$470 billion cheaper than the Iraq war.
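Out of curiosity, the big line items above can be re-derived from the 24-drive building block (a sketch; the unit prices are back-solved from the posted line items, so treat them as assumptions):

```python
# Re-derive the parts list from the 24-drive RAID6 building block.
# Unit prices are back-solved from the line items above (assumptions).
import math

drives = 93_345_048
arrays = drives // 24            # one ARC-1280ML per 24-drive array
boards = math.ceil(arrays / 2)   # two controllers per motherboard
racks = math.ceil(boards / 10)   # ten 5U chassis per 50U rack

items = [
    (drives, 190),               # 750GB hard drives
    (arrays, 1_110),             # Areca ARC-1280ML controllers
    (boards, 394),               # motherboard/mem/CPU
    (boards, 2_338),             # 5U rackmount chassis
    (racks, 235),                # 4-post 50U racks
]
subtotal = sum(q * p for q, p in items)
print(arrays, boards, racks)     # 3889377 1944689 194469
print(f"${subtotal:,}")          # $27,411,358,153 before switches, cables, robots
```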
Wow that's massive! (Score:2)
* Two copies of the entire Library of Congress, 6000 TB[1], can be stored in the collective cache buffers of the RAID controllers.
* It would need a 1,712 MW (peak) power source, a typical PWR nuclear power station produces 2,000 MW. Tack on another $5 billion for the construction of a nuclear power station.
* You would likely need to employ an entire team (in 3 shifts) to replace defective drives every day.
* You would need 1,684,80
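A rough cross-check of the power and maintenance claims (a sketch; the ~15 W per drive and ~3% annualized failure rate are my assumptions, not from the parent):

```python
# Rough power and failure estimates for ~93M drives
# (assumed: ~15 W per drive under load, ~3% annualized failure rate).
drives = 93_345_048
megawatts = drives * 15 / 1e6
print(f"{megawatts:,.0f} MW for the drives alone")          # ~1,400 MW
swaps_per_day = drives * 0.03 / 365
print(f"~{swaps_per_day:,.0f} drive replacements per day")  # ~7,700
```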
Re: (Score:2)
Re: (Score:2, Funny)