Info Glut - Five Exabytes of Data Created in 2002
securitas writes "If you had any doubts that you are overwhelmed by the volume of information in your life, a new Berkeley study (PDF) shows that five exabytes of data were created in 2002, twice the 1999 total. That's five million terabytes of data, or 500,000 Libraries of Congress, which works out to about 800 MB of data for each of the 6.3 billion people on the planet. Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future. The study was conducted by University of California-Berkeley's School of Information Management and Systems professors Peter Lyman and Hal Varian. More at CNet, Infoworld, ByteAndSwitch and The Register."
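For anyone who wants to check the summary's arithmetic, here is a minimal sketch. It assumes decimal (SI) units, which the summary does not state explicitly; the variable names are only illustrative.

```python
# Sanity check of the summary's figures, assuming decimal (SI) units:
# 1 EB = 10**18 bytes, 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
EXABYTE = 10**18
TERABYTE = 10**12
MEGABYTE = 10**6

total_bytes = 5 * EXABYTE          # "five exabytes of data"
population = 6_300_000_000         # "6.3 billion people"
libraries_of_congress = 500_000    # "500,000 Libraries of Congress"

terabytes = total_bytes // TERABYTE
mb_per_person = total_bytes / population / MEGABYTE
tb_per_library = total_bytes / libraries_of_congress / TERABYTE

print(f"{terabytes:,} TB in total")
print(f"{mb_per_person:.0f} MB per person")
print(f"{tb_per_library:.0f} TB per Library of Congress")
```

Under those assumptions the per-person figure comes out just under 800 MB, and the comparison implies roughly 10 TB per Library of Congress.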
And about 1% was worthwhile (Score:4, Insightful)
Re:And about 1% was worthwhile (Score:4, Interesting)
For that matter, how much of the data is real, and how much is virtual? If two sites point to the same download, is that data counted twice, or once?
Re:And about 1% was worthwhile (Score:5, Informative)
The blurb said 92% was stored on magnetic media; curious about the rest, I glanced through the article. Surprisingly, a large part, 7%, is FILM! The reason film comprises such a large percentage is that each film reel is duplicated thousands of times to be sent to theaters around the world.
So if they're counting duplicates in film, I'd guess they'd count duplicates in magnetic media.
Re:And about 1% was worthwhile (Score:3, Funny)
I only glanced through the numbers, but couldn't find any place that said "for our purposes pictures are considered HxV resolution". For film (studio movies), they did say each frame was considered a picture and th
Re:Ummm.. That's not data... (Score:3, Informative)
Re:And about 1% was worthwhile (Score:2)
Trust me, the nicest thing about stored data is its own copy safely guarded somewhere else, at least 10 km away, and so on.
Re:And about 1% was worthwhile (Score:5, Funny)
3% was [AOL] Me Too! [/AOL] posts.
1% was In Soviet Russia jokes.
0.5% Profit!!!
So I guess there was a fair amount of duplication.
KFG
Re:And about 1% was worthwhile (Score:2)
Re:And about 1% was worthwhile (Score:2)
Not to mention all the websites that have only keywords aimed to hack Google, and nothing else, except maybe links to OTHER void pages by the same author/group/company!!
there are a few more considerations (Score:2)
2 years back, CD-Rs were the in thing. Everyone and anyone was storing data on them. Since their size was 700 MB, files were generally smaller and compressed. Now that faster broadband connections and DVD recorders (along with faster processors) are becoming common, people don't care so much about file sizes.
Regarding duplicate data- ask five people
Re:And about 1% was worthwhile (Score:4, Insightful)
Re:And about 1% was worthwhile (Score:2)
I don't really think historians and archaeologists are ever going to be able to dig through Five Exabytes of Data. Maybe the magnetic storage is a blessing then...
Re:And about 1% was worthwhile (Score:2)
historians and archaeologists are ever going to be able to dig
A. They'll use machines to do the heavy digging.
B. Or, the historians and archaeologists will be machines.
A big problem will be that those 5 EB of data describing 5 years near Y2K will be dissolved in a much larger ocean of data by that time.
Re:And about 1% was worthwhile (Score:3, Insightful)
I left this script running on the unix farm which did the following on each box
# endlessly delete and recreate the same small file
while true; do
    rm -f filename
    echo "Whose the Daddy" > filename
done
It's a big farm, and it's been running all year. The net result is about 100k of files on the farm in total... but terabytes written during the year.
In other words, what I mean is...
How much of this "created" information was transient?
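The gap between data "created" and data that still exists can be shown with a toy model of that loop. This is only a sketch: `churn` is a hypothetical helper that counts bytes instead of touching a disk, and the iteration count is arbitrary.

```python
def churn(iterations: int, payload: bytes) -> tuple:
    """Return (bytes_written_in_total, bytes_left_on_disk) for a loop that
    deletes and rewrites one file `iterations` times."""
    written = 0
    on_disk = 0
    for _ in range(iterations):
        on_disk = 0               # rm filename
        on_disk = len(payload)    # echo ... > filename
        written += len(payload)   # every rewrite counts as "created" data
    return written, on_disk

payload = b"Whose the Daddy\n"    # what echo writes, including the newline
written, on_disk = churn(1_000_000, payload)
print(f"{written:,} bytes written, {on_disk} bytes still on disk")
```

A million iterations "create" 16 MB while leaving 16 bytes behind, which is the transient-versus-stored distinction the comment is asking about.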
Re:And about 1% was worthwhile (Score:4, Interesting)
ProfQuotes [profquotes.com]
Sounds about right. (Score:5, Insightful)
That's a believable number. Consider the amount of published data on Kazaa, or that 45 minutes of raw DV video is roughly 12.5 GB*. Move 100 of your CDs to MP3s and you're consuming/creating roughly 3.5 GB* (or more if you're using higher than 128 kbps MP3s). And I'm not even commenting on pr0n.
(*I said roughly...comment on the comment, not the mathematical precision of the statement.)
Re:Sounds about right. (Score:2)
By the article: The researchers relied on existing data such as ISBN numbers to count books and journals, as well as industry reports about data handled by enterprise servers for things such as supermarket sales and airline bookings. They performed surveys to estimate how much unique information exists on each type of hard drive.
I don't think they attempted to collect information on more ephemeral data... For example, artists that go through many ver
Yeah... (Score:5, Funny)
Re:Yeah... (Score:2)
AOL doom day. (Score:3, Funny)
It's a joke..
Re: (Score:2)
Damn (Score:2, Funny)
Re:Damn (Score:3, Funny)
Re:Damn (Score:2)
ermmm, maybe it's best to forget I wrote that.
Dissertation (Score:3, Funny)
Shoot, it felt like my doctoral dissertation was responsible for at least 2 of those 5 exabytes.
This article says 23 exabytes (Score:3, Informative)
Re:This article says 23 exabytes (Score:4, Informative)
They found that new information flowing across televisions, radios, telephones, Web sites and the Internet had increased by 3 1/2 times to a total of 18 exabytes as of 2002. The amount of new but stored (non-transmitted) information in 2002 was determined to be about five exabytes.
This jibes with the other articles. 5 exabytes generated content, 18 exabytes transferred content - still one heck of a lot of bits floating around :)
No problem here. (Score:2, Funny)
Well, why won't they just print it ? Sheesh...
Re:No problem here. (Score:5, Interesting)
Do the evolution (Score:2, Interesting)
Now, here's a little math for you
Re:No problem here. (Score:5, Funny)
So 122 Great Pyramids = 500,000 Libraries of Congress?
Great, another conversion factor to remember...
Huzzah! (Score:4, Interesting)
Re:Huzzah! (Score:2)
Security by obfuscation?
Temporary data? (Score:2)
letters from nigeria (Score:2)
oddly enough the most useful information is often the most concise. duck!
Hmmm.... (Score:2)
Hmmmmm.... I think I might know where all that 'new data' came from. [sitefinder.com]
quote (Score:5, Interesting)
Re: (Score:2)
words/motion picture (Score:2)
Looks like 599, assuming said motion picture is a complete rotting turd. Thanks for gems like this one, MPAA!
Re:words/motion picture (Score:2)
Effective use of data (Score:2)
``We're producing all this information, but we don't necessarily have the tools to use it most effectively,'' he said.
What does it mean to use data "effectively", and is the "We" producing the data the same "We" using it? My first instinct on not having the tools to use this data most effectively is "that's good". My second instinct tells me that data is already being used TOO effectively. Personally, I hope that cross-reference of mass data stores containin
that's a LoC per minute, almost. (Score:4, Funny)
But if these data were recorded on floppies, and stacked up to the moon n times, how many VWs would it take to carry those floppies to the stack site?
You forgot... (Score:2)
...how many golf balls falling on said stack it would take to knock it over. And if you laid all the bits in the data side by side, I wonder how many times it would go around the earth?
Re:I'm sure my math is wrong, but... (Score:2)
Your math is wrong. (Score:2)
Dividing 95,797,591m^3 of floppies by 0.368119m^3 per Jetta, the requirement is 260,235,389 Jettas to transport them all there. Or one Jetta, preferably one more reliable than my old thing, 260,235,389 times.
(Is the cargo capacity really that little? I would think it's over a cubic meter. Maybe they reduced the capacity in newer models.)
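A quick check of the division, as a sketch. Both input figures are taken from the comment as given (the floppy-volume derivation is not shown in the thread). The cubic-foot conversion is added here as an observation, since the per-Jetta figure appears to be a trunk volume quoted in cubic feet.

```python
# 1 ft = 0.3048 m exactly, so one cubic foot in cubic metres is:
CUBIC_FOOT_M3 = 0.3048 ** 3

floppies_m3 = 95_797_591      # volume of floppies, from the comment
jetta_cargo_m3 = 0.368119     # cargo volume per Jetta, from the comment

jettas = floppies_m3 / jetta_cargo_m3
cargo_cubic_feet = jetta_cargo_m3 / CUBIC_FOOT_M3

print(f"{jettas:,.0f} Jetta loads")
print(f"{cargo_cubic_feet:.2f} cubic feet per Jetta")
```

The division holds up, and 0.368119 m^3 works out to almost exactly 13 cubic feet, which suggests the number is a trunk-only figure rather than the whole cabin.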
a problem that solves itself? (Score:2)
Storage (Score:4, Interesting)
it's gotta be stored somewhere! (Score:2)
For most of it
Where are the VC when one needs them?
Not long-term data (Score:3, Interesting)
Google Calculator Sucks (Score:2)
Even though it knows the Answer to Life, the Universe, and Everything [google.com] and the number of feet in 10 metres [google.com], it can't convert 10 Libraries of Congress into feet of books [google.com]. :(
I demand that this be fixed immediately! ;)
It just occurred to me... (Score:2)
You only get to count data you have generated yourself, anything you got from somewhere else (99% of porn, everything on P2P apps) doesn't count.
As such, I think I'm under my one-CD-per-person (800 MB) limit for the year, but I do know a few friends (artists) that would definitely be over :P
Another interesting question is whether data conversion counts - If I copy a CD to
Turn your little 0 into a big 1 (Score:2)
Kids (Score:2)
Mass replication (Score:3, Interesting)
Re:Mass replication (Score:2)
"They performed surveys to estimate how much unique information exists on each type of hard drive."
Still, it seems like it would be a difficult thing to discern.
True it's a lot of info to create, but... (Score:5, Insightful)
It's only going to get worse... (Score:4, Interesting)
Re:It's only going to get worse... (Score:2)
The remarkable thing is that after analysis is complete, all that data is reduced to just two bytes: "42"
Speak English!!! (Score:2)
Just how much of that was porn?
-Goran
Libraries of Congress (Score:3, Insightful)
Units of measure (Score:2)
And if that number is about right... (Score:2)
I'd bet not much. And what is backed up may only have a shelf life of about 20 months if on poor CD-R or Floppies.
800 MB per person (Score:2)
I personally burned over 500 CDs last year, filled a couple of hard drives, and sent God knows how much email...
I think this goes to show what a wealthy little world we computer people live in.
Re:800 MB per person (Score:5, Funny)
Congrats, you balanced out 1 medium-sized tribe in Africa.
Re:800 MB per person (Score:2, Insightful)
Do you run your own particular pseudo-random number generator and store the results? Do you go out with a digital camcorder and record tons and tons of images of the world? Do you write that much prose or poetry in a year?
Or are you just talking about 500 CDs of data that you or somebody else 'ripped' from existing media and are shuffling around?
Just one question ... (Score:2)
Info Glut (Score:2)
My figures (Score:4, Interesting)
I'm a news photographer, shooting digital.
In 2002 I saved 78,742 photos to disk. (Bad images were not saved.)
That worked out to 122 gig. The output was transferred from the CF cards and archived to DVDs.
But how much of that 122 gig is really information? The image file saved by the Canon 1D is mostly empty air, as far as I can tell. There is also EXIF data and IPTC, and who knows how much hidden BS is included à la Microsoft Word documents?
Simple compression was able to whittle that down to 33.2 gig. So that's my contribution.
The main beneficiary is the DVD-R blank disc makers and Western Digital, I guess.
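Those figures imply a compression ratio and an average file size that are easy to work out. A minimal sketch, assuming "gig" means 1000 MB (the comment does not say which convention it uses):

```python
photos = 78_742          # photos saved in 2002, from the comment
raw_gig = 122.0          # archived size before compression
compressed_gig = 33.2    # size after "simple compression"

ratio = compressed_gig / raw_gig
mb_per_photo = raw_gig * 1000 / photos   # assumes 1 gig = 1000 MB

print(f"compressed to {ratio:.0%} of the original size")
print(f"about {mb_per_photo:.2f} MB per photo before compression")
```

So roughly three quarters of the archived bytes were redundant by that measure, and each saved frame averaged around 1.5 MB.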
Re:My figures (Score:3, Interesting)
I never had that limitation and I still shoot 2-3 times as much as I did in 1999.
Probably the main reason is the good cameras, like the Canon 1d, shoot 8 frames a second. A 1G CF card holds 420 shots. The largest roll of film is 36 frames.
I shot digital starting in 1996, but still primarily used film until decent digital SLRs came out. I moved over entirely to digital in 200
My Beer Gut is bigger than the Info Glut (Score:2)
What about the data from nuclear colliders? (Score:2)
megabytes. (Don't quote me on that.)
compression (Score:2)
I suppose the number could be much larger if you expand data before counting it.
Sorry... (Score:2)
almost exhausting a 64bit address space. (Score:2)
(62.3 for RAM style exabytes or 62.1 for HD style exabytes).
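The two bit counts can be reproduced with a base-2 logarithm. A short sketch, reading "RAM style" as binary exabytes (2**60 bytes) and "HD style" as decimal exabytes (10**18 bytes); that reading is an assumption about what the poster meant.

```python
import math

ram_style_bytes = 5 * 2**60     # five binary exabytes
hd_style_bytes = 5 * 10**18     # five decimal exabytes

bits_ram = math.log2(ram_style_bytes)
bits_hd = math.log2(hd_style_bytes)

# Fraction of a full 64-bit byte-addressed space that 5 decimal EB would fill.
fraction_of_64bit = hd_style_bytes / 2**64

print(f"{bits_ram:.1f} address bits (RAM style)")
print(f"{bits_hd:.1f} address bits (HD style)")
print(f"{fraction_of_64bit:.0%} of a 64-bit address space")
```

Both numbers match the comment. In terms of capacity, though, five exabytes is a bit over a quarter of what 64 bits can address, so "almost exhausting" holds for the bit count more than for the space itself.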
Relevance? (Score:2, Funny)
Not least for those historians who want to know what my Amazon.com session ID was on the day that my Runescape character hit mining level 33.
Five Exabytes (Score:2)
Reminds me of this observation: (Score:5, Funny)
3 billion of them will never be found again.
Poor files...
Do your fair share of the work. (Score:2)
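#include <stdlib.h>
int main(void) {
    long x;
    for (;;) {                    /* loop forever */
        x = rand();
        send_to_info_glut(x);     /* hypothetical uploader, not defined here */
    }
}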
Please send the data created to Info Glut, and while you're at it, send it to all the spammers and to SCO. With some luck, you might DDOS them off the internet.
What this means for the average bozo (Score:2)
800 MB per capita (Score:2)
What a load of twaddle. (Score:2)
Seriously, though, I bet the breakdown is something like this:
1. Most of the "information" is probably composed of music and film. We all know how much bandwidth and disk space music and film take up. Here's another thing: different sites might have dif
And what kind of data are we creating? (Score:3, Funny)
They fail to mention that also of note is that 99% of that information is in the form of pr0n! That's a lot!
Dangit! (Score:2)
Now look what you've done.
-Adam
Not quite (Score:2)
If the poster had read the report carefully, they would have seen that the comparison is to the print collection of the Library of Congress. If you add in their audio and film collections, they have at least two orders of magnitude more data. Even the LOC doesn't seem to be sure how big their entire collection is.
Nice, but what does it REALLY mean? (Score:2)
Interesting statistics regarding porn (Score:2)
Regarding web pages:
You read that right, 28% of the internet sampled appears to be porn. Anyone surprised? Read on...
Regarding P2P networks:
Magnetic media isn't such a bad choice (Score:2)
Many nine-track magtapes from the 1960s are still readable. For those that aren't, typically the problem is not with the magnetic coating, but the substrate. By now the properties of the substrate materials are much better understood, so this should be less of a problem with modern magnetic media.
Most optical media does not have any be
Only 800 Megabytes/Year?? (Score:2)
I admit I take more pictures than most, but I haven't gotten a video camera yet... just think of the Terabytes I'll consume with that bad boy.
--Mike--
Library of Congress Measurement (Score:2)
How will I ever do the proper conversions if you aren't using the up-to-date standards?
=Brian
The world farted 6 billion times today. (Score:2)
It should be noted..... (Score:3, Interesting)
though much is taken, little abides (Score:3, Interesting)
But only a fraction of that will make it onto my web site - I have maybe 60 megabytes of photos (cut-down to around 100k each) online and 10 megabytes of text on my web sites, and would be adding less than 40 megabytes a year to that.
Maybe I'll get a video camera, though, or put up some MP3s of my gamelan group...
Danny.
Re:Well... (Score:2)
I thus propose to transcribe all that data to clay tablets...
Re:Well... (Score:3, Interesting)
Maybe more research could be done into a marketable multi-century (millennial?) storage.
For corporate purposes, several decades of fidelity, perhaps a century or two, wo
1984 (Score:2)
Yeah right. The government wants all historical data destroyed as soon as it is created.
Re:Let's get the standard jokes out of the way (Score:3, Funny)
I for one welcome our new data generating overlords!
With all that data you'd think that my conne3^$ATDT01[NO CARRIER]
In Soviet Russia data generates YOU!
Homer: I see they have the Internet on computers now.
Re:Let's get the standard jokes out of the way (Score:2)
Ah! So it's AOL's fault.
Re:Some data from 1996 (Score:2)
Re:The year 10008 (Score:2)
Check out Frederik Pohl's Gateway series... humans find a remnant of an alien outpost on Venus, and a ship on autopilot that takes them to a hangar of spaceships on
Re:Should I kill myself? (Score:2)
Search for hemlock society.
No GF is no reason to kill oneself anyway.