"Digital Universe" Enters the Zettabyte Era 137
miller60 writes "In 2010 the volume of digital information created and duplicated in a year will reach 1.2 zettabytes, according to new data from IDC and EMC. The annual Digital Universe report is an effort to visualize the enormous amount of data being generated by our increasingly digital lives. The report's big numbers — a zettabyte is roughly a million petabytes — pose interesting questions about how the IT community will store and manage this firehose of data. Perhaps the biggest challenge isn't how much data we're creating — it's all the copies of it. Seventy-five percent of all the data in the Digital Universe is a copy, according to IDC. See additional analysis from TG Daily, The Guardian, and Search Storage."
Re:Who cares? (Score:5, Insightful)
Re: (Score:1)
A zettabyte is more data than you generate during your whole lifetime. It's pointless to have so much space.
For those wondering: 1,000,000,000,000,000,000,000-ish, or some 10^21
Re: (Score:1, Informative)
Yes, or roughly a million petabytes, where a million is roughly 10^6 and peta is roughly 10^15.
I'll make roughly one post about this matter.
Re: (Score:2)
A zettabyte is more data than you generate during your whole lifetime. It's pointless to have so much space.
Speak for yourself.
Re:Who cares? (Score:5, Funny)
Yes, 640 petabytes should be enough for anybody.
Re: (Score:2)
My Hot Fresh Brewed Coffee shot out my nose. It still burns a little, but that joke was totally worth it.
Re: (Score:2)
Glad I could assist in the steam cleaning of your nasal cavities.
Just another service we offer in addition to sarcasm.
Re:Who cares? (Score:5, Funny)
Don't you mean
Re: (Score:2)
No, what I meant specifically was "(Score:5, Funny)".
Re: (Score:2)
NOBODY needs a porn collection that big.
You're forgetting backups.
Re: (Score:1)
NOBODY needs a porn collection that big.
You're forgetting backups.
Just another kind of porn, I guess.
Re: (Score:2)
A zettabyte is more data than you generate during your whole lifetime. It's pointless to have so much space.
Ooooh, it sounds like somebody's pr0n collection seems a bit inadequate today.
Re: (Score:2)
Ah, so you've finally taken my advice and started cleaning those up. Thanks.
Re: (Score:2)
That's everyone, not per person.
That's ~1/7th TB per person
Man am I over quota.
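For anyone checking the per-person figure above, here is a rough sketch. The population values are assumptions (nobody in the thread states one); roughly 6.8 to 7 billion is a common round figure for 2010. Units are decimal (SI), and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the "~1/7th TB per person" figure.
# Assumptions: SI (decimal) units; world population is supplied by the caller
# and is NOT taken from the report.

ZETTABYTE = 10**21  # bytes, SI definition
TERABYTE = 10**12   # bytes, SI definition


def per_person_tb(total_bytes: float, population: float) -> float:
    """Return the average share of total_bytes per person, in terabytes."""
    return total_bytes / population / TERABYTE


# Round numbers: 1 ZB shared among an assumed 7 billion people.
round_estimate = per_person_tb(1.0 * ZETTABYTE, 7e9)

# The report's 1.2 ZB shared among an assumed 6.8 billion people.
report_estimate = per_person_tb(1.2 * ZETTABYTE, 6.8e9)

print(f"1.0 ZB / 7.0e9 people = {round_estimate:.3f} TB each")
print(f"1.2 ZB / 6.8e9 people = {report_estimate:.3f} TB each")
```

With the round inputs this comes out to one seventh of a terabyte; with the full 1.2 ZB it is somewhat higher, closer to one sixth.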
Re: (Score:2)
Funny, me too, I'm at 7/1 TB rather than 1/7
1.2 ZBy is about 1/63rd of a gram mol of bits (Score:2)
Re: (Score:1)
Umm, the article is pretty much about the fact that we did generate it in our lifetime
Where do I get a Zettabyte Drive? (Score:2)
Are they on sale for $149.00 yet?
Re: (Score:2)
I'm sure that's "more than enough storage" for all my digital files...
<oblig>640 zettabytes ought to be enough for anybody.</oblig>
Re: (Score:2)
Not yet, but soon.
Hardware: "Digital Universe" Enters the Zettabyte (Score:3, Interesting)
"In 2010 the volume of digital information created and duplicated in a year will reach 1.2 zettabytes, according to new data from IDC and EMC. The annual Digital Universe report is an effort to visualize the enormous amount of data being generated by our increasingly digital lives. The report's big numbers -- a zettabyte is roughly a million petabytes -- pose interesting questions about how the IT community will store and manage this firehose of data. Perhaps the biggest challenge isn't how much data we're creating -- it's all the copies of it. Seventy-five percent of all the data in the Digital Universe is a copy, according to IDC."
Re: (Score:3, Insightful)
Only 75%? Considering that all DVDs are copies and all local caches are copies, I wouldn't be surprised if that number were much larger.
Also, cutting out all the copies would only reduce the problem to 0.3 zettabytes. For day-to-day IT purposes, that's about the same number.
Re: (Score:3, Insightful)
Everything on their DVR is also not original.
Now, in the business world things are a bit different. Here you can expect the same 20 to 40 gigabytes of used storage on the median machine, but backed by a massive networked database of original uptime-crit
Re: (Score:2)
I've seen you project your geek lifestyle onto the world before.
Re: (Score:2)
Second, you can't buy a camcorder that's not flash or hard disk. Yep, you heard me: Walmart only sells 2 camcorders that record directly to DVD, the other 150+ are all flash and hard drive [walmart.com]. The camcorder offering the smallest hard drive capacity is still 80gb for a paltry sum of $350 [walmart.com] and HD camcorders start [walmart.com]
Re: (Score:2)
You are projecting your own lifestyle onto others. The average person does not own a digital camera, and most of the ones that do are sporting one integrated into their pay-as-you-go $30 cell phone.
The average person does not have a smart phone. The average person does not have a camcorder. The average person does not have a digital camera. The average person doesn't even have a game console. They have a laptop which they send email with. Tha
Re: (Score:2)
You know people who don't own a digital camera? Really? I don't know anyone who _doesn't_!
Re: (Score:2)
ha yeah it was pretty dead-pan humour (not that that's a bad thing...)
Re: (Score:2)
I have a 20 year old sub $10/hour employee with a mid-range blackberry, and the rest of them probably average $200 phones (a couple of geezers like me and two single moms bring that average way down).
Re: (Score:2)
Are you kidding? I bought, at retail, a Sony cam, with 60GB internal drive and HD resolution 2 years ago for $900 from Costco.
Yeah... that means everyone has a Sony cam with a 60GB hard drive... oh wait... NO IT DOESN'T
It means that you are a geek. Most people do not own *ANY* camcorder.
You heard me. Most people do not own a camcorder.
Let me repeat that one more time. Most people do not own any camcorder.
Re: (Score:1)
No it means he bought a camcorder sometime in the last 5 years.
Re: (Score:2)
HD home movies and photographs are copies, even if only one digital copy exists.
Re:Hardware: "Digital Universe" Enters the Zettaby (Score:4, Insightful)
If every piece of digital data doesn't have a copy made of it, it is one hardware failure away from non-existence. Most of the storage space used in the businesses that I administer is not for the original data, but for multiple backup copies. Copies are not a bad thing; in the business we call them redundancy.
# Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)
* Torvalds, Linus (1996-07-20).
Re: (Score:1)
The comment about all the duplication of storage makes me think of the current pop culture obsession with hoarding.
I'd guess that all slashdotters have known someone who obsessively downloads music - to the point that they've got more music stored than they could possibly listen to.
Re: (Score:1)
This is all after getting rid of tapes/MD and quite a lot of the vinyl/CD sometime I think he hold up the entire UK mus
Re: (Score:2)
So when this is duped in a few hours will that be irony or just funny?
Depends on whether you're SI & IEC or JEDEC (Score:2)
a zettabyte isn't ROUGHLY a million petabytes - it is EXACTLY a million petabytes, that being its definition, and all.
Depends on whether you're using SI & IEC units (Z/zetta- and Zi/zebi- for 1000^7 / 1024^7) or extrapolating JEDEC (which would come up with Z/zetta- for 1024^7).
ZFS? (Score:1)
How do we have copies of all this data? (Score:3, Insightful)
Since this is EMC, let me tell you...
EMC loves to tell you to use RAID1. - 2 copies of your data
If it's important, you should use timefinder (snapshots), 1 more copy of the data.
If you want DR, then you should implement SRDF, 1 more copy of the data (this one is remote)
If you want to do data warehousing on what you just replicated, you run timefinder on the remote copy, 1 more copy.
So that makes it 5 copies of my data on disk.
Oh, and to protect myself from data corruption (or a deleted file) being replicated to all these copies, it's still recommended that I backup to tape/VTL/MAID.
Total of 6 copies of data. That is, if I'm using dedup on my VTL or TSM (which stores versions of a given file). If I'm using a traditional scheme (daily incrementals plus weekly fulls), I could have lots of duplication within my tape infrastructure.
Ever wonder why EMC stands for Endless Mirroring Company?
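The running count in the comment above can be tallied in a few lines. This is only a sketch of the arithmetic: the layer labels are paraphrased from the comment, the variable names are made up, and the single backup copy assumes the deduplicated tape/VTL case the comment describes.

```python
# Tally of the data copies listed in the comment above.
# Labels are paraphrased; counts are the commenter's own figures.

disk_layers = [
    ("RAID1 mirror", 2),                  # two copies on local disk
    ("TimeFinder snapshot", 1),           # local point-in-time copy
    ("SRDF remote replica", 1),           # remote copy for disaster recovery
    ("TimeFinder on the remote copy", 1), # copy used for data warehousing
]

backup_layers = [
    ("Deduplicated tape/VTL/MAID backup", 1),  # assumes dedup keeps one copy
]

on_disk = sum(count for _, count in disk_layers)
total = on_disk + sum(count for _, count in backup_layers)

for name, count in disk_layers + backup_layers:
    print(f"{name}: +{count}")
print(f"Copies on disk: {on_disk}")
print(f"Copies in total: {total}")
```

Without deduplication on the backup side (daily incrementals plus weekly fulls), the last entry would be larger than 1, which is the commenter's point.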
Re: (Score:2)
Ever wonder why EMC stands for Endless Mirroring Company?
And if the endless mirroring company starts moving really fast, until they couldn't possibly move faster, they'd represent the relationship between mass and energy.
Re: (Score:2)
i thought they were the Excessive Margins Company.
Re: EMC's take (Score:1)
http://www.enterprisestorageforum.com/ipstorage/news/article.php/3879726 [enterprise...eforum.com]
I'm happy to see (Score:3, Funny)
That we have all become good citizens, backing up all our data. I presume the data recovery firms are all panicking now that all their potential customers have backups of everything, and thus no longer need their services.
Not bad to have a global backup ratio of >1:1
Personally I use RAIM (Redundant Array of Instant Messages) to back up all my important notes and communications. It only works as long as all my friends log everything too, of course.
Re: (Score:3, Funny)
Dude, that's so old-school. I use RAT (Redundant Array of Tweets). My data is backed up... 140 characters at a time.
I'm thinking of upgrading to a system with a larger packet size. RASC (Redundant Array of Slashdot Comments) might work, but I'm afraid of having my pr0n collection marked "insightful".
What is the data? (Score:3, Interesting)
I was told about 10 years ago that "70% of the world's digital data is stored under MVS" which surprised me a bit, even then.
After some thought, when you consider that almost all commercial transactions (banks, telcos, etc.) would have been running on MVS back then, it may have been true.
SETI and CERN and other large scientific endeavours are small fry in comparison.
Re: (Score:2)
10 years ago MVS [wikipedia.org] carts were considered huge compared to the other gaming systems of the era.
Challenge? (Score:3, Insightful)
Why is that a challenge? Digital media is somewhat unique in that you can carefully craft media or information (reports, programs, videos, much in the same way you'd carve a chair) but risk instantly and nearly irrecoverably losing it (much unlike a chair).
Copies of data are a safeguard by redundancy. A website gets taken offline, well good thing there is a mirror. My camera breaks or my hard drive disk fails, well good thing I have an external backup or copies on my DVDs.
Re: (Score:2)
" Perhaps the biggest challenge isn't how much data we're creating — it's all the copies of it. "
Slashdot apparently even manages to create new data while still backing up the old data...
But it's pretty (Score:1)
Information storage was expensive.
At some point we started word processing on the desktop.
Information storage was still expensive.
Files were still small and the majority of the bytes in each file was information.
As time progressed and Microsoft Office has permeated the work area, the information content of each file hasn't changed much.
Each release seemed to take more space to store the same information.
Today, the
Re: (Score:1)
That's a silly analysis of it, text markup and layout is some tiny fraction of it. 150 pages of text layout information takes up about the same amount of space as 2 crappy snapshots, or a few seconds of high quality video.
Re: (Score:1)
I think Hierofalcon was probably referring more to the huge inefficiency with which MS Word and co. store even a simple text-based document.
I have seen 70MB+ Word files, which you can open, Ctrl-A, Ctrl-C and then paste into a new empty doc - save that doc and you have a 50kb Word file.
Plain text doesn't really have the capability of being inefficient (unless I suppose you fill the file with crap, but then it is simply efficiently storing a load of crap).
Re: (Score:1)
The part at the end where he talks about pretty favors my interpretation.
I don't know what's causing what you describe, but there is probably something tracking changes to the document. And maybe somebody pasted in a large bitmap (from what I have seen, people think that is a great idea), or perhaps a series of them, and then deleted them.
Re: (Score:1)
Yeah, Word has a lovely feature whereby when you remove sections from a document it doesn't so much delete that content from a file as just delete the references to it. So, if someone changes one image in a doc for another it will keep a copy of both images in the file but only show the new one.
It would be a good feature if it was actually made use of in some sort of revision history system, but as far as I can tell the only effect of it is the increased file size of some docs.
I agree with you that the talk
Re: (Score:2)
Gamefaqs still hosts all its faqs in text, which is one of the reasons I use it pretty exclusively. That and it being easily the most authoritative faq site out there.
Library of congress (Score:1)
How many libraries of congress is that?
Of COURSE most of it consists of copies! (Score:3, Insightful)
A typical individual wouldn't have a whole lot of unique information to store in the first place.... Basically, a collection of photos and some video from a few vacation trips or holidays, and some handwritten notes .... Maybe some artistic works (a few original songs or paintings, or ?) if he/she was interested in such endeavors. Oh, and your tax records and resume. But let's face it. Most of us are FAR more content consumers than creators. Content creation usually results in mass re-distribution of the original work, as others want to enjoy a copy of it.
I don't see any harm with this either, since duplication is the best way to protect against data loss. (When my parents were trying to trace their family history, they reached a dead-end because a library had burnt up in a fire that contained the only known records of some of the people they needed to research. With so much data going digital, on media that's practically EXPECTED to fail after less than 10 years of regular use? You better believe we need lots of duplicates out there!)
I have often said.... (Score:2)
I have often spoken to many engineers from Gmail and Hotmail about the data they store and how they could improve their systems by having pointers to emails instead of actual copies per storage account. If someone sends a joke email from one Gmail account to all his friends, 80% of whom have Gmail accounts (so let's say 25 in 30), you would still have only one copy of that joke email sitting on their server, accessible by all who have that pointer reference, but in fact looks like they all have thei
Re: (Score:2)
Or you could just have pointers to letters in the English alphabet! Then you can store all your emails in only 26 bytes (plus some overhead for the pointers).
1.21 zettabytes? (Score:5, Funny)
Re: (Score:2)
To aptly apply the mispronounced "jigawatt" paradigm: <DocBrown>1.21 settabytes? Great Scott!</DocBrown>
Retarded IP (Score:5, Insightful)
This beautifully illustrates how idiotic the concept of "copy right" and IP in general is in the digital universe. When 75% of 1.2 zettabytes is mostly untracked copies of other information, just storing the licenses alone would be an impossible task.
How do you maintain a business model built on the exclusive right to copy information in a world where everything is infinitely copied and copyable? It's like trying to legislate and sell access to saltwater while floating on a raft in the middle of the Pacific.
Space Program (Score:5, Funny)
- 1 zettabyte / 1.44MB floppy disk = approx 694,444,444,444,444 floppy disks.
- 694,444,444,444,444 * 3.5 inches per disk = 2,430,555,555,555,550 inches if you laid the floppies end to end.
- 2,430,555,555,555,550 inches / 63360 inches per mile = 38,361,040,965 miles
- 38,361,040,965 miles / 2.7 billion miles to pluto = approx 7 round trips to Pluto via floppy disk.
In conclusion: Don't kill NASA yet, President Obama. We've found a way to get to Pluto!
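The floppy arithmetic above holds up. Here is a short sketch that reruns it using the commenter's own figures: 1.44 MB is treated as 1.44 million bytes, each disk as 3.5 inches long, and Pluto as 2.7 billion miles away. All three are simplifying assumptions carried over from the comment (the real distance to Pluto varies a lot over its orbit), and the constant names are made up.

```python
# Re-run the "floppies to Pluto" estimate with the figures from the comment.
# All constants below are the commenter's round numbers, not precise values.

ZETTABYTE = 10**21          # bytes, SI definition
FLOPPY_BYTES = 1.44e6       # "1.44MB" taken as 1.44 million bytes
FLOPPY_INCHES = 3.5         # length of one disk laid end to end
INCHES_PER_MILE = 63360
MILES_TO_PLUTO = 2.7e9      # one-way distance used in the comment

floppies = ZETTABYTE / FLOPPY_BYTES
inches = floppies * FLOPPY_INCHES
miles = inches / INCHES_PER_MILE
one_way_trips = miles / MILES_TO_PLUTO
round_trips = one_way_trips / 2

print(f"Floppy disks needed: {floppies:,.0f}")
print(f"Length end to end:   {miles:,.0f} miles")
print(f"One-way trips:       {one_way_trips:.1f}")
print(f"Round trips:         {round_trips:.1f}")
```

That gives roughly 14 one-way trips, or about 7 round trips, matching the comment. For the report's full 1.2 zettabytes, scale everything by 1.2.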
Re: (Score:2)
If only we'd saved all of those AOL floppies...
Too many duplicates consuming disk space? (Score:2, Insightful)
No problem...
zfs set dedup=on tank
there... that should do the trick.
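The one-liner above enables deduplication on a ZFS dataset named tank. As a conceptual illustration only (this is not how ZFS is implemented, and the function name is made up), here is a minimal sketch of the general idea behind content-based deduplication: hash each chunk of data and store each distinct hash once.

```python
# Conceptual sketch of content-based deduplication.
# Assumption: data arrives as a list of in-memory byte strings ("blobs");
# a real filesystem would work on fixed-size blocks on disk.
import hashlib


def dedup_stats(blobs):
    """Return (total_bytes, unique_bytes) for a list of byte strings."""
    unique_sizes = {}  # SHA-256 digest -> size of the blob it identifies
    total_bytes = 0
    for blob in blobs:
        total_bytes += len(blob)
        digest = hashlib.sha256(blob).hexdigest()
        # Only the first blob with a given digest needs to be stored.
        unique_sizes.setdefault(digest, len(blob))
    return total_bytes, sum(unique_sizes.values())


# Three identical 100-byte blobs plus one distinct 50-byte blob.
sample = [b"a" * 100, b"a" * 100, b"b" * 50, b"a" * 100]
total, unique = dedup_stats(sample)
print(f"Logical size: {total} bytes, stored after dedup: {unique} bytes")
```

In this toy sample, 350 logical bytes shrink to 150 stored bytes. How much real data shrinks depends entirely on how many exact duplicates it contains.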
75% sounds about right (Score:2)
75% of everything I have on disk is a copy of something else, but unfortunately I usually have lost the copy somewhere in the process of moving, moving from one machine to the next, or trying to clear up disk space so I can download more stuff to leave on my disk.
A zettabyte is EXACTLY a million petabytes (Score:1, Troll)
By definition. And since EMC is a storage company, they're almost certainly using the SI prefixes properly.
The author of the summary is, I think, confusing zettabytes and petabytes with the base-2 units, zebibytes and pebibytes. For all of the binary prefix haters, when you get up into these sizes the difference between base 2 and base 10 units is more than big enough to justify the effort to use the correct terms. The difference between one zebibyte and one zettabyte is over 180 exabytes.
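The size of that gap is easy to check with exact integer arithmetic. A small sketch (the constant names are made up for illustration):

```python
# Compare the SI zettabyte with the IEC zebibyte.
# Python integers are arbitrary precision, so the subtraction is exact.

ZETTABYTE = 10**21  # SI: 1000^7 bytes
ZEBIBYTE = 2**70    # IEC: 1024^7 bytes
EXABYTE = 10**18    # SI: 1000^6 bytes

difference_bytes = ZEBIBYTE - ZETTABYTE
difference_exabytes = difference_bytes / EXABYTE
percent_larger = (ZEBIBYTE / ZETTABYTE - 1) * 100

print(f"Difference: {difference_bytes:,} bytes")
print(f"            about {difference_exabytes:.1f} exabytes")
print(f"A zebibyte is about {percent_larger:.1f}% larger than a zettabyte")
```

The difference works out to a little over 180 exabytes, or roughly 18%.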
Re: (Score:1, Offtopic)
Do you have Asperger's?
Re: (Score:2)
Do you have Asperger's?
No. Do you have Alzheimer's?
I'm assuming that's how we play this game. I haven't seen it before, though, so I'm just guessing.
Re: (Score:1)
And when you get to those kinds of numbers, they are the same order of magnitude so the difference is fairly irrelevant.
Order of magnitude, yes, but a power of 10 is a pretty wide range.
Put it this way: A zebibyte is almost 20% larger than a zettabyte. That's a pretty big difference.
Re: (Score:2)
Most amusing to me is that 1,048,576 (if that's what he meant) is fewer characters than "roughly a million"
And 2^20 is even fewer!
Damn pirates (Score:1)
How do you know which one is the copy? (Score:2)
Identical meaning everything down to the create date and last modified date.
Volume of digital data, not information (Score:2)
Actually, information is useful stuff. The internet, and the world is saturated with useless stuff (data, noise). Also, the world's stuff is considerably smaller when de-duplicated. And then if you remove redundancies, and different ways of saying the same thing...
I am pretty sure that all the world's Information can be contained on a single petabyte. That would include all the world's literature, and all the newspapers, magazines, etc. If you include pictures, maybe significantly more.
Part of the Data pro
questions about how to store and manage the data (Score:2)
Everybody takes care of their own bit(s) & backups; there is no single entity dealing with managing 1.2ZB.
Questions not so interesting. Move on.
The scary part (Score:2)
Is what percentage of it ISN'T backed up AND should be (which will be something less than 25% but much greater than 0%).
Seriously? They included duplication? (Score:2)
The number is meaningless, because "duplication" is arbitrary. Where do you draw the line? If duplication means "copying data from one place to another" then data is duplicated every time function parameters are pushed on the stack, every time memcpy() is called, every time something is loaded from disk into RAM. I could write a simple loop that copies a 32-bit quantity from EAX to EBX three billion times per second. If you include all that shit going on, I bet their number would be higher by a factor of a
Anybody have a link to the torrent? (n/t) (Score:2)
Anybody have a link to the torrent?
Zetta is one with 21 zeros (Score:2)
a kb is a lot of data (Score:2)
Re: (Score:2)
I need that much every second, at least!