SKA Telescope Set To Generate More Data Than Current Net
angry tapir writes "The forthcoming $2.1 billion Square Kilometre Array (SKA) radio telescope could generate more data per day than the entire internet when it comes online in 2020, according to the director of the International Centre for Radio Astronomy Research (ICRAR), Professor Peter Quinn. SKA — which Australia with New Zealand and South Africa are competing to host, and which will help the search for Earth-like planets, alien life forms, dark matter and black holes — will be 10,000 times more powerful than any telescope currently used. Slashdot has previously discussed the proposal to use 'Skynet' — a grid-computing-based solution for processing and storage."
Uh oh. (Score:2)
Cue in the HAARP freaks in 3...2...1...
Re: (Score:2)
Cue confused rude boys in 3...2...1... ...
Let's hope this satellite will pick it up, pick it up, pick it up [youtube.com]
Re: (Score:2)
Aww, you beat me to it... but YES.
Thank you, sir/madam.
Re: (Score:1)
How can they afford this with Congress cutting the budget for everything except military spending, when they aren't yet delusional enough to plan for an interstellar invasion?
This thing will blow through their ISP's data cap in the first couple minutes, after which they'll be on the hook for more than $1/Gb... Hell, it might even increment at a rate within 1 or 2 orders of magnitude of the national debt's rate of increase.
finally we can retire that... (Score:2)
Finally we can retire that saying/meme that the Internet is for porn... or that the mass storage market is driven by porn,
unless... I guess we happen to be able to spot the alien equivalents via this SKA...
Re:finally we can retire that... (Score:4, Funny)
This is just Astronomical Porn. Rule 34 still holds true.
Re: (Score:2)
How about: Imagine a beowulf cluster of these...
Surprised it hasn't happened already. (Score:2)
Really, how much "data" is "generated" by the internet every day?
Sure, there's lots of traffic, but that's millions of copies of the same data.
The new data going on to the internet probably isn't too heinous in quantity.
And the summary blew the meme. It's not "generate more data per day than the internet", it's "generate more data per day than the earth does in a year, and conduct more internal networking traffic than the internet."
Re: (Score:1)
Re: (Score:2)
if you start logging all traffic, you'll quickly run out of space to store it.
if on the other hand you start logging only novel content, you'll take somewhat longer to run out of space to store it.
just ask google how their cache works.
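To make the "all traffic vs. novel content" point concrete, here is a minimal Python sketch. It is only an illustration, assuming content is deduplicated by hashing each payload; the function name `log_novel` and the sample payloads are hypothetical, and this says nothing about how Google's cache actually works.

```python
import hashlib


def log_novel(chunks):
    """Return (total_bytes, novel_bytes) for an iterable of byte payloads.

    A payload counts as novel only the first time its SHA-256 digest is seen.
    """
    seen = set()
    total = 0
    novel = 0
    for chunk in chunks:
        total += len(chunk)
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            seen.add(digest)
            novel += len(chunk)
    return total, novel


# Hypothetical traffic: one popular item fetched 1000 times, one new item once.
traffic = [b"cat video"] * 1000 + [b"new blog post"]
total, novel = log_novel(traffic)
print(f"traffic logged: {total} bytes, novel content: {novel} bytes")
```

The traffic total grows with every repeat, while the novel total only grows when something new shows up.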
Re: (Score:2)
SKA and other astronomy projects (Score:3)
Re: (Score:2)
The US is not even involved in the SKA.
Re: (Score:2)
10 years from now... (Score:1)
The FUTURE (Score:1)
Until... (Score:1)
NBN (Score:4, Interesting)
(Warning: Australian content ahead!)
I hope this lays down a water-tight case for the NBN going ahead, or for the two acting as catalysts for each other. If there's one thing this is good for demonstrating, it's that future data requirements will outgrow the current infrastructure very quickly, and that a project as far-sighted as installing FTTH throughout the country is justified by the unforeseen benefits it can enable.
(and bah humbug to anyone who thinks the SKA isn't justified to begin with!)
Re: (Score:2)
That's absurd on the face of it: you don't need FTTH for the SKA, because the data isn't going to be distributed on a large scale to homes! For the SKA, you only need one fat pipe leading from the SKA out to the under-ocean cables.
Re: (Score:2)
How many major infrastructure projects have been attempted at all over the past 40 years? The entire problem, as far as I see it, is that we are falling behind because we are not building anything. Successive governments keep playing it safe, and if there is a surplus the money does not get invested into major infrastructure projects; it gets "invested" into winning the next election through middle class welfare. The mining boom will eventually end and we will be left in the tough times with nothing to
Ska (Score:1)
Re: (Score:3, Funny)
Re: (Score:1)
No, there is just too much data. They chose ska because they needed something exactly like reggae except much, much faster.
Almost! (Score:1)
is this a good thing? (Score:3)
Re: (Score:1)
If the CIA invented a device that listened into every phone call in the entire world, real time and dumped it all as a WAV file on a storage device in the basement, would that really do them any good at all?
Why would they invent that when they already have a device that listens into every phone call in the entire world, real time and dumps it all as a bunch of AMR files on a storage array in the basement?
Seems like... (Score:1)
Nonstandard unit fun! (Score:1)
Sensationalistically inaccurate article... (Score:4, Informative)
The project is expected to deliver up to an exabyte a day of raw data, compressed to some 10 petabytes of data in images for storage.
So, 10 petabytes of data - who cares about the raw source. I work for a video streaming company and we have several petabytes of H.264 video. If that were to be uncompressed into 30 FPS 1080p raw data, it would be 50-100x that, so already approaching a couple hundred petabytes. And think of all of the JPEGs out there - why don't we just uncompress all of those for the comparison as well?
A (likely conservative) back-of-the-envelope calculation by Google estimated at least 5 exabytes accessible on the Internet (so even the wrong estimate is wrong). I'd imagine a huge percentage of that is compressed video, audio, and images. So, basically 5 exabytes vs 10 petabytes - it's off by nearly 3 orders of magnitude.
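The comparison above is easy to check. A rough Python sketch, assuming decimal SI units (1 PB = 10^15 bytes, 1 EB = 10^18 bytes) and taking the 5 EB, 1 EB and 10 PB figures at face value from the comment and the article; the variable names are just for illustration.

```python
import math

PB = 10**15  # bytes per petabyte (decimal SI, an assumption)
EB = 10**18  # bytes per exabyte (decimal SI, an assumption)

internet_estimate = 5 * EB  # Google's estimate as quoted in the comment
ska_raw = 1 * EB            # raw SKA data per day, per the article
ska_stored = 10 * PB        # what the article says that raw data is compressed to

ratio = internet_estimate / ska_stored
orders = math.log10(ratio)
reduction = ska_raw / ska_stored

print(f"internet estimate / SKA stored data: {ratio:.0f}x ({orders:.1f} orders of magnitude)")
print(f"raw-to-stored reduction: {reduction:.0f}:1")
```

With these inputs the gap comes out at 500x, a bit under three orders of magnitude, and the raw-to-stored reduction at 100:1.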
Re: (Score:2, Informative)
Indeed ... while it's an impressive number, we already have experiments that generate more "raw data" per day than that: "CERN experiments generating one petabyte of data every second" http://www.v3.co.uk/v3-uk/news/2081263/cern-experiments-generating-petabyte That's 84EB per day. But "all" of it is crap, and they eventually store only about 25PB per year.
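For the per-day figure, a quick Python sketch of the unit conversion. It assumes a constant 1 PB/s around the clock (almost certainly an over-simplification of how the experiments actually run) and shows both decimal (1 EB = 1000 PB) and binary (1 EB = 1024 PB) conventions, since the quoted 84EB only matches the binary one.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

raw_pb_per_second = 1  # figure from the linked article
raw_pb_per_day = raw_pb_per_second * SECONDS_PER_DAY

eb_per_day_decimal = raw_pb_per_day / 1000  # 1 EB = 1000 PB
eb_per_day_binary = raw_pb_per_day / 1024   # 1 EB = 1024 PB

# Fraction kept, assuming (hypothetically) the same raw rate all year long.
stored_pb_per_year = 25
raw_pb_per_year = raw_pb_per_day * 365
kept_fraction = stored_pb_per_year / raw_pb_per_year

print(f"{eb_per_day_decimal:.1f} EB/day (decimal), {eb_per_day_binary:.1f} EB/day (binary)")
print(f"fraction of raw data stored: {kept_fraction:.1e}")
```

Under those assumptions, less than one part in a million of the raw stream ends up stored.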
Re: (Score:2)
It's a specious analogy. Video and audio can be compressed with loss, and the algorithms make heavy use of human perceptual limitations. Scientific data produced by large instruments need to have breadth and depth; the instrument is a scarce resource and there are unlimited ways of reducing radio astronomy datasets to produce different data and different insights. Especially with radio, you're going to be collecting a ton of white noise-looking data, but you can't use a lossy compression algorithm to trim i
Re: (Score:2)
It's a specious analogy.
If you RTFA, no it's not.
The project is expected to deliver up to an exabyte a day of raw data, compressed to some 10 petabytes of data in images for storage.
They clearly say it's compressed before it's stored. The raw data number itself was pure boast. Besides, no one said it was lossy compression, anyway. Lossless image compression can be very efficient depending on the image.
Re: (Score:2)
It's a distortion on the part of the article. Radio astronomy raw data are not images. And the computational effort to reduce the raw data into data from which you could make images is large enough that you tend to store the result of the processing alongside the original data--it takes up more space, not less.
It still doesn't matter, because there's no way they'll be running the telescope at full throttle until several years after commissioning.
Re: (Score:3)
Your maths is quite a bit off.
We have only 3 primary color channels (4 if you count rods separately, 5 if also counting tetrachromats [wikipedia.org]), not "65000". We can't see 1 million different intensities simultaneously either -- while the human eye does have an enormous dynamic range, this adaptation takes a while (minutes). At any one time, we can see maybe 300-1000 distinct intensity levels per color channel. This only requires 10 bits to represent per channel. Even your 125 Mpixels is an exaggeration, because we h
Re: (Score:2)
So, 10 petabytes of data - who cares about the raw source.
Depends on what you're doing. If you're taking pictures of the Eiffel Tower on your vacation you're not really interested in if the CCD on your camera's LSB was a 1 or a 0. If you're taking images from a hyperspectral sensor for scientific purposes the better the accuracy the better (hopefully) your results. It depends on the type of application and how important accuracy is to you (and how accurate your sensor is).
I work for a video streaming company and we have several petabytes of H.264 video.
We've got several hundreds of terabytes in lossless compressed high resolution hyper spectral
Re: (Score:2)
So, 10 petabytes of data - who cares about the raw source.
Depends on what you're doing. If you're taking pictures of the Eiffel Tower on your vacation you're not really interested in if the CCD on your camera's LSB was a 1 or a 0. If you're taking images from a hyperspectral sensor for scientific purposes the better the accuracy the better (hopefully) your results. It depends on the type of application and how important accuracy is to you (and how accurate your sensor is).
I already pointed this out above, but if you RTFA they clearly say it would be compressed before it's stored. If they don't store the raw data, then, yes, who cares about it, because it doesn't exist for analysis. And as I also pointed out, nowhere did it say it was lossy, just that the ~1EB of data is compressed to 10PB.
Re: (Score:2)
My Home DVR (Score:2)
In nine years my home DVR system will generate more data than the entire internet!
Early processing of raw data (Score:1)
Re: (Score:3)
We know a decent amount about it. And we don't want to replicate it in our scientific instruments. The brain does all sorts of extrapolating, interpolating and other forms of making shit up. Which is great if you're a mammal who needs to see the sabre toothed tiger stalking you, but not so good if you're trying to get accurate, quantitative data out of a scientific instrument.
Re: (Score:1)
Also, computationally intensive to do algorithmically, free when it happens due to quantum interference or neurotransmitter binding or whatever in the optic nerve.
A grid-computing-based solution for processing and (Score:2)
storage called Skynet... I'm sure this won't turn out badly.
Perl one-liner (Score:1)
Mostly noise (Score:1)
The first image recorded from the SKA (Score:2)
"Pick it up, pick it up, GO!"