Mega-Uploads: The Cloud's Unspoken Hurdle
First time accepted submitter n7ytd writes "The Register has a piece today about overcoming one of the biggest challenges to migrating to cloud-based storage: how to get all that data onto the service provider's disks. With all of the enterprisey interweb solutions available, the oldest answer is still the right one: ship them your disks. Remember: 'Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.'"
Pro photography is a huge problem (Score:5, Interesting)
If you return from a site with a tethered computer full of a day's worth of 80 MP, 16-bit raw files, trying to upload all those images would break most bandwidth budgets.
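Here is a rough back-of-the-envelope sketch of the scale involved. It assumes an uncompressed Bayer raw stores one 16-bit sample per pixel, which works out to about 160 MB per 80 MP frame. The 1,000-frame shoot and the 5 Mbit/s uplink are hypothetical placeholders, not figures from the post.

```python
def raw_frame_bytes(megapixels: float, bits_per_sample: int = 16) -> float:
    """Approximate size of one uncompressed raw frame.

    Assumption: one sample per photosite (Bayer sensor), no compression,
    no metadata or embedded preview.
    """
    return megapixels * 1e6 * bits_per_sample / 8


def upload_hours(total_bytes: float, uplink_mbit_s: float) -> float:
    """Hours to push total_bytes through a fully saturated uplink (decimal units)."""
    return total_bytes * 8 / (uplink_mbit_s * 1e6) / 3600


frame = raw_frame_bytes(80)        # ~160 MB per frame
shots = 1_000                      # hypothetical day's shoot
total = frame * shots              # ~160 GB
hours = upload_hours(total, 5)     # hypothetical 5 Mbit/s uplink

print(f"{frame / 1e6:.0f} MB/frame, {total / 1e9:.0f} GB total, {hours:.1f} h to upload")
```

Under those assumptions the upload ties up a saturated uplink for roughly three days, before any real-world overhead.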
Re:Pro photography is a huge problem (Score:5, Interesting)
Realistically, though, if I want to, I can just upload them all to my home machine or to cloud storage in batches overnight, the same way I download 10-gigabyte files at home. It's just plain easier to cart 'em around.
Re: (Score:2)
I think you really misunderstood the message you were replying to. The key word there (repeated too) was "consumer".
Not everyone lives in the backwater of North America, where the ISPs are still wringing large dollars out of typical home users for 1990s bandwidth.
Re: (Score:2)
The second part of his post specifically tells us a pro photographer "should have little to no problem forking out the ~$800
Re: (Score:2)
You're a Very Bad Pedant.
http://dictionary.reference.com/browse/laying?s=t [reference.com]
Re: (Score:2)
Consider me chastened for my early morning counter-pedantry. Next time, more coffee.
Re:Pro photography is a huge problem (Score:5, Informative)
The tiny town of Sebastopol, CA (population ~7,800) has gigabit fiber to (some) doorsteps for $69/month.
Re: (Score:2)
And all the socialized deficit spending you can eat.
Re: (Score:3)
Um, no. Sonic.net is the business doing the rollout. Basically, they pay for it by getting 100% adoption (by eating the other services' lunches).
Re: (Score:2)
That covers about 40 people.
Now how about the rest of us?
Re: (Score:2)
Yeah, and I'm across town. Fuckers! :-(
I'm not even in the "planned roll out area"
Re: (Score:2)
Nice to "meet" you neighbor. My back fence is Lynch Road.
why dodge this question? (Score:3)
Pressed if disks are accepted, the company responded that “All common database products provide a capability to extract to a common file format like .csv.”
What a professional answer. And by that I mean it didn't answer the question at all.
Re:why dodge this question? (Score:4, Insightful)
You don't know the exact dialogue between the journalist and the rep. I've been quoted in print in similarly stupid ways when what I said made absolute sense in context to what was asked. "Pressed if disks are accepted" could have been something like the rep telling them about a new CSV import tool they had built, the journalist saying "So if I mailed you a 5TB database on a disk, could you import that?", and the rep replying "Sure, but you'd need to export the data first...".
Station Wagon Full of Tapes (Score:5, Funny)
Yeah, the bandwidth is great, but the latency SUCKS.
Comment removed (Score:4, Funny)
Re: (Score:3)
Or at least a JATO unit
Better solution. (Score:2)
Add a flux capacitor to the station wagons.
What is your transmission speed when the transmission is completed in -1 hour?
Re: (Score:2)
That improves the already good bandwidth but doesn't do anything for the latency.
Re: (Score:3)
A dumptruck full of harddrives. [slashdot.org]
Re: (Score:2)
True, but that's not even the real problem. The idea of creating an "initial seed" is a good one, but it's only dealing with one sign of a larger problem: our internet connections are often not good enough to deal with the volume of data we're pushing, and so cloud storage solutions can only serve certain cases.
Storing files on a remote system with limited bandwidth is only good when you're dealing with generally small files. A limited number of large files can be fine so long as you're primarily syncing
Re: (Score:2)
No. The cloud is really irrelevant.
Either I can't push it out there and thus can't use it.
Or, I can push it out there but it is irrelevant because I could just host my own stuff to begin with.
I either can't use it or don't need it.
Re: (Score:2)
There's also when there's a ton of stuff going in and little going out. Contrived example: Survey 1,000,000 people and get back 10 aggregated results.
Re: (Score:2)
Of course, that's not happening anytime soon, especially since the ISPs and media companies that they're partnered with have no interest in giving people decent upload rates.
I think if you look at most ISPs over time, you'll see that upload rates have become far more symmetrical than they used to be. For example, I just got moved from 15/5 to 25/25 for the same cost.
Second biggest challenge (Score:5, Insightful)
Re:Second biggest challenge (Score:5, Insightful)
No, that's third. The second biggest challenge is believing that those fine hosting companies with servers hosted in lower Slobbovia won't have a few entrepreneurial employees who will *actively* be searching your data for all that is monetizable.
Re: (Score:2)
A special case of my number two perhaps.
I like your sig.
Comment removed (Score:4, Insightful)
Re: (Score:2)
Why not? You are paying for their services, right? Even a particularly scummy company would be swayed by the request "My pre-production server crashed hard. Can you mail me the disks?" I am not familiar with all these companies, but Crashplan charges you a fee to put your data on a disk and mail it to you. Unless you
Re: (Score:2)
Send them Truecrypt containers, problem solved.
Re: (Score:2)
Yeah, I think that is the way to go... but for the extra paranoid, how do you know you can trust Truecrypt?
http://superuser.com/questions/164162/is-truecrypt-truly-safe [superuser.com]
I'm sure (well, actually not really) the rumours are just FUD (by whom, and for what purpose?), but that's the thing: you're through Alice's Looking Glass now... Is it FUD disseminated by people who don't want us to use Truecrypt just because it is in fact unbreakable, or is it someone who's hit on truly suspect facts about Truecrypt? How do y
Backups (Score:5, Informative)
My last employer offered offsite backups to clients. For the initial seed, we always tried to get them to put it on an external HDD and ship it to us (or at least DVDs). The only major exceptions were clients that were also on FiOS - that was the only case where over-the-net transfer was faster than the backup-and-ship-it method for the initial seed.
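The trade-off described above can be sketched as a simple comparison of wire time against copy-ship-copy time. This is a minimal sketch under assumptions of my own, not the employer's actual numbers: a hypothetical 2 TB seed, a 100 MB/s drive copy speed at each end, two days in transit, and a fully saturated uplink with no protocol overhead.

```python
def network_days(data_tb: float, uplink_mbit_s: float) -> float:
    """Days to send data_tb terabytes over a saturated uplink (decimal units)."""
    return data_tb * 1e12 * 8 / (uplink_mbit_s * 1e6) / 86400


def ship_days(data_tb: float, copy_mb_s: float, transit_days: float) -> float:
    """Days to copy onto an external drive, ship it, and copy it off again.

    Assumes the copy speed is the same at both ends.
    """
    copy_days = data_tb * 1e12 / (copy_mb_s * 1e6) / 86400
    return 2 * copy_days + transit_days


seed_tb = 2.0                       # hypothetical initial seed
shipping = ship_days(seed_tb, copy_mb_s=100, transit_days=2)
for uplink in (1, 10, 100):         # hypothetical uplink speeds, Mbit/s
    wire = network_days(seed_tb, uplink)
    winner = "ship" if shipping < wire else "network"
    print(f"{uplink:>3} Mbit/s: network {wire:6.1f} d, ship {shipping:4.1f} d -> {winner}")
```

Only the fastest uplink in this sketch beats the courier, which is consistent with fast fiber being the one exception mentioned above.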
tapes have to be written and read (Score:2)
Re: (Score:2)
140MB/s is low speed?
That is the speed of LTO-5, at best.
Re: (Score:2)
Then put it on a RAID array. I knew people that were doing this more than 10 years ago. They needed to move large amounts of data around. So they would FedEx a RAID array around.
Re: (Score:2)
That depends on the drive and how fast you can send it data, more than the format. This is for linear reads and writes of course.
Re: (Score:2)
That only depends on how many tape drives you use at once. If you are shipping 200 lbs of tape and using dozens of tape drives, you should get much better bandwidth than if you tried to send it over the internet.
Re: (Score:2)
station wagon has low bandwidth, the tapes have to be written and read.
No. Bandwidth is how much you can send in one go (tapes/HDDs in a car are extremely high). Latency is, roughly, how long it takes to get it there. Throughput brings the two together.
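A minimal sketch of how the pieces combine, assuming throughput is simply the payload divided by the whole pipeline time (write, transit, read). The 25 TB payload and 10-hour drive echo numbers used elsewhere in this thread, 140 MB/s is the LTO-5 figure mentioned above, and the drive counts are hypothetical. It also illustrates the point about using dozens of tape drives: more drives shrink the write/read time, and throughput approaches payload/transit as the drive count grows.

```python
def effective_throughput_mb_s(payload_tb: float, drive_mb_s: float,
                              drives: int, transit_hours: float) -> float:
    """End-to-end throughput in MB/s for a write-ship-read pipeline.

    Assumes the same number of drives at each end, each streaming
    linearly at drive_mb_s, and no overlap between writing and driving.
    """
    payload_mb = payload_tb * 1e6
    io_seconds = payload_mb / (drive_mb_s * drives)   # time to write (or read) everything
    total_seconds = 2 * io_seconds + transit_hours * 3600
    return payload_mb / total_seconds


one_drive = effective_throughput_mb_s(25, 140, drives=1, transit_hours=10)
many_drives = effective_throughput_mb_s(25, 140, drives=24, transit_hours=10)
transit_only = 25e6 / (10 * 3600)   # upper bound: the "bandwidth" of the wagon alone

print(f"1 drive: {one_drive:.0f} MB/s, 24 drives: {many_drives:.0f} MB/s, "
      f"ceiling: {transit_only:.0f} MB/s")
```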
Re: (Score:2)
As the old joke goes: the bandwidth is great, it's the latency that sucks.
Re: (Score:2)
True. I was at one point involved in converting the Stanford AI Lab archives from 6250BPI tape to a file server. People were loading tapes for weeks. As soon as a tape was loaded, the data went over the Internet to a file server at IBM Almaden for format conversion. The transmission of a tape only took a few seconds. Of course, both IBM Almaden and Stanford have major backbone connections.
I'm not so sure... (Score:3)
Bandwidth (Score:1)
At an intercontinental company I used to work for, once or twice a year they'd send an intern across the Atlantic on the SST with a case of tapes.
When it just positively had to be there asap...
Bandwidth of a Station Wagon (Score:5, Funny)
Yes, never underestimate the bandwidth of a station wagon full of disks hurtling down the highway. The latency, on the other hand, leaves much to be desired, and I've heard the packet loss can be downright fatal.
Re: (Score:2)
Oh, I get it! You're talking about collisions!
Snail mail FTW! (Score:2)
I got my first Linux distribution (I don't remember if they were called distributions back then) shipped on tape to the campus computer lab, where a group of us brought our computers to copy the files.
Re: (Score:2)
I got mine in 1999 from Cheapbytes:
http://www.cheapbytes.com/ [cheapbytes.com]
Corel Linux FTW!
Re: (Score:2)
I got mine in 1999 from Cheapbytes: http://www.cheapbytes.com/ [cheapbytes.com]
Clicked on the link out of curiosity, got a splash screen that looked like it was designed in the 90s. (*) Anyway, clicked on the "Click here to enter the CheapBytes store" and I got...
Great Success !
Apache is working on your cPanel® and WHM Server
If you can see this page, then the people who manage this server have installed cPanel and WebHost Manager (WHM)
So are they still trading or is this just a zombie remnant? Guessing that their business would have shrunk quite a lot since the days when everyone was on dialup and you'd have had to be on crack to consider downloading even a CD's worth, let alone a DVD. (Used to order Linux discs quite a lot myself, haven't done it since I
Satellite cap (Score:2)
Guessing that [a CD distributor's] business would have shrunk quite a lot since the days when everyone was on dialup and you'd have had to be on crack to consider downloading even a CD's worth, let alone a DVD.
Shrunk? Yes, I'll grant. Still useful in places that can't get FTTH, DOCSIS, or DSL? Yes. Satellite and cellular are still capped to about one DVD a month, with single or dual layer depending on which plan you choose.
Re: (Score:2)
Cheapbytes didn't even really come around until broadband was commonplace. The fact that they are "cheap" is a reflection of that. Before then, you had more expensive CD sets.
It's cheap for a reason.
Re: (Score:2)
I got mine in 1994 from Walnut Creek. 6 CDs. 4 Distributions.
Re: (Score:2)
I am much younger, I got my linux from a dead tree.
Re: (Score:2)
That is relevant because it was easier/faster to get an up-to-date-ish Linux CD with a book, which had to go through various editors, a printing press, and the bookstores' distribution system, than to download the CD over a 33.6k modem... and then it was easier to read the book than to load/search a document that large in a word processor and stare at a low-res CRT screen.
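For scale, a quick sketch of the modem side of that comparison. It assumes a 650 MiB CD image and the nominal 33.6 kbit/s line rate, and it ignores protocol overhead, modem compression, and dropped connections, so treat the result as a rough figure only.

```python
CD_BYTES = 650 * 1024 * 1024     # assumed 650 MiB CD-ROM image
MODEM_BIT_S = 33_600             # nominal V.34 33.6 kbit/s line rate

seconds = CD_BYTES * 8 / MODEM_BIT_S
hours = seconds / 3600
print(f"{hours:.1f} hours ({hours / 24:.1f} days) of uninterrupted downloading")
```

That comes to roughly 45 hours of continuous connection for a single disc.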
A problem bigger than getting your data on ... (Score:5, Insightful)
... is getting it all back OFF again when you want to switch service providers.
The one thing you want never to happen is that you get locked in to a single cloud service. They might go bust, they might become uncompetitive. They may become politically "unfriendly" or tainted with customers you have no desire to be associated with - or any of a number of other reasons to say "adios".
Just like with disaster planning, all the processes and procedures, agreements and SLAs are worthless until you've actually PERFORMED the operation and done so without a major service interruption. How many cloud users have gone that far - and how many are locked in but don't know it?
Re: (Score:3)
We've already seen the unsinkable cloud get sunk. Amazon's never-down cloud has rained out at least once in some regions. Another problem is the cloud provider making changes to their services that impact the way your company operates. My last company was in the process of Googleizing when I left a year ago and Google's already made changes that must have required training and documentation updates. And they can remove apps and services at any time so some unpopular product that happens to be very usefu
TCP/USPS (Score:2)
Ah yes, the TCP/USPS revolution has finally arrived!
Re: (Score:2)
You must use a hell of a large window size with all that latency.
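The window-size quip refers to the bandwidth-delay product: to keep a link busy, a sender needs bandwidth x round-trip time worth of unacknowledged data in flight. A small sketch follows; the 1 Gbit/s and 100 ms figures are illustrative, and 65,535 bytes is the largest TCP window without the window-scale option.

```python
def window_bytes(bandwidth_bit_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bit_s * rtt_s / 8


def max_throughput_bit_s(window: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most `window` bytes can be unacknowledged."""
    return window * 8 / rtt_s


needed = window_bytes(1e9, 0.100)               # 1 Gbit/s with a 100 ms RTT
capped = max_throughput_bit_s(65_535, 0.100)    # unscaled TCP window on the same path
print(f"window needed: {needed / 1e6:.1f} MB; unscaled-window cap: {capped / 1e6:.2f} Mbit/s")
```

A link whose round trip is measured in hours of highway driving would need a correspondingly absurd window, which is the joke.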
Aspera and Friends (Score:4, Interesting)
You could always use a UDP-based solution such as Aspera [asperasoft.com]: fast transfer speeds, bandwidth management, and they have a specific AWS implementation. [asperasoft.com]
Other options to look at include Smartjog [smartjog.com], whose new Bolt product looks quite interesting, Riverbed's Steelhead [riverbed.com] product, Filecatalyst [filecatalyst.com] and Signiant [signiant.com].
There are many solutions around now for large file transfers, for both small and large businesses. Most of them use UDP instead of TCP, with checksums to ensure all data is reliably delivered. Even with just 1 Mbps upload speeds, one of the products named above will be advantageous. I've worked in the media industry for a number of years, and this type of thing is used in film and television all the time. Of course, tapes are still being shipped around, but in emerging markets such as Russia, file transfer really beats a tape stuck in customs for weeks or months.
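The checksum idea can be illustrated independently of any product. The sketch below is not Aspera's (or any other vendor's) protocol. It only shows the general technique of hashing fixed-size chunks so the receiver can report which chunks arrived damaged or not at all, and only those get resent. The UDP transport itself is left out, and the 4 MiB default chunk size is an arbitrary assumption.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024   # arbitrary 4 MiB chunks (assumption)


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """SHA-256 digest of each fixed-size chunk, in order."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def chunks_to_resend(expected: list[str], received: list[str]) -> list[int]:
    """Indices of chunks that are missing or whose digest does not match."""
    return [i for i, digest in enumerate(expected)
            if i >= len(received) or received[i] != digest]


# Usage with a small payload and 4 KiB chunks so the example runs instantly.
original = bytes(range(256)) * 100                  # 25,600 bytes -> 7 chunks
damaged = bytearray(original)
damaged[5000] ^= 0xFF                               # corrupt one byte in chunk 1
sent = chunk_digests(original, 4096)
got = chunk_digests(bytes(damaged[:24576]), 4096)   # last chunk never arrived
print(chunks_to_resend(sent, got))                  # -> [1, 6]
```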
But then... (Score:5, Insightful)
How did you manage to fix the "armed FBI agents storming your servers located in another country" problem?
Re: (Score:2)
How did you manage to fix the "armed FBI agents storming your servers located in another country" problem?
Unless you're not in the US and "another country" is the US, what the hell are the FBI doing there?
Not just a "public" challenge (Score:3)
I am dealing with this as well, albeit on a different scale. About a year ago, the powers that be decided to develop a private cloud for the company. Nobody really considered how to migrate 500+ TB of data from three separate sites into the new cloud. We are doing a mixture of over-the-wire replication (for sites with 100 Mb/s+ of bandwidth), physical replication (using NAS devices and tape), and synchronization using DoubleTake for the SQL data and Vice Versa Pro for file system data. It is a massive undertaking, made even more difficult by the fact that we are working with production systems with locked-in SLAs that need to be maintained.
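To see why the wire alone wasn't an option, here is a quick sketch of pushing the full 500 TB through a single link. It assumes decimal units, a fully saturated link, and no protocol overhead. In practice the data was spread across three sites and several methods, so this is only an order-of-magnitude bound.

```python
def transfer_days(data_tb: float, link_mbit_s: float) -> float:
    """Days to move data_tb terabytes over a saturated link (decimal units, no overhead)."""
    return data_tb * 1e12 * 8 / (link_mbit_s * 1e6) / 86400


for link in (100, 1_000, 10_000):   # Mbit/s
    print(f"{link:>6} Mbit/s: {transfer_days(500, link):7.1f} days")
```

At 100 Mbit/s the single-link figure is well over a year, which is why the physical replication path matters.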
For the average person, and even most enterprises, I honestly believe the best way to get into "the cloud" is by following a well planned out, phased approach. The first phase should be using the cloud as a DR target. Only when both sides of the equation are balanced and able to operate independently of each other can you consider doing away with one and moving to the other.
Re: (Score:3)
Actually, enterprise issues around data synchronization quickly get problematic.
I have just watched a migration from private mail servers to cloud-based email. Months in, it quickly became apparent that a few short days or even a few weeks of pain from migrating users cold turkey (and then importing requested data from Notes once it had become static) would have cost astronomically less, in money and pain, than wiring up the connector and keeping two frameworks alive (and borking the sync in
Re: (Score:2)
We can all write the headline and article now for the other direction, the day some cloud provider implodes: "Company Widgetcorp declared bankruptcy today. Its tragic fall from stalwart Russell 2000 midcap manufacturer to receivership happened unexpectedly: its cloud-based XXX provider shut off servers without warning less than 60 days ago, and Widgetcorp was never able to recover critical processes and data."
Perhaps the "use the cloud as your DR target" model is a good one after all! I think a good w
The bandwidth of a fully laden alimentary canal. (Score:2)
Is around a gigabyte per second.
(100 packs of 16*64GB microSDs, in appropriate packaging, swallowed at intervals over the course of a day)
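Checking the arithmetic, here is a sketch that assumes decimal gigabytes and counts the whole payload as delivered after one day:

```python
packs = 100
cards_per_pack = 16
card_gb = 64

total_gb = packs * cards_per_pack * card_gb   # total payload in GB
rate_gb_s = total_gb / (24 * 3600)            # spread over one day
print(f"{total_gb:,} GB per day = {rate_gb_s:.2f} GB/s")
```

So "around a gigabyte per second" holds up, with roughly a day of latency.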
Re:The bandwidth of a fully laden alimentary canal (Score:4, Funny)
African or European?
Perhaps the question is ... (Score:2)
The 'Cloud' option needs to be part of your system design in the first place, so that you accumulate all that data in The Cloud from the word go.
Canada (Score:4, Interesting)
AWS Import/Export service... just ship the disks. (Score:3)
http://aws.amazon.com/importexport/ [amazon.com]
http://awsimportexport.s3.amazonaws.com/aws-import-export-calculator.html [amazonaws.com]
It's not rocket science. Yes, shipping drives is the cheapest, fastest option for a lot of people.
YMMV, speaking for myself, not my employer, etc. etc.
-Isaac
MegaUploads: The other unspoken hurdle (Score:3)
Namely, the increased risk that your data will become collateral damage in the War On Piracy.
Re:The real hurdle (Score:5, Insightful)
Getting around all the buzzwords
Well that's one hurdle.
The next is RECOVERY, when ICE or the FBI or some other three-letter agency walks in and takes your data because one tiny customer used the service for some allegedly nefarious purpose.
The key here is to use a service so big that even god himself would not dare take it down, although the Ayatollah might try. Small cloud services, even if multi-homed are a risky proposition. Even if you do manage to get all your data into them, they are not large enough to push back against any subpoena or search warrant that any misguided judge in some backwater jurisdiction may issue.
Re: (Score:2)
This, right here.
Unless that cloud service also comes with a guarantee* that the physical disks they park your data on are separate and distinct from anyone else's (and that no two customers share the same disks), you're just as borked as the perp when someone shows up at the colo with a warrant.
* Good luck with that. It would either cost you a mint, or you'd at best get your own LUN, which means approximately bupkis.
Re:The real hurdle (Score:5, Informative)
That is just one of many of the hurdles.
Really, these problems are problems because most 'cloud' shit is done wrong.
It's a bit of a worn out record here on Slashdot, but anyone or any company which is fully dependent upon The Cloud for business continuity is a fool.
* First off, there is no such thing as 'utility computing', and probably never will be due to the volatile nature of storage and its ongoing cost of maintenance.
* Second, if you do not maintain primary physical control of something, to the best of your ability, you do not control it.
* For primary IT infrastructure, it will cost more to do "Cloud" than local. If you can afford 2-3 servers a year, but not much more, and a nominal IT operations budget, chances are you should have an in-house "cloud" with off-site replication.
* Bandwidth costs both ways will kill you, and in many cases latency will kill Cloud functionality.
At this point, I still strongly recommend against public Clouding your systems unless they are:
a) (very!) low volume with use-based billing. This only makes sense for a low-volume public-facing site where you don't already have IT infrastructure (on a cost basis)
b) off-site 'hot' replication. You've got your inside 'private Cloud' which replicates to off-site systems. (Cloud is basically just colocated virtualization, after all.)
c) Other geographic/distribution requirements (eg. multisite organization with none serving as a good central hub). In this case, colocation of your own equipment makes more sense in many regards.
unless you host the olympics (Score:2)
Yeah, peak load for 20 days / 2 years.
Yeah, buy 20,000 HP servers with 48 cores each.
Or prebuy 80,000 VMs on EC2 for 20 days.
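The argument is about utilization. Below is a sketch of the break-even point, where the $10,000 per-server price is a purely hypothetical placeholder (no real HP or EC2 pricing is implied) and owned servers are assumed to sit idle, with no resale value, outside the peak.

```python
PEAK_DAYS = 20
CYCLE_DAYS = 2 * 365

servers = 20_000
vms = 80_000
server_price = 10_000.0        # hypothetical purchase price per server, USD

utilization = PEAK_DAYS / CYCLE_DAYS          # fraction of the cycle at peak load
own_cost = servers * server_price             # buy for the peak, idle otherwise
vm_hours = vms * PEAK_DAYS * 24               # rented capacity actually used
break_even = own_cost / vm_hours              # VM-hour price at which both cost the same

print(f"utilization {utilization:.1%}; renting wins below ${break_even:.2f} per VM-hour")
```

Power, cooling, and staff for the idle hardware are left out; they would only push the break-even further in renting's favor.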
Re: (Score:2)
OK, so you came up with the 1% scenario where Cloud is the best option. Yes, there are other similar corner cases (IPO, product launch, etc.). Arguably, this falls under "c", or maybe as a "d": "short-term mass distribution".
Re: (Score:2)
You might get one hell of a download rate from torrents, but you don't get a better upload rate.
Re: (Score:2)
I'll send you a growl notification when it finishes...
Re: (Score:2, Funny)
> You might get one hell of a download rate from
> torrents, but you don't get a better upload rate.
You, sir, are unfamiliar with how things work here, in Soviet Russia!
Re: (Score:2)
I do know at least how things work on /.
Natalie_Portman_and_hot_grits.torrent
Re: (Score:2)
If you want to transfer a petabit of data by torrent, you're still going to need a petabit's worth of data allowance from your ISP to get the data into the interwebs in the first place. Torrents aren't a magic way of bypassing the fact that data needs to be moved from one place (your computer) to another (someone else's computer). They just save you from transferring the same data over and over again.
Re: (Score:2)
Please don't post a serious response to my humorous twaddle.
You don't want to appear like a sufferer from ASCIIbergers syndrome...
Re: (Score:2)
There must be a lot of us, judging by your painful 0 (Overrated) score. Next time, more humour and less twaddle, my good man!
Re: (Score:3, Informative)
I have never liked the station wagon analogy, because it misunderstands the thing we are trying to measure. In the example, we measure the bandwidth of the station wagon. But that's like measuring the bandwidth of a packet -- a nonsense concept. We measure the bandwidth of the channel, not the chunks of data which fly through it. To really get the right analogy, we should talk about the bandwidth of a freeway, not the station wagon which drives upon the freeway.
Bandwidth in the colloquial sense means "the amount of data which passes a given point per second." So, imagine that you can load 25 TB in the form of tapes into a station wagon. For safety, these station wagons must drive 75 meters apart at a speed of 100 kilometers per hour. That means one station wagon passes a given point every 2.7 seconds. That's 9.2 TB per second. Adding a second lane to the highway would double the bandwidth.
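The freeway calculation in the paragraph above, as a small sketch using its own figures (25 TB per wagon, 75 m spacing, 100 km/h):

```python
def freeway_bandwidth_tb_s(tb_per_wagon: float, spacing_m: float,
                           speed_kmh: float, lanes: int = 1) -> float:
    """TB/s passing a fixed point: wagons per second per lane times payload."""
    speed_m_s = speed_kmh * 1000 / 3600
    seconds_between_wagons = spacing_m / speed_m_s
    return lanes * tb_per_wagon / seconds_between_wagons


one_lane = freeway_bandwidth_tb_s(25, 75, 100)
two_lanes = freeway_bandwidth_tb_s(25, 75, 100, lanes=2)
print(f"{one_lane:.2f} TB/s on one lane, {two_lanes:.2f} TB/s on two")
```

This gives about 9.26 TB/s, in line with the 9.2 TB/s in the post, and the distance between endpoints never enters the formula.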
The stupid calculation which is often performed, on the other hand, goes like this. You have 25 TB in the wagon, and you drive it to a location 10 hours away... Already you've gone off the tracks, because you are mentioning the TIME it takes to get to the destination, i.e. the LATENCY. And as anybody knows, the latency (or equivalently the distance between the points) has NOTHING to do with bandwidth.
How can you say Time has nothing to do with bandwidth when, in your own example, you measured it in TB per SECOND?
Following your example again of 9.2TB/sec, that can be changed to 9.2TB * 60 /min, or 9.2TB * 60 * 60 /hour, or 9.2TB * 60 * 60 * 10 / 10 hours, which is the exact measurement that you seem to have a problem with earlier in your post (data in a 10 hour period).
Re:Bandwidth of a station wagon (Score:5, Interesting)
Indeed. He also ignored the core reason for having said bandwidth - you have X amount of data to move in Y time (at under Z cost); what's the best way to do so?
As such, a 'packet' on the freeway system is rather expensive, so you don't want to put multiple station wagons on the system if you don't have to. Figure the driver costs $20/hour and the vehicle itself $0.50/mile (gas, maintenance, insurance, tolls, etc.), and you're looking at 300 miles in 10 hours. That comes to $350 for a single 'packet'. If a single station wagon doesn't do it, perhaps a cargo van would, which doubles the capacity of the packet while raising the cost by only $50, to $400. Still not good enough? Upgrade to a 'package van' like UPS/FedEx trucks. The next step would be a semi.
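The per-trip cost, sketched with the post's own rates ($20/hour for the driver, $0.50/mile for the vehicle):

```python
def trip_cost(hours: float, miles: float,
              driver_per_hour: float = 20.0, vehicle_per_mile: float = 0.50) -> float:
    """Cost of one delivery run: driver time plus per-mile vehicle costs."""
    return hours * driver_per_hour + miles * vehicle_per_mile


print(trip_cost(10, 300))   # one station-wagon "packet": 200 + 150
```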
In any case, I'd say you could fit 25 TB onto a motorcycle today: 3 TB drives are fairly common now, and I can easily fit 10 into my saddlebags. Heck, I can get 1.5 TB native tapes [wikipedia.org], about the same size as a hard drive. Padding its dimensions up, it's 11 x 11 x 3 cm = 363 cm^3, or 2,755 per cubic meter.
A 2008-11 Dodge Grand Caravan Cargo van [allpar.com] - 143.8 cubic feet = 4.07 cubic meters, giving me room for 11k 1.5TB tapes. 16.5k TB, in 10 hours, if I have a single cargo van. Ouch. Disregarding media cost, that's ~$400.
If you do this daily, you're looking at 1.5 terabits per second. I don't know of any connections that fast.
Monthly, we're down to ~50 gigabits per second (rounding down). I can guarantee that a 50 gigabit connection will cost more than $400.
Annually, it's 'only' 4 gigabits per second, and I pay more than $100/month for my megabit-class connection, which ISN'T utilized 100%, unlike in my calculation.
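The cargo-van numbers above can be reproduced with a short sketch. It uses the post's padded tape size (11 x 11 x 3 cm), 1.5 TB native capacity, and 143.8 cubic feet of cargo space; it assumes decimal units and perfect packing, and treats each load as one delivery per period.

```python
CUBIC_FOOT_M3 = 0.0283168          # 1 cubic foot in cubic meters
TAPE_TB = 1.5                      # native capacity per tape

tape_cm3 = 11 * 11 * 3                          # padded tape volume
tapes_per_m3 = 1e6 / tape_cm3                   # ~2,755
van_m3 = 143.8 * CUBIC_FOOT_M3                  # ~4.07 m^3
tapes_fit = int(van_m3 * tapes_per_m3)          # ideal packing
payload_bits = 11_000 * TAPE_TB * 1e12 * 8      # the post rounds down to 11k tapes


def equivalent_bit_s(bits: float, days: float) -> float:
    """Average line rate needed to move `bits` in `days`."""
    return bits / (days * 86_400)


for label, days in (("daily", 1), ("monthly", 30), ("annually", 365)):
    print(f"{label:>8}: {equivalent_bit_s(payload_bits, days) / 1e9:8.1f} Gbit/s")
print(f"tapes that fit: {tapes_fit:,}")
```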
You don't normally need to figure out the bandwidth of the freeway because:
1. Generally 1 vehicle 'packet' is sufficient, and due to the high marginal cost per said vehicle, you normally only want to send one.
2. The roads are used for more than data shipment, which would be like trying to figure out how much bandwidth you have available for VOIP by looking at total circuit bandwidth.
Don't need to ship that much? You should be able to ship about 30 of them for $60, second-day air. That's 45 TB, or about 140 Mbit/s of 100% saturated traffic for a month. BTW, the FedEx numbers suggest that weight might actually be enough of an issue to increase gasoline consumption, but I think I've established that even $800 would be cheap if you need to ship that ridiculous an amount of data.
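The shipping figure checks out too. Here is a sketch assuming 30 tapes at 1.5 TB each, decimal units, and a 30-day month:

```python
tapes = 30
tape_tb = 1.5
month_seconds = 30 * 86_400

payload_tb = tapes * tape_tb                              # 45 TB
rate_mbit_s = payload_tb * 1e12 * 8 / month_seconds / 1e6
print(f"{payload_tb:.0f} TB per month = {rate_mbit_s:.0f} Mbit/s sustained")
```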
Re: (Score:2)
You don't run a local copy of QuickBooks off a remote database; you run QuickBooks or another accounting solution in the cloud and pass HTML back and forth. More likely, you break the solution up a bit.