Cloud Data Storage IT

Dropbox Moves Users' Data Off Amazon S3 to Its Own Infrastructure

Reader Richard_at_work writes: Dropbox today announced that it has been working on a "top secret" project called Magic Pocket for the past two and a half years to get data of more than 500 million users from Amazon S3 to its own custom-built infrastructure. The company says that it has migrated over 90% of its users' data so far. Dropbox's relationship with AWS isn't completely over, however, as they will continue to use AWS for specific regional data stores where there is a requirement.

  • by vux984 ( 928602 ) on Monday March 14, 2016 @02:08PM (#51695161)

    I'd say it's no surprise to see them vertically integrate; they're large enough to leverage the economies of scale of running their own storage rather than paying someone else to do it.

  • by Anonymous Coward

    Does this mean that I'll finally be able to download my files as a .zip archive? I have some directories in Dropbox with a lot of files in them, and I get some bullshit message about the folder being too large to download, or something like that, when I try to use the functionality that exports the directory as a .zip archive. It's not even that much data. Maybe like 5 GB in total. But I always get that fucking message, and it never lets me download these directories as an archive. I even bought the pro subscr

    • by Anonymous Coward on Monday March 14, 2016 @02:22PM (#51695235)

      Unless they get new software that supports the ZIP64 extension [wikipedia.org], probably not.
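
For reference, the classic ZIP limits that ZIP64 lifts can be checked in a few lines of Python. This is only a rough sketch, not anything Dropbox is known to run: `needs_zip64` and `build_archive` are hypothetical helpers, and the check looks at uncompressed totals only, ignoring header overhead and compression.

```python
import io
import zipfile

# Classic ZIP stores sizes and offsets in 32-bit fields and the entry
# count in a 16-bit field; ZIP64 widens these to 64 bits.
MAX_CLASSIC_BYTES = 2**32 - 1    # just under 4 GiB
MAX_CLASSIC_ENTRIES = 2**16 - 1  # 65,535 entries


def needs_zip64(total_bytes, file_count):
    """Hypothetical helper: does an archive of this shape exceed classic ZIP limits?"""
    return total_bytes > MAX_CLASSIC_BYTES or file_count > MAX_CLASSIC_ENTRIES


def build_archive(files):
    """Build a ZIP in memory; allowZip64=True lets zipfile use ZIP64 when required."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", allowZip64=True) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

By this check, the roughly 5 GB folder described in the parent comment would need ZIP64, while a 1 GB folder with a few hundred files would not.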

    • That's likely a limitation of the .zip format and/or how they generate them for export. It's hard to generate .zip data on-the-fly because the file has headers at the end as well as the beginning, so they are probably generating an actual file and storing it before sending. They don't want to store .zip files for large folders for long enough for you to download. Not to mention the CRC32 checksums required for the format are still computationally intensive at a large scale.

      I'm sure it's possible to gener

      • by glwtta ( 532858 )
        A modern CRC32 implementation on a modern CPU can reach a throughput of 10GB/s (https://blog.fastmail.com/2015/12/03/the-search-for-a-faster-crc32/), so I really doubt that's much of a bottleneck.
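
As an illustration of why the checksum need not force buffering, here is a minimal sketch of a running CRC32 using Python's standard `zlib.crc32`. The helper name `crc32_of_chunks` is made up for this example, and the sketch says nothing about throughput.

```python
import zlib


def crc32_of_chunks(chunks):
    """Fold a CRC32 over an iterable of byte chunks without joining them."""
    crc = 0
    for chunk in chunks:
        # The second argument is the running value from the previous chunk.
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```

Feeding the data in 4 KB pieces gives the same value as checksumming it in one call, so the checksum can be accumulated while the bytes are being sent.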
      • by chrish ( 4714 )

        Former Info-ZIP maintainer here!

        Classic zip files are limited to ~2GB (or ~4GB if your implementation is using 32-bit unsigned int). I can't remember 100%, but it might also be limited to ~65,000 files per archive.

        If Dropbox attempts to build the entire .zip in memory before sending it over the wire, you could be hitting RAM limitations on the server. If Dropbox builds the .zip on disk before sending it, you might be running into a connection timeout.

        A tarball would be a better solution in this use case
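
A rough sketch of the tarball route, using Python's standard `tarfile` in stream mode. The helper name `write_tar_stream` is an assumption for illustration, and file contents are passed as in-memory bytes only to keep the example short. Note that tar's header checksum covers only the 512-byte header, not the file contents, though the header does need the file size up front.

```python
import io
import tarfile


def write_tar_stream(sink, files):
    """Write `files` (name -> bytes) to `sink` as an uncompressed tar stream.

    Mode "w|" makes tarfile treat `sink` as a forward-only stream, so it
    only calls sink.write() and never seeks back.
    """
    with tarfile.open(fileobj=sink, mode="w|") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # tar records the size in the header
            tar.addfile(info, io.BytesIO(data))
```

Here `sink` can be any object with a `write()` method, such as a socket wrapper or an `io.BytesIO` for testing.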

        • If Dropbox attempts to build the entire .zip in memory before sending it over the wire, you could be hitting RAM limitations on the server. If Dropbox builds the .zip on disk before sending it, you might be running into a connection timeout.

          You don't have to hold it all in memory. You can leave HTTP byte-range requests disabled and generate it on the fly, flushing it out of memory as it goes out over the wire. Zip and tar are the same in that you have to read in a whole file to calculate the checksum for each file's header before sending the file. This isn't as resource-friendly as a running checksum that can be computed and stuck in a header at the end of a file's data since you'd have to read every file twice if you don't want higher memory re

          • And yes, that would mean Dropbox would probably have to write their own implementation of ZIP. Or at least re-arrange an existing library so that its output is not a single, monolithic file.
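
For what it's worth, generating a ZIP on the fly can be sketched with Python's standard `zipfile`, which accepts a non-seekable output and then writes each file's CRC and sizes after its data. This is an assumption-laden illustration, not Dropbox's implementation: `ChunkSink` and `stream_zip` are hypothetical names, and each member still passes through memory whole here even though the archive as a whole does not.

```python
import io
import zipfile


class ChunkSink:
    """Write-only, non-seekable sink; a stand-in for an HTTP response body."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))  # a real server would send this now
        return len(data)

    def flush(self):
        pass


def stream_zip(sink, files):
    """Write `files` (name -> bytes) to `sink` without ever seeking back."""
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
```

Because the sink has no `seek()` or `tell()`, the archive is emitted strictly front to back, with the central directory written last.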

    • by Richard_at_work ( 517087 ) on Monday March 14, 2016 @03:20PM (#51695517)

      Isn't that the whole goddamn point of the cloud? I can just use my goddamn web browser to interact with it, instead of a custom native app?!

      Uh, no. The cloud is whatever the people running the cloud want it to be, you just want it to be something different - there are no rules regarding what the cloud must do.

      At the end of the day, Dropbox is a syncing platform - that "goddamn desktop client" is the entire thing Dropbox is built around. If you wanted a different feature set, you chose the wrong product to use - there's no shame in admitting that, just don't blame the tool.

      Dropbox has issues creating zip files for huge data sets because it doesn't want to commit a massive amount of resources to building that zip file; it's as simple as that. If that's the way you are using Dropbox, then you are using it wrongly and not as it's intended to be used.

    • Does this mean that I'll finally be able to download my files as a .zip archive? I have some directories in Dropbox with a lot of files in them, and I get some bullshit message about the folder being too large to download, or something like that, when I try to use the functionality that exports the directory as a .zip archive. It's not even that much data. Maybe like 5 GB in total. But I always get that fucking message, and it never lets me download these directories as an archive. I even bought the pro subscription, and it still won't let me easily download an archive of my directories! I don't want to install the goddamn desktop client just to copy a few directories of files from Dropbox! Isn't that the whole goddamn point of the cloud? I can just use my goddamn web browser to interact with it, instead of a custom native app?! Holy fuck, all I want to do is download an archive of a directory in Dropbox. Why the fuck do they make it impossible to do that easily?! Does this move to their own infrastructure finally make it possible for them to let me download my directories as .zip archives?!

      Standard Zip compression/file format limits archive sizes to 4GB.

  • Why use someone else's cloud when you could make your own?
    • by Overzeetop ( 214511 ) on Monday March 14, 2016 @02:26PM (#51695259) Journal

      Because your own cloud server requires maintenance, and when your cloud server goes down you're SOL until you, personally, have the time to troubleshoot and fix it.

      How do I know this? My server developed a tic in its network card, corrupting about 1 bit in every 5,000,000,000 or so. Took me a year to find that I actually had a problem with the server, and then two weeks to narrow down what the problem actually was. As a side effect I also found that I had a dodgy drive cable (one of 6 in the system) which showed no outward sign of problems because CRCs were correcting those bit problems.

      Could this happen to a cloud service? Sure. Are they likely to catch it? Faster than I am, in all likelihood. Will it take them less time to correct it? You're damn sure it will. And for the cost of the time I spent troubleshooting my server, I could have paid for a decade of service from two cloud services so that I had 100% redundancy, and still had money to go buy a kegerator so I could drink beer instead of chasing bit problems.

      • by hawguy ( 1600213 ) on Monday March 14, 2016 @02:58PM (#51695415)

        Because your own cloud server requires maintenance, and when your cloud server goes down you're SOL until you, personally, have the time to troubleshoot and fix it.

        How do I know this? My server developed a tic in its network card, corrupting about 1 bit in every 5,000,000,000 or so. Took me a year to find that I actually had a problem with the server, and then two weeks to narrow down what the problem actually was. As a side effect I also found that I had a dodgy drive cable (one of 6 in the system) which showed no outward sign of problems because CRCs were correcting those bit problems.

        Could this happen to a cloud service? Sure. Are they likely to catch it? Faster than I am, in all likelihood. Will it take them less time to correct it? You're damn sure it will. And for the cost of the time I spent troubleshooting my server, I could have paid for a decade of service from two cloud services so that I had 100% redundancy, and still had money to go buy a kegerator so I could drink beer instead of chasing bit problems.

        Don't count on it being any easier to troubleshoot rare network glitches with a cloud provider. Admittedly most of the time you can just launch a new instance and the problem goes away, but not always.

        The first thing they'll do is close your ticket with "can not reproduce", so it'll be up to you to provide a test case to reproduce the problem. Which may not be trivial since you have limited visibility into their systems. And you have to convince them that it's not a security group problem, and not a local configuration problem (like iptables). And even then they may dismiss your ticket because you're not running their officially supported kernel version, so you'll have to fight with them to accept that it is a real problem, or capitulate and try to repro on their supported software version.

        It took me 6 months to convince AWS support that there was a rare bug in network setup (not all subnets were reachable) that only hit once every 500 - 1000 instance launches. They finally admitted that it was some sort of rare convergence problem in their network stack and that they are now monitoring for such problems so it won't recur.

        At least when you own the hardware, you have full visibility into the entire stack, and while you can still have different teams pointing the finger at each other, they all work for the same company so management can step in and tell them to stop pointing fingers and work together to find the solution.

    • Why use someone else's cloud when you could make your own?

      Have you ever tried to roll your own industrial strength, production quality cloud infrastructure? That shit gets expensive, and it requires you to do significant investment up front. At this point we are dealing with issues of capital flow, acquisition (or rental) of equipment, depreciation of said equipment, etc.

      Renting cloud services, on the other hand, changes the equation. What used to be a capital expense becomes an operational expense. It might sound more expensive down the line, but it stil

  • by OverlordQ ( 264228 ) on Monday March 14, 2016 @02:18PM (#51695209) Journal

    So they basically re-did everything that Backblaze did for its storage pods.

    • by hawguy ( 1600213 )

      So they basically re-did everything that Backblaze did for its storage pods.

      When you're big enough, reinventing the wheel is worth it because then the wheel is customized for your use case. Spending a million dollars to engineer a custom solution is worth it to eke out a few percent better performance when you're deploying $25M+ worth of hardware.
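
The break-even point implied by those figures is easy to work out. This is a back-of-the-envelope sketch that assumes a performance gain translates directly into hardware you no longer have to buy; the function name `break_even_gain` is made up for this example, and the dollar amounts are the ones quoted above.

```python
def break_even_gain(engineering_cost, hardware_spend):
    """Fractional efficiency gain at which custom engineering pays for itself."""
    return engineering_cost / hardware_spend


gain = break_even_gain(1_000_000, 25_000_000)
print(f"Break-even improvement: {gain:.1%}")  # prints "Break-even improvement: 4.0%"
```

So under that assumption, "a few percent" on $25M of hardware is about where a $1M engineering effort starts to pay off, and any larger deployment lowers the bar further.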

      There's a reason why Boeing doesn't just use off the shelf automobile wheels on their aircraft even though it would save them tens or hundreds of thousands of dollars per aircraft, even though an automotive wheel is proven technology that

      • "even though an automotive wheel is proven technology that does pretty much the same thing as an aircraft wheel."

        Well, if you mean they are both round and made of rubber, then yes. An aircraft wheel has to go from 0 to whatever the landing speed of the aircraft is in just a second or so. They also need to handle fairly extreme temperature ranges, i.e., 150 deg F tarmac on a hot day to freezing or sub-zero temps at altitude, and back again. And they need to be able to do both of those, and more, repeatedly
        • by hawguy ( 1600213 )

          "even though an automotive wheel is proven technology that does pretty much the same thing as an aircraft wheel."

          Well, if you mean they are both round and made of rubber, then yes. An aircraft wheel has to go from 0 to whatever the landing speed of the aircraft is in just a second or so. They also need to handle fairly extreme temperature ranges, i.e., 150 deg F tarmac on a hot day to freezing or sub-zero temps at altitude, and back again. And they need to be able to do both of those, and more, repeatedly and with as low weight as practical and safe. I doubt most automotive tires would hold up well under the same conditions.

          See, that's my point exactly. A wheel is a wheel as long as you just want something to roll smoothly across pavement, but when you start looking at application constraints, not every wheel is suitable.

          If Dropbox just wants to store any old data, they can use Backblaze's design because data is data. But if they want it to work optimally with their application, then it's worth coming up with a custom design.

      • There's a reason why Boeing doesn't just use off the shelf automobile wheels on their aircraft even though it would save them tens or hundreds of thousands of dollars per aircraft, even though an automotive wheel is proven technology that does pretty much the same thing as an aircraft wheel.

        Other than the fact the load on each tire on an aircraft is orders of magnitude higher than that of a car, and it has to maintain compliance at -50 deg C. Oh, and handle going from ~0 RPM to ~1000 RPM in the span of a second or two, whilst being heavily compressed. But other than that, they really do the same thing.

    • by swb ( 14022 )

      "Hi, Backblaze? This is DropBox calling. We're sick of sucking on Amazon's tit and wanted to do our own storage. Mind sending us all your details so we can do it just like you did? I'm pretty sure we couldn't do it better than you do, and boy do we love reading your hard disk diagnostic reports -- we're already hitting all the Best Buys we can find for disks."

      Even if Backblaze has "opened" their storage system so anyone can copy it, who says it's the optimal way to do anything? I'm guessing at Dropbox

  • Calm down. Just because Dropbox is big enough to "build their own cloud" doesn't mean it's right for everyone. There are always exceptions to the rule. Google, Facebook, Dropbox etc. are different. Your startup still needs the public cloud, be it AWS, Azure, or Google. When you get big enough, do what you want.
  • More storage with their free tier! Seriously, guys, 2GB doesn't cut it anymore. Your competitors like Google Drive and Microsoft OneDrive are offering five times more storage for their free tier customers.
