Cloud Data Storage NASA The Almighty Buck

NASA To Launch 247 Petabytes of Data Into AWS, But Forgot About Egress Costs Before Lift-Off

NASA needs 215 more petabytes of storage by the year 2025, and expects Amazon Web Services to provide the bulk of that capacity. However, the space agency didn't realize this would cost it plenty in cloud egress charges. As in, it will have to pay as scientists download its data. The Register reports: The data in question will come from NASA's Earth Science Data and Information System (ESDIS) program, which collects information from the many missions that observe our planet. NASA makes those readings available through the Earth Observing System Data and Information System (EOSDIS). To store all the data and run EOSDIS, NASA operates a dozen Distributed Active Archive Centers (DAACs) that provide pleasing redundancy. But NASA is tired of managing all that infrastructure, so in 2019, it picked AWS to host it all, and started migrating its records to the Amazon cloud as part of a project dubbed Earthdata Cloud. The first cut-over from on-premises storage to the cloud was planned for Q1 2020, with more to follow. The agency expects to transfer data off-premises for years to come.

NASA also knows that a torrent of petabytes is on the way. Some 15 imminent missions, such as the NASA-ISRO Synthetic Aperture Radar (NISAR) and the Surface Water and Ocean Topography (SWOT) satellites, are predicted to deliver more than 100 terabytes a day of data. We mention SWOT and NISAR because they'll be the first missions to dump data directly into Earthdata Cloud. The agency therefore projects that by 2025 it will have 247 petabytes to handle, rather more than the 32 it currently wrangles. NASA thinks this is all a great idea. And it will -- if NASA can afford to operate it. And that's a live question because a March audit report [PDF] from NASA's Inspector General noticed EOSDIS hadn't properly modeled what data egress charges would do to its cloudy plan.
NASA "has not yet determined which data sets will transition to Earthdata Cloud nor has it developed cost models based on operational experience and metrics for usage and egress," the Inspector General's Office wrote. "As a result, current cost projections may be lower than what will actually be necessary to cover future expenses and cloud adoption may become more expensive and difficult to manage."

"Collectively, this presents potential risks that scientific data may become less available to end users if NASA imposes limitations on the amount of data egress for cost control reasons."
  • The Cloud industry (Score:5, Insightful)

    by OrangeTide ( 124937 ) on Friday March 20, 2020 @02:04AM (#59852138) Homepage Journal

    It's all about the billing.
    You get people coming and going.
    You bill them every month they leave data sitting around idle.

    If you think Amazon can magically build this cheaper than you can build it yourself, no way. The advantage is you can get started now as well as avoid a huge initial capital investment. But when you pay as you go, you'll pay dearly.

    • by Mostly a lurker ( 634878 ) on Friday March 20, 2020 @05:07AM (#59852378)

      Building your own can save money. However, you had better have people involved who are used to dealing with data warehouses of this size. It is not trivial. Performance, security and (especially) reliability are all major challenges. This needs to be spread across several locations with fast data links between them. Consideration must be given to how to efficiently replicate across sites, protecting against data loss and allowing optimum servicing of large access requests. To really save money, a hierarchical storage solution will be needed, with offline storage of infrequently needed data. With this kind of scale, there is no substitute for experience.

      As against this, it may allow you to build in features that are not available with a generic cloud storage solution.

      • Well, I gather that something like NASA can easily attract such profiles...
      • by Junta ( 36770 )

        If you have data warehouses of this size, you had better have people involved with experience, period. In this context as well as many other cloud contexts, the skillset required is still going to be needed on the client end.

        In some SaaS contexts the hosting service does reduce the client-side skill needed (though I feel like this is mostly a failing of software development in giving up on making their software easy to deploy and has gotten worse as of late).

        When it comes to the geographically distributed i

      • This is it right here.

        Contracting out can be done for a variety of reasons. Specialization is one of the better ones. Most organizations have no concept of handling such large amounts of data.

        There's nothing 'wrong' with paying people to do things. I don't change my own oil in my car. I don't install my own windows in my house. I've contracted out for those. Even though, I'm pretty sure I could figure it all out. My employer contracts out its physical security. They contract out their building management...

        T

        • by 2TecTom ( 311314 )

          This is it right here.

          Contracting out can be done for a variety of reasons. Specialization is one of the better ones. Most organizations have no concept of handling such large amounts of data.

          There's nothing 'wrong' with paying people to do things. I don't change my own oil in my car. I don't install my own windows in my house. ....

          Sure, but let's not forget that there's no reason that this couldn't be done less expensively by making this a scientific community effort. There need to be public-domain open archives with open community oversight. Academic institutions and public institutions need to develop public grass-roots distributed infrastructure or the economic elites will continue to control the very tech we've come to depend upon.

      • by sjames ( 1099 )

        NASA has been warehousing their own data for years. It's just that for some reason this year, management forgot that rental tends to cost more than owning over the long term.

      • They already have the people to do this as they have been hosting large amounts of data already. From the summary "NASA operates a dozen Distributed Active Archive Centers (DAACs) that provide pleasing redundancy." They are currently managing 32 petabytes but someone decided that it would be a good idea to move it to AWS without reading all of the fine print. Oops.

        It was a matter of increasing their storage capability and possibly adding networking bandwidth between facilities.. They already had the all of

  • by RotateLeftByte ( 797477 ) on Friday March 20, 2020 @02:24AM (#59852154)

    they have you by the short and curlies. Easy to upload. Costs several arms and legs to download.
    For the likes of Amazon and Microsoft, this makes it really, really expensive for companies to jump ship to another provider.
    All the time, your base costs will rise and rise and rise.

    Perhaps it is time for those bean counters everywhere to think again. You can't just change a Cloud provider like you can a cleaning or transport company. Perhaps it is time for on premises cloud to be re-evaluated?

    In this time of uncertainty, I don't think that it will be too long before a company that is 100% based in AWS goes bust and is potentially bought out by one that is 100% in Azure only to find that the costs of moving the data from one to the other exceeds the net worth of the company that went broke.
    But if you are Microsoft or Amazon or any other decent cloud provider, you are sitting pretty right now. Your revenue stream is pretty solid unless thousands of your customers go broke...

    • by dwywit ( 1109409 ) on Friday March 20, 2020 @02:28AM (#59852162)

      "unless thousands of your customers go broke..."

      Hang on there. It would take an international crisis....

      Oh wait.

    • Cloud is MORE expensive. Cloud means your data is on another person's computer. Don't blame the beancounters, blame the top five highest paid executives, and ensure everyone involved loses their bonuses until cost neutral at best. First it was the Iraqi war - where no one got fired for nepotism and personal gains - and where real or perceived conflict of interest never mattered. Nowadays, it seems like no-one gets into trouble for defective recurrent costs. Not like the good old days when/if Walmart manage
    • by 0100010001010011 ( 652467 ) on Friday March 20, 2020 @09:14AM (#59852872)

      "Amazon deletes 20 years of research because the auto-pay creditcard was declined".

    • I don't understand how NASA could not fully understand AWS charging scheme before using it.

      I'm just a casual home user, and I recently migrated my offsite backups (a couple of TB) from a hard drive in my bank's safe deposit box to S3. Their new "Glacier deep archive" is ridiculously cheap, so it only costs me a fraction of my old safe deposit box fee per year.

      However, even I know that to actually read out and download that data, it would cost me about as much as buying a whole new hard drive of that capaci

      • I don't understand how NASA could not fully understand AWS charging scheme before using it.

        Makes perfect sense to me. You are spending your own money; They are spending somebody else's.

  • by 93 Escort Wagon ( 326346 ) on Friday March 20, 2020 @02:57AM (#59852198)

    This doesn’t seem like a situation where “the cloud” is going to save you money over running your own infrastructure. But I’m sure the middle managers didn’t seriously listen to what the sysadmins had to say on the subject, assuming they ever talked to them at all.

    • by Pentium100 ( 1240090 ) on Friday March 20, 2020 @03:04AM (#59852202)

      Yeah, IMO if you have enough data to fill a datacenter, it would be cheaper to have your own than pay someone else to rent space.

      Cloud works for small things - putting 100GB in the "cloud" is cheaper than buying my own server because the cloud provider can split one server between multiple customers. On the other hand, if I need an entire server, I might as well buy one and only pay for colocation. If I need a datacenter, well, it would be better to have my own.

      • Re: (Score:2, Informative)

        by bobbied ( 2522392 )

        I see your point, but I think that if you need but one server, you are better off in the cloud. It's when you get to a couple of racks full that the costs might turn in your favor. But even then, I doubt it will save you all that much.

        Why?

        There is more to running a server than the cost of the hardware and a place to put it.

        Cloud providers can be lower cost because they provide software licenses, networking support, hardware maintenance and multi-site redundancy that is pretty darned expensive to do when

        • Buying two servers and renting colocation in two datacenters would most likely be cheaper over time than renting the same amount of power in the cloud, since renting means I buy the server for the provider and also pay for profit.

          Of course, if the traffic is very spiky and the servers stay idle most of the time, then yeah, probably renting would be cheaper.

          In this case, however, the 250PB of data is going to be stored on lots of hard drives that will not be storing data for other customers, meaning that NAS

      • Hire someone like Backblaze to build you out on-premises if you don't think your IT can handle it.

    • This doesn’t seem like a situation where “the cloud” is going to save you money over running your own infrastructure. But I’m sure the middle managers didn’t seriously listen to what the sysadmins had to say on the subject, assuming they ever talked to them at all.

      I guarantee this is what happened. The exact same scenario played out where I used to work, and indeed we were all going, "What about the bandwidth?" and they just sort of ignored us.

      • by Kjella ( 173770 )

        I guarantee this is what happened. The exact same scenario played out where I used to work, and indeed we were all going, "What about the bandwidth?" and they just sort of ignored us.

        Asking the in-house sysadmins what they think about moving to the cloud is a bit like asking in-house developers what they think about outsourcing or what the support staff thinks about a chat bot AI: whatever legitimate concerns they have are going to get mingled with very self-serving arguments. If you work in IT you've probably worked on automating processes, and there are always people who claim to be essential and that workflows need the human touch when in reality they don't or the problems it causes can

        • by dwywit ( 1109409 )

          It's not the actual cost to move the data (in terms of electricity, floor space rental, hardware leasing, etc), it's what the provider can charge before the customer blinks. It's got nothing to do with faster lines or moving the processing load around, or cache..

          AKA charge what the market will bear. Those folk at Amazon, Azure, et al aren't stupid. They know where to apply charges to maximise revenue. It's not in the storage, it's in the move-data-set-A-to-location-Z where they can make some $$$

          • The bigger problem is nobody seems to know what the actual cost is, regardless of the "calculators" and "estimators" cloud vendors provide. The pricing models are so convoluted and fine-grained that you literally don't know what you might spend.

            • The admins that were maintaining the in-house or colo'd systems will have a reasonably good idea what it costs, and will also know that Amazon pays a lot less given the bulk discounts on hardware, bandwidth, etc. that they almost certainly get.

        • Asking the in-house sysadmins what they think about moving to the cloud is a bit like asking in-house developers what they think about outsourcing or what the support staff thinks about a chat bot AI: whatever legitimate concerns they have are going to get mingled with very self-serving arguments.

          You would take what they say with a grain of salt, and you don't need to give them a say in the end decision; but any objections they raise (which I guarantee would include noting the costs of actually getting at the data, plus a knowledge of just how much data needs to be stored) would educate a smart middle manager regarding what questions they need to ask the cloud provider. If you can't actually get valid answers to the objections the sysadmins raise, then the objections are legitimate rather than just

        • by sjames ( 1099 )

          Of course, asking the in-house sysadmins is also like asking a neurosurgeon if Bubba the auto mechanic would be a good choice to remove your brain tumor. He's pretty sure the saw he bought at Home Depot last year can get through your skull OK and he says he can do it for a hundred bux.

    • This probably wasn't driven by middle managers as much as upper management who loved the sales pitch and have told them to implement this new cloud thing they have been hearing about.

    • Rule 1: Never get between a manager and their bonus.
      Beware folks missing "Cloud Migration" on their resumes
  • Resemblance (Score:4, Interesting)

    by hcs_$reboot ( 1536101 ) on Friday March 20, 2020 @03:07AM (#59852204)
    Am I the only one having the impression that the current NASA and Boeing look alike, more and more? Many things and people have changed since Voyager 2; water under the bridge...
    • Re:Resemblance (Score:5, Insightful)

      by twocows ( 1216842 ) on Friday March 20, 2020 @07:33AM (#59852616)
      I think there are still plenty of incredibly smart people working at both. The problem seems to be that the leadership has changed and that those incredibly smart people are no longer running those organizations.
      • Leadership is the most important element in a team. Remember when Microsoft was hiring the best young talents in the 90s... (hint: while the leadership was not so talented).
  • Look at Cern (Score:5, Informative)

    by flux ( 5274 ) on Friday March 20, 2020 @03:13AM (#59852210) Homepage

    Maybe they should have asked Cern how they managed their 200PB storage surface two+ years ago: https://indico.cern.ch/event/6... [indico.cern.ch]

    I can't imagine it being more expensive than Amazon. They realize Amazon is in this business for profit, right? I think nowadays you can fit that much data in a couple racks; make that 6 racks in different data centers for redundancy with, e.g., Ceph.

    • by dwywit ( 1109409 )

      Once again, the managers listen to the beancounters instead of the scientists.

      • "listen to the beancounters" Not very good bean counters!
        Oh right they work for the government!

        Just my 2 cents ;)
      • by sjames ( 1099 )

        Any half decent bean counter could have told them that rental only makes sense for short term temporary needs. Beyond that it gets more expensive than owning, fast.

    • by Teun ( 17872 )
      But but, it has to be a local company where the pork barrel is placed.
      Just imagine the kick backs you as a politician would miss out on with this in-house crap, additionally they use unacceptable commie open source software...

      America First!
    • by jabuzz ( 182671 )

      Well if you go with a GPFS storage appliance (either the IBM ESS or the Lenovo DSS offering; basically the first is PowerPC, the second is x86) then you can get ~5.5PB in a rack, so you are looking at somewhere around 50 racks in total. That's with 16TB spinning disks. I am not sure 16TB disks are an option yet, but I am just scaling up from the system at work.

      From an administration perspective it's basically a doddle, *if* you are experienced in GPFS. Put another way 50 racks of ESS/DSS is not that much dif
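
The rack estimate in this comment can be checked with a few lines of arithmetic. This is only a sketch: the 247 PB total comes from the summary, the ~5.5 PB per rack comes from the comment, and the `copies` parameter (for replicas at other sites) is a hypothetical knob that nobody in the thread specified.

```python
import math

ARCHIVE_PB = 247     # projected 2025 archive size, from the summary
PB_PER_RACK = 5.5    # density quoted in the comment (16TB disks), not a vendor spec


def racks_needed(archive_pb: float, pb_per_rack: float, copies: int = 1) -> int:
    """Whole racks needed to hold `copies` full copies of the archive."""
    return math.ceil(archive_pb * copies / pb_per_rack)


if __name__ == "__main__":
    for copies in (1, 2):
        print(f"{copies} copy/copies: {racks_needed(ARCHIVE_PB, PB_PER_RACK, copies)} racks")
```

A single copy comes out a little under the "around 50 racks" quoted above; any redundancy across sites multiplies the count accordingly.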

  • by Way Smarter Than You ( 6157664 ) on Friday March 20, 2020 @03:21AM (#59852222)
    It isn't that Amazon doesn't provide the cost data before you buy in.

    The problem is that if you've never used AWS before, you don't even know what questions to ask, and the billing costs are described across endless pages in different places. AWS has effectively obfuscated the real costs, which is only a step away from making them secret.

    And don't even get me started on that piece of shit hair pulling cost explorer tool they give you. It is just good enough that it feels like all the information you'd need is there... until you need it.

    AWS of course won't help you calculate your real costs up front before you put your foot in the fire.

    Feel the burn!
    • by mobby_6kl ( 668092 ) on Friday March 20, 2020 @05:24AM (#59852420)

      Glad it's not just me. I only use S3 for backup and some small personal projects but the bills are just incomprehensible. Usually they're small enough for me not to care but trying to track down a sudden increase in fees is a nightmare.

      Still, NASA should've done their research. If they can send a rover to Mars, they should be able to send an MBA into the Cloud.

      • by nazzdeq ( 654790 )
        S3 is cheap. We have just over 50 TB in S3 and pay only about $900 a month. We automatically transfer to Standard IA after 30 days. S3 is the cheapest big service on AWS.
      • Glad it's not just me. I only use S3 for backup and some small personal projects but the bills are just incomprehensible.

        If that's all you're using it for, you may be better off with one of two other options:

        Backblaze B2 has an API, and is $5/TB/month. They're also pretty good about prorating the billing; you pretty much pay for the days stuff is on the server. It's fast, they don't charge for either uploads or downloads, and it's got all the expected security things.

        The other option I've used with good success is Wasabi. At $4.90/TB/month, they're a hair cheaper than Backblaze, but the bigger draw is that they use the same A
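
For a rough side-by-side of the storage prices mentioned in this subthread, the sketch below converts the quoted S3 bill into a per-TB rate and prices the same volume at the two flat rates. All inputs are figures quoted by commenters, not official price lists; the S3 number is a blended bill (it includes the Standard IA tiering mentioned above and "just over" 50 TB), and none of this covers egress.

```python
# Figures quoted in the comments above; treat them as assumptions.
S3_MONTHLY_BILL_USD = 900.0   # "about $900 a month"
S3_STORED_TB = 50.0           # "just over 50tb", rounded down
B2_USD_PER_TB = 5.00          # Backblaze B2 rate quoted above
WASABI_USD_PER_TB = 4.90      # Wasabi rate quoted above


def per_tb_rate(monthly_bill_usd: float, stored_tb: float) -> float:
    """Blended monthly storage cost per TB implied by a bill."""
    return monthly_bill_usd / stored_tb


def monthly_cost(usd_per_tb: float, stored_tb: float) -> float:
    """Monthly storage cost at a flat per-TB rate."""
    return usd_per_tb * stored_tb


if __name__ == "__main__":
    print(f"S3 (implied): ${per_tb_rate(S3_MONTHLY_BILL_USD, S3_STORED_TB):.2f}/TB/month")
    print(f"B2 at 50 TB: ${monthly_cost(B2_USD_PER_TB, S3_STORED_TB):.2f}/month")
    print(f"Wasabi at 50 TB: ${monthly_cost(WASABI_USD_PER_TB, S3_STORED_TB):.2f}/month")
```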

  • Forgive me, as I'm not insane enough to ever use a "cloud", but... *EGRESS CHARGES*??

    How is that not the first thing that would stop anyone from ever using it, with those criminal hostage-taking shakedown rules??

    And how would that work anyway?
    What's the difference between it serving a website containing the data, and me downloading the data via whatever this egress pathway is?

    This is nuts. End-stage crapitalism. Shoot the horse, it's got pustules coming out of the tentacles coming out of its neck!

    • by Anonymous Coward

      It's not different from serving a website on AWS, they charge egress for that too.

    • Forgive me, as I'm not insane enough to ever use a "cloud", but... *EGRESS CHARGES*??

      Is that where they shunt your data out of a side door and you have to pay to go back in through the front entrance again?

      • Forgive me, as I'm not insane enough to ever use a "cloud", but... *EGRESS CHARGES*??

        Is that where they shunt your data out of a side door and you have to pay to go back in through the front entrance again?

        Of course not.. Your data comes back in another side door.. You don't want to clog up the front door with this kind of thing and slow down the entry of new business..

      • by ceoyoyo ( 59147 )

        I think it's traditionally an upper storey window, but essentially yes.

    • Exactly what provider do YOU use that doesn't charge for bandwidth?

    • by sjames ( 1099 )

      That's the thing with the cloud. You get nickel-and-dimed to death. It costs you to upload the data, it costs you to keep the data there, it costs to get the data back. It costs to process the data in place.

  • NASA pays to upload bulk data to AWS. Now they have to pay for each and every data access!! Just too funny that no one thought about the access charges. Government dweebs spending other people's money.

    Just my 2 cents ;)
    • It's not like AWS is going to warn you about the access charges...

      Have you ever tried to make sense of how AWS charges for the services they offer? I dare you, go look at their published costs for all the services you can use and try to figure out what it will cost to do something. Unless you are familiar with everything they are talking about and exactly what your hosted application does with Amazon's services down to the number of bytes of data, at rest, in motion or being transferred here to there by

  • I mean that is something you find out if you invest 15 min to read the list of costs associated with AWS....

    • Sure, but people are really bad at estimating how much traffic their site will generate.
    • by MikeKD ( 549924 )

      I mean that is something you find out if you invest 15 min to read the list of costs associated with AWS....

      Also, if you spend a few generations denigrating government workers, don't be surprised when people who have options decide to work someplace where they won't get shit upon by 30-40% of the population and one of the major political parties.

  • The government purchasing agents failed to properly calculate the cost of using the solution.
    Oops Oh well they are the government,
    "With government, The real surprise is when they do something right!"

    Just my 2 cents ;)
  • by tronicum ( 617382 ) * on Friday March 20, 2020 @04:21AM (#59852324)
    Backblaze (https://www.backblaze.com/b2/cloud-storage.html) is way cheaper than AWS. Just use their calculator and see.
    • by Anonymous Coward
      At the very least NASA could hire them to design and build the storage and to train staff. If NASA's not competent to do this then fine, job out build+training, but not data possession or on-going operations. This is taxpayer $ they beg for; if they can't manage it as well as AWS then flush out the upper layers of the hierarchy. Fear for their pensions will tighten up their act. It's not rocket science. The data is more valuable than the spacecraft which collected it. Just not as expensive.
  • If NASA has bandwidth to spare, perhaps they could host their own massive caching Squid proxy that uses their AWS data store as a back end. The bigger the cache, the lower the egress charges. Then at least the data storage NASA is responsible for isn't critical - dead drives can just be pulled without regard for data loss.

  • Silly NASA (Score:5, Insightful)

    by buss_error ( 142273 ) on Friday March 20, 2020 @05:43AM (#59852450) Homepage Journal

    Er... NASA helped develop Openstack. They should just put up a Cloud Files SWIFT cluster.

  • No one ever pays list cost for AWS services. Particularly egress charges. And if you were silly enough to accept the charges, buy a direct connect and egress it yourself. That will save you millions.
  • Before making decisions on clouding, you need to speak with someone other than a salesperson from Amazon or a reseller.

    It's good to have a "cloud practitioner [amazon.com]" on staff to run your cloud ideas by. This concept of no-cost upload, but high cost retrieval is well known.

    However, if your agency heads pushing this contract are doing so for their personal enrichment or their colleagues' personal enrichment, then this doesn't really matter.

  • If you look at NASA today, most of what I see are young people taking selfies in front of control stations and people goofing off while riding on the historic legacy of what NASA used to be. While I'm sure some engineers are left, they are hidden away and the business majors have taken over
  • You can check in anytime you like, but you can never leave (unless you pay us a lot of money).

    WHA HA HA HA!!!!

    Glacial Storage vs. a Tape Robot?

    If they were put on LTO-8 15/30TB tapes it would take between 800 and 1600 tapes to hold all of this data.

    Roughly $16,000 in tapes + The storage library robots

    Whatever happened to magneto-optical anyway?
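
The tape count in this comment is easy to recompute. The sketch below assumes the 247 PB total from the summary, decimal units (1 PB = 1000 TB), and the 15 TB and 30 TB per-tape capacities exactly as quoted in the comment. With those inputs the division gives roughly 8,200 to 16,500 tapes for a single copy, about ten times the 800 to 1600 quoted above, so the tape cost estimate would scale up by the same factor.

```python
import math

ARCHIVE_PB = 247    # projected 2025 archive size, from the summary
TB_PER_PB = 1000    # decimal units, an assumption


def tapes_needed(archive_pb: float, tape_tb: float) -> int:
    """Whole tapes needed to hold one copy of the archive."""
    return math.ceil(archive_pb * TB_PER_PB / tape_tb)


if __name__ == "__main__":
    # Per-tape capacities as quoted in the comment, not checked against a spec sheet.
    for tape_tb in (15, 30):
        print(f"{tape_tb} TB per tape: {tapes_needed(ARCHIVE_PB, tape_tb):,} tapes")
```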

    • by ceoyoyo ( 59147 )

      I once worked with some people who got a big grant that involved collecting some data and analyzing it. One of them walks into my office and says "we're going to be getting X number of datasets and putting them in a database. What kind of server do we need?"

      So I pointed to the five year old computer on my desk and said "this machine holds about ten times as much data as you're talking about. So you need any old server, plus a backup strategy."

      So they bought a quarter million dollar server and storage array.

    • by Wolfrider ( 856 )

      --You forget: that's just for the FIRST copy...

  • People at NASA can't be this dumb. Worst case solution: charge people to download the data at cost. Yes, that requires a whole login + charging infrastructure.

  • ..how many satellites forgot to pay their ISP bill.

  • ...wouldn't consider that maybe storing data in a cloud had costs at some point?

  • Exactly what would be expected.

    Capitalism - how to get to pay blood in order to eat shit, with an additional post-dated charge if you ever fail to confirm how beautiful the experience of eating shit is.

    Thatcher (hawk, spit) taught us well. Including that dandified grandson of a whore-monger currently in the White House and planning his 3rd term.
