Data Storage

Seagate Joins the HDD Price Hike Party, Blames AI for Spike in Demand (theregister.com) 37

Seagate has joined Western Digital in increasing the prices of hard drives, with rising demand due to the huge data requirements of AI taking the blame. AI is also behind a rapid growth in orders for Enterprise solid state drives. From a report: One of the big three makers of traditional rotating hard disk drives, Seagate informed customers that it is increasing prices effective immediately for new orders, but also for any changes to orders that are "over and above" previously committed volumes. This was disclosed in a letter from the company seen by analyst Trendforce, and comes just a couple of weeks after rival manufacturer Western Digital sent out a similar letter to customers informing them of price hikes.

According to Trendforce, the cause of the issue is two-fold: rising demand for high-capacity HDD products driven by the current craze for all things AI, and reduced production by hard drive manufacturers that means they are unable to meet the demand, leading to soaring prices. The rising demand comes from AI training requiring huge volumes of data: OpenAI's GPT-3 model is said to have been trained using 45TB of data, which may have been surpassed for newer models. And while flash-based SSDs boast high-speed and low-latency, storing everything in flash would still be costly. Seagate launched a 30TB hard drive line last year. Hard drive production was cut by as much as 20 percent over the last two years or so because of falling orders during the pandemic, and now manufacturers are unprepared for a sudden uptick in demand.

This discussion has been archived. No new comments can be posted.

  • Flash is costly? (Score:5, Interesting)

    by tylerni7 ( 944579 ) on Thursday April 25, 2024 @08:01PM (#64425994) Homepage

    I don't think 45TB of SSD storage is really expensive when you're training with billions of dollars worth of H100 GPUs? That does not really make any sense.

    • Yeah, you're right, 45TB isn't very much nowadays and quality SSDs cost under $100/TB. A regular HDD is still a quarter the price, but the numbers still don't really add up.

      This whole narrative they're spinning sounds like BS.
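      A minimal sketch of the arithmetic behind the comment above, taking its figures at face value: $100/TB as an upper bound for SSDs and a quarter of that for HDDs. The function name and constants are illustrative, and the result covers raw media only (no redundancy, servers, or software).

```python
# Back-of-the-envelope media cost for a 45 TB dataset.
# Prices are the commenter's rough figures, not vendor quotes.
DATASET_TB = 45
SSD_USD_PER_TB = 100                   # "under $100/TB", used as an upper bound
HDD_USD_PER_TB = SSD_USD_PER_TB / 4    # "a quarter the price"


def media_cost(capacity_tb: float, usd_per_tb: float) -> float:
    """Raw drive cost for a given capacity at a flat per-TB price."""
    return capacity_tb * usd_per_tb


ssd_cost = media_cost(DATASET_TB, SSD_USD_PER_TB)
hdd_cost = media_cost(DATASET_TB, HDD_USD_PER_TB)
print(f"SSD: ${ssd_cost:,.0f}  HDD: ${hdd_cost:,.0f}  difference: ${ssd_cost - hdd_cost:,.0f}")
```

      Under these assumptions the gap is a few thousand dollars, which is consistent with the point that a single 45 TB dataset cannot explain the price hike on its own.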

      • by rayzat ( 733303 )
        You're not factoring in software. You need centralized storage, which requires software that costs a few multiples of the HW price, and companies charge wildly different prices for SSD vs. NL-SAS. The claim about being unprepared is just straight BS. They've been abundantly clear that they're building prices to levels they feel are long-term profitable, which I mostly understand, as they've been bleeding cash for a while now.
    • by Luckyo ( 1726890 )

      I suspect they got the wrong prefix, and it's actually petabytes.

      • by unrtst ( 777550 )

        I suspect they got the wrong prefix, and it's actually petabytes.

        Nope. 45TB training data: https://www.springboard.com/bl... [springboard.com]

        I do wonder what gets stored as a result of the training. That 45TB is just data fed in to train the model. I doubt that (mostly) static dataset is the cause for this spike in demand.

        • by Luckyo ( 1726890 )

          That makes sense. It's talking about just the chatbot, which was trained on plain text.

          Image and video generators are the ones with petabyte grade training sets.

    • As with everything, it's cost/benefit. How much is the extra speed of SSDs worth to the AI companies, vs. the cheaper price of HDDs? Even if SSDs are relatively cheap, when you buy tens of thousands of these things, the dollars add up.

      • by ls671 ( 1122017 )

        Also, once you have saturated the available IO bandwidth, it doesn't make any difference whether storage is backed by multiple HDDs or SSDs. With enough drives, the storage itself isn't the bottleneck anymore; the link to the storage/IO controller becomes the bottleneck.
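        A rough sketch of the saturation point described above. The link speed and per-drive sequential rates are hypothetical round numbers chosen for illustration, not measurements, and `drives_to_saturate` is an illustrative helper name.

```python
def drives_to_saturate(link_gbit_s: int, drive_mb_s: int) -> int:
    """Smallest number of drives whose combined sequential throughput fills the link."""
    link_bytes_s = link_gbit_s * 10**9 // 8
    drive_bytes_s = drive_mb_s * 10**6
    return -(-link_bytes_s // drive_bytes_s)  # ceiling division on integers


LINK_GBIT_S = 100   # assumed link speed
HDD_MB_S = 250      # assumed per-drive sequential rate (hypothetical)
SSD_MB_S = 3000     # assumed per-drive sequential rate (hypothetical)

print(drives_to_saturate(LINK_GBIT_S, HDD_MB_S))  # 50
print(drives_to_saturate(LINK_GBIT_S, SSD_MB_S))  # 5
```

        Past that drive count, adding faster media does not raise throughput on that link, which is the commenter's point. Random-access workloads would change the per-drive figures considerably.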

        • That's when one scales out and goes to load balancing hosts.

          For example, at one place I worked (which was bought out), backups were handled by eight servers, each of which had a good amount of RAM and 10+ drives. The machines used redundant 100gigE connections for the storage VLAN, and each had a separate 100gigE connection going to a load balancer.

          All the client needed to do was go to an S3 URL provided by the load balancer, and let rip. This provided petabytes of storage with common ser

    • I don't think 45TB of SSD storage is really expensive when you're training with billions of dollars worth of H100 GPUs? That does not really make any sense.

      Demand is what’s costly. It enables Greed to charge what they want.

      Now stop pretending that those with a billion dollars or more of GPUs are the only customers affected by that. Duh.

      • by Anonymous Coward
        2TB of SSD storage is roughly $100. 45TB of SSD storage, even with redundancy (let's say RAID-1 for max money-waste), shouldn't be a blip compared to the total number of drives sold. I'd blame raw 8K video recordings for increased demand before I'd blame AI datasets.
        • And before AI, it was cryptocurrencies like Chia which supposedly caused price hikes.

          45 TB of storage is nothing. At the low end, buy six 22 TB drives, stick them in a NAS, use RAID-Z2 or RAID-6, call it done. It won't be fast, but a Raspberry Pi with an eight drive USB enclosure can do this.

          SSD-wise, that is definitely more expensive, but even for 45 TB of SSD, Micron 9400 30 TB PCIe SSDs at $5000 a pop can cover it. Use four of those in RAID 10, and that solves that.
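          A quick check of the two layouts above, as a simplified sketch: it ignores filesystem overhead, hot spares, and the TB/TiB difference, and `usable_tb` is an illustrative helper rather than any real tool's API.

```python
def usable_tb(drives: int, drive_tb: float, level: str) -> float:
    """Approximate usable capacity, ignoring filesystem and formatting overhead."""
    if level in ("raid6", "raidz2"):
        # Double parity: two drives' worth of space goes to redundancy.
        if drives < 3:
            raise ValueError("double parity needs more than two drives")
        return (drives - 2) * drive_tb
    if level == "raid10":
        # Striped mirrors: half of the raw capacity is usable.
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least four")
        return drives / 2 * drive_tb
    raise ValueError(f"unknown level: {level}")


DATASET_TB = 45
hdd_array = usable_tb(6, 22, "raid6")    # six 22 TB hard drives
ssd_array = usable_tb(4, 30, "raid10")   # four 30 TB SSDs
print(hdd_array, ssd_array)              # 88 60.0
```

          Both layouts clear 45 TB with room to spare, at about $20,000 of SSDs for the second one using the quoted $5000 price.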

          IMHO, AI is about comput

    • Re:Flash is costly? (Score:5, Informative)

      by Rei ( 128717 ) on Friday April 26, 2024 @06:16AM (#64426690) Homepage

      Creating the training dataset is the *last* step. I have dozens of TB of raw data which I use to create training datasets that are only a few GB in size. Of which I'll have a large number sitting around at any point in time.

      Take a translation task. I start with several hundred gigs of raw data. This inflates to a couple terabytes after I preprocess it into indexed matching pair datasets (for example, if you have an article that's published in N different languages, it becomes N * (N-1) language pairs - so, say, UN, World Bank, EU, etc. multilingual document sets greatly inflate). I may have a couple different versions of this preprocessed data sitting around at any point in time. But once I have my indexed matching pair datasets, I'll weighted-sample only a relatively small subset of it - stressing higher-quality data over lower quality and trying to ensure a desired mix of languages.
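      To illustrate the pair inflation: a document available in N languages yields N * (N-1) ordered (source, target) pairs. This is only a sketch of the counting, not the commenter's actual pipeline; the six codes below are the UN's official languages, and `ordered_pairs` is an illustrative name.

```python
from itertools import permutations


def ordered_pairs(languages):
    """All (source, target) pairs with source != target."""
    return list(permutations(languages, 2))


un_languages = ["ar", "en", "es", "fr", "ru", "zh"]
pairs = ordered_pairs(un_languages)

print(len(pairs))   # 30, i.e. 6 * 5
print(pairs[:3])    # [('ar', 'en'), ('ar', 'es'), ('ar', 'fr')]
```

      Every additional language adds 2N new pairs, so a six-language document set stored pair by pair holds 30 aligned copies of roughly the same content.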

      But what I do is nothing compared to what these companies do. They're working with Common Crawl. It grows at a rate of 200-300 TB per month. But the vast majority of that isn't going to go into their dataset. It's going to be markup. Inapplicable file types. Duplicates. Junk. On and on. You have to whittle it down to the things that are actually relevant. And in your various processing stages you'll have significant duplication. Indeed, even the raw training files... I don't know about them, but I'm used to working with jsons, and that adds overhead on its own. Then during training there's various duplications created for the various processing stages - tokenization, patching with flash attention, and whatnot.

      You also use a lot of disk space for your models. It's not just every version of the foundation you train (and your backups thereof) - and remember that enterprise models are hundreds of billions to trillions of FP16 parameters in their raw states - but especially the finetune. You can make a finetune in like a day or so; these can really add up.
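      A rough sizing sketch for the point above. FP16 stores each parameter in 2 bytes; the parameter counts and the number of retained copies below are hypothetical examples, and optimizer state or other checkpoint metadata is not counted.

```python
BYTES_PER_FP16 = 2


def checkpoint_tb(params: float, copies: int = 1) -> float:
    """Disk space in TB (10**12 bytes) for raw FP16 weights, kept `copies` times."""
    return params * BYTES_PER_FP16 * copies / 1e12


# Hypothetical model sizes spanning "hundreds of billions to trillions".
for params in (200e9, 1e12):
    one = checkpoint_tb(params)
    twenty = checkpoint_tb(params, copies=20)   # e.g. versions, backups, finetunes
    print(f"{params:.0e} params: {one:.1f} TB each, {twenty:.0f} TB for 20 copies")
```

      Under these assumptions a trillion-parameter model needs about 2 TB per saved copy, so a few dozen versions and finetunes already rival the size of the 45 TB text dataset discussed in the story.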

      Certainly disk space isn't as big of a cost as your GPUs and power. But it is a meaningful cost. As a hobbyist I use a RAID of 6 20TB drives and one of 2 4TB SSDs. But that's peanuts compared to what people working with Common Crawl and having hundreds of employees each working on their own training projects will be eating up in an enterprise environment.

    • by gweihir ( 88907 )

      It is a lie. Seagate just wants to increase profits.

    • I don't think 45TB of SSD storage is really expensive when you're training with billions of dollars worth of H100 GPUs? That does not really make any sense.

      I interpret that as not saying that 45TB is the only amount that is ever needed for AI. I read that 45TB was previously needed for one instance of OpenAI's GPT-3 to train. Other AI models may need more and there may be multiple instances of an AI model. Certainly there will be newer versions of AI that may require more data.

  • Right (Score:5, Insightful)

    by ArchieBunker ( 132337 ) on Thursday April 25, 2024 @08:01PM (#64425996)

    Hard drive production was cut by as much as 20 percent over the last two years or so because of falling orders during the pandemic, and now manufacturers are unprepared for a sudden uptick in demand.

    More like prices were simply too low and Seagate had to make a correction.

    • by Anonymous Coward
      CEO needed a few more yachts and mansions. Those things are expensive.
    • Re:Right (Score:4, Insightful)

      by quonset ( 4839537 ) on Thursday April 25, 2024 @08:15PM (#64426020)

      It's also a bullshit comment. Covid was March of 2020. After two years things started to get back to normal. However, during covid there was a massive influx of orders for machines. That took over a year to sort out.

      I have no idea where they get the idea there were falling orders during the pandemic. Maybe initially, for a few months, but after that companies couldn't produce equipment fast enough.

      There's always an excuse.

      • by crow ( 16139 )

        You may be right, but COVID had huge orders to support work-from-home, remote education, and the like. Those would mostly be systems with SSDs or smaller HDDs. The larger HDDs are probably a different segment of the market.

        But in general, whether you're talking DRAM or HDDs, there's a long history of prices going up with demand spikes, followed by prices plummeting with a glut of oversupply. The producers know this and try to adjust production to avoid the oversupply, but it's a fundamentally difficult p

        • by Luckyo ( 1726890 )

          There was a massive increase in demand for cloud services as well, because of all of the "masses working from home" novelty. So there would be a lot of demand for enterprise as well.

          I suspect this is more a case of it being a long-lead product. Or the story is just generally inaccurate on everything, since it also seems to imply that the entire ChatGPT training data set could fit on two hard drives, and yet that is somehow causing a shortage of hard drives.

      • by Luthair ( 847766 )
        I imagine after the first 18 months there would have been a massive drop-off, as everyone who needed a computer had a relatively new one.
      • by edwdig ( 47888 )

        At the start of COVID, you were still looking at >$200 for a 1 TB SSD. 1 TB was over $400 a year before that.

        During the pandemic years was when NVMe prices fell from crazy high to affordable for everyone. That's when cheap laptops switched from hard drives to SSDs. PlayStation and Xbox switched to SSDs then too.

        He's probably telling the truth. It's not directly pandemic related, but there was a big drop in demand for traditional hard drives then.

    • by AmiMoJo ( 196126 )

      More like profits were simply too low for next quarter's bonus and Seagate had to make a correction.

      FTFY.

    • by gweihir ( 88907 )

      "Too low" to satisfy corporate greed, that is. Still entirely profitable, but what nice asshole CEO would not use a chance to rip off his customers?

  • There is increased demand so they cut production.

    That can't be the story.

    Perhaps they're offline to switch to HAMR or something.

  • Ever since Hitachi sold their HDD division, I've noticed HDD prices stop dropping and remain stagnant at best. I paid $50 for a 4TB external drive a *long* time ago, and haven't seen that beat for some time. I did get a 16 TB external for $200 or so, but that was parity at best. I've been heavily using Samsung SSDs, but having a large drive is still beneficial for longer-term storage.
