Data Storage HP Hardware Technology

HPE Unveils The Machine, a Single-Memory Computer Capable of Addressing 160 Terabytes (venturebeat.com) 150

An anonymous reader quotes a report from VentureBeat: Hewlett Packard Enterprise announced what it is calling a big breakthrough -- creating a prototype of a computer with a single bank of memory that can process enormous amounts of information. The computer, known as The Machine, is a custom-built device made for the era of big data. HPE said it has created the world's largest single-memory computer. The R&D program is the largest in the history of HPE, the former enterprise division of HP that split apart from the consumer-focused division. If the project works, it could be transformative for society. But it is no small effort, as it could require a whole new kind of software. The prototype unveiled today contains 160 terabytes (TB) of memory, capable of simultaneously working with the data held in every book in the Library of Congress five times over -- or approximately 160 million books. It has never been possible to hold and manipulate whole data sets of this size in a single-memory system, and this is just a glimpse of the immense potential of Memory-Driven Computing, HPE said. Based on the current prototype, HPE expects the architecture could easily scale to an exabyte-scale single-memory system and, beyond that, to a nearly limitless pool of memory -- 4,096 yottabytes. For context, that is 250,000 times the entire digital universe today.
This discussion has been archived. No new comments can be posted.

  • by UnknownSoldier ( 67820 ) on Tuesday May 16, 2017 @06:32PM (#54430255)

    > it could require a whole new kind of software.

    Huh? You mean it's not a von Neumann or Harvard architecture? Because the article doesn't lead me to _that_ conclusion:

    The new prototype has 160 TB of shared memory spread across 40 physical nodes, interconnected using a high-performance fabric protocol. It has an optimized Linux-based operating system (OS) running on ThunderX2, Cavium's flagship second generation dual socket capable ARMv8-A workload optimized System on a Chip.

    So basically 4 TB / node. Does each node have independent memory or not?
     

    • by imgod2u ( 812837 )

      I would wager to guess that each node lives in some subregion of the memory address. And that each OS instance (or one giant distributed OS) accesses all addresses uniformly.

      It's certainly not infeasible even without memristor tech. But I wonder what benefits it has. The whole point of having localized nodes is to take advantage of the travel latency. Unless this is optimized specifically for embarrassingly parallel data feed-forward tasks, which even modern GPU workloads aren't anymore.
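
The "each node lives in some subregion of the memory address" guess can be sketched as a simple address decode. This is only an illustration: the contiguous 4 TB-per-node layout, the decimal TB, and the `locate` helper are all assumptions, not anything HPE has described.

```python
TB = 10**12  # decimal terabyte; the article does not say TB vs TiB

NODES = 40            # node count quoted from the article
NODE_BYTES = 4 * TB   # 160 TB / 40 nodes, assuming an even split


def locate(addr: int) -> tuple[int, int]:
    """Map a global byte address to (node index, offset within that node).

    Hypothetical layout: node 0 owns the first 4 TB, node 1 the next, etc.
    """
    if not 0 <= addr < NODES * NODE_BYTES:
        raise ValueError("address outside the 160 TB pool")
    return divmod(addr, NODE_BYTES)


print(locate(0))           # (0, 0)
print(locate(5 * TB + 7))  # (1, 1000000000007)
```

Under this layout every node can name every byte, but only 1/40th of the addresses are local, which is exactly where the latency question comes from.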

      • by dbIII ( 701233 )

        But I wonder what benefits it has

        Being able to do an operation on an entire huge dataset in memory instead of a pile of fetching and carrying to do it on disk.
        Since the alternative is an order of magnitude (or several) slower, a bit of latency isn't a terrible price to pay.

        • by skids ( 119237 )

          The critical number missing in TFA is the memory access speeds at various tiers of the NUMA.

          Take a 4GHz computer. How far can a memory access go in one cycle given the speed of light? The answer is "not even to the other side of a 19 inch server rack. Not even halfway across a laptop." You can fetch cache lines in bulk, sure, but at some point this fact will intrude into your code, demanding you keep local registers local and tightly coupled calculations on physically close nodes... we can't tell how d
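
For anyone who wants to check the speed-of-light claim, here is a rough sketch. It uses the vacuum speed of light, so it is an upper bound; real signals in copper or fiber are slower. The `reach_cm` helper is just a name made up for this example.

```python
C = 299_792_458  # speed of light in vacuum, m/s


def reach_cm(clock_hz: float, round_trip: bool = False) -> float:
    """Distance light covers in one clock cycle, in centimetres.

    With round_trip=True the distance is halved, since a memory access
    has to get there and come back.
    """
    cm = C / clock_hz * 100
    return cm / 2 if round_trip else cm


print(f"{reach_cm(4e9):.2f} cm one way")               # 7.49 cm one way
print(f"{reach_cm(4e9, True):.2f} cm there and back")  # 3.75 cm there and back
```

A 19 inch rack is about 48 cm wide, so even the one-way figure falls well short of crossing it in a single cycle.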

          • Why didn't you bother reading the second line of my post before spending so much time writing what you did? What you wrote is all true but kind of irrelevant without a massive leap in technology.
            Multiple nodes is certainly not as fast as having it on one board, but try reading that second line to find out why it's still useful.
            • by skids ( 119237 )

              I did read your second sentence. It seemed a pretty throwaway aside, given this is supposedly more than just a big fast disk.

              • by dbIII ( 701233 )
                Damn - so you didn't understand and I have to explain it to you.

                Access to memory on a remote machine is a great deal faster than access to disk when network speed is not the limiting factor.

                I thought it was kind of obvious to anyone who would want to comment on this article but it appears I was wrong.
      • AI using multidimensional data sets. I work with cubes in the tens of terabytes that could be sped up thousands of times if they could be held in memory.

        • AI using multidimensional data sets. I work with cubes in the tens of terabytes that could be sped up thousands of times if they could be held in memory.

          Indeed. I wonder how useful it would be for someone like the NSA or NRO for analyzing large datasets in near-realtime like, for instance, all the cellphone communications "metadata" (and contents?) in an area and cross check it against other datasets to destroy privacy, reveal networks of association of political/ideological opponents, etc etc? "Predict" crime a la 'Minority Report'?

          Seems like just the kind of cutting edge mass-data analysis technology leaders of a surveillance state would soil themselves o

          • I wonder how useful it would be for someone like the NSA or NRO for analyzing large datasets in near-realtime like, for instance, all the cellphone communications "metadata" (and contents?) in an area and cross check it against other datasets to destroy privacy, reveal networks of association of political/ideological opponents, etc etc? "Predict" crime a la 'Minority Report'?

            Well, they did call it The Machine [wikia.com], so I assume they're trying to make it easy for the government to connect the dots on that idea.

    • Huh? You mean it's not a von Neumann or Harvard architecture? Because the article doesn't lead me to _that_ conclusion:

      I think what HP means is that you no longer have to compress/pack your database tuples into 4K-sized pages because they "just stay in memory". The same for other formerly-disk-based structures like B-trees and such. Also, changes in latencies on their own might change algorithm preferences massively.

      • by imgod2u ( 812837 ) on Tuesday May 16, 2017 @06:49PM (#54430373) Homepage

        It seems to imply more than just persistent memory, though. It sounds like they're distributing processors in the data-path of the connected memory. Instead of the OS determining which context to put on a CPU and fetching the necessary data from memory/disk, the context and code will be decided by what data resides in memory that is closest to the processor node.

        A rather natural result of persistent, high-capacity memory for non-interactive compute tasks.

        • I don't recall them announcing this the last time this concept was in the news, but if they're doing that, then yes, that's an even bigger change. (I admit I'm still sort of fond of the Connection Machine...)
      • Doubtful.

        Read-ahead protocols allow you to identify further data sets and bring them in and out of memory faster than algorithmic performance. The fastest pattern is a giant linear read, and you can issue a DMA to read in the next several hundred megabytes and expire the prior without the CPU being further involved.

        Algorithms that process more-complex data sets generally need instrumentation code to identify where the next addresses are, which can be ordered to occur before processing: instead of iden

        • Read-ahead protocols allow you to identify further data sets and bring them in and out of memory faster than algorithmic performance. The fastest pattern is a giant linear read, and you can issue a DMA to read in the next several hundred megabytes and expire the prior without the CPU being further involved.

          Yes, because it hides the fact that the smallest block you can fetch is hundreds of bytes in size at least, and possibly several kilobytes.

          Algorithms that process more-complex data sets generally need instrumentation code to identify where the next addresses are, which can be ordered to occur before processing: instead of identifying an array of 300, processing it, then reading off the next address and moving your attention there, you would identify the array of 300, skip it, read the next address, issue the read-ahead, and process. This ordering only really adds the call for read-ahead (an OS madvise() call, really) on top of all other work.

          And how does that help you with data structures in which the access sequence is data-dependent even over smaller pieces of data? Spatial trees, for example? Unless of course you're tacitly limiting yourself to all the others that aren't. And madvise, isn't that for memory-mapped files on block devices? Since I don't see how madvise could tweak CPU cache logic which is a
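
The "issue the read-ahead, then process" ordering being argued about can be sketched roughly as follows, assuming a memory-mapped file on Linux and Python 3.8+ (where `mmap.madvise` exists). The chunk size and the `checksum_with_readahead` helper are made up for illustration. Note that this only hints the kernel's page cache; it does nothing to CPU cache logic, which is the objection raised above.

```python
import mmap
import os
import tempfile

# Chunk size is a multiple of the page size so every offset passed to
# madvise stays page-aligned.
CHUNK = 64 * mmap.PAGESIZE


def checksum_with_readahead(path: str) -> int:
    """Sum every byte of a file, hinting the kernel one chunk ahead."""
    # MADV_WILLNEED is missing on platforms without madvise (e.g. Windows).
    hint = getattr(mmap, "MADV_WILLNEED", None)
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            size = len(m)
            for start in range(0, size, CHUNK):
                nxt = start + CHUNK
                if hint is not None and nxt < size:
                    # Ask for the next chunk before touching this one.
                    m.madvise(hint, nxt, min(CHUNK, size - nxt))
                total += sum(m[start:nxt])
    return total


# Demo on a throwaway 1 MiB file.
data = bytes(range(256)) * 4096
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    print(checksum_with_readahead(tmp.name) == sum(data))  # True
finally:
    os.unlink(tmp.name)
```

This works because the access pattern is linear and known in advance. For a spatial tree the next address is only known after the current node has been read, so there is nothing to hint ahead of time.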

          • And how does that help you with data structures in which the access sequence is data-dependent even over smaller pieces of data?

            Generally, if you're scattering over different row selects in RAM, you stall the CPU about 200 FSB cycles or 2,000 cycles for a 10x multiplier when you jump around in RAM. That means if the data is all in RAM to begin with and you spend 20 cycles processing, then jump to some data 40 megabytes away, you spend roughly 99.0099% of your time stalled waiting for CPU cache miss. To get around this, you'd have to use CPU prefetch instructions to load the upcoming data into L1.

            Access structures as such tend
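
The 99.0099% figure follows directly from the numbers quoted in this comment (200 FSB cycles of stall, a 10x multiplier, 20 cycles of work per jump). A minimal check, with `stall_fraction` as a name invented for this sketch:

```python
def stall_fraction(stall_cycles: int, work_cycles: int) -> float:
    """Share of total time spent waiting on memory rather than computing."""
    return stall_cycles / (stall_cycles + work_cycles)


fsb_stall = 200                       # FSB cycles per row miss, from the comment
multiplier = 10                       # CPU clock / FSB clock, from the comment
cpu_stall = fsb_stall * multiplier    # 2,000 CPU cycles

print(f"{stall_fraction(cpu_stall, 20):.4%}")  # 99.0099%
```

The stall and multiplier values are the commenter's ballpark figures, not measurements of any particular machine.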

    • ...could they get any more non-descript.

      Hey, it sure as hell worked for Pink Floyd.

    • by Anonymous Coward

      This page:
      https://news.hpe.com/memory-driven-computing-explained/
      has more helpful information about how the architecture works. It's neat.

    • Who cares what it runs, the NSA has already ordered a dozen of them.

      In unrelated news, you may want to switch to a minimum password length of 32 characters for any account you care about. Just saying...

    • The old version of that machine (more than 10 years ago) was using 384 Itaniums with 2GB of RAM per CPU and custom SGI interconnects so that the operating system saw one single memory space and all the CPUs.

      No big news here.

      It looks like HP wants to take something out of the effort that was put into the whole Itanium business, now that it is being discontinued.

      The new version of Cosmos uses x86 CPUs and GPUs as accelerators.

  • Just great. (Score:5, Funny)

    by fahrbot-bot ( 874524 ) on Tuesday May 16, 2017 @06:35PM (#54430265)

    I'll have to allocate an entire 1.6 TB drive for swap space.

  • by Anonymous Coward

    Then it's dead already. Unless it comes with some kind of magical recompiler.

    • I think AllegroCache and similar stuff already has you covered on both fronts.
    • by dargaud ( 518470 )
      Unless the performance is really massively superior. Then you'll have some libs optimized for that beast, while the rest of the program runs on a normal frontend. Similar to what we are currently doing with CUDA and such.
    • Yes. I fondly remember the Transputer. Brilliant stuff, but no one wanted to learn Occam, one of the most elegant parallel-from-the-ground-up languages I know. But they invented parallelizing compilers and libraries for that. Suboptimal, but given the raw power of this beast, I'm not sure that matters much.

  • 160 TB...

    32000 seconds or just under 9 hours at 40Gb/s assuming you have a storage array that can saturate that link.
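
The load-time estimate is easy to reproduce, assuming decimal units (160 TB = 160e12 bytes, 40 Gb/s = 40e9 bits/s) and a link that is fully saturated. The `links` parameter is an addition for this sketch, to show what several parallel connections would do to the number.

```python
def transfer_seconds(size_bytes: float, link_bits_per_s: float, links: int = 1) -> float:
    """Time to move size_bytes over `links` identical, fully saturated links."""
    return size_bytes * 8 / (link_bits_per_s * links)


secs = transfer_seconds(160e12, 40e9)
print(f"{secs:.0f} s, {secs / 3600:.2f} h")  # 32000 s, 8.89 h

# With, say, 40 links in parallel (one per node, purely hypothetical):
print(f"{transfer_seconds(160e12, 40e9, links=40):.0f} s")  # 800 s
```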

    • by dbIII ( 701233 )
      I really don't know why that got modded up.
      They call it a "fabric" because there are several network connections instead of a single choke point.
  • by gfilion ( 80497 ) on Tuesday May 16, 2017 @06:48PM (#54430369) Homepage

    160 TB of RAM ought to be enough for anybody

    • by Gabest ( 852807 )
      If not, we can still use a memory extender to free up a few TBs.
    • How so? Can you simulate the entirety of the universe using only 160TB? No? Then it isn't enough, is it? Hm!
      • but then, that memory space being part of the universe...
        • Doesn't matter how deep you go, there's always another layer to the puzzle. If we concern ourselves with such insignificant details we'll never go anywhere or do anything. Infinity is like that. At some point you just have to say enough is enough.
  • Ob (Score:5, Funny)

    by Hognoxious ( 631665 ) on Tuesday May 16, 2017 @06:51PM (#54430399) Homepage Journal

    It's almost enough to store all the data their keylogger [slashdot.org] stole.

    • by Anonymous Coward

      Ok, sure. But technically, Hewlett Packard Enterprise (HPE) doesn't make laptops. HP Inc. makes the laptops that had the keylogger. They're two different companies. Welcome to 2017.

  • 4096 yottabytes = 4.096e27 bytes; 2^n=4.096e27, solve for n ... n = 92. Now we know the market for these 128-bit processors!
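
The n = 92 result holds whether "yottabyte" is read as decimal (10^24) or binary (2^80). A small sketch, with `address_bits` as a helper name chosen here:

```python
def address_bits(n_bytes: int) -> int:
    """Smallest number of bits that can address bytes 0 .. n_bytes - 1."""
    return (n_bytes - 1).bit_length()


print(address_bits(4096 * 10**24))  # 92  (decimal yottabytes)
print(address_bits(4096 * 2**80))   # 92  (binary: 2**12 * 2**80 = 2**92)
print(address_bits(160 * 10**12))   # 48  (the 160 TB prototype)
```

Python integers are arbitrary precision, so there is no overflow to worry about at these sizes.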
  • by somenickname ( 1270442 ) on Tuesday May 16, 2017 @07:19PM (#54430543)

    It would have been a lot more interesting, and a lot more paradigm shifting, if it was 160TB of ultra-fast next-gen M.2 sticks with 0MB of traditional RAM and 0MB of traditional storage. That would be a truly unique machine to work on. If you read the article, this isn't even a single machine. It's actually 40 nodes with high speed interconnects. Basically, HP is now running Linux on their VMS clusters.

  • Track and analyze your life to the smallest fraction we will. Soon. sooooooooon. MMHEHEHEHE!

  • The article contradicts itself multiple times.

    First, the start of the article (and the summary) say it's a prototype computer with a single bank of memory. Later they report that the machine has the 160TB spread across 40 nodes. It might be logically contiguous but it's hardly a "single bank".

    Secondly, the start of the article describes the architecture as memory-centric, but HP later states: "the Machine is an attempt to build, in essence, a new kind of computer architecture that integrates processors an

  • Having huge banks of memory and passing them through a "single computer" bottleneck is a colossal waste.

    • by Megane ( 129182 )
      But think of the rainbow tables you could load into it! The NSA should be all over this.
  • by Anonymous Coward

    If it is anything like the HPs I have owned, some major part will go out in 2 to 3 years.

  • Memory integrated architectures (PRISM, MPA, etc, etc..) have long been a twinkle in our collective eye, but I doubt HPE has the critical mass to pull this one across the finish line. Gone are the days when HP Labs held any credible sway in architecture. When was the last time HP(no E) told us they knew best in things architectural? Remember the Itanic!
  • Addressing 160 TB just requires a 48-bit bus, which most recent 64-bit architectures have. So "simultaneously" is probably missing from the title.
  • You are being watched...

  • This is the same HP that hasn't come up with a hit since the bubble jet printer, people. The same HP that pushed a cloud computing solution that was so pig-fucking awful that The Onion mocked them about it. [theonion.com] I worked at HP at the time, and I really have to think that The Onion had someone on the inside...because their parody was unbelievably on target. "We have 4G, 5G, 6G...we have all the Gs. We have app." That's literally as bad as what some of the people at HP were about it...it defied belief. This

    • by MancunianMaskMan ( 701642 ) on Wednesday May 17, 2017 @02:08AM (#54432043)
      Years ago we heard HPE (or was it still HP then) talk about betting the farm on "the machine" all full of its new memristor tech, cheap, fast, persistent, practical, egg-laying woolly milk pig kind of chips.

      Now it's "DIMMs with a little battery stuck on" to handle the "persistency". Hope that's just for the demo.

      • The design should translate transparently to MRAM chips, if their engineers are competent.

        If they're really good, their architecture will also handle Intel's 3D XPoint DIMMs, too.
  • In Russia 160 Terabytes * IS * you. Yet, so true.

  • Isn't this just IBM's iSeries reborn? That was / is a 64-bit address space that addresses physical memory and disk in one single-level storage. Granted, in the real world we don't often put 160TB into a machine, and the balance may be made up of spinning disks, but as far as the software is concerned it is the same, surely?
  • by mark_reh ( 2015546 ) on Wednesday May 17, 2017 @05:53AM (#54432613) Journal

    Seriously, are we still using books as a unit of comparison? Why not say it can process 80% of the internet, etc.?

    • Re:Books? (Score:4, Informative)

      by Voyager529 ( 1363959 ) <`moc.oohay' `ta' `925regayov'> on Wednesday May 17, 2017 @08:46AM (#54433445)

      Seriously, are we still using books as a unit of comparison? Why not say it can process 80% of the internet, etc.?

      Yes, and there are two related reasons. First, the LoC is a very large amount of data. It's not the kind of data that can land on a USB stick, it's enough to actually prove something.

      Second, it's a known quantity of data. Even if it's approximate, it's a set amount of books, with a set amount of pages. Can we really count the amount of data on the internet? Let's establish a baseline - what constitutes "the internet" in terms of storage? Every website ever? What about apps and the data they create - do we include those databases because mobile apps use them? How many companies will volunteer how big those databases are? GoDaddy will probably be able to more-or-less say how much data they host, but how much of it is active data - does it have to be served up to count? Similarly, does this include Dropbox data that's technically accessible, but only to its end user? If so, what about end users who own their own Synology boxes and back up their pictures to it over the internet? Does the data on those home NAS units count? Do we limit protocols to HTTP, or are we also talking about FTP sites, NNTP servers (do we count the total amount of Usenet data, or does each company who peers that data count separately?), and data available via torrents? What about e-mail - does e-mail count if it's stored on a server and accessible via a web browser? What if it's only accessible via POP/IMAP?

      Even if *you* came up with a number that includes what you deem appropriate for '80% of the internet', it's not going to translate well. If your metric was "anything that is accessible from a computer and isn't behind a login prompt", that's going to be different than someone who says that Dropbox counts, which doesn't fit your criteria - undoubtedly petabytes of difference, making the measurement irrelevant.

  • but, if power is interrupted (because that NEVER happens, even with UPS, right?), do you have to start over from scratch, and reinstall the OS, databases, etc?
  • But can it run Crysis?

    • But can it run Crysis?

      In 1080P with all sliders set to low... After all, I didn't see a 3-way SLI GPU as part of the specs....

  • This will do more to enable true AI than all the neural networks of the last 5 years.
  • What a nightmare. Imagine how long memtest would take to run to identify just ONE goddamn bad memory stick! What are you thinking, HP?
  • it could require a whole new kind of software.

    I asked the technical lead Kirk Bresniker (chief architect at Hewlett Packard Labs) about this exact thing at the launch yesterday, and he said no, that you should be able to use conventional software (I specifically asked about Python), with the speed-up occurring under the hood.

    I am not entirely convinced that it will be that easy...

  • Didn't Crays use something like this where the memory was central to the operating structure of the computer? Can anyone enlighten me?
