Data Storage Open Source Hardware Technology

Samsung Announces Standards-Compliant Key-Value SSD Prototype (anandtech.com) 74

Samsung has announced a new prototype key-value SSD that is compatible with the first industry standard API for key-value storage devices. "Earlier this year, the Object Drives working group of Storage Networking Industry Association (SNIA) published version 1.0 of the Key Value Storage API Specification," reports AnandTech. "Samsung has added support for this new API to their ongoing key-value SSD project." From the report: Samsung has been working on key-value SSDs for quite a while, and they have been publicly developing open-source software to support KV SSDs for over a year, including the basic libraries and drivers needed to access KV SSDs as well as a sample benchmarking tool and a Ceph backend. The prototype drives they have previously discussed have been based on their PM983 datacenter NVMe drives with TLC NAND, using custom firmware to enable the key-value interface. Those drives support key lengths from 4 to 255 bytes and value lengths up to 2MB, and it is likely that Samsung's new prototype is based on the same hardware platform and retains similar size limits.

Samsung's Platform Development Kit software for key-value SSDs originally supported their own software API, but now additionally supports the vendor-neutral SNIA standard API. The prototype drives are currently available for companies that are interested in developing software to use KV SSDs. Samsung's KV SSDs probably will not move from prototype status to being mass production products until after the corresponding key-value command set extension to NVMe is finalized, so that KV SSDs can be supported without needing a custom NVMe driver. The SNIA standard API for key-value drives is a high-level transport-agnostic API that can support drives using NVMe, SAS or SATA interfaces, but each of those protocols needs to be extended with key-value support.

This discussion has been archived. No new comments can be posted.

  • But I know how many shops write code. 2MB should be enough for anyone?

    • Not a problem! A key-value store can always be treated as a block device; so all you need to do is shove a dodgy, ad-hoc, approximation of a filesystem into your application; and then you can store values larger than would be allowed by individual key/value pairs as files on top of your key/value filesystem.
      The simplest approach is just to brute force it by deciding on a block size smaller than the limit for value sizes and then defining a bunch of keys that are LBA addresses for the 'block' stored in the
      • And you have precisely described what I expect to happen with it. Well done, sir.

      • by ron_ivi ( 607351 )
        Remember when traditional block devices all had Sectors, Clusters, Cylinders, Blocks, and Heads that mattered?

        This is just a flashback to those days.

        Just make all your keys "1", "2", "3" ... and pretend they're sector numbers and your traditional filesystems will work with them just fine.
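The "keys as sector numbers" idea from the comments above can be sketched in a few lines. This is a toy illustration only: a plain Python dict stands in for the drive, and the class name `KVBlockDevice`, its methods, and the 4096-byte default block size are all hypothetical, not part of Samsung's software or the SNIA API. Keys are zero-padded to 16 digits so they stay inside the 4 to 255 byte key range quoted in the summary.

```python
class KVBlockDevice:
    """Toy block device layered on a key-value store (a dict stands in for the drive)."""

    def __init__(self, store, block_size=4096):
        self.store = store            # any dict-like mapping of bytes -> bytes
        self.block_size = block_size  # must stay below the drive's value-size limit

    def _key(self, lba):
        # Zero-padded decimal block number, e.g. b"0000000000000042".
        return f"{lba:016d}".encode()

    def write_block(self, lba, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.store[self._key(lba)] = bytes(data)

    def read_block(self, lba):
        # Blocks that were never written read back as zeros.
        return self.store.get(self._key(lba), bytes(self.block_size))

    def write_blob(self, start_lba, blob):
        """Split a blob across consecutive blocks; return the number of blocks used."""
        count = 0
        for offset in range(0, len(blob), self.block_size):
            chunk = blob[offset:offset + self.block_size]
            self.write_block(start_lba + count, chunk.ljust(self.block_size, b"\0"))
            count += 1
        return count

    def read_blob(self, start_lba, length):
        """Read back `length` bytes starting at block `start_lba`."""
        nblocks = -(-length // self.block_size)  # ceiling division
        data = b"".join(self.read_block(start_lba + i) for i in range(nblocks))
        return data[:length]
```

Note that the caller has to remember each blob's starting block and length somewhere, which is exactly the ad-hoc filesystem bookkeeping the parent comments are joking about.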

    • I don't know if you just scanned the summary or didn't understand the article. These drives are designed to be a storage device for KV based applications, you know: hashes, data dictionaries, etc. There is a key to identify data and the data value. The 2MB is for each storage value. Instead of all the KV pairs being written by software to one or more files on a standard drive, each KV pair will be written directly to the drive. The drive becomes the KV database.

      FTFA:

      This allows a key-value drive to be used more or less as a drop-in replacement for software key-value databases like RocksDB, and as a backend for applications built atop key-value databases.

      • I totally understood it. I fully expect it to be abused as described by another poster above.

        I think there's plenty of actual use cases. Like the subject says, I want to like it.

        But then you give a mouse a cookie, and next thing you know, it's come down from on high that, "You will store 10 MB photos in those values, or else". Not in every shop, but certainly some.

        • To wit, my comments amount to, "it's a wonderful idea for the right application, but there's a fresh stream of daily WTF posts coming."

    • There seems to be some confusion about the justification for KV SSD technology.

      So here:

      https://www.snia.org/sites/def... [snia.org]

  • What is the point? This is a solution in search of a problem. The young-uns seem to like solutions to non-existent problems -- maybe it is because they are completely at a loss how to solve problems that actually exist.

    • Contents Addressable Memory
      • Re:The point? (Score:4, Insightful)

        by ShanghaiBill ( 739463 ) on Thursday September 05, 2019 @08:24PM (#59163980)

        Contents Addressable Memory

        Of course. But why implement it in a hardware API?

        Why not just use a "normal" SSD, and store the keys in an index that can be cached in RAM?

        Then when you need keys longer than 255 bytes, or values over 2MB, you can just upgrade your software.

        Commodity off-the-shelf drives will certainly be cheaper, and can work with existing backup software.
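For comparison, the software-only alternative described in this comment (values on a normal drive, keys in an index cached in RAM) might look roughly like the sketch below. It assumes an append-only log file and a Python dict as the index; the class name `LogKV` and its methods are made up for illustration, and the index is not persisted, so a real implementation would have to rebuild or store it across restarts.

```python
import os
import tempfile


class LogKV:
    """Append-only value log on an ordinary file, with the key index held in RAM."""

    def __init__(self, path):
        self.index = {}               # key -> (offset, length) of the latest value
        self.f = open(path, "a+b")    # append mode: every write lands at the end

    def put(self, key, value):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(value)
        self.f.flush()
        # Overwrites just repoint the index; old bytes stay in the log.
        self.index[key] = (offset, len(value))

    def get(self, key):
        offset, length = self.index[key]
        self.f.seek(offset)
        return self.f.read(length)

    def close(self):
        self.f.close()


if __name__ == "__main__":
    db = LogKV(os.path.join(tempfile.mkdtemp(), "values.log"))
    db.put(b"photo", b"x" * 3_000_000)   # larger than the 2MB per-value limit in the summary
    print(len(db.get(b"photo")))
    db.close()
```

Key and value sizes here are bounded only by the file system and available RAM, which is the flexibility argument made above; the sketch says nothing about which approach is faster.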

        • Did you read the article or other articles on it? They implemented it because it will provide considerably better performance for this narrow purpose. Just like you can buy a video card/GPU when "your CPU can do the same damn thing, by gummit!".
          • Did you read the article or other articles on it?

            Hah, you expected informed grumpy bitching? This is slashdot, where it's a point of pride to ramble uninformed when the answers are right in TFA. I've scanned through the comments, and already found about a half-dozen "what's the point of this?" comments.

            • It's so weird, it's like these people think everyone else is just a complete retard. Samsung has about 500 people who are just too stupid to realize that this was supposedly a bad idea and they should have just "used a normal SSD". If only this lone voice of reason had been there at Samsung HQ to convince all of their apparently very stupid engineers and executives who greenlit this project that they are wrong and can just "use a normal SSD".

              SlashDot: Unique Home to the Voice of Reason.

          • Did you read the article or other articles on it? They implemented it because it will provide considerably better performance for this narrow purpose.

            Yes, I RTFA, and it does NOT say it gives better performance. What it says is that it has "the potential to offload significant work from a server's CPUs". That is a BS justification. The server CPU is going to be FAR more powerful than an embedded CPU in the drive, and is very unlikely to be compute bound.

            Rather than spending money on some custom drive, it is almost certainly better to invest that money on a faster main CPU and more RAM for indexing.

            • the people this is targeted at already have the fastest highest core count cpu money can buy and a couple of terabytes of ram.

              still not enough.

              Obviously if all you do all day is play csgo and post on slashdot, this api is not the api you are looking for.

            • That is a BS justification. The server CPU is going to be FAR more powerful than an embedded CPU in the drive

              Will it also be far more powerful than the embedded ASIC in the drive?

            • "Yes, I RTFA, and it does NOT say it gives better performance. What it says is that it has "the potential to offload significant work from a server's CPUs"."

              Since you can't figure out that offloading significant KV searches from the server CPU to a KV coprocessor gives better performance, this would be a good place to stop posting and making yourself look even more clueless.

            • by drnb ( 2434720 )

              The server CPU is going to be FAR more powerful than an embedded CPU in the drive, and is very unlikely to be compute bound.

              And for people looking at KV SSD the CPU is also likely to be saturated with requests and lagging as a result.

              Rather than spending money on some custom drive, it is almost certainly better to invest that money on a faster main CPU and more RAM for indexing.

              The point is that the KV SSD will be less expensive at some point compared to your suggested upgrade. Perhaps even eventually allowing for savings on the base CPU and RAM compared to today's requirements.

              Basically the KV SSD is offering more parallelism. The other poster with GPU analogy was spot on.

          • No, they didn't read the article. They all read the summary and started making ignorant comments.
        • Re:The point? (Score:5, Informative)

          by complete loony ( 663508 ) <Jeremy.Lakeman@noSPam.gmail.com> on Friday September 06, 2019 @12:32AM (#59164356)
          Flash memory devices already need to map between physical and logical addresses. If your data is naturally stored as key / value pairs, this could mean a significant reduction in complexity.
    • Content-addressable memory (CAM) is computer memory that operates like a hardware search engine for search-intensive applications. CAM is capable of searching its entire contents in a single clock cycle.
      • So what? This isn't that. This is flash, not dram. It can't do that.

        • What mistaken belief do you hold that suggests it can be done using DRAM but not FLASH? Asking for a friend.
          • Anything you cook up to "instantly" search across its contents can be done on a normal disk with a normal filesystem, too. Indexing, for example. DRAM can be accessed in a very parallel fashion, Flash RAM can't.

            • "Anything you cook up to "instantly" search across its contents can be done on a normal disk with a normal filesystem, too. Indexing, for example. DRAM can be accessed in a very parallel fashion, Flash RAM can't."

              Thank you. Now everyone knows what ridiculous belief you hold that makes you form such absurd conclusions. But as usual your stupidity just keeps getting deeper. What makes you think that a disk drive can access the data it has spread all over its various platters in "massively parallel fashion"? I

              • What makes you think that a disk drive can access the data it has spread all over its various platters in "massively parallel fashion"?

                The only person in this discussion who has mentioned spinning rust is you. You're going to have to figure out what the rest of us are talking about if you want to make a useful contribution.

                • You stated "normal disc". I hate to break it to you the D in SSD stands for drive and is not only not a "normal disc drive", it isn't a disk drive at all. So it turns out that you literally don't know what you are talking about. DOH!
                  • It would have been obvious to an intelligent person what I meant. QED, you are not that person. Now run along and let the smart people have the discussion, as you have nothing to add.

                    • Ah ... But it has been obvious to everyone that you are an idiot who doesn't know the difference between a disk drive and a solid state drive any more than he understands CAM, what it is, how it works, and why everything you have written announces to the world your ignorance, incompetence, and unabashed idiocy.
    • by Jeremi ( 14640 )

      The point is to make database operations faster. If it speeds up databases enough to offset the additional hardware cost (relative to traditional/commodity hardware), that's the point. If not, there is no point.

      By analogy: what's the point of a GPU when you already have a perfectly good, Turing-complete CPU that can already perform all the same calculations?

      • The point of a GPU is to offload the processing of the display from the main CPU. The point of I/O Channel Processors is to offload I/O processing from the main CPU (bittyboxen don't have these yet, though they did exist at one time for SCSI using a WD7000 SCSI controller). This is nothing more than an offloading of some very specific I/O to a co-processor.

    • by ceoyoyo ( 59147 )

      A hardware implemented hash table is kind of cool. I don't think laptops are going to be coming with them standard any time soon, but I'm sure there are lots of uses for it.

  • They'll basically be used for some marketing and concept systems, but what is the value-add to a system like this? It's locking you into a specific vendor, it's locking in your software system to the hardware limitations.

    There is a reason we want Software-Defined. Hardware anything is too slow and inflexible these days.

    • by keltor ( 99721 ) *
      They are rather explicitly moving to the Vendor-agnostic standards-based API, that's literally what this is about.
  • My SSD is performing as a key-value device right now. The keys are filenames. The values are the contents of files.

    • Re:I don't get it (Score:5, Informative)

      by Locke2005 ( 849178 ) on Thursday September 05, 2019 @08:20PM (#59163974)
      Content-Addressable Memory (CAM) can be used to greatly speed up database lookups. You don't need it for what you're doing. But for supermassive databases, it would be a big win.
      • Content-Addressable Memory (CAM)

        No way. This isn't a CAM; it's literally flash with a different interface that isn't even remotely on the same planet as DRAM, let alone SRAM, in terms of performance.

    • No, it is not.
      It is a block device.
      It has numbers which address a block.

      On top of that is a filesystem. That loosely resembles a key/value thing. However the key again does not map to a value, but a list of blocks.

      The difference might be subtle ... but it is there. Especially when one of the above is solved by the hardware itself and does not need an OS or an API or a filesystem.

      • "On top of that is a filesystem. That loosely resembles a key/value thing. However the key again does not map to a value, but a list of blocks."

        Yeah, and the SSD does all the same shit internally.

        The difference is that I don't need to support any new standard to read my disk. Just the old ones.

        If support for this new standard is as spotty as I expect, it will create the same kind of problems as hardware RAID.

    • by ceoyoyo ( 59147 )

      Details differ a bit depending on just what kind of filesystem you're using, but generally looking up a file name is a matter of searching a sorted list. Average and worst case lookup are order N operations.

      Lookup times in a properly implemented hash table are constant complexity.

      In most file systems you could probably get constant order lookup times by implementing the hashing algorithm yourself using a predefined directory tree structure, but the filesystem would still have lots of overhead having to do a

  • Everything old is new again. Fixed Block Architecture storage devices won out because they are easier to deal with. These days, going to the bare metal is counterproductive for almost all programmers. Until there is robust support for this at all layers of the software stack (device drivers, file systems, high availability / backup, DBMS, transaction manager), most of these devices, if they are sold at all, will just be used as classic FBA devices.
  • by gweihir ( 88907 ) on Thursday September 05, 2019 @08:45PM (#59164012)

    Are people now too incompetent to put a hash-table on raw storage or what?

    • by 110010001000 ( 697113 ) on Thursday September 05, 2019 @08:54PM (#59164030) Homepage Journal

      To implement a hash table in 2019 you need to download 200MB of JAR files.

      • by gweihir ( 88907 )

        It is utterly pathetic, but I completely agree. For the average moron coder that is. My last custom hash-table keeps a fortune-100 in business.

    • You mean to put a hash table on an emulation of block-sized storage that is being emulated on top of block-erasable, cell-writable raw storage by probably yet another hashtable? Sure, but why?
      • by gweihir ( 88907 )

        No, I mean a hash table on a raw block device. You may have heard of those. Some people are capable of reading data-sheets or even doing their own benchmarks today. Yes, I do know it is an almost lost black art, but not everybody has gone completely incompetent and only capable of coding for high-level interfaces.

    • No, they are too incompetent to understand advances in technology before running their ignorant mouths.
      • by gweihir ( 88907 )

        You think universal storage specialized to a single purpose is an _advantage_? Talk about being incompetent and pathetically grateful for some vendor catering to your incompetence.

    • Comment removed based on user account deletion
      • by gweihir ( 88907 )

        I, on the other hand, think this is basically a scam and only possible because more and more coders do not even understand the basics of their trade.

  • This is supposed to increase performance for things like mongodb, right? Are there any benchmarks?
  • by keltor ( 99721 ) * on Thursday September 05, 2019 @09:42PM (#59164138)
    This is one of those areas, like supercomputing, where 99% of the people reading something have no clue what's going on and make statements that show it. These drives are basically tailored for Ceph and meant to increase storage performance for Ceph. There was a nice presentation by Chunmei Liu at the Flash Memory Summit last year about this: https://www.flashmemorysummit.... [flashmemorysummit.com] - the presentation doesn't actually address the KVS drives, but it addresses where the performance bottlenecks in high speed Ceph are. These drives help by moving chunks of software that are highly repeatable and very easy to optimize in silicon out to silicon.
    • Yeah. Jesus, some of these comments. "But muh normal SSD can do that, t'aint nobody not gonna buy dat dere weird SSD!!". These aren't for you to put in your hip Linux "box" in your closet.
  • The symlink's name is the key, what it points at is the value. Any modern filesystem has them (including NTFS apparently [wikipedia.org]), and they all use hashing to speed up directory lookups, making this a perfectly practical alternative to dedicated libraries with complicated file-formats.

    Maybe, by doing it all in dedicated hardware, you can gain some speed, but your filesystem can do it already. I'd be curious to see performance benchmarks of this device vs. the same SSD formatted and mounted as a filesystem.
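The symlink trick described in this comment is easy to try. Below is a minimal sketch, assuming a POSIX-style filesystem; the helper names `kv_put` and `kv_get` are invented for this example. The link name is the key and the link target is the value.

```python
import os
import tempfile


def kv_put(directory, key, value):
    """Store `value` as the target of a symlink named `key`."""
    path = os.path.join(directory, key)
    tmp = path + ".tmp"
    os.symlink(value, tmp)   # the target does not need to exist as a file
    os.replace(tmp, path)    # rename over any previous link for the same key


def kv_get(directory, key):
    """Return the stored value by reading the symlink target."""
    return os.readlink(os.path.join(directory, key))


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    kv_put(d, "color", "blue")
    kv_put(d, "color", "green")
    print(kv_get(d, "color"))
```

The limits are much tighter than the drive's: keys cannot contain `/` or NUL bytes, values cannot be empty, and the value length is capped by the filesystem's maximum symlink target length (typically a few kilobytes), so this covers only small string values.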

    • This SSD is not a hard drive, but a special kind of memory aka RAM extension.

      • This SSD is not a hard drive, but a special kind of memory aka RAM extension.

        No, it's an SSD full of flash memory. It's not special.

        Personally I'm amused by the whole thing after reading Samsung's benchmark experiment which basically exclusively measured writes to show substantial performance improvements.

        How many minutes could you sustain that kind of load before flash cells self destructed? 5? 10? 1 day? A week? Pick a number you think is fair. Whatever it is the benchmark itself has no purchase on reality for the simple reason if you did that kind of thing in the real world u

        • by fintux ( 798480 )

          No, it's an SSD full of flash memory. It's not special.

          Personally I'm amused by the whole thing after reading Samsung's benchmark experiment which basically exclusively measured writes to show substantial performance improvements.

          I'm not sure which benchmark you're referring to, but I found for example this one: https://www.samsung.com/semico... [samsung.com] - where they state that the performance scales almost linearly with adding KV-SSDs, and they mention a 15x improvement in QPS.

          How many minutes could you sustain that kind of load before flash cells self destructed? 5? 10? 1 day? A week? Pick a number you think is fair. Whatever it is the benchmark itself has no purchase on reality for the simple reason if you did that kind of thing in the real world underlying storage hardware would self destruct quicker than it would be feasible to replace.

          Well, they also mention a much lower write amplification, so at the same amount of operations, they will last longer. I don't know about any later endurance tests of SSDs, but at least the Samsung 850 Pro in one test lasted 9100 TB of writing. That means that ev

          • The performance of normal disks in a stripe also scales almost linearly. So what? As for SSD degradation testing, in the real world devices have data on them, and degrade a lot quicker than in ideal test conditions.

            • by fintux ( 798480 )
              I wish Slashdot had notifications for replies... Well, in any case, here's a late reply: the performance in this use case does not scale even nearly linearly - see the document I already linked (https://www.samsung.com/semiconductor/global.semi.static/Samsung_Key_Value_SSD_enables_High_Performance_Scaling-0.pdf). In fact, it pretty much stays flat after 6 drives. And for the SSD degradation, the only reason for why the device degrades faster when there's data is wear leveling. It mostly has significance if
  • by Anonymous Coward

    Moving a key-value store behind a hardware API implies a large amount of trust in the vendor. After skimming the spec, there does not appear to be a single word about data integrity. This is a complex API, hiding an even more complex implementation, which does not inspire confidence. What are the chances that the internals will be as solid as ZFS, within commodity devices? If integrity must be implemented on top of this new API, what has been gained?

    This appears to be backwards; if performance is critical,

  • Seagate Kinetic disks had this 5 years ago [1], but they stopped the project last year [2].

    [1] https://www.storagereview.com/... [storagereview.com]
    [2] https://www.crn.com.au/news/se... [crn.com.au]
  • Imagine malware stored in keys.
