Forgot your password?
typodupeerror
Cloud Data Storage Programming Hardware

Is It Time For NoSQL 2.0? 164

Posted by Unknown Lamer
from the hyperhack-the-hypergibsoncube dept.
New submitter rescrv writes "Key-value stores (like Cassandra, Redis and DynamoDB) have been replacing traditional databases in many demanding web applications (e.g. Twitter, Google, Facebook, LinkedIn, and others). But for the most part, the differences between existing NoSQL systems come down to the choice of well-studied implementation techniques; in particular, they all provide a similar API that achieves high performance and scalability by limiting applications to simple operations like GET and PUT. HyperDex, a new key-value store developed at Cornell, stands out in the NoSQL spectrum with its unique design. HyperDex employs a unique multi-dimensional hash function to enable efficient search operations — that is, objects may be retrieved without using the key (PDF) under which they are stored. Other systems employ indexing techniques to enable search, or enumerate all objects in the system. In contrast, HyperDex's design enables applications to retrieve search results directly from servers in the system. The results are impressive. Preliminary benchmark results on the project website show that HyperDex provides significant performance improvements over Cassandra and MongoDB. With its unique design, and impressive performance, it seems fittng to ask: Is HyperDex the start of NoSQL 2.0?"
This discussion has been archived. No new comments can be posted.

Is It Time For NoSQL 2.0?

Comments Filter:
  • Berkeley DB? (Score:5, Insightful)

    by gstoddart (321705) on Wednesday February 22, 2012 @01:31PM (#39127775) Homepage

    This sounds like the old Berkeley DB/Sleepy Cat software.

    Key/Value pairs instead of relational stuff. Worked with a product years ago that was built on Berkeley -- offered some pretty useful features that simply didn't map to object-relational stuff.

    For some applications, you really do need something that works a little differently than an RDB ... however, there's still loads of things I can't imagine trying to do without one.

    Choice is good in technology.

  • Re:Why not both? (Score:3, Insightful)

    by LostCluster (625375) * on Wednesday February 22, 2012 @01:45PM (#39127975)

    Can somebody explain how this NoSQL stuff works? It's a database without SQL, so what replaces it? Is this just the difference between BASIC and C being expanded.

  • by Sarten-X (1102295) on Wednesday February 22, 2012 @01:49PM (#39128025) Homepage

    And we'd still be able to have the cluster support, scalability, lax schema, and MapReduce algorithms NoSQL currently provides, right? Sometimes those aspects are vital to the application design, and key to the system's overall performance.

  • Re:Branding (Score:5, Insightful)

    by donscarletti (569232) on Wednesday February 22, 2012 @02:06PM (#39128261)

    It's great branding.

    Previously, I was developing MMO backend software that uses MySQL for a data storage. The fit to the model was completely inappropriate, there was just no applications of the relational model, since we were just checking in and out large blobs of data, not actually performing read/update transactions. Storing records (persistent game entities) as files in a directory would have worked far better than forcing that stuff into a relational DB. But customers know that Databases are what professionals use, so we did it anyway. Clients can buy it, realise they need the flat files and turn them on after benchmarking, we get the sale, they get a good product in the end, win win, but a bit of wasted effort.

    Now NoSQL is what professionals use, relational DBs can be used for what they are good at and NoSQL gives us marketing hype for doing certain things in the right way that could have been done using filesystems all along. I couldn't be happier. Furthermore we get this nice application level distributed data store with map-reduce stuff built in if we can be bothered using it.

    Here's what most geeks don't get about marketing: it's not just about being smarter than the other guy, you've got to be smarter than him and make him give you his money. Money is good, it buys freedom and power and if branding makes sure that you have more of this freedom and power than the fool who falls for it, then the world will be a better place.

  • Re:Why not both? (Score:2, Insightful)

    by Anonymous Coward on Wednesday February 22, 2012 @02:40PM (#39128687)

    Perhaps I'm just in an old state-of-mind, but what good is data without relations? I don't mean that as a gripe about the system, just, how would one ever pull members of a group, or messages belonging to a user, etc? I guess I don't understand how that's more efficient.

  • by Sarten-X (1102295) on Wednesday February 22, 2012 @02:54PM (#39128887) Homepage

    [citation needed], and preferably one that actually covers NoSQL as it's intended for use.

    Last time I checked thoroughly (2009), most RDBMSs (MS SQL Server included) could scale across an arbitrarily-large cluster, but for every doubling of the cluster's power, the costs would be around 300% to 400%. When you get to the point of needing billions of rows per table (and yes, there are applications out there that need that, even at relatively small startups), those outpacing costs become prohibitive.

    The lax schema isn't about not knowing what you're doing, but about acknowledging that you won't know everything about the data you'll receive. Back when I did server programming, the mantra was "be strict in what you provide, and lax in what you accept". This is that principle applied to databases. Maybe the website you're crawling doesn't have a title, or its address is obviously dynamic. Maybe the medical record's patient has seven different insurance providers. Maybe the passport holder legally doesn't have a surname. When you design a schema for a strict database like an RDBMS, you make certain assumptions about the data you'll get. Those assumptions lead to performance increases if they're accurate, and failure if they're wrong.

    MapReduce is the key to performance without assumptions, at lower cost. By moving processing to the data, and replicating the data to multiple nodes, network transfer is reduced greatly. The MapReduce programs are designed to operate on any amount of data they are presented with, so each node in the cluster contributes its available resources, and since the data is spread evenly, most "queries" will be partially processed by every node. Contrast that with RDBMS sharding, where certain servers handle certain shards, and the massive parallelism of the cluster isn't used. Some servers will sit idle while others do all of the work. Note that the parallelism applies generally, to all MapReduce algorithms. This means that you do not need to make as many assumptions about your queries ahead of time, like expecting to only look up a customer by name or phone number (and therefore indexing those).

    NoSQL isn't just "not using SQL". It's a different storage paradigm, which comes with its own advantages and disadvantages.

  • by sexconker (1179573) on Wednesday February 22, 2012 @03:33PM (#39129419)

    300 - 400%? Lol you're doing it wrong.
    Billions of rows? So what? Easily handled by SQL.

You had mail, but the super-user read it, and deleted it!

Working...