Is It Time For NoSQL 2.0? 164

Posted by Unknown Lamer on Wednesday February 22, 2012 @02:22PM from the hyperhack-the-hypergibsoncube dept.

New submitter rescrv writes "Key-value stores (like Cassandra, Redis and DynamoDB) have been replacing traditional databases in many demanding web applications (e.g. Twitter, Google, Facebook, LinkedIn, and others). But for the most part, the differences between existing NoSQL systems come down to the choice of well-studied implementation techniques; in particular, they all provide a similar API that achieves high performance and scalability by limiting applications to simple operations like GET and PUT. HyperDex, a new key-value store developed at Cornell, stands out in the NoSQL spectrum with its unique design. HyperDex employs a unique multi-dimensional hash function to enable efficient search operations — that is, objects may be retrieved without using the key (PDF) under which they are stored. Other systems employ indexing techniques to enable search, or enumerate all objects in the system. In contrast, HyperDex's design enables applications to retrieve search results directly from servers in the system. The results are impressive. Preliminary benchmark results on the project website show that HyperDex provides significant performance improvements over Cassandra and MongoDB. With its unique design, and impressive performance, it seems fittng to ask: Is HyperDex the start of NoSQL 2.0?"

Is It Time For NoSQL 2.0?

This discussion has been archived. No new comments can be posted.

Search 164 Comments Log In/Create an Account

Comments Filter:

Why not both? (Score:4, Interesting)

by Sarten-X ( 1102295 ) writes: on Wednesday February 22, 2012 @02:31PM (#39127777) Homepage

Er...
Um...
Why not learn both, and use whichever's strengths suit the application the best?

Great for Perl aficionado... (Score:5, Interesting)

by Spectre ( 1685 ) writes: on Wednesday February 22, 2012 @02:38PM (#39127881)

Many of the key-value pair DBs supply a Perl library that let you tie a Perl hash (%Variable) to the DB directly, giving you persistent hashes.
Makes database storage virtually a native feature of the language. Anybody who uses Perl is probably already a hash buff, so it is a win-win if you and your app already use Perl.
Disclaimer: I run a 10yo web "app" (Perl/CGI/Apache), so I'm a bit biased. But, the thing is rock-solid, so I'm not going to be too apologetic.

Re:Why not both? (Score:5, Interesting)

by Terrasque ( 796014 ) writes: on Thursday February 23, 2012 @08:08AM (#39135295) Homepage Journal

I've just gotten my NoSQL feet wet by playing around a weekend with python + mongodb. I am pretty used to SQL, and generally had the same thinking you (and most other SQL people) had.
But, many people liked it, so I figured out I should at least have a look at it. I made a small webapp for tracking my movies, with query to imdb and with users. I was surprised to see that most of the problems I anticipated wasn't a problem at all, and things mostly just worked naturally. For a quick get-started intro to python + mongodb : Part 1 [pythonisito.com] and Part 2 [pythonisito.com]. If you got the spare time and some interest, poking around with it is a great little weekend project.
Anyway, back to your question. MongoDB store data in a format very similar to JSON (technically BSON, a JSON superset), if you're familar with that. Unordered key->value and ordered lists. For the python driver, it translates the data to and from native python dict/list structures. I started with three fields; filename, added and imdb. The imdb field was more or less the raw data from imdb (json format, decoded to python native and encoded to mongodb's BSON format again.)
Later on I added option for users to mark movies as favorites and seen (by adding two new fields to movie list, "seenby" and "favoriteof" - both lists - these were added to a movie entry the first time someone marked one as seen or favorite). To add a new user I just did movie["seenby"].append(user_id) and movies.save(movie)
When I wanted to query the db, I created a data structure of what I wanted, and sent that to the server. The server would then return all documents that matched that example structure. So, to find the entry for file "/bla/test.mp4" I would do movies.find( {'filepath': '/bla/test.mp4} ).
For finding by imdb Title value : {'imdb.Title': '300'}. For finding all favorites by user: {"favoriteof": user_id} (yes, it would handle the list of users as you'd expect, and find all that the list of "favoriteof" had user in it. It would also of course skip all entries without that field).
mongodb also support some special keywords for searching. Let's say I have a list of 3 users, and want to have all movies that any of them have favorited. {"favoriteof" : {"$in": users} } would fix that - for movies that all of them have as favorite, {"favoriteof" : {"$all": users} }. Sorting was done using sort_by( field_n_direction_list )
You have a full list of modifiers here [mongodb.org]. And all could of course be combined to quickly and easily create powerful queries. And you of course have options for indexes. You might notice that you do lose something from normal SQL's here, if you wanted both movie and user info, you'd have to make two queries (well, from what I've understood) so highly relational data is not fitted for this. Also, you don't have the type constraints any more.
In the app I also wanted to list all movie genres (I did one preprocessing of the imdb data, splitting up comma seperated genres string to a list of genres) and number of times each genre was used. This led me to mapreduce, which was the thing I both anticipated most, and feared most. Well, I kinda chickened out, since the pymongo doc had an excellent example [mongodb.org] which was exactly what I wanted doing, but I did get a look at it at least :) And it was fast enough to not making a noticeable dent in load time for a few hundred movie entries.
*Cough* well, that was a long post.. I hope it helped you at least a bit in answering your question, and maybe inspire you to take a closer look at it when you get some spare time. I've only used it over a weekend, so I've probably just scratched the surface, and I probably have missed some neat features or horrible gotchas here and
Read the rest of this comment...

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Is It Time For NoSQL 2.0? 164

Is It Time For NoSQL 2.0? More Login

Is It Time For NoSQL 2.0?

Why not both? (Score:4, Interesting)

Great for Perl aficionado... (Score:5, Interesting)

Re:Why not both? (Score:5, Interesting)

Related Links Top of the: day, week, month.

Slashdot Top Deals

Slashdot