NoSQL Document Storage Benefits and Drawbacks
Nerval's Lobster writes "NoSQL databases sometimes feature a concept called document storage, a way of storing data that differs in radical ways from the means available to traditional relational SQL databases. But what does 'document storage' actually mean, and what are its implications for developers and other IT pros? This SlashBI article focuses on MongoDB; the techniques utilized here are similar in other document-based databases."
Same as the old boss (Score:5, Interesting)
It's so cute how NoSQL developers have reinvented the XML database.
Re: (Score:2, Informative)
The article lied. It mentioned benefits and drawbacks in the title, but all it described was a collection of collections of key-value pairs. Is that really what this whole NoSQL thing is about?
Re: (Score:1)
Yes. It's a column-based store, as opposed to a row-based store.
Re: (Score:1)
XML as a database has a hierarchy; NoSQL looks like a flat, unstructured mess of keys (_id) and values (hashes). I'm not fond of XML/XPath/Xwhatever, but at least I think it has more structure than NoSQL.
Re: (Score:3)
JSON is relevant to NoSQL because it gives a good answer to "how do I store more complicated things than key/value lookups?", one that's even possible to decode in a web browser nowadays. XML databases gave an answer to "how do I store schemaless data in a relational database?", a similar issue. Both combinations--relational + XML, NoSQL + JSON--end up providing the same basic capabilities.
Re:Same as the old boss (Score:4, Informative)
JSON is crap for storing arbitrary structured data and collections for web applications.
In JavaScript you can easily construct an object that is both an "Array" and has named attributes (an associative array). However, you can't recreate that object with valid JSON.
JSON also introduces a fantastic new method of inserting arbitrarily executing code into a web application, demanding yet another set of defenses against insertion attacks to be developed.
It is a problem masquerading as a solution to a problem it can't actually solve.
JSON sans eval (Score:2)
JSON also introduces a fantastic new method of inserting arbitrarily executing code into a web application
How so, if you parse the JSON in your own code [google.com] instead of eval()ing it?
Re: (Score:3)
That was the point. Not everyone does that.
The de facto standard for instantiating an object from JSON is still eval().
Re: (Score:1)
All major web browsers have supported JSON.parse() for a long time; including IE8. Anyone who is still using eval() for parsing JSON should come out of their cave and get with the times. I doubt you would find any serious web developer who still uses eval(), except in "extreme" circumstances. There's just no real need for it in most applications.
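To make the difference concrete, here is a minimal sketch of parsing untrusted text with JSON.parse() instead of eval(). The helper name parseUntrusted and the sample strings are made up for illustration; the only real API used is the standard JSON.parse, which throws a SyntaxError on anything that is not valid JSON rather than executing it.

```javascript
// Hypothetical helper: parse text from an untrusted source without eval().
// JSON.parse only accepts JSON syntax, so script payloads are rejected
// with a SyntaxError instead of being executed.
function parseUntrusted(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    if (err instanceof SyntaxError) {
      return { ok: false, error: err.message };
    }
    throw err; // anything else is unexpected, so let it propagate
  }
}

// Made-up inputs: one well-formed document, one script payload.
const good = parseUntrusted('{"name": "Bob", "tags": ["a", "b"]}');
const bad = parseUntrusted('(function () { return "pwned"; })()');

console.log(good.ok, good.value.tags.length);
console.log(bad.ok);
```

With eval() the second string would have run as code; here it simply fails to parse.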
Re: (Score:2)
OK, take that associative array and add non-associative elements to it.
Or more accurately, take a non-associative array and add associative elements to it.
["a", "b", "c", "foo":"bar"] is not valid JSON.
Neither is {"a", "b", "c", "foo":"bar"}.
Yet I can do:
var stuff = ["a", "b", "c"];
stuff.foo = "bar";
That JavaScript object cannot be serialized to valid JSON.
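For anyone who wants to see it happen, this is roughly what the standard JSON.stringify and JSON.parse do with such an object. The wrapper shape at the end (items plus props) is only one possible workaround, invented for this example; it is not a standard convention.

```javascript
// An array that also carries a named property.
const stuff = ["a", "b", "c"];
stuff.foo = "bar";

// JSON.stringify serializes only the indexed elements of an array,
// so the named property is silently dropped.
const json = JSON.stringify(stuff);
const restored = JSON.parse(json);

console.log(json);         // ["a","b","c"]
console.log(restored.foo); // undefined

// One possible workaround (a made-up convention): serialize the
// elements and the extra properties as separate fields of an object.
const wrapped = JSON.stringify({
  items: Array.from(stuff),
  props: { foo: stuff.foo },
});
const unwrapped = JSON.parse(wrapped);

// Copy the named properties back onto the restored array.
const rebuilt = Object.assign(unwrapped.items, unwrapped.props);

console.log(rebuilt.length, rebuilt.foo);
```

The round trip works, but only because both ends agree on the wrapper; plain JSON has no syntax for "array with extra named members".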
Re:Wrong, wrong, and wrong. (Score:5, Informative)
Sure it won't create an instance of Array, but if you're using an Array to also be an associative array then really I think JSON is the least of your worries.
Re:Wrong, wrong, and wrong. (Score:4, Funny)
JavaScript under Wine (Score:2)
Re: (Score:2)
I know exactly what I am doing. I know I can iterate over all properties and array contents using in. You merely have poor reading comprehension.
I am adding a name property to an object, one that also happens to be an array. Exactly like I said I was. I am fully aware that I am not adding another element to the array when I do stuff.foo = "bar";
Your half-intelligent JSON serialization routine that ignores the properties added to an array is wrong. Just like the other guy's suggestion to implement it
Worse than the old boss (Score:5, Interesting)
The "old old boss" would be the CDF/NetCDF/HDF family of self-describing distributed storage solutions. They predate XML by a long way and are - I believe - the first true self-describing method of storing, indexing and searching data.
For the most part, they support network interconnections between instances, so you can have your virtual storage distributed over as many physical systems as you like. The users will never see the difference except in terms of speed. This gives you all the benefit of NoSQL's distributed model (which XML lacks) but with several decades more development in the database design.
But wait! There's more! If you order in the next gazillion years, you get OPeNDAP absolutely free! (Which it is anyway.) OPeNDAP will translate between any two data formats, so if one site wants to view the data as, say, a conventional database, another wants to look at it as a collection of spreadsheets and a third is expecting XML data, you'd have OPeNDAP translate between client form and central repository form.
I have no objections to Mongo or Memcache, they're very powerful and are very useful, but we're still ultimately talking about technology everyone else has had since 1985, thanks be to NASA, and many NoSQL technologies are really just network-aware versions of the DBM/NDBM/BDB/GDBM/QDBM family which have existed since Unix began.
NoSQL definitely has a place - I would not want to try serving cached web data from HDF5 - and it's an important place. But that's just as true for Hierarchical Databases, Star Databases (aka "Data Warehouses"), "genuine" (ie: actually complies with Codd's rules) relational databases (SQL isn't truly relational in the Codd model, merely a subset), and so on.
It's time we got away from one-size-fits-all ideas, which violate the Unix ethos anyway, and got back to using the best solutions for specific problems rather than passable solutions that fail at everything. These are all wonderful, highly specialized solutions to highly specific problem types. Treating them as such will always produce a better answer than force-fitting solutions into not-quite-failing with problems they aren't designed for.
Re: (Score:2)
It's time we got away from one-size-fits-all ideas
What do you mean? We shouldn't use ASCII? Or Unicode? How about what we in the West know as the Arabic numbering system? Universality has its place. Standards are useful and important. PL/1 failed perhaps because programming is more complicated. Though computing is universal, we have not yet managed to come up with a good universal programming language. But data may be simpler.
Having said that, I think HDF5, NetCDF, JSON, YAML, and SQL (and NoSQL) all fail on the universality front. You would not
Re: (Score:2)
The thing with self-descriptive data is it doesn't matter if you personally use ASCII, EBCDIC, Unicode or wide characters. You can map whatever to whatever. There is a standard, but it is in the description and not in the described. Specialized solutions are superior - in their niche. I would take a toolbox with a thousand types of saw, hammer and blade over a single Swiss Army Knife because each tool is superior even though no single blade can do everything.
A universal system would be an object-oriented Co
Re: (Score:2)
> What do you mean? We shouldn't use ASCII? Or Unicode? How about what we in the West know as the Arabic numbering system?
None of those *are* one-size-fits-all. ASCII and Unicode are very good at encoding text in human-readable forms; but I wouldn't want to encode my porn in them. The Arabic numbering system is very good at expressing discrete quantities, but kindly refrain from writing a whole novel in it.
The point is that this NoSQL stuff is being hailed as the next big thing, which shall Smite the Rel
Re: (Score:2)
I don't think anybody is claiming NoSQL is new. Many NoSQL products are just incremental improvements over old-style object-oriented databases, after all.
All that is new is the concerted push to point out to people that RDBMSs and SQL shouldn't necessarily be the automatic solution to every problem. They're extremely good at certain tasks, perhaps even for a large majority of tasks, but there are some instances where they are not the best tool for the job. The NoSQL people just want to make sure we all co
Re: (Score:2)
Unfortunately, the NoSQL people come across as if they believe, and many actually do, that RDBMSes are utterly useless now that they have found Je- err, their new toy.
Of course what is now suddenly known as NoSQL has its place - hell, how many of us haven't been using Memcached or something similar? Mozilla (and many others) uses RDF stores - yep, that's also NoSQL now. It's just not the ONLY solution, let alone always the BEST one - and of course you need to pick the right tool for the job.
There's been a kente
Re: (Score:2)
It's so cute how NoSQL developers have reinvented the XML database.
Actually, XML is a comparative latecomer.
NoSQL uses JSON which has "name: { blah:val, blah:val }" style syntax. I needed a text database format for some [perl/awk] scripts I wrote in the 80's. I ended up creating a similar curly brace format--no big deal.
Before relational databases even existed, there were CODASYL-compliant databases. These didn't even have SQL as we know it today.
Re: (Score:3)
What I find especially cute is that nobody in this thread seems to have heard of Z39.50 or WAIS.
Re: (Score:2)
The whole "XML in databases!" trend came out of people being frustrated with not being able to stuff arbitrary data into a relational database. This "new" document storage idea is addressing the exact same problem in a similar way, only it's a different schemaless storage scheme/database pair. That's why I was amused by the similarity.
Z39.50 and WAIS were implementing a client/server protocol that wasn't tied to any particular database storage backend. If I were searching for a historical precedent for f
Re: (Score:2)
Z39.50 and WAIS were implementing a client/server protocol that wasn't tied to any particular database storage backend.
That's certainly true. I only know of one Z39.50 database engine that actually speaks Z39.50 natively.
Nonetheless, Z39.50 was designed with SGML in mind. It implements a very flexible documents-with-nested-and-repeating-fields schema, and did so in 1988.
The article is barely a description of MongoDB... (Score:5, Informative)
The article is barely a description of MongoDB records. It does not really detail any real drawbacks or benefits beyond "look ma, random structure in my record!"
Re:The article is barely a description of MongoDB. (Score:4, Insightful)
I read this article with the hope of seeing some of the benefits and drawbacks (as the title implied). No talk of scalability, indexing, speed, etc. I actually feel dumber for having read the article.
Re: (Score:2)
The comments on SlashBI are great too. I also wanted to know how to query data out of your "documents" as the Wikipedia page doesn't describe that. Using the SlashBI example, show me all contact objects with state = "DC" or all records where last name ilike 'o_ama'. Does performing a search like that iterate over all records? Do you need to enable some full-text indexing of your entire document store to be able to execute queries like that?
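As a rough illustration of the question (not a description of how MongoDB or any other product implements it), here is a sketch of the two options: a full scan that touches every document, and a secondary index built once on a single field. The sample contacts, the field names and the helpers scan and buildIndex are all invented for this example.

```javascript
// Made-up sample documents; fields vary from record to record.
const contacts = [
  { _id: 1, first: "Barack", last: "Obama", address: { state: "DC" } },
  { _id: 2, first: "Ann", last: "Okama", address: { state: "VA" } },
  { _id: 3, first: "Joe", last: "Smith" }, // no address at all
  { _id: 4, first: "Sue", last: "Jones", address: { state: "DC" } },
];

// Option 1: full scan. Every document is examined for every query.
function scan(docs, predicate) {
  return docs.filter(predicate);
}

// Option 2: a secondary index, built once, mapping a field value to the
// documents that contain it. Documents missing the field are skipped.
function buildIndex(docs, getField) {
  const index = new Map();
  for (const doc of docs) {
    const key = getField(doc);
    if (key === undefined) continue;
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

// Exact-match query answered from the index without touching other docs.
const byState = buildIndex(contacts, (d) => d.address?.state);
const inDC = byState.get("DC") ?? [];

// A pattern match like SQL's ILIKE 'o_ama' has no exact-value key to
// look up, so this sketch falls back to a scan with a regular expression.
const likeObama = scan(contacts, (d) => /^o.ama$/i.test(d.last));

console.log(inDC.map((d) => d._id));
console.log(likeObama.map((d) => d._id));
```

In this toy version the exact-match query avoids iterating over all records once the index exists, while the pattern query still visits every document; whether a real document store can do better depends on the kinds of index it offers.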
Re: (Score:2)
You can turn off transaction isolation and cram serialized record data into a single BLOB field, and you will get the same thing right?
Not really. Schemaless databases provide indexing and search capabilities that are impossible to achieve using SQL blobs without either loading all your data back into memory whenever you want to search for something or providing your own index mechanism.
Or, use a freaking filesystem?
Which as well as lacking indexing and search as the SQL-based system would, also does not provide any useful mechanism for concurrent updates, or for ensuring consistency (whether eventual or guaranteed at all times). It would also probably be much slower.
Another NoSQL article on /. (Score:5, Insightful)
Oh, look, it's a NoSQL article.
Cue the hundreds of Slashdotters who proclaim "Oh, they've reinvented obsolete databases" and "Just wait until they need ACID, then they'll be fucked", the NoSQL blind-faith followers who harp about pure scalability and clustering, and at least a dozen references to an animated video of a retarded strawman saying "webscale" repeatedly.
Somewhere in the depths of poorly-researched comments will be some guy who thinks that NoSQL is a tool that really just might be useful for particular use cases, and should be used where appropriate, and nowhere else. Sadly, his post will be missed because everyone's too busy talking about how everything can be done just as easily on a $500,000 server farm running Oracle's latest and greatest turd.
Re: (Score:1)
You don't get mod points because you're AC.
Re: (Score:2)
+1 Preemptively made other posts Redundant
Re: (Score:1)
Kind of like the McDonalds of data storage.
Re: (Score:1)
Welp, looks like that's it for the thread, folks. Move along.
Re:Another NoSQL article on /. (Score:5, Informative)
Sadly, his post will be missed because everyone's too busy talking about how everything can be done just as easily on a $500,000 server farm running Oracle's latest and greatest turd.
Actually, I was going to talk about how PostgreSQL 9.2 (expected in Q3 of this year) will include JSON support [postgresql.org]. The database also has non-relational key value [postgresql.org] storage, and that feature is even available in Heroku deployments [heroku.com] now.
PostgreSQL also lets you relax ACID for performance when that makes sense, at the transaction level, using synchronous_commit parameter [postgresql.org] and unlogged tables [depesz.com].
There are two things PostgreSQL doesn't do as well as MongoDB. It won't do simple key/value lookups quite as fast; I normally eliminate that problem by putting a memcached server in at some level. And you can't split writes among multiple nodes easily yet.
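For readers unfamiliar with the memcached arrangement, it is usually called the cache-aside pattern. A minimal sketch follows, with a plain Map standing in for memcached and a fake slowLookup function standing in for the database query; both are stand-ins invented for this example, not real client APIs.

```javascript
// Stand-in for the database: counts how often it is actually queried.
let dbQueries = 0;
const table = { "user:1": "alice", "user:2": "bob" };
function slowLookup(key) {
  dbQueries += 1;
  return table[key] ?? null;
}

// Stand-in for memcached: a plain in-process Map.
const cache = new Map();

// Cache-aside read: try the cache first, fall back to the database on a
// miss, then populate the cache so the next read is served from memory.
function get(key) {
  if (cache.has(key)) return cache.get(key);
  const value = slowLookup(key);
  if (value !== null) cache.set(key, value);
  return value;
}

// Writes go to the database and invalidate the cached copy.
function put(key, value) {
  table[key] = value;
  cache.delete(key);
}

get("user:1");
get("user:1"); // second read is a cache hit, so the database is not queried
put("user:1", "carol");
console.log(get("user:1"), dbQueries);
```

The database sees two queries for three reads here; the invalidate-on-write step is the part that needs care in a real deployment.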
Re: (Score:3)
Glad I'm not the first to bring up PostgreSQL, which gives you serious amounts of awesomeness at 0% of the cost of Oracle.
Re: (Score:2)
I was strictly speaking, because pgpool-II's statement replication is neither built-in nor without limitations--compared to the full feature set of PostgreSQL. Another write scaling approach is to use the PL/Proxy language to wrap database access. There are also people doing PostgreSQL sharding in their application layer, connecting to one of multiple databases based on what they need. None of these ideas is popular or built in to the database yet, though.
Re: (Score:2)
Agreed, but that's the peril of living in a world where everything is tightly-coupled and highly-integrated. People forget that you can mix-n-match, they look no further than using one system for everything. NoSQL does indeed have a purpose, and just like an F1 car, it is in a class of its own when used for that purpose. But I'd no more use Memcache as a substitute for NetCDF or Ingres than I would use an F1 car to go off-road sight-seeing.
Re:Another NoSQL article on /. (Score:4, Funny)
For the benefit of readers in the US, F1 is like Indycar but the cars can turn both right and left.
my article about porn stars using kickstarter (Score:1)
my article about porn stars using kickstarter and other donation websites to provide each other with healthcare (because there is no healthcare in the porn industry) got instantly blacklisted.
instead, you get this.
shrug.
Re: (Score:2)
Somewhere in the depths of poorly-researched comments will be some guy who thinks that NoSQL is a tool that really just might be useful for particular use cases, and should be used where appropriate, and nowhere else.
Yes, those "appropriate uses for NoSQL" are like unicorns... often rumored, only apparently seen in huge companies like Amazon and Google where they have dozens of PhDs working on them. They're very similar, unfortunately, to the "seemingly appropriate uses for NoSQL" but you can't really tell until you've wasted months of development effort...
Re: (Score:2)
And when people point out there appears to be no actual use, the NoSQL people feel the need to expand NoSQL to include completely random things like Memcache, which is not any sort of 'database' at all, and the Memcache people would be completely baffled to be included in this group.
I'm frankly surprised they haven't started claiming that filesystems are NoSQL. And everyone uses those! So everyone uses NoSQL!
And, in fact, they're correct. That's really what NoSQL is. It's not a replacement for any sort
Article is not useful (Score:5, Insightful)
I'm not sure what the point of this "article" is. It is light on actual information or anything useful, it's basically just a few paragraphs that say "a NoSQL database called Mongo stored data in JSON format. This may or may not work for you".
If we're going to have "BI" articles, they should be informative, containing useful information that we couldn't have gathered ourselves in 10 secs of googling.
How about some comparisons between various NoSQL solutions? How about a binary access API vs. a RESTful approach à la Couch? How about clustering, replication and scalability? How about stability concerns (with Couch, for example)? Real-world use cases? Examples of companies using them for specific solutions? Performance comparisons with RDBMSs? Problem domains that a NoSQL/schemaless DB is more suited to than an RDBMS?
I'm not trying to be pointlessly critical here, I'm trying to provide some constructive feedback on the new slashdot BI format. This article wasn't useful to me at all. I'll probably not spend time reading these articles in the future if the content is as light as this article.
Unstructured Data (Score:5, Interesting)
I don't know when unstructured data turned into NoSQL or Big Data, but it is a pretty simple concept with complex Enterprise level requirements. I work in this field and have for various companies. The biggest obstacle is conforming to the laws of various jurisdictions and levels of government.
You have unstructured data, but it NEEDS some level of structure. That structure is there to restrict access to certain groups within the organization and also for retention rules, which differ by type of data being stored. Not to mention that you must store certain documents in the country of origin, so structured field-based distributed storage plays a role. Oh yeah, and there are laws/policies around encryption and whether or not an index violates those laws/policies.
This doesn't work well with a relational database. Sure, you can jam it into an RDBMS like IBM Content Manager, but it becomes inflexible. However, there are constraints that must be followed and all documents need some kind of structure wrapped around them in an RDBMS-like fashion.
I haven't dived into these NoSQL systems myself. They seem like a good idea, but I hesitate if they are too loose. In an Enterprise with sensitive information, you need to deny first. Also, how do they index the fields? Like when you have 100,000,000 documents with invoice numbers...
Re: (Score:3)
Identifying which field is the primary key is not the same as indexing the fields, plural.
Re:Unstructured Data (Score:4, Interesting)
Some of those documents with invoice numbers are not invoices. In fact, they could have many invoice numbers. Invoice numbers are just an example. There is a lot of value to a company in finding all documents relating to product #XYZ that was shipped to company ABC. Maybe throw some date constraints in there. And they don't want useless garbage in the results. Also, all invoices should have an invoice number. And an invoice number should have a certain pattern. Otherwise, garbage-in garbage-out.
Also, the part where RDBMS-based document storage falls flat on its face is versioning of the schema itself. Business requirements change; they want to require a field that wasn't required before. They want to make one optional. They want to change the type or the pattern format. But the searches should still go across all those documents. NoSQL-based stuff, assuming the documents are properly and efficiently indexed, may do better in this department.
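A sketch of what that kind of schema versioning could look like when each document records which version of the rules it was written under. Everything here (the schemaVersion field, the rule tables, the invoice-number patterns, the poNumber field) is hypothetical and only meant to show the idea, not any product's feature.

```javascript
// Hypothetical rule sets, one per schema version. Version 2 tightens the
// invoice-number pattern and makes "poNumber" required.
const schemas = {
  1: { required: ["invoiceNumber"], patterns: { invoiceNumber: /^\d+$/ } },
  2: {
    required: ["invoiceNumber", "poNumber"],
    patterns: { invoiceNumber: /^INV-\d{6}$/ },
  },
};

// Validate a document against the rules of the version it declares.
function validate(doc) {
  const schema = schemas[doc.schemaVersion];
  if (!schema) return [`unknown schema version ${doc.schemaVersion}`];
  const errors = [];
  for (const field of schema.required) {
    if (doc[field] === undefined) errors.push(`missing ${field}`);
  }
  for (const [field, pattern] of Object.entries(schema.patterns)) {
    if (doc[field] !== undefined && !pattern.test(String(doc[field]))) {
      errors.push(`bad format for ${field}`);
    }
  }
  return errors;
}

const oldDoc = { schemaVersion: 1, invoiceNumber: "10042" };
const newDoc = { schemaVersion: 2, invoiceNumber: "INV-000123" };

console.log(validate(oldDoc)); // no errors under the version 1 rules
console.log(validate(newDoc)); // one error: poNumber is missing

// A search still spans both versions, because it only touches fields
// that the documents actually have.
const all = [oldDoc, newDoc];
const hits = all.filter((d) => String(d.invoiceNumber).includes("00"));
console.log(hits.length);
```

Old documents stay valid under the rules they were written with, new ones are held to the stricter rules, and one query covers both. The cost is that the validation lives in application code rather than in the database.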
Re: (Score:2)
I agree that you need structure, much like RDBMS. However, there are advantages to a NoSQL-like model with Enterprise document storage. There are disadvantages to RDBMS as well. It needs something in the middle.
Sure, a traditional RDBMS can do it. IBM Content Manager is exactly that (with an unstructured component for storing docs). Have you used RDBMS for Enterprise Content Management? Holding documents to strict schemas can be ineffective, because documents change over time. Sure, you can just crea
Re: (Score:1)
... Holding documents to strict schemas can be ineffective, because documents change over time. Sure, you can just create more and more tables, but that requires administrators and time....
Suppose the(y) add a field to invoices. And it is required on all future invoices. With RDBMS, you need to create a new table with a NOT NULL constraint.
There are good reasons to choose NoSql (I like my paycheck), but that's not one of them.
The major SQL products have supported "add column" for decades. I implemented add column in the '80s. ISTR that Oracle, DB2 and Sybase also supported it at the time.
Re: (Score:1)
Relatively free form key value pairs except some other stuff that matters for your domain works just fine in a relational db, you just have to query for it when you need it. If you already have a db and an ORM, which would be the common case in any enterprise environment, you'll get your getters and setters for free once you specify class/member->table/column and you can have an attribute table in the without breaking step. How this would be hard to set up or use compared to a key/value store is a myst
Re: (Score:1)
In RavenDB, you create indices by creating a query returning the field you want to index and telling RavenDB to index that. For instance, if you are going to query your User documents based on email address often, you would write an index:
from user in Users select new { user.EmailAddress }
And then you can query:
from user in Users where user.EmailAddress == "bob.dobbs@example.org"
You can do this without an index, but it will be slow. Though in the case of RavenDB, I believe the database will add indices base
I'm using Mongo (Score:1)
Just wondering... (Score:3)
Where is Lotus Notes in all this brouhaha? They were the original NoSQL system.
Re: (Score:2)
Still at the same place they were decades ago?
*not* flamebait here. Truth. I had to deal with this not that long ago on a daily basis... and I wonder, I really really do.
Re: (Score:2)
Having done some development work on it a long time ago, your news is disappointing.... yet... somehow not surprising.
Re: (Score:2)
Well, the "good news" is where I work is now dumping Lotus for Outlook... and all MS software along with it.
I am not entirely sure if this is a step forwards or backwards..
Disappointing (Score:2)
NoSQL is just an umbrella term (Score:2)
What is document storage? (Score:1)
Re: (Score:1)
What is the point of document storage in a noSQL database? If you're not going to store docs in a RDBMS, why not just store them in a filesystem? What is the point of Mongo or whatever this stuff is?
They are JavaScript Object Notation (JSON) documents and you can query into fields of the object-document without the database having to read the whole "document" in the same way you can read rows based on some set of columns in an RDBMS. Given objects like { a = { b,c } } or just d = f you could read a.b where = c or just d where = f. It's multidimensional as opposed to the flat column format of an RDBMS. Unfortunately, there are no data types, constraints, foreign keys or triggers. Data integrity has to
Re: (Score:2)
They are JavaScript Object Notation (JSON) documents and you can query into fields of the object-document without the database having to read the whole "document" in the same way you can read rows based on some set of columns in an RDBMS.
Yes, congratulations on off-loading the 'reading the document into memory and parsing it' into an entirely different process. I'm sure that will make it much faster than simply keeping the document mmapped in the application that needs it, and in no way be counterproductive
Not just k/v/blob store, but delegating queries (Score:2)
We used MongoDB as a query/cache accelerator for semi-structured data. The key bottleneck was delegating queries outside of application (pre-filtering results according to ACLs, date transforms, etc.).
We don't have a shockingly huge dataset, and site traffic wouldn't be considered as webscale, but the ad-hoc schema and ability to delegate complex queries to the DBs as JS was really powerful and bought us a lot of performance for very little effort.
And it's only a cache of the authoritative data store, so we
Re: (Score:2)
Could the same be achieved with SQL... of course, two tables, each with similar structure, but one allowing null values, and then another table which links them.
Uh, no. In SQL, you store data by putting each record in a row. No one has the slightest idea what you're talking about, or why you'd need to 'store only the differences', or why you'd need three tables for that.
However, then my queries become increasingly complex for pretty simple data.
Oh noes! Complicated queries!
Compare this to what you mi
NoSQL needs better PR. (Score:2)
I, for one, would have a much better time taking NoSQL seriously if so many of the arguments for it didn't reduce to -and to truly express this reduction properly I need to put on my best Barbie voice- "The relational model is haaaaard." Some say SQL instead (for example, whoever came up with the NoSQL moniker), but except for a couple of arguments that amount to pure syntax baw it reduces to pretty much the same thing.
NoSQL has its place: there are some things it does really well. The problem is that the t