Father of SQL Says Yes to NoSQL (theregister.com) 75
An anonymous reader shared this report from the Register:
The co-author of SQL, the standardized query language for relational databases, has come out in support of the NoSQL database movement that seeks to escape the tabular confines of the RDBMS. Speaking to The Register as SQL marks its 50th birthday, Donald Chamberlin, who first proposed the language with IBM colleague Raymond Boyce in a 1974 paper [PDF], explains that NoSQL databases and their query languages could help perform the tasks relational systems were never designed for. "The world doesn't stay the same thing, especially in computer science," he says. "It's a very fast, evolving, industry. New requirements are coming along and technology has to change to meet them, I think that's what's happening. The NoSQL movement is motivated by new kinds of applications, particularly web applications, that need massive scalability and high performance. Relational databases were developed in an earlier generation when scalability and performance weren't quite as important. To get the scalability and performance that you need for modern apps, many systems are relaxing some of the constraints of the relational data model."
[...] A long-time IBMer, Chamberlin is now semi-retired, but finds time to fulfill a role as a technical advisor for NoSQL company Couchbase. In the role, he has become an advocate for a new query language designed to overcome the "impedance mismatch" between data structures in the application language and a database, he says. UC San Diego professor Yannis Papakonstantinou has proposed SQL++ to solve this problem, with a view to addressing impedance mismatch between heavily object-based JavaScript, the core language for web development and the assumed relational approach embedded in SQL. Like C++, SQL++ is designed as a compatible extension of an earlier language, SQL, but is touted as better able to handle the JSON file format inherent in JavaScript. Couchbase and AWS have adopted the language, although the cloud giant calls it PartiQL.
At the end of the interview, Chamberlin adds that "I don't think SQL is going to go away. A large part of the world's business data is encoded in SQL, and data is very sticky. Once you've got your database, you're going to leave it there. Also, relational systems do a very good job of what they were designed to do...
"[I]f you're a startup company that wants to sell shoes on the web or something, you're going to need a database, and one of those SQL implementations will do the job for free. I think relational databases and the SQL language will be with us for a long time."
But how does he pronounce SQL? (Score:1, Offtopic)
Squirrel?
Re: (Score:2, Funny)
Re:But how does he pronounce SQL? (Score:5, Informative)
Nope. It's Structured Query Language, you standardized clod.
Re:But how does he pronounce SQL? (Score:5, Funny)
Re:But how does he pronounce SQL? (Score:5, Funny)
Another proof for an important Internet law: To get something right, post it wrongly on the Internet, and someone will correct you.
It is called Murphy's law.
Re:But how does he pronounce SQL? (Score:5, Funny)
No it isn't, that's -- hey, wait a minute...
Re: (Score:2)
No, it's not. [wikipedia.org]
Re: (Score:1)
You've missed the joke.
Re: (Score:2)
Truly a /. worthy thread.
Re: (Score:2)
It is called Murphy's law.
Can I refer you to Muphry's Law [wikipedia.org]
Re: But how does he pronounce SQL? (Score:1)
It is actually SEQUEL (Structured English QUEry Language) - see their original paper https://www.semanticscholar.or... [semanticscholar.org]
Re: (Score:3, Interesting)
I once worked with someone who called it "Squeal".
Re: (Score:1)
If you do a squirrel it will squeal ... so I've heard.
Re: (Score:3)
By policy, we pronounce SQL as "Ess Que Elle". It makes life a lot less irritating.
We also pronounce TCL "Tea Sea Elle".
Re: (Score:2)
Re: But how does he pronounce SQL? (Score:1)
sequel (Score:2)
NoSQL is the sequel to SQL?
Re:sequel (Score:4, Interesting)
Meh. (Score:5, Informative)
Re:Meh. (Score:4, Insightful)
Re:Meh. (Score:5, Insightful)
Yeah, my advice with NoSQL has always been that it's a last resort, not a first resort. There are very specific things relational doesn't do well at, specifically graphs, column time series and a few things like that. (Though there are some fine relational implementations of time series, notably Timescale.) Unless you're doing that, relational is almost always going to be the better choice. 99% of the time I've seen Mongo and the like deployed, it's because the coder just doesn't grasp SQL (it doesn't help when one of the more popular SQL textbooks, SQL the hard way, is written by a guy who admits he doesn't understand JOIN statements; it's a trainwreck) or some marketing droid has bamboozled them into thinking it's "Webscale", whatever the fuck that means.
Relational DBs scale like a beast in the right hands, and can be incredibly deterministic in their behavior, making smart planning for scale and breadth possible. Don't fall for the marketing hype.
Re: (Score:3)
>> relational is almost always going to be the better choice
If you have petabytes of data that may be the case, but that isn't "almost always".
I've worked with both, and usually prefer NoSQL. Most SQL implementations require that a query be programmatically constructed as a string, which is a clunky PIA. The more modern NoSQLs allow you to just use native objects to form queries, which is simpler and easier to maintain. The schema of the database is relatively flexible, which facilitates changes over time.
Re: Meh. (Score:1)
"Most SQL implementations require that a query be programmatically constructed as a string, which is a clunky PIA."
That hasn't been true for two decades.
Re: (Score:3)
Most object-oriented languages have frameworks that support object-based queries: Hibernate and Eclipselink for Java, ODB for C++, Diesel for Rust, etc. Of course, for very complex tasks you may need to manually write SQL, but I can't imagine any alternative if your domain is naturally very complex.
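As a rough illustration of the point in this sub-thread, here is a minimal sketch using Python's standard sqlite3 module. It shows a parameterized query, where the SQL text stays fixed and values are bound by the driver instead of being spliced into a string. The `users` table and the `find_users_by_city` helper are hypothetical, made up only for this example; they are not anything the commenters described.

```python
import sqlite3


def find_users_by_city(conn, city):
    """Return user names for a city using a bound parameter.

    The "?" placeholder is filled in by the driver, so the query text
    is never assembled from user input.
    """
    cur = conn.execute(
        "SELECT name FROM users WHERE city = ? ORDER BY name",
        (city,),
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical in-memory table, used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ann", "Oslo"), ("bob", "Rome"), ("cy", "Oslo")],
)

print(find_users_by_city(conn, "Oslo"))
```

Because the value is bound rather than concatenated, an input such as `x' OR '1'='1` is treated as a literal city name and matches nothing. The ORMs named above build on the same mechanism.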
Re: (Score:2)
The data integration elephant (Score:5, Insightful)
NoSQL adapts well for semi-structured data. It has significantly higher costs in these areas when considering a long-lived system in a large corporation:
- When it has to receive data from other systems and export data to other systems. Common for large businesses. The data import and export are considerably more complicated, error prone and costly when compared to tabular data.
- Reporting, another data export operation, is also a more costly common problem for NoSQL document data.
- Data quality over a long term. Document based systems need significantly more costly efforts to upgrade old documents to the latest document format for data quality purposes.
- Long term, it places the burden of interpreting multiple document formats, which differ by age, on each system that accesses the document-based data, unlike relational.
Implementation teams will miss these by:
- Only considering the NoSQL system itself
- Mistakenly assuming that all data access, data put into, data retrieved from the NoSQL system will go through the team's REST API.
- A large mistake given that the REST API does not scale for millions of data rows without large costs, adds significant point to point legacy costs forcing each and every system to have a custom one-off REST call module implemented (not possible for many third party systems).
- Forcing each and every system to handle multiple generations of the same documents stored in the NoSQL system
Generically, it is placing more of the cost of business logic in each system that interacts with the NoSQL system instead of having that logic inside the NoSQL system. A REST API on top of the NoSQL system fails for the same reason.
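To make the "multiple generations of the same document" cost concrete, here is a small Python sketch of the kind of upgrade shim each consuming system would end up carrying. The three layouts (v1 with a single `name`, v2 with split names and one `email`, v3 with an `emails` list) and the `schema_version` field are invented for illustration; the comment above does not describe any specific format.

```python
def upgrade_customer(doc):
    """Return a copy of doc converted to the newest (hypothetical) v3 layout.

    Documents without a schema_version field are assumed to be v1.
    """
    doc = dict(doc)  # do not mutate the stored document
    version = doc.get("schema_version", 1)

    if version == 1:
        # v1 kept the full name in a single "name" string
        first, _, last = doc.pop("name").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        version = 2

    if version == 2:
        # v2 kept at most one address in "email"; v3 keeps a list
        email = doc.pop("email", None)
        doc["emails"] = [email] if email else []
        version = 3

    doc["schema_version"] = version
    return doc


old = {"name": "Ada Lovelace", "email": "ada@example.com"}
print(upgrade_customer(old))
```

Every reader of the store needs logic like this (or the stored documents have to be migrated in bulk), which is the per-system cost the comment is pointing at. With a relational schema the equivalent change is typically a one-time migration inside the database.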
Re: (Score:2)
Importing and exporting data from other systems is generally done via some intermediate format like CSV, JSON, or XML. Works just fine with NoSQL.
>> A REST API on top of the NoSQL system fails for the same reason
Sure hasn't failed for any of the apps I've built. You made a lot of claims, where's the evidence?
Re: (Score:2)
Agree that NoSQL may make sense for simplistic, greenfield systems which do not interconnect with other large systems.
Disagree for replacing relational databases for complex, deeply integrated systems, with high throughput data interconnects, legacy data from other systems, and geographically dispersed in different business units.
Dev teams typically focus on their system only and treat systems outside theirs as afterthoughts, which is a large miss for the business.
Re: (Score:2)
>> Disagree for replacing relational databases for complex, deeply integrated systems
Nobody said anything about replacing those. If a company is deeply wired for that, they are stuck with it.
The 'father of SQL' tells us;
"The NoSQL movement is motivated by new kinds of applications, particularly web applications, that need massive scalability and high performance"
I submit that this covers the vast majority of applications. And as I said previously, communicating between systems is generally done via so
Re: (Score:2)
Re: (Score:1)
However, it [SQL]is also 10x as fast for most workloads as a NoSQL solution. ... the index would be bigger than a hard drive.
That is nonsense. As for stuff where SQL is faster: you would use SQL. Facepalm.
NoSQL is basically always only a: lookUpKey("my key").retrieveDocument() thing.
And the keys are so many many many keys - and are random and not just numbers from zero to a trillion: so you can not simply index them
People store stuff in NoSQL DBs, because it perfectly fits the use case. Otherwise they would
Re: (Score:2)
Re: (Score:2)
It depends on the use case.
Obviously if you abuse it for stupid thing where SQL like PostgreSQL would be faster, then SQL is faster.
If you use it for what it is designed for: no way!
Re: (Score:2)
"The NoSQL movement is motivated by new kinds of applications, particularly web applications, that need massive scalability and high performance"
I submit that this covers the vast majority of applications.
I'd argue that the "vast majority of applications" neither need to be massively scalable nor of particularly high performance. The "vast majority of applications' are handling hundreds, not hundreds of millions of records and have a very small number of users. This is why you so often find spreadsheets being used as a "database" because they're very small specific problems being solved by individuals or very small teams.
Re: (Score:2)
>> neither need to be massively scalable nor of particularly high performance
I agree, but massively scalable and high performance covers the range of applications from the small to the large.
Re: (Score:2)
Sure hasn't failed for any of the apps I've built.
They will fail. Those kinds of systems are incredibly fragile.
Importing and exporting data from other systems is generally done via some intermediate format
You've completely misunderstood the problem. It's not about how the data is packaged, it's about how the data is structured, how that changes over time, and how that affects data access.
This is the problem with kids chasing fads. By the time all the obvious and predictable problems manifest, you've already moved on to the next shiny bauble, leaving someone else to clean up the mess.
Re: (Score:2)
>> Those kinds of systems are incredibly fragile
Not in my experience but you are welcome to provide evidence.
>> It's not about how the data is packaged
The claim was that "data import and export are considerably more complicated, error prone and costly when compared to tabular data". It all gets packaged for export in much the same way regardless of the database implementation and besides, it is easy to store tabular data in NoSQL.
Re: (Score:2)
Not in my experience
Wait until the mess you've made is more than 20 minutes old.
The claim was that
Like I said, you've completely misunderstood the problem. I explained it to you, but you still don't get it. Your problem seems to be an astonishing lack of experience.
Re: (Score:2)
>> Wait until the mess you've made is more than 20 minutes old.
So you've got nothing.
>> I explained it to you
You expressed your opinion about something unrelated to the topic, and I have decades of experience.
Re: (Score:2)
I have experience, which is a lot more than you have. If you have any evidence for your nonsense, present it.
and I have decades of experience
LOL! Obviously not! You're either a beginner or really, really, bad at your job.
Re: (Score:2)
>> If you have any evidence for your nonsense, present it.
You are the one who made the claims, and now all you've got is bluster.
Re: (Score:2)
It was your claim, dipshit. Did you not read your own post? What a fucking joke you are!
Re: (Score:2)
>> It was your claim, dipshit
All you've got is a nasty yap.
>> Those kinds of systems are incredibly fragile
>> data import and export are considerably more complicated, error prone and costly when compared to tabular data
Who said that crap? Zero evidence.
Re: (Score:2)
You have zero evidence to support your bullshit claims. Fuck off.
Re: (Score:2)
You just got it all reversed.
- Data quality over a long term. Document based systems need significantly more costly efforts to upgrade old documents to the latest document format for data quality purposes.
You do not "upgrade" old documents. You retrieve the old document.
Depending what you are doing: changing the original document is fraud or forgery or both.
If you are dealing with documents: you do not put them into an SQL database. You put the meta info about them, like author, creation date or similar thi
Re: (Score:2)
Re: (Score:1)
Also, on the same amount of hardware a RDBMS has 10x the performance of Spark which is the best of the NoSQL world.
Only if you use Spark for something one wiser would have chosen an RDBMS in the first place.
For everything else: Spark is endlessly faster. Sorry, why talk about stuff you have no clue about?
An RDBMS is for storing information in a structured way and retrieving information that you originally did not even know you have. By a structured query.
NoSQL is about STORING and RETRIEVING large amount
Re: (Score:2)
Only if you use Spark for something one wiser would have chosen an RDBMS in the first place.
Measure it. You will see for yourself. I have. This is why I don't talk nonsense like you do. You just make up stuff to suit your own feelings. I measure things to know how they work. We are not the same. Also, everything else you said was wrong. Probably for the same reasons.
Re: (Score:2)
I'm inclined to agree - and boy, I wish I'd been there when a couple of the open source projects that insist on using mongodb or whatever got started...
Nowadays, for "general" storage, I'd say "Start with Postgres". You can do all the SQL you want, and you can do a lot of NoSQL type things too (ie. using JSONField, and you mention Timescale, which I haven't used before). The really great thing is you can have as many "index" fields as you like, so you're not limited to the one or two the NoSQLs tend to have
Re: (Score:2)
relational is almost always going to be the better choice. 99% of the time I've seen Mongo and the like deployed, its because the coder just doesn't grasp SQL
No. They deploy it because they want to store DOCUMENTS. And have the DB INDEX them so they can do SQL like searches on data that HAS NO STRUCTURE to be placed into an RDBMS.
Sorry: you are stupid. Or an idiot.
I have 25 thousand PDFs about random stuff: how the fuck would that be stored in an RDBMS? Oh it is not. It is files in a file system, and a sear
Re:Meh. (Score:5, Interesting)
Once you find yourself implementing joins in a NoSQL database
Ugh. We had a consumer-facing application that had this exact thing happen. It was something that was developed without the backend team being brought in. Eventually they had to set up nightly tasks to clean duplicate data into a more normalized form. Any time those tasks failed, all hell broke loose. Eventually, they had their jobs write success or nothing to the backend DB2 and we had an IBM i nightly job that would scan the table for success or lack of a message for success. Any missing success messages had the IBM machine hit a web service that would ring the person on-call that night to check what happened.
There was never a day the on-call didn't have to troubleshoot at least one of the scripts that cleaned their data in the middle of the night. It was a cluster.
For sure, the NoSQL got them up and running and got them through their first two versions of the product. So major win there. But as soon as the complexity increased, the ability for their NoSQL solution to keep working just fell out and their resistance to change made it harder and harder till they dug themselves a hole they could barely get out of. At some point early on it should have moved to a proper object on the IBM, but they just dug in their heels and kept going with a solution that was creating more and more debt for them.
There are good solutions and NoSQL can be among them, but there's a point where you have to realize that a different technology is required. It isn't to say one is better than the other, any more than to say that a hammer is better than a screwdriver. But people can get so stuck in picking one or the other that they can get themselves into trouble.
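For readers who have not hit the "implementing joins in a NoSQL database" situation, here is a small Python sketch contrasting the two approaches. The `customers` and `orders` records and both function names are hypothetical; plain lists of dicts stand in for a document store, and the standard sqlite3 module stands in for a relational backend. It is a sketch of the general idea, not a reconstruction of the system described above.

```python
import sqlite3

# Hypothetical documents, as they might come back from a document store.
customers = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [
    {"customer_id": 1, "item": "shoes"},
    {"customer_id": 1, "item": "socks"},
    {"customer_id": 2, "item": "hat"},
]


def join_in_app(customers, orders):
    """Hand-written join: the application builds and probes its own index."""
    name_by_id = {c["id"]: c["name"] for c in customers}
    return sorted((name_by_id[o["customer_id"]], o["item"]) for o in orders)


def join_in_sql(customers, orders):
    """The same result, with the join expressed declaratively in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (customer_id INTEGER REFERENCES customers(id), item TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (:id, :name)", customers)
    conn.executemany("INSERT INTO orders VALUES (:customer_id, :item)", orders)
    rows = conn.execute(
        "SELECT c.name, o.item FROM orders o "
        "JOIN customers c ON c.id = o.customer_id "
        "ORDER BY c.name, o.item"
    ).fetchall()
    conn.close()
    return rows


print(join_in_app(customers, orders))
print(join_in_sql(customers, orders))
```

Both functions return the same rows. The difference is who owns the lookup logic, the missing-key handling, and the consistency between the two collections: the application in the first case, the database in the second.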
Re: Meh. (Score:2)
Your story about digging a hole with a nosql is important. It deserves more visibility. I worked with a team that used Sole as a search engine, but because anything can be persisted in Solar doesn't mean that it should be. It became a dumping ground for persistent data, and it became the mess you might imagine.
Re: Meh. (Score:2)
"Solar". Darned Auto correct.
Re: (Score:2)
No one is doing joins in a NoSQL DB.
Are you stupid?
No SQL means: not only SQL. You store stuff you want to join about in a relational DB and the rest in a NoSQL DB.
Do you really think that someone needs a "join" to make a /. post?
I do not think so. For reading posts? Neither.
Modern RDBMS are nothing more than an inappropriate storage layer for OO to relational mapping storage.
Joins happen when you do SQL by hand. In a terminal. I hardly ever have seen a join in a real day business application.
Re: (Score:2)
Re: (Score:2)
The problem is but the vast majority of applications will do just fine with an RDBMS.
People store stuff in RDBMS that would be better stored in a couple of files.
It does not help anyone that people who have no clue about databases force everything into an ill designed table model ...
Do you know how persistent storage in web browsers work? It is a key - value database. Guess what they usually use? SQLite - with a single table. Holding a key and a value. /Facepalm
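The layout this comment describes, a single table holding a key and a value, can be sketched in a few lines of Python with the standard sqlite3 module. The `KeyValueStore` class and the `kv` table name are made up for this example, and it is not a claim about how any particular browser implements its storage.

```python
import sqlite3


class KeyValueStore:
    """A tiny key-value store backed by one SQLite table (illustrative only)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
        )

    def set(self, key, value):
        # "with conn" commits on success and rolls back on error
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else default


store = KeyValueStore()
store.set("theme", "dark")
store.set("theme", "light")  # overwrites the earlier value
print(store.get("theme"), store.get("missing", "n/a"))
```

The primary key gives an index on `key`, so lookups stay fast, and the relational engine is used purely as a durable storage layer with no joins involved.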
SQL is not the RDBMS (Score:1)
Relational data models were proposed by Edgar F. Codd in 1970; SQL is only one of the languages that were proposed to work with relational databases. SQL is the idea that there has to be an easy-to-use computer language to communicate information between humans and the underlying RDB. Chamberlin seems to be peddling a different language that somehow translates data between an RDB and whatever passes for the modern language 'of the web' (JavaScript).
RDBMS is not going anywhere, many people just want to make
So basically (Score:5, Funny)
The guy is saying that both SQL and NoSQL have valid applications. Whoa, stop the presses.
Re: (Score:2)
The guy is saying that both SQL and NoSQL have valid applications. Whoa, stop the presses.
To be fair to the news cycle if you read the comments here on Slashdot it is like he is saying something novel and mindblowing.
Re: (Score:3)
He said more than that, though. Essentially, he said that "when you need performance and scalability, go NoSQL way". But he forgot to mention that when you need performance and scalability without sacrificing integrity and predictability, SQL and RDBMS are still the King.
Re: (Score:3)
JSONiq already exists (Score:2)
JSONiq [jsoniq.org] already exists.
Re: (Score:2)
Conflating (Score:5, Interesting)
There are bunches of issues raised that are not logical upon inspection. They seem to assume too much mutual exclusion of features. Until somebody proves that having Feature X prevents having Feature Y, we shouldn't assume they are mutually exclusive*. (I don't know if Chamberlin believes them mutually exclusive, the article is fuzzy on that, but many NoSQL fans do.)
For one, SQL is a high-level language and not a hardware architecture. Although there are proprietary hardware-centric extensions to some dialects, the language itself doesn't make any assumptions about the hardware*.
And as pointed out, traditional RDBMS are gradually getting the "web scale" features, such as optionally loosening up ACID for distributed databases.
Another thing is most apps typical devs work on are NOT web-scale/enterprise. We shouldn't bloat up stacks and tools meant for small and medium apps just to check off web-scale buzzwords on our resumes. That's selfish, bloating the company so you can get more money elsewhere. (AKA "Resume Oriented Programming") One-Tool-Size-Fits-All-Scales is a fool's errand.
Often startups realize they need more flexible query options once their business matures, and start to miss ever more traditional RDBMS features, having to hand-code data features they wouldn't otherwise.
One feature I would like to see is dynamic columns. PostgreSQL's JSON approach makes JSON columns a second-class citizen to "real" columns. I don't believe that's necessary. I'd prefer something like the draft Dynamic Relational. [reddit.com] For smaller projects or rapid prototyping, dynamism could simplify much. And you can incrementally lock down the schema as it matures.
As far as SQL the query language itself, I haven't seen a general purpose replacement that's clearly better. You don't unseat the king of the hill with incrementally better. And SQL can be extended to fill in most weak spots. Any candidate replacement will have to prove its mettle over time. (Personally I'm a fan of the SMEQL draft concept, it's more API-library-like than SQL's keyword-heavy COBOL-like approach. For example, you don't need a direct DDL, you just update the schema data dictionaries using regular query CRUD operations. The dictionaries don't have to be actual tables, by the way, it's just an interface using DRY in concepts.)
* Certain distributed features require making ACID-related trade-offs, but those tradeoffs can be configuration switches, and not different DB brands/languages for each ACID feature combo. That would be poor design DRY (outside of niches that need performance at any cost).
Re: (Score:3)
Re: (Score:1)
If your company's primary product is cat videos, losing 1 per 5k is no big deal, you are not charging for them anyhow. But do that at a bank, and the FDIC will yank your license.
Re: (Score:2)
Different Horses For Different Courses (Score:2)
Why not both? (Score:2)
PostgreSQL has 1st class support for JSON data. If you want to store something which isn't structured up front, add a JSONB field to a table. The B stands for binary, so it is parsed into binary and stored efficiently and indexed from fields or values as needs be.
So you get to do powerful SQL queries and joins, or throw in some constraints, triggers or whatever but you can also stick ad hoc data without thinking about it too much.
Still my favorite (Score:2, Funny)
http://howfuckedismydatabase.c... [howfuckedi...tabase.com]
Self-important person makes self-important choice. (Score:2)
Re: (Score:3)
Complicated, confusing, and weak.
The fact that you made this claim immediately had me mentally tag everything you claimed as 'Ignorant. To be ignored.'
T-SQL is none of these things.