Data Storage Databases

Father of SQL Says Yes to NoSQL (theregister.com) 75

An anonymous reader shared this report from the Register: The co-author of SQL, the standardized query language for relational databases, has come out in support of the NoSQL database movement that seeks to escape the tabular confines of the RDBMS. Speaking to The Register as SQL marks its 50th birthday, Donald Chamberlin, who first proposed the language with IBM colleague Raymond Boyce in a 1974 paper [PDF], explains that NoSQL databases and their query languages could help perform the tasks relational systems were never designed for. "The world doesn't stay the same thing, especially in computer science," he says. "It's a very fast, evolving, industry. New requirements are coming along and technology has to change to meet them, I think that's what's happening. The NoSQL movement is motivated by new kinds of applications, particularly web applications, that need massive scalability and high performance. Relational databases were developed in an earlier generation when scalability and performance weren't quite as important. To get the scalability and performance that you need for modern apps, many systems are relaxing some of the constraints of the relational data model."

[...] A long-time IBMer, Chamberlin is now semi-retired, but finds time to fulfill a role as a technical advisor for NoSQL company Couchbase. In the role, he has become an advocate for a new query language designed to overcome the "impedance mismatch" between data structures in the application language and a database, he says. UC San Diego professor Yannis Papakonstantinou has proposed SQL++ to solve this problem, with a view to addressing impedance mismatch between heavily object-based JavaScript, the core language for web development and the assumed relational approach embedded in SQL. Like C++, SQL++ is designed as a compatible extension of an earlier language, SQL, but is touted as better able to handle the JSON file format inherent in JavaScript. Couchbase and AWS have adopted the language, although the cloud giant calls it PartiQL.

At the end of the interview, Chamberlin adds that "I don't think SQL is going to go away. A large part of the world's business data is encoded in SQL, and data is very sticky. Once you've got your database, you're going to leave it there. Also, relational systems do a very good job of what they were designed to do...

"[I]f you're a startup company that wants to sell shoes on the web or something, you're going to need a database, and one of those SQL implementations will do the job for free. I think relational databases and the SQL language will be with us for a long time."

  • Squirrel?

  • NoSQL is the sequel to SQL?

    • Re:sequel (Score:4, Interesting)

      by dsgrntlxmply ( 610492 ) on Sunday May 12, 2024 @11:33AM (#64466987)
      SQL was the sequel to SEQUEL, which in turn was described by the linked 1974 paper. As a snapshot from history, in a course taught maybe in 1973 by Prof. Melkanoff at UCLA, I recall a statement (paraphrased): "we have been hearing about a recent development by Ted Codd at IBM Research called the Relational Model, which looks promising as a path forward in databases." We were then introduced to Select, Project, Join.
  • Meh. (Score:5, Informative)

    by jrnvk ( 4197967 ) on Sunday May 12, 2024 @11:24AM (#64466971)
    The limitations of a modern relational database, when designed properly with redundancies in mind, are mostly overblown. Make no mistake, NoSQL apps are great and all, but the vast majority of applications will do just fine with an RDBMS.
    • Re:Meh. (Score:4, Insightful)

      by Patrick May ( 305709 ) on Sunday May 12, 2024 @11:32AM (#64466983)
      Not "just fine" but better. Once you find yourself implementing joins in a NoSQL database, which you will, you'll wish you went with an RDBMS. NoSQL document stores are fine for a subset of problems, but nearly always require an RDBMS for the heavy lifting.
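To make the "implementing joins" point concrete, here is a minimal sketch of what a join looks like when it has to live in application code. The `customers` and `orders` documents and the `join_orders_to_customers` helper are hypothetical, made up for illustration; they are not taken from any particular NoSQL product.

```python
from collections import defaultdict

# Hypothetical documents, shaped the way a document store might return them.
customers = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Grace"},
]
orders = [
    {"_id": 10, "customer_id": 1, "total": 40},
    {"_id": 11, "customer_id": 1, "total": 15},
    {"_id": 12, "customer_id": 2, "total": 99},
]


def join_orders_to_customers(customers, orders):
    """Hand-written hash join: index one side by key, then probe it with the other."""
    by_customer = defaultdict(list)
    for order in orders:
        by_customer[order["customer_id"]].append(order)
    return [
        {
            "name": c["name"],
            "total_spent": sum(o["total"] for o in by_customer[c["_id"]]),
        }
        for c in customers
    ]


report = join_orders_to_customers(customers, orders)
print(report)
```

In an RDBMS the same report would typically be a single SELECT with a JOIN and a GROUP BY, with the query planner choosing the join strategy instead of the application.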
      • Re:Meh. (Score:5, Insightful)

        by sg_oneill ( 159032 ) on Sunday May 12, 2024 @11:42AM (#64467011)

        Yeah, my advice with NoSQL has always been that it's a last resort, not a first resort. There are very specific things relational doesn't do well at, specifically graphs, column time series and a few things like that. (Though there are some fine relational implementations of time series, notably Timescale.) Unless you're doing that, relational is almost always going to be the better choice. 99% of the time I've seen Mongo and the like deployed, it's because the coder just doesn't grasp SQL (it doesn't help when one of the more popular SQL textbooks, SQL the hard way, is written by a guy who admits he doesn't understand JOIN statements; it's a trainwreck) or some marketing droid has bamboozled them into thinking it's "Webscale", whatever the fuck that means.

        Relational DBs scale like a beast in the right hands, and can be incredibly deterministic in their behavior, making smart planning for scale and breadth possible. Don't fall for the marketing hype.

        • >> relational is almost always going to be the better choice

          If you have petabytes of data that may be the case, but that isn't "almost always".

          I've worked with both, and usually prefer NoSQL. Most SQL implementations require that a query be programmatically constructed as a string, which is a clunky PIA. The more modern NoSQL's allow you to just use native objects to form queries, simpler and easier to maintain. The schema of the database is relatively flexible which facilitates changes over time.

          • "Most SQL implementations require that a query be programmatically constructed as a string, which is a clunky PIA."

            That hasn't been true for two decades.
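One possible reading of this reply is that values no longer need to be spliced into the SQL text by hand, because drivers accept bound parameters. The sketch below shows that with Python's standard `sqlite3` module; the `shoes` table and the `find_shoes` helper are hypothetical names chosen for the example, and SQLite is only a convenient stand-in for whatever RDBMS is in use.

```python
import sqlite3

# In-memory database with a hypothetical "shoes" table (illustrative names only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shoes (id INTEGER PRIMARY KEY, brand TEXT, size REAL)")
conn.executemany(
    "INSERT INTO shoes (brand, size) VALUES (?, ?)",
    [("Acme", 9.5), ("Acme", 11.0), ("Bolt", 9.5)],
)


def find_shoes(conn, brand, min_size):
    """Return (brand, size) rows matching the filters.

    The SQL text is a constant; the caller's values travel separately as
    bound parameters, so nothing is concatenated into the query string.
    """
    cur = conn.execute(
        "SELECT brand, size FROM shoes WHERE brand = ? AND size >= ? ORDER BY size",
        (brand, min_size),
    )
    return cur.fetchall()


rows = find_shoes(conn, "Acme", 10)
print(rows)
```

The query is still written as a string here, but it is never assembled from user input; object-based query builders and ORMs, mentioned further down the thread, go one step further.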

          • The more modern NoSQL's allow you to just use native objects to form queries

            Most object-oriented languages have frameworks that support object-based queries: Hibernate and Eclipselink for Java, ODB for C++, Diesel for Rust, etc. Of course, for very complex tasks you may need to manually write SQL, but I can't imagine any alternative if your domain is naturally very complex.

            • Try putting it into a multivalue database. Any "naturally very complex" datastructure will fit into a multivalue database far easier than shoe-horning it into either SQL or NoSQL. Older than either, better than both. Here come the pitchforks, but I stand by my claim.
          • by will4 ( 7250692 ) on Sunday May 12, 2024 @12:21PM (#64467079)

            NoSQL adapts well to semi-structured data. It has significantly higher costs in the following areas when you consider a long-lived system in a large corporation:

            - When it has to receive data from other systems and export data to other systems. Common for large businesses. The data import and export are considerably more complicated, error prone and costly when compared to tabular data.
            - Reporting, another data export operation, is also a more costly common problem for NoSQL document data.
            - Data quality over a long term. Document based systems need significantly more costly efforts to upgrade old documents to the latest document format for data quality purposes.
            - Long-term burden: each system accessing the document-based data has to interpret multiple document formats depending on the document's age, which is not the case with relational data.

            Implementation teams will miss these by
            - Only considering the NoSQL system itself
            - Mistakenly assuming that all data access, data put into, data retrieved from the NoSQL system will go through the team's REST API.
                  - A large mistake given that the REST API does not scale for millions of data rows without large costs, adds significant point to point legacy costs forcing each and every system to have a custom one-off REST call module implemented (not possible for many third party systems).
            - Forcing each and every system to handle multiple generations of the same documents stored in the NoSQL system

            Generically, it is placing more of the cost of business logic in each system that interacts with the NoSQL system instead of having that logic inside the NoSQL system. A REST API on top of the NoSQL system fails for the same reason.
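The "multiple generations of the same documents" cost can be sketched as an upgrade-on-read function that every consumer of the data would need. The schema versions, field names and the `upgrade_customer` helper below are all hypothetical, invented only to illustrate the shape of the problem described above.

```python
def upgrade_customer(doc):
    """Bring a customer document up to a hypothetical latest schema (version 3)."""
    doc = dict(doc)  # work on a copy; the stored document is left untouched
    version = doc.get("schema_version", 1)
    if version == 1:
        # v1 stored a single "name" string; v2 split it into two fields.
        first, _, last = doc.pop("name").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        version = 2
    if version == 2:
        # v2 stored one optional phone number; v3 stores a list of them.
        doc["phones"] = [doc.pop("phone")] if "phone" in doc else []
        version = 3
    doc["schema_version"] = version
    return doc


old = {"name": "Ada Lovelace", "phone": "555-0100"}
new = upgrade_customer(old)
print(new)
```

Under these assumptions, each system that reads the store directly needs its own copy of this logic, whereas a relational schema migration would normally be applied once, inside the database.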

            • Importing and exporting data from other systems is generally done via some intermediate format like CSV, JSON, or XML. Works just fine with NoSQL.

              >> A REST API on top of the NoSQL system fails for the same reason

              Sure hasn't failed for any of the apps I've built. You made a lot of claims, where's the evidence?

              • by will4 ( 7250692 )

                Agree that NoSQL may make sense for simplistic, greenfield systems which do not interconnect with other large systems.

                Disagree for replacing relational databases for complex, deeply integrated systems, with high throughput data interconnects, legacy data from other systems, and geographically dispersed in different business units.

                Dev teams typically focus only on their own system, and systems outside theirs are afterthoughts; that is a large miss for the business.

                • >> Disagree for replacing relational databases for complex, deeply integrated systems

                  Nobody said anything about replacing those. If a company is deeply wired for that, they are stuck with it.

                  The 'father of SQL' tells us;
                  "The NoSQL movement is motivated by new kinds of applications, particularly web applications, that need massive scalability and high performance"

                  I submit that this covers the vast majority of applications. And as I said previously, communicating between systems is generally done via so

                  • by sfcat ( 872532 )
                    Him being the "father of SQL" means exactly nothing for him understanding why NoSQL came about. He was in a university doing something else. NoSQL came mostly from the private sector and has little to do with the types of systems the person being quoted would research. NoSQL exists because DB vendors did a terrible job with distributed DBs in the mid 2000s. It also exists because not every problem fits into relational algebra (although almost everything most devs will write does). And finally it exists
                    • However, it [SQL] is also 10x as fast for most workloads as a NoSQL solution.
                      That is nonsense. As for stuff where SQL is faster: you would use SQL. Facepalm.
                      NoSQL is basically always only a: lookUpKey("my key").retrieveDocument() thing.
                      And the keys are so many many many keys - and are random and not just numbers from zero to a trillion: so you can not simply index them ... the index would be bigger than a hard drive.

                      People store stuff in NoSQL DBs, because it perfectly fits the use case. Otherwise they would

                    • by sfcat ( 872532 )
                      Measure it. You will see for yourself. I have. I have built real systems that do this level of performance. You are just guessing because that's all you can do you fake.
                    • It depends on the use case.
                      Obviously, if you abuse it for stupid things where SQL like PostgreSQL would be faster, then SQL is faster.
                      If you use it for what it is designed for: no way!

                  • by Zak3056 ( 69287 )

                    "The NoSQL movement is motivated by new kinds of applications, particularly web applications, that need massive scalability and high performance"

                    I submit that this covers the vast majority of applications.

                    I'd argue that the "vast majority of applications" neither need to be massively scalable nor of particularly high performance. The "vast majority of applications" are handling hundreds, not hundreds of millions, of records and have a very small number of users. This is why you so often find spreadsheets being used as a "database": they're very small, specific problems being solved by individuals or very small teams.

                    • >> neither need to be massively scalable nor of particularly high performance

                      I agree, but massively scalable and high performance covers the range of applications from the small to the large.

              • by narcc ( 412956 )

                Sure hasn't failed for any of the apps I've built.

                They will fail. Those kinds of systems are incredibly fragile.

                Importing and exporting data from other systems is generally done via some intermediate format

                You've completely misunderstood the problem. It's not about how the data is packaged, it's about how the data is structured, how that changes over time, and how that affects data access.

                This is the problem with kids chasing fads. By the time all the obvious and predictable problems manifest, you've already moved on to the next shiny bauble, leaving someone else to clean up the mess.

                • >> Those kinds of systems are incredibly fragile

                  Not in my experience but you are welcome to provide evidence.

                  >> It's not about how the data is packaged

                  The claim was that "data import and export are considerably more complicated, error prone and costly when compared to tabular data". It all gets packaged for export in much the same way regardless of the database implementation and besides, it is easy to store tabular data in NoSQL.

                  • by narcc ( 412956 )

                    Not in my experience

                    Wait until the mess you've made is more than 20 minutes old.

                    The claim was that

                    Like I said, you've completely misunderstood the problem. I explained it to you, but you still don't get it. Your problem seems to be an astonishing lack of experience.

                    • >> Wait until the mess you've made is more than 20 minutes old.

                      So you've got nothing.

                      >> I explained it to you

                      You expressed your opinion about something unrelated to the topic, and I have decades of experience.

                    • by narcc ( 412956 )

                      I have experience, which is a lot more than you have. If you have any evidence for your nonsense, present it.

                      and I have decades of experience

                      LOL! Obviously not! You're either a beginner or really, really, bad at your job.

                    • >> If you have any evidence for your nonsense, present it.

                      You are the one who made the claims, and now all you've got is bluster.

                    • by narcc ( 412956 )

                      It was your claim, dipshit. Did you not read your own post? What a fucking joke you are!

                    • >> It was your claim, dipshit

                      All you've got is a nasty yap.

                      >> Those kinds of systems are incredibly fragile
                      >> data import and export are considerably more complicated, error prone and costly when compared to tabular data

                      Who said that crap? Zero evidence.

                    • by narcc ( 412956 )

                      You have zero evidence to support your bullshit claims. Fuck off.

            • You just got it all reversed.
              - Data quality over a long term. Document based systems need significantly more costly efforts to upgrade old documents to the latest document format for data quality purposes.
              You do not "upgrade" old documents. You retrieve the old document.

              Depending what you are doing: changing the original document is fraud or forgery or both.

              If you are dealing with documents: you do not put them into an SQL database. You put the meta info about them, like author, creation date or similar thi

          • by sfcat ( 872532 )
            It's not about the amount of data. It's about the types of computation. Also, on the same amount of hardware an RDBMS has 10x the performance of Spark, which is the best of the NoSQL world. NoSQL is really about distributed systems and was only created because the DB vendors did a bad job with their distributed products. Had that not happened, NoSQL never would have happened either.
            • Also, on the same amount of hardware a RDBMS has 10x the performance of Spark which is the best of the NoSQL world.
              Only if you use Spark for something one wiser would have chosen an RDBMS in the first place.
              For everything else: Spark is endlessly faster. Sorry, why talk about stuff you have no clue about?

              And RDBMS is for storing information in a structured way and retrieving information that you originally did not even know you have. By a structured query.

              NoSQL is about STORING and RETRIEVING large amount

              • by sfcat ( 872532 )

                Only if you use Spark for something one wiser would have chosen an RDBMS in the first place.

                Measure it. You will see for yourself. I have. This is why I don't talk nonsense like you do. You just make up stuff to suit your own feelings. I measure things to know how they work. We are not the same. Also, everything else you said was wrong. Probably for the same reasons.

        • I'm inclined to agree - and boy, I wish I'd been there when a couple of the open source projects that insist on using mongodb or whatever got started...

          Nowadays, for "general" storage, I'd say "Start with Postgres". You can do all the SQL you want, and you can do a lot of NoSQL type things too (ie. using JSONField, and you mention Timescale, which I haven't used before). The really great thing is you can have as many "index" fields as you like, so you're not limited to the one or two the NoSQLs tend to have

        • relational is almost always going to be the better choice. 99% of the time I've seen Mongo and the like deployed, its because the coder just doesn't grasp SQL
          No. They deploy it because they want to store DOCUMENTS. And have the DB INDEX them so they can do SQL like searches on data that HAS NO STRUCTURE to be placed into an RDBMS.

          Sorry: you are stupid. Or an idiot.

          I have 25 thousand PDFs about random stuff: how the fuck would that be stored in an RDBMS? Oh it is not. It is files in a file system, and a sear

      • Re:Meh. (Score:5, Interesting)

        by slack_justyb ( 862874 ) on Sunday May 12, 2024 @12:24PM (#64467081)

        Once you find yourself implementing joins in a NoSQL database

        Ugh. We had a consumer-facing application that had this exact thing happen. It was something that was developed without the backend team being brought in. Eventually they had to set up nightly tasks to clean duplicate data into a more normalized form. Any time those tasks failed, all hell broke loose. Eventually, they had their jobs write success or nothing to the backend DB2 and we had an IBM i nightly job that would scan the table for success or lack of a message for success. Any missing success messages had the IBM machine hit a web service that would ring the person on-call that night to check what happened.

        There was never a day when the on-call didn't have to troubleshoot at least one of the scripts that cleaned their data in the middle of the night. It was a cluster.

        For sure, the NoSQL got them up and running and got them through their first two versions of the product. So major win there. But as soon as the complexity increased, the ability of their NoSQL solution to keep working just fell out, and their resistance to change made it harder and harder until they had dug themselves a hole they could barely get out of. At some point early on it should have moved to a proper object on the IBM, but they just dug in their heels and kept going with a solution that was creating more and more debt for them.

        There are good solutions and NoSQL can be among them, but there's a point where you have to realize that a different technology is required. It isn't to say one is better than the other, any more than to say that a hammer is better than a screwdriver. But people can get so stuck in picking one or the other that they can get themselves into trouble.

        • Your story about digging a hole with a NoSQL is important. It deserves more visibility. I worked with a team that used Solr as a search engine, but just because anything can be persisted in Solr doesn't mean that it should be. It became a dumping ground for persistent data, and it became the mess you might imagine.

      • No one is doing joins in a NoSQL DB.

        Are you stupid?

        NoSQL means: not only SQL. You store stuff you want to do joins on in a relational DB and the rest in a NoSQL DB.

        Do you really think that someone needs a "join" to make a /. post?

        I do not think so. For reading posts? Neither.

        Modern RDBMS are nothing more than an inappropriate storage layer for OO to relational mapping storage.

        Joins happen when you do SQL by hand. In a terminal. I hardly ever have seen a join in a real day business application.

    • And I'm sure his saying that NoSQL is fine has nothing to do with being a consultant to the company.
    • The problem with "the vast majority of applications will do just fine with an RDBMS" is this:
      People store stuff in an RDBMS that would be better stored in a couple of files.

      It does not help anyone that people who have no clue about databases force everything into an ill designed table model ...

      Do you know how persistent storage in web browsers works? It is a key-value database. Guess what they usually use? SQLite, with a single table. Holding a key and a value. /Facepalm

  • Relational data models were proposed by Edgar F. Codd in 1970; SQL is only one of the languages that were proposed to work with relational databases. SQL is the idea that there has to be an easy-to-use computer language to communicate information between humans and the underlying RDB. Chamberlin seems to be peddling a different language, one that somehow translates data between an RDB and whatever passes for the modern language 'of the web' (JavaScript).
    RDBMS is not going anywhere, many people just want to make

  • by 93 Escort Wagon ( 326346 ) on Sunday May 12, 2024 @11:53AM (#64467035)

    The guy is saying that both SQL and NoSQL have valid applications. Whoa, stop the presses.

    • The guy is saying that both SQL and NoSQL have valid applications. Whoa, stop the presses.

      To be fair to the news cycle if you read the comments here on Slashdot it is like he is saying something novel and mindblowing.

    • by vbdasc ( 146051 )

      He said more than that, though. Essentially, he said that "when you need performance and scalability, go NoSQL way". But he forgot to mention that when you need performance and scalability without sacrificing integrity and predictability, SQL and RDBMS are still the King.

      • by sfcat ( 872532 )
        Then he doesn't know what he is talking about. The DB kernel is 10x as fast as the Spark runtime. NoSQL is about distributed features as well as things that aren't relational algebra. Those things carry big performance penalties. That's why the NoSQL clusters are usually so big and expensive to run and are often compared to a DB on a single machine to make them look like they are fast.
  • FTFS:

    Like C++, SQL++ is designed as a compatible extension of an earlier language, SQL, but is touted as better able to handle the JSON file format inherent in JavaScript.

    JSONiq [jsoniq.org] already exists.

  • Conflating (Score:5, Interesting)

    by Tablizer ( 95088 ) on Sunday May 12, 2024 @12:17PM (#64467073) Journal

    There are bunches of issues raised that are not logical upon inspection. They seem to assume too much mutual exclusion of features. Until somebody proves that having Feature X prevents having Feature Y, we shouldn't assume they are mutually exclusive*. (I don't know if Chamberlin believes them mutually exclusive, the article is fuzzy on that, but many NoSQL fans do.)

    For one, SQL is a high-level language and not a hardware architecture. Although there are proprietary hardware-centric extensions to some dialects, the language itself doesn't make any assumptions about the hardware*.

    And as pointed out, traditional RDBMS are gradually getting the "web scale" features, such as optionally loosening up ACID for distributed databases.

    Another thing is most apps typical devs work on are NOT web-scale/enterprise. We shouldn't bloat up stacks and tools meant for small and medium apps just to check off web-scale buzzwords on our resumes. That's selfish, bloating the company so you can get more money elsewhere. (AKA "Resume Oriented Programming") One-Tool-Size-Fits-All-Scales is a fool's errand.

    Often startups realize they need more flexible query options once their business matures, and start to miss ever more traditional RDBMS features, having to hand-code data features they wouldn't otherwise.

    One feature I would like to see is dynamic columns. PostgreSQL's JSON approach makes JSON columns a second-class citizen to "real" columns. I don't believe that's necessary. I'd prefer something like the draft Dynamic Relational. [reddit.com] For smaller projects or rapid prototyping, dynamism could simplify much. And you can incrementally lock down the schema as it matures.

    As far as SQL the query language itself, I haven't seen a general purpose replacement that's clearly better. You don't unseat the king of the hill with incrementally better. And SQL can be extended to fill in most weak spots. Any candidate replacement will have to prove its mettle over time. (Personally I'm a fan of the SMEQL draft concept, it's more API-library-like than SQL's keyword-heavy COBOL-like approach. For example, you don't need a direct DDL, you just update the schema data dictionaries using regular query CRUD operations. The dictionaries don't have to be actual tables, by the way, it's just an interface using DRY in concepts.)

    * Certain distributed features require making ACID-related trade-offs, but those tradeoffs can be configuration switches, and not different DB brands/languages for each ACID feature combo. That would be poor design DRY (outside of niches that need performance at any cost).

    • Worth mentioning that most programmers don't know how to deal with data that is "eventually consistent" and then end up with a never-ending supply of "mysterious" bugs once their userbase does grow.
      • by Tablizer ( 95088 )

        If your company's primary product is cat videos, losing 1 per 5k is no big deal, you are not charging for them anyhow. But do that at a bank, and the FDIC will yank your license.

        • And yet bugs will still be annoying. When someone changes their password and the password doesn't change, or someone uploads a video and the video doesn't "upload" (or at least, the key to the upload is lost). "Eventually consistent" should only be used in specific cases where a speedup can be demonstrated (ie, measured), it shouldn't be used as the default case.
  • SQL and NoSQL both have separate domains in which they're the best tool for the job. NoSQL actually comprises several different classes of databases: Key-Value, Column, Graph, and Document and each of them have domains in which they're appropriate. For me, the biggest red flag is when I hear someone proclaiming they're switching from SQL to NoSQL. Since NoSQL databases are rarely ACID-compliant, a NoSQL database will likely be less consistent, especially in highly-parallel environments. Instead of apply
  • PostgreSQL has 1st class support for JSON data. If you want to store something which isn't structured up front, add a JSONB field to a table. The B stands for binary, so it is parsed into binary and stored efficiently and indexed from fields or values as needs be.

    So you get to do powerful SQL queries and joins, or throw in some constraints, triggers or whatever but you can also stick ad hoc data without thinking about it too much.
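A rough sketch of the idea of mixing fixed columns with ad hoc JSON in one relational table. Loud assumption: it uses Python's standard `sqlite3` module and SQLite's `json_extract` function as a stand-in, not PostgreSQL's JSONB type or its operators, and it needs a SQLite build that includes the JSON functions. The `products` table and its fields are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Fixed, constrained columns sit next to a free-form JSON column ("attrs").
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, attrs TEXT)"
)
conn.executemany(
    "INSERT INTO products (name, attrs) VALUES (?, ?)",
    [
        ("sneaker", json.dumps({"color": "red", "sizes": [9, 10]})),
        ("boot", json.dumps({"color": "black", "waterproof": True})),
    ],
)

# Ordinary SQL filtering on a field that was never declared in the schema.
red = conn.execute(
    "SELECT name FROM products WHERE json_extract(attrs, '$.color') = ?",
    ("red",),
).fetchall()
print(red)
```

In PostgreSQL the column would be declared as `jsonb` and could be indexed; the SQLite version here stores the JSON as text and only illustrates the query shape.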