Data Storage

Database Clusters for the Masses 279

grugruto writes "Clusters of databases are no longer the privilege of a few high-end commercial databases; open-source solutions are striking back! ObjectWeb, an Apache-like group, has announced the availability of Clustered JDBC (C-JDBC). C-JDBC is open-source software that implements a new concept called RAIDb (Redundant Array of Inexpensive Databases). It is simple: take a bunch of MySQL or PostgreSQL boxes, choose your RAIDb level (partitioning, replication, ...) and you obtain a scalable, fault-tolerant database cluster."
This discussion has been archived. No new comments can be posted.

  • WOOHOO! (Score:2, Funny)

    by semifamous ( 231316 )
    Wow! Can you imagine a beowulf.... oh, wait, nevermind
  • by the_quark ( 101253 ) * on Wednesday April 30, 2003 @11:19AM (#5843247) Homepage
    Just started looking at the site. I've wanted this for years. I was ecstatic with what load-balancing cheap Apache boxes did for the cost of web hosting. Unfortunately, reliability has still required hundreds of thousands of dollars of high-end equipment and software for databases. I've been hoping the open-source community would make headway on this front.

    So, the question is - is anyone working on anything like this for Perl, C, or generic implementations?
  • hmmm (Score:4, Interesting)

    by the_2nd_coming ( 444906 ) on Wednesday April 30, 2003 @11:20AM (#5843269) Homepage
    now if only MySQL or PostgreSQL can get the reputation that Oracle has, maybe we will start to see Oracle DBs go away in favor of the cheaper solutions using RAIDb
    • It's not just a matter of reputation - MySQL and Postgres, as impressive as they are, are still nowhere close to Oracle in terms of features. Yes, most of those features may be high-end, but they're still features people look for. One example: RMAN.
    • now if only MySQL or PostgreSQL can get the reputation that Oracle has

      You mean 'being run by a privacy-hating megalomaniac like Larry Ellison'?

      Open source RDBMS's are good solutions for many, perhaps even most, problems. But there are still some situations where I'd want to stick with Oracle's strength and maturity and not take chances.
      • Open source RDBMS's are good solutions for many, perhaps even most, problems. But there are still some situations where I'd want to stick with Oracle's strength and maturity and not take chances.

        PostgreSQL isn't mature? It's a direct descendant of Ingres, the original relational database. Ingres was written in 1977 at Berkeley. Bob Miner, Ed Oates, and Bruce Scott saw the commercial potential of RDBMS and founded a company later in 1977 called Software Development Laboratories. Larry Ellison joined up
    • Re:hmmm (Score:3, Insightful)

      by Sxooter ( 29722 )
      Interesting point. I find that there are several views when it comes to OS databases.

      One is that since most open source databases lack some feature, they will never replace any Oracle servers. Most of the people who believe this also believe that Oracle servers are always used in highly parallel, 24/7 transactional systems that can never go down. While plenty of sites that need that use Oracle, the inverse is not always true. Many places put Oracle online because it's what their develo
  • by marcink1234 ( 556931 ) on Wednesday April 30, 2003 @11:22AM (#5843283) Homepage

    Running many databases is easy. Organizing and serializing replication is hard. Even if one has distributed transactions handy - not present in this case. But let's read their code...

  • Performance? (Score:5, Interesting)

    by deranged unix nut ( 20524 ) on Wednesday April 30, 2003 @11:22AM (#5843289) Homepage
    Hmm, interesting idea. I didn't see performance listed as a feature.

    I wonder how much slower my query will be when the data is spread across several machines. I'd imagine that a few complex queries that aren't correctly optimized would bring this system to its knees rather quickly.
    • Re:Performance? (Score:5, Informative)

      by jsin ( 141879 ) on Wednesday April 30, 2003 @11:30AM (#5843373) Homepage Journal
      Database clustering is typically used for high availability, not performance.

      There are better ways to improve the performance of a database: horizontal partitioning, federated servers, etc.

      This would be very cool if there was a generic implementation; we build many Microsoft SQL clusters and just the hardware requirements for an MSCS cluster easily exceed $50k, let alone the licensing...as an MCDBA I'd consider an open source solution if I could use it as a back-end to an ASP/VB.NET application, just to save the licensing $$ for consulting! ; )

    • Re:Performance? (Score:2, Informative)

      by Anonymous Coward
      C-JDBC can handle more than just full partitioning or replication; it also provides partial replication (a little like you would use RAID-5 with disks).
      The idea is that with full replication you have to broadcast the write to all databases (to be consistent) and you can only balance the reads. By controlling the replication of each database table, you can have scalable performance. Look also at the nested RAIDb levels [objectweb.org], it's pretty cool to build large configurations.
      Some tests have been done with TPC-W [tpc.org]
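The read/write asymmetry described above (balance reads across replicas, broadcast writes to all of them) can be sketched in a few lines of plain Java. This is a toy model, not C-JDBC code: the `Backend` class just counts queries and stands in for a real JDBC connection.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of full replication (RAIDb-1 style): reads are balanced
// round-robin across replicas, writes must be broadcast to every replica.
class ReplicatedCluster {
    // Stand-in for a database backend: it only counts what it receives.
    static class Backend {
        final String name;
        int reads = 0, writes = 0;
        Backend(String name) { this.name = name; }
    }

    final List<Backend> backends = new ArrayList<>();
    private int next = 0; // round-robin cursor for reads

    void addBackend(Backend b) { backends.add(b); }

    // A read goes to exactly one replica, so read throughput scales with
    // the number of backends.
    Backend read(String query) {
        Backend b = backends.get(next);
        next = (next + 1) % backends.size();
        b.reads++;
        return b;
    }

    // A write must reach every replica to keep them consistent, so write
    // throughput does not scale with cluster size.
    void write(String statement) {
        for (Backend b : backends) b.writes++;
    }

    public static void main(String[] args) {
        ReplicatedCluster cluster = new ReplicatedCluster();
        for (int i = 1; i <= 3; i++) cluster.addBackend(new Backend("db" + i));
        for (int i = 0; i < 9; i++) cluster.read("SELECT ...");
        cluster.write("UPDATE ...");
        for (Backend b : cluster.backends)
            System.out.println(b.name + " reads=" + b.reads + " writes=" + b.writes);
    }
}
```

Nine reads spread evenly (three per replica) while the single write hits all three backends; partial replication, as the comment notes, trades some of that read fan-out for cheaper writes on tables that are not replicated everywhere.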
    • I wonder how much slower my query will be when the data is spread across several machines. I'd imagine that a few complex queries that aren't correctly optimized would bring this system to its knees rather quickly.

      Total read query throughput will scale with the number of machines in the cluster, given (from the website):

      "The database is distributed and replicated among several nodes and C-JDBC load balance the queries between these nodes."

      For writes, the data must go to every machine replicating the

  • by Jack William Bell ( 84469 ) on Wednesday April 30, 2003 @11:24AM (#5843302) Homepage Journal
    This is a major threat to the big vendors. In fact I would say it is even more of a threat to Oracle than it is to MS! After all, MS can continue to go after the midrange market that is already locked into them for the OS.

    But Oracle shops are dealing with expensive boxes they would love to replace, not to mention expensive Oracle licenses. Often the only reason they use Oracle (other than Oracle salesmen licking their buttholes) is because only Oracle has the horsepower to meet their requirements. Give them a cheaper alternative with the same capabilities and they will bail out faster than you can say 'Geronimo'.

    Expect Larry Ellison to start talking about the dangers of using Open Source software now...
    • People will always buy Oracle because "No one ever got fired for choosing Oracle". If something goes wrong, you always have someone to blame. With open source, your job is more on the line because you have to take responsibility.

      We were using MySQL and it was working fine but somewhere along the line some Oracle salesman convinced someone that Oracle was better and we switched. I have seen some minor good things, but not an assload of $ worth.
      • by FortKnox ( 169099 ) on Wednesday April 30, 2003 @11:32AM (#5843399) Homepage Journal
        I have to say this is a major point. This is why you don't see people using open source. If my DB goes down, I call up Oracle, and make them bring someone down here to fix the problem. If my open source DB goes down, I crap my pants and hope to keep my job.

        What does proprietary software have that Open Source doesn't? Insurance.

        The best way to knock over oracle is to start up a company that supports open source for a fee (which is cheaper than running oracle for a year).
        • MySQL AB and a few PostgreSQL companies.

          They do consulting work for their products.
        • Given the level of help I've typically gotten from Microsoft, I'd prefer to use real insurance: backups. Lots of backups. If your only recourse in the event of a database going down is to call the vendor, you'd best just start working on the resume now.

          And I can back up a MySQL database and offsite/onsite copy the tapes as necessary, just like SQL Server or Oracle. Generally I can start a server rebuild/restore in less than the time it takes to give some level one tech support asshat my phone number for
          • I agree with what you say, and it makes me think that I need to test my disaster recovery plan; but lets say that you have some XML transfer or DB trigger that causes that database to die. A restore won't fix that problem, you may not even know what the heck caused the problem.

            Most closed source databases are expensive. I like to think that most people can pick the best database that works best for their company. That could be MySQL, PostGreSQL, FileMaker, Access, DB2, Oracle whatever...

            In my opinion th
        • If my open source DB goes down, I crap my pants and hope to keep my job.
          Oh please! If you throw 1/2 as much money at one of a number of support organizations, you'd be at least as guaranteed of the same uptime and probably able to push for enhancements that you won't get otherwise.
        • The best way to knock over oracle is to start up a company that supports open source for a fee (which is cheaper than running oracle for a year).

          Which is exactly what MySQL AB does for MySQL. Their support is not particularly cheap (though I bet that it is a lot less than Oracle's), but I recommend it highly. The original designers are still leading the development/support team (is that true for many of the alternatives?) and make a living *only* because of their superior product, not because some salesma
        • by leviramsey ( 248057 ) on Wednesday April 30, 2003 @12:23PM (#5843943) Journal

          Josh, know what you're talking about before you post. MySQL [mysql.com] (the company which does the vast majority of development of MySQL) offers a variety of levels of support and consulting, regardless of the number of systems that you admin. For $48,000/year, you get:

          • Access to the entire development team 24x7x365, with a guaranteed response within 30 minutes
          • Ability to request developers by name
          • Just about every issue is supported (from APIs to configuration to OS, kernel, library, and filesystem dependencies to custom compiles, to recovery, to tuning and so on)

          Does Oracle match that for the price?

        • Seems to me it'd be more cost-efficient to hire an admin who was experienced with MySQL or Postgres. That way you have support on-site from an actual expert; no expensive contracts required, no waiting, no getting screwed when the 'support' is sub-standard or the company that provides the support demands more money.

          Max
        • "I call up Oracle, and make them bring someone down here to fix the problem."

          I call bullshit. Unless you are a Fortune 500 company AND are paying more than a hundred thousand dollars per year in support costs, there is no way Oracle is going to send someone over there to fix anything.

        • I was at a Linux conference yesterday (in Toronto). Oracle had a presentation. They now support Linux.

          If you have a problem with your database you phone Oracle, and they talk you through it. If it turns out to be an OS problem, then they tell you to go talk to your OS vendor - except when it's Linux. If it's Linux they will deal with it directly.

          I was very impressed. They are moving their whole company onto Linux and are more than 50% there now.
        • Don't want to be nit-picking, but ...

          with Oracle you won't get any kind of insurance. Read their "EULA" for details. The only thing you have with commercial software is "someone to blame". (OK, of course you can buy any kind of support from most commercial vendors, but you must pay a lot of money, and the only thing you get is that their support tries harder (e.g. faster)). Try to get an agreement with Oracle or someone else where they pay you any money you lose from their faults.

          Say: 1 hour of downtime o
      • by valisk ( 622262 ) on Wednesday April 30, 2003 @12:20PM (#5843910) Homepage Journal
        People will always buy Oracle because "No one ever got fired for choosing Oracle". If something goes wrong, you always have someone to blame. With open source, your job is more on the line because you have to take responsibility.

        Prior to Oracle taking off in a big way people used to say:

        People will always buy IBM because "No one ever got fired for choosing IBM". If something goes wrong, you always have someone to blame. With the Seven Dwarfs (the common name for IBM's competitors back then), your job is more on the line because you have to take responsibility.

        Then Larry E. shamelessly put together a cool SQL database which copied every major innovation IBM had made and added in a few more for good measure. He also cut the price by a third. IBM's database customers deserted in droves; after all, if this Oracle thing turned out to be shit, they could always get IBM to come clean up the mess. It turned out, though, that Oracle wasn't and isn't shit.

        That does not mean that Oracle is immortal and will always be top of the pile, Postgres now replicates almost all of the major features and is proven in the reliability stakes, tools like this are only going to make it more likely that corporate data departments will dip their toes into the Free software waters, after all if it turns out to be shit, they could always get Oracle to come clean up the mess.

    • by DavidpFitz ( 136265 ) on Wednesday April 30, 2003 @11:32AM (#5843398) Homepage Journal
      Give them a cheaper alternative with the same capabilities and they will bail out faster than you can say 'Geronimo'.
      But there isn't anything close to Oracle when it comes to availability/reliability etc. And, even if there were, IT managers would not go for it for some years because it's not proven in the enterprise. Oracle is so embedded into management brains, and its reputation is well deserved.

      If you want to cluster Oracle, use Oracle RAC (Real Application Clusters). It's based on Parallel Server so is mature enough to put forward for consideration... and even then it might be eschewed from above. Cheap databases are not going to ring the bells of the people with the say-so simply because Oracle (and DB2 etc) are proven over the years, and the cost of losing your data because you went for the cheap option is going to lose your company a lot of money, and you your job!

      Technically better, cheaper and all those good things does not mean better for a business. Databases are predominantly used for *business*, and as such a *business* reason is used when choosing one over another, not technical reasons.

    • Until someone can come up with an open source solution even vaguely resembling RMAN, Oracle has nothing to worry about.
  • by Lysol ( 11150 ) * on Wednesday April 30, 2003 @11:28AM (#5843345)
    So a few things come up just reading the docs on this:

    1. A Controller. It looks as tho a single controller is used by the clients to communicate to the various RAID'd dbs. I'm sure there can be multiple controllers since there would be little point to make some db's redundant, yet the access to them not. Still looking into this.

    2. And also, it looks as tho the default port is 1099 - RMI. If you have, for a web app, your EJBs and web app local to that container, that might not be a problem. However, I happen to have my EJB server on its own box and this might very well cause probs. I think it said you could specify your own ports, but I haven't seen any examples in the docs yet of this being the case. Also, still looking.

    A few other things exist as well which are in the docs as known limitations:
    * XAConnections
    * Blobs
    * batch updates
    * callable statements

    These could be serious issues for some. My last project used CLOBs/BLOBs, batch updates and callable statements, so this would rule that out. Of course, all the db stuff was strictly tied to Oracle, so I think that would rule this all regardless. ;)

    All in all tho, this looks like a good start. As my current project progresses, clustered dbs will become more and more of an issue. I've looked into some other projects out there for Postgres, but nothing yet really satisfactory. I think this is a good step in the right direction - for Java developers. It'll be interesting to watch.
    • Some answers:

      1. Yes, you can have multiple controllers that synchronize using group communication. In the driver, you give a comma-separated list of host names running controllers. The driver has built-in failover and load balancing among multiple controllers (check the doc here [objectweb.org]).

      2. Yes, all ports are customizable when you start the controller (check the doc here [objectweb.org]).

      This is just an alpha version, so as you mentioned, there are still many features missing but it is a good starting point and contributions a
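The failover behavior described in point 1 can be sketched independently of C-JDBC. The host-list format and `connect` method below are invented for illustration (the real driver's URL syntax may differ): the idea is simply to walk a comma-separated list and use the first controller that is reachable.

```java
import java.util.Set;

// Toy controller failover: given a comma-separated host list, connect to
// the first controller that is reachable. The `alive` set stands in for
// "a socket to this host could be opened".
class ControllerFailover {
    static String connect(String hostList, Set<String> alive) {
        for (String host : hostList.split(",")) {
            String h = host.trim();
            if (alive.contains(h)) return h; // first reachable controller wins
        }
        throw new IllegalStateException("no controller reachable");
    }

    public static void main(String[] args) {
        // ctrl1 is down, so the driver fails over to ctrl2.
        Set<String> alive = Set.of("ctrl2", "ctrl3");
        System.out.println(connect("ctrl1,ctrl2,ctrl3", alive));
    }
}
```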

  • by a7244270 ( 592043 ) on Wednesday April 30, 2003 @11:31AM (#5843383) Homepage Journal
    I looked at the diagram, and it looks very nice, but they seem to be very light on the details.

    Supposedly, "This new version has been successfully tested with Tomcat, JOnAS, MySQL and PostgreSQL. Excellent results have been obtained with the TPC-W and RUBiS benchmarks."

    Don't get me wrong, I like the idea, and I have been wanting something like this for years, but I sure would like to _see_ the test results, even if they are preliminary.
  • by Frater 219 ( 1455 ) on Wednesday April 30, 2003 @11:35AM (#5843424) Journal
    Since the article suggests the idea of applying disk-volume concepts (RAID) to databases, I thought I'd bring this up: For a while now I've been wishing there was an equivalent of NFS for databases, a way to mount a running database's tablespace into another database. This would allow one to draw together disparate databases, creating views and running joins across tables which natively reside in different databases, on different hosts.

    Here's an example of an application: I have a database-driven Web application [slashdot.org] that allows my onsite clients to register network services for openings in the firewall. Another software component probes the registered hosts for daemon version information and records it in the database, so that we can send out alerts when security holes are discovered in particular versions. I use PostgreSQL on Debian and Solaris. Independently of my work, our networking office has a Microsoft SQL Server database of IP addresses, MAC addresses, and physical switch ports and jack numbers.

    What I'd like to do is mount both my database and the networking office's database into some sort of "meta-database" -- analogous to mounting filesystems from two different hosts via NFS -- and run SQL queries that span both data sets. I wouldn't expect to be able to write to this conjoined database -- locking would be a nightmare -- but being able to SELECT across the two sets would be incredibly valuable.

    • Oracle has database links.

      Create a database link (for example to an AS400) and you can query the remote tables just like local tables.

      select * from somelib.sometable@as400

      Oracle will pass as much SQL as it can to the remote DB engine in order to keep things speedy.
    • Your problem wouldn't be solved with the product mentioned in the story. However, because you are using MS SQL Server, this is really easy. You just need to get the postgres ODBC Driver, and setup a Linked Server on the MS box.

      Check out This page [postgresql.org] for the postgresql ODBC Driver.

      You should also look at the linked servers documentation in SQL Server Books Online (under sp_addlinkedserver) as well as the interface in enterprise manager (security -> linked servers)

      As I was searching a bit, I see that pe
    • Actually, if you look at RAIDb-0 [objectweb.org], it is very close to this, maybe even identical. They show having different tables on different database servers. They also indicate that C-JDBC can be used without modifications to the application. This would imply that if you get a JDBC driver for MSSQL, a JDBC driver for PostgreSQL, and write your code using JDBC, you should be able to do the type of selects you are talking about.
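Even without C-JDBC, the read-only "meta-database" query the grandparent asks for can be done by hand: open one JDBC connection per database, pull each result set into memory, and join on the shared key in the client. A minimal sketch with invented table contents (the maps stand in for rows already fetched from the PostgreSQL and MS SQL Server databases):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy client-side join across two databases: each map stands in for a
// result set fetched over a separate JDBC connection.
class FederatedJoin {
    // Nested-loop inner join on the shared key (here, an IP address).
    static List<String> join(Map<String, String> services, Map<String, String> ports) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> s : services.entrySet()) {
            String port = ports.get(s.getKey());
            if (port != null)
                rows.add(s.getKey() + " " + s.getValue() + " " + port);
        }
        return rows;
    }

    public static void main(String[] args) {
        // "Rows" from the PostgreSQL firewall database: ip -> service
        Map<String, String> services = new LinkedHashMap<>();
        services.put("10.0.0.5", "ssh");
        services.put("10.0.0.9", "http");

        // "Rows" from the MS SQL Server network database: ip -> switch port
        Map<String, String> ports = new LinkedHashMap<>();
        ports.put("10.0.0.5", "sw1/12");
        ports.put("10.0.0.7", "sw2/03");

        join(services, ports).forEach(System.out::println);
    }
}
```

Note that pulling whole tables to the client is exactly the performance worry raised elsewhere in the thread; linked servers or database links push the join down to one of the engines instead.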

  • by binaryDigit ( 557647 ) on Wednesday April 30, 2003 @11:38AM (#5843459)
    Maybe I missed it but their info is pretty sparse on how they handle updates (i.e. adds/deletes/updates). Does it do two-phase commit, so if I'm striping data and one of the updates fails then everything fails? If they are replicating, will they automatically update replication servers if they are down at the time of the update? If one of the databases in the RAIDb doesn't support online backups and it's backing up, what will their system do? After all, this would be the true grunt work; without these features, what they have isn't a big deal at all. Does anyone have more info?
    • The C-JDBC controller embeds a recovery log that allows backends to recover from failures (check the recovery log part in the doc).
      If one backend fails in the cluster, it is automatically disabled and the controller always ensures that data sent back to the application are consistent.
      By the way, you can tune how you want distributed queries to complete (return as soon as the first node has committed, wait for a majority or, safer, wait for all nodes to commit). There are many options that help tun
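The "first / majority / all" completion policies mentioned above all boil down to waiting for k out of n acknowledgements. A toy sketch (the threads and sleeps stand in for backends committing at different speeds; none of this is C-JDBC API):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Toy tunable write completion: fire a write at n nodes and return once
// `required` of them have acknowledged. required=1 is "first node",
// required=(n/2)+1 is "majority", required=n is "all nodes".
class QuorumWrite {
    static boolean write(int nodes, int required, long[] ackDelayMs)
            throws InterruptedException {
        CountDownLatch acks = new CountDownLatch(required);
        for (int i = 0; i < nodes; i++) {
            final long delay = ackDelayMs[i];
            new Thread(() -> {
                try { Thread.sleep(delay); } catch (InterruptedException ignored) {}
                acks.countDown(); // this node has committed
            }).start();
        }
        // The caller unblocks as soon as `required` nodes have committed,
        // without waiting for the stragglers.
        return acks.await(2, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // 3 nodes, majority = 2, node commit delays of 10ms, 20ms, 500ms:
        // the call returns after roughly 20ms.
        System.out.println(write(3, 2, new long[]{10, 20, 500}));
    }
}
```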
  • Why? (Score:2, Insightful)

    by Anonymous Coward
    Why do masses need database clusters? Does anyone apart from mid-large sized businesses need one?
    • Reliability. You may have quite a small, relatively low-loaded database for your small business. But if your business depends on quick response, you want 24/7 uptime. If somebody asks at a shop if you have something in stock or checks a reservation at a hotel, you want to be able to say yes/no quickly. How many times do we go elsewhere if someone says "Sorry, the computer is down"? I got that at my doctor's the other day - due to building works, the one computer with the appointments on it had been powered
    • Re:Why? (Score:2, Interesting)

      by grugruto ( 530141 )
      You have your web site backed by an open source database?
      Just put a replica on a second node and you will have fault tolerance (even just for maintenance) and you will be able to handle peak loads. Two nodes is already a cluster; you don't need to have hundreds of nodes.

      Another usage could be to keep a single Oracle instance and put a bunch of open-source databases to offload your main Oracle database. You could have all the write queries (orders, ...) handled by your [safe] main Oracle database and have all ot

  • by Arethan ( 223197 ) on Wednesday April 30, 2003 @11:44AM (#5843509) Journal
    Isn't clustering supposed to be a function of the database system, not the software you use to access it?

    I mean, this is neat and all, but I really don't want to have to use this interface just so that I can cluster my database. You're much better off placing clustering functions within the database itself. Then you can access the data by any method (ODBC, native libraries, hell even with the provided command line interface).

    Take a look at how MS SQL Server performs clustering sometime. Everything (and I mean EVERYTHING) is performed via triggers and tsql. All the clustering setup does is set up a bunch of known working trigger scripts to propagate the data. You can even edit them to your liking afterwards if you wish. Now I'm not saying that MS's solution for clustering is the cat's ass. Personally, I think it is kind of hackish, but then again I believe that clustering should be something you simply turn on, and shouldn't be able to fuss with. Realistically, I can't think of any good reason to change the cookie cutter tsql scripts that perform the clustering, so I only see the ability to modify them as a potential way to fsck it up (that being an obviously bad thing).

    Clustering really isn't that hard to implement. I'm pretty surprised that MySQL and Postgres don't have better support for it. Especially Postgres, since transaction support is really the one big key that makes clustering possible. Maybe no one has really had an itch to make it happen yet. Hopefully it will happen soon, since I'd love clustering to be another argument for why OSS databases can play with the big kids just as easily.
    • You are right: not only is clustering better implemented in the DBMS itself, it actually NEEDS support from the DBMS.

      You are wrong saying that implementing clustering isn't hard.

      If we are talking about REAL DBMSes (no, MySQL is not a real DBMS), enabling every form of clustering which maintains the ACID properties we expect from a DBMS is a major step. It means becoming a distributed application, and it is one of the most complex things to implement.

      Just for example, suppose you have two machines in a master-to
      • Suppose now that the clients update the same record on the two servers in an incompatible way... you could imagine what will happen when the servers become visible to each other again...

        This is why you have transaction logs that are timestamped. When the systems resync, they merge their transaction logs, roll back to the last synced state, and then re-execute every transaction until they are current. The end result is that the newer row updates will overwrite the older row updates. This may or may not be th
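The timestamped-log resync described above can be sketched concretely. Everything here is invented for illustration (and a real system must also worry about clock skew between the two servers, which timestamps alone don't solve):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy resync after a network partition: each server kept a timestamped
// transaction log; merge the logs by timestamp and replay, so the newest
// write to each row wins (last-write-wins).
class LogMerge {
    static class Txn {
        final long ts; final String row; final String value;
        Txn(long ts, String row, String value) {
            this.ts = ts; this.row = row; this.value = value;
        }
    }

    static String replay(List<Txn> serverA, List<Txn> serverB, String row) {
        List<Txn> merged = new ArrayList<>(serverA);
        merged.addAll(serverB);
        merged.sort(Comparator.comparingLong((Txn t) -> t.ts));
        String current = null;
        for (Txn t : merged)
            if (t.row.equals(row)) current = t.value; // later writes overwrite earlier ones
        return current;
    }

    public static void main(String[] args) {
        List<Txn> a = List.of(new Txn(100, "r1", "A-old"), new Txn(300, "r1", "A-new"));
        List<Txn> b = List.of(new Txn(200, "r1", "B-mid"));
        System.out.println(replay(a, b, "r1"));
    }
}
```

As the comment concedes, this resolves conflicting business-level updates silently by timestamp, which may not be what either client intended.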
    • There already is clustering support in postgres via triggers. Problem is that it's still alpha/beta. The problem (as other replies have stated) is that the job is nefarious, and it's possible to massively corrupt the data.

      Think of RAID as its hard-drive counterpart. Data integrity could be most efficiently handled at the hard-drive layer, having multiple redundant controllers and disks, etc. It would be a generic disk as far as the SCSI/IDE card was concerned. But it turns out to not be the cheape
  • by Anonymous Coward
    The commercial databases that have been doing this for years are DB2, Informix, and Teradata.

    Know what? There are a ton of deep issues beyond just making the different partitions transparent to the application level. Think about joins across partitions for sec...
    • There are a ton of deep issues beyond just making the different partitions transparent to the application level. Think about joins across partitions for sec...

      Don't see how it limits you here. If you have n fully redundant RDBs, a single controller, and m clients, the dispatcher load balances you such that if all m clients are performing non-destructive reads, they all read from different machines (preventing resource starvation). Each machine either puts everything on one disk or segments the data acr
  • by frodo from middle ea ( 602941 ) on Wednesday April 30, 2003 @11:45AM (#5843526) Homepage
    But seriously, do you see Oracle/DB2 etc. customers suddenly jumping over to this?
    My view is that it may be difficult to migrate OSes or even hardware, but it's almost darn impossible to migrate existing databases.
    A database is the most fundamental and most cared-about aspect of a major business. There is a lot of time and effort and MONEY spent to incorporate it into the company.
    Lots and lots of critical business applications are written using the proprietary extensions of these vendors. Is it very easy to migrate this code?
    Maybe interesting for a future pilot project, but to expect businesses to change their database vendors... that's not going to happen very soon.
    • It's easy, and it's not easy.

      If you have a lot of PL/SQL stored procs, and you are moving to MySQL (no stored procedures yet, no PL/SQL) then it's tough.

      If you are moving to Postgres, then it gets easier.

      It really depends on how you coded your application. Even if you use a bit of non-standard SQL, there are usually equivalents.
  • Just curious.

    How do you join one table to another when they are on two separate boxes?

    Well, I know how to actually use SQL to join two tables from two separate databases. But what is actually happening inside the RDBMS at the low level? Does one just bring over the entire other table? How does it use indexes?

    Seems to me this really is, at best, a reference implementation that may actually degrade performance.

    • How do you join one table to another when they are on two separate boxes?

      Several people have asked this question. Have you looked at the white paper? It's possible to do RAIDb-1, which is m fully redundant DBs with all tables being fully accessible from a given DB. In RAIDb-1, therefore, there is "zero" problem with joins / updates / transactions, because it's literally just pretending to be accessing a single machine. (I quoted zero because you might have synchronization issues if one machine somehow res
  • ...does it mean that their db really works? (at least, until now..)
  • Cluster of databases is no more the privilege of few high-end commercial databases, open-source solutions are striking back!

    Finally, my grandmother can have that database cluster she has been bugging me about.

  • Also new! (Score:5, Funny)

    by Dark Lord Seth ( 584963 ) on Wednesday April 30, 2003 @12:06PM (#5843738) Journal

    RAID -- Redundant Array of Inexpensive Developers

    RAID 0
    Multiple developers work on the same project but none of them has any idea what the other is doing at the same time. One developer failing (caffeine dehydration, severe electrostatic shock, sex, etc) will cause the entire project to screw up and become a mess.

    RAID 1
    Extreme Programming.

    RAID 2
    Inefficient way to keep track of what developers are doing. For every 10 developers, 4 are needed to keep track of them and recover any error by the aforementioned 10 while they don't work together at all. Level of efficiency comparable to a modern government.

    RAID 3
    Equal to RAID 2, except all responsibility for checking the code is now granted to one person. The rest has been budget-cut away. A bit more effective, but considering people still don't cooperate, not too good.

    RAID 4
    Equal to RAID 3, except people are finally working together now. Kinda efficient and fast, except it all still relies on that one person who checks the data.

    RAID 5
    Everyone knows what everyone else is doing, they all work perfectly together and they can easily miss one person because of that.

  • The user's guide page says this about caching:

    8.7.2. Request cache

    The query cache provides query result caching. If two exact same requests are to be executed, only one is executed and the second one waits until the completion of the first one (this is the default pendingTimeout value which is 0). To prevent the second request to wait forever, a pendingTimeout value in seconds can be defined for the waiting request. If the timeout expires the request is executed in parallel of the first one.

    A request ca
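The pending-request behavior quoted above (an identical in-flight query is executed once; the second caller waits for its result) can be sketched with a map of futures. This is a toy illustration, not C-JDBC's implementation, and it omits the pendingTimeout escape hatch the manual describes, so waiters here block until the first query finishes:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy request cache: while a query is in flight, identical queries wait
// for its result instead of hitting the database again.
class RequestCache {
    private final ConcurrentMap<String, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();
    final AtomicInteger dbHits = new AtomicInteger(); // real executions

    String execute(String sql) {
        CompletableFuture<String> mine = new CompletableFuture<>();
        CompletableFuture<String> inFlight = pending.putIfAbsent(sql, mine);
        if (inFlight != null) return inFlight.join(); // identical query running: wait
        try {
            dbHits.incrementAndGet();
            sleep(200); // stand-in for a slow database call
            String result = "result-of(" + sql + ")";
            mine.complete(result);
            return result;
        } finally {
            pending.remove(sql, mine); // finished queries leave the pending table
        }
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) {}
    }

    public static void main(String[] args) throws InterruptedException {
        RequestCache cache = new RequestCache();
        Thread first = new Thread(() -> cache.execute("SELECT 1"));
        first.start();
        Thread.sleep(50); // let the first query get in flight
        String r = cache.execute("SELECT 1"); // piggybacks on the in-flight query
        first.join();
        System.out.println(r + ", db hits: " + cache.dbHits.get());
    }
}
```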

  • We've been using a similar idea for years. It's pretty much using "scaling out" with some application logic to make it useful for high-availability purposes. At one time, we had 13 subscriber databases (MS SQL 6.5) throughout the world, using transactional replication to keep them in sync. A small bit of logic in the front-end determined which server a user would connect to. In this way we could point users at the server geographically closest to them (which was configurable in a database itself).

    Esse

    • Yes, but how did you handle failover in your implementation? If the defined server was unavailable, or was up but unresponsive, what steps were taken? Yes, you can implement all that yourself, but why reinvent the wheel if you don't have to? Having the ability to have redundant frontends querying redundant databases sounds pretty close to what Apache and load balancing have done for webserving (allow the use of lots of cheap servers to achieve 24x7x365 operations; now, I understand databases have some unique proper
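
      The simplest answer to that failover question is an ordered try-next-server loop in the frontend. A minimal sketch (the Backend interface and server ordering are illustrative assumptions, not anything from the parent's actual MS SQL setup):

```java
import java.util.List;

class FailoverRouter {
    interface Backend { String run(String sql); } // assumed to throw RuntimeException when down

    private final List<Backend> closestFirst;     // geographically preferred server first

    FailoverRouter(List<Backend> closestFirst) { this.closestFirst = closestFirst; }

    String run(String sql) {
        RuntimeException last = null;
        for (Backend b : closestFirst) {
            try {
                return b.run(sql);      // first responsive server wins
            } catch (RuntimeException e) {
                last = e;               // unavailable or unresponsive: try the next one
            }
        }
        throw last != null ? last : new IllegalStateException("no backends configured");
    }
}
```

      Real deployments would also need health checks and a timeout around each attempt, or "unresponsive but up" hangs the loop.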
  • C-JDBC is an open-source software that implements a new concept called RAIDb (Redundant Array of Inexpensive Databases).

    It's good that these are becoming available in open source form, but the concept is not new at all. IBM and Oracle both have had commercial versions for a while (I suppose the "inexpensive" part is new).

  • Thorough rundown (Score:5, Informative)

    by photon317 ( 208409 ) on Wednesday April 30, 2003 @12:47PM (#5844218)

    After actually reading the documentation, here's my informed take on this:

    1) In its current incarnation, it's only useful for very, very simple database access. No transactions, no blobs, etc. Basically, if you're just storing some simple weblication tables and doing single statements against them for selects/updates (no big cross-table transactions), you can use it.

    2) It's JDBC only. Perhaps someone could port the concept to ODBC though.

    3) There's a new middle tier between the JDBC driver and the database itself, which is the bulk of their code. This tier actually re-implements some database constructs like recovery logging, query caching, etc. Of course this is necessary, as trying to do replication from the client-code side alone would be impossible (what do you do when one of 3 DB mirrors goes offline for an hour? Have every JDBC client cache the requests and replay them later, hoping those clients are even still around later?)
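
    To make the "why a middle tier" point concrete, here's a toy version of the recovery-log idea: writes are journalled in one central place, so a mirror that was offline can be brought up to date on return. This is an assumed sketch, not C-JDBC's actual recovery mechanism:

```java
import java.util.ArrayList;
import java.util.List;

class RecoveryLog {
    private final List<String> journal = new ArrayList<>();

    // Every write statement is recorded centrally, in execution order.
    void record(String writeSql) { journal.add(writeSql); }

    // Statements a returning mirror still needs, given how many it had applied
    // before it went offline.
    List<String> replayFrom(int appliedCount) {
        return new ArrayList<>(journal.subList(appliedCount, journal.size()));
    }
}
```

    A client-side-only design has no such single journal, which is exactly why the clients can't be trusted to replay anything.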

    For some applications and some companies, in its current state this is a godsend - but it's not a general solution yet. Making it ODBC (or even better, having the front of it emulate a native PostgreSQL or MySQL listener) would broaden its applicability.

    Supporting transactions would be a big win too, although I'm not sure how feasible this is - I think at that point they may as well just write their own new database engine which is parallel from the start, seeing as they'll be re-implementing in their cluster tier almost everything the database server does except for actual physical storage.

    Still, it's nice to see that someone did this and made it work - and for a lot of simple databases behind java apps it's all you really need.

    PostgreSQL has all the transaction support in place already, so of all the free DBs out there it would seem they have the best shot at doing their own native parallelism, if they would just get it done someday.
  • I once worked for an open-source company that tried creating something like this in Perl. We did so to try and lure customers from Oracle and prove that open source could handle massive databases. But... we found many problems when trying to sell this to experienced customers over Oracle.

    1st... multiple points of failure. By increasing the number of databases, you're increasing the potential points of failure. What features are there to automatically back up data? If the data is spread randomly across the dbs a
    • You are arguing against multiple redundant servers and saying that putting everything on one big server and disk array is better??? Are you nuts? A cluster is obviously better in that any one machine or disk array going offline does not take down your complete system. Now maybe Oracle RAC or a DB2 cluster might be better for some, but a cluster of dual-CPU Linux boxes running PostgreSQL might come in at a fraction of the cost and so allow some people to get clustering protection where they normally couldn't
      • Yes, putting everything in one server with redundant power supplies, redundant disks with hot-swap capability, and redundant network controllers is better! This JDBC hasn't implemented multiple redundant servers; right now it's about speed increases by striping data across multiple servers. RAID 0. Are you going to trust the redundancy in this software more than your RAID controller?

        If I bought 3 dual CPU linux boxes there is a 3x better chance of having a Power Supply die, a network connection die etc a
  • Not there yet... (Score:2, Insightful)

    by jfroebe ( 10351 )
    While I commend their efforts, what they are offering is little more than a poor man's high-availability cluster.

    The shared disk array (RAID, etc.) is just a part of implementing HA.

    My recommendation is for the developers to take a look at how it is implemented in the enterprise DBMSs (Sybase, Oracle, MS SQL Server, DB2) first.

    jason
  • Good starting block though...

    First, they should move more and more features of the DB to the controller layer. The goal should be that you can call plain SQL statements and complex joins directly. Later, you could even have stored procedures execute there and use the cluster as if it were one DB.

    Then, they should try and work it so that you make low-level calls to the DB layer; this would save time by not having the separate DBs compile the SQL statements.

    Next, make some kernel mods ala Tux to make the DB ca
  • by godofredo ( 198906 ) on Wednesday April 30, 2003 @05:21PM (#5847683)
    There are many problems with this design, some have already been mentioned. There are serious issues with performing atomic updates. Modern databases use locking to allow high levels of concurrency. Foreign key constraint checking is one thing that would be very hard to implement in this design, as it is generally implemented in the indexes themselves. Likewise, to get all databases in a "RAIDb 0" group to reflect the same state, operations such as concurrent delete and insert must be completely serialized to assure consistency...serialized across all clients, not just from one source.

    Furthermore, to scale up, systems generally take advantage of striping. At the I/O level that means striping across multiple disks (the modern convention is to stripe across all!). In a parallel database one usually stripes a single table across multiple nodes for parallel query processing. While it is possible with C-JDBC to put table X on node A and table Y on node B, I don't see any provision for striping the data. It will be very difficult to use your hardware efficiently in this scenario.
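
    For anyone unfamiliar with table striping: the usual trick is to hash each row's key onto one of N nodes, so a single table's rows (and the queries over them) spread across the whole cluster. A minimal illustrative sketch (names are made up; this is the facility the parent says C-JDBC lacks, not anything it provides):

```java
class HashStripe {
    private final int nodeCount;

    HashStripe(int nodeCount) { this.nodeCount = nodeCount; }

    // Which node stores the row with this key. floorMod keeps the
    // result in [0, nodeCount) even for negative hash codes.
    int nodeFor(Object rowKey) {
        return Math.floorMod(rowKey.hashCode(), nodeCount);
    }
}
```

    The routing is deterministic, so both writers and readers agree on where a row lives - which is also why adding a node is painful without consistent hashing or rebalancing.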

    If you are going to go through the trouble of implementing a complete query processor (one that can handle jobs larger than RAM), a full update/query scheduler (lock manager), and a journalling mechanism that can (somehow) maintain atomic transactions even in the face of multiple failures, then why not just build your own database? This system might be useful in certain rare cases, but I wouldn't use it except possibly for replication.

    JJ

