Data Storage Technology

World's Largest Databases Ranked 356

Posted by CowboyNeal
from the cylinder-sizing dept.
prostoalex writes "Winter Corp. has summarized its findings of the annual TopTen competition, where the world's largest and most hard-working (in terms of load) databases are ranked. The results are in, and this year the contestants were ranked on size, data volume, number of rows and peak workload. I wrote up a brief summary of the top three winners in each category for those too lazy to browse the interactive WinterCorp chart."
  • Google (Score:5, Interesting)

    by ScribeOfTheNile (694546) on Friday December 12, 2003 @08:09AM (#7699684) Homepage
    I would've expected to see Google in there somewhere.
    • Re:Google (Score:5, Informative)

      by tinrib (632120) <david@[ ]in.org ['sta' in gap]> on Friday December 12, 2003 @08:14AM (#7699717)
      Doesn't Google use 'big files' rather than a database for storing all its data?

      see http://www.cs.rochester.edu/sosp2003/papers/p125-ghemawat.pdf [rochester.edu], which describes the Google File System.
      • Re:Google (Score:5, Informative)

        by Wastl (809) on Friday December 12, 2003 @09:35AM (#7700299) Homepage
        The term "database" is rather imprecise.

        One might see a database as merely a "big file" with mechanisms to access and modify it consistently (and surely Google has some means of ensuring consistency). A big file isn't disqualified from being a "database" just because it wasn't produced by one of {Oracle, MS-SQL, ...} or can't be queried with SQL.

        It is also possible to consider the Web to be a database (of Web sites). Or an XML, BibTeX, dbm, whatsoever file.

        Sebastian

        • Re:Google (Score:3, Informative)

          by MattRog (527508)
          A database is any collection of data. A database management system (which is what most people erroneously call a database) is a system of programs (say, Oracle or MS SQL) that maintains the data in a database.
    • Re:Google (Score:5, Informative)

      by lewp (95638) on Friday December 12, 2003 @08:21AM (#7699767) Journal
      Even if Google qualified (which it probably doesn't, given the methods it uses for data storage), if I read the article properly the database vendors are responsible for nominating the participants.

      Since Google's stuff seems to be developed in-house, they don't have a major database vendor to nominate them.
    • Re:Google (Score:5, Informative)

      by stripmarkup (629598) on Friday December 12, 2003 @08:25AM (#7699806) Homepage
      It seems that they are comparing relational databases. Search engines use proprietary databases which, among other things, do not allow for live insertion of records, SQL commands, etc. As for data volume, Google (or Yahoo or MSN, for that matter) is probably in the ballpark. The average HTML page is around 10k. Google probably stores at least 10^9 raw web pages in its cache (that's 10 TB alone), plus a lot of meta-information about links to and from many others.
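The parent's back-of-envelope figure checks out; as a quick sketch (both inputs are the poster's assumptions, not measured values):

```python
pages = 10**9                # assumed number of cached pages
avg_page_bytes = 10 * 1024   # assumed ~10 KB per raw HTML page

total_tb = pages * avg_page_bytes / 1024**4
print(f"{total_tb:.1f} TB")  # about 9.3 TB, in line with the "10 TB" figure
```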
    • Re:Google (Score:3, Insightful)

      by Ilgaz (86384)
      What about visa/mastercard/american express?

      IMHO some of them didn't want to be in that list.
      • Re:Google (Score:2, Informative)

        by KarmaPolice (212543)
        What about visa/mastercard/american express?

        IMHO some of them didn't want to be in that list.


        If you look at "database size", number 4 is listed as anonymous. They probably aren't too interested in telling everyone what database and platform they are using to store very critical data.
      • MasterCard (Score:4, Interesting)

        by truthsearch (249536) on Friday December 12, 2003 @09:57AM (#7700548) Homepage Journal
        I left MasterCard in 1999 after working with their data warehouse. At the time they had recently bought a 3-terabyte Sun E10000 with Oracle. They quickly ran out of space and added another terabyte. I'm also surprised not to see them on the list. They work closely with Oracle, who have an office down the street, since they have high volume. The credit card transactions table alone gets 14 million new records on average every day.

        I agree that there are many companies who would not want to be in that list. There's a small competitive advantage if you keep what technology you use secret.
      • Re:AmEx (Score:3, Informative)

        by hrieke (126185)
        I used to work for a company called Epsilon Data Management[1], in Burlington, MA. They've been bought since I left them a while ago, but they were the keeper of the AmEx customer transaction database for data mining and direct marketing (junk mail and phone calls).
        Big. 7 data silos big. Each silo holds 50k tapes, each tape was 30GB, and it usually took 4 days to load.

        [1] Epsilon was originally an AmEx division, which was spun off to keep other customers happy (banks and other CC companies).
  • by Trigun (685027) <evil AT evilempire DOT ath DOT cx> on Friday December 12, 2003 @08:09AM (#7699686)
    scored a measly 17th. Oh well, time for more surfing.
    • by real_smiff (611054) on Friday December 12, 2003 @09:03AM (#7700061)
      Does anyone actually have their porn in a database (of some sort)? I'm curious whether the "porn database" is just a joke or... hmm, worth implementing! For all I know, there's already a 'porn-o-base' (tm?) collaborative project on SourceForge that you're all using - after reading slashdot for a bit nothing would surprise me...

      What are the pros and cons to databasing (sp.?) your porn? - except perhaps, reduced chance of getting a girlfriend, and chance of ridicule, obviously...

      Hey, this is the right place to ask ;)

      • Re:My porn database (Score:2, Interesting)

        by lonb (716586)
        I used to run a porn site, RezX.com (about six years ago). All the content, porn included, was served out of a db.
      • I started to organize my pr0n with a database, but I found that I was easily distracted by the content.

        Plus 'leafing' through it is half the fun.
  • SQL Server? (Score:5, Interesting)

    by B5_geek (638928) on Friday December 12, 2003 @08:10AM (#7699689)
    Does the SQL Server mean MS-SQL?

    I would have liked to see SQL vs non-SQL ranking too.
    • Does the SQL Server mean MS-SQL?

      Yes, in this case - look at the "Vendor" column. Note that in the past both MS and Sybase called their database "SQL Server"; nowadays Sybase calls it "Adaptive Server". Sybase IQ is highly optimized for DSS work, whereas AS is optimized for OLTP.
      • Typical Microsoft calling their product something generic that should apply to any SQL server. Almost like calling a product .. Windows.
        • Re:SQL Server? (Score:5, Informative)

          by azaris (699901) on Friday December 12, 2003 @08:43AM (#7699943) Journal

          Typical Microsoft calling their product something generic that should apply to any SQL server. Almost like calling a product .. Windows.

          It was originally called Sybase SQL Server but was later picked up by MS, who adopted the name. Typical /. objectivity.

    • What I know is, SQL is a query language, not invented by MS; IBM was actually the first to adopt it in business.

      MS did an amazing PR job of making it look like their invention, which is totally wrong.

      SQL first got attention from Oracle after being ignored by all the others, and you can see what Oracle became after caring about it.

      If I remember right, SQL and relational databases and relational queries emerged around 1978 or a little later.
  • Spam databases (Score:2, Insightful)

    by stanmann (602645)
    I wonder how many of the spammers allowed their databases to be evaluated for this list.
    • Why would a spammer even come close to making this list? They likely only need one big table containing the email addresses. The rest of the supporting tables would be relatively small.
  • No IMS? (Score:5, Interesting)

    by John Harrison (223649) <johnharrison AT gmail DOT com> on Friday December 12, 2003 @08:13AM (#7699714) Homepage Journal
    I thought that 90% of the world's data was irretrievably trapped in IMS? Seriously though, I am surprised that an IMS system isn't on the list. Probably because it isn't relational, and the people making the list figure that RDBMS are the only DB around.
    • Re:No IMS? (Score:2, Funny)

      by musikit (716987)
      I thought that 90% of the world's data was irretrievably trapped in IMS?

      looks like you got a typo in your question there. let me fix it for you.

      I thought that 90% of the world's data was irretrievably trapped in MS?
    • I thought that 90% of the world's data was irretrievably trapped in IMS?
      WTF? IMS? IMNAAL!

      (Uh.... my head hurts..... what's this IMS anyway?)

      • Re:No IMS? (Score:5, Informative)

        by John Harrison (223649) <johnharrison AT gmail DOT com> on Friday December 12, 2003 @08:35AM (#7699881) Homepage Journal
        Google is your friend. [google.com]

        IMS is the database that was used to keep track of things for the moonshot. It is an IBM product. It is hierarchical as opposed to relational. Because of this it can do certain things very quickly, though in general it isn't as flexible as say DB2. Because it has been around so long, applications where having a DB was really important tend to have bought IMS a long time ago and developed systems around it. If your system is old enough, large enough and still works well for you there is no need to migrate to relational. Most of the world's financial transactions pass through an IMS system at some point. It is very stable and has uptimes that measure in years if not decades by now.

        Because of this I am surprised that it is not on the list. There are really big IMS databases out there that run a lot of transactions. Because it isn't relational there is some bigotry against it and it is ignored in the popular press.
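The hierarchical-versus-relational contrast the parent draws can be sketched in toy form. This is illustrative Python only; IMS itself is accessed through DL/I calls, not dicts:

```python
# Hierarchical: children live under their parent, so following a known
# path (account -> transactions) is a direct traversal with no join.
hierarchical = {
    "ACCT-1001": {
        "owner": "Alice",
        "transactions": [
            {"id": "T1", "amount": 25.00},
            {"id": "T2", "amount": 99.95},
        ],
    }
}
txns = hierarchical["ACCT-1001"]["transactions"]  # one path lookup

# Relational: the same data in flat tables, recombined with a join,
# which is more flexible but does extra work at query time.
accounts = [{"acct": "ACCT-1001", "owner": "Alice"}]
transactions = [
    {"acct": "ACCT-1001", "id": "T1", "amount": 25.00},
    {"acct": "ACCT-1001", "id": "T2", "amount": 99.95},
]
joined = [t for t in transactions if t["acct"] == "ACCT-1001"]

assert [t["id"] for t in txns] == [t["id"] for t in joined] == ["T1", "T2"]
```

The fixed parent-child paths are why the hierarchical lookup is fast, and also why it is less flexible: any access pattern not matching the hierarchy needs restructuring.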

        • Because it isn't relational there is some bigotry against it and it is ignored in the popular press.

          Dude, where I come from the popular press doesn't often run stories on database architecture of any description - they're more into celebrity gossip and stuff.

  • Hmmm (Score:2, Interesting)

    by Cenuij (526885)

    OK so this is obviously only vendors of databases and RDBMS systems.

    In a broader sense aren't such things as the wayback machine [archive.org] a database? What about the truly massive amounts of data gathered at research labs, e.g. CERN [web.cern.ch]. Who's the daddy of these guys?

  • I would imagine that the Winter Corporation's db is now climbing up the peak performance for online transactions right now ;o)
    • On that note...

      "Experiments at CERN will produce hundreds of TB of data per year at data rates up to 35MB/second starting in 1999," states Jamie Shiers, Project Leader at CERN. "Experience from the use of Objectivity/DB and HPSS on these experiments will help us understand how we can cope with the staggering 100PB of data at rates up to 1.5GB/second expected at CERN's Large Hadron Collider, starting in 2005."

      "The size of CERN's database is bigger than any numbers ever seen," according to Richard Winte

  • What surprised me... (Score:5, Interesting)

    by MyNameIsFred (543994) * on Friday December 12, 2003 @08:15AM (#7699735)
    I have none, nada, zip experience in big databases. But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.
    • Anybody know how many concurrent queries slashdot gets at peak?

      It would be an interesting reference point.

        Anybody know how many concurrent queries slashdot gets at peak? It would be an interesting reference point.

        I agree, it would.

        I wouldn't be able to take a stab at the actual numeric value for your answer, but I believe that Slashdot (as most large, content-driven websites need to do) caches a lot of data, so that it doesn't need to be queried out of the database every single time somebody requests the page. That greatly cuts down on the actual number of queries being slung at the database.
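A minimal sketch of the caching idea described above: serve a cached copy while it is fresh and hit the database only on a miss or expiry. All names here (fetch_page, render_from_db) are hypothetical, not Slashdot's actual code:

```python
import time

CACHE_TTL = 60.0   # seconds a rendered page stays valid
_cache = {}        # page_key -> (rendered_html, timestamp)
db_queries = 0     # counts how often we actually hit the "database"

def render_from_db(page_key):
    """Stand-in for a real query-and-render step."""
    global db_queries
    db_queries += 1
    return f"<html>{page_key}</html>"

def fetch_page(page_key):
    now = time.time()
    hit = _cache.get(page_key)
    if hit is not None and now - hit[1] < CACHE_TTL:
        return hit[0]                  # cache hit: no DB query
    html = render_from_db(page_key)
    _cache[page_key] = (html, now)
    return html

fetch_page("story-7699684")  # miss: one DB query
fetch_page("story-7699684")  # hit: still one DB query
assert db_queries == 1
```

Every request served within the TTL costs zero database work, which is how a front page read by thousands can sit on a database answering only hundreds of queries.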

        • so that it doesn't need to be queried out of the database every single time somebody requests the page.

          Agreed.

          I've constantly remarked that my threshold=-1 story grabs are so quick to come back.

    • by sql*kitten (1359) * on Friday December 12, 2003 @08:27AM (#7699828)
      I have none, nada, zip experience in big databases.

      S'okay, I have plenty :-)

      But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.

      You would typically see tens of thousands (or more) of concurrent connections to a middleware layer - like Tuxedo - which would then multiplex them down to hundreds of connections to the database. This is because there is a lot of latency in establishing a connection, in fact logging in often takes an order of magnitude longer than running an actual query, yet few users submit transactions nonstop. So there is no sense in maintaining tens of thousands of expensive user contexts on the DB server, and there is no sense in requiring intermittent (relatively speaking) users to log out after a short idle period. Middleware does nothing but manage concurrent user contexts, and it can do so very efficiently. A database can't, because it tries to preallocate as much context as it can, and that doesn't match real-world usage patterns, and anyway, database vendors concentrate on their SQL engines and leave middleware vendors to manage the rest.

      Of course, if you are a big database vendor, you probably also sell middleware, but there's no-one who tries to bundle the two into one, any more than you'd want a web server to have its own filesystem.
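The multiplexing described above can be sketched with a fixed pool: many concurrent client requests funnel down to a handful of database connections. This is a toy illustration of the pattern, not how Tuxedo is implemented:

```python
import queue
import threading

# A small fixed pool of "connections"; requests block until one is free.
POOL_SIZE = 4
pool = queue.Queue()
for conn_id in range(POOL_SIZE):
    pool.put(conn_id)          # integers stand in for real connections

results = []
results_lock = threading.Lock()

def handle_request(req_id):
    conn = pool.get()          # wait for a free connection
    try:
        with results_lock:
            results.append((req_id, conn))   # "run the query" on conn
    finally:
        pool.put(conn)         # hand the connection back

# 100 concurrent "users" share only POOL_SIZE database connections.
threads = [threading.Thread(target=handle_request, args=(i,))
           for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(results) == 100
assert len({conn for _, conn in results}) <= POOL_SIZE
```

The pool is why the survey's "hundreds of concurrent queries" and a site's tens of thousands of users are both true at once: the expensive contexts exist only on the database side of the funnel.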
      • by Quill_28 (553921) on Friday December 12, 2003 @09:51AM (#7700452) Journal
        Something is wrong...

        Here I find a knowledgeable person on Slashdot,
        Who has given a well-written response,
        Answered the question without flaming the askee,
        Didn't use numbers/symbols for letters,
        Never slammed MS or SCO,

        And was modded up?

      • I haven't read their definition of peak workload, but I guess it probably means concurrent queries. Even with persistent connections, shouldn't there be a large number of concurrent queries? With things like parallel querying, does the number of connections have to be the same as the number of queries?

        Another factor could be caching, which if used intelligently could cut down the DB workload substantially.
  • by epiphani (254981) <epiphani.dal@net> on Friday December 12, 2003 @08:16AM (#7699741)
    I honestly doubt that 29.2 Terabytes is the biggest database in the world. But anyway...

    I recognize Oracle and DB2, but could someone give a brief synopsis of what the other database systems are? And what is an MPP architecture?

    • by Peridriga (308995) on Friday December 12, 2003 @08:22AM (#7699779)
      Well... if you actually read the article it clearly states that 29.2 is not the largest...

      You can find the link to the article yourself but

      1. AT&T @ 94.3TB
      2. Amazon @ 34.2TB
      • So, is this an unclassified database "competition", just like the Top 500 supercomputer list covers only unclassified machines?

        IMHO, with their dozens of years of data, the FBI and NSA would be at the top, at petabyte levels.
      • by Zocalo (252965)
        Or, if you include hybrids, 828.3TB owned by the Stanford Linear Accelerator Center. Frankly, I was expecting to see much larger figures than these from academia and large-scale research projects - Lawrence Livermore, for example.

        Obviously data collected from places like Arecibo wouldn't lend themselves to this kind of survey, even though it must be vastly larger, but what about storage of particle vectors from nuclear event simulations? I'm guessing that they were either not nominated or declined to be lis

      • Here's a thought: How do they back up a database that is 94.3 TB? I deal with servers that have only a puny 100-150 GB. One or two LTO tapes back up these servers. What tapes do they use to back up this database?
        • offsite RAID / mirroring ?

          i doubt they just have one copy and a little guy sat next to it feeding in DDS3 tapes all day and night ...
        • by jgerry (14280) * <jason DOT gerry AT gmail DOT com> on Friday December 12, 2003 @10:48AM (#7701199) Homepage
          How do they backup a database that is 94.3 TB?

          I support very large Oracle databases for a living (very large meaning > 1TB), databases that must be up 24/7. Backups are done in a number of different ways:

          1) Disk syncs, block by block, between disk subsystems at disparate locations, to retain multiple copies of a database in different locations. They can be synced to more than one location too, so you can have as many copies of the database as you want. Your main database is the only "hot" database, the others can be brought up and recovered if needed. We mainly use EMC disk subsystems to do this, the process is called BCV (can't remember what that stands for right now)

          2) Real-time replication. One-to-one or one-to-many. All databases are "hot" at all times. This can be great for load balancing too, since you can have multiple systems online at the same time. Very difficult to maintain and monitor.

          Large databases just can't be put to tape anymore. Even if you did, it would take days or weeks to recover them if they failed. Disk to disk is about the only way to provide backups for really large databases.
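Some rough arithmetic behind "it would take days or weeks to recover": restoring 94.3 TB from tape at era-typical LTO speeds. The drive speed and drive count below are assumptions, not figures from the article:

```python
DB_BYTES = 94.3e12   # 94.3 TB, decimal
LTO_BPS = 15e6       # assumed ~15 MB/s native for one early-2000s LTO drive
DRIVES = 20          # assumed number of drives in the tape library

one_drive_days = DB_BYTES / LTO_BPS / 86400
bank_days = one_drive_days / DRIVES
print(f"{one_drive_days:.0f} days on one drive, "
      f"{bank_days:.1f} days across {DRIVES} drives")
```

Even a generous bank of drives leaves the restore in the multi-day range, which is why disk-to-disk copies are the only practical recovery path at this scale.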
          • To add to that,
            standby databases are popular: in an Oracle scenario, the archived log files from your hot production database are constantly and automatically applied to a cold standby database in a different location, and if something happens to the primary it takes very little time to bring the standby up.
            Also, an Oracle hot backup is by nature incremental; you can do, say, one tablespace per night rather than the whole database at once (while backing up all the archived log files). I have se
    • Let's see:

      Stanford Linear Accelerator Center 828,293 - Objectivity DB - Cluster - Objectivity - Sun - Sun

      You can find that under 'database size, hybrid'. Note that this is an object database and as such will never be found under one of the 'number of rows' entries, simply because rows are relational and an object base simply stores objects.

      I believe that CERN has got a huge odbms also.

    • by mountainhouse (561889) on Friday December 12, 2003 @08:32AM (#7699858)
      I think the NCR Teradata approach is one of the most interesting. It is made up of a number of nodes (each a quad-Intel-processor system with separate memory and disk), each broken down into a number of logical machines. Data is hashed across all the nodes in the system based on the data's indexing. So if two tables have the same indexing, the join takes place at the "logical machine" level, and then the result is spooled together. The largest systems approach 300 nodes, with over 2,000 logical machines and 150 TB of disk (some used to duplicate tables in case of node failure).

      Personally, I think it has its drawbacks, but if the indexing is right, you can join hundred-million-row tables at amazing speed. Based on my experience in data warehousing, its performance is something Oracle can't touch (no, I'm not paid by NCR... just a user).

      http://www.teradata.com

      Overview:
      http://www.teradata.com/t/go.aspx/?id=84960
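The hash-distribution scheme described above can be sketched as follows; this is a toy illustration of co-located joins, not Teradata's actual (proprietary) hashing:

```python
# Rows are placed on a node by hashing the index column, so two tables
# sharing an index co-locate matching rows and can join node-locally.
NODES = 8

def node_for(key):
    return hash(key) % NODES   # stand-in for the real hash map

order_shards = [[] for _ in range(NODES)]
cust_shards = [[] for _ in range(NODES)]
for i in range(100):
    cid = f"cust-{i}"
    order_shards[node_for(cid)].append((cid, f"order-{i}"))
    cust_shards[node_for(cid)].append((cid, f"name-{i}"))

# Each node joins its own shards independently; the per-node results
# are then "spooled together" into the final answer.
joined = []
for n in range(NODES):
    names = dict(cust_shards[n])
    joined += [(cid, oid, names[cid]) for cid, oid in order_shards[n]]

assert len(joined) == 100   # every order found its customer node-locally
```

Because both tables hash on the same column, no row ever has to cross a node boundary during the join; that locality is what makes the parallel speed-up possible.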
  • Switches (Score:3, Funny)

    by Davak (526912) on Friday December 12, 2003 @08:18AM (#7699750) Homepage
    AT&T 94,305GB Daytona SMP AT&T Sun Sun

    I wonder how much of this database is records of every time users have switched to and from AT&T to get those cash bonuses!
  • 94.3TB!?!?! (Score:5, Interesting)

    by Peridriga (308995) on Friday December 12, 2003 @08:19AM (#7699756)
    I know where I work we recently (for an IT pat on the back) calculated our total network-accessible storage capacity and came in at a rough estimate of about 150TB. Now that is a giant swath of data, and a decent amount is in databases (an MSSQL farm), but scattered across thousands of DBs.

    It takes a truly amazing staff to maintain (back up, administer, sit and stare at screens) the servers and maintain the integrity of the data but, good lord...

    A 94.3TB database? My utmost and highest kudos to the DBAs and admins there. That is one gigantic task to operate. Being AT&T, and assuming a great deal of it is billing and maintenance functions, these have to be up a good three nines if not greater.

    Regardless of the result of the study, which, without actually reading the entire thing, amounts to a short read of a geek pissing contest, I find it truly amazing how much work, how many man-hours, and how many midnight pager calls go into maintaining these databases. I know I don't want our DBAs' jobs, and I certainly wouldn't want to be a DBA on a 94.3TB farm, but I know those that do and love doing it. It's a specialty skill, and apparently these guys do it right...

    Kudos...
    • I agree, this is amazing. What's even more amazing is looking at the vendor: AT&T. This is a home-grown RDBMS! They not only maintain the largest database, but write the software that makes it run!!!
    • I always wonder about large systems like that. They develop procedures and policies and a whole layer of bureaucracy to try and keep a firm grip on them, but they always seem to become an entity unto themselves that just *seems* to be under control, when in reality no two or three guys have enough access and enough experience with the thing to know exactly what's there.

      Or maybe I just lack imagination...
        they always seem to become an entity unto themselves that just *seems* to be under control, when in reality no two or three guys have enough access and enough experience with the thing to know exactly what's there.

        Turns out after AT&T deleted an ex-employee's porn, mp3, and warez stash he was hiding in his own personal table they were able to optimize the database down to about 3GB of customer billing data. You just can't find good help these days.

    • Being AT&T, and assuming a great deal is billing and maintenance functions

      Oh how naive! It may be AT&T but the DB will still be run by a bunch of nerds...

      "Right, boss needs a client list"
      .. login... ok..

      > use bigassdb;
      > show tables;
      games
      porn
      mp3s
      films
      tv
      other

      ..
      "Ok clients must be in here somewhere..."
  • by CompWerks (684874) on Friday December 12, 2003 @08:21AM (#7699764)
    They claim to have over 300tb of data.

    Quote:
    "The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month." Taken from here [archive.org]

  • I cannot see what OS each DB is running on. Is that irrelevant?
    • In some of the cases it is obvious from the vendor.
      HP is likely HP-UX, Sun is Solaris, MS is Windows of some variant. The ambiguous one is IBM: it could be AIX, Solaris, Windows, or something else.
      • IBM could, of course, be linux or OS/2, but I doubt either of them is being used for any large database, although once they get linux scaling well on p690s, we'll see what happens. Oracle 10g is also based around "grid" databases using clusters of smaller servers to achieve higher throughput which bypasses the need for scaling on a server level.

        IBM still have the real "big iron" in their mainframes, but AFAIK, they don't tend to do the largest databases, just ones where they are (a) running legacy code o

  • Anonymous (Score:5, Funny)

    by suso (153703) on Friday December 12, 2003 @08:29AM (#7699834) Homepage Journal
    Not only does Anonymous say a lot of things and write some music and paint, but he also has one of the world's largest databases.
  • by UnknowingFool (672806) on Friday December 12, 2003 @08:30AM (#7699841)
    While it is nice to see the ranking in terms of size and usage, it would be nice if the survey ranked other factors like maintenance time and number of users to see how they really compare in operation. Largest number of OLTP might signify lower downtime but maybe not.
  • Winter Corp's own results database shoots to number one in the 'Peak Workload' rankings after being linked to from Slashdot...
  • Doh! (Score:2, Funny)

    by Dilaudid (574715)
    I wrote up a brief summary of the top three winners in each category for those too lazy to browse the interactive WinterCorp chart

    Hmm - how to /. your own website in one simple step?

  • Boy is this slanted. I work on large IBM machines with DB2 built in... Where are those?

    Someone else wrote about Google; it should be in this listing too, even if it is using an in-house-developed DB.

    Platforms: Windows or Unix... BAH!
  • SMP? (Score:5, Informative)

    by paulbd (118132) on Friday December 12, 2003 @08:36AM (#7699893) Homepage
    Does anybody believe that the "SMP" used in reference to the France Telecom DB means "symbol manipulation program" rather than "symmetric multiprocessing"? How are we supposed to take seriously a study (or at least a report about the study) where they just look up acronyms with no understanding?
    • Re:SMP? (Score:3, Informative)

      by RapaNui (242132)
      Yup.

      Methinks the character who wrote the article came across the term 'SMP', went to FOLDOC or The Jargon File, and whaddya know - the first hit returns 'Symbol Manipulation Program - Stephen Wolfram's yadda yadda yadda'.

  • Some things in life are scored 1-10
    Some are scored 10-1

    shouldn't the overall best performer have been ranked 1984? and the rest from there?

  • Genomic databases (Score:2, Interesting)

    by xplenumx (703804)
    I'm absolutely shocked that the NCBI's [nih.gov] (National Center for Biotechnology Information - part of the NIH) genomic and proteomic search engine BLAST [nih.gov] isn't included in the list. BLAST is consistently used by scientists worldwide to search the genomes of several organisms. I'm similarly shocked that MEDLINE / PubMed [nih.gov] isn't included, as it's the primary database for searching published scientific literature. When I think of databases, I think of these two sites - not Amazon.
    • Yes, but their traffic is minuscule compared to a dot-com like Amazon. Also, I don't know the details of the BLAST backend, but I'm not sure it even counts in this competition. It is a conglomeration of tools and several datasets, not incorporated as a single database.

      -Sean
    • My database professor [nih.gov] gave us the rundown of the technologies the NIH databases employ - it's some impressive business! Researchers all over the world are indexing and adding papers... SCREW Amazon!
  • Frightening (Score:3, Interesting)

    by water-and-sewer (612923) on Friday December 12, 2003 @08:45AM (#7699957) Homepage
    Why am I simultaneously frightened and amazed to note that two of the winners are the United States' customs and border patrol database and Experian's credit rating database? If you've ever checked your credit rating [nechako.bc.ca] you'd realize this company and its peers (Equifax, etc.) maintain a tremendous amount of information on you, and charge you to verify it. Finding out why your credit is bad, and in the case of a mistake, changing it, is an expensive and time-consuming task.
  • Anyone else notice if you go to wintercorp.com it states:

    The TopTen Program is sponsored by Hewlett-Packard, Microsoft, Oracle, Sybase, and Teradata, a division of NCR.

    Makes you wonder how definitive this survey really is.

  • by Anonymous Coward
    The size of the database isn't all that interesting. What is more important from a maintenance and reliability perspective is size in relation to average and peak loads. Who cares if you have 3TB of data in MS SQL Server if it takes you 10x longer to run the same query than on Teradata or Oracle? For small databases, who cares - any of the major databases can handle several GB of data without any problems. But there is a huge difference between Teradata, Oracle, Sybase, DB2 and MS SQL Server. SQL Server can't han
  • Databases not ranked (Score:2, Interesting)

    by Hungus (585181)
    I find it interesting that the largest database is only 2TB larger than the one I recently built. It is a medical system. 66 MySQL servers bear the load, but I usually have only 30 of them actually active, as the rest are mirrors and logging masters. Typical connections: 4500 at any given time.
  • I wonder if any of these are large government surveillance databases?
  • OK, I'll be flamed for technical illiteracy, but there are a number of archival systems which go into the petabyte (1,000-terabyte) range yet are still relational databases with row-level access.

    One I worked on stored the output of Cray supercomputers running modelling programs 24x7. The data was output to a bank of Teradata boxes and then archived to tape. The system had a robot tape librarian at the back end but could still operate as a relational database.

    The historical data should all be in there by
  • I had always heard that Wal-Mart maintained one of, if not the, biggest databases in the world. Kmart appears on one or two of the top-ten lists here, but not Wal-Mart. Anybody know what gives?

    I'd expect the truly largest databases to be maintained by financial institutions (banks, credit card companies, the stock markets, etc.) based on the sheer volume of transactions. Either them or the NSA or the FBI.
  • Daytona? (Score:3, Insightful)

    by wandazulu (265281) on Friday December 12, 2003 @10:06AM (#7700658)
    Is it just me, or is this the first time anyone has heard of AT&T's Daytona? A quick Google [google.com] search reveals a pdf and 8 links before Daytona becomes Daytona Beach. For such a high ranking, I'd think AT&T would want to make it better known that they have this system.
  • My first reaction is that, if France Telecom has the largest (non-hybrid) proprietary relational data storage, at 29 TB, ahead of AT&T and SBC, at around 26TB each, that France Telecom must have a bunch of redundant data lying around.

    As of 2001-01-01 [ambafrance-zm.org], France had a population of about 59 Million. As it turns out, however, France Telecom (FTE) provides services to a dozen countries, not just France. Checking Yahoo! Finance, I see that

    FTE had 2002 revenues of 49B [yahoo.com], with 240,000 employees.
    ATT had 2002 revenues of 40B [yahoo.com], with 71,000 employees.
    Finally, SBC had 2002 revenues of 43B [yahoo.com], with 175,000 employees.

    So nothing terribly unusual about the size of their database. But it's obvious that the French employees are a bunch of unproductive slackers...
  • bah, meaningless (Score:4, Interesting)

    by kpharmer (452893) * on Friday December 12, 2003 @10:34AM (#7701014)
    This is like ranking projects based on largest number of lines of code.

    Without system descriptions (like in TPC) it merely shows that such a top end is feasible.

    What about total cost?
    annual cost?
    time to build?
    software versions?
    hardware?
    staffing composition?

    I mean really, a 500 gbyte database on a modest single CPU server is far more challenging than a 2 TB database on a 64-CPU E10k.

  • We are larger: 500TB (Score:3, Interesting)

    by SilverSun (114725) on Friday December 12, 2003 @09:39PM (#7708131) Homepage
    I don't understand their counting. Not that I am happy with it, but we (BaBar) certainly have a much larger database than all of these companies. And since we also have several computing farms summing to several thousand CPUs which process the data constantly, I doubt that they have a higher load.

    Press release:

    http://www.slac.stanford.edu/slac/media-info/20020412/database.html

    Cheers
