Data Storage Technology

World's Largest Databases Ranked

Posted by CowboyNeal
from the cylinder-sizing dept.
prostoalex writes "Winter Corp. has summarized its findings of the annual TopTen competition, where the world's largest and most hard-working (in terms of load) databases are ranked. The results are in, and this year the contestants were ranked on size, data volume, number of rows and peak workload. I wrote up a brief summary of the top three winners in each category for those too lazy to browse the interactive WinterCorp chart."


  • Google (Score:5, Interesting)

    by ScribeOfTheNile (694546) on Friday December 12, 2003 @09:09AM (#7699684) Homepage
    I would've expected to see Google in there somewhere.
  • SQL Server? (Score:5, Interesting)

    by B5_geek (638928) on Friday December 12, 2003 @09:10AM (#7699689)
    Does the SQL Server mean MS-SQL?

    I would have liked to see SQL vs non-SQL ranking too.
  • No IMS? (Score:5, Interesting)

    by John Harrison (223649) <johnharrison@nOsPam.gmail.com> on Friday December 12, 2003 @09:13AM (#7699714) Homepage Journal
    I thought that 90% of the world's data was irretrievably trapped in IMS? Seriously though, I am surprised that an IMS system isn't on the list. Probably because it isn't relational, and the people making the list figure that RDBMS are the only DB around.
  • Hmmm (Score:2, Interesting)

    by Cenuij (526885) on Friday December 12, 2003 @09:14AM (#7699725)

    OK so this is obviously only vendors of databases and RDBMS systems.

    In a broader sense aren't such things as the wayback machine [archive.org] a database? What about the truly massive amounts of data gathered at research labs, e.g. CERN [web.cern.ch]. Who's the daddy of these guys?

  • What surprised me... (Score:5, Interesting)

    by MyNameIsFred (543994) * on Friday December 12, 2003 @09:15AM (#7699735)
    I have none, nada, zip experience in big databases. But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.
  • by epiphani (254981) <epiphani&dal,net> on Friday December 12, 2003 @09:16AM (#7699741)
    I honestly doubt that 29.2 Terabytes is the biggest database in the world. But anyway...

    I recognize Oracle and DB2, but could someone give a brief synopsis of what the other database systems are? And what is an MPP archetype?

  • 94.3TB!?!?! (Score:5, Interesting)

    by Peridriga (308995) on Friday December 12, 2003 @09:19AM (#7699756)
    Where I work, we recently (as an IT pat on the back) calculated our total network-accessible storage capacity and came up with a rough estimate of about 150TB. That is a giant swath of data, and a decent amount of it is in databases (an MSSQL farm), but it's scattered across thousands of DBs.

    It takes a truly amazing staff to maintain the servers (back up, administer, perform maintenance, sit and stare at screens) and preserve the integrity of the data, but good lord...

    A 94.3TB database? My utmost and highest kudos to the DBAs and admins there. That is one gigantic task to operate. Given that it's AT&T, and assuming a great deal of it is billing and maintenance functions, I'm sure it has to be up a good three nines, if not better.

    Regardless of the result of the study, which (without reading the entire study) boils down to a short read of a geek pissing contest, I find it truly amazing how much work, how many man-hours, and how many midnight pager calls go into maintaining these databases. I know I don't want our DBAs' jobs, and I certainly wouldn't want to be a DBA on a 94.3TB farm, but I know people who do it and love it. It's a specialty skill, and apparently these guys do it right...

    Kudos...
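For a sense of what "three nines" means in practice, here is a small sketch (assuming a 365-day year) that converts an availability target into the downtime it allows: 99.9% works out to 8.76 hours per year, and four nines to about 52.6 minutes. The function name is purely illustrative.

```python
# Minimal sketch: convert an availability target expressed as "N nines"
# into the maximum downtime it permits per year.
# Assumption: a 365-day year (leap years ignored).
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def max_downtime_hours(nines: int) -> float:
    """Maximum downtime per year for an availability of `nines` nines.

    For example, nines=3 means 99.9% availability, i.e. the system may be
    unavailable for at most 0.1% of the year.
    """
    unavailability = 10 ** (-nines)  # fraction of the year the system may be down
    return unavailability * HOURS_PER_YEAR


if __name__ == "__main__":
    for n in range(2, 6):
        hours = max_downtime_hours(n)
        print(f"{n} nines: {hours:.2f} h/year ({hours * 60:.1f} min)")
```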
  • by CompWerks (684874) on Friday December 12, 2003 @09:21AM (#7699764)
    They claim to have over 300 TB of data.

    Quote:
    "The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month." Taken from here [archive.org]
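To put the quoted growth rate in perspective, here is a minimal sketch assuming growth simply stays linear at 12 TB per month (an assumption; the quote only gives the current rate). Under that assumption the archive would be about 444 TB a year later and would pass 1,000 TB after 59 months. Even at the quoted 300 TB it is already roughly three times the 94.3TB figure mentioned earlier in the thread.

```python
# Back-of-the-envelope projection for the Wayback Machine figures quoted above.
# Assumption: growth stays linear at the quoted rate (real growth may well accelerate).
START_TB = 300           # "over 300 terabytes" (treated as exactly 300 here)
GROWTH_TB_PER_MONTH = 12


def projected_size_tb(months: int) -> int:
    """Archive size after `months` months of linear growth."""
    return START_TB + GROWTH_TB_PER_MONTH * months


def months_until(target_tb: int) -> int:
    """Whole months of linear growth needed to reach at least `target_tb`."""
    remaining = max(0, target_tb - START_TB)
    return -(-remaining // GROWTH_TB_PER_MONTH)  # ceiling division on integers


if __name__ == "__main__":
    print(projected_size_tb(12))  # size one year out
    print(months_until(1000))     # months to pass 1,000 TB (1 PB, decimal)
```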

  • by nick_urbanik (534101) <nicku&nicku,org> on Friday December 12, 2003 @09:23AM (#7699781) Homepage
    I cannot see what OS each DB is running on. Is that irrelevant?
  • by mountainhouse (561889) on Friday December 12, 2003 @09:32AM (#7699858)
    I think the NCR Teradata approach is one of the most interesting. It is made up of a number of nodes (each a quad-processor Intel system with separate memory and disk), each broken down into a number of logical machines. Data is hashed across all the nodes in the system based on the data's indexing. So if two tables have the same indexing, the join takes place at the "logical machine" level, and then the result is spooled together. The largest systems approach 300 nodes, with over 2,000 logical machines and 150 TB of disk (some of it used to duplicate tables in case of node failure).

    Personally, I think it has its drawbacks, but if the indexing is right, you can join hundred-million-row tables at amazing speed. Based on my experience in data warehousing, that's performance Oracle can't touch (no, I'm not paid by NCR... just a user).

    http://www.teradata.com

    Overview:
    http://www.teradata.com/t/go.aspx/?id=84960
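The hash-distribution idea described above can be sketched in a few lines of Python. This illustrates the general technique (hash partitioning plus co-located joins), not Teradata's implementation: MD5 stands in for the real hash function, and `NUM_LOGICAL_MACHINES`, `machine_for`, `distribute` and `colocated_join` are hypothetical names. Because both tables are distributed on the same key, matching rows always land on the same logical machine, so each machine can join its own slice independently before the results are concatenated.

```python
# Illustrative sketch of hash-distributed storage with co-located joins.
# Assumptions: MD5 stands in for the (unspecified) hash function; the number of
# logical machines and all names here are hypothetical, not Teradata's API.
import hashlib
from collections import defaultdict

NUM_LOGICAL_MACHINES = 8  # hypothetical; the comment mentions 2,000+ on big systems


def machine_for(key: str, n: int = NUM_LOGICAL_MACHINES) -> int:
    """Map a row's distribution key to a logical machine by hashing it."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n


def distribute(rows, key_col, n=NUM_LOGICAL_MACHINES):
    """Partition rows so each lands on the machine its key hashes to."""
    parts = defaultdict(list)
    for row in rows:
        parts[machine_for(row[key_col], n)].append(row)
    return parts


def colocated_join(left_parts, right_parts, key_col):
    """Join two tables that were distributed on the same key.

    Matching keys always hash to the same machine, so each machine joins only
    its own partitions (no rows move between machines); the per-machine
    results are then concatenated ("spooled together").
    """
    result = []
    for m in sorted(set(left_parts) & set(right_parts)):
        # Local hash join on machine m: index the right-hand partition by key.
        index = defaultdict(list)
        for right_row in right_parts[m]:
            index[right_row[key_col]].append(right_row)
        for left_row in left_parts[m]:
            for right_row in index.get(left_row[key_col], []):
                result.append({**left_row, **right_row})
    return result


if __name__ == "__main__":
    customers = [{"cust_id": str(i), "name": f"customer {i}"} for i in range(20)]
    orders = [{"cust_id": str(i % 20), "order_id": i} for i in range(50)]
    joined = colocated_join(distribute(customers, "cust_id"),
                            distribute(orders, "cust_id"), "cust_id")
    print(len(joined), "joined rows")
```

If the two tables were distributed on different keys, one of them would first have to be re-hashed on the join key and shipped between machines, which is why getting the indexing right matters so much in this kind of design.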
  • by arrogance (590092) on Friday December 12, 2003 @09:41AM (#7699925)
    From the article: "the TopTen Program featured 141 qualified and validated surveys representing 23 countries spanning all major DBMS, server and storage vendor products." So it just has to be a DataBase Management System, not necessarily Relational.
  • Genomic databases (Score:2, Interesting)

    by xplenumx (703804) on Friday December 12, 2003 @09:41AM (#7699927)
    I'm absolutely shocked that the NCBI's [nih.gov] (National Center for Biotechnology Information, part of the NIH) genomic and proteomic search engine BLAST [nih.gov] isn't included in the list. BLAST is consistently used by scientists worldwide to search the genomes of several organisms. I'm similarly shocked that MEDLINE / PubMed [nih.gov] isn't included, as it's the primary database for searching published scientific literature. When I think of databases, I think of these two sites, not Amazon.
  • Frightening (Score:3, Interesting)

    by water-and-sewer (612923) on Friday December 12, 2003 @09:45AM (#7699957) Homepage
    Why am I simultaneously frightened and amazed to note that two of the winners are the United States' customs and border patrol database and Experian's credit rating database? If you've ever checked your credit rating [nechako.bc.ca], you'd realize this company and its peers (Equifax, etc.) maintain a tremendous amount of information on you, and charge you to verify it. Finding out why your credit is bad and, in the case of a mistake, changing it is an expensive and time-consuming task.
  • by Cenuij (526885) on Friday December 12, 2003 @09:56AM (#7700011)
    On that note...

    "Experiments at CERN will produce hundreds of TB of data per year at data rates up to 35MB/second starting in 1999," states Jamie Shiers, Project Leader at CERN. "Experience from the use of Objectivity/DB and HPSS on these experiments will help us understand how we can cope with the staggering 100PB of data at rates up to 1.5GB/second expected at CERN's Large Hadron Collider, starting in 2005."

    "The size of CERN's database is bigger than any numbers ever seen," according to Richard Winter, president of Winter Corp., a Boston-based consultancy specializing in VLDBs. "The growing use of non-traditional data types is producing a giant leap in database size. Such databases will soon be commonplace in engineering, commercial, and medical fields as well," concludes Winter.

    big mama db's [objectivity.com]
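The quoted rates are easier to compare with the other figures in this thread once converted into yearly volumes. A rough sketch, assuming decimal units and a 365-day year: 35 MB/s sustained all year would come to about 1,104 TB (roughly 1.1 PB), so "hundreds of TB per year" suggests the average rate sits well below the peak; at 1.5 GB/s, collecting 100 PB would take about 772 days of continuous running. The `duty_cycle` parameter is a hypothetical knob for the fraction of time spent taking data.

```python
# Back-of-the-envelope conversion of sustained data rates into yearly volumes.
# Assumptions: decimal units (1 MB = 10**6 bytes, 1 TB = 10**12, 1 PB = 10**15)
# and a 365-day year; the duty_cycle parameter is hypothetical.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000


def tb_per_year(bytes_per_second: float, duty_cycle: float = 1.0) -> float:
    """Terabytes written per year at a given rate, running duty_cycle of the time."""
    return bytes_per_second * SECONDS_PER_YEAR * duty_cycle / 1e12


def days_to_accumulate(total_bytes: float, bytes_per_second: float) -> float:
    """Days of continuous running needed to collect total_bytes."""
    return total_bytes / bytes_per_second / 86_400


if __name__ == "__main__":
    print(f"35 MB/s flat out: {tb_per_year(35e6):,.0f} TB/year")
    print(f"1.5 GB/s flat out: {tb_per_year(1.5e9) / 1000:,.1f} PB/year")
    print(f"100 PB at 1.5 GB/s: {days_to_accumulate(100e15, 1.5e9):,.0f} days")
```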

  • by Zocalo (252965) on Friday December 12, 2003 @10:00AM (#7700041) Homepage
    Or, if you include hybrids, 828.3TB owned by the Stanford Linear Accelerator Center. Frankly, I was expecting to see much larger figures than these from academia and large-scale research projects, Lawrence Livermore for example.

    Obviously, data collected from places like Arecibo wouldn't lend itself to this kind of survey, even though it must be vastly larger, but what about storage of particle vectors from nuclear event simulations? I'm guessing that they were either not nominated or declined to be listed on security grounds, rather than not rating high enough. Does anyone have any figures?

  • by bruthasj (175228) <(moc.oohay) (ta) (jsahturb)> on Friday December 12, 2003 @10:05AM (#7700075) Homepage Journal
    All the more proof that you don't need a stupid database for everything. Actually, they should include conventional static filesystems in the comparison. Because you know what, some IT people get hooked on trying to dump everything under the Sun into Oracle. This request is especially relevant for journaling/transaction-based filesystems and possibly the future Longhorn thingy, which has SQL capabilities.
  • by fritz1968 (569074) on Friday December 12, 2003 @10:05AM (#7700076)
    Here's a thought: how do they back up a database that is 94.3 TB? I deal with servers that have only a puny 100-150 GB, and one or two LTO tapes back up these servers. What tapes do they use to back up this database?
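A rough answer to the question, as a sketch: taking the roughly 100 GB per cartridge implied by "100-150 GB on one or two LTO tapes" (and 200 GB as an alternative assumption), a single full backup of 94.3 TB needs 943 cartridges (472 at 200 GB). The 15 MB/s per-drive throughput and the ten parallel drives below are purely hypothetical figures chosen to show how the time scales; at those numbers a full pass takes about 175 hours.

```python
# Rough sizing for backing up a 94.3 TB database to tape.
# Assumptions (not from the article): decimal units, native (uncompressed)
# cartridge capacity, and a hypothetical per-drive throughput; plug in the
# figures for whatever tape generation you actually run.
def tapes_needed(db_gb: int, tape_gb: int) -> int:
    """Cartridges needed for one full backup (integer ceiling division)."""
    return -(-db_gb // tape_gb)


def backup_hours(db_gb: float, mb_per_s_per_drive: float, drives: int) -> float:
    """Wall-clock hours for a full backup streamed across `drives` drives in parallel."""
    total_mb = db_gb * 1000
    return total_mb / (mb_per_s_per_drive * drives) / 3600


if __name__ == "__main__":
    db_gb = 94_300  # 94.3 TB
    for tape_gb in (100, 200):  # assumed native capacities per cartridge
        print(f"{tape_gb} GB tapes: {tapes_needed(db_gb, tape_gb)} cartridges")
    # Hypothetical: 15 MB/s per drive, 10 drives in parallel.
    print(f"~{backup_hours(db_gb, 15, 10):.0f} hours")
```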
  • Re:My porn database (Score:2, Interesting)

    by lonb (716586) on Friday December 12, 2003 @10:16AM (#7700161) Homepage
    I used to run a porn site, RezX.com (about six years ago). All the content, porn included, was served out of a db.
  • Databases not ranked (Score:2, Interesting)

    by Hungus (585181) on Friday December 12, 2003 @10:19AM (#7700178) Journal
    I find it interesting that the largest database is only 2TB larger than the one I recently built. It is a medical system. 66 MySQL servers bear the load, but usually only 30 of them are actually active, as the rest are mirrors and logging masters. Typical connections: 4500 at any given time.
  • MasterCard (Score:4, Interesting)

    by truthsearch (249536) on Friday December 12, 2003 @10:57AM (#7700548) Homepage Journal
    I left MasterCard in 1999 after working with their data warehouse. At the time, they had recently bought a 3-terabyte Sun E10000 with Oracle. They quickly ran out of space and added another terabyte. I'm also surprised not to see them on the list. They work closely with Oracle, which has an office down the street, since they have such high volume. The credit card transactions table alone gets 14 million new records on average every day.

    I agree that there are many companies who would not want to be in that list. There's a small competitive advantage if you keep what technology you use secret.
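For scale, the 14 million records a day quoted above average out to about 162 inserts per second and just over 5.1 billion rows per year. A minimal sketch of the arithmetic, assuming inserts spread evenly over the day and a 365-day year (real traffic is bursty, so peaks would be higher than this average):

```python
# Convert the quoted daily insert volume into average per-second and per-year figures.
# Assumptions: inserts spread evenly over a 24-hour day and a 365-day year.
ROWS_PER_DAY = 14_000_000  # "14 million new records on average every day"


def avg_rows_per_second(rows_per_day: int) -> float:
    """Average insert rate if the day's rows arrived uniformly."""
    return rows_per_day / 86_400


def rows_per_year(rows_per_day: int, days: int = 365) -> int:
    """Total rows added over `days` days at the given daily volume."""
    return rows_per_day * days


if __name__ == "__main__":
    print(f"{avg_rows_per_second(ROWS_PER_DAY):.0f} rows/s on average")
    print(f"{rows_per_year(ROWS_PER_DAY):,} rows/year")
```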
  • Re:SQL Server? (Score:3, Interesting)

    by Dr Caleb (121505) <thedarkknight AT hushmail DOT com> on Friday December 12, 2003 @11:10AM (#7700707) Homepage Journal
    If you want something that complies with the relational model and relational theory, skip SQL and go directly to IBM DB2 and RPG. SQL, as you say, is a kludge. DB2 as a language reminds me so much of assembler that I tend to liken it to opcode for databases. As you can tell, I'm a big IBM'er.

  • bah, meaningless (Score:4, Interesting)

    by kpharmer (452893) * on Friday December 12, 2003 @11:34AM (#7701014)
    This is like ranking projects based on largest number of lines of code.

    Without system descriptions (like those published for the TPC benchmarks), it merely shows that such a top end is feasible.

    What about total cost?
    annual cost?
    time to build?
    software versions?
    hardware?
    staffing composition?

    I mean really, a 500 GB database on a modest single-CPU server is far more challenging than a 2 TB database on a 64-CPU E10k.

  • Re:My porn database (Score:1, Interesting)

    by Anonymous Coward on Friday December 12, 2003 @11:40AM (#7701084)

    I keep MD5 hashes of all the porn pictures I have (about 25000 images so far), along with the galleries they relate to, so I don't get duplicates when I add new ones... I also have another DB table for galleries, which keeps track of the number of images included in each gallery and the traits of the chick/chicks in those pictures (young, hot, shaved, redhead, cartoon), although only about 60% of the galleries have been categorized.

    It runs on MySQL and is controlled by a couple of legacy PHP scripts (yes, I taught myself PHP so I could create a cool database for my porn).

    The pictures themselves are held in separate Blowfish-encrypted files (so my parents won't find 'em), and I also keep smaller (also encrypted) thumbnails of all images.

    The funny thing is that I rarely see any of these pictures after I enter them into my database, because I'm always looking for new ones (free6.com, thehun.com, ampland.com and spidering various newsgroups)... Oh well... maybe I'll just give 'em to my grandchildren some day.
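The duplicate check described above is a classic content-hash lookup. A minimal sketch, assuming a single table keyed by the MD5 of each file's bytes; it uses Python's built-in `sqlite3` as a stand-in for the MySQL/PHP setup in the comment, and the table and column names (`images`, `md5`, `gallery`) are hypothetical. The primary key makes the database itself reject a second copy of the same content.

```python
# Minimal sketch of content-hash deduplication, assuming a single table keyed
# by the MD5 of each image. Names are hypothetical; SQLite stands in for MySQL.
import hashlib
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " md5 TEXT PRIMARY KEY,"   # one row per distinct file content
        " gallery TEXT NOT NULL)"  # gallery the image was first seen in
    )
    return conn


def add_image(conn: sqlite3.Connection, data: bytes, gallery: str) -> bool:
    """Record an image; return False if identical bytes were already stored."""
    digest = hashlib.md5(data).hexdigest()
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO images (md5, gallery) VALUES (?, ?)", (digest, gallery)
            )
    except sqlite3.IntegrityError:  # PRIMARY KEY violation: duplicate content
        return False
    return True


if __name__ == "__main__":
    conn = open_index()
    print(add_image(conn, b"...image bytes...", "gallery-1"))  # True (new)
    print(add_image(conn, b"...image bytes...", "gallery-2"))  # False (duplicate)
```

This only catches byte-identical files: a resized or re-encoded copy of the same picture hashes differently.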

  • Re:Walmart (Score:1, Interesting)

    by Anonymous Coward on Friday December 12, 2003 @01:08PM (#7702273)
    Yes, Walmart does have one of the largest, but financial and government institutions have the largest. The NSA keeps track of everything on the net (yes, everything); they just don't have the tools to analyze all that data.

    I read an article in CRN (could be wrong here) about Visa's North American systems. They have two sites, one for the eastern half and one for the western, dividing the continent at the Mississippi River. The eastern site generates about 240 TB of data a month and could take over the whole continent if the western site went down. All this with just 4-5 IBM mainframes, probably running IMS.

    How often does your Visa transaction not get processed? On the shopping day after Thanksgiving? All this with just 4-5 boxes? I would like to see Sun, HP, Oracle, or Microsoft try that. Even if they could do it, it would be far more expensive to build and maintain.
  • Open Source DBs? (Score:2, Interesting)

    by Anonymous Coward on Friday December 12, 2003 @01:14PM (#7702362)
    Since neither PostgreSQL nor MySQL showed up in the list (not surprisingly), does anybody know what the largest databases running on either of them are?

    I would guess that PostgreSQL maxes out larger than MySQL. </fuel-on-the-fire>
  • Re:SQL Server? (Score:2, Interesting)

    by $ASANY (705279) on Friday December 12, 2003 @10:25PM (#7708060) Homepage
    The actual story is that in the mid-90's Microsoft bought the source code and rights to Sybase SQL Server 4.9.2 from Sybase, and then sued Sybase, claiming that the name "SQL Server" was part of the package they paid for. Sybase settled the case and relinquished the "SQL Server" name, rebranding its OLTP RDBMS as "Adaptive Server Enterprise".

    Now MS has overwhelmed Sybase with a derivation of its own technology, with MS's special additional bugs included, at a nominal price, largely because they know how to market and Sybase regularly fails to market its products effectively.

  • We are larger: 500TB (Score:3, Interesting)

    by SilverSun (114725) on Friday December 12, 2003 @10:39PM (#7708131) Homepage
    I don't understand their counting. Not that I am happy with it, but we (BaBar) certainly have a much larger database than all of these companies. And since we also have several computing farms, summing up to several thousand CPUs, which process the data constantly, I doubt that they have a higher load.

    Press release:

    http://www.slac.stanford.edu/slac/media-info/20020412/database.html

    Cheers
