

World's Largest Databases Ranked
prostoalex writes "Winter Corp. has summarized its findings of the annual TopTen competition, where the world's largest and most hard-working (in terms of load) databases are ranked. The results are in, and this year the contestants were ranked on size, data volume, number of rows and peak workload. I wrote up a brief summary of the top three winners in each category for those too lazy to browse the interactive WinterCorp chart."
Google (Score:5, Interesting)
Re:Google (Score:5, Informative)
see http://www.cs.rochester.edu/sosp2003/papers/p125-
Re:Google (Score:5, Informative)
One might see a database as merely a "big file" with mechanisms to access and modify it consistently (and surely, Google has some means to ensure consistency). A big file is not disqualified from being called a "database" just because it is not produced by one of {Oracle, MS-SQL, ...} or cannot be queried in SQL.
It is also possible to consider the Web to be a database (of Web sites). Or an XML, BibTeX, dbm, or whatever file.
Sebastian
Re:Google (Score:3, Informative)
Re:Google (Score:5, Informative)
Since Google's stuff seems to be developed in-house, they don't have a major database vendor to nominate them.
Re:Google (Score:5, Informative)
Doesn't have to be relational (Score:4, Interesting)
Re:Google (Score:3, Insightful)
IMHO some of them didn't want to be in that list.
Re:Google (Score:2, Informative)
IMHO some of them didn't want to be in that list.
If you look at "database size", number 4 is listed as anonymous. They probably aren't too interested in telling everyone what database and platform they are using to store very critical data.
MasterCard (Score:4, Interesting)
I agree that there are many companies who would not want to be in that list. There's a small competitive advantage if you keep what technology you use secret.
Re:AmEx (Score:3, Informative)
Big. 7 data silos big. Each silo holds 50k tapes, each tape was 30gb, and it usually took 4 days to load.
[1] Epsilon was originally an AmEx division, which was spun off to keep other customers happy (banks and other CC companies).
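For a sense of scale, the silo figures above can be multiplied out. This is only a back-of-the-envelope sketch: it assumes every silo is full and uses decimal units (1 TB = 1,000 GB), neither of which the post states.

```python
# Back-of-the-envelope capacity from the figures quoted in the post.
# Assumptions (not in the post): all silos are full, decimal units.
SILOS = 7
TAPES_PER_SILO = 50_000
GB_PER_TAPE = 30

total_gb = SILOS * TAPES_PER_SILO * GB_PER_TAPE
total_tb = total_gb / 1_000
total_pb = total_tb / 1_000

print(f"{total_gb:,} GB = {total_tb:,.0f} TB = {total_pb} PB")
```

That works out to roughly 10.5 PB of raw tape capacity, if the numbers are taken at face value.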
My porn database (Score:3, Funny)
Re:My porn database (Score:5, Funny)
What are the pros and cons to databasing (sp.?) your porn? - except perhaps, reduced chance of getting a girlfriend, and chance of ridicule, obviously...
Hey, this is the right place to ask ;)
Re:My porn database (Score:2, Interesting)
Re:My porn database (Score:2, Funny)
Plus 'leafing' through it is half the fun.
SQL Server? (Score:5, Interesting)
I would have liked to see SQL vs non-SQL ranking too.
Re:SQL Server? (Score:2)
Yes, in this case - look at the "Vendor" column. Note that in the past both MS and Sybase called their database "SQL Server"; nowadays Sybase calls it "Adaptive Server". Sybase IQ is highly optimized for DSS work, whereas AS is optimized for OLTP.
Re:SQL Server? (Score:3, Funny)
Re:SQL Server? (Score:5, Informative)
Typical Microsoft calling their product something generic that should apply to any SQL server. Almost like calling a product .. Windows.
It was originally called Sybase SQL Server but was later picked up by MS, who adopted the name. Typical /. objectivity.
Re:SQL Server? (Score:4, Insightful)
Well, "SQL server" is a stupid way to refer to an RDBMS. That's like calling Apache "perl-server". I'm not surprised the only people choosing to name their RDBMS products SQL-something-or-other are the open source developers and Microsoft. Also, I've never heard of MS suing MySQL or PostgreSQL for use of the term SQL in relation to an RDBMS.
Besides, the product is officially called Microsoft SQL Server and has always been, just like Microsoft Windows, but everybody refers to it as SQL Server or, if there is possibility of confusion, MS SQL Server or MSSQL for short. Is it malevolence on the part of Microsoft if people can't be bothered to use the full name of each and every one of their products?
Re:SQL Server? (Score:2)
MS did an amazing PR job to present it as their invention, which is totally wrong.
SQL first got attention from Oracle after being ignored by all the others, and you see what they have become after caring about it.
If I remember correctly, SQL, relational databases, and relational queries surfaced in 1978 or somewhat later.
Re:SQL Server? (Score:3, Interesting)
Re:SQL Server? (Score:5, Insightful)
sPh
Re:SQL Server? (Score:4, Insightful)
Spam databases (Score:2, Insightful)
Re:Spam databases (Score:2)
No IMS? (Score:5, Interesting)
Re:No IMS? (Score:2, Funny)
looks like you got a typo in your question there. let me fix it for you.
I thought that 90% of the world's data was irretrievably trapped in MS?
Re:No IMS? (Score:2)
(Uh.... my head hurts..... what's this IMS anyway?)
Re:No IMS? (Score:5, Informative)
IMS is the database that was used to keep track of things for the moonshot. It is an IBM product. It is hierarchical as opposed to relational. Because of this it can do certain things very quickly, though in general it isn't as flexible as say DB2. Because it has been around so long, applications where having a DB was really important tend to have bought IMS a long time ago and developed systems around it. If your system is old enough, large enough and still works well for you there is no need to migrate to relational. Most of the world's financial transactions pass through an IMS system at some point. It is very stable and has uptimes that measure in years if not decades by now.
Because of this I am surprised that it is not on the list. There are really big IMS databases out there that run a lot of transactions. Because it isn't relational there is some bigotry against it and it is ignored in the popular press.
Re:No IMS? (Score:2)
Dude, where I come from the popular press doesn't often run stories on database architecture of any description - they're more into celebrity gossip and stuff.
Hmmm (Score:2, Interesting)
OK so this is obviously only vendors of databases and RDBMS systems.
In a broader sense aren't such things as the wayback machine [archive.org] a database? What about the truly massive amounts of data gathered at research labs, e.g. CERN [web.cern.ch]. Who's the daddy of these guys?
wintercorp climbing up the ratings now.. (Score:2)
Re:wintercorp climbing up the ratings now.. (Score:2, Interesting)
What surprised me... (Score:5, Interesting)
Re:What surprised me... (Score:2)
It would be an interesting reference point.
Re:What surprised me... (Score:2)
I agree, it would.
I wouldn't be able to take a stab at the actual numeric value for your answer, but I believe that Slashdot (as most large, content-driven websites need to do) caches a lot of data, so that it doesn't need to be queried out of the database every single time somebody requests the page. That greatly cuts down on the actual number of queries being slung at the database.
Re:What surprised me... (Score:2)
so that it doesn't need to be queried out of the database every single time somebody requests the page.
Agreed.
I've constantly remarked that my threshold=-1 story grabs are so quick to come back.
Re:What surprised me... (Score:5, Informative)
S'okay, I have plenty
But it surprised me that the peak workloads were measured in 100s of concurrent queries. If I had to make a wild guess, I would have guessed 10s of thousands. My blessed ignorance destroyed.
You would typically see tens of thousands (or more) of concurrent connections to a middleware layer - like Tuxedo - which would then multiplex them down to hundreds of connections to the database. This is because there is a lot of latency in establishing a connection, in fact logging in often takes an order of magnitude longer than running an actual query, yet few users submit transactions nonstop. So there is no sense in maintaining tens of thousands of expensive user contexts on the DB server, and there is no sense in requiring intermittent (relatively speaking) users to log out after a short idle period. Middleware does nothing but manage concurrent user contexts, and it can do so very efficiently. A database can't, because it tries to preallocate as much context as it can, and that doesn't match real-world usage patterns, and anyway, database vendors concentrate on their SQL engines and leave middleware vendors to manage the rest.
Of course, if you are a big database vendor, you probably also sell middleware, but there's no-one who tries to bundle the two into one, any more than you'd want a web server to have its own filesystem.
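The multiplexing described above can be sketched in a few lines of Python. This is not how Tuxedo or any real middleware is implemented; `FakeConnection` and `ConnectionPool` are hypothetical stand-ins, and the numbers (200 clients, 4 sessions) are arbitrary.

```python
import queue
import threading

class FakeConnection:
    """Stand-in for an expensive, already-logged-in database session."""

    def __init__(self, conn_id):
        self.conn_id = conn_id

    def run(self, sql):
        return f"conn{self.conn_id}: {sql}"

class ConnectionPool:
    """Hands a small, fixed set of sessions to many callers."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):        # pay the login cost once, up front
            self._idle.put(FakeConnection(i))

    def execute(self, sql):
        conn = self._idle.get()      # blocks until a session is free
        try:
            return conn.run(sql)
        finally:
            self._idle.put(conn)     # hand the session to the next caller

pool = ConnectionPool(size=4)
results = []
results_lock = threading.Lock()

def client(n):
    reply = pool.execute(f"SELECT {n}")
    with results_lock:
        results.append(reply)

threads = [threading.Thread(target=client, args=(n,)) for n in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()

used = {reply.split(":")[0] for reply in results}
print(len(results), "requests served over", len(used), "database sessions")
```

The database only ever sees four sessions, however many clients show up; clients that arrive while all four are busy simply wait their turn in the queue.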
Re:What surprised me... (Score:5, Funny)
Here I find a knowledgeable person on Slashdot,
Who has given a well-written response,
Answered the question without flaming the askee,
Didn't use numbers/symbols for letters,
Never slammed MS or SCO,
And was modded up?
Re:What surprised me... (Score:3, Funny)
None dare mod down those w/ 4 digits.
Re:What surprised me... (Score:3, Insightful)
Another factor could be caching; if intelligently used could cut down on the DB workload substantially.
29 TB is the biggest? (Score:4, Interesting)
I recognize Oracle and DB2, but could someone give a brief synopsis of what the other database systems are? And what is an MPP archetype?
Re:29 TB is the biggest? (Score:5, Informative)
You can find the link to the article yourself but
Re:29 TB is the biggest? (Score:2)
IMHO, with decades of accumulated data, the FBI and NSA would be at the top, at petabyte levels.
Re:29 TB is the biggest? (Score:3, Interesting)
Obviously data collected from places like Arecibo wouldn't lend themselves to this kind of survey, even though it must be vastly larger, but what about storage of particle vectors from nuclear event simulations? I'm guessing that they were either not nominated or declined to be lis
Re:29 TB is the biggest? (Score:2, Interesting)
Re:29 TB is the biggest? (Score:2)
i doubt they just have one copy and a little guy sat next to it feeding in DDS3 tapes all day and night
Re:29 TB is the biggest? (Score:5, Informative)
I support very large Oracle databases for a living (very large meaning > 1TB), databases that must be up 24/7. Backups are done in a number of different ways:
1) Disk syncs, block by block, between disk subsystems at disparate locations, to retain multiple copies of a database in different locations. They can be synced to more than one location too, so you can have as many copies of the database as you want. Your main database is the only "hot" database, the others can be brought up and recovered if needed. We mainly use EMC disk subsystems to do this, the process is called BCV (can't remember what that stands for right now)
2) Real-time replication. One-to-one or one-to-many. All databases are "hot" at all times. This can be great for load balancing too, since you can have multiple systems online at the same time. Very difficult to maintain and monitor.
Large databases just can't be put to tape anymore. Even if you did, it would take days or weeks to recover them if they failed. Disk to disk is about the only way to provide backups for really large databases.
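To put a rough number on "days or weeks", here is a small Python estimate. The tape throughput and drive count are pure assumptions for illustration, not figures from the post; only the database size (the 29,232 GB France Telecom entry discussed in this thread) comes from the survey.

```python
# Rough restore-time estimate. Throughput and drive count are assumptions.
DATABASE_GB = 29_232            # France Telecom figure quoted in this thread
TAPE_MB_PER_SECOND = 30         # hypothetical sustained rate for one drive
DRIVES = 1                      # hypothetical; more drives divide the time

def restore_days(size_gb, mb_per_second, drives=1):
    """Days needed to stream size_gb back from tape (decimal units)."""
    seconds = size_gb * 1_000 / (mb_per_second * drives)
    return seconds / 86_400

print(f"{restore_days(DATABASE_GB, TAPE_MB_PER_SECOND, DRIVES):.1f} days")
```

Under those assumptions a single drive needs about eleven days of uninterrupted streaming, before counting tape changes or log replay.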
Re:29 TB is the biggest? (Score:3, Informative)
Standby databases are popular: in the Oracle scenario, the archived log files from your hot production database are constantly and automatically applied to a cold standby database in a different location, and if something happens to the primary it takes very little time to bring the standby up.
Also, Oracle hot backup is by nature incremental: you can do, say, one tablespace per night and don't have to do the whole database at the same time (while backing up all the archived log files). I have se
Re:29 TB is the biggest? (Score:2)
Let's see:
Stanford Linear Accelerator Center 828,293 - Objectivity DB - Cluster - Objectivity - Sun - Sun
You can find that under 'database size, hybrid'. Note that this is an object database and as such will never be found under one of the 'number of rows' entries, simply because rows are relational and an object base simply stores objects.
I believe that CERN has got a huge odbms also.
Re:29 TB is the biggest? (Score:5, Interesting)
Personally, it has its drawbacks, but if the indexing is right, you can join hundred-million-row tables at amazing speed. Based on my experience in data warehousing, it's performance Oracle can't touch (no, I'm not paid by NCR...just a user).
http://www.teradata.com
Overview:
http://www.teradata.com/t/go.aspx/?i
Switches (Score:3, Funny)
I wonder how much of this database is every time users have switched to and from AT&T to get those cash bonuses!
94.3TB!?!?! (Score:5, Interesting)
It takes a truly amazing staff to maintain the servers (backup, administration, maintenance, sitting and staring at screens) and maintain the integrity of the data but, good lord...
A 94.3TB database? My utmost and highest kudos to the DBMAs and admins there. That is one gigantic task to operate. Since it's AT&T, and assuming a great deal of it is billing and maintenance functions, these have to be up, I'm sure, a good three nines if not greater.
Regardless of the result of the study (and without actually reading the entire study, the end results are simply a short read of a geek pissing contest), I find it truly amazing how much work, man-hours, and midnight pager calls go into maintaining these databases. I know I don't want our DBMAs' jobs and certainly wouldn't want to be a DBMA on a 94.3TB farm, but I know those that do and love doing it. It's a specialty skill and apparently these guys do it right...
Kudos...
Re:94.3TB!?!?! (Score:2)
Re:94.3TB!?!?! (Score:4, Insightful)
What else do you expect from the company that kinda sorta wrote Unix?
Re:94.3TB!?!?! (Score:2)
Or maybe I just lack imagination...
Re:94.3TB!?!?! (Score:3, Funny)
Turns out after AT&T deleted an ex-employee's porn, mp3, and warez stash he was hiding in his own personal table they were able to optimize the database down to about 3GB of customer billing data. You just can't find good help these days.
Re:94.3TB!?!?! (Score:2, Funny)
Oh how naive! It may be AT&T but the DB will still be run by a bunch of nerds...
"Right, boss needs a client list"
> use bigassdb;
> show tables;
games
porn
mp3s
films
tv
other
"Ok clients must be in here somewhere..."
Archive.org not on the list? (Score:4, Interesting)
Quote:
"The Internet Archive Wayback Machine contains over 300 terabytes of data and is currently growing at a rate of 12 terabytes per month." Taken from here [archive.org]
Re:Archive.org not on the list? (Score:2, Informative)
Re:Archive.org not on the list? (Score:2)
Re:Archive.org not on the list? (Score:4, Interesting)
But it doesn't say what OS? (Score:2, Interesting)
Re:But it doesn't say what OS? (Score:2)
HP is likely HP/UX, Sun is Solaris, MS is Windows of some variant. The ambiguous one is IBM: it could be AIX, Solaris, Windows or something else.
Re:But it doesn't say what OS? (Score:2)
IBM still have the real "big iron" in their mainframes, but AFAIK, they don't tend to do the largest databases, just ones where they are (a) running legacy code o
Re:But it doesn't say what OS? (Score:2)
No it's not. z/OS is effectively the current version of IBM's MVS operating system, which goes back to the 60s. It does have a POSIX-compliant implementation of UNIX available for applications to use if they want (USS - UNIX System Services), but it's not a UNIX platform, especially not when referring to DB2 for z/OS, which is a native MVS application.
Anonymous (Score:5, Funny)
Re:Anonymous (Score:2)
Other factors? (Score:3)
And in other news... (Score:2)
Doh! (Score:2, Funny)
Hmm - how to /. your own website in one simple step?
Only Windows and Unix? (Score:2)
Someone else wrote about Google; it should be in this listing too, even if it is using an in-house developed DB.
Platforms: Windows or Unix... BAH!
SMP? (Score:5, Informative)
Re:SMP? (Score:3, Informative)
Methinks the character who wrote the article came across the term 'SMP', went to FOLDOC or The Jargon File, and whaddya know - the first hit returns 'Symbol Manipulation Program - Stephen Wolfram's yadda yadda yadda'.
should it really have been traditional ranking? (Score:2)
Some are scored 10-1
shouldn't the overall best performer have been ranked 1984? and the rest from there?
Genomic databases (Score:2, Interesting)
Re:Genomic databases (Score:2)
-Sean
Me Too! (Score:2)
Frightening (Score:3, Interesting)
Sponsorship (Score:2)
Anyone else notice if you go to wintercorp.com it states:
Makes you wonder how definitive this survey really is.
put things in perspective (Score:2, Informative)
Databases not ranked (Score:2, Interesting)
Lots of 'Anon' entries (Score:2)
Bah, that's nothing -- let's talk Petabytes (Score:2)
One I worked on stored the output of Cray supercomputers running modelling programs 24x7. The data was output to a bank of Teradata boxes and then archived to tape. The system had a robot tape librarian at the back end but could still operate as a relational database.
The historical data should all be in there by
Walmart (Score:2)
I'd expect the truly largest databases to be maintained by financial institutions (banks, credit card companies, the stock market, etc.) based on the sheer volume of transactions. Either them or the NSA or the FBI.
Daytona? (Score:3, Insightful)
France Telecom? They must be doing something wrong (Score:4, Funny)
As of 2001-01-01 [ambafrance-zm.org], France had a population of about 59 Million. As it turns out, however, France Telecom (FTE) provides services to a dozen countries, not just France. Checking Yahoo! Finance, I see that
FTE had 2002 revenues of 49B [yahoo.com], with 240,000 employees.
ATT had 2002 revenues of 40B [yahoo.com], with 71,000 employees.
Finally, SBC had 2002 revenues of 43B [yahoo.com], with 175,000 employees.
So nothing terribly unusual about the size of their database. But it's obvious that the French employees are a bunch of unproductive slackers...
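The comparison above boils down to revenue per employee, which is easy to work out in Python. This sketch takes the quoted 2002 figures at face value and treats all three revenues as being in the same currency, which the post does not actually say.

```python
# Revenue per employee from the figures quoted in the post.
# Assumption: all revenues are in the same (unstated) currency.
companies = {
    "FTE": (49_000_000_000, 240_000),
    "ATT": (40_000_000_000, 71_000),
    "SBC": (43_000_000_000, 175_000),
}

per_employee = {
    name: revenue / employees
    for name, (revenue, employees) in companies.items()
}

for name, value in sorted(per_employee.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:,.0f} per employee")
```

That gives roughly 204,000 for FTE, 246,000 for SBC and 563,000 for ATT, so on these numbers ATT is the outlier rather than France Telecom.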
bah, meaningless (Score:4, Interesting)
Without system descriptions (like in TPC) it merely shows that such a top-end is feasible.
What about total cost?
annual cost?
time to build?
software versions?
hardware?
staffing composition?
I mean really, a 500 gbyte database on a modest single CPU server is far more challenging than a 2 TB database on a 64-CPU E10k.
We are larger: 500TB (Score:3, Interesting)
Press release:
http://www.slac.stanford.edu/slac/media-info/20
Cheers
Re:No, it's 30,000GB (Score:4, Informative)
Re:No, it's 30,000GB (Score:2)
Re:Hang on ... (Score:2)
Re:Hang on ... (Score:2)
France Telecom's Oracle database is around 30 TB in size (29,232 GB... that's a comma, not a decimal point).
Re:Hang on ... (Score:2)
Tim
Re:Hang on ... (Score:2, Informative)
In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500GB of data for Microsoft Corp.'s Windows and NT platforms and 1TB of data for all other platforms.
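The entry rule quoted above is simple enough to write down as a check. This is just an illustration of the two thresholds, not anything WinterCorp published; the function name is made up, "Windows and NT" is collapsed into one platform key, and 1 TB is taken as 1,000 GB.

```python
# Minimum sizes from the quoted rule. Treating 1 TB as 1000 GB is an assumption.
MIN_GB_BY_PLATFORM = {"windows": 500}   # Windows/NT get the lower bar
DEFAULT_MIN_GB = 1_000                  # every other platform

def qualifies(platform, size_gb):
    """True if a database of size_gb meets the bar for its platform."""
    minimum = MIN_GB_BY_PLATFORM.get(platform.lower(), DEFAULT_MIN_GB)
    return size_gb >= minimum

print(qualifies("Windows", 600))   # above the 500 GB bar
print(qualifies("Unix", 600))      # below the 1 TB bar
print(qualifies("Unix", 29_232))   # the France Telecom figure
```

So a 600 GB database gets in on Windows but not on anything else, which is worth remembering when reading the Windows rankings.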
Only on Windows platform! (Score:5, Informative)
Lastly, in the Windows OLTP category HP servers were used by 7 of 10 organizations, and Microsoft SQL Server was the DBMS choice for seven respondents.
Neither Windows NT nor MS SQL Server is generally a choice for the top databases. In fact, to make the entry in this list, a Windows database was required to be only half as big as databases on other platforms:
In order to qualify for the TopTen program consideration, any commercial production database implementation was required to feature a minimum of 500 GB of data for Microsoft Corp.'s Windows and NT platforms and 1 TB of data for all other platforms
ms
Re:Only on Windows platform! (Score:3, Informative)
By the way, I must just grumble at the lack of knowledge some people have on SQL Server. I sat in a meeting a few weeks ago with our Oracle-centric architects who decided that, as SQL Server is being used more and more extensively in our company, they'd better understand something about it. They started asking us various questions which rather puzzled me
Re:article also reports that (Score:2, Funny)
How was your test environment organised?
Oh no, you were being ironic, I must pay more attention.
Re:telemarketers (Score:2)