Internet Archive Gets 4.5PB Data Center Upgrade

Lucas123 writes "The Internet Archive, the non-profit organization that scrapes the Web every two months in order to archive web page images, just cut the ribbon on a new 4.5 petabyte data center housed in a metal shipping container that sits outside. The data center supports the Wayback Machine, the Web site that offers the public a view of the 151 billion Web page images collected since 1997. The new data center houses 63 Sun Fire servers, each with 48 1TB hard drives running in parallel to support both the web crawling application and the 200,000 visitors to the site each day."

  • by fuzzyfuzzyfungus ( 1223518 ) on Wednesday March 25, 2009 @06:56PM (#27336293) Journal
    TFA indicates that they have a mirror at the library of Alexandria. Unless things have changed since last I read about them, the mirroring is pretty much it. The Internet Archive does very impressive work; but they don't have that much money. No Real Big Serious Enterprise tape silos here.
  • by Anonymous Coward on Wednesday March 25, 2009 @07:01PM (#27336339)
    1 Library of Congress = 10 terabytes [wikipedia.org]
    4.5 petabytes = 4608 terabytes [google.com]
    So, that's 460.8 LOCs.
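The conversion above can be reproduced in a few lines of Python. This is a minimal sketch: the 10 TB per Library of Congress (LOC) is the figure the comment cites, and the comment's 4608 TB implies binary prefixes (1 PB = 1024 TB). With decimal prefixes (1 PB = 1000 TB) the same calculation gives 450 LOCs. The function names are illustrative only.

```python
# Sketch: convert petabytes to terabytes and express the result in
# "Libraries of Congress" (LOC), using the figures quoted in the comment.
TB_PER_LOC = 10  # Wikipedia figure cited in the comment


def pb_to_tb(petabytes: float, binary: bool = True) -> float:
    """Convert PB to TB using binary (1024) or decimal (1000) prefixes."""
    factor = 1024 if binary else 1000
    return petabytes * factor


def to_locs(terabytes: float, tb_per_loc: float = TB_PER_LOC) -> float:
    """Express a capacity in TB as a number of Libraries of Congress."""
    return terabytes / tb_per_loc


if __name__ == "__main__":
    tb = pb_to_tb(4.5)
    print(f"4.5 PB = {tb:g} TB = {to_locs(tb):g} LOCs (binary prefixes)")
    print(f"4.5 PB = {to_locs(pb_to_tb(4.5, binary=False)):g} LOCs (decimal prefixes)")
```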
  • by commodore64_love ( 1445365 ) on Wednesday March 25, 2009 @07:08PM (#27336399) Journal

    83 terabytes in the LOC, so 4.5 petabytes == 54 Libraries of Congress

    4.5 petabytes == 4500 one-terabyte hard drives, times $75 each == ~$340,000 == how much taxpayers spend, each hour, to maintain the LOC
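This comment's arithmetic can be checked with a short Python sketch. The 83 TB per LOC and $75 per 1 TB drive are the commenter's numbers, not independently verified; decimal prefixes (4.5 PB = 4500 TB) are assumed to match the comment, and the claim about hourly LOC spending is not modeled. The helper names are hypothetical.

```python
import math

TB_PER_LOC = 83       # commenter's figure for the Library of Congress
DRIVE_TB = 1          # one-terabyte drives
DRIVE_PRICE_USD = 75  # commenter's per-drive price


def locs(total_tb: float, tb_per_loc: float = TB_PER_LOC) -> float:
    """Express a capacity in TB as a number of Libraries of Congress."""
    return total_tb / tb_per_loc


def drive_cost(total_tb: float, drive_tb: float = DRIVE_TB,
               price: int = DRIVE_PRICE_USD) -> tuple:
    """Return (number of drives needed, total cost in USD)."""
    drives = math.ceil(total_tb / drive_tb)
    return drives, drives * price


total_tb = 4500  # 4.5 PB with decimal prefixes
drives, cost = drive_cost(total_tb)
print(f"{locs(total_tb):.1f} LOCs, {drives} drives, ${cost:,}")
```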

  • In Other News (Score:5, Informative)

    by Erik Fish ( 106896 ) on Wednesday March 25, 2009 @07:09PM (#27336407) Journal

    Incidentally: FileFront [filefront.com] is closing in five days, taking with it any files that aren't hosted elsewhere.

    I am told that many of the Half-Life mods [filefront.com] hosted there are not available anywhere else, so get while the getting is good...

  • by Profane MuthaFucka ( 574406 ) <busheatskok@gmail.com> on Wednesday March 25, 2009 @07:29PM (#27336569) Homepage Journal
    Because after 1996 women shaved all their hair off due to a mistaken belief that men prefer their women to look like little girls. We don't, we like the big bushes, and that is why you must save that porn for the good of mankind.
  • Math (Score:3, Informative)

    by PowerKe ( 641836 ) on Wednesday March 25, 2009 @07:29PM (#27336579)
    63 servers * 48 disks of 1 TB = 3024 TB. According to the announcement [archive.org] on archive.org, 3 petabytes would be right.
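PowerKe's multiplication, and how the result reads in decimal and binary units, can be sketched in Python. The server and disk counts come from the summary; the 1 TB drive size is taken as the marketed decimal figure (10^12 bytes), which is an assumption, and the conversions use the standard SI (PB) and IEC (PiB) definitions.

```python
SERVERS = 63
DISKS_PER_SERVER = 48
DISK_TB = 1  # marketed drive size, assumed to be decimal terabytes

raw_tb = SERVERS * DISKS_PER_SERVER * DISK_TB
raw_pb = raw_tb / 1000                # decimal petabytes (10**15 bytes)
raw_pib = raw_tb * 10**12 / 2**50     # binary pebibytes (2**50 bytes)

print(f"raw: {raw_tb} TB = {raw_pb:.3f} PB = {raw_pib:.2f} PiB")
```

Either way, the 63 x 48 configuration lands near 3 PB of raw space, well short of the 4.5 PB in the headline.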
  • "Sun Fire" (Score:4, Informative)

    by fm6 ( 162816 ) on Wednesday March 25, 2009 @07:37PM (#27336647) Homepage Journal

    The new data center houses 63 Sun Fire servers

    That's not very specific. "Sun Fire" is a brand that for a while got applied to all of Sun's rack-mount servers (except for NEBS-compliant servers, which were and are called "Sun Netra"). A little confusing, of course, which is why they've started calling new SPARC boxes "Sun SPARC Enterprise" to differentiate them from those mangy x64 "Sun Fire" systems. Except that there are still SPARC systems called "Sun Fire", so I guess the confusion factor didn't get any better...

    Anyway, the specific server being used here is the Sun Fire X4500 [sun.com], a system with no fewer than 48 1 TB disks in a 4U space. Notice that this model is EOLed; presumably iarchive got a deal on some remaindered machines.

    The shipping container is something we've seen before [slashdot.org].

  • by scottrocket ( 1065416 ) on Wednesday March 25, 2009 @07:55PM (#27336801) Journal
    Yes, "The Wayback Machine", at archive.org. Coincidentally, I was there just last night, looking at a January '98 Slashdot.
  • Re:63 x 48 = 3024Tb (Score:5, Informative)

    by spinkham ( 56603 ) on Wednesday March 25, 2009 @07:55PM (#27336809)

    TFA says "...eight racks filled with 63 Sun Fire x4500 servers with dual- or quad-core x86 processors running Solaris 10 with ZFS. Each Sun server is combined with an array of 48 1TB hard drives." (emphasis mine)

    I would guess this means there's an X4500 with 24 TB of local disks and 48 TB of attached storage per machine. (24+48)*63 does give us the quoted number.
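spinkham's hypothesis can be checked against the summary's figure with a short Python sketch. The 24 TB local plus 48 TB attached split is the commenter's guess (and sits awkwardly with fm6's description of the X4500 as holding 48 disks itself), so the hypothetical `total_tb` helper below is a way to compare layouts, not a description of the actual hardware.

```python
SERVERS = 63


def total_tb(local_tb: int, attached_tb: int, servers: int = SERVERS) -> int:
    """Raw capacity if each server has local disks plus an attached array."""
    return (local_tb + attached_tb) * servers


summary_figure = 48 * SERVERS   # "each with 48 1TB hard drives"
guess = total_tb(24, 48)        # hypothetical: 24 TB local + 48 TB attached

print(f"48 disks per server:    {summary_figure} TB")
print(f"24 local + 48 attached: {guess} TB (~{guess / 1000:.1f} PB)")
```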

  • by Anonymous Coward on Wednesday March 25, 2009 @07:55PM (#27336811)

    In Brewster Kahle's December 2007 TED talk, he mentions a third mirror in the Netherlands.
    http://www.ted.com/index.php/talks/brewster_kahle_builds_a_free_digital_library.html [ted.com]

    As he puts it, the Archive is mirrored on 'a fault line, a flood plain, and in the Middle East'.

    Funny thing is I can't find another reference to the Netherlands mirror. The Bibliotheca Alexandrina site mentions a plan to eventually have four sites (California, Alexandria, Europe, and Asia), but that's it. Anyone know what happened with the Netherlands site?

  • Re:"Sun Fire" (Score:3, Informative)

    by ximenes ( 10 ) on Wednesday March 25, 2009 @07:57PM (#27336831)

    Since they're using one of Sun's modular datacenters that is actually on the Sun campus, I would imagine that they got some financial incentives / support from Sun for all of this.

    The X4500 is EOL as you mention, although it was still sold a few months back. It lives on as the X4540, which really isn't that different; the main thing is it's moved to a newer Opteron processor type and is a fair bit cheaper. So they didn't really miss out on anything.

    It's kind of interesting to me that they went this route, as opposed to a bunch of servers talking to a bunch of storage separately. This seems to be an exact use case for the X4500-type system, which as far as I'm aware is pretty unique.

  • Re:63 x 48 = 3024Tb (Score:3, Informative)

    by rackserverdeals ( 1503561 ) on Wednesday March 25, 2009 @08:38PM (#27337165) Homepage Journal

    Sun has more information and an Interactive tour [sun.com] of the Internet Archive modular data center on their site.

    The total raw capacity of the container is 3 petabytes. In reality, the usable capacity is going to be less than that. First, 2 disks are likely to be set up as a mirrored pool for the system disks. I believe the root pool only supports mirrors, not raidz. Not sure if this has changed.

    That leaves you with 46 disks for data. Maybe they partitioned part of the root pool disks to include in the data pools, not sure, but ZFS works better with whole disks.

    In the interactive tour, they weren't clear on how they set up the pools.

    Side note: maybe I'm cynical, but if this were the other way around, with Linux servers replacing Sun/Solaris servers, that probably would have been the headline.

    Pretty neat to find out that the Internet Archive is powered by Java too. The Wayback Machine is written in Java, as are the crawlers.
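rackserverdeals's point that usable capacity falls below raw capacity can be made concrete with a rough estimator. Everything about the data pools here is hypothetical, since the tour did not say how the pools were built: the two-disk root mirror follows the comment's guess, the vdev widths and raidz parity levels in the example are illustrative choices, and ZFS metadata, padding, and reserved space are ignored, so real figures would be somewhat lower.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Layout:
    """Hypothetical per-server disk layout; none of these values is confirmed."""
    total_disks: int = 48
    root_mirror_disks: int = 2  # mirrored root pool, as the comment guesses
    vdev_width: int = 9         # disks per raidz vdev (assumption)
    parity: int = 2             # 1 = raidz1, 2 = raidz2 (assumption)
    disk_tb: float = 1.0


def usable_tb(layout: Layout) -> Tuple[float, int]:
    """Return (usable TB per server, leftover disks usable as hot spares).

    Ignores ZFS metadata, padding and reserved space.
    """
    data_disks = layout.total_disks - layout.root_mirror_disks
    vdevs, spares = divmod(data_disks, layout.vdev_width)
    usable = vdevs * (layout.vdev_width - layout.parity) * layout.disk_tb
    return usable, spares


if __name__ == "__main__":
    servers = 63
    for width, parity in [(9, 2), (5, 1)]:
        per_server, spares = usable_tb(Layout(vdev_width=width, parity=parity))
        print(f"raidz{parity}, {width}-disk vdevs: {per_server:g} TB/server "
              f"({spares} spare), {per_server * servers:g} TB total")
```

Both example layouts leave one of the 46 data disks over, which in practice might serve as a hot spare; wider vdevs raise usable space at the cost of longer resilver times.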

  • Re:"Sun Fire" (Score:2, Informative)

    by Anonymous Coward on Wednesday March 25, 2009 @09:02PM (#27337371)

    Anyway, the specific server being used here is the Sun Fire X4500 [sun.com], a system with no fewer than 48 1 TB disks in a 4U space. Notice that this model is EOLed; presumably iarchive got a deal on some remaindered machines.

    There are newer X4540s which are mostly the same, but have newer CPUs, and can hold more memory (16 -> 64 GB).

  • by Rural ( 136225 ) on Thursday March 26, 2009 @02:42AM (#27338889)

    Their aim is to preserve the content found on the Web. They need the hardware for that. I assume they don't need much for the "serving users" part.