Google Two Years Into Overhaul of the Google File System

Posted by samzenpus
from the we-can-make-it-better dept.
El Reg writes "As its ten-year-old file system — GFS — struggles to keep up with Gmail, YouTube, and other apps it was never designed to support, Google is brewing a replacement. According to the company, it's two years into a GFS sequel designed specifically for customer-facing apps that require ultra low latency."

  • hmm (Score:5, Funny)

    by gnarfel (1135055) <anthony.j.fiumara@gmail.com> on Wednesday August 12, 2009 @10:31PM (#29046933) Homepage
    Well I'm no expert on Google's internal workings, but have any of these protocols or file systems they've developed been released outside of Google for public use?
    • Re:hmm (Score:5, Funny)

      by buchner.johannes (1139593) on Wednesday August 12, 2009 @10:42PM (#29046991) Homepage Journal

      GFS is proprietary and for internal use only. They only released a paper describing how it works (don't know if that content is enough to rebuild it). I think GFS (global file system) from Redhat [redhat.com] and OpenGFS [sourceforge.net] are something different. Hadoop is what you want. What would we do without the wiki [wikipedia.org]?

    • Re:hmm (Score:5, Funny)

      by Brian Gordon (987471) on Wednesday August 12, 2009 @10:51PM (#29047059)

      No, they haven't. So why does the editor think we care? "Google Six Months Into Resurfacing Parking Lot"

    • Re:hmm (Score:5, Funny)

      by mysidia (191772) on Wednesday August 12, 2009 @10:51PM (#29047061)

      They have not, and apparently Google thinks of the Google FS as part of their secret sauce, such that they will probably never get it released. Although they seem happy to write papers about it.

      It's actually really sad... Google has built an innovative platform for distributed computing, that solves quite a few problems, vastly superior to the state of the art in distributed computing, but they basically keep the filesystem and clustering implementations completely to themselves, it would seem.

      They use the Linux platform to the absolute max, leveraging all the blood and sweat Linux developers poured into its development over the past 15 years, and yet, not contributing back any of their most significant enhancements.

      I won't call it evil, as they're under no obligation to release GoogleFS or their map reduce implementations, it's just unkind.

      I would equate it to an inventor creating the lightbulb, and their employer saw this, and decided instead of trying to sell the invention to the public, they decided to only allow their own factories to buy lightbulbs, thus netting them a competitive advantage over other factories whose workers had to operate in the dark or by candlelight.

      There is no software product available to the public that even utilizes GoogleFS. Instead it's all software as a service (the Google search engine service, that is).

      • Re:hmm (Score:5, Interesting)

        by MeatBag PussRocket (1475317) on Wednesday August 12, 2009 @11:05PM (#29047153)

        They use the Linux platform to the absolute max, leveraging all the blood and sweat Linux developers poured into its development over the past 15 years, and yet, not contributing back any of their most significant enhancements.

        i see your point, but it's not like google isn't giving significantly in return. most people would be hard pressed to deny that Google's search engine was a game changer in the interweb. at its release it was leaps and bounds better than just about anything out there, and is still the gold standard for finding information. hell they gave us the verb "to google". we got a pretty decent browser out of it, gmail, google docs, google maps, and a whole bunch of other stuff they've generated. not to mention a forthcoming OS. at this point i can already hear critics screaming about Google's profits driving these services, and you know what, maybe they are, but i haven't paid Google a dime, and most likely, neither have you. i don't care if they make money, there's nothing wrong with it, and i'm even happier that they make money without involving me whatsoever. in many ways i would think Google would be a champion to the FOSS community. so they want to keep a filesystem proprietary, frankly that's not so bad, competition is good but competitors aren't usually. Google is a good counter balance to Microsoft and other would-be owners of the interwebs. are they "good" as in saintly? no, but they never claimed to be, they claimed "don't be evil". i'd say they're pretty far from that.

        • Your comment and his moderation are an exact copy of what astroturfers have been doing for MS for years on public forums (paid marketing spin). Unfortunately slashdotters seem to be easily deceived by the G word. :(

          • astroturfers have been doing for MS for years on public forums (paid marketing spin).

            That's simply not true.

          • by jmpeax (936370) *

            Your comment and his moderation are an exact copy of what astroturfers have been doing for MS for years on public forums

            You're implying he's being paid by someone to express certain opinions. Where's your evidence? Or maybe you have none?

            Is it really so unlikely that someone might simply have an opinion different to yours and want to express it? Or is it that you are so insecure in your belief in the strength of your position that you feel the need to attribute different points of view to evil corporate conspiracies? Or is it that you are so utterly arrogant in your belief that things should work a certain way that you cann

          • by steveo777 (183629)

            goats.ex?

      • Re:hmm (Score:5, Insightful)

        by lawpoop (604919) on Wednesday August 12, 2009 @11:17PM (#29047237) Homepage Journal

        They use the Linux platform to the absolute max, leveraging all the blood and sweat Linux developers poured into its development over the past 15 years, and yet, not contributing back any of their most significant enhancements.

        Not contributing back!? Dude, they gave us *google*. Remember what it was like before google? When internet search was basically voo-doo crapshoots, that worked 25% of the time? They gave us a search engine that actually *worked*. Before that, you basically had to bookmark or memorize internet sites that you liked. Good luck actually finding what you were looking for without having an actual site in mind beforehand.

        I think that alone has probably spurred the development of free software. Imagine being able to *find things* on the internet!

        • Re:hmm (Score:5, Funny)

          by Night Goat (18437) on Wednesday August 12, 2009 @11:37PM (#29047315) Homepage Journal

          Yahoo worked fine for me before Google. I think you give it more credit than it deserves. The downside of Yahoo was its advertising and clutter. The searching part worked fine.

          • Re:hmm (Score:5, Funny)

            by CharlyFoxtrot (1607527) on Wednesday August 12, 2009 @11:44PM (#29047355)
            Altavista worked fine, HotBot too. I started using Google primarily because of the cached pages, not because the search was that much better. Plus like you say the Google interface was a breath of fresh air.
          • Re:hmm (Score:5, Informative)

            by mysidia (191772) on Wednesday August 12, 2009 @11:47PM (#29047377)

            Yahoo was originally a web directory, not a conventional search engine. The search results were provided by others.

            In 2000, they signed an agreement with Google, and Yahoo's search was powered by Google, in other words -- if you used Yahoo, you were using Google.

            That didn't change until 2005, and after several other search engine company acquisitions, when they developed their own search technology.

            • In 2000, they signed an agreement with Google, and Yahoo's search was powered by Google, in other words -- if you used Yahoo, you were using Google.

              Let us not forget Inktomi [cnet.com], I believe they used a few other providers during those years as well.

            • I fail to understand the difference. Directory or Search Engine, don't they both crawl the web for data, index it and allow users to start a search of the stored content?

              I don't recall Yahoo being like a telephone directory where things were grouped. I recall entering the page and typing a search query. More often than not I would get back the garbage pages that had the hidden tagging with all of those hot keywords everyone was searching on just to bump a page rank.

              Google was of the first engines to see

          • Re:hmm (Score:5, Informative)

            by jcnnghm (538570) on Thursday August 13, 2009 @12:01AM (#29047449)

            You know from June 2000 to February 2004 Google was the backend for the Yahoo web page search. That was back when Yahoo was a web site "human directory" search first and foremost, and only secondarily a machine-powered internet search. Sort of like how Yahoo search is going to be powered by Bing in the future, and was powered by Inktomi before Google.

        • Re:hmm (Score:5, Funny)

          by billcopc (196330) <vrillco@yahoo.com> on Wednesday August 12, 2009 @11:42PM (#29047347) Homepage

          You clearly weren't an Altavista user.

          Google's results today are no better than the leading search engines 10 years ago. People were gaming the engines then, and Google came up with a smarter algorithm (Pagerank), but today's results page is again full of garbage because people learned how to game Pagerank. Combine that with the web 2.0 fad of scraping and regurgitating everyone else's content, and the resultant pile of URLs for any given keyword is utterly worthless. I call it "metapublishing", because the content is worthless, it's become a twisted game of outwitting Google to maximize ad revenue while providing zero value.

          Searching has always been a game of finding the most specific yet least popular terms to define what you want, and then adding a bunch of negative keywords to filter out the junk. Google scored a hit, many many years ago, but they haven't been able (or willing) to maintain that lead, and all their competitors have pretty much died out anyway.

          If Google hadn't come along when it did, someone else would have stepped up. Maybe Altavista, or Yahoo, or someone else. There was a need, and a provider to address that need. The only reason we don't have a new search engine to beat Google today is because, well, everyone is scared shitless of going head-to-head with Google, except Microsoft with their propaganda-laced Bing embarrassment. They're just not the golden child people seem to think they are.

          • Re:hmm (Score:5, Insightful)

            by lawpoop (604919) on Thursday August 13, 2009 @12:06AM (#29047483) Homepage Journal

            If Google hadn't come along when it did, someone else would have stepped up.

            Doesn't change the fact that it *was* them, who was able to do it when nobody else had been able to. So I think that yes, they did contribute a lot to open source development. It's not enough to have a good idea, or believe that someone will eventually get around to it; someone actually has to sit down and *do* it. If google hadn't done it then, we would be that much further behind in internet search technology.

          • Re:hmm (Score:5, Informative)

            by Mostly a lurker (634878) on Thursday August 13, 2009 @12:34AM (#29047647)
            Your recollections are different from mine. Prior to Google, I tended to use AltaVista and Hotbot. Searches took at least ten times as long. Results rarely included any recently created pages. The number of indexed pages was several orders of magnitude less than Google handles today (which in turn is one order of magnitude, or so, greater than current competitors). In spite of the fact that gaming of search engines is overwhelmingly targeted at Google, Google still does a relatively better job of finding the genuinely useful pages. Is Google perfect? No, of course not. Search is still only a partially solved problem. However, since its inception, Google has come up with most of the practical advances in the state of the art, as well as the best infrastructure for its implementation.
          • Re:hmm (Score:5, Interesting)

            by Dahamma (304068) on Thursday August 13, 2009 @01:00AM (#29047779)

            You clearly weren't a daily Google user 10 years ago.

            The moment I realized Google was completely superior to the others was when I was able to paste an obscure compile error for an equally obscure CPU architecture into Google and immediately get the answer back... the kind of utterly random error that a few years previous would have potentially taken hours to debug...

            If Google hadn't come along when it did, someone else would have stepped up. Maybe Altavista, or Yahoo

            And you were modded Insightful - sigh... So you are saying they decided "oh, well Google is pretty good at this - let's NOT STEP UP." Yeah, that's what companies do in that situation. Or maybe they do try, and fail (nothing wrong with trying and failing... but that's the REALITY of the situation).

            • by mwvdlee (775178)

              Or maybe they do try, and fail

              Microsoft's recent and past antics in search engines pretty much prove this.
              MS continuously tries to step up, even though they keep failing.

          • Re:hmm (Score:5, Insightful)

            by wdr1 (31310) * <wdr1 AT pobox DOT com> on Thursday August 13, 2009 @01:10AM (#29047855) Homepage Journal

            Put the crackpipe down!

            I was an altavista user. A die-hard one, for most of the mid/late-nineties. In fact, I remember the day I finally convinced my boss to switch from Altavista to Google, because he had worked on Altavista.

            Today's results completely blow away the search engines of 10 years ago. In fact, any of the major players -- Yahoo, Microsoft, even Ask & co. -- would blow away the search engines of 10 years ago.

            (Add to that the fact that the number of documents on the web that they need to crawl & rank has exploded.)

            Your comment that "the resultant pile of URLs for any given keyword is utterly worthless" is itself hyperbolic nonsense. If that were true, nobody would use them.

            • by cheekyboy (598084)

              google is so good I forgot a movie title, and just entered 1 letter and 1 number of it, and it found it.

              Now that's good!!!

          • Re:hmm (Score:5, Insightful)

            by unity (1740) on Thursday August 13, 2009 @01:48AM (#29048071)
            The only thing you really missed there was the really simple, non-image intensive interface. That alone spurred people to use google.
          • Re:hmm (Score:5, Insightful)

            by sootman (158191) on Thursday August 13, 2009 @02:29AM (#29048299) Homepage Journal

            It really amuses me how all these different comments come up in every thread about search engines. Everyone's experience is different. Google is still very useful to me 99% of the time. As for AltaVista, I remember '96-'97 very well. I would usually use Yahoo first. If Yahoo only produced a small handful of results--literally, 10 or less, and no good ones--then I'd go to AltaVista and get tens of thousands of results. If I was lucky I'd find what I wanted in the first few pages, else I'd give up.

            Google is still literally orders of magnitude better than anything else I've tried. Disclaimer: I've pretty much used only Google for the last... um, however many years it's been since they came on the scene. I won't claim to have used it when they were still hosted at stanford.edu, but I heard about them early on (back when they had , probably from Slashdot, and I was impressed right away. I probably stopped using Yahoo altogether within a couple months.

          • by dr_d_19 (206418)

            Even if that was true (which it is not), it's not the reason I switched to Google. My problem was that altavista snuck ads into the search results. Like adwords, but it would appear as search result #1. That - combined with the fact that Google's results were better and updated more often - didn't really make the choice all that difficult.

        • Re:hmm (Score:5, Funny)

          by MobileTatsu-NJG (946591) on Thursday August 13, 2009 @12:35AM (#29047661)

          They use the Linux platform to the absolute max, leveraging all the blood and sweat Linux developers poured into its development over the past 15 years, and yet, not contributing back any of their most significant enhancements.

          Not contributing back!? Dude, they gave us *google*. Remember what it was like before google? When internet search was basically voo-doo crapshoots, that worked 25% of the time? They gave us a search engine that actually *worked*. Before that, you basically had to bookmark or memorize internet sites that you liked. Good luck actually finding what you were looking for without having an actual site in mind beforehand.

          I think that alone has probably spurred the development of free software. Imagine being able to *find things* on the internet!

          Are you kidding? Search for Quake? Porn. Search for a new version of Netscape? Porn. Google? PFtb. It always gave me Quake and Netscape. My pr0n searching was MUCH more productive before Google!

        • by sootman (158191)

          Not contributing back!? Dude, they gave us *google*.

          And "just fucking google it" [justfuckinggoogleit.com] has started replacing RTFM. :-) See also "Let Me Google That For You." [lmgtfy.com]

          A guy on a list I used to be on used to ask REALLY dumb, easily-googlable questions. I mean, you could literally take the message subject, plug it into google, and get the answer. I wrote (but never deployed) a script that would take the subject of his message, google it, and reply to the list with the first page of search results in the body. (Something like

      • Let's see what their TOS are for Chrome. Since OS's are another crown jewel of computing, if they sufficiently open it up to really let the devs make a Baskins Robbins 31-Flavors (Service Mark) of Chrome, *without* trying to copy Apple's secrecy, *and give back to Debian core*, they could unleash a force of nature.

  • by mysidia (191772) on Wednesday August 12, 2009 @10:38PM (#29046963)

    It's GoogleFS.

    GFS refers to the Global File System [wikipedia.org], which is commonly used in Linux clustering environments.

    By comparison, GoogleFS came second, is basically a no-name filesystem unknown to most of the IT world, because it's not available for use, hasn't been released as a product, compared to the well-established global filesystem.

    It would certainly seem like the Global File system would have priority claim over the name GFS...

    So let's stop calling Google's filesystem, which we'll probably never get to use, GFS :)

  • by Alien Being (18488) on Wednesday August 12, 2009 @10:40PM (#29046981)

    but God help us all if they ever do turn evil.

    • by PhrostyMcByte (589271) <phrosty@gmail.com> on Wednesday August 12, 2009 @10:58PM (#29047109) Homepage
      Google is what happens when developers and IT talk to each other correctly. Normally there is a brick wall separating the two, with IT guys being at the mercy of whatever the well-meaning but typically oblivious (to IT problems) devs cook up.
    • by Duncan3 (10537) on Thursday August 13, 2009 @01:04AM (#29047811) Homepage

      Not really, it's IT done by not letting anyone over 30 or with any experience into the room. Every single issue they had to learn and fix mentioned in the article is quite literally standard textbook stuff in distributed systems, and has been for over 40 years. The failure model, the huge chunk sizes, the single master problems... etc. Nobody who had taken even one decent class would have ever considered the original design viable.

      They really should just stick to buying their tech pre-made like everything else Google is known for - acquisitions [wikipedia.org]. Other companies are willing to hire experienced people. You know, those old lazy bastards that only work 40 hours a week because they have families, cost way too much to provide health insurance to, but get things done 5x as fast because they have done it before :)

      • Re: (Score:3, Insightful)

        Not really, it's IT done by not letting anyone over 30 or with any experience into the room. Every single issue they had to learn and fix mentioned in the article is quite literally standard textbook stuff in distributed systems, and has been for over 40 years. The failure model, the huge chunk sizes, the single master problems... etc. Nobody who had taken even one decent class would have ever considered the original design viable. They really should just stick to buying their tech pre-made like everything else Google is known for - acquisitions [wikipedia.org]. Other companies are willing to hire experienced people. You know, those old lazy bastards that only work 40 hours a week because they have families, cost way too much to provide health insurance to, but get things done 5x as fast because they have done it before :)

        You hit the nail right on the head. The original GFS is pretty lame, as Google folks freely admit (full disclosure: I'm a former Googler, but I'm not telling you anything you can't find on ahem Google). The new GFS will also be pretty lame, because as you correctly point out, Larry, Sergey and Eric don't quite get the concept of experienced people who have done it before. All that standard clustering stuff has to be reinvented by Googlers, who frankly, have gotten a little soft over the years, now so used

    • Just think about it, you're George Bush or Dick Cheney and The Google is just sitting there ripe for the plucking. Done deal dude. The USA Government very likely has all the access they want at Google to anything at all in real time with all details. If it's not true assume it is to be on the safe side.

  • Curiously (Score:5, Insightful)

    by ShooterNeo (555040) on Wednesday August 12, 2009 @10:43PM (#29047001)
    In the article, it's stated that the load on the google file system has grown orders of magnitude greater than it was ever intended to handle. And one of the algorithm changes is that the chunks in the new file system are 1 megabyte in size rather than 64 megabytes. This is to reduce latency, which makes logical sense...but dividing a gigantic database into pieces that are 64 times smaller doesn't make intuitive sense...
    • Re: (Score:2, Interesting)

      by Alien Being (18488)

      "..but dividing a gigantic database into pieces that are 64 times smaller doesn't make intuitive sense..."

      It does if it was 64x too big to begin with. Live and learn.
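
A rough way to see the trade-off in the chunk-size exchange above: going from 64 MB to 1 MB chunks suits small, latency-sensitive reads, but it multiplies the number of chunks that have to be tracked by 64. The sketch below is only back-of-envelope arithmetic. The 1 PiB of stored data and the 64 bytes of bookkeeping per chunk are illustrative assumptions, not figures from the article, and the function names are made up for this example.

```python
# Back-of-envelope: how chunk size changes the number of chunks to track.
# ASSUMPTIONS (not from the article): 1 PiB of data, 64 bytes of metadata per chunk.
MIB = 1024 ** 2
GIB = 1024 ** 3
PIB = 1024 ** 5


def chunk_count(total_bytes, chunk_bytes):
    """Number of fixed-size chunks needed to hold total_bytes, rounded up."""
    return -(-total_bytes // chunk_bytes)


def metadata_bytes(total_bytes, chunk_bytes, bytes_per_chunk=64):
    """Bookkeeping needed if every chunk costs bytes_per_chunk of metadata."""
    return chunk_count(total_bytes, chunk_bytes) * bytes_per_chunk


def summarize(total_bytes=PIB):
    """Return {label: (chunk count, metadata bytes)} for both chunk sizes."""
    sizes = (("64 MiB", 64 * MIB), ("1 MiB", MIB))
    return {
        label: (chunk_count(total_bytes, size), metadata_bytes(total_bytes, size))
        for label, size in sizes
    }


if __name__ == "__main__":
    for label, (count, meta) in summarize().items():
        print(f"{label:>6} chunks: {count:>13,} chunks, ~{meta / GIB:.0f} GiB of metadata")
```

Under those assumptions, 1 PiB needs 16,777,216 chunks at 64 MiB (about 1 GiB of metadata) and 1,073,741,824 chunks at 1 MiB (about 64 GiB). Smaller chunks are not free: the bookkeeping grows by the same factor of 64, so whatever tracks the chunks has to scale with it.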

  • by s0litaire (1205168) * on Wednesday August 12, 2009 @11:19PM (#29047243)
    There's over 25 comments and not one has attempted to call it "Goatse File System"!

    What's up with you trolls! You guys on a union break or what!!

  • Seriously folks, is there no already existing file system that can already meet these needs? If not, then what are Google's competitors using?

    Is it that no one else has yet faced up to these issues properly and this is a huge competitive advantage for Google, or is it simply NIH?
    • Re: (Score:3, Insightful)

      by MikeBabcock (65886)

      Google has competitors?

      Seriously, Microsoft has been promising a database driven filesystem for its server OS for years without delivering anything substantial to date, and it doesn't seem like they're running anything different internally either.

    • by jimicus (737525)

      Well, there are distributed databases around and you could always write a web frontend to query one of those. But they're very expensive and not generally well represented in terms of Free (as in beer) products, and you're still left with the problem "how do we turn SELECT * FROM pages WHERE content LIKE '%user data%' into a useful set of results?"
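
To make the "useful set of results" problem in the comment above concrete: a substring match only says whether a page contains the text, not which page is the better answer. One common alternative is an inverted index with some form of scoring. The sketch below is a toy illustration under loud assumptions: the page ids, the sample text, and the score (raw term counts) are invented for the example and have nothing to do with how Google or any real engine ranks pages.

```python
# Toy inverted index: term -> {page_id: occurrences}, ranked by raw term counts.
# ASSUMPTION: the scoring rule and sample pages are illustrative only.
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to a Counter of how often it occurs in each page."""
    index = defaultdict(Counter)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term][page_id] += 1
    return index


def search(index, query):
    """Return ids of pages containing every query term, best score first."""
    terms = tokenize(query)
    if not terms:
        return []
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    scored = [(sum(index[t][p] for t in terms), p) for p in candidates]
    # Highest score first; ties broken by page id so results are stable.
    return [p for _, p in sorted(scored, key=lambda sp: (-sp[0], sp[1]))]


if __name__ == "__main__":
    pages = {
        "a": "distributed file system for large files",
        "b": "file system file chunks",
        "c": "search engine ranking",
    }
    print(search(build_index(pages), "file system"))
```

Even this toy version orders page "b" ahead of page "a" for "file system" because "file" occurs twice there, which a plain LIKE query cannot express. Real engines replace the counting with far more elaborate signals.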

  • Cool, but where can we download it?

    Oh we can't?

    It's an internal project and it will remain an internal project just like the previous version. So what's the point for the rest of us?

    I'm really more excited about projects like Elliptics Network [ioremap.net], because at least, they can be useful to everyone, not only to Google's employees.

  • by kerskine (46804) on Thursday August 13, 2009 @07:59AM (#29049939) Homepage Journal

    I have no doubt that the new file system will be great, but after reading this summary, the first thought I had was that I should back up all that Gmail before they cut it over. I've been putting it off for far too long, but I'll just have to burn a couple of days of attention to do it.

    • Re: (Score:3, Informative)

      by fracai (796392)

      Couple days?

      Install OfflineIMAP or Getmail, write the config file, schedule cron, launchd, etc., and be done with it. Shouldn't take more than an hour. The bulk of that should be working through the config and both products include healthy examples.
