Google Two Years Into Overhaul of the Google File System
El Reg writes "As its ten-year-old file system — GFS — struggles to keep up with Gmail, YouTube, and other apps it was never designed to support, Google is brewing a replacement. According to the company, it's two years into a GFS sequel designed specifically for customer-facing apps that require ultra low latency."
hmm (Score:5, Funny)
Re:hmm (Score:5, Funny)
GFS is proprietary and for internal use only. They only released a paper describing how it works (don't know if that content is enough to rebuild it). I think GFS (global file system) from Redhat [redhat.com] and OpenGFS [sourceforge.net] are something different. Hadoop is what you want. What would we do without the wiki [wikipedia.org]
Re:hmm (Score:5, Funny)
Parent and GP modded funny? Am I missing the joke or are there some giddy drunks with mod points?
Re:hmm (Score:4, Funny)
Everyone is in a good mood. Why not :-)
Re:hmm (Score:5, Funny)
Everyone is in a good mood. Why not :-)
Modded Troll... now that's delicious.
Re:hmm (Score:5, Funny)
Woooooooooooooo!
(Apparently just entering "Woooooooooooooo!" creates an error. I have to explain that it's supposed to be a giddy mod, thus destroying any semblance of assuming intelligence present in at least part of the /. community).
Re:hmm (Score:5, Funny)
Re:hmm (Score:5, Informative)
It takes absolutely zero effort for this post to be modded funny
It doesn't take much to be modded informative either.
Re:hmm (Score:5, Funny)
C-C-C-C-COMBO BREAKER!
Re: (Score:3, Funny)
Except for you apparently Mr. (Score:3, Informative)....
Re:hmm (Score:5, Funny)
Of course, by posting under your user account, you're not a mod in this thread any more so it's all rather academic.
Re:hmm (Score:5, Funny)
Please take your meds, Ric Flair....
Re: (Score:3, Funny)
Re:hmm (Score:5, Funny)
Re:hmm (Score:5, Funny)
It's called a "running gag".
Re:hmm (Score:5, Funny)
Re:hmm (Score:5, Funny)
Re:hmm (Score:5, Funny)
Because it's funny.
Re: (Score:2, Funny)
I CAN'T SEE ANY DOTS!
Re:hmm (Score:5, Funny)
Move closer to your screen. There's plenty of them.
Where's meta-moderation when you need it? (Score:5, Funny)
I'm impressed that all of these Reddit users had the attention span to stay long enough to get mod points. But nobody likes a guest who overstays their welcome. Besides, I think somebody posted an animated gif of an old man falling down or something. GO CHECK IT OUT!!!1!1one
You're a fascist Republican jihadist (Score:2)
Reddit groupthink: Although we work day jobs in call centers and as photoshop jockeys, we have figured out The One True Path (a mixture of populist liberalism, lolcats and outrage at anything that might make us leave our W.O.W.-enabled terminals for more than 30 seconds) and consider all of you who have not seen it to be beneath us. Even as we ask you to work with us while we fix this driver issue together.
Re:hmm (Score:5, Funny)
Shit, I see dots from the start. My brain must be really lazy.
Re:hmm (Score:5, Funny)
Funny
Funny
Funny
Fun...
Shit, you're right!
Re: (Score:2)
Re: (Score:2)
It's meta-funny
Does that mean the meta mod for the mod should be modded funny?
I think we have a winner for the next April Fool's.
Re:hmm (Score:5, Funny)
Many people think that Google's original claim to fame is PageRank. That's only partially true. Google became as successful as they are because of their systems-scalability work. That is, Google figured out how to build the biggest clusters, with the most storage space, the most computation capacity, and the lowest latency, for the least amount of money (compared to their competitors anyway). If you have 1000x the computing power of your nearest competitor, then you can do 1000x as much data mining, which means that your search results (and ad relevancy) will be that much better.
For a long time, Google refused to release any information on their system infrastructure (it was their crown jewel, after all). The GFS paper was released in 2003, well after Google had put the filesystem (and its predecessors) to public use.
To sum it up: GFS has been one of the strongest contributing factors to Google's dominance. The idea that Google would voluntarily give this code to competitors is laughable.
Re: (Score:3, Funny)
Almost every comment posted on this page is modded 5 Funny. o_0
Really, what's going on?
Re:hmm (Score:4, Funny)
Re:hmm (Score:5, Funny)
No, they haven't. So why does the editor think we care? "Google Six Months Into Resurfacing Parking Lot"
Re:hmm (Score:5, Funny)
And it's still in beta.
Re:hmm (Score:5, Funny)
They have not, and apparently Google thinks of the Google FS as part of their secret sauce, such that they will probably never release it. Although they seem happy to write papers about it.
It's actually really sad... Google has built an innovative platform for distributed computing, that solves quite a few problems, vastly superior to the state of the art in distributed computing, but they basically keep the filesystem and clustering implementations completely to themselves, it would seem.
They use the Linux platform to the absolute max, leveraging all the blood and sweat Linux developers poured into its development over the past 15 years, and yet, not contributing back any of their most significant enhancements.
I won't call it evil, as they're under no obligation to release GoogleFS or their map reduce implementations, it's just unkind.
I would equate it to an inventor creating the lightbulb, and their employer, instead of selling the invention to the public, deciding to allow only its own factories to buy lightbulbs, thus netting a competitive advantage over other factories whose workers had to operate in the dark or by candlelight.
There's no software product available to the public that even utilizes GoogleFS. Instead it's all software as a service (The Google search engine service, that is)
Re:hmm (Score:5, Interesting)
They use the Linux platform to the absolute max, leveraging all the blood and sweat Linux developers poured into its development over the past 15 years, and yet, not contributing back any of their most significant enhancements.
i see your point, but it's not like google isn't giving significantly in return. most people would be hard pressed to deny that Google's search engine was a game changer in the interweb. at its release it was leaps and bounds better than just about anything out there, and is still the gold standard for finding information. hell, they gave us the verb "to google". we got a pretty decent browser out of it, gmail, google docs, google maps, and a whole bunch of other stuff they've generated. not to mention a forthcoming OS. at this point i can already hear critics screaming about Google's profits driving these services, and you know what, maybe they are, but i haven't paid Google a dime, and most likely, neither have you. i don't care if they make money, there's nothing wrong with it, and i'm even happier that they make money without involving me whatsoever. in many ways i would think Google would be a champion to the FOSS community. so they want to keep a filesystem proprietary, frankly that's not so bad, competition is good but competitors aren't usually. Google is a good counter balance to Microsoft and other would-be owners of the interwebs. are they "good" as in saintly? no, but they never claimed to be, they claimed "don't be evil", and i'd say they're pretty far from evil.
Re:hmm Google shills (Score:2)
Your comment and his moderation are an exact copy of what astroturfers has been doing for MS for years on public forums (paid marketing spin). Unfortunately slashdotters seem to be easily deceived by the G word. :(
Re: (Score:2)
astroturfers has been doing for MS for years on public forums (paid marketing spin).
That's simply not true.
Re: (Score:2)
Your comment and his moderation are an exact copy of what astroturfers has been doing for MS for years on public forums
You're implying he's being paid by someone to express certain opinions. Where's your evidence? Or maybe you have none?
Is it really so unlikely that someone might simply have an opinion different to yours and want to express it? Or is it that you are so insecure in your belief in the strength of your position that you feel the need to attribute different points of view to evil corporate conspiracies? Or is it that you are so utterly arrogant in your belief that things should work a certain way that you cann
Re: (Score:2)
goats.ex?
Re:hmm (Score:5, Insightful)
They use the Linux platform to the absolute max, leveraging all the blood and sweat Linux developers poured into its development over the past 15 years, and yet, not contributing back any of their most significant enhancements.
Not contributing back!? Dude, they gave us *google*. Remember what it was like before google? When internet search was basically voo-doo crapshoots, that worked 25% of the time? They gave us a search engine that actually *worked*. Before that, you basically had to bookmark or memorize internet sites that you liked. Good luck actually finding what you were looking for without having an actual site in mind beforehand.
I think that alone has probably spurred the development of free software. Imagine being able to *find things* on the internet!
Re:hmm (Score:5, Funny)
Yahoo worked fine for me before Google. I think you give it more credit than it deserves. The downside of Yahoo was its advertising and clutter. The searching part worked fine.
Re:hmm (Score:5, Funny)
Re:hmm (Score:5, Funny)
Plus like you say the Google interface was a breath of fresh air.
Sometimes I wonder if Yahoo hadn't made their default page http://search.yahoo.com/ [yahoo.com] early on, if they wouldn't have done somewhat better for themselves.
Re:hmm (Score:5, Funny)
I used Dogpile, back in the day; it would show you the results from ten or so other search engines.
Re:hmm (Score:5, Informative)
Yahoo was originally a web directory, not a conventional search engine. The search results were provided by others.
In 2000, they signed an agreement with Google, and Yahoo's search was powered by Google, in other words -- if you used Yahoo, you were using Google.
That didn't change until 2005, and after several other search engine company acquisitions, when they developed their own search technology.
Re: (Score:2)
In 2000, they signed an agreement with Google, and Yahoo's search was powered by Google, in other words -- if you used Yahoo, you were using Google.
Let us not forget Inktomi [cnet.com], I believe they used a few other providers during those years as well.
Re: (Score:2)
I fail to understand the difference. Directory or Search Engine, don't they both crawl the web for data, index it and allow users to start a search of the stored content?
I don't recall Yahoo being like a telephone directory where things were grouped. I recall entering the page and typing a search query. More often than not I would get back the garbage pages that had the hidden tagging with all of those hot keywords everyone was searching on just to bump a page rank.
Google was of the first engines to see
Re:hmm (Score:5, Informative)
You know from June 2000 to February 2004 Google was the backend for the Yahoo web page search. That was back when Yahoo was a web site "human directory" search first and foremost, and only secondarily a machine-powered internet search. Sort of like how Yahoo search is going to be powered by Bing in the future, and was powered by Inktomi before Google.
Re:hmm (Score:5, Funny)
You clearly weren't an Altavista user.
Google's results today are no better than the leading search engines 10 years ago. People were gaming the engines then, and Google came up with a smarter algorithm (Pagerank), but today's results page is again full of garbage because people learned how to game Pagerank. Combine that with the web 2.0 fad of scraping and regurgitating everyone else's content, and the resultant pile of URLs for any given keyword is utterly worthless. I call it "metapublishing", because the content is worthless, it's become a twisted game of outwitting Google to maximize ad revenue while providing zero value.
Searching has always been a game of finding the most specific yet least popular terms to define what you want, and then adding a bunch of negative keywords to filter out the junk. Google scored a hit, many many years ago, but they haven't been able (or willing) to maintain that lead, and all their competitors have pretty much died out anyway.
If Google hadn't come along when it did, someone else would have stepped up. Maybe Altavista, or Yahoo, or someone else. There was a need, and a provider to address that need. The only reason we don't have a new search engine to beat Google today is because, well, everyone is scared shitless of going head-to-head with Google, except Microsoft with their propaganda-laced Bing embarrassment. They're just not the golden child people seem to think they are.
Re:hmm (Score:5, Insightful)
If Google hadn't come along when it did, someone else would have stepped up.
Doesn't change the fact that it *was* them, who was able to do it when nobody else had been able to. So I think that yes, they did contribute a lot to open source development. It's not enough to have a good idea, or believe that someone will eventually get around to it; someone actually has to sit down and *do* it. If google hadn't done it then, we would be that much further behind in internet search technology.
Re:hmm (Score:5, Informative)
Re:hmm (Score:5, Interesting)
You clearly weren't a daily Google user 10 years ago.
The moment I realized Google was completely superior to the others was when I was able to paste an obscure compile error for an equally obscure CPU architecture into Google and immediately get the answer back... the kind of utterly random error that a few years previous would have potentially taken hours to debug...
If Google hadn't come along when it did, someone else would have stepped up. Maybe Altavista, or Yahoo
And you were modded Insightful - sigh... So you are saying they decided "oh, well Google is pretty good at this - let's NOT STEP UP." Yeah, that's what companies do in that situation. Or maybe they do try, and fail (nothing wrong with trying and failing... but that's the REALITY of the situation).
Re: (Score:2)
Or maybe they do try, and fail
Microsoft's recent and past antics in search engines pretty much proves this.
MS continuously tries to step up, even though they keep failing.
Re:hmm (Score:5, Insightful)
Put the crackpipe down!
I was an altavista user. A die-hard one, for most of the mid/late-nineties. In fact, I remember the day I finally convinced my boss to switch from Altavista to Google, because he had worked on Altavista.
Today's results completely blow away the search engines of 10 years ago. In fact, any of the major players -- Yahoo, Microsoft, even Ask & co. -- would blow away the search engines of 10 years ago.
(Add to that the fact that the number of documents on the web that they need to crawl & rank has exploded.)
Your comment that "the resultant pile of URLs for any given keyword is utterly worthless" is itself hyperbolic nonsense. If that were true, nobody would use them.
Re: (Score:2)
google is so good I forgot a movie title, and just entered 1 letter and 1 number of it, and it found it.
Now that's good!!!
Re:hmm (Score:5, Insightful)
Re:hmm (Score:5, Insightful)
It really amuses me how all these different comments come up in every thread about search engines. Everyone's experience is different. Google is still very useful to me 99% of the time. As for AltaVista, I remember '96-'97 very well. I would usually use Yahoo first. If Yahoo only produced a small handful of results--literally, 10 or less, and no good ones--then I'd go to AltaVista and get tens of thousands of results. If I was lucky I'd find what I wanted in the first few pages, else I'd give up.
Google is still literally orders of magnitude better than anything else I've tried. Disclaimer: I've pretty much used only Google for the last... um, however many years it's been since they came on the scene. I won't claim to have used it when they were still hosted at stanford.edu, but I heard about them early on (back when they had , probably from Slashdot, and I was impressed right away. I probably stopped using Yahoo altogether within a couple months.
Re: (Score:2)
Even if that was true (which it is not), it's not the reason I switched to Google. My problem was that altavista snuck ads into the search results. Like adwords, but it would appear as search result #1. That - combined with the fact that Google's results were better and updated more often - didn't really make the choice all that difficult.
Re:hmm (Score:5, Funny)
They use the Linux platform to the absolute max, leveraging all the blood and sweat Linux developers poured into its development over the past 15 years, and yet, not contributing back any of their most significant enhancements.
Not contributing back!? Dude, they gave us *google*. Remember what it was like before google? When internet search was basically voo-doo crapshoots, that worked 25% of the time? They gave us a search engine that actually *worked*. Before that, you basically had to bookmark or memorize internet sites that you liked. Good luck actually finding what you were looking for without having an actual site in mind beforehand.
I think that alone has probably spurred the development of free software. Imagine being able to *find things* on the internet!
Are you kidding? Search for Quake? Porn. Search for a new version of Netscape? Porn. Google? PFtb. It always gave me Quake and Netscape. My pr0n searching was MUCH more productive before Google!
Re: (Score:2)
Not contributing back!? Dude, they gave us *google*.
And "just fucking google it" [justfuckinggoogleit.com] has started replacing RTFM. :-) See also "Let Me Google That For You." [lmgtfy.com]
A guy on a list I used to be on used to ask REALLY dumb, easily-googlable questions. I mean, you could literally take the message subject, plug it into google, and get the answer. I wrote (but never deployed) a script that would take the subject of his message, google it, and reply to the list with the first page of search results in the body. (Something like
Re: (Score:2)
Gopherspace is becoming too cluttered up with people [dailyrecord.co.uk].
Re:Give Back (Score:2)
Let's see what their TOS are for Chrome. Since OS's are another crown jewel of computing, if they sufficiently open it up to really let the devs make a Baskins Robbins 31-Flavors (Service Mark) of Chrome, *without* trying to copy Apple's secrecy, *and give back to Debian core*, they could unleash a force of nature.
It's not really GFS (Score:5, Insightful)
It's GoogleFS.
GFS refers to the Global File System [wikipedia.org], which is commonly used in Linux clustering environments.
By comparison, GoogleFS came second and is basically a no-name filesystem, unknown to most of the IT world because it isn't available for use and hasn't been released as a product, unlike the well-established Global File System.
It would certainly seem like the Global File system would have priority claim over the name GFS...
So let's stop calling Google's filesystem, which we'll probably never get to use, GFS :)
Re: (Score:2)
I think I like the term Goo-FS
Re:It's not really GFS (Score:5, Funny)
That is a problem that may be getting corrected [ietf.org] by the IANA TLA registry :)
Re:It's not really GFS (Score:5, Funny)
Tell that to the NWA. Wrestling and a wildlife foundation should be even easier to tell apart, they both aren't as similar as two file systems.
Re: (Score:2)
Was I the only person whose first thought was "Niggaz With Attitude" [slashdot.org] ?
Broken Links (Score:2)
Feck [wikipedia.org].
Google is IT done right... (Score:5, Funny)
but God help us all if they ever do turn evil.
Re:Google is IT done right... (Score:4, Funny)
Re: (Score:2)
Re:Google is IT done right... (Score:4, Funny)
not on your life.
Developers constantly ruin perfectly good infrastructure.
Re: (Score:2)
Re: (Score:3, Interesting)
Because they're not allowed to share their ideas with IT, and vice versa. I can't list the number of times developers have published brain-scrambled vomit as part of their projects, because it didn't interest them and no one with experience was around to explain the inevitable problems. The maintenance model for subversion where you have to completely rebuild the repository to completely delete an accidentally stored DVD image is a classic example.
Conversely, I've expressed extreme doubts about projects tha
Re:Google is IT done right... (Score:5, Funny)
Developers aren't IT?
Not really, no. It's kind of like the difference between a doctor and a patient. Or to use a car analogy, the difference between being an automotive engineer and the guy who takes money for candy bars, magazines and fuel.
Disclosure: I was a developer for about thirty years before I took a step down and moved into marketing. I learned a lot of languages but was stopped when I discovered I was having trouble mastering Hindi.
Re: (Score:2)
If by "Doctor and patient" you're referring to the old joke where the patient walks up and says "my arm hurts when I do this" and the doctor responds: "Then don't do that."
I've met a lot of developers who really have no understanding of the operational impact of their design decisions. Some do a poor job of developing good test hooks and logging, others do a poor job of developing scalable software. Others ask for procedural workarounds for fundamental usability problems.
Re: (Score:2)
You need to change your terminology, because if you are saying that developers are not part of Information Technology, you are full of shit.
Oh dear dear dear. I may be uh, compostally privileged, but I am pretty much across the industry. I even started a bit of it (not hard when you enter the industry in 1969). "IT" or "ICT" is commonly used to denote network, server and desktop infrastructure these days. Development, that is COTS package development or bespoke customisation is usually abstracted and referred to as -- well, "Development".
I asked my daughter once what genus Dragons were most closely related to - were they a type of bird? "No
Re: (Score:2)
if you are saying that developers are not part of Information Technology, you are full of shit
IT (as an area of work) usually refers to IT support staff such as network administrators. Development is, generally, the writing of software.
IT as a general term can refer to pretty much anything directly or indirectly related to something using a microchip, so in that sense I suppose developers work in IT. But this use of the definition is so generic that it's not really useful at all, so dividing development and IT (as most people do) makes more sense.
you are full of shit
That was pretty senseless, wasn't it?
Re:Google is IT done right... (Score:5, Interesting)
Not really, it's IT done by not letting anyone over 30 or with any experience into the room. Every single issue they had to learn and fix mentioned in the article is quite literally standard textbook stuff in distributed systems, and has been for over 40 years. The failure model, the huge chunk size, the single master problems... etc. Nobody who had taken even one decent class would have ever considered the original design viable.
They really should just stick to buying their tech pre-made like everything else Google is known for - acquisitions [wikipedia.org]. Other companies are willing to hire experienced people. You know, those old lazy bastards that only work 40 hours a week because they have families, cost way too much to provide health insurance to, but get things done 5x as fast because they have done it before :)
Re: (Score:3, Insightful)
Not really, it's IT done by not letting anyone over 30 or with any experience into the room. Every single issue they had to learn and fix mentioned in the article is quite literally standard textbook stuff in distributed systems, and has been for over 40 years. The failure model, the huge chunk size, the single master problems... etc. Nobody who had taken even one decent class would have ever considered the original design viable. They really should just stick to buying their tech pre-made like everything else Google is known for - acquisitions [wikipedia.org]. Other companies are willing to hire experienced people. You know, those old lazy bastards that only work 40 hours a week because they have families, cost way too much to provide health insurance to, but get things done 5x as fast because they have done it before :)
You hit the nail right on the head. The original GFS is pretty lame, as Google folks freely admit (full disclosure: I'm a former Googler, but I'm not telling you anything you can't find on ahem Google). The new GFS will also be pretty lame, because as you correctly point out, Larry, Sergey and Eric don't quite get the concept of experienced people who have done it before. All that standard clustering stuff has to be reinvented by Googlers, who frankly, have gotten a little soft over the years, now so used
You think they're not evil already, that's funny (Score:2)
Just think about it, you're George Bush or Dick Cheney and The Google is just sitting there ripe for the plucking. Done deal dude. The USA Government very likely has all the access they want at Google to anything at all in real time with all details. If it's not true assume it is to be on the safe side.
Curiously (Score:5, Insightful)
Re: (Score:2, Interesting)
"..but dividing a gigantic database into pieces that are 64 times smaller doesn't make intuitive sense..."
It does if it was 64x too big to begin with. Live and learn.
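To put rough numbers on the exchange above, here is a minimal sketch. It assumes the original chunk size is 64 MB (the figure given in Google's published GFS paper) and that "64 times smaller" therefore means 1 MB pieces. The 10 GB file size and the function name are made up for illustration.

```python
MB = 1024 * 1024

def chunk_count(file_size_bytes, chunk_size_bytes):
    """Number of fixed-size chunks needed to hold a file (the last one may be partial)."""
    # Ceiling division using integers only, so large sizes stay exact.
    return -(-file_size_bytes // chunk_size_bytes)

file_size = 10 * 1024 * MB              # hypothetical 10 GB file
big = chunk_count(file_size, 64 * MB)   # assumed original chunk size
small = chunk_count(file_size, 1 * MB)  # assumed "64 times smaller" chunk size

print(big, small, small // big)  # prints: 160 10240 64
```

The same data needs 64 times as many chunk records, which is presumably why smaller pieces only make sense once the metadata no longer has to fit on a single master.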
Re:Curiously (Score:5, Funny)
No need to learn. 64x should be enough for anybody, dammit!
Re: (Score:2)
It does if it was 64x too big to begin with. Live and learn.
Yeah, I get that all the time.
Re: (Score:2, Insightful)
Re:Curiously (Score:5, Funny)
The question is based on assumptions. I've personally pushed an 18-wheeler over 150 mph. I've pushed a bike over 170 mph. In both cases, the limiting factors were all the 4-wheelers. Take the cars off the roads, and let the bikes and the trucks run.
Oh yeah - one more thing. Mandate that cop cars have square wheels. They already have radio, they need a handicap to make things fair.
Re: (Score:2)
I've personally pushed an 18-wheeler over 150 mph.
You must have really strong legs!
Re: (Score:2)
Interestingly, this is the only comment which is not modded funny!
-dZ.
Quality of comments going downhill... (Score:5, Funny)
What's up with you trolls! You guys on a union break or what!!
Re: (Score:2)
they're from california. It's a furlough day.
Seriously Folks (Score:2)
Is it that no one else has yet faced up to these issues properly and this is a huge competitive advantage for Google, or is it simply NIH?
Re: (Score:3, Insightful)
Google has competitors?
Seriously, Microsoft has been promising a database driven filesystem for its server OS for years without delivering anything substantial to date, and it doesn't seem like they're running anything different internally either.
Re: (Score:2)
Well, there are distributed databases around and you could always write a web frontend to query one of those. But they're very expensive and not generally well represented in terms of Free (as in beer) products, and you're still left with the problem "how do we turn SELECT * FROM pages WHERE content="%user data%" into a useful set of results?"
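For contrast with the SELECT-style scan in the comment above, here is a toy inverted index, the usual textbook starting point for keyword search. It is only a sketch: the page ids, page text, and function names are invented, and it does nothing about ranking, which is the hard part the comment is pointing at.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Return ids of pages containing every query word, sorted for stable output."""
    words = query.lower().split()
    if not words:
        return []
    # .get() avoids creating empty entries in the defaultdict for unknown words.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical three-page corpus.
pages = {
    "a": "google file system paper",
    "b": "global file system for linux",
    "c": "search engine ranking",
}
index = build_index(pages)
print(search(index, "file system"))  # prints: ['a', 'b']
```

Lookups touch only the entries for the query words rather than scanning every page, but the result is still an unordered set of matches, not "a useful set of results".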
Where can we download it? (Score:2)
Cool, but where can we download it?
Oh we can't?
It's an internal project and it will remain an internal project just like the previous version. So what's the point for the rest of us?
I'm really more excited about projects like Elliptics Network [ioremap.net], because at least, they can be useful to everyone, not only to Google's employees.
Better back your files up (Score:3, Interesting)
I have no doubt that the new file system will be great, but after reading this summary, the first thought I had was that I should back up all that Gmail before they cut it over. I've been putting it off for far too long, but I'll just have to burn a couple of days of attention to do it.
Re: (Score:3, Informative)
Couple days?
Install OfflineIMAP or Getmail, write the config file, schedule cron, launchd, etc., and be done with it. Shouldn't take more than an hour. The bulk of that should be working through the config and both products include healthy examples.
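For anyone who wants to see the moving parts, here is a bare-bones sketch of the same idea using Python's standard imaplib. This is not OfflineIMAP or Getmail, and it does none of their syncing or incremental fetching. The helper name, destination directory, and environment variable names are assumptions for illustration, and it assumes IMAP access is enabled on the account.

```python
import imaplib
import os

def backup_mailbox(conn, mailbox, dest_dir):
    """Save every message in `mailbox` as a numbered .eml file; return the count saved."""
    os.makedirs(dest_dir, exist_ok=True)
    status, _ = conn.select(mailbox, readonly=True)  # read-only: don't mark mail as seen
    if status != "OK":
        raise RuntimeError("cannot select mailbox %r" % mailbox)
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")
    saved = 0
    for num in data[0].split():  # message sequence numbers, as bytes
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        raw = parts[0][1]  # full raw message bytes
        path = os.path.join(dest_dir, num.decode() + ".eml")
        with open(path, "wb") as fh:
            fh.write(raw)
        saved += 1
    return saved

# Usage sketch (left commented out; needs real credentials and network access):
# conn = imaplib.IMAP4_SSL("imap.gmail.com")
# conn.login(os.environ["IMAP_USER"], os.environ["IMAP_PASSWORD"])
# try:
#     print(backup_mailbox(conn, "INBOX", "gmail-backup"))
# finally:
#     conn.logout()
```

It downloads everything on every run, so for a large mailbox a real tool with incremental sync, as the comment suggests, is the saner choice.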
Re:GFS? (Score:5, Funny)
Wrong, it's the Google GFS File System!
Re: (Score:2)
Wouldn't that be GGFS?
Re: (Score:2)
no it would be GGFSFS
Re: (Score:2)
wwwwwoooooooooowwwwwooooooooossssshhhhhssssshhhhh
Re: (Score:2)