
Facebook's Corona: When Hadoop MapReduce Wasn't Enough

Nerval's Lobster writes "Facebook's engineers face a considerable challenge when it comes to managing the tidal wave of data flowing through the company's infrastructure. Its data warehouse, which handles over half a petabyte of information each day, has expanded some 2500x in the past four years, and that growth isn't going to end anytime soon. Until early 2011, those engineers relied on a MapReduce implementation from Apache Hadoop as the foundation of Facebook's data infrastructure. Still, despite Hadoop MapReduce's ability to handle large datasets, Facebook's scheduling framework (in which a large number of task trackers handle duties assigned by a job tracker) began to reach its limits. So Facebook's engineers went to the whiteboard and designed a new scheduling framework named Corona." Facebook is continuing development on Corona, but they've also open-sourced the version they currently use.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    After paging through the code a bit, I found it interesting that they use Java in their implementation (not just Corona, but Hadoop as well). I was wondering why, and after some googling found this link [nabble.com], which helped explain the situation a bit more clearly.

    Pretty interesting stuff. But I'd be willing to bet Google's MapReduce is written in C/C++.

  • Facebook (Score:4, Interesting)

    by gman003 ( 1693318 ) on Friday November 09, 2012 @02:58PM (#41934991)

    I have to admit, while I hate using Facebook, and hate most of their business practices, I like how they're not just writing new infrastructure software, but are open-sourcing it all. I don't think it quite makes up for everything else, but it helps.

  • Have been code-named corona these last few years?? Seems like every org's got a project named corona nowadays.

    • Have been code-named corona these last few years?

      The only one I can think of involves me remotely managing a server from the beach with only a lime wedge and cold beer.

  • They could start by actually deleting deleted content. Seems simple to me. Let's hope their shortsightedness continues when everyone jumps ship for the next social fad, and continuing this rat race becomes far too costly.
    • They could start by actually deleting deleted content

      They could, but why should they put themselves at a disadvantage over Google, every other corporation that buys such data and the NSA, who all most certainly do not delete stuff in the way you'd like them to?
