Facebook's Corona: When Hadoop MapReduce Wasn't Enough
Nerval's Lobster writes "Facebook's engineers face a considerable challenge when it comes to managing the tidal wave of data flowing through the company's infrastructure. Its data warehouse, which handles over half a petabyte of information each day, has expanded some 2500x in the past four years, and that growth isn't going to end anytime soon. Until early 2011, those engineers relied on a MapReduce implementation from Apache Hadoop as the foundation of Facebook's data infrastructure. Still, despite Hadoop MapReduce's ability to handle large datasets, Facebook's scheduling framework (in which a large number of task trackers handle duties assigned by a job tracker) began to reach its limits. So Facebook's engineers went to the whiteboard and designed a new scheduling framework named Corona."
Facebook is continuing development on Corona, but they've also open-sourced the version they currently use.
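To make the job tracker / task tracker split in the summary concrete, here is a toy Python sketch of that style of scheduling. It is a hypothetical single-process model, not Hadoop's or Corona's actual code: the class names, the `slots` parameter, and the FIFO policy are all assumptions chosen for illustration. Each task tracker has a fixed number of slots; when it reports in with a heartbeat, the one central job tracker fills its free slots from a queue of pending tasks.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    """Hypothetical worker node with a fixed number of task slots."""
    name: str
    slots: int
    running: list = field(default_factory=list)

    def free_slots(self) -> int:
        return self.slots - len(self.running)


class JobTracker:
    """Hypothetical central coordinator: queues tasks, hands them out on heartbeats."""

    def __init__(self) -> None:
        self.pending: deque = deque()

    def submit(self, job_id: str, num_tasks: int) -> None:
        # Split a job into individual tasks and queue them in FIFO order.
        for i in range(num_tasks):
            self.pending.append(f"{job_id}-task{i}")

    def heartbeat(self, tracker: TaskTracker) -> list:
        # A task tracker reports in; fill its free slots from the queue.
        assigned = []
        while tracker.free_slots() > 0 and self.pending:
            task = self.pending.popleft()
            tracker.running.append(task)
            assigned.append(task)
        return assigned


jt = JobTracker()
jt.submit("jobA", 3)
jt.submit("jobB", 2)
workers = [TaskTracker("tt1", slots=2), TaskTracker("tt2", slots=2)]
for w in workers:
    print(w.name, jt.heartbeat(w))
print("still pending:", list(jt.pending))
```

Every assignment in this model passes through the single `JobTracker` object, which hints at why a design like this can hit limits as the number of task trackers and jobs grows.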
Re:What? (Score:3, Informative)
Hadoop: a framework for distributed storage and processing of massive data sets... "Apache Hadoop is an open-source software framework that supports data-intensive distributed applications"
MapReduce: a programming model for processing large data sets in parallel across a cluster of machines... "MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers"
Scheduling framework: the software that decides which tasks run on which machines, and when, so that the cluster's resources are used efficiently.
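For a feel of the MapReduce model itself, here is the classic word-count example as a minimal single-process Python sketch. It only simulates the map, shuffle, and reduce stages in one program; the function names are hypothetical and this is not Hadoop's API. In a real cluster, the map and reduce calls would run in parallel on different machines, and deciding where and when they run is the scheduling framework's job.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    # Map: emit a (word, 1) pair for every word in one input split.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # would do between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


splits = ["the quick brown fox", "the lazy dog", "The fox"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Each input split is mapped independently and each key is reduced independently, which is what lets the work be spread across many machines.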
Or, to put it another way:
http://lmgtfy.com/?q=hadoop [lmgtfy.com]
http://lmgtfy.com/?q=mapreduce [lmgtfy.com]
http://lmgtfy.com/?q=scheduling+framework [lmgtfy.com]