
Intel Launches Its Own Apache Hadoop Distribution

Posted by Soulskill
from the if-you-want-something-done-right-do-it-yourself dept.
Nerval's Lobster writes "The Apache Hadoop open-source framework specializes in running data applications on large hardware clusters, making it a particular favorite among firms such as Facebook and IBM with a lot of backend infrastructure (and a whole ton of data) to manage. So it'd be hard to blame Intel for jumping into this particular arena. The chipmaker has produced its own distribution for Apache Hadoop, apparently built 'from the silicon up' to efficiently access and crunch massive datasets. The distribution takes advantage of Intel's work in hardware, backed by the Intel Advanced Encryption Standard New Instructions (Intel AES-NI) in the Intel Xeon processor. Intel also claims that a specialized Hadoop distribution riding on its hardware can analyze data at superior speeds: namely, one terabyte of data can be processed in seven minutes, versus hours for some other systems. The company faces a lot of competition in an arena crowded with other Hadoop players, but that won't stop it from trying to throw its muscle around."


  • How does that compare to something like Spark [spark-project.org]?
    • Re:Speed (Score:5, Informative)

      by Anonymous Coward on Tuesday February 26, 2013 @06:36PM (#43019227)

      The performance claim in the summary seems to come from page 15 of this presentation [intel.com], where the speedup for a 1TB sort (presumably distributed) is 4 hours -> 7 minutes. I can't find the details for that test, but most of the speedup comes from using better hardware - faster CPU and network adapter, and SSDs instead of HDDs - while they get a 40% speedup from using their Hadoop distribution over some other Hadoop distribution, which is a fairly modest gain.

      The biggest performance benefit of Spark comes from avoiding disk and network access, so improving those bottlenecks will presumably reduce Spark's lead over Hadoop somewhat. But it's hard to say how well Spark would do with this particular hardware and test setup. I would guess it's still much faster than their Hadoop distribution. (Note: I'm a Spark power user but not an expert in its performance.)

      • by Anonymous Coward

        Yeah, the details in that presentation describe something far less impressive than the top-line "4 hours -> 7 minutes" claim. You are absolutely correct that only a very modest amount of the ~35x speedup claimed is attributable to the Intel Hadoop distribution itself, with the bulk of the speedup coming from significant hardware upgrades across the cluster. Spark wouldn't benefit from the hardware changes in exactly the same way, but it would still see significant gains from upgrading the cluster hardw

      • by Anonymous Coward

        Approximate results from the presentation:
        - Hadoop 1.0.3, old Xeon, HDD, 1G Ethernet -> 240 minutes
        - Hadoop 1.0.3, new Xeon, HDD, 1G Ethernet -> 120 minutes
        - Hadoop 1.0.3, new Xeon, SSD, 1G Ethernet -> 24 minutes
        - Hadoop 1.0.3, new Xeon, SSD, 10G Ethernet -> 12 minutes
        - Hadoop 2.1.1, new Xeon, SSD, 10G Ethernet -> 7 minutes

        The only useful conclusion is that changing the Hadoop version from 1.0.3 to 2.1.1 can give you a 40% reduction in duration. I wonder how it would work for other hardware config
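The arithmetic behind the "~35x" overall figure and the "40%" software gain can be checked with a short sketch. It assumes the approximate durations listed above (in minutes) are accurate; the names `configs` and `step_changes` are hypothetical and used only for illustration. Each step compares one configuration with the previous one, so the last step isolates the Hadoop version change.

```python
# Approximate 1 TB sort durations (minutes), as read off the Intel
# presentation by the commenter above. These are approximations, not measurements.
configs = [
    ("Hadoop 1.0.3, old Xeon, HDD, 1G Ethernet", 240),
    ("Hadoop 1.0.3, new Xeon, HDD, 1G Ethernet", 120),
    ("Hadoop 1.0.3, new Xeon, SSD, 1G Ethernet", 24),
    ("Hadoop 1.0.3, new Xeon, SSD, 10G Ethernet", 12),
    ("Hadoop 2.1.1, new Xeon, SSD, 10G Ethernet", 7),
]


def step_changes(rows):
    """Return (label, speedup, percent reduction) for each consecutive change."""
    out = []
    for (_, prev), (label, cur) in zip(rows, rows[1:]):
        out.append((label, prev / cur, 100.0 * (prev - cur) / prev))
    return out


baseline = configs[0][1]
final = configs[-1][1]
hardware_only = baseline / configs[3][1]   # all hardware upgrades, same Hadoop version
software_only = configs[3][1] / final      # Hadoop version change on the new hardware

print(f"overall: {baseline / final:.1f}x")
print(f"hardware: {hardware_only:.1f}x, software: {software_only:.2f}x")
for label, speedup, reduction in step_changes(configs):
    print(f"{label}: {speedup:.1f}x faster, {reduction:.0f}% less time")
```

On these numbers the overall gain is about 34x, of which roughly 20x comes from hardware and only about 1.7x (a ~42% reduction in time) from the Hadoop version change, which matches the commenters' reading.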

    • Re: (Score:3, Informative)

      by Anonymous Coward

      It's impossible to say without the details of apples-to-apples comparisons, but superficially, none of the announcements of "improved Hadoop" from Intel, Greenplum, Hortonworks, etc. is all that impressive in comparison to Spark even if you assume that none of their improvements can or will be integrated into Spark. Take, for example, a couple of the claims that Intel is making for their new Hadoop distribution. First, the "four hour job reduced to seven minutes" claim is the same ballpark 30-40x claim ma

  • by masternerdguy (2468142) on Tuesday February 26, 2013 @06:12PM (#43018961)
    So they've migrated an open solution to a vendor-locked-in solution? Sweet.
    • by wlj (204164)

      The (stated) speed-up could be nice, but:

      (1) how locked-in is it (just some tuning, serious modification, what?)
      (2) have they actually released it?

      • Even if it's completely locked in, your data isn't.

        Simple, really: if you have Intel hardware, use this distro to take advantage of it; otherwise, use the Apache one. No reason AMD or nVidia can't do the same...
        -nb

  • by citizenr (871508) on Tuesday February 26, 2013 @06:20PM (#43019065) Homepage

    ...

    • by Anonymous Coward

      AES-NI is not just AES processing; it includes significant improvements to general vector processing instructions.

  • by Anonymous Coward

    must run Gentoo

