RPiCluster: Another Raspberry Pi Cluster, With Neat Tricks 79

New submitter TheJish writes "The RPiCluster is a 33-node Beowulf cluster built using Raspberry Pis (RPis). The RPiCluster is a little side project I worked on over the last couple months as part of my dissertation work at Boise State University. I had need of a cluster to run a distributed simulator I've been developing. The RPiCluster is the result. I've written an informal document on why I built the RPiCluster, how it was built, and how it performs as compared to other platforms. I also put together a YouTube video of it running an MPI parallel program I created to demo the RGB LEDs installed on each node as part of the build. While there have certainly been larger RPi clusters put together recently, I figured the Slashdot community might be interested in this build as I believe it is a novel approach to the rack mounting and power management of RPis."
  • 5 - Profit! (Score:5, Funny)

    by wibblewibble ( 2766235 ) on Saturday May 18, 2013 @02:29AM (#43760399)
    Dude, you should totally mine bitcoins with that bad boy!
    • ...Then you can buy more of these 'bad boys' to make even more BTC, then buy even more 'bad boys' yet... Before you know it: WORLD DOMINATION!!!
      You could be driving around in one of these next week! (http://www.youtube.com/watch?v=cDoRmT0iRic)
    • by Anonymous Coward

      6.) Move out of basement.

      7.) Talk to a girl.

  • Hm... (Score:5, Funny)

    by Anonymous Coward on Saturday May 18, 2013 @02:37AM (#43760413)

    A new Raspberry Pi cluster Fram Boise University, eh?

    • Re: (Score:3, Insightful)

      by hughbar ( 579555 )
      Yes, that's funny, but hardly anyone on here knows French...
      • by girlinatrainingbra ( 2738457 ) on Saturday May 18, 2013 @05:24AM (#43760777)
        Haha. It would have been funny (or funnier) if this guy had come up with the acronym FRAM for this project and then called the page (or overall project) FRAM-Boise, perhaps:
        Facilitated
        Raspberry.Pi
        Architectural
        Messaging

        since he says in his PDF document that "My research is currently focused on developing a novel data sharing system for wireless sensor networks to facilitate in-network collaborative processing of sensor data. In the process of developing this system it became clear that perhaps the most expedient way to test many of the ideas was to create a distributed simulation rather than developing directly on the final target embedded hardware."

      • Funny, I first saw it as "honni suit, qui mal y pense," but looking it up, find it's "honi soit." Guess that 8th grade French book had a few mistakes in it, back in '60. But then I don't know French, just a few bits here and there that kinda stuck. Bonne chance, and all.

        • by hughbar ( 579555 )
          Yes, my sig is a 'jeu de mots' (play on words) based on 'honi soit qui mal y pense'; translated it would mean 'off we go for those who think little of it', but it sounds like the original. French speakers spend a certain amount of their lives doing this; look at Astérix in the original: all the characters' names 'mean' something.
    • The Pele of Anal?

  • It looks like Orac.
    • by RDW ( 41497 )

      "You pathetic fool. That isn't Orac! Look at it! It's just a box of flashing lights!"

  • Slow Pi (Score:2, Insightful)

    by Anonymous Coward

    Running the numbers from the paper: the $1000 x86 compute node took 3.85 seconds on the benchmark, while the RPi cluster took 456/32 = 14.25 seconds and also cost about $1000. Thus, after porting the software, a 3.7x slowdown was achieved over traditional methods.

    While there may be some gains (GPIO and such may be useful in this context) they didn't appear to be used here.

    This looks like a fun project that got research money, but it was not very useful for the goal the money was supposed to be spent on.
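The back-of-envelope math in the comment above checks out; a few lines verify it, using only the figures quoted from the paper:

```python
# Slowdown of the 32-node RPi cluster vs. a single x86 node,
# using the figures quoted in the comment above.
x86_time = 3.85          # seconds on the benchmark, $1000 x86 compute node
rpi_total_time = 456.0   # seconds, RPi figure from the paper
rpi_nodes = 32

rpi_cluster_time = rpi_total_time / rpi_nodes   # 14.25 s
slowdown = rpi_cluster_time / x86_time          # ~3.7x

print(f"cluster time: {rpi_cluster_time:.2f} s, slowdown: {slowdown:.1f}x")
```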

    • Re: Slow Pi (Score:1, Informative)

      by Anonymous Coward

      If the purpose was to make a fast computer you may have a point. But the need for this project was to have a low-cost cluster to run massively parallel/distributed software. A single core, or a small number of cores, may not give the solution you want. For example, if you have a fast algorithm that has to be run in order with no parallelism, it will run fast on your $1000 x86, but the only way to speed it up is a faster processor, so you're technology-limited. If you derive a different algorithm that may be a bit slower but allows massive parallelism, then you can make the system faster by adding more hardware. This system is not about doing things fast, it's about seeing how things run on a cluster. If you used the x86 then you would get a wrong result faster.

      • by gl4ss ( 559668 )

        If the purpose was to make a fast computer you may have a point. But the need for this project was to have a low-cost cluster to run massively parallel/distributed software. A single core, or a small number of cores, may not give the solution you want. For example, if you have a fast algorithm that has to be run in order with no parallelism, it will run fast on your $1000 x86, but the only way to speed it up is a faster processor, so you're technology-limited. If you derive a different algorithm that may be a bit slower but allows massive parallelism, then you can make the system faster by adding more hardware. This system is not about doing things fast, it's about seeing how things run on a cluster. If you used the x86 then you would get a wrong result faster.

        or, by another example...

        ah fuck it, the benchmark is supposed to test that. So even on a parallel workload the x86 is faster than the Pi cluster; on a single-threaded workload the Pis would be ridiculously slower.

        anyhow, I would wager that the point here is just to test the parallel algorithms on real hw - not to run them fast, but to prove that the basic ideas work.

        • anyhow, I would wager that the point here is just to test the parallel algorithms on real hw - not to run them fast, but to prove that the basic ideas work.

          I guess the issue is that building this cluster for accurate testing of the behavior of distributed algorithms was probably cheaper than trying to build an accurate simulator for it running on a desktop workstation would have been.

      • by Cenan ( 1892902 )

        So you can make it faster by adding more hardware or.... adding more hardware. Parallel and distributed are two very different things, and you cannot run a distributed anything on a single cluster, if you do, it would be properly named parallel. Anyways, the comparison is still valid - the RPI cluster failed to deliver; it was slower, was just as expensive as their benchmark x86 machine and probably 1000x as complex.

        You're right in what you say about algorithms, but it only holds if you already have unused

        • by gatkinso ( 15975 )

          >> So you can make it faster by adding more hardware or.... adding more hardware.

          Gene Amdahl says different.
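The Amdahl's law point can be made concrete; a small sketch (the 95% parallel fraction below is an illustrative assumption, not a figure from the paper):

```python
def amdahl_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Amdahl's law: overall speedup is capped by the serial fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_procs)

# Even with 95% of the work parallelizable, 32 nodes give well under
# a 32x speedup, and infinitely many nodes cap out at 1/0.05 = 20x.
print(amdahl_speedup(0.95, 32))
```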

        • "processors with multiple cores are cheaper than multiple processors with one core."
          And both are cheaper than one really, really fast core. You can only really go up to 4 GHz with off-the-shelf parts - any higher than that and you're on to exotic cooling systems involving liquefied gases of one type or another. The record is 8.8 GHz, but that took liquid nitrogen.

          • by julesh ( 229690 )

            You can only really go up to 4 GHz with off-the-shelf parts - any higher than that and you're on to exotic cooling systems involving liquefied gases of one type or another. The record is 8.8 GHz, but that took liquid nitrogen.

            Of course, just measuring GHz isn't everything. As that's an AMD chip, you could probably get similar single-threaded performance by overclocking a recent Intel chip to about 6.6 GHz [pureoverclock.com] (consensus seems to be that in computationally intensive tasks, Sandy Bridge is about 25% faster than Bulldozer).

          • GHz cannot be used to describe a CPU's performance anymore.
        • Parallel and distributed are two very different things, and you cannot run a distributed anything on a single cluster, if you do, it would be properly named parallel.

          It's quite obvious that any distributed system is inherently parallel (unless you decide to do only synchronous message passing, which would be stupid). And if that cluster is comprised of isolated nodes passing messages over a network, then it's a distributed system - by definition.

        • Re: Slow Pi (Score:4, Informative)

          by TheJish ( 2926133 ) on Saturday May 18, 2013 @10:16AM (#43761751) Homepage
          You seem to be missing the point of this completely. ;) I needed a cluster to test some distributed programs (yes, you can test distributed programs inside a cluster). The cluster itself has nothing to do with my PhD work other than that it is a tool I created to ensure I could test the software I've been developing. As for providing a tutorial on how to do what I did, I was writing this to enable freshman engineers to understand what was involved with building the cluster. Not everyone knows Linux, or how simple it is to build a Beowulf cluster.
    • by girlinatrainingbra ( 2738457 ) on Saturday May 18, 2013 @04:36AM (#43760665)
      it looks like the purpose behind this project is to have an "always available" (to this Ph.D. student) 32-node cluster that is dedicated to doing the work which this dissertation student needs to perform in order to complete his Ph.D., and it makes sense to be able to do this for the cost of a single Xeon node in a larger beowulf cluster.
      .
      This lets him escape the externalities which might impinge on his getting his own work done, like the big bad Beowulf cluster not being up or available when he needs it, or it being prioritized for someone else's project (say a professor who has tenure and more funding available). Those sorts of shenanigans would delay his work. So a 1/3rd-speed cluster that's always available for your own project is a helluva good deal at 1/32nd the cost of the big bad Beowulf cluster, eh? At least I think so!
      • by gl4ss ( 559668 )

        but the 32 Raspberry Pis are 3 times more expensive per unit of compute than the Onyx node he benchmarked against.
        That's to say, the $1000 (8-thread) machine is about 3 times faster than all the Raspberry Pis combined! It's a vastly superior computing solution.

        It has to be for proving out some supercomputing software, and for learning, more than for anything practical.
        You can't even get the Pis at a price that would give you 32 of them for a thousand bucks, though, and then add costs for cabling, power supplies, etc.

        • by girlinatrainingbra ( 2738457 ) on Saturday May 18, 2013 @05:15AM (#43760761)
          Right, but a "vastly superior computing solution" for CFD or linear equations is one thing. Trying to simulate network communications activity for 32 or 33 nodes on a single compute node is probably slower than actually trying out the algorithms on dedicated hardware that instantiates an actual hardware network. Thus, for a project that tries out different networking and communications algorithms, a 3 times more expensive by your calculations might actually end up being 10 times less expensive, especially considering the locking and interprocess communications required in a multi-threaded simulation on a single compute node vs. actually running it on real hardware with 32 nodes and an ethernet network linking the 32 nodes.
          .
          Especially considering that this system is going to be used for wireless communications protocols, the real hardware solution is IMHO the better way to go.
          • by gl4ss ( 559668 )

            yeah, for that it makes sense, as a learning/testing tool as I said in other comments.

            but you said it's 1/3rd the power of the Beowulf cluster for 1/32nd the price, and it just doesn't go that way (if it did, it would scale for supercomputing at a vastly cheaper price than the PC nodes). The cluster is 1/3rd the power of a single PC at a higher price than a single PC.

          • by shess ( 31691 )

            I'm sorry, but ... what? The locking and other interprocess overhead will not increase on a multi-core single-node solution, it will decrease. If your system can run lock-free on the multi-node solution, they can run lock-free on a multi-core solution. It's a fleet of processes talking to each other via TCP/IP either way (except on a single-node solution you have additional options like UNIX-domain sockets or named pipes).

            The only way I could see it possibly being a win is if the system being simulated i

            • by flux ( 5274 )

              And how does a single node effortlessly simulate the data propagation delays that are inevitable in a distributed system? Do you have a solution that involves work less than the worth of $1000? (Well, I suppose building up the RPi cluster took some time as well..)

              It would be a more general solution if such software was written, but I wouldn't say cheaper.

          • Exactly!
        • 32 Pis, 800 mA per Pi: 25.6 A. Call it 30 A to give some margin for error. Not exactly exotic - should be doable for thirty quid or so.

          I've read about servers that pack hundreds or thousands of ARM or Atom chips into one enclosure, giving great performance-per-watt for heavily threaded workloads. Mostly targeted at webservers.
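The current-draw estimate in the comment above works out as follows (the 5 V supply is an assumption based on the Pi's micro-USB power input; the 800 mA per-node figure is the one quoted in the comment):

```python
nodes = 32
amps_per_node = 0.8   # ~800 mA per Pi, per the comment above
supply_volts = 5.0    # assumed: Pi micro-USB supply voltage

total_amps = nodes * amps_per_node        # 25.6 A
total_watts = total_amps * supply_volts   # 128 W for the whole cluster

print(f"{total_amps:.1f} A -> {total_watts:.0f} W at {supply_volts:.0f} V")
```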

        • I believe you still miss the point. The performance of the cluster isn't the real issue. The benchmark was run just to show that the expected degree of parallelism was actually reached. The benchmark is in no way representative of the user requirements for the cluster itself and the tasks it is needed for. It was just run as a checkpoint to demonstrate that the cluster is working as expected.
      • So how about making a 32 node simulator?
    • by gatkinso ( 15975 )

      RPi is cheap. Now scale this to a bunch of PandaBoards or Gumstix running in a suitcase. Voilà: luggable supercomputer.

      • Scale to ODROID-U2. It only has a four week warranty, but if you use enough of them the presumably high failure rate might not impinge on operations. Delivered, it costs about the same as Pi, but it's a lot more machine. It has the same problems with proprietary chips, but they're the same problems after all, it's not like R-Pi doesn't have them.

        • (er, delivered, it costs four times as much as the Pi, but it has four cores, and a lot more of everything else too. so what I meant to say but didn't (in b4 correction) is that you get more for your money. The abysmally short warranty is why I don't own one already.)

    • It is all about the RGB LEDs. Nothing else matters.
  • Rack mounting? (Score:5, Insightful)

    by thegarbz ( 1787294 ) on Saturday May 18, 2013 @03:43AM (#43760541)

    Not to diminish your achievements which are otherwise quite cool, but this novel approach to rack mounting is anything but. Quite possibly the single most important feature of a rack is ease of component access. By tying all components together with PCB standoffs you basically can't remove a single RPi if there's ever a pressing need.

    If anything you've shown a novel way of cramming things together without the use of a rack.

    • Comment removed based on user account deletion
      • Granted there's nothing much to remove from a pi mounted like this other than the SD card.
        The only time I'd imagine you'd tamper with a Pi is when it decides to die from the overclock.

        Of course, but that's the point. Racks exist to allow you to take out components to swap. Often this is damage, sometimes this is upgrades, sometimes expansion.

        Of note is that there are now several variants of the RPi, including 256MB and 512MB versions. So upgrading may be a logical choice too.

  • by Anonymous Coward

    Neat project but really the report left me frustrated.
    You start by comparing price and features of the RPi to two other alternatives, e.g. the Onyx node.
    Then you compare one RPi to one Onyx node. But moving on, you never do a price or performance comparison of the 32-RPi cluster against the same Onyx node, which would be the interesting thing.
    Figure 5 shows something you could possibly relate to the earlier information, but only graphically. You don't state the actual numbers!

    Moving on "As discussed earlier, each RP

    • by gl4ss ( 559668 )

      the 4-to-8 improvement is probably because it only has 4 real cores.

      However, I suspect that later he doesn't do the comparison of the single Onyx node vs. his whole cluster because it would show the Pi cluster as a pointless endeavor (it's only useful for learning parallel computing, not for executing it). His 32-Pi cluster is more expensive than a $1000 node (which certainly isn't the cheapest way to get a 3 GHz quad-core PC).

  • by girlinatrainingbra ( 2738457 ) on Saturday May 18, 2013 @04:18AM (#43760627)
    With a name like that (RPiCluster), perhaps it ought to be situated at the R.P.I. [wikipedia.org] in Troy, New York? Though for that nomenclature geographicalocalization, the Republican Party of Iowa [wikipedia.org] has as much claim to RPI [wikipedia.org] as these others do. I like the justification pointed out by the builder of this RPi.Cluster:
    The RPi platform has to be one of the cheapest ways to create a cluster of 32 nodes. The cost for an RPi with an 8GB SD card is ~$45. For comparison, each node in the Onyx cluster was somewhere between $1,000 and $1,500. So, for near the price of one PC-based node, we can create a 32 node Raspberry Pi cluster! [from the pdf file at http://coen.boisestate.edu/ece/files/2013/05/Rasp.-Pi.pdf [boisestate.edu] ]

    So the summary of the informal document is that it's cheaper to build a 32-node Rasp.-Pi cluster than to purchase even a single node of the 32-node Beowulf cluster that may or may not be available to you. And if you want to get your Ph.D. work done, I must agree that it sounds better to not be dependent upon the whims and follies of others' benevolence in having external hardware clusters available for your use. Bravo, Joshua Kiepert, I like your "informal writeup". Best wishes on your work!
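The cost comparison quoted from the PDF above is easy to verify (using the ~$45/node figure as quoted):

```python
rpi_node_cost = 45              # ~$45 per Pi with an 8GB SD card, per the PDF
cluster_nodes = 32
onyx_node_cost = (1000, 1500)   # quoted per-node cost range for Onyx

rpi_cluster_cost = rpi_node_cost * cluster_nodes   # $1440
# The entire 32-node Pi cluster lands inside the price range
# of a single Onyx node.
in_range = onyx_node_cost[0] <= rpi_cluster_cost <= onyx_node_cost[1]
print(rpi_cluster_cost, in_range)
```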

    • Thanks. You have accurately deduced my intent ;) I started this project when the 32-node cluster I had been using was taken offline for renovations of the lab it resides in.
  • by decora ( 1710862 ) on Saturday May 18, 2013 @08:20AM (#43761121) Journal

    i wish i had done this, therefore you suck.

  • Impressive and cool!
  • The big problem in Ph.D. studies is your own review a few weeks before submittal, when you realize the things you should have done. At that point, your own always-available 'cluster' is a beyond-price jewel of an asset to you. Awaiting priority on faculty assets could cost you your degree.

    Good luck to you. Good thinking through your priorities.

  • I would have preferred graphs with lines, logarithmic scale and comparison with the theoretically attainable performances.

    Moreover, some more popular benchmarks should be run: HPL, NERSC Trinity benchmarks, or even real applications like Quantum Espresso which has some standard benchmark tests.

    Power consumption should be measured when running any benchmarks as it may vary depending on the type of application (CPU bound, memory bound).

    Nice project on the electrical and electronic engineering part, could bene
