Education Hardware Hacking Python Supercomputing United Kingdom Build

University Team Builds Lego and Raspberry Pi Cluster

Posted by Unknown Lamer
from the build-a-beowulf-wait-a-minute dept.
hypnosec writes about a neat little hack using Lego, Raspberry Pis, and Scratch to construct a "supercomputer." From the article: "A team of computational engineers over at the University of Southampton led by Professor Simon Cox have built a supercomputer using Raspberry Pi and Lego. The supercomputer is comprised of 64 processors, 1TB of storage (16GB SD cards in each of the Raspberry Pis) and can be powered on using just a single 13-amp mains socket. MPI is used for communications between the nodes through the ethernet port. The team managed to build the core of the supercomputer for under £2500. Named 'Iridis-Pi' after University of Southampton's supercomputer Iridis, the supercomputer runs software that was built using Python and Scratch. Professor Cox used the free plug-in 'Python Tools for Visual Studio' to develop code for the Raspberry Pi." Lots of pictures of the thing, and a howto on making your own.

  • by Sparticus789 (2625955) on Wednesday September 12, 2012 @09:57AM (#41312529) Journal

    Gussy it up however you want, Trebek. What matters is does it work? Will the Raspberry Pi supercomputer calculate large prime numbers? Because I've ordered devices like that before - wasted a pretty penny, I don't mind telling you. And if the Raspberry Pi supercomputer works, I'll order a dozen!

    • Re:Want (Score:5, Informative)

      by the_humeister (922869) on Wednesday September 12, 2012 @11:12AM (#41313285)

      Unfortunately, the term "supercomputer" isn't really being used properly. They built a cluster of computers, sure. But "supercomputer"??? Hardly. The Raspberry Pi uses a processor based on ARM v6. Lemme give a single-threaded rendering comparison (Povray 3.6 running the benchmark scene [povray.org]; here's what the benchmark image output looks like [povray.org]) with my old HTC Aria, which uses a Qualcomm MSM 7227 processor and has similar processor specs to the Raspberry Pi (ARM v6 + VFP2 floating point hardware):

      HTC Aria (MSM 7227 @ 0.6 GHz) *
          Debian 6.0(armel), gcc 4.4, -mfloat-abi=softfp -mthumb
          Parse Time: 0 hours 1 minutes 3 seconds (63 seconds)
          Photon Time: 0 hours 53 minutes 49 seconds (3229 seconds)
          Render Time: 57 hours 31 minutes 41 seconds (207101 seconds)
          Total Time: 58 hours 26 minutes 33 seconds (210393 seconds)

      For comparison, here's a faster ARM processor from my Samsung Galaxy S II:

      Exynos 4210 @ 1.2 GHz (ARM Cortex A9),
          Debian 7.0(armhf), gcc 4.6, -mcpu=cortex-a9 -mhard-float -mthumb -mfpu=vfpv3 -ffast-math -funsafe-math-optimizations
          Parse Time: 0 hours 0 minutes 4 seconds (4 seconds)
          Photon Time: 0 hours 1 minutes 33 seconds (93 seconds)
          Render Time: 1 hours 26 minutes 34 seconds (5194 seconds)
          Total Time: 1 hours 28 minutes 11 seconds (5291 seconds)

      And here's from an Intel Core i5 2400s @ 2.5 GHz:

      Core i5 2400s @ 2.5 GHz, Ubuntu 12.04, gcc 4.6, -march=corei7-avx
      Total Scene Processing Times
          Parse Time: 0 hours 0 minutes 1 seconds (1 seconds)
          Photon Time: 0 hours 0 minutes 14 seconds (14 seconds)
          Render Time: 0 hours 10 minutes 12 seconds (612 seconds)
          Total Time: 0 hours 10 minutes 27 seconds (627 seconds)

      The ARM v6 processor took more than 2 days to render something that takes 10 minutes on a Core i5. So, "supercomputer" this cluster is not.

      * You may say, "Hey, this test is running using soft-float! If you used hard float, it'd be faster!" Well, you would be right that it would render faster under hard float, but this processor still wouldn't finish rendering in less than a day, let alone come anywhere close to Core i5 or Cortex A9.

      • Forgot to add the following:

        The image rendered is 384 x 384 pixels. MSM 7227 results are 0.70 pps and 1.17 pps/GHz. The Raspberry Pi is running at 700 MHz, so it should theoretically get 0.82 pps. It's possible (and fairly easy) to split up the rendering among all the CPUs in this cluster with some custom scripting, so this benchmark image could theoretically render at 52.42 pps. That Core i5 2400s I mentioned above renders at 235.18 pps!
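The arithmetic in the comment above can be reproduced with a short Python sketch. It uses only the Total Time figures quoted in the thread; the function and variable names are made up for illustration, and the 64-node figure assumes perfect linear scaling with no communication overhead, which a real cluster would not achieve. Working from unrounded intermediates gives roughly 52.3 pps for the cluster rather than the quoted 52.42, which comes from multiplying the rounded 1.17 pps/GHz figure.

```python
# Sketch of the pixels-per-second (pps) arithmetic from the thread.
# Timings are the Total Time values quoted above; names are illustrative.

PIXELS = 384 * 384  # benchmark image is 384 x 384 pixels


def pixels_per_second(total_seconds: float) -> float:
    """Average rendering throughput for one full benchmark run."""
    return PIXELS / total_seconds


# HTC Aria (MSM 7227 @ 0.6 GHz), total time 210393 s
msm7227_pps = pixels_per_second(210393)
msm7227_pps_per_ghz = msm7227_pps / 0.6

# Assumption: throughput scales linearly with clock speed (700 MHz Pi)
pi_pps_estimate = msm7227_pps_per_ghz * 0.7

# Assumption: 64 nodes scale perfectly, with zero communication cost
cluster_pps_estimate = pi_pps_estimate * 64

# Core i5 2400s, total time 627 s
i5_pps = pixels_per_second(627)

print(f"MSM 7227:        {msm7227_pps:.2f} pps ({msm7227_pps_per_ghz:.2f} pps/GHz)")
print(f"Pi (estimate):   {pi_pps_estimate:.2f} pps")
print(f"64-node cluster: {cluster_pps_estimate:.2f} pps (idealised)")
print(f"Core i5 2400s:   {i5_pps:.2f} pps")
```

Even under these generous assumptions, the idealised cluster lands at well under a quarter of the single i5's throughput.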

      • Re: (Score:2, Interesting)

        by Sparticus789 (2625955)

        There's a bit of apples and oranges comparison there. You are comparing single-core processors to a quad-core processor. Of course the i5 is going to be faster. It would be better to divide the performance of the i5 by 4, to represent the performance of a single core of the processor.

        There's also a cost comparison. Just the i5 processor is ~$200, not to mention the motherboard, RAM, etc. Let's just say you can build a computer with an i5 for about $800. That's the same price as 32 Raspberry PIs. So if

        • Re:Want (Score:4, Informative)

          by the_humeister (922869) on Wednesday September 12, 2012 @11:32AM (#41313481)

          Povray 3.6 is single threaded, so all results are single threaded.

        • by dgatwood (11270)

          You are comparing single-core processors to a quad-core processor. Of course the i5 is going to be faster. It would be better to divide the performance of the i5 by 4, to represent the performance of a single core of the processor.

          Based on those numbers, the quad-core i5 processor is approximately equal in performance to 335 Raspberry Pi cores (at this type of computation). Thus, even a single-core of the i5 would still be equivalent to almost 84 Raspberry Pi cores, and costs only about $600 even if you bu

        • Re:Want (Score:4, Informative)

          by petermgreen (876956) <plugwash.p10link@net> on Thursday September 13, 2012 @05:26AM (#41321685) Homepage

          The GP's figures are off. He is using a horrible compiler setup: not only is he using the softfloat calling convention, he is also using -mthumb, which AIUI will prevent the code from making direct use of the hardware FPU at all on armv6 (and I suspect he was using Debian's version of libc, preventing indirect use of the hardware FPU through libc routines). According to hexxeh, the povray benchmark gives the following results under Raspbian on a Pi.

          Total Scene Processing Times
          Parse Time: 0 hours 0 minutes 16 seconds (16 seconds)
          Photon Time: 0 hours 5 minutes 57 seconds (357 seconds)
          Render Time: 6 hours 13 minutes 57 seconds (22437 seconds)
          Total Time: 6 hours 20 minutes 10 seconds (22810 seconds)

          http://www.raspberrypi.org/phpBB3/viewtopic.php?f=9&t=4256&start=175 [raspberrypi.org]

          Your price figures are off too. An i5 based compute node can be built for more like $500.

          Similarly, the real price of a Pi node is quite a bit more than $25. Firstly, the Pi you can actually buy and would want for this task (clustering needs networking support) has a base price of $35, not $25. Secondly, that price excludes things like the power supply, the SD card, the network cable and the mounting hardware. The real cost of a Pi node is probably more like $50.

          So the Pi's cost per node is about 10 times lower than the i5's.

          My overall conclusion is that if compute power per dollar is your goal, then a smaller number of i5s is a much better bet than a larger number of Pis.
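The compute-per-dollar conclusion can be sanity-checked with a rough Python sketch. It combines the Raspbian Total Time quoted above (22810 s) with the Core i5 Total Time quoted earlier in the thread (627 s) and the approximate node prices from this comment ($500 and $50). The variable names are illustrative, and because both runs are single-threaded, the i5's other three cores are not counted at all.

```python
# Rough performance-per-dollar comparison using figures quoted in the thread.
# Both benchmark runs are single-threaded Povray 3.6; prices are the
# commenter's estimates, not measured costs.

PI_TOTAL_SECONDS = 22810   # Raspberry Pi under Raspbian (hard float)
I5_TOTAL_SECONDS = 627     # Core i5 2400s

PI_NODE_COST = 50.0        # estimated dollars per complete Pi node
I5_NODE_COST = 500.0       # estimated dollars per i5 compute node

# How many times faster one i5 core finishes the benchmark than one Pi
speed_ratio = PI_TOTAL_SECONDS / I5_TOTAL_SECONDS

# How many times more one i5 node costs than one Pi node
cost_ratio = I5_NODE_COST / PI_NODE_COST

# Single-threaded throughput per dollar, i5 relative to Pi
perf_per_dollar_ratio = speed_ratio / cost_ratio

print(f"i5 core vs Pi speed: {speed_ratio:.1f}x")
print(f"i5 node vs Pi cost:  {cost_ratio:.0f}x")
print(f"i5 throughput per dollar vs Pi: {perf_per_dollar_ratio:.1f}x")
```

On these numbers a single i5 core already delivers between three and four times the throughput per dollar, before any multi-core scaling is considered.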

      • Re:Want (Score:5, Informative)

        by CastrTroy (595695) on Thursday September 13, 2012 @04:43AM (#41321515) Homepage
        Actually, I just ran a test, because I was a little amazed that the ARM v6 was so much slower than the A9. Here are my numbers:

            Parse Time: 0 hours 0 minutes 14 seconds (14 seconds)
            Photon Time: 0 hours 5 minutes 43 seconds (343 seconds)
            Render Time: 5 hours 58 minutes 53 seconds (21533 seconds)
            Total Time: 6 hours 4 minutes 50 seconds (21890 seconds)

        While the Raspberry Pi wasn't faster than the A9 (I didn't expect it to be), it was way faster than the ARM v6 you tested on, most likely because it uses hard float.
      • You may say, "Hey, this test is running using soft-float! If you used hard float, it'd be faster!"

        Massively faster

        http://www.raspberrypi.org/phpBB3/viewtopic.php?f=9&t=4256&start=175 [raspberrypi.org]

  • by BooMonster (110656) on Wednesday September 12, 2012 @09:58AM (#41312539)

    One university managed to get a hold of 64 Raspberry PI units.

    • by Anonymous Coward

      They must have ordered from Farnell, because they sure as hell wouldn't have received them from RS.

  • They should have built a Beowulf cluster. The regular one is such a cliché.
  • by Anonymous Coward

    So a cluster of 64 pi boards don't exceed ~3 kilowatts... Why would you expect them to given that they are supposed to run from a 5V supply at 1A (5W * 64 = 320W)

    • So a cluster of 64 pi boards don't exceed ~3 kilowatts... Why would you expect them to given that they are supposed to run from a 5V supply at 1A (5W * 64 = 320W)

      The comparison point is that a 64-node cluster of regular PC hardware couldn't fit behind a basic mains line.
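The power figures in this exchange can be checked with a few lines of Python. The 13 A socket rating and the 5 V at 1 A per-board supply come from the story and the comments; the 230 V mains voltage is an assumption (the UK nominal value), and the names are illustrative.

```python
# Power budget sketch: 64 Pi boards against a single 13 A mains socket.
# MAINS_VOLTS is an assumption (UK nominal); other figures are from the thread.

MAINS_VOLTS = 230          # assumed nominal UK mains voltage
SOCKET_AMPS = 13           # rating of a single mains socket
PI_VOLTS = 5               # supply voltage per board
PI_AMPS = 1                # supply current per board
NODES = 64

socket_watts = MAINS_VOLTS * SOCKET_AMPS        # capacity of one socket
cluster_watts = PI_VOLTS * PI_AMPS * NODES      # worst-case board draw
headroom = socket_watts / cluster_watts

print(f"Socket capacity: {socket_watts} W")
print(f"Cluster draw:    {cluster_watts} W")
print(f"Headroom:        {headroom:.1f}x")
```

This ignores the network switch and power supply losses, so the real draw would be somewhat higher than 320 W, but still far below what one socket can deliver.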

      • No, but a single computer that has 16 or more cores, many GB of RAM and multiple TB of storage would. I'd guess it would outperform the Pi cluster too.
        • Re: (Score:3, Insightful)

          by M1FCJ (586251)

          Of course, this being a teaching tool, having the performance would mean nothing. Having the discrete computing units does.

        • sounds like the old mainframe vs x86 commodity server argument. small mobile tech will take the place of many of the current PCs because grandma doesn't need a PC to check email, facebook, play solitaire, and watch netflix, but the PC won't disappear, just like the mainframe didn't disappear, because it still has many more uses. in fact there are more mainframes sold per year now than there were in the pre-PC era. what we will see come from this will be a continuation of the trend toward less and less expen

      • by Desler (1608317)

        Well, yes. It would also be magnitudes faster than this.

  • by colin_faber (1083673) on Wednesday September 12, 2012 @09:59AM (#41312555)
    Sorry, but this doesn't even crack the top 10,000 in machine performance. Not exactly a supercomputer. A cluster, yes. Supercomputer, HPC machine, etc.? No.
    • by vlm (69642)

      A supercomputer is any overall system that's IO limited not CPU limited like most machines. At least at full theoretical CPU use. Hard to define a rasp pi as anything other than IO limited, so... An alternative def more popular recently is programmer limited, as in it's hard to parallelize some algorithms. Either way it fits.

      • by slim (1652)

        A supercomputer is any overall system that's IO limited not CPU limited like most machines.

        So if I cripple all IO to my 486, except a 300 baud modem, I've built a supercomputer?

      • by Desler (1608317)

        Actually many normal desktops are IO limited for a number of applications. That hardly qualifies them as supercomputers.

      • That seems like a pretty poor definition to me. In fact, it seems like something could be or not be a supercomputer depending on what job you have running on it.

      • A supercomputer is any overall system that's IO limited not CPU limited like most machines.

        Do all your computers have a 286? Last time I checked, modern PCs are always IO limited.

      • Sorry but this statement is completely wrong. HPC (Super computers) are a moving target and the term represents the top 500 (or less) machines world wide.

        This statement is also wrong with respect to I/O limitations, as bottlenecks have nothing to do with whether or not a system is considered a "Super computer".

      • by ultranova (717540)

        A supercomputer is any overall system that's IO limited not CPU limited like most machines.

        Isn't that pretty much any system nowadays? Most desktop computers are limited by their disk-to-memory and memory-to-registers speeds.

        • While technically RAM is not part of the CPU itself, I think you'll find most people don't consider memory access when calling something "I/O limited". That's more along the lines of network ports, hard drive, USB, firewire, thunderbolt, etc.

  • by thammoud (193905) on Wednesday September 12, 2012 @10:01AM (#41312571)

    Can we now retire "Beowulf cluster" jokes?

  • by Plammox (717738) on Wednesday September 12, 2012 @10:01AM (#41312581)
    Aaaargh...imagine a.....in the Soviet Union....***carrier lost***
  • by CajunArson (465943) on Wednesday September 12, 2012 @10:02AM (#41312597) Journal

    Calling this thing a cluster.. fine.
    Calling it interesting for students to learn about how clusters work... fine.
    Calling it a supercomputer? Maybe if the University of Southampton got sucked into a time vortex to the early 1990s, and even then, while the raw theoretical number crunching capability of the RPis would be impressive, the lackluster I/O and interconnects would mean that even supercomputers of that time would still win on many common workloads.

  • by Bovius (1243040) on Wednesday September 12, 2012 @10:03AM (#41312607)

    Cluster of Raspberry Pis = Bramble. Slashdot has been so drooly over every nitpicky update about these, I thought everyone here would know that by now.

  • by gblackwo (1087063) on Wednesday September 12, 2012 @10:04AM (#41312611) Homepage
    Whenever I see "professional" projects like this use legos- I have mixed feelings. Here is another example, a lab using legos for automation [hackaday.com]

    I love to see legos doing advanced things, but for a chassis? I feel like people can be very smart, but sometimes afraid to learn how to build something with their hands. The lab example I posted above is at Cambridge University. Cambridge has a very competent engineering department, why not reach out to them?- It could have made for an excellent project for some engineering students.

    I'm reminded of the very cited researcher who reinvented some calculus instead of simply reaching out to someone in another department for help [slashdot.org]
    • I love to see legos doing advanced things, but for a chassis? I feel like people can be very smart, but sometimes afraid to learn how to build something with their hands. The lab example I posted above is at Cambridge University. Cambridge has a very competent engineering department, why not reach out to them?- It could have made for an excellent project for some engineering students.

      Aside from your Lego-assembling robot, Lego is always assembled by hand. It's also cheaper and faster to build a Lego chassis than get the engineers to weld up one from mild steel.

      • by Dishevel (1105119)

        Engineers do not weld.
        They use a computer to let others know that they need to weld those two pieces of steel together.

        • by gblackwo (1087063)
          It is this kind of thinking that makes people build something out of legos instead of trying to learn a new skill.
    • by xaxa (988988)

      "The rack for the supercomputer has been built using Lego under the guidance of Professor Cox's son James Cox (aged 6)."

      Also "reached out" is a stupid phrase.

  • I get 64 cores and a hell of a lot more memory and storage in a single quad proc server. Does this make every new VM or DB server I buy a supercomputer? It's not even drawing as much power as this stack. Maybe they're planning on using their undocumented GPUs? I can throw in a couple of those as well and still trounce this setup. Am I missing something? Besides putting them together with Legos with, I assume, his son.

    • Re:Supercomputer? (Score:4, Interesting)

      by Kupfernigk (1190345) on Wednesday September 12, 2012 @10:26AM (#41312881)
      Yes, you are missing something (though I have slight reservations about the 16 cores to a die CPUs you claim to be using). There's this thing called education...your large server running loads of VMs is not going to be nearly as useful or informative at getting the ideas across as a rig like this. There is a big difference between working with virtual networks and seeing the hardware of a real network, as well as being able to program the thing with "small" languages without monster frameworks just to make anything happen.

      However, you do win a "Miserable git" award for being unpleasant about Prof. Cox.

      • Can I get nominated for that award? I'm creeped out by his Mini-Me son.
        All parents that dress their children as tiny doll versions of themselves actually.
      • You mean you have reservations about stock shipping AMD server procs? If you want education, you want to be able to do things like artificially inflate the latency of the linking network; that's easy to do on VMs. Test the effectiveness of different storage methods vs the type of workload. Look at nodes with different processing capabilities. Honestly, I find it amazingly hard to fathom that it took a whole group of people to stack 64 SBCs, load them with an OS and connect them up to a switch. This is a m

  • by slim (1652) <john AT hartnup DOT net> on Wednesday September 12, 2012 @10:08AM (#41312661) Homepage

    I'm a big fan of the RP project. But I'm a bit bored of seeing news items in which someone does something with this Linux box, which obviously a Linux box can do. Raspberry Pi compiles C! Raspberry Pi controls a robot! Raspberry Pi runs MAME! Well of course it does, it's a little PC, and that's what PCs can do.

    • by sirwired (27582) on Wednesday September 12, 2012 @11:00AM (#41313187)

      You clearly need to turn in your Slashdot commenter license... to REALLY entice editors to post a story, work BitCoins into the mix. Oohh... better yet, work in references to the MPAA, And Ubuntu, and whatever else can be stirred into the pot. References to MAME are old school... (although that can be forgiven, Mr. 4-digit UID.)

      How does this sound? "Raspberry Pi used to mine BitCoins to help pay an MPAA Lawsuit Fine. However, due to a security hole in Ubuntu caused by the new Unity interface, the new coins were stolen from the user by someone claiming to be affiliated with Anonymous. Wil Wheaton offers to sponsor a live D&D game played with Arduino-programmed robotic miniatures to make up for the lost funds."

      Did I miss anything?

      • by Rotag_FU (2039670)

        Yep, you forgot about how kick-starter was used to fund the creation of the robotic miniatures. Also, the Raspberry Pi was actually running MineCraft which had a working implementation of an 8-bit processor that was doing the actual BitCoin mining. Researchers were observing the operation of the MineCraft processor using the Oculus Rift headset. In the future, the designers plan to port this all to the upcoming Ouya.

        There, I think that has it covered. :)

    • I think this is a great project for students, because it will let them develop and test simulations and other algorithms for parallel computing without tying up expensive "real" supercomputers. A bonus is that the relatively slow speed may encourage techniques to make such computations more efficient, with a corresponding payoff when the algorithm is put onto the real thing.
    • by mapkinase (958129)

      I agree. The original post also omits relevant number of Lego blocks used for construction of the cluster.

  • I would like to recommend the red and white suited astronaut lego people to maintain the server, or to work as sysadmins. They seem very dependable. If not them, then maybe the Lego people from the 70s that didn't have the smiley face painted on them. They seem more analytical and inclined to this type of work. Anybody remember them?
  • Wouldn't a dual Xeon server be able to easily out-muscle that "supercomputer"?

    • A Core 2 era dual-core Xeon would out-muscle them unless they can use the proprietary GPU. With the GPUs they could spank any single chip.

  • 64 SoC 700 MHz cores connected via universal serial bus ethernet controllers, using flash memory that can at best pull about 10-30MB/sec read, and maybe 10MB/sec write if you're lucky.

    If this is an example of "applying high-performance computing and data handling techniques to tackle complex engineering and scientific challenges", this is a massive fail.

    $4,000 buys you at retail (not with any sort of educational discount) a 1U machine (ie, a formfactor of about one quarter or less) with 12 Xeon 2 GHz cores con

    • I just did the math. The Pi community supposedly recommends a minimum of 1A@5V if you intend on using any peripherals, including ethernet. 700mA is the minimum draw with *nothing* connected. 5W x 64 = 320W. That's quite close to the max capacity of the power supply for the dual-socket machine I mentioned. The E5-2620 processors have a max TDP of 95W each. Now, that doesn't count the auxiliaries - but there's still a 120W difference between typical power usage for the Pi, and MAXIMUM power usage for t
    • by slim (1652) <john AT hartnup DOT net> on Wednesday September 12, 2012 @10:45AM (#41313065) Homepage

      Mmm, but it *is* a nice environment for *students* to experiment with the *principles* of parallel computing in a tactile manner.

      I began learning to code on an 8 bit 2Mhz CPU, with 32KB of RAM. If I wrote an inefficient loop, I'd often notice the slowness without benchmarking. If I was careless with memory, my program would crash. On my quad core laptop today, I only notice issues like that if I benchmark or do deliberate load testing. So working on low-spec systems is instructive.

      Likewise, working with clusters of low-powered units on a slow comms bus is going to teach these students a lot about optimising parallel programs. They're going to have to deal with race conditions, memory ceilings, etc. which might not even show up on faster systems.

      • Re: (Score:3, Insightful)

        by hattig (47930)

        Exactly, whilst the system isn't powerful, it is instructive in cluster design and programming, which is very relevant at a university.

        They won't be running "real applications doing real calculations" on this thing. They'll be writing student-level clustered applications. For the price paid, it's probably a really instructive system for the university to have installed, if they make use of it in student courses and/or projects.

      • I thought the same thing. It's much easier to notice errors in your code and how the whole thing works when using a machine that is slow or purposely slow.
      • There is questionable benefit to having that tactile experience a)extend beyond a few nodes - certainly nowhere near 64 b)have it on hardware which resembles embedded systems, not real compute nodes. In short, it's teaching 15 year old HPC technology in an era where you can fit 64 cores into 1U for a couple grand.
    • by M1FCJ (586251)

      So.. You have a little server there. Good luck with using it for teaching a bunch of students about how scalable clustered software works, how to write the software, what the pitfalls are, and so on.

      Good luck running 64 separate VMs on your small server (not saying it's not impossible but I really wonder which one is faster to set up) and you won't be able to test any of the very different interconnects that easily.

      • by slim (1652)

        Good luck running 64 separate VMs on your small server (not saying it's not impossible but I really wonder which one is faster to set up) and you won't be able to test any of the very different interconnects that easily.

        Very easy indeed, and almost certainly quicker/easier to set up than the physical way, either using something like Vagrant [vagrantup.com] or by rolling your own scripts to drive VirtualBox.

        However, I think it's instructive for students to do it the physical way first. By analogy: first understand LANs, then learn about VLANs.

      • by CastrTroy (595695)
        When I was in university, I took a parallel computing course and we used MPI, same as these guys. Back then, all the personal machines were single core. If we were lucky we could test the program out by remote logging into the quad processor SUN machine. Guess what? We were able to learn quite a bit just running 64 different processes on the same box, even with just a single processor core. It would have been nice to have a machine around with 64 actual cores on it to see how things worked one everythin
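The idea running through this sub-thread, dividing one job among 64 MPI-style processes whether they sit on one box or on 64 boards, usually starts with a block decomposition. The sketch below is a plain-Python illustration, not the Southampton team's code and not an MPI API: `rows_for_rank` is a hypothetical helper that decides which image rows each process would render, using the 384-row benchmark image mentioned earlier in the thread.

```python
# Block decomposition sketch: which image rows would each of `size`
# processes render? `rows_for_rank` is a hypothetical helper; in a real
# MPI program, `rank` and `size` would come from the MPI runtime.


def rows_for_rank(rank: int, size: int, height: int) -> range:
    """Return the contiguous block of rows assigned to process `rank`.

    Rows are split as evenly as possible; if `height` is not a multiple
    of `size`, the first `height % size` ranks get one extra row each.
    """
    base, extra = divmod(height, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    HEIGHT = 384   # rows in the benchmark image
    SIZE = 64      # one process per Pi

    for rank in (0, 1, SIZE - 1):
        rows = rows_for_rank(rank, SIZE, HEIGHT)
        print(f"rank {rank:2d}: rows {rows.start}..{rows.stop - 1}")
```

Because each rank computes its own block from just its rank and the total process count, no coordination is needed to hand out work; only the finished rows have to be gathered at the end.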
    • by Simon Brooke (45012) <stillyet@googlemail.com> on Wednesday September 12, 2012 @10:56AM (#41313155) Homepage Journal

      WOOOOOOOOOOOOOOSH!

      There is a whole lot of point missing going on here. Yes, you could build a faster computer cheaper using other hardware. But it wouldn't explain the concepts to children (and to first year CS students, which is pretty much the same thing) nearly so well. Throw together a heap of little itty-bitty boards, each of which individually is, as everyone knows, relatively low power, and knit them together with ordinary cat5 cable, and get out of the collection high compute performance, and you have something which will intrigue children|students and get them thinking about how it works. Show them an anonymous 1U box doing exactly the same job, and you won't get them thinking, because they can't immediately see and understand what it comprises and how it's put together. This is a teaching machine, not a practical machine. Its job is to teach students. It teaches students by being perspicuous.

      It's not (yet) a requirement for getting a Slashdot account to demonstrate that you have an IQ slightly south of that of a stick of used chewing gum, but some of you clearly haven't yet got that message.

      • There is a whole lot of point missing going on here.

        You're the one missing the point here. I can fit in 1U what used to take an entire rack.

        When you can fit that kind of power into 1U, and given the massive leaps in computing power per core, traditional nodes-connected-by-networks clusters are applicable for far fewer people these days. What they should be teaching is proper multithreaded programming techniques.

        get out of the collection high compute performance

        Were you not paying attention when I said t

    • by rusty0101 (565565)

      Something that I don't think got much play in the article is that each of the 64 Pi boards has a SOC processor that, in addition to the general purpose processor, also includes a 48 core processor optimized for graphics. And yes, in http://www.raspberrypi.org/archives/1967 [raspberrypi.org] they note that there is already code that can use those processors for graphics. I have little doubt that someone looking at the code can port one of the gpu processing libraries to make use of these processors for other numerically intensiv

  • Supercomputer as in ' Super Structure' not as in ' Super Man'
  • ....I can't even get one! I've been in the queue since before initial release and have yet to receive mine, and even got an email two weeks ago about further delays!

    Still a really great accomplishment though.

    • by ctid (449118)

      RS components have had serious delays. I cancelled my order with them and instead ordered from CPC [cpc.co.uk]. It took approximately 48 hours for RS to refund my money and almost exactly the same amount of time for CPC to deliver.

  • But unfortunately they would not sell more than one per customer - unless you purchased one from both RS and Farnell.
    I'd even bought a rack mount case to house the cluster :(

    N...

  • This story has something built out of Legos and Raspberry Pis, so it's definitely worthy of the Slashdot front page. But it could be better, like if they called the order in from their Nokia phone and paid for it using Bitcoins.
  • Quite seriously, I wondered about making a cluster of Pis to replace a desktop PC I have running in the loft. It really just runs some web servers, PHP, Mysql and a few other fiddly things. I wondered if I could potentially even dynamically boot up Pis to cover load (ie. spin up some extra web servers when load increases). My big problem is the DB though - I mainly use Drupal, so don't have separate read and write DB handles, so I can't scale MySQL horizontally. Also, the ethernet isn't very fast, so the in

  • How do you build a supercomputer out of processor modules that cannot reliably communicate with each other? The ethernet connectivity of the Pi is based on a small module that attaches to the USB. I don't get it...
