Education Hardware Hacking Python Supercomputing United Kingdom Build

University Team Builds Lego and Raspberry Pi Cluster

hypnosec writes about a neat little hack using Lego, Raspberry Pis, and Scratch to construct a "supercomputer." From the article: "A team of computational engineers over at the University of Southampton led by Professor Simon Cox have built a supercomputer using Raspberry Pi and Lego. The supercomputer is comprised of 64 processors, 1TB of storage (16GB SD cards in each of the Raspberry Pis) and can be powered on using just a single 13-amp mains socket. MPI is used for communications between the nodes through the ethernet port. The team managed to build the core of the supercomputer for under £2500. Named 'Iridis-Pi' after University of Southampton's supercomputer Iridis, the supercomputer runs software that was built using Python and Scratch. Professor Cox used the free plug-in 'Python Tools for Visual Studio' to develop code for the Raspberry Pi." Lots of pictures of the thing, and a howto on making your own.

  • by Sparticus789 ( 2625955 ) on Wednesday September 12, 2012 @10:57AM (#41312529) Journal

    Gussy it up however you want, Trebek. What matters is does it work? Will the Raspberry Pi supercomputer calculate large prime numbers? Because I've ordered devices like that before - wasted a pretty penny, I don't mind telling you. And if the Raspberry Pi supercomputer works, I'll order a dozen!

    • Re:Want (Score:5, Informative)

      by the_humeister ( 922869 ) on Wednesday September 12, 2012 @12:12PM (#41313285)

      Unfortunately, the term "supercomputer" isn't really being used properly. They built a cluster of computers, sure. But "supercomputer"??? Hardly. The Raspberry Pi uses a processor based on ARM v6. Lemme give a single-threaded rendering comparison (Povray 3.6 running the benchmark scene [povray.org] (here's what the benchmark image output looks like [povray.org]) with my old HTC Aria, which uses a Qualcomm MSM 7227 processor and has similar processor specs as the Raspberry Pi (ARM v6 + VFP2 floating point hardware):

      HTC Aria (MSM 7227 @ 0.6 GHz) *
          Debian 6.0(armel), gcc 4.4, -mfloat-abi=softfp -mthumb
          Parse Time: 0 hours 1 minutes 3 seconds (63 seconds)
          Photon Time: 0 hours 53 minutes 49 seconds (3229 seconds)
          Render Time: 57 hours 31 minutes 41 seconds (207101 seconds)
          Total Time: 58 hours 26 minutes 33 seconds (210393 seconds)

      For comparison, here's a faster ARM processor from my Samsung Galaxy S II:

      Exynos 4210 @ 1.2 GHz (ARM Cortex A9),
          Debian 7.0(armhf), gcc 4.6, -mcpu=cortex-a9 -mhard-float -mthumb -mfpu=vfpv3 -ffast-math -funsafe-math-optimizations
          Parse Time: 0 hours 0 minutes 4 seconds (4 seconds)
          Photon Time: 0 hours 1 minutes 33 seconds (93 seconds)
          Render Time: 1 hours 26 minutes 34 seconds (5194 seconds)
          Total Time: 1 hours 28 minutes 11 seconds (5291 seconds)

      And here's from an Intel Core i5 2400s @ 2.5 GHz:

      Core i5 2400s @ 2.5 GHz, Ubuntu 12.04, gcc 4.6, -march=corei7-avx
      Total Scene Processing Times
          Parse Time: 0 hours 0 minutes 1 seconds (1 seconds)
          Photon Time: 0 hours 0 minutes 14 seconds (14 seconds)
          Render Time: 0 hours 10 minutes 12 seconds (612 seconds)
          Total Time: 0 hours 10 minutes 27 seconds (627 seconds)

      The ARM v6 processor took more than 2 days to render something that takes 10 minutes on a Core i5. So, "supercomputer" this cluster is not.

      * You may say, "Hey, this test is running using soft-float! If you used hard float, it'd be faster!" Well, you would be right that it would render faster under hard float, but this processor still wouldn't finish rendering in less than a day, let alone come anywhere close to Core i5 or Cortex A9.

      • by the_humeister ( 922869 ) on Wednesday September 12, 2012 @12:21PM (#41313359)

        Forgot to add the following:

        The image rendered is 384 x 384 pixels. MSM 7227 results are 0.70 pps (pixels per second) and 1.17 pps/GHz. Raspberry Pi is running at 700 MHz, so it should theoretically get 0.82 pps. It's possible (and fairly easy) to split up the rendering among all the CPUs in this cluster with some custom scripting, so this benchmark image could theoretically render at 52.42 pps. That Core i5 2400s I mentioned above renders at 235.18 pps!
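For anyone who wants to redo the pixels-per-second arithmetic above, here is a minimal sketch. It uses only the total times posted in this thread and assumes, as the comment does, that throughput scales linearly with clock speed and that 64 nodes split the image with no overhead. The variable names are just for illustration.

```python
# Figures quoted in the thread: the benchmark image is 384 x 384 pixels and
# each time below is the "Total Time" line of a posted POV-Ray result.
PIXELS = 384 * 384


def pixels_per_second(total_seconds: float) -> float:
    """Average throughput over the whole run (parse + photon + render)."""
    return PIXELS / total_seconds


msm7227_pps = pixels_per_second(210393)       # HTC Aria, MSM 7227 @ 0.6 GHz
msm7227_pps_per_ghz = msm7227_pps / 0.6

# Assumption: throughput scales linearly with clock speed.
pi_pps_estimate = msm7227_pps_per_ghz * 0.7   # Raspberry Pi @ 700 MHz

# Assumption: 64 nodes share the image with zero communication overhead.
cluster_pps_estimate = pi_pps_estimate * 64

i5_pps = pixels_per_second(627)               # Core i5 2400s

print(f"MSM 7227:         {msm7227_pps:.2f} pps ({msm7227_pps_per_ghz:.2f} pps/GHz)")
print(f"Pi (estimate):    {pi_pps_estimate:.2f} pps")
print(f"64-node estimate: {cluster_pps_estimate:.2f} pps")
print(f"Core i5 2400s:    {i5_pps:.2f} pps")
```

Depending on where you round, the 64-node estimate lands a little above 52 pps, so the single i5 core is still roughly four and a half times faster than the idealised cluster on these numbers.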

      • Re:Want (Score:2, Interesting)

        by Sparticus789 ( 2625955 ) on Wednesday September 12, 2012 @12:29PM (#41313449) Journal

        There's a bit of apples and oranges comparison there. You are comparing single-core processors to a quad-core processor. Of course the i5 is going to be faster. It would be better to divide the performance of the i5 by 4, to represent the performance of a single core of the processor.

        There's also a cost comparison. Just the i5 processor is ~$200, not to mention the motherboard, RAM, etc. Let's just say you can build a computer with an i5 for about $800. That's the same price as 32 Raspberry Pis. So if you take the MSM 7227 processing time and divide by 32, you get ~1.8 hours. Not stellar, by any means. However, it is an interesting figure. There are also power requirements, cooling requirements, etc.

        Not saying that everyone should flock to SoC cluster computing, but the story is interesting nonetheless.

        And perhaps Celebrity Jeopardy was before your time.

        • Re:Want (Score:4, Informative)

          by the_humeister ( 922869 ) on Wednesday September 12, 2012 @12:32PM (#41313481)

          Povray 3.6 is single threaded, so all results are single threaded.

        • by dgatwood ( 11270 ) on Wednesday September 12, 2012 @11:26PM (#41320017) Homepage Journal

          You are comparing single-core processors to a quad-core processor. Of course the i5 is going to be faster. It would be better to divide the performance of the i5 by 4, to represent the performance of a single core of the processor.

          Based on those numbers, the quad-core i5 processor is approximately equal in performance to 335 Raspberry Pi cores (at this type of computation). Thus, even a single-core of the i5 would still be equivalent to almost 84 Raspberry Pi cores, and costs only about $600 even if you buy it pre-built as a laptop. By contrast, 335 Raspberry Pi machines would cost almost $12,000 (or more like $16k by the time you add power supplies, power cords, and flash cards). Even dividing by four, 84 of them (with power and flash) would cost about as much as a new Mac Pro with 12 cores. That, in turn, is equivalent to somewhere on the order of a Porsche Boxster worth of Raspberry Pi units as far as computing power goes. :-)
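The "335 cores" and "84 cores" figures above can be reproduced from the two total times posted earlier in the thread. This is only a sketch of that arithmetic: the $35 board price is an assumption (it is the price that roughly matches "almost $12,000"), and the divide-by-four step follows the comment even though another poster points out that POV-Ray 3.6 is single-threaded.

```python
MSM7227_TOTAL_S = 210393   # ARMv6 phone result posted above (soft-float build)
I5_TOTAL_S = 627           # Core i5 2400s result posted above
PI_BOARD_USD = 35          # assumption: bare board price, no PSU or SD card

# How many ARMv6 cores would be needed to match the i5 run time,
# assuming perfect scaling across cores.
pi_equivalents = MSM7227_TOTAL_S / I5_TOTAL_S

# The comment then divides by four to get a "per i5 core" figure.
per_core_equivalents = pi_equivalents / 4

# Cost of that many bare boards.
board_cost_usd = int(pi_equivalents) * PI_BOARD_USD

print(f"i5 run is worth about {pi_equivalents:.0f} ARMv6 cores")
print(f"Divided by four: about {per_core_equivalents:.0f} cores")
print(f"Bare boards alone: ${board_cost_usd}")
```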

          And they're also not remotely efficient in performance-per-watt. That Mac Pro takes about as much power as only a hundred Raspberry Pi boards, assuming a mythical 100% efficient power supply for the Pi, but would outperform a couple of thousand of them. So current Intel offerings are somewhere around 20x as efficient as a chip based on an eight-year-old ARM cell design. No surprise there.

          But that's really not the point. In performance-per-dollar or performance-per-watt, clusters of several-year-old CPUs will almost never be cost-effective. What makes them interesting is that they can be a cheap platform for letting folks test cluster-computing (distributed multiprocessing) apps experimentally without investing a huge amount of money. When testing how well an algorithm scales, assuming your tests don't tie up the hardware for such a long period of time that other folks in the class can't use it, it really doesn't matter how fast the total performance of the hardware is. What matters is the number of nodes. Thus, for supercomputer research, these sorts of devices are seriously cool.

        • Re:Want (Score:4, Informative)

          by petermgreen ( 876956 ) <plugwash.p10link@net> on Thursday September 13, 2012 @06:26AM (#41321685) Homepage

          The GP's figures are off. He is using a horrible compiler setup: not only is he using the softfloat calling convention, he is also using -mthumb, which AIUI will prevent the code from making direct use of the hardware FPU at all on armv6 (and I suspect he was using Debian's version of libc, preventing indirect use of the hardware FPU through libc routines). According to hexxeh, the povray benchmark gives the following results under Raspbian on a Pi.

          Total Scene Processing Times
          Parse Time: 0 hours 0 minutes 16 seconds (16 seconds)
          Photon Time: 0 hours 5 minutes 57 seconds (357 seconds)
          Render Time: 6 hours 13 minutes 57 seconds (22437 seconds)
          Total Time: 6 hours 20 minutes 10 seconds (22810 seconds)

          http://www.raspberrypi.org/phpBB3/viewtopic.php?f=9&t=4256&start=175 [raspberrypi.org]

          Your price figures are off too. An i5 based compute node can be built for more like $500

          Similarly, the real price of a Pi node is quite a bit more than $25. Firstly, the Pi you can actually buy and would want for this task (clustering needs networking support) has a base price of $35, not $25. Secondly, that price excludes things like the power supply, the SD card, the network cable and the mounting hardware. The real cost of a Pi node is probably more like $50.

          So the Pi costs about 10 times less per node than the i5.

          My overall conclusion is that if compute power per dollar is your goal, then a smaller number of i5s is a much better bet than a larger number of Pis.
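A rough sketch of that compute-per-dollar comparison, using the hard-float Raspbian time quoted above, the i5 time posted earlier in the thread, and the $50 and $500 node prices from this comment. The function name is made up for the example, and it assumes render throughput is the only thing being bought.

```python
PI_TOTAL_S = 22810    # Raspbian hard-float result quoted above
I5_TOTAL_S = 627      # Core i5 2400s result posted earlier in the thread
PI_NODE_USD = 50      # estimated real cost of a Pi node (from this comment)
I5_NODE_USD = 500     # estimated cost of an i5 compute node (from this comment)


def renders_per_day_per_dollar(total_seconds: float, node_usd: float) -> float:
    """Benchmark renders per day that one dollar of hardware buys."""
    return (86400 / total_seconds) / node_usd


speed_ratio = PI_TOTAL_S / I5_TOTAL_S     # how much slower one Pi is
price_ratio = I5_NODE_USD / PI_NODE_USD   # how much cheaper one Pi node is
value_ratio = (renders_per_day_per_dollar(I5_TOTAL_S, I5_NODE_USD)
               / renders_per_day_per_dollar(PI_TOTAL_S, PI_NODE_USD))

print(f"Pi is {speed_ratio:.1f}x slower but {price_ratio:.0f}x cheaper per node")
print(f"i5 delivers about {value_ratio:.1f}x more rendering per dollar")
```

On these inputs the Pi is about 36 times slower and 10 times cheaper, so the i5 comes out somewhere between three and four times better per dollar, before power and interconnect are considered.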

      • Re:Want (Score:5, Informative)

        by CastrTroy ( 595695 ) on Thursday September 13, 2012 @05:43AM (#41321515)
        Actually, I just ran a test, because I was a little amazed that the ARM v6 was so much slower than the A9. Here are my numbers:

            Parse Time: 0 hours 0 minutes 14 seconds (14 seconds)
            Photon Time: 0 hours 5 minutes 43 seconds (343 seconds)
            Render Time: 5 hours 58 minutes 53 seconds (21533 seconds)
            Total Time: 6 hours 4 minutes 50 seconds (21890 seconds)

        While the Raspberry Pi wasn't faster than the A9 (didn't expect it to be), it was way faster than the ARM v6 you tested on. Most likely due to the fact that it uses hard float.
      • by petermgreen ( 876956 ) <plugwash.p10link@net> on Thursday September 13, 2012 @05:56AM (#41321569) Homepage

        You may say, "Hey, this test is running using soft-float! If you used hard float, it'd be faster!"

        Massively faster

        http://www.raspberrypi.org/phpBB3/viewtopic.php?f=9&t=4256&start=175 [raspberrypi.org]

  • by BooMonster ( 110656 ) on Wednesday September 12, 2012 @10:58AM (#41312539)

    One university managed to get a hold of 64 Raspberry PI units.

  • by filmorris ( 2466940 ) on Wednesday September 12, 2012 @10:59AM (#41312551)
    They should have built a Beowulf cluster. The regular one is such a cliché.
  • by Anonymous Coward on Wednesday September 12, 2012 @10:59AM (#41312553)

    So a cluster of 64 Pi boards doesn't exceed ~3 kilowatts... Why would you expect it to, given that each board is supposed to run from a 5V supply at 1A (5W * 64 = 320W)?

  • by colin_faber ( 1083673 ) on Wednesday September 12, 2012 @10:59AM (#41312555)
    Sorry, but this doesn't even crack the top 10,000 in machine performance; not exactly a supercomputer. A cluster, yes. Supercomputer, HPC machine, etc., no.
  • by thammoud ( 193905 ) on Wednesday September 12, 2012 @11:01AM (#41312571)

    Can we now retire "Beowulf cluster" jokes?

  • by Plammox ( 717738 ) on Wednesday September 12, 2012 @11:01AM (#41312581)
    Aaaargh...imagine a.....in the Soviet Union....***carrier lost***
  • by CajunArson ( 465943 ) on Wednesday September 12, 2012 @11:02AM (#41312597) Journal

    Calling this thing a cluster.. fine.
    Calling it interesting for students to learn about how clusters work... fine.
    Calling it a supercomputer? Maybe if the University of Southampton got sucked into a time vortex to the early 1990's, and even then while the raw theoretical number crunching capability of the RPis would be impressive, the lackluster I/O and interconnects would mean that even supercomputers of that time would still win on many common workloads.

  • by Bovius ( 1243040 ) on Wednesday September 12, 2012 @11:03AM (#41312607)

    Cluster of Raspberry Pis = Bramble. Slashdot has been so drooly over every nitpicky update about these, I thought everyone here would know that by now.

  • by gblackwo ( 1087063 ) on Wednesday September 12, 2012 @11:04AM (#41312611) Homepage
    Whenever I see "professional" projects like this use legos- I have mixed feelings. Here is another example, a lab using legos for automation [hackaday.com]

    I love to see legos doing advanced things, but for a chassis? I feel like people can be very smart, but sometimes afraid to learn how to build something with their hands. The lab example I posted above is at Cambridge University. Cambridge has a very competent engineering department, so why not reach out to them? It could have made for an excellent project for some engineering students.

    I'm reminded of the very cited researcher who reinvented some calculus instead of simply reaching out to someone in another department for help [slashdot.org]
  • by silas_moeckel ( 234313 ) <silas@@@dsminc-corp...com> on Wednesday September 12, 2012 @11:05AM (#41312643) Homepage

    I get 64 cores, a hell of a lot more memory, and storage in a single quad proc server. Does this make every new VM or DB server I buy a supercomputer? It's not even drawing as much power as this stack. Maybe they're planning on using their undocumented GPUs? I can throw in a couple of those as well and still trounce this setup. Am I missing something? Besides the putting them together with legos with his, I assume, son.

    • Re:Supercomputer? (Score:4, Interesting)

      by Kupfernigk ( 1190345 ) on Wednesday September 12, 2012 @11:26AM (#41312881)
      Yes, you are missing something (though I have slight reservations about the 16 cores to a die CPUs you claim to be using). There's this thing called education...your large server running loads of VMs is not going to be nearly as useful or informative at getting the ideas across as a rig like this. There is a big difference between working with virtual networks and seeing the hardware of a real network, as well as being able to program the thing with "small" languages without monster frameworks just to make anything happen.

      However, you do win a "Miserable git" award for being unpleasant about Prof. Cox.

      • by FilmedInNoir ( 1392323 ) on Wednesday September 12, 2012 @11:45AM (#41313067)
        Can I get nominated for that award? I'm creeped out by his Mini-Me son.
        All parents that dress their children as tiny doll versions of themselves actually.
      • by silas_moeckel ( 234313 ) <silas@@@dsminc-corp...com> on Wednesday September 12, 2012 @12:52PM (#41313691) Homepage

        You mean you have reservations about stock shipping AMD server procs? If you want education, you want to be able to do things like artificially inflate the latency of the linking network; that's easy to do on VMs. Test the effectiveness of different storage methods vs the type of workload. Look at nodes with different processing capabilities. Honestly, I find it amazingly hard to fathom that it took a whole group of people to stack 64 SBCs, load them with an OS, and connect them up to a switch. This is a morning's work for an intern.

        Thanks for the miserable git; it's been a while since somebody called me names on the internet. I'll try and be offended. I did not talk about the prof outside of mentioning that it was cool that he built it out of legos with his son.

  • by slim ( 1652 ) <john@hartnupBLUE.net minus berry> on Wednesday September 12, 2012 @11:08AM (#41312661) Homepage

    I'm a big fan of the RP project. But I'm a bit bored of seeing news items in which someone does something with this Linux box, which obviously a Linux box can do. Raspberry Pi compiles C! Raspberry Pi controls a robot! Raspberry Pi runs MAME! Well of course it does, it's a little PC, and that's what PCs can do.

  • by puddingebola ( 2036796 ) on Wednesday September 12, 2012 @11:18AM (#41312793) Journal
    I would like to recommend the red and white suited astronaut lego people to maintain the server, or to work as sysadmins. They seem very dependable. If not them, then maybe the Lego people from the 70s that didn't have the smiley face painted on them. They seem more analytical and inclined to this type of work. Anybody remember them?
  • by MetalliQaZ ( 539913 ) on Wednesday September 12, 2012 @11:21AM (#41312829)

    Wouldn't a dual Xeon server be able to easily out-muscle that "supercomputer"?

  • by SuperBanana ( 662181 ) on Wednesday September 12, 2012 @11:30AM (#41312921)

    64 SoC 700MHz cores connected via universal serial bus ethernet controllers, using flash memory that can at best pull about 10-30MB/sec read, and maybe 10MB/sec write if you're lucky.

    If this is an example of "applying high-performance computing and data handling techniques to tackle complex engineering and scientific challenges", this is a massive fail.

    $4,000 buys you at retail (not with any sort of educational discount) a 1U machine (i.e., a form factor of about one quarter or less) with 12 Xeon 2GHz cores connected by a bus that is orders of magnitude faster. 20-40MB of L3 cache between the processors. 16GB of RAM (32GB if you're willing to spend another $600 or so), and TWO terabytes (wow, two!) of storage that will run at well over 100MB/sec sequential read. And guess what? It'll run on "one mains socket" too. In fact, because you don't have 64 separate DC linear regulators, it might even be *more* efficient.

    Spend $7k and you can get 64 xeon cores on four chips...still in 1U...

    • by SuperBanana ( 662181 ) on Wednesday September 12, 2012 @11:43AM (#41313045)
      I just did the math. The Pi community supposedly recommends a minimum of 1A@5V if you intend on using any peripherals, including ethernet. 700mA is the minimum draw with *nothing* connected. 5W x 64 = 320W. That's quite close to the max capacity of the power supply for the dual-socket machine I mentioned. The E5-2620 processors have a max TDP of 95W each. Now, that doesn't count the auxiliaries - but there's still a 120W difference between typical power usage for the Pi, and MAXIMUM power usage for the Xeons, and I haven't even counted the power loss from the AC-DC power supply against the Pi (the rackmount machine's supply is ~95% efficient.)
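The power arithmetic above is easy to check with a few lines. This sketch uses the 5 V / 1 A and 95 W TDP figures from the comment and the 13-amp socket from the article; the 230 V mains voltage is an assumption (nominal UK supply), and conversion losses are ignored on both sides.

```python
PI_COUNT = 64
PI_VOLTS, PI_AMPS = 5.0, 1.0           # recommended supply quoted in the comment
XEON_TDP_W, XEON_SOCKETS = 95.0, 2     # E5-2620 max TDP quoted in the comment
SOCKET_AMPS = 13.0                     # "single 13-amp mains socket" (article)
MAINS_VOLTS = 230.0                    # assumption: nominal UK mains voltage

pi_cluster_w = PI_COUNT * PI_VOLTS * PI_AMPS    # DC-side budget for 64 boards
xeon_cpus_w = XEON_TDP_W * XEON_SOCKETS         # CPUs only, no auxiliaries
socket_limit_w = MAINS_VOLTS * SOCKET_AMPS      # what one socket can supply

print(f"64 Pi boards:         {pi_cluster_w:.0f} W")
print(f"Two Xeon CPUs at TDP: {xeon_cpus_w:.0f} W")
print(f"Difference:           {pi_cluster_w - xeon_cpus_w:.0f} W")
print(f"One 13 A socket:      {socket_limit_w:.0f} W")
```

With exactly these inputs the gap between the Pi budget and the two CPUs comes out at 130 W, and either system uses only a small fraction of what a single socket can deliver.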
    • by slim ( 1652 ) <john@hartnupBLUE.net minus berry> on Wednesday September 12, 2012 @11:45AM (#41313065) Homepage

      Mmm, but it *is* a nice environment for *students* to experiment with the *principles* of parallel computing in a tactile manner.

      I began learning to code on an 8-bit 2MHz CPU, with 32KB of RAM. If I wrote an inefficient loop, I'd often notice the slowness without benchmarking. If I was careless with memory, my program would crash. On my quad core laptop today, I only notice issues like that if I benchmark or do deliberate load testing. So working on low-spec systems is instructive.

      Likewise, working with clusters of low-powered units on a slow comms bus is going to teach these students a lot about optimising parallel programs. They're going to have to deal with race conditions, memory ceilings, etc. which might not even show up on faster systems.

    • by M1FCJ ( 586251 ) on Wednesday September 12, 2012 @11:53AM (#41313129) Homepage

      So... you have a little server there. Good luck with using it for teaching a bunch of students about how scalable clustered software works, how to write the software, what the pitfalls are, and so on.

      Good luck running 64 separate VMs on your small server (not saying it's not impossible but I really wonder which one is faster to set up) and you won't be able to test any of the very different interconnects that easily.

      • Good luck running 64 separate VMs on your small server (not saying it's not impossible but I really wonder which one is faster to set up) and you won't be able to test any of the very different interconnects that easily.

        Very easy indeed, and almost certainly quicker/easier to set up than the physical way, either using something like Vagrant [vagrantup.com] or by rolling your own scripts to drive VirtualBox.

        However, I think it's instructive for students to do it the physical way first. By analogy: first understand LANs, then learn about VLANs.

      • by CastrTroy ( 595695 ) on Wednesday September 12, 2012 @01:46PM (#41314289)
        When I was in university, I took a parallel computing course and we used MPI, same as these guys. Back then, all the personal machines were single core. If we were lucky we could test the program out by remote logging into the quad processor SUN machine. Guess what? We were able to learn quite a bit just running 64 different processes on the same box, even with just a single processor core. It would have been nice to have a machine around with 64 actual cores on it to see how things worked once everything was truly running in parallel, but we were able to do quite a bit with just a single machine.
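To illustrate the "64 workers on one box" idea, here is a minimal scatter/compute/reduce sketch. It is not MPI: it swaps in Python's standard-library thread pool as a stand-in for 64 ranks, and the per-rank job (summing row indices of a 384-row image) is a hypothetical placeholder for real work such as rendering those rows.

```python
from concurrent.futures import ThreadPoolExecutor

WORKERS = 64   # one simulated "rank" per Pi in the cluster
ROWS = 384     # rows in the 384 x 384 benchmark image


def rows_for_rank(rank: int, workers: int = WORKERS, rows: int = ROWS) -> range:
    """Contiguous block of rows owned by one rank (block decomposition)."""
    base, extra = divmod(rows, workers)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def work(rank: int) -> int:
    """Hypothetical per-rank job: just sums the row indices it owns."""
    return sum(rows_for_rank(rank))


# "Scatter" the ranks across the pool and collect one partial result each.
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    partials = list(pool.map(work, range(WORKERS)))

# "Reduce" the partial results on the coordinating side.
total = sum(partials)
print(total == sum(range(ROWS)))
```

Threads in CPython will not give a real speed-up for CPU-bound work, which is rather the point of the comment: the decomposition and the reduce step can be learned and debugged on one machine before moving to real nodes.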
    • by Simon Brooke ( 45012 ) <stillyet@googlemail.com> on Wednesday September 12, 2012 @11:56AM (#41313155) Homepage Journal

      WOOOOOOOOOOOOOOSH!

      There is a whole lot of point missing going on here. Yes, you could build a faster computer cheaper using other hardware. But it wouldn't explain the concepts to children (and to first year CS students, which is pretty much the same thing) nearly so well. Throw together a heap of little itty-bitty boards, each of which individually is, as everyone knows, relatively low power, and knit them together with ordinary cat5 cable, and get out of the collection high compute performance, and you have something which will intrigue children|students and get them thinking about how it works. Show them an anonymous 1U box doing exactly the same job, and you won't get them thinking, because they can't immediately see and understand what it comprises and how it's put together. This is a teaching machine, not a practical machine. Its job is to teach students. It teaches students by being perspicuous.

      It's not (yet) a requirement for getting a Slashdot account to demonstrate that you have an IQ slightly south of that of a stick of used chewing gum, but some of you clearly haven't yet got that message.

      • by SuperBanana ( 662181 ) on Wednesday September 12, 2012 @01:11PM (#41313849)

        There is a whole lot of point missing going on here.

        You're the one missing the point here. I can fit in 1U what used to take an entire rack.

        When you can fit that kind of power into 1U, and given the massive leaps in computing power per core, traditional nodes-connected-by-networks clusters are applicable for far fewer people these days. What they should be teaching is proper multithreaded programming techniques.

        get out of the collection high compute performance

        Were you not paying attention when I said that 64 700MHz ARM nodes connected via USB (which requires enormous CPU overhead, on a processor with virtually no cache and slow buses, which means lots of out-of-cache memory access and context switching) with shitty, slow storage do not make "high compute performance"? That cluster probably struggles to match one single 6-core Xeon.

        It's not (yet) a requirement for getting a Slashdot account to demonstrate that you have an IQ slightly south of that of a stick of used chewing gum, but some of you clearly haven't yet got that message.

        http://yourlogicalfallacyis.com/ad-hominem

    • by rusty0101 ( 565565 ) on Thursday September 13, 2012 @01:23AM (#41320573) Homepage Journal

      Something that I don't think got much play in the article is that each of the 64 Pi boards has a SoC that, in addition to the general purpose processor, also includes a 48 core processor optimized for graphics. And yes, in http://www.raspberrypi.org/archives/1967 [raspberrypi.org] they note that there is already code that can use those processors for graphics. I have little doubt that someone looking at the code can port one of the GPU processing libraries to make use of these processors for other numerically intensive purposes.

      So don't forget to add in sufficient video processing to provide 3072 cores of processing equivalence to your rig. I suspect that you can figure out how to do that as you've already calculated what you feel is the equivalent general processing equivalent (or better) for your sample system. I'm not sure that the resulting system would still fit in a 1u case, but it might.

  • by SimplexBang ( 2685909 ) on Wednesday September 12, 2012 @11:41AM (#41313025)
    Supercomputer as in ' Super Structure' not as in ' Super Man'
  • ....I can't even get one! I've been in the queue since before initial release and still have yet to receive mine, and even got an email two weeks ago about further delays!

    Still a really great accomplishment though.

  • by polyp2000 ( 444682 ) on Wednesday September 12, 2012 @12:40PM (#41313563) Homepage Journal

    But unfortunately they would not sell more than one per customer - unless you purchased one from both RS and Farnell.
    I'd even bought a rack mount case to house the cluster :(

    N...

  • by csumpi ( 2258986 ) on Wednesday September 12, 2012 @12:44PM (#41313615)
    This story has built out of Legos, and Raspberry Pis, so it's definitely worthy for the slashdot front page. But it could be better, like they called the order in from their Nokia phone and paid for it using Bitcoins.
  • by coofercat ( 719737 ) on Thursday September 13, 2012 @08:34AM (#41322311) Homepage Journal

    Quite seriously, I wondered about making a cluster of Pis to replace a desktop PC I have running in the loft. It really just runs some web servers, PHP, MySQL and a few other fiddly things. I wondered if I could potentially even dynamically boot up Pis to cover load (i.e. spin up some extra web servers when load increases). My big problem is the DB though - I mainly use Drupal, so don't have separate read and write DB handles, so I can't scale MySQL horizontally. Also, the ethernet isn't very fast, so the interconnects probably wouldn't work very well either. I thought about maybe using Beaglebones or Pandaboards. Whilst they have faster CPUs, they still only have 100Mbit networking, so probably a little less than ideal. After working through all that, I don't suppose I'd save that much power (or space?) compared with the desktop.

  • by Douglas Goodall ( 992917 ) on Thursday September 13, 2012 @01:20PM (#41325577) Homepage
    How do you build a supercomputer out of processor modules that cannot reliably communicate with each other? The ethernet connectivity of the Pi is based on a small module that attaches via USB. I don't get it...
