RPiCluster: Another Raspberry Pi Cluster, With Neat Tricks 79
New submitter TheJish writes "The RPiCluster is a 33-node Beowulf cluster built using Raspberry Pis (RPis). The RPiCluster is a little side project I worked on over the last couple months as part of my dissertation work at Boise State University. I had need of a cluster to run a distributed simulator I've been developing. The RPiCluster is the result. I've written an informal document on why I built the RPiCluster, how it was built, and how it performs as compared to other platforms. I also put together a YouTube video of it running an MPI parallel program I created to demo the RGB LEDs installed on each node as part of the build. While there have certainly been larger RPi clusters put together recently, I figured the Slashdot community might be interested in this build as I believe it is a novel approach to the rack mounting and power management of RPis."
5 - Profit! (Score:5, Funny)
Re: (Score:3)
You could be driving around in one of these next week! (http://www.youtube.com/watch?v=cDoRmT0iRic)
Re: 5 - Profit! (Score:1)
6.) Move out of basement.
7.) Talk to a girl.
Re: (Score:2)
Re: (Score:1)
Since when does /. allow scam advertising within comments?
Hm... (Score:5, Funny)
A new Raspberry Pi cluster Fram Boise University, eh?
Re: (Score:3, Insightful)
acronym for F.R.A.M. + Boise = red + sour (Score:5, Interesting)
Raspberry.Pi
Architectural
Messaging
since he says in his PDF document that "My research is currently focused on developing a novel data sharing system for wireless sensor networks to facilitate in-network collaborative processing of sensor data. In the process of developing this system it became clear that perhaps the most expedient way to test many of the ideas was to create a distributed simulation rather than developing directly on the final target embedded hardware."
Re: (Score:2)
Funny, I first saw it as "honni suit, qui mal y pense," but looking it up, find it's "honi soit." Guess that 8th grade French book had a few mistakes in it, back in '60. But then I don't know French, just a few bits here and there that kinda stuck. Bonne chance, and all.
Re: (Score:2)
Re: (Score:2)
ah, thanks; I get the drift, but it's over my head [grin]
Re: (Score:2)
The Pele of Anal?
Different concept: HA Clustering (Score:5, Interesting)
http://blogs.linbit.com/p/406/raspberry-tau-cluster/ [linbit.com]
Obligatory Blake 7 Reference (Score:2)
Re: (Score:2)
"You pathetic fool. That isn't Orac! Look at it! It's just a box of flashing lights!"
Slow Pi (Score:2, Insightful)
Running the numbers from the paper says the $1000 x86 compute node took 3.85 seconds on a benchmark, where the RPI cluster took (456/32)=14.25 seconds and also cost about $1000. Thus, after porting the software, a 3.7 times slow down was achieved over traditional methods.
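The arithmetic above is easy to check; a few lines of Python using the figures quoted in the parent comment (3.85 s for the $1000 x86 node, 456 s for a single Pi, ideal linear scaling across 32 nodes):

```python
# Benchmark figures quoted in the parent comment.
x86_time = 3.85         # seconds: $1000 x86 node, one benchmark run
single_pi_time = 456.0  # seconds: one Raspberry Pi on the same benchmark
nodes = 32              # Pis doing useful work in the cluster

# Assume ideal linear scaling across the cluster:
cluster_time = single_pi_time / nodes   # 14.25 s
slowdown = cluster_time / x86_time      # ~3.7x slower than the x86 box

print(f"cluster time: {cluster_time:.2f} s, slowdown: {slowdown:.1f}x")
```

Note the 3.7x figure already assumes perfect scaling; any communication overhead makes the real gap wider.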
While there may be some gains (GPIO and such may be useful in this context) they didn't appear to be used here.
This looks like a fun project that got research money, but was not very useful for the goal the money was supposed to be spent on.
Re: Slow Pi (Score:1, Informative)
If the purpose was to make a fast computer, you may have a point. But the need for this project was to have a low-cost cluster to run massively parallel/distributed software. A single core, or a relatively low number of cores, may not give the solution you want. For example, if you have a fast algorithm that has to be run in order, with no parallelism, it will run fast on your $1000 x86. But then the only way to speed it up is a faster processor, so you're technology limited. If you derive a different algorithm that may be a bit slower but allows massive parallelism, then you can make the system faster by adding more hardware. This system is not about doing things fast, it's about seeing how things run on a cluster. If you used the x86 then you would get a wrong result faster.
Re: (Score:2)
If the purpose was to make a fast computer, you may have a point. But the need for this project was to have a low-cost cluster to run massively parallel/distributed software. A single core, or a relatively low number of cores, may not give the solution you want. For example, if you have a fast algorithm that has to be run in order, with no parallelism, it will run fast on your $1000 x86. But then the only way to speed it up is a faster processor, so you're technology limited. If you derive a different algorithm that may be a bit slower but allows massive parallelism, then you can make the system faster by adding more hardware. This system is not about doing things fast, it's about seeing how things run on a cluster. If you used the x86 then you would get a wrong result faster.
By another example... ah, fuck it: the benchmark is supposed to test exactly that. So on a parallel workload the x86 is faster than the Pi cluster, and on a single-threaded workload it would be ridiculously slower to use the Pis.
anyhow, I would wager that the point here is just to test the parallel algorithms on real hw - not to run them fast, but to prove that the basic ideas work.
Re: (Score:3)
anyhow, I would wager that the point here is just to test the parallel algorithms on real hw - not to run them fast, but to prove that the basic ideas work.
I guess the point is that building this cluster for accurate testing of the behavior of distributed algorithms was probably cheaper than building an accurate simulator to run on a desktop workstation would have been.
Re: (Score:3)
So you can make it faster by adding more hardware or... adding more hardware. Parallel and distributed are two very different things, and you cannot run a distributed anything on a single cluster; if you do, it would be properly named parallel. Anyways, the comparison is still valid - the RPi cluster failed to deliver; it was slower, just as expensive as the benchmark x86 machine, and probably 1000x as complex.
You're right in what you say about algorithms, but it only holds if you already have unused
Re: (Score:2)
>> So you can make it faster by adding more hardware or.... adding more hardware.
Gene Amdahl says different.
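Amdahl's law is the point being invoked here: if any fraction of the work is serial, adding hardware hits a hard ceiling. A quick sketch (the 10%-serial figure is an arbitrary illustration, not a number from the paper):

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Amdahl's law: speedup on n processors when only
    parallel_fraction of the program can be parallelized."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n)

# With 10% serial work, 32 nodes give well under 32x:
print(f"{amdahl_speedup(0.9, 32):.2f}x")   # ~7.80x
# ...and even unbounded hardware caps out at 1/serial = 10x:
print(f"{amdahl_speedup(0.9, 10**9):.2f}x")
```

So "add more hardware" only helps up to the limit set by the serial fraction, which is presumably the grandparent's objection.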
Re: (Score:2)
"processors with multiple cores are cheaper than multiple processors with one core."
And both are cheaper than one really, really fast core. You can only really go up to 4Ghz with off-the-shelf parts - any higher than that and you're on to exotic cooling systems involving liquified gasses of one type or another. The record is 8.8GHz, but that took liquid nitrogen.
Re: (Score:2)
You can only really go up to 4Ghz with off-the-shelf parts - any higher than that and you're on to exotic cooling systems involving liquified gasses of one type or another. The record is 8.8GHz, but that took liquid nitrogen.
Of course, just measuring GHz isn't everything. As that's an AMD chip, you could probably get similar single-threaded performance by overclocking a recent Intel chip to about 6.6GHz [pureoverclock.com] (consensus seems to be that in computationally intensive tasks, Sandy Bridge is about 25% faster than Bulldozer).
Re: (Score:2)
Re: (Score:2)
Parallel and distributed are two very different things, and you cannot run a distributed anything on a single cluster, if you do, it would be properly named parallel.
It's quite obvious that any distributed system is inherently parallel (unless you decide to do only synchronous message passing, which would be stupid). And if that cluster is comprised of isolated nodes passing messages over a network, then it's a distributed system - by definition.
Re: Slow Pi (Score:4, Informative)
Made for specific availability + project priority! (Score:5, Informative)
This lets him escape the externalities which might impinge on his getting his own work done, like the big bad Beowulf cluster not being up or available when he needs it, or it being prioritized for someone else's project (say, a professor who has tenure and more funding available). Those sorts of shenanigans would delay his work. So a 1/3rd-speed cluster that's always available for your own project is a helluva good deal at 1/32 the cost of the big bad Beowulf cluster, eh? At least I think so!
Re: (Score:2)
but the 32 Raspberry Pis are 3 times more expensive per unit of compute speed than the Onyx node he benchmarked against.
that's to say, the $1000 (8-thread) machine is about 3 times faster than all the Raspberry Pis combined! It's a vastly superior computing solution.
it has to be for proofing some supercomputing software, and for learning, more than for anything practical.
you can't even get the Pis at a price that would get you 32 of them for a thousand bucks, though. And add costs for cabling, power supplies, etc.
Comms and network testing needs hardware!!! (Score:5, Informative)
Especially considering that this system is going to be used for wireless communications protocols, the real hardware solution is IMHO the better way to go.
Re: (Score:2)
yeah, for that it makes sense, as a learning/testing tool as I said in other comments.
but you said that it's 1/3rd of the power of the Beowulf cluster for 1/32 of the price, and it just doesn't go that way (if it did, it would scale for supercomputing at a vastly cheaper price than the PC nodes). The cluster is 1/3rd of the power of a single PC, for a higher price than a single PC.
Re: (Score:2)
I'm sorry, but ... what? The locking and other interprocess overhead will not increase on a multi-core single-node solution, it will decrease. If your system can run lock-free on the multi-node solution, they can run lock-free on a multi-core solution. It's a fleet of processes talking to each other via TCP/IP either way (except on a single-node solution you have additional options like UNIX-domain sockets or named pipes).
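The parent's claim, that the same message-passing processes run unchanged on one node, can be sketched with a UNIX-domain socket pair from the Python standard library (purely an illustration; this is not the simulator's actual code):

```python
import socket

# Two endpoints of the same kind of byte-stream channel the
# distributed nodes would use over TCP; on a single machine a
# socketpair (UNIX-domain) behaves identically, just faster.
a, b = socket.socketpair()

a.sendall(b"sensor reading: 42")  # "node A" sends a message
msg = b.recv(1024)                # "node B" receives it
print(msg.decode())

a.close()
b.close()
```

From the application's point of view nothing changes but the transport, which is why multi-node vs. multi-core doesn't alter the locking story.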
The only way I could see it possibly being a win is if the system being simulated i
Re: (Score:2)
And how does a single node effortlessly simulate the data propagation delays that are inevitable in a distributed system? Do you have a solution that involves work less than the worth of $1000? (Well, I suppose building up the RPi cluster took some time as well..)
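For what it's worth, simulating a fixed propagation delay on a single node isn't much code either; a toy event-queue sketch (the function name and latency value are made up for illustration):

```python
import heapq

def simulate(messages, latency):
    """Deliver (send_time, payload) messages after a fixed
    propagation delay, in delivery-time order."""
    queue = [(t + latency, payload) for t, payload in messages]
    heapq.heapify(queue)
    delivered = []
    while queue:
        delivered.append(heapq.heappop(queue))  # earliest delivery first
    return delivered

# Two messages sent at t=0 and t=1, with 5 time units of latency:
print(simulate([(0, "hello"), (1, "world")], 5))
```

Of course, realistic jitter, loss, and contention models are where the real effort goes, which may be the parent's point.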
It would be a more general solution if such software was written, but I wouldn't say cheaper.
Re: (Score:1)
Re: (Score:2)
32 Pis at 800 mA per Pi is 25.6 A. Call it 30 A to give some margin for error. Not exactly exotic - should be doable for thirty quid or so.
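The current budget works out as claimed; a quick check (the 800 mA per-Pi draw is the figure quoted above, and 5 V is the Pi's supply voltage):

```python
pis = 32
amps_per_pi = 0.8           # worst-case draw quoted above, in amps
total = pis * amps_per_pi   # 25.6 A nominal
budget = 30                 # rounded up for safety margin

print(f"{total:.1f} A nominal; budget {budget} A "
      f"(~{budget * 5} W at 5 V)")
```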
I've read about servers that pack hundreds or thousands of ARM or Atom chips into one enclosure, giving great performance-per-watt for heavily threaded workloads. Mostly targeted at webservers.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
RPi is cheap. Now, scale this to a bunch of PandaBoards or Gumstix running in a suitcase. Voila: luggable supercomputer.
Re: (Score:2)
Scale to ODROID-U2. It only has a four week warranty, but if you use enough of them the presumably high failure rate might not impinge on operations. Delivered, it costs about the same as Pi, but it's a lot more machine. It has the same problems with proprietary chips, but they're the same problems after all, it's not like R-Pi doesn't have them.
Re: (Score:2)
(er, delivered, it costs four times as much as the Pi, but it has four cores, and a lot more of everything else too. so what I meant to say but didn't (in b4 correction) is that you get more for your money. The abysmally short warranty is why I don't own one already.)
Re: (Score:2)
Wait a bit. See: http://olimex.wordpress.com/tag/a20/ [wordpress.com] - when these become available, they'll be about 4x the speed of a pi for about twice the money. Plus the olimex boards have a lot more GPIOs and useful stuff like that. :)
Re: (Score:2)
Thanks for the heads-up! I will, in fact, wait. (I am getting an Ouya for the living room, but that's something else...)
It is the LEDs stupid (Score:2)
Rack mounting? (Score:5, Insightful)
Not to diminish your achievements which are otherwise quite cool, but this novel approach to rack mounting is anything but. Quite possibly the single most important feature of a rack is ease of component access. By tying all components together with PCB standoffs you basically can't remove a single RPi if there's ever a pressing need.
If anything you've shown a novel way of cramming things together without the use of a rack.
Re: (Score:2)
Re: (Score:2)
Granted, there's nothing much to remove from a Pi mounted like this other than the SD card.
The only time I'd imagine you'd tamper with a Pi is when it decides to die from the overclock.
Of course, but that's the point. Racks exist to allow you to take out components to swap. Often this is damage, sometimes this is upgrades, sometimes expansion.
Of note is that there are now several variants of the RPi, including 256MB and 512MB versions. So upgrading may be a logical choice too.
Report could be improved... (Score:1)
Neat project but really the report left me frustrated.
You start by comparing the price and features of the RPi to two other alternatives, e.g. the Onyx node.
Then you compare one RPi to one Onyx node. But moving on, you never do a price or performance comparison of the 32-RPi cluster against the same Onyx node, which would be the interesting thing.
Figure 5 shows something you could possibly relate to the earlier information, but only graphically. You don't state the actual numbers!
Moving on "As discussed earlier, each RP
Re: (Score:2)
the 4-to-8 improvement is probably because it only has 4 real cores.
However, I suspect he doesn't later compare the single Onyx node against his whole cluster because it would show the Pi cluster to be a pointless endeavor (it's only useful for learning parallel computing, not for executing it). His 32-Pi cluster is more expensive than a $1000 node (which certainly isn't the cheapest way to get a 3 GHz quad-core PC).
should this perhaps be at RPI instead? jk! (Score:4, Insightful)
So the summary of the informal document is that it's cheaper to build a 32-node Rasp.-Pi cluster than to purchase even a single node of the 32-node Beowulf cluster that may or may not be available to you. And if you want to get your Ph.D. work done, I must agree that it sounds better to not be dependent upon the whims and follies of others' benevolence in having external hardware clusters available for your use. Bravo, Joshua Kiepert, I like your "informal writeup". Best wishes on your work!
Re: (Score:1)
welcome to the home of jealous haters (Score:5, Interesting)
i wish i had done this, therefore you suck.
Very impressive (Score:2)
Re: (Score:3)
Yes, as he threw a real power supply at it instead of using the crappiest USB adapter he could find.
overcomes 'others have priority' (Score:1)
The big problem in Ph.D. studies is your own review a few weeks before submittal time, when you realize the things you should have done. At that point your own always-available 'cluster' is a beyond-price jewel of an asset to you. Awaiting priority on faculty assets could cost you your degree.
Good luck to you. Good thinking on your priorities.
More measures & better data representation (Score:2)
I would have preferred graphs with lines, a logarithmic scale, and comparison with the theoretically attainable performance.
Moreover, some more popular benchmarks should be run: HPL, the NERSC Trinity benchmarks, or even real applications like Quantum Espresso, which has some standard benchmark tests.
Power consumption should be measured while running each benchmark, as it may vary depending on the type of application (CPU bound, memory bound).
Nice project on the electrical and electronic engineering part, could bene