Mini-ITX Clustering 348
NormalVisual writes "Add this cluster to the list of fun stuff you can do with those tiny little Mini-ITX motherboards. I especially like the bit about the peak 200W power dissipation. Look Ma, no fans!! You may now begin with the obligatory Beowulf comments...."
Floating point performance (Score:5, Interesting)
I decided against a mini-ITX cluster because the floating point performance (why else would you build a cluster?) of VIA CPUs is just abysmal.
Is there any reason why there are no P4 or AMD mini-ITX mobos around?
Re:Floating point performance (Score:4, Interesting)
Seriously, though... (Score:5, Interesting)
Has anyone tried stuffing several into a single 1U chassis? For a sort of cluster of clusters?
shuttle (Score:3, Interesting)
The only problem I've found so far is that they only come with nvidia onboard graphics, but that's what the AGP slot is for.
Re:Floating point performance (Score:1, Interesting)
This with Chess (Score:3, Interesting)
I built a fanless ITX system... (Score:1, Interesting)
Cool stuff ... (Score:5, Interesting)
Here's a picture [amd.co.at] of our first 4 boxes. The USB stick seen sticking out from one of the boxes is bootable and an excellent replacement for floppy disks...
FLASH... (Score:2, Interesting)
Maybe he should consider PXE instead.
Whilst not clustering... (Score:5, Interesting)
Re:Floating point performance (Score:2, Interesting)
Btw, you're wrong - there ARE P4-based mini-ITX mobos.
Re:FLASH... (Score:5, Interesting)
Actually, he's not. IBM Microdrives are not CF; they just have a CF form factor/interface to be compatible with handheld devices. They are hard drives.
Re:Seriously, though... (Score:5, Interesting)
Re:Floating point performance (Score:2, Interesting)
In fact, a Pentium M platform would be a perfect choice as long as the mobile Athlon mobos are impossible to find.
Does anyone have a link?
Re:Inexpensive for testing purposes, (Score:5, Interesting)
Sounds Fun (Score:5, Interesting)
I think all of these could be solved at once. What if someone built a low-power, low-noise, low-cost computer, good enough for running light office applications? I don't mean OpenOffice, but rather lightweight programs that implement the functionality people use _without_ the bloat. My 486 handles email just fine, and WYSIWYG word processors were once satisfied with a first-generation Pentium (and even those were already bloated).
Current PDAs have more than enough processing power to handle those tasks, and I've noticed that companies like gumstix [gumstix.org] build and sell devices almost like what I have in mind (the gumstix don't seem to have display connectors, though). Hey, these machines could actually be portable and have really decent battery life (more than a full working day); that would be a killer!
Am I just daydreaming here or are others with me? Maybe you know of devices that do this job? Someone recommended Sharp's Zaurus, which is excellent, but still rather more expensive than what I have in mind.
Re:Floating point performance (Score:2, Interesting)
Re:Inexpensive for testing purposes, (Score:2, Interesting)
Why is it that most people think that one 4GHz system is just as fast as two 2GHz systems? This is the fallacy that never fails to irritate me. The fact is that for a lot of things, the number of machines matters. It's a pipeline, and a CPU can only do one thing at a time. For many applications, having multiple slower CPUs will give you faster response time than a single fast CPU. Of course, most people here don't get that, so just give it up when trying to talk to the PHB about it.
Re:Imagine.. (Score:2, Interesting)
Would it be possible to set up a clusta of these in a stretch Escalade? If so, how much would it cost and can I get some iced out (real diamonds, not no zircon encrusted shiat) 1U or smaller cases for the nodes in the clusta? Anybody willing to set something up for me. I gotst cash fo it.
A beowulf cluster of FreeBSD machines? (Score:2, Interesting)
Re:Floating point performance (Score:5, Interesting)
Older C3 cores run the FPU at half the clock rate. If you get the fanless 600 MHz EPIA motherboard, the FPU will be running at 300 MHz.
The newer Nehemiah-core C3 chips run the FPU at full clock speed, and any C3 with a core newer than Nehemiah should do the same.
He used the VIA EPIA V8000A motherboard with an Eden core CPU. From what I found on google (here [hardwareirc.com]), the Eden core does run the FPU at full clock speed.
In any event, he said the cluster has more processing power than a four-P4 SMP system while drawing less power, and it will be quieter and more reliable. I'd like to see actual benchmarks, but it seems plausible enough.
I read about a cluster of PocketPCs, and that didn't make practical sense. It was just a fun project.
steveha
Re:Inexpensive for testing purposes, (Score:3, Interesting)
That said, the only time a cluster of servers will do better than a fast single node is when the task divides well across the cluster. Great for clustered webservers, even distributed databases (in fact most server processes), but pretty damn useless if you're trying to do interactive work, or calculate something which *doesn't* divide well. Anything with time-dependent processing (i.e. you need the results of the last step to calculate the current one) will run no faster than your fastest single node, minus some overhead...
This doesn't dispute your point of course, but I think the way you phrased it overstates the case for the usefulness of the system.
Simon
Re:Floating point performance (Score:4, Interesting)
Check out Theo de Raadt's little benchmark:
http://marc.theaimsgroup.com/?l=openbsd-misc&m=10
Re:Floating point performance (Score:5, Interesting)
But at a significantly higher development and debugging cost. Why go for an integer adaptation if a P4 can do four FP operations per clock using SSE2? I have tested my 2.4 GHz P4 at 6 gigaflops in a practical application doing matrix inversion. The theoretical maximum for my machine would be 9.6 Gflops. If you RTFA, you'll see they mention 3.6 Gflops for their cluster, about 60% of my single-processor system. I see no point at all in building that cluster.
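For what it's worth, the SSE side of that looks roughly like this in C with intrinsics (a minimal saxpy-style sketch, not the matrix-inversion code itself; the function name and the 16-byte alignment assumption are mine):

#include <xmmintrin.h>  /* SSE single-precision intrinsics */

/* y[i] += a * x[i], four floats per iteration.
   Assumes n is a multiple of 4 and x, y are 16-byte aligned. */
void saxpy_sse(float a, const float *x, float *y, int n)
{
    __m128 va = _mm_set1_ps(a);
    int i;
    for (i = 0; i < n; i += 4) {
        __m128 vx = _mm_load_ps(&x[i]);
        __m128 vy = _mm_load_ps(&y[i]);
        /* one packed mul + one packed add = 8 single-precision FLOPs */
        vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));
        _mm_store_ps(&y[i], vy);
    }
}

Compile with something like gcc -O2 -msse; a decent compiler can also emit this kind of code from plain C loops.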
Re:Floating point performance (Score:1, Interesting)
I'm interested. Do you have to use assembly to get this, or can you plunk down some C code that reaches this?
HA-Cluster on Mini-itx boards (Score:2, Interesting)
It was used for demonstration, but the mini-itx machines are still used quite a bit for testing etc.
Re:Floating point performance (Score:3, Interesting)
Mars is not made any closer to Earth by the revelation that Alpha Centauri is really far away...
This is why you might need the FP performance. I was answering a totally different question -- what you would do without good floating point performance.
Thank you, thank you.
Would you please demonstrate how I can rebuild a project of 3000+ files, modified by 100+ developers (ccache helps, but still)? Or compress a 32 GB database dump? Granted, these tasks are nothing compared with, say, protein folding, but they are still computationally expensive.
Why this particular set of software / booting? (Score:5, Interesting)
I've always wondered; why not PXE boot something like this? Set your node controller to also do DHCP and you're set.
While you're at it, use the CL version for the controller, which has two network cards, and build a NATing firewall into the node controller too. Then you have a plug-in appliance that doesn't interfere with your network topology at all. PXE boot it and the motherboards will only need RAM.
The board he used is available for $99 with the processor. A 256 MB stick of RAM is probably around $20.
The best price froogle would give me on the drives he's using is $60, and they're prone to wear and tear.
Add in the $10 CF-IDE adapter and the drive is 60% of the cost of the motherboard itself...
Hell, if you don't want the network bogged down with a bunch of PXE-booting nodes all the time, just get cheap CD drives and put dyne:bolic [dynebolic.org] on them, which does automagic clustering...
Personally, if I were to do it, I'd set dynebolic to PXE boot, get a huge stack of motherboards and RAM, and do it that way. Then adding/changing nodes is relatively simple... IIRC, they're even factory set to try PXE booting if no IDE devices are found...
The only other change I would make would be to ditch the 16-port switch... move to 4-ports, connect those to a 4-port with gigabit uplink, and connect that to a gigabit switch. Of course at this point I'm talking about really scaling the cluster up, to a few hundred nodes or so. At that point I'd stop using a mini-ITX board for my node controller and go with a motherboard with a bit more juice behind it, dual procs, RAID 0/1, the whole shebang...
Now if only I had a couple grand burning a hole in my pocket... speaking of which:
motherboard: $100
RAM: $20
DC-DC converter: $30
CF adapter: $10
Microdrive: $60
Total: $220
Total PXE booter: $150
Savings: 30%
So, not counting the costs of cabinets, power rectifier/UPS, wiring, network gear, and labor, you can increase the size of your cluster by 30% for the same cost, just for setting up PXE boot...
Re:Inexpensive for testing purposes, (Score:3, Interesting)
Samba throws open a hell of a lot of threads (at least on my network of 200 people). A cluster with each node possessing an external network port would be able to split the threads across dedicated processors. Not too useful for me, but if someone was trying to serve a few thousand clients at a time, that would be useful.
TMYK
Re:Seriously, though... (Score:4, Interesting)
Power. (Score:3, Interesting)
Re:Floating point performance (Score:3, Interesting)
Point by point:
You did not get it. You are looking at the bit (and byte, and word) as a number. I suggest you look at it as a unit of information. With 64 bits you can only have 2^64 distinct possibilities. If you choose to treat them as numbers -- fine, you only have 2^64 distinct numbers.
OK, FYI: I do remember first-year discrete math, I have experience with the inner details of computer architectures, I understand two's complement representation, the IEEE floating point format, how ALUs handle integer operations, and so on, and I have read Shannon's information theory paper. What you are saying doesn't change the reality of the situation, though, because you are looking at numbers without regard for the staggering dynamic range in real-life scientific computations.
You may split the 64 bits to use some of them to represent the mantissa and some as the exponent. Or -- use all of them to represent the integer number of the smallest units in your application's domain.
Yes, and both approaches have very different merits. For most numerically intensive programs there are very specific requirements for numerical precision. All numerical programs are necessarily approximations to infinite-precision arithmetic, and the usual constraints are: (1) the smallest quantity of interest must be representable, (2) results must carry a certain number of significant digits, and (3) the representation must cover the application's full dynamic range.
Now if you examine fixed point arithmetic, you will find that the first criterion is met, but the second and third are not met nearly as well as they are by FP with the same number of bits.
The nature of scientific operations is such that it makes sense to think about numerical calculations in terms of significant digits, due to the nature of error propagation in numerical arithmetic.
The second method actually gives you better precision in a controllable (by you) fashion. If the difference between the smallest and the biggest quantity of those minimal units in your application exceeds 64 orders of binary magnitude, then 64 bits is not enough for you -- regardless of whether you use floating or fixed point. You either lose precision (FP) or overflow (int).
It most certainly does not give better precision for the same number of bits used. Think of FP as a lossy compression algorithm. It allows the use of orders of magnitude fewer bits because it alters the density distribution of the representable numbers to meet the above specifications.
Also, you should note that many applications do not have a "basic unit". For instance, what is the "basic unit" of length? What if your "basic unit" of length is of a radically different exponent than your "basic unit" of energy or "basic unit" of time? In physics applications these basic units are near infinitesimal -- we're talking 10^-51 or smaller! Then add that astrophysics simulations tend to work on scales ten orders of magnitude greater than 1, and you're talking about a dynamic range that is clearly ridiculous!
The reason to use FP may be that it is more convenient to think in terms of standard units, rather than the minimal units of the application (its precision). Also, many CPUs have special features allowing them to do FP computations really quickly. But it is possible to go without them.
The problem here is that you are thinking in terms of absolute error. In most cases it is the *relative* error that is important, not the absolute. Because of the exponential notation, relative error is minimized for any given number of bits used to represent numbers.
Another issue that you fail to mention is that integer overflow is ridiculously easy to run into when using several multiplications in a row.
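A toy C illustration of that last point (the numbers are made up purely to show the failure mode):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    /* Multiply 10^6 by 10^6 four times: the 64-bit integer silently
       wraps around once the true value passes ~1.8e19, while the
       double keeps going, merely rounding to ~16 significant digits. */
    uint64_t i = 1000000;   /* one million "basic units" */
    double   d = 1.0e6;
    int k;
    for (k = 0; k < 4; k++) {
        i *= 1000000;       /* true values: 10^12, 10^18, 10^24, 10^30 */
        d *= 1.0e6;
    }
    printf("uint64_t: %" PRIu64 "\n", i);  /* wrapped mod 2^64 */
    printf("double:   %g\n", d);           /* 1e+30 */
    return 0;
}

The integer result is still "exact" modulo 2^64, but that is rarely what anyone wants.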
Re:Floating point performance (Score:3, Interesting)
No, I tested it with a random matrix, as in the sketch below:
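(This is a minimal illustration of the kind of harness, not the actual inversion benchmark -- the matrix size, the trivial multiply-add kernel, and the timing are placeholders.)

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N      1024
#define PASSES 100

static float m[N][N];

int main(void)
{
    int i, j, p;
    float acc = 0.0f;
    clock_t t0, t1;
    double secs, flops;

    /* fill the matrix with random values */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            m[i][j] = (float)rand() / (float)RAND_MAX;

    /* sweep the matrix with a multiply-add: 2 FLOPs per element */
    t0 = clock();
    for (p = 0; p < PASSES; p++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                acc += m[i][j] * 1.0001f;
    t1 = clock();

    secs  = (double)(t1 - t0) / CLOCKS_PER_SEC;
    flops = 2.0 * N * N * PASSES;
    /* print acc so the compiler can't optimize the loop away */
    printf("%.0f MFLOPS (acc=%g)\n", flops / secs / 1e6, acc);
    return 0;
}

With optimization and SSE enabled the compiler can vectorize the inner loop (for a reduction like this, gcc typically also wants -ffast-math); no assembly required.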
Re:Floating point performance (Score:3, Interesting)
This makes sense. With integers the density is uniform, which is an impediment in some cases, but a help in others. [Any attempt to quantify the number of cases in each group is silly and will reveal nothing but the attempter's personal bias. With my bias, I'll insist you are underestimating the number of cases where such a uniform distribution of density is useful and desirable.]
However, unless you carefully choose the basic unit, you don't have control over the precision distribution. If most of your computations involve quantities at the far edges of the range (you claimed 20 orders of (decimal?) magnitude), you are less precise than you may realize, and the (careful) use of integers may improve your results.
Of course, they all have basic units! Usually, it will depend on the application's desired precision.
Depends on the application. In yours, it is probably some fraction of a light year.
Who cares? Even if my program operates internally on units as horrible as, say, "pounds per square inch" (a.k.a. PSI) -- so be it. If 4.5 newtons is the basic unit of force I want, and 0.025 meters is as precise as I want the length to be -- fine.
Wait, we started with 20 orders of magnitude. Is it 30 now? Fine, 128-bit integers will be able to store that. But I don't believe tasks where such wide-ranging amounts of the same thing occur are commonplace (my bias?). Whether you are using floating or fixed point, you are not going to handle that easily -- you risk losing precision dramatically, or overflowing. Either way, it is probably better to consider modifying the algorithm.
Also, modern processors (at least Intel's) "cheat". Their FPU's internal precision is 80 bits by default (64 significant bits) -- if I'm reading ``icc -help'' output correctly. So they can "promote" the numbers to higher precision when applying precision-losing operations. So floating point might win. :-)
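You can see those extra bits from C, by the way (a quick sketch; the values in the comments are what x86 reports, other platforms differ):

#include <stdio.h>
#include <float.h>

int main(void)
{
    /* On x86, long double maps to the 80-bit x87 format:
       64 mantissa bits versus 53 for a plain 64-bit double. */
    printf("double:      %d mantissa bits, epsilon %g\n",
           DBL_MANT_DIG, DBL_EPSILON);
    printf("long double: %d mantissa bits, epsilon %Lg\n",
           LDBL_MANT_DIG, LDBL_EPSILON);
    return 0;
}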
Very valid point. However, sometimes (often?), by carefully picking the basic unit and the number of bits, it is possible to avoid all computational imprecision simply by having more bits left at your disposal, whereas blind use of floating point will mask it and further compound all the other sources of error (measurements, estimates, &c.).
You don't need to think about it, because 1 is as close as you can get to zero with integers...
No, it just has to match the smallest quantity the application reasonably needs -- something it'll never need half of. And I urge you to pick such units carefully even if you stick with floating point, because otherwise even the smallest number in the floating point system might not be small enough at some point, and you will waste a few teracents of taxpayers' money :-)
That being said, I don't think anyone but us is reading this anymore, so it's hard to justify continuing the thread. Thanks for your input!