Hardware

Building an 1100Mhz "SuperStation"

Anonymous Coward writes "There is an interesting article on building a dual Celeron 550 (overclocked 366) computer by David Green; he goes a bit into the theory of SMP computers, what components he chose, and shows some benchmark results (under Linux) for the system. His computer could really crank through RC5 blocks..." We hardware tinkerers love this sort of stuff; the rest of you can feel free to ignore it. (AboutLinux.com is where this cool scoop came from, BTW.)
This discussion has been archived. No new comments can be posted.


  • by Rendus ( 2430 )
    Exactly. I'm running Linux on this here 486, IPMasqing for 3 other computers to my cable modem and serving POP3, Sendmail and Apache. Normal load average? 0.06 or so.
  • What is an RC5 key? What's it for? What does it do? Good or bad? Any other info you can give to help out a newbie would be greatly appreciated.

    SopWATh
  • by Nick Mitchell ( 1011 ) on Saturday October 09, 1999 @10:29AM (#1627147) Homepage
    Certainly some applications are embarrassingly parallel (aka "data parallel"); that is, after a tiny bit of startup cost, your speedup is only limited by the number of processors and the size of the problem (amount of data).

    Examples of data parallel problems: image rendering, key cracking, matrix-matrix and matrix-vector multiplication.
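    A minimal Python sketch of the data-parallel case, in the spirit of how RC5 key searching divides its work: the key space is split into one contiguous block per CPU, and each block is tested with no communication until the answers are collected. `is_key` is a hypothetical stand-in for the real test (trying to decrypt with a candidate key), and because CPython threads share one interpreter lock, this shows the shape of the split rather than a real speedup; a real client would run one process or native thread per CPU.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple


def split_range(start: int, stop: int, workers: int) -> List[Tuple[int, int]]:
    """Split [start, stop) into `workers` contiguous, non-overlapping blocks."""
    size, extra = divmod(stop - start, workers)
    blocks = []
    lo = start
    for i in range(workers):
        # The first `extra` blocks get one additional key each.
        hi = lo + size + (1 if i < extra else 0)
        blocks.append((lo, hi))
        lo = hi
    return blocks


def search_block(block: Tuple[int, int], is_key: Callable[[int], bool]) -> Optional[int]:
    """Test every key in one block; no other block is consulted."""
    lo, hi = block
    for k in range(lo, hi):
        if is_key(k):
            return k
    return None


def parallel_search(start: int, stop: int, is_key: Callable[[int], bool],
                    workers: int = 2) -> Optional[int]:
    blocks = split_range(start, stop, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda b: search_block(b, is_key), blocks))
    # The only "synchronization" is gathering the per-block answers at the end.
    hits = [k for k in results if k is not None]
    return hits[0] if hits else None


if __name__ == "__main__":
    secret = 777  # hypothetical key, for illustration only
    print(parallel_search(0, 1000, lambda k: k == secret))
```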

    However, many applications are not embarrassingly parallel; that is, the processors must communicate (aka synchronize) at certain points in order for the computation to proceed. Here, your speedup is limited by the

    Examples: sorting, matrix factorization (e.g. LU decomposition).

    In my experience, commodity Intel motherboards scale very poorly for this latter class of problems. Why? If the two threads always hit their L2 cache (i.e. don't have to fetch across the memory bus from main memory), then everything might be OK (even then, write sharing can cause cache thrashing!). If the threads miss L2 cache often enough, then (on commodity motherboards) the threads will be serialized, because the memory is neither interleaved nor multi-ported.

    On fancier (expensiver, hehe) SMPs, processors are connected to either interleaved, multi-ported memory, or over a crossbar (rather than a bus), or probably all three. For example, the HP Convex Exemplar ($$$) has all three.

    On a counting (integer) sort, a 2-processor commodity SMP is limited to a speedup of about 1.4 out of an ideal 2 (1.4/2 is roughly the fraction of memory references that hit cache). The Convex gets speedups of 1.95 (limited only by the tiny startup costs, as in the embarrassingly parallel case).
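    For the counting sort mentioned above, the counting phase splits naturally: each CPU builds a private histogram of its share of the input, and the partial histograms are then summed. This is a minimal Python sketch, assuming small non-negative integer keys no larger than a known `max_value`; the function names are made up. Every input element costs a memory access into a count array, which is why this kind of loop is sensitive to shared-bus bandwidth once the data outgrows the cache. In CPython the threads only illustrate the decomposition, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def histogram(chunk: List[int], max_value: int) -> List[int]:
    """Count occurrences of each key 0..max_value in one chunk."""
    counts = [0] * (max_value + 1)
    for x in chunk:
        counts[x] += 1
    return counts


def parallel_counting_sort(data: List[int], max_value: int, workers: int = 2) -> List[int]:
    step = (len(data) + workers - 1) // workers or 1
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    # Phase 1: each worker counts its own chunk (no sharing, no locking).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: histogram(c, max_value), chunks))
    # Phase 2: combine the partial histograms; this is the communication step.
    if partials:
        total = [sum(col) for col in zip(*partials)]
    else:
        total = [0] * (max_value + 1)
    # Phase 3: emit each key as many times as it was counted.
    out: List[int] = []
    for value, n in enumerate(total):
        out.extend([value] * n)
    return out
```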
  • The problem with Celeron CPUs is that they were not meant to be used in an SMP machine. They don't scale well. Better off with a PIII.

    Sure, for times when money isn't a main factor, like when buying a server for business use. But for the home user a dual P3 550+ might be out of reach in terms of cost. An Abit BP6 with 2 Celerons is a relatively cheap solution which provides the home user with an awful lot of power.

    It may not be quite as stable as a Slot-1 dual board, but that's more an issue for professional users. The BP6/Celeron combo is aimed at overclockers, who are generally more concerned about bang for the buck.
  • RC5 is an encryption algorithm from RSA Labs [rsasecurity.com]. An RC5 'key' is a specific decryption code that might decrypt an encrypted message. RSA is sponsoring a contest [rsasecurity.com] to see if anyone can crack a message encrypted with RC5.

    The reason this is even mentioned is because there is a group that is working on this contest using a 'brute force' attack. distributed.net [distributed.net] has a client you can download [distributed.net] that will allow you to participate in this contest, along with thousands of other people.

    This client is designed to use all CPU time that would otherwise be 'wasted'. People tend to use it as a benchmark, even though it's not very representative of actual computing power, since it uses a small number of instructions repeatedly.

    If you have more questions, feel free to email me at decibel@distributed.net [mailto]

    dB!
    distributed.net Human Interface
  • by Anonymous Coward
    Running two CPUs at 550MHz does not mean the machine runs at 2*550=1100MHz. You can't just whack CPUs into a BP6 and double the speed; it adds multitasking ability, but unless an app is written to fully harness SMP, it won't double the speed. Just to be a pedant...

    I gave up the sig's years ago. Gave me lung cancer.

  • by suitcase ( 4089 ) on Saturday October 09, 1999 @07:33AM (#1627156)
    I have a dual Celeron system, currently running at 504MHz. Under Linux, I couldn't be happier with the speed of things. Celerons, believe it or not, can be used very efficiently in a server, regardless of the fact that they only have 128KB of L2 cache.
  • I'm running the same thing almost:
    96 meg: (64 PC100 32 crap)
    So this prevents me from going over 92Mhz or so.
    I haven't had to touch the voltage settings etc.
    I run 2x 366 @ (5.5x and 92 Mhz).
    This gives me 505.981381 MHZ and bogomips : 504.63.
    Works great for RC5 stuff, but honestly besides that I haven't seen much improvement over my old PII-233.
    But hell........ Having a GHz computer just sounds cool when you tell someone.... And cheap. cheap cheap. I'd recommend it to anyone.
    PS If you do get one download the bios update from http://www.bp6.com


  • There was a discussion on Slashdot when the Abit BP6 first came out... and it convinced me to buy one of those babies. :) I have my two 366s overclocked to 550, too. But I just bought the (quote): "Intel Celeron Processor / Designed for the Basic PC" and used the fan that came with it. Works fine. No problems so far. jp
  • Just curious - a Celeron 366 overclockable to 550 costs $110; an AMD K6-2/450 is $55 - and I think it has a larger L1 and L2 cache, so the performance might be similar. Are there any particular reasons for avoiding AMD for SMP applications?
  • Yeah,
    I don't think the AMD chips have SMP support on them (although the K7s do now).
    --Mark
  • Teehee, I noticed that I submitted that post a little early. Let me finish the unfinished sentence in third paragraph:

    ... with communication-bound algorithms] your speedup is limited by Amdahl's Law. As most good laws, it's pretty simple to state: the best parallel efficiency you can expect is bounded by the amount of time spent doing serial stuff.

    So, with an embarrassingly parallel algorithm, you're set! But an algorithm which has lots of communication is doomed on two fronts: first, Amdahl's Law hits it due to the amount of communication (time spent doing zero computation!), and second, on commodity SMPs it takes a further hit due to poor memory hierarchy design: once the algorithm strays beyond L2 cache, the application is serialized.
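    Amdahl's Law is usually written as S(n) = 1 / ((1 - p) + p/n), where p is the fraction of the single-CPU run time that can be parallelized and n is the number of processors; communication time counts as part of the serial 1 - p. A tiny Python helper (the name `amdahl_speedup` is just for illustration) shows how quickly the serial part caps even a two-CPU box. It does not model the memory-bus serialization described above, which comes on top of this bound.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work can be split."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


if __name__ == "__main__":
    for p in (1.0, 0.95, 0.75, 0.5):
        print(f"p={p:.2f}: 2 CPUs -> {amdahl_speedup(p, 2):.2f}x, "
              f"1000 CPUs -> {amdahl_speedup(p, 1000):.1f}x")
```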

    nick
  • Overkill.

    I used an AMD 486DX4/120MHz registering a mere 60 BogoMIPS with two Tulip Ethernet cards tossing packets around for nearly 6 months. The machine ran flawlessly with 40MB of EDO RAM, and had consistent uptimes of months (interrupted only by upgrades and miscellaneous maintenance).

    Now I've upgraded to an AMD K6-233MHz with 128MB EDO, and have been using it as a workstation, hosting dynamic websites on cable, redirecting Quake servers inside the LAN and other neat stuff. I just got a dual Celeron 366 setup, and will be playing with it.

    I'm migrating my existing system to it; can't wait to play with 1k BogoMIPS. Should be interesting.

  • Worse than a lack of knowledge of multiprocessing, it represents very sloppy use (lack of understanding?) of units. MHz tells you how many times a second something happens. If you've got one soundcard sampling at 44.1 kHz, and you slap a second one in and start advertising that you can sample at 88.2 kHz, you're wrong! You can now record twice as many 44.1 kHz audio streams, but you can't record at twice the sampling rate.

    You could slave the two soundcards together under a master sync, write the code to make it look like a single sound card, and have a single input... (but this gets into the poster's comment about lack of SMP understanding).

    To finish this analogy: take one of your dual Celeron boxes and cat /proc/cpuinfo. Do you see one CPU at 2X speed or two CPUs at 1X speed? (I know you can just edit the source... but does _everything_ on the box see a single processor at 2X, or two processors? This is where the SMP knowledge comes in again.)

    I've got an SMP box that really rocks compared to a single-processor box of 2X the speed. It handles mail/web/services for ~5 users simultaneously (and not just "pine" users either, but matlab, gimp, netscape, emacs-type users). So my experience is that SMP boxes are great for many processes, unless you want to run just one thing like rc5, and even then it is still running one process per processor! Compare this with the soundcard analogy above.
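    The `cat /proc/cpuinfo` check can also be scripted. This is a minimal Python sketch, assuming the x86 Linux layout where each processor block carries a `cpu MHz` line (other architectures use different field names). On a dual Celeron it should report two processors at roughly the same clock, not one processor at twice the clock.

```python
from typing import List


def cpu_clocks(cpuinfo_text: str) -> List[float]:
    """Return the 'cpu MHz' value of each processor block in /proc/cpuinfo text."""
    clocks = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            clocks.append(float(value.strip()))
    return clocks


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            mhz = cpu_clocks(f.read())
    except OSError:
        mhz = []  # not on Linux, or /proc is unavailable
    print(f"{len(mhz)} CPU(s) reporting MHz: {mhz}")
```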
  • The AMD K6, K6-2 and K6-3 (as well as Cyrix 6x86 and M2s) all have SMP support via the OpenPIC standard. The problem is, no motherboards support this standard, therefore you can't use these processors in an SMP configuration.
  • That Little Green Heat Sink (LGHS) on the BP6 may need some special treatment if you want to overclock your BP6 baby:

    The LGHS is cast from aluminum. The side that makes contact with the BX chip is not machined flat after casting (bean counters?).

    Problem:
    When the LGHS cools down after casting, it bends upwards because of its shape (look at its bridge-like design). The contact surface then makes very poor contact with your BX chip.

    The original (bent) heat sink may produce hot spots on your BX chip. Even an additional fan cannot help if the heat sink makes poor contact.


    ______|---------|______
    |-------------------|

    LGHS hot after casting


    _______/--------\_______
    |----______________----|

    LGHS cooled down after casting


    Solution:
    Pull the two white plugs that press the LGHS against the chip and remove the LGHS. Take fine water-resistant sandpaper (120 is OK), apply some water for the smoothest results, and put the paper on a flat piece of glass. Now flatten the LGHS contact surface.

    Check the result by holding a ruler against the surface and looking toward a light. If the surface is flat, you'll see a nice, constant boundary.

    Use heat transfer compound when reassembling. This may make the difference between a stable and an unstable board. The same applies to the CPU heat sinks.

    Now reboot into the BIOS and set the FSB at will :)


    --
  • I have a dual 400 system setup, but I am not overclocking it... It seems that I have enough problems as it is... Every four days it crashes. Just locks up tight on whatever it is doing and halts. No idea why, it just does.

    It is fast, but it pisses me off... What can you do eh?
  • Could the crashes be due to overheating in the BX chipset as mentioned in the article?

    I ran 4 SETI processes on my machine for 4 weeks, and it raised the CPU temp about 4-5 degrees F. At the same time I did my normal everyday stuff, and I had no problems whatsoever.

    I haven't really tested out sustained max I/O throughput, but I hadn't thought about the chipset being a possible point of failure from overheating.

    What would be a good test for max sustained I/O on Linux? I would like to see if I could kill the machine.

    Also, has anyone tested the kernel patch to make the HPT366 (UDMA/66) chip work on the BP6?

    Thanks,

    PS: No, it's not 1100MHz, but 1101 BogoMIPS still looks cool when it starts! :)
    (and yes.. I know it's a useless number for comparisons)
  • I've got an overclocked celeron 366 SMP machine, so do 2 of my friends.. is this really rare?

    Maybe it's 'cause we're Canadian, eh? :)
  • Take care of the little green heat sink.

    Check your SDRAMs for memory errors. I had the same problems for about eight weeks until I found out that some SDRAM memory cells were unstable. Memtest [sgi.com] is quite good at finding broken memory chips that other memory testers cannot find.


    --
  • I find this issue to be overblown, in my completely untested opinion. I've been running a 400 at 83MHz FSB with a 1:1 AGP:PCI ratio for some time now with no problems, and I know quite a few people who are at 83 or 75 MHz FSB with no problems either. Where you start to run into problems is with older hardware, like my Future Domain SCSI card from '94, AKA the AHA-2920A, which simply refuses to work. Put it this way: anyone who actually needs to worry about FSB incompatibility problems obviously knows their stuff, and is more than likely going to be building their own system with cutting-edge parts that won't have FSB issues. For most of us it's not going to matter.
  • How the holy f*cking HELL is this "offtopic"? What kind of MORON are you to not notice that, in a conversation about dual celeron machines, someone talking about a dual-celeron machine is about as ON-TOPIC as you can GET?

    /. really, REALLY needs to start giving moderator access to people with *working brains*.

    - A.P. (Score -1 Flamebait, if you want. I just hope I knocked some *sense* into some idiot moderators.)
    --


    "One World, one Web, one Program" - Microsoft promotional ad

  • much appreciated, I will take a look at that.
  • You've heard wrong. The K7 is capable of SMP, the K7 Ultra will use a different type of bus.
  • Just my 2 cents, see what you do with it... My system: Chaintech 6BTM mainboard, Matrox Marvel G200, 128MB mem, WD 10.something GB hard disk, Celeron 466 proc (7*66MHz). The Chaintech board can be tweaked to run at 66, 68, 75 or 88 MHz (or forced to a 100MHz bus and then tweaked even further above 100MHz... but with Celerons this is nice to try for a few seconds, see that the CPU won't hold up, and then abandon it). Currently I have my board running at 75MHz, which means the CPU does a cool 525MHz and the AGP bus runs at 75MHz, which the Matrox card doesn't seem to have a problem with. Neither do any of the PCI cards (SCSI controller for my CD-ROM and CD writer, sound card and network card). It has been running 24/7 now for two weeks without a problem, even while doing some video grabbing.
    I did try to crank the bus up to 88MHz, but during the POST, after seeing the CPU the system halts... Since there are no components I can ditch, I didn't bother to find out what exactly caused the halt in the POST.
    Running the board at 100MHz bus (which would mean the proc would be running at 7*100MHz) the board didn't even see the CPU. Not even at a real "cold" start. (room temperature system, not turned on for at least an hour)
    But anyway... I wish I'd bought the ABit dual Celeron board now... That would have given me a 2*525MHz machine.... (if that board supports the 75MHz bus-freq...)
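    The clock arithmetic in these posts is just core clock = multiplier × front-side bus, since the Celeron's multiplier is locked. A minimal Python sketch (the helper name is made up) reproduces the figures quoted in this thread. Because the multiplier can't change, overclocking means raising the bus clock, which is why the AGP and PCI cards get pushed along with the CPU.

```python
def core_mhz(multiplier: float, fsb_mhz: float) -> float:
    """Core clock of a multiplier-locked CPU: the multiplier times the bus clock."""
    return multiplier * fsb_mhz


if __name__ == "__main__":
    # Celeron 466 is locked at 7x; Celeron 366 at 5.5x.
    for mult, fsb in [(7, 66.67), (7, 75), (7, 88), (7, 100), (5.5, 92), (5.5, 100)]:
        print(f"{mult} x {fsb} MHz = {core_mhz(mult, fsb):.0f} MHz")
```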

    Why am I telling this... Hell, I don't know.. It's late and I'm babbling... :-)
    ----------
    'We have no choice in what we are. Yet what are we,
    but the sum of our choices.' --Rob Grant
    ----------
  • by Anonymous Coward
    Just thought I'd mention how happy I am with my dual PII 400 machine. I started with a single processor but added the second about three weeks ago. I would like to comment on how much simpler it is to "add" SMP to Linux as compared to NT. With NT you have to either buy the resource kit or reinstall. With Slackware, I just checked the SMP box after a make xconfig, recompiled, and I was good to go. My benchmarking was with rc5des. I now get around 2.25 Mkeys/sec under Linux, about the same under NT.

    Here's my computer specs:

    SuperMicro P6DBU motherboard (Ultra2 SCSI, mmmm)
    PII 400 x 2
    128MB Generic PC100 RAM
    Matrox Marvel G200 Video
    SB AWE64 Gold
  • *grmbl* Since this was about RC5... Forgot to mention the machine does an average of 1.445 Mkeys/s.

    Yes.. I know... I shouldn't be posting this late and in this state.. Yada Yada Nag Nag Whine Whine Shit happens......
    ----------
    'We have no choice in what we are. Yet what are we,
    but the sum of our choices.' --Rob Grant
    ----------
  • "18GB Western Digital Expert 7200rpm UDMA 66 hard drive (Linux only supports UDMA33!)"

    As I write this from a box with an 18GB Western Digital Expert 7200RPM UDMA 66 hard drive running Linux: nope, that isn't true. In fact, he mentions the existence of a patch. There are several ways to run the drive: as a regular old ATA drive, as UDMA 33, or as UDMA 66. The patch exists for the 2.2 kernel series. I'm running 2.3.13, which doesn't need to be patched. In the "for what it's worth" category:

    [root@eco jgreer]# hdparm -Tt /dev/hde
    /dev/hde:
    Timing buffer-cache reads: 128 MB in 1.25 seconds =102.40 MB/sec
    Timing buffered disk reads: 64 MB in 3.73 seconds =17.16 MB/sec

    Jim

  • I for one am utterly sick and tired of the holier-than-thou attitude that seems to flourish on this site. This site gives a really nice forum to express ideas and comments, and yet every single day I see conversations degenerate into name calling and other antisocial hostilities. Jesus H Christ, if you cannot communicate like a civil human being, then PLEASE go back to AOL chat rooms where name calling and assorted flames are more commonly accepted. Wakko, I looked for your original comment, as I do have moderator access. I sure don't see it. Was it written using the same inflammatory tones? Learn to communicate; it will really help in the long run. Granted, mistakes and poor moderation do occur, but to degenerate to calling people morons... really now... grow the fsck up!!
  • Okay, the package for Linux that is capable of reading sensors on various motherboards (dunno about the abit, but it works on mine) is lm_sensors and is available at "http://www.netroedge.com/~lm78". It consists of kernel modules, a client program or two, and a C library for building apps that read the sensors. Good luck.
  • he gets paid though, doesn't he? and i've never had a problem with anything on slashdot, and most people haven't. i think slashdot's pretty good, and unfortunately i'm addicted to it, as i'm addicted to checking my sparse email.
  • At least in theory, dual CPUs should help quite a bit in X. The client sends requests to the server, which then does the drawing. With a dual box, they both get their own CPU to work on, meaning the client can prepare the next request while the server is drawing the last one.

    In the X FAQ, one of the suggestions for speeding X up is to "swap" machines with a coworker, and use his machine set to your display, and vice versa. The point of this exercise is to reduce the constant context switches (1 per X request). Of course, with extensions like MIT-SHM this is (obviously) no longer a win, however on a SMP box, it very well could be.
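    The client/server overlap described above is a plain producer/consumer pipeline. This is a minimal Python sketch: a bounded queue sits between a "client" thread that prepares requests and a "server" thread that draws them, so the client never waits for a drawing to finish unless the queue fills. The request format and function names are invented for illustration, and under CPython's global interpreter lock this shows the structure of the overlap rather than a real two-CPU speedup.

```python
import queue
import threading
from typing import List

STOP = None  # sentinel telling the server there are no more requests


def client(requests: queue.Queue, n: int) -> None:
    """Prepare drawing requests and hand each one off without waiting for it to be drawn."""
    for i in range(n):
        requests.put(("draw_line", i))  # hypothetical request format
    requests.put(STOP)


def server(requests: queue.Queue, drawn: List[tuple]) -> None:
    """Consume requests in order; on an SMP box this can run on the other CPU."""
    while True:
        req = requests.get()
        if req is STOP:
            break
        drawn.append(req)


def run(n: int = 5) -> List[tuple]:
    requests: queue.Queue = queue.Queue(maxsize=16)  # bounded, like a socket buffer
    drawn: List[tuple] = []
    t_server = threading.Thread(target=server, args=(requests, drawn))
    t_client = threading.Thread(target=client, args=(requests, n))
    t_server.start()
    t_client.start()
    t_client.join()
    t_server.join()
    return drawn


if __name__ == "__main__":
    print(run(5))
```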
  • Mhz==Clockspeed. Period. Nothing about ops per second in there, at all.
    mhz!=ops per second, by a long shot...

    Man hours is a concept of throughput, not speed.
    Mhz is a concept of speed, not throughput.
    If you want to measure clock speed, you use Mhz..
    If you want to measure throughput, you use mips, mflops, or other such unit.

    Yes, there is a big difference, and my original analogy holds.
  • Mhz==speed, specifically in this case, cpu clock speed.
    mips/mflops/etc==throughput, in this case, dual processor throughput.

    He should have given us the bogomips score if anything, though it is a completely inaccurate way of testing throughput, at least he would have had his units right...
    (yes, I know he did give us the BogoMIPS score later, but the title was still a major screw-up..)
    And yes, I do have a sense of humor (somewhere around here... D'oh, where did I put it again?)
    and could figure out what he meant, even if it was completely wrong...
  • Well, if you're more intelligent than I and mature enough to be out of high school then perhaps you'd care to educate rather than denigrate?

    I still think the article was pretty clear and informative in its claims and never mentioned a clock speed (bus speed*clock multiplier) above 550MHz. I suppose you can interpret the 1100MHz figure however you please. I chose to interpret it in a way that made sense. If you choose to interpret it another way, then perhaps you should consider how it was intended before slagging the writer of the article off as an idiot.
    The Great Chunder Page - Alcohol Induced Fun!
  • I believe supercomputer status is based on number of operations executed per second, not the number of processors.

    - Scott
    ------
    Scott Stevenson
  • MHz != Clockspeed, MHz == 10^6 cycles per second. I never mentioned operations once.
    The Great Chunder Page - Alcohol Induced Fun!
  • by Splat ( 9175 )
    About the hardware buying cycle ... I have experienced the same thing. While a 550 x 2 SMP Celeron machine certainly looks cool on paper, why in the WORLD does anyone need that? Ever since I cut back on my game playing (I basically limit myself to emulated consoles now; I find they're the most fun) and migrated from Windows to Linux as my desktop of choice over the past year, the hardware rat race doesn't amuse me anymore.

    I've been trying to explain to people lately that they put entirely too much emphasis on the clock speed of the CPU. I explain how the real bottleneck in a system is usually the hard drive and video card. But no one listens ... they'd much rather shell out $700 for the latest 900MHz WunderProcessor CPU and a board that supports it and plop it in their system, rather than taking their current perfectly usable system and, say, adding SCSI to it, which would probably give a bigger performance boost.

    I know this will sound cheesy, but using Linux has given me more respect for technology. Before, I'd think "oh gosh, that 486 sucks. It can't do anything!". Nowadays, I see a 386 40MHz with a CD-ROM and think "what a perfectly usable little Linux box that could be!".

    Stop software manufacturers & CPU makers from siphoning off your wallets: use Linux. The little OS that could.
  • It is 1100Mhz. All that means is that you have 1100M cpu cycles a second which is undeniably true as far as I can see.

    I didn't see anyone claiming that it was equivalent to a single chip running at 1100Mhz, in fact if you actually read the article the guy explains what SMP is useful for and what limitations it has in his "Theory" section on the first and second page.

    Slashdot should implement a system which gives people -1 on a comment unless they've actually visited the article being discussed (logged by having an internal link which redirects to the target article). There are clearly far too many people commenting on something they haven't read (often in an attempt to be first post?).

    The Great Chunder Page - Alcohol Induced Fun!
  • Can you tell what number of RC5 blocks you completed at the different clocking rates every day?
  • Actually, aside from buying the NT Resource Kit, you can also get the uptomp.exe utility as a download from the Microsoft website. Unfortunately the program does have some problems and many people are left with unbootable systems afterwards, so there's also a Knowledge Base article about doing it by hand.

    BTW, I get about 3.16 megakeys a second on my dual Celeron 366 at 550. RC5DES is certainly very scalable, much more so than SETI.
  • It continues to amaze me that people insist on commenting on articles they clearly haven't read.

    This demonstrates a complete lack of knowledge on how to avoid making yourself look like a total fool.

    The article is very clear on what an SMP system does and does not do. If you'd read it you'd know that.

    The 1100Mhz figure is correct, it is simply a measure of the number of CPU cycles per second going on under the hood. It is not a measure of overall speed, nor did anyone say it was.

    Some of the 'experts' on Slashdot are clearly so 'knowledgeable' that they don't need to read an article to comment on it. Slashdot should do something about this (see my post above).

    The Great Chunder Page - Alcohol Induced Fun!
  • One reason for avoiding AMD microprocessors for SMP is that there seem to be no motherboards available.

    Another good reason is that people going for multiple CPUs are often people who need floating-point calculations to go fast, and AMD has been unable to deliver this for a long time.

    However, with the Athlon, AMD seems to be past this. For sure, my next box at home will be a dual Athlon, if the boards show up and the price is somewhere near that of a dual Intel Pentium today.

    I can well imagine my next box at work being an SMP AMD box. But we need to see motherboards first. It would also mean a lot to me if Asus shipped an SMP Athlon board. They've been shipping some rock-solid Intel-based SMP boards so far, and seeing them ship Athlon SMP boards _would_ make a difference, at least for me and my employers.
  • Some people claim the Celeron doesn't scale well, and to some degree they are right. The Celeron's second-level cache, though it runs twice as fast, is only one fourth the size of that of a Pentium III, and as such main memory will have to be accessed more frequently. If you don't happen to be overclocking, that main memory runs at only 66 MHz, compared to 100 MHz on other Intel processors. Add another Celeron on the same bus and at some point you're going to have contention as both CPUs wait around for their memory accesses.

    This isn't always a critical issue. One program that's hardly affected is the RC5DES client. This program is optimised for i686 processors and it supports SMP. The client will automatically recognise and use 2 CPUs and a second CPU gives instant practically double key throughput. My own dual Celeron 366 running at 550 Mhz goes from 1.6 to 3.15 Mkeys a sec. It seems that RC5 is a small piece of code with a small data set that runs almost completely in the L1 and L2 caches. The dual Celeron scores competitively with even a Xeon 550.

    Now enter the SETI client. I was trying the text mode NT client and noticed it was running much slower than I had expected on my box, so I retimed the same work unit (WU) a couple of times with different configurations. The same PC, which was equipped with only 64 MB of memory - fine for RC5DES, does the WU in about 10 hours and 30 minutes. However, when I ran a second parallel SETI client (The client is generic i386 and does not support SMP by itself) it did two WUs in about 15:50 hours. Major overhead!

    It didn't matter much whether I let the two processes run free or explicitly tied their affinity to one CPU each. In the first case one of the processes was finished about 3 minutes earlier, but total time was still roughly the same.

    So I first added another 64 MB and retimed it. Timing went down to 14 hours and 40 mins. Although I hadn't noticed it swapping and the SETI docs claim the clients take only about 13 megs of mem each, the original 64 MB apparently was not quite enough to fit both processes completely in real memory, hence the modest improvement. Then I tried changing the memory timing from CAS 3 to CAS 2 and lo and behold, I was now doing this unit (same unit twice in parallel) in only 12:30 hours each! Nice. Much closer to the 10:30 of a single 550.
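    Michiel's timings can be turned into throughput speedups: with two jobs running side by side, speedup = 2 × (single-job time) / (time to finish both). A small Python sketch over the numbers in this post (10:30 for one WU alone, then 15:50, 14:40 and 12:30 for two WUs in parallel), plus the RC5 figures quoted earlier in the same post; the helper name is made up.

```python
def throughput_speedup(single_minutes: float, parallel_minutes: float, jobs: int = 2) -> float:
    """How many times more work per hour the parallel run does than one job alone."""
    return jobs * single_minutes / parallel_minutes


if __name__ == "__main__":
    single = 10 * 60 + 30  # one WU alone: 10:30
    runs = {
        "64 MB, CAS 3": 15 * 60 + 50,
        "128 MB, CAS 3": 14 * 60 + 40,
        "128 MB, CAS 2": 12 * 60 + 30,
    }
    for label, minutes in runs.items():
        print(f"SETI {label}: {throughput_speedup(single, minutes):.2f}x")
    # RC5 already reports throughput directly: 3.15 vs 1.6 Mkeys/s.
    print(f"RC5: {3.15 / 1.6:.2f}x")
```

    That works out to roughly 1.33x, 1.43x and 1.68x for SETI versus about 1.97x for RC5, which fits the point made above that the small, cache-resident RC5 loop scales almost perfectly while SETI's larger working set does not.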

    But still not quite as good as dual Pentium IIIs, as far as I can gather from Usenet postings that is. Xeons supposedly have phenomenal SETI scores. And because I overclock, my Celeron runs at a 100 Mhz bus, partly making up for the difference. Normal Celerons running at 66 Mhz bus would break down worse, as far as I can predict.

    I've noticed some people here are quite critical about these dual Celeron freaks. But in some ways our bragging rights are real and these PCs really do count as 1100 MHz. In others, they fall flat on their face. I love my box tho. I learned a lot of stuff about how different operating systems upgrade to SMP, how process affinity works out, which drivers aren't threadsafe: all stuff I can apply when I'm working on serious SMP hardware. Yeah, it mostly sits around cracking keys or SETI, but just the ability to run VMware on a CPU of its own is at least one killer app.

    Michiel
  • (Offtopic)

    I want a StarFire too. But I can't really justify one for 800 users on an Oracle database. They've only given me a multi-node RS/6000 SP2 cluster. And a bunch of big 740-series AS/400s for DB2 (can you say 20GB of RAM?). Overall, the system starts to lose performance at 12,500 interactive users. It's still usable at 20,000. But I still want a StarFire.
    (/offtopic)

    I think the total 'multiply processor speed by processors' on that system must be quite a lot. Someone really ought to port Quake to the AS/400. It'd make a cracking server. 5,000 user deathmatch here we come....

    Chaz
  • Ok.. first off, I'm still running my dual 450 Slot 1 Celeron box (899 or some such BogoMIPS FWIW, more if I bump 'em up to 464). Yes, I was one of the old-school, hard-core, drill-the-vias-and-solder-the-wires-on-the-CPU guys!

    If I remember my kernel compile benchmarks, I would get a 60-70% increase over a single processor; the dual compile time was about 1 min 35 sec for the 2.0.36 kernel. Never got around to trying it on (and under) a 2.2... SMP has improved, but the code got bigger. Maybe I'll get a 2.0.36 source and compile it on the current Mandrake 6.1 system...

    I wanted to comment on the Beowulf comment.. I'd heard of one at a small college in Florida (that I can't remember now) that was built from 300As running at 450MHz, and it tied a Cray for the #1 spot in the POV-Ray raytracing benchmark (don't have the site handy, but that would be where to find it). And of course, the price/performance point is probably a record unto itself. Cheap?? When I bought my 300As back in September a year ago, they were cheap at $160!! (compared with PII-450s at $750!) You can get 366s for $50 now...

    So, they work, they work well, yes I wouldn't bet the (server) farm on them for mission critical stuff, but for cheap home or research work, there ya go..
  • Doesn't Sandia Nat'l Labs have some sort of machine from Intel that has 9000 Pentium Pro processors at 200MHz each [intel.com]? Does that mean this box is 1.8 terahertz?

    Heh. (for the sarcasm-impaired)

  • In the computer world, Mhz is the unit we use to measure clockspeed.

    Mhz is a unit used for measuring frequency.

    1 hz=1 cycle per second
    60 hz= 60 cycles per second
    1MHz=1,000,000 cycles per second

    In SMP computers, both CPUs synchronize to the same clock. In this case, that clock is 550 MHz, or 550,000,000 Hz.
    That means every 1/550,000,000th of a second, the processors act. Having two of them act at the same time does nothing to shorten this time. It does accomplish more in each cycle, but there are the same number of cycles per second.
    So, 2 SMP processors at 550Mhz do not give you a 1100Mhz computer. It gives you a 550Mhz computer that accomplishes more per clock cycle.
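    Both sides of this argument can be put into numbers. A minimal Python sketch (the helper names are made up): the clock period, which is what the MHz figure describes, is the same with one CPU or two, while the total number of cycles available per second summed over both CPUs doubles. The "1100" in the title is the second figure, a capacity total, not a clock rate.

```python
def clock_period_ns(freq_hz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / freq_hz


def aggregate_cycles_per_second(cpus: int, freq_hz: float) -> float:
    """Total cycles per second summed over all CPUs (a capacity figure, not a clock rate)."""
    return cpus * freq_hz


if __name__ == "__main__":
    f = 550e6
    print(f"period: {clock_period_ns(f):.3f} ns per cycle, with one CPU or two")
    print(f"one CPU:  {aggregate_cycles_per_second(1, f):.3g} cycles/s")
    print(f"two CPUs: {aggregate_cycles_per_second(2, f):.3g} cycles/s")
```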

    I guess at this point you are just looking for something to whine about, but maybe someone else will learn from this.
  • According to Apple [apple.com] a supercomputer is defined by the US Military as any computer capable of performing over 1 gigaflop or 1 billion floating point operations per second. This is a far more accurate measure of speed than clock cycles/second (Mhz)
  • MHz = cycles per second.

    At 550MHz, the entire machine is running at 550MHz.

    The fact that it has two CPUs just means it can do more in each clock cycle.

    Suppose IBM were to put two G4 cores on the same die running at 600MHz. People would not say that they had a 1200MHz CPU.

    Arguing that two CPUs increase clock speed is like arguing that your 386DX-33 runs at twice the megahertz of your 386SX-33 because it has a bus that is twice as wide.

    Or that a 486DX-33 runs at twice the megahertz of a 486SX-33 because it has an onboard FPU.

    smash
  • by smash ( 1351 )
    heh.

    We do primary DNS and mail relay for about, hrm... 400 simultaneous clients on a P200MMX Linux box :)

    ah well :P

    Our proxy server for the same clients is a PII-300 with 320MB of RAM.

    It also does secondary DNS, and sits at a load average of about 0.10 pretty constantly :)

    smash
  • Is it really a good idea to crack keys and look for ETs on an overclocked box? I remember reading an article (a how-to? I don't recall) where the author rendered a couple of fractals and could find a few tiny miscalculations. I can just see it now...

    *number crunching noises from happy o/c'd cpu's*

    setiathome scans: Greetingd earkhoslkings ! takiwe me to ykhour leahjhgyder!
    Nah, just static :-) . Keep looking...

    rc5 decrypts : The secret massage us :
    Nope, that aint the key *sigh* keep searching..


    The above examples are just that ;). I humbly admit to greatly simplifying the two programs' operations... but you get the point.

    I can't speak from any overclocking experience myself; frickin' MIIs are a little unstable at their rated speed anyway... and who's to say Intel's chips hold up that well when overclocked? Remember the FDIV bug? Intel: "Oh yeah, well, it's very rare," etc. Now that everyone's into number-crunching, who's to say it's not happening now?

    Just a few stray thoughts...
  • Sorting does (almost) scale. Start with your favourite O(n log n) algorithm (such as merge sort), split your list in half, sort each half (one processor each), then merge the two.

    If the original serial algorithm takes kn log(n) units of time, then the parallel step takes roughly half that, since
    k(n/2) log(n/2) = k(n/2) (log(n)-log(2)) ~ kn log(n)/2,
    and the final merge takes time proportional to n, which is small compared to kn log(n)/2 (insignificant, for large n).

    So the total time taken is roughly half the original time.
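A minimal Python sketch of the split/sort/merge scheme just described. It uses a process pool (one process per CPU) because CPython threads would not run the two sorts truly in parallel; the helper names `split`, `merge_runs` and `parallel_sort` are made up for illustration. On a real dual-CPU box the shared memory bus and cache effects raised elsewhere in this thread will eat into the ideal 2x.

```python
import heapq
import random
from concurrent.futures import ProcessPoolExecutor

def split(data, parts):
    """Cut `data` into `parts` contiguous, nearly equal chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def merge_runs(runs):
    """Merge already-sorted runs; linear in n for a fixed number of runs."""
    return list(heapq.merge(*runs))

def parallel_sort(data, workers=2):
    """Sort each chunk in its own process (one per CPU), then merge."""
    chunks = split(data, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, chunks))
    return merge_runs(runs)

if __name__ == "__main__":
    xs = [random.randint(0, 10**6) for _ in range(200_000)]
    assert parallel_sort(xs) == sorted(xs)
    print("ok")
```

The two `sorted` calls are the k(n/2) log(n/2) part that runs concurrently; `heapq.merge` is the serial, proportional-to-n tail.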

  • The problem with Celeron CPUs is that they were not meant to be used in an SMP machine. They don't scale well. You're better off with a PIII.
  • Better off paying a lot less. And who cares what they were meant for? :)
  • I know a guy who's already done this and has a working 1100MHz system, although it's running Win2K, sadly. He also has one of those neato KryoTech chilled cases.
  • I tried 466s at 588 and 366s at 550 for several months. The bus speed makes a huge difference.

    The 466s encoded MPEG video and tested RC5 keys faster than anything, but compiling was dog slow.

    The 366s are slower at MPEG encoding and RC5 than the 466s, but they compile at light speed by comparison. You need to get those 366s pretested from a company that has been testing them for a while. My untested pair of 366s was stable running RC5, SETI@home, and Prime95 for days on end, but attempting to composite video at 550MHz crashed them every time.

    I got a tested pair of 366s, and these are stable compositing video. While they run Prime95 at 574MHz, video compositing crashes them every time above 560MHz. You need a really small heat sink to fit in the BP6. My dual 550 uses Radio Shack blowers on the default heat sinks and stays at 104°F.
  • by jkorty ( 86242 ) on Saturday October 09, 1999 @08:01AM (#1627227) Homepage
    Assembling a dual Celeron 1100MHz is a trivial project, suitable for hardware newbies. Leo Laporte built one on ZDTV a few weeks ago for about $1500. Erector Set engineering: bolt it together and turn it on. Fun to do, but no big deal other than a careful selection of parts, with special emphasis on cooling.

    Joe

    Slashdot's new slogan: news for nerdy wannabes. Stuff that's simple.

  • Unfortunately, the Celeron 366 hasn't been available for a few months now, except from the few companies that have been hoarding them (and selling tested 366@550s at a premium price).

    The Celeron has a fixed multiplier, so the only way to overclock it is to raise the front-side bus speed, typically from 66MHz to 100MHz. 400@600 is not unheard of, but it's also not too common. While it's possible to use a bus speed between 66 and 100, it's not desirable, because you'll have to overclock (or underclock) your PCI and AGP buses. That's a Good Thing in theory, but a Bad Thing in reality, because a good number of add-on cards and hard drives won't tolerate a faster bus.

    If I were to build an overclocked Celeron system today, I'd buy a single pretested 366@550.
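Because the multiplier is locked, the core clock is simply bus speed × multiplier. A small Python sketch; the multipliers (5.5 for the 366, 6 for the 400) follow from dividing the rated clock by the 66.67MHz stock bus, and the function name is made up for illustration.

```python
# Celeron core clock = front-side bus x multiplier; the multiplier is locked.
CELERON_MULTIPLIER = {366: 5.5, 400: 6.0}  # rated MHz -> locked multiplier

def core_clock_mhz(rated_mhz, fsb_mhz):
    """Core clock of a Celeron rated at `rated_mhz` running on a `fsb_mhz` bus."""
    return fsb_mhz * CELERON_MULTIPLIER[rated_mhz]

for rated in (366, 400):
    stock = core_clock_mhz(rated, 66.67)
    oc = core_clock_mhz(rated, 100.0)
    print(f"Celeron {rated}: {stock:.0f}MHz at 66MHz FSB, {oc:.0f}MHz at 100MHz FSB")
```

That is where the 366@550 and 400@600 figures come from: same chip, same multiplier, faster bus.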
  • Sorry, but I must disagree...
    MHz is pulses per second, and this is two different CPUs both pulsing at exactly the same time, 550 million times a second.
    What you are arguing is like saying a highway with two lanes and a fifty-five mile an hour speed limit really has a 110 mile an hour speed limit...
    You can say it is as efficient as a 110 mile an hour speed limit, and it may or may not be... But it certainly isn't a 110 mile an hour speed limit.
  • The K7 is SMP-capable; whether anyone will build SMP chipsets for the K7 is another matter...
  • Reread the comment, then realize that the poster should at least attempt to understand what the hell he is talking about, like the difference between L1 and L2 cache.
  • OK, just to explain myself: L1 cache is on the core, and is 32KB on Celerons, PIIs, and PIIIs. On Celerons, the 128KB L2 cache is on the same die as the rest of the processor, so it runs at full speed. On Pentium II and Pentium III processors, it is made of external SRAM chips, which is why Intel went to Slot 1 (among other reasons). The reason Celerons overclock so well is the integrated L2 cache, because it is the same quality as the rest of the die. The external L2 on the Pentium II and Pentium III makes overclocking difficult, which is why Celerons can be clocked far past their intended speeds and the others can't.
  • Au contraire: no AMD processor with the K6 core supports OpenPIC; however, the K5 and all post-6x86 Cyrix processors do. When the K6 came out, AMD believed that nobody would ever come out with an OpenPIC board, so supporting it would be a waste of silicon. Unfortunately, they were right (except for CHRP-compliant PowerPC boards, and there are very few of them).

    I'm very, very glad that the K7 supports multiprocessing, although not via the OpenPIC standard. IMHO OpenPIC is (was) an awesome standard, with virtually unlimited processors and IRQs. But it looks like it has been Betamaxed by now.
  • Except Quake isn't multithreaded...
    You could get at least 500 people or so on a K7-700 if you had the bandwidth, though... Quake servers aren't that processor-intensive...
    At least my Quake 2 server isn't: it runs on a lowly K6-2 300 and sits at about 1% load with six users on...
    But those six users use up most of my 128Kb cable modem upstream...
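The bandwidth point can be put in rough numbers. A back-of-the-envelope Python sketch, assuming "128Kb" means 128 kilobits per second and charging the whole upstream to the six players (which overestimates the per-player cost); it also assumes cost grows linearly with player count. Names are illustrative.

```python
UPSTREAM_KBIT_S = 128  # cable modem upstream from the comment (assumed kilobits/s)
PLAYERS_SEEN = 6       # players who use "most" of it; treated here as all of it

# Rough per-player cost; charging the whole upstream to 6 players overestimates it.
per_player_kbit_s = UPSTREAM_KBIT_S / PLAYERS_SEEN

def upstream_needed_kbit_s(players):
    """Linear estimate: each extra player costs the same upstream bandwidth."""
    return players * per_player_kbit_s

print(f"~{per_player_kbit_s:.1f} kbit/s per player")
print(f"500 players: ~{upstream_needed_kbit_s(500) / 1000:.1f} Mbit/s upstream")
```

Under those assumptions a 500-player server wants on the order of 10 Mbit/s upstream, which is why bandwidth, not the CPU, is the limit.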
  • >And yes, I do have a sense of humor

    And I'm big enough to admit I'm probably wrong ("eventually"-all).
    I was thinking in terms of total CPU cycles (in the same sense as man-hours), not clock cycles. I guess there's no intrinsically correct way of looking at it (the numbers mean nothing on their own in real terms without a whole host of extra info).
    So it's best to go with the generally accepted usage, I guess.
    It was still a very good and informative article though, with the exception of the 'misleading' title.
    The Great Chunder Page - Alcohol Induced Fun!
  • Going on from that, what I would like to know is whether there is any software that will pick up the CPU temperature, fan speed, and other readings from the ABIT motherboards and display them while you are running your OS and apps. Maybe it would need support from the kernel. Has something like this been done, or is it even possible?
    It would be really nice to have software monitoring all those readings that could turn the computer off or send out admin alerts if the readings reach a user-defined critical point.
    Sort of like a UPS.
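The alerting half of that idea is simple once the readings are available; getting them off the board needs a kernel driver (for example, the lm_sensors project) and is not shown here. A minimal Python sketch of the threshold check, where the sensor names, readings, and limits are all hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Limit:
    low: Optional[float] = None   # alarm below this (e.g. fan RPM)
    high: Optional[float] = None  # alarm above this (e.g. temperature in C)

def check(readings: Dict[str, float], limits: Dict[str, Limit]) -> List[str]:
    """Return one alarm message per reading that is outside its limits."""
    alarms = []
    for name, value in readings.items():
        lim = limits.get(name)
        if lim is None:
            continue  # no limit configured for this sensor
        if lim.high is not None and value > lim.high:
            alarms.append(f"{name}={value} above {lim.high}")
        if lim.low is not None and value < lim.low:
            alarms.append(f"{name}={value} below {lim.low}")
    return alarms

# Hypothetical sensor names, readings and limits, for illustration only.
limits = {"cpu0_temp_c": Limit(high=60), "cpu1_temp_c": Limit(high=60),
          "cpu0_fan_rpm": Limit(low=3000)}
readings = {"cpu0_temp_c": 41.0, "cpu1_temp_c": 63.5, "cpu0_fan_rpm": 2500}

for msg in check(readings, limits):
    # A real monitor might mail the admin or start a clean shutdown here.
    print("ALERT:", msg)
```

Run from a loop or cron job, this is the UPS-style behaviour described above: poll, compare against user-set limits, act.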
  • You can claim any algorithm scales if you have a simplistic enough performance model. Naively implemented (i.e. without concern for cache locality) sorting scales relatively well using a uniform memory access cost model, but very poorly using a nonuniform memory access model.

    nick
  • The bus speed is not the actual problem. The BX chipset is fairly poor at SMP. That, coupled with the less-than-optimal cooling of the BX chipset on the BP6, can cause instability on some boards when running two Celerons. Unlike most chips, the BX chipset doesn't get very hot before it crashes, so touching it to gauge the heat won't help you. If your system crashes, the recommended solution is to apply thermal paste; flattening the heatsink, as the previous poster recommended, would probably help too.
  • Finally, some common sense! Thank you.

    MHz has never been a true measure of system performance. Yes, sure, an 1100 MHz single-processor Celeron would be faster than this dual Celeron, but that's a comparison of apples to oranges.
  • I can't see how you can argue that there aren't 1100 * 10^6 potentially usable CPU cycles per second going on inside that box.

    Nobody is suggesting that it is equivalent to a single CPU: not me, not the author of the original article, not by a long shot.

    I don't think your analogy fits the situation. A closer analogy would be two people working on a job. You've still got two man-hours per hour, regardless of whether one of them is sitting twiddling his thumbs (performing fast Fourier transforms for SETI@home in his head) or how much time they spend talking to organise sharing of the workload. Equally, you have 1100 * 10^6 CPU cycles per second in that box.

    The Great Chunder Page - Alcohol Induced Fun!
  • Ah, thanks for the correction. I was under the impression the K6 core had OpenPIC support as well. Oh well, can't be right all the time (or very often, for that matter)... :)

    Hopefully, dual (and quad) Slot A boards will come down in price quickly. Hardcore gamers going for the best Quake 3 experience may just save AMD by buying more processors for use in SMP...

    Blah. If the above makes little sense, it's because Mozilla M10 doesn't word-wrap paragraphs after the second line, and I don't want to fiddle with it at the moment. Has anyone reported this bug in this manner?


  • Yes, I have one just like this:

    2 x 366 Celerons @ 550 (week 30 CPUs). Booting reports 1101 BogoMIPS.
    128MB of PC100 (-6) RAM.
    Diamond Viper V770 32MB video card.
    Sound Blaster Live! sound card.
    Philips 107S 17" monitor.

    Putting this system together was almost too simple and cost about $1300-1400 (9000 French francs).

    If anybody is assembling a new system today, I wouldn't hesitate to recommend this setup; the machine absolutely rocks!

    --
    Why pay for drugs when you can get Linux for free ?
  • You must have a problem. If you have two Celerons, the BogoMIPS should be about 1008. Are you running an SMP kernel? Does dmesg say it recognizes both processors?
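Besides dmesg, /proc/cpuinfo on x86 Linux lists a `processor` line and a `bogomips` line for each CPU the kernel is using, so counting them shows whether the SMP kernel sees both chips. A minimal Python sketch; the field names are as on current kernels, and older 2.2 output may differ slightly, so treat the parsing as an assumption.

```python
def summarize_cpuinfo(text):
    """Return (processor count, total BogoMIPS) from /proc/cpuinfo-style text."""
    cpus, total = 0, 0.0
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between CPUs
        key = key.strip().lower()
        if key == "processor":
            cpus += 1
        elif key == "bogomips":
            total += float(value)
    return cpus, total

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:  # present on Linux only
            n, bogo = summarize_cpuinfo(f.read())
        print(f"{n} CPU(s) visible, {bogo:.2f} BogoMIPS in total")
    except FileNotFoundError:
        print("no /proc/cpuinfo here (not Linux?)")
```

If it reports one processor on a dual board, the kernel was most likely built without SMP support.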
  • In a multitasking system, you really don't need to worry about VMware running on just one of the processors. Pinning a process to one CPU is now possible, though, with a processor-affinity patch for 2.3.x. For processor-bound programs (typical under Windows if the program has _any_ performance optimizations), though, you will be running pretty much like single-CPU Windows.
  • (shrug) I wouldn't have thought that would make any difference, but I take your point about memory access, cache hits, etc. being critical. (Even with cache-challenged Celerons.*) This is how I see it, in a nutshell:

    The time-intensive bit is the initial sorting of the two halves, using an n log(n) sorting algorithm. As these are done independently, the only shared resources are... the memory and bus!

    So (in a simplistic and naive sense) the hypothetical algorithm scales to at most p processors, where 1/p is the proportion of bus bandwidth (or of any shared resource) used in the non-parallel case. Pretty obvious, eh? In this case, p could be increased with a faster bus or a bigger cache.

    It looks like I have learned something today. It is a pity that it is of no real use to me. 8-)

    [ *-yes, this is an attempt to stay on-topic ]
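That naive bound fits in a couple of lines. A toy Python sketch, where `bus_fraction` is the share of the bus one CPU uses when running alone; like the argument it encodes, it ignores cache-line sharing and other contention effects, and the function name is made up for illustration.

```python
def max_speedup(cpus, bus_fraction):
    """Toy bound: if one CPU running alone uses `bus_fraction` of the shared
    bus, at most 1/bus_fraction CPUs can be kept fed, so speedup is capped
    at min(cpus, 1/bus_fraction)."""
    if bus_fraction <= 0:
        return float(cpus)  # no shared-resource limit in this model
    return min(float(cpus), 1.0 / bus_fraction)

for frac in (0.25, 0.5, 0.75):
    print(f"bus use {frac:.0%} -> 2 CPUs: {max_speedup(2, frac):.2f}x, "
          f"8 CPUs: {max_speedup(8, frac):.2f}x")
```

A faster bus or a bigger cache lowers `bus_fraction`, which raises the cap, exactly as described.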

  • The Athlon (yes, SMP-ready) and EV6 motherboards will make a killer SMP box with nearly optimal scaling (if the SMP performance of Alphas is any indication). Rumor is that Tyan is working on a dual Athlon motherboard for release in the next two months. These people, http://www.hotrail.com/main.htm, are working on 8-way boards.
  • I decided to use the "slotket" approach: a small adaptor board that allows a Socket-370 Celeron to be used in standard Slot-1 motherboards. My assembly & test notes are online here. [209.233.19.231]
    By using the slotket, I am able to "upgrade" to a non-overclocked PentiumIII (or maybe Coppermine) when those CPUs become cheap enough. Until then, the dual-300A processors overclocked to 450 really cook!

  • Until they find an alien, that is. Look who'll be laughing then...

    -
    /. is like a steer's horns, a point here, a point there and a lot of bull in between.
  • ...but it's been a lot of trouble, particularly the converter cards for my Slot 1 mobo. Oddly enough, when I took one of the procs off it seemed to do everything just as quickly, with the exception of RC5.

    Therefore, I'd have to say that there's no reason for a normal Linux box to have dual processors. Come to think of it, my P-120s run WindowMaker pretty quickly.. maybe one of those is all most of us need.

  • The combined CPU power is about equal to having (in theory) a 950MHz CPU. I quite often hear people call dual Celeron systems gigahertz machines, but in reality they're not quite that fast, albeit not too far off!
  • just kidding...
    J.
  • This is exactly how I made mine. I'm using an Epox KP6-BS dual Slot 1 board now, and I won't be stuck with a dual PPGA board.
  • - The guy used 128MB of RAM... IMHO, a dual-CPU system deserves at least 256. Yes, I know about prices. Sad.

    - "18GB Western Digital Expert 7200rpm UDMA 66 hard drive (Linux only supports UDMA33!)"
    Oh dear... It goes on and on: Linux doesn't support this, Linux doesn't support that, does this badly, does that even worse... Probably the revolution needs some more people. Me, for one ;-)
    Can someone tell me (in 25 words or less ;-) how to make proper diffs, where to send them, and what to do next? Some pointers to good places with hardware specs are also welcome.
  • As you said, it depends on the problem at hand. Emacs is not really threaded, so your typing will still lag on a 550MHz machine if that's the kind of problem you're having :)

    Still:
    *) Many systems run several serial compute jobs.
    Each will run on its own CPU, so you get close to 100% speedup (give or take some).
    *) Even if you run only one CPU-intensive serial job, a workstation will feel as though nothing were running on it at all. That's nice.
    *) For tasks made of many independent serial steps, such as compilation, make (with -j) will start a number of jobs for you, and again you get good speedup.

    You will most often see below 100% speedup, because the CPUs share memory bandwidth and disk I/O. But in some cases (where the problems fit in L1/L2 cache) you see superlinear speedup, because both CPU-intensive tasks fit in the (unshared) per-CPU caches, and the other housekeeping jobs only trash half as much of your total L1/L2 cache as they would with half the CPUs.

    I have a dual at home and at work, because C++ compilation is slow.
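The speedup figures traded in this thread are just ratios of wall-clock times. A tiny Python sketch (function names and the example timings are hypothetical) that also shows how "below 100%" and "superlinear" look as numbers:

```python
def speedup(t_serial, t_parallel):
    """How many times faster the parallel run finished."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, cpus):
    """Speedup per CPU: 1.0 is linear, below 1.0 is the usual shared-bus/disk
    loss, above 1.0 is the superlinear (cache-fit) case."""
    return speedup(t_serial, t_parallel) / cpus

# Hypothetical timings, in seconds, for a job run on one CPU and then on two.
print(f"{speedup(100, 60):.2f}x, efficiency {efficiency(100, 60, 2):.2f}")
```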

  • A quite timely article, I must say, since I recently started to seriously think about putting together a dual Celeron machine myself. :)

    However, I don't think I'll overclock it, I imagine I wouldn't be able to stand the noise from all those fans... :(
  • I have mine running dual 400MHz Celerons @ 500MHz, rock solid. 600MHz boots, but it freezes within minutes. Dual Peltier fans keep the CPUs nice and cool, while the sides of the case are kept off for improved circulation.

    Definitely pick up the BP6.. for its subversive element if for anything else.
  • Just one question... how did you get the SB Live working with an SMP kernel? I haven't had any success with that. The uniprocessor kernel works fine if I force the module to load.
  • Ya, OK, they only have a 128KB cache, but unlike on PIIs and PIIIs, the cache runs at the full CPU speed. On a PII or (Katmai) PIII, the 512KB L2 cache runs at half the core clock. Celerons run their 128KB at the full clock speed, in your case 504MHz, so at the same core clock that little 128KB cache runs twice as fast as the PII/PIII cache!
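For reference, the L2 clock follows directly from where the cache sits: the Mendocino Celeron's on-die L2 runs at the core clock, while the off-die L2 on PII and Katmai PIII cartridges runs at half the core clock. A small Python sketch of that comparison (the function name is made up for illustration):

```python
def l2_clock_mhz(core_mhz, on_die):
    """L2 cache clock: full core speed for the on-die Celeron (Mendocino) cache,
    half core speed for the off-die cache on PII and Katmai PIII cartridges."""
    return core_mhz if on_die else core_mhz / 2

celeron = l2_clock_mhz(504, on_die=True)   # Celeron at 504MHz
pentium = l2_clock_mhz(504, on_die=False)  # hypothetical PII/PIII at the same clock
print(f"Celeron L2: {celeron} MHz, PII/PIII L2: {pentium} MHz, ratio {celeron / pentium:.1f}x")
```

The PII/PIII cache is four times larger, so which wins depends on whether the working set fits in 128KB.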
  • Bunch of whiners. Before spouting off, why don't you try it? I have a dual 333 overclocked to 750MHz (yes, that's 375 + 375). It does the standard POVBench in 1'30". When I split the work into two equal loads (about half the picture each), run them simultaneously, and splice the picture back together, it takes exactly 45". That's right, math geniuses, exactly twice as fast! Obviously there is no resource contention.
    Linux SMP makes for a profoundly faster system: more responsive and better at multitasking than a single CPU. I highly recommend it to all Linux users, especially when it can be done so cheaply.
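Splitting a render into equal loads is just cutting the image into contiguous bands of rows, one per CPU. A minimal Python sketch of that bookkeeping (the function name is hypothetical); how each band is handed to the renderer, via its start/end row settings, is not shown. Equal row counts only mean equal work if the scene's complexity is spread evenly down the image.

```python
def row_bands(height, workers):
    """Split image rows 1..height into `workers` contiguous, nearly equal bands,
    returned as inclusive (first_row, last_row) pairs."""
    base, extra = divmod(height, workers)
    bands, start = [], 1
    for i in range(workers):
        rows = base + (1 if i < extra else 0)  # spread leftover rows over the first bands
        bands.append((start, start + rows - 1))
        start += rows
    return bands

print(row_bands(480, 2))
```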
  • I have a question about something that I read in this article:
    ---------------------------
    5.Consider upgrading to kernel 2.2.13 or above. I upgraded to this kernel after writing the article and it seems to eliminate some Xwindows programs hanging up.
    ----------------------------

    What is this? I thought the highest stable (2.2.x) series kernel was 2.2.12. (At least the last time I checked kernelnotes.org)
  • > However, I don't think I'll overclock it, I
    > imagine I wouldn't be able to stand the noise
    > from all those fans... :(

    Yes, it does get rather loud when I take the cover off. I don't even know why I'm using fans; my CPUs aren't overclocked.
    I personally put together my machine around the BP6 because it was a good cheap board. Dual 400 Celerons may not be exactly twice as fast as a single 400 Celeron, but it's pretty nice.

    The one thing that I've noticed is that a lot of program makefiles aren't set up to take advantage of SMP machines (make -j N), XFree86 for one. But fun things like recompiling the kernel can be done rather quickly.
  • That's what it seems like to me. David Green says at the end of the article that the machine is running fine while doing IP masq for a small network, and running a mail server for his wife and kids. Unless this guy is getting/sending a lot of mail, or by small network he means large, this is overkill!

    I have a pentium 200 that does more than this. Talk about cheap! IP masq/firewalling takes practically no CPU (486 anybody?) and at the quantities of mail he is likely to generate/receive, an SMP system is overkill. Probably would run fine on a 486. Sigh.

    Rant mode on:

    It's funny how two events got me out of the vicious cycle of buying hardware. One, I quit playing games. I know it sounds extreme, but there just seems to be better things to do w/ my time. The other thing was switching to linux. Things just don't seem bad enough to upgrade my hardware anymore.
