Hardware

AMD Athlon Multi-Processor Under Linux (108 comments)

An Anonymous Coward writes: "Just saw this review at GamePC. It's a pretty extensive review of AMD's entry into the multiprocessor arena, full of exciting benchmarking results. The full text is here."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    Actually, your comparison is utterly ridiculous, unless you were compiling an x86 kernel on your Mac with a cross-compiler.

    Different GCC back ends (and the different default compiler options for the PPC kernel build vs the x86 one) make your numbers 100% useless for any kind of comparison.
  • by Anonymous Coward
    When Q3 was released the P4 wasn't even available.
  • by Anonymous Coward
    Well, I've got some happy news for you: two sound sources giving off the same amount of noise very close to each other do not double the loudness; you only get about a 3 dB increase.

    Anyway, Palomino is supposed to produce less heat than older Athlons, so here goes another happy surprise: you can get slightly quieter fans!
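
    For reference, the 3 dB figure comes from adding the sound power of two equal, uncorrelated sources. A minimal sketch, assuming ideal incoherent sources at the same distance; the function name and the 40 dB(A) fan level are made up for illustration:

```python
import math

def combined_level_db(level_db, n_sources):
    """Level of n equal, uncorrelated sources, each at level_db (in dB)."""
    # Sound powers add linearly, so n equal sources raise the level
    # by 10 * log10(n) decibels over a single source.
    return level_db + 10 * math.log10(n_sources)

if __name__ == "__main__":
    one_fan = 40.0  # hypothetical level of a single fan, dB(A)
    print(f"two fans: {combined_level_db(one_fan, 2):.1f} dB(A)")
```

    Doubling the number of fans adds 10*log10(2), roughly 3 dB, no matter how loud one fan is.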
  • by Anonymous Coward
    On the kernel compilation tests he says "we can definitely see where AMD's superior FPU and number crunching come into play". This doesn't make sense to me. GCC is probably not doing any FP calculations at all while compiling the kernel!!! GCC is a well-known, hard-to-optimize *integer* workload, not a floating-point one (the FPU being what usually matters for number crunching).
  • by Anonymous Coward on Thursday July 12, 2001 @12:56PM (#88680)
    FYI, NetBSD boots and runs on dual-CPU Athlons. You just have to build your system via the (experimental) nathanw_sa CVS branch. Here's a dmesg posted to the NetBSD tech-smp mailing list:

    http://mail-index.netbsd.org/tech-smp/2001/06/05/0000.html [netbsd.org]
  • AMD have had MP technology in their processors for a long, long time, including the K6. It's just that no one built motherboards for it. The Athlon series is the first time they've had the strength to do their own motherboards, too.

  • the K6 family never supported the OpenPIC standard. The K5 family DID, however (granted no chipsets were ever developed for OpenPIC). It is widely assumed that the K6 also had OpenPIC built-in, but that is untrue. They did not support any type of SMP operation, even "in theory."
  • As for scientific applications, you need load balancing regardless of whether the processors running the threads are in one box or in several. This is usually handled transparently by the OS, the compiler, the communications library, or a combination of the above (usually all of the above).

    In my experience with high precision RF simulations on massive SGI multiprocessor machines, the OS (IRIX), the compiler (MipsPRO), and the communications package (OpenMP) do very little for you automatically. In order to achieve decent scalability you need to take your computing needs into consideration when you develop the architecture behind your simulation. If you, as the developer, don't distribute your calculations as evenly as possible, your simulation is going to run very, very poorly. It will become more noticeable how poorly your simulation runs as you throw more processors at the problem.

    If you are dealing with simulations requiring high levels of internal messaging, you definitely gain from a multiprocessor machine. Bus throughput is much higher than network throughput, especially because data must be specially packaged before it is sent across the network. Again, you must design your simulation architecture specifically for this application. Frequently used data should be close to the process that needs it. If the data resides on only one processor (or machine in a cluster, as the case may be), you will deal with a high cost in accessing that data from any processor or machine to which the data is not local.

    If you are dealing with a simulation that has high internal messaging and relatively little computational work, you are actually better off using fewer processors/machines. Of course the optimal amount depends on the application, so your mileage may vary. And, of course, none of this means squat if you don't plan for spreading the work and localizing messaging, and instead hope everything will be taken care of automatically.
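
    The trade-off can be illustrated with a toy model. This is only a sketch under loud assumptions: compute time divides evenly across processors, and messaging cost grows linearly with each extra processor. Neither the formula nor the function names come from any real simulation package.

```python
def run_time(work_s, msg_cost_s, procs):
    """Toy model: compute splits evenly, messaging grows with processor count."""
    return work_s / procs + msg_cost_s * (procs - 1)

def best_proc_count(work_s, msg_cost_s, max_procs):
    """Processor count in 1..max_procs with the smallest modelled run time."""
    return min(range(1, max_procs + 1),
               key=lambda p: run_time(work_s, msg_cost_s, p))

if __name__ == "__main__":
    # Heavy compute, cheap messaging: many processors pay off.
    print(best_proc_count(work_s=100.0, msg_cost_s=1.0, max_procs=64))
    # Light compute, expensive messaging: fewer processors win.
    print(best_proc_count(work_s=100.0, msg_cost_s=25.0, max_procs=64))
```

    Under these assumptions the sweet spot moves toward fewer processors as the per-processor messaging cost rises, which is the effect described above.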

  • You're the 3rd post after the FreeBSD post; I think you're losing your touch. I remember the days when the *BSD is dying trolls all appeared instantly in response to any BSD post. Things just aren't what they used to be. For all intents and purposes the *BSD Troll is dead.
  • One processor could be running civclient [freeciv.org] while the other is running civserver [freeciv.org]. :-)
  • You are absolutely correct about how the MySQL "benchmarks" work. MySQL's crufty benchmarks are a Perl script that simulates one user doing a whole lot of goofy things like dropping tables and dropping and creating connections. Quite frankly I can't think of a single benchmark that would be less useful for benchmarking a SMP system. I was actually surprised when the folks making up the benchmarks didn't simply add up the respective bogomips scores from each processor and call it a day.

    I generally don't criticize benchmarking tests like these because I don't consider myself knowledgeable enough, but even I know that running one instance of the MySQL benchmark is not a good SMP test.
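
    A load generator that keeps several clients busy at once would at least give the server a chance to use both CPUs. A rough sketch of the idea, where `fake_client_session` is a hypothetical stand-in (a sleep, not a real database call) and nothing here is taken from the actual MySQL benchmark suite:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_client_session(client_id, queries):
    """Stand-in for one benchmark client; a real one would talk to the database."""
    done = 0
    for _ in range(queries):
        time.sleep(0.001)  # placeholder for one query round trip
        done += 1
    return done

def run_clients(n_clients, queries_each):
    """Run n_clients sessions at the same time; return total queries completed."""
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        results = pool.map(fake_client_session,
                           range(n_clients),
                           [queries_each] * n_clients)
        return sum(results)

if __name__ == "__main__":
    print(run_clients(n_clients=4, queries_each=50))
```

    Threads are enough on the client side because each client mostly waits on the server; the point is that the server sees several connections at once.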

  • The ppc backend is more computationally expensive than the x86 backend, in my experience. gcc runs fast on athlons because the x86 codegen doesn't take as much time as ppc codegen. (I'm 90% sure about this, since I didn't check it carefully or anything, and it was a while ago. I used gcc 2.94 when I tried a bit of stuff.)
    #define X(x,y) x##y
  • If you used Compaq/Digital's optimizing C compiler, you obviously found that the compile took a long time, because the compiler spends a long time scheduling the instructions for the Alpha's pipeline and exact insn execution capability. I read somewhere that the compiler actually simulates an Alpha running the code to see what it can do to make faster code. No wonder it takes a long time, and no wonder the code it generates is so good.

    Compilation speed is nice, esp. when developing software, but you can usually get that by turning optimization off. When you're compiling something that will eventually use more CPU cycles than it took to compile, it's ok if the compile takes a long time, as long as the compiler does something useful with that time!
    #define X(x,y) x##y
  • Since Athlons use the Alpha bus, does the recent purchase of the Alpha line of processors by Intel affect AMD at all? (ie. Intel now owns the Athlon core)

    Just wondering if anyone knew.
    ------------
    a funny comment: 1 karma
    an insightful comment: 1 karma
    a good old-fashioned flame: priceless
  • Clock speeds on two different processors mean
    different things, so I would say yes it's fair
    since what they're comparing is top end
    vs. top end, not efficiency per MHz.

    Mac users used to play this silly game all the
    time: PPC is faster per MHz! Yeah sure, but
    if you can't buy them at the same higher
    speeds as Intel CPUs then it doesn't really
    matter. In fact, that efficiency may be why
    they can't manufacture faster clocked processors.

    -Kevin
  • I agree that the limitation with MySQL is probably with I/O, but to answer your query, MySQL is multithreaded, but I've had to really pound on it in order to need more than one CPU on a four headed Sun box.

    -"Zow"

  • If you think that buying a dual whatever board is going to be more cost effective than an Abit BP6 with a couple of cheap Celerons, you are nuts.

    At least that is my opinion.
  • In my experience with high precision RF simulations on massive SGI multiprocessor machines, the OS (IRIX), the compiler (MipsPRO), and the communications package (OpenMP) do very little for you automatically. In order to achieve decent scalability you need to take your computing needs into consideration when you develop the architecture behind your simulation. If you, as the developer, don't distribute your calculations as evenly as possible, your simulation is going to run very, very poorly.

    What the original poster was referring to, if I understand correctly, was distribution of processes or (in his case) web page requests across hardware nodes. This is handled by the OS and/or libraries in the setups that I've seen (admittedly few).

    I definitely agree that the task has to be properly parallelized in the first place, but that wasn't what I was responding to.

    Regarding the internal bus in a SMP system being faster than the communications network in a cluster, that's what I'd thought too. Then my prof got me to run subsets of the SKaMPI and NPB 2.3 benchmark suites on our cluster (MPI libraries on top of SCore running on a bunch of dual-processor Linux boxes). Running on 2nx1 nodes was faster than running on nx2 nodes for almost all of the tests. Caveat: this was for small n, and I can't guarantee that it wasn't just due to lousy process distribution or lousy SMP communication routines in the libraries.

    YMMV.

    For anything with a high communications load, SMP almost certainly wins, but for moderate communications loads, my experiences have made me leery of it.
  • by Christopher Thomas ( 11717 ) on Thursday July 12, 2001 @01:02PM (#88694)
    Has anyone done a cost-efficiency comparison of dual-cpu performance vs. a single cpu when considering the costs involved (special SMP boards, etc.)? In other words, is it more economical to buy two web servers or one SMP server with tons of RAM? Do certain applications (CPU-intensive, obviously) save money with SMP systems versus others that depend on I/O throughput, etc., and what applications are those?

    Any task that is easily parallelized and has low internal communications requirements would run more effectively on multiple servers than on one SMP behemoth. Web serving has zero internal communications requirement, and so falls into this category. Things like ray-tracing have low communications requirements when partitioned properly, which is why you use clusters as render farms instead of massively parallel Big Iron.

    SMP has overhead from coherence operations, and more complex and expensive chipsets.

    SMP benefits tasks that lend themselves to shared-memory implementations. It's a lot easier to toss ownership of memory pages back and forth inside an SMP machine than it would be to send modified pages back and forth across a network. I don't have examples of this kind of task offhand, but I'm sure they exist.

    All of this is for CPU-bound tasks. For I/O bound tasks, you're still better off splitting it up into multiple machines if it's easily parallelized, but again I don't have good examples to illustrate with off the top of my head.

    For more information, pick up a couple of good books on parallel computer architecture and parallel programming. Your local university's bookstore will stock these.
  • by Christopher Thomas ( 11717 ) on Thursday July 12, 2001 @05:10PM (#88695)
    Any task that is easily parallelized and has low internal communications requirements would run more effectively on multiple servers than on one SMP behemoth.

    I have several problems with this generalization. First, parallelizing over multiple servers always adds overhead (in both $$$ and performance) of its own. How are you going to spread a load over multiple web servers? You need a load balancer, either the dedicated (pricey) hardware kind or a standard server converted over to load balancing service (which doesn't get you the greatest speed or scalability in the world). Even in a scientific application that you spread over several boxes, you need some kind of load balancer or traffic cop to get an equitable distribution of work.

    You make a valid point, in that load-balancing is an issue. However, I'm assuming that in the case of a web server, if you have enough traffic to need more than one server, you have enough money to buy a hardware load-balancer to spread out requests (and a hardware firewall, if management has any sense).

    As for scientific applications, you need load balancing regardless of whether the processors running the threads are in one box or in several. This is usually handled transparently by the OS, the compiler, the communications library, or a combination of the above (usually all of the above). This is standard for any high-performance computing project, and so doesn't add to your maintenance overhead. It also doesn't contribute substantially to the processor workload, so I don't see it as much of a concern for scientific workloads.

    Second, let's not forget that two-way (and even four-way, if you were in the Xeon market to begin with) boxes have gotten much cheaper in the past year or two. Most of the important server availability features, like hot swap drives, hot swap power supplies, ECC RAM, 64 bit PCI, etc., are almost impossible to find on 1-way systems these days.

    The last time I checked, n-way systems for n > 2 were still far more expensive than n one-way systems, but I haven't checked within the past couple of months. This might have changed, but I doubt it.

    N = 2 was marginal, if I remember correctly.

    ECC RAM support is available on several single-CPU motherboards; check your favourite vendor's site for a list of options (admittedly pricier than most of the boards, but not horribly so).

    I'm assuming that hot swap power supplies aren't relevant. Your load-balancing hardware or (for a cluster) software will be able to detect malfunctioning nodes; this is essential for any cluster of significant size. A supply failing would be no different from any other component failing from a maintenance point of view (bad node is cut out of the loop by the load balancer, the hardware person gets paged, the node is swapped out and the old node serviced or gutted for parts).

    PCI-64 support is a good point. If you have to support PCI-64, then it probably makes sense to build your cluster out of dual-CPU nodes, because the incremental cost of getting a dual-CPU motherboard will be low. Quad-CPU and higher will probably be less economical (quad cost diamonds the last time I checked). You'd only need PCI-64, though, if you either had a very large communications requirement (multiple very fast network cards per node), or if you were mounting a large RAID on the node (many controllers, many strings). In the first case, I can weasel out by claiming that you're outside of my stated problem domain (low communications bandwidth) :). In the second case, you're looking at one of a handful of disk nodes within a much larger system (in all likelihood). For non-disk nodes, you wouldn't need PCI-64. For clusters that distributed disks over many nodes, your I/O bandwidth needs would be adequately served by PCI-32, and PCI-64 again becomes unnecessary.

    It's nice to get an interesting response, though :). You've made me think about the problem in more detail.
  • Yup! it actually is quite good.

    2 PIII 866 processors - $129.00 ea
    1 ASUS mobo to support above - $129.00

    result? Un-fricking-believeable.....

    Really, the motherboard in itself is massively faster than the regular ASUS mobo. Don't know why, but I get a major speed difference that I can actually feel.

    Now activate the 2nd processor... Woah!
    Frames render almost 2 times faster with povray and BMRT is even faster.

    And this is with el-cheapo 866 processors. The mobo says it will handle 1.2GHz.

    I haven't had this much of a speed-up since I went from 486 to Pentium... it's about time that feeling of "WOW" came back to computing.

    Now, if you are a regular user, it's probably a waste... but the fact that I can actually preview my video clips within 2 hours is awesome! (Now if someone would only port poser to linux I'd be really happy!)
  • It's been less than two years since the Athlon has been released. How impatient are you?
  • Any word on whether or not BeOS R5 runs on SMP Athlons? (no flames, please)
  • Two noisy athlon coolers in my bedroom

    I have two Athlon heaters in mine, which is very unpleasant in July. Trade ya?


    ---
  • Or am I the only one still old enough to remember that?
    damn, slashdot readership must be young. even my 12 year old brother knows that.
  • anyone want to mirror the article? My employer's proxy has blocked the site because of the word "game" in the URL. Yes, they are that lame.

  • > There were a few AMD CPUs that *should* have been recalled.

    IIRC the original runs of the K6, or maybe the K6-2, had a bad problem, and though there wasn't a recall, they would give you an exchange for it if you asked.

    There may have been others, though I don't recall hearing about them. Intel, on the other hand...

    > You'd better stick to the price argument.

    Not at all, though the price would be argument enough by itself.

    --
  • by Black Parrot ( 19622 ) on Thursday July 12, 2001 @12:41PM (#88703)
    > As much as the Athlon people tout their shit as superior, and it took them HOW long to do SMP ?

    What's the ratio of AMD processor recalls : Intel processor recalls ?

    --
  • How the fuck is that flamebait?
  • John Baldwin (@ FreeBSD.org) managed to land himself a dual Athlon board as long ago as April. Apparently it booted 5.0-current first time [freebsd.org].

    Highlights of the dmesg for those who like that sort of thing:

    Copyright (c) 1992-2001 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
    FreeBSD 5.0-SNAP-20010419 #1: Fri Apr 20 14:59:46 PDT 2001
    root@:/usr/src/sys/compile/GUINESS-smp
    CPU: AMD Athlon(tm) Processor (1194.68-MHz 686-class CPU)
    real memory = 1073741824 (1048576K bytes)
    FreeBSD/SMP: Multiprocessor motherboard
    cpu0 (BSP): apic id: 1, version: 0x00040010, at 0xfee00000
    cpu1 (AP): apic id: 0, version: 0x00040010, at 0xfee00000
    io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000

    Whohoo!

    Dave
  • by JohnZed ( 20191 ) on Thursday July 12, 2001 @03:09PM (#88706)
    While you definitely have a good point about applications that lend themselves to multiple-box clusters rather than SMP, I don't think you should make such a blanket statement as:
    Any task that is easily parallelized and has low internal communications requirements would run more effectively on multiple servers than on one SMP behemoth.

    I have several problems with this generalization. First, parallelizing over multiple servers always adds overhead (in both $$$ and performance) of its own. How are you going to spread a load over multiple web servers? You need a load balancer, either the dedicated (pricey) hardware kind or a standard server converted over to load balancing service (which doesn't get you the greatest speed or scalability in the world). Even in a scientific application that you spread over several boxes, you need some kind of load balancer or traffic cop to get an equitable distribution of work.

    Second, let's not forget that two-way (and even four-way, if you were in the Xeon market to begin with) boxes have gotten much cheaper in the past year or two. Most of the important server availability features, like hot swap drives, hot swap power supplies, ECC RAM, 64 bit PCI, etc., are almost impossible to find on 1-way systems these days.

    Finally, there's a huge difference between up-front cost and maintenance costs, with the maintenance usually being more expensive. If you double the amount of rack space you need, double the amount of power you need, and put in the effort to keep both systems perfectly in sync, you'll quickly find that you've blown away that little savings you got at the cash register.
    But, on the other hand, I agree that this business of benchmarking web servers with 8 or 12 CPUs (where things really get into a different pricing league) is a bit silly.

  • How the fuck is that flamebait?

    It's Slashdot, you're going against the pro-Linux orthodoxy... you mean you're surprised?
  • Yes, I know that there are very significant variations which make this an unscientific number.

    However, the kernel I built was a fairly well-provisioned one with plenty of drivers (USB, networking), filesystems, ... so I expect my build shouldn't be unusually fast.

    My experience is that the other stuff one builds often has a relatively minor effect on the total time, so the number is probably not a crazy comparison.

  • by Mendenhall ( 32321 ) on Thursday July 12, 2001 @01:03PM (#88709)
    Not knowing these benchmarks were available, I just spent today compiling 2.4.6-smp for my Dual 500 MHz G4 PowerMac. My complete kernel rebuild time, using 4 jobs, was 3min,10sec, putting it ahead of the Dual-PIII/1GHz but behind the Dual Athlon/1.2GHz. I was very pleased with this speed.
  • It seems pretty strange to me that the reviewer decided to run their benchmarks on a 2.4.2 kernel. AFAIK there have been quite a few patches going into the kernel since dual Athlon hardware actually became available to improve stability/performance. Surely a more recent kernel would've been a better choice?
  • Problem is you're not compiling the same thing. To get a real comparison you would have to compile with their .config file. At a minimum you would have to make sure you are compiling all the same drivers they are.

  • Ah. Thanks. That kind of explains the lack of improvement in the benchmark then for dual CPUs. All it means is that the perl script ran on one CPU and MySQL on the other.
  • by throx ( 42621 ) on Thursday July 12, 2001 @01:07PM (#88713) Homepage
    Quote about freezes: "which could have been caused by either 1) nVidia driver problem (which still has a few known SMP bugs still in the latest version) or 2) the AMD 760MP chipset."

    Or a whole slew of other things like cooling, SMP problems in the IDE driver for the 760, plain bad luck that you got the 760MP both times etc. etc. Without actually nailing this down as to what specifically causes the problem you can only make VERY vague guesses about what the problem is.

    Quote from compiling the kernel: "Here, we can definitely see where AMD's superior FPU and number crunching power come into play."

    When did gcc actually use ANY floating point code? Does this guy actually understand what he's benchmarking? All sorts of effects can slow down a compile, from memory bandwidth to I/O bandwidth as well as CPU speed. It was nice to see the Athlon beat the P4, but what CPU was gcc optimised for when IT was compiled (just curious)?

    Quote from MySQL bench: "A real surprise occurred when the single processors faced off. The Athlon not only soundly beat the P3, but actually also managed to beat the dual Athlon by a little over a minute. This does seem a bit odd because going from a single P3 system to a dual P3 system decreased the time by a good 10 minutes. This could be another example of the maturity of Intel's SMP solution versus AMD's."

    It is more likely that the issue is somewhere in the I/O bandwidth chain. SQL tests tend to stress I/O bandwidth more than anything else - I'd be looking at the drivers before claiming that there are issues with the 760MP. Is MySQL multithreaded anyway so it can take advantage of dual CPUs? Most of the tests seem to show that only the OS is getting any advantage from the dual CPUs.

    Quote from Blender: "It is surprising to note, however, that the Athlon, despite running 500 MHz slower than the P4, still managed to render blacksmith.blend at least a tenth of a second faster."

    No, it's not surprising. Even Intel says that x86 floating point code is slow on the P4. If Blender was rewritten to use SSE-2 instructions rather than x86 FPU instructions then I'd almost guarantee a 50% improvement in P4 scores. I'm not defending the P4 here - just saying that the P4 giving cruddy results is not surprising.

    Kudos to the author for the journalistic integrity to correct his error about NT and SMP. Anyone can be wrong - few journalists ever admit it.

    Anyway - those are my thoughts. Debate them as you will.
  • OK, maybe this isn't so much an argument for dual processor servers. But I'm now absolutely sold on dual processor machines for desktops, and won't go back to singles. Why?

    Every so often a process dies horribly, and camps on one of your processors chewing up cycles like there's no tomorrow. Depending on what sort of development you do this may be a rare experience or a common one. And on a single processor machine, sorting it out can be painful, because everything is responding like thick treacle.

    On a dual processor machine, everything still works just fine, and you go in, identify the messed up process, and kill it.

    When I'm working I normally have

    • Oracle
    • Postgres
    • Apache
    • JServ
    • Tomcat
    • Weblogic
    • XEmacs
    • assorted java compiles

    all going at the same time. My twin processor PII/300 definitely feels a lot more responsive under this sort of load than the Athlon 500 on the next desk. Mind you, the Athlon box was a lot cheaper!

  • You need a load balancer, either the dedicated (pricey) hardware kind or a standard server converted over to load balancing service (which doesn't get you the greatest speed or scalability in the world).

    No you don't, you need BIND (which is free) and multiple A records (which require a clueful network admin, but are otherwise free). We get within 0.2% of perfect load balance with this method on our server farm.

    Cisco LocalDirector is for people with two-tier Windows NT setups who don't know any better.
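
    For anyone who hasn't seen round-robin DNS in action, here is a toy simulation of the idea: the name server rotates the order of the A records on each answer and the client takes the first one. This is a sketch only, assuming every query reaches the server (no resolver caching) and using documentation-range addresses; it is not how BIND is implemented or configured.

```python
from collections import Counter, deque

def round_robin_answers(addresses, n_queries):
    """Yield the first address of each answer, rotating the record set per query."""
    ring = deque(addresses)
    for _ in range(n_queries):
        yield ring[0]    # most clients just use the first A record returned
        ring.rotate(-1)  # the next answer starts with the following record

def load_share(addresses, n_queries):
    """Count how many simulated queries land on each address."""
    return Counter(round_robin_answers(addresses, n_queries))

if __name__ == "__main__":
    farm = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]  # hypothetical web servers
    print(load_share(farm, 9000))
```

    In practice resolver caching and TTLs mean the split is statistical rather than exact, so how close you get depends on your traffic mix.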

  • from the reviews i've seen so far, battery life isn't that superior to x86 machines....
  • given a reasonable amount of RAM, i find 'make -jN' fastest, where 'N' = number of CPUs +1
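
    That rule of thumb is easy to script. A small sketch that only builds the command line (it does not run make); the helper name is made up for illustration:

```python
import os

def make_command(extra_jobs=1):
    """Build a make command line with -j set to CPU count plus extra_jobs."""
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    return ["make", f"-j{cpus + extra_jobs}"]

if __name__ == "__main__":
    # Prints something like "make -j3" on a dual-CPU box.
    print(" ".join(make_command()))
```

    The list form can be handed to `subprocess.run` from inside a kernel source tree if you actually want to start the build.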
  • is it right to compare several systems of *COMPLETELY* differing speeds???

    perhaps a single/pair of 1.4GHz athlons against a single/pair of 1.4GHz intel chips, maybe with a few thousand dollars' worth of xeon thrown in for a reasonable high-end comparison?
  • what use is a SMP box for gaming when there are no SMP games? i know, i know, Quake3Arena runs SMP, but only under WinNT/2K, iirc.

    am i off the mark? are there any games that can/will take advantage of SMP linux?
  • what use is a SMP box for gaming when there are no SMP games? i know, i know, Quake3Arena runs SMP, but only under WinNT/2K, iirc.

    Well, seeing as Win9x doesn't support SMP for anything that's not too surprising. The real problem with SMP support in Quake3 is that it relies on the graphics card drivers having SMP support. Most don't - NVidia are supposed to, but I've heard it's buggy

    That doesn't mean a SMP box is useless as a gaming machine though. I bought an ABit BP6 (dual socket 370 motherboard - stuck two celeron 333s in it and overclocked them to 550). Mainly because it was an excellent deal at the time. People have said to me before "there's no point you having a SMP box for a desktop/gaming machine". From my experience of using that machine I disagree. It was generally a lot more responsive than single CPU systems, and when I had a very CPU intensive task like a game running I could happily switch to another task and actually use my machine! Nowadays you can buy 1.5GHz chips which are so much more powerful than my dual celerons, but at the time it rocked. Now it makes an excellent desktop machine for things other than the latest games, or a great little server.

  • Be very careful trying to compare compilation times of two different archs. To really do this comparison right, you need to put together cross compilers for each arch. Run the x86 cross compiler on the G4s and the PPC cross compiler on the Athlons. Then compare the numbers and be prepared for a shock. The PPC cross compiler running on the Athlons is likely to smoke the G4s (as truly cool as those machines are ... drool drool) terribly. The integer and logic speed of the Athlons with the DDR memory is mesmerizing.
  • So what you're saying is that no one should buy a P4? I say I must agree. The brilliance of the Athlon arch is that it runs legacy compiled code faster than Intel processors. To get the architectural speed improvements of the P4, you have to recompile your source code. Are you going to recompile every bit of software you run? So the conclusion is that no one should buy a P4 unless they have an entire distribution compiled for it. The Q3 tests came out smelling like a rose on the P4 because they have a lot of SSE code in them so they will run well on P3s, and at 1.7GHz, there is no way the P4 would get beat by the Athlon running their own instructions at 500MHz faster clock rate.
  • I think hypertransport is intended as a PCI/AGP replacement - not for connecting the CPU and north bridge (memory controller) - of course pushing the memory controller onto the CPU probably isn't far off so hypertransport may be the only bus floating around the CPU after that (which of course will make SMP architecturally more difficult)
  • Heh.. if you're going to read stuff from flat files, then you'll probably never hit the CPU limit before disk. If you had a real database the story might be a little different. That is, I'm guessing you don't do many calculations on the data considering the lack of (easy to use) triggers / functions, subselects and other wonderful database tasks that push the CPU. In your case, push the frontend -- as that's where all the work goes on to implement the above features. Then again, perhaps you're just running a forum-based website and as such don't require a database -- but use MySQL for ease in storage / retrieval of information rather than integrity / reporting.
  • "Since all of the hardware and software, save the motherboards and processors, were the same on each of our systems, it should be safe to conclude that the differences in performance are primarily CPU related."

    Well, I saw two different harddrives on the list, one was 15000 RPM and the other was 10000 RPM, that could have an effect on performance.
  • MySQL spawns another child process for each connection to the database. If you use one mysql_connect() in your code, then you'll be communicating with only one mysqld.

  • Who needs a dual board? No dual Celeron BP6 config can hold up to today's GHz uniprocessor systems, and most of them are a lot more stable than the BP6, especially that temperamental HPT366 controller.

    --

  • You're getting a few things confused:

    1. The BH6 was a uniprocessor motherboard, and I oughtta know, since I had one and used it for Celeron overclocking until it died a few weeks ago. The CPU (Cel. 366A) lives on in a different BX board. The BP6 was the Abit board that could do dual PPGA Celerons, but it can't handle more than one FC-PGA processor of any kind.

    2. There was never a PII 600. PII's maxed out at 450, then Intel moved over to the Katmai P3's, which were PII's in every respect, discounting the SSE extensions. Katmai effectively ran up to 550, although there were a handful of 600's, which were really just overclocked 550's that Intel shoved out the door as a preemptive strike against the Athlon. AMD countered with a 650 at launch time, and thus began The Chip Wars...

    Don't get me wrong -- I have a dual P3-800 system and I love it. It's good at the kind of stuff that I like to do at my workstation. Beside it, I have a 1GHz Athlon, which is good in its own right for playing games (and running that icky Win98), and beside that is the old Celeron I mentioned, to serve them both with files, DNS, and all that other crap.

    --

  • make your numbers 100% useless for any kind of comparison

    Well, it does tell us how long it takes him to build a "standard" kernel on a PPC machine vs. on an Intel machine. So it's a TINY bit useful (most benchmarks are 97% useless, while this is 99%).

    Especially with software that you have source for (and that has a gazillion compilation options) almost ALL benchmarks have very little relation to the real world use of a machine. It is entirely likely that with just a compile or config option or two you could DOUBLE (or more) the speed of an app like MySQL or GCC. And these options usually have different results for different platforms / processors / configurations / test scripts, so you have to decide if you want to use the same set of options for both (more "scientific"), or use whichever works best on each setup (more "real world").

  • and what applications are those? I'm really interested in knowing with better evidence than "well, I think..."

    Unfortunately, "I think" you'll just get vague estimates because your question is not sufficiently specific to provide the kind of answer that you demand.

    That is, exactly what application will you run and under exactly what kinds of circumstances?

    The hardware performance depends on so many variables (CPU speed, cache hits, main memory size, main memory BW and main memory latency, network card, disk controller, disks) that different applications will be limited by different parts of the hardware.

    Worse, the same application could be limited by different parts of the hardware depending on the tasks demanded of it.

    As a friend of mine used to say to difficult questions like this

    "The answer is four."
  • The sad thing is, someone probably will mod you up as funny. A mod you definitely do not deserve for such stupid blatant idiocy

    Yawn. I've got *plenty* of Karma to burn.

  • I agree! It actually was pretty funny. Mod it up!
  • by zpengo ( 99887 ) on Thursday July 12, 2001 @12:29PM (#88733) Homepage
    Something something Beowulf something something Quake something something cluster something.

    Okay, mod me up as "insightful."

  • Two important advantages of SMPs:
    • CPU load balancing: If you're using many servers instead of one, you need to make sure that your load is spread evenly across the systems. For some applications, load imbalance can significantly reduce efficiency. An SMP can dynamically balance between many tasks, efficiently utilizing the processors.
    • Sharing other resources: SMPs allow the jobs to dynamically share other (non-processor) resources such as RAM, swap, disk I/O bandwidth, etc. For example, a web server can share a large file cache between all the server processes, making efficient use of the DRAM.
    One of my favorite uses of smaller SMPs is parallel compilation via make -j. You can launch 2n jobs on n processors and get good speedup. This does not work as well on a cluster setup. You can do it, it is just a bit of a pain.

    Of course this all depends on the cost premium of an SMP over uniprocessors. This is more a market effect of economies of scale than a fundamental issue with SMPs. If everyone bought SMPs, they would have at worst a small cost premium. SMPs could even be _cheaper_ (per processor) since you can share the DRAM, case, power supply, keyboard, video card, and maybe even monitor (unless you run headless servers).

  • My tests have shown that N = number of CPU's is faster. Any higher number actually slows down compile times. I did the test with a simple benchmark script that I wrote standardized on Red Hat 7.1.

    http://www.amdmb.com/lkc/ [amdmb.com]

    Test Results in Minutes
    Abit VP6 dual CPU Pentium III 1GHz machine.
    J2 5.55
    J3 7.15
    J4 7.2667
    J5 7.45
    J6 7.45

    Asus A7V single CPU Athlon 750MHz

    J1 10.26
    J2 10.93
    J3 13.61
    J4 13.71
    J5 14.3
    J6 13.88
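To make the trend in those numbers easier to read, here is a small Python sketch that expresses each run as a percentage slower than the best run on the same machine. The times are copied from the results above; the function name `slowdown_vs_best` is made up for this illustration and is not part of the benchmark script linked in the post.

```python
# Build times in minutes, copied from the results above, keyed by the -j value.
dual_p3_1ghz = {2: 5.55, 3: 7.15, 4: 7.2667, 5: 7.45, 6: 7.45}
athlon_750 = {1: 10.26, 2: 10.93, 3: 13.61, 4: 13.71, 5: 14.3, 6: 13.88}


def slowdown_vs_best(times):
    """Return (best_j, {j: percent slower than the fastest run})."""
    best_j = min(times, key=times.get)
    best = times[best_j]
    percent = {j: round(100.0 * (t / best - 1.0), 1) for j, t in times.items()}
    return best_j, percent


for name, times in (("dual PIII 1GHz", dual_p3_1ghz), ("Athlon 750", athlon_750)):
    best_j, percent = slowdown_vs_best(times)
    print(f"{name}: fastest at -j{best_j}")
    for j in sorted(percent):
        print(f"  -j{j}: {times[j]:.2f} min ({percent[j]:+.1f}%)")
```

On both machines the fastest run is the one where the job count equals the CPU count, which is the claim the post makes.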

  • I don't know about anyone else, but I'm planning to hold off on buying new hardware til the Athlon 4 desktop version hits the market. How many of these new Athlon-MP mobo's are gonna be compatible with it? I've seen way too often a new CPU will come out and nothing will support it for a month after that.
  • i posted the following to comp.os.linux.hardware a few weeks back. the numbers are lower, reasons being: beta tyan board, mem speed is only 119mhz for some reason (not 133), producing 1071 mhz cpu speed, and the demo for q3, which is older than the one used in this test.


    enjoy.

    the following hardware is identical under both conditions: visiontek geforce3. the quake3 free demo was used. for your enlightenment, i will post my Quake3 demo benchmarks under linux on the following boards.

    the first is the Tyan S5250, an 840 board, with dual p3-1ghz, and 1gig rambus ram. 2.4.5 kernel, XFree86-4.1.0, nvidia binary drivers 1.0-1251. agpgart was not used, as it was slower than nvidia's nvagp support using agp4x, probably because the 840 isn't that well supported under agpgart. using hi-quality settings under q3 and doing demo001, i got 119 fps.

    the second is a Tyan S2462, a 760mp dual athlon board with dual 1066mhz chips (they are 1.33ghz chips, but posted at 1066 for some reason), with 256 meg ddr sdram running at 266. same 2.4.5 kernel, XFree86-4.1.0, nvidia binary drivers 1.0-1251. once again agpgart was not used, as it complained that it was an unsupported chipset. doing agp_try_unsupported=1 (or whatever it was, can't remember, but it was the right thing) didn't work either. using nvagp listed it as a generic amd chip with support for agp4x. i edited the nv-registry.c (or whichever one it was) to include sba support, which only took on this chipset. using hi-quality settings and doing demo001, i got 133 fps.

    both of these seemed low to me. possible reasons are poor support in nvagp for the 840 (a server chipset, not exactly a gaming platform) and the 760mp, which is really new. also, the older version of q3 which is in the demo probably doesn't have as good support for dual cpu's. also, agpgart has no support for the 760mp, so i couldn't even test it.

    nevertheless, the clear winner is the 760mp system. it had only 6% more cpu speed, less ram, and slower memory, and it is 12% faster in q3. not too shabby.
    --
    Patrick Paul
    Microway.com
    patrick@no_microway_spam.com
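The "6% faster in cpu speed" and "12% faster in q3" figures can be checked with a few lines of Python. This is only a sketch: it assumes the Athlons ran at the 1066 MHz quoted in the body of the post (the intro mentions 1071 MHz instead) and the PIIIs at a nominal 1000 MHz. The helper name `percent_faster` is hypothetical.

```python
# Figures taken from the post above; the clock speeds are nominal values.
p3_fps, athlon_fps = 119.0, 133.0
p3_mhz, athlon_mhz = 1000.0, 1066.0  # assumption: 1066 MHz, as stated in the post body


def percent_faster(new, old):
    """How much larger `new` is than `old`, in percent."""
    return 100.0 * (new / old - 1.0)


fps_gain = percent_faster(athlon_fps, p3_fps)
clock_gain = percent_faster(athlon_mhz, p3_mhz)

print(f"clock speed advantage: {clock_gain:.1f}%")
print(f"frame rate advantage:  {fps_gain:.1f}%")
print(f"fps per MHz: PIII {p3_fps / p3_mhz:.4f}, Athlon {athlon_fps / athlon_mhz:.4f}")
```

Both rounded figures in the post hold up under these assumptions, and the Athlon system still comes out ahead per clock.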


  • The main point of Crusoe is that it isn't a hardware-only solution, and that's nothing you can bash it for.
  • SSE is for floating point number crunching. Neither a compiler nor a database uses floating point mainly, but it did well in the Q3 bench, which is both floating point and SSE optimized.
  • The P4 is doing well on classic benchmarks like Q3, SPECviewperf(some of it), 3DMark and Sysmark.
    Intel even optimized FlaskMPEG just to make it score higher when it was benched on tomshardware.

    Too bad this doesn't show in other real-world situations.

  • Yeah, 4-way was definitely around back in the PPro days. But I still haven't seen that dual P4 machine make its appearance yet either.
  • Not knowing these benchmarks were available, I just spent today compiling 2.4.6-smp for my Dual 500 MHz G4 PowerMac. My complete kernel rebuild time, using 4 jobs, was 3min,10sec, putting it ahead of the Dual-PIII/1GHz but behind the Dual Athlon/1.2GHz. I was very pleased with this speed.

    Your point of comparison is pretty much worthless. The point of benchmarking is you do the same task on different hardware. You compiled a different version of the kernel. You couldn't have possibly had even remotely similar options because you didn't compile the arch/i386 directory and did compile the arch/ppc directory. For the architecture-independent options, you don't know what they compiled in (I didn't see it on their page). Their numbers are only useful relative to each other. I could take Linux 2.4.6 and compile it with the bare minimum of drivers and filesystems and finish in less time than any of their machines. It wouldn't mean anything.

  • by malfunct ( 120790 ) on Thursday July 12, 2001 @01:30PM (#88743) Homepage
    I've seen benchmarks of the T-bird chips done in the Tyan board with the 760MP. As far as I've ever read they are pin compatible. I think this is a case of "it will work but we don't support it". There are also some huge benefits of the MP chips vs the T-bird chips when it comes to multiprocessing because of the cache improvements in the MP chips.
  • "how long before we see 64 and 128 way athlon boxes"

    Err, now. That is, if you count a Beowulf cluster in a rack: 64-way, a snip at $70720.

    Medway Dual Athlon Cluster [microway.com]

  • There are still some Slot A Thunderbird 1GHz Athlons, which would be your simplest upgrade.

    But with everything so cheap right now, why not get a new MB and an Athlon 1.4GHz? Or maybe wait for a 1.5GHz Palomino on the nForce 420 MB (which should rock).

    AMD are committed to Socket A well into 2003 (with "Barton" Athlons on a .13 micron copper SOI process), so a new MB shouldn't be too much of a dead end.

  • No, AMD's license for the Alpha bus can't be revoked.

    An analyst asked the same question at the AMD earnings conference call [amd.com] and Jerry Sanders gave a firm reply that the bus license is solid. BTW the CC is worth a listen just to hear Jerry slagging off the P4:

    "A: I think the P4 is a dud. The P4 is a lousy product and they have to price it cheap and made a lot of noise that they wouldn't give up any market share in a marketplace that wants lower cost solutions. AMD is in a very good position with the Duron to do that, with the Athlon to do that. Pentium 4 is a loser. Intel is spending tremendous amount of money in 130nm so that they can be marginally competitive."

  • I would love to see some stats that are more realistic of a server environment. I don't know about MySQL, but I know Postgres spawns more than one thread for additional queries when needed. Looks to me from the numbers that MySQL is only doing one thing at a time for the benchmark.

    I know my dual PIII 700 kicks ass when two big queries are going on at once, as long as they are indexed well. I have a hard time believing a single processor would still beat the dual in any useful DB test. (How many DBs really only perform one query at a time?) Two MySQL benchmarks run at the same time, and then 4, would be much more interesting to me.

    Correct me if I'm wrong about how the MySQL benchmark works.

    -Pete
  • The reason the kernel compile didn't gain from > 2 CPUs is that the disk becomes a bottleneck. The proper way to compile a kernel on a multi-CPU machine:
    1) change the makefile to run gcc with '-pipe'. Read the man page to see why.
    2) set MAKE=make -jN, where N = number of CPUs
    3) either put the source in a ramdisk or run it on a fast striped raid system.
    4) run make -jN (yes, both the environ and the arg)

    TaDa! Much faster!
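Steps 2 and 4 above can be scripted. The Python sketch below only builds the environment and the command line; it assumes N is the CPU count reported by `os.cpu_count()`, and the function name `kernel_build_plan` is invented for this example. Editing the makefile for '-pipe' (step 1) and moving the source to a ramdisk or RAID (step 3) are left as manual steps.

```python
import os


def kernel_build_plan(cpus=None):
    """Return (command, environment) for `make -jN` with MAKE="make -jN"."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    jobs = f"-j{cpus}"
    env = dict(os.environ, MAKE=f"make {jobs}")  # step 2: the environment
    return ["make", jobs], env                   # step 4: the argument


cmd, env = kernel_build_plan(cpus=2)
print("MAKE=" + repr(env["MAKE"]), " ".join(cmd))

# To actually start the build from inside a kernel source tree:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

The build itself is left commented out so the snippet can be run anywhere without a kernel tree.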
  • In contrast to what most others say (buy cheap CPUs for SMP), the inverse is true. If you decide to go SMP, you have to take the fastest CPUs available to be "cost effective".

    I'll explain :

    Imagine that you go for the cheaper ones at half the speed of the fastest one. Your performance will be 25%-40% slower than the fastest single cpu box, and still more expensive because you had to pay extra for the motherboard.

    Strangely enough, this is true for Intel CPUs and not for AMD CPUs. AMD's slower CPUs are far cheaper and may turn out more profitable.
    But for Intel, the only CPU with which your SMP can compete is a not yet existing one (because availability wins from price) which would theoretically be 1.7 times faster than the CPU speed in your SMP.

    It's strange but it's true.

    Buy AMD.
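The arithmetic behind the "25%-40% slower" figure can be sketched in Python. Two things here are assumptions, not statements from the post: that a second CPU multiplies throughput by a "scaling factor" somewhere between 1.2 and 1.5 (the range that reproduces 25%-40%), and that the 1.7 mentioned above can be read as the same kind of factor. The function name `smp_relative_speed` is hypothetical.

```python
def smp_relative_speed(cpu_speed, scaling):
    """Throughput of a dual box relative to one CPU of speed 1.0.

    cpu_speed: speed of each SMP CPU relative to the fastest single chip.
    scaling:   how much the second CPU multiplies throughput (2.0 = perfect).
    """
    return cpu_speed * scaling


# Two half-speed CPUs under different (assumed) SMP scaling factors.
for scaling in (1.2, 1.5, 1.7):
    rel = smp_relative_speed(0.5, scaling)
    print(f"scaling {scaling}: {rel:.2f}x the fastest single CPU "
          f"({100 * (1 - rel):.0f}% slower)")
```

Under these assumptions a dual box built from half-speed chips never catches the fastest uniprocessor, which is the point the post is making.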

  • by Doomdark ( 136619 ) on Thursday July 12, 2001 @02:03PM (#88750) Homepage Journal
    Although it may be that 2 x Single-Processor system in many cases is better than 1 x Dual-processor, there are some benefits from having a SMP system:
    • You can share most other components; maintainability is better (one monitor, as many HDs as you need, one case, one motherboard even though it's more expensive etc.). And even though it's kind of a "single point of failure", it's no different really from having more systems, if they all have to be up for the service to be available (ie. no redundant backup systems)
    • For closely-coupled processes SMP is faster than UPs talking via Ethernet; most web-servers talk to databases, and direct communication is more efficient than talking via 100mb (or even gig) Ether. And DBs generally scale quite nicely to multiple SMP systems.
    • Bit irrelevant here, but I certainly enjoyed double-PII work station I had few years back... interactive response _is_ much better (on Linux too)
    So... it's more convenient IMO to have a dual/quadruple system than N single-CPU systems.

    Most important, though, is what everyone and their donkey has said; it all depends on what you plan to do with your system.

  • Not knowing these benchmarks were available, I just spent today compiling 2.4.6-smp for my Dual 500 MHz G4 PowerMac. My complete kernel rebuild time, using 4 jobs, was 3min,10sec, putting it ahead of the Dual-PIII/1GHz but behind the Dual Athlon/1.2GHz. I was very pleased with this speed.
    Short day today, wasn't it?
  • Crusoe was meant for people like me who consider getting basic work done adequately a higher priority than blowing the doors off the guy down the street or rendering so sharp you can see Lara Croft's pantyline and erect nipples.

    It's a good chip for what it does (and I had this idea the other day that I'd love to see it do a PDP-11 just so I could run the copy of Unix V6 I downloaded a year or so ago). But it just doesn't fit into this company.

    /Brian
  • I was under the impression that the Athlon4 was a Palomino with SSE (not SSE2) instructions, or is it a case of me needing to LART myself?


    How every version of MICROS~1 Windows(TM) comes to exist.
  • As much as the Athlon people tout their shit as superior, and it took them HOW long to do SMP ?

    Just because SMP hasn't been a business priority for AMD doesn't mean that they didn't have the technical competence to do so years ago, nor does it have any relevance to the quality of their chips.
  • by tshak ( 173364 ) on Thursday July 12, 2001 @12:31PM (#88755) Homepage
    Because TransMeta doesn't make processors to compete on speed, rather battery life and portability.
  • by Freddy_K ( 174281 ) on Thursday July 12, 2001 @05:37PM (#88756) Homepage
    In response to: [gamepc.com]
    "Unfortunately, after doing a some testing and analyzing the results, it appears that SMP Quake3 under linux isn't running at 100%. And when I say that, I mean it doesn't run at all. After trying to enable it with a "r_smp 1" command and a restart, I noticed this error message in the console log: "Trying SMP acceleration... failed". Not good. So, off to Google Groups I go to see if anyone else has had any success. After browsing through what seemed like hundreds of message board posts and pages, we were not able to find anyone who had this working successfully If someone knows how to get this working, we'd love to hear about it!"
    I emailed TTimo [mailto] at id [idsoftware.com] about it and here's what he had to say:
    You were not able to turn on SMP in Quake III Arena linux .. simply because it is not available yet. Id has never released a linux binary of Quake III Arena with SMP support. That's why you get the "trying SMP acceleration .. failed" message. We have in-house binaries though, and it's on the TODO list ... "when it's done"


    TTimo

    --

    Linux Quake III Arena / Quake III: Team Arena

    Id software

  • Has anyone done a cost-efficiency comparison of dual-cpu performance vs. a simple cpu when considering the costs involved (special SMP boards, etc.) In other words is it more economical to buy two web servers or one smp server with tons of ram?

    As others here mentioned, the choice between SMP and multiple boxes depends a lot on your current bottlenecks and the amount of communication between tasks.

    For instance, if your web server is memory speed or I/O bound, and updates are managed by a central authority (like an SQL server), go for a pair of servers. That doubles your system bus and disk subsystem capacity over a single SMP system.

    On the other hand, if your server is CPU bound and/or requires fast communication between tasks (like updates to on-disk data), go for the SMP. Sure, the board is more expensive, but you only need one (same goes for drives, cases, network cards, etc).

    Web servers generally fall into the I/O bound category, so two servers is probably your better bet unless you handle quite a bit of dynamic content.

  • "The answer is four."
    Err, actually no. The answer is forty-two. Or am I the only one still old enough to remember that?
  • by LoudMusic ( 199347 ) on Thursday July 12, 2001 @12:38PM (#88759)
    A mirror [stoneward.com]

    ~LoudMusic

  • Hypertransport is a point to point communication protocol for connecting chips, nothing more and nothing less. It's not intended as a PCI replacement as it is not a bus. Replacing AGP with Hypertransport is an option, though I haven't heard anyone proposing that.

    As for how AMD is planning on using it, first off it will be used for connecting north and south bridges together. nVidia's new nForce chipset for the AMD Athlon (expected in shipping products in August or thereabouts) already makes use of Hypertransport for this. The next step is the really important one though. AMD's Hammer series of processors, scheduled for release this time next year, will make use of Hypertransport for communicating with external chipsets. This will not be quite like EV6 in the Athlon and Alpha now, mainly because the Hammer chips will have integrated memory controllers all on their own. Hypertransport will be used to talk to a sort of companion chip for all I/O other than memory stuff. Just how well it will work at this point is anybody's guess, though the integrated memory controller at least seems like a good idea and is likely the way that most CPUs will go eventually.

  • I believe AMD will be transitioning to their own hypertransport [amd.com] bus which Sun is also planning on using. If anything this will only accelerate the transition.

    obtw- Intel does not own the Athlon core, only the Alpha EV6 bus it runs on.
  • by ageitgey ( 216346 ) on Thursday July 12, 2001 @12:37PM (#88762) Homepage
    Has anyone done a cost-efficiency comparison of dual-cpu performance vs. a simple cpu when considering the costs involved (special SMP boards, etc.) In other words is it more economical to buy two web servers or one smp server with tons of ram? Do certain applications (cpu intensive obviously) save money with SMP systems versus others that depend on IO throughput, etc and what applications are those? I'm really interested in knowing with better evidence than "well, I think..."

  • No flame intended, but I'm just not impressed by a review for a multi processor platform unless it's from some concern evaluating it as a serious engineering or server platform. I'd be the last person to buy something like this to play games on.

    --
    All your .sig are belong to us!

  • by CtrlPhreak ( 226872 ) on Thursday July 12, 2001 @12:37PM (#88764) Homepage
    The Athlon MP is actually the same core (Palomino) as the upcoming Athlon 4; the only difference is that the Athlon MP has been 'certified' by AMD to run in SMP configurations. There is no change in the socket or connections: all AMD Socket A processors are compatible with the AMD 760(MP) based boards, and the boards will be compatible with the Athlon 4.
  • they license the bus technology, and that's for the main-board technology, which has nothing to do with the core
  • of course it will be cost effective, you are taking the cheaper way out. However, some people actually _need_ dual 1.2's in their system, for whatever reason....
  • by ocbwilg ( 259828 ) on Thursday July 12, 2001 @02:49PM (#88767)
    obtw- Intel does not own the Athlon core, only the Alpha EV6 bus it runs on.

    Which AMD already licensed from Compaq/Digital, so they should have rights to it for as long as they need them. This is further encouraged by statements from Compaq officers that the sale of the Alpha designs to Intel was not an exclusive deal, so it would seem that Compaq/Digital could continue to license the technology to additional companies.

    Say "NO!" to tax money for religious groups. [thedaythatcounts.org]
  • actually, the 2.4.3 kernel from redhat has been available for a while now as an update (RPMS). That's why i wonder why the GamePC guy didn't use this kernel, which has a lot of improvements. Moreover, i had several problems with my Duron and that 2.4.2, which, after investigating a bit, made me think the AMD architecture wasn't very well supported by the pre-compiled RH 2.4.2 kernel. I simply recompiled it, de-activating the MTRR stuff and so on, and everything worked very well afterwards.
    I also think choosing RH to test AMD processors is far from fair, because they usually focus on Intel platforms when pre-compiling their ugly kernel.
    Lastly, the 2.4 kernel series hasn't been heaven till 2.4.6 (remember the 2.4.5 ReiserFS bug?), so why bother testing on 2.4.2? I don't know much about SMP support in 2.2.x, is it that bad?
  • True, but at the time they came out there were no 1GHz x86 chips around, and a couple of Celeron 300A's (effectively rebadged PII 450's) on a BH6 was immensely cheaper than a PII 600 uniprocessor system. Now high end processors are a lot cheaper, so it comes down to how you actually run stuff on the box as to whether a dual is worth it - is it really worth shelling out the extra $$ to be able to still run other stuff when netscape goes weird and grabs all of one CPU?
  • Doh!
    1. The BH6 ... BP6
    Just consider it a typo, you know the board I mean, the dual processor one.
    2. There never was a PII 600 ...
    True (pity, it would be cheap by now), but that's about the speed the two-Celeron combination runs at for doing stuff like encoding mp3's, running four copies of the encoding program at once. For other stuff relative speed varies all over the place. Most apps run on only one processor, but if X takes one and a 3D graphics intensive game takes the other then you are ahead of a uniprocessor system of the same clock speed. Whether I put faster chips in the thing or get a new uniprocessor motherboard and CPU will come down to price/performance, which is currently skewed towards AMD chips in a big way.
  • As much as the Athlon people tout their shit as superior, and it took them HOW long to do SMP ?

    Well, don't forget that before the Athlon less than two years ago, AMD was always playing catch-up with Intel; Intel was always the performance leader. Now that AMD has arguably grabbed the performance crown, it makes sense for them to put in the support for SMP.
    --
    Convictions are more dangerous enemies of truth than lies.
  • The difference is so variable and application based that it's hard to say which is better. The Multics system, designed from the ground up for SMP, achieved the following performance figures for an SMP system (unofficial tests reported in alt.os.multics):


    1 processor 1 figure-of-merit
    2 processors 1.8 FOMs
    3 processors 2.5 FOMs
    4 processors 3.1 FOMs
    5 processors 3.6 FOMs
    6 processors 4.0 FOMs

    An ISP I ran used a dual P166 system for the news server. The dual P166 was later replaced with an AMD 300 single CPU MB. The 300 was noticeably faster than the dual P166. Best guess is the dual P166 was about the same as a single P250 (about 1.5x a single P166). This was with a Linux 1.x kernel. The disk and RAM were the same.
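For a quick look at how those Multics figures scale, the Python sketch below computes the per-processor efficiency and the gain from each added processor. The figure-of-merit values are copied from the list above; the function names are made up for this illustration.

```python
# Figure-of-merit per processor count, copied from the list above.
foms = {1: 1.0, 2: 1.8, 3: 2.5, 4: 3.1, 5: 3.6, 6: 4.0}


def efficiency(table):
    """Fraction of ideal linear speedup achieved at each processor count."""
    base = table[1]
    return {n: fom / (n * base) for n, fom in table.items()}


def marginal_gain(table):
    """FOMs gained by adding the n-th processor."""
    counts = sorted(table)
    return {n: table[n] - table[prev] for prev, n in zip(counts, counts[1:])}


eff = efficiency(foms)
gain = marginal_gain(foms)
for n in sorted(foms):
    extra = f", +{gain[n]:.1f} from CPU #{n}" if n in gain else ""
    print(f"{n} CPUs: {foms[n]:.1f} FOMs, {100 * eff[n]:.0f}% efficient{extra}")
```

Each added processor contributes a little less than the previous one, so the sixth CPU adds well under half of what the first delivers.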

  • IIRC CTSS was not really SMP. I think it was more a Master/Slave situation. The original GECOS (GE) later GCOS (Honeywell) was also Master/Slave. One processor did all the I/O, etc. so under heavy I/O load, it would get saturated and slow down the system.

    I haven't seen any information about TOPS-10 so I don't know if it was true SMP, Master/Slave, or other. Multics was SMP. From what I remember, it was one of the most efficient SMP systems made as of the last time I had information, maybe 20 or so years ago.

  • Go buy a Silverado [noisecontrol.de] and improve your Quality-of-Sleep.

  • The Athlon4 was the name given to the new mobile Athlon chips based on the Palomino core. The AthlonMP was the name given to the workstation/server chip based on the Palomino core. The Athlon4 is also the name given to the upcoming desktop chip based on the Palomino core. The ONLY difference between all 3 types of chips is the speed. The mobile was started in the high hundreds of MHz. The MPs start at 1GHz and the Athlon4 will start at 1.53GHz. To clarify, you can use regular Thunderbirds in the MP dual motherboards, and you will be able to run both the upcoming Athlon4 chips AND the Thoroughbred chips in the MP motherboard, since they are all exactly the same, just released at different times at different speeds for the different platforms. All compatible, all fast, all good. :) Hope this clears things up. Take Care, Paul
