

AMD Hardware

A Three-Way AMD Opteron Server 137

Abdul tips a thin little review up at The Inquirer of the Themis Slice. "The Slice is a three socket Opteron machine with two PCIe slots and two Infiniband 4x ports... Why would you want three sockets rather than four? Easy, latency. Any CPU in a 3S system is one hop away from any other CPU. In a 4S system, you can be two hops away. This adds latency, and more importantly, you take a big hit on cache coherency latency. This kills performance."
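The hop-count argument in the summary can be checked with a small sketch. The two topologies below are assumptions for illustration (a triangle for three sockets, a square ring for four, with each CPU spending two links on its neighbours); the article does not give the actual board wiring, and `max_hops` is a hypothetical helper name.

```python
from collections import deque

def max_hops(links, nodes):
    """Return the largest shortest-path hop count between any two nodes.

    Assumes the graph described by `links` is connected.
    """
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    worst = 0
    for start in nodes:
        # Breadth-first search gives hop counts from `start` to every node.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in adjacency[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst

# Assumed topologies, not taken from the article.
three_socket = [(0, 1), (1, 2), (0, 2)]         # triangle
four_socket = [(0, 1), (1, 2), (2, 3), (3, 0)]  # square ring

print(max_hops(three_socket, range(3)))  # 1
print(max_hops(four_socket, range(4)))   # 2
```

Under these assumptions the triangle never needs more than one hop, while opposite corners of the ring are two hops apart.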
This discussion has been archived. No new comments can be posted.

  • That is one weird looking board. "you take a big hit on cache coherency latency" Isn't this only a problem with NUMA-based systems (of which Opteron is one)? The article also mentions UltraSparc and PowerPC-64....
    • Re:Weird (Score:5, Informative)

      by Anonymous Coward on Monday August 13, 2007 @02:54PM (#20215165)
      This is also a problem on FSB systems, as all CPUs need to snoop the bus for cache coherency information. On Intel's dual-bus systems, this information needs to go across buses. The Intel 4 FSB systems are even worse. AFAIK, Opteron is the only x86 chip that would support 6 cores (12 cores with Barcelona) with a single hop.
      • by LuSiDe ( 755770 )
        It's not my field, but AFAIK SGI solved this in MIPS with a 'cross' architecture. Imagine it like an X.
    • by Kreff ( 1142157 )
      I think you're right.
    • Couldn't you build a 4-way, 8-way, or N-way board where there isn't such a latency problem? Where each processor is connected to each other processor? Sure, the circuit design would be pretty complex, but if it is such a speed increase that 3 processors give you more power than 4, then it might be worth it. It might be very difficult with 16+ processors, but 4 shouldn't be that difficult. If it is impossible, please explain why.
      • Re:Weird (Score:5, Informative)

        by TheRaven64 ( 641858 ) on Monday August 13, 2007 @03:13PM (#20215377) Journal
        Yes, it's possible. The main problem in general is that cost scales roughly with the square of the number of nodes: a full mesh of N nodes needs N(N-1)/2 links. The main problem in the specific case of Opterons is that each chip needs one HyperTransport controller per other CPU. Current Opterons come with up to three HT connections, and you need one for connecting to the PCIe bus and other peripherals, leaving two for CPU-to-CPU connections.
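The link budget described in this comment can be sketched as simple arithmetic. The function names are hypothetical, and the sketch assumes every CPU reserves one HT link for I/O, which is a simplification of what the comment describes.

```python
def full_mesh_links(sockets):
    """Total point-to-point links needed to connect every socket to every other."""
    return sockets * (sockets - 1) // 2

def max_full_mesh_sockets(links_per_cpu, io_links=1):
    """Largest socket count where each CPU can still reach every peer directly.

    Assumption: each CPU keeps `io_links` links for PCIe and peripherals,
    and every remaining link goes to a distinct peer CPU.
    """
    peer_links = links_per_cpu - io_links
    return peer_links + 1  # the CPU itself plus one peer per remaining link

for n in (3, 4, 8):
    print(n, "sockets need", full_mesh_links(n), "links")

print(max_full_mesh_sockets(3))  # three HT links, one used for I/O
print(max_full_mesh_sockets(4))  # a hypothetical part with four HT links
```

With three links and one reserved for I/O, the sketch gives three sockets as the largest single-hop configuration, which matches the reasoning in the comment.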
        • Re: (Score:2, Interesting)

          by poopdeville ( 841677 )
          I was under the impression that this latency issue was caused by the fact that there is no positive solution to the utility problem [wolfram.com]. Essentially, each core is connected directly to the other two, in a planar graph. There's no way to connect each of 4 cores to the other three without the connections intersecting, at least if the connections are made on anything topologically the same as a convex subset of the plane (that is, no planar graph exists).

          This can be solved directly by creating chips with multi
          • Re: (Score:3, Informative)

            by TheRaven64 ( 641858 )
            Not really, because modern circuit boards are not planes. A modern motherboard is typically 7 layers, with wires in one layer all running parallel to each other. Within a die the utility problem is much more of an issue, but this is largely due to constraints other than those under discussion.
            • Structured ASICs typically have several (7-9 being common) metal routing layers which can and do cross without interconnect on a regular basis. Vias == pillars formed by layering dots in each layer. Crossing is accomplished with other materials which may or may not be removed depending on your expense/yield/dielectric properties needs. The last impacts EM coupling of traces and inter-symbol interference, and by extension speed.
            • So a specific question: do modern dice have the ability to use multiple planes? I'm referring specifically to those in use by AMD and Intel for multi-core machines. Circuit boards as such aren't really relevant to the issue of interconnecting cores on dice.
              • interconnect planes, yes.
                Transistor planes, no.

                A typical CPU by AMD or Intel is about 9-12 layers, only one of which (the bottom doped Si layer) has transistors. Everything else is poly or metal.
          • I'm probably missing something, but you can definitely have a fully-connected planar graph with four nodes. Make a triangle out of three, stick the fourth in the middle of the triangle and connect it out to the other three.
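The triangle-plus-centre layout described in this comment can be checked numerically. This is a minimal sketch: the coordinates are made up for illustration, `orient` and `crossings` are hypothetical helper names, and the test only counts proper crossings between edges that share no endpoint (collinear overlaps are not handled).

```python
from itertools import combinations

def orient(a, b, c):
    """Sign of the turn a -> b -> c (positive is counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def crossings(points):
    """Count crossing edge pairs when every point is joined to every other."""
    edges = list(combinations(range(len(points)), 2))
    count = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if {a, b} & {c, d}:
            continue  # edges meeting at a shared node do not count as crossing
        pa, pb, pc, pd = (points[i] for i in (a, b, c, d))
        # Two segments cross when each one separates the other's endpoints.
        if (orient(pa, pb, pc) * orient(pa, pb, pd) < 0
                and orient(pc, pd, pa) * orient(pc, pd, pb) < 0):
            count += 1
    return count

triangle_with_centre = [(0, 0), (4, 0), (2, 3), (2, 1)]  # fourth node inside
square_corners = [(0, 0), (1, 0), (1, 1), (0, 1)]        # diagonals must cross

print(crossings(triangle_with_centre))  # 0
print(crossings(square_corners))        # 1
```

The same four nodes cross or not depending only on placement, which is why four fully connected nodes can be drawn in a single plane.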
          • by knapkin ( 665863 )
            If you read the link you posted you would see you have misquoted or misinterpreted the utility problem.  Below is a diagram showing how to connect 4 nodes to each of the other 3 without intersection in one plane.  Posted as code because I can't seem to get it to work otherwise.

            |\ /|
            | X |
            \ | /

          • You're confusing the issue. AMD Opteron processors ship with up to 3 HT buses on board plus dedicated RAM. That is the physical limit. Intel processors only have 1 bus, then they play with different ways of connecting that to RAM and other components. The board is an AMD-unique thing. HT sees processors or other components as "equals", not a "master-slave" relationship like in Intel-land. That makes the AMD available for trying out weird ways of connecting hardware and writing programs.

            on a side note t
      • You'd have to put the processors in a circle or something...
      • Re: (Score:3, Interesting)

        by pla ( 258480 )
        If it is impossible, please explain why.

        Problem 1)
        Draw four circles on a piece of paper.
        Now draw a line from every circle to every other circle without crossing any lines.

        Problem 2)
        Draw four circles on a piece of paper. Draw two "pins" on each.
        Now draw a minimal path between any two circles such that you can only start and stop at a pin, and only one connection can go to a single pin.

        You have the right idea for problem 1, that for low-N, you can just route connections through different layers
        • That assumes a two-dimensional topology. PCBs do not suffer that same constraint (e.g. they have more than a single layer to work with). Any side of a six-sided cube is adjacent to any other side, assuming that you have the ability to transport a unit along the interior of the cube. If you took all six processors, and wired them with the same theory, no processor is more than one hop away from any other.
        • Re:Weird (Score:5, Insightful)

          by rrhal ( 88665 ) on Monday August 13, 2007 @04:12PM (#20216113)
                   / | \
                  /  x  \
                 / .   . \
  • nothing new (Score:4, Informative)

    by Exter-C ( 310390 ) on Monday August 13, 2007 @02:51PM (#20215115) Homepage
    There is nothing new in this product at all, IBM have had this type of server platform (3 socket supported) for some time in the form factor of the x3755.
  • IBM System x3755 (Score:5, Informative)

    by OS24Ever ( 245667 ) * <trekkie@nomorestars.com> on Monday August 13, 2007 @02:53PM (#20215143) Homepage Journal
    Disclaimer, I work for IBM.

    The IBM System x3755 [ibm.com] has offered this feature since it came out as well. Instead of the fourth processor card you install a pass through card and it turns it into a three way. We've done a few benchmarks [lionbridge.com] (warning pdf) with the Pass Through card and what it could do between 3CPU and 4CPU operations.

    pretty cool ability for a few things.
    • by Anonymous Coward on Monday August 13, 2007 @03:19PM (#20215449)
      OS24Ever wrote, "Disclaimer, I work for IBM."

      You don't say... : p
      • Re: (Score:3, Interesting)

        by mr_mischief ( 456295 )
        Actually, I've never worked for IBM, and I keep pricing eComStation. I'd kind of like to use that on a system or two. Warp 3 is getting a bit paunchy. I don't want to drop it, though, because then I'd be down to Linux, BSD, Windows, OS X, DOS, and AmigaOS.

        Visopsys, ReactOS, OpenSolaris, plan9, Minix, QNX, MMURTL, OpenVMS, Haiku, and some others could serve for utility and novelty in varying degrees, but I already have plenty of software for OS/2.

        Yes, I'm an avid system collector. If you have hardware or sof
      • For the record, I used OS/2 before I worked for IBM, and not after I worked for IBM. I got a Win95 machine. Though at the time we still had end users on OS/2. I think it got pushed out as a desktop OS when they started Y2K-ing things. I was a heavy 1.3/2.x user, even ran a BBS on it in the early 90s. Maximus was the name, if I remember right.
        • Maximus kicked ass.
          I ran a BBS on it in the dying days of the BBS era. There are times I want to bring it back (ala, dialup to initiate a circuit connection, then DSL to DSLAM connection, but the TelCos won't allow that because they are asshats).

          Time to fire up the old BBS server and serial console port into it for fun :)
          • A friend of mine kept his BBS saved on a disk, every once in a while he runs some TCP/IP to Serial thing that lets me telnet into his WWIV based system. I think Win 2k3 broke the last one he had so I've not used it in a while. Was quite the retro trip especially since I had unread email from 1993.
    • by afidel ( 530433 )
      Rerun the test with the HP having 15K disks and I might not dismiss the results. Oh, and I hate that SPECjbb2005 doesn't require financial disclosure; jobs per $ and jobs per watt are the only things that really matter.
      • by Zak3056 ( 69287 )

        Rerun the test with the HP having 15K disks and I might not dismiss the results.

        They actually did address this in their benchmark document:

        Configuration Exception
        Due to backorder shipping delays from HP on the 144GB SAS 15K RPM hard drives the 72GB SAS drives were deemed an acceptable substitute. The SPECjbb2005 workload tool does nothing to exercise the hard drive and writes no data to it. As a result, this configuration exception was determined to be immaterial to the performance results addressed in this st

      • We didn't specifically call out the cost of the systems, but we listed the exact hardware used for each test with part numbers. We used it because SpecJBB2005 is CPU centric and doesn't rely on I/O as much as say TPC-C would where you can just attach 2400 hard drives to it to drive the number up.

        Believe it or not, when my team is tasked with coming up with studies like this, we try to be as fair as possible and don't try to stack the deck. We know the people evaluating purchasing our stuff aren't that stu
    • Probably also worth noting that an x3755 takes up 4U of rack mount space. This Opteron Slice thing looks like it might be more dense than that if it's similar to a blade.
      • Good point, my intent was to point out that the 3-cpu idea wasn't nutty, it actually had some merit.
  • Sorry... I tuned out after 'A Three-Way'.
  • CoProcessors? (Score:5, Interesting)

    by tji ( 74570 ) on Monday August 13, 2007 @02:57PM (#20215197)
    Wasn't AMD also talking about licenses or agreements with other companies to allow for different types of coprocessor chips to be used alongside their processors?

    There is some interesting potential in that realm.. Crypto accelerators for VPN, SAN, or other devices. Multimedia encode/decode accelerators (encode 1080P H.264 in real time?). Inevitable video game acceleration devices (physics co-processor, accelerated NIC chip, 3D GPU offload processor?).

    Those would be even more interesting in home-user oriented Athlon64 boards. Multi-socket opteron boards are out of my price range.
    • Re: (Score:3, Insightful)

      by DigiShaman ( 671371 )
      That's why we have buses to open up expansion possibilities.

      For example, we have NIC chips that offload TX checksum processing, audio accelerators (Creative X-Fi), 3D GPU cards (nVidia and ATI cards), and physics cards (ASUS brand AGEIA card). The only reason you want a dedicated socket is for extremely fast and wide IO to RAM. So far, only the GPU has come close to needing that, but it is hanging on just fine with the PCI Express interface.
    • by LWATCDR ( 28044 )
      Some devices are already available that plug into extra AMD sockets.
      FPGAs are very popular so that you can create custom co-processors.
  • by Tackhead ( 54550 ) on Monday August 13, 2007 @03:01PM (#20215235)
    ...with a million dollars?

    > Why would you want three sockets rather than four? Easy, latency. Any CPU in a 3S system is one hop away from any other CPU. In a 4S system, you can be two hops away. This adds latency, and more importantly, you take a big hit on cache coherency latency. This kills performance."

    Lawrence: Three chips at the same time, man.
    Peter: That's it? If you had a million dollars, you'd use three sockets at the same time?
    Lawrence: Damn straight. I always wanted to do that, man. And I think if I worked at AMD I could hook that up, too; 'cause I hate motherboard layouts with latency.
    Peter: Well, not all layouts.
    Lawrence: Well, the type of chips that'd triple up on a board like that would.
    Peter: Good point.
    Lawrence: Well, what about you now? What would you do?
    Peter: Besides three chips at the same time?
    Lawrence: Well, yeah.
    Peter: Idle.
    Lawrence: Idle, huh?
    Peter: I would relax... I would sit on my ass all day... I would idle.
    Lawrence: Well, you don't need a million dollars to idle, man. Take a look at that fourth chip: it's two hops away, don't do shit.

  • Any CPU in a 3S system is one hop away from any other CPU.
    So... if I run Mac OS X on this box, can we call it an iHOP?
  • Where's the specs? (Score:2, Interesting)

    by achbed ( 97139 ) *
    There's no reference to this board/blade anywhere on the manufacturer's site. The only thing I can find is that this guy saw this board at a conference and took a shot and wrote a really short article about it. Ok, so a 3-way is a bit of a novelty, but good luck getting it to work. Isn't most microcode on the processors designed with 1, 2, or 4 way in mind? And isn't the cache coherency microcode embedded (at least in part) on the processors themselves? So setting up a 3-way using current processors wo
  • Threesome (Score:3, Funny)

    by macdaddy ( 38372 ) on Monday August 13, 2007 @03:07PM (#20215301) Homepage Journal
    So what kind of dough will this Opteron Threesome run me?
  • by Laxator2 ( 973549 ) on Monday August 13, 2007 @03:09PM (#20215331)
    The article states that with 3 processors one gets better performance, latency-wise, because in a triangle configuration any processor cache is just one hop away. You can have 4 processors in a tetrahedron configuration and still have any processor one hop away. Of course it will take 3 HyperTransport connections per processor just for the internal communications, so a 4th connection is needed for at least one processor to connect to the northbridge. The quad-core Opteron will have a maximum of 4 HyperTransport connections, is that right?
    • by pla ( 258480 )
      You can have 4 processors in a tetrahedron configuration and still have any processor one hop away

      Ignoring the physical trace-routing issues, you can have N fully connected nodes as long as every one has N-1 connections (i.e., a dedicated link to every other node), plus you need at least one bus-drop somewhere.

      In practice, all those connections need to physically connect somewhere, making more than a handful of fully-connected processors all but impossible.
    • The quad-core Opteron will have a maximum of 4 HyperTransport connections, is that right?

      Will have, yes, once both chip and socket support it. The current socket only supports 3 HT links.
    • by default luser ( 529332 ) on Monday August 13, 2007 @04:34PM (#20216419) Journal
      Yes, the quad-core chips will have the fourth link. In addition, the chips will be able to split their 16-bit HT links into dual 8-bit HT links, allowing for 8-way CPU configurations without hops (8 x 8-bit HT links per socket). In reality, this is the reason why AMD is pushing the new HyperTransport 3.0: so they can cut the bus lines to 8 without sacrificing too much bandwidth.

      Check it out here. [realworldtech.com]
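The link-splitting arithmetic in this comment can be sketched as follows. The numbers come from the comment (four 16-bit links, each split into two 8-bit links); the function name and the assumption that one sub-link per socket is kept for I/O are illustrative only.

```python
def split_link_budget(links, width_bits=16, split=2, io_links=1):
    """Return (sub_links, sub_link_width, max_single_hop_sockets).

    Assumptions: every link is split the same way, and each socket keeps
    `io_links` sub-links for I/O; the rest each reach a distinct peer CPU.
    """
    sub_links = links * split
    sub_width = width_bits // split
    peers = sub_links - io_links
    return sub_links, sub_width, peers + 1

sub_links, sub_width, sockets = split_link_budget(links=4)
print(sub_links, "sub-links of", sub_width, "bits,",
      sockets, "sockets with no intermediate hop")
```

Halving the width of each link is what makes the extra fan-out possible, which is consistent with the comment's point about HyperTransport 3.0 offsetting the lost bandwidth.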
    • As far as I know, the AMD CPUs that have three external HT links are the 8xx series Opterons, which gives up to eight physical processors with a maximum of two hops. I haven't heard of one with four external HT links. The 8xx series Opterons are bloody expensive.
  • This reminds me of some 6-way systems that I'm told Data General used to sell. They took two 4-way systems, and used one of the processor slots on each as a bridge between the two boards.
  • Opening sentences FTA:

    Themis Computer has developed a breakthrough in distributed computing for mission-critical systems. By functionally disaggregating commercial computing resources and housing them in a standardized footprint, purpose-built enclosure, the Themis Slice Architecture provides resilience with superior thermal and kinetic management. This open and modular design allows for spiral technology refresh, extending computing infrastructure investments for complete lifecycle management.

    I admit this article is probably just over my head technically, but did anyone else read this and think of ROOTER [mit.edu]? I mean, what is "kinetic management" in a computer? Maybe they spin the CPUs through the air instead of blowing air over them. That might explain "spiral refresh technology" as well.

  • Isn't this only a problem if the OS doesn't manage the NUMA architecture well? Surely there is an OS out there smart enough to recognize separate processors with separate memory regions and assign physical addresses appropriately....
  • by aapold ( 753705 ) on Monday August 13, 2007 @03:35PM (#20215629) Homepage Journal
    I mean how to convince the wife that we need a three-way?
  • Multi core (Score:3, Interesting)

    by jshriverWVU ( 810740 ) on Monday August 13, 2007 @03:41PM (#20215711)
    Curious if it can take multi-core CPUs. Having a 3-way system with dual-core Opterons sounds really nice.
  • How are the HyperTransport links arranged?
  • by Anonymous Coward

    Any CPU in a 3S system is one hop away from any other CPU. In a 4S system, you can be two hops away. This adds latency, ...

    How about a tetrahedron for four CPUs?
    • Re: (Score:1, Informative)

      by Anonymous Coward
      They are talking specifically about the Opteron. Each CPU has two links. You'd need three links from each CPU to form a tetrahedron.
  • This architecture might be good for server applications - i.e. lots of instances of a single-CPU task.

    However, it doesn't work that well for large apps that get parallelized across multiple CPUs. It turns out that most code, and most compilers, are good at splitting tasks in two - or in powers of two - so having three CPUs is no faster than having two.
    • by Namlak ( 850746 )
      However, it doesn't work that well for large apps that get parallelized across multiple CPUs. It turns out that most code, and most compilers, are good at splitting tasks in two - or in powers of two - so having three CPUs is no faster than having two.

      The third processor can run supporting thread(s) that control the "worker" threads. Let alone support processes such as network, I/O, or anything else in the OS - leaving the two CPUS (and their caches) wide(r) open for application crunching.
    • Re: (Score:3, Informative)

      by dlapine ( 131282 )
      OK, so it's not for HPC systems. I'm betting that the number of servers/server farms out there may make this attractive for the non-HPC users, if the 3-way is significantly cheaper than a 4-way. If you can get this on a blade, you get a 50% increase in CPU power for non-parallel tasks.

      Hmmm, now that I think about it, a three way box might be really interesting for some HPC loads as well. The low latency is a really big issue for some codes, and the three way could be more scalable (with some hand coding

    • This is so bullshit I don't know where to begin. GCC is a single threaded application, you can invoke parallel builds with ANY NUMBER of jobs, be it 1, 2, 3, 4, 5, ..., whatever.

      So with a 3-way box you'd just use something like -j3 or -j4 to distribute load. Unless they're dual cores, then -j6 or -j7 would do.
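The point that work can be divided among any number of workers, not just powers of two, can be illustrated with a short sketch. `split_work` is a hypothetical helper that does a static split; it is not how make schedules jobs under -j, which hands out jobs as workers become free.

```python
def split_work(items, workers):
    """Divide `items` into `workers` contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), workers)
    chunks, start = [], 0
    for w in range(workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if w < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

jobs = list(range(10))
for workers in (2, 3, 4):
    print(workers, [len(chunk) for chunk in split_work(jobs, workers)])
```

Three workers get chunks of 4, 3 and 3 items here, so a third CPU takes a real share of the load as long as the work is divisible at all.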

  • I'm kinda new to enterprise servers. In the picture it looks as though each CPU has its own bank of memory. If so, is that efficient or not?
  • People are more surprised by the 3 CPU sockets than they are by the IB ports.

    I thought IB was dead - replaced by 10gigE?
  • I thought a while ago that AMD, specifically, should create a 3-core processor. Why? Because they can call it the TriAthlon!
  • A 3-way server could sell better than 4-way ones in China, as the number 4 in China is associated with death.
  • A 3-cylinder engine is smoother than a 4-cylinder, and a 5-cylinder engine is smoother than a 6 (or an 8 for that matter). With an even number of cylinders, one is on a power stroke lined up with one on an intake stroke. With odd numbers, no two cylinders move at the same time.
    • Sorry, this just isn't true in practice. The Geos, Suzukis, VWs and Audis which used odd numbers of cylinders did so only for packaging considerations, not because the engineering (smoothness, etc.) made sense. They represented a cylinder added onto or removed from a 4-cylinder engine to meet displacement needs while still fitting in the car.

      The smoothest piston automotive engines are in-line 6 cylinder engines or V-12 engines, which provide a power pulse with every 30 degrees of crankshaft rotation
