AMD Upgrades Hardware

First 16-Core Opteron Chips Arrive From AMD 189

Posted by timothy
from the one-for-each-candle dept.
angry tapir writes "After a brief delay and more than a year of chatter, Advanced Micro Devices has announced the availability of its first 16-core Opteron server chips, which pack the largest number of cores available on x86 chips today. The new Opteron 6200 chips, code-named Interlagos, are 25 per cent to 30 per cent faster than their predecessors, the 12-core Opteron 6100 chips, according to AMD."
  • So... how do these compare to the new Sandy Bridge chips Intel announced on the same day? There must be some overlap of the target market - whether to buy a quad-socket Intel server or dual-socket AMD one, for example.
    • by Surt (22457) on Monday November 14, 2011 @03:00PM (#38050932) Homepage Journal

      This would compete with the Xeon-E chips that aren't out yet. In terms of performance it delivers about 75%, so this is the equivalent of a 12-core Intel chip.

      • by ByOhTek (1181381)

        If they aren't out yet, how can you know? I wouldn't trust the performance benchmarks from either manufacturer.

        • by Surt (22457)

          This assumes that performance is not significantly different from the desktop line, which is usually the case.

      • by Surt (22457)

        Slight correction: on threaded workloads, we'd be talking about a 6-core chip, since Intel runs 2 threads per core.

    • by rrossman2 (844318)

      Not sure what Tyan has planned and what the chips can do, but Tyan had boards that supported 4 quad-core Opterons, plus you could add a "daughter board" that allowed you to add 4 more (plus more RAM slots).

      Now that setup using 16 core cpus in an eatx format would be crazy

      • by ByOhTek (1181381)

        Yeah. I could ditch my furnace in the winter with a computer like that... Might even have to open a few Windows.

        • by Talderas (1212466)

          The idle heat would be sufficient, no? I don't see why you would need to open some windows just to ramp up the temperature unless you're using this thing to feed heat for a sauna.

    • by beelsebob (529313) on Monday November 14, 2011 @03:39PM (#38051370)

      Put simply, the AMD ones are slower than the intel ones by about 2 fold per core. This isn't because AMD sucked at design, so much as their marketing department sucked at telling the truth. In reality, we're looking at 8 core AMD CPUs with 2 integer units per core - i.e. no more 16 core than intel's are 16 core chips because of hyperthreading.

      Once that's ironed out, the AMD chips turn out to have rather good performance if you want lots of integer work done, and the Intel chips to have rather good performance if you want anything else done.

      • they are like 3/4 cores. neither 1 core, nor half core.
        • by beelsebob (529313)

          The problem is, while this is true, bulldozer also suffers from being a fairly crappy arch design compared to sandy bridge. The result is that AMD's 8 "core" bulldozer is only roughly as fast as intel's 4 core i5 without hyperthreading. Extrapolate this to bolting two 8 "core" bulldozers together and you get to... well, that would only be about as fast as an 8 core sandy bridge with no hyperthreading, or a 6 core with hyperthreading. Given that Intel is selling 6 core E5 Xeons with hyperthreading for les

      • BENCHMARK!
        In the meantime, *please* STFU.

  • by TheTyrannyOfForcedRe (1186313) on Monday November 14, 2011 @02:59PM (#38050920)

    The "cores" in Bulldozer are not your typical first-class x86 core. Bulldozer "cores" are worth 2/3 of a modern x86 core. The 6200 is more like a 10 core. Add to that the crappy IPC and I'm not impressed.
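The per-core discounts quoted in this thread can be turned into "equivalent core" counts with simple arithmetic. A minimal sketch follows; the factors are the commenters' rough estimates (2/3, 3/4, and 1/2 of a conventional core), not measured values, and the name `equivalent_cores` is purely illustrative.

```python
from fractions import Fraction


def equivalent_cores(marketed_cores, per_core_factor):
    """Scale a marketed core count by an estimated per-core worth."""
    return float(marketed_cores * per_core_factor)


# Rough estimates quoted by commenters in this thread; none are benchmarks.
estimates = {
    "2/3 of a modern core": Fraction(2, 3),
    "3/4 of a core": Fraction(3, 4),
    "one module counted as one core": Fraction(1, 2),
}

if __name__ == "__main__":
    for label, factor in estimates.items():
        print(f"{label}: {equivalent_cores(16, factor):.1f} equivalent cores")
```

Under these assumptions a 16-core part lands somewhere between 8 and 12 conventional cores, which is the range the thread is arguing over.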

    I was excited about Bulldozer before it was released. It's not often that CPU makers take chances on radical new architectures. Too bad this one turned out to be a huge pile of fail.

    • by Theovon (109752) on Monday November 14, 2011 @03:41PM (#38051386)

      Your description is inaccurate, but that's not surprising since most slashdot readers don't know much about CPU architecture.

      Bulldozers are essentially full-fledged cores, where the two cores in each module are mostly independent. There are two completely independent integer pipelines, so people seem to want to harp on the fact that the FPU is "shared". It's really a single split FPU, where each half can execute independent instructions, as long as the data width is 128 bits or less. Only when it is executing 256-bit AVX instructions is there any competition for resources. This is a very sensible design decision, since you don't find enough AVX software right now to justify completely dedicated AVX logic. (Plus, IIRC sandy bridge's FPU is only 128 bits wide and issues AVX instructions in two cycles, so what's the difference?) Moreover, even with AVX-heavy workloads, most software won't issue AVX instructions every cycle, and two AVX-heavy tasks on the same module won't really run into much contention. Assuming my memory of Sandy Bridge's FPU is correct, then Bulldozer has the advantage of having lower latency within the FPU on isolated AVX instructions.

      The PROBLEM with Bulldozer is that they just have not done some of the really aggressive and costly things that Intel has done in their design. Bulldozer is still a 3-issue design. While going to 4-issue doesn't help that much that often, it still gives Sandy Bridge a slight edge. But where SB REALLY gets its advantage is the huge instruction window. Intel found clever ways to shrink the logic for various components so that they could make room for a much larger physical register file and reorder buffer. As a result, SB can have many more decoded instructions in flight, which exposes more instruction-level parallelism and, critically, absorbs more memory access latency.

      A Sun engineer (discussing Rock, among other things) once described modern CPU execution as a race between last-level cache misses. When you have a miss on your L3 cache, it can cost hundreds of cycles, upwards of 1000. During that miss, the CPU fills up its reservation station with other instructions and then stalls, waiting on something to retire. This won't happen for a long time. Because of the disparity in speed (and latency) between compute and memory access, this is typically the most significant bottleneck. By enlarging the instruction window, SB can achieve much higher throughput, and it shows in the benchmarks.

      This is Bulldozer's Achilles' heel. I know there are a few benchmarks where Bulldozer is faster than SB, but they're not typical workloads with typical memory footprints. Anyhow, so if you're going to rag on Bulldozer, rag on it for the right reasons. Bulldozer's "shared" FPU is a red herring.

      • by Artraze (600366) on Monday November 14, 2011 @04:08PM (#38051708)

        The OP is right, and seems to understand the issues far better than you. It isn't that the FPU is shared, it's that nearly _everything_ is shared: Instruction cache, fetch and decode, FPU, L2 data cache. The only things that aren't shared are L1 data and integer operations (scheduler and ALU).

        Instruction issuing and cache misses are big performance areas, but these are precisely the resources the cores share! You're running two threads off (with the exception of L1 data) the same caches and instruction fetches. So, in reality, the second core in bulldozer is much more like ultra-hyperthreading than it is a second core. I think the fact that they're even listed as cores is a marketing strategy that has backfired pretty hard.

        P.S. L3 cache has proven to be quite useless in many workloads... It helps a bit in servers, IIRC, but that's about it. So it's more a race to L2 cache, which, again, is a shared resource. AMD, in fact, has indicated that it may drop the L3 from desktop parts.

        • by Rockoon (1252108)
          If you look at the performance numbers comparing Phenom II x4 830 (2.8GHz) to the new A8-3850 (2.9GHz) you see that the lack of L3 isn't a problem at all when you can also pack on twice as much L2.
      • Your description is also inaccurate. Instruction decode and L2 cache are shared between cores in Bulldozer modules as well; I wouldn't ding Bulldozer for the shared L2 cache but the L1 cache is write-through, and there doesn't seem to be enough cache bandwidth to keep both integer cores busy. Bulldozer is not a 3-issue design, it is a 4-issue design. With regards to Bulldozer's Achilles' heel, I think that its deficiency in single-threaded performance comes more from actual cache misses and latency than
      • by loufoque (1400831)

        (Plus, IIRC sandy bridge's FPU is only 128 bits wide and issues AVX instructions in two cycles, so what's the difference?)

        My SSE code converted to AVX runs two times faster (not all of it though -- certain instructions do run in two cycles)

    • Bulldozer was very poorly handled from the beginning. What really surprises me is that they tried the NetBurst approach: when all else fails, go for clocks. Unfortunately, ARM seems to be focusing on a similar strategy (more cores, higher clocks, less focus on IPC)... Anyways, I don't buy their "poorly optimized" story. They knew all about it and could've waited - surely they realized at the early stages of development that OSes aren't optimized for this yet. They could've delayed Bulldozer and pushed out y
  • Wish List (Score:4, Informative)

    by Nom du Keyboard (633989) on Monday November 14, 2011 @03:09PM (#38051020)
    I so much want some real competition for Intel. Competition that doesn't artificially limit clock speeds and fuse off perfectly good working features in order to market a dozen overlapping and conflicting SKUs at a dozen different price points. And working drivers, current standards (DirectX 11 and OpenCL for starters), and USB-3 that doesn't require a $50 cable between every device would be nice.
  • by craftycoder (1851452) on Monday November 14, 2011 @04:08PM (#38051706)

    I just got a fancy 8 core T7500 Dell workstation and only one of my compilers actually takes advantage of the multiple cores when it is compiling. As a result this expensive desktop is only 15% faster in terms of time to compile than the 4 year old PC it replaced (the new PC has twice the ram as the old though which may account for some of that speed increase). I am seriously unimpressed with all these cores. Maybe they are useful for something, but I've not found anything that I do that shows significant improvement. Putting my development projects on a SSD did much more for my work flow performance than this fancy new computer, that is for certain.

    • by Anonymous Coward on Monday November 14, 2011 @04:19PM (#38051864)

      You're doing it wrong.

      make -j8

    • by fyngyrz (762201)

      Try doing DSLR image editing with Lightroom or Aperture. Those cores make one hell of a difference.

      • by Renegrade (698801)

        Yeah, they do in image editing.

        However, there will always be things that must be done in series, and always a maximum speed-up you can get from multiprocessing. (Amdahl's Law comes to mind) Plus, you'll often hit other bottlenecks, especially if you have an obscene number of cores. Memory, disk, video, network..
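Amdahl's Law puts a number on that ceiling: if a fraction p of the work can run in parallel on n cores, the best possible speedup is 1 / ((1 - p) + p / n). A minimal sketch follows; the 90% parallel fraction is an illustrative assumption, not a figure from this thread.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can use extra cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 90% of the run time parallelizes, 10% stays serial.
    for cores in (1, 2, 4, 8, 16):
        print(f"{cores:2d} cores -> at most {amdahl_speedup(0.9, cores):.2f}x")
```

With 10% of the work stuck in series, the bound can never exceed 10x no matter how many cores are added, and that is before memory, disk, or network bottlenecks come into play.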

        Memory has always been a problem after the 6502 era. Even single core systems splat into the performance barrier that is main memory.

        I'd rather have a single-core system that's 8x faster than an

    • I just got a fancy 8 core T7500 Dell workstation and only one of my compilers actually takes advantage of the multiple cores when it is compiling.

      If your compiler isn't threaded, then at least run multiple compile jobs simultaneously--this is probably better anyway. If your build system can't do this, your tools are broken.
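Running several single-threaded jobs side by side needs very little machinery. A rough sketch of the idea follows, assuming each job is an independent command line; `run_jobs` is a hypothetical helper, and the stand-in commands only start a Python interpreter rather than a real compiler.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs(commands, max_workers):
    """Run each command as its own process, at most max_workers at a time.

    Returns the exit codes in the same order as the input commands.
    """
    def run_one(command):
        return subprocess.run(command).returncode

    # Threads only wait on child processes here, so the real work
    # happens in parallel across however many cores are free.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in jobs; a real build would list one compiler invocation per source file.
    jobs = [[sys.executable, "-c", "pass"] for _ in range(8)]
    print(run_jobs(jobs, max_workers=4))
```

A real build tool also tracks dependencies between jobs, which this sketch ignores; it only shows that the parallelism can live outside the compiler.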

    • by friedmud (512466) on Monday November 14, 2011 @06:06PM (#38053120)

      What do you mean by "only one of my compilers actually takes advantage of the multiple cores when it is compiling"?

      Are you on Windows? Because any compiling done in linux with a "make" based (or similar) build system can use as many cores as you can throw in a machine (regardless of the actual compiler it's running). It should be the same in Windows...

      Don't look to your compiler to be multithreaded... look at the build system (i.e. in Visual Studio there should be an option somewhere to tell it how many processors to use while compiling). For make you just do "make -j8" to use 8 "jobs" total for compiling (i.e. 8 instances of the compiler will be running).

      Here is a test for one of my software projects doing "make -j#" where # is 1,4,8,12,16,24:

      1 : 15m9.614s
      4 : 3m57.947s
      8 : 2m6.354s
      12 : 1m33.426s
      16 : 1m25.559s
      24 : 1m17.345s

      That is on my dual 6-core hyperthreaded Mac workstation (so it had 12 "real" cores and 12 "hyperthreads"). You can see that hyperthreads definitely aren't as good as real cores... but do provide some speedup. That said, I thank God every time I compile (which is all day long) for the cores he has bestowed upon me...
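To make the scaling easier to read, the timings above can be converted into speedup and per-job efficiency relative to the single-job run. This is a small sketch over the numbers as posted; the helper names are illustrative.

```python
# Wall-clock times from the "make -j#" runs listed above.
timings = {
    1: "15m9.614s",
    4: "3m57.947s",
    8: "2m6.354s",
    12: "1m33.426s",
    16: "1m25.559s",
    24: "1m17.345s",
}


def to_seconds(stamp):
    """Convert a time like '3m57.947s' into seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def speedup_table(runs):
    """Return (jobs, seconds, speedup, efficiency) rows relative to the 1-job run."""
    baseline = to_seconds(runs[1])
    rows = []
    for jobs in sorted(runs):
        elapsed = to_seconds(runs[jobs])
        speedup = baseline / elapsed
        rows.append((jobs, elapsed, speedup, speedup / jobs))
    return rows


if __name__ == "__main__":
    for jobs, elapsed, speedup, efficiency in speedup_table(timings):
        print(f"-j{jobs:<2d} {elapsed:8.1f}s  {speedup:5.2f}x  {efficiency:4.0%} per job")
```

The speedup stays close to linear up to the 12 physical cores (roughly 9.7x at -j12) and then flattens, ending a little under 12x at -j24, which matches the observation that hyperthreads help but are not worth a full core.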

      Good to hear that you are already on SSD... because parallel compiling does need speedy disk to keep the processors humming. The timings above are for two 256GB SSDs in RAID0.

      • I mostly work with Eclipse doing Java, Android, and GWT. Only GWT offered an effective way to use those cores. It is VERY possible that I just don't know how to use Eclipse to the best of its ability, but I can tell you that Eclipse never pushes more than one core during a build except when it's building GWT projects for me (I had to tell it explicitly to do that though).

        • Yeah, you're screwed, sorry. Eclipse integrates nicely with Ant, but Ant doesn't do multi-core builds either. And Ant tasks are very heavy, so parallelizing them wouldn't help much anyway. You might try rebuilding your build process in plain 'make' and try that -j option.

          Also, I'm sorry you have to use GWT. That thing was just absurdly slow last time I used it, to the point that it would be faster to hand-code JavaScript.

          • That's what I thought. I research it every couple months when I get annoyed by multi minute builds. I never get any answers.

            GWT is slow and deployment is a little cumbersome, but the code is so elegant I just don't care. I love GWT. I wish Google provided more libraries, but I'm pleased with it. I'm not certain it has a future though.

            I loathe Ruby. What's a fella to do if he wants a strongly typed object oriented website?

            • What's a fella to do if he wants a strongly typed object oriented website?

              I think JBoss is the usual answer to that. That only takes care of the back end, but GWT has your front end covered anyway, and the more code you can move into JBoss, the less you have to crank through GWT's slow processor.

    • by cbhacking (979169)

      Well, you could consider using a better compiler, or a better configuration for it. Many parts of compilation parallelize reasonably well, especially if you have a lot of source files. Some things will have dependencies on other parts (which limits parallelism) and some have dependencies on the entire previous stage (which severely limits or prevents it, for that stage).

      Besides, unless you're just building a pure build machine (and I doubt it, if your compilation setup is so bad), multiple cores can help a

    • by evilviper (135110)

      only one of my compilers actually takes advantage of the multiple cores when it is compiling.

      Send your octo-core my way, I'll see that it gets some use...

      For any RPM-based Linux distro, just edit your RPM macros file to add e.g. the -j8 option to make, and every "rpmbuild" will max out all 8 cores with 8 instances of gcc operating on different files each.

      And if you're lzma compressing the RPMs in question, and they're a non-trivial size, you can get a pretty good speed-up using either parallel-xz or p7zip across
