
 



Hardware

8 way SMP chipset for K7

Bill Henning writes "For those of you interested, HotRail has announced an 8-way SMP chipset for the K7 that uses crossbar switching to improve SMP performance. Read more in this article." So much speed, so many RC5 keys to crack.
This discussion has been archived. No new comments can be posted.

  • Why not have a single cache, shared between all the CPUs? Okay, if two CPUs wanted to access it at once, one of them would have to wait, but only for one cycle.
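    A rough way to test the "only one cycle" intuition is a toy simulation of a single-ported cache shared by several CPUs. Everything here is an assumption made for illustration: the FIFO arbiter, the one-access-per-cycle port, the per-cycle request probability, and the function name. It is not a model of any real chipset.

```python
import random
from collections import deque


def mean_wait_cycles(num_cpus, request_prob, cycles=20000, seed=1):
    """Average cycles a request waits for a shared, single-ported cache.

    Hypothetical model: each idle CPU issues a request with probability
    request_prob per cycle, then blocks until the FIFO arbiter grants it.
    The cache serves exactly one request per cycle.
    """
    rng = random.Random(seed)
    waiting = deque()              # (cpu_id, cycle_requested), oldest first
    blocked = [False] * num_cpus   # True while a CPU has a request pending
    total_wait = 0
    served = 0
    for cycle in range(cycles):
        # Idle CPUs may issue a new request this cycle.
        for cpu in range(num_cpus):
            if not blocked[cpu] and rng.random() < request_prob:
                blocked[cpu] = True
                waiting.append((cpu, cycle))
        # The single cache port grants the oldest pending request.
        if waiting:
            cpu, requested_at = waiting.popleft()
            total_wait += cycle - requested_at
            served += 1
            blocked[cpu] = False
    return total_wait / served if served else 0.0


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "CPUs:", round(mean_wait_cycles(n, 0.5), 2), "cycles average wait")
```

    Under these assumptions a lone CPU never waits and two CPUs wait at most one cycle, but the average wait tends to grow as more CPUs contend for the single port.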
  • Hm ... maybe something like Transmeta's system ... ;) Aren't they looking to separate the single os from the processor under it?
  • Just think. You can now buy a 2 GHz K7 machine from Kryotech. They've just developed a refrigerated case, in which they've successfully overclocked a K7 to 1 GHz. So you can basically throw 2 processors in there to get a 2 GHz x86 machine from them. I bet Intel is sorry they locked the clock multiplier on their P III.
  • FWIW, this is exactly what we did for a north bridge chipset for the DEC Alpha. Because control information is split across multiple data-path chips it's necessary to run an I/O bandwidth (read relatively slow) channel in addition to the main CPU-memory path, but there's really no reason why that can't be done at 1600 Mb/s over two wires in each direction.

    Other I/O can be handled in the same way, of course. Running a PCI serial bridge is easy by comparison with the memory path because PCI tolerates much more latency than memory does.
  • According to JC's PC News [jc-news.com], SMP K7 boards should be available Q1 of 2000 (though I don't know if that'll be the 8-way boards). Also in his archives he has the K7 RC5 score using various cores (as there's no K7 core yet). It seems to get keyrates equivalent to a P6 core when using the P6 optimized code. Presumably the RC5 folks will make a K7 optimized version in the future though.
  • Now, I don't know much about chip design, but the author stated clearly that the emphasis of the chipset would be high performance, as noted by the poster above. However, burst memory transfers would not benefit much from a non-multiplexed design, since the address is sent on the first cycle, followed by the burst of data. Unfortunately, most memory access is random, so what the heck...
  • L1 is impossible to share between processors. Yeah. But not impossible to share between multiple instruction streams, which the processors implement. All that interconnection stuff is so messy anyway. The history of messy interconnections is that it becomes small, cheap, mass-produced silicon.

    We already have (uni-)processors capable of executing 4 instructions at once. It's just that the chip jocks seem to be making DOS CPUs. No reason those 4 instructions couldn't come from separate threads. Then L1 sharing is implicit. We should have something on-chip by '01 if we're lucky and Kindly Uncle Ben keeps the furnaces going. Not cheap, but a first step.
  • Ah, the Transputer, legendary money sink for many a company and a government or two.

    Demand is relative. Volume is concentrated in uniprocessor low-end boxes: demand with a capital D. Makes sense to throw R&D money there.

    Projections for these umpteen-way SMP and switched systems claim the big market is in "servers". I really don't see where servers need to be so tightly (and expensively) coupled vs 100BaseT or something. 'cause the SMP stuff starts needing fire-breathing PCI-X and beyond I/O bus(es!) to feed it and hot-swap cards and redundant power supplies. These are all expensive items!

    One can set up a loose cluster server where one box can be smashed to bits and the others carry on transparently. Just plain, simple $500 specials. Maybe SMP is good for other things, but the marketing announcements always say "servers", not ray tracing or particle physics. If the "server" part is factored out, I wonder what the demand really is.

    Making an affordable consumer 4-way or 8-way SMP box would be tough. The whole thrust of mass user PCs has been getting rid of those glue chips, those card slots, those cables.

    There are other (tightly-coupled) ways to handle more than one instruction stream at once than wiring discrete CPUs with gruesome $$$ bus systems and arbitration logic.
  • Multiplex data transfer will help, esp if transfers are more than one byte at a time, as they tend to be in machines larger than 8-bit. Many transfers are going to range from two byte to eight bytes at a time, even in random access.
  • Suppose Processor A is a producer and Processor B is a consumer. The only way they can communicate is via shared memory, unless, like a transputer, you have some other explicit CPU-to-CPU communication mechanism.

    What happens is that Processor A ends up caching up a whole bunch of data that Processor B needs. The way Processor B ends up getting it is by accessing external memory, triggering Processor A to do its cache writebacks, and in the meantime stalling both Processor A and B.

    Ick.

    So, while the whole process is transparent to the programmer (at least insofar as the entire memory map is coherent, regardless of what CPU you're on), the whole cache-snooping business can kill performance.

    For decent SMP performance in this scenario, Processor A really needs to manually flush the portions of its cache that Processor B needs (or instead use write-through, rather than write-back, in which case cache invalidation is all that is necessary). Since you're explicitly manipulating the cache, you're no longer transparent.

    (Or, if you're a DSP or some other device with onchip SRAM instead-of/in-addition-to cache, you just DMA all the data where it needs to be in the background, and keep processing without missing a beat... :-) ) --Joe

    --
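    The producer/consumer stall described above can be sketched with a toy model. This is a simplified illustration, not a real coherence protocol: the class, its methods, and the scenario function are hypothetical names, and the model only counts how many consumer reads force an on-demand writeback from the producer's cache. It does not model the consumer's own cache or actual cycle costs.

```python
class ProducerCache:
    """Toy cache for Processor A that tracks which lines are dirty."""

    def __init__(self, memory, write_through=False):
        self.memory = memory            # shared external memory (dict)
        self.write_through = write_through
        self.lines = {}                 # address -> cached value
        self.dirty = set()              # addresses not yet written to memory

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value   # memory is always up to date
        else:
            self.dirty.add(addr)        # write-back: defer the memory update

    def flush(self, addrs):
        """Explicit flush by the producer, done before the consumer reads."""
        for addr in addrs:
            if addr in self.dirty:
                self.memory[addr] = self.lines[addr]
                self.dirty.discard(addr)

    def snoop(self, addr):
        """Called on a consumer read; True means a forced writeback (a stall)."""
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
            return True
        return False


def run_scenario(num_lines, write_through=False, explicit_flush=False):
    """Producer writes num_lines lines, then the consumer reads them all."""
    memory = {}
    cache = ProducerCache(memory, write_through)
    addrs = list(range(num_lines))
    for addr in addrs:
        cache.write(addr, addr * 10)
    if explicit_flush:
        cache.flush(addrs)
    forced_writebacks = 0
    values = []
    for addr in addrs:
        if cache.snoop(addr):
            forced_writebacks += 1
        values.append(memory[addr])
    return values, forced_writebacks


if __name__ == "__main__":
    print("write-back, no flush:", run_scenario(4))
    print("write-back, explicit flush:", run_scenario(4, explicit_flush=True))
    print("write-through:", run_scenario(4, write_through=True))
```

    In this sketch the consumer sees the same data in all three cases; what changes is whether the writebacks happen on demand, while the consumer is waiting, or ahead of time.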

  • BeOS will support SMP nicely, but it's not a server OS, really, is it?
    WinNT tops out at 2 CPU, or 4 if you believe MS, (or really, it just plain craps out period).
    OS/2 Server is optimized for 8 CPU and handles up to 64 CPU.
    How many CPU will Linux handle? Is the current kernel only good for 2 CPU?
  • AMD should concentrate on (cheap) SMP systems..

    If Intel are going to SMP-disable the Celeron, they're obviously not too interested in this market themselves. Doesn't make any sense to me, since SMP means selling more CPU's to the customer, and the guy who's going to buy a dual Celeron system is unlikely to fork out the cash for a dual PIII instead - too expensive.

    I've got to think there'd be plenty of demand for a 4 or 8-way SMP box using $100 processors that could start out cheap with a single CPU, and just be upgraded. Seeing as Intel don't seem to want to support *cheap* SMP, AMD could step right in...

    Too bad the Transputer never took off - that was a CPU *WAY* ahead of its time...
  • Very cool, and a good reason to switch to AMD hardware. (not that intel isn't good, but AMD does tend to be cheaper...) And 8-way SMP...wow.

    That I like.

    Although, I do wonder about the synchronization issues...

    Enough of my ramblings...christ, it's 5am... *sighs*

  • I have never bought anything but an AMD. I have been pleased w/ their products. However, AMD, Intel, IBM, etc. have made their processors so complex that trying to add in a major feature like SMP causes all sorts of problems & can make the setup slower than a single CPU by itself if the SMP technique is not well thought out. Problems like keeping caches in sync and so forth are being encountered, but for the wrong reasons.

    Cache logic was instituted because memory was slower than the CPU & since most of what you do is access memory, it was the most cost-effective way to speed up memory access. However, now they want to set up SMP between the systems. Cache logic automates the cache so that the programmer is "unaware" of its presence. That may be great for a single processor, but when you hit SMP, it provides the potential for multiprocessor performance to go into the toilet.

    If you want to see a cool processor design, go to www.ultratechnology.com & read about Chuck Moore & his Forth chips. 27 instructions & they run at 500 MHz. Guess what, no cache... His design does not require any. Only one downside to his chips: he only makes 20-bit processors (please Chuck, make a 32-bit/64-bit). His transistor count is way down too. Around 20000! So you take an Intel wafer & put a 32/64-bit version of Chuck's chip on it plus about 256 MB of 500 MHz RAM!

    Maybe some day chip vendors will learn. I don't have a problem w/ making something complex, as long as it is "necessary" complexity. If Chuck does make a 32/64-bit version, I will have my Masters in electrical engineering in a few years & guess who will be building his own PC... Well, that is my 2 cents.
  • It's pretty obvious everyone that has posted so far read the article.
  • by abischof ( 255 )
    Anyone know when these SMP boards are going to be available?

    Also, since CmdrTaco mentioned it, are there any estimates for the K7's rc5 speed?

    Alex Bischoff
    ---

  • The article isn't that bad, at least, it's more informative than most of the crap you usually have to read looking for real information in an industry typically reported on by twits who know nothing of it.

    Personally, despite some of the valid criticism people will make of this technology I am just pleased about what it means to the processor scene - at last x86 systems will be able to be built with multiple CPU's using non-Intel CPU's!

    Now personally, yes, I have always been a big fan of AMD, I have several machines at home running on K6 series chips right now, but AMD zeal is not why this makes me happy. This makes me happy simply because I want to see more competition in the higher end x86 system market where chips are concerned. For years typical home and office single user PC builders have had choices and competition which has driven down prices. Hopefully, this kind of technology and the possibilities it opens will create more competition and freedom of choice in the higher end markets for the serverheads and rendermonkeys who need the heavy multiple CPU horsepower but until now have had little choice where x86 architecture was concerned.


    SAVE THE BATS

  • Do you suppose you could mix chips on a board?

    Start off with "cheap" K7s and gradually swap them for Alphas as the bank balance improves. Some sort of weird microkernel type o/s might be up to the job.... ]|->
  • Like most of the readers (AFAICT), I do believe that competition is a Good Thing[tm]. AMD does keep some pressure on Intel to keep the prices from getting non-artificially high, but is there any evidence that AMD is anything but Microsoft-friendly?

    Last I looked, AMD still advertised their CPUs as "Designed for MS-Windows". While Intel plays with the OS community to some extent, what does AMD, or their chipset partners such as VIA Inc., do?

    And what does the phrase "Designed for Windows" actually mean wrt. CPUs? Softwarewise it sounds quite ominous, especially if you once used DR-DOS, OS/2, Lotus 1-2-3...
  • Unless I totally missed the point and 3/4 of the article is exposition and education, I'm not impressed w/ CPUReview's take on the tech.


    There are a few interesting problems with the article but I think it's clear that the writer has no real clue about chip design or hardware in general. For instance, his estimation of the number of pins required to support address/data is off. Back in the bad old days of 8-bit 40-pin (and 16-bit 40-pin) devices, address and data lines were available on the same pins. The same can be done for i/o ports. The downside is that the speed of the HotRail chip would need to outstrip processor speed by 2. But the docs do mention up to a 1.6 ghz bus.

  • Yes, address and data can be multiplexed on the same pins. The author did mention this possibility, but then noted that it would take an extra clock cycle per access. Since the emphasis of this chipset is performance, he assumed (rightly or wrongly) that HotRail would not use multiplexed address/data pins. I'm not so sure about that assumption, though. Seems to me that the decrease in bandwidth contention might more than make up for the extra clock cycle needed for multiplexing. As long as the product is manufacturable, and there's a net gain in performance, it doesn't matter whether the design is the absolute fastest...
  • But I wanted ESS7 switching in my computer, not
    some outdated crossbar technology.
  • The address, data, and command lines can all be multiplexed on a single 64-bit wide bus. Recall that most transfers onto and off of the chip are going to be cache-line fills. Folks, these addresses are adjacent. You only need to know the starting address to know where the entire burst of data goes.

    What this means is that you can put out a single "Address With Command" on the bus, followed by a burst of N data items. You've lengthened the transfer by 1 cycle for N items. Suddenly, you've cut your number of lines per CPU port from 128 to 64, and your relative performance is (N + k) / (N + k + 1), where N is the number of items, and k is the overhead incurred just by initiating a bus access. All in all, the "+ 1" looks pretty tiny, doesn't it?

    Besides, ever hear of "pipeline burst cache" for L2?

    --Joe

    --
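    To put numbers on the (N + k) / (N + k + 1) estimate above, here is a minimal Python sketch. The function name and the value k = 2 are illustrative assumptions, not figures from HotRail or the article.

```python
def multiplexed_efficiency(burst_items, setup_overhead):
    """Relative throughput of a multiplexed bus versus a dedicated one.

    burst_items    -- N, data items transferred in one burst
    setup_overhead -- k, cycles spent just initiating the bus access
    The multiplexed bus spends one extra cycle sending the address.
    """
    dedicated_cycles = burst_items + setup_overhead
    multiplexed_cycles = dedicated_cycles + 1
    return dedicated_cycles / multiplexed_cycles


if __name__ == "__main__":
    k = 2  # hypothetical setup overhead, chosen only for illustration
    for n in (1, 2, 4, 8):
        print(f"N={n}: {multiplexed_efficiency(n, k):.3f} of dedicated-bus throughput")
```

    The penalty shrinks as the burst gets longer, which is the point of the comment: for cache-line fills the extra address cycle is amortized over the whole burst.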
  • Neither the article nor the previous posts brought this up...

    The K7 was designed from the ground up for SMP.

    If you doubt it, just look at the specs for the pins on top of the cartridge. They are expressly for multi-processor systems. I *think* they allow processors to be connected to each other DIRECTLY.
  • And what does the phrase "Designed for Windows" actually mean wrt. CPUs

    With the AMD processors, it simply meant "fully Intel x86 compatible" in a way that J. Random Luser could understand. Mr. Luser, a "Windows Magazine" subscriber, doesn't understand "x86 compatible", and Intel's lawyers would hammer AMD if it advertised its chips as "Intel compatible" or "Pentium Compatible". It also meant, "See? A big name supports us! We're not such a risky choice!"

    If you recall, the AMD486-100 had a "100 reasons" ad way back when ('93/94/95?). It repeated "Windows compatible" and "MS Office compatible" three or four times, but also mentioned that it was OS/2, DOS, NetWare, UNIX, WordPerfect, Lotus 1-2-3, and SmartSuite compatible.
  • The person who wrote this article clearly hasn't thought about crossbars
    very much. It's all about how hard it is to put a big crossbar on a
    single chip. But the author has apparently never dealt with an actual
    crossbar. I used to design supercomputers back in the '80s, and we had it
    even worse than they do nowadays, as far as how many pins you could get on
    a chip. And yet, we managed to design and even build big crossbars. How
    did we do it?

    We split the data path across multiple chips. The way this
    applies to the example at hand is to build a chip that can switch among 14
    ports, each only 16 bits wide. This will require a fairly reasonable 224
    pins on the chip. Then you gang these chips up, with one switching data
    bits 0 to 15, one switching 16 through 31, etc. To switch a
    128-bit-wide bus, you need 8 of these chips. You also need to design a
    control chip to look at the address and control lines to decide which
    processor to connect to which memory on each clock cycle. The control
    chip broadcasts identical switching instructions to all the data chips.

    This solution keeps you from having to multiplex pins, and it keeps you
    from having to build stupendous 1000-pin packages, and it keeps you from
    having to run busses at 800 MHz and turn your computer into a microwave
    oven. The only downside is that you need a set of nine chips, but on a
    motherboard that already has eight processors, that's not too bad.

    --Carl Feynman
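    The bit-slicing scheme described above can be sketched in a few lines of Python. The 14 ports, 16-bit slices, and 8 data chips come from the comment; the function names and the routing-table format (destination port mapped to source port) are assumptions made for illustration, and the control chip's address decoding is left out.

```python
SLICE_BITS = 16   # width switched by each data chip
NUM_SLICES = 8    # 8 chips x 16 bits = 128-bit data path
NUM_PORTS = 14    # ports per chip, so 14 x 16 = 224 data pins


def split_word(word):
    """Split a 128-bit word into eight 16-bit slices, low bits first."""
    mask = (1 << SLICE_BITS) - 1
    return [(word >> (i * SLICE_BITS)) & mask for i in range(NUM_SLICES)]


def join_slices(slices):
    """Reassemble a 128-bit word from its 16-bit slices."""
    word = 0
    for i, part in enumerate(slices):
        word |= part << (i * SLICE_BITS)
    return word


def slice_chip_switch(chip_inputs, routing):
    """One data chip: connect each destination port to its source port."""
    outputs = [None] * NUM_PORTS
    for dest, src in routing.items():
        outputs[dest] = chip_inputs[src]
    return outputs


def crossbar_cycle(port_words, routing):
    """One clock cycle: the same routing is broadcast to every data chip."""
    if len(port_words) != NUM_PORTS:
        raise ValueError("expected one 128-bit word per port")
    sliced = [split_word(word) for word in port_words]
    chip_outputs = []
    for chip in range(NUM_SLICES):
        chip_inputs = [sliced[port][chip] for port in range(NUM_PORTS)]
        chip_outputs.append(slice_chip_switch(chip_inputs, routing))
    return {
        dest: join_slices([chip_outputs[chip][dest] for chip in range(NUM_SLICES)])
        for dest in routing
    }


if __name__ == "__main__":
    words = [((port + 1) << 100) | port for port in range(NUM_PORTS)]
    result = crossbar_cycle(words, {8: 0, 9: 3})  # port 0 -> 8, port 3 -> 9
    print(result[8] == words[0], result[9] == words[3])
```

    Each data chip sees only its own 16 bits of every port, so no single package needs the full 128-bit width on every port; the cost is that all eight chips must receive identical switching instructions on every cycle.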
  • I haven't the time to read all the various posts on this subject, but this crossbar switch thing has been obvious to me since '74. It's already been done. Design your bus according to all that transmission line stuff you learned in skool and crank the living hell out of it. Multiplex, ATM-like. 500 MHz was doable in *1975* with ECL logic and a big power supply (current mode switching, still a good idea).
    GaAs gates have propagation times in the single digit PICOSECONDS! Use them!
    I leave the rest as an exercise to the observant reader.

