8-way SMP chipset for K7
Bill Henning writes "For those of you interested, HotRail has announced an 8-way SMP chipset for the K7 that uses crossbar switching to improve SMP performance. Read more in this article." So much speed, so many RC5 keys to crack.
Re:How crossbars work, really. (Score:1)
Other I/O can be handled in the same way, of course. Running a PCI serial bridge is easy by comparison with the memory path because PCI tolerates much more latency than memory does.
Re:Cache thrash between the SMPs (Score:1)
We already have (uni-)processors capable of executing 4 instructions at once. It's just that the chip jocks seem to be making DOS CPUs that run one instruction stream at a time. There's no reason those 4 instructions couldn't come from separate threads; then L1 cache sharing is implicit. We should have something on-chip by '01 if we're lucky and Kindly Uncle Ben keeps the furnaces going. Not cheap, but a first step.
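A toy model of the idea (filling up to 4 issue slots per cycle from several threads instead of one) might look like the Python sketch below. It is an illustration under stated assumptions, not a description of any real chip: `cycles_to_finish` and its parameters are hypothetical, instructions are assumed independent apart from a fixed per-thread limit on how many can issue per cycle, and caches, branches, and latencies are ignored. Hardware along these lines is usually called simultaneous multithreading (SMT).

```python
def cycles_to_finish(thread_lengths, issue_width=4, per_thread_ilp=1):
    """Count cycles needed to drain every thread's instruction stream.

    Hypothetical toy model: each cycle has `issue_width` slots, and a thread can
    fill at most `per_thread_ilp` of them (a stand-in for limited
    instruction-level parallelism). Threads are polled round-robin, and the
    starting thread rotates every cycle so no thread is starved.
    """
    if issue_width < 1 or per_thread_ilp < 1:
        raise ValueError("issue_width and per_thread_ilp must be >= 1")
    if any(n < 0 for n in thread_lengths):
        raise ValueError("thread lengths must be non-negative")
    remaining = list(thread_lengths)  # instructions left in each thread
    n_threads = len(remaining)
    cycles = 0
    start = 0
    while any(remaining):
        free_slots = issue_width
        for offset in range(n_threads):
            if free_slots == 0:
                break
            t = (start + offset) % n_threads
            issued = min(per_thread_ilp, remaining[t], free_slots)
            remaining[t] -= issued
            free_slots -= issued
        start = (start + 1) % n_threads
        cycles += 1
    return cycles


if __name__ == "__main__":
    # One thread, one independent instruction per cycle: 3 of 4 slots sit idle.
    print(cycles_to_finish([100]))     # 100
    # The same 100 instructions split across 4 threads fill every slot.
    print(cycles_to_finish([25] * 4))  # 25
```

The point of the model is only the ratio: the 4-wide core is the same in both runs, and the extra threads are what keep its slots busy.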
Re:Funny I was just thinking that... (Score:1)
Demand is relative. Volume is concentrated in uniprocessor low-end boxes: demand with a capital D. Makes sense to throw R&D money there.
Projections for these umpteen-way SMP and switched systems claim the big market is in "servers". I really don't see why servers need to be so tightly (and expensively) coupled rather than networked over 100BaseT or something similar, because the SMP stuff starts needing fire-breathing PCI-X-and-beyond I/O bus(es!) to feed it, plus hot-swap cards and redundant power supplies. These are all expensive items!
One can set up a loose cluster server where one box can be smashed to bits and the others carry on transparently. Just plain, simple $500 specials. Maybe SMP is good for other things, but the marketing announcements always say "servers", not ray tracing or particle physics. If the "server" part is factored out, I wonder what the demand really is.
Making an affordable consumer 4-way or 8-way SMP box would be tough. The whole thrust of mass user PCs has been getting rid of those glue chips, those card slots, those cables.
There are other (tightly-coupled) ways to handle more than one instruction stream at once than wiring discrete CPUs with gruesome $$$ bus systems and arbitration logic.
Cache thrash between the SMPs (Score:1)
Suppose Processor A is a producer and Processor B is a consumer. The only way they can communicate is via shared memory, unless, like a transputer, you have some other explicit CPU-to-CPU communication mechanism.
What happens is that Processor A ends up caching up a whole bunch of data that Processor B needs. The way Processor B ends up getting it is by accessing external memory, triggering Processor A to do its cache writebacks, and in the meantime stalling both Processor A and B.
Ick.
So, while the whole process is transparent to the programmer (at least insofar as the entire memory map is coherent, regardless of what CPU you're on), the whole cache-snooping business can kill performance.
For decent SMP performance in this scenario, Processor A really needs to manually flush the portions of its cache that Processor B needs (or instead use write-through, rather than write-back, in which case cache invalidation is all that is necessary). Since you're explicitly manipulating the cache, you're no longer transparent.
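The difference between the three cases Joe describes can be counted with a toy Python model. It is a sketch under stated assumptions: `simulate_handoff` and the policy names are hypothetical, a "line" is just an integer, only A's cache is modeled, and a snoop stall is counted whenever B reads a line that is still dirty in A's cache.

```python
from typing import Dict, Iterable

POLICIES = ("write_back", "write_back_flush", "write_through")


def simulate_handoff(written: Iterable[int], read: Iterable[int], policy: str) -> Dict[str, int]:
    """Toy model: Processor A stores to cache lines, then Processor B reads them.

    Returns two counts:
      memory_writes: writes A sends to memory on its own schedule
      snoop_stalls:  B's reads that hit a dirty line in A's cache, forcing
                     A to write it back while both CPUs wait
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    dirty_in_a = set()  # line numbers A currently holds dirty
    memory_writes = 0
    for line in written:
        if policy == "write_through":
            memory_writes += 1        # every store goes straight to memory
        else:
            dirty_in_a.add(line)      # store is absorbed by A's write-back cache
    if policy == "write_back_flush":
        memory_writes += len(dirty_in_a)  # A explicitly flushes before B starts
        dirty_in_a.clear()
    snoop_stalls = 0
    for line in read:
        if line in dirty_in_a:
            snoop_stalls += 1         # snoop hit: A writes the line back, both stall
            dirty_in_a.discard(line)  # the line is clean after the writeback
    return {"memory_writes": memory_writes, "snoop_stalls": snoop_stalls}


if __name__ == "__main__":
    lines = range(8)
    for policy in POLICIES:
        print(policy, simulate_handoff(lines, lines, policy))
```

With eight lines produced and consumed, plain write-back leaves B with 8 snoop stalls, while an explicit flush or write-through moves those 8 writes into A's own time and leaves B with none. The model ignores B's side: under write-through, B would still need to invalidate any stale copies of those lines in its own cache.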
(Or, if you're a DSP or some other device with onchip SRAM instead-of/in-addition-to cache, you just DMA all the data where it needs to be in the background, and keep processing without missing a beat... :-) ) --Joe
And which OS would be used? (Score:1)
WinNT tops out at 2 CPUs, or 4 if you believe MS (or really, it just plain craps out, period).
OS/2 Server is optimized for 8 CPUs and handles up to 64 CPUs.
How many CPUs will Linux handle? Is the current kernel only good for 2 CPUs?
Funny I was just thinking that... (Score:1)
If Intel are going to SMP-disable the Celeron, they're obviously not too interested in this market themselves. That doesn't make any sense to me, since SMP means selling more CPUs to the customer, and the guy who's going to buy a dual Celeron system is unlikely to fork out the cash for a dual PIII instead; it's too expensive.
I've got to think there'd be plenty of demand for a 4- or 8-way SMP box using $100 processors that could start out cheap with a single CPU and just be upgraded. Seeing as Intel don't seem to want to support *cheap* SMP, AMD could step right in...
Too bad the Transputer never took off - that was a CPU *WAY* ahead of its time...
Nifty stuff. (Score:1)
That I like.
Although, I do wonder about the synchronization issues...
Enough of my ramblings...christ, it's 5am... *sighs*
When? (Score:2)
Also, since CmdrTaco mentioned it, are there any estimates for the K7's RC5 speed?
Alex Bischoff
---
Finally a non-Intel SMP solution for x86... (Score:2)
Personally, despite some of the valid criticism people will make of this technology, I am just pleased about what it means for the processor scene: at last, multiprocessor x86 systems can be built with non-Intel CPUs!
Now personally, yes, I have always been a big fan of AMD; I have several machines at home running on K6-series chips right now, but AMD zeal is not why this makes me happy. This makes me happy simply because I want to see more competition among chips in the higher-end x86 system market. For years, builders of typical home and office single-user PCs have had choices and competition, which has driven down prices. Hopefully this kind of technology, and the possibilities it opens, will create more competition and freedom of choice in the higher-end markets for the serverheads and rendermonkeys who need heavy multiple-CPU horsepower but until now have had little choice where the x86 architecture was concerned.
SAVE THE BATS
Re:x86.....still trying to push it! (Score:1)
Start off with "cheap" K7s and gradually swap them for Alphas as the bank balance improves. Some sort of weird microkernel-type OS might be up to the job.... ]|->
AMD still "Designed for MS-Windows"? (Score:1)
Last I looked, AMD still advertised their CPUs as "Designed for MS-Windows". While Intel plays with the OS community to some extent, what does AMD, or their chipset partners such as VIA Inc., do?
And what does the phrase "Designed for Windows" actually mean wrt. CPUs? Softwarewise it sounds quite ominous, especially if you once used DR-DOS, OS/2, Lotus 1-2-3...
CPUReview is a hardware clue-free zone? (Score:1)
There are a few interesting problems with the article, but I think it's clear that the writer has no real clue about chip design or hardware in general. For instance, his estimate of the number of pins required to support address/data is off. Back in the bad old days of 8-bit 40-pin (and 16-bit 40-pin) devices, address and data lines were multiplexed onto the same pins. The same can be done for I/O ports. The downside is that the HotRail chip would need to run at twice the processor speed. But the docs do mention up to a 1.6 GHz bus.
Crossbar! (Score:1)
Some outdated crossbar technology.
NOT an extra clock cycle per access. (Score:1)
The address, data, and command lines can all be multiplexed on a single 64-bit wide bus. Recall that most transfers onto and off of the chip are going to be cache-line fills. Folks, these addresses are adjacent. You only need to know the starting address to know where the entire burst of data goes.
What this means is that you can put out a single "Address With Command" on the bus, followed by a burst of N data items. You've lengthened the transfer by 1 cycle for N items. Suddenly, you've cut your number of lines per CPU port from 128 to 64, and your throughput relative to a bus with separate address lines is (N + k) / (N + k + 1), where N is the number of items and k is the overhead incurred just by initiating a bus access. All in all, the "+ 1" looks pretty tiny, doesn't it?
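Joe's formula is easy to tabulate. The Python sketch below simply evaluates (N + k) / (N + k + 1); the function name is hypothetical, and the burst lengths are illustrative rather than taken from the K7 or HotRail specs. Leaving k at 0 is the worst case for the "+ 1", since any fixed overhead makes the extra cycle proportionally smaller.

```python
from fractions import Fraction


def multiplexed_efficiency(n_items: int, k_overhead: int = 0) -> Fraction:
    """Throughput of a multiplexed address/data bus relative to separate lines.

    With separate address lines a burst takes n_items + k_overhead cycles;
    multiplexing adds one cycle up front for the "Address With Command" word.
    """
    if n_items < 1 or k_overhead < 0:
        raise ValueError("need n_items >= 1 and k_overhead >= 0")
    return Fraction(n_items + k_overhead, n_items + k_overhead + 1)


if __name__ == "__main__":
    # Illustrative burst lengths, k_overhead = 0.
    for n in (1, 4, 8):
        eff = multiplexed_efficiency(n)
        print(f"N={n}: {eff} ({float(eff):.0%})")
    # N=1: 1/2 (50%)
    # N=4: 4/5 (80%)
    # N=8: 8/9 (89%)
```

A lone single-item transfer pays the full factor of 2, but a cache-line burst keeps most of the throughput while using half the lines per port.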
Besides, ever hear of "pipeline burst cache" for L2?
--Joe--
Ya'll missed something (Score:1)
The K7 was designed from the ground up for SMP.
If you doubt it, just look at the specs for the pins on top of the cartridge. They are expressly for multi-processor systems. I *think* they allow processors to be connected to each other DIRECTLY.
Re:AMD still "Designed for MS-Windows"? (Score:1)
With the AMD processors, it simply meant "fully Intel x86 compatible" in a way that J. Random Luser could understand. Mr. Luser, a "Windows Magazine" subscriber, doesn't understand "x86 compatible", and Intel's lawyers would hammer AMD if it advertised its chips as "Intel compatible" or "Pentium Compatible". It also meant, "See? A big name supports us! We're not such a risky choice!"
If you recall, the AMD486-100 had a "100 reasons" ad way back when ('93/94/95?). It repeated "Windows compatible" and "MS Office compatible" three or four times, but also mentioned that it was OS/2, DOS, NetWare, UNIX, WordPerfect, Lotus 1-2-3, and SmartSuite compatible.
How crossbars work, really. (Score:2)
very much. It's all about how hard it is to put a big crossbar on a single chip. But the author has apparently never dealt with an actual crossbar. I used to design supercomputers back in the '80s, and we had it even worse than they do nowadays as far as how many pins you could get on a chip. And yet, we managed to design and even build big crossbars. How did we do it?
We split the data path across multiple chips. Applied to the example at hand, that means building a chip that can switch among 14 ports, each only 16 bits wide. This will require a fairly reasonable 224 pins on the chip (14 × 16). Then you gang these chips up, with one switching data bits 0 to 15, one switching bits 16 to 31, and so on. To switch a 128-bit-wide bus, you need 8 of these chips. You also need to design a control chip that looks at the address and control lines to decide which processor to connect to which memory on each clock cycle. The control chip broadcasts identical switching instructions to all the data chips.
This solution keeps you from having to multiplex pins, it keeps you from having to build stupendous 1000-pin packages, and it keeps you from having to run busses at 800 MHz and turn your computer into a microwave oven. The only downside is that you need a set of nine chips, but on a motherboard that already has eight processors, that's not too bad.
--Carl Feynman
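Carl's bit-slicing scheme can be checked with a small Python model. It is a sketch under stated assumptions: the port numbers, routing table, and function names are hypothetical, each cycle is treated as a one-way transfer from source ports to destination ports, and only data pins are counted (control, power, and ground pins are ignored). Each slice chip forwards only its own 16-bit chunk, yet because every chip receives the same routing from the control chip, the reassembled 128-bit words match what one monolithic crossbar would deliver.

```python
from typing import Dict, List, Tuple


def chip_budget(ports: int, bus_bits: int, slice_bits: int) -> Tuple[int, int]:
    """Data pins per slice chip and total chip count (slices plus one control chip)."""
    if bus_bits % slice_bits:
        raise ValueError("bus width must be a multiple of the slice width")
    return ports * slice_bits, bus_bits // slice_bits + 1


def slice_word(word: int, n_slices: int, slice_bits: int) -> List[int]:
    """Cut a wide word into n_slices chunks, lowest-order bits first."""
    mask = (1 << slice_bits) - 1
    return [(word >> (i * slice_bits)) & mask for i in range(n_slices)]


def join_slices(parts: List[int], slice_bits: int) -> int:
    """Reassemble chunks produced by slice_word()."""
    word = 0
    for i, part in enumerate(parts):
        word |= part << (i * slice_bits)
    return word


def switch_cycle(inputs: Dict[int, int], routing: Dict[int, int],
                 bus_bits: int = 128, slice_bits: int = 16) -> Dict[int, int]:
    """One clock of a bit-sliced crossbar (words wider than bus_bits are truncated).

    inputs:  source port -> word driven this cycle
    routing: destination port -> source port, as decided by the control chip
    """
    n_slices = bus_bits // slice_bits
    sliced = {src: slice_word(w, n_slices, slice_bits) for src, w in inputs.items()}
    out = {dst: [0] * n_slices for dst in routing}
    for chip in range(n_slices):          # each slice chip works independently...
        for dst, src in routing.items():  # ...but all receive the same routing
            out[dst][chip] = sliced[src][chip]
    return {dst: join_slices(parts, slice_bits) for dst, parts in out.items()}


if __name__ == "__main__":
    print(chip_budget(ports=14, bus_bits=128, slice_bits=16))  # (224, 9)
    cpu_word = 0x0123456789ABCDEF_FEDCBA9876543210
    # Hypothetical ports: 0 is a CPU, 9 is a memory bank; swap their words.
    assert switch_cycle({0: cpu_word, 9: 42}, {9: 0, 0: 9}) == {9: cpu_word, 0: 42}
```

Because the slices never need to talk to each other, adding bus width means adding slice chips rather than adding pins to any one chip.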
Hey! Tune the bus and timeslice it! (Score:1)
GaAs gates have propagation times in the single-digit PICOSECONDS! Use them!
I leave the rest as an exercise to the observant reader.