BigTux Shows Linux Scales To 64-Way 247
An anonymous reader writes "HP has been demonstrating a Superdome server running the STREAM and HPL benchmarks, showing that the standard 2.6 Linux kernel scales to 64 processors. Compiling the kernel didn't scale quite so well, but that was because it involves intermittent serial processing by a single processor. The article also notes that HP's customers are increasingly using Linux for enterprise applications, and getting more interested in using it on the desktop..."
So this time.. (Score:4, Funny)
Does it run Linux well?
Re:So this time.. (Score:5, Interesting)
The answers have to do with fine-grained locking of kernel services: contention between processors can be mitigated by splitting big locks into many smaller ones, in the hope that with a diverse set of locks, fewer of them are likely to be held at any given time. The other approach is designing interfaces that don't require locking kernel structures at all.
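A minimal sketch of the first idea, using userspace pthreads rather than real kernel code (the kernel's actual spinlocks and RCU are subtler than this):

    /*
     * Sketch only: fine-grained (per-bucket) locking of a hash table.
     * Two CPUs touching different buckets never contend; one table-wide
     * mutex would serialize every lookup instead.
     */
    #include <pthread.h>
    #include <string.h>

    #define NBUCKETS 64

    struct node {
        char key[32];
        int value;
        struct node *next;
    };

    static struct node *buckets[NBUCKETS];

    /* One lock per bucket (GCC range-designator extension). */
    static pthread_mutex_t bucket_lock[NBUCKETS] =
        { [0 ... NBUCKETS - 1] = PTHREAD_MUTEX_INITIALIZER };

    static unsigned hash(const char *key)
    {
        unsigned h = 5381;
        while (*key)
            h = h * 33 + (unsigned char)*key++;
        return h % NBUCKETS;
    }

    int lookup(const char *key)
    {
        unsigned b = hash(key);
        int value = -1;

        pthread_mutex_lock(&bucket_lock[b]);   /* contend only on bucket b */
        for (struct node *n = buckets[b]; n; n = n->next)
            if (strcmp(n->key, key) == 0) {
                value = n->value;
                break;
            }
        pthread_mutex_unlock(&bucket_lock[b]);
        return value;
    }

With 64 locks instead of one, the odds that two of 64 processors want the same lock at the same instant drop dramatically.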
At any rate, Amazon successfully powers their backend database with Linux/IA64 running on HP servers. YMMV, but if it's good for what most would consider the preeminent online merchant, it's probably good enough for you too.
Re:So this time.. (Score:3, Interesting)
Absolutely. This is why we should be wary of claims that have been made (and posted on Slashdot recently) that Linux 'scales to 512 or 1024 processors' (as in some SGI machines). This size of machine is only effective for very specialised software. A report that the kernel scales well to 64 processors is far more believable, and
Re:So this time.. (Score:2)
Specialized software is required to take advantage of any NUMA architecture. The code has to be tuned to take advantage of it. This is true even if you're only talking about 12 CPUs.
A claim that Linux scales well to 64 CPUs is no more or less believable than a claim that it scales to 1024. Both represent a scale of problem that few ever deal with.
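To make "tuned" concrete, here's a sketch of node-aware placement with the libnuma userspace API (compile with -lnuma; the node number and buffer size are arbitrary, and real tuning goes much further):

    /*
     * Sketch: pin a worker onto one NUMA node and give it node-local
     * memory, so it runs at local-memory latency rather than remote.
     */
    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this kernel\n");
            return 1;
        }

        int node = 0;
        numa_run_on_node(node);     /* schedule only on this node's CPUs */

        size_t len = 64 * 1024 * 1024;
        double *buf = numa_alloc_onnode(len, node);  /* node-local pages */
        if (!buf)
            return 1;

        /* ... work on buf at local-memory latency, not remote ... */

        numa_free(buf, len);
        return 0;
    }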
Re:So this time.. (Score:2)
> more or less believable than a claim that it scales to 1024.
Surely the lesser number is more believable!
Both represent a scale of problem that few ever deal with.
Large businesses like large numbers of processors. Some software scales naturally to this kind of setup: web and application servers, for example.
Re:So this time.. (Score:2)
Re:So this time.. (Score:2)
I disagree. First, the post I replied to said, about high numbers of processors, that "Both represent a scale of problem that few ever deal with." Well, web and app servers are very common!
Secondly, these types of application frequently make a lot of use of kernel services. They can require networking and/or disk activity to make use of databases or other forms of information storage, or to connect to o
Re:So this time.. (Score:2)
The database would be their only shared component and the only part of the system that has any place in this discussion. It's doing all the heavy lifting and concurrency management.
Re:So this time.. (Score:4, Funny)
Re:So this time.. (Score:2)
Pardon my ignorance, but... (Score:3, Interesting)
Re:Pardon my ignorance, but... (Score:2, Interesting)
Re:Pardon my ignorance, but... (Score:3, Insightful)
That type of processing is frequently called "embarrassingly parallel", and it's far more common than you seem to think. 3D rendering, and web serving that doesn't require writing to a database, can both be handled this way. There are also many categories of scientific data processing (think SETI@home) that work this way. The real reason this kind of SMP isn't interesting is that it's so easy you don't need fancy hardware like 64-way servers to take advantage of it. It can be farme
Re:Pardon my ignorance, but... (Score:3, Interesting)
As to your question: you are correct that every parallel computing job has some single-processor parts. Those who study parallel systems spend most of their time looking for ways to make sure that all processors are in use. Often an algorithm that is less than optimal for single-processor systems can use more processors, so a choice needs to be made.
The other major issue is communication time. An algorithm that depends on all the CPUs talking all the time may appear fast on paper, but it will be slower t
Re:Pardon my ignorance, but... (Score:2)
Note that the relative size of the serial part of a program is independent of the ease of parallelization. The size of the serial part determines how many processors can be committed to the task before reaching diminishing returns.
Geez... (Score:4, Funny)
And you?
Re:Geez... (Score:5, Funny)
Re:Geez... (Score:2)
I
-Chris
Hrmm (Score:5, Interesting)
Unisys
Fujitsu
HP
It looks like there might actually be a competitive marketplace for scalable multiprocessor Linux systems real soon now (if not already).
Re:Hrmm (Score:2)
Where's Microsoft?!?
(snicker)
Re:Hrmm (Score:2)
Re:Hrmm (Score:2)
Re:Hrmm (Score:2)
Re:Hrmm (Score:4, Interesting)
This is about an unmodified 2.6 kernel.
I have the articles at home (Linux Journal) about the SGI systems. First they do measurements on their systems, and then patch the bottlenecks in the kernel.
I don't think these patches can easily be put into a standard kernel.
excuse my ignorance (Score:2, Informative)
Re:excuse my ignorance (Score:3, Insightful)
As they say, if you have to ask, you don't need it.
The point for stuff like this isn't the number of programs that will support it, it's that you already have *one* program that not only supports it, but requires it.
Think weather modeling. It's a specialized application that requires massive CPU horsepower - and it's written specifically for the task at hand. This isn't something you'd pick up at Best Buy, or download from Freshmeat - it
Re:excuse my ignorance (Score:5, Informative)
Think about virtualization. I would love to have a 64-way system and break it up into 32 2-way systems or 16 4-way systems. It would make system management much easier. And with software, you could instantly assign more processors to a virtualized server that was being hit hard. So your 4-way DB can turn into an 8-way or 16-way DB in an instant. Once the load is gone, you set it back to a 4-way DB.
I personally still prefer to load balance many smaller servers to save costs. However, this could be an excellent option for some enterprises. I know where I work we have some big Sun boxes and we just add processors as we need. However, that has proven to be rather expensive and virtualizing could help save some big costs.
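For what it's worth, the lowest-level Linux mechanism behind this kind of reassignment is CPU hotplug. A minimal sketch, assuming a 2.6 kernel built with CONFIG_HOTPLUG_CPU and root privileges (a real partition manager or hypervisor does far more than flip this one knob):

    /*
     * Sketch: bring CPUs online/offline through the 2.6 CPU-hotplug
     * sysfs interface.
     */
    #include <stdio.h>

    static int set_cpu_online(int cpu, int online)
    {
        char path[64];
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/online", cpu);

        FILE *f = fopen(path, "w");
        if (!f)
            return -1;
        fprintf(f, "%d\n", online);
        return fclose(f);
    }

    int main(void)
    {
        /* Load spike: hand four more CPUs to this partition... */
        for (int cpu = 4; cpu <= 7; cpu++)
            set_cpu_online(cpu, 1);
        /* ...and give them back later with set_cpu_online(cpu, 0). */
        return 0;
    }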
Re:excuse my ignorance (Score:2)
Why would I want a 16-way processor in place of 8 dual processor boxes with a gigabit backbone network to them?
Re:excuse my ignorance (Score:5, Insightful)
I do agree that "big iron" is losing the power it once had, especially when one can cluster a bunch of much cheaper 2-way boxes.
Re:excuse my ignorance (Score:4, Interesting)
Take this with a grain of salt, because I was part of the group that developed the chipset for the first Superdome systems (PA-RISC). I'm probably a little biased.
A 64-way Superdome system is spread across sixteen plug-in system boards. (Imagine two refrigerators next to each other; it really is that big.) A partition is made up of one or more system boards. Within a partition, each processor has all of the installed memory in its address space. The chipset handled the details of getting cache blocks back and forth among the system boards.
That's a huge amount of memory to have directly addressable. Access is pretty fast, too.
Still, they were doubtless pretty expensive. HP-UX didn't allow for on-the-fly changes to partitions, but the chipset supported it. (The OS always lagged a bit behind. We built a chip to allow going above 64-way, but the OS just couldn't support it. A moral victory.) Perhaps Linux could get that support in place a little more quickly....
Re:excuse my ignorance (Score:5, Insightful)
I prefer to say "might" make systems management much easier. The problem with the One Big Box is the same whether it's Sun, HP, Linux, etc.:
Etcetera...of course, there are just as many if not more problems with the "we'll just build a giant cluster of 64 boxes and scale across it!" approach...I'll rant on that some other day.
It's all trade-offs. And no matter which way you go, you'll discover some truly ugly hidden costs that never seem to show up in those vendor white papers. And none of it works exactly the way it should or you'd like it to. But I'm not jaded or anything ;)
Re:excuse my ignorance (Score:2, Funny)
The answer, of course, is to have a hot spare E10K! Doesn't everyone have an extra one lying around?
Re:excuse my ignorance (Score:3, Informative)
Re:excuse my ignorance (Score:2)
A database server of size 1/n is a much more common, well-understood, and debugged problem than a database server of size n.
Re:excuse my ignorance (Score:5, Interesting)
In general, people use clusters of single or dual-processor systems, because many problems demand lots of processing but relatively little communication between processors. For example, ray-tracing involves a lot of processor churning, but the only I/O is getting the information in at the start and the image out at the end.
Databases are OK for this, so long as the data is relatively static (so you can do a lot of caching on the separate nodes and don't have to access a central disk much).
A 64-way superscalar system, though, is another thing altogether. Here, we're talking about some complex synchronization issues, but also the ability to handle much faster inter-processor I/O. Two processors can "talk" to each other much more efficiently than two ethernet devices. Far fewer layers to go through, for a start.
Not a lot of problems need that kind of performance. The ability to throw small amounts of data around extremely fast would most likely be used by a company looking at fluid dynamics (say, a car or aircraft manufacturer) because of the sheer number of calculations needed, or by someone who needed the answer NOW (fly-by-wire systems, for example, where any delay could result in a nice crater in the ground).
The problem is, most manufacturers out there already have plenty of computing power, and the only fly-by-wire systems that would need this much computing power would need military-grade or space-grade electronics, and there simply are no superscalar electronics at that kind of level. At least, not that the NSA is admitting to.
So, sure, there are people who could use such a system, but I cannot imagine many of them are in the market.
Re:excuse my ignorance (Score:4, Informative)
To whom do you think HP has been selling the SuperDome line? And to whom has Sun been selling the E10/12/15K?
One of the benefits of a huge multiprocessor Sun box, though, besides the massive number of CPUs you can have in a single frame under a single system image, is the ability to dynamically reconfigure resources, as a few other posters have touched on.
Imagine this... you have a box with 64 CPUs and 128GB of RAM. During the day, you have developers who are working with 16 CPUs and 32GB of RAM, working on the next generation of the database you'll be running for your business. A development domain.
You have another domain of 16 CPUs and 32GB as a test domain. Like when stuff goes out to beta, you run tests on the stuff you've pushed out from your development copy to see if it's ready for prime-time.
You have a third domain of 32 CPUs and 64GB in which you run production. It's a bit oversized for your needs for the work throughout the day, but it's capable of handling peak loads without slowing down.
Then, you have a nightly database job that runs, recalculating numbers for all the accounts, dumping data out to be sent to a reporting server somewhere, and batch data loads coming in that need to be integrated into your database. Plus you have to keep servicing a minimal number of requests from users throughout the night, but hey, nobody's really on between 10PM and 4AM.
Wouldn't it be nice to drop the dev and test databases down to maybe 4 CPUs if they're still running minimal tasks, and throw 56 CPUs and 112GB of RAM at your nightly batch jobs? They get almost the run of the machine... until you're done with the batch jobs. Then you shrink production back to half the machine, and boost test and dev back up to a quarter each... so everyone's happy when the day starts.
Re:excuse my ignorance (Score:3, Informative)
Having the entire engine on one multi-way motherboard wouldn't really gain you much, because none of the work you described needs tight interconnects.
Re:excuse my ignorance (Score:3, Informative)
Right, and that's exactly the situation where a single honking box is going to stomp any kind of cluster that's connected more loosely than what you get with a high-CPU-count multiprocessor box.
It all depends on how much interdependence in memory access there is between threads/processes (i.e. how well you can partition your data set to match your cluster topology). Often, it's a lot cheaper for a company to buy
Re:excuse my ignorance (Score:3, Informative)
I'd personally expect to see a system where common views of the data were cached locally, where the "authoritative" database was accessed via a SAN rather than the processor network, and where interprocessor communication was practically nil. There's not a whole lot that different threads would need to send to each other.
The whole point of SANs, "local busses" and other such technologies
Re:excuse my ignorance (Score:2)
Everyone is going to be blocking on the data that everyone else wants.
Slashdot hype and RTFA (Score:2)
> A 64-way system may or may not be useful. It depends on the speed of the interconnects, and the way it handles bus locking.
Of course it IS useful. It is great for database consolidation (especially for SQL Server which practically doesn't scale horizontally), for example, as upgrades can be done in minutes and the whole goddamned thing is as stable as an Intel box can be.
And in case you missed what the FA said, they did NOT run an OS on
Re:excuse my ignorance (Score:2)
Instead of taking a day to compile a complex map, it could be done in the time it takes to brew a jug of coffee :)
Threads vs. Processes (Score:5, Insightful)
Then, OTOH, Windows threads are very lightweight compared to the equivalent thread model in Linux. Benchmarks have shown that in multi-process setups, Unix is heavily favored, but in multi-threaded setups Windows comes out on top.
When it comes to multi-processors, is there a theoretical advantage to using processes vs threads? Leaving out the Windows vs Linux debate for a second, how would an OS that implemented very efficient threads compare to one that implemented very efficient processes?
Would there be a difference?
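One way to probe this empirically is to time raw spawn rates for both models. A rough sketch (compile with -lpthread; absolute numbers vary wildly by kernel, libc, and hardware):

    /*
     * Rough spawn-rate comparison: processes via fork() vs threads
     * via pthread_create().
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define N 10000

    static void *worker(void *arg) { return arg; }

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        double t0 = now();
        for (int i = 0; i < N; i++) {
            pid_t pid = fork();
            if (pid == 0)
                _exit(0);            /* child does nothing and exits */
            waitpid(pid, NULL, 0);
        }
        printf("%d processes: %.2f s\n", N, now() - t0);

        t0 = now();
        for (int i = 0; i < N; i++) {
            pthread_t t;
            pthread_create(&t, NULL, worker, NULL);
            pthread_join(t, NULL);
        }
        printf("%d threads:   %.2f s\n", N, now() - t0);
        return 0;
    }

Spawn rate is only one axis, of course; context-switch cost and memory sharing matter at least as much.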
Re:Threads vs. Processes (Score:2, Informative)
Re:Threads vs. Processes (Score:4, Informative)
Oh, and if you think the latest implementation of Linux threads is slower, especially slower than MS Windows, you are an idiot. Here are some tests from IBM: current Linux threads were spawning at more than 10,000 PER SECOND while MS Windows was spawning barely 6,000. Linux Thread performance [ibm.com], scroll down to the "pretty" graphs. Oh, and these numbers are higher than Solaris's. Linux threads and Linux processes spawn _MUCH_ faster than the best MS has to offer, and faster than Solaris.
Re:Threads vs. Processes (Score:5, Informative)
Dude, get with the times, LinuxThreads are obsolete. Kernel 2.6 / glibc 2.3 use NPTL, which launches new threads four times faster than LinuxThreads, allows you to have more than 8192 threads per process, doesn't require you to have lots of manager threads that don't do anything useful, delivers signals to threads as opposed to processes, and is actually more-or-less POSIX compliant.
I've been using NPTL on my workstation for 12 months, and I haven't looked back (except when early versions of Mono were incompatible with NPTL). You talk about "any _recent_ Linux thread" - but it looks like you are using a Debian Woody...
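If you're not sure which thread library your glibc is actually using, you can ask it at runtime (glibc-specific):

    /*
     * Prints e.g. "NPTL 2.3.x" on a 2.6/glibc-2.3 system, or
     * "linuxthreads-0.10" on older setups like Woody.
     */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[128];
        if (confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof buf) > 0)
            printf("threading: %s\n", buf);
        else
            printf("could not determine thread library\n");
        return 0;
    }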
welll isnt that sweet (Score:2)
Really, the key will be when the system scales to 128 processors and beyond.
Wow (Score:5, Funny)
Interesting. (Score:3, Insightful)
To be efficient, the processors would need gigantic caches, to keep the load on the rest of the system down. Either that, or you COULD run the CPUs out of step over a bus that is 64 times faster than normal. I'd hate to be the person designing such a system, though.
Now, this system could be of extreme interest in the supercomputer world. One of the biggest complaints about clustering is the poor interconnects. This would seem to get round that problem. A Blue Gene-style cluster where each node is a 64-way SMP board, and you're running a few thousand nodes, would likely be an order of magnitude faster than anything currently on the supercomputer charts.
On the other hand, do we need to know what the weather is not going to be, ten times as often?
Re:Interesting. (Score:5, Informative)
No, you have a misconception. On these REAL big iron systems, each CPU (or each group of a few CPUs) has its own memory and I/O busses.
So in that regard it is as good as a cluster, but then add the fact that they have global, cache-coherent shared memory and interconnects that shame any cluster.
The only advantage of a cluster is cost. Actually redundancy plays a role too, although less so with proper servers, as they have redundancy built in, and you can partition off the system to run multiple operating systems too.
> To be efficient, the processors would need gigantic caches, to keep the load on the rest of the system down. Either that, or you COULD run the CPUs out of step over a bus that is 64 times faster than normal. I'd hate to be the person designing such a system, though.
> Now, this system could be of extreme interest in the supercomputer world. One of the biggest complaints about clustering is the poor interconnects. This would seem to get round that problem. A Blue Gene-style cluster where each node is a 64-way SMP board, and you're running a few thousand nodes, would likely be an order of magnitude faster than anything currently on the supercomputer charts.
Not really. Check the world's second fastest supercomputer. It is a cluster of 20 512-way IA64 systems running Linux.
Re:Interesting. (Score:2)
So what you're saying is that this 64-way linux setup is behind the game
The specs (Score:2)
True, you can build very large clusters from these bricks, but the bricks themselves don't scale beyond a relatively small number of CPUs.
Re:The specs (Score:2)
Re:The specs (Score:2)
Now, if you ever log into one of our Altix or Origin systems, you'll see all 512p available to you when you cat
Re:You're an idiot. (Score:2)
Or maybe it's because I was building transputer arrays before you learned what a computer was. (For that matter, I've probably been programming clusters for longer than many Slashdotters have been alive.)
Am I arrogant because of that? Maybe, but I'd much rather express what I know. Do
Re:Interesting. (Score:5, Insightful)
The Altix uses 4-way CPU "bricks", along with networking and memory bricks, which you can then use to assemble a system. Yes, resources are visible globally, and it is a LOT faster than a PoP (pile-of-pcs) cluster using ethernet, but it is still a cluster of 4-way nodes.
It also doesn't avoid the main point, which is that any given resource can only be used by one CPU at a time. If processor A on brick B is passing data along wire C, then wire C cannot be handling traffic for any other processor at the same time. That resource is claimed, for that time.
When you are talking a massive cluster of hundreds or thousands of CPU bricks, it becomes very hard to efficiently schedule the use of resources. That's one reason such systems often have an implementation of SCP, the Scheduled Communications Protocol, where you reserve networking resources in advance. That way, it becomes possible to improve the efficiency. Otherwise, you run the risk of gridlock, which is just as real a problem in computing as it is on the streets.
Re:Interesting. (Score:2, Informative)
There is a world of difference between emulating it with the operating system / programming environment, and having hardware cache coherent global shared memory.
> The Altix uses 4-way CPU "bricks", along with networking and memory bricks, which you can then use to assemble a system. Yes, resources are visible globally, and it is a LOT faster than a PoP (pile-of-pcs) cluster using ethernet, but it is still a cluster of 4-way nodes.
Re:Interesting. (Score:2)
You mean, the way most people do? I don't know if there are accurate figures out there, but I'd be willing to bet that there are more MOSIX, OpenMOSIX, OpenSSI, and Beowulf clusters in production enterprise environments than there are Origins, Altixes, and Crays.
+0.5 (Score:2)
To maximize resources to the absolute limit, you'd need a completely asynchronous computer. Such computers exist, sure, but they're usually very specialized a
Re:Interesting. (Score:2, Interesting)
Are you a cluster salesman by chance?
A "big iron" system like one of these has exactly the same CPU-memory ratio as any cluster box - they are COMMODITY CPUs, you put 2-4 of them per bus in these big systems just as you put 2-4 of them on a bus in each
Re:Interesting. (Score:3, Interesting)
If you were to read more about Superdome, you would find that each set of 2 or 4 processors has its own memory and PCI I/O bus, comprising what is called a "cell".
The memory and I/O devices in a cell are accessible to all the other cells via a inte
Interesting. Almost exactly a year ago... (Score:3, Interesting)
Looks like someone was up to those challenges, eh? 64-processor support *and* 64-bit support. Awesome news.
Will we ever see (Score:3, Interesting)
I'm not talking about mere mortal SMP systems, I want all the crazy memory partitioning and whatnot.
Re:Will we ever see (Score:3, Interesting)
8-way multicore chips will be available within a year. Not exactly NUMA, but they'll probably have other nuances to keep you entertained.
But wait...we hate Itanium! (Score:2)
---
Posted as me for the negative karma whoring.
Read my lips (Score:5, Interesting)
http://www.sgi.com/features/2004/oct
The story should be that HP has finally caught up to where SGI was two years ago.
Re:Read my lips (Score:3, Interesting)
I am someone who might be in the market for a SGI Altix or XD1, but a very parallel broken box does not scale that well in my opinion.
Parallelizing a C compilation (Score:2)
Really? I would have thought that the compilation of loads and loads of .c files is exactly the sort of thing that could be shared among processors. It certainly has been on projects that I've worked on.
make -j (num of processors) ?
Re:A little factoid for you (Score:2)
Re:A little factoid for you (Score:3, Insightful)
It took 19 minutes to compile with a single CPU, and the 64-processor machine was 26x faster. Does that equate to about 43 seconds for a kernel compile? It'd probably take longer than that just to untar/unbzip2 the source, since that would be running on only 2 CPUs (one process for tar, one for bzip2).
Re:A little factoid for you (Score:2)
Re:A little factoid for you (Score:5, Informative)
Re:A little factoid for you (Score:2)
This is 5*16. They said it wrong.
Re:A little factoid for you (Score:5, Insightful)
First of all, a 26x speedup is GOOD. That said, if you are trying to use a cluster of 64 Itanium 2 processors to compile things, you're an idiot. IIRC, the long pipeline and VLIW, highly scheduled architecture of the Itanium 2 make it bad at compiling. You could get that performance with cheaper Athlon 64s or Xeons. Not only that, but compiling one thing will ALWAYS be partly serial. Now if they were to compile multiple things (say 3 kernels, or the kernel, X, and KDE) at the same time, they should see closer to that 64x speedup. It's all about how much you can make parallel.
Which is something else. If you were to give that same machine a better-suited application, it WOULD give you near 64x performance. If you used it to batch convert WAVs to MP3s, or RAW images to JPEGs, or MPEG4 to DivX, or even just raytrace images (all things where no part is dependent on another part, so they are highly parallelizable), things will go great. In the article, they give the example of a bandwidth benchmark where the bandwidth scales almost perfectly with the number of processors they throw at it.
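A minimal sketch of that kind of batch farming with fork/exec; "wav2mp3" is a made-up stand-in for whatever converter you actually run:

    /*
     * Embarrassingly parallel batch job: fork one converter per input
     * file, capped at one job per CPU, and reap children as they finish.
     */
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int max_jobs = 64;          /* one job per CPU on a 64-way box */
        int running = 0;

        for (int i = 1; i < argc; i++) {
            if (running == max_jobs) {      /* all CPUs busy: wait */
                wait(NULL);
                running--;
            }
            if (fork() == 0) {
                execlp("wav2mp3", "wav2mp3", argv[i], (char *)NULL);
                _exit(127);                 /* exec failed */
            }
            running++;
        }
        while (wait(NULL) > 0)              /* reap the stragglers */
            ;
        return 0;
    }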
PS: Interesting fact I saw the other day. The human brain can only do about 200 operations per second, which is why computers are much faster at math. But the brain can do MILLIONS of things at once. So while it may only be able to process the image from our eyes at 200 "operations" per second, it does that for the millions of little bits of information all at once, which is why people are so good at visual things, pattern matching, chess, etc. Just FYI.
Re:A little factoid for you (Score:5, Informative)
While FreeBSD is a great OS/kernel, it doesn't scale as well as Linux, end of story.
Huh? What crack are you smoking? Here is the comparison of MS's latest and greatest Windows 2003 server editions [microsoft.com]. So, umm, where is this double of what Linux supports? Plain vanilla Linux 2.6 can do 64-way no problem. Actually, SGI has had a single-image 128-way Linux system out for a while. They should have a 256-way, single-image Linux system out soon. That is more than MS can even touch. Maybe do some research before you just shoot off FUD.
Re:A little factoid for you (Score:3, Informative)
Re:A little factoid for you (Score:2)
Re:A little factoid for you (Score:2)
Well I hope the article is wrong concerning how long it took to compile that kernel using a single processor Itanium 2...19 min?
Thu Jan 13 03:22:14 MST 2005
Thu Jan 13 03:41:52 MST 2005
make buildworld = 19.75 min
Tue Jan 18 21:32:08 MST 2005
Tue Jan 18 21:35:54 MST 2005
make buildkernel = 4 min
That is about 24 minutes to completely rebuild FreeBSD 4.11 from source.
This is a P4 2.55 (no HTT) with on 1GB ram and PATA disks running Fre
Re:A little factoid for you (Score:2)
Comparing the Itanium compile times to anything is just stupid. I can compile my Linux kernel on a P4 or AMD _much_ faster than 19 minutes.
Re:A little factoid for you (Score:2)
But a 5 times speed increase for me running a machine with a load with ATA disks?
If the pipeline clears/stalls are that bad (even with their massive L3 cache, 1.5MB to 9MB), it looks like the Itaniums are really only good for number crunching and not much else.
BWP
Re:A little factoid for you (Score:2)
Re:A little factoid for you (Score:4, Informative)
> Until you start talking about double that amount of procs, which is what Windows Server does these days
Wrong. Windows Server 2003 supports a maximum of only 64 processors [microsoft.com], and I believe it was significantly tested only on 32-way and smaller machines.
Re:A little factoid for you (Score:4, Interesting)
Re:A little factoid for you (Score:2, Funny)
They did try Windows Server 2003 on a 64-way machine, but the kernel got scared and hid under the disk controller.
Re:A little factoid for you (Score:2)
Re:A little factoid for you (Score:5, Insightful)
Someone wasn't awake when their Comp Sci class covered Amdahl's Law. Or the Dining Philosophers Problem. Or vector processing. Or networking. Or the parallelization problem. Or...
Actually, the troll can be made to serve a useful purpose, because there are probably a lot of people who read Slashdot who didn't do Comp Sci.
Part of the problem with parallelization is that not all problems can be divided up that way. If one man takes 60 seconds to dig a posthole, how long would it take 60 men to dig a single posthole? Answer - 60 seconds. Exactly the same amount of time is spent, because only one person can be digging the posthole at a time. Having more people doesn't help.
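Amdahl's Law turns the posthole intuition into numbers: with serial fraction s, the best possible speedup on n processors is 1/(s + (1-s)/n). A quick sketch checking that against the 26x-on-64-CPUs compile figure from elsewhere in this thread (the 2.3% serial fraction is a back-of-envelope fit, not a measured value):

    /*
     * Amdahl's Law: speedup(n) = 1 / (s + (1 - s) / n) for serial
     * fraction s. s = 0.023 roughly reproduces the reported 26x on
     * 64 CPUs.
     */
    #include <stdio.h>

    static double amdahl(double s, int n)
    {
        return 1.0 / (s + (1.0 - s) / n);
    }

    int main(void)
    {
        double s = 0.023;
        for (int n = 1; n <= 1024; n *= 4)
            printf("%4d CPUs -> %5.1fx speedup\n", n, amdahl(s, n));
        /* Note the ceiling: as n grows, speedup approaches 1/s ~ 43x. */
        return 0;
    }

So even a 2.3% serial portion caps you at about 43x, no matter how many processors you add.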
Another part of the problem is sharing resources. Let's say you have some computer memory that can respond to a read operation in one clock cycle. Let's also say that the program never reads data from memory, so its only memory traffic is instruction fetches. (Very unlikely.) The first processor fetches an instruction (which is a read operation) and then executes it. The second processor can't do anything while the first one is reading, so it has to wait until the first has finished with that part before it can do a read of its own.
If the instruction takes 1 clock cycle to execute, then the first processor will be ready after the second one has performed its fetch. In which case, you will be running the memory flat-out with just 2 processors. Any more than that, and the system will actually slow down, because the processors will have to wait.
Likewise, if the average time to run an instruction is N clock cycles, you will (on average) be able to have N+1 processors, before the memory is maxed out.
In practice, processors run about an order of magnitude faster than RAM, which is why modern systems have lots of L1 and L2 cache (and sometimes L3), pipelining, etc. These are all tricks to try and access the somewhat slower main memory as little as possible.
Also in practice, programmers try to avoid "expensive" (in terms of clock cycles) operations because you can generally get the same results faster by other means. (That's why RISC technology became popular - make the fast operations faster, rather than adding stuff that people will try to avoid.)
In consequence, sharing resources is a very difficult problem. It is not the only problem that many-way systems face, though. If you have N processors, there are N*(N-1)/2 possible pairs that may want to communicate. In this case, that's 64*63/2 = 2,016 pathways. You couldn't have one dedicated link per pathway, for example, which means you've got to share links, which means you've got to have some damn good scheduling and routing mechanisms. Even then, with limited resources, you can only have so many processors talking at a time before you are overwhelmed. Which means that "chatty" problems will involve a lot of processors spending a lot of time simply waiting for their turn to chat.
(This goes back to why people generally build clusters, rather than many-way SMP systems, and why high-end clusters use the fastest networking technology on the planet. Clustering is easy. Getting the communication speeds up is the problem. Getting communication speeds to the point of being useful for scientific applications is a very complex, expensive problem. Which is the main reason Mr. Cray charged more than Mr. Dell for his computers - and why people would pay it.)
Re:A little factoid for you (Score:2)
Re:64 Itanics! (Score:4, Funny)
That should be enough for anybody :)
Re:Grrr...64 processors? (Score:2)
Re:Grrr...64 processors? (Score:2)
Re:Grrr...64 processors? (Score:2)
Re:Wow... (Score:2, Insightful)
There's a reason you pay so much more per CPU for an SMP or NUMA system, and it ain't for network services.
11?!? (Score:3, Interesting)
How'd you get a three processor system? Is it a quad board, discounted heavily because one socket was broken? That'd be neat, where'd you get it?
Re:11?!? (Score:2)
~phil
Explaining jokes (Score:2)
I guess the two Spinal Tap explainers never heard the joke about only 10 people in the world. No, I'm not going to explain that.
This is pathetic. Insightful, jeez. Now watch someone mod this as flamebait or funny.
Re:The SGI Altix is scaling to 256 cpus... (Score:5, Informative)
The SGI boxes are nothing like the stock kernel.
Re:The SGI Altix is scaling to 256 cpus... (Score:2)
SCO?
Joke, joke! *exit pursued by villagers with burning torches*
TWW
Re:Mandatory... (Score:3, Funny)
Two mod points if you can work a good goatse or overlord joke into this topic. Although, the thought of a 64-way goatse overlord gives me the jeebies.
Re:I work on a SuperDome (Score:2, Funny)
You must have high ceilings in your office.
Re:I work on a SuperDome (Score:2)