
BigTux Shows Linux Scales To 64-Way (247 comments)

An anonymous reader writes "HP has been demonstrating a Superdome server running the STREAM and HPL benchmarks, showing that the standard 2.6 Linux kernel scales to 64 processors. Compiling the kernel didn't scale quite so well, but that was because it involves intermittent serial processing by a single processor. The article also notes that HP's customers are increasingly using Linux for enterprise applications and are getting more interested in using it on the desktop..."
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • by puntloos ( 673234 ) on Tuesday January 18, 2005 @11:32PM (#11404620) Journal
    The age-old Slashdot question should read:

    Does it run Linux well?

    • Re:So this time.. (Score:5, Interesting)

      by ikewillis ( 586793 ) on Tuesday January 18, 2005 @11:43PM (#11404687) Homepage
      This is the real question that is oft ignored. There is far too great an emphasis on being able to manage n CPUs rather than on how effectively kernel services operate on n CPUs.

      The answers have to do with fine-grained locking of kernel services, splitting resources across many locks in the hope that fewer of them will be contended at any given moment, or with designing interfaces that don't require locking kernel structures at all.
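
      To make the locking point concrete, here is a toy userspace sketch using POSIX threads (not actual kernel code; the table size and names are made up). With one global mutex, all 64 CPUs would serialize on a single lock; with a lock per hash bucket, two CPUs only contend when they touch the same bucket:

        #include <pthread.h>
        #include <string.h>

        #define NBUCKETS 256

        struct entry {
            struct entry *next;
            char          key[32];
            long          value;
        };

        struct bucket {
            pthread_mutex_t lock;   /* one lock per bucket, not one for the whole table */
            struct entry   *head;
        };

        static struct bucket table[NBUCKETS];

        void table_init(void)
        {
            for (int i = 0; i < NBUCKETS; i++)
                pthread_mutex_init(&table[i].lock, NULL);
        }

        static unsigned bucket_of(const char *key)
        {
            unsigned h = 5381;
            while (*key)
                h = h * 33 + (unsigned char)*key++;
            return h % NBUCKETS;
        }

        /* Fine-grained lookup: only the relevant bucket is locked, so lookups
         * of different keys on different CPUs can proceed in parallel.        */
        long lookup(const char *key)
        {
            struct bucket *b = &table[bucket_of(key)];
            long val = -1;

            pthread_mutex_lock(&b->lock);
            for (struct entry *e = b->head; e != NULL; e = e->next)
                if (strcmp(e->key, key) == 0) {
                    val = e->value;
                    break;
                }
            pthread_mutex_unlock(&b->lock);
            return val;
        }

      Mechanisms like RCU go a step further and let readers take no lock at all, which is the "interfaces that don't require locking" case.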

      At any rate, Amazon successfully powers their backend database with Linux/IA64 running on HP servers. YMMV, but if it's good enough for what most would consider the preeminent online merchant, it's probably good enough for you too.

      • Re:So this time.. (Score:3, Interesting)

        by Decaff ( 42676 )
        This is the real question that is oft ignored. There is far too great an emphasis on being able to manage n CPUs rather than on how effectively kernel services operate on n CPUs.

        Absolutely. This is why we should be wary of claims that have been made (and posted on Slashdot recently) that Linux 'scales to 512 or 1024 processors' (as in some SGI machines). This size of machine is only effective for very specialised software. A report that the kernel scales well to 64 processors is far more believable, and
        • That is a truism.

          Specialized software is required to take advantage of any NUMA architecture; the code has to be tuned for it. This is true even if you're only talking about 12 CPUs.

          A claim that Linux scales well to 64 CPUs is no more or less believable than a claim that it scales to 1024. Both represent a scale of problem that few ever deal with.
          • A claim that Linux scales well to 64 CPUs is no more or less believable than a claim that it scales to 1024.


            Surely the lesser number is more believable!

            Both represent a scale of problem that few ever deal with.

            Large businesses like large numbers of processors. Some software scales naturally to this kind of setup: web and application servers, for example.
            • Web and java appservers are the IT equivalent of a renderfarm. They have no place in this discussion.
              • Web and java appservers are the IT equivalent of a renderfarm. They have no place in this discussion.

                I disagree. First, the post I replied to said, of high numbers of processors, that "Both represent a scale of problem that few ever deal with." Well, web and appservers are very common!

                Secondly, these types of application frequently make a lot of use of kernel services. They can require networking and/or disk activity to make use of databases or other forms of information storage, or to connect to o
                • Webservers and java servers don't share any common resources. Much like a renderfarm, you can split the problem onto as many single or dual cpu systems as you like with no scaling penalty.

                  The database would be their only shared component and the only part of the system that has any place in this discussion. It's doing all the heavy lifting and concurrency management.
    • by Trejkaz ( 615352 ) on Tuesday January 18, 2005 @11:58PM (#11404771) Homepage
      I'd rather know if it can run Longhorn...
    • It's about time Linux was ported to the Commodore-64. They said it couldn't handle all 64 Kilobytes of memory, but this shows otherwise.
  • by wizard_of_wor ( 849406 ) on Tuesday January 18, 2005 @11:34PM (#11404636)
    What parallel-computing activity doesn't involve intermittent activity by a single processor? You have to spawn the parallel job somehow, and typically that starts as a single process. Is the implication here that compiling is parallelized, but linking is a single-CPU job?
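    As a rough illustration of why even a small serial portion hurts (the numbers are hypothetical, not from the article), Amdahl's law gives the ceiling: if a fraction s of the work is inherently serial, the best possible speedup on N processors is

        speedup(N) = 1 / (s + (1 - s)/N)

    so with s = 0.05 a 64-way machine tops out around 1 / (0.05 + 0.95/64), roughly 15x, and no number of processors can push it past 1/s = 20x. A single-threaded final link step in a kernel build has exactly this effect.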
    • I could imagine an SMP job where you immediately spawn N new processes, each of which computes a certain subset of a given dataset. Assuming you never collect the results at the end (say, you just write out the results to files on disk for later analysis), you would technically never need inter-process communication, thus no serial processing by a single "master" process. But yes, you're right. You almost never do this in parallel processing, and in that sense the post is misleading in assuming there is anyt
      • That type of processing is frequently called "embarrassingly parallel", and it's far more common than you seem to think. I think that 3D rendering and web serving that doesn't require writing to a database can both be handled this way. There are also many categories of scientific data processing (think SETI@home) that work this way. The real reason that this kind of SMP isn't interesting is because it's so easy that you don't need fancy hardware like 64-way servers to take advantage of it. It can be farme
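
        A minimal sketch of that embarrassingly parallel pattern (POSIX fork(2); the worker count and output filenames are made up): each child works on its own slice and writes its own file, so no worker ever talks to another.

          #include <stdio.h>
          #include <sys/wait.h>
          #include <unistd.h>

          #define NWORKERS 64            /* hypothetical: one worker per CPU */

          static void process_chunk(int id)
          {
              char name[64];
              snprintf(name, sizeof name, "result.%02d", id);  /* made-up output file */
              FILE *out = fopen(name, "w");
              if (out == NULL)
                  _exit(1);
              fprintf(out, "worker %d finished\n", id);        /* real work goes here */
              fclose(out);
          }

          int main(void)
          {
              for (int i = 0; i < NWORKERS; i++)
                  if (fork() == 0) {       /* child: handle one slice, then exit */
                      process_chunk(i);
                      _exit(0);
                  }

              while (wait(NULL) > 0)       /* parent only reaps children; the   */
                  ;                        /* workers never exchange any data   */
              return 0;
          }

        The serial part shrinks to almost nothing, which is exactly why this class of job doesn't need a 64-way shared-memory box; a pile of cheap nodes does just as well.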

    • To answer your question: you are correct that every parallel computing job has some single-processor parts. Those who study parallel systems spend most of their time looking for ways to make sure that all processors are in use. Often an algorithm that is less than optimal for single-processor systems can use more processors, so a choice needs to be made.

      The other major issue is communication time. An algorithm that depends on all the CPUs talking all the time may appear fast on paper, but it will be slower t

  • Geez... (Score:4, Funny)

    by byronne ( 47527 ) on Tuesday January 18, 2005 @11:36PM (#11404647) Homepage
    I haven't had a 64-way since college.

    And you?

  • Hrmm (Score:5, Interesting)

    by Nailer ( 69468 ) on Tuesday January 18, 2005 @11:38PM (#11404652)
    SGI
    Unisys
    Fujitsu
    HP

    It looks like there might actually be a competitive marketplace for scalable multiprocessor Linux systems real soon now (if not already).
  • excuse my ignorance (Score:2, Informative)

    by g0dsp33d ( 849253 )
    I know Linux is pretty good from a security sense (compared to Windows, at least), and I'm not surprised to find it operates on exotic setups, but are there that many programs out there that support such a setup? Or ones that will actually benefit from this many processors? Or is the point of this system for businesses to develop custom applications for their own use? Or is it for a data server of some sort that can benefit from multiple cores answering requests?
    • by Anonymous Coward
      are there that many programs out there that support such a setup?

      As they say, if you have to ask, you don't need it.

      The point for stuff like this isn't the number of programs that will support it, it's that you already have *one* program that not only supports it, but requires it.

      Think weather modeling. It's a specialized application that requires massive CPU horsepower - and it's written specifically for the task at hand. This isn't something you'd pick up at Best Buy, or download from Freshmeat - it
    • by AstroDrabb ( 534369 ) on Tuesday January 18, 2005 @11:58PM (#11404773)
      There are still many uses for this many processors. Think of a monster DB. It is much easier to have more processors on your DB than to have many small systems and have to worry about syncing the data.

      Think about virtualization. I would love to have a 64-way system and break it up into 32 2-way systems or 16 4-way systems. It would make system management much easier. And with software, you can instantly assign more processors in a virtualized system to a server that is being hit hard. So your 4-way DB can turn into an 8-way or 16-way DB in an instant. Once the load is gone, you set it back to a 4-way DB.

      I personally still prefer to load balance many smaller servers to save costs. However, this could be an excellent option for some enterprises. I know where I work we have some big Sun boxes and we just add processors as we need them. However, that has proven to be rather expensive, and virtualizing could help save some big costs.

      • What is the advantage of a 64-way processor box in place of an HPCC of a few (dozen?) single- or dual-processor boxes that are cheaper?

        Why would I want a 16-way processor in place of 8 dual processor boxes with a gigabit backbone network to them?
        • by AstroDrabb ( 534369 ) on Wednesday January 19, 2005 @12:51AM (#11405045)
          Why would I want a 16-way processor in place of 8 dual processor boxes with a gigabit backbone network to them?
          It all depends on what you are doing. Where I work we replaced a few bigger boxes with a bunch of smaller/cheaper boxes behind a load balancer for web apps. However, when it came to DB performance, the bigger boxes were much better. Well, at least to a point. Our 8-way DB was much better than our four 2-way DBs. The cost wasn't much more, so an 8-way worked well.

          I do agree that "big iron" is losing the power it once had, especially when one can cluster a bunch of much cheaper 2-way boxes.

        • by Sir Nimrod ( 163306 ) on Wednesday January 19, 2005 @12:59AM (#11405093)

          Take this with a grain of salt, because I was part of the group that developed the chipset for the first Superdome systems (PA-RISC). I'm probably a little biased.

          A 64-way Superdome system is spread across sixteen plug-in system boards. (Imagine two refrigerators next to each other; it really is that big.) A partition is made up of one or more system boards. Within a partition, each processor has all of the installed memory in its address space. The chipset handled the details of getting cache blocks back and forth among the system boards.

          That's a huge amount of memory to have direct access to. Access is pretty fast, too.

          Still, they were doubtless pretty expensive. HP-UX didn't allow for on-the-fly changes to partitions, but the chipset supports it. (The OS always lagged a bit behind. We built a chip to allow going above 64-way, but the OS just couldn't support it. A moral victory.) Perhaps Linux could get that support in place a little more quickly....

      • by afabbro ( 33948 ) on Wednesday January 19, 2005 @01:56AM (#11405342) Homepage
        It would make system management much easier.

        I prefer to say "might" make systems management much easier. The problem with the One Big Box is the same whether it's Sun, HP, Linux, etc.:

        • Something bad happens to the One Or Two Critical Components. If you know of any open-systems box that has no single point of failure, I'd sure like to see it. If you want one big box without a single point of failure, you buy a mainframe. Every open-systems big box I'm aware of has at least one or ten SPOFs...and I've had the backplane go out on more than one Sun E10K. At that point, you don't lose just one system; you lose everything if you've consolidated to one 64-way box.
        • It's time to do some hardware maintenance. Good luck coordinating that with 32 different user groups. "Ah, but we can do everything hot with this big box." Always sounds good on paper. I've always run into things for each of them that required a power-off maintenance.
        • Or perhaps it's not even maintenance...it's just something weird. I had a Big Box once where a power supply made a popping noise and emitted a small puff of smoke. It burned out. Not a big deal in the end - it could be replaced hot - but it was a nervous couple of hours. Versus a cluster where you'd fail over to the spare (yes, I know you could cluster your Two Big Boxes, but we start getting into financial justifications).
        • ISVs say things like "You want to run XYZ 1.0 on your 64-way box? That's a tier 9 platform and that will be $100,000, thank you." "But I'm only using it on one 2-way partition!" "You might dynamically reconfigure it after we sell you the license and our software isn't that smart, so it's $100K or no deal. And then you can use it on all your partitions!" "But I don't need it on all of them!" You'd be amazed how many prominent software companies tier based on the overall box and don't support virtual partitions, etc. from a licensing perspective. And you're guaranteed to have a user who needs one of their products.
        • Department B bought SAN gizmo X and your big box is exotic enough that there is no driver for it. They really want SAN gizmo X, so they go off and buy a new 4-way box for themselves. Or they want to run SuSE and SuSE doesn't support your box. Or everyone wants his own gig-E or two and you don't have 128 ports out the back. Etcetera - there are lots of scenarios where you can't get the technical architecture brainiacs to think ahead or you can't get the vendors' stars to line up and you wind up with people who don't want to be on the big box...and pretty soon the data center is proliferating again.

        Etcetera...of course, there are just as many if not more problems with the "we'll just build a giant cluster of 64 boxes and scale across it!" approach...I'll rant on that some other day.

        It's all trade-offs. And no matter which way you go, you'll discover some truly ugly hidden costs that never seem to show up in those vendor white papers. And none of it works exactly the way it should or you'd like it to. But I'm not jaded or anything ;)

      • by Cajal ( 154122 )
        Just remember that almost no open-source databases use parallelized algorithms. PostgreSQL, Firebird and MySQL certainly don't. OpenIngres is the only one I know of with a parallel query engine. By this I mean the ability of a single query to use multiple processors (say, for handling a complex join and a large sort). The only way PG, FB and MySQL can use multiple CPUs is if you have multiple queries running. But for OLAP-style workloads, you won't see much benefit from SMP.
    • by jd ( 1658 ) <imipak@yahoGINSBERGo.com minus poet> on Wednesday January 19, 2005 @12:16AM (#11404876) Homepage Journal
      A 64-way system may or may not be useful. It depends on the speed of the interconnects, and the way it handles bus locking. (On a 64-way system, any given CPU can only have control of a given resource 1/64th of the time. Unless this is handled extremely well, this is Bad News.)


      In general, people use clusters of single or dual-processor systems, because many problems demand lots of hauling of data but relatively little communication between processors. For example, ray-tracing involves a lot of processor churning, but the only I/O is getting the information in at the start, and the image out at the end.


      Databases are OK for this, so long as the data is relatively static (so you can do a lot of caching on the separate nodes and don't have to access a central disk much).


      A 64-way superscalar system, though, is another thing altogether. Here, we're talking about some complex synchronization issues, but also the ability to handle much faster inter-processor I/O. Two processors can "talk" to each other much more efficiently than two Ethernet devices. Far fewer layers to go through, for a start.


      Not a lot of problems need that kind of performance. The ability to throw small amounts of data around extremely fast would most likely be used by a company looking at fluid dynamics (say, a car or aircraft manufacturer) because of the sheer number of calculations needed, or by someone who needed the answer NOW (fly-by-wire systems, for example, where any delay could result in a nice crater in the ground).


      The problem is, most manufacturers out there already have plenty of computing power, and the only fly-by-wire systems that would need this much computing power would need military-grade or space-grade electronics, and there simply aren't any superscalar electronics at that kind of level. At least, not that the NSA is admitting to.


      So, sure, there are people who could use such a system, but I cannot imagine many of them are in the market.

      • by PornMaster ( 749461 ) on Wednesday January 19, 2005 @01:02AM (#11405106) Homepage
        There are plenty of them on the market, and as the price comes down, there will be even more.

        To whom do you think HP has been selling the SuperDome line? And to whom has Sun been selling the E10/12/15K?

        One of the benefits of using a huge multiprocessor Sun box, though, besides the massive number of CPUs you can have in a single frame running under a single system image, is the ability to dynamically reconfigure resources, as a few other posters have touched on.

        Imagine this... you have a box with 64 CPUs and 128GB of RAM. During the day, you have developers who are working with 16 CPUs and 32GB of RAM, working on the next generation of the database you'll be running for your business. A development domain.

        You have another domain of 16 CPUs and 32GB as a test domain. Like when stuff goes out to beta, you run tests on the stuff you've pushed out from your development copy to see if it's ready for prime-time.

        You have a third domain of 32 CPUs and 64GB in which you run production. It's a bit oversized for your needs for the work throughout the day, but it's capable of handling peak loads without slowing down.

        Then, you have a nightly database job that runs recalculating numbers for all the accounts, dumping data out to be sent to a reporting server somewhere, batch data loads coming in that need to be integrated into your database. Plus you have to keep servicing minimal amounts of requests from users throughout the night, but hey, nobody's really on between 10PM and 4AM.

        Wouldn't it be nice to drop the dev and test databases down to maybe 4CPUs if they're still running minimal tasks, and throw 56CPUs and 112GB of RAM at your nightly batch jobs? They get what's almost the run of the machine... until you're done with the batch jobs. Then you shrink production back to half the machine, and boost up the test and dev to a quarter each... so everyone's happy when the day starts.
        • by jd ( 1658 )
          A SSI cluster that supported roles for defining the distribution of tasks would probably be more cost-effective. You'd also need Distributed Shared Memory, though, and distribution of threads as well as processes.


          Having the entire engine on one multi-way motherboard wouldn't really gain you much, because none of the work you described needs tight interconnects.

          • by ppanon ( 16583 )
            You'd also need Distributed Shared Memory, though, and distribution of threads as well as processes.

            Right, and that's exactly the situation where a single honking box is going to beat any kind of cluster that's connected more loosely than a high-CPU-count multiprocessor box.

            It all depends on how much interdependence there is in memory access between threads/processes (i.e. how well you can partition your data set to match your cluster topology). Often, it's a lot cheaper for a company to buy
            • by jd ( 1658 )
              No, databases don't need tight interconnects, unless the data is changing rapidly, relative to the number of queries.

              I'd personally expect to see a system where common views of the data were cached locally, where the "authoritative" database was accessed via a SAN rather than the processor network, and where interprocessor communication was practically nil. There's not a whole lot that different threads would need to send to each other.

              The whole point of SANs, "local busses" and other such technologies

              • The key with the database problem is how tightly coupled the changes would be. If the data can be effectively partitioned at runtime, then it should cluster quite well. If the data can't be partitioned at runtime, then you are going to have scalability problems with a single system image anyway.

                Everyone is going to be blocking on the data that everyone else wants.
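
                As a sketch of what "partitioned at runtime" can mean (the node count and key type are made up): if every request can be routed by a stable function of its key, each node owns a disjoint slice of the data, and only genuinely hot keys produce the blocking described above.

                  #include <stdint.h>

                  #define NNODES 16                 /* hypothetical number of database nodes */

                  /* FNV-1a hash: any stable hash will do. */
                  static uint32_t fnv1a(const char *key)
                  {
                      uint32_t h = 2166136261u;
                      while (*key) {
                          h ^= (unsigned char)*key++;
                          h *= 16777619u;
                      }
                      return h;
                  }

                  /* Route a key (say, an account number) to the node that owns it.
                   * Requests for different keys usually land on different nodes and
                   * never contend; everyone hammering the same key still serializes. */
                  int owner_node(const char *key)
                  {
                      return (int)(fnv1a(key) % NNODES);
                  }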
      • The fucking news and the fucking article itself are misleading.

        > A 64-way system may or may not be useful. It depends on the speed of the interconnects, and the way it handles bus locking.

        Of course it IS useful. It is great for database consolidation (especially for SQL Server which practically doesn't scale horizontally), for example, as upgrades can be done in minutes and the whole goddamned thing is as stable as an Intel box can be.
        And in case you missed what the FA said, they did NOT run an OS on
    • vising and lighting quake (or any other fps) maps would be one application close to many /.ers' hearts. Of course, the tools themselves have to support threading (qfvis and qflight from quakeforge [quakeforge.net] do), but that's just a minor detail :)

      Instead of taking a day to compile a complex map, it could be done in the time it takes to brew a jug of coffee :)

  • by Dancin_Santa ( 265275 ) <DancinSanta@gmail.com> on Tuesday January 18, 2005 @11:53PM (#11404745) Journal
    Looking at the literature, Linux and Unix in general seem to be designed to keep processes as lightweight as possible. OTOH, Windows processes are a little heavier and take longer to start up.

    On the other hand, Windows threads are very lightweight compared to the equivalent thread model in Linux. Benchmarks have shown that in multi-process setups Unix is heavily favored, but in multi-threaded setups Windows comes out on top.

    When it comes to multi-processors, is there a theoretical advantage to using processes vs threads? Leaving out the Windows vs Linux debate for a second, how would an OS that implemented very efficient threads compare to one that implemented very efficient processes?

    Would there be a difference?
    • by Anonymous Coward
      The nice thing about processes is that they do not share memory. As such, the processes will be localized, as would all the memory access. OTOH, if you had just ONE big process loaded with nothing but threads, you would likely find the memory backplane going into high gear as data would be moved around a bit.
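
      A toy sketch of the difference (not a benchmark; the counts are arbitrary): threads created with pthread_create() share one address space, so a counter they all touch has to be protected and bounces between CPU caches, while forked processes each get their own private copy and never interfere.

        #include <pthread.h>
        #include <stdio.h>
        #include <sys/wait.h>
        #include <unistd.h>

        #define NWORKERS 4
        #define ITERS    1000000

        static long            shared_counter;                       /* visible to every thread */
        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

        static void *thread_work(void *arg)
        {
            (void)arg;
            for (int i = 0; i < ITERS; i++) {
                pthread_mutex_lock(&lock);      /* every thread contends on the same */
                shared_counter++;               /* lock and the same cache line      */
                pthread_mutex_unlock(&lock);
            }
            return NULL;
        }

        int main(void)
        {
            pthread_t t[NWORKERS];

            /* Threaded version: shared memory, shared contention. */
            for (int i = 0; i < NWORKERS; i++)
                pthread_create(&t[i], NULL, thread_work, NULL);
            for (int i = 0; i < NWORKERS; i++)
                pthread_join(t[i], NULL);
            printf("threads counted %ld\n", shared_counter);

            /* Process version: each child increments a private copy-on-write
             * counter, so there is nothing to lock and nothing to share.     */
            for (int i = 0; i < NWORKERS; i++)
                if (fork() == 0) {
                    long local = 0;
                    for (int j = 0; j < ITERS; j++)
                        local++;
                    _exit(0);
                }
            while (wait(NULL) > 0)
                ;
            return 0;
        }

      On a big NUMA box the process version also keeps each worker's memory local to its own node, which is the locality point made above.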
  • That's great to know that the kernel can handle 64-way machines... especially since I just ordered one from my local PC store, in bits, to build myself.

    Really, the key will be when the system scales to 128 processors and beyond.
  • Wow (Score:5, Funny)

    by Anonymous Coward on Wednesday January 19, 2005 @12:22AM (#11404906)
    Never mind Linux for a moment, I'm just amazed that 64 Itanium 2's have actually been sold...
  • Interesting. (Score:3, Insightful)

    by jd ( 1658 ) <imipak@yahoGINSBERGo.com minus poet> on Wednesday January 19, 2005 @12:27AM (#11404940) Homepage Journal
    The problem is that most resources (memory, the bus, disks, etc) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better to cluster than to use SMP, so that each processor has its own bus, memory, etc.


    To be efficient, the processors would need gigantic caches, to keep the load on the rest of the system down. Either that, or you COULD run the CPUs out of step over a bus that is 64 times faster than normal. I'd hate to be the person designing such a system, though.


    Now, this system could be of extreme interest in the supercomputer world. One of the biggest complaints about clustering is the poor interconnects. This would seem to get round that problem. A Blue Gene-style cluster where each node is a 64-way SMP board, and you're running a few thousand nodes, would likely be an order of magnitude faster than anything currently on the supercomputer charts.


    On the other hand, do we need to know what the weather is not going to be, ten times as often?

    • Re:Interesting. (Score:5, Informative)

      by Anonymous Coward on Wednesday January 19, 2005 @12:36AM (#11404985)
      The problem is that most resources (memory, the bus, disks, etc) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better to cluster than to use SMP, so that each processor has its own bus, memory, etc.

      No, you have a misconception. On these REAL big iron systems, each CPU (or each few CPUs) does have its own buses, memory, and I/O buses.

      So in that regard it is as good as a cluster, but then add the fact that they have global, cache-coherent shared memory and interconnects that shame any cluster.

      The only advantage of a cluster is cost. Actually redundancy plays a role too, although less so with proper servers, as they have redundancy built in, and you can partition off the system to run multiple operating systems too.

      To be efficient, the processors would need gigantic caches, to keep the load on the rest of the system down. Either that, or you COULD run the CPUs out of step over a bus that is 64 times faster than normal. I'd hate to be the person designing such a system, though.

      Now, this system could be of extreme interest in the supercomputer world. One of the biggest complaints about clustering is the poor interconnects. This would seem to get round that problem. A Blue Gene-style cluster where each node is a 64-way SMP board, and you're running a few thousand nodes, would likely be an order of magnitude faster than anything currently on the supercomputer charts.


      Not really. Check the world's second fastest supercomputer. It is a cluster of 20 512-way IA64 systems running Linux.
      • 20 512-way IA64
        So what you're saying is that this 64-way Linux setup is behind the game ;) and that this article is only news in that yet another computer company is offering a barely-there mass-CPU enterprise Linux server ;)
        • by jd ( 1658 )
          The second-fastest supercomputer [top500.org] is an Altix. The Altix uses the same "brick" structure as the Origin, where you bolt together pre-fabricated computing blocks. Essentially a cluster, rather than a N-way SMP system. The specs for the bricks [sgi.com] say that the processor bricks are 16-way. A tad shy of the 512-way that AC boasted. :)

          True, you can build very large clusters from these bricks, but the bricks themselves don't scale beyond a relatively small number of CPUs.

      • Re:Interesting. (Score:5, Insightful)

        by jd ( 1658 ) <imipak@yahoGINSBERGo.com minus poet> on Wednesday January 19, 2005 @01:21AM (#11405208) Homepage Journal
        Global shared-memory can be done on OpenMOSIX, using the Migshm extension, which provides you with Distributed Shared Memory.


        The Altix uses 4-way CPU "bricks", along with networking and memory bricks, which you can then use to assemble a system. Yes, resources are visible globally, and it is a LOT faster than a PoP (pile-of-pcs) cluster using ethernet, but it is still a cluster of 4-way nodes.


        It also doesn't avoid the main point, which is that any given resource can only be used by one CPU at a time. If processor A on brick B is passing data along wire C, then wire C cannot be handling traffic for any other processor at the same time. That resource is claimed, for that time.


        When you are talking a massive cluster of hundreds or thousands of CPU bricks, it becomes very hard to efficiently schedule the use of resources. That's one reason such systems often have an implementation of SCP, the Scheduled Communications Protocol, where you reserve networking resources in advance. That way, it becomes possible to improve the efficiency. Otherwise, you run the risk of gridlock, which is just as real a problem in computing as it is on the streets.

        • Re:Interesting. (Score:2, Informative)

          by Anonymous Coward
          Global shared-memory can be done on OpenMOSIX, using the Migshm extension, which provides you with Distributed Shared Memory.

          There is a world of difference between emulating it with the operating system / programming environment, and having hardware cache coherent global shared memory.

          The Altix uses 4-way CPU "bricks", along with networking and memory bricks, which you can then use to assemble a system. Yes, resources are visible globally, and it is a LOT faster than a PoP (pile-of-pcs) cluster using et
          • Why do you think people pay so much money for one when they could get 1000 cheap P4's and cluster them?


            You mean, the way most people do? I don't know if there are accurate figures out there, but I'd be willing to bet that there are more MOSIX, OpenMOSIX, OpenSSI and Beowulf clusters in production enterprise environments than there are Origins, Altixes and Crays.

    • Re:Interesting. (Score:2, Interesting)

      by Anonymous Coward
      The problem is that most resources (memory, the bus, disks, etc) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better to cluster than to use SMP, so that each processor has its own bus, memory, etc.

      Are you a cluster salesman by chance?

      A "big iron" system like one of these has exactly the same CPU-memory ratio as any cluster box - they are COMMODITY CPUs, you put 2-4 of them per bus in these big systems just as you put 2-4 of them on a bus in each
    • Re:Interesting. (Score:3, Interesting)

      by ptbarnett ( 159784 )
      The problem is that most resources (memory, the bus, disks, etc) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better to cluster than to use SMP, so that each processor has its own bus, memory, etc.

      If you were to read more about Superdome, you would find that each set of 2 or 4 processors has its own memory and PCI I/O bus, comprising what is called a "cell".

      The memory and I/O devices in a cell are accessible to all the other cells via a inte

  • by gnunick ( 701343 ) on Wednesday January 19, 2005 @12:58AM (#11405086) Homepage
    IBM packs 64 Xeons into a single server [zdnet.co.uk] (Jan 15, 2004)
    "[CTO of IBM's xSeries server group Tom Bradicich] acknowledges that there are challenges in producing such a large system -- including building support into Windows and Linux, neither of which are suited for 64-processor systems today"

    Looks like someone was up to those challenges, eh? 64-processor support *and* 64-bit support. Awesome news.

  • Will we ever see (Score:3, Interesting)

    by stratjakt ( 596332 ) on Wednesday January 19, 2005 @01:04AM (#11405115) Journal
    Smaller, say 4- or 8-way, NUMA boards that are within the means of the average geek?

    I'm not talking about mere mortal SMP systems; I want all the crazy memory partitioning and whatnot.
    • Re:Will we ever see (Score:3, Interesting)

      by SunFan ( 845761 )

      8-way multicore chips will be available within a year. Not exactly NUMA, but they'll probably have other nuances to keep you entertained.

  • I'm so confused. Itanium bad. Linux kernel scalability good. Help!
    ---
    Posted as me for the negative karma whoring.
  • Read my lips (Score:5, Interesting)

    by Chatz ( 8725 ) on Wednesday January 19, 2005 @01:48AM (#11405310)
    Linux scaling to 512 processors:
    http://www.sgi.com/features/2004/oct/columbia/

    The story should be that HP has finally caught up to where SGI was two years ago.
    • Re:Read my lips (Score:3, Interesting)

      by hackstraw ( 262471 ) *
      I've heard through the grapevine that the mods to the Linux kernel have stability issues.

      I am someone who might be in the market for an SGI Altix or XD1, but a very parallel yet broken box does not scale that well, in my opinion.
  • Compiling the kernel didn't scale quite so well, but that was because it involves intermittent serial processing by a single processor.

    Really? I would have thought that the compilation of loads and loads of .c files is exactly the sort of thing that could be shared among processors. It certainly has been on projects that I've worked on.

    make -j (num of processors) ?
