
Windows and Linux Not Well Prepared For Multicore Chips

Mike Chapman points out this InfoWorld article, according to which you shouldn't immediately expect much in the way of performance gains from Windows 7 (or Linux) from eight-core chips that come out from Intel this year. "For systems going beyond quad-core chips, the performance may actually drop beyond quad-core chips. Why? Windows and Linux aren't designed for PCs beyond quad-core chips, and programmers are to blame for that. Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores. Problem? The development tools aren't available and research is only starting."
This discussion has been archived. No new comments can be posted.

  • by Microlith ( 54737 ) on Sunday March 22, 2009 @03:35PM (#27290099)

    So basically yet another tech writer finds out that a huge number of applications are still single threaded, and that it will be a while before we have applications that can take advantage of the cores that the OS isn't actively using at the moment. Well, assuming you're running a desktop and not a server.

    This isn't a performance issue with regard to Windows or Linux; they're quite adept at handling multiple cores. They just don't need that much themselves, and the applications run these days don't, individually, need much more than that either.

    So yes, applications need parallelization. The tools for it are rudimentary at best. We know this. Nothing to see here.

  • by Anonymous Coward on Sunday March 22, 2009 @03:42PM (#27290185)

    "The development tools aren't available and research is only starting"

    Hardly. Erlang's been around 20 years. Newer languages like Scala, Clojure, and F# all have strong concurrency. Haskell has had a lot of recent effort in concurrency (www.haskell.org/~simonmar/papers/multicore-ghc.pdf).

    If you prefer books there's: Patterns for Parallel Programming, the Art of Multiprocessor Programming, and Java Concurrency in Practice, to name a few.

    All of these are available now, and some have been available for years.

    The problem isn't that tools aren't available, it's that the programmers aren't preparing themselves and haven't embraced the right tools.

  • by Anonymous Coward on Sunday March 22, 2009 @03:46PM (#27290259)

    The quote presented in the summary is nowhere to be found in the linked article. To make matters worse, the summary claims that Linux and Windows aren't designed for multicore computers, but the linked article only claims that some applications are not designed to be multi-threaded or to run as multiple processes. Well, who said that every application under the sun must be heavily multi-threaded or spawn multiple processes? Where's the need for an email client to spawn 8 or 16 threads? Will my address book be any better if it spawns a bunch of processes?

    The article is bad and timothy should feel bad. Why is he still responsible for any news being posted on Slashdot?

  • by tepples ( 727027 ) <tepples.gmail@com> on Sunday March 22, 2009 @03:51PM (#27290311) Homepage Journal

    Is TFA saying that Linux or Windows threading and scheduling is not good enough for 4+ cores (so your programs, no matter how well designed, will not benefit from more cores), that it is damn hard to split, thread and join tasks, or both?

    I understood the article to refer to the latter. The programming languages that are popular for desktop applications as of the 2000s don't have the proper tools (such as an unordered for-each loop or a rigorous actor model [wikipedia.org]) to make parallel programming easy.

  • by Pascal Sartoretti ( 454385 ) on Sunday March 22, 2009 @03:59PM (#27290431)
    Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores.

    So what? If I had a 32 core system, at least each running process (even if single-threaded) could have a core just for itself. Only a few basic applications (such as a browser) really need to be designed for multiples threads.
  • by klossner ( 733867 ) on Sunday March 22, 2009 @04:00PM (#27290443)
    In fact, TFA doesn't even use the words "Linux" or "Windows."
  • by tepples ( 727027 ) <tepples.gmail@com> on Sunday March 22, 2009 @04:09PM (#27290531) Homepage Journal

    What good are multiple cores and threads when you are running an event-driven GUI application?

    Mozilla Firefox is an event-driven GUI application. But if I open a page in a new tab, a big reflow or JavaScript run in that page can freeze the page I'm looking at. You can see this yourself: open this page [slashdot.org] in multiple tabs, and then try to scroll the foreground page. If Firefox used a thread or process per page like Google Chrome does, the operating system would take care of this. Other applications need to spawn threads when calling an API that blocks, such as gethostbyname() or getaddrinfo(); otherwise, the part of the program that interacts with the user will freeze. But these are the kind of threads that are useful even on a single core, not multicore-specific optimizations.
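The point about blocking calls can be illustrated with a minimal sketch. It assumes a hypothetical `slow_lookup` function standing in for a blocking call such as getaddrinfo(), and a toy loop standing in for the part of the program that talks to the user; neither comes from any real application.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_lookup(hostname):
    """Hypothetical stand-in for a blocking call such as getaddrinfo()."""
    time.sleep(0.2)  # pretend the resolver takes a while to answer
    return f"address-of-{hostname}"


def run_ui_with_background_lookup(hostname):
    """Keep a toy 'event loop' ticking while the lookup blocks elsewhere."""
    ticks = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_lookup, hostname)
        # The "UI thread" keeps handling events instead of freezing.
        while not future.done():
            ticks += 1        # stands in for redrawing or handling input
            time.sleep(0.01)
        return future.result(), ticks


address, ticks = run_ui_with_background_lookup("example.org")
print(address, "after", ticks, "UI ticks")
```

The worker thread spends its time waiting, not computing, which is why this pattern helps even on a single core.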

  • It's already there (Score:4, Insightful)

    by wurp ( 51446 ) on Sunday March 22, 2009 @04:14PM (#27290575) Homepage

    Seriously, no one has brought up functional programming, LISP, Scala or Erlang? When you use functional programming, no data changes and so each call can happen on another thread, with the main thread blocking when (& not before) it needs the return value. In particular, Erlang and Scala are specifically designed to make the most of multiple cores/processors/machines.
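As a rough illustration of "block only when the return value is needed", here is a small sketch using Python futures. The function `square_sum` is a made-up example of a pure function. In standard CPython builds, threads do not run Python bytecode on several cores at once, so this shows the shape of the idea rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def square_sum(numbers):
    """Pure function: reads its argument and touches no shared state."""
    return sum(n * n for n in numbers)


with ThreadPoolExecutor() as pool:
    # Both calls are handed off immediately, possibly to different threads.
    left = pool.submit(square_sum, range(1, 4))    # 1 + 4 + 9
    right = pool.submit(square_sum, range(4, 7))   # 16 + 25 + 36
    # The main thread blocks only here, when it actually needs the values.
    total = left.result() + right.result()

print(total)  # 91
```

Because neither call modifies anything the other reads, the order in which they finish cannot change the result.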

    See also map-reduce and multiprocessor database techniques like BSD and CouchDB (http://books.couchdb.org/relax/eventual-consistency).
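For readers unfamiliar with map-reduce, a toy word count gives the general shape. This is only a sketch under simple assumptions: the input chunks and helper names are invented for illustration, and a thread pool stands in for what would be separate processes or machines in a real deployment.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(chunk):
    """Map step: count the words in one independent chunk of text."""
    return Counter(chunk.split())


def merge(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


chunks = ["to be or not", "to be is to do", "do be do"]

with ThreadPoolExecutor() as pool:
    # Each chunk is counted independently of the others.
    partials = list(pool.map(count_words, chunks))

totals = reduce(merge, partials, Counter())
print(totals["to"], totals["be"], totals["do"])  # 3 3 3
```

The map step needs no coordination between workers; only the final merge brings the partial results together.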

  • by tyler_larson ( 558763 ) on Sunday March 22, 2009 @04:15PM (#27290597) Homepage
    If you spend more time assigning blame than you do describing the problem, then clearly you don't have anything insightful to say.
  • Re:Adapt (Score:5, Insightful)

    by Cassini2 ( 956052 ) on Sunday March 22, 2009 @04:19PM (#27290637)

    Give us a year maybe two.

    I think this problem will take longer than a year or two to solve. Modern computers are really fast. They solve simple problems almost instantly. A side effect of this is that if you underestimate the computational power required for the problem at hand, you are likely to be off by a large amount.

    If you implemented an order n-squared algorithm, O(n^2), on a 6502 (Apple II) and n was larger than a few hundred, you were dead. Many programmers wouldn't even try implementing hard algorithms on the early Apple II computers. On the other hand, a modern processor might tolerate O(n^2) algorithms with n larger than 1000. Programmers can try solving much harder problems. However, programmers' ability to estimate and deal with computational complexity has not changed since the early days of computers. Programmers use generalities. They use ranges: n will be between 5 and 100, or n will be between 1000 and 100,000. With modern problems, n=1000 might mean the problem can be solved on a netbook, and n=100,000 might require a small multi-core cluster.
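The gap between those two ranges is easy to underestimate, so a quick back-of-the-envelope calculation may help. It assumes an idealised algorithm that performs exactly n * n abstract "steps"; real constants and memory effects are ignored.

```python
def quadratic_steps(n):
    """Idealised step count for an O(n^2) algorithm (constant factor of 1)."""
    return n * n


small = quadratic_steps(1_000)      # 1,000,000 steps
large = quadratic_steps(100_000)    # 10,000,000,000 steps

# n grew by a factor of 100, but the work grew by a factor of 10,000.
print(large // small)  # 10000
```

An eight-core machine recovers at most a factor of 8 of that 10,000, which is consistent with the comment's point that the larger case may call for a cluster.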

    There aren't many programming platforms out there that scale smoothly from applications deployed on a desktop, to applications deployed on a multi-core desktop, and then to clusters of multi-core desktops. Perhaps most worrying is that the new programming languages that are coming out are not particularly useful for intense data analysis. The big examples of this for me are .NET and functional languages. .NET was deployed at about the same time multi-core chips showed up, and has minimal support for them. Functional languages may eventually be the solution, but for any numerically intensive application, tight loops of C code are much faster.

    The other issue with multi-core chips is that, as a programmer, I have two solutions to making my code go faster:
    1. Get out the assembly printouts and the profiler, and figure out why the processor is running slow. Doing this helps every user of the application, and works well with almost any of the serious compiled languages (C, C++). Sometimes, I can get a 10:1 speed improvement.(*) It doesn't work so well with Java, .NET, or many functional languages, because they use run-time compilers/interpreters and don't generate assembly code.
    2. Recode for a cluster. Why stop at a multi-core computer? If I can get a 2:1 to 10:1 speed up by writing better code, then why stop at a dual or quad core? The application might require a 100:1 speed up, and that means more computers. If I have a really nasty problem, chances are that 100 cores are required, not just 2 or 8. Multi-core processors are nice, because they reduce cluster size and cost, but a cluster will likely be required.

    The problem with both of the above approaches is that, from a tools perspective, they are the worst choice for multi-core optimizations. Approach 1 will force me into using C and C++, which don't even handle threads really well. In particular, C and C++ lack easy implementations of Software Transactional Memory, NUMA, and clusters. This means that approach 2 may require a complete software redesign, and possibly either a language change or a major change in the compilation environment. Either way, my days of fun loving Java and .NET code are coming to a sudden end.

    I just don't think there is any easy way around it. The tools aren't yet available for easy implementation of fast code that scales between the single-core assumption and the multi-core assumption in a smooth manner.

    Note: * - By default, many programmers don't take advantage of many features that may increase the speed of an algorithm. Built-in special-purpose instruction sets, like MMX, can dramatically speed up certain loops. Sometimes loops contain a great deal of code that can be eliminated. Maybe a function call is present in a tight loop. Anti-virus software can dramatically affect system speed. Many little things can sometimes make big differences.

  • That's a big leap (Score:4, Insightful)

    by SuperKendall ( 25149 ) on Sunday March 22, 2009 @04:20PM (#27290641)

    If you don't believe me, pull out a profiler and run it on one of your programs, it will show you where things can be easily sped up.

    Now, given that the performance of most programs is not processor bound

    That's a pretty big leap, I think.

    Yes, a lot of today's apps are more user bound than anything. But there are plenty of real-world apps that people use that are still pretty processor bound. Photoshop, and image processing in general, is a big one. So can be video, which starts out disk bound but is heavily processor bound as you apply effects.

    Even Javascript apps are processor bound, hence Chrome...

    So there's still a big need for understanding how to take advantage of more cores - because chips aren't really getting faster these days so much as more cores are being added.

  • by caerwyn ( 38056 ) on Sunday March 22, 2009 @04:30PM (#27290761)

    This is true to a point. The problem is that, in modern applications, when the user *does* do something there's generally a whole cascade of computation that happens in response, and the biggest concern for most applications is that the app appear to have low latency. That is, all of that computation happens as quickly as possible so the app can go back to waiting for user input.

    There's a lot of gain that can be gotten by threading user input responses in many scenarios. Who cares if the user often waits 5 minutes before input? When he *does* do something, he wants it done *immediately*. The fact that it's a tiny percentage by wall time doesn't change the fact that responsiveness here is a massive percentage of user perception.

  • by Todd Knarr ( 15451 ) on Sunday March 22, 2009 @04:37PM (#27290827) Homepage

    Part of the problem is that tools do very little to help break programs down into parallelizable tasks. That has to be done by the programmer, who has to take a completely different view of the problem and the methods to be used to solve it. Tools can't help them select algorithms and data structures. One good book related to this was one called something like "Zen of Assembly-Language Optimization". One exercise in it went through a long, detailed process of optimizing a program, going all the way down to hand-coding highly-bummed inner loops in assembly. And it then proceeded to show how a simple program written in interpreted BASIC(!) could completely blow away that hand-optimized assembly-language just by using a more efficient algorithm. Something similar applies to multi-threaded programming: all the tools in the world can't help you much if you've selected an essentially single-threaded approach to the problem. They can help you squeeze out fractional improvements, but to really gain anything you need to put the tools down, step back and select a different approach, one that's inherently parallelizable. And by doing that, without using any tools at all, you'll make more gains than any tool could have given you. Then you can start applying the tools to squeeze even more out, but you have to do the hard skull-sweat first.

    And the basic problem is that schools don't teach how to parallelize problems. It's hard, and not everybody can wrap their brain around the concept, so teachers leave it as a 1-week "Oh, and you can theoretically do this, now let's move on to the next subject." thing.

  • by Anonymous Coward on Sunday March 22, 2009 @04:40PM (#27290861)
    Lisp... has a few problems:

    "Let's take function argument evaluation, as a simple example. Because a function call in Lisp must evaluate all arguments, in order, function calls cannot be parallelized. Even if the arguments could have been computed in parallel, there's no way to know for sure that the evaluation of one argument doesn't cause a side-effect which might interfere with another argument's evaluation. It forces Lisp's hand into doing everything in the exact sequence laid down by the programmer. This isn't to say that things couldn't happen on multiple threads, just that Lisp itself can't decide when it's appropriate to do so. Parallelizing code in Lisp requires that the programmer explicitly demarcate boundaries between threads, and that he use global locks to avoid out-of-order side-effects." - John Wiegley

    But yeah, functional languages can sidestep these things. Erlang, Haskell, Scala, etc.
  • Re:Adapt (Score:5, Insightful)

    by tftp ( 111690 ) on Sunday March 22, 2009 @04:41PM (#27290875) Homepage

    To dumb your message down, CPU manufacturers act like book publishers who want you to read one book in two different places at the same time just because you happen to have two eyes. But a story can't be read this way, and for the same reason most programs don't benefit from several CPU cores. Books are read page by page because each little bit of story depends on previous story; buildings are constructed one floor at a time because each new floor of a building sits on top of lower floors; a game renders one map at a time because it's pointless to render other maps until the player made his gameplay decisions and arrived there.

    In this particular case CPU manufacturers do what they do simply because that's the only thing they know how to do. We, as users, would for most tasks much prefer a single 1 THz CPU core, but we can't have that yet.

    There are engineering and scientific tasks that can be easily subdivided - this [wikipedia.org] comes to mind - and these are very CPU-intensive tasks. They will benefit from as many cores as you can scare up. But most computing in the world is done using single-threaded processes which start somewhere and go ahead step by step, without much gain from multiple cores.

  • Re:Adapt (Score:5, Insightful)

    by Sentry21 ( 8183 ) on Sunday March 22, 2009 @04:42PM (#27290883) Journal

    This is the sort of thing I like about Apple's 'Grand Central'. The idea behind it is that instead of assigning a task to a processor, it breaks up a task into discrete compute units that can be assigned wherever. When doing processing in a loop, for example, if each iteration is independent, you could make each iteration a separate 'unit', like a packet of computation.

    The end result is that the system can then more efficiently dole out these 'packets' without the programmer having to know about the target machine or vice-versa. For some computation, you could use all manner of different hardware - two dual-core CPUs and your programmable GPU, for example - because again, you don't need to know what it's running on. The system routes computation packets to wherever they can go, and then receives the results.

    Instead of looking at a program as a series of discrete threads, each representing a concurrent task, it breaks up a program's computation into discrete chunks, and manages them accordingly. Some might have a higher priority and thus get processed first (think QoS in networking), without having to prioritize or deprioritize an entire process. If a specific packet needs to wait on I/O, then it can be put on hold until the I/O is done, and the CPU can be put back to work on another packet in the meantime.

    What you get in the end is a far more granular, more practical way of thinking about computation that would scale far better as the number of processing units and tasks increases.
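The "packets of computation" idea can be sketched in a few lines. This is not Apple's API; it is a hypothetical `run_units` helper built on a plain queue and worker threads, meant only to show independent loop iterations being handed to whichever worker is free.

```python
import queue
import threading


def run_units(units, worker_count=4):
    """Run independent zero-argument callables on a small pool of workers."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()
    for index, unit in enumerate(units):
        tasks.put((index, unit))

    def worker():
        # Each worker pulls the next available unit until none remain.
        while True:
            try:
                index, unit = tasks.get_nowait()
            except queue.Empty:
                return
            value = unit()
            with lock:
                results[index] = value

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Reassemble results in the original iteration order.
    return [results[i] for i in range(len(units))]


# Each loop iteration becomes its own unit of work.
units = [lambda i=i: i * i for i in range(8)]
print(run_units(units))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The caller never says which worker runs which unit, which is the property the comment highlights: the code does not need to know how many processing units exist.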

  • Nutty (Score:2, Insightful)

    by eneville ( 745111 ) on Sunday March 22, 2009 @04:44PM (#27290913) Homepage
    I disbelieve this entirely. UNIX/Linux is well designed for multiple core CPUs. Just take the whole single program, single small job approach of a pipeline command and you have your multicore solution ready. Programs that can make use of tasks that are IO bound are frequently written with threading in mind. qmail and Apache are both well written for multiple core CPUs. I don't see what the article is trying to say. It's clearly wrong.
  • by amorsen ( 7485 ) <benny+slashdot@amorsen.dk> on Sunday March 22, 2009 @04:57PM (#27291059)

    Fork isn't slow or painful. And if you think shared memory is a bad way to communicate, you REALLY won't like threads.

  • by Snowblindeye ( 1085701 ) on Sunday March 22, 2009 @04:58PM (#27291079)

    open this page [slashdot.org] in multiple tabs, and then try to scroll the foreground page. If Firefox used a thread or process per page like Google Chrome does, the operating system would take care of this.

    I think you are gravely oversimplifying things. Firefox certainly uses multiple threads. My Firefox process is using 16 threads at the moment. The reason Chrome is using processes is so that when one of them crashes the other ones stay up.

    Also, if you look closely, it doesn't completely lock up while the other tabs are loading. It *does*, however, lock up at some point during the rendering. Which would indicate that some points of the code are synchronizing between threads, or bottlenecking on some resource, and that locks it up.

    Which is part of the problem. It's easy to say people need to use more threads. But the trouble comes when you need to synchronize, when they need to communicate with each other. That's when you introduce performance bottlenecks. It's also one of the reasons why threading is harder than it seems.

  • Re:Adapt (Score:4, Insightful)

    by init100 ( 915886 ) on Sunday March 22, 2009 @05:09PM (#27291193)

    To begin with, I don't believe the article about the systems being badly prepared. I can't speak for Windows, but I know for sure that Linux is capable of far heavier SMP operation than 4 CPUs.

    My take on the article is that it is referring to applications provided with or at least available for the systems in question, and not actually the systems themselves. In other words, it takes the user view, where the operating system is so much more than just the kernel and the other core subsystems.

    But more importantly, many programming tasks simply aren't meaningful to break up into such units of granularity as OS-level threads.

    Actually, in Linux (and likely other *nix systems), with command lines involving multiple pipelined commands, the commands are executed in parallel, and are thus being scheduled on different processors/cores if available. This is a simple way of using the multiple cores available on concurrent systems, and thus, advanced programming is not always necessary to take advantage of the power of multicore chips.
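The pipeline point can be reproduced without a shell. In this sketch, two child Python interpreters play the roles of the commands on either side of a `|`; the one-line programs are made up for illustration. Both children are started before either finishes, so the operating system is free to schedule them on different cores.

```python
import subprocess
import sys

# Hypothetical "commands": one prints numbers, the other sums its input.
producer_code = "for i in range(1, 6): print(i)"
consumer_code = "import sys; print(sum(int(line) for line in sys.stdin))"

producer = subprocess.Popen(
    [sys.executable, "-c", producer_code],
    stdout=subprocess.PIPE,
)
consumer = subprocess.Popen(
    [sys.executable, "-c", consumer_code],
    stdin=producer.stdout,      # connect the two processes with a pipe
    stdout=subprocess.PIPE,
    text=True,
)
producer.stdout.close()         # the consumer now owns the read end
output, _ = consumer.communicate()
producer.wait()

total = int(output.strip())
print(total)  # 15
```

Neither program contains any threading code; the concurrency comes entirely from running two processes joined by a pipe.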

  • by Dahamma ( 304068 ) on Sunday March 22, 2009 @05:11PM (#27291213)

    Yeah, there are also imaginary languages for imaginary processors like mic1 and stuff. But TFA is talking about operating systems

    Don't state what TFA says if you didn't even read TFA.

    There isn't a SINGLE reference to Linux, Windows, or any other operating system in TFA. It was about lack of developer tools to create effective multithreaded applications, and had nothing to do with operating systems.

  • Re:Adapt (Score:2, Insightful)

    by Anonymous Coward on Sunday March 22, 2009 @05:38PM (#27291497)

    This is also known as Processor Affinity outside of the Apple box

  • Re:Adapt (Score:5, Insightful)

    by Anonymous Coward on Sunday March 22, 2009 @05:48PM (#27291593)

    You're thinking too simply. A single-core system at 5GHz would be less-responsive for most users than a dual-core 2GHz. Here's why:

    While you're playing a game, more programs are running in the background: anti-virus, defrag, email, google desktop, etc. Also, any proper, modern game splits its tasks, e.g. game AI, physics, etc.

    So dual-core is definitely a huge step up from single. So, no, users don't want single-core, they want a faster more responsive pc, which NOW is dual-core. In a few years it will be quad core. Most now hardly benefit from quad core.

  • Re:Adapt (Score:5, Insightful)

    by try_anything ( 880404 ) on Sunday March 22, 2009 @06:06PM (#27291789)

    But most computing in the world is done using single-threaded processes which start somewhere and go ahead step by step, without much gain from multiple cores.

    Yeah, I agree. There are a few rare types of software that are naturally parallel or deal with concurrency out of necessity, such as GUI applications, server applications, data-crunching jobs, and device drivers, but basically every other kind of software is naturally single-threaded.


    Sarcasm aside, few computations are naturally parallelizable, but desktop and server applications carry out many computations that can be run concurrently. For a long time it was normal (and usually harmless) to serialize them, but these days it's a waste of hardware. In a complex GUI application, for example, it's probably fine to use single-threaded serial algorithms to sort tables, load graphics, parse data, and check for updates, but you had better make sure those jobs can run in parallel, or the user will be twiddling his thumbs waiting for a table to be sorted while his quad-core CPU is "pegged" at 25% crunching on a different dataset. Or worse: he sits waiting for a table to be sorted while his CPU is at 0% because the application is trying to download data from a server.

    Your example of building construction is actually a good example in favor of concurrency. Construction is like a complex computation made of smaller computations that have complicated interdependencies. A bunch of different teams (like cores) work on the building at the same time. While one set of workers is assembling steel into the frame, another set of workers is delivering more steel for them to use. Can you imagine how long it would take if these tasks weren't concurrent? Of course, you have to be very careful in coordinating them. You can't have the construction site filled up with raw materials that you don't need yet, and you don't want the delivery drivers sitting idle while the construction workers are waiting for girders. I'm sure the complete problem is complex beyond my imagination. By what point during construction do you need your gas, electric, and sewage permits? Will it cause a logistical clusterfuck (contention) if there are plumbers and electricians working on the same floor at the same time? And so on ad infinitum. Yet the complexity and inevitable waste (people showing up for work that can't be done yet, for example) is well worth having a building up in months instead of years.

  • Re:Adapt (Score:2, Insightful)

    by Dolda2000 ( 759023 ) <fredrikNO@SPAMdolda2000.com> on Sunday March 22, 2009 @06:24PM (#27291975) Homepage

    Until harddrives cease being pathetic slugs, and I include top end SSDs, our machines are going nowhere fast.

    You must be using Vista, if you think that modern computers are slow. :)

  • Re:Adapt (Score:3, Insightful)

    by gbjbaanb ( 229885 ) on Sunday March 22, 2009 @07:21PM (#27292529)

    Yeah, I reckon you've got the reason things are "single-threaded" by design. So the solution is to start getting creative with sections of programs and not the whole.

    For example, if you're using OpenMP to introduce parallelisation, you can easily make loops run in multi-core mode, and you'll get compiler errors if you try to parallelise loops that can't be broken down like that.

    Like your building analogy - sure, you have to finish one floor before you can put the next one on, but once the floors are up, you can plumb each room up concurrently. You have to then wait until the plumbing and wiring is done before you can start plastering, and then you have to wait for that to dry before you can decorate - but you can then decorate each room concurrently.
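The room-by-room analogy maps onto a simple pattern: run each stage concurrently across all rooms, but do not start the next stage until the current one has finished everywhere. The sketch below uses invented room and stage names and a placeholder `do_stage` function; it is one possible reading of the analogy, not a general scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

ROOMS = ["kitchen", "bath", "bedroom"]
STAGES = ["plumb", "plaster", "decorate"]   # each depends on the one before


def do_stage(stage, room):
    """Placeholder for real work done on one room during one stage."""
    return f"{stage}:{room}"


log = []
with ThreadPoolExecutor() as pool:
    for stage in STAGES:
        # All rooms are worked on concurrently within a stage; list() waits
        # for every room to finish before the next stage may begin.
        finished = list(pool.map(partial(do_stage, stage), ROOMS))
        log.append(finished)

print(log[0])  # ['plumb:kitchen', 'plumb:bath', 'plumb:bedroom']
```

The sequential dependency lives in the outer loop, while the inner `map` is where extra cores can help.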

    Stuff like that will allow you to easily set some parts running concurrently, and I reckon that's as good as we're going to get unless we start thinking in full-on functional-style programming designs. (see the wikipedia entry [wikipedia.org] for a good example) But I don't hold out hope for that anytime soon; it's still hard to get right if the task is not simple.

    Besides, who really needs 8 cores anyway - unless there are specialist tasks (and I can think of only a few) the biggest problems we have are memory and IO bandwidth, not CPU performance.

  • Re:Adapt (Score:5, Insightful)

    by TheRaven64 ( 641858 ) on Sunday March 22, 2009 @07:48PM (#27292767) Journal

    So the chip companies are generally going to end up spending process improvements by making chips cheaper, rather than more complex?

    Probably. Cheaper, and less power-hungry. For the past 50 years we've had a set of cycles where computers get dedicated hardware for some task, then the general purpose hardware gets fast enough to run it and the dedicated hardware goes away, then the cycle repeats with some other algorithm (sound, 2D video, and so on). The side-effect of this is that it also consumes a lot more power. For any algorithm, you can design dedicated hardware that executes it with less power than a general-purpose CPU. The DSP on something like an OMAP3 can decode MP3 audio in under 50mW; even something like the Atom is going to struggle to get within two orders of magnitude of this.

    This wasn't a problem for desktop PCs, because they were plugged into the mains and no one has itemised electricity bills, so no one notices the difference between a 20W and a 100W CPU. In a laptop or palmtop, the difference between 250mW (a typical ARM Cortex A8 SoC) and 20W (Atom + a cheap chipset) can be several hours of battery life. People are starting to expect 10 hours of battery life from portables, and doing this with a small battery requires a lot of dedicated silicon that can be turned off when not in use and draw small amounts of power when executing the task it was designed for.

    I expect the future of CPUs will be heterogeneous multicore. In a way, that's the present of CPUs too; you can consider the FPU and vector unit as separate, specialised, cores (although they lack separate control instructions, so it's stretching it slightly).

  • Re:Adapt (Score:5, Insightful)

    by beav007 ( 746004 ) on Sunday March 22, 2009 @07:53PM (#27292811) Journal
    It's posts like these that make me think that I'm the only one with 7 programs on the task bar, 12 in the system tray, assorted server processes, and 32 tabs open in Firefox (come on, 1 thread per tab!!). It doesn't much matter to me if each of these parts are not multithreaded, as long as the OS is smart enough to put active threads on different cores.
  • elephant years (Score:3, Insightful)

    by epine ( 68316 ) on Sunday March 22, 2009 @08:18PM (#27293005)

    Knuth's maxim is sufficiently pithy to have become, over time, self referential, as evidenced by your misunderstanding.

    The root of all evil used to be deep and singular, now it is broad and shallow. I guarantee you that Knuth did not include choosing the best fundamental algorithm under the label "premature" unless it involves squabbling over log log N terms or stray digits in the exponent term.

    http://www.siam.org/pdf/news/174.pdf [siam.org]

    An unpacked (deoptimized) version of Knuth's maxim is that the transition from program structure and notation which maximizes readability, comprehension, and conviction (concerning its correctness and merit) to one which favours performance should be delayed as long as possible. Ideally until performance becomes the sole remaining success factor.

    (Taking into account the human mind's special capacity to imprint upon evil, Knuth's formulation remains the better one.)

    Originally Knuth meant manually hoisting loop constant expressions (often in ways that later turn out to not be fully general) or manually evaluating constant expressions or manually fusing nested function calls and the kind of rot that a good compiler these days will do on your behalf. Anyone used the "register" keyword lately? Once upon a time it seemed like a good idea.

    While the principle remains the same, the temptations have changed. Such as parallelizing a bad implementation of a poor algorithm in the misguided belief that the underlying task is not sequentially bound.

    That said, projects which do *no* evil typically fail to impress anyone. The ideal is to wrap a large amount of cleanly structured and accessible source code around a nugget of pure, smoldering evil, coked to the last clock cycle.

    Perversely, the worst example of this is TeX itself. The smoldering nugget of pure evil is the single pass parsing regime and data packing eight bit character values.

    I suspect the literature on parallel programming would roughly equal the literature on electro-chemical storage cells. Sheesh, if only those guys were paying attention, we'd have watch batteries powering small cities by now.

    On second thought, how much literature could there really be if you can summon the majority of it onto your screen in 4/10'ths of a second for any combination of keywords?

    Parallel programming is a lot like fuel cells. You get some pretty impressive results on selected applications involving pristine apparatus in a controlled setting, dating back to the Apollo program (in both cases).

    Reality on the ground is rarely so forgiving.

    If we hadn't already achieved a pixel processing speed-up between 1980 and 2008 best approximated by a sideways 8, Javascript wouldn't even have entered the conversation.

    It boils down to this: ignoring everything you guys have already accomplished, you've pretty much done nothing. I worked for that kind of company once. The guy in charge put on a Cirque du Soleil of intestinal recursion. That's how I feel about the claim that software developers haven't been paying attention to parallelism for elephant years.

  • Re:Adapt (Score:1, Insightful)

    by phantomfive ( 622387 ) on Sunday March 22, 2009 @08:36PM (#27293135) Journal
    I keep seeing comments like this, but I'm not sure you've actually thought through the issues here. What sort of applications are you running that are pegging the CPU at 25%? In the application you just described (in your second multiword paragraph), running those things in parallel can actually slow things down. Why? Because by far the thing that is taking all your time is the disk access. Launch Photoshop sometime and see what it is doing on startup: 90% of it is loading palettes, etc., from the disk. A single disk access can take a million CPU cycles, which may be more than the entire rest of the startup code put together, so really it's not even worth optimizing until you deal with getting the disk faster.

    How can it actually take longer if you run them in parallel? There is only one disk, and if you are trying to load from it from two different processes, it will be forced to rapidly switch back and forth between the two. Not good. The fastest way to get info from a disk is sequentially.

    Parallelization is not the easy solution it seems on the surface.
  • Re:Adapt (Score:3, Insightful)

    by bertok ( 226922 ) on Sunday March 22, 2009 @08:48PM (#27293243)

    I think the consensus was that making compilers emit efficient VLIW for a typical procedural language such as C is very hard. Intel spent many millions on compiler research, and it took them years to get anywhere. I heard of 40% improvements in the first year or two, which implies that they were very far from ideal when they started.

    To achieve automatic parallelism, we need a different architecture to classic "x86 style" procedural assembly. Programming languages have to change too, the current crop are too close to the metal. I suspect that in the future, languages will rely on intermediate byte-code more, and become ever more functional as designers realize that functional code is easy to transform due to a lack of side-effects.

    I've heard of automatically parallelized versions of some pure functional languages that can execute almost any code on almost any number of CPUs without the programmer ever having to write a single synchronization instruction! For example, Microsoft is working on "parallel LINQ" in C# 4.0, which is essentially a small island of parallelizable functional code that can be embedded in a procedural language.
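    To make the "small island of parallelizable functional code" idea concrete, here is a minimal sketch in Python rather than C#. It is not PLINQ; the names `brightness`, `sequential` and `parallel` are made up for illustration. The point is only that a function with no side effects can be mapped over a collection by a pool of workers without the caller writing any locks.

```python
from concurrent.futures import ThreadPoolExecutor


def brightness(pixel):
    """Pure function: the result depends only on the argument."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def sequential(pixels):
    # Ordinary one-at-a-time map.
    return [brightness(p) for p in pixels]


def parallel(pixels, workers=4):
    # Executor.map preserves input order, so the result matches the
    # sequential version no matter how the calls are scheduled.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brightness, pixels))


# Hypothetical input data, just to have something to map over.
pixels = [(i % 256, (2 * i) % 256, (3 * i) % 256) for i in range(1000)]
```

    Note that on standard CPython builds the global interpreter lock keeps these threads from running the arithmetic on several cores at once, so this shows the shape of the code, not a speed-up. The property that matters is that `brightness` touches no shared state, which is what lets a runtime split the work freely.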

  • Re:Adapt (Score:3, Insightful)

    by mgblst ( 80109 ) on Sunday March 22, 2009 @09:06PM (#27293393) Homepage

    A better but less humorous analogy would be to consider that Intel and co can't keep increasing the top speed of a car, so they are putting more seats into your car. This works OK when you have lots of people to transport, but when you only have 1 or two, it doesn't make the journey any faster. The problem is, most journeys only consist of one or two people. What the article is suggesting is that we implement some sort of car-sharing initiative, we stop taking so many cars to the same destination. Or a bus!

  • Real Dumb (Score:3, Insightful)

    by omb ( 759389 ) on Sunday March 22, 2009 @09:23PM (#27293519)
    As has already been explained, non-sequential thinking is hard. You postulate double speed, BUT the producer thread (the app) has finished and handed off the buffer to the OS to send to the GPU, and you say the OS threads this. Well, fine, so the threaded part can run on another core, but then the hardware DMAs the data and waits for a GPU interrupt/done-queue ack, so how does this speed things up on multicore? Not at all. Someone has to set up the DMA and wait, not run, while it completes, so unless all cores are at 100% you have saved nothing, and you have created additional overhead by spawning a new thread.

    Duh, Marketing Departments
  • by Raenex ( 947668 ) on Sunday March 22, 2009 @09:47PM (#27293699)

    There are plenty of good designs that work in a single-threaded environment that do not in a multi-threaded environment. It's just a completely different ballgame when you allow multiple threads to be running on the same piece of code. With threading, the complexity goes up an order of magnitude and so does the penalty for failure.

    Anyways, I'm out. This is the standard debate about "good" programmers and "good" designs vs dangerous techniques that should be avoided.

  • Re:Adapt (Score:4, Insightful)

    by Waffle Iron ( 339739 ) on Sunday March 22, 2009 @10:18PM (#27293895)

    It's posts like these that make me think that I'm the only one with 7 programs on the task bar, 12 in the system tray, assorted server processes, and 32 tabs open in Firefox (come on, 1 thread per tab!!).

    I'd be willing to bet a good deal of money that almost all of those tasks are currently asleep and waiting for input, a timer signal or external I/O. Such processes don't need *any* cores unless and until they wake up.

    (The big exception for most people would be having flash ads running in those 32 firefox tabs. The way to solve that problem without adding more cores is by installing flashblock.)

    Right now "ps" says that my system is running 127 different processes. Current CPU utilization? 0.7%.
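    The claim that sleeping tasks need no cores is easy to check for yourself. A rough sketch, assuming Python threads blocked on an event stand in for idle tray programs (the function name and the default numbers are arbitrary): it parks a batch of threads, lets wall-clock time pass, and reports how much CPU time the whole process consumed meanwhile.

```python
import threading
import time


def cpu_used_while_idle(thread_count=50, wall_seconds=0.2):
    """Return process CPU seconds consumed while all threads are blocked."""
    wake = threading.Event()
    # Each thread does nothing but wait for the event, like a program
    # waiting for input or a timer.
    threads = [threading.Thread(target=wake.wait) for _ in range(thread_count)]
    for t in threads:
        t.start()

    cpu_before = time.process_time()
    time.sleep(wall_seconds)  # wall-clock time passes, nobody runs
    cpu_used = time.process_time() - cpu_before

    # Wake everything up and clean up.
    wake.set()
    for t in threads:
        t.join()
    return cpu_used
```

    The CPU time reported should be a small fraction of the wall-clock interval regardless of how many threads are parked, which is the same effect as 127 processes adding up to 0.7% utilization.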

  • Re:Adapt (Score:2, Insightful)

    by shutdown -p now ( 807394 ) on Monday March 23, 2009 @12:13AM (#27294511) Journal

    The remaining 4 for Flash?

  • Re:Adapt (Score:5, Insightful)

    by fractoid ( 1076465 ) on Monday March 23, 2009 @12:32AM (#27294607) Homepage

    This is the sort of thing I like about Apple's 'Grand Central'.

    What's this 'grand central' thing? From a few brief Google searches it appears to be a framework for using graphics shaders to offload number crunching to the video card. It'd be nice if they'd stick (at least for technical audiences) to slightly more descriptive and less grandiose labels.

    That's always been my main peeve with Apple, they give opaque, grandiloquent names to standard technologies, make ridiculous performance claims, then set their foaming fanboys loose to harass those of us who just want to get the job done. Remember "AltiVEC" (which my friend swore could burn a picture of Jesus's toenails onto a piece of toast on the far side of the moon with a laser beam comprised purely of blindingly fast array calculations) which turned out to just be a slightly better MMX-like SIMD addon?

    Or the G3/G4 processors which led us to be breathlessly sprayed with superlatives for years until Apple ditched them for the next big thing - Intel processors! Us stupid, drone-like "windoze" users would never see the genius in using Intel proce... oh wait. No, no wait. We got the same "oooh the Intel Mac is 157 times faster than an Intel PC" for at least six months until 'homebrew' OSX finally proved that the hardware is exactly the friggin same now. For a while, thank God, they've been reduced to lavishing praise on the case design and elegant headphone plug placement. It looks like that's coming to an end, though.

  • by fractoid ( 1076465 ) on Monday March 23, 2009 @12:35AM (#27294629) Homepage

    [T]hats why on Mac, Linux or Windows you stick with code that will just work on one core. No problems then.

    That, and the much greater reason that (a) 99% of software these days would run just fine on a single core P4 3GHz, and (b) most programmers are really, really bad and it's much harder to screw up a single-threaded app badly enough that I can't fix it, than it is to screw up a multi-threaded app.

  • Re:MOD PARENT UP! (Score:2, Insightful)

    by drizek ( 1481461 ) on Monday March 23, 2009 @01:25AM (#27294869)
    Do you need to dedicate an entire 3ghz CPU core to run your bittorrent, and another to refresh slashdot?
  • by Asic Eng ( 193332 ) on Monday March 23, 2009 @02:24AM (#27295079)
    Parallel computing and parallel hardware have been around for decades - not on the desktop, but in the supercomputer area. It's a tough problem to solve efficiently - there are some things which are hard to get around. As an example think of the equation y = SQRT(a*b) - you need two mathematical operations there. It doesn't really help if you have two processors, since you need the result of one operation before you can perform the second. The example isn't very interesting, but essentially you always have this problem - if you rely on the result of the previous steps, then you need to do things in order. You can modify your algorithms so that happens less often, but this is hard work and interferes with your desire to write clean readable code.
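    The SQRT(a*b) point can be spelled out in a few lines. This is only an illustrative sketch; `geometric_mean` and `geometric_means` are names invented here. Inside one evaluation the two steps form a chain and a second processor has nothing to do. The only parallelism available is across independent inputs.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def geometric_mean(a, b):
    product = a * b              # step 1
    return math.sqrt(product)    # step 2 cannot start until step 1 is done


def geometric_means(pairs, workers=2):
    # Each pair is independent of the others, so separate pairs can be
    # handed to separate workers. Within a pair the order is still fixed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: geometric_mean(*p), pairs))
```

    For example, `geometric_means([(4, 9), (1, 16)])` evaluates two chains that never wait on each other, while each individual chain stays strictly sequential.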
  • by gooneybird ( 1184387 ) on Monday March 23, 2009 @08:45AM (#27296685)
    "The problem my dear programmer, as you so eloquently put, is one of choice.."

    Seriously. I have been involved with software development from 8-bit PICs to clusters spanning WANs and everything in between for the past 20 years or so.

    Multiprocessing involves coordination between the processes. It doesn't matter (too much) whether it's separate cores or separate silicon. On any given modern OS there are plenty of examples of multiprocessor execution: Hard drives each have a processor, video cards each have a processor, USB controllers have a processor. All of these work because there is a well-defined API between them and the OS - a.k.a device drivers. People that write good device drivers (and kernel code) understand how an OS works. This is not generally true of the broader developer population.

    Developers keep blaming the CPU manufacturers, saying that it's their fault. It's not. What prevents parallel processing from becoming mainstream is the lack of a standard inter-process communications mechanism (at the language level) that abstracts a lot of the dirty little details that are needed. Once the mechanism is in place, then people will start using it. I am not referring to semaphores and mutexes. These are synchronization mechanisms, NOT (directly) communication mechanisms... I am not talking about queues either - too much leeway on their use. Sockets would be closer, but most people think of sockets for "network" applications. They should be thinking of them as "distributed applications". As in distributed across cores. As an example, Microsoft just recently started to demonstrate that they "get it", because the next release of VS will have a messaging library.


    At this time there are too many different ways to implement multi-threaded/multi-processor aware software. Each implementation has possible bugs - race conditions, lockups, priority inversion, etc. The choices need to be narrowed.

    Having a standard (language & OS) API is the key to providing a framework for developers to use, yet still allowing them the freedom to customize for specific needs. So the OS needs an interface for setting CPU/core preferences and the language needs to provide the API. Once there is an API, developers can "wrap their minds" around the concept and then things will "take off". As I stated previously, I prefer the "message box" mechanisms simply because they port easily, are easy to understand and provide for a very loosely coupled interaction. All good tenets of a multi-threaded/multi-processor implementation.
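    For anyone wondering what a "message box" interface might look like in practice, here is one possible sketch. It is not the API of any particular OS or library: `Mailbox`, `worker`, `square_all` and `STOP` are all hypothetical names. It does sit on top of a queue internally, but the caller only ever sees `send()` and `receive()`, which is the narrowing of choices being argued for.

```python
import queue
import threading


class Mailbox:
    """Hypothetical message box: send() is the only way in, receive() the only way out."""

    def __init__(self):
        self._messages = queue.Queue()

    def send(self, message):
        self._messages.put(message)

    def receive(self, timeout=None):
        # Blocks until a message arrives (or the timeout expires).
        return self._messages.get(timeout=timeout)


STOP = object()  # sentinel message telling the worker to exit


def worker(inbox, outbox):
    # The worker shares no variables with the caller. Everything it
    # knows arrives as a message and everything it produces leaves as one.
    while True:
        message = inbox.receive()
        if message is STOP:
            break
        outbox.send(message * message)


def square_all(numbers):
    inbox, outbox = Mailbox(), Mailbox()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for n in numbers:
        inbox.send(n)
    inbox.send(STOP)
    thread.join()
    # One worker handles messages in arrival order, so replies line up.
    return [outbox.receive(timeout=1) for _ in numbers]
```

    Because the two sides only exchange messages, the same code shape would carry over if the mailbox were backed by a pipe or a socket instead of an in-process queue. That portability is an assumption of the sketch, not something demonstrated here.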

    Danger Will Robinson:

    One thing that I fear is that once the concept catches on, it will be overused or abused. People will start writing threads and processes that don't do enough work to justify the overhead. Everyone who starts writing programs will "advertise" that it's "multi-threaded", as if this somehow automatically indicates quality and/or "better" software...Not.
  • Re:Adapt (Score:3, Insightful)

    by TheNinjaroach ( 878876 ) on Monday March 23, 2009 @08:58AM (#27296815)

    A single-core system at 5GHz would be less-responsive for most users than a dual-core 2GHz. Here's why:

    Because you're going to claim it takes more than 20% CPU time for the faster core to switch tasks? That's doubtful, I'll take the 5GHz chip any day.

  • by EnglishTim ( 9662 ) on Monday March 23, 2009 @10:06AM (#27297607)

    And the basic problem is that schools don't teach how to parallelize problems. It's hard, and not everybody can wrap their brain around the concept...

    And there's more to it than that; If a problem is hard, it's going to take longer to write and much longer to debug. Often it's just not worth investing the extra time, money and risk into doing something that's only going to make the program a bit faster. If we proceed to a future where desktop computers all have 256 cores, the speed advantage may be worth it but currently it's a lot of effort without a great deal of gain. There's probably better ways that you can spend your time.

  • Re:Adapt (Score:3, Insightful)

    by mr_mischief ( 456295 ) on Monday March 23, 2009 @10:41AM (#27298095) Journal

    This is indeed true on a general-purpose desktop most of the time. There are many server and workstation tasks, though, that can take as many cores as you can throw at them.

    A web server, application middleware server, or database server will often run multiple single-threaded programs at once rather than running one huge multi-threaded application.

    People who say that you must have multi-threaded applications to use multiple cores are either incompetent or are looking at a very narrow section of the industry. Not everyone runs a single foreground process with just a virus scanner in the background. An SMP or NUMA server with a hundred application instances running isn't going to run all of them on the first four cores and ignore the rest.

    Some scheduling changes may be necessary to make doling the work out to a really big number of cores, like 128, 256, or 512, work really well, but Linux is already run on HPC clusters much larger than that and Windows HPC is supposed to be capable of it, too.
