
Windows and Linux Not Well Prepared For Multicore Chips 626

Posted by timothy
from the until-that-invisible-hand-flexes dept.
Mike Chapman points out this InfoWorld article, according to which you shouldn't immediately expect much in the way of performance gains in Windows 7 (or Linux) from the eight-core chips due from Intel this year. "For systems going beyond quad-core chips, performance may actually drop. Why? Windows and Linux aren't designed for PCs beyond quad-core chips, and programmers are to blame for that. Developers still write programs for single-core chips and need tools to break up tasks over multiple cores. Problem? The development tools aren't available, and research is only starting."
  • Adapt (Score:3, Funny)

    by Dyinobal (1427207) on Sunday March 22, 2009 @02:33PM (#27290067)
    Give us a year maybe two.
    • Re:Adapt (Score:5, Interesting)

      by Dolda2000 (759023) <fredrik@dolPASCA ... m minus language> on Sunday March 22, 2009 @03:05PM (#27290487) Homepage

      No, it's not about adaptation. The whole approach currently taken is completely, outright on-its-head wrong.

      To begin with, I don't believe the article about the systems being badly prepared. I can't speak for Windows, but I know for sure that Linux is capable of far heavier SMP operation than 4 CPUs.

      But more importantly, many programming tasks simply aren't meaningful to break up into units of such coarse granularity as OS-level threads. Many programs would benefit from being able to run just some small operations (like iterations of a loop) in parallel, but the synchronization work required to wake even a pooled thread to do such a thing would greatly exceed the benefit of it.

      People just think about this the wrong way. Let me re-present the problem for you: CPU manufacturers have been finding it harder to scale the clock frequencies of CPUs higher, and therefore they start adding more functional units to CPUs to do more work per cycle instead. Since the normal OoO parallelization mechanisms don't scale well enough (probably for the same reasons people couldn't get data-flow architectures working at large scales back in the 80's), they add more cores instead.

      The problem this gives rise to, as I stated above, is that the unit of parallelism gained by more CPUs is too large for the very small units of work that actually exist to be divided among them. What is needed, I would argue, is a way to parallelize instructions in the instruction set itself. HP's/Intel's EPIC idea (which is now Itanium) wasn't stupid, but it has a hard limit on how far it scales (currently four instructions simultaneously).

      I don't have a final solution quite yet (though I am working on it as a thought project), but the problem we need to solve is getting a new instruction set that is inherently capable of parallel operation, not adding more cores and pushing the responsibility for multi-threading onto the programmers. This is the kind of thing the compiler could do just fine (even the compilers that exist currently -- GCC's SSA representation of programs, for example, is excellent for these kinds of things), by isolating parts of the code in which there are no dependencies in the data-flow, and which could therefore run in parallel, but compilers need support in the instruction set to be able to specify such things.
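      The kind of data-flow analysis described above can be sketched in miniature. The toy Python script below is purely illustrative -- this is not GCC's SSA pass, and `schedule_waves` is a made-up helper -- but it groups straight-line pseudo-instructions into "waves" with no data-flow dependencies on one another, which an instruction set with explicit parallelism could issue together:

```python
# Toy dependence analysis: group straight-line "instructions" into waves
# whose members have no data-flow dependencies on one another, so a
# sufficiently parallel instruction encoding could issue each wave at once.
# (Hypothetical three-address code, not GCC SSA -- just an illustration.)

def schedule_waves(instructions):
    """instructions: list of (dest, srcs) tuples in program order."""
    wave_of = {}   # register name -> wave in which it becomes available
    waves = []
    for dest, srcs in instructions:
        # An instruction can issue one wave after its last input is produced;
        # inputs never written by the program (x, y) are available at wave 0.
        w = max((wave_of[s] + 1 for s in srcs if s in wave_of), default=0)
        while len(waves) <= w:
            waves.append([])
        waves[w].append(dest)
        wave_of[dest] = w
    return waves

# a = x+y; b = x*2; c = a+b; d = y-1; e = c*d
program = [("a", ["x", "y"]), ("b", ["x"]), ("c", ["a", "b"]),
           ("d", ["y"]), ("e", ["c", "d"])]
print(schedule_waves(program))  # [['a', 'b', 'd'], ['c'], ['e']]
```

      Each wave is exactly the set of operations that could run in parallel without any OS-level thread synchronization.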

      • Re:Adapt (Score:5, Informative)

        by Dolda2000 (759023) <fredrik@dolPASCA ... m minus language> on Sunday March 22, 2009 @03:28PM (#27290731) Homepage

        Since the normal OoO parallelization mechanisms don't scale well enough

        It hit me that this probably wasn't obvious to everyone, so just to clarify: "OoO", here, stands not for Object-Oriented Something, but for Out-of-Order [wikipedia.org], as in how current, superscalar CPUs work. See also Dataflow architecture [wikipedia.org].

      • Re:Adapt (Score:5, Interesting)

        by Yaa 101 (664725) on Sunday March 22, 2009 @03:36PM (#27290817) Journal

        The final solution is that the processor measures and decides which parts of which program must run in parallel and which are better off left alone.
        What else do we have computers for?

      • Re:Adapt (Score:5, Insightful)

        by tftp (111690) on Sunday March 22, 2009 @03:41PM (#27290875) Homepage

        To dumb your message down, CPU manufacturers act like book publishers who want you to read one book in two different places at the same time just because you happen to have two eyes. But a story can't be read this way, and for the same reason most programs don't benefit from several CPU cores. Books are read page by page because each little bit of story depends on the previous story; buildings are constructed one floor at a time because each new floor sits on top of the lower floors; a game renders one map at a time because it's pointless to render other maps until the player has made his gameplay decisions and arrived there.

        In this particular case CPU manufacturers do what they do simply because that's the only thing they know how to do. For most tasks, we as users would prefer a single 1 THz CPU core, but we can't have that yet.

        There are engineering and scientific tasks that can be easily subdivided - this [wikipedia.org] comes to mind - and these are very CPU-intensive tasks. They will benefit from as many cores as you can scare up. But most computing in the world is done using single-threaded processes which start somewhere and go ahead step by step, without much gain from multiple cores.

        • Re:Adapt (Score:5, Insightful)

          by Anonymous Coward on Sunday March 22, 2009 @04:48PM (#27291593)

          You're thinking too simply. A single-core system at 5GHz would be less-responsive for most users than a dual-core 2GHz. Here's why:

          While you're playing a game, more programs are running in the background - anti-virus, defrag, email, Google Desktop, etc. Also, any proper, modern game splits its tasks, e.g. game AI, physics, etc.

          So dual-core is definitely a huge step up from single. So, no, users don't want single-core; they want a faster, more responsive PC, which NOW means dual-core. In a few years it will be quad-core. Most users now hardly benefit from quad-core.

          • Re:Adapt (Score:5, Funny)

            by David Gerard (12369) <{ku.oc.draregdivad} {ta} {todhsals}> on Sunday March 22, 2009 @05:11PM (#27291843) Homepage
            Three cores to run GNOME, one core to run Firefox.
            • Re:Adapt (Score:5, Funny)

              by jd (1658) <imipak@nOSPam.yahoo.com> on Sunday March 22, 2009 @09:21PM (#27293909) Homepage Journal

              Three Cores for the Gnome kings under the Gtk,
              Seven for the KDE lords in their halls of X,
              Nine for Emacs Men doomed to spawn,

              • Re:Adapt (Score:5, Funny)

                by Draek (916851) on Sunday March 22, 2009 @11:33PM (#27294613)

                Three Cores for the Mozilla-kings under the GUI,
                Seven for the Gnome-lords in their halls of X,
                Nine for KDE Men doomed to be flamed,
                One for the Free Scheduler on his free kernel
                In the Land of Linux where the SMP lie.
                One Core to rule them all, One Core to find them,
                One Core to bring them all and in the scheduler bind them
                In the Land of Linux where the SMP lie.

                Which is, of course, what will eventually happen if the number of cores keeps increasing: we'll need one dedicated exclusively to managing what goes where and when. Which is pretty cool when you think about it ;)

          • Re:Adapt (Score:5, Informative)

            by TheRaven64 (641858) on Sunday March 22, 2009 @05:56PM (#27292287) Journal

            This is simply not true. Assuming both cores are fully loaded, which is the best possible case for dual core, then they will still be performing context switches at the same rate as a single chip if you are running more than one process per core. Even if you had the perfect theoretical case for two cores, where you have two independent processes and never context switch, you could run them much faster on the single-core machine. A single-core 5GHz CPU would have to waste 20% of its time on context switching to be slower than a dual-core 2GHz CPU, while a real CPU will spend less than 1% (and even on the dual-core CPU, most of the time your kernel will be preempting the process every 10ms, checking if anything else needs to run, and then scheduling it again, so you don't save much).

            The only way the dual core processor would be faster in your example would be if it had more cache than the 5GHz CPU and the working set for your programs fitted into the cache on the dual-core 2GHz chip but not on the 5GHz one, but that's completely independent of the number of cores.
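            The 20% break-even figure above follows from a little arithmetic. A minimal sketch, under an idealized model (throughput proportional to clock, both cores fully loaded, cache and memory effects ignored):

```python
# Idealized throughput model: work done is proportional to clock speed,
# minus whatever fraction the single core loses to context switching.
# (Ignores cache, memory bandwidth, and scheduling details entirely.)

def single_core_throughput(ghz, switch_overhead):
    # Effective work rate after losing a fraction of time to switches.
    return ghz * (1.0 - switch_overhead)

def dual_core_throughput(ghz_per_core):
    # Best case for dual core: both cores fully loaded, no overhead.
    return 2 * ghz_per_core

# The 5 GHz single core only drops to dual-2GHz level at 20% overhead...
assert single_core_throughput(5.0, 0.20) == dual_core_throughput(2.0)
# ...while a realistic ~1% overhead leaves it comfortably ahead.
assert single_core_throughput(5.0, 0.01) > dual_core_throughput(2.0)
print(single_core_throughput(5.0, 0.01))  # effective GHz at 1% overhead
```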

          • Re: (Score:3, Insightful)

            by TheNinjaroach (878876)

            A single-core system at 5GHz would be less-responsive for most users than a dual-core 2GHz. Here's why:

            Because you're going to claim it takes more than 20% CPU time for the faster core to switch tasks? That's doubtful, I'll take the 5GHz chip any day.

        • Re:Adapt (Score:5, Insightful)

          by try_anything (880404) on Sunday March 22, 2009 @05:06PM (#27291789)

          But most computing in the world is done using single-threaded processes which start somewhere and go ahead step by step, without much gain from multiple cores.

          Yeah, I agree. There are a few rare types of software that are naturally parallel or deal with concurrency out of necessity, such as GUI applications, server applications, data-crunching jobs, and device drivers, but basically every other kind of software is naturally single-threaded.

          Wait....

          Sarcasm aside, few computations are naturally parallelizable, but desktop and server applications carry out many computations that can be run concurrently. For a long time it was normal (and usually harmless) to serialize them, but these days it's a waste of hardware. In a complex GUI application, for example, it's probably fine to use single-threaded serial algorithms to sort tables, load graphics, parse data, and check for updates, but you had better make sure those jobs can run in parallel, or the user will be twiddling his thumbs waiting for a table to be sorted while his quad-core CPU is "pegged" at 25% crunching on a different dataset. Or worse: he sits waiting for a table to be sorted while his CPU is at 0% because the application is trying to download data from a server.
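          A minimal sketch of that last point, using Python's `concurrent.futures` (the jobs are hypothetical stand-ins; a simulated 0.2 s sleep plays the role of the server download):

```python
# Run independent application jobs concurrently so a slow download
# doesn't leave the CPU idle, and a sort doesn't block the download.
import time
from concurrent.futures import ThreadPoolExecutor

def sort_table(rows):
    # Stand-in for the CPU-side job (sorting a table).
    return sorted(rows)

def fetch_updates():
    # Stand-in for the server download; sleeping releases the GIL,
    # so the sort is free to run meanwhile.
    time.sleep(0.2)
    return "up to date"

with ThreadPoolExecutor(max_workers=2) as pool:
    sorted_rows = pool.submit(sort_table, [3, 1, 2])
    status = pool.submit(fetch_updates)
    assert sorted_rows.result() == [1, 2, 3]
    assert status.result() == "up to date"
```

          Overlapped like this, the elapsed time is roughly the 0.2 s network wait instead of the sum of both jobs.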

          Your example of building construction is actually a good example in favor of concurrency. Construction is like a complex computation made of smaller computations that have complicated interdependencies. A bunch of different teams (like cores) work on the building at the same time. While one set of workers is assembling steel into the frame, another set of workers is delivering more steel for them to use. Can you imagine how long it would take if these tasks weren't concurrent? Of course, you have to be very careful in coordinating them. You can't have the construction site filled up with raw materials that you don't need yet, and you don't want the delivery drivers sitting idle while the construction workers are waiting for girders. I'm sure the complete problem is complex beyond my imagination. By what point during construction do you need your gas, electric, and sewage permits? Will it cause a logistical clusterfuck (contention) if there are plumbers and electricians working on the same floor at the same time? And so on ad infinitum. Yet the complexity and inevitable waste (people showing up for work that can't be done yet, for example) is well worth having a building up in months instead of years.

          • Re:Adapt (Score:4, Interesting)

            by AmiMoJo (196126) <mojo@NOspAm.world3.net> on Sunday March 22, 2009 @06:42PM (#27292719) Homepage

            So, we can broadly say that there are three areas where we can parallelise.

            First you have the document level. Google Chrome is a good example of this: first we had the concept of multiple documents open in the same program, now we have a separate thread for each "document" (or tab, in this case). Games are also moving ahead in this area, using separate threads for graphics, AI, sound, physics and so on.

            Then you have the OS level. Say the user clicks to sort a table of data into a new order; the OS can take care of that. It's a standard part of the GUI system, and can be set off as a separate thread. Of course, some intelligence is required here, as it's only worth spawning another thread if the sort is going to take some appreciable amount of time.

            At the bottom you have the algorithm level, which is the hard one. So far this level has gotten a lot of attention, and the others relatively little. The first two are the low-hanging fruit, which is where people should be concentrating.

        • Re:Adapt (Score:5, Funny)

          by nmb3000 (741169) <nmb3000@that-google-mail-site.com> on Sunday March 22, 2009 @05:21PM (#27291945) Homepage Journal

          To dumb your message down, CPU manufacturers act like book publishers [...]

          What is this "books" crap? Pft, I remember when car analogies were good enough for everyone. Now you have to get all fancy. Let me try and explain it more clearly:

          CPUs are like cars. Intel and Friends haven't been able to keep increasing the velocity they can safely and reliably run, so instead of relying on increased speed to get more people from point A to point B, they are instead starting to look at parallelization as a means to achieve better performance.

          Now you are chopped up into 10 pieces and FedEx'd to your destination with 100 other people. Pieces may go by road, rail, air, or ship and thus overall capacity--"bandwidth" you might say--of the lanes of travel has been increased.

          The only problem is that the people who make use of this new technique ("programmers", that is) have a hard time chopping you up in such a way that you can be put back together again. Usually it's a bit of a mess and more trouble than it's worth, thus we just keep driving our old-fashioned cars at normal speeds while adding lanes to the roads.

          • Re: (Score:3, Insightful)

            by mgblst (80109)

            A better but less humorous analogy would be to consider that Intel and co. can't keep increasing the top speed of a car, so they are putting more seats into your car. This works OK when you have lots of people to transport, but when you only have one or two, it doesn't make the journey any faster. The problem is, most journeys only involve one or two people. What the article is suggesting is that we implement some sort of car-sharing initiative, so we stop taking so many cars to the same destination. Or a bus!

        • Re: (Score:3, Insightful)

          by gbjbaanb (229885)

          Yeah, I reckon you've got the reason things are "single-threaded" by design. So the solution is to start getting creative with sections of programs and not the whole.

            For example, if you're using OpenMP to introduce parallelisation, you can easily make loops run across multiple cores -- though note the compiler largely trusts your pragmas, so it's on you to make sure a loop's iterations really are independent before parallelising it.

          Like your building analogy - sure, you have to finish one floor before you can put the next one on, but once the floors are up, you

        • Re: (Score:3, Interesting)

          by giorgist (1208992)
          You haven't seen a building go up. You don't place a brick, render it, paint it, hang a picture frame and go on to the next one.

          A multi-story building has a myriad of things happening at the same time. If only computers were as good at parallel processing.
          If you have 100 or 1000 people working on a building, each is an independent process that shares resources.

          It is simple: 8-core CPUs are a solution that arrived before the problem. A good 10-year-old computer can do most of today's
          office work.
        • Re: (Score:3, Informative)

          by TapeCutter (624760) *
          "a game renders one map at a time because it's pointless to render other maps until the player made his gameplay decisions and arrived there"

          Rendering is perfect for parallel processing, sure you only want one map at a time but each core can render part of the map independently from other parts of the map.
        • by coryking (104614) * on Sunday March 22, 2009 @09:38PM (#27293987) Homepage Journal

          But most computing in the world is done using single-threaded processes which start somewhere and go ahead step by step, without much gain from multiple cores.

          The fact that all we do is sequential tasks on our computer means we are still pretty stupid when it comes to "computing". If you look outside your CPU, you'll see the rest of the computers on this planet are massively parallel and do tons and tons of very complex operations far quicker than the computer running on either one of our desks.

          Most of the computers on the planet are organic ones inside critters of all shapes and sizes. I don't see those guys running around with some context-switching, mega-fast CPU, do you?** All the critters I see are using parallel computers, with each "core" being a rather slow set of neurons.

          Basically, evolution of life on earth seems to suggest that the key to success is going parallel. Perhaps we should take the hint from nature.

          ** unless you count whatever the hell consciousness itself is... "thinking" seems to be single-threaded, but uses a bunch of interrupt hooks triggered by lord knows what running under the hood.

          • Re: (Score:3, Interesting)

            by tftp (111690)

            If you look outside your CPU, you'll see the rest of the computers on this planet are massively parallel

            You don't even need to look outside of your computer - it has many microcontrollers, each having a CPU, to do disk I/O, video, audio - even a keyboard has its own microcontroller. This is not far from a mouse being able to think about escape and run at the same time - most mechanical functions in critters are highly automated (a headless chicken is an example.) I don't call it multithreading because th

            • Re: (Score:3, Interesting)

              by coryking (104614) *

              Logically thinking, any single thought can't be easily parallelized, but why couldn't we think two thoughts at the same time?

              Yes, but there is increasing evidence (don't ask me to cite :-) that many of our thoughts are something that some background process has been "thinking about" long (i.e. seconds or minutes) before our actual conscious self does. There are many examples of this in Malcolm Gladwell's "Blink", though I don't feel much like citing them. Part of that book, I think, basically says that we

              • Re: (Score:3, Funny)

                by coryking (104614) *

                our train of though it single-threaded, but that doesn't mean our train of though isn't just a byproduct

                And sometimes, even, our background grammar checker misses things that our background finger-controller mis-types while on auto pilot. thought/though, thing/think are stroke-patterns that my hand-controller mixes up a lot and since this isn't something super-formal, the top-part of my brain never catches.

      • Re:Adapt (Score:5, Insightful)

        by Sentry21 (8183) on Sunday March 22, 2009 @03:42PM (#27290883) Journal

        This is the sort of thing I like about Apple's 'Grand Central'. The idea behind it is that instead of assigning a task to a processor, the system breaks up a task into discrete compute units that can be assigned wherever. When doing processing in a loop, for example, if each iteration is independent, you could make each iteration a separate 'unit', like a packet of computation.

        The end result is that the system can then more efficiently dole out these 'packets' without the programmer having to know about the target machine or vice-versa. For some computation, you could use all manner of different hardware - two dual-core CPUs and your programmable GPU, for example - because again, you don't need to know what it's running on. The system routes computation packets to wherever they can go, and then receives the results.

        Instead of looking at a program as a series of discrete threads, each representing a concurrent task, it breaks up a program's computation into discrete chunks, and manages them accordingly. Some might have a higher priority and thus get processed first (think QoS in networking), without having to prioritize or deprioritize an entire process. If a specific packet needs to wait on I/O, then it can be put on hold until the I/O is done, and the CPU can be put back to work on another packet in the meantime.

        What you get in the end is a far more granular, more practical way of thinking about computation that would scale far better as the number of processing units and tasks increases.
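        A rough analogy in Python -- this is not Apple's actual API, just the same shape of idea: each independent loop iteration becomes a "packet" handed to a pool that routes work to whatever cores are free:

```python
# "Packets of computation": each independent loop iteration is one work
# unit, and a pool of workers (one per core) routes packets to whichever
# core is free. (A Python analogy, not Apple's Grand Central API.)
from multiprocessing import Pool

def work_unit(i):
    # One independent iteration of the original loop -- one "packet".
    return i * i

if __name__ == "__main__":
    with Pool() as pool:                         # one worker per core
        results = pool.map(work_unit, range(8))  # dispatch the packets
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

        The programmer only describes the unit of work; how many workers exist, and where each packet runs, is the pool's business.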

        • Re: (Score:3, Interesting)

          by Trepidity (597)

          The problem is still the efficiency, though. There are lots of ways to mark units of computation as "this could be done separately, but depends on Y" -- OpenMP provides a bunch of them, for example, and there have been proposals dating back to the 80s [springer.com], probably earlier. The problem is figuring out how to implement that efficiently, so that the synchronization overhead doesn't dominate the parallelization gains. Does the system spawn new threads? Maintain a pool of worker threads and feed thunks to them

        • Re:Adapt (Score:5, Insightful)

          by fractoid (1076465) on Sunday March 22, 2009 @11:32PM (#27294607) Homepage

          This is the sort of thing I like about Apple's 'Grand Central'.

          What's this 'grand central' thing? From a few brief Google searches it appears to be a framework for using graphics shaders to offload number crunching to the video card. It'd be nice if they'd stick (at least for technical audiences) to slightly more descriptive and less grandiose labels.

          <rant>
          That's always been my main peeve with Apple, they give opaque, grandiloquent names to standard technologies, make ridiculous performance claims, then set their foaming fanboys loose to harass those of us who just want to get the job done. Remember "AltiVEC" (which my friend swore could burn a picture of Jesus's toenails onto a piece of toast on the far side of the moon with a laser beam comprised purely of blindingly fast array calculations) which turned out to just be a slightly better MMX-like SIMD addon?

          Or the G3/G4 processors, which led us to be breathlessly sprayed with superlatives for years until Apple ditched them for the next big thing - Intel processors! Us stupid, drone-like "windoze" users would never see the genius in using Intel proce... oh wait. No, no wait. We got the same "oooh the Intel Mac is 157 times faster than an Intel PC" for at least six months until 'homebrew' OSX finally proved that the hardware is exactly the friggin' same now. For a while, thank God, they've been reduced to lavishing praise on the case design and elegant headphone plug placement. It looks like that's coming to an end, though.
          </rant>

      • Re:Adapt (Score:5, Informative)

        by Cassini2 (956052) on Sunday March 22, 2009 @03:53PM (#27291005)

        HP's/Intel's EPIC idea (which is now Itanium) wasn't stupid, but it has a hard limitation on how far it scales (currently four instructions simultaneously). I don't have a final solution quite yet (though I am working on it as a thought project), but the problem we need to solve is getting a new instruction set which is inherently capable of parallel operation, not on adding more cores and pushing the responsibility onto the programmers for multi-threading their programs.

        The problem with very long instruction word (VLIW) architectures like EPIC and the Itanium is that the main speed limitations in today's computers are bandwidth and latency. Memory bandwidth and latency can be the dominant performance driver in a modern processor. At a system level, network, I/O (particularly for video), and hard drive bandwidth and latency can dramatically affect system performance.

        With a VLIW processor, you are taking many small instruction words, and gathering them together into a smaller number of much larger instruction words. This never pays off. Essentially, it is impossible to always use all of the larger instruction words. Even with a normal super-scalar processor, it is almost impossible to get every functional unit on the chip to do something simultaneously. The same problem applies with VLIW processors. Most of the time, a program is only exercising a specific area of the chip. With VLIW, this means that many bits in the instruction word will go unused much of the time.

        In and of itself, wasting bits in an instruction word isn't a big deal. Modern processors can move large amounts of memory simultaneously, and it is handy to be able to link different sections of the instruction word to independent functional blocks inside the processor. The problem is the longer instruction words use memory bandwidth every time they are read. Worse, the longer instruction words take up more space in the processor's cache memory. This either requires a larger cache, increasing the processor cost, or it increases latency, as it translates into fewer cache hits. It is no accident the Itanium is both expensive and has an unusually large on-chip cache.
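        A back-of-the-envelope sketch of that cache-pressure argument (approximate figures: an Itanium bundle is 128 bits holding three 41-bit instruction slots plus a 5-bit template; x86 instructions average very roughly 3.5 bytes, and real slot utilization varies by workload):

```python
# Rough code density: how many instructions fit in a 32 KiB instruction
# cache? (Approximate figures; real densities vary with workload and
# with how many bundle slots hold useful work rather than no-ops.)
CACHE_BITS = 32 * 1024 * 8           # a 32 KiB instruction cache

# Itanium: 128-bit bundle = 3 x 41-bit slots + 5-bit template.
bundles = CACHE_BITS // 128
useful_slots_per_bundle = 2          # assume one slot is often a no-op
ia64_insns = bundles * useful_slots_per_bundle

# x86: variable length; call it 3.5 bytes (28 bits) per instruction.
x86_insns = CACHE_BITS // 28

print(ia64_insns, x86_insns)  # 4096 9362: x86 packs roughly 2x as many
```

        Under these rough assumptions the same cache holds about twice as many x86 instructions, which is one way to see why a VLIW chip wants a large (and expensive) cache.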

        The other major downfall of the VLIW architecture is that it cannot emulate a short-instruction-word processor quickly. This is a problem both for interpreters and for 80x86 emulation. Interpreters are a very popular application paradigm; many applications contain them, and platforms like .NET and Java use bytecode with JIT compilers. 80x86 emulation is a big deal, as the majority of the world's software is written for the 80x86 platform, which features a complex variable-length instruction word. A VLIW processor cannot decode either the short 80x86 instructions or JIT-generated code quickly. Realistically, a VLIW processor will be no quicker, on a per-instruction basis, than an 80x86 processor, despite the fact that the VLIW architecture is designed to execute 4 instructions simultaneously.

        The memory bandwidth problem, and the fact that VLIW processors don't lend themselves to interpreters, really slows down the usefulness of the platform.

        • Re:Adapt (Score:4, Interesting)

          by Dolda2000 (759023) <fredrik@dolPASCA ... m minus language> on Sunday March 22, 2009 @04:38PM (#27291481) Homepage

          All that you say is certainly true, but I would still argue that EPIC's greatest problem is its hard parallelism limit. True, it's not as hard as I made it out to be, since an EPIC instruction bundle has its non-dependence flag, but you cannot, for instance, make an EPIC CPU break off and execute two sub-routines in parallel. Its parallelism lies only in a very small spatial window of instructions.

          What I'd like to see, rather, is the CPU implementing a kind of "micro-thread" function that would allow running two larger codepaths simultaneously -- larger than what EPIC could handle, but quite possibly still far smaller than what would be efficient to distribute over OS-level threads, with all the synchronization and scheduler overhead that would mean.

        • Re: (Score:3, Insightful)

          by bertok (226922)

          I think the consensus was that making compilers emit efficient VLIW code for a typical procedural language such as C is very hard. Intel spent many millions on compiler research, and it took them years to get anywhere. I heard of 40% improvements in the first year or two, which implies that they were very far from ideal when they started.

          To achieve automatic parallelism, we need a different architecture to classic "x86 style" procedural assembly. Programming languages have to change too, the current crop are too

      • Re:Adapt (Score:4, Insightful)

        by init100 (915886) on Sunday March 22, 2009 @04:09PM (#27291193)

        To begin with, I don't believe the article about the systems being badly prepared. I can't speak for Windows, but I know for sure that Linux is capable of far heavier SMP operation than 4 CPUs.

        My take on the article is that it is referring to applications provided with or at least available for the systems in question, and not actually the systems themselves. In other words, it takes the user view, where the operating system is so much more than just the kernel and the other core subsystems.

        But more importantly, many programming tasks simply aren't meaningful to break up into such units of granularity is OS-level threads.

        Actually, in Linux (and likely other *nix systems), command lines involving multiple pipelined commands execute those commands in parallel, so they are scheduled on different processors/cores if available. This is a simple way of using the multiple cores available on modern systems; advanced programming is not always necessary to take advantage of the power of multicore chips.
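        A minimal sketch of why a pipeline parallelizes: each stage is its own process, so the scheduler can place the stages on different cores. The same `producer | filter` shape can be mirrored with Python's `multiprocessing` (toy stages standing in for real shell commands):

```python
# A shell pipeline like `grep foo log | sort` runs each command as its
# own process, so the stages can land on different cores. Same shape
# here: two processes connected by a pipe.
from multiprocessing import Pipe, Process

def producer(conn):
    # First pipeline stage: emit the data, then an end-of-stream marker.
    for i in range(10):
        conn.send(i)
    conn.send(None)
    conn.close()

def consumer(conn, out):
    # Second stage: filter and accumulate, like a `grep | wc` tail.
    total = 0
    while True:
        item = conn.recv()
        if item is None:
            break
        if item % 2 == 0:
            total += item
    out.send(total)
    out.close()

if __name__ == "__main__":
    left, right = Pipe()
    res_recv, res_send = Pipe()
    stages = [Process(target=producer, args=(left,)),
              Process(target=consumer, args=(right, res_send))]
    for p in stages:
        p.start()
    print(res_recv.recv())  # 0 + 2 + 4 + 6 + 8 = 20
    for p in stages:
        p.join()
```

        Both stages run concurrently: the consumer filters early items while the producer is still emitting later ones.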

      • Re: (Score:3, Interesting)

        by erroneus (253617)

        Multi-core processing is one thing, but access to multiple chunks of memory and peripherals is also keeping computers slow. After playing with machines running from PXE boot and NFS-rooted machines, I was astounded at how fast those machines performed. Then I realized that the kernel and all wasn't being delayed waiting on local hardware for disk I/O.

        It seems to me, when NAS and SAN are used, things perform a bit better. I wonder what would happen if such control and I/O systems were applied into the sam

    • It's already there (Score:4, Insightful)

      by wurp (51446) on Sunday March 22, 2009 @03:14PM (#27290575) Homepage

      Seriously, no one has brought up functional programming, LISP, Scala or Erlang? When you use functional programming, no data changes and so each call can happen on another thread, with the main thread blocking when (& not before) it needs the return value. In particular, Erlang and Scala are specifically designed to make the most of multiple cores/processors/machines.

      See also map-reduce and multiprocessor database techniques like BSD and CouchDB (http://books.couchdb.org/relax/eventual-consistency).

    • Re:Adapt (Score:5, Insightful)

      by Cassini2 (956052) on Sunday March 22, 2009 @03:19PM (#27290637)

      Give us a year maybe two.

      I think this problem will take longer than a year or two to solve. Modern computers are really fast. They solve simple problems almost instantly. A side-effect of this is that if you underestimate the computational power required for the problem at hand, you are likely to be off by a large amount.

      If you implement an order n-squared algorithm, O(n^2), on a 6502 (Apple II), and n was larger than a few hundred, you were dead. Many programmers wouldn't even try implementing hard algorithms on the early Apple II computers. On the other hand, a modern processor might tolerate O(n^2) algorithms with n larger than 1000. Programmers can try solving much harder problems. However, the programmer's ability to estimate and deal with computational complexity has not changed since the early days of computing. Programmers use generalities. They use ranges: n will be between 5 and 100, or n will be between 1000 and 100,000. With modern problems, n=1000 might mean the problem can be solved on a netbook, and n=100,000 might require a small multi-core cluster.
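      The scale sensitivity of O(n^2) can be made concrete with a quick sketch (very rough figures: call it 10^6 simple steps per second for an Apple II-era 6502 and 10^9 for one modern core):

```python
# How the same O(n^2) algorithm feels at different scales, under a
# crude model of one simple step per pair. (Step rates are very rough:
# ~10^6/s for an Apple II-era 6502, ~10^9/s for one modern core.)
def seconds(n, steps_per_sec):
    return n * n / steps_per_sec

APPLE_II = 1e6
MODERN = 1e9

assert seconds(300, APPLE_II) < 0.1      # a few hundred: fine on a 6502
assert seconds(5000, APPLE_II) > 10      # a few thousand: hopeless then
assert seconds(1000, MODERN) < 0.01      # n=1000 today: trivial
assert seconds(100_000, MODERN) >= 10    # n=100,000: painful even today
```

      The jump from "trivial on a netbook" to "needs a cluster" is only two orders of magnitude in n, which is exactly the kind of range programmers routinely misestimate.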

      There aren't many programming platforms out there that scale smoothly from applications deployed on a desktop, to applications deployed on a multi-core desktop, and then to clusters of multi-core desktops. Perhaps most worrying is that the new programming languages that are coming out are not particularly useful for intense data analysis. The big examples of this for me are .NET and functional languages. .NET was deployed at about the same time multi-core chips showed up, and has minimal support for them. Functional languages may eventually be the solution, but for any numerically intensive application, tight loops of C code are much faster.

      The other issue with multi-core chips, is that as a programmer, I have two solutions to making my code go faster:
      1. Get out the assembly printouts and the profiler, and figure out why the processor is running slow. Doing this helps every user of the application, and works well with almost any of the serious compiled languages (C, C++). Sometimes, I can get a 10:1 speed improvement. (*) It doesn't work so well with Java, .NET, or many functional languages, because they use run-time compilers/interpreters and don't generate assembly code.
      2. I recode for a cluster. Why stop at a multi-core computer? If I can get a 2:1 to 10:1 speed up by writing better code, then why stop at a dual or quad core? The application might require a 100:1 speed up, and that means more computers. If I have a really nasty problem, chances are that 100 cores are required, not just 2 or 8. Multi-core processors are nice, because they reduce cluster size and cost, but a cluster will likely be required.

      The problem with both of the above approaches is that, from a tools perspective, they are the worst choice for multi-core optimizations. Approach 1 will force me into using C and C++, which don't handle threads really well. In particular, C and C++ lack easy implementations of Software Transactional Memory, NUMA, and clusters. This means that approach 2 may require a complete software redesign, and possibly either a language change or a major change in the compilation environment. Either way, my days of fun-loving Java and .NET code are coming to a sudden end.

      I just don't think there is any easy way around it. The tools aren't yet available for easy implementation of fast code that scales between the single-core assumption and the multi-core assumption in a smooth manner.

      Note: * - By default, many programmers don't take advantage of many features that may increase the speed of an algorithm. Built-in special purpose libraries, like MMX, can dramatically speed up certain loops. Sometimes loops contain a great deal of code that can be eliminated. Maybe a function call is present in a tight loop. Anti-virus software can dramatically affect system speed. Many little things can sometimes make big differences.

      • by Tiger4 (840741)

        2. I recode for a cluster. Why stop at a multi-core computer? If I can get a 2:1 to 10:1 speed up by writing better code, then why stop at a dual or quad core? The application might require a 100:1 speed up, and that means more computers. If I have a really nasty problem, chances are that 100 cores are required, not just 2 or 8. Multi-core processors are nice, because they reduce cluster size and cost, but a cluster will likely be required.

        I think I agree with you, BUT... don't fall into the old trap: If ten machines can do the job in 1 month, 1 machine can do the job in 10 months. But it doesn't necessarily follow that if one machine can do the job in 10 months, 10 machines can do the job in 1 month.
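
        The trap has a name - Amdahl's law - and it is easy to put numbers on (a quick illustrative sketch; the 90% figure is just an assumption):

```python
def amdahl_speedup(parallel_fraction, n_workers):
    # Amdahl's law: the serial fraction caps the overall speedup
    # no matter how many workers you add.
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)

# A job that is 90% parallelizable:
print(round(amdahl_speedup(0.9, 10), 2))    # 5.26 -- not 10
print(round(amdahl_speedup(0.9, 1000), 2))  # 9.91 -- never reaches 10x
```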

        Also, the problem with runtime interpreters is not that they don't generate assembly code. The problem is that it is harder to get at the underlying code that is really executing. That code could be optimized if you could see it. But seeing i

    • Re:Adapt (Score:5, Funny)

      by camperslo (704715) on Sunday March 22, 2009 @04:26PM (#27291359)

      The programmers of Slashdot are ready for multiple cores and threads. There is no problem.

      When performing a number of operations in parallel the key is to simply ignore the results of each operation.
      For operations that would have used the result of another as input simply use what you think the result might be or what you wish it was.

      The programmers of Slashdot already have the needed skills for such programming as the mental processes are the same ones that enable discussion of TFAs without reading them.

  • by Microlith (54737) on Sunday March 22, 2009 @02:35PM (#27290099)

    So basically yet another tech writer finds out that a huge number of applications are still single threaded, and that it will be a while before we have applications that can take advantage of the cores that the OS isn't actively using at the moment. Well, assuming you're running a desktop and not a server.

    This isn't a performance issue with regards to Windows or Linux; they're quite adept at handling multiple cores. They just don't need that much themselves, and the applications run these days don't, individually, need much more than that either.

    So yes, applications need parallelization. The tools for it are rudimentary at best. We know this. Nothing to see here.

    • Re: (Score:3, Interesting)

      by thrillseeker (518224)
      Did you ever follow the Occam language? It seemed to have parallelization intrinsic, but it never went anywhere.
      • Re: (Score:3, Informative)

        by 0123456 (636235)

        Did you ever follow the Occam language? It seemed to have parallelization intrinsic, but it never went anywhere.

        Occam was heavily tied into the Transputer, and without the transputer's hardware support for message-passing, it's a bit of a non-starter.

        It also wasn't easy to write if you couldn't break your application down into a series of simple processes passing messages to each other. I suspect it would go down better today now people are used to writing object-oriented code, which is a much better match to the message-passing idea than the C code that was more common at the time.

    • by phantomfive (622387) on Sunday March 22, 2009 @02:59PM (#27290423) Journal
      From the article:

      The onus may ultimately lie with developers to bridge the gap between hardware and software to write better parallel programs......They should open up data sheets and study chip architectures to understand how their code can perform better, he said.

      Here's the problem: most programs spend 99% of their time waiting. MOST of that is waiting for user input. Part of it is waiting for disk access (as mentioned in the AnandTech story [slashdot.org], the best thing you can do to speed up your computer is get a faster hard drive/SSD). A minuscule part of it is spent in the processor. If you don't believe me, pull out a profiler and run it on one of your programs; it will show you where things can be easily sped up.

      Now, given that the performance of most programs is not processor bound, what is there to gain by parallelizing your program? If the performance gain were really that significant, I would already be writing my program with threads, even with the tools we have now. The fact of the matter is that in most cases, there is really no point to writing your program in a parallel manner. This is something a lot of the proponents of Haskell don't seem to understand: even if their program is easily parallelizable, the performance gain is not likely to be noticeable. Speeding up hard drives will make more of a difference to performance in most cases than adding cores.

      I for one am certainly not going to be reading chip data sheets unless there's some real performance benefit to be found. If there's enough benefit, I may even write parts in assembly, I can handle any ugliness. But only if there's a benefit from doing so.
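
      Pulling out the profiler is itself a one-minute job (a toy sketch with Python's cProfile; the function names are made up):

```python
import cProfile
import io
import pstats

def cpu_bound():
    # The only real computation in this toy "application".
    return sum(i * i for i in range(200_000))

def main():
    return cpu_bound()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Print the five most expensive calls; cpu_bound dominates.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print("cpu_bound" in report)  # True
```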

      • That's a big leap (Score:4, Insightful)

        by SuperKendall (25149) on Sunday March 22, 2009 @03:20PM (#27290641)

        If you don't believe me, pull out a profiler and run it on one of your programs, it will show you where things can be easily sped up.

        Now, given that the performance of most programs is not processor bound

        That's a pretty big leap, I think.

        Yes, a lot of today's apps are more user bound than anything. But there are plenty of real-world apps that people use that are still pretty processor bound - Photoshop, and image processing in general, is a big one. So can be video, which starts out disk bound but is heavily processor bound as you apply effects.

        Even Javascript apps are processor bound, hence Chrome...

        So there's still a big need for understanding how to take advantage of more cores - because chips aren't really getting faster these days so much as more cores are being added.

        • Re: (Score:3, Informative)

          by phantomfive (622387)

          So there's still a big need for understanding how to take advantage of more cores - because chips aren't really getting faster these days so much as more cores are being added.

          OK, so we can go into more detail. For most programs, parallelization will do essentially nothing. There are a few programs that can benefit from it, as you've mentioned. But those programs are already taking advantage of them, not only do video encoding programs use multiple cores, some can even farm the process out over multiple systems. So it isn't a matter of programmers being lazy, or tools not being available, it's a matter of in most cases, multiple cores won't make a difference. If you run wind

        • Re:That's a big leap (Score:5, Informative)

          by davecb (6526) * <davec-b@rogers.com> on Sunday March 22, 2009 @04:44PM (#27291551) Homepage Journal

          And if you look at a level lower than the profiler, you find your programs are memory-bound, and getting worse. That's a big part of the push toward multithreaded processors.

          To paraphrase another commentator, they make process switches infinitely fast, so one can keep on using the ALU while your old thread is twiddling its thumbs waiting for a cache-line fill.

          --dave

      • Re: (Score:3, Insightful)

        by caerwyn (38056)

        This is true to a point. The problem is that, in modern applications, when the user *does* do something there's generally a whole cascade of computation that happens in response - and the biggest concern for most applications is that the app appear to have short latency. That is, all of that computation happens as quickly as possible so the app can go back to waiting for user input.

        There's a lot of gain that can be gotten by threading user input responses in many scenarios. Who cares if the user often waits 5 mi

    • by ari wins (1016630) on Sunday March 22, 2009 @03:21PM (#27290653)
      I almost modded you Redundant to help get your point across.
  • by mysidia (191772) on Sunday March 22, 2009 @02:39PM (#27290139)

    Multiple virtual machines on the same piece of metal, with a workstation hypervisor, and intelligent balancing of apps between backends.

    Multiple OSes sharing the same cores. Multiple apps running on the different OSes, and working together.

    This can also be used to provide fault tolerance... if one of the worker apps fails, or even one of the OSes fails, your processing capability is reduced but a worker app in a different OS takes over; with checkpointing procedures and shared state, the apps don't even lose data.

    You should even be able to shut down a virtual OS for Windows updates without impact, if the apps are designed properly...

  • Huh? (Score:5, Funny)

    by Samschnooks (1415697) on Sunday March 22, 2009 @02:39PM (#27290141)

    ...programmers are to blame for that

    The development tools aren't available and research is only starting."

    Stupid programmers! Not able to develop software without the tools! In my day we wrote our own tools - in the snow, uphill, both ways! We didn't need no stink'n vendor to do it for us - and we liked it that way!

  • by davecb (6526) * <davec-b@rogers.com> on Sunday March 22, 2009 @02:40PM (#27290145) Homepage Journal

    Firstly, it's false on the face of it: Ubuntu is certified on the Sun T2000, a 32-thread machine, and Canonical is supporting it.

    Secondly, it's the same FUD as we heard from uniprocessor manufacturers when multiprocessors first came out: this new "symmetrical multiprocessing" stuff will never work, it'll bottleneck on locks.

    The real problem is that some programs are indeed badly written. In most cases, you just run lots of individual instances of them. Others, for grid, are well-written, and scale wonderfully.

    The ones in the middle are the problem, as they need to coordinate to some degree, and don't do that well. It's a research area in computer science, and one of the interesting areas is in transactional memory.

    That's what the folks at the Multicore Expo are worried about: Linux itself is fine, and has been for a while.

    --dave

  • by mcrbids (148650) on Sunday March 22, 2009 @02:41PM (#27290175) Journal

    Languages like PHP/Perl, as a rule, are not designed for threading - at ALL. This makes multi-core performance a non-starter. Sure, you can run more INSTANCES of the language with multiple cores, but you can't get any single instance of a script to run any faster than what a single core can do.

    I have, so, so, SOOOO many times wished I could split a PHP script into threads, but it's just not there. The closest you can get is with (heavy, slow, painful) forking and multiprocess communication through sockets or (worse) shared memory.

    Truth be told, there's a whole rash of security issues through race conditions that we'll soon have crawling out of nearly every pore as the development community slowly digests multi-threaded applications (for real!) in the newly commoditized multi-CPU environment.

  • by Anonymous Coward on Sunday March 22, 2009 @02:42PM (#27290185)

    "The development tools aren't available and research is only starting"

    Hardly. Erlang's been around 20 years. Newer languages like Scala, Clojure, and F# all have strong concurrency. Haskell has had a lot of recent effort in concurrency (www.haskell.org/~simonmar/papers/multicore-ghc.pdf).

    If you prefer books there's: Patterns for Parallel Programming, the Art of Multiprocessor Programming, and Java Concurrency in Practice, to name a few.

    All of these are available now, and some have been available for years.

    The problem isn't that tools aren't available, it's that the programmers aren't preparing themselves and haven't embraced the right tools.

  • BeOS (Score:5, Interesting)

    by Snowblindeye (1085701) on Sunday March 22, 2009 @02:42PM (#27290191)

    Too bad BeOS died. One of the axioms the developers had was 'the machine is a multi processor machine', and everything was built to support that.

    Seems like they were 15 years ahead of their time. But, on the other hand, too late to establish another OS in a saturated market. Pity, really.

    • Re: (Score:3, Informative)

      by yakumo.unr (833476)
      So you missed Zeta then ? http://www.zeta-os.com/cms/news.php [zeta-os.com] (change to English via the dropdown on the left)
      • Re: (Score:3, Informative)

        by b4dc0d3r (1268512)

        Looks dead to me, a year ago they posted this:

        With immediate effect, magnussoft Deutschland GmbH has stopped the distribution of magnussoft Zeta 1.21 and magnussoft Zeta 1.5. According to the statement of Access Co. Ltd., neither yellowTAB GmbH nor magnussoft Deutschland GmbH are authorized to distribute Zeta.

        http://www.bitsofnews.com/content/view/5498/44/ [bitsofnews.com]

    • Re: (Score:3, Interesting)

      It may have been an axiom, but really, what did BeOS do (or want to do) that Linux doesn't do now?

      The Linux OS has been scaled to thousands of CPUs. Sure, most applications don't benefit from multi-processors, but that'd be true in BeOS, too.

      I'd honestly like to know if there is some design paradigm that was lost with BeOS that isn't around today.

  • by Anonymous Coward on Sunday March 22, 2009 @02:46PM (#27290259)

    The quote presented in the summary is nowhere to be found in the linked article. To make matters worse, the summary claims that Linux and Windows aren't designed for multicore computers, but the linked article only claims that some applications are not designed to be multi-threaded or to run multiple processes. Well, who said that every application under the sun must be heavily multi-threaded or spawning multiple processes? Where's the need for an email client to spawn 8 or 16 threads? Will my address book be any better if it spawns a bunch of processes?

    The article is bad and timothy should feel bad. Why is he still responsible for any news being posted on slashdot?

  • by Troy Baer (1395) on Sunday March 22, 2009 @02:57PM (#27290411) Homepage

    The /. summary of TFA is almost exquisitely bad. It's not Windows or Linux that's not ready for multicore (both have supported multi-processor machines for on the order of a decade or more), but rather the userspace applications that aren't ready. The reason is simple: parallel programming is rather hard, and historically most ISVs haven't wanted to invest in it because they could rely on the processors getting faster every year or two... but no longer.

    One area where I disagree with TFA is the claimed paucity of programming models and tools. Virtually every OS out there supports some kind of concurrent programming model, and often more than one depending on what language is used -- pthreads [wikipedia.org], Win32 threads, Java threads, OpenMP [openmp.org], MPI [mpi-forum.org] or Global Arrays [pnl.gov] on the high end, etc. Most debuggers (even gdb) also support debugging threaded programs, and if those don't have enough heft, there's always Totalview [totalview.com]. The problem is that most ISVs have studiously avoided using any of these except when given no other choice.
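
    Even the most basic of those models - threads plus a mutex - has been a stable API for ages (sketched here in Python's threading module rather than raw pthreads, but the shape is identical):

```python
import threading

counter = 0
lock = threading.Lock()

def worker(iterations):
    global counter
    for _ in range(iterations):
        # The lock serializes the read-modify-write; without it,
        # increments from different threads would race.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```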

    --t

  • by Pascal Sartoretti (454385) on Sunday March 22, 2009 @02:59PM (#27290431)
    Developers still write programs for single-core chips and need the tools necessary to break up tasks over multiple cores.

    So what? If I had a 32 core system, at least each running process (even if single-threaded) could have a core just for itself. Only a few basic applications (such as a browser) really need to be designed for multiple threads.
  • by tyler_larson (558763) on Sunday March 22, 2009 @03:15PM (#27290597) Homepage
    If you spend more time assigning blame than you do describing the problem, then clearly you don't have anything insightful to say.
  • by Todd Knarr (15451) on Sunday March 22, 2009 @03:37PM (#27290827) Homepage

    Part of the problem is that tools do very little to help break programs down into parallelizable tasks. That has to be done by the programmer; they have to take a completely different view of the problem and the methods to be used to solve it. Tools can't help them select algorithms and data structures.

    One good book related to this was one called something like "Zen of Assembly-Language Optimization". One exercise in it went through a long, detailed process of optimizing a program, going all the way down to hand-coding highly-bummed inner loops in assembly. And it then proceeded to show how a simple program written in interpreted BASIC(!) could completely blow away that hand-optimized assembly-language just by using a more efficient algorithm.

    Something similar applies to multi-threaded programming: all the tools in the world can't help you much if you've selected an essentially single-threaded approach to the problem. They can help you squeeze out fractional improvements, but to really gain anything you need to put the tools down, step back and select a different approach, one that's inherently parallelizable. And by doing that, without using any tools at all, you'll make more gains than any tool could have given you. Then you can start applying the tools to squeeze even more out, but you have to do the hard skull-sweat first.
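
    The BASIC-beats-assembly lesson fits in a few lines (an illustrative sketch, not from the book: a naive O(n^2) duplicate check against the O(n) rewrite):

```python
import time

def has_duplicate_quadratic(items):
    # O(n^2): compare every pair.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicate_linear(items):
    # O(n): a single pass with a hash set.
    seen = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False

data = list(range(5000))  # worst case: no duplicates at all
for fn in (has_duplicate_quadratic, has_duplicate_linear):
    start = time.perf_counter()
    result = fn(data)
    print(fn.__name__, result, f"{time.perf_counter() - start:.4f}s")
```

    No amount of micro-optimizing the inner loop of the first version catches the second; the win comes from the algorithm, not the tools.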

    And the basic problem is that schools don't teach how to parallelize problems. It's hard, and not everybody can wrap their brain around the concept, so teachers leave it as a 1-week "Oh, and you can theoretically do this, now let's move on to the next subject." thing.

    • Re: (Score:3, Insightful)

      by EnglishTim (9662)

      And the basic problem is that schools don't teach how to parallelize problems. It's hard, and not everybody can wrap their brain around the concept...

      And there's more to it than that; If a problem is hard, it's going to take longer to write and much longer to debug. Often it's just not worth investing the extra time, money and risk into doing something that's only going to make the program a bit faster. If we proceed to a future where desktop computers all have 256 cores, the speed advantage may be worth it but currently it's a lot of effort without a great deal of gain. There's probably better ways that you can spend your time.

  • by FlyingGuy (989135) <`flyingguy' `at' `gmail.com'> on Sunday March 22, 2009 @04:10PM (#27291211)

    it is the answer to the question that no one asked...

    In a real-world application, as others have mentioned, pretty much all of a program's time is spent in an idle loop waiting for something to happen, and in almost all circumstances it is input from the user in whatever form: mouse, keyboard, etc.

    So let's say it is something like Final Cut. To be sure, when someone kicks off a render, this is an operation that can be spun off on its own thread or its own process, freeing up the main process loop to respond to other things that the user might be doing. But user input is where the rubber really hits the road: the user could do something that affects the process that was just spun off, either as a separate thread or process on the same core or any other number of cores, so you have to keep track of what the user is doing in the context of things that have been farmed out into other cores/processes/threads.

    Enter the OS. Take your pick, since it really does not matter which OS we are talking about; they all do the same basic things, perhaps differently. How does an OS designer make sure any of, say, 16 cores (dual 8-core processors) are actually well and fairly utilized? Would it be designed to use a core to handle each of the main functions of the OS - let's say drive access, the comm stack (pick your protocol here), video processing, etc. - or should it just run a scheduler like those now in use, which farm out thread processing based on priority? Is there really any priority scheme for multiple cores that could each run, say, hundreds of threads/processes? And what about memory? A single truly 64-bit core can handle a very large amount of memory, and that single core controls and has access to all that RAM at its whim (DMA notwithstanding). But what do you do now that you have 16 cores all wanting to use that memory? Do we create a scheduler to arbitrate access among 16 demanding stand-alone processors, or do we simply give each core a finite memory space and then have to control the movement of data from each memory space to another, since a single process thread (handling the main UI thread for a program) has to be aware of when something is finished on one core and then get access to that memory to present results, either as data written to, say, a file or written into video memory for display?

    I submit that the current paradigm of SMP is inadequate for these tasks and must be rethought to take advantage of this new hardware. I think a more efficient approach is that each core detected would be fired up with its own monitor stack as a place to start so that the scheduling is based upon the feedback from each core. The monitor program would be able to ensure that the core it is responsible for is optimized for the kind of work that is presented. This concept while complicated could be implemented and serve as a basis for further development in this very complex space.

    In terms of "supercomputers" this has been dealt with, but in a very different methodology that I do not think lends itself to general computing. Deep Blue, Crays and the like aren't really relevant in this case, since those are mostly very custom designs built for a single purpose and optimized for things like chess, weather modeling, or nuclear weapons study, where the problems are already discretely chunked out with a known set of algorithms and processes. General-purpose computing, on the other hand, is like trying to herd cats from the OS point of view, since you never really know what is going to be demanded and how.

    OS designers and user-space software designers need to really break this down and think it all the way through before we get much further, or all this silicon is not going to be used well or efficiently.

  • by hazydave (96747) on Sunday March 22, 2009 @04:20PM (#27291309)

    The idea of an OS and/or support tools handling the SMP problem is nothing more than a crutch for bad programming.

    In fact, anyone who grew up with a real multithreaded, multitasking OS is already writing code that will scale just dandy to 8 cores and beyond. When you accept that a thread is nothing more or less than a typical programming construct, you simply write better code. This is no more or less an amazing thing than when regular programmers embraced subroutines or structures.

    This was S.O.P. back in the late 80s under the AmigaOS, and enhanced in the early/mid 90s under BeOS. This is not new, and not even remotely tied to the advent of multicore CPUs.

    The problem here is simple: UNIX and Windows. Windows had fake multitasking for so long, Windows programmers barely knew what you could do when you had "thread" in the same toolkit as "subroutine", rather than it being something exotic. UNIX, as a whole, didn't even have lightweight preemptive threads until fairly recently, and UNIX programmers are only slowly catching up.

    However, neither of these is even slightly an OS problem... it's an application-level problem. If programmers continue to code as if they had a 70s-vintage OS, they're going to think in single threads and suck on 8-core CPUs. If programmers update themselves to state-of-the-1980s thinking, they'll scale to 8-cores and well beyond.

    • Re:This is incorrect (Score:4, Informative)

      by Todd Knarr (15451) on Sunday March 22, 2009 @06:22PM (#27292541) Homepage

      Unix didn't for a long time have lightweight preemptive threads because it had, from the very beginning, lightweight preemptive processes. I spent a lot of time wondering why Windows programmers were harping on the need for threads to do what I'd been doing for a decade with a simple fork() call. And in fact, if you look at the Linux implementation, there are no threads. A thread is simply a process that happens to share memory, file descriptors and such with its parent, and that has some games played with the process ID so it appears to have the same PID as its parent. Nothing new there; I was doing that on BSD Unix back in '85 or so (minus the PID games).

      That was, in fact, one of the things that distinguished Unix from VAX/VMS (which was in a real sense the predecessor to Windows NT, the principal architect of VMS had a big hand in the architecture and internals of NT): On VMS process creation was a massive, time-consuming thing you didn't want to do often, while on Unix process creation was fast and fairly trivial. Unix people scratched their heads at the amount of work VMS people put into keeping everything in a single process, while VMS people boggled at the idea of a program forking off 20 processes to handle things in parallel.
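
      The fork-and-report pattern the grandparent describes is about this heavy (a POSIX-only illustrative sketch; the payload is made up):

```python
import os

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: do the "work" and send the result back over the pipe.
    os.close(r)
    os.write(w, b"42")
    os._exit(0)
else:
    # Parent: reap the child, then read its answer.
    os.close(w)
    os.waitpid(pid, 0)
    result = os.read(r, 16)
    os.close(r)
    print(result.decode())  # 42
```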

  • by gooneybird (1184387) on Monday March 23, 2009 @07:45AM (#27296685)
    "The problem, my dear programmer, as you so eloquently put it, is one of choice..."

    Seriously. I have been involved with software development, from 8-bit PICs to clusters spanning WANs and everything in between, for the past 20 years or so.

    Multiprocessing involves coordination between the processes. It doesn't matter (too much) whether it's separate cores or separate silicon. On any given modern OS there are plenty of examples of multiprocessor execution: Hard drives each have a processor, video cards each have a processor, USB controllers have a processor. All of these work because there is a well-defined API between them and the OS - a.k.a device drivers. People that write good device drivers (and kernel code) understand how an OS works. This is not generally true of the broader developer population.

    Developers keep blaming the CPU manufacturers, but it's not their fault. What prevents parallel processing from becoming mainstream is the lack of a standard inter-process communication mechanism (at the language level) that abstracts away a lot of the dirty little details. Once the mechanism is in place, people will start using it. I am not referring to semaphores and mutexes; these are synchronization mechanisms, NOT (directly) communication mechanisms. I am not talking about queues either - too much leeway in their use. Sockets would be closer, but most people think of sockets for "network" applications. They should be thinking of them as "distributed applications" - as in distributed across cores. As an example, Microsoft just recently started to demonstrate that they "get it": the next release of VS will have a messaging library.

    Choice:

    At this time there are too many different ways to implement multi-threaded/multi-processor aware software. Each implementation has possible bugs: race conditions, lockups, priority inversion, etc. The choices need to be narrowed.

    Having a standard (language & OS) API is the key to providing a framework for developers to use, while still allowing them the freedom to customize for specific needs. So the OS needs an interface for setting CPU/core preferences, and the language needs to provide the API. Once there is an API, developers can "wrap their minds" around the concept and then things will "take off". As I stated previously, I prefer the "message box" mechanisms, simply because they port easily, are easy to understand and provide for very loosely coupled interaction - all good tenets of a multi-threaded/multi-processor implementation.
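
    A "message box" in this sense can be as small as a pair of thread-safe queues (an illustrative sketch; the names are made up):

```python
import queue
import threading

SENTINEL = None
inbox = queue.Queue()
outbox = queue.Queue()

def worker():
    # The worker shares no state with its peer; all coordination
    # happens through the two message boxes.
    while True:
        msg = inbox.get()
        if msg is SENTINEL:
            break
        outbox.put(msg * 2)

t = threading.Thread(target=worker)
t.start()
for n in (1, 2, 3):
    inbox.put(n)
inbox.put(SENTINEL)
t.join()
replies = [outbox.get() for _ in range(3)]
print(replies)  # [2, 4, 6]
```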

    Danger Will Robinson:

    One thing that I fear is that once the concept catches on, it will be overused or abused. People will start writing threads and processes that don't do enough work to justify the overhead. Everyone who starts writing programs will "advertise" that it's "multi-threaded", as if this somehow automatically indicates quality and/or "better" software...Not.

