Intel Says to Prepare For "Thousands of Cores"
Impy the Impiuos Imp writes to tell us that in a recent statement Intel has revealed their plans for the future, and it goes well beyond the traditional processor model. Suggesting developers start thinking about tens, hundreds, or even thousands of cores, it seems Intel is pushing for a massive evolution in the way processing is handled. "Now, however, Intel is increasingly 'discussing how to scale performance to core counts that we aren't yet shipping...Dozens, hundreds, and even thousands of cores are not unusual design points around which the conversations meander,' [Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab] said. He says that the more radical programming path to tap into many processing cores 'presents the "opportunity" for a major refactoring of their code base, including changes in languages, libraries, and engineering methodologies and conventions they've adhered to for (often) most of their software's existence.'"
Not Sure I'm Getting It (Score:5, Insightful)
Good idea (Score:5, Insightful)
It's a good idea... Somewhat the same idea the Cell chip has going for it (and, well, the Phenom X3). You make a product with lots of redundant units, so that when some are bound to fail, the overall impact of failure is much lower.
If there are 1000 cores on a chip and 100 go bad, you're still only losing a *maximum* of 10% of performance, versus when you have 2 or 4 cores and 1 or 2 go bad, where the performance impact is essentially 50%. It brings costs down because yields go up dramatically.
Re:Memory bandwidth? (Score:2, Insightful)
I would assume that if you have enough transistors for thousands of cores, you will be able to put on a lot of SRAM cache as well - just drop a few hundred or thousand cores. You won't be able to integrate DRAM, since it requires a different process, but SRAM should integrate easily enough.
Re:Useless (Score:5, Insightful)
Re:Not Sure I'm Getting It (Score:3, Insightful)
Re:Not Sure I'm Getting It (Score:5, Insightful)
As a software engineer, I wonder the same thing.
Put simply, the majority of code simply doesn't parallelize well. You can break out a few major portions of it to run as their own threads, but for the most part, programs either sit around and wait for the user, or sit around and wait for hardware resources.
Within that, only those programs that wait for a particular hardware resource - CPU time - even have the potential to benefit from more cores... And while a lot of those might split well into a few threads, most will not scale (without a complete rewrite to choose entirely different algorithms - if they even exist to accomplish the intended purpose) to more than a handful of cores.
Re:We all saw it coming anyway (Score:5, Insightful)
"So whether programmers find this move acceptable or not is irrelevant because this path is probably the only way to design faster CPU:s once we've hit the nanometer wall."
I guess you should put "faster" in quotes.
In any case, it is absolutely relevant what programmers think since any performance improvements that customers actually experience is dependent on our code.
Historically a primary reason to buy a new computer is because a faster system makes legacy applications run faster. To a large extent this won't be true with a new multicore PC. So why would people buy them?
That's why Intel wants us to redesign our software - so that in the future their customers will still have a reason to buy a new PC with Intel Inside.
Re:Memory bandwidth? (Score:3, Insightful)
Re:Not Sure I'm Getting It (Score:4, Insightful)
Re:Not Sure I'm Getting It (Score:5, Insightful)
That is what most current processors do and use branch prediction for. Even if you have a thousand cores, that's only 10 binary decisions ahead. You need to guess really well very often to keep your cores busy instead of syncing. Also, the further you're executing ahead, the more ultimately useless calculations are made, which is what drives power consumption up in long pipeline cores (which you're essentially proposing).
In reality, parallelism is more likely to be found by better compilers. Programmers will have to be more specific about the kind of loops they want: do you just need something performed on every item in an array, or is order important? No more mindless for-loops for processes that aren't inherently sequential.
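The loop distinction above can be sketched in Python - a toy illustration of the idea, not the compiler machinery the comment envisions; the `brighten` function and the pixel list are invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel):
    # Pure, order-independent work: each result depends only on its own input.
    return min(pixel + 40, 255)

pixels = [10, 200, 128, 255, 0]

# The "mindless for-loop": implicitly ordered, even though order is irrelevant.
sequential = [brighten(p) for p in pixels]

# The same loop expressed as a map: the runtime may fan it out over cores.
# (In CPython, CPU-bound work would need a process pool because of the GIL;
# a thread pool keeps this sketch simple and portable.)
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(brighten, pixels))

assert parallel == sequential  # map() preserves result order
```

Because each iteration is independent, the runtime is free to schedule them across cores; an ordered for-loop promises more sequencing than the algorithm actually needs.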
Re:Disagreement about this trend (Score:5, Insightful)
My guess is 4 cores in 2008, 4 cores in 2009, moving to 8 cores through 2010. We may move to a new uber-core model once the software catches up, more like 6-8 years than 2-4. I'm positive we won't "max out" at 64 cores, because we're going to hit a per-core speed limit much more quickly than we hit a number-of-cores limit.
It's all changing too fast (Score:3, Insightful)
I've only been programming professionally for 3 years now, but already I'm shaking in my boots over having to rethink and relearn the way I've done things to accommodate these massively parallel architectures. I can't imagine how scared the old-timers of 20, 30, or more years must be. Or maybe the good ones who are still hacking decades later have already had to deal with paradigm shifts and aren't scared at all?
Re:Not Sure I'm Getting It (Score:5, Insightful)
From a practical standpoint, Intel is right that we need vastly better developer tools, and that most things that require ridiculous amounts of compute time can be parallelized if you put some effort into it.
Re:Not Sure I'm Getting It (Score:5, Insightful)
I concur, furthermore I'd like to see one core per pixel, that would certainly solve your high-end gaming issues.
Re:Memory bandwidth? (Score:5, Insightful)
Memory would have to be completely redefined. Currently, you have one memory bank that is effectively accessed serially.
Yes, in Intel land. AMD has this thing called NUMA. What do you think "HyperTransport" means?
Re:Disagreement about this trend (Score:5, Insightful)
Web applications are becoming more AJAX'y all the time, and they are not sequential at all. Watching a video while another tab checks my Gmail is a parallel task. All indications are that people want to consume more and more media on their computers. Things like the MLB mosaic allow you to watch four games at once.
Have you ever listened to a song through your computer while coding, running an email program, and running an instant messaging program? There are four highly parallelizable tasks right there. Not compute intensive enough for you? Imagine the song compressed with a new codec that is twice as efficient in terms of size but twice as compute intensive. Imagine the email program indexing your email for efficient search, running algorithms to assess the email's importance to you, and virus checking new deliveries. Imagine your code editor doing on the fly analysis of what you are coding, and making suggestions.
"Normal" users are doing more and more with computers as well. Now that fast computers are cheap, people who never edited video or photos are doing it. If you want a significant market besides gamers who need more cores, it is people making videos, especially HD videos. Sure, my Grandmother isn't going to be doing this, but I do, and I'm sure my children will do it even more.
And don't forget about virus writers. They need a few cores to run on as well!
Computer power keeps its steady progress higher, and we keep finding interesting things to do with it all. I don't see that stopping, so I don't see a limit to the number of cores people will need.
Re:Memory bandwidth? (Score:2, Insightful)
You need a basic course in TTL. No, they haven't figured this out, and putting address decoding on the chip makes very little difference when you scale. They also haven't figured out communication between cores. We had thousands of CPUs rigged up with transputers back in the 80s. It was a nightmare, and near useless for just about everything. We had to use serial data to keep things sane.
The more logic you have, the longer the signal path. The longer the signal path, the harder it is to sync on the clock pulse. And the higher the clock frequency, the less the signal looks like a square wave; it starts to look like a ramp.
There are huge problems with scaling, whether it's speed or cores. If Intel wants us to have all these cores, their engineers are going to have to overcome the same problems parallel programming has had for 30 years or more.
Re:Not Sure I'm Getting It (Score:5, Insightful)
Re:Disagreement about this trend (Score:3, Insightful)
I KNOW it is so very often cited, but if ever there was a time to mention the "5 computers in the whole world" quote, it is this. In fact, I would dare say that is the whole point of this push by Intel: trying to get people (programmers) used to the thought of having so many parallel CPUs in a home computer.
Sure, from where we stand now, 64 seems like a lot but maybe a core for nearly each pixel on my screen makes sense, has real value to add. Or how about just flat-out smarter computers, something which might happen by simulating 100 neurons per core. As far as I understand it, speech recognition can always use more power. Let me put it differently:
Games requiring a lot of computing power makes sense to you in the future, but not elsewhere. The same would have been said about a high-end gaming rig just a handful of years ago, and yet a low-end PC today has amazing graphics, amazing everything, compared to what things were just 10 years ago. And it gets used, much of the time. If we have the power, we will use it. Games just push the envelope further, sooner, but they don't go anywhere that we all wouldn't like to go anyway.
I cannot think of a single task in a game that I would not want to be able to do in real life. Games are about living an idealized life, of some sort, inside your computer. The next step is bringing it out here, to the rest of the world.
Re:Not Sure I'm Getting It (Score:4, Insightful)
Obviously just adding more cores does little to speed up individual sequential processes, but it does help with multitasking, which is what I really think is the "killer app" for multi-core processors.
Back in the late 90's (it doesn't feel like "back in..." yet, but I'm willing to admit it was about a decade ago) I decided to build a computer with an Abit BP6 motherboard, two Celeron processors, and lots of RAM instead of a single higher-end processor, because I wanted to be able to multitask properly. My gamer friends mocked me for choosing Celeron processors, but for the price of a single-processor system I got one capable of running several "normal" apps plus one with heavy CPU usage without slowing down, and the extra RAM helped too. (I saw lots of people back then go for 128 MB of RAM and a faster CPU instead of "wasting" their money on RAM, and then curse their computers for being slow once they started swapping.) There was also the upside of having Windows 2000 run as fast on my computer as Windows 98 did on my friends' computers...
/Mikael
Re:Not Sure I'm Getting It (Score:4, Insightful)
Are you crazy? Context switches are the slowdown in multitasking OSes.
Unfortunately, multitasking OSes are not the slowdown in most tasks, exceptions noted of course.
Re:Not Sure I'm Getting It (Score:5, Insightful)
At the moment, I'm looking at Slashdot in Firefox, while listening to an mp3. I'm only using two out of my four cores, and I have 3% CPU usage.
Maybe when I post this, I might use a third core for a little while, but how many cores can I actually usefully use?
I can break a password protected Excel file in 30 hours max with this computer, and a 10000 core chip might reduce this to 43 seconds, but other than that, what difference is it going to make?
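For what it's worth, the cracking estimate scales the way it does because the search is embarrassingly parallel: the keyspace splits into slices that never need to talk to each other. A toy sketch (the numeric "passwords" and the `check` function are stand-ins for the real Excel key test):

```python
from concurrent.futures import ThreadPoolExecutor

SECRET = 7345  # stand-in for the unknown key; a real attack wouldn't know this

def check(candidate):
    # Stand-in for the expensive "try this password against the file" test.
    return candidate == SECRET

def search(bounds):
    # Each worker scans its own slice of the keyspace, fully independently.
    lo, hi = bounds
    for c in range(lo, hi):
        if check(c):
            return c
    return None

def crack(keyspace, workers=4):
    # Partition [0, keyspace) into one contiguous slice per worker.
    step = keyspace // workers
    slices = [(i * step, keyspace if i == workers - 1 else (i + 1) * step)
              for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hit in pool.map(search, slices):
            if hit is not None:
                return hit
    return None
```

Double the workers (cores) and each slice halves, which is exactly why 30 hours on 4 cores becomes under a minute on 10,000.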
you mean SGI (Score:5, Insightful)
SGI and/or Cray were using NUMA a decade ago.
Re:It's all changing too fast (Score:5, Insightful)
My dad's been programming for decades, and he's much more used to paradigm shifts than I am. His first programming job was translating assembly from one architecture to another, and now he's a proficient web developer. He understands concurrency and keeps up to date on new developments.
I'm reminded of an anecdote told to me during a presentation. The presenter had been introducing a new technology, and one man had a concern: "I've just worked hard to learn the previous technology. Can you promise me that, if I learn this one, it will be the last one I ever have to learn?" The presenter replied, "I can't promise you that, but I can promise you that you're in the wrong profession."
Re:Not Sure I'm Getting It (Score:5, Insightful)
Before, having 1 core was enough, and having 512MB of RAM was enough for most consumers. Computing power grows, and software developers make use of that additional power. However, this will mainly affect the gaming industry.
Re:Not Sure I'm Getting It (Score:4, Insightful)
Uh, last time I checked, Python had a single global interpreter lock per process, which made it unsuitable for heavily multithreaded programs. Java would be a better example of a scalable and multithread-aware language.
Difference (Score:3, Insightful)
What's different this time may be that nobody else has anything better. Last time, AMD64 was the easier solution, and it clobbered Itanium. Can AMD (or anybody) simply choose to keep making single cores faster, or is multi-core the way CPUs really must go from here?
Re:Not Sure I'm Getting It (Score:3, Insightful)
I disagree. Having the compiler analyze loops to find out if they are trivially parallelizable is easy, there's little need to change the language.
On the other hand, a language that was really designed for kilocores or megacores would be radically different from most modern languages, adding a few extra (un)loop-statements wouldn't do. Functional languages are a good bet. When everything is side-effect-free, there's no good reason why all of it can't be executed in parallel.
But maybe we need even more abstraction. And more time. It took quite a while after the invention of the programmable computer for someone to invent FORTRAN. And we still program in something resembling FORTRAN. Maybe what we really need are actual many-core computers so that someone really smart will use them, and finally figure out a way to program them that's practical. That's where I'll put my money. Wait and see!
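The side-effect-free point can be illustrated with a small sketch (Python standing in here for a real functional language): because each call depends only on its input and touches no shared state, results can be harvested in whatever order the cores happen to finish, and the answer still matches the sequential one.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    # Side-effect-free: the result depends only on the argument and no
    # shared state is touched, so any core may run any call at any time.
    return n * n

data = list(range(8))

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(square, n): n for n in data}
    # Harvest results in whatever order the cores happen to finish...
    unordered = {futures[f]: f.result() for f in as_completed(futures)}

# ...and the answer still matches a plain sequential loop exactly.
ordered = [unordered[n] for n in data]
assert ordered == [square(n) for n in data]
```

With side effects in `square`, no scheduler could make that guarantee; without them, the parallelism comes for free.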
Re:Not Sure I'm Getting It (Score:5, Insightful)
This is, IMHO, the wrong question to be asking. Asking how current tasks will be optimized to take advantage of future hardware makes the fundamentally flawed assumption that the current tasks will be what's considered important once we have this kind of hardware.
But the history of computers has shown that the "if you build it, they will come" philosophy applies to the tasks that people end up wanting to accomplish. It's been seen time and again that new ways of using computers wait until we've hit a certain performance threshold, whether it's CPU, memory, bandwidth, disk space, video resolution, or whatever, and then become the things we need our computers to do.
Take, for instance, the huge success of mp3's. There was a time not so long ago when people were limited to playing music off a physical CD. This wasn't because there was no desire amongst computer users to listen to digital files that could be stored locally or streamed off the internet. It was because computer users did not know yet that they had the desire to do it. But technology advanced to the point where a) processors became fast enough to decode mp3's in real time without using the whole CPU and b) hard drives grew to the point where we had the capacity to store files that are 10% of the size of the files on the CD.
Similarly, it's likely that when we reach the point where we have hundreds or thousands of cores, new tasks will emerge that take advantage of the new capabilities of the hardware. It may be that those tasks are limited in some other way by one of the other components we use or by the as yet non-existent status of some new component, but it's only important that multiple cores play a part in enabling the new task.
In the near term, you can imagine a whole host of applications that would become possible when you get to the point where the average computer can do real-time H.264 encoding without affecting overall system performance. I won't guess at what might be popular further down the road, but there will be people who will think of something to do with those extra cores. And, in hindsight, we'll see the proliferation of cores as enabling our current computer-using behavior.
Re:Not Sure I'm Getting It (Score:5, Insightful)
Why wouldn't each core have its own cache? It only needs to cache what it needs for its job.
so, Intel made risc passé... (Score:3, Insightful)
and now they're bringing it back?
we all learned how 1000 cores don't matter if each core can only process a simplified instruction set, compared to 2 cores that can handle more data per thread.
this is basic computer design here people.
Re:Not Sure I'm Getting It (Score:3, Insightful)
"Unfortunately all this is going to lead to bus and memory bandwidth contention, "
Good. Current bus needs to be redone.
Re:Not Sure I'm Getting It (Score:3, Insightful)
except when running an algorithm on 1 core, you can have 900 cores running different outputs based on the probability of a different outcome of the previous part of the process.
When it is actually determined, kill the 899 that were incorrect. In fact, what would probably happen is they would all branch differently, so you might kill 400, then after running for a bit, 200, and so on. This would dramatically decrease the time it takes to solve it.
In fact, for some application getting 'close enough' will do.
Example:
Chess. I make my first pawn move. 18 processes start up on separate cores, each one calculating the next 5 steps that are possible. When the next move is made, it kills the processes that didn't calculate 5 steps from that move.
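A minimal Python sketch of that scheme - everything here (the move names, the `evaluate_line` stand-in) is hypothetical; a real engine would run actual search on each core:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_line(move):
    # Stand-in for "search five plies deep from this reply".
    return f"best line after {move}"

legal_replies = [f"reply-{i}" for i in range(18)]

pool = ThreadPoolExecutor(max_workers=18)
# Speculate: start analysing every possible reply before the opponent moves.
futures = {move: pool.submit(evaluate_line, move) for move in legal_replies}

def opponent_moved(actual):
    # The opponent chose: kill the 17 losing speculations (cancel() is a
    # no-op for work that already finished, which is fine here) and harvest
    # the branch that was computed "for free" while we were waiting.
    for move, fut in futures.items():
        if move != actual:
            fut.cancel()
    return futures[actual].result()

line = opponent_moved("reply-7")
pool.shutdown(wait=False)
```

The win is that the analysis of the chosen branch was already underway (or done) before the opponent's clock stopped; the cost is the 17 discarded computations, which is exactly the power-consumption trade-off mentioned above.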
Re:Hey remember the 1980's and the Amiga? (Score:2, Insightful)
How is that back to the Amiga?
The PC platform hit Amiga levels well over a decade and a half ago, with dedicated graphics hardware, dedicated audio hardware, dedicated network hardware, a numerical coprocessor, and so on. People need to stop claiming every new change finally brings things back to the Amiga. That argument is terribly old.
And yeah I was into the Amiga and Atari ST and Mac Classic back in those days, but then I moved on.
Re:It's all changing too fast (Score:3, Insightful)
We're not scared. All the good ones spit into their hands, brace themselves, and say "Bring it on."
Any old-timers actually scared need to leave, and don't let your beard get caught in the door on the way out, wuss.
Don't worry about relearning; by the time this hits the market, tools will have been written, and there will have been a lot of documentation.
It's going to be a great step in computing... Or it will get killed because the tools weren't developed fast enough.
Re:Not Sure I'm Getting It (Score:3, Insightful)
You speed it up by rewriting sequential algorithms to run in parallel. It is surprising how many algorithms you would swear are inherently sequential can be rewritten to operate in parallel. Beyond that, you can have cores engaged in speculative execution, where the results may or may not be used. I could imagine a spell checker where multiple words and sentence fragments are dispatched to numerous cores for spelling/grammar checking. A compiler could devote a separate core to compiling/linking/optimizing each individual module or function.
Programmers don't think massively parallel and most programming languages (excluding hardware design languages such as Verilog/VHDL) are sequential in nature.
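The spell-checker idea above is easy to sketch: each word is an independent unit of work, so a pool of workers can check them in parallel (the five-word dictionary and the `check_word` helper are invented for the example):

```python
from concurrent.futures import ThreadPoolExecutor

DICTIONARY = {"the", "quick", "brown", "fox", "jumps"}  # toy word list

def check_word(word):
    # Each word is an independent unit of work, so any core can take it.
    return (word, word.lower() in DICTIONARY)

text = "The quikc brown fox jmups"
with ThreadPoolExecutor(max_workers=4) as pool:
    # Dispatch every word to the pool; results come back in input order.
    results = dict(pool.map(check_word, text.split()))

misspelled = [w for w, ok in results.items() if not ok]
```

The same shape fits the compiler example: swap words for modules and `check_word` for a compile step, and each core gets its own translation unit.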
Re:Not Sure I'm Getting It (Score:3, Insightful)
You still choke on the Memory Wall [wikipedia.org]; you have to feed all those cores data, and you're going a few orders of magnitude slower than the CPU cores. Increasing bandwidth on the front side bus doesn't help, as you have to increase bandwidth and decrease latency. You compound this when you have many cores/sockets doing backward cache flushes to RAM.
Even if you've got a hypertransport link (as Intel doesn't, they push bits on the front side bus between sockets, IIRC) to the north bridge for each socket, you've still only got a single north bridge. You're bottlenecked again. OK, use two front side buses with an interlink. Now we're back to coherency problems, but at two points. At some point, you have to either give each socket its own RAM bank (NUMA) and isolate data (and make CPU migration for tasks take an extra hit) or figure out how to perfectly isolate and stripe your data over multiple paths to a single backing store.
Re:Yeah, right. (Score:3, Insightful)
The notion that some revolutionary compiler or IDE is going to solve this problem is just wrong. Tell it to Itanic, which was based on exactly these assumptions and failed miserably because of them.
With Itanium, they were trying to say compiler improvements could handle it invisibly, with no work from the application programmers. Taking advantage of more than two cores (since one can take care of other programs that would have slowed down your app) is going to take conscious thought about what can and can't be parallel. Taking advantage of more than a handful is going to take more fundamental shifts in how we program. They're asking a lot more this time.
On the other hand, you could easily opt out of Itanium. Now, this is the only way your programs are going to get much future processing improvement. Ever. No matter who you're buying CPUs from.
Re:Not Sure I'm Getting It (Score:5, Insightful)
As a software engineer, I wonder the same thing.
Put simply, the majority of code simply doesn't parallelize well. You can break out a few major portions of it to run as their own threads, but for the most part, programs either sit around and wait for the user, or sit around and wait for hardware resources.
Within that, only those programs that wait for a particular hardware resource - CPU time - even have the potential to benefit from more cores... And while a lot of those might split well into a few threads, most will not scale (without a complete rewrite to choose entirely different algorithms - if they even exist to accomplish the intended purpose) to more than a handful of cores.
As a software engineer you should know that "most code doesn't parallelize" is very different from "most of the code's runtime can't parallelize", as code size and code runtime are substantially different things.
Look at most CPU intensive tasks today and you'll notice they all parallelize very well: archiving/extracting, encoding/decoding (video, audio), 2D and 3D GUI/graphics/animations rendering (not just for games anymore!), indexing and searching indexes, databases in general, and last but not least, image/video and voice recognition.
So, while your very high-level task is sequential, the *services* it calls or implicitly uses (like GUI rendering), and the smaller tasks it performs, actually would make a pretty good use of as many cores as you can throw at them.
This is good news for software engineers like you and me, as we can write mostly serial code and isolate slow tasks into isolated routines that we write once and reuse many times.
Re:Not Sure I'm Getting It (Score:3, Insightful)
Why "before"? I think 512MB RAM / 1 or 2 GHz + a decently speedy hard drive IS enough for most consumers: playing (moderately recent) games (maybe upgrading to a newer $50 video card), playing (moderate) HD, MP3s, browsing sites, any office work using lots of Ajax on FF3.
You know what? You could even (gasp) code on it (maybe not compile Eclipse every 5 minutes, OK), run a small server on it, or transcode videos (maybe 4x more slowly, so you'll end up letting it run overnight instead of 2 hours, from time to time - big deal).
Of course, SOME people might need more. For most of us, 512MB / 1x2GHz is perfectly enough (see the eeePC).
Missing the point (Score:3, Insightful)
CPU clock speeds ran into the brick wall a few years ago. Here is a chart showing CPU clocks from 1993 to 2005. [tomshardware.com]
There have been no major performance improvements from that direction for the last few years, and probably won't be any more without a major breakthrough in semiconductors.
Moore's law is about transistor counts, and shows no real signs of stopping. Every 18 to 24 months, we double the number of transistors on a given wafer/die. The transition to 64-bit CPUs used a generation or two of those extra transistors, but we aren't likely to move to 128 bits soon. We are already pretty deep into the diminishing-returns curve for on-die cache.
What is left to consume those transistors?
More cores. Lots more cores. If you replace your CPU every 2 years, you can pretty much bet that each one you buy for the next decade or so will have twice as many cores as the one it is replacing.
And if developers and compilers get good at managing parallel code (and they have no choice in this), you can expect core counts to go up even faster than doubling every couple of years.
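The doubling claim is easy to make concrete (assuming, purely for illustration, a 4-core starting point in 2008):

```python
# If core counts double every two-year upgrade cycle, a decade looks like:
cores = 4  # hypothetical 2008 starting point
history = []
for year in range(2008, 2019, 2):
    history.append((year, cores))
    cores *= 2
```

Five upgrade cycles later you're at 128 cores per socket - which is exactly why Intel wants the software side thinking about large core counts now, not when the chips ship.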