Programming Supercomputing Hardware IT Technology

The Father of Multi-Core Chips Talks Shop

Posted by timothy
from the green-onion-layered-beneath-yogurt dept.
pacopico writes "Stanford professor Kunle Olukotun designed the first mainstream multi-core chip, crafting what would become Sun Microsystems's Niagara product. Now he's heading up Stanford's Pervasive Parallelism Lab, where researchers are looking at systems with hundreds of cores that might power robots, 3-D virtual worlds and insanely big server applications. The Register just interviewed Olukotun about this work and the future of multi-core chips. Weird and interesting stuff."

  • by Anonymous Coward on Saturday July 19, 2008 @05:17PM (#24256865)

    That strikes me as crackpottery. The stuff that link describes as "nonalgorithmic" is also easily algorithmic, just in a process calculus.
    And guess what? Non-kooks in the compsci community are busily working on process calculi and languages or language-facilities built around them.

  • by Skapare (16644) on Saturday July 19, 2008 @05:18PM (#24256875) Homepage

    Multi-core chips will be constrained by, among other things, the memory bandwidth going off-chip. Maybe they need larger caches. Maybe they just need to put all the RAM on the chip itself instead of so many other cores. How about 4GB of RAM at first-level cache speed?

    Ultimately, we'll end up with PCs made from SoCs, with direct SATA, USB, Firewire, and DVI interfaces coming out instead of a RAM access bus. By the time they are ready to make 256-core CPUs, software still won't be ready to work well on them. So in the interim, they might as well just do tighter integration (which can also run faster). No more north bridge or south bridge. Just a few capacitors, resistors, and maybe a buffer amp or two, around the big CPU.

    About the only thing that won't be practical to put in the CPU for a long time is the power supply. They could even put the disk drive in there (flash SSD).

  • by ddrichardson (869910) on Saturday July 19, 2008 @05:32PM (#24256925) Homepage

    That sounds ideal and in the long term is probably what will happen. But you need to overcome two massive issues first: leakage and interference between that many components in one space, and of course heat dissipation.

  • by BrentH (1154987) on Saturday July 19, 2008 @05:39PM (#24256963)
    How do video cards handle feeding data to 800 (latest AMD chip) separate processors? The memory controller is on-chip of course, and it has a bandwidth of about 50-60GB/s I believe. So, for normal multicore CPUs, try bumping up that DDR2 RAM from a measly ~10GB/s (when used in dual channel) up to the same level (AMD again already has the memory controller on-chip, Intel is going there I believe). DDR(2) being 64 bits wide (why?) doesn't help either, I'd say.
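
    A back-of-the-envelope sketch of where a ~10GB/s figure comes from, assuming theoretical peak bandwidth is simply transfers per second times bus width times channel count. The DDR2-667 speed grade and the function name are my own illustrative choices, not anything stated in the comment.

```python
def peak_bandwidth_gb_s(transfers_per_sec: float, bus_width_bits: int, channels: int = 1) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_sec * bytes_per_transfer * channels / 1e9


# Assumed example: DDR2-667 (667 million transfers/s) on a 64-bit bus, dual channel.
dual_channel_ddr2 = peak_bandwidth_gb_s(667e6, 64, channels=2)
print(f"dual-channel DDR2-667 peak: {dual_channel_ddr2:.1f} GB/s")

# Widening the bus scales the peak linearly, which is one lever GPUs pull.
print(f"same clock, 256-bit bus: {peak_bandwidth_gb_s(667e6, 256):.1f} GB/s")
```

    Real sustained bandwidth is lower than this peak, so treat the numbers as upper bounds only.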
  • by Anonymous Coward on Saturday July 19, 2008 @05:39PM (#24256965)

    Hi MOBE2001. Trying Twitter's tricks as well now?

  • by lenski (96498) on Saturday July 19, 2008 @05:59PM (#24257143)

    Silly. I cannot believe Donald Knuth would be that dense; there must be more to the conversation.

    Every major system in existence today is already a "multiprocessor" system, we just don't think of them that way. The average PC is a parallel system running at least 14 CPUs in parallel. (two or three for every spindle, one or two for keyboard, a few for your broadband modem, a few in your firewall, etc etc etc).

    Multicore systems are simply an extension of the existing computational model. Plus, every supercomputer built in the last 20 years has been massively parallel.

    Out of ideas? I don't think so.

  • by Anonymous Coward on Saturday July 19, 2008 @06:13PM (#24257239)

    yep. and that attitude is why, in the 21st century, we still run the horribly inefficient internal combustion engine, because fucktards like you believe all the alternatives are "crackpottery"

    Fuck you and all your ignorant "yes-man" kind

  • by lenski (96498) on Saturday July 19, 2008 @06:17PM (#24257265)

    I can see your point... I can imagine a thing that looks a whole lot like an FPGA whose cells are designed to accept new functional definitions extremely dynamically.

    (As you can tell, I don't agree with using the name "non-algorithmic": It's algorithmic by any reasonable theoretical definition. This is why I refer to it as being an extremely fine-grained data flow model.)

    However, if you look at modern FPGAs, you will discover that even there, the macrocells are fairly large objects.

    I guess that when it comes down to it, the "non-algorithmic" model proposed in the page you cite seems so fine-grained that benefits would be overwhelmed by connectivity issues. By this I mean not simply bandwidth among functional components, but in defining "who talks with whom under what dynamically changing circumstances". Any attempt to discuss fine-grained data flow must face the issue of efficiency in connecting the interacting data and control "elements".

    There's the possibly even more interesting question about how many of each sort of functional module should be built.

    What do you say to meeting in the middle, and thinking about a system that isn't so fine-grained, while also thinking of "control functions" as being just as movable as the data elements? Here's why I ask: In my opinion, there might well be some very good research work to be done in applying techniques related to functional programming to a system with an extremely large number of simple functional units that know how to move functionality around with the data.

  • Horse Pucky..... (Score:5, Insightful)

    by FlyingGuy (989135) on Saturday July 19, 2008 @07:38PM (#24257833)

    We already have servers for INSANELY HUGE internet apps; it's called a mainframe.

    It amazes me to no end how many people still think it's about the CPU. It's about throughput, OK? Can we just get that fucking settled already? I don't give a rat's ass how many damn cores you have running or if they are running at 100 gigahertz; if you are still reading data across a bus, over an ethernet connection, ANYTHING that does not work at CPU speed, then it makes little difference: that damn CPU will be sitting there spinning, waiting for the data to come popping through so it can do something!

    Mainframes use 386 chips for I/O controllers and even those sit there and loaf, talk about a waste of electricity! About .01% of the world's computers need the kind of power that a CPU with more than, say, 4 cores provides. Those that do are rather busy doing insanely complex mathematics, but even then I doubt that the CPU(s), even when running at "100%" utilization, are actually doing the work that they were programmed to do; rather, they are waiting for I/O to a database or RAM and fetching data.

    Until someone figures out how to move data in a far, far more efficient manner than we currently understand, these mega-core CPUs, while nice to think about, are simply a waste of time and silicon, with the possible exception of research.

  • by Louis Savain (65843) on Sunday July 20, 2008 @12:22AM (#24259651) Homepage

    What the heck? How is the parent flamebait? Did some mod mis-click?

    Nope. It's not a mis-click. It is called censorship on Slashdot. I made myself a lot of enemies apparently. LOL. On Slashdot, you are not allowed to criticise Turing or Darwinism or atheism. Like peer review in science, Slashdot's moderation system serves as a mechanism to suppress dissent, that's all. Too bad Slashdot is not the only forum for expression on the net. But it does not matter in the end, does it?

  • by Anonymous Coward on Sunday July 20, 2008 @01:21AM (#24259875)

    I made myself a lot of enemies apparently.

    I don't think it's so much enemies you make; it's the attitude you take towards the community you are trying to influence. There are many very intelligent computer scientists, and you seem to suggest that most are idiots. You will not be seen as insightful if you cannot recognize the great accomplishments already made.

    Personally, I disagree with your positions on physics, and (especially) mathematics. Statements like "Continuity ... leads to an infinite regress" betray your lack of understanding of these mature fields. Who is going to trust your analysis when you make these statements without any real argument? Down-modding these statements is not censorship, it's moderation: we do not need any more of this crap on /.

    With this kind of broader view of your posts, it's tempting to just throw away all of your comments as "crackpot posts." Which is, by the way, what happened with your previous post: someone just thought, "Oh it's that crackpot Louis Savain again; time for a downmod." This is bad, because that post in particular was actually insightful.

    The thing is, we do need new programming languages; we do need implicit concurrency; we do need simplicity. Unfortunately, we don't need your arrogance or extremism. You may have something to offer, but it isn't your hate.

    Your COSA project doesn't get traction because it requires the world to change; for better or worse, you must take the world as it is and nudge it where you think it should go. There are many people smarter than you who should and do have more sway in the matter: you should be seeking to convince them. Why not code a working version of COSA which can run on a single-core computer, but can exploit arbitrarily many additional cores? People would be less unimpressed by you if you produced a functional product.

  • by Anonymous Coward on Sunday July 20, 2008 @01:49AM (#24259965)

    You misperceive the role of a Stanford computer science professor. They're not there to educate you; they're there to create their startups in a risk-free environment with cheap talent. Teaching is just the price of admission.

  • by TheRaven64 (641858) on Sunday July 20, 2008 @04:03AM (#24260479) Journal

    Niagara has enough memory bandwidth to keep its execution units happy. The last chip I remember that didn't was the G4 (PowerPC). The problem is more one of latency. This isn't such a problem in a GPU, since they are basically parallel stream processors - you just throw a lot of data at them and they process it more-or-less in that order.

    There was a study conducted ages ago (70s or 80s) which determined that, on average, there is one branch instruction in every seven instructions in general purpose code. This means that you can pretty much tell where memory accesses are going to be for 7 instructions, you've got a 50% chance for 14 (assuming it's a conditional jump, not a computed jump), a 25% chance for 21 instructions and so on. The time taken to fetch something from memory if you guessed wrongly is around 200 cycles.
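
    The halving in that paragraph can be written down directly. This is only a sketch of the arithmetic as stated, assuming one two-way conditional branch every seven instructions and a coin-flip guess at each one; the function name and parameters are made up for illustration.

```python
def lookahead_confidence(n_instructions: int, branch_interval: int = 7,
                         guess_probability: float = 0.5) -> float:
    """Chance that the next n instructions follow the path we assumed.

    Assumption: exactly one conditional branch per `branch_interval`
    instructions, each guessed correctly with `guess_probability`.
    """
    if n_instructions < 1:
        raise ValueError("n_instructions must be at least 1")
    # The first block of instructions is branch-free; every further block
    # lies behind one more unresolved branch.
    branches_crossed = (n_instructions - 1) // branch_interval
    return guess_probability ** branches_crossed


for n in (7, 14, 21, 28):
    print(f"{n:2d} instructions ahead: {lookahead_confidence(n):.3f}")
```

    Real branch predictors do far better than a coin flip, so this is the pessimistic end of the range.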

    This is a big reason why the T1/2 have lots of contexts (threads). If you need to wait for memory with one of them, then there are 3 or 7 (T1 or T2) waiting that can still use the execution units.
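
    A toy model of why extra contexts help, under a loudly stated assumption: every thread computes for a fixed number of cycles, then stalls on memory, and switching to another ready thread costs nothing. That is a textbook simplification, not a description of the actual T1/T2 pipeline, and the 50-cycle run length is invented for the example.

```python
def core_utilization(threads: int, run_cycles: int, miss_cycles: int) -> float:
    """Fraction of cycles the execution units are busy in the simplified model.

    Each thread alternates run_cycles of work with miss_cycles of waiting on
    memory; other threads fill the gap until the core is saturated.
    """
    if threads < 1:
        raise ValueError("need at least one thread")
    return min(1.0, threads * run_cycles / (run_cycles + miss_cycles))


# run_cycles=50 is a hypothetical workload; 200 is the miss cost quoted above.
for threads in (1, 4, 8):
    busy = core_utilization(threads, run_cycles=50, miss_cycles=200)
    print(f"{threads} thread(s): {busy:.0%} busy")
```

    With these made-up numbers a single context keeps the core busy a fifth of the time, four contexts get it to about 80%, and eight saturate it.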

    Most CPUs use quite a lot of cache memory. This does two things. First, it keeps data that's been accessed recently around for a while. Second, you access memory via the cache, so when you want one word it will load an entire cache line and data near it will also be fast to access. This is good for programs which have a lot of locality of reference (most of them).
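
    To make the cache-line point concrete, here is a minimal sketch of a direct-mapped cache that counts hits for two access patterns. The 64-byte line, 512-line capacity and 8-byte word are assumed values picked for the example, not figures for any particular CPU.

```python
def count_hits(addresses, line_bytes: int = 64, num_lines: int = 512) -> int:
    """Simulate a direct-mapped cache and return how many accesses hit."""
    slots = [None] * num_lines           # which memory block each slot holds
    hits = 0
    for addr in addresses:
        block = addr // line_bytes       # the cache line this byte belongs to
        slot = block % num_lines         # direct-mapped: one possible slot
        if slots[slot] == block:
            hits += 1
        else:
            slots[slot] = block          # miss: fetch the whole line
    return hits


WORD = 8  # assumed word size in bytes
sequential = [i * WORD for i in range(10_000)]   # neighbours share a line
strided = [i * 4096 for i in range(10_000)]      # every access is a new line

print("sequential hits:", count_hits(sequential))
print("strided hits:   ", count_hits(strided))
```

    The sequential walk misses once per line and then hits on the other seven words in it, while the strided walk never touches the same line twice.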

  • by Cheesey (70139) on Sunday July 20, 2008 @04:04AM (#24260483)

    Right, so you split your computation up into small units that can be efficiently allocated to the many core array. This allows you to express the parallelism in the program properly, because you're not constrained by the coarse granularity of a thread model. Cool.

    But the problem here is how you write the code itself. Purely functional code maps really well onto this model, but nobody wants to retrain all their programmers to use Haskell. We're going to end up with a hybrid C-based language: but what restrictions should exist in it? This depends on what is easy to implement in hardware - because if we wanted to stick with what was easy to implement in software, we'd carry on trying to squeeze a few extra instructions per second out of a conventional CPU architecture.

    The biggest restriction turns out to be the "R" in RAM. Most of our programs use memory in an unpredictable way, pulling data from all over the memory space, and this doesn't map well to a many core architecture. You can put caches at every core, but the cache miss penalty is astronomical, not to mention the problems of keeping all the caches coherent. Random access won't scale; we will need something else, and it will break lots of programs.
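
    One way to put numbers on the miss penalty is the usual average memory access time formula: hit time plus miss rate times miss penalty. The sketch below assumes a 2-cycle hit and a 200-cycle miss, both illustrative values rather than measurements of any many-core design.

```python
def average_access_cycles(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average memory access time: hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_cycles + miss_rate * miss_penalty_cycles


# Assumed figures: 2-cycle hit, 200-cycle miss.
for label, miss_rate in (("predictable access", 0.02), ("scattered access", 0.50)):
    cycles = average_access_cycles(2, miss_rate, 200)
    print(f"{label}: {cycles:.0f} cycles per access on average")
```

    Even under these generous assumptions, moving from a 2% to a 50% miss rate takes the average access from about 6 cycles to about 102, which is the scaling problem in miniature.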

    This is going to lead to some really shitty science, because:

    • Many core architectures will only be good for running certain types of program: not just programs that can be split into tiny units of computation, but programs that access RAM in a predictable way.
    • The many core architects will pick the programs that work best on their system; these may or may not have anything to do with real applications for many core systems (And what is an application for a many core system anyway? Don't say graphics...)
    • It will be hard to quantitatively compare one many core architecture with another because of the different assumptions about what programs are able to do in each case. There are too many variables; there is no "control variable".

    I think that the eventual winning architecture will be the one that is easiest to write programs for. But it will have to be so much better at running those programs that it is worth the effort of porting them. So it will have to be a huge improvement, or easy to port to, or some combination of the two. However, those are qualitative properties. Anyone could argue that their architecture is better than another - and they will.

  • by Anonymous Coward on Sunday July 20, 2008 @04:31AM (#24260623)

    Precisely. This is why labs such as RAMP and PARLAB (both from Berkeley - take that, Stanford) have designed not just multicore systems, but 'manycore' systems possessing in excess of 1000 CPUs (the chips are actually FPGAs if I'm not mistaken). The chips run pretty slowly -- some of them around 100MHz -- but the operation of virtually any part of the chip can be observed and tweaked at a very low level. The idea is not to design a faster-clocked or more parallel CPU so much as it is to discover the best architecture for parallel multiprocessing; i.e. the architecture with the best throughput.