Next Generation Chip Research

Nyxs writes to tell us Sci-Tech-Today is reporting that researchers at the University of Texas are taking a new approach to designing microprocessor architecture. Doug Berger, a computer science professor at the University of Texas, and his colleagues hope to solve many of the pressing problems facing chip designers today with the new "microprocessor and instruction set architecture called Trips, or the Teraop Reliable Intelligently Adaptive Processing System."
  • by ed__ ( 23481 ) on Tuesday October 04, 2005 @01:03AM (#13710348) Journal
    apparently, one of the pressing problems that chip designers are facing is coming up with stupid, meaningless acronyms.
    • Re:pressing problems (Score:4, Interesting)

      by shanen ( 462549 ) on Tuesday October 04, 2005 @01:14AM (#13710381) Homepage Journal
      Small world, eh? A comment about the acronym, and my first reaction to the article was to remember TRAC, the Texas Reconfigurable Array Computer, which was something they were working on at the same school many years ago. Well, at least they didn't need "Texas" for the acronym this time, but I doubt anyone else remembers TRAC now.

      Disclaimer: In spite of having a degree from the school, I have a very low opinion of it. Yeah, it's large enough physically, and they had some oil money, but IMO they optimized towards narrow-minded mind-narrowing efficiency rather than breadth. Real education is about the breadth. Unfortunately, these days I feel as though my real alma mater seems to be following a similar path to mediocrity.

    • Yes, it's called SMAD - Stupid Meaningless Acronym Deficiency.
  • by Anakron ( 899671 ) on Tuesday October 04, 2005 @01:03AM (#13710350)
    It doesn't actually look any different. 128 instructions per "block" executed in parallel, just like a superscalar processor. This has been around since the time of the Pentiums (the Pentiums weren't VLIW, though). What exactly is new?
    • by nzkbuk ( 773506 ) on Tuesday October 04, 2005 @01:41AM (#13710450)
      The thing that's new (if you read the article) is that the instructions AREN'T executed in a specifically parallel fashion.
      They are executed in a JIT (just in time) fashion.
      Currently, with deep pipelines, results can get stored in registers for a few cycles. This aims to execute instructions as soon as it can, so it needs a lot fewer registers to store results.

      It also means instructions are executed out of order AND in parallel, in an effort to both increase speed and decrease chip complexity.
      If you don't have to use a transistor for storage / control, you can use it for the good bits: generating your answer.
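
      A minimal C sketch of what that comment describes (this is my own toy model, not the actual TRIPS microarchitecture or ISA): each instruction fires as soon as its operands have arrived, and its result is routed straight to the consuming instructions rather than parked in a central register file.

      #include <stdio.h>
      #include <stdbool.h>

      /* Hypothetical two-input instruction: it fires once both operands have
       * arrived, then forwards its result directly to the consumer slots named
       * in dest/dest_slot -- no central register file in between.             */
      struct insn {
          const char *name;
          int val[2];        /* operand buffers ("reservation slots")      */
          bool ready[2];     /* which operands have arrived                */
          int dest[2];       /* index of a consumer instruction, -1 = none */
          int dest_slot[2];  /* which operand slot of that consumer to fill*/
          bool done;
      };

      /* computes (a+b) * (a-b); the mul never reads a register, it just
       * waits for the add and the sub to deliver their results            */
      static struct insn prog[3] = {
          { "add", {0,0}, {false,false}, { 2,-1}, {0,0}, false },
          { "sub", {0,0}, {false,false}, { 2,-1}, {1,0}, false },
          { "mul", {0,0}, {false,false}, {-1,-1}, {0,0}, false },
      };

      static void deliver(int i, int slot, int v) {
          prog[i].val[slot] = v;
          prog[i].ready[slot] = true;
      }

      int main(void) {
          deliver(0, 0, 7); deliver(0, 1, 3);   /* a=7, b=3 injected into add */
          deliver(1, 0, 7); deliver(1, 1, 3);   /* ...and into sub            */

          bool progress = true;
          while (progress) {                    /* fire whatever is ready, repeat */
              progress = false;
              for (int i = 0; i < 3; i++) {
                  struct insn *in = &prog[i];
                  if (in->done || !in->ready[0] || !in->ready[1]) continue;
                  int r = (i == 0) ? in->val[0] + in->val[1]
                        : (i == 1) ? in->val[0] - in->val[1]
                                   : in->val[0] * in->val[1];
                  printf("%s fires -> %d\n", in->name, r);
                  for (int d = 0; d < 2; d++)        /* "direct target encoding":    */
                      if (in->dest[d] >= 0)          /* route the result straight to */
                          deliver(in->dest[d], in->dest_slot[d], r);  /* the consumer */
                  in->done = true;
                  progress = true;
              }
          }
          return 0;
      }

      With inputs 7 and 3 it prints add -> 10, sub -> 4, mul -> 40; nothing was ever written back to a shared register file.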
      • I wonder if this would handle concurrent programming and constraint-based inference systems like Mozart [mozart-oz.org] far better than existing chip architectures.
      • My question is how do they handle out-of-order operations? They mentioned in the article data flow and immediate execution after receiving inputs, which is great, but that could lead to OOO output, couldn't it?

        -Rick
      • Sounds an awful lot like Transmeta???

        I don't see a whole lot of difference - they are using JIT techniques to get around VLIW's recompile-for-new-hardware problem. Beyond that it's just VLIW warmed over.

        Maybe they have some new ideas within the VLIW compiler space?

        In any case, I don't see it as revolutionary...more like evolutionary - and even then, just barely.
      • Out of order execution with multiple ALUs is nothing new either - both the Pentium and Athlon lines have it. The reason the Itanium was so dog slow was that the original didn't have OOO execution - it expected the compiler to do it for it. So I still see nothing new here, other than scaling up of existing tech.
    • by Takahashi ( 409381 ) on Tuesday October 04, 2005 @03:41AM (#13710744)
      It really is different. It's not simply a superscalar. It's a dataflow machine. What this means is that instructions are arranged in a graph based on dependency and execute as soon as all inputs are ready.

      I work in a lab at the University of Washington where we are working on _implementing_ a different dataflow machine that shares some of its fundamentals with the UT machine.

      • I work in a lab at the University of Washington where we are working on _implementing_ a different dataflow machine that shares some of its fundamentals with the UT machine.


        So, which one is better? ;-)

        For a more serious question, I read the TRIPS overview paper on their site and it all seems to make a lot of sense. So why aren't dataflow machines mainstream? The first papers were published in the early 1980s, not much later than RISC started to make some noise.
        • I'm not an architecture expert, but it would seem obvious that all the requisite mechanisms to ensure that computations only occur when the inputs are available are a lot more complex in a dataflow machine. Having a central clock to ensure your timing steps are discrete makes ensuring this much easier (trivial with a non-pipelined chip, non-trivial but much easier with a pipelined CPU). It was probably simpler to just crank up the CPU clock, add extra execution units, and add cache.

          As chip fabrication pr

        • So why aren't dataflow machines mainstream?

          The reason is that dataflow is really a non-algorithmic, signal-based approach to computing whereas most programming languages and applications are strictly algorithmic. We need to change our way of programming in a radical way before the non-algorithmic model can take off. It's not easy to translate algorithmic code into a dataflow application.

          In my opinion, even though TRIPS has 'reliability' in its acronym, unless the execution of parallel code in a given object
      • If each instruction executes when its inputs are available, rather than in any specified order, and passes outputs to the next instruction, rather than to a specific register, it seems like such a system would be best for functional programming. Is there any truth to that?
    • multiple independent instructions passed to the processor at once, yeah, that sounds a lot like Itanium to me. Regardless of the "data flow magic no-register jump-and-shout" it should still face the same problems Itaniums see: finding enough parallelism in the code to put in a common block. Pushing the complexity out to the compiler is only going to benefit you at all if the hardware was doing a poor job of extracting parallelism, but taking a broader, more dynamic look will reveal parallelism. Good in som
  • Branching (Score:3, Interesting)

    by shmlco ( 594907 ) on Tuesday October 04, 2005 @01:05AM (#13710357) Homepage
    The article states that this works by sending blocks of up to 128 instructions at a time to the processor, where "The processor "sees" and executes a block all at once, as if it were a single instruction..." Makes you wonder if they'd ever get close to that target, as IIRC, one instruction in seven on average is a conditional branch.
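
    A quick back-of-the-envelope sketch of that worry (the predictor accuracies below are made-up round numbers, not TRIPS figures): if one instruction in seven is a branch, a 128-instruction block holds roughly 18 of them, and the whole block only sits on the correct path when every one was predicted right.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double branches_per_block = 128.0 / 7.0;   /* ~18 branches per block  */
        double acc[] = { 0.90, 0.95, 0.99 };       /* hypothetical accuracies */
        for (int i = 0; i < 3; i++)
            printf("per-branch accuracy %.2f -> whole block correct %.0f%% of the time\n",
                   acc[i], 100.0 * pow(acc[i], branches_per_block));
        /* prints roughly 15%, 39% and 83% -- which is why speculation quality matters */
        return 0;
    }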
    • Re:Branching (Score:4, Informative)

      by Anakron ( 899671 ) on Tuesday October 04, 2005 @01:11AM (#13710377)
      Branches can be predicted with fairly high accuracy. And most new architectures have some form of speculation in the core. And they actually execute 16 instructions at once. Only their word is 128 instructions long.
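
      For the curious, a minimal sketch of the sort of predictor being alluded to, a classic 2-bit saturating-counter table (the table size and the loop-branch workload here are arbitrary illustrations, nothing TRIPS-specific):

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      /* One 2-bit counter per entry: 0-1 predict not-taken, 2-3 predict taken. */
      #define TABLE_SIZE 1024
      static uint8_t counter[TABLE_SIZE];      /* all start at 0, "strongly not taken" */

      static bool predict(uint32_t pc) { return counter[pc % TABLE_SIZE] >= 2; }

      static void train(uint32_t pc, bool taken) {
          uint8_t *c = &counter[pc % TABLE_SIZE];
          if (taken  && *c < 3) (*c)++;        /* saturate at "strongly taken"     */
          if (!taken && *c > 0) (*c)--;        /* saturate at "strongly not taken" */
      }

      int main(void) {
          uint32_t pc = 0x400123;              /* address of one loop-closing branch */
          int correct = 0, total = 0;
          for (int rep = 0; rep < 100; rep++)  /* branch taken 9 times, then falls   */
              for (int i = 0; i < 10; i++) {   /* through once, over and over        */
                  bool taken = (i != 9);
                  correct += (predict(pc) == taken);
                  total++;
                  train(pc, taken);
              }
          printf("predicted %d of %d (%.0f%%)\n",   /* settles around 90% here */
                 correct, total, 100.0 * correct / total);
          return 0;
      }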
  • Loops as functions? (Score:4, Interesting)

    by ReformedExCon ( 897248 ) <reformed.excon@gmail.com> on Tuesday October 04, 2005 @01:12AM (#13710379)
    We can understand easily how a loop could be calculated as a function, if the contents of the loop block are composed solely of calculations. When this occurs, the output of the loop is simply a function of its input (f(x), if you will). However, computer scientists who think that programs can always be reduced to a simple function with given inputs have their heads too far in their books to see how the real world forces programs to be far removed from that ivory tower gobbledygook.

    In the real world, you aren't typically performing calculations in loops. Rather, you are usually reading and writing to memory, which may or may not be cached. So it isn't just a matter of saying f(x), it is much more complicated and possibly dependent on memory which you have no way to determine until the loop iteration reaches that point. And then you'll still get the bottlenecks which plague us today. Memory isn't fast enough, devices aren't fast enough, too much time is spent waiting for I/O to complete.

    Pushing as much brute-force computation off onto compilers is fine. Let them unroll loops and optimize functions. But what are the limits to this? Can we really optimize our way to 1-step loops? I don't think so, but the DOD seems to think it is possible.
    • Can we really optimize our way to 1-step loops?

      Of course we can. Just have a look at your favourite functional programming language; it probably doesn't even have a loop construct. The question is whether this can be done efficiently. Of course, it also requires programmers to think in a different way, which they tend to be reluctant to do.
      • by vought ( 160908 ) on Tuesday October 04, 2005 @04:01AM (#13710801)
        Is the guy who runs this machine named Captain Trips?

      • Just because the programming language hides the loop doesn't mean it isn't there. The processor itself is still executing a loop, whether you use a loop, recursion, or some sort of assignment concept.
        • Only because the processor instruction set is designed in such a way that loops are necessary. I'm not at all sure about this, but I wouldn't be surprised if Lisp Machines, for example, didn't have hardware support for loops.

          In order to see that loops are completely unnecessary, you only need to see that the lambda calculus is Turing complete.
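
          A tiny sketch of the point being argued (plain C, nothing to do with any particular chip): the same repetition written with a loop keyword and without one; a compiler that does tail-call elimination will typically turn the second into the first anyway.

          #include <stdio.h>

          /* explicit loop */
          static unsigned long fact_loop(unsigned n) {
              unsigned long r = 1;
              for (unsigned i = 2; i <= n; i++)
                  r *= i;
              return r;
          }

          /* no loop keyword: tail recursion carrying an accumulator */
          static unsigned long fact_rec(unsigned n, unsigned long acc) {
              return n <= 1 ? acc : fact_rec(n - 1, acc * n);
          }

          int main(void) {
              printf("%lu %lu\n", fact_loop(10), fact_rec(10, 1));  /* both print 3628800 */
              return 0;
          }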
          • To see that they are completely necessary, you just need to understand processors at a gate level.

            You can only write so much data at once- the length of the smallest bus. To write more than that, you need to issue repeated write instructions. This means a loop of some sort, deciding how many write instructions to output. You can put all the fancy math you want on top of it- the hardware is implementing a loop.
            • To write more than that, you need to issue repeated write instructions. This means a loop of some sort

              No, it can be done as a recursive function. There will be no difference in terms of electric currents running across silicon because the logic involved is exactly the same. But if the processor presents a functional (ie. function-based) interface and uses a functional design, it will use recursion rather than looping to issue repeated instructions.

              the hardware is implementing a loop

              The hardware is i
              • A recursive function is a loop, implemented in an extremely inefficient manner. You add/remove something to/from the stack each iteration of the loop instead of incrementing/decrementing a counter or twiddling a conditional variable. It's still a loop. You're still repeating the same code over and over. That's the definition of a loop. You don't need a special keyword to implement it; using gotos to loop is still a loop (and a function call is just a goto with a stack push).

                I would argue that the level at whi

                • Oops, I think we've been arguing semantics the whole time. I would define a loop to be a logical construction like a "while" or a "for" in C. If it were written as a recursive function, I would not consider it to be a loop even though it executes the same instructions multiple times.

                  There is a small difference, though, between loops as we know them in imperative languages and recursive functions as we know them in functional languages: functions have no side effects, which opens up the possibility of optim
                    • We are in some ways. I view everything as the hardware sees it - if you're repeating the same instructions, you're looping. Probably why I'm a C coder at heart - I want to know what's really going on; anything that abstracts me from that is a hindrance. In the context of this question it's what matters - we are talking about hardware optimizations in this article.

                      You can unroll all you want, it still won't optimize the loop away. There's still a physical limit to the amount of data transferable per operatio
  • Boring (Score:3, Interesting)

    by Rufus211 ( 221883 ) <`gro.hsikcah' `ta' `todhsals-sufur'> on Tuesday October 04, 2005 @01:18AM (#13710392) Homepage
    So glancing over the article it doesn't look like they're actually doing anything "new." Basically they're expanding on register renaming, speculative execution, and the like, which make the CPU's job slightly easier. Also their bit about data flow and "direct target encoding" sounds oddly like this patent [freepatentsonline.com] by Cray from 1976 (!).

    Overall they might make some things marginally more efficient, but they aren't solving any fundamental problems. They're simply moving some of them around slightly.

    • by Rufus211 ( 221883 ) <`gro.hsikcah' `ta' `todhsals-sufur'> on Tuesday October 04, 2005 @02:10AM (#13710539) Homepage
      So after looking into their project page [utexas.edu] I realized I actually saw a presentation given by these people last year. The article makes this sound like something it completely is not. Basically it's a grid of functional units that can connect to their neighbors. You "program" the chip by telling nodes 1 and 2 to take inputs and invert them, then feed the output to node 3, which then multiplies the two inputs. Really it's a glorified DSP that has some interesting programmability. Their code analysis to generate the DSP code and then schedule it across a 3D matrix (2D function array x time) will certainly be interesting.

      What this is *not* in any form is a general-purpose CPU. It won't boot Linux, plain and simple. This is for doing stream data processing such as compression or HPC simulations. I seem to remember their presentation showing a prototype doing software radio at a data rate usable for 802.11.


      • Really it's a glorified DSP that has some interesting programmability

        Actually, it sounds more like an FPGA [wikipedia.org]. And, since VHDL [wikipedia.org] is Turing-equivalent, it would actually be possible to compile C code (such as the Linux kernel) into a gate array and run it on such a chip.
        • it would actually be possible to compile C code (such as the Linux kernel) into a gate array and run it on such a chip.

          That's a lot like the fact that Saturn would float if you dropped it in the ocean. Both are technically true (Saturn is mostly hydrogen and helium and really is less dense than water), but Linux will no more fit in any existing FPGA than Saturn will fit in any existing ocean. Chuckle.

          -
      • What this is *not* in any form is a general purpose CPU.

        The article doesn't seem to agree:

        One of the big challenges to becoming a mainstream commercial processor is compatibility with existing software and systems, especially x86 compatibility, Moore says. But one way to maintain compatibility would be to use Trips as a co-processor, he says. "The general-purpose [x86] processor could offload heavy tasks onto the co-processor while still handling legacy compatibility on its own."

        So, it l

    • So glancing over the article it doesn't look like they're actually doing anything "new." Basically they're expanding on register renaming, speculative execution, and the like, which make the CPU's job slightly easier. Also their bit about data flow and "direct target encoding" sounds oddly like this patent by Cray from 1976 (!).

      But they thought up a neat acronym for it, TRIPS! Seriously though, that's how research works ... Cynically we could say they are completely full of it. They also could have som

  • by Anonymous Coward on Tuesday October 04, 2005 @01:19AM (#13710396)
    I seem to remember that Intel designed Merced (now the Itanium, known colloquially as the Itanic to reflect how well it's gone in the marketplace) to shift the burden of branch prediction and parallelism to the compiler. Or, in other words, the compiler was expected to mark instructions that were capable of running in parallel, and also to state which branches were likely to be taken.

    All a great idea in theory; after all, the compiler should be able to figure out a fair amount of this information just by looking at the flow of data through the instructions (although it may not be so good at branch prediction; I'm not sufficiently strong on compiler theory and branch prediction to talk about that.) However, as can be seen by Itanium's (lack of) market success, the compiler technology just isn't there (or maybe we're using the wrong languages; there are, after all, languages that are designed to be inherently parallel.)

    If this team can get it working the way they want to, maybe -- just maybe -- Itanium will find its niche after all. But let's not kid ourselves; this is a hard problem, and it's more likely that they'll make incremental improvements to the knowledge that's out there, rather than a major breakthrough.
  • FTA ... Finally, data flow execution is enabled by "direct target encoding," by which the results from one instruction go directly to the next consuming instruction without being temporarily stored in a centralized register file.

    This sounds really cool.

    • Yes it is. It's also in every good architecture textbook. Not new by any means. Maybe these guys actually DID do something new. However, the article is skimpy enough on details to be nearly worthless.
      • by Anonymous Coward
        Having a routed network on which data can travel between functional units, rather than merely being copies of data assigned to a register file, is not very mainstream (unlike, say, register bypass). It's not really a new idea; getting it to work well would be, though.
      • Yeah. It sounds strangely similar to Tomasulo's algorithm. I can't believe that this is the main point of the new approach.

        How about linking to the frickin' homepage of the project [utexas.edu]
    • Temporary values can be determined by the chip dynamically, or encoded into existing instruction sets.
  • I don't get a word he says, and I know a little bit about programming. Can someone dumb this down?

    From what I know, a loop is a loop and you need to satisfy a condition and do some processing. Won't it be a problem if I don't have the data resulting from the last loop before I do the next one?

    • Re:I don't get it... (Score:4, Interesting)

      by ReformedExCon ( 897248 ) <reformed.excon@gmail.com> on Tuesday October 04, 2005 @01:32AM (#13710430)
      I alluded to this in my earlier post. Some mathematical operations are simply loops over a seed input. A summation is one example. You can reduce the calculation of a summation from a long series (infinite, perhaps) of functions executed in a loop to a single function which is valid for all inputs (voila, Calculus).

      So they say they can take loops in blocks of up to 128 instructions at a time and calculate the result in less than 128 loop steps. They are requiring the compiler to come up with a valid function for those 128 steps that will work for any initial parameters. If it works, it means that you are no longer executing 128 times, but only once. That is a speed-up of just over 2 orders of magnitude. Really, really amazing.

      But does it work? Can they really ask the compiler to do that much work? Is the compiler capable of being that smart? The main thing I wonder is how well this works, and how optimized it can get when the main purpose of looping is not to calculate functions but to access memory which is itself not fast.
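
      A deliberately trivial version of the summation example (my own illustration, not something from the article): the loop does n iterations, the closed form does the same job in one step; the open question above is whether a compiler can discover such a form for arbitrary 128-instruction blocks.

      #include <stdio.h>

      /* n iterations, one addition each */
      static long sum_loop(long n) {
          long s = 0;
          for (long i = 1; i <= n; i++)
              s += i;
          return s;
      }

      /* the closed form the compiler would have to discover: one step for any n */
      static long sum_closed(long n) {
          return n * (n + 1) / 2;
      }

      int main(void) {
          printf("%ld %ld\n", sum_loop(1000), sum_closed(1000));  /* both print 500500 */
          return 0;
      }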
      • Memory is the other thing they are trying to sort out.
        Or more specifically, registers: instead of storing the results from an instruction in a loop while a different instruction executes, then having to access the registers to get the stored data, they execute the instructions as soon as the inputs are ready, so reducing the register (internal memory) count.

        When they do that, there is a whole chunk of transistors which can now be removed from the design, or used for computation instead of storage
      • Someone else above suggested that the design is not a generic CPU, but rather a vector-like CPU, and thus more oriented towards DSP and HPC.

        In those cases, a lot of the functionality is well conforming to the features you speak of.

        Now, true, the average home PC probably doesn't do anything near close to what loop optimization they're talking about.

        That's the reason why most home PCs right now don't usually need dual procs (they don't usually execute multi-threaded apps), or HPC-oriented procs (like the Itaniu
        • I just read their spec paper. It *is* a generic CPU and it will provide a decent boost to even a normal non-threaded, non-parallel home application. Not a huge multiplier, but a better boost than the current deep-pipelined fry-an-egg-on-your-CPU approach. It can dig out almost all of the potential concurrency that is hiding in even the most linear application.

          And if you *do* have a multithreaded system then one chip can run up to 8 threads at once. And if you *do* have some heavily parallel code, like
          • Ah, thanks for the accurate representation. It does sound nifty the way you put it. :)

            Maybe in the future, I'll try and RTFA instead of trust other +5 informative comments. All too often on stuff like this you read opposing information about the technology. *sigh* It'd be cool if moderators would read the specs on stuff like this before moderating informative, so they'd know if it were informative, or disinformative.
    • Re:I don't get it... (Score:3, Informative)

      by freidog ( 706941 )
      only if the subsequent loops are dependant on data from the current loop.

      something like
      for(int i = n-1; i>0; i--){ n = n * i; }

      obviously the new value of n depends on the value for n calculated by the last loop, so that might not be a good candidate to try and parallelize. (Actually factorial is something that can be written to take advantage of instruction-level parallelism (ILP); I chose not to, simply for the example - see the sketch below.)

      however, if you're doing something that is not dependant on previous loops, various
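
      Here is a small sketch of the ILP rewrite mentioned in the parenthesis above (ordinary C, nothing TRIPS-specific): splitting the running product into two independent accumulators removes the chain where every multiply waits on the previous one, so the two chains can proceed in parallel.

      #include <stdio.h>

      /* serial chain: each multiply depends on the one before it */
      static unsigned long long fact_serial(unsigned n) {
          unsigned long long p = 1;
          for (unsigned i = 2; i <= n; i++)
              p *= i;
          return p;
      }

      /* two independent chains: p0 and p1 never touch each other until the end */
      static unsigned long long fact_ilp(unsigned n) {
          unsigned long long p0 = 1, p1 = 1;
          unsigned i = 2;
          for (; i + 1 <= n; i += 2) {
              p0 *= i;          /* these two multiplies have no dependency */
              p1 *= i + 1;      /* on each other within an iteration       */
          }
          if (i <= n)
              p0 *= i;          /* odd leftover factor, if any */
          return p0 * p1;
      }

      int main(void) {
          printf("%llu %llu\n", fact_serial(20), fact_ilp(20));  /* both 2432902008176640000 */
          return 0;
      }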
      • dependent, dependent, dependent
      • Re:I don't get it... (Score:2, Informative)

        by kirinyaga ( 652081 )
        actually, as I understand it, the following loop:

        for(int i = n-1; i>0; i--){ n = n * i; }

        is probably internally transformed into the following grid in a 10-instruction TRIPS processor:

        read n(transmitted as a & b) => decr a (transmitted as a & d) => comp a,0 => mul a,b (result transmitted as c)
        => decr d (transmitted as d & f) => comp d,0 => mul c,d (result transmitted as e)
        => decr f => comp f,0 => mul e,f

        where a,b,c,d,e & f are buses wiring the instru

      • You're obviously not going to run the 32 sub-units at full throttle on general software. The idea is that you can get 32 times the speed on the sort of CPU-killer parallelisable code that desperately needs it, and you can run up to 8 threads in parallel if you have them, and you can speed up even the worst "general purpose" code to probably double or triple the speed of even the most sophisticated current CPU techniques by digging out far more instruction-level parallelism. And you can do it without a blast fu
    • Re:I don't get it... (Score:5, Interesting)

      by RootsLINUX ( 854452 ) <rootslinux@gmai[ ]om ['l.c' in gap]> on Tuesday October 04, 2005 @04:04AM (#13710807) Homepage
      I recommend you read this paper [utexas.edu]. It gives a great overall picture of what TRIPS is all about and is actually really cool. (I read it about a year ago).

      I am an ECE grad student at UT Austin so I know TRIPS quite well. In fact I often speak with Doug Burger himself because he's the faculty advisor for the UT Marathon team, of which I am a member. (By the way, his name is "Burger" not "Berger"). I think TRIPS is an awesome concept and it's exactly the kind of project that I wanted to be a part of when I became a grad student at UT. I also know Steve Keckler because I'm taking his advanced computer architecture course this semester, and we're actually spending a good chunk of time talking about TRIPS (course schedule [utexas.edu]).
  • by Articuno ( 693740 ) <articunothelegend.yahoo@com@br> on Tuesday October 04, 2005 @01:26AM (#13710416) Homepage
    Bugs on the chip can lead to bad Trips
    • The article alludes to executing large numbers of instructions simultaneously. Like creating new pathways in the brain that make certain modes of thought more efficient. If it works, the shortcuts will avoid many program loops that would normally take processing time and make the trip shorter.

      I suppose the whole thing will have to be ACID compliant;)
  • by TCaM ( 308943 ) on Tuesday October 04, 2005 @01:26AM (#13710417) Homepage
    Their NEXT next-generation chips will be powered entirely by buzzwords and acronyms.

  • by Gopal.V ( 532678 ) on Tuesday October 04, 2005 @01:39AM (#13710445) Homepage Journal
    > is that for application software to take advantage of those multiple cores, programmers must structure
    > their code for parallel processing, and that's difficult or impossible for some applications.
    >
    > "The industry is running into a programmability wall, passing the buck to software and hoping the programmer
    > will be able to write codes for their systems," he says.

    So you want the programmer to be unaware of the parallel processing. Then the article goes off and says something stupid IMHO.

    > a huge amount of control logic, control transistors that don't do any work -- they just consume power. Trips is trying to push some of that complexity back up into the compiler

    I thought the point of TRIPS was to make the chip do all the scheduling (ie the Data Flow architecture) rather than depend on the compiler generated sequence of instructions. As a hobbyist compiler dev, I'd like to note that the data flow architecture is the basis of all compiler optimizers (DAG), though the typical compiler dev is likely to use this input to allocate registers to minimize pipeline stalls. I admit that it can be done at the CPU level to some extent - then this is even stranger.

    > Trips compiler sends executable code to the hardware in blocks of up to 128 instructions. The processor "sees" and executes a block all at once, as if it were a single instruction, greatly decreasing the overhead associated

    Somehow this just shifts the hard work of peephole optimisation to the CPU, to be done in real time. It would have been far better to do it properly in the compiler - something which needs extra memory and lots more processing than the code that is being executed.

    All in all, I don't see this thing revolutionizing general-purpose programming systems. Though what I call special-purpose programming might be where the future of programming goes - I'm no Gordon Moore.
  • Does any major piece of software that folks use come from UT?

    I can think of famous projects from MIT, Berkeley, Stanford, CMU, Caltech, Cornell ...

    But I can't think of a single one from UT. Not a single one. Is there something we all use that comes from UT?

    I know they have good petroleum engineering at A&M -- but I'm interested in CS.
    • Don't look down on the Texans. UT has one of the highest-ranked computer engineering programs in the country. I've heard of Doug Berger before; we have read his research papers and used his simulators (developed with Todd Austin of Wisconsin) in our graduate classes at CMU (I'm BS&MS ECE, CS '01).

      Austin also has a high number of tech companies around - heck, AMD, IBM, Intel, Freescale, just to name a few. It's nicknamed Silicon Hills. UT may not have the legacies like that of MIT, CMU, Berkeley, St
        • Don't look down on the Texans. UT has one of the highest-ranked computer engineering programs in the country. I've heard of Doug Berger before; we have read his research papers and used his simulators (developed with Todd Austin of Wisconsin) in our graduate classes at CMU (I'm BS&MS ECE, CS '01).

        I didn't ask about how well their program is rated. Has UT produced any programs that people use? E.g.

        MIT -- Kerberos
        Berkeley - RISC, BSD Unix, RAID, TCP/IP networking as standard OS feature
        Stanfo
        • UT -- a simulator that you've used. What sort of simulator, please?

          It's called SimpleScalar, a superscalar microarchitecture simulator. We have developed trace cache simulators with it in '98-'99 among other things. (Pentium 4 implementation wasn't that great however, it was a high cost cache anyway.)

          Most of the technology that you mentioned was developed two decades ago. Its pervasiveness today reflects the years of research and development that have gone into it. RAID was an idea developed by Patterson in
    • Why wouldn't they have CS programs in Texas?

      What, you think all they teach at Texas universities is agriculture and oil-related subjects?

      Don't judge Texas until you've spent some time there. I hate the place, but I'm from Oklahoma where hating Texas is a requirement of citizenship.
      • I've spent plenty of time in Texas.

        UT is a huge system. Upon reflecting on that and their relative lack of released software, I began to wonder if they'd made anything worth using.

        I forgot about ACL2, the only software project I've heard of that comes from Texas.
        • you just have a lack of knowledge about it.

          the UT applied research lab has developed the basis technology behind pretty much every US military sonar system in use since WWII. Ditto with a number of satellite and other techs (mostly defense related, but all that trickles down into mainstream usage). ARL is a combination of CS, ME, EE and other engineering fields.

          Numerous search engine technologies and the closely related 'recommendation' systems that places like Amazon use have been born and bred...

          UT doe
    • There was a somewhat famous CS person named Dijkstra who taught there for years. Perhaps you've heard of him.

      He set the tone for UT's best known research for years - theory. They've also got a couple of well known robotics labs (not as well funded as CMU, but they're more focused on improving the software brains than building big flashy machines to crash around in a desert)

      Beyond CS undergrad (which is UT's second largest major, behind Biology - and UT is the highest populated university in the USA), UT's

    • Does any major piece of software that folks use come from UT? I can think of famous projects from MIT, Berkeley, Stanford, CMU, Caltech, Cornell ... But I can't think of a single one from UT.

      National Instruments, of Austin, TX [ni.com], sells a graphical programming language, called LabVIEW [ni.com], which has about a 90% market share in the research sector [both for-profit and "not-for-profit"], and which is moving aggressively into the automation sector [i.e. the factory floor].

      PS: Ironically, LabVIEW 8.0 was just ann [ni.com]

  • by Anonymous Coward
    When it comes to CPU design, such high-level articles convey no information at all. It is akin to saying that I'm designing a CPU with 17 pipeline stages, 47-bit instruction words, 713k of L1 cache and 12 general purpose registers... What does all this tell you? Precisely nothing, because it all boils down to what frequency this chip will run at once the design is turned into transistors, how much current it will draw, etc. And this is not something people without VLSI design experience can speculate about.
    • What does all this tell you? Precisely nothing, because it all boils down to what frequency this chip will run at once the design is turned into transistors, how much current it will draw, etc.

      You're a product of Intel's marketing. AMD has been able to consistently produce systems that meet or beat Intel's performance with half the clock speed, because they have better instruction pipelining. (if only they could fix their manufacturing problems....)

      Frequency amounts to squat in the final evaluation. Sure,
    • Judging from the responses to the article, the article was already written at a level that was too high to be understandable for a sizeable portion of the slashdot crowd. Many of them do not appear to understand what dataflow computing is.
  • Old ideas? (Score:2, Insightful)

    by Anzya ( 464805 )
    This looks to me to be a combination of old and not-so-good ideas.
    I read about out-of-order execution and using data when ready at least 5 years ago in Hennessy and Patterson's book "Computer Architecture: A Quantitative Approach". To me it sounds like a typical scoreboarding architecture.
    And perhaps someone can explain to me how he can claim that this will lead to less control logic.
    As for executing two instruction at once since their destination and value are the same sounds like a operat
  • VLIW (superscalar) ? (Score:3, Interesting)

    by silverbyte ( 213973 ) on Tuesday October 04, 2005 @01:56AM (#13710490)
    Is it just me, or does this approach sound very similar to the VLIW (http://en.wikipedia.org/wiki/VLIW [wikipedia.org]) architecture? The problem is that the branch prediction needs to be very accurate for any kind of performance boost.
    Which is why these types of architecture lend very well to sequences of operations that are very similar (video processing, etc.).
    Will this work just as well in the general-computing sphere? No idea.
  • I wonder how long it's going to take these innovations to catch on in mainstream computing? Given that most desktops are still running on architectures burdened by 30-year-old design practices... I'd just like to see RISC finally embraced to the degree it deserves. That alone would certainly open up a lot of innovative designs that aren't feasible with the x86.
  • by pubjames ( 468013 ) on Tuesday October 04, 2005 @02:55AM (#13710646)
    I had an interesting discussion with a chip designer the other day. We were talking about parallel processing, and I spouted the usual perceived wisdom "But isn't the problem with parallel processing that many problems are very difficult or impossible to do in parallel? And isn't programming in parallel really difficult?"

    I found his answer very interesting, something like "that line of thinking comes from when computers weren't fast enough to do the basic things we wanted them to do back then. It's true, an application like a word processor is not a good problem to tackle with parallel processing - but we don't need to these days. Nearly all the stuff we want to do today - faster graphics, 3D, video, image and sound processing, processing massive amounts of data on the web, all the processing that goes into keeping the internet and telephone networks going - all of these problems are ideal for parallel processing. What Google does - that's essentially parallel processing, isn't it?"

    That kind of changed my perception of things and made me realise my mindset was way out of date.

  • While I'm reading TFA, could someone explain why branch "prediction" is such a big sticking point in CPU architecture... surely a processor has the compiled code and a bunch of data, so it doesn't need to predict anything because it's all laid out. And by that I mean "for... if... break;" - the processor shouldn't be surprised when it gets to that nested if and reaches the break and has to jump out of the loop, because it's clearly there in the code to start with; it's not like it just magically showed up, is it.
    • Sure, I can try. All of this stuff about branch prediction is basically the result of something called 'pipelining.' The rationale for pipelining goes something like this: an instruction on a modern computer chip is executed in several stages (fetch, decode, execute, and writeback, in the classic formulation). For any particular instruction you can't begin one stage before you've completed the previous stage. Different stages require different hardware on the chip, so in a non-pipelined CPU some parts of the chip
    • by tlambert ( 566799 ) on Tuesday October 04, 2005 @04:36AM (#13710884)
      Correct prediction keeps your instruction pipeline full. This is particularly important for code with long pipelines.

      Incorrect prediction results in having to back out CPU state from the speculative execution that has already taken place (this is called "squashing" the mispredicted instructions), and effectively this loses the pipeline slots that were used to perform the mispredicted execution. From an outside perspective, these lost slots look like a pipeline latency.

      (insert rude comment about GCC #pragma branch hinting and [lack of] basic block reordering to avoid cache busting on PPC here)

      -- Terry
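
      A rough sketch of what those squashed slots cost in the standard back-of-the-envelope model (every number below is an illustrative guess, not a measurement of any real chip):

      #include <stdio.h>

      int main(void) {
          double base_cpi    = 1.0;       /* ideal cycles per instruction, pipeline full */
          double branch_frac = 1.0 / 7;   /* roughly one conditional branch in seven     */
          double penalty     = 15.0;      /* pipeline slots squashed per misprediction   */

          double miss[] = { 0.10, 0.05, 0.01 };   /* hypothetical mispredict rates */
          for (int i = 0; i < 3; i++) {
              double cpi = base_cpi + branch_frac * miss[i] * penalty;
              printf("mispredict rate %2.0f%% -> effective CPI %.2f (%.0f%% slower)\n",
                     100 * miss[i], cpi, 100 * (cpi - base_cpi) / base_cpi);
          }
          return 0;
      }

      With these made-up numbers the slowdown ranges from about 2% at a 1% mispredict rate to over 20% at a 10% rate, which is why prediction accuracy gets so much attention.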
    • The 'branch prediction' problem isn't a matter of predicting whether or not a branch will occur, it's predicting which of the outcomes the branch will take. The problem occurs in CPUs that are pipelined. In essence, when one instruction is just completing, the processor has already started working on 5 or more subsequent instructions. Now the problem is that in order to do this, it must 'predict' the outcome of a branch instruction. If it gets this prediction wrong, the work that's been done on those subsequ
  • by SoupIsGood Food ( 1179 ) * on Tuesday October 04, 2005 @03:16AM (#13710690)
    It seems to me any serious research into microprocessors will be hampered by the fact that it will be completely inapplicable unless it dumbs itself down to ape the x86 instruction set. All current and future processor design advances will be defined as better and faster ways of making modern silicon pretend it's a member of a chip family that was obsolete when the first President Bush was in office. That's not progress. That's just kind of sad.

    Heaven help any researcher if implementing their new chip design requires a new software paradigm that doesn't fit neatly into the OS/Application model, too. We're living in the perpetual now of 2000, and it's some boring shit. I want my future back.

    Bah.

    SoupIsGood Food
    • That's just a load of bunk. Let's see, how many ARM-based 32-bit microprocessors were made last year? -- Over 500 million; kind of puts x86 sales to shame. They saw 278 million units in sales in one quarter last year. If you think everything's x86, you've just got your head in the sand.
  • by Takahashi ( 409381 ) on Tuesday October 04, 2005 @03:54AM (#13710785)
    This is not some boring superscalar! Nor is it some vector processor!

    In fact this is a complete departure from a von Neumann architecture. The architecture is called a dataflow architecture. In one sentence: a dataflow architecture is one where instruction execution is based on the availability of an instruction's inputs, not on a program counter (a toy illustration follows at the end of this comment).

    The article does a very bad job of conveying the fact that this is a relatively new idea. Like most reporting, they report something that's been in research for some time as a huge breakthrough without describing it at all. Instead it's really just an incremental step in dataflow computing research.

    I work in a lab at the University of Washington on another dataflow architecture. It's a really interesting idea, but it will take some time to develop and you're not going to get one on your desk for some years to come.
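
    To make that definition concrete, here is a toy sketch (the six-instruction "program" is invented for illustration, and real dataflow hardware is of course far more involved): give each instruction the set of instructions it depends on, and its earliest firing step falls out of the graph; everything at the same step could fire together, with no program counter deciding the order.

    #include <stdio.h>

    #define N 6

    /* deps[i][j] = 1 means instruction i consumes the result of instruction j.
     * The little program loads a, b and c and computes (a+b)*(a-b); the load
     * of c depends on nothing at all.                                        */
    static const int deps[N][N] = {
        /* 0: load a   */ {0,0,0,0,0,0},
        /* 1: load b   */ {0,0,0,0,0,0},
        /* 2: a + b    */ {1,1,0,0,0,0},
        /* 3: a - b    */ {1,1,0,0,0,0},
        /* 4: (2)*(3)  */ {0,0,1,1,0,0},
        /* 5: load c   */ {0,0,0,0,0,0},
    };

    int main(void) {
        int step[N];
        /* earliest firing step = 1 + latest producer; instructions are listed
         * in dependency order, so a single forward pass is enough             */
        for (int i = 0; i < N; i++) {
            step[i] = 0;
            for (int j = 0; j < i; j++)
                if (deps[i][j] && step[j] + 1 > step[i])
                    step[i] = step[j] + 1;
        }
        for (int i = 0; i < N; i++)
            printf("instruction %d fires no earlier than step %d\n", i, step[i]);
        /* steps come out 0 0 1 1 2 0: three instructions are ready immediately */
        return 0;
    }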
    • Dataflow is old news, but there were some fuzzy words in the article which seemed to imply that they were doing some sort of lazy partial evaluation in hardware. That seems like an interesting idea, and one generally applicable to any ISA: Imagine that your compiler could mark the interesting output registers for a basic block, and then the chip could optimize away all of the side-effects! The power savings alone would be enormous, plus you could fill pipelines with just those ops which were actually usef
  • by kwikrick ( 755625 ) on Tuesday October 04, 2005 @05:07AM (#13710959) Journal
    Here is the homepage for the TRIPS project: http://www.cs.utexas.edu/users/cart/trips/ [utexas.edu], because the article doesn't do a good job of explaining the idea, which I think is very interesting. It's not mere branch prediction these people are talking about, and it's more than dumb parallel processing. They are basically fragmenting programs into small dataflow networks.
  • by master_p ( 608214 ) on Tuesday October 04, 2005 @05:09AM (#13710963)
    Pure functional programming languages will see a tremendous boost from architectures like Trips. In functional programming languages, variables are never assigned, thus making it possible for all parts of an expression to be executed simultaneously. With 128 instructions, it is possible that lots of algorithms that take lots of time when executed sequentially, will take constant time with this new architecture: matrix operations, quicksort, etc.
  • by Erich ( 151 )
    This is like a host of other academic projects. They all start out with the premise "Suppose I have this grid of CPUs/ALUs/whatever". Then they use an army of grad students to hand code for the grid. You get some interesting SPEC results, publish some papers, and get more research money. This is not new, this has been the case for a long, long time.

    But often the ideas don't pan out in real life. With TRIPS, you get inflated IPC results from inflated instruction counts from huge superblock schedules.
