Next Generation Chip Research
Nyxs writes to tell us Sci-Tech-Today is reporting that researchers at the University of Texas are taking a new approach to designing microprocessor architecture. Doug Berger, a computer science professor at the University of Texas, and his colleagues hope to solve many of the pressing problems facing chip designers today with the new "microprocessor and instruction set architecture called Trips, or the Teraop Reliable Intelligently Adaptive Processing System."
pressing problems (Score:5, Funny)
Re:pressing problems (Score:4, Interesting)
Disclaimer: In spite of having a degree from the school, I have a very low opinion of it. Yeah, it's large enough physically, and they had some oil money, but IMO they optimized towards narrow-minded mind-narrowing efficiency rather than breadth. Real education is about the breadth. Unfortunately, these days I feel as though my real alma mater seems to be following a similar path to mediocrity.
Re:pressing problems (Score:2)
Is this simply a VLIW architecture? (Score:4, Insightful)
Re:Is this simply a VLIW architecture? (Score:4, Informative)
They are executed in a JIT (just-in-time) fashion.
Currently, with deep pipelines, results can sit in registers for a few cycles; this aims to execute instructions as soon as it can, so it needs a lot fewer registers to store results.
It also means instructions are executed out of order AND in parallel, in an effort to both increase speed and decrease chip complexity.
If you don't have to use a transistor for storage / control, you can use it for the good bits, generating your answer.
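To make that concrete, here's a contrived C fragment (my own, nothing from the article) with the producer/consumer relationships spelled out in comments; in a dataflow-style design the two independent chains can fire as soon as their inputs arrive, and each result is forwarded straight to its consumer instead of parking in a register file for several cycles.

/* Contrived example: two independent dependence chains.
 * On a deeply pipelined conventional core each intermediate value
 * sits in a register until its consumer issues; a dataflow machine
 * can forward each result directly to its one consumer as soon as
 * it is produced. */
int example(int a, int b, int c, int d) {
    int t1 = a + b;   /* chain 1: needs only a and b                    */
    int t2 = c * d;   /* chain 2: needs only c and d, independent of t1 */
    int t3 = t1 - 7;  /* fires as soon as t1 exists                     */
    int t4 = t2 + t3; /* waits for both chains                          */
    return t4;
}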
Mozart/Oz candidate (Score:2)
Re:Is this simply a VLIW architecture? (Score:2)
-Rick
Re:Is this simply a VLIW architecture? (Score:2)
I don't see a whole lot of difference - they are using JIT techniques to get around VLIW's recompile-for-new-hardware problem. Beyond that it's just VLIW warmed over.
Maybe they have some new ideas within the VLIW compiler space?
In any case, I don't see it as revolutionary...more like evolutionary - and even then, just barely.
Re:Is this simply a VLIW architecture? (Score:2)
Re:Is this simply a VLIW architecture? (Score:4, Informative)
I work in a lab at the University of Washington where we are working on _implementing_ a different dataflow machine that shares some of the fundamentals with the UT machine.
Re:Is this simply a VLIW architecture? (Score:2)
I work in a lab at the University of Washington where we are working on _implementing_ a different dataflow machine that shares some of the fundamentals with the UT machine.
So, which one is better?
For a more serious question, I read the TRIPS overview paper on their site and it all seems to make a lot of sense. So why aren't dataflow machines mainstream? The first papers were published in the early 1980s, not much later than RISC started to make some noise.
Re:Is this simply a VLIW architecture? (Score:2)
As chip fabrication pr
Dataflow is Non-algorithmic (Score:2)
The reason is that dataflow is really a non-algorithmic, signal-based approach to computing whereas most programming languages and applications are strictly algorithmic. We need to change our way of programming in a radical way before the non-algorithmic model can take off. It's not easy to translate algorithmic code into a dataflow application.
In my opinion, even though TRIPS has 'reliability' in its acronym, unless the execution of parallel code in a given object
Lisp Machine? (Score:2)
Re:Is this simply a VLIW architecture? (Score:2)
Branching (Score:3, Interesting)
Re:Branching (Score:4, Informative)
Loops as functions? (Score:4, Interesting)
In the real world, you aren't typically performing calculations in loops. Rather, you are usually reading and writing to memory, which may or may not be cached. So it isn't just a matter of saying f(x), it is much more complicated and possibly dependent on memory which you have no way to determine until the loop iteration reaches that point. And then you'll still get the bottlenecks which plague us today. Memory isn't fast enough, devices aren't fast enough, too much time is spent waiting for I/O to complete.
Pushing as much brute-force computation off onto compilers is fine. Let them unroll loops and optimize functions. But what are the limits to this? Can we really optimize our way to 1-step loops? I don't think so, but the DOD seems to think it is possible.
Re:Loops as functions? (Score:2)
Of course we can. Just have a look at your favourite functional programming language; it probably doesn't even have a loop construct. The question is whether this can be done efficiently. Of course, it also requires programmers to think in a different way, which they tend to be reluctant to do.
I'd really like to know... (Score:4, Funny)
Re:Loops as functions? (Score:2)
Re:Loops as functions? (Score:2)
In order to see that loops are completely unnecessary, you only need to see that the lambda calculus is Turing complete.
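For the skeptical, here's a minimal sketch in C (made-up function names) of the same computation written once as a loop and once as a tail-recursive function; with tail-call optimization the two compile down to essentially the same code.

/* Summing an array with a loop... */
int sum_loop(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* ...and the same thing as a tail-recursive function. With tail-call
 * optimization this compiles down to the same backward jump the loop uses. */
int sum_rec(const int *a, int n, int acc) {
    if (n == 0)
        return acc;
    return sum_rec(a + 1, n - 1, acc + a[0]);
}

Whether a given compiler actually performs the tail-call optimization is another matter, but the logic is identical.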
Re:Loops as functions? (Score:2)
You can only write so much data at once: the width of the narrowest bus. To write more than that, you need to issue repeated write instructions. This means a loop of some sort, deciding how many write instructions to output. You can put all the fancy math you want on top of it; the hardware is implementing a loop.
Re:Loops as functions? (Score:2)
No, it can be done as a recursive function. There will be no difference in terms of electric currents running across silicon because the logic involved is exactly the same. But if the processor presents a functional (i.e. function-based) interface and uses a functional design, it will use recursion rather than looping to issue repeated instructions.
the hardware is implementing a loop
The hardware is i
Re:Loops as functions? (Score:2)
Re:Loops as functions? (Score:2)
There is a small difference, though between loops as we know them in imperative languages and recursive functions as we know them in functional languages: functions have no side effects, which opens up the possibility of optim
Re:Loops as functions? (Score:2)
You can unroll all you want, it still won't optimize the loop away. There's still a physical limit to the amount of data transferable per operatio
Boring (Score:3, Interesting)
Overall they might make some things marginally more efficient, but they aren't solving any fundamental problems. They're simply moving some of them around slightly.
Re:Boring (article, not project) (Score:5, Informative)
What this is *not* in any form is a general purpose CPU. It won't boot Linux, plain and simple. This is for doing stream data processing such as compression or HPC simulations. I seem to remember their presentation showing a prototype doing software radio at a data rate usable for 802.11.
Re:Boring (article, not project) (Score:3, Insightful)
Actually, it sounds more like an FPGA [wikipedia.org]. And, since VHDL [wikipedia.org] is Turing-equivalent, it would actually be possible to compile C code (such as the Linux kernel) into a gate array and run it on such a chip.
Re:Boring (article, not project) (Score:2)
That fact is a lot like the fact that Saturn would float if you dropped it in the ocean. Both are technically true (Saturn is mostly hydrogen and helium and really is less dense than water), but Linux will no more fit in any existing FPGA than Saturn will fit in any existing ocean. Chuckle.
-
An easier to program Itanic-workalike? (Score:3, Interesting)
The article doesn't seem to agree:
So, it l
Re:Boring (Score:2)
But they thought up a neat acronym for it, TRIPS! Seriously though, that's how research works ... Cynically we could say they are completely full of it. They also could have som
Isn't this what Intel tried to do with Merced? (Score:3, Insightful)
All a great idea in theory; after all, the compiler should be able to figure out a fair amount of this information just by looking at the flow of data through the instructions (although it may not be so good at branch prediction; I'm not sufficiently strong on compiler theory and branch prediction to talk about that.) However, as can be seen by Itanium's (lack of) market success, the compiler technology just isn't there (or maybe we're using the wrong languages; there are, after all, languages that are designed to be inherently parallel.)
If this team can get it working the way they want to, maybe -- just maybe -- Itanium will find its niche after all. But let's not kid ourselves; this is a hard problem, and it's more likely that they'll make incremental improvements to the knowledge that's out there, rather than a major breakthrough.
Reduction in register use (Score:2, Interesting)
This sounds really cool.
Re:Reduction in register use (Score:1)
Re:Reduction in register use (Score:1, Insightful)
Re:Reduction in register use (Score:2)
How about linking to the frickin' homepage of the project [utexas.edu]
Re:Reduction in register use (Score:2)
I don't get it... (Score:2)
From what I know, a loop is a loop: you need to satisfy a condition and do some processing. Won't it be a problem if I don't have the data resulting from the last iteration before I start the next one?
Re:I don't get it... (Score:4, Interesting)
So they say they can take loops in blocks of up to 128 instructions at a time and calculate the result in less than 128 loop steps. They are requiring the compiler to come up with a valid function for those 128 steps that will work for any initial parameters. If it works, it means that you are no longer executing 128 times, but only once. That is a speed-up of just over 2 orders of magnitude. Really, really amazing.
But does it work? Can they really ask the compiler to do that much work? Is the compiler capable of being that smart? The main thing I wonder is how well this works, and how optimized it can get when the main purpose of looping is not to calculate functions but to access memory which is itself not fast.
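As a trivially contrived illustration of the idea (and emphatically not what the TRIPS compiler actually does): when a loop body is a pure function of the iteration count, the whole loop can collapse into a single expression.

/* 128 iterations of s += i ... */
long sum_iterative(void) {
    long s = 0;
    for (int i = 0; i < 128; i++)
        s += i;
    return s;
}

/* ...collapsed into one evaluation of the closed form n*(n-1)/2.
 * Real loops with memory traffic and data-dependent bodies don't
 * collapse this neatly, which is exactly the worry about
 * memory-bound loops above. */
long sum_closed_form(void) {
    const long n = 128;
    return n * (n - 1) / 2;
}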
Re:I don't get it... (Score:2)
Or more specifically, registers: instead of storing the results from an instruction in a loop while a different instruction executes, then having to access the registers to get the stored data, they execute the instructions as soon as the inputs are ready, reducing the register (internal memory) count.
When they do that, there is a whole chunk of transistors which can now be removed from the design, or used for computation instead of storage.
Re:I don't get it... (Score:2)
In those cases, a lot of the functionality conforms well to the features you speak of.
Now, true, the average home PC probably doesn't do anything near close to what loop optimization they're talking about.
That's the reason why most home PCs right now don't usually need dual procs (they don't usually execute multi-threaded apps), or HPC-oriented procs (like the Itaniu
Re:I don't get it... (Score:2)
And if you *do* have a multithreaded system, then one chip can run up to 8 threads at once. And if you *do* have some heavily parallel code, like
Re:I don't get it... (Score:2)
Maybe in the future, I'll try and RTFA instead of trust other +5 informative comments. All too often on stuff like this you read opposing information about the technology. *sigh* It'd be cool if moderators would read the specs on stuff like this before moderating informative, so they'd know if it were informative, or disinformative.
Re:I don't get it... (Score:2)
-
Re:I don't get it... (Score:3, Informative)
something like
for (int i = n - 1; i > 0; i--) { n = n * i; }
obviously the new value of n depends on the value of n calculated by the last iteration, so that might not be a good candidate to try to parallelize. (Actually factorial is something that can be written to take advantage of instruction-level parallelism (ILP); I chose not to, simply for the example.)
however, if you're doing something that is not dependent on previous iterations, various
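Picking up the aside above that factorial can be restructured for ILP, here's a rough sketch (my own, overflow ignored) that splits the product into two independent chains a wide machine can execute side by side.

/* Naive factorial: every multiply depends on the previous one,
 * so there is a single serial dependence chain of length n. */
unsigned long long fact_serial(unsigned int n) {
    unsigned long long r = 1;
    for (unsigned int i = 2; i <= n; i++)
        r *= i;
    return r;
}

/* Split into two independent chains (even and odd factors) that can
 * run in parallel, then combine. Same result, roughly half the
 * critical-path length in multiplies. Overflow is ignored here. */
unsigned long long fact_ilp(unsigned int n) {
    unsigned long long odd = 1, even = 1;
    for (unsigned int i = 2; i <= n; i += 2) {
        even *= i;                 /* chain 1 */
        if (i + 1 <= n)
            odd *= i + 1;          /* chain 2, independent of chain 1 */
    }
    return odd * even;
}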
dependent (Score:1)
Re:I don't get it... (Score:2, Informative)
for (int i = n - 1; i > 0; i--) { n = n * i; }
is probably internally transformed into the following grid in a 10-instruction TRIPS processor:
read n (transmitted as a & b) => decr a (transmitted as a & d) => comp a,0 => mul a,b (result transmitted as c)
=> decr d (transmitted as d & f) => comp d,0 => mul c,d (result transmitted as e)
=> decr f => comp f,0 => mul e,f
where a,b,c,d,e & f are buses wiring the instru
Re:I don't get it... (Score:2)
Re:I don't get it... (Score:5, Interesting)
I am an ECE grad student at UT Austin so I know quite well of TRIPS. In fact I often speak with Doug Burger himself because he's the faculty advisor for the UT Marathon team, of which I am a member. (By the way, his name is "Burger" not "Berger"). I think TRIPS is an awesome concept and it's exactly the kind of project that I wanted to be a part of when I became a grad student at UT. I also know Steve Keckler because I'm taking his advanced computer architecture course this semester, and we're actually spending a good chunk of time talking about TRIPS (course schedule [utexas.edu]).
It uses the LSD technology ... (Score:5, Funny)
Re:Bad Trips ... (Score:2)
I suppose the whole thing will have to be ACID compliant;)
In other news. (Score:4, Funny)
TRIPS Project at UoT (Score:1, Informative)
Some contradictions in TFA (Score:5, Insightful)
> their code for parallel processing, and that's difficult or impossible for some applications.
>
> "The industry is running into a programmability wall, passing the buck to software and hoping the programmer
> will be able to write codes for their systems," he says.
So you want the programmer to be unaware of the parallel processing. Then the article goes off and says something stupid IMHO.
> a huge amount of control logic, control transistors that don't do any work -- they just consume power. Trips is trying to push some of that complexity back up into the compiler

I thought the point of TRIPS was to make the chip do all the scheduling (i.e. the dataflow architecture) rather than depend on the compiler-generated sequence of instructions. As a hobbyist compiler dev, I'd note that a dataflow representation (the DAG) is the basis of all compiler optimizers, though the typical compiler dev is likely to use it to allocate registers so as to minimize pipeline stalls. I admit that this can be done at the CPU level to some extent - which only makes the quote above stranger.
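As a toy illustration of that DAG (my own example, not from the article): the dataflow graph of a basic block exposes both common subexpressions and which operations are independent, which is exactly what a scheduler, in the compiler or in hardware, wants to know.

/* Basic block:              dataflow (DAG) edges:
 *   t1 = a + b              a, b   -> t1
 *   t2 = a + b              same node as t1 after CSE
 *   t3 = t1 * c             t1, c  -> t3
 *   t4 = d - e              d, e   -> t4  (no edge to or from t3)
 *   t5 = t3 + t4            t3, t4 -> t5
 * No edge between t3 and t4 means a scheduler is free to issue
 * them in either order, or together. */
int block(int a, int b, int c, int d, int e) {
    int t1 = a + b;       /* t2 folded into t1 by CSE */
    int t3 = t1 * c;
    int t4 = d - e;
    return t3 + t4;       /* t5 */
}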
> Trips compiler sends executable code to the hardware in blocks of up to 128 instructions. The processor "sees" and executes a block all at once, as if it were a single instruction, greatly decreasing the overhead associated

Somehow this just shifts the hard work of peephole optimisation to the CPU, to be done in real time. It would have been far better to do it properly in the compiler - something which needs extra memory and lots more processing than the code that is being executed.
All in all, I don't see this thing revolutionizing general-purpose programming systems. Though what I call special-purpose programming might be the way the future of programming goes - but I'm no Gordon Moore.

A vastly better site for information (Score:3, Informative)
They have CS Programs in Texas? (Score:2)
I can think of famous projects from MIT, Berkeley, Stanford, CMU, Caltech, Cornell
But I can't think of a single one from UT. Not a single one. Is there something we all use that comes from UT?
I know they have good petroleum engineering at A&M -- but I'm interested in CS.
Re:They have CS Programs in Texas? (Score:3, Informative)
Austin also has a high number of tech companies around - heck, AMD, IBM, Intel, Freescale, just to name a few. It's nicknamed Silicon Hills. UT may not have the legacies like that of MIT, CMU, Berkeley, St
Re:They have CS Programs in Texas? (Score:1, Redundant)
I didn't ask about how well their program is rated. Has UT produced any programs that people use? E.g.
MIT -- Kerberos
Berkeley - RISC, BSD Unix, RAID, TCP/IP networking as standard OS feature
Stanfo
Re:They have CS Programs in Texas? (Score:2)
It's called SimpleScalar, a superscalar microarchitecture simulator. We developed trace cache simulators with it in '98-'99, among other things. (The Pentium 4 implementation wasn't that great, however; it was a high-cost cache anyway.)
Most of the technologies that you mentioned were developed two decades ago. Their pervasiveness today reflects the years of research and development that have gone into them. RAID was an idea developed by Patterson in
Re:They have CS Programs in Texas? (Score:2)
I mention Kerberos, RISC and RAID because that's what people are using, right now, not because it is the latest and greatest.
Re:They have CS Programs in Texas? (Score:3, Insightful)
What, you think all they teach at Texas universities is agriculture and oil-related subjects?
Don't judge Texas until you've spent some time there. I hate the place, but I'm from Oklahoma where hating Texas is a requirement of citizenship.
Re:They have CS Programs in Texas? (Score:2)
UT is a huge system. Upon reflecting on that and their relative lack of released software, I began to wonder if they'd made anything worth using.
I forgot about ACL2, the only software project I've heard of that comes from Texas.
there is no lack of release software (Score:2)
The UT applied research lab has developed the basis technology behind pretty much every US military sonar system in use since WWII. Ditto with a number of satellite and other techs (mostly defense related, but all that trickles down into mainstream usage). ARL is a combination of CS, ME, EE and other engineering fields.
Numerous search engine technologies and the closely related 'recommendation' systems that places like Amazon use have been born and bred...
UT doe
yes, lots in the theory area (Score:2)
He set the tone for UT's best-known research for years - theory. They've also got a couple of well-known robotics labs (not as well funded as CMU's, but they're more focused on improving the software brains than on building big flashy machines to crash around in a desert).
Beyond CS undergrad (which is UT's second largest major, behind Biology - and UT is the highest populated university in the USA), UT's
LabVIEW, by National Instruments, of Austin, TX (Score:2)
Does any major piece of software that folks use come from UT? I can think of famous projects from MIT, Berkeley, Stanford, CMU, Caltech, Cornell
National Instruments, of Austin, TX [ni.com], sells a graphical programming language, called LabVIEW [ni.com], which has about a 90% market share in the research sector [both for-profit and "not-for-profit"], and which is moving aggressively into the automation sector [i.e. the factory floor].
PS: Ironically, LabVIEW 8.0 was just ann [ni.com]
National Instruments -vs- Agilent (Score:2)
Unlike many of their competitors [e.g. Agilent], National Instruments weathered the dot-com/dot-bomb tech debacle pretty well...
Here's a better graphic of what I was talking about:
Or this:
Re:They have CS Programs in Texas? (Score:2)
MIT -- Kerberos
Berkeley - RISC, BSD Unix, RAID, TCP/IP networking as standard OS feature
Stanford -- RISC
Cornell -- distributed systems research
Caltech -- Carver Mead (VLSI, machine vision)
UT -- ?????????
Re:They have CS Programs in Texas? (Score:1)
Try reading 'flat' and then it will make sense.
The article is too high level (Score:1, Insightful)
Re:The article is too high level (Score:2, Insightful)
You're a product of Intel's marketing. AMD has been able to consistently produce systems that meet or beat Intel's performance with half the clock speed, because they have better instruction pipelining. (if only they could fix their manufacturing problems....)
Frequency amounts to squat in the final evaluation. Sure,
Re:The article is too high level (Score:2)
Re:The article is too high level (Score:3, Interesting)
What ends up happening is that parts are cherry-picked before they're sold (with the costs passed down to the customers) or that the parts are binned and sold at different levels, as is the case for Intel chips.
Increasingly methods to improve yield rates drive some of the design decisions, sometimes even at the architectura
Old ideas? (Score:2, Insightful)
I read about out-of-order execution and using data when ready at least five years ago in Hennessy and Patterson's book "Computer Architecture: A Quantitative Approach". To me it sounds like a typical scoreboarding architecture.
And maybe someone else can explain to me how he can claim that this will lead to less control logic.
As for executing two instructions at once since their destination and value are the same, that sounds like an operat
VLIW (superscalar) ? (Score:3, Interesting)
Which is why these types of architecture lend very well to sequences of operations that are very similar (video processing, etc.).
Will this work just as well in the general-computing sphere? No idea.
When are we gonna actually see this? (Score:1)
Parallel processing (Score:5, Insightful)
I found his answer very interesting, something like "that line of thinking comes from when computers weren't fast enough to do the basic things we wanted them to do. It's true, an application like a word processor is not a good problem to tackle with parallel processing - but we don't need to these days. Nearly all the stuff we want to do today - faster graphics, 3D, video, image and sound processing, processing massive amounts of data on the web, all the processing that goes into keeping the internet and telephone networks going - all of these problems are ideal for parallel processing. What Google does - that's essentially parallel processing, isn't it?"
That kind of changed my perception of things and made me realise my mindset was way out of date.
Re:Parallel processing (Score:2)
Your car's engine isn't doing a lot of sound processing, is it? Your cell phone isn't doing a lot of 3D graphics?
No, but parallel processing isn't needed for those tasks.
The carefully cherry-picked sample of tasks provided
So you think "faster graphics, 3D, video, image and sound processing, processing massive amounts of data on the web, all the processing that goes into keeping the internet and telephone networks going" is a cherry
Re:Parallel processing (Score:2, Informative)
Re:Parallel processing (Score:2)
dumb question re: branch prediction (Score:1)
Re:dumb question re: branch prediction (Score:3, Informative)
Re:dumb question re: branch prediction (Score:4, Interesting)
Incorrect prediction results in having to back out CPU state from the speculative execution that has already taken place (this is called "squashing" the mispredicted instructions), and effectively this loses the pipeline slots that were used to perform the mispredicted execution. From an outside perspective, these lost slots look like a pipeline latency.
(insert rude comment about GCC #pragma branch hinting and [lack of] basic block reordering to avoid cache busting on PPC here)
-- Terry
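For what it's worth, the branch hints GCC does expose come through __builtin_expect rather than a #pragma; a minimal sketch of the usual likely/unlikely wrapping (my own macro and function names, kernel-style):

/* likely()/unlikely() wrappers around GCC's __builtin_expect, which
 * hints the expected truth value so the compiler can keep the hot
 * path on the fall-through side of the branch. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int checksum(const int *buf, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (unlikely(buf[i] < 0))   /* rare error path, kept cold */
            return -1;
        sum += buf[i];
    }
    return sum;
}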
Re:dumb question re: branch prediction (Score:2)
TRIPS Homepage and original announcement (Score:4, Informative)
The original announcement came in 2003:
http://www.utexas.edu/opa/news/03newsreleases/nr_

Call me bitter, but... (Score:5, Interesting)
Heaven help any researcher if implementing their new chip design requires a new software paradigm that doesn't fit neatly into the OS/Application model, too. We're living in the perpetual now of 2000, and it's some boring shit. I want my future back.
Bah.
SoupIsGood Food
Re:Call me bitter, but... (Score:2)
Correct that... (Score:2)
You've all got the wrong idea (Score:5, Interesting)
In fact this is a complete departure from a von Neumann architecture. The architecture is called a dataflow architecture. In one sentence, a dataflow architecture is one where instruction execution is based on the availability of the instruction's inputs, not a program counter.
The article does a very bad job of conveying the fact that this is a relatively new idea. Like most reporting, they report something that's been in research for some time as a huge breakthrough without describing it at all. Really it's just an incremental step in dataflow computing research.
I work in a lab at the University of Washington on another dataflow architecture. It's a really interesting idea, but it will take some time to develop and you're not going to get one on your desk for some years to come.
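If it helps, here is a toy sketch in C (my own, unrelated to the actual UT or UW implementations) of the firing rule just described: an instruction becomes eligible the moment its last outstanding operand arrives, and no program counter dictates the order.

/* Toy dataflow firing rule: each "instruction" waits on a count of
 * outstanding operands; producing a value decrements its consumers'
 * counts, and anything that reaches zero goes on the ready list.
 * A sketch of the concept only, not a real ISA. */
#include <stdio.h>

enum { NINST = 4 };

struct inst {
    const char *name;
    int waiting;            /* operands still outstanding        */
    int consumers[2];       /* indices of instructions fed by us */
    int nconsumers;
};

int main(void) {
    struct inst prog[NINST] = {
        { "t1 = a + b",   0, {2, -1}, 1 },   /* inputs already available */
        { "t2 = c * d",   0, {3, -1}, 1 },
        { "t3 = t1 - 7",  1, {3, -1}, 1 },
        { "t4 = t2 + t3", 2, {-1, -1}, 0 },
    };
    int ready[NINST], nready = 0;

    for (int i = 0; i < NINST; i++)          /* seed the ready list */
        if (prog[i].waiting == 0)
            ready[nready++] = i;

    while (nready > 0) {                     /* fire in data order, */
        int i = ready[--nready];             /* not program order   */
        printf("fire: %s\n", prog[i].name);
        for (int c = 0; c < prog[i].nconsumers; c++) {
            int j = prog[i].consumers[c];
            if (--prog[j].waiting == 0)
                ready[nready++] = j;
        }
    }
    return 0;
}

The fire order comes out however the data dependences allow (t1 and t2 in either order, then t3, then t4), which is the whole point.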
Re:You've all got the wrong idea (Score:2)
side-effects! The power savings alone would be enormous, plus you could fill pipelines with just those ops which were actually usef
Dataflow's been around for a while (Score:2)
See also this post [slashdot.org].
better to take a look at.... (Score:3, Informative)
A chance for pure functional languages to shine. (Score:3, Interesting)
Academia (Score:2)
But often the ideas don't pan out in real life. With TRIPS, you get inflated IPC results from inflated instruction counts from huge superblock schedules.
No (Score:1)
Re:No (Score:2, Funny)
Maybe Linux.... (Score:1)
Nope, but it runs on Linux though!
ROTFL
No Joke, but you could:
1. Volunteer yourself,
2. Buy this Titanic II TRIPS chip,
3. Port GCC to it,
4. Compile Linux,
5. Be a hero!
???
6. Sorry, No profit.
Re:Cue the... (Score:2)
(Note: I'm alumnus of New Mexico State University, a rival school, so take this with a grain of salt.)
Re:Cue the... (Score:2)
So, hey, Genius. I'm teasing him that of course it's UT Austin, because no one cares about UTEP.
shit, why don't you try understanding posts, and implicit statements contained within them. Oh that's right. 90% of the slashdot crowd gets mad when you do that, because the