The Quest for More Processing Power
Hack Jandy writes "AnandTech has a very thorough, but not overly technical, article detailing CPU scaling over the last decade or so. The author goes into specific details on how CPUs have overcome limitations of die size, instruction size and power to design the next generation of chips. Part I, published today, talks specifically about the limitations of multiple cores and multiple threads on processors."
There can only be one (Score:5, Funny)
Re:There can only be one (Score:4, Interesting)
Re:There can only be one (Score:1)
Re:There can only be one (Score:5, Interesting)
Re:There can only be one (Score:2)
Who needs this much power? (Score:3, Interesting)
Who needs so much raw processing power? Your everyday Joe Computer User only uses it for word processing, checking email, and surfing the interweb. That's why, when some of my friends (or their parents) go looking for a new computer, I ask them what they mostly use their computer for. If they're not eXtreme gamers or something, then I don't see the point in them buying a processor screaming along at 4 GHz or whatever.
In the light of this
Re:There can only be one (Score:1)
Actually, you wouldn't need to develop it: you could charge the banks/pension funds money (e.g. one beeelion dollars) not to develop it. Oh, I mean, you'd give institutional shareholders votes on the technical steering committee. Got to be careful of those pesky anti-blackmail laws.
Re:There can only be one (Score:1)
Quickly stated, the problem is this. DNA contains a code which determines the sequence of amino acids in a protein. If you stretched out a protein, it would be a linear chain of amino acids, and this is the form the cell assembles it in. As the chain comes out of the assembly machine, it folds into a shape, and the shape determines what it does. We can find the sequence of genes in the DNA t
Re:There can only be one (Score:1)
But do quantum computers run Linux? *rimshot*
We dont need more power (Score:3, Interesting)
Re:We dont need more power (Score:4, Informative)
I remember the old Motorola 68000 range having sixteen 32-bit registers for general coding (eight data and eight address), and one of the prime benefits of the PPC was its much larger register file (32 general-purpose registers).
I stopped coding assembler when I moved to x86. What a horrible kludge of a stack-biased platform it is.
Re:We dont need more registers (Score:5, Interesting)
What kind of algorithm are you imagining would benefit from 256 fields of non-vectorized data?
Of course, in larger programs those registers could hold everything that's worthy of a local variable, but as soon as you run into a stack operation you'll either want to push only a subset of the registers to the stack, or take a much harder hit from memory access times by making each function call a 2048-byte write to memory.
Explicit encoding of parallelism, hints to branch prediction, and similar stuff, seems far more appropriate.
Again, few single functions in an imperative language have 256 separate variables that don't involve arrays of data. Unless the register file is addressable by index from another register (basically turning it into a very small addressed memory, which is what you try to avoid with registers), you have little use for 256 of them. Take a trivial string iteration algorithm: most of those registers would be completely useless. The same holds true for common graph algorithms.
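The 2048-byte figure in the comment above works out if each of the 256 registers is 64 bits wide; that width is an assumption here, since the comment doesn't state it:

```latex
256\ \text{registers} \times \frac{64\ \text{bits}}{8\ \text{bits/byte}} = 256 \times 8\ \text{bytes} = 2048\ \text{bytes per call}
```

Saving the whole file on entry and restoring it on return would move that amount twice per call.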
Re:We dont need more registers (Score:2, Interesting)
Clarification: It's easy to see that on x86 you constantly move values in and out of registers, forcing the CPU to do register renaming to get good parallelism. I fail to see the real performance benefit once you go above, say, 32 of each kind, and I think the 16 available in AMD64 should be fine for most tasks. The problem with x86 is that there are only eight, and even those have locked meanings to some degree.
Re:We dont need more registers (Score:1)
Locked meanings? I'm not so sure. If we do a MUL EAX, the result goes into EDX:EAX. Since EAX gets clobbered, it'll get renamed. Combine that with the fact that most compilers generate code that avoids the instructions in which registers have special meanings anyway, and I don't think this is actually a problem.
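A small Python model of the semantics the comment relies on, as a sketch rather than a CPU simulator: unsigned `MUL` with a 32-bit operand writes the 64-bit product to EDX:EAX, so both registers are overwritten and neither old value is needed afterwards. The function name `mul_eax` is hypothetical.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def mul_eax(eax: int, src: int) -> Tuple[int, int]:
    """Model unsigned x86 MUL with a 32-bit operand: EDX:EAX = EAX * src.

    Returns (edx, eax), the high and low 32 bits of the 64-bit product.
    Both registers are overwritten, so neither old value survives the
    instruction, which is what lets the renamer hand out fresh registers.
    """
    product = (eax & MASK32) * (src & MASK32)
    return (product >> 32) & MASK32, product & MASK32


# MUL EAX squares EAX: 0x10000 * 0x10000 = 0x1_0000_0000
edx, eax = mul_eax(0x10000, 0x10000)
print(hex(edx), hex(eax))  # prints "0x1 0x0"
```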
Re:We dont need more registers (Score:4, Informative)
It does. Take a look at x86-64. When you are using less than 4 GB of RAM, 98% of the reason 64-bit x86 code is faster is that it has twice as many registers. With the same number of registers, 64-bit code would normally be measurably slower, because the pointer size doubled. The instruction word length doesn't change.
256 registers goes a bit far unless half of them are predication bits.
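On the pointer-size point in this comment: a pointer-heavy structure grows when pointers go from 4 to 8 bytes, so fewer objects fit in each cache line. This Python `ctypes` sketch uses a hypothetical linked-list node (not taken from the comment) to show the effect under whichever interpreter runs it.

```python
import ctypes


class Node(ctypes.Structure):
    """Hypothetical linked-list node: a 32-bit payload plus one pointer."""

    _fields_ = [
        ("value", ctypes.c_int32),
        ("next", ctypes.c_void_p),  # 4 bytes in a 32-bit build, 8 in a 64-bit one
    ]


ptr_size = ctypes.sizeof(ctypes.c_void_p)
node_size = ctypes.sizeof(Node)  # on 64-bit builds the int is padded to pointer alignment
print(f"pointer: {ptr_size} bytes, node: {node_size} bytes")
```

On a typical 64-bit build this reports an 8-byte pointer and a 16-byte node; a typical 32-bit build reports 4 and 8.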
Re:We dont need more registers (Score:4, Interesting)
The Itanium has a huge register file with, IIRC, even more registers in total. They are not interchangeable, though, but the (almost) only point of making them interchangeable would be to keep the total number of registers down while staying flexible for most types of code. As I think it's generally easier to keep them separate for different execution units, that's not very interesting. Also, note that the Itanium currently has a 2-cycle (again, IIRC) register access time! They tried to be visionary, adding a huge register set in addition to some parallelism encoding and the other things I mentioned in the parent, but they traded away (what seems to be) far too much to get it.
A huge (defined as MMIX-like, not AMD64-like) register file might be great, but you need to be able to push registers to the stack selectively to get away with it, unless you or the compiler perform very aggressive inlining. What's easier if you're writing assembler: calling a function and putting a local on the stack, or writing one huge fricking implementation of your main algorithm, taking great care to use different registers in each inlined function?
How MMIX Uses Its Registers (Score:2)
If I recall correctly, MMIX uses its whole huge register file as a stack. All of your instructions specify register numbers as counted from the top-of-stack. Stack space is allocated and deallocated in frames, not a register at a time. A frame must be small enough to fit in registers. The stack spills to memory if it overflows, and refills from memory if it underflows. It does not have to spill/refill on a frame boundary. But activation records for compiled C routines could nest five or six deep and not spill...
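As a rough illustration of the scheme described above, here is a toy Python register stack: frames are allocated whole, the register file holds only the top `capacity` slots, and older slots spill to memory one at a time and refill on underflow, so spills need not line up with frame boundaries. The class, its method names, and the top-of-stack numbering follow the comment's recollection and are assumptions; real MMIX also distinguishes local from global registers and tracks its stack with special registers, none of which is modeled here.

```python
from collections import deque


class RegisterStack:
    """Toy register stack: whole-frame allocation, slot-by-slot spill/fill."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.regs: deque = deque()  # top of the logical stack, held in registers
        self.memory: list = []      # spilled slots, oldest first
        self.frames: list = []      # sizes of the active frames
        self.spills = 0
        self.fills = 0

    def push_frame(self, size: int) -> None:
        if not 0 < size <= self.capacity:
            raise ValueError("a frame must fit in the register file")
        self.frames.append(size)
        self.regs.extend([0] * size)
        while len(self.regs) > self.capacity:  # overflow: spill the oldest slot
            self.memory.append(self.regs.popleft())
            self.spills += 1

    def pop_frame(self) -> None:
        size = self.frames.pop()
        for _ in range(size):  # the top frame is always fully in registers
            self.regs.pop()
        if self.frames:  # underflow: refill until the new top frame is resident
            while len(self.regs) < self.frames[-1]:
                self.regs.appendleft(self.memory.pop())
                self.fills += 1

    def _slot(self, i: int) -> int:
        if not 0 <= i < self.frames[-1]:
            raise IndexError("register outside the current frame")
        return len(self.regs) - 1 - i  # numbered from the top of the stack

    def read(self, i: int) -> int:
        return self.regs[self._slot(i)]

    def write(self, i: int, value: int) -> None:
        self.regs[self._slot(i)] = value


rs = RegisterStack(capacity=8)
rs.push_frame(3)
rs.write(0, 42)        # caller's top-of-stack register
for _ in range(3):     # nest three more calls: 12 slots > 8 registers
    rs.push_frame(3)
for _ in range(3):     # return back to the caller
    rs.pop_frame()
print(rs.read(0), rs.spills, rs.fills)  # prints "42 4 4"
```

The caller's value survives a round trip through memory, and only the four slots that did not fit were ever spilled or refilled.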
Re:We dont need more registers (Score:1)
Re:We dont need more power (Score:3, Interesting)
As with dual-CPU motherboards, you go dual when you can't get anything more out of a single one...
10 GHz CPU, lol. Why not release one that requires a 100 GHz clock? If it's only processing every 30th cycle, what's the big deal? An oversimplification, I know, but that is the essence of Intel's laughable strategy. Consumer ignorance vs. product innovation: we'll take the ignorance. How long can it last...
Re:We dont need more power (Score:5, Funny)
TERRORIST!
Re:We dont need more power (Score:2, Interesting)
Re:We don't need more "power" (Score:2)
My guess is that this would work wonderfully for certain classes of problems, and would be quite useful for things like finite element analysis, MPEG encoding, and the like. The main problem is that an FPGA takes a fair bit of time to load its configuration file. Obviously, you would not want to multitask between two different applications trying to use this FPGA. Otherwise, you
Re:We don't need more "power" (Score:1)
More power will lead to more bloat.. (Score:5, Insightful)
Pun (Score:3, Funny)
Re:More power will lead to more bloat.. (Score:5, Interesting)
(Here's a simple cost analysis: we can pay this guy $100k/year to do hand-optimized tweaks on this code, which then becomes a liability for future maintenance if that coder dies, quits, or whatever. Or we could add another $100 stick of RAM, buy a new processor next year for a fraction of his cost, and get a similar performance bump... The math doesn't add up...)
Re:More power will lead to more bloat.. (Score:5, Insightful)
Getting back to your point, there is still a market for hand-coders. With most consumer electronics (I'm talking kids' toys, alarm clocks, talking dolls), you try to shave every penny you can off manufacturing costs. Plus, once you start a product line, you run it for years.
In that high-volume, low-cost case, it is easy to absorb the cost of a $100,000 hand coder, especially if he can save you $0.10 a unit on lines where volume is measured in millions of units.
Besides, most of the "hand coders" I know work more in the $36,000 range.
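The break-even point behind "easy to absorb" is simple division, using the comment's own figures of $100,000 per year and $0.10 saved per unit:

```latex
\frac{\$100{,}000\ \text{per year}}{\$0.10\ \text{saved per unit}} = 1{,}000{,}000\ \text{units per year}
```

A line shipping several million units a year covers the salary several times over, and at the $36,000 figure the break-even drops to 360,000 units.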
Re:More power will lead to more bloat.. (Score:2)
Heck, Microsoft pays $50K-60K for a fresh graduate with an undergrad degree.
Re:More power will lead to more bloat.. (Score:3, Interesting)
Speed is rarely an issue t
Quick answer (Score:5, Interesting)
It's only new software that's sucking up all the extra processing power.
Remember the really sluggish 33 MHz 486s (and a lot slower), and thinking the ultimate computer would be a whole 50 MHz?
Well, now you've got a computer that's over 10 times faster, with practically infinite capacity.
Fire up that old operating system and run your original software; you will be in heaven!
Re:Quick answer (Score:1)
Re:Quick answer (Score:1)
Re:Quick answer (Score:1)
Re:Quick answer (Score:2, Funny)
Re:Quick answer (Score:1)
Re:Quick answer (Score:2)
I remember when my 16 MHz 386 machine was the hottest thing around; it blew the doors off the 6 to 8 MHz ATs. Shortly after buying the 386, I picked up a copy of Gato, which used timing loops intended for the 4.77 MHz 8088. It went w-a-y too fast to be playable until I learned how to set the game's clock speed compensation.
Before that when an 8 MHz 8086 was pretty hot stuff (w
x86 centric (Score:3, Insightful)
Re:Which fanboy are you? (Score:1)
dividends from MSFT havn't been good in years (Score:2)
July 20, 2004 was pretty sweet...
Unbloated URL (Score:5, Informative)
Same article without 90% of the ad-bloat.
Re:Unbloated URL (Score:2)
Re:Unbloated URL (Score:1)
I like AnandTech. The articles are generally decent. Why try to screw the guy over like this?
Eliminate Bottlenecks (Score:5, Interesting)
Re:Eliminate Bottlenecks (Score:1)
In fact, I would like to see research on which operations users consider slow. For instance, if your word processor takes 1 second to update the screen, it is considered slow, but nobody pays any attention if DVD burning takes 5 or 10 minutes.
Re:Eliminate Bottlenecks (Score:1)
Re:Eliminate Bottlenecks (Score:4, Informative)
L1 & L2 cache: Almost instantaneous, nanosecond response time
L3 and higher cache: A bit slower, but still pretty quick, tens of nanoseconds
Main memory: Go do something else while waiting for this, response time of roughly a hundred nanoseconds
Hard drive: Go to lunch and come back, millisecond response time
Limitations... (Score:3, Funny)
Ummm, my home machine has a 400 MHz processor running SuSE. I've been thinking of upgrading every 6 months for 5 years, but I just keep waiting for the "next" best thing rather than upgrading now.
There are mobile phones more powerful than my home PC, but it does the job.
The wonder of these future boxes is that we will STILL be able to write code that makes them run slow. Roll on Longhorn, I say!
Re:Limitations... (Score:2)
Re:Limitations... (Score:4, Interesting)
Well, each version of Windows seems to bring about new hardware requirements. Most people buy a new Windows version with new hardware. That is more than a little coincidence. I think Microsoft is well aware that most people aren't able to install Windows themselves, and that making them believe they need a faster box is a good way to keep them upgrading to the "next" level, both on software and hardware.
Kjella
Myth of the single threaded desktop (Score:1, Interesting)
I expect that once multi-core desktop CPUs become more prevalent, the advantage of multi-threaded programming will become evident and start
Re:Myth of the single threaded desktop (Score:4, Interesting)
1) Programming for two or more processors is more work, and prone to more subtle and strange errors.
2) Most people only have one processor.
You can draw the obvious conclusions.
Fact #1 can be dealt with by proper technique, training, and tools.
Fact #2 is going to change because AMD and Intel are unable to deliver more than 4 GHz.
Re:Myth of the single threaded desktop (Score:3, Insightful)
Threaded apps and multitasking OSes have been around for years. Even if an app is single-threaded, the user still benefits from having two or more processors, because the system stays very responsive even if one app has one CPU completely pegged.
Re:Myth of the single threaded desktop (Score:2)
Blocking, man. There is a ready queue and a blocked list for a reason. Those disk accesses aren't instantaneous. Neither is waiting for input from the user, or waiting on a socket. If your thread is blocked, there might be other work that you can do while you wait.
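A minimal Python sketch of that point: one thread blocks on a simulated read (a `time.sleep` standing in for a real disk or socket wait, an assumption for illustration) while the main thread keeps computing. CPython's GIL limits CPU-bound parallelism, but a thread sleeping or waiting on I/O releases it, so the overlap here is real. The function `slow_read` is hypothetical.

```python
import queue
import threading
import time


def slow_read(results: queue.Queue) -> None:
    """Hypothetical stand-in for a blocking disk or socket read."""
    time.sleep(0.2)  # this thread is blocked here, not competing for the CPU
    results.put("data from disk")


results: queue.Queue = queue.Queue()
reader = threading.Thread(target=slow_read, args=(results,))
start = time.perf_counter()
reader.start()

# While the reader is blocked, the main thread stays runnable and does work.
checksum = sum(i * i for i in range(100_000))

data = results.get()  # only now do we wait for the read to finish
reader.join()
elapsed = time.perf_counter() - start
print(checksum, data, f"{elapsed:.2f}s")
```

Because the computation overlaps the 0.2 s wait, the total elapsed time is close to the longer of the two rather than their sum.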
Not particularly good for Windows (Score:4, Interesting)
Most OSes these days are not monolithic. Even Microsoft's is really a collection of smaller pieces, but not nearly to the degree Linux is.
Linux just scales better than Windows on multiple CPUs. I have no doubt that MS will work Indian programmers day and night to catch up, but this is a game they are definitely playing catch-up in.
Some versions of Linux are scaling past 64 CPUs now (oh, the benefits of forked kernel development!), which should pay off nicely when the time comes that AMD ('cause may not be around then) is pushing chips with dozens if not hundreds of micro-cores.
Last I checked (and I may be out of date on this), Windows started bogging down at 4 CPUs. And never mind its asinine global message loop.
I fully realize Joe User cares more about perceived performance than real performance (long live xorg!), and explaining Linux's advanced scaling architecture will not win over the desktop, but it will have a significant impact on technical decision makers, from servers to embedded devices (a HUGE market for these clustered chips).
Re:Not particularly good for Windows (Score:2)
Re:Not particularly good for Windows (Score:1)
User defined branch prediction (Score:2, Interesting)
Re:User defined branch prediction (Score:2, Insightful)
Re:User defined branch prediction (Score:2)
Re:User defined branch prediction (Score:2)
Forget everything you are told about X being optimal, and Y being old hat. Computer architectures come and go like bell bottoms and short skirts.
Branch prediction is a workaround. It is not a radical performance-enhancing technology. It is there to keep the CPU busy when it would otherwise be starved for instructions and data. Branch prediction is simply there to allow the CPU to operate at an insanely high clock speed...
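To make "keep the CPU busy" concrete: a predictor guesses a branch's direction so the pipeline can keep fetching before the branch resolves. Below is a sketch of the textbook 2-bit saturating counter in Python; it is a generic illustration, not the predictor of any specific CPU discussed in this thread, and real designs keep tables of such counters indexed by branch address and history.

```python
from typing import Iterable


class TwoBitPredictor:
    """Textbook 2-bit saturating-counter predictor for a single branch.

    Counter values 0-1 predict not taken, 2-3 predict taken. Each outcome
    moves the counter one step, so one surprise (like a loop exit) does
    not flip a strongly biased prediction.
    """

    def __init__(self, state: int = 2) -> None:
        self.state = state  # start at "weakly taken", an arbitrary choice

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def count_hits(outcomes: Iterable[bool]) -> int:
    """Run one predictor over a branch's outcome history, counting correct guesses."""
    predictor = TwoBitPredictor()
    hits = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            hits += 1
        predictor.update(taken)
    return hits


# A loop branch: taken 9 times, then falls through once, repeated 3 times.
loop = ([True] * 9 + [False]) * 3
print(count_hits(loop), "of", len(loop))  # prints "27 of 30"
```

On this loop pattern the only misses are the three loop exits; a 1-bit "repeat the last outcome" scheme would also miss the first iteration after each exit.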
Re:User defined branch prediction (Score:1)
Re:User defined branch prediction (Score:2)
The "Cell" architecture does something similar to what you've described: different cells handle different tasks of a multimedia system, say a set-top box or the PlayStation 3. Better statistics modeling is what's needed in terms of b
Re:User defined branch prediction (Score:1)
i don't think much of this article (Score:1)
The only reference made to AMD is regarding their ingenious SOI technology. Apart from that, the focus stays on Intel (which he calls the "#1 in the CPU market"). I find that somewhat absurd, since Intel is largely failing (stretching an obsolete architecture to extreme limits by extending the pipeline) where AMD is innovating and has already largely surpassed them.
AMD's CPU does a hell of a lot more per clock cycle than Intel's. The AMD 64 bit
Re:i don't think much of this article (Score:1)
And, yeah, the only Intel CPU I currently like is the Pentium M, and I hope you can forgive that. If I were to buy a new main machine now, it would probably be AMD, but I'm holding out for dual-core releases from them. I like the effects that both "real" SMP and hyper-threading has
Re:i don't think much of this article (Score:2)
4 GHz as a wavelength (:-)) (Score:2)
My leaky brain suggests that this might correspond to the propagation speed in silicon for a given path length and a given process (e.g., 90 nm may give us better results).
--dave
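For scale, the free-space numbers at 4 GHz are easy to work out: one clock period is 250 ps, and light in vacuum covers 7.5 cm in that time, which is also the wavelength:

```latex
T = \frac{1}{f} = \frac{1}{4\times10^{9}\ \text{Hz}} = 250\ \text{ps},
\qquad
\lambda = \frac{c}{f} = \frac{3\times10^{8}\ \text{m/s}}{4\times10^{9}\ \text{Hz}} = 0.075\ \text{m} = 7.5\ \text{cm}
```

Signals in on-chip wires, whose delay is dominated by resistance and capacitance, travel much more slowly than light, so the distance a signal can usefully cover per cycle is a small fraction of this. Treat the comparison as rough intuition for the guess above, not as a derivation of a hard 4 GHz limit.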
Assumptions (Score:2)
If you are desperate to run your word processor or spreadsheet faster, then he's got a point. But realistically, don't the current systems already run those kinds of programs just fine? Is this the kind of application where more speed is most needed?
I think Sony have got it right with their whole "media processor" approach, with high
Re:Assumptions (Score:2)
Compare a game on a 100 MHz Pentium I and a 200 MHz Pentium II: there is a massive difference. That's a gap of just 100 MHz.
Compare a game on a 1.4 GHz Pentium IV and a 2.0 GHz Pentium IV: there is hardly any difference. That's a gap of 600 MHz.
The industry is obsessed with number crunching and generic software benchmark numbers, which are a bad measurement altogether.
More CPU reading (Score:1)
It is crazy how far we have come.
-The only sig I have is a cig with a good single malt.
just a thought (Score:4, Funny)
Re:just a thought (Score:3, Informative)
Leakage is not available as a power source. Leakage current is turned into heat at the exact location where the leakage occurs.
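To first order, the static power lost to leakage is the supply voltage times the total leakage current, and all of it ends up as heat where it leaks. The example values below are hypothetical, chosen only to show the arithmetic:

```latex
P_{\text{leak}} = V_{DD}\, I_{\text{leak}}, \qquad \text{e.g. } 1.2\ \text{V} \times 20\ \text{A} = 24\ \text{W}
```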
In addition to the power of and.... (Score:1)