The Fight Against Dark Silicon
An anonymous reader writes "What do you do when chips get too hot to take advantage of all of those transistors that Moore's Law provides? You turn them off, and end up with a lot of dark silicon — transistors that lie unused because of power limitations. As detailed in MIT Technology Review, researchers at UC San Diego are fighting dark silicon with a new kind of processor for mobile phones that employs a hundred or so specialized cores. They achieve an 11x improvement in energy efficiency by doing so."
That's not the solution, this is (Score:5, Informative)
Re: (Score:2)
programmer-safe language.
That's just asking for trouble; that's like saying a keyboard is safe from illiterate people because it has letters printed on the keys.
Re: (Score:3)
...that's like saying a keyboard is safe from illiterate people because it has letters printed on the keys.
Sadly, that statement is true. An illiterate person will shy away from a keyboard, an on-screen (TV) menu, a newspaper, etc. the same way someone who is broke is embarrassed by the sight of a checkbook or wallet... it becomes a reflex. I know someone who is a good intuitive mechanic, but somehow managed to get to adulthood with less than third-grade reading and writing skills. Left to himself, a typical 5-page job application takes a couple of hours and many phone calls to complete. Now he has a 2 year old
Re: (Score:2)
cats are illiterate, they walk all over the bloody keyboard causing all kinds of havoc.
"I know someone who is a good intuitive mechanic, but somehow managed to get to adulthood with less than third grade reading and writing skills.",
Quite possibly the way that he learns things (ergo... schools are crap).
I have/had that problem, in that language is generally poorly designed and people like to fuck with other people's heads. But I worked out how they do that now and it kind of, mostly, started to sort itself out
Re: (Score:1)
My Dad was a bit like that; Mum introduced him to Science Fiction and made him read enough to get him hooked on the story. He now has no problem with reading.
It took me longer than most kids to pick up reading, so she used the same technique on me; a couple of years later my reading comprehension was far above my age group.
Writing and spelling have never really caught up, though. I still have problems with that at 30.
Re: (Score:1)
it's in the self.
Re: (Score:2)
Left to himself, a typical 5 page job application takes a couple of hours and many phone calls to complete.
Not so many phone calls, but job applications can take me a while. The "spray and pray" variety may be useful if you're unemployed, but if you already have a job and it's one of those rare opportunities, I could easily spend 2 hours on it. Not because of language problems, but to make the best possible application for the position. It's usually time well spent.
Re: (Score:3)
Re: (Score:2)
Definitely. As a programmer myself, I can switch *language* pretty quickly. There are even some pretty easy-to-use GUI tools out there where "normal non-programmers" can implement something.
The problem is that very few people seem to be able to LOGICALLY solve a problem, that is, define what should happen when certain conditions are met. Basically the definition of "what should the program do, exactly?". Getting THAT defined is 90% of the "programming" problem. And that can't really be solved by different languages
Re: (Score:1)
Ohhh... often the problem is that they can define when logical conditions are met, they just can't then generalize and turn it into patterns etc... and, well, that's all too much like hard work when I can just hack and slash myself through the day...
It's like they've written a function in C++, but the body of the function looks more like very bad Prolog.
void foobar(int &a)
{
int tmp = a;
if (a = 1)
{
a*=a;
a=(int)sqrt((float)a);
a++;
if (a + 1 = 2 )
{
printf ("goofie%d", a);
}
a--;
} else if (a
Re: (Score:3)
Re:That's not the solution, this is (Score:5, Interesting)
Uuum, no need to learn some obscure weird language that doesn't even exist yet, when you can learn a (less) obscure weird language that already exists. ;)
Haskell already has provable thread-safe implicit parallelization. In more than one form, even. You can just tell the compiler to make the resulting binary "-threaded". You can use thread sparks. And that's only the main implementation.
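Something like this, for sparks (a toy sketch, numbers pulled out of thin air; needs the parallel package, ghc -O2 -threaded, and +RTS -N2 to actually spread across cores):
import Control.Parallel (par, pseq)
-- `par` sparks evaluation of `a` on a spare core while `pseq`
-- forces `b` on the current one; the runtime does the rest.
parSum :: [Int] -> [Int] -> Int
parSum xs ys = a `par` (b `pseq` (a + b))
  where
    a = sum xs
    b = sum ys
main :: IO ()
main = print (parSum [1 .. 1000000] [1 .. 1000000])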
Plus it is a language of almost orgasmic elegance on the forefront of research that still is as fast as old hag grandma C and its ugly cellar mutant C++.
Requires the programmer to think on a higher level though. No pointer monkeys and memory management wheel reinventors. (Although you can still do both if you really want to.)
Yes, good sir, you can *officially* call me a fanboy. ;))
But at least I'm a fan of something that actually exists!
(Oh, and its IRC channel is the nicest one I've ever been to. :)
Re: (Score:1)
Re: (Score:2)
You're right about Haskell being a beautiful language, but it is not as fast as C/C++.
Depends on the problem. My previous company found the Haskell proxy we wrote for testing could handle 5x the load of the best (thread-based) C++ implementation.
Re: (Score:2)
You're right about Haskell being a beautiful language, but it is not as fast as C/C++. Even Java is usually faster. It's still pretty fast for a declarative language and has a C interface for when you need to speed up certain parts of code.
Who cares? CPUs are 1000X as fast as they were 12 years ago, but I/O speed has barely changed. There are no CPU-bound problems anymore.
The only thing that matters is programmer efficiency. It takes 5X as long to write C code as to solve the same problem in a modern language.
Re: (Score:2)
Plus it is a language of almost orgasmic elegance on the forefront of research that still is as fast as old hag grandma C and its ugly cellar mutant C++.
People always claim this. And always against C and C++. It's essentially never true, except for a) FORTRAN and b) occasional synthetic benchmarks. While it is undeniably elegant, the lack of for loops is anything but elegant in scientific computation, image processing, etc.
Does Haskell allow you to parameterize types with integers yet? It didn't last time I
for vs. map (Score:2)
the lack of for-loops is anything but elegant in scientific computation, image processing etc.
What's the difference between imperative "for" and functional "map" for iterating through a collection? Python has both, and I end up using generator expressions (which use syntax not unlike "for" and the semantics of "map") at least as often as an ordinary for-loop.
Re: (Score:2)
Perhaps the inelegance comes because, with for loops, you can do things like break out of the iteration early, and you can't do any of that with a map operation. Not the ones I've seen, anyway.
Examples from Python itertools (Score:2)
with for loops, you can: * break out of it early
Some cases of breaking early can be represented as compositions of iterator operations: "Find the first ten elements that meet these criteria" is something like islice(ifilter(criterion, seq), 10), where criterion is a function returning nonzero for elements that match. "Find all elements until the first not meeting the criteria" is takewhile(criterion, seq)
Re: (Score:2)
Yeah, itertools seems like a really nice library. I wish I had something similar in other languages.
But, you know, the itertools functions are just inefficient applications of a normal map operation. Islice, for example, iterates from the start, skipping everything until it gets to the elements you want to process. A proper for loop does not need those nop iterations, and is thus more elegant (for a certain definition of elegant).
I will grant that breaking out of the iteration works just peachy; I figure Py
Re: (Score:2)
Islice, for example, iterates from the start, skipping everything until it gets to the elements you want to process. A proper for loop does not need those nop iterations
If you know all the elements up front, in a numbered sequence of some sort (such as an array), you can use a regular slice (e.g. some_list[10:20]).
Re: (Score:2)
So you agree that the map operation is less elegant than the for loop in tricky iteration situations? In that the for loop can replace a map, a custom recursive loop with exit conditions, AND this particular application of pattern matching, and all at the same time to boot? :-)
Python generators, parallel map, distributed map (Score:2)
Ya know, the only people I've ever seen argue for the use of map vs for() are those academia types who never write useful code to begin with.
That and anyone who has ever worked with a framework that implements parallel map (e.g. Grand Central Dispatch [wikipedia.org]) or distributed map (e.g. Google MapReduce [wikipedia.org]).
Why do those [sex analogy] always think that for() loops only have 2-3 lines within them and no real logical operations ?
In a lot of cases, the body of a for loop can be refactored into a separate function. Such loops can be considered as having only a call to that function as the body, in which case map is equivalent.
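And once the body is its own function, going parallel is a one-word change. A sketch in Haskell (processItem is a made-up stand-in for your loop body; uses Control.Parallel.Strategies from the parallel package):
import Control.Parallel.Strategies (parMap, rdeepseq)
-- the "loop body", pulled out into its own function
processItem :: Int -> Int
processItem x = x * x + 1
-- the sequential "for loop"
sequentially :: [Int] -> [Int]
sequentially = map processItem
-- the same loop, with each iteration sparked in parallel
inParallel :: [Int] -> [Int]
inParallel = parMap rdeepseq processItem
main :: IO ()
main = print (inParallel [1 .. 10])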
Sure, map[] may replace first year CS student for() loops, but that's about it.
Tell that to all the generator expressions* found in the real-world Python code that I write for real-world work where I earn my real-world paycheck.
* In P
Re: (Score:2)
Asking for "for loops" will make most functional programmers chuckle. Usually what you want is a fold (or a special fold like a filter or a map). Speaking of parallelization, the semantics of generalized for loops require that each iteration be performed sequentially. What if you want to perform each iteration in parallel?
As for number-parameterized types, I haven't dealt with it myself, but I'll just leave this here: Number-parameterized types by Oleg [psu.edu]
Re: (Score:2)
I think you're behind the curve a bit - some review of the features of functional languages may be in order. Haskell really is fast, from what I've read and seen. Some of the 'program some standard thing in a zillion languages' websites have example Haskell implementations that are pretty performant.
The key to the 'no loops' issue is simple - tail recursion [wikipedia.org]. I quote excerpts:
In computer science, a tail call is a subroutine call that happens inside another procedure and that produces a return value, which is then immediately returned by the calling procedure.
[...]
Tail calls are significant because they can be implemented without adding a new stack frame to the call stack. Most of the frame of the current procedure is not needed any more, and it can be replaced by the frame of the tail call, modified as appropriate. The program can then jump to the called subroutine.
[...]
in functional programming languages, tail call elimination is often guaranteed by the language standard, and this guarantee allows using recursion, in particular tail recursion, in place of loops.
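In practice that looks like this (quick sketch): `go` calls itself in tail position, so GHC compiles each "iteration" to a jump instead of pushing a stack frame.
factorial :: Integer -> Integer
factorial n = go n 1
  where
    go 0 acc = acc
    go k acc = go (k - 1) (k * acc)
main :: IO ()
main = print (factorial 20)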
I personally like Erlang better - it's more of a 'real world' rather than 'ivory tower' language in its approach and I find it easi
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
Haskell [...] is as fast as old hag grandma C and its ugly cellar mutant C++.
As I understand it, purely functional languages use a lot of short-lived immutable objects and therefore generate a lot more garbage than languages that rely on updating objects in place. If your target machine is a handheld device with only 4 MB of RAM, this garbage can mean the difference between your design fitting and not fitting. And for a design on the edge of fitting, this garbage can mean the difference between being able to keep all data in RAM and slowing down to read the flash over and over.
Re: (Score:2)
I wouldn't use Haskell (or Erlang) to write a device driver; I wouldn't use C to write much of anything else. Just as we have automotive vehicles that range from scooters to huge Terex earthmovers, it's important to use the right tool for the job. A friend of mine used to have a Volkswagen bug that had been converted to a pickup truck. It wasn't pretty! :D I wouldn't run a million row relational database on your 4MB device either.
Re: (Score:2)
I like Haskell but it has its warts.
The main problem with Haskell is going "full functional", with monads, etc. Monads are very difficult to understand and master.
Still, I think Haskell is much closer to "the solution" than Lisp, for example. (Or maybe Scala gets better.)
Not to mention it's great to play with Hugs/GHC with its interactive console
Re: (Score:2)
The main problem of Haskell is going "full functional", with monads, etc. Monads are very difficult to understand and master.
Monads are what makes Haskell interesting. Take a look at some of the stuff like the STM implementation from Simon Peyton Jones's group, for example. Functional programming without monads is just imperative programming with a bunch of irritating and pointless constraints.
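For a taste (a minimal sketch against the stm package; the account logic is made up):
import Control.Concurrent.STM
-- `atomically` runs the block as a transaction: if another thread
-- touches these TVars mid-flight, the whole thing retries. No locks.
transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer from to amount = atomically $ do
  balance <- readTVar from
  writeTVar from (balance - amount)
  modifyTVar' to (+ amount)
main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  transfer a b 30
  print =<< atomically ((,) <$> readTVar a <*> readTVar b)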
Re: (Score:1)
Re:That's not the solution, this is (Score:5, Funny)
Re: (Score:2)
Java is the overachieving straight A's wants-to-do-everything crazy girl that can't settle on what she wants to do because she's busy doing it all but not being particularly good at any one thing.
Re: (Score:2)
Re: (Score:2)
What's your plan of attack on GC? Reference counting doesn't pause, but fails if you create cyclic references. Mark-and-sweep doesn't have that problem, but creates the dreaded pause. The state of the art, AFAIK, is to check for recently created objects and kill them early (generational GC). There are heuristics to avoid a full mark-and-sweep, but AFAIK there aren't any airtight algorithms.
Now I wonder, is it possible to do a static analysis on a parse tree for some language and determine whether or not
Re: (Score:2)
Re: (Score:1)
Re: (Score:3)
Re: (Score:2)
Now this is what Slashdot should be. Your answer is a bit clearer to me than the "borrow/own semantics" the other poster described. Since Haskell is "type oriented", would anybody be aware if it does this kind of analysis?
Re: (Score:2)
It took me a bit longer to parse your answer than the one further down.
As long as I've been studying languages this is the first I've heard of the term "single assignment". After looking at the wiki for it, it seems to be nothing more than the consequence of being "purely functional". Maybe that's why I haven't heard it yet, the former being quite common in things I've read. Borrow/own is obvious enough.
I'm not sure if limiting yourself to purely functional programming qualifies as a "cheat" or not. Ever
Re: (Score:2)
The Cruelty of Really Teaching Computer Science (Score:2)
I think the main challenge will be making such a language intuitive
Dijkstra wrote that programming will never be intuitive [wikipedia.org].
Re: (Score:2)
I don't think that's an answer to the same problem. The problem is that it simply isn't possible to make a general purpose processor arbitrarily small due to power dissipation. You can parallelize all you want, you still might not hit the same performance for specific tasks that optimizing the processor architecture itself will. Quite clever if chips customized to particular phones can be cost effective.
Re: (Score:2)
Re: (Score:2)
Actually, no. Dark Silicon is not about having too many cores to effectively use them all at the same time. It's about maintaining a power envelope when your number of transistors is going up.
Currently we lower the voltage when we increase the number of transistors, which keeps power usage and heat generation in check. But we're at the limit of what voltage will work, especially since electron leakage becomes more of a problem the smaller your transistors get. So dark silicon will be necessary to ensure reasonable power usage.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Not required.. (Score:5, Informative)
Re: (Score:2)
The two biggest power draws are the screen and radios. This is what needs to be made more efficient.
With proper GUI design and AMOLED screens, the screen power draw can be drastically reduced but things like the 3G radio drain power like mad if the signal isn't perfectly strong (while 4G radios gobble power under all circumstances).
Re: (Score:3)
Re: (Score:1)
Re: (Score:2)
Well, some companies are working on HUD goggles for personal computers, so I guess that's a step in the right direction, even if it does make you look like a total dork.
Re: (Score:2)
Staring into someone's eyes, you'll be able to see the porn they are viewing reflected off their cornea.
Re: (Score:2)
Re: (Score:2)
Unfortunately, there's a pretty fundamental problem with making more efficient transceivers. They have to operate, by their very nature, at high frequencies. High-frequency signals inevitably draw more current, because they see capacitors as having a low impedance. Basic EE stuff: Z = 1/(jωC). And how do we generate the radio frequency? With a VCO that invariably involves big capacitors (big for an IC, at any rate). Those VCOs typically end up drawing at least 50-60% of your operating current.
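To put numbers on it (a sketch; the 2 GHz carrier and 1 pF capacitance are assumed values, not from any real chip):
import Data.Complex (Complex (..), magnitude)
-- |Z| = 1/(2*pi*f*C): at 2 GHz even a 1 pF capacitance
-- looks like a mere ~80 ohm path, so RF current adds up fast.
capImpedance :: Double -> Double -> Complex Double
capImpedance f c = 1 / (0 :+ (2 * pi * f * c))
main :: IO ()
main = print (magnitude (capImpedance 2.0e9 1.0e-12))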
Another
Re: (Score:2)
Re: (Score:2)
Lots of problems with the laser... how does the phone know where the closest tower is, especially if you turn it off while flying across the country or the world? What happens if something gets in the way of the laser? Can you even use it inside?
A directional, directable antenna might be possible, but it still presents some problems. You'd need a moving part with three axes of motion that can respond as quickly as a person swings a phone from one ear to another, and you'd need the motors behind it to ope
Re: (Score:2)
Re: (Score:2)
I wonder if you could make a high gain steerable antenna to track the dish on the cell tower while you're transmitting?
What you described could be done without physically moving the antennas. I read a paper on it a few years ago (sorry, no link) where some researchers built an antenna on a chip that consisted of hundreds of different physical antennas. By applying the signal to different antennas at different times, a directional beam can be formed, much like with a Yagi. But unlike a Yagi, the beam can be sent in any direction; one just has to alter the timing. I believe it is similar to how modern phased-array radar works.
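The timing math is simple enough. A sketch (the 8 elements and half-wavelength-at-2-GHz spacing are my assumptions):
-- To steer a linear array to angle theta (from broadside), delay
-- element n by n * d * sin(theta) / c so the wavefronts line up.
steeringDelays :: Int -> Double -> Double -> [Double]
steeringDelays count spacing theta =
  [ fromIntegral n * spacing * sin theta / c | n <- [0 .. count - 1] ]
  where
    c = 299792458  -- speed of light, m/s
main :: IO ()
main = mapM_ print (steeringDelays 8 0.075 (pi / 6))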
Re: (Score:2)
Re: (Score:2)
You can't really get around using a VCO. Ceramic resonators lack the accuracy needed for complex modulation. You need the sub-100 ppm accuracy of a crystal, but crystals can't operate in the GHz range. So you use the crystal as a reference for a PLL and multiply the frequency up to the desired level, and VCOs are an integral part of PLLs.
Also, using a ceramic resonator wouldn't allow for "smaller" transceivers. Transceivers are already integrated circuits, with the only external components being a cryst
Re: (Score:1)
And how do we generate the radio frequency? With a VCO that invariably involves big capacitors (big for an IC, at any rate). Those VCOs typically end up drawing at least 50-60% of your operating current.
I can tell you from direct experience that the VCO (voltage-controlled oscillator) used to generate the high-frequency carrier is not the issue. The radio I'm messing with right now, in standby with the xtal and VCO running, draws 200uA; 50uA with just the xtal. The problems with the radios are:
1) The demodulation circuitry is computationally expensive. Though as dies shrink this becomes less of an issue.
2) Transmit power, here you are up against a wall. You need to transmit a high enough power so that th
Re: (Score:2)
What frequency are you working in? Obviously something like AM/FM radio, RFID, or TV will draw low current because the frequency is relatively low. But once you start getting up to the GHz range, there's no way a VCO could draw that little current, unless your entire radio had less than 30 fF of capacitance. Are you sure it's the RF VCO that's running, and not some IF one?
Re: (Score:2)
The capacitor in a VCO is in a tuned circuit so the circulating power can indeed be high but the actual power draw is much much lower. If this was not the case, then the tuned circuit Q would be low leading to high oscillator noise.
Close-in phase noise is actually a significant
Re: (Score:3)
The CPU in a cell phone does not use much power so there is little to gain.
Except when it's running Flash video or similar crap.
Re: (Score:2)
Link to attached Paper about specialized cores... (Score:3, Informative)
http://cseweb.ucsd.edu/users/swanson/papers/Asplos2010CCores.pdf [ucsd.edu]
They call the specialized cores "c-cores" in the paper. I took a quick skim through it. C-cores seem like a bunch of FPGAs: they take stable apps and synthesize them down to FPGA cells, with the OS handling it on the fly. The c-core-to-hardware chain has Verilog and Synopsys in it.
Cool tech. I guess they could add clock gating and all the other things taught in the classroom to further turn off these c-cores when needed.
cheers.
Nice idea... (Score:2)
A couple of thoughts:
1. The common functionalities surely would include OS APIs, as they seem pretty stable. But would they include common applications such as social networking apps, office apps, etc.?
2. If a patch is necessary, then upgrading hardware might be a little tricky. This will become a serious issue with the invasion of malware.
ZX81 logic array 100% used... (Score:1)
The Sinclair ZX81 replaced fourteen of the chips used on the ZX80 with one big programmable logic array chip that was only supposed to have 70% of the gates programmed in it. However, Sinclair used up all the gates on the chip and it ran nice and hot because of that. I suppose that the design could have used two chips instead, leading to lots of dark silicon and a cost implication.
Benchmarks in June. (Score:2)
I realise OpenJDK's is a stack-based VM and Dalvik is register-based. But aren't they essentially mapping virtual machine instructions to hardware instructions? In a rudimentary manner this was tried a decade ago with Java. It was found that general-purpose processors would spank a Java CPU in performance due to the way that a VM would interact with a JIT instead of processing raw instructions.
[Aside - ARM does include instructions for JVM-like code - Jazelle/ThumbEE. Can/does Dalvik even take advantage ?]
Th
Re: (Score:2)
A quick Google search only turns up one serious discussion [mail-archive.com] about the possibility of a ThumbEE - oriented Dalvik. The only reply wasn't very optimistic about it, saying that a 16-cycle mode switch between ThumbEE and regular instructions makes it unlikely to be worth it.
More's the pity. I really think the VM guys and the processor design folks need to get their heads together.
Re: (Score:2)
Cheers. I'm assuming the original instructions were concocted for Sun's proprietary Java ME/SE embedded platforms, none of which, AFAIK, has made it into phoneME or OpenJDK.
Maybe if MIPS had 'won' on phones we'd have greater synergy, e.g. the reverse of NestedVM.
If they can get my phone to last a week or more (Score:2)
Re:If they can get my phone to last a week or more (Score:5, Insightful)
They can, they just don't want to. All they have to do is make it slightly thicker and double the size of the battery.
Heck, I want to see a phone where the battery is the back cover (like the old Nokia dumbphones) and also has a small second battery inside, something that can power the RAM/CPU for 5 minutes.
Then you can just yank the dead battery and plug a new one in /without rebooting/.
It would also allow for multiple battery sizes: Want a slim phone? Ok, use a small battery. Need two weeks of life? use a large battery.
Easy solution.
Re: (Score:2)
This is one of the great mysteries of the phone market, a situation where it seems to my ignorant amateur eyes that they're doing the same thing as the MPAA companies: saying, "No, we don't want your money. Fuck off, customers. Go find someone else to do business with."
Wouldn't a 2011 phone whose battery lasts as long as a 2006 phone sell like hotcakes? Is "slim" really all that "cool?"
Re: (Score:2)
Honestly, yes. Go look on the Maemo forum: there's a "mugen" aftermarket battery with double the capacity of stock. It makes the phone about a quarter-inch thicker, though.
Some people like it; others complain that the phone's /already/ too fat!
Re: (Score:2)
Long ago I saw phones from Philips(!) that they claimed had a battery life measured in months. The main problem with your need is that most people are OK with 5-6 days of battery on a smartphone.
cost inefficient (Score:1)
Drama (Score:1)
Dark Silicon: Luke, I am your father.
asynchronous design ? (Score:1)
The claim is that this is the most power-efficient design route.
The problem is that there just aren't the sophisticated tool sets you need for design and analysis.
Of course, I've never been clear on why you couldn't just use the asynchronous design ideas and substitute very low clock speeds in place of disables or some such thing.
Not a digital designer, so I can't get too far into the details.
But what do you put in a specialized core? (Score:5, Insightful)
Specialized CPU elements have been tried. The track record to date is roughly this:
A lot of things which you might think would help turn out to be a lose. Superscalar machines and optimizing compilers do a good job on inner loops today. (If it's not in an inner loop, it's probably not being executed enough to justify custom hardware.)
Re: (Score:2)
One thing that I think *would* be a win for scientific calculation programs of many sorts:
A program that takes two arrays of doubles,
A1, A2, A3, A4 ... A98, A99, A100 ...
B1, B2, B3, B4 ... B98, B99, B100 ...
and, given the start of A, the start of B, and the number of elements in each, parallelizes the sumproduct
A1*B(N) + A2*B(N-1) + A3*B(N-2) + ... + A(N-1)*B(2) + A(N)*B(1).
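(For concreteness, here's that operation as a sketch in Haskell; a serious version would obviously want unboxed arrays rather than lists:)
-- A1*B(N) + A2*B(N-1) + ... + A(N)*B(1)
sumProduct :: [Double] -> [Double] -> Double
sumProduct as bs = sum (zipWith (*) as (reverse bs))
main :: IO ()
main = print (sumProduct [1, 2, 3] [4, 5, 6])  -- 1*6 + 2*5 + 3*4 = 28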
The reason for this is that many, many differential-equation initial-value problems can be solved exactly using the Parker-Sochacki solution to the Picard itera
Re: (Score:2)
Depending on the precision that you require, you could pack those into SSE registers either 2 doubles at a time, 4 floats at a time, or 8 fixed-point 16-bit integers at a time, and operate on them in parallel. There are some bizarrely cool byte-order-rearranging assembly instructions that you could use to help get the list of B values flipped around; couple that with a few cache hints and I think you could compute that reasonably quickly.
Re: (Score:2)
This reminds me of a funny experience I had in school a long time ago. The school had a brand new 'Harris 220' (formerly Perkin Elmer) timesharing machine. It used a timesliced architecture where every task gets its little slice of the CPU every so many milliseconds. I was helping a math student implement a small program to produce ten values of a particular function - I recall it was a Bessel function but I'm not sure - to 7 or 8 decimal places. The function converges very slowly, so we kept adding ite
Re: (Score:2)
I think he's referring to the x86 task feature, whereby the hardware would actually handle a context-switch instead of having the software set everything up manually. Neither Linux, nor Windows, nor the BSDs (and thus Mac OS X) use this feature, although the very earliest versions of Linux did. It's faster and less error-prone to do it in software.
Call gates are also not used, if they indeed ever really were. Old days: interrupts for system calls. These days: syscall or sysenter instructions, which are
CPU power usage, really? (Score:1)
So here is the power usage breakdown from my Samsung Galaxy S running Froyo:
Display: 89%
Maps: 5%
Wifi: 3%
Cell Standby: 2%
So how is enabling "dark silicon" going to help the power usage on my phone when the display uses the vast majority of the power?
Dark Silicon? (Score:2)
We can't see it, we can only detect it by its power draw, and it makes up 95% of your chips!
Re: (Score:3)
Watch out for programmers (Score:2)
You can't win, because when a performance hacker reads this, he thinks, "Ooh, such waste! I need to parallelize all my stuff to increase utilization. Light 'em up!"
I'm forced to ask (Score:2)
Why do we need ever more powerful phones? I don't think people are going to want to run CFD, protein folding, or SETI@home on their phone.
On the other hand, if the phone consumes 11 times less power, you could go a few months without charging it, which would be good.
Re: (Score:2)
Re: (Score:2)
Ah...okay then how about something that converts brainwaves into phone-charging energy so when people are talking on the phone constantly while they're driving it's being charged. Oh, wait, that would assume that they have brainwaves. My bad.
Some other solutions to dark silicon... (Score:2)
This is definitely an interesting approach they're taking.
In my research group, we're looking at a different tactic called near-threshold computing. Say you have a 32nm device that uses 100W at 1V. If you were to run it at 400mV, it would use about 1W, but logic slows by a factor of 10. Since energy is power times time, that 100X reduction in power at 10X the runtime nets a 10X reduction in energy.
Fast-forward to 11nm, where the transistor density is 10X what it is at 32nm. Nominal voltage won't go down much, so without doing something drast
Offtopic (Score:1)
Novell approach to saving energy. (Score:1)
Re: (Score:3)
From the article, it seems like the processor usage would be transparent such that you don't need to explicitly target each processing element directly.
Re: (Score:2)
Re: (Score:3, Funny)
Why is it dark silicon they fight against? This represents the struggle of the black man to overcome racial prejudice and retake the word "nigger". The parallels are deep, man.
Exactly. Why do you think green olives are in glass jars and black olives are in tin cans? So the black olives can't look out. It's subliminal racism I tell you.