Hyperthreading Hurts Server Performance?
sebFlyte writes "ZDNet is reporting that enabling Intel's Hyperthreading Technology on your servers could lead to markedly decreased performance, according to some developers who have been looking into problems that have been occurring since HT began shipping enabled by default. One MS developer from the SQL Server team put it simply: 'Our customers observed very interesting behaviour on high-end HT-enabled hardware. They noticed that in some cases when high load is applied SQL Server CPU usage increases significantly but SQL Server performance degrades.' Another developer, this time from Citrix, was just as blunt. 'It's ironic. Intel had sold hyperthreading as something that gave performance gains to heavily threaded software. SQL Server is very thread-intensive, but it suffers. In fact, I've never seen performance improvement on server software with hyperthreading enabled. We recommend customers disable it.'"
This is news? (Score:5, Informative)
Re:This is news? (Score:5, Interesting)
In fact, turning it off results in a 20+ percent increase in query time, especially with multiple full-text queries.
Of course, differently written queries and different systems/SQL engines might behave differently.
In fact I am so happy with HT that I am going to switch my desktop to one, as it is a Linux machine with lots of apps running at the same time. Not to mention that it is also a development station with SQL+Apache, which has benefited from HT in my experience.
(Well, it is time to upgrade anyway, and I chose HT over non-HT.)
Re:This is news? (Score:3, Interesting)
Experience here with the servers I deal with, running a Linux 2.6 kernel/Apache/MySQL on dual Xeons with up to 6GB, is that turning HT on reduces performance. When a CPU fan failed and one CPU had to be temporarily removed, however, there was a clear benefit to turning it on with t
Re:This is news? (Score:3, Interesting)
I was talking about a single proc and HT;
I imagine that with dual + HT it is different. I do not see why it is happening. Actually, if I bought an expensive server and experienced that, I might try to get some official explanation for the problem.
I wonder, if you tried BSD or Windows on the same or similar hardware, whether it might be some OS-specific problem as well.
Hmm, Google on it I will.
Re:This is news? (Score:2, Interesting)
The above (as you actually imply) is about single-socket uniproc DESKTOP systems, not the servers (generally 2, 4, or more sockets) and server apps we're talking about.
As a matter of fact, both Intel's current dual-socket, dual-core CPU (Paxville DP) and the follow-on dual-core Demp
Re:This is news? (Score:5, Interesting)
People also have to trouble themselves to configure things properly, which isn't obvious or the default. HT pretends to Windows that it's another processor, but as you know it isn't. So you have to set SQL Server's '# of processors for parallel processing' setting to the number of real processors, not virtual ones. We changed ours to this spec and performance went up markedly. SQL Server defaults to the number of procs Windows reports and tries to run a full CPU's worth of load on the HT. Not gonna happen.
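As an illustration of the physical-vs-logical distinction, here is a minimal C sketch (assuming a GCC/Clang toolchain on x86; CPUID leaf 1 sets the HTT flag and reports logical processors per package):

    #include <stdio.h>
    #include <cpuid.h>  /* GCC/Clang wrapper for the CPUID instruction */

    int main(void) {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 1: EDX bit 28 is the HTT flag; EBX bits 23:16
           give the number of logical processors per physical package. */
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            fprintf(stderr, "CPUID leaf 1 not supported\n");
            return 1;
        }
        int htt = (edx >> 28) & 1;
        int logical_per_package = (ebx >> 16) & 0xff;

        printf("HTT: %d, logical CPUs per package: %d\n",
               htt, logical_per_package);
        /* On a P4-era Xeon with HT enabled this reports 2 logical CPUs
           per package, so a parallelism knob should be set to half of
           the processor count the OS reports. */
        return 0;
    }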
Re:This is news? (Score:2)
Re:This is news? (Score:3, Interesting)
The only situation I can imagine is if SQL Server spawns, say, 2 threads per CPU for performance. But this is a cheating way to get more CPU time, and I wouldn't expect a _server_ class program to do such a thing when such a program would tend to expect it's getting a dedicated CPU anyway.
wrong on both counts (Score:5, Informative)
Running 2 threads per CPU is not cheating. It's normal to run 1 thread per CPU plus 1 thread per concurrent blocking IO operation. That could come out to be 2 threads per CPU.
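A minimal sketch of that sizing rule, assuming POSIX; the blocking-I/O estimate is a hypothetical, application-specific parameter:

    #include <unistd.h>

    /* One compute thread per CPU plus one thread per concurrently
       blocking I/O operation. The I/O count is the application's own
       guess; the OS only reports the CPU count. */
    int worker_pool_size(int expected_blocking_io) {
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
        if (ncpu < 1)
            ncpu = 1;  /* sysconf can fail; fall back conservatively */
        return (int)ncpu + expected_blocking_io;
    }

With expected_blocking_io equal to the CPU count, that comes out to exactly 2 threads per CPU.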
Re:This is news? (Score:2)
Re:This is news? (Score:3, Informative)
Re:This is news? (Score:2)
Nobody (who knew what they were talking about) ever said HT always gave a speed improvement - but databases generally do benefit from it. It would be interesting to do a rigorous analysis of the writer's situation. Since Hyperthreading is hardly "new" - Intel has been shipping it on desktop and server chips for about 3 years (as the post suggests) - one wonders what else the writer is clueless about.
It's all in the name (Score:3, Insightful)
HyperThreading might help poorly written thread management (independent audio and video subsystems for example), but not true multithreading, that's for sure.
The code wasn't changed (Score:5, Informative)
Re:The code wasn't changed (Score:5, Insightful)
Re:The code wasn't changed (Score:4, Insightful)
If you want to maximize performance then you want the compiler to know as much as possible about the architecture. If you have no cache then loop unrolling is a good thing; if you have a small cache then loop unrolling can bust the cache. If you are doing large matrix manipulations, how you choose to stride the matrix, and possibly pad it, depends exactly on the size of the cache. Now, it may be that having the applications programmer worry about it is too much to ask, but the compiler most certainly needs to worry about such detail.
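For instance, a classic cache-blocked (tiled) matrix transpose; this is a generic sketch, and the tile size is an assumed value that the compiler or programmer would tune to the actual cache:

    /* Walk the matrix in BLK x BLK tiles so the source rows and the
       destination columns both stay resident in cache. BLK = 64 is a
       guess; the right value depends on the cache being targeted. */
    #define BLK 64

    void transpose_blocked(int n, const double *a, double *b) {
        for (int ii = 0; ii < n; ii += BLK)
            for (int jj = 0; jj < n; jj += BLK)
                for (int i = ii; i < ii + BLK && i < n; i++)
                    for (int j = jj; j < jj + BLK && j < n; j++)
                        b[(long)j * n + i] = a[(long)i * n + j];
    }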
Re:The code wasn't changed (Score:4, Insightful)
Re:The code wasn't changed (Score:2)
Re:The code wasn't changed (Score:2)
Re:The code wasn't changed (Score:2)
Actually, they are threads, but they aren't necessarily threads from the same process. Each running process has at least one thread, after all; otherwise there wouldn't be anything running.
Remember that a thread is just a list of instructions to be executed sequentially. Everything running in a computer is
Re:The code wasn't changed (Score:2)
Yes, but the idea of HT is to run threads from the same process on the logical cores. Otherwise you run into exactly the cache problem that is being discussed.
While the scheduler can run different processes on the logical cores, if it does performance will suffer compared to a non-HT system. The way to get any
Re:The code wasn't changed (Score:5, Insightful)
So MS had to make a choice: ship a binary optimized for every possible mix of hw (the processor being the most important factor, but not the only one), which is impossible, or ship images compatible with any recent x86 processor/hw... without being specially optimised for any. That's why hyperthreading performance suffers.
This is an important problem on Windows because most of the time you cannot simply recompile the un-optimised software to suit your hardware, as you can in Linux, etc.
(sorry for my bad English)
Re:The code wasn't changed (Score:5, Insightful)
Lots of programs are designed with the multiple thread model in mind. Programs should not be designed with the multiple thread model plus cache limitations in mind.
Re:The code wasn't changed (Score:5, Informative)
Re:The code wasn't changed (Score:3, Interesting)
If people want multiple threads of execution on the same processor then they should get one with two cores.
If you read the article/summary you'd see what it's talking about is servers that come with HT enabled by default. Thinking off the top of my head, I can't come up with a single Intel processor still being sold and used in servers today that doesn't have HT technology built in. We're not talking about people specifically buying HT processors looking to get a performance boost, we're talking a
Re:The code wasn't changed (Score:2)
And, due to enormous MS dominance, for P4 HT processors as well.
Re:The code wasn't changed (Score:2)
Which is probably why MS is so gung-ho about machine-independent bytecode (.NET) and JIT compiling these days...
Unfortunately you pay huge costs in startup time and memory usage for that.
Re:The code wasn't changed (Score:5, Insightful)
For an application like SQL Server, I'd have to disagree. Are you saying there's no one on the MSSQL team who looks at cache usage? I'd hope there were a lot of resources devoted to some fairly in-depth analysis of how the code performs on different CPUs. After all, after correctness, performance is how SQL Server is going to be judged (and criticised).
Given that a while back I watched a PDC presentation by Raymond Chen on how to avoid page faults etc in your Windows application (improving start-up times, etc), I'd say that Microsoft are no strangers to performance monitoring and analysis.
For your average Windows desktop app, then yes, worrying about cache usage on HT CPUs is way over the top. For something like SQL Server? Hell, no.
Poor mans dual-core (Score:5, Interesting)
The question I find more interesting: what is the performance gap between dual CPU and dual-core?
Re:Poor mans dual-core (Score:2, Informative)
Re:Poor mans dual-core (Score:2)
Re:Poor mans dual-core (Score:5, Insightful)
It's the usual answer: it depends.
We have to get rid of the notion that there is one overall system architecture that is "right" for all computing needs.
For general, every-day desktop use, there should be little difference between a dual CPU SMP box and a dual core box.
I have a small cluster consisting of AMD 64 X2 nodes, and the nodes use the FC4 SMP kernel just fine. All scheduling between CPU's is handled by the OS, and MPI/PVM apps run just as expected when using the configurations suggested for SMP nodes.
In fact, with the dual-channel memory model, dual-core AMD systems might be a little better than generic dual CPU, since each processor has its "own" memory.
Re:Poor mans dual-core (Score:3, Informative)
Nope, both cores use the same bridge to access central memory, so that point is moot. On the other hand, the cores of an Athlon X2 get to talk with one another through a special link, while regular multiprocessor setups have to use the FSB (or HyperTransport for AMD's Opterons) link, and therefore have to compete with every other device using said FSB/HT (on
Re:Poor mans dual-core (Score:5, Informative)
As far as I know, all multi-CPU AMD packages use exactly the same method to talk amongst themselves: HyperTransport. They absolutely use a private, dedicated HT bus between cores. I *think* that when you run two single-core Opterons, each has a link to main memory, and they also share a direct link. In the case of a 4-die system, I think the third and fourth CPUs 'piggyback' on the 1st and 2nd... they talk to processors 1 and 2, and each other. Processors 1 and 2 do main-memory fetches on their behalf. Each CPU has its own dedicated cache, and I think the cache ends up being semi-unified... so that if something is in processor 2's cache, when processor 4 requests the data, it comes from processor 2 instead of main memory. That's not quite as fast as direct cache, but it's a LOT faster than the DRAM.
The X2 architecture is like half of a 4-way system. There's one link to main memory, and one internal link between the two CPUs... the second one is piggybacking, just like processors 3 and 4 do in the 4-way system. It's not quite as good as a dedicated bus per processor, but the AMD architecture isn't that bandwidth-starved, and a 1GHz HT link is usually fine for keeping two processors fed. You do lose a little performance, but not that much.
Intel dual cores share a single 800MHz bus, with no special link between the chips. And the Netburst architecture is extremely memory-bandwidth hungry. Because of its enormous pipeline, a branch mispredict/pipeline stall hurts terribly. The RAM needs to be very, very fast to refill the pipeline and get the processor moving again.
So running two Netburst processors down a single, already-starved memory bus is just Not a Good Idea. It's a crummy, slapped-together answer to the much, much better design of the AMD chips. It's a desperate solution to avoid the worst of all possible fates... not being in a high-end market segment at all.
Next year this could all be different again, but at the moment, AMD chips, particularly dual core, are a lot better from nearly every standpoint.
Re:Poor mans dual-core (Score:5, Informative)
Multi-core Opterons have a special internal crossbar switch that allows the cores to share the memory controller and HT links; they do not 'piggyback' on one another. This reduces latencies and increases bandwidth for communication between the two cores, and gives both cores equal-opportunity access to the HT ports and the CPU's local RAM. With a NUMA-enabled OS, applications will run off the CPU's local RAM whenever possible to minimize bus contention, and this allows Opteron servers' overall bandwidth and processing power to scale up almost linearly with the number of CPUs.
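For what it's worth, here is a small user-space sketch of 'running off the CPU's local RAM', assuming Linux with libnuma installed (link with -lnuma):

    #include <numa.h>   /* libnuma */
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "kernel has no NUMA support\n");
            return 1;
        }
        /* Allocate on the node the calling thread runs on -- the same
           placement a NUMA-aware scheduler aims for automatically,
           keeping this CPU's traffic off the other CPU's controller. */
        size_t len = 64UL * 1024 * 1024;
        void *buf = numa_alloc_local(len);
        if (buf == NULL)
            return 1;
        /* ... work on buf from threads pinned to this node ... */
        numa_free(buf, len);
        return 0;
    }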
As for Intel's dual-cores, the P4 makes sub-optimal use of its very limited available bandwidth. Turning HT on in a quad-core setup where the FSB is already dry on bandwidth naturally only makes things worse by increasing bus contention. Netburst was a good idea but it was poorly executed and the shared FSB very much killed any potential for scalability. If Intel gave the P4 an integrated RAM controller and a true dual-core CPU (two cores connected through a crossbar switch to shared memory and bus controllers like AMD did for the X2s), things would look much better. I'm not buying Intel again until Intel gets this obvious bit of common sense. The CPU is the largest RAM bandwidth consumer in a system, it should have the most direct RAM access possible. Having to fill pipelines and hide latencies with distant RAM wastes many resources and a fair amount of performance - and to make this bad problem worse, Intel is doing this on a shared bus. Things will get a little better with the upcoming dual-bus chipsets with quad-channel FBDIMM but this will still put a hard limit on practical scalability thanks to the non-scalable RAM bandwidth.
On modern high-performance CPUs, shared buses kill scalability. AMD moved towards independent CPU buses with the K7 and integrated RAM controllers with the K8 to swerve around the scalability brick wall Intel was about to crash into many years ago and has kept on ramming ever since. Right now, Intel's future dual-FSB chipset is nothing more than Intel finally catching up with last millennium's dual-processor K7 platforms, only with bigger bandwidth figures.
Re:Poor mans dual-core (Score:5, Informative)
This is why SMP makers are going nuts over the Opteron. Your effective memory bandwidth scales linearly with the number of processors, assuming your processes partition nicely.
Dual Core performance... (Score:3, Interesting)
Is it two complete cores? Front-side bus speed? Memory speed? etc.
The IBM 970MP that Apple is using for the dual-core PowerMacs was designed right. And due to the cache snooping (among other things), a dual-core 970MP can be slightly faster than a dual-processor setup at the same clock and bus speeds.
Another multicore chip to look at for being done right is the Sun UltraSPARC T1 processor. Up to 8 cores with 4 threads per core. Sun's threading model in this processor d
Re:Dual Core performance... (Score:4, Insightful)
Part of the reason for this is that desktop CPUs mostly run desktop apps and most desktop apps are single-threaded so Intel and AMD could not afford to give up on single-threaded performance. This forced them to add heaps of logic to extract parallelism and Intel made many (IMO dumb) decisions in the process. The SPARC stuff is used for scientific apps which have a long history of multi-threading and distributed computing so Sun does not have to worry about single-threaded performance, allowing for much simpler, leaner and more efficient designs.
Where I think Netburst is particularly bad is the execution engine... when I read Intel's improved hyper-threading patent, I was struck with disbelief: the execution pipelines are wrapped in a replay queue that blindly re-executes uOPs until they successfully execute and retire. Each instruction that fails to retire on the first pass enters the queue and has its execution latency increased by a dozen cycles until its next replay. Once the queue is full, no more uOPs can be issued, so the CPU wastes power and cycles re-executing stale uOPs until they retire, causing execution to stall on all threads. Prescott added independent replay queues for each thread so one single thread would never be able to stall the whole CPU by filling the queue... this could have helped Northwood quite a bit, but Prescott's extra latency killed any direct gains from it. Intel should roll back to the Northwood pipeline and re-apply the good Prescott stuff like the dedicated integer multiplier and barrel shifter, HT2, SSE3 and a few other things; no miracle, but it would be much better than the current Prescotts, though it certainly would not help the saturated FSB issue.
With a pure TLP-oriented CPU, there is no need for deep out-of-order execution, no need for branch prediction and no need for speculative execution. Going for TLP throughput allows the CPU to freeze threads whenever there is no nearby code that can execute deterministically instead of doing desperate deep searches, guesses and speculative execution: more likely than not, the other threads will have enough ready-and-able uOPs to fill the gaps and keep all execution units busy producing useful results on nearly every tick. Stick those SPARC chips on a P4-style shared FSB/RAM platform and they would still choke about as bad as P4s do.
The P4's greatest Achilles' heel is the shared FSB... it was not an issue back when Netburst was running at sub-2GHz speeds, but it clearly is not suitable for multi-threading multi-core multi-processor setups. The shared FSB is clearly taking the 'r' out of Netburst. The single-threaded obsession is also costing AMD and Intel a lot of potential performance while adding complexity and power.
Re:Poor mans dual-core (Score:2, Interesting)
It's been that way since day one, desktop as well. (Score:2, Insightful)
Re:It's been that way since day one, desktop as we (Score:5, Interesting)
HT is just another chip technology like any other. It is only in the rarest circumstances that a new technology will be better/faster for everything. These things all have tradeoffs and the question is whether the benefits are enough to exceed the disadvantages.
I really think you are being a little unfair to Intel. If you had evidence that it decreased performance for most systems even when the software was compiled taking HT into account, then you might have a point. However, as it is, this is no different than IBM touting its RISC technology or AMD talking about their SIMD capabilities. For each of these technologies you could find some code which would actually run slower. If you happen to be running code which makes heavy use of some hardware-optimized string instructions, a RISC system can actually make things worse, not to mention a whole other host of issues. The SIMD capabilities of most x86 processors required switching the FPU state, which took time as well.
It's only reasonable that companies want to publicize their newest fancy technology, and they are hardly unsavory because they don't put the potential disadvantages centrally in their advertisements/PR material. When you go on a first date, do you tell the girl about your loud snoring, how you cheated on your ex, or other bad qualities about yourself? Of course not; one doesn't lie about these things, but it is only natural to want to put the best face forward, and it seems ridiculous to hold Intel to a higher standard than an individual in these matters.
Behold! (Score:5, Funny)
Oh, wait...
sort of obvious (Score:5, Informative)
The first tests on Linux when Hyperthreading came out were also pretty discouraging.
Re:sort of obvious (Score:2)
Re:sort of obvious (Score:2)
Altered data is written back to disk pretty quickly but left in cache as long as possible, for obvious reasons. Clearing stuff out of cache is basically a process of deciding which data pages have overstayed their welcome. I/O does not take place.
The whole idea of this is that a SW/HW stop should not cause data loss. All updates are written to a separate audit device as well.
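As a rough illustration of that 'overstayed their welcome' decision, here is a minimal clock-sweep sketch; this is one common policy, assumed here for illustration, and real engines are far more elaborate:

    /* Each buffer slot has a reference bit, set on every access. The
       sweep hand clears bits as it passes; a slot caught with its bit
       already clear is the victim. No I/O happens here -- dirty pages
       were flushed earlier, so eviction is pure bookkeeping. */
    typedef struct {
        int valid;       /* slot holds a cached page */
        int referenced;  /* touched since the hand last passed */
    } buf_slot;

    /* Assumes at least one valid slot exists, otherwise this spins. */
    int clock_evict(buf_slot *slots, int nslots, int *hand) {
        for (;;) {
            buf_slot *s = &slots[*hand];
            int idx = *hand;
            *hand = (*hand + 1) % nslots;
            if (!s->valid)
                continue;
            if (s->referenced) {   /* give it a second chance */
                s->referenced = 0;
                continue;
            }
            s->valid = 0;          /* evict: page leaves the cache */
            return idx;
        }
    }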
Re:sort of obvious (Score:3, Insightful)
Re:sort of obvious (Score:3, Informative)
In hyperthreading, one thread simply stops contending for functional units for tens of cycles while it waits for the cache to be filled from DRAM, letting the other, already loaded and running thread max out its ALU/FPU usage. This is much finer granularity: the OS doesn't force a swap
Marketing ploy. (Score:2, Insightful)
Re:Marketing ploy. (Score:2)
I doubt many desktop apps use lots of CPU running lots of similar threads like SQLServer does (and other high-load applications like MySQL and Apache that also do not perform as well with HT turned on).
In an advert, the bing-bong-bung-bong jingle takes longer than any explanation anyway - you surely didn't e
Figures (Score:5, Interesting)
Re:Figures (Score:3, Informative)
Well, AFAIK, the HTT thing only allows the processor to sort of split execution units (FPU, ALU, etc.) so that one can work on one thread, the other on another one. If an application leans heavily on one of those units -- and my somewhat uninformed feeling is that software like SQL probably works mostly on the ALU -- it can't possibly GAIN performance. On the other hand, I can see the effort of trying to pigeonhole the idle threads onto the wrong execution unit (will it even try that?) completely borkin
How is this news? (Score:4, Informative)
The real question is whether this issue can be optimized for. If developers design their code with HT in mind, will this still be a problem, since the other thread may belong to another process, or would properly optimized code be able to deal with this?
Most importantly, is this a rare effect or a common one? Would it be rare or common if you optimized your programs for an HT machine?
Comment removed (Score:5, Funny)
Re:So, what do we call this? (Score:2, Interesting)
HT problems with firebird database (slowdowns) (Score:2, Informative)
One possible solution (code patch)
http://sourceforge.net/mailarchive/message.php?msg_id=12403341 [sourceforge.net]
Other threads with hyperthreading problems (slowdowns)
http://sourceforge.net/search/?forum_id=6330&group_id=9028&words=hyperthreading&type_of_search=mlists [sourceforge.net]
Windows problem? (Score:3, Insightful)
Another question one needs to ask is, how is performance on single and dual CPU systems? Getting good performance on a dual CPU HT system (which means four logical CPUs) is more complicated and thus requires more sophisticated algorithms in the scheduler.
Applications are most likely not to blame for the decreased performance. Such hardware differences should be dealt with by the kernel. Occasionally the scheduler should keep one thread idle whenever that leads to the best performance. Only when there is a performance benefit should both threads be used at the same time.
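Until schedulers do that everywhere, an administrator or application can approximate the 'keep one sibling idle' policy by hand. A sketch assuming Linux/glibc; the sibling numbering below (CPUs 0 and 2 being the first logical CPU of each core) is an assumption that must be checked against /proc/cpuinfo on the actual machine:

    #define _GNU_SOURCE
    #include <sched.h>

    /* Restrict the calling process to one logical CPU per physical
       core, leaving the HT siblings idle. The CPU numbers are assumed;
       verify the "physical id"/"core id" pairs in /proc/cpuinfo. */
    int pin_one_sibling_per_core(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);   /* assumed: first sibling of core 0 */
        CPU_SET(2, &set);   /* assumed: first sibling of core 1 */
        return sched_setaffinity(0, sizeof(set), &set);  /* 0 = self */
    }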
HT on Linux (Score:4, Informative)
In a nutshell:
- hyperthreading decreases syscall speed by a few percent
- on single-threaded workloads, the effect is often negligible, with occasional large improvements or degradations
- on multithreaded workloads, around 30% improvement is common
- Linux 2.5 (which introduced HT-awareness) performs significantly better than Linux 2.4
So, from that benchmark (and others like it, just STFW) it appears that HT offers significant benefits; you need multithreading to take advantage of it, and having a HT-aware OS helps.
Re:Windows problem? (Score:2)
Time to Buy AMD? (Score:5, Insightful)
I second the person who said programmers shouldn't be writing code to the cache size on a processor. How well your code fits in cache is not something you can control at run time. Different releases of the CPU often have different cache sizes. And frankly, developers should always try to achieve tight, efficient code, not develop to a particular cache size.
Run-Time Cache Size Optimization (Score:2)
You most certainly can, and the speed gains can be significant. One way to do it:
- write a version of your code optimized for 256 KB cache
- write a version of your code optimized for 512 KB cache
Use the contents of
I'm sure there are better ways, but this is just proof that it's possible. Whether or not
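A minimal proof-of-concept of that dispatch, assuming Linux/glibc (which exposes the detected cache size through sysconf); the two variants are hypothetical stand-ins for the differently tuned builds:

    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical stand-ins: imagine each holds the same algorithm
       tuned (tile sizes, unroll factors) for the named L2 size. */
    static void work_256k(void) { puts("256 KB variant"); }
    static void work_512k(void) { puts("512 KB variant"); }

    int main(void) {
        /* glibc reports the detected L2 size; on failure (-1 or 0)
           we fall back to the conservative small-cache variant. */
        long l2 = sysconf(_SC_LEVEL2_CACHE_SIZE);
        if (l2 >= 512 * 1024)
            work_512k();
        else
            work_256k();
        return 0;
    }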
HT kills my ATI All in Wonder (Score:5, Interesting)
The 9800 sits on my XP box, which rarely gets rebooted. Games, browsing, etc. My Mac mini and Linux boxes sit in their places with a KVM
Well, after using the tuner part, it looks great with my digital cable. But the box would lock, and I couldn't kill the process of the ATI software MMC. Sometimes a few times an hour, at least once a day. I was on the point of sticking an old Hauppauge in there. Or using another MMC.
After much digging I found a thread on how HT could cause issues with the software. I disabled it in the BIOS (I don't really need it for anything) and ran the tuner 48 hours solid without a lockup.
Now perhaps ATI is at fault for the software, but then again HT caused the incompatibility in my book.
Puto
Re:HT kills my ATI All in Wonder (Score:5, Funny)
Re:HT kills my ATI All in Wonder (Score:5, Funny)
I left it on for 48 hours unattended.
Puto
Re:HT kills my ATI All in Wonder (Score:2)
This is slashdot after all, must have been a Dr Who marathon.
Re:HT kills my ATI All in Wonder (Score:2)
*salivate
Re:HT kills my ATI All in Wonder (Score:3, Informative)
Not a developer from Citrix (Score:2, Informative)
Linux Server? (Score:2)
shared cache versus local memory (Score:2, Informative)
Of course it can hurt performance (Score:2, Informative)
Hyperthreading works best with "bad" code (Score:5, Insightful)
The reason why hyperthreading was introduced in the first place was to reduce the "idle" time of the processor. The Pentium 4 class processors have an extremely long pipeline, and this often leads to pipeline stalls. E.g. the processing of an instruction cannot proceed because it depends on the result of a previous instruction. The idea of hyperthreading is that whenever there is a potential pipeline stall, the processor switches to the other thread, which hopefully can continue its execution because it isn't stalled by some dependency. Now most pipeline stalls occur when the code being executed isn't optimized for Pentium 4 class processors. But the better optimized for the Pentium 4 your code is, the fewer pipeline stalls you have, and the better your CPU utilisation is with a single thread.
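To make the stall point concrete, a small illustrative example (not from the article): the first loop is one long dependency chain, which leaves a deep pipeline mostly idle and gives a second HT thread room; the second already keeps the pipeline busy on its own, so HT has little left to add:

    /* One serial dependency chain: every addition waits on the
       previous sum, so execution slots go empty -- exactly the slack
       hyperthreading was designed to soak up. */
    double sum_chained(const double *x, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += x[i];
        return s;
    }

    /* Two independent accumulators: the programmer/compiler has
       already filled the stall slots, leaving less for a second
       hardware thread to exploit. */
    double sum_unrolled(const double *x, int n) {
        double s0 = 0.0, s1 = 0.0;
        int i;
        for (i = 0; i + 1 < n; i += 2) {
            s0 += x[i];
            s1 += x[i + 1];
        }
        if (i < n)      /* odd-length tail */
            s0 += x[i];
        return s0 + s1;
    }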
Marcel
Not Intel's fault; Microsoft's fault. c.f. Linux. (Score:5, Interesting)
That being said, I infer from the article that Windows does not do any such priority fairness checking. Consider the example they gave in the article. The DB is running, and then some disk-cache cleaner process comes along and competes for CPU cache. If the OS were SMART, it would recognize that the system task is of a MUCH lower priority and either not run it or only run it for a small portion of the time.
As said by others commenting on this article, the complainers are being stupid for two reasons. One, Intel already admitted that there are lots of cases where HT can hurt performance, so shut up. And two, there are ways to ameliorate the problem in the OS, but since Windows isn't doing it, they should be complaining to Microsoft, not misdirecting the blame at Intel, so shut up.
(Note that I don't like Intel too terribly much either. Hey, we all hate Microsoft, but when someone is an idiot and blames them for something they're not responsible for, it doesn't help anyone.)
Offloading the responsibility to the OS (Score:3, Interesting)
Benchmark your own application (Score:2, Insightful)
Mod Parent UP! (Score:3, Informative)
Breaking the EULA ? (Score:5, Funny)
Interesting Technical Analysis on the subject (Score:5, Informative)
Inconclusive on Linux? (Score:3, Interesting)
I decided to check out how PostgreSQL did with HT.
The first link (1 [postgresql.org]) was suggesting to someone--who was having performance problems under FreeBSD--to turn off HT. Of course, that may not be related to PostgreSQL itself, but rather FreeBSD. I really don't know.
The next thing I found showed some mixed results with ext2 under Linux (2 [osdl.org]). Some things showed gains with HT, but not others.
Another link (3 [codecomments.com]) commented that HT with Java requires special consideration when coding.
I didn't come up with anything useful under PostgreSQL, so I checked out Linux.
According to Linux Electrons [linuxelectrons.com], Linux performance can drop without proper setup.
That's not all (Score:2, Interesting)
My understanding is it's this way with Cubase as well.
Who buys Intel chips anymore anyway? (Score:2)
Intel's Hyperthreading vs Sun's Chip Mulithreading (Score:2, Interesting)
The Fix is not in The Software (Score:3, Interesting)
Software shouldn't be expected to handle hardware quirks. It's up to the hardware to run the software efficiently.
Seems to me a hardware fix would be to partition the cache into two pieces when HT is enabled and running -- use the whole cache for the processor otherwise.
With 2MB caches per processor now becoming available, would this be such a bad thing? IIRC, once you're up to 256KB of cache you've already got a hit rate near 90%. That severely limits your possible hit-rate improvement to less than 10 percentage points regardless of how much more cache you add. And yes, I am aware that increasing the processor multiplier makes every cache miss proportionally worse, but still, having HT run more efficiently in the bargain could make this tradeoff worth it. And that's even before you consider uneven partitioning, if the OS can determine that one thread needs more cache than the other.
I'd like to move all our servers to dual-core Opt. (Score:3, Interesting)
Re:I'd like to move all our servers to dual-core O (Score:3, Insightful)
That's not a hard sell. If you're doing number crunching of any kind in a professional setting, an AMD X2 or Opteron will pay for itself quickly.
Oh, that and you're not funding the never-ending chain of stupidity that is the P4 design team
Tom
Re:The developers are not smart enough! (Score:2)
Probably the developers of SQL Server didn't understand how to get the best from a hyperthreading architecture. There's a big difference between 'real' threads and 'pseudo' (time-sliced) threads. I'm betting it's the software that's at fault here and not Intel's architecture.
Maybe start by betting your Karma, Mr. AC?
AFAIK, Intel HT is intended to improve the perceived performance for users, e.g. in front of a GUI, by reducing response time. There has to be a catch, because if it was so easy, everyone would hav
Re:The developers are not smart enough! (Score:2)
Given that MS SQL isn't exactly a rare piece of software, what fraction of software will actually take advantage of hyperthreading? It's sort of the Itanium argument all over again: who cares how wonderful the architecture is if no software is able to use it well? If I was building server software, my primary performance metrics would be si
Re:The developers are not smart enough! (Score:2)
Are you high, or are you just in the habit of randomly making up nonsensical stuff? While we're at it, which morons modded that post to +4 Insightful? Do you really think that Microsoft would design and target their database server platform for use in only single-CPU servers? Database applications are always proces
Re:The developers are not smart enough! (Score:3, Insightful)
Are you high, or are you just in the habit of randomly making up nonsensical stuff?
No, not high. Just willing to take pro-M$ flame bait today.
I guess I overestimated the intelligence of the /. readership, especially those from the PC world.
The fact is, if you are writing software to be efficient on a single processor, the architecture of the software will be much different than if you know you have 32 processors. And neither design is best for the other case.
For single processor speed you don't want the overhead
Re:The developers are not smart enough! (Score:2)
lol, with an outrageous claim like this for dedicated server software, you really need to provide an unbiased source.
Re:The developers are not smart enough! (Score:2)
Where did you get this wallop of information? It is not true; MS SQL Server performs very well in multiprocessor environments (not using Hyperthreading). Check out the TPC benchmarks if you don't believe me: http://www.tpc.org/ [tpc.org]
Wow, this post sure attracted a lot of flame bait from M$ 'n FUD crew.
Read the original post: "and likely largely tested in a single processor system".
I don't think Microsoft gave its developers the $5.8M USD machine in the #4 www.tpc.org spot, which you can't even buy yet, to develop MS
Re:Should we turn it off in PCs? (Score:2)
Usually not, no.
Best would - of course - be to perform your own test, but enabling HT on desktops usually improves the multi-app flow and reduces the cases of boxes "locking" with one application eating all the resources.
Re:Should we turn it off in PCs? (Score:2)
Re:Should we turn it off in PCs? (Score:3, Insightful)
Along with the rest of the machine.
Re:"High End"! - LOL @ U (Score:2)
Sun equipment is a bad joke compared to IBM iron. Some banks and big firms have been using the same software for decades; once you get something debugged to the point that it never crashes, and your needs don't vary too much (finance is a pretty well-understood field), you just want it to work. Period.
-Z
You on crack? (Score:2)
So all these Xeons around the farm are laptop CPUs or something?
Re:HyperThreading is not for servers (Score:2)
Re:HyperThreading is not for servers (Score:2)
Anyone have any links to any test reports?
Not terribly scientific, but when I ran seti 3.x on my HT box with Linux I got the following results:
One seti instance at a time ran about 4 hours per unit, for 6 units per day.
Two seti instances at a time ran about 5.2 hours per unit, for 9.2 units per day.
So I ran 2 seti instances to get the throughput, as I was after the work unit count.
Re:You'd think... (Score:3, Informative)
The P4 can issue 3 uOPs per cycle but IIRC only from one thread. They alternate [stalls free up slots though]. Also the decoder can only decode ONE x86 opcode per cycle. Then you have expensive memory ports. Fetches to L2 [or alternatively to system memory] are done one at a time per thread [with deep queues which were lengthened in the Prescott core].
Asi
Re:You'd think... (Score:3, Informative)
If you're running tasks that share [e.g. write to] the same small pocket of memory you're right. However, many tasks don't do that. Often a server will spawn an entire new thread [e.g. unique stack and heap objects] to handle connections.
It also makes sense in the desktop
Re:even dual core hurts (Score:3, Informative)
So if one core requests cache line $x and the other core has it the data will be sent over the internal HT link and not even hit the memory bus. The memory controller is pipelined [I suspect] so even while the L2 fulfillment is going on the memory bus can be busy fetching another cache line.
The HT cores have *one* L2 cache per physical core [so for instance, a