Microsoft Advice Against Nehalem Xeons Snuffed Out
Eukariote writes "In an article outlining hidden strife in the processor world, Andreas Stiller has reported the scoop that Microsoft advised against the use of Intel Nehalem Xeon (Core i7/i5) processors under Windows Server 2008 R2, but was pressured by Intel to refrain from publishing this advisory. The issue concerns a bug causing spurious interrupts that locks up the Hypervisor of Server 2008. Though there is a hotfix, it is unattractive as it disables power savings and turbo boost states. (The original German-language version of the article is also available.)"
VMWare may also be a problem (Score:2, Informative)
I've been experiencing problems with intermittent lockups under VMWare as well, on DL370-G6 boxes. HP has given us BIOS fixes and is even shipping new boxes, but if there's a suspected problem with MS's hypervisor, I wonder if this is the same issue?
Re:What about for Windows 7? (Score:5, Informative)
Re:What about for Windows 7? (Score:4, Informative)
No, this only applies to the Hyper-V component of Server 2008 R2. Normal people do not use Windows Server for "home use/gaming purposes" (cue a dozen replies of people talking about how cool they are because they use pirated copies for said purpose), so it's not a big deal. Also, Core i5/i7 is already a Quad Core; I assume you mean Core 2 Quad.
Please Explain Further (Score:5, Informative)
The summary says that the hotfix disables power savings and turbo boost. But my reading of the MS report is that an affected system has two options, (1) a workaround, and (2) the hotfix. The difference is that the workaround disables advanced power savings and is known to be stable without side effects, but the hotfix actually fixes the problem with the vector table, presumably by following the instructions provided in the Intel advisory note.
Said another way, the hotfix doesn't disable power savings and doesn't disable turbo boost.
I expect that this is another fine example where Slashdot editors misunderstand a situation. Someone prove me wrong.
Re:First Rev of New Architecture (Score:3, Informative)
Re:Please Explain Further (Score:4, Informative)
The Microsoft KB article is quite explicit that the workaround is what disables the sleep states, leading to higher power usage - the hotfix itself does not exhibit this problem.
Re:Please Explain Further (Score:5, Informative)
Your explanation is exactly how I interpreted the KB article. I think Slashdot was going for some sensationalistic journalism. :-)
Taken from TFA:
You can disable the Advanced Configuration and Power Interface (ACPI) C-states by using a BIOS firmware option on the computer. If the firmware does not include this option, a software workaround is available. You can disable the ACPI C2-state and C3-state by setting a registry key. To do this, follow these steps:
1. At a command prompt, run the following command:
reg add HKLM\System\CurrentControlSet\Control\Processor /v Capabilities /t REG_DWORD /d 0x0007c044
2. Restart the computer.
Note The computer idle power consumption will increase significantly if the deeper ACPI C-states (processor idle sleep states) are disabled. Windows Server 2008 R2 uses these deeper C-states on the Xeon 5500 series as a key energy saving feature.
To continue to benefit from these energy saving states, remove this registry key after you install the hotfix that this article describes. To remove this registry key, follow these steps:
1. At a command prompt, run the following command:
reg delete HKLM\System\CurrentControlSet\Control\Processor /v Capabilities /f
2. Restart the computer.
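For anyone scripting this across several hosts, here is a minimal sketch that assembles the two commands quoted above. The function name and the `run` flag are made up for illustration; the key path, value name, and data are copied from the KB text, and nothing here has been tested against a real Hyper-V host.

```python
import subprocess

# Key path, value name and data are taken from the KB text quoted above.
KEY = r"HKLM\System\CurrentControlSet\Control\Processor"


def workaround_command(enable: bool) -> list:
    """Build the argument list for the workaround (hypothetical helper).

    enable=True  -> the 'reg add' command that disables the deeper C-states
    enable=False -> the 'reg delete' command that removes the key again
    """
    if enable:
        return ["reg", "add", KEY, "/v", "Capabilities",
                "/t", "REG_DWORD", "/d", "0x0007c044"]
    return ["reg", "delete", KEY, "/v", "Capabilities", "/f"]


def apply_workaround(enable: bool, run: bool = False) -> str:
    """Return the command line; execute it only if run=True (Windows only)."""
    cmd = workaround_command(enable)
    if run:
        # Assumption: called from an elevated prompt on the affected server.
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    print(apply_workaround(True))   # command to set before the hotfix
    print(apply_workaround(False))  # command to run after installing the hotfix
```

Either way, a restart is still needed afterwards, as step 2 says.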
Actual errata (Score:3, Informative)
From the pdf file linked from the Intel site, I think it's AAK36, as it's the only one that mentions the word "spurious." This has to do with writing to the interrupt vector table when a local interrupt is pending. That doesn't look terribly serious from my perspective. If I'm mistaken and it's a different errata, please reply with the correction.
Re:Actual errata (Score:3, Informative)
AAK36 for the Xeon version. AAN31 is the code for the i7 and i5 version. It's the same errata, just a different code number for different chips.
Re:AMD is looking better and this is the type of s (Score:2, Informative)
Xeon is just a marketing name. The Xeon 3400 are identical with the i5-7xx, i7-8xx CPUs, the Xeon 3500 are identical with the i7-9xx CPUs and the Xeon 5500 CPUs are basically i7-9xx with two QPI Links.
So this issue also affects all i5 and i7 CPUs.
Re:Broken processors (Score:5, Informative)
so much FUD.
#1. MS classified this interrupt as "unreliable" for all previous hypervisors and randomly decided to use it for this version of their hypervisor
#2. ONLY MS uses this interrupt, not vmware or anyone else.
#3. Intel's new Xeons still use less power and outperform AMD and any previous CPUs. It's still the best CPU, even if you use the "workaround"
Re:Actual errata (Score:3, Informative)
I don't think it's either of them. The top one about changing vectors would be unlikely to happen in commercial software like Windows, because they would have handlers installed for all interrupts already.
I think the issue really is the watchdog: MS is using the APIC during C6 state and, as the 119 errata notes, the APIC counter stops during C6 state. So some interrupt that is supposed to fire to reset the watchdog doesn't fire, and thus the watchdog goes off (as indicated by the error code).
So the 119 errata is related only as much as it mentions that the APIC counter doesn't increment during C6 state (which is also probably documented elsewhere).
There really isn't enough info in this article to know for sure what is up. That didn't stop the slashdot editors from going off half-cocked though.
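To make the guess above concrete, here is a toy model of it. This is only a sketch of the hypothesis in this comment, not of how Hyper-V or the APIC actually behave; the function, its parameters, and the tick-based timing are all invented for illustration.

```python
def watchdog_expiry(c_states, timer_period, watchdog_timeout):
    """Return the tick at which the toy watchdog fires, or None if it never does.

    c_states: one entry per wall-clock tick, e.g. "C0" or "C6".
    Assumption (from the comment above): the timer counter only advances
    outside C6, while the watchdog counts every wall-clock tick.
    """
    timer_count = 0   # hypothetical timer counter
    since_reset = 0   # wall-clock ticks since the watchdog was last reset
    for tick, state in enumerate(c_states):
        if state != "C6":
            timer_count += 1          # counter stalls while in C6
        since_reset += 1
        if timer_count >= timer_period:
            timer_count = 0
            since_reset = 0           # timer interrupt resets the watchdog
        elif since_reset >= watchdog_timeout:
            return tick               # watchdog goes off
    return None


if __name__ == "__main__":
    always_awake = ["C0"] * 20
    long_sleep = ["C0", "C0"] + ["C6"] * 10
    print(watchdog_expiry(always_awake, timer_period=3, watchdog_timeout=5))
    print(watchdog_expiry(long_sleep, timer_period=3, watchdog_timeout=5))
```

In this model the watchdog never fires while the CPU stays in C0, but a long enough stretch in C6 stalls the counter and lets the watchdog run out, which is the failure mode I'm speculating about.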
Re:AMD is looking better and this is the type of s (Score:2, Informative)
It's a processor bug exposed by a new hypervisor technique used by MS and nobody else.
I'm not sure why you want to blame this on MS.
Re:Broken processors (Score:5, Informative)
AMD looking better? Bullshit (Score:5, Informative)
AMD has also built parts with equally screwed up timers, particularly TSC clock skew on multi-cores. Timers are just messed up on x86 from either company. This nonsense goes back years. There are now at least four distinct general purpose clock sources that must be present on modern systems: tsc, acpi_pm, hpet and pit (as labeled by the Linux kernel). There will probably be further proliferation in the future, as ALL of the existing timers are inadequate in subtle ways. Implementations from both manufacturers have been plagued with bugs that require nasty work-arounds; google "clocksource tsc unstable", "pm-timer bug" or "athlon x2 tsc" for some examples. This nonsense that Microsoft has stumbled upon is just the latest in a long and colorful history of failure that we'll now have to add to the list.
Computers are supposed to keep time. Today that means high resolution clocks that work correctly regardless of power saving, concurrency, etc. Using these crucial timers is not supposed to cause spurious interrupts, bus contention or other subtle problems. People that must work with this stuff are thoroughly fed up with this ever growing pile of half-baked bullshit.
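If you want to see which of these clock sources your own Linux box offers and which one the kernel picked, a minimal sketch follows. It assumes the usual sysfs location for clocksource0; the helper names are made up, and on a system without that directory it simply returns empty lists.

```python
from pathlib import Path

# Usual sysfs location on Linux; assumed here, and may be absent elsewhere.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksources(text: str) -> list:
    """Split the whitespace-separated names the kernel reports."""
    return text.split()


def read_clocksources(base: Path = CLOCKSOURCE_DIR) -> dict:
    """Read the available and current clock sources, tolerating missing files."""
    result = {}
    for name in ("available_clocksource", "current_clocksource"):
        try:
            result[name] = parse_clocksources((base / name).read_text())
        except OSError:
            result[name] = []   # not Linux, or sysfs not mounted
    return result


if __name__ == "__main__":
    for name, sources in read_clocksources().items():
        print(name, sources)
```

If the current source is not tsc on hardware where you would expect it, the kernel log is the place to look for one of the "unstable" messages mentioned above.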
Re:Broken processors (Score:3, Informative)
The hotfix fixes the problem and allows the use of power saving states.
Done!
Price For Bleeding Edge (Score:1, Informative)
There is a price to pay for being on the "bleeding edge" of technology.
You are essentially being an unpaid BETA tester for Microsoft, Intel, and the makers of whatever other components you happen to be using.
You are paying for the privilege of BETA testing, and since your software comes with NO WARRANTY or FITNESS FOR A PARTICULAR PURPOSE, and contains KNOWN DEFECTS, you should be happy to know your hard work will be used to make other people's lives easier.
KB Link (Score:2, Informative)