Intel Bug Hardware

Errata Prompts Intel To Disable TSX In Haswell, Early Broadwell CPUs

Dr. Damage writes: The TSX instructions built into Intel's Haswell CPU cores haven't become widely used by everyday software just yet, but they promise to make certain types of multithreaded applications run much faster than they can today. Some of the savviest software developers are likely building TSX-enabled software right about now. Unfortunately, that work may have to come to a halt, thanks to a bug—or "errata," as Intel prefers to call them—in Haswell's TSX implementation that can cause critical software failures. To work around the problem, Intel will disable TSX via microcode in its current CPUs — and in early Broadwell processors, as well.

  • Can I have a refund? (Score:2, Informative)

    by Anonymous Coward on Tuesday August 12, 2014 @03:23PM (#47657319)

    In some countries I would be entitled to get the product that was advertised or get a refund.

  • by gman003 ( 1693318 ) on Tuesday August 12, 2014 @04:05PM (#47657651)

    I'm sure there are some Opterons laughing right now.

    Yes, but some of them take a while to get the joke because their TLB had to be disabled.

    (Certain releases of the "Barcelona" Opterons had a bug that could lock up the system. A workaround would prevent it, but had a stiff performance penalty. Later steppings had it fixed.)

  • by ShanghaiBill ( 739463 ) on Tuesday August 12, 2014 @04:09PM (#47657671)

    See also Pentium 5 and the FDIV bug. It falls under "too bad, so sad, try your luck with the next revision".

    No. Intel offered to replace any P5 with the FDIV bug upon request. Most customers did not request a replacement, but the option was available.

  • by Anonymous Coward on Tuesday August 12, 2014 @04:35PM (#47657903)

    Wikipedia has very detailed information on Intel processors. This page [wikipedia.org] does not list TSX for your processor and does list it for others.

    Most Linux distros automatically handle Intel microcode patches (which I assume is how this erratum will be handled). See Debian wiki [debian.org] or Arch wiki [archlinux.org] for details.

  • by CajunArson ( 465943 ) on Tuesday August 12, 2014 @05:43PM (#47658467) Journal

    You can still "play with this instruction" all you want.

    What happened here is that a third-party developer managed to uncover a corner case where certain interactions with TSX can lead to instability. In order to be safe, Intel acknowledged the bug (a refreshing response) and is now giving you the OPTION to disable TSX if you feel that it could impair the stability of a production load.

    So basically: Go ahead and play with TSX all you want, but be aware of the errata and that it's theoretically possible to hang your machine in some corner cases.

  • by EvilJoker ( 192907 ) on Tuesday August 12, 2014 @05:58PM (#47658587)

    I know this was a troll, but I feel compelled to reply in case someone doesn't know.

    ALL CPUs have errata. Some are more significant than others.

    A quick Google for "AMD errata" turned up the Revision Guide for AMD Family 16h Models 00h-0Fh [amd.com], published June 2013, which applies to AMD's mobile A, E, and G series and the Opteron X1100/X2100. (These are modern CPUs.)

    There are 21 entries, each with a description, system impact, and suggested workaround (if any).

    Haswell's errata [intel.com] has 131 entries.

  • by Anonymous Coward on Tuesday August 12, 2014 @05:59PM (#47658589)

    See also Pentium 5 and the FDIV bug. It falls under "too bad, so sad, try your luck with the next revision".

    No. Intel offered to replace any P5 with the FDIV bug upon request. Most customers did not request a replacement, but the option was available.

    Not at first they didn't.

    My friend was doing his master's on neural networks (?) at the time and some of his algorithms were giving back hinky results, especially when he compared them to some of the SPARC systems.

    He had to actually provide documentation that it affected him, and I think sign an NDA, before Intel would give him anything. He jumped through their hoops to get a replacement, and then the very next week Intel announced their carte blanche replacement program.

    It took much screaming in the industry before Intel became "generous".

  • by Anonymous Coward on Tuesday August 12, 2014 @08:10PM (#47659359)

    Huh? TSX shipped with Xeon-E3 v3 CPUs. I bought one LAST YEAR so I could play around with TSX.

    Note the RTM at the end of the flags. That signals support for the new TSX instructions. RTM means "Restricted Transactional Memory", as opposed to the other half of TSX, HLE, which is a backwards compatible change in semantics.

    $ cat /proc/cpuinfo | head -n25
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 60
    model name : Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
    stepping : 3
    microcode : 0x10
    cpu MHz : 800.000
    cache size : 8192 KB
    physical id : 0
    siblings : 8
    core id : 0
    cpu cores : 4
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 13
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
    bogomips : 6585.24
    clflush size : 64
    cache_alignment : 64
    address sizes : 39 bits physical, 48 bits virtual
    power management:
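
The check described above (looking for `hle` and `rtm` in the flags line) can be automated. The following is a minimal sketch, not a definitive tool: the function names `cpu_flags` and `tsx_support` and the abbreviated `sample` text are made up for illustration, and it assumes the Linux `/proc/cpuinfo` layout shown in the listing.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def tsx_support(cpuinfo_text):
    """Report whether the HLE and RTM halves of TSX are advertised."""
    flags = cpu_flags(cpuinfo_text)
    return {"hle": "hle" in flags, "rtm": "rtm" in flags}


# Abbreviated, hypothetical sample in the same layout as the listing above.
sample = """\
model name : Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
microcode : 0x10
flags : fpu vme sse2 avx2 bmi1 hle smep bmi2 erms invpcid rtm
"""

print(tsx_support(sample))
```

On a Linux machine the same functions can be fed `open("/proc/cpuinfo").read()` instead of the sample string. Whether the flags disappear once a TSX-disabling microcode update is loaded is not stated in the thread, so the sketch only reports what the kernel currently advertises.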

  • by enriquevagu ( 1026480 ) on Tuesday August 12, 2014 @08:28PM (#47659461)

    This is a real pity for the TM community. This is not the first chip with transactional memory support in hardware: The Sun Rock [wikipedia.org] was announced to have hardware TM support, and the IBM Blue Gene/Q Compute chip [wikipedia.org] also supports it. Unlike other proposals for unbounded transactional memory [berkeley.edu], all these systems employ Hybrid Transactional Memory (ref [cs.sfu.ca], ref [unine.ch], ref [auckland.ac.nz]), in which restricted hardware transactions are designed to correctly coexist with unbounded software transactions, so a software transaction can be started in case a hardware transaction fails for some unavoidable issue (such as lack of cache size or associativity to hold speculative data from the transaction, not because of a conflict). Note that, in any case, very large transactions should arguably be very uncommon, since they would significantly reduce performance (similar to very large critical sections protected by locks).

    The problem with the hardware implementation of transactional memory is that it is not simply a new set of instructions independent from the rest of the processor. HTM involves multiple aspects, including multiversion caching for speculative data; allowing for the commit of speculative (transactional) instructions, which could be later rolled back (note that in any other speculative operation such as instructions after branch prediction, the speculation is always resolved before the instruction commits because the branch commits earlier); a tight integration with the coherence protocol (see LogTM-SE [wisc.edu] for an alternative to this very last issue, but still...); a mechanism to support atomic commits in the presence of coherence invalidations... From the point of view of processor verification, this is a complete nightmare because these new "extensions" basically impact the complete processor pipeline and coherence protocol, and verifying that every single instruction and data structure behaves as expected in isolation does not guarantee that they will operate correctly in the presence of multiple transactions (and non-transactional conflicting code) in multiple cores. There are some formal studies such as this [nyu.edu] or this [cs.sfu.ca], and the IBM people discuss the verification of their Blue Gene TM system in this paper [acm.org] (paywalled).

    As some others commented before, the nature of the "bug" has not been disclosed. However, since it seems to be easy to reproduce systematically, I would expect it to be related to incorrect speculative data handling in a single transaction (or something similar), rather than races between multiple transactions.

    Regarding the alternatives, Intel cannot simply remove these instruction opcodes, because existing code would fail. I assume that the patch will make all hardware transactions fail on startup, with a specific error (EAX bit 1 indicates if the transaction can succeed on a retry; setting this flag to 0 should trigger a software transaction). In that case, execution continues at the fallback routine indicated in the XBEGIN instruction, which should begin a software transaction. Effectively, this will be similar to a software TM (STM) with additional overheads (starting the hardware transaction and aborting it; detecting conflicts with nonexistent hardware transactions) that would make it slower than a pure STM implementation.
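
The retry-or-fall-back decision described above can be modelled in a few lines. This is a simulation under stated assumptions, not real TSX code: the bit values follow the `_XABORT_*` constants in Intel's `immintrin.h`, while `run_hybrid`, `try_hardware`, `software_fallback`, and `always_abort` are hypothetical names, and the "abort with status 0" behaviour is the commenter's guess about the patch rather than documented fact.

```python
# Abort-status bits reported in EAX, named after Intel's _XABORT_* constants.
XABORT_EXPLICIT = 1 << 0  # aborted by an explicit XABORT
XABORT_RETRY = 1 << 1     # the transaction may succeed on a retry
XABORT_CONFLICT = 1 << 2  # memory conflict with another thread
XABORT_CAPACITY = 1 << 3  # speculative state did not fit in the cache


def run_hybrid(try_hardware, software_fallback, max_retries=3):
    """Try a hardware transaction, falling back to software if needed.

    try_hardware is a hypothetical stand-in for an XBEGIN..XEND region:
    it returns None when the transaction commits, or an abort status.
    Returns which path completed the work and how many hardware
    attempts were made.
    """
    attempts = 0
    while attempts < max_retries:
        attempts += 1
        status = try_hardware()
        if status is None:
            return ("hardware", attempts)
        if not (status & XABORT_RETRY):
            break  # retry bit clear: retrying cannot help
    software_fallback()
    return ("software", attempts)


def always_abort():
    """Models the guessed post-patch behaviour: abort, retry bit clear."""
    return 0


print(run_hybrid(always_abort, lambda: None))
```

With every hardware attempt aborting and the retry bit clear, each operation pays for one failed hardware start before reaching the software path, which is the extra overhead the comment refers to.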

  • by Sun ( 104778 ) on Tuesday August 12, 2014 @11:56PM (#47660269) Homepage

    I have a friend who came to me, eyes all glowing, about this new feature his shining new CPU has. I listened in and was skeptical.

    He then tried, for over a month, to get this feature to produce better results than traditional synchronization methods. This included a lot of dead ends due to simple misunderstandings (try to debug your transaction by adding prints: no good - a system call is guaranteed to cancel the transaction).

    We had, for example, a hard time getting proper benchmarks for the feature. Most actual use cases include a relatively low contention rate. Producing a benchmark that will have low contention on the one hand, but allow you to actually test how efficient a synchronized algorithm is on the other is not an easy task.

    After a lot of going back and forth, as well as some nagging to people at Intel (who, surprisingly, answered him), he came to the following conclusion (shared with others):
    Many times a traditional mutex will, actually, be faster. Other times, it might be possible to gain a few extra nanoseconds using transactions, but the speed difference is, by no means, mind blowing. Either way, the amount you pay in code complexity (i.e. bugs) and reduced abstraction hardly seems worth it.

    At least as it is implemented right now (but I, personally, fail to see how this changes in the future. Then again, I have been known to miss things in the past), the speed difference isn't going to be mind blowing.

    Shachar

  • by rrohbeck ( 944847 ) on Wednesday August 13, 2014 @04:06AM (#47661127)

    Singular: Erratum
    Plural: Errata

  • by TheRaven64 ( 641858 ) on Wednesday August 13, 2014 @05:02AM (#47661311) Journal

    It depends a lot on the data structures. There were a number of papers using TSX at EuroSys this year. The main conclusion was that TSX lets you get similar performance from simple approaches as you can get already from complex approaches. For example, you can protect a long linked list in a single lock and use HLE to get a big speedup with lots of concurrent insertions and accesses, but you can achieve similar performance with a fine-grained locking scheme. There was a nice paper about Cuckoo hashing where they initially found that TSX gave them a performance win, but then were able to get a similar speedup without it.

    The big win with TSX is that it's pretty easy to reason about coarse-grained locking and much harder to reason about fine-grained locking. If you can make coarse-grained locking almost as fast as fine-grained, then that's a huge saving on testing and debugging time.
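
To make the coarse-grained versus fine-grained trade-off concrete, here is a small sketch of the two locking styles on a bucketed counter table. It only illustrates the structure: Python cannot use HLE, and its global interpreter lock means no speedup should be expected from the striped version. The class names `CoarseTable` and `StripedTable` and the `hammer` helper are invented for this example.

```python
import threading


class CoarseTable:
    """One lock guards every bucket: easy to reason about."""

    def __init__(self, buckets=16):
        self._lock = threading.Lock()
        self._buckets = [dict() for _ in range(buckets)]

    def add(self, key, amount=1):
        with self._lock:
            bucket = self._buckets[hash(key) % len(self._buckets)]
            bucket[key] = bucket.get(key, 0) + amount

    def total(self):
        with self._lock:
            return sum(sum(b.values()) for b in self._buckets)


class StripedTable:
    """One lock per bucket: more concurrency, more invariants to check."""

    def __init__(self, buckets=16):
        self._locks = [threading.Lock() for _ in range(buckets)]
        self._buckets = [dict() for _ in range(buckets)]

    def add(self, key, amount=1):
        i = hash(key) % len(self._buckets)
        with self._locks[i]:
            bucket = self._buckets[i]
            bucket[key] = bucket.get(key, 0) + amount

    def total(self):
        # Take every lock in a fixed order so that concurrent callers
        # cannot deadlock and the sum is a consistent snapshot.
        for lock in self._locks:
            lock.acquire()
        try:
            return sum(sum(b.values()) for b in self._buckets)
        finally:
            for lock in self._locks:
                lock.release()


def hammer(table, threads=4, per_thread=1000):
    """Run concurrent increments and return the table's final total."""
    def worker(tid):
        for n in range(per_thread):
            table.add((tid + n) % 32)

    pool = [threading.Thread(target=worker, args=(t,)) for t in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return table.total()


print(hammer(CoarseTable()), hammer(StripedTable()))
```

Even at this size, the striped version needs a lock-ordering rule in `total()` that the coarse version does not, which is the kind of extra reasoning the comment says lock elision could let you avoid.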
