HPE Says Firmware Bug Will Brick Some SSDs Starting in October this Year (zdnet.com) 97
An anonymous reader writes: Hewlett Packard Enterprise (HPE) issued a security advisory last week warning customers about a bug in the firmware of some SAS SSDs (Serial-Attached SCSI solid-state drives) that will fail after reaching 40,000 hours of operation -- which is 4 years, 206 days, and 16 hours after the SSD has been put into operation. HPE says that based on when affected SSDs were manufactured and sold, the earliest failures are expected to occur starting in October this year. The company released firmware updates last week to address the issue. HPE warns that if companies fail to install the update, they risk losing both the SSD and the data. "After the SSD failure occurs, neither the SSD nor the data can be recovered," the company explained.
Re: (Score:1)
I'd bet it's a planned failure system that was supposed to have a random cutoff date once it reached 40,000 hours, but they forgot the random part
This is not a repost... (Score:3)
...from the 32,768 power-on hours bug [slashdot.org]?
Re: (Score:2)
They probably farmed out the work to India.
Re: (Score:2)
Well they say 40,000 so maybe it's different. Let's see...
40k hours = 2,400,000 minutes = 144,000,000 seconds. Doesn't seem close to any powers of 2, but they might not be using seconds as their base. They could use any time base really.
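The unit check above can be sketched in a few lines. This is only an illustration: the tick sizes listed are guesses, since the actual time base used by the firmware has not been published, and the helper names are made up for this example.

```python
HOURS = 40_000

# Hypothetical tick sizes in seconds; the real firmware time base is unknown.
TICK_SECONDS = {
    "hours": 3600,
    "half-hours": 1800,
    "ten-minute intervals": 600,
    "minutes": 60,
    "seconds": 1,
}


def nearest_power_of_two(n):
    """Return the power of two closest to n (n >= 1)."""
    lower = 1 << (n.bit_length() - 1)
    upper = lower << 1
    return lower if n - lower <= upper - n else upper


def report(hours=HOURS):
    """Map each unit name to (tick count, nearest power of two, relative gap)."""
    out = {}
    for name, secs in TICK_SECONDS.items():
        ticks = hours * 3600 // secs
        p = nearest_power_of_two(ticks)
        out[name] = (ticks, p, abs(ticks - p) / p)
    return out


for name, (ticks, p, gap) in report().items():
    exponent = p.bit_length() - 1
    print(f"{name:>22}: {ticks:>12,} ticks, nearest 2^{exponent}, off by {gap:.1%}")
```

None of these units lands 40,000 hours on a power of two, which fits the observation that a plain counter overflow is not an obvious explanation here.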
Re: (Score:2, Informative)
(2**32) / (4.57 years) is roughly 30 Hz. The hours might be a rotation count of the spindle, normalized down to some common factor of usual spindle speeds (7200, 5400, etc) . But it may be a coincidence.
The "Power On Hours Count" for SMART is notorious for using a unit size that is manufacturer dependent: "Despite what the name suggests, the raw value of the attribute is stored using all sorts of measurement units (hours, half-hours, or ten-minute intervals to name a few) depending on the manufacturer of th
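For anyone who wants to check the "roughly 30 Hz" figure: 4.57 years is about 40,000 hours, so the question is what tick rate would wrap a 32-bit counter in that time. A minimal sketch, assuming a plain unsigned 32-bit counter (which is a guess, not something HPE has confirmed):

```python
SECONDS_PER_HOUR = 3600


def ticks_per_second(counter_bits, hours):
    """Tick rate at which a counter of the given width wraps after `hours`."""
    return 2 ** counter_bits / (hours * SECONDS_PER_HOUR)


rate = ticks_per_second(32, 40_000)
print(f"{rate:.2f} Hz")  # a little under 30 Hz
```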
Re: (Score:2, Troll)
You could try reading the fucking headline before you post about counting spindle rotations.
Re: (Score:1)
HPE says firmware bug will brick some SSDs starting in October this year
HPE releases firmware patch to prevent some SSDs from failing after reaching 40,000 hours of operation.
Did you read it? Try re-reading it.
The bug is nearly identical to another issue the company disclosed in November 2019, which also permanently crashed HPE SAS SSDs, but after 32,768 hours of operation (3 years, 270 days, and 8 hours). However, today's bug impacts far fewer SAS SSD models than the previous issue.
Just like last year, HPE said it learned of the bug from another SSD manufacturer that uses its products.
Did you read that bit? Try re-reading it.
And finally you can try reading the limited information in the official release notes [hpe.com]
The issue affects SSDs with an HPE firmware version prior to HPD7 that results in SSD failure at 40,000 hours of operation (i.e., 4 years, 206 days 16 hours). Neither the SSD nor the data can be recovered after the SSD failure occurs.
So pray tell, what information did you glean from the headline that satisfactorily explains the mechanism of failure? I'll wait.
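As a side check, the durations quoted from the article and the release notes are consistent with each other if a year is counted as 365 days. A small sketch (the function name is just for illustration):

```python
def breakdown(total_hours):
    """Split an hour count into (years, days, hours) using 365-day years."""
    days, hours = divmod(total_hours, 24)
    years, days = divmod(days, 365)
    return years, days, hours


for h in (40_000, 32_768):
    years, days, hours = breakdown(h)
    print(f"{h:,} hours = {years} years, {days} days, {hours} hours")
```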
Re: (Score:1)
"The hours might be a rotation count of the spindle"
Or not because this is modern times and we're talking about SOLID STATE DRIVES which have no moving parts.
I mean, critical thinking isn't that hard to do if you even just RTFS.
Re:This is not a repost... (Score:5, Insightful)
no, it's more a statement about how bad HP is, these days, in terms of quality and design.
in fact, they don't do work anymore, themselves; they are now mostly a 'labelling' oem, having others do the real work.
when real actual honest HP was still in business, decades ago, they were at the top of their game. now, they are a junk company, not trustable any more than some low end whitebox vendor.
Re: (Score:2)
Re: (Score:2)
The test and measurement division known as Keysight was never closely related to any of the computer-hardware divisions.
And the server-hardware division known as HP Enterprise is actually Compaq, absorbed by merger. It doesn't even share much blood with the desktop PCs division.
Re: (Score:2)
No, but it traces back directly to the roots of HP, which was founded with a test/instrumentation product (an audio oscillator). It deserves (and would honor) the HP name. The computer stuff came much later.
Re: (Score:2)
no, it's more a statement about how bad HP is, these days, in terms of quality and design.
Horrible that this bug exists. However, it seems it was caught and a patch made available many months before a single customer even had the opportunity to be affected. As much as the bug shouldn't exist in the first place, the world would be better off if all bugs were identified and handled in this way.
Re: (Score:2)
Re: (Score:2)
TFA made it clear that this issue is distinct from the 32,768 issue. RTFA!
Re: (Score:1)
All SSDs are buggy.
I think a quarter to half of my consumer grade spinning platter drives fail mechanically before 40,000 run hours.
Re: (Score:2)
Ouch (Score:2)
This requires seriously poor design and QA. There is an initial issue of holding some sort of counter in the wrong integer type (which, by now, should be caught by any mid-experienced developer), but then to let such a counter, once it gives the wrong count, basically destroy the device/data?
And it is not even the first instance, according to the article there was a separate "32768" hours issue that disabled other SSDs disclosed a few months ago (https://www.zdnet.com/article/hpe-tells-users-to-patch-ssds
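To illustrate the "wrong integer type" guess: a power-on-hours counter kept in a signed 16-bit integer goes negative exactly at hour 32,768, which matches the earlier bug's number. This is only a sketch of that hypothesis; the actual cause has not been disclosed, and it does not explain the 40,000-hour figure. Both function names are invented for the example.

```python
import ctypes


def next_hour_int16(hours):
    """Increment an hour counter stored in a signed 16-bit integer (wraps on overflow)."""
    return ctypes.c_int16(hours + 1).value


def next_hour_saturating(hours, limit=2 ** 31 - 1):
    """Safer alternative: hold at the limit instead of wrapping."""
    return min(hours + 1, limit)


print(next_hour_int16(32_767))       # wraps to a negative value
print(next_hour_saturating(32_767))  # keeps counting
```

The second function shows one defensive choice: a wider, saturating counter degrades into a stale value rather than a nonsensical negative one.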
My class revolted when I taught overflow... (Score:4, Informative)
I once taught a class in intermediate to advanced "C" [with a brief introduction to C++], and when I spent a session talking about how the graph of addition overflow looked like a line, and how the graph of multiplication overflow looked like a HYPERBOLA, the class revolted, and went to the dean, and I barely made it to the end of the semester.
Needless to say, I never again taught another class at that institution.
Apparently the high school class known as "Algebra II & Trigonometry" wasn't even required for programming.
When I was a kid, you needed to have three or four semesters of CALCULUS before you could take programming, with at least a semester of multivariable calculus (to include VECTORS & MATRICES) under your belt.
This idea that you can turn a [sui generis] English major into a competent computer programmer is sheer rank insanity.
Re: (Score:3)
Nonsense. I was taught programming right out of high school. Numerical analysis was what required calculus (I forget how much). Nobody ever required multivariable calculus...and I don't even recall a course by that name. (There was multi-variate analysis, and factor analysis.) Vectors and matrices were taught in multiple different courses. If you want to be particularly picky, even in high school one (well, a math major) had to handle determinants, and that's a simplified form of matrices.
And I would b
Re: (Score:1)
Re: (Score:2)
If you want to do it that way, it's a lot easier if you start with a simple assembler and work your way up. Otherwise you're quite likely to miss a lot of the basics. High level languages hide too much that is really basic to understanding what's going on, even if it's a PITA to have to do it all the time. I really prefer languages with garbage collectors...but those languages hide a lot of what's going on.
Re: (Score:2)
Sure, I learned to program on my own as well. But I also took Algebra II and trig. It doesn't so much matter how you acquired the required knowledge, but it does matter that you acquired it somehow.
Re: (Score:2)
Re: (Score:2)
Sure, multi-variable calculus may be a bit much, but anyone interested in computer programming would be well advised to take Algebra II and trig in high school at least. A year of calculus is certainly helpful. Geometry doesn't hurt either. There's no reason you can't get all of those in high school, but if you didn't, you can do those in college.
Re: (Score:1)
Re: (Score:2)
Ever done graphics? Ever needed to determine that if X and Y then Z is always true (or realize that Z is not necessarily true)?
Ever needed to determine where the trade-off between more complex set-up for faster iterations reaches break-even? Prove to at least your own satisfaction that two or more threads won't deadlock?
Re: (Score:2)
Counterexample (Score:2)
One of the best programmers I ever worked with was a philosophy major.
Re: (Score:2)
One of the best programmers I ever worked with was a philosophy major.
There are always gifted individuals that can overcome the lack of formal qualifications. In an engineering field they are very rare though. And most that lack the formal qualifications are not very good and have severe limitations in what they can do and what they understand.
Re: Counterexample (Score:1)
Re: (Score:2)
This idea that you can turn a [sui generis] English major into a competent computer programmer is sheer rank insanity.
Indeed. Coding is as hard as any other hardcore engineering job and most of it is custom design. Not using actual software engineers with at the very least a respective engineering BSc for anything at least somewhat critical is grossly negligent in my opinion. This amateur shit-show has to stop.
Re: (Score:1)
Why would you blame HPE's "SSD firmware department" for this? One of the previous bugs I'm aware of (which was after 1,700 hours of power-on time) was an Intel issue with the S4510 and S4610-series SSDs:
https://downloadcenter.intel.c... [intel.com]
This bug appears to be with SanDisk SSDs, given the model numbers affected. This article:
https://blocksandfiles.com/202... [blocksandfiles.com]
shows a picture of one of the affected models, and on the label you can see "Supplier Model Number: SXKLTK" and "Supplier Part Number: SDLTOCKR-016T-5C1
Re: (Score:2)
HPE is to blame as they re-sold an enterprise product, apparently without running any real tests on it.
Re: (Score:2)
This requires seriously poor design and QA. There is an initial issue of holding some sort of counter on the wrong integer type (which, by now, should be caught by any mid-experienced developer),
Remember that software that Boeing used to kill > 300 people recently? Made on the cheap. This one here just erases critical data and hence is probably made even cheaper. It is high time that not using experienced actual software engineers to write code on this level is classified as gross negligence per default and that the vendor that screwed up is liable for any and all damage done by them not using qualified personnel.
eBay (Score:1)
When planned obsolescence (Score:1)
Who uses these drives? (Score:2)
Serious here: Does HPE sell these SSDs to any OEM (Dell,Apple, etc)? Are they sold solely to corporate accounts or to consumers as well?
If these drives are in consumer machines, I'm not sure there is any way of tracing to original purchasers, let alone whoever was given them as a birthday present, or second-hand sales.
Re: (Score:2)
Re: (Score:2)
They're SAS drives, which wouldn't work in most consumer PCs. The article says that HP doesn't even sell them separately, but that they may be used in some servers.
These are expensive drives, on the order of $2000 for the 800GB version. I'm betting there weren't a lot of them given out as Christmas presents.
800GB == 800TB ??? (Score:2)
Are you talking 800 terabytes, or maybe 800 exabytes?
Re: (Score:2)
Re: 800GB == 800TB ??? (Score:1)
Re: (Score:2)
Serious here: Does HPE sell these SSDs to any OEM (Dell,Apple, etc)? Are they sold solely to corporate accounts or to consumers as well?
If these drives are in consumer machines, I'm not sure there is any way of tracing to original purchasers, let alone whoever was given them as a birthday present, or second-hand sales.
TFS indicates that these are SAS (Serial Attached SCSI) drives, sold by HPE (the 'E' stands for Enterprise), not HP Ink (pun intended, the consumer division). So, it is easy to infer these are Enterprise drives.
AFAIK, Apple has no machines that have native SAS ports (not even the cheesegrater 2.0), so Apple is not an option. Actually, now that I think about it, I am hard pressed to think of consumer machines that support SAS drives right out of the box, so not sold to consumers.
As for Dell, they do have an Enterp
Why the Fed is purchasing (((corporate debt)))... (Score:2, Troll)
I just glanced at the history of it, and Dell issued 46 BILLION DOLLARS IN DEBT to make the acquisition:
https://en.wikipedia.org/wiki/Dell_EMC#History [wikipedia.org]
As of today, the new "Dell Technologies" has a total Market Cap of only 31 Billion:
https://finance.yahoo.com/quote/DELL [yahoo.com]
I guess that's why the Fed is purchasing (((corporate debt))) during the great Corona virus scare-mongering panic of 2020.
Wouldn't
Re: (Score:2)
Not likely. HPE likely bought the drive from an OEM and rebadged it, which is how 99% of the stuff happens. It does mean that the OEM drives might have the same firmware bug, except we're not
Can you solder on a new eeprom? (Score:2)
If you do find yourself to be a victim later, is there any chance of e.g. SMD-soldering on a chip with new firmware?
Re: (Score:1)
Re: (Score:1)
I worked on an SSD product and we kept the firmware in a section of NAND itself and had a mask ROM on the processor that could read the first few "boot blocks" of NAND. Prototype devices had a SMD socket that looked like a coffin. Production devices couldn't be ISP'd, so it was a lot of work with a heat gun to remove them and replace them with a working chip. Unfortunately all your data and configuration was on the old chip.
Planned obsolescence (Score:2)
On a new level.
'40000', or '40960'? (Score:2)
If it makes you feel better (Score:1)
I think 39,768.2157 hours makes more sense, if you're into powers-of-two. That's 17,179,869,184 cycles at 7200 RPM. Probably shifted down by 4 before being logged, and thus filling a 32-bit value.
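The arithmetic here does check out: 17,179,869,184 is exactly 2^34, and at 7200 RPM that many revolutions take about 39,768.2157 hours. A quick sketch, assuming a constant 7200 RPM (hypothetical for an SSD, as others point out). Note that dividing 2^34 by 4, i.e. a right shift by 2 bits, gives 2^32, while a shift by 4 bits would give 2^30; which one the post means is not clear.

```python
RPM = 7200  # assumed spindle speed; SSDs have no spindle, so this is purely illustrative


def hours_for_revolutions(revs, rpm=RPM):
    """Hours needed to accumulate `revs` revolutions at a constant `rpm`."""
    return revs / (rpm * 60)


revs = 17_179_869_184
hours = hours_for_revolutions(revs)
print(f"{revs:,} revolutions = 2^{revs.bit_length() - 1}")
print(f"{hours:.4f} hours at {RPM} RPM")
print(f"divided by 4: 2^{(revs >> 2).bit_length() - 1}")
```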
Re: (Score:1)
I'm assuming that the firmware for SMART logging is the same code base and runs off a related timer.
The article didn't specify where the 40k hours bug came from specifically. I've posted a theory. If you have an alternate theory or are able to disprove my hypothesis, then please share.
Re: (Score:2)
It was really just an idle question. When we're talking about digital electronics I raise an eyebrow when something is i
Re: (Score:1)
You might be right. Although as far as I can tell, the firmware and SMART protocol don't really respect the base-10 of the marketing department. I think the engineering department still tries to pack as much crap into as few bits as possible. I'm a bit biased, being in that department and guilty of packing hardware fields into registers wherever they can fit. (To be fair, we were short on address space for our I/O aperture; it's not like I was trying to save RAM or anything.)
12-bits of 10 hour segments? I don't know.
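For reference, the "12 bits of 10-hour segments" idea works out to 40,960 hours, close to but not exactly the 40,000 in the advisory. A tiny sketch with a made-up helper, purely to show the numbers:

```python
def wrap_hours(bits, hours_per_tick):
    """Hours until an unsigned counter of `bits` width wraps, given the tick size in hours."""
    return (1 << bits) * hours_per_tick


print(wrap_hours(12, 10))  # 12-bit counter, 10-hour ticks
print(wrap_hours(15, 1))   # 15 usable bits, 1-hour ticks (the earlier 32,768-hour bug)
```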
Re: (Score:2)
Re: (Score:1)
I'm probably stretching to make the numbers fit. Maybe a holdover from the HDD era, as SSDs have the same SMART fields as an old drive. So one has to wonder what "spindle start/stop count" and "spin up time" even mean in the context of an SSD.
Re: (Score:2)
It makes sense if it is in the error checking; that would be a human-generated time period.
Usage of SAS-SSDs and mitigations... (Score:3)
Normally, these SAS-SSDs are NOT used in servers, except for very rare cases. More often than not, these are used in Storage Arrays, either as some sort of cache, or tiered storage, or AFA (All Flash Array). While in a PC/Server it may behove you to make the drive read only after the designated number of hours, in an array, the drives will probably be in some sort of RAID configuration (modern ones are 0, 1 or 6; legacy or clueless people still use 5, 10 or 01). In these situations, having a read only drive is as bad as having it fail completely.
Having said that, it is amazingly stupid to brick the drive upon reaching the designated number of safe hours, instead of just throwing an alarm through the standard S.M.A.R.T. mechanism...
And to top it off, this happens twice?! This points to either sheer incompetence of the HPE people in charge of the firmware, or HPE incorrectly using/modifying an Off-The-Shelf firmware (or a combination of both).
I hope that their arrays have adequate mechanisms for updating the firmware of drives one by one without rebuilds, like many array vendors have.
Re: (Score:1)
HP toner (Score:2)
Knowing HP, this was some feature they were working with where they sold a license to use the drives at 3/4 the regular new price, but you had to repurchase a license to keep using them every 40k hours of uptime over and over. But they forgot to disable it.
Sounds about right.
Re: (Score:2)
Knowing HP, this was some feature they were working with where they sold a license to use the drives at 3/4 the regular new price, but you had to repurchase a license to keep using them every 40k hours of uptime over and over. But they forgot to disable it.
Sounds about right.
It is a lot like HP's "limited" capacity ink cartridges.
Arrest programmers (Score:2)
Re: (Score:2)
Programmers need to be held accountable for their bullshit software
Programmers? What about executives at companies that don't want to pay the cost for a complete software development life cycle (as in real design, real code reviews, and real testing)? Maybe we also should arrest people who make purchasing decisions based on cost - If you decide to purchase the $200 drive instead of the $205 one with the only reason being to save $5, you are encouraging companies to cut corners.
Re: (Score:2)
Burn them! Stake them! Quarter them!
Wait, what if it was the PHB who did it?
fwupdmgr on Linux (Score:2)
Re: (Score:2)
Re: (Score:2)
Also you need to have a valid and active support contract for that specific product type to be able to download anything off the support site.
Even if you find a boatload of those SSDs second-hand with less than 40k hours on them - good luck getting the "good" firmware.
Lifetime timer? (Score:4, Interesting)
Re: (Score:3)
Let me get this straight, you got upset because a device that uses a
Re: (Score:2)
Let me get this straight, you got upset because a device that uses a radioactive source with a short half-life (ionization smoke detector) or a chemical detector that slowly degrades (carbon monoxide detector) has a 7 year timer that keeps it from running past the point where the device can fail to detect dangerous levels of smoke or carbon monoxide, so that you may not get a timely alarm and instead die.
Half life of the radioactive source in smoke detectors is well over four hundred years. The CO component does degrade but rendering the whole thing useless because of that is BS.
Re: (Score:2)
Which means after 10 years you've lost 1.6% of the 1/5000th of a gram of americium-241 that is in the smoke detector. How much loss do you think that detector can tolerate and still be sensitive enough to trigger while giving you a reasonable escape time?
"The whole thing"? Then don't buy a combination detector, genius.
But no, you'
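The 1.6% figure can be reproduced with the standard exponential decay formula. A minimal sketch, assuming a half-life of 432.2 years for americium-241 (a commonly quoted value, not a number from this thread):

```python
AM241_HALF_LIFE_YEARS = 432.2  # assumed value; the thread only says "well over four hundred years"


def fraction_decayed(years, half_life=AM241_HALF_LIFE_YEARS):
    """Fraction of the original isotope that has decayed after `years`."""
    return 1 - 0.5 ** (years / half_life)


for years in (7, 10, 60):
    print(f"after {years:>2} years: {fraction_decayed(years):.1%} decayed")
```

This only covers the source itself; it says nothing about dust, corrosion, or aging electronics, which are separate arguments.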
Re: (Score:1)
" How much loss do you think that detector can tolerate and still be sensitive enough to trigger while giving you a reasonable escape time?"
I've got detectors from the 1960s from an old industrial building that still operate very well.
Americium-241 emits primarily alpha radiation, which is stopped by just about anything, even small levels of steam in the air. If it stops emitting (that's unlikely for several generations,) the detector FAILS ON, assuming the battery/power and other circuitry are good. That's
Re: (Score:2)
That's all very nice, and not how ionization smoke detectors work [systemsensor.com]. The alpha radiation ionizes the air, which chemically interacts with the particulates in smoke, which changes the conductivity of the air in comparison to a sealed reference
Re: (Score:1)
An ionization smoke detector uses americium-241 to do more than ionize air. Any detectable difference due to smoke is detected and an alarm is generated. That difference can be caused by interference with a radiation detector or an ionization chamber or collector plate not generating a charge due to radiation being blocked. Several types of detection were in use in the 1960s. I've actually worked in a fire department (Memphis, volunteer squad, primarily worked with them doing inspection of in-building detec
Re: (Score:2)
Nope. Don't care about a claim that's already been shown to be false. You're wasting your time making source-free claims.
Not relevant to the home smoke detectors in use 60 years later.
Your go-to claim is that you've actually worked relevant indus
Re: (Score:2)
Re: (Score:2)
Is there a list of these smoke/carbon monoxide detectors that expire?
Re: (Score:2)
Is there a list of these smoke/carbon dioxide detectors that expire?
All of them expire now because it is required by state level legislation.
Re: (Score:1)
Like the way that at least some of their switches require their SFPs? A handful of years ago I saw that they wanted to charge $2200 or so for an SFP, where I could buy them from ChampionOne for $110. They were all made by Finisar and were identical.
Re: (Score:2)
At least it was not like Dell who kept the ATX power connector on their motherboards but changed the pin assignments so that replacing the Dell "ATX" power supply with a standard ATX power supply destroys the motherboard or worse.
Re: (Score:1)
Oh hey, if we're going to get into connector conspiracy:
The Sun 4/110 had a one-off sw (SCSI Weird) interface that IIRC applied termination power to pin 26, so it was possible to fry non-Sun peripherals unless one put a Sun device first in the chain or carefully cut pin 26 on a cable.
Sony back in the day would equip portable CD players with a DC power plug with tip/ring reversed from normal polarity.
The Credential Exchange (Score:2)
The fundamental law of storage design is that fuck-ups above the lowest layer of media integrity control should not melt the underlying raw data structures.
Having a bug: dime a dozen.
Melting the entire universe: Second career in the Mexican construction trade under an assumed identity.
Re: (Score:2)
These kinds of failures are frequently caused by bugs in planned-obsolescence logic. That is, the drive should go into some kind of fail-safe read-only mode when it hits some number of age-related metrics, and then after some additional time simply refuse to work. But then some bug trips and it goes right from working to refusing to boot far enough to upgrade the firmware.
Great "Enterprise" Hardware! (Score:2)
It seems all it does is being much more expensive. It certainly is not carefully tested...
Drive Vendor (Score:2)
Anybody know who the drive vendor is (I'm pretty sure HP doesn't make them, just rebrands them)?