New Seagate Shingled Hard Drive Teardown
New submitter Peter Desnoyers writes: Shingled Magnetic Recording (SMR) drives are starting to hit the market, promising larger drives without heroic (and expensive) measures such as helium fill, but at a cost — data can no longer be over-written in place, requiring SSD-like algorithms to handle random writes.
At the USENIX File and Storage Technologies conference in February, researchers from Northeastern University (disclaimer — I'm one of them) dissected shingled drive performance both figuratively and literally, using both micro-benchmarks and a window cut in the drive to uncover the secrets of Seagate's first line of publicly-available SMR drives.
TL;DR: It's a pretty good desktop drive — with write cache enabled (the default for non-server setups) and an intermittent workload it performs quite well, handling bursts of random writes (up to a few tens of GB total) far faster than a conventional drive — but only if it has long powered-on idle periods for garbage collection. Reads and large writes run at about the same speed as on a conventional drive, and at $280 it costs less than a pair of decent 4TB drives. For heavily-loaded server applications, though, you might want to wait for the next generation. Here are a couple videos (in 16x slow motion) showing the drive in action — sequential read after deliberately fragmenting the drive, and a few thousand random writes.
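For readers new to SMR, here is a minimal sketch (a toy Python model with made-up band and cache sizes, not Seagate's actual firmware design) of why random writes need SSD-like handling: they land in a persistent cache region and only get merged back into the shingled bands later, by rewriting each affected band in full during garbage collection.

    BAND_SIZE = 256          # blocks per shingled band (illustrative)
    CACHE_LIMIT = 1024       # blocks the persistent cache can hold (illustrative)

    class ToySMR:
        def __init__(self, num_bands):
            self.bands = [[None] * BAND_SIZE for _ in range(num_bands)]
            self.cache = {}  # lba -> data; the drive's "media cache" region

        def write(self, lba, data):
            # Random writes never update a band in place; they go to the cache.
            self.cache[lba] = data
            if len(self.cache) >= CACHE_LIMIT:
                self.garbage_collect()

        def read(self, lba):
            if lba in self.cache:                    # newest copy wins
                return self.cache[lba]
            band, offset = divmod(lba, BAND_SIZE)
            return self.bands[band][offset]

        def garbage_collect(self):
            # Read-modify-write every band that has dirty blocks in the cache.
            # This is the slow, idle-time work the drive needs in order to stay fast.
            dirty_bands = {lba // BAND_SIZE for lba in self.cache}
            for band in dirty_bands:
                contents = list(self.bands[band])            # read the whole band
                for offset in range(BAND_SIZE):
                    lba = band * BAND_SIZE + offset
                    if lba in self.cache:
                        contents[offset] = self.cache.pop(lba)
                self.bands[band] = contents                  # rewrite the whole band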
Interesting idea, nasty downsides (Score:2)
Re:Interesting idea, nasty downsides (Score:4, Insightful)
Oh look, here are some SSD-optimised file systems already. Incidentally they apply to these drives rather well.
Re: (Score:2)
Re: (Score:3, Insightful)
Why not just go full SSD?
For much the same reason we still use tape. Sometimes read/write speed isn't nearly as important as $/TB.
Re: (Score:2)
This. Even auditors have stopped blinking at me when I say "No tapes, we just have another data center like this one and a big ol pipe and XYZ data backup solution attached to the disks at the other end."
When auditors stop blinking, you know it's hit mainstream.
Min
Re:Interesting idea, nasty downsides (Score:5, Insightful)
"No tapes, we just have another data center like this one and a big ol pipe and XYZ data backup solution attached to the disks at the other end."
So, you're not protected against malicious destruction of data? Pretty sure that requires an air gap.
Re: (Score:2)
RAID drives are transparently removable and can be placed in a bank vault.
Re: (Score:2)
I somewhat agree, because "transparent" isn't exactly true. Popping a drive and putting a blank one in means a resync spike as the new one is filled. With multi-TB drives, that resync time isn't exactly trivial, and a drive failure during resync is certainly a possibility, as there is more activity.
Going with something that can handle multiple failures is helpful, such as RAID6, but you're still increasing risk just by performing your backup operation. In fact, part of the reason RAID6 exists is that there
Re: (Score:2)
You can do 3 drives per mirror to considerably reduce your risk. Even when you pull a drive for the vault, you still have a mirrored volume.
Re: (Score:2)
Depends on your risk scenario planning. But yes, it does. A full rundown of our data integrity program would exceed the tl;dr scope on Slashdot, as well as violating NDAs :).
In general though I'd point out that disk based vaulting technologies have advanced considerably in the last few years and if I were providing advice to someone I'd point out that there are cloud based solutions which are write-only type solutions if your risk tolerance permits the use of third parties to store your data (e.g. CrashPl
Re: (Score:1)
Security. A data center that uses SANs and devices like Avamars is just asking for an intruder to log on and purge their devices of all data. All it takes is something like blowing the disk images (target LUNs) away, or random dd-ing garbage onto their sectors.
With tape, a bad guy will have to have physical access to grab a batch, stuff it in the tape silo, then tell the silo to sequentially erase every single unit there.
Problem with SANs/NASs and backups on hard drives is simple: They are great
Re: (Score:3)
Re:Interesting idea, nasty downsides (Score:5, Interesting)
Re: Interesting idea, nasty downsides (Score:2)
I would say 98%
I have never heard of one who doesn't. One client still uses punch cards. If it ain't broke, don't fix it!
Re:Interesting idea, nasty downsides (Score:5, Interesting)
Who still uses tape? Seriously, no data-centric company on the planet still uses tape; it's easier and cheaper to throw a bunch of large drives and a big fat pipe at offsite storage than deal with a tape robot.
People still using tape are doing so because they haven't moved on and like pain or are just ignorant of the alternatives.
Google, probably the most "data centric" company on earth, that's who!
http://highscalability.com/blo... [highscalability.com]
http://www.theregister.co.uk/2... [theregister.co.uk]
Re: (Score:1)
As someone whose job it is to support the data storage needs of some of the _largest_ data-centric companies on the planet, I assure you that they *all* use tape and that you have *no* idea what the fuck you are talking about.
But this is Slashdot, so what did I expect?
Because you're an idiot? (Score:4, Informative)
Find me an 8TB SSD that is even within spitting distance (hell, within ICBM distance) of $300 and you win the prize; otherwise the suggestion is useless. Hint: Not today, not next year. Possibly this decade. The cost has come down a ton, but it was absolutely astronomical before.
The systems I'm buying now will be obsolete by the time SSD can even think about touching hard drives in terms of capacity per $. Typically, the ONLY reason to go full SSD now for large storage capacities is because you absolutely need the performance and are willing to pay essentially "whatever it costs" (at least 8x+ the price) because it's that important to get the IOPS. Maybe by the end of next year we'll get it down to "only" 4x the price (not counting that though because price per GB for large capacity hard drives still continues to fall, balancing out a part of the cost reduction in SSDs).
Re: (Score:2)
The systems I'm buying now will be obsolete by the time SSD can even think about touching hard drives in terms of capacity per $. Typically, the ONLY reason to go full SSD now for large storage capacities is because you absolutely need the performance and are willing to pay essentially "whatever it costs" (at least 8x+ the price) because it's that important to get the IOPS. Maybe by the end of next year we'll get it down to "only" 4x the price (not counting that though because price per GB for large capacity hard drives still continues to fall, balancing out a part of the cost reduction in SSDs).
You're right, but also somewhat wrong.
"Raw capacity" is indeed quite expensive. However, the increased speeds of flash have made it possible to provide in-line data reduction services. Data reduction is a widely used term for two techniques: de-duplication and compression. This works on a block level. When the host writes a block, the system will look in its table to see if that block has already been written. If so, it will simply write a pointer. If not, it will be forwarded to the compression engine a
Re: Because you're an idiot? (Score:2)
Compressing data for important backups is not a good idea. A one-bit error can turn entire files, or worse, into garbage.
Re: (Score:2)
If you block-level deduplicate a file on an HDD, and even a small fraction of the blocks from an otherwise sequential file are replaced by pointers, you have completely destroyed read performance for that file. Block-level deduplication is not a viable technology on hard drives except for very specific use patterns.
In contrast, almost no workloads suffer when doing block-level deduplication on an SSD.
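A quick back-of-the-envelope calculation (assumed round-number drive figures, not measurements) shows why chasing those pointers hurts a hard drive so much more than an SSD:

    BLOCK = 4 * 1024        # 4 KiB block
    HDD_SEQ = 150e6         # ~150 MB/s sequential HDD throughput (assumed)
    HDD_SEEK = 10e-3        # ~10 ms per random HDD seek (assumed)
    SSD_SEQ = 500e6         # ~500 MB/s sequential SSD throughput (assumed)
    SSD_RAND = 100e-6       # ~0.1 ms per random 4 KiB SSD read (assumed)

    # Penalty = time to fetch one deduplicated (non-contiguous) block
    #           versus reading it as part of a sequential stream.
    hdd_penalty = HDD_SEEK / (BLOCK / HDD_SEQ)   # roughly 370x slower
    ssd_penalty = SSD_RAND / (BLOCK / SSD_SEQ)   # roughly 12x, largely hidden by parallelism
    print(f"HDD: ~{hdd_penalty:.0f}x slower, SSD: ~{ssd_penalty:.0f}x slower")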
Re: (Score:2)
There are also few workloads where block-level dedupe is any better than block-level compression. Most people don't store the same data over and over (yes, disk images of virtual machines, but even there the images diverge soon after deployment).
With block-level compression I get ~30% for what is basically 'random' data (user home directories and medical imaging data, 100TB). Block-level deduplication would only give me ~15%.
Compression does internal de-duplication already (one of the easiest ways o
Re: (Score:2)
Re: (Score:2)
I think that will be available in 6 years.
Bear in mind though that this SMR drive is an exception. The other manufacturers are still producing 4-6 TB drives for that price.
Re: (Score:2)
Well, $2000 gets you spinning rust 8TB HDDs from the other guys with fancy helium and no SMR right now. You can easily buy 4x 1TB SSDs for that price.
But you can bet even those hard drives will probably drop to $300 range so
Re: (Score:2)
The prices haven't really come down since the 2011 Thailand floods, and we lost Samsung as a competitor.
8 TB might be the last (premium) drive of the spinners.
Re: (Score:2)
The demand won't stop, we'll always need more cheap storage and until we can make chips as fast and cheap as we can coat a piece of metal with magnets, we'll have spinning rust. The spinning rust will get heavier and slower though as we are nearing the limits of what is physically possible but that's another decade or 2 away from now.
Eventually SSD will outperform hard drives in cost on other levels (energy usage, price per cubic feet) just as hard drives did to tape and their robots (robots are too expensi
Re: (Score:2)
Though, one of the ones I got then, had the mSATA already holding a 20GB SSD, set up as a cache for the slow spinner. It runs su
Mismatch (Score:2)
SSD for boot/OS/swap, and slow spinner for data gives 99% of the performance for 99% of people.
That would be great except 99% of people don't want more than one disk.
Hell, *I* don't want more than one disk, and I can ably manage them. But there's no way I can afford the SSD it would require to store everything I have (never mind the backups).
Re: (Score:2)
Danger (Score:2)
99% of people don't care. I set up my mother's computer with the OS on C and everything else
She sure will care if there's ever a problem (and there will be a problem) with one of those drives.
As a rule of thumb it's way better not to double someone's possible failure rates if they don't know themselves how to recover from it...
I've spent my life helping people get set up technically so they never need to talk to me again - at least not about their systems. It creates a lot less work for yourself, unschedule
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
I haven't been brave enough to trust the SSD with my swap file, but am getting there. In part because the OS gets badly swapped out when you work with very large files -- stupid design. If it didn't, I wouldn't bother.
Similarly, I will be moving browser cache directories pretty soon. Opera (v12) seems to get particularly sluggish when things are all s
Re: (Score:2)
Re: (Score:2)
BTW, I moved my Opera browser cache to the SSD a couple of hours ago. Insert VBG here.
Re: (Score:1)
Re: (Score:3)
There are several solutions that take an SSD and hard drive and present one logical drive to the OS.
Windows: Intel Smart Response
OS X: Fusion Drive
Linux: bcache/flashcache
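All three boil down to the same idea: a small flash cache in front of a big spinning disk, presented to the OS as a single device. A toy sketch of that caching logic (hypothetical class, write-through with LRU eviction; the real products use more elaborate policies):

    from collections import OrderedDict

    class HybridVolume:
        """Toy SSD-cached HDD presented as one logical block device."""
        def __init__(self, hdd, ssd_blocks):
            self.hdd = hdd                  # dict-like backing store (the spinner)
            self.ssd = OrderedDict()        # small, fast LRU cache (the flash)
            self.capacity = ssd_blocks

        def read(self, lba):
            if lba in self.ssd:             # cache hit: serve from flash
                self.ssd.move_to_end(lba)
                return self.ssd[lba]
            data = self.hdd[lba]            # miss: go to the spinning disk
            self._promote(lba, data)
            return data

        def write(self, lba, data):
            self.hdd[lba] = data            # write-through, for simplicity
            self._promote(lba, data)

        def _promote(self, lba, data):
            self.ssd[lba] = data
            self.ssd.move_to_end(lba)
            if len(self.ssd) > self.capacity:
                self.ssd.popitem(last=False)  # evict the least-recently-used block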
Re: (Score:2)
I had reason to boot up an old XP system I set up for my mum to play games recently (my graphics card died). I timed the boot: 13 seconds from BIOS to desktop.
I used to have 15 second boots on my system with Vista too, it's now closer to 30 with all the cruft I now have.
Does anyone need much quicker than that on a desktop system?
Re: (Score:2)
Re: (Score:2)
Just simple HDD. With the Vista installation I've striped the hard drives.
I personally was astonished how quickly XP booted. I always remembered a lot of incomprehensible chuntering. It seems I set up a decent system. I repeat, 13 seconds to desktop.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
If they add ~32 gigs of SSD cache for delayed writes (and faster reads as a bonus, plus reliability in case of power failure), it'll be the overall winner.
Re: (Score:2)
Because that's worked so well for the Seagate hybrid drives. (Hint: no, it hasn't.)
Re: (Score:3, Insightful)
Re:Interesting idea, nasty downsides (Score:4, Insightful)
Drive performance is kind of like airplane legroom - people gripe about it, but in the end they ignore it and buy the cheap ticket.
Shingled drives aren't better - they're bigger, and that's what people pay for. WD's 10TB helium drive is shingled, and I would guess that every drive over 10TB will be shingled for the foreseeable future. By the time HAMR and BPM come out, SSDs will probably have killed off the high-performance drive market, so those technologies will probably be released as capacity-optimized shingled drives, too.
Re: (Score:2)
If this technology becomes commonplace, I can see this used as a third tier of storage, between normal HDDs and tape, either used as a live landing zone until it gets copied to tape, or perhaps used in concert with a higher tier landing zone, where the data is written onto the platters already deduplicated, aimed at staying there for long term storage.
Even operating systems are starting to become storage tier aware. Windows Server 2012R2 can autotier between SSD and HDD, and Windows Server 10 has improved
backup storage (Score:2)
This seems like it would be beautiful for a backup server that backs up every few weeks.
Re: (Score:1)
It also sounds good for a video server. I have one attached to my PC-based DVR, with playback clients in other rooms. For 99.9% or more of the data, it's very large files (100s of MB to multiple GB) that are written once and read many times until deleted.
However, since this server is also a backup server, it's a RAID array. I wonder if this shingled format has any effect on RAID performance. A lot of "green" drives do not work well in this RAID setup, causing stuttering video playback when they continual
Re: (Score:2)
Drive needles (Score:3, Interesting)
One thing I've tended to wonder: why have a single read-write needle on conventional drives (especially in multi-platter situations)? Why not have two needles, one on either side so they can't touch?
Alternately, why not a "track" that runs across the drive with shuttles on either side to perform the reads/writes. You could have two perpendicular tracks to increase performance
Re: (Score:2)
Re:Drive needles (Score:4, Informative)
More than a head per side? It's been attempted, and it turned out it's not really worth it. It's a lot of extra complication for not that much benefit. Heads are expensive and generate heat, so it works out to close to 2X the price anyway, plus an increased chance of failure. Easier and safer to just add another drive.
These days there are SSDs too.
Re: (Score:2)
About 10-15 years ago, some drives used to have two read/write stacks of heads, each independent from the other. This was killed due to people wanting cheap drives.
Re: (Score:2)
I remember seeing a Conner hard drive like that many years ago. Tom's Hardware [tomshardware.com] discusses it.
Re: (Score:1)
Conner. There's a name I haven't heard in a while.
It's interesting that they also tried the sweeping-servo (similar to a CD-ROM drive) read/write head design, once upon a time.
Log Filesystem - as hardware? (Score:1)
This reminds me of the original 'Log Filesystem' research in the 80s, back when drive geometry was known to the OS, and the OS could take steps to optimize for it.
The Log Filesystem concept was to write all data sequentially to disk, and update metadata during idle times.
The basic research influenced a number of filesystems, such as NetApp's WAFL, Sun's ZFS, Linux's JFFS2, etc.
Interesting to see the concept implemented 'in hardware'.
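For anyone curious what "write everything sequentially, fix up the metadata later" looks like, here is a toy sketch (hypothetical, in Python) of the log-structured idea and its cleaning step:

    class ToyLog:
        """Toy log-structured store: all writes append; an index maps LBA -> log position."""
        def __init__(self):
            self.log = []     # append-only sequence of (lba, data) records
            self.index = {}   # latest position of each logical block

        def write(self, lba, data):
            self.log.append((lba, data))          # always sequential on the media
            self.index[lba] = len(self.log) - 1   # metadata update (can be deferred to idle time)

        def read(self, lba):
            return self.log[self.index[lba]][1]

        def clean(self):
            # "Cleaning"/garbage collection: copy forward only the live records,
            # reclaiming the space occupied by overwritten ones.
            live = [(lba, self.log[pos][1]) for lba, pos in sorted(self.index.items())]
            self.log = live
            self.index = {lba: i for i, (lba, _) in enumerate(live)}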
144p video? (Score:1)
Why is the camera a potato?
How big is it? (Score:2)
The summary says "Reads and large writes run at about the same speed as on a conventional drive, and at $280 it costs less than a pair of decent 4TB drives", one of the 7 links in the summary mentions a 5TB model. 5TB for the price of two 4TB drives doesn't sound that great.
Re: (Score:3)
Nope..... (Score:5, Insightful)
The real question is whether or not Seagate can maintain similar full drive performance compared to a non-SMR drive.
No. The real question is longevity. Per Backblaze and my own anecdotal experience, Seagate drives already have a higher failure rate. Looking at this, any firmware bug or flaw could result in massive data loss of an entire 'band' if written incorrectly.
I understand that in any environment backups are crucial, but I live in the real world: a world where small and medium-size businesses (for good or ill) neglect IT until it bites them. At least with regular drives recovery is often possible with block-for-block copies, and barring that, a clean room has a good chance of recovering crucial data.
If a user has a performance need, I can suggest an SSD or SSD+HDD config with appropriate redundancy and backups. For pure space, large HDDs in an appropriate RAID or ZFS setup work fine. Per TFS, this is not ready for heavily drive-loaded server configs yet, and I do not see a need in residential, small-biz, or workstation use where other solutions are far better. To me this is currently a product looking for a problem, and one that is risky to data to boot.
Re: (Score:1)
There are clear, defined, industry-standard ways to use a product, and if you refuse to do so because you are cheap and lazy, the ramifications are solely your own.
Can't afford to replace large hard drives? Get smaller o
Re: (Score:2)
Re: (Score:2)
It's actually due to a bad firmware bug (I've replaced 12 of them in the last few months) which still doesn't seem to have fixed the issue completely.
If you issue a SMART command at the exact same time something is writing to the disk, data is read back corrupted (the read times out or something, and the controller kicks the drive out due to inactivity). If you are monitoring your SMART e.g. every hour, you would eventually start having 'bad disks' in a seemingly random fashion. In my case, there could be months betwee
Re: (Score:2)
Aren't you afraid you are sticking your head in the sand that way?
What's left: log messages, iostat output?
Re: (Score:2)
Yes, to both.
In my career, SMART has never once made the difference in noticing that a drive was bad. SMART only shows problems after the drive has already gone bad. A rise in the number of read/write retries in a large disk array is immediately noticeable, and you don't need SMART to tell you that a drive is bad (and just the timeouts will usually cause the bad drive to be kicked out). Bad data is immediately noticeable (and likewise will cause the drive to be kicked out) if you use ZFS, even before SMART catches o
Pointless Vids (Score:1)
I want (Score:1)
Re: (Score:2)
That'll learn yer not to read YouTube descriptions.
Re: (Score:1)
I really wonder if those who started that rumor don't want us to be more worried about our privacy than we need to be.
Re: (Score:2)
Impatience (Score:4, Funny)
How impatient must one be to tear down a Seagate hard drive before it breaks down?
No data-overwrite-in-place = ransomware relief (Score:1)
data can no longer be over-written in place, requiring SSD-like algorithms to handle random writes.
Good, now when my clients get hit by ransomware there is still hope that the "over-written" file can be recovered.