Data Storage Hardware Linux

Linux Not Quite Ready For New 4K-Sector Drives 258

Theovon writes "We've seen a few stories recently about the new Western Digital Green drives. According to WD, their new 4096-byte sector drives are problematic for Windows XP users but not Linux or most other OSes. Linux users should not be complacent about this, because not all the Linux tools like fdisk have caught up. The result is a reduction in write throughput by a factor of 3.3 across the board (a 230% overhead) when 4096-byte clusters are misaligned to 4096-byte physical sectors by one or more 512-byte logical sectors. The author does some benchmarks to demonstrate this. Also, from the comments on the article, it appears that even parted is not ready, since by default it aligns to 'cylinder' boundaries, which are not physical cylinder boundaries and fall on multiples of 63 sectors."
This discussion has been archived. No new comments can be posted.

  • Parted / GPT (Score:1, Interesting)

    by Anonymous Coward on Sunday February 14, 2010 @01:34PM (#31135392)

    I heard using parted and GPT labels instead of MSDOS will optimize it on 4096 byte sectors automatically. Any truth to it?

  • by decora ( 1710862 ) on Sunday February 14, 2010 @01:37PM (#31135414) Journal
    the first time i have ever actually gotten 'first post'... it is when i try to make a joke about not having gotten first post. ya see my first post was supposed to come up like second or third.. it would have been HILARIOUS . .. but oh no in soviet russia, the fates mock you!!!!
  • Interesting (Score:1, Interesting)

    by Murdoch5 ( 1563847 ) on Sunday February 14, 2010 @01:39PM (#31135432) Homepage
    I actually have 2 of these drives in my desktop right now. There is a slight decrease in performance compared to Windows 7 but nothing that is unacceptable or even a cause for concern. If you need to worry about the performance lost with the 4k sectors then just go solid state.
  • by macemoneta ( 154740 ) on Sunday February 14, 2010 @01:54PM (#31135536) Homepage

    I know that Fedora seems to have addressed this with parted 2.1.1 [fedoraproject.org] and util-linux-ng 2.1 [fedoraproject.org]. Both are scheduled for Fedora 13, but can be pulled into Fedora 12 by those getting the hardware early.

  • by Anonymous Coward on Sunday February 14, 2010 @01:57PM (#31135556)

    Easiest fix: stop dividing your disks into partitions.

  • by Anonymous Coward on Sunday February 14, 2010 @02:05PM (#31135610)

    GPT wraps itself in a MBR partition map. At the very least the GPT is supposed to include an MBR map that claims the whole disk as used by GPT to avoid issues with old disk tools and the like. And if you've got a partition scheme that's compatible with the MBR scheme they can both contain the same information, assuming your disk tool supports this, so that MBR-only environments can still find your partitions.

    It's also possible to format with GPT and then use an MBR-only tool (fdisk) to go back and manipulate the (fake) MBR to contain a partition that points to the same start/end points as the GPT boot partition -- GPT-aware systems will just ignore the MBR record, and non-GPT systems will at least be able to find the boot partition.

    As to whether your motherboard/firmware supports GPT, it can be hard to say. Anything with EFI is required to support GPT. Some systems with a legacy BIOS pre-boot environment also have support for GPT, because it's the only way to support large disks. But I can't name particular firmware versions that do/don't support GPT.
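To make the "MBR map that claims the whole disk" concrete, here is a minimal Python sketch that inspects the first 512 bytes of a disk and reports whether it looks like a GPT protective MBR. The layout it relies on (partition table at byte 446, 16-byte entries, type byte at offset 4, boot signature 0x55AA at byte 510, type 0xEE for the protective entry) is the standard MBR format; the function names are made up for illustration, and a hybrid MBR like the one described above would deliberately fail this check.

```python
import struct

def mbr_entries(sector0):
    """Return [(type, start_lba, n_sectors)] for the four MBR partition entries."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    entries = []
    for i in range(4):
        raw = sector0[446 + 16 * i : 446 + 16 * (i + 1)]
        ptype = raw[4]                                   # partition type byte
        start, count = struct.unpack_from("<II", raw, 8) # little-endian LBA fields
        entries.append((ptype, start, count))
    return entries

def looks_like_protective_mbr(sector0):
    """True if exactly one entry is in use and it has the GPT protective type 0xEE."""
    used = [e for e in mbr_entries(sector0) if e[0] != 0x00]
    return len(used) == 1 and used[0][0] == 0xEE
```

On a real system the 512 bytes would come from reading the start of the block device, which normally needs root privileges.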

  • by Sits ( 117492 ) on Sunday February 14, 2010 @02:25PM (#31135762) Homepage Journal

    There is an excellent thread talking about how recent (2.6.31+) Linux kernels try to report the underlying hard drive architecture [gmane.org] (found via the OSNews comments [osnews.com]). Alas, it looks like some of these drives are not reporting this data correctly and thus automatic adjustment (at partitioning time) is not taking place. It looks like in the future, rather than relying only on the reported capability, fdisk (and hopefully gparted) will default to 1MiB alignment if the topology can't be found [gmane.org] (unless your media is small).

    Additionally, I gather that recent Fedoras will try to adjust things like LVM to match larger sectors too [storagemojo.com]. Hopefully whatever lays out LVM will be fixed as well.

    Coincidentally, it looks like Oracle have a very committed dev trying to make this stuff work by default...
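For anyone who wants to see what their kernel reports, the topology mentioned above is exposed through sysfs on kernels that support it. The Python sketch below reads `queue/logical_block_size` and `queue/physical_block_size` for a device and falls back to 1MiB when the values are missing or look uninformative. The fallback policy is only an illustration of the idea in the comment, not fdisk's actual logic, and the function names are invented here.

```python
from pathlib import Path

MIB = 1024 * 1024

def read_int(path):
    """Read an integer from a sysfs-style file, or return None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None

def suggested_alignment(dev, sysfs="/sys/block"):
    """Pick a partition alignment in bytes for a device name such as 'sda'.

    Assumed policy: trust the reported physical block size only when it is
    larger than the logical one; otherwise fall back to 1 MiB.
    """
    base = Path(sysfs) / dev
    logical = read_int(base / "queue" / "logical_block_size")
    physical = read_int(base / "queue" / "physical_block_size")
    if not logical or not physical or physical <= logical:
        return MIB
    return physical
```

A drive that claims 512-byte physical sectors while using 4K internally lands in the fallback branch, which is why a generous default is the safer choice.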

  • Re:Interesting (Score:5, Interesting)

    by markus_baertschi ( 259069 ) <markus@@@markus...org> on Sunday February 14, 2010 @02:38PM (#31135818)
    About the microcode part. The drive pretends to be a 512-byte drive, but internally is using 4k sectors and claims to 'translate transparently'. I can understand that in a random-access scenario it has to read-modify-write 2 sectors each time and performance suffers (2 additional reads and one additional write). But in a sequential access scenario, the penalty should be once per sequence/file, not once per sector. Here the microcode fails completely to make the best out of the suboptimal situation.
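The difference between "once per sector" and "once per sequence" can be sketched with a little arithmetic. The Python function below (a hypothetical helper, assuming 512-byte logical and 4096-byte physical sectors) reports which physical sectors a write touches and how many of them are only partly covered, since those are the ones that need a read-modify-write.

```python
def physical_span(start_lba, n_lba, logical=512, physical=4096):
    """Return (first, last, partial) for a write of n_lba logical sectors.

    first/last are the physical sector indexes touched; partial is the number
    of physical sectors only partly overwritten (each needs read-modify-write).
    """
    begin = start_lba * logical
    end = begin + n_lba * logical            # exclusive end offset in bytes
    first = begin // physical
    last = (end - 1) // physical
    partial = 0
    if begin % physical:
        partial += 1                         # write starts mid-sector
    # Count the tail separately unless it is the same sector as a partial head.
    if end % physical and (last != first or begin % physical == 0):
        partial += 1
    return first, last, partial

print(physical_span(63, 8))    # one 4K cluster starting at sector 63
print(physical_span(64, 8))    # one 4K cluster starting at sector 64
print(physical_span(63, 64))   # eight clusters written as a single run
```

A single misaligned cluster has two partial sectors, while the same eight clusters treated as one run still have only two, at the head and the tail. That is the saving the firmware could get by coalescing sequential writes.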
  • by Pentium100 ( 1240090 ) on Sunday February 14, 2010 @03:19PM (#31136076)

    Now I wonder why a hard drive company feels the need to have its hardware LIE to the OS?

    So the hardware is compatible with more software. For example, hard drives still report some number of cylinders, heads and sectors to the BIOS and the OS, but hard drives have been using ZBR [wikipedia.org] for 20 years now (IIRC) so the sector number is meaningless.

    But, as it is now, if my old system needs a new hard drive, I do not need to find an old drive to be compatible with my system (as long as it is IDE or SCSI, I don't know of any adapter from the newer interfaces to ESDI or ST-506, but they probably exist).

    They could have made it a jumper setting set to 512B by default though. I assume the hard drive is faster using 4KB sectors instead of true 512B sectors, so they could have offered an option to reformat the drive to 512B. Maybe that's not possible with modern drives, though I have an old 4GB SCSI drive that can be reformatted to a different sector size (I never tried it).

  • by Blakey Rat ( 99501 ) on Sunday February 14, 2010 @03:21PM (#31136092)

    I'm with you, but on the other hand that doesn't mean they should just not give a shit about the quality of their end-product. We know from experience that they can edit and correct stories as corrections arise in the comments, but how often does that happen in practice? (Hardly ever.) Somewhere between a third and half of the stories posted here are either outright lies, or extremely misleading-- I may be exaggerating, but not by much-- and almost never are they corrected.

    Look, any site that posts this article: http://tech.slashdot.org/article.pl?sid=09/02/16/2259257 [slashdot.org] without a single correction simply Does. Not. Give. A. Shit.

    I don't think anybody's expecting the New York Times when they visit here, but some minimum level of competence would be nice. I don't fault anybody for complaining.

  • DragonFly's solution (Score:5, Interesting)

    by m.dillon ( 147925 ) on Sunday February 14, 2010 @03:25PM (#31136120) Homepage

    We're adjusting our disklabel64 utility and kernel support to set the partition base offset such that it is physically aligned instead of slice-aligned, and we are using 32K alignment. That should fix the problem without having to mess around with fdisk.

    The DragonFly 64-bit disklabel structure uses 64-bit byte offsets instead of sector addressing to specify everything. It ensures things are at least sector aligned but we wanted to make disk images more portable across devices with potentially different sector sizes. The HAMMER fs uses byte-granular addressing for the same reason, 16K aligned.

    -Matt
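As a rough illustration of aligning the partition base physically rather than relative to the slice, the Python sketch below picks an offset inside a slice so that the absolute byte position lands on a 32K boundary. This is not the disklabel64 code; the function names and the `min_base` parameter (space reserved at the start of the slice) are assumptions made for the example.

```python
ALIGN = 32 * 1024   # 32K alignment, as described above

def align_up(offset, alignment=ALIGN):
    """Round a byte offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def aligned_base(slice_start, min_base, alignment=ALIGN):
    """Offset within the slice whose absolute disk position is aligned.

    slice_start: byte offset of the slice on the disk.
    min_base:    smallest usable offset inside the slice (hypothetical).
    """
    absolute = align_up(slice_start + min_base, alignment)
    return absolute - slice_start

# A slice starting at the traditional sector 63 is not 4K-aligned itself,
# but the partition base inside it can still be placed on an aligned address.
base = aligned_base(63 * 512, 4096)
print(base, (63 * 512 + base) % 4096)
```

Working in byte offsets means the same calculation holds whether the device exposes 512-byte or 4096-byte sectors, because 32K is a multiple of both.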

  • by blincoln ( 592401 ) on Sunday February 14, 2010 @03:49PM (#31136294) Homepage Journal

    Actually this problem is potentially much worse on SSDs. Erase blocks are huge, and read-modify-write really sucks on flash.

    Couldn't this be addressed (at least in part) by a battery-backed write cache like better RAID controllers use? Set it up like SAN snapshots (so it just stores the diff between what's in the actual flash storage and what's been changed so far), and then write the changed blocks when it's most advantageous (e.g. when there's an entire block's worth of data, so it would all have to be erased by the flash storage anyway).
    Maybe combine that with something like a disk defrag, except instead of storing frequently-sequentially-read data in physical sequence, store frequently-written data (regardless of whether it's sequentially-read or not) in physical sequence.

  • by YesIAmAScript ( 886271 ) on Sunday February 14, 2010 @07:36PM (#31138358)

    I forgot, there is one thing CHS still matters for nowadays. There is no proper spec for how to know whether a partition entry in an MBR (fdisk) partition table is valid, so heuristics are applied to the entries to guess if they are real or to be ignored as empty. One of the heuristics that some software uses is to ignore all partition entries that don't begin on a cylinder boundary. To be on a cylinder boundary, the partition has to start on a sector number that is a multiple of the number of sectors (S in CHS). And since all drives 8GB or greater present an S of 63, the first partition on an MBR disk has always started at sector 63, which makes it unaligned when the internal sector size is 4K (8 logical sectors per internal sector, and 63 is not a multiple of 8).

    Windows before 2000 checks the CHS alignment of MBR entries and ignores any partition entries that don't start on a multiple of S. So all disks out there are misaligned. With Windows 2000 or later, you can start the partition on any boundary you want.

    Western Digital has a jumper you can put on the drive that adds 1 to all access requests, making all those misaligned first partitions aligned. But it'll also make any aligned partitions misaligned. So the real answer is just to lay out your disk differently. I would recommend using GUID disk partitioning instead of MBR anyway, because MBR doesn't work for >2TB drives. And GUID doesn't have any weird alignment requirements (and doesn't have any knowledge of CHS).
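The arithmetic behind the sector-63 problem and the jumper is easy to check. Here is a short Python sketch, assuming 512-byte logical and 4096-byte internal sectors, with the jumper modeled as a simple +1 shift on every logical sector number (the function name and `shift` parameter are illustrative only).

```python
LOGICAL = 512     # bytes per logical sector presented by the drive
PHYSICAL = 4096   # bytes per internal sector

def is_aligned(start_lba, shift=0):
    """True if a partition starting at start_lba begins on an internal
    sector boundary; shift=1 models the jumper's +1 remapping."""
    return ((start_lba + shift) * LOGICAL) % PHYSICAL == 0

for start in (63, 64, 2048):
    print(start, is_aligned(start), is_aligned(start, shift=1))
```

Sector 63 is misaligned without the jumper and aligned with it, while starts at 64 or 2048 (1MiB) behave the other way round, which is why the jumper only helps disks laid out the old way.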
