Breaking the ATA Addressing Barrier
BitMan writes: "If you haven't heard, there has been a new disk geometry limitation looming for some time at 128GB (gigabytes of 2^30 bytes), which is 137GB (gigabytes of 10^9 bytes). As many will note, there have been various BIOS and OS limitations in disk geometry before -- e.g., 512/528MB, 2GB, 8GB, 32/33GB, etc. But what makes the latest 128/137GB "limit" different is that it revolves around the "hard, physical addressing" limitations of the ATA (AT Attachment) interface. 28 bits are used for addressing, which results in the 2^28 sectors * 512 bytes/sector = 128/137GB limitation. As such, hardware fiends like myself were wondering when the industry would get around to addressing this "hard" limitation in the ATA interface.
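The arithmetic behind the 128/137GB figure is easy to check. A minimal Python sketch, assuming the 512-byte sector quoted above (the helper name `lba_capacity` is just for illustration):

```python
SECTOR_SIZE = 512  # bytes per sector, as quoted in the story


def lba_capacity(address_bits: int, sector_size: int = SECTOR_SIZE) -> int:
    """Bytes reachable with an address_bits-wide sector number."""
    return (1 << address_bits) * sector_size


limit = lba_capacity(28)
print(limit)          # 137438953472 bytes
print(limit / 2**30)  # 128.0 binary gigabytes
print(limit / 10**9)  # 137.438953472 decimal gigabytes
```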
Fortunately, the solution is already in the works. The ANSI ATA committee has accepted a proposal from Maxtor that extends the ATA bus addressing to 48 bits. This allows for sizes of up to 128 pB (petabytes of 2^50), which is 144pB (petabytes of 10^15). This should tide the PC world over until the 2TB (terabytes of 2^40) limit is reached, which is the maximum number of sectors a 32-bit OS can address -- i.e. 2^32 sectors * 512 bytes/sector = 2TB.
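The same multiplication gives the 48-bit ceiling and the 2TB figure for 32-bit sector numbers, and it also confirms the whitepaper correction further down. A short Python sketch, again assuming 512-byte sectors:

```python
SECTOR_SIZE = 512  # bytes per sector

ata48 = (1 << 48) * SECTOR_SIZE  # 2**57 bytes with 48-bit addressing
os32 = (1 << 32) * SECTOR_SIZE   # 2**41 bytes with 32-bit sector numbers in the OS

print(ata48 / 2**50)             # 128.0 binary petabytes
print(round(ata48 / 10**15, 1))  # 144.1 decimal petabytes
print(os32 / 2**40)              # 2.0 binary terabytes

# 144 decimal petabytes expressed in decimal terabytes and gigabytes
pb_bytes = 144 * 10**15
print(pb_bytes // 10**12, pb_bytes // 10**9)  # 144000 144000000
```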
Besides breaking the addressing barrier, the proposal lifts a second limit for performance reasons. The maximum number of sectors transferable in a single command was boosted from 8 bits = 256 sectors/command (~128KB max. transfer/command) to 16 bits = 65,536 sectors/command (~32MB max. transfer/command). This should improve ATA/IDE performance in burst transfers and many other operations.
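The 256 and 65,536 figures come from the width of the sector-count field; in the ATA command set a count of zero stands for the maximum, which is how an 8-bit field reaches 256. A Python sketch of that decoding and of how many commands a large transfer needs (the function names are hypothetical):

```python
SECTOR_SIZE = 512


def sectors_in_command(count_field: int, field_bits: int) -> int:
    """Decode a sector-count field in which 0 stands for 2**field_bits."""
    if not 0 <= count_field < (1 << field_bits):
        raise ValueError("count does not fit in the field")
    return count_field or (1 << field_bits)


def max_transfer_bytes(field_bits: int) -> int:
    return sectors_in_command(0, field_bits) * SECTOR_SIZE


def commands_needed(total_bytes: int, field_bits: int) -> int:
    per_command = max_transfer_bytes(field_bits)
    return -(-total_bytes // per_command)  # ceiling division


print(max_transfer_bytes(8))   # 131072 (128KB)
print(max_transfer_bytes(16))  # 33554432 (32MB)
print(commands_needed(32 * 2**20, 8), commands_needed(32 * 2**20, 16))  # 256 1
```

Fewer, larger commands mean less per-command overhead for the same amount of data.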
Maxtor has published a whitepaper on the new proposal. One small correction to it: Maxtor says 144 pB (petabytes) = 144,000 GB (gigabytes), which is quite incorrect. 144pB (petabytes of 10^15) = 144,000 TB (terabytes of 10^12) = 144,000,000 GB (gigabytes of 10^9).
Thanks go to the most excellent StorageReview site, where I first heard of this."
*Yawn* (Score:1)
Human Consciousness (Score:3)
Argh, can't they get it right ONCE (Score:4)
ENOUGH already. Design it RIGHT the 1st time.
Re:Human Consciousness (Score:1)
A better solution to this problem. (Score:1)
How much longer are we going to be stuck with I/O interfaces that bog the CPU (and cripple the user interface) during heavy disc access?
Apparently only about 1% of us actually want to use our computer to do more than one thing at a time.
--
I've already addressed this problem (Score:1)
Hrmmm..... (Score:1)
Sounds lovely, doesn't it?
But then you'd have to pay a whole lot more for lovely SCSI hardware, which may not be an option for many of us (especially people in other, not so fortunate countries).
And you'd have to recompile many applications to run on 64 bit, not to mention buying the lovely 64-bit stuff Intel and AMD can't seem to agree on, which, might I add, is pricey, which may not be an option for many of us (especially people in other, not so fortunate countries).
doing all of that? ... or a simple FLASH of the BIOS?
Sunny Dubey
48 bits is a lot (Score:3)
Re:A better solution to this problem. (Score:4)
ATA already has features which don't "bog the CPU" and "cripple the user interface". Your criticism is 5 years old; a few OSes (like Linux) have this fully implemented.
Of course, the OS can also cripple the user interface during heavy disk access, SCSI or ATA.
What is a gigabyte? (Score:4)
For computer memory, the SI prefixes are certainly used to refer to powers of 2: 640 kB of RAM means 655360 bytes, not 640000 bytes.
For networks or clock speeds, the SI prefixes are certainly used to refer to powers of 10: 10Mbps ethernet can carry 10^7 bits per second, not 10*2^20 bps; similarly, a 1GHz processor runs at 10^9 Hz, not at 2^30 Hz.
And disk space? The manufacturers all specify their sizes in terms of decimal powers. And why not? Everything else, with the exception of computer memory, is expressed in terms of decimal powers.
Let's put this silly argument to rest; I'm sure people have much more important things to argue about (vi vs emacs, BSD vs linux, bash vs ksh...)
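To see how far apart the two conventions drift, here is a small Python helper (the name `describe` is hypothetical) that shows a byte count both in decimal gigabytes and in binary units, using the IEC "GiB" spelling for the binary ones:

```python
def describe(n_bytes: int) -> str:
    """Show a byte count both as decimal GB (10**9) and binary GiB (2**30)."""
    return f"{n_bytes / 10**9:.2f} GB (decimal) = {n_bytes / 2**30:.2f} GiB (binary)"


print(640 * 2**10)           # 655360: "640 kB" of RAM, binary convention
print(describe(30 * 10**9))  # 30.00 GB (decimal) = 27.94 GiB (binary)
```

The roughly 2 GB "lost" on a drive sold as 30 GB is purely a difference in units.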
Enough! (Score:5)
Maybe just once they should make the painful switch to a simple flat 128-bit address space and be done with it.
Re:Human Consciousness (Score:2)
Re:What is a gigabyte? (Score:1)
I may have it backwards, and I don't know if it was ever approved, but it sure does sound funny. Can you imagine buying a 10 gibibyte hard drive?
--buddy
Re:*Yawn* (Score:1)
Re:I've already addressed this problem (Score:1)
You mean Serial ATA [serialata.org]? It should be on the market Q4/01-Q4/02 but since it only deals with the physical/transport layers, it should work in addition to this proposal.
This new generation buggy-whip....... (Score:1)
What's the point? (Score:1)
Ancient Lore (Score:4)
Re:Human Consciousness (Score:1)
OT: crippling user interface... (Score:1)
Re:A better solution to this problem. (Score:2)
Well, actually, I was thinking about VM policy issues, but yeah, using the BIOS is a sure way to shoot yourself in the foot.
The new ATA bus will have 42 bits (Score:1)
Re:OT: crippling user interface... (Score:1)
On the other hand, my Win9x machine used to all but die when I was formatting a floppy. My guess is Linux and NT have better I/O handling, so they have no real problem with reading and writing 1.44 MB of data.
Backward compatibility (Score:2)
To achieve this, I propose the creation of the "GATE-A29" to allow older software that wraps around the addressing space to access the first sector of the hard disk to continue to function properly. This gate could be controlled with one of the keyboard controller's free lines (to save one cent), and could be turned on and off from the BIOS. Also, there should be a new INT 21h function to control the "GATE-A29".
What a perfect way to extend the PC-AT architecture in the totally unencumbered and elegant fashion it has evolved so far !
[Seriously people, just buy SCSI drives, they already do the work properly]
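The joke has a real point: software that only knows about 28-bit addresses does not fail cleanly on bigger drives, it wraps around. A minimal Python sketch of a hypothetical driver that silently keeps only the low 28 bits of a sector number:

```python
LBA28_MASK = (1 << 28) - 1  # the 28 address bits a pre-48-bit command can carry


def lba_sent_by_28bit_driver(lba: int) -> int:
    """Hypothetical driver that truncates the sector number instead of failing."""
    return lba & LBA28_MASK


for requested in (2**28 - 1, 2**28, 2**28 + 63):
    print(requested, "->", lba_sent_by_28bit_driver(requested))
# 268435455 -> 268435455, 268435456 -> 0 (the first sector!), 268435519 -> 63
```

A write just past the 128GB mark lands at the very start of the disk, where the partition table lives, so a driver should refuse such a request rather than truncate it.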
Re:Ancient Lore (Score:1)
Re:Still... (Score:1)
Re:A better solution to this problem. (Score:2)
I have not seen an ATA-based system perform this way. I find my 333MHz 20MBit (read: old!) SCSI system at work is much more pleasant to use than the 750MHz boxes with ATA66 setups (set up by Dell). All of these systems are running RH7.1.
Examples of the situation that will "bog" or rather "cripple" the faster ATA boxes but not the SCSI one include:
- Starting VMware and booting up the windows virtual machine
- Running updatedb (not really that bad though)
- Installing a couple hundred megs of RPMs.
- Installing Oracle 8i. (~500MB, uncompressing to 1.4GB)
Is there anything I can do to make the newer ATA boxes perform as well as the SCSI one? I'm getting a new computer at work soon to replace the 333, and it probably won't have SCSI, so I'm very interested to learn more.
I do know that Linux is recognizing the drives as ATA and not "reverting to PIO mode" which I've heard is the obvious solution to this type of complaint.
--
Re:Argh, can't they get it right ONCE (Score:1)
Don't you know that creating 64-bit addressing is difficult? The design should fit today's needs.
Let's say that we have 64-bit addressing. Then every single transfer (either read or write) has to send this 64-bit value, part of which is padded with zeroes (i.e. unused). Can't you imagine how much power it wastes to transfer those zeroes? Moreover, 144 PB should be enough for 20 years [maxtor.com]. Come on! Be realistic. In 20 years, mankind will have come up with a different solution.
Re:I've already addressed this problem (Score:1)
minor correction (Score:2)
I don't think a "20MBit" SCSI system would stand much of a chance at competing with anything except my TI99/4A.
--
Re:OT: crippling user interface... (Score:1)
Re:A better solution to this problem. (Score:2)
Well, you didn't say if you had DMA enabled for the ATA drives or not. It doesn't come that way out of the box, and I have no idea if Dell does the right thing or not. The program you use to investigate this is "hdparm", and you should be able to find a HOWTO which discusses it.
However, DMA mode is only half of it. The amount of memory you have and the ability of your OS to properly manage it is the other half.
Re:Ancient Lore (Score:1)
Re:Argh, can't they get it right ONCE (Score:1)
Re:Argh, can't they get it right ONCE (Score:5)
There are good reasons for using smaller address words, caching efficiency chief among them. On systems which run their filesystems fully asynchronously (like Linux), filesystem caching efficiency is a primary factor in limiting performance for filesystem-intensive applications. When your file data set exceeds main memory's ability to cache it, performance can plummet like a stone. If your filesystem cache metadata takes up 25% more space because you are using 64-bit address words instead of 32-bit (64 bits is 100% larger than 32 bits, but metadata records contain a lot more than just address words), then your maximum cacheable filesystem is only 80% (1.0/1.25=0.8) of what it could have been.
Even with small data set sizes, this can mean a lot, because you see performance degradation when you spill L1 cache, and another when you spill L2. It's a lot easier to get 25% more compact metadata than it is to get 25% larger L1 and L2 caches!
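The 25% and 80% figures can be reproduced with a little arithmetic. A Python sketch, assuming a hypothetical 32-byte metadata record that holds two sector addresses (the record layout is invented for illustration):

```python
def metadata_growth(record_bytes: int, address_words: int,
                    old_bits: int = 32, new_bits: int = 64) -> float:
    """Fractional growth of a metadata record when its address words widen."""
    extra_bytes = address_words * (new_bits - old_bits) // 8
    return extra_bytes / record_bytes


def cacheable_fraction(growth: float) -> float:
    """Share of the old cacheable data set that still fits in the same RAM."""
    return 1.0 / (1.0 + growth)


growth = metadata_growth(record_bytes=32, address_words=2)  # hypothetical layout
print(growth)                      # 0.25
print(cacheable_fraction(growth))  # 0.8
```

Records that carry more non-address fields grow by a smaller fraction, which is why the post's 25% is well below the 100% growth of the address words themselves.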
What makes the most sense is to design our operating systems to be able to treat filesystems as large (2 TB or more) or small (under 2 TB), and use cache data structures to match. That way we'll have higher performance in the "common case", and corporations who need to support (e.g.) huge databases will be able to do so.
For the past few years, hard drive data density per dollar (in best density per dollar products, not top or bottom of the line products) has been increasing exponentially at a rate of 2.15x per year. If we project that naively and assume it holds steady (which is unlikely, so take this with appropriate salt) then we should expect to bump into the 2 TB limit on our home desktop computers in about 10 years. To me, that makes filesystem-segmented 32-bit sector addressing "good enough" for a good long time.
(I have been tracking hardware trends since the early '90s; the past two years' worth of data has been collected automatically via web-bot from the same vendor, so it is easily indexable.)
-- Guges --
Re:STOP POSTING THIS ALREADY (Score:1)
Slashdot did not post your article because this is not guaranteed to happen. If Apple were to take over development of PowerPC processors, then I'm sure the article would be posted. As of now it is only speculation.
So please, stop posting this...
Re:Ancient Lore (Score:2)
What about SerialATA? (Score:3)
Also, has anyone checked to see if CPRM is being "stealthed" into the spec?
Re:Human Consciousness (Score:1)
Re:translation of (Read this please) (Score:1)
wah, wah, wah, none of my articles ever get posted, wah, wah, wah, I post off-topic and it gets moderated as off-topic
What about copy protection? (Score:3)
It's easy to say that we'll stick with the uncrippled technology we already have. But as it ages and becomes obsolete (i.e., can't handle normal-sized disks), we'll be pushed into the next generation whether we like it or not. If that next generation includes copy protection, we'll have to live with it.
Re:What's the point? (Score:2)
Re:*Yawn* (Score:1)
New unit of measure... (Score:3)
How about the Bogo-Gig?
For instance, 137 BG Bogo-Gig (base-10 or marketing-style), which translates to 128 GB (base-2, fair and actual non-marketing gigabytes)
Any ideas on better units for monitor sizes?
Maybe ideas for MTBF (Marketing Time Between Failures?) I was told that hard-drive manufacturers actually count on several returns per drive. It's definitely not 100,000 hours like they like to say.
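For what it's worth, an MTBF figure describes a population of drives, not one drive's lifetime. Under the usual constant-failure-rate (exponential) assumption, which is a modelling assumption rather than a measured fact, a 100,000-hour MTBF translates into an annual failure probability like this:

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Chance a drive fails within a year, assuming a constant failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


print(f"{annualized_failure_rate(100_000):.1%}")  # 8.4%
```

So even taken at face value, 100,000 hours means roughly one drive in twelve failing per year across a large fleet, not each drive lasting about eleven years.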
Re:Human Consciousness (Score:1)
Re:*Yawn* (Score:1)
Only problem is... where do we go from here?
Improving IDE performance in Linux (Score:4)
-d1: Use DMA
-c3: 32-bit IO
-m16: Transfer 16 sectors at a time
-u1: Unmask interrupts
According to the hdparm man page, the -u1 option will greatly increase system responsiveness. The other options mainly improve throughput. Many people also use the "-X66" flag to select UltraDMA mode 2 transfers, although my BIOS seems to do that automatically.
To test your IDE transfer rate (do this before and after tuning), use "hdparm -tT /dev/hda". It is recommended you test your IDE settings on a read-only filesystem before you start using them regularly - usually the commands don't cause problems, but they can cause major data corruption with a few buggy chipsets (having a recent kernel might help).
You may also be able to recompile your kernel, having it use IDE drivers specific to your chipset, rather than the generic IDE drivers. Kernel 2.4 is probably needed, and you should know what kind of IDE chipset you have (check your motherboard manual). Go to "ATA/IDE/MFM/RLL support" in the kernel config menu, and make sure it is set to "Y". Then go to "IDE, ATA, and ATAPI block devices", make sure the following options are set to "Y":
Then save the config, compile, and install the kernel. You may still need to use the hdparm commands after doing this, just put them in a startup script.
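If you want the settings applied at every boot, a tiny wrapper can build the same hdparm command lines. A Python sketch; the device path `/dev/hda` and the dry-run default are assumptions, the flags are the ones listed above, and actually running the commands needs root:

```python
import subprocess
from typing import Sequence

# Flags from the post: DMA, 32-bit I/O, 16-sector multiple mode, unmask IRQs.
TUNING_FLAGS = ("-d1", "-c3", "-m16", "-u1")


def hdparm_command(device: str = "/dev/hda",
                   flags: Sequence[str] = TUNING_FLAGS) -> list[str]:
    return ["hdparm", *flags, device]


def benchmark_command(device: str = "/dev/hda") -> list[str]:
    return ["hdparm", "-tT", device]  # cache and device read timings


def run(cmd: list[str], dry_run: bool = True) -> int:
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=False).returncode


run(benchmark_command())  # measure before tuning
run(hdparm_command())     # apply the settings
run(benchmark_command())  # and measure again
```

Leaving `dry_run` on until you have checked the output is in the spirit of the warning above about buggy chipsets.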
Re:Linus Torvalds smells like most Europeans (Score:1)
Oh, and just for the record: Linus is Finnish
Re:Enough! (Score:1)
Most of the problems with compatibility have been BIOS problems, and operating systems that bypass the BIOS do not have these problems. Linux does not use the BIOS (although LILO does, so you might need a boot partition below 528MB), and I don't think WinNT and *BSD do either. Win9x does use the BIOS, and because so many people use it, a lot of people have had problems with large hard drives. These problems are not the fault of the ATA standard.
Quality costs (Score:1)
Higher quality (reliability, performance, etc) means higher cost.
Not just memory (Score:1)
Not very logical IMHO (and I think the thousands of newbies asking where the 2 gigs from their new 30 gig HD went will agree)
--
gigabytes (Score:3)
Calling both of these "gigabytes" is confusing. The second figure should be referred to as "metric gigabytes"!
Re:Human Consciousness (Score:1)
*rimshot*
Re:Can you imagine... (Score:1)
You're welcome.
Re:48 bits is a lot (Score:1)
Considering that 2001 adapters are register-compatible with those from 1984, I wouldn't be shocked if you could.
Re:A better solution to this problem. (Score:1)
Both machines are plenty fast for what I need, it's just a perceptual smoothness that's present on the SCSI machine and absent on the IDE one. I know the benchmarks make it look about equal, but if you've got the dough, SCSI is a worthwhile upgrade in my book, over say an incrementally faster CPU.
Re: LILO (Score:1)
I don't remember since when, but LILO now supports linear address mode, and my Linux boot partition happens to live at 10GB just fine.
Re:OT: crippling user interface... (Score:1)
Many eggs in 1 basket (Score:2)
Local screwdriver shop has 40GB deskstars for $120. They have a 75 GB for $240. The obvious solution = buy 2x40 and stripe them. Same $$, more capacity, and 2x the speed.
Re:What is a gigabyte? (Score:1)
Re:Enough! (Score:1)
pB != PB (Score:1)
Not to be pedantic, but ... (Score:1)
Personally, I don't want any picobyte-capacity hard drives. How about you?
Re:gigabytes (Score:1)
If anyone knows what I'm talking about...
Re:Not to be pedantic, but ... (Score:1)
Re:Whitepaper hints at 64-bit addressing (Score:2)
9.4 Zettabytes is correct. Zetta is the decimal prefix.
Re:*Yawn* (Score:2)
SCSI-3 divorced the transport and access protocols. Yes, parallel SCSI has limitations, but you can run SCSI over Fibre Channel (SCSI-FCP) and even IP (iSCSI).
Nice try.
Re:Human Consciousness (Score:2)
2TB Limit? (Score:1)
(terabytes of 2^40) limit is reached, which is the maximum number of sectors a 32-bit OS can address -- i.e. 2^32 sector * 512 bytes/sector = 2TB."
Well, you're making a wrong assumption here: a 32-bit OS can do 64-bit arithmetic quite well (many do so for file access, e.g.).
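On a 32-bit CPU a 64-bit sector number is simply kept as two 32-bit words, and addition is an add on the low words followed by an add-with-carry on the high words. A Python sketch of that idea (Python integers are unbounded, so the 32-bit words are emulated with masks):

```python
MASK32 = 0xFFFFFFFF


def split64(value: int) -> tuple[int, int]:
    """Return the (high, low) 32-bit words of a 64-bit value."""
    return (value >> 32) & MASK32, value & MASK32


def join64(high: int, low: int) -> int:
    return (high << 32) | low


def add64(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two (high, low) pairs the way 32-bit code does: low add, then carry."""
    low = a[1] + b[1]
    carry = low >> 32
    high = (a[0] + b[0] + carry) & MASK32
    return high, low & MASK32


last_32bit_sector = split64(2**32 - 1)
next_sector = add64(last_32bit_sector, split64(1))
print(next_sector, join64(*next_sector))  # (1, 0) 4294967296
```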
Re:Many eggs in 1 basket (Score:2)
Yay data loss! I just wish tape drive manufacturers would get their heads (and prices) out of the clouds so I could get a cost effective backup solution.
Down that path lies madness. On the other hand, the road to hell is paved with melting snowballs.
Re:Argh, can't they get it right ONCE (Score:3)
Well, if you really want to kill this problem, you need some kind of call that will tell you how many bits wide the address is. Even if that number is just 32 bits, the actual sector number could be as large as 4 gigabits wide. How much is 2^(2^32)? A bloody huge inconceivable mind exploding number (that's the technical term for it). Right now I can't think of any reason for having more sectors than there are atoms in the universe, but don't dismiss it. That's the same kind of thinking that got us the 640k barrier.
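You cannot print 2^(2^32) in any reasonable amount of memory, but you can count its decimal digits with a logarithm. A short Python sketch (the helper name is hypothetical):

```python
import math


def decimal_digits_of_power_of_two(exponent: int) -> int:
    """Number of decimal digits in 2**exponent, via log10 instead of the number."""
    return math.floor(exponent * math.log10(2)) + 1


print(decimal_digits_of_power_of_two(64))     # 20
print(decimal_digits_of_power_of_two(2**32))  # about 1.29 billion digits
```

For comparison, the commonly quoted rough estimate of atoms in the observable universe, around 10^80, has just 81 digits.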
Re:What about copy protection? (Score:1)
Re:gigabytes (Score:2)
Try googling for "kibibyte" or even for "kibble byte".
It might even be an actual standard by now but everybody just feels entirely too silly saying them.
Re:Many eggs in 1 basket (Score:1)
No. Larger size in the same form factor == higher density. Assuming the platter spins at a constant rotational speed, and constant data density on disk, higher density means more bits per inch of circumference to read off and transfer. Of course there are ways to get around this by magicing with non-sequential sectors etc., but you want faster transfers on larger disks anyway, or have fun waiting for large (#/size) disk ops to finish.
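The density argument boils down to: media transfer rate = linear density times track length times revolutions per second. A Python sketch with made-up numbers (the 400,000 bits per inch, 1.5-inch radius, and 7200 RPM are illustrative assumptions, and real drives lose part of this raw rate to servo and ECC overhead):

```python
import math


def media_rate_bytes_per_s(bits_per_inch: float, radius_in: float, rpm: float) -> float:
    """Raw bytes/s passing under the head on a track at the given radius."""
    track_length_in = 2 * math.pi * radius_in
    revolutions_per_s = rpm / 60
    return bits_per_inch * track_length_in * revolutions_per_s / 8


base = media_rate_bytes_per_s(400_000, 1.5, 7200)    # hypothetical drive
denser = media_rate_bytes_per_s(800_000, 1.5, 7200)  # same spindle, 2x density
print(f"{base / 1e6:.1f} MB/s -> {denser / 1e6:.1f} MB/s")
```

At a fixed spindle speed the rate scales linearly with density, which is the point being made.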
For performance and redundancy (I can't imagine that nothing in your 128GB dataset is valuable), you need to go multi-spindle, i.e. RAID. Case in point: Local screwdriver shop has 40GB deskstars for $120. They have a 75 GB for $240. The obvious solution = buy 2x40 and stripe them. Same $$, more capacity, and 2x the speed.
Most RAID(-like) striping schemes double the READ speed (two apertures seeking one piece of data) but reduce the write speed (two apertures writing the same piece of data and/or redundancy structures on to two disks) plus there's I/O overhead. Plus you have to get added/expensive goodlier controller(s) for decent performance.
Re:Not to be pedantic, but ... (Score:2)
Re:gigabytes (Score:2)
--Fesh
Re:Many eggs in 1 basket (Score:1)
Re:Many eggs in 1 basket (Score:1)
Re:Enough! (Score:2)
Sectors are referenced by a single number, cylinders and heads are no longer used.
The problem is that sometimes, for people who need to do seriously high-performance I/O, you want to be able to know the drive's geometry and reference sectors at specific cylinder/head locations, to optimize sequential access and minimize seeks. Sure, they're laid out sequentially, so you can just assign things sequentially and expect access to be "mostly" continuous, but if you don't know where the cylinder boundaries are, you'll occasionally get unlucky and have something spanning two cylinders and causing a lot of unnecessary seeks. Knowing a bit about your access patterns, you could have avoided this if you'd known where the boundaries were. You might even want to get really scary [tuxedo.org] and try to make the locations of things on the disk correspond to the time you expect to need them, so they'll always be just passing under the head when needed.
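A minimal Python sketch of the boundary problem, assuming a hypothetical geometry of 16 heads and 63 sectors per track (1008 sectors per cylinder). Modern drives may report a translated geometry, so this only illustrates the idea:

```python
def cylinder_of(lba: int, sectors_per_cylinder: int) -> int:
    return lba // sectors_per_cylinder


def spans_cylinder_boundary(start: int, count: int, sectors_per_cylinder: int) -> bool:
    """True if sectors [start, start + count) do not all sit in one cylinder."""
    first = cylinder_of(start, sectors_per_cylinder)
    last = cylinder_of(start + count - 1, sectors_per_cylinder)
    return first != last


def align_up(start: int, sectors_per_cylinder: int) -> int:
    """Next cylinder boundary at or after start."""
    return -(-start // sectors_per_cylinder) * sectors_per_cylinder


SPC = 16 * 63  # hypothetical: 16 heads x 63 sectors/track
print(spans_cylinder_boundary(1000, 20, SPC))                 # True
print(spans_cylinder_boundary(align_up(1000, SPC), 20, SPC))  # False
```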
Of course, this mainly matters for things like high-end databases, but it might conceivably be worthwhile for other high-end applications like media streaming, video editing, or 3D rendering, or for low-level system things like swap-storage management or filesystem layout, where nobody else would have to worry about it but the benefit would apply across the board.
David Gould
serialATA not for some time (Score:1)
Re:What about SerialATA? (Score:1)
Serial ATA is designed to be hardware-compatible with current ATA. One can assume that the spec will account for 48-bit addressing. In the meantime, remember that widening the address space on a parallel bus will increase costs; more wires, more costs.
New, huge drives can be built with nary a concern over the serial or parallel-ness of the interface.
Re:A better solution to this problem. (Score:2)
So you have no idea if it's enabling ATA DMA or not... especially the bit that allows faster interrupts? You know, the Linux hdparm -u1 flag?
What? No copy prevention mechanism? (Score:2)
The MPAA and the RIAA will surely kill this technology!!!!
--
Knowledge is, in every country, the surest basis of public happiness.
Re:Enough! (Score:2)
It's a seductive idea. I've even written that code back in the olden days. Of course mine was nowhere near the level of Mel's scary [tuxedo.org] code. But I sure wouldn't want to attempt it today.
Modern disk drives with zoned recording, megabytes of cache, and automatic bad block remapping would make any attempt at software optimization a nightmare. You could easily end up pessimizing by abusing the cache on the controller or accessing sectors that the drive controller says are contiguous but which in reality have been transparently remapped to Lower Slobbovia [lil-abner.com]. I don't think there is any way to get guaranteed accurate geometry any more. I am afraid you just have to take what the drive is willing to give you and hope for the best.
Re:Many eggs in 1 basket (Score:2)
2) RAID5 (unfortunately I haven't seen ATA controllers doing this in hardware, so it's either SCSI or software RAID)
3) Click on my
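The trick behind RAID 5 is XOR parity: the parity block is the XOR of the data blocks in a stripe, so any one missing block is the XOR of all the others. A Python sketch of a single stripe (real RAID 5 also rotates the parity block across disks, which this omits):

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"  # one stripe, three data disks
parity = xor_blocks(d0, d1, d2)

# The disk holding d1 dies: rebuild it from the survivors plus parity.
rebuilt = xor_blocks(d0, d2, parity)
print(rebuilt == d1)  # True
```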
--
Re:gigabytes (Score:2)
It might sound slightly less ridiculous to use Homer Simpson-esque words "kilomabyte", "megamabyte", "gigamabyte"...
"Saxamaphone"...
Re:gigabytes (Score:2)
Re:What about SerialATA? (Score:2)
That was my point (granted, it was implicit rather than explicit). They're dicking around with half-arsed compromises like 48bit instead of doing a real job and using 64bit or 128bit.
Serial ATA is designed to be hardware-compatible with current ATA.
Er, no. SerialATA is designed to use the same command protocols, and the drive & controller subsystem will remain the same (as happens for IDE and SCSI drives), but the interface hardware changes on both the drive and the host. There will almost certainly be adaptor boards (like the SCSI SCA-U/SCSI2 boards) to make a parallel ATA drive wire-compatible with SerialATA. Everything else just becomes a software issue...
So what? (Score:3)
Remember ATA66? Intel was the LAST vendor to adopt that standard into their chipsets - Via, ALi et al. all had solutions in the marketplace while Intel's BX was their champion and CaminoGate was giving us all a jolly good laugh.
Remember too PC133 memory. Other chipset vendors have been supporting this for ages, but Intel have only just "gotten off the dime".
You should also give the drive makers more credit. They will realise that SerialATA is a change of maybe 15% to the drive's controller board - just a change to the physical interconnect and the silicon that drives it. They're already doing this to produce both SCSI and ATA drives, so rolling out another is not that big a deal.
Re:gigabytes (Score:2)
Or maybe... "gigabytes for dummies"
Striping is not RAID! (Score:2)
To save people needless grief from following this advice, I feel that something should be explained.
There is a very good reason why striping is called RAID Level 0 (that's a zero), which is because it's not redundant! By utilizing RAID-0, you roughly double the probability of losing your data, because the array dies if either of the two hard drives fails, and since half of all the data lived on the failed drive, even the data on the surviving drive is basically unrecoverable! Sure, it doubles read speed, but even hardware implementations will have slightly slower write speed unless both drives are absolutely identical in geometry, access speed and spindle speed. RAID-0 is essentially useless for anything where reliability is concerned. There are numerous RAID tutorials that explain the differences. Just check Google, or see this quick explanation of RAID levels [astrumsoftware.com].
(Moderators, I'm expecting at least one mod point for this short but informative post which includes a link to an informative web page. If I do not get at least one mod point, I will go postal and kill my boss. Urm, I'm self-employed, nevermind.)
--RedBear
=============================================
s/the/the next/ (Score:2)
There have been a lot of improvements to ATA since it first appeared. But when I see so many changes and improvements year after year just to approach the same level of performance (and not really even try for the same flexibility) as SCSI has had since day 1, I can only conclude that ATA was, and presumably still is, misdesigned from the start. That's why I don't use it, and why you shouldn't either. Use SCSI, or 1394, or FC, or one of the hot new technologies that you can't buy yet. It's long past time for ATA to die. They blew it from the start and have never really recovered.
Of course, in that regard, ATA and the peecee are a match made in hell. Neither one has ever let the cost of competent design work raise the price by even 1 dollar. You might say you get what you pay for, but if you're buying this stuff you must be paying for rape.
NIST defined prefixes for binary multiples (Score:2)
Still plenty of size limits to come (Score:2)
But this won't be the end of "storage barriers". As a quick example, Linux 2.4.0 had a maximum size for any volume (even LVM & RAID) of only 1 TB last time I checked. Maxtor's white paper mentioned 2 TB limits in nearly all OSes due to the use of a 32-bit integer to store the sector address.
Microsoft undoubtedly has many similar implementation limits, even if the raw format of FAT32 and NTFS can handle more. If past history is any indication, PC BIOS code will also consistently have problems with "next year's" drives, simply due to bugs and short-sighted coding, even though there's no "real reason" for it.
Will the real Gigabyte please stand up? (Score:2)
Re:*Yawn* (Score:2)
Not if everyone switches to it and it becomes the mainstream. The cost difference is almost completely due to economy of scale.
Probably the real reason that SCSI isn't mainstream is that if it were, it would kill the differentiation between the low-end and high-end markets, and then manufacturers wouldn't be able to get away with charging extra for SCSI. There is strong incentive for there to be more than one standard, so some form of IDE will always have its niche.
---
Re:Backward compatibility (Score:2)
Re:Enough! (Score:2)
The old CMOS configuration format used something like 1 byte for the number of heads, 10 bits for the number of cylinders, and 6 bits for the number of sectors per track (sector numbers start at 1). Hence you can specify up to 255 heads, 1024 cylinders, and 63 sectors/track. I think the same format is used for the parameters of BIOS calls for reading and writing disk sectors, which are used by DOS and by the first stages of boot loaders. This results in a limit of about 8 GB.
The IDE C/H/S addressing mode allows 4 bits for the head number, 16 bits for the cylinder number, and 8 bits for the sector number. If the BIOS uses this mode and doesn't invent a new geometry for use in BIOS calls, then the smaller limit of each field applies, and it can only work with drives that claim geometries of up to 16 heads, 1024 cylinders, and 63 sectors per track. This results in a limit of about 504 MiB or 528 MB in older BIOSes.
If the drive had to tell the truth about its geometry then the limits would be even smaller.
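Both limits fall out of multiplying the maximum cylinders, heads, and sectors per track by 512 bytes. A Python sketch, together with the standard CHS-to-LBA mapping (cylinders and heads count from 0, sectors from 1):

```python
SECTOR_SIZE = 512


def chs_capacity(cylinders: int, heads: int, sectors_per_track: int) -> int:
    return cylinders * heads * sectors_per_track * SECTOR_SIZE


def chs_to_lba(c: int, h: int, s: int, heads: int, sectors_per_track: int) -> int:
    """Linear sector number for cylinder c, head h, sector s (s is 1-based)."""
    return (c * heads + h) * sectors_per_track + (s - 1)


bios_limit = chs_capacity(1024, 255, 63)  # BIOS call format
old_limit = chs_capacity(1024, 16, 63)    # BIOS cylinders x IDE heads, no translation
print(bios_limit, round(bios_limit / 10**9, 1))  # 8422686720 8.4
print(old_limit, old_limit // 2**20)             # 528482304 504
print(chs_to_lba(1, 0, 1, heads=16, sectors_per_track=63))  # 1008
```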
Re:*Yawn* (Score:2)
Re:*Yawn* (Score:2)