
64-bit x86 Computing Reaches 10th Anniversary 332

illiteratehack writes "Ten years ago AMD released its first Opteron, the first 64-bit x86 processor. The firm's 64-bit 'extensions' allowed the chip to run existing 32-bit x86 code, in a bid to avoid the compatibility problems faced by Intel's Itanium processor. However, AMD suffered from a lack of native 64-bit software support, with Microsoft's Windows XP 64-bit edition severely hampering its adoption in the workstation market." But it worked out in the end.
  • by Anonymous Coward on Monday April 22, 2013 @07:07PM (#43520423)
    Depends on how it's coded, for example: 64 bit MAME runs around 30% faster than the 32 bit version: http://www.mameui.info/Bench.htm [mameui.info]
  • by cbhacking ( 979169 ) <been_out_cruisin ... minus herbivore> on Monday April 22, 2013 @07:10PM (#43520447) Homepage Journal

    Most programs still don't need to work with numbers larger than 4 billion on a regular basis, so native 32-bit ints are just as fast as native 64-bit ones.
    Most programs still don't need to map more than 2GB (not 4GB; in fact not even quite 2GB) at once, so there's no pressing need for 64-bit pointers.

    Software does take advantage of the fact that you can fit twice as many 32-bit values into the standard x86 registers if the registers are 64 bits wide, in the same way that you can stuff two 16-bit ints into EAX on a 32-bit system if you want to. However, the performance gains from doing so end up in conflict with the reduced cache coherency of larger binaries (bigger instructions) and possibly larger (less well-packed) data, resulting in more frequent cache misses. That's why the perf gains are typically very modest, although it really depends on the application.

    Where 64-bit does become really valuable is working with very, very large amounts of sequential data (want to allocate a 10GB array? Can't do that on x86, no way no how). That's hardly a typical requirement right now (although I wrote a program a few weeks ago that needed to do it). However, it's getting closer. Additionally, while clever memory mapping can allow a 32-bit process to access over 4GB of RAM (just not all at the same time), there is a (small) performance impact associated with the need to be constantly re-mapping that memory.

    The other area where 64-bit really helps is with security, specifically exploit mitigation. High-entropy ASLR in recent versions of Windows and some other OSes randomly places 64-bit aware executables and their various data regions across their entire 64-bit address space. This not only makes it completely impossible to correctly guess the address of any given bit of code in memory, it also makes spraying (heap spray, JIT spray, etc.) attacks completely infeasible; to cover even a tenth of a percent of the address space, you'd need to spray 16 million gigabytes of data. That's not only quite impractical at modern CPU speeds (even on a blazingly fast CPU and done in parallel, it would take a week or more), it also is far more memory (physical or virtual) than any modern computer will be able to allocate.
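    As a rough illustration of the address-space point above (an editorial sketch, not part of the original comment), the C program below asks for a 10 GB allocation. In a 32-bit build the request cannot even be represented by size_t, let alone satisfied, while a 64-bit build can map it (actual success still depends on available RAM/swap and the OS's overcommit policy). Compile with -m32 and -m64 to compare.

        #include <stdint.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
            unsigned long long want = 10ULL << 30;   /* 10 GB */

            printf("pointer size: %zu bits\n", sizeof(void *) * 8);

            if (want > SIZE_MAX) {
                /* 32-bit build: size_t cannot even express the request */
                puts("10 GB does not fit in this process's address space");
                return 1;
            }

            char *p = malloc((size_t)want);          /* plausible only in a 64-bit process */
            printf("malloc(10 GB) %s\n", p ? "succeeded" : "failed");
            free(p);
            return 0;
        }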

  • by sayfawa ( 1099071 ) on Monday April 22, 2013 @07:17PM (#43520495)
    The next console generation disagrees. Sony and MS are both using AMD.
  • x32 ABI (Score:5, Informative)

    by Chirs ( 87576 ) on Monday April 22, 2013 @07:49PM (#43520763)

    And for those who want the best of both worlds, there is the x32 ABI, which uses all the good stuff from x86-64 (more registers, better floating-point performance, faster position-independent code for shared libraries, function parameters passed in registers, the faster syscall instruction...) while using 32-bit pointers and thus avoiding the overhead of 64-bit pointers.

    They're working on porting Linux to the new ABI...kernel and compiler support is there, not sure about all the userspace stuff.
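    A tiny check of what the x32 ABI gives you (an editorial sketch; it assumes a toolchain built with x32 support and a kernel with CONFIG_X86_X32 enabled): build the program below with gcc -mx32 and compare against -m64 and -m32 builds. Under x32 the code runs in 64-bit mode with the full register set, yet pointers and longs are 4 bytes.

        #include <stdio.h>

        int main(void)
        {
            /* expected: 4 under -mx32 and -m32, 8 under -m64 */
            printf("sizeof(void *) = %zu\n", sizeof(void *));
            printf("sizeof(long)   = %zu\n", sizeof(long));
            return 0;
        }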

  • Re:x32 ABI (Score:4, Informative)

    by KiloByte ( 825081 ) on Monday April 22, 2013 @08:14PM (#43520939)

    kernel and compiler support is there, not sure about all the userspace stuff.

    Just debootstrap it from Daniel Schepler's repository [debian.org]. Most of the work has since moved to the official second-class repositories (AKA debian-ports), but because of the freeze you want both. So after debootstrapping, run:

        echo "deb http://ftp.debian-ports.org/debian unstable main" >> /etc/apt/sources.list

    and you're set.

  • by Cyclon ( 900781 ) on Monday April 22, 2013 @09:15PM (#43521285)
  • by Darinbob ( 1142669 ) on Monday April 22, 2013 @09:25PM (#43521343)

    And this is something people who've worked on RISC chips have known for ages. The x86 system architecture is essentially stuck in the early 80s. The 386 was just a simple extension on top of the 286 model; nothing really fundamentally changed, and you still had a limited number of registers, each with at least one specialized purpose. Maybe MMX and similar extensions fixed that, but you couldn't rely on everyone's PC having the instruction set you compiled for.

    Intel was stuck supporting a very popular CPU with an instruction set that they knew was outdated, and they even tried replacements for it that failed to gain acceptance. The reason the Opteron caught on was that it was backwards compatible with x86, not because it was the first thing to try to break out of the mold. And the 386 was designed to be compatible with the 286, which was designed to be compatible with the 8086, which was designed to be compatible with the 8085, which is compatible with the 8080, which is compatible with the 8008, which is compatible with the 4004, which was the first commercially available microprocessor... (and all of those retain the original accumulator A register)

  • by Anonymous Coward on Monday April 22, 2013 @09:47PM (#43521443)

    Software does take advantage of the fact that you can fit twice as many 32-bit values into the standard x86 registers if the registers are 64 bits wide, in the same way that you can stuff two 16-bit ints into EAX on a 32-bit system if you want to. However, the performance gains from doing so end up in conflict with the reduced cache coherency of larger binaries (bigger instructions) and possibly larger (less well-packed) data, resulting in more frequent cache misses. That's why the perf gains are typically very modest, although it really depends on the application.

    You're arguing on the correct side, but what you wrote here is badly flawed. Packing multiple 32-bit values into a 64-bit register is nearly worthless; what is valuable is that amd64 gives you twice as many general-purpose registers (which also happen to be 64 bits wide). A far bigger gain for 64-bit on x86 was the addition of full relative addressing: instead of data always being reached through absolute 32-bit addresses, in 64-bit mode software can address memory relative to the program counter. This helps a great deal with libraries, since instead of needing large relocation tables they can simply use relative references that are valid no matter what address the library is loaded at. With most processors, using 64-bit mode loses performance due to having to shuffle more data around; x86 is about the only one that gains performance.
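    A small sketch of the relative-addressing point (an editorial illustration; counter and bump are hypothetical names): compile the fragment below with gcc -O2 -fPIC -S once for x86-64 and once with -m32, and compare the assembly. The 64-bit output reaches the global through a RIP-relative instruction (something like mov counter@GOTPCREL(%rip), %rax), which is valid wherever the library ends up mapped; the 32-bit output instead has to obtain its own address through a get_pc thunk and keep a GOT pointer live in %ebx.

        /* hypothetical example used only for illustration */
        long counter;

        long bump(void)
        {
            return ++counter;
        }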

  • by BitZtream ( 692029 ) on Monday April 22, 2013 @10:43PM (#43521697)

    PAE is more or less old-school segmentation. You can't say 'it has a 3% slowdown', because it has zero slowdown if that particular page is already in memory, and if not ... it has the same 'slowdown' as any other paging operation plus a fixed number of cycles. So if you're dealing with tiny amounts of 'more than 2-3 GB' then the overhead is a lot higher than if you're mapping out 2 GB on every window change. PAE is just another form of paging. It is slower, but you're making numbers up from nothing.

    The integer math performance of the processor has nothing to do with it being 64-bit. Most (all now?) x86-64 processors will internally process two 32-bit numbers in the same time as one 64-bit number if properly optimized, by sending the 32-bit values through together. 64-bit code that needs less than the OS maximum for 32-bit code is actually slower than 32-bit code, because the larger pointers waste the processor's registers by filling them with zeros.

    You really have no idea how processors work. While nothing you said is illogical, it is still in fact wrong on every count. Under the hood, processors don't work anything like they appear to on the surface.

    Other processors also do other weird things. I have an 8-bit CPU that can handle 32-bit numbers in a single clock cycle, exactly like it does 8-bit numbers ... and the neat thing is, it can do two 16-bit numbers in a single clock cycle! Why? Because the processor as I see it from a software developer's perspective isn't anything like the actual hardware doing the work. Processors have translation units in front of them that present one view while rewiring the back end in all sorts of different ways.

  • by Guy Harris ( 3803 ) <guy@alum.mit.edu> on Monday April 22, 2013 @11:59PM (#43522041)

    but PAE is nothing but a spec on top of the other mess of bullshit known as segmentation.

    Actually, no, it's a mode that changes the page table format to allow larger physical addresses in page table entries. Nothing to do with segmentation.
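    A rough sketch of what that page-table change looks like (an editorial illustration, not an excerpt from any OS header; the type and macro names are made up): with PAE each page-table entry grows from 32 to 64 bits, so the physical-frame field can describe memory above 4 GB even though each process still sees only 32-bit linear addresses.

        #include <stdint.h>

        typedef uint64_t pae_pte_t;                    /* 64-bit entry instead of 32-bit */

        #define PTE_PRESENT     (1ULL << 0)
        #define PTE_WRITABLE    (1ULL << 1)
        /* frame number occupies bits 12 up to the CPU's physical-address width
           (36 bits on the original P6 implementation, more on later CPUs) */
        #define PTE_FRAME_MASK  0x000FFFFFFFFFF000ULL

        static inline uint64_t pte_phys_addr(pae_pte_t pte)
        {
            return pte & PTE_FRAME_MASK;               /* can exceed 4 GB */
        }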

  • by Alioth ( 221270 ) <no@spam> on Tuesday April 23, 2013 @12:06AM (#43522079) Journal

    x64 has twice as many registers. That alone means less moving of data in and out of memory, which improves speed compared to 32-bit applications. 32-bit x86 has only 4 truly general-purpose registers; x64 adds another 8 64-bit registers.
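    A small experiment along the same lines (an editorial illustration; sum8 is a made-up function and the exact outcome depends on the compiler and flags): the function below keeps eight accumulators live across the loop. Compiled with gcc -O2 -m32 -S the compiler typically has to spill several of them to the stack, since only a handful of the eight 32-bit GPRs are free, while gcc -O2 -m64 -S can usually keep them all in registers thanks to r8-r15.

        unsigned sum8(const unsigned *v, unsigned n)
        {
            unsigned a0 = 0, a1 = 0, a2 = 0, a3 = 0;
            unsigned a4 = 0, a5 = 0, a6 = 0, a7 = 0;

            for (unsigned i = 0; i + 8 <= n; i += 8) {
                a0 += v[i];     a1 += v[i + 1];
                a2 += v[i + 2]; a3 += v[i + 3];
                a4 += v[i + 4]; a5 += v[i + 5];
                a6 += v[i + 6]; a7 += v[i + 7];
            }
            return a0 + a1 + a2 + a3 + a4 + a5 + a6 + a7;
        }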

  • by SEE ( 7681 ) on Tuesday April 23, 2013 @01:06AM (#43522275) Homepage

    it's an easy choice unless you absolutely need 16-bit support.

    The annoying thing is that an x86-64 processor in long mode can, in fact, run 16-bit protected-mode code (which covers essentially all actual Windows 3.x programs) in the same compatibility sub-mode that runs 32-bit code. It's merely that Microsoft decided they didn't want to bother supporting it.

    That this can be done is easy enough to prove; take a Win16 app and run it in WINE on 64-bit Linux.

  • by petermgreen ( 876956 ) <plugwash@NOsPaM.p10link.net> on Tuesday April 23, 2013 @08:57AM (#43523917) Homepage

    A long time.

    We don't even have true 64-bit x86-64 processors yet. While programmers are told to* treat pointers as 64-bit, in the current implementation (referred to as a "48-bit implementation") there are only 47 usable bits for user-mode pointers**. That is enough to map 128 terabytes into one process; AFAICT the most RAM you can currently get in a PC-architecture machine is 2 terabytes.

    If we assume the largest available memory size doubles every 1.5 years and we want to be able to map all of it into one process, then the current implementation has 2^47 / 2^41 = 64 times the headroom, i.e. six doublings or about 9 years, before it is used up, and a "full 64-bit" implementation (with one bit used to distinguish kernel from user mode, so 2^63 bytes of user space) buys another 16 doublings, roughly 24 years, after that.

    * Of course, just because programmers are told to do something doesn't mean they will: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642750 [debian.org]
    ** A 48th bit is used to differentiate kernel and user addresses. The number is then sign-extended to produce a 64-bit number.
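    The sign-extension rule above is the "canonical address" check; here is a small editorial sketch of it (is_canonical is a made-up helper, not from the comment): bits 47 through 63 must all be copies of bit 47, so user pointers live below 0x0000800000000000 and kernel addresses start at 0xffff800000000000.

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        static bool is_canonical(uint64_t addr)
        {
            uint64_t upper = addr >> 47;            /* bits 47..63, 17 bits in total */
            return upper == 0 || upper == 0x1FFFF;  /* all zeros (user) or all ones (kernel) */
        }

        int main(void)
        {
            printf("%d\n", is_canonical(0x00007fffffffffffULL)); /* 1: top of user space */
            printf("%d\n", is_canonical(0xffff800000000000ULL)); /* 1: start of kernel space */
            printf("%d\n", is_canonical(0x0000800000000000ULL)); /* 0: non-canonical hole */
            return 0;
        }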
