
NASA Benchmarks the New G5 Powermac

Posted by michael on Friday July 04, 2003 @05:29PM from the measuring-up dept.

sockit2me9000 writes "Well, NASA's Langley Research Center recently benchmarked the new dual 2GHz G5 Powermac against a dual 1GHz Xserve, a dual 1.25GHz Powermac, a 2GHz Pentium 4, and a 2.66GHz Pentium 4. To make things fair, the second processor in the G5 was switched off, as it was in the other dual systems. Then they all ran Jet3D. Even with unoptimized code and one processor, the G5's performance is impressive."

This discussion has been archived. No new comments can be posted.


Single Processor Mode (Score:5, Informative)

by CptChipJew ( 301983 ) * writes: <michaelmiller@@@gmail...com> on Friday July 04, 2003 @05:30PM (#6369191) Journal

Because I have a strong feeling this is going to be asked:

For those of you who were wondering, you too can switch off one of your Mac's dual CPUs with the Apple CHUD Tools [apple.com]. Look near the bottom of the page. It'll make you appreciate your second processor ;)

Personally though, I want to see how well it runs Seti@Home [berkeley.edu].

Re:Single Processor Mode (Score:1, Informative)

by Anonymous Coward writes: on Friday July 04, 2003 @05:34PM (#6369206)

CHUD also comes with Reggie, which USED to let you sort of overclock your iBook, but of course Apple removed that functionality.

fortran compiler (Score:5, Informative)

by mz001b ( 122709 ) writes: on Friday July 04, 2003 @05:38PM (#6369230)

It is interesting to note that they used the Portland Group compiler instead of the Intel compiler. For the CFD code that I work on (which is mostly Fortran), the Intel compiler produces much faster code than the Portland Group compiler (as much as 50% faster).

Re:Summary (Score:3, Informative)

by Anonymous Coward writes: on Friday July 04, 2003 @05:43PM (#6369253)

Translation: Slower than the P4 for anyone who didn't look at the grid.

Real Translation: 0.4% slower, at 75% of the clock speed.
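The arithmetic behind that "real translation" is easy to check. The sketch below assumes the single-CPU scalar figures quoted elsewhere in this thread (254 MFLOPS for the 2GHz G5, 255 MFLOPS for the 2.66GHz P4); the function names are made up for illustration.

```python
def percent_slower(score, reference):
    """Return how far `score` falls below `reference`, in percent."""
    return (reference - score) / reference * 100.0


def clock_ratio_percent(clock_ghz, reference_ghz):
    """Return `clock_ghz` as a percentage of `reference_ghz`."""
    return clock_ghz / reference_ghz * 100.0


# Figures as quoted in the thread (assumed here, not re-measured).
G5_MFLOPS, P4_MFLOPS = 254.0, 255.0
G5_GHZ, P4_GHZ = 2.0, 2.66

print(f"G5 is {percent_slower(G5_MFLOPS, P4_MFLOPS):.2f}% slower")
print(f"G5 runs at {clock_ratio_percent(G5_GHZ, P4_GHZ):.1f}% of the P4 clock")
```

Rounded, that is about 0.4% slower at about 75% of the clock, which matches the comment.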

Re:Costs - correction (Score:5, Informative)

by mgkimsal2 ( 200677 ) writes: on Friday July 04, 2003 @05:47PM (#6369277) Homepage

$2999 for the mac 2x2ghz

Re:Single Processor Mode (Score:5, Informative)

by jbm ( 17264 ) * writes: on Friday July 04, 2003 @05:50PM (#6369286)

you ... can switch off one of your Mac's dual CPU's with the Apple CHUD Tools.

You can also do this simply with the cpus= boot argument; here's a reference [apple.com].

Re:Turn the optimizations on first. (Score:5, Informative)

by Phroggy ( 441 ) * writes: <slashdot3@phroUMLAUTggy.com minus punct> on Friday July 04, 2003 @05:58PM (#6369324) Homepage

I hope they didn't use gcc (the yet-another free and hopeless compiler).

It should be noted that Apple uses gcc to compile Mac OS X and most of their applications, so it would be appropriate to use gcc on the G5. Intel's compiler might be a more appropriate choice for the Xeon.

Re:Interesting choice of processors (Score:4, Informative)

by Phroggy ( 441 ) * writes: <slashdot3@phroUMLAUTggy.com minus punct> on Friday July 04, 2003 @06:00PM (#6369335) Homepage

If their budget is such that dualie 2GHz G5s are a possibility...

Budget had nothing to do with it; the PowerMac G5 isn't shipping yet. NASA had to have obtained theirs through a special arrangement with Apple.

Re:Summary (Score:1, Informative)

by Anonymous Coward writes: on Friday July 04, 2003 @06:04PM (#6369358)

You'll notice that, while slower by about 1 MFLOPS, the G5 is a 2GHz processor, while the P4 is a 2.66GHz processor. The G5 is more efficient. If you were to bring the 2GHz G5 chip up to 2.66GHz, it would still smoke the P4.
Disclaimer: I'm still an x86 fan, just impressed by the new G5.

If I remember right... (Score:4, Informative)

by LordOfYourPants ( 145342 ) writes: on Friday July 04, 2003 @06:14PM (#6369406)

If one thing is 80 dB and one is 90 dB, the second object is twice as "loud." Each 10 dB jump either doubles or halves "loudness," i.e., if you're at 1000 dB vs 1010 dB, the 2nd object is twice as loud.

So, based on what was said at the keynote (and my interpretation), the G5s are 10dB quieter. Twice as quiet sounds more impressive. Note that saying "half as loud" still implies "loud" so psychologically it's not as impressive.

If I'm wrong, I'm sure someone will jump on me soon enough. I'm holding on tight.

Re:MFLOPS/MHz? No AMD, Old P4, Old Redhat. (Score:4, Informative)

by Jeffrey Baker ( 6191 ) writes: on Friday July 04, 2003 @06:15PM (#6369411)

I think it's pretty obvious why they tested the G5: their Altivec program is 13X faster than their scalar program. They don't mention the SSE2 so I assume they have an investment in Altivec programs. Therefore they would naturally be interested in comparing the G5 versus the XServe and G4. Until Intel releases the 34.5GHz P4 (13X 2.66GHz), there doesn't seem to be any reason to run out and buy a latest P4 just for this comparison.

And surely the version of RH Linux hardly matters. Maybe they benchmarked using this OS because (shock, horror) it is what they use daily.

Steve Jobs lucked out -- again. (Score:1, Informative)

by Anonymous Coward writes: on Friday July 04, 2003 @06:24PM (#6369454)

Um, yeah, sure is lucky Apple found the G5. I'm sure they had nothing to do with its development. It's not like Apple has been involved with development of the whole PowerPC architecture since the early 90s.

But Jobs wasn't at Apple during that time. What's the timeline:

1983: Sculley joins Apple.
1985: Sculley fires Jobs. Jobs, no longer at Apple, begins NeXT.
1991: PowerPC alliance between Apple and IBM.
1993: Sculley leaves Apple, Spindler becomes CEO.
1996: Gil Amelio becomes CEO. Jobs approaches Amelio with idea for Apple to buy NeXT. Apple buys NeXT.
1997: Amelio fired from Apple.

Re:G5 is really a full-blown workstation (Score:3, Informative)

by afidel ( 530433 ) writes: on Friday July 04, 2003 @06:31PM (#6369491)

The Power4 has 128MB of L3 cache, to get comparable performance to a Power4 the G5 needs considerably higher clock speeds. Also Sun hasn't been competitive in the single and dual CPU workstation market for some time. The only things they had going for them were large memory support and large CPU scalability, now everyone is getting large memory support with the transition to 64bit so they only have large cpu counts to fall back on.

Re:Single Processor Mode (Score:0, Informative)

by Anonymous Coward writes: on Friday July 04, 2003 @06:31PM (#6369494)

No shit. You can also not be a total ramrod and do it from System Preferences. See the CPU icon? Check it out.

And before anyone asks... (Score:0, Informative)

by SlashChick ( 544252 ) writes: <ericaNO@SPAMerica.biz> on Friday July 04, 2003 @06:32PM (#6369497) Homepage Journal

Just thought I'd point out that there's a bit more difference here than meets the eye.

Dell Dimension 4600
Pentium 4 2.66GHz
512MB RAM
40GB hard drive
48x CD-ROM
XP Home

$749. Ships today. Oh, and that comes with a free 15" flat-panel monitor. (Link [dell.com])

Configured with a CD-RW and a 120GB hard drive, the above system is $919.

Apple Power Macintosh G5
1.8GHz (I'll be fair and not pick the dual 2GHz model)
512MB RAM
160GB hard drive
Superdrive

$2399. Not shipping until September. (By then, the above Dell will be $500 or less.) No monitor included.

Even if the G5 has better performance (and honestly, I still haven't seen reliable benchmarks with the dual 2.0GHz facing a P4/3.2GHz with Hyperthreading on), it still can't compete on price.

No matter how cool Mac OS X is... no matter how awesome these new G5s will be... if they are three times the cost of a PC, buyers will have a hard time justifying it. In this economy, Apple will be hard-pressed to sell those $2300-$3000 desktops when people can get an equivalent-or-just-slightly-slower Dell for $800... including a flat-panel monitor.

Re:I'll wait for a real comparison. (Score:5, Informative)

by JonathanBoyd ( 644397 ) writes: on Friday July 04, 2003 @06:35PM (#6369512) Homepage

Hyperthreading isn't a magic double-the-speed-of-your-processor feature. In fact, it can slow a computer down. What it is nice for is running multiple threads or programs more efficiently.

Re:I'll wait for a real comparison. (Score:3, Informative)

by autopr0n ( 534291 ) writes: on Friday July 04, 2003 @06:35PM (#6369513) Homepage Journal

I'll be more impressed if the folks at the Langley Research Center compared the Apple Power Macintosh G5 with the 2.2 GHz PowerPC 970 CPU against a system running the Pentium 4 3.2 GHz CPU (which has Hyper threading instruction registers to have almost dual-CPU performance).

Hyper threading does not give you the performance of a multi-processor setup. Hyper threading speeds things up when you have lots of independent threads. Let's say you get into a situation where you have a cache miss and the CPU has to wait like 100 cycles to read crap out of RAM. With HT, the CPU can use those cycles to run programs running in other threads.

It's like having two CPUs, but only one can run at a time, so while one waits, the other runs.

It just lets the chip come closer to its theoretical maximum speed.

huh? (Score:2, Informative)

by autopr0n ( 534291 ) writes: on Friday July 04, 2003 @06:37PM (#6369523) Homepage Journal

The Apple scored 0.39% less than a 2.66GHz P4. Given that there are 3.2GHz P4s out there, as well as dual Athlons, I doubt this is the 'fastest ever'.

Re:And before anyone asks... (Score:1, Informative)

by PhunkySchtuff ( 208108 ) writes: <kai@@@automatica...com...au> on Friday July 04, 2003 @06:46PM (#6369559) Homepage

What you're not factoring in is that the G5 competes against the Xeon rather than the P4.
Go spec a Dell with a Xeon processor, a DVD burner, Digital Optical Audio I/O and a Huge hard drive (doesn't _have_ to be SATA) and a good graphics card and you will have a more realistic comparison.
Factor in the amazing case design (yes, as a tech it's so much easier to work on Apple machines than anything else) and things turn the other way...
Try adding 8GB of RAM to the PC and... oh, wait, you can't!
As for the benchmarks with HyperThreading turned OFF: they did that to speed up the PC, which was slower in those particular benchmarks with it enabled than with it turned off. HyperThreading isn't two real CPUs, and it's not magic that speeds up _everything_.
- k

Another message from the Benchmark author (Score:5, Informative)

by jbridges ( 70118 ) writes: on Friday July 04, 2003 @06:49PM (#6369571)

Found this from last Jan:

Date: Mon, 13 Jan 2003 23:29:38 -0500
From: Craig Hunter
Subject: G4 vs. P4 performance

I have been following the discussion of Rob Galbraith's benchmarks with much interest, as I have spent a good deal of time testing, optimizing, and benchmarking software for the G4 (OS X) and P4 (Linux).

The first thing to realize is that there are numerous benchmarks that show the P4 is faster, and there are numerous benchmarks that show the G4 is faster. What matters? Well, probably the benchmarks that apply to the kind of work you do. For people doing photo processing with the software Rob tested, his results are extremely relevant. But, someone working with a program optimized for AltiVec and dual processors might have a completely opposite experience.

Just to give an example of a benchmark that goes the other way, see this chart.

(You're welcome to mirror this benchmark image, since my web site may not handle a lot of traffic). These real-world results come from the Jet3D computational fluid dynamics noise prediction software, which I developed for my doctoral thesis and currently use in my work at NASA. Jet3D is written in a combination of FORTRAN 77, FORTRAN 90, and C, and is optimized for AltiVec and dual processors on G4 hardware. When compiled on Linux using Intel's ifc compiler tools, Jet3D also becomes optimized for the P4 (using the various SIMD extensions available on the P4).

As you can see, the G4 does quite well here. A dual processor 1.25GHz G4 system is more than 3.5X faster than a single processor 2GHz P4 system. Though it's not shown on the chart, a single 1.25GHz G4 processor benchmarks at about 1589 MFLOPS, 1.9X faster than the P4. If you look at MFLOPS per MHz for a single processor, the G4 comes in at 1.27 MFLOPS/MHz, while the P4 comes in at 0.42 MFLOPS/MHz. If you want a good example of the MHz myth, look at the Cray, which comes in at 1.78 MFLOPS/MHz with only a 500MHz processor, beating both the G4 and P4.

Without AltiVec, the Jet3D benchmark would be about 794 MFLOPS on the dual-1.25GHz G4, which erases the performance lead over the P4. And then, using only a single processor, the 1.25GHz G4 benchmarks at about 418 MFLOPS, which is about half as fast as the P4. And all of a sudden, the G4 doesn't look very compelling. For the Jet3D benchmark, AltiVec and dual processors are key (AltiVec more so than dual procs). This is true for most benchmarks I have looked at; thus numerically intensive applications that can't use AltiVec and/or dual processors are likely to suffer on the G4.

In the case of Jet3D, it was easy to optimize for AltiVec. I was able to hand-vectorize about 10 lines of code within the guts of the FORTRAN algorithm and convert the computations to C for easy access to AltiVec hardware instructions. It had a huge effect for not a lot of work. For other more complicated cases, it may be possible to use the VAST compiler tools to automatically vectorize and tie in with AltiVec (VAST has parallel tools also). But in some cases, vectorization is not possible or feasible. In those instances, you're stuck with the processor's scalar performance, and the P4 generally has better scalar performance than the G4 in my experience. One final note: these are my personal views, and do not represent the views of NASA Langley Research Center, NASA, or the United States Government, nor do they constitute an endorsement by NASA Langley Research Center, NASA, or the United States Government
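The per-MHz figures in the email above can be reproduced with a few lines. This is a sketch using only the numbers stated in the email; the P4's absolute MFLOPS is not given there, so it is back-computed from the rounded 0.42 MFLOPS/MHz figure and should be read as approximate. All names are illustrative.

```python
def mflops_per_mhz(mflops, mhz):
    """Throughput normalized by clock speed."""
    return mflops / mhz


# Stated in the email.
G4_ALTIVEC_MFLOPS = 1589.0   # single 1.25GHz G4 with AltiVec
G4_SCALAR_MFLOPS = 418.0     # single 1.25GHz G4 without AltiVec
G4_MHZ = 1250.0
P4_MFLOPS_PER_MHZ = 0.42     # single 2GHz P4 (rounded in the email)
P4_MHZ = 2000.0

# Implied, not stated: roughly 840 MFLOPS for the P4.
p4_mflops = P4_MFLOPS_PER_MHZ * P4_MHZ

g4_efficiency = mflops_per_mhz(G4_ALTIVEC_MFLOPS, G4_MHZ)
altivec_speedup = G4_ALTIVEC_MFLOPS / p4_mflops
scalar_ratio = G4_SCALAR_MFLOPS / p4_mflops

print(f"G4 with AltiVec: {g4_efficiency:.2f} MFLOPS/MHz")
print(f"G4 with AltiVec vs P4: {altivec_speedup:.1f}X")
print(f"G4 scalar vs P4: {scalar_ratio:.2f}X")
```

The results line up with the email's "1.27 MFLOPS/MHz", "1.9X faster", and "about half as fast" for the scalar case.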

Re:And before anyone asks... (Score:3, Informative)

by Llywelyn ( 531070 ) writes: on Friday July 04, 2003 @06:50PM (#6369573) Homepage

Oh boy.

Those systems aren't *exactly* what I would call comparable. A HD that is 4x bigger, a SuperDrive, and that's just the stats you posted (I'm sure I could draw it out with things like the Airport Antenna).

"I still haven't seen reliable benchmarks with the dual 2.0GHz facing a P4/3.2GHz with Hyperthreading on"

Veritest disabled HT for tests where the system would be slower, left it on where it was faster. They also enabled SSE2. You can check all of that in their report off of their website.

"if they are three times the cost of a PC, buyers will have a hard time justifying it."

Apple doesn't sell in the low-range (exempting iBooks), they sell mid-range and up. For those of us who purchase Apple systems, we don't want the cheapest system we can get, we want a system that /just works/.

Re:NASA Verifies Apple Benchmarks? (Score:5, Informative)

by andreMA ( 643885 ) writes: on Friday July 04, 2003 @06:50PM (#6369577)

The 498 figure was presented strictly as an aside:

"Though dual processor benchmarks are not presented in detail here, it is worth noting that the G5 system benchmarked at 498 MFLOPS..."

More relevant, perhaps, are the figures in the raw MFLOPS graph:

254: PowerMac G5, 2x2GHz (single CPU only)
255: Pentium 4, 1x2.66 GHz

Alas, difficulties in cross-platform benchmarking rear their ugly head:

Scalar Code:
G4 using Absoft F90 v8: f90 -s -O -lU77 -N11
P4 using Portland Group F90 v4.0-3: pgf90 -byteswapio -tp p7 -O1

The author did apparently make an effort to use the compiler and flags best suited for each architecture, if I read this correctly:

"Note that the higher level of optimization (-O2) and SSE/SSE2 options in the Portland compiler degraded Jet3D performance on the P4 system, and were therefore not used."

I don't know how much I trust NASA, though. After all, they only do RealMedia and WindowsMedia streaming media. Perhaps there's some bias there in favor of Windows (yes, I realize that the testbed P4 system ran Red Hat. Lighten up)

Damn Dude, Read What I Wrote (Score:4, Informative)

by Dak RIT ( 556128 ) writes: on Friday July 04, 2003 @07:06PM (#6369650) Homepage

If you follow that little link I gave you for Apple's benchmark claims, you'll see that the performance advantage Apple is claiming for its Dual 2GHz G5 (I wrote Dual, if you reread) is almost identical to what NASA is claiming for the Dual 2GHz G5 against a 2.66GHz P4.
Apple claims 15.7 for the Dual 2GHz G5, and 8.07 for the 3GHz P4. NASA gives the Dual 2GHz G5 498 MFLOPS and the 2.66GHz P4 255 MFLOPS.
If you use your math skills: 15.7 / 8.07 about equals 498 / 255. So we can conclude that they have similar results.
Now, NASA only used a 2.66GHz P4 while Apple used a 3GHz P4. But remember NASA's figure that the P4 had 0.096 MFLOPS/MHz? Give the P4 333 more MHz, and you find it has about 286.968 MFLOPS. NASA also suggests a 20% performance increase can be expected with compilers that take advantage of the G5.
Even without this increase, Apple's benchmarks and NASA's benchmarks are very close, which would lead one to conclude that Apple's benchmarks were in fact valid.
I should also note that a P4 would not perform as well in a dual system as the G5 does, so your 500 MFLOPS number is a little ridiculous. The G5, which is an amazing dual-processor chip, saw its 254 MFLOPS for a single processor (508 when doubled) drop to 498 MFLOPS in a dual system. And the P4 isn't designed for a dual system, doesn't support HyperTransport, etc.
Dak
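For anyone who wants to check the two ratios and the extrapolation in this comment, here is a small sketch. It uses only the numbers given in the comment; the linear scaling of MFLOPS with clock speed is the poster's simplifying assumption, not something the benchmark establishes, and the helper name is invented for illustration.

```python
# Ratios quoted in the comment.
apple_ratio = 15.7 / 8.07      # dual 2GHz G5 vs 3GHz P4 (Apple's figures)
nasa_ratio = 498.0 / 255.0     # dual 2GHz G5 vs 2.66GHz P4 (NASA's figures)

# How far apart the two ratios are, relative to NASA's, in percent.
relative_gap = abs(apple_ratio - nasa_ratio) / nasa_ratio * 100.0


def extrapolate_mflops(base_mflops, mflops_per_mhz, extra_mhz):
    """Naive linear extrapolation: assumes throughput scales with clock."""
    return base_mflops + mflops_per_mhz * extra_mhz


p4_at_3ghz = extrapolate_mflops(255.0, 0.096, 333.0)

print(f"Apple ratio: {apple_ratio:.3f}, NASA ratio: {nasa_ratio:.3f}")
print(f"Relative gap: {relative_gap:.2f}%")
print(f"Extrapolated P4 at ~3GHz: {p4_at_3ghz:.3f} MFLOPS")
```

Both ratios come out near 1.95, and the extrapolated P4 figure matches the 286.968 MFLOPS in the comment.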

Re:OS X 10.2.7 (Score:4, Informative)

by Trurl's Machine ( 651488 ) writes: on Friday July 04, 2003 @07:08PM (#6369667) Journal

I'm only running 10.2.6, and Software Update says nothing new is available.

MacOS X 10.2.7 - codename Smeagol [thinksecret.com] - is a stop-gap solution to provide JUST ANY working OS to the G5s until Panther is ready for prime time. Your Software Update is right not to install it on your machine, as most probably it is not a G5, sir.

Re:If I remember right... (Score:5, Informative)

by Filarion ( 548689 ) writes: on Friday July 04, 2003 @07:09PM (#6369675) Homepage

Actually, you only need 3 dB to double the volume (which, btw, has little to do with loudness). And 1000 dBA is, I hope, impossible.

Re:Single Processor Mode (Score:2, Informative)

by Spyritus ( 606674 ) writes: on Friday July 04, 2003 @07:15PM (#6369707) Journal

In the Power Mac G5 Dual Processor, there is a DUAL 1GHz bus, 1 for each processor.
Turning off a processor does not give the other one any performance increase at all.

Re:And before anyone asks... (Score:5, Informative)

by mamer-retrogamer ( 556651 ) writes: on Friday July 04, 2003 @07:25PM (#6369753)

The G5s are Apple's flagship product line. Comparing el-cheapo Dells to high-end Apples is like comparing... well, you know where I'm going with this train of thought (something about oranges, I think).
How about a more fair comparison? Namely, between similarly configured high-end single-processor systems:
Apple PowerMac G5:
1.8GHz PowerPC G5
250GB Serial ATA - 7200rpm
SuperDrive (DVD-R/CD-RW)
512MB DDR400 SDRAM (PC3200)
Mac OS X
AppleWorks
ATI Radeon 9800 Pro
56k V.92 internal modem
No Monitor
$2874
Dell Dimension XPS:
3.2GHz Pentium 4
200GB Ultra ATA - 7200rpm
DVD+RW/DVD+R/CD-RW
512MB DDR400 SDRAM
Microsoft® Windows® XP Professional w/ Microsoft® Plus!
Microsoft® Works Suite 2003
ATI Radeon 9800 pro
No Monitor
$3062
And if you are to believe the benchmarks, it seems that Apple is selling the faster system for a lesser price than a similarly configured Dell.
Apple has never competed at the low end. It is not starting now.
-Mike

Re:Single Processor Mode (Score:5, Informative)

by furballphat ( 514726 ) writes: on Friday July 04, 2003 @07:25PM (#6369758)

You only get the CPU tab if you install the CHUD tools, so you'll have to install no matter what.

Re:MFLOPS/MHz? No AMD, Old P4, Old Redhat. (Score:3, Informative)

by localghost ( 659616 ) writes: <dleblanc@gmail.com> on Friday July 04, 2003 @07:26PM (#6369759)

The unit MFLOPS/MHz is a little weird. Let's simplify.

(MFLOP / S) / (MCYCLE / S)

Multiply by the reciprocal...

(MFLOP / S) * (S / MCYCLE)

Millions cancel, seconds cancel...

FLOP / CYCLE

So it seems that this unit is equal to 1 floating point operation per CPU cycle. That makes a little bit more sense as a unit.
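The same cancellation can be shown numerically. In this sketch the 254 MFLOPS at 2000MHz input is the single-CPU G5 figure quoted elsewhere in the thread, used purely as sample data; the function name is made up.

```python
MEGA = 1_000_000


def flop_per_cycle(flop_per_second, cycles_per_second):
    """Floating point operations completed per clock cycle."""
    return flop_per_second / cycles_per_second


# Sample data: 254 MFLOPS on a 2000MHz CPU.
mflops, mhz = 254, 2000

# Working in raw units (FLOP/s and cycles/s)...
raw = flop_per_cycle(mflops * MEGA, mhz * MEGA)

# ...gives the same number as dividing MFLOPS by MHz directly,
# because the factor of one million appears in both terms.
shortcut = mflops / mhz

print(raw, shortcut)
```

So an "MFLOPS/MHz" score of 0.127 just means roughly one floating point operation every eight cycles on that benchmark.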

Re:5177 MFLOPS 288 MFLOPS (Score:5, Informative)

by Anonymous Coward writes: on Friday July 04, 2003 @07:30PM (#6369794)

I recently got the chance to do a test run of some airflow simulation on a G5 1.8GHz demo machine. With AltiVec optimizations it clocked in at roughly 2100 MFLOPS averaged over 5 runs (I could probably get better results with a better compiler, though). The dual Opteron 1.8 (of which the place where I did the test run has bought 10 boxes for their render farm), running SuSE Linux with my program recompiled for x86-64 and SSE2, performed at about 2960 MFLOPS on average. That could probably be improved with a better compiler too, but I had to use GCC at this time. Both machines had 4GB RAM, btw.

Re:Single Processor Mode (Score:4, Informative)

by The Infamous Grimace ( 525297 ) writes: <emailpsc@gmail.com> on Friday July 04, 2003 @07:34PM (#6369816) Homepage

"No shit. You can also not be a total ramrod and do it form System Preferences. See the CPU icon? Check it out."
No shit. You can also not be a total ramrod and realize that one needs CHUD installed for that particular System Pref. (fyi : CHUD = Computer Hardware Understanding Development Tools)

You should check your facts before you flame.

(tig)
"We do not inherit the land from our ancestors"
"We borrow it from our children"

Re:NASA Verifies Apple Benchmarks? (Score:5, Informative)

by nsrbrake ( 233425 ) writes: on Friday July 04, 2003 @07:41PM (#6369838) Homepage

Go back and read the article. You have no idea what you are saying.

1) One of the Mac's processors was disabled
2) 195.3% advantage was on an MFLOP/MHz basis

That is how they are comparing the architecture of the chip and its performance outside of a MHz pissing race. They are in the same ballpark now MHz-wise, so why shouldn't they take a look at how the actual chip performs? Not to mention how much more will likely come out of the chip as compilers mature to take advantage of the arch.

Re:Costs (Score:3, Informative)

by benh57 ( 525452 ) writes: <bhines&alumni,ucsd,edu> on Friday July 04, 2003 @08:18PM (#6369971) Homepage

No, autopr0n - the G4 was not slower, in fact it was faster than the P4 by a factor of 10 in the vector benchmarks (scroll down).

Re:MFLOPS/Mhz. - Useless Metric (Score:5, Informative)

by maraist ( 68387 ) * writes: <michael.maraistN ... n 0 s p a m .com> on Friday July 04, 2003 @08:47PM (#6370063) Homepage

Actually, you're both correct, and you're both missing something.

Originally, the pipelining was segmented based on the I-Fetch, D-Fetch (register/etc), Exec, Reg-Write-Back, with expensive floating point handled under different timing considerations (externalized or delay-locking multi-stage execution). Then they started sub-dividing each of those stages (especially in CISC architectures). Now it's common to see 15 integer execution pipeline stages - either with shared resources, such that you can only have one divide occurring at any given time (early P-I, P-II, P-III), or with fully independent/concurrent resources (AMD's Athlon).

The addition of the pipelinable-stages between I-Fetch, D-Fetch, exec, and WB was somewhat trivial, because prior to pipelining, there were still separate events on separate clock-ticks with inter-stage latching. However, in CPUs with exec-stages that are pipelined, you are introducing additional latches that cause additional undesirable propagation delays.

So a 15 stage integer multiply unit (excluding fetch/WB) has 15 x [guesstimating] 4 propagations of additional latency over a single-stage I-unit. If there are resource-based stage-interlocks, or worse data-dependencies, then the pipelining is useless and you're totally hit by the excess propagation delays.

Still, marketing being what it is these days, adding more stages means less propagations per stage, thus less worst-case propagation time, and thus higher clockability (all else being equal; temperature, etc).
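The trade-off described so far (more stages allow a faster clock, but every extra latch adds to the total time an instruction spends in flight) can be illustrated with a toy model. This is only a sketch under loud assumptions: the logic is split perfectly evenly across stages, each stage pays one fixed latch delay, hazards and stalls are ignored, and the 15 ns / 0.2 ns figures are invented for illustration rather than taken from any real CPU.

```python
def pipeline_model(logic_delay_ns, latch_delay_ns, stages):
    """Toy model of an evenly divided pipeline.

    Returns (cycle time in ns, per-instruction latency in ns, clock in MHz).
    """
    cycle_ns = logic_delay_ns / stages + latch_delay_ns
    latency_ns = cycle_ns * stages
    clock_mhz = 1000.0 / cycle_ns
    return cycle_ns, latency_ns, clock_mhz


# Hypothetical numbers: 15 ns of total logic, 0.2 ns per latch.
LOGIC_NS, LATCH_NS = 15.0, 0.2

for stages in (1, 5, 15):
    cycle, latency, clock = pipeline_model(LOGIC_NS, LATCH_NS, stages)
    print(f"{stages:2d} stages: cycle {cycle:5.2f} ns, "
          f"latency {latency:5.2f} ns, clock {clock:6.1f} MHz")
```

In this model the clock rises steeply with stage count while single-instruction latency creeps up by one latch delay per stage, which is why a deep pipeline only pays off when it can be kept full.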

The P4, however, compensated by double-clocking the core integer stages, so the number of advertised stages is somewhat misleading.

On a side note, due to the latching in pipelining, you're definitely doing more total work for a given instruction. And more importantly, the designers have to think of totally different logic-algorithms to efficiently pipeline than to single-stage. My guess is that the pipelined versions will always be less efficient (especially considering that not all stages will fully utilize their allotted clock-time), and thus there's an additional loss.

Ok, so this supports your post, but here's the part about power/heat.

There are two types of transistors used in modern CPUs (everything past the Pentium): bipolar and field-effect. CMOS-FET refers to Complementary Metal-Oxide-Semiconductor Field-Effect Transistor. This acts similarly to a capacitor in that there is a charge and discharge time with little waste current, and power dissipation is the typical V=IR, Pwr = I^2 * R. The gate capacitor charge-time is the killer, and what limits switching speed (and thereby clock-speed). Shrinking the area of the capacitor (related to the micron-size stated, .18u, .15u, .13u, etc) means there's less time necessary to charge the capacitors, and thus speeds are increased. (This is only one aspect, and I'm no expert here).

There's another way of reducing switching time: increasing the amount of current running through the wires that ultimately charge/discharge the gate-capacitors. FETs are poor amplifiers, but bipolars (while more complex and harder to make small) are phenomenal. In addition to their complexity, bipolars are also power hogs. While a FET only consumes power while turning on or off, bipolars are always on, consuming power (there is current bleeding from the switch). So what often happens is that designers sprinkle BJTs here and there to amplify the current (at the expense of cost/complexity and power-dissipation), and continue using FETs everywhere else.

The bigger and greater number of BJTs that are used, the faster some heavily loaded FET gates will charge and the quicker their switching time will be.

If you up the voltage on a CPU, you're enhancing the amplifier's ability to charge the capacitors and thus gives you more safety-room to up the external clock-speed.

Again, this deviates somewhat from my knowledge domain, but you can often merely co
Read the rest of this comment...

+3dB Doubles Power, +10dB Doubles Loudness (Score:3, Informative)

by DonnarsHmr ( 230149 ) writes: on Friday July 04, 2003 @08:47PM (#6370064) Homepage

Plus or minus 3dB represents a doubling or halving of the power, respectively. However, quietness or loudness is a subjective quality. Most statistically normal humans seem to agree that plus or minus 10dB doubles or halves the apparent loudness. Psychoacoustics bears no relation to math or physics.
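A short sketch makes the two rules concrete. The power conversion is the standard decibel definition (10 times the base-10 log of the power ratio); the loudness function encodes only the "10dB doubles perceived loudness" rule of thumb from this comment, which is an approximation about perception, not a physical law. Function names are illustrative.

```python
import math


def db_from_power_ratio(power_ratio):
    """Decibel change for a given ratio of acoustic power."""
    return 10.0 * math.log10(power_ratio)


def power_ratio_from_db(db):
    """Ratio of acoustic power for a given decibel change."""
    return 10.0 ** (db / 10.0)


def perceived_loudness_ratio(db):
    """Rule of thumb only: each +10dB is heard as about twice as loud."""
    return 2.0 ** (db / 10.0)


print(f"Doubling power: {db_from_power_ratio(2.0):+.2f} dB")
print(f"+10 dB in power terms: {power_ratio_from_db(10.0):.0f}x")
print(f"+10 dB in perceived loudness: {perceived_loudness_ratio(10.0):.0f}x")
print(f"-10 dB in perceived loudness: {perceived_loudness_ratio(-10.0):.1f}x")
```

So a machine that is 10dB quieter emits a tenth of the acoustic power but, by this rule of thumb, sounds about half as loud.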

Re:The G5 (Score:3, Informative)

by dbrutus ( 71639 ) writes: on Friday July 04, 2003 @09:15PM (#6370187) Homepage

The price on the client side is generally the same until you add in the server side. Then clients get more expensive. An Apple server generally is CAL free for $999 (you *can* get a 10 client version for half that). A Windows server serving file and print makes you get a CAL for each machine that accesses it. If you have a server application like Exchange on the box, you need a second CAL (priced differently) to access that program. Each server has their own CAL and the money keeps rolling into MS.

Re:The G5 (Score:0, Informative)

by p00p ( 630992 ) writes: <nightlydeposit@hotmail.com> on Friday July 04, 2003 @10:09PM (#6370414) Homepage Journal

dude, you can get WinXP Pro (not the upgrade edition, not academic license - normal XP Pro) brand new in Canada for $220 or less. That's Canadian dollars too, which equates to approx. 160 of your American bills. Then there's the fact that WinXP service packs are always going to be free. Compare that to Apple, who's charged $129 each for 3 upgrades since March 24, 2001. Sure you can skip an upgrade or two, but you'll pay a serious price in performance.. know anybody still using OSX 10.0 on a G3 machine? LOL! I pity the fool.
Alternatively, you can run WinXP on a $1200 whitebox PC from my local computer store down the road (again, Canadian dollars - about $900 US) with a P4-2.66GHz and a gig of DDR-RAM, 800MHz FSB, WinXP Pro, etc.
Hardware savings alone negate the US$160 paid for the OS, and PC hardware is easily upgradeable with off-the-shelf parts for the next ~3-5 years or so.

summary:
WinXP plus the next 3 service packs == US$160
OSX 10.3, 10.4, 10.5 == US$387 (plus inflation)

Of course, this is all academic unless you can actually afford the hardware, and anybody who wants Apple hardware badly enough will ignore the pricetag completely.

G5 has much slower SpecFP than Power4 (same clock) (Score:3, Informative)

by vmp17 ( 680763 ) writes: on Friday July 04, 2003 @10:28PM (#6370490)

At "SPEC", you can easily find the performance of a Power4 @ 1.45 GHz. Its SPEC2000 rating for floating point is 1097. When you scale that to the 2.0 GHz processor in the G5, you conclude that it has a SPEC score of about 1500.

Wrong Conclusion!

According to this pdf (page 13) [ibm.com] G5 @ 1.8GHz has 1051 SpecFP.
At the same time Power4 @ 1.7GHz has 1598 SpecFP !!! [spec.org]

It is very clear that the PowerPC 970 (G5) is much, much slower in floating point than the Power4.
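To see why the clock-scaling estimate overshoots, it helps to put the scores from this comment on a per-GHz basis. The sketch below uses only the SpecFP figures quoted here; linear scaling with clock is the assumption being tested, not an established property of SPEC scores, and the function names are made up.

```python
def scale_linear(score, from_ghz, to_ghz):
    """Naive estimate: assume the score scales linearly with clock."""
    return score * to_ghz / from_ghz


def score_per_ghz(score, ghz):
    """Score normalized by clock speed."""
    return score / ghz


# The estimate being rebutted: Power4 at 1.45GHz scaled up to 2.0GHz.
naive_estimate = scale_linear(1097.0, 1.45, 2.0)

# Figures quoted in this comment.
g5_per_ghz = score_per_ghz(1051.0, 1.8)       # G5 at 1.8GHz
power4_per_ghz = score_per_ghz(1598.0, 1.7)   # Power4 at 1.7GHz

print(f"Naive 2.0GHz estimate: {naive_estimate:.0f}")
print(f"G5: {g5_per_ghz:.0f} SpecFP per GHz")
print(f"Power4: {power4_per_ghz:.0f} SpecFP per GHz")
```

On these figures the G5 delivers roughly 584 SpecFP per GHz against roughly 940 for the Power4, so scaling a Power4 score by clock alone says little about the G5.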

Bow wow wow yippie yo yippie yay... (Score:3, Informative)

by MsGeek ( 162936 ) writes: on Friday July 04, 2003 @11:36PM (#6370684) Homepage Journal

The reason why there are no RH 8 or 9 ISOs for PPC is that basically they are leaving development of the PPC branch of Red Hat to the makers of Yellow Dog Linux. Which, right now, is G5 ready and is basically RH9 for PPC. http://www.yellowdoglinux.com/ [yellowdoglinux.com]

Re:Single Processor Mode (Score:5, Informative)

by TheCrazyFinn ( 539383 ) writes: on Friday July 04, 2003 @11:41PM (#6370704) Homepage

Well, don't forget they were testing a SINGLE 2GHz G5 (they turned off the second CPU), running G4 code. That's really closer to the $2400 G5 (single 1.8GHz) than to the actual performance of the dual G5, which should be at least 50% higher, if not more, due to the G5's optimized SMP design. That design is similar to the Athlon MP's, barring the vastly faster system bus (point-to-point, unlike the shared architecture of the P3 and the G4; I think the P4 Xeon might use a point-to-point bus, not sure).

Also, they were getting baseline tests on performance, against the G4. They also broke it down to performance per MHz, which the G5 took a huge lead in.

I suspect a dual G5, with an optimized compiler, would prove more than a match for the dual Xeon setup (which would cost significantly more; similar-spec dual-Xeon Dells are in the $4000 range), at least for this application (which heavily benefits from Altivec, and Altivec is still king of the SIMD world, SSE2 isn't even close in performance).

Re:Single Processor Mode (Score:5, Informative)

by Have Blue ( 616 ) writes: on Saturday July 05, 2003 @12:42AM (#6370885) Homepage

You can't "give 1 CPU the entire 1GHz bus bandwidth". The two CPUs have independent 1GHz connections to the system controller. They only share the 400MHz RAM bus and the rest of the system devices.

Re:Apple had little say in the Power4 CPU (Score:3, Informative)

by thatguywhoiam ( 524290 ) writes: on Saturday July 05, 2003 @12:54AM (#6370920)

As much as people would like to believe that Apple created the G5, that's simply not true. IBM was the one that created it, about 2.5 years ago for their servers.
Ah, but Power4 != G5. The G5 was called the GigaProcessor UltraLite (I know) in development and is quite a different beast from a full-blown Power4; it is scaled down to 'desktop tolerances', has fewer cores, and has an AltiVec unit. Apple probably had a hand in what is now known as the 'G5' since the very beginning.

Re:Single Processor Mode (Score:3, Informative)

by Billly Gates ( 198444 ) writes: on Saturday July 05, 2003 @02:27AM (#6371170) Journal

Well benchmarking is a strange and controversial thing.

A benchmark will only show what is benchmarked. How many here care about how fast the CPU is in single percentage points? Today's CPUs are not as much of a bottleneck as they once were. Hardware has advanced beyond software for most apps.
What really matters is what you plan to do with it.

I disagree on the SMP thing if you're testing chip performance. The reason is that the OS and the particular benchmark have a huge influence on what the numbers will look like.

If you're running a single-threaded app that requires a lot of CPU time, the SMP benchmark would obfuscate the result and not tell you which CPU is the fastest.

For running a web server or testing I/O on the new motherboard, then yes, SMP should be included. A lot of factors change performance, including which OS has the best I/O scheduler.
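One way to put numbers on the single-threaded versus SMP point is Amdahl's law. This is a rough sketch, not something from the NASA benchmark: the formula is the standard one, but the parallel fractions below are made-up illustrative values, and the function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Ideal speedup when only part of the work can be spread over several CPUs."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# A purely single-threaded app gains nothing from the second CPU.
single_threaded = amdahl_speedup(0.0, 2)

# A workload that is two-thirds parallel gains 50% from a second CPU.
mostly_parallel = amdahl_speedup(2.0 / 3.0, 2)

# A perfectly parallel workload doubles its throughput.
fully_parallel = amdahl_speedup(1.0, 2)

print(single_threaded, mostly_parallel, fully_parallel)
```

So a dual-CPU score says as much about how parallel the benchmark is as about the chip itself, which is the argument for testing one CPU at a time.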

I use PCs as workstations for coding and browsing the web. A benchmark for me would be boot time and latency. Mac OS X has been rumoured to be quite sluggish in disk latency. A faster processor certainly would not help this. My current athlonX+ takes a while to load W2k with mysqladmin, interbase, OpenOffice, and Mozilla at startup. Would a twice-as-fast G5 Mac help? Or would a new Serial ATA controller and drive help?

MaximumPC has boot times in their benchmarks and I would like to see this and I/O tested on these new Macs. They have HyperTransport PCI buses besides Serial ATA support. Sweet! This should help make the system a lot faster overall and more responsive.

Re:G5 is really a full-blown workstation (Score:2, Informative)

by Josuah ( 26407 ) writes: on Saturday July 05, 2003 @02:29AM (#6371177) Homepage

About as many as are optimised for "OS X SMP". IOW, not many.

Mac OS X automatically splits execution threads among multiple CPUs. Even something as basic as a progress bar or a network service daemon will run in separate threads. The Mach microkernel makes heavy use of threads so the basic OS itself will experience noticeable improvements in message passing between tasks (e.g. crossing the kernel boundary), especially given the dedicated CPU buses.

Also, there are a whole bunch of tasks running at any given time. I've got 64 right now on my Beige G3. Each of those tasks is running one or more threads (most only one). But if iTunes can run on one CPU while my compiler runs on the other, that's going to be a big performance gain. Not to mention how the new XCode is going to benefit from SMP.

Windows is probably in the same boat as far as distributing threads, but Mac OS X makes heavier use of threads/tasks (at least, we think so because Windows is too proprietary for us to know for sure). Plus there are a bunch of very important applications where SMP really matters, e.g. Adobe products, scientific research.

Re:Single Processor Mode (Score:2, Informative)

by WhiteBandit ( 185659 ) writes: on Saturday July 05, 2003 @02:52AM (#6371257) Homepage

Eh, using various linkage, if you put it together yourself, you get:

$394 P4 3.0 Ghz (800mhz bus, fastest they have at the moment) (pricewatch.com)
$165 1GB DDR 400 (pricewatch.com)
$376 Radeon 9800 Pro (pricewatch.com)
$159 Seagate 7200rpm 160GB SATA HD (via newegg.com)
$185 Albatron PX865PE Pro II Mobo (w/ USB 2.0 and SATA support) (newegg.com)
$242 LiteOn LDW-400D DVD-RW (newegg.com)
$80 Firewire 800 1394b PCI Host Adapter (firewiredirect.com)
$225 Creative Labs Sound Blaster Audigy 2 Platinum Ex (newegg.com) (Has the optical audio in/out)
$90 Liberal estimate for a decent case and power supply

Total (not including shipping and tax): $1916

*I don't know much about Intel systems honestly, so not sure if the motherboard I chose is decent. Was just doing a quick browsing of Anandtech for "good" mobos.

Also, some of my choices probably aren't the best. I've heard mixed reviews about SB Audigy's, but it was just a brand name I went for that I knew would have optical audio in/out.

Anyways, some of the specs are actually better than the G5, some aren't. Really depends on what you are going to use it for and what is available. I don't mind either platform honestly.

Oh. And just to spite the AC above, we'll go ahead and add an Apple Cinema Display to our system ;) $699 + ~$40 for the ACD to DVI adapter.

Grand Total!
$2650! (Add your tax and shipping charges and you probably get pretty close to $3k. I'm sure with more research you can bring this price down further... or wait 3 months ;)
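The totals can be double-checked with a few lines of Python. The prices are the ones listed above; the adapter is taken as exactly $40 (the post says ~$40), which is an assumption, so the sum lands at $2655 rather than the rounded $2650.

```python
# Prices in US dollars, as listed in the post.
parts = {
    "P4 3.0 GHz": 394,
    "1GB DDR 400": 165,
    "Radeon 9800 Pro": 376,
    "Seagate 160GB SATA HD": 159,
    "Albatron PX865PE Pro II": 185,
    "LiteOn LDW-400D DVD-RW": 242,
    "Firewire 800 PCI adapter": 80,
    "SB Audigy 2 Platinum Ex": 225,
    "Case and power supply": 90,
}

base_total = sum(parts.values())

cinema_display = 699
adc_to_dvi_adapter = 40  # assumption: the post only says "~$40"

grand_total = base_total + cinema_display + adc_to_dvi_adapter

print(f"base system: ${base_total}")
print(f"with display and adapter: ${grand_total}")
```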

Re:Single Processor Mode (Score:3, Informative)

by sadtrev ( 61519 ) writes: on Saturday July 05, 2003 @03:48AM (#6371414) Homepage

This test was for large fluid dynamics computations. These involve lots of difficult sums (unsteady Navier-Stokes) on large discretised grids.
For this type of work the CPU usually runs flat out and the bottlenecks that apply to things like opening MSWord documents hardly come into play.
If it's properly written, then HDD access speed is irrelevant, and even main memory access is hardly ever the bottleneck.
This is one of those applications where the system speed is determined by the speed and efficiency of the CPU.

Re:Single Processor Mode (Score:2, Informative)

by Andre Breton ( 605694 ) writes: on Saturday July 05, 2003 @04:21AM (#6371493)

"Personally though, I want to see how well it runs Seti@Home."
Not well. IIRC the SETI folks were reluctant to do certain optimizations for the Mac. The distributed.net stuff works much better.

Re:NASA Verifies Apple Benchmarks? (Score:3, Informative)

by John Harrison ( 223649 ) writes: <johnharrison@gma[ ]com ['il.' in gap]> on Saturday July 05, 2003 @09:23AM (#6372016) Homepage Journal

This is MGH. [harvard.edu]

Re:Single Processor Mode (Score:3, Informative)

by WhiteBandit ( 185659 ) writes: on Saturday July 05, 2003 @03:51PM (#6373338) Homepage

Wow, I got modded as a troll for admitting I am possibly wrong but answering a question anyway. Nice! Anyway.

You're right. However, the poster didn't ask for a dual-proc system ;) Sure, that would substantially increase the price, and I would concede that the Mac system would perform better. But regardless, you can still put together a similar system for less money using a PC setup. :P

And if you did read my post, I noted that you would have to factor in shipping costs, so I'm not claiming ignorance to the prices.

Whatever though!

IHBT.

Re:Single Processor Mode (Score:3, Informative)

by ChadN ( 21033 ) writes: on Sunday July 06, 2003 @12:36AM (#6375428)

(which heavily benefits from Altivec, and Altivec is still king of the SIMD world, SSE2 isn't even close in performance)

Altivec is single-precision, SSE2 is double precision. The latter is invaluable for scientific computations of many types of matrix problems, and being wrong twice as fast is of little use.

Altivec is nice, for what it is meant for (mainly media type calculations, signal processing, etc.) But scientists will prefer SSE2.
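To give a feel for why precision matters in long-running sums, here is a minimal Python sketch. It has nothing to do with Altivec or SSE2 code as such: it simply emulates IEEE single precision by rounding through `struct`, and the helper names and the choice of summing 0.1 a hundred thousand times are purely illustrative.

```python
import struct


def to_single(x):
    """Round a Python float (a double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, count, single):
    """Add `step` to a running total `count` times, optionally rounding
    every intermediate result to single precision."""
    if single:
        step = to_single(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_single(total)
    return total


exact = 10_000.0
single_error = abs(accumulate(0.1, 100_000, single=True) - exact)
double_error = abs(accumulate(0.1, 100_000, single=False) - exact)

print(f"single precision error: {single_error:.6g}")
print(f"double precision error: {double_error:.6g}")
```

The single precision total drifts visibly while the double precision one stays accurate to many more digits, and large matrix problems run far more operations than this.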

Re:Single vs Double Precision (Score:2, Informative)

by Ahaldra ( 534852 ) writes: on Sunday July 06, 2003 @07:06AM (#6376348) Homepage

OK, I'm losing the moderator points I already spent on this thread, but what the heck.
You said:
Altivec is nice, for what it is meant for (mainly media type calculations, signal processing, etc.) But scientists will prefer SSE2.
Well, I was under the impression that if I need double-precision floating point arithmetic I use the FPU (which is quite fast on the G5, as you can see in Fig. 1 [cox.net] of the article).
Concerning fast vector operations - some problems can be "dumbed down" to take advantage of the faster single precision units and lastly you can do an additional Newton-Raphson refinement step where double precision is needed.
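As a rough illustration of that refinement idea, the Python sketch below takes a single precision estimate of a reciprocal and applies one Newton-Raphson step in double precision. The reciprocal is just a convenient example and the helper names are hypothetical; real Altivec code would use its own estimate instructions, which are not shown here.

```python
import struct


def to_single(x):
    """Round a Python float (a double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def refine_reciprocal(a, estimate):
    """One Newton-Raphson step for f(x) = 1/x - a, done in double precision.

    The update x * (2 - a * x) roughly squares the relative error.
    """
    return estimate * (2.0 - a * estimate)


a = 3.0
rough = to_single(1.0 / a)            # single precision estimate of 1/3
refined = refine_reciprocal(a, rough)

rough_error = abs(rough * a - 1.0)      # roughly 3e-8
refined_error = abs(refined * a - 1.0)  # roughly 1e-15

print(f"single estimate error: {rough_error:.3g}")
print(f"after one refinement:  {refined_error:.3g}")
```

One cheap step takes the result from single precision accuracy to close to double precision accuracy, which is why the trick is attractive when the bulk of the work can stay in the fast single precision unit.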
Your claim "scientists will prefer SSE2" was proven wrong by the article, where the Jet3D test suite was only ported to take advantage of the Altivec SIMD unit, not the SSE2 SIMD unit - unless of course you argue that NASA engineers are not scientists ;-)
IMHO real scientists use whatever they think is best suited to solve their specific problem.

Re:Single vs Double Precision (Score:4, Informative)

by ChadN ( 21033 ) writes: on Sunday July 06, 2003 @08:21AM (#6376491)

Well, first of all, I work at NASA as a researcher, doing numerical work (although I'm not a civil servant, and don't speak for them).

I agree that AltiVec is superior to SSE (i.e. single precision), but you compared it to SSE2, which is a bit apples-to-oranges (no pun intended, btw). If the G5 FPU is faster than current SSE2 at double precision, it just proves the well-thought-out design of the PowerPC architecture (and the unfortunate legacy of Intel's FPU instruction set, which is still a handicap even with SSE/SSE2, due to the need to mode switch).
But SSE2 is still immature, and I expect compilers to improve, as well as chip implementations. Once they do, a more meaningful comparison can be made.

The Intel chips NEED stuff like SSE/SSE2 to achieve faster floating point speeds, whereas the PowerPC can get by without it, thanks to a much better FPU design, and thus, PowerPC makers will probably not spend the silicon to make a double precision SIMD instruction set anytime soon.

I stand by my claim that while most consumer and media software can get by with single precision, scientific computing (ie. large matrix calculations, to be blunt) quite often needs double precision (hell, you can get libraries that use 128 bit long doubles, these days), and will ultimately prefer SSE2. Scientists fuss with single precision SIMD simply because many of their applications can benefit so much from SIMD that it is worth the pain to use single precision (with proper conditioning and verification, etc.) Now that double precision SIMD is available, I can only predict they will want to jump to it, once tools for using it are there.

Granted, if Intel can't make a double precision SIMD unit that outperforms a double precision general FPU like the G5's, for matrix problems, then they don't deserve to design chips for scientific computing. :)

Re:Damn Dude, RTFA (Score:3, Informative)

by ChadN ( 21033 ) writes: on Sunday July 06, 2003 @08:37AM (#6376522)

Also, even with HyperThreading, the chips do NOT have 2 separate cores (although I suppose that could be done, in principle). It just allows software to think of them as two cores, and to schedule more than one process at a time.

The thing is, this usually works best when you have two separate types of task running concurrently on the processor, because they can (hopefully) take advantage of more execution units at one time. But if you run two of the same scientific number crunching programs on the hyper-threaded processor, they will both be competing for the floating point, and other, execution units at the same time. That competition has to be serialized by the chip, and the result may well be slower than if you just ran one process on the chip.

So, for this kind of computation, hyper-threading may not be such a win (it is probably marketed more for things like web servers, where you might have a database program and a web content delivery program both executing, and they may parallelize with each other a bit).

