Nvidia's Fermi Architecture Debuts; Nouveau Driver Already Working
crookedvulture writes "Nvidia has lifted the curtain on reviews of its latest GPU architecture, which will be available first in the high-end GeForce GTX 680 graphics card. The underlying GK104 processor is much smaller than the equivalent AMD GPU, with fewer transistors, a narrower path to memory, and greatly simplified control logic that relies more heavily on Nvidia's compiler software. Despite the modest chip, Nvidia's new architecture is efficient enough that The Tech Report, PC Perspective, and AnandTech all found the GeForce GTX 680's gaming performance to be largely comparable to AMD's fastest Radeon, which costs $50 more. The GTX 680 also offers other notable perks, like a PCI Express 3.0 interface, dynamic clock scaling, new video encoding tech, and a smarter vsync mechanism. It's rather power-efficient, too, but the decision to focus on graphics workloads means the chip won't be as good a fit for Nvidia's compute-centric Tesla products. A bigger GPU based on the Kepler architecture is expected to serve that market." Read on below for good news (at least if you prefer Free software) from an anonymous reader. Update: 03/22 19:35 GMT by T : Mea culpa -- that headline should say "Kepler," rather than Fermi; HT to Dave from Hot Hardware (here's HH's take on the new GPU).
Our anonymous friend writes "The open-source Nouveau driver project that reverse-engineers the official NVIDIA driver to provide a free software alternative has made some big accomplishments. Nouveau announced today that they have same-day Kepler support and are now de-staging in Linux. The GeForce GTX 680 'Kepler' launch happened just hours before Nouveau, a project that NVIDIA 'officially' does not support, somehow managed initial mode-setting support with early hardware. The de-staging in the Linux kernel means that the driver is now at version 1.0 with a stable ABI."
Wrong architecture! (Score:5, Informative)
Re: (Score:2)
Exactly. Fermi launched two years ago.
It even mentions Kepler in the summary. (Score:2)
Re:Fermi ? (Score:5, Funny)
Exhibit A: "Posted by timothy"
The prosecution rests, your honor.
Re: (Score:3)
Exhibit A: "Posted by timothy"
The prosecution rests, your honor.
Your honor, we're asserting an affirmative defense based on the fact that it's nap time.
Re: (Score:2, Informative)
The saddest part is that the summary correctly mentions that it's Kepler. Timothy once again shows off either his piss-poor editing skills or the fact that he's illiterate.
Re: (Score:3, Funny)
What's it got to do with Timothy?
A careful reading of the source clearly shows...
Oh. Never mind.
fail (Score:1)
Re: (Score:2)
Troll AC, I know, but still...quit being a dumb ass.
Re: (Score:3)
Epic fail on your part. Nouveau got it to light up. Gaming support comes from acceleration support.
From the actual article on Phoronix:
There isn't any acceleration support yet for Kepler, or anything besides mode-setting in Nouveau, but this is welcome at least, so early Kepler adopters won't need to fall back to the xf86-video-vesa driver and likely some less-than-ideal resolution.
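For context, the fallback being described is roughly this in xorg.conf (a minimal sketch; the Identifier is arbitrary):

    Section "Device"
        Identifier "Card0"
        # "vesa" is the lowest-common-denominator fallback when no native
        # driver knows the card; swap in "nouveau" once mode-setting works.
        Driver "vesa"
    EndSection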
Next Consoles... (Score:1)
Re: (Score:3)
The rumours all point to next-gen consoles having AMD GPUs in them, though.
Re: (Score:3)
I don't know what licensing agreement Sony & Nvidia had for the PS3, but if I were Sony and saw what the other guys are doing, I would rather go with the more flexible GPU design house. That, and AMD's Fusion experience would probably help tip the scales in their favor too.
Nouveau (Score:5, Interesting)
If the Nouveau project doesn't get support from Nvidia, how did they manage to support this new chip before its release? Have they had access to one of the cards sent to the press?
Re:Nouveau (Score:4, Insightful)
Re: (Score:2)
It doesn't *officially* get support from Nvidia
To be fair, the summary says it officially doesn't get support, which to me conjures up images of the CEO phoning up some newspapers and saying "We don't support open source work. That is all, *click*"...
Re: (Score:1)
They have secretaries to deal with the open-source 'weirdos'.
Not what it means (Score:2)
It means more that nVidia helps them out in some ways, but at nVidia's discretion. They also aren't going to help you out if it doesn't work, and so on.
So nVidia officially supports its binary driver; with this one, they're willing to help the project out when they want, but that's it.
Re: (Score:2)
Nouveau, a project that NVIDIA 'officially' does not support, somehow managed initial mode-setting support with early hardware.
Straight from the summary....
Re: (Score:2)
Yes, I read that. That's why I was asking how they got the hardware.
Re: Nouveau (Score:5, Informative)
Troll spotted
As a Nouveau dev, I can tell you what's wrong with Nouveau, and it is not the lack of acceleration!
First of all, we have 2D and 3D acceleration (up to OpenGL 3, plus toy DirectX 10/11 support that runs Unigine Heaven) for all cards back to the TNT2 (of course, no hw OpenGL 3 there). OpenGL has been good enough for me to play many games at decent framerates and have a composited desktop running on all my cards minus one. That one is the half-Fermi/half-Kepler nvd9, which still needs some love.
Up until the G50, there was mostly no real power management. Clocks were set at boot time, and that was enough for us.
The G50 introduced reclocking support on mobile GPUs. The boot clocks were no longer set to the stock values but to lower ones (say, half the normal frequency). Most desktop GPUs still lacked power management.
GT215 extended the laptop power management scheme to desktops.
Fermi, of course, kept that scheme but pushed it a little further: boot clocks are now terribly low (core = 50MHz, memory = 100MHz).
On my GTX460, Nouveau is perfectly usable on KDE 4.8 (I get 100fps with KWin and the OpenGL backend), but games are obviously really slow, about 30fps in Xonotic.
At the same clocks, Nouveau's performance is about 80% of the proprietary driver's, which is not bad. Our real problem is that we need reclocking support to get more performance out of the cards. We have been working on it for about 1.5 years and, trust me, it isn't the easiest part of the hardware to reverse engineer.
So, what's the current state of reclocking support?
- G50->GT200: Clocks can be set to the desired frequency and the operation should be stable. Some cards don't work, but we are ironing out the corner cases. In some cases, the screen turns black for a few ms while reclocking; it's a bug I'm working on.
- GT215 -> GF100: Clocks can be set for all engines and memory, but the end result usually doesn't work because of some black voodoo we aren't doing right yet. It is being addressed.
- GF100 series: Only the engines can be reclocked; there is nothing but very experimental memory reclocking. It is being worked on.
- Kepler: Hey, it was released today; most of us haven't gotten our hands on one yet.
If reclocking is supported on your card, dynamic reclocking is a piece of cake (compared to reclocking itself), and the support for it has already been written.
To sum up, we have hw acceleration on all cards but the nvd9 (unless you use some microcode from the blob) and Kepler. The only problem with 3D is the lack of proper power management, but it is being worked on and we have made great progress. As the cards are all different yet in fact do the same thing (even across generations), I have good hope that Kepler will be fully functional 3D-wise before a new series comes out.
Remember that, contrary to the blob, we do support cards older than the GeForce 7, AND we provide out-of-the-box/open-source hw acceleration that is already more than sufficient for desktop usage. Also remember that this work is mostly done by a core team of fewer than 10 people, most of us being students and only one being paid by Red Hat.
Martin Peres, PhD student working on power management on Nouveau
Re: Nouveau (Score:5, Insightful)
Martin Peres, thank you for your hard work. It is no small thing that you do with Nouveau, especially considering the general lack of appreciation shown by some.
Anybody who uses their time and talent to develop OSS stuff deserves a lot of respect, and at least a little thanks, IMO.
I'm not an OSS dev, or a dev of any kind, but as a professional music recordist, I have done a lot of work with OSS devs in the audio realm, and although I've sometimes been too impatient with the progress of OSS music production on Linux, there has been some pretty impressive work done in the last couple of years, to the point where I was able to do my first all-OSS music production project last year and get absolutely first-rate results. There are still rough patches, but today there is finally the possibility of serious creative audio work using all OSS, thanks to a lot of people like you.
So, salut!
Re: (Score:1)
Thanks. I do this work because I learn a lot from it AND I get to improve Linux and push towards more openness. I don't mind the lack of appreciation because I know why I'm doing this.
I have done a lot of work with OSS devs in the audio realm, and although I've sometimes been too impatient with the progress of OSS music production on Linux, there has been some pretty impressive work done in the last couple of years, to the point where I was able to do my first all-OSS music production project last year and get absolutely first-rate results.
Good to know! I used to do some computer-assisted music production (MAO) on Linux a while back. I really loved JACK, but I was such a noob at it that I grew tired of it and stuck to playing the instruments, although I used Ardour to record some fun little music projects.
Salut ;)
Re: (Score:1)
Is there really 3D acceleration for pre-NV30 cards?
Last time I checked, there was some code in Mesa but it was never finished.
Most of the dev was done on Gallium-capable cards (NV40, NV50).
Well, it should work, but as very few users are actively using it and reporting bugs, it isn't our main focus. However, it is being partially rewritten because libdrm's API has been drastically updated, so you may expect some improvements/fixes.
Real support starts with nv30 (it has been massively rewritten and should bring up to a 100% speed improvement in Nexuiz). That work isn't released yet. Nouveau_vieux should be updated first so that libdrm can be merged and all the drivers will start using it, and nv30 wil
Re: (Score:2)
Does Nouveau really have 3D acceleration now? Last time I tried Nouveau was a few years ago, and it didn't have good 3D acceleration then. It was fine otherwise. But Google Earth was so horribly slow I had to switch back to the proprietary driver.
Re: (Score:1)
And this was the OP's point before he got modded into oblivion. Then some dev comes on and basically repeats the same thing except "oh, things are actually really good!" and gets modded up.
The fact is, Nouveau's performance sucks balls compared to the official nVidia driver. Also, good luck getting multiple monitors to work (they barely work with the official driver, mostly due to X.org/RandR's crappiness).
Re: (Score:1)
When did you last test nouveau?
Nouveau can be fast and really reach 80% of the blob's speed when clocked properly (I'm not even sure this one was clocked properly; it's a GT 220, which has very experimental reclocking code): http://openbenchmarking.org/embed.php?i=1201287-BY-NOUVEAURE42&sha=a43fdd7&p=2
The following example shows my point better: Fermis are slow *when gaming* due to missing reclocking. The first two are not set to the right frequencies; the last one is clocked at the right frequency.
Re: (Score:2)
I've had far more success with multiple monitors using nouveau than with the proprietary Nvidia drivers. I'm currently running with 4 monitors using nouveau, and have been for many years. Further, in the last few years I haven't encountered anyone else who's had problems with multihead support in nouveau either, and we have an office full of people using it here.
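For what it's worth, multihead with nouveau is plain RandR; a typical setup looks something like this (a sketch only -- output names vary by card and connector, check "xrandr -q" for yours):

    xrandr --output DVI-I-1 --mode 1920x1080 --pos 0x0 \
           --output DVI-I-2 --mode 1920x1080 --right-of DVI-I-1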
Re: (Score:2)
There is one, and only one, reason that I use the nvidia proprietary driver:
VDPAU.
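In practice that means hardware-decoded playback along these lines with MPlayer (the codec list is just an illustration; the trailing comma lets it fall back to software decoders):

    # H.264 / MPEG-1/2 decoding offloaded to the GPU via VDPAU
    mplayer -vo vdpau -vc ffh264vdpau,ffmpeg12vdpau, video.mkv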
Re: (Score:2)
As a user of the nouveau driver on a system where the binary blob causes a lot of instability (which nouveau doesn't have), I offer you my eternal gratitude for your efforts.
I don't play games on my PC, so for me the performance is more than satisfactory. Desktop composition and effects are smooth enough, and I can play videos in any quality without any hiccups.
The only issue I have now is that I can't set the brightness level (apparently it works in the new kernel; I'll check that once I upgrade my Ubuntu ins
Re: (Score:2)
Bad for GP-GPU computing (Score:5, Informative)
Re: (Score:2)
A GPU manufacturer optimising their cards for 3D graphics performance? Shocking!
Re: (Score:2)
Hint: the G in GPU stands for 'Graphics'. They only started offering them as compute cards when the graphics market began to run out of steam.
And I'm guessing that they're salivating at the prospect of being able to sell dedicated compute cards for 10x the price of 3D cards rather than having cheapskates just load their systems with cheap consumer 3D hardware.
Re: (Score:2)
Why call the consumer a cheapskate for using the full capabilities of his hardware? You should be calling the company grubby for artificial scarcity.
Re: (Score:2)
Actually, no, nvidia artificially limits performance to specific profiles: GeForce has shitty GPGPU performance, Quadro has decent gfx and GPGPU, and their 'Tesla' stuff is all GPGPU.
Re: (Score:1)
NVIDIA artificially limits their double-precision performance to boost sales of their Quadro chips.
Re:Bad for GP-GPU computing (Score:4, Informative)
The double-precision situation is a lot worse than that. For GK104, fp64 performance is only 1/24 of fp32. Before this, NV's consumer cards did fp64 at 1/12 (midrange) or 1/8 (high-end) of fp32; I guess that wasn't enough handicapping to protect their Tesla line, so they bumped it up.
If you need more precision than fp32 and want to use nV consumer GPUs, you should consider software emulation. A very simple software double emulation scheme can give you 1/6 to 1/4 of fp32 performance. Of course, it's less precise than fp64: it has 48 significand bits (double fp32's 24, less than fp64's 53) and 8 exponent bits (same as fp32, 3 fewer than fp64), and to get ~1/4 of fp32 performance you have to skip a lot of error/NaN/Inf handling. But it's probably sufficient for a lot of applications where people use fp64. Even software "quad-single" (96 significand bits using 4 32-bit floats) would likely be faster than nV's native fp64.
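To make that concrete, here is a minimal sketch of the addition step of such a "double-single" scheme: a value is carried as an unevaluated sum of two fp32 numbers (hi + lo), and the classic error-free two-sum trick recovers the rounding error of the high-word addition. The name ds_add is made up for the example, and all of the NaN/Inf handling mentioned above is skipped:

    // A double-single value v represents the number v.x + v.y, where v.x
    // holds the high-order bits and v.y the low-order bits (~48 significand bits).
    __device__ float2 ds_add(float2 a, float2 b) {
        // Two-sum of the high words: s is the rounded sum, e the exact error.
        float s = a.x + b.x;
        float v = s - a.x;
        float e = (a.x - (s - v)) + (b.x - v);

        // Fold in the low-order words, then renormalize into (hi, lo).
        e += a.y + b.y;
        float hi = s + e;
        float lo = e - (hi - s);
        return make_float2(hi, lo);
    }

Note this only works if the compiler preserves the exact order of operations, so you must not build with --use_fast_math.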
OTOH, AMD doesn't have much reason to handicap its cards; as you mention, its cards do fp64 at 1/4 fp32, and that's with full IEEE 754 compliance. They used to be at a big disadvantage for GPGPU, but with their new compute-oriented GCN architecture and their now-huge fp64 lead for $2000 cards, I think a lot of GPGPU folks will switch.
Re: (Score:2)
That should say "sub-$2000 cards" - I forgot that slashdot eats less than signs unless you use HTML entities.
Re: (Score:1)
I don't know where you get your numbers from. Fermi-class hardware (C20xx) has 1/2 the fp64 performance (~450GF) compared to fp32 (~1TF), and the old Tesla (C10xx) has about 1/8 or so.
Realistically, unless you load tiny chunks of data and wail on them from shared memory for a good long time, it doesn't matter, because there's no way main memory bandwidth (let alone streaming data in over PCIe) can keep up anyway.
You fail english (Score:2)
Try reading, it's fun!
If you had bothered to read my post, you would have noticed I said those were the performance figures for GK104 and consumer cards. Of course Tesla has fp64 at 1/2 fp32, but to get a worthwhile Tesla card you're looking at ~$2000.
Re: (Score:1)
Firstly, this new architecture (GK104) has a much greater number of cores (192, versus Fermi's 32) sharing a single control logic within a streaming multiprocessor (SM). Internally, each SM is SIMD, so this move is bad for divergent kernels, i.e., algorithms containing if-then-else constructs.
Actually, this is not true. The SIMD width (warp size) is still 32, so divergent kernels won't suffer any more with Kepler. Maybe you got the wrong impression because Nvidia's architecture diagrams might be oversimplified.
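For readers who haven't run into the term: divergence only hurts when threads *within* one 32-wide warp take different branches, in which case the warp executes both paths back to back with inactive lanes masked off. A toy kernel (names made up) to illustrate:

    __global__ void divergent(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        // Odd and even lanes of the same warp take different paths here,
        // so every warp pays for both branches.
        if (i % 2 == 0)
            out[i] = in[i] * 2.0f;
        else
            out[i] = in[i] + 1.0f;
    }

Branching on something that is uniform per warp (e.g. on i / 32) avoids the penalty entirely, and since the warp size is unchanged, that reasoning is the same on Kepler as on Fermi.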
Stupid Nvidia (Score:4, Funny)
Re: (Score:1)
Re: (Score:2)
their?
Pulled a fast one... (Score:1)
NVidia have pulled a fast one here, which doesn't seem to have been widely picked up yet.
The codename for the 680 is GK104. The 460 and 560 cards were based on the cut-down GF104 and GF114 GPUs respectively and were midrange parts. The 480 and 580 high-end parts were based on the full GF100 and GF110 GPUs respectively and had a 384-bit memory bus (rather than the 256-bit bus used on the GF1x4 parts).
In other words - the 680 is really what would otherwise have been called the 660, it's just that nVidia's wor