Facts and Fiction of GPU-Based H.264 Encoding
notthatwillsmith writes "We've all heard big promises about how general-purpose GPU computing can greatly accelerate tasks that are slow on the CPU, such as H.264 video encoding. Maximum PC compared the GPU-accelerated Badaboom app to Handbrake, a popular CPU-based encoder. After testing a variety of workloads, ranging from archival-quality DVD rips to transcodes suitable for playback on the iPhone, Maximum PC found that while Badaboom is significantly faster in a few tests that require video resizing, it simply can't compete with the x264-powered Handbrake for archival-quality DVD backups."
makes sense to me (Score:2, Interesting)
Wouldn't archival-quality backups be actual MPEG instead of H.264 or whatever? I mean, if you're archiving, why go lossy?
Is it just a badly-designed test?
Re: (Score:3, Informative)
I was referring to the idea of encoding one lossy format into another lossy format, which compounds the losses beyond the original lossless-to-lossy step. Sorry if I was unclear.
Seriously, why encode twice? And why rate performance on how fast you can lose bits?
Re:makes sense to me (Score:5, Informative)
H.264/AVC includes lossless compression as well as lossy. The same is true for the wavelet-based "snow" codec. Still, I'd recommend FFV1 for the best compression, as long as you don't need the video to be playable by all the standard H.264 decoders out there.
This test is about reencoding from a DVD to H.264/AVC. If you want lossless quality, you need only copy the MPEG-2 stream... Reencoding to a lossless format will dramatically increase the file size without any quality improvement.
Re:makes sense to me (Score:5, Informative)
You may have a point, or you might not. It depends on the definition of "archival" and your specific purpose. I imagine most historians who deal with digital data would scoff at conflating the terms used to describe their work with some home user who just wants to back up their DVDs...
There's certainly going to be loss when encoding from MPEG-2 DVDs to H.264. But considering how ridiculously large DVD video is for the relatively small amount of information it contains, I'd say a tiny drop in quality is generally acceptable in exchange for near-as-high-quality backups of your DVDs in (e.g.) 1/10th the space.
Don't quote me on that, though; it's just a hypothetical example. I recently finished explaining, here, why H.264 isn't all that much more effective than MPEG-2 where indistinguishable/high quality (rather than just "watchable") is desired: http://slashdot.org/comments.pl?sid=956141&cid=24940379 [slashdot.org]
In fact, you could probably re-compress a DVD with MPEG-2 (instead of H.264) and get equivalent quality at almost equally low data rates, simply because DVD producers' MPEG-2 encoders are terrible, and the settings they use (GOP size, fixed resolution/black borders, high-frequency noise, etc.) waste a LOT of the bitrate on things which really don't improve visual quality.
And to be a bit pedantic... H.264 is, in fact, "MPEG". It's MPEG-4 AVC (Part 10), while DVDs use MPEG-2.
Re:makes sense to me (Score:5, Informative)
I don't know what your source is, but MPEG-2 can't even APPROACH MPEG-4 AVC quality at the same bitrate (at low bitrates), and MPEG-4 AVC can produce a much more compact file for a specified quality (such as DVD quality or better). On the other hand, MPEG-4 is much more recent and takes an order of magnitude more processing power to encode and decode. MPEG-4 uses much-improved intraframe compression, variable-size macroblocks, and more advanced descriptions of block motion. Even if we drop the issue of MPEG-2's support for B-frames and the limits on P/B frames per GOP (imposed by the MPEG-2 profiles, which could be ignored), MPEG-4 is much more efficient at removing redundant information. Finally, MPEG-4 adds more advanced entropy coding for the final lossless compression of coefficients, etc. after lossy compression is performed: the CAVLC coding is an improvement on MPEG-2's standard variable-length coding, and CABAC's arithmetic coding is even more efficient than CAVLC.
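To make the entropy-coding point concrete, here's a toy sketch (my own, not x264's or anyone's shipping code) of the unsigned Exp-Golomb code that H.264 uses for many of its syntax elements; CAVLC and CABAC build on the same idea of spending fewer bits on common values:

#include <stdio.h>
#include <stdint.h>

/* Unsigned Exp-Golomb, ue(v): the bit pattern is
 * [M zeros][the M+1 bits of v+1], where M = floor(log2(v + 1)).
 * Small (common) values get short codes. */
static void ue_encode(uint32_t v, char *out)
{
    uint32_t code = v + 1;
    int bits = 0;
    for (uint32_t t = code; t; t >>= 1)
        bits++;                          /* bits = floor(log2(code)) + 1 */
    int i = 0;
    for (int j = 0; j < bits - 1; j++)
        out[i++] = '0';                  /* M leading zeros */
    for (int j = bits - 1; j >= 0; j--)
        out[i++] = ((code >> j) & 1) ? '1' : '0';
    out[i] = '\0';
}

int main(void)
{
    char buf[72];
    for (uint32_t v = 0; v < 6; v++) {
        ue_encode(v, buf);
        printf("ue(%u) = %s\n", v, buf);  /* 1, 010, 011, 00100, 00101, 00110 */
    }
    return 0;
}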
MPEG-4/AVC was intended to deliver comparable quality to MPEG-2 at half the bitrate, and it certainly succeeds at low bitrates. At higher bitrates (near-perfect picture quality), you certainly would have been right about the Advanced Simple Profile of MPEG-4 (used in DivX, Xvid, etc.), but AVC should still be more efficient.
Incidentally, the MPEG-2 profile allowed on DVDs was picked to ease the work of the decoding hardware (saving costs for consumers) at the cost of compactness. The fixed resolutions, bitrate limitations (both max and min), and GOP limits make it much easier to build a compatible hardware decoder. Yes, they can sometimes significantly decrease compression, but they made early DVD players marketable. Within these significant limitations, studio-grade encoding software and technicians are PHENOMENAL at delivering maximum quality. If you're used to consumer-grade MPEG-2 encoding, something like the pro version of Cinema Craft Encoder is a revelation (an expensive one, though, at nearly $2K). See if you can sniff out a trial or demo, and compare the output quality to Premiere.
Re: (Score:3, Informative)
Yours is the kind of response that I hate getting the most. You obviously didn't bother to read my post all the way through, AND most certainly didn't follow the link I provided, where I explained everything in detail...
Yet you spent time on a lengthy, indignant reply, wasting both your time and mine with questions I've already answered in depth. It only makes it sadder to know that your pointless rant got modded up. Anyhow, I'm going to skip those which you could already have read
Re: (Score:1)
Oh no, not the scoffing of historians. That's almost as bad as the whispered derision of computer nerds.
It seems even this article has a few fictions. (Score:5, Informative)
GPU encoders won't be able to compete with CPU encoders until they either get a lot faster (in which case they'll compete in the "high performance" market) or get much better quality, since at sane settings x264 unsurprisingly blows Badaboom out of the water quality-wise, too. Until then, the product is not only completely proprietary but simply inferior, and they're going to have a very hard time marketing it.
Re:It seems even this article has a few fictions. (Score:5, Informative)
If you'd RTFA, you'd see this disparity is repeatedly mentioned, and they attempted to make a fair comparison.
In a direct comparison, using as close to the same visual quality settings as we could, Handbrake's circa February 2008 X264 codec actually beat the Elemental encoder by almost a minute. Image quality was roughly the same; we've included several stills below so you can directly compare the results.
Re: (Score:1)
constant unit time of media * (constant unit data / constant unit time) == inconsistent unit data
Re: (Score:2, Funny)
--aq-mode 0 --subme 1 --scenecut -1 --no-cabac --partitions i4x4 --no-dct-decimate in terms of the x264 command line... it's no wonder it's "fast" when they compare it to x264 on far slower settings!
Do I lose nerd points for this looking like Spanish?
Re: (Score:2, Funny)
This is what Spanish looks like:
"esto parece como español"
Re: (Score:2, Funny)
As long as you can read some Spanish text and it looks to you like assembly language for some long-dead processor, you retain your nerd points.
Give it time - it is CPU bound right now (Score:1)
The CPU usage of the program when used with a good video card is 25% on my quad-core machine, implying it is CPU bound right now (25% of four cores means one core is saturated). That means if they can get the CPU overhead down, even a little bit, they stand to see huge gains.
Re: (Score:3, Insightful)
Wait, what?
If the CPU were running at 100%, then it would be CPU bound. Perhaps you meant to say it's GPU bound?
Obvious (Score:5, Interesting)
This is the most obvious and boring insight they could possibly offer... Everyone with the slightest interest knows this already.
The low quality of hardware-based video encoders is a very well-known fact, and those MPEG encoder cards are just ASICs on a PCI card, almost exactly the same hardware as your video card.
The point of offering up APIs for GPUs, and of AMD's attempt to integrate the GPU ASIC with the CPU via HyperTransport, is precisely to improve that situation, however.
x264 does a good job because it's an open source project, with several skilled and interested individuals continually tweaking the code to improve quality and performance. Once hardware-based video encoding routines aren't hidden in closed-source firmware on a dedicated card, the same development effort can step up and improve HARDWARE encoding, exactly as it has with software.
Not only can quality be significantly improved, you can expect performance to improve significantly as well, even at greater quality. The initial implementation of any codec always performs relatively poorly and produces low quality, so this wouldn't even be an insightful observation if it were comparing x264 with any other software-based encoder... The only difference is that a new software H.264/AVC encoder would be SLOWER than x264, as well as much lower quality.
Compression isn't really parallel (Score:2, Informative)
From the CUDA guide:
"Every instruction issue time, the SIMT unit selects a warp that is ready to execute and issues the next instruction to the active threads of the warp. A warp executes one common instruction at a time, so f
Re: (Score:1)
One thing I notice, though, is that the SIMD instructions are used for modelling the data and creating statistical probabilities for what the next lot of data will be. Other aspects, such as the arithmetic/variable-length encoding, are very linear.
So it follows a loop
{
Get data block (linear)
Model data (SIMD-able)
Statistically Predict (SIMD-able)
Entropy encode (linear)
Write encoded data block (linear)
} while( there's data )
That entire loop mu
Re:Compression isn't really parallel (Score:4, Informative)
uh huh, tens of thousands of lines of asm....
~/x264-snapshot-20080812-2245/common/x86$ wc -l *.asm
165 cabac-a.asm
91 cpu-32.asm
51 cpu-64.asm
437 dct-32.asm
223 dct-64.asm
316 dct-a.asm
874 deblock-a.asm
659 mc-a2.asm
933 mc-a.asm
428 pixel-32.asm
1615 pixel-a.asm
600 predict-a.asm
383 quant-a.asm
968 sad-a.asm
519 x86inc.asm
124 x86util.asm
8386 total
Re: (Score:1)
Yeah, but those tens of thousands of lines aren't all hand-coded then, are they? It appears the developers have hand-coded only the ~8.4k lines listed above.
Still, that's a fair amount of assembly done by hand, relative to most modern programs written in 3rd- and 4th-generation languages (which might use only a handful of hand-coded assembly lines).
Re: (Score:1)
So if you have code that isn't SIMD-able, you are really only using 1/32 of the available threads in a warp for each branch of divergent code.
In addition to what's already been said, there are other techniques that can be used when your code does in fact need to branch. For example, you can take BOTH paths, and then later pick the result from the path you want. This is common when you have lots of parallel hardware, whether made for you in a GPU, or in hardware you're designing yourself, like an ASIC or FPGA. So if you have
if( A ) {
Z = B + C;
} else {
Z = B - C;
}
then you have instructions (or hardware) that perform B + C, separate instructions that perform B - C, and a select that keeps whichever result the condition A calls for.
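Here's a minimal branchless sketch of that compute-both-then-select idiom in plain C (the function name is made up); a GPU effectively does the same thing automatically when a warp diverges:

#include <stdint.h>

/* Both paths are evaluated; a mask derived from the condition picks
 * one of the two precomputed results, and no jump is ever taken. */
static int32_t select_add_sub(int32_t a, int32_t b, int32_t c)
{
    int32_t sum  = b + c;                 /* the "if" path   */
    int32_t diff = b - c;                 /* the "else" path */
    int32_t mask = -(int32_t)(a != 0);    /* all ones if a is true, else 0 */
    return (sum & mask) | (diff & ~mask); /* Z = a ? b + c : b - c */
}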
Apples and Oranges (Score:1, Interesting)
Comparing a GPU, a SIMD (single instruction, multiple data) vector processor, to a CPU, a superscalar sequential processor, is like comparing apples and oranges. Sure, they are both fruit, but they don't taste the same. Using the term 'general-purpose' to describe a GPU is pushing the limits of what a GPU is. Certainly it can run general-purpose programs, but it is much faster at running what it was designed to run: data-parallel applications. A GPU does not have to have a fast clock because it makes up for it b
Re: (Score:2)
To be fair, modern superscalar CPUs, particularly x86 (or x86-64), have extensively optimized SIMD units in addition to their sequential/general-purpose operations. The very reason the Core2 outperformed its Opteron counterparts is its much better SIMD performance. That generally means SSE instructions, but there are other options as well. A
Re: (Score:3, Informative)
You seem not to understand the difference (or that there is a difference) between multi-threaded programming and SIMD data processing.
The former requires dividing a single application up into independent parts (threads), where no one part needs to wait for the output of another.
Re: (Score:3, Informative)
Yes, the Core2 seems to have much better SSE units than the AMD chips, but this only really manifests itself when running code optimized to use SSE... And that's usually hand-optimized assembly, as compilers aren't generally good at generating SSE code yet.
John the Ripper's SSE2 mode on a Core2 is 2-3 times faster than the generic compile...
John the Ripper's SSE2 mode on an AMD (tested on a quad-core Phenom and dual-core Opterons) is slightly slower than the generic compile with gcc 4.3 and -O3.
The Core2 beats a si
Re: (Score:2)
GCC certainly isn't, but GCC is more or less the slow dog in the race. ICC does quite a bit better.
And it doesn't necessarily have to be hand-written ASM. Intrinsics seem to be gaining a bit more popularity in modern programs.
I'd bet a significant portion of the CPU-intensive programs out there, particularl
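For a taste of what those intrinsics look like, here's a small sketch (the function name is made up) using SSE2's PSADBW instruction via _mm_sad_epu8, which computes sums of absolute differences over 16 bytes at once; it's the kind of operation behind hand-tuned video-encoder SAD routines, and not something compilers reliably emit on their own:

#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* SAD of one 16-byte row in a single PSADBW. Assumes both pointers
 * have at least 16 readable bytes. */
static int sad_row16(const uint8_t *a, const uint8_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i s  = _mm_sad_epu8(va, vb); /* two 16-bit sums, one per 64-bit half */
    return _mm_extract_epi16(s, 0) + _mm_extract_epi16(s, 4);
}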
Re:Apples and Oranges (Score:4, Interesting)
This concept applies to many algorithms: the brute-force method is easily implementable on a GPU, but a faster and algorithmically smarter method is not well-suited to such an architecture.
Still in infancy... (Score:2)
It will take at least another 18 months before GPU encoding becomes seamless and the ideal solution for most users.
Intel is working on its own GPU, and I am sure they will promote its multimedia handling capabilities (video/Photoshop) as one of its selling points.
Re:Clone DVD Mobile (Score:5, Funny)
So you paid money for a GUI that selects command-line options?
I'm in the wrong line of work.
Re: (Score:3, Informative)
You mean, like these?
http://ffmpeg.mplayerhq.hu/shame.html [mplayerhq.hu]
I happened to look at ConvertXtoDVD the other day. While ffmpeg itself is licensed under the LGPL, ConvertXtoDVD also appears to use both libpostproc and libswscale, which are both GPL. The ffmpeg licensing page [mplayerhq.hu] states, "If those parts get used the GPL applies to all of FFmpeg."
I don't see any LICENSE.txt file nor any mention of the GPL or the LGPL in the version of the product I downloaded. Running strings against the binaries looking for things l
Re: (Score:2)
Not only that, but x264 is one of the very best H.264 encoders out there. You could compare it to most other CPU-based encoders and it would also come up trumps. Does that mean encoding on a CPU is better than encoding on a CPU?
Man, I wish it were LGPL instead of GPL, though.
Re: (Score:3, Informative)
This is done by having an extremely simple open-source wrapper which is statically linked to x264; the raw frames to be encoded are passed to it over a pipe by the main program. This completely bypasses the limitations of the GPL without violating its spirit, since anyone who wants to can still read the source code of the wrapper, modify it, recompile it as necessary, and still use it with the main application.
More to the point, that is exactly how proprietary software is supposed to interact with GPL software. See Mere Aggregation [gnu.org], especially the last paragraph:
By contrast, pipes, sockets and command-line arguments are communication mechanisms normally used between two separate programs. So when they are used for communication, the modules normally are separate programs.
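As a sketch of that separation (the wrapper name and its arguments are hypothetical here), the proprietary side needs nothing more exotic than popen: spawn the GPL'd wrapper as a separate process and write raw frames down the pipe:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* The proprietary app spawns a separate GPL'd wrapper binary (the
 * name "x264wrapper" is made up) and writes raw YV12 frames to its
 * stdin. Two separate processes, one pipe, no linking. */
int main(void)
{
    const int w = 720, h = 480;
    const size_t frame_size = (size_t)w * h * 3 / 2;  /* YV12 = 1.5 bytes/px */
    uint8_t *frame = calloc(1, frame_size);
    if (!frame) return 1;

    FILE *enc = popen("./x264wrapper 720x480 out.264", "w");
    if (!enc) { free(frame); return 1; }

    for (int i = 0; i < 300; i++) {
        /* ...decode/produce the next frame into 'frame' here... */
        fwrite(frame, 1, frame_size, enc);
    }

    pclose(enc);   /* closing the pipe signals EOF; the wrapper finishes */
    free(frame);
    return 0;
}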
Re: (Score:2)
I'm pretty sure that wouldn't fly very well in a court. You are still linking to the wrapper, which has to be GPL. IANAL, but you could argue that including the binary in your workflow is linking.
Re: (Score:3, Interesting)
Their wrapper is required to be GPL; but since they don't distribute it, the source-distribution clauses are not in effect.
Their commercial software pipelines frames into their wrapper; they are separate processes, not linked, and thus their use does not violate the GPL.
Otherwise you could argue that because you opened a Word document in OOo, Word was now required to be GPL because it had emitted data that was being consumed by a GPL application.
Re: (Score:2)
Well, if they don't distribute it, then the GPL indeed doesn't apply. But if they do, then an argument could definitely be made that the GPL would apply if the GPL code were an essential part of the software as a whole (i.e. it couldn't be replaced) and they were distributing both sets of code together as a single software suite. The GPL license doesn't say anything
They're not encoding, they're transcoding (Score:3, Informative)
They're not encoding video. They're transcoding it. They're starting from one compressed representation and outputting another compressed representation. (Now, with twice the artifacts!)
The good test for this is football. The players, ball, and field are all moving in different directions. If the motion compensation gets that right, it's doing a very good job.
Re: (Score:2)
I wouldn't say football's real challenge is motion, either: motion search is a rather simple part of most encoders and IMO definitely not the biggest challenge. The challenge o
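For reference, a bare-bones sketch (my own, names made up) of what a motion search does at its core: score every candidate offset in a window by the sum of absolute differences (SAD) and keep the cheapest vector. Real encoders use smarter search patterns (diamond, hexagon, UMH), but the inner metric is the same one x264's sad-a.asm accelerates:

#include <stdint.h>
#include <stdlib.h>
#include <limits.h>

/* SAD cost of one 16x16 macroblock against a reference block. */
static int sad_16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    int sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}

/* Exhaustive search over +/-range pixels; the caller must ensure the
 * window stays inside the reference frame. */
static void full_search(const uint8_t *cur, const uint8_t *ref, int stride,
                        int range, int *mv_x, int *mv_y)
{
    int best = INT_MAX;
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            int cost = sad_16x16(cur, ref + dy * stride + dx, stride);
            if (cost < best) {
                best = cost;
                *mv_x = dx;
                *mv_y = dy;
            }
        }
}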
Re: (Score:2)
No, they're encoding. Transcoding means you're reusing syntax elements from the original video to inform the encoder;
No, transcoding means decoding one format and encoding into another. You may have had a program or project that took advantage of shortcuts in that process, but those techniques are not part of the definition of the word transcode.
Forget GPUs (Score:2)
Re: (Score:2)
http://www.fixstars.com/en/company/press/20080403.html [fixstars.com]
Which Graphics Card? (Score:1)
Did anyone catch which GPU/graphics card they used? The article mentions they used a Q6600 ($185) as their test CPU, but it makes no mention of which GPU they ran with.
Did they run this on a 9800GT? 8800GT? 8600?
To make this a fair comparison, they should run the test on a system with a quad-core CPU and the lowest-end GPU for the CPU test, then run the same comparison on a low-end Intel CPU (the same price as that low-end GPU) and a GPU priced about the same as their Q6600.
This would fit better with