Hardware

Linux Gains AltiVec Support 179

Anonymous Coward writes: "Terra Soft today [Note: Thursday] announced development support for AltiVec (a.k.a. "Velocity Engine"), saying that Black Lab Linux running on a PowerPC G4 may offer up to a '150-300% increase [in performance], with some Linux applications running in excess of 10 times (1,000%) their normal performance.' The AltiVec-enabled Black Lab Linux offers the GCC compiler with support for the AltiVec C and C++ extensions, as well as Linux-kernel run-time support for AltiVec enabled applications."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    I heard that it is so fast that even the "sleep" command is 10 times faster!
  • by Anonymous Coward
    Does GCC support 3dfx and SIMD yet? I know some MMX support is there, but when can x86 see some speedup due to the new instructions.
  • How is the compiler going to know what is to be manipulated as a vector? Which matrix will benefit from SIMD optimizations? These kinds of structures require definitions to distinguish them from ints and floats so that the compiler can generate the appropriate code. It will introduce incompatibilities, but the tradeoff is worth it, especially for things like scientific applications and games.
  • by Anonymous Coward
    Minimum 150-300% to max 10X speedups are being claimed here. Great, so I checked the site and I am a bit disturbed by the complete lack of details.
    • Where are the benchmarks?
    • What was speeded up?
    • Alan Cox has done MMX opts to memcpy, how does this compare?
    • If AltiVec can do it so can MMX, 3DNow!, and Katmai. So are ports for these x86 vector ops also being done?

    So many questions and so little information.
  • Back when I was a Mac guy, I quizzed Apple reps about the amount of 68k code left in the Mac OS. Apple's line was that most of their effort was going into converting the most-used paths over to PPC, since it was the most productive use of programmer time. They never expected the Mac OS to be completely 68k free, since some of the code would never be worth replacing.

  • FYI, Solaris on Intel systems pretty much will now cost the same as Linux, thanks to Sun's new idea about licensing.

    That depends on one's idea of cost. Personally, I find installing Solaris on Intel costs a damn sight more than Linux, mainly because it is slower and comes without such basic tools as perl.

  • Perl will be, when released. Which saves you one build. When you can get it (I don't rate not-yet-shipping products).

    Of course, that still leaves you pulling down gcc so you can compile stuff, and all the weird and wonderful tools most people get used to.

    IME, setting up Solaris as a productive environment takes about a day starting from scratch. Setting up a productive free *ix environment takes an hour or two.

    Of course, free *ixen don't run on E10000s, but for workstation and small server use (the Intel world, in other words), Solaris has a pretty marginal value, if any.

  • The acronym is SIMD. Which stands for Single Instruction Multiple Data.

    http://www.whatis.com/simd.htm [whatis.com]

    Graphics programming uses lots of matrices and vectors to represent geometric elements. Often you have to scale a vector by a certain factor which involves multiplying each element by that factor.

    [ 2 4 5 6 ] * 2 = [ 4 8 10 12 ].
    Instead of multiplying each value by 2 using a separate instruction, you can multiply the entire vector by 2 with just one instruction.

    In a nutshell
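
To make the scaling example above concrete, here is a minimal sketch in C. It assumes a GCC- or Clang-compatible compiler and uses the `vector_size` extension as a portable stand-in for a real SIMD unit. The names `v4si` and `scale4` are made up for this illustration, and whether the multiply really becomes a single machine instruction depends on the target CPU and compiler flags.

```c
#include <stdint.h>

/* Four 32-bit integers packed into one 16-byte value.
   vector_size is a GCC/Clang extension, not standard C. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Multiply all four lanes by the same factor in one expression. */
static v4si scale4(v4si v, int32_t factor)
{
    /* Replicate the scalar into every lane first. */
    v4si f = { factor, factor, factor, factor };

    /* Lane-wise multiply: r[i] = v[i] * f[i] for i = 0..3. */
    return v * f;
}
```

Calling `scale4` on the vector [ 2 4 5 6 ] with a factor of 2 gives the [ 4 8 10 12 ] from the post, without an explicit loop in the source.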
  • cost savings to you? $0

    why? because both the machine and software are made by Apple, it's not like Apple spends any money giving you the OS on your machine, so BFD.
  • you might want to take a look at solaris 8 before you go off again. Perl is included, or at least it is according to [sun.com]
    http://www.sun.com/software/solaris/whatsnew.html
    --
  • In that case I'm sorry. However, if that was what you were trying to put across, what got quoted from the press release was badly worded. As for Apple, I'm sorry, I've just had bad experiences with them.

  • One hopes that they get a performance increase just by writing certain standard libraries like the x libs and mesa.

    One also hopes these people remember that all of those products, and gcc, are under the gpl, and that they're not the only ones with the right to use it. (Although if they have finished products now, this implies that Apple's been letting them fool around with it for longer than they've had the modifications public. Yet more fodder for the idea that they had more help from Apple, because they tend to stick more to Linux as a server rather than a desktop OS).

  • True, and remember that MMX was first...
  • AltiVec is the difference, so Apple (and IBM & Motorola, who make the chips) has plenty to brag about. Their hardware is mighty fine.

    So tell me, which apps are faster?

    Or is everyone just going off based upon a spacey press release?

  • 7400? Isn't that a quad TTL NAND gate? :) Is a G4 actually a PPC 740? (G3 = PPC 750, right?)
    #define X(x,y) x##y
  • pgcc can generate MMX code now. Check out The PGCC FAQ [goof.com]
    #define X(x,y) x##y
  • Damn, I should have stopped after the first sentence. A DM7400 is a quad NAND TTL chip. A G4 is a Motorola 7400.
    #define X(x,y) x##y
  • It's quite easy to put 3dnow instructions in your C source with gcc because GAS, the assembler knows about 3dnow instructions e.g. to add a couple of floats using 3dnow:

    #include <stdio.h>

    float a,b,c;
    int main(void)
    {
    /*int foo;*/
    /*foo=0;*/
    a=1.0;
    b=2.0;
    c=0.0;

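    asm("femms;");              /* clear MMX/3DNow! state before use */
    asm("movd a,%mm0;");        /* movd loads only the 32-bit float; movq would read 64 bits */
    asm("movd b,%mm1;");
    asm("pfadd %mm0,%mm1;");    /* packed float add: mm1 = mm1 + mm0 */
    asm("movd %mm1,c;");        /* store only the low 32 bits, so nothing after c is overwritten */
    asm("femms;");              /* leave MMX state before printf uses the FPU */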

    printf("%f\n",c);

    return 0;
    }

    I'm new to inline assembler, and not very experienced, but that hack seems to work.
  • A stupid question for the hardware gurus out there: what's a vector register?

    ___
  • Code is not specs or documentation.
  • Well that isn't quite true. Take a look at a Mac OS 9.0 system -- typically 45% of the code still uses 68k registers. Not quite as 68k free as Apple wants you to think.

    This still is a major improvement over Mac OS 8.0 -- which if I recall correctly used a lot more 68k code, probably around 60% of the registers were 68k. Still lots of cruft and old junk hanging around for compatibility (although Windows isn't much better -- look at micros~1.doc, 16-bit apps, the really H^H^H^ MS-DOS, etc.)

    The classic Mac OS relies heavily on 68k assembly, in some cases replacing it with newer PowerPC-native code would break lots of applications. And don't be shocked if Mac OS X contains some 68k code -- especially with the Blue (Classic) box.

    Take a look at macfixit.com for some stats on this...
  • I know this was posted by an AC, but this is obviously not a troll--this is humor. Get with it.
  • What is with the Javascript site ads in the middle of a theme? Tell me that we aren't going to have to tolerate commercial "announcements" in the middle of Slashdot threads. Yee Gods!!
  • FYI, Solaris on Intel systems pretty much will now cost the same as Linux, thanks to Sun's new idea about licensing... $75 for a media kit, or i believe you can just download Solaris from Sun for free, supposing you register. It's only free up to 8 CPU's, though... But if you're going beyond 8 CPU's, you're pretty much in desperate need of a real machine anyhow. In this context, real will mean Sparc, Mips, or PowerPC... Though not Mac hardware.
  • How much do you expect apple to take off the price tag for a machine without the OS? $50? Big savings...

    Why aren't other manufacturers jumping into the void that's waiting to be filled of stock G4 systems? Because overall, the demand isn't there. x86 will always be the commodity platform. Past that, and you need to buy the machine from a workstation vendor. In this case, Apple is at least a low cost provider of said workstations.

    Tell me, where can one buy stock PA-RISC systems? How about bare bone MIPS R10000 systems? Or even stock 32 CPU SPARC Systems? Nowhere that I can tell.
  • That's what i meant... cpu's as in chips... shoulda been clearer, i guess. But what i meant was if you need more than 8 CPU's, you really don't want to use intel boxes...
  • Not all PowerPCs have these registers, but the G4s do.
  • actually, the altivec vector units can do quite a bit. 160 instructions, iirc.
  • VLIW/EPIC is a BAD idea. i'm not sure what you mean by "schedulers in RISC chips having trouble keeping up". VLIW is a crippled architecture; no runtime scheduling, only static compile time (note that i'm ignoring majc+java right now -- majc is actually a very interesting arch). also, ia-64 is seriously flawed, in that it directly exposes the implementation to the isa (that's just one of many bad design decisions with ia-64, though).
  • Wonderful wonderful flame war.
    Carry on.


  • On your left, we see a gathering of trolls, who have sunk to the level of pointing out verbal faux-pas.

    Although, i'll concede one thing -- the "Summer grits, make me feel fine!" made me laugh.

    Bowie J. Poag
  • You've got it exactly backwards, so calm down.

    They're saying that the AltiVec Linux apps are [occasionally] 1000% faster than the non-AltiVec Linux apps. Linux is the common factor, not the difference. AltiVec is the difference, so Apple (and IBM & Motorola, who make the chips) has plenty to brag about. Their hardware is mighty fine.
  • hold the phone! The GCC had all the changes made to it. Linux is made to be a PORTABLE OS, and with a bit of work, will operate on any hardware with a MMU and (usually) an FPU. With your argument, there's no point in having Linux work on a Pentium chip, since it's nothing more than extra cache and some instructions more than two 386 chips wired in parallel. We want Linux to work as speedily and efficiently on the chip we're using it on as we can. And since we can omit unneeded code at compile time, bloat is impossible. Yes, the kernel tree can get a bit bigger, but it's not out of control. We're not trying to DO everything, we're trying to have the CAPABILITY to do ANYthing.
  • I do believe you're looking for an IBM RS/6000 server.
  • You are soooo wrong.

    AltiVec happens to be the ONLY way to do SIMD type instructions on the PowerPC architecture. The x86 architecture, OTOH, has several incompatible systems: MMX, SSE, SSE2, 3DNow!, etc. The PowerPC camp will never fall into this problem, since Motorola has licensed the technology to IBM. And in case you didn't know, only Moto and IBM actually produce PowerPC chips. So, on the PowerPC, it is AltiVec or nothing. And the great thing is, AltiVec kicks the crap out of any one of those SIMD systems on x86.

    If you really want to cut out some bloat, start with all those x86 SIMD systems. Maybe support plain-vanilla MMX only or something. But that would just suck, wouldn't it?

    Linux users want the best performance possible. AltiVec gives this to them. A few more kB of source to d/l is a small price to pay.

  • This is due to use of the vector registers right? Then does it really work on _all_ G4's? I seem to have some vauge memory that not all G4 cpu's have those registers.

    We have an old Fujitsu here at work, it does 40 Mflops without the vector registers enabled and 1500 Mflops with.... drool

    /das Ix
  • You should do a little research before insinuating that someone else is a bad programmer next time. The vector datatypes being talked of here are VERY different from the STL vector template class. These are datatypes that represent the fundamental 128-bit data in the Altivec instruction set, much like double typically means a IEEE 64-bit floating point number.

    Altivec adds the following data types to C/C++:

    vector unsigned char

    vector signed char

    vector bool char

    vector unsigned short -- a.k.a. vector unsigned short int

    vector signed short -- a.k.a vector signed short int

    vector bool short -- a.k.a vector bool short int

    vector unsigned int -- a.k.a vector unsigned long or a.k.a vector unsigned long int

    vector signed int -- a.k.a vector signed long or vector signed long int

    vector bool int -- a.k.a vector bool long or vector bool long int

    vector float -- 4 single-precision floats

    vector pixel -- 8 1/5/5/5 bit pixel elements (for graphics)

    The elements of the bool types can only be all zeros or all ones. These vectors are usually used as masks or selectors in certain Altivec calls. The pixel type is for representing 16-bit color pixels and handles overflow within the 1/5/5/5 portions of the pixel.

    This can all be found on pg 21-22 of the Altivec Technology Programming Interface Manual, which can be found on Motorola's site here [motorola.com].
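
As a rough illustration of how all-ones/all-zeros masks act as selectors, here is a scalar sketch in plain C rather than real AltiVec code. It is loosely modeled on the compare-then-select pattern the manual describes; the function names `greater_mask4` and `select4` and the four-lane array layout are assumptions made for this example, not the AltiVec API.

```c
#include <stdint.h>

/* Build a mask: lane i is all ones when x[i] > y[i], otherwise all zeros.
   This mirrors the rule that bool vector elements are only 0 or ~0. */
static void greater_mask4(const int32_t x[4], const int32_t y[4],
                          uint32_t mask[4])
{
    for (int i = 0; i < 4; i++)
        mask[i] = (x[i] > y[i]) ? UINT32_MAX : 0u;
}

/* For each lane, take b where the mask bits are 1 and a where they are 0. */
static void select4(const uint32_t a[4], const uint32_t b[4],
                    const uint32_t mask[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (a[i] & ~mask[i]) | (b[i] & mask[i]);
}
```

Because the mask is pure bits, the select needs no branches, which is what makes this style friendly to vector hardware.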

  • That is, certain PPC Linux apps with Altivec perform 1000% faster than without Altivec.

    What did you think they were talking about?
  • This has been gone over before. The reason they won't do it is because they are afraid of being sued for the inclusion of copyrighted, patented, or trademarked material into Darwin without the ability to pull it.

    Take the DeCSS thing. If Apple had been the originators of code that had had DeCSS tacked in, without the ability to perform fire control and remove the offending code without possibility of someone having said code with Apple's permission (as given in the GPL), then Apple could be sued for their open sourced code. Linux, as a system with more decentralized ownership over the code is a much harder to hit target than a large money-rich corporation like Apple. The potential legal losses outweigh the benefits. This way, they get much of the benefit of an open source model without the risk of being burned.
  • If I remember what I read a year or so ago, the compilers and libraries recognize C calls that look a lot like the assembler calls. In effect, you can call the assembly instructions like C functions. I believe there are also some libraries to do certain common vector tasks purely in Altivec. There are also additional data types to cover the different kinds of Altivec vectors (16 8-bit, 8 16-bit, 4 32-bit integer vectors and 4 32-bit FP vectors). At least, that's what I remember of the modifications done to Apple's exceptional MrC optimizing PPC compiler.

    I'm not sure that any of the kernel is enhanced, unless they've found a way to have the compiler optimize to parallelize some of the code, but this has been shown in the past to be a monstrously difficult task to accomplish, and is usually only applicable on small sections of the code.
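
The lane counts mentioned above all come from slicing the same 128 bits in different ways. A small sketch in plain C shows the idea; the union `vec128` and the helper `add_u8` are hypothetical names for this example, the code assumes a 32-bit `float`, and the wrapping behavior on overflow is a choice made here, not a statement about any particular AltiVec call.

```c
#include <stdint.h>

/* One 128-bit value viewed with the four lane layouts mentioned above. */
typedef union {
    uint8_t  u8[16];   /* 16 x 8-bit integers  */
    uint16_t u16[8];   /*  8 x 16-bit integers */
    uint32_t u32[4];   /*  4 x 32-bit integers */
    float    f32[4];   /*  4 x 32-bit floating point values */
} vec128;

/* Add two vectors as sixteen independent 8-bit lanes.
   Each lane wraps around on overflow without touching its neighbors. */
static vec128 add_u8(vec128 a, vec128 b)
{
    vec128 r;
    for (int i = 0; i < 16; i++)
        r.u8[i] = (uint8_t)(a.u8[i] + b.u8[i]);
    return r;
}
```

A vector unit would do the sixteen additions in `add_u8` as a single operation; the loop here only spells out what happens per lane.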
  • IBMs CHRP is supposed to allow you to buy one like that. last announced date was a coupla months ahead i think.
  • We provided Absoft with our AltiVec RPMs and I do believe that their compiler is indeed AltiVec enhanced.


    Regards,
    Dan

    Dan Burcaw
  • It's not like we tell Motorola what to say in *their* quote ;) Anyway, my point was that Motorola and TSS worked on this. In fact, I think Apple's own AltiVec egcs/gcc patches are from Motorola (plus their own additions I suppose).

    By the way, what do your personal experiences with Apple have to do with Apple helping or not helping Linux companies? All I can figure is that you're a BeOS user :P

    Regards,
    Dan

    Dan Burcaw
  • um, MMX is also x86 SIMD. It's not floating point, tho... and neither of KNI or 3DNOW are clones of the other... Vector processors have been around for a while.

    Oh, and KNI is 128 bit, while 3d NOW is 64 bit - in this case, twice as many bits is twice as fast.

  • I coulda sworn that it operated on 128 bit quadfloats... but I could be wrong. The way it does it is irrelevant.

  • CPU as in the actual processor is what you're referring to here. The license doesn't allow for a single box with more than 8 processors in it. Straight from Sun's page, "For only the cost of media and shipping, you can use the software on an unlimited number of computers with a capacity of 8 or fewer CPUs." Just wanted to clear some confusion up.
  • Ask abit or asus to make one. No one is stopping them. See here [openppc.org] for more info.
    --
  • SPECINT95

    Compaq Computer AlphaServer ES40 Model 6/667
    Result: 40.0 Baseline: 35.6
    (DEC Alpha 21264A 667 MHz, 4GB RAM, Tru64)

    Digital Equipment AlphaStation 200 4/166 Result: 2.31 Baseline: 2.31
    (DEC Alpha 21064 166 MHz, 64MB RAM, Digital UNIX)

    Dell Computer Dell Dimension XPS Pro200n
    Result: 8.08 Baseline: 8.08
    (PPro 200 MHz, 64MB RAM, NT4)

    Dell Computer Precision WorkStation 420
    Result: 38.9 Baseline: 38.2
    (Intel Pentium III "Coppermine" 800 MHz, 256MB RAM, NT4)

    Dell Computer Precision Workstation 610
    Result: 24.3 Baseline: 24.3
    (Intel Pentium III Xeon 550 MHz, 256MB RAM, NT4)

    Intel Corporation Intel VC820 motherboard
    Result: 38.4 Baseline: 37.9
    (Intel Pentium III "Coppermine" 800 MHz, 128MB PC800 RAMBUS RIMM, NT4)

    Sun Microsystems Ultra 80 Model 1450
    Result: 19.7 Baseline: 16.2
    (450 MHz UltraSPARC-II, 512MB RAM, Solaris 7)

    IBM Corporation RISC System/6000 H70
    Result: 16.0 Baseline: 13.7
    (340 MHz PowerPC RS64-II, 2496MB RAM, AIX 4.3.2)

    IBM Corporation RS/6000 44P-170
    Result: 25.3 Baseline: 23.5
    (400 MHz PowerPC-II, 1GB RAM, AIX 4.3.3)

    Source: specbench.org

    SPECINT2000 is too new. There aren't enough submissions yet.

    This is all for single CPU workstations. I dunno. Motorola doesn't seem to believe in submitting benchmarks to SPEC, so I had to use some older RS6000 systems running AIX. IBM doesn't seem overly interested in submitting benchmarks, either.

    For my money, I think I'll go with an Intel or Compaq/DEC solution. Sure, the Sun and IBM workstations scale like hell, but they cost ten times as much as an Intel solution. I couldn't possibly see using Intel boxes as enterprise servers, but for workstations, they seem to be tops. If the DEC Alpha was cheaper, I'd go with that. As it is, I just bought a brand new Multia (166 MHz DEC Alpha 21064) for $150. It's hard to beat that. 64 bit computing at the speed of a Pentium 100 (integer) or 200 (floating point), for practically nothing. It should be upgradable to the 233 MHz 21064, as well. We'll see...

    Of course, Intel systems suck at floating point, so I didn't bother to cut and paste that. We all know that Intel would come in dead last in that benchmark. Your only choice is the Alpha.

    I'm not quite sure where the new PowerPC processors fall. They're more expensive than Intel Coppermine chips, but there's little chance they can scale or perform better than the other entry-level solutions.
  • I agree. Quake 3 is a great benchmark if you're mainly going to be playing Quake 3. It's also a very good benchmark of total system performance: video, CPU, memory, etc.

    I'm most interested in pure CPU speed, though. Given a PCI motherboard, I can put whatever hardware I want in it. I feel kind of sorry for the Mac owners, locked into Apple/ATI hardware. It's really quite sucky. I just want the Motorola CPU. I couldn't care less about the rest of the Macintosh. I would just throw everything but the CPU (and maybe motherboard) into the trash.

    It's probably best to forget about Motorola hardware and save up for your very own Compaq/DEC Alpha 21264. Those fuckers are expensive!!
  • Okay, so where do I buy a single or dual processor Motorola PowerPC motherboard with 5 or 6 PCI slots, 2 serial, 1 parallel, and perhaps a USB or Firewire port?

    I don't see any of them on the market...
  • Motorola doesn't seem to believe in submitting benchmarks to SPEC, so I had to use some older RS6000 systems running AIX.

    The SPEC benchmark is for complete systems, not the CPU. Motorola doesn't make any computers, so they can't submit scores. It is Apple who should do that work.

    Kjetil T.

  • ummm.. KNI is actually two 64 bit functioning units operating in parallel. Intel had to do this so that they kept compatibility with MMX instruction registers... so calling it 128 bit is like calling a dual processor PentiumPro machine a 64bit architecture....
  • I'm designing a new language (for a school/Intel Science Competition project, but I hope it'll be good...), and I'm wondering how well people would accept an extensible language that allows vector support... but would be a departure from C. There's no way to extend C enough to elegantly support vectors; is it time to move on?

    (Yes, that's AltiVec assembler in my sig, it's a quine):

    Where is my mind?
    mfspr r3, pc / lvxl v0, 0, r3 / li r0, 16 / stvxl v0, r3, r0
  • Ey!

    I've still gotta go explain that pseudocode I left off on, in the middle of the message...
    I'll go do that.

    Where is my mind?
    mfspr r3, pc / lvxl v0, 0, r3 / li r0, 16 / stvxl v0, r3, r0
  • You don't have to. AltiVec allows fp and int operations to be performed on vec registers. You are allowed to take a value in an int register and replicate it to fill all the spots on a vec register, tho.

    Where is my mind?
    mfspr r3, pc / lvxl v0, 0, r3 / li r0, 16 / stvxl v0, r3, r0
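
The "replicate it to fill all the spots" step is usually called a splat. Here is a minimal scalar sketch of the idea in plain C; the type `vec4i` and the functions `splat4` and `add4` are invented for this illustration and are not AltiVec intrinsics.

```c
#include <stdint.h>

/* Hypothetical stand-in for a register holding four 32-bit integers. */
typedef struct {
    int32_t lane[4];
} vec4i;

/* Copy one scalar into every lane of the vector. */
static vec4i splat4(int32_t value)
{
    vec4i v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = value;
    return v;
}

/* Lane-wise addition of two vectors. */
static vec4i add4(vec4i a, vec4i b)
{
    vec4i r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}
```

Once the scalar is splatted, mixing it with a vector is just an ordinary lane-wise operation, for example `add4(v, splat4(10))` to add 10 to every element.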
  • I didn't realize mot makes stuff like that - that's one small-ass mobo!

    Where is my mind?
    mfspr r3, pc / lvxl v0, 0, r3 / li r0, 16 / stvxl v0, r3, r0
  • if they can just get rid of these friggin tiny keyboards

    I'm still using my old-skool ADB keyboard with my G3. Before the iMac, Apple shipped two types of keyboard, Design and Extended - basically the same thing. There was very little market for third party makers because unless you busted yours, it worked just fine.

    I personally feel that they got paid off from some of those third party manufacturers who come out with a standard USB keyboard that allows you to dump the cruddy iMac style board.

    this seems to be the thing most complained about with the current Macs, that and the long wait for OS X client

  • C++ has the STL vector types (also a matrix type, right?)

    C++ has a vector, in the sense of a variable size array, and also a valarray, which acts like a mathematical vector. But valarray sucks hard (the design is based on F77 and gives quite poor performance on modern CPUs). There is no matrix type in the ISO libraries, however, Blitz++ is a (big complex) math library in C++ - it does matrix and vector operations, all kinds of weird functions that I don't want to know about, etc, etc. You can find it on Google, it's very well known (it's GPL/Artistic, BTW).

    I'm sure that if this became well known and popular, the libstdc++ and Blitz++ people would add support for it in their code.

    Damn - 32 128 bit registers! I fscking hate x86!! I'm so jealous! :(
  • I feel kind of sorry for the Mac owners, locked into Apple/ATI hardware.

    Don't feel sorry for me. My G4 has a 2x AGP port and supports 3dfx Voodoo cards just as well as the ATI Rage 128 Pro it came with, using the beta drivers 3dfx released recently.

    3dfx even announced they will have official, release-quality support for their upcoming VSA-100 boards on the Mac - the Voodoo4 and Voodoo5. See mac3dfx.com [mac3dfx.com] for more info.
  • and this makes me a jackass because... ?

    Mike Roberto (roberto@soul.apk.net [mailto]) - AOL IM: MicroBerto
  • There's a way more powerful and .. efficient API. I think efficient would be the best way to put it, because from what I know, the libraries are way cleaner and do not produce crap that you'd get from something like some x86 libraries out there. These puppies can also take things like recursion and clean it up better than anything else as well.
    Mike Roberto (roberto@soul.apk.net [mailto]) - AOL IM: MicroBerto
  • Ok, you might have me there, but this is proof that the open source model will dominate, at least. Seeing that linux runs under open source model, linux will dominate and take best advantage of it.

    Mike Roberto (roberto@soul.apk.net [mailto]) - AOL IM: MicroBerto
  • short answer - no.

    Mike Roberto (roberto@soul.apk.net [mailto]) - AOL IM: MicroBerto
  • Most cool!! Linux has Supercomputer Power now!!
    Now just imagine a beowulf of these!!!!

    Sorry, I had to say it.
  • The instruction definitions for MMX and 3DNow instructions are already there in gas. You can make use of them by means of the asm() feature in gcc and egcs. I've written several MMX-enhanced programs using egcs. But, these compilers will not themselves issue the MMX instructions. Also, you need to be careful to manually maintain the proper relationship with other FPU usage.

    Problems with this approach are:
    - gdb does not understand the FPU registers. Debugging MMX code is a real chore. You need to store things into memory before gdb can see them.
    - It is up to you to decide when and how FPU registers need initialization.
    - You are working in assembler and need to understand how to properly use asm(), __volatile__, and the like.

    But it definitely works. I got reasonable speedups. MMX, 3DNow, etc. are noticeably inferior to AltiVec as an instruction set, but that has nothing to do with gcc and egcs. The asm() integration with the rest of the egcs compiler does make short MMX sequences quite reasonable. For longer code sequences it is better to write separate modules in assembler.
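
For anyone who wants to see what a short asm() sequence of this kind can look like, here is a hedged sketch using GCC-style extended asm. The function name `add4_u16` is made up, the choice of `paddw` is just one example of an MMX instruction, and the clobber list reflects one cautious way of telling the compiler that the MMX and x87 registers were disturbed. On anything that is not an x86 target with MMX enabled, the sketch falls back to a plain loop.

```c
#include <stdint.h>

/* Add four 16-bit lanes: out[i] = a[i] + b[i], wrapping on overflow. */
static void add4_u16(const uint16_t a[4], const uint16_t b[4], uint16_t out[4])
{
#if defined(__GNUC__) && defined(__MMX__) && \
    (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__(
        "movq  (%1), %%mm0\n\t"   /* load four 16-bit values from a      */
        "paddw (%2), %%mm0\n\t"   /* add the four values from b by lane  */
        "movq  %%mm0, (%0)\n\t"   /* store the four sums to out          */
        "emms"                    /* leave MMX state so FPU code works   */
        : /* no register outputs; the result goes through memory */
        : "r"(out), "r"(a), "r"(b)
        : "memory", "mm0",
          "st", "st(1)", "st(2)", "st(3)",
          "st(4)", "st(5)", "st(6)", "st(7)");
#else
    /* Portable fallback with the same result. */
    for (int i = 0; i < 4; i++)
        out[i] = (uint16_t)(a[i] + b[i]);
#endif
}
```

The `emms` at the end is the "proper relationship with other FPU usage" part: MMX registers share state with the x87 FPU, so the sequence has to hand that state back before any floating point code runs.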
  • There you are!!!

    What's up?

    I'll get back to you about AC/EC&Upla in a while, but for now my mind is fried. :)

    Dilbert: I have become one with my computer. It is a feeling of ecstasy... the blend of logic and emotion. I have reached...
  • There's a good article [arstechnica.com] at Ars Technica on SIMD architectures, including Motorola's Altivec.
  • This is actually really cool.

    The one thing that has really bothered me about apple was their marketing claims, since the apps had to be specially written to get their performance gain.. Now that you can get it under linux.. hmm..

    that's all
  • A reasonably accurate description of vector operations (or more accurately Single Instruction, Multiple Data operations), but is it truly necessary to bring in the linear algebra concept of a vector space? Strictly speaking, your definition is not even entirely correct or complete. You define b and c as doubles, while b and c are formally scalar quantities; it is entirely acceptable for b and c to be defined over the scalar field of the complex numbers. Moreover, it is not entirely true that all vector spaces can be represented as an ordered list of numbers. For certain vector spaces, some structure (ie. existence of an inner product) is lost when the representation of the vector space is coerced into such a form. You also fail to present the closure properties of formal vector spaces with regard to scalar multiplication and addition as defined over the vector space. In the future, please karma whore in a more accurate fashion.
  • Correct. Just as sales of Micro$oft Office fund development of the Sindows OS, sales of Mac hardware subsidize Mac OS 10 [apple.com]. I refuse to call it X [x.org] because it doesn't come with an X server [xfree86.org], only some Display PostS#t they call Quartz.
  • I seem to have some vauge [sic] memory that not all G4 cpu's have those registers

    When IBM was first putting copper into its PowerPC 750, they codenamed the project G4. But Apple put those into "G3" computers; people just called those "copper G3." What Apple called G4 was the PowerPC 7400, the chip with AltiVec aka Velocity Engine(tm). And the name stuck.

  • A vector space is a set of objects for which the following are true for all b, c, x, y, z:

    • double b, c; vector x, y, z;
    • x + y == y + x;
    • x + (y + z) == (x + y) + z;
    • x + (vector)0 == x;
    • x + -x == (vector)0;
    • 1.0 * x == x;
    • (b * c) * x == b * (c * x);
    • c*(x + y) == c*x + c*y;
    • (b + c)*x == b*x + c*x;
    All vector spaces can be represented by an ordered list of numbers. A typical "vector register" holds a four-dimensional vector as an array of four scalars (ordinary numbers).

    A vector execution unit in a processor can do the same thing to all four components of a vector, or do other predefined transformations. For example:

    • x + y is defined to be [x[0]+y[0], x[1]+y[1], x[2]+y[2], x[3]+y[3]]
    • c*x is [c*x[0], c*x[1], c*x[2], c*x[3]]
    • x dot y is x[0]*y[0] + x[1]*y[1] + x[2]*y[2] + x[3]*y[3]
    Essentially, vector hardware increases the speed of doing the same thing to a lot of data. If you still need help, look for "linear algebra" on Google [google.com] or any other search engine [ignifuge.com].
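
The three component-wise definitions above can be written out as a short scalar sketch in C. The function names `vec4_add`, `vec4_scale` and `vec4_dot` are made up for this example, and the loops only spell out what a vector unit would do to all four components at once.

```c
/* x + y: add the vectors component by component. */
static void vec4_add(const double x[4], const double y[4], double out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = x[i] + y[i];
}

/* c * x: multiply every component by the same scalar. */
static void vec4_scale(double c, const double x[4], double out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = c * x[i];
}

/* x dot y: multiply matching components and sum the products. */
static double vec4_dot(const double x[4], const double y[4])
{
    double sum = 0.0;
    for (int i = 0; i < 4; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Note that the dot product multiplies each pair of components before summing, so it collapses two vectors into one scalar, unlike the first two operations, which return vectors.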
  • Be thinks [be.com] Apple's not releasing specs, but Be's not R.E.ing [be.com] anything. What Be doesn't understand (I've mailed them about this) is that Apple provides the complete source [apple.com] for the kernel of Mac OS 10 [apple.com] (not X [x.org]).

  • Now if only they'd release Darwin [apple.com] (Mac OS 10 [apple.com]'s kernel) under [L]GPL instead of the dumb APSL [gnu.org], we'd have some good stuff.
  • Pentium III's KNI (your x86 simd stuff) is a (badly done?) clone of 3DNow!.

  • There are at least two mutually incompatible ways to do everything

    What's the other way to do SIMD on a PowerPC G4 chip? Anyway, if it's not used, it'll go unmaintained until someone picks it up. And the kernel (or a kernel-level module) is the right place for AltiVec, as it requires some low-level processor manipulation.

  • they added new primitive types and storage classes, like "vector", rather than bother to do loop vectorization in a compatible way.

    Did they do it the same way as the MacOS compilers? When the G4 Powermacs first came out, I took a quick look at some sample Altivec code on Apple's developer website, and thought the way they handled the vectors was pretty nasty looking... like to initialize a vector variable, you did something like vector v = (vector)(0x50147242, 0x72353233, 0xbedac0ed, 0x3aa10dab);

    Wasn't exactly like that, but the gist of it was that it looked like a cast of a list of constants separated by the comma operator. Eew :)

  • This story is about Linux, remember? If the kernel starts using AltiVec for memcpy, TCP checksumming, etc. all apps will benefit.

    Likewise, if some of the crucial libraries like libart and libjpeg get AltiVectorized then many apps will get faster with no changes.
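
To show the kind of loop being talked about, here is a plain scalar sketch of the 16-bit ones'-complement Internet checksum in the style of RFC 1071. This is not kernel code and not an AltiVec version; the function name `inet_checksum` is an assumption for this example. It is the sort of tight, data-parallel loop that a vector unit could process several words at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement checksum over a byte buffer, read as big-endian
   16-bit words. The 32-bit accumulator is enough for packet-sized input. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Sum the buffer two bytes at a time. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is padded with a zero low byte. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    /* The checksum is the ones' complement of the folded sum. */
    return (uint16_t)~sum;
}
```

Since addition is associative, the words can be summed in any grouping, which is why several partial sums can be kept in separate vector lanes and combined at the end.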
  • by Anonymous Coward
    I strongly disagree with you on this.

    The linux kernel isn't going to get "bloated" in any way. Sure, there might be more source to download, but the size is determined by what elements you want in it.

    Also, Altivec and other SIMD operations are handled by the compilers (when the code is compiled), not the kernel. The kernel could care less what SIMDs the chip has on it.

    OpenGL isn't handled by the Linux kernel either.

    And none of the above APIs are in any way connected to a kernel of any operating system known to mankind. Windows Mediaplayer stuff is an application. DirectX is a "layer" that goes over windows, so its clunkiness doesn't get in the way.

    SIMDs are good things. It lets you break off a bit from the chip's original instruction architecture and add new features, without breaking backwards compatibility. Sure, Intel, AMD, and Motorola use them for marketing reasons too, but those are just perks.

    They also enable you to set up dedicated data pipelines to a part of the chip that will in no way slow down or take away bandwidth from the other elements of the CPU. SIMDs are too complex and too good for somebody from the marketing department to dream up :)

    The downside of this is everybody has their own little SIMD. Using Open Source, we can just tell our compiler to compile it with whatever SIMD we have and it'll optimize the code for that. So it's not -that- bad.

  • Be's excuse is stupid. Apple is also a new company in that they're not spending money on stupid projects that lose them money. When a company is going out of business, do you think they're going to subsidize Be's development? Nope. Given that nobody else has had any problems with Apple's hardware, and Apple itself has brought forth not one but _2_ open-source operating systems, Be's argument is more than a little silly at this point.

    Be has some ass-kicking technology, but the fact is that they can't stick with any business plan for an extended amount of time. They figured it'd be financially more beneficial to switch to x86 (and may have been right, at the time at least; now they're moving to 'IAs'), but didn't need to use the "Apple stopped us" excuse. Why should Apple stop them? They make a hardware sale anyhow, which is better than someone buying a PC, right? They just weren't willing to subsidize Be's development.


    - Jeff A. Campbell
    - VelociNews (http://www.velocinews.com [velocinews.com])
  • Signatures have traditionally been an 'acceptable' place to put a link or short description of a person's site. It's no longer than anyone else's sig, so what's the problem?

    Now, if it were spam - ie. the entire post was made for the sole purpose of promoting the person's site - then that's another thing. I handle spam as part of my job and I don't exactly view it favorably. But this is just a sig...

    - Jeff A. Campbell
    - VelociNews (http://www.velocinews.com [velocinews.com])
  • Um, the 'X' stands for 10. I believe Latin predates the X Window System by at least a few millennia.

    And yes, sales of Mac hardware subsidize OSX development. What's so wrong with that? That's how companies work - they make money, and reinvest some of it into their operations. Big deal. Nobody is forcing you to buy Mac hardware - if you don't like it, buy something else. Do you flame any other make and model of car other than the one you drive?

    I can say this: Zealotry is never pretty.

    Also, it sounds to me like you don't know much about Quartz, either, as it has some decent features X would do well to emulate. Each system has its strengths and weaknesses.


    - Jeff A. Campbell
    - VelociNews (http://www.velocinews.com [velocinews.com])
  • The ATi hardware isn't locked down or anything. ATi provides VERY good colour control, which many publishers and graphic arts people adore; these people also adore their Macs. The G4 has a 2x AGP slot that will work with any AGP video card, provided the video card has Mac drivers and extensions. By September, when a lot of the new video chipsets are out, there will almost certainly be Mac versions of the hardware; these only require a different ROM and drivers.
  • Will MPPC chips become a commodity in the next year? One can hope so. I'm a big fan of the PPC architecture, and I'd love to see it become a little more widespread. It would be nice to click through buycomp.com and see MPPC 750s and 7400s along with motherboards for them. I think a great use for G3/G4 MPPC chips on Linux (or just about any other free Unix) would be media production. High-powered graphics workstations are getting more common but they are still high-priced pieces of equipment; the media companies have only just been able to afford them in larger numbers due to their relative success. The free Unices make a real good bed for media to come visit. Open-sourced kernels lend themselves to a good deal of optimization, which will result in faster system performance in an area where time is money. I don't know if Linux is mature enough yet, but FreeBSD on a render farm of G4s would kick some ass, like the one from The Matrix but larger (and on MPPC 7400s).
  • Most cool!! Linux has Supercomputer Power now!!

    (tongue-in-cheek, yah, but cool nonetheless)

    I'm wondering... what do the C extensions look like? C++ has the STL vector types (also a matrix type, right?) but C just has arrays of int/float/double. Is there an API reference anywhere? Is an API even involved? What would AltiVec-enabled C code look like?
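    As a rough idea of the shape such code takes, here is a sketch that adds two float arrays four elements at a time. It deliberately uses the portable GCC/Clang `vector_size` extension instead of the AltiVec-specific interface, so it is only an approximation of what AltiVec-enabled source looks like; the names `f32x4` and `add_floats` are hypothetical. With the AltiVec extensions proper, the same step would be written with the `vector float` type and intrinsics such as `vec_add` from `<altivec.h>`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name: four packed floats in one 128-bit value. */
typedef float f32x4 __attribute__((vector_size(16)));

/* out[i] = a[i] + b[i], four elements per step, scalar loop for the tail. */
static void add_floats(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    assert(out != NULL && a != NULL && b != NULL);

    for (; i + 4 <= n; i += 4) {
        f32x4 va, vb, vc;
        memcpy(&va, a + i, sizeof va);   /* load without alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        vc = va + vb;                    /* one add covers all four lanes */
        memcpy(out + i, &vc, sizeof vc); /* store the four results */
    }

    /* Leftover elements that do not fill a whole vector. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

    The point is that the data type, not a library call, tells the compiler what to treat as a vector; everything else is ordinary C.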
  • by Pope ( 17780 )
    Intel's MMX, 3DNow!, and other extensions to the x86 instruction set demanded re-writes too, ya know.
    Now if they can just get rid of these friggin tiny keyboards!! (I'm typing right now on my new G4, and man does it suck)

    Pope
  • > x86 will always be the commodity platform

    The commodity platform for the desktop, sure. But it's kind of short-sighted to think that desktop computers are the hot commodity for the future, isn't it?

    It's amazing that x86 has lasted this long (read: the WinTel alliance was a stroke of genius), but the design is _old_ and stretched terribly thin. I say good riddance.

    --Mid

  • Can Altivec do register moves between the GPR, FPR and Alti-Vec unit without having to do a store/load to memory/cache? One of the real pains of PPC is that it can't do a direct 64-bit single-beat read/write without using the FP registers, but you can't directly manipulate them: you have to do two 32-bit stores, then a double float load, then a double float store to your 64-bit bus device. It's slow. PPC EC parts don't have floating point and have no method of accessing 64-bit devices except through cacheline fill/castout bursting, which is not very helpful for I/O devices.
  • Where do you get this cruft? Do you see an Apple quote in this PR? Nope. Motorola has been working on AltiVec patches for gcc too, you know. (It is their technology). And yes, we helped them (Motorola) out some.

    Go download the gcc patches and put them in "Phil-14's Linux OS". The GPL allows that, and we welcome it.

    Regards,
    Dan

    Dan Burcaw
  • The .ppc.rpm files on altivec.org are from us and are what is shipping with Black Lab Linux.

    altivec.org is basically the starting point for everything AltiVec, so we're putting the RPMs there and linking that site to our web page, etc.

    rpm -qi on those .ppc.rpm's should show our information in the Vendor and Distribution fields.

    Regards,
    Dan

    Dan Burcaw
  • I wouldn't be a bit surprised if someone is working on an optimized X that uses G4 altivec acceleration- that would seem to be a no-brainer.

    heh, with the X Project, i don't think ANYTHING seems like a no-brainer!

    Mike Roberto (roberto@soul.apk.net [mailto]) - AOL IM: MicroBerto

  • IBM is working on their new PowerPC Open Platform [ibm.com] boards, which are very cool. Capability for an arbitrary number of CPUs and runs on a PCI bus.
  • Not 3dfx. (3dfx makes the Doodoo, erm, Voodoo graphics cards. At least they open sourced Glide.)

    YM 3DNow!, the streaming SIMD extended instruction set AMD added to the K6 chips and that Intel copied in Katmai/PIII.

    BTW, SIMD = single instruction, multiple data. First, instruction decoding limitations produced RISC (reduced instruction set computer). Then the increasing popularity of graphics apps brought about SIMD (apply the same filter to a whole bunch of pixels). Clock speeds rose so much that even the scheduler in a RISC chip was having trouble keeping up, leading to VLIW (very long instruction word), used in Intel's Merced (Itanium) and (internally) in Transmeta's Crusoe [transmeta.com].

  • How easy are these C and C++ libraries to use?

    Are they saying any vector-type processing can be easily rewritten, and so lots of apps can be enhanced?

    I vaguely know AltiVec is cleaner than the x86 SIMD stuff, but can the same thing be applied to MMX, 3DNow!, etc.?

    What parts of their kernel gain performance?
  • You are either a bad C++ programmer or just haven't heard of these things: 'vector' is a member of the C++ standard template library and is therefore not added especially for AltiVec. The STL seems to be a good place to insert these assembler optimizations. Since the class abstraction is pretty high, you can do a lot of speed-increasing operations in the dark dwellings inside the classes. All applications written in standard C++ will benefit from this.
  • by Mondragon ( 3537 ) on Saturday March 25, 2000 @09:18AM (#1172672)
    The speed benefit comes from the processor, not the applications, as it were, although they must be optimized for it. Any OS that runs on a PPC 7400 (G4) can take advantage of the AltiVec instructions. So, x86 Linux won't get this benefit. An interesting thing to know would be whose patches to GCC these are (I haven't looked at the web site). Apple's version of egcs for Darwin has Altivec support (from Motorola, actually), and it's been available for a while. They're in the process of assigning copyright of 60,000+ lines of gcc patches to the FSF, so it should become part of the main tree at some point.
  • by Darchmare ( 5387 ) on Saturday March 25, 2000 @01:31PM (#1172673)
    ---
    a G4 system without the OS forcibly "bundled"?
    ---

    Probably around the time you can buy any VCR and have your choice of software bundled.

    It's Apple's hardware, and it's Apple's software. It's not like they're pulling a Microsoft here and forcing other companies to not bundle alternative operating systems - they _are_ the other company. Anyone out there is free to build their own PPC based machines with LinuxPPC preinstalled. It's not their fault that nobody has done so.


    - Jeff A. Campbell
    - VelociNews (http://www.velocinews.com [velocinews.com])
  • by john_boy ( 110600 ) on Saturday March 25, 2000 @09:45AM (#1172674)
    Altivec support has been in all of the 2.3.x kernels, but it hasn't done much yet -- only #ifdef'ed in a handful of lines of code. This is really quite cool; I'm already running Linux on a PowerPC 750 (the G3). My next machine will likely be a G4 or whatever's next.

    There's a good bit of info on the alti-vec and the G4 in this Ars Technica article [arstechnica.com] (that was slashdotted a while back).

    John
  • by Chris Johnson ( 580 ) on Saturday March 25, 2000 @09:26AM (#1172675) Homepage Journal
    Sweet :) and I for one am not surprised. 'Altivec' aka 'Velocity Engine' is a bunch of _general_ _purpose_ big-ass registers which are not shared with FP registers or hobbled unreasonably. PPC is already incredibly register-rich (what is it, 32 int and 32 FP and now 32 128-bit altivec registers? That can work like 192 32-bit registers; yes, you can treat them like divided address spaces, multiple values) versus Intel, which gets what, 8? 16? 32? and shares its vector processing with FP registers.

    Please, if anyone can flame my data and correct it I beg of you to do so ;) but I'm not a bit surprised that G4s are doing this. Altivec lends itself to big data operations, not just vector processing. Memory moves are faster 128 bits at a time, and so on. Screen blitting, likewise. I wouldn't be a bit surprised if someone is working on an optimized X that uses G4 altivec acceleration- that would seem to be a no-brainer.

  • by rillian ( 12328 ) on Saturday March 25, 2000 @10:30AM (#1172676) Homepage

    I had a lot of trouble trying to actually find this code. It may be in the yellowdog cvs [yellowdoglinux.com] but the server seems to be down, as is the ftp server [yellowdoglinux.com].

    They do say [yellowdoglinux.com] to go to altivec.org [altivec.org] to download the gcc and binutils. It's in the tools section behind a "you must sign up for our email forum [mailto]" form. The packages there include a new binutils, gcc, gdb, and libc to support the altivec extensions.

    Here are the direct links, for the curious:
