Start-up Could Kick Opteron into Overdrive 127
An anonymous reader writes "The Register is reporting that a new start-up, DRC Computer, has created a reprogrammable co-processor that can slot directly into Opteron sockets. This new product has the potential to boost the Opteron chips well ahead of their Xeon-based competition. From the article: 'Customers can then offload a wide variety of software jobs to the co-processor running in a standard server, instead of buying unique, more expensive types of accelerators from third parties as they have in the past.'"
Berkeley (Score:2, Interesting)
Re:Berkeley (Score:1)
Re:Berkeley (Score:2, Insightful)
Re:Berkeley (Score:5, Interesting)
You're quite right that these are not for you - they're for running highly specialised calculations (the oil & gas industries are mentioned in TFA).
They make some operations much faster (think of a hardware MPEG decoder: useless for most things, but much more efficient at the single thing it can do than a general-purpose CPU)
How does this affect cooling?
These things consume 10-20 watts compared to an Opteron's 80, so their effect on cooling is minimal (far less than adding the second Opteron that you propose)
Re:Berkeley (Score:2)
If that were the case, perhaps my Gentoo machine would be complete before Christmas :)
Re:Berkeley (Score:3, Interesting)
Re:Berkeley (Score:2)
This will fund terrorism by allowing us to transcode media files at an absolutely astounding rate*.
-nB
* Actually this looks great for the likes of LAL, Pixar, and other video shops. I'm a die-hard Intel fanboi (last used AMD on my 386SX-33) and this has me looking to buy a platform....
Didn't someone try this on the memory bus once? Someone by the name of neuron? Whatever happened with that?
-nB
Re:Berkeley (Score:2)
Re:Berkeley (Score:3, Informative)
There have been tons of add-on cards that do FFTs, TCP-offloading NICs, physics engines, or whatever you want. The problem is twofold. 1) These cards are expensive, or at the least non-free and non-standard compared to the rest of the computer, and need software sup
Re:Berkeley (Score:2)
They claim 10-20x the performance of an Opteron for specific tasks. They also claim 3x the price/performance of an Opteron.
Since it costs about 3x the price of an Opteron, and performs at least 10x better, their 3x price/performance claim seems pretty valid.
Of course, it needs to be programmed for highly specific tasks. But chances are that, if you're in the Opteron-buying market, you need it for highly specific tasks.
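The arithmetic behind that claim is easy to check; here's a minimal sketch using normalized, illustrative figures taken from the claims above (not actual pricing):

```python
# Sanity-check of the price/performance claim above (normalized, illustrative).
opteron_price = 1.0                  # normalize the Opteron to 1x price
opteron_perf = 1.0                   # ...and 1x performance
coproc_price = 3.0 * opteron_price   # "about 3x the price"
coproc_perf = 10.0 * opteron_perf    # "at least 10x better" (low end of 10-20x)

# Performance per unit of price, relative to the Opteron baseline.
advantage = (coproc_perf / coproc_price) / (opteron_perf / opteron_price)
# advantage works out to ~3.33, matching the claimed "3x price/performance"
```

At the upper end of the 10-20x performance claim the ratio would be closer to 6.7x, so 3x is the conservative figure.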
Re:Berkeley (Score:2)
So... (Score:2, Insightful)
Rendering comes to mind, but I'm biased [slashdot.org]. But I'm sure that a glorified graphics card isn't the most interesting use...
If these become popular enough, will we be seeing a back-end to GCC for this FPGA?
Re:So... (Score:5, Interesting)
- MINLP/MILP [wikipedia.org] (Wikipedia article is a bit weak) and Branch and Bound optimisation for things like pipeline routing, well selection etc.
- fluid mechanics for pipeline design
- geological data-mining for finding reservoirs
Those kind of jobs can have runtimes measured in days and weeks, so an accelerator could make a real difference.
Re:So... (Score:3, Interesting)
Seti stats... (Score:1, Offtopic)
(they'd need fans though)
I'm in the top 3% worldwide.. and so are the 18,055 people above me.
And I don't believe I'll ever see top 1%
Re:Seti stats... (Score:2)
Flight simulation might be fun. Not just the graphics: air turbulence, AI for other aircraft, birds, etcetera...
Re:Seti stats... (Score:1)
> And I don't believe I'll ever see top 1%
I also doubt that you'll ever see the top 1%.
If there are six billion (6,000,000,000) people in this world, then the top 3% is one-hundred-and-eighty-million (180,000,000.)
Andy Out!
Re:Seti stats... (Score:2)
Analog data analysis and general calculus, IMO (Score:5, Insightful)
I'd imagine you'll need to have the application compiled in such a way that it is aware of the additional processing capability, so it's not likely to be a plug-n-pray solution to your general game player's graphical wet dreams.
Optimize Audio (Score:2, Interesting)
Software instruments are a neces
Re:So... (Score:2)
Hardly. The languages for which gcc has front-ends (C, Fortran, C++, Ada, etc.) are heavily biased toward CPUs that process a stream of instructions: load, store, add, and, branch, compare, etc. The highly parallelized and pipelined designs that make FPGAs so much faster than microprocessors can't be expressed in software languages, and producing a good hardware design that is equivalent to a given C program is, well, a much harder problem than creating
Kick ass synth? (Score:4, Interesting)
Re:Kick ass synth? (Score:2, Informative)
Re:Kick ass synth? (Score:4, Interesting)
That's not true, at all. An FPGA will not be as good a general-purpose DSP as a custom-made DSP, but it will still be better than a CPU -- even the low-cost Cyclone II comes with 150 dedicated multipliers coupled with embedded memory, so they can do parallel multiply/accumulate at 700+ MHz. And these are the low-end FPGAs...
Now, if you're actually programming the FPGA using custom-designed circuitry optimized for the task you're working on, the FPGA will work a lot better than a general-purpose DSP, and be way ahead of an even more general-purpose CPU. That's why you don't see generic DSPs being used in heavy DSP work (say, in telcos), but custom and semi-custom ASICs, and FPGAs in smaller environments.
Re:Kick ass synth? (Score:2)
That's why you don't see generic DSPs being used in heavy DSP work (say, in telcos), bu
Re:Kick ass synth? (Score:2)
Why read a re-written press release (Score:4, Informative)
http://www.drccomputer.com/pages/products.html [drccomputer.com]
Quality? (Score:1, Offtopic)
A bit more accurate summary (Score:5, Informative)
That's a pretty cool niche.
Re:A bit more accurate summary (Score:1)
Re:A bit more accurate summary (Score:2)
Additional question : are there any generic driver templates for Hypertransport-based devices ?
Re:A bit more accurate summary (Score:2)
Re:A bit more accurate summary (Score:2)
Re:A bit more accurate summary (Score:1)
Re:A bit more accurate summary (Score:2)
Re:A bit more accurate summary (Score:2)
The reason Math Coprocessors (like the 487) got built into the die was because as more and more people used things like photoshop, the floating point performance came out of the realm of 'niche' and into the mainstream. All in all, most people would be better off with a second opteron in the server to hand out web pages and e-mail, rather than this which is more s
Re:A bit more accurate summary (Score:2)
price performance (Score:5, Funny)
Re:price performance (Score:2)
Re:price performance (Score:1)
Re:price performance (Score:1)
They didn't say "price/performance ratio" (Score:1)
As for why they hyphenated it, I can't answer that one...
Re:price performance (Score:2)
Which I would assume means price TIMES performance, or perhaps the dash should be taken literally as price MINUS performance.
Re:price performance (Score:2)
Er.... question (Score:1, Interesting)
"DRC's flagship product is the DRC Coprocessor Module that plugs directly into an open processor socket in a multi-way Opteron system," the company notes on its web site.
If you have an open Opteron socket on your multi-way box, wouldn't you probably achieve better performance by shoving another Opteron into there?
I mean, sure, I can see the benefit of having a co-processor customized to handle your specific workload. But another Opteron would likely run at multiples of the clockspeed of that thing, and it
Re:Er.... question (Score:4, Informative)
Clockspeed is not a measurement of performance unless you are comparing similar architectures. With FPGAs you can do everything in parallel, whereas microprocessors are inherently sequential. In effect, you can potentially complete hundreds of instructions per clock cycle, whereas a microprocessor will complete 2 or 3.
In practical terms, this product lends itself to compute intensive tasks such as signal processing, not data serving.
Re:Er.... question (Score:3, Informative)
With FPGAs you can do everything in parallel, whereas microprocessors are inherently sequential. In effect, you can potentially complete hundreds of instructions per clock cycle, whereas a microprocessor will complete 2 or 3.
True, but if the microprocessor's clock speed is hundreds or thousands of times faster than the FPGA, then you are even again. There's no clock speed for this device in the article, so we can't really compare.
Re:Er.... question (Score:4, Interesting)
Clock speed often depends on the circuit design put onto the FPGA. If you got your FPGA design running at even 100MHz (not unrealistic), you're maybe 30 times behind a general-purpose CPU. But not only are you running hundreds of instructions per cycle, but those instructions are specific to the application and probably many times more efficient.
It's probably not useful for making short-lived applications faster, but for seriously repetitive number-crunchy work like weather predictions, oil drilling, etc, where there are trillions of small-scale computations, the highly-parallel nature of the FPGA has great potential.
Also, if those small-scale computations need to interact for any reason, on-chip communication is far faster than any chip-to-chip could be. And that's happening in parallel, too.
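The clock-versus-parallelism trade-off above can be put into rough numbers. A quick sketch, where every figure is an assumption for illustration (not a spec of the DRC module):

```python
# Illustrative throughput comparison; all figures are assumptions, not specs.
cpu_clock_hz = 3.0e9       # a fast general-purpose CPU
cpu_ops_per_cycle = 3      # ~2-3 instructions completed per cycle

fpga_clock_hz = 100e6      # a realistic clock for a synthesized FPGA design
fpga_ops_per_cycle = 300   # "hundreds of instructions per cycle"

cpu_throughput = cpu_clock_hz * cpu_ops_per_cycle      # 9.0e9 ops/s
fpga_throughput = fpga_clock_hz * fpga_ops_per_cycle   # 3.0e10 ops/s

# Despite a 30x clock deficit, the FPGA comes out ahead on raw throughput.
ratio = fpga_throughput / cpu_throughput   # roughly 3.3x
```

And that's before counting that each FPGA "operation" is tailored to the application, so the effective gap can be much larger.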
Re:Er.... question (Score:4, Informative)
Re:Er.... question (Score:2)
The new thing is the hype(r)transport (Score:4, Insightful)
These guys claim their stuff is cheaper than more horsepower, and that you get the extra speed boost from HyperTransport (over PCI).
It clearly is a PR release that has been regurgitated by a lazy journalist, as I found few or no critical notes, something this product might deserve. For one thing, I don't see how they have solved the specialized software & programmers problem, or how they have really tackled the economies of scale: this thing costs a couple of grand, versus a couple of hundred for a top-notch AMD processor. The regular processor has dual cores and runs an order of magnitude faster than the FPGA. The scarcity of programmers who can write software for this thing adds another order of magnitude to the wrong side of the equation.
Roughly, the FPGA solution must be a thousand times quicker/better than the regular-proc-with-lots-of-horsepower solution. I don't see that happening soon.
OTOH, the rosy image of a computer that can render a Pixar animation in a few minutes, then the next minute be used as a realtime sound-processing thing or simulate a neural net with as many neurons as the human brain, makes the geek in me drool. Computer, tell me it isn't so!
Re:The new thing is the hype(r)transport (Score:1)
Cray XD1 has something similar (Score:1)
Re:The new thing is the hype(r)transport (Score:2)
Re:A thousand times faster - it's already here. (Score:2)
High end gameing? (Score:4, Interesting)
neural networks or java? (Score:4, Interesting)
But I'm a fan of neural networks, and I imagine that if such a coprocessor was programmed specifically to perform NN tasks, it could bring "brain simulation" a few steps closer - especially if many such coprocessors were put into the system.
Re:neural networks or java? (Score:4, Informative)
In the late '90s, I was burned in precisely such a start-up. We built an ASIC Java piggy-back byte-code CPU. It worked... as a proof of an idea. It didn't give much of a performance boost - at best in the 20-30% range. No one wanted it.
Re:neural networks or java? (Score:2)
Re:neural networks or java? (Score:2)
Why not? This sounds perfectly complementary to me since most Sys Admins also run on java.
Re:neural networks or java? (Score:2)
Azul does that, but it is a fully specialized hardware. No idea if you can take their core unit and transplant it into an Opteron socket.
What about sdram slots (Score:2)
An old method, not really suitable nowadays (Score:5, Interesting)
Another nice approach was the "swinging gate" RAM method in which you had two blocks of physical RAM in the same memory space. The main CPU filled one block with data, then flicked the switch so the co-processor could read that data while the CPU read the results from the other block, then put in new data for processing in the next cycle. Very easy to implement, much cheaper than FIFOs. It meant you could use a cheap DSP (from TI) in a system using a cheaper 8086 series processor for which you could get cheap tools and an embedded OS.
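The "swinging gate" scheme is essentially double buffering. Here is a minimal software sketch of the idea (the class and method names are mine, invented for illustration, not from any real DSP toolkit):

```python
# Minimal double-buffering ("swinging gate") sketch: the CPU fills one RAM
# bank while the co-processor works on the other, then the roles swap.
class SwingingGate:
    def __init__(self, size):
        # Two blocks of "physical RAM" mapped into the same memory space.
        self.banks = [[0] * size, [0] * size]
        self.cpu_bank = 0  # index of the bank the main CPU currently owns

    def swap(self):
        # "Flick the switch": CPU and co-processor trade banks.
        self.cpu_bank ^= 1

    def cpu_buffer(self):
        return self.banks[self.cpu_bank]

    def coproc_buffer(self):
        return self.banks[self.cpu_bank ^ 1]

gate = SwingingGate(4)
gate.cpu_buffer()[:] = [1, 2, 3, 4]   # CPU fills its bank with input data
gate.swap()                           # co-processor now owns that bank
assert gate.coproc_buffer() == [1, 2, 3, 4]
```

Because neither side ever touches the other's bank between swaps, there is no contention and no FIFO needed - just the one "gate" flip per cycle.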
Re:What about sdram slots (Score:2, Interesting)
Re:What about sdram slots (Score:1)
I've an idea! (Score:2)
Open protocols win! (Score:5, Insightful)
About Time! (Score:5, Interesting)
One of the first signs that PCs needed an FPGA or similar was hardware MPEG capture cards... They could do the job so much faster, and so much cheaper, than your primary CPU that the alternative is disappearing.
High-end graphics cards have been the most telling development. It's not that OpenGL is something magical, it's just that an ASIC can do many things so much better than a CPU that transferring much, much more raw data over the bus was still cheaper than actually processing it (despite the fact that interrupts are rather costly themselves).
PS2 clusters, crypto cards, hardware-accelerated NICs, SLI, all are a symptom of almost exactly the same problem...
The rising popularity of GPU programming made it extremely clear that there is a vacuum here. Using the video card isn't a very good method to accomplish this, just a stop-gap necessity. I thought from the beginning that FPGAs would become like the old math coprocessors, and have their own motherboard socket, but neither AMD nor Intel were stepping up to fill this clear need. Installing it into a normal CPU socket, to get around this apathy, is a very clever hack I hadn't thought of.
I expect, with popularity, it will be cheaper to put a custom FPGA socket on motherboards, rather than building a full-fledged SMP motherboard for the purpose. After that, who knows... Maybe FPGAs will go the way of the math coprocessors and get integrated into future CPUs.
I know if I was running ATI or NVidia (or Hauppauge, or Level5), I'd be very worried about this thing eating the most profitable segment of my market.
Re:About Time! (Score:5, Interesting)
Re:About Time! (Score:2)
Re:About Time! (Score:3, Informative)
Re: (Score:2)
Re:About Time! (Score:2)
Memory bank? (Score:2)
Re:Memory bank? (Score:2)
You mean Like the new Cray? (Score:1)
http://www.cray.com/products/xd1/index.html [cray.com]
Oh, BTW, a single 3U is around $45k. For certain memory-bound calculations and some sequential algorithms, HF FPGAs (high-frequency FPGAs) work well.
386 DX? (Score:1, Interesting)
I still have my 386/40MHz + coprocessor.
And yes, AMD called me about a year ago to come into their lab with my ancient relic.
Weitek? (Score:1, Interesting)
Also, there is a big fear of specialized hardware accelerators because they could be rigged in silicon, which you would never find out. With the functionality implemented in software on a general-purpose CPU, you at least have a chance to audit the code to find out if the SSL handling has some NSA backdoor added or so. You buy a Chrysalis Luna VPN booster PCI card and assuredly know Mossad reads whatever you transactio
Re:Weitek? (Score:2)
Choosing AMD Platform (Score:1)
FPGA vs. General Purpose CPU (Score:4, Informative)
Hardware, on the other hand, is massively parallel. All the "gates" (*) are running all the time. It's like multi-threading a program, taken to the limit of infinity. However, if designed correctly, this thing can scale beyond belief, since it's all parallel.
It's also important to note that it's a Virtex4 [xilinx.com] on that card. That's one hell of an FPGA, they sure aren't cutting any corners. I'm not sure which one they're using, but some Virtex4 chips have PowerPC processors at 450 MHz.
This is definitely a niche product for now, due mainly to the lack of people who can write code in Hardware Description Languages (HDLs). But if you can figure it out, and you have an application that works on a massive scale, this may be for you.
Oh, and for all you detractors who are saying "that thing only runs at 500 MHz! How is it supposed to be faster than my 2 GHz AMD chip?" You're forgetting one very important factor. Your AMD chip executes one instruction at a time, and the important instructions are surrounded by instructions whose sole purpose is to control program flow or move data back and forth. However, the XtremeDSP slices of a Virtex4 can each execute a multiply and an add in a single cycle, and there are up to 512 of them in the most hardcore Virtex4 chip, and other logic executing in parallel can control the "program flow" and ferry data back and forth across the bus.
*: Modern FPGAs are actually built out of SRAMs that can implement arbitrary logic functions. They're no longer arrays of gates, so to speak.
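The peak-throughput argument above is worth putting into numbers. A rough sketch, where the slice count comes from the post and the clock rate is my assumption for illustration:

```python
# Peak MAC throughput from the figures quoted above (clock rate is assumed).
dsp_slices = 512              # XtremeDSP slices in the largest Virtex4, per the post
clock_hz = 500e6              # assumed DSP-slice clock rate
macs_per_cycle = dsp_slices   # one multiply+add per slice per cycle

peak_macs_per_second = macs_per_cycle * clock_hz   # 2.56e11, i.e. 256 GMAC/s

# For scale: a hypothetical CPU retiring one MAC per cycle at 2 GHz manages
# only 2e9 MACs/s -- over two orders of magnitude less.
cpu_macs_per_second = 1 * 2.0e9
```

This is peak, of course; sustaining it depends on keeping all those slices fed with data, which is exactly what the parallel on-chip routing is for.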
Re:FPGA vs. General Purpose CPU (Score:2)
What a relief for the Linux crowd! We no longer have to imagine a Beowulf cluster of Vistas.
Might we ever have socketed Hypertransport GPU's? (Score:5, Interesting)
But what I find really exciting about this idea is that once the GPU is in the motherboard, I'm sure programmers would find an easy way to use all that logic to do calculations - say, media encoding. Heck, I know they are trying to do this with GPU's on cards, but this would be a much lower latency connection.
I wonder how this would affect total system cost. I mean, I know multi-socket mobos will always cost more, but then again, when the GPU is a chip instead of a card, that should bring costs down. Also, they could ditch all that PCI-e logic and those slots. Upgrading would definitely be cheaper, and can you imagine two socketed GPUs on the mobo running a Hypertransport version of SLi? That might be the fastest, quietest gaming rig ever!
Re:Might we ever have socketed Hypertransport GPU' (Score:2)
Re:Might we ever have socketed Hypertransport GPU' (Score:2)
Now, a programmable co-processor on a PCIe x16 card... I'd like to be able to encode a movie in five minutes.
Re:Might we ever have socketed Hypertransport GPU' (Score:2)
Now I'm confused. This sounds about like someone saying: "Now that they've got hybrid technology in cars, they should put it in trucks. Then we can take the trucks and make them smaller by removing the truck bed, and put more seats in. Maybe even put a car body on it..."
What do you think this FPGA is for, exa
Re:Might we ever have socketed Hypertransport GPU' (Score:3, Insightful)
WRT your vehicular analogy, there are people who buy cars and want to use them as trucks occasionally, and people who buy trucks but sometimes just use them as cars. It's no big deal.
Re:Might we ever have socketed Hypertransport GPU' (Score:2)
Furthermore, GPU's do require a fair amount of bandwidth, but are a lot mo
Hypertransport for general-purpose expansion? (Score:1)
10x - 20x performance? You betcha. (Score:3, Informative)
There's a market for GPU's on video cards running $1,200+... People that buy them won't be satisfied with standard GPU's no matter how fast their main processors run... The custom acceleration of graphics calculation makes it worthwhile.
Now, imagine doing massive calculations (think three blackboards filled with quantum physics equations) -- and you can see how some scientific/industrial applications would go ga-ga over this stuff...
On FPGAs as PC coprocessors -- latency rules (Score:3, Interesting)
http://www.fpgacpu.org/usenet/fpgas_as_pc_coproce
http://www.fpgacpu.org/log/aug01.html#010821-dimm [fpgacpu.org]
The latency to the FPGA fabric largely determines what kinds of coprocessing workloads are feasible.
When hypertransport came out, we (FCCM'ers) knew a HT-based lower latency interconnect should be possible. (Though I wouldn't call 75 ns +/- "low" latency -- that's a couple of hundred instruction issue slots, or a bit more than 1 full cache miss.) But DRC has gone and done it. I love the way it (apparently) just drops in and can even use that socket's DRAM DIMMs. Congrats to Steve Casselman and co.
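The "couple of hundred instruction issue slots" figure is easy to reproduce; here's the arithmetic, with the CPU figures being my assumptions (only the 75 ns latency comes from the post above):

```python
# How much CPU work fits inside one 75 ns round trip to the FPGA fabric?
ht_latency_s = 75e-9     # HyperTransport round-trip latency quoted above
cpu_clock_hz = 2.6e9     # assumed Opteron-class clock rate
issue_width = 3          # assumed instructions issued per cycle

cycles_lost = ht_latency_s * cpu_clock_hz   # ~195 cycles per round trip
slots_lost = cycles_lost * issue_width      # ~585 potential issue slots

# Consequence: offloaded work units must be large enough to amortize the
# trip, which is why latency, not bandwidth, gates coprocessing workloads.
```

The same math explains why PCI-attached accelerators (with microsecond-scale latencies) were limited to even coarser-grained offloads.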
Look! Data Flow! (Score:3, Insightful)
The best phrase to help the system design effort is data flow.
How does the machine chop up the task for the most performance?
The major problem in design is finding where to place the dotted line that says "cut here". Software mavens know this as refactoring, or partitioning.
The gotcha in development would be to ignore the internal architecture of the FPGA.
As a word of advice to the beginner, look carefully at the FPGA data flow, and try to decompose the algorithm ( or find a similar one) so that the data manipulation and movement fits the part as best as possible.
Just having an HDL is not enough, the neophyte hardware designer can easily write code that cannot be synthesised to work, let alone fit the part. A sensitivity to the underlying hardware is needed.
As an example of this, using hand crafted hardware design, Chuck Moore wrung several times the expected clock performance for a hardware Forth engine. A starting point for reading might be:
http://www.ultratechnology.com/cowboys.html
Using hand-crafting, you can get enormous processing gains, but the hardware and system designs have to be well understood.
Perhaps the GNU uber-geeks could handle the translation efforts to make a tool for the average application programmer, but until then the brave soul who tackles these efforts should be prepared to learn a lot of the edges of computer science, hardware, and system design. It's not a horrible job, just long. And the problem should be worthy of the efforts needed.
If you're going to require socket 940 platforms... (Score:2)
Re:If you're going to require socket 940 platforms (Score:2)
Speed (Score:2)
Re:Speed (Score:3, Insightful)
An FPGA doesn't equal dedicated hardware. It takes a performance hit (in some domains, a huge hit) in exchange for flexibility. It also requires code that supports it.
The first set of DRC modules will consume about 10 - 20 watts versus close to 80 watts for an Opteron chip.
People buying USD$5000 coprocessors, plus the cost of developing specialized code to use them, don't cut corners on the basis of their electric bill.
Fair points (Score:4, Insightful)
Also, power consumption matters if you've got a rack of these things in a small space and need to keep them cool. Five times as many systems might need a larger server room.
Re:Speed (Score:2)
Well, Cray is using FPGA's as dedicated co-processors in some of their supercomputers. So they can be quite fast indeed.
Re:Speed (Score:3, Interesting)
People buying USD$5000 coprocessors, plus the cost of developing specialized code to use them, don't cut corners on the basis of their electric bill.
You're doing the math wrong. For decent colo space, I pay somewhere around $150 per rack-unit year and $120 per amp-year. If the coprocessor is really 10-20x faster for my workload, I don't just save the half-amp on one coprocessor; I get the savings on
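Filling in that arithmetic with the rates quoted above (the workload sizing and power draws are my assumptions, for illustration only):

```python
# Rough colo-cost comparison using the quoted rates; sizing is assumed.
rack_unit_year = 150.0   # $ per rack-unit-year (quoted above)
amp_year = 120.0         # $ per amp-year (quoted above)

# Assume one coprocessor-equipped 1U box replaces ten plain 1U Opteron
# boxes (low end of the 10-20x claim), each plain box drawing ~1 amp.
plain_cost = 10 * (rack_unit_year + 1.0 * amp_year)   # $2700/yr
coproc_cost = 1 * (rack_unit_year + 1.5 * amp_year)   # $330/yr, a bit more power

savings = plain_cost - coproc_cost   # ~$2370/yr per replaced group
```

So the hosting savings alone recoup a meaningful slice of the module's price within a couple of years, before counting the hardware you didn't buy.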
Re:Comrades! (Score:1)
Well, imagine a Beowulf cluster of them