Low-cost Reconfigurable Computing (FPGAs)
Anonymous Coward writes: "People at the Chinese University of Hong Kong have developed a reconfigurable computing card which uses an SDRAM memory slot instead of the PCI bus. Measurements in the paper show greatly improved bandwidth and latency - why aren't more people using this idea?"
Linked directly to Postscript? (Score:1)
Re:Linked directly to Postscript? (Score:4, Informative)
Be gentle. And mirror it and post mirrors, please. Bandwidth costs, and I'm poor.
mirror (Score:2)
Re:Linked directly to Postscript? (Score:3, Informative)
2 Comments: (Score:2, Informative)
Try using Ghostscript/GSview, which will display the PostScript file directly. (A quick search on Google gave the following link, which should be useful):
GhostScript [wisc.edu]
Comment 2: I don't think this is ever really likely to be part of a 'consumer' system. FPGAs are great for
1) prototyping circuits that will later be implemented as an ASIC where the cost of "respinning" a chip is extremely high or
2) Situations where the system is only produced in very small numbers.
The main problems with FPGAs are that they are
1) Expensive!
2) Relatively small in terms of the gates they can implement, and
3) The clock speed that can be achieved is probably about an order of magnitude lower than an equivalent ASIC.
For many situations a multi-CPU system may be a much better option, and I certainly think that they'd be impractical for a mass produced system.
Simon
RAM-slot FPGAs (Score:4, Insightful)
modern processors are well-adapted to general computing tasks.
FPGAs (read: custom iron) might be good for a few specialized tasks (breaking 3DES, for instance), but most of us will be a lot happier on our UltraSparcs and Athlons and G4s.
Re:RAM-slot FPGAs (Score:2, Interesting)
It is very useful to have a chip do data gathering for a while, then reconfigure it to do a DFT on the data, then reconfigure it again to spit the results back to Earth through telemetry.
Re:RAM-slot FPGAs (Score:1)
custom applications (Score:1)
Re:custom applications (Score:1)
I can see these devices being included under the SMP concept. And guess what? Now you can have two processors of different architectures!
Albeit the custom processor would have to be somewhat simple depending on the size of the FPGA.
Re:custom applications (Score:1)
"A simple Linux device driver was developed which allows user mode programs to access the Pilchard hardware. Although this driver was tested only with Linux Kernel 2.2.17, ports to other operating systems and Linux versions should be trivial."
I have always found that for research-oriented stuff like this, Linux is the primary development platform.
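For what it's worth, user-mode access through a driver like that usually boils down to an open()/mmap() pair. Here's a minimal C sketch of the idea - the device node name, mapping size, and register layout are all made up for illustration, since the paper doesn't spell them out:

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Open the character device exported by the driver and map its
       register/memory window into user space. */
    int fd = open("/dev/pilchard", O_RDWR);   /* hypothetical node name */
    if (fd < 0)
        return 1;

    volatile uint32_t *fpga = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
    if (fpga == MAP_FAILED)
        return 1;

    fpga[0] = 0xdeadbeef;      /* write an operand into the FPGA */
    uint32_t r = fpga[1];      /* read a result back             */
    (void)r;

    munmap((void *)fpga, 4096);
    close(fd);
    return 0;
}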
Re:custom applications (Score:1)
did you even read the paper? One of the design goals was "uses the Linux operating system" (see page 2)
Re:RAM-slot FPGAs (Score:1)
As greater amounts of processing power become available, we will start doing more complicated things with them in our homes; Imagine generalized, easy-to-use programs which do (for example) CFD tasks. Anyone could roll their own body kit for a car, or a windshield for a recumbent bicycle, et cetera. With reconfigurable computing we can make better use of our computing resources, and optimize for such tasks.
OTOH, there's really no need for this kind of technology any more. Processing power is getting cheaper and cheaper all the time. Might as well just use it. In addition, multiprocessor support is growing, and multiprocessor systems are getting cheaper.
Re:RAM-slot FPGAs (Score:1)
That was my final year project in University.
Re:RAM-slot FPGAs (Score:1)
Yeah, and we've reached the point where you can break a motherboard in a tower case by the weight of your heatsink and fan.
If you ask me, this would be the perfect solution for gaming consoles. A console includes one of these, and the game comes with a configuration in mind.
Imagine decent AIs in games becoming more common, or on-the-side graphics rendering. How about easy emulation of other console architectures?
Remember the Final Fantasy movie? The skin on the people didn't look quite right, because the movie developers hadn't figured out how to quickly calculate the light diffusion and reflection in human skin. This neat little device would have solved their problem.
Re:RAM-slot FPGAs (Score:1)
So would simply having more processing power. The additional power is coming quickly. This neat little device might or might not have solved their problem, either. It depends on if it's a matter of having to do an awful lot of things CPUs are already good at, or if it's something that is easily solved with custom logic.
Also, you can only do so much custom logic at once. I used to work for a company which made (among other things) a chip which was used to model a very well known intel CPU in hardware, though at a fraction of the full speed. These chips had a very large die (close to the maximum size for a die at the time, actually, and this was not so long ago) and it took eight of them to model the chip. Even if you enhanced this by a factor of ten, it's probably still cheaper just to throw more CPUs at the problem, especially with the new bus standards cropping up, making multiprocessing easier and cheaper.
Re:RAM-slot FPGAs (Score:2, Interesting)
Picture all the algorithms required to ray-trace one pixel. Now picture all those algorithms made into one small portion of an FPGA. Now picture many many such portions occupying a single FPGA.
Suddenly you have a device that encompasses your entire process, and can execute that process much faster than a normal CPU can. To give you an idea of the scale involved, picture a thousand CPU instructions compressed into one clock cycle.
Kinda cool, huh?
Re:RAM-slot FPGAs (Score:2, Insightful)
The problem is, you can't parallelize this easily:
for each pixel in image
    for each primitive in scene
        if ray through pixel hits primitive
            for each light
                if ray from light to hit point doesn't hit something else first, calculate illumination
            calculate reflected color based on material
    write color to image
So let's say we equip each node of an FPGA with a unit that evaluates this loop. If we have 1,000 of these nodes, that means we can render 1,000 pixels at the same time, right?
Wrong. The scene is going to be far too large to store in each node. So each FPGA node is going to have to wander down the list of primitives in RAM to do its intersection tests. That is not fast.
Now sure, you can set things up so that all the nodes listen to one broadcast bus, all the primitives in the stream are listed off, and any node that finds a hit remembers it. After that you list the light sources, letting nodes calculate illumination at the hit, then let them process the material. Most likely they have to do some texture lookups here.
So sure, that's a way of reshuffling the loop order and doing a lot of tests in parallel, but the real truth is, if you use some sort of spatial hierarchy on a general-purpose CPU, it will be much faster.
Traditional Beowulf clusters are typically much better for this sort of thing, because each node can usually store a significant portion of the scene description locally, so there's no communication overhead that limits the parallelism.
The deal with Final Fantasy is they didn't take into account subsurface scattering. Only recently have good models for that surfaced, and the computation time is prohibitive.
Re:RAM-slot FPGAs (Score:1)
Re:RAM-slot FPGAs (Score:4, Informative)
Rather, "modern processors are well-adapted to general *serial* computing tasks." If you have a computation with an embarrassing amount of low-level parallelism (e.g. applying a filter to an image), you can either hope that streaming SIMD will come to your rescue, or you can burn an FPGA with an embarrassing number of parallel computation paths that implements the desired function. The FPGA would already win in many real-world computations, were it not for the fact that it's limited by the cost of getting the data on and off the chip over a slow data bus.
Re:RAM-slot FPGAs (Score:1)
Making them embarrassingly parallel would simply _kill_ serial processors speed-wise, but only where the process is parallel to begin with. In theory, you could make an equivalently parallel CMOS (non-reconfigurable) processor and jack the clock up much higher than an FPGA would tolerate.
Re:RAM-slot FPGAs (Score:2)
Re:RAM-slot FPGAs (Score:1)
So think FPGA co-processor, not CPU replacement.
Re:RAM-slot FPGAs (Score:1)
If you're usually plugging away at one task but occasionally need to do something different, then given the choice of increasing board space, having hardware powered when not needed, or installing a faster, more powerful micro just to do that one intermittent task, using the FPGA as a co-processor or even as the processor (there are plenty of IP cores around that can run embedded apps) can make a lot of sense.
For example, how about encryption? Gather data over weeks working as a slow, dumb 4- or 8-bit micro, get ready to send, switch the FPGA from its data-gather-and-compress configuration to a highly parallel encryption one, send the data securely, then go back to gathering in serial-processing mode.
Not a real app yet AFAIK, just thinking - rough sketch below.
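Roughly, the host/controller side of that duty cycle might look like this - load_bitstream(), gather_sample() and friends are invented names, stand-ins for whatever a real board would expose:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for this sketch. */
extern void load_bitstream(const char *file);            /* reconfigure the FPGA     */
extern uint8_t gather_sample(void);                      /* FPGA as slow 8-bit micro */
extern size_t encrypt_block(const uint8_t *in, size_t n, uint8_t *out);
extern void transmit(const uint8_t *buf, size_t n);

#define SAMPLES (1024 * 1024)
static uint8_t raw[SAMPLES], enc[SAMPLES + 64];

void duty_cycle(void)
{
    /* Weeks of slow, low-power data gathering... */
    load_bitstream("gather_compress.bit");
    for (size_t i = 0; i < SAMPLES; i++)
        raw[i] = gather_sample();

    /* ...then swap in a wide, parallel cipher core just long enough to send. */
    load_bitstream("parallel_crypto.bit");
    size_t n = encrypt_block(raw, SAMPLES, enc);
    transmit(enc, n);

    load_bitstream("gather_compress.bit");               /* back to gathering */
}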
Re:RAM-slot FPGAs (Score:3, Insightful)
This is a completely meaningless statement, because there's no such thing as a general computing task. Today's popular uses for computers developed as a result of the hardware's capabilities (which influenced the hardware's design, in an evolutionary feedback loop). We are only beginning to explore the uses of digital microcircuitry.
Modern processors and modern programming methods are well adapted to each other, so one should expect that unorthodox hardware would be difficult to program and give poor results. We just don't have the experience for it.
However, it becomes increasingly harder to get a consistent return on larger and larger surface-element counts with serial execution programming. Random memory accesses and conditional branching are discouraged in favor of "predictable" memory accesses and instruction execution, and greater and greater sacrifices of the illusion of serial execution are made in favor of efficiency. The advantages of parallelism grow as the chips grow, and reconfigurability at the level of the gate logic is the natural extreme we will likely tend toward as we figure out how to handle trillions of transistors in one device.
Can you really imagine current design trends extrapolated to instruction pipelines millions deep? Serial execution does not scale infinitely.
Re:RAM-slot FPGAs (Score:1)
Configurable Cards Already Featuring RAM (Score:2, Informative)
Chameleon VME Block Diagram [catalinaresearch.com]
Re:Configurable Cards Already Featuring RAM (Score:1)
Yeah but.... (Score:1, Insightful)
Starbridge Micro uses these for a supercompter. (Score:2, Interesting)
Starbridge == Scam? (Score:1)
Like I said, most people didn't buy into it, but the company is still around a year or so later, so who knows. Maybe they are selling some systems. Either way, they certainly aren't making much of an impact on anything in the computing world. Let's not forget that $15 million would get you a fuck of a lot of conventional CPU as well. It's enough to buy 10,000 high-end PCs and network 'em together (resulting in about the biggest Beowulf cluster in existence). And that wouldn't require any new programming technology.
I think these guys also claimed that they'd sell these things for $3-4k...
Why aren't more people... (Score:4, Interesting)
Another question I've had bouncing around in the back of my head is why no one uses MPEG decoder circuitry for MP3 playback. All the players I've tried, Windows or Linux, take 10-30% of the CPU for normal playback. This is unacceptable when working in big apps like 3D Studio Max, building a big app with make, or running big scripts. I have an old MPEG decoder card from a Creative DVD kit, and I believe my GeForce has MPEG decoding acceleration... How much trouble would it be to write a driver for Winamp that uses preferred devices like that?
Re:Why aren't more people... (Score:5, Informative)
I have also wondered why more people aren't using the memory bus for peripherals.... Seriously, the PCI bus can only offer so much (132 MB/s), which is certainly going to be a problem with anything faster than Gigabit Ethernet.
Because the memory bus is a memory bus, and NOT a peripheral bus! Peripheral busses have things like interrupts, address space configuration, buffering, bridging, hot-plugging, and long-term stability that memory busses are simply not designed for.
How would you like it if you couldn't use the latest whizz bang 8.4GB/s memory technology because some peripheral you bought a year earlier needs to be on a 4.8GB/s memory interface?
Anyway, PCI v2.2 (?) offers 533 MB/s in 64-bit, 66 MHz mode. And then there's PCI-X...
And show me a game that is PCI/AGP bandwidth limited once textures are uploaded to the GFX card anyway. Memory is cheap, use it...
Meanwhile, modern memory busses are upwards of 4.8GB/s. Imagine multiple machines strung together with that kind of bandwidth between them!
Unfortunately, those pesky laws of physics (like the speed of light) come in and put paid to schemes like this. While it may be possible to get that bandwidth between machines, the latency becomes a problem. Certainly not feasible as a memory bus.
Re:Why aren't more people... (Score:1)
How would you like it if you couldn't use the latest whizz bang 8.4GB/s memory technology because some peripheral you bought a year earlier needs to be on a 4.8GB/s memory interface?
It's all in the controller. Perhaps abandoning the other busses and rigging up the interrupts for a single bus would be best? Also, it seems that having several memory busses would solve the problem of speed dependencies. Multiplexing the south bridge to 2 or 4 separate channels should do it.
And show me a game that is PCI/AGP bandwidth limited once textures are uploaded to the GFX card anyway. Memory is cheap, use it...
The slowdowns in heavy geometry transforms are still a bottleneck coming from the processor, even with hardware T&L. And who's to say game programming techniques wouldn't take off with such flexible pathways to design against?
Speaking of the 'speed of light', you could use actual fiber optic network cables much nearer their capacity with a bus that fast dumping straight into the RAM, cutting out several steps (which is where the latency comes from in the first place) along the way. This would make clustered systems fly, and open up altogether new techniques as well.
Re:Why aren't more people... (Score:1)
I don't know if you realize this or not, but that's exactly the way PCs currently work, except the busses are different for different things: PCI bus, memory bus, CPU/cache bus, ISA bus, IDE bus, etc, etc, etc. Making all of the busses use the same interface would be insane - what's the point of having a 4.8GB/s modem port? And with the huge memory caches on video cards these days (32-64MB), you don't need all that much bandwidth (but AGP 4x provides plenty).
Modern PCs use different busses for different reasons; there's a lot more to consider than pure speed.
Speaking of the 'speed of light', you could use actual fiber optic network cables much nearer their capacity with a bus that fast dumping straight into the RAM, cutting out several steps (which is where the latency comes from in the first place) along the way. This would make clustered systems fly, and open up altogether new techniques as well.
Fiber optic memory?? Ai-ya! First of all, signals in wire travel at only about 66% of the speed of light, and secondly, even then you won't overcome the lag. Current RAM technology has lag measured in nanoseconds. And that lag needs to be absolutely constant.
A simpler design isn't always the best design.
3d game acceleration vs. saturated PCI/AGP? (Score:2)
PCI/AGP are great for uploading static textures, on that you are correct.
However, there's more data than that to saturate the bus:
* Procedural textures
* Vertex cloud animation (bones aren't always appropriate!)
* Swapping textures when insufficient video ram is available.
Any of these can cause bus saturation. While many games are following the Half-Life model (static everything, use matrix driven hierarchical bones animation), this creates pretty bland worlds.
If you want realistic water, more organically animated content, or more subtle animations, this bandwidth becomes critical. Vertex/pixel shaders regain some of this by allowing processing to be moved back onto the 3D GPU, but that only works for inherently procedural and low-order polynomial effects - data-driven or more complex procedural effects still need to upload obscene amounts of data!
I should also mention that accelerator card drivers optimize their pipelines for static textures; Unreal ran into this problem badly, and it continues to this day.
Re:Why aren't more people... (Score:1, Insightful)
Something like HyperTransport is a lot more suited to high-bandwidth clustering; unfortunately AMD has not designed a port for it.
But the powers that be have always resisted a cheap, high-bandwidth non-local interconnect; SCI has been kept down by the man.
The industry does not want us to have cheap clusters with the same interconnect bandwidth as the ultra-expensive heavy iron; there is too much money at stake.
Re:Why aren't more people... (Score:2, Informative)
I believe that some cards from Philips claimed MP3 acceleration, and there is no reason why the SoundBlaster Live chipset couldn't be programmed to do the same. (Philips, IIRC, doesn't support Linux, btw.)
Re:Why aren't more people... (Score:1)
Re:Why aren't more people... (Score:2)
Re:Why aren't more people... (Score:4, Informative)
Been there, done that. Most PCs prior to the 386 models used the ISA bus for both peripherals and memory. The buses were separated out in modern PCs for a reason: the laws of physics. At today's speeds, a memory bus can't be more than an inch or two in length. If you use your one free memory slot for I/O, you have no more memory expansion capability.
Re:Why aren't more people... (Score:1)
I'm curious, what hardware are you rendering on, where decoding an MP3 stream takes up 10-30% of your CPU? Running Winamp with the mini-vis set to 70fps and checking Task Manager in NT4 Workstation reports that Winamp uses 0-1% of my CPU. This can actually be read as 1-2%, since I am running a dual-processor Pentium II at 233MHz, with 256MB of 60ns RAM, an ISA SB64, and an old PCI TNT using old Detonator drivers, since the new Detonators break AVI playback for me.
As an alternative test, I fired up MXaudio on my SGI Indigo, which has a 100MHz R4000 CPU and Elan graphics, and decoding a 256kbit MP3 stream takes 35% of the CPU. (Not bad considering the age of the machine.)
Although I am a sick bastard and raytrace images on the Indigo and my 486SX laptop, I hope you have a slightly more powerful machine for 3DStudio. Perhaps the huge amount of CPU usage for your MP3 player is due to poorly written sound card drivers? I would seriously look into that.
Nforce (Score:1)
This only makes the point that the processor should be able to use the bandwidth better, and that the 8x AGP bus the chipset is getting is 16 times faster than the PCI bus you are referring to.
Let's hope that the next generation of point-to-point data buses is open enough to make adding an extra co-processor as easy as adding more storage.
VGA (Score:1)
Um... just what do you think AGP is?
Re:Why aren't more people... (Score:1)
Better sound card. (Score:1)
Re:Better sound card (or better processor) (Score:1)
Re:Why aren't more people... (Score:1)
Well, even back with my 400MHz K6-2, MP3 playback only took 4-5%. Now it isn't even noticeable (Task Manager reports between 0 and 4 percent).
If you're running Winamp, you might want to increase the 'buffer' or whatever (where Winamp puts the music after it decodes it but before it comes out of the speakers. I can't remember the term they used, and I'm too lazy to check... but not too lazy to write all this. Anyway)
You might have decreased the buffer to make the graphic EQ more responsive, but by increasing it Winamp actually needs to work a bit less. Just a thought.
too much hassle, unfortunately (Score:4, Informative)
Not that much of a hassle (Score:1)
Boards are cheap (approx $110 US for a 200k gate chip that could easily hold 4 processors and a lot more).
My own direction is interfacing stuff to my own processor that is based heavily on Jan's design. It's for purely personal use, and saying that you are running code (assembly language only
Jan's site is at fpgacpu.org [fpgacpu.org] if you're interested. There are lots of details about all sorts of issues on the site. Some technical, some not so technical. Have a look under GR CPU's or XSOC
Simon.
As a HW designer (Score:1)
Come on - if FPGA CPUs catch on in a big way, I can start writing my own paychecks, stop designing these boring circuit boards, and just concentrate on logic design.
Furthermore, FPGA is an embarrassingly *slow* technology. ASIC is the real custom/screaming-fast stuff. In an embedded environment, you usually use an FPGA for slow custom computational stuff; if you need 1-clock-cycle latency or 6GB/s bandwidth or anything like that, you need an ASIC. Just look at those motherboard chipsets - they're not FPGAs, no way.
From an R&D point of view, FPGAs are really nice to work with, though. You can play around with one to your heart's content until you get it right, and there's nothing to stop you from burning/flashing/uploading new logic every day if you want to. There are FPGA-to-ASIC tools that give you the capability to tape out that groovy logic design into a mass-market ASIC once you're sufficiently sure you got it right. Naturally, it's as slow as the FPGA would be, since otherwise it'd screw up your logic timings.
C programming for FPGAs (Score:1)
A Handel-C (C with extensions for parallelism and timing models, etc.) to FPGA development environment called DK1 was released by a UK company, Celoxica [celoxica.com], earlier this year.
There's an eval download for DK1 on their site.
Karma Whoring (PDF version) (Score:3, Informative)
PDF AVAILABLE (Score:1, Redundant)
PLEASE MIRROR! I don't have nearly enough bandwidth to withstand the Slashdot effect.
Re:PDF AVAILABLE (Score:1)
Done - mirror here [cam.ac.uk]. Should be enough bandwidth - a couple of megabits available - if not, I'll move it to a bigger box next door...
Re:PDF AVAILABLE (Score:1, Redundant)
Try to be a bit less anxious to punish.
Re:PDF AVAILABLE (Score:1)
Re:PDF AVAILABLE (Score:1, Offtopic)
Yes, that sounds acceptable. Just hate to see someone lose karma for trying to help out.
Speed and gates... (Score:4, Insightful)
All this said, unless some big breakthrough happens, we won't see our Athlon or Pentium IV systems replaced by these; the two main limitations of FPGAs are the number of available gates and the speed at which they operate.
While they've managed to increase the number of gates to something quite big (last time I read about this I think it was in the low millions - 1 or 2, but I can't be sure), this is enough to "emulate" microcontrollers or lower-end processors, but not enough for higher-end microprocessors. While eventually they will catch up, and maybe someone will do his thesis on emulating an Athlon on FPGAs, by that time we'll be at the 2nd or 3rd rev of post-Hammer processors, so it will look like being able to emulate a 486 today (granted, there could be some use in that, but none comes to mind right now... parallel processing? One Athlon can replace a zillion 486s...). Also, the development of microprocessors is going at a faster pace than FPGA technology. I am not saying this couldn't happen, but it would need a serious bump in the fab process and technology to reach GHz speeds and probably a few hundred million gates.
Still, it's a very interesting technology.
Re:Speed and gates... (Score:2)
Re:Speed and gates... (Score:2)
You can get PCI cards with FPGAs that interface to a digital simulator (like ModelSim or QuickSim). These are rather nice, since they shorten simulation times a hundredfold.
As for a reconfigurable device in a household appliance, it would certainly not be used as a microprocessor - that would be a criminal waste!
Rather, you would implement the time-critical part of your algorithm in synthesizable (RTL) code and dump that to the FPGA. There would not really be any need to ship programmable circuits to such devices; you already have one of those: your CPU.
Re:Speed and gates... (Score:2, Interesting)
FPGAs are great in prototyping, when you want to produce a relatively small number of devices in a relatively short time. The problem is, a conventional microprocessor is always going to beat an FPGA at a comparable VLSI design level in terms of flexibility, and an ASIC will draw less power and perform a specific task more quickly. The limiting factors for an ASIC are cost (minimum production runs are often in the thousands) and design-to-production latency.
If you want to emulate an Athlon, use an Athlon. :-)
Stuff like high-volume encryption (e.g. Rijndael) is well suited to FPGAs because you've got a shedload of data coming in for a relatively simple series of calculations. Note that the AES contestants were evaluated partly on their ease of implementation in FPGA-like devices.
This SDRAM link might be a useful thing to increase bandwidth between a conventional microproc/bus/memory system and multiple FPGAs, bumping up performance by a factor of 2-3 maybe (on bandwidth-limited computations), but it won't change the world. IMHO, obviously.
Check out my review of the FPL 2000 [suslik.org] and FPGA 2001 [suslik.org] conferences for summaries of the current research in FPGAs.
Adrian
These have been out for a while. (Score:1)
Re:These have been out for a while. (Score:2)
Just as good? (Score:1)
That's just fine when you can control exactly where the transistor goes on the die, but FPGAs throw a curveball in:
The components in an FPGA are, IIRC, arranged in one big, massive grid. While it's still possible, controlling what a transistor does by location is going to be much more difficult and time-consuming in the development process.
Don't forget that one FPGA is different from another, so you can't use the same 'ROM image' for different hardware. That's going to impact portability and development time.
Finally, don't forget that no matter how encrypted or secure the ROM image is before it gets flashed, it still has to be put into the hardware raw. Just build a virtual machine to intercept the flash data, and voila! You now have your (or your competitor's) CPU layout in a semi-readable format. Now to run it through an FPGA emulator...
When (not if) this all comes about, you'll probably have hackers trying to tweak the trace lengths in their CPUs.
What Linux is to operating system kernels, a future hacker group will be to CPUs.
Re:Just as good? (Score:1)
Yes, portability is a big issue, but at least there is the hope of porting between similar architectures (such as between a Xilinx 4000-series and the Virtex series). In the case of ASICs, well, what does portability really mean? It's completely fabrication dependent.
Finally, regarding security of ROM images: I know that Xilinx keeps the format and interpretation of the bitstream proprietary and confidential. This doesn't mean that it is impossible to figure out, just more difficult than inserting a VM and "voila!".
I for one welcome the chance to re-design the processor in my computer
Panda Project (Score:1)
Using memory slots for devices is a bad idea (Score:5, Insightful)
Using memory slots for devices is a bad idea. The interface is not designed for devices. There are no IRQ lines. The address space can be configured by the chipset to fall anywhere in the address space of the whole machine (your device may end up starting at 0). The address space may even be interleaved with other memory devices in other slots. And the next generation of memory will use a whole different interface, and most new motherboards will soon migrate to it with little concern for backward compatibility.
Re:Using memory slots for devices is a bad idea (Score:2, Troll)
Perhaps you didn't read the article very carefully, but they seem to have overcome some of the difficulties, and they also aren't presenting this as a general solution to all computing woes ever. This device is a prototype, and it is currently only set up to work on one motherboard type. What it does demonstrate is that for some applications (such as cryptography) this can be useful. The article specifically states that it can be useful for education, research, and a few other very focused tasks.
I can see an application where this is an aspect of the totally secure machine where all RAM is encrypted, and the only place that unencrypted data lies is on the silicon of the processor itself.
They aren't saying that the next sound cards should be made as DIMM socketed FPGAs. FPGAs only have a niche market currently, and almost none of the applications are for the average home user.
Re:Using memory slots for devices is a bad idea (Score:3, Interesting)
If you're trying to explore new coprocessor architectures, it's an interesting thing to try - certainly better than hanging coprocessors out on a PCI bus somewhere. Of course, these days, CPUs are fast enough that it's difficult to find applications that really need enough more horsepower than general-purpose processors can provide, but there are still enough edgy things to try that it could be worthwhile.
"People at the at Chinese University of Hong Kong" (Score:1)
--> Shouldn't it be "The People's University of China"?
Reconfigurable chips vs DSPs - MPEG encoding (Score:2, Interesting)
Re:Reconfigurable chips vs DSPs - MPEG encoding (Score:4, Informative)
Like any CPU, they are sequential devices: they load an instruction, decode it, execute it, and repeat. Though a modern DSP can parallelize many instructions, its resources are still statically allocated at design time. A DSP with two multipliers may perform at most two multiplications at any one time.
Using an FPGA, on the other hand, allows you to design the circuit from the ground up. Now if your algorithm needs to do 20 multiplications at a time, you can do so simply by building them on the device.
Using an FPGA is fundamentally different from using a DSP or microcontroller/processor. The latter is a finished circuit with an assortment of operators selectable by an instruction opcode. The former can be configured into any circuit.
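To put numbers on that: a 20-tap inner product like the one below needs 20 multiplies per output sample. A DSP with two multipliers issues them two at a time over ten cycles; on an FPGA you can lay down all 20 multipliers plus an adder tree and pipeline it to one result per clock. (Plain C only to show the dataflow - the point is what the hardware does with it.)

#include <stdint.h>

#define TAPS 20

/* 20-tap FIR output sample: 20 multiplies and 19 adds per result.
   A two-multiplier DSP needs ~10 cycles of multiply issue per sample;
   an FPGA can instantiate all 20 multipliers and an adder tree. */
int32_t fir_sample(const int16_t x[TAPS], const int16_t h[TAPS])
{
    int32_t acc = 0;
    for (int i = 0; i < TAPS; i++)
        acc += (int32_t)x[i] * h[i];
    return acc;
}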
FPGA CPUs at fpgacpu.org (Score:3, Interesting)
The main drawback is always going to be speed though - it's simply far and away more complex to have reconfigurable hardware than static h/w. The current "hot" CPU of any generation will almost certainly never be reconfigurable!
Simon.
Asus Board (Score:2, Funny)
Re:Asus Board (Score:2)
Re:Asus Board (Score:1, Informative)
Hong Kong" refers to "Chinese language" (which
is obvious from the Chinese name of the
university), so politics are irrelevant.
Wow (Score:1)
Cache memory bus... (Score:3)
IIRC, they had an expansion card that you'd attach to the cache slot near the original PowerPC CPU.
This way the new CPU would have all the memory bandwidth it needed to run at 400 MHz. 400 MHz in a Performa 6200... wow!
um, because 6 PCI slots, 2-3 SDRAM..duh.. (Score:1, Insightful)
Re:um, because 6 PCI slots, 2-3 SDRAM..duh.. (Score:2, Interesting)
Memory has become ridiculously large and cheap. 512MB boards are under $50. I'm sure there are people who need more than 1GB on a non-production machine (obviously production machines like database servers need all they can fit), but for most applications, by the time you need to fill the third memory slot on your box, you could just as well buy a new card that's 4X larger than the old one you're rolling out.
If you've only got two slots, you may have problems, but usually the main time you need the third slot is if you're upgrading a machine and want to keep the old memory as well. And sometimes you've got a board that doesn't have enough address lines or has a BIOS that doesn't understand them (my home machine *says* it can use 3x256MB memory, but it looks like I'll have to flash a new BIOS to do it, and so far 192MB has been plenty.) :-)
If you don't have something else special to do with the slot, like an FPGA, you might as well keep the old RAM - it doesn't hurt, and more memory is always usable as long as it doesn't force you to a lower speed. I recently added 512MB to a 128MB machine at work, giving 640MB. Bill Gates says that ought to be enough for anybody.
mirror posted (Score:1)
Suck that bandwidth up.
If Gates Were Reprogrammable.. (Score:5, Funny)
If Gates were reprogrammable, then we wouldn't be in this mess in the first place.
See my FPGA CPU News entries on this subject (Score:2, Informative)
On FPGAs as PC Coprocessors, redux:
http://www.fpgacpu.org/log/aug01.html#010811
On FPGAs as PC Coprocessors (1996):
http://www.fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html
problems with using RAM (Score:1)
Then again I hardly know squat about how this stuff works, apologies to those who know what they're talking about. (:
More people aren't using this idea because... (Score:1)
FPGA's: Resources and other random stuff. (Score:1)
If someone is looking to tinker with some (F)PGAs, I would recommend Altera's student kits and software [altera.com]. You can use the standard part schematics included, or you can define your own logic using VHDL. Only $150, or $105 if you're a Georgia Tech student.
Evaluating across paradigms (Score:2)
I am afraid that I must disagree with many of the comments posted concerning FPGAs. First off, FPGAs have been successfully demonstrated in the multiple-GHz frequency range using SiGe as a base material (Dr. Jack McDonald's group at RPI has done such an implementation with SiGe BiCMOS based systems). Further, the contention that FPGAs are "difficult" to program is, I believe, an oversimplification of the hardware/software relation in general. Is programming an FPGA more difficult than writing C++ code for a PC? Yes, but FPGAs are also significantly more powerful pieces of hardware than the current computer architecture.
For example, one of the most visionary uses of high-speed FPGAs would be to replace component cards in the PC of today. In a base system today, one typically has a video card, an audio card, and I/O-type cards (i.e. hard disk/floppy disk/CD-CDRW-DVD). Imagine now a computer that consists of a large number of FPGAs - essentially reconfigurable hardware. Now drivers can be reset on the fly, power-up-ready OSs with no boot time (using non-volatile configurations) become possible, and a host of other interesting and desirable properties follow. If you want to send email, the FPGA bank can reconfigure itself into a network or wireless Ethernet card. This has some significant advantages over the current paradigm.
Several readers commented that the adoption of FPGAs is not going to happen quickly (i.e. no development support) or that the problems with bus interface speeds are nontrivial. However, these difficult problems are not problems with the hardware, and attempting to interface it to the standard PC, however kludgy, is a rational approach. Criticizing the implementation here is a bit like telling someone that they should have used a Porsche instead of a Pinto to build a time machine.
I have seen the card itself (Score:2, Interesting)
Anyway, PCI FPGA cards have always been available, but they are hugely expensive - though they are coming down in price. One of the problems with FPGAs is that the chips are slow (it depends on the complexity of the circuit; you can only clock them to around 1GHz for very simple cores, and a lot slower for complex circuits), so considering the speed of microprocessors it is not worthwhile to use them in normal computer systems - but a new niche is opening up in the embedded market.
Notice the name of the university is right! The Chinese name of the university suggests that the "Chinese" in the university name actually means the Chinese language, not China the country.
L2 Memory Slot Used for CPU (Score:1)
We did it back in the days of Z80 (8 bit) micros (Score:2)
Everything old is new again... I wonder how many other ancient techniques would be useful now...
Anyone remember hardware memory swapping (bank switching)? You could take a machine with a measly 8GB memory limit and expand it to 256 banks of 8GB for 2TB of memory (assuming you could afford all that) with a single memory-mapped bank-select byte. Only one memory access cycle to swap 8GB.
Actually, that might be pretty cool.
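In C terms the old trick is just one store to a memory-mapped latch before you touch the window - all the addresses and the bank size below are made up, it's only to show how little machinery is involved:

#include <stdint.h>

/* Hypothetical memory map: an 8-bit bank-select latch plus the window
   that the selected bank appears in. */
#define BANK_SELECT ((volatile uint8_t *)0xF0000000u)
#define BANK_WINDOW ((volatile uint8_t *)0x80000000u)
#define BANK_SIZE   (64u * 1024u)    /* whatever one bank spans */

static inline volatile uint8_t *bank_ptr(uint8_t bank, uint32_t offset)
{
    *BANK_SELECT = bank;             /* one write = one bus cycle to "swap" */
    return BANK_WINDOW + (offset % BANK_SIZE);
}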
2 Questions for a FPGA guru… (Score:1)
When I first heard about FPGA tech a few years ago, I realised that while it presented some interesting "tricky" problems, I completely failed to think of a single application, both interesting and vaguely plausible, for which an FPGA would be the best practical way to go. Recently I had several ideas for which only I/O bandwidth seemed an unavoidable problem. Is there someone who can comment on the feasibility of implementing:-
Re:2 Questions for a FPGA guru… (Score:1)
There are a lot of things going on here. By "main store" do you mean main memory? If so, forget about it; DRAM is never going to be as fast as the SRAM in your cache. IMHO, capacitors just can't do it. Writing to the cache is a different story, but now you're talking about cache-control issues from having two devices dirtying up the cache. Maybe this wouldn't be such a problem, because the processor is going to know when it tells the "bit manipulator" to do its work.
I have no idea what kind of clock speed you can get out of different FPGAs. I could look it up, but I won't. An ASIC would probably be better for this - it's fast. The operation you're talking about is something like this (I think):
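(Reconstructing the snippet as a guess, in C - dst, src and mask are made-up names:)

    /* masked bit-field insert: clear the selected bits, then OR in the new ones */
    dst = (dst & ~mask) | (src & mask);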
So you need something that can do a word-length AND + OR in one clock cycle... Can you do it at 2GHz+? I have no idea...
Nothing new for the Chinese (Score:1)
Re:Nothing new for the Chinese (Score:1)
Don't know what the status of the boards are, I just took a little offense at being called a fat cat.
Low Cost? No, not really (Score:1)
Why use the memory interface? (Score:1)
At any rate, placing the FPGA on the memory bus allows the massive amount of communication required to reconfigure the board to occur in a reasonable time, and also effectively gives you parallel processing for free. You treat part of the FPGA board as a second CPU (but not a general-purpose one) and the rest of it as memory. When you've got an instruction stream that is better adapted to the FPGA's current configuration, you write the data to its memory, let it run, and you can use the CPU for other tasks. Since most algorithms you'd want to run on the FPGA have a deterministic time of completion (even searches, because you can bound the time it would take to search the entire data set), you just read the results off the FPGA memory after a little while.
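Concretely, the host side of that pattern would be something like the sketch below - the offsets, the start/done registers and the busy-wait are all invented, it's just the shape of "write operands, kick it, come back for the answer":

#include <stdint.h>

/* Assume the FPGA's DIMM window is already mapped at `fpga`
   (see the driver discussion elsewhere in this thread). */
enum { REG_START = 0, REG_DONE = 1, DATA_IN = 16, DATA_OUT = 272 };

uint32_t run_on_fpga(volatile uint32_t *fpga, const uint32_t *in, int n)
{
    for (int i = 0; i < n; i++)
        fpga[DATA_IN + i] = in[i];   /* write operands into FPGA "memory" */

    fpga[REG_START] = 1;             /* kick off the configured circuit   */

    while (fpga[REG_DONE] == 0)      /* bounded wait: the algorithm has a */
        ;                            /* deterministic completion time     */

    return fpga[DATA_OUT];           /* read the result back              */
}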
Anyway I haven't read the article yet, so I'll go do that now.
Re:Postscript file (Score:1)
[debianlinux.net]
Pilchard - A Reconfigurable Computing Platform
Wow... (Score:1)
dumbass moderators (Score:2)