Remote Direct Memory Access Over IP 166
doormat writes "Accessing another computer's memory over the internet? It might not be that far off. Sounds like a great tool for clustering, especially considering that the new motherboards have gigabit ethernet and a link directly to the northbridge/MCH."
Also (Score:5, Insightful)
Re:Also (Score:2)
That way, non-remote memory won't be accessible, and your data will stay your data.
Re:Also (Score:3, Insightful)
Re:Also (Score:1)
Re:Also (Score:2)
Re:Also (Score:2)
Re:Also (Score:3, Informative)
Some Objections to RDMA
Security concerns about opening memory on the network
- Hardware enforces application buffer boundaries
- Makes it no worse than the existing security problem of a 3rd party inserting data into the TCP data stream
- Buffer ID
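The "hardware enforces application buffer boundaries" point from those slides can be sketched in a few lines. This is a hypothetical illustration only: the class and method names (BufferTable, remote_write, and so on) are invented here, not part of any real RDMA API, but the idea is the same: remote peers get an opaque buffer ID, and every access is checked against the registered region before it touches memory.

```python
# Hypothetical sketch of buffer-ID boundary enforcement. All names here are
# invented for illustration; real RDMA NICs do this check in hardware.

class BufferTable:
    """Maps opaque buffer IDs to registered memory regions."""

    def __init__(self):
        self._regions = {}
        self._next_id = 1

    def register(self, size):
        # "Pin" a region and hand out an ID instead of a raw address.
        buf_id = self._next_id
        self._next_id += 1
        self._regions[buf_id] = bytearray(size)
        return buf_id

    def remote_write(self, buf_id, offset, data):
        # The hardware analogue: reject anything outside the registered region.
        region = self._regions.get(buf_id)
        if region is None:
            raise PermissionError("unknown buffer ID")
        if offset < 0 or offset + len(data) > len(region):
            raise PermissionError("write exceeds registered buffer boundary")
        region[offset:offset + len(data)] = data

    def read(self, buf_id):
        return bytes(self._regions[buf_id])
```

A remote peer holding a valid ID can still only scribble inside the region the application registered, which is why the slide argues this is no worse than TCP stream insertion.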
Re:Also (Score:2)
haha... outlook worm writers will have a field day (Score:5, Insightful)
S
Re:haha... outlook worm writers will have a field (Score:4, Interesting)
Scott McNealy said that, but the vision was implemented by others. CMU's Mach [cmu.edu] (1985), Andrew Tanenbaum's Amoeba [cs.vu.nl] (1986), and Plan 9 [bell-labs.com] (1987) were OSes that made a network into a computer.
To be fair, Sun does have ChorusOS [experimentalstuff.com] , but that seems to have died the death (i.e. gone Sun Public Source) despite Scott's best intentions.
rdma? (Score:5, Funny)
The security implications are staggering.
How do we lobby for port number 31337 for the RDMA protocol?
Re:rdma? (Score:5, Insightful)
And doesn't tcp/ip involve a lot of overhead for memory access?
Re:rdma? (Score:3, Interesting)
Land of the free, void where prohibited.
Security Implications (Score:1, Insightful)
I think that shared memory a
Re:Security Implications (Score:2)
But it's one thing to have this feature in the machines that make up a cluster running a big DB, and another thing to have it in every machine. The story said that MS is talking about putting it in every version of Windows, to help spread the technology's adoption.
Re:Security Implications (Score:2)
It sure means that MY hosting providers shouldn't be...
Re:rdma? (Score:3, Interesting)
Remote shared memory (Score:5, Informative)
Re:Remote shared memory (Score:2)
Even swapping over NFS could be considered remote memory, although it is not exactly 'shared'.
Re:Remote shared memory (Score:5, Informative)
I'm not familiar with MOSIX, but Oracle uses RSM on the theory that the high-speed RSM link is always faster than accessing the physical disk. So if you have 2 nodes sharing a single disk array, and Oracle on one node knows that it needs a particular block (it can know this because, in Oracle, you can calculate the physical location of a block from its rowid as an offset from the start of the datafile - that's how indexes work), then the first thing it will do is ask the other node whether it has it. This is called "cache fusion". If the other node has the block, it is retrieved over the link. Previous versions of Oracle had to do a "block ping": notify the other node that it wanted the block, wait for the block to be flushed to disk, and then load it from disk. This guaranteed consistency, but was slow. With RSM, the algorithms that manage the block buffer cache can be applied across the whole cluster, which is very fast and efficient.
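The cache-fusion lookup order described above - local cache, then peer's cache over the fast link, then disk as the slow path - can be sketched like this. Everything here (Node, fetch_block, the DISK dict) is invented for illustration and is not Oracle's actual design or API:

```python
# Toy sketch of the "cache fusion" idea: ask the peer's block cache over the
# fast interconnect before touching the shared disk. All names are invented.

DISK = {}   # stands in for the shared disk array: block_id -> bytes


class Node:
    def __init__(self, peer=None):
        self.cache = {}   # local block buffer cache
        self.peer = peer

    def lookup(self, block_id):
        # A peer answers only from its own cache; it never hits disk for us.
        return self.cache.get(block_id)

    def fetch_block(self, block_id):
        if block_id in self.cache:
            return self.cache[block_id], "local cache"
        if self.peer is not None:
            block = self.peer.lookup(block_id)   # cheap RSM/interconnect hop
            if block is not None:
                self.cache[block_id] = block
                return block, "peer cache"
        block = DISK[block_id]                   # slow path: physical disk
        self.cache[block_id] = block
        return block, "disk"
```

The old "block ping" behaviour would correspond to forcing the peer to write the block to DISK first and then reading it back - same consistency, one slow disk round trip more.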
Speaking of process migration, there is a feature of Oracle called TAF, Transparent Application Failover. Say you are doing a big select, retrieving millions of rows, connected to one node of a cluster, and that machine fails in the middle of the query. Your connection will be redirected to a surviving node, and your statement will resume from where it left off. I'm unaware of an open-source database that can do either of these.
Re:Remote shared memory (Score:1)
Re:Remote shared memory (Score:2, Informative)
There is some primitive distributed shared memory support in OpenMOSIX; no idea how stable it is though. Normal openmosix/mosix won't migrate tasks requiring shared memory (ie: threads)
Not by default (Score:2)
Re:Remote shared memory (Score:2)
Back then, there were expensive commercial interconnect systems --I don't think InfiniBand was around then-- that did the job. But the costs involved made Mosix somewhat beside the point.
I may be wrong, but this could be
NUMA (Score:5, Informative)
This article [berkeley.edu] defines NUMA [unsw.edu.au] as
which seems to cover all of this.
Re:NUMA (Score:2)
404, this DIMM not found.... (Score:2, Funny)
not necessary for 90% of distributed computing (Score:3, Informative)
I smell a hotfix... (Score:5, Insightful)
> over TCP/IP in all versions of Windows
Can you see it coming? The ultimate Windows root exploit!! Hmm... I guess someone has to go tell them. Otherwise they won't notice it until it's too late...
Seriously, how do you dare to enable this kind of access?!?
Re:I smell a hotfix... (Score:3, Insightful)
"With Windows CX, your computer will have the latest in Remote Memory Management. Share your system's power with another Windows PC for added performance. Trusted applications will automatically control your memory remotely, saving you the trouble of worrying about the wrong programs using your PC."
(Which, in the usual MS doublespeak, means Bill's trusted computer can bork warezed ver
Re:I smell a hotfix... (Score:1)
A friend of mine wants to see a web server with an HTML form where you can just paste in some assembly code and the server will just execute it. The ultimate killer app! "Just give me some code and I'll just run it."
Security is not an impossibility (Score:2, Interesting)
While M$ is probably not going to get this one right, it doesn't mean that someone can't. This *is* a desirable feature for some applications, and it is possible to make a secure environment (where secure is defined for the application), and make it seamless as well. That is the whole goal of network security professionals.
If anything, the f
Prior art ;-) (Score:5, Interesting)
The file in question actually resided in a RAM drive on another machine on the LAN.
I couldn't get it to work in the 45 minutes or so I messed around with it. I'm not sure if Linux was unhappy using an NFS-hosted file for swap, or what exactly the problem was, but I did get some funny looks from people to whom I explained the idea (ie, to determine whether the network would be faster than waiting for my disk-based swap).
Of course, this was back when RAM wasn't cheap...
Re:Prior art ;-) (Score:1)
Re:Prior art ;-) (Score:2, Interesting)
NBD (Score:2)
2. Set up a network block device [sourceforge.net] to export said ramdisk.
3. Set up client using nbd-client to talk to server with network block device.
4. swapon
5. profit!!!
Using NFS for disk-based swap is possible but silly since you incur the extra overhead. NBD works on a plain vanilla TCP connection and avoids touchy issues like memory vs. packet fragmentation. If you have a gigabit ethernet card with zero-copy support in the driver, then you are in business.
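The steps above can be approximated in-process. This is a toy, not the real NBD wire protocol: the framing here (one op byte, a 32-bit offset, a 32-bit length) is invented for the sketch, but it shows the essential idea of exporting a RAM-backed block store over a plain vanilla TCP connection:

```python
# Toy NBD-style sketch: a RAM-backed "block device" served over plain TCP.
# The wire format (op byte + offset + length) is invented; real NBD is richer.
import socket
import struct
import threading

BLOCKS = bytearray(4096)   # the server-side "ramdisk"


def recv_exact(conn, n):
    """Read exactly n bytes (or fewer if the peer closes)."""
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            break
        data += chunk
    return data


def serve(listener):
    conn, _ = listener.accept()
    with conn:
        while True:
            hdr = recv_exact(conn, 9)
            if len(hdr) < 9:
                break
            op, offset, length = struct.unpack("!BII", hdr)
            if op == 0:    # read request: send the stored bytes back
                conn.sendall(bytes(BLOCKS[offset:offset + length]))
            else:          # write request: payload follows the header
                BLOCKS[offset:offset + length] = recv_exact(conn, length)


def client_roundtrip(port, payload, offset=100):
    """Write a 'page' to the remote ramdisk, then read it back."""
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(struct.pack("!BII", 1, offset, len(payload)) + payload)
        c.sendall(struct.pack("!BII", 0, offset, len(payload)))
        return recv_exact(c, len(payload))


listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=serve, args=(listener,), daemon=True).start()
```

Swapping to this instead of NFS is exactly the win the comment describes: a dumb byte-range protocol over one TCP stream, with none of NFS's file-semantics overhead.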
Ha
Re:NBD (Score:2)
He was complaining about slow swapping, and I was like, "hmmm... I can probably swap over the network to the fileserver machine!"
I'm not sure if NBD was in the kernel at the time, and it definitely wasn't compiled in. This was, umm, '98 or '99, I believe.
Of course, this was from my P2-200 with 32 MB of RAM. Our file server was a dual 233, IIRC, with like 128 MB, most of which did nothing most of the time.
Campus network, al
iWarp (didn't think this was new) (Score:2)
Whoa (Score:2)
Virus problems (Score:1, Interesting)
What kind of problems will develop once virus & worm writers, and spammers, get access to this mechanism?
Of course, if DRM (digital restriction management) comes along, at least it will give a back door into the system.
Yeah... (Score:5, Funny)
Brings up interesting ideas of ways to prank your friends & enemies though.
Re:Yeah... (Score:2)
Remote fun with an enhanced debug.com? (Score:3, Funny)
0103 mov al, 65
0105 mov ecx, 2000
010a rep stosb
010b jmp 100
g=100
Already Done (Score:5, Funny)
Intel's VI Architecture (Score:1)
VI Architecture [intel.com]
Re:Intel's VI Architecture (Score:3, Informative)
RDMA article [nwfusion.com]
Bah, old stuff (Score:5, Insightful)
It's very interesting that using memory over the network is very much the same problem as cache coherency among processors. If you have multiple processors, you don't want to have to go out to slow main memory when the data you want is in your neighbor's cache... so perhaps you grab it from the neighbor's cache.
Similarly, if you have many computers on a network, and you are out of RAM, and your neighbor has extra RAM, you don't want to page out to your slow disk when you can use your neighbor's memory.
NUMA machines are somewhere in between these two scenarios.
There are lots of problems: networks aren't very reliable, there's lots of network balancing issues, etc. But it's certainly interesting research, and can be useful for the right application, I guess.
Disk is slow, though... memory access time is measured in ns, disk access time is in ms... that's a 1,000,000x difference. So paging to someone else's RAM over the network can be more efficient.
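The arithmetic behind that claim is worth spelling out. The figures below are illustrative order-of-magnitude numbers (DRAM around 100 ns, a gigabit-LAN round trip around 100 us, a disk seek around 10 ms), not measurements:

```python
# Back-of-the-envelope latency comparison. All figures are rough
# order-of-magnitude estimates, not benchmarks.
ram_ns = 100           # local DRAM access, ~100 ns
lan_ns = 100_000       # gigabit LAN round trip, ~100 us
disk_ns = 10_000_000   # disk seek + rotation, ~10 ms

disk_vs_ram = disk_ns / ram_ns   # how much slower disk is than local RAM
disk_vs_lan = disk_ns / lan_ns   # how much slower disk is than remote RAM

print(f"disk is {disk_vs_ram:,.0f}x slower than local RAM")
print(f"remote RAM over the LAN still beats disk by {disk_vs_lan:,.0f}x")
```

So even though the network adds three orders of magnitude over local RAM, paging to a neighbor's memory can still win by a couple of orders of magnitude over the local disk.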
I don't have any good papers handy, but I'm sure you can google for some.
Re:Bah, old stuff (Score:2)
One such implementation allows you to write directly to memory using a message. This bypasses several system calls, several interrupts, and is quite safe as long as bounds are checked properly by the kernel. This type of setup is used in the high-performance networking used on supercomputers, where the bottleneck is the network. (google for "Portals message passing")
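A pale shadow of that "write directly into a pre-registered buffer" idea is visible even in ordinary sockets: with socket.recv_into, the application posts a buffer up front and incoming bytes land in it with no fresh allocation on the receive path. (Real Portals/RDMA goes further and avoids the kernel copy too; this is just the closest portable analogue, shown here as a sketch.)

```python
# Approximating "message lands in a posted buffer" with socket.recv_into:
# the application registers a buffer ahead of time and data is written
# straight into it, instead of being copied out of a new bytes object.
import socket

a, b = socket.socketpair()

posted = bytearray(64)      # buffer "registered" before the message arrives
view = memoryview(posted)

a.sendall(b"incoming message")
n = b.recv_into(view)       # bytes land directly in `posted`

a.close()
b.close()
```

The difference from plain recv() is exactly the copy the comment is talking about eliminating.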
Allowing messa
Re:Bah, old stuff (Score:2, Informative)
That doesn't invalidate my point about networking latency though...
Re:Bah, old stuff (Score:2)
Anyway, network latency is determined mainly by processor speed, distance, layout, and bandwidth of the network. On my 100bt network at home, going through a couple of 100bt switches, I usually get... You can also generalize this to any sort of interconnect fabric, like RapidIO or HyperTransport or PCIExpress. There is always latency to the memory. Cache is rather low-latency, typically a few cycles, and is compensated for by the pipeline on your processor. L2 cache is hig
Re:Bah, old stuff (Score:2)
Indeed, the latency will suck - badly. It's an unavoidable fact that the speed of light is finite. The greater the distance you put between the nodes, the suckier the latency becomes. This is why Sun campus clusters are limited to just a few km when doing remote disk mirroring. On FC-AL direct attach storage over dark fibre, you start hitting the SCSI time-out limit on disk writes and a write-intensive (especially trans
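The speed-of-light floor is easy to compute. Using the standard rule of thumb that light in glass travels at roughly two-thirds of c (about 200,000 km/s), the best-case round trip grows by about 10 microseconds per kilometre of separation, before any switching or protocol overhead:

```python
# Physical lower bound on round-trip latency over fibre. The propagation
# speed is the usual ~2/3 c rule of thumb, not a vendor spec.
C_FIBRE_KM_PER_S = 200_000   # ~2e5 km/s in glass


def round_trip_us(distance_km):
    """Best-case round-trip time in microseconds, propagation delay only."""
    return 2 * distance_km / C_FIBRE_KM_PER_S * 1e6


print(round_trip_us(10))     # a campus-scale cluster span
print(round_trip_us(1000))   # long-haul distance
```

At 1000 km the propagation delay alone reaches the multi-millisecond range of a disk seek, which is consistent with why remote mirroring is limited to campus distances.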
InfiniBand (Score:1)
Re:InfiniBand (Score:1)
Re:InfiniBand (Score:1)
Re:InfiniBand (Score:3, Interesting)
The real next steps for IB are 12X (30 Gb) and on-motherboard IB. 12X is in development. Currently, IB adapters are limited by the PCI-X slot they sit in. PCI-X DDR and PCI Express should help, but just having it on the motherboard and throwing PCI out would be interesting. Small form factor clusters
Re:InfiniBand (Score:2)
Don't lose hope... we are out there... just need to get the word out!
Re:InfiniBand (Score:2)
Infiniband has excellent support for this (Score:4, Informative)
Re:Infiniband has excellent support for this (Score:2)
Infiniband is not dead (Score:2)
The high performance computing market and data
Re:Infiniband is dead (Score:2)
I think that the major future switch fabric is PCIExpress. It is what's going to go into the consumer and semi-consumer markets (at least, that's my guess). It's going to be fast, serial, switched, and most importantly, has the 800lb gorilla of Intel behind it.
FreeBSD's firewire already can do this (Score:5, Informative)
the firewire bridge ability to DMA to/from any
location of memory. Very handy for remote kernel
debugging.
No one read the article or looked at the spec. (Score:5, Informative)
First, what the headline would have you believe has been invented is making it appear as though the RAM of one machine is really the RAM of another machine. This technology has been around and used for quite some time in clustered/distributed/parallel computing communities since at least the 1980s.
If you look at a brief summary of the spec, http://www.rdmaconsortium.org/home/PressReleaseOct30.pdf [rdmaconsortium.org], you'll find that all that's happening is that more of the network stack's functionality has been pushed into the NIC. This prevents the CPU from hammering both memory and the bus as it copies data between buffers for various layers of the networking stack.
I'll also note that the networking code in the Linux kernel was extensively redesigned to do minimal (and usually no) copying between layers, leaving very little advantage to pushing this into hardware.
Please, folks, don't drink and submit!
Re:No one read the article or looked at the spec. (Score:2)
Offloading the TCP/IP stack will be needed for current servers to push 10 Gb over TCP/IP. It also becomes a big deal for latency reduction and for iSCSI performance. It makes a big difference. Most of today's dual-CPU Intel based boxes have trouble going too
Re:No one read the article or looked at the spec. (Score:2)
I know of very few hardware platforms, Intel-based or otherwise, that can handle 10Gb/s over a single I/O stream. PCI just doesn't go that fast (yet).
You'll need more than what's in this spec to get to 10Gb/s.
not pushing 10 Gb...yet (Score:3, Informative)
Most of y'all are missing the point here (Score:1)
I can already do this... (Score:4, Insightful)
2. read from and write to
So where is the use of that? And shared memory emulation over a network is also a decades old technology.
You're not doing the same thing (Score:1, Insightful)
The approach you describe relies on CPU intervention on both ends of the connection. The article describes an approach that is much closer to the actual hardware than simply opening a ssh connection. I hope this clears the issue up for you!
Re:You're not doing the same thing (Score:2, Interesting)
Not always. The ssh example was for show only. But about a decade ago I saw a diploma thesis advertised that was to develop a hardware implementation of shared memory that could work without special drivers. True, it was SCSI-based and therefore did not allow non-local networking. But with non-local networking the transfer time dominates the latency anyway, and hardware does not help.
All I am saying is that the idea is neither
Great for handheld devices (Score:2)
Neat... (Score:2)
I have heard of a similar technology (Score:2, Funny)
Sounds like SCRAMNET (Score:1)
This would only be a slightly different transport...
Re: (Score:1)
plan9's had this since it started (Score:5, Informative)
The mem file contains the current memory image of the process. A read or write at offset o, which must be a valid virtual address, accesses bytes from address o up to the end of the memory segment containing o. Kernel virtual memory, including the kernel stack for the process and saved user registers (whose addresses are machine-dependent), can be accessed through mem. Writes are permitted only while the process is in the Stopped state and only to user addresses or registers.
The read-only proc file contains the kernel per-process structure. Its main use is to recover the kernel stack and program counter for kernel debugging.
The files regs, fpregs, and kregs hold representations of the user-level registers, floating-point registers, and kernel registers in machine-dependent form. The kregs file is read-only.
The read-only fd file lists the open file descriptors of the process. The first line of the file is its current directory; subsequent lines list, one per line, the open files, giving the decimal file descriptor number; whether the file is open for read (r), write, (w), or both (rw); the type, device number, and qid of the file; its I/O unit (the amount of data that may be transferred on the file as a contiguous piece; see iounit(2)), its I/O offset; and its name at the time it was opened.
Please don't say it. (Score:2, Funny)
we don't need no steenkin' buffer overflow attacks (Score:2)
(ok, ok, there should be some serious security with remote memory. I couldn't resist.)
Worst possible technique for distributed systems. (Score:3, Insightful)
The amount of book-keeping required to keep this thing going makes it a non-starter. And as for scaling? Forget it.
The sad truth is that it's common knowledge that this is the least efficient principle for distributed systems. This technique is usually the fall-back position if nothing else works.
dry fiber? (Score:2)
AFAIK there is no equivalent offering for fibre, and one really needs fibre to be able to do anything interesting.
Now - if dry fiber did exist then it would make a great deal of sense to r
Brought to you by Microsoft (Score:2)
What this is REALLY for (Score:3, Insightful)
It will not allow arbitary access to your memory space. In fact, it would prevent a great number of buffer overflow exploits
The best analogy is the difference between PIO and UDMA modes of your IDE devices (or any device). This is all about offloading work from your CPU. It is moving the TCP/IP stack from the kernel to the network card for a very specific protocol.
Here's how RDMA would work layered over (under?) HTTP.
- browser creates GET request in a buffer
- browser tells NIC address of buffer and who to send it to.
- NIC does a DMA transfer to get buffer. OS not involved
- NIC opens RDMA connection to webserver
- server NIC has already been told by the webserver what buffer it should put incoming data
- webserver unblocks once data in buffer and parses it.
- webserver creates HTML page in second buffer.
- webserver tells server NIC to do a RDMA transfer from buffer to browser host
- client NIC takes data and puts it in browser buffer
- browser unblocks parse HTML and displays it.
All of this with minimal interaction with the TCP/IP stack. RDMA just allows you to move a buffer from one machine to another without a lot of memory copying in the TCP/IP stack.
In fact, the RDMA protocol could be emulated completely in software. It would probably have a small overhead versus current techniques but would still be useful. Just imagine real RDMA on the server and emulated RDMA on the clients (cheaper NICs). The server has less overhead and most clients have cycles to spare!
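That software emulation can be sketched directly from the HTTP walkthrough above. Everything here (the Nic class, post_recv, rdma_write) is invented for illustration, not any real verbs API, but it captures the key property: each side posts a buffer in advance, and the "NIC" moves bytes straight into the peer's posted buffer with no intermediate copies through a stack:

```python
# Software-emulated sketch of the RDMA exchange described above.
# All names are invented; a real NIC would do the buffer moves in hardware.

class Nic:
    def __init__(self):
        self.recv_buf = None

    def post_recv(self, buf):
        # The application tells the "NIC" where incoming data must land.
        self.recv_buf = buf

    def rdma_write(self, peer, src):
        # Move src directly into the peer's posted buffer: one copy, no stack.
        n = len(src)
        peer.recv_buf[:n] = src
        return n


client_nic, server_nic = Nic(), Nic()

# Server posts a buffer for the incoming request before it arrives.
request_buf = bytearray(64)
server_nic.post_recv(request_buf)

# Browser builds its GET request and hands the buffer to its NIC.
n = client_nic.rdma_write(server_nic, b"GET / HTTP/1.0\r\n\r\n")

# Server "unblocks", parses, and RDMA-writes the response straight back
# into a buffer the client posted.
response_buf = bytearray(64)
client_nic.post_recv(response_buf)
reply = b"HTTP/1.0 200 OK\r\n\r\n<html>hi</html>"
m = server_nic.rdma_write(client_nic, reply)
```

In the mixed deployment the comment imagines, the server side of rdma_write would be real hardware and the client side this kind of software shim; the buffers and the protocol stay the same.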
Just one problem... (Score:3, Insightful)
There's just one problem with that... ethernet (even GigE) is *not* a good connection for clustering. Sure, the bandwidth is semi-decent, but the *latency* is the killer. Instead of a processor waiting a number of nanoseconds for memory (as with local memory), it'll end up waiting as much as milliseconds. That may not sound like much, but from nanoseconds to milliseconds you jump six orders of magnitude!
steve
Microsoft announces... (Score:3, Funny)
..."See, we TOLD you it was a feature!" Microsoft will also sue the researchers working on this project, citing they Innovated this years ago.
Great, but what about XML? (Score:2, Funny)
That's what the whole thing sounds like to me...
Already exists to some degree (Score:2)
Cplant [sandia.gov] style clusters do this as well. They also provide an API called Portals which revolves around RDMA. Portals, incidentally, is being used in the Lustre cluster filesystem and is implemented in kernel space for that project. It can use TCP/IP, I believe, but then it's not real RDMA.
*sigh* some day all NICs will be smart enoug
All your memory are belong to us (Score:2)
Just imagine... (Score:2)
Unless... (Score:2)
1)Get into one machine behind firewall.
2)Sniff the database's (possibly encrypted) RDMA traffic that sets your account balance to zero.
3)...
4)Profit!!! (Replay the message setting your account balance back to zero before you get billed.)
been there, done that (Score:4, Interesting)
"Back in the day", I wrote a virtual memory handler for my Amiga's accelerator card (which had a 68030 and MMU). Meanwhile, some friends of mine had developed this networking scheme that involved wiring the serial ports of our Amiga's together in a ring, which allowed us to have a true network without network cards.
Then came the true test: I configured my virtual memory to use a swapfile located in a friend's RAM-disk (he had way more memory than I did), fired up an image editor, opened a large image, and lo and behold: I was swapping at a whopping 9600 bytes per second! The fact that every packet had to pass through multiple other machines (because of the ring-nature of the network) didn't make it any faster either...
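For a sense of just how whopping 9600 bytes per second is, here's the arithmetic for paging a single large image over that serial ring (the 4 MB image size is an invented example figure, chosen to be plausible for the era):

```python
# Time to move one large image over the 9600 bytes/s serial-ring "swap link".
# The image size is an illustrative example, not from the original post.
link_bytes_per_s = 9600
image_bytes = 4 * 1024 * 1024   # say, a 4 MB image

seconds = image_bytes / link_bytes_per_s
print(f"{seconds:.0f} s ({seconds / 60:.1f} minutes) for one pass")
```

And that's before counting the store-and-forward hops through every other machine on the ring.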
Re:uh.... (Score:4, Interesting)
When a program asks for memory, there are a fair number of hoops it has to jump through in the processor to get at the memory, because the processor manages memory. Making a program that toys with memory over the internet wouldn't be slightly exciting.
DMA channels let something, usually a video card, sound card, IDE bus, etc. do what it needs to do with the system's memory without bothering the processor. The speed gained by not bothering the processor when accessing memory is what makes UltraDMA hard drives so fast, video cards accelerated (in addition to a lot of other l337 tricks), etc.
Now, you take a cluster, connected via gigabit network, in which each computer can directly access each other's memory as opposed to using a program to do it that just takes the target processor's cycles. THAT is slightly exciting.
Re:uh.... (Score:1, Interesting)
Re:uh.... (Score:2)
This RDMA could help. Just push the problem one level deeper, to the OS and to hardware, at raw memory access. Let the OS try to figure out all the problems. Suddenly, yo
Re:uh.... (Score:2)
Just to be pedantic here... But unless you've got an Opteron, the northbridge memory controller controls memory access..
The Opteron is one of very few processors to ever include an on-die memory controller.
I'll leave it to other pedants to list any others...
As to the comment by doormat about onboard gigabit lan... onboard LAN capabilities,
Re:uh.... (Score:4, Insightful)
Operating system mediated memory protection might be an issue here... Sane operating systems at least check to see whether Application 1 actually owns the bit of memory it's trying to read/write before letting it chew over memory that actually belongs to Application 2. Just letting some application read and write any memory is a recipe for disaster that sensible OSes have avoided for a long time...
Re:Memmory? (Score:2)
Re:Memmory? (Score:2)
Re:Kid N' Play R00Lz (Score:1, Troll)
Re:astounding (Score:1)
Re:XFree86 and DRI (Score:2, Informative)