Intel Develops Hardware To Enhance TCP/IP Stacks
RyuuzakiTetsuya writes "The Register is reporting that Intel is developing I/OAT, or I/O Acceleration Technology, which allows the CPU, the mobo chipset and the ethernet controller to help deal with TCP/IP overhead."
Good stuff! (Score:5, Interesting)
This is Good News.
Re:Good stuff! (Score:5, Informative)
Re:Good stuff! (Score:5, Informative)
FreeBSD, for example, has a kernel option called ZERO_COPY_SOCKETS, which dramatically increases the network throughput of syscalls such as sendfile(2). With this option enabled, as the name implies, data is no longer copied from userland to kernel space and then passed on to the network card's ring buffers. It is copied in one swoop!
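To make the sendfile(2) idea concrete, here is a minimal sketch in Python. It is not the FreeBSD kernel option itself. It only shows the userland side: the application hands the kernel a file and a socket and never touches the bytes. The helper name send_file_zero_copy and the local socket pair are made up for the demo.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock, path):
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() where the platform supports it,
        # so the file data is not read into a userland buffer first.
        # On other platforms it falls back to ordinary send() calls.
        return sock.sendfile(f)


# Demo over a local socket pair (a stand-in for a real network connection).
payload = b"x" * 4096
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

a, b = socket.socketpair()
sent = send_file_zero_copy(a, tmp.name)
a.close()

received = b""
while True:
    chunk = b.recv(65536)
    if not chunk:
        break
    received += chunk
b.close()
os.unlink(tmp.name)
```

Whether the kernel then manages to avoid its own internal copy depends on the OS, the driver and the NIC, which is exactly what the option above is about.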
Re:Good stuff! (Score:2, Interesting)
Re:Good stuff! (Score:2)
So shouldn't it be called ONE_COPY_SOCKETS, then?
Re:side effects? (Score:4, Informative)
Re:Good stuff! (Score:2)
sending it straight from the userspace supplied buffer.
Doing so may ofcurse have other affects though.
Re:Good stuff! (Score:3, Funny)
Of curse?
d
What do you want to drop? [a?*]
?
a - a cursed -1 tcp/ip connection
a
Sorry, you can't drop the tcp/ip connection, it seems to be cursed.
Hmmm
Re:Good stuff! (Score:2, Interesting)
> If the application changes the data directly after the send()-call this should not affect what is sent.
So just don't let the application change the data (hint: single-assignment programming languages).
> This means that the OS has to copy the data into kernel memory,
Either that, or you could improve support for copy-on-write in the MMU (which might benefit other tasks than just networking).
Sometimes changing the assumptions is the proper way to solve the problem.
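For anyone who wants to see the assumption being discussed, here is a small sketch of the send() contract, assuming a local stream socket pair as a stand-in for a real connection. The application overwrites its buffer immediately after send() returns, and the receiver still gets the original bytes, which is why the kernel needs either its own copy or some copy-on-write trick.

```python
import socket

a, b = socket.socketpair()

buf = bytearray(b"hello")
a.send(buf)            # returns once the kernel has taken the data
buf[:] = b"XXXXX"      # the application reuses its buffer right away

received = b.recv(5)   # what the peer sees is the data as it was at send() time

a.close()
b.close()
```

A single-assignment language would make the overwrite on the third line impossible, which is the "change the assumptions" route.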
Re:Good stuff! (Score:2)
Re:Good stuff! (Score:2)
finally... (Score:5, Funny)
and they say the drug companies are miracle workers
Speaking of drugs (Score:2)
Have you ever wanted your TCP stack to be more secure? Has your internet ever dribbled? Sign up for intel soft tabs now!
White elephant? (Score:5, Interesting)
--
Toby
Re:White elephant? (Score:3, Informative)
Re:White elephant? (Score:5, Informative)
Also remember that a well implemented TCP/IP stack runs at about 90% of the speed of a memcpy() (Tanenbaum's book again).
For hardware TCP/IP processing to be useful, you need to be, say, 2x the speed of the CPU's memcpy() function!
Given that the main performance bottleneck is memory access, since you're basically copying buffers around and so caching isn't going to help you, I don't see how any sort of super-duper hardware is going to give you anything like a 2x speed up, let alone at an economic price.
--
Toby
Re:White elephant? (Score:5, Informative)
Simply put, software on general purpose processors sucks for doing heavy computational work. Hardware tuned especially for a task has been, and always will be, where it's at. However, the costs involved in creating ICs specific to a task usually mean that ASICs are only created where there is a need. Modern graphics cards are a great example. The on-board graphics processors are designed especially to create graphics, something that, if offloaded onto the GP CPU, would crush even the highest of the high end.
Also, offloading the TCP/IP stack on a normal workstation probably isn't going to be a huge performance boost. Where this will be useful is in situations where there is a need for high-throughput, low-latency network I/O processing.
Re:White elephant? (Score:5, Insightful)
Also, Catalyst switches are not highly parallel. They can be parallel, depending on the exact model and configuration, as well as the exact path inside the switch that the traffic takes, but it's not even remotely the same in execution as having "hundreds of linux routers side by side."
Instead, it is the exacting way in which the various components of the switch pass data, and the very specific purpose of each chip and circuit in the device, that gives modern routers the speed they have. There are special components such as content-addressable memory, ternary content-addressable memory (memory that allows you to store 0s, 1s, and wildcard values instead of just 0s and 1s, allowing for wire-speed match comparisons against ACLs and routing tables), etc. It isn't merely a stack of GP CPUs all running in parallel to achieve a particular task.
Systems guys often mistake routers and switches for computers with a bunch of Ethernet jacks. They're far from it. They are highly specialized pieces of hardware designed from the bottom up to do one thing and do it well -- transport data. Computers are the opposite. They're designed from the bottom up to be able to do whatever you wish them to as fast as possible, but that flexibility comes with a price.
If you ever get the urge, you should read up on Catalyst switching architecture. You'll find it quite interesting.
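As a rough illustration of the 0/1/wildcard matching described above, here is a software sketch. Each entry is a (value, mask, action) triple, where a mask bit of 0 means "don't care". The table contents and the name tcam_lookup are hypothetical. Real hardware compares the key against every entry at once, whereas this loop only emulates the result one entry at a time.

```python
def tcam_lookup(table, key):
    """Return the action of the first entry whose non-wildcard bits match key."""
    for value, mask, action in table:
        # Mask bit 1 = compare this bit; mask bit 0 = wildcard ("don't care").
        if (key & mask) == (value & mask):
            return action
    return None


# Hypothetical ACL over 8-bit keys, most specific entry first.
acl = [
    (0b10100000, 0b11110000, "deny"),     # 1010xxxx
    (0b10000000, 0b11000000, "permit"),   # 10xxxxxx
    (0b00000000, 0b00000000, "default"),  # xxxxxxxx
]
```

The point of doing this in dedicated memory is that the lookup time does not grow with the number of entries, which a loop on a general purpose CPU cannot offer.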
Re:White elephant? (Score:2)
Re:White elephant? (Score:3, Interesting)
Re:White elephant? (Score:2)
Not at all true. Dipping into Ricardian economics, you can conclude that the best, most valuable, purpose of the primary CPU is to process user input and to execute applications. If another CPU can be introduced into the computational economy such that it can perform a task, even if at a lower rate than the primary CPU, thus freeing up the primary CPU to perform its most valuable task more efficien
I don't think you understand P4s (Score:2)
This is not a new concept.
DEPCAs made network I/O easy back in the days of ISA busses twenty odd years ago, and there have been PCI cards with their own CPUs which you can actually load a version of Linux into and use as standalone routers - so the network cards handle stuff like ICMP a
Re:White elephant? (Score:2)
Re:White elephant? (Score:4, Interesting)
I think in xyz's book there's a reference which states that offloading graphics processing normally isn't useful, because the CPU that work is offloaded to is always less powerful than the main CPU and the main CPU is normally blocked in its task until the graphics processing has completed.
See how silly that sounds when you substitute network with graphics? We all know that offloading graphics processing is a good thing. Why? Because it's optimized for the task. Why couldn't the same be done for networking?
Re:White elephant? (Score:3, Interesting)
Well, does waiting 3 milliseconds at 3 GHz outrun waiting 3 milliseconds at 300 MHz?
The only advantage I can see to this is that it's often nice to have I/O handled in a separate process/thread running on a separate processor. But, as many have already noted, unless the I/O processor is tuned for this you've either got another expensive processor or you're running the I/O thread on a slower processor.
If the processor _is_ tuned for
Re:White elephant? (Score:3, Insightful)
Re:White elephant? (Score:5, Interesting)
Besides, GPUs are more powerful than CPUs at the task of rendering polygons.
Very often ASICs are better at a task than general purpose CPUs, just that considerations must be made as to whether the performance gain is worth the cost difference.
Re:White elephant? (Score:2)
Re:White elephant? (Score:2)
Yes, that's the whole point - they're more powerful at that task because they're specifically designed to perform that task (amongst others). Similarly, a "network processing unit" would be specifically designed to support in hardware the operations required of it. Make that chip fast enough, and it'll be faster at doing it than a general-purpose CPU. The only question is how fast it has to be, and whether or not it's cost-effecti
Re:White elephant? (Score:5, Informative)
You cannot accelerate networking very much because the problem is highly serial.
It is improper to compare the two because they are fundamentally different problems.
You can throw tons of hardware at 3D graphics and get good results, because just by having more and more pipelines, you go faster and faster.
Processing a network packet is quite different; the data goes through a series of serial steps and eventually reaches the application layer. The only way you can really make it go faster is to up the clock rate, and you find it's uneconomic to try to beat the main CPU, which remember has *already* been paid for. You have all that CPU for free; to then spend the kind of money you'd need to outpace the CPU makes no sense, let alone the money you'd need to spend to outpace the CPU by a decent margin.
--
Toby
Re:White elephant? (Score:2)
So it's a series of steps. Ok, then make each step a part of a pipeline, with a specialized circuit for exactly that step. Then while the next circuit on the pipeline gets to do the next step on that packet, the first one can already start processing the next packet. This is how modern CPUs speed up the decoding of machine instructions, so why shouldn't the same work with TCP/IP packets as well?
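The arithmetic behind the pipelining argument can be sketched like this, under the simplifying assumption that every step takes the same time and no stage ever stalls. The function names and numbers are made up for illustration.

```python
def serial_time(packets, stages, stage_time):
    """Total time when each packet finishes all stages before the next starts."""
    return packets * stages * stage_time


def pipelined_time(packets, stages, stage_time):
    """Total time when a new packet enters the pipeline every stage_time."""
    if packets == 0:
        return 0
    # The first packet needs all stages; each later packet finishes one
    # stage_time after the one before it.
    return (stages + packets - 1) * stage_time
```

With 1000 packets, 4 stages and one time unit per stage, that is 4000 units done serially against 1003 pipelined. Note that a single packet still takes 4 units either way, so pipelining helps throughput and does nothing for per-packet latency.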
Re:White elephant? (Score:2)
what??? not when you're doing multimedia decoding of compressed data... or other such tasks... offloading the networking stuff to hardware will have th
Re:White elephant? (Score:4, Informative)
I used to work at a company that did Fibre Channel.
One of the things we had was an ASIC that did network processing in hardware, allowing us to do all sorts of interesting stuff at wire speed (2Gbps). If we had to load into memory we would have been at least an order of magnitude slower.
Lots of people agree, including AC and DM (Score:4, Informative)
AC being Alan Cox, DM being Dave Miller.
Read Alan's opinion here [theaimsgroup.com].
Read Dave's opinion here [theaimsgroup.com].
There has been discussion of this specific Intel announcement here [theaimsgroup.com].
Re:White elephant? (Score:2)
It's not like this would be an easy thing to sell in some way that people would really understand very well. But regardless they aren't going to develop a whole new piece of hardware that is worthless. Making a design decision that pushes something down a bad path like clock speed is a whole different issue. I'm pretty sure intel guys would think this one out before spending
Re:White elephant? (Score:2)
The main CPU runs multiple things.
The costs of network traffic are cache flushes and context switches. And so on.
A general-purpose CPU is much weaker than a special-purpose CPU, if you can parallelize at all.
And MFG costs my ass. These things should be relatively small.
Consider the following scenario.
Network interrupt -> context switch -> move a lot of data around and compute somewhat -> context switch.
To finish what I was doing, and then compute the thing that I just put in the line. (u
Re:White elephant? (Score:3, Insightful)
Those little boxes were masters at multi-processing, and they did it right - one processor for pretty much every major peripheral task (disk, graphics, sound, something else I can't remember).
As long as these Intel coprocessors are going to be an open standard (which they almost certainly won't), then I'd welcome this addition to PC architecture.
And the CPU doesn't have other things to do? (Score:4, Insightful)
Not that this is a new idea. It's been done for donkey's years.
Re:And the CPU doesn't have other things to do? (Score:2)
Consider; you have a hundred users, all doing some sort of network based task - say, reading Usenet via an NNTP server.
You offload their network processing from the CPU to a slower CPU on the network card.
Every time a thread in your NNTP server blocks, waiting for a packet to arrive or be sent, the main CPU moves onto another thread...which also then needs a send/recv, and blocks, and so on.
In the meantime, the slow CPU gets around to deali
Re:White elephant? (Score:2)
This is a pretty ridiculous claim. Take a look at Cisco routers some time... With a slow CPU, they can push gigabits upon gigabits of data through every second. In some cases, they are even just using PCI network cards.
Packetizing data, and handling the incredible storm of interrupts, is something CPUs are very poor at. Servers stand to get a huge performance
Re:White elephant - flawed logic (Score:3, Insightful)
With all due respect to Mr. Tanenbaum, if he stated what you put in your post, his logic is severely flawed.
Let's compare the general CPU/networking CPU combination with a manager/secretary.
The manager has a number of tasks which need to be done, including scheduling a number of appointments. Without a secretary, he'll be obliged to call/contact the peo
Re:White elephant - flawed logic (Score:3, Insightful)
In the purest form, it would be like that: one single thread that does not gain much from the offloading. However: have you checked just how many threads are actually running on PCs nowadays? You specifically say 'more tasks can be done concurrently'... isn't this exactly the point of offloading?
Next thing you know, the difference between SCSI and IDE is moot because 'fo
Re:White elephant? (Score:2)
This is a bit of an oversimplification. There are at least three cases in which offloading makes sense: dropping packets on the NIC (for example, during a DoS attack), reducing bus overhead by combining multiple req
Re:White elephant? (Score:2)
Good thing my scheduler has about 50 other tasks in the queue waiting for their turn.
Re:White elephant? (Score:3, Informative)
--
Toby
Re:White elephant? (Score:2)
This has the negative effect of making the thread which has had its network I/O offloaded slower, but the positive effect of freeing the CPU to perform other tasks.
However, I say to you, on a desktop system, which is where this Intel stuff is, the user is usually going to be the cause of the network traffic and he will want that thread to perform and will not care that he could be freeing up a few more percent CPU time to
Re:Is that the same Tanenbaum that said.... (Score:2)
Linux still works, though, and filled an important and well-supported part of the OS worldspace, and became successful.
These two facts are not mutually incompatible.
--
Toby
Re:Is that the same Tanenbaum that said.... (Score:2)
And, furthermore, it's the same guy that wrote The Bible Of Networking (Computer Networks, Prentice Hall). If you did networking courses in college, chances are high that you studied from his book. He _is_ one of the greatest authorities, living or dead, in the field.
So, when he has something to say regarding networking, you better listen up.
Besides, he was right. Monolithic kernels are obsolete technology. Linux's success has nothing to do with it. Would you argue that Windows is cutting-e
Fastest network card EVAR (Score:4, Funny)
Security updates (Score:5, Funny)
So... how exactly are they going to ship patches in the case of a security issue?
Re:Security updates (Score:2, Informative)
Re:Security updates (Score:2)
Re:Security updates (Score:2)
Typically, the host system driver uploads the firmware code that deals with all non-essential features (obviously, booting from network already needs most of the firmware).
Ethernet controllers (Score:3, Interesting)
It seems like most common denominator board manufacturers have put off 64-bit PCI support for too long. It's going to bite them in the ass if it doesn't become standard very soon.
Re:Ethernet controllers (Score:5, Insightful)
Re:Ethernet controllers (Score:2, Informative)
a) Gigabit/sec = 1000 Mbit/sec = 125MByte/sec
b) Gigabit/sec = 1024 Mbit/sec = 128MByte/sec
True, even these speeds don't completely saturate the PCI bus, though because of how the PCI bus is shared (each device gets a few clock cycles to do its thing before passing control off to the next device) no single device could anyway unless it's the O
Re:Ethernet controllers (Score:3, Informative)
gigabit is full duplex - double your figures.
But new motherboards are already starting to come with gigabit attached to PCI Express. For the last few years any decent board has had them on fast PCI-X, at least 64 bit 66 MHz.
Re:Ethernet controllers (Score:2)
Re:Ethernet controllers (Score:5, Insightful)
In truth, a gigabit ethernet card can saturate a 1X PCI-E link (2Gb/s after the 8B/10B encoding is removed), when sending small packets- basically due to packet overhead.
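The bandwidth figures traded in this subthread can be checked with a few lines of arithmetic. This sketch uses decimal units (1 Gbit = 10^9 bits) and assumes nominal peak rates: a 33.33 MHz clock for plain 32-bit PCI, 66.66 MHz for the 64-bit bus, and 2.5 GT/s per lane for first-generation PCI Express. Real-world throughput is lower because of arbitration and protocol overhead, which these numbers ignore.

```python
def mbytes_per_sec(bits_per_sec):
    """Convert a bit rate to megabytes per second (decimal units)."""
    return bits_per_sec / 8 / 1e6


# Gigabit Ethernet, one direction and full duplex.
gige_one_way = mbytes_per_sec(1e9)
gige_full_duplex = 2 * gige_one_way

# Parallel buses: width in bytes times clock rate (assumed nominal clocks).
pci_32_33 = (32 / 8) * 33.33e6 / 1e6
pci_64_66 = (64 / 8) * 66.66e6 / 1e6

# PCI Express x1, first generation (assumed): 2.5 GT/s raw per direction,
# of which 8 out of every 10 bits carry data under 8b/10b encoding.
pcie_x1_data_bits = 2.5e9 * 8 / 10
pcie_x1 = mbytes_per_sec(pcie_x1_data_bits)
```

That gives 125 MB/s for gigabit in one direction and 250 MB/s full duplex, roughly 133 MB/s for 32-bit/33 MHz PCI, roughly 533 MB/s for the 64-bit/66 MHz bus, and 2 Gb/s (250 MB/s) per direction for a x1 PCI Express link.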
Re:Ethernet controllers (Score:2)
nvidia (Score:5, Interesting)
Re:nvidia (Score:4, Informative)
Yes. The nForce4 chipsets offload most TCP/IP processing and firewall [nvidia.com] from the main CPU.
If you go with an Athlon64 Socket 939 nForce4 board, you get PCI Express, lower power consumption, a ton of great features, good Linux support, and plug-compatible dual core upgrades down the road. Intel's offerings just seem anemic by comparison.
(Personally, I'd also do an NVIDIA graphics board for the excellent Linux driver support. And no, I don't work for NVIDIA, I'm just a satisfied customer.)
Re:nvidia (Score:2, Interesting)
Interesting (Score:5, Insightful)
Granted, I've never administered a server that was under anywhere remotely near the types of loads we are talking about for this to be useful, but I have a hard time imagining that dealing with the TCP/IP stack would be more intensive than running applications (as the article claims).
So, for all you people out there much more qualified to discuss this than I am, will having some part of the processor dedicated to handling TCP/IP really speed things up, or is this primarily a marketing technology?
Re:Interesting (Score:3, Insightful)
Note, this is enterprise-grade hardware hooked up to million-dollar disk arrays.
Now, is that entirely from dealing with the networking stack? No. Not quite. However, consider th
Qlogic TOE cards (Score:5, Informative)
Re:Qlogic TOE cards (Score:2, Informative)
yeah great (Score:5, Funny)
And then there will be other enhancements like the tcp/ip one.
For instance a special accelerator card for Word and Internet Explorer will be developed.
Furious Linux users will demand their own technology, so one manufacturer will come up with a special card for running GNOME apps. This card will have 4 dual core 6 GHz processors and allow GNOME to run at normal speeds.
Re:yeah great (Score:2)
Re:yeah great (Score:3, Funny)
Re:yeah great (Score:2, Insightful)
Parallelism is great. Look at the way things are going. Dual CPU motherboards, dual core CPUs, Cell..
And Gnome.. sheesh.. back when I ran a P100 and Gnome was slow, I thought "well one day I'll have a 500MHz monster and Gnome will be fast". Here I am with a P4-2.6GHz/1GB and Gnome is STILL a dog. *sigh*
Re:yeah great (Score:2)
You speak in jest, but... (Score:3, Insightful)
This USB keyboard I'm typing on involves at least three processors, one to scan the keys, one to do the USB on the peripheral side and the third to do the USB on the motherboard side.
Will it support IPv6? (Score:5, Interesting)
So, now hackers will target your BIOS rather than (Score:3, Interesting)
So finally! (Score:5, Funny)
Old news (Score:5, Informative)
Of course it couldn't last, what with nVidia doing firewalls and NICs and all sorts of other things, Intel is a big company and they know when they need to compete. MS has also lost a bit of their clout when it comes to things like pressuring the bigger companies (intel, HP, Dell)
if i were to make wildly unsubstantiated guesses... (Score:3, Interesting)
i'd guess the tcp/ip stack implementations available to intel are pretty solid. still, i'd hope it'd be flashable just in case. i can imagine only once in a blue moon would you find someone with libpcap and the patience to find holes in some of the most trusted code in the net.
ha! who needs it? (Score:5, Funny)
And the integrated DRM? (Score:5, Interesting)
Don't think for a minute the big boys aren't trying to take the Internet away from us. They missed the opportunity once, never twice.
DoS Attacks (Score:3, Interesting)
Ha, old news! FPS's have had this for ages. (Score:2, Funny)
We've had this for years in FPSs. I used to have to practice for ages just to compete with the young kids at FPSs. Then along came some great 'acceleration' technology, and it's been so much easier. I call mine a bot.
Ever since it hasn't been about upgrading my CPU or graphics cards to get that head-shot. I've been offloading all that work!
3COM? (Score:2)
This old bit of snake-oil... (Score:5, Insightful)
Except:
We've seen successive waves of this concept, none of them have had much success. Graphics processors are one partial exception, and it took almost a decade of mis-designs of those before they became stable enough to be usable.
Dupe? Well not a /. dupe... (Score:2)
They at one point used to do just the PRO/100 cards, then they dropped them and started doing PRO/100 cards that did IPSEC hand off? If I remember correctly the S was security and they had a few other models? I was thinking back then that they would be looking at IP hand off at some point.
Comment removed (Score:4, Interesting)
Re:A good thing (Score:2)
Re:A good thing (Score:5, Funny)
1. Refresh your browser constantly until there's a new story on Slashdot, to post before everyone else.
2. Post something similar to "This is good/bad, for INSERT_OBVIOUS_REASON_HERE. And fuck the INSERT_RIAA-LIKE_ORGANIZATION_HERE." (second sentence is optional)
Re:A good thing (Score:3, Funny)
Attn MODS. (Score:2, Redundant)
HE
DIDN'T
SAY
A
DAMN
THING!
Re:Attn MODS. (Score:2)
Re:the good, the bad, the ugly? (Score:2, Interesting)
Re:the good, the bad, the ugly? (Score:4, Insightful)
What's new here is that Intel wants to put this in their chipsets everywhere and not just in $700+ NICs. Already this has been happening with checksum offloading, TCP fragmentation, smart interrupts, and so on in most GigE chips.
So yes, people have done this before and have been since at least 2000.
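For a sense of what checksum offloading takes off the CPU, here is a plain software version of the 16-bit ones' complement Internet checksum used by IP, TCP and UDP (described in RFC 1071). It is a sketch for illustration only, and the function name is made up. A NIC with checksum offload does this same sum in hardware as the packet streams through.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

The receiver runs the same sum over the data plus the transmitted checksum and expects zero. Doing this in software means touching every byte of every packet, which is why it was one of the first things to move onto the card.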
As far as DRM is concerned, look at the NIC market and look at the TCP/IP spec. TCP/IP? Standard, and anything non-standard won't work with stuff that's out there. Weird NICs? I've been getting Linux source-code drivers for even the cheapest of cheap NICs for years now. There's too much competition to sneak in something restrictive.
Re:the good, the bad, the ugly? (Score:2)
Rus
Re:the good, the bad, the ugly? (Score:2, Interesting)
Less generically, the original Auspex NFS servers had distinct boar
Ugly (Score:2)
Re:cpu? e-net controller? (Score:2, Funny)
The secret to faster downloads is to keep wiggling the mouse, that way it pushes the data through faster.
Re:cpu? e-net controller? (Score:2)
I said, TCP/IP data. Typically, the ethernet controller, mobo chipset, and CPU don't care what kind of data they're processing, just that they're processing data. Now they'll be sensitive to TCP/IP overhead and have special ways to process it.
Re:Great - no (Score:2)
Re:Nothing to see here (Score:4, Funny)
Craig Barrett here.
Listen we apologize for this distraction, and apologize for not consulting with you first. I guess some of our engineers just got caught up in something silly and they went off and did this when instead they could be doing things more valuable to you.
We will immediately begin work on the porn accelerator coprocessor.