
Intel Develops Hardware To Enhance TCP/IP Stacks

RyuuzakiTetsuya writes "The Register is reporting that Intel is developing I/OAT, or I/O Acceleration Technology, which allows the CPU, the mobo chipset and the ethernet controller to help deal with TCP/IP overhead."
  • Interesting (Score:5, Insightful)

    by miyako ( 632510 ) <miyako@g[ ]l.com ['mai' in gap]> on Monday February 21, 2005 @04:11AM (#11734132) Homepage Journal
    This seems interesting, though given Intel's track record I wonder if it will really be as useful as they claim; the article has no real technical information.
    Granted, I've never administered a server under anywhere near the kinds of loads we're talking about for this to be useful, but I have a hard time imagining that dealing with the TCP/IP stack would be more intensive than running applications (as the article claims).
    So, for all you people out there much more qualified to discuss this than I am: will having some part of the processor dedicated to handling TCP/IP really speed things up, or is this primarily a marketing technology?
  • Re:White elephant? (Score:1, Insightful)

    by Anonymous Coward on Monday February 21, 2005 @04:11AM (#11734134)
    Doesn't matter. Intel is eyeing AMD's success at courting the ricer community and trying to horn in on that action.
  • by afidel ( 530433 ) on Monday February 21, 2005 @04:22AM (#11734167)
    No, a gigabit adapter can't saturate a PCI bus by itself: 32-bit 33MHz PCI is 133MB/s, and gigabit is roughly 100MB/s. Then there is 32-bit 66MHz PCI, and if you wanted to you could run a 32-bit card at 133MHz, since the standard supports it (though I've never heard of such a card; if you need 133MHz you generally also need 64-bit, but I assume an ADC could use the faster speed without needing the wider word size). The fastest current implementation of the local I/O bus is 16-lane PCI Express, which could handle four 10-gigabit adapters. The problem would be coming up with enough data to keep those pipes full: no disk subsystem is fast enough, and any meaningful SQL transactions are going to be CPU-limited on even the biggest of servers, so why would you need a bus with more bandwidth than that? Add to this the fact that servers which actually need more throughput have long had the faster PCI slots, and you realize it's not a problem in the real world.
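    For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch of the bus and NIC figures above. All numbers are theoretical peaks that ignore protocol overhead, and the PCI Express figures assume first-generation 2.5GT/s lanes with 8b/10b encoding (about 250MB/s per lane, per direction); those assumptions are mine, not the poster's.

        /* Back-of-the-envelope bus vs. NIC bandwidth, theoretical peaks only. */
        #include <stdio.h>

        int main(void)
        {
            double pci_32_33 = 4.0 * 33e6 / 1e6;   /* 32-bit @ 33MHz  -> ~133 MB/s */
            double pci_32_66 = 4.0 * 66e6 / 1e6;   /* 32-bit @ 66MHz  -> ~266 MB/s */
            double gige      = 1e9 / 8.0 / 1e6;    /* 1Gb/s           ->  125 MB/s */
            double pcie_lane = 250.0;              /* 2.5GT/s, 8b/10b ->  250 MB/s */
            double pcie_x16  = 16.0 * pcie_lane;   /* per direction   -> 4000 MB/s */
            double ten_gige  = 10e9 / 8.0 / 1e6;   /* 10Gb/s          -> 1250 MB/s */

            printf("PCI 32/33: %6.0f MB/s\n", pci_32_33);
            printf("PCI 32/66: %6.0f MB/s\n", pci_32_66);
            printf("GigE:      %6.0f MB/s each way\n", gige);
            printf("PCIe x16:  %6.0f MB/s each way\n", pcie_x16);
            printf("10GigE:    %6.0f MB/s, so x16 fits about %.1f ports\n",
                   ten_gige, pcie_x16 / ten_gige);
            return 0;
        }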
  • by pc486 ( 86611 ) on Monday February 21, 2005 @04:38AM (#11734223) Homepage
    I can't believe the parent got modded up. This kind of thing has been done before (RTFA. Yeah yeah, I know. I must be new here...). It's called TOE (TCP Offload Engine) and many networking companies have done TOE. However, most cards are expensive and don't have much support across platforms.

    What's new here is that Intel wants to put this in their chipsets everywhere and not just in $700+ NICs. Already this has been happening with checksum offloading, TCP fragmentation, smart interrupts, and so on in most GigE chips.

    So yes, people have done this before, and they have been doing it since at least 2000.

    As far as DRM is concerned, look at the NIC market and look at the TCP/IP spec. TCP/IP? It's a standard, and anything non-standard won't work with the stuff that's already out there. Weird NICs? I've been getting Linux source-code drivers for even the cheapest of cheap NICs for years now. There's too much competition to sneak in something restrictive.
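    For a sense of what the checksum offloading mentioned above actually takes off the CPU, here is a minimal sketch of the standard Internet checksum (in the style of RFC 1071) that a TOE or checksum-offload NIC computes in hardware. It is tiny per packet, but without offload the host has to run it over every packet.

        /* Minimal RFC 1071-style Internet checksum -- the per-packet work that
         * checksum-offload NICs move into hardware.  Illustrative sketch only. */
        #include <stdint.h>
        #include <stddef.h>

        uint16_t inet_checksum(const void *buf, size_t len)
        {
            const uint8_t *p = buf;
            uint32_t sum = 0;

            while (len > 1) {                        /* sum 16-bit words */
                sum += ((uint32_t)p[0] << 8) | p[1];
                p += 2;
                len -= 2;
            }
            if (len)                                 /* odd trailing byte */
                sum += (uint32_t)p[0] << 8;

            while (sum >> 16)                        /* fold carries back in */
                sum = (sum & 0xffff) + (sum >> 16);

            return (uint16_t)~sum;                   /* one's complement of the sum */
        }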
  • Re:Interesting (Score:3, Insightful)

    by AutumnLeaf ( 50333 ) on Monday February 21, 2005 @04:40AM (#11734228)
    I've seen extremely beefy NFS file servers go into a crash-reboot-crash cycle after the first crash, because all of the hosts trying to remount the filesystem completely crush the machine before it is fully back up to speed. We've had to unplug the network cables on the server to prevent the mount storm from killing the server again.

    Note, this is enterprise-grade hardware hooked up to million-dollar disk arrays.

    Now, is that entirely from dealing with the networking stack? No. Not quite. However, consider this: it takes time to checksum headers and data, and it takes time to unwrap packets. If you have a ton of clients raining requests for data on your server, it's not hard to see that dealing with the networking bookkeeping could impact the throughput of requests. Database servers and web servers are two things that come to mind here, in addition to file servers.

    Btw, note that this is another part of the "platform" initiative/orientation. While Intel's track record has not been great in many respects, they do have a good track record of success with "platforms"; e.g., Centrino was a "platform."
  • by Anonymous Coward on Monday February 21, 2005 @04:44AM (#11734242)
    I'm guessing from sweeping comments such as "Sun's TCP stack is crappy" that you've extensively tested Solaris 10? Nice to know there are people out there giving expert opinions on cutting-edge software so that people like me don't have to form factually based opinions.
  • Re:White elephant? (Score:3, Insightful)

    by Trogre ( 513942 ) * on Monday February 21, 2005 @04:55AM (#11734279) Homepage
    Try telling that to Amiga fans in 1989-1992.

    Those little boxes were masters at multi-processing, and they did it right - one processor for pretty much every major peripheral task (disk, graphics, sound, something else I can't remember).

    If these Intel coprocessors are going to be an open standard (which they almost certainly won't be), then I'd welcome this addition to PC architecture.

  • by Moderation abuser ( 184013 ) on Monday February 21, 2005 @05:07AM (#11734323)
    My boxes all run tens to hundreds of processes for tens to hundreds of people. Offloading the processing to a networking subsystem isn't going to hurt, especially with gig and 10gig.

    Not that this is a new idea. It's been done for donkey's years.

  • Re:Interesting (Score:1, Insightful)

    by Anonymous Coward on Monday February 21, 2005 @05:22AM (#11734377)
    Patch your OS; it should not crash due to high load, ever.
  • Re:yeah great (Score:2, Insightful)

    by yem ( 170316 ) on Monday February 21, 2005 @06:30AM (#11734583) Homepage
    I didn't know whether to mod you interesting or funny :-)

    Parallelism is great. Look at the way things are going: dual-CPU motherboards, dual-core CPUs, Cell...

    And GNOME... sheesh... back when I ran a P100 and GNOME was slow, I thought "well, one day I'll have a 500MHz monster and GNOME will be fast". Here I am with a 2.6GHz P4 and 1GB of RAM, and GNOME is STILL a dog. *sigh*
  • Re:White elephant? (Score:5, Insightful)

    by Uhlek ( 71945 ) on Monday February 21, 2005 @07:10AM (#11734717)
    Comparing the two is completely valid when you're discussing the benefits of task-customized hardware versus general-purpose computing. Are there limits to how useful a hardware-based TCP/IP stack will be in the desktop/server market? Yes, of course there are. But for high-bandwidth applications, I can assure you that offloading the TCP/IP overhead onto an ASIC will not only give you better performance, but also free up primary processor time for other applications.

    Also, Catalyst switches are not highly parallel. They can be parallel, depending on the exact model and configuration, as well as the exact path inside the switch that the traffic takes, but it's not even remotely the same in execution as having "hundreds of linux routers side by side."

    Instead, it is the exacting way in which the various components of the switch pass data, and the very specific purpose of each chip and circuit in the device, that gives modern routers their speed. Special components make the difference: content-addressable memory, ternary content-addressable memory (TCAM, which stores 0s, 1s, and wildcard values instead of just 0s and 1s, allowing wire-speed match comparisons against ACLs and routing tables), and so on. It isn't merely a stack of general-purpose CPUs all running in parallel to achieve a particular task. (A toy sketch of this kind of ternary matching follows at the end of this comment.)

    Systems guys often mistake routers and switches for computers with a bunch of Ethernet jacks. They're far from it. They are highly specialized pieces of hardware designed from the bottom up to do one thing and do it well -- transport data. Computers are the opposite. They're designed from the bottom up to be able to do whatever you wish them to as fast as possible, but that flexibility comes with a price.

    If you ever get the urge, you should read up on Catalyst switching architecture. You'll find it quite interesting.
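    As a toy illustration of the ternary matching described above: each TCAM entry stores a value plus a mask of "don't care" bits, and the hardware compares a key against every entry at once, while software can only fake it with a scan. The names and table entries below are invented for the example, not drawn from any Catalyst internals.

        /* Toy model of a ternary CAM (TCAM) lookup: an entry matches when the
         * key agrees with its value on every bit set in the mask.  Real TCAMs
         * compare all entries in parallel in one cycle; this software stand-in
         * just scans the table, first match wins.  Purely illustrative. */
        #include <stdint.h>
        #include <stdio.h>

        struct tcam_entry {
            uint32_t value;   /* e.g. a destination prefix          */
            uint32_t mask;    /* bits that must match exactly       */
            int      action;  /* e.g. an output port or ACL verdict */
        };

        static int tcam_lookup(const struct tcam_entry *t, int n, uint32_t key)
        {
            for (int i = 0; i < n; i++)
                if ((key & t[i].mask) == (t[i].value & t[i].mask))
                    return t[i].action;
            return -1;                             /* no entry matched */
        }

        int main(void)
        {
            struct tcam_entry table[] = {
                { 0x0a000000, 0xff000000, 1 },     /* 10.0.0.0/8     -> port 1 */
                { 0xc0a80100, 0xffffff00, 2 },     /* 192.168.1.0/24 -> port 2 */
                { 0x00000000, 0x00000000, 0 },     /* default        -> port 0 */
            };
            printf("action = %d\n", tcam_lookup(table, 3, 0xc0a80105));  /* 192.168.1.5 -> 2 */
            return 0;
        }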
  • by morzel ( 62033 ) on Monday February 21, 2005 @07:16AM (#11734735)
    Using the same logic, machines with two (or more) CPUs wouldn't be useful, since the second CPU is not going to be any faster than the first one.

    With all due respect to Mr. Tannenbaum, if he stated what you put in your post, his logic is severely flawed.

    Let's compare the general CPU/networking CPU combination with a manager/secretary.
    The manager has a number of tasks which need to be done, including scheduling a number of appointments. Without a secretary, he is obliged to call the people involved, wait for their responses, and note the scheduled appointments in his calendar. Once that is done, he can get on with his other tasks.
    When that manager has a secretary, he can just tell the secretary to make the appointments and notify him when they're done. The secretary isn't going to be any faster at making those appointments (she still has to call the same people), but in the meantime the manager can start working on something more useful (in theory).

    While the secretary may not be that much faster at scheduling appointments (she probably is, since she knows how to deal with this and who to contact a lot quicker and in a more structured way than the manager), the end result is that the manager can get more work done because he delegated some of it to the secretary.

    Note for the Politically Correct: feel free to swap he/she where appropriate.

  • by Matt_Bennett ( 79107 ) on Monday February 21, 2005 @08:25AM (#11734974) Homepage Journal
    The critical aspect you leave out is that gigabit Ethernet is (inherently) full duplex. That means that a 32/33 PCI bus would be saturated at a gigabit out, with no bandwidth left for anything incoming.

    In truth, a gigabit Ethernet card can saturate a 1x PCI-E link (2Gb/s after the 8b/10b encoding is removed) when sending small packets, basically due to per-packet overhead.
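    A rough worked example of that small-packet overhead, with the caveat that the 32-byte per-packet descriptor cost below is an assumed figure for illustration rather than anything from the post:

        /* Rough packet-rate arithmetic for minimum-size frames on gigabit
         * Ethernet.  Wire cost per frame: 64-byte frame + 8-byte preamble +
         * 12-byte inter-frame gap = 84 bytes.  The 32-byte descriptor cost
         * per packet on the bus side is an assumption, for illustration. */
        #include <stdio.h>

        int main(void)
        {
            double line_rate  = 1e9 / 8.0;         /* bytes/s on the wire     */
            double wire_cost  = 64 + 8 + 12;       /* bytes per minimum frame */
            double pps        = line_rate / wire_cost;

            double payload    = 64;                /* frame bytes crossing the bus    */
            double descriptor = 32;                /* assumed per-packet DMA overhead */
            double bus_bytes  = pps * (payload + descriptor);

            printf("packets/s at minimum frame size: %.0f\n", pps);   /* ~1.49 million */
            printf("bus traffic: %.0f MB/s to move %.0f MB/s of frame data\n",
                   bus_bytes / 1e6, pps * payload / 1e6);
            return 0;
        }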

  • by Ancient_Hacker ( 751168 ) on Monday February 21, 2005 @08:33AM (#11734996)
    The nightmare continues. It goes something like this: some drooling "computer scientist" is too dumb to do anything useful, so he speculates: "Wouldn't it be nice to free up this $XXXX CPU from this humdrum task (choose: moving bits/bytes/pixels/packets)?" He finds a brain-addled silicon-stuffer to design a chip to do just that. All rejoice at the increased efficiency.

    Except:

    • The silicon-stuffer only has access to the slow processes of maybe two silicon generations back, unlike the CPU which paid for the latest whizzy xx picofurlong process. So the supposedly whizzy chip is still not particularly faster than the CPU.
    • The whizzy chip shows up late, just about when the associated CPU is going to take a 2x speed hike.
    • The chip is on the I/O bus, requiring many slow I/O cycles, with interrupts masked, to get its commands.
    • Said whizzy bit-banger doesn't have any software support from the main operating systems.
    • The silicon-etcher guy can't write English worth a damn, so nobody can understand the spec sheet.
    • And oh, he didn't know the bus was active-low, so all the data packets have to be inverted.
    • And sometimes byte-reversed too.
    • The chip designer doesn't know or care about the whole system, so the chip does several things that spoil the overall performance, like hogging the bus, saturating the bus-snoop logic, poisoning the cache, interrupting too often, etc.
    • The droolers forgot to think about the multi-processor option, so the chip doesn't share well with multiple CPUs.
    • The chip is all hard-wired gates, so there's no way to fix the problems.
    Finally some software wizard finds a way of speeding up the code that runs in the CPU so it's now faster than the separate chip, so the chip is now useless and just an extra power waster.

    We've seen successive waves of this concept, none of them have had much success. Graphics processors are one partial exception, and it took almost a decade of mis-designs of those before they became stable enough to be usable.

  • by leonbrooks ( 8043 ) <SentByMSBlast-No ... .brooks.fdns.net> on Monday February 21, 2005 @11:07AM (#11735847) Homepage
    ...the original IBM PC put a processor in the keyboard and another (dumb) processor on the motherboard to talk to it.

    This USB keyboard I'm typing on involves at least three processors, one to scan the keys, one to do the USB on the peripheral side and the third to do the USB on the motherboard side.
  • by morzel ( 62033 ) on Monday February 21, 2005 @02:29PM (#11737574)
    This is the problem which faces networking processing. Any given thread which performs network I/O will be executing on a single CPU.
    In the purest form, it would be like that: one single thread that does not gain much from the offloading. However: have you checked just how many threads are actually running on PCs nowadays? You specifically say 'more tasks can be done concurrently'... isn't this exactly the point of offloading?

    Next thing you know, the difference between SCSI and IDE is moot because 'for one thread it won't make that much of a difference, since you'll end up waiting for the data to come off the platters anyway'.

    To consider your analogy: if the manager has only one task to do, and needs the person his secretary calls to respond before he can continue, there's very little point in having the secretary make the call for him. He's going to be stuck waiting until the reply comes through anyway.
    There are just not many managers around nowadays that just have one task to do...

    To take the problem to an illustrative extreme, we could in theory have a multitude of slow CPUs which the main zippy CPU offloads everything to; graphics, network, disk, etc.
    Why would you think that a network processor would be slower? Precisely because it is a specialized processor, you can count on it doing TCP checksumming and all that stuff a lot faster than most (if not all) general-purpose CPUs. On top of that, you won't get interrupts/context switches for bad packets...

    While this all may not seem like much, it is definitely a performance improvement for the system as a whole.

  • by Bill_the_Engineer ( 772575 ) on Monday February 21, 2005 @04:15PM (#11738570)
    OK I'll bite...

    The problem with Toby's argument is that he is fixated on the speed of the CPU. It doesn't matter how much slower or faster the network CPU is compared to the main CPU. It is more important that the network CPU be fast enough to handle the I/O requirements dictated by the network architecture.

    With L2 cache and DMA being the norm nowadays, I don't see what the problem is. Sure, the main CPU will stall if the cache needs to fetch something from main memory, but hardware can be adjusted to take these possibilities into account.

    Having processors dedicated to tasks frees the CPU to handle the other tasks on its agenda. I can see a network ASIC receiving a data payload ready for transmission and doing its thing until it interrupts the CPU to report that it is done (a rough sketch of that model follows at the end of this comment).

    Also, the CPU would not have to wait for the network transmission to complete before sending more data. The network device would keep accepting payloads until its buffer was full.

    While the graphics card is a good example, a better example would be to look at the FPU. Floating-point arithmetic is more CPU-intensive than integer arithmetic, so to speed things up the CPU submits the desired computation to the FPU, and the FPU notifies the CPU when the calculation is complete.

    Then there is the other omission made by Toby: the bus does not have a 1:1 speed ratio with the CPU. With this in mind, and using Toby's own logic, the ASIC would only have to match the bus speed, not the CPU's.

    Toby keeps asking why pay for a dedicated CPU when the expensive CPU you have can handle the task. I think most engineers would ask why tie up an expensive CPU when a dedicated CPU can do the task.

    In other words, let's free our expensive CPUs to perform general computational tasks by offloading some of the mundane labor to dedicated ASICs.

    I will say Toby is correct about one thing: in a personal computer, I don't see the advantage of the network ASIC (other than the API), since the CPU is idle most of the time anyway.

    However, in Intel's target market, I would like to have the CPU perform the application logic and offload the networking to dedicated processors. The idea is that if the network ASICs give the CPU more headroom, I could see an increase in the maximum number of transactions per second. That increase could be just enough to keep me from investing in another blade or even another server.

    Then again.. I may need more sleep.

    Best Regards,
    Bill
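    To make the "queue payloads until the buffer is full, then get interrupted when it's done" model above concrete, here is a hand-wavy sketch of a transmit descriptor ring of the kind NIC drivers use. All the names and the 256-entry ring size are invented for illustration, not taken from any real driver or from Intel's announcement.

        /* Toy transmit descriptor ring: the driver queues payload descriptors,
         * the NIC drains them by DMA on its own, and the CPU is only involved
         * again at the completion interrupt.  Names are illustrative. */
        #include <stdint.h>
        #include <stdbool.h>

        #define RING_SIZE 256

        struct tx_desc {
            uint64_t buf_addr;      /* physical address of the payload  */
            uint16_t len;           /* bytes to transmit                */
            volatile uint8_t done;  /* set by the device after the send */
        };

        struct tx_ring {
            struct tx_desc desc[RING_SIZE];
            unsigned head;          /* next slot the driver will fill   */
            unsigned tail;          /* oldest slot not yet reclaimed    */
        };

        /* Driver side: keep handing payloads to the NIC until the ring is full. */
        static bool tx_enqueue(struct tx_ring *r, uint64_t phys, uint16_t len)
        {
            unsigned next = (r->head + 1) % RING_SIZE;
            if (next == r->tail)
                return false;                  /* ring full: caller backs off */
            r->desc[r->head].buf_addr = phys;
            r->desc[r->head].len = len;
            r->desc[r->head].done = 0;
            r->head = next;
            /* ...a real driver would now poke the NIC's doorbell register... */
            return true;
        }

        /* Completion interrupt: reclaim every descriptor the NIC has finished. */
        static void tx_irq(struct tx_ring *r)
        {
            while (r->tail != r->head && r->desc[r->tail].done)
                r->tail = (r->tail + 1) % RING_SIZE;   /* buffer is free again */
        }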
  • Re:White elephant? (Score:3, Insightful)

    by aminorex ( 141494 ) on Monday February 21, 2005 @09:20PM (#11740801) Homepage Journal
    The IO processor can be made to do the task much faster than the CPU, because it is not a general-purpose chip. It implements in hardware what the CPU would implement in software. As a result, it costs much less to produce. These are the same considerations that apply to graphics pipelines. It would be grossly economically infeasible to implement the functions of a high-end GPU on the CPU, in part because it's on the wrong end of a bus.
