
Will Offload Chips Be Uploaded?

Light Reading
News Analysis
10/24/2002

SAN JOSE, Calif. -- The TOEs were tapping here at this week's Network Processors Conference.

TCP offload engines -- or TOEs, as they're affectionately called -- have become quite the buzz in the network processor community, particularly now that silicon is shipping from multiple vendors (see Vendors Chip Into IP Storage). These chips are targeted at the storage-networking market -- and any networking products handling Layer 4 and above.

TCP specialists claim that existing network processors aren't suitable for stateful protocols -- those in which a processor has to be aware of which data packets belong to which data streams. And the general-purpose processors traditionally used for terminating TCP connections won't be able to keep up as line speeds approach 10 Gbit/s.

The move to 10 Gbit/s might be stalling, "but that doesn't mean that this processing gap is going to go away," says David Lapp, CTO of Seaway Networks Inc., who feels the storage market is approaching density requirements that would swamp an ordinary microprocessor.

Seaway, which originated as a Nortel Networks Corp. (NYSE/Toronto: NT) project (see Nortel Folk Float to Seaway), chose to tackle the problem of sorting through packets for pertinent information, presenting only that information to content co-processors. In the case of SSL security, for example, the company's chip would isolate the SSL record headers, handling the job of finding those bits in their varying locations, and present just the headers to an SSL-processing chip.
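
As a rough illustration of the header-isolation job described above, the sketch below walks an already-reassembled byte stream and reports only the record headers. It assumes the standard 5-byte SSL/TLS record header (content type, two version bytes, 16-bit length); the function and type names are hypothetical and say nothing about how Streamwise works internally.

```python
import struct
from typing import Iterator, NamedTuple, Tuple


class RecordHeader(NamedTuple):
    offset: int               # position of the header in the stream
    content_type: int         # e.g. 22 = handshake, 23 = application data
    version: Tuple[int, int]  # (major, minor)
    length: int               # payload bytes that follow the header


def iter_record_headers(stream: bytes) -> Iterator[RecordHeader]:
    """Yield each record header found in a reassembled byte stream."""
    offset = 0
    # A header is 5 bytes; stop when fewer than 5 bytes remain.
    while offset + 5 <= len(stream):
        ctype, major, minor, length = struct.unpack_from("!BBBH", stream, offset)
        yield RecordHeader(offset, ctype, (major, minor), length)
        # Skip the payload to land on the next header.
        offset += 5 + length
```

The payload length varies from record to record, so each header sits at a different offset; that is the "varying locations" problem in miniature.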

"We've taken the kinds of things that take the most CPU cycles and put them in silicon rather than asking the processors to do that," Lapp says. The chip, called Streamwise, runs at 5 Gbit/s full duplex and can juggle 2 million Layer 4 sessions simultaneously, he claims.

Startup Astute Networks Inc. is tackling TCP termination more directly with a new chip called Pericles (see Astute Ships Storage Chip). Touted as a storage processor, Pericles terminates TCP as well as storage-specific protocols such as iSCSI. The device is programmable, allowing customers to run their own applications software for product differentiation.

Astute uses ten 32-bit, RISC-based microprocessor cores for protocol processing -- an architecture that's reminiscent of a "normal" network processor. The difference is that Astute adds a "flow state controller" that assigns each packet to the proper traffic flow. Packets are then sent to whichever processor core is handling that flow; a flow is never split between two of the processors, because "that can kill your performance," notes Fazil Osman, Astute chief technology officer.
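
The flow-pinning idea can be sketched in a few lines. This is only a toy model under stated assumptions: flows are keyed by source and destination address and port, new flows are handed out round-robin, and the class and method names are hypothetical. Astute has not described its controller at this level of detail.

```python
from typing import Dict, Tuple

# Hypothetical flow key: (source IP, source port, destination IP, destination port)
FlowKey = Tuple[str, int, str, int]


class FlowDispatcher:
    """Toy stand-in for a flow state controller: one flow, one core."""

    def __init__(self, num_cores: int = 10) -> None:
        self.num_cores = num_cores
        self.flow_to_core: Dict[FlowKey, int] = {}
        self._next_core = 0

    def core_for(self, key: FlowKey) -> int:
        """Return the core that owns this flow, assigning one if it is new."""
        core = self.flow_to_core.get(key)
        if core is None:
            # New flow: pick the next core round-robin (an assumption).
            core = self._next_core
            self.flow_to_core[key] = core
            self._next_core = (self._next_core + 1) % self.num_cores
        return core
```

Because the table is consulted before any packet is handed off, every later packet of a flow goes to the core that already holds that flow's state.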

iReady Corp. also debuted a TCP offload product this week, one focusing entirely on Ethernet. The company's EthernetMax is a single chip that combines TCP and iSCSI-processing hardware, based on iReady cores, with Gigabit Ethernet PHY chips and MAC chips from National Semiconductor Corp. (NYSE: NSM). Like Astute and Seaway, iReady is targeting 10-Gbit/s feeds.

iReady made overtures towards the storage market with a TCP/IP chip announced earlier this year (see Astute Ships Storage Chip), but EthernetMAX is targeted at more general enterprise traffic. Specifically, as Gigabit Ethernet proliferates in the enterprise, iReady expects that existing processors won't be able to keep up with the TCP offload requirements, because of the amount of encapsulation involved; packets can involve TCP embedded in IP embedded in Ethernet.
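
The nesting of headers can be sketched as a walk through an Ethernet frame carrying IPv4 and TCP. The example assumes an untagged Ethernet II frame and standard IPv4 and TCP header layouts; the function name is hypothetical and the code is a software illustration, not a description of iReady's hardware.

```python
import struct


def tcp_payload_offset(frame: bytes) -> int:
    """Return the offset of the TCP payload inside an Ethernet II frame."""
    # Layer 2: the EtherType sits after the two 6-byte MAC addresses.
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    ip_start = 14
    # Layer 3: the low nibble of the first byte is the header length in 32-bit words.
    version_ihl = frame[ip_start]
    if version_ihl >> 4 != 4:
        raise ValueError("not IPv4")
    ip_header_len = (version_ihl & 0x0F) * 4
    if frame[ip_start + 9] != 6:  # protocol number 6 is TCP
        raise ValueError("not a TCP segment")
    # Layer 4: the high nibble of byte 12 is the TCP header length in 32-bit words.
    tcp_start = ip_start + ip_header_len
    tcp_header_len = (frame[tcp_start + 12] >> 4) * 4
    return tcp_start + tcp_header_len
```

Both the IP and TCP headers are variable-length, so the payload offset cannot be known without reading each layer in turn.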

"You have to dig down three or four layers deep into the packet," says Mike Smith, CTO of iReady. "So there's a massive amount of fairly random packet access and cracking going on here."

Some of the more established network-processor players are targeting existing devices at the TCP offload market as well. For example, officials at Fast-Chip Inc. noted that their PolicyEdge chip can be used to find the packets representing session starts and terminations, forwarding only those packets to a session processor, much as Seaway does.

— Craig Matsumoto, Senior Editor, Light Reading
www.lightreading.com

skeptic
12/4/2012 | 9:29:14 PM
re: Will Offload Chips Be Uploaded?


The expensive part of TCP isn't adding or
manipulating the TCP/IP headers, it's in doing
CPU interrupts and moving data. If the data
is going to an application in the system via
TCP/IP, the data still has to come into the
CPU where the application is running.

You may be able to increase the granularity of
the data transfers between the offload device
and the application, but the data still has to
be moved from the offload device to the
application.

By my understanding, several years ago we
crossed the point where the CPU cycle overhead
of TCP/IP was a large performance issue.
And when that happened, the biggest issues became
memory copy speed and the I/O bus into the system.

And beyond that, TCP/IP is going to be difficult
to offload in a useful way. I suspect that the
people doing this are going to strip congestion
control and everything else they don't like out
of the offloaded TCP/IP.

Offloading checksums made sense. Offloading
all of TCP/IP doesn't make much sense to me.
MilanMerhar
12/4/2012 | 9:29:13 PM
re: Will Offload Chips Be Uploaded?
You are right that transfer of data to and from application memory can be a bottleneck. That's one reason why many of these TOE designs use a fast bus such as PCI-X and then try to optimize the efficiency of each host transfer.

One part of this optimization, and something that really requires much or all of TCP to be offloaded from the host, is elimination of repeated accesses to the same buffer, either on transmit (retransmission) or receive (fragment reassembly and out of order reception.)

I don't know of any TOE vendor who chose to ignore congestion control, etc. in their design. Every pitch I've heard has been for an "Internet Ready" TCP/IP implementation, including all the fussy bits the IETF wants to see done right. Different implementations may make different trade-offs as to what is in hardware and what's on an auxiliary CPU, but it's all there somewhere.
achorale
12/4/2012 | 9:29:13 PM
re: Will Offload Chips Be Uploaded?
In order to get TCP hardware quacks in place, ....
skeptic
12/4/2012 | 9:29:11 PM
re: Will Offload Chips Be Uploaded?
One part of this optimization, and something that really requires much or all of TCP to be offloaded from the host, is elimination of repeated accesses to the same buffer, either on transmit (retransmission) or receive (fragment reassembly and out of order reception.)
======================

In my opinion, that isn't something that should
be optimized because if either event is happening
(out-of-order reassembly, retransmissions), your
performance is going to be very suboptimal
no matter where you process the packet. It's
just not going to help to offload that activity.


======================

You are right that transfer of data to and from application memory can be a bottleneck. That's one reason why many of these TOE designs use a fast bus such as PCI-X and then try to optimize the efficiency of each host transfer.

=======================

There are difficulties in doing that in a way
that will really yield increased performance.