Carrier-Class IPSec: the Bigger the Better

VPN Test Highlights:

  • Max t-put: 965.8 Mbit/s

  • Min latency: 19.2 microseconds

  • Thumbs: two, up

June 5, 2002


Managed security services are hot right now, and carriers have plenty of products to choose from. So which boxes are best for building scaleable, managed VPN services?

To find out, Light Reading worked with its testing partners, Network Test Inc. and Spirent Communications, to see which products are ready for the rigors of carrier-grade virtual private network service.

We asked vendors to supply IPSec-based products that would scale to securely support thousands of customers, move traffic into the gigabit range, and offer easy provisioning and management of customer circuits.

Turns out that was a little too tall an order.

We chose IPSec among the various VPN technologies available today because the alternatives simply aren’t suitable for managed security services.

Multiprotocol Label Switching (MPLS)-based VPNs and the Internet Engineering Task Force (IETF)'s MPLS Martini extensions offer a variety of benefits, but security isn’t one of them. Neither provides authentication or encryption, which are bedrock functions required to ensure data integrity and privacy. The Layer 2 Tunneling Protocol (L2TP) does authenticate users, but it’s mainly intended for dial-up links, and it doesn’t offer encryption or verify that data hasn’t been altered in flight.

In contrast, the IETF’s IPSec suite does provide strong security; even so, finding carrier-class products can be a challenge. To begin with, most IPSec gateways are intended for CPE (customer premises equipment) use and these won’t scale anywhere close to carrier-class levels.

Several vendors say they do offer carrier-class gear, but when it came time to put up equipment for testing, most – including Cisco Systems Inc. (Nasdaq: CSCO), Lucent Technologies Inc. (NYSE: LU), and Nortel Networks Corp. (NYSE/Toronto: NT) – proved awfully shy. (See: No Shows.)

In the end, only two vendors were willing to put their carrier-grade boxes to the test: NetScreen Technologies Inc. (Nasdaq: NSCN), a newly minted public company; and Quarry Technologies Inc., a startup.

We put both vendors’ IPSec gateways through a grueling set of tests, and both came up aces. While most vendors were busy hiding, the NetScreen and Quarry devices set new speed records: Both ran at Gigabit Ethernet line rates in at least some of our tests. Both scaled to support thousands of concurrent tunnels. Best of all, both delivered essentially the same performance with one secure tunnel and thousands active.

The throughput results are especially noteworthy, considering most CPE-based IPSec gateways can’t even run one tenth as fast. Even though these devices perform the most highly compute-intensive tasks imaginable, they manage to crank along at line rate while still providing strong security.

Picking a winner wasn’t easy. Quarry’s iQ series gateways delivered higher throughput in most tests, and offer full redundancy of components and an intuitive, powerful management platform. But the NetScreen-5200 is no slouch either. It set up far more concurrent tunnels than Quarry’s iQ, and the configuration we tested costs less. If we had to pick one, we’d give the nod to Quarry’s iQ, but either is up to the task of carrier-grade IPSec service.

The following report provides an in-depth account of what we tested, how, and what the results were. A hyperlinked index follows:

  • Inside the CO

  • Lies, Damned Lies, and Vendor Specs

  • Frames and Fragmentation

  • Speed Demons

  • Delay Tactics

  • Scaling Up

  • No Sweat

  • Grace Under Pressure

  • Management Material

  • Keeping Tabs

Alternatively, an archived version of the May 23rd Webinar, in which Light Reading shared the results of the test, is available for download.

David Newman is president of Network Test Inc. (Westlake Village, Calif.), an independent benchmarking and network design consultancy. Network Test’s clients are end-users (enterprises and service providers), trade publications, and industry consortia; the company does not accept testing commissions from equipment makers.

Although managed security services can take many forms, much recent interest has focused on carrier-based IPSec services. Here, instead of deploying dozens or hundreds of CPE gateways, the carrier deploys one gateway at each central office and serves one or more (usually many more) customers from that gateway.

This greatly simplifies management and billing for the service provider, since there are far fewer gateways to look after. On the downside, there’s an implicit assumption in this approach that the network from the customer’s site to the gateway is already secure. While private-line and Frame Relay tail circuits have a much better security track record than the public Internet, they’re not perfect. (It’s well beyond the scope of this article to determine what vulnerabilities exist in a managed security service, but it is something that designers of such networks – and their customers – need to ask about).

Another potential drawback of CO-based VPN designs is that the gateways are potentially a single point of failure. Both NetScreen and Quarry address this concern with redundant and hot-swappable components (see Table 1).

Table 1: Selected Vendors of Carrier-Class IPSec Equipment

| Vendor | NetScreen Technologies (http://www.netscreen.com) | Quarry Technologies (http://quarrytech.com) |
| --- | --- | --- |
| Product/software version tested | NetScreen-5200 / ScreenOS 3.1.0 (n5000[1].310a1.2) | iQ4000 Service Edge Switch / 2.0.2 |
| Interfaces supported | Gigabit Ethernet¹ | ATM: DS3, OC3c, OC12c; Ethernet: 10/100/1000 Mbit/s |
| Maximum 10/100 Ethernet interfaces per chassis | 24¹ | 56 |
| NEBS certified | Yes | No² |
| Redundant components | Power supplies | Power supplies, route processor, switch fabric |
| Hot-swappable components | Fan, power supplies | Line cards, management modules, fans, power supplies |
| NAT/PAT | Yes/yes | No/no³ |
| Message authentication | On session startup | On every packet |
| Protocols used in tunnelling | DES, 3DES, AES, L2TP | ATM VCs; 802.1Q VLANs |
| IP routing protocols | None⁴ | BGP, OSPF, RIP |
| Interoperates with certificate authorities | Yes | No |
| Radius/LDAP authentication | Yes/yes | Yes/no |
| Other services | Firewall, bandwidth management, denial-of-service protection | Firewall, bandwidth management, virtual routing |
| List price/price as tested | $99,000/$99,000 | $73,485/$129,980 |
| Management software list price | $5,995 | $15,000 |



Quarry’s iQ has an advantage in this regard, as virtually all removable components – including line cards, power supplies, and management modules – can be hot-swapped without a reboot. Then again, the NetScreen-5200 has already earned NEBS certification – a series of tests that involve burning, freezing, electrocuting, and generally beating the hell out of a device to ensure it’s suitable for use in carrier networks. Quarry’s iQ was undergoing similar tests at press time.

A key difference between the NetScreen and Quarry products is their intended use: NetScreen’s device is a purpose-built VPN gateway, while Quarry’s product is a switch/router that happens to support IPSec. There are important pros and cons to each approach. Quarry’s gateway supports more protocols and offers a wider range of interfaces, including serial and Asynchronous Transfer Mode (ATM). Then again, NetScreen’s 5200 is part of a much broader product line that includes gateways for anything from small-office to central-office use – and all of these run similar software.

We evaluated VPN gateways in terms of performance, management, features, and price. To assess performance, we used a variety of metrics, including throughput, average latency, concurrent tunnel capacity, and failover time. Even though these are all well-understood metrics, it’s often difficult to get straight performance data from vendors. There’s good reason for the obfuscation: Throughput, probably the key metric when it comes to assessing IPSec gateway performance, often takes a huge hit because of all the processing required for encryption and authentication.

To hide the damage, vendors tout throughput numbers obtained in less-than-stressful network situations. It’s easier to achieve high throughput by using larger packets and weaker encryption and message authentication, thereby putting less of a strain on the gateway. As always, it’s a good idea to ask vendors how, exactly, they obtained their performance numbers.

Another trick is to publish the highest rate at which a device forwards traffic, even if that rate is accompanied by serious packet loss. That’s an abuse of terminology: The IETF’s RFC 1242 clearly defines throughput as a zero-loss metric.
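To make the zero-loss definition concrete, here is a minimal sketch of how a throughput search can be run: offer traffic at a given rate, check for loss, and binary-search for the highest rate that loses nothing. The `run_trial` hook and the toy device model are hypothetical stand-ins for a real traffic generator, not part of the test described here.

```python
from typing import Callable


def zero_loss_throughput(
    run_trial: Callable[[float], int],
    line_rate_mbps: float = 1000.0,
    resolution_mbps: float = 1.0,
) -> float:
    """Return the highest offered rate (Mbit/s) that shows zero frame loss.

    run_trial(rate) is a hypothetical hook: it offers traffic at `rate`
    for a fixed interval and returns the number of frames lost.
    """
    # If the device survives line rate, there is nothing to search for.
    if run_trial(line_rate_mbps) == 0:
        return line_rate_mbps

    low, high = 0.0, line_rate_mbps  # low: known lossless, high: known lossy
    best = 0.0
    while high - low > resolution_mbps:
        mid = (low + high) / 2
        if run_trial(mid) == 0:
            best = low = mid   # no loss: search higher
        else:
            high = mid         # loss seen: search lower
    return best


# Toy device model (an assumption for demonstration only):
# it starts dropping frames above 900 Mbit/s.
def toy_device(rate_mbps: float) -> int:
    return 0 if rate_mbps <= 900.0 else int((rate_mbps - 900.0) * 100) + 1


print(f"zero-loss throughput: {zero_loss_throughput(toy_device):.1f} Mbit/s")
```

A forwarding rate quoted with loss would correspond to reporting some rate above the value this search returns, which is exactly the practice the zero-loss definition rules out.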

To ensure that our tests were adequately stressful (not to mention RFC-compliant), we set up a test bed with two gateways linked via Gigabit Ethernet (see Test Methodology). Then we pounded the devices with traffic from a Spirent SmartBits analyzer with Gigabit Ethernet interfaces.

We configured the SmartBits to generate a variety of frame sizes, two of which have special meaning for IPSec. Large Web downloads and file transfers typically use 1,518-byte frames, the largest allowed in Ethernet, but IPSec presents a problem in that these maximum-size frames must be fragmented, slowing throughput and increasing latency.

In its encapsulating security payload (ESP) mode, IPSec encrypts the original IP packet and wraps it in a new, larger packet. This works fine for smaller packets, but what happens when a packet is already at the maximum length?

The only possible way to make an ESP packet in this case is to break the original in two and transmit the fragments. As we’ll see, fragmentation has a serious impact on throughput.

We also generated 1,400-byte Ethernet frames – big enough to represent long-frame handling, but not large enough to force fragmentation.
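The fragmentation threshold can be checked with a little arithmetic. The sketch below estimates the size of an ESP tunnel-mode packet and compares it with the 1,500-byte Ethernet MTU. The cipher and authentication parameters (3DES-CBC with HMAC-SHA1-96) are assumptions chosen for illustration; the text above does not say which algorithms were configured.

```python
ETH_OVERHEAD = 18   # 14-byte Ethernet header + 4-byte FCS (untagged frame)
MTU = 1500          # standard Ethernet IP MTU

# Assumed ESP parameters (3DES-CBC + HMAC-SHA1-96); not stated in the article.
OUTER_IP = 20       # new outer IPv4 header
ESP_HEADER = 8      # SPI + sequence number
IV = 8              # initialization vector for an 8-byte block cipher
BLOCK = 8           # cipher block size
ESP_TRAILER = 2     # pad-length + next-header bytes
ICV = 12            # truncated authentication value


def esp_tunnel_packet_size(inner_ip_len: int) -> int:
    """Size of the outer IP packet after ESP tunnel-mode encapsulation."""
    # Payload plus trailer is padded up to a whole number of cipher blocks.
    padded = -(-(inner_ip_len + ESP_TRAILER) // BLOCK) * BLOCK
    return OUTER_IP + ESP_HEADER + IV + padded + ICV


def needs_fragmentation(frame_len: int) -> bool:
    """True if the encapsulated packet no longer fits in one Ethernet frame."""
    inner_ip_len = frame_len - ETH_OVERHEAD
    return esp_tunnel_packet_size(inner_ip_len) > MTU


for frame_len in (64, 256, 1400, 1518):
    size = esp_tunnel_packet_size(frame_len - ETH_OVERHEAD)
    print(frame_len, size, needs_fragmentation(frame_len))
```

Under these assumptions only the 1,518-byte frame produces an outer packet larger than the MTU, which is consistent with the choice of 1,400 bytes as the largest unfragmented size.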

As with any Ethernet test, we also used the minimum size allowed – 64 bytes. The importance of minimum-sized frames is twofold: First, because every TCP/IP packet (regardless of length) requires a short acknowledgment message in return, 64 bytes is the most commonly found length on many Internet Protocol (IP) networks. Second, because there are more short frames than long ones for a given unit of time, short frames put more stress on the device under test. (Unlike most other aspects of life, stress is a good thing in a lab test.)
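The second point is easy to quantify. This short sketch computes the maximum frame rate on a Gigabit Ethernet link for each frame size, counting the 8-byte preamble and 12-byte inter-frame gap that accompany every frame on the wire. The function name is ours, for illustration.

```python
WIRE_OVERHEAD = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap per frame


def max_frames_per_second(frame_len: int, link_bps: int = 1_000_000_000) -> float:
    """Theoretical maximum frame rate for a given Ethernet frame length."""
    return link_bps / ((frame_len + WIRE_OVERHEAD) * 8)


for frame_len in (64, 256, 1400, 1518):
    print(f"{frame_len:>5} bytes: {max_frames_per_second(frame_len):>12,.0f} frames/s")
```

At line rate, a gateway sees roughly 18 times as many 64-byte frames per second as 1,518-byte frames, and each one requires its own lookup, encryption, and authentication pass.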

The final size we offered was 256 bytes. Internet traffic studies by the Cooperative Association for Internet Data Analysis (CAIDA) and others have pegged average packet lengths at somewhere between 200 and 400 bytes. We used 256-byte Ethernet frames to approximate the Internet average.

So how did the two gateways fare with these various packet lengths? Suffice it to say that both set new speed records, in some cases filling a Gigabit Ethernet pipe.

In tests with one tunnel, Quarry was fastest at moving 64-, 256-, and 1,518-byte frames (see Figure 1). The most impressive result is Quarry’s handling of 64-byte frames: The iQ gateway moved traffic at 540.0 Mbit/s, close to the theoretical maximum rate for IPSec in ESP tunnel mode. This suggests the Quarry gateways can handle short frames – and transaction-intensive applications that use them, such as databases – with no throughput penalty.

Quarry was also faster than NetScreen when handling 256-byte frames, our “average” size for Internet traffic. However, the gap between the two vendors’ gateways narrowed at this size.

When handling 1,400-byte frames, NetScreen’s 5200 delivered the highest single throughput number in our test. On a Gigabit Ethernet pipe, NetScreen’s 5200 moved data at 965.8 Mbit/s, right up at the theoretical limit when factoring in ESP tunnel-mode overhead. Truly, NetScreen filled the gigabit pipe.

Quarry topped out at 922.6 Mbit/s with 1,400-byte frames. The company’s engineers confirmed this to be the top speed of the current product.
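The idea of a theoretical limit below 1,000 Mbit/s follows from the fact that every cleartext frame grows when it is wrapped in ESP, so the gigabit link between the gateways saturates before the cleartext side does. The sketch below estimates that ceiling for unfragmented frames. It assumes 3DES-CBC with HMAC-SHA1-96 and assumes that offered load is counted including preamble and inter-frame gap; neither detail is stated in the text.

```python
WIRE_OVERHEAD = 20            # preamble/SFD + inter-frame gap, bytes per frame
ETH_OVERHEAD = 18             # Ethernet header + FCS
ESP_FIXED = 20 + 8 + 8 + 12   # outer IP + ESP header + IV + ICV (assumed algorithms)
BLOCK, ESP_TRAILER = 8, 2     # assumed cipher block size; pad-length + next-header


def encapsulated_frame_len(frame_len: int) -> int:
    """Ethernet frame length on the tunnel side for an unfragmented packet."""
    inner_ip_len = frame_len - ETH_OVERHEAD
    padded = -(-(inner_ip_len + ESP_TRAILER) // BLOCK) * BLOCK
    return ESP_FIXED + padded + ETH_OVERHEAD


def max_cleartext_load_mbps(frame_len: int, link_mbps: float = 1000.0) -> float:
    """Highest cleartext load that still fits on the encrypted link."""
    clear_wire = frame_len + WIRE_OVERHEAD
    tunnel_wire = encapsulated_frame_len(frame_len) + WIRE_OVERHEAD
    return link_mbps * clear_wire / tunnel_wire


for frame_len in (64, 256, 1400):
    print(f"{frame_len:>5} bytes: {max_cleartext_load_mbps(frame_len):7.1f} Mbit/s")
```

Under these assumptions the 1,400-byte case works out to roughly 966 Mbit/s, in the same neighborhood as the measured 965.8 Mbit/s. The short-frame ceiling is much more sensitive to the assumed algorithms and to how load is counted, so it should be read as a rough guide only.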

The results with 1,518-byte frames – which IPSec breaks into fragments, thereby degrading throughput – represent one of the major differentiators between the NetScreen and Quarry boxes. Quarry’s throughput of 875.3 Mbit/s isn’t far from its result with large unfragmented frames. With NetScreen, it was a different story: Throughput dropped to just 276.6 Mbit/s.

These differences are most significant for applications involving bulk data transfer, like backup or imaging. But because TCP always tries to make datagrams as big as it can, there will be a marked difference for even the occasional file transfer.

Quarry achieved its superior throughput numbers by trading off higher latency. Basically, Quarry used larger buffers to ensure it would forward more packets without loss. In tests with every frame size, Quarry’s devices delayed traffic far more than NetScreen’s. However, the difference isn’t meaningful for even the most delay-sensitive applications.

When handling 64-byte packets, a pair of NetScreen’s 5200s added an average of 19.2 microseconds of delay, compared with 360.7 microseconds for a pair of Quarry’s iQ devices (see Figure 2). NetScreen’s number is down near the latency added by the Extreme Networks Inc. (Nasdaq: EXTR) Summit 7i, the gigabit LAN switch we used as part of our testbed infrastructure. (The VPN gateway latencies we present here do not include delay added by the Extreme switch.)

Lower latency is certainly desirable, but in this case the difference between devices just isn’t meaningful. The point where delays start to degrade application performance is up in the milliseconds – at least one order of magnitude greater than even the highest average latency number we recorded. For carriers looking to deploy VPN gateways separated by hundreds or thousands of miles, the speed of light is likely to be a more significant contributor to latency than either of these devices.
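A back-of-the-envelope calculation shows the scale of the speed-of-light contribution. The sketch assumes signals travel through fiber at about 200,000 km/s (roughly two-thirds of the speed of light in a vacuum) and ignores routing detours and intermediate equipment.

```python
KM_PER_MILE = 1.609344
FIBER_KM_PER_S = 200_000   # assumption: about 2/3 of c in optical fiber


def propagation_delay_us(miles: float) -> float:
    """One-way propagation delay over fiber, in microseconds."""
    return miles * KM_PER_MILE / FIBER_KM_PER_S * 1e6


HIGHEST_64_BYTE_LATENCY_US = 360.7  # Quarry pair, 64-byte frames (from the text)

for miles in (100, 1000):
    delay = propagation_delay_us(miles)
    print(f"{miles:>5} miles: {delay:8.1f} us "
          f"({delay / HIGHEST_64_BYTE_LATENCY_US:.1f}x the 360.7 us figure)")
```

Even at 100 miles, the estimated propagation delay is more than twice the 360.7-microsecond figure quoted for a pair of gateways.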

NetScreen did beat Quarry in all the average latency measurements, usually by a wide margin. The only case where the two devices were close was in handling 1,518-byte frames. Here, the NetScreen device had to buffer packets during fragmentation and reassembly, and that added considerably to its delay.

Quarry acknowledged that its device has large buffers, and this is what contributed most to the high average latency numbers it posted. Latency could become an issue for Quarry if its device is used as a conventional router rather than an IPSec security gateway. In that case, the cumulative delay added by a network of many Quarry devices could grow quickly unless buffer sizes were reduced.

For IPSec products to be interesting to carriers, they must scale to support thousands of customers. This is a key difference from enterprise-class IPSec gateways, which typically handle no more than a handful of secure sessions – or “tunnels” – at a time.

In discussing scaleability, we should note that there is considerable confusion as to what constitutes an IPSec tunnel. A conversation over IPSec actually involves three secure sessions: one to exchange keys and other security parameters, and then a pair of one-way sessions to securely exchange data.

Vendors often double-count the pair of one-way sessions in their data sheets. This is technically accurate – after all, a device does have to keep track of two sessions – but it can be misleading, since both sessions are required for the exchange of data.

We count the initial handshake to exchange keying information (called Phase 1 in IPSec-speak) and the pair of one-way sessions (called Phase 2) over which devices exchange encrypted, authenticated data as together constituting one tunnel. Counting all three sessions as one entity is a stricter accounting method – and it’s also more representative of the way the gateways are used.

To assess scaleability, we not only attempted to establish a large number of tunnels but also verified that the tunnels could do actual work. In essence, we found the maximum number of tunnels we could set up, and then reran the throughput and latency tests at that limit. Ideally, the single-tunnel and maximum-tunnel numbers should be identical.

In our configuration, we were able to set up 10,000 concurrent tunnels through NetScreen’s 5200, compared with only 4,000 for Quarry. While NetScreen aced this test, Quarry pointed out that its device can scale to larger numbers of tunnels in configurations different from the one we used.

In our test setup, we modeled a carrier offering a managed IPSec service to one very large customer, such as a bank with a headquarters office and many branch offices. We used two gateways in this test, putting the headquarters office behind one and asking vendors to see how many branch offices could reside behind the other. The goal of the test was to determine just how many branch offices, each with its own tunnel, the equipment could handle.

While NetScreen had no trouble with this configuration, Quarry ran into two issues. First, the Quarry device supports a maximum of 8,000 one-way sessions (or “security associations,” to use the IPSec jargon) per interface. In our configuration, with just one hypothetical site on one side of the test bed, Quarry was able to build only 4,000 tunnels, since each involved a pair of one-way sessions. Quarry demonstrated a different configuration with 4,000 sites on each of two interfaces in which it set up 8,000 tunnels.
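The relationship between the per-interface session limit and the tunnel counts above is simple arithmetic, sketched here. The function and constant names are ours, for illustration.

```python
SAS_PER_TUNNEL = 2  # each tunnel needs a pair of one-way Phase 2 sessions


def max_tunnels(sa_limit_per_interface: int, interfaces: int = 1) -> int:
    """Tunnel capacity when each interface has its own security-association limit."""
    return interfaces * (sa_limit_per_interface // SAS_PER_TUNNEL)


print(max_tunnels(8000))                 # all tunnels terminate on one interface
print(max_tunnels(8000, interfaces=2))   # tunnels spread across two interfaces
```

With every tunnel landing on a single interface, an 8,000-session limit yields 4,000 tunnels; spreading the sites over two interfaces doubles that figure.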

Quarry claims it has set up as many as 30,000 concurrent tunnels using a fully loaded chassis, but we did not verify this. Hence, in our configuration Quarry’s capacity number is 4,000.

Second, Quarry requires a unique pair of Layer 2 and Layer 3 addresses to uniquely identify each customer. Most IPSec gateways don’t require Layer 2 addresses to be unique per subscriber; indeed, as a Layer 3 technology, IPSec doesn’t know what Layer 2 transport it runs over.

Quarry’s requirement could pose a problem in network designs where there is a router between customers and the Quarry device. Because IP routers rewrite the Layer 2 source MAC address of every Ethernet frame they forward, the unique customer identification would be lost. In such cases, Quarry recommends use of 802.1Q VLAN tags to identify customers.

The good news for both NetScreen and Quarry is that throughput numbers with maximum tunnels are nearly identical to those with just one tunnel active (see Figure 3). Both vendors’ gateways forwarded frames at all lengths at rates that were only very slightly below the single-tunnel case.

Latency numbers with maximum tunnels were similarly indistinguishable from the single-tunnel case (see Figure 4). Here, delay was virtually identical to the single-tunnel numbers in almost every case. The notable exception was NetScreen’s handling of 1,518-byte frames, where latency actually improved by more than 100 microseconds. As in the single-tunnel results, however, the differences in latency we recorded here aren’t meaningful in terms of impact on application performance.

The lack of a difference between single-tunnel and maximum-tunnel cases is attributable to VPN gateway architecture. Like most routers and switches, these devices have a management module that handles “slow path” tasks like IPSec tunnel establishment and firewall rules enforcement. Once a tunnel is set up, the device moves packets destined for that tunnel’s far end over its “fast path” that’s assisted with silicon rather than software.

High availability is a key requirement for service providers. Obviously, if a device goes down the carrier can’t deliver its service – and can’t collect revenue from that service.

Vendors of carrier-grade VPN gateways say they address uptime requirements through a variety of failover mechanisms. These mechanisms keep track of the state of a device’s interfaces and the IPSec sessions active on those interfaces. If an interface or a device goes down, the failover mechanism will renegotiate new IPSec sessions on a backup device.

To see how well these mechanisms work, we set up a test involving three gateways – a primary, a secondary, and a receiver (see Test Methodology again). We offered traffic to the primary gateway destined for sites behind the receiver. Then we disconnected a cable connected to the primary gateway, forcing traffic to be rerouted through the secondary gateway. Pulling the cable forced rekeying and re-establishment of IPSec sessions, and we measured how long this took.
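The text does not spell out how the failover interval was timed. One common approach with a traffic generator, assumed here rather than taken from the test methodology, is to offer frames at a constant rate, count how many go missing during the cutover, and divide the loss by the offered rate. The numbers in the example are hypothetical.

```python
def failover_seconds(frames_sent: int, frames_received: int, offered_fps: float) -> float:
    """Estimate outage duration from frame loss at a constant offered rate.

    Assumes all loss occurred during the failover and that traffic was
    offered at a steady `offered_fps` frames per second.
    """
    if offered_fps <= 0:
        raise ValueError("offered_fps must be positive")
    lost = frames_sent - frames_received
    return lost / offered_fps


# Hypothetical run: 10,000 frames/s for 60 seconds, 20,000 frames never arrived.
print(failover_seconds(frames_sent=600_000, frames_received=580_000, offered_fps=10_000))
```

The estimate is only as good as its assumptions: any loss unrelated to the cable pull would inflate the result.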

Quarry’s iQ device set up a new connection more quickly than NetScreen’s 5200, doing so in around 2 seconds compared with around 3 seconds for NetScreen (see Figure 5). While Quarry’s box is clearly faster, neither device delivers the subsecond response that is typically required to avoid users noticing a hiccup.

While our tests showed that failover works as advertised, they also uncovered a couple of potential drawbacks. First, VPN failover schemes are proprietary; it isn’t possible to fail over sessions between different vendors’ gateways.

Second, some gateways’ failover mechanisms require an “active/passive” configuration in which a second device sits idle until a failure occurs. Especially with today’s tight capex budgets, it’s preferable to use an “active/active” configuration in which both primary and secondary devices handle traffic at all times and share the load as long as they’re both up.

In our tests, Quarry supported the active/active configuration and NetScreen did not. However, NetScreen says the 5200 will support active/active failover before the end of the third quarter of 2002.

For most service providers, management of devices and customer services is at least as important as performance metrics. We assessed the gateways in this test on a variety of management capabilities – including monitoring security, provisioning and policy management, and log management (see Table 2).

Table 2: Managing Carrier-Class IPSec Gateways

| Secure monitoring | NetScreen | Quarry |
| --- | --- | --- |
| Communications between gateways and management console are encrypted | Yes | Yes |
| Encryption used | 3DES | 3DES |
| Communications between gateways and remote management software are encrypted | Yes | Yes |
| Encryption used | 3DES | 3DES |

| Policy management/provisioning | NetScreen | Quarry |
| --- | --- | --- |
| Ease of tunnel provisioning | N/A¹ | Excellent; wizard-based |
| Ease of remote security gateway configuration | N/A¹ | Excellent |
| Hierarchy of management roles | N/A¹ | Very good; access per customer, not individual users per customer |
| Change accounting capabilities | N/A¹ | Good; shows objects but not what has been done to them |

| Other capabilities | NetScreen | Quarry |
| --- | --- | --- |
| Customer “template” for rapid provisioning | N/A¹ | Yes |
| Multiple customers’ policies reside in single data store | N/A¹ | Yes |
| View all of a given customer’s network from management platform | N/A¹ | Yes |
| View part of a given customer’s network from management platform | N/A¹ | No |
| Maximum number of devices managed by single data store | N/A¹ | System-dependent |
| Maximum number of policies managed by single data store | N/A¹ | System-dependent |

| Log management | NetScreen | Quarry |
| --- | --- | --- |
| Ability to count packets entering/leaving individual tunnels in real time | No | Yes |
| Ability to count packets dropped on individual tunnels in real time | No | Yes |
| Ability to count bytes entering/leaving individual tunnels in real time | No | Yes; but not in aggregate |
| Ability to log fault mode for packets dropped on individual tunnels in real time | No | Yes; CLI only |
| Capability for dedicated log server (not simply Unix syslog) | Yes | Yes |
| Capability for log server redundancy | Yes | Yes |
| Capability of automation of log rotation and archiving | No | No |
| Capability for log archiving/retrieval per customer | Yes | No |
| Capability for log archiving/retrieval per customer site | Yes | No |
| Capability for log archiving/retrieval per customer tunnel | Yes | No |
| Back-end export formats (OSS, Oracle, etc.) | Oracle, Crystal Reports | Oracle |



Of course, secure access to the gateways for management is a must, and both vendors support this. Both vendors use 3DES to encrypt communications between the gateway and a management console or a remote client (such as software running on a notebook from a network manager’s home). We verified that remote communications were encrypted by capturing management traffic with a protocol analyzer.

NetScreen scored “Not Applicable” in all of our policy management and provisioning tests. The 5200 is a new product, and it isn’t yet supported by NetScreen’s Global Pro management software. The vendor says Global Pro will support the 5200 by the end of the third quarter of 2002.

Quarry demonstrated its iQ Service Management Suite software, which generally did quite well with the policy and provisioning management tasks we attempted. Setting up a new tunnel was very easy with a wizard-based template in the management software. We also found the product’s remote device configuration screens to be simple and straightforward.

The iQ Service Management Suite allows some hierarchy of management roles, such as giving customers the ability to read and write configuration data about their networks. However, the management software does not support a hierarchy of management roles within a given customer’s organization. For example, it isn’t possible to give net managers at a customer’s headquarters office a view of the entire customer network, while giving employees at the customer’s branch offices access only to devices in their offices.

Quarry’s management software also logs changes in configuration, but here its display is less than helpful: When a change has been made to an object (which could be a gateway, a subscriber, or a set of either one), the software displays the object but doesn’t say what’s been done to it. Both Quarry’s management software and NetScreen’s forthcoming management software use Oracle Corp. (Nasdaq: ORCL) databases as their data stores. Disk space is essentially the only limit on the number of devices and policies the data stores can hold.

Logging is critical to service providers. The obvious application for logs is billing, but there are other uses too, like troubleshooting and capacity planning.

We assessed both vendors’ offerings in terms of the breadth and depth of logging information they stored. Unlike with provisioning and policy management, we were able to compare both vendors on logging: Both vendors’ devices can store logs locally or on a back-end database.

One major differentiator is that Quarry’s gateway logs data on traffic for specific tunnels and NetScreen’s doesn’t. This includes both packet counts and dropped packets on a per-tunnel basis. NetScreen does display the number and names of active tunnels, but doesn’t report traffic statistics on a per-tunnel basis. Curiously, Quarry does not report statistics on the aggregate of all tunnels on a given interface; this capability, which is offered by NetScreen, is useful in capacity planning.

For dropped packets in a given tunnel, Quarry’s logs also note a fault mode, stating why a drop occurred. Such information can be useful in troubleshooting configuration errors or flaky circuits.

Both vendors can offload log data to one or more dedicated log servers. That’s important for service providers managing dozens or hundreds (or more) different gateways.

Neither vendor can automatically rotate or archive all logs at periodic intervals, but NetScreen does offer this capability on a per-customer basis. This could be a useful feature for giving subscribers periodic updates on their usage of a managed security service. NetScreen’s logging can report statistics per subscriber, per subscriber site, or even per tunnel.
