Stateful NAT64 Performance
EXECUTIVE SUMMARY: Cisco’s CRS-1 loaded with four CGSE cards successfully translated IPv6 traffic to IPv4 at 4 million translations per second. The same system scaled up to 78.4Gbit/s with a total of 67,107,840 translations and almost no loss.
While the industry embraces IPv6 now more than ever, it also recognizes that IPv4 services are not going away soon. The Internet is an obvious example where IPv4 addresses are going to be used for years to come. Cloud applications will use those addresses as well.
While data centers will have different IP migration strategies, they will likely look to serve both IPv4- and IPv6-based customers. Long-term strategies will include native IPv6 throughout the data center, but in the short term a complete IPv6 strategy might not be practical.
For this reason service providers and cloud operators are likely to find themselves needing to deploy Network Address Translation (NAT) from IPv6 users to IPv4 services (NAT64). Let's say an enterprise is building a brand-new large-scale office and wants to use unique IP addressing. The carrier could provide this adventurous customer with IPv6 addresses to use for internal hosts and servers. In order to communicate with the Internet, which at this point is still IPv4 heavy, the carrier could install a NAT64 device somewhere between the customer and their services to translate the IPv6 addressing to IPv4 before sending the datagrams to the Internet. Another example is the rollout of mobile services en masse using IPv6, to customers who still plan to access IPv4 services, including cloud services.
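To make the addressing side of NAT64 concrete, here is a minimal sketch of how an IPv6-only client can name an IPv4 server: the IPv4 address is embedded in the low 32 bits of an IPv6 prefix that routes to the translator. The sketch assumes the well-known prefix 64:ff9b::/96 from RFC 6052; this article does not say which prefix was configured in the test, and the function names are purely illustrative.

```python
import ipaddress

# Assumption: the RFC 6052 well-known prefix. An operator may instead
# configure a network-specific prefix; the test setup is not documented here.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def embed_ipv4(ipv4: str) -> ipaddress.IPv6Address:
    """Return the IPv6 address an IPv6-only client would use to reach ipv4."""
    v4 = ipaddress.IPv4Address(ipv4)
    # With a /96 prefix the IPv4 address occupies the last 32 bits.
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4))


def extract_ipv4(ipv6: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 destination, as the translator would."""
    addr = ipaddress.IPv6Address(ipv6)
    if addr not in NAT64_PREFIX:
        raise ValueError(f"{addr} is not inside {NAT64_PREFIX}")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


if __name__ == "__main__":
    mapped = embed_ipv4("192.0.2.33")
    print(mapped)                     # IPv6 form of the IPv4 server address
    print(extract_ipv4(str(mapped)))  # back to the IPv4 address
```

The destination mapping above is purely algorithmic. What makes stateful NAT64 expensive is the source side, where the translator must allocate and remember an IPv4 address and port for every IPv6 session.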
Cisco claimed to be ready for these scenarios -- delivering IPv4 services to IPv6 customers -- at scale. Since we have already reported results on Cisco's stateless NAT64 capabilities, we wanted to use this opportunity to verify Cisco's stateful NAT64 performance claims -- that by placing four Carrier-Grade Services Engine (CGSE) modules into a single CRS-1, we could scale up to 60 million NAT64 translations, at 4 million translations per second, all while transmitting up to 80Gbit/s of data.
Would any carrier need this performance? Probably not anytime soon, but we have learned that those who purchase large-scale core routers want to know that they can use their significant financial investment for a while.
Given the scale, we looked to verify each metric separately. Even with this divide-and-conquer approach, NAT can become complex to test. Cisco explained, and showed, that when their NAT64 implementation chooses an IPv4 address to map to an incoming IPv6 request, it is done at random. Now imagine manually configuring the tester for 60 million mappings, when all 60 million incoming requests are given random IPv4 addresses -- clearly this was not the way to go.
One alternative that we considered was to use stateful traffic using Ixia's IxLoad application, but emulating up to 60 million sessions would have required a significant amount of very high-performance test equipment -- again, not really a workable option. The solution we used involved Ixia's IxNetwork generating stateless traffic, with the appropriate TCP fields set to emulate a stateful session (TCP SYN/TCP ACKs). Since Cisco’s implementation randomly assigned TCP port numbers and IPv4 addresses to incoming IPv6 requests, we schemed to simply exhaust the entire pool of resources on the CRS-1. This way we were able to predict which addresses and ports would be used -- it would be all of them. If your head is spinning, we hope the following diagram will help.
To summarize, we sent client traffic from 1,024 IPv6 addresses, each of which opened 65,535 TCP sessions. This brought us to a total of 67,107,840 translations on the CRS-1. We sent traffic in return toward all 1,024 IPv4 addresses, each with all 65,535 TCP port numbers, as was configured in the CRS-1 pool. All traffic used IMIX frame sizes -- 122:7, 512:4, 1500:1 (106 in place of 122 on the IPv4 side) at a rate of 38.4Gbit/s toward the clients and 40Gbit/s toward the servers, all across four 10-Gigabit Ethernet links. Once the configuration was pre-staged and verified to be working, we could breathe a sigh of relief.
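The pool-exhaustion idea can be illustrated with a scaled-down simulation. This is only a sketch under stated assumptions: the random shuffle stands in for whatever selection logic the CRS-1 actually uses, the function name is hypothetical, and the pool here is tiny so the script runs instantly. The point is that when the number of sessions equals the pool size, the set of allocated (address, port) pairs is the whole pool, no matter how random the individual choices are.

```python
import random


def exhaust_pool(clients, sessions_per_client, pool_addrs, pool_ports, seed=None):
    """Randomly bind every client session to a distinct (address, port) pair.

    Hypothetical stand-in for a stateful NAT's allocator: each new session
    receives an unused pair chosen at random from the IPv4 pool.
    """
    pool = [(a, p) for a in range(pool_addrs) for p in range(1, pool_ports + 1)]
    if clients * sessions_per_client != len(pool):
        raise ValueError("session count must equal pool size to exhaust the pool")
    rng = random.Random(seed)
    rng.shuffle(pool)  # random assignment order
    bindings = {}
    for client in range(clients):
        for src_port in range(1, sessions_per_client + 1):
            bindings[(client, src_port)] = pool.pop()
    return bindings


if __name__ == "__main__":
    # Scaled-down run: 4 clients x 8 sessions against a pool of 4 addresses x 8 ports.
    b = exhaust_pool(4, 8, 4, 8)
    full_pool = {(a, p) for a in range(4) for p in range(1, 9)}
    print(set(b.values()) == full_pool)  # every pool entry is in use

    # Full-scale session count from the test: 1,024 clients x 65,535 sessions.
    print(1024 * 65535)
```

Because every pool entry ends up bound to some session, return traffic aimed at every address and port in the pool is guaranteed to match a translation, which is what lets a stateless traffic generator exercise a stateful translator.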
As we started the official test run we recorded only a small amount of loss -- 0.002 percent on 12 of the 16 flows configured from the IPv6 emulated clients toward the IPv4 emulated servers. The other four of such flows ran with no loss, and no flows in the return direction observed any loss either. Considering that we had planned to only test 60 million translations rather than 67,107,840, the loss was considered very minimal. We also verified, using the CRS-1 Command Line Interface (CLI), that all expected translations appeared in the enormous translation table. We also measured latency. The maximum latency values were not very surprising given the translation work to be done by the CRS-1, but in general, given that the latency also included the seven other devices in the test bed, the average latency was quite low.
Next was performance. How quickly could these translations be built in hardware? Now that our test methodology was proven, we felt safe clearing the NAT table on the CRS-1. After doing so, we lowered all frame sizes to 150 bytes so we could increase the frame rate to 4 million frames per second -- 1 million frames per second on each of the four 10-Gigabit Ethernet ports. To add realism to the test, we configured IxNetwork to randomly assign TCP ports to the IPv6 flows, so that they were not sequential. This, however, required that we also lower the total number of ports to 13,824, bringing the number of translations to 56,622,848 in total. We ran the test for two minutes without loss.
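The rate figures above lend themselves to a quick back-of-the-envelope check. The sketch below assumes that each offered frame belongs to a new session and so triggers one new translation, which is how we read the "4 million translations per second" claim. It also treats the 150-byte frame size at face value, since the article does not say whether that figure includes Ethernet overhead such as preamble and inter-frame gap. All names are illustrative.

```python
PORTS = 4                   # 10-Gigabit Ethernet ports carrying client traffic
FPS_PER_PORT = 1_000_000    # frames per second offered on each port
FRAME_BYTES = 150           # frame size used for the rate test
TRANSLATIONS = 56_622_848   # total translations in the rate test, as reported


def setup_rate(ports: int, fps_per_port: int) -> int:
    """Aggregate new-session rate, assuming one new translation per frame."""
    return ports * fps_per_port


def fill_time_seconds(translations: int, rate: int) -> float:
    """Time to build the full table at a constant setup rate."""
    return translations / rate


def offered_gbps(fps: int, frame_bytes: int) -> float:
    """Offered load in Gbit/s, counting only the stated frame bytes."""
    return fps * frame_bytes * 8 / 1e9


if __name__ == "__main__":
    rate = setup_rate(PORTS, FPS_PER_PORT)
    print(f"aggregate rate: {rate:,} translations/s")
    print(f"table fill time: {fill_time_seconds(TRANSLATIONS, rate):.1f} s")
    print(f"offered load: {offered_gbps(rate, FRAME_BYTES):.1f} Gbit/s")
```

Under these assumptions the whole table is populated in roughly 14 seconds, and the offered load is well below the line rate of the four links, so this test stresses translation setup rather than raw throughput.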
After some pretty long nights of complex configuration, we had finally established a test that was able to verify the rate, translation capacity, and throughput of Cisco’s NAT64 solution. Impressive.