Are You Really Getting 400 Gbit/s Performance?
Driven by continuing IP traffic growth, a new generation of 400 Gbit/s network processors is beginning to emerge. Such processors have been announced by a handful of vendors, with more to come.
However, scaling from 100 Gbit/s to 400 Gbit/s processors is not an easy feat. Recently, we witnessed some 400 Gbit/s network processor performance testing in a lab and were surprised by some of the results. Based on this data, we believe there may be wide gaps between what is claimed on spec sheets and the real-world performance that is achievable.
We believe that service providers evaluating new higher-capacity systems must perform extensive testing on their own to ensure that the high-end systems perform as advertised under a variety of conditions that mirror real-world traffic mixes and traffic growth. Otherwise, service providers may be stuck with true performance that is far below the advertised 400G sticker. In fact, under certain conditions, true performance may be no better than that of the legacy routers being replaced.
One important test is the packet sweep test. The packet sweep measures router performance over a spectrum of packet sizes, from small (64 bytes) to large (1,500 bytes or greater), in order to ensure consistent performance across the full range of incoming packets. It is somewhat counterintuitive, but the greatest demands on the processor occur at the smallest packet sizes. The reason is that, at a fixed bit rate, smaller packets mean more packets per second, and every packet requires its own table lookups, so the lookup rate the processor must sustain rises as packet size falls.
The test we reviewed showed a 400 Gbit/s processor consistently dropping packets at all sizes, both small and large. Performance did not improve as the packet size increased. On a few occasions, packet loss reached as much as 50 percent, and we never saw more than 80 percent throughput at any point in the test.
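To make the small-packet effect concrete, here is a rough sketch of the packet rate a processor would have to sustain at 400 Gbit/s for several frame sizes. It assumes standard Ethernet framing, with 20 bytes of preamble and inter-frame gap added to each frame on the wire, and it treats the quoted sizes as frame sizes. The function name and the list of sizes are illustrative and are not taken from the test we reviewed.

```python
# Sketch: theoretical packet rate at line rate for different frame sizes.
# Assumption: each Ethernet frame carries 20 extra bytes on the wire
# (8-byte preamble/SFD + 12-byte inter-frame gap).
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(line_rate_bps: float, frame_bytes: int) -> float:
    """Return the maximum packets per second at the given line rate."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


if __name__ == "__main__":
    line_rate = 400e9  # 400 Gbit/s
    for size in (64, 128, 256, 512, 1024, 1500):
        mpps = packets_per_second(line_rate, size) / 1e6
        print(f"{size:>5}-byte frames: {mpps:7.1f} Mpps")
```

Under these assumptions, 64-byte frames work out to roughly 595 million packets per second, about 18 times the rate for 1,500-byte frames, and each of those packets needs its own lookups.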
We also reviewed a second test, known as the Internet mix (or IMIX) test. Just as there is no typical packet size, there is no typical service provider IP traffic mix. It varies greatly from provider to provider. The test we reviewed used a couple of different real-world IMIX samples provided by service providers, based on their network scenarios. The results showed poor performance across a range of IMIX profiles. The processor appeared to suffer from problems related to storing packets for lookup, which severely degraded performance on all traffic types tested. Significantly, when the two generations of processor were compared on serviceable bandwidth, the new 400 Gbit/s processor delivered anywhere from 20 percent to 50 percent less performance than its predecessor.
A third test measured the performance of a processor with service level agreements (SLAs) enabled. Here, the results showed that enabling SLAs on connections reduced the actual throughput of the processor from 400 Gbit/s to 200 Gbit/s, meaning that only 50 percent of the advertised 100 GigE ports would be available with SLAs turned on.
The tests above are tied closely to the conventional views of scale, meaning the ability to handle more and more bits through the processor and on the network. As the Internet evolves from person-to-person communications to the machine-to-machine dominated Internet of things, another component of scale is becoming increasingly important: the ability to handle more and more flows through the network.
For next-generation routers, this means they must not only process massive amounts of bits, but also massive numbers of flows. Routers equipped with 400 Gbit/s processors must be tested for their ability to handle tens of thousands of flows. As a rule of thumb, a 10 GigE port will typically serve 500 customers. If we assume just four classes of service per customer (a low assumption), this creates 2,000 different flows per 10 GigE port. With 20 x 10 GigE ports per line card (the state of the art in existing designs), this creates 40,000 different flows per card. As we move to an Internet of things, it is very possible that these next-gen routers could hit performance limits on supported flows long before they hit their maximum capacities in bit/s. It is another dimension of scale that must be accounted for and tested in evaluating core routers.
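The rule-of-thumb arithmetic above is simple enough to put in a small sketch, which makes it easy to rerun with a provider's own numbers. The function and parameter names are illustrative; the default inputs are the figures quoted in the paragraph.

```python
# Sketch: flow count per port and per line card from the rule of thumb above.
def flows_per_card(customers_per_port: int,
                   classes_per_customer: int,
                   ports_per_card: int) -> tuple[int, int]:
    """Return (flows per port, flows per line card)."""
    per_port = customers_per_port * classes_per_customer
    per_card = per_port * ports_per_card
    return per_port, per_card


if __name__ == "__main__":
    # 500 customers per 10 GigE port, 4 classes of service, 20 ports per card.
    per_port, per_card = flows_per_card(500, 4, 20)
    print(f"Flows per 10 GigE port: {per_port:,}")
    print(f"Flows per line card:    {per_card:,}")
```

Raising the classes of service from four to eight, for instance, doubles the flow count per card without adding a single bit/s of port capacity, which is why flow scale needs to be tested separately from throughput.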
The final point we will touch upon is efficiency in power consumption and footprint. It is well understood that these opex factors are of critical importance to service providers. In some regions, where power costs are well above global averages or where equipment is deployed in dense urban areas where space is severely limited, power and space requirements may make or break a buying decision. Even in the US, however, we have had discussions with large service providers who place power and space at the top of their lists.
Here again, service providers need to dig deeper than the power and space requirements that are advertised. To get a true understanding of space and power, these advertised specs must be placed in the context of the overall performance of the system -- as determined by packet sweep tests, IMIX tests, flow limitations, and any other tests that are performed on the system.
For example, if performance testing and evaluations show that a system will perform at 80 percent of its rated capacity under real-world network conditions, then this needs to be taken into account for space and power consumption. In this case, roughly 25 percent more space and power (1/0.8 = 1.25 times the equipment) would be needed to achieve 400 Gbit/s of capacity. Failing to account for any performance limitations sets service providers up for unwelcome opex surprises when these new systems start to fill up in the network.
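A quick way to sanity-check this kind of adjustment is to scale the advertised figures by the reciprocal of the measured throughput fraction. The sketch below does that; note that at 80 percent of rated capacity the scale-up factor is 1/0.8 = 1.25, or about 25 percent more hardware. It assumes that space and power grow linearly with the amount of equipment deployed, which is a simplification, and the function name is illustrative.

```python
# Sketch: how much equipment is needed to hit a target capacity when a system
# only achieves a fraction of its rated throughput.
# Assumption: space and power scale linearly with the equipment deployed.
def scale_up_factor(target_gbps: float,
                    rated_gbps: float,
                    achieved_fraction: float) -> float:
    """Return the multiple of one system's space/power needed for the target."""
    if not 0 < achieved_fraction <= 1:
        raise ValueError("achieved_fraction must be in (0, 1]")
    effective_gbps = rated_gbps * achieved_fraction
    return target_gbps / effective_gbps


if __name__ == "__main__":
    factor = scale_up_factor(target_gbps=400, rated_gbps=400,
                             achieved_fraction=0.8)
    print(f"Scale-up factor: {factor:.2f}x "
          f"({(factor - 1) * 100:.0f} percent more space and power)")
```

The same function shows how quickly the penalty grows: at 50 percent of rated capacity, the factor is 2.0, meaning twice the space and power for the same delivered bandwidth.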
In summary, vendors rolling out new 400 Gbit/s silicon promise routers that are smarter, faster, and greener than previous generations, but some in-depth test results we've seen indicate that this may not be the case. Service providers cannot afford to rely on vendor claims in this area. Rather, extensive testing -- of the kind we've described in this article -- is a must. The added time and costs of upfront testing will pay dividends for years to come.
This blog was commissioned by Cisco Systems. The blog was created independently of Cisco, and Heavy Reading is wholly responsible for its contents.
— Sterling Perrin, Senior Analyst, Heavy Reading