Testing Cisco's Media-Centric Data Center
Most people outside IT understand the concept of a software upgrade and accept that upgrades are necessary. Anyone who doubts it need only talk briefly with a network administrator, whose job largely consists of hearing about issues from colleagues and receiving new code from the vendor to resolve them. Most people outside IT are also familiar with the chore of rebooting their PCs once patches or upgrades have finished installing.
That reboot becomes a much bigger issue when it involves a network device that is serving millions of users. Cisco understands this issue and has developed a solution that it claims allows both major and minor code upgrades to be performed with no negative effect on users. In this test we measured the out-of-service time induced on traffic while both major and minor software upgrades were performed on the Nexus 7000.
In order to perform a software upgrade on the Nexus 7000 we needed multiple versions of the NX-OS operating system – the system used on Cisco’s Nexus switches.
Cisco provided three code versions for the test:
- Nexus 7000 4.0(4)
- Nexus 7000 4.1(3)
- Nexus 7000 4.1(5)
Cisco considers the 4.0 and 4.1 operating system codes to be major releases, which matches our understanding.
VPC (Virtual Port Channel), a new feature used in our testing, was first introduced in the 4.1 code. The difference between 4.1(3) and 4.1(5) is considered minor and represents a set of bug fixes. For the minor software upgrade we ran the full service delivery network traffic profile and measured the loss induced by upgrading the code on the Nexus 7000. The caveat of the major upgrade was that the 4.0(4) starting code lacked support for VPC, which was required to transport traffic from the Nexus 5000 in the test setup. We accepted that for the major upgrade we would run all traffic except telepresence, digital signage, and IP video surveillance, leaving IP video, VoIP, and Internet traffic traversing one Nexus 7000, and VoIP and Internet traffic traversing the other Nexus 7000. It was clear that the two switches would still have more than enough frames to deal with while being upgraded.
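The major/minor distinction described above can be sketched in a few lines of Python. This is only an illustration of how we read the version strings, assuming the digits before the parentheses identify the release and the number in parentheses identifies the bug-fix level; the helper names `parse_version` and `classify_upgrade` are hypothetical and are not part of any Cisco tool.

```python
import re

# Hypothetical helper: matches strings shaped like "4.1(3)".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)\)$")


def parse_version(text):
    """Split a version string such as '4.1(3)' into three integers."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def classify_upgrade(old, new):
    """Label an upgrade the way this article does (an assumption, not a Cisco rule).

    'major' when the digits before the parentheses change,
    'minor' when only the number in parentheses changes,
    'none'  when both versions are identical.
    """
    old_v, new_v = parse_version(old), parse_version(new)
    if old_v[:2] != new_v[:2]:
        return "major"
    if old_v[2] != new_v[2]:
        return "minor"
    return "none"


print(classify_upgrade("4.1(3)", "4.1(5)"))  # minor
print(classify_upgrade("4.0(4)", "4.1(5)"))  # major
```

Under these assumptions, the two upgrade paths used in the test fall out as one minor and one major upgrade.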
We ran each instance of this test for approximately 25 minutes, which allowed enough time for the code upgrade to finish and for us to verify that the new code had truly been installed on the device. We let the test traffic continue running for two extra minutes to make sure that the upgrade was not only successful, but that the switch was stable afterwards.
We stopped traffic between iterations and measured frame loss. First we upgraded each Nexus 7000 from 4.1(3) to 4.1(5), one device at a time, and recorded zero frames lost. This was the minor software upgrade. Following these two upgrades we stopped traffic in order to reload each Nexus 7000 with 4.0(4) code and perform the major upgrade, from 4.0(4) to 4.1(5). The major upgrade was also performed on one Nexus 7000 at a time, giving a total of four test iterations: two minor and two major.
One way we verified both upgrades was by watching the device switch from one switch engine to the other, since this is how the device is upgraded without loss: one switch engine at a time. We also, of course, checked the version of the running code through CLI commands both before and after each test. The major upgrade was additionally verified by running certain commands that did not exist in the previous operating system.
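The article does not spell out how frame loss translates into out-of-service time. A common approach, assumed here, is to divide the number of lost frames by the constant offered frame rate. The function name and the traffic figures below are hypothetical and are not measurements from this test.

```python
def out_of_service_ms(frames_sent, frames_received, offered_rate_fps):
    """Estimate out-of-service time in milliseconds from frame loss.

    Assumes traffic was offered at a constant rate (frames per second),
    so every lost frame accounts for 1 / offered_rate_fps seconds.
    """
    if offered_rate_fps <= 0:
        raise ValueError("offered_rate_fps must be positive")
    lost = frames_sent - frames_received
    if lost < 0:
        raise ValueError("received more frames than were sent")
    return lost * 1000.0 / offered_rate_fps


# Hypothetical figures for illustration only.
# Zero lost frames means zero out-of-service time.
print(out_of_service_ms(1_500_000_000, 1_500_000_000, 1_000_000))  # 0.0

# 2,500 lost frames at 1,000,000 frames per second is a 2.5 ms outage.
print(out_of_service_ms(1_500_000_000, 1_499_997_500, 1_000_000))  # 2.5
```

With this conversion, a result of zero lost frames, as recorded in the iterations above, corresponds to an out-of-service time of zero.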
Throughout both upgrades we witnessed zero packet loss. After the last major upgrade was completed, we added the business services from the Nexus 5000, still witnessing no loss. We conclude that data center administrators can upgrade their Nexus 7000 gear even during peak hours without users experiencing the negative impact of dropped packets. This obviates the need for the administrator to come in at night or during the weekend in order to do such upgrades, which in turn avoids an extra cost to the provider.