Huawei Debuts AI-Powered Data Center Switch for AI Applications
Huawei's new CloudEngine 16800 feeds the bandwidth needs of data-hungry AI.
Huawei took the covers off a new data center switch this week that's powered by artificial intelligence and designed to power artificial intelligence applications.
The CloudEngine 16800 series of data center switches is built to be AI ready in two different ways. The products will use AI to improve network operations and also provide the underlying network foundation for companies looking to utilize AI as a way of improving processes or creating new ones. (See Huawei Releases 'Industry's First Data Center Switch Built for the AI Era'.)
The AI readiness of the 16800 comprises the following three components:
AI silicon. The switch's homegrown chip uses Huawei's iLossless algorithm to eliminate packet loss. Huawei explains that Ethernet, despite its mass adoption, is not appropriate as a backplane for AI, cluster computing and even some storage networks because Ethernet drops packets. AI is heavily dependent on data and even 0.1% packet loss can have a significant impact on AI accuracy and performance. The Huawei silicon ensures this doesn't happen. The CloudEngine 16800 also provides optimization of the entire network as it acts as a central brain and the single source of truth to dynamically optimize every path. Also, the chip uses AI to constantly monitor the status of the network and intelligently adjust the threshold of switching queues and queue buffer to ensure the fastest possible transmission of the data.
48 port 400 Gig-E line card. Speed matters with AI. AI applications such as speech recognition, autonomous machines and real-time security require massive amounts of data. Cisco's latest Visual Network Index (VNI) predicts there will be 86 GB of Internet traffic per month per user by 2022, which is almost 3x what it was in 2017. (See Cisco: Get Ready for the Multi-Zettabyte Internet.)
Most of this data can and will eventually be used in some kind of machine learning (ML) or AI. It seems like the 100 Gig-E cycle just started, but in networking things move fast (so to speak) and the rise in traffic volumes may choke 100 Gig-E faster than network architects expect. At this density, the largest Huawei chassis has a capacity of 768 400 Gig-E ports, which, in the words of Led Zeppelin, is a "Whole lotta ports" (or a whole lotta love, which is the same thing for network managers). Huawei apparently is the only vendor to currently offer this density of 400 Gig-E; I was unable to find another.
This switch uses about 50% less power than its predecessor. Huawei has proprietary technology in the box to provide improved heat dissipation while increasing the reliability of the system. It also uses mixed-flow fans to further improve cooling. So even at the increased speeds, network owners will spend less on power.
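The port figures quoted for this line card can be sanity-checked with a little arithmetic. The sketch below assumes the largest chassis reaches its 768 ports purely by filling slots with the 48-port card; the slot count and aggregate bandwidth are derived here for illustration and are not Huawei specifications.

```python
# Figures taken from the article.
PORTS_PER_CARD = 48      # 400 Gig-E ports on one line card
PORT_SPEED_GBPS = 400    # speed of each port
CHASSIS_PORTS = 768      # ports in the largest chassis

# Derived values (assumptions, not vendor specs).
slots = CHASSIS_PORTS // PORTS_PER_CARD                   # line cards needed
aggregate_tbps = CHASSIS_PORTS * PORT_SPEED_GBPS / 1000   # total port capacity

print(f"{slots} line cards, {aggregate_tbps} Tbps of port capacity")
```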
Self-driving network capabilities. The data center is changing and becoming more complex and dynamic. Containers, the cloud, virtualization and other trends have created demand for network operations to work faster and more accurately. Downtime in a data center was always bad but now it might have big revenue implications. The CloudEngine 16800 has local inferencing and real-time decision-making to automate operational tasks. Using telemetry, data is pre-processed locally and is sent to an operations platform for secondary processing based on the results. It's then compared to a fault library that provides case information from over 5 million live networks, continually updated. This lets the Huawei network identify and isolate faults in seconds and even predict them in some cases.
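To make the two-stage flow concrete, here is a minimal sketch of the general pattern: reduce raw telemetry locally, then compare the summary against a library of known fault signatures. Everything in it is hypothetical, including the `Summary` fields, the thresholds and the two signatures; it illustrates the idea and is not Huawei's implementation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Summary:
    port: str
    avg_queue_depth: float  # fraction of queue in use, 0.0 to 1.0
    drops: int


def preprocess(port, queue_depths, drops):
    """Local step: reduce raw telemetry samples to a compact summary."""
    return Summary(port, mean(queue_depths), drops)


# Hypothetical fault library: signature name -> predicate over a summary.
FAULT_LIBRARY = {
    "congestion": lambda s: s.avg_queue_depth > 0.8 and s.drops == 0,
    "packet_loss": lambda s: s.drops > 0,
}


def match_faults(summary):
    """Secondary step: compare a summary against known fault signatures."""
    return [name for name, matches in FAULT_LIBRARY.items() if matches(summary)]


summary = preprocess("port-1", [0.7, 0.9, 0.95], drops=0)
faults = match_faults(summary)
print(faults)
```

Sending only the summary upstream, rather than every raw sample, is what keeps the local step cheap enough to run on the switch itself.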
The next wave of business has arrived. Some call it the digital transformation era, IDC refers to it as the third platform and others the fourth industrial revolution.
Whatever you call it doesn't matter. What's important is to understand that AI and ML will be infused into almost everything we do. Huawei cited a survey it ran that found the adoption rate of AI will increase from 16% of companies in 2015 to 86% in 2025. This number may seem aggressive but I believe it to be true. The use of predictive systems, autonomous machines, facial recognition, natural language processing, real-time translation and other AI use cases is going to explode over the next few years and businesses need to be ready.
The network plays a key role in success as it's one of the three legs of the AI/ML stool. From an infrastructure perspective, AI and ML require fast storage, processors and networks. If any one of these falls behind, it holds up the others. For example, a company could spend millions on new GPU-enabled servers but if the network is too slow to feed the GPUs, then highly paid data scientists sit around twiddling their thumbs waiting for models to complete. Similarly, if the network has been upgraded but the storage systems are slow, then the network will remain primarily idle. GPUs have increased processor speeds by orders of magnitude and storage is seeing a rapid transition to all flash and soon NVMe, and the network needs to keep up. Huawei's CloudEngine 16800 is fast but also has on-board AI-specific silicon to ensure the network won't be the bottleneck.
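The three-legged stool argument boils down to a simple rule: the pipeline runs only as fast as its slowest leg. The sketch below illustrates that with made-up throughput numbers; the function name and the figures are hypothetical and do not come from Huawei or any benchmark.

```python
def pipeline_throughput(storage_gbps, network_gbps, gpu_ingest_gbps):
    """Return the limiting component and the resulting end-to-end rate."""
    rates = {
        "storage": storage_gbps,
        "network": network_gbps,
        "gpu": gpu_ingest_gbps,
    }
    bottleneck = min(rates, key=rates.get)  # slowest leg sets the pace
    return bottleneck, rates[bottleneck]


# Illustrative numbers only: fast GPUs and storage behind a slow network.
print(pipeline_throughput(storage_gbps=200, network_gbps=100, gpu_ingest_gbps=400))

# Upgrade the network and the bottleneck simply moves to storage.
print(pipeline_throughput(storage_gbps=200, network_gbps=400, gpu_ingest_gbps=400))
```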
It will be interesting to see if Huawei can use AI as a way of establishing itself as a vendor in the regions where it has struggled. China has been one of the most aggressive countries regarding the use of AI, so Huawei could learn a number of best practices there and get a head start in this area. The US needs to focus on AI as well, with better evangelism at a country level to alleviate fear and stimulate usage, or we risk falling behind.
— Zeus Kerravala is the founder and principal analyst of ZK Research.