Google's Andromeda Relieves Cloud Strain
MOUNTAIN VIEW, Calif. -- Hot Interconnects -- Google relies on its Andromeda networking platform to deliver a global cloud infrastructure that gives customers the security and performance benefits of local private networks.
"We want bare-metal performance and low latency for the services we deliver," said Google (Nasdaq: GOOG) Distinguished Engineer Amin Vahdat, delivering a keynote at the conference here today.
SDN is key to delivering the needed performance and security, he said.
SDN at its most fundamental involves separating the control plane from the data plane, Vahdat noted. "A logically centralized hierarchical control plane beats peer-to-peer every time," he said. The data plane can run at network speed, while the control plane can run on commodity hardware, scaling as needed. The control plane requires 1% of the overhead of the entire network, Vahdat said.
But managing that infrastructure requires new tools and skills, he said.
"It turns out that running a hundred or a thousand servers is a very difficult operation. You can't hire people out of college who know how to operate a hundred or a thousand servers," Vahdat said. Tools are often designed for homogeneous environments and individual systems. Human reaction time is too slow to deliver "five nines" of uptime, maintenance outages are unacceptable, and the network becomes a bottleneck and source of outages.
Google looks to SDN and network functions virtualization (NFV) to orchestrate provisioning, high availability, and meet application performance requirements, Vahdat said. The technology must be distributed throughout the network, which is only as strong as its weakest link.
Andromeda is Google's code-name for its network virtualization platform. It's designed to provide each external user with the illusion that they're on a dedicated network with dedicated performance and its own IP address space. Applications require real-time high performance and low-latency communications to virtual machines. Users also require service chaining to tools such as load-balancing, and the ability to grow and shrink the number of servers available to applications as demand requires. (See Google, Microsoft Challenge Service Providers and Google's Andromeda Strain Is Spreading.)
Security is a huge requirement. "Large companies are constantly under attack. It's not a question of whether you're under attack but how big is the attack," Vahdat said.
Power and cooling are the major costs of a global infrastructure like Google's. "That's true of even your laptop at home if you're running it 24/7. At Google scale, that's very apparent," Vahdat said.
Google has a global infrastructure, with data centers and points of presence worldwide to provide low-latency access to services locally, rather than requiring customers to access a single point.
The company runs two networks. Its private, server-to-server network is bigger than its public network, and one of the world's largest SDN deployments. Connectivity between data centers is comparable to within data centers.
Andromeda provides significant performance improvements over a state-of-the-art baseline, as seen in Vahdat's slides:
The promise of cloud computing is just beginning.
"Many people think about cloud computing as being able to get on-demand access to computing. I don't have to go buy servers; I can rent them for a minute, or an hour, or a day. I can get burst capacity of as many servers as I like, whatever memory, configuration or disk, etc., that I like. I think actually yes, this is powerful, but this is really just the beginning," Vahdat said. "The really exciting parts of cloud computing are on the verge of happening."
These include a fundamentally easier operational model; higher uptime; state-of-the-art infrastructure services such as denial-of-service protection, load balancing, and storage; and new programming models for low latency and massive input-output performance.
What cloud doesn't do is take away the challenges of running an IT infrastructure. "Most cloud customers, if you poll them, say the operational overhead of running on the cloud is as hard or harder today than running on your own infrastructure," Vahdat said.
Click the photo below for a selection of Vahdat's slides -- and more.