NFV Strategies

Virtualization Challenge: Building Reliable Services From Unreliable Parts

BURLINGAME, Calif. -- OPNFV Summit -- While comms companies are putting the old benchmark of five-nines reliability behind them, the services they deliver still need to be reliable.

"If you call the police or fire department, it has to reach someone. It has to be certain," Philippe Lucas, VP standardization and ecosystems development for Orange (NYSE: FTE), said at an end-user panel Wednesday. As comms companies move to network virtualization, they need to continue to deliver quality of service to customers. (See Vodafone Calls for End to Five Nines.)

Margaret Chiosi, distinguished network architect, AT&T Inc. (NYSE: T) Labs, agreed. "We are supporting a lot of companies. Multi-tenancy is pretty critical. When our network goes down, we bring down a lot of companies," she said.

Reliability is also important to the new generation of hypercloud providers, said Jeff Mogul, principal software engineer at Google (Nasdaq: GOOG). "Large companies like Google are not just cat video delivery companies anymore," he said. Google needs to provide quality of service (QoS) guarantees to its business service customers, on a network built from commodity equipment.

"One of the things that Google has learned from its inception is how to build high reliability services from low reliability parts," Mogul said.

Make it Work
Jeff Mogul, Google's principal software engineer (right), discusses building reliable services from unreliable parts, while Hal Stern, executive director of applied technology for Merck, listens.

To make the network work, Google needs centralized control, Mogul said. "The reason that's extremely important is that as our network gets larger and more complicated we can't hire sufficient smart people to manage it."

The people are smart enough; there just aren't enough of them, Mogul said. "We have great network operators. The problem isn't lack of skill. It's the size of the problem. It's no longer possible for humans to manage this. We're trying to move away from using network operator tapes to using software engineers."

Company culture in new hypercloud providers like Google and Amazon helps build business processes to improve uptime, Mogul said. "One of the things that is unique about the new companies is the 'blameless post-mortem culture,'" Mogul said. After a network outage, operators document what went wrong, how it was mitigated, what went well, and how to resolve the problem. "We don't treat [outages] as people doing bad things. We treat them as learning experiences."

Rather than managing individual components -- "little boxes" -- Google must manage the network as a whole system, he said.

Lucera, which provides network services to the financial industry, needs to deliver high reliability along with fast performance, CEO Jacob Loveless said.

Centralized management and automation are key to achieving those goals, which are often in opposition to each other, Loveless said. "It becomes very hygienic, because everything always happens at the same time."

Lucera achieves performance and reliability by eliminating unnecessary features -- doing less, but doing it well, Loveless said. "Our network is very large but from a feature standpoint it is very small. We only do a handful of things, but we do them very well," he said. For example, BGP is sufficient; MPLS is unnecessary. "Let's keep the world small and simple."

In addition to reliability, performance is absolutely essential to the financial services companies Lucera serves. In that industry, performance is so important that a company will build microwave links because straight-line fiber is too slow. Financial services companies gain advantage through microsecond improvements in speed; one-second jitter or one minute of outage is fatal. "Performance: There should be a new word for it in financial services," Loveless said.

"You're talking about 'money,'" quipped Jim Zemlin, moderator for the Linux Foundation , who moderated the panel.

-- Mitch Wagner, West Coast Bureau Chief, Light Reading

inkstainedwretch 11/12/2015 | 6:11:37 PM
Five 9s
Five 9s came out of the old AT&T, whose lifeline service HAD to be thoroughly reliable. Consumers have become ambivalent about reliability, at least as it applies to their communications services. Dropped calls? S**t happens. Buffering video? I'll just concentrate on the other two things I was doing. There's an emergency? What me worry? -- Brian Santo