The Anatomy of Automation: Q&A With Cisco's Roland Acra
Unless you've been living in North Korea for the last couple of years, you will have noticed that Cisco Systems has undergone a pretty major techno-cultural shift. Cisco's big-name telco legends of yore (Pankaj Patel, Kelly Ahuja) are gone, and the vacuum they left is being filled by enterprise cloud folk.
Cisco Systems Inc. (Nasdaq: CSCO) has always been split between two orders: telco and enterprise -- a bit like the light and dark sides of the force in Star Wars. And today the balance of power within Cisco has clearly shifted to the enterprise cabal.
That's put the wind up some of Cisco's service provider customers, including several carrier executives at Light Reading's annual Executive Summit this week in Prague, where Cisco was a conspicuous no-show for the first time in the event's four-year history. As one exec told me yesterday: "Cisco looks like it's exiting telecom."
I'm not sure that's the case so much as Cisco trying to move everything -- enterprise and telco -- into a cloud: public, private, and hybrid.
And if "cloud everywhere" is indeed its strategy, much of the responsibility for making it work across both enterprise and service provider networks falls to Roland Acra, SVP & GM of Cisco's Data Center Business Group. This is where Cisco has parked most of the tech that will be used to build virtualized, automated and profitable networks (stuff like predictive analytics, application modeling and policy enforcement).
Roland has a bunch of degrees, which we're not especially impressed by (I mean, who doesn't have a Master's degree in Engineering from the Ecole Nationale Supérieure des Télécommunications in Paris? Oh, wait, I don't.)
And, like me, he's also a successful serial technology entrepreneur which, conversely, is obviously very impressive.
But perhaps his greatest skill is the ability to speak eloquently and authoritatively about some of the knottiest technology challenges ever faced by the communications industry. These are problems that every organization -- enterprise or telco -- must wrestle with successfully if it is to reinvent itself as a 21st century communications business. Or as Yoda put it: "Do. Or do not. There is no try."
Read on for the wit and wisdom of Roland Acra...
— Steve Saunders, Founder, Light Reading
Steve Saunders: Hey Roland -- tell me how you ended up back at Cisco.
Roland Acra: I was working on a project of my own, looking at how ready we are to automate large, complex networks, by leveraging data and learning algorithms. That's a big, fat word, "automation" -- and I was focused on identifying the processes that can be modeled, and automated, to make them less manpower-intensive, and more predictable.
I was working through all that when Cisco reached out about ten months ago, and I realized that what Cisco was doing in the data center, particularly on the software side, with the analytical tools and the intent-based software capabilities, overlapped completely with my interest areas. And I thought, well, I can go and build all this from scratch -- build a company, raise funds and so forth -- or there's a platform right there at Cisco for me to make all this a reality, now. And that's how I ended up back at Cisco.
SS: Okay. Did you take over an existing team at Cisco, or build a new one?
RA: Existing team. The purview of what we call the data center group at Cisco includes all the switching infrastructure that goes into the data center, and that covers assets like actuation, and the analytical frameworks for policy and for security and so forth.
Automating the data center
SS: How have you gone about bringing automation to life inside the data center?
RA: The first manifestation of automation in the data center was Cisco's Application Centric Infrastructure (ACI) intent-based networking framework. Before ACI we built things like network forwarding paradigms, access control and policy implementations around this static view of where the applications were on the network. You knew that behind this Ethernet port was the database, behind this other one was a web server or what have you -- and you manually created your policy based on this information.
The problem with doing things that way is that today's applications are much more fragmented; they have a multi-point footprint, and it changes over time because of virtualization and containerization. What we think of as a database at ten in the morning might have nine VMs implementing it on three servers, but by noon it will probably peak at 23 VMs on a whole bunch of other servers. And yet we still want our policy to follow that mobile, nomadic and elastic workload.
So, this was the genesis of ACI, which replaces this model with a programmatic paradigm that allows users to define a contract that they want to initiate with their network; one which defines which applications can talk to what databases; issues those instructions to the network using the kind of abstractions that an application developer is familiar with; and goes on to ensure that the network continues to enforce that policy regardless of whether the database expands, shrinks, or moves around because of things like containers.
And that's the genesis of ACI. Simply, it's a new, expressive language for you to declare what you would like the network to do for you, and we've had a ton of success with that. Today we have upwards of 4,000 customers with big ACI fabrics.
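The contract model Acra describes can be sketched in a few lines of Python. To be clear, this is an illustrative abstraction, not Cisco's actual ACI API: the class names, methods and endpoint-group idea here are hypothetical stand-ins for the real policy constructs.

```python
# Hypothetical sketch of an intent "contract" between application tiers.
# These classes are illustrative only -- not Cisco's ACI object model.

class EndpointGroup:
    """A logical tier (e.g. 'web', 'db') whose members change over time."""
    def __init__(self, name):
        self.name = name
        self.members = set()  # VMs/containers currently implementing the tier

    def add(self, endpoint):
        self.members.add(endpoint)

class Contract:
    """Declares that `consumer` may talk to `provider` on `port` only."""
    def __init__(self, consumer, provider, port):
        self.consumer, self.provider, self.port = consumer, provider, port

    def permits(self, src, dst, port):
        return (src in self.consumer.members
                and dst in self.provider.members
                and port == self.port)

web, db = EndpointGroup("web"), EndpointGroup("db")
policy = Contract(web, db, port=5432)

# Endpoints move and multiply, but the declared intent stays constant:
web.add("vm-web-1"); db.add("vm-db-1"); db.add("vm-db-2")
assert policy.permits("vm-web-1", "vm-db-2", 5432)   # allowed by contract
assert not policy.permits("vm-web-1", "vm-db-2", 22)  # everything else denied
```

The point of the sketch is the inversion Acra describes: policy is attached to the logical groups, so membership can shrink, grow or migrate without the contract being rewritten.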
Step two happened about a year ago, when we brought Tetration Analytics to market. The purpose of Tetration is to answer the questions that came up when customers implemented ACI and realized that, while it was wonderful to be able to define intent on the network, they lacked the level of visibility into what was running where in their data centers to be able to take advantage of it and formulate their intent in the first place.
So, that's what Tetration does: delivering broad, pervasive visibility. Not a single packet gets missed. Not a single communication element gets missed across all the applications that are running. More importantly, it automatically creates the intent, expressed as a white list of what is allowed, using this pervasive visibility and behavioral observation of the applications over the network as well as within their host operating system. It does an amazing amount of work by sitting there and listening to what the network is doing, and what workloads are doing, teed up through observation and machine learning.
So, this is a new concept. Increasingly the industry is using the term "intent-based networking." And there are lots of definitions. To me, in the data center, intent-based networking is about creating the ability for programmers to formulate what they would like the network to do for them in a way that is expressive, that a programmer relates to. A coder, not a network admin.
But Tetration is more than that. It gives you the whole life cycle of everything running in your data center; the inventory and footprint [of the applications]; how these change over time; how they talk within each cluster to one another, and how they talk across clusters; it gives you graphs of connectivity on who's calling who when, on which port, and in which direction; and ultimately it results in a suggested white list that you, with ultimate human agency, correct or agree with, or complement with context that we can't infer from observation of telemetry, and then lock it down as your unique policy, and make sure that it is honored and enforced in a way that the network can support.
And the exciting thing is that Tetration works both with Cisco switches, at line speed, if you happen to have a Cisco Nexus switching network, or via a software agent that travels with the workload regardless of underlying switching infrastructure. This means it can work in a Cisco environment, a non-Cisco environment, or where you have a mix. And it can work in a cloud infrastructure, so customers can use it with Amazon or Azure or whatever.
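The observe-then-suggest workflow Acra outlines can be caricatured in a short sketch. This is in the spirit of what he describes, not the product itself: Tetration uses line-rate telemetry and machine learning, whereas this hypothetical example simply tallies observed flow tuples and proposes the frequent ones as whitelist rules for a human to confirm.

```python
# Hypothetical sketch: deriving a suggested whitelist from observed flows.
# The real system learns behavior from pervasive telemetry; this toy
# version just counts (source group, destination group, port) tuples.
from collections import Counter

observed_flows = [
    ("web", "db", 5432), ("web", "db", 5432), ("web", "db", 5432),
    ("web", "cache", 6379), ("web", "cache", 6379),
    ("unknown-host", "db", 22),  # one-off flow -- probably not intent
]

def suggest_whitelist(flows, min_count=2):
    """Propose rules for flows seen at least `min_count` times; a human
    operator then confirms, corrects, or adds context before locking
    the list down as policy."""
    tally = Counter(flows)
    return sorted(rule for rule, n in tally.items() if n >= min_count)

suggested = suggest_whitelist(observed_flows)
# The single stray SSH flow falls below the threshold, so it is left
# for human review rather than being whitelisted automatically.
```

The "ultimate human agency" step in Acra's description maps to the review of `suggested` before enforcement: observation proposes, the operator disposes.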
SS: Which means you can get the benefits of this whether you have your own private cloud or whether you're transitioning across a public cloud?
RA: That's right. Or if you're straddling both, which is often the case.
SS: A hybrid cloud.
RA: That's right; a multi-cloud footprint. And this is really where the majority of our customers are today. They like some attributes of Amazon, they like some attributes of Google or Azure, but there's a lot that they continue to do on premise, and they don't want to have a fragmented policy and compliance model. They're saying "I have to prove that my consumer data was treated with the utmost care... we can't have an Equifax situation. I want to prove that I've protected the databases from not talking to strangers or not being exposed to more connectivity than they have to."
We're moving on from how we used to build IP networks, and enterprise networks, which was "anything goes," with one or two exceptions. Today, the prevailing model for security and for compliance -- known as "zero trust" -- is to say "Nothing goes, except where I open the veins for the blood to flow through." The new paradigm is "don't talk to strangers."
Hand me my Kevlar jacket
SS: The fact that you are upgrading the policy capability to run across other people's clouds is very significant, isn't it?
RA: It is. Instead of having a model where the application is a fortress protected by a moat and thick walls, now the application has a Kevlar jacket that it takes with it everywhere it goes, in its own territory, or beyond.
SS: I like the analogy. You personally saw the automation opportunity years earlier than most. But where does it go next? Do you think we will see autonomous data centers which operate without human intervention in the next four or five years? Or is that still complete science fiction?
RA: Having cut my teeth on trying to make it work in startup land and at Cisco, I would say we need to be honest engineers when we talk about automation. The reality is that some of it is already out there and deployed today. But we must look at it function by function. There are things that we might call "day zero" functions: provisioning and configuring a complex network in an automated way with as few humans as possible, using programmatic interfaces, plus software that can call other software. To a large degree, this is within our reach. In a data center, it's completely within our reach. In a telecom network, with the diversity of services they offer to businesses and consumers and so on, I think they're well on their way to achieving that.
But what about change management, when I upgrade some software, or change the configuration, or redesign a customer's VPN, or an application? I think software management, upgrade management, and so on, that's what I would describe as "fairly automatable."
Where the problems are harder, and where we should all be honest engineers, is a situation where you might not today be able to tell me what you would do to solve it manually. In a situation like that, how am I going to believe that we're going to automate it?
An example of a challenge like this that we are excited to work on over the next few years is network uptime and availability -- when the customer asks for five 9s reliability without a human in the loop. That's a lot more complicated.
How do you prevent downtime either before it happens, or how do you shrink the window of time to resolution after you've bumped into a speed bump on a running network? It's hard because there is no one-size-fits-all recipe that says whenever there's a bug, you do A, then B, then C, and D. Right now we still rely on human knowledge and correlation among a vast array of factors and contextual elements.
This is an area where we will add new forms of telemetry, data-driven learning, and high likelihood diagnoses. And that will require us to generate more data with new sensors in the code itself that lives in the routers and the switches and the servers, and to build the learning algorithms that harvest that data and then learn that hey, this looks like a bug is about to hit, let's just get in front of it before it happens. It's a big audacious goal. It’s exciting.
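The "get in front of it before it happens" idea can be illustrated with the simplest possible version: flag a telemetry counter that drifts far from its recent baseline before it turns into an outage. Real systems use learned models over many correlated signals; this hypothetical sketch uses a single rolling z-score as a stand-in.

```python
# Minimal illustration (not any vendor's algorithm): flag a telemetry
# sample that deviates sharply from its recent baseline, so an operator
# or automation can act before the anomaly becomes downtime.
from statistics import mean, stdev

def anomalous(samples, latest, threshold=3.0):
    """True if `latest` lies more than `threshold` standard deviations
    from the mean of the recent baseline `samples`."""
    mu, sigma = mean(samples), stdev(samples)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

baseline = [100, 102, 99, 101, 100, 98, 103, 100]  # e.g. a queue depth
assert not anomalous(baseline, 104)  # ordinary jitter: ignore
assert anomalous(baseline, 150)      # sudden spike: investigate early
```

A production version would learn which deviations actually precede failures, which is exactly the data-driven correlation work Acra flags as the hard, still-human part of the problem.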
SS: Yeah, it is one of the last great challenges in communications. How much of the technology that you are working on and developing is usable by telecom operators in addition to enterprises?
RA: Where a large service provider is implementing NFV at scale, that network looks like a big complex enterprise application. That means all the toolkits that we built for the enterprise apply directly in that telecom network, and we are finding that service providers are very, very welcoming of the paradigms that we have developed for the enterprise data centers, and for the banks, and to ensure things like compliance.
The cloud space is interesting, especially the large cloud providers, because they have a different take on things. When it comes to automation those guys are the best do-it-yourselfers on the planet. So they're asking us for individual components, disaggregated from the solution that we deliver as a whole to an enterprise or a service provider. They are asking to be fed the telemetry, say, but they want to consume that data in their own Tetration-like solution that they have in house.
We realize that for the large companies with strong software chops we have to be able to say "You love my operating system, but you want to put it on a white box switch? Yep, no problem." Or "You love what I can deliver to you in hardware, but have your own homegrown operating system? Sure -- we can do that too." Or any variation of those.
We're in a very interesting time, Steve. For one sort of customer, we have to show excellence on component pieces. But when we go to a CIO who just wants to reduce head count, get best of breed everything, and just pay the bill every month, we can do that too. We're completely attuned to the duality of those consumption models.