IP Quality of Service
Optimizing IP networks to carry different applications * How is it done? * Is it needed? * What's the outlook?
October 9, 2002
If hot air were cash, the long-running debate over IP QOS - whether it’s needed, what’s needed, where it’s needed - would have saved many an impoverished IP startup from bankruptcy. Yet this is a debate that matters. Few any longer challenge the need for IP QOS, but debate continues to rage over how to implement it, and where. And, while a consensus is beginning to emerge in the industry about the shape of IP QOS, its benefits, by and large, have yet to filter through to users. Hanging unanswered over the whole debate: What, if any, are the commercial benefits of IP QOS?
First, a brief definition. The purpose of IP QOS (quality of service) is to control packet flows in such a way that some flows get better treatment than plain vanilla, best-effort flows. Its main purpose is to differentiate the quality supplied to different IP applications running on the same IP networks – so-called multiservice networks.
The four characteristics that QOS tries to control are latency (end-to-end delay); jitter (variation in delay); packet loss; and throughput or bandwidth. A fifth important measure is availability. However, availability includes consideration of physical link performance (e.g., line uptime), and because it’s not specifically an Internet Protocol (IP) issue, it won’t be discussed in this article.
How It’s Done
In order to achieve differential treatment of packets, engineers must first set policies for the applications that they run – how much latency they need, how much bandwidth they need (often called bandwidth partitioning), and so on. Unless there’s a desire to set a different policy per application, the applications must then be organized into classes that are used as the basis for differential QOS.
At this point it’s necessary to analyze end-to-end bandwidth and to provision more if it appears that minimum policy requirements can’t be met. Similarly, there is likely to be a need for traffic engineering, which is to say, policies that define how to load a particular network in such a way that all of the network’s resources – routers, links, and so on – are optimally utilized and applications get the bandwidth they need.
Next, and optionally, the engineer may choose to implement a signaling scheme that is recognized throughout the network and allows further QOS mechanisms, such as guaranteed bandwidth and path allocation, to be invoked. As we shall see later, these are among the more controversial issues in the QOS debate.
Depending on the scheme chosen, QOS may also require that packets generated by specific applications or classes are distinguished from one another by marking packets so that they can be recognized at each node in the network.
At the same time, these marked packets must be given specific treatment at the ingress and egress buffers that handle packets as they enter and leave a piece of equipment such as a router. Congestion at the ingress buffers will result in packet loss; congestion at the egress buffers will result in greater latency, or jitter if it’s variable. This is tackled in two main ways: through queuing techniques that in effect allow some packets to jump queues; and through discard (a.k.a. policing) strategies that decide which packets to drop if the buffers are full. Metering may also be used to count a traffic stream to determine whether a particular packet has exceeded the amount set in a policy for that class or application. Similarly, shapers delay packets in a traffic stream to bring it into conformance with a traffic profile.
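Metering, policing, and shaping are commonly built on a token bucket. The following minimal Python sketch assumes a single-rate bucket. The class name, rates, and bucket sizes are illustrative. Production routers implement this logic in hardware, often in two-rate, three-color form.

```python
class TokenBucket:
    """Single-rate token bucket: tokens accrue at `rate` bytes/s, up to `burst` bytes."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate      # committed rate in bytes per second (illustrative)
        self.burst = burst    # bucket depth in bytes (illustrative)
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time, in seconds, of the last update

    def _refill(self, now: float) -> None:
        # Credit tokens for time elapsed since the last update, capped at the burst size.
        if now > self.last:
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
            self.last = now

    def meter(self, now: float, size: int) -> bool:
        """Meter: True if the packet conforms. A policer drops (or re-marks) it otherwise."""
        self._refill(now)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False

    def shape(self, now: float, size: int) -> float:
        """Shaper: return the time at which the packet may leave so that it conforms."""
        if size > self.burst:
            raise ValueError("a packet larger than the bucket can never conform")
        self._refill(now)
        start = max(now, self.last)  # cannot leave before earlier packets have left
        wait = max(0.0, (size - self.tokens) / self.rate)
        self._refill(start + wait)   # tokens that accumulate while the packet waits
        self.tokens -= size
        return start + wait


policer = TokenBucket(rate=125_000.0, burst=3_000.0)  # about 1 Mbit/s, two 1,500-byte packets deep
print([policer.meter(0.0, 1500) for _ in range(3)])   # [True, True, False]
shaper = TokenBucket(rate=125_000.0, burst=3_000.0)
print([shaper.shape(0.0, 1500) for _ in range(3)])    # [0.0, 0.0, 0.012]
```

The two uses differ only in what happens to a nonconforming packet. The policer discards it immediately, which causes loss. The shaper holds it for 12 ms, which trades added latency for conformance.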
Finally, engineers need to set strategies for monitoring performance and adjusting all of the other mechanisms if policy objectives are not being achieved.
* * *
So much for the mechanisms. On their own, these mechanisms cannot guarantee QOS, and the onus is on engineers and standards-builders to combine them into an integrated solution that fits the specific requirements of the applications being protected.
According to Gary Holland, director of technical marketing at Riverstone Networks Inc. (Nasdaq: RSTN), the simple QOS mechanisms – marking, queuing, and discard strategies – may be adequate to meet the need for hop-by-hop QOS, which mimics the way that IP networks usually transmit packets. But large service provider networks, he says, are likely to need the more sophisticated capabilities that are provided by provisioning, shaping, and traffic engineering tools.
Ultimately, the solution must fit the problem – and it is likely to be multisided. In RFC 2990, an Internet Engineering Task Force (IETF) RFC (Request for Comments) titled “Next Steps for the IP QoS Architecture,” Geoff Huston, an IP QOS expert with Telstra Corp., asks, “What is the precise nature of the problem that QOS is attempting to solve?” The diversity of potential responses is a pointer to the breadth of scope of the QOS effort, he writes. Any of the following could form parts of the QOS intention:
To control the network service response such that the response to a specific service element is consistent and predictable;
To control the network service response, such that a service element is provided with a level of response equal to or above a guaranteed minimum;
To allow a service element to establish in advance the service response that can or will be obtained from the network;
To control the contention for network resources, such that a service element is provided with a superior level of network resource;
To control the contention for network resources, such that a service element does not obtain an unfair allocation of resources (to some definition of “fairness”);
To allow for efficient total utilization of network resources while servicing a spectrum of directed network service outcomes.
As a result, most QOS wonks believe that the solution in any particular situation will likely involve more than one mechanism. It will draw on several of the major standards and techniques relevant in this area, including IntServ, DiffServ, RSVP, MPLS, and traffic engineering.
“It is extremely improbable,” writes Huston, “that any single form of service differentiation technology will be rolled out across the Internet and across all enterprise networks.”
It follows that no discussion of QOS can possibly cover every angle. In the following pages we provide a snapshot of the IP QOS debate, dividing up the topic in the following way:
What’s Driving QOS?
Bandwidth Boosters and Class Warriors
Class Considerations
Enter the IETF: IntServ & DiffServ
MPLS: QOS or Not QOS?
Across the Border
Mediation Efforts
To get the most out of this report, you may care to listen in on our archived Webinar on the subject, hosted by the author.
— Graham Finnie is an independent consultant.
What’s Driving QOS?
For users, IP is tantalizingly close to being the perfect communications protocol: low-cost, flexible, adaptable, and ubiquitous. But these qualities come at a price. Putting it simplistically, the Internet is not designed to guarantee the delivery of a specific packet to a specific destination within a specific period of time – or even to guarantee its delivery at all. In the jargon, it’s a “best effort” delivery system; and for many applications and users, best effort is no longer good enough.
IP QOS in its broadest sense has a long history. In fact, IP’s “other half,” TCP, is in effect a kind of QOS mechanism in itself, since its purpose is to control the flow of packets by finding out how many packets the receiver can handle. Moreover, so-called Type of Service (TOS) bits were first standardized as long ago as 1981, in the original IP specification, RFC 791. As we shall see, these TOS bits still play a role in providing for QOS.
More recently, however, interest in QOS has grown and has driven most of the work described in this article.
In sum, four developments in IP networks are driving the new interest in IP QOS.
The first is IP network applications inflation. In the early days of IP, two applications predominated: Telnet and FTP. Today there are hundreds of applications on IP networks, including many that have exacting network requirements. Packeteer Inc.’s (Nasdaq: PKTR) PacketShaper, a standalone QOS tool, recognizes nearly 200 separate IP applications. Some have found even more: Vijay Krishnamoorthy, QOS manager with Cisco Systems Inc. (Nasdaq: CSCO), tells of one Cisco enterprise customer that found a mind-boggling 1,500 applications running on its network, 750 of which were HTTP (Web) applications.
Many of these applications have characteristics that cannot be met very well by best-effort IP networks – though, as we shall see later, not everyone believes this requires the application of complex QOS mechanisms.
In particular, QOS may be required for:
Real-time applications, such as telephony and videotelephony;
Applications that require very short or specific response times; and
Applications that are mission-critical and must not suffer packet loss or unacceptably long response times.
The second major motivator, related to the first, is that many – perhaps most – enterprises now expect to move to multiservice IP networks, which run more and more of their applications, and ultimately, perhaps, all of them. The reasons for this are beyond the scope of this article. Suffice it to say that one of the key barriers holding back construction of these networks, and the migration of applications to them, is a widespread concern that IP QOS isn’t up to the job.
Third, in a similar vein, most data network service providers are gradually migrating in the direction of an all-IP infrastructure, and they need QOS mechanisms for several reasons. First, where they are offering IP services to corporate clients, they need to reassure them about QOS, and one way to do that is to offer differential QOS with service-level agreements (SLAs) that guarantee performance. This need is felt most urgently today in the provision of VPNs, a recently introduced service that has attracted strong interest from both enterprises and service providers (see Light Reading's report on Virtual Private Networks). A second motivator here is the growing interest in running both legacy and new applications on a single network that is based on Multiprotocol Label Switching (MPLS) – a routing technology that can be combined with IP QOS mechanisms to guarantee the performance of Frame Relay and other legacy services.
Fourth, but not least, IP QOS is attracting interest because it might enable service providers to earn additional revenues from IP services. One of the paradoxes of IP and the Internet is that, though it is the most successful new communications service since cellular mobile telephony, it generates only a tiny fraction of the revenues earned by wireless providers. If IP were a brand, it would be Woolworth’s, not Saks; so creating premium IP services that increase revenue per user is a key objective of almost every carrier. That, in turn, requires IP QOS.
As the above implies, current interest in IP QOS is concentrated around the needs of corporate and enterprise customers that want to run multiple applications on their networks. As a consequence, the mechanisms to enable QOS are almost all being implemented within bounded networks. Between networks, the problems posed by IP QOS are considerably harder to resolve, and there has been much less progress.
In the pages that follow, readers should bear in mind that most of what is described has been developed for bounded networks, usually owned by a single service provider. On the final page, we will look at the problems of inter-domain IP QOS.
Bandwidth Boosters and Class Warriors
Arguments about IP and the Internet have sometimes taken on a religious flavor, and the argument about QOS is no exception. For some, the idea of differential QOS is an affront and a denial of the most important characteristic of IP: its simplicity.
Typically, they go on to argue that in an age of abundant bandwidth and low-cost gigabit routers, careful husbandry of capacity augmented with complex QOS mechanisms is simply unnecessary. Just as most major telcos have done away with the expensive compression devices that were used to cram more voice channels over a given tranche of international bandwidth, the argument runs, so IP network owners should devote their time and energy to pressuring their bandwidth suppliers for the lowest possible prices, and then run networks that are relatively underutilized. That way, there is little chance of packets being dropped or delayed. In fact, networks that are under 50 percent utilized usually work just fine, network engineers claim, and if bandwidth is cheap, why not just let the idle half of the pipe act, in effect, as the guarantor of QOS?
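Textbook queueing theory gives one way to see why engineers settle on a figure like 50 percent. The sketch below assumes an M/M/1 queue, with Poisson arrivals and exponentially distributed service times. Real, bursty IP traffic does not behave this way, so the model only illustrates the trend: average delay grows steeply as a link fills up.

```python
def mm1_delay_factor(utilization: float) -> float:
    """Mean time in an M/M/1 system, as a multiple of the mean service time.

    With arrival rate lambda and service rate mu, utilization rho = lambda / mu,
    and the mean time in system is 1 / (mu - lambda) = (1 / mu) / (1 - rho).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return 1.0 / (1.0 - utilization)


for rho in (0.3, 0.5, 0.7, 0.9, 0.95):
    print(f"utilization {rho:.0%}: delay = {mm1_delay_factor(rho):.1f} x service time")
```

In this model a packet spends, on average, about twice its transmission time in the system at 50 percent load and about ten times at 90 percent. Leaving half the pipe idle keeps operation on the flat part of that curve.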
In the other corner we find the “class warriors,” who don’t trust IP or IP network owners to resolve all of the QOS issues via big pipes and believe that, at the least, a simple class of service (COS) scheme will be required to keep quality acceptable.
The class warriors adduce two main arguments to support this position:
First, they say, in most real-world networks, there is almost always at least one link in an end-to-end connection that acts as a bottleneck. More often than not, this is one or both of the access lines that connect an end user to the nearest network POP (point of presence). While bandwidth between POPs has multiplied many times, access bandwidth remains expensive and is still heavily “engineered” to carry as much traffic as possible.
Even in the backbone network, there is a wide variation in “oversubscription.” Analogous to overbooking on airlines, oversubscription makes statistical assumptions about how many individual packet streams will enter the network at any one time. Heavily oversubscribed networks are more vulnerable to congestion when traffic levels are high.
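The statistical bet behind oversubscription can be made concrete. This sketch assumes a hypothetical aggregation point with n subscribers. Each subscriber independently transmits at full rate with probability p, and the uplink can carry k of them at once. Under those assumptions, the binomial tail gives the chance that demand exceeds the uplink. Real traffic is burstier and more correlated than this, so the figures are optimistic.

```python
from math import comb


def overflow_probability(n: int, k: int, p: float) -> float:
    """P(more than k of n independent on/off sources are active at the same time)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1, n + 1))


# Hypothetical example: 100 subscribers share an uplink sized for 20 of them
# (a 5:1 oversubscription ratio).
for activity in (0.10, 0.15, 0.20):
    print(f"activity {activity:.0%}: P(congestion) = {overflow_probability(100, 20, activity):.4f}")
```

A modest rise in how often each subscriber is active sharply raises the probability that the shared link is overrun. This is why heavily oversubscribed networks suffer most when traffic levels are high.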
Similarly, the routers on a particular path may not all have the same capacity; even if only one router is underpowered, it can have a major impact on end-to-end QOS. And in real-world networks, it’s unusual for bandwidth, router capacity, and applications to be completely synchronized at all times: Congestion choke-points are unavoidable.
Second, it’s been shown theoretically that bandwidth can’t resolve sudden very high peaks in traffic, which are an unavoidable characteristic of packet-based connectionless networks. Even in a network with plenty of bandwidth, congestion will occasionally occur, and unless there are mechanisms to protect premium services, these will get the same treatment as everything else.
But this is an argument that is as much about perceptions and practices as it is about technology. In this regard, it’s highly significant that class warriors now include most of those who run large multinational networks and most of those who offer IP services such as IP VPNs to business customers. Big users say they want service-level guarantees; the suppliers say that can’t be done without a COS scheme. In other words, in a key segment of the market, the argument is already running strongly in the direction of the class warriors, and vendors are pretty much all in the COS camp. And in a poll conducted in the Webinar preview of this report, only 4 percent of respondents said they didn’t need COS.
Does that mean that the belief in bandwidth as a cure-all is dead? Not exactly. Everyone accepts that bandwidth resolves a significant part of the problem and that, for now, it is by far the most important technique actually in use in real-world networks to control IP QOS. The truth is that many of the other techniques are only just coming into use in big networks, and their cost effectiveness remains largely unproven. As a consequence, it may be too early to write off bandwidth as the main event in QOS.
In the next couple of sections, we look at some of the emerging techniques that take engineers beyond bandwidth. First, however, a quick look at COS and what it means.
Class Considerations
Class of Service is not new. In fact, many of the concepts and class definitions are derived from Asynchronous Transfer Mode (ATM), which defines classes that include constant bit rate (CBR), variable bit rate (VBR), and available bit rate (ABR).
Some engineers and vendors have adapted this scheme for the IP world. For example, Jeremy Brayley, senior product manager with Laurel Networks Inc., advocates a simple four-class system that echoes the ATM scheme. It includes a real-time service class for applications such as voice over IP; a “premium data” class for any data application that a network manager thinks requires priority treatment; a best-effort class for all other user data; and a control class that is used by the network manager for critical functions such as alarms.
Brayley argues that this kind of scheme is easy to implement using well established QOS standards such as DiffServ (described in the next section). “With any type of real-time application, you know how much bandwidth you need. With premium data, you know how much you want, but it’s elastic. So you manage your backbone so that real-time traffic doesn’t exceed a certain percentage of the link; the premium data gets what’s left, and the best-effort gets what’s left after that.”
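Brayley’s allocation rule reduces to a short calculation. The sketch below illustrates the rule under assumptions and is not Laurel’s implementation. The function name, the 40 percent real-time cap, the link rate, and the offered loads are all hypothetical, and the control class is left out.

```python
def allocate(link_mbps: float, realtime_cap: float, demand: dict[str, float]) -> dict[str, float]:
    """Share a link among real-time, premium, and best-effort traffic.

    Real-time traffic is served first but limited to `realtime_cap` of the link;
    premium data gets what is left, and best effort gets what remains after that.
    """
    realtime = min(demand["realtime"], realtime_cap * link_mbps)
    remaining = link_mbps - realtime
    premium = min(demand["premium"], remaining)
    remaining -= premium
    best_effort = min(demand["best_effort"], remaining)
    return {"realtime": realtime, "premium": premium, "best_effort": best_effort}


print(allocate(100.0, 0.40, {"realtime": 30.0, "premium": 50.0, "best_effort": 60.0}))
# → {'realtime': 30.0, 'premium': 50.0, 'best_effort': 20.0}
```

Capping the real-time class is what keeps the strict ordering from starving the other classes. Real-time demand above the cap is simply not served here, whereas a real network would police or drop it.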
Brayley’s view – roughly translated, “Keep It Simple, Stupid” – is widely shared. Referring to the overly complex nature of some IP QOS mechanisms, Steve Garrison, an engineer with Riverstone Networks, declares that in certain respects, QOS is “a lot of hooey.”
“People are talking about more queues than service providers can figure out how to sell,” he says.
All the same, not everyone believes that three or four classes are enough.
Cisco’s Krishnamoorthy, for example, defines six classes for use with DiffServ. In Cisco’s model, the premium data class is split into two – one for highly interactive business applications, and the other for those that are important but don’t need low latency. Cisco also splits the real-time class into (effectively) voice telephony and video. Krishnamoorthy claims this scheme is based on growing customer demands.
Others also divide the applications pie differently. Riverstone defines five classes, for example, while Ralph Santitoro, director of network architecture at Nortel Networks Corp. (NYSE/Toronto: NT), describes eight “Nortel Networks Service Classes.”
In a poll in the Webinar preview of this report, we found that most respondents favored a simple scheme: 70 percent said three or four classes was enough; 25 percent said that they wanted more than four classes – a minority, but a significant one.
At one level, this difference of views needn’t necessarily matter: As all vendors are at pains to point out, it’s up to the user to decide how fine-grained they want to go. But it does matter if you are trying to develop services that run to agreed QOS metrics across domain boundaries – especially across many domain boundaries. Here, as we shall see in a later article, the lack of agreed class boundaries is among the problems that must be resolved.
Enter the IETF: IntServ & DiffServ
Initially, QOS techniques were largely proprietary, implemented as a feature or option on routers and other networking equipment, as well as in standalone tools that worked well enough in bounded networks. However, service providers like standards, and the IETF soon obliged.
In this section, we’ll look at the two mechanisms developed explicitly by the IETF to meet QOS objectives. In the next section, we look at MPLS, which is widely seen by service providers as an important enabler for enterprise-oriented QOS.
The IETF has defined two mechanisms for improving QOS. The first is Integrated Services, or IntServ, whose architecture was first outlined in 1994 (RFC 1633) and whose core service specifications followed in 1997. The second is Differentiated Services, or DiffServ, defined in 1998 and 1999.
IntServ is a more ambitious scheme than DiffServ. In IntServ, the Resource Reservation Protocol (RSVP) is the major mechanism used to guarantee QOS. Note, however, that RSVP need not be used with IntServ and is also being used outside the IntServ umbrella – for example, for MPLS traffic engineering, as the next section explains.
As its name suggests, RSVP reserves capacity along an end-to-end route for a specific packet flow by signaling its requirements before the flow is sent. It is, in effect, a “yes or no” protocol: Either it receives a signal that capacity is available, or it does not. This makes it more like conventional circuit switching or ATM switching, and in fact some have characterized IntServ-based QOS as, in effect, a connection-oriented network protocol, fundamentally different from connectionless IP QOS.
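The all-or-nothing character of a reservation can be illustrated with a toy admission check. This simplified sketch is not the RSVP protocol. It assumes a central view of hypothetical per-link free capacity, whereas real RSVP sends PATH messages downstream and RESV messages back upstream, and each router makes its own local decision.

```python
class Network:
    def __init__(self, free_kbps: dict[tuple[str, str], float]) -> None:
        # Unreserved capacity on each directed link, keyed by (from_node, to_node).
        self.free = dict(free_kbps)

    def reserve(self, path: list[str], kbps: float) -> bool:
        """Reserve `kbps` on every hop of `path`, or on none of them."""
        hops = list(zip(path, path[1:]))
        if any(self.free.get(hop, 0.0) < kbps for hop in hops):
            return False  # "no": at least one hop cannot carry the flow
        for hop in hops:
            self.free[hop] -= kbps  # "yes": capacity is set aside on every hop
        return True

    def release(self, path: list[str], kbps: float) -> None:
        # Tear down a reservation that was previously granted.
        for hop in zip(path, path[1:]):
            self.free[hop] += kbps


net = Network({("A", "B"): 1000.0, ("B", "C"): 200.0})
print(net.reserve(["A", "B", "C"], 150.0))  # True
print(net.reserve(["A", "B", "C"], 150.0))  # False: B->C has only 50 kbps left
```

The refused second request leaves the A-to-B link untouched. That property resembles circuit setup: a flow either gets its whole path or nothing.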
IntServ currently defines three classes of service: guaranteed service, in which delay is limited and zero packet loss is guaranteed; controlled load service, the aim of which is to provide the same service in a heavily loaded network that you would get in a lightly loaded network; and best-effort service, which is the same as the service one would get without IntServ. Each flow is assigned to one of these classes. However, in principle, IntServ could carry many different classes and isn’t restricted to three.
IntServ has two main drawbacks: It places a heavy processing load on routers in the core of the network; and it does not scale well in large networks with many IntServ flows. This is because it operates at the level of each individual packet flow, and there is thus a proportionate relationship between the number of IntServ flows and the processing load. Although IntServ defines classes, it does not aggregate flows into classes before they cross the network, so as to reduce processing load.
IntServ does, however, have one important advantage over DiffServ, which was designed to address exactly that scalability problem. Because IntServ clears a path for the flow before sending it, it can guarantee QOS in certain circumstances – a point we come back to later in this section.
DiffServ (defined in RFC 2475) marks the type-of-service (TOS) bits, previously defined in the IP standard, so that different aggregate flows can be given differential levels of service at the entry points to the network. The specific aggregates are identified by the Differentiated Services Code Point (DSCP) in the packet header. Different application flows are assigned to different aggregate flows (known as behavior aggregates) by looking at these bits.
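The link between the old TOS byte and the DiffServ marking is easy to show in code. In the DiffServ redefinition, the six high-order bits of the former TOS byte carry the DSCP. The top three of those bits line up with the original IP precedence field, which helps older precedence-based equipment coexist with DiffServ. The sketch assumes the byte has already been read from the IPv4 header, and the function names are hypothetical.

```python
def dscp_from_tos(tos_byte: int) -> int:
    """Return the 6-bit DSCP carried in the upper bits of the IPv4 TOS/DS byte."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("TOS byte must fit in 8 bits")
    return tos_byte >> 2


def ip_precedence(tos_byte: int) -> int:
    """Return the legacy 3-bit IP precedence: the top three bits of the same byte."""
    return tos_byte >> 5


def tos_from_dscp(dscp: int) -> int:
    """Build the byte an edge marker would write; the low two bits are not part of the DSCP."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


tos = tos_from_dscp(46)  # 46 is the expedited forwarding code point
print(hex(tos), dscp_from_tos(tos), ip_precedence(tos))  # 0xb8 46 5
```

A router that only understands precedence sees this EF-marked packet as precedence 5. A DiffServ router reads the full six-bit code point.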
DiffServ has two important advantages over IntServ: All of the processing takes place before the flows enter the network, at the boundaries; and the flows are aggregated so that there is no need for routers to analyze the requirements of each individual flow. The routers decide how to handle each aggregated flow using per-hop behaviors (PHBs). There are currently three defined PHBs:
assured forwarding, which defines four traffic classes, each of which can have three “drop-precedence” values;
expedited forwarding, also known as virtual leased line service, which provides for low latency, low jitter, low loss, and ‘assured’ bandwidth; and
best-effort forwarding.
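The standard code points for these PHBs follow a regular pattern. Assured forwarding class x with drop precedence y (written AFxy) uses DSCP 8x + 2y, expedited forwarding uses 46, and best effort uses 0. The sketch below builds that table and shows how a router might map each marked packet to a per-aggregate queue. The queue names are hypothetical. Unrecognized code points fall back to default (best-effort) treatment.

```python
# Assured forwarding code points: AFxy = 8x + 2y for class x in 1..4, drop precedence y in 1..3.
AF = {f"AF{x}{y}": 8 * x + 2 * y for x in range(1, 5) for y in range(1, 4)}
EF = 46  # expedited forwarding


def classify(dscp: int) -> tuple[str, int]:
    """Map a DSCP to a (queue, drop precedence) pair for one router.

    All packets of an AF class share one queue, so the router keeps state per
    behavior aggregate, not per flow; within that queue, a higher drop
    precedence marks the packets to discard first under congestion.
    """
    if dscp == EF:
        return ("ef-queue", 0)
    for name, code in AF.items():
        if code == dscp:
            return (f"af{name[2]}-queue", int(name[3]))
    # DSCP 0 (best effort) and any unrecognized code point get default treatment.
    return ("best-effort-queue", 0)


print(AF["AF11"], AF["AF43"])  # 10 38
print(classify(46), classify(18), classify(7))
```

However many flows arrive, this router needs only six queues' worth of state. That is the scalability advantage over per-flow IntServ.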
Another feature of DiffServ is that it does not require every node to be DiffServ-enabled: a node that does not support DiffServ can simply pass DSCPs through transparently to the next node.
DiffServ’s virtues also betray its shortcomings: It can’t guarantee a specific QOS, especially on an end-to-end link. Because no signaling is involved, it has no prior knowledge of whether a specific flow will receive adequate QOS, even if it is marked preferentially; if a route or router is heavily congested, all packets will be rejected, whether they are priority packets or not. Similarly, because there is no signaling, applications cannot adjust their requirements in advance in response to network conditions – and many IP applications do adjust to network conditions in this way, given the chance. Like IP itself, DiffServ is a hop-by-hop technology with all the limitations that implies.
Yet for all that, DiffServ is now preferred over IntServ: Its simplicity is the clincher, and IntServ as an end-to-end mechanism for QOS is dead.
Does that mean IntServ is dead, period? Not quite: In fact, it’s enjoying something of a rebirth, at least potentially, at the edge of the networks. In particular, several vendors argue that IntServ’s ability to do call admission control (CAC) will be important in certain real-time applications. This is because designers can use CAC to determine if resources are available to support a service and then, in effect, send a “busy” signal to the user if they are not – thereby both alerting the user and preserving the quality of all other calls in progress.
“DiffServ doesn’t solve the problem of call admission control,” says Cisco’s Krishnamoorthy. “This may not necessarily matter with voice, where you can overprovision the network. But it’s really hard to do that with video, because it’s bursty and takes up a variable amount of bandwidth, depending on the codec you use.” As video applications spread and grow, Krishnamoorthy implies, so too will the need for CAC.
Key to this capability is a new standard called Aggregate RSVP, which was defined in September 2001 in RFC 3175. This overcomes a key shortcoming of RSVP by allowing flows to be aggregated and then provided with a guaranteed service, using DiffServ DSCPs to identify flows. By combining aggregate RSVP and DiffServ, it’s thus possible to guarantee service levels for particular kinds of services.
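Aggregate reservation combined with call admission control can be sketched as follows. The details here are illustrative assumptions rather than material taken from RFC 3175. An edge router holds one reservation for a whole DiffServ aggregate across the core. It admits or refuses individual calls against that reservation, so core routers never hold per-call state. The class name, the 80 kbps per-call figure, and the 400 kbps aggregate are hypothetical.

```python
class AggregateReservation:
    """Edge-side bookkeeping for one aggregate reserved across the core."""

    def __init__(self, dscp: int, reserved_kbps: float) -> None:
        self.dscp = dscp                    # code point identifying the aggregate in the core
        self.reserved_kbps = reserved_kbps  # bandwidth reserved for the whole aggregate
        self.calls: dict[str, float] = {}   # admitted call id -> bandwidth

    def in_use(self) -> float:
        return sum(self.calls.values())

    def admit(self, call_id: str, kbps: float) -> bool:
        """Call admission control: accept a call only if the aggregate still has room."""
        if self.in_use() + kbps > self.reserved_kbps:
            return False  # caller gets a "busy" signal; calls in progress are unaffected
        self.calls[call_id] = kbps
        return True

    def hang_up(self, call_id: str) -> None:
        self.calls.pop(call_id, None)


voice = AggregateReservation(dscp=46, reserved_kbps=400.0)
results = [voice.admit(f"call-{i}", 80.0) for i in range(6)]
print(results)  # [True, True, True, True, True, False]
```

Refusing the sixth call is the “busy signal” behavior described above. It protects the quality of the five calls already in progress rather than degrading all six.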
Not everyone, though, agrees that RSVP is the way to go. For Laurel’s Brayley, “IntServ is basically dead.” DiffServ-aware Traffic Engineering (DiffServ TE) could be used for the CAC function, he believes. DiffServ TE adapts the basic DiffServ standard by allowing some simple per-class provisioning to be implemented in a DiffServ network.
It’s a complex picture, not always easy for engineers to navigate, and still evolving. For Telstra’s Huston, the author of RFC 2990, progress in melding these two fundamentally different approaches has been slow. Responding to an emailed question about progress since he wrote his RFC, Huston wrote: “The tools [for QOS] have not changed appreciably over the past few years. The criticism that this is a loose-knit collection of tools without theme, structure or manageable outcomes still remains. In the case of QOS where managed outcomes are pretty much the totality of the desired outcome the contradiction between theory and current practice [is] glaringly obvious.”
Yet these tools are being used, and nowhere more so than in MPLS networks. In the next section, we look at MPLS and its peculiar status as a routing protocol, now widely seen as the key to initial QOS implementations.
MPLS: QOS or Not QOS?
Of all the technologies that are being touted to improve QOS, Multiprotocol Label Switching (MPLS) is certainly the most prominent and most widely cited. Yet most engineers see MPLS as a routing technology, not a QOS technology. So how come it barged in on the QOS party?
In MPLS, packets are marked (or labeled), and explicit paths called label-switched paths (LSPs) are assigned through the network based on the labeling. The label is contained in an MPLS header prefixed to each IP packet. Labels are assigned based on agreed policies for specific applications. In an MPLS-enabled network, upgraded routers called label-switched routers (LSRs) read the labels to forward each packet to the next router along the path. MPLS can also work across network boundaries so long as network operators are using the same version of MPLS on their routers.
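Forwarding at each LSR amounts to a table lookup on the incoming label. This minimal sketch assumes a hypothetical forwarding table for one router. The incoming label selects an outgoing interface and a new label, which replaces the old one before the packet moves to the next hop. The labels and interface names are made up. The sketch also omits label distribution, the process that populates the table.

```python
from typing import NamedTuple, Optional


class LfibEntry(NamedTuple):
    out_interface: str
    out_label: Optional[int]  # None: pop the label and forward the bare IP packet


# Hypothetical label forwarding table for one label-switched router (LSR).
LFIB = {
    17: LfibEntry("if1", 42),    # swap label 17 for 42 and send on if1
    18: LfibEntry("if2", None),  # last hop of this LSP: pop the label
}


def forward(in_label: int, payload: bytes) -> tuple[str, Optional[int], bytes]:
    """Swap (or pop) the top label; the IP header inside is never examined."""
    entry = LFIB.get(in_label)
    if entry is None:
        raise KeyError(f"no LSP is set up for label {in_label}")
    return entry.out_interface, entry.out_label, payload


print(forward(17, b"ip-packet"))  # ('if1', 42, b'ip-packet')
```

The router never consults the IP destination address here. This explains how an LSP can pin traffic to a path that differs from the normal routed one.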
The big benefit of MPLS from a QOS point of view is that it directs packet flows along specific paths, and is thus an enabler of IP traffic engineering (see MPLS Traffic Engineering and MPLS: 21st Century Traffic Engineering). As we noted in the last section, neither DiffServ nor IntServ can assign paths; instead, it’s assumed that flows will run on the best-effort route that is assigned in the router, regardless of priority. But because MPLS assigns paths, this vital element in the QOS picture can be addressed.
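One common way to choose such a path is a constrained shortest path first (CSPF) calculation. It prunes every link that lacks the requested bandwidth, then runs an ordinary shortest-path search over what remains. The topology, metrics, and bandwidth figures below are hypothetical, and the sketch ignores other constraints operators may apply.

```python
import heapq
from typing import Optional

# Hypothetical topology: (from, to) -> (routing metric, unreserved bandwidth in Mbps).
LINKS = {
    ("A", "B"): (1, 100.0), ("B", "D"): (1, 20.0),
    ("A", "C"): (2, 500.0), ("C", "D"): (2, 500.0),
}


def cspf(links: dict[tuple[str, str], tuple[int, float]], src: str, dst: str,
         need_mbps: float) -> Optional[list[str]]:
    """Lowest-metric path from src to dst using only links with at least need_mbps free."""
    graph: dict[str, list[tuple[str, int]]] = {}
    for (a, b), (metric, free) in links.items():
        if free >= need_mbps:  # constraint: prune links that cannot carry the flow
            graph.setdefault(a, []).append((b, metric))
    queue = [(0, src, [src])]  # Dijkstra on the pruned graph
    visited: set[str] = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, metric in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + metric, nxt, path + [nxt]))
    return None  # no path satisfies the constraint


print(cspf(LINKS, "A", "D", 10.0))  # ['A', 'B', 'D']: the ordinary shortest path
print(cspf(LINKS, "A", "D", 50.0))  # ['A', 'C', 'D']: B->D is pruned as a bottleneck
```

The second request shows traffic engineering routing around a known bottleneck. Plain shortest-path routing would have sent both flows over B.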
What Is Traffic Engineering (TE)?
Among other things, a traffic-engineered network can:
Route primary paths around known bottlenecks or points of congestion in the network
Provide precise control over how traffic is rerouted when the primary path is faced with single or multiple failures
Provide more efficient use of available aggregate bandwidth and long-haul fiber by ensuring that subsets of the network do not become overutilized while other subsets of the network along potential alternate paths do not become underutilized
Make an ISP more competitive within its market by maximizing operational efficiency, resulting in lower operational costs
Enhance the traffic-oriented performance characteristics of the network by minimizing packet loss, minimizing prolonged periods of congestion, and maximizing throughput
Enhance statistically bounded performance characteristics of the network (such as loss ratio, delay variation, and transfer delay) that will be required to support the forthcoming multiservices Internet
Provide more options, lower costs, and better service to their customers
[Source: Juniper Networks, “Traffic Engineering for the New Public Network”]
One reason for the excitement over MPLS is that it is beginning to be seen as an all-purpose fix that allows a single IP backbone to deliver legacy services such as ATM and Frame Relay by mimicking the QOS features of these services. Using the so-called Martini Draft standard, MPLS becomes a Layer 2 solution because it allows a Layer 3 network protocol (IP) to carry Layer 2 protocol traffic (e.g., Frame Relay) and retain those QOS guarantees typically offered with these Layer 2 protocols.
“A single frame-based backbone based on MPLS and carrying all services, including new services such as Ethernet private line – this is the main driver for the implementation of IP QOS techniques,” says Laurel’s Brayley. So how is MPLS combined with IP QOS techniques?
Standards setters have proposed two ways to implement QOS over MPLS. In one of these, called EXP-inferred label-switched paths (E-LSPs), the three “EXP” (experimental) bits in the MPLS shim header identify the service class. A single LSP can therefore support up to eight service classes based on the DiffServ standard.
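The MPLS shim header is a single 32-bit word: a 20-bit label, the three EXP bits, a 1-bit bottom-of-stack flag, and an 8-bit time-to-live. The sketch below packs and unpacks that word and applies an E-LSP-style mapping from EXP values to DiffServ behaviors. The mapping table is a hypothetical example, because operators configure their own.

```python
import struct

# Hypothetical operator mapping from EXP value to DiffServ behavior on an E-LSP.
EXP_TO_PHB = {5: "EF", 4: "AF41", 3: "AF31", 2: "AF21", 1: "AF11", 0: "default"}


def pack_shim(label: int, exp: int, bottom: bool, ttl: int) -> bytes:
    """Encode one 4-byte MPLS label stack entry in network byte order."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def unpack_shim(data: bytes) -> dict[str, int]:
    """Decode the first label stack entry in `data`."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "bottom": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


entry = unpack_shim(pack_shim(label=1001, exp=5, bottom=True, ttl=64))
print(entry, EXP_TO_PHB.get(entry["exp"], "default"))
# → {'label': 1001, 'exp': 5, 'bottom': 1, 'ttl': 64} EF
```

Because three bits allow only eight values, an E-LSP can distinguish at most eight classes. That ceiling is the eight-class limit mentioned above.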
According to Cisco’s Krishnamoorthy, MPLS and DiffServ make a “perfect match,” partly because both technologies aggregate traffic at the edge and process aggregates in the core. By adding DiffServ-aware TE, “Service providers can nail down a point-to-point tunnel for specific premium applications,” with guaranteed bandwidth, he says.
QOS can also be enabled by signaling over the MPLS network using either RSVP-TE, based on the previously discussed RSVP standard, or the Constraint-Based Routed Label Distribution Protocol (CR-LDP), which builds on the existing MPLS LDP standard.
“You can use RSVP to ask the MPLS routers whether there is enough bandwidth to support a flow,” before it is sent into the network, says Nortel’s Santitoro. “And by using these protocols, you can set up an MPLS path with specific [QOS] parameters.”
For Tom Lemaire, ERX QOS Project Lead at Juniper Networks Inc. (Nasdaq: JNPR), RSVP has a role to play, too: “especially for signaling MPLS tunnels across service provider backbones.”
By judiciously combining the techniques described up to this point, service providers can meet much of the IP QOS agenda. Yet there’s still a big challenge ahead: Despite the fact that much of this work is based on agreed standards, QOS across network or domain boundaries has made little progress. In the next section, we look in detail at this problem – and at the pieces that need to be assembled if it is to be resolved.
Across the Border
In previous pages we described how a set of emerging technologies has enabled IP QOS to be deployed in well-defined, bounded networks managed by a single enterprise or service provider. For example, over the past 12 to 18 months, nearly all service providers have implemented some form of IP VPN that delivers three or four classes to enterprise end users, usually based on the kind of scheme described earlier in this report.
But what happens if QOS has to be delivered across network boundaries, in multiple domains? And will it ever be successfully implemented in the public Internet, where packets may be routed across many boundaries and where the routes are ill-defined?
Some industry observers believe that existing standards could provide much of what’s required across boundaries. DiffServ and MPLS, in particular, are relatively mature, and interoperability across vendor platforms has already been demonstrated for both. “DiffServ has taken us a very long way towards interoperability,” says Nortel’s Santitoro. “You might need some fine-tuning, but we can interoperate in this environment.”
Yet others believe that there is still a long way to go. “The truth is, there is no way to do QOS across AS [autonomous system] boundaries today,” says Riverstone’s Garrison. “You can do it within AS boundaries. You can tweak hardware flows and boxes to get what you need. But once you’re across the AS boundary, other than BGP peering [using TOS bits], there is no way you can do it IETF-wise.”
Whatever the issues, most industry observers believe this is a problem that could be resolved, given a strong will to resolve it. But do service providers want to resolve it, and would the rewards justify the investment?
The most significant non-technical barrier is a perception that standardized cross-border QOS could constitute a threat to particular service providers rather than an opportunity. Some service providers see QOS as a differentiator that may be used to justify higher prices or to help reduce churn. Providers such as Cable & Wireless (NYSE: CWP) and UUNet have begun to publish quality statistics to help make a case that their networks are more reliable than others. On this reading, the emergence of standardized service levels across network boundaries might render IP transport more commoditized than it is today, since it would require all service providers to meet similar standards for each class of service. “The reality is that they want you to stay on their network,” says Santitoro. “There are some peering agreements, but they are often limited in scope.”
“It’s a political problem,” says Brayley. “Just getting everyone to agree on what classes to use is going to be pretty difficult to do.”
Moreover, many service providers are reluctant to allow other service providers to use QOS mechanisms to specify routes across the third-party network, because of the loss of control that this implies.
But even if IETF QOS standards are implemented everywhere, it doesn’t follow that users will benefit from standardized, universal services with QOS guarantees. The IETF believes that it is beyond its remit to specify quality metrics for particular services; so even if QOS standards are implemented in an agreed way across network boundaries, that doesn’t mean that end users will benefit from services that are as tightly defined as, say, international public telephony.
The International Telecommunication Union (ITU) is among those attempting to address this issue. Through a work program called Y.1541, it hopes to “quantify user/applications needs via standard IP QOS classes” and “standardize QOS signaling protocols using Y.1541.”
The IETF, meanwhile, has begun its own work program to fill in some of the perceived gaps in the existing IETF QOS work and get QOS across network boundaries. The Next Steps in Signaling (NSIS) working group aims to “evaluate (aggregate) RSVP as a starting point,” and to “focus on mobility and roaming” – a major area that has not really been tackled in existing QOS work. The potentially widespread use of IP applications in mobile networks poses even harder cross-border problems for QOS engineers.
Whether these efforts bear fruit remains uncertain. But even if all of these problems were successfully addressed, it wouldn’t resolve what is, arguably, the biggest issue of all in cross-domain QOS: the lack of standardized mechanisms to handle billing, accounting, data collection, and monitoring. Without these mechanisms, there is little incentive to implement cross-border QOS, and no ability to enforce cross-domain SLAs.
“Very little effort has gone into this [at the IETF level],” says Santitoro. “Some work has started, but mostly today it’s undefined. What we need is a set of MIBs defined in an RFC.” (A MIB, Management Information Base, is a network management concept that describes databases of information about managed network elements.)
Krishnamoorthy agrees: “The tools you need, such as MIBs and probes, are not available at all points in the network. This is very hard to do without a lot of infrastructure and intrusive monitoring, and in any case all the diverse tools you need are not very tightly integrated today.”
Despite the formidable problems implied by cross-border QOS, there is growing interest in finding workable solutions, and a wide variety of companies attempting to mediate among service provider networks. Three examples will suffice to illustrate some of the approaches now being taken.
First, there are companies such as Vanco that are mediating across network boundaries by acting as old-fashioned, value-added service companies – much as companies like GEIS and IBM Corp. (NYSE: IBM) did in the days of monopoly PTTs. Vanco keeps a comprehensive database of quality statistics on third-party networks, which it uses to build IP VPNs that are underwritten by SLAs. It owns no network facilities itself; instead, it works with clients to define the level of service they want, and then looks for service providers willing or able to provide it – often, multiple service providers.
Startup Nexagent is attempting an even more ambitious fix, namely the implementation of peering points that mediate among different networks to guarantee QOS. These peering points are usually located in colocation hotels or Internet data centers, and are based on a version of MPLS. Service control devices monitor the usual QOS metrics on all attached networks and try to fit customer requirements with the best match available. Nexagent’s proposition is based on a sophisticated software platform that not only pieces together network facilities but also attempts to handle the commercial issues such as billing and monitoring.
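The idea of fitting customer requirements to the best available network can be illustrated with a small selection routine. The sketch below is hypothetical and says nothing about how Nexagent's platform actually works: it keeps only the networks whose measured latency, jitter, and loss meet every target, then picks the cheapest, breaking ties on latency. The network names, metric values, and the `price` field are invented for illustration.

```python
"""Hypothetical best-match selection at a QOS peering point.

Keep networks that meet every target in the customer's requirement, then
choose the cheapest of those (lower latency breaks ties). All names,
numbers, and the price field are illustrative assumptions.
"""

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NetworkStats:
    name: str
    latency_ms: float  # measured one-way delay
    jitter_ms: float   # measured delay variation
    loss_pct: float    # measured packet loss, in percent
    price: float       # hypothetical price per Mbit/s per month


@dataclass(frozen=True)
class Requirement:
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float


def meets(stats: NetworkStats, req: Requirement) -> bool:
    """True if the network satisfies every target in the requirement."""
    return (stats.latency_ms <= req.max_latency_ms
            and stats.jitter_ms <= req.max_jitter_ms
            and stats.loss_pct <= req.max_loss_pct)


def best_match(candidates: list[NetworkStats], req: Requirement) -> Optional[NetworkStats]:
    """Cheapest eligible network, or None if no attached network qualifies."""
    eligible = [s for s in candidates if meets(s, req)]
    if not eligible:
        return None
    return min(eligible, key=lambda s: (s.price, s.latency_ms))


if __name__ == "__main__":
    nets = [
        NetworkStats("net-a", 45.0, 4.0, 0.1, 12.0),
        NetworkStats("net-b", 30.0, 2.0, 0.05, 15.0),
        NetworkStats("net-c", 80.0, 9.0, 0.5, 8.0),
    ]
    voice = Requirement(max_latency_ms=50.0, max_jitter_ms=5.0, max_loss_pct=0.2)
    choice = best_match(nets, voice)
    print(choice.name if choice else "no match")
```

A real mediation platform would also weigh capacity, contract terms, and multi-network paths; filtering on hard QOS targets before optimizing on cost is just one plausible ordering.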
Finally, there are companies like Equant (NYSE: ENT; Paris: EQU) that are tackling the problem by painstakingly stitching together SLAs with local network providers in order to provide a global end-to-end IP VPN service to multinational customers. This has enabled Equant to offer five classes of IP service, including telephony, that are underwritten by SLAs – even though parts of the end-to-end solution are not owned by Equant. However, it’s not available everywhere: The company estimates that about one third of the 140 countries it connects are now covered by these SLAs, which cover latency, packet loss, and jitter.
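Enforcing an SLA that covers latency, packet loss, and jitter starts with turning raw measurements into those three numbers. The sketch below assumes a series of probes, each recording a one-way delay in milliseconds or `None` if the probe was lost. It takes jitter as the mean absolute difference between consecutive received delays, which is one simple definition among several. The thresholds are illustrative assumptions, not Equant's or anyone else's actual SLA values.

```python
"""Sketch: checking probe measurements against a latency/loss/jitter SLA.

Assumptions: each probe yields a one-way delay in ms, or None if lost;
jitter = mean absolute difference between consecutive received delays;
SLA thresholds are illustrative only.
"""

from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Sla:
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float


def summarize(delays: list[Optional[float]]) -> dict[str, float]:
    """Compute mean latency, jitter, and loss percentage from probe results."""
    if not delays:
        raise ValueError("need at least one probe")
    received = [d for d in delays if d is not None]
    loss_pct = 100.0 * (len(delays) - len(received)) / len(delays)
    latency = mean(received) if received else float("inf")
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return {"latency_ms": latency, "jitter_ms": jitter, "loss_pct": loss_pct}


def check_sla(delays: list[Optional[float]], sla: Sla) -> dict[str, bool]:
    """Report, metric by metric, whether the measurements meet the SLA."""
    s = summarize(delays)
    return {
        "latency": s["latency_ms"] <= sla.max_latency_ms,
        "jitter": s["jitter_ms"] <= sla.max_jitter_ms,
        "loss": s["loss_pct"] <= sla.max_loss_pct,
    }


if __name__ == "__main__":
    probes = [40.0, 42.0, None, 41.0, 45.0]  # one of five probes lost
    print(summarize(probes))
    print(check_sla(probes, Sla(max_latency_ms=50.0, max_jitter_ms=5.0, max_loss_pct=1.0)))
```

In this example latency and jitter are within bounds, but losing one probe in five (20 percent) breaches the loss target. Across domain boundaries, the hard part is not this arithmetic but agreeing on who collects the probes and how the results are shared.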
For Riverstone’s Holland, these cross-domain efforts are the key to resolving the cross-border problem: “We’re now seeing a lot of interest in cross-domain service mediation. Customers are frustrated, and companies such as Vanco are solving the problem neatly.”
Yet even these efforts are really only attempting to resolve the cross-domain problem for one kind of customer: large and medium-sized enterprises. This is for two main reasons. First, it’s generally assumed that there is a bigger potential reward in providing differential QOS to enterprise customers than to other types of customers, such as consumers. “Of course everyone wants better Internet access, too, but how do you do that? It’s a lower priority, because it’s not an obvious money-maker,” says Laurel’s Brayley.
Secondly, as Brayley implies, it is technically an easier problem to resolve than providing QOS in the Internet as a whole, especially to indeterminate end points.
Does that mean that the idea of differential QOS in the Internet is a non-starter? Not quite. As in the enterprise segment, QOS in the consumer segment may ultimately be driven by applications that both generate revenue and require certain QOS guarantees that are beyond the ability of the best-effort Internet.
Advocates generally cite two potential applications: video on demand and gaming. Gaming, in particular, may demand very tight control over latency where the games are based on fast response times. The imminent launch of Microsoft Corp.’s (Nasdaq: MSFT) Xbox network gaming service may drive demand for QOS in this area, says Krishnamoorthy. Others note that the huge success of gaming services in Korea, where broadband networks are widely implemented, has driven providers there to implement QOS across network boundaries.
According to Juniper’s Lemaire, there are also providers in Korea and other Far Eastern countries that are doing streaming video across domain boundaries with QOS.
In a poll conducted in the Webinar preview of this report, many respondents saw this as an increasingly important objective. Asked where QOS offered the most value today, 41 percent cited voice over IP; 30 percent cited VPNs; and just 13 percent plumped for video on demand and gaming. But asked where IP QOS would offer most value tomorrow, nearly 40 percent cited video on demand – suggesting an expectation that consumer applications may ultimately supplant enterprise applications as the driver for IP QOS.
That may still be a ways off. But the fact remains that, little by little, QOS is being implemented in real networks, especially in networks that are serving enterprise customers. IP QOS is a dauntingly large and complex topic, and it remains unclear just how far QOS will get, or how fine-grained it needs to be. Yet there is little doubt that it will continue to make inroads for as long as differential service meets a clear commercial need. And that, in the end, is the crux of the matter.
“The important thing is to define the problem first, set a policy, and then decide what tools you need,” says Krishnamoorthy. “It’s like baking a cake: All the ingredients are now there; but first, you have to decide what your users really need.”