Can voice-over-IP be as good as (or better than) the public switched telephone network? * Reliability * QOS * Manageability

June 30, 2004

It’s no secret that carriers are looking to move from traditional, circuit-switched voice to voice-over-packet technologies. In a way, of course, they already did this years ago with Asynchronous Transfer Mode (ATM), but ATM’s fixed-length micropackets (cells) and connection orientation were adopted specifically to be voice-friendly. What’s new is the wholesale move to the variable length and connectionless orientation of Internet Protocol (IP) packets, which are definitely not the obvious architecture of choice for voice.

IP was designed as a best-effort, data-networking system with the ability to recover (somewhat) from a nuclear holocaust. This is not a requirement high up on a voice engineer’s wish list. He or she is much more interested in almost total availability (pick up the phone and you hear dialtone always, even when the lights are out), high reliability (the line doesn’t go dead in mid conversation), and high voice quality (no annoying gaps or delays in the conversation that make human interaction difficult).

But the attractions of turning voice into yet another application running over IP are immense in terms of potential cost savings, network convergence, and service innovation. And everyone is pretty much agreed that the way to make this happen is to run VOIP over a converged IP/MPLS core, where Multiprotocol Label Switching (MPLS) provides the connection-oriented layer needed for low latency, traffic engineering, protection, and security, among other things.

Carriers are worried, though, that junking time-honored TDM cores for voice in favor of IP/MPLS cores for VOIP could get them into trouble. "If it isn’t broke, don’t fix it" is a good motto for network architectures that still support the bulk of carriers’ revenues. As anyone who has ever struggled with a Windows PC, a LAN, or a broadband connection knows, an IP application is great when it works and a complete pain to fix when it doesn’t. The carriers’ nightmare is that their shiny new IP/MPLS cores running VOIP to support millions of users could be less reliable than their existing networks, and much more difficult and expensive to fix and operate.

So what are the risks associated with the move to the new voice architectures? And, more pertinently, what can be done to reduce or even remove those risks? Read on to find out.

Here’s a hyperlinked contents list:

  • Why Bother With VOIP?
    It's NOT about old wine in new bottles

  • VOIP Risks
    VOIP + IP/MPLS works - but how well?

  • VOIP Reliability
    Failures will happen - will new technologies help?

  • Management Challenges
    IP/MPLS management is at last coming up to speed for voice needs

  • Improving VOIP QOS
    Carriers are learning to reimplement the past to improve VOIP QOS


This report was previewed in a Webinar moderated by Geoff Bennett, Chief Technologist, Heavy Reading, and sponsored by Avici Systems Inc. (Nasdaq: AVCI; Frankfurt: BVC7), Multiservice Switching Forum, Nortel Networks Ltd. (NYSE/Toronto: NT), and Riverstone Networks Inc. (OTC: RSTN.PK). It may be viewed free of charge in our Webinar archives.

Background Reading

  • Report: Getting the Value From VOIP

  • Report: IP Reliability

  • Report: IP Quality of Service

  • News Analysis: MPLS Gets the Management Blues

  • News Analysis: HR Survey Points to Big VPN Changes

  • News Analysis: Alliance Demos MPLS With TE, Diffserv

— Tim Hills, Freelance Telecommunications Writer and Journalist

Figure 1 shows a generic PSTN voice architecture. It uses North American terminology for convenience, but basically it is a modern, flattened, two-level circuit-switching hierarchy of local (Class 5) and long-distance (Class 4) switches. Everything below the broken red line forms the transport infrastructure for the network, while everything above that line represents the service, or application, that is running over the infrastructure. The local loop is still predominantly copper, but also contains a lot of electronics and fiber for digital loop carriers and customer premises equipment (CPE) for business services. The metro/long-distance transport is provided by a Sonet/SDH optical transport network (OTN).

Figure 2 shows an equivalent VOIP architecture, which is also very generic and simple and just refers to media gateways and softswitches. In terms of transport, the local loop remains the same as in Figure 1, but the underlying transport is completely different, being an IP/MPLS infrastructure. A major issue for VOIP is to optimize this infrastructure to carry voice as a service, since both IP and MPLS were originally conceived essentially as data-environment protocols. The role of MPLS in this infrastructure is to create a virtual connection-oriented layer within the IP cloud.

Hard-bitten PSTN types will ask – why bother with VOIP? Why throw out an effective, tried-and-tested Sonet/SDH connection-oriented transport infrastructure, only to attempt to simulate it with the virtual MPLS connections over an IP packet network?

The classic rationale is to lower capital expenditure (capex) because IP kit is inherently cheaper than Sonet/SDH. But this rationale is not enough these days as new build has declined. Carriers must also be able to lower their operational expenditure (opex), and obtain lower access costs for customers – the single largest cost of any service. VOIP should be good for this, too.

On the revenue side, VOIP should allow the deployment of new services that were not possible (or economical) on the legacy infrastructure. And this really is a key message about VOIP: There is no point in migrating if carriers are just going to run the same old services on a new infrastructure.

“At Qwest we are obviously very focused on improving our profitability and the customer price model,” says Ken Rambo, Director, Core Technology, Qwest Communications International Inc. (NYSE: Q). “Equally important, however, is the speed and flexibility of offering new services. Consumer demand for immediate and robust services has never been higher, and voice over packet is an enabling technology required to offer this leap forward.”

Changing the basis of networks is always risky, and the combination of VOIP and IP/MPLS is no exception. A really big issue is whether IP/MPLS is reliable enough to provide voice service for carriers and customers alike.

One risk is that the new infrastructure could end up costing more to run than the old. A big part of this will be the opex incurred by the carrier if the IP/MPLS equipment does not turn out to be truly carrier grade. And it is well known that IP on its own has never had good operations, administration, and maintenance (OAM) protocols, although the Internet Engineering Task Force (IETF) is trying hard to change this, as is the International Telecommunication Union (ITU). If opex were to go higher, the new packet-based services would not be as profitable as carriers would hope.

Carriers could also face losing customers who are dissatisfied with the quality of the voice service they are getting (poor availability or poor voice quality from excessive delay, for example). In this competitive environment, the threat of customer churn is always a big issue. It costs money to acquire customers, and losing them to a competitor is one of the worst things that can happen in a carrier’s business.

There is also the risk that multivendor service provisioning protocols will not materialize. A big reason for moving to packet infrastructures is to allow carriers to implement multivendor networks, instead of being limited to the predominantly single-vendor networks of today.

And, if carriers cannot get customers to migrate quickly to the new infrastructure, they will end up running two parallel networks, which is completely counter to the notion of convergence.

According to Hadriel Kaplan, Senior Solutions Architect, Avici Systems Inc. (Nasdaq: AVCI; Frankfurt: BVC7), the greatest risk carriers face is in compromising voice service quality by driving their VOIP strategy solely with the aim of cost reduction.

“For most carriers, voice is their single largest source of revenue and margin,” he says. “People are willing to live with lower quality and reliability when it gains them utility – like a cellphone, for example. But when you use a landline for fax or credit-card authorization to other revenue services, people really need them to work the same as they did last year over the PSTN.”

Reducing VOIP Risks

There are three main ways carriers can reduce these VOIP risks:

  • Improved reliability,

  • Improved OAM, and

  • Improved quality of service (QOS)

Improved reliability is basically a matter of higher reliability in links, nodes, and the VOIP end equipment attached to the network infrastructure. At a simple level, it means node equipment satisfying the environmental standards of the Network Equipment Building System (NEBS) and having built-in, basic hardware/software reliability features such as redundant power supplies, fans, and control modules, hitless software upgrades, software protocol restart, and nonstop routing and forwarding. A lot of the carrier-class IP/MPLS equipment available today already has these types of features built in.

An interesting aspect of NEBS is that it has typically been applied to communications hardware, but a lot of voice over packet comes from the information technology domain, which has typically been a non-NEBS environment.

“You are really talking about mixing services between the NEBS and the non-NEBS worlds, and that convergence of information technology and communications infrastructure such as MPLS has got to be managed and planned out. That’s one of the great challenges that the industry faces,” says Qwest’s Rambo.

Then there are the techniques for recovering from link and node failures, such as MPLS Fast Reroute with sub-50ms failover time, Ethernet Rapid Spanning Tree, and VRRP (Virtual Router Redundancy Protocol).

“We also need to improve the reliability for the VOIP equipment attached to the network infrastructure, and so we see some of this infrastructure feature set being extended all the way up to the end station,” says Gary Leonard, Director of Field Marketing, Riverstone Networks Inc. (OTC: RSTN.PK).

The real challenge in OAM with VOIP is that carriers are moving from a network where the voice system is dedicated to voice, to a converged packet network where VOIP is just one of the services available over the network. That means the carriers’ underlying VOIP equipment fabric needs to successfully integrate information from the packet-network transportation layer to ensure successful and easy configuration of the VOIP network, and the VOIP system needs to avoid congestion and be able to troubleshoot voice transmission problems in the network layer.

The QOS Challenge

QOS has to be one of the most talked about topics in the industry, yet customers are generally still not happy with the QOS they think is available from packet networks. The problem is that, to date in data packet networks, the QOS is mostly best effort; it hasn’t been built around some of the legacy voice-network QOS capabilities. And best effort is OK for lots of things, but it might not be good enough for real-time video or voice.

One problem is that voice over the public Internet, or voice over an unmanaged IP network, is a completely different QOS problem compared to the problem with voice over a managed IP network. There are already many operational, carrier-based VOIP networks where a properly managed IP network provides excellent voice quality. In fact, many people have used those connections with standard phone calls and have never known that the calls were being delivered over a VOIP connection.

But there are specific network requirements for VOIP that are different or even foreign to the data point of view. The primary one is the delivery of packets for voice, where it is more important with voice to deliver most of the packets at a constant rate with as little delay as possible – that is, in real time – than it is to deliver all the packets with 100 percent accuracy.

The IP/MPLS network infrastructure is quickly moving towards improved QOS functionality with many different capabilities being addressed. There are traffic-engineered tunnels in the core, and there is the class-of-service capability at the edge, where certain types of services can be mapped to a DiffServ code point (DSCP), with classes such as expedited, assured, and best effort. Some people might say that these are not true QOS functions; they are class-of-service functions, but they do allow tunnels, paths, or circuits through the network with some level of differentiated services: low, medium, and high. There is a lot of work progressing in a variety of different areas. The Metro Ethernet Forum (MEF), for example, is working on a QOS initiative for Ethernet access and Ethernet aggregation at core networks. The MPLS and Frame Relay Alliance is working on a lot of different legacy-technology mappings as well as new Ethernet network class mappings to IP/MPLS services.

But they all need to work together to give the reliability, the improved OAM, and the QOS to deliver the VOIP service.
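The edge class-of-service mapping described above can be sketched roughly as follows. The numeric code points for Expedited Forwarding, the two Assured Forwarding classes shown, and best effort are the standard DiffServ values; the service names and the policy of which service lands in which class are assumptions made purely for illustration.

```python
# Standard DiffServ code points (6-bit values).
DSCP_EF = 46     # Expedited Forwarding
DSCP_AF41 = 34   # Assured Forwarding class 4, low drop precedence
DSCP_AF21 = 18   # Assured Forwarding class 2, low drop precedence
DSCP_BE = 0      # default / best effort

# Hypothetical edge policy: service names and assignments are illustrative only.
SERVICE_CLASS = {
    "voice": DSCP_EF,
    "video": DSCP_AF41,
    "business_data": DSCP_AF21,
}


def dscp_for(service: str) -> int:
    """Return the code point for a service; unknown services get best effort."""
    return SERVICE_CLASS.get(service, DSCP_BE)


def tos_byte(dscp: int) -> int:
    """The DSCP occupies the upper six bits of the former IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits")
    return dscp << 2


print(dscp_for("voice"), hex(tos_byte(dscp_for("voice"))))  # 46 0xb8
```

Anything the policy does not recognize falls through to best effort, which mirrors the point that only selected services are lifted out of the default class.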

Reliability is clearly a hot issue. But what kind of reliability does VOIP need to have? And there is a key difference between simple reliability and availability. Availability relates to people's expectation of being able to use the service at any time, any day. Users do expect a higher level of availability for voice services than for IT. This means no maintenance windows for upgrades, no worm or virus impacts, and no call-attempt blocking during busy hours. And once a call is made, it has to stay up, with no multisecond convergence events, and without any oversubscription/congestion effects.

This is in stark contrast to legacy IT availability for services such as email or Web browsing. They have always been based on best-effort traffic with congestion/throttling inherent in the protocols, so that if there are more users than the system can handle, they get a reduced subset of the available bandwidth. This just can’t happen with VOIP.

And VOIP is more than just user voice – it includes fax and modem services. Awkwardly, fax/modem cannot be compressed, does not handle connection interruption well, and is used for many revenue-generating applications for customers (such as credit-card transactions) – so carriers can’t ignore these legacy applications.

Where Do Failures Happen in IP Networks?

The first step to improve VOIP reliability is to look for the source of failures, and Figure 3 gives the classic answer: these new packet networks are vulnerable to software-related failures.

This is hardly surprising, given that software has pervaded all layers of the network today, from applications downwards, so that modern packet networks contain huge amounts of software. Increasing the intelligence in the network elements by adding protocols such as OSPF-TE, RSVP-TE, Fast Reroute, etc., makes them more vulnerable to software failures. But the real point is that the applications at the top of the stack need to be more robust and more reliable than the underlying infrastructure as the underlying infrastructure goes through changes and reconfigurations.

So, a program to make VOIP more reliable has two parts:

  • Improve hardware and software reliability: This means hardware redundancy, software/protocol redundancy, link-layer resiliency, and path protection. Vitally, the links between network elements have to be hardened and protected, and this means going a step beyond basic, link-by-link protection and looking at rapid, end-to-end path protection.

  • Decouple the services so that any failure does not affect service: The softswitches and media gateways have to be able to ignore the fact that the network may be frantically reconfiguring. Any changes that happen to the hardware, software, or protocols must not affect services.

Fortunately, there is a range of technology and initiatives now at hand to help carriers reach these goals.

Improving Hardware and Software Reliability

Figure 4 roughly aligns some of the key technologies for improving reliability with the layers of the OSI protocol stack at which they operate. In terms of protection some of these mechanisms provide protection for just links and some protect both link and node, and the line is somewhat blurred today. For example, a router can be configured so that, if it fails, a link protection mechanism switches over to a backup router.

At the bottom are basic hardware reliability enhancements such as redundant power supplies and fabric control modules that are now built into a lot of carrier-class MPLS devices.

Next up the stack is Layer 1, which includes Sonet/SDH infrastructure with Automatic Protection Switching (APS) and Bidirectional Line-Switched Ring (BLSR) for sub-50ms protection reliability. Also in Layers 1/2 is Resilient Packet Ring (RPR), which is attracting renewed carrier interest for enhanced Sonet/SDH networks, as it offers more efficient use of the ring and somewhat easier configuration – and still with the sub-50ms ring-wrap capability for protection. At Layer 2 are various Ethernet technologies, such as Spanning Tree Protocol (STP), Rapid Spanning Tree Protocol (RSTP), and Link Aggregation, which are all being used for hardware-link and software reliability.

At Layer 3 the key IP/MPLS work is progressing very fast, and a lot has been done on vendor interoperability tests for fast rerouting, including sub-50ms tests of vendor bypass and detour fast reroute. The key protocol is MPLS Fast Reroute, which aims to provide fast, sub-50ms link and node protection by using, at least initially, a combination of MPLS traffic-engineering label-switched paths (LSPs) and established routing protocols such as IS-IS and OSPF. An issue with these routing protocols is that they converge more slowly as the network topology grows, and some vendors are looking at ways to speed up convergence. The current Internet Draft draft-ietf-mpls-rsvp-lsp-fastreroute-05.txt by Vasseur and others extends RSVP to establish backup LSP tunnels for the local repair of LSP tunnels, especially for real-time applications such as VOIP. The technique is fast because it computes and signals backup LSP tunnels in advance of failure, and redirects traffic as close to the failure point as possible, thus avoiding any path computation or signaling delays, including delays to propagate failure notification between label-switch routers (LSRs).
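The reason local repair is fast can be seen in a toy model: the backup is already in the forwarding entry before anything breaks, so switching over is a lookup rather than a computation. This is only a sketch of the idea, not of RSVP-TE signaling; the class, field, and router names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProtectedLsp:
    """Toy forwarding entry at one label-switch router (the point of local repair)."""
    name: str
    primary_next_hop: str
    backup_next_hop: str  # assumed to be signaled before any failure occurs

    def next_hop(self, failed_neighbors: set) -> str:
        # Local decision only: no path computation and no signaling at failure time.
        if self.primary_next_hop in failed_neighbors:
            return self.backup_next_hop
        return self.primary_next_hop


lsp = ProtectedLsp("voice-lsp", primary_next_hop="R2", backup_next_hop="R3")
print(lsp.next_hop(set()))    # R2
print(lsp.next_hop({"R2"}))   # R3
```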

The IETF’s Virtual Router Redundancy Protocol (VRRP) sounds as if it should solve a lot of reliability issues in the upper regions of the OSI stack, but in reality it is much more modest, being basically a way to give hosts a redundant default router. It provides dynamic failover in forwarding responsibility to an elected substitute router should the designated master VRRP router become unavailable. Nevertheless, this is useful for VOIP because a lot of VOIP infrastructure uses gateways, which are actually hosts. Unfortunately, VRRP is a fairly slow protection mechanism, at least for the automatic failover part, and takes about 3 seconds or more to operate. There is an Internet Draft draft-ietf-vrrp-ipv6-spec-06.txt from Hinden that extends VRRP to IPv6.
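The figure of about 3 seconds follows from the timer arithmetic in the VRRP specification: a backup router declares the master down after three missed advertisements plus a small priority-based skew. A minimal sketch, assuming the default 1-second advertisement interval and the default backup priority of 100 (the function name is illustrative):

```python
def master_down_interval(advert_interval_s: float = 1.0, priority: int = 100) -> float:
    """Seconds a VRRP backup waits before taking over from a silent master."""
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be between 1 and 254")
    skew_time = (256 - priority) / 256  # higher priority means a shorter wait
    return 3 * advert_interval_s + skew_time


print(master_down_interval())  # 3.609375
```

With the defaults, detection alone takes roughly 3.6 seconds, which is far outside the sub-50ms range of the link and path protection mechanisms discussed earlier.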

Vendors, of course, have developed protocols of their own to improve reliability. A couple of examples are:

  • HPS: This is Riverstone’s Hitless Protection System, which can be used to complement VRRP. Riverstone RS routers equipped with HPS and a redundant control module can overcome a failover or a software upgrade without a reboot, by constantly keeping the redundant control module updated with state information via spare CPU cycles. Riverstone says that HPS can cut reboot downtimes of anywhere between 5 and 40 minutes to around 8 seconds.

  • HPR: This is High-Performance Routing, an enhancement to IBM’s Advanced Peer-to-Peer Network (APPN) architecture that provides fast data routing and improved session reliability. It works by combining aspects of connection-oriented and connectionless architectures to try to preserve the good class-of-service (COS) capabilities of the former with the resilience of the latter. RFC 2238 defines managed objects for HPR using SMIv2.

The next critical step is to allow the VOIP elements to handle the possibility that the IP/MPLS network may be trying to reconfigure itself in response to link or node failures, so that any network changes do not affect service. This is an important example of the difference between availability and reliability, because if the network fault is visible to the overlying equipment when the network is recovering, the service will not be available.

The good news is that there is a certain amount of inherent service resilience in today’s VOIP gateways against minor variations in network performance. Elastic buffers enable gateways to tolerate a certain amount of jitter and delay in network paths, and they can also mask some of the noise created when voice packets are dropped or data is corrupted. As long as the network reconfigures quickly and successfully, the voice customer may not notice much more than a pop or a short blip in the conversation.

But carriers need more than just this.

“It’s the totality of services that you are offering to the customer, not just the fact their voice is high quality, but also the enhanced services – it can’t just be about what we were offering on TDM,” says Qwest’s Rambo. “When you look at the survivability and the robustness of the application layer on top of this reconfiguring base, you’ve got to look at the way the IP/information-technology world has handled this in the past. And you’ve got to look at things like N + 1 or N × M scaling algorithms, and the ability to handle the failover from one path to a discretely different path if the time between one query/response seems to be too long.”

Management is always a tricky subject in telecom when something new comes along, and VOIP is no exception. An informal poll taken during the Webinar on which this report is based showed that nearly 54 percent of respondents had concerns over VOIP management capabilities. The basic issue is that traditional IP and voice management views are poles apart.

Traditional IP is managed pretty much as simple capacity plumbing, where millions of separate IP flows are treated as a single, best-effort service. Large IP-VPN customers are given special attention, of course, because they generate extra revenues, but they are still essentially fairly static aggregate flows between the same endpoints. Even though IP VPNs are treated specially, they are not nearly as complicated as a VOIP infrastructure needs to be.

Voice is completely different. For voice, each individual microflow counts because every voice call is important to the two parties on the end of the line. Even if they are not going to get a specific bill for that call, they expect it to be reliable and offer good audio quality, so dropped calls, crackles, echo, and other problems will not be tolerated. This means that PSTN management systems can do things like view call-detail records for each call, troubleshoot each call, and monitor/tap each call. This is more similar to a VPN-level of management, but on a much larger scale with constantly changing and churning (virtual) circuits.

Says Robert Scheible, Director of Carrier VOIP Solutions Marketing at Nortel Networks Ltd. (NYSE/Toronto: NT): “Voice management is focused on the delivery of the end-to-end voice experience, so when you get up to the voice layer you really have to take into account that you need to verify and deliver the application from end to end.”

Another difference is that voice management tools have a rather different style of working from that of IP management, as phone operators in general are not very familiar with IP or command-line interfaces (CLIs). And this voice style is very embedded, given the long history of investment with PSTN management tools.

PSTN-style tools do exist to handle these requirements, but they have not yet been integrated with the management of IP/MPLS equipment, especially since the latter’s management capability is still very much a 'work in progress,' and so varies quite dramatically between vendors, and even between individual products within a single vendor portfolio.

Nevertheless, IP/MPLS management is beginning to come up to speed; it is not complete, but it is moving forward quickly. Figure 5 lists some of the key initiatives from the ITU-T and the IETF in this area. As Light Reading has recently reported (see MPLS Gets the Management Blues), the IETF and the ITU-T have not always seen eye to eye over the importance of developing carrier-grade MPLS OAM, and this has led carriers to make something of a push for early standards from the ITU-T.

One major standards group in this area is the ITU-T Study Group 13, which has produced Recommendations Y.1710 and Y.1711. Of these, Y.1710 covers the requirements for OAM functionality for MPLS networks, and Y.1711 covers the operation and maintenance mechanism for MPLS networks. Their basic concepts are fairly simple and include layering, which is very well positioned from a carrier perspective.

The IETF has responded by drafting proposals such as LSP Ping, a descendant of the Ping function in IP, and LSR Self Test, for detecting network failures. Currently, the MPLS Working Group, the Pseudo Wire Emulation WG, and the Layer-2 VPN WG are all working on different aspects of MPLS OAM.

Virtual Circuit Connection Verification (VCCV) supports connection verification for pseudowire virtual circuits, and is independent of the underlying MPLS or IP tunnel technology. It uses IP-based protocols such as Ping and MPLS LSP Ping via an IP control channel associated with each pseudowire.

Something that Riverstone has focused on is at the bottom of Figure 5 in Layer 2 in the VPN area, Virtual Private LAN Service (VPLS) or Hierarchical VPLS (HVPLS). This is a multipoint Ethernet LAN service and raises interesting OAM questions, according to Riverstone’s Leonard.

“How do you troubleshoot that? How do you verify that it is operating? How do you manage it?” he asks. “What’s being pursued there are things like VPLS Ping, and VPLS Trace Route. There is also work being put forward in the Metro Ethernet Forum and the MPLS Frame Relay Alliance for interworking between legacy technologies.”

Nevertheless, recent OAM developments do show a convergence of ideas, and the ITU-T’s draft Y.17fw OAM framework is now very much aligned with the IETF work. As many vendors are following the IETF approach, there is a growing set of common tools available to carriers. Figures 6 and 7 compare the current status of some of these IETF and ITU-T MPLS OAM tools. Basic things like Defect Indication, Connectivity Verification, Continuity Checking, and Path Trace are pretty well covered; the big gaps are in Loopback and Performance Monitoring, and it will be interesting to see how the IETF in particular deals with this in the coming months.

Voice traffic has unique requirements that affect QOS. The two main ones are very low delay and very low jitter.

Interactive voice conversations must have low delay – if the delay is too great they just stop being interactive. And that maximum delay seems to be about 150ms from ear to ear. Unfortunately, there are limits on what can be done in the network to reduce delay, and most network devices today are already capable of very low delay in forwarding. A typical delay budget might be:

  1. 5ms used in phone to CO (×2 for an end-to-end call)

  2. 30ms used at each VOIP gateway (×2 for an end-to-end call)

  3. 15ms used by distance propagation

  4. 15ms used by serialization delay

Total = 100ms, leaving only about 50ms for the whole network. Note that transcoding uses more of the budget.
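The arithmetic behind that budget can be written out as a short sketch. The figures are the ones listed above; the variable and function names are illustrative.

```python
EAR_TO_EAR_LIMIT_MS = 150

# (component, delay in ms, occurrences in an end-to-end call)
BUDGET = [
    ("phone to CO", 5, 2),
    ("VOIP gateway", 30, 2),
    ("distance propagation", 15, 1),
    ("serialization", 15, 1),
]


def fixed_delay_ms(budget=BUDGET) -> int:
    """Sum the delay that is spent before the network itself is counted."""
    return sum(delay * count for _, delay, count in budget)


def network_allowance_ms(budget=BUDGET, limit=EAR_TO_EAR_LIMIT_MS) -> int:
    """Whatever remains of the ear-to-ear limit is all the network may use."""
    return limit - fixed_delay_ms(budget)


print(fixed_delay_ms())        # 100
print(network_allowance_ms())  # 50
```

Adding a row for transcoding, for example, shrinks the network allowance by exactly that row's delay.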

Although this aspect is pretty fixed, it is possible to do things to try to ensure that the delay remains constant.

“That’s what a lot of the underlying work on the infrastructure is there for,” says Nortel’s Scheible. “Variable delay, or jitter, in phone calls is bad. That’s partly because it confuses echo-cancellation equipment. But jitter is removed from the call by a buffer whose size has to be set to the worst-case jitter, and that buffer adds to the end-to-end delay. If you increase the jitter buffer to 50ms, you have cut into the overall delay budget for the conversation. So the worse the jitter, the more delay impact it has on the overall conversation, and we start to get into impacts where echo cancellation fails, or where the conversation is just not natural anymore.”
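The relationship between jitter and added delay can be illustrated with a simple fixed playout buffer. This is a sketch under stated assumptions: the per-packet delays are invented sample values, and real gateways typically use adaptive buffers rather than the worst-case sizing modeled here.

```python
def jitter_buffer_ms(delays_ms):
    """Smallest fixed buffer that lets every packet play out without a gap."""
    return max(delays_ms) - min(delays_ms)


def playout_delay_ms(delays_ms):
    """After de-jittering, every packet is held to the slowest packet's delay."""
    return min(delays_ms) + jitter_buffer_ms(delays_ms)


steady = [20, 20, 20, 20]    # hypothetical one-way network delays, no jitter
jittery = [20, 45, 70, 30]   # same best-case delay, 50ms of jitter

print(jitter_buffer_ms(steady), playout_delay_ms(steady))    # 0 20
print(jitter_buffer_ms(jittery), playout_delay_ms(jittery))  # 50 70
```

Both paths have the same best-case delay, but the jittery one costs the conversation an extra 50ms once the buffer is sized for the worst case.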

Apart from the human ear, other things in the network can be sensitive to delay. Faxes and modems can usually handle up to a half second of delay, but some call control may start experiencing problems if the delay exceeds 100ms in the network. Note also that call control does not go through codecs.

And, if the network goes down, or if the quality drops for too long, there are legal requirements for the service provider to report these outages.

Not only are the QOS expectations of VOIP different to those of traditional IP services, but also the characteristics of VOIP packet traffic are different to those of traditional IP data services. VOIP packets are generally smaller on average than traditional IP data-service packets, averaging about 100 bytes (70 and 120 bytes are the most common). IP data traditionally averages about 250 bytes, but 40 percent of packets are 40 bytes, and 35 percent are 1500 bytes.

Also, the VOIP packet rate is very constant, much like TDM, and the traffic is not bursty in the self-similar way that data traffic is. Self-similar traffic looks equally bursty over short timespans and long ones; VOIP traffic does not, so its patterns change according to timescale. This is very different behavior to that of normal IP data traffic.

In terms of hardware design for routers this means a slightly different design for expedited forwarding for the higher-priority queues that are used for VOIP services. More generally, the implications of VOIP for networking equipment are:

  • Small, high-priority queues are best (Expedited Forwarding – EF – class).

  • Better to discard than delay packets (after a point).

  • Congestion and oversubscription must not affect EF traffic.

  • If bandwidth is reserved, adjusting it for time of day is sensible.
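The first three points in this list can be sketched as a toy two-class scheduler: a deliberately short Expedited Forwarding queue that is always served first and that drops excess packets instead of letting them wait. The class name, queue limit, and packet labels are all hypothetical, and real router queueing is considerably more involved.

```python
from collections import deque


class Scheduler:
    """Toy two-class scheduler: a short EF queue served with strict priority."""

    def __init__(self, ef_limit: int = 4):
        self.ef = deque()
        self.best_effort = deque()
        self.ef_limit = ef_limit
        self.ef_dropped = 0

    def enqueue(self, packet: str, expedited: bool = False) -> None:
        if not expedited:
            self.best_effort.append(packet)
        elif len(self.ef) < self.ef_limit:
            self.ef.append(packet)
        else:
            self.ef_dropped += 1  # discard rather than delay a voice packet

    def dequeue(self):
        # EF always goes first, so best-effort backlog never delays voice.
        if self.ef:
            return self.ef.popleft()
        if self.best_effort:
            return self.best_effort.popleft()
        return None


s = Scheduler(ef_limit=2)
s.enqueue("data1")
for name in ("voice1", "voice2", "voice3"):
    s.enqueue(name, expedited=True)
print([s.dequeue() for _ in range(3)], s.ef_dropped)
# ['voice1', 'voice2', 'data1'] 1
```

The small queue limit is the design choice that matters: it bounds the worst-case wait for any voice packet at the cost of an occasional drop.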

None of this is news to those who grew up in the bad old days of TDM. Small queues, discard over delay, and time-of-day reservation of bandwidth are all employed in TDM.

“The interesting thing about voice over packet is that we are starting to see the reimplementation of the time-honored techniques of how to manage congestion and how to manage the quality of service across a high-bandwidth network,” says Qwest’s Rambo. “What you are going to find is we are reimplementing the past, but with a modern technology base and with modern standards and approaches.”
