And Furthermore, Bill – Geoff Bennett

August 6, 2003


Bill, following on from my original open letter to you [now on page 2 of this column], I’d like to raise another issue concerning packet fragmentation, one that isn’t specific to Microsoft.

In fact, it’s something that the whole telecom/Internet industry needs to get its act together on, and it relates to a topic that’s high on the agenda of a lot of service providers: Virtual Private Networks.

In a nutshell, vendors need to come up with some standard ways of configuring their equipment so that packets carrying streaming media won’t be dumped while they traverse a VPN tunnel. And in order for this to happen, service providers also need some rules on how to configure the hosts, switches, and routers so they all work in harmony when handling big packets.

I guess you may be wondering why I’m writing to you about this issue, if it concerns a whole host of other folk as well as Microsoft.

There are two reasons. First, it's folk using host software that experience the problems this creates, so you’re likely to get the blame even if it isn’t your fault. Second, someone needs to encourage the networking community to get its act together on streaming media, and I think you're just the hombre with the cojones to do this.

In the study that I cited in my original letter, the Cooperative Association for Internet Data Analysis (CAIDA) found that 16 percent of traffic flows were fragmented by tunneling technologies.

The basic problem here is that edge equipment sets maximum transmission units (MTUs) that are too big for the equipment at intermediate points in the tunnel. As the equipment typically can’t fragment packets in the middle of a tunnel, it drops the packets altogether.



One reason for this is that Path MTU Discovery – the protocol used to establish the MTU of a tunnel – sometimes doesn’t work. This may be because the host software or routers implement Path MTU Discovery in different ways.

The requirements for host software in this respect are laid down in a couple of ancient Internet Engineering Task Force (IETF) Requests for Comments – RFCs 1122 and 1123 – which date back to 1989! (I know there’ve been partial updates to these RFCs, but they don't address Path MTU Discovery.) The requirements for IP version 4 Routers (RFC 1812) are eight years old, and there’s no RFC at all covering IPv6 routers.

With such out-of-date or absent requirement RFCs, it’s hardly surprising that vendors end up implementing inconsistent approaches to Path MTU Discovery, which cause problems in practice. They’re often problems that drive service provider support staff wild, because “pings” in small packets can get through, while streaming media, contained in larger packets, can’t.



Another reason for Path MTU Discovery not working is that it’s often blocked by service providers anyhow. It depends on ICMP (Internet Control Message Protocol) messages, and some service providers configure their equipment to block all ICMP packets because of security threats sometimes associated with the protocol family (for example, ping of death and smurf).

These Path MTU Discovery problems can happen with any VPN protocol, but there’s an added complication with Multiprotocol Label Switching (MPLS) VPNs. In some circumstances, intermediate routers can add an extra label to a packet, so that it becomes too big for the tunnel. (For an explanation of why this happens, see Note 2.)

The MPLS VPN problem is being addressed in an IETF Draft dealing with consistent MTU reporting for the Label Distribution Protocol (LDP). Work is nearly complete on this draft, which is being written by Ben Black of Layer 8 Networks and Kireeti Kompella of Juniper Networks Inc. (Nasdaq: JNPR). Work is already complete on the same feature for RSVP-TE (Resource Reservation Protocol – traffic engineering).

Many of the readers on the message board attached to this letter have urged increasing the Internet MTU to solve these issues. See, for example, this message from celebrity poster Tony Li, Procket Networks Inc.’s chief scientist.

You can’t simply wave a magic wand and convert all Internet devices to larger MTUs. But by correct operation of the host software, routers, firewalls, load balancers, and all the other IP devices in the packet path, at least we can make the best of the MTUs that are supported today.

By the way, Bill, I really appreciate Microsoft’s David Caulton responding to the points made in my original open letter. In his post David raised a number of valid questions that I’ll respond to in due course, when I’ve got some feedback from various people. In other words, I haven't finished yet! (That's a promise, Bill, not a threat...)

Again, Microsoft (and, of course, any interested readers) may respond to these suggestions by either emailing me at [email protected] or contributing to the message board linked to this article.

And Bill? Bummer about that fine. Those Old Europeans just don't understand American Free Enterprise.

— Geoff Bennett, Director, Light Reading University

Let me start by congratulating you on Windows XP, which I think is the operating system that Windows always should have been. It’s stable; you can make it look like Windows 95 if you want; and Intel has managed to drag enough MIPS out of a Pentium 4 to make it possible to run Word and Excel at a decent speed.

I see you’re now preparing something even better – the next release of Windows, codenamed Longhorn – so I wanted to bend your ear on some improvements that you could incorporate in it that would have a big impact on networking performance.

Sooner or later, I’ll submit a few ideas through the official channel – by filling in a Microsoft Feature Request – but I suspect Microsoft receives so many of these that my voice wouldn’t be heard. After all, I’m just one of millions of other Microsoft Corp. customers.

That’s why I’m writing this open letter to you. I’m hoping to encourage the whole Light Reading community (which comprises just about everybody that’s anything to do with Internet, telecom, and cable infrastructure) to propose additional improvements to Windows that would result in improvements in network performance.

Even better, Microsoft could respond to our suggestions on Light Reading, by either emailing me at [email protected] or contributing to the message board linked to this article.

Let me start the ball rolling with one specific example based on the issue of packet fragmentation. You surely know, Bill (I may call you Bill, mayn't I?), what IP packet fragmentation is. (If you're a bit fuzzy on this, see Note 1 for a brief explanation.)

Fragmentation is a Bad Thing in the Internet, because everything (hosts and routers) has to work so much harder to break up IP packets and put them back together again. And fragments all have their own IP headers, which means there’s a lot more overhead.

In fact, IPv4 packet fragmentation is such a bad thing that when they came to design IPv6, fragmentation by routers was dropped altogether, and all IPv6 hosts and routers must support correct Path MTU Discovery.

But what can we do about it? How about we start by considering which packets get fragmented on the Internet today? Let’s say we measure real Internet traffic to see which applications actually cause fragments. Now, if we were to find that 50 different applications were responsible for creating fragments, we’d have to conclude that doing something to prevent fragmentation would be really tough.

But a team from the Cooperative Association for Internet Data Analysis (CAIDA) has actually made these measurements, and the results are summarized in their paper: Beyond Folklore: Observations on Fragmented Traffic.

It turns out that a staggering 52 percent of IPv4 fragment series come from one application – can you guess what it is? Microsoft flipping Media Player! In case you didn't realize, that's why Media Player clips are always jerky when you stream them over the Net!

So, Suggestion #1 is that Windows Media Player must implement RFC 1191 Path MTU Discovery. While you’re at it, Bill, why not make sure your entire .NET suite and all the stuff you’re writing for the X-Box for online gaming does the same?

So that's my two cents... and I suspect I'm just scratching the surface. We'll see what other readers of Light Reading come up with.

— Geoff Bennett, Director, Light Reading University

Note 1: IPv4 packet fragmentation

IP packet fragmentation is as old as IP itself. Basically, different network technologies that carry IP datagrams have different sizes of frames they can support. This size is called the Maximum Transmission Unit (MTU). If an IP datagram has to be routed over a network with a smaller MTU than the current size of the packet, then the router that sits between the two networks will fragment the packet to the correct size, and slap a new IP header on each fragment. Remember that in connectionless IP we can’t know what path the datagram will take until it takes it! When the fragments reach the final destination host, they’re reassembled into the original datagram. Fragmentation is very bad for at least four reasons...

  1. The fragmentation process is a drain on router resources, and is often performed in the router “slow path.” In other words, it could be one or two orders of magnitude slower to fragment packets than to simply forward them.
  2. When the fragments arrive at the final host, it has to waste CPU resources on reassembling the packets.
  3. If fragments are delayed or misordered, this leads to high latencies in the application while transmission timeouts are encountered.
  4. Fragmentation leads to additional overhead. If a packet is “just a little too long” to fit a given network technology, then the second IP fragment will still have to be a minimum legal size (64 bytes for Ethernet).
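To put a number on the overhead point, here is a minimal Python sketch of how an IPv4 datagram splits when it meets a smaller MTU. It is an illustration, not anyone's router code: the function name is hypothetical, the header is assumed to be the plain 20 bytes with no options, and the 1,492-byte link is a made-up example of a packet that is "just a little too long."

```python
IPV4_HEADER = 20  # bytes; assumes no IP options


def fragment_sizes(datagram_len, mtu, header=IPV4_HEADER):
    """Return the total length (header + data) of each fragment."""
    if datagram_len <= mtu:
        return [datagram_len]  # fits as-is, no fragmentation needed
    payload = datagram_len - header
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the IPv4 fragment offset counts in 8-byte units.
    max_data = (mtu - header) // 8 * 8
    sizes = []
    while payload > 0:
        data = min(max_data, payload)
        sizes.append(header + data)  # each fragment gets its own header
        payload -= data
    return sizes


sizes = fragment_sizes(1500, 1492)
print(sizes)              # [1492, 28]
print(sum(sizes) - 1500)  # 20 extra bytes: one more IP header
```

The 28-byte second fragment is the wasteful part: it carries only 8 bytes of data, and on Ethernet it would still be padded up to the minimum frame size.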

In theory, in a stream of packets with the same source/destination addresses, packets could be diversely routed. But in practice, such packets will follow the same path unless a link changes state (or if load balancing is used). So there’s a simple technique called Path MTU Discovery (defined in RFC 1191) in which the sending host sets the Don’t Fragment (DF) bit on its packets. Any router in the path that can’t forward such a packet without fragmenting it drops the packet and returns an ICMP “fragmentation needed” message carrying the MTU of the next hop, so the host can “negotiate down” its packet size until it fits the minimum network MTU in the path. RFC 1191 was published in 1990, so there’s absolutely no excuse (other than ignorance of networking) for not implementing it. But for Path MTU Discovery to work correctly, all the hosts and all of the routers have to take part.
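For readers who prefer code to prose, here is a toy Python simulation of Path MTU Discovery along the lines of RFC 1191: the host sends packets marked Don't Fragment, and a router that can't forward one reports the MTU of the offending link in an ICMP message. Everything here is an assumption made for illustration, including the function names, the list of link MTUs, and the `icmp_blocked` flag that stands in for a provider filtering all ICMP.

```python
def send_with_df(size, link_mtus):
    """Send one DF-marked packet along the path.

    Returns None if it is delivered, otherwise the MTU of the first link
    that was too small (what the router's ICMP message would report).
    """
    for mtu in link_mtus:
        if size > mtu:
            return mtu
    return None


def discover_path_mtu(start_size, link_mtus, icmp_blocked=False):
    """Shrink the packet size until it fits every link in the path.

    Returns the discovered path MTU, or None if ICMP replies are filtered
    and the host never learns why its packets vanish.
    """
    size = start_size
    while True:
        reported = send_with_df(size, link_mtus)
        if reported is None:
            return size
        if icmp_blocked:
            return None
        size = reported


path = [1500, 1492, 1400, 1500]  # made-up link MTUs, in bytes
print(discover_path_mtu(1500, path))                     # 1400
print(discover_path_mtu(1500, path, icmp_blocked=True))  # None
print(send_with_df(60, path))                            # None: a small ping is delivered
```

The last two lines show the support nightmare in miniature: with ICMP filtered, the small ping sails through while the full-size packet disappears without a trace.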

If you’d like to find out more about fragmentation, there’s a useful FAQ here.

Note 2: Why tunnel routers might report the wrong MTU

In a provider-provisioned VPN architecture, the provider edge (PE) router performs the task of adding labels to the front of packets that have to enter a VPN tunnel. In an RFC 2547bis VPN, for example, two MPLS labels are added for a total of 8 bytes. If the packet is already at the maximum MTU for the underlying technology (e.g., 1,522 bytes for Ethernet and Fast Ethernet) then the PE router will have to fragment the packet. If the router is configured not to fragment (as many are), then it will simply drop any packets that are longer than 1,514 bytes prior to the addition of the tunnel labels. This is why a short, diagnostic ping will get through (a ping will send 32 bytes of data, plus overhead, unless it’s instructed otherwise), but a full packet will not.
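The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 4-byte label size follows from the figure of 8 bytes for two labels, and the 74-byte ping assumes 32 bytes of data plus ICMP, IP, and Ethernet headers.

```python
MPLS_LABEL_BYTES = 4  # one label; two labels add 8 bytes, as in the note


def max_unlabeled_size(link_max, labels):
    """Largest packet the PE router can accept before adding its labels."""
    return link_max - labels * MPLS_LABEL_BYTES


def pe_forwards(packet_size, link_max, labels, may_fragment=False):
    """True if the PE router forwards the packet rather than dropping it."""
    return may_fragment or packet_size <= max_unlabeled_size(link_max, labels)


print(max_unlabeled_size(1522, 2))  # 1514
print(pe_forwards(74, 1522, 2))     # True: the diagnostic ping gets through
print(pe_forwards(1522, 1522, 2))   # False: the full-size packet is dropped
```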

But if it were that simple there’d be no problem. The PE router would simply report an MTU of 1,514 bytes when Path Discovery messages pass through. The issue is that the PE router may not know if other labels will be added by provider routers in the core. This could happen if, for example, MPLS Fast Reroute had to be invoked. It could also happen if the VPN tunnel were itself tunneled through another MPLS network. The only surefire way to find the correct MTU for the PE device is to enable the signaling protocols to collect information from LSRs (label-switched routers) in the path during the LSP (label-switched path) setup, and this is exactly what RSVP-TE (Resource Reservation Protocol – traffic engineering) and LDP (Label Distribution Protocol) extensions are designed to accomplish.
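The idea of collecting information from every LSR can be sketched as taking the minimum over the whole path. This Python fragment is not how the RSVP-TE or LDP extensions encode anything; it simply assumes, for illustration, that each hop contributes its link MTU and the depth of the label stack it will impose, with 4 bytes per label. The function name and the hop values are hypothetical.

```python
MPLS_LABEL_BYTES = 4  # assumed size of one label


def usable_mtu(hops):
    """hops: list of (link_mtu, label_depth) pairs gathered along the LSP.

    Returns the largest unlabeled packet that fits at every hop.
    """
    return min(link_mtu - depth * MPLS_LABEL_BYTES for link_mtu, depth in hops)


# The PE adds two labels; one core router adds a third (say, for Fast Reroute).
hops = [(1522, 2), (1522, 3), (1522, 2)]
print(usable_mtu(hops))      # 1510
print(usable_mtu(hops[:1]))  # 1514: what the PE would report on its own
```

The 4-byte gap between the two answers is exactly the packet-eating discrepancy the note describes.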
