Duh!
1/2/2019 | 4:16:16 PM
So many questions
Well, that was clear as mud.

What exactly is a "third-party network management card" and what is an "invalid frame packet"? [I think I can guess] And how did a broadcast storm in a "secondary channel" (OTN General Communications Channel?) wipe out working paths?

Those are the first few that came to mind. Lots more where they came from.
Phil Harvey
1/2/2019 | 4:58:04 PM
Re: So many questions
Hi, Duh!

Yes, CTL isn't going to directly call out the specific vendor product, SKU or software provider. We'll find out at some point and report it.

I think the takeaway is that, as my colleague Ray put it, a company that manages networks for a living just had a massive network management problem that it couldn't find, didn't know how to fix, etc. 

That said, please keep the questions coming. We're hoping to have more to report in the next few days.

-ph
Keebler
1/2/2019 | 5:07:07 PM
Reminiscent of TARP storms of old
The event reminds me of the TARP storms that plagued SONET networks when TARP was first introduced. TARP messages would replicate at the gateways to rings in both directions, circulate the ring, and get replicated again. Even with time-to-live settings, the amount of traffic quickly overwhelmed the systems and resulted in outages. It was hard to find and nontrivial to fix.
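
For anyone who wants to see why a time-to-live alone doesn't save you, here is a rough toy model, not real TARP behavior. It assumes two three-node rings sharing one gateway, and that every node re-sends each copy it receives to all neighbors except the sender. The node names, the topology and the `count_transmissions` helper are all made up for illustration.

```python
from collections import deque

# Hypothetical topology: ring A (G, a1, a2) and ring B (G, b1, b2) share gateway G.
RINGS = {
    "G": ["a1", "a2", "b1", "b2"],
    "a1": ["G", "a2"],
    "a2": ["a1", "G"],
    "b1": ["G", "b2"],
    "b2": ["b1", "G"],
}


def count_transmissions(adjacency, origin, ttl, suppress_duplicates=False):
    """Count the message copies sent when `origin` floods a single message."""
    if ttl <= 0:
        return 0
    seen = {origin}
    sent = 0
    # Each queue entry is one copy in flight: (receiver, sender, hops_left).
    in_flight = deque()
    for neighbor in adjacency[origin]:
        in_flight.append((neighbor, origin, ttl - 1))
        sent += 1
    while in_flight:
        node, sender, hops_left = in_flight.popleft()
        if hops_left <= 0:
            continue  # TTL expired, this copy dies here
        if suppress_duplicates:
            if node in seen:
                continue  # already relayed this message once
            seen.add(node)
        # Relay to every neighbor except the one the copy came from.
        for neighbor in adjacency[node]:
            if neighbor != sender:
                in_flight.append((neighbor, node, hops_left - 1))
                sent += 1
    return sent


if __name__ == "__main__":
    for ttl in (3, 8, 16):
        plain = count_transmissions(RINGS, "a1", ttl)
        deduped = count_transmissions(RINGS, "a1", ttl, suppress_duplicates=True)
        print(f"ttl={ttl}: {plain} copies without duplicate suppression, {deduped} with")
```

In this toy, the TTL only bounds how long the copies live. The gateway multiplies them on every pass, so the copy count still climbs steeply as the TTL grows, while remembering which messages a node has already relayed keeps it flat.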

Sounds like those who forget history are doomed to repeat it. Or something along those lines.

Anyone taking bets yet on who the third party equipment vendor was this time?
Phil Harvey
1/2/2019 | 5:08:42 PM
Re: Reminiscent of TARP storms of old
Good call back. Was that something on one of the old RBOC networks -- US West or SBC? 
Keebler
1/2/2019 | 5:12:46 PM
Re: Reminiscent of TARP storms of old
It was definitely on an RBOC network. Around 1997 I believe. My memory isn't quite good enough to recall exactly which one, but maybe Ameritech? That could be completely off. I usually throw out the Ameritech name just to confuse the youngsters.
Phil Harvey
1/2/2019 | 5:17:01 PM
Re: Reminiscent of TARP storms of old
That was even years before Verizon started hiring extra creepy white guys as their star pitchmen. 
brooks7
1/2/2019 | 5:37:45 PM
Re: Reminiscent of TARP storms of old
The first thing I thought was... somebody still has X.25 in their OSS network.

seven
Phil Harvey
1/2/2019 | 5:56:05 PM
Re: Reminiscent of TARP storms of old
And that (X.25) is a pre-IP networking way of getting switches to connect to OSS systems/carrier back offices?

If so, then there would be some kind of gateway sitting between the (presumably really old) switch and the IP network?
brooks7
1/2/2019 | 7:14:36 PM
Re: Reminiscent of TARP storms of old
That is correct, Phil. I recently looked at a network that had such gear still in place, with an X.25 switch maker that went out of business 20 years ago or so.

seven

Edit:  And yes it was used to connect to the systems for OSMINE.
Duh!
1/2/2019 | 10:49:57 PM
Re: Reminiscent of TARP storms of old
X.25 is the absolute last protocol suite I would ever associate with packet storms. Heavyweight flow control was one of its main architectural principles.
brooks7
1/3/2019 | 10:27:04 AM
Re: Reminiscent of TARP storms of old
@Duh!,

And that is why a failure would go so badly...

seven
f_goldstein
1/3/2019 | 2:54:04 PM
Re: Reminiscent of TARP storms of old
The outage impacted optical services, not just packet service, but was caused by packets. That points to a control plane on an optical network that has something more elaborate than simple management commands. The packet of death didn't just knock out change-control, it knocked out circuits. What could that be?

I'm guessing that this was a GMPLS failure. That takes all of the brokenness of the IP protocol suite and puts it in charge of the underlying optical layer. It's what comes from folks who don't come from telecom, and who think the whole world will forever be IP and that IP is somehow infallible heavenly writ. The higher layer routing protocol's job is to route around physical failures. Put the physical layer under its control and of course hilarity ensues.

Surely somebody here is willing to name CTL's major optical vendor.
Phil Harvey
1/3/2019 | 5:38:52 PM
Re: Reminiscent of TARP storms of old
So you think it was GMPLS signaling between optical transport devices?
Phil Harvey
1/3/2019 | 5:40:04 PM
Re: Reminiscent of TARP storms of old
And, if so, do you interpret CTL's statement of "third-party network management card" as someone who built a protocol stack for the systems vendor? 
f_goldstein
1/3/2019 | 5:59:09 PM
Re: Reminiscent of TARP storms of old
GMPLS comes to mind, because they were losing optical streams, and that's the usual approach to treating optical streams like BESQR cat videos. I am guessing that "third party" simply means the hardware vendor, not a real third party. There's an interesting alleged "outage report" on comp.dcom.telecom.
Phil Harvey
1/3/2019 | 6:02:55 PM
Re: Reminiscent of TARP storms of old
I'll have to ask CTL but it has some of the exact phrases that they sent me in their email replies to my questions. 
Phil Harvey
1/3/2019 | 6:48:34 PM
Re: Reminiscent of TARP storms of old
CTL said it "can't confirm its authenticity and don't have any additional information to share."

Which is not the same as saying "that's a fake, we didn't write it."
Sterling Perrin
1/4/2019 | 8:49:30 AM
Re: Reminiscent of TARP storms of old
f_goldstein,

Your explanation seems the most reasonable of the ones I've seen so far - i.e., the explanation does fit the cryptic CTL commentary. IF this is the explanation, one thing that puzzles me is that GMPLS is well-entrenched in networks and has been for more than a decade. So why has this never happened before? (I'm not aware of a GMPLS breakdown like this at any time in the past.)

Sterling
f_goldstein
1/4/2019 | 9:28:43 AM
Re: Reminiscent of TARP storms of old
Like many bugs in complex software, it is hard to test for and you don't know it's there until it's too late. Especially when it's triggered by a hardware failure. There's a hardware failure mode in some routers wherein the card stops relaying MPLS and other traffic (its job) but does maintain physical connectivity and its IS-IS daemon dutifully reports that the link is up. In that case the black hole it creates is relatively easy to locate. GMPLS, like MPLS and IP, depends on some underlying route determination code. And apparently the bad packets knocked out circuits previously set up. Here, it seems as if a packet of death, with no address, was allowed to propagate. The vendor probably didn't test for that since it's not supposed to happen. I'm guessing that a misdesign in the vendor code somewhere relayed, rather than forwarded, that packet, perhaps treating its non-address as a broadcast address. That would really be, uh, hilarious.
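
To make that guess concrete, here is a minimal sketch of the kind of forwarding decision I mean. It is purely hypothetical: the `Frame` type, the `BROADCAST` value, the port numbers and both functions are invented for illustration and say nothing about what any vendor's code actually does.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

BROADCAST = "ff:ff"  # hypothetical broadcast address for this sketch


@dataclass
class Frame:
    source: Optional[str]
    destination: Optional[str]
    payload: bytes = b""


def egress_ports_naive(frame: Frame, table: Dict[str, int],
                       ports: List[int], ingress: int) -> List[int]:
    """Anything that misses the lookup, including a missing address, gets flooded."""
    port = table.get(frame.destination)
    if port is not None:
        return [port]
    return [p for p in ports if p != ingress]


def egress_ports_strict(frame: Frame, table: Dict[str, int],
                        ports: List[int], ingress: int) -> List[int]:
    """Validate the addresses first and drop malformed frames instead of relaying them."""
    if not frame.source or not frame.destination:
        return []  # malformed: no address, so nowhere legitimate to send it
    if frame.destination == BROADCAST:
        return [p for p in ports if p != ingress]
    port = table.get(frame.destination)
    return [port] if port is not None else []


if __name__ == "__main__":
    table = {"node-b": 2}
    ports = [1, 2, 3]
    bad = Frame(source=None, destination=None)
    print("naive :", egress_ports_naive(bad, table, ports, ingress=1))
    print("strict:", egress_ports_strict(bad, table, ports, ingress=1))
```

In the naive version a frame with no address falls through to the flood path and leaves on every other port, which is how one bad frame could keep circulating. The strict version drops it at the first hop.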

But at this point we're just guessing based on limited information.

