White Box

AT&T Submits DDC White Box Using Broadcom Chips to Open Compute Project

DALLAS -- AT&T submitted today to the Open Compute Project (OCP) its specifications for a Distributed Disaggregated Chassis (DDC) white box architecture. The DDC design, which is built around Broadcom’s Jericho2 family of merchant silicon chips, aims to define a standard set of configurable building blocks to construct service provider-class routers, ranging from single line card systems, a.k.a. "pizza boxes," to large, disaggregated chassis clusters.

AT&T plans to apply the Jericho2 DDC design to the provider edge (PE) and core routers that comprise our global IP Common Backbone (CBB) – our core network that carries all of our IP traffic. Additionally, the Jericho2 chips have been optimized for 400 gigabits per second interfaces – a key capability as AT&T updates its network to support 400G in the 5G era.

"The release of our DDC specifications to the OCP takes our white box strategy to the next level," said Chris Rice, SVP of Network Infrastructure and Cloud at AT&T. "We’re entering an era where 100G simply can’t handle all of the new demands on our network. Designing a class of routers that can operate at 400G is critical to supporting the massive bandwidth demands that will come with 5G and fiber-based broadband services. We’re confident these specifications will set an industry standard for DDC white box architecture that other service providers will adopt and embrace."

AT&T’s DDC white box design calls for three key building blocks:

  1. A line card system that supports 40 x 100G client ports, plus 13 x 400G fabric-facing ports.
  2. A line card system that supports 10 x 400G client ports, plus 13 x 400G fabric-facing ports.
  3. A fabric system that supports 48 x 400G ports. A smaller 24 x 400G fabric system is also included. (A quick bandwidth check of these building blocks is sketched below.)
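
As a rough sanity check of these building blocks (an illustrative sketch only, not part of the submitted specification), the aggregate client and fabric-facing bandwidth of each box follows directly from the port counts:

```python
# Back-of-the-envelope bandwidth check of the three DDC building blocks.
# Port counts come from the list above; everything else is illustrative.

def capacity_tbps(ports: int, speed_gbps: int) -> float:
    """Aggregate bandwidth of a port group, in terabits per second."""
    return ports * speed_gbps / 1000

building_blocks = {
    "line card system, 40 x 100G": {"client": (40, 100), "fabric": (13, 400)},
    "line card system, 10 x 400G": {"client": (10, 400), "fabric": (13, 400)},
    "fabric system, 48 x 400G":    {"client": (0, 0),    "fabric": (48, 400)},
}

for name, groups in building_blocks.items():
    client = capacity_tbps(*groups["client"])
    fabric = capacity_tbps(*groups["fabric"])
    print(f"{name}: {client:.1f} Tbps client, {fabric:.1f} Tbps fabric-facing")

# line card system, 40 x 100G: 4.0 Tbps client, 5.2 Tbps fabric-facing
# line card system, 10 x 400G: 4.0 Tbps client, 5.2 Tbps fabric-facing
# fabric system, 48 x 400G:    0.0 Tbps client, 19.2 Tbps fabric-facing
```

Both line card variants terminate 4 Tbps of client traffic with 5.2 Tbps of fabric-facing bandwidth; the excess presumably absorbs cell overhead and preserves headroom under link failures, though the specification itself does not say so.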

Traditional high capacity routers use a modular chassis design. In that design, the service provider purchases the empty chassis itself and plugs in vendor-specific common equipment cards that include power supplies, fans, fabric cards, and controllers. In order to grow the capacity of the router, the service provider can add line cards that provide the client interfaces. Those line cards mate to the fabric cards through an electrical backplane, and the fabric provides the connectivity between the ingress and egress line cards.

The same logical components exist in the DDC design. But now, the line cards and fabric cards are implemented as stand-alone white boxes, each with their own power supplies, fans and controllers, and the backplane connectivity is replaced with external cabling. This approach enables massive horizontal scale-out as the system capacity is no longer limited by the physical dimensions of the chassis or the electrical conductance of the backplane. Cooling is significantly simplified as the components can be physically distributed if required. The strict manufacturing tolerances needed to build the modular chassis and the possibility of bent pins on the backplane are completely avoided.

Four typical DDC configurations include:

  1. A single line card system that supports 4 terabits per second (Tbps) of capacity.
  2. A small cluster that consists of 1 + 1 fabric systems (the second for added reliability) and up to 4 line card systems. This configuration would support 16 Tbps of capacity.
  3. A medium cluster that consists of 7 fabric systems and up to 24 line card systems. This configuration supports 96 Tbps of capacity.
  4. A large cluster that consists of 13 fabric systems and up to 48 line card systems. This configuration supports 192 Tbps of capacity. (The arithmetic behind these capacities, and the cabling each cluster needs, is sketched after the list.)
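
A minimal sketch of where those capacity figures come from, and of the external cabling that replaces the backplane in each cluster. It assumes each line card system runs one 400G link to every fabric system in the cluster; that assumption is consistent with the port counts above (13 fabric-facing ports, up to 13 fabric systems) but is not stated in the release, and smaller clusters may run multiple links per fabric system instead:

```python
# Illustrative arithmetic for the four DDC configurations listed above.
# Assumption (not from the release): one 400G cable from every line card
# system to every fabric system, so cables = line cards x fabric systems.

LINE_CARD_CLIENT_TBPS = 4.0   # each line card system: 40 x 100G or 10 x 400G
FABRIC_PORTS = 48             # 48 x 400G fabric system (a 24-port variant also exists)

configs = [
    # (name, fabric systems, line card systems)
    ("single line card system",  0,  1),
    ("small cluster (1 + 1)",    2,  4),
    ("medium cluster",           7, 24),
    ("large cluster",           13, 48),
]

for name, fabrics, line_cards in configs:
    capacity = line_cards * LINE_CARD_CLIENT_TBPS   # client-facing Tbps
    cables = line_cards * fabrics                   # external 400G fabric cables
    line = f"{name}: {capacity:.0f} Tbps, {cables} fabric cables"
    if fabrics:
        line += f", {line_cards}/{FABRIC_PORTS} ports used per fabric system"
    print(line)

# single line card system: 4 Tbps, 0 fabric cables
# small cluster (1 + 1): 16 Tbps, 8 fabric cables, 4/48 ports used per fabric system
# medium cluster: 96 Tbps, 168 fabric cables, 24/48 ports used per fabric system
# large cluster: 192 Tbps, 624 fabric cables, 48/48 ports used per fabric system
```

Under this assumption the large cluster exactly fills both budgets: each line card system uses all 13 of its fabric-facing ports, and each fabric system uses all 48 of its 400G ports.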

The links between the line card systems and the fabric systems operate at 400G and use a cell-based protocol that distributes packets across many links. The design inherently supports redundancy in the event fabric links fail.
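
The cell format and spraying algorithm are Broadcom's and are not part of the published specification; the sketch below only illustrates the general behavior the paragraph describes, with hypothetical function names and an arbitrary cell size: a packet is segmented into cells, the cells are spread across whichever fabric links are currently up, and a failed link simply drops out of the rotation rather than breaking connectivity.

```python
# Conceptual illustration of cell spraying across fabric links (not Broadcom's
# actual protocol). A packet is chopped into fixed-size cells and the cells are
# distributed round-robin over whichever fabric links are currently healthy.

from itertools import cycle
from typing import List, Tuple

CELL_SIZE = 256  # bytes; an arbitrary illustrative value

def spray_packet(packet: bytes, links_up: List[int]) -> List[Tuple[int, bytes]]:
    """Segment a packet into cells and assign each cell to a healthy link.

    Returns (link_id, cell) pairs. If a link fails, it is simply left out of
    links_up and traffic keeps flowing over the remaining links.
    """
    if not links_up:
        raise RuntimeError("no fabric links available")
    cells = [packet[i:i + CELL_SIZE] for i in range(0, len(packet), CELL_SIZE)]
    link_iter = cycle(links_up)
    return [(next(link_iter), cell) for cell in cells]

# A line card system with 13 fabric-facing links, one of which (link 5) has failed.
healthy_links = [link for link in range(13) if link != 5]
assignments = spray_packet(b"\x00" * 4000, healthy_links)
print(f"{len(assignments)} cells spread over {len(set(l for l, _ in assignments))} links")
# -> 16 cells spread over 12 links
```

A real cell fabric also has to re-order and reassemble cells into packets at the egress line card, which this sketch ignores.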

"We are excited to see AT&T's white box vision and leadership resulting in growing merchant silicon use across their next generation network, while influencing the entire industry," said Ram Velaga, SVP and GM of Switch Products at Broadcom. "AT&T's work toward the standardization of the Jericho2 based DDC is an important step in the creation of a thriving eco-system for cost effective and highly scalable routers."

"Our early lab testing of Jericho2 DDC white boxes has been extremely encouraging," said Michael Satterlee, vice president of Network Infrastructure and Services at AT&T. "We chose the Broadcom Jericho2 chip because it has the deep buffers, route scale, and port density service providers require. The Ramon fabric chip enables the flexible horizontal scale-out of the DDC design. We anticipate extensive applications in our network for this very modular hardware design."

AT&T

Internuthatch 9/27/2019 | 8:43:16 PM
Re: CRS? Additional stages were absolutely necessary for the scale at the time. We're talking very different generations of the chipset.

Any lock-in allows the vendor to extort the customer. This way, it'll just be through third parties. There cannot be other chipset vendors because the technology is closed and proprietary.

I agree with the disaggregation direction. I disagree with standardizing on an unnecessary and proprietary fabric.

 
skipper123 9/27/2019 | 5:29:47 PM
Re: CRS? Additional stages are extra cost, extra latency, and extra software complexity.

If you can achieve the scale you need (I think Broadcom gets to thousands of egress line cards), you don't need so much fabric overhead.

The lock-in to the chip vendor is clear. But if you don't lock in on the hardware (which is where the money is), you make a lot of savings.

Today you need to buy the chassis, line cards, fabric, and software from one vendor. Tomorrow you will buy standard hardware (yes, currently based on a single chip vendor, but in the future maybe not) and can bring another NOS.

Even choosing between two would be a huge advantage if I could run Junos and XR on the same hardware. There have been other OSs (Procket, Avici, Charlotte), and more than one is being worked on now, as I mentioned earlier; they mostly failed exactly because of the lock-in to the hardware.

Look at what happened in the data center. The giant IBM mainframes were thought to be irreplaceable, and now they are viewed as dinosaurs. Same thing here. If the industry collectively moves to disaggregation (just like in the data centers), we will have a lower-cost infrastructure that is open for innovation and more affordable than what we have now.

We need to think about how to move away from the duopoly, and we will all benefit on scale, price, and flexibility.

I believe what AT&T presented here is the future of networking. If they are willing to lead, we should all lend a hand for a better future of the Internet.
Internuthatch 9/27/2019 | 4:23:43 PM
Re: CRS? Additional stages allow for increased scalability and are a benefit, not a drawback.  You grow a Clos fabric by adding stages.

The addressing issue is quite clear: cells have a fixed-size field representing the destination. That bounds the scalability of the fabric.

The fabric architecture and implementation, and especially the control flow implementation, are proprietary to the chip manufacturer. Having a white box wrapped around it by a third party does exactly nothing for removing that lock-in.

Today, there are exactly two proven carrier-class, core-scale protocol stacks: Juniper's and Cisco's. Good luck breaking in a third or fourth one. Previous software stacks (gated, NextHop, IpInfusion, XORP, ...) have all gone to ground.
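
To put a number on the addressing bound described above (purely illustrative; the actual destination-field width of the Jericho2/Ramon cell format is not given in this thread), an n-bit destination field can address at most 2^n fabric endpoints, regardless of how many stages are added:

```python
# Illustrative only: a fixed-width destination field in the cell header caps
# the number of addressable fabric endpoints. 16 bits is a made-up example,
# not the real Jericho2/Ramon cell format.

def max_fabric_endpoints(dest_field_bits: int) -> int:
    return 2 ** dest_field_bits

print(max_fabric_endpoints(16))  # 65536 -- adding stages cannot raise this limit
```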

 
skipper123 9/27/2019 | 3:54:21 PM
Re: CRS? Not quite. The CRS/NCS multi-chassis systems use a multi-stage fabric (S1/S2/S3). In the architecture referenced here there is only S1; there is no S2, and on egress there is no fabric stage. No addressing issue is known so far (hundreds of line cards can connect). As for moving from system lock-in to chip lock-in, that is somewhat true, but it can change when there is another chipset vendor. More importantly, if you can install NOSes from different vendors on the same hardware, you have taken a giant leap toward freedom from OEM lock-in.
Internuthatch 9/27/2019 | 3:00:54 PM
Re: CRS? This is _exactly_ multi-chassis, using the exact same fabric architecture that Cisco has been using since it started using the Dune fabric. Scalability is limited by cell addressing. It moves the lock-in from the system vendor to the chip vendor. How does that help?
skipper123 9/27/2019 | 1:59:57 PM
Re: CRS? This is not multi-chassis. AT&T refers to a disaggregated chassis (a single-stage fabric, different from the multi-stage fabric of the CRS). On the contrary, AT&T's approach provides virtually unlimited capacity, since there are no chassis or rack boundaries; you scale to hundreds of Tbps. I saw a WebEx from a newcomer vendor (www.drivenets.com). These guys ride on the exact path AT&T is presenting here: disaggregation of hardware from software and scaling horizontally to 800T or so. They also did it (or will do it?) on third-party hardware, so they provide the software and an ODM provides the hardware, and you are no longer locked in to a single incumbent vendor. HTH.
Internuthatch 9/27/2019 | 12:46:20 PM
CRS? High capacity routers went multi-chassis in 2004 with the Cisco CRS.  Modular chassis were not able to satisfy the bandwidth requirements.