May 26, 2015
Not satisfied with just changing how the world socializes, Facebook is setting its sights on remaking data center networking.
During Facebook's remarkable 11 years from launch to 1.44 billion monthly active users as of the first quarter of this year, the social platform outstripped the capacity of existing networks to carry its traffic. So Facebook had to build four data centers at the heart of a global network, and design its own hardware and software, all to make sure that you don't have to wait a moment to see that photo of your nephew's trombone recital.
Facebook turned to its own designs because the service was moving too fast for networking vendors to keep up.
Figure 1: Rack 'Em Up. Inside the Luleå, Sweden, data center, one of four that Facebook maintains worldwide.
"We try to work fast in terms of rolling out new topologies, new speeds, new features and flexibility. We often [find] ourselves waiting and not moving as fast as we want," Omar Baldonado, Facebook manager of the networking team, tells Light Reading.
Facebook doesn't blame vendors. "We totally understand we have special requirements that are different from the rest of their customer base," Baldonado says. "But vendors weren't meeting Facebook needs."
In addition to speed of innovation, Facebook also wants to be able to disaggregate its network -- to mix and match vendor-built hardware products with Facebook's own networking software.
Open source hardware design is key to Facebook's networking strategy, Baldonado says. Open source allows Facebook to collaborate with the community of hardware and software engineers outside its own walls, including other large operators such as Rackspace and Microsoft Corp. (Nasdaq: MSFT).
To that end, Facebook co-founded the Open Compute Project (OCP) in 2011, along with Intel Corp. (Nasdaq: INTC); Rackspace; Goldman Sachs & Co.; and Andy Bechtolsheim, co-founder of Sun Microsystems Inc. and Arista Networks Inc. and investor in Google (Nasdaq: GOOG) and other companies. The goal: to further open source data center and networking hardware designs.
The Social Network's network
Facebook has four major data centers around the world. The locations are no secret -- they have their own Facebook pages. The first was Prineville, Ore., opened four years ago in April 2011. Others are located in Altoona, Iowa; Forest City, NC; and Luleå, Sweden.
Facebook augments its data centers with a couple of dozen local PoPs to "extend the edge of the network beyond the data centers," Baldonado says. Some 82% of Facebook users are located outside the US, but the majority of data centers are inside the US, Baldonado says.
In designing its own data centers, Facebook realized early on that networking needed to be front and center. The traditional way to design a data center is to lay out racks and cages for the servers first, and then add networking cables in the remaining space, almost as an afterthought, Baldonado says.
"We discovered that to grow efficiently and add on hundreds of apps you have to think about the cable design and the layout," Baldonado says. "We designed the data center with the networking layout in mind."
Initially, Facebook used network clusters, then upgraded to a data center fabric. The difference: The cluster involves hundreds of server cabinets with top-of-rack switches, which connect to larger cluster switches. The cluster is hierarchical.
The fabric is non-hierarchical, using smaller units, called "server pods," with high-performance connectivity between all the pods. It is being rolled out to data centers now, Baldonado says, but is only part of Facebook's push into networking hardware.
Facebook went into production testing of its open Wedge switch design in August 2014. The top-of-rack switch supports 16 or 32 ports running at 40 Gbit/s each. (See Facebook in Production Testing of Open 'Wedge' Switch.)
The social network followed up in February with "6-pack," a modular switch that uses Wedge as its basic building block, with 16x40 Gbit/s Ethernet ports in the front and back, for 640 Gbit/s in each direction. Another configuration exposes all 1.28 terabit/s to the back.
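The capacity figures above follow from simple port arithmetic. The short Python sketch below reproduces them using only the numbers quoted in this article (16 ports per side at 40 Gbit/s); the function and variable names are illustrative assumptions, not anything taken from Facebook's designs.

```python
# Figures as quoted in the article; all names here are hypothetical.
PORT_SPEED_GBITS = 40   # per-port speed, Gbit/s
PORTS_PER_SIDE = 16     # ports on the front, and again on the back


def aggregate_gbits(ports: int, speed_gbits: int = PORT_SPEED_GBITS) -> int:
    """Return the combined capacity, in Gbit/s, of a group of equal-speed ports."""
    return ports * speed_gbits


# Configuration 1: 16 ports facing front and 16 facing back.
front = aggregate_gbits(PORTS_PER_SIDE)
back = aggregate_gbits(PORTS_PER_SIDE)

# Configuration 2: all 32 ports exposed to the back.
all_back = aggregate_gbits(2 * PORTS_PER_SIDE)

print(f"front: {front} Gbit/s, back: {back} Gbit/s")
print(f"all to the back: {all_back / 1000} Tbit/s")
```

Both configurations use the same 32 ports; only the direction in which the capacity is exposed changes.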
At the OCP US Summit in March, Facebook opened up the design of its Wedge switch, OpenBMC board management software, and system-on-a-chip server hardware it calls Yosemite, developed in conjunction with Intel Corp. (Nasdaq: INTC) and Mellanox Technologies Ltd. (Nasdaq: MLNX). (See Facebook Releases Data Center Tech, OCP Summit 2015 in Pics: 'His Beard Is So Majestic'.)
At that time, it also introduced the FBOSS Agent, opening the central library of its FBOSS Wedge software. The agent is built on the Broadcom Corp. (Nasdaq: BRCM) OpenNSL Library to program the Broadcom ASIC inside Wedge.
Other vendors are also getting in on the act. For example, HP Inc. (NYSE: HPQ) plans to OEM Accton Technology Corp. switches based on OCP designs, using Cumulus Networks software. And Dell Technologies (Nasdaq: DELL) will also sell OCP-based switches running Cumulus, Big Switch Networks or Dell software. Likewise, Juniper introduced an OCP switch in December. (See Open Compute Project Hits Critical Mass, Juniper Introduces Open Compute Switch.)
Who's the FBOSS?
As part of its network strategy, Facebook developed its own Linux-based switch operating system, which it calls FBOSS. The software lets Facebook manage switches using server management tools. Previously, Facebook found its server management tools much more advanced than its tools for managing networking devices, so the network devices were much harder to manage than servers, even though there were and are far fewer of them.
"The server folks sometimes gave us a hard time. 'We can manage hundreds of thousands of servers. Why is it hard to manage all the networking devices?' " Baldonado says. "It was because we didn't have our own software on them. But we do now with FBOSS."
With FBOSS, network operators no longer need to parse the command line interface, or navigate network management interfaces such as SNMP and MIBs. "My management problem for our switches is solved," Baldonado says. "I can manage them like I manage servers." Facebook uses the same Linux images on its switches as it does on its servers. "Server management is a much more familiar problem, at least here at Facebook."
Facebook has saved $2 billion in capex and opex on data center costs over the past three years using a combination of designing data centers for power efficiency, efficient layout of cables and custom networking hardware and software, Baldonado says.
But embracing open source requires cultural changes for Facebook, and for carriers and cloud providers that might choose to follow that path, Baldonado says.
"It's a new technology cycle," he says. "Typically you buy networking equipment and build out the network, and you expect it to stay there, the same way, for years. But now service providers are looking to deploy new services on top of the network more quickly. They need a more flexible, agile network. We think disaggregation of the network can do that."
Disaggregation that comes from open source lets network operators upgrade software independently from hardware. You have the freedom to change hardware while keeping the same software, and vice versa.
Operators need to run their networks the way they run server applications and make changes at the same speed. Staff needs to be familiar with Linux, in addition to Cisco, Juniper and whatever other networking equipment they're trained on.
Organizationally, network operators need to plan for dealing with multiple vendors, not a one-stop-shop for everything.
Cisco, Juniper and other incumbents can work with the new system, but they need to adapt to a disaggregated, multivendor world, Baldonado says. His message to vendors is: "This is the flexibility we need. If you can provide it to us, we're great. We build because we have to but we're totally happy to go back to buying." Vendors looking to work with Facebook can best do so through the Open Compute Project. The Facebook Altoona data center, which went online in November, is 100% OCP gear.
Will carriers join the Social Network?
Will carriers rush to follow Facebook's lead? Heavy Reading analyst Roz Roseboro says no. Carriers will be cautious, which will keep Facebook's open source hardware from being a big threat to conventional networking vendors, she says.
"I don't think they will be hugely disruptive in the near term as it's going to be a while before telcos are convinced that these switches are robust enough for them (if ever)," she says.
Carriers will have to decide how much they "want to futz around" with networking hardware and software. Few, if any, have the skills that Facebook, Google and Amazon have in these disciplines, Roseboro says.
Operators that want to use open source "still rely heavily on their suppliers. They like having things delivered with a nice neat bow, with the associated documentation and support," Roseboro says.
That said, disaggregation appeals to telcos, and Roseboro expects them to "try new approaches in pockets," but not on a large scale anytime soon.
For the future, Facebook is working with OCP on a fabric switch and plans to contribute 6-pack to OCP, along with the Wedge switch, which it already contributed. Facebook and OCP plan to continue to develop open source networking software, and are looking into pushing optics to 100 Gbit/s and beyond.
"Two years ago, the networking project in OCP didn't exist," Baldonado says. "There were no open hardware designs for network switches. There was no open source software, open source hardware, no disaggregation. Now, two years later, you take a look and you have half a dozen switches whose full designs have been contributed, some accepted, others close."