Facebook is developing a new traffic management technology called Autoscale to cut energy consumption at its data centers by 10-15%.
Facebook currently uses a traditional round-robin approach to load balancing but found it less than optimal: servers running at low load use power less efficiently than idle servers or servers running at moderate or greater loads, writes Qiang Wu, a Facebook infrastructure software engineer, on the Facebook Code Engineering Blog.
Autoscale is designed to distribute workloads so that servers are either idle or running at medium capacity. It tries to avoid assigning workloads in a way that leaves servers running at low capacity, Wu writes.
An idle server consumes about 60 watts. Power draw jumps to 130 watts when the server moves to low-level CPU utilization, handling a small number of requests per second (RPS). But it rises only slightly further, to 150 watts, when the server goes from low-level to medium-level CPU utilization, Wu writes.
"Therefore, from a power-efficiency perspective, we should try to avoid running a server at low RPS and instead try to run at medium RPS," Wu writes.
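To make the arithmetic concrete, here is a rough sketch in Python using the three wattage figures Wu cites. It rests on one assumption that is not in the blog post: that the traffic carried by two servers at low load adds up to roughly one medium load, so it can be moved onto a single server while the other sits idle.

```python
# Per-server power draw in watts, as reported in the article.
POWER_WATTS = {"idle": 60, "low": 130, "medium": 150}


def cluster_power(states):
    """Return the total power draw for a list of per-server load states."""
    return sum(POWER_WATTS[state] for state in states)


# Assumption (not from the source): two low loads combine into one medium load.
spread = cluster_power(["low", "low"])            # traffic spread evenly
concentrated = cluster_power(["medium", "idle"])  # traffic concentrated

saving = spread - concentrated
saving_pct = 100 * saving / spread

print(f"spread: {spread} W, concentrated: {concentrated} W, "
      f"saving: {saving} W ({saving_pct:.1f}%)")
# prints: spread: 260 W, concentrated: 210 W, saving: 50 W (19.2%)
```

Under that assumption, the same traffic costs 210 watts instead of 260. The real figure depends on how much traffic a "low" and a "medium" load actually represent, which the article does not specify.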
"To tackle this problem and utilize power more efficiently, we changed the way that load is distributed to the different web servers in a cluster. The basic idea of Autoscale is that instead of a purely round-robin approach, the load balancer will concentrate workload to a server until it has at least a medium-level workload. If the overall workload is low (like at around midnight), the load balancer will use only a subset of servers. Other servers can be left running idle or be used for batch-processing workloads."
Though the idea sounds simple, it is a challenging task to implement effectively and robustly for a large-scale system.
Autoscale dynamically adjusts the size of the server pool in use, so that each active server will get at least a medium-level CPU load. Servers not in the active pool don't receive traffic.
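The pool-sizing idea can be sketched in a few lines of Python. This is an illustration only, not Facebook's decision logic, which the article does not describe in detail. The function names, the `medium_rps` threshold of 100, and the server names are all hypothetical, and the sketch sizes the pool from request rate alone rather than from measured CPU load.

```python
def active_pool_size(total_rps, medium_rps, total_servers, min_active=1):
    """Pick the largest pool in which each active server gets >= medium_rps.

    Hypothetical helper: the result is clamped so that at least
    `min_active` and at most `total_servers` servers are active.
    """
    if medium_rps <= 0:
        raise ValueError("medium_rps must be positive")
    wanted = int(total_rps // medium_rps)
    return max(min_active, min(total_servers, wanted))


def split_load(total_rps, servers, pool_size):
    """Share traffic evenly across the active pool; other servers get none."""
    active = set(servers[:pool_size])
    per_server = total_rps / pool_size
    return {name: (per_server if name in active else 0.0) for name in servers}


servers = [f"web{i:02d}" for i in range(1, 11)]  # hypothetical 10-server cluster
MEDIUM_RPS = 100                                 # hypothetical threshold

midnight_pool = active_pool_size(350, MEDIUM_RPS, len(servers))
peak_pool = active_pool_size(2000, MEDIUM_RPS, len(servers))

print(midnight_pool, peak_pool)  # prints: 3 10
print(split_load(350, servers, midnight_pool))
```

With light traffic, only three of the ten servers stay in the active pool. At peak, every server is active and the scheme behaves like plain round-robin, which matches the reported 0% saving around peak hours. A production system would also need headroom for traffic spikes, the concern behind Wu's warning about over-concentrating traffic.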
Optimizing both performance and power consumption was key in developing decision logic for traffic management: "On one hand, we want to maximize the energy-saving opportunity. On the other, we don't want to over-concentrate the traffic in a way that could affect site performance."
Results have been promising:
Autoscale led to a 27% power savings around midnight (and, as expected, the power saving was 0% around peak hours). The average power saving over a 24-hour cycle is about 10-15% for different web clusters.
Facebook is driving open source data center hardware design with its own Open Compute Project. The project is self-serving: Facebook runs some of the most massive data centers in the world, and data center cost savings improve Facebook's bottom line. Facebook says it has saved $1.2 billion over three years using the Open Compute hardware designs it champions. (See Open Compute Project Takes on Networking.)
Earlier this week, Facebook bought PrivateCore, a security software company, to beef up its server security. (See Facebook Buys PrivateCore for Server Security.)