XCo: Explicit Coordination to Prevent Ethernet Congestion in Cloud Computing Clusters

Large cluster-based cloud computing platforms increasingly use commodity Ethernet technologies, such as Gigabit Ethernet, 10GigE, and Fibre Channel over Ethernet (FCoE), for intra-cluster communication. Traffic congestion can become a performance concern in these Ethernet networks for two reasons: data, storage, and control traffic are consolidated over a common layer-2 fabric, and multiple virtual machines (VMs) are consolidated onto fewer physical machines. Even as networking vendors race to develop switch-level hardware support for congestion management, we make the case that virtualization has opened up a complementary set of opportunities to reduce or even eliminate network congestion in cloud computing clusters. We present the design, implementation, and evaluation of XCo, a system that explicitly coordinates network transmissions over a shared Ethernet fabric to proactively prevent congestion. XCo is a software-only, distributed solution that runs entirely on the end nodes. A central controller uses explicit permissions to temporally separate (at millisecond granularity) the transmissions of competing senders through congested links. XCo is fully transparent to applications, deployable today, and independent of any switch-level hardware support. We present a detailed evaluation of our XCo prototype across a number of network congestion scenarios and show that XCo significantly improves network performance during periods of congestion.
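To make the idea of "explicit permissions that temporally separate competing senders" concrete, here is a minimal sketch of what a central controller might compute. It is not XCo's actual scheduling policy, which the abstract does not specify. Everything here is an assumption for illustration: the functions `assign_slots` and `permitted`, the 10 ms `SLOT_MS` value, the link names, and the choice of greedy graph coloring. Two senders conflict if their paths share any link. Each sender receives a slot index so that conflicting senders never hold the same slot, and slots then rotate round-robin over time.

```python
from typing import Dict, Set

# Hypothetical slot length; the abstract says only "millisecond granularity".
SLOT_MS = 10


def assign_slots(paths: Dict[str, Set[str]]) -> Dict[str, int]:
    """Greedy graph coloring: give each sender the lowest slot index not
    already held by any sender whose path shares a link with it."""
    slots: Dict[str, int] = {}
    for sender in sorted(paths):  # deterministic processing order
        # Slots held by previously assigned senders that share a link.
        taken = {slots[other] for other in slots if paths[other] & paths[sender]}
        slot = 0
        while slot in taken:
            slot += 1
        slots[sender] = slot
    return slots


def permitted(slots: Dict[str, int], now_ms: int) -> Set[str]:
    """Return the senders holding transmit permission at time now_ms.
    Slot indices rotate round-robin, one slot every SLOT_MS milliseconds."""
    if not slots:
        return set()
    num_slots = max(slots.values()) + 1
    current = (now_ms // SLOT_MS) % num_slots
    return {s for s, slot in slots.items() if slot == current}


if __name__ == "__main__":
    # Hypothetical topology: vm1 and vm2 contend for uplink "sw1-sw2",
    # while vm3 uses a disjoint link and can share a slot with either.
    paths = {
        "vm1": {"h1-sw1", "sw1-sw2"},
        "vm2": {"h2-sw1", "sw1-sw2"},
        "vm3": {"h3-sw3"},
    }
    slots = assign_slots(paths)
    for t in (0, 10, 20):
        print(t, sorted(permitted(slots, t)))
```

Run as a script, this prints `0 ['vm1', 'vm3']`, `10 ['vm2']`, and `20 ['vm1', 'vm3']`. The two senders contending for `sw1-sw2` therefore never transmit in the same 10 ms window. This simplified sketch still confines uncontended senders such as `vm3` to a single slot. A more complete scheduler would likely grant such senders permission in every slot and would also account for flow demand and path changes.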


