XenLoop: A Transparent Inter-VM Network Loopback Channel

Advances in virtualization technology have focused mainly on strengthening the isolation barrier between virtual machines (VMs) that are co-resident within a single physical machine. At the same time, there exists a large class of communication-intensive distributed applications and software components, such as web services, high performance grid applications, transaction processing, and graphics rendering, that often need to communicate across this isolation barrier with other endpoints on co-resident VMs. State-of-the-art inter-VM communication mechanisms do not adequately address the requirements of such applications. TCP/UDP based network communication tends to perform poorly when used between co-resident VMs, but has the advantage of being transparent to user applications. Other solutions exploit inter-domain shared memory mechanisms to improve communication latency and bandwidth, but require applications or user libraries to be rewritten against customized APIs -- something not practical for a large majority of distributed applications. In this project, we have developed a fully transparent and high performance inter-VM network loopback channel, called XenLoop, in the Xen virtual machine environment. XenLoop sacrifices neither user-level nor kernel-level transparency, yet achieves high communication performance between co-resident guest VMs. XenLoop intercepts outgoing network packets and shepherds those destined for co-resident VMs through a high-speed inter-VM shared memory channel that bypasses the virtualized network interface. Guest VMs using XenLoop can migrate transparently across machines without disrupting ongoing network communications, and seamlessly switch between the standard network path and the XenLoop channel. In evaluations using a number of unmodified benchmarks, we observe that XenLoop can reduce inter-VM round-trip latency by up to a factor of 5 and increase bandwidth by up to a factor of 6.
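
To give a flavor of the interception step, the sketch below shows a netfilter
POST_ROUTING hook of the kind a guest kernel module could use to divert packets
addressed to co-resident VMs. This is not the actual XenLoop source: the hook
signature and constants follow the 2.6.18-era kernels used with Xen 3.x, and
is_coresident() and xenloop_fifo_send() are hypothetical stand-ins for XenLoop's
peer lookup and shared-memory FIFO.

	#include <linux/module.h>
	#include <linux/netfilter.h>
	#include <linux/netfilter_ipv4.h>
	#include <linux/ip.h>
	#include <linux/skbuff.h>

	/* Hypothetical placeholders for the real peer table and FIFO channel. */
	static int is_coresident(__be32 daddr)
	{
		return 0;	/* real code consults a table of co-resident guests */
	}

	static int xenloop_fifo_send(struct sk_buff *skb)
	{
		return -1;	/* real code copies the packet into a shared-memory FIFO */
	}

	static unsigned int xenloop_out_hook(unsigned int hooknum,
					     struct sk_buff **pskb,
					     const struct net_device *in,
					     const struct net_device *out,
					     int (*okfn)(struct sk_buff *))
	{
		struct iphdr *iph = (*pskb)->nh.iph;

		/* Co-resident destination: bypass netfront/netback entirely. */
		if (is_coresident(iph->daddr) && xenloop_fifo_send(*pskb) == 0)
			return NF_STOLEN;	/* packet now travels via shared memory */

		return NF_ACCEPT;		/* otherwise use the normal network path */
	}

	static struct nf_hook_ops xenloop_ops = {
		.hook     = xenloop_out_hook,
		.pf       = PF_INET,
		.hooknum  = NF_IP_POST_ROUTING,
		.priority = NF_IP_PRI_FIRST,
	};

	static int __init xenloop_sketch_init(void)
	{
		return nf_register_hook(&xenloop_ops);
	}

	static void __exit xenloop_sketch_exit(void)
	{
		nf_unregister_hook(&xenloop_ops);
	}

	module_init(xenloop_sketch_init);
	module_exit(xenloop_sketch_exit);
	MODULE_LICENSE("GPL");
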

Publications

  1. Jian Wang, Kwame-Lante Wright, and Kartik Gopalan, XenLoop: A Transparent High Performance Inter-VM Network Loopback, In the Journal of Cluster Computing -- Special Issue on HPDC, Volume 12, Number 2, pages 141--152, 2009. [pdf] [bibtex]
  2. Jian Wang, Kwame-Lante Wright, and Kartik Gopalan, XenLoop: A Transparent High Performance Inter-VM Network Loopback, In the International Symposium on High Performance Distributed Computing (HPDC), Boston, MA, June 2008. [pdf] [bibtex]

Download XenLoop Code

The XenLoop code is available for download below, although it is no longer maintained.
For Xen-3.2.0 [xenloop-2.0.tgz] [README-2.0]

Frequently Asked Questions

Q: Does XenLoop work with the "routed" mode networking configuration?
Ans:
	Likely not yet. We've tested it only in "bridged" mode.
	Routed mode might need some more code changes. 
	Suggestions welcome!
Q: I get the following error on Xen 3.1:
ERROR xf_connect: line 198: HYPERVISOR_grant_table_op failed ret = -1 status = 0
ERROR: Exiting xf_connect
ERROR xf_connect: line 198: HYPERVISOR_grant_table_op failed ret = -1 status = 0
ERROR: Exiting xf_connect
Ans:
	By default, Xen 3.1 does not seem to allow two unprivileged domains
	to share memory (for reasons that are unclear to us).
	We understand this has since been fixed, but some unnecessary
	permission checks were left in xen/common/grant_table.c in the
	Xen hypervisor. Edit this file and comment out all occurrences of
	the following lines

	       if ( unlikely(!grant_operation_permitted(d)) )
		   goto out;

	(these appear mostly in the do_grant_op function).
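
	For illustration, the sketch below shows one way to comment the
	check out; the same edit applies at every occurrence:

		#if 0	/* allow grant operations between two unprivileged domains */
		if ( unlikely(!grant_operation_permitted(d)) )
		    goto out;
		#endif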

	Then recompile and reinstall the Xen hypervisor (recompiling the
	guest kernel is not necessary).

	These changes are not needed from Xen 3.2 onwards.
Q: When I try to rmmod xenloop, I see lots of
   "WARNING: g.e. still in use!" 
messages on the guest domain with the lower Xen domain ID.

Ans:
	Yes, this is an annoying, but benign, warning message. It appears
	when the listener end of XenLoop (the one with the lower domid)
	revokes its page grants before the connector end disconnects. It
	seems to be harmless as long as both endpoints eventually dissociate
	from each other. We have not yet found a perfectly clean way to
	synchronize the two endpoints so that the connector disconnects
	before the listener revokes its grants. We believe you can ignore
	these messages for now.
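
	For illustration, the sketch below shows the call pattern that
	triggers the warning. It assumes the Xen 3.x-era guest grant-table
	API (header names and function signatures vary across kernel
	versions), and listener_teardown() is a hypothetical helper, not
	XenLoop's actual teardown path.

		#include <linux/kernel.h>
		#include <xen/gnttab.h>

		/* Called on the listener (lower-domid) side during module unload. */
		static void listener_teardown(grant_ref_t ref, unsigned long page)
		{
			/* The connector may still have this page mapped, in which
			 * case the grant entry still carries GTF_reading/GTF_writing. */
			if (gnttab_query_foreign_access(ref))
				printk(KERN_INFO "xenloop: peer still maps grant %u\n", ref);

			/* Revoking the grant while the peer still maps the page is
			 * what makes the kernel print "WARNING: g.e. still in use!". */
			gnttab_end_foreign_access(ref, 0 /* read-write */, page);
		}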