Friday, March 13, 2009

Difference Engine: Harnessing Memory Redundancy in Virtual Machines

Here is a link to the paper (pdf) (MP3)

Recently I came across this paper, published in OSDI '08. It's an extension of VMware's page sharing and shows some amazing, hard-to-believe results. VMware's page-sharing mechanism scans the memory of all VMs and maps pages with the same contents to a single page. This saves memory when multiple VMs running the same OS are hosted together. The technique discussed in this paper additionally finds pages that are nearly the same. For such pages, it saves a base page and stores each similar page as a delta against that base page. Pages that are not similar to any other page are simply compressed. Their benchmarks show up to 45% more memory savings than ESX page sharing under some (specially crafted) workloads.
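To make the three mechanisms concrete, here is a minimal sketch of my own (not code from the paper) that sorts pages into "shared", "delta" and "unique" buckets. The fingerprint block size, the two fixed offsets and the function names are assumptions chosen for illustration; the real DE picks its own parameters and only compresses unique pages under additional conditions.

```python
import hashlib

PAGE_SIZE = 4096
BLOCK = 64              # assumed fingerprint block size
OFFSETS = (0, 2048)     # assumed fixed offsets, giving two keys per page


def fingerprints(page):
    """Hash small blocks at fixed offsets; pages sharing a key are candidates for delta encoding."""
    return [hashlib.sha1(page[o:o + BLOCK]).digest() for o in OFFSETS]


def classify(pages):
    """Label each page as ('shared', ref), ('delta', ref) or ('unique', None)."""
    by_content = {}  # full-page hash -> index of the first page with that content
    by_key = {}      # fingerprint key -> index of the first page that produced it
    labels = []
    for i, page in enumerate(pages):
        digest = hashlib.sha1(page).digest()
        if digest in by_content:
            # Identical contents: map onto the existing copy.
            labels.append(("shared", by_content[digest]))
            continue
        by_content[digest] = i
        keys = fingerprints(page)
        ref = next((by_key[k] for k in keys if k in by_key), None)
        for k in keys:
            by_key.setdefault(k, i)
        if ref is not None:
            # Nearly the same as page `ref`: store as a delta against it.
            labels.append(("delta", ref))
        else:
            # Not similar to anything seen so far: a candidate for compression.
            labels.append(("unique", None))
    return labels
```

Calling `classify` on a list of 4096-byte `bytes` objects returns one label per page, which is enough to count how many pages each mechanism would handle.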

The idea is surely very interesting, but I have serious doubts about whether it will really be that effective for real workloads. Below I give my notes on this paper and some possible extensions to this work.

The following notes refer to the paper extensively, so keep it handy while reading them (the link to the paper is at the beginning of this post).


  • Page sharing (for identical pages) in DE itself is much more efficient than in ESX (with a scan rate of 10,000 pages/sec). See Fig 9 and 11 in particular. Over a long run ESX eventually catches up, but its savings are much inferior *during* the benchmark.
  • For all 3 "real world" workloads (ident, MIXED-1, MIXED-2), the memory savings contributed by page sharing in DE within just the first few seconds are nearly 30-50%. This looks too good to be real.
  • They don't show data for any of the real workloads with only *one* of the memory-saving mechanisms active. DE compresses only the pages that are not "similar" to any other page; what are the performance numbers if all pages were compressed? They show only the contribution of each method (page sharing, delta for similar pages, compression for unique pages) within the combined system, not the effect of each method working alone.
  • They show the effect of individual methods for artificial workloads (Fig 5, 6, 7). Fig 7(b) shows big savings with patching. However, this is the case where pages are 95% "similar". The authors do not say in what sense the pages are similar. Are the changes confined to the end of the page, the beginning, or random offsets? For each of these cases a patching algorithm can give significantly different patch sizes.


  • Nothing can guarantee good savings for an arbitrary workload. The authors' choice of "real world" mixed workloads, however, looks good. A proper performance evaluation with these mixed workloads should be enough to show the strength of DE, but this is not done properly in the paper, as noted above.
  • DE uses xdelta for patching. A far superior algorithm for this is bsdiff, but it has much higher memory/CPU requirements. Maybe work on a more efficient variation is worth it.

  • Lots of performance numbers comparing different patching algorithms can be found in this thesis (pdf).

  • Some data transforms are expected to significantly increase memory savings without much additional CPU overhead (see Future Work).


  • The paper does not conclusively prove that data de-duplication gives significant space savings over compression alone, even for its sample "real world" apps.
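On the question of where the changes fall in a 95% "similar" page, here is a rough experiment one could run. It is only a sketch: it uses zlib with the base page as a preset dictionary as a stand-in delta encoder (this is neither xdelta nor bsdiff), and the random base page and the helper names are my own assumptions.

```python
import random
import zlib

PAGE_SIZE = 4096


def make_delta(base, page):
    """Deflate `page` with `base` as the preset dictionary (a stand-in for a real patch)."""
    comp = zlib.compressobj(level=9, zdict=base)
    return comp.compress(page) + comp.flush()


def apply_delta(base, delta):
    """Rebuild the page from the base page and the delta."""
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(delta) + decomp.flush()


def mutate(base, offsets, rng):
    """Copy `base`, replacing the byte at each offset with a different value."""
    page = bytearray(base)
    for off in offsets:
        page[off] = (page[off] + rng.randrange(1, 256)) % 256
    return bytes(page)


rng = random.Random(0)
base = bytes(rng.randrange(256) for _ in range(PAGE_SIZE))
n_changed = PAGE_SIZE * 5 // 100  # 5% of the bytes differ, i.e. "95% similar"

at_end = mutate(base, range(PAGE_SIZE - n_changed, PAGE_SIZE), rng)
at_start = mutate(base, range(n_changed), rng)
scattered = mutate(base, rng.sample(range(PAGE_SIZE), n_changed), rng)

for name, page in [("end", at_end), ("start", at_start), ("scattered", scattered)]:
    print(name, len(make_delta(base, page)))
```

All three pages differ from the base in the same number of bytes, yet the scattered case has to encode many short matches instead of a few long ones, so its delta should come out noticeably larger. A real patching algorithm will give different absolute numbers, which is exactly why the paper should have stated how its similar pages were generated.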

Future Work

  • Collect suspend images of various VMs with mixed workloads (similar to this paper) and check the effectiveness of these cases:
    • share identical pages only
    • delta encode similar pages only (with sharing of ident pages)
    • compress individual pages only
    • All combined (showing individual contributions here).
  • Compare different patching algorithms in the above cases.
  • Redundancy can be artificially created: the BWT# transform followed by MTF# encoding is known to generate redundancy for various kinds of data. So, I expect more effective data de-duplication with the following modification to DE:

# BWT (Burrows-Wheeler transform)

# MTF (Move-to-front)

// This function is called after the search
// for an *identical* page fails.
P1 = BWT(page)
P2 = MTF(P1)
// Typically two keys per page
keySet[] = Fingerprint(P2)
Psim = (find a page with at least
        one key matching keySet[])
if (Psim) {
    // bsdiff - or some variation
    // (better than xdelta used in DE)
    DeltaEncode(P2, Psim)
} else {
    // Expected much higher compressibility
    // due to BWT+MTF already applied
    Compress(P2)
}

// Decoding inverts the steps above - BWT is reversible!
DecodePage(encodedPage)

BWT+MTF creates more redundancy within each page and hence is expected to create more redundancy across pages. There is already a compression utility, rzip, that works along related lines: among other things, it finds duplicate data across a very large window (in effect a global dictionary) and compresses the result with bzip2, which applies BWT+MTF. It's not useful as is, but the idea can surely be used.
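For readers who have not seen the two transforms, here is a naive Python sketch of BWT and MTF together with their inverses. It sorts all rotations directly, which is far too slow for production use, and the function names are mine; the point is only to show that both steps can be undone exactly.

```python
def bwt(data):
    """Naive Burrows-Wheeler transform: (last column, row index of the original)."""
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start positions by the rotation they produce.
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[(i - 1) % n] for i in order)
    return last, order.index(0)


def inverse_bwt(last, primary):
    """Rebuild the original bytes from the last column and the primary index."""
    n = len(last)
    # A stable sort of positions by byte value links each row of the
    # first column to the same byte occurrence in the last column.
    perm = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = perm[row]
        out.append(last[row])
    return bytes(out)


def mtf(data):
    """Move-to-front: emit each byte's current position in a recency list."""
    table = list(range(256))
    out = bytearray()
    for b in data:
        idx = table.index(b)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return bytes(out)


def inverse_mtf(codes):
    """Undo move-to-front by replaying the same recency-list updates."""
    table = list(range(256))
    out = bytearray()
    for idx in codes:
        b = table.pop(idx)
        out.append(b)
        table.insert(0, b)
    return bytes(out)
```

A round trip looks like `last, primary = bwt(page)`, `codes = mtf(last)`, and then `inverse_bwt(inverse_mtf(codes), primary)` to get the page back. Note that the primary index has to be stored alongside the encoded page.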
