The power of prediction -- Cloud bandwidth and cost reduction

Eyal Zohar, Israel Cidon and Osnat Mokryn

This paper deals with reducing costs for cloud computing users. Cloud customers use “Traffic Redundancy Elimination” (TRE) to reduce bandwidth costs. Redundant data chunks are detected and removed – cloud providers will not implement middleboxes for this as they have no incentive. The paper gives a TRE solution which does not require a server to maintain client status. The system is known as PACK “Predictive ACKnowledgements” which is receiver driven.

The basic unit of data is the chunk. The receiver implements a chunk store which operates by the LRU principle. Chunks are identified by “hints” which are easily computed functions which identify data with a fast algorithm with low false positives. If hints match then SHA-1 signatures are checked and if matched then data about subsequent chunks (referred to as a “chain”) can be retrieved from pointers in chunk data. So, a chunk is the data, a signature to uniquely identify the data and a link to the next chunk (if received). On receiving a match the receiver asks the sender if the next chunks will be the same as the ones in its store by sending the identity of several chunks. This is known as a PRED command (predict). If this is correct the sender sends PRED-ACK and the receiver sends the chunks to its own TCP input buffers. If the prediction is not correct the sender continues to send as normal. The TCP Options field is used to carry the PACK wired protocol.

Section 4 lists some optimisations.

  • If lots of correct predictions are made then the rate at which the receiver sends predictions could be a bottleneck. An optimistic algorithm sends more predictions if many predictions are successful.

  • Several chunks can be combined to a similar effect.

  • If changes in data are scattered then PACK becomes inefficient. A hybrid mode allows searching for dispersed changes and moving to a sender driven mode.

Evaluation is tried on 24 hours of data from an ISP PoP. Youtube data isolated using DPI (13% by volume). PACK is tested for ability to reduce redundant data and reduces traffic by 30% (after an initial period where the cache of chunks is filling). Static data sets of the linux kernel in various versions (40 versions) and a user Gmail account with 1140 messages (1.09GB) were checked. Together the set of linux kernel files showed 83.1% redundancy and the email 31.6%. The algorithm is further analysed for computational load in the server and receiver based modes. The computational load to some extent reduces the benefits of the traffic reduction (in a situation where users pay for both traffic and CPU use) but still results in a reduction of bill by 20% (for the Youtube traffic example).

Tests are run on a fully working and implemented system with a workload consisting of emails, FTP data, http videos. CPU usage and real and virtual traffic volumes were checked. The cost estimates were based on amazon EC2 pricing comparing PACK with a sender based Redudancy Elimination algorithm similar to one from literature. The rival does not perform well as the SHA-1 matching on every chunk produces CPU usage expense.

Show bibtex
Page generated 2014-01-07, by jemdoc.