On the predictability of large transfer TCP throughput
This paper looks at ways of predicting the TCP throughput of a connection. The assumption is that some information is available about the connection. A comparison is made between “formula based” (FB) prediction, that is using round-trip time and loss versus time series analysis prediction (referred to here as history based (HB)), that is using previous measurements on the same connection. Both approaches require some measurements from the connection already.
The formula used is the standard from Padhye et al 2000 (a rather nice model based approach to the problem) which takes RTT, loss and TCP parameters such as max window, timeout and parameter (flights per window size increase). They add corrections to this formula for connections which are observed lossless. They also point out that queues at network edges increase RTT beyond its base value. In addition ping based sampling of a network will underestimate the loss suffered by a TCP connection as a TCP connection will ramp up to fill a pipe and hence be part of the loss problem.
Measurement data is collected on 35 internet paths on RON testbed (nodes at US universities, + 2 in europe and 1 in korea). The 35 paths include 5 transatlantic , one korea to NY and the rest within US. Seven traces are used for each path – total 245 traces which collect available bw using tool called pathload – each trace is 150 measurements on path. Tool measures available bw then estiamtes RTT and loss. Then iperf is used to load path and RTT and loss rate measured during transfer. “A 50 second transfer… is enough to ensure that the flow spends a negligible fraction of its lifetime in the initial slow-start.” Measurements all from May 2004.
In general formula based prediction rarely underestimates bandwidth but sometimes overestimates – the overestimates tend to be larger in magnitude. In 40% of measurements the RTT increased significantly during a transfer and in “almost all” measurements the loss rate increased. “An increase in loss rate from 0.1% to 1% can cause a throughput overestimation by a factor of about 3.2”. Prediction errors remain signficant if the real RTT and loss during the connection are known. “More than half the prediction errors are still larger than a factor of two”.
History based prediction uses standard time series tools – n-step moving average (not real time-series analysis MA model), exponentially weighted moving average, Holt-Winters (non-seasonal). This is pretty much a starting point for time-series analysis and it seems clear that other tools could easily be tried – e.g. proper time series ARMA or even ARIMA model.
The history based methods perform reasonably at prediction even with just limited training data. Simple heuristics for outliers and level shifts improve errors. Some paths are more predictable than others.
The authors then investigate what makes a TCP flow predictable. They confirm that highly variable flows are harder to predict, flows where the initial measurement of available bw varies are harder to predict, flows competing with several other flows are harder to predict.
Conclusions: HB prediction good but requires initial data and varies with underlying path. FB prediction attractive because doesn't require intrusive measurements but can be inaccurate.
Page generated 2014-01-07, by jemdoc.