# Traffic Statistics

The investigation of network traffic using statistical analysis to gain insight into performance and behaviour.

## Mobile Sensor Data Anonymization

This paper looks at the problem of releasing time-series data when privacy is a concern. It uses information theory to quantify what extra information could "leak" if a device sends motion data. For example, can users be re-identified, or can attributes such as height and weight be inferred? A machine learning framework is presented that trades off utility against privacy. It distorts the signal as little as possible so that useful data passes through, while disguising the information we wish to keep private.
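Leakage of this kind is commonly quantified by the mutual information I(X;Y) between a private attribute X and the released data Y. A value of zero means the release reveals nothing about X. The minimal sketch below is a plug-in estimator over discrete samples. The attribute and feature values are hypothetical, and the paper's machine learning framework is not reproduced here.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# Hypothetical example: a private attribute and two candidate releases.
height = ["tall", "short"] * 50
stride = ["long" if h == "tall" else "short" for h in height]  # reveals height
coarse = ["any"] * len(height)                                  # reveals nothing
print(mutual_information(height, stride))  # 1.0 bit: full leakage
print(mutual_information(height, coarse))  # 0.0 bits: no leakage
```

Real sensor features are continuous, so they would need discretising (or a different estimator) before an estimate like this could be applied.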

## TCP in the wild

This talk is essentially the same as the one delivered in Cambridge two months earlier (alas, there was no progress on this research in the intervening period).

The research is based on two papers:

A longitudinal analysis of Internet rate limitations -- http://www.richardclegg.org/tcp_rate_infocom_2014

and

On the relationship between fundamental measurements in TCP flows -- http://www.richardclegg.org/tcp_limitations_icc_2013

The essential finding is that TCP does not behave as expected. The expected correlation between throughput and packet loss is not found. The correlation with delay (RTT) is as expected: throughput is proportional to 1/delay. A strong correlation with flow length is found, with longer flows achieving higher throughput. However, this may be a sampling artefact caused by the restricted length of the samples used.
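Relationships such as "throughput proportional to 1/delay" are typically checked on log-log axes. There, a power law becomes a straight line whose slope is the exponent. The minimal sketch below fits that slope by least squares on synthetic data. It illustrates the idea under that assumption and is not the statistical method used in the papers.

```python
import random
from math import log

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    If y is proportional to x**k, the fitted slope estimates k: throughput
    proportional to 1/RTT gives about -1, and to sqrt(L) about 0.5.
    """
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var

# Synthetic flows (not measured data): throughput = 1e6 / RTT with noise.
rng = random.Random(1)
rtts = [rng.uniform(0.01, 0.3) for _ in range(500)]  # seconds
tputs = [1e6 / r * rng.lognormvariate(0, 0.2) for r in rtts]
print(round(loglog_slope(rtts, tputs), 2))
```

With this noise level the printed slope should land close to -1.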

Flows in which TCP's own mechanisms are thought not to be the primary limit on throughput are broken down by assumed cause:

1) Application limited -- an application decides to reduce its own flow by deliberately not sending data.

2) Host window limited -- one host or the other has a low maximum window size that restricts the flow.

3) Advertised window limited -- a middlebox or the receiver manipulates the advertised window size to reduce the flow.

More than half of TCP flows (and more than 80% of long flows) are limited by these mechanisms and not by traditional TCP mechanisms.
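The breakdown above could be approximated from per-flow summaries with a rule-based classifier. The sketch below is hypothetical throughout. The `FlowStats` fields, the 0.5 idle threshold and the 90% tolerance are illustrative choices, not the criteria used in the papers. The only fixed number is 65535 bytes, the largest window a host can advertise without TCP window scaling.

```python
from dataclasses import dataclass

@dataclass
class FlowStats:
    """Hypothetical per-flow summary; field names are illustrative only."""
    max_in_flight: int           # largest observed bytes in flight
    max_rwnd: int                # largest receiver-advertised window (bytes)
    window_scaling: bool         # was window scaling negotiated?
    sender_idle_fraction: float  # share of RTTs with no data waiting to send
    loss_events: int

UNSCALED_LIMIT = 65535  # largest window expressible without window scaling

def classify(f: FlowStats, tol: float = 0.9) -> str:
    """Assign one assumed cause; rules are checked in order (hypothetical)."""
    if f.sender_idle_fraction > 0.5:
        return "application limited"
    if not f.window_scaling and f.max_in_flight >= tol * UNSCALED_LIMIT:
        return "host window limited"
    if f.max_in_flight >= tol * f.max_rwnd:
        return "advertised window limited"
    if f.loss_events > 0:
        return "loss limited"
    return "unknown"

print(classify(FlowStats(30000, 32000, True, 0.0, 0)))
```

Because the rules are checked in order, a flow that matches several conditions receives the first label. A real classifier would need to resolve such overlaps more carefully.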

## TCP in the Wild

This talk is an updated version of the one given at QMUL. The difference is two slides at the end, which give insight into the sampling issues in the data.

The key message of this paper is that TCP/IP does not work in the real world as it is generally taught. The usual picture is a connection in which one side sends data as fast as possible to fill the pipe, with its rate controlled by loss. That is not what happens in practice.

This work combines two papers:

A longitudinal analysis of Internet rate limitations (INFOCOM 2014)

and

On the relationship between fundamental measurements in TCP flows (ICC 2013)

The talk analyses passive traces with the aim of explaining the root causes of the bandwidth a connection achieves. Theoretical results show that, in equilibrium, an unconstrained TCP flow has a bandwidth proportional to 1/RTT and to 1/sqrt(p), where p is the probability of packet loss. The experimental results differ, however. The relationship with RTT is upheld, but the relationship with loss is not found. Instead, a strong relationship with flow length is found: longer flows have higher throughput, in proportion to sqrt(L), where L is the length of the flow in packets.
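For reference, the theoretical relationship is usually written as the square-root formula of Mathis et al. for steady-state TCP throughput B. MSS (the maximum segment size) is a model parameter not named in the text, and the constant sqrt(3/2) assumes periodic loss. The observed relationships are set alongside as proportionalities only.

```latex
\text{Theory:}\quad
B \approx \frac{\mathrm{MSS}}{\mathrm{RTT}}\sqrt{\frac{3}{2p}},
\qquad B \propto \frac{1}{\mathrm{RTT}}, \quad B \propto \frac{1}{\sqrt{p}}

\text{Observed in the traces:}\quad
B \propto \frac{1}{\mathrm{RTT}}, \quad B \propto \sqrt{L}, \quad
\text{no clear dependence on } p
```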

A follow-up analysis looks at what limits throughput. Fewer than half of flows are found to be governed by loss. Flow bandwidth is very often governed by applications; for example, YouTube deliberately throttles traffic so that users do not download too far ahead. Some flows are governed by operating system restrictions, where the OS does not scale window sizes. Some flows are governed by middleboxes that manipulate the window size. In the network studied, it is these restrictions that are the primary limit on connection bandwidth.
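An OS that does not scale its window becomes the bottleneck because TCP can have at most one window of unacknowledged data in flight per round trip. Throughput is therefore bounded by window/RTT. The minimal sketch below assumes the 65535-byte maximum that applies without window scaling. Real hosts may use smaller defaults.

```python
def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s

# Without window scaling (RFC 7323) the 16-bit window field caps the
# window at 65535 bytes.
UNSCALED_MAX_WINDOW = 65535

for rtt_ms in (10, 50, 100, 200):
    mbps = window_limited_throughput_bps(UNSCALED_MAX_WINDOW, rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms:>3} ms -> at most {mbps:.2f} Mbit/s")
```

For example, a flow with a 100 ms RTT is capped at about 5.24 Mbit/s however fast the underlying link is.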

## Studying TCP in the wild

An updated version of this talk was later given at Cambridge (see "TCP in the Wild" above).


## A longitudinal analysis of Internet rate limitations

This paper looks at when TCP is "not" TCP, by analysing five years of data from a Japanese data set. That is, it looks at when TCP throughput is limited by mechanisms other than traditional TCP rate control (loss or delay in the network feedback causing a reduction in window size).

Three other mechanisms are important:

1) Application limiting -- the sender "dribbles" out data slowly to reduce its bandwidth, as YouTube does.

2) Window size limitations -- hosts have a built-in OS limit on how large the TCP window can be.

3) Middlebox/receiver window tweaking -- the receiver or (more likely) a middlebox tweaks the advertised window size to reduce throughput.
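Application limiting of the first kind is often implemented as pacing. The sketch below uses a token bucket purely as an illustration. The rate, burst size and chunk sizes are hypothetical, and the source does not describe YouTube's actual mechanism.

```python
def paced_send_times(chunk_sizes, rate_bps, burst_bytes):
    """Send time (s) of each chunk under a token-bucket pacer.

    Credit accrues at rate_bps / 8 bytes per second, up to burst_bytes; the
    sender is assumed to always have data, so each chunk leaves as soon as
    the bucket holds enough credit for it.
    """
    fill = rate_bps / 8.0        # bytes of credit per second
    tokens = float(burst_bytes)  # the bucket starts full
    t = 0.0
    times = []
    for size in chunk_sizes:
        if size > burst_bytes:
            raise ValueError("chunk larger than the bucket")
        if tokens < size:
            t += (size - tokens) / fill  # wait until enough credit has accrued
            tokens = float(size)
        tokens -= size
        times.append(t)
    return times

times = paced_send_times([65536] * 10, rate_bps=1_000_000, burst_bytes=131072)
print([round(x, 3) for x in times])
```

After an initial burst of two chunks, chunks leave every 0.524288 s (65536 bytes at 1 Mbit/s), regardless of how much capacity the path has.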

In the traces studied, these three mechanisms are found to account for more than half of the packets. The traces include data from well-known sites such as YouTube, and it seems likely that the findings apply beyond this particular trace set.

In general, this paper finds that TCP in the wild does not behave in the way it is traditionally taught. Through a variety of mechanisms, TCP is not "filling a pipe" under the control of loss. Other mechanisms beyond traditional TCP congestion control are at play.

## A discrete-time Markov modulated queuing system with batched arrivals

This paper looks at a Markov chain based model and uses queuing theory to analyse its performance. The system is a D-BMAP/D/1 queue (discrete-time batch Markovian arrivals, deterministic service, a single server), and a closed-form solution is found.
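The closed-form solution is not reproduced here. The model itself, however, is easy to simulate, which is one way to sanity-check an analytical result. This is a minimal Monte Carlo sketch. The two-state chain, the batch distributions and the convention that arrivals join before the slot's departure are all assumptions.

```python
import random

def simulate_dbmap_d1(P, batch_pmfs, slots, seed=0):
    """Monte Carlo sketch of a D-BMAP/D/1 queue.

    P          -- transition matrix of the modulating Markov chain (rows)
    batch_pmfs -- batch_pmfs[s][k] = P(k arrivals in a slot | state s)
    One packet is served per slot, and the queue evolves as
    Q_{n+1} = max(Q_n + A_n - 1, 0).
    Returns (mean queue length, mean arrivals per slot).
    """
    rng = random.Random(seed)
    state, q = 0, 0
    total_q = total_a = 0
    for _ in range(slots):
        pmf = batch_pmfs[state]
        a = rng.choices(range(len(pmf)), weights=pmf)[0]  # batch size
        q = max(q + a - 1, 0)
        total_q += q
        total_a += a
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return total_q / slots, total_a / slots

P = [[0.9, 0.1],   # quiet -> quiet / bursty
     [0.3, 0.7]]   # bursty -> quiet / bursty
batch_pmfs = [[0.8, 0.2],            # quiet: 0 or 1 arrivals (mean 0.2)
              [0.5, 0.0, 0.3, 0.2]]  # bursty: 0, 2 or 3 arrivals (mean 1.2)
mean_q, mean_a = simulate_dbmap_d1(P, batch_pmfs, slots=100_000)
print(f"mean queue {mean_q:.2f}, arrivals/slot {mean_a:.3f}")
```

The stationary probabilities of this chain are 0.75 and 0.25. The long-run arrival rate is therefore 0.75 * 0.2 + 0.25 * 1.2 = 0.45 packets per slot, below the service rate of one per slot, so the queue is stable.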

## A critical look at power law modelling of the Internet

The aim of this paper is to summarise and critique power-law modelling of the Internet. Long-range dependence and self-similarity are considered, as well as scale-free topology analysis.
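Self-similarity is usually quantified by the Hurst parameter H, where 1/2 < H < 1 indicates long-range dependence. The minimal sketch below implements one classical estimator, the aggregated-variance method. It is included for illustration and is not claimed to be the approach taken in the paper.

```python
import random
from math import log

def _mean(v):
    return sum(v) / len(v)

def hurst_aggregated_variance(x, block_sizes):
    """Aggregated-variance estimate of the Hurst parameter H.

    The series is cut into blocks of size m and each block is averaged. For a
    self-similar process Var(block means) ~ m**(2H - 2), so the slope of
    log-variance against log m is 2H - 2.
    """
    pts = []
    for m in block_sizes:
        means = [_mean(x[i:i + m]) for i in range(0, len(x) - m + 1, m)]
        mu = _mean(means)
        var = _mean([(b - mu) ** 2 for b in means])
        pts.append((log(m), log(var)))
    mx = _mean([p[0] for p in pts])
    my = _mean([p[1] for p in pts])
    slope = (sum((a - mx) * (b - my) for a, b in pts)
             / sum((a - mx) ** 2 for a, _ in pts))
    return 1 + slope / 2

rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(2 ** 14)]  # independent samples
H = hurst_aggregated_variance(noise, [2 ** k for k in range(1, 9)])
print(round(H, 2))
```

For independent noise the estimate should come out near 0.5. Values well above 0.5 on real traffic are what is usually read as evidence of long-range dependence.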

## Forecasting Full-Path Network Congestion Using One Bit Signalling

This paper looks at a mechanism related to Explicit Congestion Notification (ECN). It uses a single bit in the IP header to communicate the congestion at each hop in the path. Statistical estimators are used to assess the accuracy of the resulting congestion estimates.
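The paper's exact marking scheme is not described here, so the following is a hedged illustration of the general idea under a simple assumption. Each hop independently sets the bit with a probability equal to its (hypothetical) congestion level, and the bit is never cleared. The receiver then estimates whole-path congestion from the fraction of marked packets and uses the binomial standard error as a measure of accuracy.

```python
import random
from math import prod, sqrt

def path_mark_probability(hop_probs):
    """P(bit set at the receiver) if each hop sets it independently."""
    return 1 - prod(1 - p for p in hop_probs)

def estimate_path_congestion(hop_probs, n_packets, seed=0):
    """Fraction of marked packets and its binomial standard error."""
    rng = random.Random(seed)
    marked = 0
    for _ in range(n_packets):
        marked += any(rng.random() < p for p in hop_probs)  # bit set by any hop
    q_hat = marked / n_packets
    return q_hat, sqrt(q_hat * (1 - q_hat) / n_packets)

hops = [0.01, 0.05, 0.02]  # hypothetical per-hop congestion levels
q_hat, se = estimate_path_congestion(hops, n_packets=100_000)
print(f"true {path_mark_probability(hops):.4f}, estimate {q_hat:.4f} +/- {se:.4f}")
```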

## Criticisms of modelling packet traffic using long-range dependence (extended version)

This paper looks at the phenomenon of long-range dependence. It shows that certain long-range dependent models give answers containing infinities, and that this behaviour will not be detected by a naive modelling approach. The work extends an earlier paper published at PMECT.
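The point about infinities can be illustrated with heavy-tailed distributions, a common ingredient of long-range dependent traffic models (for example, ON/OFF sources with Pareto-distributed periods). The sketch below is a simplified illustration, not one of the models analysed in the paper. A Pareto distribution with shape 1 < alpha <= 2 has infinite variance, yet every finite sample has a finite sample variance. Naively fitting to data therefore never reveals the infinity.

```python
import random
from math import inf

def pareto_variance(alpha, x_min=1.0):
    """Theoretical variance of a Pareto(alpha, x_min) distribution."""
    if alpha <= 2:
        return inf
    return x_min ** 2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))

def sample_variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(7)
alpha = 1.5  # heavy tail: finite mean, infinite variance
for n in (10 ** 3, 10 ** 4, 10 ** 5):
    xs = [rng.paretovariate(alpha) for _ in range(n)]  # x_min = 1
    print(n, round(sample_variance(xs), 1), pareto_variance(alpha))
```

The sample variance typically wanders and grows with n rather than settling, while the model value stays infinite.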

This talk is based on the Transactions on Networking paper. We use 232 traffic traces to establish that, on "mid-large" Internet links (backbone links, or ingress/egress links of reasonably sized institutions), the traffic is well modelled by a log-normal distribution.

The associated paper is here:

https://arxiv.org/abs/2007.10150
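A log-normal fit is simple to compute. By maximum likelihood, mu and sigma are the mean and standard deviation of the logged samples. The minimal sketch below uses synthetic data, not real traces, and the units and the threshold of 20 are hypothetical. The exceedance probability shows one way a fitted model might be used, for example to judge how often traffic exceeds a given capacity. This is an illustration, not a description of the paper's analysis.

```python
import random
from math import erf, exp, log, sqrt

def fit_lognormal(xs):
    """Maximum-likelihood log-normal fit: mu and sigma are the mean and
    population standard deviation of log(x)."""
    logs = [log(x) for x in xs]
    mu = sum(logs) / len(logs)
    sigma = sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma

def prob_exceeds(c, mu, sigma):
    """P(X > c) for log-normal X, using the normal CDF written via erf."""
    z = (log(c) - mu) / sigma
    return 0.5 * (1 - erf(z / sqrt(2)))

# Synthetic per-interval traffic volumes (hypothetical units), not real traces.
rng = random.Random(3)
volumes = [rng.lognormvariate(2.0, 0.5) for _ in range(20_000)]
mu, sigma = fit_lognormal(volumes)
print(f"mu={mu:.3f} sigma={sigma:.3f} median={exp(mu):.2f}")
print(f"P(volume > 20) = {prob_exceeds(20, mu, sigma):.4f}")
```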