On the Distribution of Traffic Volumes in the Internet and its Implications

Submitted by richard on Fri, 06/28/2019 - 10:23
Proc IEEE Infocom

This paper updates previous work on fitting traffic profiles. We use more modern statistical techniques to question (and refute) previous assumptions about heavy tails in traffic statistics. In this case we believe that the best fit for traffic volume per unit time is the log-normal distribution. Tail distributions can have big impacts on capacity planning and on pricing prediction (say, 95th-percentile billing).
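As an illustration of why the fitted distribution matters for 95th-percentile figures, here is a minimal sketch of a log-normal maximum-likelihood fit and the resulting quantile. The data is synthetic, not the paper's measurements, and the parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-interval traffic volumes (a stand-in for real measurements).
volumes = rng.lognormal(mean=10.0, sigma=1.2, size=50_000)

# MLE fit of a log-normal: mu and sigma are the mean/std of log(volume).
logs = np.log(volumes)
mu, sigma = logs.mean(), logs.std()

# 95th percentile: empirical vs. the fitted log-normal's quantile.
empirical_p95 = np.quantile(volumes, 0.95)
z95 = 1.6449  # standard normal 95% quantile
fitted_p95 = np.exp(mu + sigma * z95)

print(f"empirical 95th pct: {empirical_p95:.0f}")
print(f"fitted    95th pct: {fitted_p95:.0f}")
```

If the data really is log-normal, the fitted quantile tracks the empirical one closely; under a heavier-tailed model the same fit would systematically misestimate the billing percentile.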

Mobile Sensor Data Anonymization

Submitted by richard on Wed, 06/05/2019 - 09:42
ACM/IEEE International Conference on Internet-of-Things Design and Implementation

This paper looks at the problem of releasing time-series data when privacy is a concern. It uses information theory to examine what extra information could "leak" if our device sends motion data: for example, can users be re-identified, or can features such as height and weight be inferred? A machine learning framework is given that trades off utility against privacy, allowing useful data to pass through while minimally distorting the signal to disguise information we wish to keep private.

On rate limitation mechanisms for TCP throughput: a longitudinal analysis

Submitted by richard on Tue, 12/13/2016 - 17:00
Computer Networks

This paper is a considerably expanded version of the INFOCOM paper.

Again it argues that TCP throughput is no longer mainly controlled by loss and congestion, but instead by algorithms and settings under the control of the sender or receiver, designed (deliberately or accidentally) to restrict throughput for a variety of reasons -- for example, limiting video sending to the rate at which the viewer is watching.

It contains an extended discussion of the methodology, and in particular how flight and RTT data were extracted from passive traces.

Likelihood-based assessment of dynamic networks

Submitted by richard on Thu, 12/24/2015 - 11:36
Journal of Complex Networks

This paper uses a likelihood-based framework to create a rigorous way to assess models of networks. Network evolution is broken down into an operation model (which decides the 'type' of change to be made to the network, e.g. "add node", "add link", "remove node", "remove link") and an object model (which decides the exact change -- which node or link to add or remove).

The system is shown to be able to recover known parameters on artificial models and to be useful in analysis of real data.

This work can generate graphs from a very large family of models, with the aim of fitting those graphs to parameters of real data sets.
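The operation-model / object-model split above can be sketched in a few lines. The operation weights and the uniform object model here are purely illustrative, not the paper's fitted parameters:

```python
import random

random.seed(1)

# Operation model: picks the *type* of change with fixed (illustrative) weights.
OPS = ["add node", "add link", "remove node", "remove link"]
WEIGHTS = [0.4, 0.4, 0.1, 0.1]

def operation_model():
    return random.choices(OPS, weights=WEIGHTS)[0]

# Object model: picks *which* node or link the chosen operation applies to
# (here uniformly at random; a fitted model would use learned preferences).
def evolve(nodes, links, steps):
    next_id = max(nodes, default=-1) + 1
    for _ in range(steps):
        op = operation_model()
        if op == "add node":
            nodes.add(next_id)
            next_id += 1
        elif op == "add link" and len(nodes) >= 2:
            u, v = random.sample(sorted(nodes), 2)
            links.add((min(u, v), max(u, v)))
        elif op == "remove node" and nodes:
            n = random.choice(sorted(nodes))
            nodes.discard(n)
            links = {e for e in links if n not in e}
        elif op == "remove link" and links:
            links.discard(random.choice(sorted(links)))
    return nodes, links

nodes, links = evolve({0, 1, 2}, set(), steps=200)
print(len(nodes), "nodes,", len(links), "links")
```

In the likelihood framework, each observed change contributes the probability the operation model assigns to its type times the probability the object model assigns to the specific node or link chosen.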

Walking in Sync: Two is Company, Three’s a Crowd

Submitted by richard on Tue, 04/07/2015 - 19:27
2nd Workshop on Physical Analytics (WPA), Florence, Italy

This paper describes preliminary results on analysing the movements of people walking next to each other. The data is collected from movement sensors on mobile phones carried by experimental subjects. Accelerometer traces from phones carried by people walking together show synchronisation when compared. Correlations between the time series are used to infer the presence of a third party when people are walking together. This is preliminary work on a small data set with only three participants.
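A minimal sketch of the correlation idea, using synthetic gait-like signals rather than real sensor data: two walkers sharing a step frequency and phase correlate strongly, while a third walker at a different cadence does not. The frequencies and noise level are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0, 10, 0.02)  # 50 Hz sampling, 10 s of "accelerometer" data

# Illustrative signals: walkers a and b are in sync (same step frequency
# and phase); walker c has a different cadence.
walker_a = np.sin(2 * np.pi * 1.8 * t) + 0.3 * rng.standard_normal(t.size)
walker_b = np.sin(2 * np.pi * 1.8 * t) + 0.3 * rng.standard_normal(t.size)
walker_c = np.sin(2 * np.pi * 2.3 * t) + 0.3 * rng.standard_normal(t.size)

def corr(x, y):
    """Pearson correlation between two time series."""
    return float(np.corrcoef(x, y)[0, 1])

print(f"a-b (in sync):   {corr(walker_a, walker_b):.2f}")
print(f"a-c (not in sync): {corr(walker_a, walker_c):.2f}")
```

In practice the phase offset between two walkers is unknown, so a lagged cross-correlation (maximising over shifts) would be used rather than the zero-lag correlation shown here.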

TCP in the wild

Submitted by richard on Thu, 06/26/2014 - 16:38
King's College London

This talk is essentially the same as that delivered in Cambridge two months earlier (alas no progress on this research for that period).

The research is based on two papers:
A longitudinal analysis of Internet rate limitations --
On the relationship between fundamental measurements in TCP flows --

The essential findings are that TCP is not working as we expect. The expected correlation between throughput and packet loss is not found. The correlation with delay (RTT) is as expected -- throughput proportional to 1/delay. A high correlation with flow length is found -- longer flows have higher throughput. However, this may be a sampling error due to the restricted length of the samples used.

The TCP flows studied are broken down by assumed cause, in the cases where classical TCP congestion control is thought not to be the primary limit on throughput:

1) Application limited -- the application deliberately reduces its own flow by not sending data.
2) Host window limited -- one host or the other has a low maximum window size that restricts the flow.
3) Advertised window limited -- a middlebox or the receiver manipulates the advertised window size to reduce the flow.

More than half of TCP flows (and more than 80% of long flows) are limited by these mechanisms and not by traditional TCP mechanisms.
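A hypothetical sketch of how a per-flow breakdown along these three categories might look. The field names and thresholds are illustrative assumptions, not the methodology used in the papers:

```python
# Hypothetical heuristic classifier for the three limitation categories
# above; thresholds (0.5 idle fraction, 90% of window) are illustrative.
def classify_flow(flight_bytes, adv_window, max_window, idle_fraction):
    """Guess why a flow's throughput is limited, if not by congestion."""
    if idle_fraction > 0.5:
        # Sender frequently has no data queued: the application is pacing.
        return "application limited"
    if flight_bytes >= 0.9 * max_window:
        # Data in flight is pinned at the host's maximum window size.
        return "host window limited"
    if adv_window < max_window and flight_bytes >= 0.9 * adv_window:
        # The receiver (or a middlebox) holds the advertised window down.
        return "advertised window limited"
    return "congestion/loss limited"

print(classify_flow(flight_bytes=64_000, adv_window=65_535,
                    max_window=65_535, idle_fraction=0.1))
```

A real analysis would derive flight size and window behaviour from passive traces, as discussed in the longitudinal-analysis paper, rather than from per-flow summary fields like these.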