Using a Bayesian approach to reconstruct graph statistics after edge sampling


Often, due to prohibitively large size or to limits to data collecting APIs, it is not possible to work with a complete network dataset and sampling is required. A type of sampling which is consistent with Twitter API restrictions is uniform edge sampling. In this paper, we propose a methodology for the recovery of two fundamental network properties from an edge-sampled network: the degree distribution and the triangle count (we estimate the totals for the network and the counts associated with each edge). We use a Bayesian approach and show a range of methods for constructing a prior which does not require assumptions about the original network. Our approach is tested on two synthetic and three real datasets with diverse sizes, degree distributions, degree-degree correlations and triangle count distributions.


This article deals with the problem of reconstructing a network that has been "edge sampled". An example might be a social network like twitter where the API limits the number of connections that can be sampled in a research setting, showing you only, say, 10% or 1% of all user-user connections if the sampled network is large. We look at the problem of reconstructing basic network properties like the degree distribution and the triangle count. We use both real network data sets and artificial networks. 

Naomi Arnold, Raul Mondragón, Richard G. Clegg
Applied Network Science