Reconstructing Degree Distribution and Triangle Counts from Edge-Sampled Graphs


Often, due to prohibitively large size or to limits to data collecting APIs, it is not possible to work with a complete network dataset and sampling is required. A type of sampling which is consistent with Twitter API restrictions is uniform edge sampling. In this paper, we propose a methodology for the recovery of two fundamental network properties from an edge-sampled network: the degree distribution and the triangle count (we estimate the totals for the network and the counts associated with each edge). We use a Bayesian approach and show a range of methods for constructing a prior which does not require assumptions about the original network. Our approach is tested on two synthetic and two real datasets with diverse degree and triangle count distributions.


This paper was is a conference version of this journal paper

Naomi A. Arnold, Raúl J. Mondragón, Richard G. Clegg
Complex Networks Conference