Nowadays, the message diffusion links among users or Web sites drive the development of countless innovative applications. However, in reality, it is easier for us to observe the time stamps when different nodes in the network react on a message, while the connections empowering the diffusion of the message remain hidden. This motivates recent extensive studies on the network inference problem: unveiling the edges from the records of messages disseminated through them. Existing solutions are computationally expensive, which motivates us to develop an efficient two-step general framework, Clustering Embedded Network Inference (CENI). CENI integrates clustering strategies to improve the efficiency of network inference. By clustering nodes directly on the time lines of messages, we propose two naive implementations of CENI: Infection-centric CENI and Cascade-centric CENI. Additionally, we point out the critical dimension problem of CENI: Instead of one-dimensional time lines, we need to first project the nodes to an Euclidean space of certain dimension before clustering. A CENI adopting clustering method on the projected space can better preserve the structure hidden in the cascades and generate more accurately inferred links. By addressing the critical dimension problem, we propose the third implementation of the CENI framework: Projection-based CENI. Through extensive experiments on two real datasets, we show that the three CENI models only need around 20–50 % of the running time of state-of-the-art methods. Moreover, the inferred edges of Projection-based CENI preserve or even outperform the effectiveness of state-of-the-art methods.
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10 by the strategic scientific research and experimental development program:
SYNAT - “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.