A probabilistic approach towards modeling email network with realistic features

Quangang Li; Jinqiao Shi; Tingwen Liu; Li Guo; Zhiguang Qin

doi:10.1109/ICCCN.2014.6911760

Source

2014 23rd International Conference on Computer Communication and Networks (ICCCN), pp. 1-8

Abstract

Email plays an important role in daily life, and much work has been done on email networks. These studies mostly require real email network datasets and reliable models to analyze user information and understand the mechanisms of network evolution. However, much of this research is constrained by the absence of real large-scale email datasets. Although email communication is ubiquitous, very few large-scale email datasets are available that satisfy different research purposes. Because of privacy policies and restricted permissions, it is arduous to collect a real large-scale email dataset in a short time. Various social network models are commonly used to create synthetic email networks. However, these models focus on reproducing a few structural properties of the network without considering user behaviour patterns, so they are not appropriate for generating large-scale realistic synthetic email network datasets. To this end, we propose a probabilistic model that constructs large-scale synthetic email datasets from a small captured email log. More importantly, the generated synthetic dataset matches the properties of real email networks and individual communication patterns. Moreover, the model has linear complexity and can be parallelized easily. Experimental results on the Enron dataset demonstrate these benefits.