Identify the influential user in online social networks using R, Hadoop and Python

K Sailaja Kumar; D Evangelin Geetha; N Nagesh; T V Sai Manoj

doi:10.1109/CIMCA.2016.8053302

Source

2016 International Conference on Circuits, Controls, Communications and Computing (I4C) > 1 - 6

Abstract

Online Social Networks (OSNs) are the most powerful medium through which individuals communicate and share their valuable thoughts. ‘Twitter’ is one of the most popular OSNs and is rich in public data (tweets). In this paper we used ‘streamR’, an ‘R’ package for accessing the Twitter Streaming API, to extract real-time tweets from Twitter. Each tweet has many attributes that can be analyzed further to find the most significant information about the Twitter user. We considered three attributes: screen-name, follower-count and friend-count. Twitter data scales from gigabytes to petabytes, and a standalone system cannot process data of this size because of hardware constraints. We therefore used the parallel computing environment provided by ‘Hadoop’, together with the ‘Python’ programming language, to analyze the Twitter users whose follower-count and friend-count are less than 5000. We identified the user with the maximum follower-count as the influential user who can contribute to and maximize information diffusion in Twitter.
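The filter-and-maximize step described in the abstract can be sketched as a map and a reduce function in Python. This is only an illustrative sketch, not the authors' code. It assumes each input line is one tweet in JSON form with the fields `user.screen_name`, `user.followers_count` and `user.friends_count`, that the 5000 cutoff applies to both counts, and that the sample users (`alice`, `bob`, `carol`, `dave`) are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple

THRESHOLD = 5000  # cutoff stated in the abstract; applying it to both counts is an assumption


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (screen_name, followers_count) for users under the threshold."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            user = json.loads(line)["user"]
            name = user["screen_name"]
            followers = int(user["followers_count"])
            friends = int(user["friends_count"])
        except (ValueError, KeyError, TypeError):
            continue  # skip malformed or incomplete records
        if followers < THRESHOLD and friends < THRESHOLD:
            yield name, followers


def reducer(pairs: Iterable[Tuple[str, int]]) -> Optional[Tuple[str, int]]:
    """Return the (screen_name, followers_count) pair with the largest count."""
    best: Optional[Tuple[str, int]] = None
    for name, followers in pairs:
        if best is None or followers > best[1]:
            best = (name, followers)
    return best


def make_tweet(name: str, followers: int, friends: int) -> str:
    """Build a hypothetical one-line JSON tweet for the demonstration."""
    return json.dumps({"user": {"screen_name": name,
                                "followers_count": followers,
                                "friends_count": friends}})


sample = [
    make_tweet("alice", 1200, 300),
    make_tweet("bob", 4800, 150),
    make_tweet("carol", 9000, 100),   # excluded: too many followers
    make_tweet("dave", 4999, 6000),   # excluded: too many friends
    "not valid json",                 # skipped by the mapper
]

result = reducer(mapper(sample))
print(result)
```

In an actual Hadoop Streaming job the two functions would typically live in separate scripts that read from standard input and write tab-separated key/value lines to standard output; here they are chained in one process so the logic can be checked locally.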