2016 IEEE International Conference on Big Data (Big Data)

chapter

Scalable link community detection: A local dispersion-aware approach

Alex Delis, Alexandros Ntoulas, Panagiotis Liakos

2016 IEEE International Conference on Big Data (Big Data) > 716 - 725

Real-life systems involving interacting objects are typically modeled as graphs and can often grow very large in size. Revealing the community structure of such systems is crucial in helping us better understand their complex nature. However, the ever-increasing size of real-world graphs, and our evolving perception of what a community is, make the task of community detection very challenging. One...

chapter

Advantage of integration in big data: Feature generation in multi-relational databases for imbalanced learning

Farrukh Ahmed, Michele Samorani, Colin Bellinger, Osmar R. Zaiane

2016 IEEE International Conference on Big Data (Big Data) > 532 - 539

Most real world applications comprise databases having multiple tables. It becomes further complicated in the realm of Big Data where related information is spread over different data repositories. However, data mining techniques are usually applied on a single flat table. This work focuses on generating a mining table by aggregating information from multiple local tables and external data sources...

chapter

Sampling labelled profile data for identity resolution

Matthew Edwards, Stephen Wattam, Paul Rayson, Awais Rashid

2016 IEEE International Conference on Big Data (Big Data) > 540 - 547

Identity resolution capability for social networking profiles is important for a range of purposes, from open-source intelligence applications to forming semantic web connections. Yet replication of research in this area is hampered by the lack of access to ground-truth data linking the identities of profiles from different networks. Almost all data sources previously used by researchers are no longer...

chapter

Pick your choice in HBase: Security or performance

Frank Pallas, Johannes Gunther, David Bermbach

2016 IEEE International Conference on Big Data (Big Data) > 548 - 554

When analyzing sensitive data in a cloud-deployed Hadoop stack, data-in-transit security needs to be enabled, especially in the underlying storage tier. This, however, will affect the performance of the system and may partially offset the cost benefits of the cloud. In this paper, we discuss two strategies for securing HBase deployments in the cloud. For both, we present benchmarking results which...

chapter

Accelerating range queries for large-scale unstructured meshes

Cuong Nguyen, Philip J. Rhodes

2016 IEEE International Conference on Big Data (Big Data) > 502 - 511

Scientific datasets are steadily growing in size, due to increasing resolution and scale. Unstructured meshes are essential to certain fields of engineering and science, but they present special challenges for efficient access and processing. The work described in this paper accelerates range queries for very large unstructured meshes using the GPU. Prior work in the area introduced a preprocessing...

chapter

Adapting to data sparsity for efficient parallel PARAFAC tensor decomposition in Hadoop

Kareem S. Aggour, Bulent Yener

2016 IEEE International Conference on Big Data (Big Data) > 294 - 301

Parallel Factor Analysis (PARAFAC) is used in many scientific disciplines to decompose multimodal datasets ('tensors') into principal factors to uncover multilinear relationships in the data. Today's popular implementations of PARAFAC are single-server solutions that do not scale well to big datasets. This paper presents the design, implementation, and testing of a Big Data-enabled Parallel PARAFAC...

chapter

Big data framework interference in restricted private cloud settings

Stratos Dimopoulos, Chandra Krintz, Rich Wolski

2016 IEEE International Conference on Big Data (Big Data) > 335 - 340

In this paper, we characterize the behavior of “big” and “fast” data analysis frameworks, in multi-tenant, shared settings for which computing resources (CPU and memory) are limited, an increasingly common scenario used to increase utilization and lower cost. We study how popular analytics frameworks behave and interfere with each other under such constraints. We empirically evaluate Hadoop, Spark,...

chapter

Parallel clustering method for non-disjoint partitioning of large-scale data based on spark framework

Abir Zayani, Chiheb-Eddine Ben N'Cir, Nadia Essoussi

2016 IEEE International Conference on Big Data (Big Data) > 1064 - 1069

Clustering large-scale data has become an important challenge which motivates several recent works. While the emphasis has been on the organization of massive data into disjoint groups, this work considers the identification of non-disjoint groups rather than the disjoint ones. In this setting, it is possible for a data object to belong simultaneously to several groups since many real-world applications...

chapter

Point of interest recommendation with social and geographical influence

Da-Chuan Zhang, Mei Li, Chang-Dong Wang

2016 IEEE International Conference on Big Data (Big Data) > 1070 - 1075

Point of interest (POI) recommendation, a service which can help people discover useful and interesting locations, has emerged rapidly with the development of location-based social networks (LBSNs), like Foursquare, Gowalla and Wechat. The large number of check-in histories makes it possible to mine the preference of each user and then to provide accurate personalized POI recommendation. In real-world...

chapter

CCRP: Customized cooperative resource provisioning for high resource utilization in clouds

Jinwei Liu, Haiying Shen, Husnu S. Narman

2016 IEEE International Conference on Big Data (Big Data) > 243 - 252

In cloud systems, efficient resource provisioning is needed to maximize the resource utilization while reducing the Service Level Objective (SLO) violation rate, which is important to cloud providers for high profit. Several methods have been proposed to provide efficient provisioning. However, the previous methods do not consider leveraging the complementarity of jobs' requirements on different resource...

chapter

GraphFlow: Workflow-based big graph processing

Sara Riazi, Boyana Norris

2016 IEEE International Conference on Big Data (Big Data) > 3336 - 3343

We introduce GraphFlow, a big graph framework that is able to encode complex data science experiments as a set of high-level workflows. GraphFlow combines the Spark big data processing platform and the Galaxy workflow management system to offer a set of components for graph processing using a novel interaction model for creating and using complex workflows. GraphFlow contributes an easy-to-use interface...

chapter

Uncovering information flow among users by time-series retweet data: Who is a friend of whom on Twitter?

Yuka Kamiko, Mitsuo Yoshida, Hirotada Ohashi, Fujio Toriumi

2016 IEEE International Conference on Big Data (Big Data) > 2500 - 2504

Although it is crucial to transmit important information to those who require it during disasters, neither of the following questions has been answered: Who contributes to information diffusion? How do users construct helpful relationships in social media? Unfortunately, most previous research has focused on the scale of information diffusion, instead of the flow of information and the paths traveled...

chapter

Finding informative comments for video viewing

Seungwoo Choi, Aviv Segev

2016 IEEE International Conference on Big Data (Big Data) > 2457 - 2465

Video is an increasingly important method of information-sharing on the Web. Services such as YouTube, Vimeo, and Liveleak are platforms that support uploading User-Generated Content. Users tend to seek related information during or after watching an informative video by finding and reading comments on Web services. However, existing services only support sorting by recentness (newest) or rating (LIKES...

chapter

Implementing trajectory data stream analysis in parallel

Yongyi Xian, Chuanfei Xu, Yan Liu

2016 IEEE International Conference on Big Data (Big Data) > 2431 - 2436

Implementing trajectory data stream analysis in parallel has technical issues of data partition and improvements of the analysis operations. In this paper, we define the trajectory analysis problem as discovering trajectory companies of moving objects. We develop a discovery workflow in parallel batch processing. We solve technical issues of data partition and data locality in the steps of analysis...

chapter

An adaptive information-theoretic approach for identifying temporal correlations in big data sets

Nguyen Ho, Huy Vo, Mai Vu

2016 IEEE International Conference on Big Data (Big Data) > 666 - 675

In the past two decades, new developments in computing, sensing and crowdsourced data have resulted in an explosion in the availability of quantitative information. The possibilities of analyzing this so-called “big data” to inform research and the decision-making process are virtually endless. In general, analyses have to be done across multiple data sets in order to bring out the most value of big...

chapter

Online social network evolution: Revisiting the Twitter graph

Hariton Efstathiades, Demetris Antoniades, George Pallis, Marios D. Dikaiakos, et al.

2016 IEEE International Conference on Big Data (Big Data) > 626 - 635

In 2010 the popular paper by Kwak et al. [11] presented the first comprehensive study of Twitter as it appeared in 2009, using most of the Twitter network at the time. Since then, Twitter's popularity and usage have exploded, experiencing a 10-fold increase. As of 2015, it has more than 500 million users, out of which 316 million are active, i.e. logging into the service at least once a month.¹ In...

chapter

TV ratings vs. social media engagement: Big social data analytics of the Scandinavian TV talk show Skavlan

Henrikke Hovda Larsen, Johanna Margareta Forsberg, Sigrid Viken Hemstad, Raghava Rao Mukkamala, et al.

2016 IEEE International Conference on Big Data (Big Data) > 3849 - 3858

This paper explores the relationship between TV viewership ratings for Scandinavia's most popular talk show, Skavlan, and public opinions expressed on its Facebook page. The research aim is to examine whether the activity on social media affects the number of viewers per episode of Skavlan, how the viewers are affected by discussions on the talk show, and whether this creates debate on social media...

chapter

Max-node sampling: An expansion-densification algorithm for data collection

Katchaguy Areekijseree, Ricky Laishram, Sucheta Soundarajan

2016 IEEE International Conference on Big Data (Big Data) > 3944 - 3946

In this work, we propose Max-Node sampling, a novel sampling algorithm for data collection. The goal of Max-Node is to maximize the number of nodes observed in the sample, given a budget constraint. Max-Node is based on the intuition that networks contain many densely connected regions (i.e., communities) that may be only weakly connected to one another, and to maximize the number of nodes observed,...

chapter

Towards optimizing large-scale data transfers with end-to-end integrity verification

Si Liu, Eun-Sung Jung, Rajkumar Kettimuthu, Xian-He Sun, et al.

2016 IEEE International Conference on Big Data (Big Data) > 3002 - 3007

The scale of scientific data generated by experimental facilities and simulations on high-performance computing facilities has been growing rapidly. In many cases, this data needs to be transferred rapidly and reliably to remote facilities for storage, analysis, sharing, etc. At the same time, users want to verify the integrity of the data by doing a checksum after the data has been written to disk...

chapter

Big data availability: Selective partial checkpointing for in-memory database queries

Daniel Playfair, Amitabh Trehan, Barry McLarnon, Dimitrios S. Nikolopoulos

2016 IEEE International Conference on Big Data (Big Data) > 2785 - 2794

Fault tolerance is an important challenge for supporting critical big data analytic operations. Most existing solutions only provide fault tolerant data replication, requiring failed queries to be restarted. This approach is insufficient for long-running time-sensitive analytic queries, due to lost query progress. Several solutions provide intra-query fault tolerance. However, these focus on distributed...
