2015 IEEE International Conference on Big Data (Big Data)

chapter

Online pattern mining for high-dimensional data streams

Yoshitaka Yamamoto, Koji Iwanuma

2015 IEEE International Conference on Big Data (Big Data) > 2880 - 2882

This paper studies one-scan approximation algorithms for streaming data mining (SDM). Despite the importance of pattern discovery in streaming data, this issue has not yet been sufficiently addressed in the big data community. In this context, we briefly review the previously proposed SDM methods. A recent work addresses their limitations using the technique of online compression. It is based...

chapter

Scheduling of Big Data application workflows in cloud and inter-cloud environments

B. Kezia Rani, A. Vinaya Babu

2015 IEEE International Conference on Big Data (Big Data) > 2862 - 2864

Large amounts of data are being generated every day, creating new challenges and opportunities that lead to extraordinary new knowledge and discoveries in many application domains ranging from science and engineering to business. One of the main challenges in this era of Big Data is how to efficiently manage and analyse data at this scale. This is challenging not only due to the size of the data,...

chapter

Shaping data: Visualization under construction

Oliver Bieh-Zimmert, Carsten Felden

2015 IEEE International Conference on Big Data (Big Data) > 2445 - 2452

Generating the maximum number of visual patterns by uncovering the entire space of possible visual designs remains a challenge within the construction process of information visualization. Users interact with different mindsets consisting of design, data analysis, application development, and hardware resource usage. Therefore, they desire a flexible and productive interface that keeps them clued...

chapter

Scalable k-NN based text clustering

Alessandro Lulli, Thibault Debatty, Matteo Dell'Amico, Pietro Michiardi, et al.

2015 IEEE International Conference on Big Data (Big Data) > 958 - 963

Clustering items using textual features is an important problem with many applications, such as root-cause analysis of spam campaigns, as well as identifying common topics in social media. Due to the sheer size of such data, algorithmic scalability becomes a major concern. In this work, we present our approach for text clustering that builds an approximate k-NN graph, which is then used to compute...

chapter

A scalable approach for data-driven taxi ride-sharing simulation

Masayo Ota, Huy Vo, Claudio Silva, Juliana Freire

2015 IEEE International Conference on Big Data (Big Data) > 888 - 897

As urban population grows, cities face many challenges related to transportation, resource consumption, and the environment. Ride sharing has been proposed as an effective approach to reduce traffic congestion, gasoline consumption, and pollution. Despite great promise, researchers and policy makers lack adequate tools to assess tradeoffs and benefits of various ride-sharing strategies. Existing approaches...

chapter

A transaction model for management of replicated data with multiple consistency levels

Anand Tripathi, Bhagavathi Dhass Thirunavukarasu

2015 IEEE International Conference on Big Data (Big Data) > 470 - 477

We present a transaction model which simultaneously supports different consistency levels, which include serializable transactions for strong consistency, and weaker consistency models such as causal snapshot isolation (CSI), CSI with commutative updates, and CSI with asynchronous updates. This model is useful in managing large-scale replicated data with different consistency guarantees to make suitable...

chapter

A community driven social recommendation system

Deepika Lalwani, D. V. L. N. Somayajulu, P. Radha Krishna

2015 IEEE International Conference on Big Data (Big Data) > 821 - 826

Recommendation systems play an important role in suggesting relevant information to users. In this paper, we introduce community-wise social interactions as a new dimension for recommendations and present a social recommendation system using collaborative filtering and community detection approaches. We use (i) a community detection algorithm to extract friendship relations among users by analyzing...

chapter

CINTIA: A distributed, low-latency index for big interval data

Ruslan Mavlyutov, Philippe Cudre-Mauroux

2015 IEEE International Conference on Big Data (Big Data) > 619 - 628

Intervals have become prominent in data management as they are the main data structure to represent a number of key data types such as temporal or genomic data. Yet, there exists no solution to compactly store and efficiently query big interval data. In this paper we introduce CINTIA — the Checkpoint INTerval Index Array — an efficient data structure to store and query interval data, which achieves...

chapter

Parallel meta-blocking: Realizing scalable entity resolution over large, heterogeneous data

Vasilis Efthymiou, George Papadakis, George Papastefanatos, Kostas Stefanidis, et al.

2015 IEEE International Conference on Big Data (Big Data) > 411 - 420

Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. Typically, it scales to large volumes of data through blocking: similar entities are clustered into blocks so that it suffices to perform comparisons only within each block. Meta-blocking further increases efficiency by cleaning the overlapping blocks from unnecessary comparisons. However,...

chapter

A comprehensive evaluation of NoSQL datastores in the context of historians and sensor data analysis

Arun Kumar Kalakanti, Vinay Sudhakaran, Varsha Raveendran, Nisha Menon

2015 IEEE International Conference on Big Data (Big Data) > 1797 - 1806

Data historians[1] are today transitioning from their traditional role as record-keepers and planners, to tools that provide the required flexibility and responsiveness to customers' requirements in terms of the type and volume of data stored, archived and queried. Added dimensions to these requirements are the need for high performance and scalability. Businesses are realizing that traditional database...

chapter

MHT: A light-weight scalable zero-hop MPI enabled distributed key-value store

Xiaobing Zhou, Tonglin Li, Ke Wang, Dongfang Zhao, et al.

2015 IEEE International Conference on Big Data (Big Data) > 2901 - 2903

In this paper, we propose and implement a key-value store that supports MPI while allowing application access at any time without being declared in the same MPI communication world. This feature may significantly simplify the application design and allow programmers to leverage the power of a key-value store in an intuitive way. In our preliminary experimental results captured from a supercomputer at...

chapter

Dimensional scalability of supervised and unsupervised concept drift detection: An empirical study

Jorge David Destephen Lavaire, Anshuman Singh, Mahmoud Yousef, Sumi Singh, et al.

2015 IEEE International Conference on Big Data (Big Data) > 2212 - 2218

Big Data presents challenges for predictive analytic algorithms due to the possibility of non-stationary populations. Concept drift detection algorithms can be used to detect changes in the underlying distribution so that models can be retrained. Most concept drift detection methods are known to scale only to a relatively low number of features (a few hundred). However, in many areas, datasets with thousands or even tens...

chapter

Workload-driven adaptive data partitioning and distribution — The Cumulus approach

Ilir Fetai, Damian Murezzan, Heiko Schuldt

2015 IEEE International Conference on Big Data (Big Data) > 1688 - 1697

Cloud environments usually feature several geographically distributed data centers. In order to increase the scalability of applications, many Cloud providers partition data and distribute these partitions across data centers to balance the load. However, if the partitions are not carefully chosen, it might lead to distributed transactions. This is particularly expensive when applications require...

chapter

ACURDION: An adaptive clustering-based algorithm for tracing large-scale MPI applications

Amir Bahmani, Frank Mueller

2015 IEEE International Conference on Big Data (Big Data) > 785 - 792

Communication traces help developers of high-performance computing (HPC) applications understand and improve their codes. When run on large-scale HPC facilities, the scalability of tracing tools becomes a challenge. To address this problem, traces can be clustered into groups of processes that exhibit similar behavior. Instead of collecting trace information for each individual node, it then suffices...

chapter

Employing in-memory data grids for distributed graph processing

Serafettin Tasci, Murat Demirbas

2015 IEEE International Conference on Big Data (Big Data) > 1856 - 1864

The in-memory data grid (IMDG) is a new technology that enables scalable and low-latency processing of big data by sharding it over the RAM of multiple servers. In this paper, we explore the design space of IMDGs to identify their advantages and avoid their drawbacks. We present the performance tradeoffs of IMDGs using unit tests on core distributed operations and data structures. For evaluation, we...

chapter

Scalable storage structure for pattern matching on big graph data

Janani Balaji, Rajshekhar Sunderraman

2015 IEEE International Conference on Big Data (Big Data) > 1848 - 1855

The wide popularity of graphs in areas such as the Semantic Web and social networks has necessitated the development of efficient methods to store and process graph data. However, the unique structure of graphs renders traditional data handling methods and storage structures inefficient when dealing with large volumes of data. Existing graph storage structures either compromise scalability by adopting...

chapter

Spark deployment and performance evaluation on the MareNostrum supercomputer

Ruben Tous, Anastasios Gounaris, Carlos Tripiana, Jordi Torres, et al.

2015 IEEE International Conference on Big Data (Big Data) > 299 - 306

In this paper we present a framework to enable data-intensive Spark workloads on MareNostrum, a petascale supercomputer designed mainly for compute-intensive applications. As far as we know, this is the first attempt to investigate optimized deployment configurations of Spark on a petascale HPC setup. We detail the design of the framework and present some benchmark data to provide insights into the...

chapter

Panopticon: A lock broker architecture for scalable transactions in the datacenter

Serafettin Tasci, Murat Demirbas

2015 IEEE International Conference on Big Data (Big Data) > 253 - 262

For datacenter applications that require tight synchronization, transactions are commonly employed for achieving concurrency while preserving correctness. Unfortunately, distributed transactions are hard to scale due to the decentralized lock acquisition and coordination protocols they employ. We investigate the use of a centralized lock broker architecture to improve the efficiency/scalability for...

chapter

Machine learning at the limit

John Canny, Huasha Zhao, Bobby Jaros, Ye Chen, et al.

2015 IEEE International Conference on Big Data (Big Data) > 233 - 242

Many systems have been developed for machine learning at scale. Performance has steadily improved, but there has been relatively little work on explicitly defining or approaching the limits of performance. In this paper we describe the application of roofline design, an approach borrowed from computer architecture, to large-scale machine learning. In roofline design, one exposes ALU, memory, and network...

chapter

QueRIE reloaded: Using matrix factorization to improve database query recommendations

Magdalini Eirinaki, Sweta Patel

2015 IEEE International Conference on Big Data (Big Data) > 1500 - 1508

Interactive database exploration is a key task in information mining. Relational databases have long been used as a critical infrastructure component to access and analyze large volumes of data in a variety of applications, including ad-hoc analytics over big data, large-scale data warehouses that support business-intelligence tools, and services for scientific-data exploration. To aid the users of...
