2016 IEEE International Conference on Big Data (Big Data)

chapter

Analyzing the performance of data replication and data partitioning in the cloud: The BEOWULF approach

Alexander Stiemer, Ilir Fetai, Heiko Schuldt

2016 IEEE International Conference on Big Data (Big Data) > 2837 - 2846

Applications deployed in the Cloud usually come with dedicated performance and availability requirements. These can be met by replicating data across several sites and/or by partitioning data. Data replication allows read requests to be parallelized and thus decreases data access latency, but it induces significant overhead for the synchronization of updates. Partitioning, in contrast, is highly beneficial...

chapter

Bad big data science

Frank S. Haug

2016 IEEE International Conference on Big Data (Big Data) > 2863 - 2871

As hardware and software technologies have improved, our definition of a “manageable amount of data” has increased in its scope dramatically. The term “big data” can be applied to any of several different projects and technologies sharing the ultimate goal of supporting analysis on these large, heterogeneous, and evolving data sets. The term “data science” refers to the statistical, technical, and...

chapter

Digree: A middleware for a graph databases polystore

Vasilis Spyropoulos, Christina Vasilakopoulou, Yannis Kotidis

2016 IEEE International Conference on Big Data (Big Data) > 2580 - 2589

In this paper we present Digree, an experimental middleware system that can execute graph pattern matching queries over databases hosting voluminous graph datasets. First, we formally present the employed data model and the processes of re-writing a query into an equivalent set of subqueries and subsequently combining the partial results into the final result set. Our framework guarantees the correctness...

chapter

Next-gen tools for big scientific data: ARM data center example

Ranjeet Devarakonda, Kyle Dumas, Sherman Beus, Everett Rush, et al.

2016 IEEE International Conference on Big Data (Big Data) > 3968 - 3970

The Atmospheric Radiation Measurement (ARM) Climate Research Facility (www.arm.gov) provides atmospheric observations from diverse climatic regimes around the world. Currently, ARM archives over 22 million user-accessible data files, primarily stored in NetCDF file format, with total data volumes close to one petabyte. In this paper, we will discuss how ARM is currently storing, distributing, cataloging...

chapter

Exploring the utilization of places through a scalable “Activities in Places” analysis mechanism

Linlin You, Bige Tuncer

2016 IEEE International Conference on Big Data (Big Data) > 3563 - 3572

People sensing data have been successfully utilized in various domains to support more livable places with on-demand transport systems, a green environment, a profitable economy and interactive governance; however, their potential for supporting the design of places has not been widely studied and explained. As an on-going multidisciplinary project in Singapore, “Livable Places” mines valuable insights from...

chapter

An experimental study of big spatial data systems

Andrew Hulbert, Thomas Kunicki, James N. Hughes, Anthony D. Fox, et al.

2016 IEEE International Conference on Big Data (Big Data) > 2664 - 2671

With the rise of location-aware IoT devices, there is an increased desire to process and manage the stationary and moving trajectory data generated by these real-time sensors. There has been a corresponding evolution of distributed database and compute technology to handle the increasing data load. Here we describe challenges in managing this kind of data and survey the technologies that address those...

chapter

Data-at-rest security for Spark

Syed Yousaf Shah, Brent Paulovicks, Petros Zerfos

2016 IEEE International Conference on Big Data (Big Data) > 1464 - 1473

Apache Spark enables fast computations and greatly accelerates analytics applications by efficiently utilizing the main memory and caching data for later use. At its core, Apache Spark uses data structures called RDDs (Resilient Distributed Datasets) to give a unified view of the distributed data. However, the data represented in the RDDs remain unencrypted, which can result in leakage of confidential...

chapter

Distributed rank-1 dictionary learning: Towards fast and scalable solutions for fMRI big data analytics

Milad Makkie, Xiang Li, Tianming Liu, Shannon Quinn, et al.

2016 IEEE International Conference on Big Data (Big Data) > 3396 - 3403

The use of functional brain imaging for research and diagnosis has benefitted greatly from the recent advancements in neuroimaging technologies, as well as the explosive growth in size and availability of fMRI data. While it has been shown in the literature that using multiple large-scale fMRI datasets can improve reproducibility and lead to new discoveries, the computational and informatics systems...

chapter

Security and privacy for big data: A systematic literature review

Boel Nelson, Tomas Olovsson

2016 IEEE International Conference on Big Data (Big Data) > 3693 - 3702

Big data is currently a hot research topic, with four million hits on Google Scholar in October 2016. One reason for the popularity of big data research is the knowledge that can be extracted from analyzing these large data sets. However, data can contain sensitive information, and data must therefore be sufficiently protected as it is stored and processed. Furthermore, it might also be required to...

chapter

Addressing the big-earth-data variety challenge with the hierarchical triangular mesh

Michael L. Rilee, Kwo-Sen Kuo, Thomas Clune, Amidu Oloso, et al.

2016 IEEE International Conference on Big Data (Big Data) > 1006 - 1011

We have implemented an updated Hierarchical Triangular Mesh (HTM) as the basis for a unified data model and an indexing scheme for geoscience data to address the variety challenge of Big Earth Data. In the absence of variety, the volume challenge of Big Data is relatively easily addressable with parallel processing. The more important challenge in achieving optimal value with a Big Data solution for...

chapter

RADII: Bridging the divide between data and infrastructure management to support data-driven collaborations

Fan Jiang, Claris Castillo, Charles Schmitt

2016 IEEE International Conference on Big Data (Big Data) > 370 - 377

We have witnessed a dramatic increase in national cyberinfrastructure resources to support data-driven research. Orchestrating these resources to enable the creation of collaborative infrastructure capable of supporting data intensive activities is challenging. In this work we present RADII, a novel architecture and system that enables the provisioning and configuration of collaborative infrastructure...

chapter

Robust K-subspaces recovery with combinatorial initialization

Jun He, Yue Zhang, Jiye Wang, Nan Zeng, et al.

2016 IEEE International Conference on Big Data (Big Data) > 3573 - 3582

In this paper we propose a two-stage algorithm for robust K-subspaces recovery. In the first stage, a large number of local candidate subspaces are generated by probabilistic farthest insertion, and then the initial near-optimal K-subspaces are solved by combinatorial selection with a randomized greedy method. In the second stage, the K-subspaces are further refined by assigning each data vector to...

chapter

Large-scale text processing pipeline with Apache Spark

A. Svyatkovskiy, K. Imai, M. Kroeger, Y. Shiraito

2016 IEEE International Conference on Big Data (Big Data) > 3928 - 3935

In this paper, we evaluate Apache Spark for a data-intensive machine learning problem. Our use case focuses on policy diffusion detection across the state legislatures in the United States over time. Previous work on policy diffusion has been unable to make an all-pairs comparison between bills due to computational intensity. As a substitute, scholars have studied single topic areas. We provide an...

chapter

Linked data platform for building cloud-based smart applications and connecting API access points with data discovery techniques

Holly Ferguson, Charles Vardeman, Jarek Nabrzyski

2016 IEEE International Conference on Big Data (Big Data) > 3016 - 3025

Globalization and cloud computing have allowed major strides forward in terms of communication possibilities, but they also illuminate how many different resource options and formats exist, access to which would dramatically increase the accuracy and reliability of choices made as a result of computational output. As a result, there is an increasing need for methods resolving levels of data translations...

chapter

EStore: An effective optimized data placement structure for Hive

Xin Li, Hui Li, Zhihao Huang, Bing Zhu, et al.

2016 IEEE International Conference on Big Data (Big Data) > 2996 - 3001

The data warehouse system Hive has emerged as an important facility for supporting data computing and storage. In particular, RCFile is a tailor-made data placement structure implemented in Hive, which is designed for data processing efficiency. In this paper, we propose several optimized schemes based on RCFile and introduce EStore, which is an optimized data placement structure that is able...

chapter

CoLoc: Distributed data and container colocation for data-intensive applications

Thomas Renner, Lauritz Thamsen, Odej Kao

2016 IEEE International Conference on Big Data (Big Data) > 3008 - 3015

The performance of scalable analytic frameworks supporting data-intensive parallel applications often depends significantly on the time it takes to read input data. Therefore, existing frameworks like Spark and Flink try to achieve a high degree of data locality by scheduling tasks on nodes where the input data resides. However, the set of nodes running a job and its tasks is chosen by a cluster resource...

chapter

Is elasticity of scalable databases a Myth?

Daniel Seybold, Nicolas Wagner, Benjamin Erb, Jörg Domaschka

2016 IEEE International Conference on Big Data (Big Data) > 2827 - 2836

The age of cloud computing has introduced all the mechanisms needed to elastically scale distributed, cloud-enabled applications. At roughly the same time, NoSQL databases have been proclaimed as the scalable alternative to relational databases. Since then, NoSQL databases have become a core component of many large-scale distributed applications. This paper evaluates the scalability and elasticity features...

chapter

Accessing and distributing large volumes of NetCDF data

Ranjeet Devarakonda, Yaxing Wei, Michele Thornton

2016 IEEE International Conference on Big Data (Big Data) > 3966 - 3967

In this paper, we will discuss how NASA's Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) is distributing large volumes of ‘structured’ data using Daily Surface Weather Data and a corresponding Climatological Summaries Dataset (Daymet) as an example.

chapter

Scalable genomics: From raw data to aligned reads on Apache YARN

Francesco Versaci, Luca Pireddu, Gianluigi Zanetti

2016 IEEE International Conference on Big Data (Big Data) > 1232 - 1241

The adoption of Big Data technologies can potentially boost the scalability of data-driven biology and health workflows by orders of magnitude. Consider, for instance, that technologies in the Hadoop ecosystem have been successfully used in data-driven industry to scale their processes to levels much larger than any biological- or health-driven work attempted thus far. In this work we demonstrate the...

chapter

Parallel clustering method for non-disjoint partitioning of large-scale data based on Spark framework

Abir Zayani, Chiheb-Eddine Ben N'Cir, Nadia Essoussi

2016 IEEE International Conference on Big Data (Big Data) > 1064 - 1069

Clustering large-scale data has become an important challenge which motivates several recent works. While the emphasis has been on the organization of massive data into disjoint groups, this work considers the identification of non-disjoint groups rather than disjoint ones. In this setting, it is possible for a data object to belong simultaneously to several groups since many real-world applications...
