The in-memory data processing framework Apache Spark has been stealing the limelight for low-latency interactive applications and for iterative and batch computations. Our early experience study [17] has shown that Apache Spark can be enhanced to leverage advanced features (e.g., RDMA) of high-performance networks (e.g., InfiniBand and RoCE) to improve the performance of the shuffle phase. With the fast evolving...
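To make the shuffle phase concrete, the sketch below shows hash partitioning, the step that routes each map-side record to a reduce partition; this is illustrative pure Python, not Spark's actual implementation, and transferring the resulting partitions between nodes is where RDMA-style network optimizations would apply.

```python
def hash_partition(records, num_partitions):
    """Group (key, value) records into reduce-side partitions by key hash."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        # All records sharing a key land in the same partition,
        # so each reducer sees every value for its keys.
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = hash_partition(records, 2)
```

In a distributed engine, each partition's buffered records are then serialized and sent over the network to the node running the corresponding reducer, which is why network transport dominates shuffle cost.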
Distributed communities of researchers rely increasingly on valuable, proprietary, or sensitive datasets. Given the growth of such data, especially in fields new to data-driven research like the social sciences and humanities, coupled with what are often strict and complex data-use agreements, many research communities now require methods that allow secure, scalable and cost-effective storage and...
We have witnessed a dramatic increase in national cyberinfrastructure resources to support data-driven research. Orchestrating these resources to enable the creation of collaborative infrastructure capable of supporting data-intensive activities is challenging. In this work we present RADII, a novel architecture and system that enables the provisioning and configuration of collaborative infrastructure...
Large-scale scientific applications are often expressed as workflows that help define the data dependencies between their different components. Several such workflows have huge storage and computation requirements, and so they need to be processed across multiple (cloud-federated) datacenters. It has been shown that efficient metadata handling plays a key role in the performance of computing systems. However,...
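A workflow of this kind can be sketched as a DAG of data dependencies; the task names below are invented for illustration, and the ordering step stands in for what a scheduler (or metadata service) must resolve before placing tasks across datacenters.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks whose
# outputs it consumes (its predecessors in the DAG).
workflow = {
    "preprocess": set(),
    "simulate": {"preprocess"},
    "analyze": {"simulate"},
    "visualize": {"analyze", "preprocess"},
}

# A valid execution order: every task appears after its dependencies.
order = list(TopologicalSorter(workflow).static_order())
```

Metadata handling enters exactly here: resolving which outputs exist, where they are stored, and which datacenter should run each task requires fast lookups over this dependency structure.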
In this paper, we pose and address some of the unique challenges in the analysis of scientific Big Data on supercomputing platforms. Our approach identifies, implements and scales numerical kernels that are critical to the instantiation of theory-inspired analytic workflows on modern computing architectures. We present the benefits of scalable kernels towards constructing algorithms such as principal...
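One example of a numerical kernel of the kind the abstract describes is power iteration, which finds the dominant eigenvector of a matrix and underlies principal component analysis; this is an illustrative pure-Python sketch, not the paper's implementation, and a scalable version would distribute the matrix-vector products.

```python
def power_iteration(matrix, steps=100):
    """Return the dominant eigenvector of a square matrix (up to sign)."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(steps):
        # Matrix-vector product: the kernel to parallelize at scale.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

A = [[2.0, 0.0], [0.0, 1.0]]
v = power_iteration(A)  # converges toward the eigenvector [1, 0]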
Scaling up scientific data analysis and machine learning algorithms for data-driven discovery is a grand challenge that we face today. Despite the growing need for analysis from science domains that are generating 'Big Data' from instruments and simulations, building high-performance analytical workflows of data-intensive algorithms has been daunting because: (i) the 'Big Data' hardware and software...
Natural Language Processing (NLP) constitutes a fundamental module for a plethora of domains where unstructured text is a predominant source. Despite the keen interest of both industry and the research community in developing NLP tools, current industrial solutions still suffer from two main drawbacks. First, the architectures underlying existing systems do not satisfy critical requirements of large-scale...
With no limit on time and location [1], the number of users attracted by massive open online courses (MOOCs) has increased rapidly, and many platforms have been built to provide a variety of courses. All of this has triggered explosive growth in data volume. As is well known, big data has emerged in many areas, and many techniques and methods have been proposed to deal with it. However, people still have no sense...
As social media has become increasingly popular in the modern world, people are using these platforms to express their opinions about products, businesses, and services. The need for categorizing these consumer reviews has become prominent. One effective solution is sentiment analysis (SA), which has been an active research topic. The goal of SA is to automatically extract and classify user opinions...
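To make the extract-and-classify goal concrete, here is a minimal lexicon-based sketch; the word lists are invented for illustration, and real SA systems use learned models rather than fixed lexicons.

```python
# Illustrative sentiment lexicons (not from any published SA system).
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def classify_sentiment(review):
    """Label a review by counting positive vs. negative lexicon hits."""
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

classify_sentiment("great service love it")  # → "positive"
```

Even this toy version shows the two steps the abstract names: extracting opinion-bearing words and classifying their aggregate polarity.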
Analyzing and visualizing large datasets generated by real-time spatio-temporal activities (e.g., vehicle mobility or large crowd movement) is a very challenging task. Recursive delays at both the middleware and the front-end applications limit the usefulness of real-time analysis. In this paper, we present a framework, "Spatial-Crowd", that first handles spatio-temporal data acquisition and processing...
Workload characterization of Big Data applications has always been a challenging research problem. Big data applications often place high demands on multiple computing components in concert, such as storage, memory, network, and processors, and their performance characteristics evolve with the scale of the workload. To further complicate the problem, the increasing diversity of hardware technologies...
Intel Xeon Phi is a processor based on MIC architecture that contains a large number of compute cores with a high local memory bandwidth and 512-bit vector processing units. To achieve high performance on Xeon Phi, it is important for programmers to explore all the software features provided by the Intel compiler and libraries to fully utilize the new hardware resources. In this paper, we use the...
Data-driven science, accompanied by the explosion of petabytes of data, has created a need for dedicated analytics computing resources. Dedicated analytics clusters require large capital outlays due to their expensive hardware requirements. Additionally, if such resources are located far from the data they analyze, they also incur substantial data transfers, which have both cost and latency implications...
We present a study of scientific data analytics on heterogeneous architectures using the Legion runtime system. Legion is a new programming model and runtime system targeting distributed heterogeneous architectures. It introduces logical regions as a new abstraction for describing the structures and usages of program data. We describe how to leverage logical regions to express important properties...
Highly distributed applications dominate today's software industry, posing new challenges for novel software architectures capable of supporting real-time processing and analytics. The proposed framework, called REAXICS, is motivated by the fact that the demand for aggregating current and past big data streams requires new software methodologies, platforms, and services. The proposed framework is...
The Polystore architecture revisits the federated approach to accessing and querying standalone, independent databases in a uniform and optimized fashion, but this time in the context of heterogeneous data and specialized analyses. In light of this architectural philosophy, and of the major data architecture development efforts at the US Department of Veterans Affairs (VA),...
Polystores, i.e., data management systems that use multiple stores for different data models, are gaining popularity. We are developing a polystore-based system called AWESOME to support social data analytics. The AWESOME polystore can support relational, semistructured, graph and text data and houses a Spark computation engine to produce derived data during ingestion. ADIL, the data ingestion language...
Spatial-temporal computing refers to the modeling, management, and analysis of spatial and temporal information. Despite the recent advances in massive data manipulation, software system approaches that support the massive spatial-temporal data integration and analysis still face numerous challenges, including the lack of: (i) a high-level architectural framework for massive data integration and analysis;...
The data science skills shortage means that those who have the knowledge are under constant pressure to do more with less. While data science tools are improving at a staggering pace, the operational tools around them cannot keep up. Even researchers at Google state that the issue of automatic configuration and dependency management of services is still an "open, hard problem". This manifests...
The age of cloud computing has introduced all the mechanisms needed to elastically scale distributed, cloud-enabled applications. At roughly the same time, NoSQL databases have been proclaimed as the scalable alternative to relational databases. Since then, NoSQL databases are a core component of many large-scale distributed applications. This paper evaluates the scalability and elasticity features...
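The horizontal scalability that many NoSQL databases claim typically rests on consistent hashing, sketched below; this is a hedged illustration of the general technique, not the partitioning code of any particular store, and the node names are invented.

```python
import bisect
import hashlib

def _ring_hash(value):
    """Deterministic hash placing a string on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Map keys to nodes so that adding a node remaps only the keys
    on its ring segment, not the whole dataset."""

    def __init__(self, nodes):
        self._ring = sorted((_ring_hash(n), n) for n in nodes)

    def node_for(self, key):
        hashes = [h for h, _ in self._ring]
        # First node clockwise from the key's position (wrap around).
        idx = bisect.bisect(hashes, _ring_hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # same key always maps to the same node
```

This key-to-node stability under membership changes is what lets such systems scale elastically, which is precisely the property a scalability and elasticity evaluation would measure.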