Understanding Software Platforms for In-Memory Scientific Data Analysis: A Case Study of the Spark System

Xuechen Zhang; Ujjwal Khanal; Xinghui Zhao; Stephen Ficklin

doi:10.1109/ICPADS.2016.0149

Source

2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS), pp. 1135-1144

Abstract

Over the last five years, Apache Spark has become a major software platform for in-memory data analysis. Acknowledging its widespread use, we present a comprehensive study of system characteristics of Spark with a focus on scientific data analytics performing large-scale matrix operations. We compare its performance to SciDB, a disk-based platform for array data analysis. A benchmark, ArrayBench, is developed to evaluate the performance of matrix processing for scientific data analytics. ArrayBench is applied to data from a real biological workflow whose data inputs are in matrix form. Herein, we report the findings, which shed light on the improvement of Spark and SciDB and future development of large-scale scientific data analytics.

Identifiers

ISSN: 1521-9097
e-ISBN: 978-1-5090-4457-3
DOI: 10.1109/ICPADS.2016.0149

Keywords

Scientific Data Analytics; Spark; In-memory

Additional information

Data set: ieee

Publisher

IEEE
