Planning Your SQL-on-Hadoop Deployment Using a Low-Cost Simulation-Based Approach

Jun Liu; Bianny Bian; Samantika Subramaniam Sury

doi:10.1109/SBAC-PAD.2016.31

The term "SQL-on-Hadoop" has recently gained significant traction [19]. Impala represents an emerging class of SQL-on-Hadoop systems that exploit a shared-nothing parallel database architecture over Hadoop. Impala was designed to close the gap in near-real-time data analytics on the Hadoop stack, and it has shown itself to be significantly more efficient than other SQL-on-Hadoop solutions [13]. However, leveraging Impala to handle queries with different business demands is not a trivial task [12]. An improperly deployed Impala cluster may not deliver the expected performance. In this paper, we propose a novel Impala simulation framework to help IT professionals understand its performance behavior. This would simplify the deployment planning work required to enable big data analytics on SQL-on-Hadoop systems. The Impala simulator models the behavior of a complete software stack and simulates the activities of cluster components such as storage, network, processors and memory. Moreover, the accuracy of the simulation remains high in response to both software configuration and hardware changes, and it reflects the expected scaling trend with low cost overhead and fast simulation speed. The Impala simulator has been validated against various software and hardware configurations using the well-known TPC-DS benchmark [15], and the simulation results are valid and as expected. A use case is provided to show how the simulator can be used to resolve performance and deployment issues.

book e-ISBN: 978-1-5090-6108-2
