The success of the Hadoop MapReduce programming model has greatly propelled research in big data analytics. In recent years, there has been growing interest in the High Performance Computing (HPC) community in using Hadoop-based tools to process scientific data. This interest stems from three facts: data movement has become prohibitively expensive, high-performance data analytics has become an important part of HPC, and Hadoop-based tools can perform large-scale data processing in a time- and budget-efficient manner. In this study, we propose PortHadoop, an enhanced Hadoop architecture that enables MapReduce applications to read data directly from HPC parallel file systems (PFS). PortHadoop saves HDFS storage space and, more importantly, avoids the otherwise costly data copying. PortHadoop preserves all the semantics of the original Hadoop system and of PFS; therefore, Hadoop MapReduce applications can run on PortHadoop without code changes, the only difference being that the input files reside in PFS rather than HDFS. Our experimental results show that PortHadoop operates effectively and efficiently with the PVFS2 and Ceph file systems.