Xi Yang

chapter

Visualization and Adaptive Subsetting of Earth Science Data in HDFS: A Novel Data Analysis Strategy with Hadoop and Spark

Xi Yang, Si Liu, Kun Feng, Shujia Zhou, more

2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom) > 89 - 96

2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom)

Data analytics becomes increasingly important in big data applications. Adaptively subsetting large amounts of data to extract the interesting events such as the centers of hurricane or thunderstorm, statistically analyzing and visualizing the subset data, is an effective way to analyze ever-growing data. This is particularly crucial for analyzing Earth Science data, such as extreme weather. The Hadoop...

chapter

Dominoes: Speculative Repair in Erasure-Coded Hadoop System

Xi Yang, Chen Feng, Zhiwei Xu, Xian-He Sun

2015 IEEE 22nd International Conference on High Performance Computing (HiPC) > 366 - 375

2015 IEEE 22nd International Conference on High Performance Computing (HiPC)

Data volume grows dramatically in the era of big data. To save capital cost on storage hardware, datacenters currently prefer using erasure coding rather than simply replication to resist data loss. Erasure coding can provide equivalent three-way fault tolerance to HDFS's default three replication mechanism but degrades data availability for task scheduling. In an erasure-coded system, data reconstruction...

chapter

IC-Data: Improving Compressed Data Processing in Hadoop

Adnan Haider, Xi Yang, Ning Liu, Xian-He Sun, more

2015 IEEE 22nd International Conference on High Performance Computing (HiPC) > 356 - 365

2015 IEEE 22nd International Conference on High Performance Computing (HiPC)

As dataset sizes for data analytic applications and scientific applications running on Hadoop increases, data compression has become essential to store this data within a reasonable storage cost. Although data is often stored compressed, currently Hadoop takes 49% longer to process compressed data compared to uncompressed data. Processing compressed data reduces the amount of task parallelism and...

chapter

A Hadoop-based visualization and diagnosis framework for earth science data

Shujia Zhou, Xi Yang, Xiaowen Li, Toshihisa Matsui, more

2015 IEEE International Conference on Big Data (Big Data) > 1972 - 1977

2015 IEEE International Conference on Big Data (Big Data)

With rapidly growing computing power, ultra high-resolution Earth science simulations with a long period of time are feasible. However, it is still very challenging to distribute and analyze a huge amount of simulation results, which could be over 100TB. One key reason is that typical Earth science data are represented in NetCDF, which is not supported by the popular and powerful Hadoop Distribute...

chapter

PortHadoop: Support direct HPC data processing in Hadoop

Xi Yang, Ning Liu, Bo Feng, Xian-He Sun, more

2015 IEEE International Conference on Big Data (Big Data) > 223 - 232

2015 IEEE International Conference on Big Data (Big Data)

The success of the Hadoop MapReduce programming model has greatly propelled research in big data analytics. In recent years, there is a growing interest in the High Performance Computing (HPC) community to use Hadoop-based tools for processing scientific data. This interest is due to the facts that data movement becomes prohibitively expensive, highperformance data analytic becomes an important part...

chapter

IOSIG+: On the Role of I/O Tracing and Analysis for Hadoop Systems

Bo Feng, Xi Yang, Kun Feng, Yanlong Yin, more

2015 IEEE International Conference on Cluster Computing > 62 - 65

2015 IEEE International Conference on Cluster Computing (CLUSTER)

Hadoop, as one of the most widely accepted MapReduce frameworks, is naturally data-intensive. Its several dependent projects, such as Mahout and Hive, inherent this characteristic. Meanwhile I/O optimization becomes a daunting work, since applications' source code is not always available. I/O traces for Hadoop and its dependents are increasingly important, because it can faithfully reveal intrinsic...

INFONA - science communication portal

Search results for: Xi Yang

Visualization and Adaptive Subsetting of Earth Science Data in HDFS: A Novel Data Analysis Strategy with Hadoop and Spark

Dominoes: Speculative Repair in Erasure-Coded Hadoop System

IC-Data: Improving Compressed Data Processing in Hadoop

A Hadoop-based visualization and diagnosis framework for earth science data

PortHadoop: Support direct HPC data processing in Hadoop

IOSIG+: On the Role of I/O Tracing and Analysis for Hadoop Systems

Filter options

Publication date

Keywords

INFONA - science communication portal

Search results for: Xi Yang

Visualization and Adaptive Subsetting of Earth Science Data in HDFS: A Novel Data Analysis Strategy with Hadoop and Spark

Dominoes: Speculative Repair in Erasure-Coded Hadoop System

IC-Data: Improving Compressed Data Processing in Hadoop

A Hadoop-based visualization and diagnosis framework for earth science data

PortHadoop: Support direct HPC data processing in Hadoop

IOSIG+: On the Role of I/O Tracing and Analysis for Hadoop Systems

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options