Xi Yang

chapter

Visualization and Adaptive Subsetting of Earth Science Data in HDFS: A Novel Data Analysis Strategy with Hadoop and Spark

Xi Yang, Si Liu, Kun Feng, Shujia Zhou, et al.

2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom) > 89 - 96

Data analytics is becoming increasingly important in big data applications. Adaptively subsetting large amounts of data to extract interesting events, such as the centers of hurricanes or thunderstorms, and then statistically analyzing and visualizing the subset data is an effective way to analyze ever-growing data. This is particularly crucial for analyzing Earth Science data, such as extreme weather. The Hadoop...

chapter

A Hadoop-based visualization and diagnosis framework for earth science data

Shujia Zhou, Xi Yang, Xiaowen Li, Toshihisa Matsui, et al.

2015 IEEE International Conference on Big Data (Big Data) > 1972 - 1977

With rapidly growing computing power, ultra-high-resolution Earth science simulations covering long periods of time are feasible. However, it is still very challenging to distribute and analyze the huge volume of simulation results, which can exceed 100 TB. One key reason is that typical Earth science data are represented in NetCDF, which is not supported by the popular and powerful Hadoop Distribute...

chapter

PortHadoop: Support direct HPC data processing in Hadoop

Xi Yang, Ning Liu, Bo Feng, Xian-He Sun, et al.

2015 IEEE International Conference on Big Data (Big Data) > 223 - 232

The success of the Hadoop MapReduce programming model has greatly propelled research in big data analytics. In recent years, there has been growing interest in the High Performance Computing (HPC) community in using Hadoop-based tools for processing scientific data. This interest arises because data movement is becoming prohibitively expensive, high-performance data analytics is becoming an important part...

chapter

Overcoming Hadoop Scaling Limitations through Distributed Task Execution

Ke Wang, Ning Liu, Iman Sadooghi, Xi Yang, et al.

2015 IEEE International Conference on Cluster Computing (CLUSTER) > 236 - 245

Data-driven programming models like MapReduce have gained popularity in large-scale data processing. Although great efforts through the Hadoop implementation and framework decoupling (e.g., YARN, Mesos) have allowed Hadoop to scale to tens of thousands of commodity cluster processors, the centralized designs of the resource manager, task scheduler, and metadata management of the HDFS file system adversely...

chapter

IOSIG+: On the Role of I/O Tracing and Analysis for Hadoop Systems

Bo Feng, Xi Yang, Kun Feng, Yanlong Yin, et al.

2015 IEEE International Conference on Cluster Computing (CLUSTER) > 62 - 65

Hadoop, as one of the most widely accepted MapReduce frameworks, is naturally data-intensive. Several projects that depend on it, such as Mahout and Hive, inherit this characteristic. Meanwhile, I/O optimization becomes a daunting task, since applications' source code is not always available. I/O traces for Hadoop and its dependents are increasingly important because they can faithfully reveal intrinsic...

chapter

YARNsim: Simulating Hadoop YARN

Ning Liu, Xi Yang, Xian-He Sun, Johnathan Jenkins, et al.

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) > 637 - 646

Despite the popularity of the Apache Hadoop system, its success has been limited by issues such as single points of failure, centralized job/task management, and lack of support for programming models other than MapReduce. The next generation of Hadoop, Apache Hadoop YARN, is designed to address these issues. In this paper, we propose YARNsim, a simulation system for Hadoop YARN. YARNsim is based...

chapter

ADAPT: Availability-Aware MapReduce Data Placement for Non-dedicated Distributed Computing

Hui Jin, Xi Yang, Xian-He Sun, Ioan Raicu

2012 IEEE 32nd International Conference on Distributed Computing Systems (ICDCS) > 516 - 525

The MapReduce programming paradigm has recently been gaining popularity due to its ease of programming, data distribution, and fault tolerance. The low barrier to adoption of MapReduce makes it a promising framework for non-dedicated distributed computing environments. However, the variability of host resources and availability could substantially degrade the performance of MapReduce...

INFONA - science communication portal

Search results for: Xi Yang
