Wei Tang

article

Improving Batch Scheduling on Blue Gene/Q by Relaxing Network Allocation Constraints

Zhou Zhou, Xu Yang, Zhiling Lan, Paul Rich, et al.

IEEE Transactions on Parallel and Distributed Systems > 2016 > 27 > 11 > 3269 - 3282

As systems scale toward exascale, many resources will become increasingly constrained. While some of these resources have historically been explicitly allocated, many—such as network bandwidth, I/O bandwidth, or power—have not. As systems continue to evolve, we expect many such resources to become explicitly managed. This change will pose critical challenges to resource management and job scheduling...

chapter

I/O-Aware Batch Scheduling for Petascale Computing Systems

Zhou Zhou, Xu Yang, Dongfang Zhao, Paul Rich, et al.

2015 IEEE International Conference on Cluster Computing (CLUSTER) > 254 - 263

In the Big Data era, the gap between the storage performance and an application's I/O requirement is increasing. I/O congestion caused by concurrent storage accesses from multiple applications is inevitable and severely harms the performance. Conventional approaches either focus on optimizing an application's access pattern individually or handle I/O requests on a low-level storage layer without any...

chapter

Data-Aware Resource Scheduling for Multicloud Workflows: A Fine-Grained Simulation Approach

Wei Tang, Jonathan Jenkins, Folker Meyer, Robert Ross, et al.

2014 IEEE 6th International Conference on Cloud Computing Technology and Science (CloudCom) > 887 - 892

Cloud infrastructures have seen increasing popularity for addressing the growing computational needs of today's scientific and engineering applications. However, resource management challenges exist in the elastic cloud environment, such as resource provisioning and task allocation, especially when data movement between multiple domains plays an important role. In this work, we study the impact of...

chapter

Balancing job performance with system performance via locality-aware scheduling on torus-connected systems

Xu Yang, Zhou Zhou, Wei Tang, Xingwu Zheng, et al.

2014 IEEE International Conference on Cluster Computing (CLUSTER) > 140 - 148

Torus-connected networks are widely used in modern supercomputers due to their linear per-node cost scaling and competitive overall performance. The job scheduling system plays a critical role in the efficient use of supercomputers. As supercomputers continue growing in size, a fundamental problem arises: how to effectively balance job performance with system performance on torus-connected machines?...

chapter

A hierarchical framework to enhance scalability and performance of scheduling and mapping algorithms

Wei Tang, Forrest Brewer

Proceedings of the 2014 Electronic System Level Synthesis Conference (ESLsyn) > 1 - 6

Crucial to design productivity, architecture-level synthesis algorithms trade off between design quality and algorithm complexity. The well-known list scheduling algorithm has O(N) complexity but also well-known deficiencies. Ant Colony, FDLS and Simulated Annealing have at least O(N³) time complexity. These considerations force a limitation on the scale of design instances that can be synthesized...

chapter

A hierarchical Ant-Colony heuristic for architecture synthesis for on-chip communication

Wei Tang, Forrest Brewer

2013 Conference on Design and Architectures for Signal and Image Processing (DASIP) > 166 - 173

Architecture synthesis and high level synthesis are the paradigms to efficiently organize computations and communications at the high level. While research has been extensively conducted to solve those two problems, a gap between those two paradigms still exists. This paper presents an algorithm for architectural tradeoff for on-chip communication at operation-level granularity. Applied to practical...

chapter

Adaptive Metric-Aware Job Scheduling for Production Supercomputers

Wei Tang, Dongxu Ren, Zhiling Lan, Narayan Desai

2012 41st International Conference on Parallel Processing Workshops (ICPPW) > 107 - 115

Job scheduling is a critical and complex task on large-scale supercomputers where a scheduling policy is expected to fulfill amorphous and sometimes conflicting goals from both users and system owners. Moreover, the effectiveness of a scheduling policy is dependent on workload characteristics which vary from time to time. Thus it is challenging to design a versatile scheduling policy that is effective...

chapter

Evaluating Performance Impacts of Delayed Failure Repairing on Large-Scale Systems

Zhou Zhou, Wei Tang, Ziming Zheng, Zhiling Lan, et al.

2011 IEEE International Conference on Cluster Computing (CLUSTER) > 532 - 536

With the fast improvement in technology, we are now moving toward exascale computing. Many experts predict that exascale computers will have millions of nodes, billions of threads of execution, hundreds of petabytes of inner memory and exabytes of persistent storage. For systems of such a scale, frequent failures are becoming a serious concern. One of the most important reasons is that in a large-scale...

chapter

Automatic and coordinated job recovery for high performance computing

Wei Tang, Zhiling Lan, Narayan Desai, Daniel Buettner

2010 3rd Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS 2010) > 1 - 9

As the scale of high-performance computing systems continues to grow, the impact of failures on the systems is increasingly critical. Research has been performed on fault prediction and associated precautionary actions. While this approach is valuable, it is not adequate because of the inevitability of failures. Postfailure recovery is equally important; however, most current work relies mainly on...

INFONA - science communication portal

Search results for: Wei Tang
