Search results for: Jia Jia

Items from 1 to 5 out of 5 results

article

FTPA: Supporting Fault-Tolerant Parallel Computing through Parallel Recomputing

Xuejun Yang, Yunfei Du, Panfeng Wang, Hongyi Fu, et al.

IEEE Transactions on Parallel and Distributed Systems > 2009 > 20 > 10 > 1471 - 1486

As the size of large-scale computer systems increases, their mean time between failures is becoming significantly shorter than the execution time of many current scientific applications. To complete execution, these applications must tolerate hardware failures. Conventional rollback-recovery protocols redo the computation of the crashed process since the last checkpoint on a single...

chapter

GiFT: Automating FTPA Implementation for MPI Programs

Hongyi Fu, Yunfei Du, Panfeng Wang, Jia Jia, et al.

2008 14th IEEE International Conference on Parallel and Distributed Systems > 91 - 98

2008 14th IEEE International Conference on Parallel and Distributed Systems

Fault tolerance is a critical issue in large-scale computing. The fault-tolerant parallel algorithm (FTPA) is an application-level technique for tolerating hardware failures. FTPA achieves fast failure recovery by means of parallel recomputing. However, it complicates the coding of the application program. This paper uses compiler technology to automate the design of FTPA, and introduces...

chapter

Compiler-Assisted Application-Level Checkpointing for MPI Programs

Xuejun Yang, Panfeng Wang, Hongyi Fu, Yunfei Du, et al.

2008 The 28th International Conference on Distributed Computing Systems > 251 - 259

2008 28th IEEE International Conference on Distributed Computing Systems (ICDCS)

Application-level checkpointing can decrease the overhead of fault tolerance by minimizing the amount of checkpoint data. However, this technique requires the programmer to manually choose the critical data that should be saved. In this paper, we first propose a live-variable analysis method for MPI programs. Then, we provide an optimization method of data saving for application-level checkpointing...

chapter

Building Single Fault Survivable Parallel Algorithms for Matrix Operations Using Redundant Parallel Computation

Yunfei Du, Panfeng Wang, Hongyi Fu, Jia Jia, et al.

7th IEEE International Conference on Computer and Information Technology (CIT 2007) > 285 - 290

2007 7th IEEE International Conference on Computer and Information Technology

As the size of today's high performance computers continues to grow, node failures in these computers are becoming frequent events. Although checkpointing is the typical technique to tolerate such failures, it often introduces considerable overhead and has shown poor scalability on today's large-scale systems. In this paper we define a new term called fault tolerant parallel algorithm which means that...

chapter

The Fault Tolerant Parallel Algorithm: the Parallel Recomputing Based Failure Recovery

Xuejun Yang, Yunfei Du, Panfeng Wang, Hongyi Fu, et al.

16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007) > 199 - 212

2007 16th International Conference on Parallel Architectures and Compilation Techniques

This paper addresses the issue of fault tolerance in parallel computing, and proposes a new method named parallel recomputing. This method achieves fault recovery automatically by using surviving processes to recompute the workload of failed processes in parallel. The paper first defines the fault tolerant parallel algorithm (FTPA) as a parallel algorithm that tolerates failures by parallel recomputing...

Filter options

Keywords:
FAULT TOLERANT COMPUTING

Publication type

book (4)
article (1)

Keywords

PARALLEL ALGORITHMS (4)
CHECKPOINTING (3)
ALGORITHMS (2)
APPLICATION PROGRAM INTERFACES (2)
FAULT TOLERANCE (2)
FAULT-TOLERANT PARALLEL ALGORITHM (2)
MESSAGE PASSING (2)
PARALLEL RECOMPUTING (2)
PROGRAM PROCESSORS (2)
ALGORITHM DESIGN AND ANALYSIS (1)
APPLICATION SOFTWARE (1)
APPLICATION-LEVEL FAULT-TOLERANT APPROACH (1)
COMPILER-ASSISTED APPLICATION-LEVEL CHECKPOINTING AUTOMATION (1)
COMPUTATIONAL MODELING (1)
COMPUTER CRASHES (1)
COMPUTERS (1)
CONCURRENT PROGRAMMING (1)
CPU CLUSTER SYSTEM (1)
DATA SAVING (1)
DATA-FLOW ANALYSIS (1)
EQUATIONS (1)
FAILURE RECOVERY (1)
FAST SELF-RECOVERY (1)
FAULT SURVIVABLE PARALLEL ALGORITHMS (1)
FAULT TOLERANCE APPROACH (1)
FAULT TOLERANCE OVERHEAD (1)
FAULT TOLERANT PARALLEL ALGORITHM (1)
FAULT TOLERANT SYSTEMS (1)
FAULT-TOLERANCE (1)
FAULT-TOLERANT PARALLEL COMPUTING (1)
FLOW GRAPHS (1)
FTPA (1)
GAUSSIAN ELIMINATION (1)
GAUSSIAN PROCESSES (1)
GET IT FAULT-TOLERANT (1)
GET IT FAULT-TOLERANT SOURCE-TO-SOURCE PRECOMPILER TOOL (1)
GIFT (1)
HARDWARE (1)
HARDWARE FAILURES TOLERANCE (1)
INSTRUMENTS (1)
INTERPROCESS DEFINITION-USE RELATIONSHIP ANALYSIS (1)
KERNEL (1)
LARGE-SCALE SYSTEMS (1)
LIVE-VARIABLE ANALYSIS METHOD (1)
MATRIX ALGEBRA (1)
MATRIX OPERATIONS (1)
MEAN-TIME-BETWEEN-FAILURES (1)
MPI PROGRAM (1)
MPI PROGRAMS (1)
OPERATING SYSTEMS (1)
OPTIMISING COMPILERS (1)
OPTIMIZATION METHOD (1)
PARALLEL DENSE MATRIX-MATRIX MULTIPLICATION (1)
PARALLEL PROCESSING (1)
PARALLEL RECOMPUTING BASED FAILURE RECOVERY (1)
PARALLEL RECOMPUTING. (1)
PROGRAM COMPILERS (1)
PROGRAM DIAGNOSTICS (1)
PROGRAM PERFORMANCE EVALUATION (1)
PROTOCOLS (1)
REDUNDANT PARALLEL COMPUTATION (1)
RELIABILITY (1)
ROLLBACK-RECOVERY PROTOCOLS (1)
SOFTWARE PERFORMANCE EVALUATION (1)
SOFTWARE/SOFTWARE ENGINEERING (1)
SOURCE-TO-SOURCE PRECOMPILER (1)

INFONA - science communication portal
