Search results for: S.L. Scott

Items 1 to 5 of 5 results

chapter

Proactive Fault Tolerance Using Preemptive Migration

C. Engelmann, G.R. Vallee, T. Naughton, S.L. Scott

2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, pp. 252-257

Proactive fault tolerance (FT) in high-performance computing is a concept that prevents compute node failures from impacting running parallel applications by preemptively migrating application parts away from nodes that are about to fail. This paper provides a foundation for proactive FT by defining its architecture and classifying implementation options. This paper further relates prior work to the...

chapter

Proactive process-level live migration in HPC environments

Chao Wang, F. Mueller, C. Engelmann, S.L. Scott

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12

As the number of nodes in high-performance computing environments keeps increasing, faults are becoming commonplace. Reactive fault tolerance (FT) often does not scale due to massive I/O requirements and relies on manual job resubmission. This work complements reactive with proactive FT at the process level. Through health monitoring, a subset of node failures can be anticipated when one's health...

chapter

An optimal checkpoint/restart model for a large scale high performance computing system

Yudan Liu, R. Nassar, C. Leangsuksun, N. Naksinehaboon, et al.

2008 IEEE International Symposium on Parallel and Distributed Processing, pp. 1-9

2008 IEEE International Parallel & Distributed Processing Symposium

The increase in the physical size of high performance computing (HPC) platforms makes system reliability more challenging. In order to minimize the performance loss (rollback and checkpoint overheads) due to unexpected failures or the unnecessary overhead of fault-tolerance mechanisms, we present a reliability-aware method for an optimal checkpoint/restart strategy. Our scheme aims at addressing fault tolerance...

chapter

Transparent Symmetric Active/Active Replication for Service-Level High Availability

C. Engelmann, S.L. Scott, C. Leangsuksun, X. He

Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07), pp. 755-760

As service-oriented architectures become more important in parallel and distributed computing systems, individual service instance reliability and appropriate service redundancy become essential for increasing overall system availability. This paper focuses on providing redundancy strategies using service-level replication techniques. Based on previous research using symmetric...

chapter

Design and evaluation of a high performance parallel file system

Li Ou, Xubin He, S.L. Scott, Zhiyong Xu, et al.

Proceedings of the IEEE Conference on Local Computer Networks, 30th Anniversary (LCN'05), pp. 100-107

In this paper we propose a high performance parallel file system over iSCSI (iPVFS) for cluster computing. iPVFS provides a cost-effective solution for heterogeneous cluster environments by dividing a set of I/O servers into two groups: higher-performance servers act as I/O nodes, while relatively lower-performance machines serve as storage target nodes. This combination...

Keyword filter: PARALLEL PROCESSING


Content availability

Available (4)
None (1)

Keywords

FAULT TOLERANT COMPUTING (3)
MONITORING (2)
RELIABILITY (2)
SENSORS (2)
CHECKPOINT INTERVAL (1)
CHECKPOINT/RESTART MODEL (1)
CHECKPOINTING (1)
CLIENT-SERVER SYSTEMS (1)
CLIENT-SIDE INTERCEPTOR (1)
CLUSTER COMPUTING (1)
DISTRIBUTED GRID COMPUTING (1)
FAILURE DISTRIBUTIONS (1)
FAULT TOLERANCE (1)
FAULT TOLERANT MECHANISMS (1)
FAULT TOLERANT SYSTEMS (1)
FAULT-TOLERANCE (1)
GRID COMPUTING (1)
HEALTH MONITORING (1)
HEALTH-INFLICTED NODE FAILURE (1)
HETEROGENEOUS CLUSTER ENVIRONMENT (1)
HIGH PERFORMANCE COMPUTING (1)
HIGH PERFORMANCE PARALLEL FILE SYSTEM (1)
HIGH-PERFORMANCE COMPUTING (1)
HIGH-PERFORMANCE COMPUTING ENVIRONMENT (1)
HPC (1)
HPC ENVIRONMENT (1)
I-O NODES (1)
I-O SERVERS (1)
I/O REQUIREMENTS (1)
INDIVIDUAL SERVICE INSTANCE RELIABILITY (1)
INFORMATION FILTERING (1)
INFORMATION FILTERS (1)
IPVFS (1)
ISCSI (1)
LARGE SCALE HPC SYSTEM (1)
LARGE-SCALE DISTRIBUTED SYSTEM EVENTS LOG ANALYSIS (1)
MANUAL JOB RESUBMISSION (1)
MEMORY MANAGEMENT (1)
MESSAGE PASSING (1)
MPI EXECUTION ENVIRONMENT (1)
NODE FAILURES (1)
OPTIMAL CHECKPOINT-RESTART MODEL (1)
PARALLEL APPLICATION (1)
PARALLEL COMPUTING (1)
PERFORMANCE LOSS (1)
PREEMPTIVE MIGRATION (1)
PROACTIVE FAULT TOLERANCE ARCHITECTURE (1)
PROACTIVE PROCESS-LEVEL LIVE MIGRATION (1)
REACTIVE FAULT TOLERANCE (1)
REDUNDANCY STRATEGY (1)
RELIABILITY-AWARE METHOD (1)
ROLLBACK TIME (1)
SERVICE-LEVEL HIGH AVAILABILITY (1)
SERVICE-LEVEL REPLICATION TECHNIQUE (1)
SERVICE-ORIENTED ARCHITECTURE (1)
SERVICE-SIDE INTERCEPTOR (1)
SOFTWARE ARCHITECTURE (1)
SOFTWARE RELIABILITY (1)
STORAGE TARGET NODES (1)
SYSTEM FAILURE (1)
SYSTEM MONITORING (1)
SYSTEM RECOVERY (1)
TEMPERATURE MEASUREMENT (1)
TEMPERATURE SENSORS (1)
TRANSPARENT SYMMETRIC ACTIVE REPLICATION (1)
VIRTUAL COMMUNICATION LAYER (1)
WORKSTATION CLUSTERS (1)

INFONA - science communication portal
