Search results for: Dan Meng

Items 1 to 6 of 6 results

chapter

MAMS: A Highly Reliable Policy for Metadata Service

Jiang Zhou, Yong Chen, Weiping Wang, Dan Meng

2015 44th International Conference on Parallel Processing, pp. 729–738

2015 44th International Conference on Parallel Processing (ICPP)

Most mass data processing applications today need long, continuous, and uninterrupted data access. Parallel/distributed file systems often use multiple metadata servers to manage the global namespace and provide a reliability guarantee. With the rapid growth of data volume and system scale, the probability of hardware or software failures keeps increasing, which easily leads to multiple...

chapter

HR-NET: A Highly Reliable Message-Passing Mechanism for Cluster File System

Jiang Zhou, Can Ma, Jin Xiong, Dan Meng

2011 IEEE Sixth International Conference on Networking, Architecture, and Storage, pp. 364–371

2011 6th IEEE International Conference on Networking, Architecture, and Storage (NAS)

As PC clusters increase in popularity and quantity, message passing between nodes has become an important issue because of the high failure rate of the network. File access in a cluster file system often consists of several sub-operations, each of which includes one or more network transmissions. Any network failure will make the file system service unavailable. In this paper, we describe a highly reliable message-passing...

chapter

Exploiting Data Deduplication to Accelerate Live Virtual Machine Migration

Xiang Zhang, Zhigang Huo, Jie Ma, Dan Meng

2010 IEEE International Conference on Cluster Computing, pp. 88–96

2010 IEEE International Conference on Cluster Computing (CLUSTER 2010)

As one of the key characteristics of virtualization, live virtual machine (VM) migration provides great benefits for load balancing, power management, fault tolerance, and other system maintenance tasks in modern clusters and data centers. Although Pre-Copy is a widely used migration algorithm, it transfers a large amount of duplicated memory image data from the source to the destination, which results in...

chapter

DCR: A fully transparent checkpoint/restart framework for distributed systems

Can Ma, Zhigang Huo, Jingnan Cai, Dan Meng

2009 IEEE International Conference on Cluster Computing and Workshops, pp. 1–10

2009 IEEE International Conference on Cluster Computing and Workshops (CLUSTER)

Checkpoint/restart has been widely used in computing systems for fault tolerance, job scheduling, and system maintenance. However, the lack of transparency has hindered the adoption of many of its implementations. In this paper, we present a fully transparent parallel checkpoint/restart framework, DCR, which takes advantage of a kernel-level checkpointing method and TCP session preservation...

chapter

HMF: High-available Message-passing Framework for Cluster File System

Dong Yang, Zhuan Chen, Rongfeng Tang, Jin Xiong, et al.

2009 IEEE International Conference on Networking, Architecture, and Storage, pp. 249–252

2009 IEEE International Conference on Networking, Architecture, and Storage (NAS)

In large-scale cluster systems, the failure rate of network connections is non-negligibly high. A cluster file system must be able to handle network failures in order to provide a highly available data access service. Traditionally, network failure handling is guaranteed only by the network protocol or is implemented within the file system semantic layer. We present the high-available message-passing...

chapter

A Fast-Start, Fault-Tolerant MPI Launcher on Dawning Supercomputers

Xu Liu, Bibo Tu, Jianfeng Zhan, Dan Meng

2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies, pp. 263–266

2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies

Daemon-based MPI launchers are mainstream nowadays because they can start up processes rapidly. However, effective task management and fault tolerance become more important as supercomputers grow in scale. A new fast-start and fault-tolerant launcher, called SFLauncher, has been used to start up MPICH tasks on Dawning supercomputers. This paper details its features and implementation,...

Keyword filter: FAULT TOLERANCE

Keywords

FAULT TOLERANT SYSTEMS (5)
SERVERS (5)
PROTOCOLS (4)
CLUSTER FILE SYSTEM (2)
FILE SYSTEMS (2)
MESSAGE PASSING (2)
SYSTEM MAINTENANCE (2)
TRANSPORT PROTOCOLS (2)
WORKSTATION CLUSTERS (2)
ACCELERATION (1)
APPLICATION PROGRAM INTERFACES (1)
AVAILABILITY (1)
BANDWIDTH (1)
BANKING (1)
CHECKPOINTING (1)
CLUSTER FILE SYSTEMS (1)
COMMUNICATION CHANNELS (1)
COMPUTER CRASHES (1)
CONTEXT (1)
DATA CENTERS (1)
DATA DEDUPLICATION (1)
DAWNING 5000A (1)
DAWNING SUPERCOMPUTER (1)
DCR (1)
DISTRIBUTED SYSTEMS (1)
DUPLICATED MEMORY IMAGE DATA (1)
ENCODING (1)
FAST-START FAULT-TOLERANT MPI LAUNCHER (1)
FILE ORGANISATION (1)
FILE SERVERS (1)
FINGERPRINT RECOGNITION (1)
HARDWARE (1)
HASH BASED FINGERPRINT (1)
HEART BEAT (1)
HIGH AVAILABILITY (1)
HIGH RELIABILITY (1)
HIGH-AVAILABLE MESSAGE-PASSING (1)
INSTRUCTION SETS (1)
JOB SCHEDULING (1)
KERNEL-LEVEL CHECKPOINTING METHOD (1)
LARGE-SCALE CLUSTERS (1)
LIVE MIGRATION (1)
LIVE VIRTUAL MACHINE MIGRATION (1)
LOAD BALANCING (1)
MEMORY MANAGEMENT (1)
MESSAGE PASSING INTERFACE (1)
MESSAGE PASSING LAYER (1)
MESSAGE PASSING MECHANISM (1)
METADATA (1)
METADATA MANAGEMENT (1)
MPI LAUNCHER (1)
MPICH2 APPLICATIONS (1)
MULTIPLE METADATA SERVICE (1)
NETWORK CONNECTION (1)
NETWORK FAILURE HANDLING (1)
NETWORK FAULT-TOLERANCE DESIGN (1)
NETWORK PROTOCOL (1)
ON-DEMAND BLOCKING CHECKPOINT PROTOCOL (1)
OPERATING SYSTEM KERNELS (1)
PARALLEL APPLICATIONS (1)
PARALLEL FILE SYSTEMS (1)
PARALLEL MACHINES (1)
PARALLEL PROCESSING (1)
PEER TO PEER COMPUTING (1)
POWER MANAGEMENT (1)
REDUNDANT MEMORY DATA ELIMINATION (1)
RELIABILITY MECHANISM (1)
RESOURCE ALLOCATION (1)
RUN LENGTH ENCODE (1)
RUN-TIME MEMORY IMAGE (1)
SCALABILITY (1)
SCHEDULING (1)
SELF-SIMILARITY (1)
SOCKETS (1)
SOFTWARE (1)
SOFTWARE FAULT TOLERANCE (1)
SOFTWARE MAINTENANCE (1)
STORAGE AREA NETWORKS (1)
STORAGE MANAGEMENT (1)
TASK MANAGEMENT (1)
TCP SESSION PRESERVATION (1)
TCP/IP (1)
TRANSPARENT PARALLEL CHECKPOINT FRAMEWORK (1)
TRANSPARENT RESTART FRAMEWORK (1)
VIRTUAL MACHINES (1)
VIRTUALIZATION (1)

INFONA - science communication portal