As the number of processors and the size of the memory of computing systems keep increasing, the likelihood of CPU core failures, memory errors, and bus failures increases and can threaten system availability. Software components can be hardened against such failures by running several replicas of a component on hardware replicas that fail independently and that are coordinated by a State-Machine...
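The coordination principle this abstract alludes to can be illustrated with a minimal sketch, assuming deterministic replicas and majority voting over their outputs; the `Replica` class and `vote` helper are illustrative names, not from the paper.

```python
# Minimal sketch of state-machine replication with majority voting.
# Assumes deterministic replicas that fail independently; all names
# here are illustrative, not from the paper.
from collections import Counter

class Replica:
    """A deterministic state machine: a counter driven by commands."""
    def __init__(self):
        self.state = 0

    def apply(self, command):
        op, arg = command
        if op == "add":
            self.state += arg
        return self.state

def vote(outputs):
    """Majority voting masks a minority of faulty replica outputs."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority: too many replica failures")
    return value

# Three replicas process the same command log in lockstep; the voter
# returns the agreed result even if one replica's output is corrupted.
replicas = [Replica() for _ in range(3)]
for cmd in [("add", 5), ("add", 2)]:
    result = vote([r.apply(cmd) for r in replicas])
```

With three replicas, a single CPU or memory fault corrupting one output is outvoted by the other two; the `vote` helper raises an error only when a majority can no longer be formed.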
ARTICo3 is an architecture that allows an arbitrary number of reconfigurable hardware accelerators to be set up dynamically, each containing a given number of threads fixed at design time according to High Level Synthesis constraints. However, the replication of these modules can be decided at runtime to accelerate kernels by increasing the overall number of threads, add modular redundancy to increase...
Cyber-Physical Systems need to handle increasingly complex tasks which, additionally, may have variable operating conditions over time. Therefore, dynamic resource management is required to adapt the system to different needs. In this paper, a new bus-based architecture, called ARTICo3, which by means of Dynamic Partial Reconfiguration allows the replication of hardware tasks to support module redundancy,...
Buggy device drivers are a major threat to the reliability of their host operating system. There have been myriad attempts to protect the kernel, but most of them either require driver modifications or incur substantial performance overhead. This paper describes an isolated device driver execution system called SIDE (Streamlined Isolated Driver Execution), which focuses specifically on unmodified...
The advent of general-purpose GPUs (GPGPUs) has made heterogeneous computing broadly accessible and has transformed general-purpose and parallel computing on GPUs. Heterogeneous systems have been adopted by large-scale high-performance computers. Fault tolerance techniques are now essential for scientific computing at this scale, but...
Availability is increased by recovery based on component microreboot instead of whole-system reboot. There are unique challenges that must be overcome in order to apply microreboot to low-level system software. These challenges arise from the need to interact with immutable hardware components on the one hand and, on the other, with a wide variety of higher-level workloads whose characteristics...
Linux servers with heterogeneous architectures present a new challenge for fault management. With the significant increase in the numbers and types of hardware components, separate fault management becomes more complex and inefficient. It is clear that centralized management, automatic recovery, and scalable design must be incorporated into a modern fault management system. Based on the ccNUMA architecture,...
High performance and relatively low cost of GPU-based platforms provide an attractive alternative for general purpose high performance computing (HPC). However, the emerging HPC applications have usually stricter output correctness requirements than typical GPU applications (i.e., 3D graphics). This paper first analyzes the error resiliency of GPGPU platforms using a fault injection tool we have...
While RAID is the prevailing method of creating reliable secondary storage infrastructure, many users desire more flexibility than offered by current implementations. Traditionally, RAID capabilities have been implemented largely in hardware in order to achieve the best performance possible, but hardware RAID has rigid designs that are costly to change. Software implementations are much more flexible,...
Fault injection technology provides an efficient way to verify the fault tolerance of computers and to detect the vulnerabilities of software systems. In this paper, we present a Xen-based fault injection technology for software vulnerability test (XFISV) intended as an efficient and general-purpose software test model, which injects faults into the interaction layer between software applications and...
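The core idea of software-implemented fault injection can be sketched in a few lines: perturb a value at the boundary between two layers and check whether the fault is detected downstream. This is a hedged illustration only; `flip_bit` and `injected` are hypothetical helpers, not part of XFISV.

```python
# Hedged sketch of software-implemented fault injection: flip one bit
# in a value crossing a layer boundary, then compare against the
# fault-free "golden" run. Helper names are illustrative only.
def flip_bit(value, bit):
    """Inject a single-bit transient fault into an integer."""
    return value ^ (1 << bit)

def injected(func, bit):
    """Wrap func so its integer result carries an injected fault."""
    def wrapper(*args):
        return flip_bit(func(*args), bit)
    return wrapper

def checksum(data):
    """A toy target: the software under test."""
    return sum(data) & 0xFF

golden = checksum([1, 2, 3])            # fault-free reference run
faulty = injected(checksum, 3)([1, 2, 3])  # run with bit 3 flipped
detected = golden != faulty             # did the fault propagate?
```

Comparing each injected run against the golden run is how campaigns classify outcomes (masked, detected, or silent data corruption); a real tool injects at the hypervisor or interaction layer rather than by wrapping a Python function.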
This paper presents an approach to conducting experimental studies for the characterization and comparison of the error behavior in different computing systems. The proposed approach is applied to characterize and compare the error behavior of three commercial systems (Linux 2.6 on Pentium 4, Solaris 10 on UltraSPARC IIIi, and AIX 5.3 on POWER 5) under hardware transient faults. The data is obtained...
This paper describes the design and implementation of a small real time operating system (OS) called Minos and its application in an onboard active safety project for general aviation. The focus of the operating system is predictability, stability, safety and simplicity. We introduce fault tolerance aspects in software by the concept of a very fast reboot procedure and by an error correcting flight...
Live migration of virtual machine (VM) is a desirable feature for distributed computing such as grid computing and recent cloud computing by facilitating fault tolerance, load balance, and hardware maintenance. Virtual machine monitor (VMM) enforced process protection is a newly advocated approach to provide a trustworthy execution environment for processes running on commodity operating systems. While...
Checkpoint-restart is considered one of the most natural approaches to achieving fault tolerance in a high-performance cluster. While experience has focused attention on user-level solutions, the advent of efficient system-level virtualization software, such as Xen and VMware, has opened the door to the possibility of efficient and scalable cluster-level virtualization. In this paper we present an...
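The checkpoint-restart idea itself can be sketched at application level, assuming in-memory snapshots stand in for system-level VM checkpoints; `run` and its parameters are illustrative, not from the paper.

```python
# Minimal sketch of checkpoint-restart: the loop snapshots its state
# every k iterations and, after a simulated failure, resumes from the
# last snapshot instead of from scratch. In-memory deep copies stand
# in for VM-level checkpoints; names are illustrative only.
import copy

def run(n_steps, checkpoint_every, fail_at=None):
    state = {"i": 0, "total": 0}
    snapshot = copy.deepcopy(state)      # initial checkpoint
    while state["i"] < n_steps:
        if state["i"] == fail_at:
            fail_at = None               # fail only once
            state = copy.deepcopy(snapshot)  # restart from checkpoint
            continue
        state["total"] += state["i"]     # one unit of useful work
        state["i"] += 1
        if state["i"] % checkpoint_every == 0:
            snapshot = copy.deepcopy(state)  # periodic checkpoint
    return state["total"]
```

A run that fails mid-way produces the same final result as a failure-free run, at the cost of re-executing only the work since the last checkpoint; the checkpoint interval trades snapshot overhead against the amount of lost work.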
We report the computational advances that have enabled the first micron-scale simulation of a Kelvin-Helmholtz (KH) instability using molecular dynamics (MD). The advances are in three key areas for massively parallel computation such as on BlueGene/L (BG/L): fault tolerance, application kernel optimization, and highly efficient parallel I/O. In particular, we have developed novel capabilities for...