Items from 1 to 20 out of 53 results

chapter

A Performance and Energy Comparison of Fault Tolerance Techniques for Exascale Computing Systems

Daniel Dauwe, Sudeep Pasricha, Anthony A. Maciejewski, Howard Jay Siegel

2016 IEEE International Conference on Computer and Information Technology (CIT) > 436 - 443

As the computing power of large-scale computing systems increases exponentially with time, their failure rates are increasing exponentially as well. While current high performance computing (HPC) systems experience failures of some type every few days, projections indicate that the next generation exascale machines will experience failures up to several times an hour. The resilience techniques implemented...

chapter

On Time Redundancy of Fault Tolerant C-Based MPSoCs

Anjana Balachandran, Nandeesh Veeranna, Benjamin Carrion Schafer

2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) > 631 - 636

Most prior work on hardware reliability makes use of module (spatial) redundancy or time redundancy. In the first case, these methods assume that each module is exactly the same: multiple module replicas implementing the same logic function are executed in different hardware channels, and a voting scheme detects whether the outputs match. In the second case, they recompute the result using the same...
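The time-redundancy case described above (recomputing the same result on the same resource and comparing) can be sketched in Python. This is a minimal illustration, not the method of Balachandran et al.; the names `time_redundant` and `MismatchError` are hypothetical, and the sketch assumes faults are transient, so a permanent fault that corrupts every run identically goes undetected.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class MismatchError(RuntimeError):
    """Raised when redundant executions disagree, i.e. a fault was detected."""


def time_redundant(fn: Callable[..., T], *args, runs: int = 2) -> T:
    """Execute fn `runs` times on the same resource and compare the results.

    Hypothetical helper: it detects (but cannot correct) a transient fault
    that corrupts some of the executions but not all of them.
    """
    if runs < 2:
        raise ValueError("time redundancy needs at least two executions")
    results = [fn(*args) for _ in range(runs)]
    reference = results[0]
    if any(r != reference for r in results[1:]):
        raise MismatchError(f"redundant results disagree: {results!r}")
    return reference


if __name__ == "__main__":
    print(time_redundant(lambda a, b: a * b + 1, 6, 7))  # prints 43
```

With three executions, returning the majority value instead of raising would turn detection into correction of a single faulty run, at the cost of one more execution.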

chapter

Enabling NoC Performance Improvement Using a Fault Tolerance Mechanism

Alba Sandyra Bezerra Lopes, Marcio Eduardo Kreutz, Monica Magalhaes Pereira

2015 Brazilian Symposium on Computing Systems Engineering (SBESC) > 7 - 12

In the multicore era, enabled by decreasing transistor sizes, networks-on-chip (NoCs) emerged as a fast and scalable replacement for bus-based systems. While NoCs provide high performance, transistor miniaturization affects system dependability, because fault rates increase due to the susceptibility of transistors, wires and connections at deep-submicron scale...

chapter

Diverse Compiling for Microprocessor Fault Detection in Temporal Redundant Systems

Andrea Holler, Tobias Rauter, Johannes Iber, Christian Kreiner

2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing > 1928 - 1935

As hardware components are expected to become ever more unreliable due to technology scaling, hardware errors have become unavoidable. Dependable systems that rely on correct functionality often use redundancy to detect such hardware faults during operation. However, to design cost-efficient reliable systems, it is crucial to exploit the available redundancy effectively. Thus, researchers have...

chapter

Context-aware resources placement for SRAM-based FPGA to minimize checkpoint/recovery overhead

Sahraoui Fouad, Fakhreddine Ghaffari, Mohamed El Amine Benkhelifa, Bertrand Granado

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) > 1 - 6

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

Existing SRAM-based Field Programmable Gate Arrays (FPGAs) are very sensitive to Single Event Effects (SEE) in harsh environments. To protect applications running on SRAM-based FPGAs from SEE, those applications mainly rely on resource redundancy approaches, which involve significant resource overhead. Newly proposed fault mitigation approaches use Partial Dynamic Reconfiguration to overcome...

chapter

Study on Error-Detecting Approach for Fault Tolerance Recomputing Oriented Parallel Digital Terrain Analysis

Shoushuai Miao, Wanfeng Dou, Yan Li

2014 13th International Symposium on Distributed Computing and Applications to Business, Engineering and Science > 148 - 151

2014 13th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES)

In recent years, parallel digital terrain analysis has become a research hot spot. Using parallel computing technology to solve data-intensive problems has become a development trend in digital terrain analysis. On the other hand, with the development of hardware technology and new applications, ensuring the reliability of the computing results is one of the key problems...

chapter

Real time scheduling of multiple executions of tasks to achieve fault tolerance in multiprocessor systems

Hussain Al-Asaad

2014 IEEE AUTOTEST > 323 - 328

Modern computing systems are becoming increasingly vulnerable to reliability issues due to both permanent (hard) and transient/intermittent (soft) errors. Various techniques have been proposed to incorporate redundancy into the hardware or software in order to achieve the desired fault tolerance. We present a technique that allows a task to be executed multiple times on multiprocessor systems...

chapter

Harnessing Unreliable Cores in Heterogeneous Architecture: The PyDac Programming Model and Runtime

Bin Huang, Ron Sass, Nathan Debardeleben, Sean Blanchard

2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks > 744 - 749

2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Heterogeneous many-core architectures combined with scratch-pad memories are attractive because they promise better energy efficiency than conventional architectures and a good balance between single-thread performance and multi-thread throughput. However, programmers will need an environment for finding and managing the large degree of parallelism, locality, and system resilience. We propose a Python-based...

chapter

A survey on simulation-based fault injection tools for complex systems

Maha Kooli, Giorgio Di Natale

2014 9th IEEE International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS) > 1 - 6

2014 9th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS)

Dependability is a key decision factor in today's global business environment. A powerful method for evaluating the dependability of a system is fault injection. The principle of this approach is to insert faults into the system and monitor its responses in order to observe its behavior in the presence of faults. Several fault injection techniques and tools have been developed and...
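The principle described in this abstract (insert faults, then monitor the system's responses) can be illustrated with a toy simulation-based campaign. This Python sketch is not one of the surveyed tools: it assumes a single-bit-flip fault model applied to one input operand, and `flip_bit`, `run_campaign` and the outcome labels are hypothetical.

```python
import random
from collections import Counter
from typing import Callable


def flip_bit(value: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of an integer."""
    return value ^ (1 << bit)


def run_campaign(workload: Callable[[int], int],
                 detector: Callable[[int], bool],
                 operand: int,
                 trials: int,
                 width: int = 8,
                 seed: int = 0) -> Counter:
    """Inject one random bit flip into `operand` per trial and classify the outcome.

    masked   - the output equals the fault-free (golden) output
    detected - the output differs and `detector` flags it
    silent   - the output differs and nothing flags it (silent data corruption)
    """
    rng = random.Random(seed)   # fixed seed keeps the campaign reproducible
    golden = workload(operand)  # reference run without faults
    outcomes: Counter = Counter()
    for _ in range(trials):
        faulty = flip_bit(operand, rng.randrange(width))
        result = workload(faulty)
        if result == golden:
            outcomes["masked"] += 1
        elif detector(result):
            outcomes["detected"] += 1
        else:
            outcomes["silent"] += 1
    return outcomes


if __name__ == "__main__":
    # The workload keeps only the low nibble, so flips in bits 4-7 are masked.
    print(run_campaign(lambda x: x & 0x0F, lambda r: False, operand=0x5A, trials=1000))
```

Real tools inject at many levels (gate, register-transfer, instruction, software); the classification into masked, detected and silent outcomes is the part that carries over.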

chapter

A Practitioner's Guide to Software-Based Soft-Error Mitigation Using AN-Codes

Martin Hoffmann, Peter Ulbrich, Christian Dietrich, Horst Schirmeier, et al.

2014 IEEE 15th International Symposium on High-Assurance Systems Engineering > 33 - 40

2014 IEEE 15th International Symposium on High-Assurance Systems Engineering (HASE)

Arithmetic error coding schemes (AN codes) are a well-known and effective technique for soft error mitigation. Although coding theory is a rich area of mathematics, implementing these codes seems to be fairly easy. However, compliance with the theory can easily be lost while moving towards an actual implementation, ultimately jeopardizing the intended fault-tolerance characteristics. In this paper, we...
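The core AN-code idea can be shown in a few lines: a value x is stored as A·x, arithmetic is carried out on the encoded values, and any result that is not a multiple of A reveals an error. This Python sketch is an illustration under stated assumptions, not the implementation from the paper; the constant `A = 61` and all function names are hypothetical. It uses Python's unbounded non-negative integers, whereas a real implementation works in fixed-width registers, where overflow is one of the places compliance with the theory can be lost.

```python
A = 61  # hypothetical encoding constant; an odd A > 1 detects every single bit flip


class SoftError(RuntimeError):
    """Raised when an encoded value is not a multiple of A."""


def encode(x: int) -> int:
    """Map a functional value x to its code word A * x."""
    return A * x


def check(xc: int) -> None:
    """Valid code words are exact multiples of A; anything else signals an error."""
    if xc % A != 0:
        raise SoftError(f"{xc} is not a valid code word (remainder {xc % A})")


def decode(xc: int) -> int:
    check(xc)
    return xc // A


def add_encoded(ac: int, bc: int) -> int:
    # A*a + A*b = A*(a + b): addition works directly on code words.
    return ac + bc


def mul_encoded(ac: int, bc: int) -> int:
    # (A*a) * (A*b) = A*A*(a*b), so one factor of A must be divided out again.
    return (ac * bc) // A


if __name__ == "__main__":
    s = add_encoded(encode(20), encode(22))
    print(decode(s))          # prints 42
    try:
        decode(s ^ (1 << 3))  # simulate a soft error: flip bit 3
    except SoftError as err:
        print("detected:", err)
```

An odd A catches every single bit flip because flipping bit k changes the value by ±2^k, and no odd A > 1 divides a power of two.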

chapter

Development of a Fault Injection System to Test a Weather Station Based on Rapid Prototyping Platform

Kleber Kruger, Fabio Iaione

2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing > 1652 - 1657

2013 IEEE International Conference on High Performance Computing and Communications (HPCC) & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (EUC)

Fault injection has been an important mechanism for testing the dependability properties of a system. Through this mechanism, it is possible to analyze the behavior of a computer program in case of anomalies and to obtain useful statistics for measuring the effectiveness of fault tolerance techniques. In areas such as telecommunications, aviation and finance, the use of fault tolerance is common practice,...

chapter

ATS redundant design in support of system & mission sustainability

Larry Nick Wilson, Byron Radle, Michael N. Granieri

2013 IEEE AUTOTESTCON > 1 - 7

Today, system reliability, availability, serviceability, and manageability (RASM) are becoming more crucial as computer-based systems continue to increase in complexity and importance in our daily lives. Redundancy is a viable approach to improving the RASM attributes of a system. There are many forms of fault-tolerant/redundant system architectures employed in both commercial and military/aerospace...

chapter

Totally self-checking (TSC) VLSI circuits using Scalable Error Detection Coding (SEDC) technique

Natarajan Somasundaram, Farhad Mehdipour, Jeong-A Lee, N Ramadass, et al.

Fifth Asia Symposium on Quality Electronic Design (ASQED 2013) > 72 - 79

2013 5th Asia Symposium on Quality Electronic Design (ASQED)

Integrated circuits fabricated in deep-submicron technology are vulnerable to intermittent or transient faults, which are the predominant cause of system failures. With continued scaling, operating voltage levels have been reduced, and with the resulting decrease in noise margins, transient faults are likely to become more frequent. Also, during operation in adverse environments, transient faults occur upon...

chapter

Response-Time Analysis of Parallel Fork-Join Workloads with Real-Time Constraints

Philip Axer, Sophie Quinton, Moritz Neukirchner, Rolf Ernst, et al.

2013 25th Euromicro Conference on Real-Time Systems > 215 - 224

2013 25th Euromicro Conference on Real-Time Systems (ECRTS)

The advent of multi- and many-core processors brings new challenges and opportunities for the designer of embedded real-time applications. By using parallel programming techniques (e.g., OpenMP), software engineers can leverage the available hardware parallelism and speed up their algorithms. The inherent redundancy of multi-core architectures can also be used to implement fault tolerance by...

chapter

Layout design and simulation of fault tolerant triple modular redundant ALU system

Sudipta Ghosh, Jitendra Singh Sengar

2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT) > 1 - 7

Reliability is one of the most critical factors to be considered during the design phase of any product. Many factors contribute to making a system more reliable in terms of area, power, operating frequency and accuracy. This paper proposes the design of a 4-bit fault-tolerant ALU system using back-end design. Parallel processing along with triple modular redundancy (TMR)...
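The voting step of a triple modular redundant 4-bit ALU can be modeled behaviorally. The following Python sketch is a hypothetical functional model, not the layout-level design from the paper: `alu4`, `majority`, `tmr_alu` and the fault argument are assumptions made for illustration.

```python
MASK4 = 0xF  # 4-bit datapath


def alu4(op: str, a: int, b: int) -> int:
    """Hypothetical behavioral model of a 4-bit ALU; results wrap modulo 16."""
    results = {"add": a + b, "sub": a - b, "and": a & b, "or": a | b, "xor": a ^ b}
    return results[op] & MASK4


def majority(x: int, y: int, z: int) -> int:
    """Bitwise 2-of-3 vote: each output bit takes the value held by at least two inputs."""
    return (x & y) | (y & z) | (x & z)


def tmr_alu(op: str, a: int, b: int, faults: tuple = (0, 0, 0)) -> int:
    """Run three ALU replicas and vote; faults[i] is XORed into replica i's output."""
    outputs = [alu4(op, a, b) ^ f for f in faults]
    return majority(*outputs)


if __name__ == "__main__":
    print(tmr_alu("add", 9, 8))                       # prints 1 (17 mod 16)
    print(tmr_alu("add", 9, 8, (0b0110, 0, 0)))       # prints 1: one faulty replica is outvoted
    print(tmr_alu("add", 9, 8, (0b0001, 0b0001, 0)))  # prints 0: two replicas fail in the same bit
```

The last call shows the classic TMR limitation: the voter masks any fault confined to one replica, but not identical faults in two replicas.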

chapter

Is 3D integration the way to future dependable computing platforms?

Saleh Safiruddin, Demid Borodin, Mihai Lefter, George Voicu, et al.

2012 13th International Conference on Optimization of Electrical and Electronic Equipment (OPTIM) > 1233 - 1242

2012 13th International Conference on Optimization of Electrical and Electronic Equipment

Achieving dependable computing systems is becoming increasingly difficult as CMOS integrated circuit technology scaling reaches the sub-22nm range and faces physical limitations. Dependable computing is also a major concern with the various new technologies that are being investigated to overcome the physical limitations of CMOS technology. 3D integration, though initially proposed as a way of...

chapter

A co-design approach for SET mitigation in embedded systems

A. Lindoso, L. Entrena, E. San Millan, S. Cuenca-Asensi, et al.

2011 12th European Conference on Radiation and Its Effects on Components and Systems > 489 - 492

2011 12th European Conference on Radiation and Its Effects on Components and Systems (RADECS)

We propose a new methodology for hardware/software co-design of embedded systems that is specifically aimed at mitigating SET effects. A hardening infrastructure is used to generate different versions of the design using several combinations of hardware and software hardening, which are evaluated with respect to SET effects. The advantages of the proposed approach are demonstrated by means of a case...

chapter

On graceful degradation of chip multiprocessors in presence of faults via flexible pooling of critical execution units

Rance Rodrigues, Sandip Kundu

2011 IEEE 17th International On-Line Testing Symposium > 67 - 72

2011 IEEE 17th International On-Line Testing Symposium (IOLTS 2011)

Reliability and manufacturability have emerged as dominant concerns for today's multi-billion-transistor chips. In this paper, we investigate how to degrade a chip multiprocessor (CMP) gracefully in the presence of faults, keeping its architected functionality intact at the expense of some loss of performance. The proposed solution involves sharing critical execution resources among cores to survive...

chapter

Error Detection by Redundant Transaction in Transactional Memory System

Wei Song, Jia Jia, Yu-xing Peng

2011 IEEE Sixth International Conference on Networking, Architecture, and Storage > 220 - 224

2011 6th IEEE International Conference on Networking, Architecture, and Storage (NAS)

This paper addresses the issue of error detection in transactional memory and proposes a new method of error detection based on redundant transactions (EDRT). This method creates a copy of every transaction, executes both the original transactions and the transaction copies on adequate processor cores, and achieves error detection by comparing the execution results. EDRT utilizes the data-versioning...
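The compare-before-commit idea in this abstract can be sketched as follows. This is a hypothetical Python model, not the authors' EDRT implementation: both executions run sequentially here rather than on separate cores, the transaction is assumed to be deterministic, and writes are held in a private buffer until commit, which is an assumption about the data versioning the truncated abstract mentions. `execute`, `commit_redundant` and `RedundancyError` are illustrative names.

```python
from typing import Any, Callable, Dict

Memory = Dict[str, Any]
Read = Callable[[str], Any]
Write = Callable[[str, Any], None]
Transaction = Callable[[Read, Write], None]


class RedundancyError(RuntimeError):
    """Raised when a transaction and its copy produce different write sets."""


def execute(tx: Transaction, memory: Memory) -> Dict[str, Any]:
    """Run tx speculatively: writes go to a private buffer, memory is untouched."""
    buffer: Dict[str, Any] = {}

    def read(addr: str) -> Any:
        # A transaction sees its own buffered writes first, then shared memory.
        return buffer[addr] if addr in buffer else memory[addr]

    def write(addr: str, value: Any) -> None:
        buffer[addr] = value

    tx(read, write)
    return buffer


def commit_redundant(tx: Transaction, memory: Memory) -> Dict[str, Any]:
    """Execute tx and a copy of it, compare the write sets, commit only if they match."""
    original = execute(tx, memory)
    replica = execute(tx, memory)  # in EDRT the copy would run on another core
    if original != replica:
        raise RedundancyError(f"write sets differ: {original!r} vs {replica!r}")
    memory.update(original)
    return original


if __name__ == "__main__":
    mem = {"a": 10, "b": 5}

    def transfer(read: Read, write: Write) -> None:
        write("a", read("a") - 3)
        write("b", read("b") + 3)

    commit_redundant(transfer, mem)
    print(mem)  # prints {'a': 7, 'b': 8}
```

Because both runs read the same unmodified memory, a difference between their write sets points to a fault during execution, and shared memory is left untouched when it happens.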

chapter

CPU-aware, process-level redundancy to tolerate faults in multi-core

Hananeh Aliee, Hamid R. Zarandi, Alireza Tajary

2011 International Conference on High Performance Computing & Simulation > 343 - 349

2011 International Conference on High Performance Computing & Simulation (HPCS)

This paper proposes: 1) a dynamically scheduled Process-Level Redundancy (PLR) scheme for enhancing the reliability of multi-core systems, 2) a comparison between PLR and Thread-Level Redundancy (TLR), and 3) a fault study on the thread selector unit of a modern processor. The proposed technique employs underutilized CPU resources to improve the fault tolerance of a system. The evaluation of PLR reliability...

Keywords: HARDWARE, FAULT TOLERANCE, REDUNDANCY
Publication type: book

Keywords

FAULT TOLERANT SYSTEMS (26)
CIRCUIT FAULTS (18)
RELIABILITY (11)
COMPUTER ARCHITECTURE (10)
FIELD PROGRAMMABLE GATE ARRAYS (9)
SOFTWARE (9)
FPGA (6)
INTEGRATED CIRCUIT RELIABILITY (6)
REGISTERS (6)
TUNNELING MAGNETORESISTANCE (6)
FAULT TOLERANT COMPUTING (5)
LOGIC GATES (5)
SOFTWARE FAULT TOLERANCE (5)
EMBEDDED SYSTEM (4)
HARDWARE REDUNDANCY (4)
INSTRUCTION SETS (4)
LOGIC DESIGN (4)
MAGNETIC CORES (4)
MULTICORE PROCESSING (4)
OPTIMIZATION (4)
PROGRAM PROCESSORS (4)
COMPUTERS (3)
DELAY (3)
EMBEDDED SYSTEMS (3)
ERROR DETECTION (3)
FAULT DETECTION (3)
FAULT INJECTION (3)
FAULT-TOLERANCE (3)
INFORMATION REDUNDANCY (3)
INTEGRATED CIRCUIT DESIGN (3)
IP NETWORKS (3)
MULTIPROCESSING SYSTEMS (3)
RECONFIGURABLE ARCHITECTURES (3)
SOFT ERROR (3)
SOFT ERRORS (3)
SYSTEM-ON-A-CHIP (3)
TIME REDUNDANCY (3)
TRANSISTORS (3)
TRIPLE MODULAR REDUNDANCY (3)
VLSI (3)
ADDERS (2)
ALGORITHM DESIGN AND ANALYSIS (2)
BENCHMARK TESTING (2)
BOOLEAN FUNCTIONS (2)
CHECKPOINTING (2)
CIRCUIT IMPLEMENTATION (2)
COMBINATIONAL CIRCUITS (2)
COMPUTATIONAL COMPLEXITY (2)
COMPUTATIONAL MODELING (2)
DATA MINING (2)
ENCODING (2)
ERROR CORRECTION (2)
ERROR CORRECTION CODES (2)
FAULT TOLERANT (2)
GENERATORS (2)
HARDWARE DESIGN LANGUAGES (2)
INTEGRATED CIRCUIT INTERCONNECTIONS (2)
MICROCONTROLLER (2)
MICROCONTROLLERS (2)
MICROPROCESSOR CHIPS (2)
NANOTECHNOLOGY (2)
OPERATING SYSTEMS (2)
PARALLEL PROCESSING (2)
PARALLELISM (2)
PERIPHERAL INTERFACES (2)
PROCESSOR CORES (2)
REAL TIME SYSTEMS (2)
REAL-TIME SYSTEMS (2)
RESILIENCE (2)
SAFETY-CRITICAL SOFTWARE (2)
SINGLE EVENT TRANSIENT (2)
SOFT ERROR MITIGATION (2)
SOFTWARE RELIABILITY (2)
SWITCHES (2)
SYSTEM-ON-CHIP (2)
TMR (2)
TRANSIENT ANALYSIS (2)
TRANSIENT FAULTS (2)
VERY LARGE SCALE INTEGRATION (2)
3D INTEGRATION (1)
3D NETWORK-ON-CHIP LINKS (1)
ACCURACY (1)
AEROSPACE COMPUTING (1)
AEROSPACE ELECTRONICS (1)
AEROSPACE ENGINEERING (1)
AIRBORNE ICE PROTECTION SYSTEM (1)
ALTERA (1)
ALU (1)
AN CODE (1)
ANALYTICAL REDUNDANCY INSPIRED APPROACH (1)
AOP LANGUAGE WEAVER (1)
APPLICATION SOFTWARE (1)
AREA REDUNDANCY (1)
ARINC429 (1)
ARITHMETIC ERROR CODING (1)
ARITHMETIC LOGIC UNIT (1)
ARRAYS (1)

INFONA - science communication portal
