With the development of neural-network-based machine learning and its use in mission-critical applications, voices are rising against the black-box nature of neural networks, as it becomes crucial to understand their limits and capabilities. With the rise of neuromorphic hardware, it is even more critical to understand how a neural network, as a distributed system, tolerates the failures of its...
Cloud computing provides support for hosting clients' applications. The cloud is a distributed platform that provides hardware, software, and network resources both to execute consumers' applications and to store and manage users' data. Clouds are also used to execute scientific workflow applications, which are generally more complex than other applications. Since the cloud is a distributed...
As the computing power of large scale computing systems increases exponentially with time, their failure rates are increasing exponentially as well. While current high performance computing (HPC) systems experience failures of some type every few days, projections indicate that the next generation exascale machines will experience failures up to several times an hour. The resilience techniques implemented...
We present a resilient task-based domain-decomposition preconditioner for partial differential equations (PDEs) built on top of User Level Failure Mitigation MPI (ULFM-MPI). The algorithm reformulates the PDE as a sampling problem, followed by a robust regression-based solution update that is resilient to silent data corruptions (SDCs). We adopt a server-client model where all...
A self-repairing robot utilising a spiking astrocyte-neuron network is presented in this paper. It uses the output spike frequency of neurons to control motor speed and robot activation. A software model of the astrocyte-neuron network previously demonstrated self-detection of faults and a self-repair capability. In this paper, a mobile-robotics application demonstrator is employed to...
ARTICo3 is an architecture that permits an arbitrary number of reconfigurable hardware accelerators to be set up dynamically, each containing a number of threads fixed at design time according to High-Level Synthesis constraints. However, the replication of these modules can be decided at runtime to accelerate kernels by increasing the overall number of threads, add modular redundancy to increase...
We present a domain-decomposition-based preconditioner for the solution of partial differential equations (PDEs) that is resilient to both soft and hard faults. The algorithm is based on the following steps: first, the computational domain is split into overlapping subdomains; second, the target PDE is solved on each subdomain for sampled values of the current local boundary conditions; third, the...
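The robust-regression update hinted at in these two abstracts can be illustrated with a toy sketch. Suppose a subdomain solve is repeated for sampled boundary values and one result is silently corrupted; a robust estimator still recovers the underlying linear dependence. The Theil-Sen median-of-slopes fit below is a stand-in chosen for illustration, not necessarily the regression used in the papers, and all names and numbers are hypothetical.

```python
from statistics import median

def theil_sen(xs, ys):
    """Robust linear fit: median of all pairwise slopes, then median intercept.

    Up to ~29% of the samples can be arbitrarily corrupted without
    breaking the fit, which is what makes it tolerant to SDC-like outliers.
    """
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i in range(len(xs)) for j in range(i + 1, len(xs))]
    b = median(slopes)
    a = median(y - b * x for x, y in zip(xs, ys))
    return a, b

# Hypothetical sampled boundary values and corresponding subdomain solves.
# The true relation is y = 2 + 3x; the third solve suffered a silent
# data corruption and returned garbage.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 1e6, 11.0, 14.0]   # 1e6 is the corrupted sample

a, b = theil_sen(xs, ys)           # recovers a = 2, b = 3 despite the outlier
```

An ordinary least-squares fit over the same samples would be dragged far off by the single corrupted value, which is why a robust loss is the natural choice for SDC resilience.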
Since before the birth of computers we have strived to make intelligent machines that share some of the properties of our own brains. We have tried to make devices that quickly solve problems we find time consuming, that adapt to our needs, and that learn and derive new information. In more recent years we have tried to add new capabilities to our devices: self-adaptation, fault tolerance, self-repair,...
Task migration has been applied as an efficient mechanism for handling faulty processing elements (PEs) in Multi-Processor Systems-on-Chip (MPSoCs). However, current task migration solutions are either implemented or emulated in software, intrinsically compromising predictability and degrading system robustness. Moreover, the initial placement and mapping of tasks in the MPSoC plays an important...
The computing cloud manages huge numbers of virtualized resources to provide uniquely beneficial computing paradigms for scientific research. A modern cloud can behave, in a virtual context, much like a local homogeneous computer cluster to deliver High Performance Computing (HPC) platforms that provide public users with access, cut purchase costs, and eliminate the maintenance burden of sophisticated...
Over the past decade, high-performance applications have embraced parallel programming and computing models. While parallel computing offers advantages such as good utilization of dedicated hardware resources, it also has several drawbacks, such as poor fault tolerance, limited scalability, and a limited ability to harness available resources at run-time. The advent of cloud computing presents a viable and promising...
Taking inspiration from the cell-based structure of biological organisms, this paper focuses on the modeling and implementation of bio-inspired artificial hardware structures for high-reliability industrial control applications. As is well known, living organisms offer the ability to grow with fault tolerance and self-repair. These remarkable capabilities can be associated with principles to engineer complex novel...
Soft errors in hardware can affect the reliability of a computer system. To estimate system reliability, it is important to know the effects of soft errors on it. This paper explores the effects of soft errors on computer system reliability. We propose a new approach to measure system reliability with respect to soft errors. In our approach, the reliability of hardware components is considered first...
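A common first step for this kind of component-first reliability estimate, sketched below under assumptions that are mine rather than the paper's, is to give each component a constant soft-error rate, take R_i(t) = exp(-lambda_i * t) for each, and combine them in series so the system survives only if every component does. The component names and rates are purely illustrative.

```python
import math

# Hypothetical per-component soft-error rates (failures per hour);
# illustrative values, not measurements from the paper.
rates = {"cpu": 1e-5, "cache": 5e-5, "dram": 2e-4}

def component_reliability(lam, t):
    """R_i(t) = exp(-lambda_i * t) under a constant soft-error rate."""
    return math.exp(-lam * t)

def system_reliability(rates, t):
    """Series model: the system survives only if every component survives,
    so system reliability is the product of the component reliabilities."""
    r = 1.0
    for lam in rates.values():
        r *= component_reliability(lam, t)
    return r

r = system_reliability(rates, 1000.0)  # probability of surviving 1000 hours
```

Because the exponentials multiply, the series model is equivalent to a single component whose rate is the sum of the individual rates, which is why adding any unreliable component can only lower the system figure.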
Due to the continuous shrinking of transistor sizes, strongly driven by Moore's law, reliability has become a dominant design challenge for embedded systems. Reliability problems arise from permanent errors due to manufacturing defects, process variations, and aging, as well as from soft errors. As a result, the hardware will consist of unreliable components and hence, the development of embedded systems has...
As supercomputers and clusters increase in size and complexity, system failures are inevitable. Different hardware components (such as memory, disk, or network) of such systems can have different failure rates. Prior works assume failures affect an application uniformly, whereas our goal is to provide failure models for applications that reflect their specific component usage. This is challenging because...
The negative selection algorithm is one of the most widely used techniques in the field of artificial immune systems. This paper proposes an FPGA-based implementation of the negative selection algorithm aimed at fault detection problems. The negative selection algorithm generally uses binary matching rules to discriminate self from non-self. First, the three most widely used binary matching rules were...
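To make the self/non-self discrimination concrete, here is a minimal software sketch of negative selection using the r-contiguous-bits rule, one of the standard binary matching rules in the artificial immune systems literature. Which rules and parameters the paper actually implements on the FPGA is not stated in this snippet, so the threshold, string length, and self set below are assumptions.

```python
import random

R = 3  # r-contiguous-bits threshold (assumed for illustration)
L = 8  # bit-string length (assumed for illustration)

def r_contiguous_match(a, b, r=R):
    """True if bit strings a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def censor_detectors(self_set, n_detectors, rng):
    """Negative selection: generate random candidate detectors and keep
    only those that match no string in the self set."""
    detectors = []
    while len(detectors) < n_detectors:
        cand = "".join(rng.choice("01") for _ in range(L))
        if not any(r_contiguous_match(cand, s) for s in self_set):
            detectors.append(cand)
    return detectors

rng = random.Random(0)
self_set = {"00000000", "00001111"}   # hypothetical normal-behavior signatures
detectors = censor_detectors(self_set, 5, rng)

# At run time, a sample is flagged as non-self (a possible fault)
# if any detector matches it.
sample = "11110000"
flagged = any(r_contiguous_match(sample, d) for d in detectors)
```

The appeal for hardware is that the r-contiguous comparison is just a sliding window of XNOR gates and a small counter, which is why the rule maps naturally onto an FPGA fabric.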
In light of its powerful computing capacity and high energy efficiency, the GPU (graphics processing unit) has become a focus in the research field of HPC (High Performance Computing). CPU-GPU heterogeneous parallel systems have become a new development trend in supercomputing. However, the inherent unreliability of GPU hardware deteriorates the reliability of supercomputers. We have conducted research on...
Modern many-core architectures with hundreds of cores provide high computational potential, making them particularly interesting for scientific high-performance computing and simulation technology. Like all nanoscale semiconductor devices, many-core processors are prone to reliability-harming factors such as variations and soft errors. One way to improve the reliability of such systems is software-based...
Recent research in multi-agent systems incorporates fault-tolerance concepts. However, this research does not explore the extension and implementation of such ideas for large-scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. In this approach, a task to be executed on a parallel computing system is decomposed...
Evaluating and possibly improving fault-tolerance and error-detection mechanisms is becoming a key issue when designing safety-critical electronic systems. The proposed approach is based on simulation-based fault injection and allows analysis of the system's behavior when faults occur. The paper describes how a microprocessor board employed in an automated light-metro control system has been...