Cloud computing has brought with it the use of off-the-shelf commodity hardware with higher failure rates than the systems enterprises have relied on for the last several decades. Coupled with increasingly complex, highly distributed, constantly changing data center environments that can no longer be treated as deterministic systems, this forces us to change the way that we view...
Mobile devices are ubiquitous, but their resources are limited. They must nevertheless be capable of running computationally intensive software, for example for image stitching, face recognition, and simulation-based artificial intelligence. As a solution, mobile devices can use nearby resources to offload computation. Distributed computing environments provide such features but ignore the nature of mobile...
This demo paper presents the Tasklet system - a middleware for distributed computing. The Tasklet system allows developers to offload self-contained units of computation - the so-called Tasklets - to a pool of heterogeneous computing devices. In this demonstration of the Tasklet system, we uncover the otherwise transparent process of computation offloading and scheduling. Further, we show the easy...
The class of robot convergence tasks has been shown to capture fundamental aspects of fault-tolerant computability. In these tasks, asynchronous robots that may fail by crashing start from unknown places in a given space and must move to positions close to one another. In this article, we study the case where the space is one-dimensional, modeled as a graph G. In graph convergence, robots have...
Nowadays, many job schedulers rely on checkpoint mechanisms to make long-running batch jobs resilient to node failures. At large scale, stopping a job and creating its image consumes a considerable amount of time. The aim of this study is to propose a method that eliminates this overhead. For this purpose, we decompose the problem being solved into computational microkernels which have strict hierarchical...
Spark has become a first choice among distributed computing frameworks for big data processing. Its biggest highlight is in-memory computation on large clusters, which is well suited to iterative and interactive computing. However, straggler machines can seriously affect its performance. Spark's current approach is speculative execution, which selects the slow tasks and...
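The straggler-detection idea behind speculative execution can be sketched in a few lines. The progress-rate input and the median-based threshold below are illustrative assumptions, not Spark's exact heuristic, which also weighs remaining work and the cost of a re-launch:

```python
import statistics

def find_stragglers(progress_rates, slow_factor=0.5):
    """Flag tasks whose progress rate is well below the median rate.

    `slow_factor` is a hypothetical threshold for this sketch; a real
    speculative-execution scheduler tunes this trade-off carefully.
    """
    median = statistics.median(progress_rates.values())
    return [task for task, rate in progress_rates.items()
            if rate < slow_factor * median]

# A scheduler would re-launch copies of the flagged tasks on other
# machines and keep whichever copy finishes first.
rates = {"task-0": 1.0, "task-1": 0.9, "task-2": 0.2}
print(find_stragglers(rates))  # ['task-2']
```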
This paper considers the Approximate Agreement problem in the presence of mobile Byzantine agents. We prove lower bounds on the number of correct processes needed to solve this problem. To do so, we prove that existing solutions tolerant to Byzantine agents still hold in this setting, and establish under which conditions.
In this paper, we introduce the fault-tolerant Distributed Analytics System (DAS) for analyzing big data collected from search engines in Arabic. This system consists of three main subsystems: Logging and Archiving Subsystem (LAS), Analytics Subsystem (AS), and a User Interface (UI). We used the data provided by opensooq.com, an online market with Arabic content, and compiled four datasets with sizes:...
Many big data processing platforms, for example Hadoop MapReduce, keep improving large-scale data processing performance, which has made big data processing a focus of the IT industry. Among them, Spark has become an increasingly popular big data processing framework since it was first presented in 2010. Spark uses RDDs for its data abstraction, targeting multiple-iteration large-scale data...
We present a domain-decomposition-based preconditioner for the solution of partial differential equations (PDEs) that is resilient to both soft and hard faults. The algorithm is based on the following steps: first, the computational domain is split into overlapping subdomains; second, the target PDE is solved on each subdomain for sampled values of the local current boundary conditions; third, the...
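The overlapping-subdomain idea can be illustrated on the simplest possible model problem. The sketch below is a hypothetical 1-D Laplace problem (u'' = 0), not the paper's setting, and it omits the sampling and fault-resilience machinery entirely: since the local solution of u'' = 0 is linear, each "subdomain solve" is just interpolation between the subdomain's current boundary values, and iterating exchanges boundary data through the overlap until the subdomains agree:

```python
def schwarz_1d_laplace(n=11, left=0.0, right=1.0, overlap=2, iters=50):
    """Alternating Schwarz sketch for u'' = 0 on a grid of n points
    with u[0] = left and u[n-1] = right. All parameters are
    illustrative choices, not values from the paper."""
    u = [0.0] * n
    u[0], u[-1] = left, right
    mid = n // 2
    # Two overlapping subdomains sharing 2*overlap interior points.
    subdomains = [(0, mid + overlap), (mid - overlap, n - 1)]
    for _ in range(iters):
        for a, b in subdomains:
            # Exact local solve of u'' = 0: linear interpolation
            # between the subdomain's current boundary values.
            for i in range(a + 1, b):
                u[i] = u[a] + (u[b] - u[a]) * (i - a) / (b - a)
    return u

# Converges to the global linear solution u[i] = i / (n - 1).
print(schwarz_1d_laplace()[5])  # 0.5 (up to rounding)
```

The overlap is what carries information between the subdomains; with no overlap the iteration would stall, which is one reason overlapping decompositions are preferred here.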
The Raft consensus algorithm is a new distributed consensus algorithm that is both easier to understand and more straightforward to implement than the older Paxos algorithm. Its major limitation is its high energy footprint. Because it relies on majority voting to decide when to commit an update, Raft requires five participants to protect against two simultaneous failures. We propose two methods...
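The five-participant figure is plain majority arithmetic. As a sketch (not Raft's implementation): tolerating f simultaneous failures requires that the surviving n - f nodes still form a majority, which forces n = 2f + 1:

```python
def quorum_size(n):
    """Votes needed for a majority among n participants."""
    return n // 2 + 1

def failures_tolerated(n):
    """Crash failures a majority-quorum system of n nodes can survive:
    the remaining n - f nodes must still reach quorum_size(n)."""
    return (n - 1) // 2

# Tolerating f failures takes 2f + 1 participants, so surviving
# two simultaneous failures requires five nodes with a quorum of three.
for f in range(4):
    assert failures_tolerated(2 * f + 1) == f
print(quorum_size(5), failures_tolerated(5))  # 3 2
```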
Data sizes in today’s Big Data age present a profound scalability challenge to modeling networks as graphs. Historically, memory-based solutions were used to cope with the high latency incurred by the irregular data access common in many natural networks. But current data rates impose both economic and environmental challenges to continually expanding the total aggregate system memory to “fit” the graph...
Distributed computing environments have been very much in sight for the last decade and a half. Geographically distributed resources are provisioned to user tasks in the distributed computing environment as per their requirements. A number of parameters must be taken into account while provisioning the distributed resources, such as task performance and fault tolerance. Extensive research...
Since before the birth of computers we have strived to make intelligent machines that share some of the properties of our own brains. We have tried to make devices that quickly solve problems that we find time consuming, that adapt to our needs, and that learn and derive new information. In more recent years we have tried to add new capabilities to our devices: self-adaptation, fault tolerance, self-repair,...
Large graph analysis is one of the significant applications of distributed computing frameworks. Distributed computing applications are solved by developing programs over different types of established distributed computing frameworks. Since graph analysis and prediction is one of the new trends in data analytics, designing these problems on an in-memory cluster framework which consumes graph datasets...
Failure detectors are oracles that have been introduced to provide processes in asynchronous systems with information about faults. This information can then be used to solve problems otherwise unsolvable in asynchronous systems. A natural question concerns the "minimum amount of information" a failure detector has to provide for a given problem. This question is classically addressed using...
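A minimal timeout-based detector gives a concrete feel for the oracle's interface. The class and its fixed timeout below are illustrative assumptions; real detectors adapt their timeouts, which is what makes properties like eventual accuracy attainable:

```python
class HeartbeatDetector:
    """Suspect any process whose last heartbeat is older than `timeout`.

    Sketch only: with a fixed timeout in an asynchronous system this
    can suspect correct-but-slow processes, so it shows the interface
    of the oracle, not its guarantees.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, pid, now):
        self.last_seen[pid] = now

    def suspected(self, now):
        return {p for p, t in self.last_seen.items()
                if now - t > self.timeout}

d = HeartbeatDetector(timeout=1.0)
d.heartbeat("p1", now=0.0)
d.heartbeat("p2", now=0.0)
d.heartbeat("p2", now=1.5)   # p2 keeps sending; p1 has gone silent
print(d.suspected(now=2.1))  # {'p1'}
```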
Scientific Computing deals with solving complex scientific problems by running resource-hungry computer simulation and modeling tasks on top of supercomputers, grids and clusters. Typical scientific computing applications can take months to create and debug with de facto parallelization solutions like the Message Passing Interface (MPI), in which the bulk of the parallelization details have...
Large-scale data mining and deep data analysis are increasingly important for both enterprise and scientific applications. Statistical languages provide rich functionality and ease of use for data analysis and modeling and have a large user base. R is one of the most widely used of these languages, but is limited to a single-threaded execution model and problem sizes that fit in a single node. This...
The scheduling mode plays a key role in grid scheduling. There are two modes: immediate and batch. Immediate mode considers tasks one by one, in arrival sequence, whereas batch mode collects tasks and considers them in an arbitrary order. Task assignment is therefore mainly driven by the mode selected: a task may be assigned to a resource as soon as it arrives, or as part of a batch. In this paper, we have introduced a new mode of heuristic called...
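The two modes can be contrasted in a few lines. The least-loaded placement and the longest-task-first batch ordering (LPT) below are illustrative stand-ins for real grid heuristics; the paper's own heuristic is not reproduced here:

```python
def immediate_mode(tasks, n_resources):
    """Assign each task the moment it arrives, to the currently
    least-loaded resource; later arrivals cannot change the decision."""
    loads = [0.0] * n_resources
    schedule = {}
    for name, cost in tasks:
        r = loads.index(min(loads))
        loads[r] += cost
        schedule[name] = r
    return schedule, max(loads)

def batch_mode(tasks, n_resources):
    """Collect the whole batch first, then let the heuristic pick the
    order (here: longest task first, the classic LPT rule)."""
    return immediate_mode(sorted(tasks, key=lambda t: -t[1]), n_resources)

tasks = [("t1", 4.0), ("t2", 1.0), ("t3", 3.0), ("t4", 2.0)]
print(immediate_mode(tasks, 2)[1])  # makespan 6.0
print(batch_mode(tasks, 2)[1])      # makespan 5.0
```

Seeing the whole batch lets the scheduler reorder tasks, which is why batch mode reaches the optimal makespan here while immediate mode, bound to arrival order, does not.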
The importance of fault tolerance for the parallel computing field is ever increasing, as the mean time between failures is predicted to decrease significantly for future highly parallel systems. The current trend of using commodity hardware to reduce the cost of clusters forces users to ensure that their applications are fault tolerant. When it comes to embarrassingly parallel data-intensive algorithms,...