Cloud computing has brought with it the use of off-the-shelf commodity hardware with higher failure rates than the systems enterprises have relied on for the last several decades. Coupled with increasingly complex, highly distributed, constantly changing data center environments that can no longer be treated as deterministic systems, this forces us to change the way that we view...
Mobile devices are ubiquitous, but their resources are limited. They must nevertheless be capable of running computationally intensive software, for example for image stitching, face recognition, and simulation-based artificial intelligence. As a solution, mobile devices can use nearby resources to offload computation. Distributed computing environments provide such features but ignore the nature of mobile...
This demo paper presents the Tasklet system - a middleware for distributed computing. The Tasklet system allows developers to offload self-contained units of computation - the so-called Tasklets - to a pool of heterogeneous computing devices. In this demonstration of the Tasklet system, we uncover the otherwise transparent process of computation offloading and scheduling. Further, we show the easy...
The class of robot convergence tasks has been shown to capture fundamental aspects of fault-tolerant computability. In these tasks, asynchronous robots that may fail by crashing start from unknown places in a given space and must move to positions close to one another. In this article, we study the case where the space is one-dimensional, modeled as a graph G. In graph convergence, robots have...
Nowadays, many job schedulers rely on checkpoint mechanisms to make long-running batch jobs resilient to node failures. At large scale, stopping a job and creating its image consumes a considerable amount of time. The aim of this study is to propose a method that eliminates this overhead. For this purpose, we decompose the problem being solved into computational microkernels which have strict hierarchical...
Spark has become a first choice among distributed computing frameworks for big data processing. Its biggest highlight is in-memory computation on large clusters, which is well suited to iterative and interactive computing. However, straggler machines can seriously affect its performance. Spark's current approach is speculative execution, which selects the slow tasks and...
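The straggler-detection idea behind speculative execution can be sketched in a few lines. The progress-rate input and the median-based threshold below are illustrative assumptions, not Spark's exact heuristic, which also weighs remaining work and the cost of a re-launch:

```python
import statistics

def find_stragglers(progress_rates, slow_factor=0.5):
    """Flag tasks whose progress rate is well below the median rate.

    `slow_factor` is a hypothetical threshold for this sketch; a real
    speculative-execution scheduler tunes this trade-off carefully.
    """
    median = statistics.median(progress_rates.values())
    return [task for task, rate in progress_rates.items()
            if rate < slow_factor * median]

# A scheduler would re-launch copies of the flagged tasks on other
# machines and keep whichever copy finishes first.
rates = {"task-0": 1.0, "task-1": 0.9, "task-2": 0.2}
print(find_stragglers(rates))  # ['task-2']
```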
This paper considers the Approximate Agreement problem in the presence of mobile Byzantine agents. We prove lower bounds on the number of correct processes needed to solve this problem. To do so, we prove that existing solutions tolerant to Byzantine agents still hold in this setting, and establish under which conditions.
In this paper, we introduce the fault-tolerant Distributed Analytics System (DAS) for analyzing big data collected from search engines in Arabic. This system consists of three main subsystems: Logging and Archiving Subsystem (LAS), Analytics Subsystem (AS), and a User Interface (UI). We used the data provided by opensooq.com, an online market with Arabic content, and compiled four datasets with sizes:...
Many big data processing platforms, for example Hadoop MapReduce, keep improving large-scale data processing performance, which has made big data processing a focus of the IT industry. Among them, Spark has become an increasingly popular big data processing framework since it was first presented in 2010. Spark uses RDDs for its data abstraction, targeting multiple-iteration large-scale data...
We present a domain-decomposition-based preconditioner for the solution of partial differential equations (PDEs) that is resilient to both soft and hard faults. The algorithm is based on the following steps: first, the computational domain is split into overlapping subdomains; second, the target PDE is solved on each subdomain for sampled values of the local current boundary conditions; third, the...
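The overlapping-subdomain idea can be illustrated on the simplest possible model problem. The sketch below is a hypothetical 1-D Laplace problem (u'' = 0), not the paper's setting, and it omits the sampling and fault-resilience machinery entirely: since the local solution of u'' = 0 is linear, each "subdomain solve" is just interpolation between the subdomain's current boundary values, and iterating exchanges boundary data through the overlap until the subdomains agree:

```python
def schwarz_1d_laplace(n=11, left=0.0, right=1.0, overlap=2, iters=50):
    """Alternating Schwarz sketch for u'' = 0 on a grid of n points
    with u[0] = left and u[n-1] = right. All parameters are
    illustrative choices, not values from the paper."""
    u = [0.0] * n
    u[0], u[-1] = left, right
    mid = n // 2
    # Two overlapping subdomains sharing 2*overlap interior points.
    subdomains = [(0, mid + overlap), (mid - overlap, n - 1)]
    for _ in range(iters):
        for a, b in subdomains:
            # Exact local solve of u'' = 0: linear interpolation
            # between the subdomain's current boundary values.
            for i in range(a + 1, b):
                u[i] = u[a] + (u[b] - u[a]) * (i - a) / (b - a)
    return u

# Converges to the global linear solution u[i] = i / (n - 1).
print(schwarz_1d_laplace()[5])  # 0.5 (up to rounding)
```

The overlap is what carries information between the subdomains; with no overlap the iteration would stall, which is one reason overlapping decompositions are preferred here.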
The Raft consensus algorithm is a new distributed consensus algorithm that is both easier to understand and more straightforward to implement than the older Paxos algorithm. Its major limitation is its high energy footprint. Because it relies on majority voting to decide when to commit an update, Raft requires five participants to protect against two simultaneous failures. We propose two methods...
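The five-participant figure is plain majority arithmetic. As a sketch (not Raft's implementation): tolerating f simultaneous failures requires that the surviving n - f nodes still form a majority, which forces n = 2f + 1:

```python
def quorum_size(n):
    """Votes needed for a majority among n participants."""
    return n // 2 + 1

def failures_tolerated(n):
    """Crash failures a majority-quorum system of n nodes can survive:
    the remaining n - f nodes must still reach quorum_size(n)."""
    return (n - 1) // 2

# Tolerating f failures takes 2f + 1 participants, so surviving
# two simultaneous failures requires five nodes with a quorum of three.
for f in range(4):
    assert failures_tolerated(2 * f + 1) == f
print(quorum_size(5), failures_tolerated(5))  # 3 2
```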
Data sizes in today’s Big Data age present a profound scalability challenge to modeling networks as graphs. Historically, memory-based solutions were used to cope with the high latency incurred by the irregular data access common in many natural networks. But current data rates impose both economic and environmental challenges to continually expanding the total aggregate system memory to “fit” the graph...
Distributed computing environments have been very much in sight for the last decade and a half. Geographically distributed resources are provisioned to user tasks in the distributed computing environment as per their requirements. A number of parameters must be taken into account while provisioning the distributed resources, such as task performance and fault tolerance. Extensive research...
Since before the birth of computers we have strived to make intelligent machines that share some of the properties of our own brains. We have tried to make devices that quickly solve problems that we find time consuming, that adapt to our needs, and that learn and derive new information. In more recent years we have tried to add new capabilities to our devices: self-adaptation, fault tolerance, self-repair,...
Large graph analysis is one of the significant applications of distributed computing frameworks. Distributed computing applications are solved by developing programs over different types of established distributed computing frameworks. Since graph analysis and prediction is one of the new trends in data analytics, designing these problems on an in-memory cluster framework which consumes graph datasets...
Failure detectors are oracles that have been introduced to provide processes in asynchronous systems with information about faults. This information can then be used to solve problems otherwise unsolvable in asynchronous systems. A natural question concerns the "minimum amount of information" a failure detector has to provide for a given problem. This question is classically addressed using...
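A minimal timeout-based detector gives a concrete feel for the oracle's interface. The class and its fixed timeout below are illustrative assumptions; real detectors adapt their timeouts, which is what makes properties like eventual accuracy attainable:

```python
class HeartbeatDetector:
    """Suspect any process whose last heartbeat is older than `timeout`.

    Sketch only: with a fixed timeout in an asynchronous system this
    can suspect correct-but-slow processes, so it shows the interface
    of the oracle, not its guarantees.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, pid, now):
        self.last_seen[pid] = now

    def suspected(self, now):
        return {p for p, t in self.last_seen.items()
                if now - t > self.timeout}

d = HeartbeatDetector(timeout=1.0)
d.heartbeat("p1", now=0.0)
d.heartbeat("p2", now=0.0)
d.heartbeat("p2", now=1.5)   # p2 keeps sending; p1 has gone silent
print(d.suspected(now=2.1))  # {'p1'}
```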
Scientific Computing deals with solving complex scientific problems by running resource-hungry computer simulation and modeling tasks on top of supercomputers, grids and clusters. Typical scientific computing applications can take months to create and debug with de facto parallelization solutions like the Message Passing Interface (MPI), in which the bulk of the parallelization details have...
Large-scale data mining and deep data analysis are increasingly important for both enterprise and scientific applications. Statistical languages provide rich functionality and ease of use for data analysis and modeling and have a large user base. R is one of the most widely used of these languages, but is limited to a single-threaded execution model and problem sizes that fit in a single node. This...
The scheduling mode plays a key role in grid scheduling. There are two modes: immediate and batch. Immediate mode considers tasks one by one, in arrival sequence, whereas batch mode collects tasks and considers them in an arbitrary order. Task assignment is therefore mainly driven by the mode selected: a task may be assigned to a resource as soon as it arrives, or as part of a batch. In this paper, we have introduced a new mode of heuristic called...
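The two modes can be contrasted in a few lines. The least-loaded placement and the longest-task-first batch ordering (LPT) below are illustrative stand-ins for real grid heuristics; the paper's own heuristic is not reproduced here:

```python
def immediate_mode(tasks, n_resources):
    """Assign each task the moment it arrives, to the currently
    least-loaded resource; later arrivals cannot change the decision."""
    loads = [0.0] * n_resources
    schedule = {}
    for name, cost in tasks:
        r = loads.index(min(loads))
        loads[r] += cost
        schedule[name] = r
    return schedule, max(loads)

def batch_mode(tasks, n_resources):
    """Collect the whole batch first, then let the heuristic pick the
    order (here: longest task first, the classic LPT rule)."""
    return immediate_mode(sorted(tasks, key=lambda t: -t[1]), n_resources)

tasks = [("t1", 4.0), ("t2", 1.0), ("t3", 3.0), ("t4", 2.0)]
print(immediate_mode(tasks, 2)[1])  # makespan 6.0
print(batch_mode(tasks, 2)[1])      # makespan 5.0
```

Seeing the whole batch lets the scheduler reorder tasks, which is why batch mode reaches the optimal makespan here while immediate mode, bound to arrival order, does not.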
The importance of fault tolerance for the parallel computing field is ever increasing, as the mean time between failures is predicted to decrease significantly for future highly parallel systems. The current trend of using commodity hardware to reduce the cost of clusters forces users to ensure that their applications are fault tolerant. When it comes to embarrassingly parallel data-intensive algorithms,...