This paper models 1-out-of-N standby computing systems with a dynamic checkpointing policy. The system performs a real-time mission task that has to be accomplished within an allowed mission time. During the mission, to facilitate an effective failure recovery the system undergoes checkpointing procedures according to a policy that dynamically determines a checkpointing frequency based on the activated...
We give efficient algorithms to solve fundamental data movement problems on mesh-connected computers augmented with limited global bandwidth. Adding a small amount of global bandwidth makes a practical design that combines aspects of mesh and fully connected models to achieve the benefits of each. We give algorithms for sorting, finding the median, finding a spanning tree, and determining various...
This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previously available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and...
A $t/s$-diagnosable system, a generalization of a $t/t$-diagnosable system, is a system in which all faulty nodes can be isolated within a set of size at most $s$ in the presence of at most $t$ faulty nodes. In this paper, the $t/s$-diagnosability of hypercubes under the PMC model (the comparison model) is evaluated. First, several novel properties of hypercube...
One of the most important machine learning techniques is clustering, the grouping of data into different clusters or categories. Several decent algorithms and techniques exist to perform clustering on small to medium scale data. In the era of Big Data and with applications being large-scale and data-intensive in nature, there is a significant increment in volume, variety and velocity of data...
The left-preconditioned communication-avoiding conjugate gradient (LP-CA-CG) method is applied to the pressure Poisson equation in the multiphase CFD code JUPITER. The arithmetic intensity of the LP-CA-CG method is analyzed, and is dramatically improved by loop splitting for inner product operations and for three-term recurrence operations. Two LP-CA-CG solvers with block Jacobi preconditioning and...
PWCS (Probabilistic Write / Copy-Select) is a new kind of lock-free synchronization mechanism with wait-free characteristics proposed by Nicholas Mc Guire at the 13th real-time Linux workshop, which utilizes the inherent randomness of the modern computer systems. It aims at addressing the multi-reader - single-writer problem in Linux. Based on the original label-based PWCS, we propose a hash-based...
This paper overviews a technique for verifying cache coherence protocols described in the Promela language. The approach is comprised of the following steps. First, a model written for a certain configuration of the memory system is generalized to the model being parameterized with the number of processors. Second, the parameterized model is abstracted from the exact number of processors. Finally,...
Sparse Matrix-Vector multiplication (SpMV) is a computational kernel widely used in many applications. There are many different implementations of SpMV using different processors and algorithms. The performance of these implementations varies considerably, and it is difficult to choose the implementation that has the best performance for a given sparse matrix and a given platform...
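For readers unfamiliar with the kernel, the following is a minimal sketch of SpMV over the common compressed sparse row (CSR) layout. It is not taken from the paper; the function name `spmv_csr` and its parameter names are hypothetical, and CSR is assumed here only because it is a widely used storage format.

```python
from typing import List, Sequence


def spmv_csr(values: Sequence[float],
             col_indices: Sequence[int],
             row_ptr: Sequence[int],
             x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values      -- nonzero entries of A, listed row by row
    col_indices -- column index of each entry in `values`
    row_ptr     -- row i owns values[row_ptr[i]:row_ptr[i + 1]]
    (All names are illustrative assumptions, not the paper's API.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Only the stored nonzeros of this row contribute to y[row].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_indices[k]]
        y[row] = acc
    return y


# The 3x3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_indices = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
result = spmv_csr(values, col_indices, row_ptr, [1.0, 2.0, 3.0])
```

The irregular, data-dependent access to `x` is one reason the best implementation can depend on both the matrix structure and the platform.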
This paper analyzes the parallelization efficiency of Menge [1], an open source virtual crowd simulation system widely used for algorithm benchmarking, with focuses on three aspects: performance of the existing parallel processing scheme, bottleneck of parallel processing, and improvement opportunities for parallel efficiency of the system. First, we calculate the speedup ratio of each Menge module...
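As a point of reference, speedup ratio and parallel efficiency are conventionally defined as below. This is a minimal sketch using the textbook definitions; the function names and the timing figures are hypothetical and are not measurements from Menge or from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup ratio: serial run time divided by parallel run time."""
    if t_parallel <= 0:
        raise ValueError("parallel run time must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Parallel efficiency: speedup divided by the number of workers."""
    if workers <= 0:
        raise ValueError("worker count must be positive")
    return speedup(t_serial, t_parallel) / workers


# Hypothetical timings (seconds) for one module on 4 threads.
s = speedup(12.0, 4.0)
e = efficiency(12.0, 4.0, 4)
```

An efficiency well below 1 for a module suggests that it is a candidate bottleneck.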
We consider a practical makespan minimization problem that arises in a multiprocessor computer system where some processors may be shut down during computation to save an amount of shared power. The system consists of m processors driven by a common power source. The processors are modeled as a set of identical parallel machines. Moreover, we consider a set of n independent, nonpreemptive jobs which...
Today there is a need for fast computers that can perform huge tasks in less time. In serial computation, tasks are executed one after another, which takes more time. On the other hand, the time taken by a computational problem can be reduced by performing several operations simultaneously. Parallel computing [4,8,9] is the concurrent use of multiple resources to solve a single problem. A computational...
A distributed system consists of several autonomous nodes. In a distributed system some of the nodes may be overloaded due to a large number of job arrivals while other nodes may remain idle without any processing. The performance of a distributed system depends crucially on dividing up work effectively among the computing nodes. So a way is needed to share load across all the computing nodes. In...
In computational electromagnetics, surface integral equation (SIE) formulations are widely used to predict the electromagnetic scattering from arbitrary structures. These SIE formulations are discretized into a matrix form by the well-known method of moments (MoM). Up to now, the lack of proper compilers made it necessary for the MoM codes to be parallelized by hand in order to obtain reasonable performance...
This paper proposes a novel distributed parallel EM modeling technique to speed up the process of neural network modeling for EM structures. Existing techniques for EM modeling usually need to repeatedly change the parameters of microwave devices and drive the EM simulator to obtain sufficient training and testing samples. As the complexity in EM modeling problem increases, traditional techniques...
Today's multi-processor system-on-chip (MPSoC) systems increasingly have to deal with dynamically changing application workload scenarios. To cope with such dynamic application behavior, these systems could dynamically adapt the mapping of application tasks onto the underlying system resources to improve the system's performance. However, such performance improvement comes at the cost of a system...
Future generation processors are expected to have high soft error rates and will require increased fault detection and fault tolerance. This work focuses on errors in execution units. Hardware or software duplication or triplication, parity, or residue codes could be used to detect errors in execution units. However, hardware duplication/triplication have significant area overhead and, in applications...
Complex networks are relational data sets commonly represented as graphs. The analysis of their intricate structure is relevant to many areas of science and commerce, and data sets may reach sizes that require distributed storage and processing. We describe and compare programming models for distributed computing with a focus on graph algorithms for large-scale complex network analysis. Four frameworks...
After studying compressed sensing theory and its main reconstruction algorithm, the Matching Pursuit (MP) algorithm, this paper proposes a new approach to improve the speed of the MP algorithm, and it describes how to build a Beowulf parallel computing system with 8 PCs. Its parallel computation is implemented with the Message Passing Interface (MPI), and a 100Mb/s high speed Ethernet network interconnects all...
A-Cell is a high-level abstraction of fine-grained parallelism specifically designed to be applicable to the full range of parallel devices, from supercomputers based on CPUs or GPUs to networks of embedded devices. To achieve this, A-Cell adopts a programming model called "connectionist computing" and with that takes a step away from the Turing programming model. Also, in contrast with most...