In this paper a solution is given for the problem of determining the duration of a parallel asynchronous multi-step process in a homogeneous system of combinational logic. In such a system the problem arises if the processors have different speeds and input data. If so, each processor “knows” its own data and the type of the process but in general has no information about the speed of the other processors...
Cristian proposed a probabilistic algorithm for clock synchronization. This algorithm, however, does not adapt to changing system load, and it requires knowledge of the communication delay distribution. In this paper, we present an algorithm similar to Cristian's that accommodates changes in system load. This algorithm is based on statistics of the communication delays.
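As background, the basic Cristian request/reply exchange that the proposed algorithm builds on can be sketched as follows (a minimal illustration with invented names; the paper's statistical refinement of the delay estimate is not reproduced here):

```python
def cristian_estimate(send_time, recv_time, server_time, min_delay=0.0):
    """One exchange of the basic Cristian scheme: the client records the
    round-trip time and assumes the reply travelled for roughly half of it.
    All names are illustrative, not taken from the paper."""
    rtt = recv_time - send_time
    estimated_server_now = server_time + rtt / 2.0   # server clock at reply arrival
    offset = estimated_server_now - recv_time        # correction to the local clock
    error_bound = rtt / 2.0 - min_delay              # worst-case estimation error
    return offset, error_bound
```

For example, a reply carrying server time 100.0 s that arrives 10 ms after the request was sent yields an offset estimate with a 5 ms worst-case error; the statistical variant narrows such bounds by observing many delays.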
This paper addresses parallel processing in 3D medical imaging. A SIMD architecture common for image reconstruction, image processing, and volume visualization is proposed. The ray casting algorithm for volume visualization along with some of its improvements is described. The image-space parallelization of ray casting is given. The parallelization ensures an even load balancing among processors and...
The purpose of this work is to present two novel architectures for inner product computation. The proposed architectures incorporate shift switching into the reconfigurable buses. Given two arrays of N elements, each consisting of m bits, our first architecture achieves a latency of O((log N + log m)t_a + (log N)t_b), using Nm^2 basic shift switches and m^2 adders, assuming that broadcasting on a bus takes...
Based on a novel array processor architecture, consisting of two tightly-coupled mesh-connected processing cells, a number of highly parallelizable sorting algorithms are realized by matching the data flow with the interconnection topology. The sorting algorithms chosen are the odd-even sort, bitonic sort and binary tree sort. Exploiting the modularity of these algorithms, the array implementation of small...
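The first of the chosen algorithms, odd-even (transposition) sort, alternates compare-exchange phases in a pattern that maps naturally onto an array of cells; a sequential sketch of that pattern (illustrative only, not the paper's array implementation):

```python
def odd_even_sort(a):
    """Odd-even transposition sort: n alternating compare-exchange phases.
    Even phases compare pairs (0,1),(2,3),...; odd phases compare
    (1,2),(3,4),.... Each phase is fully parallel across pairs, which is
    why the algorithm suits mesh-connected processing cells."""
    a = list(a)
    n = len(a)
    for phase in range(n):
        start = phase % 2
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]  # compare-exchange step
    return a
```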
We consider the parallel solution of time-dependent partial differential equations. Because time is a one-way dimension, traditional methods attack this type of equation by solving the resulting sequence of problems in a sequential manner. Parallel solution methods retain this sequential process, obtaining their parallelism by distributing the problem at each discrete time-step. It has...
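The sequential time-marching structure described above can be sketched as follows, with `step` an illustrative placeholder for one discrete-time update (e.g. an explicit finite-difference step); the outer loop carries the one-way dependence and cannot itself be parallelized:

```python
def time_march(u0, step, n_steps):
    """Traditional sequential time-marching: each time level depends on
    the previous one, so parallelism must come from inside `step`
    (spatial distribution), not from the outer loop."""
    u = u0
    history = [u]
    for _ in range(n_steps):
        u = step(u)          # one discrete-time update
        history.append(u)
    return history
```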
In this paper we study the parallel solution of elliptic partial differential equations with the sparse grid combination technique. This new algorithmic concept is based on the independent solution of many problems with reduced size and their linear combination. The resulting algorithm can be used as a solver and within a preconditioner. We will describe the algorithm for two and three-dimensional...
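In two space dimensions the combination technique is commonly written as the following linear combination of independently computed anisotropic full-grid solutions $u_{i,j}$ (with mesh widths $2^{-i}$ and $2^{-j}$); this is a standard textbook form under my notation, not a quotation from the paper:

```latex
u^{c}_{n} \;=\; \sum_{i+j=n+1} u_{i,j} \;-\; \sum_{i+j=n} u_{i,j}, \qquad i,j \ge 1 .
```

Each $u_{i,j}$ solves a reduced-size problem on its own grid, which is what makes the subproblems independent and the method attractive for parallel solution.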
An algorithm for the numerical simulation of the fluid flow in a crystal growth process is presented. The algorithm is implemented on three parallel architectures. The performance analysis shows that special care has to be taken for the efficient realization of communication patterns on massively parallel systems.
A parallel structure of an adaptive algorithm for detecting a coherent pulse packet against a background of Gaussian narrowband noise is discussed. In the synthesis of the considered pulse packet detectors, a vector model of input radar signals is assumed. Special attention is paid to the different possible levels of parallelizing the signal processing. In order to reduce the processing time, a parallel structure...
A processor-efficient systolic algorithm for the dynamic programming approach to the knapsack problem is presented in this paper. The algorithm is implemented on a linear systolic array where the number of cells q, the cell memory storage α and the input/output requirements are design parameters. These are independent of the problem size given by the number of objects m and the knapsack capacity...
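The underlying dynamic-programming recurrence that such systolic arrays pipeline is the classical 0/1 knapsack update; a sequential sketch (illustrative only, not the paper's array mapping):

```python
def knapsack(capacity, weights, values):
    """Classical 0/1 knapsack dynamic programming:
    f[j] = best achievable value with capacity j after each object.
    The double loop over objects and capacities is the recurrence a
    linear systolic array pipelines across its cells."""
    f = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Traverse capacities in reverse so each object is used at most once.
        for j in range(capacity, w - 1, -1):
            f[j] = max(f[j], f[j - w] + v)
    return f[capacity]
```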
A space-time mapping that describes a systolic array consists of a scheduling vector that specifies the temporal distribution and an allocation matrix that specifies the spatial distribution. The allocation matrix determines whether a variable is moving or stationary. The space-time mapping provides a complete description of the data flow for moving variables. But it fails to provide any help in handling...
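In common notation (assumed here, not quoted from the paper), writing $\lambda$ for the scheduling vector and $\Sigma$ for the allocation matrix, the space-time mapping sends each index point $v$ of the computation to a time step and a cell:

```latex
t(v) \;=\; \lambda^{\mathsf{T}} v, \qquad p(v) \;=\; \Sigma\, v .
```

A variable is stationary when $\Sigma$ maps all index points along its dependence direction to the same cell, and moving otherwise; it is for the moving case that the mapping fully determines the data flow.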
In this paper, we use a variant of the geometric method to derive efficient modular linear systolic algorithms for the transitive closure and shortest path problems. Furthermore, we show that partially-pipelined modular linear systolic algorithms with an output operation, for matrix multiplication, can be as efficient as fully-pipelined ones and in some cases need fewer cells.
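For reference, the classical dynamic-programming scheme behind systolic shortest-path arrays (and, with Boolean and/or in place of min/+, transitive closure) is the Floyd-Warshall recurrence; a sequential sketch:

```python
def shortest_paths(dist):
    """Floyd-Warshall all-pairs shortest paths:
    d[i][j] = min(d[i][j], d[i][k] + d[k][j]) for each pivot k.
    Sequential sketch of the recurrence that systolic derivations map
    onto linear arrays; `dist` holds edge weights, float('inf') for
    missing edges."""
    n = len(dist)
    d = [row[:] for row in dist]   # do not modify the input in place
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

Replacing (min, +) by (or, and) on a Boolean adjacency matrix yields Warshall's transitive-closure algorithm, which is why the two problems share one systolic design.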
We compare three algorithms for reducing symmetric banded matrices to tridiagonal form and evaluate their performance on the Intel iPSC/860 hypercube parallel computer. Two of these algorithms, the routines BANDR and SBTRD from the EISPACK and LAPACK libraries, respectively, are serial algorithms with little potential for coarse-grain parallelism. The third, called SBTH, is a new parallel algorithm....
The Liverpool Single-transputer library [10] was ported to the i860 as part of the Esprit Genesis project, P2702. There are approximately 250 routines in this library, including the BLAS (a well-known set of subroutines providing functions commonly used in numerical computing; see [8], [3], [2], [1]) and a set of vector routines known as the FLO routines. This strategy is expensive...
In this paper, we consider the parallel implementation of a block Cholesky factorization based on a nested dissection ordering for unstructured problems. We focus on loosely coupled networks of many processors with local memory and message passing mechanism. More precisely, we study a parallel block solver associated with refined partitions from the separator partition; the aim is to find the partition...
In this paper we present a component-wise error analysis of two different algorithms — one sequential and the other parallel — for solving triangular systems. The results show that each of the computed components of the solution vector using the parallel algorithm is an extended sum of slightly perturbed exact terms whose relative error bounds are comparable to those generated by the usual sequential...
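The usual sequential algorithm referred to above is forward (or back) substitution; a minimal sketch for the lower-triangular case, with illustrative naming (the paper's parallel algorithm differs):

```python
def forward_substitution(L, b):
    """Solve the lower-triangular system L x = b by forward substitution:
    each component x[i] is computed from b[i] minus the already-known
    terms, then divided by the diagonal entry. This is the sequential
    baseline of the component-wise error comparison."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for j in range(i):
            s -= L[i][j] * x[j]
        x[i] = s / L[i][i]
    return x
```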
The BBN TC2000 is a distributed-memory multiprocessor with up to 512 RISC processor nodes. The originality of the BBN TC2000 comes from its interconnection network (Butterfly switch) and from its globally addressable memory. We evaluate, in this paper, the impact of the memory hierarchy of the TC2000 on the design of algorithms for linear algebra. On shared memory multiprocessor computers, block algorithms...
A parallel homotopy algorithm is presented for finding a few selected eigenvalues (for example those with the largest real part) of Az = λBz with real, large, sparse, and nonsymmetric square matrix A and real, singular, diagonal matrix B. The essence of the homotopy method is that from the eigenpairs of Dz = λBz, we use Euler-Newton continuation to follow the eigenpairs of A(t)z = λBz with A(t) ≡...
We present two parallel algorithms for solving linear recurrence systems R⟨n,m⟩ where m is relatively small, which can be implemented simply on message-passing multiprocessors. Theorems concerning their time complexity are given, together with a criterion for when each of them should be used. If m is O(1), the algorithms are effective.
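A linear recurrence system R⟨n,m⟩ computes each unknown from the m preceding ones; a sequential sketch under that reading (the naming and exact index convention are mine, not the paper's):

```python
def solve_recurrence(c, a):
    """Sequentially evaluate an order-m linear recurrence
        x_k = c[k] + sum_{j=1..min(k,m)} a[k][j-1] * x_{k-j},
    i.e. an R<n,m> system with n = len(c). Each a[k] lists the
    coefficients of the preceding values; the serial dependence on
    x_{k-1},...,x_{k-m} is what the parallel algorithms break up."""
    x = []
    for k in range(len(c)):
        s = c[k]
        for j in range(1, min(k, len(a[k])) + 1):
            s += a[k][j - 1] * x[k - j]
        x.append(s)
    return x
```

For instance, with constant coefficients (1, 1) and start values 1, 1 this reproduces the Fibonacci numbers, an R⟨n,2⟩ system.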
We present a new factorization for band symmetric positive definite (s.p.d.) matrices which is more useful for parallel computations than the classical Cholesky decomposition method. Let A be a band s.p.d. matrix of order n and half bandwidth m, and let p = 2^k be the number of processors. We show how to factor A as A = DDtBC using approximately 4nm^2/p parallel operations, which...
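For comparison, the classical Cholesky baseline mentioned above can be sketched as follows (a dense, sequential sketch with illustrative naming; the paper targets the banded, parallel case):

```python
import math

def cholesky(A):
    """Classical Cholesky factorization A = L L^T of an s.p.d. matrix.
    The strict left-to-right, top-to-bottom dependence of the entries is
    what limits its parallelism and motivates alternative factorizations."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)   # diagonal entry
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]  # below-diagonal entry
    return L
```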