To support incremental replay of message-passing applications, processes must periodically checkpoint and must log some of the messages. The paper shows that known adaptive logging algorithms are likely to introduce deadlocks in replay and presents a new algorithm that prevents deadlocks and achieves better performance.
The increasing popularity of a Cluster of Workstations (COW) for the execution of parallel applications can be attributed to its impressive price-to-performance ratio. Unfortunately, currently available software to manage the execution of parallel applications on COWs does not provide satisfactory levels of performance, nor does it provide the application developer with a friendly programming environment...
Parallel programming can be made easier by means of a skeleton-based methodology, such as P3L, which helps programmers to compose their applications by using a set of fixed parallel patterns. This kind of approach is also useful for obtaining portability because the “structured” nature of the language can be used to devise a composable support for each parallel pattern so that the...
The goal of the Swiss-Tx project is to develop, build and install a series of new supercomputers which are mostly based on commodity parts. Only the communication devices and the communication libraries are custom because available products (e.g. Ethernet with the standard socket interface) do not offer the necessary functionality, bandwidth and latency. This paper presents the high-performance communication...
Reducing the traffic between CPU and main memory is one of the main issues in the optimization of programs for load/store architectures. It is the register allocation module of optimizing compilers that keeps this traffic low by cleverly associating the program variables to the CPU registers. Since register allocation takes place during code generation and works on the intermediate code produced by...
The use of a standard binary format in the later part of code generation promotes efficiency and interchangeability of tools, but leaves little information on the source file in the machine code representation. We propose a new approach to code generation, based on a single, highly structured internal format used during proper compilation, machine code generation and linkage. This format offers new...
We describe Nestor, a library to easily manipulate Fortran programs through a high level internal representation based on C++ classes. Nestor is a research tool that can be used to quickly implement source to source transformations. The input of the library is Fortran 77, Fortran 90 and HPF 2.0. Its current output supports the same languages plus some dialects such as Petit, OpenMP, CrayMP. Compared...
In this paper, we propose the sandglass-type parallelization technique for a doacross loop, which has the characteristics of iteration-based parallelizing and software pipelining. We prove its effectiveness by comparing the sandglass-type to three well-known parallelization techniques: iteration-based, software pipelining, and a combination of doall-type parallel and sequential techniques. We conclude...
Code transformations are a very effective method of parallelizing and improving the efficiency of programs. Unfortunately most compiler systems require implementing separate (sub-)programs for each transformation. This paper describes a different approach. We designed and implemented a fully programmable transformation engine. It can be programmed by means of a transformation language. This language...
A network of workstations (NOW) is a cost-effective alternative to a multiprocessor system. Here we propose a centralized architecture for parallel query processing on a network of workstations. We describe a three-level processing strategy and evaluate its performance. The top two levels use a space-sharing technique to assign a partition to a query. The third level uses a chunk-based load sharing policy...
The paper presents a model of an object-oriented database system for archiving results generated by particle simulations and for retrieving those results from the database system for further processing.
The concept of virtual engineering (VEng) can be understood as a generalization of “multi-disciplinary problem solving”, an ever more used term in scientific computing. An abstract space consisting of the physical, the geometrical, and the cost function directions, called CGP, is introduced. The VEng problem can be seen as a complex manifold embedded in this space. Common standard data formats, unified...
The need for tools for performance prediction of parallel database systems is generally recognised. One such tool which has been developed (Steady) is based on analytical techniques to obtain a rapid estimate of performance. The approach to predicting response time involves a heuristic approximation coupled with standard queueing solutions. This paper reports on preliminary results for both maximum...
The architecture and performance of a structured distributed shared memory system, PastSet, is described. The PastSet abstraction allows programmers to write applications that run efficiently on different architectures from four-way SMP nodes to larger clusters. PastSet is a tuple-based three-dimensional structured distributed shared memory system, which provides the programmer with operations to...
This paper addresses the problem of optimal compile-time multiprocessor scheduling of iterative data-flow programs with feedback (delay elements). Unlike earlier studies, which assumed the availability of a large number of processors and complete interconnection among them, it takes the interprocessor communication (IPC) to be non-negligible, which is more realistic. We first explain the effects of...
In the framework of distributed object systems, this paper presents the concepts and an implementation of a mechanism for overlapping communication and computation. This mechanism makes it possible to decrease the execution time of a remote method invocation.
In this paper we present a run-time mechanism to simultaneously execute multiple threads from a sequential program on a simultaneous multithreaded (SMT) processor. The threads are speculative in the sense that they are created by predicting the future control flow of the program. Moreover, threads are not necessarily independent. Data dependences among simultaneously executed threads may exist. To...
NICAM is a communication layer for SMP PC clusters connected via Myrinet, designed to reduce overhead and latency by directly utilizing a microprocessor on the network interface. It adopts remote memory operations to reduce much of the overhead found in message passing. NICAM employs an Active Messages framework for flexibility in programming on the network interface, and this flexibility...
Collective communication performance is critical in a number of MPI applications, yet relatively few results are available to assess the performance of mainstream MPI implementations. In this paper we focus on two widely used primitives, broadcast and reduce, and present experimental results for the Cray T3E and the IBM SP2. We compare the performance of the existing MPI primitives with our implementation...
In this paper we present MaDCoWS, a software implementation of a Distributed Shared Memory (DSM) runtime system, specifically designed for massively parallel 2-D grid multiprocessors. The system takes advantage of the network topology in order to minimise the paths of the message sequences realising the shared operations. As a result its performance is increased and the system becomes scalable even...