The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Field programmable gate arrays (FPGAs), graphics processing units (GPUs) and Sony's Playstation 2 vector units offer scope for hardware acceleration of applications. Implementing algorithms on multiple architectures can be a long and complicated process. We demonstrate an approach to compiling for FPGAs, GPUs and PS2 vector units using a unified description based on A Stream Compiler (ASC) for FPGAs...
The authors describe a new pipeline for computing non-bonded forces and its integration into the ProtoMol molecular dynamics (MD) code. There are several innovations: a novel interpolation strategy, including use of higher order terms; coefficient generation with orthonormal functions; the introduction of "semi-floating point" numbering; and various issues related to system integration....
In this paper, the authors propose a new design for feature detection used for tracking, which eliminates the need of a central computer to complete computations for the feature selection algorithm. Such a system constrains performance due to the delay in which data is transferred from camera to computer for processing. Our design suggests that feature detection computation can be done on a processor...
This paper introduces a novel architecture for performing the core computations required by dynamic programming (DP) techniques. The latter pertain to a vast range of applications that necessitate an optimal sequence of decisions to be issued. An underlying assumption is that a complete model of the environment is provided, whereby the dynamics are governed by a Markov decision process (MDP). Existing...
A design methodology and prototype tool to automate the design and architectural exploration of hardware accelerators are described in this paper. In comparison to other approaches, we utilize a well-engineered template to enable fast convergence to an area and speed efficient design. We show how this methodology is used for an application set with various architectural configurations
In this paper, the authors present a reconfigurable cluster-on-chip architecture and supporting parallel programming software library based on the well-known message passing interface (MPI) standard. The intent is to allow designers to program multi-core reconfigurable systems on chip using the same or similar methodologies that yielded tremendous productivity improvements in the workstation and HPC...
On-going improvements in the scaling of FPGA device sizes and time-to-market pressures encourage the use of module-oriented design flows [3], while economic factors favour the reuse of smaller devices for high performance computational tasks. One of the core problems in proposing dynamic modular reconfiguration approaches is supporting the differing communications needs of the sequence of modules...
The Parrotfish project is a low cost, distributed environment for (partial) reconfiguration of distributed field programmable systems, e.g. sensor networks. In this paper we present architectures and results in which the wireless nodes of a distributed system can undergo runtime task-reversals under triggering from external conditions, using Bluetooth as a low cost wireless medium. The project gets...
This paper presents a framework for rapid development of FPGA based custom processors based on floating-point calculation units. The framework consists of a fully parameterized floating-point library, an easy-to-use pipeline generator and an interface generator for memory and I/O-modules. The performance of this approach is shown for the implementation of an SPH-algorithm.
Cryptanalysis of symmetric and asymmetric ciphers is computationally extremely demanding. Since the security parameters of almost all practical crypto algorithms are chosen such that attacks with conventional computers are computationally infeasible, the only promising way to tackle existing ciphers (assuming no mathematical breakthrough) is to build special-purpose hardware. This contribution presents...
A synthesis environment called DSynth has been created for synthesizing high-performance pipelined circuits for FPGAs from synchronous data flow specifications. The goal of this work is to generate the minimum size circuit that meets the throughput constraint of the data flow model. To achieve this constraint efficiently, this approach relies heavily upon a library of pre-characterized pipelined circuit...
Given the logic density of modern FPGAs, it is feasible to use FPGAs for floating-point applications. However, it is important that any floating-point units that are used be highly optimized. This paper introduces an open source library of highly optimized floating-point units for Xilinx FPGAs. The units are fully IEEE compliant and acheive approximately 230 MHz operation frequency for double-precision...
FPGAs are becoming more and more attractive for high precision scientific computations. One of the main problems in efficient resource utilization is the quadratically growing resource usage of multipliers depending on the operand size. Many research efforts have been devoted to the optimization of individual arithmetic and linear algebra operations. In this paper the authors take a higher level approach...
Optimal reconfigurable hardware implementations may require the use of arbitrary floating-point formats that do not necessarily conform to IEEE specified sizes. The authors have previously presented a variable precision floating-point library for use with reconfigurable hardware. The authors recently added three advanced components: floating-point division, floating-point square root and floating-point...
This paper is concerned with the application of formal optimisation methods to the design of mixed-granularity FPGAs. In particular, the authors investigate the appropriate mix and floorplan of heterogeneous elements: multipliers, RAMs, and LUT-based logic, in order to maximise the performance of a set of DSP benchmark applications, given a fixed silicon budget. The authors extend our previous mathematical...
This paper presents a hardware-optimized variant of the well-known Gaussian elimination over GF(2) and its highly efficient implementation. The proposed hardware architecture can solve any regular and (uniquely solvable) overdetermined linear system of equations (LSE) and is not limited to matrices of a certain structure. Besides solving LSEs, the architecture at hand can also accomplish the related...
Approximate string matching is fundamental to bioinformatics, and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic programming- (DP) based methods. Our primary contributions are two new algorithms for emulating the seeding and extension phases of BLAST. These operate in a single pass through a database at...
This paper describes LURU, a methodology for FPGA combinational technology mapping through the parallel search capability of content-addressable memory (CAM). An overview was shown. First, a circuit is partitioned into a set of subcircuits. Topologies of these subcircuits are described using textual string representations. A precomputed set of strings for the circuit topologies that can be contained...
The Quartz framework allows the generation of parametrised high-performance placed IP cores from higher level descriptions. Circuits are described in the Quartz language, which provides advanced features such as polymorphism, overloading, higher-order combinators and formal reasoning while supporting precise and flexible control of layout for efficient FPGA design. Our compiler transforms Quartz descriptions...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.