Programming efficiency of heterogeneous concurrent systems is limited by the use of lock-based synchronization mechanisms. Transactional memories can greatly improve the programming efficiency of such systems. In field-programmable computing machines, a conventional fixed transactional memory becomes an inefficient use of silicon. We propose configurable transactional memory (CTM) as a mechanism...
This paper presents an FPGA-based flow solver based on the systolic architecture. We show that the fractional-step method employing central difference schemes can be expressed as a systolic algorithm, and therefore the systolic architecture is suitable for a processor dedicated to the flow solver. We have designed a 2D systolic array of cells, each of which has a micro-programmable data-path containing...
FPGA-based acceleration of molecular dynamics (MD) has been the subject of several recent studies. Implementing long-range forces, however, has only recently been addressed. Here we describe a solution based on the multigrid method. We show that multigrid is, in general, an excellent match to FPGAs: the primary operations take advantage of the large number of independently addressable RAMs and the...
Since 1998, no commercially available FPGA has been accompanied by public documentation of its native machine code (or bitstream) format. Consequently, research in reconfigurable hardware has been confined to areas which are specifically supported by manufacturer-supplied tools. Recently, detailed documentation of the bitstream format for the Atmel FPSLIC series of FPGAs appeared on the usenet group...
In this paper we discuss and analyze the FPGA-based implementation of an algorithm for the traveling salesman problem (TSP), and in particular of 2-Opt, one of the most famous local optimization algorithms, for Euclidean TSP instances up to a few hundred cities. We introduce the notion of "symmetrical 2-Opt moves" which allows us to uncover fine-grain parallelism when executing the specified...
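For readers unfamiliar with 2-Opt, the following is a minimal software sketch of the textbook algorithm on Euclidean instances. It is not the paper's FPGA design and does not implement the "symmetrical 2-Opt moves" the abstract mentions; the function names `tour_length` and `two_opt` are hypothetical and chosen only for illustration.

```python
import math


def tour_length(points, tour):
    """Total Euclidean length of a closed tour over the given points."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def two_opt(points, tour):
    """Textbook 2-Opt: repeatedly replace two edges when that shortens the tour.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    """
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city, no valid move
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length if edges (a,b) and (c,d) become (a,c) and (b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    # Reversing the segment between the edges applies the move.
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour
```

Each candidate move only needs four distance evaluations, and the evaluations for different edge pairs are independent of one another, which is the kind of property that makes the move evaluation a natural target for parallel hardware.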
Shortest path algorithms are key elements of many graph problems. They are used in such applications as online direction finding and navigation, and modeling of traffic for large scale simulations of major metropolitan areas. As shortest path algorithms are execution bottlenecks, it is beneficial to move their execution to parallel hardware such as field programmable gate arrays (FPGAs). One of the...
The design and implementation of a key point detector on embedded reconfigurable hardware is investigated. The major challenges are efficient hardware/software partitioning of the key point detector algorithm, data flow management as well as efficient use of memory, bus and processor. We present a modular and manual hardware/software co-design, with its implementation on a Xilinx XUP-Virtex II Pro board...
Creating a high throughput sparse matrix vector multiplication (SpMxV) implementation depends on a balanced system design. In this paper, we introduce the innovative SpMxV solver designed for FPGAs (SSF). Besides high computational throughput, system performance is optimized by reducing initialization time and overheads, minimizing and overlapping I/O operations, and increasing scalability. SSF accepts...
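As background, sparse matrix-vector multiplication can be sketched in a few lines of software. The example below assumes the common compressed sparse row (CSR) layout; the abstract is truncated before it states which format SSF accepts, so the layout and the function name `csr_matvec` are assumptions made for illustration, not a description of SSF.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout).

    values  : nonzero entries, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row r occupies values[row_ptr[r]:row_ptr[r + 1]]
    """
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            # Indirect access into x is what makes SpMxV memory-bound.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# The matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Only the nonzero entries are read and multiplied, so the cost scales with the number of nonzeros rather than with the full matrix size.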
Mainstream processor architectures and field programmable custom computing machines (FCCMs) are colliding towards a heterogeneous system on chip architecture. This is apparent from Intel and AMD efforts to create new chip architectures with various processing cores focusing on DSP, networking, and graphics. From the embedded processor research, system-on-chips connected by network on chips have allowed...
This paper presents the porting of an RTOS, Micro C/OS-II, to a novel reconfigurable instruction cell based architecture which fills the gap between DSP, FPGA and ASIC with high performance, high flexibility and ANSI-C support. A WiMAX physical layer program has been implemented on the target architecture with the RTOS support. A semaphore based synchronization scheme is used to improve the task independence...
A technique is presented which allows an FPGA-based reconfigurable system-on-chip to automatically and dynamically load hardware peripheral controllers and software device drivers depending on the system's automated identification of peripheral boards which are connected to the FPGA. The technique loads peripheral detection modules into peripheral controller slots at system startup, and after these...
Powerful multicomputer platforms that combine FPGAs and programmable processors promise tremendous performance benefits for applications that take advantage of these rapidly emerging architectures. Portable applications are desirable because they can be easily adapted to take advantage of different reconfigurable computing platforms. Traditional practices, however, intertwine application code with...
Many video and image/signal processing applications can be structured as sequences of data-dependent tasks using a consumer/producer communication paradigm and are therefore amenable to pipelined execution. This paper presents an execution technique to speed up the overall execution of successive, data-dependent tasks on a reconfigurable architecture. The technique pipelines sequences of data-dependent...
Many signal processing algorithms can be accelerated using reconfigurable hardware. To achieve a good speedup compared to running software on a general purpose processor, fine-grained control over the bitwidth of each component in the datapath is desired. This goal can be achieved by using NU's variable precision floating-point library. To analyze the usefulness of the floating-point divide unit,...
FPGA-based computing engines have become a promising option for the implementation of computationally intensive applications due to high flexibility and parallelism. However, one of the main obstacles to overcome when trying to accelerate an application on an FPGA is the bottleneck in off-chip communication, typically to large memories. Often it is known at compile-time that the same data item is...
We report the results of an FPGA implementation of double precision floating-point division with IEEE rounding. We achieve a total latency (i.e., cycles times clock period) that is 2.6 times smaller than the latency of the fastest previous implementation on FPGAs. The amount of hardware, on the other hand, is comparable to commercial cores. The division circuit is based on Goldschmidt's algorithm...
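Goldschmidt's algorithm itself can be illustrated with a short software sketch. The version below shows only the core iteration in ordinary double-precision arithmetic; it does not model IEEE rounding, an initial reciprocal approximation, or any detail of the circuit described in the abstract, and the name `goldschmidt_divide` and the default of six iterations are assumptions made for this example.

```python
import math


def goldschmidt_divide(n, d, iterations=6):
    """Approximate n / d with Goldschmidt's iteration (illustrative sketch).

    Numerator and denominator are multiplied by the same factor f = 2 - den
    until the denominator converges to 1, leaving the quotient in the numerator.
    """
    if d == 0:
        raise ValueError("division by zero")
    sign = -1.0 if d < 0 else 1.0
    # Scale |d| into [0.5, 1): abs(d) == m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(abs(d))
    num = math.ldexp(n, -e)
    den = m
    for _ in range(iterations):
        f = 2.0 - den
        num *= f
        den *= f  # if den == 1 - eps, the new den is 1 - eps**2
    return sign * num
```

The error in the denominator squares at every step, so starting from an error of at most 0.5 the six iterations assumed here bring it below double-precision resolution. The two multiplications inside each step are independent of each other, which is why the method is usually regarded as pipeline-friendly.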
We present a domain-specific approach to generate high-performance hardware-software partitioned implementations of the discrete Fourier transform (DFT) in fixed point precision. The partitioning strategy is a heuristic based on the DFT's divide-and-conquer algorithmic structure and fine tuned by the feedback-driven exploration of candidate designs. We have integrated this approach in the Spiral linear-transform...
This paper provides an evaluation of SGI® RASC™ RC100 technology from a computational science software developer's perspective. A brute force implementation of a two-point angular correlation function is used as a test case application. The computational kernel of this test case algorithm is ported to the Mitrion-C programming language and compiled, targeting the RC100 hardware. We explore several...
In this paper we present our work toward FPGA acceleration of phylogenetic reconstruction, a type of analysis that is commonly performed in the fields of systematic biology and comparative genomics. In our initial study, we have targeted a specific application that reconstructs maximum-parsimony (MP) phylogenies for gene-rearrangement data. Like other prevalent applications in computational biology,...
Reconfigurable hardware (RH) is used in an increasing variety of applications, many of which require support for features commonly found in general purpose systems. In this work we examine some of the challenges faced in integrating RH with general purpose processors and memory systems. We propose a new CPU-RH-memory interface that takes advantage of on-chip caches and uses virtual memory for communication...