FPGA has long been considered an attractive platform for high performance implementations of string matching. However, as the size of pattern dictionaries continues to grow, such large dictionaries can be stored in external DRAM only. The increased memory latency and limited bandwidth pose new challenges to FPGA-based designs, and the lack of spatial and temporal locality in data access also leads...
We propose a pipelined field-merge architecture for memory-efficient and high-throughput large-scale string matching (LSSM). Our proposed architecture partitions the (8-bit) character input into several bit-field inputs of smaller (usually 2-bit) widths. Each bit-field input is matched in a partial state machine (PSM) pipeline constructed from the respective bit-field patterns. The matching results...
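The partitioning step described above can be illustrated with a minimal sketch. It only shows how one 8-bit character might be split into 2-bit fields and reassembled; the function names `split_bit_fields` and `merge_bit_fields` are hypothetical, the most-significant-first field order is an assumption, and the partial state machine pipelines themselves are not modeled.

```python
def split_bit_fields(byte, width=2):
    """Split an 8-bit value into 8 // width fields, most significant field first.

    Hypothetical helper: the field order is an assumption, not taken from the paper.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    if width <= 0 or 8 % width != 0:
        raise ValueError("width must divide 8")
    mask = (1 << width) - 1
    # Shift amounts run from the top field down to bit 0.
    return [(byte >> shift) & mask for shift in range(8 - width, -1, -width)]


def merge_bit_fields(fields, width=2):
    """Reassemble the original value from fields produced by split_bit_fields."""
    value = 0
    for field in fields:
        value = (value << width) | field
    return value


fields = split_bit_fields(ord("A"))  # 'A' is 0x41, binary 01 00 00 01
print(fields)
print(chr(merge_bit_fields(fields)))
```

With 2-bit fields, each character yields four streams of 2-bit symbols, so each per-field state machine would only need four outgoing transitions per state instead of 256.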
Although soft microprocessors are widely used in FPGAs, limited work has been performed regarding how to automatically and efficiently generate soft multiprocessors. In this paper, an automated parallel compilation environment for multiple soft processors which incorporates parallel compilation and inter-processor communication structures is described. A total of eight previously-developed parallel...
NCBI BLAST has become the de facto standard in bioinformatic approximate string matching and so its acceleration is of fundamental importance. The problem is that it uses complex heuristics which make it difficult to simultaneously achieve both substantial speed-up and exact agreement with the original output. Our approach is to prefilter the database. To make this work we have developed a novel heuristic...
This paper will describe the architecture of a compiler which will convert an untimed C description of a floating point expression into a synthesizable datapath optimized for FPGAs. The concept of floating point fused datapath synthesis will be reviewed, along with the expected functional efficiency gains. The dataflow graph structure used by the compiler will be detailed, followed by the description...
This work presents a detailed implementation of a double precision, non-preconditioned, conjugate gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture™ in conjunction with x86 Opteron™ processors from AMD. We implement a common conjugate gradient algorithm, on a variety of systems, to compare and contrast performance...
Most high-speed Internet Protocol (IP) lookup implementations use tree traversal and pipelining. Due to the available on-chip memory and the number of I/O pins of Field Programmable Gate Arrays (FPGAs), state-of-the-art designs cannot support the current largest routing table (consisting of 257 K prefixes in backbone routers). We propose a novel scalable high-throughput, low-power SRAM-based linear...
The fine-grained parallelism inherent in FPGAs has encouraged their use in packet processing systems. To facilitate debugging and performance evaluation, designers require on-chip monitors that provide abstractions of low-level details and a system-level perspective. In this paper, we present five architectures that permit transaction-based communication-centric monitoring of packet processing systems...
The Ambric Massively Parallel Processor Array (MPPA) is a device that contains 336 32-bit RISC processors and is appropriate for embedded systems due to its relatively small physical and power footprint. Optical flow is a computationally-demanding and highly parallelizable image-processing algorithm with applications in embedded systems such as robotics and autonomous vehicles. An optical flow algorithm...
Photodynamic therapy (PDT) is a method of treating cancer that combines light and light-sensitive drugs to selectively destroy cancerous tumours without harming the healthy tissue. The success of PDT depends on the accurate computation of light dose distribution. Monte Carlo (MC) simulations can provide an accurate solution for light dose distribution, but have high computation time that prevents...
This paper proposes a high performance least squares solver on FPGAs using the Cholesky decomposition method. Our design can be realized by iteratively adopting a single triangular linear equation solver for modified Cholesky decomposition and forward/backward substitutions. Good performance is achieved by optimizing the Cholesky factorization algorithms, reordering the computation and thus alleviating...
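As a rough software illustration of the numerical steps named above, the following sketch solves a small least squares problem with a Cholesky factorization followed by forward and backward substitution. It makes several assumptions not stated in the abstract: it forms the normal equations A^T A x = A^T b, it uses the standard (not the modified) Cholesky decomposition, and all function names are hypothetical. It says nothing about the FPGA design itself.

```python
import math


def cholesky(M):
    """Return lower-triangular L with L * L^T == M (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y, reading the transpose of L in place."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def least_squares(A, b):
    """Minimize ||A x - b|| via the normal equations (assumed formulation)."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[r][i] * b[r] for r in range(m)) for i in range(n)]
    L = cholesky(AtA)
    return backward_sub(L, forward_sub(L, Atb))


# Fit y = c0 + c1 * t through the points (0, 1), (1, 3), (2, 5).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 3.0, 5.0]
print(least_squares(A, b))
```

Both substitution routines are triangular solves of the same shape, which is consistent with the abstract's point that a single triangular solver can be reused across the stages.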
One of the most efficient methods for cracking passwords is the one based on "rainbow tables"; these lookup tables offer an almost optimal time-memory tradeoff in the process of recovering the plaintext password from a password hash generated by a cryptographic hash function. In this paper, we demonstrate the first known system, implemented in a state-of-the-art reconfigurable device...
A packet generator and network traffic capture system has been implemented on the NetFPGA. The NetFPGA is an open networking platform accelerator that enables rapid development of hardware-accelerated packet processing applications. The packet generator application allows Internet packets to be transmitted at line rate on up to four gigabit Ethernet ports simultaneously. Data transmitted is specified...
Scheduling and partitioning of task graphs on reconfigurable hardware needs to be carefully carried out in order to achieve the best possible performance. In this paper, we demonstrate that a significant improvement to the total execution time is possible by incorporating a library of hardware task implementations, which contains multiple architectural variants for each hardware task reflecting tradeoffs...
The computationally intensive power flow problem determines the voltage magnitude and phase angle at each bus in a power system for hundreds of thousands of buses under balanced three-phase steady-state conditions. We report an FPGA acceleration of the Gauss-Seidel based power flow solver employed in the transmission module of the GridLAB-D power distribution simulator and analysis tool. The prototype...
The Reconfigurable Computing Cluster Project at the University of North Carolina at Charlotte is investigating the feasibility of using FPGAs as compute nodes to scale to PetaFLOP computing. To date the Spirit cluster, consisting of 64 FPGAs, has been assembled for the initial analysis. One important question is how to efficiently communicate among compute cores on-chip as well as between nodes. Tight...
This paper describes an FPGA implementation of a single-precision floating-point multiply-accumulator (FPMAC) that supports single-cycle accumulation while maintaining high clock frequencies. A non-traditional internal representation reduces the cost of mantissa alignment within the accumulator. The FPMAC is evaluated on an Altera Stratix III FPGA.
In this paper, we introduce the GroundHog 2009 benchmarking suite, which can be used to evaluate the power consumption of reconfigurable technology implementing applications targeting the mobile computing domain. This benchmark suite includes seven designs; one design targets fine-grained FPGA fabrics, and six designs are specified at a high level, which allows them to target a range of reconfigurable technologies...