The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Architecturally diverse systems can improve streaming application performance by orders of magnitude, albeit with enormous programmer effort. To simplify the programming of such systems, we have constructed the Auto-Pipe application development environment, which supports the flexible mapping of application components onto computational resources and the automatic delivery of data between these computational...
Embedded computing platforms have long incorporated non-traditional architectures (e.g., FPGAs, ASICs) to combat the diminishing returns of Moore's Law as applied to traditional processors. These specialized architectures can offer higher performance potential in a smaller space, higher power efficiency, and competitive costs. A price is paid, however, in development difficulty in determining functional...
Many computation kernels that analyze large data streams can be accelerated by converting their recurrences to parallel systolic arrays. Application domains such as bioinformatics seek to minimize the total time to analyze a large set of discrete small inputs. While traditional methods for array synthesis produce a single most efficient array design, modern computational platforms support fast runtime...
Computational finance relies heavily on the use of Monte Carlo simulation techniques. However, Monte Carlo simulation is computationally very demanding. We demonstrate the use of architecturally diverse systems to accelerate the performance of these simulations, exploiting both graphics processing units and field-programmable gate arrays. Performance results include a speedup of 74times relative to...
RNA structure prediction, or folding, is a compute-intensive task that lies at the core of several search applications in bioinformatics. We begin to address the need for high-throughput RNA folding by accelerating the Nussinov folding algorithm using a 2D systolic array architecture. We adapt classic results on parallel string parenthesization to produce efficient systolic arrays for the Nussinov...
Pipelined computing applications often have their performance modeled using queueing techniques. While networks with infinite capacity queues have well understood properties, networks with finite capacity queues and blocking between servers have resisted closed-form solutions and are typically analyzed with approximate solutions. It is this latter case that more closely represents the circumstances...
Significant performance gains have been reported by exploiting the specialized characteristics of hybrid computing architectures for a number of streaming applications. While it is straightforward to physically construct these hybrid systems, application development is often quite difficult. We have built an application development environment, Auto-Pipe, that targets streaming applications deployed...
Direct-attached storage has historically had the reputation of being less capable than equivalently sized SAN installations. Here, we empirically demonstrate the performance achievable in multiple-terabyte, direct-attached disk subsystems. A number of parameters are explored, including file system, number of logical drives, and RAID configuration.
Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. The popular BLASTP software for this task has become a bottleneck for proteomic database search. One third of this software's time is spent executing the Smith-Waterman dynamic programming algorithm. This work describes a novel FPGA design for banded Smith-Waterman, an algorithmic variant tuned...
Hybrid computing systems consisting of multiple platform types (e.g., general purpose processors, FPGAs etc.) are increasingly being used to achieve higher performance and lower costs than can be obtained with homogeneous systems (e.g., processor clusters). Different platforms have different languages and simulators associated with them. Auto-pipe has been developed as a toolset to reduce the complexity...
BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more runtime or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this paper, we focus on seed...
Comparison between biosequences and probabilistic models is an increasingly important part of modern DNA and protein sequence analysis. The large and growing number of such models in today's databases demands computational approaches to searching these databases faster, while maintaining high sensitivity to biologically meaningful similarities. This work describes an FPGA-based accelerator for comparing...
Applications for constrained embedded systems are subject to strict time constraints and restrictive resource utilization. With soft core processors, application developers can customize the processor for their application, constrained by resources but aimed at high application performance. With such freedom in the design space of the processor, however, comes complexity. We present here an automatic...
A dedicated cluster is often not fully utilized even when all of its processors are allocated to jobs. This occurs any time that a running job does not use 100% of each of the processors allocated to it. We increase the throughput and efficiency of the cluster by scheduling background jobs to run concurrently with the "primary" jobs originally scheduled on the cluster. We do this while maintaining...
While improvements in the density of semiconductor circuitry have been dramatic, the density improvements in magnetic storage have been even greater. We now store much more data than we have time to process, implying that techniques for processing these data need to be significantly altered. This paper describes a new architectural approach that enables the processing of very large data sets, yielding...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.