The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
We propose a new algorithm, PMS6, for the (l; d)-motif discovery problem in which we are to find all strings of length l that appear in every string of a given set of strings with at most d mismatches. The run time ratio PMS5/PMS6, where PMS5 is the fastest previously known algorithm for motif discovery in large instances, ranges from a high of 2.20 for the (21,8) challenge instances to a low of 1...
We develop novel single-GPU parallelizations of the Smith-Waterman algorithm for pairwise sequence alignment. Our algorithms, which are suitable for the alignment of a single pair of very long sequences, can be used to determine the alignment score as well as the actual alignment. Experimental results demonstrate an order of magnitude reduction in run time relative to competing GPU algorithms.
This paper presents a design for parallel processing of synthetic aperture radar (SAR) data using multiple Graphics Processing Units (GPUs). Our approach supports real-time reconstruction of a two-dimensional image from a matrix of echo pulses and their response values. Key to runtime efficiency is a partitioning scheme that divides the output image into tiles and the input matrix into a collection...
We provide efficient single-precision and integer GPU implementations of Strassen's algorithm as well as of Winograd's variant. On an NVIDIA C1060 GPU, a speedup of 32% (35%) is obtained for Strassen's 4-level implementation and 33% (36%) for Winograd's variant relative to the sgemm (integer version of sgemm) code in CUBLAS 3.0 when multiplying 16384×16384 matrices. The maximum numerical error for...
This paper presents a design for parallel processing of synthetic aperture radar (SAR) data using one or more Graphics Processing Units (GPUs). Our design supports real-time reconstruction of a two-dimensional image from a matrix of echo pulses and their corresponding response values. Key to our design is a dual partitioning scheme that divides the output image into tiles and divides the input matrix...
We extend the fastest comparison based (sample sort) and non-comparison based (radix sort) number sorting algorithms on a GPU to sort large multifield records. Two extensions - direct (the entire record is moved whenever its key is to be moved) and indirect ((key, index) pairs are sorted using the direct extension and then records are ordered according to the obtained index permutation) are discussed...
We solve workflow scheduling problems in e-Science networks, whose goal is minimizing either makespan or network resource consumption by jointly scheduling heterogeneous resources such as compute and network resources. We formulate the workflow scheduling problem incorporating multiple paths as a mixed integer linear programming (MILP) and develop several linear programming relaxation heuristics based...
With its 9 cores per chip, the IBM Cell/Broadband Engine (Cell) can deliver an impressive amount of compute power and benefit the string-matching kernels of network security, business analytics and natural language processing applications. However, the available amount of main memory on the system limits the maximum size of the dictionary supported by the string matching solution. To counter that,...
We develop GPU adaptations of the Aho-Corasick string matching algorithm for the the case when all data reside initially in the GPU memory and the results are to be left in this memory. We consider several refinements to a base GPU implementation and measure the performance gain from each refinement. Experiments conducted on an NVIDIA Tesla GT200 GPU that has 240 cores running off of a Xeon 2.8GHz...
We propose algorithms for distributing the classifier rules to two TCAMs (ternary content addressable memories) and for incrementally updating the TCAMs. The performance of our scheme is compared against the prevalent scheme of storing classifier rules in a single TCAM in priority order. Our scheme results in an improvement in average lookup speed by up to 48% and our experiments demonstrate an improvement...
Time-domain Wavelength Interleaved Networking (TWIN) is a new optical network architecture which achieves a good balance between scheduling flexibility and deployment cost. In this paper, we solve the wavelength assignment problem for TWIN networks using topology sharing approach. We show that determining the wavelength assignment that use the minimum number of wavelengths is a NP-Complete problem...
We consider the sorting of a large number of multifield records on the Cell Broadband engine. We show that our method, which generates runs using a 2-way merge and then merges these runs using a 4-way merge, outperforms previously proposed sort methods that use either comb sort or bitonic sort for run generation followed by a 2-way odd-even merging of runs. Interestingly, best performance is achieved...
We propose a dual TCAM architecture - DUOS, for routing tables. Four memory management schemes for TCAMs also are proposed and evaluated. DUOS and our memory management schemes support control-plane incremental updates without delaying data-plane lookups. Compared to other TCAM architectures such as CAO OPT [19] that support incremental updates without delaying lookups, DUOS offers reduction in power...
An Internet router may receive a batch of tens of thousands of updates (insert a new rule or delete/change an existing rule) in any instant (i.e., with the same time stamp). This paper deals with analyzing possible orderings of a batch of updates such that forwarding table consistency is maintained while these updates are performed one at a time as in a table that supports incremental updates rather...
We develop a novel framework for supporting e-Science applications that require streaming of information between sites. Using a Synchronous Dataflow (SDF) model, our framework incorporates the communication times inherent in large scale distributed applications, and can be used to formulate the bandwidth allocation problem with throughput constraints as a multi-commodity linear programming problem...
We propose several algorithms for topology aggregation (TA) to effectively summarize large-scale networks. These TA techniques are shown to significantly better for path requests in e-Science that may consist of simultaneous reservation of multiple paths and/or simultaneous reservation for multiple requests. Our extensive simulation demonstrates the benefits of our algorithms both in terms of accuracy...
We develop an 0(1) time algorithm to a triangulate a set of N planar points using an N??N reconfigurable mesh with buses. Our algorithm works on all reconfigurable mesh with buses architectures.
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.