The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
The complete Voronoi map of a binary image with black and white pixels is a matrix of the same size such that each element is the closest black pixel of the corresponding pixel. The complete Voronoi map visualizes the influence region of each black pixel. However, each region may not be connected due to exclave pixels. The connected Voronoi map is a modification of the complete Voronoi map so that...
The main contribution of this paper is to show a new photomosaic generation method by rearranging subimages of an image. In the photomosaic generation, an input image is divided into small subimages and they are rearranged such that the rearranged image reproduces another image given as a target image. Therefore, this problem can be considered as a combinatorial optimization problem to obtain the...
The bulk execution of a sequential algorithm is to execute it for many different inputs in turn or at the same time. It is known that the bulk execution of an oblivious sequential algorithm can be implemented to run efficiently on a GPU. The bulk execution supports fine grained bitwise parallelism, allowing it to achieve high acceleration over a straightforward sequential computation. The main contribution...
The main contribution of this paper is to show an implementation of the parallel convex hull algorithm on the Parallella architecture. Parallella is a single-board computer with 16 mesh-connected cores. We have considered the memory architecture and mesh-connected network of the Parallella architecture. We evaluated the computing time and the energy-efficiency by comparing with various computing platforms...
The main contribution of this paper is to present a very efficient GPU implementation of bulk computation of eigenvalues for a large number of small non-symmetric real matrices. This work is motivated by the necessity of such bulk computation in design of control systems, which requires to compute the eigenvalues of hundreds of thousands non-symmetric real matrices of size up to 30x30. In our GPU...
The task of finding strings having a partial match to a given pattern is of interest to a number of practical applications, including DNA sequencing and text searching. Owing to its importance, alternatives to accelerate the Approximate String Matching (ASM) have been widely investigated in the literature. The main contribution of this work is to present a memory-access-efficient implementation for...
Suppose that a sequence of sensing data with timestamps are transferred asynchronously. Some of sensing data may be delayed by some period of time and the sequence is not in proper increasing order of timestamps. A sequence of timestamps to, t1,..., tn-l is d-sorted if ti
Vertex coloring is an assignment of colors to vertex of an undirected graph such that no two vertices sharing the same edge have the same color. The vertex coloring problem is to find the minimum number of colors necessary to color a graph given, which is an NP-hard problem in combinatorial optimization. Ant Colony Optimization (ACO) is a well-known meta-heuristic in which a colony of artificial ants...
The main contribution of this paper is to present an FPGAbased implementation of an instance-specific hardware which accelerates the CKY (Cook-Kasami-Younger) parsing for context-free grammars. Given a context-free grammar G and a string x, the CKY parsing determines if G derives x. We have developed a hardware generator that creates a Verilog HDL source to perform the CKY parsing for any given context-free...
The main contribution of this paper is topresent Bitwise Parallel Bulk Computation (BPBC) technique, to accelerate bulk computation, which executes the same algorithm for a lot of instances in turn or in parallel. The idea of the BPBC technique isto simulate a combinational logic circuit for 32 inputsat the same time using bitwise logic operators for 32-bit integerssupported by most processing devices...
LZW algorithm is one of the most famous dictionary-based compression and decompression algorithms. The main contribution of this paper is to present a hardware LZW decompression algorithm and to implement it in an FPGA. The experimental results show that one proposed module on Virtex-7 family FPGA XC7VX485T-2 runs up to 2.16 times faster than sequential LZW decompression on a single CPU, where the...
The LZW compression is a well known patented lossless compression method used in Unix file compression utility "compress" and in GIF and TIFF image formats. It converts an input string of characters (or 8-bit unsigned integers) into a string of codes using a code table (or dictionary) that maps strings into codes. Since the code table is generated by repeatedly adding newly appeared substrings...
The main contribution of this paper is to present an intermediate approach of software and hardware using FPGAs. More specifically, we present a processor based on FDFM (Few DSP slices and Few Memory blocks) approach that supports arithmetic operations with flexibly many bits, and implement it in the Xilinx Virtex-6 FPGA. Arithmetic instructions of our processor architecture include addition, subtraction,...
The approximate string matching (ASM) problem asks to find a substring of string Y of length n that is most similar to string X of length m. The ASM can be solved by dynamic programming technique, which computes a table of size m × n. The main contribution of this work is to present a memory-access-efficient implementation for computing the ASM on a GPU. The key idea of our implementation relies on...
If we process large-integers on the computer, they are represented by multiple-length integer. Multiple-length multiplication is widely used in areas such as scientific computation and cryptography processing. However, the computation cost is very high since CPU does not support a multiple-length integer. In this paper, we present a GPU implementation of bulk multiple-length multiplications. The idea...
The Conway's Game of Life is the most well-known cellular automaton. The universe of the Game of Life is a 2-dimensional array of cells, each of which takes two possible states, alive or dead. The state of every cell is repeatedly updated according to those of eight neighbors. A cell will be alive if exactly three neighbors are alive, or if it is alive and two or three neighbors are alive. The main...
Error diffusion is a classical but still popular method for generating a binary image that reproduces an original gray-scale image. In error diffusion, pixel values are rounded to binary in raster scan order and the rounding error is distributed to neighboring pixels that have not yet been processed. The main contribution of this paper is to show several parallel algorithms and implementation techniques...
The main contribution of this paper is to show a new GPU implementation for the digital half toning by the local exhaustive search that can generate high quality binary images. We have considered programming issues of the GPU architecture to implement these two methods on the GPU. The experimental result shows that our GPU implementation for the local exhaustive search on NVIDIA GeForce GTX 980 for...
This paper presents a FIFO-based parallel merge sorter optimized for the latest FPGA. More specifically, we show a sorter that sorts K keys in latency K + log2 K -- 1 using log2 K comparators. It uses K/M + log2 K + log2 M -- 1 memory blocks with capacity M to implement FIFOs. It receives K keys one by one in every clock cycle and outputs the sorted sequence of them from K + log2 K -- 1 clock cycles...
RSA is one the most well-known public-key cryptosystems widely used for secure data transfer. An RSA encryption key includes a modulus n which is the product of two large prime numbers p and q. If an RSA modulus n can be decomposed into p and q, the corresponding decryption key can be computed easily from them and the original message can be obtained using it. RSA cryptosystem relies on the hardness...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.