This paper presents a parallelized architecture of multiple classifiers for face detection based on the Viola and Jones object detection method. This method makes use of the AdaBoost algorithm which identifies a sequence of Haar classifiers that indicate the presence of a face. We describe the hardware design techniques including image scaling, integral image generation, pipelined processing of classifiers,...
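The integral image mentioned above is what makes Haar features cheap to evaluate: once it is built, the sum of any rectangle takes four lookups. The following is a minimal software sketch of that idea, not the paper's hardware design; the function names and the padded table layout are assumptions made for illustration.

```python
def integral_image(img):
    """Build a summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of all pixels above and to the left of (x, y),
    exclusive, so rectangle sums need no boundary checks.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_haar(ii, x, y, w, h):
    """Horizontal two-rectangle feature: left half minus right half.

    Assumes w is even (hypothetical helper, for illustration only).
    """
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    table = integral_image(image)
    print(rect_sum(table, 1, 1, 2, 2))
    print(two_rect_haar(table, 0, 0, 2, 3))
```

Because each feature costs a fixed number of table lookups regardless of its size, many classifiers can be evaluated on the same window, which is the property a pipelined or parallel design can exploit.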
This paper describes the acceleration of virtual ecology models using field-programmable gate arrays (FPGAs). Our approach targets models generated by the Virtual Ecology Workbench (VEW), an existing tool used by biological oceanographers to build and analyze models of the plankton ecosystem in the upper ocean. Depending on the plankton study and required level of detail, the logic, memory, and data...
Embedded appliance designers rely on heterogeneous multi-core system-on-chips (HMC-SoC) to provide the computing power required by modern applications. Due to the inherent complexity of this kind of platform, the development of specific system architectures is not considered an option to provide low-level services to an application. Hence, the software is built either from scratch - when the software's...
The advent of the mobile age has heavily changed the requirements of today's communication devices. Data transmission over interference-prone wireless channels requires additional steps of data processing, such as forward error correction, to ensure reliable communication. In this work we present RS(63,55) Reed-Solomon encoding and decoding algorithms according to the IEEE 802.15.4a standard executed...
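The notation RS(63,55) already fixes the basic properties of the code. The sketch below derives them from n and k; it assumes a primitive-length Reed-Solomon code (n = 2^m - 1), and the function name and returned keys are hypothetical, chosen only for this illustration.

```python
def rs_parameters(n, k):
    """Derive basic properties of a primitive-length RS(n, k) code.

    Assumes n = 2**m - 1, so each symbol is an element of GF(2**m).
    """
    m = (n + 1).bit_length() - 1
    if (1 << m) - 1 != n or not 0 < k < n:
        raise ValueError("expected n = 2**m - 1 and 0 < k < n")
    parity = n - k
    return {
        "symbol_bits": m,                          # bits per code symbol
        "parity_symbols": parity,                  # redundancy added by the encoder
        "correctable_symbol_errors": parity // 2,  # t = floor((n - k) / 2)
        "code_rate": k / n,
    }


if __name__ == "__main__":
    print(rs_parameters(63, 55))
```

For RS(63,55) this gives 6-bit symbols, 8 parity symbols, and the ability to correct up to 4 symbol errors per codeword.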
Progressive alignment is a widely used approach for computing multiple sequence alignments (MSAs). However, aligning several hundred or thousand sequences with popular progressive alignment tools such as ClustalW requires hours or even days on state-of-the-art workstations. This paper presents MSA-CUDA, a parallel MSA program, which parallelizes all three stages of the ClustalW processing pipeline...
Demand for fast dynamic reconfiguration has increased since dynamic reconfiguration can accelerate the performance of processors. Dynamic reconfiguration has two important prerequisites: fast reconfiguration and numerous reconfiguration contexts. Unfortunately, fast reconfigurations and numerous contexts share a tradeoff relation on current VLSIs. Therefore, optically reconfigurable gate arrays were...
The conventional approach to detecting malware relies on static scanning for malware signatures. However, it may not work on malware that uses software protection methods such as encryption and packing with run-time decryption and unpacking. We propose a hardware-assisted malware detection system that detects malware during program run time to complement the conventional approach. It searches for...
In this paper we present an Application Customizable Branch Predictor, ACBP, that delivers efficiency in energy savings and performance without compromising prediction accuracy. The idea of our technique is to filter unnecessary global history information within the global history register to minimize the predictor size while maintaining prediction accuracy. We suggest in this work an efficient algorithm...
Signature-based network intrusion detection requires fast and reconfigurable pattern matching for deep packet inspection. In our previous work we addressed this problem with a hardware-based pattern matching engine that utilizes a novel state encoding scheme to allow memory-efficient use of Deterministic Finite Automata. In this work we expand on these concepts to create a completely software-based...
In this paper we consider a multiresolution filter and its realization on the Cell BE and GPUs. We present not only common and specific optimization strategies undertaken for obtaining maximum performance on these architectures, but also how to obtain speedups of 6.57x and 33.24x compared to an optimized OpenMP baseline implementation. Furthermore, we also undertake automated configuration space...
This paper presents the implementation of a novel parallel FFT algorithm on SmartCell, a coarse-grained reconfigurable architecture targeted at data streaming applications. The proposed FFT algorithm achieves balanced workload and memory requirements among the computational units, while maintaining optimized data flow at low configuration and communication cost. The proposed parallel FFT...
Previously suggested transistor sizing algorithms assume that all input transitions are equally important. In this work we show that this is not an accurate assumption, as input transitions occur with different frequencies. We take advantage of this phenomenon and introduce application-specific transistor sizing. In application-specific transistor sizing, higher priority is given to more frequent transitions...
The spectral hash algorithm is one of the round 1 candidates for the SHA-3 family, and is based on spectral arithmetic over a finite field, involving multidimensional discrete Fourier transformations over a finite field, data dependent permutations, rubic-type rotations, and affine and nonlinear functions. The underlying mathematical structures and operations pose interesting and challenging tasks...
Most field programmable gate array (FPGA) devices have special fast carry propagation logic intended to optimize addition operations. Redundant adders do not easily fit into this specialized carry logic and, consequently, they require twice the hardware resources of carry-propagate adders, while showing a similar delay for small operands. Therefore, carry-save adders are not usually implemented...
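To see why a carry-save adder is called redundant, note that it keeps its result as two vectors (a sum word and a carry word) and defers carry propagation to a single final addition. The sketch below models this on non-negative Python integers; it is a bit-level illustration under that assumption, not an FPGA mapping, and the function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three operands to a (sum, carry) pair.

    No carry ripples between bit positions; each bit is computed
    independently, as in one row of full adders.
    """
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def add_many(operands):
    """Accumulate operands in carry-save form, then resolve once."""
    s, c = 0, 0
    for x in operands:
        s, c = carry_save_add(s, c, x)
    # The only carry-propagate addition happens here.
    return s + c


if __name__ == "__main__":
    print(carry_save_add(5, 3, 6))
    print(add_many([13, 7, 22, 5]))
```

Storing two words per result is the source of the doubled resource cost the abstract refers to, while the absence of carry ripple is what makes the approach attractive for multi-operand addition.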
The following topics are dealt with: application-specific system; architectures; arithmetic; field programmable gate array; media processing; image processing; cryptography; application-specific integrated circuit; computational biology; and application-specific instruction processor.
In this paper, we describe the first hardware design of a combined binary and decimal floating-point multiplier, based on specifications in the IEEE 754-2008 floating-point standard. The multiplier design operates on either (1) 64-bit binary encoded decimal floating-point (DFP) numbers or (2) 64-bit binary floating-point (BFP) numbers. It returns properly rounded results for the rounding modes specified...