Search results

chapter

A high-performance VLSI architecture for variable block size motion estimation

Hsin-Chou Chi, Han-Sheng Liu, Hsi-Che Tseng

2014 IEEE 3rd Global Conference on Consumer Electronics (GCCE) > 123 - 124

2014 IEEE 3rd Global Conference on Consumer Electronics (GCCE)

Variable block size motion estimation (VBSME) is a video coding technique which reduces video distortion, provides more accurate predictions, reduces video coding data, and increases the utilization of network bandwidth. This paper presents a high-performance VLSI architecture for VBSME which can be applied to the full search block matching algorithm. Our proposed architecture uses pipelined designs...

chapter

A Dynamic Core Grouping Approach to Improve Raw Architecture Many-core Processor Performance

Z Wan

2011 Sixth International Symposium on Parallel Computing in Electrical Engineering > 31 - 35

2011 6th International Symposium on Parallel Computing in Electrical Engineering (PARELEC 2011)

The ongoing move of hardware platforms to many-core processors challenges the traditional software design methodology. It is critical to develop new programming paradigms and efficient ways to port legacy applications. This paper analyzes a typical packet processing application as well as the cache hierarchy and behavior of the Raw architecture many-core processor. It presents an easy-to-implement run-time...

chapter

Synthesis and comparison of low-power high-throughput architectures for SAD calculation

F L Walter, C M Diniz, S Bampi

2011 IEEE Second Latin American Symposium on Circuits and Systems (LASCAS) > 1 - 4

2011 IEEE Second Latin American Symposium on Circuits and Systems (LASCAS)

This paper presents the standard-cell synthesis and comparison of parallel hardware architectures for the Sum of Absolute Differences (SAD) datapath, focusing on different design points such as high performance (maximum throughput) and the tradeoff between high performance and low power dissipation (isoperformance target). Clock gating and different combinations of parallelism and pipeline architectural...

chapter

Accelerated biomedical simulations using the FDTD method and the CUDA architecture

David Ireland, Wei Chern Tee, Marek Bialkowski

Asia-Pacific Microwave Conference 2011 > 70 - 73

2011 Asia Pacific Microwave Conference (APMC)

This paper presents empirical results of the finite difference time domain algorithm implemented on NVIDIA's CUDA architecture in order to achieve significant decreases in electromagnetic field simulation time. The proposed strategy is demonstrated with respect to biomedical applications where 10 different anatomically realistic body phantoms are simulated and speed increases quantified.

chapter

Pipelined implementation of AES encryption based on FPGA

Yulin Zhang, Xinggang Wang

2010 IEEE International Conference on Information Theory and Information Security > 170 - 173

2010 IEEE International Conference on Information Theory and Information Security

This paper presents an outer-round-only pipelined architecture for an FPGA implementation of the AES-128 encryption processor. The proposed design stores the S-box values in Block RAM and exploits two kinds of Block RAM. By combining the operations in a single round, we can reduce the critical delay. Therefore, our design can achieve a throughput of 34.7 Gbps at 271.15 MHz and 2389 CLB Slices...

chapter

A hybrid dual-core Reconfigurable Processor for EBCOT tier-1 encoder in JPEG2000 on next generation of digital cameras

Xin Zhao, A T Erdogan, T Arslan

2010 Conference on Design and Architectures for Signal and Image Processing (DASIP) > 84 - 89

2010 Conference on Design and Architectures for Signal and Image Processing (DASIP 2010)

In this paper, we present a JPEG2000 EBCOT tier-1 encoder based on a hybrid dual-core processor composed of a coarse-grained Dynamically Reconfigurable Processor (DRP) and an ARM core, targeting the next generation of cameras. The complete EBCOT tier-1 encoder is partitioned into two tasks that are mapped onto the two cores according to the different strengths of the two processors. A Partial Parallel...

chapter

Multi-parallel Architecture for MD5 Implementations on FPGA with Gigabit-level Throughput

Dongjing He, Zhi Xue

2010 International Symposium on Intelligence Information Processing and Trusted Computing > 535 - 538

2010 International Symposium on Intelligence Information Processing and Trusted Computing (IPTC 2010)

Multi-parallel architecture for MD5 (Message-Digest Algorithm 5) implemented on FPGA (Field-Programmable Gate Array) is presented in this paper. To accelerate the speed, a general architecture for Host Computer and FPGAs is proposed. The MD5 implementation is presented. Besides the internal parallelization of MD5 modules, FPGAs can be easily duplicated and connected to Ethernet LAN. The design was...

chapter

Parallel deblocking filter for H.264 AVC/SVC

S Vijay, C Chakrabarti, L J Karam

2010 IEEE Workshop On Signal Processing Systems > 116 - 121

2010 IEEE Workshop on Signal Processing Systems (SiPS 2010)

This paper presents a parallel and scalable solution for adaptive deblocking filtering in H.264/AVC. While traditionally in deblocking filtering, the edges in a macroblock are processed in a sequential order, this paper demonstrates how algorithm modifications can be used to enable processing multiple consecutive edges at the same time. The proposed method increases the throughput in proportion to...

chapter

A high throughput parallel architecture for category specific Deep Packet Inspection

Velacheri Jagadeesan Sananda

Eighth ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2010) > 93 - 94

2010 8th IEEE/ACM International Conference on Formal Methods and Models for Codesign (MEMOCODE 2010)

This paper describes a Field Programmable Gate Array hardware based Deep Packet Inspection Engine that uses regular expression matchers to simultaneously categorize and look for malicious signatures in Ethernet packets. This was a submission to the 2010 MEMOCODE Design Contest. It is the fastest Xilinx FPGA based design with a throughput of 734 Mbit/sec and the 2nd fastest overall, out of all designs...

chapter

Efficient hardware support for the Partitioned Global Address Space

Holger Froning, Heiner Litz

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) > 1 - 6

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW 2010)

We present a novel architecture of a communication engine for non-coherent distributed shared memory systems. The shared memory is composed of a set of nodes exporting their memory. Remote memory access is possible by forwarding local load or store transactions to remote nodes. No software layers are involved in a remote access, on neither the origin nor the target side: a user level process can directly access...

chapter

High throughput multiple-precision GCD on the CUDA architecture

N. Fujimoto

2009 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) > 507 - 512

2009 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2009)

Investigation of the cryptanalytic strength of RSA cryptography requires computing many GCDs of two long integers (e.g., of length 1024 bits). This paper presents a high throughput parallel algorithm to perform many GCD computations concurrently on a GPU based on the CUDA architecture. The experiments with an NVIDIA GeForce GTX285 GPU and a single core of 3.0 GHz Intel Core2 Duo E6850 CPU show that...

chapter

LOP_RE: Range encoding for low power packet classification

Xin He, J. Peddersen, S. Parameswaran

2009 IEEE 34th Conference on Local Computer Networks > 137 - 144

2009 IEEE 34th Conference on Local Computer Networks (LCN 2009)

State-of-the-art hardware based techniques achieve high performance and maximize efficiency of packet classification applications. The predominant example of these, ternary content addressable memory (TCAM) based packet classification systems, can achieve much higher throughput than software-based techniques. However, they suffer from high power consumption due to the highly parallel architecture and...

chapter

Design and implementation of image compression core based on CCSDS algorithm

Xiaodong Gu, Huaichao Wang, Xuequan Zhang, Shijun Xu

2009 4th International Conference on Computer Science & Education > 1873 - 1876

2009 4th International Conference on Computer Science & Education (ICCSE 2009)

Image compression is one of the major services in space flight missions and remote sensing systems. This paper presents a high speed and high performance image compressor for future spacecraft and micro-satellites. The proposed compression core is based on a modified CCSDS algorithm and the processing speed is improved 40 times using parallel architecture and pipeline technique. The experimental results...

chapter

A small-area parallel-pipeline architecture for MTO-convolutional encoders

H. Jaber, F. Monteiro, A. Dandache

2009 Joint IEEE North-East Workshop on Circuits and Systems and TAISA Conference > 1 - 4

2009 Joint IEEE North-East Workshop on Circuits and Systems and TAISA Conference (NEWCAS-TAISA)

In this paper, we propose a new parallel-pipeline approach to design small-area low complexity convolutional encoders, suitable for high data throughput communication applications. This approach can apply both to the OTM (one to many) and the MTO (many to one) encoder schemes. Here, we will discuss the problem of designing a low cost parallel-pipeline encoder for the MTO case. The new architecture...

chapter

An effective fast and small-area parallel-pipeline architecture for OTM-convolutional encoders

H. Jaber, F. Monteiro, A. Dandache

2009 15th IEEE International On-Line Testing Symposium > 257 - 261

2009 15th IEEE International On-Line Testing Symposium (IOLTS 2009)

With the ever increasing data throughputs required by communication applications, there is a real need for new effective architectures (small area and high speed) for circuit parts dedicated to error detecting/correcting coding (EDC/ECC). In this paper, we propose a new parallel-pipeline design scheme for convolutional encoders that meets these requirements. This approach applies both to the OTM (One...

chapter

Flexible GF(2^m) divider design for cryptographic applications

Wen-Ching Lin, Ming-Der Shieh, Chien-Ming Wu

2009 IEEE International Symposium on Circuits and Systems > 25 - 28

2009 IEEE International Symposium on Circuits and Systems - ISCAS 2009

In cryptographic applications, private key algorithms usually aim at high-throughput data communication, while public key algorithms require much lower throughput for private key exchange and authentication. To increase hardware utilization and reduce area overhead, this paper presents a flexible divider design in GF(2^m), which can be configured to operate in either SIMD or SISD mode. When applied...

chapter

VLSI implementation of a soft bit-flipping decoder for PG-LDPC codes

Junho Cho, Jonghong Kim, Hyunwoo Ji, Wonyong Sung

2009 IEEE International Symposium on Circuits and Systems > 908 - 911

2009 IEEE International Symposium on Circuits and Systems - ISCAS 2009

Implementation of high throughput VLSI chips for low-density parity-check codes has been considered very difficult especially when the row or column weight of the code is high. In this paper, a projective-geometry (PG) LDPC code is implemented in VLSI employing the proposed soft bit flipping (SBF) algorithm. The SBF algorithm requires only simple interconnections, but its error correcting performance...

chapter

Superscalar power efficient Fast Fourier Transform FFT architecture

M. Ahsan, E. Elahi, W.A. Farooqi

2009 2nd International Conference on Computer, Control and Communication > 1 - 4

2009 2nd International Conference on Computer, Control and Communication

We develop a superscalar architecture to compute the fixed-point FFT (Fast Fourier Transform). Some high-speed, time-sensitive real-time applications demand better and more efficient implementations of the FFT and call for improved, novel architectures. This motivates the use of embedded custom hardware, for instance an FPGA, which lets operations run in parallel and yields better performance. We take...

chapter

Distributed peak power management for many-core architectures

J. Sartori, R. Kumar

2009 Design, Automation & Test in Europe Conference & Exhibition > 1556 - 1559

2009 Design, Automation & Test in Europe Conference & Exhibition (DATE'09)

Recently proposed techniques for peak power management involve centralized decision-making and assume quick evaluation of the various power management states. These techniques do not prevent instantaneous power from exceeding the peak power budget, but instead trigger corrective action when the budget has been exceeded. Similarly, they are not suitable for many-core architectures (processors with...

chapter

SCORES: A scalable and parametric streams-based communication architecture for modular reconfigurable systems

A. Jara-Berrocal, A. Gordon-Ross

2009 Design, Automation & Test in Europe Conference & Exhibition > 268 - 273

2009 Design, Automation & Test in Europe Conference & Exhibition (DATE'09)

Parallel architectures have become an increasingly popular method in which to achieve high performance with low power consumption. In order to leverage these benefits, applications are decomposed into multiple computational modules (tasks) that collectively operate and communicate in parallel. In this paper, we present a scalable and highly parametric streams-based communication architecture for inter-module...

INFONA - science communication portal
