Search results

Items from 41 to 60 out of 524 results

chapter

Superlinear speedup in HPC systems: Why and when?

Sasko Ristov, Radu Prodan, Marjan Gusev, Karolj Skala

2016 Federated Conference on Computer Science and Information Systems (FedCSIS) > 889 - 898

In high-performance computing, speedup is usually limited by two main laws: Amdahl's law and Gustafson's law. However, speedup can sometimes go far beyond the linear limit. This is known as superlinear speedup, meaning that the speedup is greater than the number of processors used. Although superlinear speedup is not a new concept and many authors have already...

chapter

Hardware architecture for positive definite matrix inversion based on LDL decomposition and back-substitution

Carl Ingemarsson, Oscar Gustafsson

2016 50th Asilomar Conference on Signals, Systems and Computers > 859 - 863

In this paper we propose an efficient hardware architecture for computation of matrix inversion of positive definite matrices. The algorithm chosen is LDL decomposition followed directly by equation system solving using back substitution. The architecture combines a high throughput with an efficient utilization of its hardware units. We also report FPGA implementation results that show that the architecture...

chapter

Area-efficient one-cycle correction scheme for timing errors in flip-flop based pipelines

Jongeun Koo, Eunwoo Song, Eunhyeok Park, Dongyoung Kim, et al.

2016 IEEE Asian Solid-State Circuits Conference (A-SSCC) > 137 - 140

We propose a new timing error correction scheme for the area-efficient design of flip-flop based pipelines. The key features of the proposed scheme are 1) one-cycle error correction using a new local stalling scheme and 2) selective replacement of the error detection and correction flip-flops in critical paths only. A 32-bit MIPS test chip in a 65 nm CMOS technology has been implemented as a testbed. By employing...

chapter

Preliminary Investigation of Mobile System Features Potentially Relevant to HPC

David D Pruitt, Eric A Freudenthal

2016 4th International Workshop on Energy Efficient Supercomputing (E2SC) > 54 - 60

The increasing importance of energy consumption in scientific computing has driven interest in developing energy-efficient high-performance systems. The energy constraints of mobile computing have motivated the design and evolution of low-power computing systems capable of supporting a variety of compute-intensive user interfaces and applications. Others have observed the evolution of mobile devices to also...

chapter

Pipelined implementation of Camellia encryption algorithm

Zoran Cica

2016 24th Telecommunications Forum (TELFOR) > 1 - 4

In modern communication networks, the security aspect is very important. Encryption algorithms are used to protect user communication from eavesdropping. Symmetric key algorithms must be used to achieve high speed secured communication. In this paper, we propose and evaluate the pipelined implementation of the Camellia encryption algorithm which has been approved for use by the ISO/IEC. Camellia algorithm...

chapter

SpaceFibre Port IP Core (GRSPFI): SpaceFibre, poster paper

Felix Siegle, Sandi Habinc, Johannes Both

2016 International SpaceWire Conference (SpaceWire) > 1 - 5

Cobham Gaisler presents the SpaceFibre Port IP Core implementation GRSPFI. A fully validated VHDL implementation is readily available.

chapter

90 nm 12.5 Gbit/s physical interface per SoC with SpaceFibre/GigaSpaceWire links for the space radars: Components, short paper

Dmitri Skok, Tatiana Solokhina, Jaroslav Petrichkovich, Juri Gerasimov

2016 International SpaceWire Conference (SpaceWire) > 1 - 3

The article presents 12.5 Gbit/s Physical Media Attachment (PMA) units, TX and RX, fabricated in a 90 nm bulk CMOS process. The PMA units are designed for use in SpaceFibre/GigaSpaceWire (SpaceWire-RUS) systems for space radars. The units comprise SERDES and clock and data recovery (CDR). The supported data rates include 1.25, 2.5, 6.25 and 12.5 Gbit/s, and intermediate rates are also available.

chapter

Modified lifting scheme algorithm for DWT with optimized latency & throughput and FPGA implementation for low power & area

S. Murali Mohan, P. Sathyanarayana

2016 IEEE International Conference on Advances in Computer Applications (ICACA) > 351 - 356

Image processing applications require low power and high speed, so the convolution-based 1D-DWT is not desirable. In the proposed architecture, the modified 5/3 lifting algorithm is realized on an FPGA platform with optimizations. Latency and throughput are optimized with the modified algorithm. The architecture is modelled using HDL and implemented on FPGA. The proposal operates at 178 MHz and realised...

chapter

LEGO-based VLSI design and implementation of polar codes encoder architecture with radix-2 processing engines

Xin-Yu Shih, Po-Chun Huang, Yu-Chun Chen

2016 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS) > 577 - 580

Polar codes are a new channel coding scheme expected to be widely applied in next-generation wireless MIMO communication systems. In this work, we propose a LEGO-based VLSI hardware design and implementation of the polar encoder using radix-2 processing engines, which features low area cost, low power dissipation, high speed, and high throughput via serial computation. Under TSMC 90nm CMOS technology,...

chapter

On using the cyclically-coupled QC-LDPC codes in future SSDs

Qing Lu, Chiu-Wing Sham, Francis C. M. Lau

2016 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS) > 625 - 628

As flash memory continues its capacity scaling and its reliability correspondingly decreases, a technology upgrade of the error-correction engine in state-of-the-art solid-state drives (SSDs) is keenly anticipated. Owing to their limit-approaching decoding ability, low-density parity-check (LDPC) codes are seen as one of the most promising substitutes for the traditional BCH codes, though implementation...

chapter

A pipelined time stretching for high throughput counter-based time-to-digital converters

Seongheon Shin, Hyung-Joun Yoo

2016 International SoC Design Conference (ISOCC) > 57 - 58

This paper proposes a pipelined time stretching technique for high throughput counter-based time-to-digital converters (TDCs). Time stretching is used to increase the resolution of counter-based TDCs, yet it has the inherent weakness of a long conversion time due to the stretching phase. Without a significant increase in chip area, the proposed pipelined time stretching method is...

chapter

High-speed low-area-cost VLSI design of polar codes encoder architecture using radix-k processing engines

Xin-Yu Shih, Po-Chun Huang, Yu-Chun Chen

2016 IEEE 5th Global Conference on Consumer Electronics > 1 - 2

Applying polar codes to next-generation MIMO systems is an emerging research topic. In this work, we propose an efficient VLSI hardware architecture of the polar encoder using radix-k processing engines. Under TSMC 90nm CMOS technology, the 16384-point radix-2 based polar encoder design is synthesized with an area of 0.244 mm² at a maximum clock frequency of 2.0 GHz. In a similar manner, the VLSI hardware can...

chapter

Design of low latency successive cancellation decoder for polar codes

Zheyan Piao, Jin-Gyun Chung

2016 International SoC Design Conference (ISOCC) > 293 - 294

Polar codes have recently become increasingly popular due to their simple structure and low decoding complexity. However, polar codes are still not suitable for real-time applications because of their long decoding latency. In this paper, based on an analysis of the conventional SC decoder architecture, a low latency SC decoder architecture is proposed. Using the proposed architecture, the decoding latency...

chapter

Improvement of Line Coding Overhead Targeting Both Run-Length and DC-Balance

Sarat Yoowattana, Tomohiro Yoneda

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 15 - 22

High-speed serial data communication is now very popular for connecting various resources in high-performance computing systems. In such high-speed serial links, line coding is important to control the run length (RL) and the running disparity (RD), because a large run length causes insufficient transitions on data links, which makes it difficult to perform reliable clock and data recovery (CDR), and...

chapter

Low cost resilient regular expression matching on FPGAs

Marcos T. Leipnitz, Eduardo Nunes de Souza, Gabriel L. Nazar

2016 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT) > 75 - 80

The Network Function Virtualization (NFV) paradigm promises to make networks more scalable and flexible by decoupling the network functions (NFs) from dedicated and vendor-specific hardware. However, network and compute intensive NFs may be difficult to virtualize without performance degradation. In this context, Field-Programmable Gate Arrays (FPGAs) have been shown to be a good option for hardware...

chapter

Sharing a global on-chip transmission line medium without centralized scheduling

Yashar Asgarieh, Bill Lin

2016 Tenth IEEE/ACM International Symposium on Networks-on-Chip (NOCS) > 1 - 8

We consider the design of a shared global on-chip communication medium using repeated equalized transmission lines (RETLs). Our design overcomes a number of limitations with previously proposed shared global mediums based on transmission lines. Prior solutions require wide-pitch transmission lines that occupy considerable area, do not support multicast or broadcast operations, and employ centralized...

chapter

Memory efficient and high performance key-value store on FPGA using Cuckoo hashing

Wei Liang, Wenbo Yin, Ping Kang, Lingli Wang

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

Key-value stores (KVS) have become critical in many applications because of the recent data explosion. There is strong demand to improve the throughput and reduce the latency of KVS. An FPGA-based parallel architecture can deliver excellent performance and power efficiency. Cuckoo hashing has proven to be an efficient approach to implementing KVS with good memory utilization and constant worst case access...

chapter

Improved resource sharing for FPGA DSP blocks

Bajaj Ronak, Suhaib A. Fahmy

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

Sharing multi-cycle hardware blocks like the DSP48E1 primitive in Xilinx FPGAs can result in significant resource savings, but complicates scheduling. For high-throughput, DSP blocks must be pipelined, which results in a high initiation interval (II) for resource shared implementations. In this paper, we propose a resource reduction technique that minimises DSP block usage while also offering improved...

chapter

Exploring the use of shift register lookup tables for Keccak implementations on Xilinx FPGAs

Jori Winderickx, Joan Daemen, Nele Mentens

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

We explore the possibility of using shift register lookup tables (SRLs) for the implementation of Keccak on Xilinx FPGAs. The approach originates from the observation that the ρ step in combination with the state storage can be implemented as a collection of shift registers. This way, we achieve a slice-wise implementation using 25 shift registers of various lengths, resulting in 75 32-bit and 6 16-bit...

chapter

Experimental demonstration of super-TDMA: A MAC protocol exploiting large propagation delays in underwater acoustic networks

Prasad Anjangi, Mandar Chitre

2016 IEEE Third Underwater Communications and Networking Conference (UComms) > 1 - 5

The potential of exploiting large propagation delays in underwater acoustic (UWA) networks to maximize network throughput has been established in the recent past. Transmission scheduling strategies have been proposed to take advantage of large propagation delays. Super-TDMA is one such Medium Access Control (MAC) strategy. It is a form of Time Division Multiple Access (TDMA) protocol...

Keywords:
THROUGHPUT
CLOCKS
