Restrictions on memory performance have always had a great impact on soft-core processors. The limited number of ports on FPGA block RAMs can restrict the exploitation of parallelism in soft-core processors implemented on these devices. Multiple memory ports on FPGAs are cumbersome and do not scale well, carrying a high cost in area and power consumption when implemented. In order to...
Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. While CNNs involve enormous computational complexity, VLSI (ASIC and FPGA) chips that deliver high-density integration of computational resources are regarded as a promising platform for CNN implementation. With massive parallelism of computational units, however, the external memory bandwidth, which is constrained...
In this paper, we introduce memos, which integrates suitable memory management policies and schedules resources over the entire memory hierarchy in hybrid memory systems. Powered by an OS-kernel-level monitoring tool, memos captures memory access patterns online and then leverages them to guide memory page placement and data mapping. Experimental results show that, on average, memos can improve memory utilization,...
Data compression is the science of representing information in a more compact form by reducing its size to some extent. Reliable and efficient data compression entails minimal memory usage and low computational complexity. This work proposes a lossless compression technique for the efficient compression of both images and text files. LiBek II is an adaptive dictionary-based algorithm in which...
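The abstract is truncated before describing LiBek II's dictionary mechanism. As a hedged illustration of the general class it names — adaptive dictionary-based lossless compression — the following is a minimal sketch of classic LZW-style coding; the function name and details are illustrative, not the paper's algorithm:

```python
def lzw_compress(data: bytes) -> list[int]:
    """Adaptive dictionary compression sketch: start with all single-byte
    entries, then grow the dictionary as longer phrases are encountered."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    phrase = b""
    output = []
    for byte in data:
        candidate = phrase + bytes([byte])
        if candidate in dictionary:
            # Known phrase: keep extending before emitting a code.
            phrase = candidate
        else:
            # Emit the longest known phrase, add the new one to the dictionary.
            output.append(dictionary[phrase])
            dictionary[candidate] = next_code
            next_code += 1
            phrase = bytes([byte])
    if phrase:
        output.append(dictionary[phrase])
    return output
```

Because the dictionary adapts to the input, repetitive inputs (common in both text and image data, the two targets the abstract mentions) compress to progressively shorter code sequences.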
This paper presents a new encoding and corresponding decoding scheme to reduce crosstalk on a high-speed parallel bus. The scheme is based on a modified Fibonacci sequence and is introduced along with potential benefits in some upcoming memory interfaces. The scheme provides appreciable eye opening for interfaces dominated by crosstalk such as existing memory interfaces.
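The paper's "modified Fibonacci sequence" is not specified in this excerpt. The sketch below shows only the classic property such bus-encoding schemes build on: a greedy Fibonacci-weighted (Zeckendorf) representation yields codewords with no two adjacent 1s, so neighbouring bus wires never carry the worst-case aggressor pattern. All names here are illustrative assumptions:

```python
def fibonacci_encode(n: int, width: int) -> list[int]:
    """Encode n over Fibonacci weights 1, 2, 3, 5, 8, ... (Zeckendorf form).
    The greedy choice from the largest weight down guarantees no two
    adjacent bits are both 1, limiting coupling between adjacent wires."""
    weights = [1, 2]
    while len(weights) < width:
        weights.append(weights[-1] + weights[-2])
    bits = [0] * width
    for i in range(width - 1, -1, -1):
        if weights[i] <= n:
            bits[i] = 1
            n -= weights[i]
    return bits  # bits[i] carries weight weights[i]
```

Decoding is simply the weighted sum of the bits; the cost of the scheme is that representing the same value range needs more wires than plain binary, which the paper's eye-opening gains would have to justify.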
3D integration provides opportunities to design high-bandwidth, low-power CMOS image sensors (CIS) [1–4]. The 3D stacking of pixel tier, peripheral tier, memory tier(s), and compute tier(s) enables a high degree of parallel processing. Moreover, each tier can be designed in a different technology node (heterogeneous integration) to further improve power efficiency. This paper presents a case study of...
Collective I/O is a parallel I/O technique designed to deliver high performance data access to scientific applications running on high-end computing clusters. In collective I/O, write performance is highly dependent upon the storage system response time and limited by the slowest writer. The storage system response time in conjunction with the need for global synchronisation, required during every...
Sparse matrix vector multiplication (SpMV) is the workhorse for a wide range of linear algebra computations. In a serial setting, naive implementations for direct multiplication and transposed multiplication achieve very competitive performance. In parallel settings, especially on graphics hardware, it is widely believed that naive implementations cannot reach the performance of highly tuned parallel...
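As a concrete reference point for the "naive implementations" the abstract compares against, a minimal serial SpMV over the standard CSR layout, together with its transposed (scatter) variant, might look as follows; the function names are illustrative:

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Naive serial y = A @ x with A in CSR form (gather per row)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

def spmv_csr_transposed(values, col_idx, row_ptr, x, n_cols):
    """Naive serial y = A.T @ x: same CSR arrays, scatter instead of gather."""
    y = [0.0] * n_cols
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[col_idx[k]] += values[k] * x[i]
    return y
```

The contrast the abstract draws is visible even in this sketch: the gather form writes each `y[i]` from one row and is trivially serial-friendly, while the scatter form updates `y[col_idx[k]]` at data-dependent positions, which is exactly what makes parallel (and especially GPU) versions nontrivial.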
Compute clusters, consisting of many uniformly built nodes, are used to run a broad spectrum of workloads, such as tightly coupled (MPI) jobs, MapReduce, or graph-processing data-analytics applications, each with its own resource requirements. Many studies consistently highlight two types of under-utilized cluster resources: memory (up to 50%) and network. In this work, we take...
The memory wall problem is one of the major obstacles to realizing extremely fast, large-scale simulations. Stencil computations, which are important kernels for CFD simulations, have achieved high speed on GPU clusters thanks to the high memory bandwidth and computation speed of accelerators. However, their problem scales have been limited by the small capacity of GPU device memory...
Upcoming high-performance computing (HPC) platforms will have more complex memory hierarchies with high-bandwidth on-package memory and in the future also non-volatile memory. How to use such deep memory hierarchies effectively remains an open research question. In this paper we evaluate the performance implications of a scheme based on a software-managed scratchpad with coarse-grained memory-copy...
3D memories are becoming viable solutions for the memory wall problem and for meeting the bandwidth requirements of memory-intensive applications. The high bandwidth provided by 3D memories does not translate to a proportional increase in performance for all applications. For an application such as 2D FFT with strided access patterns, the data layout of the memory has a significant impact on the total...
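To make the strided-access point concrete: with a row-major layout, the row pass of a 2D FFT touches unit-stride data, while the column pass strides by the row length, so consecutive accesses land far apart in memory unless the layout is adapted. A minimal sketch of why (illustrative, not the paper's layout scheme):

```python
def column_indices(n_rows: int, n_cols: int, j: int) -> list[int]:
    """Flat indices of column j in a row-major n_rows x n_cols array:
    successive elements are n_cols apart, i.e. a stride-n_cols access."""
    return [i * n_cols + j for i in range(n_rows)]
```

For a large FFT, that stride typically exceeds a DRAM row, so every column element can cost a fresh row activation — which is why data layout, not raw bandwidth, dominates here.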
In this paper, we propose an efficient motion estimation hardware architecture for High Efficiency Video Coding (HEVC) using a Modified Reference Data Access Skip (MRDAS) to reduce the required memory bandwidth. Memory bandwidth is responsible for the throughput limitations in motion estimation, especially when dealing with high-quality video with a large frame size and search range. This architecture...
TCP/IP is widely used both in the Internet and in data centers. The protocol makes very few assumptions about the underlying network and provides useful guarantees such as reliable transmission, in-order delivery, and flow control. The price for this functionality is complexity, latency, and computational overhead, which is especially pronounced in software implementations. While for Internet...
After a decade evolving in the High Performance Computing arena, GPU-equipped supercomputers have conquered the TOP500 and Green500 lists, providing unprecedented levels of computational power and memory bandwidth. This year, major vendors have introduced new accelerators based on 3D memory, such as Intel's Xeon Phi Knights Landing and Nvidia's Pascal architecture. This paper reviews hardware features...
■ Leverages Mali's scalable architecture
■ Scalable to 32 shader cores
■ Major shader core redesign
■ New scalar, clause-based ISA
■ New quad-based arithmetic units
■ New geometry data flow
■ Reduces memory bandwidth and footprint
■ Support for fine-grain buffer sharing with the CPU
Solid State Drives (SSDs) using flash memory storage technology present a promising storage solution for data-intensive applications due to their low latency, high bandwidth, and low power consumption compared to traditional hard disk drives. SSDs achieve these desirable characteristics using internal parallelism - parallel access to multiple internal flash memory chips - and a Flash Translation Layer...
The era of heterogeneous systems and Big Data computing is already here. Handling huge amounts of data poses new challenges in data management and in the effective usage of memory, caches, heterogeneous structures, and available bandwidth. In addition, the computing requirements of Big Data are unique; on many occasions the processing required per storage access is limited (i.e., low instructions/byte), which...
The contribution of memory latency to execution time keeps increasing in modern memory systems. Hierarchical memory based on locality is the classic design for alleviating this effect. However, modern memory systems are also supported by various concurrency-driven technologies, and the effect of leveraging locality in combination with concurrency becomes uncertain. We found that concurrency-driven technologies...