Search results for: . .

Items from 1 to 16 out of 16 results

chapter

A novel ReRAM-based processing-in-memory architecture for graph computing

Lei Han, Zhaoyan Shen, Zili Shao, H. Howie Huang, et al.

2017 IEEE 6th Non-Volatile Memory Systems and Applications Symposium (NVMSA) > 1 - 6

Graph algorithms such as breadth-first search (BFS) have been gaining ever-increasing importance in the era of Big Data. However, the memory bandwidth remains the key performance bottleneck for graph processing. To address this problem, we utilize processing-in-memory (PIM), combined with non-volatile metal-oxide resistive random access memory (ReRAM), to improve the performance of both computation...

chapter

Small cache lookaside table for fast DRAM cache access

Xi Tao, Qi Zeng, Jih-Kwon Peir, Shih-Lien Lu

2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC) > 1 - 10

Large off-die stacked DRAM caches have been proposed to provide higher effective bandwidth and lower average latency to main memory. Designing a large off-die DRAM cache with a conventional block size requires a large tag array, which is impractical to fit on-die. Placing the large directory off-die prolongs the latency, since a tag access is necessary before the data can be accessed. This additional trip...

chapter

AHRC: An Optimized Cache Associativity

Malik Al-Manasia, Zenon Chaczko, Asma Ounzar

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 811 - 817

Hardware resources require efficient scaling because the future of computing technology seems to be intensively multithreaded. One of the main challenges in the scalability of computer hardware is the memory hierarchy. Chip-multiprocessors (CMPs) rely on large, multi-level cache hierarchies to reduce the cost of resources and improve system performance. These multi-level hierarchies are...

chapter

Sparse Matrix Multiplication on a Reconfigurable Many-Core Architecture

Joao Pinhao, Wilson Jose, Horacio Neto, Mario Vestias

2015 Euromicro Conference on Digital System Design > 330 - 336

2015 Euromicro Conference on Digital System Design (DSD)

Sparse matrix-vector multiplication (SMVM) is a fundamental operation in many scientific and engineering applications. In many cases, sparse matrices have thousands of rows and columns where most of the entries are zero, while the non-zero data is spread over the matrix. This lack of data locality reduces the effectiveness of the data cache in general-purpose processors, considerably reducing their performance...

chapter

A high-rate MSR code with polynomial sub-packetization level

Birenjith Sasidharan, Gaurav Kumar Agarwal, P. Vijay Kumar

2015 IEEE International Symposium on Information Theory (ISIT) > 2051 - 2055

We present a high-rate (n, k, d = n − 1)-MSR code with a sub-packetization level that is polynomial in the dimension k of the code. While polynomial sub-packetization level was achieved earlier for vector MDS codes that repair systematic nodes optimally, no such MSR code construction is known. In the low-rate regime (i.e., rates less than one-half), MSR code constructions with a linear sub-packetization...

chapter

An alternate construction of an access-optimal regenerating code with optimal sub-packetization level

Gaurav Kumar Agarwal, Birenjith Sasidharan, P. Vijay Kumar

2015 Twenty First National Conference on Communications (NCC) > 1 - 6

Given the scale of today's distributed storage systems, the failure of an individual node is a common phenomenon. Various metrics have been proposed to measure the efficacy of the repair of a failed node, such as the amount of data download needed to repair (also known as the repair bandwidth), the amount of data accessed at the helper nodes, and the number of helper nodes contacted. Clearly, the...

chapter

IMP: Indirect memory prefetcher

Xiangyao Yu, Christopher J. Hughes, Nadathur Satish, Srinivas Devadas

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 178 - 190

Machine learning, graph analytics and sparse linear algebra-based applications are dominated by irregular memory accesses resulting from following edges in a graph or non-zero elements in a sparse matrix. These accesses have little temporal or spatial locality, and thus incur long memory stalls and large bandwidth requirements. A traditional streaming or striding prefetcher cannot capture these irregular...

chapter

A Memory-Based Continuous Query Index for Stream Processing

Cuiwen Xiong, Peng Zhang, Yan Li, Shipeng Zhang, et al.

2014 IEEE International Congress on Big Data > 768 - 769

2014 IEEE International Congress on Big Data (BigData Congress)

Most "Big Data" applications, such as decision support and emergency response, must provide users with fresh, low-latency results, especially for aggregation results on key performance metrics. However, disk-oriented approaches to online storage are becoming increasingly problematic. They do not scale gracefully to meet the needs of large-scale Web applications, and improvements...

chapter

CyGraph: A Reconfigurable Architecture for Parallel Breadth-First Search

Osama G. Attia, Tyler Johnson, Kevin Townsend, Philip Jones, et al.

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 228 - 235

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Large-scale graph structures are considered a keystone for many emerging high-performance computing applications in which Breadth-First Search (BFS) is an important building block. For such graph structures, BFS operations tend to be memory-bound rather than compute-bound. In this paper, we present an efficient reconfigurable architecture for parallel BFS that adopts new optimizations for utilizing...

chapter

Performance of a Structure-Detecting SpMV Using the CSR Matrix Representation

Hans Pabst, Bev Bachmayer, Michael Klemm

2012 11th International Symposium on Parallel and Distributed Computing > 3 - 10

2012 11th International Symposium on Parallel and Distributed Computing (ISPDC)

Sparse matrix-vector multiplication (SpMV) is an important building block for many scientific applications. Various formats exist to store and represent sparse matrices in the computer's memory. The compressed row storage format (CRS or CSR) is typically a baseline to report a new hybrid or an improved representation of sparse matrices. In this paper, we describe the implementation and performance...

chapter

A reconfigurable macro-pipelined systolic accelerator architecture

Wenqi Bao, Jiang Jiang, Yuzhuo Fu, Qing Sun

2011 International Conference on Field-Programmable Technology > 1 - 6

2011 International Conference on Field-Programmable Technology (FPT 2011)

In this paper, we propose a reconfigurable macro-pipelined systolic architecture (MAPS), which aims to accelerate multiply-accumulate based algorithms by exploiting the temporal parallelism. To illustrate the performance, we implement a 32-PE accelerator on the Xilinx ML605 experiment board for the matrix multiplication and get a peak performance of 51.2 GFLOPS (about 8.0 GFLOPS per PE per GHz). To...

chapter

Vector processor customization for FFT

Bogdan Spinean, Georgi Kuzmanov, Georgi Gaydadjiev

2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation > 110 - 117

2011 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XI)

Processors and memory systems suffer from a growing performance gap between them. Each technology generation increases on-chip performance capabilities; however, memory bandwidth increases at a much slower pace. Therefore, overall performance improvements are constrained by the available memory bandwidth. In this paper, we address the memory bandwidth problem of vector processors by introducing...

chapter

The ZCache: Decoupling Ways and Associativity

D Sanchez, C Kozyrakis

2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture > 187 - 198

2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2010)

The ever-increasing importance of main memory latency and bandwidth is pushing CMPs towards caches with higher capacity and associativity. Associativity is typically improved by increasing the number of ways. This reduces conflict misses, but increases hit latency and energy, placing a stringent trade-off on cache design. We present the zcache, a cache design that allows much higher associativity...

chapter

Optimizing Sparse Matrix Vector Multiplication Using Diagonal Storage Matrix Format

Liang Yuan, Yunquan Zhang, Xiangzheng Sun, Ting Wang

2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC) > 585 - 590

2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC 2010)

Sparse matrix vector multiplication (SpMV) is used in many scientific computations. The main bottleneck of this algorithm is memory bandwidth and many methods reduce memory bandwidth usage by compressing the index array. The matrices from finite difference modeling applications often have several dense diagonals and sparse diagonals. For these matrices, the index array can be deleted by using diagonal...

chapter

Novel microwave devices using tunable negative index metamaterials and ferrites

P.V. Parimi, P. Peyton, J.M. Kunze, C. Vittoria, et al.

2009 IEEE International Workshop on Antenna Technology > 1 - 4

2009 IEEE International Workshop on Antenna Technology "Small Antennas and Novel Metamaterials"

Next-generation microwave devices must be multifunctional for efficient, cost-effective operation in lightweight, low-volume structures. A miniature tunable negative index metamaterial phase shifter and an ultra-wideband phased array antenna have been designed using ferrite materials. Negative permeability ferrite material in combination with the negative permittivity of plasmonic wires produces...

chapter

A BISR Architecture for Embedded Memories

K. Pekmestzi, N. Axelos, I. Sideris, N. Moshopoulos

2008 14th IEEE International On-Line Testing Symposium > 149 - 154

14th IEEE International On-Line Testing Symposium

In this paper a BISR architecture for embedded memories is presented. The proposed scheme utilises a multiple bank cache-like memory for repairs. Statistical analysis is used for minimisation of the total resources required to achieve a very high fault coverage. Simulation results show that the proposed BISR scheme is characterised by high efficiency and low area overhead, even for high defect densities...

Filter options

Keywords: ARRAYS, INDEXES, BANDWIDTH
Publication type: book

