Graph algorithms such as breadth-first search (BFS) have been gaining ever-increasing importance in the era of Big Data. However, the memory bandwidth remains the key performance bottleneck for graph processing. To address this problem, we utilize processing-in-memory (PIM), combined with non-volatile metal-oxide resistive random access memory (ReRAM), to improve the performance of both computation...
Large off-die stacked DRAM caches have been proposed to provide higher effective bandwidth and lower average latency to main memory. Designing a large off-die DRAM cache with a conventional block size requires a large tag array, which is impractical to fit on-die. Placing the large directory off-die prolongs the latency, since a tag access is necessary before the data can be accessed. This additional trip...
Hardware resources require efficient scaling because the future of computing technology appears to be intensively multithreaded. One of the main challenges in the scalability of computer hardware is the memory hierarchy. Chip-multiprocessors (CMPs) rely on large, multi-level cache hierarchies to reduce resource cost and improve system performance. These multi-level hierarchies are...
Sparse matrix-vector multiplication (SMVM) is a fundamental operation in many scientific and engineering applications. In many cases sparse matrices have thousands of rows and columns where most of the entries are zero, while the non-zero data is spread over the matrix. This poor data locality reduces the effectiveness of the data cache in general-purpose processors, considerably reducing their performance...
We present a high-rate (n, k, d = n − 1)-MSR code with a sub-packetization level that is polynomial in the dimension k of the code. While polynomial sub-packetization level was achieved earlier for vector MDS codes that repair systematic nodes optimally, no such MSR code construction is known. In the low-rate regime (i.e., rates less than one-half), MSR code constructions with a linear sub-packetization...
Given the scale of today's distributed storage systems, the failure of an individual node is a common phenomenon. Various metrics have been proposed to measure the efficacy of the repair of a failed node, such as the amount of data download needed to repair (also known as the repair bandwidth), the amount of data accessed at the helper nodes, and the number of helper nodes contacted. Clearly, the...
Machine learning, graph analytics and sparse linear algebra-based applications are dominated by irregular memory accesses resulting from following edges in a graph or non-zero elements in a sparse matrix. These accesses have little temporal or spatial locality, and thus incur long memory stalls and large bandwidth requirements. A traditional streaming or striding prefetcher cannot capture these irregular...
Most "Big Data" applications, such as decision support and emergency response, must provide users with fresh, low-latency results, especially for aggregation results on key performance metrics. However, disk-oriented approaches to online storage are becoming increasingly problematic. They do not scale gracefully to meet the needs of large-scale Web applications, and improvements...
Large-scale graph structures are considered a keystone for many emerging high-performance computing applications in which Breadth-First Search (BFS) is an important building block. For such graph structures, BFS operations tend to be memory-bound rather than compute-bound. In this paper, we present an efficient reconfigurable architecture for parallel BFS that adopts new optimizations for utilizing...
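To make the memory-bound behavior mentioned above concrete, here is a minimal sketch of a level-synchronous BFS over a graph stored in compressed (CSR-style) adjacency arrays. It is a generic software illustration, not the reconfigurable architecture from the abstract; the function name `bfs_levels` and the arrays `row_ptr` and `col_idx` are assumptions introduced for this example.

```python
def bfs_levels(row_ptr, col_idx, source):
    """Return the BFS level of every vertex (-1 if unreachable).

    row_ptr[u]..row_ptr[u+1] delimits the neighbors of vertex u in col_idx.
    (Hypothetical layout chosen for illustration.)
    """
    n = len(row_ptr) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            # Each neighbor lookup is a data-dependent, irregular read of
            # level[v]; the inner loop does almost no arithmetic.
            for k in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[k]
                if level[v] == -1:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


# Undirected example graph with edges 0-1, 0-2, 1-3, 2-3, 3-4.
row_ptr = [0, 2, 4, 6, 9, 10]
col_idx = [1, 2, 0, 3, 0, 3, 1, 2, 4, 3]
levels = bfs_levels(row_ptr, col_idx, 0)
```

The inner loop performs one comparison per edge but touches `level` at unpredictable indices, which is why this style of traversal tends to be limited by memory access rather than computation.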
Sparse matrix-vector multiplication (SpMV) is an important building block for many scientific applications. Various formats exist to store and represent sparse matrices in the computer's memory. The compressed row storage format (CRS or CSR) is typically a baseline to report a new hybrid or an improved representation of sparse matrices. In this paper, we describe the implementation and performance...
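For readers unfamiliar with the baseline format named above, the following is a minimal sketch of SpMV over the compressed row storage (CSR) layout. It shows the standard three-array scheme only, not the implementation the abstract goes on to describe; the names `csr_spmv`, `values`, `col_idx`, and `row_ptr` are assumptions made for this example.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : non-zero entries, row by row
    col_idx : column index of each entry in values
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i inside values
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read indirectly through col_idx, so accesses to x
            # follow the sparsity pattern rather than a fixed stride.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]
y = csr_spmv(values, col_idx, row_ptr, x)
```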
In this paper, we propose a reconfigurable macro-pipelined systolic architecture (MAPS), which aims to accelerate multiply-accumulate based algorithms by exploiting the temporal parallelism. To illustrate the performance, we implement a 32-PE accelerator on the Xilinx ML605 experiment board for the matrix multiplication and get a peak performance of 51.2 GFLOPS (about 8.0 GFLOPS per PE per GHz). To...
Processors and memory systems suffer from a growing performance gap between them. Each technology generation increases on-chip performance capabilities; however, memory bandwidth increases at a much slower pace. Therefore, overall performance improvements are constrained by the available memory bandwidth. In this paper, we address the memory bandwidth problem of vector processors by introducing...
The ever-increasing importance of main memory latency and bandwidth is pushing CMPs towards caches with higher capacity and associativity. Associativity is typically improved by increasing the number of ways. This reduces conflict misses, but increases hit latency and energy, placing a stringent trade-off on cache design. We present the zcache, a cache design that allows much higher associativity...
Sparse matrix vector multiplication (SpMV) is used in many scientific computations. The main bottleneck of this algorithm is memory bandwidth and many methods reduce memory bandwidth usage by compressing the index array. The matrices from finite difference modeling applications often have several dense diagonals and sparse diagonals. For these matrices, the index array can be deleted by using diagonal...
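The idea of dropping the index array for diagonally structured matrices can be sketched with a generic diagonal (DIA) layout, in which the column of each entry is implied by its row and its diagonal offset. This is an illustration under assumed conventions, not necessarily the format used in the paper; the name `dia_spmv` and the padding convention (`diag[i]` holds `A[i][i + offset]`, with zero where that column falls outside the matrix) are assumptions.

```python
def dia_spmv(diagonals, offsets, x):
    """Compute y = A @ x for a square matrix stored by diagonals.

    diagonals[d][i] holds A[i][i + offsets[d]] (zero-padded), so no
    per-entry column index needs to be stored or loaded.
    """
    n = len(x)
    y = [0.0] * n
    for diag, off in zip(diagonals, offsets):
        for i in range(n):
            j = i + off  # column index is computed, not read from memory
            if 0 <= j < n:
                y[i] += diag[i] * x[j]
    return y


# Tridiagonal example:
# A = [[ 2, -1,  0],
#      [-1,  2, -1],
#      [ 0, -1,  2]]
offsets = [-1, 0, 1]
diagonals = [
    [0.0, -1.0, -1.0],  # sub-diagonal, padded at row 0
    [2.0, 2.0, 2.0],    # main diagonal
    [-1.0, -1.0, 0.0],  # super-diagonal, padded at row 2
]
x = [1.0, 2.0, 3.0]
y = dia_spmv(diagonals, offsets, x)
```

Compared with a CSR-style layout, this trades the per-entry index array for one offset per diagonal, at the cost of storing padding zeros, so it only pays off when the diagonals are reasonably dense.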
Next-generation microwave devices must be multifunctional for efficient, cost-effective operation in lightweight, low-volume structures. A miniature tunable negative index metamaterial phase shifter and an ultra wideband phased array antenna have been designed using ferrite materials. Negative permeability ferrite material in combination with negative permittivity of plasmonic wires produces...
In this paper a BISR architecture for embedded memories is presented. The proposed scheme utilises a multiple bank cache-like memory for repairs. Statistical analysis is used for minimisation of the total resources required to achieve a very high fault coverage. Simulation results show that the proposed BISR scheme is characterised by high efficiency and low area overhead, even for high defect densities...