The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Recent breakthroughs in memristive devices have demonstrated the potential of using resistive content addressable memories for associative processing. These architectures enable ultra-high density integrated circuits along with low-power computation. However, the reliability of memristive elements is limiting the widespread adoption of these architectures. In this study, we address the reliability...
The need to perform data analytics on exploding data volumes coupled with the rapidly changing workloads in cloud computing places great pressure on data-center servers. To improve hardware resource utilization across servers within a rack, we propose Direct Extension of On-chip Interconnects (DEOI), a high-performance and efficient architecture for remote resource access among server nodes. DEOI...
Physically Unclonable Function (PUF) is cost effective and reliable security primitives widely used in authentication and in-place secret key generation. With growing research in the area of non-CMOS technologies for memories and circuits, it is important to understand their implications on the design of security primitives. Resistive Random Accessible Memory (RRAM) offers easy integration with CMOS...
It is projected that hundreds of cores can be integrated into a chip at the sub-20nm technology nodes. However, some challenges exist in the many-core architecture such as maintaining memory coherence, underutilized parallelism, and increased inter-core communication delay. This work proposes the data-center-on-a-chip (DCoC) paradigm employing virtualization technologies commonly used in today's data...
OpenCL is designed as a parallel programming framework to support heterogeneous computing platforms. The implicit or explicit parallelism in OpenCL kernel code enables efficient FPGA implementation from a high-level programming abstraction. However, FPGA architecture is completely different from GPU architecture, for which OpenCL is widely used. Tuning OpenCL codes to achieve high performance on FPGAs...
Modern computer systems require fast, large and reliable memories to handle information explosion. With this goal in mind, not only deployment of main memories with new technologies are necessary, but also adopting innovative solutions for addressing newfound challenges must be considered as a priority. Recently, phase change memory (PCM) appeared as a preferred candidate for substituting DRAM. PCM...
The continuance of Moore's law and failure of Dennard scaling force future chip multiprocessors (CMPs) to have considerable dark regions. How to use up available dark resources is an important concern for computer architects. In harmony with these changes, we must revise processor allocation schemes that severely affect the performance of a parallel on-chip system. A suitable allocation algorithm...
Hardware caches are widely employed in GPGPUs to achieve higher performance and energy efficiency. Incorporating hardware caches in GPGPUs, however, does not immediately guarantee enhanced performance and energy efficiency due to high cache contention and thrashing. To address the inefficiency of GPGPU caches, various adaptive techniques (e.g., warp limiting) have been proposed. However, relatively...
This paper quantifies the difference in resource demand between modern and classic NoC workloads. In the paper, we show that modern workloads are able to better utilize higher numbers of VCs and smaller C factors in order to attain performance and energy efficiency. This is because of the high throughput and possible local congestions in their traffic pattern. As a result, such workloads are more...
Latent Semantic Indexing (LSI) has played a significant role in discovering patterns on the relationships between query terms and unstructured documents. However, the inherent characteristics of complex matrix factorization in LSI make it difficult to meet stringent performance requirements. In this paper, we present a deeply pipelined reconfigurable architecture for LSI, which parallelizes the matrix...
To achieve high throughput, core count in compute accelerators such as General-Purpose Graphics Processing Units (GPGPUs) increases continuously. The communication demand of these cores boosts the demand for a low-latency packet switched network. As packet latency is mainly composed of per-hop latency, contention latency and serialization latency, a favorable Network-on-Chip (NoC) design should efficiently...
Micron's Automata Processor (AP) efficiently emulates non-deterministic finite automata and has been shown to provide large speedups over traditional von Neumann execution for massively parallel, rule-based, data-mining and pattern matching applications. We demonstrate the AP's ability to generate high-quality and energy efficient pseudo-random behavior for use in pseudo-random number generation or...
The significance of computer architecture simulators in advancing computer architecture research is widely acknowledged. Computer architects have developed numerous simulators in the past few decades and their number continues to rise. This paper explores different simulation techniques and surveys many ×86 simulators. Comparing simulators with each other and validating their correctness has been...
As the 3D stacking technology still faces several challenges, the 2.5D stacking technology gains better application prospects nowadays. With the silicon interposer, the 2.5D stacking can improve the bandwidth and capacity of the memory system. To satisfy the communication requirements of the integrated memory system, the free routing resources in the interposer should be explored to implement an additional...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.