The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Future space missions require reliable architectures with higher performance and lower power consumption. Exploring new architectures worthy of undergoing the expensive and time-consuming process of radiation hardening is critical for this endeavor. Two such architectures are the Texas Instruments KeyStone II octal-core processor and the ARM® Cortex®-A53 (ARMv8) quad-core CPU. DSPs have been proven...
Sparse Matrix-Vector multiplication (SpMV) is a fundamental kernel for many scientific and engineering applications. However, SpMV performance and efficiency are poor on commercial of-the-shelf (COTS) architectures, specially when the data size exceeds on-chip memory or last level cache (LLC). In this work we present an algorithm co-optimized hardware accelerator for large SpMV problems. We start...
HPC (high-performance computing) applications usually show bursty I/O behaviors. In order to expedite the applications, permanent storage systems are usually provisioned to serve such I/O bursts. Approaching the era of exascale computing, non-volatile RAM is introduced as burst buffers, to absorb the bursty bulk data and relax the I/O provisioning requirement of the permanent storage systems. However,...
It is well known that DRAM memory performance cannot keep pace with the performance of today's multicore compute systems. In addition to the memory bandwidth problem, there is another major challenge, namely, the power/energy challenge. DRAMs are largely contributing to the overall power consumption. Thus, there is a need for power and bandwidth optimization of the DRAM memory subsystems. Moreover,...
Anonymous network provides user privacy to protect identity. The onion routing (TOR) project is one kind of Internet anonymous networks which attracts many researchers and clients nowadays, because of its simplicity and scalability. However, there are some difficulties to analyze TOR performance within live TOR networks since it is distributed and its security nature. This paper presents a TOR network...
Graph algorithms such as breadth-first search (BFS) have been gaining ever-increasing importance in the era of Big Data. However, the memory bandwidth remains the key performance bottleneck for graph processing. To address this problem, we utilize processing-in-memory (PIM), combined with non-volatile metal-oxide resistive random access memory (ReRAM), to improve the performance of both computation...
Memory interference is a critical impediment to system performance in CMP systems. To address this problem, we first propose a Dynamically Proportional Bandwidth Throttling policy (DPBT), which dynamically throttles back memory-intensive applications based on their memory access behavior. DPBT achieves a more balance memory bandwidth partitioning. Moreover, we improve the previous memory channel partitioning...
Hybrid quantum technologies seek to combine the advantages of two individual quantum architectures by transferring the information between the two systems. We want to benefit from the high mobility and ease of transmission of photons for quantum communication and exploit the excellent readout and storage capabilities of atomic qubits as a quantum memory, which is essential to build up quantum repeater...
Recent advances in Intrusion Risk Assessment (IRA) have brought promising solutions to enhance Intrusion Response Systems (IRS). However, current researches lack reasonable solutions to exploit system state information. Without the system state, the IRA results may suffer from the high false rate of Intrusion Detection Systems (IDS). To address this limitation, we propose a novel State-Aware Risk...
Data movement to and from off-chip memory dominates energy consumption in most video decoders, with DRAM accesses consuming 2.8x–6x more energy than the processing itself. We present a H.265/HEVC video decoder with embedded DRAM (eDRAM) as main memory. We propose the following techniques to optimize data movement and reduce the power consumption of eDRAM: 1) lossless compression is used to store reference...
High-performance analysis of big data demands more computing resources, forcing similar growth in computation cost. So, the challenge to the HPC system designers is providing not only high performance but also high performance at lower cost. For high performance yet cost effective cyberinfrastructure, we propose a new system model augmenting Amdahl's second law for balanced system to optimize price-performance-ratio...
Caching plays an important role in many domains, as it can lead to important performance improvements. A key-value based caching system typically stores the results of popular queries in efficient storage locations. While caching enjoys widespread usage in the context of dynamic web applications, most mainstream caching systems store static binary items, which makes them impractical for many real-world...
RAM-based storage aggregates the RAM of servers in data center networks (DCN) to provide extremely high storage performance. For quick recovery of storage server failures, Mem-Cube [1] exploits the proximity of the BCube network to limit the recovery traffic to the recovery servers' 1-hop neighborhood. However, previous design is applicable only to BCube, and has suboptimal recovery performance due...
The state-of-the-art customized accelerators of convolution neural networks (CNN) have achieved high throughput while the huge amount of data movements still remains as the dominant part of the total energy costs. In this paper, we propose an energy-efficient scheduling approach to find an efficient dataflow that minimizes data movements with limited hardware resource budgets. In detail, two-level...
Users of cloud computing platforms pose different types of demands for multiple resources on servers (physical or virtual machines). Besides differences in their resource capacities, servers may be additionally heterogeneous in their ability to service users — certain users' tasks may only be serviced by a subset of the servers. We identify important shortcomings in existing multi-resource fair allocation...
Heterogeneous chip-multiprocessors with integrated CPU and GPU cores on the same die allow sharing of critical memory system resources among the applications executing on the twotypes of cores. In this paper, we explore memory system management driven by the quality of service (QoS) requirement of the GPU applications executing simultaneously with CPUapplications in such heterogeneous platforms. Our...
This paper presents an extension to MPI supporting the one-sided communication model and window allocations in storage. Our design transparently integrates with the current MPI implementations, enabling applications to target MPI windows in storage, memory or both simultaneously, without major modifications. Initial performance results demonstrate that the presented MPI window extension could potentially...
An interference monitoring station is presented which detects, characterises and logs interference events. The system operates autonomously and continuously over multiple global navigation satellite system (GNSS) bands. With a bandwidth of up to 80 MHz, input resolution of 8 bit an overall data rate of approximately 1.3 Gbit/s can be supported. The interference detection is carried out in real-time,...
Comparison of the dominant emerging memory technologies at the fundamental cell level will be presented. Metrics will be discussed and the technologies will be compared with the objective of judging the suitability for high density memory applications.
Increasingly complex memory systems and onchip interconnects are developed to mitigate the data movement bottlenecks in manycore processors. One example of such a complex system is the Xeon Phi KNL CPU with three different types of memory, fifteen memory configuration options, and a complex on-chip mesh network connecting up to 72 cores. Users require a detailed understanding of the performance characteristics...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.