This paper explores the feasibility of memory entirely disaggregated from compute and storage for a particular, widely deployed workload: Spark SQL [9] analytics queries. We measure the empirical rate at which records are processed and calculate the effective memory bandwidth utilized based on the sizes of the columns accessed in the query. Our findings contradict conventional wisdom: not only is...
Sparse matrix-vector multiplication (SpMV) is an important computational kernel in many applications. To improve performance, software libraries dedicated to SpMV computation have been introduced, e.g., the MKL library for CPUs and the cuSPARSE library for GPUs. However, the computational throughput of these libraries is far below the peak floating-point performance offered by hardware platforms, because...
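For readers unfamiliar with the kernel, the computation these libraries optimize can be written in a few lines. Below is a minimal, illustrative sketch of SpMV over the common CSR (compressed sparse row) storage format; the function name and example matrix are ours, not from the paper:

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x, with A stored as CSR arrays (values, col_idx, row_ptr)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# Example: A = [[1, 0, 2],
#               [0, 3, 0]], x = [1, 1, 1]  ->  y = [3, 3]
print(spmv_csr([1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3], [1.0, 1.0, 1.0]))
```

The irregular, data-dependent access to `x` through `col_idx` is the main reason SpMV throughput falls far short of peak floating-point performance on both CPUs and GPUs.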
Wearable head-mounted display (HMD) smart devices are emerging as a smartphone substitute due to their ease of use and suitability for advanced applications, such as gaming and augmented reality (AR) [1–2]. Most current HMD systems suffer from: 1) a lack of rich user interfaces, 2) short battery life, and 3) heavy weight. Although current HMDs (e.g. Google Glass) use a touch panel and voice commands...
Deep learning using convolutional neural networks (CNNs) gives state-of-the-art accuracy on many computer vision tasks (e.g. object detection, recognition, segmentation). Convolutions account for over 90% of the processing in CNNs for both inference/testing and training, and fully convolutional networks are increasingly being used. Achieving state-of-the-art accuracy requires CNNs with not only a...
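As context for the claim that convolutions dominate CNN processing: the core operation is a sliding dot product between a filter and each image patch. A minimal sketch in plain Python (single channel, "valid" padding; names are illustrative, not from the paper):

```python
def conv2d(image, kernel):
    """Naive 2D cross-correlation ("convolution" in CNN usage), valid padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Dot product of the kernel with the patch anchored at (i, j).
            out[i][j] = sum(image[i + u][j + v] * kernel[u][v]
                            for u in range(kh) for v in range(kw))
    return out

# A 2x2 box filter over a 3x3 image sums each 2x2 patch.
print(conv2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 1], [1, 1]]))  # [[12, 16], [24, 28]]
```

Each output pixel costs kh x kw multiply-accumulates per input channel; multiplied across channels, filters, and spatial positions, this is where figures like the >90% share come from.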
As massive multi-threading in GPUs imposes tremendous pressure on memory subsystems, efficient bandwidth utilization becomes a key factor affecting GPU throughput. In this work, we propose thread batch enabled memory partitioning (TEMP), which improves GPU performance by improving memory bandwidth utilization. In particular, TEMP clusters multiple thread blocks sharing the same set of...
In heterogeneous MPSoCs, memory interference between the CPU and real-time cores is a critical impediment to system performance. Previous memory schedulers adopt the classic two-tier queuing system, but unfortunately two-tier queuing deteriorates the QoS of scheduling policies. In this paper, we propose the Single-Tier Virtual Queuing (STVQ) memory controller for efficacious QoS-aware scheduling...
The memory capacity of computer systems has not expanded as quickly as the memory requirements of large-memory applications have grown. Moreover, big-memory systems have been too expensive for many researchers and students. Therefore, utilizing remote memory has been considered a cost-effective way to run large-memory applications in cluster environments where...
Live migration of virtual machines has attracted significant attention in recent years. It facilitates online system maintenance, load balancing, fault tolerance, and power management. The existing pre-copy live migration approach has to iteratively copy redundant memory pages, which causes high network overhead and slow migration. Another approach, post-copy live migration, can provide quick migration with...
Recent advancements in the architecture of the Graphics Processing Unit (GPU) enable the acceleration of many general-purpose applications. Even with high memory bandwidth, GPUs still face the challenge of accelerating highly memory-intensive applications. To overcome this challenge, this paper investigates the impact of scaling up the memory partitions and also scaling the frequency of the...
The centrality of interleavers in interleave-division multiple-access (IDMA) cannot be over-emphasised, the interleaver being the only means of isolating signals for different users of the multiple-access system. This work gives a critical review of bit-error-rate (BER) performance of interleavers and IDMA systems. Existing literature shows that there are disagreements among results published by different...
Applications in modern data centers have a wide variety of resource requirements along the four main dimensions of computing, memory, storage, and networking. Data centers must manage these resources separately for each dimension, resulting in highly inefficient allocation of precious resources or even disastrous schemes that contribute to low utilization or over-provisioning of resources. However,...
This paper describes our experience with storage optimization that utilizes cost-effective PCIe solid-state drives (SSDs) to improve the overall performance of a Spark framework. A key problem we address is the limited memory system performance. In particular, we adopt high-performance SSDs to alleviate the saturated DRAM bandwidth and its limited capacity. We utilize SSDs to store shuffle data and...
Despite the ability of modern processors to execute a variety of algorithms efficiently through instructions based on registers with ever-increasing widths, some applications perform poorly due to the limited interconnect bandwidth between main memory and the processing units. Near-data processing has started to gain acceptance as an acceleration approach due to technology constraints and...
While processor caches cannot grow arbitrarily large due to area, power, and latency considerations, dataset sizes grow faster than Moore's Law and pressure caches to grow to accommodate the increasing working sets. Cache compression partially mitigates this problem by providing an effective cache capacity larger than the physical capacity of the cache, but the prevalent rule of thumb dictates that...
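To make "effective capacity larger than physical capacity" concrete, one widely cited family of cache compression schemes is base-delta compression, which stores a block as one base value plus narrow deltas. The toy sketch below is our own illustration under simplifying assumptions (a single base, fixed delta width), not the scheme this particular paper proposes:

```python
def base_delta_compress(block, delta_bits=8):
    """Return (base, deltas) if every word fits as a signed delta_bits-wide
    delta from the first word; otherwise None (block stays uncompressed)."""
    base = block[0]
    limit = 1 << (delta_bits - 1)
    deltas = [w - base for w in block]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas
    return None

# Nearby values compress well; widely spread values do not.
print(base_delta_compress([1000, 1001, 1005, 998]))  # (1000, [0, 1, 5, -2])
print(base_delta_compress([0, 1000000]))             # None
```

When compression succeeds, a block of wide words shrinks to one wide base plus narrow deltas, which is how a cache can hold more logical blocks than its physical capacity suggests.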
In high-speed real-time image processing systems, an arbitration module is used to resolve access conflicts when a single-port memory is shared among functional modules of an FPGA. In this paper, the shared-memory characteristics of each port module are briefly analyzed, and then the implementation mechanism and specific design steps of the arbiter logic are given. Finally, the logic is...
This paper presents a novel 2.5D multicore processor which consists of 3 distinct silicon dies: a processor die with 8 MIPS cores, a 16kB SRAM die, and an accelerator die for multimedia and communication applications. These dies are interconnected in multiple modes, such as core-core (up to 32 cores), core-memory (4x storage capacity), and core-accelerator (4.4x speedup on an H.264 decoder), to establish...
Image feature descriptors composed of a series of binary intensity comparisons yield substantial memory and runtime improvements over conventional descriptors, but are sensitive to viewpoint changes in ways that vary per feature. We propose a method to improve the matching performance of such descriptors by specifically reasoning about the reliability of test results on a feature-by-feature basis...
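For context on why binary descriptors bring such memory and runtime savings: matching reduces to Hamming distance between bit strings, computable with an XOR and a popcount. A minimal illustrative sketch (function names are ours; the paper's per-feature reliability reasoning is not shown):

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary descriptors (as ints)."""
    return bin(a ^ b).count("1")

def best_match(query, database):
    """Index of the database descriptor nearest to query in Hamming distance."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))

# 0b1011 differs from the query 0b1010 in a single bit, so index 1 wins.
print(best_match(0b1010, [0b0101, 0b1011, 0b1110]))  # 1
```

Because XOR and popcount are single instructions on modern hardware, comparing two 256-bit descriptors is far cheaper than the floating-point distance computations used by conventional descriptors.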
The recent proliferation of smartphones and tablets leads to considering such devices as a means for executing cyber-attacks. This scenario has rarely been considered before, since mobile devices have always represented a target for cyber-criminals rather than a vector to exploit. In this paper we introduce an innovative mobile botnet infrastructure, composed of mobile agents, for the execution of...
This paper presents Unified Communication X (UCX), a set of network APIs and their implementations for high throughput computing. UCX comes from the combined effort of national laboratories, industry, and academia to design and implement a high-performing and highly-scalable network stack for next generation applications and systems. UCX design provides the ability to tailor its APIs and network functionality...
The main memory system is facing increasingly high pressure from the advances of multi-core processors. The simplicity of the conventional memory architecture has helped minimize memory latency and reduce design cost. However, in the present multi-core era, it is increasingly attractive to adopt flexible and advanced memory organizations to further improve memory bandwidth utilization, power efficiency, and...