2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)

chapter

Non-speculative load-load reordering in TSO

Alberto Ros, Trevor E. Carlson, Mehdi Alipour, Stefanos Kaxiras

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 187 - 200

In Total Store Order memory consistency (TSO), loads can be speculatively reordered to improve performance. If a load-load reordering is seen by other cores, speculative loads must be squashed and re-executed. In architectures with an unordered interconnection network and directory coherence, this has been the established view for decades. We show, for the first time, that it is not necessary to squash...

chapter

SCALEDEEP: A scalable compute architecture for learning and evaluating deep networks

Swagath Venkataramani, Ashish Ranjan, Subarno Banerjee, Dipankar Das, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 13 - 26

Deep Neural Networks (DNNs) have demonstrated state-of-the-art performance on a broad range of tasks involving natural language, speech, image, and video processing, and are deployed in many real world applications. However, DNNs impose significant computational challenges owing to the complexity of the networks and the amount of data they process, both of which are projected to grow in the future...

chapter

ObfusMem: A low-overhead access obfuscation for trusted memories

Amro Awad, Yipeng Wang, Deborah Shands, Yan Solihin

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 107 - 119

Trustworthy software requires strong privacy and security guarantees from a secure trust base in hardware. While chipmakers provide hardware support for basic security and privacy primitives such as enclaves and memory encryption, these primitives do not address hiding of the memory access pattern, information about which may enable attacks on the system or reveal characteristics of sensitive user...

chapter

ThermoGater: Thermally-aware on-chip voltage regulation

S. Karen Khatamifard, Longfei Wang, Weize Yu, Selcuk Kose, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 120 - 132

Tailoring the operating voltage to fine-grain temporal changes in the power and performance needs of the workload can effectively enhance power efficiency. Therefore, power-limited computing platforms of today widely deploy integrated (i.e., on-chip) voltage regulation which enables fast fine-grain voltage control. Voltage regulators convert and distribute power from an external energy source to the...

chapter

In-datacenter performance analysis of a tensor processing unit

Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 1 - 12

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second...

chapter

PowerChief: Intelligent power allocation for multi-stage applications to improve responsiveness on power constrained CMP

Hailong Yang, Quan Chen, Moeiz Riaz, Zhongzhi Luan, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 133 - 146

Modern user-facing applications consist of multiple processing stages with a number of service instances in each stage. The latency profile of these multi-stage applications is intrinsically variable, making it challenging to provide satisfactory responsiveness. Given a limited power budget, improving the end-to-end latency requires intelligently boosting the bottleneck service across stages using...

chapter

Hiding the long latency of persist barriers using speculative execution

Seunghee Shin, James Tuck, Yan Solihin

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 175 - 186

Byte-addressable non-volatile memory technology is emerging as an alternative to DRAM for main memory. This new Non-Volatile Main Memory (NVMM) allows programmers to store important data in data structures in memory instead of serializing it to the file system, thereby providing a substantial performance boost. However, modern systems reorder memory operations and utilize volatile caches for better...

chapter

CHARSTAR: Clock hierarchy aware resource scaling in tiled architectures

Gokul Subramanian Ravi, Mikko H. Lipasti

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 147 - 160

High-performance architectures are over-provisioned with resources to extract the maximum achievable performance out of applications. Two sources of avoidable power dissipation are the leakage power from underutilized resources, along with clock power from the clock hierarchy that feeds these resources. Most reconfiguration mechanisms either focus solely on power gating execution resources or...

chapter

MTraceCheck: Validating non-deterministic behavior of memory consistency models in post-silicon validation

Doowon Lee, Valeria Bertacco

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 201 - 213

This work presents a minimally-intrusive, high-performance, post-silicon validation framework for validating memory consistency in multi-core systems. Our framework generates constrained-random tests that are instrumented with observability-enhancing code for memory consistency verification. For each test, we generate a set of compact signatures reflecting the memory-ordering patterns observed over...

chapter

EDDIE: EM-based detection of deviations in program execution

Alireza Nazari, Nader Sehatbakhsh, Monjur Alam, Alenka Zajic, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 333 - 346

This paper describes EM-Based Detection of Deviations in Program Execution (EDDIE), a new method for detecting anomalies in program execution, such as malware and other code injections, without introducing any overheads, adding any hardware support, changing any software, or using any resources on the monitored system itself. Monitoring with EDDIE involves receiving electromagnetic (EM) emanations...

chapter

Hybrid TLB coalescing: Improving TLB translation coverage under diverse fragmented memory allocations

Chang Hyun Park, Taekyung Heo, Jungi Jeong, Jaehyuk Huh

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 444 - 456

To mitigate excessive TLB misses in large memory applications, techniques such as large pages, variable length segments, and HW coalescing increase the coverage of limited hardware translation entries by exploiting contiguous memory allocation. However, recent studies show that in non-uniform memory systems, using large pages often leads to performance degradation, or allocating large chunks...

chapter

Stream-dataflow acceleration

Tony Nowatzki, Vinay Gangadhar, Newsha Ardalani, Karthikeyan Sankaralingam

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 416 - 429

Demand for low-power data processing hardware continues to rise inexorably. Existing programmable and “general purpose” solutions (e.g., SIMD, GPGPUs) are insufficient, as evidenced by the order-of-magnitude improvements and industry adoption of application and domain-specific accelerators in important areas like machine learning, computer vision and big data. The stark tradeoffs between efficiency...

chapter

Secure hierarchy-aware cache replacement policy (SHARP): Defending against cache-based side channel attacks

Mengjia Yan, Bhargava Gopireddy, Thomas Shull, Josep Torrellas

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 347 - 360

In cache-based side channel attacks, a spy that shares a cache with a victim probes cache locations to extract information on the victim's access patterns. For example, in evict+reload, the spy repeatedly evicts and then reloads a probe address, checking if the victim has accessed the address in between the two operations. While there are many proposals to combat these cache attacks, they all have...

chapter

LogCA: A high-level performance model for hardware accelerators

Muhammad Shoaib Bin Altaf, David A. Wood

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 375 - 388

With the end of Dennard scaling, architects have increasingly turned to special-purpose hardware accelerators to improve the performance and energy efficiency for some applications. Unfortunately, accelerators don't always live up to their expectations and may under-perform in some situations. Understanding the factors which affect the performance of an accelerator is crucial for both architects and...

chapter

Hardware translation coherence for virtualized systems

Zi Yan, Jan Vesely, Guilherme Cox, Abhishek Bhattacharjee

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 430 - 443

To improve system performance, operating systems (OSes) often undertake activities that require modification of virtual-to-physical address translations. For example, the OS may migrate data between physical pages to manage heterogeneous memory devices. We refer to such activities as page remappings. Unfortunately, page remappings are expensive. We show that a big part of this cost arises from address...
