2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)

In-datacenter performance analysis of a tensor processing unit

Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 1-12

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), that has been deployed in datacenters since 2015 and accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second...
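The 92 TeraOps/second peak figure follows from simple arithmetic. The sketch below assumes the 700 MHz clock reported in the full paper (it is not stated in this abstract), a 256x256 arrangement of the 65,536 MACs, and the usual convention that one multiply-accumulate counts as two operations.

```python
# Peak-throughput arithmetic for a MAC array.
# Assumptions: 700 MHz clock (from the full paper, not this abstract);
# one MAC = 2 operations (multiply + add); every MAC is busy every cycle.

MAC_UNITS = 256 * 256      # 65,536 8-bit MACs arranged as a 256x256 array
OPS_PER_MAC = 2            # one multiply plus one accumulate
CLOCK_HZ = 700e6           # assumed clock frequency

def peak_ops_per_second(macs: int, ops_per_mac: int, clock_hz: float) -> float:
    """Upper bound on throughput: every MAC completes once per cycle."""
    return macs * ops_per_mac * clock_hz

peak = peak_ops_per_second(MAC_UNITS, OPS_PER_MAC, CLOCK_HZ)
print(f"{peak / 1e12:.2f} TeraOps/s")  # 91.75, which rounds to the quoted 92
```

This is an upper bound; sustained throughput on real workloads is lower whenever the array is not fully occupied.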

SCALEDEEP: A scalable compute architecture for learning and evaluating deep networks

Swagath Venkataramani, Ashish Ranjan, Subarno Banerjee, Dipankar Das, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 13-26

Deep Neural Networks (DNNs) have demonstrated state-of-the-art performance on a broad range of tasks involving natural language, speech, image, and video processing, and are deployed in many real world applications. However, DNNs impose significant computational challenges owing to the complexity of the networks and the amount of data they process, both of which are projected to grow in the future...

SCNN: An accelerator for compressed-sparse convolutional neural networks

Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 27-40

Convolutional Neural Networks (CNNs) have emerged as a fundamental technology for machine learning. High performance and extreme energy efficiency are critical for deployments of CNNs, especially in mobile platforms such as autonomous vehicles, cameras, and electronic personal assistants. This paper introduces the Sparse CNN (SCNN) accelerator architecture, which improves performance and energy efficiency...

Bespoke processors for applications with ultra-low area and power constraints

Hari Cherupalli, Henry Duwe, Weidong Ye, Rakesh Kumar, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 41-54

A large number of emerging applications such as implantables, wearables, printed electronics, and IoT have ultra-low area and power constraints. These applications rely on ultra-low-power general-purpose microcontrollers and microprocessors, making them the most abundant type of processor produced and used today. While general-purpose processors have several advantages, such as amortized development...

A programmable Galois Field processor for the Internet of Things

Yajing Chen, Shengshuo Lu, Cheng Fu, David Blaauw, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 55-68

This paper investigates the feasibility of a unified processor architecture to enable error coding flexibility and secure communication in low power Internet of Things (IoT) wireless networks. Error coding flexibility for wireless communication allows IoT applications to exploit the large tradeoff space in data rate, link distance and energy-efficiency. As a solution, we present a light-weight Galois...
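As background for this abstract: error-correcting codes such as Reed-Solomon and ciphers such as AES are commonly built on arithmetic in the finite field GF(2^8), where addition is XOR and multiplication is carry-less and reduced modulo an irreducible polynomial. The sketch below shows a bitwise GF(2^8) multiply using the AES polynomial 0x11B; the polynomial choice and the function name `gf256_mul` are assumptions for illustration, and this is not the paper's processor design.

```python
# Shift-and-add multiplication in GF(2^8).
# Assumption: reduction polynomial 0x11B (x^8 + x^4 + x^3 + x + 1), as used in AES.
# Reed-Solomon implementations often use a different polynomial, e.g. 0x11D.

def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two GF(2^8) elements; field addition is XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add (XOR) the current multiple a * x^i
        a <<= 1              # multiply a by x
        if a & 0x100:
            a ^= poly        # reduce modulo the field polynomial
        b >>= 1
    return result

print(hex(gf256_mul(0x57, 0x83)))  # 0xc1, the worked example in FIPS-197
```

A hardware Galois Field unit would typically perform this reduction with XOR networks rather than a loop; the loop form simply makes each step visible.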

XPro: A cross-end processing architecture for data analytics in wearables

Aosen Wang, Lizhong Chen, Wenyao Xu

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 69-80

Wearable computing systems have spurred many opportunities to continuously monitor human bodies with sensors worn on or implanted in the body. These emerging platforms have started to revolutionize many fields, including healthcare and wellness applications, particularly when integrated with intelligent analytic capabilities. However, a significant challenge that computer architects are facing is...

Regaining lost cycles with HotCalls: A fast interface for SGX secure enclaves

Ofir Weisse, Valeria Bertacco, Todd Austin

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 81-93

Intel's SGX secure execution technology allows running computations on secret data using untrusted servers. While recent work showed how to port applications and large-scale computations to run under SGX, the performance implications of using the technology remain an open question. We present the first comprehensive quantitative study to evaluate the performance of SGX. We show that straightforward...

InvisiMem: Smart memory defenses for memory bus side channel

Shaizeen Aga, Satish Narayanasamy

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 94-106

A practically feasible low-overhead hardware design that provides strong defenses against memory bus side channel remains elusive. This paper observes that smart memory, memory with compute capability and a packetized interface, can dramatically simplify this problem. InvisiMem expands the trust base to include the logic layer in the smart memory to implement cryptographic primitives, which aid in...

ObfusMem: A low-overhead access obfuscation for trusted memories

Amro Awad, Yipeng Wang, Deborah Shands, Yan Solihin

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 107-119

Trustworthy software requires strong privacy and security guarantees from a secure trust base in hardware. While chipmakers provide hardware support for basic security and privacy primitives such as enclaves and memory encryption, these primitives do not hide the memory access pattern, information about which may enable attacks on the system or reveal characteristics of sensitive user...

ThermoGater: Thermally-aware on-chip voltage regulation

S. Karen Khatamifard, Longfei Wang, Weize Yu, Selcuk Kose, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 120-132

Tailoring the operating voltage to fine-grain temporal changes in the power and performance needs of the workload can effectively enhance power efficiency. Therefore, power-limited computing platforms of today widely deploy integrated (i.e., on-chip) voltage regulation, which enables fast fine-grain voltage control. Voltage regulators convert and distribute power from an external energy source to the...

PowerChief: Intelligent power allocation for multi-stage applications to improve responsiveness on power constrained CMP

Hailong Yang, Quan Chen, Moeiz Riaz, Zhongzhi Luan, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 133-146

Modern user-facing applications consist of multiple processing stages with a number of service instances in each stage. The latency profile of these multi-stage applications is intrinsically variable, making it challenging to provide satisfactory responsiveness. Given a limited power budget, improving the end-to-end latency requires intelligently boosting the bottleneck service across stages using...

CHARSTAR: Clock hierarchy aware resource scaling in tiled architectures

Gokul Subramanian Ravi, Mikko H. Lipasti

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 147-160

High-performance architectures are over-provisioned with resources to extract the maximum achievable performance out of applications. Two sources of avoidable power dissipation are the leakage power from underutilized resources and the clock power from the clock hierarchy that feeds these resources. Most reconfiguration mechanisms either focus solely on power gating execution resources or...

Chasing Away RAts: Semantics and evaluation for relaxed atomics on heterogeneous systems

Matthew D. Sinclair, Johnathan Alsop, Sarita V. Adve

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 161-174

An unambiguous and easy-to-understand memory consistency model is crucial for ensuring correct synchronization and guiding future design of heterogeneous systems. In a widely adopted approach, the memory model guarantees sequential consistency (SC) as long as programmers obey certain rules. The popular data-race-free-0 (DRF0) model exemplifies this SC-centric approach by requiring programmers to avoid...

Hiding the long latency of persist barriers using speculative execution

Seunghee Shin, James Tuck, Yan Solihin

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 175-186

Byte-addressable non-volatile memory technology is emerging as an alternative to DRAM for main memory. This new Non-Volatile Main Memory (NVMM) allows programmers to store important data in data structures in memory instead of serializing it to the file system, thereby providing a substantial performance boost. However, modern systems reorder memory operations and utilize volatile caches for better...

Non-speculative load-load reordering in TSO

Alberto Ros, Trevor E. Carlson, Mehdi Alipour, Stefanos Kaxiras

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 187-200

In Total Store Order memory consistency (TSO), loads can be speculatively reordered to improve performance. If a load-load reordering is seen by other cores, speculative loads must be squashed and re-executed. In architectures with an unordered interconnection network and directory coherence, this has been the established view for decades. We show, for the first time, that it is not necessary to squash...
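Why a load-load reordering must normally be hidden can be seen with the classic message-passing litmus test: one core writes data `x` and then a flag `y`; another reads `y` and then `x`. Under TSO the outcome `r1 == 1, r2 == 0` is forbidden, but if the reader's loads are performed out of order it becomes observable. The sketch below enumerates interleavings under a simple atomic-memory model (an assumption that ignores store buffers, which does not matter here because TSO keeps stores in order). It only illustrates the outcome any scheme must rule out; it is not the paper's non-speculative mechanism.

```python
# Message-passing (MP) litmus test: is a load-load reordering observable?
# Model (assumption): each memory operation executes atomically in some
# interleaving; stores stay in program order; only the reader's load order varies.

from typing import Dict, Iterator, List, Set, Tuple

Op = Tuple[str, str, object]   # ("st", location, value) or ("ld", location, register)

def interleavings(a: List[Op], b: List[Op]) -> Iterator[List[Op]]:
    """Yield every merge of a and b that preserves each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def outcomes(t0: List[Op], t1: List[Op]) -> Set[Tuple[int, int]]:
    """Collect the final (r1, r2) register values over all interleavings."""
    results = set()
    for trace in interleavings(t0, t1):
        mem = {"x": 0, "y": 0}
        regs: Dict[object, int] = {}
        for kind, loc, arg in trace:
            if kind == "st":
                mem[loc] = arg
            else:
                regs[arg] = mem[loc]
        results.add((regs["r1"], regs["r2"]))
    return results

WRITER = [("st", "x", 1), ("st", "y", 1)]                 # write data, then flag
READER_IN_ORDER = [("ld", "y", "r1"), ("ld", "x", "r2")]  # read flag, then data
READER_REORDERED = [("ld", "x", "r2"), ("ld", "y", "r1")] # data load hoisted

print(sorted(outcomes(WRITER, READER_IN_ORDER)))   # (1, 0) never appears
print(sorted(outcomes(WRITER, READER_REORDERED)))  # (1, 0) becomes possible
```

A conventional core that reorders these loads therefore squashes the younger load if the line is invalidated before the older load completes; the abstract's claim is that this squash is not always necessary.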

MTraceCheck: Validating non-deterministic behavior of memory consistency models in post-silicon validation

Doowon Lee, Valeria Bertacco

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 201-213

This work presents a minimally-intrusive, high-performance, post-silicon validation framework for validating memory consistency in multi-core systems. Our framework generates constrained-random tests that are instrumented with observability-enhancing code for memory consistency verification. For each test, we generate a set of compact signatures reflecting the memory-ordering patterns observed over...

Redundant memory array architecture for efficient selective protection

Ruohuang Zheng, Michael C. Huang

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 214-227

Memory hardware errors may result from transient particle-induced faults as well as device defects due to aging. These errors are an important threat to computer system reliability as VLSI technologies continue to scale. Managing memory hardware errors is a critical component in developing an overall system dependability strategy. Memory error detection and correction are supported in a range of available...

Clank: Architectural support for intermittent computation

Matthew Hicks

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 228-240

The processors that drive embedded systems are getting smaller; meanwhile, the batteries used to provide power to those systems have stagnated. If we are to realize the dream of ubiquitous computing promised by the Internet of Things, processors must shed large, heavy, expensive, and high maintenance batteries and, instead, harvest energy from their environment. One challenge with this transition...

MeRLiN: Exploiting dynamic instruction behavior for fast and accurate microarchitecture level reliability assessment

Manolis Kaliorakis, Dimitris Gizopoulos, Ramon Canal, Antonio Gonzalez

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 241-254

Early reliability assessment of hardware structures using microarchitecture level simulators can effectively guide major error protection decisions in microprocessor design. Statistical fault injection on microarchitectural structures modeled in performance simulators is an accurate method to measure their Architectural Vulnerability Factor (AVF) but requires excessively long campaigns to obtain high...
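To give a sense of why statistical fault injection campaigns grow long, the sketch below applies the standard finite-population sample-size formula for estimating a proportion (here, the fraction of injected faults that cause a visible failure). The population size, error margin, and confidence level are hypothetical example values, not figures from the paper, and the function name is illustrative.

```python
import math

def fault_injection_samples(population: int, margin: float, t: float,
                            p: float = 0.5) -> int:
    """Injections needed to estimate a failure proportion (finite-population formula).

    population: number of possible fault sites (e.g., bits x cycles), hypothetical
    margin:     acceptable error margin, e.g. 0.01 for +/-1%
    t:          critical value for the confidence level (about 2.576 for 99%)
    p:          assumed proportion; 0.5 is the worst case and gives the largest n
    """
    n = population / (1 + margin**2 * (population - 1) / (t**2 * p * (1 - p)))
    return math.ceil(n)

# Hypothetical structure with 10^9 possible (bit, cycle) fault sites.
print(fault_injection_samples(10**9, margin=0.01, t=2.576))
```

Because each injection is a separate simulation run, tens of thousands of runs per structure, repeated across structures and workloads, can add up to very long campaigns.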

The reach profiler (REAPER): Enabling the mitigation of DRAM retention failures via profiling at aggressive conditions

Minesh Patel, Jeremie S. Kim, Onur Mutlu

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 255-268

Modern DRAM-based systems suffer from significant energy and latency penalties due to conservative DRAM refresh standards. Volatile DRAM cells can retain information across a wide distribution of times ranging from milliseconds to many minutes, but each cell is currently refreshed every 64ms to account for the extreme tail end of the retention time distribution, leading to a high refresh overhead...
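The refresh overhead mentioned above can be estimated with simple arithmetic. The sketch assumes DDR3/DDR4-style refresh, where 8192 REF commands are spread across each retention window (giving a refresh interval tREFI of 7.8 microseconds at 64 ms), and an example refresh cycle time tRFC of 350 ns, typical of an 8 Gb DDR4 device. The 128 ms window is a hypothetical relaxed setting, and the model ignores per-bank refresh and other device features.

```python
# Fraction of time a DRAM rank is unavailable because it is refreshing.
# Assumptions: 8192 REF commands per retention window (DDR3/DDR4 style);
# tRFC = 350 ns as an example value; per-bank refresh is ignored.

REF_COMMANDS_PER_WINDOW = 8192

def refresh_overhead(window_ms: float, trfc_ns: float,
                     commands: int = REF_COMMANDS_PER_WINDOW) -> float:
    """Return tRFC / tREFI, the busy fraction due to refresh."""
    trefi_ns = window_ms * 1e6 / commands   # interval between REF commands
    return trfc_ns / trefi_ns

print(f"64 ms window:  {refresh_overhead(64, 350):.2%}")   # standard window
print(f"128 ms window: {refresh_overhead(128, 350):.2%}")  # hypothetical relaxed window
```

Doubling the window halves the overhead, which is why knowing which cells can safely tolerate a longer interval is valuable; the overhead also grows with device density, since tRFC increases for larger chips.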
