2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)

chapter

Maximizing CNN accelerator efficiency through resource partitioning

Yongming Shen, Michael Ferdman, Peter Milder

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 535 - 547

Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection...

chapter

Scalpel: Customizing DNN pruning to the underlying hardware parallelism

Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, more

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 548 - 560

As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model size and the computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network...

chapter

Understanding and optimizing asynchronous low-precision stochastic gradient descent

Christopher De Sa, Matthew Feldman, Christopher Re, Kunle Olukotun

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 561 - 574

Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in machine learning and other domains. Since this is likely to continue for the foreseeable future, it is important to study techniques that can make it run fast on parallel hardware. In this paper, we provide the first analysis of a technique called BUCKWILD! that uses both asynchronous execution and low-precision...

chapter

Aggressive pipelining of irregular applications on reconfigurable hardware

Zhaoshi Li, Leibo Liu, Yangdong Deng, Shouyi Yin, more

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 575 - 586

CPU-FPGA heterogeneous platforms offer a promising solution for high-performance and energy-efficient computing systems by providing specialized accelerators with post-silicon reconfigurability. To unleash the power of FPGA, however, the programmability gap has to be filled so that applications specified in high-level programming languages can be efficiently mapped and scheduled on FPGA. The above...

chapter

Fractal: An execution model for fine-grain nested speculative parallelism

Suvinay Subramanian, Mark C. Jeffrey, Maleen Abeydeera, Hyun Ryong Lee, more

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 587 - 599

Most systems that support speculative parallelization, like hardware transactional memory (HTM), do not support nested parallelism. This sacrifices substantial parallelism and precludes composing parallel algorithms. And the few HTMs that do support nested parallelism focus on parallelizing at the coarsest (shallowest) levels, incurring large overheads that squander most of their potential. We present...

chapter

Parallel automata processor

Arun Subramaniyan, Reetuparna Das

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 600 - 612

Finite State Machines (FSMs) are widely used computation models for many application domains. These embarrassingly sequential applications with irregular memory access patterns perform poorly on conventional von Neumann architectures. The Micron Automata Processor (AP) is an in-situ memory-based computational architecture that accelerates non-deterministic finite automata (NFA) processing in hardware...

chapter

Viyojit: Decoupling battery and DRAM capacities for battery-backed DRAM

Rajat Kateja, Anirudh Badam, Sriram Govindan, Bikash Sharma, more

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 613 - 626

Non-Volatile Memories (NVMs) can significantly improve the performance of data-intensive applications. A popular form of NVM is battery-backed DRAM, which is available and in use today with DRAM's latency and without the endurance problems of emerging NVM technologies. Modern servers can be provisioned with up to 4 TB of DRAM, and provisioning battery backup to write out such large memories is hard...

chapter

DICE: Compressing DRAM caches for bandwidth and capacity

Vinson Young, Prashant J. Nair, Moinuddin K. Qureshi

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 627 - 638

This paper investigates compression for DRAM caches. As the capacity of DRAM cache is typically large, prior techniques on cache compression, which solely focus on improving cache capacity, provide only a marginal benefit. We show that more performance benefit can be obtained if the compression of the DRAM cache is tailored to provide higher bandwidth. If a DRAM cache can provide two compressed lines...

chapter

The Mondrian data engine

Mario Drumond, Alexandros Daglis, Nooshin Mirzadeh, Dmitrii Ustiugov, more

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 639 - 651

The increasing demand for extracting value out of ever-growing data poses an ongoing challenge to system designers, a task only made trickier by the end of Dennard scaling. As the performance density of traditional CPU-centric architectures stagnates, advancing compute capabilities necessitates novel architectural approaches. Near-memory processing (NMP) architectures are reemerging as promising candidates...

chapter

Jenga: Software-defined cache hierarchies

Po-An Tsai, Nathan Beckmann, Daniel Sanchez

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 652 - 665

Caches are traditionally organized as a rigid hierarchy, with multiple levels of progressively larger and slower memories. Hierarchy allows a simple, fixed design to benefit a wide range of applications, since working sets settle at the smallest (i.e., fastest and most energy-efficient) level they fit in. However, rigid hierarchies also add overheads, because each level adds latency and energy even...

chapter

APPROX-NoC: A data approximation framework for Network-on-Chip architectures

Rahul Boyapati, Jiayi Huang, Pritam Majumder, Ki Hwan Yum, more

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 666 - 677

The trend of unsustainable power consumption and large memory bandwidth demands in massively parallel multicore systems, with the advent of the big data era, has brought about alternative computation paradigms utilizing heterogeneity, specialization, processor-in-memory and approximation. Approximate Computing is being touted as a viable solution for high performance computation by relaxing...

chapter

There and back again: Optimizing the interconnect in networks of memory cubes

Matthew Poremba, Itir Akgun, Jieming Yin, Onur Kayiran, more

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 678 - 690

High-performance computing, enterprise, and datacenter servers are driving demands for higher total memory capacity as well as memory performance. Memory “cubes” with high per-package capacity (from 3D integration) along with high-speed point-to-point interconnects provide a scalable memory system architecture with the potential to deliver both capacity and performance. Multiple such cubes connected...

chapter

Footprint: Regulating routing adaptiveness in Networks-on-Chip

Binzhang Fu, John Kim

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 691 - 702

Routing algorithms can improve network performance by maximizing routing adaptiveness but can be problematic in the presence of endpoint congestion. Tree-saturation is a well-known behavior caused by endpoint congestion. Adaptive routing can, however, spread the congestion and result in thick branches of the congestion tree — creating Head-of-Line (HoL) blocking and degrading performance. In this...

chapter

EbDa: A new theory on design and verification of deadlock-free interconnection networks

Masoumeh Ebrahimi, Masoud Daneshtalab

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 703 - 715

Freedom from deadlock is one of the most important issues when designing routing algorithms in on-chip/off-chip networks. Many works have built upon Dally's theory, which proves that a network is deadlock-free if there is no cyclic dependency in the channel dependency graph. However, finding such an acyclic graph has been very challenging, which limits Dally's theory to networks with a low number...
