Michael Ferdman

chapter

Maximizing CNN accelerator efficiency through resource partitioning

Yongming Shen, Michael Ferdman, Peter Milder

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 535 - 547

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)

Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection...

chapter

Escher: A CNN Accelerator with Flexible Buffering to Minimize Off-Chip Transfer

Yongming Shen, Michael Ferdman, Peter Milder

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 93 - 100

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

Convolutional neural networks (CNNs) are used to solve many challenging machine learning problems. Interest in CNNs has led to the design of CNN accelerators to improve CNN evaluation throughput and efficiency. Importantly, the bandwidth demand from weight data transfer for modern large CNNs causes CNN accelerators to be severely bandwidth bottlenecked, prompting the need for processing images in...

chapter

Proactive instruction fetch

Michael Ferdman, Cansu Kaynak, Babak Falsafi

2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 152 - 162

2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Fast access requirements preclude building L1 instruction caches large enough to capture the working set of server workloads. Efforts exist to mitigate limited L1 instruction cache capacity by relying on the stability and repetitiveness of the instruction stream to predict and prefetch future instruction blocks prior to their use. However, dynamic variation in cache miss sequences prevents correct...

chapter

Fused-layer CNN accelerators

Manoj Alwani, Han Chen, Michael Ferdman, Peter Milder

2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 1 - 12

2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Deep convolutional neural networks (CNNs) are rapidly becoming the dominant approach to computer vision and a major component of many other pervasive machine learning tasks, such as speech recognition, natural language processing, and fraud detection. As a result, accelerators for efficiently evaluating CNNs are rapidly growing in popularity. The conventional approaches to designing such CNN accelerators...

chapter

Overcoming resource underutilization in spatial CNN accelerators

Yongming Shen, Michael Ferdman, Peter Milder

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

2016 26th International Conference on Field Programmable Logic and Applications (FPL)

Convolutional neural networks (CNNs) are revolutionizing a variety of machine learning tasks, but they present significant computational challenges. Recently, FPGA-based accelerators have been proposed to improve the speed and efficiency of CNNs. Current approaches construct an accelerator optimized to maximize the overall throughput of iteratively computing the CNN layers. However, this approach...

chapter

Demystifying cloud benchmarking

Tapti Palit, Yongming Shen, Michael Ferdman

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 122 - 132

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

The popularity of online services has grown exponentially, spurring great interest in improving server hardware and software. However, conducting research on servers has traditionally been challenging due to the complexity of setting up representative server configurations and measuring their performance. Recent work has eased the effort of benchmarking servers by making benchmarking software and...

article

A Case for Specialized Processors for Scale-Out Workloads

Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, more

IEEE Micro > 2014 > 34 > 3 > 31 - 42

Emerging scale-out workloads need extensive amounts of computational resources. However, datacenters using modern server hardware face physical constraints in space and power, limiting further expansion and requiring improvements in the computational density per server and in the per-operation energy. Continuing to improve the computational resources of the cloud while staying within physical constraints...

chapter

Scale-out processors

Pejman Lotfi-Kamran, Boris Grot, Michael Ferdman, Stavros Volos, more

2012 39th Annual International Symposium on Computer Architecture (ISCA) > 500 - 511

2012 ACM/IEEE 39th International Symposium on Computer Architecture (ISCA)

Scale-out datacenters mandate high per-server throughput to get the maximum benefit from the large TCO investment. Emerging applications (e.g., data serving and web search) that run in these datacenters operate on vast datasets that are not accommodated by on-die caches of existing server chips. Large caches reduce the die area available for cores and lower performance through long access latency...

article

Toward Dark Silicon in Servers

Nikos Hardavellas, Michael Ferdman, Babak Falsafi, Anastasia Ailamaki

IEEE Micro > 2011 > 31 > 4 > 6 - 15

Server chips will not scale beyond a few tens to low hundreds of cores, and an increasing fraction of the chip in future technologies will be dark silicon that we cannot afford to power. Specialized multicore processors, however, can leverage the underutilized die area to overcome the initial power barrier, delivering significantly higher performance for the same bandwidth and power envelopes.

chapter

TurboTag: Lookup filtering to reduce coherence directory power

Pejman Lotfi-Kamran, Michael Ferdman, Daniel Crisan, Babak Faisafi

2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED) > 377 - 382

2010 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED)

On-chip coherence directories of today's multi-core systems are not energy efficient. Coherence directories dissipate a significant fraction of their power on unnecessary lookups when running commercial server and scientific workloads. These workloads have large working sets that are beyond the reach of on-chip caches of modern processors. Limited to capturing a small part of the working set, private...

INFONA - science communication portal

Search results for: Michael Ferdman

Maximizing CNN accelerator efficiency through resource partitioning

Escher: A CNN Accelerator with Flexible Buffering to Minimize Off-Chip Transfer

Proactive instruction fetch

Fused-layer CNN accelerators

Overcoming resource underutilization in spatial CNN accelerators

Demystifying cloud benchmarking

A Case for Specialized Processors for Scale-Out Workloads

Scale-out processors

Toward Dark Silicon in Servers

TurboTag: Lookup filtering to reduce coherence directory power

Filter options

Publication date

Publication type

Keywords

INFONA - science communication portal

Search results for: Michael Ferdman

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Publication type

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options