Search results for: Nachiket Kapre

Items from 1 to 20 out of 48 results

chapter

Deflection-routed butterfly fat trees on FPGAs

Nachiket Kapre

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 8

2017 27th International Conference on Field Programmable Logic and Applications (FPL)

Bufferless, deflection-routed, Butterfly Fat Trees (BFTs) can outperform state-of-the-art FPGA overlay NoCs such as Hoplite by as much as 2–5× on throughput and ≈5× on worst-case latency at identical PE counts, and by ≈1.5× on throughput at identical resource costs >16K LUTs for statistical traffic patterns. In this paper, we show how to modify the tree connectivity and routing function to support...

chapter

Enabling partial reconfiguration and low latency routing using segmented FPGA NoCs

Kizhepatt Vipin, Jan Gray, Nachiket Kapre

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 8

2017 27th International Conference on Field Programmable Logic and Applications (FPL)

Deflection-routed FPGA overlay NoCs such as Hoplite suffer from high worst-case routing latencies due to the penalty of deflections at large system sizes. Segmentation of communication channels in such NoCs can (1) reduce worst-case packet routing latencies for FPGA traffic, (2) enable efficient composition of multi-application NoC workloads, and (3) ease the burden of supporting Partial Reconfiguration...

chapter

On Bit-Serial NoCs for FPGAs

Nachiket Kapre

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 32 - 39

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

We can build lightweight bit-serial FPGA NoC routers that cost 20 LUT, 17 FF per router and operate at 800–900 MHz speeds. Each bit-serial router implements deflection-routing on a unidirectional torus topology requiring 1b-wide connection per port. The key ideas that enable this implementation are (1) reformulation of the dimension-ordered routing (DOR) function using compact 1 LUT, 1 FF streaming pattern...

chapter

Implementing FPGA Overlay NoCs Using the Xilinx UltraScale Memory Cascades

Nachiket Kapre

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 40 - 47

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

We can enhance the performance and efficiency of deflection-routed FPGA overlay NoCs by exploiting the cascading feature of the Xilinx UltraScale BlockRAMs. This allows us to (1) harden the multiplexers in the NoC switch crossbars, and (2) efficiently add buffering support to deflection-routing. While buffering is not required for correct operation of a deflection-routed NoC, it can boost network throughputs...

chapter

eBSP: Managing NoC traffic for BSP workloads on the 16-core Adapteva Epiphany-III processor

Siddhartha, Nachiket Kapre

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 73 - 78

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

We can deliver high performance and energy efficient operation on the multi-core NoC-based Adapteva Epiphany-III SoC for bulk-synchronous workloads using our proposed eBSP communication API. We characterize and automate performance tuning of spatial parallelism for supporting (1) random access load-store style traffic suitable for irregular sparse computations, as well as (2) variable, data-dependent...

chapter

Deflection routing for multi-level FPGA overlay NoCs

Kumar H B Chethan, Shubham Agarwal, Nachiket Kapre

2016 International Conference on Field-Programmable Technology (FPT) > 149 - 156

2016 International Conference on Field-Programmable Technology (FPT)

Reducing worst case routing latencies while delivering high throughput and low energy are key design concerns in the engineering of overlay packet-switched NoCs for FPGA fabrics. Deflection routed torus NoCs are known to map particularly well to modern wire-rich FPGA substrates with fracturable LUT organizations while delivering high sustained bandwidths for various workloads and traffic patterns...

chapter

Learning to Extract API Mentions from Informal Natural Language Discussions

Deheng Ye, Zhenchang Xing, Chee Yong Foo, Jing Li, more

2016 IEEE International Conference on Software Maintenance and Evolution (ICSME) > 389 - 399

2016 IEEE International Conference on Software Maintenance and Evolution (ICSME)

When discussing programming issues on social platforms (e.g., Stack Overflow, Twitter), developers often mention APIs in natural language texts. Extracting API mentions in natural language texts is a prerequisite for effective indexing and searching for API-related information in software engineering social content. However, the informal nature of social discussions creates two fundamental challenges...

chapter

Boosting convergence of timing closure using feature selection in a Learning-driven approach

Que Yanghua, Harnhua Ng, Nachiket Kapre

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 9

2016 26th International Conference on Field Programmable Logic and Applications (FPL)

Machine Learning approaches for automated selection of FPGA CAD tool parameters have been demonstrated to be useful for timing closure of FPGA designs [3], [4]. This is achieved by running the CAD tool multiple times with small variations in the CAD parameter values. The timing slack from each run is recorded into a database along with all input parameter selections to help train a classifier...

chapter

Vector FPGA acceleration of 1-D DWT computations using sparse matrix skeletons

Sidharth Maheshwari, Gourav Modi, Siddhartha, Nachiket Kapre

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

2016 26th International Conference on Field Programmable Logic and Applications (FPL)

We can exploit application-specific sparse structure and distribution of non-zero coefficients in Discrete Wavelet Transform (DWT) matrices to significantly improve the performance of 1-D DWT mapped to FPGA-based soft vector processors. We reformulate DWT computations specifically in terms of sparse matrix operations, where the transformation matrices have a repeating block with a fixed non-zero pattern,...

chapter

Survey of domain-specific languages for FPGA computing

Nachiket Kapre, Samuel Bayliss

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 12

2016 26th International Conference on Field Programmable Logic and Applications (FPL)

High-performance FPGA programming has typically been the exclusive domain of a small band of specialized hardware developers. They are capable of reasoning about implementation concerns at the register-transfer level (RTL) which is analogous to assembly-level programming in software. Sometimes these developers are required to push further down to manage even lower levels of abstraction closer to physical...

chapter

Hoplite-DSP: Harnessing the Xilinx DSP48 multiplexers to efficiently support NoCs on FPGAs

Kumar H B Chethan, Nachiket Kapre

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 10

2016 26th International Conference on Field Programmable Logic and Applications (FPL)

We can embed the crossbar functionality of NoC (network-on-chip) routers onto the hard multiplexers of Xilinx DSP48E primitives to support resource efficient mapping of FPGA overlay NoCs. This embedding also permits the use of dedicated hard wiring resources of the DSP cascade links to support vertical NoC channels. This unique mapping allows us to significantly reduce soft logic (LUTs+FFs) utilization...

chapter

Evaluating Embedded FPGA Accelerators for Deep Learning Applications

Gopalakrishna Hegde, Siddhartha, Nachiappan Ramasamy, Vamsi Buddha, more

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 25

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

FPGA-based embedded soft vector processors can exceed the performance and energy-efficiency of embedded GPUs and DSPs for lightweight deep learning applications. For low complexity deep neural networks targeting resource constrained platforms, we develop optimized Caffe-compatible deep learning library routines that target a range of embedded accelerator-based systems between 4–8 W power budgets...

chapter

Communication Optimization for the 16-Core Epiphany Floating-Point Processor Array

Nachiket Kapre, Siddhartha

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 26

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

The management and optimization of communication in an NoC-based (network-on-chip) bespoke computing platform such as the Parallella (Zynq 7010 + Epiphany-III SoC) is critical for performance and energy-efficiency of floating-point bulk-synchronous workloads. In this paper, we explore the opportunities and capabilities of the Epiphany-III SoC for communication-intensive workloads. Using our communication...

chapter

Improving Classification Accuracy of a Machine Learning Approach for FPGA Timing Closure

Que Yanghua, Nachiket Kapre, Harnhua Ng, Kirvy Teo

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 80 - 83

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

We can use Cloud Computing and Machine Learning to help deliver timing closure of FPGA designs using InTime [2], [3]. This approach requires no modification to the input RTL and relies exclusively on manipulating the CAD tool parameters that drive the optimization heuristics. By running multiple combinations of the parameters in parallel, we learn from results and identify which parameters caused...

chapter

Marathon: Statically-Scheduled Conflict-Free Routing on FPGA Overlay NoCs

Nachiket Kapre

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 156 - 163

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

We can improve the performance of deflection-routed FPGA overlay networks-on-chip (NoCs) like Hoplite by as much as 10× (random traffic) at the expense of modest extra storage cost when combining static scheduling with packet switching in an efficient, hybrid manner. Deflection-routed bufferless NoCs such as Hoplite allow extremely lightweight packet-switched routers on FPGAs, but suffer from high...

article

The structure and dynamics of knowledge network in domain-specific Q&A sites: a case study of stack overflow

Deheng Ye, Zhenchang Xing, Nachiket Kapre

Empirical Software Engineering > 2017 > 22 > 1 > 375-406

Programming-specific Q&A sites (e.g., Stack Overflow) are being used extensively by software developers for knowledge sharing and acquisition. Due to the cross-reference of questions and answers (note that users also reference URLs external to the Q&A site. In this paper, URL sharing refers to internal URLs within the Q&A site, unless otherwise stated), knowledge is diffused in the Q&A...

chapter

Software-Specific Named Entity Recognition in Software Engineering Social Content

Deheng Ye, Zhenchang Xing, Chee Yong Foo, Zi Qun Ang, more

2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER) > 1 > 90 - 101

2016 IEEE 23rd International Conference on Software Analysis, Evolution and Reengineering (SANER)

Software engineering social content, such as Q&A discussions on Stack Overflow, has become a wealth of information on software engineering. This textual content is centered around software-specific entities, and their usage patterns, issues-solutions, and alternatives. However, existing approaches to analyzing software engineering texts treat software-specific entities in the same way as other...

chapter

CaffePresso: An optimized library for Deep Learning on embedded accelerator-based platforms

Gopalakrishna Hegde, Siddhartha, Nachiappan Ramasamy, Nachiket Kapre

2016 International Conference on Compilers, Architectures, and Synthesis of Embedded Systems (CASES) > 1 - 10

2016 International Conference on Compilers, Architectures, and Synthesis of Embedded Systems (CASES)

Off-the-shelf accelerator-based embedded platforms offer a competitive energy-efficient solution for lightweight deep learning computations over CPU-based systems. Low-complexity classifiers used in power-constrained and performance-limited scenarios are characterized by operations on small image maps with 2–3 deep layers and few class labels. For these use cases, we consider a range of embedded...

chapter

FPT2015 PC chair's report

Nachiket Kapre, Oliver Sinnen

2015 International Conference on Field Programmable Technology (FPT) > 1 - 3

2015 International Conference on Field Programmable Technology (FPT)

This year we have a strong FPT program with a full paper acceptance rate of ∼21% from a pool of 106 submissions. This is a ∼25% increase in the number of submissions over last year while still maintaining a near identical number of accepted papers. We strictly enforced all advertised deadlines, and significantly revised the Program Committee to help improve review quality. We had 132 original registrations...

chapter

G-DMA: improving memory access performance for hardware accelerated sparse graph computation

Andrew Bean, Nachiket Kapre, Peter Cheung

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) > 1 - 6

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

Scatter-gather direct memory access (DMA) transfers can be used to efficiently fetch graph memory data for on-chip processing of graph applications. We present a hardware-controlled graph DMA engine which can operate autonomously without the need for CPU interaction. Graph processing algorithms can asynchronously request graph data which is fetched from memory and streamed to the processing core. An...
