Search results for: James C Hoe

Items from 1 to 20 out of 37 results

chapter

StarT-NG: Delivering seamless parallel computing

Derek Chiou, Boon S. Ang, Robert Greiner, Arvind, et al.

Lecture Notes in Computer Science > EURO-PAR '95 Parallel Processing > 101-116

StarT-ng is a joint MIT-Motorola project to build a high-performance message passing machine from commercial systems. Each site of the machine consists of a PowerPC 620-based Motorola symmetric multiprocessor (SMP) running the AIX 4.1 operating system. Every processor is connected to a low-latency, high-bandwidth network that is directly accessible from user-level code. In addition to fast message...

article

FFTs with Near-Optimal Memory Access Through Block Data Layouts: Algorithm, Architecture and Design Automation

Berkin Akin, Franz Franchetti, James C. Hoe

Journal of Signal Processing Systems > 2016 > 85 > 1 > 67-82

Fast Fourier transform algorithms on large data sets achieve poor performance on various platforms because of the inefficient strided memory access patterns. These inefficient access patterns need to be reshaped to achieve high performance implementations. In this paper we formally restructure 1D, 2D and 3D FFTs targeting a generic machine model with a two-level memory hierarchy requiring block data...

article

HAMLeT Architecture for Parallel Data Reorganization in Memory

Berkin Akin, Franz Franchetti, James C. Hoe

IEEE Micro > 2016 > 36 > 1 > 14 - 23

3D-stacked integration of DRAM and logic layers using through-silicon via (TSV) technology has given rise to a new interpretation of near-data processing (NDP) concepts that were proposed decades ago. However, processing capability within the stack is limited by stringent power and thermal constraints. Simple processing mechanisms with intensive memory accesses, such as data reorganization, are an...

article

The CONNECT Network-on-Chip Generator

Michael K. Papamichael, James C. Hoe

Computer > 2015 > 48 > 12 > 72 - 79

Efficiently supporting the communication needs of systems on chip with tens to hundreds of interacting modules requires a systematic and flexible network-on-chip (NoC) infrastructure. The freely available CONNECT generator lets users quickly navigate a range of design parameters to produce tailored NoC design instances in Verilog. To date, it has generated nearly 4,000 designs.

chapter

CoRAM++: Supporting data-structure-specific memory interfaces for FPGA computing

Gabriel Weisz, James C. Hoe

2015 25th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 8

Facilitating DRAM access is an essential part of an application programming environment for FPGA computing. Existing FPGA application programming environments primarily focus on support for simple, regular memory access patterns, such as block copy and streaming. This paper presents CoRAM++, a programming environment for FPGA computing that is based on an extensible set of data-structure-specific...

chapter

Nautilus: Fast automated IP design space search using guided genetic algorithms

Michael K. Papamichael, Peter Milder, James C. Hoe

2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

Today's offerings of parameterized hardware IP generators permit very high degrees of performance and implementation customization. Nevertheless, it is ultimately still left to the IP users to set IP parameters to achieve the desired tuning effects. For the average IP user, the knowledge and effort required to navigate a complex IP's design space can significantly offset the productivity gain from...

chapter

Data reorganization in memory using 3D-stacked DRAM

Berkin Akin, Franz Franchetti, James C. Hoe

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA) > 131 - 143

In this paper we focus on common data reorganization operations such as shuffle, pack/unpack, swap, transpose, and layout transformations. Although these operations simply relocate the data in the memory, they are costly on conventional systems mainly due to inefficient access patterns, limited data reuse and roundtrip data traversal throughout the memory hierarchy. This paper presents a two pronged...

chapter

Enabling portable energy efficiency with memory accelerated library

Qi Guo, Tze-Meng Low, Nikolaos Alachiotis, Berkin Akin, et al.

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 750 - 761

Over the last decade, the looming power wall has spurred a flurry of interest in developing heterogeneous systems with hardware accelerators. The questions, then, are what and how accelerators should be designed, and what software support is required. Our accelerator design approach stems from the observation that many efficient and portable software implementations rely on high performance software...

chapter

Algorithm/hardware co-optimized SAR image reconstruction with 3D-stacked logic in memory

Fazle Sadi, Berkin Akin, Doru T. Popovici, James C. Hoe, et al.

2014 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 6

Real-time system level implementations of complex Synthetic Aperture Radar (SAR) image reconstruction algorithms have always been challenging due to their data intensive characteristics. In this paper, we propose a basis vector transform based novel algorithm to alleviate the data intensity and a 3D-stacked logic in memory based hardware accelerator as the implementation platform. Experimental results...

chapter

HAMLeT: Hardware accelerated memory layout transform within 3D-stacked DRAM

Berkin Akin, James C. Hoe, Franz Franchetti

2014 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 6

Memory layout transformations via data reorganization are very common operations, which occur as a part of the computation or as a performance optimization in data-intensive applications. These operations require inefficient memory access patterns and roundtrip data movement through the memory hierarchy, failing to utilize the performance and energy-efficiency potentials of the memory subsystem. This...

chapter

Highly-parallel special-purpose multicore architecture for SystemC/TLM simulations

N. Ventroux, J. Peeters, T. Sassolas, James C. Hoe

2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV) > 250 - 257

The complexity of SystemC virtual prototyping is continuously increasing. Accelerating RTL/TLM SystemC simulations is essential to control future SoC development cost and time-to-market. In this paper, we present RAVES, a highly-parallel special-purpose multicore architecture that achieves simulation performance more efficiently by parallel execution of light-weight user-level threads on many small...

chapter

Understanding the design space of DRAM-optimized hardware FFT accelerators

Berkin Akin, Franz Franchetti, James C. Hoe

2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors > 248 - 255

2014 IEEE 25th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

As technology scaling is reaching its limits, pointing to the well-known memory and power wall problems, achieving high-performance and energy-efficient systems is becoming a significant challenge. Especially for data-intensive computing, efficient utilization of the memory subsystem is the key to achieve high performance and energy efficiency. We address this challenge in DRAM-optimized hardware accelerators...

chapter

FFTs with near-optimal memory access through block data layouts

Berkin Akin, Franz Franchetti, James C. Hoe

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 3898 - 3902

ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

chapter

GraphGen: An FPGA Framework for Vertex-Centric Graph Computation

Eriko Nurvitadhi, Gabriel Weisz, Yu Wang, Skand Hurkat, et al.

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines > 25 - 28

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

Vertex-centric graph computations are widely used in many machine learning and data mining applications that operate on graph data structures. This paper presents GraphGen, a vertex-centric framework that targets FPGA for hardware acceleration of graph computations. GraphGen accepts a vertex-centric graph specification and automatically compiles it onto an application-specific synthesized graph processor...

chapter

A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing

Qiuling Zhu, Berkin Akin, H. Ekin Sumbul, Fazle Sadi, et al.

2013 IEEE International 3D Systems Integration Conference (3DIC) > 1 - 7

2013 IEEE International 3D Systems Integration Conference (3DIC)

This paper introduces a 3D-stacked logic-in-memory (LiM) system that integrates the 3D die-stacked DRAM architecture with the application-specific LiM IC to accelerate important data-intensive computing. The proposed system comprises a fine-grained rank-level 3D die-stacked DRAM device and extra LiM layers implementing logic-enhanced SRAM blocks that are dedicated to a particular application. Through...

chapter

3D Point Cloud Reduction Using Mixed-Integer Quadratic Programming

Hyun Soo Park, Yu Wang, Eriko Nurvitadhi, James C. Hoe, et al.

2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops > 229 - 236

2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Large scale 3D image localization requires computationally expensive matching between 2D feature points in the query image and a 3D point cloud. In this paper, we present a method to accelerate the matching process and to reduce the memory footprint by analyzing the view-statistics of points in a training corpus. Given a training image set that is representative of common views of a scene, our approach...

chapter

Highly Efficient Performance Portable Tracking of Evolving Surfaces

Wei Yu, Franz Franchetti, James C. Hoe, Tsuhan Chen

2012 IEEE 26th International Parallel and Distributed Processing Symposium > 296 - 307

2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

In this paper we present a framework to obtain highly efficient implementations for the narrow band level set method on commercial off-the-shelf (COTS) multicore CPU systems with a cache-based memory hierarchy such as Intel Xeon and Atom processors. The narrow-band level set algorithm tracks wave-fronts in discretized volumes (for instance, explosion shock waves), and is computationally very demanding...

chapter

Memory Bandwidth Efficient Two-Dimensional Fast Fourier Transform Algorithm and Implementation for Large Problem Sizes

Berkin Akin, Peter A. Milder, Franz Franchetti, James C. Hoe

2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines > 188 - 191

2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

Prevailing VLSI trends point to a growing gap between the scaling of on-chip processing throughput and off-chip memory bandwidth. An efficient use of memory bandwidth must become a first-class design consideration in order to fully utilize the processing capability of highly concurrent processing platforms like FPGAs. In this paper, we present key aspects of this challenge in developing FPGA-based...

chapter

Improving fixed-point accuracy of FFT cores in O-OFDM systems

Robert Koutsoyannis, Peter A. Milder, Christian R. Berger, Madeleine Glick, et al.

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 1585 - 1588

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

Optical OFDM communication systems operating at data rates in the 40Gb/s (and higher) range require high-throughput/highly parallel fast Fourier transform (FFT) implementations. These consume a significant amount of chip resources; we aim to reduce costs by improving the system's accuracy per chip-area. For OFDM signals, we characterize the growth of data within the FFT and explore several cost-conscious...

chapter

Predistortion and OFDM realizations

Robert I. Killey, Rachid Bouziane, Yannis Benlachtar, Philip M. Watts, et al.

IEEE Photonic Society 24th Annual Meeting > 684 - 685

2011 IEEE Photonics Conference (IPC 2011)

We have investigated real-time DSP algorithms implementing electronic predistortion and optical OFDM. These studies have been carried out through experiments with FPGA-based transmitters, and the design and post-synthesis simulations of transceiver ASICs.


INFONA - science communication portal
