The increasing adoption of GPUs as mainstream computing devices, coupled with the imminent availability of large high-bandwidth caches based on die-stacked memory makes it important to analyze and understand modern GPU compute applications from the perspective of their memory access and data reuse characteristics. This paper presents detailed workload characterization studies on four GPU compute applications...
Recent advancements in Graphics Processing Unit (GPU) architecture enable the acceleration of many general-purpose applications. Even with high memory bandwidth, GPUs still face the challenge of accelerating highly memory-intensive applications. To overcome this challenge, this paper investigates the impact of scaling up the memory partitions and also scaling the frequency of the...
With transistor energy efficiency not scaling at the same rate as transistor density and frequency, CMOS technology has hit a utilization wall, whereby large portions of the chip remain underclocked. To improve performance while keeping power dissipation at a realistic level, future computing devices will consist of heterogeneous application-specific accelerators. The accelerators have to be synthesised...
Much research shows that we encounter the Hughes phenomenon when dealing with the high-dimensional data classification problem. In addition, the non-linear support vector machine (SVM) has been shown to handle the problem efficiently. However, the SVM is a black-box model built on the whole feature set, and it does not provide the feature importance or a “good” feature subset for classification...
Graphics Processing Units (GPUs), based on the Single Instruction Multiple Thread (SIMT) architecture, are emerging as more efficient platforms than Multiple Instruction Multiple Data (MIMD) architectures for exploiting parallelism. A GPU has numerous shader cores and thousands of simultaneous fine-grained active threads. These threads are grouped into Cooperative Thread Arrays (CTAs). All the threads within...
The recent development of multi-agent simulations brings about a need for population synthesis: the task of reconstructing the entire population from a sampling survey of limited size (1% or so), supplying the initial conditions from which simulations begin. This paper presents a new kernel density estimator for this task. Our method is an analogue of the classical Breiman-Meisel-Purcell estimator,...
This paper presents an effective image structure classification method, recently proposed for selecting the key parameter of non-local kernel regression (NLKR), namely the kernel bandwidth. Meanwhile, to overcome the intensive computational cost of non-local patch searching in NLKR, a fast patch searching strategy is proposed according to the classified structure regions. The...
High Performance Computing (HPC) aggregates computing power in order to solve large and complex problems in different knowledge areas. Nowadays, HPC users can utilize virtualized infrastructures as a low-cost alternative to deploy their applications. However, virtualization brings some challenges for HPC, especially with regard to the overhead caused by hypervisors. In this work, our main goal is to analyze...
The Active Memory Cube (AMC) is a novel near-memory processor that exploits high memory bandwidth and low latency close to DRAM to execute scientific applications in an energy-efficient manner. Its energy efficiency derives from the combination of its novel scalar-vector data-flow path and its simple control-flow path, which required the development of a sophisticated compiler, co-designed...
Many target tracking algorithms for radar systems assume homogeneous backgrounds of clutter. However, real backgrounds are rarely homogeneous. By estimating background intensity, and using the estimate in the likelihood measure, the tracking algorithm is given the ability to adapt to the background. In this work, a method for estimating the clutter intensity is introduced. The method is based on locally...
The performance of a distributed file system significantly affects data-intensive applications that frequently execute I/O operations on large amounts of data. Although many modern distributed file systems are geared to provide highly efficient I/O performance, their operations are nonetheless affected by runtime overhead in data transfer between client nodes and I/O servers. A large part of the overhead...
In this work, we present the characterization of a set of scientific kernels which are representative of the behavior of fundamental and applied physics applications across a wide range of fields. We collect performance attributes in the form of micro-operation mix and off-chip memory bandwidth measurements for these kernels. Using these measurements, we apply two clustering methodologies to show which...
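Grouping kernels by such performance attributes can be sketched with ordinary k-means; the excerpt above does not name the two clustering methodologies, so the algorithm choice, the feature vectors, and the seeding below are purely illustrative:

```python
def kmeans(points, k, iters=20):
    """Plain k-means over per-kernel feature vectors (e.g. fraction of
    memory micro-ops, normalized off-chip bandwidth). A generic stand-in,
    not the paper's actual methodology."""
    centroids = [list(p) for p in points[:k]]  # deterministic seed: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each kernel to the centroid with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters

# Hypothetical feature vectors: (memory micro-op fraction, normalized bandwidth).
kernels = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
centroids, clusters = kmeans(kernels, k=2)
```

With these made-up vectors, the two memory-heavy kernels and the two bandwidth-heavy kernels fall into separate clusters.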
We consider the sampling of signals with finite rate of innovation (FRI) in parameter space to reach the minimal sampling rate. Although the sampling of signals with unknown time locations has been treated in previous works, it is difficult to sample signals with unknown parameters in other parameter spaces. In this paper, we redefine the signal with FRI and propose a general framework of the FRI...
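For context, the classical FRI model that such generalizations start from is a stream of K weighted Diracs on an interval of length τ; the abstract's redefinition is not shown here, only the standard textbook form:

```latex
% Classical FRI signal model: K weighted Diracs on an interval of length \tau
x(t) = \sum_{k=1}^{K} a_k\, \delta(t - t_k),
\qquad
\rho = \frac{2K}{\tau}
```

The signal is determined by 2K free parameters (amplitudes a_k and time locations t_k), so its rate of innovation ρ = 2K/τ sets the minimal sampling rate the abstract refers to.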
This paper proposes the use of a kernel density estimation to measure similarities between trajectories. The similarities are then used to predict the future locations of a target. For a given environment with a history of previous target trajectories, the goal is to establish a probabilistic framework to predict the future trajectory of currently observed targets based on their recent moves. Instead...
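A minimal one-dimensional Gaussian kernel density estimate, of the general kind such similarity measures build on, can be sketched as follows; the sample points and bandwidth are invented for illustration and the paper's actual trajectory representation is not reproduced here:

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from sample points."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observed sample, centered on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical 1-D positions observed along past target trajectories.
observed = [0.0, 0.2, 0.1, 1.9, 2.1]
pdf = gaussian_kde(observed, bandwidth=0.3)
```

The estimated density is high near clusters of past observations and low in between, which is what makes it usable as a similarity (and hence prediction) signal for newly observed targets.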
In this paper, we present a compilation flow for HPC kernels on the REDEFINE coarse-grain reconfigurable architecture (CGRA). REDEFINE is a scalable macro-dataflow machine in which the compute elements (CEs) communicate through messages. REDEFINE offers the ability to exploit a high degree of coarse-grain and pipeline parallelism. The CEs in REDEFINE are enhanced with reconfigurable macro data-paths...
A novel and intuitive way of scheduling entities on a heterogeneous multiprocessing system is presented. The key idea is to understand the behavioral characteristics (foreground/background, IO-bound/CPU-bound) of a scheduling entity to predict the need for its processing bandwidth. This is then used by the scheduler to influence the selection of the big cluster (high-performance) or the LITTLE cluster...
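The behavior-driven cluster selection described above could, under simple assumptions, look like the hypothetical heuristic below; the threshold, function name, and inputs are invented for illustration and are not the paper's actual predictor:

```python
def pick_cluster(cpu_utilization, is_foreground, big_threshold=0.6):
    """Hypothetical placement heuristic: foreground, CPU-bound entities are
    predicted to need high processing bandwidth and go to the big
    (high-performance) cluster; background or IO-bound entities go to the
    LITTLE (power-efficient) cluster."""
    if is_foreground and cpu_utilization >= big_threshold:
        return "big"
    return "LITTLE"
```

A real scheduler would feed such a decision with runtime statistics (recent CPU utilization, blocking behavior, foreground state) rather than fixed arguments.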
In this paper, we propose an FPGA memory hierarchy based on the OpenCL memory model. The memory hierarchy allows application-specific memory optimizations during design compilation using information provided in OpenCL kernels. With the proposed memory hierarchy, FPGA application developers can focus on their designs in OpenCL kernel codes, and their designs can be synthesized into FPGA hardware via...
Most network monitoring systems use a user-level network capture library. Such a user-level library incurs a large overhead and provides inaccurate and insufficient information for self-adaptive networks. For these reasons, we develop a lightweight built-in network monitor running at the Linux kernel level for self-adaptive IoT devices.
Although Cloud infrastructures can be used as High Performance Computing (HPC) platforms, many issues arising from virtualization overhead have kept them apart. However, with the advent of container-based virtualizers, this scenario acquires new perspectives, because this technique promises to decrease the virtualization overhead, achieving near-native performance. In this work, we analyzed the performance...