Search results

Items from 1 to 20 out of 39 results

chapter

Automatic Parallelization of Tiled Loop Nests with Enhanced Fine-Grained Parallelism on GPUs

Peng Di, Ding Ye, Yu Su, Yulei Sui, et al.

2012 41st International Conference on Parallel Processing > 350 - 359

2012 41st International Conference on Parallel Processing (ICPP)

Automatically parallelizing loop nests into CUDA kernels must exploit the full potential of GPUs to obtain high performance. One state-of-the-art approach makes use of the polyhedral model to extract parallelism from a loop nest by applying a sequence of affine transformations to the loop nest. However, how to automate this process to exploit both intra and inter-SM parallelism for GPUs remains a...

chapter

Optimizing a Collaborative Filtering Recommender for Many-Core Processors

Aalap Tripathy, Suneil Mohan, Rabi Mahapatra

2012 IEEE Sixth International Conference on Semantic Computing > 261 - 268

2012 IEEE Sixth International Conference on Semantic Computing (ICSC)

The web is moving from an era of "search" to that of "discovery". Collaborative filtering (CF) recommender systems are now commonly used to predict a user's preference towards an unknown item from past ratings. To be scalable or effective, they are typically deployed in distributed clusters and operate on extremely large a priori datasets. Improvement of the efficiency of these systems...

chapter

Procedural textures using tilings with Perlin Noise

David Maung, Yinxuan Shi, Roger Crawfis

2012 17th International Conference on Computer Games (CGAMES) > 60 - 65

2012 17th International Conference on Computer Games: AI, Animation, Mobile, Interactive Multimedia, Educational & Serious Games (CGAMES)

In this paper, we demonstrate the use of tiling with noise to generate rich procedural textures. We introduce the idea of storing tiles which consist of only the gradients stored at the integer lattice points and constructing a texture on the GPU from these tiles. We also introduce the idea of using mipmapped tiles to store gradients for turbulence. Finally we demonstrate a novel use of mipmaps to...

chapter

Exploiting GPUs for multi-agent path planning on grid maps

Giuseppe Caggianese, Ugo Erra

2012 International Conference on High Performance Computing & Simulation (HPCS) > 482 - 488

2012 International Conference on High Performance Computing & Simulation (HPCS)

Multi-agent path planning on grid maps is a challenging problem and has numerous real-life applications ranging from robotics to real-time strategy games and non-player characters in video games. A* is a cost-optimal forward search algorithm for path planning which scales up poorly in practice since both the search space and the branching factor grow exponentially in the number of agents. In this...

chapter

Fragment Reduction on Mobile GPU with Content Adaptive Sampling

Chia-Yang Chang, Yu-Jung Chen, Chia-Ming Chang, Shao-Yi Chien

2012 IEEE International Conference on Multimedia and Expo Workshops > 629 - 634

2012 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)

Fragment shaders in a graphics pipeline are used to compute the color for each pixel, where lighting, texture loading, and other calculations are involved. The required computing power is proportional to the number of input fragments. In order to improve the power efficiency of mobile GPUs, a content adaptive sampling scheme is proposed to reduce the fragments. The proposed scheme is based on tile-based...

chapter

Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems

George Teodoro, Tahsin M. Kurc, Tony Pan, Lee A.D. Cooper, et al.

2012 IEEE 26th International Parallel and Distributed Processing Symposium > 1093 - 1104

2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved...

chapter

Scaling Data-Intensive Applications on Heterogeneous Platforms with Accelerators

Ana Balevic, Bart Kienhuis

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1866 - 1873

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Heterogeneous parallel systems including accelerators such as Graphics Processing Units (GPUs) are expected to play a major role in architecting the largest systems in the world, as well as the most powerful embedded devices. Impressive computational speedups have been reported for numerous algorithms in fields of medical image processing, digital signal processing, astrophysics, modeling and simulations...

chapter

Tile-based GPU optimizations through ESL full system simulation

Hsu-Yao Huang, Chi-Yuan Huang, Chung-Ho Chen

2012 IEEE International Symposium on Circuits and Systems > 1327 - 1330

2012 IEEE International Symposium on Circuits and Systems - ISCAS 2012

We present a tile-based GPU design which is modeled in a full system simulation platform. The full system simulation platform includes a functional Linux-based system on which the GPU is incorporated for design explorations. To accurately estimate the execution time of the application graphics software, an execution time synchronization mechanism for the virtual platform is developed. We extend the...

chapter

Comparing the power and performance of Intel's SCC to state-of-the-art CPUs and GPUs

Ehsan Totoni, Babak Behzad, Swapnil Ghike, Josep Torrellas

2012 IEEE International Symposium on Performance Analysis of Systems & Software > 78 - 87

2012 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS)

Power dissipation and energy consumption are becoming increasingly important architectural design constraints in different types of computers, from embedded systems to large-scale supercomputers. To continue the scaling of performance, it is essential that we build parallel processor chips that make the best use of exponentially increasing numbers of transistors within the power and energy budgets...

chapter

A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC

Min Kyu Jeong, Mattan Erez, Chander Sudanthi, Nigel Paver

DAC Design Automation Conference 2012 > 850 - 855

2012 49th ACM/EDAC/IEEE Design Automation Conference (DAC)

Diverse IP cores are integrated on a modern system-on-chip and share resources. Off-chip memory bandwidth is often the scarcest resource and requires careful allocation. Two of the most important cores, the CPU and the GPU, can both simultaneously demand high bandwidth. We demonstrate that conventional quality-of-service allocation techniques can severely constrict GPU performance by allowing the...

chapter

Performance evaluation of CPU-GPU and CPU-only algorithms for detecting defective tablets through morphological imaging techniques

Hasan Baig, Jeong-A Lee, Jieun Lee

7th Iberian Conference on Information Systems and Technologies (CISTI 2012) > 1 - 6

2012 7th Iberian Conference on Information Systems and Technologies (CISTI)

Pharmaceutical companies that package tablets in blister strips need to make sure that the tablets are free from defects before they go into the packing box. The purpose of this project is to speed up the system process by implementing the image processing algorithm on the GPU. Morphological and mathematical operations have been implemented on both GPU...

chapter

LU factorization for accelerator-based systems

Emmanuel Agullo, Cedric Augonnet, Jack Dongarra, Mathieu Faverge, et al.

2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA) > 217 - 224

2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA)

Multicore architectures enhanced with multiple GPUs are likely to become mainstream High Performance Computing (HPC) platforms in the near future. In this paper, we present the design and implementation of an LU factorization using a tile algorithm that can fully exploit the potential of such platforms in spite of their complexity. We use a methodology derived from previous work on Cholesky and QR factorizations...

chapter

Real-time Person Tracking in High-resolution Panoramic Video for Automated Broadcast Production

Rene Kaiser, Marcus Thaler, Andreas Kriechbaum, Hannes Fassold, et al.

2011 Conference for Visual Media Production > 21 - 29

2011 Conference for Visual Media Production (CVMP)

For enabling immersive user experiences for interactive TV services and automating camera view selection and framing, knowledge of the location of persons in a scene is essential. We describe an architecture for detecting and tracking persons in high-resolution panoramic video streams, obtained from the Omni Cam, a panoramic camera stitching video streams from 6 HD resolution tiles. We use a CUDA...

chapter

Image mosaic using Log-polar binning

Le Thanh Hoan, Chun Youngjae, Oh Kyoungsu

The First Asian Conference on Pattern Recognition > 144 - 148

2011 First Asian Conference on Pattern Recognition (ACPR 2011)

An image mosaic is a large image assembled from many smaller tiles, each of which is itself an actual image. In this research, we introduce an efficient method to make an image mosaic. Our method is based on Log-polar mapping, which enables us to detect the color and shape change. We also successfully make an image mosaic version by exploiting GPU power. Our algorithm is simple, easy to implement, gives...

chapter

Parameterized Micro-benchmarking: An Auto-tuning Approach for Complex Applications

Wenjing Ma, Sriram Krishnamoorthy, Gagan Agrawal

2011 International Conference on Parallel Architectures and Compilation Techniques > 181 - 182

2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)

Auto-tuning has emerged as an important practical method for creating highly optimized code. However, the growing complexity of architectures and applications has resulted in a prohibitively large search space that precludes empirical auto-tuning. Here, we focus on the challenge to auto-tuning presented by applications that require auto-tuning of not just a small number of distinct kernels, but a large...

chapter

Performance Portability of a GPU Enabled Factorization with the DAGuE Framework

George Bosilca, Aurelien Bouteiller, Thomas Herault, Pierre Lemarinier, et al.

2011 IEEE International Conference on Cluster Computing > 395 - 402

2011 IEEE International Conference on Cluster Computing (CLUSTER)

Performance portability is a major challenge faced today by developers on heterogeneous high performance computers, consisting of an interconnect, memory with non-uniform access, many-cores and accelerators like GPUs. Recent studies have successfully demonstrated that dense linear algebra operations can be efficiently handled by runtime systems using a DAG representation. In this work, we present...

chapter

Ultra High Definition video decoding with Motion JPEG XR using the GPU

Bart Pieters, Jan De Cock, Charles Hollemeersch, Jeroen Wielandt, et al.

2011 18th IEEE International Conference on Image Processing > 377 - 380

2011 18th IEEE International Conference on Image Processing (ICIP 2011)

Many applications require real-time decoding of high-resolution video pictures, for example, quick editing of video sequences in video editing applications. To increase decoding speed, parallelism can be exploited, yet, block-based image and video coding standards are difficult to decode in parallel because of the high number of dependencies between blocks. This paper investigates the parallel decoding...

chapter

Research on GPU-Based Real-Time MTF Compensation Algorithm

Liu Yang Fang, Mi Wang, De Ren Li, Bing Xian Zhang

2011 International Symposium on Image and Data Fusion > 1 - 5

2011 International Symposium on Image and Data Fusion (ISIDF)

chapter

Large Terrain Modeling and Visualization for Planets

Steven Myint, Abhinandan Jain, Jonathan Cameron, Christopher Lim

2011 IEEE Fourth International Conference on Space Mission Challenges for Information Technology > 177 - 183

2011 IEEE International Conference on Space Mission Challenges for Information Technology (SMC-IT)

Physics-based simulations are actively used in the design, testing, and operations phases of surface and near-surface planetary space missions. One of the challenges in real-time simulations is the ability to handle large multi-resolution terrain data sets within models as well as for visualization. In this paper, we describe special techniques that we have developed for visualization, paging, and...

chapter

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators

Emmanuel Agullo, Cedric Augonnet, Jack Dongarra, Mathieu Faverge, et al.

2011 IEEE International Parallel & Distributed Processing Symposium > 932 - 943

2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS)

One of the major trends in the design of exascale architectures is the use of multicore nodes enhanced with GPU accelerators. Exploiting all resources of a hybrid accelerators-based node at their maximum potential is thus a fundamental step towards exascale computing. In this article, we present the design of a highly efficient QR factorization for such a node. Our method is in three steps. The first...

Keywords: TILES
Publication type: book

Content availability

Available (37)
None (2)

Keywords

KERNEL (12)
GPU (11)
COPROCESSORS (10)
INSTRUCTION SETS (9)
COMPUTER GRAPHIC EQUIPMENT (8)
RENDERING (COMPUTER GRAPHICS) (8)
REAL TIME SYSTEMS (7)
HARDWARE (6)
COMPUTER ARCHITECTURE (5)
PARALLEL PROCESSING (5)
ARRAYS (4)
OPTIMIZATION (4)
RUNTIME (4)
ALGORITHM DESIGN AND ANALYSIS (3)
BANDWIDTH (3)
CAMERAS (3)
CUDA (3)
GPGPU (3)
GRAPHICS (3)
GRAPHICS PROCESSOR (3)
IMAGE COLOR ANALYSIS (3)
IMAGE RESOLUTION (3)
JACOBIAN MATRICES (3)
MATHEMATICAL MODEL (3)
PIXEL (3)
RANDOM ACCESS MEMORY (3)
STRIPS (3)
TERRAIN MAPPING (3)
TILING (3)
ACCELERATION (2)
CACHE STORAGE (2)
CENTRAL PROCESSING UNIT (2)
COMPUTATIONAL MODELING (2)
COMPUTER GRAPHICS (2)
CPU (2)
DATA MODELS (2)
DATA STRUCTURES (2)
DATA VISUALISATION (2)
DATA VISUALIZATION (2)
DYNAMIC PROGRAMMING (2)
EQUATIONS (2)
GPU-BASED (2)
GRAPHICS PROCESSING UNITS (2)
GRAPHICS RENDERING (2)
HEURISTIC ALGORITHMS (2)
IMAGE ANALYSIS (2)
IMAGE TEXTURE (2)
LAYOUT (2)
LIBRARIES (2)
LOOP TILING (2)
MEASUREMENT (2)
MULTICORE PROCESSING (2)
NOISE (2)
OPTIMISATION (2)
PARALLEL ALGORITHMS (2)
PERFORMANCE EVALUATION (2)
PROCESSOR SCHEDULING (2)
REGISTERS (2)
SMITH-WATERMAN ALGORITHM (2)
SYSTEM-ON-A-CHIP (2)
TERRAIN VISUALIZATION (2)
VECTORS (2)
VISUALIZATION (2)
3D GRAPHICS (1)
A* ALGORITHM (1)
ACCELERATORS (1)
ADVANCED ENCRYPTION STANDARD (1)
ADVANCED SHADER LANGUAGE (1)
ALTERNATING DIRECTION IMPLICIT APPROXIMATE FACTORIZATION (1)
AMERICAN OPTION (1)
APPROXIMATION METHODS (1)
ART (1)
ASTRONOMY COMPUTING (1)
AUTO-TUNING (1)
AUTOMATIC POINT TARGET DETECTION (1)
BENCHMARK TESTING (1)
BIOLOGICAL SEQUENCE ALIGNMENT (1)
BIOLOGY (1)
BIOLOGY COMPUTING (1)
BISMUTH (1)
BYFIELD LAYOUT (1)
CACHE (1)
CHOLESKY MATRIX FACTORIZATION (1)
CLOCKS (1)
CLUSTER (1)
CNB COHERENCY MECHANISM (1)
COALESCED MEMORY ACCESS (1)
COLLABORATION (1)
COLLABORATIVE FILTERING (1)
COLOR (1)
COLOR MATCHING (1)
COLOR-CODED QUALITY MEASUREMENT (1)
COLORED NOISE (1)
COMPUTE UNIFIED DRIVER ARCHITECTURE (1)
COMPUTE-INTENSIVE KERNEL (1)
COMPUTER SCIENCE (1)
CONVERGENCE (1)
COSMOLOGICAL DATA SETS (1)
