Search results

Items from 1 to 9 out of 9 results

chapter

Fast time-domain Volterra filtering

Harald Enzinger, Karl Freiberger, Gernot Kubin, Christian Vogel

2016 50th Asilomar Conference on Signals, Systems and Computers > 225 - 228

We present two algorithms for fast time-domain Volterra filtering. The first algorithm computes the required products of input samples using only one multiplication per term. Since the products are explicitly computed, this algorithm can be used for adaptation as well as for filtering. The second algorithm generalizes Horner's method for polynomial evaluation and directly computes output samples without...

chapter

LIBXSMM: Accelerating Small Matrix Multiplications by Runtime Code Generation

Alexander Heinecke, Greg Henry, Maxwell Hutchinson, Hans Pabst

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis > 981 - 991

Many modern, highly scalable scientific simulation packages rely on small matrix multiplications as their main computational engine. Math libraries or compilers are unlikely to provide the best possible kernel performance. To address this issue, we present a library which provides high performance small matrix multiplications targeting all recent x86 vector instruction set extensions up to Intel AVX-512...

chapter

FACE-CHANGE: Application-Driven Dynamic Kernel View Switching in a Virtual Machine

Zhongshu Gu, Brendan Saltaformaggio, Xiangyu Zhang, Dongyan Xu

2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks > 491 - 502

2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Kernel minimization has already been established as a practical approach to reducing the trusted computing base. Existing solutions have largely focused on whole-system profiling - generating a globally minimum kernel image that is being shared by all applications. However, since different applications use only part of the kernel's code base, the minimized kernel still includes an unnecessarily large...

chapter

Parallelism Extraction Algorithm from Stream-Based Processing Flow Applying Spanning Tree

Guyue Wang, Shinichi Yamagiwa, Koichi Wada

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 632 - 641

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Manycore architectures promote massively parallel computing on accelerators. GPUs in particular are a mainstay of high performance computing and are employed by top supercomputers around the world. Programming such accelerators includes developing a control program. The accelerator executes it to schedule the invocation timing of the accelerator's kernel program...

chapter

Rootbeer: Seamlessly Using GPUs from Java

Philip C. Pratt-Szeliga, James W. Fawcett, Roy D. Welch

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 375 - 380

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

When converting a serial program to a parallel program that can run on a Graphics Processing Unit (GPU), the developer must choose which functions will run on the GPU. For each function the developer chooses, he or she needs to manually write code to: 1) serialize state to GPU memory, 2) define the kernel code that the GPU will execute, 3) control the kernel launch and 4) deserialize state back to CPU...

chapter

An OpenCL Framework for Homogeneous Manycores with No Hardware Cache Coherence

Jun Lee, Jungwon Kim, Junghyun Kim, Sangmin Seo, et al.

2011 International Conference on Parallel Architectures and Compilation Techniques > 56 - 67

2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)

Intel recently introduced a research prototype manycore processor called the Single-chip Cloud Computer (SCC). The SCC is an experimental processor created by Intel Labs. It contains 48 cores on a single chip, and each core has its own L1 and L2 caches without any hardware support for cache coherence. It supports up to 64 GB of external memory that can be accessed by all cores and each core...

chapter

Fast multipole method on GPU: Tackling 3-D capacitance extraction on massively parallel SIMD platforms

Xueqian Zhao, Zhuo Feng

2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC) > 558 - 563

To facilitate full chip capacitance extraction, field solvers are typically deployed for characterizing capacitance libraries for various interconnect structures and configurations. In the past decades, various algorithms for accelerating boundary element methods (BEM) have been developed to improve the efficiency of field solvers for capacitance extraction. This paper presents the first massively...

chapter

Multi-operand block-floating point arithmetic for image processing

A Lipchin, I Reyzin, D Lisin, M Saptharishi

2010 IEEE Workshop On Signal Processing Systems > 122 - 127

2010 IEEE Workshop on Signal Processing Systems (SiPS 2010)

We extend the application of block-floating point arrays to multi-operand algebraic expressions consisting of additions and multiplications. The proposed method enables automatic runtime calculation of binary shifts of array elements. The shifts are computed for all elementary operations in an expression using a dataflow graph. The method attempts to preserve accuracy across the entire expression...

chapter

A Batched GPU Algorithm for Set Intersection

Di Wu, Fan Zhang, Naiyong Ao, Fang Wang, et al.

2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks > 752 - 756

2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks (ISPAN 2009)

Intersection of inverted lists is a frequently used operation in search engine systems. Efficient CPU and GPU intersection algorithms for large problem sizes are well studied. We propose an efficient GPU algorithm for high performance intersection of inverted index lists on the CUDA platform. This algorithm feeds queries to the GPU in batches and can thus take full advantage of GPU processor cores even if problem...

Filter options

Data set:
ieee
Keywords:
INDEXES
KERNEL
RUNTIME

Keywords

ARRAYS (3)
COMPILERS (2)
GRAPHICS PROCESSING UNIT (2)
SIGNAL PROCESSING ALGORITHMS (2)
ACCURACY (1)
ALGORITHM DESIGN AND ANALYSIS (1)
ALGORITHMS (1)
ATTACK PROVENANCE (1)
ATTACK SURFACE MINIMIZATION (1)
BATCHED GPU ALGORITHM (1)
BLOCK CSR (1)
BLOCK-FLOATING POINT (1)
CACHE COHERENCE (1)
CAPACITANCE (1)
CAPACITANCE EXTRACTION (1)
CARAVELA (1)
CODE GENERATION (1)
COHERENCE (1)
COMPUTATIONAL COMPLEXITY (1)
COMPUTER VISION (1)
CONDUCTORS (1)
CONTEXT (1)
CONVOLUTION (1)
COPPER (1)
COPROCESSORS (1)
CPU INTERSECTION ALGORITHM (1)
CUDA PLATFORM (1)
DATA FLOW GRAPHS (1)
DATAFLOW GRAPH (1)
DIGITAL ARITHMETIC (1)
DIGITAL SIGNAL PROCESSING CHIPS (1)
DISTRIBUTED PROCESSING (1)
FEM (1)
FIXED-POINT (1)
FIXED-POINT PROCESSORS (1)
FREQUENCY-DOMAIN ANALYSIS (1)
GPGPU (1)
GPU (1)
GPU INTERSECTION ALGORITHM (1)
GPU PROCESSOR CORES (1)
GPUS (1)
GRAPHICS PROCESSING UNITS (1)
HEURISTIC ALGORITHMS (1)
HIGH PERFORMANCE COMPUTING (1)
IMAGE PROCESSING (1)
INPUT PREPROCESSING METHOD (1)
INVERTED LISTS (1)
JAVA (1)
JIT COMPILATION (1)
LIBRARIES (1)
LOAD IMBALANCE (1)
LOADING (1)
MATRIX DECOMPOSITION (1)
MEMORY CONSISTENCY (1)
MINIMIZATION (1)
MULTIOPERAND ALGEBRAIC EXPRESSIONS (1)
MULTIOPERAND BLOCK-FLOATING POINT ARITHMETIC (1)
OPENCL (1)
PARALLEL FAST MULTIPOLE METHOD (1)
PARALLEL PROCESSING (1)
PIXEL (1)
RESOURCE ALLOCATION (1)
SEARCH ENGINE SYSTEMS (1)
SEARCH ENGINES (1)
SEM (1)
SET INTERSECTION (1)
SINGLE-CHIP CLOUD COMPUTER (1)
SMALL GEMM (1)
SPANNING TREE ALGORITHM (1)
SPARSE MATRICES (1)
STREAM COMPUTING (1)
SWITCHES (1)
SYNCHRONIZATION (1)
TIME-DOMAIN ANALYSIS (1)
VIRTUALIZATION (1)
XML (1)
YARN (1)

INFONA - science communication portal
