Search results

chapter

Accelerating outlier detection with intra- and inter-node parallelism

Fabrizio Angiulli, Stefano Basta, Stefano Lodi, Claudio Sartori

2014 International Conference on High Performance Computing & Simulation (HPCS) > 476 - 483

2014 International Conference on High Performance Computing & Simulation (HPCS)

Outlier detection is a data mining task consisting in the discovery of observations which deviate substantially from the rest of the data, and has many important practical applications. Outlier detection in very large data sets is however computationally very demanding and the size limit of the data that can be elaborated is considerably pushed forward by mixing three ingredients: efficient algorithms,...

chapter

Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi

Milan Stanic, Oscar Palomar, Ivan Ratkovic, Milovan Duric, more

2014 International Conference on High Performance Computing & Simulation (HPCS) > 47 - 54

2014 International Conference on High Performance Computing & Simulation (HPCS)

Graph500 is a data-intensive application for high-performance computing, and it is an increasingly important workload because graphs are a core part of most analytic applications. So far there is no work that examines whether Graph500 is suitable for vectorization, mostly due to a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released...

chapter

Burrows-Wheeler Transform based indexed exact search on a multi-GPU OpenCL platform

David Nogueira, Pedro Tomas, Nuno Roma

2014 International Conference on High Performance Computing & Simulation (HPCS) > 31 - 38

2014 International Conference on High Performance Computing & Simulation (HPCS)

A multi-GPU parallelization of exact string matching algorithms based on the backward-search procedure by using indexing techniques, such as the Burrows-Wheeler Transform and the FM-Index, is proposed in this paper. To attain an efficient execution on highly heterogeneous parallel platforms, the proposed parallelization adopted a unified OpenCL implementation that allows its execution either in CPUs...

chapter

A bias-scalable current-mode analog support vector machine based on margin propagation

Ming Gu, Shantanu Chakrabartty

2014 IEEE International Symposium on Circuits and Systems (ISCAS) > 273 - 276

2014 IEEE International Symposium on Circuits and Systems (ISCAS)

Bias-scalability in analog CMOS circuits refers to a current-mode design paradigm where the operation of the circuit remains invariant to the operating conditions (weak-inversion, moderate-inversion or strong-inversion) of the transistors. In this paper we present the design and implementation of a bias-scalable analog support vector machine (SVM) based on our previously reported margin propagation...

chapter

Fine-grain task aggregation and coordination on GPUs

Marc S. Orr, Bradford M. Beckmann, Steven K. Reinhardt, David A. Wood

2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA) > 181 - 192

2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)

In general-purpose graphics processing unit (GPGPU) computing, data is processed by concurrent threads executing the same function. This model, dubbed single-instruction/multiple-thread (SIMT), requires programmers to coordinate the synchronous execution of similar operations across thousands of data elements. To alleviate this programmer burden, Gaster and Howes outlined the channel abstraction,...

chapter

A stochastic geometric approach to sensor array processing

Ba Ngu Vo, Ba Tuong Vo

2014 IEEE Workshop on Statistical Signal Processing (SSP) > 236 - 239

2014 IEEE Statistical Signal Processing Workshop (SSP)

A new unified mathematical framework for sensor array processing is proposed. The proposed framework combines Bayesian estimation with stochastic geometry to accommodate prior information, uncertainty in array parameters, and unknown and stochastically time-varying number of nonstationary sources. A system model for a common signal setting is constructed to demonstrate the proposed framework.

chapter

The design and optimization of Connect6 computer game system

Chang Liu, Bingke Wu, Sichen Wu

The 26th Chinese Control and Decision Conference (2014 CCDC) > 3936 - 3940

2014 26th Chinese Control And Decision Conference (CCDC)

Computer gaming, a new field of artificial intelligence, aims, as the name suggests, to make the computer learn to think and play chess games like human beings. As one of the important research fields of artificial intelligence, computer gaming, which is considered a touchstone of artificial intelligence, has brought many important methods and theories to the field. Connect6 is a newly introduced...

chapter

Simulation and verification of the virtual memory management system with MSVL

Meng Wang, Zhenhua Duan, Cong Tian

Proceedings of the 2014 IEEE 18th International Conference on Computer Supported Cooperative Work in Design (CSCWD) > 360 - 365

2014 IEEE 18th International Conference on Computer Supported Cooperative Work in Design (CSCWD)

The paging mechanism is widely used in most modern systems to handle the virtual memory. Many page replacement algorithms have been proposed. Therefore, the correctness and reliability of virtual memory management systems become very important. It is essential to formalize and verify the system in a formal way. In this paper, we model the virtual memory management system with MSVL, which is a parallel...

chapter

CyGraph: A Reconfigurable Architecture for Parallel Breadth-First Search

Osama G. Attia, Tyler Johnson, Kevin Townsend, Philip Jones, more

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 228 - 235

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Large-scale graph structures are considered a keystone for many emerging high-performance computing applications in which Breadth-First Search (BFS) is an important building block. For such graph structures, BFS operations tend to be memory-bound rather than compute-bound. In this paper, we present an efficient reconfigurable architecture for parallel BFS that adopts new optimizations for utilizing...

chapter

GPU-based dynamic search on adaptive resolution grids

Francisco M. Garcia, Mubbasir Kapadia, Norman I. Badler

2014 IEEE International Conference on Robotics and Automation (ICRA) > 1631 - 1638

2014 IEEE International Conference on Robotics and Automation (ICRA)

This paper presents a GPU-based wave-front propagation technique for multi-agent path planning in extremely large, complex, dynamic environments. Our work proposes an adaptive subdivision of the environment with efficient indexing, update, and neighbor-finding operations on the GPU to address several known limitations in prior work. In particular, an adaptive environment representation reduces the...

chapter

Dymaxion++: A Directive-Based API to Optimize Data Layout and Memory Mapping for Heterogeneous Systems

Shuai Che, Jiayuan Meng, Kevin Skadron

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 916 - 924

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

There has been a growing trend in using heterogeneous systems with CPUs and GPUs to solve diverse compute problems. However, high application performance on these platforms relies on efficient memory accesses. For many applications, CPUs and GPUs prefer different memory mappings and data structure layouts. This in turn requires developers to use device-specific strategies for memory access optimizations...

chapter

Programming the Adapteva Epiphany 64-Core Network-on-Chip Coprocessor

Anish Varghese, Bob Edwards, Gaurav Mitra, Alistair P. Rendell

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 984 - 992

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

With energy efficiency and power consumption being the primary impediment in the path to exascale systems, low-power high performance embedded systems are of increasing interest. The Parallella System-on-module (SoM) created by Adapteva combines the Epiphany-IV 64-core coprocessor with a host ARM processor housed in a Zynq System-on-chip. The Epiphany integrates low-power RISC cores on a 2D mesh network...

chapter

Transparent GPU Execution of NumPy Applications

Troels Blum, Mads R.B. Kristensen, Brian Vinter

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 1002 - 1010

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

In this work, we present a back-end for the Python library NumPy that utilizes the GPU seamlessly. We use dynamic code generation to generate kernels, and data is moved transparently to and from the GPU. For the integration into NumPy, we use the Bohrium runtime system. Bohrium hooks into NumPy through the implicit data parallelization of array operations; this approach requires no annotations or...

chapter

Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures

Dajiang Liu, Shouyi Yin, Leibo Liu, Shaojun Wei

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines > 32

2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising parallel architecture with both high performance and high power-efficiency. Inner loop pipelining and outer loop merging techniques are usually used to improve the execution performance when mapping loops onto CGRA. However, the number of concurrently executable operators (CEOs) from the kernel still cannot make the best of PEs in...

chapter

Optimized KPCA method for chemical vapor class recognition by SAW sensor array response analysis

Sunil Kr Jha, Kenshi Hayashi

2014 IEEE Ninth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP) > 1 - 6

2014 IEEE Ninth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP)

This paper confirms the suitability of kernel principal component analysis (KPCA) as a robust feature extraction and denoising method in sensor-array-based vapor detection systems (E-nose). In particular, the study focuses on response analysis of a surface acoustic wave (SAW) sensor array in chemical class recognition of volatile organic compounds (VOCs). Usually KPCA results deprived performance compare...

chapter

Eris: A Tool for Combinatorial Testing of the Linux System Call Interface

Bernhard Garn, Dimitris E. Simos

2014 IEEE Seventh International Conference on Software Testing, Verification and Validation Workshops > 58 - 67

2014 IEEE Seventh International Conference on Software Testing, Verification and Validation Workshops (ICSTW)

In this paper, we show the applicability of combinatorial testing to the system call interface of the Linux kernel. Our approach is two-fold: first we analyze the Trinity fuzz tester, and then we adapt the input parameter modeling of Trinity to the field of combinatorial testing. Furthermore, apart from the modeling itself, we aim to provide a configurable testing framework for executing...

chapter

Importance of GPGPUs in efficiency improvement of real world applications

Shreyas Bhatia, Minal Tolpadi, Akhtar Rasool

2014 IEEE Students' Conference on Electrical, Electronics and Computer Science > 1 - 6

2014 IEEE Students' Conference on Electrical, Electronics and Computer Science (SCEECS)

The changing times have caused the requirements to change, causing a revolution in the field of parallel computing. The emergence of parallel computing as a necessity has boosted the use of GPGPUs for this purpose. With such an emergence comes a drastic improvement in many real world applications of GPGPUs as well. In this paper we discuss GPGPUs, their evolution, and their contribution to many...

chapter

Supporting x86-64 address translation for 100s of GPU lanes

Jason Power, Mark D. Hill, David A. Wood

2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA) > 568 - 578

2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)

Efficient memory sharing between CPU and GPU threads can greatly expand the effective set of GPGPU workloads. For increased programmability, this memory should be uniformly virtualized, necessitating compatible address translation support for GPU memory references. However, even a modest GPU might need 100s of translations per cycle (6 CUs * 64 lanes/CU) with memory access patterns designed for throughput...

chapter

Design of a systolic array based multiplierless support vector machine classifier

Bhaswati Mandal, Manash Pratim Sarma, Kandarpa Kumar Sarma

2014 International Conference on Signal Processing and Integrated Networks (SPIN) > 35 - 39

2014 International Conference on Signal Processing and Integrated Networks (SPIN)

This paper presents the design of a multiplierless kernel operation for a binary Support Vector Machine (SVM) which is based on a systolic array architecture. This design provides reduced area, reduced cost and high speed performance due to the use of multiplierless kernel operation. A binary SVM classifier classifies two groups of linearly or nonlinearly separable data. We have designed an algorithm which is expected...

chapter

Flattening-based mapping of imperfect loop nests for CGRAs

Jongeun Lee, Seongseok Seo, Hongsik Lee, Hyeon Uk Sim

2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) > 1 - 10

2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)

For loop accelerators such as coarse-grained reconfigurable architectures (CGRAs) and GP-GPUs, nested loops represent an important source of parallelism. Existing solutions to mapping nested loops on CGRAs, however, are either designed for perfectly nested loops only, or expensive and inflexible. Efficient CGRA mapping of imperfect loops with arbitrary nesting depth still remains a challenge. In this...

INFONA - science communication portal
