Outlier detection is a data mining task that consists of discovering observations which deviate substantially from the rest of the data, and it has many important practical applications. Outlier detection in very large data sets is, however, computationally very demanding, and the size limit of the data that can be processed is pushed considerably forward by combining three ingredients: efficient algorithms,...
Graph500 is a data-intensive application for high-performance computing, and it is an increasingly important workload because graphs are a core part of most analytic applications. So far, no work has examined whether Graph500 is suitable for vectorization, mostly due to a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released...
This paper proposes a multi-GPU parallelization of exact string matching algorithms based on the backward-search procedure, using indexing techniques such as the Burrows-Wheeler Transform and the FM-Index. To attain an efficient execution on highly heterogeneous parallel platforms, the proposed parallelization adopted a unified OpenCL implementation that allows its execution either in CPUs...
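As a reader's illustration of the backward-search procedure named in the abstract above, the following is a minimal sequential sketch in Python. It is not the paper's OpenCL implementation. The function names, the `$` sentinel, and the naive suffix-array construction are all assumptions made for this example, and the input text is assumed to contain only characters that sort after `$`.

```python
def bwt_via_suffix_array(text):
    """Return (bwt, suffix_array) for text plus a '$' sentinel (illustrative, O(n^2 log n))."""
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT character for a suffix is the character preceding it (wrapping to '$').
    bwt = "".join(s[i - 1] for i in sa)
    return bwt, sa


def build_fm_index(bwt):
    """Build the C table and occurrence counts used by backward search."""
    alphabet = sorted(set(bwt))
    # C[c]: number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array interval [lo, hi) matching pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):  # consume the pattern right to left
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_all(text, pattern):
    """Return the sorted start positions of pattern in text."""
    bwt, sa = bwt_via_suffix_array(text)
    C, occ = build_fm_index(bwt)
    lo, hi = backward_search(pattern, C, occ, len(bwt))
    return sorted(sa[i] for i in range(lo, hi))
```

For example, `find_all("banana", "ana")` returns `[1, 3]`. Each pattern character costs two table lookups regardless of text length, and separate patterns can be searched independently of one another.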
Bias-scalability in analog CMOS circuits refers to a current-mode design paradigm where the operation of the circuit remains invariant to the operating conditions (weak-inversion, moderate-inversion or strong-inversion) of the transistors. In this paper we present the design and implementation of a bias-scalable analog support vector machine (SVM) based on our previously reported margin propagation...
In general-purpose graphics processing unit (GPGPU) computing, data is processed by concurrent threads executing the same function. This model, dubbed single-instruction/multiple-thread (SIMT), requires programmers to coordinate the synchronous execution of similar operations across thousands of data elements. To alleviate this programmer burden, Gaster and Howes outlined the channel abstraction,...
A new unified mathematical framework for sensor array processing is proposed. The proposed framework combines Bayesian estimation with stochastic geometry to accommodate prior information, uncertainty in array parameters, and unknown and stochastically time-varying number of nonstationary sources. A system model for a common signal setting is constructed to demonstrate the proposed framework.
Computer game, a new field of artificial intelligence, aims, as the name suggests, to make the computer learn to think and play chess games as human beings do. As one of the important research fields of artificial intelligence, computer game, which is regarded as the touchstone of artificial intelligence, has brought many important methods and theories to the field. Connect6 is a newly introduced...
The paging mechanism is widely used in modern systems to manage virtual memory, and many page replacement algorithms have been proposed. The correctness and reliability of virtual memory management systems have therefore become very important, and it is essential to formalize and verify such systems. In this paper, we model the virtual memory management system with MSVL, which is a parallel...
Large-scale graph structures are considered a keystone for many emerging high-performance computing applications in which Breadth-First Search (BFS) is an important building block. For such graph structures, BFS operations tend to be memory-bound rather than compute-bound. In this paper, we present an efficient reconfigurable architecture for parallel BFS that adopts new optimizations for utilizing...
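For readers unfamiliar with the building block named in the abstract above, here is a minimal sketch of level-synchronous BFS in Python. It only illustrates the general technique and is not the reconfigurable architecture the paper describes. The adjacency-dictionary representation and the function name `bfs_levels` are assumptions made for this example.

```python
def bfs_levels(adj, source):
    """Return a dict mapping each reachable vertex to its BFS depth from source.

    adj is assumed to be a dict from vertex to an iterable of neighbours.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # Every vertex in the current frontier could be expanded in parallel;
        # here they are processed sequentially for clarity.
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in level:  # first visit fixes the shortest depth
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level
```

The inner loop does little arithmetic and mostly follows neighbour lists to scattered addresses, which is consistent with the abstract's remark that BFS tends to be memory-bound rather than compute-bound.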
This paper presents a GPU-based wave-front propagation technique for multi-agent path planning in extremely large, complex, dynamic environments. Our work proposes an adaptive subdivision of the environment with efficient indexing, update, and neighbor-finding operations on the GPU to address several known limitations in prior work. In particular, an adaptive environment representation reduces the...
There has been a growing trend in using heterogeneous systems with CPUs and GPUs to solve diverse compute problems. However, high application performance on these platforms relies on efficient memory accesses. For many applications, CPUs and GPUs prefer different memory mappings and data structure layouts. This in turn requires developers to use device-specific strategies for memory access optimizations...
With energy efficiency and power consumption being the primary impediments on the path to exascale systems, low-power, high-performance embedded systems are of increasing interest. The Parallella System-on-module (SoM) created by Adapteva combines the Epiphany-IV 64-core coprocessor with a host ARM processor housed in a Zynq System-on-chip. The Epiphany integrates low-power RISC cores on a 2D mesh network...
In this work, we present a back-end for the Python library NumPy that utilizes the GPU seamlessly. We use dynamic code generation to generate kernels, and data is moved transparently to and from the GPU. For the integration into NumPy, we use the Bohrium runtime system. Bohrium hooks into NumPy through the implicit data parallelization of array operations; this approach requires no annotations or...
Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising class of parallel architectures with both high performance and high power-efficiency. Inner loop pipelining and outer loop merging techniques are usually used to improve execution performance when mapping loops onto a CGRA. However, the number of concurrently executable operators (CEOs) from the kernel still cannot make the best of PEs in...
This paper confirms the suitability of kernel principal component analysis (KPCA) as a robust feature extraction and denoising method in sensor-array-based vapor detection systems (E-nose). In particular, the study focuses on the response analysis of a surface acoustic wave (SAW) sensor array in chemical class recognition of volatile organic compounds (VOCs). Usually KPCA results deprived performance compare...
In this paper, we show the applicability of combinatorial testing to the system call interface of the Linux kernel. Our approach is two-fold: first we analyze the Trinity fuzz tester, and then we adapt the input parameter modeling of Trinity to the field of combinatorial testing. Furthermore, apart from the modeling itself, we aim to provide a configurable testing framework for executing...
Changing requirements have brought about a revolution in the field of parallel computing. The emergence of parallel computing as a necessity has boosted the use of GPGPUs for this purpose, and with it has come a drastic improvement in many real-world applications of GPGPUs. In this paper we discuss GPGPUs, their evolution, and their contribution to many...
Efficient memory sharing between CPU and GPU threads can greatly expand the effective set of GPGPU workloads. For increased programmability, this memory should be uniformly virtualized, necessitating compatible address translation support for GPU memory references. However, even a modest GPU might need 100s of translations per cycle (6 CUs * 64 lanes/CU) with memory access patterns designed for throughput...
This paper presents the design of a multiplierless kernel operation for a binary support vector machine (SVM) based on a systolic array architecture. The design provides reduced area, reduced cost, and high-speed performance due to the use of the multiplierless kernel operation. A binary SVM classifier classifies two groups of linearly or nonlinearly separable data. We have designed an algorithm which is expected...
For loop accelerators such as coarse-grained reconfigurable architectures (CGRAs) and GP-GPUs, nested loops represent an important source of parallelism. Existing solutions to mapping nested loops on CGRAs, however, are either designed for perfectly nested loops only, or expensive and inflexible. Efficient CGRA mapping of imperfect loops with arbitrary nesting depth still remains a challenge. In this...