The increasing adoption of GPUs as mainstream computing devices, coupled with the imminent availability of large high-bandwidth caches based on die-stacked memory makes it important to analyze and understand modern GPU compute applications from the perspective of their memory access and data reuse characteristics. This paper presents detailed workload characterization studies on four GPU compute applications...
Recent advancements in Graphics Processing Unit (GPU) architecture enable the acceleration of many general-purpose applications. Even with high memory bandwidth, GPUs still face the challenge of accelerating highly memory-intensive applications. To overcome this challenge, this paper investigates the impact of scaling up the memory partitions and also scaling the frequency of the...
With transistor energy efficiency not scaling at the same rate as transistor density and frequency, CMOS technology has hit a utilization wall, whereby large portions of the chip remain underclocked. To improve performance while keeping power dissipation at a realistic level, future computing devices will consist of heterogeneous application-specific accelerators. The accelerators have to be synthesised...
Much research shows that we encounter the Hughes phenomenon when dealing with the high-dimensional data classification problem. In addition, the non-linear support vector machine (SVM) has been shown to handle the problem efficiently. However, the SVM is a black-box model built on the whole feature set, and it does not provide the feature importance or a “good” feature subset for classification...
Graphics Processing Units (GPUs), based on the Single Instruction Multiple Thread (SIMT) architecture, are emerging as more efficient platforms than Multiple Instruction Multiple Data (MIMD) architectures for exploiting parallelism. A GPU has numerous shader cores and thousands of simultaneous fine-grained active threads. These threads are grouped into Cooperative Thread Arrays (CTAs). All the threads within...
The recent development of multi-agent simulations brings about a need for population synthesis: the task of reconstructing the entire population from a sampling survey of limited size (1% or so), supplying the initial conditions from which simulations begin. This paper presents a new kernel density estimator for this task. Our method is an analogue of the classical Breiman-Meisel-Purcell estimator,...
This paper presents an effective image structure classification method, recently proposed for selecting the key parameter of non-local kernel regression (NLKR), namely the kernel bandwidth. Meanwhile, to overcome the intensive computational cost of non-local patch searching in NLKR, a fast patch searching strategy is proposed according to the classified structure regions. The...
High Performance Computing (HPC) aggregates computing power in order to solve large and complex problems in different knowledge areas. Nowadays, HPC users can utilize virtualized infrastructures as a low-cost alternative to deploy their applications. However, virtualization brings some challenges for HPC, especially with regard to the overhead caused by hypervisors. In this work, our main goal is to analyze...
The Active Memory Cube (AMC) is a novel near-memory processor that exploits high memory bandwidth and low latency close to DRAM to execute scientific applications in an energy-efficient manner. Its energy efficiency derives from the combination of its novel scalar-vector data-flow path and its simple control-flow path, which required the development of a sophisticated compiler, co-designed...
Many target tracking algorithms for radar systems assume homogeneous backgrounds of clutter. However, real backgrounds are rarely homogeneous. By estimating background intensity, and using the estimate in the likelihood measure, the tracking algorithm is given the ability to adapt to the background. In this work, a method for estimating the clutter intensity is introduced. The method is based on locally...
The performance of a distributed file system significantly affects data-intensive applications that frequently execute I/O operations on large amounts of data. Although many modern distributed file systems are geared to provide highly efficient I/O performance, their operations are nonetheless affected by runtime overhead in data transfer between client nodes and I/O servers. A large part of the overhead...
In this work, we present the characterization of a set of scientific kernels which are representative of the behavior of fundamental and applied physics applications across a wide range of fields. We collect performance attributes in the form of micro-operation mix and off-chip memory bandwidth measurements for these kernels. Using these measurements, we apply two clustering methodologies to show which...
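Grouping kernels by such performance attributes can be sketched with ordinary k-means; the excerpt above does not name the two clustering methodologies, so the algorithm choice, the feature vectors, and the seeding below are purely illustrative:

```python
def kmeans(points, k, iters=20):
    """Plain k-means over per-kernel feature vectors (e.g. fraction of
    memory micro-ops, normalized off-chip bandwidth). A generic stand-in,
    not the paper's actual methodology."""
    centroids = [list(p) for p in points[:k]]  # deterministic seed: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each kernel to the centroid with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters

# Hypothetical feature vectors: (memory micro-op fraction, normalized bandwidth).
kernels = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
centroids, clusters = kmeans(kernels, k=2)
```

With these made-up vectors, the two memory-heavy kernels and the two bandwidth-heavy kernels fall into separate clusters.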
We consider the sampling of signals with finite rate of innovation (FRI) in parameter space to reach the minimal sampling rate. Although the sampling of signals with unknown time locations has been treated in previous works, it is difficult to sample signals with unknown parameters in other parameter spaces. In this paper, we redefine the signal with FRI and propose a general framework of the FRI...
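For context, the classical FRI model that such generalizations start from is a stream of K weighted Diracs on an interval of length τ; the abstract's redefinition is not shown here, only the standard textbook form:

```latex
% Classical FRI signal model: K weighted Diracs on an interval of length \tau
x(t) = \sum_{k=1}^{K} a_k\, \delta(t - t_k),
\qquad
\rho = \frac{2K}{\tau}
```

The signal is determined by 2K free parameters (amplitudes a_k and time locations t_k), so its rate of innovation ρ = 2K/τ sets the minimal sampling rate the abstract refers to.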
This paper proposes the use of a kernel density estimation to measure similarities between trajectories. The similarities are then used to predict the future locations of a target. For a given environment with a history of previous target trajectories, the goal is to establish a probabilistic framework to predict the future trajectory of currently observed targets based on their recent moves. Instead...
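A minimal one-dimensional Gaussian kernel density estimate, of the general kind such similarity measures build on, can be sketched as follows; the sample points and bandwidth are invented for illustration and the paper's actual trajectory representation is not reproduced here:

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from sample points."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observed sample, centered on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical 1-D positions observed along past target trajectories.
observed = [0.0, 0.2, 0.1, 1.9, 2.1]
pdf = gaussian_kde(observed, bandwidth=0.3)
```

The estimated density is high near clusters of past observations and low in between, which is what makes it usable as a similarity (and hence prediction) signal for newly observed targets.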
In this paper, we present a compilation flow for HPC kernels on the REDEFINE coarse-grain reconfigurable architecture (CGRA). REDEFINE is a scalable macro-dataflow machine in which the compute elements (CEs) communicate through messages. REDEFINE offers the ability to exploit a high degree of coarse-grain and pipeline parallelism. The CEs in REDEFINE are enhanced with reconfigurable macro data-paths...
A novel and intuitive way of scheduling entities on a heterogeneous multiprocessing system is presented. The key idea is to understand the behavioral characteristics (foreground/background, IO-bound/CPU-bound) of a scheduling entity to predict the need for its processing bandwidth. This is then used by the scheduler to influence the selection of the big cluster (high-performance) or the LITTLE cluster...
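The behavior-driven cluster selection described above could, under simple assumptions, look like the hypothetical heuristic below; the threshold, function name, and inputs are invented for illustration and are not the paper's actual predictor:

```python
def pick_cluster(cpu_utilization, is_foreground, big_threshold=0.6):
    """Hypothetical placement heuristic: foreground, CPU-bound entities are
    predicted to need high processing bandwidth and go to the big
    (high-performance) cluster; background or IO-bound entities go to the
    LITTLE (power-efficient) cluster."""
    if is_foreground and cpu_utilization >= big_threshold:
        return "big"
    return "LITTLE"
```

A real scheduler would feed such a decision with runtime statistics (recent CPU utilization, blocking behavior, foreground state) rather than fixed arguments.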
In this paper, we propose an FPGA memory hierarchy based on the OpenCL memory model. The memory hierarchy allows application-specific memory optimizations during design compilation using information provided in OpenCL kernels. With the proposed memory hierarchy, FPGA application developers can focus on their designs in OpenCL kernel codes, and their designs can be synthesized into FPGA hardware via...
Most network monitoring systems use a user-level network capture library. Such a user-level library incurs a large overhead and provides inaccurate and insufficient information for self-adaptive networks. For these reasons, we develop a lightweight built-in network monitor running at the Linux kernel level for self-adaptive IoT devices.
Although Cloud infrastructures can be used as High Performance Computing (HPC) platforms, many issues arising from virtualization overhead have kept them apart. However, with the advent of container-based virtualizers, this scenario acquires new perspectives, because this technique promises to decrease the virtualization overhead, achieving near-native performance. In this work, we analyzed the performance...