Most stereoscopic 3D (S3D) image visual discomfort predictors use the Support Vector Regressor (SVR) as the regression model. However, other strong regression models exist, such as Random Forests (RF) and Gradient Boosted Regression Trees (GBRT). Here we study the efficacy of these regression models for S3D image visual discomfort prediction. We deployed several regression models to predict the...
This paper provides a performance evaluation of the PL330 DMA controller in a Zynq SoC-based device. Direct Memory Access (DMA) is a feature that allows computer hardware to access system memory for bulk data movement without CPU intervention. I/O devices operate at a slower speed than the CPU, but with DMA the CPU remains available for other computing tasks while data is transferred, as the CPU has to only initiate...
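The idea the abstract describes — the CPU only sets up and starts a transfer, and the engine moves the data — can be sketched as a toy descriptor-based DMA model. This is an illustration only; the names are invented and it does not use the real PL330 register or instruction interface.

```python
# Toy model of descriptor-based DMA: the CPU fills in a transfer descriptor
# and starts the engine; the copy itself is done "in hardware" (here, a
# single bulk operation standing in for the DMA engine moving data).
from dataclasses import dataclass

@dataclass
class DmaDescriptor:
    src: bytearray   # source buffer
    dst: bytearray   # destination buffer
    length: int      # bytes to move

class DmaEngine:
    def __init__(self):
        self.done = False

    def start(self, desc: DmaDescriptor) -> None:
        """CPU-side initiation: program the descriptor and kick off."""
        # The bulk copy models the engine working without per-byte CPU help.
        desc.dst[:desc.length] = desc.src[:desc.length]
        self.done = True   # models the completion interrupt/status flag

src = bytearray(b"zynq dma demo payload")
dst = bytearray(len(src))
engine = DmaEngine()
engine.start(DmaDescriptor(src, dst, len(src)))
assert dst == src and engine.done
```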
Using multiple streams can improve overall system performance by mitigating the data transfer overhead on heterogeneous systems. Prior work focuses largely on GPUs, but little is known about the performance impact on the Intel Xeon Phi. In this work, we apply multiple streams to six real-world applications on the Phi. We then systematically evaluate the performance benefits of using multiple streams...
The recent advent of stacked memory devices has led to a resurgence of research associated with the fundamental memory hierarchy and associated memory pipeline. The bandwidth advantages provided by stacked logic and DRAM devices have inspired research associated with eliminating the bandwidth bottlenecks associated with many applications in high performance computing. Further, recent efforts have focused...
The increasing programmability, performance, and cost-effectiveness of GPUs have led to widespread use of such many-core architectures to accelerate general-purpose applications. Nevertheless, tuning applications to efficiently exploit the GPU's potential is a very challenging task, especially for inexperienced programmers. This is due to the difficulty of developing a SW application for the specific...
Linux container virtualisation is gaining momentum as a lightweight technology to support cloud and distributed computing. Applications relying on container architectures may at times rely on inter-container communication, and container networking solutions are emerging to address this need. Containers can be networked together as part of an overlay network, or with actual links from the container...
Heterogeneous systems, which marry CPUs and GPUs in a range of configurations, are quickly becoming the design paradigm for today's platforms because of their impressive parallel processing capabilities. However, in many existing heterogeneous systems, the GPU is treated only as an accelerator by the CPU, working as a slave to the CPU master. Recently, however, we are starting to see the introduction...
The classification of an image scene with multiple class labels poses a significant challenge to researchers. A semantic scene may be described by multiple objects or by multiple classes; for example, a beach scene may also contain mountains or buildings in the background. This research work proposes a multi-label scene classification model using Binary Relevance (BR) based one-versus-rest...
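Binary Relevance, as named above, decomposes a multi-label problem into one independent one-versus-rest binary classifier per label and takes the union of positive predictions. A minimal sketch follows; the tiny nearest-centroid "classifier" and the toy feature vectors are stand-ins invented for illustration, not the model used in the paper.

```python
# Binary Relevance (BR): one binary "label present vs. absent" model per label.
def train_centroid(X, y):
    """Fit one binary classifier: (positive centroid, negative centroid)."""
    pos = [x for x, t in zip(X, y) if t]
    neg = [x for x, t in zip(X, y) if not t]
    mean = lambda rows: [sum(col) / len(rows) for col in zip(*rows)]
    return mean(pos), mean(neg)

def predict_one(model, x):
    cpos, cneg = model
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(x, c))
    return dist(cpos) < dist(cneg)

def binary_relevance_fit(X, Y, labels):
    # one independent binary problem per label
    return {lab: train_centroid(X, [lab in ys for ys in Y]) for lab in labels}

def binary_relevance_predict(models, x):
    return {lab for lab, m in models.items() if predict_one(m, x)}

# toy scene features [sand, rock]; each image has a set of labels
X = [[0.9, 0.1], [0.8, 0.7], [0.1, 0.9]]
Y = [{"beach"}, {"beach", "mountain"}, {"mountain"}]
models = binary_relevance_fit(X, Y, {"beach", "mountain"})
assert binary_relevance_predict(models, [0.85, 0.2]) == {"beach"}
assert binary_relevance_predict(models, [0.1, 0.85]) == {"mountain"}
```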
In Linux, sysfs entries are created to let the kernel export information to user-space processes as well as to accept user input. Accesses to these entries go through the file system to locate the show and store functions registered for them. Although this method is a good way to pass input from user space to the kernel while restricting access, it is slower because it has to go through the file system...
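The registration-and-dispatch pattern the abstract describes can be modeled in user space: each entry registers a show (read) and store (write) callback, and a path lookup dispatches to them. All names below are illustrative, not the kernel API.

```python
# User-space model of sysfs: path -> registered show/store callbacks.
class SysfsAttr:
    def __init__(self, show, store):
        self.show, self.store = show, store

_registry = {}   # stands in for the sysfs tree walk through the file system

def register(path, show, store):
    _registry[path] = SysfsAttr(show, store)

def read(path):
    return _registry[path].show()      # like `cat /sys/...`

def write(path, value):
    _registry[path].store(value)       # like `echo ... > /sys/...`

_state = {"brightness": 50}
register("/sys/class/demo/brightness",
         show=lambda: str(_state["brightness"]),
         store=lambda v: _state.__setitem__("brightness", int(v)))

write("/sys/class/demo/brightness", "80")
assert read("/sys/class/demo/brightness") == "80"
```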
A novel scheme for fast browser launch is presented. Our scheme caches the frame buffer data of a launched browser using non-volatile memories, and reuses the cached data when the browser is launched later. Through implementation, we show that our scheme significantly reduces browser launch time.
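The caching idea can be sketched as: on a cold launch, render the frame buffer and save it to a persistent cache; on later launches, reuse the cached bytes instead of re-rendering. A dict stands in for the non-volatile memory here, and all names are invented for illustration.

```python
# Frame-buffer caching model: render once, reuse thereafter.
nvm_cache = {}       # stands in for the non-volatile frame-buffer cache
render_calls = 0

def render_framebuffer(url):
    """Expensive first-launch rendering (simulated)."""
    global render_calls
    render_calls += 1
    return ("pixels for " + url).encode()

def launch(url):
    if url not in nvm_cache:               # cold launch: render and cache
        nvm_cache[url] = render_framebuffer(url)
    return nvm_cache[url]                  # warm launch: reuse cached data

first = launch("https://example.com")
second = launch("https://example.com")
assert first == second and render_calls == 1   # rendered only once
```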
On embedded devices, physical memory is a critical resource. RAM should be used very efficiently without affecting the performance of the device. In-kernel memory swapping is a Linux feature that creates a RAM-based swap area and provides a form of virtual memory compression. It increases performance by using a compressed block device in RAM, instead of disk, for paging. Since in-kernel memory swapping...
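The mechanism above (zram-style swapping) can be modeled in a few lines: instead of writing an evicted page to disk, compress it into a RAM-backed store, and decompress it on swap-in. `zlib` stands in for the kernel's compressor, and the structure is a sketch, not the kernel implementation.

```python
# Toy model of a compressed, RAM-based swap area.
import zlib

ram_swap = {}    # page number -> compressed bytes (the RAM swap area)

def swap_out(page_no, page_bytes):
    ram_swap[page_no] = zlib.compress(page_bytes)

def swap_in(page_no):
    return zlib.decompress(ram_swap.pop(page_no))

page = b"A" * 4096                    # a highly compressible 4 KiB page
swap_out(7, page)
assert len(ram_swap[7]) < len(page)   # compression saves RAM
assert swap_in(7) == page             # swap-in restores the page exactly
```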
According to statistics, traditional servers suffer from low resource utilization and high energy consumption. To reduce costs, more and more companies are beginning to build virtualized servers. Server virtualization implements the mapping from virtual resources to physical resources and deals with resource contention among all VMs. Because of the complexity of virtualized server systems, it is necessary to...
Traditional storage media such as hard disks and NAND flash are relatively high-latency devices, so I/Os from and to such devices are, in most cases, completed asynchronously via interrupts. However, the introduction of ultra-low-latency devices based on next-generation non-volatile memory changes the appropriate way to complete I/O requests. A better way to complete I/O requests is therefore needed...
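The two completion styles contrasted above can be sketched side by side: sleeping until a completion "interrupt" fires, versus busy-polling a completion flag, which avoids the sleep/wake cost when the device finishes almost immediately. The device and function names below are invented for illustration.

```python
# Interrupt-style vs. polling-style I/O completion, modeled with threads.
import threading

class FakeDevice:
    def __init__(self):
        self.complete = threading.Event()

    def submit(self, delay=0.001):
        # the "device" completes the request from another thread
        threading.Timer(delay, self.complete.set).start()

def wait_interrupt(dev):
    dev.complete.wait()    # sleep until woken, like the IRQ-driven path

def wait_polling(dev):
    while not dev.complete.is_set():   # spin: no context switch
        pass

for waiter in (wait_interrupt, wait_polling):
    dev = FakeDevice()
    dev.submit()
    waiter(dev)
    assert dev.complete.is_set()
```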
Streaming dataflow applications, such as video conferencing systems, are often subjected to traffic occurring in bursts. As systems consisting of a CPU and a GPU become ubiquitous, efficient utilization of such platforms for handling bursts of data becomes an interesting problem. For GPUs to be efficient, the chunk size of the data to process must be large. The bursty nature of...
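The large-chunk requirement suggests buffering bursty input until a chunk fills before dispatching it, so the device is always handed big batches. A minimal sketch, with all names invented for illustration:

```python
# Buffer bursty items into fixed-size chunks before "GPU" dispatch.
def chunked_dispatch(stream, chunk_size, process):
    buf, results = [], []
    for item in stream:
        buf.append(item)
        if len(buf) >= chunk_size:    # chunk full: hand it to the device
            results.append(process(buf))
            buf = []
    if buf:                           # flush the final partial chunk
        results.append(process(buf))
    return results

bursty = [1, 2, 3, 4, 5, 6, 7]        # e.g. frames arriving in bursts
batches = chunked_dispatch(bursty, 3, lambda c: sum(c))
assert batches == [6, 15, 7]          # two full chunks plus one flush
```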
Architecture designers tend to integrate both CPU and GPU on the same chip to deliver energy-efficient designs. To effectively leverage the power of both CPUs and GPUs on integrated architectures, researchers have recently put substantial efforts into co-running a single application on both the CPU and the GPU of such architectures. However, few studies have been performed to analyze a wide range...
The functional simulator Simics provides a co-simulation integration path with a SystemC simulation environment to create Virtual Platforms. With increasing complexity of the SystemC models, this platform suffers from performance degradation due to the single threaded nature of the integrated Virtual Platform. In this paper, we present a multi-threaded Simics SystemC platform solution that significantly...
High Performance Computing (HPC) aggregates computing power in order to solve large and complex problems in different knowledge areas. Nowadays, HPC users can utilize virtualized infrastructures as a low-cost alternative for deploying their applications. However, virtualization brings some challenges for HPC, especially with regard to the overhead caused by hypervisors. In this work, our main goal is to analyze...
Kernel fusion is an optimization method in which the code from several kernels is combined to create a new, fused kernel. It can push the performance of kernels beyond the limits of their isolated, unfused forms. In this paper, we introduce a classification of different types of kernel fusion for both data-dependent and data-independent kernels. We study kernel fusion on three types of OpenCL devices:...
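The core of kernel fusion can be illustrated with two elementwise "kernels" run back to back versus one fused kernel that does both operations in a single pass, avoiding the intermediate result. Plain Python stands in for OpenCL kernels here; this is a sketch of the technique, not the paper's implementation.

```python
# Two separate "kernels" vs. one fused kernel over the same data.
def kernel_scale(xs, a):         # kernel 1: y = a * x
    return [a * x for x in xs]

def kernel_offset(xs, b):        # kernel 2: z = y + b
    return [x + b for x in xs]

def fused_kernel(xs, a, b):      # fused: z = a * x + b, one pass, no temp
    return [a * x + b for x in xs]

data = [1.0, 2.0, 3.0]
unfused = kernel_offset(kernel_scale(data, 2.0), 1.0)
fused = fused_kernel(data, 2.0, 1.0)
assert fused == unfused == [3.0, 5.0, 7.0]
```

The fused form touches each element once and keeps no intermediate array, which is exactly the memory-traffic saving that makes fusion attractive on bandwidth-bound devices.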
Heterogeneous systems with different types of compute devices are common nowadays in the field of High Performance Computing (HPC). This heterogeneity is not limited to compute devices, but also includes cluster nodes with different hardware configurations, leading to asymmetric cluster architectures. In such hierarchical systems, OpenCL alone is no longer sufficient: support is required to distribute...
Traditionally, programmers and software tools have focused on mapping a single data-parallel kernel onto a heterogeneous computing system consisting of multiple general-purpose processors (CPUs) and graphics processing units (GPUs). These methodologies break down as application complexity grows to include multiple communicating data-parallel kernels. This paper introduces MKMD, an automatic system...