In this paper we propose an Iterative Re-Weighted Least Squares (IRWLS) procedure to solve Support Vector Machines for regression and function estimation. Furthermore, we present a new algorithm to train Support Vector Machines that combines the proposed approach, which replaces the quadratic programming step, with the most advanced methods for dealing with large training data sets. Finally, the performance...
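The core re-weighting idea behind an IRWLS procedure can be sketched as follows. This is a minimal robust-regression illustration, not the authors' SVR formulation: the Huber-style weighting and the function name `irwls` are illustrative assumptions.

```python
import numpy as np

def irwls(X, y, n_iter=20, eps=1e-6):
    """Fit a linear model by iteratively re-weighted least squares.

    Each iteration down-weights samples with large residuals
    (Huber-style weights with unit threshold), then re-solves a
    weighted least-squares problem in closed form.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = y - X @ w                                  # current residuals
        a = np.where(np.abs(r) <= 1.0,                 # small residual: full weight
                     1.0,
                     1.0 / np.maximum(np.abs(r), eps)) # large residual: down-weight
        W = np.diag(a)
        w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # weighted normal equations
    return w
```

In the actual SVR setting, the weights would instead be derived from the epsilon-insensitive loss at each iteration; the fixed-point structure of the loop is the same.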
With the increased popularity of multi-GPU nodes in modern HPC clusters, it is imperative to develop matching programming paradigms for their efficient utilization. In order to take advantage of the local GPUs and the low-latency high-throughput interconnects that link them, programmers need to meticulously adapt parallel applications with respect to load balancing, boundary conditions and device...
Virtual Machine Clusters (VMCs) are now widely used to host network applications due to their better scalability and higher availability compared to physical clusters. To provide fault tolerance, VMC snapshotting is a well-known technique: it saves the entire VMC state to stable storage and rolls the VMs back to the latest saved state upon failure. However, due to the large snapshot size as well as numerous...
GPUs use thousands of threads to provide high performance and efficiency. In general, if one thread of a kernel uses one resource (compute, bandwidth, data cache) more heavily than the others, there will be significant contention for that resource due to the large number of identical concurrent threads. This contention eventually saturates the kernel's performance at the bottleneck...
GPUs are being widely used to accelerate different workloads, and multi-GPU systems can provide higher performance with multiple interconnected discrete GPUs. However, there are two main communication bottlenecks in multi-GPU systems: accessing remote GPU memory and communication between the GPUs and the host CPU. Recent advances in multi-GPU programming, including unified virtual addressing...
The Stochastic On-Time Arrival (SOTA) problem has recently been studied as an alternative to traditional shortest-path formulations in situations with hard deadlines. The goal is to find a routing strategy that maximizes the probability of reaching the destination within a pre-specified time budget, with the edge weights of the graph being random variables with arbitrary distributions. While this...
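For intuition, the discrete-time SOTA dynamic program can be sketched as below, where u[i][t] is the best achievable probability of reaching the destination from node i within t time steps. The graph encoding and names are illustrative assumptions, not the paper's notation; travel times are assumed to be positive integers with discrete distributions.

```python
def sota(graph, dest, budget):
    """Discrete SOTA dynamic program.

    graph[i] = list of (j, pmf), where pmf maps an integer travel
    time (>= 1) on edge (i, j) to its probability.
    Returns u, with u[i][t] = max probability of reaching `dest`
    from node i within time budget t.
    """
    nodes = graph.keys() | {dest}
    u = {i: [0.0] * (budget + 1) for i in nodes}
    for t in range(budget + 1):
        u[dest][t] = 1.0          # already at the destination
    for t in range(1, budget + 1):
        for i in graph:
            if i == dest:
                continue
            best = 0.0
            for j, pmf in graph[i]:
                # Expected success probability if we take edge (i, j):
                # sum over realizable travel times within the budget.
                p = sum(pr * u[j][t - tau]
                        for tau, pr in pmf.items() if tau <= t)
                best = max(best, p)
            u[i][t] = best        # pick the best outgoing edge per (i, t)
    return u
```

Because travel times are at least 1, u[j][t - tau] always refers to an already-computed, smaller budget, so a single pass over increasing t suffices.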
Directive-based accelerator programming models such as OpenACC have arisen as an alternative solution for programming emerging Scalable Heterogeneous Computing (SHC) platforms. However, the increased complexity of SHC systems incurs several challenges in terms of portability and productivity. This paper presents an open-source OpenACC compiler, called OpenARC, which serves as an extensible research...
Much attention has been given to the efficient execution of the scale-out applications that dominate datacenter computing. However, the effects of the hardware support in the Memory Management Unit (MMU), in combination with the distinct characteristics of scale-out applications, have been largely ignored until recently. In this paper, we comprehensively quantify the MMU overhead on a real machine...
The Android operating system, the most popular platform for smartphones and tablet computers, has a distinctive memory management mechanism called the "low memory killer". When memory runs low, Android terminates processes until enough memory is available. It selects targets in order of pre-defined priority and memory consumption, regardless of their re-launching time, their...
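The selection policy described above can be sketched as follows. This is a simplified model of the low-memory-killer heuristic, not Android's kernel implementation; the field names (`adj`, `rss`) echo the kernel's terminology but the code is illustrative.

```python
def select_victim(processes, min_adj):
    """Pick the process the low-memory killer would terminate.

    processes: list of dicts with 'name', 'adj' (priority; higher
    means less important, e.g. cached background apps), and 'rss'
    (resident memory in pages).
    Among processes at or above the threshold priority, kill the one
    with the highest adj, breaking ties by largest memory footprint.
    """
    candidates = [p for p in processes if p['adj'] >= min_adj]
    if not candidates:
        return None  # nothing eligible at this pressure level
    return max(candidates, key=lambda p: (p['adj'], p['rss']))['name']
```

Note that re-launch cost plays no role in this ranking, which is exactly the shortcoming the abstract points at: a large but quickly-restarted process and a large, expensive-to-restart one are treated identically.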
To address the long training time of the support vector machine on large datasets, this paper develops an alternative approach, motivated by the radial basis function neural network (RBFNN), to partition the subset of support vectors for the SVM. The proposed method aims to obtain an optimal decision boundary based on the RBFNN, because of its good convergence and fast training. On the other hand, the...
The MapReduce paradigm is one of the best solutions for implementing distributed applications that perform intensive data processing. In terms of performance, this type of MapReduce application can be improved by adding GPU capabilities. In this context, GPU clusters for large-scale computing can bring a considerable increase in the efficiency and speedup of data-intensive applications...
Due to the diversity of processor architectures and application memory access patterns, the performance impact of using local memory in OpenCL kernels has become unpredictable. For example, enabling the use of local memory for an OpenCL kernel can be beneficial for the execution on a GPU, but can lead to performance losses when running on a CPU. To address this unpredictability, we propose an empirical...
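An empirical approach of the kind described boils down to timing each kernel variant on the target device and keeping the fastest. The sketch below shows that selection loop in the abstract; the harness name and calling convention are illustrative, and on a real system the callables would launch the local-memory and global-memory OpenCL kernel variants.

```python
import time

def pick_faster(variants, *args, reps=5):
    """Empirically choose the fastest variant for this device.

    variants: dict mapping a variant name to a callable (e.g. a
    wrapper that launches one compiled kernel version).
    Each variant is run `reps` times and the one with the lowest
    total wall-clock time wins.
    """
    best, best_t = None, float('inf')
    for name, fn in variants.items():
        t0 = time.perf_counter()
        for _ in range(reps):
            fn(*args)
        dt = time.perf_counter() - t0
        if dt < best_t:
            best, best_t = name, dt
    return best
```

Because the decision is made by measurement rather than by a device-specific heuristic, the same harness gives the right answer on a GPU (where local memory often helps) and on a CPU (where it often does not).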
Cloud computing is now being used by a wide variety of users, ranging from expert programmers and system administrators to scientists and laymen. Cloud providers exploit their resources as fully as they can. Memory is the most expensive resource in terms of oversubscription, which has resulted in high prices for end users. Furthermore, performing swapping in Virtual Machines...
Emerging non-volatile memory technologies are promising candidates for storage class memory, replacing hard disks and even DRAM. In this paper, we focus on the energy issue of Storage Class Memory (SCM) when its scalability and near-DRAM latency are exploited to provide a large-capacity memory system. SCM technologies such as PCM and RRAM incur high write energy and write-energy asymmetry. In particular, the energy of...
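A standard way to attack SCM write energy, useful for intuition here, is the well-known Flip-N-Write idea: before writing a word, compare it to what is stored and, if more than half the cells would change, store the bitwise complement plus a flip flag instead. This sketch is an illustration of that general SCM technique, not necessarily the paper's proposal.

```python
def flip_n_write(phys, flag, new):
    """Choose the cheaper of writing `new` directly or flipped.

    phys: bit list currently stored in the cells; flag: stored flip
    bit. The logical value is phys XOR flag. Returns (cells written,
    new physical bits, new flag); the write cost is bounded by
    len(new) // 2 + 1 cells.
    """
    best = None
    for f in (0, 1):
        cand = [b ^ f for b in new]                    # candidate physical word
        cost = (sum(p != c for p, c in zip(phys, cand))  # changed data cells
                + (f != flag))                           # flip-flag cell, if changed
        if best is None or cost < best[0]:
            best = (cost, cand, f)
    return best
```

Since PCM and RRAM only spend write energy on cells that actually change state, halving the worst-case number of flipped cells translates directly into lower write energy.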
Phase Change Memory (PCM) has been considered a leading candidate to replace traditional DRAM in embedded systems due to its promising characteristics such as low leakage power, low cost, non-volatility, and high scalability. One of the constraints that undermines PCM's credentials as main memory is its limited write endurance. In this paper, we develop wear-leveling techniques purely on...
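To make the wear-leveling goal concrete, here is a minimal table-based sketch: track per-line write counts and periodically swap the logical lines mapped to the hottest and coldest physical lines. This is a generic illustration of the technique, not the scheme the paper develops.

```python
class WearLeveler:
    """Toy remapping-table wear leveler for n memory lines."""

    def __init__(self, n):
        self.map = list(range(n))   # logical line -> physical line
        self.writes = [0] * n       # per-physical-line write count

    def write(self, logical):
        """Record a write and return the physical line it lands on."""
        phys = self.map[logical]
        self.writes[phys] += 1
        return phys

    def rebalance(self):
        """Swap the logical lines behind the most- and least-worn
        physical lines, spreading future writes across the device."""
        hot = max(range(len(self.writes)), key=self.writes.__getitem__)
        cold = min(range(len(self.writes)), key=self.writes.__getitem__)
        lh = self.map.index(hot)
        lc = self.map.index(cold)
        self.map[lh], self.map[lc] = self.map[lc], self.map[lh]
```

Real PCM wear levelers avoid a full indirection table (e.g. with algebraic remappings) precisely because the table itself costs space and writes, but the swap-hot-for-cold intent is the same.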
In this paper, we propose a new fast parallel sparse matrix-vector multiplication (SpMV) algorithm for GPU platforms. The new algorithm, called segSpMV, is based on the compressed sparse row (CSR) format and can be applied to a wide range of computational applications with both structured and unstructured matrices. The SpMV operation has a very low compute-to-communication ratio and is bandwidth-limited. The...
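For reference, the baseline CSR SpMV that segSpMV builds on looks like this (a scalar sketch, one row per loop iteration; on a GPU each row, or row segment, would be handled by a thread or warp):

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    values:  nonzero entries, row by row
    col_idx: column index of each nonzero
    row_ptr: row_ptr[i]..row_ptr[i+1] delimits row i's nonzeros
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]   # gather from x
        y[i] = acc
    return y
```

The irregular inner-loop length per row is what makes plain CSR hard to load-balance on a GPU; segmenting rows into fixed-size pieces, as the name segSpMV suggests, is one way to regularize the memory traffic.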
Exact values of spatial and temporal consumption are needed when judging the space and time complexities of an algorithm, but few researchers have paid attention to whether their measurement methods were valid. In this paper, we discuss some key concepts involved in monitoring a process's spatial and temporal consumption, and then explain and distinguish those concepts. Further,...
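As one concrete example of such monitoring, the sketch below measures a function's wall-clock time and peak Python-heap allocation. It is purely illustrative of the measurement problem; the paper's discussion concerns process-level metrics, where the choice of counter (heap vs. RSS vs. virtual size, wall vs. CPU time) is exactly what determines whether the measurement is valid.

```python
import time
import tracemalloc

def measure(fn, *args):
    """Run fn(*args), returning (result, elapsed_seconds, peak_bytes).

    elapsed is wall-clock time via a monotonic timer; peak_bytes is
    the peak memory allocated on the Python heap during the call.
    """
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak
```

Note the subtlety the abstract hints at: this reports heap allocation of the interpreter, not the operating system's view of the process, and the two can differ substantially.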
Embedded systems are constantly becoming more complex, as they are increasingly equipped with more functionality. Networking capability is one of the most desired features even for embedded systems, hence network applications, typically used in desktop systems, are required to become available in the embedded system domain. Rewriting these applications to fit into embedded root file systems takes...
Commodity graphics processing units (GPUs) have rapidly evolved into high-performance accelerators for data-parallel computing, through a large array of processing cores and the CUDA programming model with a C-like interface. However, optimizing an application for maximum performance based on the GPU architecture is not a trivial task, given the tremendous change from conventional multi-core to the...
In this paper, we present a novel lattice-based memory model called the max-plus projection autoassociative morphological memory (max-plus PAMM). The max-plus PAMM yields the largest max-plus combination of the stored patterns that is less than or equal to the input. Like the original autoassociative morphological memories (AMMs), it is idempotent and gives perfect recall of undistorted patterns...