Serwis Infona wykorzystuje pliki cookies (ciasteczka). Są to wartości tekstowe, zapamiętywane przez przeglądarkę na urządzeniu użytkownika. Nasz serwis ma dostęp do tych wartości oraz wykorzystuje je do zapamiętania danych dotyczących użytkownika, takich jak np. ustawienia (typu widok ekranu, wybór języka interfejsu), zapamiętanie zalogowania. Korzystanie z serwisu Infona oznacza zgodę na zapis informacji i ich wykorzystanie dla celów korzytania z serwisu. Więcej informacji można znaleźć w Polityce prywatności oraz Regulaminie serwisu. Zamknięcie tego okienka potwierdza zapoznanie się z informacją o plikach cookies, akceptację polityki prywatności i regulaminu oraz sposobu wykorzystywania plików cookies w serwisie. Możesz zmienić ustawienia obsługi cookies w swojej przeglądarce.
The evolution of convolutional neural networks (CNNs) into more complex forms of organization, with additional layers, larger convolutions and increasing connections, established the state-of-the-art in terms of accuracy errors for detection and classification challenges in images. Moreover, as they evolved to a point where Gigabytes of memory are required for their operation, we have reached a stage...
The thriving success of the Cloud Industry greatly relies on the fact that virtual resources are as good as bare metal resources when it comes to ensuring a given level of quality of service. Thanks to the isolation provided by virtualisation techniques based on hypervisors, a big physical resource can be spatially multiplexed into smaller virtual resources which are easier to sell. Unfortunately,...
GPU-based clusters are widely chosen for accelerating a variety of scientific applications in high-end cloud environments. With their growing popularity, there is a necessity for improving the system throughput and decreasing the turnaround time for co-executing applications on the same GPU device. However, resource contention among multiple applications on a multi-tasked GPU leads to the performance...
In this paper we present MLTBiqCrunch, a hierarchically parallelized version of the open-source solver BiqCrunch [1]. More precisely, this version has two levels of parallelization: a coarse grain, assigning a thread to a node evaluation and a fine grain, parallelizing a node evaluation when some threads are not busy. We present experiments on some classical binary quadratic optimization problems...
FPGAs are becoming an attractive choice as a heterogeneous computing unit for scientific computing because FPGA vendors are adding floating-point-optimized architectures to their product lines. Additionally, high-level synthesis (HLS) tools such as Altera OpenCL SDK are emerging, which could potentially break the FPGA programming wall and provide a streamlined flow for domain experts in scientific...
Recent developments in storage class memory such as PCM, MRAM, RRAM, and STT-RAM have strengthened their leadership as storage media for memory-based file systems. Traditional Linux memory-based file systems such as Ramfs and Tmpfs utilize the Linux page cache as a file system. These file systems, when adopted as a file system for SCM, have the following problems. First, current implementation of...
Most security software tools try to detect malicious components by cryptographic hashes, signatures or based on their behavior. The former, is a widely adopted approach based on Integrity Measurement Architecture (IMA) enabling appraisal and attestation of system components. The latter, however, may induce a very long time until misbehavior of a component leads to a successful detection. Another approach...
CNN involves large number of convolution of feature maps and kernels, necessary for extracting useful features for accurate classification. However, it requires significant amount of computationally intensive power and area hungry multiplications limiting its deployment on embedded devices under resource constrained scenario. To address this problem, we propose modified distributed arithmetic based...
Excessive memory usage in software applications has become a frequent issue. A high degree of parallelism and the monitoring difficulty for the developer can quickly lead to memory shortage, or can increase the duration of garbage collection cycles. There are several solutions introduced to monitor memory usage in software. However they are neither efficient nor scalable. In this paper, we propose...
The advent of 8K and better resolutions of video pose problems for the capture and storage of data by these standards. The contemporary alternative is to compromise on quality and use various (often lossy) compression techniques to reduce the bandwidth required to move this data. This paper proposes a novel method for handling large volumes of video data without compromising its quality through space...
In this paper we propose a novel CNN hardware accelerator, called AlScale, capable of accelerating convolutional, pooling, fully-connected and adding CNN layers. In contrast to most existing solutions, AIScale offers a complete solution to the full CNN acceleration. AIScale is designed as a coarse-grained reconfigurable architecture, which uses rapid, dynamic reconfiguration during the CNN layer processing...
OpenCL is a standard that supports a parallel programming paradigm which enables heterogeneous multi-core systems and also offers a high level of portability for the application. Some of the systems that are used with OpenCL might have vector capabilities at device compute units level. There are more ways the vector capabilities could be exploited by the OpenCL device application, the most common...
RDMA (Remote Direct Memory Access) is a technology that enables user applications to perform direct data transfer between the virtual memory of processes on remote endpoints, without operating system involvement or intermediate data copies. Achieving zero intermediate data copies using RDMA requires specialized network interface hardware. Software RDMA drivers emulate RDMA semantics in software to...
Heterogeneous platforms with large numbers of processing elements (PEs) have been proposed to satisfy the computational requirements of computer vision applications. Limiting the incurred communication cost here is key to meet the power constraints of embedded devices.We present a new heuristic to reduce communication among PEs and to external memory by aggregating inter-process communication and...
Kernel Samepage Merging (KSM) is a Linux kernel module for improving memory utilization by searching and merging the redundant memory pages. When working with the hypervisors, such as Kernel-based Virtual Machine, KSM helps share identical memory pages of the hosted virtual servers so as to increase the server density. Nevertheless, while KSM improves the efficiency of the host system, it hurts the...
String pattern matching with finite automata (FAs) is a well-established method across many areas in computer science. Until now, data dependencies inherent in the pattern matching algorithm have hampered effective parallelization. To overcome the dependency-constraint between subsequent matching steps, simultaneous deterministic finite automata (SFAs) have been recently introduced. Although an SFA...
Sparse general matrix-matrix multiplication (SpGEMM) is one of the key kernels of preconditioners such as algebraic multigrid method or graph algorithms. However, the performance of SpGEMM is quite low on modern processors due to random memory access to both input and output matrices. As well as the number and the pattern of non-zero elements in the output matrix, important for achieving locality,...
We address the problem of optimizing global shared memory usage in deeply heterogeneous accelerators in the context of HPC systems running multiple applications with different quality of service levels. We explore predictive memory allocation algorithms, allowing to serve up to 28% more high priority requests when using a moving average based prediction in a low-workload scenario.
GPUs are employed to accelerate scientific applications however they require much more programming effort from the programmers particularly because of the disjoint address spaces between the host and the device. OpenACC and OpenMP 4.0 provide directive based programming solutions to alleviate the programming burden however synchronous data movement can create a performance bottleneck in fully taking...
A bi-dimensional filter for high accuracy image processing is implemented by using a novel partitioning method. The method is based on a number theory theorem, which permits to reduce the complexity of the operation to that of an adder chain and also the amount of the coefficients stored in memory, improving the memory organization. To show the advantage of such method, we implemented a Floating Point...
Podaj zakres dat dla filtrowania wyświetlonych wyników. Możesz podać datę początkową, końcową lub obie daty. Daty możesz wpisać ręcznie lub wybrać za pomocą kalendarza.