Search results

Items 1 to 20 of 402 results

chapter

Unreliable memory operation on a convolutional neural network processor

Jose Marques, Joao Andrade, Gabriel Falcao

2017 IEEE International Workshop on Signal Processing Systems (SiPS) > 1 - 6

The evolution of convolutional neural networks (CNNs) into more complex forms of organization, with additional layers, larger convolutions and increasing connections, established the state-of-the-art in terms of accuracy errors for detection and classification challenges in images. Moreover, as they evolved to a point where Gigabytes of memory are required for their operation, we have reached a stage...

chapter

ACDC: Advanced consolidation for dynamic containers

Damien Carver, Julien Sopena, Sebastien Monnet

2017 IEEE 16th International Symposium on Network Computing and Applications (NCA) > 1 - 8

The thriving success of the Cloud Industry greatly relies on the fact that virtual resources are as good as bare metal resources when it comes to ensuring a given level of quality of service. Thanks to the isolation provided by virtualisation techniques based on hypervisors, a big physical resource can be spatially multiplexed into smaller virtual resources which are easier to sell. Unfortunately,...

chapter

GScheduler: Optimizing resource provision by using GPU usage pattern extraction in cloud environments

Zhuqing Xu, Fang Dong, Jiahui Jin, Junzhou Luo, et al.

2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC) > 3225 - 3230

GPU-based clusters are widely chosen for accelerating a variety of scientific applications in high-end cloud environments. With their growing popularity, there is a necessity for improving the system throughput and decreasing the turnaround time for co-executing applications on the same GPU device. However, resource contention among multiple applications on a multi-tasked GPU leads to the performance...

chapter

Solving 0-1 quadratic problems with two-level parallelization of the BiqCrunch solver

Camille Coti, Etienne Leclercq, Frederic Roupin, Franck Butelle

2017 Federated Conference on Computer Science and Information Systems (FedCSIS) > 445 - 452

In this paper we present MLTBiqCrunch, a hierarchically parallelized version of the open-source solver BiqCrunch [1]. More precisely, this version has two levels of parallelization: a coarse grain, assigning a thread to a node evaluation and a fine grain, parallelizing a node evaluation when some threads are not busy. We present experiments on some classical binary quadratic optimization problems...

chapter

Evaluating irregular memory access on OpenCL FPGA platforms: A case study with XSBench

Yingyi Luo, Xianshan Wen, Kazutomo Yoshii, Seda Ogrenci-Memik, et al.

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

FPGAs are becoming an attractive choice as a heterogeneous computing unit for scientific computing because FPGA vendors are adding floating-point-optimized architectures to their product lines. Additionally, high-level synthesis (HLS) tools such as Altera OpenCL SDK are emerging, which could potentially break the FPGA programming wall and provide a streamlined flow for domain experts in scientific...

chapter

ZonFS: A Storage Class Memory File System with Memory Zone Partitioning on Linux

Jang Woong Kim, Jae-Hoon Kim, Awais Khan, Youngjae Kim, et al.

2017 IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W) > 277 - 282

Recent developments in storage class memory such as PCM, MRAM, RRAM, and STT-RAM have strengthened their leadership as storage media for memory-based file systems. Traditional Linux memory-based file systems such as Ramfs and Tmpfs utilize the Linux page cache as a file system. These file systems, when adopted as a file system for SCM, have the following problems. First, current implementation of...

chapter

Software-design for internal security checks with dynamic Integrity Measurement (DIM)

Kai-Oliver Detken, Marcel Jahnke, Thomas Rix, Andre Rein

2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS) > 1 > 367 - 373

Most security software tools try to detect malicious components by cryptographic hashes, signatures or based on their behavior. The former is a widely adopted approach based on Integrity Measurement Architecture (IMA) enabling appraisal and attestation of system components. The latter, however, may induce a very long time until misbehavior of a component leads to a successful detection. Another approach...

chapter

Modified distributed arithmetic based low complexity CNN architecture design methodology

Madhuri Panwar, J. Padmini, Venkatasubrahmanian, Amit Acharyya, et al.

2017 European Conference on Circuit Theory and Design (ECCTD) > 1 - 4

A CNN involves a large number of convolutions of feature maps and kernels, necessary for extracting useful features for accurate classification. However, it requires a significant number of computationally intensive, power- and area-hungry multiplications, limiting its deployment on embedded devices in resource-constrained scenarios. To address this problem, we propose modified distributed arithmetic based...

chapter

Dynamic trace-based sampling algorithm for memory usage tracking of enterprise applications

Houssem Daoud, Naser Ezzati-jivan, Michel R. Dagenais

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

Excessive memory usage in software applications has become a frequent issue. A high degree of parallelism and the monitoring difficulty for the developer can quickly lead to memory shortage, or can increase the duration of garbage collection cycles. There are several solutions introduced to monitor memory usage in software. However they are neither efficient nor scalable. In this paper, we propose...

chapter

Non-von-neumann heap for better streaming, capturing and storing of raw 8K video data

Mohamed Shaafiee, Rajasvaran Logeswaran

2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA) > 469 - 473

The advent of 8K and better resolutions of video poses problems for the capture and storage of data by these standards. The contemporary alternative is to compromise on quality and use various (often lossy) compression techniques to reduce the bandwidth required to move this data. This paper proposes a novel method for handling large volumes of video data without compromising its quality through space...

chapter

AIScale — A coarse grained reconfigurable CNN hardware accelerator

Rastislav Struharik, Bogdan Vukobratovic

2017 IEEE East-West Design & Test Symposium (EWDTS) > 1 - 9

In this paper we propose a novel CNN hardware accelerator, called AIScale, capable of accelerating convolutional, pooling, fully-connected and adding CNN layers. In contrast to most existing solutions, AIScale offers a complete solution to the full CNN acceleration. AIScale is designed as a coarse-grained reconfigurable architecture, which uses rapid, dynamic reconfiguration during the CNN layer processing...

chapter

Experimentation of vision algorithm performance using custom OpenCL™ vector language extensions for a graphical accelerator with vector architecture

Bogdan Ditu, Fred Peterson, Ciprian Arbone

2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP) > 339 - 346

OpenCL is a standard that supports a parallel programming paradigm which enables heterogeneous multi-core systems and also offers a high level of portability for the application. Some of the systems that are used with OpenCL might have vector capabilities at device compute units level. There are more ways the vector capabilities could be exploited by the OpenCL device application, the most common...

chapter

Userspace RDMA Verbs on Commodity Hardware Using DPDK

Patrick MacArthur

2017 IEEE 25th Annual Symposium on High-Performance Interconnects (HOTI) > 103 - 110

RDMA (Remote Direct Memory Access) is a technology that enables user applications to perform direct data transfer between the virtual memory of processes on remote endpoints, without operating system involvement or intermediate data copies. Achieving zero intermediate data copies using RDMA requires specialized network interface hardware. Software RDMA drivers emulate RDMA semantics in software to...

chapter

Automatic Control Flow Generation for OpenVX Graphs

Merten Popp, Stef van Son, Orlando Moreira

2017 Euromicro Conference on Digital System Design (DSD) > 198 - 204

Heterogeneous platforms with large numbers of processing elements (PEs) have been proposed to satisfy the computational requirements of computer vision applications. Limiting the incurred communication cost here is key to meet the power constraints of embedded devices. We present a new heuristic to reduce communication among PEs and to external memory by aggregating inter-process communication and...

chapter

GPU acceleration for Kernel Samepage Merging

Wei-Cheng Lin, Chia-Heng Tu, Chih-Wei Yeh, Shih-Hao Hung

2017 IEEE 23rd International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA) > 1 - 6

Kernel Samepage Merging (KSM) is a Linux kernel module for improving memory utilization by searching and merging the redundant memory pages. When working with the hypervisors, such as Kernel-based Virtual Machine, KSM helps share identical memory pages of the hosted virtual servers so as to increase the server density. Nevertheless, while KSM improves the efficiency of the host system, it hurts the...

chapter

Parallel Construction of Simultaneous Deterministic Finite Automata on Shared-Memory Multicores

Minyoung Jung, Jinwoo Park, Johann Blieberger, Bernd Burgstaller

2017 46th International Conference on Parallel Processing (ICPP) > 271 - 281

String pattern matching with finite automata (FAs) is a well-established method across many areas in computer science. Until now, data dependencies inherent in the pattern matching algorithm have hampered effective parallelization. To overcome the dependency-constraint between subsequent matching steps, simultaneous deterministic finite automata (SFAs) have been recently introduced. Although an SFA...

chapter

High-Performance and Memory-Saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU

Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka

2017 46th International Conference on Parallel Processing (ICPP) > 101 - 110

Sparse general matrix-matrix multiplication (SpGEMM) is one of the key kernels of preconditioners such as algebraic multigrid method or graph algorithms. However, the performance of SpGEMM is quite low on modern processors due to random memory access to both input and output matrices. As well as the number and the pattern of non-zero elements in the output matrix, important for achieving locality,...

chapter

Optimizing Memory Management in Deeply Heterogeneous HPC Accelerators

Anna Pupykina, Giovanni Agosta

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 291 - 300

We address the problem of optimizing global shared memory usage in deeply heterogeneous accelerators in the context of HPC systems running multiple applications with different quality of service levels. We explore predictive memory allocation algorithms, allowing to serve up to 28% more high priority requests when using a moving average based prediction in a low-workload scenario.

chapter

Overlapping Data Transfers with Computation on GPU with Tiles

Burak Bastem, Didem Unat, Weiqun Zhang, Ann Almgren, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 171 - 180

GPUs are employed to accelerate scientific applications; however, they require much more programming effort from the programmers, particularly because of the disjoint address spaces between the host and the device. OpenACC and OpenMP 4.0 provide directive based programming solutions to alleviate the programming burden; however, synchronous data movement can create a performance bottleneck in fully taking...

chapter

Hardware architecture for 2D Gaussian filtering of HD images on resource constrained platforms

Carmine Cappetta, Gian Domenico Licciardo, Luigi Di Benedetto

2017 International Symposium on Signals, Circuits and Systems (ISSCS) > 1 - 4

A bi-dimensional filter for high accuracy image processing is implemented by using a novel partitioning method. The method is based on a number theory theorem, which permits to reduce the complexity of the operation to that of an adder chain and also the amount of the coefficients stored in memory, improving the memory organization. To show the advantage of such method, we implemented a Floating Point...

Filter options

Keywords:
KERNEL
MEMORY MANAGEMENT

Content availability

Available (398)
Unavailable (4)

Keywords

INSTRUCTION SETS (81)
LINUX (80)
HARDWARE (74)
GRAPHICS PROCESSING UNITS (58)
RANDOM ACCESS MEMORY (55)
RESOURCE MANAGEMENT (50)
GRAPHICS PROCESSING UNIT (45)
GPU (42)
BENCHMARK TESTING (38)
PARALLEL PROCESSING (38)
BANDWIDTH (37)
SERVERS (36)
OPTIMIZATION (33)
LIBRARIES (28)
ARRAYS (27)
STORAGE MANAGEMENT (25)
COMPUTATIONAL MODELING (24)
CUDA (24)
REGISTERS (24)
FIELD PROGRAMMABLE GATE ARRAYS (23)
PROGRAMMING (23)
RUNTIME (21)
OPERATING SYSTEM (20)
OPERATING SYSTEMS (COMPUTERS) (19)
PERFORMANCE EVALUATION (19)
VIRTUAL MACHINING (19)
EMBEDDED SYSTEMS (18)
SECURITY (18)
COPROCESSORS (17)
OPERATING SYSTEMS (17)
PROTOCOLS (17)
ALGORITHM DESIGN AND ANALYSIS (16)
MONITORING (16)
OPERATING SYSTEM KERNELS (16)
COMPUTER GRAPHIC EQUIPMENT (15)
DATA STRUCTURES (15)
MULTIPROCESSING SYSTEMS (15)
PROGRAM PROCESSORS (15)
SYNCHRONIZATION (15)
THROUGHPUT (14)
VIRTUAL MACHINE MONITORS (14)
INDEXES (13)
OPENCL (13)
VIRTUAL MACHINES (13)
ACCELERATION (12)
DATA MINING (12)
PARALLEL PROGRAMMING (12)
VIRTUALIZATION (12)
CACHE STORAGE (11)
CLOUD COMPUTING (11)
GPGPU (11)
IMAGE PROCESSING (11)
PREFETCHING (11)
REAL TIME SYSTEMS (11)
YARN (11)
COMPUTE UNIFIED DEVICE ARCHITECTURE (10)
FPGA (10)
MULTICORE PROCESSING (10)
NONVOLATILE MEMORY (10)
RADIATION DETECTORS (10)
CONVOLUTION (9)
DRIVER CIRCUITS (9)
POWER DEMAND (9)
RELIABILITY (9)
STREAMING MEDIA (9)
VECTORS (9)
EQUATIONS (8)
HIGH PERFORMANCE COMPUTING (8)
LATTICES (8)
REAL-TIME SYSTEMS (8)
SCALABILITY (8)
SUPPORT VECTOR MACHINES (8)
SYSTEM-ON-A-CHIP (8)
TRAINING (8)
VIRTUAL MACHINE (8)
COMPUTER ARCHITECTURE (7)
COMPUTER GRAPHICS (7)
DATA TRANSFER (7)
MEMORY (7)
MEMORY ARCHITECTURE (7)
NEURAL NETWORKS (7)
PIXEL (7)
PROCESSOR SCHEDULING (7)
RECONFIGURABLE ARCHITECTURES (7)
SCHEDULES (7)
SWITCHES (7)
ACCURACY (6)
APPLICATION PROGRAM INTERFACES (6)
CLOCKS (6)
COMPLEXITY THEORY (6)
CONTEXT (6)
DATABASES (6)
DIGITAL SIGNAL PROCESSING (6)
ENERGY CONSUMPTION (6)
GRAPHICS (6)
INSTRUMENTS (6)
INTERNET (6)
LOGIC GATES (6)

INFONA - science communication portal
