Search results

Items 81 to 100 of 402 results

chapter

A comprehensive performance analysis of HSA and OpenCL 2.0

Saoni Mukherjee, Yifan Sun, Paul Blinzer, Amir Kavyan Ziabari, et al.

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 183 - 193

Heterogeneous systems that marry CPUs and GPUs in a range of configurations are quickly becoming the design paradigm for today's platforms because of their impressive parallel processing capabilities. However, in many existing heterogeneous systems, the GPU is only treated as an accelerator by the CPU, working as a slave to the CPU master. But recently we are starting to see the introduction...

chapter

Engineering software using automation

William I. Lundgren, James W. Steed, Kerry B. Barnes

2016 IEEE Aerospace Conference > 1 - 9

Gedae has developed automated software engineering technology for computers and software. This paper presents the research, prototypes, and documented software engineering improvements from real-world case studies that led to the Gedae technology. Gedae's technology is based on the creation and analysis of software models, specifically dataflow software models. The dataflow software model is implemented...

chapter

GPU-Accelerated Texture Analysis Using Steerable Riesz Wavelets

Anamaria Vizitiu, Lucian Mihai Itu, Ranveer Joyseeree, Adrien Depeursinge, et al.

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 431 - 434

Visual pattern recognition is a key research topic in the field of image processing and computer vision. Texture analysis based on steerable Riesz wavelets is powerful, but requires computing pixel-wise operations resulting in a run time in the order of days when large volumes of data are processed. To overcome this limitation we propose a Graphics Processing Unit (GPU) based solution. A standard...

chapter

Specific Read-Only Data Management for Memory System Optimization

Gregory Vaumourin, Guerre Alexandre, Dombek Thomas, Denis Barthou

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 337 - 340

This paper proposes a new way of managing the cache by exploiting the difference of behavior in the memory system between read-only data and read-write data. A division of the existing cache-based memory hierarchy is proposed in order to create a dedicated data path for read-only data. In order to justify this approach, an analysis performed on a set of benchmarks shows that read-only data count for...

chapter

A Quantitative Performance Evaluation of Fast on-Chip Memories of GPUs

Elias Konstantinidis, Yiannis Cotronis

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 448 - 455

Modern Graphics Processing Units (GPUs) have evolved to high performance general purpose processors, forming an alternative to CPUs. However, programming them effectively has proven to be a challenge, not only due to the mandatory requirement of extracting massive fine grained parallelism but also due to its susceptible performance on memory traffic. Apart from regular memory caches, GPUs feature...

chapter

Scheduling techniques for GPU architectures with processing-in-memory capabilities

Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, et al.

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) > 31 - 44

Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy penalties of data transfers from/to main memory. Graphics Processing Unit (GPU) architectures and applications, where main memory bandwidth is a critical bottleneck, can benefit from the use of PIM. To this end, an application should be properly...

chapter

Joint loop mapping and data placement for coarse-grained reconfigurable architecture with multi-bank memory

Shouyi Yin, Xianqing Yao, Tianyi Lu, Leibo Liu, et al.

2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) > 1 - 8

Coarse-Grained Reconfigurable Architecture (CGRA) is a promising architecture that offers high performance, high power efficiency, and attractive flexibility. The compute-intensive parts of an application (e.g. loops) are often mapped onto CGRA for acceleration. Because of the highly parallel demands of PEs and the extremely high cost of single-bank multi-port memory, the architecture with multi-bank...

chapter

A data locality-aware design framework for reconfigurable sparse matrix-vector multiplication kernel

Sicheng Li, Yandan Wang, Wujie Wen, Yu Wang, et al.

2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) > 1 - 6

Sparse matrix-vector multiplication (SpMV) is an important computational kernel in many applications. For performance improvement, software libraries designated for SpMV computation have been introduced, e.g., MKL library for CPUs and cuSPARSE library for GPUs. However, the computational throughput of these libraries is far below the peak floating-point performance offered by hardware platforms, because...

chapter

TEMP: Thread batch enabled memory partitioning for GPU

Mengjie Mao, Wujie Wen, Xiaoxiao Liu, Jingtong Hu, et al.

2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

As massive multi-threading in GPU imposes tremendous pressure on memory subsystems, efficient bandwidth utilization becomes a key factor affecting the GPU throughput. In this work, we propose thread batch enabled memory partitioning (TEMP), to improve GPU performance through the improvement of memory bandwidth utilization. In particular, TEMP clusters multiple thread blocks sharing the same set of...

chapter

A study on user-level remote memory extension system

Shinyoung Ahn, Gyuil Cha, Youngho Kim, Eunji Lim, et al.

2016 18th International Conference on Advanced Communication Technology (ICACT) > 234 - 239

The growth of memory capacity in computer systems has not kept up with the growth in memory requirements of large memory applications. Also, big memory systems have been too expensive for many researchers and students. Therefore, approaches that utilize remote memory have been considered a cost effective way to run large memory applications in the cluster environment where...

chapter

Optimize In-kernel swap memory by avoiding duplicate swap out pages

Srividya Desireddy, Dinakar Reddy Pathireddy

2016 International Conference on Microelectronics, Computing and Communications (MicroCom) > 1 - 4

On embedded devices the physical memory is a critical resource. RAM should be used very efficiently without affecting the performance of the device. In-kernel memory swapping is a Linux feature which creates a RAM-based swap area and provides a form of virtual memory compression. It increases performance by using a compressed block device in RAM for paging instead of disk. Since In-kernel memory swapping...

chapter

V-PFORDelta: Data Compression for Energy Efficient Computation of Time Series

Abdullah Al Hasib, Juan M. Cebrian, Lasse Natvig

2015 IEEE 22nd International Conference on High Performance Computing (HiPC) > 416 - 425

Chip multiprocessors (CMPs) and heterogeneous architectures have become predominant in all market segments, from embedded to high performance computing. These architectures exacerbate on-chip data requirements, creating additional pressure on the memory subsystem. Consequently, efficient utilization of on-chip memory space becomes critical for data intensive applications. A promising means of addressing...

chapter

Towards Practical Page Placement for a Green Memory Manager

Ashish Panwar, K. Gopinath

2015 IEEE 22nd International Conference on High Performance Computing (HiPC) > 155 - 164

Increased performance demand of modern applications has resulted in large memory modules and higher performance processors in computing systems. Power consumption becomes an important aspect when these resources go underutilized in a running system, e.g. during idle periods or lighter workloads. CPUs have come a long way in optimizing away the unnecessary power consumption in both hardware and software...

chapter

Evaluation of CUDA memory fence performance; Berlekamp-Massey case study

Hanan Ali, Zeinab Fayez, Ghada M. Fathy, Walaa Sheta

2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) > 586 - 590

Graphics Processing Unit (GPU) architectures are becoming increasingly programmable, offering the potential for dramatic speedups for a variety of general purpose applications compared to contemporary general-purpose processors (CPUs). However, GPU architectures depend on multithreading, where threads that share data and resources face memory concurrency issues. Data races and deadlocks are the most...

chapter

A configurable management strategy for parallel access of coarse-grained reconfigurable architecture for radar processing

Xiaotong Wang, Weiqi Ge, Yu Gong, Bo Liu

2015 4th International Conference on Computer Science and Network Technology (ICCSNT) > 1 > 812 - 815

In order to reduce data access conflicts and improve memory access efficiency in radar signal processing, a linear varying step-size data management strategy combined with a hierarchical memory structure is proposed. By proposing the logical mapping strategy between the reconfigurable arrays and the multi-bank memory, the memory access performance of the reconfigurable processor is improved and...

chapter

Out of Memory Prevention Based on Memory Allocation Rate

Gaku Nakagawa, Hirotaka Kawata, Shuichi Oikawa

2015 Third International Symposium on Computing and Networking (CANDAR) > 566 - 570

The amount of free memory has a great influence on system stability because running out of memory causes performance degradation, unexpected process terminations, and so on. Thus, it is an important administration task to design the memory utilization plan based on the characteristics of the processes. However, processes sometimes demand a large amount of main memory rapidly and unexpectedly...

chapter

Investigations into techniques to accelerate memory intensive GPGPU applications

Winnie Thomas, Rohin D. Daruwala

2015 Annual IEEE India Conference (INDICON) > 1 - 6

Recent advancements in the architecture of Graphics Processing Units (GPUs) enable the acceleration of many general purpose applications. Even with high memory bandwidth, GPUs are still faced with the challenge of accelerating highly memory intensive applications. To overcome this challenge, this paper investigates the impact of scaling up the memory partitions and also scaling the frequency of the...

Keywords:
KERNEL
MEMORY MANAGEMENT

INFONA - scholarly communication portal
