Search results

Items from 121 to 140 out of 843 results

chapter

Effective Utilization of CUDA Hyper-Q for Improved Power and Performance Efficiency

Ryan S. Luley, Qinru Qiu

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1160 - 1169

High utilization of hardware resources is the key for designing performance and power optimized GPU applications. The efficiency of applications and kernels, which do not fully utilize the GPU resources, can be improved through concurrent execution with independent kernels and/or applications. Hyper-Q enables multiple CPU threads or processes to launch work on a single GPU simultaneously for increased...

chapter

Counting Triangles in Large Graphs on GPU

Adam Polak

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 740 - 746

The clustering coefficient and the transitivity ratio are concepts often used in network analysis, which creates a need for fast practical algorithms for counting triangles in large graphs. Previous research in this area focused on sequential algorithms, MapReduce parallelization, and fast approximations. In this paper we propose a parallel triangle counting algorithm for CUDA GPU. We describe the...

chapter

When Spark Meets FPGAs: A Case Study for Next-Generation DNA Sequencing Acceleration

Yu-Ting Chen, Jason Cong, Zhenman Fang, Jie Lei, et al.

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 29

FPGA-enabled datacenters have shown great potential for providing performance and energy efficiency improvement, and captured a great amount of attention from both academia and industry. In this paper we aim to answer one key question: how can we efficiently integrate FPGAs into state-of-the-art big-data computing frameworks? Although very important, this problem has not been well studied, especially...

chapter

On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures

Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1249 - 1258

Many scientific applications, ranging from national security to medical advances, require solving a number of relatively small-size independent problems. As the size of each individual problem does not provide sufficient parallelism for the underlying hardware, especially accelerators, these problems must be solved concurrently as a batch in order to saturate the hardware with enough work, hence the...

chapter

Accelerating frequency-domain simulations using small shared-memory CPU/GPU cluster

Tomasz Topa, Artur Noga, Andrzej Karwowski

2016 21st International Conference on Microwave, Radar and Wireless Communications (MIKON) > 1 - 4

A numerical approach to frequency response problems usually requires that the system governing equation be solved repeatedly at many frequencies. The computational efficiency of the overall process can be increased by departing from the traditional sequential computing model in favor of utilizing the parallel processing capability commonly offered by modern hardware. In this paper, we consider a hybrid...

chapter

Efficient kernel management on GPUs

Xiuhong Li, Yun Liang

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 85 - 90

As the complexity of applications continues to grow, each new generation of GPUs has been equipped with advanced architectural features and more resources to sustain its performance acceleration capability. Recent GPUs feature concurrent kernel execution, which is designed to improve resource utilization by executing multiple kernels simultaneously. However, prior systems only...

chapter

Critical points based register-concurrency autotuning for GPUs

Ang Li, Shuaiwen Leon Song, Akash Kumar, Eddy Z. Zhang, et al.

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 1273 - 1278

The unprecedented prevalence of GPGPU is largely attributed to its abundant on-chip register resources, which allow massively concurrent threads and extremely fast context switch. However, due to internal memory size constraints, there is a tradeoff between the per-thread register usage and the overall thread concurrency. This becomes a design problem in terms of performance tuning, since the performance...

chapter

Agave: A benchmark suite for exploring the complexities of the Android software stack

Martin K. Brown, Zachary Yannes, Michael Lustig, Mazdak Sanati, et al.

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 157 - 158

Traditional suites used for benchmarking high-performance computing platforms or for architectural design space exploration use much simpler virtual memory layouts and multitasking/multithreading schemes, which means that they cannot be used to study the complex interactions among the layers of the Android software stack. To demonstrate this, we present memory reference and concurrency data showing...

chapter

Platform-independent reverse debugging of the virtual machines

Pavel Dovgalyuk, Denis Dmitriev, Vladimir Makarov

2016 18th Conference of Open Innovations Association and Seminar on Information Security and Protection of Information Technology (FRUCT-ISPIT) > 41 - 47

Prototyping and debugging of operating systems and drivers are very tough tasks because of hardware volatility, kernel panics, blue screens of death, long periods of time required to expose the bug, perturbation of the drivers by the debugger, and non-determinism of multi-threaded environment. This paper shows how the deterministic replay of the virtual machine execution can be used to reduce the...

chapter

GSI: A GPU Stall Inspector to characterize the sources of memory stalls for tightly coupled GPUs

Johnathan Alsop, Matthew D. Sinclair, Rakesh Komuravelli, Sarita V. Adve

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 172 - 182

In recent years the power wall has prevented the continued scaling of single core performance. This has led to the rise of dark silicon and motivated a move toward parallelism and specialization. As a result, energy-efficient high-throughput GPU cores are increasingly favored for accelerating data-parallel applications. However, the best way to efficiently communicate and synchronize across heterogeneous...

chapter

X-Mem: A cross-platform and extensible memory characterization tool for the cloud

Mark Gottscho, Sriram Govindan, Bikash Sharma, Mohammed Shoaib, et al.

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 263 - 273

Effective use of the memory hierarchy is crucial to cloud computing. Platform memory subsystems must be carefully provisioned and configured to minimize overall cost and energy for cloud providers. For cloud subscribers, the diversity of available platforms complicates comparisons and the optimization of performance. To address these needs, we present X-Mem, a new open-source software tool that characterizes...

chapter

Performance analysis of Fast Fourier Transform on Field Programmable Gate Arrays and graphic cards

Muhammad Ibrahim, Omar Khan

2016 International Conference on Computing, Electronic and Electrical Engineering (ICE Cube) > 158 - 162

The Fast Fourier Transform (FFT) is an important algorithm in the fields of science and engineering, where it is used in diverse areas such as communications, signal processing, instrumentation, image and video analysis, etc. The algorithm is essentially a fast implementation of the Discrete Fourier Transform which allows it to reduce the asymptotic complexity of the latter from O(n²) to the former's...

chapter

An improved faulting detection algorithm for subway tunnel segment

Zhengzhe Yang, Xinwen Gao, Haibing Xia

2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET) > 1710 - 1716

Novel security detection technologies are needed to meet the growing demand of subway operation. In this paper, an improved faulting detection algorithm for subway tunnel segment is proposed. A combined denoising technique is used to convert the depth image of faulting acquired by Kinect into a binary image of the height difference which can be processed by digital image. The obvious advantage of this...

chapter

Security Identifier Randomization: A Method to Prevent Kernel Privilege-Escalation Attacks

Lifeng Wei, Yudan Zuo, Yan Ding, Pan Dong, et al.

2016 30th International Conference on Advanced Information Networking and Applications Workshops (WAINA) > 838 - 842

Privilege escalation attacks are one of the serious threats to Linux, so protecting the root user is an important requirement for Linux systems, and SELinux has tackled this issue to some degree. However, by exploiting kernel privilege-escalation vulnerabilities, attackers can tamper with the security identifiers allocated for the process's security contexts, which are the foundation of SELinux enforcing...

chapter

Engineering software using automation

William I. Lundgren, James W. Steed, Kerry B. Barnes

2016 IEEE Aerospace Conference > 1 - 9

Gedae has developed automated software engineering technology for computers and software. This paper presents the research, prototypes, and documented software engineering improvements from real-world case studies that led to the Gedae technology. Gedae's technology is based on the creation and analysis of software models, specifically dataflow software models. The dataflow software model is implemented...

chapter

Parallel edge detection by SOBEL algorithm using CUDA C

Adhir Jain, Anand Namdev, Meenu Chawla

2016 IEEE Students' Conference on Electrical, Electronics and Computer Science (SCEECS) > 1 - 6

Edge detection is one of the most important paradigms of image processing. Images contain millions of pixels, and each pixel's information is independent of its neighbouring pixels. Hence this paper puts to the test the capability of the Graphics Processing Unit (GPU) to compute in parallel against the millions of pixel calculations involved in image processing. Each pixel operation is independent of the others, thus...

chapter

A GPU-Parallel Algorithm for ECG Signal Denoising Based on the NLM Method

Salvatore Cuomo, Pasquale De Michele, Ardelio Galletti, Livia Marcellino

2016 30th International Conference on Advanced Information Networking and Applications Workshops (WAINA) > 35 - 39

In recent years, real-time diagnosis in e-health has become a widely used practice. Employing distributed computing systems, it is possible to obtain excellent results, avoiding long delays and invasive processes. However, the data processing stage, generally assigned to standard computational CPU environments, is a critical aspect, especially when the computational complexity of the numerical method...

chapter

GPU-Accelerated Texture Analysis Using Steerable Riesz Wavelets

Anamaria Vizitiu, Lucian Mihai Itu, Ranveer Joyseeree, Adrien Depeursinge, et al.

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 431 - 434

Visual pattern recognition is a key research topic in the field of image processing and computer vision. Texture analysis based on steerable Riesz wavelets is powerful, but requires computing pixel-wise operations resulting in a run time in the order of days when large volumes of data are processed. To overcome this limitation we propose a Graphics Processing Unit (GPU) based solution. A standard...

chapter

A Quantitative Performance Evaluation of Fast on-Chip Memories of GPUs

Elias Konstantinidis, Yiannis Cotronis

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 448 - 455

Modern Graphics Processing Units (GPUs) have evolved to high performance general purpose processors, forming an alternative to CPUs. However, programming them effectively has proven to be a challenge, not only due to the mandatory requirement of extracting massive fine grained parallelism but also due to its susceptible performance on memory traffic. Apart from regular memory caches, GPUs feature...

Keywords: KERNEL, INSTRUCTION SETS

INFONA - science communication portal
