Heterogeneous multicore processors that integrate CPU and GPU (Graphics Processing Unit) cores on the same chip pose new challenges for resource sharing, which is crucial for performance. Unlike traditional multicores, the CPU and GPU cores in the integrated architecture can generate significantly different amounts of cache traffic and exhibit quite diverse temporal and spatial data locality. The...
In this paper, a novel particle swarm optimizer is developed by introducing projection operators, described by projection matrices, into the algorithm. Under these operators, the particles oscillate along the directions determined by the projection matrices, which enhances global exploration. At the same time, the particles explore locally for optimal solutions when they are close to the...
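The abstract above describes restricting particle motion with a projection operator. A minimal, generic sketch of the idea (not the paper's exact operator or coefficients; the diagonal projection matrix, inertia weight, and coefficients below are illustrative assumptions) might look like:

```python
import random

def projection_matrix(d, keep):
    """Diagonal projection matrix P (idempotent: P @ P == P) that keeps
    the coordinates listed in `keep` and zeroes the rest."""
    return [[1.0 if (i == j and i in keep) else 0.0 for j in range(d)]
            for i in range(d)]

def apply(P, v):
    """Matrix-vector product P @ v."""
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]

def pso_step(x, v, pbest, gbest, P, w=0.7, c1=1.5, c2=1.5):
    """One standard PSO velocity/position update; the new velocity is
    projected by P, so the particle moves only in P's subspace."""
    d = len(x)
    v_new = [w * v[i]
             + c1 * random.random() * (pbest[i] - x[i])
             + c2 * random.random() * (gbest[i] - x[i]) for i in range(d)]
    v_new = apply(P, v_new)           # restrict motion to P's subspace
    x_new = [x[i] + v_new[i] for i in range(d)]
    return x_new, v_new

# Example: a 3-D particle restricted to the (x0, x2) plane.
P = projection_matrix(3, keep={0, 2})
x, v = [1.0, 2.0, 3.0], [0.1, 0.1, 0.1]
x2, v2 = pso_step(x, v, pbest=[0.0] * 3, gbest=[0.5] * 3, P=P)
# The projected velocity has no x1 component, so x1 stays fixed.
print(x2[1] == x[1])  # True
```

Because P is idempotent, repeated projection is stable, and the zeroed coordinates are provably unchanged regardless of the random coefficients.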
In this paper, we present an efficient edge chain detection algorithm by applying the Helmholtz principle on the gradient magnitude map of an image. An edge chain validation method is proposed which uses the “relative number of false alarms” (RNFA) instead of the traditional “number of false alarms” (NFA). The edge chains are detected first and then validated according to their RNFA values. In this...
Digital signal processors (DSPs) with very-long-instruction-word (VLIW) architectures have been widely used in communication systems in recent years. Parallelism requirements clearly differ between applications, and even within a single application. As a result, a scheme that partitions the application into several regions and assigns each region adapted parallelism has been proposed...
Despite the fast, coarse global search capability of Teaching-Learning-Based Optimization (TLBO), analyses in the literature reveal that TLBO often risks getting prematurely stuck in local optima on numerical optimization problems. In this study, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method is incorporated into the conventional TLBO to enhance its local search performance...
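For context, the TLBO teacher phase that the abstract builds on can be sketched as below (a generic textbook formulation, not this paper's hybrid; the BFGS refinement would be applied to the best learner afterward, which is omitted here):

```python
import random

def teacher_phase(pop, fitness):
    """One TLBO teacher phase: each learner moves toward the teacher
    (the current best learner) and away from the class mean, and the
    move is kept only if it improves fitness (greedy selection)."""
    d = len(pop[0])
    scores = [fitness(x) for x in pop]
    teacher = pop[scores.index(min(scores))]
    mean = [sum(x[i] for x in pop) / len(pop) for i in range(d)]
    new_pop = []
    for x in pop:
        tf = random.choice([1, 2])  # teaching factor
        cand = [x[i] + random.random() * (teacher[i] - tf * mean[i])
                for i in range(d)]
        new_pop.append(cand if fitness(cand) < fitness(x) else x)
    return new_pop

def sphere(x):  # simple convex test objective
    return sum(xi * xi for xi in x)

random.seed(0)
pop = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(10)]
best0 = min(sphere(x) for x in pop)
for _ in range(30):
    pop = teacher_phase(pop, sphere)
best = min(sphere(x) for x in pop)
print(best <= best0)  # True: greedy selection never worsens the best
```

The greedy selection step guarantees the best-so-far fitness is monotonically non-increasing; the hybrid's point is that a quasi-Newton polish converges to the nearby optimum far faster than these stochastic moves alone.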
Intelligent GPU cache bypassing can improve the efficiency of using GPU memory bandwidth, which can benefit GPU performance. In this paper, we study a pure hardware-based GPU cache bypassing method that can be applied to GPU applications without having to recompile the programs. Moreover, we introduce a hybrid method that can exploit profiling information to further enhance the hardware-based bypassing...
To minimize the access latency of set-associative caches, the data in all ways are read out in parallel with the tag lookup. However, this is energy inefficient, as only the data from the matching way is used and the rest is discarded. This paper proposes an early tag lookup (ETL) technique for L1 instruction caches that determines the matching way one cycle before the cache access, so that...
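The energy argument above can be illustrated with a back-of-envelope counting model (my simplification, not the paper's evaluation: it counts only data-array way reads and assumes every access hits):

```python
WAYS = 4  # associativity of the hypothetical L1 instruction cache

def data_way_reads(n_accesses, early_tag_lookup):
    """Count data-array way reads for n hit accesses.
    Parallel lookup reads all WAYS data arrays per access; if the tag
    match is known a cycle early, only the matching way is read."""
    per_access = 1 if early_tag_lookup else WAYS
    return n_accesses * per_access

baseline = data_way_reads(1000, early_tag_lookup=False)  # 4000 reads
etl = data_way_reads(1000, early_tag_lookup=True)        # 1000 reads
print(f"data-array reads saved: {1 - etl / baseline:.0%}")  # 75%
```

For a 4-way cache, resolving the tag early removes 3 of every 4 data-array reads; the real savings depend on array sizing and the cost of the earlier tag access.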
Cache memories have been introduced in recent generations of Graphics Processing Units (GPUs) to benefit general-purpose computing on GPUs (GPGPUs). In this work, we analyze the memory access patterns of GPGPU applications and propose a cost-effective profiling-based method to identify the data accesses that should bypass the L1 data cache to improve performance. The evaluation indicates that the...
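A toy model of profiling-based bypassing, as described above, can show why it helps (a minimal sketch under my own assumptions: a tiny fully-associative LRU cache and a reuse-count profile, not the paper's actual method or simulator):

```python
from collections import Counter

def profile_reuse(trace):
    """Profiling pass: count how often each address is accessed."""
    return Counter(trace)

def simulate(trace, cache_lines, bypass=frozenset()):
    """Tiny fully-associative LRU cache; addresses in `bypass` skip the
    cache entirely. Returns the number of cache hits."""
    lru, hits = [], 0
    for addr in trace:
        if addr in bypass:
            continue                 # bypassed: served by the next level
        if addr in lru:
            hits += 1
            lru.remove(addr)         # move to most-recently-used slot
        elif len(lru) >= cache_lines:
            lru.pop(0)               # evict the least recently used line
        lru.append(addr)
    return hits

# Workload: two hot addresses (100, 101) interleaved with a stream of
# single-use addresses that would otherwise thrash a 4-line cache.
trace = []
for i in range(50):
    trace += [100, 101]
    trace += [1000 + 10 * i + j for j in range(10)]  # never reused

bypass = frozenset(a for a, c in profile_reuse(trace).items() if c == 1)
base_hits = simulate(trace, 4)            # streaming evicts the hot data
bypass_hits = simulate(trace, 4, bypass)  # hot data stays resident
print(base_hits, bypass_hits)  # 0 98
```

Without bypassing, the streaming accesses evict the reused addresses before their next use, so every access misses; bypassing the single-use addresses keeps the hot pair resident, and all later accesses to it hit.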
Recent Graphics Processing Units (GPUs) have employed cache memories to boost performance. However, cache memories are well known to be harmful to time predictability for CPUs. For high-performance real-time systems using GPUs, it remains unknown whether or not cache memories should be employed. In this paper, we quantitatively compare the performance for GPUs with and without caches, and find that...
Cache memories are widely used in microprocessors to improve the average-case memory performance. However, they are harmful to time predictability, and thus may not be desirable for real-time systems. In this paper, we make simple hardware extensions of a regular cache to implement the performance enhancement guaranteed cache (PEG-C). The PEG-C is totally controlled by hardware, which can automatically...
Graphics Processing Units (GPUs) use multiple multithreaded SIMD cores to exploit data parallelism and boost performance. State-of-the-art GPUs use configurable shared memory and cache to improve performance for applications with different access patterns. Unlike CPU programs, GPU programs usually exhibit access patterns whose performance may not depend heavily on cache access latencies...
In this paper, we propose a Performance Enhancement Guaranteed Cache (PEG-C) to ensure performance benefit in the worst case while achieving as good average-case performance as a regular hardware-controlled cache. Our experiments indicate that with a small number of preloaded data and a simple hardware extension, the PEG-C can guarantee performance enhancement in the worst case while achieving the...
Large on-chip caches with uniform access time are inefficient in multicore processors due to increasing wire delays across the chip. The Non-Uniform Cache Architecture (NUCA) has proven effective at addressing these increasing wire delays in multicore processors. For real-time systems that use multicore processors, it is crucial to bound the worst-case execution time (WCET)...
In this paper, we comparatively evaluate the energy consumption of real-time and media benchmarks on three different hybrid on-chip memory architectures. Our evaluation indicates that while pure SPMs can lead to lower on-chip memory energy consumption than pure caches of the same size, pure caches can reduce total energy consumption more than pure SPMs by improving performance. The hybrid SPM-caches...
Scratch-Pad Memories (SPMs) have been increasingly used in embedded systems due to their time predictability and better energy efficiency as compared to caches. However, the SPM is typically controlled by software, which is less adaptive to runtime instruction/data access patterns that are dependent on the input data and hence may lead to performance degradation. In this paper, we study the energy...
Multicore processors are a common and necessary step in the evolution of the microprocessor. However, today's general-purpose multicore processors cannot provide even soft real-time guarantees. This work studies the performance of cache locking on a general-purpose multicore processor. The performance results for two different locking methods are determined in a variety of multicore configurations. It is...
The Trusted Platform Module (TPM) has gained popularity in computing systems as a hardware security approach. TPM provides boot-time security by verifying platform integrity, including hardware and software. However, once the software is loaded, TPM can no longer protect its execution. In this work, we propose a dynamic TPM design that performs control flow checking to protect the...
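The general flavor of runtime control-flow checking mentioned above can be sketched as follows (a purely illustrative software model: the edge names, whitelist, and digest scheme are my assumptions, not the paper's dynamic TPM design, which operates in hardware):

```python
import hashlib

# Hypothetical whitelist of valid (source, target) control-flow edges,
# measured at load time.
VALID_EDGES = {("check_pin", "grant"), ("check_pin", "deny")}

def edge_digest(src, dst):
    """Digest of one control-flow edge."""
    return hashlib.sha256(f"{src}->{dst}".encode()).hexdigest()

# Measured-in reference set (analogous to values extended into a PCR).
KNOWN = {edge_digest(s, d) for s, d in VALID_EDGES}

def check_transfer(src, dst):
    """Runtime monitor: accept a control transfer only if its digest
    matches one recorded in the whitelist."""
    return edge_digest(src, dst) in KNOWN

print(check_transfer("check_pin", "grant"))     # True
print(check_transfer("check_pin", "attacker"))  # False: hijacked edge
```

The point of doing this in hardware, as the abstract proposes, is that the check runs continuously after boot, covering the window where a conventional TPM's boot-time measurements no longer help.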
As transistor feature sizes scale down, soft errors in combinational logic caused by high-energy particle radiation are drawing increasing concern. In this paper, a soft error mitigation method based on accurate mathematical modeling of the soft error rate (SER) and the addition of non-invert functionally redundant wires (FRWs) is proposed. In the proposed method, the factors which have significant influences on the SER because...
Cache memories, while useful for improving the average-case performance for general-purpose applications, are not suitable for real-time systems due to the time unpredictability. In this paper, we propose a Performance Enhancement Guaranteed Cache (PEG-C) to ensure performance improvement in the worst case while achieving as good average-case performance as a regular hardware-controlled cache. We...
Reconfigurable architectures, such as Field-Programmable Gate Arrays (FPGAs), have become one of the key digital circuit implementation platforms over the last decade due to their short time-to-market and low design cost. However, the major bottlenecks of FPGAs are their low logic utilization rate and long reconfiguration latency. To overcome these limitations, novel dynamically reconfigurable...