Heterogeneous multicore processors that integrate CPU and GPU (Graphics Processing Unit) cores on the same chip pose new challenges for resource sharing, which is crucial for performance. Unlike traditional multicores, the CPU and GPU cores in the integrated architecture can generate significantly different amounts of cache traffic and exhibit quite diverse temporal and spatial data locality. The...
In this paper, a novel particle swarm optimizer is developed by introducing projection operators, described by projection matrices, into the algorithm. Under these operators, the particles oscillate along the directions determined by the projection matrices, which enhances global exploration. At the same time, the particles explore locally for optimal solutions when they are close to the...
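The abstract above describes restricting particle motion with a projection operator. A minimal, generic sketch of the idea (not the paper's exact operator or coefficients; the diagonal projection matrix, inertia weight, and coefficients below are illustrative assumptions) might look like:

```python
import random

def projection_matrix(d, keep):
    """Diagonal projection matrix P (idempotent: P @ P == P) that keeps
    the coordinates listed in `keep` and zeroes the rest."""
    return [[1.0 if (i == j and i in keep) else 0.0 for j in range(d)]
            for i in range(d)]

def apply(P, v):
    """Matrix-vector product P @ v."""
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]

def pso_step(x, v, pbest, gbest, P, w=0.7, c1=1.5, c2=1.5):
    """One standard PSO velocity/position update; the new velocity is
    projected by P, so the particle moves only in P's subspace."""
    d = len(x)
    v_new = [w * v[i]
             + c1 * random.random() * (pbest[i] - x[i])
             + c2 * random.random() * (gbest[i] - x[i]) for i in range(d)]
    v_new = apply(P, v_new)           # restrict motion to P's subspace
    x_new = [x[i] + v_new[i] for i in range(d)]
    return x_new, v_new

# Example: a 3-D particle restricted to the (x0, x2) plane.
P = projection_matrix(3, keep={0, 2})
x, v = [1.0, 2.0, 3.0], [0.1, 0.1, 0.1]
x2, v2 = pso_step(x, v, pbest=[0.0] * 3, gbest=[0.5] * 3, P=P)
# The projected velocity has no x1 component, so x1 stays fixed.
print(x2[1] == x[1])  # True
```

Because P is idempotent, repeated projection is stable, and the zeroed coordinates are provably unchanged regardless of the random coefficients.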
In this paper, we present an efficient edge chain detection algorithm by applying the Helmholtz principle on the gradient magnitude map of an image. An edge chain validation method is proposed which uses the “relative number of false alarms” (RNFA) instead of the traditional “number of false alarms” (NFA). The edge chains are detected first and then validated according to their RNFA values. In this...
Digital signal processors (DSPs) with very-long-instruction-word (VLIW) architectures have been widely used in communication systems in recent years. Parallelism requirements clearly differ between applications, and even within a single application. As a result, a scheme that partitions the application into several regions and assigns each region adapted parallelism has been proposed...
Despite the fast, coarse global search capability of Teaching-Learning-Based Optimization (TLBO), analyses in the literature reveal that TLBO often risks getting prematurely stuck in local optima on numerical optimization problems. In this study, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method is incorporated into the conventional TLBO to enhance its local search performance...
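For context, the TLBO teacher phase that the abstract builds on can be sketched as below (a generic textbook formulation, not this paper's hybrid; the BFGS refinement would be applied to the best learner afterward, which is omitted here):

```python
import random

def teacher_phase(pop, fitness):
    """One TLBO teacher phase: each learner moves toward the teacher
    (the current best learner) and away from the class mean, and the
    move is kept only if it improves fitness (greedy selection)."""
    d = len(pop[0])
    scores = [fitness(x) for x in pop]
    teacher = pop[scores.index(min(scores))]
    mean = [sum(x[i] for x in pop) / len(pop) for i in range(d)]
    new_pop = []
    for x in pop:
        tf = random.choice([1, 2])  # teaching factor
        cand = [x[i] + random.random() * (teacher[i] - tf * mean[i])
                for i in range(d)]
        new_pop.append(cand if fitness(cand) < fitness(x) else x)
    return new_pop

def sphere(x):  # simple convex test objective
    return sum(xi * xi for xi in x)

random.seed(0)
pop = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(10)]
best0 = min(sphere(x) for x in pop)
for _ in range(30):
    pop = teacher_phase(pop, sphere)
best = min(sphere(x) for x in pop)
print(best <= best0)  # True: greedy selection never worsens the best
```

The greedy selection step guarantees the best-so-far fitness is monotonically non-increasing; the hybrid's point is that a quasi-Newton polish converges to the nearby optimum far faster than these stochastic moves alone.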
Intelligent GPU cache bypassing can improve the efficiency of using GPU memory bandwidth, which can benefit GPU performance. In this paper, we study a pure hardware-based GPU cache bypassing method that can be applied to GPU applications without having to recompile the programs. Moreover, we introduce a hybrid method that can exploit profiling information to further enhance the hardware-based bypassing...
To minimize the access latency of set-associative caches, the data in all ways are read out in parallel with the tag lookup. However, this is energy inefficient, as only the data from the matching way is used and the rest is discarded. This paper proposes an early tag lookup (ETL) technique for L1 instruction caches that determines the matching way one cycle before the cache access, so that...
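The energy argument above can be illustrated with a back-of-envelope counting model (my simplification, not the paper's evaluation: it counts only data-array way reads and assumes every access hits):

```python
WAYS = 4  # associativity of the hypothetical L1 instruction cache

def data_way_reads(n_accesses, early_tag_lookup):
    """Count data-array way reads for n hit accesses.
    Parallel lookup reads all WAYS data arrays per access; if the tag
    match is known a cycle early, only the matching way is read."""
    per_access = 1 if early_tag_lookup else WAYS
    return n_accesses * per_access

baseline = data_way_reads(1000, early_tag_lookup=False)  # 4000 reads
etl = data_way_reads(1000, early_tag_lookup=True)        # 1000 reads
print(f"data-array reads saved: {1 - etl / baseline:.0%}")  # 75%
```

For a 4-way cache, resolving the tag early removes 3 of every 4 data-array reads; the real savings depend on array sizing and the cost of the earlier tag access.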
Cache memories have been introduced in recent generations of Graphics Processing Units (GPUs) to benefit general-purpose computing on GPUs (GPGPUs). In this work, we analyze the memory access patterns of GPGPU applications and propose a cost-effective profiling-based method to identify the data accesses that should bypass the L1 data cache to improve performance. The evaluation indicates that the...
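A toy model of profiling-based bypassing, as described above, can show why it helps (a minimal sketch under my own assumptions: a tiny fully-associative LRU cache and a reuse-count profile, not the paper's actual method or simulator):

```python
from collections import Counter

def profile_reuse(trace):
    """Profiling pass: count how often each address is accessed."""
    return Counter(trace)

def simulate(trace, cache_lines, bypass=frozenset()):
    """Tiny fully-associative LRU cache; addresses in `bypass` skip the
    cache entirely. Returns the number of cache hits."""
    lru, hits = [], 0
    for addr in trace:
        if addr in bypass:
            continue                 # bypassed: served by the next level
        if addr in lru:
            hits += 1
            lru.remove(addr)         # move to most-recently-used slot
        elif len(lru) >= cache_lines:
            lru.pop(0)               # evict the least recently used line
        lru.append(addr)
    return hits

# Workload: two hot addresses (100, 101) interleaved with a stream of
# single-use addresses that would otherwise thrash a 4-line cache.
trace = []
for i in range(50):
    trace += [100, 101]
    trace += [1000 + 10 * i + j for j in range(10)]  # never reused

bypass = frozenset(a for a, c in profile_reuse(trace).items() if c == 1)
base_hits = simulate(trace, 4)            # streaming evicts the hot data
bypass_hits = simulate(trace, 4, bypass)  # hot data stays resident
print(base_hits, bypass_hits)  # 0 98
```

Without bypassing, the streaming accesses evict the reused addresses before their next use, so every access misses; bypassing the single-use addresses keeps the hot pair resident, and all later accesses to it hit.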
Recent Graphics Processing Units (GPUs) have employed cache memories to boost performance. However, cache memories are well known to be harmful to time predictability for CPUs. For high-performance real-time systems using GPUs, it remains unknown whether or not cache memories should be employed. In this paper, we quantitatively compare the performance for GPUs with and without caches, and find that...
Cache memories are widely used in microprocessors to improve the average-case memory performance. However, they are harmful to time predictability, and thus may not be desirable for real-time systems. In this paper, we make simple hardware extensions of a regular cache to implement the performance enhancement guaranteed cache (PEG-C). The PEG-C is totally controlled by hardware, which can automatically...
Graphics Processing Units (GPUs) use multiple multithreaded SIMD cores to exploit data parallelism and boost performance. State-of-the-art GPUs use configurable shared memory and cache to improve performance for applications with different access patterns. Unlike CPU programs, GPU programs usually exhibit access patterns whose performance may not depend heavily on cache access latencies...
In this paper, we propose a Performance Enhancement Guaranteed Cache (PEG-C) to ensure performance benefit in the worst case while achieving as good average-case performance as a regular hardware-controlled cache. Our experiments indicate that with a small number of preloaded data and a simple hardware extension, the PEG-C can guarantee performance enhancement in the worst case while achieving the...
Large on-chip caches with uniform access time are inefficient in multicore processors due to increasing wire delays across the chip. The Non-Uniform Cache Architecture (NUCA) has proven effective at addressing these increasing wire delays in multicore processors. For real-time systems that use multicore processors, it is crucial to bound the worst-case execution time (WCET)...
In this paper, we comparatively evaluate the energy consumption of real-time and media benchmarks on three different hybrid on-chip memory architectures. Our evaluation indicates that while pure SPMs can lead to lower on-chip memory energy consumption than pure caches of the same size, pure caches can reduce total energy consumption more than pure SPMs by improving performance. The hybrid SPM-caches...
Scratch-Pad Memories (SPMs) have been increasingly used in embedded systems due to their time predictability and better energy efficiency as compared to caches. However, the SPM is typically controlled by software, which is less adaptive to runtime instruction/data access patterns that are dependent on the input data and hence may lead to performance degradation. In this paper, we study the energy...
Multicore processors are a common and necessary step in the evolution of the microprocessor. However, today's general-purpose multicore processors cannot provide even soft real-time guarantees. This work studies the performance of cache locking on a general-purpose multicore processor. The performance results for two different locking methods are determined in a variety of multicore configurations. It is...
The Trusted Platform Module (TPM) has gained popularity in computing systems as a hardware security approach. TPM provides boot-time security by verifying platform integrity, including hardware and software. However, once the software is loaded, TPM can no longer protect its execution. In this work, we propose a dynamic TPM design that performs control flow checking to protect the...
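The general flavor of runtime control-flow checking mentioned above can be sketched as follows (a purely illustrative software model: the edge names, whitelist, and digest scheme are my assumptions, not the paper's dynamic TPM design, which operates in hardware):

```python
import hashlib

# Hypothetical whitelist of valid (source, target) control-flow edges,
# measured at load time.
VALID_EDGES = {("check_pin", "grant"), ("check_pin", "deny")}

def edge_digest(src, dst):
    """Digest of one control-flow edge."""
    return hashlib.sha256(f"{src}->{dst}".encode()).hexdigest()

# Measured-in reference set (analogous to values extended into a PCR).
KNOWN = {edge_digest(s, d) for s, d in VALID_EDGES}

def check_transfer(src, dst):
    """Runtime monitor: accept a control transfer only if its digest
    matches one recorded in the whitelist."""
    return edge_digest(src, dst) in KNOWN

print(check_transfer("check_pin", "grant"))     # True
print(check_transfer("check_pin", "attacker"))  # False: hijacked edge
```

The point of doing this in hardware, as the abstract proposes, is that the check runs continuously after boot, covering the window where a conventional TPM's boot-time measurements no longer help.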
As transistor feature sizes scale down, soft errors in combinational logic caused by high-energy particle radiation are drawing increasing concern. In this paper, a soft error mitigation method based on accurate mathematical modeling of the soft error rate (SER) and the addition of non-invert functionally redundant wires (FRWs) is proposed. In the proposed method, the factors which have significant influences on the SER because...
Cache memories, while useful for improving the average-case performance for general-purpose applications, are not suitable for real-time systems due to the time unpredictability. In this paper, we propose a Performance Enhancement Guaranteed Cache (PEG-C) to ensure performance improvement in the worst case while achieving as good average-case performance as a regular hardware-controlled cache. We...
Reconfigurable architectures, such as Field-Programmable Gate Arrays (FPGAs), have become one of the key digital circuit implementation platforms over the last decade due to their short time-to-market and low design cost. However, the major bottlenecks of FPGAs are their low logic utilization rate and long reconfiguration latency. To overcome these limitations, novel dynamically reconfigurable...