This paper presents a static-placement, dynamic-issue (SPDI) framework for coarse-grained reconfigurable architectures (CGRAs) to tackle the inefficiencies of static-issue, static-placement (SISP) CGRAs. The framework comprises a compiler that statically places the operations and a hardware design, the SPDI CGRA, that schedules those operations automatically. We focus on introducing the...
Graphics Processing Units (GPUs) are designed to exploit large amounts of parallelism. However, warp-level divergence caused by differing amounts of work, differing memory access latencies, etc., results in the warps of a thread block (TB) finishing kernel execution at different times. This reduces the resource utilization of the SMs and hence the performance of the GPU. We propose...
GPU adoption for general-purpose computing has been accelerating. To support a large number of concurrently active threads, GPUs are provisioned with a very large register file (RF). The RF power consumption is a critical concern. One option to reduce the power consumption dramatically is to use near-threshold voltage (NTV) to operate the RF. However, operating MOSFET devices at NTV is fraught with stability and...
We present an effective code compression technique to reduce the area and energy overhead of the configuration memory for coarse-grained reconfigurable architectures (CGRA). Based on a statistical analysis of existing code, the proposed method reorders the storage locations of the reconfigurable entities and splits the wide configuration memory into a number of partitions. Code compression is achieved...
CGRAs are promising as accelerators due to their improved energy-efficiency compared to FPGAs. Existing CGRAs support reconfigurability for operations, but not communications because of the static neighbor-to-neighbor interconnect, leading to both performance loss and increased complexity of the compiler. In this paper, we introduce HyCUBE, a novel CGRA architecture with a reconfigurable interconnect...
As more sophisticated services are increasingly offered by the OS kernel on mobile devices, the security and sensitivity of kernel data that they depend on are becoming a critical issue. Data isolation has emerged as a key technique that can address the issue by providing strong protection for sensitive kernel data. However, existing data isolation mechanisms for mobile devices all incur non-negligible...
To provide QoS for resource-limited mobile systems, we introduce a fast preemption mechanism on GPUs. First, we employ a dual-kernel execution model to support fine-grained preemption, and a resource allocation policy to avoid resource fragmentation. Second, we propose a preemption victim selection scheme that reduces the throughput overhead while satisfying a required preemption latency. Evaluations...
Virtualization is essential in supporting today's information infrastructure, and in particular the Cloud Computing area. However, the move to a virtualized architecture implies the addition of a new single point of failure: the hypervisor. Attempts to characterize and compare the susceptibility of systems (including virtualized systems) are often limited to the study of failure modes and their probabilities...
Attacks on an operating system kernel using kernel rootkits pose a particularly serious threat. Detecting an attack is difficult when the operating system kernel is infected with a kernel rootkit. As a result, the response to an attack is delayed, which increases the damage done to the computer system. In this paper, we discuss KRGuard (Kernel Rootkits Guard), which is a new method...
Risk permeates all aspects of doing business. However, support tools capable of systematically identifying the complete spectrum of risks that a company might face are currently lacking. Such a tool would need to reliably identify company-risk relationships from unstructured sources, thereby providing a qualitative assessment of risk exposure. We propose a supervised learning approach that combines...
With the rapid development of network technology and the increasing complexity of system functions, embedded systems face increasingly serious threats. Previous research on kernel monitoring and protection relies widely on higher-privileged system components, such as hardware virtualization extensions, to isolate security tools from potential kernel attacks. These approaches increase both...
Context switching is a key technique enabling preemption and time-multiplexing for CPUs. However, for single-instruction multiple-thread (SIMT) processors such as high-end graphics processing units (GPUs), it is challenging to support context switching due to the massive number of threads, which leads to a huge amount of architectural state to be swapped during context switching. The architectural...
Modern video standards such as H.264 and HEVC introduce new simplified transform functions that allow for simple hardware implementation, different block sizes and enhanced coding efficiency. However, the number of different transforms to implement has increased, leading to the need for shared architectures able to process several transforms with minimal hardware overhead. This trend started with H...
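As an illustration of what a "simplified transform" means in hardware terms, the sketch below shows the well-known H.264 4x4 forward core integer transform, computed with a butterfly that needs only additions, subtractions and one-bit shifts. It is a minimal sketch, assuming that the post-scaling and quantization stages are handled elsewhere; it is not the shared multi-transform architecture the abstract refers to, and the function names are hypothetical.

```python
# H.264 4x4 forward core transform matrix (scaling/quantization omitted).
C = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def butterfly_1d(x):
    """4-point forward core transform using only adds, subtracts and shifts."""
    x0, x1, x2, x3 = x
    e0 = x0 + x3
    e1 = x1 + x2
    e2 = x1 - x2
    e3 = x0 - x3
    return [e0 + e1, (e3 << 1) + e2, e0 - e1, e3 - (e2 << 1)]


def forward_core_4x4(block):
    """Separable 2-D transform Y = C X C^T: rows first, then columns."""
    rows = [butterfly_1d(r) for r in block]                             # X C^T
    cols = [butterfly_1d([rows[i][j] for i in range(4)]) for j in range(4)]
    # cols[j] is column j of C X C^T; transpose back to row-major order.
    return [[cols[j][i] for j in range(4)] for i in range(4)]


# Reference helpers used only to check the butterfly against the matrix form.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(r) for r in zip(*m)]
```

For a flat block of ones, all energy lands in the DC coefficient: `forward_core_4x4([[1] * 4 for _ in range(4)])` gives 16 in the top-left position and zeros elsewhere. Because the multiplications by 2 are shifts, a hardware unit needs no multipliers for this stage, which is the property the abstract alludes to.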
The Clang implementation of OpenMP® 4.5 now provides full support for the specification, offering the only open source option for targeting NVIDIA® GPUs. While using OpenMP allows portability across different architectures, matching native CUDA® performance without major code restructuring is an open research issue. In order to analyze the current performance, we port a suite of representative benchmarks,...
Today's multimedia and DSP applications impose requirements on performance and power consumption that only custom processor architectures with SIMD capabilities can satisfy. However, the specific features of such architectures, including vector operations and high-bandwidth complex memory organization, make them notoriously complicated and time consuming to program. In this paper we present an automated...
Stencil computations form the basis for computer simulations across almost every field of science, such as computational fluid dynamics, data mining, and image processing. Their mostly regular data access patterns potentially enable them to take advantage of the high computation and data bandwidth of GPUs, but only if data buffering and other issues are handled properly. Finding a good code generation...
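To make "mostly regular data access patterns" concrete, the sketch below performs one sweep of a generic 2-D five-point Jacobi stencil: every interior cell is recomputed from a fixed set of neighbour offsets. It is a minimal CPU-side sketch, assuming fixed (Dirichlet) boundary values; the function name is hypothetical and the code is not the code generator described in the abstract.

```python
def jacobi_step(grid):
    """One 5-point Jacobi sweep over a 2-D grid (list of lists of floats).

    Each interior cell becomes the average of its four axis-aligned
    neighbours, read from the *old* grid. Boundary cells are copied
    unchanged (fixed/Dirichlet boundary). The input is not modified.
    """
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy keeps the boundary values
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Same neighbour offsets for every cell: a regular access pattern.
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out
```

A 3x3 grid with a border of 1.0 and a centre of 0.0 yields a centre of 1.0 after one call. On a GPU, a common mapping (an assumption here, not a claim about the paper) assigns one output cell per thread and stages a tile plus its one-cell halo in shared memory; getting that buffering right is the kind of issue the abstract says must be handled properly.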
The emergence of the Internet of Things (IoT) era has encouraged intensive, if not extensive, use of modern mobile apps; multicore processors equipped with multiple ISAs therefore have great potential for more efficient processing of instruction binaries in the near future. To support this ISA diversity of computing platforms, mixed modes of static and dynamic Binary Translation and Optimization...
Recent advances in processor manufacturing have led to integrating tens of cores in a single chip and promise to integrate many more with the so-called manycore architectures. Manycore architectures usually integrate many small power-efficient cores, which can be 32-bit cores in order to maximize the performance per Watt ratio. Providing large physical memory (e.g. 1 TB) to such architectures thus...
Despite many efforts to better utilize the potential of GPUs and CPUs, this potential is still far from being fully exploited. Although many tasks can be easily sped up by using accelerators, most of the existing schedulers are not flexible enough to really optimize the resource usage of the complete system. The main reasons are (i) that each processing unit requires a specific program code and that this code is often...
Software memory disclosure attacks, such as buffer over-reads, often work quietly and may leak secrets. The well-known OpenSSL Heartbleed vulnerability exposed the private keys of millions of servers and left most Internet services insecure during that time. Existing solutions are either hard to apply to large code bases, or too heavyweight (e.g., by involving hypervisor software or...