The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
With FPGAs emerging as a promising accelerator for general-purpose computing, there is a strong demand to make them accessible to software developers. Recent advances in OpenCL compilers for FPGAs pave the way for synthesizing FPGA hardware from OpenCL kernel code. To enable broader adoption of this paradigm, significant challenges remain. This paper presents our efforts in developing dynamic profiling...
While Deep Neural Networks (DNNs) push the state-of-the-art in many machine learning applications, they often require millions of expensive floating-point operations for each input classification. This computation overhead limits the applicability of DNNs to low-power, embedded platforms and incurs high cost in data centers. This motivates recent interests in designing low-power, low-latency DNNs...
In response to the tremendous growth of the Internet, towards what we call the Internet of Things (IoT), there is a need to move from costly, high-time-to-market specific-purpose hardware to flexible, low-time-to-market general-purpose devices for packet processing. Among several such devices, GPUs have attracted attention in the past, mainly because the high computing demand of packet processing...
As more sophisticated services are increasingly offered by the OS kernel on mobile devices, the security and sensitivity of kernel data that they depend on are becoming a critical issue. Data isolation has emerged as a key technique that can address the issue by providing strong protection for sensitive kernel data. However, existing data isolation mechanisms for mobile devices all incur non-negligible...
The binary-weight CNN is one of the most efficient solutions for mobile CNNs. However, a large number of operations are required to process each image. To reduce such a huge operation count, we propose an energy-efficient kernel decomposition architecture, based on the observation that a large number of operations are redundant. In this scheme, all kernels are decomposed into sub-kernels to expose...
Detection of malicious software at the hardware level is emerging as an effective solution to increasing security threats. Hardware based detectors rely on Machine Learning(ML) classifiers to detect malware-like execution pattern based on Hardware Performance Counters(HPC) information at run-time. The effectiveness of these learning methods mainly relies on the information provided by expensive-to-implement...
Approximate computing aims to expose and exploit quality vs. efficiency tradeoffs to enable ever-more demanding applications on energy-constrained devices such as smartphones, or IoT devices. This paper makes the case for arbitrary quantization as a compelling approximation technique that exposes quality vs. energy tradeoffs and provides practical error guarantees. We present QAPPA (Quality Autotuner...
REDEFINE is a distributed dynamic dataow architecture, designed for exploiting parallelism at various granularities as an embedded system-on-chip (SoC). is paper dwells on the exibility of REDEFINE architecture and its execution model in accelerating real-time applications coupled with a WCET analyzer that computes execution time bounds of real time applications.
GPUs have been widely adopted in data centers to provide acceleration services to many applications. Sharing a GPU is increasingly important for better processing throughput and energy efficiency. However, quality of service (QoS) among concurrent applications is minimally supported. Previous efforts are too coarse-grained and not scalable with increasing QoS requirements. We propose QoS mechanisms...
Long latency of memory operation is a prominent performance bottleneck in graphics processing units (GPUs). The small data cache that must be shared across dozens of warps (a collection of threads) creates significant cache contention and premature data eviction. Prior works have recognized this problem and proposed warp throttling which reduces the number of active warps contending for cache space...
With the end of Dennard scaling, architects have increasingly turned to special-purpose hardware accelerators to improve the performance and energy efficiency for some applications. Unfortunately, accelerators don't always live up to their expectations and may under-perform in some situations. Understanding the factors which effect the performance of an accelerator is crucial for both architects and...
We require a very good technical knowledge to create automated tests to exploit the browser vulnerabilities. It is usually a combination of technical abilities and set of specific tools. Security concerns is of prime importance when it comes to web browsers. Attacks during surfing, executing any downloaded file and while transmission are very frequent these days and hence all browsers need to be hardened...
As computer systems increase in size and complexity, bugs become ever subtler and more difficult to detect and diagnose. A bug could exist at different layers of computer systems (e.g., applications, shared libraries, file systems, device firmware), or could be caused by the incompatibility among layers. In many cases, bugs would require a very specific combination of events to be triggered and are...
In this paper, DFGenTool, a dataflow graph (DFG) generation tool, is presented, which converts loops in a sequential program given in a high-level language such as C, into a DFG. DFGenTool adapts DFGs for mapping to Coarse Grain Reconfigurable Architectures (CGRA) to enable a variety of CGRA implementations and compilers to be benchmarked against a standard set of DFGs. Several kernels have been converted...
Virtualization is essential in supporting today's information infrastructure, and in particular the Cloud Computing area. However, the move to a virtualized architecture implies the addition of a new single point of failure: the hypervisor. Attempts to characterize and compare the susceptibility of systems (including virtualized systems) are often limited to the study of failure modes and their probabilities...
Security in Cyber-Physical Systems (CPS) has become a serious concern owing to the rapid adoption of technologies such as plug-and-play connectivity, robotics and remote coordination and control. It is well understood that the performance overhead incurred due to security considerations is rather high, which needs to be captured holistically for a real-time CPS with strict timing budget and hard deadlines...
Application profiling is an important step in the design and optimization of embedded systems. Accurately identifying and analyzing the execution of frequently executed computational kernels is needed to effectively optimize the system implementation, at both design time and runtime. Most previous profiling approaches are software based, which can incur significant overhead and may be prohibitive...
Cloud computing is one of the most popular Internet concepts, and many large companies provide cloud services to users. These large companies have built their own data centers to support upper layers of cloud services. To save cost and increase flexibility, SDN and virtualization technologies are widely used in data centers. Open vSwitch is an open source virtual switch that supports the OpenFlow...
This paper introduces a hardware TCP Offload Engine (TOE) aiming at low-latency communication systems. The throughput can reach 9.99 Gbps with the Jumbo frame. The input-to-output receiving latency of a packet consists of 100 bytes payload and 64 bytes header with timestamp is close to 90 nanoseconds. The application-to-application latency between the proposed acceleration system and the native Windows...
Random projections have recently emerged as a powerful technique for large scale dimensionality reduction in machine learning applications. Crucially, the projection can be obtained from sparse probability distributions, enabling hardware implementations with little overhead. In this paper, we describe a Field-Programmable Gate Array (FPGA) implementation alongside a kernel adaptive filter (KAF) that...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.