The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
This article discusses the general structure of the CNC control system with the Soft PLC module which was implemented as an external unit. The advantages of reducing the EtherCAT cycle, increasing control system reliability and disadvantages associated with the additional cost of the proposed solution were investigated. The proposed solution also uses the external Soft PLC instead of the safety controller...
We present a highly hardware friendly STDP (Spike Timing Dependent Plasticity) learning rule for training Spiking Convolutional Cores in Unsupervised mode and training Fully Connected Classifiers in Supervised Mode. Examples are given for a 2-layer Spiking Neural System which learns in real time features from visual scenes obtained with spiking DVS (Dynamic Vision Sensor) Cameras.
Machine Learning techniques such as Support Vector Machines (SVM) have found applications in many fields, e.g. in Wireless Sensor Networks (WSN) and sensor data processing in general. Especially in the case of WSN energy is very limited as agents solely operate based on battery power after they have been deployed, therefore energy efficiency is of great importance. Furthermore, agents are supposed...
High-Level Synthesis (HLS) has been widely recognized as an efficient compilation process targeting FPGAs for algorithm evaluation and product prototyping. However, the massively parallel memory access demands and the extremely expensive cost of single-bank memory with multi-port have impeded loop pipelining performance. Thus, based on an alternative multi-bank memory architecture, a joint approach...
This work presents an efficient hardware accelerator design of deep residual learning algorithms, which have shown superior image recognition accuracy (>90% top-5 accuracy on ImageNet database). Two key objectives of the acceleration strategy are to (1) maximize resource utilization and minimize data movements, and (2) employ scalable and reusable computing primitives to optimize physical design...
Bilateral filtering (BLF) and median filtering (MF) are key components in many applications. As the image resolution grows rapidly, implementation of efficient filtering is highly demanded. In this paper, we present a unified VLSI architecture that is able to compute both kinds of filters for 4k2k videos at 30fps. One feature of this design is that we leverage an emerging layer-based algorithm for...
Heterogeneous Multi-Processor Systems-on-Chip, whether ARM or x86 based, promise further performance scalability by complementing temporal compute in CPUs/GPUs with spatial compute in digital circuitry. Dynamic partial reconfiguration (DPR) extends such compute architectures by making use of different spatial compute elements over time. Novel research [1] presents means for operating DPR by the Linux...
The fast evolution of mobile devices has made them the center of attention for not only the research industry, but also malicious actors, as smartphones are used to store, transmit and process sensitive information. The diversity and number of typically installed applications create windows of opportunity for attackers. Attackers can use vulnerable applications to gain control over the device or change...
In this paper, we present our work to enable optimized one-sided communication operations on the ARM v8 architecture using a high-performance InfiniBand network interconnect, as well as an evaluation of our implementation. For this study, we started with an OpenSHMEM implementation based on Open MPI/SHMEM, and combined it with the UCX framework and the XPMEM kernel extension for shared memory communication...
FPGAs are continuously increasing in both chip size and operating frequency. Dynamic reconfiguration is easier and more stable with current generation of hardware and software tools. These characteristics have made them more accessible to generic acceleration tasks instead of specialized functions. As a consequence, FPGAs are being deployed in more computing clusters than in the past. This leads to...
Realizing the next generation of radio telescopes such as the Square Kilometre Array (SKA) requires both more efficient hardware and algorithms than today's technology provides. The recently introduced image-domain gridding (IDG) algorithm is a novel approach towards solving the most compute-intensive parts of creating sky images: gridding and degridding. It avoids the performance bottlenecks of traditional...
Computing systems have become increasingly heterogeneous contributing to higher performance and power efficiency. However, this is at the cost of increasing the overall complexity of designing such systems. One key challenge in the design of heterogeneous systems is the efficient scheduling of computational load. To address this challenge, this paper thoroughly analyzes state of the art scheduling...
The increased use of application-specific computational devices turns even low-power chips into high-performance computers. Not only additional accelerators (e.g., GPU, DSP, or even FPGA), but also heterogeneous CPU clusters form modern computer systems. Programming these chips is however challenging, due to management overhead, data transfer delays, and a missing unification of the programming flow...
Based on the requirements of miniaturization, stability and definition of the image acquisition device, an embedded Linux image acquisition and display system based on embedded system is designed. The system hardware using ARM core S3C2440 microprocessor, USB camera and LCD display to build image acquisition and display system; the software system placed Linux system as the core is built. Build hardware...
Simulation is a fast, controlled, and reproducible way to evaluate new algorithms for distributed computing platforms in a variety of conditions. However, the realism of simulations is rarely assessed, which critically questions the applicability of a whole range of findings. In this paper, we present our efforts to build platform models from application traces, to allow for the accurate simulation...
Convolutional Neural Networks (CNNs) have proven effective for machine learning tasks such as computer vision. Analog, asynchronous hardware implementations of such neural networks appear to be promising avenues for fast, online, real-time, energy efficient machine learning. However, the weight-sharing requirements of CNNs present challenges for such neuromorphic designs. We propose a biologically...
In the field of high performance heterogeneous computing systems, field programmable gate arrays (FPGAs) have shown great advantages in terms of acceleration and energy efficiency. And with the inclusion of the OpenCL framework for parallel programming, the design complexity has been greatly reduced. However, the parallel implementation of applications containing data-dependent branches usually experiences...
This paper deals with the recently introduced class of Non-Surjective Finite Alphabet Iterative Decoders (NS-FAIDs). First, optimization results for an extended class of regular NS-FAIDs are presented. They reveal different possible trade-offs between decoding performance and hardware implementation efficiency. To validate the promises of optimized NS-FAIDs in terms of hardware implementation benefits,...
High performance computing (HPC) systems frequently suffer errors and failures from hardware components that negatively impact the performance of jobs run on these systems. We analyzed system logs from two HPC systems at Purdue University and created statistical models for memory and hard disk errors. We created a small-scale error injection testbed—using a customized QEMU build, libvirt, and Python—that...
Instruction set randomization (ISR) was proposed early in the last decade as a countermeasure against code injection attacks. However, it is considered to have lost its relevance; with the pervasiveness of code-reuse techniques in modern attacks, code injection no longer remains a foundational component in contemporary exploits. This paper revisits the relevance of ISR in the current security landscape...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.