Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in machine learning and other domains. Since this is likely to continue for the foreseeable future, it is important to study techniques that can make it run fast on parallel hardware. In this paper, we provide the first analysis of a technique called BUCKWILD! that uses both asynchronous execution and low-precision...
Support Vector Machines (SVMs) are supervised machine learning models whose performance depends strongly on their hyperparameters. The Bio-inspired Optimization Tool for SVM (BIOTS) is based on a Multi-Objective Particle Swarm Algorithm (MOPSO) to tune the hyperparameters of SVMs. In this work, BIOTS is proposed along with a custom hardware design generator (VHDL) that implements...
Field-Programmable Gate Arrays (FPGAs) have been gaining considerable momentum in mainstream high-performance systems in recent years due to their flexibility and low power consumption. Still, FPGAs remain largely inaccessible to software programmers due to the programming and debugging difficulties inherent to standard Hardware Description Languages. The performance that hardware-oblivious software...
This paper evaluates the resurgence of FPGAs for hardware acceleration, applied to the back-projection operator used in iterative reconstruction algorithms for computed tomography. We focus our attention on the tools developed by FPGA manufacturers, in particular the Intel FPGA SDK for OpenCL, which promises a new level of hardware abstraction from the developer's perspective, allowing a...
Hash functions represent a fundamental building block of many network security protocols. The SHA-3 hashing algorithm is the most recently developed hash function, and the most secure. Implementing the SHA-3 hashing algorithm in a Hardware Description Language (HDL) is time-consuming and tedious to debug. On the other hand, High-Level Synthesis (HLS) tools offer potential solutions to the hardware...
The addition of hard blocks such as Block RAMs and Digital Signal Processors has proven to be a good means of improving various performance metrics in FPGAs. This, however, places stricter constraints on the runtime relocation of hardware tasks and hence reduces their applicability in dealing with permanent faults. In this paper, we present a strategy that enhances the utilization of heterogeneous reconfigurable...
As the power wall has become one of the main limiting factors for the performance of general purpose processors, the trend in High Performance Computing (HPC) is moving towards application-specific accelerators in order to meet the stringent performance requirements for exascale computing while still satisfying power budget constraints. Within this context, reconfigurable devices, and more specifically...
Recent developments in industry, such as Microsoft's use of FPGAs (Field Programmable Gate Arrays) to accelerate its Bing search engine and Intel's initiative to place its CPU and an Altera FPGA on the same chip, indicate the potential of FPGAs as well as the growing demand for them in the field of high performance computing. FPGAs provide accelerated computation due to their flexible architecture...
To the present day, a multitude of studies aim to understand how the Central Nervous System (CNS) translates neural pulses into muscle motor tasks through the analysis of surface EMG (sEMG) recordings. One of the most prominent methods applies Non-Negative Matrix Factorization (NMF) to data recorded from sEMG electrodes to extract coordinated motor patterns, the so-called muscle synergies,...
High performance computing platforms are moving from homogeneous individual units to heterogeneous systems, where each unit is a combination of homogeneous cores and accelerator devices. Accelerators such as GPUs, FPGAs, and DSPs are usually designed for specific, intensive types of computing tasks. The presence of these devices has created fresh and attractive development platforms for...
With RFC 7748, the two elliptic curves Curve25519 and Curve448 were proposed for the next generation of TLS. Both curves were designed and optimized purely for software implementation; their implementation in hardware and their physical protection against side-channel attacks were not considered in the design phase. Recently, it has been shown that for Curve25519 an efficient implementation in hardware along...
Convolutional Neural Networks (CNNs) are widely applied in modern machine learning and pattern recognition. Beyond raw performance, more and more attention is being paid to energy-efficient and scalable devices such as FPGAs as a better solution than CPUs and GPUs. In this paper, we propose methods to optimize CNNs by fixed-point quantization, activation function approximation, loop and task pipelining and...
Most advanced driver assistance systems are developed for safety and better driving. Safety systems that use image processing, such as the Hough transform, require a lot of memory, whose underutilization can degrade real-time performance. Internal memories on reconfigurable devices such as FPGAs are limited in size, number and bandwidth. Memory optimization cannot be done solely at the application...
Evolutionary algorithms play an important role in finding solutions to many problems that are not solved by classical methods, particularly in cases where solutions lie within extremely non-convex multidimensional spaces. The intrinsically parallel structure of evolutionary algorithms is amenable to the simultaneous testing of multiple solutions; this has proved essential to the circumvention...
In the Orthogonal Matching Pursuit (OMP) algorithm for compressed sensing reconstruction, an individual iteration may fail to select the optimal atom. To address this problem, an optimized OMP algorithm is designed that ensures the residual of the observation signal is minimized at each iteration, and an FPGA-based hardware architecture implementing the optimized OMP algorithm is proposed. In the...
Metaheuristic algorithms such as the Bat algorithm (BA) are becoming powerful tools for solving many tough optimization problems. BA is one of the recently proposed metaheuristic algorithms for global optimization, and it performs well compared to other well-known algorithms such as particle swarm optimization (PSO) or the genetic algorithm (GA). The application of BA in this...
The newly introduced Kubelka-Munk Genetic Algorithm (KMGA) is a promising technique used in the assessment of skin lesions. Unfortunately, this method is computationally expensive due to its function inversion process. In this paper, we design a Predictive Function Optimization Algorithm to improve the efficiency of KMGA by speeding up its convergence rate. Using this approach,...
In this paper, we propose an FPGA memory hierarchy based on the OpenCL memory model. The memory hierarchy allows application-specific memory optimizations during design compilation using information provided in OpenCL kernels. With the proposed memory hierarchy, FPGA application developers can focus on their designs in OpenCL kernel code, and their designs can be synthesized into FPGA hardware via...
In today's cars, more than 50 electronic control units are used to provide safety and to ensure the occupants' comfort. The development of advanced driver assistance systems plays a key role in the automotive domain. It is essential to validate and verify results and to ensure faultless interoperability of the embedded systems. Not uncommonly, the dimensioning of parameters affects safety aspects...
Optimizing the power-performance tradeoff of a software system is challenging as the design space is large and live data is difficult to obtain. As a result, many power reduction techniques are based on power models which may not represent the full complexity of the system being analyzed. In this paper, in contrast, we propose a process for performing a tradeoff analysis using live power/performance...