Search results

chapter

An FPGA hardware implementation approach for a phylogenetic tree reconstruction algorithm with incremental tree optimization

Henry Block, Tsutomu Maruyama

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 8

In this paper, we present an FPGA hardware implementation approach for a phylogenetic tree reconstruction with maximum parsimony algorithm. The algorithm, based on stochastic local search, uses the Indirect Calculation of Tree Lengths and the Incremental Tree Optimization methods. We evaluate and compare our new approach against previous hardware approaches, and against TNT, the fastest available...

chapter

FPGA acceleration of spark applications in a Pynq cluster

Christoforos Kachris, Elias Koromilas, Ioannis Stamelos, Dimitrios Soudris

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1

In this paper we present a framework for the seamless utilization of hardware accelerators in heterogeneous SoCs that are used to speed up the processing of Spark data analytics applications.

chapter

Customised pearlmutter propagation: A hardware architecture for trust region policy optimisation

Shengjia Shao, Wayne Luk

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 6

Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with the environment by making sequential decisions. The agent receives reward from the environment to find an optimal policy that maximises the reward. Trust Region Policy Optimisation (TRPO) is a recent policy optimisation algorithm that achieves superior results in various RL benchmarks, but is computationally...

chapter

doppioDB: A hardware accelerated database

David Sidler, Muhsen Owaida, Zsolt Istvan, Kaan Kara, et al.

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1

Relational databases provide a wealth of functionality to a wide range of applications. Yet, there are tasks for which they are less than optimal, for instance when processing becomes more complex (e.g., regular expression evaluation, data analytics) or the data is less structured (e.g., text or long strings). With the increasing amount of user-generated data stored in relational databases, there...

chapter

A Power-Efficient Accelerator Based on FPGAs for LSTM Network

Yiwei Zhang, Chao Wang, Lei Gong, Yuntao Lu, et al.

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 629 - 630

Today, artificial neural networks (ANNs) are widely used in a variety of applications, including speech recognition, face detection, disease diagnosis, etc. And as the emerging field of ANNs, Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) which contains complex computational logic. To achieve high accuracy, researchers always build large-scale LSTM networks which are time-consuming...

chapter

Application of convolutional neural networks on Intel® Xeon® processor with integrated FPGA

Philip Colangelo, Enno Luebbers, Randy Huang, Martin Margala, et al.

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

Intel®'s Xeon® processor with integrated FPGA is a new research platform that provides all the capabilities of a Broadwell Xeon Processor with the added functionality of an Arria 10 FPGA in the same package. In this paper, we present an implementation on this platform to showcase the abilities and effectiveness of utilizing both hardware architectures to accelerate a convolutional based neural network...

chapter

Auto-SI: An adaptive reconfigurable processor with run-time loop detection and acceleration

Tanja Harbaum, Christoph Schade, Marvin Damschen, Carsten Tradowsky, et al.

2017 30th IEEE International System-on-Chip Conference (SOCC) > 153 - 158

Modern computer architectures have an ever-increasing demand for performance, but are constrained in power dissipation and chip area. To tackle these demands, architectures with application-specific accelerators have gained traction in research and industry. While this is a very promising direction, hard-wired accelerators fall short when too many applications need to be supported or flexibility is...

chapter

AIScale — A coarse grained reconfigurable CNN hardware accelerator

Rastislav Struharik, Bogdan Vukobratovic

2017 IEEE East-West Design & Test Symposium (EWDTS) > 1 - 9

In this paper we propose a novel CNN hardware accelerator, called AIScale, capable of accelerating convolutional, pooling, fully-connected and adding CNN layers. In contrast to most existing solutions, AIScale offers a complete solution to the full CNN acceleration. AIScale is designed as a coarse-grained reconfigurable architecture, which uses rapid, dynamic reconfiguration during the CNN layer processing...

chapter

Reconfigurable logic embedded architecture of support vector machine linear kernel

Jeevan Sirkunan, N. Shaikh-Husin, Trias Andromeda, M. N. Marsono

2017 4th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI) > 1 - 5

Support Vector Machine (SVM) is a linear binary classifier that requires a kernel function to handle non-linear problems. Most previous SVM implementations for embedded systems in the literature were built targeting a particular application, and analyses were done through comparison with software implementations only. The impact of different application datasets on SVM hardware performance was not...

article

T-NOVA: An Open-Source MANO Stack for NFV Infrastructures

Michail-Alexandros Kourtis, Michael J. McGrath, Georgios Gardikis, Georgios Xilouris, et al.

IEEE Transactions on Network and Service Management > 2017 > 14 > 3 > 586 - 602

One of the primary challenges associated with network functions virtualization (NFV) is the automated management of the service lifecycle. In this paper, we present a full software-based management and orchestration (MANO) stack which operates with OpenStack and OpenDaylight controllers and has the in-built functionality to automate the key phases of the NFV service lifecycle, namely resource discovery...

chapter

Methods and infrastructure in the era of accelerator-centric architectures

Brandon Reagen, Yakun Sophia Shao, Sam Likun Xi, Gu-Yeon Wei, et al.

2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS) > 902 - 905

Computer architecture today is anything but business as usual, and what is bad for business is often great for science. As Moore's Law continues to unwaveringly march forward, despite the ceasing of Dennard scaling, continued performance gains with each processor generation have become a significant challenge, and require creative solutions. Namely, the way to continue to scale performance in light...

chapter

Packet Classification with Limited Memory Resources

Michal Kekely, Jan Korenek

2017 Euromicro Conference on Digital System Design (DSD) > 179 - 183

Network security and monitoring devices use packet classification to match packet header fields in a set of rules. Many hardware architectures have been designed to accelerate packet classification and achieve wire-speed throughput for 100 Gbps networks. The architectures are designed for high throughput even for the shortest packets. However, FPGA SoC and Intel Xeon with FPGA have limited resources...

chapter

Using the Integrated GPU to Improve CPU Sort Performance

Grigore Lupescu, Emil-Ioan Slusanschi, Nicolae Tapus

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 39 - 44

In this paper we discuss the potential of the integrated GPU to accelerate sorting by performing a partial sort prior to a comparison-based CPU sort. We experiment with several CPU comparison-based sorting algorithms and outline the performance gain for a random input data set. We then analyze different x86 SoC architectures, and show that by sorting chunks stored inside the on-chip GPU memory,...

chapter

Design of a LabVIEW-based polyphase filter bank spectrometer for radio astronomy using FlexRIO FPGA technology and CUDA-enabled GPU

Rodrigo G. Freundt, Jorge A. Heraud

2017 XXXIInd General Assembly and Scientific Symposium of the International Union of Radio Science (URSI GASS) > 1 - 4

The spectrometer is the most important back-end in single antenna radio astronomy observations. The state-of-the-art designs for this type of instrument propose to reduce the effects of spectral leakage by using the Polyphase Filter Bank (PFB) technique and to achieve wideband and high resolution by using digital, reconfigurable, and high-performance computing hardware, such as commercially available...

chapter

An indoor AR registration technique based on iBeacons

Xingfu Zhong, Wenming Wang, Quanyu Wang

2017 IEEE International Conference on Information and Automation (ICIA) > 1093 - 1098

The existing AR indoor registration technologies based on hardware often have the disadvantage of low registration accuracy. To solve the problem, a new indoor AR registration technology based on iBeacon is proposed in this paper. Firstly, the coordinates of the phone are calculated based on the data received by iBeacons. Secondly, the 3D directions of the phone are obtained based on the acceleration...

chapter

Heterogeneous Hardware from Homogeneous Software

Alberto Dassatti, Roberto Rigamonti

2017 International Conference on High Performance Computing & Simulation (HPCS) > 913 - 914

Our society relies upon information processing at a scale never seen before in human history. We are indeed experiencing an exponential growth in processing demand, as more and more applications in the most disparate domains emerge. While continuous improvements in the manufacturing processes of microprocessors have been able so far to mitigate the ecological and economical costs this trend imposes,...

chapter

Hardware-Efficient Guided Image Filtering for Multi-label Problem

Longquan Dai, Mengke Yuan, Zechao Li, Xiaopeng Zhang, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4905 - 4913

The Guided Filter (GF) is well-known for its linear complexity. However, when filtering an image with an n-channel guidance, GF needs to invert an n × n matrix for each pixel. To the best of our knowledge, existing matrix inverse algorithms are inefficient on current hardware. This shortcoming limits applications of multichannel guidance in computation-intensive systems such as multi-label...

chapter

Acceleration of RSA processes based on hybrid ARM-FPGA cluster

Xu Bai, Lei Jiang, Qiong Dai, Jiajia Yang, et al.

2017 IEEE Symposium on Computers and Communications (ISCC) > 682 - 688

Cooperation of software and hardware with hybrid architectures, such as Xilinx Zynq SoC combining ARM CPU and FPGA fabric, is a high-performance and low-power platform for accelerating the RSA algorithm. This paper adopts the none-subtraction Montgomery algorithm and the Chinese Remainder Theorem (CRT) to implement high-speed RSA processors, and deploys a 48-node cluster infrastructure based on Zynq SoC...

chapter

Exploring the Granularity of Sparsity in Convolutional Neural Networks

Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) > 1927 - 1934

Sparsity helps reduce the computational complexity of DNNs by skipping multiplications with zeros. The granularity of sparsity affects the efficiency of the hardware architecture and the prediction accuracy. In this paper we quantitatively measure the accuracy-sparsity relationship at different granularities. Coarse-grained sparsity brings a more regular sparsity pattern, making it easier for hardware...

chapter

Motionword: An activity recognition algorithm based on intelligent terminal and cloud

Zhenjie Yao, Zhipeng Zhang, Junyan Wang, Li-Qun Xu

2017 20th International Conference on Information Fusion (Fusion) > 1 - 5

The ability to recognize physical activity, such as sedentary, driving, riding, daily activities and effective training, is useful for health conscious users to catalogue their daily activities and to develop good exercise routines. Conventional activity recognition algorithms require complex calculations, which are not suitable for wearable devices developed on low-cost, low-power hardware platforms...
