Search results

chapter

VLSI implementation of LS-SVM training and classification using entropy based subset-selection

Andreas Bytyn, Jannik Springer, Rainer Leupers, Gerd Ascheid

2017 IEEE International Symposium on Circuits and Systems (ISCAS) > 1 - 4

Machine Learning techniques such as Support Vector Machines (SVM) have found applications in many fields, e.g. in Wireless Sensor Networks (WSN) and sensor data processing in general. In WSNs especially, energy is very limited because agents operate solely on battery power once deployed, so energy efficiency is of great importance. Furthermore, agents are supposed...

chapter

Snowflake: An efficient hardware accelerator for convolutional neural networks

Vinayak Gokhale, Aliasger Zaidy, Andre Xian Ming Chang, Eugenio Culurciello

2017 IEEE International Symposium on Circuits and Systems (ISCAS) > 1 - 4

Deep learning is becoming increasingly popular for a wide variety of applications including object detection, classification, semantic segmentation and natural language processing. Convolutional neural networks (CNNs) are a type of deep neural network that achieve high accuracy for these tasks. CNNs are hierarchical mathematical models comprising billions of operations to produce an output. The high...

chapter

VLSI architecture design of layer-based bilateral and median filtering for 4k2k videos at 30fps

Ming-Yi Tai, Wei-Chih Tu, Shao-Yi Chien

2017 IEEE International Symposium on Circuits and Systems (ISCAS) > 1 - 4

Bilateral filtering (BLF) and median filtering (MF) are key components in many applications. As the image resolution grows rapidly, implementation of efficient filtering is highly demanded. In this paper, we present a unified VLSI architecture that is able to compute both kinds of filters for 4k2k videos at 30fps. One feature of this design is that we leverage an emerging layer-based algorithm for...

chapter

PACENet: Energy efficient acceleration for convolutional network on embedded platform

Adwaya Kulkarni, Tahmid Abtahi, Colin Shea, Amey Kulkarni, et al.

2017 IEEE International Symposium on Circuits and Systems (ISCAS) > 1 - 4

Lightweight convolutional neural networks (CNNs) on tiny embedded platforms can offer an energy-efficient solution for today's IoT devices. However, CNN implementation on embedded systems faces a processing bottleneck in the convolutional layers and memory storage issues in the fully connected layers. In past years, heterogeneous acceleration, where compute intensive tasks are performed on kernel specific cores,...

chapter

swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight

Jiarui Fang, Haohuan Fu, Wenlai Zhao, Bingwei Chen, et al.

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 615 - 624

To explore the potential of training complex deep neural networks (DNNs) on other commercial chips rather than GPUs, we report our work on swDNN, which is a highly-efficient library for accelerating deep learning applications on the newly announced world-leading supercomputer, Sunway TaihuLight. Targeting SW26010 processor, we derive a performance model that guides us in the process of identifying...

chapter

Automatic stellar spectral classification with multiple intelligent classifiers

Israel Cruz-Vega, Hayde Peregrina-Barreto, Jose de Jesus Rangel-Magdaleno, Juan Manuel Ramirez-Cortes, et al.

2017 IEEE International Instrumentation and Measurement Technology Conference (I2MTC) > 1 - 5

Stellar classification is based on the spectral characteristics of stars. To improve on previously reported performance rates, such as those based on statistical analysis or data transformations, classifiers based on computational intelligence provide a high level of accuracy regardless of the high non-linearity or high dimensionality of the data. In this paper, the star's classification...

chapter

Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors

Benjamin Klenk, Holger Froening, Hans Eberle, Larry Dennison

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 855 - 865

Accelerators, such as GPUs, have proven to be highly successful in reducing execution time and power consumption of compute-intensive applications. Even though they are already used pervasively, they are typically supervised by general-purpose CPUs, which results in frequent control flow switches and data transfers as CPUs are handling all communication tasks. However, we observe that accelerators...

chapter

MOCHA: Morphable Locality and Compression Aware Architecture for Convolutional Neural Networks

Syed Mohammad Asad Hassan Jafri, Ahmed Hemani, Kolin Paul, Naeem Abbas

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 276 - 286

Today, machine learning based on neural networks has become mainstream in many application domains. A small subset of machine learning algorithms, called Convolutional Neural Networks (CNNs), is considered state-of-the-art for many applications (e.g. video/audio classification). The main challenge in implementing CNNs in embedded systems is their large computation, memory, and bandwidth...

chapter

Enabling One-Sided Communication Semantics on ARM

Pavel Shamis, M. Graham Lopez, Gilad Shainer

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 805 - 813

In this paper, we present our work to enable optimized one-sided communication operations on the ARM v8 architecture using a high-performance InfiniBand network interconnect, as well as an evaluation of our implementation. For this study, we started with an OpenSHMEM implementation based on Open MPI/SHMEM, and combined it with the UCX framework and the XPMEM kernel extension for shared memory communication...

chapter

A deep learning approach to traffic lights: Detection, tracking, and classification

Karsten Behrendt, Libor Novak, Rami Botros

2017 IEEE International Conference on Robotics and Automation (ICRA) > 1370 - 1377

Reliable traffic light detection and classification is crucial for automated driving in urban environments. Currently, there are no systems that can reliably perceive traffic lights in real-time, without map-based information, and at the distances needed for smooth urban driving. We propose a complete system consisting of a traffic light detector, tracker, and classifier based on deep learning,...

chapter

Incorporating side-channel information into convolutional neural networks for robotic tasks

Yilun Zhou, Kris Hauser

2017 IEEE International Conference on Robotics and Automation (ICRA) > 2177 - 2183

Convolutional neural networks (CNN) are a deep learning technique that has achieved state-of-the-art prediction performance in computer vision and robotics, but assume the input data can be formatted as an image or video (e.g. predicting a robot grasping location given RGB-D image input). This paper considers the problem of augmenting a traditional CNN for handling image-like input (called main-channel...

chapter

A software technique to enhance register utilization of Convolutional Neural Networks on GPGPUs

Che-Huai Lin, An-Ting Cheng, Bo-Cheng Lai

2017 International Conference on Applied System Innovation (ICASI) > 614 - 617

CNNs (Convolutional Neural Networks) have demonstrated superior results in a wide range of applications. However, the time-consuming convolution operations required by CNNs pose great challenges to designers. GPGPUs (General Purpose Graphic Processing Units) have been widely used to exploit the massive parallelism of convolution operations. This paper proposes a software-based loop-unrolling technique...

chapter

A generic execution framework for shared FPGA-based accelerators

Dumitru Laurentiu Alexandru, Rares Maniu

2017 International Conference on Optimization of Electrical and Electronic Equipment (OPTIM) & 2017 Intl Aegean Conference on Electrical Machines and Power Electronics (ACEMP) > 803 - 808

FPGAs are continuously increasing in both chip size and operating frequency. Dynamic reconfiguration is easier and more stable with the current generation of hardware and software tools. These characteristics have made them more accessible to generic acceleration tasks instead of specialized functions. As a consequence, FPGAs are being deployed in more computing clusters than in the past. This leads to...

chapter

Fusing attention with visual question answering

Ryan Burt, Mihael Cudic, Jose C. Principe

2017 International Joint Conference on Neural Networks (IJCNN) > 949 - 953

Visual Question Answering is a complex problem that fuses natural language and image processing to answer a question based on information from the image. The basic architecture for accomplishing this is using a CNN to extract features from the image and an RNN for the language processing, then combine the two in an MLP to produce an answer. These architectures perform well at identifying content,...

chapter

Graph Analytics: Complexity, Scalability, and Architectures

Peter M. Kogge

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1039 - 1047

Big Data expressed as "Big Graphs" is growing in importance. Looking forward, there is also increasing interest in streaming versions of the associated analytics. This paper develops an initial template for the relationship between "traditional" batch graph problems, and streaming forms. Variations of streaming problems are discussed, along with their relationship to existing...

chapter

Comparative Performance and Optimization of Chapel in Modern Manycore Architectures

Engin Kayraklioglu, Wo Chang, Tarek El-Ghazawi

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1105 - 1114

Chapel is an emerging scalable, productive parallel programming language. In this work, we analyze Chapel's performance using The Parallel Research Kernels on two different manycore architectures including a state-of-the-art Intel Knights Landing processor. We discuss implementation techniques in Chapel and their relation to the OpenMP implementations of the PRK. We also suggest and prototype several...

chapter

Deep graph node kernels: A convex approach

Luca Oneto, Nicolio Navarin, Alessandro Sperduti, Davide Anguita

2017 International Joint Conference on Neural Networks (IJCNN) > 316 - 323

Nowadays, developing effective techniques able to deal with data coming from structured domains is becoming crucial. In this context kernel methods are the state-of-the-art tool widely adopted in real-world applications that involve learning on structured data. Contrarily, when one has to deal with unstructured domains, deep learning methods represent a competitive, or even better, choice. In this...

chapter

Extremely parallel memristor crossbar architecture for convolutional neural network implementation

Chris Yakopcic, Md Zahangir Alom, Tarek M. Taha

2017 International Joint Conference on Neural Networks (IJCNN) > 1696 - 1703

This paper presents a simulated memristor crossbar based Convolutional Neural Network (CNN). Deep networks implemented on GPU clusters have become the state of the art in providing excellent classification ability, at the cost of a more complex data manipulation process. In this work we show that once deep networks are trained, the analog crossbar circuits in this paper can parallelize the recognition...

chapter

Communication Optimization on GPU: A Case Study of Sequence Alignment Algorithms

Jie Wang, Xinfeng Xie, Jason Cong

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 72 - 81

Data movement is increasingly becoming the bottleneck of both performance and energy efficiency in modern computation. Until recently, there was limited freedom for communication optimization on GPUs, as conventional GPUs only provide two types of methods for inter-thread communication: using shared memory or global memory. However, a new warp shuffle instruction has been introduced...

chapter

Static WCET Analysis of GPUs with Predictable Warp Scheduling

Yijie Huangfu, Wei Zhang

2017 IEEE 20th International Symposium on Real-Time Distributed Computing (ISORC) > 101 - 108

The capability of GPUs to accelerate general-purpose applications that can be parallelized into a massive number of threads makes it promising to apply GPUs to real-time applications as well, where high throughput and intensive computation are also needed. However, due to the different architecture and programming model of GPUs, the worst-case execution time (WCET) analysis methods and techniques designed...

INFONA - science communication portal
