The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Convolutional neural networks (CNNs) are deployed in a wide range of image recognition, scene segmentation and object detection applications. Achieving state of the art accuracy in CNNs often results in large models and complex topologies that require significant compute resources to complete in a timely manner. Binarised neural networks (BNNs) have been proposed as an optimised variant of CNNs, which...
We explore techniques to significantly improve the compute efficiency and performance of Deep Convolution Networks without impacting their accuracy. To improve the compute efficiency, we focus on achieving high accuracy with extremely low-precision (2-bit) weight networks, and to accelerate the execution time, we aggressively skip operations on zero-values. We achieve the highest reported accuracy...
Text analytics applications using machine learning techniques have grown in importance with ever increasing amount of data being generated from web-scale applications, social media and digital repositories. Apart from being large in size, these generated data are often unstructured and are heavily sparse in nature. The performance of these applications on current systems is hampered by hard to predict...
Deep neural networks (DNNs) are widely used in data analytics, since they deliver state-of-the-art accuracies. Binarized neural networks (BNNs) are recently proposed optimized variant of DNNs. BNNs constraint network weight and/or neuron value to either +1 or −1, which is representable in 1 bit. This leads to dramatic algorithm efficiency improvement, due to reduction in the memory and computational...
Recurrent neural networks (RNNs) provide state-of-the-art accuracy for performing analytics on datasets with sequence (e.g., language model). This paper studied a state-of-the-art RNN variant, Gated Recurrent Unit (GRU). We first proposed memoization optimization to avoid 3 out of the 6 dense matrix vector multiplications (SGEMVs) that are the majority of the computation in GRU. Then, we study the...
Rapid growth of Internet led to web applications that produce large unstructured sparse datasets (e.g., texts, ratings). Machine learning (ML) algorithms are the basis for many important analytics workloads that extract knowledge from these datasets. This paper characterizes such workloads on a high-end server for real-world datasets and shows that a set of sparse matrix operations dominates runtime...
Sparse matrix vector multiplication (SpMV) is a linear algebra construct commonly found in machine learning (ML) algorithms, such as support vector machine (SVM). We profiled a popular SVM software (libSVM) on an energy-efficient microserver and a high-performance server for real-world ML datasets, and observed that SpMV dominates runtime. We propose a novel SpMV algorithm tailored for ML and a hardware...
Maximum a posteriori probability (MAP) inference on Markov random fields (MRF) is the basis of many computer vision applications. Sequential tree-reweighted belief propagation (TRW-S) has been shown to provide very good inference quality and strong convergence properties. However, software TRW-S solvers are slow due to the algorithm's high computational requirements. A state-of-the-art FPGA implementation...
Vertex-centric graph computations are widely used in many machine learning and data mining applications that operate on graph data structures. This paper presents GraphGen, a vertex-centric framework that targets FPGA for hardware acceleration of graph computations. GraphGen accepts a vertex-centric graph specification and automatically compiles it onto an application-specific synthesized graph processor...
Large scale 3D image localization requires computationally expensive matching between 2D feature points in the query image and a 3D point cloud. In this paper, we present a method to accelerate the matching process and to reduce the memory footprint by analyzing the view-statistics of points in a training corpus. Given a training image set that is representative of common views of a scene, our approach...
Many promising embedded computer vision applications, such as stereo estimation, rely on inference computation on Markov Random Fields (MRFs). Sequential Tree-Reweighted Message passing (TRW-S) is a superior MRF solving method, which provides better convergence and energy than others (e.g., belief propagation). Since software TRW-S solvers are slow, custom TRW-S hardware has been proposed to improve...
The MEMOCODE 2013 design contest problem is stereo matching. Given a stereo image pair (i.e., a left image and a right image), the challenge is to infer the depth information (i.e., third dimension) for each pixel in the image utilizing belief propagation algorithm. Contestants were given a month to develop a fast and/or cost-effective system for stereo matching. From a total of eight participating...
When a processor implementation is synthesized from a specification using an automatic framework, this implementation still should be verified against its specification to ensure the automatic framework introduced no error. This paper presents our effort in integrating fully automated formal verification with a high-level processor pipeline synthesis framework. As an integral part of the pipeline...
We present a technique to automatically synthesize a multithreaded in-order pipeline from a high-level unpipelined datapath specification. This work extends the previously proposed transactional specification (T-spec) and synthesis technology (T-piper). The technique not only works with instruction processors but also flexible enough to accept any sequential datapath. It maintains previously proposed...
We present a transactional datapath specification (T-spec) and the tool (T-piper) to synthesize automatically an in-order pipelined implementation from it. T-spec abstractly views a datapath as executing one transaction at a time, computing next system states based on current ones. From a T-spec, T-piper can synthesize a pipelined implementation that preserves original transaction semantics, while...
This paper presents a comparison of power-aware video decoding techniques that utilize dynamic voltage scaling (DVS). These techniques reduce the power consumption of a processor by exploiting high frame variability within a video stream. This is done through scaling of the voltage and frequency of the processor during the video decoding process. However, DVS causes frame deadline misses due to inaccuracies...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.