The convolutional neural network (CNN) has become a successful algorithm in the field of artificial intelligence and a strong candidate for many computer vision algorithms. But the computational complexity of CNNs is much higher than that of traditional algorithms. With the help of GPU acceleration, CNN-based applications are widely deployed in servers. However, for embedded platforms, CNN-based solutions are still...
Sparsity helps reduce the computational complexity of DNNs by skipping multiplications by zero. The granularity of sparsity affects both the efficiency of the hardware architecture and the prediction accuracy. In this paper, we quantitatively measure the accuracy-sparsity relationship at different granularities. Coarse-grained sparsity yields a more regular sparsity pattern, making it easier for hardware...
Convolutional neural networks (CNNs) have recently broken many performance records in image recognition and object detection problems. The success of CNNs, to a great extent, is enabled by the fast scaling-up of the networks that learn from a huge volume of data. The deployment of big CNN models can be both computation-intensive and memory-intensive, leaving severe challenges to hardware implementations...
Recent progress in machine learning has enabled low bit-level Convolutional Neural Networks (CNNs), even CNNs with binary weights and binary neurons, to achieve satisfactory recognition accuracy on the ImageNet dataset. Binary CNNs (BCNNs) make it possible to introduce low bit-level RRAM devices and low bit-level ADC/DAC interfaces in RRAM-based Computing System (RCS) design, which leads to faster read-and-write...
Edge detection is an active and critical topic in the field of image processing, and plays a vital role in important applications such as image segmentation, pattern classification, and object tracking. In this paper, an approach using a varied local edge pattern descriptor is proposed for edge detection. The method consists of the following steps: first, a Gaussian filter is used to smooth the...
The Convolutional Neural Network (CNN) has become a successful algorithm in the field of artificial intelligence and a strong candidate for many applications. However, for embedded platforms, CNN-based solutions are still too complex to be applied if only the CPU is used for computation. Various dedicated hardware designs on FPGA and ASIC have been carried out to accelerate CNNs, while few of them explore...
Deep learning, and especially the Convolutional Neural Network (CNN), is among the most powerful and widely used techniques in computer vision. Applications range from image classification to object detection, segmentation, Optical Character Recognition (OCR), etc. At the same time, CNNs are both computationally intensive and memory intensive, making them difficult to deploy on low-power lightweight...
Sparse matrix-vector multiplication (SpMV) is an important computational kernel in many applications. To improve performance, software libraries dedicated to SpMV computation have been introduced, e.g., the MKL library for CPUs and the cuSPARSE library for GPUs. However, the computational throughput of these libraries is far below the peak floating-point performance offered by hardware platforms, because...
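For readers unfamiliar with the SpMV kernel named in the abstract above, the following is a minimal illustrative sketch in plain Python. It assumes the common compressed sparse row (CSR) layout, which the abstract does not specify; the helper names `csr_from_dense` and `spmv_csr` are hypothetical and are not taken from MKL, cuSPARSE, or the paper.

```python
def csr_from_dense(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays.

    CSR is an assumed storage format chosen for illustration only.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)      # keep only the nonzero entries
                col_idx.append(j)     # remember each entry's column
        row_ptr.append(len(values))   # end offset of this row in `values`
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x, where A is given by its CSR arrays."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # Visit only the stored nonzeros of row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    A = [[1, 0, 2],
         [0, 0, 3],
         [4, 5, 0]]
    x = [1, 2, 3]
    print(spmv_csr(*csr_from_dense(A), x))
```

The inner loop reads `x` through `col_idx`, so its memory accesses are indirect and irregular. This is a general property of CSR-style SpMV and is offered here only as background, not as the explanation given in the truncated abstract.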
The Convolutional Neural Network (CNN) is a powerful technique widely used in computer vision, but it demands much more computation and memory than traditional solutions. The emerging metal-oxide resistive random-access memory (RRAM) and RRAM crossbar have shown great potential for neuromorphic applications with high energy efficiency. However, the interfaces between analog RRAM crossbars...
The key to high performance for GPU architecture lies in massive threading to drive the large number of cores and enable overlapping of thread execution. However, in reality, the number of threads that can execute simultaneously is often limited by the size of the register file on GPUs. The traditional SRAM-based register file occupies so much chip area that it cannot scale to meet the...
In this paper, we study how to initialize the convolutional neural network (CNN) model for training on a small dataset. Specifically, we try to extract discriminative filters from the pre-trained model for a target task. On the basis of relative entropy and linear reconstruction, two methods, Minimum Entropy Loss (MEL) and Minimum Reconstruction Error (MRE), are proposed. The CNN models initialized by...
One key issue in people re-identification is to find good features or representations to bridge the gaps among different appearances of the same person, which are introduced by large variations in viewpoint, illumination, and non-rigid deformation. In this paper, we create a deep convolutional neural network (deep CNN) to solve this problem and integrate feature learning and re-identification into one...
Channel interference that affects the identification result is prevalent in existing speaker recognition algorithms. To improve the accuracy of the algorithm, the paper uses latent factor analysis (LFA) to deal with the channel factors in the speaker's Gaussian Mixture Model (GMM). In the endpoint detection phase of speaker recognition, the algorithm introduces the...
Computing nodes in reconfigurable clusters are occupied and released by applications during their execution. At compile time, application developers are not aware of the amount of resources available at run time. Dynamic Stencil is an approach that optimises stencil applications by constructing scalable designs which can adapt to available run-time resources in a reconfigurable cluster. This approach...
In many application domains, data are represented using large graphs involving millions of vertices and billions of edges. Graph exploration algorithms, such as breadth-first search (BFS), are largely dominated by memory latency and are challenging to process efficiently. In this paper, we present a reconfigurable hardware methodology for efficient parallel processing of large-scale graph exploration...
In recent years, the overall breast screening uptake rate in South West London has been lower than the national average. Previous research has established that population turnover, travel time in minutes to screening units, deprivation, and cultural factors affect breast screening uptake. This paper focuses on the relationship between breast screening uptake and its determinant factors:...
Many clustering techniques have been proposed for the analysis of gene expression data. However, the optimal method for a given experimental dataset is still not resolved. The fuzzy c-means and kernel fuzzy c-means algorithms have been widely applied to gene expression data, but they give equal weight to genes and noise, which leads to results that are neither stable nor accurate. In this paper, we...
Surface curvature is used in a number of areas in computer graphics, including texture synthesis and shape representation, mesh simplification, surface modeling, and nonphotorealistic line drawing. Most real-time applications must estimate curvature on a triangular mesh. This estimation has been limited to CPU algorithms, forcing object geometry to reside in main memory. However, as more computational...
To address the problem that ECC cannot correct multi-bit errors in ECC memory, this paper proposes a software-level memory error processing method. By modifying the Linux kernel code, the method can determine the area affected by a memory error by looking up the process information mapped to the faulty address. This can avoid loss to the user due to...
Nowadays, the Graphics Processing Unit (GPU), a kind of massively parallel processor, is widely used for general-purpose computing tasks. Although mature development tools exist, writing GPU programs is not a trivial task for programmers. Based on this consideration, we propose a novel parallel computing architecture. The architecture includes a parallel programming model, named Gemma,...