In recent years, deep convolutional neural networks have achieved state-of-the-art performance in various computer vision tasks such as classification, detection or segmentation. Due to their outstanding performance, CNNs are increasingly used in the field of document image analysis as well. In this work, we present a CNN architecture that is trained with the recently proposed PHOC representation...
H-KWS 2016, organized in the context of the ICFHR 2016 conference, aims to set up an evaluation framework for benchmarking handwritten keyword spotting (KWS), examining both the Query by Example (QbE) and the Query by String (QbS) approaches. Both KWS approaches were hosted in two different tracks, which in turn were split into two distinct challenges, namely, a segmentation-based and a segmentation-free...
We present an approach for analyzing the visual aesthetic property of a handwritten document page that matches human perception. We formulate the problem at two independent levels: (i) a coarse level, which deals with the overall layout and the use of space between lines, words and margins, and (ii) a fine level, which analyses the construction of each word and deals with the aesthetic properties of writing...
Often, videos are composed of multiple concepts or even genres. For instance, news videos may contain sports, action, nature, etc. Therefore, encoding the distribution of such concepts/genres in a compact and effective representation is a challenging task. In this sense, we propose the Bag of Genres representation, which is based on a visual dictionary defined by a genre classifier. Each visual word...
In the last few years, deep convolutional neural networks have become ubiquitous in computer vision, achieving state-of-the-art results on problems like object detection, semantic segmentation, and image captioning. However, they have not yet been widely investigated in the document analysis community. In this paper, we present a word spotting system based on convolutional neural networks. We train...
As video gameplay recording and streaming is becoming very popular on the Internet, there is an increasing need for automatic classification solutions to help service providers with indexing the huge amount of content and users with finding relevant content. The automatic classification of gameplay videos into specific genres is not a trivial task due to their high content diversity. This paper address...
The effect of an adaptive threshold on shot boundary detection performance is analyzed in this paper, where the threshold is used to determine whether a target frame is a shot boundary in broadcast video content. The adaptive threshold is calculated using an input threshold and the visual similarities of the frames adjacent to the target frame. The experimental results show that application of adaptive threshold...
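The abstract above does not give the formula for the adaptive threshold, so the following is only a minimal sketch of the general idea, under stated assumptions: the input threshold is scaled by the mean similarity of the neighbouring frame pairs, and a frame is flagged when its similarity to the previous frame falls below that scaled value. The function name, the `window` parameter and the scaling rule are all hypothetical and are not taken from the paper.

```python
def detect_shot_boundaries(similarities, input_threshold=0.5, window=2):
    """Flag shot boundaries with a locally adapted threshold.

    similarities[i] is the visual similarity between frame i and frame i + 1,
    assumed to lie in [0, 1] (1 means identical frames).
    Returns the indices of frames that start a new shot.
    """
    boundaries = []
    n = len(similarities)
    for i in range(n):
        # Similarities of the neighbouring frame pairs, excluding the target pair.
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        neighbours = [similarities[j] for j in range(lo, hi) if j != i]
        if not neighbours:
            continue
        local_mean = sum(neighbours) / len(neighbours)
        # Assumed rule: scale the input threshold by the local mean similarity,
        # so uniformly low-similarity (e.g. fast-motion) segments are not flagged.
        adaptive_threshold = input_threshold * local_mean
        if similarities[i] < adaptive_threshold:
            boundaries.append(i + 1)  # frame i + 1 is the first frame of the new shot
    return boundaries


if __name__ == "__main__":
    sims = [0.9, 0.92, 0.1, 0.91, 0.9]
    print(detect_shot_boundaries(sims))
```

With a fixed threshold of 0.5, a segment whose similarities are all around 0.3 would be flagged at every frame; under the assumed scaling rule none of those frames is flagged, because each one looks like its neighbours.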
Underwater scene search is one of the most challenging topics in underwater image analysis. In this paper, we present an underwater scene search scheme that combines a similarity measure with sparse representation. The color histogram is first adopted to classify the candidate image patches for each kind of underwater scene. At the same time, the feature similarity (FSIM) considers...
Vision-based place recognition in underwater environments is a key component for autonomous robotic exploration. However, this task can be very challenging due to the inherent properties of such places: color distortion, poor visibility, perceptual aliasing and dynamic illumination. In this paper, we present a method for vision-based place recognition in coral reefs. Our method relies...
The topic presented in this paper covers statistical studies on illumination conducted using specialised software dedicated to such simulations. A pre-designed computer visualisation of illumination, which included zonal illuminations of a selected architectural structure, was modified by a selected group of respondents. As a result of the respondents' individual aesthetic preferences, sets of average luminance...
Image enhancement processes consist of a collection of techniques that seek to improve the visual appearance of a degraded image. This paper introduces a multimodal enhancement technique for dense foggy images. The currently available techniques do not work in low visibility such as dense fog. The proposed method changes the intensity component among the HIS components converted from the RGB components...
Action recognition has been one of the most popular fields of computer vision. This paper presents a novel approach to the action recognition problem that uses a dimension reduction method, local Fisher discriminant analysis, to reduce the dimension of feature descriptors as a preprocessing step after feature extraction. We propose to use a sparse matrix and a randomized kd-tree to modify and accelerate the...
Distributed object recognition is a rapidly growing research area, mainly motivated by the emergence of high performance cameras and their integration with modern wireless sensor network technologies. In wireless distributed object recognition, the bandwidth is limited and it is desirable to avoid transmitting redundant visual features from multiple cameras to the base station. In this...
A novel approach to spatio-temporal saliency detection in video is proposed. Saliency computation is considered as an optimization problem that maximizes the energy of a fully-connected graphical model based on spatio-temporal feature distinctiveness. Each pixel in a video is modeled by a node, and the spatio-temporal feature distinctiveness between pixels by edges connecting the nodes in the graph...
Earth mover's distance is one of the most effective metrics for comparing histograms in various image retrieval applications. Its main drawback is its computational complexity, which hinders its usage in various comparison tasks. We propose fast earth mover's distance computation by providing better initialization to the transportation simplex algorithm. The new approach enables faster EMD computation...
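To make the quantity being computed concrete, here is a minimal sketch of the earth mover's distance in the one-dimensional special case, where it has a closed form: for two histograms over the same ordered bins, with unit distance between adjacent bins, the EMD equals the sum of absolute differences of their cumulative sums. This is not the transportation simplex approach from the abstract, which addresses general ground distances; the function name `emd_1d` and the normalization to unit mass are choices made for this illustration.

```python
def emd_1d(p, q):
    """Earth mover's distance between two 1D histograms on the same bins.

    Both histograms are normalized to unit mass, and adjacent bins are
    assumed to be one unit apart.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")

    carried = 0.0  # mass that still has to be moved past the current bin
    work = 0.0     # accumulated mass times distance
    for a, b in zip(p, q):
        carried += a / total_p - b / total_q
        work += abs(carried)
    return work


if __name__ == "__main__":
    # All mass moves two bins to the right.
    print(emd_1d([1, 0, 0], [0, 0, 1]))
```

The single pass over the bins is what makes the 1D case cheap. With an arbitrary ground distance between bins no such shortcut exists in general, which is the setting where solvers such as the transportation simplex are used.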
Image classification is a general visual analysis task based on the image content coded by its representation. In this research, we proposed an image representation method that is based on the perceptual shape features and their spatial distributions. A natural language processing concept, N-gram, is adopted to generate a set of perceptual shape visual words for encoding image features. By combining...
Images with a distant view or a close-up view indicate a photographer's attention, which can be further utilized for user behavior analysis and scene evaluation. As images may contain arbitrary contexts, distant view and close-up view classification becomes non-trivial. In this work, we found that two cues can represent human visual attention, i.e. the focus cue and the scale cue. We model the focus cue in frequency...
We present a novel video representation for human action recognition by considering temporal sequences of visual words. Based on state-of-the-art dense trajectories, we introduce temporal bundles of dominant, that is most frequent, visual words. These are employed to construct a complementary action representation of ordered dominant visual word sequences, that additionally incorporates fine grained...
Smile detection in the wild is an interesting and challenging problem. This paper presents an efficient approach with hierarchical visual features to handle this problem. In our approach, multi-scale, multi-orientation Gabor filters are first applied to extract facial textures, namely Gabor faces, from the input face image. After this, Histograms of Oriented Gradients (HOG) are employed to encode...
We have proposed a novel representation to describe color, intensity, edge orientation, frequency and spatial layout as histogram-based features by simulating the human visual mechanism. In the representation, color volume is used as a low-level feature to detect salient regions. At the same time, a novel representation method of visual features, namely the Cauchy density function histogram, is used to...