Content-based indexing is critical to effective access to multimedia data. To this end, visual data is often annotated with textual content to bridge the semantic gap. In this paper, we present a method to generate frame-level, fine-grained annotations for a given video clip. Access to frame-level, fine-grained annotations leads to rich, dense and meaningful semantic associations between...
Road detection from images is a challenging task in computer vision. Previous methods are not robust, because their features and classifiers cannot adapt to different circumstances. To overcome this problem, we propose to apply unsupervised feature learning for road detection. Specifically, we develop an improved encoding function and add a feature selection process to obtain robust and discriminative...
Image annotation is a hard multi-label learning problem that aims to automatically tag each input image with relevant keywords reflecting its semantic concepts. Recently, several late fusion methods were proposed to improve the accuracy of image annotation. However, these late fusion methods require normalization of the confidence score vectors of independent models corresponding to distinct representations...
We propose to learn semantic spatio-temporal embeddings for videos to support high-level video analysis. The first step of the proposed embedding employs a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion) followed by their corresponding Gated Recurrent Unit encoders for capturing longer-term temporal structure of the CNN features...
Image registration is an important and fundamental problem in computer vision and image processing. Although a large number of image registration algorithms exist, such as RANSAC and its extensions, image registration under very noisy conditions remains difficult when not enough correct corresponding points can be obtained. This paper solves this issue by introducing a random resample...
In this paper, we propose a new local descriptor for action recognition in depth images. The proposed descriptor relies on surface normals in 4D space of depth, time, spatial coordinates and higher-order partial derivatives of depth values along spatial coordinates. In order to classify actions, we follow the traditional Bag-of-words (BoW) approach, and propose two encoding methods termed Multi-Scale...
We propose mutually incoherent pose bases for action recognition in static images, each of which implicitly represents the co-occurrence of poselets. First, action-specific poselets are trained. To suppress detection ambiguity, we cluster poselet activations by the overlap of the torso bounds predicted by each poselet. Then the pose feature of the person performing the action can be extracted, which is a vector composed...
The particle size distribution (PSD) of a dispersed phase is a fundamental geometrical characteristic that needs to be determined from digital images for many industrial processes involving a multiphase flow. Nevertheless, when dealing with 2-D images, only the projections of the particles are visualized, and therefore the particles can overlap each other. To address this, this paper aims to develop and...
With the success of deep learning in the last few years, the object detection community has shifted from processing exhaustive sliding windows to processing a smaller set of object proposals using more powerful, deep visual representations. Object proposals increase accuracy and speed up the detection process by reducing the search space. In this paper, we propose a novel idea of filtering irrelevant edges using...
A novel similarity-covariant feature detector is proposed that extracts points whose neighborhoods, when treated as a 3D intensity surface, have a saddle-like intensity profile. The saddle condition is verified efficiently by intensity comparisons on two concentric rings that must have exactly two dark-to-bright and two bright-to-dark transitions satisfying certain geometric constraints. Experiments show that...
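The ring test described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how samples are labeled bright or dark, so comparing each ring sample against a single `reference` intensity is an assumption, the function names are hypothetical, and the geometric constraints mentioned in the abstract are omitted.

```python
def count_transitions(ring, reference):
    """Count circular (dark-to-bright, bright-to-dark) transitions on one ring.

    `ring` is a list of intensities sampled in order around a circle.
    Labeling a sample "bright" when it exceeds `reference` is an assumption.
    """
    bright = [v > reference for v in ring]
    up = down = 0
    for i in range(len(bright)):
        prev, cur = bright[i - 1], bright[i]  # i - 1 wraps around when i == 0
        if cur and not prev:
            up += 1
        elif prev and not cur:
            down += 1
    return up, down


def looks_like_saddle(inner_ring, outer_ring, reference):
    """True if both rings show exactly two transitions in each direction."""
    return all(
        count_transitions(ring, reference) == (2, 2)
        for ring in (inner_ring, outer_ring)
    )


# Toy rings: a saddle alternates bright/dark twice around the circle,
# whereas a plain edge changes only once in each direction.
inner = [9, 9, 1, 1, 9, 9, 1, 1]
outer = [9, 9, 9, 1, 1, 1, 9, 9, 9, 1, 1, 1]
edge = [9, 9, 9, 9, 1, 1, 1, 1]
```

Calling `looks_like_saddle(inner, outer, 5)` accepts the toy saddle pattern, while substituting `edge` for either ring rejects it because an edge produces only one transition in each direction.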
We construct a robust and precise multi-orientation text detection system for scene images that can extensively locate possible characters with multi-information fusion. In our method, an adaptive multi-channel character grouping algorithm is first proposed to extract all possible character candidates robustly, and an AdaBoost classifier is then used to properly identify character candidates as characters...
In this paper, we describe a one-class classification method based on Support Vector Data Description, which exploits multiple graph structures in its optimization process. We derive a generic solution which can be employed for supervised one-class classification tasks. The devised method can produce linear or non-linear decision functions, depending on the adopted kernel function. In our experiments,...
We propose a machine-learning-based approach to real-time detection and classification assistance for images from unknown environments. While systems for detecting and classifying regular structures like faces in still images are well established, the task of, e.g., detecting new morphotypes/objects in an environment is much more complex. The morphotypes/objects are not guaranteed to have a priori known...
The performance of an object detection system relies heavily on two components: an object model to capture the compositional relationship among the object body and its parts, and a feature representation to describe object appearance. In this work, we present an empirical study of combining two such state-of-the-art components: Deformable Part Model (DPM), a proven effective and flexible part-based...
This paper presents a method for detecting pedestrians by leveraging multi-spectral image pairs. Our approach is based on the observation that a multi-spectral image, especially a far-infrared (FIR) image, enables us to overcome inherent limitations of pedestrian detection under challenging circumstances, such as dark environments. For that task, multi-spectral color-FIR image pairs are used...
The clustering algorithm by fast search and find of density peaks is shown to be a promising clustering approach. However, this algorithm involves manual selection of cluster centers, which is not convenient in practical applications. In this paper we discuss the correlation between density peaks and cluster centers. As a result, we present a new local density estimation method to highlight the uniqueness...
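For context, the two per-point quantities used by the baseline density-peaks algorithm that this abstract builds on can be sketched as follows. This is the standard cutoff-based formulation, not the new local density estimator the paper proposes (which the truncated abstract does not specify); the function name, the toy data, and the cutoff `d_c` are illustrative assumptions.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) per point for the baseline density-peaks method.

    rho[i]   : number of other points closer than the cutoff d_c.
    delta[i] : distance to the nearest point with strictly higher rho;
               a point with the highest rho gets its largest distance instead.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Toy data (assumed for illustration): two small groups around (0, 0) and (5, 5).
points = [
    (0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1),
    (5.0, 5.0), (5.1, 5.0), (4.9, 5.0), (5.0, 5.1),
]
rho, delta = density_peaks(points, d_c=0.12)

# Candidate centers combine high density with a large distance to denser points.
scores = [r * d for r, d in zip(rho, delta)]
top_two = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:2]
```

In the baseline method, cluster centers are picked by hand from a plot of `rho` against `delta`; ranking by the product `rho * delta` is one common heuristic shown here only to make the toy example self-checking.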
We introduce a new algorithm that maps multiple instance data using both positive and negative target concepts into a data representation suitable for standard classification. Multiple instance data are characterized by bags which are in turn characterized by a variable number of feature vectors or instances. Each bag has a known positive or negative label, but the labels of any given instances within...
The use of different evaluation measures for classification tasks has gained a significant amount of attention in the past decade, especially for those problems with multiple and imbalanced classes [1], [2]. However, the optimization of classifiers with respect to these measures is still heuristic, using ad-hoc rules with classical accuracy-optimized classifiers. We propose a classifier designed specifically...
In this paper, we discuss a novel approach to incrementally construct a rule ensemble. The approach constructs an ensemble from a dynamically generated set of rule classifiers. Each classifier in this set is trained by using a different class ordering. We investigate criteria including accuracy, ensemble size, and the role of starting point in the search. Fusion is done by averaging. Using 22 data...
This paper presents a novel Robust Deep Appearance Models (RDAMs) approach to learn the non-linear correlation between shape and texture of face images. In this approach, two crucial components of face images, i.e. shape and texture, are represented by Deep Boltzmann Machines and Robust Deep Boltzmann Machines (RDBM), respectively. The RDBM, an alternative form of Robust Boltzmann Machines, can separate...