Recently, multiple kernel learning (MKL) methods have shown promising performance in image classification. As a form of supervised learning, training MKL-based classifiers relies on selecting and annotating an extensive dataset. In general, a large number of samples must be labeled manually to obtain a satisfactory MKL-based classifier. Moreover, MKL also incurs a great computational cost in kernel computation...
This paper introduces a self-similarity matrix (SSM) based video copy detection scheme and a visual character-string (VCS) descriptor for SSM matching. The SSM, which exploits the spatial and temporal information in a video clip, is obtained by exhaustively calculating the distances between frames. The SSM-based method treats the video clip as a whole and transforms the temporal self-similarity into...
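As a rough illustration of the exhaustive frame-distance calculation described above, the sketch below builds a self-similarity matrix from per-frame feature vectors. The Euclidean distance, the feature representation, and the function name `self_similarity_matrix` are assumptions made for illustration; the abstract does not specify them.

```python
import math


def self_similarity_matrix(frames):
    """Return the N x N matrix of pairwise distances between frame features.

    `frames` is a list of equal-length numeric feature vectors, one per frame.
    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    n = len(frames)
    ssm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(frames[i], frames[j])
            # The matrix is symmetric with a zero diagonal.
            ssm[i][j] = d
            ssm[j][i] = d
    return ssm


# Hypothetical 2-D features for three frames; frames 0 and 2 are identical.
example_frames = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
example_ssm = self_similarity_matrix(example_frames)
```

Because every pair of frames is compared, the cost grows quadratically with the number of frames, which is why the matrix describes the clip as a whole rather than frame by frame.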
Thumbnail cropping helps improve thumbnail readability by cropping images before shrinking them. In this paper we propose a learning-based method for automatic thumbnail cropping. To this end, we use a support vector machine to learn a discriminative model that simultaneously captures the saliency distribution and spatial priors. The model is then used to determine the best cropping rectangle. The...
An effective texture feature is an essential component of any content-based image retrieval system. In the past, spectral features such as Gabor and wavelet features have shown better retrieval performance than many statistical and structural features. Recent research on multi-resolution analysis has found that curvelets capture texture properties, like curves, lines, and edges, more accurately...
To understand video affective content automatically, the primary task is to transform the abstract concept of emotion into a form that a computer can handle easily. An improved V-A emotion space is proposed to address this problem. It unifies the discrete and dimensional emotion models by introducing the typical fuzzy emotion subspace. The fuzzy C-means (FCM) clustering algorithm is adopted...
In this paper, a novel discriminant sparse non-negative matrix factorization (DSNMF) algorithm is proposed. We derive the DSNMF method from the original NMF algorithm by considering both a sparseness constraint and a discriminant information constraint. Furthermore, the projected gradient method is used to solve the optimization problem. DSNMF makes use of prior class information which is important in classification,...
While many efforts have been made in the audio signal classification field, the noise interruption problem has seldom been addressed so far, especially in telecommunication applications, where a real-time, noise-robust approach is needed. This paper addresses this problem by proposing two novel robust features: average pitch density (APD) and relative tonal power density (RTPD). APD refers to the...
We explore the problem of rapid automatic semantic tagging of video frames of unstructured (unedited) videos. We apply the sort-merge algorithm for feature selection on a large (>1000) heterogeneous feature set for videos showing lectures, to quickly locate low-level image features most predictive for concepts such as "key frame with text" or "key frame with computer source code"...
Spatiograms are a generalization of histograms that also captures the spatial information of images. The similarity measure is important when applying spatiograms to computer vision problems such as tracking and image retrieval. The originally proposed measures use the Mahalanobis distance between coordinate means to measure spatial information in spatiograms. However, spatial information which is described...
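To make the notion of a spatiogram concrete, the sketch below computes, for each intensity bin, the pixel count together with the mean and covariance of the pixel coordinates that fall into that bin. The grayscale input, the bin count, the intensity range, and the function name `spatiogram` are illustrative assumptions; the similarity measures discussed in the abstract are not reproduced here.

```python
def spatiogram(image, n_bins=4, max_value=256):
    """Compute a second-order spatiogram of a grayscale image.

    `image` is a list of rows of integer intensities in [0, max_value).
    For every bin the result holds the pixel count plus the mean and the
    (population) covariance of the (x, y) coordinates of its pixels.
    """
    acc = [
        {"n": 0, "sx": 0.0, "sy": 0.0, "sxx": 0.0, "syy": 0.0, "sxy": 0.0}
        for _ in range(n_bins)
    ]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            b = acc[value * n_bins // max_value]
            b["n"] += 1
            b["sx"] += x
            b["sy"] += y
            b["sxx"] += x * x
            b["syy"] += y * y
            b["sxy"] += x * y

    result = []
    for b in acc:
        n = b["n"]
        if n == 0:
            # Empty bin: no spatial statistics are defined.
            result.append({"count": 0, "mean": None, "cov": None})
            continue
        mx, my = b["sx"] / n, b["sy"] / n
        cxy = b["sxy"] / n - mx * my
        cov = [[b["sxx"] / n - mx * mx, cxy], [cxy, b["syy"] / n - my * my]]
        result.append({"count": n, "mean": (mx, my), "cov": cov})
    return result


# Hypothetical 2 x 2 image: dark top row, bright bottom row.
example = spatiogram([[0, 0], [255, 255]], n_bins=2)
```

Dropping the mean and covariance from each bin leaves an ordinary histogram, which is the sense in which a spatiogram generalizes one.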
This paper proposes a novel scheme for applying feature statistics normalization techniques in robust speech recognition. In the proposed approach, the processed temporal-domain feature sequence is first decomposed into non-uniform sub-bands using the discrete wavelet transform (DWT), and then each sub-band stream is individually processed by the well-known normalization methods, like mean and variance...
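The decompose-then-normalize idea can be sketched as follows. This is only an illustration under stated assumptions: a Haar wavelet, two decomposition levels, and plain mean and variance normalization per sub-band are chosen here for simplicity, and the helper names (`haar_step`, `haar_dwt`, `mvn`, `normalize_subbands`) are hypothetical. The abstract does not state which wavelet or how many levels the authors use.

```python
import math


def haar_step(signal):
    """One Haar DWT level: return (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_dwt(signal, levels=2):
    """Split a sequence into non-uniform sub-bands, coarsest band first.

    The length of `signal` is assumed to be divisible by 2 ** levels.
    """
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands[::-1]


def mvn(stream, eps=1e-12):
    """Mean and variance normalization of one sub-band stream."""
    mean = sum(stream) / len(stream)
    var = sum((v - mean) ** 2 for v in stream) / len(stream)
    std = math.sqrt(var)
    if std < eps:
        # A constant stream carries no variance to normalize.
        return [0.0 for _ in stream]
    return [(v - mean) / std for v in stream]


def normalize_subbands(signal, levels=2):
    """Normalize every DWT sub-band of a temporal feature sequence."""
    return [mvn(band) for band in haar_dwt(signal, levels)]


# Hypothetical trajectory of one feature coefficient over eight frames.
example_bands = normalize_subbands([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.0])
```

The sub-bands are non-uniform because each level halves only the approximation band, so the lower modulation frequencies are resolved more finely than the higher ones.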
In learning-based single-image super-resolution (SR) approaches, the super-resolved image is usually found in, or assembled from, a training database through patch matching. But because the representation ability of a small patch is limited, it is difficult to guarantee that the super-resolved image is the best from a global view. To tackle this problem, we propose a statistical learning method for SR with both global...
Pseudo relevance feedback (PRF) has proven effective in information retrieval, but it has seldom been applied in the area of high-level feature detection (HLF). In this paper, we propose to introduce PRF into HLF. Our contributions are two-fold: (1) proposing three novel PRF approaches to extract pseudo positive samples, i.e., nearest-neighbor (NN) based PRF, score-evaluation...
Gaussian mixture models (GMM) have become one of the standard acoustic approaches for language identification. Furthermore, the GMM-SVM has been shown to work well by introducing a discriminative method into GMM-based acoustic systems. In these systems, the intersession variability within a language has become an important adverse factor that degrades system performance. To tackle this problem,...
People are among the most popular subjects in photography, and in many social settings, images of groups of people are captured. People often arrange themselves in a very structured manner in these group images. For example, taller people might stand in a row behind smaller people. This structure is often exploited in captions that sequentially label the individuals in each row. We present an algorithm...
In this research, we propose a method to automatically generate a landmark identification system for geo-tagged photographs, based on analysis of various data collected from the Web. The method first conducts Web analysis based on three major procedures: (1) Automatic extraction of points-of-interest (POIs) based on geographical clustering of geo-tagged images, (2) Retrieval of landmark candidates...
In this paper we propose a system that annotates a user-generated video based on the associated location metadata, by exploiting user-tagged image databases. An example of such a database is a photo sharing Web site such as Flickr where users upload their images and annotate them with various tags. The goal is to find the tags that have a high probability of being relevant to the video without any complex...
Existing pedestrian and vehicle detection algorithms use 2D cues of objects, such as pixel values, color, texture, shape information or motion. The use of 3D cues in object detection, on the other hand, is not well studied in the literature. In this paper, we propose an efficient algorithm that detects pedestrians and vehicles using their 3D cues. The proposed algorithm first detects moving objects...
Genre and emotion have been applied to content-based music retrieval and organization; however, the intrinsic correlation between them has not been explored. In this paper we present a statistical association analysis to examine such intrinsic correlation and propose a two-layer scheme that exploits the correlation for emotion classification. Significant improvement of classification accuracy over...
Action recognition has attracted much attention for human behavior analysis in recent years. Local spatial-temporal (ST) features are widely adopted in many works. However, most existing works that represent an action video by a histogram of ST words fail to capture the fine structure of actions because of the local nature of these features. In this paper, we propose a novel method to simultaneously...
This paper presents a model-free and training-free two-phase method for audio segmentation that separates monophonic heterogeneous audio files into acoustically homogeneous regions where each region contains a single sound. A rough segmentation separates audio input into audio clips based on silence detection in the time domain. Then a self-similarity matrix, based on selected audio features in the...
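The rough, silence-based first phase can be sketched as below. This is a minimal illustration, assuming short-time energy with a fixed threshold as the silence detector; the frame length, the threshold value, and the function name `rough_segments` are hypothetical, and the second phase based on the self-similarity matrix is not covered.

```python
def rough_segments(samples, frame_len=4, threshold=0.01):
    """Split audio into non-silent clips using short-time energy.

    `samples` is a list of amplitudes. A frame counts as silent when its
    mean squared amplitude is below `threshold` (an assumed criterion).
    Returns (start, end) sample indices, with `end` exclusive. Trailing
    samples that do not fill a whole frame are ignored.
    """
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy >= threshold:
            if start is None:
                start = f * frame_len  # a new clip begins here
        elif start is not None:
            segments.append((start, f * frame_len))  # silence closes the clip
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments


# Hypothetical signal: silence, a loud sound, silence, a quieter sound.
example_audio = [0.0] * 4 + [0.5] * 8 + [0.0] * 4 + [0.3] * 4
example_clips = rough_segments(example_audio)
```

For this toy input the two sounds come out as the clips (4, 12) and (16, 20). A fixed threshold is the simplest choice; real recordings would likely need one adapted to the noise floor.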