We propose a statistical framework for high-level feature extraction that uses SIFT Gaussian mixture models (GMMs) and audio models. SIFT features were extracted from all the image frames and modeled by a GMM. In addition, we used mel-frequency cepstral coefficients and ergodic hidden Markov models to detect high-level features in audio streams. The best result obtained by using SIFT GMMs in terms...
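The pipeline summarized above pools local descriptors (such as SIFT) from all frames and fits a Gaussian mixture model to them. A minimal sketch of the GMM step, using a diagonal-covariance EM loop in plain NumPy, is shown below; the function name, hyper-parameters, and NumPy-only implementation are illustrative assumptions, not the paper's actual system:

```python
import numpy as np

def fit_gmm(X, k=3, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM to the rows of X via EM.

    A minimal sketch: a real system would pool 128-D SIFT descriptors
    from all image frames into X before fitting.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialise means from random data points, unit variances, uniform weights.
    mu = X[rng.choice(n, k, replace=False)]
    var = np.ones((k, d))
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: per-point component responsibilities (log-space for stability).
        log_p = (-0.5 * (((X[:, None, :] - mu) ** 2) / var
                         + np.log(2 * np.pi * var)).sum(-1) + np.log(w))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances from responsibilities.
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var
```

The fitted mixture can then serve as a frame-level likelihood model for a given high-level feature, scoring new frames by their density under the mixture.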
Automatically extracting rhythmic information from musical recordings is inarguably one of the most critical subtasks in many music information retrieval systems. This paper presents a system for automatically extracting rhythm features from audio music signals in the WAV format, using a new approach based on metric structure and Bayesian theory. In this system, a detection method is applied in the...
In this paper, we propose a new general low-level feature representation for audio signals. Our approach, called the Dominant Audio Descriptor, is inspired by the MPEG-7 Dominant Color Descriptor. It is based on clustering time-local features and identifying dominant components. The features used to illustrate this approach are the well-known Mel Frequency Cepstral Coefficients. The performance of the...
As an important information component of multimedia, audio enriches information perception and acquisition. The analysis and extraction of audio features form the basis of audio classification, and effective feature extraction is important for content-based audio retrieval. In this paper, based on rough set theory, audio features are reduced and a lower-dimensional feature set is obtained...
This paper presents a method that is able to integrate audio and visual information for human action scene analysis. The approach is top-down, determining and extracting action scenes in video by analyzing both audio and video data. We propose a framework for recognizing actions by measuring image- and action-based information from video with the following characteristics: feature extraction is done...
Audio segmentation and classification can provide useful information for multimedia content analysis. In this paper, we present an approach to segmenting and categorizing sports audio into speech, music, and other environmental sounds for sports video classification and highlight detection. We investigate the performance of mel frequency cepstral coefficients (MFCC) in a Gaussian mixture model frame...
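Several of the abstracts above rely on mel frequency cepstral coefficients as the front-end feature. The standard MFCC pipeline (framing, windowed FFT, mel filterbank, log, DCT) can be sketched as follows; the parameter defaults are common illustrative choices, not the settings used in any of the papers listed here:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Compute MFCCs from a mono signal: a minimal sketch of the
    standard pipeline (framing -> FFT -> mel filterbank -> log -> DCT)."""
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular mel filterbank spanning 0 Hz to Nyquist.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT-II to decorrelate into cepstral coefficients.
    logmel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T
```

The resulting per-frame coefficient vectors are what a GMM or HMM back-end would then model for segmentation and classification.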
A novel affective video segment retrieval method based on the correlation between emotion and emotional audio events (EAEs) is presented. The proposed method focuses on retrieving three types of affective video segments: joy, sadness, and excitement, by utilizing correlations between emotions and EAEs. The correlation between these emotions and EAEs is investigated by a subjective evaluation. The proposed...
We introduce a regularized kernel-based rule for unsupervised change detection based on a simpler version of the recently proposed kernel Fisher discriminant ratio. Compared to other kernel-based change detectors found in the literature, the proposed test statistic is easier to compute and has a known asymptotic distribution which can effectively be used to set the false alarm rate a priori. This...
In this paper, we consider representing a musical signal as a dynamic texture, a model for both the timbral and rhythmical qualities of sound. We apply the new representation to the task of automatic song segmentation. In particular, we cluster sequences of audio feature-vectors, extracted from the song, using a dynamic texture mixture model (DTM). We show that the DTM model can both detect transition...
This paper presents a method that is able to integrate audio and visual information for action scene analysis in any movie. The approach is top-down, determining and extracting action scenes in video by analyzing both audio and video data. In this paper, we directly model the hierarchy and shared structures of human behaviours, and we present a framework for a hidden Markov model based application...
This paper presents two approaches for speaker role recognition in multiparty audio recordings. The experiments are performed over a corpus of 96 radio bulletins corresponding to roughly 19 h of material. Each recording involves, on average, 11 speakers playing one among six roles belonging to a predefined set. Both proposed approaches start by segmenting automatically the recordings into single speaker...