Over the last decade, a great deal of research has been done on sound event classification, but a major problem is that performance degrades sharply in the presence of noise. As spectrogram-based image features and denoising autoencoders reportedly perform better in noisy conditions, this paper proposes a new robust feature called the denoising autoencoder image...
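The abstract is truncated before the feature is defined, but its core ingredient is a denoising autoencoder. As an illustration only (not the paper's architecture), here is a minimal tied-weight denoising autoencoder in NumPy, trained to reconstruct clean feature frames from noise-corrupted copies; all sizes and hyperparameters are arbitrary:

```python
import numpy as np

def train_dae(X, n_hidden=8, noise=0.3, lr=0.1, epochs=200, seed=0):
    """Tiny tied-weight denoising autoencoder with sigmoid units:
    corrupt the input with Gaussian noise, then learn to reconstruct
    the clean frames (plain batch gradient descent on squared error)."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W = rng.normal(scale=0.1, size=(n_in, n_hidden))
    b_h = np.zeros(n_hidden)
    b_o = np.zeros(n_in)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        Xn = X + noise * rng.normal(size=X.shape)  # corrupted input
        H = sig(Xn @ W + b_h)                      # encode
        R = sig(H @ W.T + b_o)                     # decode (tied weights)
        dR = (R - X) * R * (1.0 - R)               # grad at decoder pre-activation
        dH = (dR @ W) * H * (1.0 - H)              # grad at encoder pre-activation
        W -= lr * (Xn.T @ dH + dR.T @ H) / len(X)
        b_h -= lr * dH.mean(axis=0)
        b_o -= lr * dR.mean(axis=0)
    return W, b_h, b_o

# illustrative data: 50 "frames" of 6 values in (0, 1)
rng = np.random.default_rng(1)
X = rng.uniform(0.2, 0.8, size=(50, 6))
W, b_h, b_o = train_dae(X)
code = 1.0 / (1.0 + np.exp(-(X @ W + b_h)))  # hidden activations used as the feature
```

The hidden activations `code` play the role of the learned robust representation; a real system would train on spectrogram patches rather than random data.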
This paper focuses on cover song identification over a large-scale dataset. Identifying all covers of a query song in a music collection is a challenging task, since covers vary in multiple aspects such as tempo, key, and structure. On large-scale datasets, cover song identification is even more challenging, and few works have been published. Previous works usually use a single representation for a...
In the audio event classification and detection research field, the representation of the audio itself is important. Many researchers have tried to apply Deep Belief Networks (DBNs) to learn new representations of audio. The mel filter-bank feature, computed on the mel scale, is commonly used as the low-level representation of audio in the pre-processing stage of a DBN. However, the mel...
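For context on the mel filter-bank front end mentioned above (a standard construction, not taken from the truncated abstract), a minimal NumPy sketch of triangular filters spaced evenly on the mel scale; the filter count, FFT size, and sample rate are illustrative:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular filters with centers spaced evenly on the mel scale;
    applied to a power spectrum they yield the mel filter-bank feature."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling slope
    return fb

fb = mel_filterbank()  # shape: (26 filters, 257 FFT bins)
```

Multiplying a frame's power spectrum by `fb.T` gives the 26-dimensional mel filter-bank vector typically fed to the DBN.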
Audio event classification plays an important role in surveillance systems. Due to the constraints of the short-time Fourier transform (STFT), extracting audio frequency-domain features, the essential work in audio event classification, remains difficult for large audio frames. The traditional method of concatenating the feature vectors of successive audio windows...
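A minimal sketch of the traditional concatenation front end the abstract refers to, assuming NumPy; the frame length, hop size, and context size `k` are illustrative, not the paper's values:

```python
import numpy as np

def stft_magnitude(x, frame_len=256, hop=128):
    """Windowed magnitude STFT: slice the signal into overlapping frames,
    apply a Hann window, and take |rFFT| of each frame."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hanning(frame_len)
    frames = np.stack([x[i * hop : i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def concat_windows(spec, k=3):
    """Traditional concatenation: stack k successive frame spectra into one
    long feature vector -- its dimension grows linearly with k, which is the
    usual objection to this method for big frames."""
    n = spec.shape[0] - k + 1
    return np.stack([spec[i : i + k].ravel() for i in range(n)])

x = np.random.default_rng(0).normal(size=4096)  # toy 1-channel signal
spec = stft_magnitude(x)        # (31 frames, 129 bins)
feats = concat_windows(spec)    # (29 vectors, 387 dims each)
```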
For music identification, conventional bag-of-audio-words methods generally compute a single histogram for a piece of music, which ignores the temporal characteristics of the music and hurts accuracy. In addition, they are usually based on the DFT spectrogram, which does not represent music as well as the Constant Q (CQ) spectrogram. To address these problems, we propose a two-layer...
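For reference, the conventional bag-of-audio-words histogram the abstract criticizes can be sketched as follows (illustration only; the codebook here is random, whereas in practice it would come from, e.g., k-means on training frames):

```python
import numpy as np

def bag_of_audio_words(frames, codebook):
    """Assign each spectral frame to its nearest codeword (vector
    quantization) and return the normalized histogram of counts.
    Note that all temporal ordering of the frames is discarded."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
frames = rng.normal(size=(200, 12))   # e.g. 200 CQ-spectrogram frames
codebook = rng.normal(size=(16, 12))  # 16 codewords (normally from k-means)
h = bag_of_audio_words(frames, codebook)
```

Because `h` is invariant to any permutation of the frames, two pieces with the same notes in different order get the same histogram, which is exactly the temporal weakness the abstract points out.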
This paper addresses the detection and recognition of impulsive sounds in surveillance systems, such as door slams, footsteps, glass breaks, gunshots, and human screams. We build an acoustic event dataset of about 1,000 sound clips and a ground-truth dataset from a surveillance system. We investigate the influence of different frame sizes in audio feature extraction when classifying acoustic events...
This paper presents a detection-based method for tracking an uncertain number of persons in complex scenarios with frequent occlusions. Frame-by-frame, data-association-based particle filters are adopted to track targets in occlusion-free regions. When an occlusion is detected, the associated trackers are deactivated; they are re-activated when the tracked persons are re-identified after occlusion...
Audio fingerprints can be used to implement an efficient music identification system on a million-song library, but such a system requires a huge amount of memory to hold the fingerprints and indexes. For a large-scale music library, memory therefore restricts the speed of music identification. In this paper, we propose an efficient music identification system that utilizes a kind of space-saving...
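The abstract is truncated before its space-saving structure is described, but the standard baseline it improves on is an inverted index from sub-fingerprint hashes to (song, offset) pairs, with offset-consistent voting at query time; a toy sketch (hash values here are placeholders):

```python
from collections import defaultdict

def build_index(library):
    """Inverted index mapping each sub-fingerprint hash to every
    (song_id, frame_offset) position where it occurs. For a million-song
    library this index is what dominates memory."""
    index = defaultdict(list)
    for song_id, hashes in library.items():
        for offset, h in enumerate(hashes):
            index[h].append((song_id, offset))
    return index

def identify(index, query):
    """Vote for (song, time-shift) pairs; a true match accumulates many
    hash hits at one consistent shift between query and song."""
    votes = defaultdict(int)
    for q_off, h in enumerate(query):
        for song_id, s_off in index.get(h, ()):
            votes[(song_id, s_off - q_off)] += 1
    if not votes:
        return None
    (song_id, _shift), _count = max(votes.items(), key=lambda kv: kv[1])
    return song_id

library = {"song_a": [10, 42, 42, 7, 99], "song_b": [3, 42, 8, 10, 5]}
index = build_index(library)
print(identify(index, [42, 7, 99]))  # song_a: three hits at a consistent shift
```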
Emotion is a useful means of organizing a music library, and automatic music emotion recognition is drawing more and more attention. Music structure information is incorporated to improve music emotion regression. A music dataset with emotion and structure annotations is built, and features concerning lyrics, audio, and MIDI are extracted. For each emotion dimension, regressors are built using...
We adopt a two-layer regression model for music pleasure regression. The pleasure orientation of a song is estimated first, and then different regressors are used to predict the degree of pleasure according to the estimated orientation. Using the corresponding regressor for each instance yields a large improvement over the one-layer model when the first layer is assumed to be perfect. By tuning the...
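A toy rendering of the two-layer idea under the "perfect first layer" assumption (orientation labels given, not predicted), using plain least-squares regressors in NumPy; the threshold and model family are illustrative, not the paper's:

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear regressor with a bias term."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def fit_two_layer(X, y, threshold=0.0):
    """Two-layer scheme: partition training songs by pleasure orientation
    (label above/below a threshold), then fit one regressor per side."""
    pos = y >= threshold
    return {"pos": fit_linear(X[pos], y[pos]),
            "neg": fit_linear(X[~pos], y[~pos])}

def predict_two_layer(models, X, orientation):
    """'orientation' plays the role of the first-layer decision; here it
    is supplied directly, i.e. the perfect-first-layer case."""
    out = np.empty(len(X))
    out[orientation] = predict_linear(models["pos"], X[orientation])
    out[~orientation] = predict_linear(models["neg"], X[~orientation])
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.5   # synthetic pleasure scores
models = fit_two_layer(X, y)
pred = predict_two_layer(models, X, y >= 0.0)
```

In a real system the first layer would be a trained classifier, and its errors are exactly what separates the idealized gain from the achievable one.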
In this paper, we present a new approach to content-based music mood classification. Music, especially song, is inherently multi-modal, but current studies mainly focus on the audio modality, and their classification capability is not good enough. In this paper we use three modalities: audio, lyrics, and MIDI. After extracting features from these three modalities respectively, we...
Mood annotation of music is challenging, as it concerns not only audio content but also extra-musical information. It is a representative research topic on how to traverse the well-known semantic gap. In this paper, we propose a new music-mood-specific ontology. Novel ontology-based semantic reasoning methods are applied to effectively bridge content-based information with web-based resources. Also,...
Recently, class labels have been commonly used to structure the increasing amounts of music available in digital form on the Web, and they are important for music information retrieval. An evaluation of the automatic classification of Chinese folk music according to an audio taxonomy is presented. The audio taxonomy is organized hierarchically, resulting in good coverage of Chinese folk music. Continuous Hidden...