The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
In this paper we describe a semi-supervised algorithm to segment bird vocalizations using matrix factorization and Rényi entropy based mutual information. Singular value decomposition (SVD) is applied on pooled time-frequency representations of bird vocalizations to learn basis vectors. By utilizing only a few of the bases, a compact feature representation is obtained for input test data. Rényi entropy...
Bird activity detection is the task of determining if a bird sound is present in a given audio recording. This paper describes a bird activity detector which utilises a support vector machine (SVM) with a dynamic kernel. Dynamic kernels are used to process sets of feature vectors having different cardinalities. Probabilistic sequence kernel (PSK) is one such dynamic kernel. The PSK converts a set...
In this paper, we describe an unsupervised method to segment birdcalls from the background in bioacoustic recordings. The method utilizes information derived from both source features as well as system features. Three types of source features are extracted from the linear prediction residual signal, and Mel frequency cepstral coefficients are extracted from the system features. The source features...
In this paper, we describe an unsupervised, species independent method to segment birdcalls from the background in bio-acoustic recordings. The method follows a two-pass approach. An initial segmentation is performed utilizing K-means clustering. This provides labels to train Gaussian mixture acoustic models, which are built using Mel frequency cepstral coefficients. Using the acoustic models, the...
In this paper, we apply speech and audio processing techniques to bird vocalizations and for the classification of birds found in the lower Himalayan regions. Mel frequency cepstral coefficients (MFCC) are extracted from each recording. As a result, the recordings are now represented as varying length sets of feature vectors. Dynamic kernel based support vector machines (SVMs) and deep neural networks...
Speaker diarization is the task of determining “who spoke when” in a speech recording of an unknown duration containing an unknown number of speakers. The very unsupervised nature of this task makes it more challenging and demands that the feature representation used be highly discriminative across speakers. Commonly used features based on the short time Fourier transform are usually derived from...
The availability of multiple utterances (and hence, i-vectors) for speaker enrollment brings up several alternatives for their utilization with probabilistic linear discriminant analysis (PLDA). This paper provides an overview of their effective utilization, from a practical viewpoint. We derive expressions for the evaluation of the likelihood ratio for the multi-enrollment case, with details on the...
A voice activity detector (VAD) plays a vital role in robust speaker verification, where energy VAD is most commonly used. Energy VAD works well in noise-free conditions but deteriorates in noisy conditions. One way to tackle this is to introduce speech enhancement preprocessing. We study an alternative, likelihood ratio based VAD that trains speech and nonspeech models on an utterance-by-utterance...
In this paper, we present a speech activity detection (SAD) technique for speaker verification in noisy environments. The proposed SAD is based on phoneme posteriors derived from a multi-layer perceptron (MLP). The MLP is trained using modulation spectral features, where long temporal segments of the speech signal are analyzed in critical bands. In each sub-band, temporal envelopes are derived using...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.