Hearing instruments are frequently used in notoriously difficult acoustic scenarios. Even for normal-hearing people, ambient noise, reverberation and echoes often contribute to a degraded communication experience. The impact of these factors becomes significantly more prominent when participants suffer from a hearing loss. Nevertheless, hearing instruments are frequently used in these adverse conditions...
The foundations of signal processing are firmly set in least squares, an approach that has served us very well for years (and still does). With the increasing presence of machine learning and sophisticated statistics in audio processing, we are slowly seeing that not everything has to be based on Gaussians anymore. One recently popular approach along these lines is that of non-negative modeling, especially...
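Non-negative modeling in audio most often takes the form of non-negative matrix factorization (NMF) of a magnitude spectrogram. As a generic illustration of the idea, not the specific method the abstract alludes to, here is a minimal NMF with Lee-Seung multiplicative updates; the toy data and all names are invented for this sketch:

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9):
    """Factor a non-negative matrix V (freq x time) as W @ H using
    Lee-Seung multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, r)) + eps
    H = rng.random((r, T)) + eps
    for _ in range(n_iter):
        # multiplicative updates keep W and H non-negative by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# toy stand-in for a magnitude spectrogram (16 bins x 40 frames)
V = np.abs(np.random.default_rng(1).normal(size=(16, 40)))
W, H = nmf(V, r=2)
err = np.linalg.norm(V - W @ H)
```

The multiplicative form is what makes the model "non-negative": updates can only rescale entries, never flip their sign, so the learned templates W stay interpretable as spectral shapes.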
Mobile phones and modern hearing aids comprise advanced digital signal processing techniques as well as coding algorithms. From a functional point of view, digital hearing devices and mobile phones are approaching each other. In both types of devices similar or partly even identical algorithms can be found such as echo, reverberation and feedback control, noise reduction, intelligibility enhancement,...
Speech separation, or the cocktail party problem, is a widely acknowledged challenge. Part of the challenge stems from the confusion of what the computational goal should be. While the separation of every sound source in a mixture is considered the gold standard, I argue that such an objective is neither realistic nor what the human auditory system does. Motivated by the auditory masking phenomenon,...
A roomprint is a quantifiable description of an acoustic environment that can be measured under controlled conditions and estimated from a monophonic recording made in that space. Here we identify the properties required of a roomprint in forensic audio applications and review the observable characteristics of a room that, when extracted from recordings, could form the basis of a roomprint. Frequency-dependent...
This work describes a system for acoustic scene classification using large-scale audio feature extraction. It is our contribution to the Scene Classification track of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (D-CASE). The system classifies 30-second recordings of 10 different acoustic scenes. From the highly variable recordings, a large number of spectral,...
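Large-scale feature extraction of the kind mentioned above typically computes many frame-wise spectral descriptors. The following sketch shows three common ones (spectral centroid, roll-off, flatness); the feature choices and the 85% roll-off threshold are generic illustrations, not the actual feature set of the D-CASE submission:

```python
import numpy as np

def spectral_features(frame, sr):
    """Basic spectral descriptors of a single audio frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    power = mag ** 2
    total = power.sum() + 1e-12
    # centroid: power-weighted mean frequency ("centre of mass" of the spectrum)
    centroid = (freqs * power).sum() / total
    # roll-off: frequency below which 85% of the spectral energy lies
    cum = np.cumsum(power)
    rolloff = freqs[np.searchsorted(cum, 0.85 * cum[-1])]
    # flatness: geometric / arithmetic mean of magnitudes (near 1 for noise)
    flatness = np.exp(np.mean(np.log(mag + 1e-12))) / (np.mean(mag) + 1e-12)
    return {"centroid": centroid, "rolloff": rolloff, "flatness": flatness}
```

Stacking such descriptors over all frames of a 30-second recording yields the high-dimensional feature vectors a classifier is then trained on.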
In voice acquisition, variations of the microphone distance introduce not only level changes, but also frequency response changes due to the near-field effect. This paper presents a method for adaptive distance and near-field compensation based on the talker-to-microphone distance and the microphone polar pattern. If available, the microphone orientation and the critical distance associated with the...
Acoustic intensity can be used for different purposes such as sound source localisation, source separation and spatial audio object coding. Three-dimensional measurement of the acoustic intensity requires the design of special microphone arrays. A theoretical analysis and numerical simulations of intensity measurements using open spherical microphone arrays are presented in this paper. The calculation...
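A standard way to measure one intensity component is the two-microphone (p-p) method: average the two pressure signals and integrate the finite-difference pressure gradient via Euler's equation to get particle velocity. This is a generic sketch of that principle, not the open spherical-array formulation analysed in the paper; the spacing `d` and air density `rho` are assumed values:

```python
import numpy as np

def pp_intensity(p1, p2, sr, d, rho=1.21):
    """Time-averaged intensity component along the axis of two closely
    spaced pressure microphones: I = <p * v>."""
    p = 0.5 * (p1 + p2)                   # pressure at the midpoint
    grad = (p2 - p1) / d                  # finite-difference pressure gradient
    v = -np.cumsum(grad) / (sr * rho)     # crude time integration of Euler's equation
    return float(np.mean(p * v))
```

The sign of the result indicates the direction of energy flow along the microphone axis, which is exactly what source-localisation applications exploit.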
Speech and audio coding have during the last decade converged to an increasingly unified technology. This contribution discusses one of the remaining fundamental differences between speech and audio paradigms, namely, windowing of the input signal. Audio codecs generally use lapped transforms and apply a perceptual model in the transform domain, whereby temporal continuity is achieved by windowing...
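The lapped-transform windowing mentioned above relies on power-complementary windows: with 50% overlap, a window satisfying w[n]^2 + w[n + N/2]^2 = 1 (the Princen-Bradley condition) gives perfect reconstruction under overlap-add. This sketch illustrates the condition with a sine window; it is a generic demonstration, not the codec design discussed in the abstract:

```python
import numpy as np

def sine_window(N):
    # sin(pi/N * (n + 0.5)) satisfies w[n]^2 + w[n + N/2]^2 = 1
    n = np.arange(N)
    return np.sin(np.pi / N * (n + 0.5))

def analysis_synthesis(x, N):
    """Window 50%-overlapped frames, window again at synthesis, and
    overlap-add; the power-complementary window makes the overlapping
    w^2 contributions sum to one away from the signal edges."""
    w = sine_window(N)
    hop = N // 2
    y = np.zeros(len(x))
    for start in range(0, len(x) - N + 1, hop):
        frame = x[start:start + N] * w    # analysis window
        y[start:start + N] += frame * w   # synthesis window + overlap-add
    return y
```

Because windowing is applied both at analysis and synthesis, quantisation artefacts introduced in the transform domain are smoothly cross-faded between frames, which is the temporal-continuity property the abstract refers to.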
In this paper we propose a new clustering approach for solving the permutation ambiguity in convolutive blind source separation. After the transformation to the time-frequency domain, the problem of separation of sources can be reduced to multiple instantaneous problems, which may be solved using independent component analysis. The drawbacks of this approach are the inherent permutation and scaling...
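The permutation ambiguity can be made concrete with a toy aligner: after bin-wise ICA, the component order in each frequency bin is arbitrary, and one simple remedy (a hypothetical, much-simplified stand-in for the clustering approach proposed above) is to swap a bin whenever that raises the correlation of its amplitude envelopes with running per-component centroids:

```python
import numpy as np

def align_permutations(envelopes):
    """envelopes: array (F, 2, T) of amplitude envelopes of two separated
    components in each of F frequency bins. Aligns the arbitrary per-bin
    component order by envelope correlation with running centroids."""
    F, K, T = envelopes.shape
    assert K == 2, "sketch handles the two-source case only"
    out = envelopes.astype(float).copy()
    cent = out[0].copy()                      # bin 0 fixes the reference order
    for f in range(1, F):
        a, b = out[f]
        keep = np.corrcoef(a, cent[0])[0, 1] + np.corrcoef(b, cent[1])[0, 1]
        swap = np.corrcoef(b, cent[0])[0, 1] + np.corrcoef(a, cent[1])[0, 1]
        if swap > keep:
            out[f] = out[f][::-1].copy()      # flip the component order
        cent = 0.9 * cent + 0.1 * out[f]      # slowly track the aligned envelopes
    return out
```

The underlying assumption, shared by clustering approaches, is that envelopes of the same source are correlated across frequency while those of different sources are not.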
Self-similarity matrices have been widely used to analyze the sectional form of music signals, e.g. enabling the detection of parts such as verse and chorus in popular music. Two main types of structures often appear in self-similarity matrices: rectangular blocks of high similarity and diagonal stripes off the main diagonal that represent recurrent sequences. In this paper, we introduce a novel method...
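The two structure types described above fall out directly from how a self-similarity matrix is built. A minimal sketch, assuming frame-wise feature vectors (e.g. chroma) and cosine similarity; this shows the generic construction, not the novel method of the paper:

```python
import numpy as np

def self_similarity_matrix(features):
    """features: (T, D) frame-wise feature vectors. Returns the T x T
    matrix of cosine similarities: near-uniform blocks mark homogeneous
    sections, off-diagonal stripes mark repeated sequences."""
    X = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    return X @ X.T
```

If frames i..i+k repeat at j..j+k, the entries S[i, j], S[i+1, j+1], ... are all high, which is exactly the diagonal stripe that structure-analysis methods search for.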
Auditory saliency refers to the characteristics of a sound that cause it to attract the attention of a listener. Pre-attentive or bottom-up saliency has to do with automatic processing in the human auditory system that does not require and often precedes attention. Unlike visual saliency, where eye-tracking is a commonly used evaluation method, with auditory saliency, there is no easily trackable...
Detection of overlapping sound events generally requires training class models either from separate data for each class or by making assumptions about the dominating events in the mixed signals. Methods based on sound source separation are currently used in this task, but involve the problem of assigning separated components to sources. In this paper, we propose a method which bypasses the need to...
The automatic recognition of sound events allows for novel applications in areas such as security, mobile and multimedia. In this work we present a hierarchical hidden Markov model for sound event detection that automatically clusters the inherent structure of the events into sub-events. We evaluate our approach on an IEEE audio challenge dataset consisting of office sound events and provide a systematic...
In this paper we implement expectation maximization (EM) based methods in the short time Fourier transform (STFT) domain for background noise reduction in multi-channel systems. The models introduce a Wishart prior for the unknown signal covariance matrix. An EM algorithm is used to maximise the posterior probability for the clean signal, approaching a stationary point of the distribution with increasing...
In earlier work, we have formulated word discovery from speech as a latent component analysis problem. In more recent work, we proposed a Bayesian approach for estimating the model order, i.e. the vocabulary size, by evaluating the marginal likelihood for different order values. That technique was expensive, since the algorithm had to be repeated for several order values to estimate the proper order...