The Throat Microphone (TM) is a non-acoustic device, relying on the vibrations of vocal folds rather than the audible sound produced. Correctly capturing vocal fold vibrations is difficult due to poor signal representation capabilities. The system recognizes the TM vibrations and produces the corresponding speech sound. This is done by extracting features from the spectrum of the TM vibrations and...
The objective is to develop a speech analysis program to help speech therapists and phoniatricians in their work. The program provides recording and voice analysis functions, and lets users add patient information to a database (DB).
In this paper we present a new database with speech recordings in Spanish. The database contains recordings of 54 native Spanish speakers. It is suitable for developing and testing better Speaker Verification systems. The recording procedure, equipment and speech tasks are detailed. Experiments using the GMM-UBM speaker verification methodology were performed. The methodology...
In this paper, we present a latent variable (LV) framework to identify all the speakers and their keywords given a single channel microphone recording containing a multi-speaker mixture signal. We introduce two separate LVs to denote active speakers and the keywords uttered. The dependency of a spoken keyword on the speaker is modeled through a conditional probability mass function. The distribution...
Wind noise is one of the most significant issues for hearing aid users. In this paper, a contribution to this issue is made by using binaural phase and level differences. Most sounds, including speech signals, carry directional information; that is, the interaural phase difference (IPD) and interaural level difference (ILD) do not vary if the sound direction is fixed. However, wind noise has no directional information,...
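The IPD and ILD mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's method: the function name `ipd_ild`, the single-frame FFT, and the synthetic test tone are assumptions chosen only to show how the two quantities are commonly computed per frequency bin.

```python
import numpy as np

def ipd_ild(left, right, eps=1e-12):
    """Per-bin interaural phase difference (radians) and level difference (dB).

    Hypothetical helper for illustration; a real hearing-aid system would
    work on short overlapping frames rather than one long FFT.
    """
    spec_l = np.fft.rfft(left)
    spec_r = np.fft.rfft(right)
    # Phase of the cross-spectrum equals phase(left) - phase(right), wrapped to (-pi, pi].
    ipd = np.angle(spec_l * np.conj(spec_r))
    # Level ratio in dB; eps guards against division by zero in empty bins.
    ild = 20.0 * np.log10((np.abs(spec_l) + eps) / (np.abs(spec_r) + eps))
    return ipd, ild

# Synthetic directional source: the right channel is delayed in phase by
# pi/4 and attenuated by half (assumed values, chosen for easy checking).
fs = 16000          # sampling rate in Hz (assumption)
n = 512             # frame length in samples (assumption)
f0 = 1000.0         # tone frequency; falls exactly on bin 32 for these settings
t = np.arange(n) / fs
left = np.sin(2 * np.pi * f0 * t)
right = 0.5 * np.sin(2 * np.pi * f0 * t - np.pi / 4)

ipd, ild = ipd_ild(left, right)
k = int(round(f0 * n / fs))  # index of the bin containing the tone
print(f"IPD at {f0:.0f} Hz: {ipd[k]:.3f} rad, ILD: {ild[k]:.2f} dB")
```

For a fixed source direction these two values stay stable from frame to frame, which is the property the abstract contrasts with wind noise.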
We report on a recently-recorded database for use in processing of ad hoc microphone constellations. Twenty-four microphones were positioned in various locations at a central table in a large room, and their outputs were recorded while 4 target talkers at the table both read from a list of sentences in a constrained way and also maintained a natural conversation for several minutes. This was done...
Home automation has become a subject of increasing interest for both industry and research as there is an increase in the awareness of such systems and their benefits can be easily seen. The new trend is to develop smart homes where commands can be given by speech. This way of communication, besides being the most natural, has the advantage of offering flexibility to the users especially when they...
The reversal of the current directions in audio circuit elements causes polarity inversion of the acquired audio signal with respect to the reference input signal. The objective of the work presented in this paper is to implement a simple polarity detection circuit in audio preamplifiers which provides an indication of the signal polarity inversion. The present work also demonstrates the possibilities...
In this paper the speech enhancement abilities of a new array-based processor have been tested. The proposed system works in three cascaded stages. First, the signals are time aligned with the estimated direction of the desired sound source. Second, the signal is decomposed into its allpass and minimum-phase components using cepstral processing. At this point, beamforming and liftering in the cepstral domain...
In light of the scarcity of both published and free acoustic Arabic databases, we propose in this paper an acoustic Arabic database to serve as a reference in the field of automatic Arabic speech recognition. This database is the result of a case study developed to contribute to the automatic diagnosis of speech disorders in Arabic-speaking children. The field work was in collaboration with experts...
In this contribution, we study the characteristics of sound generated by wind, and a signal model for the synthesis of wind noise signals is derived. An analysis of the statistics of wind noise recorded in a laboratory setup is carried out with respect to the spectral and temporal properties of the signals. In particular, an autoregressive model is developed for the spectral shape description and the...
In this paper we describe a new database of multichannel room impulse responses. The impulse responses are measured in a room with a configurable reverberation level, resulting in three different acoustic scenarios with reverberation times (RT60) of 160 ms, 360 ms and 610 ms. The measurements were carried out in recording sessions of several source positions on a spatial grid (angle range of −90° to...
A new single- and multichannel audio recordings database (SMARD) is presented in this paper. The database contains recordings from a box-shaped listening room for various loudspeaker and array types. The recordings were made for 48 different configurations of three different loudspeakers and four different microphone arrays. In each configuration, 20 different audio segments were played and recorded...
Speech is one of the most popular parameters used to identify a speaker from a spoken phrase. Feature extraction from speech is a necessary first step in a speaker identification process. Traditionally, computation of the Mel Frequency Cepstral Coefficient (MFCC) features uses a Hamming window as a preprocessing step to reduce spectral leakage. However, the Hamming window results in reasonable side lobes...
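The trade-off named in the abstract above (less leakage, but side lobes remain) can be checked numerically. The following is a minimal sketch, not the paper's procedure: the helper `peak_sidelobe_db`, the 400-sample frame (25 ms at an assumed 16 kHz), and the FFT size are illustrative assumptions.

```python
import numpy as np

def peak_sidelobe_db(window, nfft=8192):
    """Highest side-lobe level of a window, in dB relative to the main-lobe peak.

    Hypothetical helper: it walks down the main lobe of the zero-padded
    magnitude spectrum until the level stops falling, then takes the
    maximum of everything beyond that point.
    """
    spec = np.abs(np.fft.rfft(window, nfft))
    spec_db = 20.0 * np.log10(spec / spec.max() + 1e-12)
    i = 1
    while i < len(spec_db) - 1 and spec_db[i] < spec_db[i - 1]:
        i += 1
    return spec_db[i:].max()

frame_len = 400  # 25 ms at 16 kHz (assumed frame size for MFCC analysis)
rect_psl = peak_sidelobe_db(np.ones(frame_len))
hamming_psl = peak_sidelobe_db(np.hamming(frame_len))

print(f"rectangular window peak side lobe: {rect_psl:.1f} dB")
print(f"Hamming window peak side lobe:     {hamming_psl:.1f} dB")
```

The Hamming window's side lobes sit far below those of a plain rectangular frame, at the cost of a wider main lobe, which is why it is the usual default before the FFT step of MFCC extraction.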
In this work, a method for multichannel speech enhancement using the linear prediction (LP) residual cepstrum is proposed. The method performs deconvolution at each microphone output in the cepstral domain. The deconvolution of the acoustic impulse response from the reverberated signal in each individual channel removes early reverberation. This dereverberated output from each channel is then spatially filtered...
The King Saud University speech database (KSU-DB) is a very rich speech database of the Arabic language. Its richness is in many dimensions. It has more than three hundred speakers of both genders. The speakers are Arabs and non-Arabs belonging to twenty-nine different nationalities. The database has different types of text such as isolated words, digits, phonetically rich words and sentences, phonetically...
This paper describes a method in which an interference noise source within an audio source separation scenario is suppressed from a mixture. The principal idea of the proposed method is to use a video camera array to locate an interference noise source whose 3D position will be used to estimate a matrix of frequency responses (FRs) by linearly combining a series of previously known FRs. A filter is...
Shigin is the singing of Japanese or Chinese poetry, following a melody called “seicho” in Japanese. However, it is difficult to master Shigin because a trainer teaches according to his/her own impressions, and its melody employs a relative music scale. Therefore, this paper proposes a singing training support system for Shigin that clarifies differences in signal characteristics between a trainee...
In this article, we propose solutions to the problem of speaker diarization of TV talk-shows, a problem for which adapted multimodal approaches, relying on data streams other than audio alone, remain largely underexploited. Hence we propose an original system that leverages prior knowledge on the structure of this type of content, especially the visual information relating to the active speakers,...
This paper presents a system that gives a robot the ability to diminish its own disturbing noise (i.e., ego noise) by utilizing template-based ego noise estimation, an algorithm previously developed by the authors. In pursuit of an autonomous, online and adaptive template learning system in this work, we specifically focus on eliminating the requirement of an offline training session performed in...