After subjects take the Mongolian Standard Speech Test, they sometimes hold divergent opinions about their test results. To resolve these disagreements, a new approach is introduced that employs software to assist the testing of Mongolian Standard Speech based on comparisons of language features. Specifically, after the subject's speech data are...
The goal of the Speech-to-Speech Translation research is to enable real-time, interpersonal communication via natural spoken language for people who do not share a common language. The Multilingual Automatic Speech-to-Speech Translator (MASTOR) system is the first Speech-to-Speech system that allows for bidirectional (blabbering voice Tamil) free-form speech input and output. The...
This paper deals with the problem of noise cancellation of speech signals in an acoustic environment. Different adaptive filter algorithms are generally employed for this purpose, but many of them lack flexibility in controlling the convergence rate, the range of variation of the filter coefficients, and the consistency of the error within a tolerance limit. In order to achieve these desirable attributes as well...
Voice is the natural communication system used by all beings, human beings in particular. Understanding and recognizing human speech for various applications is the core technology of the "information" age. Automatic speech recognition has widespread applications in real-life situations. Here, a speech recognition system for isolated Malayalam digits is created using Mel Frequency Cepstral Coefficients...
There have been great efforts made in the development of an automated instrumentation system for speech recognition (AISR) to provide two-way communication between deaf and vocal people. The system performance achievable with the output of current real-time speech recognition systems would be extremely poor relative to normal speech reception. An alternate application of AISR technology to aid the...
To solve the problems of slow convergence and low computational precision of blind source separation (BSS) based on traditional particle swarm optimization (PSO), a novel approach based on adaptive particle swarm optimization for real-time blind source separation is proposed, in which the observations are linear convolutive mixtures of statistically independent speech sources. It combines the independent...
Wavelet packet transform is an efficient method for speech denoising. In this paper, we study various wavelet packet bases, decomposition layers, threshold values and threshold functions, which are key parameters in wavelet packet denoising. Furthermore, we adopt three methods to evaluate the quality of the denoised speech, including signal-to-noise ratio (SNR), wavelet spectrum distortion...
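As a rough illustration of two ingredients named in the abstract above, the sketch below shows the hard and soft threshold functions commonly used in wavelet denoising, together with an SNR measure in dB. The abstract is truncated and does not say which threshold functions or which SNR definition the authors use, so the function names (`hard_threshold`, `soft_threshold`, `snr_db`) and the formulas are assumptions based on the textbook definitions, not the paper's method. The wavelet packet decomposition itself is not shown.

```python
import math


def hard_threshold(coeffs, t):
    """Keep coefficients whose magnitude exceeds t; zero the rest."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def soft_threshold(coeffs, t):
    """Zero small coefficients and shrink the others toward zero by t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def snr_db(clean, denoised):
    """SNR in dB: clean-signal energy over the energy of the residual error.

    Assumed definition: 10 * log10(sum(x^2) / sum((x - y)^2)).
    """
    if len(clean) != len(denoised):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in clean)
    error_energy = sum((x - y) ** 2 for x, y in zip(clean, denoised))
    if signal_energy == 0.0:
        raise ValueError("clean signal has zero energy")
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)
```

Soft thresholding gives a continuous mapping at the cost of biasing large coefficients, while hard thresholding preserves large coefficients but is discontinuous at the threshold; which behaves better on speech is the kind of question the paper's comparison would address.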
In this paper, we present an FPGA-based voice activity detection system. DoV (Degree of Voicing) and QSNR (Quantile Signal-to-Noise Ratio) are used as parameters of the VAD algorithm of the proposed system. All VAD system functions are implemented using a dedicated parallel architecture, including signal capturing, DoV processing module and QSNR processing module. The system uses several DPRAMs (Dual...
Linear source-filter models have been widely used by researchers as a front-end for speaker identification systems. These models use cepstral features derived from the power spectrum of the speech signal. But it is also well known that a significant part of the acoustic information cannot be modeled by the linear source-filter model, and thus the need for nonlinear features becomes apparent. In this paper,...
The paper provides a novel approach to emotion recognition from the facial expression and voice of subjects. The subjects are asked to manifest their emotional exposure in both facial expression and voice while uttering a given sentence. Facial features, including mouth-opening, eye-opening and eyebrow-constriction, and voice features, including the first three formants (F1, F2, and F3) and respective powers...
This paper presents the design and development of a frame-based approach for a speech-to-sign-language machine translation system in the domain of railways and banking. This work aims to utilize the capability of artificial intelligence for the betterment of physically challenged, deaf-mute people. Our work concentrates on the sign language used by the deaf community of the Indian subcontinent, which is...
Three new methods of feature extraction based on time-frequency analysis of speech are presented and compared. In the first approach, speech spectrograms are passed through a bank of 12 log-Gabor filters and the outputs are averaged. In the second approach, the spectrograms are sub-divided into ERB frequency bands and the average energy for each band is calculated. In the third approach, wavelet...
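The second approach in the abstract above (average energy per ERB band of a spectrogram) could be sketched as follows. The abstract does not state the number of bands or how the band edges are chosen, so this sketch assumes edges spaced uniformly on the Glasberg and Moore ERB-rate scale, `21.4 * log10(1 + 0.00437 * f)`; the function names and the plain list-of-lists spectrogram layout are likewise hypothetical.

```python
import bisect
import math


def hz_to_erb_rate(f_hz):
    """Glasberg-Moore ERB-rate scale (number of ERBs below f_hz)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def erb_band_edges(n_bands, f_max_hz):
    """Band edges in Hz, uniformly spaced on the ERB-rate scale (assumed)."""
    e_max = hz_to_erb_rate(f_max_hz)
    return [erb_rate_to_hz(e_max * i / n_bands) for i in range(n_bands + 1)]


def band_average_energy(spectrogram, bin_freqs_hz, edges_hz):
    """Average power per band over all frames.

    spectrogram: list of frames, each a list of power values per bin.
    bin_freqs_hz: center frequency of each bin, expected within
    [edges_hz[0], edges_hz[-1]]; out-of-range bins are clamped to the
    nearest band. Empty bands yield 0.0.
    """
    n_bands = len(edges_hz) - 1
    sums = [0.0] * n_bands
    counts = [0] * n_bands
    # Map each frequency bin to a band once, then reuse for every frame.
    bin_to_band = [
        min(max(bisect.bisect_right(edges_hz, f) - 1, 0), n_bands - 1)
        for f in bin_freqs_hz
    ]
    for frame in spectrogram:
        for band, power in zip(bin_to_band, frame):
            sums[band] += power
            counts[band] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

For example, `band_average_energy(spec, freqs, erb_band_edges(20, 8000.0))` would reduce each spectrogram to a 20-element feature vector; the choice of 20 bands and an 8 kHz upper limit is purely illustrative.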
Speech has recently been recognized as an attractive method for the measurement of cognitive load. Current speech-based cognitive load measurement systems utilize acoustic features derived from auditory-motivated frequency scales. This paper aims to investigate the distribution of speech information specific to cognitive load discrimination as a function of frequency. We found that this distribution...
In this paper, a new algorithm for speech coding is proposed. This algorithm is based on a revised sinusoidal model, in which each component is represented with two instantaneous amplitudes and a frequency. This model avoids the difficulty in estimating the highly nonlinear phases and allows one to optimize the amplitudes once the frequencies are estimated. Simulations indicate that the proposed model...
Voiced speech is produced by excitation of the vocal tract system with the quasiperiodic vibrations of the vocal folds at the glottis. These excitations become significantly stronger when the vocal folds are fully opened or about to be closed. In this work, the focus is on estimating these instants of significant excitation using temporal phase periodicity present in the speech signal. Assuming...
Inadequate velopharyngeal closure, due to structural or neurological problems, allows air to pass through the nasal cavity leading to introduction of inappropriate nasal resonances during speech production resulting in hypernasal speech. Our previous work on the acoustic analysis of hypernasal speech using group delay function for the detection of hypernasality showed stable effects of vowel nasalization...
The pioneering work on the 'separation of speech from a mixture of acoustic sources' dates back to as early as the 1970s. Since then, two main approaches, namely the traditional approach using signal-processing techniques and the computational auditory scene analysis (CASA) approach using auditory-modeling methods, have been concurrently attempted by researchers to find a solution to the problem of what is known as...
A database for speaker verification of Chinese whispered speech is established. It is based on the assumption that whispers are easily affected by the environmental and speakers' emotional factors. The manuscript for the corpus considers the structure of Chinese syllables, including all the categories of the initials, finals and tones. 8 typical channels are applied to collect the speech, mainly stated...
The performance of a speaker recognition system decreases when the speaker is under stress or emotion. In this paper we explore and identify a mechanism that enables use of inherent stress-in-speech or speaking style information present in the speech of a person as additional cues for speaker recognition. We quantify the inherent stress present in the speech of a speaker mainly using 3 features, namely,...
Laryngeal diseases affect many professionals who use their voices as the main working tool, such as teachers, singers, radio and TV presenters, among others. Advanced diagnosis techniques of these diseases are typically invasive, causing much discomfort to the patient. In recent years techniques of digital voice processing have been investigated to obtain non-invasive systems to aid the diagnosis...