There is increasing interest in deploying spoken conversational agents that provide ubiquitous question-answering information to customers about corporate services and commercial products, and that support different user devices such as desktop PCs or mobile phones. Unfortunately, creating an accurate system requires a lot of manual work, where developers must consider several factors such...
We propose a spoken dialog strategy for car navigation systems to facilitate safe driving. To drive safely, drivers need to concentrate on their driving; however, their concentration may be disrupted due to disagreement with their spoken dialog system. Therefore, we need to solve the problems of user misunderstandings as well as misunderstanding of spoken dialog systems. For this purpose, we introduced...
This paper presents a voice-controlled speaker verification system for hand-held devices in noisy environments. In noisy environments, users unintentionally increase their voice intensity because of the ear-mouth feedback mechanism, i.e., the Lombard effect; thus, the characteristics of the input signal differ considerably from those in a quiet environment. To enhance the accuracy of a speaker verification...
State-of-the-art statistical parametric speech synthesis (SPSS) generally uses a vocoder to represent speech signals and parameterize them into features for subsequent modeling. Magnitude spectrum has been a dominant feature over the years. Although perceptual studies have shown that phase spectrum is essential to the quality of synthesized speech, it is often ignored by using a minimum phase filter...
Augmented Reality (AR) applications are now widely used in many fields, especially entertainment, and the market for AR applications on mobile devices is growing rapidly. Moreover, new and innovative hardware for human-computer interaction has been deployed, such as the Leap Motion Controller. This paper presents some preliminary results in the design and development of a...
Acoustic perturbation due to reverberation and changes in speaker position are detrimental to seamless human-robot speech-based communication. These cause a mismatch between the speech features at runtime and the acoustic model (training condition). As a result, the performance of Automatic Speech Recognition (ASR) and Spoken Language Understanding (SLU) degrades. As...
This work proposes two different methods for polarity detection in speech and Electroglottograph (EGG) signals using the Hilbert Envelope (HE). The HE is defined as the magnitude of the complex time function and is hence a unipolar signal. The zero frequency filtering (ZFF) obtained from the HE of the LP residual has the same phase for both polarities. Alternatively, the ZFF of speech and EGG, integrated linear prediction...
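The unipolar property of the Hilbert envelope can be illustrated with a minimal sketch. This is not the authors' polarity detection method: it only computes the magnitude of the analytic signal and shows that a signal and its inverted copy share the same envelope. The function name `hilbert_envelope` is hypothetical, and the naive DFT from the standard library is an assumption chosen for self-containment; it is only practical for short signals.

```python
import cmath
import math


def hilbert_envelope(x):
    """Return the Hilbert envelope |x + j*H{x}| of a real sequence.

    Illustrative only: uses a naive O(n^2) DFT, so keep inputs short.
    """
    n = len(x)
    if n == 0:
        return []

    # Forward DFT of the real input.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

    # Analytic-signal weights: keep DC (and Nyquist for even n),
    # double the positive frequencies, zero the negative frequencies.
    gains = [0.0] * n
    gains[0] = 1.0
    if n % 2 == 0:
        gains[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        gains[k] = 2.0

    weighted = [g * s for g, s in zip(gains, spectrum)]

    # Inverse DFT gives the complex analytic signal.
    analytic = [
        sum(weighted[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

    # The envelope is the magnitude, so it is non-negative (unipolar).
    return [abs(z) for z in analytic]
```

Calling `hilbert_envelope(x)` and `hilbert_envelope([-v for v in x])` on the same short signal should give matching envelopes up to floating-point error, which is why the envelope alone carries no polarity information.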
As one of the recent popular discriminative training methods, Minimum Classification Error (MCE) training aims at efficiently developing high-performance classifiers through the minimization of smooth (differentiable in classifier parameters) classification error count loss. However, MCE training, sometimes referred to as Functional Margin (FM) MCE training, does not necessarily guarantee training...
Accurately identifying word endpoints is an important step in the speech recognition process. This paper proposes a robust word endpoint detection algorithm for continuous speech signals collected from real-world environments. In this process, the energy feature is used along with the zero-crossing rate feature to locate the endpoints of words in the speech signal. A set of 100 different sentences has been recorded...
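The general idea of combining short-time energy with zero-crossing rate can be sketched as follows. This is a generic illustration, not the algorithm proposed in the paper: the function names, the frame length, and both thresholds are assumptions. Frames with high energy form the core of the utterance, and the boundaries are then extended over neighbouring frames with a high zero-crossing rate (as low-energy unvoiced sounds tend to have).

```python
def frame_features(samples, frame_len):
    """Return (energy, zero-crossing rate) for each non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Short-time energy: mean squared amplitude of the frame.
        energy = sum(s * s for s in frame) / frame_len
        # Zero-crossing rate: fraction of adjacent pairs that change sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def detect_endpoints(samples, frame_len=80, energy_thresh=0.01, zcr_thresh=0.3):
    """Return (start_sample, end_sample) of the detected segment, or None.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    feats = frame_features(samples, frame_len)
    core = [i for i, (energy, _) in enumerate(feats) if energy > energy_thresh]
    if not core:
        return None

    start, end = core[0], core[-1]
    # Extend the boundaries over adjacent frames with a high zero-crossing rate.
    while start > 0 and feats[start - 1][1] > zcr_thresh:
        start -= 1
    while end < len(feats) - 1 and feats[end + 1][1] > zcr_thresh:
        end += 1

    # end_sample is exclusive: one past the last sample of the final frame.
    return start * frame_len, (end + 1) * frame_len
```

On real recordings, background noise also produces zero crossings, so fixed thresholds like these would normally be replaced by values estimated from a stretch of known silence.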
A robust version of non-negative matrix factorization (RNMF) with generalized Kullback-Leibler divergence, designed for the task of unsupervised monaural speech enhancement, is proposed. RNMF tackles the unsupervised speech enhancement problem by factorizing the magnitude spectrum of the mixture into the sum of a non-negative sparse matrix and a non-negative low-rank matrix. The parameters of nonnegative...
Songs play an important role in entertainment. An audio signal separation system should be able to identify different audio signals such as speech, music and background noise. In a song the singing voice provides useful information. An automatic singing voice separation system is used for attenuating or removing the music accompaniment. The singing voice becomes a main attractive focus of attention...
Biomedical research increasingly extends to the human voice and auditory systems, and it also supports security applications. Emotion analysis and recognition for such purposes is a challenging task, which this work attempts. Initially, sub-band spectral features have been extracted to characterize high arousal angry, happy, fear, surprise and...
Emotions are important constituents of human behavior. The production and perception of cues of emotions is a complex task involving both verbal and nonverbal aspects of behavior. This complexity is further enhanced by the fact that emotions are subject to interpretation; a resulting emotion cannot be compositionally derived from its constituent building blocks. Even though we commonly associate an...
The earliest research on emotion recognition started with simulated/acted stereotypical emotional corpora and then extended to elicited corpora. Recently, the demand for real applications has forced research to shift to natural and spontaneous corpora. Previous research shows that emotion recognition accuracy declines gradually from simulated speech to elicited speech to totally natural speech. This paper...
To analyze the auditory scenes of a robot's surrounding environment, not only speech but also non-speech sounds are important; these sounds are spatially distributed and have different spectral and temporal characteristics. Thus, this paper investigates Acoustic Event Identification (AEI), which includes the problems of localization, detection, and identification of sound sources. To achieve AEI by a robot in a...
This article presents the results of research on the influence of the lossy compression used in the G.711, G.723.1 and iLBC codecs on the efficiency of isolated speech phrase recognition. In the research the degree of robustness against degrading factors in the audio signal parameterisation methods LPCC and MFCC (Linear Prediction Cepstral Coefficients, Mel Frequency Cepstral Coefficients)...
In this paper, a novel speech enhancement approach is proposed to improve the quality of speech contaminated with various types of non-stationary noise. The approach, EMD-based clear recursive thresholding (EMD-CRT), is inspired by wavelet thresholding. It performs the thresholding operation on the noisy speech recursively, such that the non-stationary noises...
Speaker recognition can be used as a security means to authenticate the speaker or as a forensic tool to determine who is likely to be the talker. For such critical applications, the robustness or reliability of the system is crucial. Despite developments and advances in the field of speaker recognition, there are still many limitations and challenges. Amongst these, environmental factors, in...
This paper proposes the unification of the code-excited linear prediction (CELP) codec process with watermarking based on formant tuning. The serial problem in watermarking and then encoding with the CELP codec was thereby reduced by using the proposed method, which also increased the bit detection rate. We took advantage of two key properties: I) humans do not perceive alterations applied to formants...
Audio hashes are compact and robust representations of audio data and allow the efficient identification of specific recordings and their transformations. Audio hashing for music identification is well established and similar algorithms can also be used for speech data. A possible application is the identification of replayed telephone spam. This contribution investigates the security and privacy...