This paper addresses an over-smoothing effect in Gaussian Mixture Model (GMM)-based Voice Conversion (VC). The flexibility of the statistical approach is one of the major reasons why it is widely applied to speech-based systems. However, quality degradation caused by over-smoothed converted speech parameters is an unavoidable problem of statistical modeling. One of the common approaches to this over-smoothness...
This paper presents an emotional voice conversion (VC) technology using non-negative matrix factorization, where parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The input source spectrum is decomposed into the source spectrum exemplars and their weights. By replacing source exemplars with target exemplars, the converted spectrum and F0...
In this paper, we propose a method for noise-robust speech recognition in a home environment based on noise modeling and parallel decoding. The proposed method rests on three basic ideas. First, we model the noise signals observed in the environment using a GMM. Second, we generate multiple noise-reduced signals using the mean vectors of the GMM and decode the signals in parallel. Third, we choose...
A speech signal can be represented as a combination of acoustic parameters extracted from it. The parameter vectors are assumed to be the constituents of the speech signal over a specified duration during which it is stationary. Typical representations are Mel Frequency Cepstral Coefficients and Linear Prediction Coefficients. The process of isolated word recognition involves the mapping...
Nowadays, online interpersonal communication is often preferred to face-to-face interaction. However, emotions play a significant role in online communication. Automatic extraction of emotions from text is an active research topic because it reduces the communication gap and misunderstanding between users. To become emotionally more intelligent, our previous text to emotion analyzing...
This paper proposes a new approach for tracking multiple pitches within a single mixed speech signal. In this method, we employ a novel continuous correlation feature for computing the pitch model. This feature not only represents the harmonicity but also includes information on spectral continuity, and hence improves the accuracy of the multi-pitch estimate. A hybrid DBN and HMM model was further...
In human-to-human speech communication, various barriers are caused by constraints such as physical constraints causing vocal disorders and environmental constraints making it hard to produce intelligible speech. These barriers would be overcome if our speech production were augmented so that we could produce speech sounds as we want beyond these constraints. Voice conversion (VC) is a technique...
In this article, we introduce a novel approach for estimating the coefficients of a memoryless preprocessor for nonlinear acoustic echo cancellation (NL-AEC) using particle filtering. The acoustic echo path is modeled by a nonlinear-linear cascade of a memoryless preprocessor (to model the loudspeaker nonlinearities) preceding a linear finite impulse response filter (estimated by the normalized least...
Speech/non-speech detection (SND) distinguishes between speech and non-speech segments in recorded audio and video documents. SND systems can help reduce the storage space required when only the speech segments of the audio documents are needed, for example in content analysis, spoken language identification, etc. In this work, we experimented with the use of time domain, frequency domain and cepstral...
In this paper we implement state-of-the-art factor-analysis-based methods and fuse their scores to obtain a channel-robust speaker recognition system. The two methods are joint factor analysis (JFA) and i-Vector, which define low-dimensional speaker- and channel-dependent spaces. For score fusion we propose a simple weight computation without a training step. We evaluate our method under two conditions;...
A new approach to speaker clustering is presented and discussed in this paper. The main technique consists in grouping all the homogeneous speech segments obtained at the end of the segmentation process by using the spatial information provided by the stereophonic speech. The proposed system is suitable for debates or multi-conferences in which the speakers are located at fixed positions. The new...
The use of deep neural networks (DNNs) has improved performance in several fields including computer vision, natural language processing, and automatic speech recognition (ASR). The increased use of DNNs in recent years has been largely due to performance afforded by GPUs, as the computational cost of training large networks on a CPU is prohibitive. Many training algorithms are well-suited to the...
Sentence similarity measures play an increasingly important role in text-related research and applications in areas such as text mining, Web page retrieval, and dialogue systems. Existing methods for computing sentence similarity have been adopted from approaches used for long text documents. These methods process sentences in a very high-dimensional space and are consequently inefficient, require...
In this paper, we address an exemplar-based hidden Markov model (HMM) that represents lip motion activity using visual cues for lipreading. Discriminative visual features, including the geometric shape parameters and a contour-constrained spatial histogram, are selected to represent each lip frame. Then, a set of exemplars associated with the HMM is learned jointly to serve as a typical representation...
In this paper, we propose a fundamental frequency prediction method intended primarily for voice conversion systems. This paper establishes a Gaussian Mixture Model (GMM) to predict the fundamental frequency based on the Linear Predictive Cepstral Coefficient (LPCC). The model may be a speaker-dependent Gaussian mixture model or a speaker-independent universal background model that is...
Sound waves propagating in an indoor environment hit the surfaces of solid objects and produce reverberant speech signals. Reverberant speech signals in noisy acoustical environments cause problems such as reduced speech intelligibility, difficulty in distinguishing speakers and locating sources, and degraded quality in hands-free telephony, hearing aids, etc. Adaptive filters can be applied to suppress the interfering...
Dataflow modeling offers a myriad of tools for designing and optimizing signal processing systems. A designer is able to take advantage of dataflow properties to effectively tune the system with respect to functionality and different performance metrics. However, a disparity between the specification of dataflow properties and the final implementation can lead to incorrect behavior that is difficult...
The aim of this paper is to introduce an enhanced approach for standard Automatic Speaker Recognition (ASR) systems in noisy environments in conjunction with a Blind Source Separation (BSS) algorithm. The latter is able to discern between interfering noise signals and the reference speech signal, hence it can be considered a necessary preprocessing step. The main problem of the proposed approach...
With the rapid advance in information technology, more and more information exchange platforms appear. People can freely exchange information on these platforms. However, not all information is reliable. To make correct decisions, it is necessary to detect and remove unreliable information. The main purpose of this study is to improve the reliability of hotel ranking by detecting and deleting outlier...
Human-computer interaction is a hot topic in artificial intelligence. Artificial navigation is an interesting application of human-computer interaction, which controls the actions of a target device through speech or gesture information. The main virtue of artificial navigation is that it can control the target device from a distance without any remote control device. This technology can be used in the...