This paper proposes a system to convert neutral speech to emotional speech with controlled emotion intensity. Most previous research on synthesizing emotional voices used statistical or concatenative methods that synthesize categorical emotional states such as joy, anger, and sadness. However, humans sometimes enhance or relieve emotional states and their intensity during daily life,...
This paper proposes an emotional speech synthesis system based on a three-layered model using a dimensional approach. Most previous studies on emotional speech synthesis using the dimensional approach focused only on the relationship between acoustic features and the emotion dimensions (valence and activation). However, people do not perceive emotion directly from acoustic features. Hence, the...
This paper proposes a newly revised three-layered model to improve the estimation of emotion dimensions (valence, activation) in a bilingual scenario, using knowledge of the commonalities and differences in human perception across multiple languages. Most previous speech emotion recognition systems worked only within a single language. However, to construct a generalized emotion recognition system that...
Speech-to-speech translation (S2ST) is the process by which a spoken utterance in one language is used to produce a spoken output in another language. The conventional approach to S2ST has focused on processing linguistic information only by directly translating the spoken utterance from the source language to the target language without taking into account paralinguistic and non-linguistic information...
Speech-to-speech translation (S2ST) systems are important for the process by which a spoken utterance in one language is used to produce a spoken output in another language. S2ST techniques have so far mainly used linguistic information, without para- and non-linguistic information (emotion, individuality, gender, etc.). Therefore, these systems are limited in synthesizing affective...
The speech transmission index (STI) is an objective measurement that is used to assess the quality of speech transmission in room acoustics. This paper proposes a simplified method of blindly estimating the STI in room acoustics based on the concept of the modulation transfer function (MTF). STI can be estimated with this method in four steps: (1) MTF is estimated in the whole band from the reverberant...
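The MTF-to-STI mapping that underlies step (1) and the following steps can be sketched as below. This is an illustrative sketch of the standard MTF-based STI recipe (apparent SNR per modulation index, clipping to ±15 dB, normalization, averaging), not the paper's simplified blind estimator; `sti_from_modulation_indices` is a hypothetical name, and the per-band weighting details are omitted.

```python
import numpy as np

def sti_from_modulation_indices(m):
    """Map modulation indices m (0 < m < 1) to an STI value.

    Standard MTF-based recipe: compute an apparent SNR from each
    modulation index, clip it to +/-15 dB, normalize each value to a
    transmission index in [0, 1], and average.
    """
    m = np.clip(np.asarray(m, dtype=float), 1e-6, 1 - 1e-6)
    snr_apparent = 10.0 * np.log10(m / (1.0 - m))   # apparent SNR in dB
    snr_clipped = np.clip(snr_apparent, -15.0, 15.0)
    ti = (snr_clipped + 15.0) / 30.0                # transmission index in [0, 1]
    return float(np.mean(ti))

# Well-preserved modulation (m close to 1) yields an STI close to 1
print(sti_from_modulation_indices([0.99, 0.95, 0.9]))
```

A modulation index of 0.5 corresponds to 0 dB apparent SNR and thus a transmission index of exactly 0.5, which is a convenient sanity check for the mapping.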
In this paper, we investigate eight objective measures for predicting the intelligibility of noisy Japanese speech signals before and after noise-reduction processing. The Japanese speech signals were first corrupted by three types of noise at two signal-to-noise ratios and then processed by four classes of noise-reduction algorithms; their intelligibility was subsequently predicted by the objective measures...
Concatenative speech synthesis (CSS) provides the greatest naturalness. However, it requires a huge stored database, resulting in a large footprint. Reducing the size of the stored database while preserving the quality of CSS, i.e. improving the quality-to-size ratio (QSr), is still a challenge. In this paper, we propose a method of transforming fundamental frequency (F0) contours of lexical tones, developed...
In this paper, the performance of eight state-of-the-art objective measures is evaluated in terms of predicting the speech intelligibility in Mandarin of signals processed by noise-reduction algorithms. The speech signals were first corrupted by three types of noise at two signal-to-noise ratios and subsequently processed by four classes of noise-reduction algorithms, followed by objective intelligibility...
This paper proposes a three-layer model for estimating the emotions expressed in a speech signal based on a dimensional approach. Most of the previous studies using the dimensional approach focused mainly on the direct relationship between acoustic features and emotion dimensions (valence, activation, and dominance). However, the acoustic features that correlate with the valence dimension are fewer,...
The quality of unit-based concatenative speech synthesis is low, while that of corpus-based concatenative speech synthesis with unit selection is highly natural. However, unit selection requires a huge database for concatenation, which reduces the range of its applications. In this paper, by using temporal decomposition to model intra-syllable and inter-syllable contextual effects, we propose a context-fitting...
To construct a front-end for ASR systems using a small-scale microphone array in real environments, robustness against sudden unstable noises, multiple noise sources, and near-field sound sources is required. This paper proposes a front-end method for enhancing target signals that subtracts estimated noise from the noisy signal in each sub-band using paired microphones. The proposed method assumes one integrated...
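The subtraction step described above can be illustrated with a generic single-channel magnitude spectral subtraction, sketched below. This is not the paper's paired-microphone sub-band estimator; `spectral_subtract` and the `floor` parameter are hypothetical names, and the noise estimate is assumed to be given rather than derived from a microphone pair.

```python
import numpy as np

def spectral_subtract(noisy, noise_est, floor=0.01):
    """Generic magnitude spectral subtraction in the frequency domain.

    Subtracts the estimated noise magnitude spectrum from the noisy
    magnitude spectrum, applies a spectral floor to limit musical
    noise, and resynthesizes using the noisy-signal phase.
    """
    Y = np.fft.rfft(noisy)
    N = np.fft.rfft(noise_est)
    mag = np.abs(Y) - np.abs(N)                     # magnitude subtraction
    mag = np.maximum(mag, floor * np.abs(Y))        # spectral floor
    return np.fft.irfft(mag * np.exp(1j * np.angle(Y)), n=len(noisy))
```

In a sub-band variant such as the one the abstract describes, the same subtract-and-floor operation would be applied per sub-band, with the noise spectrum estimated from the paired microphones instead of supplied directly.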