This study compares the perceptual performance of the Mandarin basic vowels “e” (/ɤ/) and “u” (/u/) in different contexts (independent and contextual). The results indicate that, in both identification and discrimination tests, perception of the target vowel is influenced contrastively by the adjacent vowel context. Moreover, in a context of higher F1 and F2, listeners found it more difficult to...
In spoken Mandarin, some consonant and vowel pairs are difficult to distinguish and pronounce clearly, even for some native speakers. This study investigates, from a signal processing point of view, the signal distance between paired consonants in order to reveal the correlation between signal distance and consonant pronunciation. Several popular objective speech quality measures are innovatively...
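The abstract does not say which objective speech quality measures were used. As one common example of a spectral distance between two signals, the log-spectral distance (LSD) compares two power spectra bin by bin in decibels and takes the root mean square. The sketch below assumes each consonant segment has already been converted to a power spectrum of equal length; the function name and the toy spectra are hypothetical.

```python
import math
from typing import Sequence


def log_spectral_distance(p_ref: Sequence[float], p_test: Sequence[float], eps: float = 1e-12) -> float:
    """Root-mean-square difference, in dB, between two power spectra of equal length.

    eps guards against log of zero in silent bins.
    """
    if len(p_ref) != len(p_test) or not p_ref:
        raise ValueError("spectra must be non-empty and of equal length")
    squared_db = [(10.0 * math.log10((a + eps) / (b + eps))) ** 2 for a, b in zip(p_ref, p_test)]
    return math.sqrt(sum(squared_db) / len(squared_db))


# Hypothetical 4-bin power spectra for two consonant segments (illustrative values only)
consonant_a = [1.0, 4.0, 9.0, 2.0]
consonant_b = [1.0, 2.0, 9.0, 4.0]
print(round(log_spectral_distance(consonant_a, consonant_b), 3))
```

A larger LSD means the two spectra differ more; identical spectra give 0.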
In the process of learning Chinese as a second language (CSL), native Japanese speakers have difficulty with tone perception. Among the four Chinese lexical tones, the tone pairs Tone 1-Tone 2 and Tone 1-Tone 4 are problematic for Japanese CSL beginners. To help them efficiently develop the ability to discriminate these tone pairs, we designed a hybrid perceptual training scheme that combined adaptive...
We proposed an auxiliary categorization framework for training speech synthesis systems using deep neural networks (DNNs) and recurrent neural networks (RNNs). The adopted artificial neural networks (ANNs) are regression models comprising a few hidden layers and an affine-transform layer for transforming the contextual features into a set of speech synthesis parameters. In order to incorporate categorization...
This study aims to find out how gender affects prosodic entrainment in Mandarin conversation. Based on analyses of the Tongji Games Corpus, it was found that in Mandarin conversations, mixed-gender groups entrain on the greatest number of features and males entrain on the fewest. A cross-linguistic comparison between Mandarin Chinese and English finds striking similarities in the number of prosodic...
This study compared the Mandarin neutral tone (T0) produced by native speakers and by Cantonese L2 learners, using both acoustic analysis and a perceptual experiment. T0 syllables following each of the four tones were investigated in three word contexts (i.e., isolated, non-focused, and on-focus). The perceptual experiment showed that T0 produced by the L2 group received a lower acceptance rate than that produced by the L1...
The present study investigated the effects of syllable structure and prosodic strengthening on consonant production in SHC, which has a three-way contrast among aspirated, unaspirated, and breathy (voiced) stops. Clearly, the two factors operate through different mechanisms: a glottal coda shortened the VOT, whereas focus lengthened the VOT of aspirated and breathy stops, although both increased intensity. While the...
In this paper, a hidden Markov model (HMM)-based cue parameter estimation method for single-channel speech enhancement is proposed, in which the cue parameters of binaural cue coding (BCC) are successfully applied to a single-channel speech enhancement system. First, the clean speech and noise signals are treated as the left and right channels of a stereo signal, respectively, and the noisy speech...
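In BCC, one of the standard cues is the inter-channel level difference (ICLD): the per-subband power ratio between two channels, expressed in dB. If clean speech is the left channel and noise the right channel, as the abstract above describes, the ICLD of a subband is simply its speech-to-noise ratio in dB. The sketch below computes ICLD from oracle subband powers and then, as a hypothetical mapping not taken from the paper, converts it to a Wiener-style gain. The paper's HMM-based estimation of the cues from noisy speech is not shown, and all names and values are illustrative.

```python
import math
from typing import List, Sequence


def icld_db(p_left: Sequence[float], p_right: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Inter-channel level difference per subband, in dB (left power over right power)."""
    return [10.0 * math.log10((l + eps) / (r + eps)) for l, r in zip(p_left, p_right)]


def gain_from_icld(icld: Sequence[float]) -> List[float]:
    """Hypothetical mapping: read ICLD as a per-band SNR and apply a Wiener-style gain."""
    gains = []
    for d in icld:
        snr = 10.0 ** (d / 10.0)        # speech-to-noise power ratio in this band
        gains.append(snr / (1.0 + snr))  # Wiener gain, always between 0 and 1
    return gains


speech_power = [4.0, 1.0, 0.25]  # hypothetical clean-speech subband powers ("left" channel)
noise_power = [1.0, 1.0, 1.0]    # hypothetical noise subband powers ("right" channel)
cues = icld_db(speech_power, noise_power)
print([round(c, 2) for c in cues])                  # [6.02, 0.0, -6.02]
print([round(g, 3) for g in gain_from_icld(cues)])  # [0.8, 0.5, 0.2]
```

Bands where speech dominates (positive ICLD) are kept almost intact, while noise-dominated bands are attenuated.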
Aphasia is a type of acquired language impairment caused by brain injury. This paper presents an automatic speech recognition (ASR)-based approach to the objective assessment of patients with aphasia. A dedicated ASR system is developed to facilitate acoustic and linguistic analysis of Cantonese aphasic speech. The acoustic models and the language models are trained with domain- and style-matched speech...
The impulse-sequence representation of the excitation source information in normal speech signals has been explored for speech coding. If such a representation could be developed for paralinguistic and emotional speech sounds, it would aid their acoustic analysis. This paper proposes a sparse representation of the excitation source characteristics of non-normal speech signals, in terms of a time-domain...
In this paper, we propose a cluster-based senone selection method to speed up the computation of deep neural networks (DNNs) during decoding in speech recognition. In DNN-based acoustic models, the large number of senones at the output layer is one of the main causes of the high computational complexity of DNNs. Inspired by the mixture selection method designed for the Gaussian mixture...
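The abstract is cut off before the method details, so the following is only a simplified illustration of the general idea of output-layer senone selection, not the authors' algorithm. It assumes that senones have been grouped offline into clusters, each with a centroid in the space of the last hidden layer; at decode time, only the output rows belonging to the nearest cluster's senones are evaluated. All names and the toy parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def nearest_cluster(h: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance to h."""
    dists = [sum((x - c) ** 2 for x, c in zip(h, cen)) for cen in centroids]
    return min(range(len(dists)), key=dists.__getitem__)


def selected_senone_scores(
    h: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    centroids: Sequence[Sequence[float]],
    cluster_senones: Sequence[List[int]],
) -> Dict[int, float]:
    """Evaluate output-layer rows only for senones attached to the nearest cluster.

    Returns log-softmax scores normalized over that subset; all other senones are skipped.
    """
    active = cluster_senones[nearest_cluster(h, centroids)]
    logits = {s: dot(weights[s], h) + biases[s] for s in active}
    m = max(logits.values())  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {s: v - log_z for s, v in logits.items()}


# Toy example: 2-dim last hidden layer, 4 senones, 2 clusters (all values hypothetical)
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
biases = [0.0, 0.0, 0.0, 0.0]
centroids = [[1.0, 1.0], [-1.0, -1.0]]
cluster_senones = [[0, 1], [2, 3]]
scores = selected_senone_scores([0.9, 0.8], weights, biases, centroids, cluster_senones)
print(sorted(scores))  # [0, 1]: only the first cluster's senones were evaluated
```

Normalizing the softmax over the selected subset only is a simplification chosen for this sketch; how skipped senones should be scored is a separate design decision it does not settle.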
Relatively little research has addressed the role of L1 in the perception of English speech contrasts by Chinese learners of English as L3. The present study investigates the role of L1 in the perception of the English alveolar-velar nasal coda contrast (/n/ vs. /ŋ/) after the vowels /i ʌ æ/ by bilingual Changsha Chinese speakers, whose L1 is Changsha Chinese and L2 is Standard Mandarin. Changsha...
This study explored word-level prosodic strength in Mandarin Chinese as reflected by tone reduction on the second syllable of Tone4+Tone4 words, using the slope difference between the two consecutive tones as an indicator of tonal reduction. It was found, first, that the occurrence of tonal reduction depends on the internal structure of the word: words formed by apposition, (pseudo-)suffixation...
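The abstract does not give the exact definition of the slope difference. A minimal sketch, assuming each syllable's F0 contour is sampled at equally spaced points (in semitones) over a normalized duration, fits a least-squares line to each syllable and takes the second slope minus the first. Under this assumed convention, a less steeply falling second Tone 4 yields a larger positive difference. The function names and sample contours are hypothetical.

```python
from typing import Sequence


def ls_slope(values: Sequence[float]) -> float:
    """Least-squares slope of values sampled at equally spaced times on [0, 1]."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    ts = [i / (n - 1) for i in range(n)]  # normalized time axis
    t_mean = sum(ts) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, values))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den


def slope_difference(f0_syll1: Sequence[float], f0_syll2: Sequence[float]) -> float:
    """Assumed convention: slope of the second syllable minus slope of the first."""
    return ls_slope(f0_syll2) - ls_slope(f0_syll1)


# Hypothetical F0 contours in semitones for a Tone4+Tone4 word
first_t4 = [10.0, 7.5, 5.0, 2.5, 0.0]  # falls 10 st over the syllable
second_t4 = [4.0, 3.0, 2.0, 1.0, 0.0]  # falls only 4 st: a flatter, possibly reduced fall
print(slope_difference(first_t4, second_t4))  # 6.0
```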
This paper proposes a robust front-end for speech applications based on a scheme for restoring instantaneous amplitude and phase. Typical applications such as hearing aids and automatic speech recognition systems still face challenges in robustness to noise and reverberation. The proposed front-end employs a combination of our previously proposed method for restoring instantaneous...
In this paper, we describe our practical efforts to apply speech emotion recognition (SER) in customer care scenarios. We systematically analyze the challenges we observe in our data, which differ greatly from acted speech emotion databases. Our contributions are twofold. First, we propose a two-level framework to measure customer satisfaction scores at the conversation level....
Recurrent neural networks (RNNs) with a gating mechanism, such as the gated recurrent unit (GRU), long short-term memory (LSTM), and long short-term memory projected (LSTMP), have been shown to give state-of-the-art performance in acoustic modeling. However, little is known about why these gated RNNs work and how these networks differ. Based on a series of experimental comparisons and analyses,...
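For reference, one common formulation of a GRU time step is sketched below in plain Python: an update gate z and a reset gate r (both sigmoids), a tanh candidate state computed from the reset-scaled previous state, and the interpolation h_t = (1 - z) * h_{t-1} + z * h̃_t. Some papers swap the roles of z and 1 - z. The parameter names (W_z, U_z, b_z, and so on) are this sketch's own, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: Sequence[float], h: Sequence[float], p: Dict[str, list]) -> Vector:
    """One GRU time step; p holds input weights W_*, recurrent weights U_* and biases b_*."""

    def affine(w_key: str, u_key: str, b_key: str, hh: Sequence[float]) -> Vector:
        # W x + U hh + b, element by element
        return [a + c + d for a, c, d in zip(matvec(p[w_key], x), matvec(p[u_key], hh), p[b_key])]

    z = [sigmoid(v) for v in affine("W_z", "U_z", "b_z", h)]  # update gate
    r = [sigmoid(v) for v in affine("W_r", "U_r", "b_r", h)]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]                    # reset applied to previous state
    h_tilde = [math.tanh(v) for v in affine("W_h", "U_h", "b_h", rh)]  # candidate state
    # interpolate between the previous state and the candidate state
    return [(1.0 - zi) * hi + zi * hc for zi, hi, hc in zip(z, h, h_tilde)]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


# Hypothetical sizes: 1-dim input, 2-dim hidden state; all-zero parameters for a checkable result
params = {
    "W_z": zeros(2, 1), "U_z": zeros(2, 2), "b_z": [0.0, 0.0],
    "W_r": zeros(2, 1), "U_r": zeros(2, 2), "b_r": [0.0, 0.0],
    "W_h": zeros(2, 1), "U_h": zeros(2, 2), "b_h": [0.0, 0.0],
}
print(gru_step([1.0], [0.4, -0.2], params))  # both gates are 0.5 and the candidate is 0, so h is halved
```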
This paper proposes a novel speech denoising method based on tensor filtering, in which the microphone-array speech signal is represented as tensor data and processed by a tensor filtering model. The multi-microphone signal is represented in a third-order tensor space with channel, time, and frequency modes. Noise can be reduced by finding a lower-rank approximation of the third-order tensor with...
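The abstract is truncated before it names the decomposition used. As a generic illustration only, the sketch below computes a rank-1 CP approximation λ·a∘b∘c of a third-order tensor with the higher-order power method (HOPM), assuming the tensor is stored as nested lists T[channel][time][frequency]. It shows the low-rank approximation step in isolation, not the authors' tensor filter; all names are hypothetical.

```python
import math
from typing import List, Tuple

Tensor3 = List[List[List[float]]]  # assumed layout: T[channel][time][frequency]


def _normalize(v: List[float]) -> Tuple[List[float], float]:
    """Return v scaled to unit length, plus its original norm (zero vectors are left as-is)."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        return v, 0.0
    return [x / n for x in v], n


def rank1_hopm(T: Tensor3, iters: int = 50):
    """Rank-1 CP approximation lam * a o b o c via the higher-order power method."""
    I, J, K = len(T), len(T[0]), len(T[0][0])
    b, _ = _normalize([1.0] * J)  # simple all-ones initialisation for the time factor
    c, _ = _normalize([1.0] * K)  # and for the frequency factor
    a, lam = [0.0] * I, 0.0
    for _ in range(iters):
        # contract T with the other two factors, then renormalise, one mode at a time
        a, _ = _normalize([sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K)) for i in range(I)])
        b, _ = _normalize([sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K)) for j in range(J)])
        c, lam = _normalize([sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J)) for k in range(K)])
    return lam, a, b, c


def reconstruct(lam: float, a: List[float], b: List[float], c: List[float]) -> Tensor3:
    return [[[lam * ai * bj * ck for ck in c] for bj in b] for ai in a]


# Hypothetical 2-channel x 3-frame x 2-bin tensor that is exactly rank 1
a0, b0, c0 = [1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0]
T = [[[ai * bj * ck for ck in c0] for bj in b0] for ai in a0]
lam, a, b, c = rank1_hopm(T)
approx = reconstruct(lam, a, b, c)
```

For a noisy tensor, the residual T minus the low-rank approximation is discarded; the premise is that the clean multichannel speech is close to low-rank while much of the noise is not.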
Conference proceedings front matter may contain various advertisements, welcome messages, committee or program information, and other miscellaneous conference information. This may in some cases also include the cover art, table of contents, copyright statements, title-page or half title-pages, blank pages, venue maps or other general information relating to the conference that was part of the original...
In this paper, we propose adapting a recurrent neural network (RNN)-based language model to improve the performance of multi-accent Mandarin speech recognition. N-gram language models have already been applied in speech recognition systems, but they have difficulty capturing long-span information in a sentence and suffer from severe data sparsity. In contrast, an RNN-based language model can overcome...
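The data-sparsity problem mentioned above can be seen in a toy unsmoothed bigram model estimated by maximum likelihood: any word pair absent from the training text receives probability zero. The corpus below consists of hypothetical pinyin tokens, practical n-gram systems apply smoothing, and the sketch does not implement the RNN language model.

```python
from collections import Counter
from typing import Callable, List


def bigram_mle(sentences: List[List[str]]) -> Callable[[str, str], float]:
    """Unsmoothed maximum-likelihood bigram model P(word | prev)."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for s in sentences:
        tokens = ["<s>"] + s + ["</s>"]
        context_counts.update(tokens[:-1])               # each token counted as a left context
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev: str, word: str) -> float:
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


corpus = [["wo", "xiang", "he", "cha"], ["wo", "xiang", "chi", "fan"]]  # hypothetical toy corpus
p = bigram_mle(corpus)
print(p("xiang", "he"))  # 0.5
print(p("he", "fan"))    # 0.0: the pair never occurs, so the unsmoothed model rules it out
```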