Speech has recently been recognized as an attractive method for the measurement of cognitive load. Current speech-based cognitive load measurement systems utilize acoustic features derived from auditory-motivated frequency scales. This paper aims to investigate the distribution of speech information specific to cognitive load discrimination as a function of frequency. We found that this distribution...
This paper focuses on tone classification for Vietnamese speech. Traditionally, tone has been classified or recognized using the fundamental frequency F0. However, our experimental results indicate that, along with the fundamental frequency, Mel Frequency Cepstrum Coefficients and frequency modulation also carry a significant amount of tone information in Vietnamese speech. Therefore, the proposed...
In this paper we propose a novel language identification system which utilizes fused phonotactic information. The phase spectrum of speech signals is used with the magnitude spectrum in order to obtain a more robust feature representation. Parallel Broad Phoneclass Recognition followed by Language Model (PBPRLM) is used in order to remove the bias of the likelihood scores introduced by the size inequality...
This paper presents two novel contributions to automatic language identification. The first is the use of the modified multi-layer Kohonen self-organizing feature map (MLKSFM) as a pre-classifier for language identification (LID). The second is the novel application of empirical mode decomposition (EMD) to generate features for the LID pre-classification task. The use of instantaneous...
Our previous research indicates that the multi-layer Kohonen self-organizing feature map (MLKSFM) gives promising performance for spoken language identification (LID). In this paper, we enhance this approach in two distinct ways. Firstly, by considering the phase information, we propose a new type of feature vector which combines the modified group delay function (MODGDF) and the traditional MFCC...
This paper describes a novel method for tonal and non-tonal language classification using prosodic information. Normalized feature parameters that measure the speed and level of pitch change are used to perform the classification task. To demonstrate the effectiveness of the proposed method, the classification rates of different system configurations are compared. Evaluating a 16-language classification...
Efficient road traffic incident management (TIM) in metropolitan areas is crucial for smooth traffic flow and for the mobility and safety of the community. TIM requires fast and accurate collection and retrieval of critical data, such as incident conditions and contact information for the intervention crew, public safety organisations and other resources. Access to critical data by traffic control operators...
This paper describes a novel, noise-robust front-end that employs the Hough transform for simultaneous frequency and temporal masking, together with cumulative distribution mapping of cepstral coefficients, for noisy speech recognition. Recognition experiments on the Aurora II connected digits database have revealed that the proposed front-end achieves an average digit recognition accuracy...
Previous research indicates that automatic language identification systems based on phonotactic information produce the best results compared with other systems based on acoustic or prosodic information. This paper investigates two different approaches that use phonotactic information: parallel phoneme recognition followed by language modeling (PPRLM) and multi-lingual PRLM. In the PPRLM approach,...
Efficient road traffic incident management in metropolitan areas is crucial for smooth traffic flow and for the mobility and safety of the community. Traffic incident management requires fast and accurate collection and retrieval of critical data, such as incident conditions and contact information for the intervention crew, public safety organisations and other resources. Access to critical data by...
Robustness in the presence of various types and levels of environmental noise remains an important issue for automatic speech recognition (ASR) systems. This paper describes a new noise-robust ASR front-end that employs a functional model of forward temporal masking combined with cumulative distribution mapping based on MFCCs with c0. Recognition experiments on the Aurora II connected digits database...