This paper presents a bilingual Thai-English text-to-speech synthesis (TTS) system on Android mobile devices. The system deploys a Thai text processor and a well-known open-source English text processor, which can analyze English text with high intelligibility. With hidden Markov model (HMM) based speech units and audio streaming optimization, it can synthesize highly smoothed sounds at a fast response...
Part-of-speech (POS) has been widely used as the main feature for predicting phrase breaks in text-to-speech synthesis (TTS) systems. However, POS does not clearly represent syntactic information that is necessary for analyzing the grammatical tree structure of a language to assign phrase breaks. Instead of using POS, this paper proposes to use categorial grammar (CG), which embeds fine syntactic...
This paper introduces the use of automatic speech recognition (ASR) in speech intelligibility testing for patients who have undergone oral surgery due to cancer, tumor, or fracture. The proposed automatic system aims to help reduce the effort and cost required by the conventional human listening test. We developed a phone-based ASR system which outputs the best matched word out of a set of target words given a patient's...
This paper presents a new technique for smoothing and reducing speech feature vectors for speaker recognition using an adaptive weighted-sum algorithm, aiming to reduce computation time and increase recognition performance. The proposed technique is based on a three-frame sliding window. At each step of the sliding window, the three feature frames in the window are used to compute weight values based on...
We believe that a benchmark evaluation is one of the key factors that help accelerate research and development of a Thai speech recognition system as various algorithms and training techniques can be systematically compared. In this paper, we are interested in benchmarking a general-domain Thai Large Vocabulary Continuous Speech Recognition (LVCSR) system using the LOTUS speech corpus. We conducted...
This paper outlines the first Asian network-based speech-to-speech translation system developed by the Asian Speech Translation Advanced Research (A-STAR) consortium. The system was designed to translate common spoken utterances of travel conversations from a certain source language into multiple target languages in order to facilitate multiparty travel conversations between people speaking different...
This is a non-technical paper describing how and why we organized BEST 2009, the first contest in the series of "benchmark for enhancing the standard of Thai language processing", which is expected to help accelerate the progress of natural language processing technology in Thailand by assembling 3 essential components: common standards, resources and researchers. The BEST 2009: Thai...
This paper describes the design and construction of the LOTUS-BN corpus, a Thai television broadcast news corpus. In addition to audio recordings and their transcription, this corpus also includes a detailed annotation of many interesting characteristics of broadcast news data such as acoustic condition, overlapping speech, news topic and named entity. The LOTUS-BN is still an ongoing project with...
This paper presents a front-end process for converting esophageal speech features into normal speech features in order to improve the recognition rate of esophageal speech in a speech recognition system that is trained on a normal speech corpus and based on Hidden Markov Models (HMMs). A system that combines a feature conversion technique and a cepstral normalization technique in order to prevent variation bias...
This paper presents a monosyllabic Thai tone recognition system, which is based on the Ant-Miner algorithm. The system is composed of three main processes, including fundamental frequency (F0) extraction from the input speech signal and analysis of the F0 contour for feature extraction. In the F0 feature extraction, polynomial regression functions are employed to fit the segmented F0 curve where its coefficients are...
This article explains the history of Thai language development for computers, examining such factors as the language, script, and writing system, among others. The article also analyzes characteristics of Thai characters and I/O methods, and addresses key issues involved in Thai text processing. Finally, the article reports on language processing research and provides detailed information on Thai...
In segment-based speech recognition systems, the quality of the segmentation step is a major factor that strongly affects their accuracy. This paper proposes methods to reduce missing segments caused by boundary insertion errors in segment graphs, which, in the case of Thai, could be generated from a probabilistic segmentation with limited speech resources. Acoustic discontinuities and manners of articulation...
This article summarizes recent work on two advanced user interfaces for traffic information systems, which are expected to support the growing number of users who access the information daily. Natural language processing technology has been applied in two ways. First, automatic speech recognition and text-to-speech synthesis are integrated in a telephone call center in order to automatically retrieve...
This paper proposes a novel approach called noise-cluster HMM interpolation for robust speech recognition. The approach helps alleviate the problem of speech recognition in noisy environments that the system was not trained on. In this method, a new HMM is interpolated from existing noisy-speech HMMs that are best matched to the input speech. This process is performed on-the-fly with an acceptable delay...
This paper presents the naturalness improvement in Thai unit-selection text-to-speech synthesis (TTS) by automatic weighting of targeted cost. The intuition behind the proposed method is that the sensitivity of human perception may vary across different phonemic and prosodic units. In this work, the unit-selection targeted-cost of each phoneme unit is weighted differently according to its duration statistic...
We propose and implement a low-cost Thai voice gateway that combines current technology in network systems and telephony. It enhances traditional telephony-based applications with access to resources on the Web. The system is based on open standards for speech technology and existing open source software. It supports the VoiceXML markup language for voice dialogs, the MRCP protocol for communication...
This paper proposes the use of tree-structured model selection and simulated data in maximum likelihood linear regression (MLLR) adaptation for environment- and speaker-robust speech recognition. The objective of this work is to solve major problems in robust speech recognition systems, namely unknown speakers and unknown environmental noise. The proposed solution is composed of two components. The first...
This paper proposes a technique of speech segmentation using zero-crossing (ZC) with the average absolute amplitude to classify speech and non-speech by analyzing silence (sil) position and end point detection. To automatically segment a speech file, we determine the silence positions and speech positions to specify the points of sectioning. Thereafter, we compare the length of each sectioning file with...
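As a rough illustration of the idea in the last abstract, the sketch below labels fixed-length frames as speech or non-speech from their zero-crossing count and average absolute amplitude, then groups consecutive speech frames into segments. It is not the authors' method: the frame length, both thresholds, the decision rule, and the function names `frame_features`, `classify_frames` and `speech_segments` are all assumptions made for this example.

```python
def frame_features(samples, frame_len):
    """Return (zero_crossing_count, mean_abs_amplitude) for each non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # A zero crossing is a sign change between neighbouring samples.
        zc = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        maa = sum(abs(x) for x in frame) / frame_len
        feats.append((zc, maa))
    return feats


def classify_frames(samples, frame_len=160, amp_threshold=0.02, zc_threshold=40):
    """Label each frame True (speech) or False (non-speech).

    Assumed rule: a frame is speech if it is loud enough, or if it is
    moderately loud and has many zero crossings (e.g. unvoiced sounds).
    """
    labels = []
    for zc, maa in frame_features(samples, frame_len):
        loud = maa >= amp_threshold
        noisy_but_audible = zc >= zc_threshold and maa >= amp_threshold / 2
        labels.append(loud or noisy_but_audible)
    return labels


def speech_segments(labels):
    """Group consecutive speech frames into (start_frame, end_frame) pairs, end exclusive."""
    segments, start = [], None
    for i, is_speech in enumerate(labels):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments


if __name__ == "__main__":
    # Synthetic signal: one silent frame, one loud alternating frame, one silent frame.
    silence = [0.0] * 160
    burst = [0.5 if i % 2 == 0 else -0.5 for i in range(160)]
    labels = classify_frames(silence + burst + silence)
    print(labels)
    print(speech_segments(labels))
```

In practice the thresholds would need tuning against the recording conditions, and the silent runs between segments would serve as candidate cut points.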