This paper presents an improvement of a distributed Thai speech recognizer (SR). Two main objectives of the improvement are investigated: 1) the response time in terms of a real-time factor (RTF), and 2) cloud computing deployment. The proposed framework adapts and migrates the baseline collaborative DSR system to the Docker platform. Multiple containers share system resources such as CPU, memory,...
To build conversational robots, roboticists are required to have deep knowledge of both robotics and spoken dialogue systems. Unlike stand-alone speech recognition/synthesis toolkits, a cloud robotics platform for human-robot communication enables high-quality speech recognition and synthesis that is optimized for human-robot interactions. This is challenging because we need to build a wide...
This paper presents an improvement of a distributed Thai speech recognizer, aiming to enhance system response time, as measured by a real-time factor (RTF), for a better user experience. The system is designed around a collaborative multi-agent and task-worker concept. A Streaming Agent is introduced to manage speech signal transfer while a Recognition Agent is applied to manage speech recognition...
TREN (Turkish Recognition ENgine) is a modular, HMM-based (Hidden Markov Model), speaker-independent speech recognition system whose software architecture is based on the Distributed Component Object Model (DCOM). TREN contains specialized modules that allow a fully interoperable platform including a Turkish speech recognizer, feature extractor, end-point detector and a performance monitoring...
We have developed a multi-user large-vocabulary speech recognition system employing a fully composed one-level weighted finite state transducer (WFST) based network on a Graphics Processing Unit (GPU). This system improves the overall throughput and latency of a speech recognition engine that processes multiple users' utterances at the same time with efficient scheduling, parameter sharing, and communication...
This paper describes our efforts in developing a smart home environment for assistive living. The key element of the smart environment is a ubiquitous voice user interface with several additional capabilities (such as the recognition of several gestures). This work is a further development of voice-controlled devices. The presence of commercial speech recognition engines and our experience...
The constant improvement of both hardware and software related to mobile computing is enhancing the capabilities of mobile devices. Present-day mobile phones can run rich stand-alone applications as well as distributed client-server applications that access information via a web gateway. This changed environment brings new opportunities as well as constraints for mobile application developers...
This paper develops a speech-controlled interface with cloud computing technology for a vehicle on-board diagnostic (OBD) system. The proposed vehicle OBD system consists of two parts: an OBD embedded global positioning system (GPS-OBD) module and a vehicle surveillance server. The speech recognition task is performed in the vehicle surveillance server instead of the GPS-OBD module. The speech signal...
This paper develops a speech-controlled interface for a vehicle on-board diagnostic (OBD) system. The proposed vehicle OBD system contains three parts: an OBD embedded global positioning system (GPS-OBD) module, a speech-controlled interface, and a vehicle surveillance server. The GPS-OBD module is designed to monitor the real-time location as well as the operation information of the vehicle. The real-time location...
In this paper, a real-time speech-to-speech translation (S2ST) system for a mobile environment is designed and implemented as a client-server architecture. In particular, we apply cross-lingual speaker adaptation to adapt the synthesized speech to the enrolled speaker, ensuring personalization. This real-time S2ST system provides streaming, multi-threaded and speaker-adapted speech-to-speech translation...
One of the important challenges in today's contact center solutions is to provide service to the customer in a cost-effective manner without disregarding the customer. This paper describes the implementation of a Personalized IVR system in a Contact Center. Personalized IVRs are used to provide self-service to the customer, thereby reducing the burden on the customer care representatives, also called...
This paper outlines the first Asian network-based speech-to-speech translation system developed by the Asian Speech Translation Advanced Research (A-STAR) consortium. The system was designed to translate common spoken utterances of travel conversations from a certain source language into multiple target languages in order to facilitate multiparty travel conversations between people speaking different...
With the wide application of digital multimedia technology, G.729 has become one of the most popular audio standards. In this paper, the TMS320DM6446, a DaVinci-based multi-core processor, is chosen as our platform. We present our implementation of the G.729 algorithm, which is compatible with the eXpressDSP algorithm interface standard - digital media (xDAIS-DM, also called "xDM")...
With the proliferation of Web 2.0 applications, collaborative learning has attracted a lot of attention due to its potential in the e-learning field. Forums, wikis and blogs, for example, are only some of the applications that exploit the collaborative nature of e-learning. However, these applications were originally designed for access from desktop systems, and access to them when on the move can prove...
Voice interaction is a long-sought mode of conversation between humans and computers. Based on the .NET platform, the techniques of data binding and text-to-speech are used to create voice homepages for online shopping web sites, and streaming media is used to transfer data, which makes it possible to browse text and play voice simultaneously, and even enables a blind person...
We propose and implement a low-cost Thai voice gateway that combines current technology in network systems and telephony. It enhances traditional telephony-based applications with access to resources on the Web. The system is based on open standards for speech technology and existing open source software. It supports the VoiceXML markup language for voice dialogs, the MRCP protocol for communication...