We have developed an application for making rap music videos with CG animation from a simple script. Aquestalk and TVML (TV program Making Language) are used for voice synthesis and real-time CG generation, respectively. A user can easily make a rap music video by writing speech text and character movements in the script, aligned with the music beat.
This paper describes SIIP (Speaker Identification Integrated Project), a high-performance, innovative and sustainable Speaker Identification (SID) solution running over a large database of voice samples. The solution is based on the development, integration and fusion of a series of speech analytic algorithms, which include speaker model recognition, gender identification, age identification, language and accent...
In this paper, we explore the possibility of using existing voice recognition tools to add a voice control interface to an existing smart home automation system. The choice of the voice recognition engine influences the architecture of the voice command interface and determines its performance. We discuss the possible architectures of voice-enabled smart home automation systems....
In corpus preparation, we perform part-of-speech (POS) tagging, adding POS information to the corpus in the form of tags. The POS information comprises tags such as noun, pronoun, verb, adjective, adverb, preposition, conjunction, etc. The literature shows a lack of corpora for the Santali language. In this paper we have created and described a Santali language corpus using Sketch Engine corpus...
This paper proposes an approach for automatic language identification (LID) for seven Indian languages. The proposed system uses language-dependent phonotactic features and prosodic information. A Phonetic Engine (PE), which serves as the front end of the phonotactic-based LID system, converts an input speech utterance to a sequence of phonetic symbols. Syllable boundaries are detected and phones within a...
This paper presents the work done in the context of the Speech2Process project on a Speech Dialogue System applied in call centers, specifically in the banking domain. In our proposed solution, the client communicates with the system in natural language sentences, which are automatically recognized and semantically analysed. The paper describes innovative features of the selected approach, which...
Machine Translation, sometimes referred to by the acronym MT, is an important field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its most basic level, MT performs simple substitution of atomic words in one natural language for words in another language. Around the world, numerous systems are...
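The word-for-word substitution described in the abstract above can be illustrated with a minimal sketch. The lexicon, the function name `translate_word_for_word`, and the English-to-Spanish pairing are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical toy English-to-Spanish lexicon, for illustration only.
LEXICON = {"the": "el", "cat": "gato", "eats": "come", "fish": "pescado"}


def translate_word_for_word(sentence, lexicon):
    """Replace each word via the lexicon; unknown words pass through unchanged."""
    words = sentence.lower().split()
    return " ".join(lexicon.get(word, word) for word in words)


print(translate_word_for_word("The cat eats fish", LEXICON))
```

Such a lookup ignores word order, agreement and ambiguity, which is why it represents only the most basic level of MT.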
The influence of the phoneme set on the recognition accuracy of Lithuanian speech commands is investigated. Four phoneme sets are discussed. The LIEPA speech corpus is used for training the acoustic model. The phonetic representation of the corpus transcriptions is generated by grapheme-to-phoneme transformation rules. Rule-based transformations for the Lithuanian language are proposed. Recognition engine with CMU Pocketsphinx...
In this paper, we propose a methodology for the fusion of different modes of speaker verification (SV) operation (fixed-passphrase, text-dependent and text-independent mode), using regression fusion models. The experimental results with and without spoofing attack conditions and using different single mode speaker verification engines, GMM-UBM, HMM-UBM and i-vector, indicated improvement in all the...
The main problem in communication is language bias between the communicators. This device can be used by people who do not know English and want it translated into their native language. The novel component of this research is the speech output, which is available in 53 different languages translated from English. This paper is based on a prototype which helps the user to hear the...
Keeping track of the multiple passwords, PINs, memorable dates and other authentication details needed to gain remote access to accounts is one of modern life's less appealing challenges. The employment of voice-based verification as a biometric technology for both children and adults could be a good replacement for the old-fashioned memory-dependent procedure. Using voice for authentication could be...
Robust distant speech recognition (DSR) is necessary in many speech technology applications using multiple microphones but has received only limited treatment in the literature. In this paper, we work on communicating with a vehicle voice-controlled system, which is one application of DSR. Two approaches for DSR are i) signal-level combination using beamforming followed by automatic speech recognition...
The creation of a robust system for speech recognition requires great effort and resources. One of the crucial elements in the creation of such systems is the phonetic dictionary. Creating the best possible dictionary requires expert phonologists and linguists. However, Arabic phonologists have two theories about Arabic phonemes. The first one says that vowels...
A Phonetic Engine (PE) is a system that is used to determine the sequence of phones in a spoken utterance. The International Phonetic Alphabet (IPA) is used to transcribe the speech database. This work focuses on developing a multilingual PE for four Indian languages, namely Bengali, Hindi, Urdu and Telugu. The number of languages can be extended arbitrarily. For developing the PE, read speech...
Deep neural network (DNN) based speech recognizers have recently replaced Gaussian Mixture Model (GMM) based systems as the state of the art. Developing a phonetic engine and enhancing its performance can lead to significant improvement in Automatic Speech Recognition (ASR). However, little work has been reported on developing a phonetic engine on a large-vocabulary Kannada speech corpus. In this...
This paper presents an improvement of a distributed Thai speech recognizer (SR). Two main objectives of the improvement are investigated: 1) the response time in terms of the real-time factor (RTF), and 2) cloud computing deployment. The proposed framework adapts and migrates the baseline collaborative DSR system to the Docker platform. Multiple containers share system resources such as CPU, memory,...
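The real-time factor named in the abstract above is conventionally defined as processing time divided by the duration of the audio processed. A minimal sketch of that ratio follows; the function name and the sample numbers are hypothetical and are not results from the paper.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration (the RTF)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical measurement: 2.5 s spent decoding a 10 s utterance.
rtf = real_time_factor(2.5, 10.0)
print(f"RTF = {rtf:.2f}")  # a value below 1.0 means faster than real time
```

An RTF below 1.0 indicates that the recognizer keeps up with incoming speech, which is the usual target for interactive use.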
Communication is a very natural characteristic of every creature. We sometimes use different symbols, or one of many established languages, to communicate with each other. Every language we use supports both oral and written communication. Writing symbols is a way to express our intentions using a physical medium. As we also have oral communication capability, which we could use exactly as we want to speak...
This paper presents a low-power acoustic signal processor for fully-implantable cochlear implants. The developed processor supports adaptive beamforming, frequency-domain analysis, envelope detection, channel combination, and magnitude compression. Power and area are minimized by leveraging dedicated real-valued FFT, register count minimization, data allocation optimization, hardware complexity reduction,...
To build conversational robots, roboticists are required to have deep knowledge of both robotics and spoken dialogue systems. Unlike stand-alone speech recognition/synthesis toolkits, a cloud robotics platform for human-robot communication enables high-quality speech recognition and synthesis that is optimized for human-robot interactions. This is challenging because we need to build a wide...
In mobile speech communications, noise interfering with speech at one end degrades the intelligibility of speech at the other end. In this paper, we focus on suppressing noise produced by vehicular and automobile mechanical engines, in whose presence the intelligibility of speech deteriorates when it is transmitted to the other end or recorded. We propose a method using...