This paper deals with the recognition of Bangla speech. The database used consists of two sets of data: a training set containing 3824 utterances of Bangla digit sequences from 25 male and 25 female speakers, and a test set containing 1985 utterances from 26 male and 26 female speakers. The test set is subdivided into four groups: clean1, clean2, clean3 and clean4...
In this paper, we propose a scheme for recognizing isolated spoken Arabic digits based on Discrete Wavelet Transform (DWT) features. The Discrete Wavelet Transform is a transformation that can be used to analyze the temporal and spectral properties of non-stationary signals such as audio, based on the time-frequency multi-resolution property of the wavelet transform. In this paper, the extracted wavelet...
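The abstract above does not specify the wavelet family or decomposition depth used; as a minimal sketch of the kind of transform involved, the following implements a single level of the Haar DWT, the simplest member of the family. The function name and padding behavior are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar Discrete Wavelet Transform.

    Returns (approximation, detail) coefficients, capturing the
    low-frequency and high-frequency content of the signal respectively.
    """
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:                      # zero-pad to an even length
        x = np.append(x, 0.0)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)  # low-pass (scaling) coefficients
    detail = (even - odd) / np.sqrt(2)  # high-pass (wavelet) coefficients
    return approx, detail

# A constant signal has all its energy in the approximation band.
a, d = haar_dwt([1.0, 1.0, 1.0, 1.0])
```

Applying the transform recursively to the approximation band yields the multi-resolution decomposition the abstract refers to.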
This paper presents an algorithm for automatically segmenting a subset of emphatic and non-emphatic sounds from continuously spoken Arabic speech. The main contribution of this paper is the generation of rules for automatic segmentation of these sounds, which can be extended to the rest of the Arabic sounds. In addition, the findings can be used for other speech analysis problems such as data training for...
This paper proposes an emotion recognition system that recognizes a person's emotional state from the speech signal. The aim of the proposed solution is to improve the interaction between humans and computers. The emotion recognition system must be capable of recognizing at least six basic emotions (happiness, anger, surprise, disgust, fear, sadness) as well as a neutral state. The proposed system...
This paper investigates the effect of topic dependent language models (TDLM) on phonetic spoken term detection (STD) using dynamic match lattice spotting (DMLS). Phonetic STD consists of two steps: indexing and search. The accuracy of indexing audio segments into phone sequences using phone recognition methods directly affects the accuracy of the final STD system. If the topic of a document is known,...
For spoken language processing applications such as speaker recognition/verification, silence segments not only contribute no speaker-specific information, but also dilute the information content already available in the speech segments of the audio data. It has been experimentally shown that removing silence segments with a voice activity detector (VAD) from the utterance...
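The abstract above does not say which VAD the study uses; as a hedged illustration of the general idea, the following sketches the simplest kind of energy-based VAD, dropping frames whose short-time energy falls below a fraction of the peak frame energy. The function name and threshold are assumptions for illustration only.

```python
import numpy as np

def remove_silence(samples, frame_len=256, threshold_ratio=0.1):
    """Drop frames whose short-time energy falls below a fraction of the
    utterance's peak frame energy (a simple energy-based VAD)."""
    x = np.asarray(samples, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)        # per-frame energy
    keep = energy > threshold_ratio * energy.max()
    return frames[keep].ravel()                 # concatenate voiced frames

# Silence (zeros) surrounding a burst of "speech" is removed.
signal = np.concatenate([np.zeros(512), np.ones(512), np.zeros(512)])
voiced = remove_silence(signal)
```

Practical VADs add smoothing and hangover logic so short pauses inside words are not clipped, but the energy-threshold core is the same.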
Sentence boundary detection (SBD), also known as sentence breaking, determines where a sentence begins and ends. This paper describes sentence boundary detection using acoustic and prosodic features for spontaneous Malay spoken audio. In our preliminary experiment on detecting sentence boundaries, we added volume change rate to seven prosodic features and rate-of-speech. Experiments...
The main goal of developing formal meeting-logging systems is to automate the whole process of transcribing participants' speech. In this paper we outline modern methods of audio and video signal processing and personification data analysis for multimodal speaker diarization. The proposed PARAD-R software for Russian speech analysis is implemented for audio speaker diarization and...
The use of biometric information for both person identification and security applications is widely known. Each person can be identified by the unique characteristics of one or more of their biometrics. One biometric characteristic by which a person can be identified is the voice. In this research, we are interested in studying the effect of proper features that are extracted from discrete...
This paper proposes tone model enhancement for low-complexity tone recognition. The tone model reduces the number of input frames by estimating fundamental frequency (F0) from only the estimated vowel signals, using the vowel magnitude difference function (vowel-MDF, VMDF). Accordingly, it reduces the negative influence on F0 of neighboring syllables in continuous speech. We enhance tone recognition accuracy...
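The VMDF itself is specific to the paper above, but it belongs to the family of magnitude-difference pitch estimators. As a hedged sketch of that family, the following implements the classic average magnitude difference function (AMDF): the lag that minimizes the mean absolute difference between a frame and its shifted copy corresponds to the fundamental period. All names and parameters here are illustrative assumptions.

```python
import numpy as np

def amdf_f0(frame, sample_rate, f0_min=80.0, f0_max=400.0):
    """Estimate F0 via the average magnitude difference function (AMDF):
    search lags over a plausible pitch range and pick the lag where the
    frame best matches its own shifted copy."""
    x = np.asarray(frame, dtype=float)
    lag_min = int(sample_rate / f0_max)   # shortest candidate period
    lag_max = int(sample_rate / f0_min)   # longest candidate period
    amdf = [np.mean(np.abs(x[:-lag] - x[lag:]))
            for lag in range(lag_min, lag_max + 1)]
    best_lag = lag_min + int(np.argmin(amdf))
    return sample_rate / best_lag

# A 200 Hz sine sampled at 8 kHz; the search range is restricted
# (f0_min=120) so the octave-down lag cannot be selected by mistake.
sr = 8000
t = np.arange(int(0.05 * sr)) / sr
f0 = amdf_f0(np.sin(2 * np.pi * 200 * t), sr, f0_min=120.0)
```

Restricting the lag range, as in the demo, is a standard guard against octave errors; the paper's VMDF additionally restricts the frames themselves to vowel regions.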
In this paper, a study of methods for multi-parameter objective evaluation of pronunciation is introduced. Sentence accuracy, emotional expression, volume matching degree, tone, speaking rate and rhythm are selected as the parameters for evaluating an English sentence. As a result of the evaluation, an objective rating of the input voice, as well as feedback, is presented to...
This work describes the development of a scheme for remotely retrieving spoken documents stored on a voice server. The spoken documents are recorded, indexed based on the frequency of occurrence of isolated keywords, and stored on the voice server. An isolated word recognizer (IWR) is developed for recognizing the identified keywords spoken in isolation. The IWR employs foreground...
The use of digital technology is growing at a very fast pace, which has led to the emergence of systems based on cognitive infocommunications. The expansion of this sector imposes the use of combined methods in order to ensure robustness in cognitive systems.
Emotion recognition from speech has evolved into one of the most significant research areas in the field of affective computing. In this paper, two emotional speech datasets have been analyzed based on gender distinction (male and female speech). This paper introduces a new approach to speech emotion recognition based on the AdaBoost classification algorithm. An artificial neural network has been...
Speech recognition is one of the promising technologies of the future. Voice user interfaces play an important role in many real-world applications. This paper presents speaker-independent isolated digit recognition for the Malayalam language and describes some application areas of digit recognition. Mel-Frequency Cepstral Coefficients (MFCC) are used as features and a Hidden Markov Model (HMM) is used as the...
Detecting emotional traits in call centre interactions can be beneficial to the quality management of the services provided, since this reveals the positioning of both speakers, i.e. satisfaction or frustration and anger on the customers' side, and stress detection, disappointment mitigation or failure to provide the requested service on the operators' side. This paper describes a machine learning...
In this paper, we present a model for Turkish speech recognition. The model is syllable-based, where the recognition is performed through syllables as speech recognition units. The main goal of the model is to recognize as much as possible of a given continuous speech by identifying only a small set of syllables in the language. For that purpose, only the syllable types with a higher frequency are...
Lip reading technologies play a great role not only in image pattern recognition, e.g. computer vision, but also in audio-visual pattern recognition, e.g. bimodal speech recognition. However, one problem is that the recognition accuracy is still significantly lower than that of speech recognition. Another problem is the performance degradation that occurs in real environments. To improve the...
The hidden Markov model is regarded as the most common and effective method used in speech recognition for all languages, including Vietnamese. However, this method is quite cumbersome and difficult to implement in many embedded systems that have limited resources. The Dynamic Time Warping (DTW) method, in contrast, has been studied extensively by many scientists and has proved to be simple and efficient for a relatively...
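The abstract above does not show the DTW recurrence; as a minimal sketch of the standard algorithm it refers to, the following computes the DTW distance between two 1-D sequences with the classic dynamic-programming recurrence. The absolute-difference local cost is an illustrative choice (real recognizers compare feature vectors such as MFCCs).

```python
import numpy as np

def dtw_distance(a, b):
    """Classic Dynamic Time Warping distance between two 1-D sequences,
    using |a[i] - b[j]| as the local cost."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)   # accumulated cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of insertion, deletion, and match moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A time-stretched copy of the same pattern aligns with zero cost.
d = dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 1, 2, 1, 0])
```

The O(nm) table and absence of training are what make DTW attractive for the resource-limited embedded systems the abstract mentions.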
In text-independent speaker verification, we compare two sets of sentences with different text content for their tonal similarity to determine whether they were produced by the same speaker. Since the sentences are different, we may not have matching words to compare. However, the sentences are constructed from the same set of phonemes of the language used, including vowels and consonants. Generally speaking,...