The aim of our proposed research is to identify the language of a spoken utterance using visual speech recognition and to include Marathi in a language identification (LID) system. In this paper we focus on identifying the first three digits in Marathi. First, the lips are extracted from video frames of face images, and then landmark points on the lips are detected. Then...
In this paper, a text-independent speaker recognition system based on Gaussian mixture models (GMM) was developed, with a specific focus on the use of a voice activity detector (VAD) algorithm in training and testing. At the training level, a modified expectation-maximization (EM) algorithm is used. It is less prone to getting trapped around a local maximum and therefore has a better chance to converge...
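The abstract does not say which VAD algorithm is used. As a minimal sketch of where a VAD fits into GMM training and testing, the code below assumes a simple short-time-energy detector: frames whose energy falls more than a fixed margin below the loudest frame are treated as non-speech and would be dropped before EM training or scoring. The function names, the 400/160-sample frame and hop sizes (25 ms and 10 ms at an assumed 16 kHz sampling rate) and the 30 dB margin are illustrative assumptions, not the paper's settings.

```python
import math
from typing import List, Sequence


def frame_energies_db(samples: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Short-time energy of each full frame, in dB (floored to avoid log(0))."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(energy + 1e-12))
    return energies


def energy_vad(samples: Sequence[float], frame_len: int = 400, hop: int = 160,
               margin_db: float = 30.0) -> List[bool]:
    """Hypothetical energy-based VAD: True marks a frame kept as speech.

    A frame counts as speech if its energy is within margin_db of the
    loudest frame in the utterance (an assumed, illustrative rule).
    """
    energies = frame_energies_db(samples, frame_len, hop)
    if not energies:
        return []
    peak = max(energies)
    return [e > peak - margin_db for e in energies]
```

In a GMM pipeline, the mask would be applied to the per-frame feature vectors (for example `[f for f, keep in zip(features, mask) if keep]`) so that silence does not pull mixture components toward background noise.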
The number of dementia patients has increased in recent years, and the burden on caregivers has increased with it. However, no established treatment for dementia exists. Controlling dementia progression is therefore an important goal of dementia care. The reminiscence method is one means of slowing dementia progression. In the reminiscence method, a caregiver talks with a dementia patient. However, an...
Recent studies have shown that phase information contains speaker-dependent characteristics and is effective for speaker recognition. In this paper, we summarize a robust phase feature extracted from the Fourier spectrum (including pitch-non-synchronized phase information and pseudo-pitch-synchronized phase information) and its application to speaker recognition for speech at different speaking rates and...
The accuracy of object recognition has improved greatly due to the rapid development of deep learning, but deep learning generally requires large amounts of training data, and the training process is slow and complex. We propose an incremental object recognition system based on deep learning techniques and speech recognition technology, with high learning speed and wide applicability. The system...
This paper describes SIIP (Speaker Identification Integrated Project), a high-performance, innovative and sustainable Speaker Identification (SID) solution that runs over a large database of voice samples. The solution is based on the development, integration and fusion of a series of speech analytics algorithms, which include speaker model recognition, gender identification, age identification, language and accent...
Effective presentation skills can help one succeed in business, careers and academia. This paper presents the design of a speech assessment during oral presentations and an algorithm for speech evaluation based on criteria of optimal intonation. As speech pace and optimal intonation vary from language to language, developing an automatic identification of language during the presentation...
The goal of this work is to validate the impact of natural elicitation of emotions by the speakers during the development of speech emotion databases for the Malayalam language. The work also proposes a speech emotion recognition system based on Gaussian Mixture Models and Deep Belief Networks (GMM-DBN). To test the effect of emotion elicitation by the speakers, two independent datasets with emotionally biased...
Emotional databases can be classified as containing spontaneous or simulated emotions. Spontaneous emotions can be characterized by two parameters, 1) arousal and 2) valence, whose values are represented in a two-dimensional plane. Arousal measures how calming or exciting the information is, whereas valence measures its positive or negative affectivity. The objective of the paper is to predict the arousal...
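To make the two-dimensional arousal/valence representation concrete, here is a small sketch that maps a predicted (arousal, valence) pair onto the four quadrants of the plane. It assumes both values are scaled to [-1, 1] with 0 as the neutral point; the function name and that scale are illustrative assumptions, and the example emotions in the comments follow the common circumplex reading rather than labels from the paper.

```python
def av_quadrant(arousal: float, valence: float) -> str:
    """Hypothetical helper: name the quadrant of the arousal/valence plane.

    Assumes both values lie in [-1, 1] with 0 as neutral (an illustrative
    convention, not one taken from the paper).
    """
    if not (-1.0 <= arousal <= 1.0 and -1.0 <= valence <= 1.0):
        raise ValueError("arousal and valence are expected in [-1, 1]")
    level = "high arousal" if arousal >= 0 else "low arousal"
    polarity = "positive valence" if valence >= 0 else "negative valence"
    return f"{level}, {polarity}"

# Typical circumplex readings of each quadrant (illustrative only):
#   high arousal, positive valence -> excited, happy
#   high arousal, negative valence -> angry, afraid
#   low arousal,  negative valence -> sad, bored
#   low arousal,  positive valence -> calm, relaxed
```

Predicting the two continuous values and only then reading off a quadrant keeps the model's output usable for finer-grained analysis than a fixed set of categories would allow.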
Speaker verification based on a phonetic-acoustic approach and a text-dependent framework has been applied for forensic purposes in Indonesian courts since 2008. To accelerate the speaker verification process, an automatic text-independent system is developed. This automatic system employs MFCC features and GMM speaker modeling, a standard and simple approach used in automatic speaker recognition...
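The GMM scoring step of an MFCC/GMM verifier can be sketched as follows. Each speaker is modeled by a diagonal-covariance GMM over MFCC frames, and an identity claim is accepted when the average per-frame log-likelihood ratio between the claimed speaker's model and a background model exceeds a threshold. The background model, the threshold of 0 and all function names are assumptions for illustration; the abstract does not say how the paper normalizes or thresholds scores, and MFCC extraction and EM training are omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a diagonal-covariance GMM is a list of
# (weight, mean vector, variance vector) components.
Component = Tuple[float, List[float], List[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x: Sequence[float], gmm: List[Component]) -> float:
    """log sum_k w_k N(x; mu_k, var_k), computed with log-sum-exp for stability."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def average_llr(frames: Sequence[Sequence[float]], speaker_gmm: List[Component],
                background_gmm: List[Component]) -> float:
    """Mean per-frame log-likelihood ratio: speaker model vs. background model."""
    total = sum(gmm_log_likelihood(f, speaker_gmm) - gmm_log_likelihood(f, background_gmm)
                for f in frames)
    return total / len(frames)


def verify(frames, speaker_gmm, background_gmm, threshold: float = 0.0) -> bool:
    """Accept the claimed identity if the average LLR exceeds an assumed threshold."""
    return average_llr(frames, speaker_gmm, background_gmm) > threshold
```

Averaging over frames keeps the score independent of utterance length, which matters in a text-independent setting where test recordings vary in duration.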
The Throat Microphone (TM) is a non-acoustic device that relies on the vibrations of the vocal folds rather than on the audible sound produced. Correctly capturing vocal fold vibrations is difficult due to poor signal representation capabilities. The system recognizes the TM vibrations and produces the corresponding speech sound. This is done by extracting features from the spectrum of the TM vibrations and...
The increasing role of spoken language interfaces in human-computer interaction applications has opened a new area of research, namely recognizing the emotional state of the speaker from speech signals. This paper proposes a text-independent method for emotion classification of speech signals, used to recognize the emotional state of the speaker. Different feature...
The aim of this research is to design and implement a speech recognition system that controls the speed of a DC motor. The Linear Predictive Coding (LPC) method is used in the speech recognition system, tuned by the Adaptive Neuro-Fuzzy Inference System (ANFIS) method. The system recognizes 5 (five) voice signal samples in Bahasa Indonesia, i.e.: “Nyala”, “Lambat”, “Sedang”,...
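The LPC front end named in the abstract can be illustrated with the textbook autocorrelation method, solved by the Levinson-Durbin recursion. This is a minimal sketch under stated assumptions: the abstract gives no analysis order, window or frame length, the function names are hypothetical, and the ANFIS stage that consumes the coefficients is not shown.

```python
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum_n x[n] x[n+k] for k = 0..max_lag (no windowing applied here)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve for predictor coefficients a[1..p] with x[n] ~ sum_k a[k] x[n-k].

    Returns (coefficients, final prediction error energy).
    """
    if r[0] == 0:
        raise ValueError("zero-energy frame: LPC is undefined")
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

For example, applied to the synthetic first-order signal x[n] = 0.9 x[n-1], a first-order analysis recovers a coefficient of about 0.9, and a second-order analysis adds a second coefficient close to zero, as expected for a signal with only one pole.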
In-home IoT devices play a major role in healthcare systems as smart personal assistants. They usually come with a voice-enabled feature that adds an extra level of usability and convenience for elderly people, disabled people, and patients. In this paper, we propose an efficient and privacy-preserving voice-based search scheme to enhance the efficiency and the privacy of in-home healthcare applications. We...
In this study, an emotional speech database called Hanbat Emotional Database (HEMO) was constructed using movie and drama scenes in which emotion is abundantly expressed by professional actors. HEMO consists of 454 speech samples classified into seven emotion categories: anger, happiness, sadness, disgust, surprise, fear, and neutral. In order to evaluate the performance of HEMO, consistent...
The Taiwan Mandarin Radio Speech Corpus consists of roughly 300 hours (and growing) of audio recordings, selected from Taiwan's National Education Radio (NER) archive. The corpus includes speech from hundreds of speakers in various speech styles (spontaneous conversational and read news). This corpus provides a rich resource for research in speech and automatic speech recognition (ASR). In this paper,...
This paper presents the recognition of positive and negative emotions for the Romanian language. The main purpose of this study is to highlight how emotions are recognized when the goal is not to identify the exact emotion expressed, but only the emotion in general: positive, negative or neutral. This can be useful for a human-machine interface. The positive emotions were recognized with an...
Emotions play a key role in cognitive processes, particularly in learning. Educators should know the emotional state of each student during a teaching activity. They must help students to experiment, interact and explore new topics and constructs. Students should be in a state that maximizes their performance. To know a student's emotional state, we need an emotion recognition system. It can be...
In the telecommunications industry, fraud has become a serious problem affecting service providers all around the world. Because a significant amount of revenue is lost to fraud every year, an efficient system for detecting fraudulent activities is greatly needed. A well-known type of fraud affecting GSM and PSTN service providers is bypass fraud. It is used to avoid the charges for international calls...
Embedded dictation, i.e. recognizing vocal commands accurately in noisy environments using low-complexity implementations, is a desirable task with many applications. These include automotive infotainment solutions, particularly when no connectivity is available, and personal assistants, including embedded dictation solutions for disabled people. This paper reports our...