This paper addresses the problem of automatically recognizing emotional states from speech recordings, especially those emotions indicating that a person's life or physical integrity is at risk. The paper compares the performance of two different systems: one fed with speech signals recorded directly from people (whole spectrum) and another in which the speech signals are recorded...
To ensure a satisfactory QoE (Quality of Experience) and facilitate system design in speech recognition services, it is essential to establish a method that can be used to efficiently investigate recognition performance in different noise environments. Previously, we proposed a performance estimation method using the PESQ (Perceptual Evaluation of Speech Quality) as a spectral distortion measure....
This paper reports on the application of the dimensional emotion model in automatic emotional speech recognition. Using the perceptron rule in combination with acoustic features, an approach to speech-based emotion recognition is introduced, which can classify the utterance with respect to the valence-arousal (V-A) dimensions of its emotional content. The mapping of 5 discrete emotion classes onto...
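The abstract above names the perceptron rule but not the paper's features or training setup. The sketch below is an illustrative assumption only: one perceptron per valence-arousal dimension, trained with the classic perceptron update, so that the pair of output signs places an utterance in a quadrant of the V-A plane. The toy 2-D "acoustic" features and labels are invented for the example.

```python
# Hypothetical sketch of perceptron-rule classification on one V-A dimension.
# The features, labels, and hyperparameters are illustrative, not the paper's.

def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Classic perceptron rule: w += lr * (y - yhat) * x, labels in {-1, +1}."""
    w = [0.0] * (len(samples[0]) + 1)  # last entry acts as the bias weight
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            xb = list(x) + [1.0]  # append constant input for the bias
            yhat = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1
            if yhat != y:
                w = [wi + lr * (y - yhat) * xi for wi, xi in zip(w, xb)]
    return w

def predict(w, x):
    """Sign of the linear score: +1 (e.g. high valence) or -1 (low valence)."""
    xb = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1

# Toy 2-D feature vectors; +1 = high valence, -1 = low valence (made up).
X = [(1.0, 0.2), (0.8, 0.1), (-0.9, -0.3), (-1.1, 0.0)]
y = [1, 1, -1, -1]
w_val = train_perceptron(X, y)
```

A second perceptron trained on arousal labels would give the other coordinate; the (valence sign, arousal sign) pair then selects one of the four V-A quadrants, to which discrete emotion classes can be mapped.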
Research in the speech recognition area has made considerable progress, aided by the tremendous growth of technology. Speech rate is one of the important factors affecting speech recognition accuracy. In the present work, training is performed on different speech rates (Normal, Slow and Fast), and testing is also done on different rates of speech. The error rate will increase when the major...
Accent in speech is defined as a distinctive mode of pronunciation that is unique to a geographical region. In a similar way, we define accent in handwriting as distinctive writing characteristics that are unique to a group of people sharing a common native script. In other words, we postulate that a group of people with a common native script will share certain traits in their handwriting that can...
All of the previous syllable-based Automatic Speech Recognizers (ASRs) for the Amharic language were built by training a separate acoustic model for each of the 196 distinctly pronounced Consonant-Vowel (CV) syllables. In this paper, we demonstrate that a smaller number of acoustic models is sufficient to build a syllable-based, speaker-independent, continuous Amharic ASR. It is built for weather...
A phoneme recognition system based on Discrete Wavelet Transforms (DWTs) and Support Vector Machines (SVMs) is designed for multi-speaker continuous speech environments. Phonemes are divided into frames, and DWTs are applied to obtain fixed-dimensional feature vectors. For the multiclass SVM, the one-against-one method with the RBF kernel was implemented. To further improve the accuracies obtained,...
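The abstract does not state which wavelet family or feature design the paper uses. As a hedged illustration of how a DWT can turn variable-length frames into fixed-dimensional vectors, the sketch below assumes a Haar wavelet and uses the energy of each detail band plus the final approximation energy; the SVM stage would typically be handled by a library and is omitted.

```python
# Illustrative only: Haar DWT band-energy features; the paper's actual
# wavelet family and feature vector design may differ.

def haar_dwt(signal):
    """One level of the Haar DWT: pairwise averages (approximation)
    and pairwise differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def dwt_features(frame, levels=3):
    """Fixed-dimensional vector: energy of each detail band at every
    decomposition level, plus the energy of the final approximation."""
    feats = []
    a = frame
    for _ in range(levels):
        a, d = haar_dwt(a)
        feats.append(sum(x * x for x in d))
    feats.append(sum(x * x for x in a))
    return feats
```

However long the input frame is (as long as it supports the chosen number of levels), the output always has `levels + 1` entries, which is what makes it usable as an SVM input.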
This paper presents the design of a digital hardware implementation based on Support Vector Machines (SVMs) for the task of multi-speaker phoneme recognition. The one-against-one multiclass SVM method with the Radial Basis Function (RBF) kernel was considered. Furthermore, a priority scheme was included in the architecture in order to forecast the three most likely phonemes. The designed system...
The aim of a speech training aid is to enhance the language abilities of hearing-impaired children. The traditional approach, using photographs and teachers' gestures, has been shown to be inefficient. This study proposes a Speech Training Aid System (STAS) based on C# and Flash technology to instruct hearing-impaired children on how to improve their language abilities. An experimental...
Most emergency phones are activated by a button press. However, pressing the button requires the person needing help to physically touch the phone, which is often infeasible in practice: 1) it is hard for someone unfamiliar with the environment to locate an emergency phone quickly, and 2) even if the person knows where the phone is, she/he has to run a certain distance to reach it. Therefore,...
This paper proposes a novel approach to enhance the speech features in noise robustness for speech recognition. In the proposed approach, the speech feature time sequence is first converted into the modulation spectral domain via discrete Fourier transform (DFT). The magnitude part of the modulation spectrum is decomposed into overlapped non-uniform sub-band segments, and then each sub-band segment...
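The abstract's first two steps (a DFT over a feature's time trajectory, then splitting the magnitude spectrum into overlapped, non-uniform sub-bands) can be sketched as below. This is a minimal assumption-laden illustration: the feature trajectory, the DFT size, and the sub-band edges are all made up, and the paper's subsequent per-band processing is not shown.

```python
# Illustrative sketch: modulation spectrum of one feature trajectory and
# non-uniform, overlapping sub-band slicing. All parameters are invented.
import cmath

def modulation_spectrum(feature_seq):
    """Magnitude of the DFT of one feature coefficient's trajectory
    across frames (bins 0 .. N/2)."""
    N = len(feature_seq)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n, x in enumerate(feature_seq)))
            for k in range(N // 2 + 1)]

def subband_segments(magnitude, edges):
    """Slice the magnitude spectrum into segments given (lo, hi) bin edges;
    overlapping or non-uniform edges are allowed."""
    return [magnitude[lo:hi] for lo, hi in edges]

mag = modulation_spectrum([1.0, 1.0, 1.0, 1.0])  # constant trajectory: DC only
```

A constant trajectory concentrates all energy in the DC bin, which is why modulation-domain methods often treat the low-frequency bands separately from the rest.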
Current speech-controlled human computer interaction is purely based on spoken information. For a successful interaction, additional information such as the individual skills, preferences and actual affective state of the user are often mandatory. The most challenging of these additional inputs is the affective state, since affective cues are in general expressed very sparsely. The problem can be...
In this study, we used an improved version of the classical KNN algorithm, which assigns to each parameter of the feature vectors a weight according to its performance in the classification process. We obtained emotion recognition rates of around 65–67% for the Romanian language, on the SROL database, which are comparable with the results for other languages, with non-professional...
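The per-feature weighting idea described above can be sketched as a weighted Euclidean distance inside KNN. This is a generic illustration, not the paper's implementation: how the weights are learned from per-feature classification performance is not specified in the abstract, so they are simply passed in here.

```python
# Generic weighted-KNN sketch; the weight-learning step from the abstract
# is assumed to happen elsewhere and the weights are taken as given.
import math
from collections import Counter

def weighted_knn(train, labels, weights, query, k=3):
    """Classify `query` by majority vote among the k nearest training
    vectors under a per-feature weighted Euclidean distance."""
    dists = [(math.sqrt(sum(w * (a - b) ** 2
                            for w, a, b in zip(weights, x, query))), y)
             for x, y in zip(train, labels)]
    dists.sort(key=lambda t: t[0])
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]
```

Setting a feature's weight near zero effectively removes it from the distance metric, which is how poorly performing parameters are down-weighted.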
In this paper, we propose a lyrics-based classification approach. It estimates the mood of a song using only the intro and refrain parts of its lyrics. In general, the intro part creates the specific atmosphere of a song, and the chorus part is the strongest part of the song. The proposed method detects important features significantly associated with the mood of songs from both parts. By calculating the...
Recently, speaker recognition systems have attracted great interest from researchers for both software and hardware solutions. Different technologies have been adopted to implement speaker recognition systems that perform with optimal response time and acceptable accuracy. Research is in progress to provide highly durable and precise recognition systems that can be embedded into critical implementation...
This paper presents a novel approach to phonetic-based language identification (LID). Motivated by the assumption underlying phonotactic LID that accounting for permissible phone sequences supports the process of distinguishing one language from another, this paper presents a novel approach based on the automatic identification of phone sequences of different lengths unique to a language, which are...
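The core idea above (finding phone sequences of different lengths that occur in one language but no other) can be illustrated with a simple n-gram set difference. This is a hedged sketch under assumed inputs: phone transcriptions are given as lists of phone symbols, and the n-gram range is arbitrary; the paper's actual identification pipeline is not reproduced here.

```python
# Illustrative sketch: per-language phone n-grams that appear in no other
# language's corpus. The toy corpora and n-gram range are assumptions.

def phone_ngrams(phones, n):
    """All contiguous phone sequences of length n in one utterance."""
    return {tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)}

def unique_sequences(corpora, n_range=(2, 4)):
    """Map each language to the set of its phone n-grams (lengths in
    n_range, inclusive) that occur in no other language's corpus."""
    grams = {lang: set() for lang in corpora}
    for lang, utterances in corpora.items():
        for utt in utterances:
            for n in range(n_range[0], n_range[1] + 1):
                grams[lang] |= phone_ngrams(utt, n)
    return {lang: g - set().union(*(grams[o] for o in grams if o != lang))
            for lang, g in grams.items()}
```

Sequences shared by several languages drop out of every language's unique set, so only the discriminative phonotactic patterns remain as LID evidence.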
The speech of cleft palate (CP) patients has typical characteristics: hypernasality and low speech intelligibility are its primary features. In this work, an algorithm for the automatic evaluation of different levels of hypernasality and speech intelligibility in CP speech is proposed, in order to provide an objective tool for speech therapists. To identify different levels of hypernasality,...
Accent is a special trait of human speech that can convey some information about a speaker's background. At the same time, it is one of the factors that most profoundly affect the intelligibility and performance of automatic speech recognition systems (ASRs) if not handled carefully. Normally, an accent recognizer in the preceding stage offers a subsystem training or adaptation strategy to improve ASRs. Formant analysis...
To overcome the problems of conventional speech recognition (e.g. noise interference and private-data loss), many researchers have investigated alternative approaches. Electromyography (EMG) signals from the muscles producing speech have been used to replace the voiced signal. Similarly, we aim to develop EMG-based speech recognition for the Thai language. Tone is an important characteristic of this...
Isolated speech recognition systems (ISRS) implemented using microprocessors, digital signal processors, and FPGAs have been reported in the literature. In this paper, the study and implementation of an ISRS using the Cypress Programmable System on Chip (PSoC) are presented. For the implementation, a PSoC 5 containing the ARM Cortex-M3 CPU is used. Recognition performance is studied using three...