This paper investigates Malay speaker identification using neural networks. A speech database was developed with five speakers as trainers and five speakers as imposters. The speech training set included 30 vowel sounds from the five trainer speakers. The test set included 30 vowel sounds from the five trainers and 30 vowel sounds from the five imposters. The speech sounds were sampled at 20 kHz with 16 bit...
This paper deals with Automatic Speaker Recognition in a binaural context. This problem, not widely addressed within the speech processing community, has potential applications in humanoid robots, where speech can serve as the most natural interface between humans and robots. The proposed recognition system is based on parallel Predictive Neural Networks exploiting MFCCs (Mel Frequency...
In this paper we present a new method for nonlinear compensation of mismatches, e.g. additive noise, in clean and noisy speech recognition. The development and implementation of a new Bidirectional Neural Network (BNN) were inspired by the human recognition system. This procedure improves the input features and consequently increases the overall recognition accuracy. The feedforward...
Based on the multidimensional emotion space model, an improved queuing voting algorithm is proposed to fuse multiple emotion classifiers and obtain a better emotion recognition result. First, three kinds of classifiers were designed based on the hidden Markov model (HMM) and the artificial neural network (ANN). Then, the improved queuing voting algorithm was used to fuse them. Experimental...
Many speech recognition applications use vowel phonemes. Among them are speech therapy systems that improve word pronunciation, especially for children. There are also systems that teach hearing-impaired people to speak properly by pronouncing words with a good degree of intelligibility. All of these systems require a high degree of vowel recognition capability. This paper...
In this paper, we describe a novel method for voice conversion (VC). An Artificial Neural Network (ANN) model is employed to perform joint spectrum and pitch conversion between speakers. The conventional method converts spectral parameters and pitch independently. These separate transformations lead to unsatisfactory speech quality. The main reason may be that F0 sequences are usually...
Most research in Information Extraction focuses only on written language processing, and only a little is devoted to the study of Spoken Language Information Extraction. This paper discusses a novel technique for recognizing isolated question words in speech queries in Malayalam (one of the south Indian languages). We have created and analyzed a database consisting of 250 isolated question...
Information Retrieval deals with easy access to information based on the user's request, which is presented in the form of a query. A dialog system that understands spoken natural language queries asks for further information if necessary and produces an answer to the speaker's query. Most of the research works in Information Extraction focus only on written language processing, in which...
This paper deals with the problem of training an Artificial Neural Network (ANN) when the data sets are very imbalanced. Most learning algorithms, including ANN, are designed for well-balanced data and do not work properly on imbalanced ones. Of the approaches proposed for dealing with this problem, we are interested in the re-sampling ones, since they are algorithm-independent. We have recently proposed...
This paper explores the Linear Prediction (LP) residual of the speech signal for characterizing the basic emotions. The emotions used in this study are anger, compassion, disgust, fear, happy, neutral, sarcastic and surprise. The LP residual is derived by inverse filtering of the speech signal, and the process is known as LP analysis. The LP residual mainly contains higher order relations among the samples. For...
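The inverse-filtering step described in this abstract can be illustrated with a minimal sketch. It assumes the autocorrelation method for estimating the predictor and uses NumPy; the function names `lp_coefficients` and `lp_residual`, the default order of 10, and the absence of framing or windowing are choices made for this illustration, not details taken from the paper.

```python
import numpy as np

def lp_coefficients(signal, order):
    """Estimate predictor coefficients a[1..order] (autocorrelation method)."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], solved directly for clarity.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def lp_residual(signal, order=10):
    """Return e[n] = x[n] - sum_k a[k] * x[n - k], i.e. the inverse-filtered signal."""
    x = np.asarray(signal, dtype=float)
    a = lp_coefficients(x, order)
    # Inverse filter A(z) = 1 - sum_k a[k] z^{-k}, applied as an FIR filter.
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(x, inverse_filter)[: len(x)]
```

In practice speech is usually processed in short windowed frames, with one residual computed per frame; the sketch filters the whole signal at once to keep the idea visible.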
In the present text, we deal with the problem of classifying speech emotion. Problems of speech processing are addressed through the use of artificial neural networks (ANN). The results can be used for two research projects: prosody modelling and the analysis of disordered speech. The first ANN topology discussed is the multilayer neural network (MLNN) with the BPG learning algorithm, while...
Particle swarm optimization (PSO) is an algorithm modelled on swarm intelligence that finds a solution to an optimization problem in a search space. In this paper, a PSO-based artificial neural network algorithm is proposed to automatically grade learning results. Basically, the PSO algorithm is utilized to adjust the connection weights of the selected ANN topology. Taking Mandarin learning as...
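The idea of using PSO to adjust the connection weights of a fixed ANN topology can be sketched as follows. Everything specific here is an assumption made for illustration: the 2-2-1 network, the XOR toy data, the hyperparameters (`inertia`, `c1`, `c2`), and the names `mlp_loss` and `pso_minimize`; none of it is taken from the paper, whose grading task and topology are not given in the abstract.

```python
import numpy as np

# Toy data (XOR): a stand-in for this sketch, not the paper's data.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0, 1, 1, 0], dtype=float)

def mlp_loss(w):
    """Mean squared error of a fixed 2-2-1 network whose 9 weights are in w."""
    W1 = w[:4].reshape(2, 2)  # input -> hidden weights
    b1 = w[4:6]               # hidden biases
    W2 = w[6:8]               # hidden -> output weights
    b2 = w[8]                 # output bias
    hidden = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))
    return float(np.mean((out - Y) ** 2))

def pso_minimize(loss, dim, n_particles=30, iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; returns (best position, best loss, loss history)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_loss = np.array([loss(p) for p in pos])
    g = int(np.argmin(pbest_loss))
    gbest, gbest_loss = pbest[g].copy(), float(pbest_loss[g])
    history = [gbest_loss]
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity: inertia + pull toward personal best + pull toward global best.
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        cur = np.array([loss(p) for p in pos])
        improved = cur < pbest_loss
        pbest[improved] = pos[improved]
        pbest_loss[improved] = cur[improved]
        g = int(np.argmin(pbest_loss))
        if pbest_loss[g] < gbest_loss:
            gbest, gbest_loss = pbest[g].copy(), float(pbest_loss[g])
        history.append(gbest_loss)
    return gbest, gbest_loss, history
```

Calling `pso_minimize(mlp_loss, dim=9)` searches the nine network weights directly. Because each particle only needs loss values, no gradients are required, which is the usual reason for pairing PSO with a fixed network topology.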
We have created and analyzed an elicited emotional database consisting of 340 emotional speech samples covering four different emotions: neutral, happy, sad and anger. Malayalam (one of the south Indian languages) was used for the experiment. The Daubechies8 wavelet was used for feature extraction and an artificial neural network was used for pattern recognition. An overall recognition accuracy of 72.055% was obtained...
In this paper, we deal with the problem of speaker segmentation. This task consists of splitting the audio document into homogeneous areas, each attributed to one speaker. Speaker segmentation (or speaker change detection) consists of detecting the points where the speaker identity changes in a multi-speaker audio stream. These points or times are called "Break Points".
Automatic Emotion Recognition (AER) from speech is one of the most interesting research domains in the scientific world. AER simply means enabling a machine to recognize different emotions from speech. We have created and analyzed an elicited database consisting of 700 utterances in four different emotional classes: neutral, happy, sad and anger. Malayalam (One of the south Indian...
This paper analyzes the ability of several measurements to quantify the reverberation effect in speech signals. We consider an intrusive scheme, in which the clean and reverberated signals are available, allowing one to estimate the corresponding room impulse response (RIR) signal. An artificial neural network (ANN) is trained for all features and used in a regression approach to estimate the human...
The speech signal is an important tool for conveying information between humans; at the same time, it is an indicator of a speaker's emotions. In this paper, the automatic identification of affect from speech containing spontaneously expressed (not acted) emotions within different environments was investigated. The Teager energy operator-perceptual wavelet packet (TEO-PWP) features as well as the...
This paper reports a comparative study of two identification engines that identify speakers automatically from their voices when speaking spontaneously in Arabic. The first engine is based on continuous hidden Markov models (CHMMs), while the second is based on artificial neural networks (ANNs). The Mel frequency cepstral coefficients (MFCCs) were selected to describe the speech signal...
Artificial neural network (ANN) models based on static feature vectors as well as normalized temporal feature vectors were used to recognize emotional state from speech. Moreover, relative features, obtained by computing the changes in the acoustic features of emotional speech relative to those of neutral speech, were adopted to weaken the influence of individual differences. The methods to relativize...
Indian languages are syllabic in nature, and many syllables are common across these languages. This motivates us to build a global syllable set by combining syllables from multiple languages, and to build a synthesizer which can borrow units from a different language when the required syllable is not found. Such a synthesizer makes use of speech databases in different languages spoken by different speakers,...