Detecting pronunciation erroneous tendencies (PETs) can provide second language learners with detailed, instructive feedback in computer-aided pronunciation training (CAPT) systems. Due to data sparseness, DNN-HMM achieved only limited improvement over GMM-HMM in our previous work. Instead of directly employing DNN-HMM to detect PETs, this paper investigated how to further improve the performance...
Lyrics are an important part of songs. Lyrics recognition is the basis of retrieving songs and recognizing their content, and is therefore of great value. At present, research on speech recognition has made great progress, but recognizing lyrics in songs with accompaniment remains difficult. Related research is generally lacking, especially for Chinese lyrics in songs with accompaniment,...
This paper presents work on the phonetic analysis of classical Arabic speech. A hidden Markov model classifier is applied to Arabic phonemes. For the purpose of this work, a new classical Arabic speech corpus was created, based on selected recordings of recitations of The Holy Quran. A number of acoustic features are analyzed and compared. These are: linear predictive coding (LPC)...
In this article we applied support vector machines to the acoustic model of a speech recognition system based on MFCC and LPC features for an Azerbaijani dataset. This dataset has previously been used for speech recognition with a multilayer artificial neural network, which achieved some results. The main goal of this work is to apply SVM techniques to the Azerbaijani speech recognition system. The variety of results of SVM...
Detailed analysis of tonal features for Tibetan Lhasa dialect is an important task for Tibetan automatic speech recognition (ASR) applications. However, it is difficult to utilize tonal information because it remains controversial how many tonal patterns the Lhasa dialect has. Therefore, few studies have focused on modeling the tonal information of the Lhasa dialect for speech recognition purpose...
Speech is not only a way for infants under one year of age to communicate with the outside world, but also an important source of information reflecting their emotions and needs, as well as their health status and mental level. In order to explore intelligent machine technology for understanding infants' emotions and needs from speech signals, and thereby help parents in child rearing, this paper studied...
Present Mel-frequency cepstral coefficient (MFCC) based Bangla automatic speech recognition (ASR) systems are mostly implemented with delta and acceleration coefficients. With the delta and acceleration coefficients of the MFCCs and the log energy, a 39-dimensional feature vector is obtained every 10 ms. In this paper, our objective is to observe the effect of third differential coefficients on the performance...
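How 13 static coefficients become the 39-dimensional vector can be sketched with the standard regression formula for delta features. A minimal numpy illustration; the window half-width N=2 and the random "static" matrix are assumptions, not values from the paper:

```python
import numpy as np

def deltas(feats, N=2):
    """Regression-based delta coefficients over a window of +/-N frames.

    feats: (num_frames, num_coeffs) array of static features.
    """
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feats, dtype=float)
    for t in range(feats.shape[0]):
        out[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)
        ) / denom
    return out

# 100 frames of 13 static coefficients (e.g. 12 MFCCs + log energy)
static = np.random.randn(100, 13)
delta = deltas(static)               # first differential
accel = deltas(delta)                # second differential (acceleration)
features = np.hstack([static, delta, accel])
print(features.shape)                # (100, 39): one 39-dim vector per frame
```

Third differentials, as studied in the paper, would simply apply the same regression once more to the acceleration coefficients.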
This paper presents a comparative study and evaluation of the performance of four speech feature vectors, i.e., MFCC, IMFCC, LFCC, and PNCC, in a speaker verification system based on speaker modeling with the Gaussian mixture model (GMM) under clean and noisy speech conditions. The TIMIT and NOISEX-92 datasets were used for the speech signals and noise, respectively. The evaluation...
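GMM-based speaker modeling scores a test utterance by its average per-frame log-likelihood under a speaker's mixture model. A minimal diagonal-covariance scoring sketch; the two-component toy model and data are illustrative assumptions, not parameters from the study:

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X under a diagonal-covariance GMM.

    X: (T, D) frames; weights: (M,); means, variances: (M, D).
    """
    T, D = X.shape
    diff = X[:, None, :] - means[None, :, :]              # (T, M, D)
    # log N(x | mu_m, diag(var_m)) for every frame/component pair -> (T, M)
    log_norm = -0.5 * (
        D * np.log(2 * np.pi)
        + np.sum(np.log(variances), axis=1)[None, :]
        + np.sum(diff**2 / variances[None, :, :], axis=2)
    )
    # log sum_m w_m N(...) via the log-sum-exp trick for numerical stability
    weighted = log_norm + np.log(weights)[None, :]
    mx = weighted.max(axis=1, keepdims=True)
    frame_ll = mx[:, 0] + np.log(np.exp(weighted - mx).sum(axis=1))
    return frame_ll.mean()

rng = np.random.default_rng(0)
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
X_match = rng.normal(0.0, 1.0, size=(200, 2))      # frames near component 1
X_mismatch = rng.normal(8.0, 1.0, size=(200, 2))   # frames far from both components
print(gmm_loglik(X_match, weights, means, variances) >
      gmm_loglik(X_mismatch, weights, means, variances))  # True
```

In verification, this score is typically compared against the same utterance's score under a universal background model to accept or reject the claimed identity.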
A Phonetic Engine (PE) is a system used to determine the sequence of phones in a spoken utterance. The International Phonetic Alphabet (IPA) is used to transcribe the speech database. This work focuses on developing a multilingual PE for four Indian languages, namely Bengali, Hindi, Urdu and Telugu; the approach can be extended to further languages. For developing the PE, read speech...
This paper examines the performance of an independent speaker identification system (SIS) based on a template model using a vector quantization (VQ) method. The template model relies on a comparison process in which the speaker model with the smallest distortion score is identified. In order to analyze the system's decisions and their confidence, a thresholding...
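The VQ comparison process can be sketched directly: each speaker is represented by a codebook, and the identified speaker is the one whose codebook gives the smallest average distortion on the test frames. The hand-picked codebooks and data below are illustrative assumptions (a real system would train codebooks with LBG/k-means on MFCC frames):

```python
import numpy as np

def avg_distortion(frames, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (T, K)
    return d2.min(axis=1).mean()

def identify(frames, codebooks):
    """Return the speaker whose codebook yields the smallest distortion score."""
    scores = {spk: avg_distortion(frames, cb) for spk, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(1)
codebooks = {
    "spk_a": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "spk_b": np.array([[5.0, 5.0], [6.0, 6.0]]),
}
test_frames = rng.normal(0.5, 0.3, size=(50, 2))  # frames near spk_a's codewords
best, scores = identify(test_frames, codebooks)
print(best)  # spk_a
```

A thresholding rule of the kind the paper analyzes would then reject the decision when even the smallest distortion score exceeds some confidence threshold.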
Speech recognition is a broad subject, as speech is a natural way of communication. Acoustic and language models for such systems are available, but mostly for the English language [15]. In India, many people cannot understand or speak English, so an English-language speech recognition system is of no use to them. Here we present an isolated Hindi word recognition system which...
It is well known that the variability in speech caused by the accents or dialects of speakers degrades the performance of speech recognition systems. One method to prevent this degradation is to correctly identify the accent or dialect of a speaker so that the putative system can be designed to use this information. In this paper, we apply the extreme learning machine, an efficient neural network...
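The extreme learning machine mentioned above trains only the output layer: hidden-layer weights are random and fixed, and the output weights come from a single least-squares solve. A minimal sketch on a toy two-class problem (standing in for, say, two accents); all data and layer sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 2-D features for two well-separated classes (illustrative data)
X = np.vstack([rng.normal(-1, 0.4, (100, 2)), rng.normal(1, 0.4, (100, 2))])
y = np.hstack([np.zeros(100), np.ones(100)])
T = np.eye(2)[y.astype(int)]          # one-hot targets, shape (200, 2)

# ELM step 1: random, fixed hidden layer (never trained)
n_hidden = 50
W = rng.normal(size=(2, n_hidden))    # input -> hidden weights
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                # hidden activations, shape (200, 50)

# ELM step 2: output weights from one least-squares solve (pseudoinverse)
beta, *_ = np.linalg.lstsq(H, T, rcond=None)

pred = (H @ beta).argmax(axis=1)
print((pred == y).mean())             # training accuracy, close to 1.0
```

The absence of iterative backpropagation is what makes the ELM efficient to train, which is the property the paper exploits for accent/dialect identification.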
Speech recognition is one of the hot spots in the field of audio technology. For recognizing lyrics with accompaniment, there are two commonly used methods: one applies automatic speech recognition technology to singing; the other uses sound classification, extracting audio features and then using a pattern-matching classifier for classification...
The act of reading the Qur'an and pronouncing its sounds depends on the type of recitation, referring here to the recitation of Warsh or the recitation of Hafss. It is very important to recognise the type of recitation, especially given the diversity and spread of Qira'at in the world. This research presents a speech recognition system that distinguishes between the different types of the Qur'an...
In this work, a new feature, residual sinusoidal peak amplitude (RSPA), is proposed for emotion classification. The RSPA feature is computed from the LP residual of the speech signal using a sinusoidal model. The residual signal is a major source of the excitation, and emotional information is expected to be well manifested in it. The effectiveness of the proposed feature is explored...
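The LP residual underlying RSPA is the prediction error left after inverse-filtering the signal with its linear-prediction coefficients. A minimal sketch using the autocorrelation method; the order, test signal, and noise level are illustrative assumptions, and a real system would work frame-by-frame on windowed speech:

```python
import numpy as np

def lp_residual(x, order=10):
    """LP residual: error after subtracting the linear prediction of x from itself."""
    # Autocorrelation method: solve the normal equations R a = r for predictor a
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # Prediction x_hat[n] = sum_k a_k * x[n-k]; residual e[n] = x[n] - x_hat[n]
    pred = np.convolve(x, np.concatenate(([0.0], a)))[: len(x)]
    return x - pred

rng = np.random.default_rng(3)
n = np.arange(400)
x = np.sin(2 * np.pi * 0.05 * n) + 0.01 * rng.normal(size=n.size)  # predictable signal
e = lp_residual(x)
# After the initial transient, residual energy is far below signal energy
print(np.mean(e[20:] ** 2) < 0.1 * np.mean(x[20:] ** 2))  # True
```

The RSPA feature itself then fits a sinusoidal model to this residual and takes peak amplitudes; that step is specific to the paper and is not reproduced here.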
In this paper, an attempt is made to examine and evaluate the effect of the bottleneck and hierarchical bottleneck (HBN) frameworks in MLP-based automatic speech recognition (ASR) systems. In particular, the bottleneck and hierarchical bottleneck frameworks are analyzed using Volterra series. Experiments on several architectures incorporating systematic hierarchical and bottleneck properties...
Sound is a useful and versatile form of communication, where each sound has its own characteristics and frequency levels. Sound serves two basic functions for people around the world: signaling and communication. Several problems arise in sound identification, such as pitch, velocity, and the accuracy of voice-data processing. The motivation of this research is to recognize and analyze human...
The present era is full of speech recognition based services and products. Machine learning paradigms are at the centre stage of speech recognition methodology. Automatic speech recognition (ASR) technology has evolved rapidly in recent years, with emerging applications in mobile computing, natural user interfaces, and man-machine assistive technology. In this paper, for the first time, we present...
An issue that has so far received little attention in emotional speaker recognition systems is the context of the speech databases used to develop and evaluate them. We therefore propose and assess an emotional speaker recognition system based on different feature extraction methods, focusing on the differences between simulated and natural emotional speech databases (BERLIN and IEMOCAP)...
This paper presents a comparison of three feature extraction techniques in an ASR system. Compared with the primarily used MFCC (Mel Frequency Cepstral Coefficients) technique, PNCC (Power Normalized Cepstral Coefficients) achieves an impressive improvement in noisy speech recognition due to its suppression in the high-frequency spectrum of the human voice. The techniques differ in that MFCC uses traditional...