Search results

Items from 61 to 80 out of 2,284 results

chapter

The impact of vocabulary size and language model order on the Polish whispery speech recognition

Piotr Kozierski, Talar Sadalla, Szymon Drgas, Adam Dabrowski, more

2017 22nd International Conference on Methods and Models in Automation and Robotics (MMAR) > 616 - 621

2017 22nd International Conference on Methods and Models in Automation and Robotics (MMAR)

The article presents studies on automatic whispery speech recognition. A new corpus of whispery speech was used in the research. The aim of the studies presented in this paper was to check how the vocabulary size and the language model order influence speech recognition quality. It has been concluded that even using recordings with only 5,000 different words it is possible...

chapter

Speaker recognition based on MFCC and BP neural networks

Yi Wang, Bob Lawlor

2017 28th Irish Signals and Systems Conference (ISSC) > 1 - 4

2017 28th Irish Signals and Systems Conference (ISSC)

Speaker recognition has been developed over many years, and many different methods exist. MFCC is one of the more successful methods because it is broadly modeled on the human auditory system. It offers a high recognition success rate and strong robustness against noise in the lower frequency regions. However, in the higher frequency regions, it captures speaker characteristics information...

chapter

Voice conversion based on continuous frequency warping and magnitude scaling

Yuhang Ye, Bob Lawlor

2017 28th Irish Signals and Systems Conference (ISSC) > 1 - 6

2017 28th Irish Signals and Systems Conference (ISSC)

In this paper, we present a novel spectrum mapping method, Continuous Frequency Warping and Magnitude Scaling (CFWMS), for voice conversion under the Joint Density Gaussian Mixture Model (JDGMM) framework. JDGMM is a mature clustering technique that models the joint probability density of speech signals from paired speakers. The conventional JDGMM-based approaches morph the spectral features via least...

chapter

Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging

Pawel Rosciszewski, Jakub Kaliski

2017 International Conference on High Performance Computing & Simulation (HPCS) > 560 - 565

2017 International Conference on High Performance Computing & Simulation (HPCS)

In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training...

chapter

Variable sparsity regularization factor based SNMF for monaural speech separation

Yash Vardhan Varshney, Zia Ahmad Abbasi, Musiur Raza Abidi, Omar Farooq

2017 40th International Conference on Telecommunications and Signal Processing (TSP) > 342 - 345

2017 40th International Conference on Telecommunications and Signal Processing (TSP)

The sparsity factor of a speech signal plays an important role in speech processing. This paper proposes a method in which a variable sparsity regularization factor is applied to the mixed signal and used to separate monaural speech signals. The sparsity regularization factor for each individual training and testing signal was found using particle swarm optimization. The algorithm has been tested for...

chapter

Improvement of maintenance through speech interaction in cyber-physical production systems

J. Fischer, D. Pantforder, B. Vogel-Heuser

2017 IEEE 15th International Conference on Industrial Informatics (INDIN) > 290 - 295

2017 IEEE 15th International Conference on Industrial Informatics (INDIN)

A much-discussed topic in recent years is the interconnectedness of industrial plants in the field of Cyber-Physical Production Systems (CPPS). In the future, the data and aggregated information from various production plants will be available globally at any time. Particularly in maintenance, this could be a helpful expansion of information for the maintenance staff, since maintenance information...

chapter

HMM based isolated word Nepali speech recognition

Manish K. Ssarma, Avaas Gajurel, Anup Pokhrel, Basanta Joshi

2017 International Conference on Machine Learning and Cybernetics (ICMLC) > 1 > 71 - 76

2017 International Conference on Machine Learning and Cybernetics (ICMLC)

This paper describes the implementation of an HMM (Hidden Markov Model) based speaker-independent isolated-word Automatic Speech Recognition (ASR) system for the Nepali language, a commonly spoken language in Nepal. The system has been developed in Python using the NumPy[1] and YAHMM[2] libraries. The system is trained on different Nepali words by collecting data from different speakers in a room environment....

chapter

Automatic speech recognition performance for training on noised speech

Arkadiy Prodeus, Kateryna Kukharicheva

2017 2nd International Conference on Advanced Information and Communication Technologies (AICT) > 71 - 74

2017 2nd International Conference on Advanced Information and Communication Technologies (AICT)

The performance of several training techniques for automatic speech recognition systems is compared in this paper. Speech recognition accuracy was used as the measure of performance. Different kinds of outdoor and indoor noise were studied. Training on noised speech is shown to be superior to the competing technique of training on clear speech. It has been found that training by...

chapter

Multi-scale feature based convolutional neural networks for large vocabulary speech recognition

Tong Fu, Xihong Wu

2017 IEEE International Conference on Multimedia and Expo (ICME) > 1093 - 1098

2017 IEEE International Conference on Multimedia and Expo (ICME)

Deep learning has brought a breakthrough in the performance of speech recognition. Speech recognition systems based on deep neural networks have obtained state-of-the-art performance on various speech recognition tasks. Almost all of these systems use the Mel-frequency cepstral coefficients or the Mel-scale log-filterbank coefficients, which are based on the short-time Fourier transform. Although...

chapter

Text-to-speech of a talking robot for interactive speech training of hearing impaired

Thanh Vo Nhu, Hideyuki Sawada

2017 10th International Conference on Human System Interactions (HSI) > 166 - 171

2017 10th International Conference on Human-System Interactions (HSI)

The authors are developing a talking robot, a mechanical vocalization system that models the human articulatory system. The talking robot is constructed from mechanical parts designed with reference to the biology and function of human vocal organs. In this study, a newly redesigned artificial vocal cord is developed for the purpose of extending the speaking capability of the talking robot...

chapter

Single-channel speech separation based on robust sparse Bayesian learning

Zhe Wang, Guoan Bi, Xiumei Li

2017 13th IEEE International Conference on Control & Automation (ICCA) > 113 - 117

2017 13th IEEE International Conference on Control & Automation (ICCA)

This paper describes a novel algorithm to improve the performance of sparsity-based single-channel speech separation (SCSS) based on compressed sensing, an emerging technique for efficient data reconstruction. The conventional approach assumes the mixing conditions and source signals are stationary. For practical applications of audio source separation, however, we face the challenges...

chapter

Privacy-Preserving Understanding of Human Body Orientation for Smart Meetings

Indrani Bhattacharya, Noam Eshed, Richard J. Radke

2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) > 284 - 292

2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

We present a method for estimating the body orientation of seated people in a smart room by fusing low-resolution range information collected from downward pointed time-of-flight (ToF) sensors with synchronized speaker identification information from microphone recordings. The ToF sensors preserve the privacy of the occupants in that they only return the range to a small set of hit points. We propose...

chapter

Playback detection using machine learning with spectrogram features approach

Jerzy Dembski, Jacek Ruminski

2017 10th International Conference on Human System Interactions (HSI) > 31 - 35

2017 10th International Conference on Human-System Interactions (HSI)

This paper presents a 2D image processing approach to playback detection in automatic speaker verification (ASV) systems, using spectrograms as the speech signal representation. Three feature extraction and classification methods, histograms of oriented gradients (HOG) with support vector machines (SVM), Haar wavelets with an AdaBoost classifier, and deep convolutional neural networks (CNN), were compared on...

chapter

Evaluation of touchscreen assistive technology for visually disabled users

Berglind Fjola Smaradottir, Santiago Gil Martinez, Jarle Audun Haland

2017 IEEE Symposium on Computers and Communications (ISCC) > 248 - 253

2017 IEEE Symposium on Computers and Communications (ISCC)

Touchscreen assistive technology is designed to support speech interaction between visually disabled people and mobile devices, allowing the use of a choreography of gestures to interact with a touch user interface. This paper presents the evaluation of VoiceOver, a screen reader in Apple Inc. products, made in the research project Visually impaired users touching the screen- A user evaluation of...

chapter

Improving acoustic modeling using audio-visual speech

Ahmed Hussen Abdelaziz

2017 IEEE International Conference on Multimedia and Expo (ICME) > 1081 - 1086

2017 IEEE International Conference on Multimedia and Expo (ICME)

Reliable visual features that encode the articulator movements of speakers can dramatically improve the decoding accuracy of automatic speech recognition systems when combined with the corresponding acoustic signals. In this paper, a novel framework is proposed to utilize audio-visual speech not only during decoding but also for training better acoustic models. In this framework, a multi-stream hidden...

chapter

SpeeD's DNN approach to Romanian speech recognition

Alexandru-Lucian Georgescu, Horia Cucu, Corneliu Burileanu

2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) > 1 - 8

2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)

This paper presents the main improvements recently made to the large-vocabulary continuous speech recognition (LVCSR) system for the Romanian language developed by the Speech and Dialogue (SpeeD) research laboratory. While the most important improvement consists in the use of DNN-based acoustic models, instead of the classic HMM-GMM approach, several other aspects are discussed in the paper: a significant...

chapter

Towards a continuous speech corpus for banking domain automatic speech recognition

George Suciu, Stefan-Adrian Toma, Romulus Cheveresan

2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) > 1 - 6

2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)

This paper presents the work done towards developing a Romanian speech corpus for automatic speech recognition in the banking domain. This work is done in the context of the Speech2Process project, which aims to create a system that makes interaction between customers and agents in the contact center much easier. The application using the banking corpus will provide automatic responses to...

chapter

Detection and Management of Depression in Cancer Patients Using Augmented Reality Technologies, Multimodal Signal Processing and Persuasive Interfaces

Alexandros Roniotis, Haridimos Kondylakis, Manolis Tsiknakis

2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS) > 751 - 752

2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS)

This visual paper proposes a framework for detecting depression in cancer patients using prosodic and statistical features extracted from their speech while they chat with a virtual coach.

chapter

Improving English Pronunciation Via Automatic Speech Recognition Technology

Meihui Li, Meiting Han, Zejia Chen, Yiling Mo, more

2017 International Symposium on Educational Technology (ISET) > 224 - 228

2017 International Symposium on Educational Technology (ISET)

This study investigates the application of ASR (Automatic Speech Recognition) technology to English pronunciation correction. We also discuss the relationship between ELF/ESL learners' self-improvement and English teachers' classroom teaching. The results show that ASR technology can help Chinese learners of English improve their English pronunciation. The research aims to provide a new and practical...

chapter

Faculty Support for Effective Flipped Classrooms in Higher Education

Chiaki Iwasaki

2017 International Symposium on Educational Technology (ISET) > 261 - 267

2017 International Symposium on Educational Technology (ISET)

In this research, we surveyed and interviewed university faculty members about the current state of flipped classrooms in order to identify methods of supporting teachers. The survey showed that flipped classrooms are practiced with various subjects and class sizes, which indicates the necessity of support for a wide range of subjects. In the educational method, the authors found that...

Keywords:
TRAINING
SPEECH

