Advanced search

Advanced search in people

From:

To:

Items from 141 to 160 out of 2,970 results

1 ...
5
6
7
8
9
10
11

chapter

DNN acoustic models for dysarthric speech

Seeram Tejaswi, S Umesh

2017 Twenty-third National Conference on Communications (NCC) > 1 - 4

2017 Twenty-third National Conference on Communications (NCC)

In this paper, we investigate various training methods for building deep neural network (DNN) based acoustic models for dysarthric speech data. Methods like multitask learning, knowledge distillation and model adaptation, which overcome data sparsity and model over-fitting problems are employed to study the merits of each method. In Knowledge distillation framework, some privilege information in addition...

chapter

Acoustic and language modeling for children's read speech assessment

Hitesh Tulsiani, Prakhar Swarup, Preeti Rao

2017 Twenty-third National Conference on Communications (NCC) > 1 - 6

2017 Twenty-third National Conference on Communications (NCC)

Automatic speech recognition can be used to evaluate the accuracy of read speech and thus serve a valuable role in literacy development by providing the needed feedback on reading skills in the absence of qualified teachers. Given the known limitations of ASR in the face of insufficient task-specific training data, the selection of acoustic and language modeling strategies can play a crucial role...

chapter

Towards bootstrapping Acoustic Models for resource poor Indian languages

Prabhat Pandey, Praful Hebbar, Prashant Borole, Sandeep Satpal, more

2017 Twenty-third National Conference on Communications (NCC) > 1 - 4

2017 Twenty-third National Conference on Communications (NCC)

There are several challenges while building Automatic Speech Recognition (ASR) system for low resource languages such as Indic languages. One problem is the access to large amounts of training data required to build Acoustic Models (AM) from scratch. In the context of Indian English, another challenge encountered is code-mixing as many Indian speakers are multilingual and exhibit code-mixing in their...

chapter

Implicit language identification system based on random forest and support vector machine for speech

Manish Gupta, Shambhu Shankar Bharti, Suneeta Agarwal

2017 4th International Conference on Power, Control & Embedded Systems (ICPCES) > 1 - 6

2017 4th International Conference on Power, Control & Embedded Systems (ICPCES)

Speech uttered by the human beings contains the information about speakers, languages and contents. Language of uttered speech can easily be identified by extracting the language specific information from it. Identification of language of speech is known as Language Identification (LID). Identification of language from speech is helpful in its translation, speech recognition and speech activated automatic...

chapter

Presentation slides generation from scientific papers using support vector regression

S. Syamili, Anish Abraham

2017 International Conference on Inventive Communication and Computational Technologies (ICICCT) > 286 - 291

2017 International Conference on Inventive Communication and Computational Technologies (ICICCT)

As the presentation slides are of vital importance in a person's career, a significant amount of time is spent for its preparation. An automatic paper-summarizer will reduce the amount of time and human effort. Presently, tools exist for formatting and designing themes for slides, but not for content generation. This paper proposes a summarization system that automatically generates presentation slides...

chapter

A novel face recognition method based on one state of discrete Hidden Markov Model

Hameed R. Farhan, Mahmuod H. Al-Muifraje, Thamir R. Saeed

2017 Annual Conference on New Trends in Information & Communications Technology Applications (NTICT) > 252 - 257

2017 Annual Conference on New Trends in Information & Communications Technology Applications (NTICT)

The trend for about twenty years, the research regarding the number of states in Hidden Markov Model (HMM) was mainly aimed at increasing it in order to ensure the robustness of the face recognition system. In this paper, a novel face recognition method is presented based on one state of discrete HMM, where it seemed impossible in the past. Contrary to other approaches that use the three parameters...

chapter

Radio-browsing for developmental monitoring in Uganda

Raghav Menon, Armin Saeb, Hugh Cameron, William Kibira, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5795 - 5799

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We consider the extraction of information from broadcast radio speech in Uganda for the purposes of informing relief and development programmes by the United Nations. Although internet penetration in Uganda is low, mobile phones are ubiquitous and have made radio a vibrant medium for interactive public discussion. Vulnerable groups make use of radio to discuss issues related to, for example, agriculture,...

chapter

Automatic node selection for Deep Neural Networks using Group Lasso regularization

Tsubasa Ochiai, Shigeki Matsuda, Hideyuki Watanabe, Shigeru Katagiri

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5485 - 5489

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We examine the effect of the Group Lasso (gLasso) regularizer in selecting the salient nodes of Deep Neural Network (DNN) hidden layers by applying a DNN-HMM hybrid speech recognizer to TED Talks speech data. We test two types of gLasso regularization, one for outgoing weight vectors and another for incoming weight vectors, as well as two sizes of DNNs: 2048 hidden layer nodes and 4096 nodes. Furthermore,...

chapter

Personalized acoustic modeling by weakly supervised multi-task deep learning using acoustic tokens discovered from unlabeled data

Cheng-Kuan Wei, Cheng-Tao Chung, Hung-Yi Lee, Lin-Shan Lee

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5165 - 5169

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

It is well known that recognizers personalized to each user are much more effective than user-independent recognizers. With the popularity of smartphones today, although it is not difficult to collect a large set of audio data for each user, it is difficult to transcribe it. However, it is now possible to automatically discover acoustic tokens from unlabeled personal data in an unsupervised way. We...

chapter

Advances in all-neural speech recognition

Geoffrey Zweig, Chengzhu Yu, Jasha Droppo, Andreas Stolcke

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4805 - 4809

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper advances the design of CTC-based all-neural (or end-to-end) speech recognizers. We propose a novel symbol inventory, and a novel iterated-CTC method in which a second system is used to transform a noisy initial output into a cleaner version. We present a number of stabilization and initialization methods we have found useful in training these networks. We evaluate our system on the commonly...

chapter

Lyric recognition in monophonic singing using pitch-dependent DNN

Dairoku Kawai, Kazumasa Yamamoto, Seiichi Nakagawa

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 326 - 330

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

One of the difficulties in sung speech recognition is the small distance in an acoustic space between phonemes in sung speech. Therefore we considered clustering the speech based on a pitch (fundamental frequency F0) and creating a larger distance between the phonemes. In addition, we considered a two-stage training method of DNN-HMM: the first stage is trained by using conventional acoustic features...

chapter

Topic identification of spoken documents using unsupervised acoustic unit discovery

Santosh Kesiraju, Raghavendra Pappagari, Lucas Ondel, Lukas Burget, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5745 - 5749

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper investigates the application of unsupervised acoustic unit discovery for topic identification (topic ID) of spoken audio documents. The acoustic unit discovery method is based on a non-parametric Bayesian phone-loop model that segments a speech utterance into phone-like categories. The discovered phone-like (acoustic) units are further fed into the conventional topic ID framework. Using...

chapter

Learning a hierarchical spatio-temporal model for human activity recognition

Wanru Xu, Zhenjiang Miao, Xiao-Ping Zhang, Yi Tian

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 1607 - 1611

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recent works have shown that hierarchical models lead to significant improvement in human activity recognition, which can not only enhance descriptive capability, but also improve discriminative power. However, most existing methods exploit just one of the two advantages. In this paper, a new hierarchical spatio-temporal model (HSTM) is proposed to integrate feature learning into two-layer hierarchical...

chapter

The 2016 BBN Georgian telephone speech keyword spotting system

Tanel Alumae, Damianos Karakos, William Hartmann, Roger Hsiao, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5755 - 5759

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper we describe the 2016 BBN conversational telephone speech keyword spotting system; the culmination of four years of research and development under the IARPA Babel program. The system was constructed in response to the NIST Open Keyword Search (OpenKWS) evaluation of 2016. We present our technological breakthroughs in building top-performing keyword spotting processing systems for new...

chapter

Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding

Su Zhu, Kai Yu

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5675 - 5679

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper investigates the framework of encoder-decoder with attention for sequence labelling based spoken language understanding. We introduce Bidirectional Long Short Term Memory - Long Short Term Memory networks (BLSTM-LSTM) as the encoder-decoder model to fully utilize the power of deep learning. In the sequence labelling task, the input and output sequences are aligned word by word, while the...

chapter

Drum transcription from polyphonic music with recurrent neural networks

Richard Vogl, Matthias Dorfer, Peter Knees

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 201 - 205

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Automatic drum transcription methods aim at extracting a symbolic representation of notes played by a drum kit in audio recordings. For automatic music analysis, this task is of particular interest as such a transcript can be used to extract high level information about the piece, e.g., tempo, downbeat positions, meter, and genre cues. In this work, an approach to transcribe drums from polyphonic...

chapter

CNN architectures for large-scale audio classification

Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 131 - 135

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the...

chapter

Improving audio-visual speech recognition using deep neural networks with dynamic stream reliability estimates

Hendrik Meutzner, Ning Ma, Robert Nickel, Christopher Schymura, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5320 - 5324

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Audio-visual speech recognition is a promising approach to tackling the problem of reduced recognition rates under adverse acoustic conditions. However, finding an optimal mechanism for combining multi-modal information remains a challenging task. Various methods are applicable for integrating acoustic and visual information in Gaussian-mixture-model-based speech recognition, e.g., via dynamic stream...

chapter

Blind bandwidth extension using K-means and Support Vector Regression

Chih-Wei Wu, Mark Vinton

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 721 - 725

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, a blind bandwidth extension algorithm for music signals has been proposed. This method applies the K-means algorithm to firstly cluster audio data in the feature space, and constructs multiple envelope predictors for each cluster accordingly using Support Vector Regression (SVR). A set of well-established audio features for Music Information Retrieval (MIR) has been used to characterize...

chapter

Low-rank and sparse soft targets to learn better DNN acoustic models

Pranay Dighe, Afsaneh Asaei, Herve Bourlard

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5265 - 5269

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize...

1 ...
5
6
7
8
9
10
11

Keywords:
TRAINING
HIDDEN MARKOV MODELS

Publication date

Set your own date range

Content availability

Available (2,958)
None (12)

Publication type

book (2,639)
article (331)

Keywords

SPEECH (1,022)
SPEECH RECOGNITION (842)
FEATURE EXTRACTION (802)
ACOUSTICS (487)
COMPUTATIONAL MODELING (353)
HIDDEN MARKOV MODEL (352)
DATA MODELS (340)
ACCURACY (330)
DATABASES (306)
SUPPORT VECTOR MACHINES (250)
DATA MINING (239)
TRAINING DATA (230)
HMM (221)
HANDWRITING RECOGNITION (191)
MATHEMATICAL MODEL (191)
TESTING (187)
NATURAL LANGUAGE PROCESSING (184)
VECTORS (180)
SPEECH PROCESSING (166)
ADAPTATION MODELS (159)
ARTIFICIAL NEURAL NETWORKS (159)
NEURAL NETWORKS (158)
SPEECH SYNTHESIS (151)
LEARNING (ARTIFICIAL INTELLIGENCE) (149)
AUTOMATIC SPEECH RECOGNITION (125)
CONTEXT (125)
DECODING (123)
MEL FREQUENCY CEPSTRAL COEFFICIENT (120)
SPEAKER RECOGNITION (116)
PROBABILITY (114)
TRAJECTORY (113)
ADAPTATION MODEL (112)
IMAGE SEGMENTATION (111)
HUMANS (108)
VOCABULARY (102)
ESTIMATION (101)
GESTURE RECOGNITION (96)
MARKOV PROCESSES (96)
MAXIMUM LIKELIHOOD ESTIMATION (96)
CLASSIFICATION ALGORITHMS (95)
GAUSSIAN PROCESSES (93)
VISUALIZATION (90)
NOISE (89)
ERROR ANALYSIS (88)
OPTIMIZATION (88)
PATTERN RECOGNITION (88)
PREDICTIVE MODELS (87)
CHARACTER RECOGNITION (86)
MACHINE LEARNING (86)
VITERBI ALGORITHM (86)
DICTIONARIES (83)
TEXT ANALYSIS (82)
ROBUSTNESS (79)
PATTERN CLASSIFICATION (78)
IMAGE RECOGNITION (77)
KERNEL (77)
JOINTS (71)
TAGGING (70)
CLUSTERING ALGORITHMS (69)
CONTEXT MODELING (69)
STATISTICAL ANALYSIS (69)
SHAPE (68)
FACE RECOGNITION (66)
NOISE MEASUREMENT (65)
IMAGE CLASSIFICATION (64)
NEURONS (63)
RECURRENT NEURAL NETWORKS (63)
STANDARDS (63)
DISCRIMINATIVE TRAINING (60)
SEMANTICS (60)
TRANSFORMS (60)
LABELING (59)
SENSORS (59)
SUPPORT VECTOR MACHINE (59)
ALGORITHM DESIGN AND ANALYSIS (57)
CONDITIONAL RANDOM FIELDS (56)
BAYES METHODS (55)
FACE (55)
PROBABILISTIC LOGIC (54)
DETECTORS (53)
HANDWRITTEN CHARACTER RECOGNITION (51)
IMAGE SEQUENCES (51)
NEURAL NETS (51)
PRINCIPAL COMPONENT ANALYSIS (49)
NATURAL LANGUAGES (48)
COMPUTER VISION (46)
CONFERENCES (46)
EQUATIONS (46)
SIGNAL TO NOISE RATIO (46)
ANALYTICAL MODELS (45)
CAMERAS (45)
BIOLOGICAL SYSTEM MODELING (44)
CORRELATION (44)
INFORMATION RETRIEVAL (44)
ACOUSTIC MODELING (43)
LATTICES (43)
MONITORING (43)
TEXT RECOGNITION (43)
more

INFONA - science communication portal

Advanced search

Advanced search in people

DNN acoustic models for dysarthric speech

Acoustic and language modeling for children's read speech assessment

Towards bootstrapping Acoustic Models for resource poor Indian languages

Implicit language identification system based on random forest and support vector machine for speech

Presentation slides generation from scientific papers using support vector regression

A novel face recognition method based on one state of discrete Hidden Markov Model

Radio-browsing for developmental monitoring in Uganda

Automatic node selection for Deep Neural Networks using Group Lasso regularization

Personalized acoustic modeling by weakly supervised multi-task deep learning using acoustic tokens discovered from unlabeled data

Advances in all-neural speech recognition

Lyric recognition in monophonic singing using pitch-dependent DNN

Topic identification of spoken documents using unsupervised acoustic unit discovery

Learning a hierarchical spatio-temporal model for human activity recognition

The 2016 BBN Georgian telephone speech keyword spotting system

Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding

Drum transcription from polyphonic music with recurrent neural networks

CNN architectures for large-scale audio classification

Improving audio-visual speech recognition using deep neural networks with dynamic stream reliability estimates

Blind bandwidth extension using K-means and Support Vector Regression

Low-rank and sparse soft targets to learn better DNN acoustic models

Filter options

Publication date

Content availability

Publication type

Keywords

INFONA - science communication portal

Advanced search

Advanced search in people

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Content availability

Publication type

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options