Search results

Items from 121 to 140 out of 816 results

1 ...
4
5
6
7
8
9
10

chapter

Hyperarticulation detection in repetitive voice queries using pairwise comparison for improved speech recognition

Ranjitha Gurunath Kulkarni, Ahmed El Kholy, Ziad Al Bawab, Noha Alon, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4985 - 4989

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Automatic speech recognition systems can benefit from cues in user voice such as hyperarticulation. Traditional approaches typically attempt to define and detect an absolute state of hyperarticulation, which is very difficult, especially on short voice queries. We present a novel approach for hyperarticulation detection using pairwise comparisons and demonstrate its application in a real-world speech...

chapter

Stimulated training for automatic speech recognition and keyword search in limited resource conditions

A. Ragni, C. Wu, M. J. F. Gales, J. Vasilakes, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4830 - 4834

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Training neural network acoustic models on limited quantities of data is a challenging task. A number of techniques have been proposed to improve generalisation. This paper investigates one such technique called stimulated training. It enables standard criteria such as cross-entropy to enforce spatial constraints on activations originating from different units. Having different regions being active...

chapter

Joint CTC-attention based end-to-end speech recognition using multi-task learning

Suyoun Kim, Takaaki Hori, Shinji Watanabe

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4835 - 4839

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. One approach is the attention-based encoder-decoder framework that learns a mapping between variable-length input and output sequences in one step using a purely data-driven method. The attention model has often been shown to improve the performance...

chapter

Multi-accent speech recognition with hierarchical grapheme based models

Kanishka Rao, Hasim Sak

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4815 - 4819

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We train grapheme-based acoustic models for speech recognition using a hierarchical recurrent neural network architecture with connectionist temporal classification (CTC) loss. The models learn to align utterances with phonetic transcriptions in a lower layer and graphemic transcriptions in the final layer in a multi-task learning setting. Using the grapheme predictions from a hierarchical model trained...

chapter

Trainable frontend for robust and far-field keyword spotting

Yuxuan Wang, Pascal Getreuer, Thad Hughes, Richard F. Lyon, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5670 - 5674

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Robust and far-field speech recognition is critical to enable true hands-free communication. In far-field conditions, signals are attenuated due to distance. To improve robustness to loudness variation, we introduce a novel frontend called per-channel energy normalization (PCEN). The key ingredient of PCEN is the use of an automatic gain control based dynamic compression to replace the widely used...

chapter

Unsupervised latent behavior manifold learning from acoustic features: Audio2behavior

Haoqi Li, Brian Baucom, Panayiotis Georgiou

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5620 - 5624

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Behavioral annotation using signal processing and machine learning is highly dependent on training data and manual annotations of behavioral labels. Previous studies have shown that speech information encodes significant behavioral information and be used in a variety of automated behavior recognition tasks. However, extracting behavior information from speech is still a difficult task due to the...

chapter

Improving latency-controlled BLSTM acoustic models for online speech recognition

Shaofei Xue, Zhijie Yan

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5340 - 5344

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Bidirectional long short-term memory (BLSTM) recurrent neural networks are powerful acoustic models in terms of recognition accuracy. When BLSTM acoustic models are used in decoding, the speech decoder needs to wait until the end of a whole sentence is reached, such that forward-propagation in the backward direction can then be performed. The nature of BLSTM acoustic models makes them inappropriate...

chapter

Exploring universal speech attributes for speaker verification

Sheng Zhang, Wu Guo, Guoping Hu

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5355 - 5359

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The universal speech attributes for speaker verification (SV) are addressed in this paper. The aim of this work is to exploit fundamental characteristics across different speakers within the deep neural network (DNN)/i-vector framework. The manner and place of articulation form the fundamental speech attribute unit inventory, and new attribute units for acoustic modelling are generated by a two-step...

chapter

Beamnet: End-to-end training of a beamformer-supported multi-channel ASR system

Jahn Heymann, Lukas Drude, Christoph Boeddeker, Patrick Hanebrink, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5325 - 5329

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper presents an end-to-end training approach for a beamformer-supported multi-channel ASR system. A neural network which estimates masks for a statistically optimum beamformer is jointly trained with a network for acoustic modeling. To update its parameters, we propagate the gradients from the acoustic model all the way through feature extraction and the complex valued beamforming operation...

chapter

Network architectures for multilingual speech representation learning

Tom Sercu, George Saon, Jia Cui, Xiaodong Cui, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5295 - 5299

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Multilingual (ML) representations play a key role in building speech recognition systems for low resource languages. The IARPA sponsored BABEL program focuses on building speech recognition (ASR) and keyword search (KWS) systems in over 24 languages with limited training data. The most common mechanism to derive ML representations in the BABEL program has been with the use of a two-stage network,...

chapter

The microsoft 2016 conversational speech recognition system

W. Xiong, J. Droppo, X. Huang, F. Seide, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5255 - 5259

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training...

chapter

Robust Automatic Recognition of Speech with background music

Jiri Malek, Jindrich Zdansky, Petr Cerva

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5210 - 5214

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper addresses the task of Automatic Speech Recognition (ASR) with music in the background, where the accuracy of recognition may deteriorate significantly. To improve the robustness of ASR in this task, e.g. for broadcast news transcription or subtitles creation, we adopt two approaches: 1) multi-condition training of the acoustic models and 2) denoising autoencoders followed by acoustic model...

chapter

Integrated DNN-based model adaptation technique for noise-robust speech recognition

Kang Hyun Lee, Woo Hyun Kang, Tae Gyoon Kang, Nam Soo Kim

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5245 - 5249

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Since the introduction of deep neural network (DNN)-based acoustic model, robust automatic speech recognition using DNN are being in research. Especially in model adaptation, the techniques utilizing auxiliary context features is known to be a promising technique. Recently, we proposed a technique which is called two-stage noise-aware training (TSNAT). The key idea of TS-NAT is to let the DNN clarify...

chapter

Adapting and controlling DNN-based speech synthesis using input codes

Hieu-Thi Luong, Shinji Takaki, Gustav Eje Henter, Junichi Yamagishi

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4905 - 4909

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Methods for adapting and controlling the characteristics of output speech are important topics in speech synthesis. In this work, we investigated the performance of DNN-based text-to-speech systems that in parallel to conventional text input also take speaker, gender, and age codes as inputs, in order to 1) perform multi-speaker synthesis, 2) perform speaker adaptation using small amounts of target-speaker...

chapter

Visual features for context-aware speech recognition

Abhinav Gupta, Yajie Miao, Leonardo Neves, Florian Metze

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5020 - 5024

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Automatic transcriptions of consumer generated multi-media content such as “Youtube” videos still exhibit high word error rates. Such data typically occupies a very broad domain, has been recorded in challenging conditions, with cheap hardware and a focus on the visual modality, and may have been post-processed or edited.

chapter

Joint optimisation of tandem systems using Gaussian mixture density neural network discriminative sequence training

C. Zhang, P. C. Woodland

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5015 - 5019

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The use of deep neural networks (DNNs) for feature extraction and Gaussian mixture models (GMMs) for acoustic modelling is often termed a tandem system configuration and can be viewed as a Gaussian mixture density neural network (MDNN). Compared to the direct use of DNN output probabilities in the acoustic model, the tandem approach suffers from a major weakness in that the feature extraction stage...

chapter

Automatic speech emotion recognition using recurrent neural networks with local attention

Seyedmahdad Mirsamadi, Emad Barsoum, Cha Zhang

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2227 - 2231

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Automatic emotion recognition from speech is a challenging task which relies heavily on the effectiveness of the speech features used for classification. In this work, we study the use of deep learning to automatically discover emotionally relevant features from speech. It is shown that using a deep recurrent neural network, we can learn both the short-time frame-level acoustic features that are emotionally...

chapter

Clinical informatics: mining of pathological data by acoustic analysis

Zulfiqar Ali, Mansour Alsulaiman, Ghulam Muhammad, Ahmed Al-nasheri, more

2017 International Conference on Informatics, Health & Technology (ICIHT) > 1 - 8

2017 International Conference on Informatics, Health & Technology (ICIHT)

Data mining has a great potential in different areas of health informatics. Data mining in health industry can minimize the health cost as well as reduces the risk of life by informing a person at initial stage. An automatic classification system capable of mining pathological data may contribute in health informatics significantly. In this paper, an automatic system to differentiate between pathological...

chapter

Low-power neuromorphic speech recognition engine with coarse-grain sparsity

Shunti Yin, Deepak Kadetotad, Bonan Yan, Chang Song, more

2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC) > 111 - 114

2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC)

In recent years, we have seen a surge of interest in neuromorphic computing and its hardware design for cognitive applications. In this work, we present new neuromorphic architecture, circuit, and device co-designs that enable spike-based classification for speech recognition task. The proposed neuromorphic speech recognition engine supports a sparsely connected deep spiking network with coarse granularity,...

chapter

An extended experimental investigation of DNN uncertainty propagation for noise robust ASR

Karan Nathwani, Juan A. Morales-Cordovilla, Sunit Sivasankaran, Irina Illina, more

2017 Hands-free Speech Communications and Microphone Arrays (HSCMA) > 26 - 30

2017 Hands-free Speech Communications and Microphone Arrays (HSCMA)

Automatic speech recognition (ASR) in noisy environments remains a challenging goal. Recently, the idea of estimating the uncertainty about the features obtained after speech enhancement and propagating it to dynamically adapt deep neural network (DNN) based acoustic models has raised some interest. However, the results in the literature were reported on simulated noisy datasets for a limited variety...

1 ...
4
5
6
7
8
9
10

Keywords:
TRAINING
ACOUSTICS

Publication date

Set your own date range

Content availability

Available (815)
None (1)

Keywords

SPEECH (482)
HIDDEN MARKOV MODELS (426)
SPEECH RECOGNITION (384)
FEATURE EXTRACTION (189)
DATA MODELS (128)
ACCURACY (88)
ADAPTATION MODELS (87)
TRAINING DATA (83)
NEURAL NETWORKS (82)
COMPUTATIONAL MODELING (76)
SPEECH PROCESSING (70)
ARTIFICIAL NEURAL NETWORKS (66)
SUPPORT VECTOR MACHINES (63)
AUTOMATIC SPEECH RECOGNITION (61)
DATABASES (58)
TESTING (54)
DECODING (49)
NATURAL LANGUAGE PROCESSING (46)
ADAPTATION MODEL (44)
ACOUSTIC SIGNAL PROCESSING (43)
VECTORS (43)
SPEAKER RECOGNITION (42)
CONTEXT (40)
DATA MINING (39)
MATHEMATICAL MODEL (38)
SIGNAL PROCESSING (38)
ACOUSTIC MODELING (37)
HIDDEN MARKOV MODEL (36)
NOISE (36)
DEEP NEURAL NETWORK (33)
SPEECH SYNTHESIS (33)
ERROR ANALYSIS (32)
ESTIMATION (32)
LATTICES (32)
DEEP NEURAL NETWORKS (31)
LEARNING (ARTIFICIAL INTELLIGENCE) (31)
ROBUSTNESS (30)
VOCABULARY (30)
DISCRIMINATIVE TRAINING (29)
MAXIMUM LIKELIHOOD ESTIMATION (29)
TRANSFORMS (28)
CLASSIFICATION ALGORITHMS (27)
VISUALIZATION (26)
ACOUSTIC MODEL (24)
DICTIONARIES (24)
KERNEL (23)
PATTERN RECOGNITION (23)
SIGNAL TO NOISE RATIO (22)
STANDARDS (22)
CONTEXT MODELING (21)
EMOTION RECOGNITION (21)
MACHINE LEARNING (21)
NOISE MEASUREMENT (21)
PROBABILITY (21)
SIGNAL PROCESSING ALGORITHMS (21)
CONFERENCES (20)
EQUATIONS (20)
ALGORITHM DESIGN AND ANALYSIS (19)
CLUSTERING ALGORITHMS (19)
EDUCATIONAL INSTITUTIONS (19)
HMM (19)
INDEXES (19)
MICROPHONES (19)
OPTIMIZATION (19)
COMPUTERS (18)
GAUSSIAN PROCESSES (18)
RECURRENT NEURAL NETWORKS (18)
CORRELATION (17)
COMPLEXITY THEORY (16)
COMPUTER ARCHITECTURE (16)
LANGUAGE MODEL (16)
NEURAL NETS (16)
DETECTORS (15)
GAUSSIAN MIXTURE MODEL (15)
SUPPORT VECTOR MACHINE CLASSIFICATION (15)
UNSUPERVISED LEARNING (15)
ACOUSTIC MEASUREMENTS (14)
EVENT DETECTION (14)
MEASUREMENT (14)
CONVOLUTION (13)
KEYWORD SEARCH (13)
MEL FREQUENCY CEPSTRAL COEFFICIENT (13)
PATTERN CLASSIFICATION (13)
PRAGMATICS (13)
PREDICTIVE MODELS (13)
SPEAKER ADAPTATION (13)
APPROXIMATION METHODS (12)
DNN (12)
LVCSR (12)
NIST (12)
PRINCIPAL COMPONENT ANALYSIS (12)
SILICON (12)
SUPPORT VECTOR MACHINE (12)
ENTROPY (11)
LABORATORIES (11)
SHAPE (11)
SPEECH CODING (11)
SPEECH ENHANCEMENT (11)
more

INFONA - science communication portal

Search results

Hyperarticulation detection in repetitive voice queries using pairwise comparison for improved speech recognition

Stimulated training for automatic speech recognition and keyword search in limited resource conditions

Joint CTC-attention based end-to-end speech recognition using multi-task learning

Multi-accent speech recognition with hierarchical grapheme based models

Trainable frontend for robust and far-field keyword spotting

Unsupervised latent behavior manifold learning from acoustic features: Audio2behavior

Improving latency-controlled BLSTM acoustic models for online speech recognition

Exploring universal speech attributes for speaker verification

Beamnet: End-to-end training of a beamformer-supported multi-channel ASR system

Network architectures for multilingual speech representation learning

The microsoft 2016 conversational speech recognition system

Robust Automatic Recognition of Speech with background music

Integrated DNN-based model adaptation technique for noise-robust speech recognition

Adapting and controlling DNN-based speech synthesis using input codes

Visual features for context-aware speech recognition

Joint optimisation of tandem systems using Gaussian mixture density neural network discriminative sequence training

Automatic speech emotion recognition using recurrent neural networks with local attention

Clinical informatics: mining of pathological data by acoustic analysis

Low-power neuromorphic speech recognition engine with coarse-grain sparsity

An extended experimental investigation of DNN uncertainty propagation for noise robust ASR

Filter options

Publication date

Content availability

Keywords

INFONA - science communication portal

Search results

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Content availability

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options