Search results

Items 81 to 100 of 816 results

chapter

Active learning for sound event classification by clustering unlabeled data

Zhao Shuyang, Toni Heittola, Tuomas Virtanen

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 751 - 755

This paper proposes a novel active learning method to save annotation effort when preparing material to train sound event classifiers. K-medoids clustering is performed on unlabeled sound segments, and medoids of clusters are presented to annotators for labeling. The annotated label for a medoid is used to derive predicted labels for other cluster members. The obtained labels are used to build a classifier...

chapter

Investigations on byte-level convolutional neural networks for language modeling in low resource speech recognition

Kazuki Irie, Pavel Golik, Ralf Schluter, Hermann Ney

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5740 - 5744

In this paper, we present an investigation on technical details of the byte-level convolutional layer which replaces the conventional linear word projection layer in the neural language model. In particular, we discuss and compare the effective filter configurations, pooling types and the use of bytes instead of characters. We carry out experiments on language packs released by the IARPA Babel project...

chapter

Hearing in a shoe-box: Binaural source position and wall absorption estimation using virtually supervised learning

Saurabh Kataria, Clement Gaultier, Antoine Deleforge

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 226 - 230

This paper introduces a new framework for supervised sound source localization referred to as virtually-supervised learning. An acoustic shoe-box room simulator is used to generate a large number of binaural single-source audio scenes. These scenes are used to build a dataset of spatial binaural features annotated with acoustic properties such as the 3D source position and the walls' absorption coefficients...

chapter

Respiratory airflow estimation from lung sounds based on regression

Elmar Messner, Martin Hagmuller, Paul Swatek, Freyja-Maria Smolle-Juttner, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 1123 - 1127

The aim of this work is the estimation of respiratory flow from lung sound recordings, i.e. acoustic airflow estimation. With a 16-channel lung sound recording device, we simultaneously record the respiratory flow and the lung sounds on the posterior chest from six lung-healthy subjects in supine position. For the recordings of four selected sensor positions, we extract linear frequency cepstral coefficient...

chapter

A network of deep neural networks for Distant Speech Recognition

Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4880 - 4884

Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially when adverse acoustic conditions characterized by non-stationary noises and reverberation are met.

chapter

Anuran call classification with deep learning

Julia Strout, Bryce Rogan, S.M. Mahdi Seyednezhad, Katrina Smart, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2662 - 2665

Ecologists can assess the health of flooded habitats or wetlands by studying the variations in the populations of bioindicators such as anurans (i.e., frogs and toads). To monitor anuran populations, ecologists manually identify anuran species from audio recordings. This identification task can be significantly streamlined by the availability of an automated method for anuran identification. Previous...

chapter

A comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition

Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schluter, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2462 - 2466

Recent experiments show that deep bidirectional long short-term memory (BLSTM) recurrent neural network acoustic models outperform feedforward neural networks for automatic speech recognition (ASR). However, their training requires a lot of tuning and experience. In this work, we provide a comprehensive overview over various BLSTM training aspects and their interplay within ASR, which has been missing...

chapter

On time-frequency mask estimation for MVDR beamforming with application in robust speech recognition

Xiong Xiao, Shengkui Zhao, Douglas L. Jones, Eng Siong Chng, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 3246 - 3250

Acoustic beamforming has played a key role in the robust automatic speech recognition (ASR) applications. Accurate estimates of the speech and noise spatial covariance matrices (SCM) are crucial for successfully applying the minimum variance distortionless response (MVDR) beamforming. Reliable estimation of time-frequency (TF) masks can improve the estimation of the SCMs and significantly improve...

chapter

A deep neural network integrated with filterbank learning for speech recognition

Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5480 - 5484

Deep neural networks (DNN) have achieved significant success in the field of speech recognition. One of the main advantages of the DNN is automatic feature extraction without human intervention. Therefore, we incorporate a pseudo-filterbank layer to the bottom of DNN and train the whole filterbank layer and the following networks jointly, while most systems take pre-defined mel-scale filterbanks as...

chapter

Extracting structural spectral features using what-where auto-encoders for statistical parametric speech synthesis

Ya-Jun Hu, Zhen-Hua Ling, Li-Rong Dai

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4915 - 4919

This paper presents a method to extract structural spectral features from spectral envelopes using what-where autoencoders (WWAE) for statistical parametric speech synthesis (SPSS). A WWAE is constructed by concatenating a convolutional net for input encoding and a deconvolutional net for reconstruction. The output values of the max-pooling layer in the encoder and the positions of the max-pooling...

chapter

Minimum Bayes risk training of CTC acoustic models in maximum a posteriori based decoding framework

Naoyuki Kanda, Xugang Lu, Hisashi Kawai

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4855 - 4859

When using connectionist temporal classification (CTC) based acoustic models (AMs) for large vocabulary continuous speech recognition (LVCSR), most previous studies have used a naive interpolation of the CTC-AM score and an additional language model score, although there is no theoretical justification for such an approach. On the other hand, we recently proposed a theoretically more sound decoding...

chapter

Very deep convolutional networks for end-to-end speech recognition

Yu Zhang, William Chan, Navdeep Jaitly

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4845 - 4849

Sequence-to-sequence models have shown success in end-to-end speech recognition. However these models have only used shallow acoustic encoder networks. In our work, we successively train very deep convolutional networks to add more expressive power and better generalization for end-to-end ASR models. We apply network-in-network principles, batch normalization, residual connections and convolutional...

chapter

Knowledge distillation for small-footprint highway networks

Liang Lu, Michelle Guo, Steve Renals

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4820 - 4824

Deep learning has significantly advanced state-of-the-art of speech recognition in the past few years. However, compared to conventional Gaussian mixture acoustic models, neural network models are usually much larger, and are therefore not very deployable in embedded devices. Previously, we investigated a compact highway deep neural network (HDNN) for acoustic modelling, which is a type of depth-gated...

chapter

Noisy objective functions based on the f-divergence

Markus Nussbaum-Thom, Ralf Schluter, Vaibhava Goel, Hermann Ney

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2327 - 2331

Dropout, the random dropping out of activations according to a specified rate, is a very simple but effective method to avoid over-fitting of deep neural networks to the training data.

chapter

Training variance and performance evaluation of neural networks in speech

Ewout van den Berg, Bhuvana Ramabhadran, Michael Picheny

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2287 - 2291

In this work we study variance in the results of neural network training on a wide variety of configurations in automatic speech recognition. Although this variance itself is well known, this is, to the best of our knowledge, the first paper that performs an extensive empirical study on its effects in speech recognition. We view training as sampling from a distribution and show that these distributions...

chapter

Combining unidirectional long short-term memory with convolutional output layer for high-performance speech synthesis

Wenfu Wang, Bo Xu

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5500 - 5504

In this paper, we target improving the accuracy of acoustic modelling for statistical parametric speech synthesis (SPSS) and introduce the convolutional neural network (CNN) due to its powerful capacity in locality modelling. A novel model architecture combining unidirectional long short-term memory (LSTM) and a time-domain convolutional output layer (COL) is proposed and employed to acoustic modelling...

chapter

Fast tagging of natural sounds using marginal co-regularization

Qiang Huang, Yong Xu, Philip J. B. Jackson, Wenwu Wang, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2991 - 2995

Automatic and fast tagging of natural sounds in audio collections is a very challenging task due to wide acoustic variations, the large number of possible tags, the incomplete and ambiguous tags provided by different labellers. To handle these problems, we use a co-regularization approach to learn a pair of classifiers on sound and text. The first classifier maps low-level audio features to a true...

chapter

Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features

Ondrej Klejch, Peter Bell, Steve Renals

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5700 - 5704

In this paper we present an extension of our previously described neural machine translation based system for punctuated transcription. This extension allows the system to map from per frame acoustic features to word level representations by replacing the traditional encoder in the encoder-decoder architecture with a hierarchical encoder. Furthermore, we show that a system combining lexical and acoustic...

chapter

Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection

Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5645 - 5649

We propose to use a feature representation obtained by pairwise learning in a low-resource language for query-by-example spoken term detection (QbE-STD). We assume that word pairs identified by humans are available in the low-resource target language. The word pairs are parameterized by a multi-lingual bottleneck feature (BNF) extractor that is trained using transcribed data in high-resource languages...

chapter

An LSTM-CTC based verification system for proxy-word based OOV keyword search

Zhiqiang Lv, Jian Kang, Wei-Qiang Zhang, Jia Liu

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5655 - 5659

Proxy-word based out of vocabulary (OOV) keyword search has been proven to be quite effective in keyword search. In proxy-word based OOV keyword search, each OOV keyword is assigned several proxies and detections of the proxies are regarded as detections of the OOV keywords. However, the confidence scores of these detections are still those of the proxies from lattices. To obtain a better confidence...

Filter options

Keywords:
TRAINING
ACOUSTICS

Content availability

Available (815)
Unavailable (1)

Keywords

SPEECH (482)
HIDDEN MARKOV MODELS (426)
SPEECH RECOGNITION (384)
FEATURE EXTRACTION (189)
DATA MODELS (128)
ACCURACY (88)
ADAPTATION MODELS (87)
TRAINING DATA (83)
NEURAL NETWORKS (82)
COMPUTATIONAL MODELING (76)
SPEECH PROCESSING (70)
ARTIFICIAL NEURAL NETWORKS (66)
SUPPORT VECTOR MACHINES (63)
AUTOMATIC SPEECH RECOGNITION (61)
DATABASES (58)
TESTING (54)
DECODING (49)
NATURAL LANGUAGE PROCESSING (46)
ADAPTATION MODEL (44)
ACOUSTIC SIGNAL PROCESSING (43)
VECTORS (43)
SPEAKER RECOGNITION (42)
CONTEXT (40)
DATA MINING (39)
MATHEMATICAL MODEL (38)
SIGNAL PROCESSING (38)
ACOUSTIC MODELING (37)
HIDDEN MARKOV MODEL (36)
NOISE (36)
DEEP NEURAL NETWORK (33)
SPEECH SYNTHESIS (33)
ERROR ANALYSIS (32)
ESTIMATION (32)
LATTICES (32)
DEEP NEURAL NETWORKS (31)
LEARNING (ARTIFICIAL INTELLIGENCE) (31)
ROBUSTNESS (30)
VOCABULARY (30)
DISCRIMINATIVE TRAINING (29)
MAXIMUM LIKELIHOOD ESTIMATION (29)
TRANSFORMS (28)
CLASSIFICATION ALGORITHMS (27)
VISUALIZATION (26)
ACOUSTIC MODEL (24)
DICTIONARIES (24)
KERNEL (23)
PATTERN RECOGNITION (23)
SIGNAL TO NOISE RATIO (22)
STANDARDS (22)
CONTEXT MODELING (21)
EMOTION RECOGNITION (21)
MACHINE LEARNING (21)
NOISE MEASUREMENT (21)
PROBABILITY (21)
SIGNAL PROCESSING ALGORITHMS (21)
CONFERENCES (20)
EQUATIONS (20)
ALGORITHM DESIGN AND ANALYSIS (19)
CLUSTERING ALGORITHMS (19)
EDUCATIONAL INSTITUTIONS (19)
HMM (19)
INDEXES (19)
MICROPHONES (19)
OPTIMIZATION (19)
COMPUTERS (18)
GAUSSIAN PROCESSES (18)
RECURRENT NEURAL NETWORKS (18)
CORRELATION (17)
COMPLEXITY THEORY (16)
COMPUTER ARCHITECTURE (16)
LANGUAGE MODEL (16)
NEURAL NETS (16)
DETECTORS (15)
GAUSSIAN MIXTURE MODEL (15)
SUPPORT VECTOR MACHINE CLASSIFICATION (15)
UNSUPERVISED LEARNING (15)
ACOUSTIC MEASUREMENTS (14)
EVENT DETECTION (14)
MEASUREMENT (14)
CONVOLUTION (13)
KEYWORD SEARCH (13)
MEL FREQUENCY CEPSTRAL COEFFICIENT (13)
PATTERN CLASSIFICATION (13)
PRAGMATICS (13)
PREDICTIVE MODELS (13)
SPEAKER ADAPTATION (13)
APPROXIMATION METHODS (12)
DNN (12)
LVCSR (12)
NIST (12)
PRINCIPAL COMPONENT ANALYSIS (12)
SILICON (12)
SUPPORT VECTOR MACHINE (12)
ENTROPY (11)
LABORATORIES (11)
SHAPE (11)
SPEECH CODING (11)
SPEECH ENHANCEMENT (11)

INFONA - science communication portal
