In the present work, we introduce a new probabilistic model for the task of estimating beat positions in a musical audio recording, instantiating the Conditional Random Field (CRF) framework. Our approach takes its strength from a sophisticated temporal modeling of the audio observations, accounting for local tempo variations which are readily represented in the CRF model proposed using well-chosen...
This paper presents a feedback framework that can improve chord recognition for music audio signals by performing approximate note transcription with Bayesian non-negative matrix factorization (NMF) using prior knowledge on chords. Although the names and note compositions of chords are intrinsically linked with each other (e.g., C major chords are highly likely to include C, E, and G notes, and those...
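As background for the abstract above, note-activation estimation with NMF can be sketched as follows. This is an illustrative plain NMF with multiplicative updates, not the paper's Bayesian formulation with chord priors; all names and shapes are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Factorise a magnitude spectrogram V (freq x time) as V ~= W @ H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-3   # note spectral templates
    H = rng.random((rank, T)) + 1e-3   # note activations over time
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius-norm objective;
        # non-negativity is preserved because every factor is non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H
```

In a chord-informed variant such as the one the abstract describes, the templates and activations would additionally be constrained by priors derived from chord labels.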
Audio segmentation is an essential problem in many audio signal processing tasks, which tries to segment an audio signal into homogeneous chunks. Rather than separately finding change points and computing similarities between segments, we focus on joint segmentation and clustering, using the framework of hidden Markov and semi-Markov models. We introduce a new incremental EM algorithm for hidden Markov...
Both the consumer market and the manufacturing industry make heavy use of 1D (linear) barcodes. From helping the visually impaired, to identifying products, to automated industrial inventory management, barcodes are the prevalent item-tracing technology. Because of this ubiquitous use, many algorithms have been proposed in recent years targeting barcode decoding on highly accessible devices...
Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve a further performance improvement, this research investigates deep extensions of LSTM, considering that deep hierarchical models have proved more efficient than shallow ones. Motivated by previous research on constructing...
Traditionally, speech recognizers have used a strictly Bayesian paradigm for finding the best hypothesis from amongst all possible hypotheses for the data to be recognized. The Bayes classification rule has been shown to be optimal when the class distributions represent the true distributions of the data to be classified. In reality, however, this condition is often not satisfied - the classifier...
This paper explores the use of auditory features based on cochleograms: two-dimensional speech features derived from gammatone filters, used within the convolutional neural network (CNN) framework. Furthermore, we also propose various possibilities for combining cochleogram features with log-mel filter banks or spectrogram features. In particular, we combine them at low and high levels of the CNN framework, which...
Automatic speech recognition from distant microphones is a difficult task because recordings are affected by reverberation and background noise. First, the application of the deep neural network (DNN)/hidden Markov model (HMM) hybrid acoustic models for distant speech recognition task using AMI meeting corpus is investigated. This paper then proposes a feature transformation for removing reverberation...
Vector Taylor Series (VTS) based model compensation approach has been successfully applied to various robust speech recognition tasks. In this paper, we propose a novel method of variable transformation to calculate the static statistics. In addition, we provide a detailed explanation of VTS and random variable transformations adopted in some recent papers. Experiments on Aurora 4 showed that the...
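For readers unfamiliar with VTS, the mismatch function it linearises can be sketched as follows. This is a generic textbook sketch of first-order VTS in the log-spectral domain, not the variable-transformation method the paper proposes; function names are hypothetical.

```python
import math

def mismatch(x, n):
    """Noisy log-spectral energy y = x + log(1 + exp(n - x)),
    for clean speech x and additive noise n (channel term omitted)."""
    return x + math.log1p(math.exp(n - x))

def jacobian_G(mu_x, mu_n):
    """df/dx evaluated at the means: G = 1 / (1 + exp(mu_n - mu_x)).
    First-order VTS then approximates
    y ~= mismatch(mu_x, mu_n) + G*(x - mu_x) + (1 - G)*(n - mu_n)."""
    return 1.0 / (1.0 + math.exp(mu_n - mu_x))
```

The linearisation lets clean-speech Gaussian statistics be mapped to noisy-speech statistics in closed form, which is the basis of VTS model compensation.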
A large body of research has shown that acoustic features for speech recognition can be learned from data using neural networks with multiple hidden layers (DNNs) and that these learned features are superior to standard features (e.g., MFCCs). However, this superiority is usually demonstrated when the data used to learn the features is very similar in character to the data used to test recognition...
The use of context-dependent targets has become standard in hybrid DNN systems for automatic speech recognition. However, we argue that despite the use of state-tying, optimising to context-dependent targets can lead to over-fitting, and that discriminating between arbitrary tied context-dependent targets may not be optimal. We propose a multitask learning method where the network jointly predicts...
“Human BeatBox” (HBB) is a newly expanding contemporary singing style in which the vocalist imitates percussive drum sounds as well as pitched musical instrument sounds. Drum sounds typically use a notation based on plosives and fricatives, and instrument sounds cover vocalisations that go beyond spoken-language vowels. HBB hence constitutes an interesting use case for expanding techniques initially...
A new type of deep neural network (DNN) is presented in this paper. Traditional DNNs use multinomial logistic regression (softmax activation) at the top layer for classification. The new DNN instead uses a support vector machine (SVM) at the top layer. Two training algorithms, at the frame and sequence level, are proposed to learn the parameters of the SVM and the DNN under maximum-margin criteria. In the...
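A minimal sketch of the kind of maximum-margin objective involved, assuming the common Crammer-Singer multiclass hinge loss on per-class scores (the paper's exact frame- and sequence-level algorithms are not reproduced here):

```python
def multiclass_hinge(scores, label, margin=1.0):
    """Crammer-Singer multiclass hinge loss for one frame.
    `scores` are per-class outputs of the SVM top layer; the loss
    penalises the most-violating competitor class."""
    competitor = max(s for c, s in enumerate(scores) if c != label)
    return max(0.0, margin + competitor - scores[label])
```

Minimising this loss pushes the correct class score at least one margin above every other class, in place of the softmax cross-entropy used in traditional DNNs.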
Deep Neural Network (DNN) has become a standard method in many ASR tasks. Recently there is considerable interest in “informed training” of DNNs, where DNN input is augmented with auxiliary codes, such as i-vectors, speaker codes, speaker separation bottleneck (SSBN) features, etc. This paper compares different speaker informed DNN training methods in LVCSR task. We discuss mathematical equivalence...
We explore alternative acoustic modeling techniques for large vocabulary speech recognition using Long Short-Term Memory recurrent neural networks. For an acoustic frame labeling task, we compare the conventional approach of cross-entropy (CE) training using fixed forced-alignments of frames and labels, with the Connectionist Temporal Classification (CTC) method proposed for labeling unsegmented sequence...
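As background, the many-to-one collapsing function at the heart of CTC can be sketched as follows (a minimal sketch using "-" as the blank symbol; this is the standard CTC definition, not code from the paper):

```python
BLANK = "-"

def ctc_collapse(path):
    """CTC's mapping B: merge repeated symbols, then drop blanks.
    Many frame-level paths map to the same label sequence, which is
    what lets CTC train on unsegmented label sequences."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)
```

For example, the frame path "--hh-e-ll-lo--" collapses to the label sequence "hello"; CTC training sums probability over all such paths rather than committing to one fixed alignment.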
State-of-the-art automatic speech recognition systems model the relationship between the acoustic speech signal and phone classes in two stages, namely, extraction of spectral-based features based on prior knowledge, followed by training of an acoustic model, typically an artificial neural network (ANN). In our recent work, it was shown that Convolutional Neural Networks (CNNs) can model phone classes from...
In the hybrid approach, neural network output directly serves as hidden Markov model (HMM) state posterior probability estimates. In contrast, in the tandem approach neural network output is used as input features to improve classic Gaussian mixture model (GMM) based emission probability estimates. This paper shows that GMMs can be easily integrated into the deep neural network framework. By...
This paper proposes a novel parameter generation algorithm for high-quality speech generation in Hidden Markov Model (HMM)-based speech synthesis. One of the biggest issues causing significant quality degradation is the over-smoothing effect often observed in generated parameter trajectories. Global Variance (GV) is known as a feature well correlated with the over-smoothing effect and a metric on...
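The Global Variance statistic referred to above can be sketched for a single scalar parameter trajectory (a minimal illustration of the standard per-utterance variance; the paper's generation algorithm itself is not reproduced):

```python
def global_variance(trajectory):
    """Per-utterance variance of one generated parameter trajectory.
    Over-smoothed HMM-generated trajectories show reduced GV compared
    with natural speech, which correlates with perceived quality loss."""
    n = len(trajectory)
    mean = sum(trajectory) / n
    return sum((v - mean) ** 2 for v in trajectory) / n
```

GV-aware generation methods add a term to the objective that keeps this variance close to the value measured on natural speech.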
This paper proposes a novel approach for directly-modeling speech at the waveform level using a neural network. This approach uses the neural network-based statistical parametric speech synthesis framework with a specially designed output layer. As acoustic feature extraction is integrated to acoustic model training, it can overcome the limitations of conventional approaches, such as two-step (feature...
Even the best statistical parametric speech synthesis systems do not achieve the naturalness of good unit selection. We investigated possible causes of this. By constructing speech signals that lie in between natural speech and the output from a complete HMM synthesis system, we investigated various effects of modelling. We manipulated the temporal smoothness and the variance of the spectral parameters...