The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
State-of-the-art convolutional neural networks (CNNs) typically use a log-mel spectral representation of the speech signal. However, this representation is limited by the spectro-temporal resolution afforded by log-mel filter-banks. A novel technique known as Deep Scattering Spectrum (DSS) addresses this limitation and preserves higher resolution information, while ensuring time warp stability, through...
A plethora of different onset detection methods have been proposed in the recent years. However, few attempts have been made with respect to widely-applicable approaches in order to achieve superior performances over different types of music and with considerable temporal precision. In this paper, we present a multi-resolution approach based on discrete wavelet transform and linear prediction filtering...
Speech synthesis systems are typically built with speech data and transcriptions. In this paper, we try to build synthesis systems when no transcriptions or knowledge about the language are available. It is usually necessary to at least possess phonetic knowledge about the language. In this paper, we propose an automated way of obtaining phones and phonetic knowledge about the corpus at hand by making...
This paper describes the new release of RASR — the open source version of the well-proven speech recognition toolkit developed and used at RWTH Aachen University. The focus is put on the implementation of the NN module for training neural network acoustic models. We describe code design, configuration, and features of the NN module. The key feature is a high flexibility regarding the network topology,...
A novel gesture binary classification procedure is presented to determine surgical ability. To this aim a sensory glove was employed to track surgical hand movements and sensors data were recorded to be processed by a specific algorithm. The classification task was able to discriminate a gesture made by an expert surgeon with respect to a novice one, thanks to a two steps classification strategy....
Recent works have shown Neural Network based Language Models (NNLMs) to be an effective modeling technique for Automatic Speech Recognition. Prior works have shown that these models obtain lower perplexity and word error rate (WER) compared to both standard n-gram language models (LMs) and more advanced language models including maximum entropy and random forest LMs. While these results are compelling,...
This paper follows the recent advances in speech recognition which recommend replacing the standard hybrid GMM/HMM approach by deep neural architectures. These models were shown to drastically improve recognition performances, due to their ability to capture the underlying structure of data. However, they remain particularly complex since the entire temporal context of a given phoneme is learned with...
We describe a simple modification of neural networks which consists in extending the commonly used linear layer structure to an arbitrary graph structure. This allows us to combine the benefits of convolutional neural networks with the benefits of regular networks. The joint model has only a small increase in parameter size and training and decoding time are virtually unaffected. We report significant...
This paper explores asynchronous stochastic optimization for sequence training of deep neural networks. Sequence training requires more computation than frame-level training using pre-computed frame data. This leads to several complications for stochastic optimization, arising from significant asynchrony in model updates under massive parallelization, and limited data shuffling due to utterance-chunked...
This paper proposes a lattice-based sequential discriminative training method to extract more discriminative bottleneck features. In our method, the bottleneck neural network is first trained with cross entropy criteria, and then only the weights of bottleneck layer are retrained with sequential criteria. If the outputs of the layer before bottleneck are treated as the raw features, the new method...
A method for speaker normalization in deep neural network (DNN) based discriminative feature estimation for automatic speech recognition (ASR) is presented. This method is applied in the context of a DNN configured for auto-encoder based low dimensional bottleneck (AE-BN) feature extraction where the derived features are used as input to a continuous Gaussian density hidden Markov model (HMM/GMM)...
Recent advances in neural network training provide a way to efficiently learn representations from raw data. Good representations are an important requirement for Music Information Retrieval (MIR) tasks to be performed successfully. However, a major problem with neural networks is that training time becomes prohibitive for very large datasets and the learning algorithm can get stuck in local minima...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.