A biometric system is a pattern recognition system that automatically identifies people according to their physiological and behavioral properties. Among the physiological properties, the hand has a special place, because all of its features, such as palm lines, inner knuckles, external knuckles and geometry, can be used. More recently, the usage of the blood vessel pattern in the palm, in addition to the high acceptability,...
In statistical parametric speech synthesis (SPSS), a few studies have investigated the Lombard effect, specifically by using hidden Markov model (HMM)-based systems. Recently, artificial neural networks have demonstrated promising results in SPSS, specifically by using long short-term memory recurrent neural networks (LSTMs). The Lombard effect, however, has not been studied in the LSTM-based speech...
Constructing deep neural network (DNN) acoustic models from limited training data is an important issue for the development of automatic speech recognition (ASR) applications that will be used in various application-specific acoustic environments. To this end, domain adaptation techniques that train a domain-matched model without overfitting by leveraging pre-constructed source models are widely...
Adapting acoustic models to speakers has been shown to greatly improve performance for many tasks. Among the adaptation approaches, exploiting auxiliary features characterizing speakers or environments has received great attention because they allow rapid adaptation, i.e. adaptation with a limited amount of speech data such as a single utterance. However, the auxiliary features are usually computed in batch...
We train grapheme-based acoustic models for speech recognition using a hierarchical recurrent neural network architecture with connectionist temporal classification (CTC) loss. The models learn to align utterances with phonetic transcriptions in a lower layer and graphemic transcriptions in the final layer in a multi-task learning setting. Using the grapheme predictions from a hierarchical model trained...
We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training...
Speech recognition in varying background conditions is a challenging problem. Acoustic condition mismatch between training and evaluation data can significantly reduce recognition performance. For mismatched conditions, data-adaptation techniques are typically found to be useful, as they expose the acoustic model to the new data condition(s). Supervised adaptation techniques usually provide substantial...
Methods for adapting and controlling the characteristics of output speech are important topics in speech synthesis. In this work, we investigated the performance of DNN-based text-to-speech systems that in parallel to conventional text input also take speaker, gender, and age codes as inputs, in order to 1) perform multi-speaker synthesis, 2) perform speaker adaptation using small amounts of target-speaker...
Automatic transcriptions of consumer-generated multimedia content such as “YouTube” videos still exhibit high word error rates. Such data typically occupies a very broad domain, has been recorded in challenging conditions, with cheap hardware and a focus on the visual modality, and may have been post-processed or edited.
Recently, the low-rank plus diagonal (LRPD) adaptation was proposed for speaker adaptation of deep neural network (DNN) models. The LRPD restructures the adaptation matrix as a superposition of a diagonal matrix and a product of two low-rank matrices. In this paper, we extend the LRPD adaptation into the subspace-based approach to further reduce the speaker-dependent (SD) footprint. We apply the extended...
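The decomposition named in this abstract can be illustrated with a minimal sketch. It assumes a square n×n adaptation matrix and a rank k chosen by the user; the function names, the identity initialization, and the sizes below are hypothetical illustrations, not taken from the paper.

```python
def lrpd_matrix(diag, A, B):
    """Build W = diag(d) + A @ B, where A is n x k and B is k x n.

    Illustrative only: the names and layout are assumptions,
    not the paper's implementation.
    """
    n = len(diag)
    k = len(B)
    assert len(A) == n and all(len(row) == k for row in A)
    assert all(len(row) == n for row in B)
    # Low-rank part: (A @ B)[i][j] = sum over r of A[i][r] * B[r][j]
    W = [[sum(A[i][r] * B[r][j] for r in range(k)) for j in range(n)]
         for i in range(n)]
    # Diagonal part: add d[i] to each diagonal entry
    for i in range(n):
        W[i][i] += diag[i]
    return W


def lrpd_param_count(n, k):
    """Parameters stored per speaker: n (diagonal) + 2 * n * k (two factors)."""
    return n + 2 * n * k


# Assumed initialization: unit diagonal and zero low-rank factors,
# so the adaptation matrix starts out as the identity (a no-op).
n, k = 4, 1
diag = [1.0] * n
A = [[0.0] for _ in range(n)]
B = [[0.0] * n]
W = lrpd_matrix(diag, A, B)

full_params = 1024 * 1024                  # unconstrained 1024 x 1024 matrix
lrpd_params = lrpd_param_count(1024, 8)    # same layer size with rank 8
```

For a hypothetical 1024-unit layer with rank 8, the sketch stores 17,408 values per speaker instead of 1,048,576, which is the kind of footprint reduction the low-rank plus diagonal structure is meant to give.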
Recognising detailed clothing characteristics (fine-grained attributes) in unconstrained images of people in-the-wild is a challenging task for computer vision, especially when there is only limited training data from the wild whilst most data available for model learning are captured in well-controlled environments using fashion models (well lit, no background clutter, frontal view, high-resolution)...
This paper proposes a novel learning-based image super-resolution via a weighted random forest model (SWRF). The proposed method uses the LR-HR training data to train a random forest model. The underlying idea of this approach is to use several decision trees to classify the training data based on a simple splitting threshold value at each class. A linear regression model is learnt to map the relationship...
While semantic visual attributes have been shown useful for a variety of tasks, many attributes are difficult to model computationally. One of the reasons for this difficulty is that it is not clear where in an image the attribute lives. We propose to tackle this problem by involving humans more directly in the process of learning an attribute model. We ask humans to examine a set of images to determine...
In this work, we decompose a first-person action into verb and noun. We then study how the coupling of an action's constituent verb and noun affects the learners' ability to learn them separately and to combine them to perform recognition. We compare different information fusion methods on conventional action recognition and zero-shot learning, of which the latter is a strong indication of the feature's...
Semantic labels are crucial parts of many location-based applications. Previous efforts in location-based systems have mostly focused on achieving high accuracy in localization or navigation, with the assumption that the mapping between the locations and the semantic labels is given or will be done manually. In this paper, we propose a system called Deep-Crowd-Label that automatically assigns...
Conventional machine learning algorithms based on keystroke dynamics build a classifier from labeled data in one or more sessions but assume that the dataset at the time of verification exhibits the same distribution. Ideally, the keystroke data collected at a session is expected to be an invariant representation of an individual's behavioral biometrics. In real applications, however, the data is...
The spoken language understanding (SLU) pilot task of the Dialog State Tracking Challenge 5 (DSTC5) requires classifying label sets of speech acts on human-to-human dialogues. In this paper, we propose a multi-label classification model with the assistance of an algorithm adaptation method. To be specific, a Convolutional Neural Network (CNN) model on top of pre-trained word vectors...
Recent studies show that increasing numbers of design bugs are escaping to post-silicon due to the complexity of advanced designs and the lack of adequate verification tools that can validate complex electrical interactions between electrical subsystems on an integrated circuit. In this paper, we present a novel tool for post-silicon validation of mixed-signal/RF circuits through cooperative test...
Deep neural networks are an intensively researched field of artificial intelligence. Big companies like Google, Microsoft, Baidu or Facebook are supporting research and development in this field. The recent victory over a human player in the game of Go points to a huge potential of this approach. Machine learning approaches based on deep learning techniques bring significant gains over existing methods...
Speaker Transformation adapts the speaker-dependent characteristics of the source speaker according to those of a target speaker, so that the speech is perceived as that of the target speaker. Speaker Transformation is generally carried out using a speech analysis-synthesis system. The full-band adaptive Harmonic Model (a-HM) based analysis-synthesis has the ability to produce high-quality resynthesized speech. Thus...