Recent developments in deep learning methods have greatly influenced the performance of speech recognition systems. In a Hidden Markov model-Deep neural network (HMM-DNN) based speech recognition system, DNNs are employed to model senones (context-dependent states of the HMM), while HMMs capture the temporal relations among senones. Due to the use of deeper networks, significant improvement...
The article presents studies on automatic whispery speech recognition. The research used a new corpus of whispery speech. It examined how speech recognition quality changes with varying sampling frequency and signal frame length. The optimal sampling frequency for whispery speech was found to be about 32–48 kHz, while the optimal signal frame length...
Cognitive impairments are an unavoidable community problem. People suffering from such disorders need all-day attention, and the difficulty of care varies with the type of disorder. What makes care harder in the case of autism is the frequent occurrence of self-aggressive behaviors. The monitoring system is supposed to detect such situations and differentiate them from similar normal activities...
Vocal Tract Length Normalization (VTLN) is a very important speaker normalization technique for speech recognition tasks. In this paper, we propose the use of Gaussian posteriorgrams of VTLN-warped spectral features for Query-by-Example Spoken Term Detection (QbE-STD). This paper presents the use of a Gaussian Mixture Model (GMM) framework for estimating the VTLN warping factor. This GMM framework...
In this paper, the development of a Multilingual Phone Recognition System (MPRS) in the context of Indian languages is described. MPRS is a language-independent Phone Recognition System (PRS) that can recognise the phonetic units present in a speech utterance of any language. We have developed two bilingual PRSs and one quadrilingual PRS using four Indian languages: Kannada, Telugu, Bengali, and Odia. International...
The article presents studies on automatic whispery speech recognition. The research used a new corpus of whispery speech. The aim of the studies presented in this paper was to check how the vocabulary size and the language model order influence speech recognition quality. It was concluded that even using recordings with only 5,000 different words it is possible...
In this paper, we propose a novel localization algorithm using LTE signaling data. Specifically, we use TA (Timing Advance) and RSRP (Reference Signal Received Power) data that are required by the LTE standard and already available in current LTE systems. The combination (TA, RSRP) is used as a signature, and one can expect that different locations will have distinctive signatures. Our real world...
To address the randomness and uncertainty of radar transmitter faults, a prognostic method based on the discrete Hidden Markov Model (DHMM) is proposed. In the paper, three monitoring parameters of the transmitter are collected and a discrete Hidden Markov Model is established. To achieve fast convergence, the Baum-Welch (B-W) algorithm is used to train the DHMM. Finally, the state probability transition...
To reduce data-storage costs and improve the accuracy of industrial process fault detection, a data-driven fault diagnosis method is proposed based on diffusion maps and a hidden Markov model. Firstly, the correlation dimension of the sample data is calculated. Secondly, the high-dimensional eigenvectors are mapped into a low-dimensional manifold space by diffusion maps. Finally, the low-dimensional eigenvectors...
Generating diverse questions for given images is an important task for computational education, entertainment and AI assistants. Unlike many conventional prediction tasks, it requires algorithms to generate a diverse set of plausible questions, a property we refer to as creativity. In this paper we propose a creative algorithm for visual question generation which combines the advantages...
Existing RNN-based approaches for action recognition from depth sequences require either skeleton joints or hand-crafted depth features as inputs. An end-to-end approach that maps raw depth maps to action classes is non-trivial to design because 1) a single-channel map lacks texture, which weakens its discriminative power, and 2) the set of depth training data is relatively small. To address these...
This work presents an iterative re-alignment approach applicable to visual sequence labelling tasks such as gesture recognition, activity recognition and continuous sign language recognition. Previous methods dealing with video data usually rely on given frame labels to train their classifiers. However, in recent data sets these labels often tend to be noisy, which is commonly overlooked. We...
This work presents a weakly supervised framework with deep neural networks for vision-based continuous sign language recognition, where ordered gloss labels, but no exact temporal locations, are available with the video of each sign sentence, and the amount of labeled sentences for training is limited. Our approach addresses the mapping of video segments to glosses by introducing recurrent convolutional...
Classification methods typically make use only of labeled data, in what is known as supervised learning. In some applications, however, labeled data is either scarce or costly to obtain. For these applications, unsupervised or semisupervised learning is appropriate, since it can make use of unlabeled data. This work proposes a new method for unsupervised and semisupervised learning of non-Gaussian...
Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it. A thorough understanding of videos requires going beyond appearance modeling and necessitates reasoning about the sequence of activities, as well as the higher-level constructs such as intentions. But how do we model and reason about these? We propose a fully-connected temporal CRF model for reasoning...
We present an approach for weakly supervised learning of human actions. Given a set of videos and an ordered list of the occurring actions, the goal is to infer the start and end frames of the related action classes within the video and to train the respective action classifiers without any need for hand-labeled frame boundaries. To address this task, we propose a combination of a discriminative representation...
A human action can be seen as a series of transitions between one's body poses over time, where each transition depicts a temporal relation between two poses. Recognizing actions thus involves learning a classifier sensitive to these pose transitions as well as to static poses. In this paper, we introduce a novel method called transitions forests, an ensemble of decision trees that both learn to discriminate static...
We propose a novel method for temporally pooling frames in a video for the task of human action recognition. The method is motivated by the observation that only a small number of frames, taken together, contain sufficient information to discriminate the action class present in a video from the rest. The proposed method learns to pool such discriminative and informative frames, while discarding...
Image captioning often requires a large set of training image-sentence pairs. In practice, however, acquiring sufficient training pairs is always expensive, which limits the ability of recent captioning models to describe objects outside of the training corpora (i.e., novel objects). In this paper, we present Long Short-Term Memory with Copying Mechanism (LSTM-C), a new architecture...
Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion,...