Increasing demand for endovascular interventions has motivated technical skill training and competency-based performance measures. However, there are no well-established online metrics for technical skill assessment, and few studies have explored operator behavioral patterns from catheter motion and operator hand motion. This paper proposes a platform for active online training and objective assessment...
Deep learning approaches have been used for classification in several applications with high-dimensional input data. In this paper, we investigate the potential of deep learning for classifying affective touch on robotic skin in a social setting. Three models are considered: a convolutional neural network, a convolutional-recurrent neural network and an autoencoder-recurrent neural network...
A parallel corpus aligned at both sentence and word level is an important prerequisite for statistical machine translation. However, manually creating such a corpus is time-consuming and requires experts fluent in both languages. This paper presents the first empirical evaluation aimed at identifying the best unsupervised word alignment technique for Sinhala and Tamil. It also presents...
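For readers unfamiliar with unsupervised word alignment, IBM Model 1 trained with expectation-maximization (EM) is one classic technique of the kind such an evaluation compares. The sketch below is a generic, simplified illustration (no NULL word, a toy German-English corpus chosen only for illustration), not the setup or the Sinhala-Tamil data used in the paper. The function names `ibm_model1` and `best_alignment` are hypothetical.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t(e | f) with EM (IBM Model 1).

    pairs: list of (source_tokens, target_tokens). Simplified: no NULL word.
    """
    target_vocab = {e for _, tgt in pairs for e in tgt}
    t = defaultdict(lambda: 1.0 / len(target_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        for src, tgt in pairs:
            for e in tgt:
                # E-step: share each target word among the source words
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise the expected counts per source word
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def best_alignment(src, tgt, t):
    """Link each target position j to the source position i maximising t(e_j | f_i)."""
    return [(j, max(range(len(src)), key=lambda i: t[(e, src[i])]))
            for j, e in enumerate(tgt)]


# Toy parallel corpus (illustrative only)
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = ibm_model1(corpus)
print(best_alignment(["das", "haus"], ["the", "house"], t))
```

Although "das" never appears without "the", EM only gradually concentrates probability mass on the consistent pairs; real evaluations use far larger corpora and stronger models (e.g. later IBM models or HMM alignment).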
Novelty detection is the task of recognising events that differ from a model of normality. This paper proposes an acoustic novelty detector based on neural networks trained with an adversarial training strategy. The proposed approach is composed of a feature extraction stage that calculates Log-Mel spectral features from the input signal. Then, an autoencoder network, trained on a corpus of “normal”...
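Log-Mel spectral features like those mentioned above are typically computed by pooling a power spectrum through triangular filters spaced evenly on the mel scale and then taking the logarithm. The following is a minimal pure-Python sketch of that generic step, assuming the common HTK-style mel formula 2595·log10(1 + f/700). The function names and the parameter values (26 filters, 512-point FFT, 16 kHz) are illustrative assumptions, not the paper's configuration.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters over the n_fft // 2 + 1 bins of a one-sided power spectrum."""
    f_max = sample_rate / 2.0 if f_max is None else f_max
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_filters + 2 equally spaced mel points: left edge, centres, right edge
    mel_points = [m_lo + i * (m_hi - m_lo) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    n_bins = n_fft // 2 + 1
    fbank = [[0.0] * n_bins for _ in range(n_filters)]
    for j in range(n_filters):
        left, centre, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, centre):  # rising slope
            fbank[j][k] = (k - left) / (centre - left)
        for k in range(centre, min(right, n_bins)):  # falling slope, 1.0 at the centre
            fbank[j][k] = (right - k) / (right - centre)
    return fbank


def log_mel_energies(power_spectrum, fbank, eps=1e-10):
    """Weighted power in each mel band, followed by a log (eps avoids log(0))."""
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps) for row in fbank]


fb = mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000)
```

In practice the power spectrum comes from a windowed short-time FFT of each frame, and the resulting per-frame log-Mel vectors form the input to the downstream network.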
In this paper, we investigate high-resolution modeling units of deep neural networks (DNNs), from concrete to abstract, for acoustic scene classification based on a Gaussian mixture model (GMM) and an ergodic hidden Markov model (HMM). A direct modeling strategy for a DNN to classify acoustic scenes is to map each frame feature of an audio clip to one scene category. However, all frames tagged with the same label...
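Under the direct frame-level strategy described above, a clip-level decision still has to be derived from the per-frame outputs. One common, simple rule, assumed here for illustration rather than taken from the paper, sums the per-frame log posteriors (treating frames as independent) and picks the highest-scoring scene. `classify_clip` and the scene labels are hypothetical.

```python
import math


def classify_clip(frame_posteriors, labels, floor=1e-12):
    """Return the label with the largest sum of per-frame log posteriors.

    frame_posteriors: one list of class probabilities per frame (e.g. softmax outputs).
    floor guards against log(0).
    """
    scores = [sum(math.log(max(frame[c], floor)) for frame in frame_posteriors)
              for c in range(len(labels))]
    best = max(range(len(labels)), key=scores.__getitem__)
    return labels[best]


frames = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6]]
print(classify_clip(frames, ["park", "street"]))
```

Averaging the posteriors or taking a majority vote over per-frame decisions are equally simple alternatives; they can disagree with this rule when frame decisions are split.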
In this paper, we present Rapid Activity Prediction Through Object-oriented Regression (RAPTOR), a scalable method for performing rapid, real-time activity recognition and prediction that achieves state-of-the-art classification accuracy on both a generic human activity dataset and two domain-specific collaborative robotics manufacturing datasets. Our approach is designed to be human-interpretable:...
One service provided by our application, the ‘Speech Assistant System’, which helps teach hearing-impaired people to speak, is the automatic assessment of words and sentences during practice, with feedback to the learner. Individual speech sounds can be evaluated correctly only if they are compared with the appropriate reference speech sounds. This requires segmenting the speech to be examined...
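Comparing a learner's utterance with a reference recording is often done with dynamic time warping (DTW), which aligns two feature sequences of different lengths. The resulting warping path can then carry known segment boundaries from the reference over to the utterance. DTW is named here as a plausible technique, not as the method the system uses. Scalar values stand in for real acoustic feature vectors, and `dtw` is a hypothetical helper name.

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping between sequences a and b.

    Returns (total cost, warping path as a list of (index_in_a, index_in_b)).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor cell
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path


reference = [1, 2, 3]
utterance = [1, 2, 2, 3]
cost, path = dtw(reference, utterance)
```

A reference frame that lies on a known sound boundary maps through `path` to the corresponding utterance frames, which gives a segmentation of the utterance.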
Recently, bottleneck features have been used successfully as effective representations in Speaker Recognition (SR) and Language Recognition (LR), but little work has focused on bottleneck features for Bird Species Verification (BSV). In SR, LR and BSV tasks, short-time spectral features may be insufficient, so more abstract and discriminative representations are needed as a complement to...
Urban environments are characterised by the presence of distinctive audio signals which alert the drivers to events that require prompt action. The detection and interpretation of these signals would be highly beneficial for smart vehicle systems, as it would provide them with complementary information to navigate safely in the environment. In this paper, we present a framework that spots the presence...
Learning complex manipulation tasks often requires collecting a large training dataset to obtain a model of a specific skill. This process can become laborious for high-DoF robots, and even more tedious if the skill must be learned by multiple robots. In this paper, we investigate how this learning process can be accelerated by using shared latent variable models for knowledge transfer...
In a real-life scenario, the acoustic characteristics of speech often suffer from the variations induced by diverse environmental noises and different speakers. To overcome the speaker-related speech variation problem for Automatic Speech Recognition (ASR), many speaker adaptation techniques have been proposed and studied. Almost all of these studies, however, only considered the speakers' long-term...
Hidden Markov Models (HMMs) are very effective in speech recognition. Based on state machines, HMMs combine Bayesian probability and decision making to assign each output to its appropriate class. In this paper, we propose using HMMs for ECG QRS detection. We select a set of models to represent the QRS complex and noise, aiming at better discrimination between them. For a total of 44510 beats of the MIT/BIH...
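The classification idea above, scoring an observation sequence under competing HMMs (e.g. one for QRS complexes, one for noise) and choosing the more likely model, can be sketched with the scaled forward algorithm. The discrete symbols `"L"`/`"H"` (low/high quantised slope) and all model parameters below are made-up assumptions for illustration; the paper's models and features are not reproduced here.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | HMM) by the forward algorithm with per-step scaling.

    start[i]: initial probabilities; trans[i][j]: P(state j | state i);
    emit[i][symbol]: emission probabilities of discrete symbols.
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for symbol in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                 for j in range(n)]
        scale = sum(alpha)
        log_lik += math.log(scale)  # the log scales add up to the log-likelihood
        alpha = [a / scale for a in alpha]  # renormalise to avoid underflow
    return log_lik


# Hypothetical models over quantised slope symbols "L" (low) and "H" (high)
MODELS = {
    "qrs": ([1.0, 0.0],
            [[0.5, 0.5], [0.2, 0.8]],
            [{"L": 0.6, "H": 0.4}, {"L": 0.1, "H": 0.9}]),
    "noise": ([1.0], [[1.0]], [{"L": 0.9, "H": 0.1}]),
}


def classify(obs, models=MODELS):
    """Pick the model with the highest log-likelihood for the observation sequence."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


print(classify("HHHHHH"), classify("LLLLLL"))
```

Scaling keeps the forward variables in floating-point range for long sequences such as sampled ECG segments, where unscaled probabilities would underflow.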
In this research, we consider the related problem of malware classification based on HMMs. We train HMMs for a variety of malware generators and a variety of compilers. The HMM results are further classified with the k-means algorithm; because k-means can get stuck in local minima, we optimize it with a genetic algorithm (GA). GA-tuned k-means clustering...
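Below is a minimal sketch of one way to combine a genetic algorithm with k-means. Each chromosome encodes which k data points seed the centroids, and its fitness is the sum of squared errors (SSE) after running Lloyd's k-means from those seeds. Selection, crossover and mutation then search over the seeds to escape poor local minima. The encoding, operators and parameter values are assumptions for illustration; the abstract does not describe the paper's GA configuration or its HMM-derived features.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k])) for p in points]


def kmeans(points, seeds, iters=20):
    """Lloyd's k-means from the given seed centroids; returns (labels, centroids, SSE)."""
    centroids = [list(c) for c in seeds]
    for _ in range(iters):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


def ga_kmeans(points, k, pop_size=10, generations=15, mutation_rate=0.2, seed=0):
    """GA over seed choices; a chromosome is k distinct point indices. Needs len(points) > k."""
    rng = random.Random(seed)
    n = len(points)

    def fitness(chrom):  # lower SSE is better
        return kmeans(points, [points[i] for i in chrom])[2]

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: max(2, pop_size // 2)]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            # one-point crossover that keeps the genes distinct
            child = a[:cut] + [g for g in b if g not in a[:cut]][: k - cut]
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.choice([i for i in range(n) if i not in child])
            children.append(child)
        population = parents + children
    best = min(population, key=fitness)
    return kmeans(points, [points[i] for i in best])


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids, sse = ga_kmeans(pts, k=2)
```

Because the best chromosomes survive each generation unchanged, the returned SSE can never be worse than that of the best seeding found so far.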
We present a simple yet effective LSTM-based approach for recognizing machine-print text from raw pixels. We use a fully-connected feed-forward neural network for feature extraction over a sliding window, the output of which is directly fed into a stacked bi-directional LSTM. We train the network using the CTC objective function and use a WFST language model during recognition. Experimental results...
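A network trained with the CTC objective outputs, for every frame, a distribution over the characters plus a special blank symbol. The paper decodes with a WFST language model. The simpler best-path (greedy) decoding shown here is a common baseline: it takes the most likely symbol per frame, merges consecutive repeats and then drops blanks. It is included only to illustrate how CTC outputs map to text; the alphabet, the helper `peaked_frames` and the probabilities are invented.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_probs: per-frame probability lists over alphabet; index `blank` is the blank.
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:  # merge repeats, then drop blanks
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)


def peaked_frames(indices, size):
    # Hypothetical helper: toy frame distributions peaked on the given indices
    return [[0.9 if k == i else 0.1 / (size - 1) for k in range(size)] for i in indices]


alphabet = ["<b>", "a", "b"]  # index 0 is the blank
print(ctc_greedy_decode(peaked_frames([1, 1, 0, 1, 2, 2, 0], 3), alphabet))
```

The blank between the two runs of `a` is what allows a doubled character to survive the repeat-merging step.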
Due to the variability of writing styles and other problems related to the nature of Arabic script, Arabic handwriting recognition has yet to achieve accurate results. Segmenting Arabic handwritten words into graphemes is a major challenge in Arabic handwriting recognition and is highly error-prone. In this paper, we adopt the holistic approach, which handles the whole word image...
Power consumers (users and buildings) with different consumption patterns may be treated under different conditions and considered with different parameters during capacity planning and distribution. Thus, the automated, unsupervised categorization of power consumers is a very important task for smart power transmission systems. Knowing the behavioral categories of power consumers...
Introducing features that better represent speakers' visual information during speech production is still an open issue that strongly affects the quality of lip-reading and Audio-Visual Speech Recognition (AVSR). In this paper, three types of visual features, both image-based and model-based, are investigated in a professional lip-reading task. The simple...
The influence of the phoneme set on Lithuanian speech command recognition accuracy is investigated. Four phoneme sets are discussed. The LIEPA speech corpus is used to train the acoustic model. The phonetic representation of the corpus transcriptions is generated by grapheme-to-phoneme transformation rules. Rule-based transformations for the Lithuanian language are proposed. A recognition engine with CMU Pocketsphinx...
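Rule-based grapheme-to-phoneme conversion of the kind described above is often implemented as greedy longest-match rewriting. In the sketch below, the rule table is a tiny illustrative subset of well-known Lithuanian spelling-to-sound correspondences (the digraphs ch, dz, dž and the letters č, š, ž), not the paper's rule set. The phoneme labels are hypothetical ASCII symbols, and palatalisation, stress and vowel length are deliberately ignored.

```python
# Tiny illustrative subset of Lithuanian spelling-to-sound correspondences.
# Phoneme labels are hypothetical ASCII symbols; palatalisation, stress and
# vowel length are deliberately ignored.
RULES = {
    "ch": "x",
    "dž": "dZ",
    "dz": "dz",
    "č": "tS",
    "š": "S",
    "ž": "Z",
}


def g2p(word, rules=RULES):
    """Greedy longest-match grapheme-to-phoneme conversion.

    Graphemes without a rule are passed through unchanged.
    """
    word = word.lower()
    max_len = max(len(g) for g in rules)
    phones, i = [], 0
    while i < len(word):
        for length in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + length]
            if chunk in rules:
                phones.append(rules[chunk])
                i += length
                break
        else:  # no rule matched: keep the single letter
            phones.append(word[i])
            i += 1
    return phones


print(g2p("šachmatai"))
```

Trying the longest grapheme first is what keeps "dž" from being split into "d" followed by "ž". Context-dependent rules, such as the palatalising "i" after consonants, would need lookahead beyond this table.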
We present a novel approach for quantizing large speech databases. It uses an unsupervised iterative process to regulate a similarity measure that sets the number of clusters and their boundaries, thus overcoming the shortcomings of conventional clustering algorithms such as k-Means and Fuzzy C-Means, which require a priori knowledge of the number of clusters and a similarity measure that follows the...
Speech recognition systems are ubiquitous and find application in automated voice control, voice dialling and automated directory assistance. This paper aims to implement a neural-network-based isolated spoken word recognition system on an embedded board, the Raspberry Pi, using the open-source software Octave. Mel-Frequency Cepstral Coefficient (MFCC) features are extracted from the speech signal...
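In a typical MFCC front end, the final step applies a type-II discrete cosine transform (DCT) to the log mel-filterbank energies and keeps the first few coefficients. The pure-Python sketch below shows only that step, using orthonormal DCT-II scaling (one of several conventions in use). The function names and the choice of 13 coefficients are assumptions, not details from the paper, whose implementation used Octave.

```python
import math


def dct_ii(x, n_coeffs=None):
    """Orthonormal DCT-II: c_k = s_k * sum_n x_n * cos(pi * k * (n + 0.5) / N)."""
    N = len(x)
    n_coeffs = N if n_coeffs is None else n_coeffs
    coeffs = []
    for k in range(n_coeffs):
        s = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        coeffs.append(s * sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N)))
    return coeffs


def mfcc_from_log_mel(log_mel, n_ceps=13):
    """Keep the first n_ceps cepstral coefficients of one frame's log mel energies."""
    return dct_ii(log_mel, n_ceps)


c = mfcc_from_log_mel([2.0] * 26)
```

A flat log spectrum puts all its energy in the 0th coefficient, which is why c0 is often treated as an overall level term and sometimes replaced by frame log energy.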