In this paper we investigate the punctuated transcription of multi-genre broadcast media. We examine four systems, three of which are based on lexical features, the fourth of which uses acoustic features by integrating punctuation into the speech recognition acoustic models. We also explore the combination of these component systems using voting and log-linear interpolation. We performed experiments...
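The two combination schemes named in the abstract above, voting and log-linear interpolation, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the label set (`NONE`, `COMMA`, `PERIOD`), the per-system posteriors, the weights, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter


def log_linear_interpolate(distributions, weights):
    """Combine per-system label posteriors as p(y) proportional to prod_i p_i(y)^w_i.

    distributions: list of dicts mapping label -> probability (same labels in each).
    weights: one non-negative weight per system (hypothetical values).
    """
    labels = list(distributions[0].keys())
    # Weighted sum of log-probabilities; a small floor avoids log(0).
    scores = {
        label: sum(w * math.log(max(d[label], 1e-12))
                   for d, w in zip(distributions, weights))
        for label in labels
    }
    # Renormalise so the combined scores form a probability distribution.
    top = max(scores.values())
    unnormalised = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(unnormalised.values())
    return {label: v / total for label, v in unnormalised.items()}


def majority_vote(predictions):
    """Return the label predicted by the most systems (first seen wins ties)."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical posteriors from two systems for the slot after one word.
system_a = {"NONE": 0.6, "COMMA": 0.3, "PERIOD": 0.1}
system_b = {"NONE": 0.2, "COMMA": 0.7, "PERIOD": 0.1}
combined = log_linear_interpolate([system_a, system_b], [0.5, 0.5])
best_label = max(combined, key=combined.get)
```

With equal weights the interpolation reduces to a normalised geometric mean of the two posteriors, so a label must be reasonably likely under both systems to win.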
We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance...
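One plausible reading of a max-pooling loss is sketched below. The truncated abstract does not give the formula, so the details here are assumptions: within each labelled keyword segment only the frame with the highest keyword posterior contributes a cross-entropy term, and every frame outside a keyword segment contributes a cross-entropy term against a background class. The function name, argument layout, and class indices are hypothetical.

```python
import math


def max_pooling_loss(posteriors, keyword_segments, keyword_index, background_index=0):
    """Sketch of a max-pooling style loss over per-frame class posteriors.

    posteriors: list of frames, each a list of class probabilities.
    keyword_segments: list of (start, end) frame ranges, end exclusive.
    """
    keyword_frames = set()
    loss = 0.0
    for start, end in keyword_segments:
        frames = range(start, end)
        keyword_frames.update(frames)
        # Max pooling over time: keep only the most confident keyword frame.
        best = max(frames, key=lambda t: posteriors[t][keyword_index])
        loss -= math.log(posteriors[best][keyword_index])
    for t, frame in enumerate(posteriors):
        if t not in keyword_frames:
            # Frames outside keyword segments are scored against background.
            loss -= math.log(frame[background_index])
    return loss


# Four frames, two classes (0 = background, 1 = keyword); frames 1-2 hold the keyword.
frames = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]]
loss = max_pooling_loss(frames, [(1, 3)], keyword_index=1)
```

In this toy input the keyword term comes from frame 2 alone, so frame 1's weaker keyword posterior is not penalised.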
Recently, machine learning methods have provided a broad spectrum of original and efficient algorithms based on Deep Neural Networks (DNN) to automatically predict an outcome with respect to a sequence of inputs. Recurrent hidden cells allow DNN-based models such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks to manage long-term dependencies. Nevertheless, these RNNs process...
This paper describes the automatic speech recognition (ASR) systems developed by LIUM in the framework of the 2016 Multi-Genre Broadcast (MGB-2) Challenge in the Arabic language. LIUM participated in the first of the two proposed tasks, namely the speech-to-text transcription of Aljazeera recordings. We present the approaches and details found in our systems, as well as our results in the evaluation...
Morphological analysis is an essential step in processing the Korean language, due to the highly agglutinative properties of the language. In this paper, we propose a novel approach for constructing a Korean morphological analyzer that can capture linguistic properties using graphemes as basic processing units. Since our model does not utilize prior linguistic knowledge, the model can be applied to other...
In this paper, we investigate a DNN tone-based extended recognition network (ERN) approach to Mandarin tone recognition and tone mispronunciation detection. Given a toneless syllable sequence, a tone-based ERN is constructed by assigning five different tones to each toneless syllable, obtaining a fully expanded tonal syllable network. Next, Viterbi decoding is carried out on the tone-based ERN to...
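The network construction step described above, assigning five tones to each toneless syllable, can be sketched as follows. This covers only the expansion, not the DNN scoring or the Viterbi decoding. The tone numbering 1 to 5 (with 5 as the neutral tone), the pinyin-style labels, and the function names are assumptions for illustration.

```python
from itertools import product

# Assumed tone inventory: four lexical tones plus a neutral tone labelled 5.
TONES = (1, 2, 3, 4, 5)


def build_tone_ern(toneless_syllables):
    """Expand each toneless syllable into its five tonal alternatives.

    Returns one list of alternatives per position; a path through the
    network picks exactly one alternative at each position.
    """
    return [[f"{syllable}{tone}" for tone in TONES]
            for syllable in toneless_syllables]


def enumerate_paths(network):
    """List every tonal syllable sequence the expanded network allows."""
    return [" ".join(path) for path in product(*network)]


network = build_tone_ern(["ni", "hao"])
paths = enumerate_paths(network)
```

A sequence of n syllables yields 5^n candidate paths, which is why a decoder searches the network rather than enumerating it; the enumeration here is only practical for short inputs.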
To work safely, efficiently and robustly, Advanced Driver Assistance Systems (ADAS) need a substantial understanding of the environment. Just like a human driver, the system needs to interpret the current situation and its possible developments, especially when it comes to longer prediction horizons or complex urban scenarios. The prerequisite of prediction is the recognition of traffic participants' behaviour...
Predicting the destination and route a user will take, as soon as he/she begins to move, has several benefits. A system with this kind of information can help the user avoid a congested route or suggest a Place of Interest (POI). Nowadays, tracking a user's movement is more feasible thanks to current smartphones with embedded GPS devices. Many related works address...
As wireless communication and mobile devices advance, recommendation systems have become one of the key technologies for realizing personalized services. This paper proposes a service recommendation mechanism using a probabilistic model in mobile devices. With the contextual information and the user's demand state inferred by the model, we can recommend a service to meet the user's preferences and needs in real time...
Creating a highly accurate pronunciation dictionary plays an important role in building an English TTS system that produces high-quality synthesised speech. The majority of existing studies on building Indian English TTS systems adapt the CMU pronunciation dictionary to the corresponding target Indian accent. Most of these studies use hand-crafted rule-based approaches to adapt to the target language...
The main goal of this paper is to explain important terms of word sense disambiguation (WSD) in the Slovak language. A comprehensive survey of current approaches and evaluation methodologies is provided. Special attention is given to necessary language resources and tools. The paper deals with problems specific to the Slovak language: missing language resources, rich morphology, free word order and...
In functional-structural plant models, inferring latent levels of organization from data while accounting for both connections between levels and within-individual heterogeneity is a challenging task. Here, we develop an approach based on multiple change-point models. It aims at partitioning a heterogeneous tree into homogeneous subtrees of consequent sizes. While multiple change-point models for...
This paper studies the occupancy and movement prediction of residents in the smart home based on a compression-based sequential prediction approach and discusses home automation applications that can benefit from such predictions. The prediction approach studied here is based on the Active LeZi algorithm, which is a compression-based approach that uses an order-k Markov model. The effects of the order...
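The idea of an order-k Markov model for movement prediction can be illustrated with a plain frequency-count predictor. This is deliberately simpler than Active LeZi, which builds its contexts through compression-style parsing of the sequence; the sketch below only shows how the last k locations are used to predict the next one. The room names and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_order_k(sequence, k):
    """Count which symbol follows each length-k context in the sequence."""
    counts = defaultdict(Counter)
    for i in range(k, len(sequence)):
        context = tuple(sequence[i - k:i])
        counts[context][sequence[i]] += 1
    return counts


def predict_next(counts, recent, k):
    """Return the most frequent successor of the last k symbols, or None."""
    context = tuple(recent[-k:]) if k > 0 else ()
    if context not in counts:
        return None  # Context never observed during training.
    return counts[context].most_common(1)[0][0]


# Hypothetical sequence of room-level locations reported by home sensors.
history = ["bedroom", "kitchen", "living", "bedroom",
           "kitchen", "living", "bedroom", "kitchen"]
model = train_order_k(history, k=2)
guess = predict_next(model, ["bedroom", "kitchen"], k=2)
```

A larger k makes contexts more specific but also rarer, so more queries fall through to `None`; this trade-off is one reason the choice of order matters.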
In order to facilitate and improve robots' social acceptance, they must be equipped with behaviors similar to those of humans. It is therefore necessary to study and model the phenomenon to be reproduced. This paper studies and analyzes the physical parameters of the handshake in order to obtain its characteristic features (frequency, duration, strength, synchronization, etc.) used to model this interaction...
Human activity recognition is widely used for human behavior prediction, especially for dependent people. Its purpose is to provide safety, health monitoring, and well-being for this population at home. In this paper, the problem of human activity recognition is reformulated as joint segmentation of multidimensional time series. The hidden Markov model regression (HMMR) is used to perform unsupervised...
Proofreading, the checking of first-draft writing by native experts, is essential for professional writing by non-native speakers. How to proofread automatically could be an interesting topic in NLP, but it has not yet been well explored. Our research took the first step toward automatic proofreading by automatically analyzing the correspondences between original and proofreading...
This paper studies the use of word embeddings for POS tagging in Bahasa Indonesia. The experiments are conducted with a neural network architecture, namely a simple feed-forward neural network with one hidden layer. The word embeddings (i.e., CBOW, skip-gram, and GloVe) are trained on an unlabelled text corpus created from Wikipedia Bahasa Indonesia. The results show that word embeddings...
More than 1.2 million Dai compatriots in Yunnan province currently use the Dai language, so research on Dai speech synthesis is of great significance for advancing the informatization of Dai. This paper studies the implementation of Dai speech synthesis based on the HMM speech synthesis framework and the STRAIGHT synthesizer. The methods of collection and selection of Dai...
This paper presents a deep neural network (DNN)-based unit selection method for waveform concatenation speech synthesis using frame-sized speech segments. In this method, three DNNs are adopted to calculate target costs and concatenation costs respectively for selecting frame-sized candidate units. The first DNN is built in the same way as the DNN-based statistical parametric speech synthesis, which...
Punctuation plays an important role in language processing. However, automatic speech recognition systems only output plain word sequences. It is therefore of interest to predict punctuation for plain word sequences. Previous work has focused on using lexical features or prosodic cues captured from small corpora to predict simple punctuations. Compared with simple punctuations, rich punctuations provide...