Deep Neural Networks (DNNs) have outperformed Gaussian Mixture Models (GMMs) and become the state-of-the-art technique for acoustic modeling. Various neural-network-based acoustic models have since been proposed to further improve speech recognition systems. However, these advances have not yet been adopted in research on Mongolian speech recognition. This study fills this gap. We study a series of neural...
The best way to prepare for an interview is to review the types of questions you are likely to be asked and practice answering them. An interview coaching system simulates an interviewer to provide mock interview practice sessions for its users. Traditional interview coaching systems provide some feedback, including facial preference,...
This paper addresses policy optimization for a dialogue management (DM) scheme based on partially observable Markov decision processes (POMDPs), designed for processing out-of-domain (OOD) utterances in spoken dialogue systems. First, POMDP-based DM modeling for OOD utterances is proposed, together with details of some of its principal elements. Then, joint state transition exploration and dialogue policy...
Because out-of-domain (OOD) utterances are short, diverse, open-ended and colloquial, dialogue act (DA) recognition for OOD utterances in restricted-domain spoken dialogue systems remains a great challenge. This paper tackles this problem by proposing an effective DA recognition method that uses a hybrid of a convolutional neural network (CNN) and a random forest (RF). The CNN acts as a feature...
Annotating complicated noun phrases is a difficult problem in semantic analysis. In this paper we investigate the annotation methods for noun phrases in Nombank, Chinese Nombank and Sinica Treebank, with the aim of proposing an annotation scheme for noun phrases based on semantic dependency graphs.
This paper describes the development and details of a linguistic resource, the Sense Annotated Hindi Corpus. Word Sense Disambiguation (WSD) is an important task in Natural Language Processing. The Sense Annotated Hindi Corpus was developed for the Lexical Sample WSD task for the Hindi language. It consists of 60 polysemous Hindi nouns. The sense inventory for the corpus was derived from Hindi...
Keyphrases are short phrases that best represent a document's content. They can be useful in a variety of applications, including document summarization and retrieval models. In this paper, we introduce the first dataset of keyphrases for an Arabic document collection, obtained by means of crowdsourcing. We experimentally evaluate different crowdsourced answer aggregation strategies and validate their...
This study examines the challenging issues in the semantic annotation of the characteristics of verbal information of Mandarin Chinese. It proposes a frame-based constructional approach that aligns with linguistic premises in Frame Semantics, Construction Grammar and Cognitive Grammar. Given that semantic processing has a lot to do with human cognitive capacities, semantic transfer and profile on...
The aim of this paper is to develop a system that recognizes Brahmi, Grantha and Vattezuthu characters in palm manuscripts of ancient Tamil documents, analyzes the text, and machine-translates it into the present-day Tamil digital text format. Though many researchers have implemented various algorithms and techniques for character recognition in different languages, ancient...
Short Message Service (SMS) spam is a serious problem in Vietnam because of the availability of very cheap prepaid SMS packages. There are some systems to detect and filter spam messages for English, most of which use machine learning techniques to analyze the content of messages and classify them. For Vietnamese, there is some research on spam email filtering but none focused on SMS. In this work,...
With the rapid growth of on-line news media, guarding against malicious news articles is becoming an essential requirement for on-line news service providers. Near duplicate articles are one of the most common types of malicious news articles. However, previous research has concentrated on how to improve the effectiveness and accuracy of finding near-duplicate article pairs or clusters, and not so...
This paper explores the use of statistical methods to describe the phenomenon of parallelism in Classical Chinese poems. We apply a graph-based clustering method to automatically induce word clusters from a corpus of poems. We describe several methods for computing similarity scores. We compare these methods by evaluating the quality of the induced clusters, with respect to a semantic taxonomy for...
Corpus annotation at the discourse level requires modeling the entire structure of a discourse. Existing methods have difficulty differentiating the macro- and microstructure of a discourse. Taking this into account, discourse information theory (DIT) provides the theoretical basis for establishing discourse information annotation tagsets and practical annotation methods. Having set up an equation between...
The subtree kernel and the information tree kernel proposed here measure the syntactic similarity of sentences. For two syntactic trees, these kernels are defined, respectively, as the total number of common subtrees in the syntactic trees and the total information content contained in their common subtrees, where the information content of a common subtree is calculated using its probability. Analyses...
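The two definitions above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: trees are written as nested tuples `(label, child, ...)` with words as plain strings, a "subtree" is taken to be a node together with all of its descendants, common subtrees are counted as matching pairs, and information content is taken as `-log2(p)` with probabilities supplied in a hypothetical `prob` dictionary. The function names are illustrative, not the authors'.

```python
import math
from collections import Counter

def subtrees(tree):
    """Yield every complete subtree (an internal node plus all its descendants)."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from subtrees(child)

def subtree_kernel(t1, t2):
    """Count matching pairs of identical complete subtrees in t1 and t2."""
    c1, c2 = Counter(subtrees(t1)), Counter(subtrees(t2))
    return sum(c1[s] * c2[s] for s in c1 if s in c2)

def information_tree_kernel(t1, t2, prob):
    """Sum the information content -log2(p) over matching subtree pairs.

    prob is an assumed mapping from subtree to its estimated probability.
    """
    c1, c2 = Counter(subtrees(t1)), Counter(subtrees(t2))
    return sum(c1[s] * c2[s] * -math.log2(prob[s]) for s in c1 if s in c2)

# Two toy parse trees that share only the noun phrase.
t1 = ("S", ("NP", "dogs"), ("VP", ("V", "bark")))
t2 = ("S", ("NP", "dogs"), ("VP", ("V", "run")))
shared = subtree_kernel(t1, t2)
bits = information_tree_kernel(t1, t2, {("NP", "dogs"): 0.25})
```

Under these assumptions a rare shared subtree contributes more to the information tree kernel than a frequent one, whereas the plain subtree kernel weights every match equally.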
Semantic role labeling (SRL) is the task of assigning semantic role labels to sentence elements. This paper describes the initial development of an Indonesian semantic role labeling system and its application to extracting event information from Tweets. We compare two feature types when designing the SRL systems: Word-to-Word and Phrase-to-Phrase. Our experiments showed that the Word-to-Word feature approach...
Creating a highly accurate pronunciation dictionary plays an important role in building an English text-to-speech (TTS) system that produces high-quality synthesised speech. The majority of existing studies on building Indian English TTS systems adapt the CMU pronunciation dictionary to the corresponding target Indian accent. Most of these studies use hand-crafted rule-based approaches to adapt to the target language...
In this paper, we study language models based on recurrent neural networks on three databases in two languages. We implement basic recurrent neural networks (RNNs) and refined RNNs with long short-term memory (LSTM) cells. We use the Penn Treebank (PTB) and AMI corpora in English, and the Academia Sinica Balanced Corpus (ASBC) in Chinese. On ASBC, we investigate word-based and character-based language...
It has been argued that recurrent neural network language models are better at capturing long-range dependencies than n-gram language models. In this paper, we attempt to verify this claim by investigating the prediction accuracy and the perplexity of these language models as a function of word position, i.e., the position of a word in a sentence. It is expected that as word position increases, the...
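As a rough illustration of measuring perplexity as a function of word position, the sketch below groups per-word log-probabilities by position and computes the perplexity at each one. It assumes that some language model has already assigned a natural-log probability to every word; the function name and the input layout are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

def perplexity_by_position(sentence_log_probs):
    """Map each 0-based word position to the perplexity of the words found there.

    sentence_log_probs: list of sentences, each a list of natural-log
    probabilities assigned by a language model to the words in order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for log_probs in sentence_log_probs:
        for position, lp in enumerate(log_probs):
            totals[position] += lp
            counts[position] += 1
    # Perplexity is the exponential of the negative mean log-probability.
    return {pos: math.exp(-totals[pos] / counts[pos]) for pos in sorted(totals)}

# Toy input: two sentences of different lengths.
result = perplexity_by_position([
    [math.log(0.5), math.log(0.25)],
    [math.log(0.5)],
])
```

Running the same function on the outputs of two different models would give two position-indexed curves that can be compared directly.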
This paper presents initial research on English-to-Tigrinya statistical machine translation (SMT). Tigrinya is a highly inflected Semitic language spoken in Eritrea and Ethiopia. Translation involving morphologically complex languages is challenged by factors including data sparseness, word alignment and language modeling. We try to address these problems through morphological segmentation of Tigrinya...