With the rapid growth of on-line news media, guarding against malicious news articles is becoming an essential requirement for on-line news service providers. Near-duplicate articles are one of the most common types of malicious news articles. However, previous research has concentrated on how to improve the effectiveness and accuracy of finding near-duplicate article pairs or clusters, and not so...
Corpus annotation at the discourse level requires modeling the entire structure of a discourse. Existing methods have difficulty differentiating the macro- and microstructure of a discourse. Taking account of this, discourse information theory (DIT) provides the theoretical basis for establishing discourse information annotation tagsets and practical annotation methods. Having set up an equation between...
In this paper, we study language models based on recurrent neural networks on three databases in two languages. We implement basic recurrent neural networks (RNN) and refined RNNs with long short-term memory (LSTM) cells. We use the corpora of Penn Tree Bank (PTB) and AMI in English, and the Academia Sinica Balanced Corpus (ASBC) in Chinese. On ASBC, we investigate word-based and character-based language...
It has been argued that recurrent neural network language models are better at capturing long-range dependencies than n-gram language models. In this paper, we attempt to verify this claim by investigating the prediction accuracy and the perplexity of these language models as a function of word position, i.e., the position of a word in a sentence. It is expected that as word position increases, the...
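As an illustration of what perplexity as a function of word position could look like in practice, the following is a minimal sketch and not the authors' evaluation code. It assumes a hypothetical input format in which each sentence is a list of the probabilities a language model assigned to its words, in order; the function name `perplexity_by_position` is likewise an assumption.

```python
import math
from collections import defaultdict


def perplexity_by_position(sentences):
    """Return {position: perplexity} over all sentences.

    `sentences` is a list of lists; each inner list holds the probability
    the model assigned to the word at positions 1, 2, 3, ... of a sentence
    (hypothetical input format, not taken from the paper).
    """
    log_sums = defaultdict(float)  # sum of log-probabilities per position
    counts = defaultdict(int)      # number of words seen per position
    for probs in sentences:
        for pos, p in enumerate(probs, start=1):
            log_sums[pos] += math.log(p)
            counts[pos] += 1
    # Perplexity is exp of the negative mean log-probability.
    return {
        pos: math.exp(-log_sums[pos] / counts[pos])
        for pos in sorted(counts)
    }


if __name__ == "__main__":
    toy = [[0.5, 0.25], [0.5, 0.25, 0.125]]
    for pos, ppl in perplexity_by_position(toy).items():
        print(pos, round(ppl, 3))
```

Running the same aggregation once on an RNN model's probabilities and once on an n-gram model's would give two curves over word position that can be compared directly.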
The constitutive role is one of the qualia roles. This paper presents an approach for the automatic acquisition of the constitutive role for Chinese nouns. In our method, we take the shortest dependency path as a constitutive role extraction pattern and calculate the credibility of each pattern. Then, according to the co-occurrence information of nouns, we mine the...
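The shortest dependency path between two nouns can be sketched as a breadth-first search over the dependency tree treated as an undirected graph. This is a minimal illustration under stated assumptions, not the paper's implementation: the parse is assumed to be given as a hypothetical `heads` list (index of each token's head, `-1` for the root), and the credibility scoring of patterns is not shown because the abstract does not specify it.

```python
from collections import deque


def shortest_dependency_path(heads, source, target):
    """Return the token indices on the shortest path from source to target.

    `heads[i]` is the index of token i's head, or -1 for the root
    (hypothetical input format). Returns None if no path exists.
    """
    # Build an undirected adjacency map from the head links.
    adj = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].add(head)
            adj[head].add(child)

    # Breadth-first search, remembering each node's predecessor.
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor in adj[node]:
            if neighbor not in prev:
                prev[neighbor] = node
                queue.append(neighbor)

    if target not in prev:
        return None

    # Walk the predecessor links back from target to source.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


if __name__ == "__main__":
    # Toy tree: token 1 is the root; 0 and 2 attach to 1; 3 attaches to 2.
    print(shortest_dependency_path([1, -1, 1, 2], 0, 3))
```

Replacing the token indices on the returned path with their dependency labels or part-of-speech tags would be one way to turn a path into a reusable extraction pattern.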
Morphemes are not independent units; they attach to each other based on morphotactics. However, most models in the literature assume that they are independent of each other in order to cope with the complexity. We introduce a language-independent model for unsupervised morphological segmentation using a hierarchical Dirichlet process (HDP). We model the morpheme dependencies in terms of morpheme...