The paper describes an automatic part-of-speech tagger for Bengali sentences that uses a Global Linear Model (GLM), which learns to represent the whole sentence through a feature vector called the global feature. The tagger has been trained using the averaged perceptron algorithm. The performance of this tagger has been compared to Conditional Random Field (CRF), Support Vector Machine (SVM), Hidden Markov Model (HMM)...
Treebank-based parsing is a central issue in current natural language processing. This work adopts the SVM machine learning method and the HIT-IR-CDT dependency treebank. To increase parsing accuracy by linguistic means, verb subdivision and noun incorporation are performed. The results show that, after verb subdivision, the unlabeled attachment score increases from...
Textual Entailment (TE) is a critical issue in natural language processing (NLP); many NLP applications can benefit from the recognition of textual entailment (RTE). In this paper we report our observations on how to improve a Chinese textual entailment system, along with experimental results on the NTCIR-10 RITE-2 dataset. To complement the traditional machine learning approach, which treats every...
Language Identification (LI) is the process of determining the natural language in which the given content is written. It is an important preprocessing step in many tasks of Natural Language Processing (NLP). In a multilingual society like India, automatic language identification has a wider scope, since it would be a vital step in bridging the digital divide between the Indian masses and others....
A gene regulatory network (GRN) is a network of interacting cellular components. The components are genes and their products, and the interactions represent regulatory relationships among genes, specifically activation and inhibition of gene expression, under certain conditions. Many regulatory relationships are known in the literature. However, assembling isolated relationships into networks is a...
With the exponential growth of Arabic documents available online, classifying and processing large Arabic corpora has become a challenging task. The presence of noisy information embedded in these documents has made it even more difficult to get accurate results when applying a Topic Detection (TD) process. To address this problem, a proper feature selection approach is needed to enhance the...
The task of Named Entity Recognition (NER) is crucial to Natural Language Processing (NLP). NER can be defined as the computational identification and classification of Named Entities in running text. The importance of NER stems from the variety of Natural Language Processing applications where accurate NLP would be highly useful. Such applications include machine translation and information extraction. In this...
Wordplay created by repeating letters of the original word is commonly found in social network texts. Most of the time, wordplay items of this type are ambiguous to machines in language processing tasks such as Text-to-Speech. This paper shows some statistics on the number of letters from 102,586 real social network text items and proposes a set of classification features together with a few...
A statistical study of the typographical errors made when typing documents in Arabic found that most of these typos are character permutation errors, accounting for 65% of all errors.
Keyword extraction is one of the most significant tasks in information retrieval. High-quality keyword extraction significantly influences progress in the following subtasks of information retrieval: classification and clustering, data mining, knowledge extraction and representation, etc. The research environment has specified a layout for keyphrase extraction. However, some of the possible...
We propose an approach to domain adaptation that selects the instances from a source domain training set that are most similar to a target domain. The factor by which the original source domain training set size is reduced is determined automatically by measuring the domain similarity between the source and target domains as well as their domain complexity variance. Domain similarity is measured as divergence...
Sentiment analysis aims to automatically estimate the sentiment in a given text as positive or negative. Polarity lexicons, often used in sentiment analysis, indicate how positive or negative each term in the lexicon is. However, since creating domain-specific polarity lexicons is expensive and time-consuming, researchers often use a general-purpose or domain-independent lexicon. In this work, we...
We describe a methodology for identifying characterizing terms from a source text or paper and automatically building an ontology around them, with the purpose of semantically categorizing a paper corpus where documents sharing similar subjects may be subsequently clustered together by means of ontology alignment. We first employ a Natural Language Processing pipeline to extract relevant terms from...
The automatic insertion of diacritics in electronic texts is necessary for a number of languages, including French, Romanian, Croatian, Sindhi, Vietnamese, etc. When diacritics are removed from a word and the resulting string of characters is not a word, it is easy to recover the diacritics. However, sometimes the resulting string is also a word, possibly with different grammatical properties or a...
The standard supervised approach to sentiment classification requires a large amount of manually labeled data, which is costly and time-consuming to obtain. To tackle this problem, we propose a novel semi-supervised learning method based on multi-view learning. The main idea of our approach is to generate multiple views by exploiting both feature partition and language translation strategies and then standard...
With the emergence of Web 2.0, Sentiment Analysis is receiving more and more attention. Several interesting studies have addressed different issues in Sentiment Analysis. Nevertheless, the problem of unbalanced data sets has not been sufficiently addressed in this research area. This paper presents the study we have carried out to address the problem of unbalanced data sets in supervised sentiment...
Hidden Markov models (HMMs) have been widely used in natural language processing (NLP), especially in syntactic-level applications, which appear naturally as short-range-dependent sequence recognition problems. However, the structure of HMMs limits the use of global knowledge, including in sentiment analysis of text, which has become an increasingly popular research topic in NLP. In this paper,...
Word sense disambiguation is an important intermediate stage for many natural language processing applications, especially transformation from Cyrillic into Mongolian script. A word sense can be disambiguated by other words in the context, such as nouns and verbs used with the word. In this research, we have analyzed the result of an experiment on a word disambiguation system for the Mongolian language based...
Due to the development of World Wide Web technologies, people now live in an environment flooded with trillions of web pages. The size of the web has been increasing dramatically. For this reason, it is getting more difficult to find relevant web documents corresponding to what users want to read. Classifying documents into predefined categories is one of the most important tasks in Natural...
In this paper, we describe how we improve our system for Chinese Textual Entailment Recognition using a monolingual machine translation system. Previously, our approach was based on standard supervised learning classification. We integrate the result of the monolingual machine translation system with the other available computational linguistic resources for Chinese language processing to build the system...