Text classification (TC) has become one of the main applications of natural language processing (NLP). Much research has addressed the classification of text documents, using methods such as Random Forest, Support Vector Machines and Naive Bayes. However, most of this work has been applied to English documents, so text classification research on Vietnamese is still limited. By using a Vietnamese news corpus,...
Cardiac Resynchronization Therapy (CRT) is an established pacing therapy for heart failure patients. The New York Heart Association (NYHA) classification is often used as a measure of a patient's response to CRT. Identifying NYHA class for heart failure patients in an electronic health record (EHR) consistently, over time, can provide better understanding of the progression of heart failure and assessment...
Coreference resolution plays a significant role in natural language processing systems. It is the task of identifying all the noun phrases that refer to the same real-world entity. Several studies have addressed noun phrase coreference resolution using machine learning techniques. Our paper proposes a machine learning approach using support vector machines (SVM) towards...
This paper presents the results of systematic and comparative experimentation with the major types of methodologies for automatic duplicate question detection, applied to datasets of progressively larger sizes. This makes it possible to study the learning profiles of the task under these different approaches and to evaluate their merits. This study was made possible by resorting to the recent release...
People tend to read multiple news articles on a topic, since a single article may not contain all the important information. A summary of all the articles related to a topic saves time and energy. Text summarization is a way of condensing a textual document into a meaningful summary. In this research, an extractive approach is used to generate a two-level summary from online news articles. News...
Due to the vast amount of data, searching for and obtaining relevant information on the web is a challenging task. Although a broad range of classification techniques has been proposed to improve information retrieval methods, many difficulties remain because of the continuous increase in the amount of web content, as well as its diversity. In this paper, we propose a method that...
Named Entity Recognition (NER) is an important natural language processing (NLP) tool for information extraction and retrieval from unstructured texts such as newspapers, blogs and emails. NER involves processing unstructured text to classify words or expressions into relevant categories. In the literature, NER has been developed for various languages, but limited work has been conducted to develop...
In this paper we tackle the issue of sentiment analysis of social network posts in a language that has received little attention: Slovak. There is a significant lack of research in this area for minor languages, as they often introduce additional language-specific issues for text processing. In the case of Slovak, common issues are high inflection and complex morphology and syntax. User-generated content of social networks...
Stop words occur many times in a document, yet they carry little semantic value in its sentences. These words make up a noteworthy portion of documents while having no semantic significance. Stop words should therefore be removed for better language description. In this paper, we propose an efficient algorithm that removes stop words from Urdu documents. Many...
With the tremendous development of data science, using unstructured documents to analyze marketing dynamics is attracting a great deal of attention. In this letter, we propose an iterative scheme to extract new words, a step that is often a bottleneck for Chinese natural language processing (NLP) in financial markets analysis. In contrast to existing static features, the key novelty is the proposed...
A Question Answering (QA) system backed by a comprehensive and up-to-date knowledge base would be appropriate for travellers to satisfy their information needs. In this paper, a complete QA system is presented. It has two main phases: question identification (Expected Answer Type (EAT) identification) and searching the knowledge base (KB) to find the answer to the classified question. In QA systems,...
This work covers the processing and classification of tweets written in Turkish. Tweet datasets from four different sectors are vectorized with a word embedding model and classified with Support Vector Machine and Random Forest classifiers, and the results are compared. We show that sector-based tweet classification is more successful than classification of general tweets. Accuracy rates for...
The rapid increase in the number of electronic and online texts, such as e-mails, online newspapers and magazines, blog posts and online forum messages, has also accelerated studies on authorship attribution. Although such studies are not as abundant as for English, there has been considerable work on author identification in Turkish in the last fifteen years...
In natural language processing and text mining, highly successful applications have been developed with recently introduced techniques. In particular, noticeable performance increases have been achieved in many applications by using word embedding methods. In this paper, we propose a novel text mining method based on word embedding and the Fisher vector. The automatic analysis of political records is selected...
The analysis of user-generated content on social media and the accurate identification of user opinions towards products and events are valuable to many applications. With the proliferation of Web 2.0 and the rapid growth of user-generated content on the web, approaches to aspect-level sentiment analysis that yield fine-grained information are of great interest. In this work, a classifier ensemble...
Sentiment analysis is the process of identifying and classifying the opinions or feelings expressed in opinionated data, in order to ascertain whether the attitude of the writer towards a particular service, product, etc. is negative, positive or neutral. Sentiment analysis also helps consumers to identify whether the information about the product or service is satisfactory or...
Natural language processing and machine learning can be applied to student feedback to help university administrators and teachers address problematic areas in teaching and learning. The proposed system analyzes student comments from both course surveys and online sources to identify sentiment polarity, the emotions expressed, and satisfaction versus dissatisfaction. A comparison with direct-assessment...
Authorship analysis deals with the identification of authors, which is a problem of text data mining and classification. Numerous techniques and algorithms have been published so far in the field of stylometry. In this regard, the primary objective of the present review is to provide the status of the different studies carried out on authorship analysis based on the important research...
Sentiment Analysis (SA) is the task of detecting people's emotions from their written text. Many algorithms have been studied for that purpose, with different authors claiming one or the other as better by a given metric. In recent years, the focus of SA has shifted to online text and microblog text, messages so short that good analysis becomes difficult and the choice of algorithm becomes critical...
Medical synonym identification is an important part of medical natural language processing (NLP). However, Chinese medical synonym identification suffers from problems such as low precision and low recall. To address these problems, we propose a method for identifying Chinese medical synonyms. We first selected 13 features including Chinese and English features. Then...