The volume of academic paper submissions and publications is growing at an ever increasing rate. While this flood of research promises progress in various fields, the sheer volume of output inherently increases the amount of noise. We present a system to automatically separate papers with a high from those with a low likelihood of gaining citations as a means to quickly find high impact, high quality...
Sentiment analysis, or opinion mining, draws on many different fields, such as natural language processing, text mining, decision making and linguistics. It is a type of text analysis that classifies text and makes decisions by extracting and analyzing its content. Opinions can be categorized as positive or negative, and the analysis measures the degree of positivity or negativity associated with the event in question (people, organization,...
A critical item of a bug report is the so-called "severity", i.e. the impact the bug has on the successful execution of the software system. Consequently, tool support for the person reporting the bug in the form of a recommender or verification system is desirable. In previous work we made a first step towards such a tool: we demonstrated that text mining can predict the severity of a given...
This article proposes a question classification approach that integrates multiple semantic features. It targets two problems in Chinese question classification models: inaccurate extraction of semantic information, and slow processing caused by an excessively high feature-vector dimension. With the help of HowNet and the support vector machine and syntactic and semantic information of question...
One of the most serious problems of conventional knowledge management (KM) has been identified as the tardy and ineffective acquisition of knowledge. To resolve this problem, knowledge must be acquired autonomously according to its context of use, by applying keyword extraction from text mining based on machine learning algorithms. Once the topic of the given knowledge can be identified...
The field of text mining has evolved over the past years to analyze textual resources. However, it can also be used in several other applications. In this research, we are particularly interested in applying text mining techniques to audio materials, after converting them into text, in order to detect the speakers' emotions. We describe our overall methodology and present our experimental results. In...
The following topics are dealt with: data mining; local clustering; spatiotemporal event detection; time series; Markov models; email classification; data stream; parallel mining; Bayesian network; unsupervised learning; missing values prediction; anomaly detection; decision tree; binary classifier; data similarity matrix; data mapping; support vector machine; MapReduce; document similarity; social...
In this paper, we extend the LELC (PU Learning by Extracting Likely Positive and Negative Micro-Clusters) method to cope with positive and unlabeled data streams. Our approach, called vote-based LELC, works in three steps. In the first step, we extract representative documents from unlabeled data and assign a vote score to each document. The assigned vote score reflects the degree of...
Stemming is a fundamental step in processing textual data, preceding the tasks of text mining, Information Retrieval (IR), and natural language processing (NLP). The common goal of stemming is to standardize words by reducing each word to its base (root or stem), so it can also be considered a feature reduction technique. This paper aims at presenting a new dictionary-free, content-based Arabic stemmer...
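To illustrate why stemming acts as feature reduction, here is a minimal sketch that counts distinct terms before and after stemming. It is not the Arabic stemmer from the abstract above: the suffix list, the `toy_stem` function and the sample tokens are all hypothetical and English-only, chosen purely for illustration.

```python
# Hypothetical, illustrative suffix list; a real stemmer needs far richer rules.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Made-up sample tokens: several surface forms of two underlying words.
tokens = ["cluster", "clusters", "clustered", "clustering", "text", "texts"]
stems = [toy_stem(t) for t in tokens]

# The vocabulary (number of distinct features) shrinks once forms are merged.
vocab_before = len(set(tokens))
vocab_after = len(set(stems))
print(vocab_before, vocab_after)
```

In this toy input, the six surface forms collapse into two stems, which is the sense in which stemming reduces the feature space.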
Conference web pages display their topic information in different ways, and conferences in different domains accept papers on different topics. Automatic extraction of topic information from conference web pages is thus a difficult task and has not received much attention from the research community. In this paper, we propose a method for extracting topic information that uses a web page segmentation...
A parallel corpus is a valuable resource for important applications of natural language processing such as statistical machine translation, dictionary construction, and cross-language information retrieval. The Web is a huge knowledge resource that contains bilingual information in various kinds of web pages. It currently attracts many studies on building parallel corpora based on the...
Text classification is an important research field within data mining. This article presents a feature selection method based on mutual information and information entropy pairs (MIIEP_FS), built on the theory of information entropy and the concept of the information entropy pair. The method measures the classification effect of a feature by mutual information and shows the extent of the difference between the features...
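As background for the mutual information part of this abstract, the following is a minimal sketch of scoring a term by the mutual information between its presence in a document and the class label. It is plain mutual information only, not the MIIEP_FS method, whose entropy-pair component is not described in the truncated text. The function name `mutual_information` and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the label."""
    n = len(docs)
    # Joint counts of (term present?, label), plus the two marginals.
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    term_counts = Counter(term in doc for doc in docs)
    label_counts = Counter(labels)

    mi = 0.0
    for (present, label), count in joint.items():
        p_joint = count / n
        p_term = term_counts[present] / n
        p_label = label_counts[label] / n
        mi += p_joint * math.log2(p_joint / (p_term * p_label))
    return mi


# Made-up toy corpus: each document is a set of terms.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]

scores = {t: mutual_information(docs, labels, t) for t in ["goal", "team"]}
print(scores)
```

On this toy data, "goal" perfectly separates the two classes and scores 1 bit, while "team" appears equally in both classes and scores 0, so a selector that ranks by this score would keep "goal" and drop "team".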
A corpus is a set of language materials that are stored in computers and can be searched, queried and analyzed by computer for enterprise decision-makers. Automated text categorization has been extensively studied, and various techniques for document categorization have been proposed. However, given the current scarcity of Chinese corpora, especially in the field of text categorization, the Chinese categorization corpus is...
Stock price prediction is one of the most important issues investigated in academic and financial research. Data mining techniques are frequently used in studies that address this problem. In this paper we investigate predicting stock prices using financial news articles. A prediction model, finding and analyzing the correlation between the contents of news articles and stock prices and...
In the blogosphere, the amount of digital content is expanding, which poses new challenges for search engines. Because information needs are changing, automatic methods are needed to help blog search users filter information by different facets. In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs...
As a developing area of data mining on semi-structured information, sentiment analysis of comments on the Internet has recently aroused great interest. This paper analyzes the influence of different stop word removal methods on the results of text classification and presents a more effective stop word removal list. The experiment is based on sentiment comments that have been collected...
The rapid progress of networks has drawn much attention to Internet public opinion; it is important to grasp Internet public opinion in time and to understand its trends correctly. Text mining plays a fundamental role in the categorization and monitoring of Internet public opinion, but processing Internet public opinion is much more difficult than processing pure text because of its semi-structured characteristic...
With the rapid spread of the Internet, crime information on the web is becoming increasingly rampant, and the majority of it is in the form of text. Because a lot of crime information in documents is described through events, event-based semantic technology can be used to study the patterns and trends of web-oriented crimes. In our research project on cyber crime mining, we construct...
An important issue is that text summarization has to reflect personal information needs and provide indicative messages to the user. In this paper, a method of acquiring relevant documents based on user-feedback information and transductive inference SVM machine learning is presented. This method avoids the subjectivity of deciding relevant documents empirically. Furthermore, a sentence selection...
In this paper, a drug knowledge platform based on intelligent information technology is constructed. Unlike the traditional virtual drug compound library, which focuses only on drug patents, the aim of the new platform is to design a system including information retrieval, information extraction, construction of drug compound and drug ontology, structure-based virtual screening, and text classification...