The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Constructing interaction network from biomedical texts is a very important and interesting work. The authors take advantage of text mining and reinforcement learning approaches to establish protein interaction network. Considering the high computational efficiency of co-occurrence-based interaction extraction approaches and high precision of linguistic patterns approaches, the authors propose an interaction...
The biomedical society makes wide use of text mining technology. Named Entity (NE) extraction is one of the most primary and significant tasks in biomedical information extraction of text mining technology. Named Entity Recognition (NER) involves processing structured and unstructured documents to recognize the definite kinds of entities and categorization of them into some predefined classes. Several...
The internet is rich in directional text (i.e., text containing opinions and emotions). World Wide Web provides volumes of text-based data about consumer preferences, stored in online review websites, web forums, blogs, etc. Sentiment analysis is a technique to classify people's opinions in product reviews, blogs or social networks has emerged as a method for mining opinions from such text archives...
This paper aims to specify a Case-Based Reasoning strategy for correctly classifying, storing and preventing duplication efforts of electronic text material. Preservation of complete source documents for checking similarity between them pose a daunting amount of spatial and computational complexity to researchers in this area. The problem is partially solved by applying certain preprocessing steps...
Supervised classification has been extensively addressed in the literature as it has many applications, especially for text categorization or web content mining where data are organized through a hierarchy. On the other hand, the automatic analysis of brand names can be viewed as a special case of text management, although such names are very different from classical data. They are indeed often neologisms,...
Word sense disambiguation (WSD) is an intermediate task within information retrieval and information extraction, which attempts selecting the proper sense of ambiguous terms. In the biomedical domain, general WSD has not received much attention compared to the disambiguation of specific categories of entities like proteins and genes or diseases. Statistical learning approaches have achieved better...
Internet is becoming an increasingly important platform for ordinary life and work. It is expected that keyword extraction can help people quickly find hot spots on the web, since keywords in a document provide important information about the content of the document. In this paper, we propose to use text clustering method based on semi-supervised learning to get focuses of social topics in a large...
The following topics are dealt with: data mining; local clustering; spatiotemporal event detection; time series; Markov models; email classification; data stream; parallel mining; Bayesian network; unsupervised learning; missing values prediction; anomaly detection; decision tree; binary classifier; data similarity matrix; data mapping; support vector machine; Mapreduce; document similarity; social...
We study the retrieval task that ranks a set of objects for a given query in the pair wise preference learning framework. Recently researchers found out that raw features (e.g. words for text retrieval) and their pair wise features which describe relationships between two raw features (e.g. word synonymy or polysemy) could greatly improve the retrieval precision. However, most existing methods can...
In this work, we investigate sentiment mining of Arabic text at both the sentence level and the document level. Existing research in Arabic sentiment mining remains very limited. For sentence-level classification, we investigate two approaches. The first is a novel grammatical approach that employs the use of a general structure for the Arabic sentence. The second approach is based on the semantic...
Text categorization is one important task of text mining, for automated classification of large numbers of documents. Many useful supervised learning methods have been introduced to the field of text classification. Among these useful methods, K-Nearest Neighbor (KNN) algorithm is a widely used method and one of the best text classifiers for its simplicity and efficiency. For text categorization,...
Parallel corpus is the valuable resource for some important applications of natural language processing such as statistical machine translation, dictionary construction, cross-language information retrieval. The Web is a huge resource of knowledge, which partly contains bilingual information in various kinds of web pages. It currently attracts many studies on building parallel corpora based on the...
Text classification is an important research field of data mining topics. This article brings a mutual information and information entropy pair based feature selection method (MIIEP_FS) based on the theory of information entropy and information entropy pair concept. This method measure the classification effect using feature by mutual information method and show the difference extent between the features...
Corpus is the set of language materials which are stored in computers and can use computers to search, query and analyze for enterprise decision-makers. Automated text categorization has been extensively studied and various techniques for document categorization. But based on the current scarcity of Chinese corpus, especially in the field of text categorization, the Chinese categorization corpus is...
In this paper, we define an activity by five basic attributes: actor, action, object, time and location. The goal of this paper is to describe a method to automatically extract all attributes in each sentence retrieved from Japanese weblogs. Previous work had some limitations, such as high setup cost, inability of extracting all attributes, limitation on the types of sentences that can be handled,...
A representation of the World Wide Web as a directed graph, with vertices representing web pages and edges representing hypertext links, underpins the algorithms used by web search engines today. However, this representation involves a key oversimplification of the true complexity of the Web: an edge in the traditional Web graph represents only the existence of a hyperlink; information on the context...
The mining methods for comment text polarity are usually used to adopted supervised learning algorithms, but supervised learning algorithms require significant manual labor marked the training set, and its text set in dealing with will be also faced with dimension disaster, sparse vector, high spatial and temporal complexity, low recall and precision rates that cannot be used for a flood of text polarity...
This paper presents the concept of surface text patterns for extracting purpose data from the web. In order to obtain an optimal set of patterns, we have developed a method for learning purpose patterns automatically. A corpus was downloaded from the Internet using bootstrapping by providing a few hand-crafted examples of each purpose pattern to a generic search engine. This corpus was then tagged...
It is vital to develop automatic information extraction systems to help researchers cope up with the vast amount of data available on the Internet. In this paper, we describe a framework to extract precise information about coexpression relationship among genes, from published literature using a supervised machine learning approach. We use a graphical model, Dynamic Conditional Random Fields (DCRFs),...
There is an important issue that text summarization has to embody personal information need and provide indicative message to user. In this paper, a method of acquiring relevant documents based on user-feedback information and transductive inference SVM machine learning is presented. This method can well avoid the subjectivity of deciding relevant documents empirically. Furthermore, a sentence selection...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.