In BioWorld, a medical intelligent tutoring system, novice physicians are tasked with solving virtual patient cases. Whilst the importance of modeling and predicting clinical reasoning is recognized, an important aspect of the learner contribution remains unexplored: the written case summary prepared by the learner. The premise of investigating the case summaries is that they capture the thought and...
Over the last few years, text mining has gained significant importance, since knowledge is now available to users through a variety of sources, e.g. electronic media, digital media, print media, and many more. Owing to the huge availability of text in numerous forms, a large amount of unstructured data has been recorded, and researchers have found numerous ways in the literature to convert this scattered text...
Search By Multiple Examples (SBME) is a new search paradigm that allows users to specify their information needs as a set of relevant documents rather than as a set of keywords. In this study, we propose a Transductive Positive Unlabeled learning (TPU learning) based framework for SBME. The framework consists of two steps: 1) identifying potentially relevant documents for search space reduction,...
With the advent of the information age, many kinds of information have spread across the Internet, and the amount of junk information seriously affects people's lives. In order to filter harmful Web pages efficiently and effectively, we propose in this paper a novel text classification algorithm based on the Vector Space Model. This algorithm adopts a modularized processing mode to deal...
The growth of online data gives users access to information on the Internet but also makes it challenging to obtain valuable knowledge. In this paper we focus on news text classification, which helps information providers organize and display the news and helps users reach valuable information easily. A hierarchy method based on LDA and SVM is proposed...
People rely more and more on data mining techniques such as text categorization to explore valuable information, because of the ever-increasing volume of electronic documents. Although the energy consumed by text categorization increases with the amount of data, people usually pay attention to its effectiveness, and there is little research on its energy cost. In this paper, we evaluate the energy cost of different...
Text classification is one of the most significant topics in the field of Natural Language Processing research. In most real cases, text classification is a multi-label learning task. Currently, there are three mainstream attribute measures (i.e., information gain, document frequency and chi-square test values) which are often used to describe documents. The three attribute measures have been applied...
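As an illustration of one of the attribute measures named above, the following is a minimal sketch of the standard chi-square statistic for a term and a category, computed from a 2x2 contingency table. The abstract is truncated, so this is not the authors' method; the function name `chi_square` and the toy documents are assumptions made for the example. Labels are stored as sets so that a document can carry several labels, as in a multi-label task.

```python
def chi_square(docs, labels, term, category):
    """Chi-square statistic between a term and a category.

    docs   -- list of token sets, one per document (toy representation)
    labels -- list of label sets, one per document (multi-label allowed)
    """
    a = b = c = d = 0
    for tokens, label_set in zip(docs, labels):
        has_term = term in tokens
        in_category = category in label_set
        if has_term and in_category:
            a += 1  # term present, document in category
        elif has_term:
            b += 1  # term present, document not in category
        elif in_category:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document not in category
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: the statistic is undefined, report 0
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical toy corpus, for illustration only.
docs = [{"goal", "match"}, {"goal"}, {"vote"}, {"vote", "poll"}]
labels = [{"sport"}, {"sport"}, {"politics"}, {"politics"}]
score = chi_square(docs, labels, "goal", "sport")
```

A higher score suggests a stronger association between the term and the category; terms can then be ranked by score when selecting features.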
This paper proposes a new approach for clustering English text documents, based on finding the pairwise correlation of documents in a given set of text documents. The correlation coefficient for each pair of documents is calculated on the basis of ranks given to the words in the documents. The ranking of the words occurring in a document is computed on the basis of weights of the words calculated...
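The idea of correlating two documents through word ranks can be sketched as follows. The abstract is truncated before it says how the word weights are calculated, so this sketch assumes raw term frequency as the weight and uses Spearman's rank correlation (Pearson correlation applied to the ranks); both choices, and all function names, are assumptions rather than the paper's method.

```python
from collections import Counter


def rank_words(weights):
    """Map each word to its rank (1 = highest weight); ties share the mean rank."""
    ordered = sorted(weights, key=lambda w: -weights[w])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of words tied with ordered[i].
        while j + 1 < len(ordered) and weights[ordered[j + 1]] == weights[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def document_rank_correlation(doc_a, doc_b):
    """Spearman correlation of word ranks over the joint vocabulary of two documents."""
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())
    vocab = sorted(set(tf_a) | set(tf_b))
    # Assumed weight: raw term frequency, 0 for words absent from a document.
    ranks_a = rank_words({w: tf_a.get(w, 0) for w in vocab})
    ranks_b = rank_words({w: tf_b.get(w, 0) for w in vocab})
    return pearson([ranks_a[w] for w in vocab], [ranks_b[w] for w in vocab])
```

The resulting pairwise coefficients could feed a clustering step, for example by grouping documents whose correlation exceeds a chosen threshold; that step is left out because the abstract does not describe it.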
The rapid increase in documents created in digital media makes it necessary to analyze and classify these documents automatically. Feature extraction, feature selection and classifier selection all affect the performance of document analysis and classification. In text document categorization, a fundamental problem is that the number of extracted features is very large. In this study, by using...
Feature selection plays an important role in text categorization. Classic feature selection methods such as document frequency (DF), information gain (IG) and mutual information (MI) are commonly applied in text categorization, but they usually take only plain text into account. Knowledge Gain (KG) is a new feature selection method proposed in my previous paper. It measures an attribute's importance...
Supervised learning methods rely on large sets of labeled training examples. However, large training sets are rare and making them is expensive. In this research, Latent Semantic Indexing Subspace Signature Model (LSISSM) is applied to labeling for active learning of unstructured text. Based on Singular Value Decomposition (SVD), LSISSM represents terms and documents as semantic signatures by the...
The exponential growth of data may lead us into an era of information explosion, in which most data cannot be managed easily. Text mining research is believed to help prevent the world from entering that era. One such line of research is text classification, a way to classify articles into several predefined categories. In this research, the classifier...
The paper proposes a solution for document-level and aspect-level sentiment analysis of unstructured documents written in the Romanian language. The opinion extraction relies on two different approaches for polarity identification. At the aspect level we propose a rule-based approach. For the document level we consider supervised learning techniques, based on features extracted and filtered in different...
Nowadays, the web is the most relevant data source, and its size keeps growing day by day. Web page classification becomes crucial due to this overwhelming amount of data. Web pages contain much noisy content that biases textual classifiers and leads them to lose focus on the main subject. Web pages are related to each other either implicitly by users' intuitive judgments or explicitly by hyperlinks...
Today, as more and more businesses and individuals turn to the study of cloud computing, the amount of data stored on cloud platforms is also growing. How to store, manage and use these data quickly and effectively in a cloud environment has therefore become an important and challenging issue. This paper mainly discusses the storage model based on Map/Reduce text categorization, at the same time combining forecasting data...
A new text classification algorithm based on the basic support vector machine (SVM) algorithm is put forward. The proposed SVM-KNN algorithm for text classification combines the SVM algorithm with the KNN algorithm. The SVM-KNN algorithm can improve the performance of the classifier through feedback on, and improvement of, the classification prediction probability. The actual effect of the SVM-KNN algorithm is tested...
In this paper, a public opinion analysis system is built. It consists of a crawler used to retrieve online microblog content and a text classifier for distinguishing sentiment-bearing content. This system is used to identify public opinions towards certain topics. Microblogs are divided into three categories based on their emotional tendency, namely "positive", "negative" and "objective",...
Twitter users tweet their views in the form of short text messages. Twitter topic classification is the task of classifying tweets into a set of predefined classes. In this work, a new tweet classification method is proposed that makes use of tweet features such as URLs in the tweet, retweeted tweets and tweets from influential users. Experiments were carried out with an extensive tweet data set. The performance of...
The high computational complexity of text classification is a significant problem with the growing surge in text data. An effective but computationally expensive classification is the k-nearest-neighbor (kNN) algorithm. Principal Component Analysis (PCA) has commonly been used as a preprocessing phase to reduce the dimensionality followed by kNN. However, though the dimensionality is reduced, the...
This research investigates the design of a unified framework for the content-based classification of highly imbalanced hierarchical datasets, such as web directories. In an imbalanced dataset, the prior probability distribution of a category indicates the presence or absence of class imbalance. This may include the lack of positive training instances (rarity) or an overabundance of positive instances...