The feature selection algorithm has a great influence on the accuracy of text categorization. The traditional information gain (IG) feature selection algorithm usually selects features that rarely appear in the specified categories but frequently appear in other categories. To overcome this drawback, on the basis of in-depth analysis of the related algorithms, an improved IG feature selection method...
Text classification is one of the key methods used in text mining. Generally, traditional classification algorithms from the machine learning field are used in text classification. These algorithms are primarily designed for structured data. In this paper, we propose a new classifier for textual data, called Supervised Meaning Classifier (SMC). The new SMC classifier uses a meaning measure, which is based...
Feature selection is a strategy that aims at making text classifiers more efficient and accurate. In this paper, we propose a novel feature selection method based on Tibetan grammar for Tibetan classification. The Tibetan language expresses grammatical meaning through function words and word order, and function words account for a large proportion of the text. By analyzing the Tibetan grammar and distribution of part...
Neuroinformatics Natural Language Processing (NeuroNLP) relies on clustering and classification for information categorization of biologically relevant extraction targets and for interconnections to knowledge-related patterns in event and text-mined datasets. The accuracy of machine learning algorithms depended on the quality of text-mined data while efficacy relied on the context of the choice of techniques...
Text categorization plays an important role in the fields of information retrieval, machine learning, natural language processing, data mining and others. With the development of computer and information technology, many classification algorithms have emerged. Each text classification algorithm produces results with different speed and efficiency, depending on the features of the test text. It has been...
Text classification is the foundation and core of text mining. Naive Bayes is an effective method for text classification. This paper improves the accuracy of Naive Bayes classification using improved information gain, one of the methods of feature extraction, by reducing the impact of low-frequency words. In this paper, we use a widely used corpus from NLTK. According to the test results, the accuracy of the...
Chinese text classification is always challenging, especially when data are high dimensional and sparse. In this paper, we are interested in text representation and dimension reduction in Chinese text classification. First, we introduce a topic model, Latent Dirichlet Allocation (LDA), and use it as a dimension reduction method. Second, we choose Support Vector Machine (SVM)...
The objective of the present work is to design a Hadoop-based parallel Marathi content retrieval system that uses a clustering technique to obtain more efficient and optimized results than existing systems. The system also focuses on providing personalized documents in the Marathi language to the end user based on their interests, identified from the browsing history, and on using a time session mechanism for re-ranking...
In BioWorld, a medical intelligent tutoring system, novice physicians are tasked with solving virtual patient cases. Whilst the importance of modeling and predicting clinical reasoning is recognized, an important aspect of the learner contribution remains unexplored: the written case summary prepared by the learner. The premise of investigating the case summaries is that they capture the thought and...
For the last few years, text mining has been gaining significant importance, since knowledge is now available to users through a variety of sources, e.g. electronic media, digital media, print media, and many more. Due to the huge availability of text in numerous forms, a lot of unstructured data has been recorded, and research experts have found numerous ways in the literature to convert this scattered text...
Supervised learning methods rely on large sets of labeled training examples. However, large training sets are rare and making them is expensive. In this research, the Latent Semantic Indexing Subspace Signature Model (LSISSM) is applied to labeling for active learning of unstructured text. Based on Singular Value Decomposition (SVD), LSISSM represents terms and documents as semantic signatures by the...
The exponential growth of the data may lead us to the information explosion era, an era where most of the data cannot be managed easily. Text mining study is believed to prevent the world from entering that era. One of the text mining studies that may prevent the explosion era is text classification. It is a way to classify articles into several predefined categories. In this research, the classifier...
The high computational complexity of text classification is a significant problem with the growing surge in text data. An effective but computationally expensive classification is the k-nearest-neighbor (kNN) algorithm. Principal Component Analysis (PCA) has commonly been used as a preprocessing phase to reduce the dimensionality followed by kNN. However, though the dimensionality is reduced, the...
Since the inception of the concept of social networking, communication patterns have shifted drastically with the unmitigated trend in socializing over the Internet, especially when people began connecting via mobile devices. Nowadays people tend to use these modern communication systems to share their emotions with each other. Human emotions play a vital role in human relationships and people share...
Webpage text classification is an important problem that has been studied through different approaches and algorithms. It aims to assign a predefined category to a Webpage based on its content and linguistic features. It has many applications such as word sense disambiguation, document indexing, text filtering, hierarchical categorization of Webpages and document organization. This study is a part of...
The large-scale hierarchical classification problem concerns how to classify web documents into the categories of a class hierarchy. Because the class hierarchy is very large, containing thousands or even tens of thousands of categories, classification performance is still low. While a reduce-and-conquer strategy has been proposed to make the problem tractable, candidate search is a bottleneck...
Feature selection is one of several factors affecting text classification systems. Feature selection aims to choose a representative subset of all features to reduce the complexity of classification problems. Usually a single method is used for feature selection. For English, several attempts were reported examining the combination of different feature selection methods. To the best of our knowledge...
Many methods, such as mutual information (MI), document frequency (DF), information gain (IG) and the χ² statistic (CHI), have been discussed and applied to the study of meta feature selection. This paper gives a brief review of the recent approaches on this topic. By summarizing and synthesizing these approaches, we propose a framework of the application of meta feature selections, where the...
Given the importance of organizing and managing the rapid growth in knowledge of Arabic electronic content, this study introduces the Weirdness Coefficient (W) as a new feature selection method for Arabic special-domain text classification. The proposed method was used to classify a dataset comprising five Islamic topics using Naïve Bayes (NB) and K-nearest neighbor (K-NN) classifiers, and three representation...
In the field of Chinese text classification, the content and size of the feature space have a decisive impact on accuracy and efficiency. Current incremental learning research ignores these two kinds of feature information in incremental unlabeled training samples. For large-scale, high-dimensional Chinese texts, this paper presents a flexible, effective and universal feature selection strategy. In this...
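Several of the abstracts listed here refer to information gain (IG) as a feature selection score. As a rough illustration only, the following is a minimal sketch of the textbook IG of a term with respect to the class labels, IG(t) = H(C) - P(t) H(C|t) - P(¬t) H(C|¬t). It does not reproduce the improved IG variants proposed in any of the papers; the function names and the toy documents are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Textbook IG of `term`: H(C) minus the class entropy conditioned on
    whether the term is present in a document.

    docs   -- list of token sets, one per document (hypothetical toy input)
    labels -- list of class labels, aligned with docs
    """
    with_term, without_term = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (with_term if term in tokens else without_term)[label] += 1

    n = len(docs)
    h_class = entropy(list(Counter(labels).values()))
    h_conditional = 0.0
    for part in (with_term, without_term):
        size = sum(part.values())
        if size:  # skip an empty partition to avoid division by zero
            h_conditional += (size / n) * entropy(list(part.values()))
    return h_class - h_conditional


# Toy corpus (assumed data, not taken from any of the listed papers).
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"vote", "party"},
    {"vote", "team"},
]
labels = ["sport", "sport", "politics", "politics"]

vocabulary = sorted(set().union(*docs))
ranking = sorted(
    vocabulary,
    key=lambda t: information_gain(docs, labels, t),
    reverse=True,
)
```

In this toy corpus, "goal" occurs only in the "sport" documents and therefore receives the maximum score, while "team" occurs once in each class and scores zero. A feature selector built on this score would keep the top-ranked terms from `ranking`.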