In the last ten years, automatic Text Categorization (TC) has attracted increasing interest from the research community, due to the need to organize a massive number of digital documents. Following a machine learning paradigm, this paper presents a model that regards TC as a classification task supported by a wrapper approach and combines the use of a Genetic Algorithm (GA) with a filter...
Text classification is an important research direction in text mining, and automatic classification of Chinese text is becoming a research focus within intelligent classification. To address the particular characteristics of Chinese text classification, this paper presents a three-dimensional vector space model, built on the vector space model, to improve the accuracy and efficiency of text...
The high dimensionality of text categorization raises major hurdles in applying many sophisticated learning algorithms to it. Feature selection, which reduces the number of features that represent documents, is an absolute requirement in text categorization. In this paper, we propose a feature selection method that improves the performance of the Ambiguity Measure feature...
With the rapidly increasing availability of text data on the Internet, selecting an appropriate set of features for text classification becomes more important, not only for reducing the dimensionality of the feature space but also for improving classification performance. This paper proposes a novel feature selection approach to improve the performance of text classifiers based on...
Text categorization is used to assign each text document to predefined categories. This paper presents a new method for classifying Chinese text based on the Rocchio algorithm. We first use TFIDF to extract document vectors from correctly categorized training documents, and then use those document vectors to generate codebooks as classification models using...
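The TFIDF-plus-Rocchio pipeline mentioned in this abstract can be illustrated with a minimal sketch. It assumes the textbook form of Rocchio classification (one centroid per category, cosine similarity) and does not attempt the paper's codebook generation, whose details are not given here. All function names (`tfidf_vector`, `build_idf`, `train_centroids`, `classify`) are hypothetical, and documents are assumed to be pre-tokenized lists of terms.

```python
import math
from collections import Counter, defaultdict


def build_idf(docs):
    """Compute a smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # The +1.0 keeps terms that occur in every document from getting zero weight.
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf_vector(doc, idf):
    """Return an L2-normalized sparse TF-IDF vector (dict) for one document."""
    tf = Counter(doc)
    vec = {t: tf[t] * idf[t] for t in tf if t in idf}  # unseen terms are dropped
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def train_centroids(vectors, labels):
    """Average the document vectors of each category into one centroid."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {
        label: {t: w / counts[label] for t, w in terms.items()}
        for label, terms in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(doc, idf, centroids):
    """Assign the category whose centroid is most similar to the document."""
    vec = tfidf_vector(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training data (illustrative only, not from the paper).
train_docs = [
    ["ball", "goal", "team"],
    ["team", "match", "goal"],
    ["stock", "market", "price"],
    ["price", "bank", "market"],
]
train_labels = ["sports", "sports", "finance", "finance"]

idf = build_idf(train_docs)
centroids = train_centroids(
    [tfidf_vector(d, idf) for d in train_docs], train_labels
)
print(classify(["goal", "match"], idf, centroids))
```

For Chinese text, a word segmentation step would have to produce the token lists first; that step is outside this sketch.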
Self-labeled training data in semi-supervised learning may contain much noise because the initial training data are insufficient, which may hurt the generalization ability of the final hypothesis. In this paper, we propose an Active Semi-Supervised framework with Data Editing (ASSDE) to improve sparsely labeled text classification. A data editing technique is used to identify and remove noise introduced...
Supervised learning is a popular approach to text classification in the research community as well as in the software development industry. It enables intelligent systems to solve various text analysis problems such as document organization, spam detection and report scoring. However, the extremely difficult and time-intensive process of creating a training corpus makes it inapplicable to many...
The traditional KNN algorithm for text classification has some shortcomings, so this paper presents an improved KNN algorithm. Using the cluster center vector, we incorporate the distance between the text to be classified and the text category into the similarity formula, and take the ratio of the number of features common to two texts to the maximum number of their respective features...
This paper analyses the shortcomings of the traditional support vector machine (SVM). According to the characteristics of grain information on the web, a multi-class classification method based on a Huffman binary tree SVM (HBT-SVM) is presented for grain information classification. Compared with existing SVM methods, this method has higher computational efficiency. The experimental results...
Clustering-aided classification methods are based on the assumption that the clusters learned under the guidance of initial training data can, to some extent, characterize the underlying distribution of the data set. However, our experiments show that whether this assumption holds depends on both the separability of the data set considered and the size of the training data set. It is often violated on data...
Organizing and managing large amounts of document data, and finding the information users are interested in quickly and accurately, is a great challenge for information technology. Text classification can help sort information into appropriate streams and reduce information disorder, which in turn makes it easier for users to make decisions. The centroid classifier is one of the most efficient...
With the rapid development of computer science and technology, quickly finding useful or needed information has become a major problem for users. Text categorization can help people solve this problem. Feature selection has become one of the most critical techniques in the field of automatic text categorization. A new method of text feature selection based...
One of the benefits of text classification is the automatic assignment of documents to predefined categories. Researchers have applied the LVQ algorithm to English and Persian [1, 2] but have paid no attention to Arabic. In our research, we therefore use a neural network approach to classify Arabic text with the Learning Vector Quantization (LVQ) algorithm. This algorithm is based on Kohonen self organizing...
Text categorization is the main issue that affects search results. Moreover, most approaches suffer from the high dimensionality of the feature space. To overcome this problem, the use of feature selection techniques with statistical text categorization is investigated. The methods were evaluated based on Chi-Square, Information Gain and Gain Ratio. The data used to test the system consisted of 1,510...
Text classification is the process of assigning documents to a set of previously fixed categories. It is widely used in many applications, such as web page categorization, email spam filtering, and document indexing. Many popular algorithms for text classification have been proposed, such as Naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). However, these classification...
To overcome two problems, namely that SVM-based text classification ignores contextual semantic information and that community-based text classification allows a boundary point to belong to only one community, the concepts of contribution and overlapping coefficient based on a complex network graph are introduced. A feature selection algorithm based on community discovery is also proposed. Experiments...
Information has great value; to use existing information, we need to store it in a manner that allows easy retrieval when needed. Classifying the available information therefore becomes necessary. In addition to the existing supervised and unsupervised paradigms of classification, the paper attempts to exploit the semi-supervised learning paradigm. Semi-supervised learning is...
Text categorization has been used in many applications and areas. Classifying Arabic texts differs from classifying English texts because Arabic is a highly inflectional and derivational language, which makes morphological analysis a very complex task. This short paper reviews some research on Arabic text categorization, and recent work on adopting rough sets...
A major difficulty of text categorization is the extremely high dimensionality of the text feature space. Feature selection techniques are desirable for large-scale text categorization tasks to improve accuracy and efficiency. The χ2 statistic and simplified χ2 are two effective feature selection methods in text categorization. Using these two feature selection criteria, for a term, one needs to...
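As a rough illustration of the χ2 criterion named in this abstract, the sketch below computes the standard χ2 statistic for one term and one category from a 2×2 contingency table. It assumes the commonly used form χ2(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)); the simplified χ2 variant and the paper's own procedure are not shown. The function names and toy data are hypothetical.

```python
def term_category_counts(docs, labels, term, category):
    """Build the 2x2 contingency counts for a term and a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        in_category = label == category
        if has_term and in_category:
            a += 1
        elif has_term:
            b += 1
        elif in_category:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Standard chi-square score of term/category dependence."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or category never varies
    return n * (a * d - c * b) ** 2 / denom


# Toy data (illustrative only): each document is a set of terms.
docs = [{"goal", "team"}, {"goal"}, {"price"}, {"market"}]
labels = ["sports", "sports", "finance", "finance"]

counts = term_category_counts(docs, labels, "goal", "sports")
print(counts, chi_square(*counts))
```

A typical use is to score every term against every category, keep each term's maximum (or average) score, and retain only the top-ranked terms as features.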
Effective classification of web pages can improve the quality of information retrieval. Traditional classification algorithms are mostly based on analysis of web content, but web page content is complicated and filled with a large amount of false or erroneous information, which seriously affects the accuracy of classifying network information. To solve this problem, this...