The Naïve Bayes classifier has proved to be one of the most effective classifiers and is widely used. It applies statistical theory to text classification. This paper researches and implements a Chinese text classifier in Java based on the Naïve Bayes method. First of all, this paper describes the text classification system; the content includes text information expression, extraction and the method of Chinese...
Text classification continues to be one of the most researched problems due to the continuously increasing amount of electronic documents and digital data. Naive Bayes is a simple and effective classifier for data mining tasks, but it does not show very satisfactory results in automatic text classification problems. In this paper, the performance of the naive Bayes classifier is analyzed by training the...
Text classification plays an important role in information extraction and summarization, text retrieval, and question answering. The discriminative multinomial naive Bayes classifier has been a focus of research in the field of text classification. This paper increases the accuracy of the discriminative multinomial Bayesian classifier by using a feature selection technique that evaluates the...
Text classification continues to be one of the most researched problems due to the continuously increasing amount of electronic documents and digital data. Classifying documents into closely related categories is the most complex task in text categorization. Feature selection is an essential preprocessing step for improving the efficiency and accuracy of text classifiers by removing redundant and...
We consider the problem of both supervised and unsupervised classification for multidimensional data that are non-Gaussian and of mixed types (continuous and/or discrete). An important subclass of graphical model techniques called generalized linear statistics (GLS) is used to capture the underlying statistical structure of these complex data. GLS exploits the properties of exponential family distributions,...
Text classification refers to determining the class of an unknown text according to its content within a given classification system. In order to express as much of the information in the text as possible with fewer features, the paper analyzes the statistical properties of the various features and extracts the global features according to Zipf's law; and then, based on the statistical analysis of the...
Researchers have concentrated on topic-based text classification, while the genre of a document is rarely considered. In this article, we discuss automatic genre classification and its application. We argue that word-level features and sentence-level features are two important measures which vary in number among different genres. Word-level features include word frequency and POS (part of speech)...
Key words are expressions that indicate and express the subject concept of a text; the major property of key words is to denote the subject. Based on the domainal inhomogeneity and critical region of key words, a subject degree is introduced and calculated by a statistical model to cue the text's subject concept. Based on key words and their subject degree, a comprehensive auto-indexing system is constructed,...
Text categorization is a key issue in text mining. Although there are many studies on this problem, the majority of them focus on classification into rough categories. In this kind of problem, there are obviously different features that can differentiate one category from the others. Very few studies have addressed the fine text categorization (FTC) problem, which is characterized by many duplicated...
Port state control (PSC) inspection is the most important mechanism for ensuring marine safety worldwide. Recently, several SVM-based risk assessment systems have been presented. They estimate the risk of each candidate ship based on its generic factors and historical inspection factors in order to select high-risk ships before conducting on-board PSC inspection. However, how to improve the performance of the...