This study examines educational approaches to higher education computing curricula using publicly available data from a representative sample of 27 public colleges in the State University of New York (SUNY) system. Data and text analysis techniques were applied to a corpus built from course descriptions and listed prerequisites across school, department, school type (2- or 4-year or graduate)...
Nowadays many crimes are related to the financial domain, so forensic analysis of the associated documents is required. Thanks to digitization, investigating many of these documents is faster. Analyzing the documents manually is time-consuming and tedious, so we follow an approach that applies a clustering algorithm to the documents for forensic analysis of the seized system, which will help the...
There are several requirements for the preprocessing of classified texts. In this work, the importance of these requirements is analysed.
From among the many text mining methods, the paper adopts fuzzy c-means. To address the defect that fuzzy c-means is sensitive to its initial values and has poor stability, an improved text mining method, GAFCM, is put forward. GAFCM uses the global search capability of genetic algorithms to improve fuzzy c-means. Finally, it is shown that the improved text mining method has...
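As a rough illustration of the baseline that GAFCM builds on, the following is a minimal sketch of standard fuzzy c-means, not the paper's GAFCM. The function names, the `init_centers` parameter, and the toy data are assumptions made for this example. The initial centers are passed in explicitly because that is the input the abstract describes as sensitive; a genetic algorithm, as in GAFCM, would presumably search over such starting values, but that step is not shown here.

```python
import math

def _memberships(points, centers, m):
    """Standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances away from zero so a point sitting on a center is safe.
        dists = [max(math.dist(p, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((d_i / d_k) ** power for d_k in dists)
                     for d_i in dists])
    return rows

def fuzzy_c_means(points, init_centers, m=2.0, n_iter=50):
    """Plain fuzzy c-means on tuples of floats (hypothetical helper).

    init_centers is the sensitive starting value; how it is chosen
    (randomly, or by a genetic algorithm) is outside this sketch.
    """
    centers = [tuple(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(n_iter):
        u = _memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Each center is the mean of all points weighted by u_ij^m.
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)))
        centers = new_centers
    # Recompute memberships so they match the returned centers.
    return centers, _memberships(points, centers, m)

# Toy one-dimensional data with two well-separated groups (illustrative only).
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
centers, memberships = fuzzy_c_means(points, init_centers=[(0.0,), (5.0,)])
```

Each row of `memberships` sums to 1, and unlike hard clustering every document keeps a graded degree of belonging to each cluster.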
Keyword extraction plays a very important role in the text-mining domain, since keywords can represent the asserted main point of a document. Based on the term network and deleting actor index, an effective keyword extraction algorithm is proposed to extract high-frequency terms as well as important low-frequency terms. The experimental results support the conclusion.
In this paper, we present a new mathematical model based on a “Vector Space Model” and consider its implications. The proposed method is evaluated by performing several experiments. In these experiments, we classify newspaper articles from the English Reuters-21578 data set and the Taiwanese China Times 2005 data set using the proposed method. The Reuters-21578 data set is a benchmark data set for automatic...
Feature selection is an important preprocessing step in Chinese text categorization: it reduces the high dimensionality and, compared to feature extraction, keeps the reduced results comprehensible. A novel criterion to filter features coarsely is proposed, which integrates the strengths of term frequency-inverse document frequency as an inner-class measure and chi-square as an inter-class measure, and a new feature...
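The two ingredients named in this abstract can be sketched separately with their textbook definitions. The abstract does not say how the paper combines them, so no combined criterion is attempted here; the function names, argument names, and sample counts below are assumptions made for illustration.

```python
import math

def tf_idf(term_count, doc_length, n_docs, doc_freq):
    """Textbook TF-IDF: relative term frequency times log inverse document frequency."""
    return (term_count / doc_length) * math.log(n_docs / doc_freq)

def chi_square(a, b, c, d):
    """Chi-square statistic for one term and one class from a 2x2 table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of dependence
    return n * (a * d - b * c) ** 2 / denom

# Illustrative counts only: a term that perfectly separates a class scores high,
# a term spread evenly across classes scores zero.
perfect = chi_square(10, 0, 0, 10)
uninformative = chi_square(5, 5, 5, 5)
weight = tf_idf(term_count=2, doc_length=10, n_docs=100, doc_freq=10)
```

TF-IDF scores a term within a document, while chi-square scores the dependence between a term and a class, which is consistent with the abstract's inner-class versus inter-class distinction.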
Several features of Chinese texts create a technological bottleneck in Chinese text mining, and at present the results of Chinese text clustering obtained by traditional methods are not very satisfactory. In this paper, we propose applying an English text clustering method called Text Clustering via Particle Swarm Optimizer (TCPSO) to solve the Chinese text clustering...
Since online news articles are updated daily, hourly and sometimes every minute, the data from online news articles are growing rapidly. These data form a large corpus for text mining. This research focuses on Thai personal names that appear in online news, which sometimes have slightly different spellings but actually refer to the same person. From the news data that were collected...
On the basis of an analysis of the basic concepts and process of text mining, the present study proposes some new methods for text feature extraction, feature set reduction, learning and knowledge pattern extraction, and model quality evaluation. It also compares two types of text categorization, text classification and text clustering, and it briefly explores...
Document classification is a key task for many text mining applications. However, traditional text classification requires labeled data to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available. In this work, we propose a universal text classifier, which does not require any labeled document. Our approach simulates the capability of people to classify documents...
At present, graduate students need to choose some courses by themselves, and they often do so somewhat blindly. The paper puts forward a set of text mining algorithms based on association rules. The algorithms were used to study the relevance between course selection and research projects, which could provide some reference for graduate students. First, a scheme for computing the degree of relevance between words was put forward...
In this paper, we propose a new similarity measure to compute the pair-wise similarity of text-based documents based on patterns of the words in the documents. First we develop a kappa measure for pair-wise comparison of documents; then we use the ordered weighted averaging operator to define a document similarity measure for a set of documents.
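The two steps in this abstract can be illustrated with a minimal sketch. It assumes the kappa measure is Cohen's kappa computed over term presence and absence in a shared vocabulary; the paper's exact definition may differ. The ordered weighted averaging (OWA) operator is the standard one: sort the values in descending order, then take a weighted sum. All names, weights, and sample documents are hypothetical.

```python
from itertools import combinations

def kappa_similarity(doc_a, doc_b, vocabulary):
    """Cohen's kappa on term presence/absence (an assumed reading of 'kappa measure')."""
    n = len(vocabulary)
    if n == 0:
        raise ValueError("vocabulary must not be empty")
    a_terms, b_terms = set(doc_a.split()), set(doc_b.split())
    both = sum(1 for t in vocabulary if t in a_terms and t in b_terms)
    neither = sum(1 for t in vocabulary if t not in a_terms and t not in b_terms)
    in_a = sum(1 for t in vocabulary if t in a_terms)
    in_b = sum(1 for t in vocabulary if t in b_terms)
    observed = (both + neither) / n
    # Agreement expected by chance, given each document's own term rate.
    expected = (in_a * in_b + (n - in_a) * (n - in_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both documents contain all terms, or both contain none
    return (observed - expected) / (1.0 - expected)

def owa(values, weights):
    """Ordered weighted averaging: weights apply to values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))

# Hypothetical three-document set and vocabulary.
docs = ["text mining clustering", "text mining classification", "genetic algorithm search"]
vocab = sorted({t for d in docs for t in d.split()})
pairwise = [kappa_similarity(x, y, vocab) for x, y in combinations(docs, 2)]
set_similarity = owa(pairwise, weights=[0.5, 0.3, 0.2])
```

The choice of OWA weights controls how much the most similar pairs dominate the set-level score: weights concentrated at the front behave like a maximum, and uniform weights reduce to a plain average.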
Text categorization is an important research field within text mining. A document is often full of class-independent "general" words that many documents and classes share. These "general" words harm text categorization rather than contribute to the task. Inspired by the human cognitive procedure in text classification tasks, we propose a novel approach called Class Core Extraction...
In this work we look into analyzing blogs to classify products according to users' queries. Blogs can be found over the Internet where buyers share their opinions on different products that are available in the market. Such pages may prove to be good guides for a prospective buyer. However, going through a large number of blogs and converting their opinions into a meaningful decision is often difficult...
An important problem in text mining is the automatic extraction of semantic relations. The paper provides a domain-independent method for automatic extraction of part-whole relations in Chinese corpora. The method consists of three phases. First, a set of lexico-syntactical patterns for part-whole relations is designed using known pairs of concepts encoding part-whole relations as seeds, and manually...