Work on sentiment analysis has thus far been limited in the news article domain. This has mainly been caused by 1) news articles lacking a clearly defined target, 2) the difficulty in separating good and bad news from positive and negative sentiment, and 3) the seeming necessity of, and complexity in, relying on domain-specific interpretations and background knowledge. In this paper we propose, define,...
Support Vector Machines (SVM), one of the new techniques for text classification, have been widely used in many application areas. An SVM tries to find an optimal hyperplane within the input space that correctly separates the two classes of a binary classification problem. We present a novel heuristic text classification approach based on a genetic algorithm (GA) and SVM. Simulation results demonstrate that GA and SVM...
The following topics are dealt with: data mining; local clustering; spatiotemporal event detection; time series; Markov models; email classification; data stream; parallel mining; Bayesian network; unsupervised learning; missing values prediction; anomaly detection; decision tree; binary classifier; data similarity matrix; data mapping; support vector machine; Mapreduce; document similarity; social...
This paper investigates how to integrate multi-modal features for story boundary detection in broadcast news. The detection problem is formulated as a classification task, i.e., classifying each candidate into boundary/non-boundary based on a set of features. We use a diverse collection of features from text, audio and video modalities: lexical features capturing the semantic shifts of news topics...
The following topics are dealt with: linear approximation; license plate recognition; color image segmentation; image quantization; wireless video transmission; congestion control; stochastic search; transmembrane helical segments; wavelet transform; semisupervised cluster algorithm; anomaly detection; data privacy; online market information processing; user behavior; particle swarm optimization;...
With the rapid growth of internet communication, many types of text are being produced. These texts convey meanings that can contribute to text categorization. Emotion classification is also attracting growing interest, but emotion in Thai text still cannot be classified accurately. Thus, this paper proposes a novel approach that takes advantage of bi-word occurrence to classify emotion...
Bug assignment is an important step in bug life-cycle management. In large projects, this task can consume a substantial amount of human effort. To compare with previous studies on automatic bug assignment in FOSS (free/open source software) projects, we conduct a case study on a proprietary software project in China. Our study consists of two experiments on automatic bug assignment, using Chinese...
This paper concerns the fundamental problem of identifying the content nature of a flow, namely text, binary, or encrypted, for the first time. We propose Iustitia, a tool for identifying flow nature on the fly. The key observation behind Iustitia is that text flows have the lowest entropy and encrypted flows have the highest entropy, while the entropy of binary flows stands in between. The basic...
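The entropy observation in the abstract above can be illustrated with a minimal sketch. This is not Iustitia's implementation, which the truncated abstract does not describe; it only computes the Shannon entropy of a byte string. The function name `byte_entropy`, the sample text, and the use of random bytes as a stand-in for an encrypted flow are all assumptions made for illustration.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical samples: plain English text, and random bytes standing in
# for encrypted content (an assumption, not data from the paper).
text_sample = b"the quick brown fox jumps over the lazy dog " * 50
random_sample = os.urandom(4096)

print("text entropy:  ", byte_entropy(text_sample))
print("random entropy:", byte_entropy(random_sample))
```

Under these assumptions the text sample scores well below the random sample, which matches the ordering the abstract describes for text and encrypted flows.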
In this paper we compare the effectiveness of using morphological and ontological information for text categorization. We induce morphological information using stemmed features. Ontological information, on the other hand, has been induced in the form of WordNet hypernyms. We form text representations based on stemming and hypernyms. Those representations are evaluated using four different machine...
We propose a feature called category browsing to enhance the full-text search function of a Thai-language news article search engine. Category browsing allows users to browse and filter search results based on predefined categories. To implement the category browsing feature, we applied and compared several text categorization algorithms including decision tree, Naive Bayes (NB) and Support...
This paper proposes a novel approach to Chinese text categorization based on a non-linear support vector machine decision tree (NSVMDT) and K nearest neighbors (KNN). First, SVM is extended to non-linear SVM by using kernel functions. Then the NSVMDT method is presented, based on the traditional SVM decision tree. Furthermore, KNN is combined with NSVMDT to solve the...
Information retrieval faces great challenges due to the explosive growth in network scale. This has led more researchers to focus on Web page classification technology. By applying a binary decision tree to the support vector machine (SVM), the paper introduces an assorted SVM method for Web information classification, the decision support vector machine (DSVM). The approach was further...
This paper describes sentiment classification with review extraction. The whole process can be illustrated logically as: (1) extract the review expressions on specific subjects and attach a sentiment tag and weight to each expression; (2) calculate the sentiment indicator of each tag by accumulating the weights of all the expressions with the corresponding tag; (3) given the indicators on different tags,...
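Step (2) of the process above can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's method: the input format (a list of tag and weight pairs produced by step 1), the tag names, and the function name `sentiment_indicators` are all hypothetical. Step (3) is cut off in the abstract and is deliberately not implemented.

```python
from collections import defaultdict


def sentiment_indicators(expressions):
    """Sum the weights of all expressions that share a sentiment tag.

    `expressions` is assumed to be an iterable of (tag, weight) pairs,
    i.e. the hypothetical output of the extraction step.
    """
    totals = defaultdict(float)
    for tag, weight in expressions:
        totals[tag] += weight  # accumulate weight under its tag
    return dict(totals)


# Hypothetical extracted expressions with illustrative tags and weights.
extracted = [("positive", 0.5), ("negative", 0.25), ("positive", 0.25)]
print(sentiment_indicators(extracted))
```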