Outlier detection is a primary step in many data mining applications. An outlier is an abnormal individual from a population, which usually leads to poor accuracy in models. Medical literature is the most reliable resource for researchers to learn about progress in their research areas and the latest contributions from others. Traditional keyword search retrieves all the text data that contain the keywords...
Multi-Label classification aims to classify an example that can belong to many classes. Although One-versus-All (OVA) is the most common approach, our prior work has shown that the proposed One-versus-One (OVO) always gives higher prediction accuracy than OVA. However, OVO requires an extremely high computational cost when there are a large number of labels. In this paper, we apply our OVO SVMs on...
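As a rough illustration of why OVO becomes costly with many labels, the sketch below counts the binary classifiers each scheme trains. It assumes the standard decompositions (one classifier per label for OVA, one per unordered label pair for OVO); the function names are hypothetical, and the OVO variant used in the paper may differ.

```python
def ova_classifier_count(num_labels: int) -> int:
    """Standard One-versus-All: one binary classifier per label (assumed scheme)."""
    return num_labels


def ovo_classifier_count(num_labels: int) -> int:
    """Standard One-versus-One: one binary classifier per unordered label pair."""
    return num_labels * (num_labels - 1) // 2


# Compare how the two counts grow as the number of labels increases.
for k in (10, 100, 1000):
    print(k, ova_classifier_count(k), ovo_classifier_count(k))
```

The OVA count grows linearly in the number of labels, while the pairwise OVO count grows quadratically.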
Blind steganalysis is a method used to detect whether a hidden message is present in a medium without having to know the steganography algorithm behind it. A digital image is converted into features using the subtractive pixel adjacency matrix feature extraction algorithm. A model is built from the resulting features using the support vector machine learning method. The support vector machine method...
Spectral clustering is a suitable technique for problems that involve unlabeled clusters with a complex structure, and kernel-based approaches are the most recommended ones. This work aims to demonstrate the relationship between a widely recommended method, kernel spectral clustering (KSC), and other well-known approaches, namely normalized cut clustering and kernel k-means. Such...
People write online documents from different personal perspectives. The competitive perspectives they hold reflect the conflicts in their fundamental stances and viewpoints. For many security-related applications, it is both beneficial and critical to identify the competitive perspectives implied in online documents. Previous work on competitive perspective identification is based on word features,...
The rapid growth of web sources has changed language learning behavior. More and more people use web sources instead of paper books. However, finding useful information is now overwhelming. In addition, when choosing between different words, good example sentences that demonstrate the nuances among words are extremely helpful, but learners can hardly find them as most web dictionaries...
Speaker verification is the process of determining whether the speaker is who he/she claims to be. This paper proposes a speaker verification system that uses phonemes as features. Phonemes are extracted from the input using spectrogram image analysis without any training, and speaker verification is done using a binary SVM. Phonemes are meaning-distinguishing sounds within words. Phonemes...
In this letter, we propose a novel hyperspectral image (HSI) classification method based on the joint collaborative representation (JCR) and support vector machine (SVM) models with decision fusion. First, motivated by the joint model, we adopt a JCR model to deal with HSI classification and develop an effective method to learn contextual basis vectors for the JCR model. Second, the mid-features are...
In this paper, we propose a new high-quality approach to selecting pseudo-relevance feedback documents that uses a machine-learning-based classifier to select a set of good feedback documents for boosting the effectiveness of Query Expansion (QE). Our proposed classification technique uses a very small labelled data set for training, which is very appropriate to select a set of good...
Some pattern recognition techniques may incur a high computational cost for learning the behaviour of samples. The Optimum-Path Forest (OPF) classifier has recently been developed to overcome such drawbacks. Although it can achieve faster training than some state-of-the-art techniques, OPF can be slower for testing in some situations. Therefore, we propose in this paper an implementation...
Identifying the mentor and the mentee in an online community is very difficult because personal characteristics are hidden or missing, but it is very important for the organization. New members of the organization will probably require not only explicit knowledge from their colleagues in the same department, but also implicit knowledge from the online community. The mentor and the mentee...
Accurate and timely traffic classification is key to providing Quality of Service (QoS), application-level visibility, and security monitoring for network operations and management. A class of traffic classification techniques has emerged that applies machine learning to predict the application class of a traffic flow based on the statistical properties of flow features. In this paper,...
Social media, especially Twitter, is growing rapidly. Twitter is often used to comment on a product, a person or even a television program. The comments written by Twitter users can reach hundreds of thousands or even millions every day. Comments obtained from Twitter can complement a television program assessment that is usually done using ratings, which only represented...
Classifier learning is challenging when the training data is inadequate in either quantity or quality. Prior knowledge is therefore important in such cases for improving classification performance. In this paper we study a specific type of prior knowledge called hidden information, which is available only during training and not during testing. Hidden information has abundant applications...
Finding an effective way to score Chinese written essays automatically remains challenging for researchers. Several methods have been proposed and developed, but they are limited to the character and word usage levels. However, as one of the scoring standards, the content or topic perspective is also an important and necessary indicator for assessing an essay. Therefore, in this paper, we propose a novel perspective...
In many classification problems, there exists additional information which is available during training but not available during testing. In this paper we denote such information as hidden information, and study how to incorporate it to improve the learning performance. Despite its importance, learning with hidden information has not attracted enough attention from the field and existing work in this...
The optimization-method-based extreme learning machine (optimization-based ELM) is generalized from single-hidden-layer feed-forward neural networks (SLFNs) by making use of kernels instead of neuron-like hidden nodes. This approach is known for its high scalability, low computational complexity, and mild optimization constraints. The multi-kernel learning (MKL) framework SimpleMKL iteratively determines...
Large amounts of handwritten documents have been digitized, and the need to search and index these documents is increasing to make them more accessible. Different word spotting systems have been proposed to search for words for this purpose. Since the precision of the word spotting system is crucial, verifying the results of a word spotting system is becoming an effective approach to improve the system...
Long-term (multi-step-ahead) time series prediction is a much more challenging task than short-term (one-step-ahead) time series prediction. This is due to the increasing uncertainty and the lack of knowledge about the future trend. In this paper, we propose a multi-model integration strategy to 1) generate predicted values using multiple predictive models; and then 2) integrate the predicted...
Teachers and parents may use readability to select appropriate learning materials for primary school students. This research constructs a Thai stop word list and evaluates the impact of eliminating stop words on the readability assessment of Thai text. The corpus contains 1,188 textbook articles used by students from grade 1 to grade 6. Word segmentation, stop word list extraction, and feature selection...