A large number of videos are generated and uploaded to video websites (such as Youku and YouTube) every day, and video websites play an increasingly important role in daily life. While bringing convenience, this volume of video data makes it harder to summarize videos so that users can browse them easily. However, although there are many existing video summarization approaches, the key frames selected...
This paper uses audio mining techniques to extract implicit knowledge and data relationships from audio and audio similarity measures. A model for audio clustering and classification is proposed. Neural networks are used for classifying the data. A working prototype of the music classification system has been developed and tested in MATLAB 6.5 using the Signal Processing Toolbox...
Classification, or supervised learning, is one of the major data mining processes. Protein classification focuses on predicting the function or the structure of new proteins. This can be done by classifying a new protein to a given family with previously known characteristics. There are many approaches available for classification tasks, such as statistical techniques, decision trees and the neural...
Feature selection is commonly used in bioinformatics applications, such as gene selection from DNA microarray data. Recently, wrapper methods have been proposed as an improvement over traditionally used filter-based feature selection methods. In wrapper methods, the goodness of a feature set is often measured using the cross-validation performance of a machine learning method trained with the features...
Worms are self-contained programs that spread over the Internet. Worms cause problems such as loss of information, information theft and denial-of-service attacks. The first part of the paper evaluates the detection of worms based on content classification by using all machine learning techniques available in the WEKA data mining tool. The four most accurate and reasonably fast classifiers are identified for...
Knowledge about protein-protein interactions unveils the molecular mechanisms of biological processes. This paper presents a multiple kernel learning-based approach to automatically extracting protein-protein interactions from biomedical literature. Experimental evaluations show that our approach can achieve state-of-the-art performance with respect to comparable evaluations, with 64.88% F-score...
We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model that will be able to distinguish examples in the future that do not belong to the same class. Traditional approaches typically compare the position of a new data point to the set of "normal" training data points in a chosen...
When we think of an object in a supervised learning setting, we usually perceive it as a collection of fixed attribute values. Although this setting may be well suited to many classification tasks, we propose a new object representation and with it a new challenge in data mining: an object is no longer described by one set of attributes but is represented in a hierarchy of attribute sets in different...
We study the retrieval task that ranks a set of objects for a given query in the pairwise preference learning framework. Researchers recently found that raw features (e.g. words for text retrieval) and their pairwise features, which describe relationships between two raw features (e.g. word synonymy or polysemy), could greatly improve the retrieval precision. However, most existing methods can...
This paper presents a novel framework for multi-folder email classification using graph mining as the underlying technique. Although several techniques exist (e.g., SVM, TF-IDF, n-gram) for addressing this problem in a delimited context, they heavily rely on extracting high-frequency keywords, thus ignoring the inherent structural aspects of an email (or document in general) which can play a critical...
Human motion recognition in video data has several interesting applications in fields such as gaming, senior/assisted living environments, and surveillance. In these scenarios, we might have to consider adding new motion classes (i.e. new types of human motions to be recognized) as well as new training data (say, for handling different types of subjects). Hence, both accuracy of classification and...
Lazy Associative Rule Mining (LARM) integrates lazy learning and Associative Rule Mining (ARM) to tailor label prediction results by generating related class associative rules (CARs) only when an unlabeled document arrives. However, two main problems must be carefully addressed in LARM classification: (1) computing efficiency and (2) prediction bias toward the dominant class. The main idea of the proposed method,...
MEDLINE®, the flagship database of the U.S. National Library of Medicine, is a critical source of information for biomedical research and clinical medicine. The automated extraction of bibliographic data, such as article titles, author names, abstracts, and references, is essential to the affordable creation of this citation database. References, typically appearing at the end of journal articles,...
This paper presents a model of a supervised machine learning approach for classification of a dataset. The model extracts a set of patterns common in a single class from the training dataset according to the rules of the pattern-based subspace clustering technique. These extracted patterns are used to classify the objects of that class in the testing dataset. The user-defined threshold dependence...
Classical intrusion detection systems tend to identify attacks by using a set of rules, known as signatures, defined before the attack; this kind of detection is known as misuse intrusion detection. But reality is not always quantifiable, and this drives us to a new intrusion detection technique known as anomaly intrusion detection, due to the difficulties of defining a normal pattern for random data...
To relieve "News Information Overload", classification, summarization and recommendation techniques have been proposed. However, these techniques fail to provide sufficient semantic information about news events. In this paper, considering 5W1H (Who, What, Whom, When, Where and How), the full list of elements of a news article, we propose a novel approach to extract event semantic elements...
In this paper, we propose a category-specific incremental visual codebook training method for scene categorization. In this method, based on a preliminary codebook trained from a subset of training samples, we incrementally introduce the remaining training samples to enrich the content of the visual codebook. Then, the incrementally learned codebook is used to encode the images for scene categorization...
It is vital to develop automatic information extraction systems to help researchers cope with the vast amount of data available on the Internet. In this paper, we describe a framework to extract precise information about coexpression relationships among genes from published literature using a supervised machine learning approach. We use a graphical model, Dynamic Conditional Random Fields (DCRFs),...
Research on opinion detection has shown that a large amount of opinion-labeled data is necessary for capturing subtle opinions. However, opinion-labeled data, especially at the sub-document level, is often limited. This paper describes the application of Semi-Supervised Learning (SSL) to automatically produce more labeled data and explores the potential of SSL to improve transfer of labeled data...
The paper presents a novel approach to automate the Change Detection (CD) problem for the specific task of road extraction. Manual approaches to CD fall short in the time needed to release updated maps; in contrast, automatic approaches, based on machine learning and image processing techniques, allow large areas to be updated in a short time with an accuracy and precision comparable to those obtained...