A large number of people easily download music files from Web sites, but few music sites provide personalized services. We therefore suggest a method for personalized services. We extract the properties of music from its sound wave, using the STFT (short-time Fourier transform) to analyze those properties, and we infer users' preferences from their music lists. To analyze users' preferences we propose a...
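As a rough illustration of the STFT step mentioned in this abstract, the following minimal sketch splits a signal into overlapping frames, applies a Hann window, and computes a naive DFT per frame. The function name `stft`, the frame size, the hop length, and the toy input are all assumptions made for illustration; the abstract does not say how the authors configure or implement the transform.

```python
import cmath
import math


def stft(signal, frame_size=8, hop=4):
    """Return one magnitude spectrum per overlapping, Hann-windowed frame."""
    # Periodic Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    spectra = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, keeping only the non-negative frequency bins.
        bins = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        spectra.append(bins)
    return spectra


# Toy input (hypothetical): a sinusoid completing 2 cycles per 8-sample frame.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spectra = stft(tone)
```

Each entry of `spectra` is the magnitude spectrum of one frame, so the list as a whole describes how the frequency content changes over time. A real system would likely use an FFT library rather than this quadratic-time DFT.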
We study the problem of correcting spelling mistakes in text using memory-based learning techniques and a very large database of token n-gram occurrences in web text as training data. Our approach uses the context in which an error appears to select the most likely candidate from words which might have been intended in its place. Using a novel correction algorithm and a massive database of training...
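The idea of using the surrounding context to choose among candidate words can be sketched as follows. This is only a minimal illustration under stated assumptions: the trigram counts, the confusion sets, and the helper names `candidates` and `correct` are hypothetical, and a plain count lookup stands in for the memory-based learning and the correction algorithm that the abstract refers to but does not describe.

```python
# Hypothetical trigram counts standing in for a web-scale n-gram database.
TRIGRAM_COUNTS = {
    ("a", "piece", "of"): 900,
    ("a", "peace", "of"): 12,
    ("world", "peace", "is"): 300,
    ("world", "piece", "is"): 2,
}

# Hypothetical confusion sets: words that might have been intended for each other.
CONFUSION_SETS = [{"piece", "peace"}]


def candidates(word):
    """Return the confusion set containing `word`, or just the word itself."""
    for group in CONFUSION_SETS:
        if word in group:
            return sorted(group)
    return [word]


def correct(tokens, index):
    """Pick the candidate whose (left, candidate, right) trigram is most frequent.

    Assumes the token at `index` has both a left and a right neighbor.
    """
    left, right = tokens[index - 1], tokens[index + 1]
    return max(candidates(tokens[index]),
               key=lambda c: TRIGRAM_COUNTS.get((left, c, right), 0))
```

For example, `correct(["a", "peace", "of", "cake"], 1)` selects the candidate that is more frequent between "a" and "of" in the toy counts.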
Inductive transfer is applying knowledge learned on one set of tasks to improve the performance of learning a new task. Inductive transfer is being applied in improving the generalization performance on a classification task using the models learned on some related tasks. In this paper, we show a method of making inductive transfer for text classification more effective using Wikipedia. We map the...
In modern business, educational, and other settings, it is common to provide a digital network that interconnects hardware devices for shared access by the users (e.g., in an office where printers are available for use by all the office workers). In such a context, so-called "soft" failures, where a device silently starts working in a degraded mode, may easily go unnoticed for a long time,...
Advances in high-throughput technology provide massive high-dimensional data. It is very important and challenging to study the association of genes with various clinical outcomes. Due to the large variability in time to a certain clinical event among patients, studying possibly censored survival data can be more informative than classification. We proposed Cox's proportional hazards model with Lp penalty...
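For orientation, a Cox proportional hazards model with an Lp penalty is commonly written as a penalized partial log-likelihood. The form below is a standard textbook sketch, not necessarily the exact formulation used by the authors, since the abstract is truncated. The notation is assumed: $x_i$ is the covariate (gene expression) vector of patient $i$, $\delta_i = 1$ marks an observed (uncensored) event at time $t_i$, $R(t_i)$ is the set of patients still at risk at $t_i$, $\lambda \ge 0$ is the penalty weight, and $d$ is the number of covariates.

```latex
\hat{\beta} = \arg\max_{\beta}
\left\{
  \sum_{i:\,\delta_i = 1}
  \left[
    x_i^{\top}\beta
    - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right)
  \right]
  - \lambda \sum_{k=1}^{d} \lvert \beta_k \rvert^{p}
\right\}
```

With $p = 1$ the penalty reduces to the lasso and with $p = 2$ to ridge regression.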
Microarray techniques give biologists a first peek into the molecular states of living tissues. Previous studies have proven that it is feasible to build sample classifiers using gene expression profiles. To build an effective sample classifier, a dimension reduction process is necessary, since classic pattern recognition algorithms do not work well in high-dimensional space. In this paper, we present...
Class imbalance tends to cause inferior performance in data mining learners. Evolutionary sampling is a technique which seeks to counter this problem by using genetic algorithms to evolve a reduced sample of a complete dataset to train a classification model. Evolutionary sampling works to remove noisy and duplicate instances so that the sampled training data will produce a superior classifier. We...
Several methods have been proposed for solving reinforcement learning (RL) problems. In addition to temporal difference (TD) methods, evolutionary algorithms (EA) are among the most promising approaches. The relative performance of these approaches in certain subdomains of the general RL problem remains an open question at this time. In addition to theoretical analysis, benchmarks are one of the most...
We introduce a supervised reinforcement learning (SRL) architecture for robot control problems with high dimensional state spaces. Based on such architecture two new SRL algorithms are proposed. In our algorithms, a behavior model learned from examples is used to dynamically reduce the set of actions available from each state during the early reinforcement learning (RL) process. The creation of such...
We introduce a polynomial-time algorithm to learn Bayesian networks whose structure is restricted to nodes with in-degree at most k and to edges consistent with the optimal branching, which we call consistent k-graphs (CkG). The optimal branching is used as a heuristic for a primary causality order between network variables, which is subsequently refined, according to a certain score, into an optimal...
In this paper, we introduce a new variant of growing self-organizing maps (GSOM) based on Alahakoon's algorithm for SOM training, called 2IBGSOM (interior and irregular boundaries growing self-organizing maps). It is a dynamically evolving structure for SOM that allocates map size and shape during the unsupervised training process. 2IBGSOM starts with a small number of initial nodes and generates...
A practical problem in data mining and machine learning is the limited availability of data. For example, in a binary classification problem it is often the case that examples of one class are abundant, while examples of the other class are in short supply. Examples from one class, typically the positive class, can be limited due to the financial cost or time required to collect these examples. This...
Manifold clustering aims to partition a set of input data into several clusters each of which contains data points from a separate, simple low-dimensional manifold. This paper presents a novel solution to this problem. The proposed algorithm begins by randomly selecting some neighboring orders of the input data and defining an energy function that is described by geometric features of underlying manifolds...
With the rapid growth of the World Wide Web (WWW), it has become a critical issue to design and organize the vast amounts of online documents on the web according to their topic. Even for search engines, it is very important to group similar documents in order to improve their performance when a query is submitted to the system. Clustering is useful for taxonomy design and similarity search of documents...
When a data set contains objects of multiple types, to cluster the objects of one type, it is often necessary to consider the cluster structure on the objects of the other types. Co-clustering the related objects often generates better clusters. One basic connection here is that the similarity among the objects of one type is often affected by the cluster structures on the objects of the other types...
In many real world scenarios, mixture models have successfully been used for analyzing features in data ([11, 13, 21]). Usually, multivariate Gaussian distributions for continuous data ([2, 8, 4]) or Bayesian networks for nominal data ([15, 16]) are applied. In this paper, we combine both approaches in a family of Bayesian models for continuous data that are able to handle univariate as well as multivariate...
This paper briefly describes the AQ21 learning system, which implements a simple form of natural induction, an approach to learning that generates hypotheses in forms resembling natural language descriptions, which are therefore easy to understand and interpret. The system was applied to the analysis of aggregated data obtained from non-invasive tests performed on different groups of patients with metabolic...
The identification of cis-regulatory binding sites in DNA in multicellular eukaryotes is a particularly difficult problem in computational biology. To obtain a full understanding of the complex machinery embodied in genetic regulatory networks, it is necessary to know both the identity of the regulatory transcription factors and the location of their binding sites in the genome. We show that...
This paper describes a hybrid neural network based model for predicting the performance of a single cylinder two stroke cycle spark ignition engine. The engine was run in the carburetor mode and engine mapping was done by collecting the engine performance data in terms of power and brake specific fuel consumption for various combinations of speed, load and air-fuel ratio. This data was used for predicting...
Neural networks have been used to examine a set of thirteen objective features and a single subjective physician's assessment for emergency room patients with symptoms possibly indicative of acute coronary syndrome (ACS). The objective data is information routinely collected during triage. The neural networks were used to fuse the disparate types of information with the goal of forecasting thirty-day...