In this paper we show how the technologies associated with the evolution of Cloud computing to Dew computing can contribute to advancing scientific computational productivity through automation. Amid current developments in the big data paradigm, there is a growing trend towards automating data mining and the other analytical processes involved in data science to increase the productivity of associated applications...
Dynamic data, if used properly, can bring huge benefits to humanity, science, and business. However, properties of dynamic data such as volume, velocity, variety, variation, and veracity render current methods of data analysis ineffective. Dynamic data analysis needs a fusion of data mining methods with those of machine learning. The k-means algorithm is one such method that has existed...
Today, every enterprise generates large volumes of high-dimensional data on a regular basis. Complex data mining and analysis techniques are needed to analyse this data feasibly. Feature selection aids in this by providing a reduced representation of the data while maintaining its integrity. We propose a graph-based feature selection algorithm utilizing feature intercorrelation to construct a weighted...
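The correlation-driven feature selection idea in this abstract can be illustrated with a minimal greedy sketch. This is not the paper's weighted-graph algorithm; the drop criterion (summed absolute Pearson correlation) and all names are illustrative assumptions:

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(columns, keep):
    """Greedy sketch: repeatedly drop the feature whose summed absolute
    correlation with the remaining features is largest (most redundant),
    until only `keep` features are left."""
    active = list(range(len(columns)))
    while len(active) > keep:
        redundancy = {i: sum(abs(pearson(columns[i], columns[j]))
                             for j in active if j != i)
                      for i in active}
        active.remove(max(redundancy, key=redundancy.get))
    return active
```

With one column an exact multiple of another, the redundant copy is dropped first while an uncorrelated column survives.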
Data quality plays an important role in modern intelligent information systems and is crucial to any data analysis task. Many imperfection-handling techniques avoid overfitting or simply remove offending portions of the data. Data correction, by contrast, can retain and recover as much information as possible from the original data resources. In this paper, we propose a novel technique based on polynomial...
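The general shape of polynomial-based correction (as opposed to deleting bad records) can be sketched with a degree-1 fit; the paper's actual polynomial degree and correction rule are not given here, so the threshold and single-pass strategy below are invented for illustration:

```python
def fit_line(xs, ys):
    """Least-squares line y = a*x + b via the closed-form normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def correct(ys, threshold=2.0):
    """Replace values that deviate from the fitted trend by more than
    `threshold` with the fitted value, keeping everything else intact."""
    xs = list(range(len(ys)))
    a, b = fit_line(xs, ys)
    return [a * x + b if abs(y - (a * x + b)) > threshold else y
            for x, y in zip(xs, ys)]
```

A corrupted point in an otherwise linear series is pulled back toward the trend while clean points are returned unchanged; an iterative refit after correction would tighten the estimate further.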
In today's world, large volumes of medical data are being continuously generated, but their value is severely undermined by our inability to translate them into knowledge and, ultimately, actions. Data mining techniques allow the extraction of previously unknown interesting patterns from large datasets, but their complexity limits their practical diffusion. Data-driven analysis is a multi-step process,...
Distributed data mining techniques, and distributed clustering in particular, have been widely used over the last decade because they deal with very large and heterogeneous datasets that cannot be gathered centrally. Current distributed clustering approaches normally generate global models by aggregating local results obtained on each site. While this approach mines the datasets at their locations...
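The aggregate-local-results pattern described above can be sketched minimally: each site reduces its data to a weighted summary, and a coordinator fuses nearby summaries. The one-centroid-per-site reduction and the greedy merge radius are simplifying assumptions, not the surveyed approaches themselves:

```python
def local_summary(points):
    """Each site reduces its data to one (centroid, weight) pair — a
    stand-in for a richer local clustering step."""
    n = len(points)
    return (tuple(sum(v) / n for v in zip(*points)), n)

def merge(summaries, eps):
    """Coordinator: greedily fuse local centroids closer than eps,
    weighting each fusion by the number of points represented."""
    merged = []
    for c, w in summaries:
        for i, (mc, mw) in enumerate(merged):
            if sum((a - b) ** 2 for a, b in zip(c, mc)) ** 0.5 < eps:
                total = mw + w
                merged[i] = (tuple((a * mw + b * w) / total
                                   for a, b in zip(mc, c)), total)
                break
        else:
            merged.append((c, w))
    return merged
```

Two sites holding overlapping regions collapse into one weighted global cluster, while a distant site stays separate — no raw data ever leaves a site.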
The characterization of optimization problems over continuous parameter spaces plays an important role in optimization. A form of “fitness landscape” analysis is often carried out to describe the problem space in terms of modality, smoothness and variable separability. The outcomes of this analysis can then be used as a measure of problem difficulty and to predict the behaviour of a given algorithm...
With the advent of modern techniques for scientific data collection, large quantities of data are accumulating in various databases. Systematic data analysis methods are necessary to extract useful information from these rapidly growing data banks. Cluster analysis is one of the major data mining methods, and the k-means clustering algorithm is widely used in many practical applications. But the...
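For reference, the standard k-means loop this abstract builds on can be stated in a few lines (random initialization and a fixed iteration budget here are simplifications; production variants use smarter seeding and convergence tests):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and
    centroid update. `points` is a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: move each centroid to its cluster mean
        # (an empty cluster keeps its previous centroid).
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = tuple(sum(v) / len(cl) for v in zip(*cl))
    return centroids
```

On two well-separated groups the centroids converge to the group means regardless of which points seed them.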
Multidimensional sequences are common, and measuring their similarity is a key to any analysis of such data. There is a wealth of similarity measures for sequences in the literature, but most of them are designed for a special type of sequence and later extended to more general types. These extensions are usually ad hoc, and the extended versions may lose the original conceptual interpretation of...
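One widely used similarity measure that does generalize cleanly to multidimensional sequences is dynamic time warping with a vector point cost; the sketch below is a standard formulation, offered as context rather than as this paper's proposal:

```python
def dtw(a, b):
    """Dynamic time warping distance between two sequences of
    equal-dimension vectors, with Euclidean cost between points."""
    def cost(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    n, m = len(a), len(b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Best of: step in a, step in b, or step in both.
            d[i][j] = cost(a[i - 1], b[j - 1]) + min(
                d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

A time-stretched copy of a sequence has distance zero, which is exactly the invariance that ad hoc extensions of scalar measures often lose.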
The paper presents a multi-density clustering algorithm based on the grid adjacency relation (GAMD), using the data distribution characteristics within units, which are reflected by the unit density and the center of mass. To determine unit boundaries, the algorithm measures the similarity between units by their relative density and the relative distance between their centers of mass. Goodness of fit is proposed...
The concept lattice is a new mathematical tool for data analysis and knowledge processing. Attribute reduction is very important in concept lattice theory because it makes the discovery of implicit knowledge in data easier and its representation simpler. In this paper the reduction of the concept lattice is investigated. First, we present a close-degree of concepts to measure the close-degree...
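As background for attribute reduction, the formal concepts of a context can be enumerated by brute force: an extent is the set of objects sharing an intent, and an intent is the set of attributes common to that extent. This exponential sketch only illustrates the definition, not the paper's reduction method:

```python
from itertools import combinations

def concepts(objects, attrs, incidence):
    """All formal concepts (extent, intent) of a context, where
    `incidence` is a set of (object, attribute) pairs."""
    found = set()
    for r in range(len(attrs) + 1):
        for subset in combinations(attrs, r):
            # Objects having every attribute in the subset...
            extent = frozenset(o for o in objects
                               if all((o, a) in incidence for a in subset))
            # ...and the attributes shared by all of those objects.
            intent = frozenset(a for a in attrs
                               if all((o, a) in incidence for o in extent))
            found.add((extent, intent))
    return found
```

A three-object, two-attribute context already yields a small lattice of four concepts, and reducing attributes amounts to finding a subset that preserves this structure.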
Identifying outliers is a difficult task in data mining. We adopt the notion of deviants for outliers in data streams: deviants are the data points whose removal from the data sequence minimizes the sum of squared errors (SSE). We present the DDA algorithm to detect deviants over massive data streams. With this algorithm, the histogram can determine the deviants more accurately and greatly reduce error.
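The deviant definition above — points whose removal most reduces SSE — admits a simple greedy batch sketch. DDA itself works over streaming histograms; the exhaustive per-point recomputation below is only to make the definition concrete, and assumes fewer deviants than data points:

```python
def sse(xs):
    """Sum of squared deviations of xs from its own mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def find_deviants(xs, k):
    """Greedy sketch: repeatedly drop the point whose removal shrinks
    the SSE of the remainder the most; the dropped points are the
    deviants (requires k < len(xs))."""
    remaining, deviants = list(xs), []
    for _ in range(k):
        best = min(range(len(remaining)),
                   key=lambda i: sse(remaining[:i] + remaining[i + 1:]))
        deviants.append(remaining.pop(best))
    return deviants
```

A single large spike is found first, then the next most extreme value, matching the intuition that removing deviants leaves a tight, low-error summary.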
In this paper, we present a scalable evolutionary algorithm for clustering large and dynamic data sets, called Scalable Evolutionary Clustering with Self Adaptive Genetic Operators (Scalable ECSAGO). The proposed evolutionary clustering algorithm can adapt its genetic operators rate while the evolution leads to the optimal centers of the clusters. The sizes of the clusters are estimated using a hybrid...
Skyline queries attract more and more attention from academia and industry because of their applications in multi-criteria decision support, preference answering, and data analysis. However, it seems unnecessary to recommend all services in the skyline when the number of skyline points is large. The number of services in the skyline is always large because comparability decreases with the...
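For context, the skyline is the set of points not dominated by any other point; the block-nested-loops sketch below is the textbook baseline (minimizing every dimension, distinct points assumed), not this paper's recommendation scheme:

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (smaller values are better here)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def skyline(points):
    """Keep the points no other point dominates (assumes distinct points)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

With more dimensions, fewer pairs are comparable under `dominates`, so the skyline grows — exactly the size problem the abstract raises.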
Computer forensics is the collection and processing of the traces an offender leaves in a computer or network system, so that they can serve as legally binding evidence in court proceedings and suspects can be brought to justice. It mainly includes data protection, data collection, data analysis, and evidence presentation; among these processes, data analysis is the key to computer...
Selecting informative genes from microarray gene expression data is the most important task when performing analysis on such large amounts of data. Mining genes that have regulatory relations among thousands of genes is essential. To meet this need, a number of methods have been proposed from various points of view. However, most existing methods focus solely on gene expression values themselves without...
Early decision tree algorithms such as ID3, C4.5, and CART no longer meet the demands of massive data analysis. These algorithms share the same limitations: they cannot handle dynamically updated data sets, and the decision trees they generate need to be pruned. These weaknesses limit the use of the above-mentioned algorithms. So a novel parallel decision...
This paper introduces a novel incremental approach to clustering uncertain categorical data. This so-called Incremental K Belief K-modes Method (IK-BKM) extends the Belief K-modes method to update the cluster partition when new information is available, namely an increase in the desired number of final clusters. The main objective is to update the cluster partition without complete reclustering. Our method will...
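The plain (non-incremental, non-belief) K-modes baseline that IK-BKM extends replaces k-means centroids with per-attribute modes and Euclidean distance with mismatch counts. The naive first-k initialization below is an illustrative simplification:

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching attribute values between two rows."""
    return sum(x != y for x, y in zip(a, b))

def kmodes(rows, k, iters=10):
    """K-modes sketch for categorical rows: like k-means, but cluster
    centers are per-attribute modes and distance is mismatch count.
    Naive init: the first k rows (assumed distinct)."""
    modes = [list(rows[i]) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in rows:
            d = [hamming(r, m) for m in modes]
            clusters[d.index(min(d))].append(r)
        for i, cl in enumerate(clusters):
            if cl:
                # New center: the most frequent value in each column.
                modes[i] = [Counter(col).most_common(1)[0][0]
                            for col in zip(*cl)]
    return modes
```

An incremental extension such as IK-BKM would adjust these modes when new rows or a larger target k arrive, instead of rerunning the loop from scratch.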
Clustering is a method of unsupervised learning and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition, image analysis, and bioinformatics. In this paper, a novel algorithm based on clustering to extract rules from neural networks is proposed. After neural networks have been trained and pruned successfully, inner rules are generated by...
Segmentation aims to separate homogeneous areas from sequential data and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation to locating genomic regions with coexpressed genes. We aim at automated discovery...
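The classical formulation behind such segmentation tasks — split a sequence into k contiguous homogeneous pieces — can be solved exactly by dynamic programming. The within-segment SSE cost below is a common choice, assumed here for illustration rather than taken from the paper:

```python
def seg_cost(xs, i, j):
    """SSE of the slice xs[i:j] around its own mean."""
    seg = xs[i:j]
    m = sum(seg) / len(seg)
    return sum((x - m) ** 2 for x in seg)

def segment(xs, k):
    """Optimal k-segmentation by dynamic programming: split xs into k
    contiguous segments minimizing total within-segment SSE; returns
    the list of segment start indices."""
    n = len(xs)
    INF = float("inf")
    cost = [[INF] * (k + 1) for _ in range(n + 1)]
    back = [[0] * (k + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for j in range(1, k + 1):
        for end in range(1, n + 1):
            for start in range(j - 1, end):
                c = cost[start][j - 1] + seg_cost(xs, start, end)
                if c < cost[end][j]:
                    cost[end][j], back[end][j] = c, start
    # Recover segment boundaries by walking the back-pointers.
    bounds, end = [], n
    for j in range(k, 0, -1):
        end = back[end][j]
        bounds.append(end)
    return bounds[::-1]
```

On a step-shaped series the recovered boundary lands exactly at the level change, which is the behaviour one wants when hunting for homogeneous genomic regions.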