Biological data is often represented as networks, as in the case of protein-protein interactions and metabolic pathways. Modeling, analyzing, and visualizing networks can help make sense of large volumes of data generated by high-throughput experiments. However, due to their size and complex structure, biological networks can be difficult to interpret without further processing. Cluster analysis is...
Selecting relevant features in data modeling is critical to ensure effective and accurate prediction of future effects. The problem becomes compounded when the relevance of previously selected features cannot be guaranteed due to changes in the underlying dataset. We propose an algorithm based on the statistical plaid model for the discovery and tracking of feature relevance scores in datasets that...
As data storage and processing technology develops rapidly, a great amount of open data has accumulated from many sources, including social media. Fuzzy cognitive maps are a promising tool for analyzing these data: they help describe connections and substances to reveal patterns, facts, and knowledge. One of the problems when creating cognitive maps is the identification...
The fuzzy c-means method is investigated for clustering heavy-tailed data under different distance measures. A comparison study is provided based on running time and precision. The results show that the Euclidean distance requires less time than the Manhattan distance, whereas the Manhattan distance yields higher precision.
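As a rough illustration of the comparison described above, the sketch below computes fuzzy c-means membership degrees for fixed cluster centers under either distance. It is not the authors' implementation: the function names, the fuzzifier value m = 2, and the sample points are all assumptions made for illustration, and the usual membership formula (derived for the Euclidean case) is simply reused with the Manhattan distance.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean (L2) distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a: Point, b: Point) -> float:
    """Manhattan (L1) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def memberships(
    points: List[Point],
    centers: List[Point],
    dist: Callable[[Point, Point], float],
    m: float = 2.0,  # fuzzifier; m = 2 is an assumed, commonly used value
) -> List[List[float]]:
    """Return u[k][i], the membership of point k in cluster i.

    Uses u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
    Assumption: the same formula is applied for any distance passed in.
    """
    exponent = 2.0 / (m - 1.0)
    result = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if any(di == 0.0 for di in d):
            # The point coincides with one or more centers:
            # split full membership among those centers.
            hits = [di == 0.0 for di in d]
            n = sum(hits)
            result.append([1.0 / n if h else 0.0 for h in hits])
            continue
        result.append([1.0 / sum((di / dj) ** exponent for dj in d) for di in d])
    return result


# Hypothetical data: three 2-D points and two fixed centers.
points = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
centers = [(0.0, 0.0), (4.0, 4.0)]
u_euclid = memberships(points, centers, euclidean)
u_manhat = memberships(points, centers, manhattan)
```

Each row of the result sums to 1, so the two distance measures can be compared point by point. A full fuzzy c-means run would alternate this step with a center update, which is omitted here.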
Clustering of categorical data streams lacks a cluster validity function and an optimization strategy for finding clusters and capturing the evolution trend of cluster structures. Therefore, this paper presents an optimization model for clustering categorical data streams. In the model, a cluster validity function is proposed as the objective function to evaluate the effectiveness of the clustering...
The identification of students' typologies plays an interesting role in adapting educational strategies and improving academic performance. In this work, we show how unsupervised learning techniques can be applied to educational data for the extraction of typologies and profiles of graduate students based on educational outcomes in combination with the time to degree. We also describe a web-based tool...
The scope of this work is the development of a mathematical model of a gasification process to be used for the prediction of the syngas composition. The predictions are intended to support the gas chromatographic measurements of the syngas composition, which are often not available due to periodic calibrations. This work represents the first step of a broader project whose scope is the development of...
An intrusion detection system (IDS) is an active and driving defense technology. This paper mainly focuses on intrusion detection based on data mining. The aim is to improve the detection rate and decrease the false alarm rate, and the main research method is clustering analysis. An intrusion detection algorithm and model are proposed, and corresponding simulation experiments are presented. Firstly, a method to reduce...
Cluster analysis is one of the most important functions of data mining. The Expectation Maximization (EM) method is an important model-based clustering technique. The expectation maximization algorithm is analyzed in this research and applied to an adaptive testing system, in which the logistic function from item response theory serves as the model, and the combination of methods of marginal maximum likelihood...
In cluster analysis, current algorithms assume that all features in the data contribute uniformly in assigning samples to clusters. This assumption can lead to poor clustering results, due to the existence of noisy and less important features. Feature weighting overcomes this issue by assigning different weights to features based on some notion of importance. According to feature weighting, more important...
Cluster analysis is one of the best-known methods in data mining. One of the major problems in clustering is dendrogram instability due to data input order. Rough set theory has already been used as an intelligent approach to data mining. The core concept of classical rough sets is to cluster similarities and differences of data objects based on the notions of indiscernibility and indiscernibility...
The recent extensive growth of data on the Web has generated an enormous amount of log records in Web server databases. Applying Web usage mining techniques to these vast amounts of historical data can discover potentially useful patterns and reveal user access behaviors on the Web site. Cluster analysis has been widely applied to generate user behavior models from server Web logs. Most of these off-line...
Rapid advances in data collection and storage technology have enabled telecom companies to accumulate vast amounts of data. However, extracting useful information has proven extremely challenging. Telecom enterprises hold massive amounts of customer data and should convert them into competitive advantage in order to maximize customer profitability. Based on CRISP-DM (cross-industry standard process for...
Parameters of tracked video objects (for example, the angles of moving objects) are discrete random variables, and the amount of data increases over time. In this paper we use a new method to analyze the angle parameter: the video frame is segmented into small sections, and in each section the angle values during some time period are gathered. Through analysis of the angle data in each section, these angles...
Phishing is a form of online fraud with drastic consequences for the victims and institutions being defrauded. A phishing attack tries to create a believable environment for the intended victim to enter their confidential data such that the attacker can use or sell this information later. In order to apprehend phishers, law enforcement agencies need automated systems capable of tracking the size and...
Clustering of real-world data is often ill-posed. Because of noise and intrinsic ambiguity in data, optimization models attempting to maximize a fitness function can be misled by the assumption of uniqueness of the solution. In this work we present a methodology including classic and novel techniques to approach clustering in a systematic way, with two application examples to biological data sets...
Urban traffic state analysis plays an important role in solving the traffic congestion problem. Estimating the traffic state effectively is foundational for improving traffic conditions and preventing congestion. In this paper, a novel pattern-based approach is proposed to model the clustering and classification of traffic state. First, a fuzzy-set clustering method is utilized to divide...
In many vision and image problems there are multiple structures in a single data set, and we need to identify the multiple models. Preserving most structures in the presence of noise makes the estimation difficult. In such cases, for each structure, data that belong to other structures are also outliers, in addition to the outliers for all the structures. Robust regression techniques are commonly used to...
Cluster analysis, a primitive form of exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. Most clustering techniques ignore differences in size or level: in most cases, clustering is more concerned with grouping similar objects or samples together, ignoring the fact that even though they are similar, they might be of different...
Based on the characteristics of the railway transport industry, parameters for railway freight market customer subdivision are presented, and a K-Means clustering model is established for market subdivision. The K-Means algorithm is modified by using square error and objective function. An experiment on railway transport market subdivision is carried out on the Clementine 8.0 platform, and the features of various...