In recent years, the search for efficient and non-invasive methods for diagnosing diseases has grown within the scientific community. One of the areas explored for that purpose is the analysis of the pupillary response to light stimulus, which has produced results for diabetes, Alzheimer's disease, neurological disorders, melancholia, and other physiological states. However, each of those investigations...
Recent advances in clustering have shown that ensuring a minimum separation between cluster centroids leads to higher quality clusters compared to those found by methods that explicitly set the number of clusters to be found, such as k-means. One such algorithm is DP-means, which sets a distance parameter λ for the minimum separation. However, without knowing either the true number of clusters or...
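As an illustration of the DP-means idea mentioned in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes the common formulation in which a point opens a new cluster when its squared Euclidean distance to the nearest centroid exceeds λ; the function name `dp_means` and the parameters `lam` and `max_iter` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def dp_means(points, lam, max_iter=100):
    """Sketch of DP-means: k-means-style updates, but a point farther than
    `lam` (in squared distance, an assumption) from every centroid starts a
    new cluster, so the number of clusters is not fixed in advance."""
    n, dim = len(points), len(points[0])
    # Start with a single centroid at the global mean.
    centroids = [tuple(sum(p[d] for p in points) / n for d in range(dim))]
    labels = [0] * n
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centroids]
            j = min(range(len(centroids)), key=dists.__getitem__)
            if dists[j] > lam:
                # Too far from every centroid: open a new cluster at p.
                centroids.append(tuple(p))
                j = len(centroids) - 1
            if labels[i] != j:
                labels[i] = j
                changed = True
        # Recompute centroids as cluster means, dropping empty clusters.
        members = {}
        for i, j in enumerate(labels):
            members.setdefault(j, []).append(points[i])
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centroids = [
            tuple(sum(p[d] for p in members[old]) / len(members[old])
                  for d in range(dim))
            for old in kept
        ]
        labels = [remap[j] for j in labels]
        if not changed:
            break
    return centroids, labels


centroids, labels = dp_means([(0, 0), (0, 1), (10, 10), (10, 11)], lam=4.0)
```

In this sketch a larger `lam` yields fewer, coarser clusters, and a smaller one yields more, which is why choosing λ without knowing the cluster structure is the hard part.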
Botnets, networks of compromised devices, are considered one of the most costly incidents in network security. Since botnets are able to evade firewalls, interconnect across vast networks, attack enterprise systems, and cause massive damage, detecting them is becoming more urgent. Some detection mechanisms have been proposed, particularly applying machine learning techniques into...
Community structure is a common feature of real-world networks. Overlapping community detection is an important method for analyzing the topological structure and function of a network. Most algorithms are based on the network structure alone, without considering the node attributes. In this paper, we propose an overlapping community detection algorithm based on node convergence degree which combines the network topology...
Nowadays, it is widely accepted that exploiting all forms of parallelism is the only way to significantly improve performance. The three major forms of parallelism on a modern processor are ILP, DLP, and TLP, which are not mutually exclusive. To gain further performance improvements, MPI can be used on a cluster of computers. This paper exploits the capabilities of distributed multi-core Intel processors...
In recent years, both the volume and the complexity of data generated by humankind have increased. Since computers have evolved and provide more processing power, it is now possible to carry out real-time analysis of large volumes of data. This paper proposes the architecture of a big data processing platform called BigTim, which is able to run clustering...
In view of the information available today, recent progress in data mining research has led to the development of various efficient methods for mining interesting patterns in large databases. Data mining plays a vital role in the knowledge discovery process by analyzing huge amounts of data from various sources and summarizing them into useful information. It is helpful for analyzing the volumes of data in different domains...
Outlier detection is an important issue in the realm of data mining. Several applications rely on outlier detection, such as intrusion detection, fraud detection, medical and public health data, image processing, etc. Clustering-based outlier detection algorithms are considered the most important outlier detection approaches. They provide a high detection rate; however, they suffer from high false...
This paper presents a new clustering algorithm, called the Cell-MST-Based Method, that combines a Cell-based method with Minimum Spanning Tree based (MST-based) methods. The algorithm is designed for big datasets on a limited-memory computer, especially thin big datasets, which have a small number of attributes but a very large number of instances. Firstly, a Cell-based method converts a...
The most important task of the clustering process is the validation of the results obtained from clustering algorithms. There are many cluster validation criteria, but the most commonly used approaches are founded on internal validity indices. Numerous indices have been suggested over time, but only some of them have been widely used. In this paper we have drawn a...
A cluster can be defined as a collection of data objects that are similar to each other, whereas dissimilar data objects are placed in different clusters. The process of grouping a set of objects into classes of similar objects is called clustering. In fuzzy c-means clustering, every data point belongs to every cluster with some membership value. Hence, every cluster...
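To make the notion of a membership value concrete, here is a minimal sketch of the standard fuzzy c-means membership formula for fixed cluster centers, u_j = 1 / Σ_k (d_j / d_k)^(2/(m-1)). It is not taken from the paper; the function name `fcm_memberships` and the fuzzifier default `m=2.0` are assumptions made for illustration.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Membership of one point in each cluster for fixed centers,
    using the standard fuzzy c-means formula with fuzzifier m > 1."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [k for k, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if k in zeros else 0.0
                for k in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists)
            for d_j in dists]


# A point at distance 1 from the first center and 2 from the second.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

The memberships of a point always sum to 1, and the closer center receives the larger share.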
The tremendous growth in data volumes has created a need for new tools and algorithms to quickly analyze large datasets. Cluster analysis techniques, such as K-means, can be used for large datasets distributed across several machines. The accuracy of K-means depends on the selection of seed centroids during initialization. K-means++ improves on the K-means seeder, but suffers from problems when it...
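For readers unfamiliar with the seeding step discussed above, the following is a minimal single-machine sketch of K-means++ seeding, in which each new seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far. It does not reproduce the distributed setting or the improvements the paper proposes; the function name `kmeans_pp_seeds` and the `rng` parameter are hypothetical.

```python
import random


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centroids by D^2 sampling (K-means++ seeding).
    Assumes `points` contains at least k distinct points."""
    rng = rng or random.Random()
    seeds = [rng.choice(points)]  # first seed: uniform at random
    while len(seeds) < k:
        # Weight of each point: squared distance to its nearest seed.
        weights = [
            min(sum((a - b) ** 2 for a, b in zip(p, s)) for s in seeds)
            for p in points
        ]
        if sum(weights) == 0:
            raise ValueError("fewer than k distinct points")
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeds = kmeans_pp_seeds(pts, k=3, rng=random.Random(0))
```

Points already chosen as seeds have weight zero, so they are never drawn twice, and far-away points are favored, which tends to spread the initial centroids out.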
Among hybrid systems, piecewise affine systems are a common class to be identified from input/output data. The work presented in this paper is concerned with the identification of piecewise affine systems using clustering-based procedures. Specifically, Kohonen's Self-Organizing Map is used to identify both the parameters of the affine sub-models and the hyperplanes defining the partitions of the...
Hierarchical agglomerative clustering treats each data point as a singleton cluster at the outset and then successively merges (or agglomerates) pairs of clusters until all clusters have been merged into a single cluster that contains all the data. However, if two data points are merged incorrectly at the beginning, the errors will accumulate and be amplified by the following iterations. Thus, we will get a worse cluster...
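The merge procedure described above can be sketched as follows. This is a naive illustration and not the paper's method: it assumes single linkage (the distance between two clusters is the distance between their closest members), which the abstract does not specify, and the function name `agglomerate` is hypothetical.

```python
import math
from itertools import combinations


def agglomerate(points):
    """Naive agglomerative clustering with single linkage (an assumption).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are lists of point indices."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            a, b = pair
            return min(math.dist(points[i], points[j])
                       for i in clusters[a] for j in clusters[b])

        # Find and merge the closest pair of clusters.
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        merges.append((clusters[a], clusters[b], linkage((a, b))))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


history = agglomerate([(0.0,), (1.0,), (5.0,)])
```

Because each merge is final, a poor early merge stays in every later cluster, which is the error accumulation the abstract refers to.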
Big data, such as complex networks with millions of vertices and edges, is infeasible to process using conventional computation. MapReduce is a programming model that enables us to analyze big data on a cluster of computers. In this paper we propose a Parallel Structural Clustering Algorithm for big Networks (PSCAN) in MapReduce for the detection of clusters or community structures in big networks...
Data mining is the extraction of hidden predictive information from large databases. It is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. In data mining there are two activities such as Classification...
Efficient utilization of resources is a vital factor in any environment. The grid environment is dynamic and allows clusters to move around freely. The grid acts as a supercomputer for its users by handling voluminous data, so proper allocation of the available resources is important. Hence, resource discovery and job scheduling are challenging areas in the grid. This paper presents a new approach...
Many traditional clustering algorithms suffer from scalability problems when dealing with large data sets. One common strategy to handle the problem is to parallelize the algorithms and execute them along with the input data on high-performance computers. However, many graph-based clustering algorithms are hard to parallelize since they need to calculate the similarity of all pairs of data nodes...
Clustering analysis is a descriptive task that seeks to identify homogeneous groups of objects based on the values of their attributes. This paper proposes a new Modified K-Means clustering algorithm, which executes like the K-means and k-medoids algorithms, and tests several methods for selecting the initial clusters. The Modified K-Means algorithm is better in terms of number of clusters and execution...
More and more intruders use stepping-stones to launch attacks on their targets, because stepping-stones hide them well and make them feel safe. A clustering-partitioning approach was proposed to detect stepping-stone intrusion and resist intruders' evasion. The biggest issue with this approach is that it mines network traffic in a very inefficient way. Double...