Clustering large data sets is a challenging task that arises in many application areas such as social networking and bioinformatics. Traditional clustering algorithms need to be modified to handle the increasing data sizes. In this paper, a scalable design and implementation of glowworm swarm optimization clustering (MRCGSO) using MapReduce is introduced to handle big data...
This paper is motivated by three key observations: (1) performance degrades under interleaved accesses from heterogeneous streams, (2) for the slow stream, sequential accesses suffer a large number of misses in the prefetching cache, (3) in a concurrent setting, providing fairness and QoS to concurrent streams is very important, yet it is always ignored by traditional prefetching algorithms...
Clustering is an important tool in many fields such as exploratory data mining and pattern recognition. It consists in organizing a large data set into groups of objects that are more similar to each other than to those in other groups. Despite its use for over three decades, it is still subject to a lot of controversy. In this paper, we cast clustering as a Pareto based multi-objective optimization...
In our previous research, we showed that the maximum flow and the minimum cut problem can be solved by using a resistive circuit consisting of nonlinear devices with a saturation characteristic. Usually, the flow network consists of three elements, namely connectivity between nodes, branch capacities, and flows. The network information obtained from nonlinear resistive circuit analysis also has conventional...
Rectangles are the smallest cycles (i.e., cycles of length 4) and most elementary sub-structures in a bipartite graph. Similar to triangle counting in uni-partite graphs, rectangle counting has many important applications where data is modeled as bipartite graphs. However, efficient algorithms for rectangle counting are lacking. We propose three different types of algorithms to cope with different...
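The definition of a rectangle above (a cycle of length 4 in a bipartite graph) can be made concrete with a short sketch. This is a simple wedge-counting baseline, not one of the three algorithms the abstract proposes; the function name `count_rectangles` and the edge-list input format are assumptions made for illustration. Every pair of left vertices that shares c right neighbours contributes c(c-1)/2 rectangles.

```python
from collections import defaultdict
from itertools import combinations


def count_rectangles(edges):
    """Count 4-cycles in a bipartite graph given as (left, right) edge pairs.

    Hypothetical helper for illustration; left labels must be mutually
    comparable (e.g. all strings or all integers) because pairs are sorted.
    """
    # Map each right vertex to the set of left vertices adjacent to it.
    right_to_left = defaultdict(set)
    for left, right in edges:
        right_to_left[right].add(left)

    # A "wedge" is a pair of left vertices joined through one right vertex.
    wedge_counts = defaultdict(int)
    for lefts in right_to_left.values():
        for a, b in combinations(sorted(lefts), 2):
            wedge_counts[(a, b)] += 1

    # Two distinct wedges over the same left pair close one rectangle.
    return sum(c * (c - 1) // 2 for c in wedge_counts.values())


# Complete bipartite graph K_{2,2}: exactly one rectangle.
square = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(count_rectangles(square))
```

The wedge table can grow quadratically in the degree of a right vertex, which is one reason more careful algorithms are needed for large graphs.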
Searching for and discovering relevant information on the web has always been a challenging task. It is very hard to wade through the large number of documents returned in response to a user query. This leads to the need to organize a large set of documents into categories through clustering. There is a need for efficient clustering algorithms for organizing documents. Clustering on large dataset...
The high computational complexity and large memory requirement prevent the SIFT algorithm from running in real time. The block-parallel SIFT algorithm with boundary extension adopted in existing research suffers from redundant storage or extra communication to process the boundaries of partitions. The block-parallel SIFT algorithm without boundary extension (pSIFT-noBE) can spontaneously...
This paper provides an overview of the current status of methods that may be used to induce parallel properties into the temporal axis for time dependent problems described by differential equations. An extension to problems with two spatial dimensions is also included.
This paper presents partitioning fuzzy clustering algorithms for mixed feature-type symbolic data. The proposed algorithms need a pre-processing step in order to obtain a suitable homogenization of the mixed feature-type symbolic data into histogram-valued symbolic data. These fuzzy clustering algorithms give a fuzzy partition and a prototype for each fuzzy cluster by optimizing an adequacy...
Among the techniques that the distributed database designer considers for performance improvement are fragmentation, replication and allocation. Fragmentation is the process of dividing a relation into two or more relations called fragments. Usually two fragmentation techniques are considered: vertical and horizontal fragmentation. A third, rarely considered one is hybrid or mixed...
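As a minimal sketch of the two common fragmentation techniques named above, assuming a relation is held as a list of dictionaries: horizontal fragmentation splits rows by a predicate, and vertical fragmentation splits columns while repeating the key in each fragment so that the relation can be rebuilt by a join. The helper names and the sample relation are hypothetical and are not taken from the paper.

```python
def horizontal_fragments(rows, predicate):
    """Split a relation by rows: those satisfying the predicate, and the rest."""
    matching = [row for row in rows if predicate(row)]
    remaining = [row for row in rows if not predicate(row)]
    return matching, remaining


def vertical_fragments(rows, key, column_groups):
    """Split a relation by columns; every fragment keeps the key column."""
    return [
        [{col: row[col] for col in (key, *group)} for row in rows]
        for group in column_groups
    ]


# Hypothetical sample relation, for illustration only.
employees = [
    {"id": 1, "name": "Ann", "city": "Oslo", "salary": 50},
    {"id": 2, "name": "Bob", "city": "Rome", "salary": 60},
]

oslo_rows, other_rows = horizontal_fragments(employees, lambda r: r["city"] == "Oslo")
names, pay = vertical_fragments(employees, "id", [("name",), ("city", "salary")])
```

The union of the horizontal fragments, or a join of the vertical fragments on `id`, gives back the original relation.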
In this paper, a new recursive multibit recoding multiplication algorithm is introduced. It provides a general space-time partitioning of the multiplication problem that not only enables a drastic reduction of the number of partial products (N/r), but also eliminates the need of pre-computing odd multiples of the multiplicand in higher radix (r≥3) multiplication. Based on a mathematical proof that...
In this paper, a waveform relaxation algorithm for the fast electromagnetic interference analysis of distributed transmission line networks is presented. The proposed work models lossy transmission lines as a cascade of lumped circuit elements and lossless line segments, where the incident field coupling with the network is represented as lumped sources connected to each lossless line segment. A longitudinal...
This paper introduces an approach to determine whether an individual is related to an item or not. In our approach, the well-known DBLP dataset is used and we try to find skills related to an author that we were not aware of before. To realize our objective, we cluster authors and skills using the Spectral Graph Clustering algorithm, then simultaneously obtain user and movie clusters via Bipartite...
Outlier detection is the task of finding objects that are dissimilar to or inconsistent with the remaining data. It has many uses in applications like fraud detection, network intrusion detection and clinical diagnosis of diseases. Using clustering algorithms for outlier detection is a frequently used technique. The clustering algorithms consider outlier detection only to the point...
The distributed single and/or multiple user selection problems are strongly related to that of partitioning a sample with binary-type questions. Although several algorithms have been proposed for user selection, the comprehensive view of designing the optimal algorithm has not been fully investigated yet. In this paper, we reformulated a splitting based selection algorithm by introducing a new parameter,...
This paper proposes the Fuzzy Particle Swarm Clustering (FPSC) algorithm, which is an extension of the crisp data clustering algorithm PSC particularly tailored to deal with fuzzy clusters. The main structural changes of the original PSC algorithm to design FPSC occurred in the selection and evaluation steps of the winner particle, comparing the degree of membership of each object from the database...
A common question asked about unlabeled data sets is how many subsets (or clusters) of objects are represented in the data. The answer to this question is usually obtained by first clustering the data, and then employing a cluster validity measure to validate one or more candidate partitions of the objects. In this paper we describe a universal cluster validity measure that, unlike most existing...
In this paper, a novel data clustering algorithm based on the subtractive clustering (SC) algorithm and a new validity index are proposed. The SC algorithm is a simple method for data clustering; however, it has two problems which must be overcome. The first problem is that the cluster centers found by SC are taken from the data with the highest potential values, but these data may not be the...
This paper presents a data clustering approach using a modified K-Means algorithm that reduces the sensitivity to the initial centers (seed points) of clusters. This algorithm partitions the whole space into different segments and calculates the frequency of data points in each segment. The segment which shows the maximum frequency of data points will have the maximum probability to contain the centroid...
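The segment-frequency idea described above can be sketched roughly as follows. The abstract is truncated, so several details here are assumptions rather than the paper's method: the space is cut into an equal-width grid (`bins_per_axis` is a hypothetical parameter), the k most populated segments are selected, and each seed is taken as the mean of the points in its segment.

```python
from collections import defaultdict


def density_seeds(points, k, bins_per_axis=4):
    """Pick k initial centers from the most populated grid segments.

    Illustrative sketch only; grid shape and seed choice are assumptions.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def segment_of(point):
        # Map a point to the index of its grid segment along each axis.
        index = []
        for d in range(dims):
            span = highs[d] - lows[d]
            if span == 0:
                index.append(0)
            else:
                i = int((point[d] - lows[d]) / span * bins_per_axis)
                index.append(min(i, bins_per_axis - 1))  # clamp the upper edge
        return tuple(index)

    # Frequency of data points per segment.
    segments = defaultdict(list)
    for p in points:
        segments[segment_of(p)].append(p)

    # The densest segments are the most likely to contain a centroid.
    densest = sorted(segments.values(), key=len, reverse=True)[:k]
    return [
        tuple(sum(p[d] for p in seg) / len(seg) for d in range(dims))
        for seg in densest
    ]


# Hypothetical data: two dense groups and one isolated point.
data = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (10.0, 10.0), (9.9, 9.8), (5.0, 5.0)]
seeds = density_seeds(data, k=2)
```

The resulting seeds would then be passed to a standard K-Means iteration in place of random initial centers.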
Clustering is one of the popular techniques for data analysis. In this paper, we propose a new method for simultaneous clustering and feature selection through the use of multi-objective particle swarm optimization (PSO). Since different features may have different importance in various contexts, some features may be irrelevant and some of them may be misleading in clustering. Therefore,...