The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
In this paper, we describe a new distributed community detection algorithm for billion-edge directed graphs that, unlike modularity-based methods, achieves cluster quality on par with the best-known algorithms in the literature. We show that a simple approximation to the best-known serial algorithm dramatically reduces computation and enables distributed evaluation yet incurs only a very small impact...
Community-detection is a powerful approach to uncover important structures in large networks. Since networks often describe flow of some entity, flow-based community-detection methods are particularly interesting. One such algorithm is called Info map, which optimizes the objective function known as the map equation. While Info map is known to be an effective algorithm, its serial implementation cannot...
Multidimensional Scaling (MDS) is a dimension reduction method for information visualization, which is set up as a non-linear optimization problem. It is applicable to many data intensive scientific problems including studies of DNA sequences but tends to get trapped in local minima. Deterministic Annealing (DA) has been applied to many optimization problems to avoid local minima. We apply DA approach...
Large high dimension datasets are of growing importance in many fields and it is important to be able to visualize them for understanding the results of data mining approaches or just for browsing them in a way that distance between points in visualization (2D or 3D) space tracks that in original high dimensional space. Dimension reduction is a well understood approach but can be very time and memory...
We present performance results on a Windows cluster with up to 768 cores using MPI and two variants of threading - CCR and TPL. CCR (Concurrency and Coordination Runtime) presents a message based interface while TPL (Task Parallel Library) allows for loops to be automatically parallelized. MPI is used between the cluster nodes (up to 32) and either threading or MPI for parallelism on the 24 cores...
The multicore revolution promises potentially hundreds of cores in desktop computers. The ever increasing number of cores per chip will be accompanied by a pervasive data deluge whose size will probably increase even faster than CPU core count over the next few years. This suggests the importance of parallel data analysis and data mining applications with good multicore, cluster and grid performance...
Multidimensional scaling constructs a configuration points into the target low-dimensional space, while the interpoint distances are approximated to the corresponding known dissimilarity values as much as possible. SMACOF algorithm is an elegant gradient descent approach to solve Multidimensional scaling problem. We design parallel SMACOF program using parallel matrix multiplication to run on a multicore...
The ever increasing number of cores per chip will be accompanied by a pervasive data deluge whose size will probably increase even faster than CPU core count over the next few years. This suggests the importance of parallel data analysis and data mining applications with good multicore, cluster and grid performance. This paper considers data clustering, mixture models and dimensional reduction presenting...
eScience applications need to use distributed Grid environments where each component is an individual or cluster of multicore machines. These are expected to have 64-128 cores 5 years from now and need to support scalable parallelism. Users will want to compose heterogeneous components into single jobs and run seamlessly in both distributed fashion and on a future "Grid on a chip" with different...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.