Traditional machine learning algorithms often require computations on centralized data, but modern datasets are collected and stored in a distributed way. In addition to the cost of moving data to centralized locations, increasing concerns about privacy and security warrant distributed approaches. We propose keybin, a distributed key-based binning clustering algorithm for high-dimensional spaces....
Expectation-Maximization (EM) is typically used to compute maximum-likelihood estimates of model parameters from incomplete samples. We propose a new Spark-based algorithm for generating an extension of the Dynamic Topic Model (exDTM) in a time-based manner, based on the distribution of document topics. The proposed algorithm can be applied to clustering documents from data streams for threat...
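EM alternates between two steps. The E-step computes each sample's expected component membership under the current parameters. The M-step re-estimates the parameters by maximizing the likelihood given those memberships. The excerpt does not show how exDTM uses EM. What follows is therefore a minimal sketch of textbook EM for a hypothetical two-component, one-dimensional Gaussian mixture. The function name `em_gmm_1d`, the fixed iteration count, and the small variance floor are illustrative assumptions, and nothing here is Spark-specific.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs: List[float], mu_init: Tuple[float, float], iters: int = 50):
    """Fit a two-component 1-D Gaussian mixture with EM (hypothetical helper)."""
    mu1, mu2 = mu_init
    var1 = var2 = 1.0
    pi1 = 0.5  # mixing weight of component 1
    n = len(xs)
    for _ in range(iters):
        # E-step: responsibility of component 1 for each sample.
        r = []
        for x in xs:
            a = pi1 * normal_pdf(x, mu1, var1)
            b = (1 - pi1) * normal_pdf(x, mu2, var2)
            r.append(a / (a + b))
        # M-step: re-estimate weight, means, and variances from the responsibilities.
        n1 = sum(r)
        n2 = n - n1
        pi1 = n1 / n
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        # Assumed variance floor (1e-6) keeps a component from collapsing to zero width.
        var1 = sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1 + 1e-6
        var2 = sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2 + 1e-6
    return pi1, (mu1, mu2), (var1, var2)
```

For example, `em_gmm_1d([0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1], (0.0, 5.0))` separates the two groups of samples, with means near 0.05 and 5.05.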
Big data clustering has recently become a challenging task with applications in many domains. Traditional clustering methods cannot handle large-scale data. Furthermore, big data often contain mixed types of data, including numerical and categorical attributes. We therefore propose a parallelization of the k-prototypes clustering method (MR-KP) using...
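The k-prototypes method is Huang's extension of k-means to mixed data. It measures the dissimilarity between a record and a cluster prototype as the squared Euclidean distance over the numerical attributes plus a weight gamma times the number of mismatching categorical attributes. Prototypes are updated with means for the numerical attributes and modes for the categorical ones.

The following is a minimal sequential sketch under that definition. The names `dissimilarity`, `update_prototype`, and `k_prototypes`, the `(numeric, categorical)` record layout, and the default `gamma=1.0` are hypothetical. The excerpt does not say how MR-KP distributes the work over MapReduce. The per-record assignment step is the part that is independent for each record.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A record is (numerical attributes, categorical attributes); assumed layout.
Record = Tuple[Sequence[float], Sequence[str]]


def dissimilarity(x: Record, q: Record, gamma: float) -> float:
    """Squared Euclidean distance on numerical parts + gamma * categorical mismatches."""
    num = sum((a - b) ** 2 for a, b in zip(x[0], q[0]))
    cat = sum(1 for a, b in zip(x[1], q[1]) if a != b)
    return num + gamma * cat


def update_prototype(members: List[Record]) -> Record:
    """Mean of each numerical attribute, mode of each categorical attribute."""
    n_num, n_cat = len(members[0][0]), len(members[0][1])
    means = [sum(m[0][j] for m in members) / len(members) for j in range(n_num)]
    modes = [Counter(m[1][j] for m in members).most_common(1)[0][0] for j in range(n_cat)]
    return (means, modes)


def k_prototypes(data: List[Record], protos: List[Record], gamma: float = 1.0, iters: int = 10):
    """Alternate nearest-prototype assignment and prototype updates."""
    labels: List[int] = []
    for _ in range(iters):
        # Assignment: each record goes to its least dissimilar prototype.
        labels = [
            min(range(len(protos)), key=lambda k: dissimilarity(x, protos[k], gamma))
            for x in data
        ]
        # Update: recompute each prototype; keep the old one if its cluster is empty.
        new_protos = []
        for k in range(len(protos)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            new_protos.append(update_prototype(members) if members else protos[k])
        protos = new_protos
    return labels, protos
```

The weight gamma balances the two kinds of attributes. A larger value makes categorical mismatches count more heavily relative to numerical distance.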
Analyzing enormous datasets and performing the necessary processing on massive data structures requires a new collection of algorithms and associated systems, a trend framed by the term Big Data. Big Data architectures span multiple machines and clusters with special-purpose subsystems. The data produced from several sources require analysis and organization with meager amounts...
Big Data analytics has recently emerged as a prominent research area in Information Technology, serving various data-driven domains that need effective processing of big data. Big data analytics faces various challenges, such as inefficient storage, processing delays, a low rate of information retrieval, and complex algorithms that cannot be handled and managed using traditional methods...
With the growing popularity of the Internet, product information fills many web pages, and extracting the information one needs from these pages often calls for clustering. The current explosive growth of data has led to mass information storage, so clustering faces problems such as high computational complexity and long running times; then...
K-means is the most widely used clustering algorithm because it is fairly straightforward to implement for various problems. Meanwhile, as the number of clusters increases, the number of iterations also tends to increase slightly. However, as some studies in the literature indicate, there is still room for improvement. In this study, improved implementations of the k-means algorithm with a centroid...
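For reference, the standard k-means (Lloyd) iteration alternates two steps. It assigns each point to its nearest centroid, then recomputes each centroid as the mean of its points, and it stops when the assignments no longer change. The sketch below implements that baseline only. It is not the improved centroid method this study proposes, since the excerpt is truncated before describing it. The name `kmeans`, random initialization with a fixed seed, and the iteration cap are assumptions. The function also returns the number of iterations run, which is the quantity the abstract says tends to grow with the number of clusters.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iters: int = 100, seed: int = 0):
    """Lloyd's k-means; returns (labels, centroids, iterations run)."""
    rng = random.Random(seed)
    centroids = list(rng.sample(points, k))  # initialize with k distinct input points
    labels: Optional[List[int]] = None
    n_iter = 0
    for n_iter in range(1, max_iters + 1):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if the cluster is empty
                centroids[c] = tuple(sum(coords) / len(members) for coords in zip(*members))
    return labels, centroids, n_iter
```

Running `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2)` groups the first two and the last two points, with centroids `(0.0, 0.5)` and `(10.0, 10.5)`.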
The paper presents our approach to implementing a similarity measure for big data analysis in a parallel environment. We describe the algorithm for parallelising the computations. We provide results from a real MPI application that computes similarity measures, as well as results obtained with our simulation software. The simulation environment allows us to model parallel systems of various...
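The excerpt names neither the similarity measure nor the partitioning scheme. The following is therefore only a minimal sketch under two assumptions. First, the measure is cosine similarity between vectors. Second, the work uses a block-row decomposition, in which each of `size` workers (MPI ranks in a real program) computes the rows of the similarity matrix that it owns. The workers are simulated sequentially in plain Python rather than run under MPI. The names `row_block`, `local_similarities`, and `similarity_matrix` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def row_block(n: int, rank: int, size: int) -> range:
    """Contiguous rows owned by `rank`; the first n % size workers get one extra row."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def local_similarities(vectors: List[List[float]], rank: int, size: int) -> Dict[int, List[float]]:
    """One worker's share: similarities of its rows against every vector."""
    return {i: [cosine(vectors[i], v) for v in vectors] for i in row_block(len(vectors), rank, size)}


def similarity_matrix(vectors: List[List[float]], size: int) -> List[List[float]]:
    """Simulate `size` workers in turn and gather their row blocks into one matrix."""
    gathered: Dict[int, List[float]] = {}
    for rank in range(size):  # an MPI program would run these concurrently and gather
        gathered.update(local_similarities(vectors, rank, size))
    return [gathered[i] for i in range(len(vectors))]
```

The similarity matrix is symmetric, so a real implementation would likely compute only one triangle. Equal-sized row blocks would then carry unequal amounts of work. This sketch ignores that load-balancing issue.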