K-modes is a typical categorical clustering algorithm. First, we improve the K-modes process: when categorical objects are allocated to clusters, the count of each attribute item in each cluster is updated, so that the new cluster modes can be computed after reading the whole dataset once. To make K-modes capable of handling large-scale categorical data, we then implement K-modes on Hadoop using...
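The count-based mode update described in this abstract can be sketched as follows. This is a minimal single-machine illustration, not the authors' Hadoop implementation: the function names `kmodes_pass` and `kmodes`, the use of simple matching (Hamming) dissimilarity, and the rule of keeping the old mode for an empty cluster are all assumptions made for the example.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def kmodes_pass(records, modes):
    """One pass over the data: assign each record to its nearest mode and
    update per-cluster, per-attribute value counts while doing so."""
    k = len(modes)
    n_attrs = len(modes[0])
    # counts[c][j] maps each value of attribute j to its frequency in cluster c
    counts = [[Counter() for _ in range(n_attrs)] for _ in range(k)]
    labels = []
    for rec in records:
        c = min(range(k), key=lambda i: hamming(rec, modes[i]))
        labels.append(c)
        for j, value in enumerate(rec):
            counts[c][j][value] += 1
    # New modes come straight from the counts, with no second scan of the data.
    new_modes = []
    for c in range(k):
        if not counts[c][0]:
            # Assumption: an empty cluster keeps its previous mode.
            new_modes.append(tuple(modes[c]))
        else:
            new_modes.append(
                tuple(counts[c][j].most_common(1)[0][0] for j in range(n_attrs))
            )
    return labels, new_modes


def kmodes(records, initial_modes, max_iter=20):
    """Repeat single-pass updates until the modes stop changing."""
    modes = [tuple(m) for m in initial_modes]
    labels = []
    for _ in range(max_iter):
        labels, new_modes = kmodes_pass(records, modes)
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes
```

Because the per-attribute counts are plain sums, counts computed on separate partitions of the data could be added together before the modes are taken, which is one plausible reason the update suits a MapReduce-style setting.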
Big data is a set of very large and complex data that is hard to load on computers. The main challenge in the big data world relates to searching, categorizing, and analyzing such data, especially when it is unbalanced. Although there is a lot of work in the field of big data, analyzing unbalanced big data is still a fundamental challenge in this area. In this paper we try to solve the problem of RSIO-LFCM...
In this paper a new algorithm, RS-PEAK, is presented for locating peak values of the Riemann zeta function on the critical line. The method is based on earlier results of Andrew M. Odlyzko and Tadej Kotnik, and on recently achieved results in solving simultaneous Diophantine approximation problems. Until 2014 only a few hundred values were known where the Riemann-Siegel Z-function (i.e., Z(t)) larger...
Incremental clustering has been proposed to handle large datasets that cannot fit into memory entirely. Single-pass fuzzy c-means (SpFCM) and online fuzzy c-means (OFCM) are two representative incremental fuzzy clustering methods. Both extend the scalability of fuzzy c-means (FCM) by processing the dataset chunk by chunk. However, due to data sparsity and high dimensionality, SpFCM and...
One type of distributed system is the client/server system, which consists of clients and servers. To improve the performance of such a system, the client assignment strategy plays an important role. There are two criteria to evaluate the load on the servers: total load and load balance. The total load increases when the load balance decreases, and vice versa. It has been proved that finding the best client...
One of the main challenges in Computer-Supported Collaborative Learning (CSCL) is group formation. Various approaches have been reported in the literature to tackle this problem, but none have offered an optimal solution. In this study, a novel binary integer programming formulation is proposed to model the group formation problem and optimally assign each learner to the most appropriate group. In...
Before inpainting a damaged Thangka image using digital technology, it is necessary to segment the damaged regions. A segmentation algorithm is proposed to segment the spotted regions of a damaged Thangka image, which combines grayscale morphology with the maximum entropy threshold method. First, mathematical morphology is applied to the RGB channels separately in order to segment the tiny spots...
Plagiarism is one of the growing issues in academia and is always a concern in universities and other academic institutions. The situation is becoming even worse with the availability of ample resources on the web. This paper focuses on creating an effective and fast tool for plagiarism detection for text-based electronic assignments. Our plagiarism detection tool, named AntiPlag, is developed using...
Result diversification for keyword search on XML documents has attracted considerable attention from the research community in recent years. Though search results were diversified from different perspectives in existing methods, the effects were still far from satisfactory. This paper proposes a new way to diversify search results according to the semantic information of central entities which...
Outlier detection is used to detect abnormalities in various application domains, including clustering-based disease onset identification, gene expression analysis, computer network intrusion, financial fraud detection, and human behaviour analysis. Existing methods to detect outliers are inadequate due to poor accuracy and the lack of any general technique. Most techniques consider either small clusters...
Data mining is the extraction of hidden predictive information from large databases and it is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. In data mining there are two activities such as Classification...
Movie recommendation is important in our social life due to its strength in providing enhanced entertainment. A recommendation system can suggest a set of movies to users based on their interests or on the popularity of the movies. Although a number of movie recommendation systems have been proposed, most of these either cannot recommend a movie to the existing users efficiently or to a new user by any means...
Attribute reduction is a core research topic in rough set theory, but the classical attribute reduction algorithm and its extensions are based on decision tables with decision attributes and cannot be applied to attribute reduction of abnormal decision tables without decision attributes. Therefore, based on rough set theory, this paper studies abnormal decision tables in fractal dimension and presents a heuristic attribute...
With the concepts of cloud computing springing up, research on data mining clustering algorithms based on cloud computing has become a focus for scholars both at home and abroad. Aiming at the large-scale data clustering problem, this article uses cloud computing technology to conduct in-depth research based on the Hadoop cloud computing platform and parallel K-means...
Load profiling provides the necessary information about daily demand patterns for the short- and medium-term actions of retailers and utilities. Consumer characterization is a two-stage approach. In the first stage, the daily load curves of each consumer are classified into a certain number of clusters. Each cluster constitutes a load profile. In the second stage, one of these profiles is chosen as representative...
This paper presents a data stream clustering algorithm based on grid and relative density, which inherits the advantages of grid-based clustering and relative-density-based clustering and can discover arbitrary-shape and multi-resolution clusters. It also introduces a concept of support, which lets the algorithm adopt ideas from distance-based algorithms and provides a way to solve the effects of...
Solving the cluster identification problem on a large amount of data is known to be time consuming. Almost all the state-of-the-art clustering techniques focus on sequential algorithms, which suffer from the problem of long runtime. So, parallel algorithms are needed. One of the attempts is a parallel minimum spanning tree (MST)-based clustering technique, called CLUMP, which identifies dense clusters in a noisy...
A clustering routing algorithm for wireless sensor networks (WSNs) based on an improved ant colony algorithm is proposed in this paper. The proposed algorithm builds on the advantages of the clustering algorithm and the ant colony algorithm, applying the improved ant colony algorithm to the clustering algorithm in order to find the best path from cluster head to sink. To improve the ant colony algorithm, the...
In this paper, examples are used to explain that the density-based clustering algorithm is suitable for grouping students according to their scores. The GDLD algorithm is used to analyze the information of students from the School of Distance Education of Tianjin University so as to provide some useful inspiration for further study of learning modes, so that teaching managers can gain a much better understanding of students'...
This study aims to develop methods to track the center of and detect lanes as the first step of an automated tool to analyze rose DNA using PCR gel electrophoresis images. Although several research results have been previously reported using projection profiles of a whole image, it is still a challenge to track the center of and detect bent lanes using projection profiles. To resolve the problem,...