Clustering is an important tool for analyzing gene expression data, and many clustering algorithms have been proposed for this purpose. In this article, we cluster real-life gene expression data with K-means, a well-known clustering algorithm. We also propose a new method for determining the initial cluster centers for K-means. We have compared the results of our method...
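The article's own initialization method is not described in this excerpt, so the minimal Python sketch below (an illustration, not the authors' code) implements plain Lloyd-style K-means and takes the initial centers as a parameter, which is where any initialization scheme would plug in. Representing each expression profile as a tuple of floats, and the toy values used, are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], init_centers: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd-style K-means starting from caller-supplied initial centers."""
    centers = [tuple(c) for c in init_centers]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        new_labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Toy 2-D data standing in for expression profiles (hypothetical values).
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(pts, init_centers=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # prints: [0, 0, 1, 1]
```

Because the result depends on `init_centers`, running the same data with different starting centers is a simple way to see why initialization matters.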
Clustering is an important unsupervised data analysis technique that divides data objects into clusters based on similarity. Clustering has been studied and applied in many different fields, including pattern recognition, data mining, decision science and statistics. Clustering algorithms can be broadly classified into hierarchical and partitional approaches. Partitioning around medoids...
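As an illustration of the medoid idea only: the sketch below is the simpler alternating k-medoids variant, not the full PAM algorithm with its swap phase, and it is not the method of the truncated abstract. Medoids are actual data points (stored here as indices into the data), which is what distinguishes them from K-means centroids. The function name `k_medoids` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_medoids(points: Sequence[Point], init_medoids: Sequence[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids; medoids are indices into `points`."""
    medoids = list(init_medoids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        labels = [min(range(len(medoids)),
                      key=lambda j: math.dist(p, points[medoids[j]]))
                  for p in points]
        new_medoids = []
        for j, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # only possible with duplicate points; keep old medoid
                new_medoids.append(old)
                continue
            # The member with the smallest total distance to its cluster becomes the medoid.
            new_medoids.append(min(members, key=lambda m: sum(
                math.dist(points[m], points[i]) for i in members)))
        if new_medoids == medoids:
            break  # medoids stable: converged
        medoids = new_medoids
    return medoids, labels


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
medoids, labels = k_medoids(pts, init_medoids=[1, 4])
print(medoids, labels)  # prints: [0, 3] [0, 0, 0, 1, 1, 1]
```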
In traditional text sentiment analysis methods, text feature vectors suffer from high dimensionality and high sparsity. To address this, similar words can be clustered together and each generated cluster used as a new feature dimension, which reduces the dimensionality of the text feature vector. This task can be accomplished with the Word2Vec tool and the K-means clustering algorithm...
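A minimal sketch of the dimensionality-reduction step, assuming the Word2Vec training and the K-means clustering of word vectors have already been done: `word_to_cluster` is a hypothetical toy mapping standing in for that output, and `cluster_features` is an illustrative helper, not part of any library. Each text is represented by counts per word cluster, so the feature length equals the number of clusters rather than the vocabulary size.

```python
from collections import Counter
from typing import Dict, List, Sequence


def cluster_features(tokens: Sequence[str], word_to_cluster: Dict[str, int],
                     n_clusters: int) -> List[int]:
    """Represent a tokenized text by how many of its words fall in each cluster.

    The vector length is n_clusters, not the vocabulary size; words without a
    cluster assignment are ignored.
    """
    counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
    return [counts[c] for c in range(n_clusters)]


# Hypothetical output of K-means run over Word2Vec word vectors.
word_to_cluster = {"good": 0, "great": 0, "excellent": 0,
                   "bad": 1, "awful": 1, "terrible": 1,
                   "movie": 2, "film": 2}

print(cluster_features("a great film with excellent acting".split(), word_to_cluster, 3))
# prints: [2, 0, 1]
```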
Trust and reputation management has been introduced to online social networks (OSNs) as a way to promote healthy collaborative relationships among participants. Currently, most trust and reputation systems focus on evaluating the credibility of users. Reputation systems in OSNs aim to help users distinguish between trustworthy and untrustworthy users, and to encourage honest users...
With advances in technology, high volumes of a wide variety of valuable data of different veracity can easily be collected or generated at high velocity in the current era of big data. Embedded in these big data is implicit, previously unknown and potentially useful information. Hence, fast and scalable big data science and engineering solutions that mine and discover knowledge from these big data...
Search systems are used to search for information. CiteSeer was a search engine for academic documents. However, no platforms are available for discovering algorithms in scholarly big data, and the limitations of existing search engines make such searches more difficult. Hence, special-purpose systems are used. This work proposes a search system that extracts algorithm representations. Algorithms can be represented...
Superpixels have been widely applied in hyperspectral image processing as an over-segmentation pre-processing step. However, with most superpixel algorithms it is difficult to control the balance between fragmentation and accuracy. In this paper, we propose a superpixel aggregation model to cluster the over-segmentations. Based on the individual importance of superpixels and their interrelationships, a two-step...
Euclidean-distance-based K-means clustering is a hard classification algorithm. When dealing with deterministic remote sensing data, it is difficult to obtain satisfactory classification results using the K-means algorithm. The traditional K-means clustering algorithm faces several shortcomings, such as convergence to local optima and sensitivity to the initial cluster centers...
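One widely used remedy for K-means' sensitivity to initial centers is k-means++ seeding. The Python sketch below shows that standard technique; it is not the improvement proposed in this truncated abstract, and the function name `kmeans_pp_init` and the toy points are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans_pp_init(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding: pick the first center uniformly at random, then each
    further center with probability proportional to its squared distance
    from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
            continue
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


pts = [(0.0, 0.0), (0.5, 0.0), (9.0, 9.0), (9.5, 9.0)]
seeds = kmeans_pp_init(pts, k=2, rng=random.Random(42))
print(seeds)
```

The returned list can serve as the initial centers of a standard K-means loop. Points already chosen get weight zero, so with distinct points the seeds are distinct, and far-away points are favored.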
Ensemble clustering consists of combining multiple clustering solutions into a single one, called the consensus, which can produce a more accurate and robust clustering of the data. In this paper, we attempt to implement ensemble clustering using Dempster-Shafer evidence theory. Individual clustering solutions are obtained using evidence theory, and a novel diversity measure is proposed using the distance...
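For reference, Dempster's rule of combination, the core operation of Dempster-Shafer theory, fuses two mass functions by multiplying the masses of every pair of focal sets, crediting the product to their intersection, and renormalizing away the conflict K (the mass that lands on the empty set): m12(A) = sum over B ∩ C = A of m1(B)·m2(C), divided by (1 − K). The sketch below implements only that rule; the cluster labels `c1`, `c2` and the mass values are hypothetical, and the paper's diversity measure is not reproduced.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses of every pair of focal sets, credit the
    product to their intersection, and renormalize by 1 - K, where K is the
    mass that fell on the empty set (the conflict)."""
    combined: Mass = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c
        if a:
            combined[a] = combined.get(a, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


# Two hypothetical clusterings expressing belief about one object's cluster.
theta = frozenset({"c1", "c2"})  # frame of discernment: "either cluster"
m1 = {frozenset({"c1"}): 0.6, theta: 0.4}
m2 = {frozenset({"c2"}): 0.5, theta: 0.5}
m12 = dempster_combine(m1, m2)
print(m12)
```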
The significant increase in the use of bike sharing systems (BSSs) causes imbalances in the distribution of bikes, creating logistical challenges and discouraging bike riders who find it difficult to pick up or drop off a bike at their desired location. We investigated this issue by finding the network-wide availability patterns and how these patterns evolve temporally using a novel supervised clustering...
In this paper, we propose an incremental ensemble classifier learning method. In the proposed method, a set of accurate and diverse classifiers is generated and added to the ensemble by comparing accuracy and diversity. The selection of classifiers for the ensemble starts with a layer (where data is partitioned into any given number of clusters and fed to a set of base classifiers) and then...
Stream mining is a growing field of research in the digital age. As the number of users of digital technologies increases, data is being generated exponentially, and so is the need to analyse it. This data is very large and cannot be stored for long, so it must be processed as soon as possible to make space for newly arriving data. To achieve this, different single-scan algorithms...
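To illustrate the single-scan constraint, here is a minimal Python sketch (not one of the stream mining algorithms the abstract refers to) that computes the count, mean and sample variance in one pass with Welford's update, keeping constant-size state instead of the data itself. The helper name `running_stats` is hypothetical.

```python
from typing import Iterable, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """One pass over a stream, O(1) memory: returns (count, mean, sample variance)
    using Welford's update."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the already-updated mean
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance


# A generator stands in for data that can be read only once.
n, mean, var = running_stats(x for x in [2, 4, 4, 4, 5, 5, 7, 9])
print(n, mean, var)
```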
The Nearest Neighbor Classification (NNC) has been widely used as a classification method due to its simplicity, classification efficiency and ability to deal with different classification problems. Despite its good classification accuracy, NNC suffers from many shortcomings: long execution time, noise sensitivity, high storage requirements and lack of interpretability. In this paper, we propose...
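A minimal 1-nearest-neighbor sketch in Python, for illustration only (not the method proposed in the paper). Because every query scans all stored training examples, it makes the execution-time and storage costs listed above concrete. The names `nn_classify` and the toy training set are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def nn_classify(train: List[Tuple[Point, str]], query: Point) -> str:
    """Return the label of the training example closest to `query` (1-NN)."""
    # A full linear scan over all stored examples for every query.
    _, label = min(train, key=lambda ex: math.dist(ex[0], query))
    return label


train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((5.0, 5.0), "b")]
print(nn_classify(train, (1.0, 1.0)))  # prints: a
```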
In many government and private institutions, one of the major tasks is selecting the best project proposals for funding. These funding organizations select proposals by submitting them to reviewers. The manual process becomes too difficult when the number of projects is large. Earlier models introduced ontology-based text mining methods to cluster proposals in any language...
Clustering is one of the most important unsupervised classification strategies in data analysis. In this context, a clustering approach based on a fast search for cluster centers using their local densities has been proposed. In the present paper, we suggest a new approach that combines local density estimation with entropy. Thus, the clustering algorithm...
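The density-based center search mentioned here appears to be the density peaks idea: each point gets a local density ρ (in this sketch, the number of neighbors closer than a cutoff distance d_c) and a distance δ to the nearest point of higher density, and points with both large ρ and large δ are candidate centers. The sketch below computes only ρ and δ; the cutoff kernel, the value of `d_c` and the toy data are assumptions, and the paper's entropy-based extension is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def density_peaks(points: Sequence[Point], d_c: float) -> Tuple[List[int], List[float]]:
    """Compute local density rho (cut-off kernel) and delta, the distance
    to the nearest point of strictly higher density, for each point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Point(s) of maximal density get their largest distance to any point.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (10.0, 10.0)]
rho, delta = density_peaks(pts, d_c=1.1)
print(rho, delta)
```

Here the point at the origin has both the highest ρ and a large δ, which marks it as a center candidate; the isolated point has a large δ but ρ = 0, the typical signature of an outlier.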
Application workloads in modern MPSoC-based embedded systems are increasingly dynamic. To cope with the dynamism of application workloads at run time and to improve the efficiency of the underlying system architecture, this paper presents a novel run-time resource allocation algorithm for multimedia applications with the objective of minimizing energy consumption for predefined deadlines....
In this paper we present a methodology for monitoring human activities at home using audio recordings captured by a mobile phone. Specifically, after estimating a large set of audio features, unsupervised clustering is performed in order to extract feature subspaces. Human activity sound models were trained using different combinations of these subspaces. The best performance, 92.46%, was achieved...
Classification is a method used to predict the target class of each case in a dataset. Classification performance depends on the nature of the input samples used to train the classifier. When one class has more samples than another, the dataset is called imbalanced. In such cases, algorithms tend to favor classifying samples into the overrepresented (majority)...
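One common and simple response to class imbalance is random oversampling of the minority class. The sketch below illustrates that generic technique; it is not necessarily the approach taken in this paper, and `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(samples: Sequence[Tuple[object, str]],
                      rng: random.Random) -> List[Tuple[object, str]]:
    """Duplicate randomly drawn examples of each smaller class until every
    class matches the size of the largest class."""
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    balanced = list(samples)
    for label, count in counts.items():
        pool = [s for s in samples if s[1] == label]
        balanced.extend(rng.choice(pool) for _ in range(target - count))
    return balanced


# 8 majority examples vs. 2 minority examples (toy data).
data = [(i, "maj") for i in range(8)] + [(100, "min"), (101, "min")]
balanced = random_oversample(data, random.Random(0))
print(Counter(label for _, label in balanced))
```

Oversampling only duplicates existing minority examples, so it can encourage overfitting; it is normally applied to the training split only.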
Artificial Neural Networks (ANNs) are human-made information processing artifacts that have grown enormously over the past two to three decades. Neural networks are highly parallel dynamic systems that accept output responses as input and produce output. They have proven extremely beneficial in solving problems that cannot be solved using conventional algorithmic procedures,...
This paper presents, simulates, assesses and applies a proposed data classification method for medical datasets, with the aim of classifying patients based on their medical history. This modified data classification algorithm was formulated using the k-means algorithm. Simulations were performed on real and artificial datasets in MATLAB 7.7.0 and showed an increase in the accuracy of data classification of...