In this paper, we adapt two existing methods to perform semi-supervised temporal clustering: Aligned Cluster Analysis (ACA), a temporal clustering algorithm, and Constrained Spectral Clustering, a semi-supervised clustering algorithm. In the first method, we add side information in the form of pairwise constraints to its objective function, and in the second, we add a temporal search to its framework...
With the development of the Internet, online social networks and websites generate a large amount of data. At the same time, several distributed systems, represented by Hadoop, have been proposed to handle massive data. These systems provide an efficient and convenient way to construct different kinds of algorithms. Community detection, a traditional research area, is now facing the challenge of Big Data...
Hierarchical Multi-label Classification (HMC) is a challenging real-world problem that naturally emerges in several areas. This work proposes two new algorithms using a Probabilistic Graphical Model based on Dependency Networks (DN) to solve the HMC problem of classifying gene functions into pre-established class hierarchies. DNs are especially attractive for their capability of using traditional,...
As an important branch of machine learning, clustering is widely used for data analysis in various domains. Hierarchical clustering, one of the traditional clustering algorithms, has excellent stability yet relatively poor time complexity. In this paper, we propose an efficient hierarchical clustering algorithm by searching given nodes' nearest neighbors iteratively, which depends on an...
Many real-world applications involve multi-label data streams, so effective concept drift detection methods should be able to consider the unique properties of multi-label stream data, such as label dependence. To deal with these challenges, we propose an efficient and effective method to detect concept drift based on label grouping and entropy for multi-label data. Two methods are proposed to group...
Ensuring high reliability of large-scale clusters is becoming more critical as the size of these machines continues to grow, since this increases the complexity and amount of interactions between different nodes and thus results in a high failure frequency. For this reason, predicting node failures in order to prevent errors from happening in the first place has become extremely valuable. A common...
Incomplete data clustering plays an important role in big data analysis and processing. Existing algorithms for clustering incomplete high-dimensional big data perform poorly in both efficiency and effectiveness. The paper proposes an incomplete high-dimensional big data clustering algorithm based on feature selection and a partial distance strategy. First, a hierarchical clustering-based...
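The partial distance strategy named in the entry above can be illustrated with a short sketch. The abstract does not give the formula, so this assumes the commonly used formulation: the squared Euclidean distance is computed over the features observed in both vectors and rescaled by the ratio of total to observed features. The function name `partial_distance` and the use of NaN to mark missing values are assumptions made for this example, not details from the paper.

```python
import numpy as np

def partial_distance(x, y):
    """Distance between two vectors that may contain missing values (NaN).

    Assumed formulation: sum squared differences over features observed in
    both vectors, then rescale by (total features / observed features).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A feature is usable only if it is present in both vectors.
    observed = ~(np.isnan(x) | np.isnan(y))
    n_observed = observed.sum()
    if n_observed == 0:
        # No shared features: the distance is undefined.
        return float("nan")
    squared = np.sum((x[observed] - y[observed]) ** 2)
    return float(np.sqrt(x.size / n_observed * squared))
```

With complete vectors the scale factor is 1, so the function reduces to the ordinary Euclidean distance.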
This paper proposes a new method for automated clustering of high-dimensional datasets. The method is based on a recursive binary division strategy that successively divides an original dataset into distinct clusters. Each binary division is carried out using a model-free expectation maximization scheme that exploits the posterior probability computation capability of the quasi-supervised learning...
To address the long response time, inaccurate recommendations and cold-start problems faced by current recommendation algorithms, this paper, taking a movie recommendation system as an example, proposes a collaborative filtering recommendation model based on user credibility clustering. This model divides the recommendation process into offline and online phases. Offline, it uses the result of user's...
Image annotation has been identified as a suitable means of eliminating the semantic gap that has made the accuracy of content-based image retrieval unsatisfactory. However, existing methods of automatic image annotation depend on supervised learning, which can be difficult to implement due to the need for manually annotated training samples, which are not always readily available...
The planted (l, d) motif discovery has been successfully used to locate transcription factor binding sites in dozens of promoter sequences over the past decade. However, there has not been enough work done in identifying (l, d) motifs in the next-generation sequencing (ChIP-seq) data sets, which contain thousands of input sequences and thereby bring new challenges to making a good identification in reasonable...
Texture Feature Extraction (TFE) plays an important role in satellite image processing applications. This paper proposes a novel method for satellite imagery classification. Our proposed method is a combination of Local Binary Pattern (LBP) and the fuzzy c-means classification algorithm. The Local Binary Pattern is calculated by thresholding a 3 × 3 neighborhood of each pixel by the center pixel value...
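The 3 × 3 thresholding step described in the entry above can be sketched as follows. This is a minimal illustration of the basic Local Binary Pattern operator, not the paper's implementation: the function name `lbp_3x3`, the clockwise neighbour ordering, the bit weights, and the use of `>=` for the threshold are all assumptions made for this example, and border pixels are simply skipped.

```python
import numpy as np

def lbp_3x3(image):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2-D grayscale array."""
    img = np.asarray(image, dtype=np.int64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets (row, column), clockwise from the top-left corner.
    # The ordering, and therefore the weight of each bit, is an assumption.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the interior pixels.
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # A neighbour at least as bright as the center contributes a 1 bit.
        codes += (neighbour >= center).astype(np.int64) << bit
    return codes
```

Each interior pixel receives a code between 0 and 255. A histogram of these codes over an image region is a typical texture feature that could then be passed to a clustering step such as fuzzy c-means.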
Idiopathic generalized epilepsy (IGE) and symptomatic generalized epilepsy (SGE) are two kinds of generalized epilepsy. In this study, we discuss methods for the automatic segmentation of MR images for patients with these two kinds of epilepsy. K-means clustering, expectation-maximization, and fuzzy c-means algorithms were employed to perform segmentation on brain images for patients with IGE...
Big data is a set of very large and complex data that is hard to load on computers. The main challenges in the big data world relate to searching, categorizing and analyzing the data, especially when it is unbalanced. Although there is a lot of work in the field of big data, analyzing unbalanced big data is still a fundamental challenge in this area. In this paper we try to solve the problem of RSIO-LFCM...
Parkinson's disease (PD) is a chronic, progressive neurological disorder caused by a lack of the chemical dopamine in the brain. To date, there is still no cure or prevention for PD, and the disease usually worsens gradually over time. However, this disease can be controlled with some treatment, especially in the early stage. Hence, this study proposes a method for early detection and diagnosis of...
The original K-medoid algorithm selects its initial medoids arbitrarily, which affects the resulting clusters and can lead to unstable or empty clusters that are not meaningful. The number of iterations can also be rather high, so K-medoid is not suitable for large databases because of its computational complexity. The original k-means algorithm is also computationally expensive. Though existing algorithms usually...
In today's networked environment, massive volumes of data are being generated, gathered and stored in databases across the world. This trend is growing very fast, year after year. Today it is normal to find databases with terabytes of data, in which vital information and knowledge are hidden. It is not feasible to mine the unseen information in such databases without efficient mining techniques for extracting...
This paper addresses collaborative filtering (CF) in a TV recommendation system that combines content-based and collaborative filtering recommendation mechanisms. We propose an algorithm that uses self-organizing mapping (SOM) to optimize the improved k-means (IK) clustering in collaborative filtering. The whole clustering algorithm is divided into two phases: in the first stage, the quantity of...
Clustering the genes based on their expression patterns is one of the important subjects in analyzing microarray data. Discovering the genes co-expressed in particular conditions has been done by different clustering algorithms. In these methods, the similar genes are located in the same cluster. Thus, the closer the similar genes, the further the dissimilar ones will be. Each of the applied methods...
The majority of learning systems do not take real-world data problems into consideration and assume that training sets are perfect. However, in real-world data, this hypothesis is not always true. In fact, real-world data is characterized by many different problems such as redundancy, incoherence or the large size of the data. In this paper we focus on the problem of imbalance between classes. Many solutions...