Clustering analysis has been widely used in many areas such as astronomy, bioinformatics, and pattern recognition. In 2014, Rodriguez proposed an algorithm based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher density. However, the density relies on a cutoff distance, which might be affected by large statistical...
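The density-peaks idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `density_peaks`, the hard-threshold density count, and the toy cutoff value are assumptions made for the example.

```python
import numpy as np

def density_peaks(points, cutoff):
    """Return (rho, delta) per point, following the density-peaks idea.

    rho[i]   : number of other points closer than `cutoff` (local density).
    delta[i] : distance to the nearest point with strictly higher density;
               for a point of maximal density, the largest distance to any point.
    """
    # Pairwise Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Count neighbours inside the cutoff; subtract 1 to exclude the point itself.
    rho = (dist < cutoff).sum(axis=1) - 1

    delta = np.empty(len(points))
    for i in range(len(points)):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            delta[i] = dist[i].max()
    return rho, delta

# Toy data (hypothetical): two tight groups, each with one central point.
pts = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.05, 0.05],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.05, 5.05],
])
rho, delta = density_peaks(pts, cutoff=0.08)
# Candidate centers combine high density with a large delta.
centers = np.argsort(rho * delta)[-2:]
```

Note that `rho` changes with `cutoff`, which is the sensitivity the abstract points to.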
In traditional multiple instance learning (MIL), both positive and negative bags are required to learn a prediction function. However, labeling each bag as positive or negative carries a high human cost. Only positive bags contain our focus (positive instances) while negative bags consist of noise or background (negative instances). So we do not expect to spend too much to label the negative...
Real-world data such as images, text, web documents, and videos are very often high-dimensional. Many researchers or users may need to construct accurate predictive models for a variety of applications, especially those that involve clustering. Handling high-dimensional data is a reality in processing tasks in areas such as high-throughput genotyping platforms and human genetic clustering in bioinformatics,...
Networks are a powerful paradigm for representing complex relationships, and finding the community structure of networks can help people better understand the real world. Infomap, which employs the minimum description length as the optimization objective, is a competent algorithm for community structure analysis. In this paper, we propose a novel algorithm combining flow-based ensemble learning and Label...
Detecting overlapping protein complexes in protein-protein interaction (PPI) networks can provide insight into cellular functional organization and thus elucidate underlying cellular mechanisms. Recently, various algorithms for protein complex detection have been developed for PPI networks. However, the majority of algorithms primarily depend on network topological features and/or gene expression...
In this paper, we demonstrate new techniques for data representation in the context of deep learning using agglomerative clustering. The results from previous work show that a good number of encoding and decoding filters of layered autoencoders are duplicative, forcing two or more processing filters to extract the same features. We propose a new way to circumvent...
In multi-label classification, implicit constraints and dependencies always exist among labels. Exploring the correlation information among different labels is important for many applications. It can not only enhance the classifier performance but also help to interpret the classification results for some specific applications. This paper presents an improved multi-label classification...
The classification of high-dimensional data is an arduous task, especially with the emergence of high-quality data acquisition techniques. This problem is accentuated when the whole set of features is needed to learn a classifier, as is the case for genomic data. The Bayesian approach is suitable for these applications because it represents graphically and statistically the dependencies between the...
Unsupervised transfer learning has attracted a lot of attention in the big data era, due to its capability of extracting knowledge from large-scale unlabeled samples in multiple data domains. Existing unsupervised transfer learning methods mainly focus on learning a common latent space for source and target domains, while the data representation and subspace structure in the target domain are usually...
Kernel k-means is seen as a non-linear extension of the k-means clustering method, with good performance in identifying non-isotropic and linearly inseparable clusters. However, the space and time requirements of kernel k-means are expensive, with O(n^2) complexity. Present applications with large in-memory computations make this method unsuitable for large data sets. Recently, a simple prototype based hybrid...
We propose an effective subspace selection scheme as a post-processing step to improve results obtained by sparse subspace clustering (SSC). Our method starts by computing stable subspaces using a novel random sampling scheme. The preliminary subspaces thus constructed are used to identify the initially incorrectly clustered data points and then to reassign them to more suitable clusters based...
It is well known that clustering is an unsupervised machine learning technique. However, most clustering methods require several parameters to be set, such as the number of clusters, the shape of clusters, or other user- or problem-specific parameters and thresholds. In this paper, we propose a new clustering approach which is fully autonomous, in the sense that it does not require parameters to be pre-defined...
The Johnson-Lindenstrauss (JL) lemma, with known probability, sets a lower bound q0 on the dimension for which a random projection of p-dimensional vector data is guaranteed to be within (1±ε) of being an isometry in a randomly projected downspace. We study several ways to identify a “good” rogue random projection when the target downspace has dimensions below the JL limit. The tools used towards...
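For orientation, one widely used form of the JL dimension bound can be computed directly. The abstract does not say which form of the bound defines q0, so the Dasgupta-Gupta expression used below, and the names `jl_min_dim`, `n_samples`, and `eps`, are assumptions for illustration only.

```python
import math

def jl_min_dim(n_samples, eps):
    """Smallest integer target dimension k satisfying one common JL bound.

    Assumed form (Dasgupta-Gupta): k >= 4 ln(n) / (eps^2 / 2 - eps^3 / 3),
    which keeps all pairwise distances among n points within (1 +/- eps).
    This is not necessarily the bound q0 used in the paper.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    denominator = eps ** 2 / 2.0 - eps ** 3 / 3.0
    return math.ceil(4.0 * math.log(n_samples) / denominator)

# Example: 1000 points with 10% allowed distortion.
k = jl_min_dim(1000, 0.1)
```

Under this form the bound depends only on the number of points and on ε, not on the original dimension p. Projecting to fewer dimensions than the bound loses the guarantee, which is the regime the abstract studies.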
In this paper we deal with one of the most relevant problems in the field of data mining, the real time processing and visualization of data streams. To deal with data streams we propose a novel approach that uses a neighborhood-based clustering. Instead of processing each new element one by one, we propose to process each group of new elements simultaneously. A clustering is applied on each new group...
In this paper we introduce a new learning approach, which provides automated topological co-clustering based on the Self-Organizing Map. The proposed approach (wd-TCoC) is computationally simple, learns a separate feature-weight vector (relevance vector) for each prototype, and estimates the data density distribution on the map to produce an automatic clustering. The feature weights are computed for...
Clustering is an important technique widely used in many areas such as machine learning, pattern recognition, and data analysis. Data stream clustering, in which data objects are processed as an ordered sequence, is a branch of clustering that has drawn much attention in recent years. In this paper, we propose an unsupervised learning neural network named Density Based Self Organizing Incremental Neural Network (DenSOINN)...
Growing volumes of text and increasing expectations on the complexity of analysis entail advanced approaches to text mining. Unsupervised text clustering is an efficient approach to determine structural groupings in a text corpus without the impact of external bias. The information content of such structural groupings needs to be enhanced by integrating semantics into the cluster outcomes. This integration...
Multiple-kernel k-means (MKKM) clustering has demonstrated good clustering performance by combining pre-specified kernels. In this paper, we argue that deep relationships within data and the complementary information among them can improve the performance of MKKM. To illustrate this idea, we propose a diversity-induced MKKM algorithm with an extreme learning machine (ELM)-based feature extraction method...
The importance of the q-Gaussian distributions is attributed to their power law nature and the fact that they generalize the Gaussian distributions (q → 1 retrieves the Gaussian distributions). While for q > 1, a q-Gaussian distribution is nothing but a Student's t-distribution, which is a long tailed distribution, for q < 1 it is a distribution with a compact support. Though mixture modeling...
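The three regimes mentioned in this abstract can be checked numerically with the standard q-exponential. The sketch below evaluates an unnormalized q-Gaussian; the function names and the scale parameter `beta` are assumptions for illustration and are not taken from the paper.

```python
import math

def q_exp(x, q):
    """q-exponential: [1 + (1 - q) x]_+ ** (1 / (1 - q)); exp(x) when q == 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    # The [.]_+ cut-off yields zero outside the support (relevant for q < 1).
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0

def q_gaussian_unnormalized(x, q, beta=1.0):
    """Unnormalized q-Gaussian density shape, e_q(-beta * x^2)."""
    return q_exp(-beta * x * x, q)

# q close to 1: approaches the Gaussian shape exp(-x^2).
near_gaussian = q_gaussian_unnormalized(1.0, 1.0001)
# q = 2: heavy-tailed shape 1 / (1 + x^2).
heavy_tail = q_gaussian_unnormalized(1.0, 2.0)
# q = 0.5: compact support, zero once beta * (1 - q) * x^2 >= 1.
outside_support = q_gaussian_unnormalized(2.0, 0.5)
```

With `beta = 1`, the `q = 2` case reduces to the Cauchy shape, which is the Student's t-distribution with one degree of freedom.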
Collaborative filtering provides recommendations based on the behavior of each user combined with the behavior of users with similar interests. Recommender systems are becoming widespread, helping people choose movies, books, and things to buy. In this study, we examine the use of Biclustering ARTMAP to build a collaborative filtering recommendation system. We introduce a novel modification to how the...