Feature selection, as a fundamental component of building robust models, plays an important role in many machine learning and data mining tasks. Since acquiring labeled data is particularly expensive in both time and effort, unsupervised feature selection on unlabeled data has recently gained considerable attention. Without label information, unsupervised feature selection needs alternative criteria...
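One of the simplest examples of such an "alternative criterion" is to rank features by their variance, which needs no labels. The sketch below is a generic baseline and not the method of this paper. The function name `select_by_variance` and the parameter `k` are hypothetical.

```python
from statistics import pvariance


def select_by_variance(rows: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k features (columns) with the largest variance.

    Variance is computed without any label information, which makes it one of
    the simplest unsupervised feature-selection criteria.
    """
    n_features = len(rows[0])
    # Population variance of each column.
    scores = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    # Sort feature indices by descending variance and keep the top k.
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


data = [
    [1.0, 10.0, 5.0],
    [1.0, 20.0, 5.1],
    [1.0, 30.0, 4.9],
]
print(select_by_variance(data, 2))  # [1, 2]
```

A constant column (feature 0 here) scores zero and is dropped first. Real criteria such as those the abstract alludes to usually also account for redundancy or local structure, which plain variance ignores.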
This paper considers the ship route extraction and clustering problem based on Automatic Identification System (AIS) data. For ships with a known Maritime Mobile Service Identity (MMSI), we propose a ship route extraction method that uses AIS data. For ship route clustering, a hierarchical clustering method is selected. We first define a distance between ship routes to measure their dissimilarity...
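The abstract does not give the route distance it defines, so the sketch below swaps in the symmetric Hausdorff distance and single-linkage agglomerative clustering purely as an illustration of "distance between routes, then hierarchical clustering". Routes are assumed to be lists of planar (x, y) points; real AIS positions are latitude/longitude and would need projecting first. The names `hausdorff`, `single_linkage` and `threshold` are hypothetical.

```python
import math

Point = tuple[float, float]
Route = list[Point]


def hausdorff(a: Route, b: Route) -> float:
    """Symmetric Hausdorff distance between two routes (planar coordinates assumed)."""
    def directed(p: Route, q: Route) -> float:
        # Largest distance from a point of p to its nearest point in q.
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(a, b), directed(b, a))


def single_linkage(routes: list[Route], threshold: float) -> list[list[int]]:
    """Agglomerative single-linkage clustering of route indices.

    The two closest clusters are merged repeatedly until the closest pair
    is farther apart than `threshold`.
    """
    d = [[hausdorff(r, s) for s in routes] for r in routes]
    clusters = [[i] for i in range(len(routes))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                link = min(d[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or link < best[0]:
                    best = (link, i, j)
        link, i, j = best
        if link > threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


routes = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)],  # nearly the same lane as route 0
    [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)],  # a separate lane
]
print(single_linkage(routes, threshold=1.0))  # [[0, 1], [2]]
```

Cutting the hierarchy at a different `threshold` yields coarser or finer route groups; with `threshold=10.0` all three routes merge.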
Consider the problem of estimating an unknown high-dimensional density whose support lies on an unknown low-dimensional data manifold. This problem arises in many data mining tasks, and the paper proposes a new geometrically motivated solution within the manifold learning framework, including an estimation of the unknown support of the density. First, the tangent bundle manifold learning problem is...
The bag-of-words (BOW) model represents a corpus as a matrix whose elements are word frequencies. However, each row of the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular way to address sparsity and high dimensionality. Among the different strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular strategy to map all words on...
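To make the BOW matrix concrete, here is a minimal sketch that builds it from raw strings. Lower-casing and whitespace tokenization are simplifying assumptions; the function name `bag_of_words` is hypothetical.

```python
from collections import Counter


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a document-term frequency matrix.

    Rows are documents, columns are vocabulary words, and each entry counts
    how often the word occurs in that document.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[w] for w in vocab])
    return vocab, matrix


docs = ["the cat sat", "the dog sat on the mat"]
vocab, m = bag_of_words(docs)
print(vocab)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(m)      # [[1, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2]]
```

Every new word adds a column to every row, so on a realistic corpus each row is dominated by zeros, which is the sparsity problem DR methods target.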
Post Traumatic Stress Disorder (PTSD) is a public health problem afflicting millions of people each year. It is especially prominent among military veterans. Understanding the language, attitudes, and topics associated with PTSD presents an important and challenging problem. Based on their expertise, mental health professionals have constructed a formal definition of PTSD. However, even the most assiduous...
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts, choosing a "good" value for it can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may...
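The role of mpts can be seen in the two quantities HDBSCAN* builds its hierarchy from: the core distance (distance to the mpts-th nearest neighbour) and the mutual reachability distance. The sketch below computes both for a tiny 1-D dataset. Counting the point itself as its first neighbour is an assumption about convention; some implementations exclude it. Function names are hypothetical.

```python
import math


def core_distances(points: list[tuple[float, ...]], mpts: int) -> list[float]:
    """Distance from each point to its mpts-th nearest neighbour (the point itself counts)."""
    result = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)  # dists[0] == 0.0 (the point itself)
        result.append(dists[mpts - 1])
    return result


def mutual_reachability(points: list[tuple[float, ...]], mpts: int) -> list[list[float]]:
    """d_mreach(a, b) = max(core(a), core(b), d(a, b)); the diagonal is set to 0."""
    core = core_distances(points, mpts)
    n = len(points)
    return [
        [0.0 if i == j else max(core[i], core[j], math.dist(points[i], points[j]))
         for j in range(n)]
        for i in range(n)
    ]


pts = [(0.0,), (1.0,), (2.0,), (10.0,)]
print(core_distances(pts, 2))  # [1.0, 1.0, 1.0, 8.0]
print(core_distances(pts, 3))  # [2.0, 1.0, 2.0, 9.0]
```

Raising mpts inflates core distances, which pushes sparse points further from everyone else and smooths the resulting hierarchy; this is why no single mpts value suits every data distribution.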
There has been a recent surge of research interest in learning feature representations of networks. Motivated by the recent successes of embeddings in natural language processing and by advances in deep learning, researchers have explored various means of network embedding. Network embedding is useful because it makes it possible to apply off-the-shelf machine learning algorithms to network mining tasks like...
Clustering is an important branch of data mining as well as statistical analysis and is widely used in exploratory analysis. Many algorithms exist for clustering in Euclidean space. However, time series clustering introduces new problems, such as inadequate distance measures, inaccurate descriptions of cluster centers, and a lack of efficient and accurate clustering techniques. When dealing with...
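As an illustration of why Euclidean distance can be inadequate for time series, the sketch below implements dynamic time warping (DTW), a widely used alternative that aligns series before comparing them. The abstract does not say which distance the paper adopts, so DTW here is a stand-in, and the absolute-difference local cost is an assumption.

```python
def dtw(a: list[float], b: list[float]) -> float:
    """Dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = DTW distance between the prefixes a[:i] and b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


x = [0, 1, 2, 1, 0]
y = [0, 0, 1, 2, 1, 0]  # same shape, delayed by one step and one sample longer
print(dtw(x, y))  # 0.0
```

The two series have different lengths, so a point-by-point Euclidean distance is not even defined, yet DTW recognizes them as the same shape.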
Network clustering is an essential approach to finding latent clusters in real-world networks. As real-world networks grow ever larger, existing network clustering algorithms fail to discover meaningful clusters efficiently. In this paper, we propose a framework called AnySCAN, which applies anytime theory to the structural clustering algorithm for networks (SCAN). Moreover,...
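SCAN groups vertices by structural similarity, computed from closed neighbourhoods (a vertex's neighbours plus itself). The sketch below shows that similarity and the core-vertex test with parameters `eps` and `mu`; it illustrates plain SCAN only, not the anytime scheduling AnySCAN adds. The adjacency-set representation and function names are assumptions.

```python
import math


def neighbourhood(adj: dict[int, set[int]], v: int) -> set[int]:
    """Closed neighbourhood of v: its neighbours plus v itself."""
    return adj[v] | {v}


def structural_similarity(adj: dict[int, set[int]], v: int, w: int) -> float:
    """|G(v) & G(w)| / sqrt(|G(v)| * |G(w)|) over closed neighbourhoods."""
    gv, gw = neighbourhood(adj, v), neighbourhood(adj, w)
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))


def is_core(adj: dict[int, set[int]], v: int, eps: float, mu: int) -> bool:
    """v is a core vertex if at least mu members of its closed neighbourhood are eps-similar to it."""
    similar = [w for w in neighbourhood(adj, v) if structural_similarity(adj, v, w) >= eps]
    return len(similar) >= mu


# A triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(structural_similarity(adj, 0, 1))   # 1.0
print(is_core(adj, 0, eps=0.8, mu=3))     # True
print(is_core(adj, 3, eps=0.8, mu=2))     # False
```

Computing this similarity for every edge dominates SCAN's cost on large networks, which is the bottleneck the abstract's anytime approach aims to relieve.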
Time series motifs are approximately repeating patterns in real-valued time series data. They are useful for exploratory data mining and are often used as inputs for various time series clustering, classification, segmentation, rule discovery, and visualization algorithms. Since the introduction of the first motif discovery algorithm for univariate time series in 2002, multiple efforts have been made...
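For readers new to motifs, the sketch below finds the top-1 motif of a univariate series by brute force: the closest pair of z-normalized subsequences of length `m` that do not overlap. It is far slower than practical motif discovery algorithms and is not the multivariate method this paper develops; the exclusion rule `|i - j| < m` is one simple choice among several.

```python
import math


def znorm(s: list[float]) -> list[float]:
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    mu = sum(s) / len(s)
    sd = math.sqrt(sum((x - mu) ** 2 for x in s) / len(s))
    if sd == 0:
        return [0.0] * len(s)
    return [(x - mu) / sd for x in s]


def top_motif(ts: list[float], m: int) -> tuple[int, int, float]:
    """Brute-force top-1 motif: start indices (i, j) and their z-normalized Euclidean distance.

    Pairs with |i - j| < m overlap and are skipped, so a subsequence cannot
    match a shifted copy of itself. Returns (-1, -1, inf) if no valid pair exists.
    """
    subs = [znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)]
    best = (-1, -1, math.inf)
    for i in range(len(subs)):
        for j in range(i + m, len(subs)):
            d = math.dist(subs[i], subs[j])
            if d < best[2]:
                best = (i, j, d)
    return best


ts = [0, 1, 3, 1, 0, 5, 5, 5, 0, 1, 3, 1, 0]
print(top_motif(ts, 5))  # (0, 8, 0.0)
```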
IoT systems deployed in industrial and smart factory settings generate large volumes of data at high velocity. Context awareness is mandatory for knowledge discovery and actionable insights from such high-velocity, high-volume IoT data streams. Changes to the context of a data stream are reflected in the underlying data distribution. Research in concept drift aims to detect and adapt to such changes...
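A classic way to detect a change in the data distribution is the Page-Hinkley test. The sketch below is a simple Page-Hinkley-style detector for an increase in the mean of a stream; it is a generic illustration, not the drift method of this paper, and the default `delta` and `lam` values are arbitrary assumptions.

```python
class PageHinkley:
    """Page-Hinkley-style detector for an upward shift in the mean of a stream.

    delta tolerates small fluctuations; lam is the alarm threshold.
    """

    def __init__(self, delta: float = 0.005, lam: float = 50.0):
        self.delta = delta
        self.lam = lam
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation m_t
        self.min_cum = 0.0  # running minimum of m_t

    def update(self, x: float) -> bool:
        """Add one observation; return True while a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n      # incremental mean
        self.cum += x - self.mean - self.delta     # accumulate deviation from the mean
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.lam


det = PageHinkley(delta=0.005, lam=50.0)
stream = [0.0] * 100 + [5.0] * 100  # the mean jumps at index 100
alarms = [i for i, x in enumerate(stream) if det.update(x)]
print(alarms[0])  # 110
```

The detector fires about ten samples after the change; a larger `lam` trades later detection for fewer false alarms. A production detector would also reset its state after an alarm.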
Clustering is an important tool for analyzing gene expression data, and many clustering algorithms have been proposed for this purpose. In this article, we cluster real-life gene expression data with K-means, one of the standard clustering algorithms. We also propose a new method for determining the initial cluster centers for K-means. We compare the results of our method...
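The excerpt does not describe the paper's initialization method, so the sketch below uses the well-known k-means++ seeding as a stand-in to show where an initialization rule plugs into K-means. Function names and the `seed` parameter are hypothetical. `random.choices` raises an error if all weights are zero, so the sketch assumes at least `k` distinct points.

```python
import math
import random

Point = tuple[float, ...]


def kmeans_pp_init(points: list[Point], k: int, rng: random.Random) -> list[Point]:
    """k-means++ seeding: each new center is sampled with probability proportional
    to its squared distance from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points: list[Point], k: int, iters: int = 100, seed: int = 0) -> list[Point]:
    """Lloyd's K-means with k-means++ initialization; returns the final centers."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[idx].append(p)
        # Update step: move each center to its cluster mean (keep it if the cluster is empty).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(kmeans(pts, 2)))
```

Because Lloyd's iterations only find a local optimum, the choice of initial centers can change the final clustering, which is the motivation for proposing better initialization rules.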
The Unified Parkinson's Disease Rating Scale (UPDRS) is the most widely used scale for tracking the progression of Parkinson's disease (PD) symptoms. However, the conventional way of obtaining UPDRS scores, based mainly on physical examinations of patients in the clinic performed by trained medical staff, is inconvenient and medically expensive. Hence, in this study, we try to explore...
The density peak (DP) clustering algorithm is a recently proposed clustering approach that has shown great potential. It is based on the simple assumption that cluster centers have a high local density and are relatively far from each other. This observation is used to isolate cluster centers from the rest of the data. By making use of the density relationship among neighboring...
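The assumption above translates into two numbers per point: a local density `rho` and a separation `delta` (distance to the nearest point of higher density). The sketch below computes both with a simple cutoff kernel; the cutoff `dc`, the strict tie handling and the `rho * delta` ranking used to pick centers are assumptions of this sketch, not details taken from the paper.

```python
import math


def density_peaks(points: list[tuple[float, ...]], dc: float) -> tuple[list[int], list[float]]:
    """Return (rho, delta) for each point.

    rho[i]   = number of other points closer than the cutoff distance dc
    delta[i] = distance to the nearest point of strictly higher density, or the
               largest distance to any point when no denser point exists
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta


pts = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0),                          # small group around (0, 0)
       (5.0, 5.0), (5.5, 5.0), (4.5, 5.0), (5.0, 5.5), (5.0, 4.5)]   # group around (5, 5)
rho, delta = density_peaks(pts, dc=1.0)
gamma = [r * dl for r, dl in zip(rho, delta)]
centers = sorted(range(len(pts)), key=lambda i: gamma[i], reverse=True)[:2]
print(rho)              # [2, 1, 1, 4, 3, 3, 3, 3]
print(sorted(centers))  # [0, 3]
```

Only the two group centers combine high density with large separation; every other point has a denser neighbour 0.5 away.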
Association rule mining is an essential data mining technique in many fields. The enormous growth of information demands increased computational power. To address this issue, it is important to study how mining algorithms are executed. Finding the frequent itemsets is a vital problem in numerous data mining applications. There are many algorithms to extract...
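For reference, here is a textbook Apriori sketch for finding frequent itemsets; it is not the implementation studied in this paper. Transactions are modelled as Python sets and `min_support` is an absolute count, both assumptions of the sketch.

```python
from itertools import combinations


def apriori(transactions: list[set[str]], min_support: int) -> dict[frozenset, int]:
    """Return every itemset contained in at least min_support transactions, with its count."""
    # Level 1: count single items.
    counts: dict[frozenset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        prev = list(frequent)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per surviving candidate.
        frequent = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                frequent[c] = support
        result.update(frequent)
        k += 1
    return result


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for itemset, support in apriori(baskets, min_support=2).items():
    print(sorted(itemset), support)
```

The repeated passes over the data in the count step are what make frequent-itemset mining expensive at scale, and they are the usual target of faster implementations.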
Content-Centric Networking (CCN) proposals rethink the communication model around named data. In-network caching is a fundamental feature that distinguishes CCN from the current host-centric IP network. In this paper, we propose a hybrid caching scheme that combines on-path and off-path caching. We leverage the ISOMAP manifold learning algorithm to distinguish the importance of nodes...
The random Fourier features method has been found to be very effective at approximating kernel functions. Our previous studies show that, through a mixing mechanism of the feature space formed by random Fourier features and certain linear algorithms, the fuzzy clustering results in the approximated feature space are comparable to, or even exceed, those of the classical kernel-based algorithms. To increase the robustness...
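The core idea of random Fourier features is an explicit map z(x) whose inner products approximate a shift-invariant kernel. The sketch below shows the standard construction for the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2); the mixing mechanism and fuzzy clustering from the abstract are not shown, and `n_features`, `gamma` and `seed` values are arbitrary choices.

```python
import math
import random


def rff_map(dim: int, n_features: int, gamma: float, seed: int = 0):
    """Build a random Fourier feature map z with z(x) . z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # Frequencies drawn from the kernel's spectral density N(0, 2 * gamma * I).
    w = [[rng.gauss(0.0, math.sqrt(2.0 * gamma)) for _ in range(dim)] for _ in range(n_features)]
    # Random phases drawn uniformly from [0, 2*pi].
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def z(x: list[float]) -> list[float]:
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(row, x)) + bi)
                for row, bi in zip(w, b)]

    return z


def rbf(x: list[float], y: list[float], gamma: float) -> float:
    """Exact Gaussian kernel, for comparison."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, y)))


z = rff_map(dim=2, n_features=5000, gamma=0.5)
x, y = [0.0, 0.0], [1.0, 0.5]
approx = sum(a * c for a, c in zip(z(x), z(y)))
print(round(rbf(x, y, 0.5), 3), round(approx, 3))
```

Once data are mapped through z, any linear algorithm run on the features behaves approximately like its kernelized counterpart, with error shrinking roughly as 1/sqrt(n_features).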
How to reduce computation time and how to improve the quality of clustering results are two major research issues. Although several efficient and effective clustering algorithms have been presented, none of them is perfect. Therefore, an effective clustering algorithm, which is based on the prediction of searching information to determine the search directions at later iterations and employs...
Social networks are no longer just a place to spend leisure time and chat with friends. They are also a business instrument for working with audiences to increase brand recognition, improve overall marketing results and boost sales. For these purposes, it is necessary to analyze the target audience thoroughly, scan dozens of user profiles, reveal users' interests and positions, and estimate their LTV...
Spectral clustering is one of the most effective data mining methods; in it, the adjacency matrix is constructed from a similarity matrix. In this paper, to extend spectral clustering to uncertain data, we propose a new spectral clustering method based on the JS-divergence. In the proposed method, the JS-divergence is used to construct the adjacency matrix in the spectral...
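The sketch below shows the Jensen-Shannon (JS) divergence between discrete distributions and one possible way to turn it into adjacency weights. Representing each uncertain object as a discrete distribution over shared bins, and the weight `exp(-JS / sigma)`, are assumptions of this sketch; the excerpt does not state the paper's exact construction. `js_adjacency` and `sigma` are hypothetical names.

```python
import math


def kl(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence: average KL divergence to the midpoint m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_adjacency(dists: list[list[float]], sigma: float = 1.0) -> list[list[float]]:
    """Hypothetical adjacency: W[i][j] = exp(-JS(P_i, P_j) / sigma), zero on the diagonal."""
    n = len(dists)
    return [[0.0 if i == j else math.exp(-js(dists[i], dists[j]) / sigma) for j in range(n)]
            for i in range(n)]


objects = [
    [0.7, 0.2, 0.1],  # uncertain object 0
    [0.6, 0.3, 0.1],  # similar to object 0
    [0.1, 0.2, 0.7],  # very different
]
W = js_adjacency(objects)
print(W[0][1] > W[0][2])  # True
```

Unlike KL divergence, JS divergence is symmetric and bounded (by ln 2 in nats), which keeps the resulting adjacency matrix symmetric with weights in a fixed range, as spectral clustering requires.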