Clustering is an important branch of data mining and statistical analysis and is widely used in exploratory analysis. Many algorithms exist for clustering in Euclidean space. However, time series clustering introduces new problems, such as inadequate distance measures, inaccurate cluster center descriptions, and a lack of efficient and accurate clustering techniques. When dealing with...
We present an accelerated algorithm for hierarchical density based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while supporting variable density clusters, and eliminating the need for the difficult to tune distance scale...
There has been a surge in research interest in learning feature representation of networks in recent times. Researchers, motivated by the recent successes of embeddings in natural language processing and advances in deep learning, have explored various means for network embedding. Network embedding is useful as it can exploit off-the-shelf machine learning algorithms for network mining tasks like...
The interests of individual Internet users fall into a hierarchical structure, which is useful for building personalized searches and recommendations. Most studies on this subject construct the interest hierarchy of a single person from the document perspective. In this study, we constructed the user interest hierarchy via user profiles. We organized 433,397 user interests, referred to here...
Post Traumatic Stress Disorder (PTSD) is a public health problem afflicting millions of people each year. It is especially prominent among military veterans. Understanding the language, attitudes, and topics associated with PTSD presents an important and challenging problem. Based on their expertise, mental health professionals have constructed a formal definition of PTSD. However, even the most assiduous...
The bag of words (BOW) model represents a corpus as a matrix whose elements are word frequencies. However, each row in the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular method to address sparsity and high-dimensionality issues. Among the different strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular strategy to map all words on...
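As a rough illustration of the bag-of-words matrix mentioned in the abstract above, the following minimal Python sketch builds a document-by-word frequency matrix. The function name `bag_of_words`, the whitespace tokenizer, and the two sample sentences are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import Counter


def bag_of_words(docs):
    """Build a document-by-word frequency matrix (hypothetical helper).

    Assumes simple lowercase whitespace tokenization; real pipelines
    usually apply more careful tokenization and filtering.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary is the sorted set of all distinct words in the corpus.
    vocab = sorted({word for tokens in tokenized for word in tokens})
    column = {word: j for j, word in enumerate(vocab)}

    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for word, count in Counter(tokens).items():
            row[column[word]] = count
        matrix.append(row)
    return vocab, matrix


# Illustrative corpus (assumed, not from the source).
docs = [
    "data mining finds patterns in data",
    "clustering groups similar data",
]
vocab, matrix = bag_of_words(docs)

# Fraction of zero entries, which shows why rows are called sparse.
zeros = sum(value == 0 for row in matrix for value in row)
sparsity = zeros / (len(matrix) * len(vocab))
```

Even in this tiny corpus, more than a third of the entries are zero; with a realistic vocabulary the rows become far longer and sparser, which is the issue dimension reduction is meant to address.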
Consider the problem of estimating an unknown high-dimensional density whose support lies on an unknown low-dimensional data manifold. This problem arises in many data mining tasks, and the paper proposes a new geometrically motivated solution to it in the manifold learning framework, including an estimation of the unknown support of the density. First, the tangent bundle manifold learning problem is...
The Krylov subspace based information retrieval (IR) approach has been shown to provide comparable accuracy to latent semantic indexing (LSI), while providing some computational advantages. Recently, in the area of numerical linear algebra, attention has been drawn to the block Krylov subspace methods, which are shown to be more efficient than the classic Krylov subspace methods in solving linear...
Due to advances in wireless sensor networks, radio-frequency identification (RFID) and Web-based services, a large number of devices have been connected to the Internet of Things (IoT). In addition, the tremendous number of IoT services offered by service providers creates an urgent need for effective recommendation methods that discover suitable services for users. In this paper, we propose...
Feature selection, as a fundamental component of building robust models, plays an important role in many machine learning and data mining tasks. Since acquiring labeled data is particularly expensive in both time and effort, unsupervised feature selection on unlabeled data has recently gained considerable attention. Without label information, unsupervised feature selection needs alternative criteria...
Sparse subspace clustering (SSC) is an effective approach to clustering high-dimensional data. However, adaptively selecting the number of clusters/eigenvectors for different data sets, especially when the data are corrupted by noise, is a major challenge in SSC and also an open problem in the field of data mining. In this paper, considering the fact that the eigenvectors are robust to noise, we develop...
The data anonymization landscape has become quite complex in the last decades. On the methodology side, the statistical disclosure control methods designed in official statistics have been supplemented by a number of privacy models proposed by computer scientists. On the data side, static data sets now coexist with big data, and particularly data streams. In the quest for a unified and conceptually...
This paper presents a detailed evaluation of anomaly detection on operational time-series data from Internet of Things (IoT) based household devices in general and Heating, Ventilation and Air Conditioning (HVAC) systems in particular. Due to the number of issues observed during the evaluation of widely used distance-based, statistical-based, and cluster-based anomaly detection techniques, we also present a pattern-based...