Consider the problem of estimating an unknown high-dimensional density whose support lies on an unknown low-dimensional data manifold. This problem arises in many data mining tasks, and the paper proposes a new geometrically motivated solution within the manifold learning framework, including estimation of the unknown support of the density. First, the tangent bundle manifold learning problem is...
Data Jacket (DJ) is a technique for sharing information about data and for considering the potential value of datasets while the data itself stays hidden, by describing a summary of the data in natural language. In DJs, variables are described by variable labels (VLs), which are the names/meanings of variables. In the previous study, the matrix-based method for inferring VLs in DJs whose VLs are unknown,...
To collect and explicate meaningful knowledge of a community, we propose an Activity Model based on structured knowledge. The following issues arise in developing the model: (a) difficulty in capturing activities; (b) difficulty in acquiring knowledge; and (c) difficulty in optimizing the activities for newly adopted technologies. Therefore, we are developing technologies that use on-site...
We present a novel and configurable synthetic data generator for evolving region trajectories that emulates certain characteristics of a given input dataset, such as the spatial position, velocity, lifespan, and geometry shape and size. This tool aims to facilitate faster prototyping and evaluation of new spatiotemporal data mining algorithms that operate on a specific type of trajectory data, of...
In this paper, we propose a new discriminative dictionary learning framework, called robust Label Embedding Projective Dictionary Learning (LE-PDL), for data classification. LE-PDL can learn a discriminative dictionary and block-diagonal representations without using l0-norm or l1-norm sparsity regularization, since the l0-norm or l1-norm constraint on the coding coefficients used in the existing...
The bag of words (BOW) represents a corpus as a matrix whose elements are word frequencies. However, each row in the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular way to address the sparsity and high-dimensionality issues. Among the different strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular strategy to map all words on...
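As a rough illustration of the BOW matrix described above (not the method of the paper), the following minimal sketch builds a document-by-word frequency matrix. The function name `bag_of_words`, the whitespace tokenization, and the toy corpus are assumptions made only for this example.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts vocabulary[j] in docs[i].

    Hypothetical helper: tokenization is plain lowercase whitespace splitting.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # One column per distinct word, in sorted order for reproducibility.
    vocab = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for absent words, which produces the sparse rows.
        matrix.append([counts[word] for word in vocab])
    return vocab, matrix


# Toy corpus, invented for illustration only.
docs = [
    "data mining finds patterns",
    "mining data streams",
    "patterns in streams of data",
]
vocab, matrix = bag_of_words(docs)
```

Even on this toy corpus many entries of each row are zero; with a realistic vocabulary the rows become the very high-dimensional sparse vectors that dimension reduction is meant to address.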
Post Traumatic Stress Disorder (PTSD) is a public health problem afflicting millions of people each year. It is especially prominent among military veterans. Understanding the language, attitudes, and topics associated with PTSD presents an important and challenging problem. Based on their expertise, mental health professionals have constructed a formal definition of PTSD. However, even the most assiduous...
Opioid (e.g., heroin and morphine) addiction has become one of the largest and deadliest epidemics in the United States. To combat such a deadly epidemic, there is an urgent need for novel tools and methodologies to gain new insights into the behavioral processes of opioid addiction and treatment. In this paper, we design and develop an intelligent system named iOPU to automate the detection of opioid...
In recent years, predicting future hot events in online social networks has become increasingly meaningful in marketing, advertisement, and recommendation systems to support companies' strategy making. Currently, most prediction models require long-term observation of the event or depend heavily on other features that are expensive to extract. However, at the early stage of an event, the temporal...
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts, choosing a "good" value for it can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may...
We study bribery resistance properties in two classes of reputation-based ranking systems, where the rankings are computed by weighting the ratings given by users with their reputations. In the first class, the rankings are the result of the aggregation of all the ratings, and all users are provided with the same ranking for each item. In the second class, there is a first step that clusters users by...
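To make the idea of weighting ratings by reputation concrete, here is a minimal sketch that assumes the simplest possible aggregation, a reputation-weighted average. The abstract does not specify the aggregation rule, so the function `reputation_weighted_score` and the sample users and values are purely hypothetical.

```python
def reputation_weighted_score(ratings, reputations):
    """Aggregate one item's ratings, weighting each user by reputation.

    Assumption for illustration: a plain weighted average; the systems
    studied in the paper may aggregate differently.
    """
    total_weight = sum(reputations[user] for user in ratings)
    if total_weight == 0:
        raise ValueError("at least one rater must have non-zero reputation")
    weighted_sum = sum(reputations[user] * rating for user, rating in ratings.items())
    return weighted_sum / total_weight


# Invented example: a high-reputation user outweighs a low-reputation one.
ratings = {"alice": 5, "bob": 1}
reputations = {"alice": 3.0, "bob": 1.0}
score = reputation_weighted_score(ratings, reputations)  # (3*5 + 1*1) / 4
```

Under this assumed rule, bribing a user shifts the score in proportion to that user's reputation, which is the kind of effect a bribery resistance analysis would need to bound.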
Time series classification has attracted much attention due to the ubiquity of time series. With the advance of technologies, the volume of available time series data becomes huge and the content is changing rapidly. This requires time series data mining methods to have low computational complexities. In this paper, we propose a parameter-free time series classification method that has a linear time...
Media analysis can reveal interesting patterns in the way newspapers report the news and how these patterns evolve over time. One example pattern is the quoting choices that media make, which could be used as bias indicators. Media slant can be expressed both through the choice of whether to report an event, e.g. a person’s statement, and through the words used to describe the event. Thus, automatic discovery...
Product bundling is widely adopted for information goods and online services because it can increase profit for companies. For example, cable companies often bundle Internet access and video streaming services together. However, it is challenging to obtain an optimal bundling strategy, not only because it is computationally expensive, but also because customers’ private information (e.g., valuations...
In recent years, finding repetitive similar patterns in time series has become a popular problem. These patterns are called time series motifs. Recent studies show that using grammar compression algorithms to find repeating patterns from the symbolized time series holds promise in discovering approximate motifs with variable length. However, grammar compression algorithms are traditionally designed...
NB-UVB phototherapy is one of the most common treatments administered by dermatologists to psoriasis patients. Although the treatment generally improves the condition, it can also worsen it. If a model can predict the treatment response beforehand, dermatologists can adjust the treatment accordingly. In this paper, we use data mining techniques and conduct four experiments. The...
The rapidly increasing availability of healthcare data from multiple heterogeneous sources has spearheaded the adoption of data-driven approaches for improved clinical research, decision making, and patient management. The patient healthcare data are usually longitudinal and can be expressed as medical event sequences, where the events include clinical diagnosis, medications, laboratory reports, etc...
Granger causality is proposed to fuse stock prices and social media sentiment information for stock market prediction. Sentiment extraction is performed on the Twitter data from major stock companies. Analysis shows that authoritative users' sentiment affects other users after an event with a lag of 3 days. The prediction is performed for Twitter and stock data from four companies. The sentiment...
In recent years, the usage of unmanned aircraft systems (UAS) for security-related purposes has increased, ranging from military applications to different areas of civil protection. The deployment of UAS can support security forces in achieving an enhanced situational awareness. However, in order to provide useful input to a situational picture, sensor data provided by UAS has to be integrated with...
Background: Code smells are indicators of quality problems that make software hard to maintain and evolve. Given the importance of smells for source code maintainability, many studies have explored the characteristics of smells and analyzed their effects on software quality. Aim: We aim to investigate fundamental characteristics of code smells through an empirical study on frequently...