The widespread use of dietary supplements has drawn extensive attention because of safety and efficacy concerns. Clinical notes document a great deal of detailed information on dietary supplement usage, thus providing a rich source for clinical research on supplement safety surveillance. Identifying the use status of dietary supplements is one of the initial steps for the ultimate goal of...
Clustering is an important branch of data mining and statistical analysis and is widely used in exploratory analysis. Many algorithms exist for clustering in Euclidean space. However, time series clustering introduces new problems, such as inadequate distance measures, inaccurate cluster center descriptions, and a lack of efficient and accurate clustering techniques. When dealing with...
Network clustering is an essential approach to finding latent clusters in real-world networks. As real-world networks grow ever larger, existing network clustering algorithms fail to discover meaningful clusters efficiently. In this paper, we propose a framework called AnySCAN, which applies anytime theory to the structural clustering algorithm for networks (SCAN). Moreover,...
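For orientation, SCAN groups vertices according to a structural similarity computed over closed neighborhoods (a vertex together with its neighbors). The sketch below shows only that similarity measure as it is commonly defined for SCAN; it is not the AnySCAN framework, and the function name and the dictionary-of-sets graph layout are assumptions made for illustration.

```python
import math


def structural_similarity(adj, u, v):
    """Structural similarity of u and v as commonly defined for SCAN.

    `adj` is a hypothetical adjacency mapping: vertex -> set of neighbors.
    The closed neighborhood of a vertex includes the vertex itself.
    """
    closed_u = set(adj[u]) | {u}
    closed_v = set(adj[v]) | {v}
    shared = len(closed_u & closed_v)
    return shared / math.sqrt(len(closed_u) * len(closed_v))


# Toy undirected graph: a triangle (a, b, c) with a pendant vertex d on c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

sim_ab = structural_similarity(toy_graph, "a", "b")
sim_cd = structural_similarity(toy_graph, "c", "d")
```

In this toy graph, a and b have identical closed neighborhoods, so their similarity is at its maximum, while the pendant vertex d shares fewer neighbors with c and scores lower.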
Driven by the dramatic growth of data both in terms of the size and sources, learning from heterogeneous data is emerging as an important research direction for many real applications. One of the biggest challenges of this type of problem is how to meaningfully integrate heterogeneous data to considerably improve the generality and quality of the learning model. In this paper, we first present a unified...
CANDECOMP/PARAFAC Decomposition (CPD) is one of the most popular tensor decomposition methods that has been extensively studied and widely applied. In recent years, sparse tensors that contain a huge portion of zeros but a limited number of non-zeros have attracted increasing interest. Existing techniques are not directly applicable to sparse tensors, since they mainly target dense ones and usually...
In machine learning, data augmentation is the process of creating synthetic examples in order to augment a dataset used to learn a model. One motivation for data augmentation is to reduce the variance of a classifier, thereby reducing error. In this paper, we propose new data augmentation techniques specifically designed for time series classification, where the space in which they are embedded is...
Nowadays, a hot challenge for supermarket chains is to offer personalized services to their customers. Market basket prediction, i.e., supplying the customer a shopping list for the next purchase according to her current needs, is one of these services. Current approaches are not capable of capturing at the same time the different factors influencing the customer's decision process: co-occurrence,...
Secondary use of biomedical data has gained much attention recently to facilitate rapid knowledge discovery in biomedicine. Association Rule Mining (ARM) has been a popular technique for biomedical researchers to perform exploratory data analysis and discover potential relationships among variables in biomedical datasets. However, ARM of a high-dimensional biomedical dataset may produce a large number...
We deal with online learning of acyclic Conditional Preference networks (CP-nets) from data streams, possibly corrupted with noise. We introduce a new, efficient algorithm relying on (i) information-theoretic measures defined over the induced preference rules, which allow us to deal with corrupted data in a principled way, and on (ii) the Hoeffding bound to define an asymptotically optimal decision...
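As background, the Hoeffding bound in its usual stream-mining form states that, after n independent observations of a variable with range R, the true mean lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the sample mean with probability at least 1 - delta. The sketch below computes only this generic quantity; how the paper combines it with its information-theoretic measures is not stated in the excerpt, and the function and parameter names are assumptions.

```python
import math


def hoeffding_epsilon(value_range, delta, n):
    """Generic Hoeffding deviation bound (not the paper's decision rule).

    value_range: width R of the interval containing the observed variable
    delta: allowed failure probability, strictly between 0 and 1
    n: number of independent observations seen so far
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


# The bound tightens as more examples arrive from the stream.
eps_small_sample = hoeffding_epsilon(1.0, 0.05, 100)
eps_large_sample = hoeffding_epsilon(1.0, 0.05, 1000)
```

A streaming learner can use such a bound to defer a decision until the observed gap between two candidates exceeds epsilon.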
Linear Discriminant Analysis (LDA) is widely used for supervised dimension reduction and linear classification. Classical LDA, however, suffers from the ill-posed estimation problem on data with high dimension and low sample size (HDLSS). To cope with this problem, we propose Adaptive Wishart Discriminant Analysis (AWDA) for classification, which makes predictions in an ensemble way...
Electronic medical record (EMR) systems have become increasingly important in developed countries due to their convenience and efficiency in medical information storage, management and analysis. However, one of the main limitations of EMR systems is that patients' clinical data cannot be exchanged among different medical institutions. Recently, the cloud-based clinic system, featuring in lower...
Messages posted to social media in the aftermath of a natural disaster have value beyond detecting the event itself. Mining such deliberately dropped digital traces allows a precise situational awareness, to help provide a timely estimate of the disaster’s consequences on the population and infrastructures. Yet, to date, the automatic assessment of damage has received little attention. Here, the authors...
Scene text extraction is always a challenging task owing to its usual disturbing factors such as complex image backgrounds and various text behaviors (sizes, colors, styles and alignments). This paper proposes a scene text extraction approach based on the novel concept of ‘symmetrical edge-point pairs’ (‘point-pair’), which is adopted to describe the sizes, directions and brightness information of...
Banks and financial institutions around the world must comply with several policies for preventing money laundering and combating the financing of terrorism. Nowadays, novel financial technologies such as digital currencies, social trading platforms and distributed ledger payments are rising in popularity, but there is a lack of approaches to enforce the aforementioned...
IP addresses are a central part of packet- and flow-based network data. However, visualization and similarity computation of IP addresses are challenging due to the missing natural order. This paper presents IP2Vec, a novel similarity measure for IP addresses that builds on ideas from Word2Vec, a popular approach in text mining. The key idea is to learn similarities by extracting available context...
Data-driven analytics and decision-making have been essential for numerous applications in our society. To transform the data into a source of rich intelligence and support decision-making, data-driven analytics often need to aggregate intelligence from multiple sources and disaggregate signals into significant constituents. Though many existing approaches perform these two tasks respectively, there...
Information networks such as social networks, publication networks, and the World Wide Web are ubiquitous in the real world. Traditionally, adjacency matrices are used to represent the networks. However, adjacency matrices are too sparse and too high dimensional when the scale of the networks is large. Network embedding, which aims to learn low-dimensional continuous representations for nodes, has...
Networks naturally capture a host of real-world interactions, from social interactions and email communication to brain activity. However, graphs are not always directly observed, especially in scientific domains, such as neuroscience, where monitored brain activity is often captured as time series. How can we efficiently infer networks from time series data (e.g., model the functional organization...
Graph data management and mining in HPC environments has been a widely discussed issue in recent times. In this talk I will describe the use of Partitioned Global Address Space languages for graph data mining and management. I will first discuss the rationale behind X10 based graph libraries and graph database benchmarks using ScaleGraph and XGDBench as examples. Next, I will take Acacia which is...
Caregiving is the act of providing assistance to an individual unable to perform some daily living activities. Caregiving can be either paid or unpaid. An informal caregiver is an unpaid caregiver to an older, sick, or disabled family member or friend on a daily basis. Informal caregiving is associated with increased physical, mental, and emotional stressors contributing to poor health outcomes, caregiver...