CANDECOMP/PARAFAC Decomposition (CPD) is one of the most popular tensor decomposition methods that has been extensively studied and widely applied. In recent years, sparse tensors that contain a huge portion of zeros but a limited number of non-zeros have attracted increasing interest. Existing techniques are not directly applicable to sparse tensors, since they mainly target dense ones and usually...
In machine learning, data augmentation is the process of creating synthetic examples in order to augment a dataset used to learn a model. One motivation for data augmentation is to reduce the variance of a classifier, thereby reducing error. In this paper, we propose new data augmentation techniques specifically designed for time series classification, where the space in which they are embedded is...
Nowadays, a hot challenge for supermarket chains is to offer personalized services to their customers. Market basket prediction, i.e., supplying the customer with a shopping list for the next purchase according to her current needs, is one of these services. Current approaches are not capable of capturing at the same time the different factors influencing the customer's decision process: co-occurrence,...
Secondary use of biomedical data has gained much attention recently to facilitate rapid knowledge discovery in biomedicine. Association Rule Mining (ARM) has been a popular technique for biomedical researchers to perform exploratory data analysis and discover potential relationships among variables in biomedical datasets. However, ARM of a high-dimensional biomedical dataset may produce a large number...
We deal with online learning of acyclic Conditional Preference networks (CP-nets) from data streams, possibly corrupted with noise. We introduce a new, efficient algorithm relying on (i) information-theoretic measures defined over the induced preference rules, which allow us to deal with corrupted data in a principled way, and on (ii) the Hoeffding bound to define an asymptotically optimal decision...
Linear Discriminant Analysis (LDA) is widely used for supervised dimension reduction and linear classification. Classical LDA, however, suffers from the ill-posed estimation problem on data with high dimension and low sample size (HDLSS). To cope with this problem, in this paper, we propose an Adaptive Wishart Discriminant Analysis (AWDA) for classification that makes predictions in an ensemble way...
Electronic medical record (EMR) systems have become increasingly important in developed countries due to their convenience and efficiency in medical information storage, management and analysis. However, one of the main limitations of EMR is that the clinical data for patients cannot be exchanged among different medical institutions. Recently, the cloud-based clinic system, featuring in lower...
Scene text extraction is a challenging task owing to disturbing factors such as complex image backgrounds and various text behaviors (sizes, colors, styles and alignments). This paper proposes a scene text extraction approach based on the novel concept of ‘symmetrical edge-point pairs’ (‘point-pair’), which is adopted to describe the sizes, directions and brightness information of...
Banks and financial institutions around the world must comply with several policies for the prevention of money laundering and to combat the financing of terrorism. Nowadays, there is a rise in the popularity of novel financial technologies such as digital currencies, social trading platforms and distributed ledger payments, but there is a lack of approaches to enforce the aforementioned...
IP addresses are a central part of packet- and flow-based network data. However, visualization and similarity computation of IP addresses are challenging due to the missing natural order. This paper presents a novel similarity measure, IP2Vec, for IP addresses that builds on ideas from Word2Vec, a popular approach in text mining. The key idea is to learn similarities by extracting available context...
Data-driven analytics and decision-making have been essential for numerous applications in our society. To transform the data into a source of rich intelligence and support decision-making, data-driven analytics often need to aggregate intelligence from multiple sources and disaggregate signals into significant constituents. Though many existing approaches perform these two tasks separately, there...
Information networks such as social networks, publication networks, and the World Wide Web are ubiquitous in the real world. Traditionally, adjacency matrices are used to represent the networks. However, adjacency matrices are too sparse and too high dimensional when the scale of the networks is large. Network embedding, which aims to learn low-dimensional continuous representations for nodes, has...
Networks naturally capture a host of real-world interactions, from social interactions and email communication to brain activity. However, graphs are not always directly observed, especially in scientific domains, such as neuroscience, where monitored brain activity is often captured as time series. How can we efficiently infer networks from time series data (e.g., model the functional organization...
Graph data management and mining in HPC environments has been a widely discussed issue in recent times. In this talk I will describe the use of Partitioned Global Address Space languages for graph data mining and management. I will first discuss the rationale behind X10 based graph libraries and graph database benchmarks using ScaleGraph and XGDBench as examples. Next, I will take Acacia which is...
Caregiving is the act of providing assistance to an individual unable to perform some daily living activities. Caregiving can be either paid or unpaid. An informal caregiver is an unpaid caregiver to an older, sick, or disabled family member or friend on a daily basis. Informal caregiving is associated with increased physical, mental, and emotional stressors contributing to poor health outcomes, caregiver...
Domain generation algorithms (DGAs) automatically generate large numbers of domain names in DNS domain fluxing for the purpose of command-and-control (C&C) communication. DGAs are immune to static prevention methods like blacklisting and sinkholing. Detection of DGAs in a live stream of queries in a DNS server is referred to as inline detection. Most of the previous approaches in the literature...
In this article we address the problem of expanding the set of papers that researchers encounter when conducting bibliographic research on their scientific work. Using classical search engines or recommender systems in digital libraries, some interesting and relevant articles could be missed if they do not contain the same search key-phrases that the researcher is aware of. We propose a novel model...
This paper presents a novel approach for activity recognition from accelerometer data. Existing approaches usually extract hand-crafted features that are used as input for classifiers. However, hand-crafted features are data dependent and cannot be generalized to different application domains. To overcome these limitations, our approach relies on matrix factorization for dimensionality reduction...
In this paper, we propose a workflow for processing and analysing large-scale tracking data with spatio-temporal marks that uses an infrastructure for machine learning methods based on a meta-data representation of point patterns. The tracking log (IP address) of cyber security devices usually maps to a geolocation and a timestamp; such data is called spatio-temporal data. Existing spatio-temporal analysis...
Finding the best candidates to match a set of job requirements can be viewed as both an art and a science. In this paper, we conduct an empirical study using actual job candidates and job applicants. We compare the ranked lists generated by executive recruiting experts with the lists generated by three search strategies: one using crowdworkers in a gamified environment, a second using information retrieval-based...