In this paper, we study a highly generic version of influence maximization (IM): optimizing influence campaigns by sequentially selecting "spread seeds" from a set of candidates (a small subset of the node population), under the hypothesis that, within a given campaign, previously activated nodes remain "persistently" active and thus yield no further rewards. We...
Supply chain management, which aims to deliver goods in the shortest time and at the lowest possible price while ensuring the best possible quality, is now vital to the success of online retail businesses. Effective warehouse site selection has been one of the key challenges in developing a successful supply chain system. While some effective strategies for warehouse site selection...
We study the efficient computation of Minimax distance measures, which capture the correct structures by taking transitive relations into account. We analyze two settings in detail: dense graphs and sparse graphs. In particular, we show that an adapted variant of Kruskal's algorithm is the most efficient approach for computing pairwise Minimax distances. However, for dense...
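For readers unfamiliar with the idea, the Minimax distance between two nodes is the smallest achievable value of the largest edge weight along any path connecting them. Kruskal's algorithm can produce all pairwise Minimax distances because of how it processes edges. It handles edges in ascending order of weight, and the first edge that joins the components containing i and j fixes their distance. What follows is a minimal sketch of that generic idea. It is not the adapted variant from the paper, whose details are cut off here. The function name and the `(weight, u, v)` edge format are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def minimax_distances(n: int, edges: List[Tuple[float, int, int]]) -> List[List[float]]:
    """Pairwise Minimax distances on an undirected weighted graph (illustrative sketch).

    Edges are (weight, u, v) tuples. Pairs in different connected
    components keep distance +inf.
    """
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    comp = list(range(n))  # comp[v] = representative of v's current component
    members: Dict[int, List[int]] = {i: [i] for i in range(n)}
    for w, u, v in sorted(edges):  # ascending weight, as in Kruskal's algorithm
        ru, rv = comp[u], comp[v]
        if ru == rv:
            continue  # edge inside one component: not an MST edge, skip it
        # Every pair split across the two components gets distance w, exactly once.
        for a in members[ru]:
            for b in members[rv]:
                dist[a][b] = dist[b][a] = w
        # Union by size: relabel the smaller component into the larger one.
        if len(members[ru]) < len(members[rv]):
            ru, rv = rv, ru
        for b in members[rv]:
            comp[b] = ru
        members[ru].extend(members.pop(rv))
    return dist
```

Consider the edges `[(1.0, 0, 1), (5.0, 1, 2), (2.0, 0, 2), (3.0, 2, 3)]`. The Minimax distance between nodes 1 and 2 is 2.0, reached via node 0, instead of the direct edge weight 5.0. Each cross-component pair is written exactly once, so the distance assignments cost O(N²) in total on top of sorting the edges.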
We propose EC3, a novel algorithm that merges classification and clustering to support both binary and multi-class classification. EC3 is based on a principled combination of multiple classification and multiple clustering methods using a convex optimization function. We additionally propose iEC3, a variant of EC3 that handles imbalanced training data. We perform an extensive experimental...
The rapid growth of Electronic Health Records (EHRs), together with the accompanying opportunities in Data-Driven Healthcare (DDH), has attracted widespread interest and attention. Recent progress in the design and application of deep learning methods has shown promising results and is driving massive changes in healthcare academia and industry, but most of these methods rely on massive labeled...
We consider the problem of causal structure learning from data with missing values, assumed to be drawn from a Gaussian copula model. First, we extend the 'Rank PC' algorithm, designed for Gaussian copula models with purely continuous data (so-called nonparanormal models), to incomplete data by applying rank correlation to pairwise complete observations and replacing the sample size with an effective...
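The first step described above can be sketched as follows. For each pair of variables, Spearman's rank correlation is computed on the rows where both are observed. The standard Gaussian-copula correction 2·sin(π·ρ_S/6) is then applied to estimate the latent Pearson correlation. This is a minimal sketch under stated assumptions: missing values are encoded as `None`, and the function names are hypothetical. The `counts` matrix holds only the number of pairwise-complete rows. It is *not* the effective sample size the abstract goes on to define, since that definition is cut off.

```python
import math
from typing import List, Optional, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return math.nan
    return cov / math.sqrt(var_a * var_b)


def latent_correlation(
    data: List[List[Optional[float]]],
) -> Tuple[List[List[float]], List[List[int]]]:
    """Gaussian-copula correlation estimate from rows that use None for missing values.

    sigma[j][k] = 2 * sin(pi / 6 * rho_S), where Spearman's rho_S is computed on
    the pairwise-complete rows for (j, k); counts[j][k] is the number of such rows.
    """
    n_vars = len(data[0])
    sigma = [[1.0 if j == k else math.nan for k in range(n_vars)] for j in range(n_vars)]
    counts = [[0] * n_vars for _ in range(n_vars)]
    for j in range(n_vars):
        for k in range(j, n_vars):
            rows = [r for r in data if r[j] is not None and r[k] is not None]
            counts[j][k] = counts[k][j] = len(rows)
            if j == k or len(rows) < 3:
                continue  # diagonal, or too few complete pairs (entry stays NaN)
            # Spearman's rho is the Pearson correlation of the (average) ranks.
            rho_s = pearson(
                average_ranks([r[j] for r in rows]),
                average_ranks([r[k] for r in rows]),
            )
            sigma[j][k] = sigma[k][j] = 2.0 * math.sin(math.pi / 6.0 * rho_s)
    return sigma, counts
```

Ranks are invariant to monotone transformations of each variable. For that reason, a variable and its exponential produce an estimated latent correlation of about 1.0 even when some rows are missing.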
Stories can have tremendous power: beyond entertaining us, they can activate our interests and mobilize our actions. The degree to which a story resonates with its audience may be reflected, in part, in the emotional journey on which it takes the audience. In this paper, we use machine learning methods to construct emotional arcs in movies, calculate families of arcs, and demonstrate the ability...
To yield more balanced partitions, we investigate additive regularization of the Min Cut cost function instead of normalization. In particular, we study the case where the regularization term is the sum of the squared cluster sizes, which leads to (adaptively) shifting the pairwise similarities. We study the connection of such a model with Correlation Clustering...
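A short derivation shows why a squared-size penalty amounts to shifting the similarities. It assumes a fixed regularization weight λ ≥ 0, a symmetric similarity matrix W over N nodes, and a cut counted over ordered pairs. The adaptive choice of the shift mentioned in the abstract is not reproduced here.

```latex
% Partition \mathcal{C} = \{C_1, \dots, C_K\} of N nodes, similarities W_{ij}
\mathrm{Cut}_W(\mathcal{C}) = \sum_{k=1}^{K} \sum_{i \in C_k} \sum_{j \notin C_k} W_{ij}

% Number of ordered cross-cluster pairs, using \sum_k |C_k| = N:
\sum_{k=1}^{K} |C_k|\,(N - |C_k|) = N^2 - \sum_{k=1}^{K} |C_k|^2

% Cut with every similarity shifted by -\lambda:
\mathrm{Cut}_{W-\lambda}(\mathcal{C}) = \mathrm{Cut}_W(\mathcal{C}) - \lambda N^2 + \lambda \sum_{k=1}^{K} |C_k|^2

% Rearranging:
\mathrm{Cut}_W(\mathcal{C}) + \lambda \sum_{k=1}^{K} |C_k|^2 = \mathrm{Cut}_{W-\lambda}(\mathcal{C}) + \lambda N^2
```

The term λN² does not depend on the partition. Minimizing the regularized cost is therefore equivalent to minimizing Min Cut on the shifted similarities W_ij − λ. Some of these shifted similarities may be negative, which is the situation Correlation Clustering is designed to handle.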
In multi-tier storage systems with large amounts of data, most of the data is stored on inexpensive, slower tiers such as cloud or tape to reduce costs. As a consequence, retrieving data from the slower tiers incurs high latency. It would therefore be beneficial to proactively prefetch data from slower tiers to faster tiers by predicting future data accesses. State-of-the-art...
Active learning aims to reduce manual labeling effort by proactively selecting the most informative unlabeled instances to query. In real-world scenarios, it is often more practical to query a batch of instances rather than a single one at each iteration. To do so, we need to account not only for the informativeness of the instances but also for their diversity. Many heuristic methods have been proposed...
With the rapid rise of various e-commerce and social network platforms, users are generating large amounts of heterogeneous behavior data, such as purchase history, adding-to-favorite, adding-to-cart, and click activities. This kind of user behavior data is usually binary, reflecting only a user's action or inaction (i.e., implicit feedback data). Tensor factorization is a promising means of modeling...
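The sketch below makes the binary, multi-behavior setting concrete. It collects `(user, item, behavior)` events into a 0/1 user × item × behavior tensor, which is the kind of input a tensor factorization model would consume. The function name and event format are illustrative assumptions. The abstract does not say how its model ingests data, and a real system would use a sparse representation rather than nested lists.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_implicit_tensor(
    events: Iterable[Tuple[str, str, str]],
) -> Tuple[List[List[List[int]]], Dict[str, int], Dict[str, int], Dict[str, int]]:
    """Binary user x item x behavior tensor from raw (user, item, behavior) events."""
    users: Dict[str, int] = {}
    items: Dict[str, int] = {}
    behaviors: Dict[str, int] = {}
    observed: Set[Tuple[int, int, int]] = set()
    for user, item, behavior in events:
        # Assign consecutive integer indices in order of first appearance.
        u = users.setdefault(user, len(users))
        i = items.setdefault(item, len(items))
        b = behaviors.setdefault(behavior, len(behaviors))
        observed.add((u, i, b))  # repeated events collapse to a single 1
    tensor = [[[0] * len(behaviors) for _ in items] for _ in users]
    for u, i, b in observed:
        tensor[u][i][b] = 1
    return tensor, users, items, behaviors
```

A 0 entry records only that no action was observed. It does not distinguish dislike from unawareness, which is the "action or inaction" limitation of implicit feedback noted above.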
Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is a challenging task for streaming data, where the elements of a histogram are observed in a streaming manner. First, the ever-growing cardinality of histogram elements makes any similarity computation inefficient. Second, the concept-drift issue in the data streams also impairs...
Network embedding aims to project network data into a low-dimensional feature space in which each node is represented as a unique feature vector and the network structure is effectively preserved. In recent years, more and more online application services can be represented as massive, complex networks, which are extremely challenging for traditional machine learning algorithms to deal...
Literature-based discovery (LBD) is a task that aims to uncover hidden associations between non-interacting scientific concepts by rationally connecting independent nuggets of information. Broadly, prior approaches to LBD include the use of: a) distributional statistics and explicit representation, b) graph-theoretic measures, and c) supervised machine learning methods to find associations. However, purely...
Given a contact network and coarse-grained diagnostic information such as electronic Healthcare Reimbursement Claims (eHRC) data, can we develop efficient intervention policies to control an epidemic? Immunization is an important problem in multiple areas, especially epidemiology and public health. However, most existing studies focus on developing pre-emptive strategies assuming prior epidemiological...
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
The rapid growth of medical record data has increased the demand for automated analysis. An important problem in recent medical research is automated medical diagnosis, which aims to infer likely diseases from observed symptoms. Existing approaches typically perform the inference on a sparse bipartite graph with two sets of nodes representing diseases and symptoms, respectively. By using this...