Serwis Infona wykorzystuje pliki cookies (ciasteczka). Są to wartości tekstowe, zapamiętywane przez przeglądarkę na urządzeniu użytkownika. Nasz serwis ma dostęp do tych wartości oraz wykorzystuje je do zapamiętania danych dotyczących użytkownika, takich jak np. ustawienia (typu widok ekranu, wybór języka interfejsu), zapamiętanie zalogowania. Korzystanie z serwisu Infona oznacza zgodę na zapis informacji i ich wykorzystanie dla celów korzytania z serwisu. Więcej informacji można znaleźć w Polityce prywatności oraz Regulaminie serwisu. Zamknięcie tego okienka potwierdza zapoznanie się z informacją o plikach cookies, akceptację polityki prywatności i regulaminu oraz sposobu wykorzystywania plików cookies w serwisie. Możesz zmienić ustawienia obsługi cookies w swojej przeglądarce.
In the big data era, the information about the same object collected from multiple sources is inevitably conflicting. The task of identifying true information (i.e., the truths) among conflicting data is referred to as truth discovery, which incorporates the estimation of source reliability degrees into the aggregation of multi-source data. However, in many real-world applications, large-scale data...
In large-scale data classification tasks, it is becoming more and more challenging in finding a true class from a huge amount of candidate categories. Fortunately, a hierarchical structure usually exists in these massive categories. The task of utilizing this structure for effective classification is called hierarchical classification. It usually follows a top-down fashion which predicts a sample...
Linear Discriminant Analysis (LDA) is widely-used for supervised dimension reduction and linear classification. Classical LDA, however, suffers from the ill-posed estimation problem on data with high dimension and low sample size (HDLSS). To cope with this problem, in this paper, we propose an Adaptive Wishart Discriminant Analysis (AWDA) for classification, that makes predictions in an ensemble way...
Literature based discovery (LBD) is a task that aims to uncover hidden associations between non-interacting scientific concepts by rationally connecting independent nuggets of information. Broadly, prior approaches to LBD include use of: a) distributional statistics and explicit representation, b) graph-theoretic measures, and c) supervised machine learning methods to find associations. However, purely...
Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is a challenging task for streaming data, where the elements of a histogram are observed in a streaming manner. First, the ever-growing cardinality of histogram elements makes any similarity computation inefficient. Second, the concept-drift issue in the data streams also impairs...
Product bundling is widely adopted for information goods and online services because it can increase profit for companies. For example, cable companies often bundle Internet access and video streaming services together. However, it is challenging to obtain an optimal bundling strategy, not only because it is computationally expensive, but also that customers’ private information (e.g., valuations...
Time series motifs are approximately repeating patterns in real-valued time series data. They are useful for exploratory data mining and are often used as inputs for various time series clustering, classification, segmentation, rule discovery, and visualization algorithms. Since the introduction of the first motif discovery algorithm for univariate time series in 2002, multiple efforts have been made...
Active learning aims to reduce manual labeling efforts by proactively selecting the most informative unlabeled instances to query. In real-world scenarios, it's often more practical to query a batch of instances rather than a single one at each iteration. To achieve this we need to keep not only the informativeness of the instances but also their diversity. Many heuristic methods have been proposed...
With the rapid rise of various e-commerce and social network platforms, users are generating large amounts of heterogeneous behavior data, such as purchasehistory, adding-to-favorite, adding-to-cart and click activities, and this kind of user behavior data is usually binary, only reflecting a user's action or inaction (i.e., implicit feedback data). Tensor factorization is a promising means of modeling...
Given an undirected network where some of the nodes are labeled, how can we classify the unlabeled nodes with high accuracy? Loopy Belief Propagation (LBP) is an inference algorithm widely used for this purpose with various applications including fraud detection, malware detection, web classification, and recommendation. However, previous methods based on LBP have problems in modeling complex structures...
Network embedding aims at projecting the network data into a low-dimensional feature space, where the nodes are represented as a unique feature vector and network structure can be effectively preserved. In recent years, more and more online application service sites can be represented as massive and complex networks, which are extremely challenging for traditional machine learning algorithms to deal...
Given a contact network and coarse-grained diagnostic information like electronic Healthcare Reimbursement Claims (eHRC) data, can we develop efficient intervention policies to control an epidemic? Immunization is an important problem in multiple areas especially epidemiology and public health. However, most existing studies focus on developing pre-emptive strategies assuming prior epidemiological...
In today's era of big data, robust least-squares regression becomes a more challenging problem when considering the adversarial corruption along with explosive growth of datasets. Traditional robust methods can handle the noise but suffer from several challenges when applied in huge dataset including 1) computational infeasibility of handling an entire dataset at once, 2) existence of heterogeneously...
Besides the text content, documents and their associated words usually come with rich sets of meta information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in the case where the word-occurrence...
Matrix Factorization (MF) is a very popular method for recommendation systems. It assumes that the underneath rating matrix is low-rank. However, this assumption can be too restrictive to capture complex relationships and interactions among users and items. Recently, Local LOw-Rank Matrix Approximation (LLORMA) has been shown to be very successful in addressing this issue. It just assumes the rating...
With the rapid development of location-based social networks, Point-of-Interest (POI) recommendation has played an important role in helping people discover attractive locations. However, existing POI recommendation methods assume a flat structure of POIs, which are better described in a hierarchical structure in reality. Furthermore, we discover that both users' content and spatial preferences exhibit...
Network clustering is an essential approach to finding latent clusters in real-world networks. As the scale of real-world networks becomes increasingly larger, the existing network clustering algorithms fail to discover meaningful clusters efficiently. In this paper, we propose a framework called AnySCAN, which applies anytime theory to the structural clustering algorithm for networks (SCAN). Moreover,...
CANDECOMP/PARAFAC Decomposition (CPD) is one of the most popular tensor decomposition methods that has been extensively studied and widely applied. In recent years, sparse tensors that contain a huge portion of zeros but a limited number of non-zeros have attracted increasing interest. Existing techniques are not directly applicable to sparse tensors, since they mainly target dense ones and usually...
Driven by the dramatic growth of data both in terms of the size and sources, learning from heterogeneous data is emerging as an important research direction for many real applications. One of the biggest challenges of this type of problem is how to meaningfully integrate heterogeneous data to considerably improve the generality and quality of the learning model. In this paper, we first present a unified...
Since their introduction over a decade ago, time series motifs have become a fundamental tool for time series analytics, finding diverse uses in dozens of domains. In this work we introduce Time Series Chains, which are related to, but distinct from, time series motifs. Informally, time series chains are a temporally ordered set of subsequence patterns, such that each pattern is similar to the pattern...
Podaj zakres dat dla filtrowania wyświetlonych wyników. Możesz podać datę początkową, końcową lub obie daty. Daty możesz wpisać ręcznie lub wybrać za pomocą kalendarza.