Clustering vertices in graphs or in sequences of graphs has important applications in network science, bioinformatics, and other areas. Most research to date has focused on static graphs or sequences where the number of vertices does not change. We propose a new algorithm that successfully partitions the vertices of a graph sequence into smooth clusters, even when the number of vertices is allowed...
Skypatterns are an elegant answer to the pattern explosion issue, when a set of measures can be provided. Skypatterns for all possible measure combinations can be explored thanks to recent work on the skypattern cube. However, this leads to too many skypatterns, where it is difficult to quickly identify which ones are more important. First, we introduce a new notion of pattern steadiness which measures...
In this work, we propose Max-Node sampling, a novel sampling algorithm for data collection. The goal of Max-Node is to maximize the number of nodes observed in the sample, given a budget constraint. Max-Node is based on the intuition that networks contain many densely connected regions (i.e., communities) that may be only weakly connected to one another, and to maximize the number of nodes observed,...
Based on the concept of isomorphism of relations, a relation is turned into a simplicial complex, which is a combinatorial representation of a polyhedron. Frequent itemset mining is thus turned into a geometric traversal problem. By leveraging the geometric structure of the simplicial complex, a very fast traversal algorithm is found; it is based on a geometric concept called sub-cone construction...
Reproductive performance is important for the economic efficiency of pasture-based dairy farms. In these seasonal calving systems, a concise period of breeding is essential to ensure the alignment of peak grass availability with peak lactating cow energy demands. Trials and statistical analysis have identified the factors affecting overall reproductive performance, but few studies have analysed performance...
Understanding the socio-economical background of voters supporting a certain cause or, vice versa, understanding the political stance of people from a certain socio-economical niche are important questions in political sciences. Traditionally, answering these questions has required the researcher to fix either the political stance or the socio-economical background. In this paper, we propose using...
In this study, we focus on the extraction of latent topic transitions from POS data. POS analysis is conducted to obtain frequent patterns of customer behavior. The fundamental method for POS analysis is market basket analysis, which extracts the sets of products that are often bought at the same time. In market basket analysis, however, the effect of...
As systems grow more complex, exception detection becomes both more important and more difficult. For most complex systems, such as cloud platforms, exception detection is mainly conducted by analyzing a large amount of telemetry data collected from the systems at runtime. Time series data and event data are the two major types of telemetry data. Techniques of correlation analysis are important tools that...
Understanding the value of a football player is a challenging problem. Player valuation is not only critical for scouting, bidding and negotiation processes but also attracts great media and fan interest. Due to the complexities which arise from the fact that the player pool is distributed over hundreds of different leagues and many different playing positions, many clubs hire domain experts (often...
According to the Merriam-Webster dictionary, satire is a trenchant wit, irony, or sarcasm used to expose and discredit vice or folly. Though it is an important language aspect used in everyday communication, the study of satire detection in natural text is often ignored. In this paper, we identify key value components and features for automatic satire detection. Our experiments have been carried out...
Semi-supervised learning is the required paradigm when data are partially labeled. It is better suited to large-domain applications where labels are hard and costly to obtain. In addition, when data are large, feature selection and instance selection are two important dual operations for removing irrelevant information. To address these challenges together, we propose a unified framework, called...
We present a novel nearest neighbor search scheme named aggregating tree (A-Tree) for high dimensional data that uses vector quantization encodings (VQ-encodings) to build a radix tree, and performs the nearest neighbor search by beam search. To search accurately and efficiently, we suggest that VQ-encodings satisfy a locally aggregating encoding criterion: for any node of the corresponding A-Tree, neighboring...
Given a network with attributed edges, how can we identify anomalous behavior? Networks with edge attributes are ubiquitous, and capture rich information about interactions between nodes. In this paper, we aim to utilize exactly this information to discern suspicious from typical behavior in an unsupervised fashion, which lends itself well to the traditional scarcity of ground-truth labels in practical anomaly...
System event logs have been frequently used as a valuable resource in data-driven approaches to enhance system health and stability. A typical procedure in system log analytics is to first parse unstructured logs, and then apply data analysis on the resulting structured data. Previous work on parsing system event logs focused on offline, batch processing of raw log files. But increasingly, applications...
Considering the fact that the underlying structural information in the training data within classes is vital for a good classifier in real-world classification problems, Structural Nonparallel Support Vector Machine (or SNPSVM, for short) has been proposed. By combining the structural information with nonparallel support vector machine (NPSVM), SNPSVM can fully exploit prior knowledge to directly...
Microaggregation is a well-known and widely used statistical disclosure limitation method. In the case of univariate microaggregation, there is a polynomial time algorithm that obtains optimal partitions by representing the optimal partition as a shortest path in a directed acyclic graph. Such an algorithm is frequently used for obtaining optimal k-degree anonymizations of networks. Since there is a large...
Social networks are now a primary source for news and opinions on topics ranging from sports to politics. Analyzing opinions with an associated sentiment is crucial to the success of any campaign (product, marketing, or political). However, there are two significant challenges that need to be overcome. First, social networks produce large volumes of data at high velocities. Using traditional (semi-)...
In order to generate effective results, it is essential for a recommender system to model the information about the user interests (user profiles). A profile usually contains preferences that reflect the recommendation technique, so collaborative systems represent a user with the ratings given to items, while content-based approaches assign a score to semantic/text-based features of the evaluated...
The analysis of the temporal evolution of dynamic networks is a key challenge for understanding complex processes hidden in graph structured data. Graph evolution rules capture such processes on the level of small subgraphs by describing frequently occurring structural changes within a network. Existing rule discovery methods make restrictive assumptions on the change processes present in networks...
Robust principal component analysis (RPCA) has been widely used for recovering low-rank matrices in many data mining and machine learning problems. It separates a data matrix into a low-rank part and a sparse part. The convex approach has been well studied in the literature. However, state-of-the-art algorithms for the convex approach usually have relatively high complexity due to the need of solving...