This paper primarily addresses a dataset relating to cellular, chemical and physical conditions of patients gathered at the time they are operated upon to remove colorectal tumours. This data provides a unique insight into the biochemical and immunological status of patients at the point of tumour removal along with information about tumour classification and post-operative survival. The relationship...
MicroRNAs (miRNAs) are a group of small noncoding RNA (ncRNA) molecules that play an important role in biological functions. This paper proposes a microRNA prediction application, which is based on the multi-layer hierarchical MapReduce framework and provides four prediction workflows for four different datasets: miRNA-like sequences, miRNA cluster sequences, unknown miRNA sequences and the next generation...
In this paper, we view the task of identifying spammers in social networks from a mixture modeling perspective, based on which we devise a principled unsupervised approach to detect spammers. In our approach, we first represent each user of the social network with a feature vector that reflects its behaviour and interactions with other participants. Next, based on the estimated users' feature vectors,...
The pervasiveness of location-acquisition technologies has enabled location-based social networks (LBSN) to become increasingly popular in recent years. Users are able to check in at their current location and share information with other users through these networks. LBSN check-in data can be used for the benefit of users by providing personalized recommendations. There are several location recommendation...
In this paper a new on-line algorithm (the Droplets algorithm) is proposed for dealing with concept drifts and producing reliable predictions. The two main characteristics of this algorithm are that it is able to adapt to different types of drifts without making any assumptions regarding their type or when they occur, and can provide reliable predictions in a non-stationary environment without using...
Recommending items to users is a challenging task due to the large amount of missing information. In many cases, the data solely consist of ratings or tags voluntarily contributed by each user on a very limited subset of the available items, so that most of the data of potential interest is actually missing. Current approaches to recommendation usually assume that the unobserved data is missing at...
Population aging in developed countries brings an increased prevalence of chronic disease and of polymedication, that is, patients with several prescribed types of medication. Attention to chronic, polymedicated patients is a priority because of its high cost and the associated risks, and tools for analyzing, understanding, and managing this reality are becoming necessary. We describe a prototype of a system for discovering,...
This paper addresses the issue of detecting changes in stochastic processes. Conventional studies on change detection have explored how to detect discrete changes, in which the statistical models of the data change suddenly. We are instead concerned with how to detect continuous changes, which occur incrementally over successive periods. This paper gives a novel methodology for detecting...
The Needleman-Wunsch algorithm (NW) marked the genesis of a new field of research known as sequence alignment. Its inception was motivated by the growing need for automated methods to find homologous biological sequences. Subsequently, sequence alignment has established itself as a standard approach in bioinformatics, and has also been applied to other domains, including sequences of temporal events...
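As an illustration of the algorithm named in the abstract above, here is a minimal sketch of Needleman-Wunsch global alignment. The scoring values (match = 1, mismatch = -1, linear gap = -1) and the function name `needleman_wunsch` are assumptions chosen for the example; they do not come from the paper.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b.

    Scoring parameters are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table with the standard three-way recurrence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling `needleman_wunsch("AB", "B")` aligns `AB` against `-B`. The table costs O(nm) time and memory, which is why practical tools often use banded or linear-space variants.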
A general approach for anomaly detection or novelty detection consists in estimating high density regions or Minimum Volume (MV) sets. The One-Class Support Vector Machine (OCSVM) is a state-of-the-art algorithm for estimating such regions from high dimensional data. Yet it suffers from practical limitations. When applied to a limited number of samples it can lead to poor performance even when picking...
The objectives of the paper are twofold: to discuss the essence and challenges of automatic ontology design as applied to Big Data semantic modeling, and to present Semantic Concept Analysis (SCA), a framework specifically developed for automatic actionable ontology design in a Big Data scenario. This framework integrates the data-driven DBpedia-based technology for semi-automatic design of the ontology...
The blogosphere allows analysts to track opinions and sentiments of individuals, groups or the general public with large sample sizes regarding many topics. Visualizations are essential for sentiment analysis. The visual understanding of large corpora's sentiment is far more effective than relying on textual representations of the analyzed content. Users are very interested in changes in the public...
This paper addresses a new problem concerning the discovery and tracking of influencer-influencee relationships between communities in dynamic social networks. A weighted temporal multigraph is employed to represent the dynamics of the social networks. To discover and track influencer-influencee relationships over time, communities sharing common interests are first grouped together in meta-communities...
Political debates about a reform may spark national controversies by leading members of the community to polarize their opinions and sentiment about the topic addressed. With the rise of social media like Twitter, users are encouraged to voice and share their strong and polarized views, and in general people are exposed to broader viewpoints than they were before. The large amount of user-generated...
In many Data Analysis tasks, one deals with data that are presented in high-dimensional spaces. In practice, original high-dimensional data are transformed into lower-dimensional representations (features) preserving certain subject-driven data properties such as distances or geodesic distances, angles, etc. Preserving as much as possible of the available information contained in the original high-dimensional...
The method of elastic maps allows fast learning of nonlinear principal manifolds for large datasets. We present a user-friendly implementation of the method in the ViDaExpert software. Equipped with several dialogs for configuring data point representations (size, shape, color) and a fast 3D viewer, ViDaExpert is a handy tool for constructing an interactive 3D scene representing a table of data in multidimensional...
Discovering changes in the data distribution of streams and discovering recurrent data distributions are challenging problems in data mining and machine learning. Both have received a lot of attention in the context of classification. With the ever increasing growth of data, however, there is a high demand for compact and universal representations of data streams that enable the user to analyze current...
The widespread use of computing and communications technologies has enabled the popularity of social networks oriented toward learning. In this work, we study the nature and strength of associations between students using an online social network embedded in a learning management system. With datasets from two offerings of the same course, we mined the sequences of questions and answers posted by the students...
Despite recent open data initiatives in many countries, a significant percentage of the data provided is in non-machine-readable formats like image format rather than in a machine-readable electronic format, thereby restricting their usability. This paper describes the first unified framework for converting legacy open data in image format into a machine-readable and reusable format by using crowdsourcing...
Clustering ensemble is an unsupervised learning method that combines a number of partitions in order to produce a better clustering result. In this paper, we propose a clustering ensemble algorithm named Dual-Similarity Clustering Ensemble (DSCE). The core of our ensemble is a consensus function, which consists of three stages. The first stage is to transform the initial clusters into a binary representation,...