We present an accelerated algorithm for hierarchical density-based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while supporting variable-density clusters and eliminating the need for the difficult-to-tune distance scale...
Missing data is a data mining problem that adversely affects data analysis and decision-making processes, and it is frequently encountered in healthcare data for a variety of reasons. Missing data remains an important research topic because the success of a given method for handling it is influenced by many factors, such as the characteristics of the data and the type of the missing data. In this study, a clustering and...
Over the past few years, the dimensionality of functional MRI (fMRI) data has affected the analysis of brain data. In the field of machine learning and statistical analysis, classification of objects plays a significant role. Machine learning classifiers are used to discover the class of new data points from a set of data points. The application of learning techniques on fMRI data alleviates to cognitive state...
Topological data analysis is a novel method to analyze high-dimensional qualitative data using a set of properties from topology. In this paper, we explore the feasibility of topological data analysis for mining social media data by investigating the problem of image popularity. We randomly crawl images from Instagram, convert their captions to 300-dimensional numerical vectors using Word2vec, calculate...
The aim of this paper is to examine possibilities for the initial analysis of failure data from an industrial production process. To perform the initial analysis of the production process data, we used graphical statistical methods as well as data mining methods such as drill-down analysis and cluster analysis. Before applying the mentioned techniques and methods it was necessary to know...
With the continuous growth of the aging population, society is facing new challenges, namely the implementation of healthcare services for older people and the promotion of active aging and well-being. These challenges imply the optimization of these services through biomedical, physical, psychological and socio-environmental interventions. ICT technologies can support the implementation...
This installment of Computer's series highlighting the work published in IEEE Computer Society journals comes from IEEE Transactions on Network Science and Engineering.
Distributed applications from domains such as healthcare, e-commerce, science, and social networks tend to generate large volumes of heterogeneous data that grow exponentially over time, leading to big data sets. Descriptive analytics on big data sets poses a great challenge for traditional data analysis tools, since it must be performed on the full data set, unlike predictive...
In this paper, we report an application of data analytics to a real-world business case in the telecom industry. This work was carried out together with an IT company in India, using a large data set of telecom customers. As part of data analytics, the first task was to cleanse bad and missing data, transform heterogeneous formats into a unified format, perform semantic analysis on the data (semantics...
Flow cytometry (FCM) is a well-known method that is broadly used in clinical and research laboratories, both of which have been target domains of FCM applications. The key research question in this field is “how to effectively automate FCM data analysis?” To answer this question, this paper systematically reviews current advances in the automation of...
Spectral analysis of neighborhood graphs is one of the most widely used techniques for exploratory data analysis, with applications ranging from machine learning to social sciences. In such applications, it is typical to first encode relationships between the data samples using an appropriate similarity function. Popular neighborhood construction techniques such as k-nearest neighbor (k-NN) graphs...
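As a rough illustration of the k-NN neighborhood construction mentioned in the abstract above, the following minimal sketch builds a directed k-nearest-neighbor graph. The function name `knn_graph`, the use of Euclidean distance as the (dis)similarity measure, and the sample points are assumptions made for illustration; they are not taken from the paper.

```python
import math

def knn_graph(points, k):
    """Map each point index to the indices of its k nearest other points.

    Assumption: Euclidean distance stands in for the similarity function.
    """
    graph = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with its index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        graph[i] = [j for _, j in dists[:k]]
    return graph

# Hypothetical sample: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
graph = knn_graph(points, k=1)
```

With `k=1`, each point in this sample is linked to the other member of its pair. This brute-force version compares all pairs of points, so it only suits small data sets.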
This paper first introduces the relevant principles of the K-Means algorithm, the simulated annealing (SA) algorithm, and the particle swarm optimization (PSO) algorithm. Then, to address the influence of the initial values of the K-Means algorithm on its optimal solution, a hybrid K-Means algorithm based on SA-PSO is proposed. The new algorithm uses the advantage of jumping out of local...
In this paper, we present a new approach to distributed clustering for spatial datasets, based on an innovative and efficient aggregation technique. This distributed approach consists of two phases: 1) a local clustering phase, where each node performs a clustering on its local data, and 2) an aggregation phase, where the local clusters are aggregated to produce global clusters. This approach is characterised...
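The two-phase structure described in the abstract above can be sketched as follows. This is only a toy stand-in: the excerpt does not specify the paper's local clustering method or its aggregation technique, so the greedy radius-based clustering, the centroid-merging rule, and all names (`local_clusters`, `aggregate`, `radius`, `node_a`, `node_b`) are hypothetical.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))

def local_clusters(points, radius):
    """Phase 1 (run on each node): greedy clustering of the node's own data.

    Returns compact summaries (centroid, size) rather than raw points.
    Assumption: a point joins the first cluster whose centroid is within radius.
    """
    clusters = []
    for p in points:
        for c in clusters:
            if math.dist(p, centroid(c)) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return [(centroid(c), len(c)) for c in clusters]

def aggregate(summaries, radius):
    """Phase 2: merge local summaries whose centroids lie within radius."""
    merged = []
    for cen, size in summaries:
        for i, (mcen, msize) in enumerate(merged):
            if math.dist(cen, mcen) <= radius:
                total = msize + size
                # Size-weighted average of the two centroids.
                new_cen = tuple((a * msize + b * size) / total
                                for a, b in zip(mcen, cen))
                merged[i] = (new_cen, total)
                break
        else:
            merged.append((cen, size))
    return merged

# Hypothetical data held by two nodes.
node_a = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
node_b = [(0.5, 0.5), (10.0, 11.0)]
summaries = local_clusters(node_a, 2.0) + local_clusters(node_b, 2.0)
global_clusters = aggregate(summaries, 2.0)
```

Only the per-cluster summaries cross node boundaries in this sketch, which is the usual motivation for a local-then-aggregate design; whether the paper exchanges centroids or some other representation is not stated in the excerpt.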
This paper deals with a study of data mining techniques such as clustering, biclustering and triclustering. A large number of clustering approaches have been proposed for the analysis of gene expression. However, the results of the application of standard clustering methods are limited. For this reason, concurrent clustering such as biclustering to find sub-matrices that are a subset of rows and a subset...
Real-world data are often acquired as a collection of matrices rather than as a single matrix. Such multiblock data are naturally linked and typically share some common features while at the same time exhibiting their own individual features, reflecting the underlying data generation mechanisms. To exploit the linked nature of data, we propose a new framework for common and individual feature extraction...
A dynamic system is represented by its outputs as time-series data. Modeling of time series is an important data mining task for predicting future behavior or detecting deviations from normal behavior (anomalies). Clustering of multiple time series leads to a better understanding of the system and improves the efficiency of monitoring.
Football (also known as soccer) is the most popular sport in the world. The popularity of the sport leads to several stories (some perhaps anecdotal) about supporters' behaviors and to the emergence of rivalries such as the famous Barcelona-Real Madrid rivalry in Spain. Little, however, has been done to characterize or profile online users' behaviors as football supporters and use them as an aggregate measure for club characterization...
Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges, such as high dimensionality, heterogeneity, and the high complexity of some algorithms. For instance, some algorithms may have linear complexity but require domain knowledge in order to determine their input...
Big data analytics is very fruitful for solving problems in cybersecurity. We have analyzed modern trends in intelligent security systems research and practice and worked out a syllabus for a new university course in the area of data mining and machine learning with applications to cybersecurity. The course is for undergraduate and graduate students studying cybersecurity. The main objective...
Phishing attacks against financial institutions constitute a major concern and force them to invest thousands of dollars annually in the prevention, detection and takedown of these kinds of attacks. This operation is so massive and time-critical that there is usually no time to perform analysis to look for patterns and correlations between attacks. In this work we summarize our findings after applying...