Though accident data have been collected across industries, they may inherently contain uncertainty in the form of randomness and fuzziness, which in turn can lead to misleading interpretations of the analysis. To handle the issue of uncertainty within accident data, the present work proposes a rough set theory (RST)-based approach that provides a rule-based solution for industry to minimize the number of accidents...
Real-time applications are usually well-defined and operate based on a particular system model. However, in practical scenarios, the applications can perform differently because of the uncertainties in the environment. The system can use video streams to capture sequential real-time information of its surroundings. The system also needs to identify various constraints that have significant effects...
The success of Big Data relies fundamentally on the ability of a person (the data scientist) to make sense of, and generate insights from, this wealth of data. The process of generating actionable insights, called data exploration, is a difficult and time-consuming task. Data exploration of a big dataset usually requires first generating a small and representative data sample that can be easily plotted...
Digitalisation of industrial processes, also called the fourth industrial revolution, is leading to the availability of large volumes of data containing measurements of many process variables. This offers new opportunities to gain deeper insights into process variability and its effects on quality and performance. Manufacturing facilities already use data-driven approaches to study process variability and...
Large, publicly accessible collections of online product reviews have become a significant information resource for enterprises to discover public preferences and market trends. In this paper, we propose a text-mining-driven information gain model for identifying notable product features, to help enterprises understand which product features determine customers' satisfaction with the given products...
Advanced pattern mining, which extracts hidden but useful information by using a proper structure, is vitally important for efficient information mining in large-scale practical datasets. Existing algorithms have not been capable of effectively resolving the fuzziness uncertainty of items or of confirming the appropriate structure of the studied patterns. In order to generate more suitable practical patterns, a...
Regression trees are extended so that they can be learnt from data with epistemic uncertainty. With uncertainty modelled by belief functions, an attribute selection strategy based on error intervals is discussed and a complete tree construction procedure is proposed. As a general approach, error intervals weighted by mass functions are calculated to make the best splitting choice. Including classical regression...
Biclustering is a well-known approach to data mining, and it is applied in many fields, such as genome analysis, security services, and social network analysis. Biclustering finds bicliques contained in a bipartite graph. However, in real data, a biclique may lack several edges for various reasons, such as errors. In this situation, traditional biclustering methods cannot find correct biclusters...
We investigate a subgraph mining framework that can connect similar entities according to their structure and attribute similarities. We take one mapping between two related points chosen from the query and target graphs as one vertex in the correspondence graph and decide the weight of each edge based on the similarity score. In this way, we transform the problem into a dense subgraph discovery problem...
The growth of semantic web technologies underpins the ever-increasing development of linked data and their applications. In recent years, the number of linked data sources has grown from 12 to more than 2973 sets. The datasets are managed as decentralized sources, and their quality is a serious concern. Assessing the quality of linked data is key to adopting them in different fields...
The Support Vector Machine (SVM) is well known in machine learning and artificial intelligence for its high performance in data classification, regression and forecasting. For large-scale datasets, an incremental training algorithm is usually applied to tune or balance training cost and accuracy in SVM applications. This paper presents an improved incremental training approach for large...
Uncertain data mining is becoming a research hotspot with the emergence of uncertain data in sensor networks, Web applications and other fields. Obtaining effective uncertain data sets is a prerequisite for the study of clustering, classification, frequent itemset mining, isolated point detection and so on. In this paper, the representation model of uncertain data is analyzed, and corresponding...
With the advance of next-generation sequencing technology, RNA-seq is being widely used for transcriptomics as an alternative to microarrays. RNA-seq has a broad range of applications, such as gene expression quantification, alternative splicing identification, and novel transcript discovery. Generally, the primary aim of RNA-seq analysis is to detect differentially expressed genes in different biological...
MicroRNAs form a family of single-stranded RNA molecules, approximately 22 nucleotides in length, that are present in all animals and plants. Various studies have revealed that microRNAs tend to cluster on chromosomes. In this regard, a novel clustering algorithm is presented in this paper, integrating the rough hypercuboid approach with fuzzy c-means. Using the concept of rough hypercuboid equivalence...
Among many Big Data applications are those that deal with data streams. A data stream is a sequence of data points with timestamps that possesses the properties of transiency, infiniteness, uncertainty, concept drift, and multi-dimensionality. In this paper we propose an outlier detection technique called Orion that addresses all the characteristics of data streams. Orion looks for a projected dimension...
Many association rule mining algorithms find associations and correlations in traditional transaction databases, in which the content of each transaction is known precisely. However, due to instrument errors, the imprecision of sensor monitoring systems, and so on, real-world data tend to be numerical data with inherent uncertainty. To deal with these situations, we propose an FP-growth-based mining...
With the rapid development of computer technology, web services have been widely used. In these applications, uncertain data take the form of streams. In view of this situation, we present a new generalized data structure, the PSUF-tree, to store uncertain data streams. All itemsets in the recent window are contained in a global PStree in a condensed format, and a header table is established in which...
Online reviews are very important for many Web applications. Extracting opinion targets and opinion words from online reviews is one of the core tasks of review analysis and mining. Traditional extraction methods fall mainly into two categories: pipeline-based methods and propagation-based ones. The former extracts opinion targets and opinion words separately, which ignores the opinion...
Clustering is a crucial task for massive data that continuously arrive and evolve over time, generated as a stream. However, data may be pervaded by uncertainty and imprecision, and techniques that perform unsupervised learning on imperfect data sets are unable to deal with such an evolving environment. On the other hand, standard methods for clustering data streams are not adapted to an uncertain...
Neighborhood Covering Reduction extracts rules for classification by formulating a covering of the data space with neighborhoods. The neighborhoods of the covering are constructed based on a distance measure and are strictly constrained to be homogeneous. However, this strategy focuses excessively on boundary samples and thus makes the neighborhood covering model sensitive to noise. To tackle this problem, we construct...