Statistical dependency analysis is the basis of all empirical science. A commonly occurring problem is to find the most significant dependency rules, which describe either positive or negative dependencies between categorical attributes. For example, in medical science one is interested in genetic factors that can either predispose to or prevent diseases. The requirement of statistical significance...
The concept of triclusters has been investigated recently in the context of two relational datasets that share labels along one of the dimensions. By simultaneously processing two datasets to unveil triclusters, new useful knowledge and insights can be obtained. However, some recently reported methods are either closely linked to specific problems or constrain datasets to have some specific distributions...
Mining companies very carefully investigate the areas of proposed mine sites. This is done by first examining the geology of the area and then drilling boreholes to predict the quantity and, if possible, approximate the structure of the mine and the distribution of the metal grades. The data obtained from boreholes is analysed using point interpolation techniques such as inverse distance weighting (IDW)...
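The abstract above names inverse distance weighting (IDW) but is truncated before any detail. As a rough illustration only, the following is a minimal sketch of textbook two-dimensional IDW, not the authors' implementation. The function name `idw_estimate`, the default `power=2.0`, and the borehole values are all hypothetical.

```python
import math


def idw_estimate(samples, query, power=2.0):
    """Estimate a value at `query` from (x, y, value) samples using IDW.

    Each sample is weighted by 1 / distance**power, so nearer samples
    contribute more to the estimate. `power=2.0` is an assumed default.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0.0:
            # The query point coincides with a sample: return it directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one sample is required")
    return weighted_sum / weight_total


# Hypothetical borehole grades at two locations (x, y, grade).
boreholes = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
midpoint_grade = idw_estimate(boreholes, (1.0, 0.0))
```

At a point equidistant from both hypothetical boreholes the weights are equal, so the estimate is simply the mean of the two grades. A larger `power` makes the estimate follow the nearest sample more closely.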
With more older adults and people with cognitive disorders preferring to stay independently at home, prompting systems that assist with Activities of Daily Living (ADLs) are in demand. In this paper, with the introduction of “The PUCK”, we take the very first approach to automate a prompting system without any predefined rule set or user feedback. We statistically analyze realistic prompting data...
Detecting interesting patterns in data has been a focus of recent work in knowledge discovery. Understanding the patterns of interaction between attributes is relevant to many fields. Existing measures of interestingness do not adequately detect these interaction patterns. Here we present a new measure that explores the interactions to be found in data. We combine this interestingness measure with...
Generally, more data may increase statistical power. However, many algorithms in the data mining community focus only on small samples. This is because when the sample size increases, the data set is not necessarily identically distributed in spite of being generated by some common data generating mechanism. In this paper, we observe that restricted Bayesian network classifiers are robust even when...
The authors review the Traditional Chinese Medicine literature on congestive heart failure treatment published from 2005 to 2010 in the CNKI database and analyze the common principles of treatment and prescription in Traditional Chinese Medicine by combining computer retrieval with manual retrieval. The results of this study broadly reflect the principles of treatment and the patterns of medication use, and the authors hope to provide...
As a high-level semantic feature, video text plays a crucial role in the semantic understanding, analysis, and quick clip retrieval of news video. We propose a new gray-based video character extraction method. It makes full use of the gray-level information of the video image and the features of news video text to detect the video frames that contain text information, and...
Data captured from a live cellular network with real users during their daily routines help to understand how users move within the network. Unlike simulations with limited potential or expensive experimental studies, research in user mobility or spatio-temporal user behavior can be conducted on publicly available datasets such as the Reality Mining Dataset. These data have been...
Mining traffic accident records is very important for understanding why traffic accidents occur frequently under certain driving, environmental, and vehicle conditions. Many factors can lead to an accident, and their relationships are complex, so it is very difficult to build a correct evaluation model. To overcome this problem, statistical models such as neural networks, fuzzy logic, decision trees...
Data mining is a data analysis technology with growing application in sports. Basketball is one of the most popular sports. Due to its dynamics, a large number of events happen during a game. Basketball statisticians have the task of noting as many of these events as possible in order to provide their analysis. In this paper, we used data from the First B basketball league for men in Serbia, for seasons...
The following topics are dealt with: signal processing and statistical approaches for functional genomics; computational biology; computational genomics and epigenomics; personal sequencing; discovery of molecular signatures and biomarkers; gene and protein sequence analysis; analysis of microarray and mass spectrometry data; denoising and compression of genomic data; regulatory network modeling,...
This paper proposes a method employing text mining techniques to analyze e-mails collected from various sources. The method takes the subject and body of an e-mail as input and assigns a text class to the e-mail. The method also extracts key concepts from e-mails and presents their statistical information. The results of numerical experiments indicate that acquired concept relation dictionaries correspond...
In the past decades, the amount of information available to law enforcement agencies has increased significantly. Most of this information is in textual form; however, analyses have mainly focused on structured data. In this paper, we give an overview of the concept discovery projects at the Amsterdam-Amstelland police, where Formal Concept Analysis (FCA) is being used as a text mining instrument...
Sequential pattern mining is the process of extracting useful patterns from data sequences. Existing work on mining Top-K patterns in data streams mostly addresses non-sequential patterns. In our framework, we focus on Top-K sequential pattern mining, where users can obtain an adequate number of interesting patterns. The proposed method can automatically adjust the minimum support during mining...
With the development of digital information technology in the mining industry, Digital Mine plays an important role in the process of mine construction. Currently, mine management information systems are usually based on a single technology platform. Due to the diversity of mine tasks and the inability of a single technology platform to manage multiple data types, people are focusing...
Background: The majority of software faults are present in a small number of modules; therefore, accurate prediction of fault-prone modules helps improve software quality by focusing testing efforts on a subset of modules. Aims: This paper evaluates the use of the faults-slip-through (FST) metric as a potential predictor of fault-prone modules. Rather than predicting the fault-prone modules for the complete...
In this article we show how to find evidence of incomplete or fractured processes in non-structured reports of known business processes, by means of rules, patterns and detection of cause-effect relationships. A priori classifications and probabilities of process activities are used as inputs for the analysis and rule detection. In this method we use a domain-specific ontology associated with process...
Lightning location information is the lightning data received from a lightning monitoring network. It contains the lightning time, latitude, longitude, and intensity, as well as steepness, positioning errors, and other critical information. The information differs somewhat because of the diversity of lightning positioning system models. This paper introduces a method for extracting the lightning...
A new method for TV commercial detection based on multi-feature fusion is proposed. First, the video and audio features of TV commercials are analyzed and extracted. Then, TV commercial detection is performed using these video and audio features together with a statistical analysis method. Finally, experiments are conducted and the results are analyzed. Experimental results show that TV commercials...