We explore methods for effectively extracting information from clinical narratives, which are captured in a public health consulting phone service called HealthLink. The currently available data consists of dialogues constructed by nurses while consulting patients on the phone. Since the data are interviews transcribed by nurses during phone conversations, they include a significant volume and variety...
The analysis of high dimensional data comes with many intrinsic challenges. In particular, cluster structures become increasingly hard to detect when the data includes dimensions irrelevant to the individual clusters. With increasing dimensionality, distances between pairs of objects become very similar, and hence, meaningless for knowledge discovery. In this paper we propose Cartification, a new...
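The claim that pairwise distances "become very similar" as dimensionality grows can be checked numerically. Below is a minimal sketch of that generic effect (distance concentration), not of Cartification itself. It assumes points drawn uniformly from the unit hypercube. The function name `relative_contrast` and its parameters are illustrative. The function measures (d_max - d_min) / d_min for distances from one query point; a value near zero means nearest and farthest neighbours are almost equally far away.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """Return (d_max - d_min) / d_min over Euclidean distances from one random
    query point to n_points random points, all uniform in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# The contrast shrinks as irrelevant dimensions are added.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

In two dimensions the nearest point is typically far closer than the farthest one. With a thousand uniform dimensions all distances cluster around the same value, which is why full-space distances carry little information for clustering.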
As spatio-temporal data have become ubiquitous, an increasing challenge facing computer scientists is that of identifying discrete patterns in continuous spatio-temporal fields. In this paper, we introduce a parameter-free pattern mining application that is able to identify dynamic anomalies in ocean data, known as ocean eddies. Despite ocean eddy monitoring being an active field of research, we provide...
Independent Component Analysis (ICA) algorithms that take advantage of the potential non-circularity of complex signals have recently been derived and shown to improve performance. We investigate the performance of three ICA approaches to extract a weak co-channel interfering communications signal from a television broadcast signal over varied interference-to-noise ratios: complex maximization...
Many applications benefit from data sharing, especially data statistics and data mining. But because the shared data may contain the data owner's private information, sharing carries a high risk of revealing that owner's privacy. Data obfuscation has been proposed to balance data privacy against data usability. But it is hard for present obfuscation schemes to retain the usability of data in a fine-grained...
This paper presents a new RANSAC-based method for extracting planes from 3D range data. The generic RANSAC Plane Extraction (PE) method may over-extract a plane. It may fail in the case of a multi-step scene where the RANSAC process results in multiple inlier patches that form a slant plane straddling the steps. The CC-RANSAC algorithm overcomes the latter limitation if the inlier patches are separate...
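The generic RANSAC Plane Extraction that this abstract starts from can be sketched in a few lines. This is a minimal, stdlib-only sketch under assumed parameters. The helper names `plane_from_points` and `ransac_plane`, the inlier `threshold` and the iteration count are all illustrative. It is neither the paper's new method nor CC-RANSAC. It does exhibit the weakness described above: inliers are counted globally, with no check that they form a single connected patch.

```python
import math
import random


def plane_from_points(p1, p2, p3):
    """Return (a, b, c, d) with unit normal (a, b, c) for the plane
    a*x + b*y + c*z + d = 0 through three points, or None if collinear."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    # Normal vector is the cross product u x v.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    a, b, c = (x / norm for x in n)
    d = -(a * p1[0] + b * p1[1] + c * p1[2])
    return a, b, c, d


def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Generic RANSAC: fit a plane to 3 random points per iteration and keep
    the hypothesis with the most points within `threshold` of it."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        inliers = [p for p in points
                   if abs(a * p[0] + b * p[1] + c * p[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


# Synthetic scene: 100 points on the floor z = 0 plus 20 points above it.
rng = random.Random(1)
floor = [(rng.uniform(0, 1), rng.uniform(0, 1), 0.0) for _ in range(100)]
clutter = [(rng.uniform(0, 1), rng.uniform(0, 1), rng.uniform(0.2, 1.0))
           for _ in range(20)]
plane, inliers = ransac_plane(floor + clutter)
print(plane, len(inliers))
```

On a multi-step scene, the same loop can prefer a slanted plane that grazes several step edges. Nothing in it requires the inliers to be spatially contiguous, and that is the gap a connected-component check addresses.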
As the age of big data evolves, outsourcing of data mining tasks to multi-cloud environments has become a popular trend. To ensure data privacy when outsourcing mining tasks, the concept of support anonymity was proposed to hide sensitive information about patterns. Existing methods that tackle the privacy issues, however, do not address the related parallel mining techniques. To fill this gap,...
Research on palm print features has attracted a lot of attention, and the principal lines, which are among the most stable and important features in palm print images, can provide effective information for applications of palm print technology. Aiming at accurate and natural extraction, this paper presents an associated extraction method for palm print principal lines based on the own properties...
Many people use the web as their main source of information in daily life. However, most web pages contain non-informative components, such as sidebars, footers and ads, which make it complicated to extract text from the original HTML documents. Although web text extraction techniques have been developed, because of the high degree of human intervention and the low extraction quality, the...
An efficient and accurate statistical noise-parameter extraction algorithm is proposed and validated experimentally using a high-performance silicon MOSFET. The proposed algorithm is applicable to most devices that have high input reflection coefficients and operate over a wide bandwidth. Measured data agree well with theoretical expectations.
Large software development projects receive many bug reports and each of these reports needs to be triaged. An important step in the triage process is the assignment of the report to a developer. Most previous efforts towards improving bug report assignment have focused on using an activity-based approach. We address some of the limitations of activity-based approaches by proposing a two-phased location-based...
Software teams record their work progress in task repositories which often require them to encode their activities in a set of edits to field values in a form-based user interface. When others read the tasks, they must decode the schema used to write the activities down. We interviewed four software teams and found out how they used the task repository fields to record their work activities. However,...
Data mining techniques are very popular today and are used in NLP (Natural Language Processing). They allow users to analyze data from many different perspectives, categorize it, and summarize the relationships identified. One of these techniques, clustering items into groups, has been very popular. We use this technique here to find different topics in a document. We aim to replicate previous...
Time series data mining is a useful tool for designing data-driven condition monitoring and fault diagnosis systems. Aiming to monitor abnormal changes in dynamic processes, a series of mining algorithms is built to mine the signal form, the model structure of the process and the statistical properties of noise in sampled data series, the architecture of information mining system of sampling time...
In speech processing systems, the performance of the Voice Activity Detector (VAD) is a bottleneck for the whole system. Traditional VADs are based solely on acoustic features. An additional modality in the form of visual information is used to make VADs robust. In this paper, we propose a multimodal VAD based on decision fusion between the two modalities. Visual VAD (VVAD) decision vectors are interpolated so...
Noise is a challenge for process mining algorithms, but there is no standard definition of noise nor an accepted way to quantify it. This means it is not possible to mine with confidence from event logs that may not record the underlying process correctly. We discuss one way of thinking about noise in process mining. We consider mining from a ‘noisy log’ as learning a probability distribution over traces,...
Data, especially in large item sets, hide a wealth of information on the processes that have created and modified them. Often, a data field or a set of data fields is modified not only through well-defined processes but also through latent processes; without knowledge of the second type of processes, testing cannot be considered exhaustive. As a matter of fact, changes in the data deriving from...
Millions of geo-tagged photos are becoming available due to the widespread use of photo-sharing websites. These social media capture attractive points of interest and contain interesting photo-taking patterns. The massive amount of this user-oriented data poses new challenges, and understanding people's photo-taking behavior is of great importance for local tourism-related businesses. This paper analyzes...
Frequent episode mining has been proposed as a data mining task with the goal of recovering sequential patterns from temporal data sequences. While several episode mining approaches have been proposed in the last fifteen years, most of the developed techniques have not been evaluated on a common benchmark data set, limiting the insights gained from experimental evaluations. In particular, it is unclear...
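To make "recovering sequential patterns from temporal data sequences" concrete, here is a minimal sketch of one common way to score a serial episode: the fraction of sliding time windows of a fixed width that contain the episode's event types in order. This window-based definition is only one of several frequency measures in the episode-mining literature. The function `window_frequency`, the integer timestamps and the toy sequence are illustrative assumptions, not taken from the benchmark described above.

```python
def window_frequency(sequence, episode, width):
    """Fraction of windows [s, s + width) that contain the serial episode.

    sequence: list of (integer_time, event_type) pairs.
    episode: tuple of event types that must occur in this order
             (other events may occur in between).
    Windows start at every integer from first_time - width + 1 to last_time,
    so every event falls into exactly `width` windows.
    """
    times = [t for t, _ in sequence]
    starts = range(min(times) - width + 1, max(times) + 1)
    hits = 0
    for s in starts:
        in_window = [e for t, e in sorted(sequence) if s <= t < s + width]
        it = iter(in_window)
        # `target in it` consumes the iterator up to the match, enforcing order.
        if all(target in it for target in episode):
            hits += 1
    return hits / len(starts)


seq = [(1, "A"), (2, "B"), (5, "A"), (6, "C"), (7, "B")]
print(window_frequency(seq, ("A", "B"), width=3))
```

An episode miner would enumerate candidate episodes and keep those whose frequency exceeds a threshold. Differences in such definitions are one reason results from different miners are hard to compare without a common benchmark.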
Frequent Itemset Mining (FISM) attempts to find large and frequent item sets in bag-of-items data such as retail market baskets. Such data has two properties that are not naturally addressed by FISM: (i) a market basket might contain items from more than one customer intent (mixture property) and (ii) only a subset of items related to a customer intent are present in most market baskets (projection...
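For reference, here is a minimal sketch of the classic FISM baseline the abstract contrasts with: an Apriori-style level-wise search for every item set contained in at least `min_support` baskets. It is the standard textbook technique, not the paper's proposed method. The function name and the toy baskets are illustrative. Note that it treats each basket as one undivided unit, which is exactly why the mixture and projection properties are not modeled.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Apriori-style search. Returns {frozenset(items): support_count} for
    every item set contained in at least `min_support` baskets."""
    baskets = [frozenset(b) for b in baskets]
    items = sorted({i for b in baskets for i in b})
    result = {}
    candidates = [frozenset([i]) for i in items]
    k = 1
    while candidates:
        # Count how many baskets contain each candidate set.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        # Join frequent (k-1)-sets into k-sets; prune any candidate with an
        # infrequent (k-1)-subset (the Apriori property).
        next_candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                next_candidates.add(union)
        candidates = list(next_candidates)
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]
result = frequent_itemsets(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Here `{"bread", "butter", "milk"}` is never counted because its subset `{"bread", "butter"}` appears in only one basket. If that basket mixed two shopping intents, a plain support count like this one could not tell.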