Decision tree induction is a typical machine learning approach that has been extensively applied to data mining and knowledge discovery. For numerical and mixed data, discretization is an essential pre-processing step of decision tree learning. However, when coping with big data, most existing discretization approaches are not efficient enough in practice...
In this paper, we applied two process mining techniques (from the discovery class) to extract knowledge from event logs recorded by an online information system. The event log was created from information received from an online proceedings review system in Thailand. Accordingly, the Alpha and Heuristic algorithms were used with the objective of automatically visualizing the models...
Understanding the total mass balance of Earth's polar cryosphere is a significant aspect of estimating sea level rise due to climate change. Measuring ice sheet elevation and ground height is essential for determining the total glacier mass balance. Satellite-based and airborne photon-counting lidar sensors, such as the upcoming Ice, Cloud and land Elevation Satellite 2 (ICESat-2), will provide accurate...
The decision tree, one of the most widely used methods in data mining, has been used in many real-world applications. Incremental decision trees handle streaming data scenarios and are therefore applicable to big data analysis. However, imperfect data are unavoidable in real-world applications. Studying state-of-the-art incremental decision tree induction using the Hoeffding bound, we investigated the influence of...
The original sequential pattern mining model considers only the occurrence frequency of sequential patterns and disregards their occurrence periodicity. We propose the asynchronous periodic sequential pattern mining model to discover sequential patterns that not only occur frequently but also appear periodically. For this mining model, we propose a pattern-growth mining algorithm to mine...
This study attempts to establish methods for characterizing the complexity of ordinal data through information and entropy parameters. To this end, methods for measuring the complexity of data with similar statistical characteristics were examined, and the parameters that can differentiate them were established. For this purpose, the analysis was applied to three data...
This paper presents a new sequential clustering algorithm based on sequential hard c-means clustering. The term sequential cluster extraction means that the algorithm extracts one cluster at a time. Sequential hard c-means is one of the typical, conventional sequential clustering methods. The proposed sequential clustering algorithm is based on Dave's noise clustering approach. A characteristic...
Palmprint recognition has become a focus in the biometric recognition and image processing fields. In this process, feature extraction (with particular attention to palmprint principal line extraction) is especially important. Although a lot of work has been reported, the representation of palmprints is still an open issue. In this paper we propose a simple, efficient, and accurate palmprint...
This paper addresses the problem of lossy compression for hyperspectral images and presents an efficient compression algorithm based on FastICA. First, an efficient algorithm for the segmentation of hyperspectral images is proposed. Second, based on the targets, a lossy compression scheme based on ROI (Region of Interest) is proposed for hyperspectral compression, which employs the KLT (Karhunen-Loève transform)...
Three recent trends aim to make local pattern mining more directly suited for use on data as it presents itself in practice, namely in a multi-relational form and affected by noise. The first of these trends is the generalisation of local pattern syntaxes to approximate, noise-tolerant, variants (notably fault-tolerant itemset mining and community detection). The second of these trends is to develop...
Since mirror-like odd and even features in face recognition reflect the symmetrical and asymmetrical image information, respectively, their proper combination can improve recognition rates to some extent. However, the face imaging process is easily affected by external factors and by noise, which degrades the performance of face recognition based on combinational mirror-like odd...
Recent studies have suggested significant differences in the motor performance of Parkinson's Disease (PD) patients who have L-dopa induced dyskinesias (LIDs), even when off L-dopa medication. The pathophysiology of LIDs remains obscure, so applying data-mining techniques to the patients' motor performance may provide some heuristic insight. This paper investigated visually-guided tracking performance...
Considering the wide spectrum of both practical and research applicability, opinion mining has attracted increased attention in recent years. This article focuses on breaking the domain-dependency barrier which occurs in supervised opinion mining strategies by using a semi-supervised approach, which ensures domain independence. Our work devises a generalized methodology by considering a set of grammar...
Commercial websites usually contain noisy information blocks along with the main content. Noisy information degrades the performance of web content mining, which is used to discover useful knowledge or information from a web page. In this paper, we propose a noise elimination method that uses tag-based filtering followed by structural analysis of the web page. The proposed tag-based filtering...
One of the main goals of process mining is to automatically discover meaningful process models from event logs. Since these logs are the essential source of information for discovery algorithms, their quality is of high importance. In recent years, many studies on the quality of resulting process models have been conducted. However, the analysis of event log quality prior to the generation of models...
Grid-based clustering algorithms are attractive for the task of data partitioning in spatial databases. In the context of big data, more and more research focuses on how to resolve the conflict between the efficiency and accuracy of clustering. Existing grid-based clustering algorithms generally have high time efficiency but do not consider the distribution of the data inside a grid. In this paper, a...
Web databases contain a huge amount of structured data, which can be easily obtained only via their query interfaces. The query results are presented in dynamically generated web pages, usually in the form of data records, for human use. Automatic web data extraction is critical in web integration. A number of approaches have been proposed. The early work is mostly based on the source code or the...
In this paper we propose a noise detection system based on similarities between instances. Given a data set with instances that belong to multiple classes, a noise instance denotes a wrongly classified record. The similarity between differently labeled instances is determined by computing distances between them using several standard metrics. In order to ensure that this approach is computational...
We previously proposed artificial fiber (AF) patterns for hiding information in printed documents. The AF pattern uses the features of the medium (e.g., paper). It offers rotational invariance and low visibility of the hidden information. However, it still suffered from extraction threshold instability when a camera was used to extract the information. This problem has now been overcome...
In this paper, we propose a method for conversation summarization. For the method, we combine two approaches, a scoring method and a machine learning technique (SVMs). First we compare important utterance extraction by the scoring method and SVMs. In the machine learning technique, we introduce verbal features, such as relations between utterances and anaphora features, and nonverbal features. Next...