Sensor networks play an important role in applications concerned with environmental monitoring, disaster management, and policy making. Effective and flexible techniques are needed to explore unusual environmental phenomena in sensor readings that are continuously streamed to applications. In this paper, we propose a framework that makes it possible to detect outlier sensors and to efficiently construct outlier...
Knowledge discovery from temporal, spatial and spatiotemporal data is critical for climate change science and climate impacts. Climate statistics is a mature area. However, recent growth in observations and model outputs, combined with the increased availability of geographical data, presents new opportunities for data miners. This paper maps climate requirements to solutions available in temporal,...
In sequential pattern mining, languages based on regular expressions (RE) were proposed to restrict frequent sequences to the ones that satisfy user-specified constraints. In these languages, REs are applied over items. We propose a much more powerful language, based on regular expressions, denoted RE-SPaM, where the basic elements are constraints over the attributes of the items. Expressions in this language...
Unlike the traditional incremental updating problem for discrete data, data appended to a spatial dataset may introduce many new relations between the added events and the existing events. Moreover, as the measure used in mining colocation patterns, the participation index is more complicated to handle than a simple support counter. Thus, the incremental maintenance of colocation patterns for dynamic...
Interpreted languages frequently suffer from higher processing times as compared to compiled approaches. Typically this happens when complex computations are performed. Array DBMSs, which extend database functionality with multidimensional array modeling and query support, find themselves in exactly this situation: queries often involve a large number of operations, and each such operation is applied...
This paper presents a method to discover the discriminative patterns or features in hyperspectral data for classification. The proposed method searches the data space along both the spectral and spatial frequency axes and combines the adjacent spectral and spatial frequency bands so that a simpler but more effective feature set is achieved. The algorithm is tested on hyperspectral images of hazelnut kernels...
Traffic routes through a street network contain patterns and are not random walks. Such patterns exist, for instance, along streets or between neighbouring street segments. The extraction of these patterns is a challenging task due to the enormous size of city street networks, the large amount of required training data and the unknown distribution of the latter. We apply Bayesian Networks to model the...
Remote sensing has been applied to agriculture at very coarse levels of granularity (i.e., national levels) but few investigations have focused on yield prediction at the farm unit level. Specific aims of the present investigation are to analyze the ability of Moderate Resolution Imaging Spectroradiometer (MODIS) data to predict cotton yields in two highly homogeneous counties in west Texas. In one...
Recently, many commercial products, such as Google Trends and Yahoo! Buzz, have been released to monitor past trends in search engine query frequency. However, little research has been devoted to predicting the upcoming query trend, which is of great importance in providing guidelines for future business planning. In this paper, a unified solution is presented for such a purpose. Besides the classical...
We introduce s-kNN, a nearest neighbor based spatial data mining algorithm. It belongs to the class of vector-geometry based algorithms that reason on complex spatial objects instead of point measurements. In contrast to most methods in this class, it performs on-the-fly spatial computations that cannot be replaced by a pre-processing step without sacrificing efficiency. The key is a partial evaluation...
This paper addresses the problem of detecting and tracking moving clusters in spatio-temporal data sets. Spatio-temporal data sets contain data elements that move in space over time. Traditional data clustering algorithms work well on static data sets that contain well separated clusters. When traditional techniques are applied to spatio-temporal data they break down when the moving data elements intersect...
In many practical situations it is not feasible to collect labeled samples for all available classes in a domain. Especially in supervised classification of remotely sensed images it is impossible to collect ground truth information over large geographic regions for all thematic classes. As a result, analysts often collect labels for aggregate classes (e.g., Forest, Agriculture, Urban). In this paper...
Humans communicate with text in thousands of languages, in dozens of scripts, in a variety of binary codes, on millions of topics. There is a need, for both government and commercial applications, to identify these text characteristics to enable follow-on processing such as transcoding, translation, transliteration, routing and prioritization. This paper deals with the implementation of real-time...
Decision tree-based classification is a popular approach for pattern recognition and data mining. Most decision tree induction methods assume that the training data are present at one central location. Given the growth in distributed databases at geographically dispersed locations, the methods for decision tree induction in distributed settings are gaining importance. This paper describes one distributed...
This paper describes how distributed data mining models, such as collective learning, ensemble learning, and meta-learning models, can be implemented as WSRF mining services by exploiting the Grid infrastructure. Our goal is to design a general distributed architectural model that can be exploited for different distributed mining algorithms deployed as Grid services for the analysis of dispersed data...
Weka4WS is an extension of the Weka toolkit to support remote execution of data mining tasks as grid services. A first version of Weka4WS supporting concurrent execution of multiple data mining tasks on remote grid nodes has been presented in a previous work. In this paper we present a new version that also supports the composition and execution of data mining workflows on a grid. This new version of...
Within business intelligence contexts, the importance of data mining algorithms is continuously increasing, particularly from the perspective of applications and users that demand novel algorithms on the one hand and an efficient implementation exploiting novel system architectures on the other hand. Within this paper, we focus on the latter issue and report our experience with the exploitation of...
With the emergence of large-volume and high-speed streaming data, the recent techniques for stream mining of CFIs (closed frequent itemsets) will become inefficient. When concept drift occurs at a slow rate in high speed data streams, the rate of change of information across different sliding windows will be negligible. So, the user won't be devoid of change in information if we slide window...
Data clustering has been proven to be a promising data mining technique. Recently, there have been many attempts at clustering market-basket data. In this paper, we propose a parallelized hierarchical clustering approach on market-basket data (PH-Clustering), which is implemented using MPI. Based on the analysis of the major clustering steps, we adopt a partial local and partial global approach to...
Distance computation is one of the most computationally intensive operations employed by many data mining algorithms. Performing such matrix computations within a DBMS creates many optimization challenges. We propose techniques to efficiently compute Euclidean distance using SQL queries and user-defined functions (UDFs). We concentrate on efficient Euclidean distance computation for the well-known...
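As a rough illustration of the idea in the abstract above (Euclidean distance computed by SQL queries plus a user-defined function), the following is a minimal sketch and not the authors' technique. It assumes SQLite via Python's standard `sqlite3` module, a "vertical" layout with one row per (point, dimension), and hypothetical names `x`, `r`, `py_sqrt` and `pairwise_distances`; the abstract specifies none of these.

```python
import math
import sqlite3


def pairwise_distances(points, refs):
    """Return {(i, j): Euclidean distance between points[i] and refs[j]}.

    The arithmetic runs inside SQLite; Python only loads the data.
    Table layout and all names are assumptions made for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    # Register a scalar UDF, since SQLite's built-in sqrt() is optional.
    conn.create_function("py_sqrt", 1, math.sqrt)

    # Vertical layout: one row per (point id, dimension index, value).
    conn.execute("CREATE TABLE x (i INTEGER, l INTEGER, val REAL)")
    conn.execute("CREATE TABLE r (j INTEGER, l INTEGER, val REAL)")
    conn.executemany(
        "INSERT INTO x VALUES (?, ?, ?)",
        [(i, l, v) for i, p in enumerate(points) for l, v in enumerate(p)],
    )
    conn.executemany(
        "INSERT INTO r VALUES (?, ?, ?)",
        [(j, l, v) for j, q in enumerate(refs) for l, v in enumerate(q)],
    )

    # Join on the dimension index, sum squared differences per pair,
    # then apply the UDF to take the square root.
    rows = conn.execute(
        "SELECT x.i, r.j, "
        "       py_sqrt(SUM((x.val - r.val) * (x.val - r.val))) "
        "FROM x JOIN r ON x.l = r.l "
        "GROUP BY x.i, r.j"
    ).fetchall()
    conn.close()
    return {(i, j): d for i, j, d in rows}
```

Calling `pairwise_distances([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0)])` yields one distance per (point, reference) pair. The join-and-aggregate form keeps the query independent of the number of dimensions, at the cost of a larger intermediate result than a one-row-per-point layout would need.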