The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Several marketing problems involve prediction of customer purchase behavior and forecasting future preferences. We consider predictive modeling of large scale, bi-modal or multimodal temporal marketing data, for instance, datasets consisting of customer spending behavior over time. Such datasets are characterized by variability in purchase patterns across different customer subgroups and shifting...
This article introduces ARUBAS, a new framework to build associative classifiers. In contrast with many existing associative classifiers, it uses class association rules to transform the feature space and uses instance-based reasoning to classify new instances. The framework allows the researcher to use any association rule mining algorithm to produce the class association rules. Every aspect of the...
This paper shows the meaning of Pearson residuals as an indicator of statistical independence. While information granules of statistical independence of two variables can be viewed as determinants of 2times2-submatrices, those of three variables consist of several combinations of linear equations which will become residuals for odds ratio (outer products) when they are equal to 0. Interestingly, the...
Ordered information table is one of the most important research areas of granular computing. In this thesis, we introduce multiple decisions ordered information tables based on the concept of ordered information tables. Multiple decisions ordered information tables are used to describe the actual multiple decision attributes situation of reality. We study the process of rule extraction from multiple...
The function of proteins in the living cells varies with respect to their localizations. Extracellular plant proteins are responsible for vital functions such as nutrition acquisition, protection from pathogens, communication with other soil organisms, etc. Hence, characterizing these proteins and distinguishing them from intracellular proteins is of high interest to biologists. Nonetheless, the small...
In this paper we present a new algorithm for semisupervised clustering. We assume to have a small set of labeled samples and we use it in a clustering algorithm to discover relevant patterns. We study how our algorithm works against two other semisupervised algorithms when the data are multimodal. Then, we study the case where the user is able to produce few samples for some classes but not for each...
Most research on Internet topology is based on active measurement methods. A major difficulty in using these tools is that one comes across many unresponsive routers. Different methods of dealing with these anonymous nodes to preserve the connectivity of the real graph have been suggested. One of the more practical approaches involves using a placeholder for each unknown, resulting in multiple copies...
Recently, many commercial products, such as Google Trends and Yahoo! Buzz, are released to monitor the past search engine query frequency trend. However, little research has been devoted for predicting the upcoming query trend, which is of great importance in providing guidelines for future business planning. In this paper, a unified solution is presented for such a purpose. Besides the classical...
Humans communicate with text in thousands of languages, in dozens of scripts, in a variety of binary codes, on millions of topics. There is a need, for both government and commercial applications, to identify these text characteristics to enable follow-on processing such as transcoding, translation, transliteration, routing and prioritization. This paper deals with the implementation of real-time...
This paper presents a method to discover the discriminative patterns or features in hyperspectral data for classification. The proposed method searches the data space along both spectral and spatial frequency axis and combines the adjacent spectral and spatial frequency bands so that a simpler but more effective feature set is achieved. The algorithm is tested on hyperspectral images of hazelnut kernels...
Sensor networks play an important role in applications concerned with environmental monitoring, disaster management, and policy making. Effective and flexible techniques are needed to explore unusual environmental phenomena in sensor readings that are continuously streamed to applications. In this paper, we propose a framework that allows to detect outlier sensors and to efficiently construct outlier...
Coastal buoys and stations provide frequent, high quality marine observations for oceanographic study, weather service, atmospheric and public safety. Sharing of the generated data sets requires tremendous efforts and coordination among the different sensor network agencies to come to a shared understanding and for dissemination in a uniform way. Syntactic standardization provides data description...
In sequential pattern mining, languages based on regular expressions (RE) were proposed to restrict frequent sequences to the ones that satisfy user-specified constraints. In these languages, REs are applied over items. We propose a much powerful language, based on regular expressions, denoted RE-SPaM, where the basic elements are constraints over the attributes of the items. Expressions in this language...
Detection of anomalies in multivariate time series is an important data mining task with potential applications in medical diagnosis, ecosystem modeling, and network traffic monitoring. In this paper, we present a robust graph-based algorithm for detecting anomalies in noisy multivariate time series data. A key feature of the algorithm is the alignment of kernel matrices constructed from the time...
Knowledge discovery from temporal, spatial and spatiotemporal data is critical for climate change science and climate impacts. Climate statistics is a mature area. However, recent growth in observations and model outputs, combined with the increased availability of geographical data, presents new opportunities for data miners. This paper maps climate requirements to solutions available in temporal,...
We present and discuss several spatiotemporal kernels designed to mine real-life and simulated data in support of drought prediction. We implement and empirically validate these kernels for support vector machines. Issues related to the nature of geographic data such as autocorrelation and directionality are investigated.
On-line data stream mining has attracted much research interest, but systems that can be used as a workbench for online mining have not been researched, since they pose many difficult research challenges. The proposed system addresses these challenges by an architecture based on three main technical advances, (i) introduction of new constructs and synoptic data structures whereby complex KDD queries...
Nowadays, small and medium enterprises (SMEs) are forced to compete on a global market and to make strategic decisions in short periods of time. In order to allow SMEs access to information technologies which can support their competition on a global scale, public administrations are fostering the setting up of digital districts. In this paper, we describe a distributed collaborative data mining platform,...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.