In this paper, we present a dynamic clustering algorithm that efficiently handles data streams and achieves several important properties that are not generally found together in a single algorithm. The dynamic clustering algorithm operates online in two stages with different time scales: a fast distance-based stage that generates micro-clusters and a density-based stage that groups the micro-clusters...
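A generic two-stage sketch may help picture the idea. It is not the paper's algorithm, whose details are cut off above. It assumes Euclidean points, and the names `online_stage`, `offline_stage`, `radius`, `link_dist` and `min_weight` are hypothetical. Each micro-cluster keeps only a count and a linear sum, so the online stage is cheap per point. The slower density-based stage then links sufficiently heavy micro-clusters whose centres are close.

```python
import math


class MicroCluster:
    """Summary of nearby points: only a count and a per-dimension linear sum."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)

    @property
    def center(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x


def online_stage(stream, radius):
    """Fast distance-based stage (hypothetical rule): join the nearest
    micro-cluster if it is within `radius`, otherwise open a new one."""
    mcs = []
    for p in stream:
        nearest = min(mcs, key=lambda m: math.dist(m.center, p), default=None)
        if nearest is not None and math.dist(nearest.center, p) <= radius:
            nearest.absorb(p)
        else:
            mcs.append(MicroCluster(p))
    return mcs


def offline_stage(mcs, link_dist, min_weight):
    """Density-based stage (hypothetical rule): keep micro-clusters holding at
    least `min_weight` points and connect those whose centres are within
    `link_dist`; each connected component is one macro-cluster."""
    dense = [m for m in mcs if m.n >= min_weight]
    labels = [-1] * len(dense)
    cluster = 0
    for i in range(len(dense)):
        if labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            for k in range(len(dense)):
                if labels[k] == -1 and math.dist(dense[j].center, dense[k].center) <= link_dist:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return dense, labels
```

Because only the summaries are retained, the second stage can be re-run at a coarser time scale without revisiting raw points.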
Traditional CFSFDP (Clustering by Fast Search and Find of Density Peaks) recognizes clusters well regardless of their shape and of the dimensionality of the space in which they are embedded. However, when a large-scale dataset is processed, computing the distances between all pairs of data points takes too long. In this paper, we present a novel MapReduce-based CFSFDP clustering algorithm called...
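The cost the abstract refers to comes from the two quantities CFSFDP computes for every point: the local density rho (here with the cutoff kernel, counting neighbours closer than `d_c`) and delta, the distance to the nearest point of higher density. The single-machine sketch below shows why all pairwise distances are needed. Picking centres by the largest rho × delta is the usual heuristic, and `select_centers` is a hypothetical helper, not the paper's MapReduce design.

```python
import math


def density_peaks(points, d_c):
    """Return the CFSFDP quantities (rho, delta) for every point."""
    n = len(points)
    # All pairwise distances: this O(n^2) step is what becomes slow on large datasets.
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Local density with the cutoff kernel: number of neighbours closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        # Distance to the nearest point of strictly higher density.
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        # Points of maximal density get their largest distance instead.
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta


def select_centers(rho, delta, k):
    """Pick the k points with the largest gamma = rho * delta as cluster centres."""
    return sorted(range(len(rho)), key=lambda i: rho[i] * delta[i], reverse=True)[:k]
```

With the toy input `[(0, 0), (0.2, 0), (-0.2, 0), (0, 0.2), (5, 5), (5.2, 5), (4.8, 5), (5, 5.2)]` and `d_c = 0.25`, the hub of each group, (0, 0) and (5, 5), is selected as a centre.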
The class of density-based clustering algorithms excels at detecting clusters of arbitrary shape. DBSCAN, its most common representative, has proven useful in many applications. Still, the algorithm suffers from two drawbacks: estimating its parameters for a given dataset is non-trivial, and it is limited to datasets with constant cluster density. The first was already addressed...
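For reference, a textbook DBSCAN makes both drawbacks concrete. The user must choose `eps` and `min_pts`, and a single global `eps` implicitly assumes one cluster density. This is a plain quadratic-time sketch without a spatial index. Counting the point itself in its own neighbourhood is one common convention, assumed here.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region_query(points, j, eps)
            if len(more) >= min_pts:  # j is a core point: expand through it
                seeds.extend(more)
    return labels
```

Two groups with very different spreads cannot both be recovered with one `eps`, which is the constant-density limitation mentioned above.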
One of the more challenging real-world problems in computational intelligence is learning from non-stationary streaming data, that is, data subject to concept drift. An even more challenging version of this scenario arises when, following a small set of initial labeled data, the data stream consists of unlabeled data only. Such a scenario is typically referred to as learning in initially labeled nonstationary...
This paper extends our previous work on deriving meaningful storm patterns from very large rainfall data. In an earlier work, we described MapReduce-based algorithms to identify three types of storms: local, hourly and overall storms. In general, local storms describe the temporal characteristics of storms at a particular site, while hourly storms describe the spatial characteristics of storms at a particular...
We propose a framework for detecting and differentiating Twitter events and quantifying their significance for predicting spikes in sales. In previous approaches, Twitter events have mainly been differentiated based on spatial, temporal or topic information. We suggest a novel approach that clusters Twitter events based on their shapes (taking into account growth...
In this paper, we present a new approach to distributed clustering for spatial datasets, based on an innovative and efficient aggregation technique. This distributed approach consists of two phases: 1) a local clustering phase, in which each node clusters its local data, and 2) an aggregation phase, in which the local clusters are aggregated to produce global clusters. This approach is characterised...
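A minimal sketch of the two-phase structure, assuming each node summarises its local clusters as (centroid, count) pairs and only those summaries travel to the aggregator. The leader clustering in phase 1 and the merge rule in phase 2 (union of summaries whose centroids lie within a hypothetical `merge_dist`) are stand-ins chosen for illustration. They are not the paper's aggregation technique, which the abstract does not describe.

```python
import math


def local_clustering(points, radius):
    """Phase 1 on one node (stand-in: leader clustering). Returns (centroid, count)."""
    clusters = []  # each entry: [coordinate sums, count]
    for p in points:
        for cl in clusters:
            centroid = [s / cl[1] for s in cl[0]]
            if math.dist(centroid, p) <= radius:
                cl[0] = [s + x for s, x in zip(cl[0], p)]
                cl[1] += 1
                break
        else:
            clusters.append([list(p), 1])
    return [(tuple(s / n for s in sums), n) for sums, n in clusters]


def aggregate(summaries, merge_dist):
    """Phase 2: merge local summaries from all nodes whose centroids are close."""
    parent = list(range(len(summaries)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(summaries)):
        for j in range(i + 1, len(summaries)):
            if math.dist(summaries[i][0], summaries[j][0]) <= merge_dist:
                parent[find(i)] = find(j)
    groups = {}
    for i, (c, n) in enumerate(summaries):
        groups.setdefault(find(i), []).append((c, n))
    result = []
    for members in groups.values():
        total = sum(n for _, n in members)
        dim = len(members[0][0])
        # Count-weighted centroid of the merged local clusters.
        centroid = tuple(sum(c[d] * n for c, n in members) / total for d in range(dim))
        result.append((centroid, total))
    return result
```

Shipping only summaries keeps communication proportional to the number of local clusters rather than the number of points.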
Spectral variability, unrelated to the purity of endmembers, can change the geometry of the dataspace and affect conventional methods used to identify endmembers. Several methods have been developed to identify and extract endmember bundles representing the spectral variability within each endmember class. These methods, however, operate on the geometry of the dataspace. In addition, they commonly...
The number of devices capable of measuring Power Quality (PQ) parameters is increasing continuously at all voltage levels. Consequently, the amount of available PQ data is also growing very fast. These data contain a great deal of valuable information about PQ behavior, but so far they have in most cases been used only to assess compliance with limits (e.g. EN 50160 in Europe). Besides long-term characteristics...
Leukemia is routinely diagnosed from light microscopic images. However, pathologists' criteria for diagnosing a disease from images are mostly qualitative and empirical in nature. Reports suggest that although leukemia is a cancer of leukocytes, there are also morphological alterations of red blood cells (RBCs) in leukemia. This has been evident from the observation of ultra-structural...
This paper presents a method for human action recognition from depth sequences. First, we subdivide the normalized motion energy vector into a set of segments, whose corresponding frame indices are used to partition a video. Then each sub-action is represented by three Depth Motion Maps (DMMs) to capture motion cues in three orthogonal projection views. Multi-scale Histogram of Oriented Gradients...
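The two preprocessing steps can be sketched for a single projection view, assuming each depth frame is a 2-D list of depth values. Splitting the normalised cumulative motion energy into `n_segments` equal-energy parts is an assumption about how the segmentation is done. The DMM here is the plain sum of absolute differences between consecutive frames, with no thresholding. The side and top views would first require projecting each depth frame onto the other two orthogonal planes, which is omitted.

```python
def frame_energy(prev, curr):
    """Motion energy between two frames: sum of absolute depth differences."""
    return sum(abs(c - p) for row_p, row_c in zip(prev, curr) for p, c in zip(row_p, row_c))


def energy_segments(frames, n_segments):
    """Split frame indices so that each segment carries roughly 1/n_segments of
    the total motion energy. Returns inclusive (start, end) frame-index pairs."""
    cum = [0.0]
    for t in range(len(frames) - 1):
        cum.append(cum[-1] + frame_energy(frames[t], frames[t + 1]))
    total = cum[-1]
    if total == 0:
        return [(0, len(frames) - 1)]  # no motion: keep the whole sequence
    cum = [c / total for c in cum]  # normalised cumulative energy, ends at 1.0
    bounds = [0]
    for k in range(1, n_segments):
        bounds.append(next(t for t, c in enumerate(cum) if c >= k / n_segments))
    bounds.append(len(frames) - 1)
    return list(zip(bounds, bounds[1:]))


def depth_motion_map(frames):
    """DMM for one view: accumulate |frame[t+1] - frame[t]| over the sequence."""
    rows, cols = len(frames[0]), len(frames[0][0])
    dmm = [[0.0] * cols for _ in range(rows)]
    for prev, curr in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                dmm[r][c] += abs(curr[r][c] - prev[r][c])
    return dmm
```

Each segment `(a, b)` yields a sub-action whose DMM is `depth_motion_map(frames[a:b + 1])`.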
In this paper, an approach to Predictive Maintenance (PdM) problems with time-series data is discussed. PdM is an important approach to maintenance and is gaining increasing attention in advanced manufacturing as a way to minimize scrap materials, downtime, and associated costs. PdM approaches are generally based on Machine Learning tools that require the availability of historical process...
Venn and Euler diagrams are well-defined types of mathematical diagram and the major methods of representation in set theory. Although the understanding of other diagram types, such as charts and coordinate graphs, has been addressed, no research has been done on interpreting Venn and Euler diagrams from an image. Venn and Euler diagrams appear in various media types, such as printed format in books,...
Data streams are a relatively new and emerging domain in the current era of Internet advancement. Clustering data streams is both important and difficult because of the numerous hurdles involved. A number of algorithms have been proposed to offer solutions for efficient clustering. The grid-based clustering approach was adopted a few years ago to overcome the limitations of conventional partition-based...
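A minimal sketch of the grid-based idea, under stated assumptions: each arriving point is mapped to a fixed-size grid cell, cell densities optionally fade by a hypothetical `decay` factor per arrival, cells whose density reaches `density_threshold` are dense, and face-adjacent dense cells form a cluster. The class and parameter names are illustrative. Practical stream algorithms typically apply decay lazily using timestamps rather than touching every cell on each insert, as this sketch does.

```python
import math
from collections import defaultdict


def cell_of(point, cell_size):
    """Integer grid coordinates of the cell containing `point`."""
    return tuple(math.floor(x / cell_size) for x in point)


class GridStreamClusterer:
    def __init__(self, cell_size, density_threshold, decay=1.0):
        self.cell_size = cell_size
        self.threshold = density_threshold
        self.decay = decay  # 1.0 means no fading
        self.density = defaultdict(float)

    def insert(self, point):
        # Fade existing cells (eagerly, for simplicity), then add the new point.
        if self.decay != 1.0:
            for key in self.density:
                self.density[key] *= self.decay
        self.density[cell_of(point, self.cell_size)] += 1.0

    def clusters(self):
        """Connected components of dense cells (face adjacency)."""
        dense = {c for c, d in self.density.items() if d >= self.threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            component, stack = [], [start]
            while stack:
                c = stack.pop()
                component.append(c)
                for dim in range(len(c)):
                    for step in (-1, 1):
                        nb = c[:dim] + (c[dim] + step,) + c[dim + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            result.append(sorted(component))
        return result
```

Only per-cell counts are stored, so memory depends on the number of occupied cells rather than on the length of the stream.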
Line graphs are abundant in scholarly papers. They are usually generated from a data table, and that underlying data cannot be accessed. One important step in an automated data extraction pipeline is the curve separation problem: segmenting the pixels into separate curves. Previous work in this domain has focused on raster graphics extracted from scholarly PDFs, whereas most scholarly plots are embedded as vector...
This paper puts forward a polygonal representation method. The method, based on recursion and boundary division, describes the shape of incomplete and overlapping weed seeds. It extracts contour shape features as local features using the scale-space method. The local features are independent of position and orientation and, at the same time, meet the scale, rotation and...
Interaction experience in multimedia systems can be improved by adding personalization. Current applications for building and animating characters that represent real users are typically based on pose and motion detection. In doing so, computer vision algorithms do not exploit the anatomical characteristics of the human body to improve their classification accuracy. This work presents a strategy...
Nowadays, since urban planning and landscape simulations need to capture the "characteristics" of target buildings, especially their structures, the reconstruction of 3D building models, which is necessary for visualization on a computer, has become crucial. In 3D landscape modeling, a 3D building model that represents reality well is always the utmost goal. We have already proposed an automatic...
In the era of information, extracting useful information from massive amounts of data and processing it in less time has become a crucial part of data mining. CURE is a very useful hierarchical algorithm that can identify clusters of arbitrary shape as well as outliers. In this paper, we have implemented the CURE clustering algorithm in a distributed environment using Apache Hadoop...
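What lets CURE follow arbitrary shapes is that each cluster is represented by `c` well-scattered points shrunk towards the centroid by a fraction `alpha`, rather than by a single centroid. The distance between two clusters is then the minimum distance between their representatives. The single-node sketch below shows only that step. It is not the Hadoop implementation described above, and the function names are hypothetical.

```python
import math


def cure_representatives(cluster, c, alpha):
    """Pick up to c well-scattered points and shrink them towards the centroid."""
    dim = len(cluster[0])
    centroid = tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))
    # First representative: the point farthest from the centroid.
    reps = [max(cluster, key=lambda p: math.dist(p, centroid))]
    while len(reps) < min(c, len(cluster)):
        # Next: the point farthest from all representatives chosen so far.
        reps.append(max(cluster, key=lambda p: min(math.dist(p, r) for r in reps)))
    # Shrinking by alpha dampens the influence of outliers on the representatives.
    return [tuple(x + alpha * (m - x) for x, m in zip(r, centroid)) for r in reps]


def cure_distance(reps_a, reps_b):
    """Cluster distance used when merging: closest pair of representatives."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)
```

With `alpha = 0` the representatives are the raw scattered points (single-link-like behaviour); with `alpha = 1` they collapse onto the centroid (centroid-based behaviour).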
Data mining has gained considerable importance in research these days. It is well suited to analyzing data from any field and providing decision-based output. Data are now generated and stored at high speed, and non-stationary systems play a major role in providing such data. The availability of such data creates scope for analysis by researchers. Such data, which are continuous, unbounded,...