Sequential pattern mining is one of the subareas of data mining. It finds repeating patterns in sequence databases, which are used to relate the various items in the data for different purposes. Because the data change over time, mining should be performed on the incremented or updated database to obtain the frequent...
Subspace clustering has typically been approached as an unsupervised machine learning problem. However, in several applications where the union-of-subspaces model is useful, it is also reasonable to assume access to a small number of labels. In this paper, we investigate the benefit that labeled data brings to the subspace clustering problem. We focus on incorporating labels into the k-subspaces...
Distributed computations on graphs have gained importance with the emergence of large graphs, e.g., in the web or social networks. Frameworks like Hadoop, Giraph, and Spark are used to process them. Yet they require advanced programming techniques to minimize skew and data shuffling. Declarative, query-like, yet efficient solutions, comparable to Pig for general-purpose analytics, are lacking...
Imputing missing attribute values in medical datasets, in order to extract hidden knowledge from them, is an interesting and very challenging research topic. Missing values cannot be eliminated from medical records. Possible reasons are that some tests may not have been conducted for cost reasons, values may have been missed when conducting clinical trials, values may not have been...
Distributed computing, data availability, and data analytics that support strategic decision making are key requirements for the success of any organization's business. These features are leading frontiers in current business and research, and they raise many expectations among end users. This paper attempts a design methodology for distributing the current data warehouse features...
The facility location problem is a well-known challenge in logistics that is proven to be NP-hard. In this paper, we specifically simulate the geographical placement of facilities to provide adequate service to customers. Determining reasonable center locations is an important challenge for management, since it directly affects future service costs. Generally, the objective is to place the central...
In batch systems, monitoring information at the level of individual jobs is crucial to optimize resource utilization and prevent misuse. However, the usage of network resources is especially difficult to track. To understand usage patterns in modern computing clusters, more detailed monitoring than existing solutions provide is required. Monitoring at the job level leads to dynamic graphs of processes...
In the real world, social networks are large-scale, noisy, and evolutionary. Communities are inherent characteristics of human interaction in social networks. Tracking evolutionary communities in dynamic social networks has become an increasingly important research topic. Several classic incremental clustering and evolutionary clustering algorithms have been proposed. However, they all face the problem of controlling...
Clustering high dimensional datasets is challenging due to the curse of dimensionality. One approach to address this challenge is to search for subspace clusters, i.e., clusters present in subsets of attributes. Recently the cartification algorithm was proposed to find such subspace clusters. The distinguishing feature of this algorithm is that it operates on a neighborhood database, in which for...
Identifying regions of interest (ROIs) in images is a very active research problem as it highly depends on the types and characteristics of images. In this paper we present a comparative evaluation of unsupervised learning methods, in particular clustering, to identify ROIs in solar images from the Solar Dynamics Observatory (SDO) mission. With the purpose of finding regions within the solar images...
In the email marketing industry, it is important to send content relevant to the recipient. If the recipients are uninterested, they may ignore the email or, worse, report it as spam. Such actions compromise the ability of the senders to deliver emails to the inboxes of other recipients and permanently harm their relationship with the uninterested recipients. Targeting highly engaged recipients with...
How can we retrieve meaningful information from a large and sparse graph? Traditional approaches focus on generic clustering techniques and on discovering dense clusters in a network graph; however, they tend to omit interesting patterns such as paradigmatic relations. In this paper, we propose a novel graph clustering technique that models the relations of a node using paradigmatic analysis. We...
With the help of the Internet, Massive Open Online Courses (MOOCs) are recognized as a new way to take courses via the web instead of in traditional classrooms. MOOCs can remove many limits of traditional courses, such as distance, time, and number of participants. At the same time, they bring new issues, such as a high dropout ratio. Nowadays, an increasing number of MOOCs are available and even more common people...
In the Web of data, entities are described by interlinked data rather than documents on the Web. In this work, we focus on entity resolution in the Web of data, i.e., identifying descriptions that refer to the same real-world entity. To reduce the required number of pairwise comparisons, methods for entity resolution perform blocking as a pre-processing step. A blocking technique places similar entity...
In the last decade, nature-inspired algorithms have gained considerable popularity in solving complex optimization problems. Partitional clustering deals with optimizing the distances of data points from the cluster centroids to classify a dataset into several groups (clusters). In this paper, we introduce clustering as an optimization problem and solve it with a recently developed natural meta-heuristic League...
In the field of neuropsychiatric disorders, it is known that brain segmentation is important for both detection and diagnosis. The segmentation of the brain, which leads to the computation of brain volume, has proved vital in the detection of many brain pathologies for which the Computed Tomography (CT) scan is the primary modality. Because Fuzzy c-Means (FCM) has proven to be robust, it is often...
Localisation is the process of determining the topographical location of sensor nodes in a wireless sensor network. The current localisation method is space-oriented and adopts GPS (Global Positioning System) signals to yield the location information of the nodes. GPS signals are suited for outdoor environments; however, they fail to work indoors, which makes it necessary to opt for terrestrial localisation...
Image segmentation plays an important role in analyzing medical images. Brain tumor detection is one of the applications that require brain image segmentation. Due to the complex nature of brain magnetic resonance images (MRI), an accurate computer-aided detection (CAD) system for brain tumor segmentation has many advantages over manual segmentation, as the latter requires a lot of time and its results...
The focus of this paper is on multitask learning over adaptive networks where different clusters of nodes have different objectives. We propose an adaptive regularized diffusion strategy using Gaussian kernel regularization to enable the agents to learn about the objectives of their neighbors and to ignore misleading information. In this way, the nodes will be able to meet their objectives more accurately...
DNS has been increasingly abused by adversaries for cyber-attacks. Recent research has leveraged DNS failures (i.e., DNS queries that result in a Non-Existent-Domain response from the server) to identify malware activities, especially domain-flux botnets that generate many random domains as a rendezvous technique for command-and-control. Using ISP network traces, we conduct a systematic analysis...