Research on mining imbalanced data sets has grown rapidly in recent years because of the difficulty of the problem and its extensive real-world applications. A dataset is said to be imbalanced if its attribute categories are not approximately evenly represented. Existing classifiers tend to perform poorly on imbalanced datasets. Hence it is essential to go for well balanced...
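To make the notion of imbalance concrete, here is a minimal sketch that measures how unevenly the classes of a labelled dataset are represented. The function name `imbalance_ratio` and the toy labels are assumptions for illustration only; they do not come from the paper, whose balancing method is not shown in the truncated abstract.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (size of largest class) / (size of smallest class).

    A value close to 1.0 means the classes are roughly balanced;
    larger values indicate a stronger imbalance.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    return max(counts.values()) / min(counts.values())


# Hypothetical toy dataset: 95 negative examples and 5 positive ones.
labels = ["neg"] * 95 + ["pos"] * 5
print(imbalance_ratio(labels))  # 19.0
```

A ratio this high is the kind of setting in which a classifier can reach high accuracy by always predicting the majority class, which is why accuracy alone is usually a poor measure on such data.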
Opinion mining is essential for e-commerce websites and is also advantageous for individuals. An ever-increasing amount of content is stored on the web, and the number of people acquiring items from the web is increasing as well. As a result, users' reviews and posts are increasing day by day. The reviews directed at shipper sites express users' feelings. Any organization for example,...
Correctness and completeness are the two major factors in the medical field for taking an accurate treatment decision within a limited span of time. Automated Patient Records (APR) help the Health Management Organization (HMO) to take decisions on any specific disease. Retrieving data from among the huge number of APRs is very important to HMOs. Proposed Collocation Rules in the spatial data mining will...
Watermarking is the process of embedding information into a digital signal, which may be used to verify its authenticity or the identity of its owners, and which can in turn help resolve the pilfering of analytical properties. Watermarking plays a major role in the quality retrieval of the data or message embedded in the image. The process of data retrieval is critical, as many stumbling blocks are there...
Graphs are widely used to represent many different kinds of real world data such as social networks, protein-protein interactions, and road networks. In many cases, each node in a graph is associated with a set of its attributes and it is critical to not only consider the link structure of a graph but also use the attribute information to achieve more meaningful results in various graph mining tasks. Most...
Mining subgraph patterns is an active area of research. Existing research has primarily focused on mining all subgraph patterns in the database. However, due to the exponential subgraph search space, the number of patterns mined, typically, is too large for any human mediated analysis. Consequently, deriving insights from the mined patterns is hard for domain scientists. In addition, subgraph pattern...
High-dimensional and sparse (HiDS) matrices are commonly encountered in many big data-related industrial applications like recommender systems. When acquiring useful patterns from them, non-negative matrix factorization (NMF) models have proven to be highly effective because of their fine representativeness of non-negative data. However, current NMF techniques suffer from a) inefficiency in addressing...
The problem considered in this paper is regression with a constraint on the precision of each prediction in the framework of data streams subject to concept drifts (when the hidden distribution which generates the observations can change over time). Concept drifts can diminish the reliability of the predictions over time and it might not be possible to output a prediction which satisfies the constraints...
The Levy Walk (or Levy flight) is a concept from Biomathematics to describe the hunting behaviour of many predatory species. It is a very efficient way to find prey in a very short time frame. We now want to use this concept in a clustering context to, if you so will, "hunt" for clusters. We describe how we convert this concept into an efficient way to find cluster centres by linking the data...
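As background for the abstract above, the following is a minimal sketch of a two-dimensional Levy flight: a random walk whose step lengths follow a heavy-tailed power law, so that many short moves are occasionally interrupted by a very long jump. It illustrates only the walk itself, not the authors' clustering procedure, which the truncated abstract does not describe. The function name `levy_walk` and the parameters `alpha`, `min_step`, and `seed` are assumptions chosen for illustration.

```python
import math
import random


def levy_walk(n_steps, alpha=1.5, min_step=1.0, seed=0):
    """Simulate a 2D Levy flight and return the visited points.

    Step lengths are drawn from a Pareto (power-law) distribution with
    tail exponent `alpha` via inverse-transform sampling; directions are
    uniform on the circle. All parameter defaults are illustrative.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        u = rng.random()  # uniform in [0, 1), so 1 - u lies in (0, 1]
        step = min_step * (1.0 - u) ** (-1.0 / alpha)  # always >= min_step
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += step * math.cos(angle)
        y += step * math.sin(angle)
        path.append((x, y))
    return path


path = levy_walk(100)
longest = max(math.dist(p, q) for p, q in zip(path, path[1:]))
print(len(path), "points; longest single jump:", round(longest, 2))
```

Smaller values of `alpha` make long jumps more frequent, which is the property that lets such a walk cover a search space quickly.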
With the increasing popularity of online review sites, developing methods to mine and analyze information contained in the vast amounts of noisy user-generated reviews becomes a necessity. In this work, we develop a method to uncover the various aspects of a product or service reviewed by a user, and the opinions associated with them, in an automated fashion. We use the neural network model Word2Vec...
In this paper we present a novel Markov Switching generative model for continuous multivariate time series and longitudinal data based on Gaussian copula functions. We assume that the values of the multivariate time series at every time slice are sampled out of a joint probability distribution that is selected by the latent state. The use of Gaussian copula functions gives the flexibility of individual...
In this paper, we present a new data mining framework for discovering sequence effects. In particular, we focus on the sequences consisting of actions that are taken in chronological order, like sequences of clinical procedures or marketing actions. Each sequence is associated with a binary outcome, a success or a failure. We investigate the hypothesis that certain subsequences of actions contribute...
Many kinds of real world data can be modeled by a heterogeneous information network (HIN) which consists of multiple types of objects. Clustering plays an important role in mining knowledge from HIN. Several HIN clustering algorithms have been proposed in recent years. However, these algorithms suffer from one or more of the following problems: (1) inability to model general HINs, (2) inability to...
We develop a warped correlation finder to identify correlated user accounts in social media websites such as Twitter. The key observation is that humans cannot be highly synchronous for a long duration; thus, highly synchronous user accounts are most likely bots. Existing bot detection methods are mostly supervised, which require a large amount of labeled data to train, and do not consider cross-user...
The Market of Data is an environment where data are reasonably dealt with. Some data in the market of data are large and hard to analyze. How to efficiently analyze and organize such large-scale data in the market of data is a difficult problem. When using Hadoop to analyze these massive data, if the input data of a data mining task are not locally available in a processing node, data have to be migrated...
As a novel concept, "Informed Design" is proposed in a multidisciplinary project "Livable Places" in Singapore to innovate place design from empirical to evidential by harnessing geo-referenced "Big Data" for a responsive design. As a final delivery, an Informed Design Platform (IDP) is being implemented as a design support tool interpreting multi-source big data to adaptive...
Complex data analytics that involve data mining often comprise not only a single algorithm but also further data processing steps, for example, to restrict the search space or to filter the result. We demonstrate graph mining with Gradoop, the first scalable system supporting declarative analytical programs composed from multiple graph operations. We use a business intelligence example including frequent...
Clustering vertices in graphs or in sequences of graphs has important applications in network science, bioinformatics, and other areas. Most research to date has focused on static graphs or sequences where the number of vertices does not change. We propose a new algorithm that successfully partitions the vertices of a graph sequence into smooth clusters, even when the number of vertices is allowed...
Skypatterns are an elegant answer to the pattern explosion issue, when a set of measures can be provided. Skypatterns for all possible measure combinations can be explored thanks to recent work on the skypattern cube. However, this leads to too many skypatterns, where it is difficult to quickly identify which ones are more important. First, we introduce a new notion of pattern steadiness which measures...
In this work, we propose Max-Node sampling, a novel sampling algorithm for data collection. The goal of Max-Node is to maximize the number of nodes observed in the sample, given a budget constraint. Max-Node is based on the intuition that networks contain many densely connected regions (i.e., communities) that may be only weakly connected to one another, and to maximize the number of nodes observed,...