Information is becoming the most precious resource. Mining implied information from unstructured data has become essential in several project situations, and this is itself a complex process. Anonymity in data makes the data mining process much more difficult. Data anonymization techniques, as used in areas such as cryptography and cloud computing, aim to limit the possibilities of unnecessary...
The aim of this paper is to develop an analytical query model for data categorization within a DBMS. Because the DBMS is a key asset for most organizations, classification can help in getting better insight into and control over the data. Conventionally, classification algorithms like logistic regression, KNN, etc. are applied after exporting the data out of the DBMS, using non-DBMS tools like R, matrix...
Supervised methods for inferring gene regulatory networks (GRNs) perform well with good training data. However, when training data is absent, these methods are not applicable. Unsupervised methods do not need training data but their accuracy is low. In this paper, we combine supervised and unsupervised methods to infer GRNs using time-series gene expression data. Specifically, we use results obtained...
How can we rank users in signed social networks? Relationships between nodes in a signed network are represented as positive (trust) or negative (distrust) edges. Many social networks have adopted signed networks to express trust between users. Consequently, ranking friends or enemies in signed networks has received much attention from the data mining community. The ranking problem, however, is challenging...
The Levy Walk (or Levy flight) is a concept from biomathematics that describes the hunting behaviour of many predatory species. It is a very efficient way to find prey in a very short time frame. We now want to use this concept in a clustering context to, if you so will, "hunt" for clusters. We describe how we convert this concept into an efficient way to find cluster centres by linking the data...
In this research, we have developed a model for predicting the profitability class of a movie, namely "Profit" or "Loss", based on data about movies released between the years 2010 and 2015. Our methodology considers both historical data and data extracted from social media. This data is normalized and then given a weight using standard normalization techniques. The...
We propose a new approach to anomaly detection from multivariate noisy sensor data. We address two major challenges: To provide variable-wise diagnostic information and to automatically handle multiple operational modes. Our task is a practical extension of traditional outlier detection, which is to compute a single scalar for each sample. To consistently define the variable-wise anomaly score, we...
Authorship identification is a problem of data mining and classification. Numerous methods and algorithms have been published to understand its nature. However, researchers are still searching for the best and simplest solutions because of its heterogeneous and multilingual characteristics. This study introduces a new authorship identification process based on an artificial neural network (ANN) model using...
Anomaly or outlier detection is a fundamental task of data mining and is widely used in various application domains. The main aim of anomaly detection is to identify all the data points that deviate significantly from other, normal data points. Mining outliers becomes more challenging in environments where data is received at an extreme pace. Such environments demand detection of outliers on-the-fly mode...
Local Process Model (LPM) discovery is focused on the mining of a set of process models where each model describes the behavior represented in the event log only partially, i.e. subsets of possible events are taken into account to create so-called local process models. Often such smaller models provide valuable insights into the behavior of the process, especially when no adequate and comprehensible...
An e-commerce website provides a platform for merchants to sell products to customers. While most existing research focuses on providing customers with personalized product suggestions by recommender systems, in this paper, we consider the role of merchants and introduce a parallel problem, i.e., how to select the most valuable customers for a merchant? Accurately answering this question can not only...
How to succinctly represent the truly relevant information in big data graphs? The approach presented in this paper aims to discover hidden graph structures and exploit them to compactly summarize large graphs. First, we show that some special graph classes such as cliques and bicliques can be represented efficiently as Pseudo-Boolean (PB) constraints. Then, we propose three new graph classes representable...
The rapid spread of mobile internet and location-acquisition technologies has led to the increasing popularity of Location-Based Social Networks (LBSNs). Users in LBSNs can share their life by checking in at various venues at any time. In LBSNs, identifying home locations of users is significant for effective location-based services like personalized search, targeted advertisement, local recommendation...
The paradigm of drug discovery has moved from finding new drugs that exhibit therapeutic properties for a disease to reusing existing approved drugs for a newer disease. The association between a drug and a disease involves a complex network of targets and pathways. In order to provide new insights, there has been a constant need for sophisticated tools that have the potential to discover new associations...
Bursty behavior normally indicates that the workload generated by data accesses happens in short, uneven spurts. In order to handle the bursts, the physical resources of IT devices have to be configured to offer capability which goes far beyond the average resource utilization, thus satisfying the performance. However, this kind of fat provisioning wastes resources when the system does...
Genome-wide association studies (GWASs) have received increasing attention as a means to understand genotype-phenotype relationships. In this paper, we study how to build Bayesian networks from publicly released GWAS statistics to explicitly reveal the conditional dependency between single-nucleotide polymorphisms (SNPs) and traits. The key challenge in building a Bayesian network is the specification of...
In this paper, a self-healing scheme in active distribution network (ADN) with inverter-based distributed generators (IBDGs) based on multi-agent and big data is proposed. The multi-agent system (MAS), big data storage and mining technology are used to accomplish fault discrimination, fault localization, isolation and service restoration. In this paper, the use of a new type of the relay which takes...
One of the most important tools for studying fluid flow behavior in oil and gas reservoirs is reservoir simulation. It is constructed based on comprehensive geological information. A comprehensive numerical reservoir model has tens of millions of grid blocks. Therefore, it becomes computationally expensive and time-consuming to run the model for different reservoir simulation scenarios. There are...
Association rule mining is a data mining technique that seeks interesting associations between attributes from massive high-dimensional categorical feature spaces. However, as the dimensionality gets higher, the data gets sparser, which results in the discovery of a large number of association rules and makes them difficult to understand and interpret. In this paper, we focus on a particular type...
The problem of data deluge is prevalent everywhere. Analyzing voluminous and varied data is a great challenge for researchers. The MapReduce framework has been adapted to many computational methodologies to overcome these issues. Clustering is one of the most commonly used data mining techniques in various pattern analysis applications. This paper mainly focuses on quality based data clustering...