Frequent itemset mining is a fundamental step in the analysis of big data where correlation among the raw data is deemed necessary. In the modern era the amount of data available for processing has grown exponentially, making it an increasingly difficult task for mining algorithms to provide solutions in a timely manner. Software implementations are normally not efficient in handling such datasets, hence the focus on parallel...
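The level-wise search these abstracts build on is the classical Apriori pattern: count candidates of size k, keep the frequent ones, join them into size-(k+1) candidates. A minimal single-machine sketch (illustrative only, assuming the database is a list of transaction sets; function and variable names are our own, not any paper's code):

```python
def apriori(db, minsup):
    """Return every itemset with support >= minsup. db is a list of sets."""
    items = sorted({i for t in db for i in t})
    freq, cands = [], [frozenset([i]) for i in items]
    while cands:
        # count support of the current candidates in one pass over db
        level = [c for c in cands if sum(c <= t for t in db) >= minsup]
        freq += level
        # join step: merge frequent k-itemsets into (k+1)-candidates
        cands = list({a | b for a in level for b in level
                      if len(a | b) == len(a) + 1})
    return freq
```

The same level-wise structure is what the parallel and distributed variants below partition across cores or cluster nodes.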
Flatness is one of the most important specifications for strip products in cold rolling processes. Shape control of cold rolled products is often characterized as a complex process with multiple operating conditions, multiple variables, time-varying parameters, strong coupling and nonlinearity. Accurate online shape defect diagnosis is still a difficult task. This paper proposes a frequent pattern mining...
The paper presents a parallel implementation of the Dynamic Itemset Counting (DIC) algorithm for many-core systems, where DIC is a variation of the classical Apriori algorithm. We propose a bit-based internal layout for transactions and itemsets, under the assumption that such a representation of the transaction database fits in main memory. This technique reduces the memory space for storing the transaction...
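The bit-based layout mentioned above can be illustrated with a vertical bitmap: one integer bitmask per item, with bit t set when transaction t contains the item, so an itemset's support is the popcount of the AND of its item columns. This is a hedged sketch of the general technique, not the paper's actual layout (names and in-memory representation are assumptions):

```python
def build_bit_columns(transactions, n_items):
    """One integer bitmask per item; bit t is set iff transaction t has the item."""
    cols = [0] * n_items
    for t, items in enumerate(transactions):
        for i in items:
            cols[i] |= 1 << t
    return cols

def support(cols, itemset):
    """Support of a non-empty itemset = popcount of the AND of its columns."""
    mask = cols[itemset[0]]
    for i in itemset[1:]:
        mask &= cols[i]
    return bin(mask).count("1")
```

The appeal for many-core systems is that the AND-and-popcount kernel is branch-free and vectorizes well.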
Software developers often need to repeat similar modifications in multiple different locations of a system's source code. These repeated similar modifications, or systematic edits, can be both tedious and error-prone to perform manually. While there are tools that can be used to assist in automating systematic edits, it is not straightforward to find out where the occurrences of a systematic edit...
Frequent pattern mining across streaming data is a challenging task. It requires real-time response and incurs great computational complexity. In this paper, we discuss the challenges of developing frequent pattern mining algorithms for streaming data, compare three algorithms proposed in the literature, and explore the scope for improvement in these algorithms. We discuss the suitability of these algorithms according...
Mining high utility itemsets from a transactional database refers to the discovery of itemsets with high utility, such as profit. Although a number of relevant approaches have been proposed in recent years, they incur the problem of producing a large number of candidate itemsets for high utility itemsets. Such a large number of candidate itemsets degrades the mining performance in terms of execution...
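The "utility" notion referred to here is commonly defined as quantity times unit profit, summed over the itemset's items and over every transaction containing the itemset. A minimal sketch of that definition (the representation as a dict of item-to-quantity is our assumption for illustration):

```python
def itemset_utility(db, profits, itemset):
    """Sum quantity * unit profit over all transactions containing the itemset.

    db: list of dicts mapping item -> purchase quantity.
    profits: dict mapping item -> unit profit.
    """
    total = 0
    for trans in db:
        if all(i in trans for i in itemset):
            total += sum(trans[i] * profits[i] for i in itemset)
    return total
```

Unlike plain support, utility is neither monotone nor anti-monotone, which is why candidate pruning is the hard part these approaches wrestle with.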
Frequent pattern mining is playing an increasingly important role in a growing number of real-time data flow scenarios, such as large-scale order stream data, network traffic monitoring, and web access record streams. The continuous, unbounded and high-speed characteristics of massive data streams pose a huge challenge for current frequent pattern mining approaches. The main challenge is...
Methods for cleaning dirty data typically rely on additional information about the data, such as user-specified constraints that specify when a database is dirty. These constraints often involve domain restrictions and illegal value combinations. Traditionally, a database is considered clean if all constraints are satisfied. However, many real-world scenarios only have a dirty database available...
Given a programming problem, there may be many distinct solutions because of the variety of data structures and algorithms that can be applied and the different trade-offs, such as space versus time, to be considered. By comparing their own solution against others' and learning from the distinct solutions, a learner may quickly improve programming skills and gain experience in making trade-offs. Meanwhile, on the...
Due to the large scale and complexity of big data, mining big data on a single personal computer is a difficult problem. With the increase in the size of databases, parallel computing systems can yield considerable advantages in data mining applications through the exploitation of parallel data mining algorithms. Parallelization of association rule mining algorithms is an important task in data mining...
In recent years, big data has become an important resource of the information society, and research across many fields focuses on extracting useful information from big data as effectively as possible. On the other hand, with the increasing complexity of network data, the problem of cyber security becomes more and more serious. Protocol identification technology is an effective...
Frequent Itemset Mining is one of the most investigated fields of data mining. It is expensive to mine frequent itemsets in a large-scale data set. In particular, when some data is added to the data set, it is still time-consuming to re-compute the complete data set from scratch in order to update its frequent itemsets. Aiming to improve the performance of frequent itemset mining for large...
Frequent Itemset Mining (FIM) is the most important and time-consuming step of association rule mining. With the growth of data scale, many efficient single-machine FIM algorithms, such as FP-growth and Apriori, cannot accomplish the computing tasks within a reasonable time. As a result of the limitations of single-machine methods, researchers have presented some distributed algorithms based on MapReduce...
Traditional intrusion detection technology is mostly based on the needs of Web logs, using a single improved data mining algorithm for analysis, which cannot be used in an unknown, zero-knowledge rule-database environment, and whose efficiency in detecting potential threats and abnormal behavior is not significant. Therefore, this paper proposes an intrusion detection system based on data...
With the continuous expansion of data stream applications, frequent pattern mining over data streams is becoming a hot research topic in the field of data mining, and scholars at home and abroad have put forward a large number of data stream frequent itemset mining algorithms. This paper refines the related definitions of frequent itemsets and sliding windows, and classifies sliding windows from data...
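A count-based sliding window, as framed here, makes only the most recent W transactions eligible: counts are incremented on arrival and decremented on expiry. A minimal single-item-count sketch under that assumption (class and method names are illustrative; real algorithms maintain itemset structures, not just item counts):

```python
from collections import deque, Counter

class SlidingWindowCounts:
    def __init__(self, window):
        self.window, self.buf, self.counts = window, deque(), Counter()

    def add(self, transaction):
        self.buf.append(transaction)
        self.counts.update(transaction)
        if len(self.buf) > self.window:      # oldest transaction expires
            self.counts.subtract(self.buf.popleft())

    def frequent(self, minsup):
        return {i for i, c in self.counts.items() if c >= minsup}
```

The deque gives O(1) expiry of the oldest transaction, which is what keeps per-arrival cost bounded on a high-speed stream.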
A maximal frequent itemset is a largest frequent itemset in a database that is not covered by any other frequent itemset. All frequent itemsets can be built up from the maximal ones. Moreover, it is possible to focus on any part of a maximal frequent itemset to supervise data mining. The Bees Algorithm is a simple, robust, population-based stochastic optimization algorithm based on bees' natural foraging...
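The definition above (frequent, with no frequent strict superset) can be checked by brute force on a tiny database. A hedged illustration of the notion only; the paper couples it with the Bees Algorithm, which is not shown here:

```python
from itertools import combinations

def frequent_itemsets(db, minsup):
    """Enumerate every frequent itemset by exhaustive search (tiny db only)."""
    items = sorted({i for t in db for i in t})
    freq = []
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            if sum(set(cand) <= t for t in db) >= minsup:
                freq.append(set(cand))
    return freq

def maximal(freq):
    """Keep only itemsets with no frequent strict superset."""
    return [s for s in freq if not any(s < t for t in freq)]
```

Since every frequent itemset is a subset of some maximal one, returning only the maximal sets is a compact, lossless-in-membership summary of the frequent collection.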
Many existing approaches to data cube computation search for the group-by partitions on a fact table with support greater than some threshold, that is, those that can be obtained from SQL group-by queries with the clause HAVING COUNT(*) >= supp, where supp is a support threshold. Those partitions constitute what is called the iceberg data cube. The present work proposes an efficient method to compute...
Customer retention in the telecom market is a big research challenge in developed as well as developing economies, as the market is almost saturated and highly competitive, with a large number of local and global service providers. It is also well known that, from a business point of view, retaining an existing customer is much less costly than acquiring a new one. Hence, retaining existing customers by making...
Cloud computing offers a policy to users whereby the data to be retrieved can be exchanged between the user and the server. The information given to a third-party server carries confidentiality threats, as users with weak computational power cannot validate the correctness of the data that are grouped. This paper aims at broken itemsets, in which the server is not reliable and outbursts the...
Generally, medical datasets are heterogeneous and high-dimensional, containing millions of patient records. Extracting information from such datasets is a tedious process, which can be made easier by some of the clustering algorithms available in data mining. In this paper, three clustering algorithms, namely Medical Storage Platform for data Mining (MSPM), Homogeneity Similarity based Hierarchical...