Telecommunications fraud, a new type of crime, has shown a rising trend in recent years. However, research from data mining perspectives to detect such frauds is scarce, especially with the behavioral sequences considered. Though the call detail records (CDRs) in telecommunication are generally snapshots, the history of a caller/callee can be treated as sequences. Indeed, the historical calling sequences...
This paper addresses the issue of diagnosing the transfer function of a mass transit hub on the basis of Automated Fare Collection data. A spatial and temporal diagnosis approach is proposed. Our investigation is focused on inter-station and intra-station diagnoses of passenger flows and facilities. Station transfer function definition, inter-stations diagnosis algorithm and intra-station calculation...
Traditionally, data retrieval is achieved by searching metadata with keywords, though it is often difficult for ordinary users to express professional and precise query demands in the water industry. To address this issue, this paper introduces an exploratory retrieval method called faceted search, which gradually recommends relevant facets to users. Firstly, a unified modeling algorithm is...
Advanced pattern mining that extracts hidden but useful information by using a proper structure is vitally important for efficient information mining in large-scale practical datasets. The existing algorithms have not been capable of effectively resolving the fuzziness uncertainty of items or confirming the appropriate structure of the studied patterns. In order to generate more proper practical patterns, a...
Data clustering in data mining is a domain that never goes out of focus. Clustering data was always an easy task, but achieving the required accuracy, precision and performance was never so easy. K-means, being an archaic clustering algorithm, has been tested and experimented with thousands of times, with a variety of datasets and in combination with other algorithms, due to its robustness and simplicity but what this...
With the popularity of mobile phones and location-based social networks, rich location data has become widely available, enabling the study of friendship detection based on human mobility. However, in some circumstances, limited by data collection techniques, only discrete locations (such as location IDs) can be fetched, which leads to methods of detecting friendship based on distance metric...
In this paper we show how the technologies associated with the evolution of Cloud computing to Dew computing can contribute to advancing scientific computational productivity through automation. In current big data paradigm developments, there is a growing trend towards automation of data mining and other analytical processes involved in data science to increase productivity of associated applications...
The paper presents a parallel implementation of a Dynamic Itemset Counting (DIC) algorithm for many-core systems, where DIC is a variation of the classical Apriori algorithm. We propose a bit-based internal layout for transactions and itemsets with the assumption that such a representation of the transaction database fits in main memory. This technique reduces the memory space for storing the transaction...
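As a rough illustration of what a bit-based layout for transactions and itemsets can look like, the following minimal Python sketch encodes each transaction as an integer bitmask and counts itemset support with bitwise AND. It is not the paper's implementation, and it does not reproduce DIC or its parallelization. The function names `encode` and `support` and the sample transactions are hypothetical, chosen only for this example.

```python
from typing import Dict, Iterable, List


def encode(items: Iterable[str], index: Dict[str, int]) -> int:
    """Pack a set of items into an integer bitmask (one bit per distinct item)."""
    mask = 0
    for item in items:
        mask |= 1 << index[item]
    return mask


def support(itemset_mask: int, transaction_masks: List[int]) -> int:
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transaction_masks if (t & itemset_mask) == itemset_mask)


# Hypothetical toy database, assumed small enough to fit in main memory.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"bread", "butter"},
]

# Assign each distinct item a fixed bit position.
index = {item: i for i, item in enumerate(sorted(set().union(*transactions)))}
masks = [encode(t, index) for t in transactions]

bread_milk = encode({"bread", "milk"}, index)
print(support(bread_milk, masks))
```

With this encoding, a containment test is a single AND and comparison per transaction, which is the kind of operation a bitwise representation makes cheap.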
The detection of similar code can support many software engineering tasks such as program understanding and API replacement. Many excellent approaches have been proposed to detect programs having similar syntactic features. However, some programs dynamically or statistically close to each other, which we call kindred programs, may be ignored. We believe the detection of kindred programs can enhance...
The article presents an analysis of the clustering problem formalization and considers possibilities to use the classical methods and bio-inspired methods for solving problems of cluster analysis. In this paper we do not present a full review of the new clustering methods, but identify some trends in the development of cluster analysis, and special attention is given to the area of bio-inspired methods...
Social networks are usually analyzed and mined without taking into account the presence of missing values. In this article, we consider dynamic networks represented by sequences of graphs that change over time and we study the robustness and the accuracy of community detection algorithms in the presence of missing edges. We assume that the network evolution can provide complementary information...
Based on the dynamic research perspective of time slicing, this paper presents an effort to mine dynamic features of scientific teams. Traditionally, the static method of network structure analysis can successfully be used to analyze the distribution of network resource structure. But it cannot be used to explore the dynamic features of research groups, because it overlooks the influence of some...
Today, we witness the appearance of many lifelogging cameras that are able to capture the life of the person wearing the camera and that produce a large number of images every day. Automatically characterizing the experience and extracting patterns of behavior of individuals from this huge collection of unlabeled and unstructured egocentric data present major challenges and require novel and efficient...
In many organizations a huge amount of data is generated. Organizations use this data for their own benefit. Data mining extracts useful knowledge from huge data. Association rule mining is a powerful technique to find hidden patterns in large databases. The limitation of mining association rules is that some sensitive patterns are revealed from sensitive rules. It is necessary to hide sensitive rules...
Frequent pattern mining is playing an increasingly important role in a growing number of real-time data flow scenarios, such as large-scale order stream data, network traffic monitoring, web access record streams, and so on. The continuous, unbounded and high-speed characteristics of massive data streams are a huge challenge for current frequent pattern mining approaches. The main challenge is...
The ability to construct domain-specific knowledge graphs (KGs) and perform question-answering or hypothesis generation is a transformative capability. Despite their value, automated construction of knowledge graphs remains an expensive technical challenge that is beyond the reach of most enterprises and academic institutions. We propose an end-to-end framework for developing custom knowledge graph...
Methods for cleaning dirty data typically rely on additional information about the data, such as user-specified constraints that specify when a database is dirty. These constraints often involve domain restrictions and illegal value combinations. Traditionally, a database is considered clean if all constraints are satisfied. However, many real-world scenarios only have a dirty database available...
The main aim of a hospital is to provide an effective and efficient environment for the patient, one that supports effective utilization of the diagnostic center's resources and reduces the complexity of the diagnosis process. In order to do so it is necessary to have an accurate view of the care flows under consideration. In this paper we apply process mining techniques to obtain meaningful knowledge about...
In this paper, we propose a new framework for human action representation, which leverages the strengths of convolutional neural networks (CNNs) and the linear dynamical system (LDS) to represent both spatial and temporal structures of actions in videos. We make two principal contributions: first, we incorporate image-trained CNNs to detect action clip concepts, which takes advantage of different levels...
Big Data has become commonplace in most Internet-based applications, which, by delivering services to planetary-scale numbers of users, generate very large data sets. Such data sets are considered a valuable source of analytics information and knowledge for many purposes and domains. It is increasingly claimed that Big Data and machine learning, especially data mining, are the basis for developing...