Health care data collections are usually characterized by an inherent sparseness due to a large cardinality of patient records and a variety of medical treatments usually adopted for a given pathology. Innovative data analytics approaches are needed to effectively extract interesting knowledge from these large collections. This paper presents an explorative data mining approach, based on a density-based...
We describe a novel method for data mining spectro-spatiotemporal network motifs from electrocorticographic (ECoG) data. The method utilizes wavelet feature extraction from ECoG data, generation of compact binary vectors from these features, and binary vector hierarchical clustering. The potential utility of this method in the discovery of recurring neural patterns is demonstrated in an example showing...
Being transmitted as part of numerous Internet services, geolocation data is increasingly bringing hints of people's real-world activities into Internet traffic. This paper focuses on the discovery of key properties that motivate personal activities: locational interests. We propose and design GeoEcho, a mobile traffic analysis system that extracts and analyses a wealth of latitude-longitude geotag...
Attacks against web servers and web-based applications remain a serious global network security threat. Attackers are able to compromise web services, collect confidential information from web databases, and interrupt or completely paralyze web servers. In this study, we consider the analysis of HTTP logs for the detection of network intrusions. First, a training set of HTTP requests which does not contain...
With the agility of the Internet, the daily life of the common man has changed. The rapid growth of the Internet has had diverse effects on daily life, changing the way we live and even the way we think. The use of the Internet for purchasing everyday products has increased exponentially in recent years. Now customers prefer online shopping for the acquisition...
As a product of Web 2.0, micro-blogging has developed rapidly in recent years. More and more information, including social hot topics and news events, spreads through micro-blogs because of their speed and convenience. As a result, discovering, extracting and analyzing this information has become a research hotspot. By studying micro-blog text and long-text clustering, this article draws the conclusion that traditional cluster...
Analysis and mining of social media has become an important research area. A challenging problem in this area consists in the identification of a group of users with similar patterns. In this paper, we propose the classification of users based on their activity profiles (e.g., periods of the day when the user is most and least active in online communications). Activity profiles can be useful for many...
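As an illustration only, an activity profile of the kind mentioned in the abstract above could be represented as the fraction of a user's events falling in each hour of the day. This is a minimal sketch under assumptions: the function names `activity_profile` and `most_and_least_active_hours` are hypothetical, and the paper's actual profile definition and classification method are not given in the truncated abstract.

```python
from collections import Counter
from datetime import datetime


def activity_profile(timestamps):
    """Return a 24-element list with the fraction of events in each hour.

    Assumption: an "activity profile" is an hourly histogram normalized
    by the total number of events; the source does not define it.
    """
    total = len(timestamps)
    if total == 0:
        return [0.0] * 24
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) / total for hour in range(24)]


def most_and_least_active_hours(profile):
    """Return (most_active_hour, least_active_hour); ties go to the earliest hour."""
    most = max(range(24), key=lambda hour: profile[hour])
    least = min(range(24), key=lambda hour: profile[hour])
    return most, least


if __name__ == "__main__":
    events = [
        datetime(2020, 1, 1, 9, 0),
        datetime(2020, 1, 1, 9, 30),
        datetime(2020, 1, 2, 21, 15),
    ]
    profile = activity_profile(events)
    print(most_and_least_active_hours(profile))
```

Profiles built this way are fixed-length vectors, so they could in principle be passed to any standard classifier or clustering routine.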
Compared with VSM (Vector Space Model) and graph-ranking models, the LDA (Latent Dirichlet Allocation) model can discover latent topics in a corpus, and these latent topics help sentence-ranking mechanisms form a good summary. In this paper, a new sentence-ranking method based on the LDA model is proposed. The method combines the topic distribution of each sentence with the topic importance of the...
In mining massive datasets, two of the most important and immediate problems are often sampling and feature selection. Proper sampling and feature selection contribute to reducing the size of the dataset while obtaining satisfactory results in model building. Theoretically, therefore, it is interesting to investigate whether a given dataset possesses a critical feature dimension, or the minimum number...
The present study proposes approaches for predicting students' grades based on their comment data. Students describe their learning attitudes, tendencies and behaviors by writing comments freely after each lesson. The main difficulty of this research is to predict students' performance by separately using data from two classes in each lesson. Although students learn the same subject, there exist differences...
The rapid development of online social networks (OSN) renders them a powerful tool for information diffusion. Understanding the temporal behavior of OSN users is critical in studying the diffusion process. While there is much work on building various diffusion models to characterize the information propagation process, the diversity of OSN users' behavior patterns is seldom addressed in these models...
In this paper, we present an approach that extracts attributes of open-domain named entities for the Chinese language. The approach contains two steps. The first step consists of an unsupervised technique which captures high-frequency attributes from online encyclopedias. The second step discovers uncommon attributes with low frequency. Lastly, an integrated framework is proposed to obtain attributes...
The Internet and Web 2.0 gave rise to a wide variety of user-generated content. This caused a massive growth in the amount and availability of opinionated information. This collection of complex, unstructured information is often referred to as Big Data. A common practical application of such Big Data is social media sentiment analysis. The general aim of sentiment analysis is to determine/extract...
Classical machine learning techniques assume the data to be i.i.d., but real-world data is inherently relational and can generally be represented using graphs or some variant of a graph representation. The importance of modeling relational data is evident from its increasing presence in many domains: telecom networks, the WWW, social networks, organizational networks, images, protein sequences, etc...
Existing algorithms for mining preferred browsing paths consider only the influence of user visiting times and ignore how other factors influence accuracy. To solve this problem, an improved algorithm that introduces the concepts of page similarity and support-preference is proposed. First, a Web-log-based user access matrix is set up. Then, by calculating the angle cosine similarity and support-preference,...
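The first two steps named in the abstract above (a user access matrix built from Web logs, then cosine similarity) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithm: the log format of `(user, page)` pairs, the use of visit counts as matrix entries, and the function names are all hypothetical, and the support-preference measure is omitted because the truncated abstract does not define it.

```python
import math
from collections import defaultdict


def build_access_matrix(log_entries):
    """Build a users x pages matrix of visit counts.

    Assumption: each log entry is a (user, page) pair; real Web logs
    would need parsing and session identification first.
    """
    users = sorted({user for user, _ in log_entries})
    pages = sorted({page for _, page in log_entries})
    counts = defaultdict(int)
    for user, page in log_entries:
        counts[(user, page)] += 1
    matrix = [[counts[(user, page)] for page in pages] for user in users]
    return users, pages, matrix


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def page_similarity(matrix, i, j):
    """Cosine similarity between the visit-count columns of pages i and j."""
    column_i = [row[i] for row in matrix]
    column_j = [row[j] for row in matrix]
    return cosine_similarity(column_i, column_j)


if __name__ == "__main__":
    log = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "c")]
    users, pages, matrix = build_access_matrix(log)
    print(pages, matrix)
    print(page_similarity(matrix, 0, 1), page_similarity(matrix, 0, 2))
```

In this sketch two pages count as similar when the same users visit them with similar frequency; whether the paper compares pages, users, or both is not stated in the visible text.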
The main purpose of this study is to understand the self-seeking behavior of participants using kiosks developed to provide interactive services at Huashan Creative Park in Taipei City. To understand user self-seeking patterns, log data from actual cases of interactive kiosk service were collected and analysed by web usage mining. This study analysed 5724 sessions of 8 kiosks for the month of December,...
Cluster analysis is an important and challenging subject in time series data mining. It has important application prospects in many areas, such as medical images, atmosphere, finance, etc. Many current clustering techniques still have many problems. For example, k-means is a very effective method in finding different shapes and tolerating noise, but its result severely depends on the suitable...
Outlier detection is an important issue in data mining and knowledge discovery. The aim is to find the patterns that deviate too much from others. In this paper, a universal outlier detection method based on normalized residual is proposed. Different from previous methods, the residual of a pattern is calculated corresponding to its nearest normal patterns, so that the interaction between outliers...
Nowadays, processing traffic flows has become an important part of intelligent transportation systems (ITS). Prediction and estimation of flows, as a main application in this field, have gradually developed. Moreover, there exist some inherent relationships among various traffic flows, and the mining of related information can provide a platform for traffic flow prediction and estimation, and it can...
Clustering is an important unsupervised learning approach and is widely used in pattern recognition, data mining, image processing, etc. Different from existing clustering algorithms based on partitioning within data, dominant sets clustering extracts clusters in a sequential fashion. Based on a graph-theoretic concept of a cluster, dominant sets clustering can be accomplished efficiently with game dynamics...