The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
The following topics are dealt with: data mining; local clustering; spatiotemporal event detection; time series; Markov models; email classification; data stream; parallel mining; Bayesian network; unsupervised learning; missing values prediction; anomaly detection; decision tree; binary classifier; data similarity matrix; data mapping; support vector machine; Mapreduce; document similarity; social...
What do graphs look like? How do they evolve over time? How to handle a graph with a billion nodes? We present a comprehensive list of static and temporal laws, and some recent observations on real graphs (e.g., "eigenSpokes"). For generators, we describe some recent ones, which naturally match all of the known properties of real graphs. Finally, for tools, we present "oddball"...
Summary form only only given. We consider the problem of assessing the significance of groups in high-dimensional data. In the case of supervised classification where there are data of known origin with respect to the groups under consideration, a guide to the degree of separation among the groups can be given in terms of the estimated error rate of a classifier formed to allocate a new observation...
We address the problem of detecting characteristic patterns in communication networks. We introduce a scalable approach based on set-system discrepancy. By implicitly labeling each network edge with the sequence of times in which its two endpoints communicate, we view an entire communication network as a set-system. This view allows us to use combinatorial discrepancy as a mechanism to "observe"...
We consider the problem of local graph clustering where the aim is to discover the local cluster corresponding to a point of interest. The most popular algorithms to solve this problem start a random walk at the point of interest and let it run until some stopping criterion is met. The vertices visited are then considered the local cluster. We suggest a more powerful alternative, the multi-agent random...
Learning and identifying events in network traffic is crucial for service providers to improve their mobility network performance. In fact, large special events attract cell phone users to relative small areas, which causes sudden surge in network traffic. To handle such increased load, it is necessary to measure the increased network traffic and quantify the impact of the events, so that relevant...
Mobile context modeling is a process of recognizing and reasoning about contexts and situations in a mobile environment, which is critical for the success of context-aware mobile services. While there are prior work on mobile context modeling, the use of unsupervised learning techniques for mobile context modeling is still under-explored. Indeed, unsupervised techniques have the ability to learn personalized...
Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns from these MTS databases which can contain up to several gigabytes of data. Surprisingly, research on MTS search is...
There is an increasingly pressing need, by several applications in diverse domains, for developing techniques able to index and mine very large collections of time series. Examples of such applications come from astronomy, biology, the web, and other domains. It is not unusual for these applications to involve numbers of time series in the order of hundreds of millions to billions. However, all relevant...
High accuracy sequence classification often requires the use of higher order Markov models (MMs). However, the number of MM parameters increases exponentially with the range of direct dependencies between sequence elements, thereby increasing the risk of over fitting when the data set is limited in size. We present abstraction augmented Markov models (AAMMs) that effectively reduce the number of numeric...
This paper presents a novel framework for multi-folder email classification using graph mining as the underlying technique. Although several techniques exist (e.g., SVM, TF-IDF, n-gram) for addressing this problem in a delimited context, they heavily rely on extracting high-frequency keywords, thus ignoring the inherent structural aspects of an email (or document in general) which can play a critical...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.