Content-centric networking (CCN) is regarded as a future Internet architecture that moves from a host-to-host model to a data transfer model based on content names. Network data caching is a characteristic of CCN, and its effectiveness depends on the content caching policies at the nodes. Leave copy everywhere (LCE), the default policy in CCN, caches all content at all nodes, leading to poor caching performance...
Gait recognition is gradually becoming a research focus in the field of identification. Gait is a unique biometric feature that can be measured unobtrusively at a long distance. In this paper, we study and analyze current techniques of feature extraction. The algorithms of gait recognition are also discussed. After highlighting these issues, we discuss the multi-feature fusion that...
Multi-relational data, like knowledge graphs, are generated from multiple data sources by extracting entities and their relationships. We often want to include inferred, implicit or likely relationships that are not explicitly stated, which can be viewed as link-prediction in a graph. Tensor decomposition models have been shown to produce state-of-the-art results in link-prediction tasks. We describe...
In this paper, we present a new data mining framework for discovering sequence effects. In particular, we focus on the sequences consisting of actions that are taken in chronological order, like sequences of clinical procedures or marketing actions. Each sequence is associated with a binary outcome, a success or a failure. We investigate the hypothesis that certain subsequences of actions contribute...
As a novel concept, "Informed Design" is proposed in the multidisciplinary project "Livable Places" in Singapore to innovate place design from empirical to evidential by harnessing geo-referenced "Big Data" for a responsive design. As a final deliverable, an Informed Design Platform (IDP) is being implemented as a design support tool interpreting multi-source big data to adaptive...
We investigate where and how the key dependency structure between measures of network activity changes throughout the course of daily activity. Our approach to data-mining is probabilistic in nature; we formulate the identification of dependency patterns as a regularised statistical estimation problem. The resulting model can be interpreted as a set of time-varying graphs and provides a useful visual interpretation...
Complex data analytics that involve data mining often comprise not only a single algorithm but also further data processing steps, for example, to restrict the search space or to filter the result. We demonstrate graph mining with Gradoop, the first scalable system supporting declarative analytical programs composed from multiple graph operations. We use a business intelligence example including frequent...
We introduce a powerful technique to make classifiers more reliable and versatile. Background Check equips classifiers with the ability to assess the difference of unlabelled test data from the training data. In particular, Background Check gives classifiers the capability to (i) perform cautious classification with a reject option, (ii) identify outliers, and (iii) better assess the confidence in...
NDN (Named Data Networking) is an increasingly important topic within the realm of future Internet architecture research. Naturally, a new congestion control mechanism is important for a new Internet architecture. Existing research on NDN congestion control mainly considers the one-Interest-one-Data transport mode; internally, these approaches control the Data sending rate by adjusting the Interest sending...
Detection of string and column delimiters is a critical first step in the automated ingestion of files containing tabular data. In this paper we present an algorithm that uses a logistic-regression classifier to evaluate whether a particular choice of delimiters is correct. The delimiter choice that is given the highest score by the classifier is chosen as the one most likely to be correct. The algorithm...
In this study, we focus on the extraction of latent topic transitions from point-of-sale (POS) data. POS analysis is conducted to obtain frequent patterns of customer behavior. The fundamental method for POS analysis is market basket analysis, through which the sets of products that are often bought at the same time can be extracted. In market basket analysis, however, the effect of...
According to the Merriam-Webster dictionary, satire is trenchant wit, irony, or sarcasm used to expose and discredit vice or folly. Though it is an important aspect of language used in everyday communication, the study of satire detection in natural text is often ignored. In this paper, we identify key value components and features for automatic satire detection. Our experiments have been carried out...
Given a network with attributed edges, how can we identify anomalous behavior? Networks with edge attributes are ubiquitous, and capture rich information about interactions between nodes. In this paper, we aim to utilize exactly this information to discern suspicious from typical behavior in an unsupervised fashion, lending itself well to the traditional scarcity of ground-truth labels in practical anomaly...
Virtual machine live migration technology allows a running VM to migrate from one physical host to another with no impact on users. As the scale of distributed computing gets larger and larger, the hybrid cloud, an integrated cloud service that uses both private and public clouds to perform distinct functions within the same organization, has become a hotspot for both academia and industry. Thus, it will...
While the validation of simulation models is well covered in the literature, validating the environments in which models run has been poorly researched. Although such environments have high face validity, in most cases no formal methods have been developed for validating them. In this project, we first distinguish between the different forms of validity, such as data, model, and environment...
In machine learning (ML), the model is increasingly important, and its parameters, the key element of ML, are adjusted by iteratively processing a training dataset until convergence. Although data-parallel ML systems often employ a perfect error tolerance when synchronizing the model parameters to maximize parallelism, the synchronization of model parameters may be delayed in completion,...
We present a city-scale crowd simulation model based on a large data set (25 million GPS data points from 28,000 volunteers recorded during a 3-day city-wide festival held in Zurich in 2013). The model is based on a spatio-temporal abstraction of the festival, focusing on event sites and event times. Thus, we assume a certain number of events (concerts, shows, etc., as is typical at such festivals)...
Time-bound sequences are constraints deemed necessary to ensure product quality and avoid yield loss due to time-dependent effects. Although they are commonly applied in production system control, they cause severe logistical challenges. In this paper, we evaluate the effects of time constraints in combination with batching on a real metallization work center of an opto-semiconductor fab. We use simulation...
A relational table over a set of attributes can be mapped onto a multi-dimensional array and stored as such. Such a conceptual view of relations lends itself to easy formulations of numerous analytical algorithms. This is the view taken in the representation of relations in data-warehousing to support On-Line Analytical Processing (OLAP). The main drawback of such a storage scheme is that the equivalent...
Are hybrid simulation models always beneficial? When should one modeling paradigm be used more than another? How does one know the right balance has been reached between different simulation techniques for the system under investigation? We illustrate selected insights into hybrid simulation through the use of a discrete event simulation (DES) model and a hybrid DES and agent-based model (ABM) of the...