The problem of completing low-tubal-rank tensors from incomplete noisy observations is studied. To recover the underlying tensor, an iterative singular tube thresholding (ISTT) algorithm is proposed. To explore the statistical performance of the proposed algorithm, the estimation error in terms of the Frobenius norm is upper bounded non-asymptotically. The minimax optimal lower bound of the estimation...
Stochastic principal component analysis (SPCA) has become a popular dimensionality reduction strategy for large, high-dimensional datasets. We derive a simplified algorithm, called Lazy SPCA, which has reduced computational complexity and is better suited for large-scale distributed computation. We prove that SPCA and Lazy SPCA find the same approximations to the principal subspace, and that the pairwise...
Similarity measurement is a central problem in time series data mining. Although many approaches to this problem have been developed, we believe that the rapid growth in the amount of data creates a challenging demand for fast and accurate similarity measurement. In this paper, we propose a new time series representation model and a corresponding similarity measure, which is able to...
Data Jacket (DJ) is a technique for sharing information about data and for considering the potential value of datasets, with the data itself hidden, by describing the summary of data in natural language. In DJs, variables are described by variable labels (VLs), which are the names/meanings of variables. In the previous study, the matrix-based method for inferring VLs in DJs whose VLs are unknown,...
To collect and explicate meaningful knowledge of a community, we propose an Activity Model based on structured knowledge. The following issues arise related to the model development: (a) difficulties in capturing activities; (b) difficulty of acquiring knowledge; and (c) difficulty in optimizing the activities to newly adopted technologies. Therefore, we are developing technologies that use on-site...
In this study, we have developed a video-based risk recognition training tool with an eye tracking device and a motion sensor. We applied the tool to risk recognition training in a construction company and extracted features of the risk recognition of expert field overseers from their eye movements and utterances during the training. As the results of the examinations, typical risk recognition processes...
In this paper, we will first explain the FS (familiarity and strangeness) model as a requirement for attracting people's attention and bringing about analogical thinking. After introducing the idea of shikake (triggers for behavior change) and its requirements, we propose the inclusion of the FS model as an attribute of MoDAT in order to encourage MoDAT participants to come up with new shikake ideas.
Following the trend of big data, the business value of data has become a hot research field in recent years. The novel concept of the Data Jacket, introduced by Ohsawa et al., addressed the difficult problem of data transactions that arises from a particular characteristic of data, i.e., the need to safeguard privacy. In order to clarify the mechanism of the market of data, some researchers proposed a gamified...
The worldwide market for luxury and fashion goods is dominated today by a handful of multinational corporations (MNCs). The way MNCs access foreign markets and organize distribution, however, remains unclear. In this paper, based on an analysis of foreign trade statistics, we take the example of watches and provide a model to highlight the most important flows as well as regional hubs in this global...
This paper presents a temporal pattern mining method for medical data. It modifies the mining algorithms proposed by Batal et al. to incorporate ranged relations. Experimental results demonstrate that the proposed method can generate frequent patterns with abstracted time ranges embedded in their temporal relations.
Recent progress in motion sensor systems enables personal identification from human behavior observed by the sensor. Kinect is a motion sensing input device developed by Microsoft for Xbox 360 and Xbox One. Personal identification using the Microsoft Kinect sensor (hereafter, Kinect) is presented in this study. The Kinect is used to estimate the pedestrian's body size and walking behavior...
This paper argues that applications of Gaussian processes in the fast-moving consumer goods industry have not been discussed enough. Yet this technique can be important because, for example, it can provide automatic feature relevance determination, and the posterior mean can unlock insights into the data. Significant challenges are the large size and high dimensionality of commercial data at...
As the use of the Internet grows every year, so does the use of e-commerce. There is tough competition between companies to attract customers to their services. The design of a website is crucial to retaining a customer, and a retained client is more valuable over time, so understanding what attracts the attention of a potential client on a website is really important. This work proposes...
Due to advances in wireless sensor networks, radio-frequency identification (RFID), and Web-based services, a large number of devices have been connected to the Internet of Things (IoT). In addition, the tremendous number of IoT services offered by service providers creates an urgent need for effective recommendation methods that discover suitable services for users. In this paper, we propose...
Criminal activity on the Internet is becoming more sophisticated. Traditional information security techniques can hardly cope with recent trends. Honeypots have proved to be a valuable source of threat intelligence. In this work, several Honeypots are combined into a Honeynet, and the exploitation attempts against it are observed. The Honeynet consists of six Honeypots and was operated for 222 days. 12 million exploitation attempts...
IP Addresses are a central part of packet- and flow-based network data. However, visualization and similarity computation of IP Addresses are challenging due to the missing natural order. This paper presents IP2Vec, a novel similarity measure for IP Addresses that builds on ideas from Word2Vec, a popular approach in text mining. The key idea is to learn similarities by extracting available context...
We introduce a system for automatically generating warnings of imminent or current cyber-threats. Our system leverages the communication of malicious actors on the darkweb, as well as the activity of cyber security experts on social media platforms like Twitter. Between September 2016 and January 2017, our method generated 661 alerts, of which about 84% were relevant to current or imminent...
In this paper, we propose a workflow for processing and analysing large-scale tracking data with spatio-temporal marks that uses an infrastructure for machine learning methods based on a meta-data representation of point patterns. The tracking log (IP address) of cyber security devices usually maps to a geolocation and a timestamp; such data is called spatio-temporal data. Existing spatio-temporal analysis...
Domain generation algorithms (DGAs) automatically generate large numbers of domain names in DNS domain fluxing for the purpose of command-and-control (C&C) communication. DGAs are immune to static prevention methods like blacklisting and sinkholing. Detection of DGAs in a live stream of queries in a DNS server is referred to as inline detection. Most of the previous approaches in the literature...
This paper presents a novel approach for activity recognition from accelerometer data. Existing approaches usually extract hand-crafted features that are used as input for classifiers. However, hand-crafted features are data dependent and cannot be generalized to different application domains. To overcome these limitations, our approach relies on matrix factorization for dimensionality reduction...