The primary failure mechanism in brittle materials such as ceramics, granite and some metal alloys is through the presence of defects which result in crack formation and propagation under the application of load. We are interested in studying this process of crack propagation, interaction and coalescence, which degrades the strength of the specimen. Traditionally, engineering applications that study...
Networks (i.e., graphs) appear in many high-impact applications. Often these networks are collected from different sources, at different times, and at different granularities. In this talk, I will present our recent work on mining such multiple networks. First, we will present several new data models, whose key idea is to leverage networks as context to connect different data sets or different data mining...
In this paper we propose a new method to anonymize (share relevant and detailed information while not naming names) and protect data sets (minimize the utility loss) based on Factor Analysis. The method basically consists of obtaining the factors, which are uncorrelated, protecting them and undoing the transformation in order to get interpretable protected variables. We first show how to proceed when...
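The three steps named in the abstract above (obtain uncorrelated factors, protect them, undo the transformation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: principal components from an SVD stand in for the factor analysis, additive Gaussian noise stands in for the unspecified protection step, and the function name `protect_via_factors` and the parameter `noise_scale` are hypothetical.

```python
import numpy as np


def protect_via_factors(X, noise_scale=0.1, seed=0):
    """Decorrelate, perturb, and map back to the original variables.

    Assumptions (not from the source): PCA via SVD replaces factor
    analysis, and Gaussian noise replaces the protection method.
    Returns the protected data and the unperturbed scores.
    """
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean

    # Step 1: uncorrelated scores from the SVD of the centred data.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s

    # Step 2 (hypothetical): add noise proportional to each score's spread.
    noise = rng.normal(size=scores.shape) * noise_scale * scores.std(axis=0)
    protected_scores = scores + noise

    # Step 3: undo the transformation so the output is in the
    # original, interpretable variables.
    return protected_scores @ Vt + mean, scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 3))
    X_protected, _ = protect_via_factors(X, noise_scale=0.2)
    print(X_protected.shape)
```

Perturbing the uncorrelated scores rather than the raw columns means each noise term can be chosen independently; with `noise_scale=0` the function returns the input unchanged, which is a quick check that the back-transformation is correct.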
This paper explores recent achievements and novel challenges of the difficult privacy-preserving big data stream mining problem, which consists of applying mining algorithms to big data streams while ensuring the privacy of the data. Recently, the emerging big data analytics context has cast new light on this exciting research area. This paper follows that research trend.
The data anonymization landscape has become quite complex in the last decades. On the methodology side, the statistical disclosure control methods designed in official statistics have been supplemented by a number of privacy models proposed by computer scientists. On the data side, static data sets now coexist with big data, and particularly data streams. In the quest for a unified and conceptually...
Set-valued data consists of records that are sets of items, such as the goods purchased by each individual. Methods of publishing and widely utilizing set-valued data while protecting personal information have been extensively studied in the field of privacy-preserving data publishing. Until now, basic models such as k-anonymity or k^m-anonymity could not cope with attribute inference by an adversary...
Cloud users have little visibility into the performance characteristics and utilization of the physical machines underpinning the virtualized cloud resources they use. This uncertainty forces users and researchers to reverse engineer the inner workings of cloud systems in order to understand and optimize the conditions under which their applications operate. At Massachusetts Open Cloud (MOC), as a public cloud...
Deep learning algorithms have recently produced state-of-the-art accuracy in many classification tasks, but this success is typically dependent on access to many annotated training examples. For domains without such data, an attractive alternative is to train models with light, or distant supervision. In this paper, we introduce a deep neural network for the Learning from Label Proportion (LLP) setting,...
The hashtag recommendation problem addresses recommending (suggesting) one or more hashtags to explicitly tag a post made on a given social network platform, based upon the content and context of the post. In this work, we propose a novel methodology for hashtag recommendation for microblog posts, specifically Twitter. The methodology, EmTaggeR, is built upon a training-testing framework that builds...
Social media allows people to post widely and share the posted online items. Such items gain popularity according to the amount of attention they receive. Thus, studies on modeling the arrival process of attention to an individual item have recently attracted a great deal of interest. In this paper, we propose, by combining a Dirichlet process with a Hawkes process in a novel way, a probabilistic model, called...
Even while engaged in an attention-consuming activity such as watching TV, social media users often end up paying attention to one or more social media. This is an example of a behavioral phenomenon called Continuous Partial Attention (CPA). Quantification of user attention can be a valuable metric in understanding user behavior under scenarios where their attention is divided. In this study, we propose...
Public entities such as companies and politicians increasingly use online social networks to communicate directly with their constituencies. Often, this public messaging is aimed at aligning the entity with a particular cause or issue, such as the environment or public health. However, as a consumer or voter, it can be difficult to assess an entity’s true commitment to a cause based on public messaging...
In this paper, we propose and evaluate the application of unsupervised machine learning to anomaly detection for a Cyber-Physical System (CPS). We compare two methods: Deep Neural Networks (DNN) adapted to time series data generated by a CPS, and one-class Support Vector Machines (SVM). These methods are evaluated against data from the Secure Water Treatment (SWaT) testbed, a scaled-down but fully...
This paper presents a detailed evaluation of anomaly detection on operational time-series data from Internet of Things (IoT) based household devices in general, and Heating, Ventilation and Air Conditioning (HVAC) systems in particular. Due to the number of issues observed during evaluation of widely used distance-based, statistical-based, and cluster-based anomaly detection techniques, we also present a pattern-based...
Cyber-physical systems - systems that incorporate physical devices with cyber components - are appearing in diverse applications and, due to advances in data acquisition, are accompanied by large amounts of data. The interplay between the cyber and the physical components leaves such systems vulnerable to faults and intrusions, motivating the development of a general model that can efficiently and...
In this paper, we propose an online spatiotemporal data-driven methodology to detect malicious cyber attacks that target power system balancing and frequency control. The anomaly detection, which spots abnormal generator behavioral patterns in real time, is achieved locally at a power plant with peer to peer communication capability. We mainly consider the data integrity attack targeting Automatic...
The special characteristics of time series data, such as their high dimensionality and complex dependencies between variables, make the problem of detecting anomalies in time series very challenging. Anomalies, and more precisely dependency anomalies, ensue from the temporal causal dependencies. Furthermore, the graphical Granger causal models provide an appropriate environment to capture all the temporal...
Exponential growth in electronic health record (EHR) data has resulted in new opportunities and urgent needs to discover meaningful data-driven representations and patterns of diseases, i.e., computational phenotyping. The recent success and development of deep learning provide promising solutions to prediction and feature discovery tasks, while many challenges still remain and prevent...
Data-driven analytics and decision-making have been essential for numerous applications in our society. To transform the data into a source of rich intelligence and support decision-making, data-driven analytics often need to aggregate intelligence from multiple sources and disaggregate signals into significant constituents. Though many existing approaches perform these two tasks separately, there...
Lung cancer, which originates from malignant lung nodules, is one of the most common types of cancer. Early detection of lung nodules is key to preventing lung cancer. In this paper, we developed an online content-based image retrieval (CBIR) system to assist novice radiologists in identifying lung nodules. The system takes advantage of cloud computing and deep learning to retrieve similar lung nodules...