Predicting ad click-through rates is the core problem in display advertising, which has received much attention from the machine learning community in recent years. In this paper, we present an online learning algorithm for click-through rate prediction, namely Follow-The-Regularized-Factorized-Leader (FTRFL), which incorporates the Follow-The-Regularized-Leader (FTRL-Proximal) algorithm with per-coordinate...
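The abstract above is truncated and does not describe how FTRFL itself works, so the following is only a minimal sketch of the FTRL-Proximal component it names, in its commonly described per-coordinate form for logistic loss. The class name `FTRLProximal`, the hyperparameter defaults, and the sparse dictionary feature encoding are assumptions made for illustration and do not come from the paper.

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Illustrative per-coordinate FTRL-Proximal for logistic regression.

    Hypothetical sketch: names and defaults are assumptions, not the
    paper's FTRFL method.
    """

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha = alpha  # learning-rate scale
        self.beta = beta    # learning-rate smoothing term
        self.l1 = l1        # L1 strength (drives weights to exactly zero)
        self.l2 = l2        # L2 strength
        self.z = defaultdict(float)  # per-coordinate adjusted gradient sums
        self.n = defaultdict(float)  # per-coordinate squared-gradient sums

    def _weight(self, i):
        # Closed-form weight for coordinate i, computed lazily from z and n.
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x):
        # x maps feature name -> value; only non-zero features are listed.
        s = sum(self._weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        # One online step on example (x, y) with y in {0, 1}.
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log loss for coordinate i
            w = self._weight(i)
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p


model = FTRLProximal(l1=0.1)
for _ in range(200):
    model.update({"ad=a": 1.0}, 1)  # toy data: ad "a" is always clicked
    model.update({"ad=b": 1.0}, 0)  # toy data: ad "b" is never clicked
```

Each coordinate keeps its own accumulators, so rarely seen features retain a larger effective learning rate, and any coordinate whose accumulated signal stays within the L1 threshold has a weight of exactly zero.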
Semantic knowledge is usually added to topic models to improve topic coherence. However, it is hard to judge whether semantic information is related to a topic without using complicated lexical characteristics. In this paper, we demonstrate a novel model called Cloud Transformation Model, which can easily judge whether semantic information is related to a topic, and integrate semantic information into...
Big data refers to broad data sets that have been used in many fields. Processing a huge data set is time-consuming work, not only because of its volume, but also because data types and structures can be different and complex. Currently, many data mining and machine learning techniques are being applied to deal with big data problems; some of them can construct a good learning algorithm in terms...
Understanding bike trip patterns in a bike sharing system is important for researchers designing models for station placement and bike scheduling. By bike trip patterns, we refer to the large number of bike trips observed between two stations. However, due to privacy and operational concerns, bike trip data are usually not made publicly available. In this paper, instead of relying on time-consuming...
Geospatial data volume exceeds hundreds of petabytes and is increasing exponentially, driven mainly by images, videos, and data generated by mobile devices and high-resolution imaging systems. Fast data discovery on historical archives and/or real-time datasets is currently limited by various data formats that have different projections and spatial resolutions, requiring extensive data processing before analytics...
In a regular retail shop, the behavior of customers may reveal a lot to the shop assistant. However, when it comes to online shopping, it is not possible to see and analyze customer behavior such as facial expressions or the products they check or touch. In this case, clickstreams or the mouse movements of e-customers may provide some hints about their buying behavior. In this study, we have presented a model...
In this paper, we review several methods from Statistical Learning Theory (SLT) for the performance assessment and uncertainty quantification of predictive models. Computational issues are addressed so as to allow scaling to large datasets and the application of SLT to Big Data analytics. The effectiveness of the application of SLT to manufacturing systems is exemplified by targeting the derivation...
The advent of social networks and the Internet of Things has resulted in an unprecedented capability for collecting, sharing and analyzing massive amounts of data. From a security perspective, Big Data may seriously weaken confidentiality, as techniques for improving Big Data analytics performance, including early fusion of heterogeneous data sources, increase the hidden redundancy of data representation,...
Nowadays, most metro advertising systems schedule advertising slots on digital advertising screens to achieve the maximum exposure to passengers by exploring passenger demand models. However, our empirical results show that these passenger demand models experience uncertainty at fine temporal granularity (e.g., per minute). As a result, for fine-grained advertisements (shorter than one minute), a scheduling...
Many industries are applying various methods for optimizing energy use across the manufacturing life cycle. These methods are either physics-based or data-driven. Manufacturing systems generate a vast amount of data from operations and in simulations. Advances in data collection systems and data analytics (DA) tools have enabled the development of predictive analytics for energy prediction. Many of...
Generating the maximum number of visual patterns by uncovering the entire space of possible visual designs remains a challenge within the construction process of information visualization. Users interact with different mindsets consisting of design, data analysis, application development, and hardware resource usage. Therefore, they desire a flexible and productive interface that keeps them clued...
Volatility analysis plays a major role in finance and economics. It is the key input for many financial topics including risk management, option and derivative pricing. One pressing computational hurdle in high frequency financial statistics is the tremendous amount of data and the optimization procedures that require computing power beyond the currently available desktop systems. In this article,...
The current state of the art in big social data analytics is largely limited to graph theoretical approaches such as social network analysis (SNA) informed by the social philosophical approach of relational sociology. This paper proposes and illustrates an alternate holistic approach to big social data analytics, social set analysis (SSA), which is based on the sociology of associations, mathematics of...
The comprehensive and innovative evaluation of climate models with newly available global observations is critically needed for the improvement of climate model current-state representation and future-state predictability. A climate model diagnostic evaluation process requires physics-based multi-variable analyses that typically involve large-volume and heterogeneous datasets, making them both computation-...
In this paper, we present a method to dynamically predict the failure of physiological subsystems from patients admitted to the Intensive Care Unit (ICU) using heterogeneous data. We model the probability of failure in each subsystem as a latent state that evolves over time. We propose a method using Generalized Linear Dynamic models to model this latent state which is updated each time new patient...
Prediction of a spindle's health is of critical significance in a manufacturing environment. Unexpected breakdowns in a spindle's functioning can lead to high costs and production delays. Therefore, developing methods which can predict the time-to-failure of a spindle and its bearings can be of significant importance. One of the main challenges for successful prediction by purely data-driven techniques...
Regression problems on massive data sets are ubiquitous in many application domains including the Internet, earth and space sciences, and aviation. Support vector regression (SVR) is a popular technique for modeling the input-output relations of a set of variables under the added constraint of maximizing the margin, thereby leading to a very generalizable and regularized model. However, for a dataset...
Readmissions to a hospital after procedures are costly and considered to be an indication of poor quality. As per the Affordable Care Act of 2010, hospitals may be reimbursed at a reduced rate for patients readmitted to a hospital within 30 days of discharge. In this project, we used statistical and machine-learning methods to analyze the Nationwide Inpatient Sample dataset provided by HCUP (Healthcare...
In this paper, we propose a new method for addressing post-purchase recommendations for a dynamic marketplace. The proposed method uses the transactional data as the primary data source to mine co-purchase relationships. The item listings from the transactional data are mapped to their static ‘cluster’ representation and a cluster-cluster directed graph is generated. Clusters have explicit definitions...
In this paper, we propose an architectural design and software framework for fast development of descriptive, diagnostic, predictive, and prescriptive analytics solutions for dynamic production processes. The proposed architecture and framework will support the storage of a modular, extensible, and reusable Knowledge Base (KB) of process performance models. The approach requires the development of automatic...