The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
A mechanism for identifying bandings in large "zero-one" N-dimensional data sets, using a sampling technique, is presented. The challenge of identifying bandings in data is the large number of potential permutations that need to be considered. To circumvent this a banding score mechanism is proposed that avoids the need to consider large numbers of permutations. This has been incorporated...
In this paper we accelerate the Alternating Least Squares (ALS) algorithm used for generating product recommendations on the basis of implicit feedback datasets. We approach the algorithm with concepts proven to be successful in High Performance Computing. This includes the formulation of the algorithm as a mix of cache-optimized algorithm-specific kernels and standard BLAS routines, acceleration...
Linkedln is the largest professional network with more than 350 million members. As the member base increases, searching for experts becomes more and more challenging. In this paper, we propose an approach to address the problem of personalized expertise search on LinkedIn, particularly for exploratory search queries containing skills. In the offline phase, we introduce a collaborative filtering approach...
Volatility analysis plays a major role in finance and economics. It is the key input for many financial topics including risk management, option and derivative pricing. One pressing computational hurdle in high frequency financial statistics is the tremendous amount of data and the optimization procedures that require computing power beyond the currently available desktop systems. In this article,...
NoSQL distributed databases have been devised to tackle the challenges resulting from volume, velocity and variety of big data. Graph representation of datasets requires efficient distributed linear algebra operations for large sparse matrix constructed from big data. Storing the transformed matrix into the database not only speeds up the big data analysis process but also facilitates the computation...
Diffusion Tensor Imaging (DTI) is an effective tool for the analysis of structural brain connectivity in normal development and in a broad range of brain disorders. However efforts to derive inherent characteristics of structural brain networks have been hampered by the very high dimensionality of the data, relatively small sample sizes, and the lack of widely acceptable connectivity-based regions...
An enduring issue in higher education is student retention to successful graduation. To further this goal, we develop a system for the task of predicting students' course grades for the next enrollment term in a traditional university setting. Each term, students enroll in a limited number of courses and earn grades in the range A-F for each course. Given historical grade data, our task is to predict...
The widespread use of social media and the internet are emerging trends that offer an additional interaction channel for companies to better understand customer sentiments about their brands and products. Sentiment analysis uses text data from social media such as customer comments and reviews, which has the nature of high dimensionality. Without selection, typically there are at least thousands of...
Many systems have been developed for machine learning at scale. Performance has steadily improved, but there has been relatively little work on explicitly defining or approaching the limits of performance. In this paper we describe the application of roofline design, an approach borrowed from computer architecture, to large-scale machine learning. In roofline design, one exposes ALU, memory, and network...
In many applications, we may want to build a classifier with high confidence, while reducing the number of features. We consider the case where features are assigned to predefined groups and cannot be removed individually. An additional and important constraint is that the datasets may be very large and may not fit in memory. We use logistic regression with group penalty, which results in sparse solutions...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.