The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Subspace learning techniques for text analysis, such as latent semantic indexing (LSI), have been widely studied in the past decade. However, to our best knowledge, no previous study has leveraged the rank information for subspace learning in ranking tasks. In this paper, we propose a novel algorithm, called learning latent semantics for ranking (LLSR), to seek the optimal latent semantic space tailored...
Dimension reduction for large-scale text data is attracting much attention lately due to the rapid growth of World Wide Web. We can consider dimension reduction algorithms in two categories: feature extraction and feature selection. An important problem remains: it has been difficult to integrate these two algorithm categories into a single framework, making it difficult to reap the benefit of both...
The exponential growth of text documents available on the Internet has created an urgent need for accurate, fast, and general purpose text classification algorithms. However, the "bag of words" representation used for these classification methods is often unsatisfactory as it ignores relationships between important terms that do not co-occur literally. In order to deal with this problem,...
Feature selection on multi-label documents for automatic text categorization is an under-explored research area. This paper presents a systematic document transformation framework, whereby the multi-label documents are transformed into single-label documents before applying standard feature selection algorithms, to solve the multi-label feature selection problem. Under this framework, we undertake...
Many text processing applications adopted the bag of words (BOW) model representation of documents, in which each document is represented as a vector of weighted terms or n-grams, and then the cosine distance between two vectors is used as the similarity measurement. Although the great success in information retrieval and text categorization, the conventional BOW model ignores the detailed local text...
We propose a novel algorithm for extracting diverse topic phrases in order to provide summary for large corpora. Previous works often ignore the importance of diversity and thus extract phrases crowded on some hot topics while failing to cover other less obvious but important topics. We solve this problem through document re-weighting and phrase diversification by using latent semantic analysis (LSA)...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.