The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Highly imbalanced datasets continue to be a challenge in many data mining applications. It is surprising that state-of-the-art techniques countering class imbalances are usually very computationally expensive and therefore unscalable. Most research effort has been directed into enhancing those techniques, e.g., by focusing on borderline examples or combining multiple techniques. This is usually accompanied...
Online job boards are used by millions of job seekers, who browse through the postings for jobs that match their interest. Queries are crafted using terminology generated by the users, which may not match the language used in the job postings. Semantic enrichment methods attempt to fill such a lexical gap by re-writing the queries based on richer terms, which are mined using behavioral logs. However,...
Aspect Term Extraction (ATE) detects opinionated aspect terms in sentences or text spans, with the end goal of performing aspect-based sentiment analysis. The small amount of available datasets for supervised ATE and the fact that they cover only a few domains raise the need for exploiting other data sources in new and creative ways. Publicly available review corpora contain a plethora of opinionated...
Sentiment analysis or recognizing emotions from short and noisy text from social networks such as twitter has been a challenging task. Most of the existing models use word level embeddings for the final classification of the sentiments. This paper proposes a novel representation of short text derived from a combination of word embeddings and character embeddings using Bidirectional LSTM (BiLSTM)....
The problem of completing low-tubal-rank tensors from incomplete noisy observations is studied. To recover the underlying tensor, an iterative singular tube thresholding (ISTT) algorithm is proposed. To explore the statistical performance of the proposed algorithm, the estimation error in terms of the Frobenius norm is upper bounded non-asymptotically. The minimax optimal lower bound of the estimation...
Sparse subspace clustering (SSC) is an effective approach to cluster high-dimensional data. However, how to adaptively select the number of clusters/eigenvectors for different data sets, especially when the data are corrupted by noise, is a big challenge in SSC and also an open problem in field of data mining. In this paper, considering the fact that the eigenvectors are robust to noise, we develop...
The paper considers the problem of feature selection in learning using privileged information (LUPI), where some of the features (referred to as privileged ones) are only available for training, while being absent for test data. In the latest implementation of LUPI, these privileged features are approximated using regressions constructed on standard data features, but this approach could lead to polluting...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.