Survival prediction on time-to-event data associated with patients is crucial in clinical research. Cox-type regression models are widely used for such prediction, but their performance for practical survival prediction suffers due to their use of a maximum partial likelihood estimator, which undermines the effectiveness and robustness of such models. To address this problem, we propose to maximize...
The paper deals with determining the uncertainty of a neural network model for the purpose of robust controller design. The approach presented in the paper is based on the application of optimum experimental design for choosing the input sequences that provide the most informative data during the training of the neural network. As a criterion quantifying the quality of the training process, a measure operating on the Fisher...
Linear modeling is the most commonly used statistical technique for discovering hidden relationships between underlying random variables of interest because of its simplicity and interpretability. In this paper, we use linear models to study glycosylated hemoglobin, which is a measure of diabetes. We want to find which other predictors or indicators have the most influential power on glycosylated...
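As a minimal illustration of the linear-modeling setup this abstract describes (the data, predictor names, and coefficients below are synthetic stand-ins, not the paper's dataset):

```python
import numpy as np

# Hypothetical sketch: fit y = X @ beta + noise by ordinary least squares,
# as one might to relate clinical predictors to glycosylated hemoglobin.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))           # two synthetic predictor columns
X = np.column_stack([np.ones(n), X])  # intercept term
true_beta = np.array([5.5, 0.8, 0.3])
y = X @ true_beta + 0.1 * rng.normal(size=n)

# lstsq solves the normal equations in a numerically stable way;
# beta_hat[1:] quantify each predictor's "influential power" on y.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The estimated coefficients recover the generating ones closely here because the noise is small relative to the signal.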
The traditional random sample consensus (RANSAC) algorithm is capable of estimating a model from few data points and is almost unaffected by noise. However, the algorithm has several drawbacks, including detection errors, an unstable threshold, and massive calculation. By analyzing the spatial relations of graphics pixels, a hypothetical circle is first formed from three hypothetical points which are...
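The three-point hypothetical-circle step can be sketched as plain RANSAC circle fitting (synthetic data and thresholds below are illustrative assumptions, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(1)

def circle_from_3pts(p):
    # Each point satisfies (x-a)^2 + (y-b)^2 = r^2; subtracting pairs of
    # these equations yields a 2x2 linear system in the center (a, b).
    (x1, y1), (x2, y2), (x3, y3) = p
    A = np.array([[x2 - x1, y2 - y1], [x3 - x1, y3 - y1]])
    b = 0.5 * np.array([x2**2 - x1**2 + y2**2 - y1**2,
                        x3**2 - x1**2 + y3**2 - y1**2])
    center = np.linalg.solve(A, b)
    return center, np.hypot(*(p[0] - center))

# Synthetic data: points on a circle of radius 2 centered at (3, -1),
# plus gross outliers.
t = rng.uniform(0, 2 * np.pi, 80)
pts = np.column_stack([3 + 2 * np.cos(t), -1 + 2 * np.sin(t)])
pts += 0.01 * rng.normal(size=pts.shape)
pts = np.vstack([pts, rng.uniform(-5, 5, size=(20, 2))])

best_inliers, best_model = -1, None
for _ in range(200):  # RANSAC iterations
    sample = pts[rng.choice(len(pts), 3, replace=False)]
    try:
        c, r = circle_from_3pts(sample)
    except np.linalg.LinAlgError:
        continue  # degenerate (collinear) sample
    resid = np.abs(np.hypot(*(pts - c).T) - r)
    inliers = int((resid < 0.1).sum())
    if inliers > best_inliers:
        best_inliers, best_model = inliers, (c, r)

c, r = best_model
```

The best hypothesis is the one supported by the most inliers, which makes the fit robust to the 20% gross outliers added above.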
Neighborhood Covering Reduction (NCR) is an effective tool for learning classification rules from structural data. However, the existing neighborhood covering model is not robust enough: a neighborhood is constructed according to the nearest heterogeneous samples, a strategy that focuses excessively on boundary samples and makes the model sensitive to noise. To tackle this problem, we propose a Rough...
This paper develops and analyzes a randomized design for robust Principal Component Analysis (PCA). In the proposed randomized method, a data sketch is constructed using random row sampling followed by random column sampling. The proposed randomized approach is shown to bring about substantial savings in complexity and memory requirements for robust subspace learning over conventional approaches that...
Kernel principal component analysis (kPCA) learns nonlinear modes of variation in the data by nonlinearly mapping the data to kernel feature space and performing (linear) PCA in the associated reproducing kernel Hilbert space (RKHS). However, several widely-used Mercer kernels map data to a Hilbert sphere in RKHS. For such directional data in RKHS, linear analyses can be unnatural or suboptimal. Hence,...
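A minimal kPCA sketch with a Gaussian kernel makes the abstract's point concrete: since k(x, x) = 1 for the RBF kernel, the mapped data lie on the unit sphere in the RKHS (the toy two-rings data below is an illustrative assumption, not the paper's experiment):

```python
import numpy as np

def kpca(X, gamma=1.0, n_components=2):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2); note the
    # diagonal is identically 1, i.e. data lie on a Hilbert sphere.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    # Center in feature space: K' = K - 1K - K1 + 1K1
    n = len(X)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Linear PCA in the RKHS = eigendecomposition of the centered Gram
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    alphas = vecs[:, order] / np.sqrt(np.maximum(vals[order], 1e-12))
    return Kc @ alphas  # projections of the training points

rng = np.random.default_rng(3)
# Two concentric rings: a classic nonlinear mode of variation.
t = rng.uniform(0, 2 * np.pi, 100)
r = np.repeat([1.0, 3.0], 50)
X = np.column_stack([r * np.cos(t), r * np.sin(t)])
X += 0.05 * rng.normal(size=(100, 2))
Z = kpca(X, gamma=0.5)
```

This is the standard (linear-in-RKHS) kPCA the abstract takes as its starting point; the paper's directional analysis on the sphere is a refinement of it.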
In this paper, we propose a general framework to detect burned area using multiple partially occluded Moderate Resolution Imaging Spectroradiometer (MODIS) images. By treating each MODIS image as the superimposition of 3 layers: background, burned area, and cloud, we first apply a low-rank and sparse matrix decomposition technique known as Robust Principal Component Analysis (RPCA) to separate cloud...
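The low-rank-plus-sparse decomposition step can be sketched with a generic principal component pursuit solver (inexact ALM with the usual default lambda; this is a standard RPCA sketch on synthetic data, not the authors' exact solver or imagery):

```python
import numpy as np

def rpca(M, lam=None, n_iter=50):
    """Split M into low-rank L plus sparse S by principal component
    pursuit, using inexact ALM with the common default
    lam = 1/sqrt(max(m, n))."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = 1.25 / np.linalg.svd(M, compute_uv=False)[0]
    Y = np.zeros_like(M)   # dual variable
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # Low-rank update: singular value thresholding at level 1/mu
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # Sparse update: entrywise soft-thresholding at level lam/mu
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)
        Y += mu * (M - L - S)
        mu *= 1.5  # inexact ALM: grow the penalty parameter
    return L, S

# Synthetic check: rank-2 "background" plus sparse corruption, standing
# in for background vs. burned-area/cloud layers.
rng = np.random.default_rng(2)
L0 = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 60))
S0 = np.zeros((60, 60))
S0.flat[rng.choice(3600, 180, replace=False)] = rng.normal(scale=10, size=180)
L, S = rpca(L0 + S0)
err = np.linalg.norm(L - L0) / np.linalg.norm(L0)
```

With 5% sparse corruption of a rank-2 matrix, the low-rank layer is recovered almost exactly, which is what makes RPCA suitable for separating persistent background from transient cloud.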
In this paper, a new active learning scheme is proposed for linear regression problems with the objective of resolving the insufficient training data problem and the unreliable training data labeling problem. A pool-based active regression technique is applied to select the optimal training data to label from the overall data pool. Then, compressive sensing is exploited to remove labeling errors if...
This paper studies the problem of crack detection in images characterized by high-gradient backgrounds. We propose an extension of a Marked Point Process model that has been successfully used for wrinkle detection. We show that our method achieves state-of-the-art results on a difficult image dataset by proposing a robust trade-off between local analysis approaches, which exploit a limited amount...
This paper considers using deep neural networks for handwritten Chinese character recognition (HCCR) with arbitrary position, scale, and orientation. To solve this problem, we combine the recently proposed spatial transformer network (STN) with the deep residual network (DRN). The STN acts as a character shape normalization procedure. Different from the traditional heuristic shape normalization...
This paper addresses the problem of identifying signals of interest from discrete-time sequences contaminated by erroneous segments, which we define as the parts of a time series whose dynamic patterns are inconsistent with those of the signals. Assuming the signals of interest consist of consecutive samples with an arbitrary starting point and duration and follow a stationary dynamic pattern, we propose...
Regularization plays an important role in machine learning systems. We propose a novel methodology for model regularization using random projection. We demonstrate the technique on neural networks, since such models usually comprise a very large number of parameters, calling for strong regularizers. It has been shown recently that neural networks are sensitive to two kinds of samples: (i) adversarial...
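The abstract does not detail its regularizer, but random-projection methods build on the distance-preserving property of Gaussian random projections (Johnson-Lindenstrauss); a minimal sketch of that background property, with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 100, 1000, 300
X = rng.normal(size=(n, d))

# A Gaussian random projection with entries N(0, 1/k) approximately
# preserves pairwise Euclidean distances when k is large enough.
R = rng.normal(scale=1.0 / np.sqrt(k), size=(d, k))
Y = X @ R

# Distortion of one pairwise distance before/after projection
ratio = np.linalg.norm(Y[0] - Y[1]) / np.linalg.norm(X[0] - X[1])
```

Because distances survive the projection up to small distortion, models constrained through such projections see a compressed but geometrically faithful view of the data.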
In this paper, we propose an online multi-view clustering algorithm, OMVC, which deals with large-scale incomplete views. We model the multi-view clustering problem as a joint weighted NMF problem and process the multi-view data chunk by chunk to reduce the memory requirement. OMVC learns the latent feature matrices for all the views and pushes them towards a consensus. We further increase the robustness...
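The joint weighted NMF formulation builds on plain NMF; a single-view sketch with the standard Lee-Seung multiplicative updates (OMVC's chunked, weighted multi-view variant is more involved, and the data here is synthetic):

```python
import numpy as np

def nmf(V, r, n_iter=500, seed=0):
    """Factor a nonnegative matrix V ~= W @ H by multiplicative updates
    minimizing the Frobenius reconstruction error."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)  # update latent features
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)  # update basis
    return W, H

rng = np.random.default_rng(6)
V = rng.random((40, 3)) @ rng.random((3, 30))  # exact rank-3 nonneg matrix
W, H = nmf(V, r=3)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the multi-view setting, one such factorization per view is coupled through a shared consensus matrix; processing data chunk by chunk keeps only the current chunk in memory.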
In this paper we propose a two-stage algorithm for robust K-subspaces recovery. In the first stage, a large number of local candidate subspaces are generated by probabilistic farthest insertion, and then the initial near-optimal K-subspaces are solved by combinatorial selection with randomized greedy method. In the second stage, the K-subspaces are further refined by assigning each data vector to...
Motivated by real applications, heterogeneous learning has emerged as an important research area, which aims to model the co-existence of multiple types of heterogeneity. In this paper, we propose a HEterogeneous REpresentation learning model with structured Sparsity regularization (HERES) to learn from multiple types of heterogeneity. HERES aims to leverage two kinds of information to build a robust...
Outlier detection algorithms are often computationally intensive because of their need to score each point in the data. Even simple distance-based algorithms have quadratic complexity. High-dimensional outlier detection algorithms such as subspace methods are often even more computationally intensive because of their need to explore different subspaces of the data. In this paper, we propose an exceedingly...
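The quadratic cost of distance-based scoring is easy to see in a minimal k-NN-distance outlier scorer (synthetic data with one planted outlier; this illustrates the baseline cost, not the paper's fast method):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
X = np.vstack([X, [[8.0, 8.0]]])  # one planted outlier

# Classic distance-based scoring: distance to the k-th nearest neighbor.
# The full pairwise distance matrix is O(n^2) in both time and memory --
# the cost that motivates cheaper schemes.
k = 5
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
np.fill_diagonal(D, np.inf)
scores = np.sort(D, axis=1)[:, k - 1]  # k-NN distance as outlier score
top = int(np.argmax(scores))
```

Every point is scored against every other point, so doubling the dataset quadruples the work; subspace methods multiply this further by the number of subspaces explored.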
Multi-label learning is widely applied in many tasks, where an object possesses multiple concepts with each represented by a class label. Previous studies on multi-label learning have focused on a fixed set of class labels, i.e., the class label set of test data is the same as that in the training set. In many applications, however, the environment is open and new concepts may emerge with previously...
Heterogeneous events, which are defined as events connecting strongly-typed objects, are ubiquitous in the real world. We propose a HyperEdge-Based Embedding (Hebe) framework for heterogeneous event data, where a hyperedge represents the interaction among a set of involving objects in an event. The Hebe framework models the proximity among objects in an event by predicting a target object given the...
This paper approaches the problem of geometric multi-model fitting as a data segmentation problem, solved by a sequence of sampling, model selection, and clustering steps. We propose a sampling method that significantly facilitates solving the segmentation problem using the Normalized Cut. The sampler is a novel application of a Markov chain Monte Carlo (MCMC) method to sample from a distribution...