Two-stage variable selection for molecular prediction of disease

Hamed Firouzi; Bala Rajaratnam; Alfred O. Hero

doi:10.1109/CAMSAP.2013.6714034

Source

2013 5th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) > 169 - 172

Abstract

A two-stage predictor strategy is introduced for high-dimensional data (large p, small n). The focus application is medical: predicting symptomatic infection from molecular expression levels in blood. The first stage uses the previously introduced method of Predictive Correlation Screening (PCS) to select a subset of genes that are important for predicting symptom scores. The second stage uses the selected genes to learn a predictor of symptom scores. Under sampling budget constraints, we derive the optimal rules for allocating samples to the first and second stages. Experiments show that the proposed predictor outperforms the well-known LASSO method.
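
The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the screening step below ranks variables by absolute marginal sample correlation with the response, which is only a simple stand-in for PCS, and the second stage is assumed to be ordinary least squares on the selected variables. The function names, the split size n1, the subset size k, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def screen_by_correlation(X, y, k):
    """Stage 1 stand-in (NOT PCS): indices of the k columns of X with the
    largest absolute sample correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    # Constant columns get correlation 0 instead of a division by zero.
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.sort(np.argsort(corr)[-k:])

def two_stage_fit(X, y, n1, k):
    """Screen on the first n1 samples, then fit least squares (with an
    intercept) on the selected columns using all samples."""
    selected = screen_by_correlation(X[:n1], y[:n1], k)
    A = np.column_stack([np.ones(len(y)), X[:, selected]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return selected, coef

def two_stage_predict(X, selected, coef):
    """Apply the fitted intercept and slopes to the selected columns."""
    return coef[0] + X[:, selected] @ coef[1:]

# Synthetic large-p, small-n data (assumed for illustration only).
rng = np.random.default_rng(0)
n, p = 60, 500
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(n)

selected, coef = two_stage_fit(X, y, n1=30, k=5)
y_hat = two_stage_predict(X, selected, coef)
```

In this sketch n1 and k are fixed by hand. The abstract's sample allocation rules address how such a split should be chosen under a sampling budget, which the sketch does not attempt to reproduce.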