With the explosive growth of multimedia data, information is increasingly represented in multiple modalities. Cross-modal applications have therefore attracted growing attention in recent years, and cross-modal retrieval is among the most popular of them. In this paper, we propose a semi-supervised modality-dependent cross-modal retrieval method based on coupled feature selection (Semi-CoFe). It differs from most previous cross-modal retrieval methods, which typically use only labeled data for training and learn the projection matrices under an l2-norm constraint. Specifically, we propagate the labels of cluster centers to unlabeled data via a devised weight matrix and construct pseudo pairs of corresponding heterogeneous data. We then jointly consider semantic regression and pair-wise correlation analysis when learning the mapping matrices, so as to preserve semantic consistency and keep pair-wise data close. Meanwhile, an l2,1-norm constraint is imposed to select informative and discriminative features and to suppress noise. In addition, we learn different mapping matrices for different sub-tasks (e.g., using an image to search for text (I2T) and using text to search for an image (T2I)) so as to exploit the semantic information of the query data, and the optimal mapping matrices are obtained via an iterative optimization method. Experimental results on three public datasets verify that the proposed method outperforms state-of-the-art methods.
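As an illustrative sketch only (the notation below is our assumption, not the paper's exact formulation), an objective of this kind for the I2T sub-task could combine the pair-wise correlation term, the semantic regression term, and the coupled l2,1-norm regularizers, where X and Y denote the image and text feature matrices (labeled plus pseudo-paired data), S the semantic label matrix, U and V the modality-specific mapping matrices, and $\lambda_1, \lambda_2$ hypothetical trade-off parameters:

\[
\min_{U,V}\; \underbrace{\|X^{\top}U - Y^{\top}V\|_F^2}_{\text{pair-wise closeness}}
\;+\; \lambda_1 \underbrace{\|X^{\top}U - S\|_F^2}_{\text{semantic regression}}
\;+\; \lambda_2 \underbrace{\big(\|U\|_{2,1} + \|V\|_{2,1}\big)}_{\text{coupled feature selection}},
\qquad
\|U\|_{2,1} = \sum_{i}\Big(\sum_{j} U_{ij}^{2}\Big)^{1/2}.
\]

The row sparsity induced by the l2,1-norm is what realizes the feature selection and noise reduction described above; under the modality-dependent design, an analogous objective with its own pair of mapping matrices would be solved for T2I.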