In this work, we study a novel approach that uses deep neural machine translation to find links between multimodal brain imaging data, such as structural MRI (sMRI) and functional MRI (fMRI). The idea is to treat two different imaging views of the same brain as two different languages conveying common concepts or facts. An important component of the translation model is an attention network module that learns alignments between features from fMRI and sMRI. We use features based on independent component analysis (ICA) as input to the translation model. Our study shows significant group differences between healthy controls and patients with schizophrenia in the learned alignments. Furthermore, this approach reveals a group-differential relation between a cognitive score (attention and vigilance) and the alignments, a relation that could not be found when each modality was considered individually.
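
To make the notion of a learned alignment concrete, the following is a minimal sketch of how an attention module can produce alignment weights between two sets of features. It is an illustration under stated assumptions, not the model used in this work: the scaled dot-product scoring, the function name `attention_alignment`, the feature dimensions, and the random inputs standing in for ICA-based features are all hypothetical.

```python
import numpy as np


def attention_alignment(fmri_feats, smri_feats):
    """Return an (n_fmri, n_smri) matrix of alignment weights.

    Assumption: scaled dot-product scoring; the actual scoring function
    of the translation model is not specified in the text.
    """
    d = fmri_feats.shape[1]
    # Similarity score between every fMRI feature and every sMRI feature.
    scores = fmri_feats @ smri_feats.T / np.sqrt(d)
    # Row-wise softmax (shifted by the row maximum for numerical stability).
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


# Hypothetical stand-ins for ICA-based features of one subject:
# 5 fMRI components and 4 sMRI components, each embedded in 16 dimensions.
rng = np.random.default_rng(0)
fmri = rng.standard_normal((5, 16))
smri = rng.standard_normal((4, 16))

align = attention_alignment(fmri, smri)  # each row sums to 1
context = align @ smri                   # sMRI summary attended by each fMRI feature
```

In a setup like this, each row of `align` describes how strongly one fMRI feature attends to each sMRI feature, and such per-subject alignment matrices are the kind of quantity that could be compared between groups or related to a cognitive score.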