We propose a semi-supervised learning technique for fusing multimodal information sources in content-based image retrieval (CBIR). In our approach, the user's preferences, given in the form of relevance feedback, are treated as labeled data, and the key idea is to devise an on-line scheme that effectively transforms these abstract semantics into useful training data for improving query performance. Specifically, our method has the following three advantages: 1) Kernel matrices are used to encode each modality of information, so that the fusion can be conveniently carried out via boosting; 2) The base kernel matrices are derived by eigendecomposing the graph Laplacian, and are further refined to satisfy a pivotal monotone property, which ensures that the intrinsic structure of each modality is reasonably maintained; 3) The optimization criterion adopted in boosting is alignment with a target kernel matrix that accounts for the relevance feedback, and the learned multimodal kernel matrix can be used for training and then for testing on the unlabeled images in the database. To demonstrate the efficiency of the proposed framework, we provide experimental results on CBIR and discuss several practical considerations.
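As a rough illustration of the ingredients named above, the following NumPy sketch builds rank-one base kernels from the eigenvectors of a graph Laplacian, scores each one by its alignment with a target kernel formed from feedback labels, and combines them. It is a minimal sketch under stated assumptions and not the method itself: the k-nearest-neighbor graph, the Gaussian edge weights, the function names (`base_kernels`, `alignment`, `combine`), and the toy data are all hypothetical; the boosting procedure is replaced by a simple alignment-weighted sum; and the monotone property is read here as "weights do not increase as eigenvectors become less smooth", which is only one possible interpretation.

```python
import numpy as np

def base_kernels(X, n_neighbors=3, n_components=4):
    """Rank-one base kernels v v^T from the smoothest Laplacian eigenvectors."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    sigma2 = np.median(d2[d2 > 0])                        # assumed bandwidth
    nbrs = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]   # skip self at column 0
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nbrs[i]] = np.exp(-d2[i, nbrs[i]] / sigma2)
    W = np.maximum(W, W.T)                                # symmetrize the graph
    L = np.diag(W.sum(axis=1)) - W                        # unnormalized Laplacian
    _, evecs = np.linalg.eigh(L)                          # ascending eigenvalues
    return [np.outer(evecs[:, k], evecs[:, k]) for k in range(n_components)]

def alignment(K1, K2):
    """Kernel alignment: <K1, K2>_F / (||K1||_F ||K2||_F)."""
    denom = np.linalg.norm(K1) * np.linalg.norm(K2)
    return float((K1 * K2).sum() / denom) if denom > 0 else 0.0

def combine(bases, labeled, y):
    """Alignment-weighted sum of base kernels (a stand-in for boosting)."""
    T = np.outer(y, y)                                    # target kernel from feedback
    sub = np.ix_(labeled, labeled)
    w = np.array([alignment(B[sub], T) for B in bases])
    w = np.minimum.accumulate(w)                          # assumed monotone constraint
    K = sum(wk * B for wk, B in zip(w, bases))
    return K, w, T

# Toy data: two clusters of 10 points; four images carry relevance feedback.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(3.0, 0.3, (10, 2))])
labeled = [0, 1, 10, 11]
y = np.array([1.0, 1.0, -1.0, -1.0])                      # relevant vs. irrelevant
K, w, T = combine(base_kernels(X), labeled, y)
scores = K[:, labeled] @ y                                # similarity-weighted vote
```

In this sketch each modality would contribute its own list of base kernels, and the lists would simply be concatenated before calling `combine`; the final `scores` vector ranks all database images, labeled or not, against the feedback.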