In this paper, we propose a framework that fuses textual and visual features of user-generated social media data to mine the distribution of user interests. The framework consists of three steps: feature extraction, model training, and user interest mining. We collect training and test data from the boards of popular Pinterest users. For each pin, we extract a term-document matrix as textual features, a bag of visual words as low-level visual features, and attributes as mid-level visual features. Representative features are then selected to train topic models using discriminative latent Dirichlet allocation (DLDA). In the performance evaluation, pins collected from popular users are used to measure classification accuracy, and pins collected from other, ordinary users are used to measure recommendation performance. Our experimental results demonstrate the efficacy of the proposed method.
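
As an illustration of the feature-extraction step only, the following minimal sketch builds a term-document matrix from pin descriptions and joins each row with a bag-of-visual-words histogram and a vector of attribute scores. The fusion by simple concatenation is an assumption made for illustration, since the text above does not specify how the features are combined. The function names, the toy descriptions, and the histogram and attribute values are all hypothetical, and feature selection and DLDA training are not shown.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix


def l1_normalize(vec):
    """Scale a non-negative count vector so that its entries sum to 1."""
    total = sum(vec)
    return [v / total for v in vec] if total else list(vec)


def fuse_features(text_row, visual_word_hist, attribute_scores):
    """Concatenate the three feature groups (assumed fusion scheme)."""
    return (
        l1_normalize(text_row)
        + l1_normalize(visual_word_hist)
        + list(attribute_scores)
    )


# Hypothetical toy data: two pins, a 3-word visual codebook, 2 attributes.
pin_descriptions = ["rustic wedding cake", "chocolate cake recipe"]
visual_word_hists = [[3, 0, 1], [0, 2, 2]]
attribute_scores = [[0.9, 0.1], [0.2, 0.8]]

vocab, tdm = term_document_matrix(pin_descriptions)
fused = [
    fuse_features(t, v, a)
    for t, v, a in zip(tdm, visual_word_hists, attribute_scores)
]
```

Each fused vector here has one entry per vocabulary term, one per visual word, and one per attribute. Normalizing the two count-based groups before concatenation keeps either modality from dominating purely because of its scale, although other weightings would be equally plausible.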