This paper discusses how a robot partner can learn human preferences through interaction with a person. We use a robot music player, miuro, and focus on selecting music that provides the person with a comfortable sound field. First, we propose a control architecture for miuro based on three modes: autonomous behavior mode, interactive behavior mode, and human control mode. Next, we propose a method based on Q-learning for learning the relationship between the person's position and the corresponding music selection. Furthermore, we propose a similarity matrix to reduce the learning time of Q-learning. The experimental results show that the proposed method can learn the relationship between the person's position and the music the person prefers at that position.
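
As a rough illustration of how a similarity matrix might shorten Q-learning, the following minimal sketch treats discretized human positions as states and music choices as actions. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the reward signal, the choice to define similarity between music choices (rather than between positions), and the rule that scales each update by the similarity weight.

```python
import random


def make_q_table(n_positions, n_tracks):
    """Q[s][a]: estimated value of playing track a when the person is at position s."""
    return [[0.0] * n_tracks for _ in range(n_positions)]


def select_track(q_row, epsilon, rng):
    """Epsilon-greedy selection over the tracks for one position."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return q_row.index(max(q_row))


def q_update(q, s, a, reward, s_next, similarity, alpha=0.5, gamma=0.9):
    """Q-learning update that is also shared with tracks similar to a.

    similarity[a][b] in [0, 1] is a hypothetical weight; similarity[a][a] == 1
    recovers the ordinary update for the track that was actually played.
    """
    target = reward + gamma * max(q[s_next])
    for b in range(len(q[s])):
        # Assumed sharing rule: the step size is scaled by the similarity weight.
        q[s][b] += alpha * similarity[a][b] * (target - q[s][b])


# Usage: two positions, three tracks; tracks 0 and 1 are assumed to be similar.
similarity = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
q = make_q_table(2, 3)
q_update(q, s=0, a=0, reward=1.0, s_next=1, similarity=similarity, alpha=0.5, gamma=0.0)
greedy = select_track(q[0], epsilon=0.0, rng=random.Random(0))
```

In this sketch a single positive reward for one track also raises the estimate of a similar track at the same position, so fewer interactions are needed before similar tracks are ranked sensibly. Whether the paper shares updates in this way is not stated in the abstract.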