In image annotation, the annotation words are expected to represent image content at both the visual and the semantic level. A single word, however, is often ambiguous: "apple", for example, may refer to a fruit or to a company. When "apple" is combined with "phone" or "fruit", the resulting phrase is more consistent both semantically and visually. In this paper, we attempt to discover such combinations and construct a less ambiguous phrase-level lexicon for annotation. First, concept-based image search is conducted to obtain a semantically consistent image set (SC-IS). Then, a hierarchical clustering algorithm is applied to group the images in SC-IS by visual similarity, yielding a semantically and visually consistent image set (SVC-IS). Finally, we apply frequent itemset mining to SVC-IS to construct the phrase-level lexicon, and we integrate the lexicon into a probabilistic annotation framework to estimate the annotation words of untagged images. Our experimental results show that the discovered phrase-level lexicon improves annotation performance.
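
To make the frequent itemset mining step concrete, the following minimal Python sketch counts how often combinations of tags co-occur across the images of one cluster and keeps those whose support reaches a threshold. It is an illustration under stated assumptions, not the method used in this paper: the function name `mine_frequent_phrases`, the parameters `min_support` and `max_size`, and the sample tags are all hypothetical, and brute-force counting stands in for whichever mining algorithm (for example Apriori or FP-growth) an actual system would use.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_phrases(tag_sets, min_support=0.5, max_size=2):
    """Return tag combinations (size 2..max_size) with support >= min_support.

    Support is the fraction of images whose tag set contains the combination.
    Brute-force counting for illustration only (hypothetical helper).
    """
    n_images = len(tag_sets)
    if n_images == 0:
        return {}
    counts = Counter()
    for tags in tag_sets:
        # Sort so that each combination has one canonical tuple form.
        unique = sorted(set(tags))
        for size in range(2, max_size + 1):
            counts.update(combinations(unique, size))
    return {
        phrase: count / n_images
        for phrase, count in counts.items()
        if count / n_images >= min_support
    }


# Hypothetical tags for one visually coherent cluster retrieved for "apple".
cluster_tags = [
    {"apple", "fruit", "red"},
    {"apple", "fruit", "tree"},
    {"apple", "fruit", "red", "fresh"},
    {"apple", "phone"},
]

phrases = mine_frequent_phrases(cluster_tags, min_support=0.5)
for phrase, support in sorted(phrases.items()):
    print(" ".join(phrase), support)
```

With these sample tags, the pair ("apple", "fruit") is kept with support 0.75, while ("apple", "phone") occurs in only one of four images and is discarded. This mirrors the idea that a phrase is less ambiguous than a single word when its members co-occur consistently within a visually coherent set.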