Most approaches to image retrieval are based on the bag-of-visual-words (BoV) representation. However, the BoV model typically ignores spatial information that is crucial for visual representation. Zhang et al. [1] proposed an approach to encode spatial information into BoV using Geometry-preserving Visual Phrases (GVP). They found that 2-GVP gives the best results in general, while the performance of higher-order GVP (length > 2) decreases because it imposes too strong a spatial constraint. Although higher-order GVP performs worse overall, it often yields better top-10 retrieved images than 2-GVP, because near-duplicate images usually satisfy strong spatial constraints with respect to the query image. Based on this observation, we propose an approach, M-GVP, that merges the results of multiple GVP orders in the searching step to encode spatial information more effectively. Experimental results on the Oxford 5K dataset show that M-GVP stably improves overall retrieval accuracy and, in particular, yields better top-10 ranked images than the GVP method.
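As a rough illustration of the merging idea, the sketch below fuses per-order similarity scores with a weighted sum, so that higher orders (which impose tighter spatial constraints) can push near duplicates toward the top of the ranking. The function name, the weighted-sum rule, and all scores are hypothetical assumptions for illustration; the actual M-GVP merging scheme is defined in the paper itself.

```python
# Hypothetical sketch of merging ranked results from multiple GVP orders.
# Assumes a simple weighted-sum fusion over per-order similarity scores;
# the real M-GVP combination rule may differ.

def fuse_gvp_scores(scores_by_order, weights):
    """scores_by_order: {order: {image_id: similarity}}; weights: {order: weight}.

    Returns image ids ranked by fused score, best first.
    """
    fused = {}
    for order, scores in scores_by_order.items():
        w = weights.get(order, 0.0)
        for img, s in scores.items():
            fused[img] = fused.get(img, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)

# Toy example: 2-GVP slightly favours image "a"; 3-GVP (tighter spatial
# constraint) strongly favours the near-duplicate "b", so fusion ranks "b" first.
scores = {
    2: {"a": 0.9, "b": 0.8, "c": 0.3},
    3: {"a": 0.1, "b": 0.9, "c": 0.0},
}
ranking = fuse_gvp_scores(scores, weights={2: 1.0, 3: 1.0})
```

With equal weights, the strong 3-GVP score for the near duplicate outweighs the small 2-GVP gap, which mirrors the observation that high-order matches help most in the top-ranked results.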