The bag-of-words model is one of the most widely used methods in recent studies of multimedia data retrieval. Its key idea is to quantize a bag of local features, such as SIFT descriptors, into a histogram of visual words, so that standard information retrieval techniques developed for text retrieval can be applied directly. Despite its success, one problem with the bag-of-words model is that its two key steps, i.e., feature quantization and retrieval, are separated: the step of generating the bag-of-words representation is not optimized for the retrieval step, which often leads to sub-optimal performance. In this paper we propose a statistical framework for large-scale near-duplicate image retrieval that unifies the two steps by introducing a kernel density function. The central idea of the proposed method is to represent each image by a kernel density function; the similarity between the query image and a database image is then estimated as the query likelihood. To make the proposed method applicable to large-scale data sets, we develop efficient algorithms for both estimating the density function of each image and computing the query likelihood. Our empirical studies confirm that the proposed method is both more effective and more efficient than the bag-of-words model.
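As a rough illustration of the query-likelihood idea described above, the following NumPy sketch scores a database image by the likelihood of the query image's local descriptors under a kernel density estimated from that image's descriptors. The Gaussian kernel, the fixed bandwidth, and the function name `kde_query_log_likelihood` are illustrative assumptions; the paper's efficient large-scale estimation algorithms are not reproduced here.

```python
import numpy as np

def kde_query_log_likelihood(query_desc, db_desc, bandwidth=1.0):
    """Log-likelihood of the query image's local descriptors under a
    Gaussian kernel density estimated from a database image's descriptors.

    query_desc: (n_q, d) array of query local features (e.g. SIFT)
    db_desc:    (n_b, d) array of database-image local features
    bandwidth:  kernel bandwidth (an assumed fixed value for this sketch)
    """
    n_b, d = db_desc.shape
    # Pairwise squared distances between query and database descriptors, (n_q, n_b).
    d2 = ((query_desc[:, None, :] - db_desc[None, :, :]) ** 2).sum(axis=-1)
    # Log Gaussian kernel values.
    log_k = (-d2 / (2.0 * bandwidth ** 2)
             - 0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2))
    # Numerically stable log-mean-exp over database descriptors gives the
    # log density of the estimate at each query descriptor.
    m = log_k.max(axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(n_b)
    # Query likelihood: descriptors treated as i.i.d. draws, so log terms add.
    return float(log_p.sum())
```

Ranking database images by this score replaces the hard quantization of features into discrete visual words with a soft, density-based match; for large collections, efficient approximations such as those developed in the paper would be required.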