Clustering is an important research direction in knowledge discovery. As the classical method in clustering, the k-median algorithm is with serious deficiency such as low efficiency, bad adaptability for large data set etc. To solve this problem, a new method named LCPD (linear clustering based on probability distribution) is proposed in this paper. The main contribution includes: (1) partitions the buckets by using the space of equal probability in the m-dimension super-cube to make the number of data items in each layer ( namely the bucket of Hash) approximate equal, gets the layering sampling with the small cost; (2) The samples under the new algorithms is with sufficient representative power for total data set; (3) proves that the complexity of the new algorithm is O(n); (4) by the comparing experiment shows that the performance of LCPD is 2 magnitude higher than traditional with the number of data set near to 10000, and the clustering quantity is increase 55% with number of data set near to 8000.