In this paper, we apply and evaluate a modified Gaussian-test-based hierarchical clustering method for high-resolution satellite images. The purpose is to obtain homogeneous clusters within each hierarchy level, which later allow the classification and annotation of image data ranging from single scenes up to large satellite data archives. After cutting a given image into small patches and extracting features from each patch, $k$-means is used to split sets of extracted image feature vectors and thereby create a hierarchical structure. As image feature vectors usually lie in a high-dimensional feature space, we test different distance metrics to tackle the “curse of dimensionality” problem. Using three different synthetic aperture radar (SAR) and optical image datasets, we extract Gabor texture and Bag-of-Words (BoW) features and analyze the clustering results via visual and quantitative evaluations. We also compare our approach with other classic unsupervised clustering methods. The most important contributions of this paper are the discussion and evaluation of cluster homogeneity by comparing various datasets, feature descriptors, evaluation measures, and clustering methods, as well as the analysis of the clustering performance under various distance metrics. The results show that the Gaussian-test-based hierarchical patch clustering method is able to obtain homogeneous clusters, and that Gabor texture features perform better than BoW features. In addition, a distance parameter in the range from 1.2 to 2 performs best. As also indicated in <xref ref-type="bibr" rid="ref1">[1]</xref>, our modified G-means algorithm is faster than the original algorithm.
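To make the split-and-test idea concrete, the following is a minimal sketch of a generic G-means-style hierarchy. It is not the modified algorithm evaluated in this paper. A set of feature vectors is split by 2-means, all vectors are projected onto the axis connecting the two centers, and the split is kept only if an Anderson-Darling test rejects Gaussianity of the projection. Several parts are assumptions made for illustration: the helper names (`try_split`, `build_hierarchy`, `gaussian_blob`), the plain Euclidean distance, the small-sample correction factor, the critical value 1.8692 (the figure commonly quoted for G-means at a significance level of 0.0001), and the synthetic 2-D data, which stands in for real patch features.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def anderson_darling(values):
    """Corrected Anderson-Darling statistic, mean and variance estimated."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if std == 0.0:
        return 0.0  # constant projection: treated here as "do not split"
    # Clamp CDF values so the logarithms below stay finite.
    cdf = [min(max(normal_cdf((v - mean) / std), 1e-12), 1.0 - 1e-12)
           for v in sorted(values)]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    return a2 * (1.0 + 4.0 / n - 25.0 / n ** 2)  # assumed correction factor


def sq_dist(a, b):
    """Squared Euclidean distance (assumption: the paper tests other metrics)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def two_means(points, iters=20, seed=0):
    """Plain 2-means; returns the centers and the last assignment."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer_second = sq_dist(p, centers[0]) > sq_dist(p, centers[1])
            groups[int(nearer_second)].append(p)
        # Keep the old center if a group happens to be empty.
        centers = [mean_point(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


def try_split(points, critical=1.8692, min_size=8):
    """Return the two child groups, or None if the set looks Gaussian."""
    if len(points) < min_size:
        return None
    (c0, c1), (g0, g1) = two_means(points)
    axis = [a - b for a, b in zip(c0, c1)]
    norm2 = sum(a * a for a in axis)
    if norm2 == 0.0 or not g0 or not g1:
        return None
    # Project every vector onto the axis connecting the two centers.
    projected = [sum(x * a for x, a in zip(p, axis)) / norm2 for p in points]
    if anderson_darling(projected) <= critical:
        return None
    return g0, g1


def build_hierarchy(points, **kwargs):
    """Recursively split until every leaf passes the Gaussian test."""
    parts = try_split(points, **kwargs)
    children = [] if parts is None else [build_hierarchy(g, **kwargs) for g in parts]
    return {"size": len(points), "children": children}


def gaussian_blob(center, n, rng):
    """Hypothetical stand-in for patch feature vectors: unit-variance blob."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]
```

Calling `build_hierarchy` on a list of feature vectors returns a nested dictionary whose leaves are the clusters that passed the Gaussian test. Replacing `sq_dist` with another metric is the natural place to experiment with the distance parameter discussed above.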