Combining heterogeneous systems has been shown to significantly improve performance on the spoken term detection (STD) task. However, there has been little research into why system combination improves STD performance. In this paper, we analyze the heterogeneity of the systems by calculating the correlation between their scores, and we evaluate the effectiveness of the combined subword-based systems. We investigate both heterogeneous detection schemes and heterogeneous subword units, using the NTCIR-10 task as a test bed. Experimental analysis shows that higher improvement rates are achieved by combining more heterogeneous systems, that is, systems whose scores are less correlated with each other and which therefore carry a larger amount of complementary information. Compared with the best-performing individual system among those being combined, a parallel combination of heterogeneous subword units improves STD performance by 13.59%, and a system using an efficient cascaded combination of heterogeneous subword units and heterogeneous detection schemes improves it by 12.79%. Finally, combining heterogeneous subword units and heterogeneous detection schemes achieves state-of-the-art performance of 74.07 average maximum F-measure on the NTCIR-10 task.
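The abstract measures heterogeneity as the correlation between system scores but does not state which correlation measure or fusion rule is used. The following is a minimal sketch under two assumptions: Pearson correlation over the scores that two systems assign to the same candidate detections, and a simple weighted-sum score fusion. Both function names (pearson, fuse) and the fusion weight are hypothetical and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of detection scores.

    Assumes both lists score the same candidate detections in the same order.
    A low value suggests the systems carry more complementary information.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def fuse(scores_a, scores_b, weight=0.5):
    """Hypothetical parallel combination: weighted sum of two systems' scores
    for each candidate detection (not necessarily the paper's fusion rule)."""
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]


# Usage with made-up scores from two hypothetical subword-based systems.
system_a = [0.9, 0.2, 0.7, 0.4]
system_b = [0.3, 0.8, 0.6, 0.5]
print(pearson(system_a, system_b))
print(fuse(system_a, system_b))
```

Under this sketch, a pair of systems with a correlation near 1.0 gains little from fusion, because the weighted sum mostly reproduces either input ranking; a lower correlation leaves more room for one system's scores to correct the other's.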