Hidden Markov models (HMMs) have proved successful in several research areas, especially in the field of speech recognition. However, the major drawback of the HMM classifier is its sensitivity to certain initial parameters, such as the number of states, which must be tuned carefully. Indeed, it is well known that a number of states suitable for one utterance may not perform as well for other utterances of the same class. To deal with this problem, and to take some of this data variability into account, we investigate a new hybrid framework for speech recognition in which we integrate the HMM classifier into the K-nearest neighbors (KNN) architecture. In this framework, we propose to build several HMMs differing in their numbers of states to represent each class of data, and to use the KNN rule to select the K nearest models and the most represented class, with the Viterbi likelihood as the similarity measure. To resolve ambiguity during the decision step, we propose two different methods. The proposed framework is evaluated on the UCI Spoken Arabic Digit dataset. The results show the effectiveness of our approach, both against the HMM and KNN baselines and against previous works on the same dataset.
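The core decision rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes discrete-observation HMMs with invented toy parameters, scores an utterance against every model in the pool with its Viterbi log-likelihood, and takes a majority vote among the K best-scoring models.

```python
import numpy as np
from collections import Counter

def viterbi_loglik(pi, A, B, obs):
    """Log-likelihood of the single best state path (Viterbi score) for a
    discrete-observation HMM with initial distribution pi, transition
    matrix A, and emission matrix B."""
    logpi, logA, logB = np.log(pi), np.log(A), np.log(B)
    delta = logpi + logB[:, obs[0]]                    # best scores at t = 0
    for o in obs[1:]:
        # delta[i] + logA[i, j], maximised over source state i, plus emission
        delta = (delta[:, None] + logA).max(axis=0) + logB[:, o]
    return delta.max()

def knn_hmm_classify(models, obs, k=3):
    """models: list of (class_label, pi, A, B), with several models per
    class differing in their numbers of states.  Keep the k models with
    the highest Viterbi likelihood and return their majority class."""
    scores = [(viterbi_loglik(pi, A, B, obs), label)
              for label, pi, A, B in models]
    top_k = sorted(scores, reverse=True)[:k]
    return Counter(label for _, label in top_k).most_common(1)[0][0]

# Hypothetical pool: a 2-state and a 3-state model per class ("a" and "b").
pi2 = np.array([0.6, 0.4])
A2  = np.array([[0.7, 0.3], [0.4, 0.6]])
Ba2 = np.array([[0.9, 0.1], [0.2, 0.8]])               # class "a" emissions
Bb2 = np.array([[0.1, 0.9], [0.8, 0.2]])               # class "b" emissions
pi3 = np.array([0.5, 0.3, 0.2])
A3  = np.array([[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]])
Ba3 = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])
Bb3 = np.array([[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]])
models = [("a", pi2, A2, Ba2), ("a", pi3, A3, Ba3),
          ("b", pi2, A2, Bb2), ("b", pi3, A3, Bb3)]

print(knn_hmm_classify(models, [0, 0, 1, 1], k=3))     # → a
print(knn_hmm_classify(models, [1, 1, 0, 0], k=3))     # → b
```

Voting over models rather than picking the single best likelihood is what lets several state counts represent one class: no single number of states has to fit every utterance.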