Hidden Markov Models (HMMs), i.e., doubly stochastic probabilistic networks, have been widely used to analyze time-series data such as those obtained from speech and molecular biology. A crucial issue in modeling time-series data with HMMs is determining the appropriate model architecture: the number of states and the links between them. While current HMM training procedures iteratively optimize model parameters, they usually require the model configuration to be fixed in advance, a task typically performed manually by trained experts. In this paper we present a procedure that automatically configures HMMs: it starts with a large, possibly over-fitted HMM and prunes it down to an appropriate complexity. The procedure can be seen as a generalization of the well-known iterative Baum-Welch algorithm. Its parameter re-estimates can be formally derived, and its local convergence can be formally proved. Compared to existing methods, our procedure offers the following advantages: (1) better convergence characteristics than the standard Baum-Welch algorithm, (2) automatic reduction of model size to an appropriate complexity, (3) better generalization, and (4) relative insensitivity to the initial model size. We demonstrate these features with empirical results on the problem of recognizing DNA promoter sequences.
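To make the train-then-prune idea concrete, the sketch below pairs a standard scaled Baum-Welch re-estimation loop with a simple threshold-based transition-pruning step. The paper's actual re-estimation formulas are not given in this abstract, so the pruning rule (`prune_eps`), the function names, and the synthetic four-symbol demo are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def forward_backward(A, B, pi, obs):
    """Scaled forward-backward pass for a discrete-emission HMM.

    A : (N, N) transition matrix; B : (N, M) emission matrix;
    pi: (N,) initial distribution; obs: sequence of symbol indices.
    Returns scaled alpha, scaled beta, and per-time scaling factors c.
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N)); beta = np.zeros((T, N)); c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    return alpha, beta, c

def baum_welch_prune(A, B, pi, obs, n_iter=50, prune_eps=1e-3):
    """Illustrative training loop: standard Baum-Welch re-estimation,
    followed by a heuristic pruning step (an assumption, standing in
    for the paper's procedure) that zeroes transitions whose
    re-estimated probability falls below prune_eps."""
    obs = np.asarray(obs)
    N, M = B.shape
    T = len(obs)
    for _ in range(n_iter):
        alpha, beta, c = forward_backward(A, B, pi, obs)
        # State occupation probabilities gamma[t, i] = P(q_t = i | O).
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        # Expected transition counts xi[i, j], summed over time.
        xi = np.zeros((N, N))
        for t in range(T - 1):
            xi += (alpha[t][:, None] * A *
                   B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
        # Standard re-estimates for A, B, and pi.
        A = xi / gamma[:-1].sum(axis=0)[:, None]
        for k in range(M):
            B[:, k] = gamma[obs == k].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]
        pi = gamma[0]
        # Pruning: drop weak links, renormalize rows (assumes every
        # row keeps at least one transition above the threshold).
        A = np.where(A < prune_eps, 0.0, A)
        A /= A.sum(axis=1, keepdims=True)
    return A, B, pi

if __name__ == "__main__":
    # Toy demo on synthetic data over a 4-symbol alphabet (echoing
    # the DNA setting, but not the paper's promoter dataset).
    rng = np.random.default_rng(0)
    N, M = 5, 4
    A = rng.dirichlet(np.ones(N), size=N)
    B = rng.dirichlet(np.ones(M), size=N)
    pi = np.full(N, 1.0 / N)
    obs = rng.integers(0, M, size=200)
    A, B, pi = baum_welch_prune(A, B, pi, obs, n_iter=20, prune_eps=0.02)
    print("surviving transitions:", int((A > 0).sum()), "of", N * N)
```

In this sketch the model starts dense (all N x N transitions present) and the pruning threshold gradually sparsifies it, mimicking the abstract's "start large, prune down" strategy; the actual procedure derives its re-estimates formally rather than relying on a fixed cutoff.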