In this paper, we study the achievable rate of training-based multiple-input multiple-output (MIMO) systems over time-varying flat fading channels modeled as an $L$-th order autoregressive, AR($L$), process. Using the Bayesian Cram\'{e}r-Rao lower bound (BCRB) to characterize the mean squared error of channel estimation, we investigate the achievable rate of these MIMO systems from an information-theoretic perspective, and we determine the lengths of the training and data blocks that maximize this rate. Furthermore, by fitting an appropriate AR($L$) model to a given normalized Doppler frequency, $f_dT_s$, these results can be extended to characterize the achievable rate over more realistic time-varying wireless fading channels.