While deep neural networks (DNNs) have become the dominant acoustic model (AM) for speech recognition systems, they are still dependent on Gaussian mixture models (GMMs) for alignments both for supervised training and for context dependent (CD) tree building. Here we explore bootstrapping DNN AM training without GMM AMs and show that CD trees can be built with DNN alignments which are better matched to the DNN model and its features. We show that these trees and alignments result in better models than from the GMM alignments and trees. By removing the GMM acoustic model altogether we simplify the system required to train a DNN from scratch.