Data imbalance problem often exists in our real life dataset, especial for massive video dataset, however, the balanced data distribution and the same misclassification cost are assumed in traditional machine learning algorithms, thus, it will be difficult for them to accurately describe the true data distribution, and resulting in misclassification. In this paper, the data imbalance problem in semantic extraction under massive video dataset is exploited, and enhanced and hierarchical structure (called EHS) algorithm is proposed. In proposed algorithm, data sampling, filtering and model training are considered and integrated together compactly via hierarchical structure algorithm, thus, the performance of model can be improved step by step, and is robust and stability with the change of features and datasets. Experiments on TRECVID2010 Semantic Indexing demonstrate that our proposed algorithm has much more powerful performance than that of traditional machine learning algorithms, and keeps stable and robust when different kinds of features are employed. Extended experiments on TRECVID2010 Surveillance Event Detection also prove that our EHS algorithm is efficient and effective, and reaches top performance in four of seven events.