In this paper, we propose a new system, based on machine learning techniques, for classifying Hajj events in diverse and realistic Hajj videos and image scenes. This challenging but important subject has been largely ignored in the past because of several problems, one of which is the lack of realistic, annotated video datasets. The main contribution of this work is to address this limitation and to investigate the use of video for automatic annotation in human event classification. The proposed system consists of three main phases. First, a preprocessing phase applies a shot boundary detection algorithm to the Hajj videos. Second, a feature extraction phase applies sparse coding based on Scale Invariant Feature Transform (SIFT) features. Finally, an event classification phase applies several machine learning techniques, including the K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forests (RF) classifiers. Experiments with real data sets revealed a significant performance advantage of the machine learning techniques over the SIFT feature selection method. Receiver operating characteristic (ROC) analysis is used to compare classifier performance.
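To make the ROC-based comparison concrete, the following minimal sketch computes an ROC curve and its area under the curve (AUC) for two sets of classifier scores. It is an illustration only, not the implementation used in this work: the function names `roc_curve` and `auc`, the toy labels, and the two score lists (`scores_a`, `scores_b`) are all hypothetical and do not correspond to any reported result.

```python
from typing import List, Sequence, Tuple


def roc_curve(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, one per distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort by descending score: lowering the threshold admits samples in this order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data: 1 = target event, 0 = any other event.
labels = [1, 1, 0, 1, 0, 0]
# Hypothetical confidence scores from two classifiers on the same six samples.
scores_a = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
scores_b = [0.9, 0.4, 0.8, 0.3, 0.7, 0.2]

auc_a = auc(roc_curve(labels, scores_a))
auc_b = auc(roc_curve(labels, scores_b))
print(f"classifier A AUC: {auc_a:.3f}")
print(f"classifier B AUC: {auc_b:.3f}")
```

A classifier whose curve lies closer to the top-left corner has a larger AUC and ranks positive samples above negative ones more consistently. For a multi-class event problem, one common assumption is to compute such a curve per event class in a one-vs-rest fashion.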