Due to advances in digital multimedia technologies, a huge amount of video content in areas such as sports, surveillance, and news is becoming widely available. Owing to factors such as the presence of inactive frames, inter-view dependencies, and frequent illumination changes, the entire video content may not be important. Moreover, users may not have adequate time to access or store the entire video in real time. As a solution to these issues, we propose a machine learning ensemble method to summarize the events in multiview videos. To make accurate decisions on keyframes, we trained our ensembles using a meta-learning approach in which the inter-view dependencies and illumination changes are considered during the training phase. Experimental results on two benchmark datasets show that our model outperforms state-of-the-art models, achieving the best Recall and F-measure. The computational analysis of the model also indicates that it meets the requirements of real-time applications.
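The meta-trained ensemble described above can be sketched as a stacking classifier: base learners score each frame, and a meta-learner combines their outputs to make the final keyframe decision. The sketch below is illustrative only; the features, labels, and choice of base models are hypothetical placeholders, not the paper's actual pipeline.

```python
# Minimal sketch of a stacking (meta-learning) ensemble for keyframe
# classification, assuming per-frame features have already been extracted.
# All feature semantics and data here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic per-frame features: e.g. colour-histogram distance, motion
# magnitude, and an illumination-change score aggregated across views.
X = rng.normal(size=(400, 3))
# Synthetic labels: 1 = keyframe, 0 = inactive/redundant frame.
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

# Base learners score each frame; the meta-learner (logistic regression)
# combines their predictions, which lets it learn when each base model is
# reliable, e.g. under strong illumination changes.
ensemble = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
ensemble.fit(X, y)
pred = ensemble.predict(X)  # one keyframe decision per frame
```

Frames predicted as keyframes would then be collected across views to form the multiview summary; the actual feature extraction and view-dependency modelling are beyond this sketch.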