The shot is the basic unit of content-based video retrieval and indexing. Related shots are typically grouped into a higher-level unit called a scene. Browsing and retrieval at the scene level enable users to locate desired video segments quickly and efficiently. This paper introduces a novel algorithm for clustering related shots into scenes using semantic concept vectors, which we define and which are formed by N binary classifiers based on support vector machines (SVMs). First, the video clips are segmented into shots. Key frames are then extracted from each shot, and their color and texture features are computed. Next, the N trained binary classifiers use these color and texture features to classify the shot key frames into different semantic classes, which yields the semantic concept vector of each shot key frame. Finally, our algorithm uses the semantic concept vectors to cluster the shots into scenes. Experimental results indicate that the recall and precision of our algorithm are higher than those of the SIM and ToC algorithms.
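
The pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation, and several parts are assumptions made only for illustration: each trained binary SVM is replaced by a hypothetical linear decision rule (`linear_classifier`), the key-frame features are made-up two-dimensional values standing in for color and texture descriptors, and the grouping rule (`group_shots`, which merges temporally adjacent shots whose concept vectors lie within a Hamming-distance threshold) is one simple possibility, since the abstract does not specify the clustering criterion.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
BinaryClassifier = Callable[[Vector], int]


def linear_classifier(weights: Vector, bias: float) -> BinaryClassifier:
    """Hypothetical stand-in for one trained binary SVM.

    Uses a linear decision rule: output 1 if w.x + b > 0, else 0.
    """
    def predict(features: Vector) -> int:
        score = sum(w * x for w, x in zip(weights, features)) + bias
        return 1 if score > 0 else 0
    return predict


def concept_vector(features: Vector,
                   classifiers: Sequence[BinaryClassifier]) -> List[int]:
    """Form an N-dimensional binary vector, one entry per classifier."""
    return [clf(features) for clf in classifiers]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two concept vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def group_shots(vectors: Sequence[Sequence[int]],
                max_distance: int) -> List[List[int]]:
    """Assumed grouping rule: a shot joins the current scene when its
    concept vector is within max_distance of the previous shot's vector;
    otherwise it starts a new scene."""
    scenes: List[List[int]] = []
    for i, vec in enumerate(vectors):
        if scenes and hamming(vectors[i - 1], vec) <= max_distance:
            scenes[-1].append(i)
        else:
            scenes.append([i])
    return scenes


if __name__ == "__main__":
    # N = 3 hypothetical semantic classes over made-up 2-D features.
    classifiers = [
        linear_classifier([1.0, 0.0], -0.5),
        linear_classifier([0.0, 1.0], -0.5),
        linear_classifier([1.0, 1.0], -1.5),
    ]
    # One feature vector per shot key frame, in temporal order.
    key_frame_features = [
        [0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.9, 0.9],
    ]
    vectors = [concept_vector(f, classifiers) for f in key_frame_features]
    print(vectors)
    print(group_shots(vectors, max_distance=0))
```

In this sketch each shot is represented by a single key frame, and each scene is returned as a list of shot indices. A real system would replace the linear stand-ins with the N trained SVM classifiers and the toy features with actual color and texture descriptors.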