This paper proposes an automatic semantic video content indexing and retrieval system based on the fusion of various low-level visual and shape descriptors. Features extracted from region-based and sub-image block segmentations of video shot key-frames are described by an Image Vector Space Model (IVSM) signature, which gives a compact and efficient description of the content. Static feature fusion schemes based on averaging and concatenation are introduced to obtain effective signatures. Support Vector Machines (SVMs) and neural networks (NNs) are employed as classifiers to detect the semantic content of the video. The classifier outputs are then fused using a neural network based on evidence theory (NN-ET) in order to provide a decision on the content of each shot. Experiments are conducted in the framework of a soccer video feature extraction task.
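The two static fusion schemes named above can be illustrated with a minimal sketch. It assumes each descriptor has already been reduced to a fixed-length signature vector; the function names, the `color` and `texture` vectors, and the unweighted mean are illustrative assumptions, not the paper's exact formulation, which may apply normalization or weighting not described here.

```python
from typing import List, Sequence


def fuse_by_concatenation(signatures: Sequence[Sequence[float]]) -> List[float]:
    """Stack all signatures end to end into one longer vector."""
    fused: List[float] = []
    for signature in signatures:
        fused.extend(signature)
    return fused


def fuse_by_averaging(signatures: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise unweighted mean of signatures of equal length."""
    if not signatures:
        raise ValueError("at least one signature is required")
    length = len(signatures[0])
    if any(len(signature) != length for signature in signatures):
        raise ValueError("averaging requires signatures of equal length")
    count = len(signatures)
    return [sum(signature[i] for signature in signatures) / count
            for i in range(length)]


if __name__ == "__main__":
    # Hypothetical signatures for one key-frame (values chosen for illustration).
    color = [0.25, 0.5, 0.25]
    texture = [0.75, 0.0, 0.25]
    print(fuse_by_concatenation([color, texture]))
    print(fuse_by_averaging([color, texture]))
```

Concatenation keeps every descriptor dimension at the cost of a longer signature, whereas averaging keeps the signature length fixed but requires all descriptors to share the same dimensionality.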