Semantic analysis of soccer video has attracted many researchers in recent years. Many machine learning methods have been applied to this task and have achieved positive results, but neural network methods have not yet been used for it. Considering the strength of the Convolutional Neural Network (CNN) in fully exploiting features and the ability of the Recurrent Neural Network (RNN) to model temporal relations, we construct a deep neural network to detect soccer video events in this paper. First, we determine the event boundaries, for which we use Play-Break (PB) segments obtained by a traditional method. Then we extract semantic features of the key frames in each PB segment with a pre-trained CNN, and finally use an RNN to map the semantic features of the PB segment to one of four event types: goal, goal attempt, card, and corner. Because no suitable and effective dataset exists, we classify soccer frame images into nine categories according to their different semantic views and construct a dataset called the Soccer Semantic Image Dataset (SSID) for training the CNN. Extensive experiments on 30 soccer match videos demonstrate that our method is more effective than state-of-the-art methods.
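The pipeline described above (pre-trained CNN features per key frame, then an RNN over the PB segment) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the feature dimension, hidden size, weight initialization, and the simple Elman-style recurrence are all assumptions introduced here for clarity, and the network is untrained.

```python
import numpy as np

# Illustrative sketch (assumed sizes, untrained weights):
# a pre-trained CNN yields one feature vector per key frame of a
# Play-Break (PB) segment; a simple Elman-style RNN then maps the
# feature sequence to one of the four event types.

EVENT_TYPES = ["goal", "goal attempt", "card", "corner"]
FEAT_DIM = 4096     # assumed CNN feature size (e.g. an fc-layer output)
HIDDEN_DIM = 128    # assumed RNN hidden size

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((FEAT_DIM, HIDDEN_DIM)) * 0.01
W_hh = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) * 0.01
W_hy = rng.standard_normal((HIDDEN_DIM, len(EVENT_TYPES))) * 0.01

def classify_pb_segment(frame_features):
    """Run the RNN over a (T, FEAT_DIM) sequence of key-frame features
    and return a probability distribution over the four event types."""
    h = np.zeros(HIDDEN_DIM)
    for x in frame_features:                 # one step per key frame
        h = np.tanh(x @ W_xh + h @ W_hh)     # recurrent state update
    logits = h @ W_hy                        # read out from final state
    p = np.exp(logits - logits.max())
    return p / p.sum()                       # softmax probabilities

# Example: a PB segment represented by 12 key-frame feature vectors
probs = classify_pb_segment(rng.standard_normal((12, FEAT_DIM)))
print(EVENT_TYPES[int(np.argmax(probs))])
```

In the paper's actual system, the CNN would be trained on the nine-category SSID so that its features carry the semantic view of each frame before the RNN aggregates them over time.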