Hand detection has been widely studied in many areas, e.g. human-computer interaction and driver behavior monitoring. However, the detection accuracy of recent hand detection systems is still far from what practical applications demand, owing to a number of challenges, e.g. hand variations, heavy occlusion, low resolution, and strong lighting conditions. This paper presents the Multiple Scale Faster Region-based Convolutional Neural Network (MS-FRCNN) for detecting hands in digital images collected under such challenging conditions. The proposed method introduces a multiple scale deep feature extraction approach that handles these challenging factors and yields a robust hand detection algorithm. The method is evaluated on a challenging hand database, the Vision for Intelligent Vehicles and Applications (VIVA) Challenge, and compared against various recent hand detection methods. Our proposed method achieves state-of-the-art results, with a detection accuracy 20% higher than that of the second-best method in the VIVA Challenge.