This paper proposes a new approach to modeling arm pose configuration from color images, based on learned features and structural constraints on arm parts. It aims to model human arm pose without assuming a particular clothing style, action category, or background. The approach uses an energy model that describes the dependencies among arm joints and parts. A joint convolutional neural network (J-CNN) operating on multi-scale images is developed to extract features of joints and parts, and the local rigidity of each arm part is used to constrain the co-occurrence of joints and arm parts during dynamic programming inference. Experimental results show better performance than alternative arm pose modeling approaches that rely on hand-crafted features.
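
To make the inference step concrete, the following is a minimal sketch of min-sum dynamic programming over a short chain of joints (for example shoulder, elbow, wrist). It is an illustration under stated assumptions, not the paper's implementation: the function name `chain_min_energy`, the candidate sets, and all cost values are hypothetical. In the proposed approach the per-joint costs would presumably come from J-CNN features and the link costs from the rigidity constraint on arm parts; here they are simply given as arrays.

```python
import numpy as np

def chain_min_energy(unary, pairwise):
    """Minimize a chain-structured energy by dynamic programming (hypothetical helper).

    unary[i][k]       : cost of placing joint i at candidate location k
    pairwise[i][k, l] : cost of joint i at candidate k together with joint i+1 at candidate l
    Returns (minimum energy, list with the chosen candidate index per joint).
    """
    cost = np.asarray(unary[0], dtype=float)
    backptr = []
    for i in range(1, len(unary)):
        link = np.asarray(pairwise[i - 1], dtype=float)
        node = np.asarray(unary[i], dtype=float)
        # total[k, l]: best cost so far with joint i-1 at k, plus link cost, plus joint i at l
        total = cost[:, None] + link + node[None, :]
        backptr.append(total.argmin(axis=0))  # best predecessor for each candidate l
        cost = total.min(axis=0)
    # Backtrack from the best final candidate to recover the full assignment
    last = int(cost.argmin())
    path = [last]
    for bp in reversed(backptr):
        path.append(int(bp[path[-1]]))
    path.reverse()
    return float(cost[last]), path

# Toy example (all numbers are made up): three joints, two candidates each
unary = [[0.0, 1.0], [0.5, 0.2], [1.0, 0.1]]
pairwise = [
    [[0.0, 2.0], [2.0, 0.0]],  # shoulder-elbow link costs
    [[0.0, 0.3], [0.3, 0.0]],  # elbow-wrist link costs
]
energy, path = chain_min_energy(unary, pairwise)
```

Because the energy decomposes over consecutive joints, the minimum is found in time linear in the number of joints and quadratic in the number of candidates per joint, rather than by enumerating every joint configuration.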