This paper develops a novel perceptual video quality metric for video quality assessment (VQA), built on features used for semantic tasks and on human material perception. Motivated by neuroscientific and psychophysical evidence, textons, shape, contour, and color are extracted from the preprocessed video as important mid-level cues for semantic tasks. Drawing on studies of human material perception, bilateral filters are used to separate structural information from texture information for material recognition. Finally, several estimators are combined with different weights. All of these operate within a spatio-temporal-tube scheme based on block-wise motion estimation.
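As a minimal sketch of the structure/texture separation step described above, the following Python code applies a naive bilateral filter to a frame and takes the residual as the texture layer. The filter implementation, the parameter values (`radius`, `sigma_s`, `sigma_r`), and the synthetic test frame are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Naive bilateral filter: smooths fine texture while preserving
    strong edges, via joint spatial and intensity (range) weighting."""
    h, w = img.shape
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2 * sigma_s**2))  # spatial kernel
    padded = np.pad(img, radius, mode='edge')
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range kernel: down-weight pixels with very different intensity
            rangew = np.exp(-((patch - img[i, j])**2) / (2 * sigma_r**2))
            wgt = spatial * rangew
            out[i, j] = np.sum(wgt * patch) / np.sum(wgt)
    return out

# Hypothetical test frame: a sharp edge (structure) plus fine noise (texture).
frame = np.zeros((16, 16))
frame[:, 8:] = 1.0
rng = np.random.default_rng(0)
frame += 0.05 * rng.standard_normal(frame.shape)

# Structure = bilateral-filtered frame; texture = residual.
structure = bilateral_filter(frame)
texture = frame - structure
```

Because the range kernel suppresses contributions across large intensity jumps, the step edge survives in the structure layer while the low-amplitude noise is pushed into the texture layer, which is the property the material-recognition features rely on.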