We propose a computational model for estimating the scalable visual sensitivity profile (SVSP) of video, a hierarchy of saliency maps that simulates the bottom-up and top-down attention mechanisms of the human visual system (HVS). The bottom-up process considers low-level, stimulus-driven visual features such as intensity, color, orientation, and motion. The top-down process simulates high-level, task-driven cognitive features such as detecting human faces and captions in the video. A nonlinear addition model is used to integrate the low-level visual features, and a full center-surround receptive field profile is introduced to provide spatial scalability. Owing to its hierarchical nature, the proposed SVSP can be directly applied to enhance the perceptual quality of spatially scalable video coding. To demonstrate the effectiveness of the proposed SVSP, extensive experiments on its application to visual quality assessment are conducted.
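The bottom-up stage described above can be illustrated with a minimal sketch: each feature channel is passed through a center-surround contrast operator and the resulting conspicuity maps are fused by nonlinear addition. This is not the paper's implementation; the box blur (standing in for Gaussian receptive fields), the specific radii, and the power-sum form of the nonlinear addition are all illustrative assumptions.

```python
import numpy as np

def box_blur(img, r):
    # Naive box blur: a cheap stand-in for a Gaussian receptive field (assumption).
    h, w = img.shape
    p = np.pad(img, r, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2

def center_surround(feature, r_center=1, r_surround=4):
    # Center-surround contrast: |fine-scale response - coarse-scale response|.
    return np.abs(box_blur(feature, r_center) - box_blur(feature, r_surround))

def normalize(m):
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def nonlinear_addition(maps, p=2.0):
    # Power-sum fusion of normalized conspicuity maps; the exponent form
    # is an assumed instance of a nonlinear addition rule, not the paper's.
    stacked = np.stack([normalize(m) for m in maps])
    return (stacked ** p).sum(axis=0) ** (1.0 / p)

# Toy frame: a bright patch on a dark background.
frame = np.zeros((32, 32))
frame[10:14, 10:14] = 1.0

intensity = center_surround(frame)
# Crude "orientation" channel: gradient magnitude of the frame (assumption).
gy, gx = np.gradient(frame)
orientation = center_surround(np.hypot(gx, gy))

saliency = nonlinear_addition([intensity, orientation])
```

On the toy frame, the fused map peaks inside the bright patch and is zero in the distant background, which is the qualitative behavior a stimulus-driven saliency front-end should exhibit.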