Online visual tracking with high-order pooling

Xiyu Yan; Bo Ma

doi:10.1109/ICME.2017.8019349

Source

2017 IEEE International Conference on Multimedia and Expo (ICME), pp. 289-294

Abstract

Most local sparse representation models in visual tracking comprise three components: 1) extracting local descriptors from the target region, 2) encoding the extracted local descriptors as mid-level features, and 3) aggregating statistics of the mid-level features into a signature. Because the last step aggregates only first-order statistics of the mid-level features, it is called First-order Pooling (FP). However, FP lacks high-order statistical information about the target and therefore cannot reflect the correlation between features, which leads to poor tracking performance. In this paper, we introduce an appearance model for visual tracking that conducts High-order Pooling (HP) over mid-level features under the framework of sparse coding. We find that, compared with a first-order signature, higher-order statistics of mid-level features with additional image information can bring large gains in tracking performance. Moreover, a simple but effective updating scheme is adopted to improve the tracker's adaptability. Experiments on various challenging videos show that appearance models using HP achieve better tracking performance than those using FP.
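
The difference between the two pooling styles can be illustrated with a minimal sketch. The abstract does not give the paper's exact HP formulation, so the code below assumes a generic second-order average pooling (the mean of outer products of the per-patch codes) as a stand-in. The function names `first_order_pool` and `second_order_pool` and the toy data are hypothetical and chosen for illustration only.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def first_order_pool(codes: Matrix) -> Vector:
    """Average pooling: mean of each coefficient over all local patches."""
    n = len(codes)
    dim = len(codes[0])
    return [sum(c[j] for c in codes) / n for j in range(dim)]


def second_order_pool(codes: Matrix) -> Matrix:
    """Assumed stand-in for HP: mean of outer products c * c^T over patches."""
    n = len(codes)
    dim = len(codes[0])
    return [
        [sum(c[i] * c[j] for c in codes) / n for j in range(dim)]
        for i in range(dim)
    ]


# Toy mid-level codes for two patches each (not data from the paper).
# Both sets have the same per-coefficient mean, but in `together` the two
# coefficients fire on the same patch, while in `apart` they never do.
together = [[1.0, 1.0], [0.0, 0.0]]
apart = [[1.0, 0.0], [0.0, 1.0]]

fp_together = first_order_pool(together)
fp_apart = first_order_pool(apart)
hp_together = second_order_pool(together)
hp_apart = second_order_pool(apart)

print(fp_together == fp_apart)  # first-order signatures are identical
print(hp_together == hp_apart)  # second-order signatures differ
```

In this toy case the first-order signatures coincide, while the off-diagonal entries of the second-order signatures differ, which is the kind of feature correlation the abstract says FP cannot capture.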