This paper presents a novel sequential variational inference algorithm for distributed multi-sensor tracking and fusion. The algorithm is based on a multi-sensor target representation in which a target is represented jointly by its states at the individual sensors and by a global state that fuses all sensor data. A tree-structured graphical model captures the dependencies between these states at each time instant. In contrast to previous work, most of which is based on belief propagation (BP), we propose an alternative variational inference algorithm that combines importance sampling techniques with variational methods for graphical models to infer the multi-sensor target states over time. In particular, the sequential variational inference algorithm distributes the global inference task across the nodes of the graphical model. Via a message-passing scheme similar to BP, the inference processes at different nodes collaborate so that each integrates information from all sensors. One contribution of this paper is the design of an importance function for generating the samples that approximate the target distributions. Experiments on a synthetic example show that our method achieves results comparable to those of BP-based methods.