Vector processing is gaining attention as a way to support multimedia workloads, which are dominated by short vectors of small subword data. In this paper we propose a novel vector instruction set that combines the benefits of subword parallelism and traditional vector processing. We also develop a simple cache prefetching optimisation that exploits the two-dimensional data access pattern of MPEG-2 video applications. The architectural parameter space is explored through a simple analytical study. This analysis is complemented by detailed simulation of the complete system, which shows that the optimised cache removes 75% of the cache misses and that, on average, the proposed instruction set performs as well as a subword instruction set with twice the word size.