Configurable coprocessors have been an active research area for some time. The limited instruction word length and the limited number of operands in a single instruction have become a potential performance bottleneck for traditional SIMD extensions. In this paper, we use LEON-2 as the host platform and present a novel low-cost architecture with extended shadow_f registers. Each extended instruction uses shadow_f registers to hold a copy of the results received in the writeback stage, which reduces the data transfer time between LEON-2 and the coprocessor. Analysis of the proposed architecture shows that only partial replication of the register file is needed to mitigate the bandwidth limitation. In addition, the proposed vector arithmetic unit is shown to match the calculation patterns required by the integer version of the MELP algorithm. Implemented on a Stratix II FPGA, our approach achieves a promising speedup (up to 3.85X on some dominant kernels) with only a 16% increase in area.