Reconfigurable architectures give many circuits greater flexibility as well as higher efficiency. By dynamically reconnecting the datapath between computation units, the performance of many designs can be optimized. Inspired by prior work, we propose a new MIMD Streaming (MIMDS) execution scheme that relies on a reconfigurable design and features highly efficient stream processing. We also take program locality into account when designing the reconfigurable architecture. We therefore use a permutation network [1] as the reconfigurable path; it provides limited but sufficient reconfigurability, which leads to lower area cost and lower power consumption. In this paper, we take a commercial processor, the C54x from Texas Instruments [2], as an example and detail the modifications that turn the baseline C54x into our proposed MIMDS architecture. We show that, with the extra ALUs and the efficient datapath, the C54x with the MIMDS feature achieves, at most, 63% fewer execution cycles and 45% fewer memory accesses overall. Compared with the traditional C54x, our design incurs only 12% area overhead. Moreover, considering the configurable network alone, our permutation network saves 85% of the area of a fully reconfigurable datapath while still supporting sufficient reconfigurability.