This paper presents a novel simultaneous multithreading (SMT) VLIW DSP architecture with a dynamic dispatch mechanism to address the underutilization of computing resources in non-unit assumed latency (NUAL) VLIW DSPs. SMT exploits otherwise unused instruction slots by converting thread-level parallelism into instruction-level parallelism, which improves resource utilization. Registers specifically designed to eliminate horizontal dependencies among the instructions of an execution packet allow the NUAL VLIW DSP to issue any subset of a packet's instructions, depending on which of the corresponding functional units are available. With the dynamic dispatch mechanism, the DSP assigns instructions to functional units at run time rather than at compile time, which significantly reduces issue conflicts among threads. The proposed architecture was implemented and evaluated, and the results show that it effectively increases processor throughput, hides cache miss latencies, and improves digital signal processing performance.
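A toy cycle-count model can illustrate why run-time unit binding combined with subset issue reduces inter-thread issue conflicts. This is a minimal sketch under hypothetical assumptions, not the architecture described here. The unit names (`ALU0`, `ALU1`, `MUL0`, `LDST0`), the fixed thread priority, single-cycle issue, and the absence of stalls between packets are all illustrative simplifications, and cache misses are not modeled. The baseline binds each instruction to a specific unit at compile time and issues a packet only as a whole. The dynamic variant lets each instruction take any free unit of its class and leaves unissued instructions for a later cycle, which is legal only because instructions within a packet are assumed to be independent.

```python
from collections import Counter

# Hypothetical functional units (illustrative only): two ALUs, one multiplier,
# one load/store unit.
UNITS = ["ALU0", "ALU1", "MUL0", "LDST0"]


def unit_class(unit):
    """Strip the instance index, e.g. 'ALU1' -> 'ALU'."""
    return unit.rstrip("0123456789")


def simulate_static(threads):
    """Baseline: compile-time unit binding with whole-packet issue. A thread's
    packet issues only if every unit the compiler named for it is free."""
    assert all(u in UNITS for t in threads for p in t for u in p)
    pcs = [0] * len(threads)
    cycles = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        free = set(UNITS)
        for i, packets in enumerate(threads):  # fixed thread priority
            if pcs[i] < len(packets) and set(packets[pcs[i]]) <= free:
                free -= set(packets[pcs[i]])
                pcs[i] += 1
        cycles += 1
    return cycles


def simulate_dynamic(threads):
    """Run-time unit binding with subset issue. Each instruction takes any free
    unit of its class; instructions that find none wait for a later cycle.
    Assumes no horizontal dependencies inside a packet."""
    assert all(u in UNITS for t in threads for p in t for u in p)
    capacity = Counter(unit_class(u) for u in UNITS)
    pcs = [0] * len(threads)
    pending = [list(t[0]) if t else [] for t in threads]
    cycles = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        free = capacity.copy()
        for i, packets in enumerate(threads):  # fixed thread priority
            if pcs[i] >= len(packets):
                continue
            waiting = []
            for op in pending[i]:
                cls = unit_class(op)
                if free[cls] > 0:
                    free[cls] -= 1  # dispatch to any free unit of this class
                else:
                    waiting.append(op)  # retry in a later cycle
            pending[i] = waiting
            if not waiting:  # the whole packet has now issued
                pcs[i] += 1
                pending[i] = list(packets[pcs[i]]) if pcs[i] < len(packets) else []
        cycles += 1
    return cycles


# Two threads compiled independently, so both name the same units.
program = [["ALU0", "MUL0"], ["ALU0", "LDST0"]]
threads = [program, program]
print(simulate_static(threads), simulate_dynamic(threads))  # prints: 4 3
```

In the static baseline the second thread stalls whenever the first holds `ALU0`, even though `ALU1` is idle. In the dynamic variant it borrows `ALU1` and issues the multiply separately, finishing one cycle earlier in this example.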