The evolution of wireless communication protocols drives the quest for power-efficient and flexible computing in embedded DSPs, but the two popular architectures, very-long-instruction-word (VLIW) processors and application-specific instruction-set processors (ASIPs), sit at opposite extremes of the trade-off between power efficiency and flexibility. To bridge this gap, we present DeAr: Dual-thread Architecture DSP, which employs a multi-banked register file that enables simultaneous multi-threading (SMT) and a transport-triggered bus that exploits the data forwarding mechanism in its compact datapath. We also propose a novel scheduling algorithm that leverages the compact hardware to achieve both high throughput and flexible computation. In experiments on common DSP kernels, DeAr reduces power dissipation by 20.3%–13.1% and 31.8%–2.2%, and area by 36.1%–31.5% and 28.2%–5.7%, compared with VLIW and ASIP, respectively.