In this paper we highlight the suitability of the MDSP architecture for exploiting the data, algorithmic, and pipeline parallelism offered by video processing algorithms such as MPEG-2 to achieve real-time performance. Most existing implementations extract either data or pipeline parallelism, along with Instruction Level Parallelism (ILP). We discuss the design of an MP@ML (Main Profile at Main Level) decoding system on a shared-memory MDSP platform and give insights into building larger systems such as HDTV. We also highlight how processor scalability is exploited. Software implementations of video decompression algorithms provide flexibility, but at the cost of being CPU intensive. Hardware implementations have a long development cycle, and current VLIW DSP architectures are less flexible. The MDSP platform offered us the flexibility to design a system that could scale from four MSPs (a Media Stream Processor is a logical cluster of one RISC and two DSP processors) to eight MSPs, and to build a single-chip solution that includes the I/O interfaces for video/audio output. The system has been tested on the CRA2003 board. Specific contributions include the multiple VLD algorithm and other heuristic approaches, such as early-termination IDCT, for fast video decoding.