Video has become a key multimedia application in embedded systems, and various standards have been developed for specific purposes. As a result, embedded systems for video codecs must offer both high performance and flexible functionality. SIMD extensions are a well-known approach to overcoming the performance bottlenecks of programmable processors, especially in multimedia operations. This paper proposes a novel linear SIMD processing array with an intelligent local memory structure, together with its associated software optimization for video decoding. A complete evaluation, including component design, system integration, and cycle-accurate simulation, is carried out using a system-level SoC design tool. Compared to conventional SIMD approaches, the proposed method reduces the execution cycle count by approximately 25%.