This paper presents an on-chip SIMD co-processor (OCSC) architecture that targets computation-intensive video compression applications and is managed efficiently by a small, application-specific adaptive double buffer. Tightly coupled to a main RISC processor, the proposed SIMD co-processor consists of a control unit (CU), an 8 × 8 array of processing units (PUs), a bus interface unit (BIU), and two sets of application-specific adaptive buffers (ADB). The ADB hides memory access latency effectively: while the PUs compute on values stored in one buffer, the BIU and CU load the other buffer. By supporting row-major and column-major order processing and diagonal-based concurrent broadcasting, the BIU enables high-speed computation for video compression. We analytically evaluate the memory latency hiding provided by the ADB, and its optimal size, on the proposed architecture for a DCT application. The results show that memory latency hiding with a small 512-byte buffer improves DCT performance by 29.5% over the non-buffered system.
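
The latency-hiding effect of double buffering can be illustrated with a simple timing model. The sketch below is not the analysis used in this paper. It assumes a generic two-stage pipeline in which every block takes a fixed load time and a fixed compute time, and all function names and numeric values are hypothetical and chosen only for illustration.

```python
def unbuffered_time(n_blocks, t_load, t_compute):
    """Total time when each block is loaded and then computed, with no overlap."""
    return n_blocks * (t_load + t_compute)


def double_buffered_time(n_blocks, t_load, t_compute):
    """Total time when loading block i+1 overlaps with computing block i.

    Assumed model: the first load and the last compute cannot be overlapped;
    every step in between costs the slower of the two stages.
    """
    if n_blocks == 0:
        return 0
    return t_load + (n_blocks - 1) * max(t_load, t_compute) + t_compute


def improvement(n_blocks, t_load, t_compute):
    """Fraction of the unbuffered time saved by double buffering."""
    base = unbuffered_time(n_blocks, t_load, t_compute)
    overlapped = double_buffered_time(n_blocks, t_load, t_compute)
    return (base - overlapped) / base


if __name__ == "__main__":
    # Illustrative cycle counts only; these are not measurements from the paper.
    n, load, compute = 4, 3, 5
    print(unbuffered_time(n, load, compute))
    print(double_buffered_time(n, load, compute))
    print(improvement(n, load, compute))
```

Under this model the saving grows as the load and compute times become more balanced, and it is bounded by the shorter of the two stages. A real design would also have to account for buffer size limits and bus contention, which this sketch ignores.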